hackers looking for .zip files

Forum Moderators: phranque

increase recently

LifeinAsia

8:08 pm on Feb 7, 2023 (gmt 0)

Over the past couple of weeks, I've seen an increase in requests for .zip files in my log files, probably close to a dozen a day now. Most have the domain name in the file name and some have dates (from last year) in the file name. The IPs are mostly from China and Europe.

Anyone else noticing this?

phranque

8:18 pm on Feb 7, 2023 (gmt 0)

i haven't but if i had to guess they are using the naming convention of a website backup tool and looking for the resulting backup file in the home directory of every hostname they try.

Kendo

9:51 pm on Feb 7, 2023 (gmt 0)

Yes they are looking for fools that leave the site's backup file in the site root.

lucy24

9:59 pm on Feb 7, 2023 (gmt 0)

:: obligatory detour to raw logs ::

Nope, nothing but requests for downloadable files that really are--
Whoops!
Oh, willya look at that.

68.235.44.ddd - - [21/Nov/2022:08:15:17 -0800] "HEAD /backup.zip HTTP/1.1" 403 282 "-" "python-requests/2.27.1"

But in my case it's a whopping total of nine requests for /backup.zip over three sites in the past 13-plus months, mainly in October-November. Four of those nine were to my test site, including the only one that ever tried a GET instead of the usual HEAD. (Why HEAD, though? Do they need to make additional preparations and clear the decks in the rare case the HEAD results in "Yup, we've got that"?)
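For anyone who wants to run the same check on their own raw logs, here is a minimal sketch in Python. It assumes the Combined Log Format seen in the sample line above; the extension list and the function names (`archive_probes`, `summarize`) are hypothetical choices for illustration, not anything from the posts.

```python
import re
from collections import Counter

# Combined Log Format, matching the sample line quoted above.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Assumption: a hypothetical list of extensions worth flagging.
ARCHIVE_EXTENSIONS = (".zip", ".tar", ".tar.gz", ".tgz", ".sql", ".bak", ".7z", ".rar")


def archive_probes(lines):
    """Yield the parsed fields of each request whose path ends in an archive extension."""
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the expected format
        # Ignore any query string and compare case-insensitively.
        path = match.group("path").split("?", 1)[0].lower()
        if path.endswith(ARCHIVE_EXTENSIONS):
            yield match.groupdict()


def summarize(lines):
    """Count archive probes by (method, user agent)."""
    return Counter((hit["method"], hit["agent"]) for hit in archive_probes(lines))


sample = [
    '68.235.44.ddd - - [21/Nov/2022:08:15:17 -0800] "HEAD /backup.zip HTTP/1.1" '
    '403 282 "-" "python-requests/2.27.1"',
]
```

Feeding an open log file to `summarize` instead of `sample` would give a quick tally of which methods and user agents are doing the asking.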

Six of the nine used the humanoid UA

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.67 Safari/537.36

always asking for a string of five nonexistent files all named backup.xtn.

Two more were "python-requests" as shown above--oh! yeah! that UA will get you welcomed with open arms to be sure!--and the ninth was

Mozilla/5.0 (compatible; Nmap Scripting Engine; https://nmap.org/book/nse.html)

--brazen of them to include an informational URL, I must say--as part of a package of

:: holy smokes, that's a long list ::

1254 blocked requests for every robotic target one can think of, including but not limited to obvious ones like /bak/. Since that was my test site, this effectively took up the entire day's logs.

No, wait, I tell a lie. That string of 1254 requests began with two different humanoid page requests--with supporting files--and then a HEAD for robots.txt. (Once again: why HEAD? In what possible way can the presence or absence of robots.txt possibly affect your behavior?)

Sgt_Kickaxe

3:14 am on Feb 8, 2023 (gmt 0)

Don't assume they are bad guys. Good guys think it's OK to do this to 'help' you, by selling you a service so they get paid to keep doing it.

For your protection, of course.

tangor

6:46 am on Feb 8, 2023 (gmt 0)

@lucy24 ... also seeing an increase in HEAD requests... as for the other stuff, those are addressed in my "requests" denied .htaccess.

blend27

11:14 pm on Feb 16, 2023 (gmt 0)

-- backup.zip --

... and what's stopping anyone from renaming a 3 MB image file (of something nice) from IMG_20221006_094636.jpg to backup.zip and leaving it in the root of the site?

Have some fun once in a while :)

One could even send a cryptic response header back; something like x-when-recieved-action: 'rename back to .jpg'

I am pretty sure:

���L��mDv�Φ�u!M��$� �T�� �٩��s��0�=�qg���U|�*jnFMqwG3�JH�XT��W5��c���SO4!T��xi����2

would be appreciated.

tangor

11:14 am on Feb 17, 2023 (gmt 0)

Why waste the bandwidth? It all adds up---and comes out of your (our) pocket!

Deny is so much more thrifty!

Kendo

8:34 pm on Feb 17, 2023 (gmt 0)

Sometimes I do compress a site and database to download for backup, but I remove it soon after.

But if the file is not there then why bother with any further action beyond redirecting the 404 to a more useful link, like your home page?

phranque

8:38 pm on Feb 17, 2023 (gmt 0)

redirecting 404s to the home page is rarely useful.

not2easy

9:04 pm on Feb 17, 2023 (gmt 0)

Wouldn't that be a soft 404? If the requested file is a .zip file and the file loaded is the home page?

lucy24

9:56 pm on Feb 17, 2023 (gmt 0)

redirecting the 404 to a more useful link

Why redirect a 404 to anything? That's what a 404 page is for. And malign robots certainly don't need any further help. (Do as I say, not as I do. My own 404 and 403 pages are intended for humans and include links to most things linked from the front page.)

phranque

10:58 pm on Feb 17, 2023 (gmt 0)

a custom 404 error document is useful for humans.
the 404 status code that generates the error document is useful for all others.
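As a rough illustration of that split, here is a minimal sketch using Python's standard `http.server`. The `PAGES` table, the error-page wording and the `build_response` helper are hypothetical stand-ins; the point is only that the helpful body and the 404 status travel together, with no redirect involved.

```python
from http.server import BaseHTTPRequestHandler

# Assumption: a hypothetical one-page site; a real one would look up files or routes.
PAGES = {"/": b"<h1>Home</h1>"}

# Custom error document aimed at humans, with a link back to the home page.
NOT_FOUND_BODY = (
    b"<h1>Page not found</h1>"
    b"<p>Perhaps a typo, or the content has changed. "
    b'Try the <a href="/">home page</a>.</p>'
)


def build_response(path):
    """Return (status, body): 200 for known pages, a real 404 otherwise."""
    if path in PAGES:
        return 200, PAGES[path]
    return 404, NOT_FOUND_BODY


class Handler(BaseHTTPRequestHandler):
    def _send(self, include_body):
        status, body = build_response(self.path)
        self.send_response(status)  # the status code is what bots and crawlers see
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        if include_body:
            self.wfile.write(body)  # the error document is what humans see

    def do_GET(self):
        self._send(include_body=True)

    def do_HEAD(self):
        # HEAD gets the same status and headers, but no body.
        self._send(include_body=False)
```

To try it locally, one could run `http.server.HTTPServer(("127.0.0.1", 8000), Handler).serve_forever()` and request a missing URL: the browser shows the friendly page while the log records a 404.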

Kendo

1:14 am on Feb 18, 2023 (gmt 0)

Why redirect a 404 to anything?

Because 10% of our hits went to a 404!

Sure, we get a lot of script-kiddy probes, but those 404s can be misspelled links, old pages, discontinued downloads, etc.

So why be unproductive... this way lost visitors can be put back on track. Also, if they are probing for files to exploit, their script might stop if it actually finds something, instead of moving onto the next file in its list.

lucy24

1:42 am on Feb 18, 2023 (gmt 0)

this way lost visitors can be put back on track

On the contrary. By redirecting to the root or any other page, rather than serving a 404 at the originally requested URL, visitors lose the ability to see if they mistyped something.

Kendo

2:13 am on Feb 18, 2023 (gmt 0)

visitors lose the ability to see if they mistyped something

Ok, so we write a script that checks for misspelling... seriously, it's a lot easier to go to the home page where they can find the correct links for everything that they might be looking for.

tangor

2:43 am on Feb 18, 2023 (gmt 0)

It is still a soft 404 --- and the search engines don't like that.

lucy24

4:06 am on Feb 18, 2023 (gmt 0)

I must be missing something. Otherwise I simply can't understand this fierce resistance to making a useful 404 page.

phranque

12:11 am on Feb 19, 2023 (gmt 0)

according to protocol:

The 404 (Not Found) status code indicates that the origin server did not find a current representation for the target resource or is not willing to disclose that one exists. A 404 status code does not indicate whether this lack of representation is temporary or permanent; the 410 (Gone) status code is preferred over 404 if the origin server knows, presumably through some configurable means, that the condition is likely to be permanent.

source: https://www.rfc-editor.org/rfc/rfc7231#section-6.5.4

according to Google:

If you removed the page and there's no replacement page on your site with similar content, return a 404 (not found) or 410 (gone) response (status) code for the page. These status codes indicate to search engines that the page doesn't exist and the content should not be indexed.

If you have access to your server's configuration files, you can make these error pages useful to users by customizing them. A good custom 404 page helps people find the information they're looking for, and also provides other helpful content that encourages people to explore your site further. Here are some tips for designing a useful custom 404 page:

Tell visitors clearly that the page they're looking for can't be found. Use language that is friendly and inviting.
Make sure your 404 page has the same look and feel (including navigation) as the rest of your site.
Consider adding links to your most popular articles or posts, as well as a link to your site's home page.
Think about providing a way for users to report a broken link.

Custom 404 pages are created solely for users. Since these pages are useless from a search engine's perspective, make sure the server returns a 404 HTTP status code to prevent having the pages indexed.

source: https://developers.google.com/search/docs/crawling-indexing/http-network-errors#soft-404-errors

if you use a 301 to redirect requests for non-existent resources:

Googlebot follows the redirect, and the indexing pipeline uses the redirect as a strong signal that the redirect target should be canonical.

which means that your home page would be the canonical url for all nonexistent files.
not sure what that would do for you but it can't be good.

according to Bing:

From an SEO perspective, a 404 page should return a 404 Status Code (Page Not Found) as opposed to a 200 (OK) status code. The return of a 404 status code alerts automated users, such as search engine crawlers, about a link that is broken; it is the only way an automated user can ascertain this. If 404 pages return a 200 status code then search engines consider the broken link to be valid, and the "404 page" can end up in the index.

source: 404 Page best practices [bing.com]

for anyone reading this, i would always suggest you follow the protocol.
anyone interested in search traffic should probably follow their suggestions.

but to be clear, everyone should do what they think is best for their visitors, human or not.

Kendo

3:46 am on Feb 19, 2023 (gmt 0)

I simply can't understand this fierce resistance to making a useful 404 page.

Common sense... the links below did not come from search results, but now they go somewhere useful...

!.php
.aws/config
.aws/credentials
.DS_Store
.env
.env.bak
.git/config
.git/HEAD
.local
.phpmyadmin
...
wp-config.php
...
[~700 "probe" example requests deleted]
...
z.tar
z.zip
zb_users/plugin/LinksManage/zbignore.txt
zencart.sql
zencart.tar.gz
zimbraAdmin/0MVzAe6pgwe5go1D.jsp

PHP and WordPress have never been used on this site, not in 25 years.

[edited by: phranque at 4:11 am (utc) on Feb 19, 2023]
[edit reason] brevity [/edit]

phranque

4:22 am on Feb 19, 2023 (gmt 0)

But if the file is not there then why bother with any further action beyond redirecting the 404 to a more useful link, like your home page?

assuming these probe requests are coming from unwanted bots or script kiddies or whatever, why spend your server resources to serve a home page or whatever useful content?
to unwanted bots or script kiddies?
404s are cheap...

tangor

4:19 pm on Feb 19, 2023 (gmt 0)

Heh! For these types they get a 403, which is WAY smaller than my 404!

Filtered on the request, not the ip ... saves time ... and since I do not use WP or PHP, it makes for a much cleaner .htaccess with significantly fewer ip denies necessary.
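The actual .htaccess rules are not shown in this thread, so what follows is only a sketch of the idea in Python: decide on the requested path alone, and hand known probe patterns a 403 before anything else runs. The pattern list is a hypothetical one drawn from the probe examples posted earlier, and it assumes a site with no PHP or WordPress.

```python
import re

# Assumption: hypothetical patterns for a site that uses neither PHP nor WordPress.
PROBE_PATTERNS = re.compile(
    r"(?:^/\.(?:env|git|aws)\b"           # dotfile probes such as /.env or /.git/config
    r"|\.php$"                            # any PHP script
    r"|^/wp-"                             # WordPress paths
    r"|\.(?:zip|tar|tar\.gz|sql|bak)$)",  # backup archives and dumps
    re.IGNORECASE,
)


def status_for(path, exists):
    """Pick a status from the request path alone, never from the client IP."""
    path = path.split("?", 1)[0]  # ignore any query string
    if PROBE_PATTERNS.search(path):
        return 403  # known probe: refuse without serving any page
    return 200 if exists else 404  # ordinary miss: plain 404
```

Because the match is on the request rather than the sender, a probe from a new IP is refused the first time it shows up, and ordinary typos still land on the regular 404.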

Kendo

5:21 am on Feb 21, 2023 (gmt 0)

whatever, why spend your server resources to serve a home page or whatever useful content?

I did remove all of the site-specific requests, discontinued links and downloads. Simpler for me to redirect everything that is missing and that way I give a home for them all. It cannot affect search because of late I don't get broken links from search results... used to get 10 year old links, but they seem to have fixed their indexing at long last.

tangor

6:29 am on Feb 22, 2023 (gmt 0)

SIMPLE is a custom 404 that says something like:

That page is not found. Perhaps a typo, or the content has changed. To find what you are looking for click here for our HOME PAGE.

Put a link attribute on HOME PAGE, of course.

If they are HUMAN they will click. If not, don't worry! They were never a user in the first place.

That's SIMPLE. And saves bandwidth and concurrent connections.

Aside: also provides more accurate logs.