Probably because I block all images from being downloaded ;)
Actually, they could be combining the actions of the normal crawler and the image crawler, but if you're seeing it download a page and then immediately download the same page again with all the images, it sounds like possible screen shots.
I see it crawling but didn't see any images being downloaded.
...I block all images from being downloaded
I block image downloads from remote servers and other off-site referrers (w/ some exceptions)
www.example.com 65.55.106.nnn - - [19/Sep/2009:14:14:37 -0500] "GET /widget HTTP/1.1" 200 16511 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
www.example.com 65.55.106.nnn - - [19/Sep/2009:14:14:46 -0500] "GET /widget HTTP/1.0" 200 85960 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
HTTP/1.1 vs HTTP/1.0: the former uses .gz
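To put numbers on the two log lines above, here is a minimal Python sketch that parses them and compares the time gap and response sizes. The regular expression and field names are assumptions based on the vhost-prefixed combined log format shown in the post, not something the poster supplied. Note that the log does not record Content-Encoding, so the smaller HTTP/1.1 response is only consistent with compression; it does not prove it.

```python
import re
from datetime import datetime

# The two access-log lines quoted in the post, copied verbatim.
LOG_LINES = [
    'www.example.com 65.55.106.nnn - - [19/Sep/2009:14:14:37 -0500] "GET /widget HTTP/1.1" 200 16511 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"',
    'www.example.com 65.55.106.nnn - - [19/Sep/2009:14:14:46 -0500] "GET /widget HTTP/1.0" 200 85960 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"',
]

# Assumed layout: vhost, client, ident, user, [time], "request", status, bytes,
# "referer", "user agent".
LOG_PATTERN = re.compile(
    r'^(?P<vhost>\S+) (?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def parse_line(line):
    """Turn one log line into a dict with a typed size and timestamp."""
    match = LOG_PATTERN.match(line)
    if match is None:
        raise ValueError("line does not match the assumed log format")
    entry = match.groupdict()
    entry["size"] = int(entry["size"])
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    return entry


first, second = (parse_line(line) for line in LOG_LINES)
gap_seconds = (second["time"] - first["time"]).total_seconds()
size_ratio = second["size"] / first["size"]

print(f"{first['protocol']}: {first['size']} bytes")
print(f"{second['protocol']}: {second['size']} bytes")
print(f"gap: {gap_seconds:.0f} s, size ratio: {size_ratio:.2f}x")
```

The same path fetched twice within a few seconds by the same user agent, with the second response several times larger, is the pattern being discussed here.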
LOL, really? Then what's the point of having images?
I let users download them, but not the SEs.
The SE image index is one huge image theft ring that people use without asking.
Worst case, I found a bunch of unscrupulous sites using Google images to locate my thousands of images and hotlink them into their pages.
We had a massive assault on that nonsense, all hotlinks blocked, all images blocked from SEs downloading.
FYI, the images I'm talking about in this case were my library of 40K+ site screen shots, so you can see why some wise guys thought I'd be a good source for a free ride.
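For anyone wanting to picture the kind of rule described above (hotlinks blocked, search engine image downloads blocked, normal visitors allowed), here is a rough Python sketch of the decision logic. It is not the poster's actual setup, which is not shown in the thread. The host names, file extensions, and user-agent tokens are all hypothetical, and allowing an empty referer is an assumption made so that browsers and proxies that strip the header still see images.

```python
from urllib.parse import urlsplit

# Hypothetical values for illustration only.
ALLOWED_HOSTS = {"www.example.com", "example.com"}
IMAGE_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png")
BLOCKED_AGENT_TOKENS = ("msnbot", "googlebot-image")


def is_image_request(path):
    """True if the requested path (ignoring any query string) looks like an image."""
    return urlsplit(path).path.lower().endswith(IMAGE_EXTENSIONS)


def allow_request(path, referer, user_agent):
    """Decide whether to serve a request, given its Referer and User-Agent."""
    if not is_image_request(path):
        return True  # pages stay crawlable; only images are restricted

    agent = user_agent.lower()
    if any(token in agent for token in BLOCKED_AGENT_TOKENS):
        return False  # search engine crawlers do not get image files

    if not referer or referer == "-":
        return True  # assumption: a missing referer is treated as a normal visitor

    # Anything referred from another site is a hotlink and gets blocked.
    return urlsplit(referer).hostname in ALLOWED_HOSTS


print(allow_request("/img/shot.png", "http://www.example.com/widget", "Mozilla/5.0"))
print(allow_request("/img/shot.png", "http://other.example.net/page", "Mozilla/5.0"))
```

On a real server this would more likely live in rewrite rules than in application code; the function form just makes the order of the checks easy to see.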
HTTP/1.1 vs HTTP/1.0: the former uses .gz
Isn't that backwards? Shouldn't it be the HTTP/1.1 using .gz?
I wouldn't dare to even wager a guess on why M$ is doing this. To me it falls in the same category as the referrer spam discussed here before, or their attempts to grab images and truncated urls with the WinHTTP user agent...
Very tempting to add yet another directive to my rewrite rules...
I let users download them, but not the SEs.
I take a slightly different tactic. I block all image requests from off-site origins, but I do allow the Big 3 (4?) SEs to put most image files in their image search libraries.
When SE users click on one of these thumbnailed images, my scripting displays the page of origin (my site) instead of the SE's page that hot-links to my image file. Thus my 10k images serve another function: increasing traffic.
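The scripting itself is not shown in the thread, so the following is only a guess at the general shape of such a handler, written in Python. It assumes the click arrives as an image request whose referer is a search engine host, and that the site keeps some mapping from image files to the pages they appear on. The host suffixes, site host, and mapping below are all hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical values for illustration only.
SEARCH_HOST_SUFFIXES = ("google.com", "bing.com", "yahoo.com")
SITE_HOSTS = {"www.example.com"}
ORIGIN_PAGES = {"/images/widget-large.jpg": "/widget"}  # image path -> page of origin


def is_search_host(host):
    """True if host is one of the assumed search engine domains or a subdomain."""
    return any(
        host == suffix or host.endswith("." + suffix)
        for suffix in SEARCH_HOST_SUFFIXES
    )


def handle_image_request(path, referer):
    """Return an (action, target) pair: serve the file, redirect, or block."""
    if not referer or referer == "-":
        # Assumption: crawlers building the image index send no referer.
        return ("serve", path)

    host = urlsplit(referer).hostname or ""
    if host in SITE_HOSTS:
        return ("serve", path)  # normal on-site page view

    if is_search_host(host) and path in ORIGIN_PAGES:
        # A searcher clicked a thumbnail: send them to the page of origin.
        return ("redirect", ORIGIN_PAGES[path])

    return ("block", None)  # any other off-site origin is treated as a hotlink


print(handle_image_request(
    "/images/widget-large.jpg",
    "http://images.google.com/imgres?q=widget",
))
```

The ordering matters: on-site referers are checked first so regular visitors never hit the redirect, and everything that is neither on-site nor a recognized search host falls through to the block.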