File under: “Now what are they up to?” because I don’t remember ever seeing this before.
IP: the usual FB ranges
UA: facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)
robots.txt: YES, apparently compliant
Status: 301 leading to mixed 206 and 200
Requests: most of site, excluding roboted-out directories (but see below)
Headers: identical to everyday FB
Protocol: HTTP redirected to HTTPS
I didn’t do a methodical count, but I tend to think they crawled the entire site, in no particular order, meaning that they’d picked up a shopping list somewhere. Some subdirectories were skipped, with no discernible pattern. In a behavior I’ve come to associate with the DotBot, all requests were initially made with HTTP--including requests for pages that were created after the site went HTTPS.
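For anyone who does want a methodical count, here is a minimal sketch in Python. It assumes the Apache/nginx "combined" log format, which may not match your setup; the function name tally_statuses is just made up for illustration.

```python
import re
from collections import Counter

# ASSUMPTION: "combined" log format. Adjust the pattern for any other format.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<ua>[^"]*)"'
)

def tally_statuses(lines, ua_fragment="facebookexternalhit"):
    """Count response statuses for requests whose User-Agent contains ua_fragment."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        # Lines that don't parse, or belong to other agents, are skipped.
        if m and ua_fragment in m.group("ua"):
            counts[m.group("status")] += 1
    return counts
```

Feed it an open log file (it iterates line by line) and compare the 301, 206 and 200 totals.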
About 90% of the HTTPS requests got a 206 rather than a 200; if there’s a pattern I couldn’t spot it. I thought it might be because of the Range header:
Range: bytes=0-524287
but very few of my pages are larger than 512K--probably fewer than 10%, rather than the 90% implied by the number of 206 responses.
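One possible explanation, offered as a guess rather than a diagnosis: under the HTTP range rules, a last-byte position past the end of the file is simply clamped to the final byte, so bytes=0-524287 is satisfiable for any non-empty page. A server that honors ranges may therefore answer 206 even when it ends up sending the whole file, while one that ignores the header answers 200. The sketch below only illustrates that clamping for a single first-last range; resolve_range is a hypothetical helper, not anything a server exposes.

```python
import re

def resolve_range(header, size):
    """Return (first, last) for a single 'bytes=first-last' range against a
    resource of `size` bytes, or None if it is unsatisfiable or not handled here.

    ASSUMPTION: only the simple single-range form is covered; suffix ranges
    ('bytes=-500') and multi-range requests are deliberately left out.
    """
    m = re.fullmatch(r"bytes=(\d+)-(\d*)", header.strip())
    if not m or size <= 0:
        return None
    first = int(m.group(1))
    last = int(m.group(2)) if m.group(2) else size - 1
    if first >= size or last < first:
        return None
    # A last-byte position beyond the end is clamped to the final byte.
    return first, min(last, size - 1)
```

With a 40,000-byte page, resolve_range("bytes=0-524287", 40000) comes out as bytes 0 through 39999, which is the entire page. If that is what happened here, the 206s would say nothing about page size, and the 200s would be the ones needing an explanation.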
There was one odd exception to the pages-only rule. They also requested a few images ... but only ones that are not displayed with the page by default, such as close-ups or enlargements. And not all of those.
Alternate heading: wtf?