Forum Moderators: phranque


WMT - huge spike in crawls

         

paranoid android

12:56 am on Mar 1, 2016 (gmt 0)

10+ Year Member



Hi there,

Two days ago WMT reported a massive jump in crawls to our site: 5x the normal rate. I can't tell whether this is a one-off spike or something more lasting, as it isn't reporting the two following days (yet).

Just wondered if there were any common reasons why this might happen. Nothing on our site has changed: no increase in article output beyond the usual, no new configurations or anything like that.

lucy24

2:47 am on Mar 1, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Marvin, right?

I think it's some kind of housekeeping. For the past several weeks I've seen a big spike in visits to one site they've got no reason to crawl at all, at least not more than every month or so, because nothing on it ever changes. Looking more closely, there's a whole lot of this pattern:
/directory/
/directory/index.html
/directory
where the second form hasn't returned content in, oh, four or five years at least, and the third form has never returned content.
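For anyone wanting to check their own logs for the same three-way pattern, here is a minimal sketch in Python. It assumes you have already pulled the requested paths out of your access log into a list; `directory_key` and `group_variants` are made-up helper names, and treating a final path segment with no dot in it as a directory is a guess that will not suit every site.

```python
from collections import defaultdict

def directory_key(path):
    """Collapse /dir/, /dir/index.html and /dir onto the single key /dir/."""
    if path.endswith("/index.html"):
        # Drop the file name, keep the trailing slash.
        path = path[: -len("index.html")]
    if not path.endswith("/"):
        last_segment = path.rsplit("/", 1)[-1]
        # Assumption: no dot in the last segment means a directory
        # requested without its trailing slash.
        if "." not in last_segment:
            path += "/"
    return path

def group_variants(paths):
    """Return only the directories that were requested in more than one form."""
    groups = defaultdict(set)
    for p in paths:
        groups[directory_key(p)].add(p)
    return {key: sorted(forms) for key, forms in groups.items() if len(forms) > 1}
```

Feeding it the crawler's requested paths for a day gives a quick list of directories that were hit in several spellings.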

In addition they've been requesting pages that were redirected anywhere from two to five years ago: not just the occasional one or two as a reminder of "Don't think we've forgotten this URL!" but massive lists.

Since all of this leads to more 301 responses, they've also been doing a fair amount of soft-404 testing (where they request some garbage URL like djhrubvghuyvcmhdf.html to ensure that it gets a 404). I've always assumed this was programmatically triggered any time they pass some number, or some percentage, of redirects.
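A rough way to see how much of a crawl spike is redirects and garbage-URL probes is to tally response codes for Googlebot requests in the raw access log. The sketch below assumes Apache/nginx "combined" log format and matches on the user-agent string only (which can be spoofed). The `looks_like_probe` heuristic (a long, all-lowercase, vowel-poor file name) is invented for illustration and is not anything Google documents.

```python
import re
from collections import Counter

# Request line, status code, and the final quoted field (user agent)
# of a "combined" format log line.
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) .*"(?P<ua>[^"]*)"$'
)

def looks_like_probe(path):
    """Heuristic guess: long, lowercase, letters-only name with few vowels."""
    name = path.split("?", 1)[0].rsplit("/", 1)[-1]
    stem = name.split(".", 1)[0]
    if len(stem) < 12 or not stem.isalpha() or not stem.islower():
        return False
    vowels = sum(ch in "aeiou" for ch in stem)
    return vowels / len(stem) < 0.25

def summarize(lines):
    """Count status codes for Googlebot hits and collect probable probes."""
    statuses = Counter()
    probes = []
    for line in lines:
        m = LOG_LINE.search(line.strip())
        if not m or "Googlebot" not in m.group("ua"):
            continue
        statuses[m.group("status")] += 1
        if looks_like_probe(m.group("path")):
            probes.append((m.group("path"), m.group("status")))
    return statuses, probes
```

A probe that comes back with 404 is the healthy case; a probe that comes back with 200 is the one to worry about.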

So they're doing something, but I really don't think it's anything nefarious. Maybe they're tweaking the algorithm to give more (or less!) weight to "technical quality" issues.

Robert Charlton

9:09 am on Mar 4, 2016 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



They definitely seem to be in a pre-algo update mode, maybe using a deep crawl to get to what they consider a "clean" database, to be used as a reference as they start calibrating what I've been assuming will be refined engagement and UX metrics for the long tail.