First, a reminder if you use Webmaster Tools:
When you're researching 404 crawl errors in Webmaster Tools, you frequently must "view source" to see what link Googlebot actually used. In many cases the link contains hidden characters that the Webmaster Tools HTML does not display. A link in Webmaster Tools may look fine, but the href in the Crawl Errors page source code is BAD!
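For anyone who wants to check an href without squinting at the page source, here is a minimal Python sketch. It assumes you have already copied the href out of "view source"; the function name reveal_hidden and the example URL are made up for illustration. It simply lists every character that is not printable ASCII, which is where the invisible ones tend to hide.

```python
import unicodedata

def reveal_hidden(href):
    """List (position, code point, Unicode name) for each character
    that is not printable, non-space ASCII."""
    findings = []
    for i, ch in enumerate(href):
        if not (0x21 <= ord(ch) <= 0x7E):
            # Control characters have no Unicode name, so supply a default.
            name = unicodedata.name(ch, "UNNAMED")
            findings.append((i, f"U+{ord(ch):04X}", name))
    return findings

# Hypothetical href copied from page source; it ends in a zero-width space
# that would be invisible in a normal browser view.
href = "http://www.example.com/page.html\u200b"
for position, codepoint, name in reveal_hidden(href):
    print(position, codepoint, name)
```

An empty result means the href is plain printable ASCII, so the 404 has some other cause.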
Googlebot is crawling content and creating BAD links
In many of the cases I see, the content of a page (typically on an MFA site) displays a URL as text only, not as an href in the source code of the page. Googlebot is turning this text into links and crawling them. The problem is that Googlebot is not validating the link in any way at all, so lots of 404 "file not found" errors are produced in my sites' logs. (Again, please see the reminder above.)
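To get a rough count of how many of these 404s show up in your own logs, something like the following Python sketch may help. It assumes the Apache/nginx "combined" log format, and the function name googlebot_404s and the sample lines are invented for illustration. Note that it only matches the user-agent string, which anyone can spoof, so treat the numbers as an estimate.

```python
import re
from collections import Counter

# Assumption: "combined" log format. Adjust the pattern for other formats.
LINE_RE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_404s(lines):
    """Count 404 responses per requested path where the user agent
    claims to be Googlebot."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m and m.group("status") == "404" and "Googlebot" in m.group("agent"):
            counts[m.group("path")] += 1
    return counts

# Made-up sample log lines (192.0.2.x is a documentation-only address range).
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample = [
    f'192.0.2.1 - - [10/Oct/2010:13:55:36 -0700] "GET /page.html%E2%80%8B HTTP/1.1" 404 512 "-" "{GOOGLEBOT_UA}"',
    f'192.0.2.1 - - [10/Oct/2010:13:56:02 -0700] "GET /page.html HTTP/1.1" 200 2326 "-" "{GOOGLEBOT_UA}"',
    '192.0.2.7 - - [10/Oct/2010:13:57:10 -0700] "GET /missing.html HTTP/1.1" 404 512 "-" "Mozilla/5.0"',
]

for path, hits in googlebot_404s(sample).most_common():
    print(hits, path)
```

Paths with percent-encoded junk on the end (like %E2%80%8B, a zero-width space) are the ones worth tracing back to the page they came from.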
This is getting tiresome. In one case an MFA site had 74 bad "text only" links, spread over that many pages, pointing to a page on one of my sites. I don't know what Google's algo thinks of all this. Does Google's algo even realize it is being fed this crap by Googlebot? I don't even expect webmasters to keep text that was never encoded as a link in the first place formatted as a legitimate hyperlink. Sure, it would be nice.
I'm sure Google's looking for links in JavaScript, etc., but just grabbing any old text and using it to crawl with is
JUST GOING TOO FAR! Google invented "nofollow", but this type of "text" link crawling certainly circumvents this to some extent. Nofollow was a big mistake.
I do get tired of webmasters who actually cite a site as reference material, but then in the source code tell Google it is untrusted content with a "nofollow"!
It seems like Google is trying to undermine their own invention.
I'm just wondering how pervasive this Googlebot bad link/crawl phenomenon is?