The issue is that I have a table of contents (TOC) page that I'd like the search engines to index. But that TOC links to around 1,000 assets, all of which require you to log in first.
So for every one of those links, the crawlers just see a login page.
How can I tell the crawlers not to follow the links, but only to index the TOC page?
There is no common folder name that I can disallow in the robots.txt file.
For example, here is a link to an asset:
domain.com/forum/topic123.html
The only option I can see is to add rel="nofollow" to the links on the TOC page. However, that is only a hint not to pass ranking to the linked pages; they will be included in the index anyway (according to Google).
Any suggestions?
If you don't want the engines to see and record the URLs of the password-protected pages, you can link to those pages via redirects and run the redirects through a directory that is blocked in robots.txt. That way the engines never reach the target URLs at all (assuming that no other pages link directly to the password-protected pages).
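As a rough sketch of the redirect idea (not a drop-in implementation): the TOC links point at a redirect directory, robots.txt disallows that directory, and the redirect script maps each link to the real topic URL. The directory name /go/, the helper names, and the numeric-id scheme are all assumptions made for illustration; only the /forum/topic123.html pattern comes from the example above.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser

# Hypothetical redirect directory; any name works as long as robots.txt blocks it.
REDIRECT_DIR = "/go/"

# The robots.txt rule that keeps compliant crawlers out of the redirect directory.
ROBOTS_TXT = f"User-agent: *\nDisallow: {REDIRECT_DIR}\n"


def toc_link(topic_id: int) -> str:
    """URL to put on the TOC page instead of the direct topic URL."""
    return f"{REDIRECT_DIR}{topic_id}"


def resolve(redirect_path: str) -> Optional[str]:
    """Map a redirect path to the real topic URL, or None if it is malformed.

    A real redirect script would answer with an HTTP redirect to this target.
    """
    if not redirect_path.startswith(REDIRECT_DIR):
        return None
    topic_id = redirect_path[len(REDIRECT_DIR):]
    if not topic_id.isdigit():
        return None  # refuse anything that is not a plain numeric id
    return f"/forum/topic{topic_id}.html"


def crawler_may_fetch(path: str) -> bool:
    """Check a path against ROBOTS_TXT the way a compliant crawler would."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch("*", path)


if __name__ == "__main__":
    link = toc_link(123)
    print(link, "->", resolve(link))
    print("TOC page crawlable:", crawler_may_fetch("/toc.html"))
    print("Redirect link crawlable:", crawler_may_fetch(link))
```

Note that the topic URLs themselves are not disallowed here; the approach only works because crawlers never request the redirect links and so never learn the targets, which is why no other page may link to them directly.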
"The nofollow attribute is just a mechanism that gives webmasters the ability to modify PageRank flow at link-level granularity. Plenty of other mechanisms would also work (e.g. a link through a page that is robot.txt'ed out), but nofollow on individual links is simpler for some folks to use. There's no stigma to using nofollow, even on your own internal links; for Google, nofollow'ed links are dropped out of our link graph; we don't even use such links for discovery. By the way, the nofollow meta tag does that same thing, but at a page level."