Forum Moderators: phranque


How do we stop spiders from crawling a certain part of a page


OnPerformance

8:45 pm on Jan 13, 2009 (gmt 0)

10+ Year Member



I am not a "computer" person, so I would appreciate it if you wouldn't make fun of me. I am a business student, and I have taught myself basic SEO while interning at an internet firm. Most of the elements of SEO seem logical and I have been able to figure them out, but the robots.txt concept has given me some trouble.

Our site has an advanced search in the left column that is on (almost) every page of the site. The advanced search has a dropdown with a bunch of different locations, from US - Alabama - Anniston all the way to US - Wyoming - Sheridan. Google crawls this dropdown on every page, and when I look at the top words in our site's content, the first 50 are all from this dropdown list. From my reading, it seems like robots.txt can only be used to block URLs or directories. Is there any way that a robots.txt file can be used to block spiders from crawling a specific element on a page but not the whole page? Or is there some other way to make it clear to search engines that these lists are not to be indexed? I would really appreciate any help on this issue; I am perplexed.

P.S. Don't worry about being too technical with your answers; luckily, I only have to find out the answer, not actually implement it.
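To make the symptom concrete, here is a rough Python sketch of the kind of "top words" count described above. The page, the location list and the `top_words` helper are all hypothetical, and real keyword tools may tokenize and weight text differently; the point is only that a long dropdown repeated on every page quickly outweighs the actual content.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text nodes of a page, roughly what a keyword tool sees."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def top_words(html: str, n: int = 5) -> list[tuple[str, int]]:
    """Return the n most frequent lowercase words in the page text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    words = re.findall(r"[a-z]+", " ".join(parser.chunks).lower())
    return Counter(words).most_common(n)


# Hypothetical page: the location dropdown plus one short paragraph of real content.
LOCATIONS = ["US - Alabama - Anniston", "US - Alabama - Auburn", "US - Wyoming - Sheridan"]
options = "".join(f"<option>{loc}</option>" for loc in LOCATIONS)
page = (
    "<html><body>"
    f"<div id='search'><select name='location'>{options}</select></div>"
    "<div id='content'><p>Our consulting services for small businesses.</p></div>"
    "</body></html>"
)

print(top_words(page, 2))  # [('us', 3), ('alabama', 2)]
```

Even with only three locations, the dropdown words take the top spots; with a full US list the effect is far stronger.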

Demaestro

8:57 pm on Jan 13, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Instead of using robots.txt to limit the crawling, you could load the data into the dropdown list using Ajax. Since indexing bots don't execute JavaScript, the list would appear empty to them; depending on how it was implemented, they might not see the dropdown at all.
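A minimal server-side sketch of the Ajax approach, in Python, assuming a hypothetical `/locations.json` URL and hypothetical `render_page` / `render_locations_json` helpers. The HTML that every page ships has an empty `<select>`, and a small script fills it in the browser. This only keeps the location names out of what a crawler reads if the crawler does not run JavaScript, which is the premise of the reply above.

```python
import json

# Hypothetical location data; on a real site this would come from the database.
LOCATIONS = ["US - Alabama - Anniston", "US - Wyoming - Sheridan"]

# The HTML a crawler downloads: the <select> is empty, and the script below
# populates it from /locations.json (an assumed URL) once the page loads.
PAGE_TEMPLATE = """<html><body>
<form action="/search">
  <select id="location" name="location"></select>
</form>
<div id="content">__CONTENT__</div>
<script>
  fetch("/locations.json")
    .then(function (r) { return r.json(); })
    .then(function (locs) {
      var sel = document.getElementById("location");
      locs.forEach(function (loc) { sel.add(new Option(loc, loc)); });
    });
</script>
</body></html>"""


def render_page(content: str) -> str:
    """HTML for a normal page; it contains no location names at all."""
    # str.replace instead of str.format, because the JavaScript contains braces.
    return PAGE_TEMPLATE.replace("__CONTENT__", content)


def render_locations_json() -> str:
    """Body served at /locations.json and fetched by the script in the page."""
    return json.dumps(LOCATIONS)


html = render_page("<p>Our consulting services.</p>")
```

The location names now live only in the JSON response, not in the page text itself.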

londrum

8:59 pm on Jan 13, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



most people want spiders to ignore links, but it sounds like you're asking for the spider to ignore the actual words in the HTML.

unfortunately you can't stop a spider from reading a segment of your HTML. you can only stop it from crawling the page itself, or from crawling your links.

the only way you could achieve what you're asking is to remove the words from the HTML completely, either by including them in an iframe or by writing them in with javascript.
or maybe you could change the code into an HTML form, and redirect them that way.
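The iframe option pairs naturally with robots.txt, because the iframe content lives at its own URL, and URLs are exactly what robots.txt can block. A small sketch using Python's standard `urllib.robotparser`, assuming the search box is moved to a hypothetical `/search-box.html` on `www.example.com`:

```python
from urllib.robotparser import RobotFileParser

# Assumed layout: each normal page embeds the search box from its own URL.
IFRAME_TAG = '<iframe src="/search-box.html" width="200" height="300"></iframe>'

# robots.txt rule that keeps compliant spiders away from that one URL.
ROBOTS_TXT = """User-agent: *
Disallow: /search-box.html
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

print(rp.can_fetch("Googlebot", "http://www.example.com/search-box.html"))  # False
print(rp.can_fetch("Googlebot", "http://www.example.com/services.html"))    # True
```

The parent pages stay crawlable; only the embedded search-box page is disallowed, so its dropdown text is not fetched as part of any page a compliant spider reads.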

koan

11:33 pm on Jan 13, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



One other thing you can do is generate that code at the very bottom of the page so it is not included as prominently in the html code and use CSS to position it wherever you want.
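A rough Python sketch of that source-order trick, assuming hypothetical `#content` and `#search` ids and a `render_page` helper: the main content is emitted first in the HTML, the search box last, and CSS positions the search box back in a left column. Whether search engines actually give later markup less weight is the premise of the suggestion, not something this code demonstrates.

```python
# Hypothetical stylesheet: leave room on the left for the search box,
# then pull the search box (emitted last in the HTML) into that space.
CSS = """
body    { position: relative; }
#content { margin-left: 220px; }
#search  { position: absolute; top: 0; left: 0; width: 200px; }
"""


def render_page(content: str, search_box: str) -> str:
    """Main content first in source order; search box last."""
    return (
        "<html><head><style>" + CSS + "</style></head><body>"
        f'<div id="content">{content}</div>'
        f'<div id="search">{search_box}</div>'  # last in the HTML, shown on the left
        "</body></html>"
    )


html = render_page(
    "<p>Our consulting services.</p>",
    "<select name='location'><option>US - Alabama - Anniston</option></select>",
)
```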

phranque

7:24 am on Jan 14, 2009 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



welcome to WebmasterWorld [webmasterworld.com], OnPerformance!

I would appreciate it if you wouldn't make fun of me.

if anyone makes fun of you i'll be sure to give them a friendly reminder about mutual respect.
that's how we roll here.
be sure to stop by the WebmasterWorld Community Center [webmasterworld.com] forum and introduce yourself - nobody will make fun of you unless you are really trying and you should get several warm welcomes.

regarding your question, what are you trying to accomplish:
- prevent the indexing of the pages linked?
- control the page rank flow through your internal linking?
- reduce the importance of the anchor text in your document?
- something i didn't consider?

the only thing i can think of that specifically addresses your approach is that yahoo search supports a class attribute value of "robots-nocontent" as described here:
How do I mark web page content that is extraneous to the main unique content on the page? - Yahoo! Search [help.yahoo.com]
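The markup side of this is just a class on the wrapper, for example `<div class="robots-nocontent">`. To show what honoring it would mean, here is a toy indexer in Python that drops any text inside such an element. It is a hypothetical illustration built on the standard `html.parser` module, not Yahoo's implementation, and other engines may simply ignore the class.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class NoContentAwareExtractor(HTMLParser):
    """Toy indexer: keeps page text but skips anything inside class="robots-nocontent"."""

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a robots-nocontent element
        self.text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth += 1  # nested element inside the skipped region
            return
        classes = (dict(attrs).get("class") or "").split()
        if "robots-nocontent" in classes:
            self.skip_depth = 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.text.append(data.strip())


page = """<html><body>
<div class="robots-nocontent">
  <select name="location">
    <option>US - Alabama - Anniston</option>
    <option>US - Wyoming - Sheridan</option>
  </select>
</div>
<div id="content"><p>Our consulting services.</p></div>
</body></html>"""

extractor = NoContentAwareExtractor()
extractor.feed(page)
extractor.close()
print(extractor.text)  # ['Our consulting services.']
```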

kaled

9:50 am on Jan 14, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



There are two main options:

1) Use an <iframe>
2) Use javascript to create the content.

The overall goal may also be achieved by moving the required content to the bottom of the HTML (so that search engines give it less prominence) and using CSS to place it higher on the page. Depending on exact requirements, this is probably the best solution.

Kaled.