Forum Moderators: goodroi

robots.txt usage and examples

Using robots.txt to improve SEO

         

cduke250

1:46 pm on Feb 7, 2007 (gmt 0)

10+ Year Member



I am curious to see some example uses and implementations of robots.txt, specifically implementations aimed at improving SEO and the reasons behind them.

The robots.txt I am using is based on the example:

WordPress 2.1 robots.txt

User-agent: *
# disallow all files in these directories
Disallow: /cgi-bin/
Disallow: /admin/
Disallow: /comments/
Disallow: /z/j/
Disallow: /z/c/
Disallow: /about/legal-notice/
Disallow: /about/copyright-policy/
Disallow: /about/terms-and-conditions/
Disallow: /about/feed/
Disallow: /about/trackback/
Disallow: /contact/
Disallow: /stats*
Disallow: /tag
Disallow: /category/uncategorized*
# disallow all files ending with these extensions
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.txt$
# disallow all files in /wp- directories
Disallow: /wp-*/
# disallow all URLs containing a ?
Disallow: /*?

Basically, this keeps crawlers away from duplicate content, low-quality content, CSS, JavaScript, PHP files, etc., but still lets search engines read the articles and find images, PDFs, etc.

I know the wildcards in my robots.txt work for Googlebot, but does anyone know whether they work for other search engines' bots?
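To make the wildcard question concrete, here is a rough Python sketch of Google-style pattern matching, assuming `*` matches any run of characters and a trailing `$` anchors the end of the URL. The helper names rule_to_regex and is_blocked are made up for illustration, and the sketch ignores Allow lines and rule precedence. A bot that only implements plain prefix matching would read `*` and `$` as literal characters instead.

```python
import re


def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate one robots.txt path pattern into a compiled regex.

    Assumes Google-style wildcards: '*' matches any characters,
    and a trailing '$' means "the URL must end here".
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for every '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(path: str, disallow_patterns: list[str]) -> bool:
    """Return True if any Disallow pattern matches the start of the path."""
    return any(
        rule_to_regex(p).match(path) is not None for p in disallow_patterns
    )


patterns = ["/*.php$", "/*?", "/wp-*/"]
for path in ["/index.php", "/2007/02/my-post/", "/wp-content/themes/style.png"]:
    print(path, "blocked" if is_blocked(path, patterns) else "allowed")
```

Unmatched paths are treated as allowed here, which mirrors the usual robots.txt default.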

Anyone else have improvements or other robots.txt examples?


[edited by: goodroi at 6:08 pm (utc) on Feb. 8, 2007]
[edit reason] removed link [/edit]

cduke250

1:59 pm on Feb 7, 2007 (gmt 0)



I also use a custom robots.txt for phpBB:


User-agent: *
# disallow all URLs containing a ?
Disallow: /*?*
# disallow all files ending in specific extensions
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.txt$
# disallow these dirs
Disallow: /js/
Disallow: /css/
Disallow: /cgi-bin/
Disallow: /db/
Disallow: /admin/
Disallow: /cache/
Disallow: /includes/
Disallow: /templates/
# disallow these files and dirs
Disallow: /V
Disallow: /stats*
Disallow: /post
Disallow: /member
Disallow: /mx_
# disallow these urls
Disallow: /rss.php
Disallow: /viewtopic.php
Disallow: /viewforum.php
Disallow: /index.php?
Disallow: /posting.php
Disallow: /groupcp.php
Disallow: /search.php
Disallow: /login.php
Disallow: /profile.php
Disallow: /memberlist.php
Disallow: /faq.php
Disallow: /common.php
Disallow: /index.php
Disallow: /modcp.php
Disallow: /privmsg.php
Disallow: /viewonline.php
# disallow urls starting with quote
Disallow: /"

This phpBB robots.txt differs from the default because the board uses a mixture of mods and .htaccess mod_rewrite rules to be more SEO-friendly.
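Short rules such as /post, /member and /V are plain prefixes, so they block more than a single file or directory. The sketch below checks a few paths with Python's standard urllib.robotparser, which does simple case-sensitive prefix matching and has no wildcard support. The agent name ExampleBot and the sample paths are invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# A small subset of the prefix rules, kept in one group with no blank lines.
RULES = """\
User-agent: *
Disallow: /post
Disallow: /member
Disallow: /V
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def blocked(path: str) -> bool:
    """True if the hypothetical ExampleBot may not fetch this path."""
    return not parser.can_fetch("ExampleBot", path)


sample_paths = [
    "/posting.php",           # starts with /post
    "/postcards/index.html",  # also starts with /post
    "/membership.html",       # starts with /member
    "/Videos/",               # starts with /V
    "/videos/",               # lowercase v, so /V does not apply
    "/forum/topic-1.html",    # no rule applies
]
for path in sample_paths:
    print(path, "blocked" if blocked(path) else "allowed")
```

If only one directory is meant, a trailing slash (for example /post/) keeps the rule from catching unrelated URLs that merely share the prefix.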

Anyone else?

cduke250

3:50 pm on Feb 7, 2007 (gmt 0)



OK, here's what I have now.

WordPress robots.txt


# rules for all bots
User-agent: *
Crawl-delay: 2
# disallow all files in these directories
Disallow: /cgi-bin/
Disallow: /admin/
Disallow: /comments/
Disallow: /js/
Disallow: /css/
Disallow: /about/legal-notice/
Disallow: /about/copyright-policy/
Disallow: /about/terms-and-conditions/
Disallow: /about/feed/
Disallow: /about/trackback/
Disallow: /contact/
Disallow: /stats
Disallow: /tag
Disallow: /category/uncategorized
Disallow: /wp-
# disallow all files ending with these extensions
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.txt$
# disallow all URLs containing a ?
Disallow: /*?*
# disallow all files in /wp- directories
Disallow: /wp-*/

# disallow archiving site
User-agent: ia_archiver
Disallow: /

# allow google image bot to search all images
User-agent: Googlebot-Image
Disallow:
Allow: /*.gif$
Allow: /*.png$
Allow: /*.jpeg$
Allow: /*.jpg$
Allow: /*.ico$
Allow: /images/

# allow adsense bot on entire site
User-agent: Mediapartners-Google*
Disallow:
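One thing worth checking in files like this is how a strict parser reads blank lines and repeated `User-agent: *` groups. As a sketch, Python's urllib.robotparser ends a group at the first blank line and keeps only the first `User-agent: *` group, so rules placed after a blank line, or in a second group for the same agent, are dropped. Other crawlers may be more lenient. The agent name ExampleBot and the two sample files are invented for illustration.

```python
from urllib.robotparser import RobotFileParser


def parse(text: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# Rules for '*' split by a blank line and spread over two groups.
SPLIT_GROUP = """\
User-agent: *
Disallow:

Disallow: /admin/

User-agent: *
Crawl-delay: 2
"""

# The same intent expressed as one uninterrupted group.
MERGED_GROUP = """\
User-agent: *
Crawl-delay: 2
Disallow: /admin/
"""

split = parse(SPLIT_GROUP)
merged = parse(MERGED_GROUP)

# With the split file, this parser sees only "User-agent: * / Disallow:".
print(split.can_fetch("ExampleBot", "/admin/login"), split.crawl_delay("ExampleBot"))
# With the merged file, both the Disallow and the Crawl-delay are picked up.
print(merged.can_fetch("ExampleBot", "/admin/login"), merged.crawl_delay("ExampleBot"))
```

Keeping every directive for one user agent in a single group, with blank lines only between groups, gives the same result in strict and lenient parsers.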

phpBB robots.txt


# rules for all bots
User-agent: *
Crawl-delay: 2
Disallow: /js/
Disallow: /css/
Disallow: /cgi-bin/
Disallow: /db/
Disallow: /admin/
Disallow: /cache/
Disallow: /includes/
Disallow: /templates/
Disallow: /V
Disallow: /stats
Disallow: /post
Disallow: /member
Disallow: /mx_
Disallow: /index.php?
Disallow: /posting.php
Disallow: /groupcp.php
Disallow: /search.php
Disallow: /login.php
Disallow: /privmsg.php
Disallow: /profile.php
Disallow: /memberlist.php
Disallow: /faq.php
Disallow: /archive

# disallow archiving site
User-agent: ia_archiver
Disallow: /

# allow google image bot to search all images
User-agent: Googlebot-Image
Disallow:
Allow: /*.gif$
Allow: /*.png$
Allow: /*.jpeg$
Allow: /*.jpg$
Allow: /*.ico$
Allow: /images/

# allow adsense bot on entire site
User-agent: Mediapartners-Google*
Disallow:

robots.txt for SEO-optimized phpBB


# rules for all bots
User-agent: *
Crawl-delay: 2
Disallow: /js/
Disallow: /css/
Disallow: /cgi-bin/
Disallow: /db/
Disallow: /admin/
Disallow: /cache/
Disallow: /includes/
Disallow: /templates/
Disallow: /V
Disallow: /stats
Disallow: /post
Disallow: /member
Disallow: /mx_
# disallow these urls
Disallow: /viewtopic.php
Disallow: /viewforum.php
Disallow: /index.php?
Disallow: /posting.php
Disallow: /groupcp.php
Disallow: /search.php
Disallow: /login.php
Disallow: /profile.php
Disallow: /memberlist.php
Disallow: /faq.php
Disallow: /common.php
Disallow: /index.php
Disallow: /modcp.php
Disallow: /privmsg.php
Disallow: /viewonline.php
# disallow urls starting with quote
Disallow: /"
# disallow all URLs containing a ?
Disallow: /*?*
# disallow all files ending in specific extensions
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.txt$

# disallow archiving site
User-agent: ia_archiver
Disallow: /

# allow google image bot to search all images
User-agent: Googlebot-Image
Disallow:
Allow: /*.gif$
Allow: /*.png$
Allow: /*.jpeg$
Allow: /*.jpg$
Allow: /*.ico$
Allow: /images/

# allow adsense bot on entire site
User-agent: Mediapartners-Google*
Disallow:
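Long Disallow lists like these tend to collect entries that a shorter prefix already covers, for example /posting.php under /post or /index.php? under /index.php. The following is a small lint sketch, assuming plain prefix matching; the function name find_redundant is made up, and wildcard patterns are skipped rather than analysed.

```python
def find_redundant(disallows: list[str]) -> dict[str, str]:
    """Map each redundant Disallow path to the rule that already covers it.

    Only plain prefix rules are considered; patterns containing
    '*' or '$' are ignored, as are empty Disallow values.
    """
    plain = [p for p in disallows if p and "*" not in p and "$" not in p]
    redundant: dict[str, str] = {}
    for i, path in enumerate(plain):
        for j, other in enumerate(plain):
            if i == j:
                continue
            if path == other:
                # Exact duplicate: flag only the later occurrence.
                if j < i:
                    redundant[path] = other
                    break
            elif path.startswith(other):
                # A shorter prefix rule already blocks this path.
                redundant[path] = other
                break
    return redundant


rules = [
    "/post", "/member", "/posting.php", "/memberlist.php",
    "/index.php?", "/index.php", "/memberlist.php", "/faq.php",
]
for path, covered_by in find_redundant(rules).items():
    print(f"{path} is already covered by {covered_by}")
```

Redundant lines are harmless to crawlers, so trimming them is only a readability choice.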

[edited by: goodroi at 6:17 pm (utc) on Feb. 8, 2007]
[edit reason] please no links [/edit]