The robots.txt I am using is based on the example:
WordPress 2.1 robots.txt
User-agent: *
# disallow all files in these directories
Disallow: /cgi-bin/
Disallow: /admin/
Disallow: /comments/
Disallow: /z/j/
Disallow: /z/c/
Disallow: /about/legal-notice/
Disallow: /about/copyright-policy/
Disallow: /about/terms-and-conditions/
Disallow: /about/feed/
Disallow: /about/trackback/
Disallow: /contact/
Disallow: /stats*
Disallow: /tag
Disallow: /category/uncategorized*
# disallow all files ending with these extensions
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.txt$
# disallow all files in /wp- directories
Disallow: /wp-*/
# disallow all files with ? in URL
Disallow: /*?
Basically, this helps get rid of duplicate content, low-quality content, CSS, JavaScript, PHP, etc., but still allows search engines to read the articles and find images, PDFs, and so on.
I know the wildcards in my robots.txt work for Googlebot, but does anyone know whether they work for other search engines' bots?
Does anyone else have improvements or other robots.txt examples?
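For anyone who wants to check what the wildcard rules above would catch, here is a rough Python sketch. It assumes the Google-style reading of the two special characters ("*" matches any run of characters, a trailing "$" anchors the end of the URL); a crawler that only does plain prefix matching would not treat them specially. The function names are made up for illustration and are not part of any crawler or library.

```python
import re

def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumed semantics: '*' matches any run of characters and a trailing
    '$' anchors the end of the URL. Everything else is literal, and the
    pattern is matched from the start of the path (prefix match).
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then rejoin the pieces with '.*'.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))

def is_blocked(path, disallow_patterns):
    """Return True if any Disallow pattern matches the start of the path."""
    return any(robots_pattern_to_regex(p).match(path) for p in disallow_patterns)

# A few of the rules from the WordPress file above.
rules = ["/wp-*/", "/*.php$", "/*?", "/tag"]

for path in ["/wp-content/themes/style.css",
             "/index.php",
             "/index.php?p=1",
             "/2007/02/my-post/",
             "/tags-are-great/"]:
    print(path, "blocked" if is_blocked(path, rules) else "allowed")
```

One thing this makes visible: "Disallow: /tag" is a prefix rule, so it also blocks any URL that merely starts with /tag, such as /tags-are-great/. If only the tag archive is meant, "/tag/" is the narrower rule.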
[edited by: goodroi at 6:08 pm (utc) on Feb. 8, 2007]
[edit reason] removed link [/edit]
User-agent: *
# disallow all files with a ? in URL
Disallow: /*?*
# disallow all files ending in specific extension
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.txt$
# disallow these dirs
Disallow: /js/
Disallow: /css/
Disallow: /cgi-bin/
Disallow: /db/
Disallow: /admin/
Disallow: /cache/
Disallow: /includes/
Disallow: /templates/
# disallow these files and dirs
Disallow: /V
Disallow: /stats*
Disallow: /post
Disallow: /member
Disallow: /mx_
# disallow these urls
Disallow: /rss.php
Disallow: /viewtopic.php
Disallow: /viewforum.php
Disallow: /index.php?
Disallow: /posting.php
Disallow: /groupcp.php
Disallow: /search.php
Disallow: /login.php
Disallow: /profile.php
Disallow: /memberlist.php
Disallow: /faq.php
Disallow: /common.php
Disallow: /index.php
Disallow: /memberlist.php
Disallow: /modcp.php
Disallow: /privmsg.php
Disallow: /viewonline.php
# disallow urls starting with quote
Disallow: /"
Note that this phpBB robots.txt differs from the default because the site uses a mixture of mods and .htaccess mod_rewrite rules to be more SEO friendly.
Anyone else?
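Since plain Disallow lines (no wildcards) are matched as URL prefixes, a few rules in the phpBB file above look redundant: /post already covers /posting.php, /member covers /memberlist.php, and /index.php covers /index.php?. A small Python sketch for spotting this kind of overlap follows. The helper name find_shadowed is made up for illustration, and the check only makes sense for rules without "*" or "$".

```python
def find_shadowed(prefixes):
    """Return (rule, covering_rule) pairs for redundant prefix rules.

    A rule is reported when it is an exact duplicate of an earlier rule,
    or when another, shorter rule is a prefix of it and therefore already
    blocks everything it blocks. Wildcard rules are not handled.
    """
    shadowed = []
    unique = []
    for rule in prefixes:
        if rule in unique:
            shadowed.append((rule, rule))  # exact duplicate
        else:
            unique.append(rule)
    for rule in unique:
        for other in unique:
            if other != rule and rule.startswith(other):
                shadowed.append((rule, other))
                break  # one covering rule is enough to report
    return shadowed

# A subset of the Disallow paths from the phpBB file above.
rules = ["/post", "/member", "/index.php?", "/posting.php",
         "/memberlist.php", "/index.php", "/memberlist.php"]

for rule, cover in find_shadowed(rules):
    print(rule, "is already covered by", cover)
```

Redundant lines are harmless to crawlers, but trimming them makes the file easier to read and shows how broad short prefixes like /post and /member really are.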
WordPress robots.txt
# rules for all crawlers
User-agent: *
# disallow all files in these directories
Disallow: /cgi-bin/
Disallow: /admin/
Disallow: /comments/
Disallow: /js/
Disallow: /css/
Disallow: /about/legal-notice/
Disallow: /about/copyright-policy/
Disallow: /about/terms-and-conditions/
Disallow: /about/feed/
Disallow: /about/trackback/
Disallow: /contact/
Disallow: /stats
Disallow: /tag
Disallow: /category/uncategorized
Disallow: /wp-
# disallow all files ending with these extensions
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.txt$
# disallow all files with ? in URL
Disallow: /*?*
# disallow all files in /wp- directories
Disallow: /wp-*/
# disallow archiving site
User-agent: ia_archiver
Disallow: /
# allow google image bot to search all images
User-agent: Googlebot-Image
Disallow:
Allow: /*.gif$
Allow: /*.png$
Allow: /*.jpeg$
Allow: /*.jpg$
Allow: /*.ico$
Allow: /images/
# allow adsense bot on entire site
User-agent: Mediapartners-Google*
Disallow:
User-agent: *
Crawl-delay: 2
phpBB robots.txt
# rules for all crawlers
User-agent: *
Disallow: /js/
Disallow: /css/
Disallow: /cgi-bin/
Disallow: /db/
Disallow: /admin/
Disallow: /cache/
Disallow: /includes/
Disallow: /templates/
Disallow: /V
Disallow: /stats
Disallow: /post
Disallow: /member
Disallow: /mx_
Disallow: /index.php?
Disallow: /posting.php
Disallow: /groupcp.php
Disallow: /search.php
Disallow: /login.php
Disallow: /privmsg.php
Disallow: /post
Disallow: /profile.php
Disallow: /memberlist.php
Disallow: /faq.php
Disallow: /archive
# disallow archiving site
User-agent: ia_archiver
Disallow: /
# allow google image bot to search all images
User-agent: Googlebot-Image
Disallow:
Allow: /*.gif$
Allow: /*.png$
Allow: /*.jpeg$
Allow: /*.jpg$
Allow: /*.ico$
Allow: /images/
# allow adsense bot on entire site
User-agent: Mediapartners-Google*
Disallow:
User-agent: *
Crawl-delay: 2
For SEO Optimized phpBB
# rules for all crawlers
User-agent: *
Disallow: /js/
Disallow: /css/
Disallow: /cgi-bin/
Disallow: /db/
Disallow: /admin/
Disallow: /cache/
Disallow: /includes/
Disallow: /templates/
Disallow: /V
Disallow: /stats
Disallow: /post
Disallow: /member
Disallow: /mx_
# disallow these urls
Disallow: /viewtopic.php
Disallow: /viewforum.php
Disallow: /index.php?
Disallow: /posting.php
Disallow: /groupcp.php
Disallow: /search.php
Disallow: /login.php
Disallow: /profile.php
Disallow: /memberlist.php
Disallow: /faq.php
Disallow: /common.php
Disallow: /index.php
Disallow: /memberlist.php
Disallow: /modcp.php
Disallow: /privmsg.php
Disallow: /viewonline.php
# disallow urls starting with quote
Disallow: /"
# disallow all files with a ? in URL
Disallow: /*?*
# disallow all files ending in specific extension
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.txt$
# disallow archiving site
User-agent: ia_archiver
Disallow: /
# allow google image bot to search all images
User-agent: Googlebot-Image
Disallow:
Allow: /*.gif$
Allow: /*.png$
Allow: /*.jpeg$
Allow: /*.jpg$
Allow: /*.ico$
Allow: /images/
# allow adsense bot on entire site
User-agent: Mediapartners-Google*
Disallow:
User-agent: *
Crawl-delay: 2
[edited by: goodroi at 6:17 pm (utc) on Feb. 8, 2007]
[edit reason] please no links [/edit]