how to disallow feeds in robots.txt readable by all bots
bhavana
5:34 am on Aug 16, 2011 (gmt 0)
How do I disallow feeds in robots.txt? I tried disallow /feed/ and it is not working. How do I disallow a URL ending with feed? Is it disallow /feed$ or disallow /feed/$?
tangor
5:56 am on Aug 16, 2011 (gmt 0)
Either is correct, but bad bots will ignore it... and will also use those "hints" as to what to rip.
feed all by itself will work for bots that honor...
lucy24
6:08 am on Aug 16, 2011 (gmt 0)
:: cough, cough ::
Both $ forms are incorrect in robots.txt [robotstxt.org], because it doesn't "do" Regular Expressions.
Note also that globbing and regular expression are not supported in either the User-agent or Disallow lines. The '*' in the User-agent field is a special value meaning "any robot".
So if
Disallow: /feed/
isn't working, you need to bring out the heavy artillery, starting with .htaccess.
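To make the "no Regular Expressions" point concrete, here is a minimal Python sketch of the original-style matching, assuming a rule simply blocks any path that begins with the rule text. The function name is_disallowed and the sample paths are made up for illustration; this is not code from any real crawler.

```python
def is_disallowed(path, disallow_rules):
    """Original-style matching: a rule blocks any path that starts with it."""
    # An empty Disallow value blocks nothing, so it is skipped.
    return any(rule and path.startswith(rule) for rule in disallow_rules)


rules = ["/feed/"]
print(is_disallowed("/feed/", rules))       # True
print(is_disallowed("/feed/atom", rules))   # True: same prefix
print(is_disallowed("/blog/feed/", rules))  # False: path does not START with /feed/
print(is_disallowed("/feed", rules))        # False: no trailing slash

# Read literally, "$" is just another character in the rule.
print(is_disallowed("/feed", ["/feed$"]))   # False
```

Under that reading, Disallow: /feed/ only covers a feed directory at the top level of the site, which may be why it looks as if it "isn't working" for feeds that live deeper in the path.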
tangor
7:18 am on Aug 16, 2011 (gmt 0)
Correct, the regex $ is not required. Done. Otherwise, it is correct. The best method is to disallow ALL BOTS and then list which bots ARE ALLOWED (called whitelisting), but that puts me in the minority...
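The whitelisting idea can be tried out with Python's standard urllib.robotparser module. This is only a sketch: the robots.txt text and the choice of Googlebot as the single allowed bot are assumptions for illustration, and it only shows what a compliant parser would conclude, not what a bad bot will actually do.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical whitelist-style robots.txt: block everyone, then allow one named bot.
WHITELIST_ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow:
"""

parser = RobotFileParser()
parser.parse(WHITELIST_ROBOTS_TXT.splitlines())

# Ask the parser whether each bot may fetch the feed URL.
for bot in ("Googlebot", "SomeOtherBot"):
    print(bot, parser.can_fetch(bot, "/feed/"))
```

The empty Disallow line under the named bot means "nothing is disallowed" for that bot, while every other bot falls through to the catch-all group.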
the more important issue for your problem statement is that the Disallow syntax matches the url path left-to-right, starting from its first character, so a rule only applies to urls that begin with it.
therefore if you want to take advantage of the REP pattern-matching extensions (honored by the major search engines, but not by every bot) you can disallow a url ending with "feed" using: Disallow: /*feed$
however if you also/instead want to disallow a "feed" subdirectory url (i.e. ending with "feed/") you need a different rule: Disallow: /*feed/$
also note that without the end anchor in the pattern (the "$") you will match more than intended: disallowing the pattern "/*feed" will also disallow urls such as "/feedme", and disallowing the pattern "/*feed/" will also disallow urls such as "/feed/me".
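The effect of the "*" and "$" extensions can be sketched in a few lines of Python. This assumes "*" means "any run of characters" and a trailing "$" means "the url must end here"; the helper name matches_rule is hypothetical, and real crawlers may differ in details.

```python
import re


def matches_rule(path, rule):
    """Check a url path against a Disallow value using the '*' and '$' extensions."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces, then let each '*' stand for any run of characters.
    pattern = re.compile(".*".join(re.escape(part) for part in rule.split("*")))
    if anchored:
        # With the end anchor, the whole path must match the pattern.
        return pattern.fullmatch(path) is not None
    # Without it, the pattern only has to match from the start of the path.
    return pattern.match(path) is not None


print(matches_rule("/blog/feed", "/*feed$"))    # True
print(matches_rule("/feedme", "/*feed$"))       # False
print(matches_rule("/blog/feed/", "/*feed/$"))  # True
print(matches_rule("/feed/me", "/*feed/$"))     # False
print(matches_rule("/feedme", "/*feed"))        # True: no end anchor
print(matches_rule("/feed/me", "/*feed/"))      # True: no end anchor
```

The last two lines show the over-matching described above once the end anchor is dropped.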