robots.txt

How to fix it

         

kkonline

5:30 am on Aug 18, 2007 (gmt 0)




You are probably aware that robots.txt is a file in the root of a web server that tells bots what they need to know about the site, e.g. which private directories they should not index.

Now the situation...

I can simply type mysite.com/robots.txt into a browser and learn the names of all the private/protected directories and files that the site's administrator doesn't want the public to access.

The site mysite.com is now more vulnerable to attack, because the names of its internal protected directories and files are known and can be used by anyone trying to hack it...

So what is the solution that lets bots index the site while skipping the protected files/directories, without making the site more vulnerable to attack?
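
To make the concern above concrete, here is a minimal Python sketch of what any visitor can do with a public robots.txt: pull out every path named in a Disallow line. The file contents and the directory names (/admin/, /backup/) are hypothetical, made up for illustration, and the parsing is deliberately simple rather than a full implementation of the robots.txt format.

```python
from typing import List


def disallowed_paths(robots_txt: str) -> List[str]:
    """Return every path named in a Disallow line, in file order."""
    paths = []
    for raw_line in robots_txt.splitlines():
        # Drop trailing comments, then surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow":
            value = value.strip()
            if value:  # an empty Disallow blocks nothing
                paths.append(value)
    return paths


# Hypothetical robots.txt body; the directory names are invented.
SAMPLE = """\
User-agent: *
Disallow: /admin/      # hypothetical private area
Disallow: /backup/
Disallow:
"""

print(disallowed_paths(SAMPLE))  # prints ['/admin/', '/backup/']
```

Nothing here needs special access: the file is served to everyone, so every path listed in it should be assumed public knowledge.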

Matt Probert

8:16 am on Aug 18, 2007 (gmt 0)




A bot can only follow links. If a private directory has no public links to it, it will not be spidered. Hence, there is no need to mention its existence in a public robots.txt file.

If you're more paranoid, try password-protecting the private directories.

Matt
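
As a rough illustration of the password-protection suggestion above, here is a minimal Python sketch of the check behind HTTP Basic authentication. The function name is_authorized and the credentials are hypothetical; in practice most web servers offer password protection for directories as a built-in feature, so this is only meant to show the idea, not a recommended implementation.

```python
import base64
import hmac
from typing import Optional


def is_authorized(authorization_header: Optional[str],
                  expected_user: str,
                  expected_password: str) -> bool:
    """Check an HTTP 'Authorization: Basic ...' header value."""
    if not authorization_header:
        return False
    scheme, _, encoded = authorization_header.partition(" ")
    if scheme.lower() != "basic":
        return False
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except ValueError:  # covers malformed base64 and invalid UTF-8
        return False
    user, sep, password = decoded.partition(":")
    if not sep:
        return False
    # compare_digest avoids leaking information through timing differences.
    user_ok = hmac.compare_digest(user.encode(), expected_user.encode())
    pass_ok = hmac.compare_digest(password.encode(), expected_password.encode())
    return user_ok and pass_ok


# Hypothetical credentials, for illustration only.
header = "Basic " + base64.b64encode(b"alice:s3cret").decode("ascii")
print(is_authorized(header, "alice", "s3cret"))  # prints True
print(is_authorized(None, "alice", "s3cret"))    # prints False
```

With a check like this in front of a directory, knowing its name is not enough to read it, whether or not the name ever appears in robots.txt.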

londrum

9:38 am on Aug 18, 2007 (gmt 0)




You could try leaving the directories out of the robots.txt file and including a robots meta tag on the individual pages instead. That way, if a robot finds the pages, it will still know not to index them.
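
The robots meta tag mentioned above looks like `<meta name="robots" content="noindex">` in a page's head. As a minimal sketch, the Python below uses the standard library's html.parser to check whether a page carries that tag, roughly the way a well-behaved crawler might. The class and function names and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from any <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for directive in attr.get("content", "").split(","):
                directive = directive.strip().lower()
                if directive:
                    self.directives.add(directive)


def page_blocks_indexing(html: str) -> bool:
    """True if the page asks robots not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    # "none" is shorthand for "noindex, nofollow".
    return "noindex" in parser.directives or "none" in parser.directives


# Hypothetical private page.
PAGE = ('<html><head><meta name="robots" content="noindex, nofollow">'
        '</head><body>private</body></html>')

print(page_blocks_indexing(PAGE))  # prints True
```

Note that a robot has to be able to fetch the page to see the tag, so this approach only works if the page is not also blocked in robots.txt.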