Forum Moderators: goodroi

Robots.txt code format.

born2run

1:55 pm on Jun 24, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



So my robots.txt file starts with:

User-agent: *
Crawl-delay: 10

So I want to disallow the Yandex bot. Is this the right code?

====
User-agent: *
Crawl-delay: 10

User-agent: YandexDisallow: / # blocks access to the whole site
====
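One way to check the one-line version above is to feed it to a parser. The sketch below uses Python's standard-library `urllib.robotparser`. That is not the parser Yandex runs, so treat the result as an approximation. The URL `https://example.com/page.html` is a placeholder. Because `User-agent: YandexDisallow: /` sits on a single line, this parser reads everything after the colon as a user-agent name and never sees a Disallow rule, so nothing gets blocked.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the question, with "Disallow" run onto the User-agent line.
ONE_LINE = """\
User-agent: *
Crawl-delay: 10

User-agent: YandexDisallow: / # blocks access to the whole site
"""

parser = RobotFileParser()
parser.parse(ONE_LINE.splitlines())

# "YandexDisallow: /" becomes an agent name, and a group with no Allow/Disallow
# lines is dropped, so YandexBot falls through to the "*" group, which has no
# Disallow rules at all.
print(parser.can_fetch("YandexBot", "https://example.com/page.html"))  # True
```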

Dimitri

2:16 pm on Jun 24, 2018 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



You are missing a line break after "Yandex", but I assume it's a typo.

[yandex.com...]

PS: Hmm, I might be wrong; apparently we can do what you did. I didn't know that, sorry.

Dimitri

2:18 pm on Jun 24, 2018 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



I am confusing myself, forget my comment.

born2run

2:22 pm on Jun 24, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hmm, it also seems from a Google search that the Yandex bot doesn't obey robots.txt...

born2run

2:27 pm on Jun 24, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes, I added the following to robots.txt:

User-agent: *
Crawl-delay: 10

User-agent: Yandex
Disallow: /
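A quick sanity check of this two-group file, again using Python's `urllib.robotparser` as a stand-in for a real crawler (placeholder URL, example agent names). It shows Yandex blocked, other bots allowed with a 10-second crawl delay, and one side effect worth knowing: once Yandex has its own group, this parser no longer applies the `*` group's `Crawl-delay` to it.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10

User-agent: Yandex
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for agent in ("YandexBot", "Googlebot"):
    # can_fetch() picks the first group whose name matches the agent,
    # falling back to the "*" group; crawl_delay() does the same lookup.
    print(agent,
          "allowed:", parser.can_fetch(agent, "https://example.com/page.html"),
          "crawl-delay:", parser.crawl_delay(agent))
# YandexBot allowed: False crawl-delay: None
# Googlebot allowed: True crawl-delay: 10
```

Since Yandex is fully disallowed here, the missing delay for it does not matter, but the same group-selection rule would matter if you later gave Yandex a group with only partial Disallow lines.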

Shiv Bhan Singh

10:53 am on Jul 10, 2018 (gmt 0)



If you want to block all the Yandex bots, then:

User-agent: Yandex
Disallow: /
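To see which agents a `User-agent: Yandex` group catches, here is a small check with Python's `urllib.robotparser`. The agent strings are example names chosen for illustration, and the matching rule shown is Python's (keep the name before any `/`, then test whether `yandex` appears in it, case-insensitively), not necessarily identical to how each Yandex crawler picks its group.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: Yandex
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Example agent strings (assumed for illustration).
for agent in ("YandexBot/3.0", "YandexImages/3.0", "Bingbot/2.0"):
    print(agent, parser.can_fetch(agent, "https://example.com/"))
# YandexBot/3.0 False
# YandexImages/3.0 False
# Bingbot/2.0 True
```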

lucy24

5:17 pm on Jul 10, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



it seems from google search that Yandex bot doesn't obey robots.txt
I think it had issues years ago, but currently it’s compliant.

:: detour to check ::

Oh, cripes, what are they doing in a roboted-out directory? (No, not Yandex, someone else I'd authorized.)

:: wandering off to address unexpected and unrelated issue ::

tangor

8:36 pm on Jul 10, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



@Shiv Bhan Singh

Happy to have you join WebmasterWorld! Others will greet you as well, with a link to the charter and all that happy stuff.

Brilliant in reminding all of us to check our robots.txt directives from time to time. :)

keyplyr

3:45 am on Jul 11, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hi Shiv Bhan Singh and welcome to WebmasterWorld [webmasterworld.com]

@born2run - Yandex does respect robots.txt, but does not support some directives that some other bots support.

This is why robots.txt fails at much of what it was originally intended to do. It never did become a standard, and it is interpreted differently by different robots.

BTW - Why would anyone want to disallow one of the top search engines in the world? Yandex has offices in Silicon Valley close to Google, Yahoo & Facebook. They are a major player and contribute more than just search. They can be highly beneficial to websites. Open a Yandex Webmaster Tools [webmaster.yandex.com] account to manage your presence at Yandex and find out more about the way they support robots.txt.

Murari Kumar

10:39 am on Oct 10, 2018 (gmt 0)

5+ Year Member



User-agent: *
Disallow: /framework/
Allow: /framework/admin-ajax.php

Can anyone verify this? I am using it for <snip>

[edited by: engine at 10:48 am (utc) on Oct 10, 2018]
[edit reason] Please see WebmasterWorld TOS [/edit]

keyplyr

10:56 am on Oct 10, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Sorry, that does not accomplish what you want.

When you Disallow: /framework/ you disallow everything under that path, so admin-ajax.php would also be disallowed.
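Python's standard-library `urllib.robotparser` is one parser that behaves this way: within a group it returns the first rule whose path is a prefix of the URL, in file order. The sketch below (placeholder domain, hypothetical agent name `ExampleBot`) runs the posted rules in both orders. With Disallow first, the Allow line is never reached; swapping the two lines changes the answer in this parser, which is exactly the kind of parser-to-parser difference that makes robots.txt unreliable.

```python
from urllib.robotparser import RobotFileParser

def can_fetch(robots_txt: str, url: str, agent: str = "ExampleBot") -> bool:
    """Parse robots_txt with the stdlib parser and test one URL for one agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

DISALLOW_FIRST = """\
User-agent: *
Disallow: /framework/
Allow: /framework/admin-ajax.php
"""

ALLOW_FIRST = """\
User-agent: *
Allow: /framework/admin-ajax.php
Disallow: /framework/
"""

URL = "https://example.com/framework/admin-ajax.php"
# First match wins: the Disallow prefix matches before the Allow line is checked.
print(can_fetch(DISALLOW_FIRST, URL))  # False
# With the Allow line first, it is the first match for this URL.
print(can_fetch(ALLOW_FIRST, URL))     # True
```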

not2easy

5:12 pm on Oct 10, 2018 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



That is true for many bots, but not for Google. If you use the code as posted it can work for Googlebot. Once you Disallow the /framework/ directory, not all robots follow the Allow: permission, but Google does - as long as the Allow: follows the Disallow: directive.
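Google documents its precedence as "most specific rule wins": among matching rules, the one with the longest path is used, and when an Allow and a Disallow are equally specific, the less restrictive Allow is used. RFC 9309 later wrote down essentially the same rule. Under that rule the order of the lines does not change the outcome. The `is_allowed` helper below is a hypothetical, simplified sketch of that precedence for a single user-agent group; it ignores `*` and `$` wildcards and percent-encoding.

```python
from typing import List, Tuple

Rule = Tuple[str, str]  # (directive, path prefix), e.g. ("Disallow", "/framework/")

def is_allowed(rules: List[Rule], path: str) -> bool:
    """Longest-match precedence for one user-agent group (no wildcard support)."""
    best_len = -1
    allowed = True  # no matching rule means the path may be crawled
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # an empty rule path matches nothing
        is_allow = directive.lower() == "allow"
        # A longer prefix wins; on a tie, Allow beats Disallow.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed

rules = [("Disallow", "/framework/"), ("Allow", "/framework/admin-ajax.php")]
print(is_allowed(rules, "/framework/admin-ajax.php"))        # True
print(is_allowed(rules[::-1], "/framework/admin-ajax.php"))  # True: order is irrelevant
print(is_allowed(rules, "/framework/other.php"))             # False
```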

keyplyr

7:24 pm on Oct 10, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Good point, thanks for reminding me of that irregularity. It's things like this that sway me to use other tactics and avoid robots.txt.