I have a pretty extensive setup to block spam: many IP blocks for those regions, plus the scripts you suggested quite a few years back, Incredibill. But I've wondered what value Baidu has, if any.
The Asian market may indeed become crucial for underwriting interests here in the West someday, and establishing your presence now may turn into an asset later.
As we know, Traditional Chinese is widely used in Hong Kong, while Simplified Chinese is mainly used in mainland China.
The Baidu search engine only supports Simplified Chinese characters and automatically converts my Traditional Chinese website into Simplified.
This is why Baidu treats my two sites as duplicate content, which they actually are not. Baidu is now dropping my Simplified site instead of the Traditional one, when I think it should be the other way around.
It has also stopped crawling the Simplified site and reduced its indexed pages to 4 results!
Since I want my Simplified site to be found by internet users in China, where Baidu is the main search engine, I'm thinking of blocking the Baidu spider from crawling my Traditional site so that it indexes the Simplified one instead.
My question is how to block the Baidu spider, since I've heard somewhere that it doesn't obey robots.txt.
Any suggestions? Please help.
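# Return 403 Forbidden to any request whose User-Agent contains
# "Baiduspider" or "BaiduImagespider" (unanchored, case-insensitive)
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Baidu(Image)?spider [NC]
RewriteRule .* - [F]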
I included the [NC] flag, which makes the match case-insensitive, since at least one of the Baidu bots identifies itself as "BaiDuSpider".
Other variants are:
BaiduImagespider+(+http://www.baidu.jp/search/s308.html)
Baiduspider+(+http://help.baidu.jp/system/05.html)
Baiduspider+(+http://www.baidu.com/search/spider.htm)
Baiduspider+(+http://www.baidu.com/search/spider_jp.html)
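If the Traditional and Simplified sites share the same server or document root, the block can be limited to the Traditional site only, so Baidu can still crawl the Simplified one. This is a sketch, assuming a hypothetical hostname traditional.example.com that you would replace with the real domain. Multiple RewriteCond lines are ANDed together, and the unanchored pattern covers all the variants listed above, including BaiduImagespider.

```apache
# Sketch only: "traditional.example.com" is a hypothetical hostname;
# replace it with the Traditional site's real domain.
RewriteEngine On
# Condition 1: the request is for the Traditional site ([NC] = case-insensitive)
RewriteCond %{HTTP_HOST} ^(www\.)?traditional\.example\.com$ [NC]
# Condition 2: the User-Agent contains Baiduspider or BaiduImagespider, anywhere in the string
RewriteCond %{HTTP_USER_AGENT} Baidu(Image)?spider [NC]
# When both conditions hold, return 403 Forbidden for every URL
RewriteRule .* - [F]
```

Because the rule matches on the User-Agent header, it only stops Baidu bots that identify themselves honestly. Unlike robots.txt, though, it does not depend on the spider choosing to obey.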