Qwantbot

search engine based in France

         

SumGuy

2:33 pm on Jan 23, 2024 (gmt 0)

5+ Year Member Top Contributors Of The Month



Technically this didn't hit my server because it was IP-blocked by my router, the IPs being 194.187.171.0/24. The IP host names are qwantbot-(ip-address).qwant.com. A little digging on that CIDR turns up AS199064 (qwant.com). Their total IP range could be 194.187.168.0/22. I'm not sure why I'm blocking it; it was probably part of a larger CIDR. I'll investigate this and will probably open it up, since it might be an OK/legit search engine.
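For anyone who wants to sanity-check how those two ranges relate, here is a minimal sketch using Python's standard ipaddress module. The sample address is made up for illustration, and the /22 is only a guess at the full allocation, as stated above.

```python
import ipaddress

blocked = ipaddress.ip_network("194.187.171.0/24")  # range blocked at the router
guessed = ipaddress.ip_network("194.187.168.0/22")  # possible full range (a guess)

# Is the blocked /24 contained in the guessed /22?
print(blocked.subnet_of(guessed))

# First and last address covered by the /22.
print(guessed[0], "-", guessed[-1])

# Hypothetical sample address, for illustration only.
sample = ipaddress.ip_address("194.187.171.10")
print(sample in blocked, sample in guessed)
```

The /22 runs from 194.187.168.0 through 194.187.171.255, so the blocked /24 is its top quarter.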

So I can't say what the UA is or their robots policy.

Has this bot been mentioned here before?

lucy24

6:20 pm on Jan 23, 2024 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



These guys?
194.187.171.abc - - [01/Apr/2023:07:25:36 -0700] "GET / HTTP/1.1" 403 3495 "-" "Mozilla/5.0 (compatible; Qwantify/1.0; +https://www.qwant.com/)"
I've had them blocked (or possibly: I've chosen not to poke holes) for years due to persistent non-compliance.

:: detour to At Home With The Robots ::

IP: 91.242.162, 194.187.170-171

At some time in the distant past it looks as if I authorized-and-ignored them. And then they started ignoring robots.txt--in the form of crawling disallowed directories--leading to a comprehensive ban which they also ignored.
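A rough way to confirm that kind of non-compliance from logs is to run the requested paths through Python's standard urllib.robotparser. This is only a sketch: the robots.txt rules and the logged paths below are hypothetical, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the directory names are made up for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def violations(ua_token, requested_paths):
    """Return the requested paths that robots.txt disallows for this agent."""
    return [p for p in requested_paths if not parser.can_fetch(ua_token, p)]

# Hypothetical paths pulled from a log for one crawler.
logged = ["/", "/private/notes.html", "/public/page.html", "/drafts/x.html"]
print(violations("Qwantify", logged))
```

Any path this returns is a request the crawler should not have made if it were honoring the file.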

In the past year I also find user-agents along the lines of Qwantify-dev and Qwantify-prod, so it looks like I need to update the page. Here's a recent one, suggesting they've also added a new IP:
162.19.101.abc - - [05/Jan/2024:07:55:45 -0800] "GET /hovercraft/duct_tape.html HTTP/2.0" 403 3354 "-" "Mozilla/5.0 (compatible; Qwantify-prod112/1.0; +https://help.qwant.com/bot/)"
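To pull the Qwantify variants out of a log in bulk, something like the following works as a sketch. It assumes the standard "combined" log format shown in the two lines above; the function name qwant_hit is my own, and the regex for the UA token is a guess based only on the variants seen in this thread.

```python
import re

# Apache/nginx "combined" log format, as in the two lines quoted above.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)
# Matches Qwantify/1.0, Qwantify/2.3w, Qwantify-prod112/1.0 and similar.
UA_TOKEN_RE = re.compile(r'Qwantify[\w-]*/[\w.]+')

def qwant_hit(line):
    """Return (ip, path, status, ua_token) for a Qwantify request, else None."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    token = UA_TOKEN_RE.search(m.group("ua"))
    if token is None:
        return None
    return m.group("ip"), m.group("path"), int(m.group("status")), token.group(0)

lines = [
    '194.187.171.abc - - [01/Apr/2023:07:25:36 -0700] "GET / HTTP/1.1" 403 3495 '
    '"-" "Mozilla/5.0 (compatible; Qwantify/1.0; +https://www.qwant.com/)"',
    '162.19.101.abc - - [05/Jan/2024:07:55:45 -0800] '
    '"GET /hovercraft/duct_tape.html HTTP/2.0" 403 3354 '
    '"-" "Mozilla/5.0 (compatible; Qwantify-prod112/1.0; +https://help.qwant.com/bot/)"',
]
for line in lines:
    print(qwant_hit(line))
```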

dstiles

9:36 am on Jan 24, 2024 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



They're ok by me. Not the best SE but reasonable.

lucy24

5:32 pm on Jan 24, 2024 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



On your site, does the crawler honor robots.txt? (The perpetual head-scratcher...)

I spent some time with logs yesterday and found that they added the 162.19.101 IP around the middle of last year. Another inexplicable feature: years ago they had a UA involving Qwantify/2.3w and later Qwantify/2.4w, which went away in mid-2022, while the one with Qwantify/1.0 has remained in use since 2021.

The “dev” and “prod” versions of the UA include a link to
https://help.qwant.com/bot/
which lists additional user-agents, several of which I have never seen in my life.

SumGuy

1:52 am on Jan 25, 2024 (gmt 0)

5+ Year Member Top Contributors Of The Month



Yeah, turns out I first saw Qwantbot back in 2016. It asked for robots.txt and grabbed my landing page, and I don't think anything else. I'm not sure why, but I IP-blocked it soon after that. It tried sporadic contact once every few months for a couple of years, then I lost track when I started IP-blocking in the router.

I've opened it back up now. If I notice it misbehaving or doing anything strange, I'll post it.

tangor

4:26 am on Jan 25, 2024 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"On your site, does the crawler honor robots.txt? (The perpetual head-scratcher...)"


Usually.

That said, I get the occasional "spoof" of Qwant from IPs outside the expected ranges.

I give robots.txt to everyone at all times. Those that honor it are not inconvenienced. Those that do the spoof, however, get the 403 boot.

In the last year's logs, that turns out to be 50+ instances.
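The policy described above (robots.txt for everyone, 403 for anything claiming to be Qwant from an unexpected IP) can be sketched roughly like this. The network list is an assumption: it treats the three-octet prefixes mentioned earlier in this thread as /24 blocks (with 194.187.170-171 folded into one /23), and the function name and sample addresses are hypothetical.

```python
import ipaddress

# Assumed ranges, built from the prefixes reported earlier in this thread.
EXPECTED_NETS = [
    ipaddress.ip_network(n)
    for n in ("91.242.162.0/24", "194.187.170.0/23", "162.19.101.0/24")
]

def status_for(ip, user_agent, path):
    """Return the HTTP status this policy would give a request."""
    if path == "/robots.txt":
        return 200  # robots.txt is served to everyone at all times
    if "qwant" not in user_agent.lower():
        return 200  # not claiming to be Qwant; this policy has no opinion
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in EXPECTED_NETS):
        return 200  # claims Qwant and comes from an expected range
    return 403      # claims Qwant from somewhere else: treat as a spoof

ua = "Mozilla/5.0 (compatible; Qwantify/1.0; +https://www.qwant.com/)"
# Hypothetical addresses: one inside the assumed ranges, one in a documentation range.
print(status_for("194.187.171.25", ua, "/"))
print(status_for("203.0.113.7", ua, "/"))
print(status_for("203.0.113.7", ua, "/robots.txt"))
```

A stricter check would also confirm the reverse DNS name, but that needs a live lookup and is left out here.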