LinkedInBot

new UA, new range

         

keyplyr

11:01 pm on Jun 20, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



UA: LinkedInBot/1.0 (compatible; Mozilla/5.0; Jakarta Commons-HttpClient/3.1 +http://www.linkedin.com)/1.0 (LinkedInBot; https://www.linkedin.com/; omni-crawler@linkedin.com)
Protocol: HTTP/1.1
Robots.txt: Yes
Host: AWS
54.64.0.0 - 54.71.255.255
54.64.0.0/13
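
For anyone who wants to double-check that a start-end range like the one above collapses to a single CIDR block, here is a minimal sketch using Python's standard `ipaddress` module. The helper name `range_to_cidrs` is made up for illustration and is not from any tool mentioned in this thread.

```python
import ipaddress

def range_to_cidrs(first, last):
    """Return the CIDR blocks that exactly cover first..last (inclusive)."""
    start = ipaddress.ip_address(first)
    end = ipaddress.ip_address(last)
    # summarize_address_range yields the minimal list of networks for the span
    return [str(net) for net in ipaddress.summarize_address_range(start, end)]

print(range_to_cidrs("54.64.0.0", "54.71.255.255"))  # ['54.64.0.0/13']
```

If the range did not fall on a clean boundary, the list would simply contain more than one block.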

Looks like a generic UA string being used for several web clients.

Previous discussion: [webmasterworld.com...]

lucy24

11:51 pm on Jun 20, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Heh, eleventh-hour change. When this version first showed its face, a few months ago, the UA string was:
LinkedInBot/1.0 (compatible; Mozilla/5.0; Jakarta Commons-HttpClient/3.1 +http://www.linkedin.com)/1.0 (LinkedInBot; https://www.linkedin.com/; wkrupa@linkedin.com)
This month they changed to omni-crawler. Possibly W. Krupa was getting too much irate mail.

I used to deny "Jakarta Commons" comprehensively, but it's now subsumed under a generic header-based lockout. (Haven't bothered to check, but LinkedInBot has got straight 403s to date so they must be missing something.) I've yet to see if they obey robots.txt; I only denied them after the most recent visit, last week I think. They've never happened to request anything in a roboted-out directory, so an unconditional Disallow is the only test. We Shall See.

UA strings that contain the same name more than once annoy me because a quickie global search without fancy RegEx turns up twice as many hits as there should be. Yes, yes, robot, I heard you the first time.
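
One way around the double-counting, sketched in Python: count matching log lines rather than matching occurrences. The function names and the sample log lines are hypothetical (only the UA string comes from this thread), so adjust to whatever your logs look like.

```python
def count_requests(log_lines, name):
    """Count lines that mention name, once per line however often it repeats."""
    return sum(1 for line in log_lines if name in line)

def count_occurrences(log_lines, name):
    """Count every occurrence of name, which is what a naive global search reports."""
    return sum(line.count(name) for line in log_lines)

# Hypothetical access-log lines; the first carries the UA string quoted above.
sample = [
    '203.0.113.5 "GET / HTTP/1.1" 403 "LinkedInBot/1.0 (compatible; Mozilla/5.0; '
    'Jakarta Commons-HttpClient/3.1 +http://www.linkedin.com)/1.0 '
    '(LinkedInBot; https://www.linkedin.com/; omni-crawler@linkedin.com)"',
    '203.0.113.9 "GET /page.html HTTP/1.1" 200 "Mozilla/5.0 (Windows NT 10.0)"',
]

print(count_requests(sample, "LinkedInBot"))     # 1
print(count_occurrences(sample, "LinkedInBot"))  # 2
```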

keyplyr

1:58 am on Aug 21, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



New UA and now has its own assigned crawl range:

UA: LinkedInBot/1.0 (compatible; Mozilla/5.0; Jakarta Commons-HttpClient/3.1 +http://www.linkedin.com)
Protocol: HTTP/1.1
Robots.txt: No
Host: 108-174-2-205.fwd.linkedin.com
108.174.0.0 - 108.174.15.255
108.174.0.0/20
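
A quick way to test whether a visiting IP falls inside that /20, sketched with Python's standard `ipaddress` module. The constant and function names are made up, and the sample address 108.174.2.205 is only an assumption read off the hostname above.

```python
import ipaddress

# The crawl range reported above
LINKEDIN_RANGE = ipaddress.ip_network("108.174.0.0/20")

def in_linkedin_range(ip):
    """True if ip lies within 108.174.0.0 - 108.174.15.255."""
    return ipaddress.ip_address(ip) in LINKEDIN_RANGE

print(in_linkedin_range("108.174.2.205"))  # True
print(in_linkedin_range("108.174.16.1"))   # False
```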

BTW - I block both Jakarta Commons & HttpClient with prejudice, allowing LinkedInBot and a few others. I may have missed the request for robots.txt. Don't really care, since robots.txt support for UAs other than SEs is pretty much archaic.
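
The block-with-exceptions rule described above might look something like this in Python. This is only a sketch of the logic, not anyone's actual server config: the names are invented, and the allow list holds just LinkedInBot because the "few others" are not named here.

```python
# Substrings that trigger a block (matched case-insensitively)
BLOCKED_TOKENS = ("jakarta commons", "httpclient")
# Bots let through even though their UA contains a blocked token
ALLOWED_BOTS = ("linkedinbot",)

def is_blocked(user_agent):
    """Return True if the UA should be denied under the rule sketched above."""
    ua = user_agent.lower()
    if any(bot in ua for bot in ALLOWED_BOTS):
        return False  # the exception wins over the block
    return any(token in ua for token in BLOCKED_TOKENS)

print(is_blocked("Jakarta Commons-HttpClient/3.1"))  # True
print(is_blocked("LinkedInBot/1.0 (compatible; Mozilla/5.0; "
                 "Jakarta Commons-HttpClient/3.1 +http://www.linkedin.com)"))  # False
```

Since a UA string is trivially spoofed, an exception like this is usually paired with an IP range check.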

dstiles

11:15 am on Aug 22, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I noted that crawl range back in 2013, keyplyr - with that UA. I have it enabled.