Examples for detecting bots


jpmmedia

4:12 pm on Dec 9, 2020 (gmt 0)

5+ Year Member Top Contributors Of The Month



Anyone have any good code examples for detecting bots? I created the code below, but was wondering if anyone had other good ideas.

It tracks unique users who interact with your website. In the example below I am using jQuery to send the HTTP GET request. The idea is to send an AJAX GET request the first time the mouse moves on the page. This helps keep bot traffic out of your visitor stats.


window.onload = function () {
  // Send one tracking request on the first mouse movement only
  document.body.onmousemove = function () {
    document.body.onmousemove = null; // detach so each page view sends at most one request
    $.ajax({ type: "GET", url: "track.php", timeout: 180000,
             data: { ajax: "true", req: "somepage" }, dataType: "html" });
  };
};
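The post doesn't show track.php. For illustration, here is a hypothetical server-side counterpart written for Node.js rather than PHP: it parses the query string the jQuery call sends (ajax=true&req=somepage) and counts each visitor once per page. The visitor key (IP address plus User-Agent) and the in-memory Map are assumptions made for this sketch; a real site would persist the data and might key on a session cookie instead.

```javascript
// Hypothetical Node.js stand-in for track.php (the real endpoint is PHP and not shown).
// Records at most one interaction per visitor per page.

// In-memory store: page name -> Set of visitor keys (use a database in practice)
const interactions = new Map();

function recordBeacon(queryString, ip, userAgent) {
  const params = new URLSearchParams(queryString);
  // Ignore anything that doesn't look like the request sent by the page script
  if (params.get("ajax") !== "true" || !params.has("req")) {
    return false;
  }
  const page = params.get("req");
  // Assumed visitor key: IP + User-Agent (visitors sharing one address and browser collide)
  const visitor = `${ip}|${userAgent}`;
  if (!interactions.has(page)) {
    interactions.set(page, new Set());
  }
  interactions.get(page).add(visitor);
  return true;
}

// Number of distinct visitors that triggered the beacon on a page
function uniqueInteractors(page) {
  const visitors = interactions.get(page);
  return visitors ? visitors.size : 0;
}
```

In a Node http server you could feed it new URL(req.url, "http://localhost").search, req.socket.remoteAddress and req.headers["user-agent"].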

NickMNS

4:43 pm on Dec 9, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Am I reading this correctly: each time the user moves the mouse it sends an AJAX request to the server? That would be something like 50 requests between when I stop typing and when I move the mouse from its current location to the submit button.

jpmmedia

4:52 pm on Dec 9, 2020 (gmt 0)

5+ Year Member Top Contributors Of The Month



Nope, it only sends one GET request per page visit. The line "document.body.onmousemove = undefined;" removes the handler from the page once the mouse has moved and the GET request has been sent.
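For reference, the DOM's addEventListener accepts a { once: true } option that removes the listener automatically after its first call, which has the same effect as clearing document.body.onmousemove by hand. A minimal sketch, using a plain EventTarget so it runs outside a browser; sendTrackingRequest is a hypothetical stand-in for the $.ajax call.

```javascript
// Count how many tracking requests would be sent
let requestsSent = 0;

// Hypothetical stand-in for the $.ajax GET to track.php
function sendTrackingRequest() {
  requestsSent += 1;
}

// In a browser, target would be document.body; a plain EventTarget lets this run anywhere
function armOneShotTracker(target) {
  // { once: true } removes the listener automatically after its first call
  target.addEventListener("mousemove", sendTrackingRequest, { once: true });
}

const page = new EventTarget();
armOneShotTracker(page);
page.dispatchEvent(new Event("mousemove"));
page.dispatchEvent(new Event("mousemove"));
page.dispatchEvent(new Event("mousemove"));
console.log(requestsSent); // 1
```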

NickMNS

4:58 pm on Dec 9, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



What happens with bots that don't execute JavaScript (which is most of them)?

jpmmedia

5:56 pm on Dec 9, 2020 (gmt 0)

5+ Year Member Top Contributors Of The Month



This detects unique visitors who move the mouse in the browser, so you can tell who is real and who is a bot. It seems to work on most scripted crawlers and bots. I'm sure you could write an external program to move the mouse on screen, but for most bots that just make HTTP requests it should work.
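On the server, telling who is real then comes down to comparing page views with tracking hits. A minimal sketch, assuming each page view and each track.php hit can be tied to the same visitor key (for example a session ID; the array shapes are hypothetical). Visitors without a hit are only "unconfirmed": they may be bots, or humans with JavaScript disabled or who never moved the mouse, as the replies point out.

```javascript
// Split visitors into those that triggered the interaction beacon and those that did not.
// pageViews and beaconHits are arrays of visitor keys (hypothetical shape, e.g. session IDs).
function classifyVisitors(pageViews, beaconHits) {
  const interacted = new Set(beaconHits);
  const result = { interacted: [], unconfirmed: [] };
  // Deduplicate page views so each visitor is classified once
  for (const visitor of new Set(pageViews)) {
    if (interacted.has(visitor)) {
      result.interacted.push(visitor);
    } else {
      // No beacon: a bot, a no-JS visitor, or someone who never moved the mouse
      result.unconfirmed.push(visitor);
    }
  }
  return result;
}

const r = classifyVisitors(["s1", "s2", "s2", "s3"], ["s2"]);
// r.interacted → ["s2"], r.unconfirmed → ["s1", "s3"]
```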

graeme_p

2:10 am on Dec 10, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



What about devices with touchscreens? No onmousemove, right?

jpmmedia

2:24 am on Dec 10, 2020 (gmt 0)

5+ Year Member Top Contributors Of The Month



Just add touchmove and touchstart event listeners for the mobile version ;)
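With touch support added, the tracker has to listen for several event types and detach all of them after the first one fires, since { once: true } only removes the listener that actually fired. A minimal sketch of that; the event list (keydown is included for keyboard-only visitors) and the send callback are illustrative assumptions, not part of the original code. It still won't fire for a visitor who only looks and leaves.

```javascript
// Send one tracking request on whichever interaction happens first
// (mouse, touch or keyboard), then remove every listener.
const DEFAULT_EVENTS = ["mousemove", "touchstart", "touchmove", "keydown"];

function trackFirstInteraction(target, send, eventTypes = DEFAULT_EVENTS) {
  function handler(event) {
    // { once: true } would only remove the listener that fired, so detach all of them
    for (const type of eventTypes) {
      target.removeEventListener(type, handler);
    }
    send(event.type);
  }
  for (const type of eventTypes) {
    // passive: the handler never calls preventDefault(), so scrolling isn't delayed
    target.addEventListener(type, handler, { passive: true });
  }
}
```

In the page you would pass document as the target and a callback that makes the $.ajax call from the original post.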

graeme_p

11:55 am on Dec 10, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



What if a user on mobile looks at the page and taps the back button? That would not trigger those events, AFAIK.

What about a bot emulating, or actually using, a browser that supports touch events? If it follows links by emulating clicks on them, it could trigger touchstart.

A quick search suggests that onmousemove can be fired by bots too: by Selenium's moveToElement, for example.
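One small hardening step that is not in the original code: browsers set event.isTrusted to false for events created and dispatched by page script (for example new MouseEvent(...) followed by dispatchEvent). A minimal sketch of a wrapper that ignores those. It does not help against the Selenium case above: as far as I know, WebDriver-driven input generally arrives as trusted events, so this only filters out the crudest fakes.

```javascript
// Wrap an event handler so it only runs for events the browser marks as trusted
// (real user input or browser-generated), not events dispatched by script.
function trustedOnly(handler) {
  return function (event) {
    if (!event.isTrusted) {
      return; // synthetic event from dispatchEvent(); ignore it
    }
    handler(event);
  };
}
```

In the page it would wrap the tracking handler, e.g. document.body.addEventListener("mousemove", trustedOnly(sendTracking), { once: true }), where sendTracking is a hypothetical name for the function that makes the $.ajax call. Note that with { once: true } an ignored synthetic event still consumes the listener.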

It's a clever idea, and I think it might work with some bots, but it will have both false positives and false negatives, so it's not reliable overall.

It may work well enough. I think it would work correctly against all four of the bots I have been involved with recently (wrote or consulted on), but it would be trivial to modify at least one of them, and easy to modify another, to fool it. On the other hand, if you are using it for stats rather than blocking, there is no reason for a bot author to bother.

JorgeV

12:51 pm on Dec 10, 2020 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



Hello,

Check whether the IP belongs to a range owned by a datacenter; if so, there's a 90% chance it is a bot. A 10% error rate is acceptable to me.

Edit: it all depends on your motivation for detecting bots. But in my case, I have never seen a single request from a datacenter IP range that came from a legitimate visitor (I allow known legitimate bots).
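The range check itself is simple. A minimal sketch for IPv4 only, assuming you already have a list of datacenter CIDR ranges (several cloud providers publish theirs). The two ranges below are reserved documentation blocks used as placeholders, not real datacenter ranges. The result is a yes/no signal; whether you block, flag, or just exclude from stats is the motivation question above.

```javascript
// Check whether an IPv4 address falls inside any CIDR range in a list,
// e.g. ranges published by hosting or cloud providers.

// Convert a dotted-quad IPv4 string to a number (0 .. 2^32 - 1), or null if invalid
function ipv4ToInt(ip) {
  const parts = String(ip).split(".");
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    value = value * 256 + octet;
  }
  return value;
}

// True if ip is inside cidr ("a.b.c.d/bits"); plain arithmetic avoids JS's signed 32-bit bitwise ops
function inCidr(ip, cidr) {
  const [base, bitsText] = cidr.split("/");
  if (!/^\d{1,2}$/.test(bitsText || "")) return false;
  const bits = Number(bitsText);
  const ipInt = ipv4ToInt(ip);
  const baseInt = ipv4ToInt(base);
  if (ipInt === null || baseInt === null || bits > 32) return false;
  const blockSize = 2 ** (32 - bits); // number of addresses in the range
  return Math.floor(ipInt / blockSize) === Math.floor(baseInt / blockSize);
}

function isDatacenterIp(ip, ranges) {
  return ranges.some((cidr) => inCidr(ip, cidr));
}

// Placeholder list: reserved documentation blocks (RFC 5737), NOT real datacenter ranges
const DATACENTER_RANGES = ["203.0.113.0/24", "198.51.100.0/24"];
```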

not2easy

2:44 pm on Dec 10, 2020 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I'm with Graeme and Jorge on this. Blocking datacenter IPs and logging headers doesn't depend on the visitor doing anything and gives a clear indication of bots vs. humans. Automatic and unobtrusive. Blocking unwanted behavior and UAs is something that can be done before they hit the page, and it saves on bandwidth.
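Logged headers can feed simple rules as well. A minimal sketch of a few header checks, assuming lowercase header names as Node's http module provides them. The specific rules (no User-Agent, a scripting-library User-Agent, no Accept or Accept-Language) are illustrative heuristics chosen for this example, and a determined bot can send browser-like headers, so treat the result as a signal to log or review rather than an automatic verdict.

```javascript
// Return a list of reasons a request's headers look automated.
// Header names are expected in lowercase (as Node's http module provides them).
// These rules are illustrative heuristics, not a definitive bot test.
function headerRedFlags(headers) {
  const reasons = [];
  const ua = headers["user-agent"] || "";
  if (ua === "") {
    reasons.push("missing User-Agent");
  } else if (/curl|wget|python-requests|go-http-client|libwww-perl/i.test(ua)) {
    reasons.push("scripting-library User-Agent");
  }
  // Mainstream browsers normally send these on page requests
  if (!headers["accept-language"]) {
    reasons.push("missing Accept-Language");
  }
  if (!headers["accept"]) {
    reasons.push("missing Accept");
  }
  return reasons;
}
```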

In cases of peculiar but 'human' activity, it is pretty simple to determine what's what with a look at the access logs. I just don't see the attraction of this user-activity tracking method, but as Jorge mentioned, there may be reasons for it. I do not think there is one single perfect method for all of us. We generally choose methods that fit our preferences.

Humans can visit with js disabled and can just go away if they see unknown scripts. Though there are hordes of users with no idea about anything they're doing online or how they got to where they are, nor even where they are. ;)

jpmmedia

4:05 pm on Dec 10, 2020 (gmt 0)

5+ Year Member Top Contributors Of The Month



Thanks everyone for your input.

JavaScript being disabled is a good point. My objective here was to find an easy way to distinguish human from bot interaction online. Cognitive identification like CAPTCHA works too, but requires more interaction.

graeme_p

5:19 pm on Dec 10, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



@not2easy how likely is it that blocking DC (datacenter) IPs will block human visitors using VPNs or proxies?

If your site is targeted by bots willing to spend a bit more, they will use residential proxies.

As you say, what works is going to vary.

"all depends of the motivation to detect bots. but in my case I never saw a single request from a Datacenter IP range, which is legitimate visitor."


How can you verify this?

@jpmmedia CAPTCHA is not user-friendly, and the common implementations infringe users' privacy and turn them into free labour for the provider (e.g. training image-recognition ML models). It's fine on things like forms, but there are often better options there (e.g. honeypot fields).
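A honeypot field is easy to check on the server. A minimal sketch, assuming the form includes an input named website (a hypothetical name) that is hidden from humans with CSS and given tabindex="-1" and autocomplete="off" to discourage keyboard focus and autofill. Naive form-filling bots tend to put something in every field, while a real submission sends it empty.

```javascript
// Server-side honeypot check for a form submission.
// formFields: parsed form body as a plain object of field name -> string value.
// "website" is a hypothetical honeypot field name, hidden from humans with CSS.
function looksLikeHoneypotBot(formFields, honeypotName = "website") {
  const value = formFields[honeypotName];
  if (value === undefined) {
    // The real form always submits the (empty) field, so its absence means
    // the request probably did not come from the rendered form
    return true;
  }
  // Humans never see the field, so any non-empty value is suspicious
  return String(value).trim() !== "";
}
```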

not2easy

7:43 pm on Dec 10, 2020 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



@graeme_p Two things I use to ensure that does not happen: one is that the headers are also monitored/logged, and the 403 page has some features to help ensure that what gets blocked should be blocked. Humans can always click through from the 403 page, and I would get a notification in addition to the headers.

I use different tools or techniques on different sites, not all the same. Historical traffic patterns help, too.

CAPTCHA is fine with me except Google reCAPTCHA: they make these teensy images where I can barely make out what's there. I believe they use the same resolution and size for mobile and desktop.

graeme_p

4:08 pm on Dec 12, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"One is that the headers are also monitored/logged and the 403 page has some features to help ensure that what gets blocked should be blocked. Humans can always click from the 403"


Sounds pretty robust. Nice.

"Captcha is fine with me except the Google ReCaptcha, they make these teensy images that I can barely make out what's there."


I loathe reCAPTCHA as a user. hCaptcha seems better. I have not seen many others in use. All are irritating for those of us who block JS (although I realise we are a minority, and a small one at that).

It is still a last resort for me, if something less intrusive does not work, or to show only to suspicious users.

not2easy

6:45 pm on Dec 12, 2020 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Oh, I prefer hCaptcha, but it causes PHP errors on a client's site and I have to keep deleting the logs, because working with the developer did not resolve the cookie/"headers already sent" issue. Not sure why it does that, but otherwise it is my favorite.

NickMNS

9:06 pm on Dec 12, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Regarding hCaptcha: it looked interesting, so I checked out the website, which was great, but then I read this:

"Other captchas are provided by Gigantic advertising companies. This means every bot they block may reduce their revenue."

Here the word "gigantic" is styled to look like Google. The issue is that they are insinuating that Google's reCAPTCHA lets some bots pass so that Google can make more money. Really? That is the marketing pitch: a conspiracy theory. Wow! No thanks, I'll keep using Google's product.

jpmmedia

5:29 pm on Dec 20, 2020 (gmt 0)

5+ Year Member Top Contributors Of The Month



Okay, to add a little cognitive element to it, I used the location of an object on the page to trigger the event: add an onmouseenter event to an "enter the site" image element... [w3schools.com...] or you could use onfocus, etc.

Now it's not just random, it is targeted to trigger the event.

jpmmedia

5:38 pm on Dec 20, 2020 (gmt 0)

5+ Year Member Top Contributors Of The Month



Using image maps works too:

<img src="planets.gif" width="145" height="126" alt="Planets" usemap="#planetmap">
<map name="planetmap">
<area shape="rect" coords="0,0,82,126" alt="Sun"
href="sun.htm" rel="alternate">
</map>
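For the rect area above, coords="0,0,82,126" gives two opposite corners, so the hot zone is the left 82 pixels of the 145-pixel-wide image at its full 126-pixel height. A small sketch of that hit test, which could be used to check client-side coordinates (e.g. event.offsetX and event.offsetY on the image) before sending the tracking request; treating the edges as inside is this sketch's choice, not something the thread specifies.

```javascript
// Parse an <area shape="rect"> coords string "x1,y1,x2,y2" (two opposite corners)
// into a normalized rectangle, or null if it is malformed.
function parseRectCoords(coords) {
  const nums = coords.split(",").map((n) => Number(n.trim()));
  if (nums.length !== 4 || nums.some(Number.isNaN)) return null;
  const [x1, y1, x2, y2] = nums;
  return {
    left: Math.min(x1, x2), right: Math.max(x1, x2),
    top: Math.min(y1, y2), bottom: Math.max(y1, y2),
  };
}

// True if (x, y), measured from the image's top-left corner, is inside the area.
// Edges count as inside in this sketch.
function pointInRectArea(x, y, coords) {
  const r = parseRectCoords(coords);
  if (r === null) return false;
  return x >= r.left && x <= r.right && y >= r.top && y <= r.bottom;
}
```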

[w3schools.com...]