robots and HTTP/2

         

lucy24

10:22 pm on Sep 22, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



On the ever-expanding list of Things To Keep An Eye On...

This past Sunday when I processed my site logs, I ran into a snag because nothing was working: input became output without, apparently, anything having been done to it. I was tearing my hair. Finally after much poring over the raw files I established that at some time on Friday morning, my host had enabled HTTP/2. (It is characteristic of them that they didn’t see fit to mention this fact; it’s probably buried in an obscure blog that nobody reads.) And my log-wrangling functions include a lot of patterns containing the string "HTTP/1\.[01]".
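For anyone whose log scripts hit the same snag, here is a minimal sketch of a pattern that tolerates both protocol generations. It assumes the protocol appears in the log as HTTP/1.0, HTTP/1.1 or HTTP/2.0 at the end of the quoted request line; the sample fragments and the names OLD_PROTOCOL, ANY_PROTOCOL and protocol_version are made up for illustration.

```python
import re

# The old-style pattern: matches HTTP/1.0 and HTTP/1.1 only.
OLD_PROTOCOL = re.compile(r'HTTP/1\.[01]"')

# A looser pattern: any HTTP/<digit>.<digit> that closes the quoted request.
ANY_PROTOCOL = re.compile(r'HTTP/(\d\.\d)"')

# Hypothetical log fragments, not copied from real logs.
old_style = '"GET /index.html HTTP/1.1" 200 5120'
new_style = '"GET /index.html HTTP/2.0" 200 5120'


def protocol_version(line):
    """Return the protocol version found in a log line, or None if absent."""
    match = ANY_PROTOCOL.search(line)
    return match.group(1) if match else None
```

With the old pattern, the HTTP/2.0 line simply fails to match, which is consistent with input passing through untouched.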

I don’t know exactly how this is handled at the server end; there’s nothing analogous to the “Upgrade-Insecure-Requests” header (meaning “I’m OK with being redirected to HTTPS”, as if any site really needs the user’s permission at this late date). Further poking around, including visiting my own test site and scanning favicon requests in recent days’ logs, suggests that the great majority of human requests use HTTP/2.

. . . but so far, most robots don’t. In fact, among the rare favicon requests still using HTTP/1.1 after the /2.0 change were, drumroll, those from DDG’s Faviconbot. The one immediate exception among established robots is, of all things, AhrefsBot.* I suppose that means it’s been pounding on the HTTP/2 door for a couple of years now, and is happy to finally get it.
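A rough sketch of how that kind of survey could be scripted: tally the logged protocol per user agent. It assumes logs in the usual Combined Log Format; the sample lines, IP addresses and agent names are invented, and protocols_by_agent is a hypothetical helper, not anything from my actual log-wrangling.

```python
import re
from collections import Counter, defaultdict

# Combined Log Format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>HTTP/\d\.\d)" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Invented sample lines for illustration only.
SAMPLE = [
    '203.0.113.5 - - [22/Sep/2020:10:00:00 +0000] "GET /favicon.ico HTTP/2.0" 200 318 "-" "ExampleBrowser/1.0"',
    '203.0.113.5 - - [22/Sep/2020:10:00:02 +0000] "GET /page.html HTTP/2.0" 200 9000 "-" "ExampleBrowser/1.0"',
    '203.0.113.9 - - [22/Sep/2020:10:00:05 +0000] "GET /favicon.ico HTTP/1.1" 200 318 "-" "ExampleBot/2.0"',
]


def protocols_by_agent(lines):
    """Count how often each user agent appears with each protocol version."""
    tally = defaultdict(Counter)
    for line in lines:
        match = LOG_LINE.match(line)
        if match:  # lines that do not fit the format are skipped silently
            tally[match.group("agent")][match.group("protocol")] += 1
    return tally
```

Calling protocols_by_agent(SAMPLE) gives one Counter per agent, so a robot that has switched over shows up as soon as its HTTP/2.0 count is non-zero.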

Poring over blocked HTTP/2 requests, I find
Http2Bot v1.0.7 (https://http2.pro/about?bot_v=1.0.7)
which can happily remain blocked, because the 403 response doesn't seem to affect the tool's ability to check if your site is HTTP/2 enabled.

I will be watching with interest to see other robots changing over.


* This discovery led to further log-investigation and the discovery that a whole lot of miscellaneous robots have been upgrading their UA version numbers behind my back ... but that’s another matter.

phranque

11:44 pm on Sep 22, 2020 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



this reminds me - something needed to be posted here about this:
Googlebot will start crawling over HTTP/2 in November 2020 [webmasterworld.com]

tangor

3:05 am on Sep 26, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Begs the question, will g eventually NOT CRAWL HTTP/1 ?

JorgeV

9:04 am on Sep 26, 2020 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



Hello,

> suggests that the great majority of human requests use HTTP/2

This has been the case for 4-5 years, since all major browsers turned on HTTP/2 support: [caniuse.com...]

From what I understand, your host switched your site to HTTPS too, they were not before? Because you mentioned redirects. If so, don't be surprised if your Google ranking fluctuates more than usual during the next few weeks. Also, check your pages to be sure that your internal links don't include absolute http URLs, to avoid unnecessary redirects.

lucy24

3:52 pm on Sep 26, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



> From what I understand, your host switched your site to HTTPS too, they were not before?
No, all sites were already HTTPS; the most recent change was almost a year ago.

It would be pretty nervy of a host to add HTTPS without consulting the site owner, since that involves an actual URL change. Maybe it's something you'd do for, say, WP clients who can't or won't touch their own htaccess and need everything done for them.

lucy24

5:45 pm on Sep 27, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Ka-ching! Found another one. facebookexternalhit is also using HTTP/2. So far I haven't seen their secondary UAs such as cortex and adreview, so I don't know if those are also /2. (And, mercifully, they seem to have dropped their annoying habit of sometimes crawling with no UA at all, so that isn't an issue.)

dstiles

2:41 pm on Sep 28, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



> dropped their annoying habit

On the other hand they are crawling sites whether they are in facebook or not.

SumGuy

12:57 pm on Sep 30, 2020 (gmt 0)

5+ Year Member Top Contributors Of The Month



> Begs the question, will g eventually NOT CRAWL HTTP/1 ?

Or maybe they will rank sites that don't serve HTTP/2 lower?

> suggests that the great majority of human requests use HTTP/2

So help me out here. Since I don't see the method (http1 or 2) show up in the logs, will a human browser operate normally (and render the site normally) when it makes an http2 request but the server can only respond with http1? Or will the transaction just end there, with me seeing a request for my landing page and then nothing else?

lucy24

4:44 pm on Sep 30, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I don't believe there is any perceptible difference whatsoever; it's all under the hood. Human browsers must have been sending in HTTP/2 requests for several years before my server became HTTP/2 enabled. During all that time, the request ended up being processed as HTTP/1.1, with nobody being any the wiser.

Someone else (phranque?) (why do I always assume it will be phranque?) may need to explain the technicalia. Apache's HTTP/2 page--by some weird oversight, apparently written by humans for humans--suggests getting more information from [http2-explained.haxx.se...] In particular there is a page on http2 Concepts [http2-explained.haxx.se] that includes the vexed question of secure-or-not. Turns out I was mistaken ... sort of. It is theoretically possible to make a non-secure HTTP2 request; it's just that most human browsers have chosen not to permit it:
> Today, no major browser supports http2 without TLS.

I can say with confidence that it does not work like making HTTPS requests to a server that is not yet listening on port 443, where there is a long delay and the request eventually just times out. If that had been the case, we would all have noticed. At most, there is one extra round-trip, where the request says “Yo! I’ll take this in HTTP2 if you’ve got it” and the server optionally says “Yup, we’ve got that, send in the request that way”. (It isn’t clear what the server says if it can’t do HTTP2, especially if it doesn’t understand the question in the first place.) This might make a perceptible difference if your human users live in remote areas with nothing but satellite--or, perish the thought, dial-up--connections.

tangor

10:23 am on Oct 1, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



> perish the thought, dial-up--connections.


We do need to be reminded that it still exists!

JorgeV

11:30 am on Oct 1, 2020 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



Hello,

If I'm not mistaken, HTTP/2 (as well as HTTP/3) is exclusively over a TLS connection.

A browser (or other client) sends a GET or POST request (could be HEAD too) and specifies the protocol. If the web server doesn't support this protocol, it returns "505 HTTP Version Not Supported". The client then resends the request with a lower protocol version.

As for HTTP/3, there is a shortcut. Let's say the client requests the page using the HTTP/2, 1.1 or 1.0 protocol. The web server sends the file using the requested protocol, but if the web server supports HTTP/3, it will add an extra header field, stating which HTTP/3 draft protocol(s) it supports. So the client receives the page and also knows that it can later talk HTTP/3 with the server, so the next requests will be made through HTTP/3 (if the client supports it, of course).
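To make the HTTP/3 hint concrete: the extra response header is, as far as I know, Alt-Svc (RFC 7838). Below is a minimal sketch of how a client might read it. The sample header value is invented, parse_alt_svc and http3_protocols are hypothetical helpers, and the parsing is deliberately naive (it assumes no commas or semicolons inside quoted values).

```python
# Invented example of an Alt-Svc response header value.
SAMPLE_ALT_SVC = 'h3-29=":443"; ma=86400, h3-27=":443"; ma=86400'


def parse_alt_svc(value):
    """Split an Alt-Svc value into (protocol, authority, params) tuples."""
    if value.strip() == "clear":
        return []  # "clear" means: forget any alternatives seen earlier
    services = []
    for entry in value.split(","):
        parts = [part.strip() for part in entry.split(";")]
        protocol, _, authority = parts[0].partition("=")
        params = {}
        for part in parts[1:]:
            if part:
                key, _, val = part.partition("=")
                params[key] = val
        services.append((protocol, authority.strip('"'), params))
    return services


def http3_protocols(value):
    """List the advertised protocol IDs that look like HTTP/3 (h3 or h3-NN)."""
    return [proto for proto, _, _ in parse_alt_svc(value) if proto.startswith("h3")]
```

A client that finds a protocol ID it supports in that list can try the advertised endpoint for later requests; one that does not just carries on with the protocol already in use.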

lucy24

4:12 pm on Oct 1, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



> If I'm not mistaken, HTTP/2 (as well as HTTP/3) is exclusively over a TLS connection.
If you’re talking about human browsers, yes. But at least in the case of HTTP/2 this was a choice made by the various browser developers, not an inherent property of the protocol. (See earlier post and linked information sites.)

> it will add an extra [response] header field, stating which HTTP/3 draft protocol(s) it supports
Useful to know.

It's interesting how this ended up being handled differently from http vs https. In theory, human browsers send in their http requests with the header Upgrade-Insecure-Requests (which simply has a value of 1 if present), and then the server can choose to issue a redirect to https if available. This may have been meaningful when https was in its infancy. But now #1 all browsers can handle https, #2 human requests routinely include this header even if the request was already made as https, and #3 every https-enabled site I've ever heard of issues the redirect to everyone, without waiting for permission. (And if your respectable robot can't handle it, well, that's the robot's problem.)
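The difference between the two policies can be sketched in a few lines. This is an illustration only, not how any particular server does it: redirect_target and its wait_for_permission flag are hypothetical, and headers is assumed to be a plain dict keyed by the exact header name.

```python
def redirect_target(scheme, host, path, headers, wait_for_permission=False):
    """Return the https URL to redirect to, or None to serve the request as-is."""
    if scheme == "https":
        return None  # already secure, nothing to upgrade
    asked = headers.get("Upgrade-Insecure-Requests") == "1"
    if wait_for_permission and not asked:
        return None  # "in theory" policy: only redirect clients that asked
    return f"https://{host}{path}"  # common practice: redirect everyone
```

With the default wait_for_permission=False the header is ignored entirely, which matches what every https-enabled site I know of actually does.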

SumGuy

11:29 pm on Oct 1, 2020 (gmt 0)

5+ Year Member Top Contributors Of The Month



> If the web server doesn't support this protocol, it returns "505 HTTP Version Not Supported"

I've never seen a 505 response code in my server's logs, and I know my HTTPS server (Abyss) does not "do" HTTP2. So either no browser has ever performed an HTTP2 request to my server, or the server does not generate a 505 log entry, or there is something else going on.

Also, when someone clicks on a search result that links to my site, is there anything in the link URL that predefines whether the request will be http1 or 2? I.e., if Google knows my site doesn't do HTTP2, then will it present only http1 URLs as search results?

(and again, Google and Bing hit my site constantly, and I never see a 505 code for those hits).

lucy24

12:32 am on Oct 2, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



> I've never seen a 505 response code in my server's logs, and I know my HTTPS server (Abyss) does not "do" HTTP2.
Same here. I never saw HTTP/2.0 in logs until two weeks ago when the server suddenly gained the ability to handle it.

:: quick detour to Apache docs [httpd.apache.org] ::

Darn, that’s unhelpful. I was hoping for details about the standard log element
"GET /blahblah HTTP/number"
but all it says is
%r = First line of request.
(First line? What's in the subsequent lines?)
I was wondering if the HTTP/number part is explicitly stated in the request, and what then happens if a human browser sends in an HTTP2 request to a server that is only able to handle HTTP1.x. In the case of shared hosting, one explanation would be that this part happens before the request ever reaches an individual vhost and/or site directory, so it's never logged until it has been replaced with a fresh HTTP1.x request. (My site logs do show 418 errors, even though these are generated by mod_security at the server level, so the request never reaches my userspace. Then again, that's a third-party mod and I don't really know how it works. Cursory research says it can be disabled either by directory or by VirtualHost, which gets me no further.) That wouldn't explain the lack of information if it’s your own server, though.

> So either no browser has ever performed an HTTP2 request to my server, or the server does not generate a 505 log entry, or there is something else going on.
Probably both B and C, because it is flat-out impossible that no human browser since 2015 has sent an HTTP2 request to your site. (Unless you’ve got a VERY niche audience.)

You wouldn't be seeing a 505 in any case on G requests, because as discussed in a parallel thread they won't be starting HTTP2 until next month.

Dimitri

5:31 pm on Oct 2, 2020 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



For those wondering.

This is during the TLS handshake, that a client and a server decide on using HTTP/2 or not. This is one of the reasons why HTTP/2 is used over a TLS connection, to avoid extra round trips.

When the client sends the ClientHello message (the first message in a handshake), it lists the TLS versions, cipher suites, compression methods, etc., as well as the application protocols it supports. The application protocols are, for example, HTTP/2 or HTTP/1.1.

The server replies with the ServerHello message, which tells the client how they'll talk to each other (which TLS protocol, cipher, application protocol, etc.).

So, no need for a 505 error message. If the server doesn't speak HTTP/2, it will simply inform the client that they'll use the HTTP/1.1 protocol and things go on.
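The negotiation described above is the TLS extension called ALPN, and it can be sketched with Python's standard ssl module. This is a minimal sketch, not a full client: choose_protocol is a hypothetical helper that only simulates the server's pick, and negotiated_protocol needs a live network connection to whatever host you pass in.

```python
import socket
import ssl


def choose_protocol(client_offers, server_supports):
    """Simulate the server's pick: its most preferred protocol the client offered.

    Returns None when there is no overlap or the server lists nothing; this
    sketch treats None as "carry on with HTTP/1.1".
    """
    for protocol in server_supports:
        if protocol in client_offers:
            return protocol
    return None


def negotiated_protocol(host, port=443, timeout=5.0):
    """Open a real TLS connection and report the ALPN protocol the server chose."""
    context = ssl.create_default_context()
    # "h2" and "http/1.1" are the registered ALPN IDs for HTTP/2 and HTTP/1.1.
    context.set_alpn_protocols(["h2", "http/1.1"])
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.selected_alpn_protocol()
```

Pointing negotiated_protocol at your own site shows which protocol a client would end up speaking; a server without HTTP/2 support would be expected to answer "http/1.1" or None rather than an error.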

lucy24

9:46 pm on Oct 2, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thanks, Dimitri. That's the part I didn't know about--and couldn't figure out the right questions to get the search engine to point me in the right direction.

Wonder if any robot in the past five years has tried to use HTTP2 over a non-secure connection?

Edit: It also means that SumGuy was right in the first place: nobody has made an HTTP2 request to his site ... because the site has told all human browsers up-front that it can’t be done. (I picture the elderly server saying “What is this HTTP/2 of which you speak? You’ll get your stuff in HTTP/1.1 and like it.”)

JorgeV

12:08 pm on Oct 3, 2020 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



Hello,

> This is during the TLS handshake, that a client and a server decide on using HTTP/2 or not

I had no idea about this! But it makes sense, indeed.