Forum Moderators: Robert Charlton & goodroi

Google discovers links with URL fragments - Safe to remove after page load?

         

Selen

5:33 am on Jan 21, 2019 (gmt 0)

10+ Year Member Top Contributors Of The Month



I work on a small site that has long content and uses URL fragments (regular, not Ajax, like example.com/page-one/#section2 etc.). These anchor links are useful for readers who can quickly jump to relevant sections on the page, but Google started to discover such links in Webmaster console. The person who owns the site worries that such links could be classified as "duplicate links" (or at least would waste "crawl budget"). The fact that Google lists / finds both pages like: example.com/page-one/ and example.com/page-one/#section2 is a little worrying.

So I was thinking of removing these URL fragments with a JavaScript snippet that would execute after page load, for example:

history.pushState("", document.title, window.location.pathname + window.location.search);

Would it be a good solution to make Google "forget" and not find such links with URL fragments? Would it be a safe way to do so?
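
For reference, here is a minimal sketch of what such a script could look like, written so the URL logic can be checked outside a browser. One detail worth noting: history.pushState adds a new entry to the session history (so the Back button would land on the same page again), while history.replaceState rewrites the current entry in place. The helper name pathWithoutFragment is made up for this example, and the sketch only changes what the visitor's address bar shows; browsers never send the fragment to the server in the first place.

```javascript
// Hypothetical helper (not from the original post): builds the URL
// without its fragment from a Location-like object.
function pathWithoutFragment(loc) {
  return loc.pathname + loc.search;
}

// Browser-only part, skipped when no window object exists (e.g. in Node).
if (typeof window !== "undefined") {
  window.addEventListener("load", function () {
    if (window.location.hash) {
      // replaceState rewrites the current history entry instead of
      // adding a new one, so the Back button keeps working as expected.
      history.replaceState(null, "", pathWithoutFragment(window.location));
    }
  });
}
```

Note that running this right after load also removes the fragment from the address bar for readers who arrived via a link to a specific section, so the URL they copy or bookmark no longer points to that section.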

dennisjensen

12:29 pm on Jan 21, 2019 (gmt 0)

5+ Year Member Top Contributors Of The Month



Hi,
I could be missing the big picture here, but the way you describe it, I'd make sure to canonicalize the parent page and then not worry about the URL fragments anymore.

Selen

4:02 pm on Jan 21, 2019 (gmt 0)

10+ Year Member Top Contributors Of The Month



Yes, the page has proper canonicalization (without the URL fragment), but in Google's console such pages ending with #section2 etc. appear as "Discovered / Not indexed yet", so Google finds them and somehow considers them to be separate URLs, even though technically they are not.

lucy24

5:40 pm on Jan 21, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"The fact that Google lists / finds both pages like: example.com/page-one/ and example.com/page-one/#section2 is a little worrying."
It shouldn't cause worry. It means that a human searcher is sent directly to the part of the page that contains the desired content. I see this regularly. (Not in logs, of course, but in analytics.) I first noticed it years ago with fragment anchors attached to headers, but now see it with almost any type of anchor, like individual images or even page numbers--that is, page numbers of books, where the anchor appears in the middle of nowhere, not attached to any element. (And then I'm left wondering what, say, is so especially interesting on page 239 of Directions to Servants, but it can't be helped.)

It doesn't mean Google is crawling the same page redundantly; a single crawl reveals all the anchors.

Selen

6:03 pm on Jan 21, 2019 (gmt 0)

10+ Year Member Top Contributors Of The Month



Yes, thanks - so you suggest that I don't implement this JS anchor removal at all? (in this case some people link to URLs containing the /#sectionX fragment so that's how Google finds them - I just hope Google doesn't consider such pages/links as duplicate or some kind of manipulation).

phranque

11:17 pm on Jan 21, 2019 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



don't remove the named anchors!

the technical term for this is document fragment identifier.
from Section 4. URI References of RFC 2396 [ietf.org]:
A URI reference may be absolute or relative, and may have additional information attached in the form of a fragment identifier. However, "the URI" that results from such a reference includes only the absolute URI after the fragment identifier (if any) is removed and after any relative URI is resolved to its absolute form.
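
To illustrate the RFC wording, here is a rough sketch using the standard URL class (available in browsers and in Node). The function names stripFragment and sameDocument are invented for this example and are not part of any API. It only shows that two references differing in their fragment resolve to the same URI once the fragment is removed; it does not claim to reproduce how Google compares URLs.

```javascript
// Hypothetical helper: returns the absolute URL with its fragment removed.
function stripFragment(href) {
  const url = new URL(href);
  url.hash = ""; // setting hash to "" drops the fragment and the "#"
  return url.href;
}

// Hypothetical helper: true when both references point at the same
// document once their fragments are ignored.
function sameDocument(a, b) {
  return stripFragment(a) === stripFragment(b);
}

console.log(stripFragment("https://example.com/page-one/#section2"));
console.log(sameDocument(
  "https://example.com/page-one/",
  "https://example.com/page-one/#section2"
));
```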


in google terms this means google will crawl the document and if a meaningful usage of fragment ids is found google may link to these document fragments in the search results.

more from the "Webmaster Central Blog":
Using named anchors to identify sections on your pages [webmasters.googleblog.com]

tangor

3:17 am on Jan 22, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



This is ordinary behavior, and in fact it is a benefit, since it helps match sections of your content to user queries. Google and the rest have been dealing with #fragments for many years. :)

No worries!