Google Will Eventually Stop Following Links on Noindex Pages

Forum Moderators: Robert Charlton & goodroi



engine

3:54 pm on Dec 21, 2017 (gmt 0)

In a Google Webmaster hangout, John Mueller said that Google will eventually stop following links from a page that has noindex on it.
You may have to rethink how noindex is used on certain pages.

[youtube.com...]

Robert Charlton

12:42 am on Jun 11, 2018 (gmt 0)

Just some quick thoughts...

...is it better to "index,follow" low-value pages to make sure the link juice from those pages is not lost OR it is better to "noindex" low-value pages and lose the link juice?

On numerous occasions, John Mueller has suggested that improving the content on those pages would be preferable to noindexing them.

I've also seen a Google product forum discussion, though, in which John suggested removing unnecessary pages.

To a degree, I'm thinking, these are consistent approaches. Google likes the site visitor to have a navigation experience that corresponds to what Googlebot is seeing. Google also doesn't like wasting resources on pages that aren't helping site visitors.

IMO, whether you use noindex would depend on these factors...
- the site-architecture, along with the function and location of the page
- why you are no-indexing the page
- and how important it is (in terms, say, of privacy) to keep any reference to the url out of the index.

Note that while robots.txt will keep Google from spidering the content of a page, it won't necessarily keep the url of a page out of the publicly viewable index.
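To illustrate the difference, here is a minimal Python sketch using the standard library's urllib.robotparser with a hypothetical robots.txt that disallows a /members/ path (the path and example.com URLs are assumptions for illustration). A disallow rule only answers "may this URL be fetched?"; it says nothing about indexing, and a noindex tag on a blocked page can never be read by a crawler that obeys the rule.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: disallow crawling of member profile pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /members/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def describe(url: str, user_agent: str = "Googlebot") -> str:
    """Explain what a crawler that obeys robots.txt can learn about a URL."""
    if parser.can_fetch(user_agent, url):
        # Fetchable: the crawler reads the page, including any robots meta tag.
        return "crawlable: a noindex on this page can be seen and honored"
    # Disallowed: the content (and any noindex in it) is never fetched,
    # but the bare URL can still be indexed if other pages link to it.
    return "blocked: noindex is never seen; URL may still appear in the index"


for url in ("https://www.example.com/articles/widgets",
            "https://www.example.com/members/12345"):
    print(url, "->", describe(url))
```

This is why disallowing a page and noindexing it are alternatives rather than a combination: adding both means the noindex is never seen.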

Pretty much from the beginning of Panda, when Google suggested using the robots noindex tag to "hide" low-quality pages, I was adamant that, for most pages, noindexing was only a temporary, band-aid approach.

Robert Charlton

1:28 am on Jun 11, 2018 (gmt 0)

Pagination is now best done via canonical tags, not noindex.

Shaddows, I think I understand where this is coming from, but I feel your comment by itself is dangerous to suggest without at least one caveat and perhaps more detail.

Noindex tags were used in pagination by some SEOs when they wanted the first page of an article or series of pages to be the one that ranked. I agree that this is generally not a good idea.

It is possible, though, to misread your comment, suggesting canonical tags in pagination, as a recommendation that all pages in a paginated series should be canonicalized to a root or central page. This would be a misuse of the canonical tag, as individual pages in a series aren't usually similar enough to a root page for the canonical to be applicable, etc etc. I don't think this is what you meant.

I'm thinking you might mean that self-canonicals are often used in the individual pages in a series in conjunction with rel="next" and rel="prev", but I won't presume to interpret your comment further than that.

I would appreciate more detailed thoughts.

Selen

1:57 am on Jun 11, 2018 (gmt 0)

Thank you. In particular, I mean profile pages just like the ones here on WebmasterWorld. Are such profile pages worth indexing or not? I see WW doesn't block them via noindex, BUT Google seems to have indexed only about 80 members (out of thousands). Why is that? - [google.com...]

So, using this specific example, is it better for long-term SEO gains to make these member profile pages 'noindex', OR better to allow them to be indexed and let Google decide whether to index them? I understand that allowing indexing = not losing PageRank... BUT... wouldn't it be risky as far as a Panda penalty is concerned?

I've read about it in numerous sources, and nobody seems to know which approach is better.

engine

8:59 am on Jun 11, 2018 (gmt 0)

Profile pages at WebmasterWorld are not treated the same way as ordinary noindex pages. We're talking ordinary pages that someone decides to noindex.

Selen

1:31 pm on Jun 11, 2018 (gmt 0)

I noticed some big sites noindex such profile pages while others do not. Everybody is confused about it, especially after Google's comments that noindex,follow = noindex,nofollow in the long term. So the question is whether Google treats such profile pages as low-quality (it seems it does; otherwise it would index them all).

IF Google isn't going to apply a Panda penalty because of such low-quality pages, then I tend to think it's better to allow them to be indexed, but the risk could still be there.

aristotle

6:53 pm on Jun 11, 2018 (gmt 0)

So what is the consensus on that: is it better to "index,follow" low-value pages to make sure the link juice from those pages is not lost OR it is better to "noindex" low-value pages and lose the link juice?

What I generally do is to only index the home page and the article pages. Other pages (like "privacy policy" and contact pages, for example) in my view shouldn't be indexed because they normally don't have much content that's relevant to the overall theme or subject of the site.

If you're worried about "lost link juice", you can easily minimize it by using prudent internal navigation; i.e. by limiting the number of internal links that point to the noindexed pages.

Selen

7:14 pm on Jun 11, 2018 (gmt 0)

Thank you. What about member profile pages (which mostly contain links to posts made by that member and no other relevant info, so they're 'thin content')? Should these pages be noindexed? But if they are noindexed, the 'link juice' is lost.

aristotle

9:16 pm on Jun 11, 2018 (gmt 0)

My sites don't have any profile pages, but if they did, I would noindex them. Otherwise they might hurt rankings and traffic because their content doesn't fit in well with the overall theme of the site.

As I said, you can avoid losing much link juice through prudent internal linking. One way would be to put links to all the members' profile pages together on a single page, and let those be the only internal links to those pages.

robzilla

9:55 pm on Jun 11, 2018 (gmt 0)

Talking specifically about profile pages on a forum, I would make them and, importantly, any links pointing to them available only to users who are logged in. They're only interesting to members anyway, so what's the point of having them crawled at all.

koan

6:39 am on Jun 12, 2018 (gmt 0)

Profile pages can be great even for unregistered visitors if they display a list of posts from a registered user that those visitors find interesting. Personally, I've noindexed my user profile pages, and it seemed to help my sites out of Panda penalties back in 2012-2013, after a penalized period. I was told this was a better solution than blocking them via robots.txt, because you can still benefit from the juice of external links to your user profile pages. Noindexing these pages was also a form of privacy protection for members who wrote a bio, since some people tend to divulge a bit too much. But if there are no benefits, I might as well put them in my robots.txt file now.

Selen

2:50 pm on Jun 12, 2018 (gmt 0)

If we assume the official Google position today is true then "noindex, follow" = "noindex, nofollow" = 404.
That means linking to a "noindex" page is like linking to a non-existing page with 404 response = losing all 'link juice.'

aristotle

3:38 pm on Jun 12, 2018 (gmt 0)

"noindex, nofollow" = 404

That's incorrect. The pages still exist and people can still visit them.

Selen

3:58 pm on Jun 12, 2018 (gmt 0)

A 404 header status is just a server status; it doesn't control whether pages can be visited or not. But for Google, I think the above is correct, i.e., a "noindex" page becomes like a non-existent page. And linking to such pages is like linking to non-existent pages = wasting PageRank.

Butes

3:59 pm on Jun 12, 2018 (gmt 0)

A 404 is a server response, so all search engines see is "404: resource not found".

What the statement said is that a noindex,follow will, in due time, be treated as noindex,nofollow, even without the robots directive being changed. The link "juice" (::shudder::) is not lost; it simply hits a dead end because of the nofollow.

Nofollows, on the other hand, are a whole other can of worms. When rel="nofollow" is added to a link element, the link equity continues to be divided among all the links on the page, but the nofollowed link's share is not passed through to the target URL and is therefore, for all intents and purposes, lost to the internet. With noindex,nofollow, link equity flows into the noindexed target URL, but because every link on it is treated as nofollow, the equity is divided among those links and passes on to nowhere.

Shaddows

4:00 pm on Jun 12, 2018 (gmt 0)

Pagination is now best done via canonical tags, not noindex.
...
It is possible, though, to misread your comment, suggesting canonical tags in pagination, as a recommendation that all pages in a paginated series should be canonicalized to a root or central page. This would be a misuse of the canonical tag, as individual pages in a series aren't usually similar enough to a root page for the canonical to be applicable, etc etc. I don't think this is what you meant.

I'm thinking you might mean that self-canonicals are often used in the individual pages in a series in conjunction with rel="next" and rel="prev", but I won't presume to interpret your comment further than that.

That's an old quote, and I can't remember what I was thinking. It wouldn't have been self-canonical though.

But you're right, rel="prev" and rel="next" is one correct way of doing it (we use this for results pages), or alternatively a canonical to an unpaginated version (I see this used for long articles). Either way, noindex on pagination is not ideal, and a canonical to page 1 is plain wrong.
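Both patterns can be sketched as a small Python helper that emits the <link> tags for one page of a series. The URL scheme (page 1 at the base URL, later pages at ?page=N) and the function name are hypothetical. In neither mode is page 1 used as the canonical for later pages. Note also that Google said in 2019, after this thread, that it no longer uses rel="prev"/"next" as an indexing signal, although the tags remain valid HTML.

```python
from typing import List, Optional


def pagination_head_tags(base_url: str, page: int, last_page: int,
                         view_all_url: Optional[str] = None) -> List[str]:
    """Return <link> tags for page `page` (1-based) of a paginated series.

    Hypothetical URL scheme: page 1 is base_url, page N > 1 is base_url?page=N.
    With view_all_url, every component page canonicalizes to the unpaginated
    version; otherwise each page is self-canonical and points to its
    neighbours with rel="prev"/rel="next".
    """
    def url_for(n: int) -> str:
        return base_url if n == 1 else f"{base_url}?page={n}"

    if view_all_url is not None:
        # Option 2: canonical to an unpaginated (view-all) version.
        return [f'<link rel="canonical" href="{view_all_url}">']

    # Option 1: self-canonical plus prev/next; never canonical to page 1.
    tags = [f'<link rel="canonical" href="{url_for(page)}">']
    if page > 1:
        tags.append(f'<link rel="prev" href="{url_for(page - 1)}">')
    if page < last_page:
        tags.append(f'<link rel="next" href="{url_for(page + 1)}">')
    return tags


for tag in pagination_head_tags("https://www.example.com/results", 2, 3):
    print(tag)
```

The view-all option only makes sense when an unpaginated page actually exists and contains everything the component pages do; otherwise the canonical would point at a page that isn't equivalent.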

Selen

4:04 pm on Jun 12, 2018 (gmt 0)

Yes, that's about how I understand it too, Butes. But to me, hitting a dead end is the same as losing it: the 'juice' won't flow anywhere but will end at the "noindex" page. If you link to such "noindex" pages, your 'juice' is eventually lost on them, without getting back to the other pages those "noindex" pages link to.

lucy24

4:22 pm on Jun 12, 2018 (gmt 0)

I see WW doesn't block them via noindex

In the specific case of the present site, this happens to be true. But a page may also be noindexed via the X-Robots-Tag header, which Google recognizes (don't know about other SEs). That wouldn't be visible in the page.
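For anyone unfamiliar with the header approach, here is a minimal sketch of sending noindex as an HTTP header rather than a meta tag, written as a plain Python WSGI app. The /members/ prefix is a hypothetical example; on a real site this is usually configured in the web server or application framework. Because the directive travels in the response headers, viewing the page source will not reveal it.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical: URL prefixes whose responses should carry a noindex header.
NOINDEX_PREFIXES = ("/members/",)

Headers = List[Tuple[str, str]]


def app(environ: dict,
        start_response: Callable[[str, Headers], object]) -> Iterable[bytes]:
    """Minimal WSGI app that adds X-Robots-Tag: noindex to selected paths."""
    path = environ.get("PATH_INFO", "/")
    headers: Headers = [("Content-Type", "text/html; charset=utf-8")]
    if path.startswith(NOINDEX_PREFIXES):
        # Same intent as <meta name="robots" content="noindex">, but sent
        # as a response header, so it never appears in the HTML itself.
        headers.append(("X-Robots-Tag", "noindex"))
    start_response("200 OK", headers)
    return [b"<!doctype html><title>Page</title><p>Hello</p>"]
```

To try it locally, wsgiref.simple_server.make_server("", 8000, app).serve_forever() will serve it, and the header can then be checked with curl -I against a /members/ URL.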

A page can be perfectly valid, but meaningless as an entry page. Whether you disallow it in robots.txt or attach a noindex to the page itself, G### will eventually complain about your decision. You can only do what, in your judgement, is most appropriate for the site.

Robert Charlton

9:43 am on Jun 19, 2018 (gmt 0)

Pagination is now best done via canonical tags, not noindex....

...That's an old quote, and I can't remember what I was thinking. It wouldn't have been self-canonical though. // But you're right, rel="prev" and rel="next" is one correct way of doing it (we use this for results pages), or alternatively canonical to an unpaginated version (I see this used for long articles). Either way, noindex on pagination is not ideal- and canonical to page1 is plain wrong

Shaddows, thanks for getting back on this... and I'm sorry to be taking so long to reply to you.

I'm guessing, from your response, that you may be thinking of using canonicals (rather than noindex) on faceted pages, i.e., filtered pages (like brands) that you want to be indexed. Particularly if these pages also require pagination, things can get very complicated very fast, and there's no one-size-fits-all approach.

Getting back to the topic of this thread... one of the issues with noindex pages is that Googlebot has to spend crawl budget fetching them just to read the robots noindex meta tag, so this is where you'd want to avoid noindex. I'm thinking that such noindexed filtered pages, which aren't worth displaying, are just what Google doesn't want to deal with unnecessarily often... so it would make sense for Google to stop following links from them, and let their influence essentially vanish, if you did decide to noindex them.

Robert Charlton

11:45 am on Dec 17, 2018 (gmt 0)

Just kicking this up to include a PS to the above... to make sure that the official word on this has been included in this discussion. It's possibly also in another thread here, but I'd rather be safe than incomplete in this case. As I read this, Google's problem in particular is with the vagueness of the noindex. (I'm also concerned with the ambiguity of where rel=canonical is used, as I noted above...)

In July 2018, Barry covered John Mueller's comments on mixing the two...

Google: Noindex & Rel=Canonical Should Not Be Mixed
Jul 20, 2018 - by Barry Schwartz
https://www.seroundtable.com/google-noindex-rel-canonical-confusion-26079.html

There is an outstanding explanation from Google's John Mueller about the differences around the noindex and rel=canonical signals and why they should not be mixed. In short, Google wants clear signals that are consistent and straightforward....

John Mueller quoted by Barry. (Note that robots.txt also sneaks in here)....

The general rule of thumb is that signals get forwarded & combined with canonicalization. When Google sees two URLs from your site, they look the same, and you tell us your preference clearly, we'll try to combine them and treat them as one (usually stronger) URL instead of separate ones....

On the other hand, noindex (alone) & robots.txt disallow (in general) are not clear signs for canonicalization. Just having a noindex on a page doesn't tell us that you want to have it combined with something else, and that signals should be forwarded. A robots.txt disallow is even trickier, we don't even know if the page matches anything else on your site, so we couldn't even use it for canonicalization if we wanted to.

This is also where the guide that you shouldn't mix noindex & rel=canonical comes from: they're very contradictory pieces of information for us. We'll generally pick the rel=canonical and use that over the noindex, but any time you rely on interpretation by a computer script, you reduce the weight of your input :) (and SEO is to a large part all about telling computer scripts your preferences).

See Barry's article for full text.
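Following Mueller's advice not to mix the two signals, a simple site audit could flag pages that carry both a robots noindex and a rel=canonical. Here is a minimal sketch using Python's standard html.parser; the class and function names are hypothetical, and it only inspects the HTML, so a noindex sent via the X-Robots-Tag header would need a separate check.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class RobotsSignalParser(HTMLParser):
    """Collect robots meta directives and the rel=canonical href from HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: List[str] = []
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag: str,
                        attrs: List[Tuple[str, Optional[str]]]) -> None:
        # html.parser lowercases tag and attribute names; values may be None.
        attr = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attr.get("name", "").lower() in ("robots", "googlebot"):
            self.directives += [d.strip().lower()
                                for d in attr.get("content", "").split(",")
                                if d.strip()]
        elif tag == "link" and "canonical" in attr.get("rel", "").lower().split():
            self.canonical = attr.get("href")


def has_mixed_signals(html: str) -> bool:
    """True if the page says both noindex (or 'none') and rel=canonical."""
    parser = RobotsSignalParser()
    parser.feed(html)
    says_noindex = bool({"noindex", "none"} & set(parser.directives))
    return says_noindex and parser.canonical is not None


page = ('<head><meta name="robots" content="noindex, follow">'
        '<link rel="canonical" href="https://www.example.com/widgets"></head>')
print(has_mixed_signals(page))  # True
```

Per the quote above, Google will generally pick the rel=canonical over the noindex in such a case, so a flagged page is one where the site owner should decide which of the two signals they actually mean.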

aristotle

4:07 pm on Dec 17, 2018 (gmt 0)

If I understand this, judicious use of a canonical tag could allow you to create a new version of an old article which would enjoy the "combined strength" of both articles to boost it higher in the rankings.

So instead of deleting or no-indexing the old version of the article, you could keep it on the site and indirectly benefit from it.

This 50-message thread spans 2 pages.