Google Updates PageRank Patent

goodroi

12:57 pm on Apr 25, 2018 (gmt 0)

Bill Slawski, the King of SEO Patent Perusal, posted a great read about Google updating PageRank: [seobythesea.com...]

The information is complicated & not easy to read but it is very important. When a company like Google files this, it is often because they have changed how they are doing business. This modified approach to PageRank seems to make it a bit harder to manipulate link power.

If you want to better understand how Google is thinking about link power, you should read this.

engine

4:31 pm on Apr 25, 2018 (gmt 0)

Google has updated its patent for PageRank, reports Bill Slawski. [seobythesea.com]

Here's the patent dated April 24, 2018
[patft.uspto.gov...]

What is claimed is:

1. A method, comprising: obtaining data identifying a set of pages to be ranked, wherein each page in the set of pages is connected to at least one other page in the set of pages by a page link; obtaining data identifying a set of n seed pages that each include at least one outgoing link to a page in the set of pages, wherein n is greater than one; accessing respective lengths assigned to one or more of the page links and one or more of the outgoing links; and for each page in the set of pages: identifying a kth-closest seed page to the page according to the respective lengths, wherein k is greater than one and less than n, determining a shortest distance from the kth-closest seed page to the page; and determining a ranking score for the page based on the determined shortest distance, wherein the ranking score is a measure of a relative quality of the page relative to other pages in the set of pages.

One possible variation of PageRank that would reduce the effect of these techniques is to select a few "trusted" pages (also referred to as the seed pages) and discovers other pages which are likely to be good by following the links from the trusted pages.
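For anyone who finds the claim language hard to parse, the steps in claim 1 can be sketched roughly as below. This is only an illustration under stated assumptions, not Google's implementation: the graph, the page names, the link lengths, and the `ranking_score` formula are all made up for the example (the claim only says the score is "based on" the distance), and Dijkstra's algorithm is simply one standard way to get shortest distances.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical link graph: page -> list of (linked page, link length).
Graph = Dict[str, List[Tuple[str, float]]]


def shortest_distances(graph: Graph, source: str) -> Dict[str, float]:
    """Dijkstra's algorithm: shortest link distance from source to each reachable page."""
    dist: Dict[str, float] = {source: 0.0}
    heap: List[Tuple[float, str]] = [(0.0, source)]
    while heap:
        d, page = heapq.heappop(heap)
        if d > dist[page]:
            continue  # stale heap entry, a shorter path was already found
        for target, length in graph.get(page, []):
            candidate = d + length
            if candidate < dist.get(target, float("inf")):
                dist[target] = candidate
                heapq.heappush(heap, (candidate, target))
    return dist


def kth_closest_seed_distance(
    graph: Graph, seeds: List[str], pages: List[str], k: int
) -> Dict[str, float]:
    """For each page, the shortest distance from its kth-closest seed page."""
    n = len(seeds)
    if not 1 < k < n:
        raise ValueError("claim 1 requires k greater than one and less than n")
    from_seed = [shortest_distances(graph, seed) for seed in seeds]
    result: Dict[str, float] = {}
    for page in pages:
        # Unreachable from a seed counts as infinitely far from that seed.
        distances = sorted(d.get(page, float("inf")) for d in from_seed)
        result[page] = distances[k - 1]
    return result


def ranking_score(distance: float) -> float:
    """Assumed scoring only: a shorter distance gives a higher score."""
    return 1.0 / (1.0 + distance)


# Made-up example: three seed pages (A, B, C) and three pages to rank.
graph: Graph = {
    "A": [("x", 1.0), ("y", 2.0)],
    "B": [("x", 2.0), ("y", 1.0)],
    "C": [("x", 4.0)],
    "x": [("y", 1.0), ("z", 1.0)],
}
distances = kth_closest_seed_distance(graph, ["A", "B", "C"], ["x", "y", "z"], k=2)
print(distances)  # {'x': 2.0, 'y': 2.0, 'z': 3.0}
```

Note that with k=2 a page needs to be close to at least two seeds to get a short distance. In this toy graph, one very short link from a single seed would not be enough on its own.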

martinibuster

6:30 pm on Apr 25, 2018 (gmt 0)

My take on it:

How Does this Affect Link Building?
This changes the game for link building. Actually, the game has been changed for a while now. The algorithms described here are closely tied to what we know about the Penguin Algorithm.

This affects link building because it is calculating link distances between an authoritative, spam-free site and the sites it links to. These links are also divided by topic.

For link building, the ideal link is going to be a link from a site that is as close as possible to the most authoritative and high quality site in that niche. The difference is that the high quality sites are different for every niche. This changes what is meant by an authority site.

Google's Patent Does Not Use the Word Trust
Google's patent doesn't even use the word trust. And they are not using a thing called Trust to calculate PageRank.

There are more insights in the rest of the article about how it affects link building and SEO. [searchenginejournal.com]

keyplyr

1:25 am on Apr 26, 2018 (gmt 0)

Patents, ToS, Privacy Policies... all in transition around the web in preparation for the EU's upcoming GDPR on/about May 25.

McMohan

10:36 am on Apr 26, 2018 (gmt 0)

The difference is that the high quality sites are different for every niche.

Would I be wrong if I said the "high quality" sites in a niche are those sites that rank consistently for keywords in that niche, short or long-tailed? And the only way to win the SEO game through link building is to somehow get linked by such sites, nofollow or not?

martinibuster

5:50 pm on Apr 26, 2018 (gmt 0)

Would I be wrong if I said the "high quality" sites in a niche are those sites that rank consistently for keywords in that niche, short or long-tailed?

Yes, those conclusions are wrong.

1. Measuring how spam free and full of backlinks a site is produces a metric of how spam free and full of backlinks the site is.

2. Those qualities of being spam free and full of backlinks are not what makes a site popular with users.

3. Google ranks sites that are popular with users. <-- That is what relevance means.

4. All this patent does is create a reduced link graph, a starting point to begin the ranking calculations.

5. In order to rank you have to be in the reduced link graph.

6. This patent is important because you can't rank if you're not in the reduced link graph.

So yeah, those are six reasons why those conclusions are wrong.

Should You Care About this Patent?
You don't need to know this patent to rank if you accidentally follow best practices for promoting a site. But I value knowledge and don't like the idea of throwing the dice and hoping I'm doing things right. I like to know I researched a project to the best of my ability before embarking on that journey.

McMohan

9:21 am on Apr 27, 2018 (gmt 0)

All this patent does is create a reduced link graph, a starting point to begin the ranking calculations.

In order to rank you have to be in the reduced link graph.

How does a new site (or a site not yet part of this coterie) gatecrash into this reduced link graph? I would assume it must get linked by sites within this link graph (besides relevancy and popularity among users). The shorter the link distance from niche-authority sites to this wannabe site, the more closely knit it is within the group? Or am I missing something?

I would assume the set of sites that belong in the link graph is quite fluid and dynamic? Is there a way to know which sites Google considers as niche authority sites or part of the link-graph? I thought the obvious answer is the sites that rank consistently high for keywords from the niche.

tangor

1:45 pm on Apr 27, 2018 (gmt 0)

Among other reasons for updating the patent is the automatic 17 year exclusive use.

Narrowing the link graph might make it difficult for startups.

Keywords are again reduced.

I suspect that cash sent g's way might make all the above disappear.

martinibuster

4:44 pm on Apr 27, 2018 (gmt 0)

Or am I missing something?

Yes.

martinibuster

8:46 pm on Apr 29, 2018 (gmt 0)

Being spam free or authoritative about an entire niche or slice of a niche topic does not make a site useful for answering questions across a wide range of user intents within that niche slice. It's simply spam free and trustworthy for being spam free. Thus it is useful for being a seed site.

Links continue to be a ranking factor. But it's just one of hundreds of ranking factors.

So when it comes to new sites, they can still compete for long tail queries. That's been the case for a long time.

There are further permutations. But that should be enough information to clear up your questions.

Good luck,
;)

Roger Montti

Marketing Guy

9:50 am on Apr 30, 2018 (gmt 0)

Could this system be what has been traditionally observed as the 'sandbox effect'?

A new site launches and does all the things right (entirely subjective of course, but let's assume decent content and non spammy links) - but still doesn't rank particularly well. Sandbox theories assume that this is something to do with the age of the site and at some point over the next 12 months a threshold of sorts is reached. After this, rankings 'kick in' as they normally would and it's business as usual.

Instead, consider the "distance from seed" factor for new sites. Content is irrelevant (well, not quite - but in most cases) and easily acquirable links aren't likely to be "near-seed". However, over the course of the 12 months from launch, a regular website undertaking regular business activities will at some point acquire natural links from "seed" or "near-seed" sites (i.e. the hub network within a particular vertical). At that point, PR is refactored / switched on / whatever and rankings improve in line with what we'd generally expect.

It would be an effective test of how legitimate a new site was (in line with the original idea of PR) and from the outside the effects would be observed as being sandbox-like. It would also explain conflicting or contradictory reports of the sandbox effect.

That would of course mean this particular algorithm or some iteration of it has been live for many years (circa 2004?). Nissan Hajaj, who the patent is attributed to, joined Google in 2004.

And we all recall the first piece of advice for "sandbox victims" - get links from an authoritative site... ;)

McMohan

10:57 am on Apr 30, 2018 (gmt 0)

Being spam free or authoritative about an entire niche or slice of a niche topic does not make a site useful for answering questions across a wide range of user intents within that niche slice. It's simply spam free and trustworthy for being spam free. Thus it is useful for being a seed site.

Thanks for explaining. Now I have more questions :-)

1. Seed pages - I am sure Google wouldn't handpick seed pages for every niche. For a site to be a seed site, it must be (a) spam-free, (b) trustworthy and (c) relevant, though in the patent Google cleverly avoids using "Trustworthy" and replaces it with "Seed". If a site has all three of these qualities (spam-free, trustworthy and relevant), what prevents such pages from ranking for keywords in the niche, and not just rank, but rank high and wide? In other words, why can't we assume the sites that are consistently ranking high and wide for keywords in the niche as actually the seed sites?

2. Another interesting point from the patent -

...and from among the n seed pages, a kth-closest seed page to a first web page in the plurality of web pages according to the lengths of the links, wherein k is greater than one and less than n;...

n --> Number of seed pages (integer greater than one)
k --> denotes proximity of the page from the "Plurality of pages" to a seed page

And k lies between 1 and n. This is interesting. The maximum value for k can be n. In other words, a page having its k=n can actually be a seed page itself, provided it is spam-free and relevant. That is, it is linked directly by all the seed pages for the niche.
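One hedged note on the k and n question: read literally, the claim language quoted above says k is greater than one and less than n, so k looks like a rank (second-closest, third-closest, and so on) that stays strictly below n, and is not itself a distance. A minimal sketch of that reading, with made-up distances and a hypothetical helper name:

```python
from typing import Sequence


def kth_closest(distances_to_seeds: Sequence[float], k: int) -> float:
    """Distance from the kth-closest seed, given one page's distance from each seed."""
    n = len(distances_to_seeds)
    if not 1 < k < n:
        raise ValueError("the quoted claim requires k greater than one and less than n")
    return sorted(distances_to_seeds)[k - 1]


# Made-up distances from n = 4 seed pages to a single page.
example = [3.0, 1.0, 7.0, 2.0]
second_closest = kth_closest(example, 2)  # 2.0
third_closest = kth_closest(example, 3)   # 3.0
```

Under this reading, k=1 and k=n are both rejected, so with four seeds the only allowed values are 2 and 3.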

does not make a site useful for answering questions across a wide range of user intents within that niche slice

Absolutely. A page can't conceivably answer all questions, no matter how micro that niche is. That requires Google to assess the quality of the plurality of webpages, so that when a query is typed that the seed-pages cannot sufficiently answer, it looks in the plurality of webpages and fishes out pages that are 1. Relevant, in that they answer the question sufficiently 2. Trustworthy, determined by the kth distance from seed pages. If ever a question is typed in Google that the seed pages themselves answer sufficiently, you will find the plurality of pages always trailing the seed-pages. Makes sense?

McMohan

11:11 am on Apr 30, 2018 (gmt 0)

over the course of the 12 months from launch, a regular website undertaking regular business activities will at some point acquire natural links from "seed" or "near-seed" sites

Makes sense, though it is not strictly a function of time, but of the rate of link acquisition from seed-sites or from "seeded" sites. A newly launched official Commonwealth Games website, for instance, might acquire links from seed pages on day one and start ranking from the first week.

martinibuster

3:49 pm on Apr 30, 2018 (gmt 0)

...why can't we assume the sites that are consistently ranking high and wide for keywords in the niche as actually the seed sites?

The criteria for being chosen as a seed set are vastly different from the criteria for being chosen to rank in the SERPs. Two different criteria.

The seed sets are only a starting point. Many more ranking factors and decisions follow on and happen elsewhere.

Conceivably some of the seed sites could rank well but don't count on it. It would be reckless to make business decisions based on guessing such a thing when it's well understood that ranking factors have not been applied to the seed set.

McMohan

11:59 pm on Apr 30, 2018 (gmt 0)

The criteria for being chosen as a seed set are vastly different from the criteria for being chosen to rank in the SERPs. Two different criteria.

Agreed, my surmise is based on deduction and reasoning and not on any written evidence.

But I find the reasoning to the contrary to be less convincing and logical. That is, what other factors(spam-free, trust, relevancy) make a page a seed page. And if these three are the only factors (or rather the main factors), then what makes pages from the crowd that are tested against the seed pages rank ahead of seed pages?

Until I get a convincing answer, I will hold on to my surmise.

martinibuster

6:01 am on May 1, 2018 (gmt 0)

...what other factors(spam-free, trust, relevancy) make a page a seed page.

That's a reasonable question. Let's speculate. :)

Clean and on topic Inbound links
Clean and on topic Outbound links

However, if a site like NYTimes is considered a part of the seed set, then one has to consider if the seed set is at the page level rather than the site level.

The NYTimes can't be a niche specific site level seed site except for news. But if you set the seeds to the page level then the NYTimes can get back into contention.

McMohan

8:53 am on May 1, 2018 (gmt 0)

That's a reasonable question. Let's speculate. :)
Clean and on topic Inbound links
Clean and on topic Outbound links

Excellent! :-) But, wouldn't "Clean and on topic Inbound links" and "Clean and on topic Outbound links" come under the umbrella of "Spam-free" and "Niche-relevance"? Clean = Spam-free, and On Topic = Relevance?

Nowhere in the patent is "seed site" mentioned, as far as I could see. It is only "Page". So, taking these words in the patent at face value, we can deduce Google treats each seed page in isolation. But that is only concerning the relevancy part. If a page on NYT details in good measure a particular niche, it qualifies on "Relevance" and "Spam-free" criteria. It is still not a seed page, since it is not "trusted" in isolation. But a site like NYT is likely to have citations from seed-pages from both within and outside.

Now, if a page on NYT qualifies on all these 3 criteria (Trust, Relevance, Spam-free), why would Google rank any other non-seed page ahead of it?

martinibuster

12:44 pm on May 1, 2018 (gmt 0)

why would Google rank any other non-seed page ahead of it?

If the page was about products and the user intent is shopping, then a shopping page would outrank it.

tangor

4:29 pm on May 1, 2018 (gmt 0)

Of the three criteria, the one lacking quality is "spam-free". What is spam for one might be ham for another. Once again, the black box remains mysterious.

McMohan

5:28 pm on May 1, 2018 (gmt 0)

If the page was about products and the user intent is shopping, then a shopping page would outrank it.

That may be so if you start with the premise that seed pages need not be the pages that rank high for a given query. That, somehow Google designates a pool of pages as seed pages for every niche and sub-niche.

But, if the query has shopping intent and the seed pages do not necessarily satisfy this intent, IMO those pages fail the Relevancy criterion? That, what we assumed to be seed pages are not necessarily seed pages, but the pages that rank are?