Duplicate Content demystified at Webmaster Central Blog

Forum Moderators: Robert Charlton & goodroi


travelin cat

7:57 pm on Sep 12, 2008 (gmt 0)

This is a very helpful discussion straight from Google:

Demystifying the "duplicate content penalty" [googlewebmastercentral.blogspot.com]

Robert Charlton

8:18 pm on Sep 12, 2008 (gmt 0)

The above discussion links to an article in the Webmaster Help Center that is also worth noting here...
Duplicate content [google.com]

For much deeper discussion about many specific issues involving duplicate content, I'd suggest looking at the Duplicate Content discussions in our Hot Topics [webmasterworld.com] section, pinned to the top of the Google Search forum home page.

g1smd

8:27 pm on Sep 12, 2008 (gmt 0)

No matter how many times you say it, you'll never get through to the "alt tag" (sic) and "dofollow" (sic) brigade.

steveb

8:29 pm on Sep 12, 2008 (gmt 0)

It's actually not very helpful. It's one of the FUDiest things they have put out.

"we do a good job of choosing a version of the content to show in our search results..."

Yeah, G, sure you do. On another planet perhaps.

Duplicate content can affect a site's performance, and they choose a version of content to rank, thinking they are "good" at that. "Good" is not "perfect". Here on earth duplicate content can hurt you badly, because Google is not close to perfect at detecting and understanding the genuine circumstances of the duplicate content it discovers.

JoeSinkwitz

9:04 pm on Sep 12, 2008 (gmt 0)

It didn't touch upon duplicate content in titles and meta descriptions, which I know firsthand can get slapped very hard. Maybe they don't call it a penalty, but filtering has the equivalent effect of 0 traffic.

g1smd

9:13 pm on Sep 12, 2008 (gmt 0)

There's something new in what they say though...

Did you see the bit that said:

1. When we detect duplicate content, such as through variations caused by URL parameters, we group the duplicate URLs into one cluster.
2. We select what we think is the "best" URL to represent the cluster in search results.
3. We then consolidate properties of the URLs in the cluster, such as link popularity, to the representative URL.

That is much wider reaching than the types of URL they have previously admitted to performing this action for.

Before, I understood that there was limited action in just two cases:
1. www and non-www.
2. named index file and "/".

This opens up more possibilities. However the post may represent an ideal rather than 100% reality.

I can think of many reasons why the routine would fail, especially on any dynamic site.
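As a rough illustration of the three quoted steps (cluster, pick a representative, consolidate), here is a minimal Python sketch. It is not Google's method, only a toy model: the ignored parameter names, the list of index file names, the rule for picking the "best" URL, and the link counts are all assumptions made up for the example.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical: parameters assumed not to change the page content.
IGNORED_PARAMS = {"sessionid", "sort", "ref"}
# Hypothetical: file names assumed to be equivalent to "/".
INDEX_FILES = ("index.html", "index.htm", "index.php")


def normalize(url):
    """Reduce a URL to a cluster key (www/non-www, index file, parameters)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]  # treat www and non-www as the same host
    path = parts.path or "/"
    for name in INDEX_FILES:
        if path.endswith("/" + name):
            path = path[: -len(name)]  # named index file becomes "/"
            break
    kept = sorted(
        (k, v) for k, v in parse_qsl(parts.query) if k.lower() not in IGNORED_PARAMS
    )
    return urlunsplit((parts.scheme.lower(), host, path, urlencode(kept), ""))


def cluster(link_counts):
    """Map each representative URL to the summed link count of its cluster."""
    clusters = defaultdict(dict)
    for url, links in link_counts.items():
        clusters[normalize(url)][url] = links  # step 1: group duplicates
    result = {}
    for members in clusters.values():
        # Step 2 (assumed rule): most-linked URL wins, shorter URL breaks ties.
        representative = max(members, key=lambda u: (members[u], -len(u)))
        # Step 3: consolidate link popularity onto the representative.
        result[representative] = sum(members.values())
    return result


if __name__ == "__main__":
    # Made-up inbound link counts for illustration only.
    sample = {
        "http://www.example.com/": 10,
        "http://example.com/index.html": 3,
        "http://example.com/?sessionid=abc": 1,
        "http://example.com/widgets?id=5": 4,
        "http://www.example.com/widgets?id=5&sort=asc": 2,
    }
    for url, links in cluster(sample).items():
        print(url, links)
```

The weak point is the same one raised in this thread: everything depends on knowing which parameters are safe to ignore, and on a dynamic site a wrong guess either merges pages that differ or leaves real duplicates unclustered.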

steveb

9:17 pm on Sep 12, 2008 (gmt 0)

That is new, true, but that is just another thing they can screw up, as anyone who has lost 100s of places in rank when the wrong URL is chosen can attest.

Certainly that is admirable for them to TRY to do, but the point is they are still just not that good at it, and sometimes pages are affected in a way that looks like a "penalty" but is just a Google screwup.

Whitey

1:16 am on Sep 13, 2008 (gmt 0)

I understood that there was limited action in just two cases

Externally linked domains carrying the same content have been detected for some time. Affiliate and syndicated content sites were particularly vulnerable if they were not the original source of the content.

I think it's clear that Google only wants to show one version of content:

"Most search engines strive for a certain level of variety; they want to show you ten different results on a search results page, not ten different URLs that all have the same content. To this end, Google tries to filter out duplicate documents so that users experience less redundancy." Susan Moskwa @ Google [googlewebmastercentral.blogspot.com...]

youfoundjake

1:25 am on Sep 13, 2008 (gmt 0)

I think they did, in brief, distinguish between a penalty and a filter. They specifically addressed CMSs that out of the box will have duplicate content purely because of the URL structure, which gets "filtered", as opposed to scraping, for which an actual penalty may occur...

g1smd

2:55 am on Sep 13, 2008 (gmt 0)

*** Externally linked domains carrying the same content have been detected for some time. ***

Yes they have, and that's a whole different story.

I doubt, in those cases, that they re-assign back-links that point to resource X and count them as if they actually pointed to resource Y.

Or do they?

It would seem a bit risky to take things that far, but on the other hand, it might explain the catastrophic drop in performance when another resource is named as the original.

So far, in my post above I was only thinking about URL variations and duplication within one domain - and maybe that is all the Google announcement was thinking about too, and we're reading too much into it.

Whitey

3:46 am on Sep 13, 2008 (gmt 0)

I doubt, in those cases, that they re-assign back-links that point to resource X and count them as if they actually pointed to resource Y.
Or do they?

I think the links and the content on duplicate URLs get treated separately, and it may throw up some issues on how pages are scored.

Firstly, if 2 pages interlink with the same content, one seems to get dumped [ filtered out ].

Links on the "dumped" pages seem to get counted for zilch, i.e. they appear to pass no PR or link text. They do seem to get counted for content value though, if they are unique.

A "dumped" page does seem to show for unique snippets. I've seen pages with PR0 [ that would otherwise have green TBPR ] that are duplicates and still rank for those unique terms.

Is this what others are seeing ?

tedster

4:04 am on Sep 13, 2008 (gmt 0)

Firstly, if 2 pages interlink with the same content, one seems to get dumped [ filtered out ].
Links on the "dumped" pages seem to get counted for zilch, i.e. they appear to pass no PR or link text. They do seem to get counted for content value though, if they are unique.

I was with you up to that last bit. I'm not sure what you mean there. Here's my reading (and my confusion).

Two urls with the same content interlink, so one is dumped. So how can one of those dumped pages be unique? It was dumped because it's a duplicate, no? I must be misreading you - help me out, please.

Whitey

4:51 am on Sep 14, 2008 (gmt 0)

What I meant was that the links on the dumped page seem to pass no effect [ PR or link text ].

However, if only the link text is unique [ on an otherwise duplicate page ], the page may rank, but only for the link text.

I hope that explains my thoughts better.

tedster

11:42 am on Sep 14, 2008 (gmt 0)

Thanks - now I get it. I don't ever remember seeing a unique anchor text on an otherwise duplicate page, so I can't say much about it - but it sounds possible in the abstract.

potentialgeek

12:05 pm on Sep 18, 2008 (gmt 0)

< moved from another location >

In summary:

Having duplicate content can affect your site in a variety of ways; but unless you've been duplicating deliberately, it's unlikely that one of those ways will be a penalty. This means that:
* You typically don't need to submit a reconsideration request when you're cleaning up innocently duplicated content.
* If you're a webmaster of beginner-to-intermediate savviness, you probably don't need to put too much energy into worrying about duplicate content, since most search engines have ways of handling it.
* You can help your fellow webmasters by not perpetuating the myth of duplicate content penalties! The remedies for duplicate content are entirely within your control.
Google Blog - Susan Moskwa, Webmaster Trends Analyst [http://googlewebmastercentral.blogspot.com/2008/09/demystifying-duplicate-content-penalty.html]

[edited by: tedster at 12:28 pm (utc) on Sep. 18, 2008]