Demystifying the "duplicate content penalty" [googlewebmastercentral.blogspot.com]
For much deeper discussion about many specific issues involving duplicate content, I'd suggest looking at the Duplicate Content discussions in our Hot Topics [webmasterworld.com] section, pinned to the top of the Google Search forum home page.
"we do a good job of choosing a version of the content to show in our search results..."
Yeah, G, sure you do. On another planet perhaps.
Duplicate content can affect a site's performance, and they choose a version of the content to rank, believing they are "good" at that. "Good" is not "perfect". Here on Earth, duplicate content can hurt you badly because Google is not close to perfect at detecting and understanding the genuine circumstances of the duplicate content they discover.
Did you see the bit that said:
1. When we detect duplicate content, such as through variations caused by URL parameters, we group the duplicate URLs into one cluster.
2. We select what we think is the "best" URL to represent the cluster in search results.
3. We then consolidate properties of the URLs in the cluster, such as link popularity, to the representative URL.
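For readers who think in code, here is a minimal Python sketch of the three quoted steps. It is not Google's implementation: the exact-match content hash, the `Page` record, the `inbound_links` count standing in for "link popularity", and the tie-breaking rule in `pick_representative` are all hypothetical assumptions chosen only to make the steps concrete.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    content: str
    inbound_links: int  # hypothetical stand-in for "link popularity"


def fingerprint(content: str) -> str:
    # Exact-match fingerprint after collapsing whitespace and case.
    # A real engine would use near-duplicate detection instead.
    normalized = " ".join(content.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def cluster_duplicates(pages):
    # Step 1: group URLs whose content fingerprints match into one cluster.
    clusters = defaultdict(list)
    for page in pages:
        clusters[fingerprint(page.content)].append(page)
    return list(clusters.values())


def pick_representative(cluster):
    # Step 2 (hypothetical heuristic): most inbound links wins,
    # ties go to the shortest URL.
    return max(cluster, key=lambda p: (p.inbound_links, -len(p.url)))


def consolidate(cluster):
    # Step 3: credit the whole cluster's link popularity to the representative.
    rep = pick_representative(cluster)
    return rep.url, sum(p.inbound_links for p in cluster)


if __name__ == "__main__":
    pages = [
        Page("http://example.com/item?id=7", "Blue widget", 3),
        Page("http://example.com/item?id=7&sort=asc", "Blue  widget", 1),
        Page("http://example.com/about", "About us", 5),
    ]
    for cluster in cluster_duplicates(pages):
        print(consolidate(cluster))
```

Run as a script, the two parameter variants collapse into one cluster represented by `http://example.com/item?id=7` with 4 consolidated links. The weak point the thread worries about is step 2: whatever rule picks the "best" URL, a wrong pick moves all the consolidated credit to the wrong page.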
That three-step process is much wider-reaching than the types of URL they have previously admitted to performing this action for.
Before, I understood that there was limited action in just two cases:
1. www and non-www.
2. named index file and "/".
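Those two long-acknowledged cases amount to simple URL canonicalization, which can be illustrated with a small Python function. This is a hedged sketch, not how Google does it: preferring the non-www host and the contents of `INDEX_FILES` are hypothetical assumptions.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical list of "named index file" names treated as equivalent to "/".
INDEX_FILES = {"index.html", "index.htm", "index.php", "default.asp", "default.aspx"}


def canonicalize(url: str) -> str:
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Case 1: fold www and non-www together (assumes non-www is preferred).
    if host.startswith("www."):
        host = host[len("www."):]
    # Case 2: fold a named index file onto its directory "/".
    path = parts.path or "/"
    head, _, last = path.rpartition("/")
    if last.lower() in INDEX_FILES:
        path = head + "/"
    return urlunsplit((parts.scheme.lower(), host, path, parts.query, parts.fragment))
```

For example, `http://www.example.com/index.html` and `http://example.com` both map to `http://example.com/`. Anything beyond these two rules, such as reordered or irrelevant query parameters on a dynamic site, is where the guessing starts.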
The new post opens up more possibilities. However, it may represent an ideal rather than 100% reality.
I can think of many reasons why the routine would fail, especially on any dynamic site.
Certainly it is admirable for them to TRY to do that, but the point is they are still just not that good at it, and sometimes pages are affected in a way that looks like a "penalty" but is just a Google screwup.
"I understood that there was limited action in just two cases"
Externally linked domains carrying the same content have been detected for some time. Affiliate and syndicated content sites were particularly vulnerable if they were not the original source of the content.
I think it's clear that Google only wants to show one version of content:
" Most search engines strive for a certain level of variety; they want to show you ten different results on a search results page, not ten different URLs that all have the same content. To this end, Google tries to filter out duplicate documents so that users experience less redundancy." Susan Moskwa @ Google [googlewebmastercentral.blogspot.com...]
Yes, they have, and that's a whole different story.
I doubt, in those cases, that they re-assign back-links that point to resource X and count them as if they actually pointed to resource Y.
Or do they?
It would seem a bit risky to take things that far, but on the other hand, it might explain the catastrophic drop in performance when another resource is named as the original.
So far, in my post above, I was only thinking about URL variations and duplication within one domain. Maybe that is all the Google announcement was thinking about too, and we're reading too much into it.
"I doubt, in those cases, that they re-assign back-links that point to resource X and count them as if they actually pointed to resource Y. Or do they?"
I think the links and the content on duplicate URLs get treated separately, and this may throw up some issues in how pages are scored.
Firstly, if two pages with the same content interlink, one seems to get dumped (filtered out).
Links on the "dumped" pages seem to count for zilch, i.e. they appear to pass no PR or link text. They do seem to get counted for content value, though, if they are unique.
A "dumped" page does seem to show for unique snippets. I've seen pages at PR0 (that would otherwise have green TBPR) that are duplicates and still rank for those unique terms.
Is this what others are seeing?
"Firstly, if two pages with the same content interlink, one seems to get dumped (filtered out). Links on the "dumped" pages seem to count for zilch, i.e. they appear to pass no PR or link text. They do seem to get counted for content value, though, if they are unique."
I was with you up to that last bit. I'm not sure what you mean there. Here's my reading (and my confusion).
Two URLs with the same content interlink, so one is dumped. So how can one of those dumped pages be unique? It was dumped because it's a duplicate, no? I must be misreading you. Help me out, please.
In summary:
Having duplicate content can affect your site in a variety of ways; but unless you've been duplicating deliberately, it's unlikely that one of those ways will be a penalty. This means that:
* You typically don't need to submit a reconsideration request when you're cleaning up innocently duplicated content.
* If you're a webmaster of beginner-to-intermediate savviness, you probably don't need to put too much energy into worrying about duplicate content, since most search engines have ways of handling it.
* You can help your fellow webmasters by not perpetuating the myth of duplicate content penalties! The remedies for duplicate content are entirely within your control.
Google Blog - Susan Moskwa, Webmaster Trends Analyst (http://googlewebmastercentral.blogspot.com/2008/09/demystifying-duplicate-content-penalty.html)
[edited by: tedster at 12:28 pm (utc) on Sep. 18, 2008]