Forum Moderators: Robert Charlton & goodroi
I was wondering what you think about blogs and WordPress. As you know, WordPress can have categories, each of which shows certain posts.
So now I can have 3 categories: A, B, C and then make a post which will be posted in all 3 cats... it'll show in each category, as well as on the main page and in the archives. As you can see, there are many places on the site where that certain post shows.
What do you think, is this duplicate content or not? How does Google treat such behaviour?
Any clues?
Thanks,
Manca
I ran into huge duplicate problems because of this about a year ago and many pages went supplemental. I resolved the issue by putting an on-the-fly generated robots meta "noindex,follow" on the date pages and category pages. The indexable version is the post itself, which has a proper title to display in the SERPs and is therefore the most likely to be clicked. After this auto-generation of robots meta tags all supplementals eventually disappeared and rankings increased.
I run a number of WordPress blogs, and recently my site got hit by Google, and I believe it's because of the duplicate content.
I have about 20 categories, and often post articles in at least two of these categories.
So there is:
1) The post
2) The Index
3) Category 1
4) Category 2 (maybe more categories)
5) The Monthly Archive.
That is a LOT of duplicate content. 5x or more.
I now try to post my articles in as few categories as possible, and have blocked Google from my monthly archives (as it's easier to navigate using the categories).
I'm not sure what else we can do to minimise dup content. Maybe add a "noindex" tag to the pages of the index (after page one, of course).
If you only want the first page to be indexed, add in:
<?php if ( is_home() ) {?>
<?php if ( $paged < 2 ) {?>
code...
<?php }?>
<?php }?>
On thinking about this more, my posts and categories are most important to me, and not so much page2, page3, page4, etc from the index.
I feel it would be best for me to allow indexing of posts, categories, and the front index page only.
What do you think?
===
I'm not sure how some large blogs can get away with posting in many sections AND using the tagging system. Some posts have about 8 tags, leading to many, many duplicate pages!
[edited by: Ma2T at 12:59 am (utc) on Sep. 28, 2006]
<?php if ( is_home() || is_single() || is_page() ) {
echo '<meta name="robots" content="index,follow">';
} else {
echo '<meta name="robots" content="noindex,follow">';
}?>
[edited by: Marcia at 1:40 am (utc) on Sep. 28, 2006]
Just my 2 cents. I've already done that. Let's see the results :)
Manca
I think we first have to make a choice and use one of two systems, Date based (Monthly archives), or categories. For me categories are very important, so I will go with these rather than the dates.
Also I think that categories are more important to me than say, page 4 and page 5 of my main site. (Also we link to categories from every page, and we don't link to page 5 from every page)
I think this is my final answer for my situation.
Allow:
Main Index page, Articles, Categories.
Disallow:
Page 2, 3, 5, etc. of the index, and the monthly archives.
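A minimal sketch of that allow/disallow split, assuming it goes inside <head> in the theme's header.php and that the theme prints no other robots meta tag. It uses the standard WordPress conditional tags is_home(), is_paged(), is_single(), is_page() and is_category(). Two things here are assumptions rather than part of the list above: static pages are treated as indexable, and paged category views are left indexable too.

```php
<?php
// Indexable views: the front index (first page only), single posts,
// static pages (assumption), and category archives.
$indexable = ( is_home() && ! is_paged() )
    || is_single()
    || is_page()
    || is_category();

if ( $indexable ) {
    echo '<meta name="robots" content="index,follow">' . "\n";
} else {
    // Everything else (page 2+ of the index, monthly archives, etc.)
    // stays crawlable but is kept out of the index.
    echo '<meta name="robots" content="noindex,follow">' . "\n";
}
?>
```

Keeping "follow" on the excluded views means the bot can still reach the posts through them.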
Pretty good thinking ;) Thanks for giving me some clues. I was blind definitely.
The more parents the worse it is I guess.
You can add a "noindex" to certain categories
Tags:
is_category('6')
When the archive page for Category 6 is being displayed.
is_category('Cheeses')
When the archive page for the Category with Name "Cheeses" is being displayed.
Eg:
<?php if ( is_category('6') ) {?>
code..
<?php }?>
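As a rough illustration of what could go where the placeholder is, the sketch below prints a noindex tag for the two example categories named above (ID 6 and "Cheeses"). It assumes the snippet sits inside <head> in header.php and that no other robots meta tag is printed for those pages; swap in your own category IDs or names.

```php
<?php
// Category ID 6 and the name "Cheeses" are just the examples from above.
if ( is_category('6') || is_category('Cheeses') ) {
    echo '<meta name="robots" content="noindex,follow">' . "\n";
}
?>
```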
Now it's time to add this to my blog!
--
No problem manca, I'm glad I could help. I'm still giving this some thought to work out the best way. I agree with you on the whole page number thing also, good thought. It's going to be hard to eliminate all duplicate content, but hopefully it won't be too much of a problem.
[edited by: Ma2T at 1:55 am (utc) on Sep. 28, 2006]
If I add a "noindex" to site.com/category/ (only that page)
Would it stop site.com/category/article-name/ from being indexed? This page would not include the "noindex" tag.
I'm just wondering if Google would pass this restriction down the rest of the folder?
I'm hoping not.. I assume it wouldn't, but I would like some confirmation if possible.
If you block it in Robots.txt then that would be a different story ;)
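To spell out the difference: a robots meta tag applies only to the page that carries it, so a "noindex" on site.com/category/ says nothing about site.com/category/article-name/. A robots.txt Disallow line, on the other hand, is matched as a path prefix. The fragment below uses the /category/ path from the question purely as an illustration of what would block the articles as well.

```text
User-agent: *
# Prefix match: this blocks /category/ itself and every URL beneath it,
# including /category/article-name/.
Disallow: /category/
```

Also bear in mind that a URL blocked in robots.txt is never fetched, so the bot cannot see any meta tag on it.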
I had this same duplicate content problem, but it didn't become a problem until I got some serious link juice, which caused Google to finally deep-crawl my site, and hence find all of those category pages.
I managed to fix it via robots.txt and meta noindex tags.
Works like a dream, but may take a while for Google to sort it out once you make the changes.
I now only allow indexing of my index page, and my post pages. Everything else is blocked.
[edited by: Dead_Elvis at 3:06 am (utc) on Sep. 28, 2006]
I'm not sure how concerned I should be with this since my WP is displayed in an iframe, however I made some of the modifications. The container page has been sitting at noindex for a while and this was changed today. We'll see how the iframe gets handled from here.
These are the same sorts of issues that I have been banging on about with forums, such as vBulletin, for the last year or two.
If you herd the bot into indexing what you want to be indexed and restrict all the alternative URLs you will not see any Supplemental Results for your site.
If you are already indexed, it will take a year for the supplemental results to fade out, but you will notice other improvements within a month or so of making the changes.
Your measure of success is in seeing how well the URLs that you do want to be indexed are doing.
And another thing I noticed about those pages is that Google indexed both domain.com/page and domain.com/page/ but, interestingly, neither of them is in the supplemental index. Very weird... I don't get this.
What do you recommend I do with those pages? Should I interlink them as page/ or page? They are not actual directories but, as you know, mod_rewritten dynamic URLs.
Got clues?
Get your .htaccess file to rewrite one form to the other, and issue the "301" for the original one. That will cure it.
Which one, and which way, is up to you...
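A minimal sketch of one direction (no slash to trailing slash), assuming Apache with mod_rewrite, a blog installed at the document root, and that these lines are placed above the WordPress rewrite block in .htaccess. Picking the slashed form is only an example; the opposite direction works just as well if your internal links use it.

```apache
RewriteEngine On
# Leave real files (images, CSS, and so on) alone.
RewriteCond %{REQUEST_FILENAME} !-f
# Any path that does not already end in a slash gets a 301 to the slashed form.
RewriteRule ^(.*[^/])$ /$1/ [R=301,L]
```

Whichever form you choose, link to that same form everywhere on the site so the bot is not sent through the redirect on every crawl.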
.
Don't worry about any URLs that appear as Supplemental Results after they have been turned into redirects. That is normal. Google hangs on to URLs that return a 301 or a 404 for one year after they start doing so.
They do NOT count as Duplicate Content if their HTTP code is 301 or 404. They will get cleaned up soon enough.
Your measure of success is in seeing that the URLs that you do want to be indexed do get indexed, and that they are no longer tagged as Supplemental, perhaps a few weeks after the fixes are put in place.
Again, you only need to look into why a URL is Supplemental if that URL returns a "200 OK" response.