Everything just went supplemental - Removal tool too drastic? - Google Search and SEO forum at WebmasterWorld

Forum Moderators: Robert Charlton & goodroi

internetheaven

6:58 pm on Oct 3, 2006 (gmt 0)

The entire site just went supplemental. Over 90,000 pages.

Considering using the removal tool and then starting again. Can't think of a better way to do this. Supplementals can stay in there for over a year and if more than 50% of my pages in the Google index are supplemental, surely that will simply destroy any chances of ranking new pages well?

texasville

11:10 pm on Oct 9, 2006 (gmt 0)

g1smd, is that really a good example? I agree with you so far as that can cause supplementals but it isn't really what I am talking about. WW has a main url with a pr8 and most subtopics are pr7.
Try looking at smaller sites with pr of 4 or less.
Look at commercial sites. Particularly sites that are static html that have never had dupe content problems and 301's in place since day 1. No canonicals. These are good examples of what I am talking about.

Robert Charlton

11:22 pm on Oct 9, 2006 (gmt 0)

It's now not just the site: operator. I'm seeing a significant number of url-only results, without titles or caches, in serps for some of the more competitive searches I monitor.

Apparently just a late-night feature. ;) Titles in the serps back to normal today....

g1smd

11:35 pm on Oct 9, 2006 (gmt 0)

It lasted almost a couple of weeks, but the problem was only noticed by most people during the last few days of it actually happening.

It all got fixed the day before yesterday, and Matt Cutts posted in a way-out corner of the web that "they were pushing binaries".

caveman

11:40 pm on Oct 9, 2006 (gmt 0)

> way-out corner of the web

g1smd, you're supposed to say: "Ducks and runs for cover..." after comments like that. ;-)

g1smd

11:48 pm on Oct 9, 2006 (gmt 0)

Naah.

DaveN isn't reading this thread.
:-)

jexx

1:28 am on Oct 10, 2006 (gmt 0)

Try looking at smaller sites with pr of 4 or less.
Look at commercial sites. Particularly sites that are static html that have never had dupe content problems and 301's in place since day 1. No canonicals. These are good examples of what I am talking about.

texasville, i think g1smd makes a good point regardless of pr. i've got lots and lots of pr4 pages that remain indexed due to the reasons i've mentioned before (no dupe content due to several accessible URL variations, periodically changing content, title/meta etc.) while other similar pages (non-changing/old, or accidental dupes due to some URL variation such as ?var1=foo etc.) have gone supplemental.

i am in 100% agreement with g1smd that supplemental is most likely being used as a "recycle bin" of sorts and not to stuff all your pages that don't have enough inbound links.
i think we're taking what matt cutts is saying a little too literally here. no ibl's is one reason for supplemental, but active pages with relevant content and no or low ibl's should remain in the main index
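
One way to check for the accidental duplicates mentioned above (the same page reachable through variations such as ?var1=foo) is to normalize each URL and group the ones that collapse to the same address. This is only a rough sketch: the parameter names in IGNORED_PARAMS and the example.com URLs are made up for illustration, and it says nothing about how Google itself decides what counts as a duplicate.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameters that do not change page content on this imaginary site.
IGNORED_PARAMS = {"var1", "sessionid"}

def normalize(url):
    """Lowercase scheme and host, drop ignored params, sort the rest, drop the fragment."""
    parts = urlsplit(url)
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in IGNORED_PARAMS
    )
    path = parts.path or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, urlencode(query), "")
    )

def find_duplicates(urls):
    """Return normalized URL -> list of original URLs, for groups with more than one member."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    return {key: members for key, members in groups.items() if len(members) > 1}

if __name__ == "__main__":
    crawl = [
        "http://Example.com/page.html",
        "http://example.com/page.html?var1=foo",
        "http://example.com/other.html?id=2",
    ]
    for canonical, variants in find_duplicates(crawl).items():
        print(canonical, "<-", variants)
```

Any group this reports is a candidate for a 301 redirect to a single URL, which is the kind of cleanup discussed earlier in the thread.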

texasville

3:20 am on Oct 10, 2006 (gmt 0)

>>>>texasville, i think g1smd makes a good point regardless of pr. i've got lots and lots of pr4 pages that remain indexed due to the reasons i've mentioned before (no dupe content .....<<<<<

Yeah, right now you do...and pr4 pages might make it....but pr1,2,3...are going to go supplemental. All pr1's will and pr2's. Definitely. Google can't buy enough machines to keep indexing the trillion pages that are going to be on the web shortly. All the mfa's are going to disappear. All the lesser pages will be supplemental.
Read what MC said:
>>>>>it's just a matter of we have to select a smaller number of documents for the web index. If more people were linking to your site, for example, I'd expect more of your pages to be in the main web index<<<<<<

That statement is very telling. I see into the future where anything less than a pr5 page will be shuffled into the supplemental index...to be seen only when the site: operator is used.
I understand that a lot of webbers are not going to want to face facts...but the days of 50,000 plus page sites bringing in an income for them (thru ads) are going to go by the wayside....unless they can get all those pages a pr5 or better.

Marcia

3:43 am on Oct 10, 2006 (gmt 0)

I have a theory that any page on a site that doesn't have a significant backlink(s) is going supplemental. I have 3 sites I am managing now that have gone entirely supplemental except for the index page.
All of the sites have one thing in common: all links point to the .com and not to any interior pages.
One of the sites is a brand new domain as of 10 months ago and has no dupe content issues. That I am positive of. And another has had a 301 since the first week it was up and all original content.
Those two sites are merely for very small businesses and haven't had much backlink production.

I'm absolutely in agreement with that and it's stood the test of time - and IBLs & PR. Nope, the toolbar ain't dead yet; the reports of its demise are grossly exaggerated and contra-indicated by the Supp results and indicators.

Not one single speck of duplication, what's Supplemental and what isn't on the test site(s) is 100% dependent on the amount of link love the pages are getting.

People who are looking for dup issues where none exist are, unfortunately, chasing their tails.

DaveN

8:17 am on Oct 10, 2006 (gmt 0)

way-out corner of the web that "they were pushing binaries ..

It's called leading edge ... take a few steps towards the darkside only then will you see the light.

DaveN

jexx

3:22 pm on Oct 10, 2006 (gmt 0)

marcia and texasville..
sure, pr1 and pr2 pages will probably go the way of the supplemental index. however, from my experience on any reasonably ranking domain i've never had problems getting internal pages (that are sufficiently internally linked and _most_ importantly have **good relevant and updated content**) to at least pr3 in a few months.
but pr aside.. we all know that pr as it used to be is a way of the past..

ibl's do matter.. we know that..however, just blindly stating that pages without many ibl's will go supplemental i don't agree with..

yes, the main index is going to get compacted and as i said before, the supplemental will be a recycle bin of sorts.. there is just no way that google will throw _every_ page without many ibl's in the supplemental without severely impacting relevancy (which translates to ad revenue for them so no way!)

ibl's is just one of the criteria in the ranking algo, albeit a relatively high weight one as it seems..

content is still king.. keep your pages relevant and _updated_ and see what that does to those in the supplemental index..

also, i think the main trend is that google is going to rely less on the "traditional" methods such as tags, links, old pr etc. (not saying they won't at all, just to a lesser degree) and more on user data acquired from the desktop/toolbar (on every Dell PC) and also the Firefox browser sync extension, which lets you sync your browsing history between computers.. nifty feature.. also lots of browsing data for Google..

texasville

10:31 pm on Oct 10, 2006 (gmt 0)

jexx...then I guess Matt Cutts was wrong in his explanation to a site owner. And I guess I am just imagining it when I am seeing that very thing happen.

caveman

10:56 pm on Oct 10, 2006 (gmt 0)

Hey Marcia, care to define 'significant number of backlinks'? I assume you do not mean to imply external backlinks only. Just wanna make sure I'm understanding the comments.

SteveWh

1:14 am on Oct 11, 2006 (gmt 0)

Most of my pages are supplemental, but they do still turn up in search results and I get traffic to them, so the "it might not be as bad as you think" comment could be correct.

This post [webmasterworld.com...] says that the original intent of the supplemental index was "to augment the results for obscure queries". I interpret this to mean "these are low priority pages". The topic they're about is something rarely searched for. They're not *popular*. They're cached, but they are indexed only as time permits. Whether they're good or not really doesn't matter; no one requests them, and that's that. Google ranking and indexing *is* a popularity contest (popularity measured by inbound links, which will be greater if the site topic is a popular one). It's not a measure of quality or topic expertise. Google's real market is their search engine users, not webmasters, and they will give highest priority to whatever the largest percentage of their searchers want the largest percentage of the time, to give *them* the best possible user experience.

Adsense for Search is useless now except for the ads.

No kidding. When I tested AFS, I'd get: targeted ads at the top of the page, targeted ads at the bottom, and in between, 0 Search Results, so I use MSN site search. I get no revenue from them, but they provide what they're supposed to: site search results, and therefore a good user experience.

I also vote strongly with the many others who have said the removal tool is way too drastic for this situation. Poor indexing is a lot better than no indexing.

g1smd

1:18 am on Oct 11, 2006 (gmt 0)

>> to augment the results for obscure queries <<

Yes, by including URLs that did have content, but are now serving a redirect, or a 404, or the domain has expired. They also include old cached data for live URLs, so you get a result for the previous version of a page. Only a small amount are "unimportant", low PR, untrusted sites. Most of it is simply stuff stuck in the Recycle Bin waiting to be dumped after being in there for a year.

SteveWh

1:41 am on Oct 11, 2006 (gmt 0)

My site is likely in that category, though, so it's probably why I view it from that perspective. :) It's obscure. None of its pages fall into any of the other categories.

In another thread, there is a comment about supplemental URLs being "not good enough for the regular index". How does Google decide what's GOOD? This relates to my comment above about popularity. Google doesn't peer-review sites; it doesn't even human-review them. They have no algorithm whatever to determine the quality or authoritativeness of a site with respect to its subject matter. All they can rank is popularity as measured by IBL. So in other words, when they use the phrase "not good enough", they imply that they do have some algorithm to determine a site's quality, which is an impression they'd no doubt like to foster, but it's not true. They don't. All they can measure is popularity.

[edited by: SteveWh at 1:42 am (utc) on Oct. 11, 2006]

texasville

3:40 am on Oct 11, 2006 (gmt 0)

>>>>>>Only a small amount are "unimportant", low PR, untrusted sites.<<<<

But it is growing daily. It's what is happening now. And the trust factor is defined in the low number of ibl's to url's on sites.

g1smd

9:33 am on Oct 11, 2006 (gmt 0)

I only included that bit because I see so many other people talking about that in the last few weeks, and being so insistent about it.

Personally, I don't see that happening with any of the stuff that I am looking at.

However, I have made the mistake of ignoring something that only a very few people report (and which I don't see), before, only to see it become widespread only a few weeks or months later. So, I will be looking for that effect...

DaveN

1:50 pm on Oct 11, 2006 (gmt 0)

but google changed the way it handles IBL's a few months ago .. didn't they ;)

DaveN

g1smd

10:14 pm on Oct 11, 2006 (gmt 0)

Matt Cutts says he first noticed the shrinking site: results on the 5th. That's several days after it was first posted in here.

>> - For a brief while last week, site: only returned three results from a host. Someone mentioned it to me by email, but the first web report I saw was by DaveN on Friday (there's your link, Dave). Fixed/working by the end that day, I think. It was related to a binary/executable that was going out, but a different binary than the one mentioned above. <<

[mattcutts.com...]

g1smd

1:15 am on Oct 12, 2006 (gmt 0)

Heh, Marcia, you're gonna love this Matt Cutts comment:

>> PageRank is the primary factor determining whether a url is in the main web index vs. the supplemental results, so I'd concentrate on good backlinks more than worrying about varying page layouts, etc. <<

[mattcutts.com...]

Note: The "primary" factor.

Jeez. No mention of Duplicate Content, and Redirects and 404 URLs at all.

Ah, but maybe he only means for "live" URLs, or maybe redirects and 404s no longer have any PageRank associated with them.

Whatever, it agrees with what you're saying: and I guess that fuels another link frenzy to start all over again.

caveman

2:01 am on Oct 12, 2006 (gmt 0)

g1smd, I alluded to this earlier in this thread; but will expand a bit more. IMO, the nature of the Supplementals has been changing before our eyes, with more to come. We've been catching screen shots over the last few weeks.

MC's comment above seems to me very consistent with other comments G has dropped over time including comments about storage and what they wish to focus on, and what we've been actually seeing. Plus my own hunch that they would not want to keep showing forever the evidence of what they've trashed. They keep obscuring windows into their machinery. One particularly interesting thing in past few weeks was lots of pages were newly supp, off and on, for a while.

I was gonna call this a prediction ... but it's getting closer to just a run of the mill opinion in light of MC's comment above:

Supps have just become less "bad" than they are generally perceived to be, and more like how they've been referred to by G off and on since their inception. Essentially it looks like they are becoming a back up set of pages to be shown when more prominent pages won't do for a given query. Now, that's sorta been true all along, but as we know, there were lots of other situations that could get pages into the Supps ... most of them bad. Various forms of dup stuff, crappy feed pages, etc.

But what if in the future, Supps are just a repository for weaker pages, and not really page hell? So Supps in future might be more like, for example: Pages with very low PR, pages with no changes to the page in X period of time, weak pages with dup titles/meta but that represent some possibility of providing useful content, etc.

Where will the truly useless/garbage pages and URL's go, e.g., exact dup content URL's? A deeper form of dungeon: The place pages really don't want to be.

The coming hierarchy: Normal, Supplemental, Oblivion. Or, maybe not. Lot of hunches combined with observation and listening to G over the past several months. ;-)

tedster

2:28 am on Oct 12, 2006 (gmt 0)

What caveman is saying can fit nicely with observations like this thread started by youfoundjake: Number of results goes down by 60% for my niche keyword [webmasterworld.com]. In other words, more urls might be getting shuffled off to Supplemental and therefore removed from the "total" number of regular results.

steveb

2:37 am on Oct 12, 2006 (gmt 0)

youfoundjake's thread was from when the site operator was broken.

If there are more supplementals out there, the results for searches should show a higher number, not a smaller one.

A smaller number represents the opposite, fewer pages either indexed or supplemental. A smaller number would reflect the greatly preferable situation of URLs falling out of the index, instead of sticking as supplementals.

caveman

3:54 am on Oct 12, 2006 (gmt 0)

Been thinking about this for a while now, given:
- G's history (e.g., TBPR, backlink tool),
- comments made by G and MC about storage and what is important to feature as the Web grows ever bigger,
- and the mish mosh of reasons pages could go Supp (which IMO are more than commonly noted).

To me, it just makes sense that G would go in the direction of having a starting line-up of pages, and a bench. Why would they want to keep giving clues into what pages got dumped to the darkest depths? Throw them away from public view. Down to AA ball or worse.

If it goes as I'm (admittedly) guessing, it would also potentially have the effect of motivating sites to behave in ways that might heighten the value of the Supp pages. Downside: Linkfest 2.0. Upside: Webmasters revisit how to make those under-loved pages more valuable. ;-)

OK, back to things I think I probably know, instead of things I think might be so at some point, hehe.

Robert Charlton

4:54 am on Oct 12, 2006 (gmt 0)

What Google is doing is what Inktomi used to do.... Inktomi spidered and indexed everything, but only displayed what they considered useful.

What Inktomi spidered they called their "web map," and what they displayed they called their "best of the web."

If Google had done it that way from the start, there might be less screaming now (well no, maybe not ;) ).

Supps... Essentially it looks like they are becoming a back up set of pages to be shown when more prominent pages won't do for a given query.

Some random thoughts about this....

I think this is very much like the way Google treats dupes... and that's an extension of the way it treats rankings in general. If a page isn't differentiated in ways the algo likes from the rest of the pack for a given query, then it doesn't rank. Similarly, the rankings of dupes on Google can be very query dependent, and in ways that are surprising. Depends on how the query relates to the inbound links, the page content, to the PageRank, etc etc.

Duped pages may, eg, drop out on a keyword search for a long common text string that they all share... even though it's not a very competitive search... but then return for searches that are more competitive but which are influenced by inbound anchor text.

In this regard, it might help to think of the "site:" operator as a query to find differentiated pages in a site. When those pages are not differentiated from each other in important ways on the site level (eg, they have identical titles, or not enough distinct content), the pages go supplemental on a "site:" search. But if specific pages are recognized by enough outside links, then they stay in the visible index for the "site:" search.

I don't think there's one simple answer to why a page disappears. It's sort of the other end of asking why it ranks. But it's query dependent, I feel, and there are 100 or 200 fabled factors.
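
The "identical titles" case above is easy to audit on your own pages. Here is a minimal sketch that groups pages by their title text; the page paths and HTML strings are invented for illustration, and it only reports shared titles, without claiming anything about how Google decides which pages are too similar.

```python
from collections import defaultdict
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Collects the text found inside the <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

def extract_title(html):
    """Return the title lowercased, with runs of whitespace collapsed."""
    parser = TitleParser()
    parser.feed(html)
    return " ".join(parser.title.split()).lower()

def pages_sharing_titles(pages):
    """pages: dict of URL -> HTML source. Returns title -> sorted URLs that share it."""
    by_title = defaultdict(list)
    for url, html in pages.items():
        by_title[extract_title(html)].append(url)
    return {title: sorted(urls) for title, urls in by_title.items() if len(urls) > 1}

if __name__ == "__main__":
    site = {
        "/a.html": "<html><head><title>Blue Widgets</title></head></html>",
        "/b.html": "<html><head><title> blue   widgets </title></head></html>",
        "/c.html": "<html><head><title>Red Widgets</title></head></html>",
    }
    print(pages_sharing_titles(site))
```

The same grouping could be applied to meta descriptions, which were also mentioned in the thread as a source of near-identical snippets.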

Robert Charlton

4:56 am on Oct 12, 2006 (gmt 0)

PS... forgot to include an excellent WebmasterWorld reference to the Inktomi site map concept...

How Inktomi works...
...although some consider Inktomi doesn't "work" at all
[webmasterworld.com...]

g1smd

1:50 pm on Oct 12, 2006 (gmt 0)

>> When those pages are not differentiated from each other in important ways on the site level (eg, they have identical titles, or not enough distinct content), the pages go supplemental on a "site:" search. <<

There is another level above that. It's the "click for omitted results" feature. Those "omitted results" are pages in the normal index that don't cut it for the search term, as well as all the supplementals.

Most supplementals are for duplicate content, and for URLs that now redirect or are 404. I'll be looking more closely for those that are there only for low PR reasons.

texasville

2:39 pm on Oct 12, 2006 (gmt 0)

Robert, I think your inktomi analogy is close to the mark. However, I think Google has put one twist on it.
I have one site I manage that has all pages supplemental except for the index page. It has a pr3. Not what you would deem a high trust. However, it has now moved into the top ten for all its main search terms. It has steadily moved up over the last three months.
It at times will be #1 for a two word key phrase that returns 17 mil results.
Admittedly, none of its terms are extremely competitive, most in the 2 to 5 mil returns size. But I think this was the starting point for google.

Robert Charlton

7:00 pm on Oct 12, 2006 (gmt 0)

There is another level above that. It's the "click for omitted results" feature. Those "omitted results" are pages in the normal index that don't cut it for the search term, as well as all the supplementals.

Playing with these is in fact the "inspiration" for my theorizing about how Google treats marginal results.

I think your inktomi analogy is close to the mark. However, I think Google has put one twist on it.

texasville - How are you determining that your inner pages are "supplemental?" Do you mean by this that they don't show up with the "site:" operator? If so, do they display if you repeat the search with the omitted results included, or are they never visible to you in the index?

I think it's perfectly consistent with the Inktomi model that Google may rank the home page and not display the internal pages on a "site:" search. Also, if Google displays your home page, there's no reason it couldn't be ranking, regardless of how the inner pages perform.

As for the inner pages dropping out, you may simply have too many pages to support your small PR, coupled perhaps with a nav structure that doesn't distribute your PR wisely.

But I agree that there is a twist, and maybe lots of them. It appears that the Google model involves a combination of factors that might make "supplemental" pages or dupes show up on one query and not on another. I don't think that ever happened with Inktomi.
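
To get a feel for the point above about nav structure and how PR is distributed, a toy version of the originally published PageRank iteration can be run over a site's internal link graph. This is a sketch under loud assumptions: the four-page site is invented, the 0.85 damping value comes from the original PageRank paper, external links are ignored, and nothing here reflects whatever Google runs in production or where it draws the supplemental line.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: dict of page -> list of pages it links to (every target must be a key).

    Returns page -> score; the scores sum to 1.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            # A page with no outlinks spreads its score evenly over all pages.
            targets = outlinks if outlinks else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

if __name__ == "__main__":
    # Hypothetical site: "deep" is linked from only one inner page.
    site = {
        "home": ["a", "b"],
        "a": ["home"],
        "b": ["home", "deep"],
        "deep": ["home"],
    }
    for page, score in sorted(pagerank(site).items(), key=lambda item: -item[1]):
        print(f"{page:5s} {score:.3f}")
```

In this toy graph the page reachable only through one inner page ends up with the lowest score, which is the sort of effect a flatter navigation structure is meant to reduce.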

g1smd

9:35 pm on Oct 12, 2006 (gmt 0)

Supplemental Results are results where that URL has those exact words printed in green in the SERPs.

A URL might be returned as a normal result for one query and as a Supplemental Result for some other query; those I call "historical supplemental results". That effect is NOT a problem.

When you do a, typically, site:domain.com search, some URLs will be hidden behind the "click for omitted results" link. That is because their title and snippet were deemed too similar to those already listed OR they are "fully" supplemental results, supplemental whatever the query is (i.e. they are for URLs that are now redirects, or for URLs that are 404, or deemed totally unimportant). When you click the link, they all magically appear again.

It helps to think in terms of two main things here:

- what is stored in the main index, and what is stored in the supplemental index,

and,

- what do they actually show for a search query, which index does each item come from, and what is still hidden behind the "click for omitted results" link, and why?

This 120 message thread spans 4 pages.