Forum Moderators: Robert Charlton & goodroi
Considering using the removal tool and then starting again. Can't think of a better way to do this. Supplementals can stay in there for over a year and if more than 50% of my pages in the Google index are supplemental, surely that will simply destroy any chances of ranking new pages well?
Try looking at smaller sites with pr of 4 or less.
Look at commercial sites. Particularly sites that are static html that have never had dupe content problems and 301's in place since day 1. No canonicals. These are good examples of what I am talking about.
texasville, i think g1smd makes a good point regardless of pr. i've got lots and lots of pr4 pages that remain indexed due to the reasons i've mentioned before (no dupe content due to several accessible URL variations, periodically changing content, title/meta etc.) while other similar pages (non-changing/old, or accidental dups due to some URL variation such as ?var1=foo etc.) have gone supplemental.
i am in 100% agreement with g1smd that supplemental is most likely being used as a "recycle bin" of sorts and not to stuff all your pages that don't have enough inbound links.
i think we're taking what matt cutts is saying a little too literally here. no ibl's is one reason for supplemental, but active pages with relevant content and no or low ibl's should remain in the main index
Yeah, right now you do...and pr4 pages might make it....but pr1,2,3...are going to go supplemental. All pr1's will and pr2's. Definitely. Google can't buy enough machines to keep indexing the trillion pages that are going to be on the web shortly. All the mfa's are going to disappear. All the lesser pages will be supplemental.
Read what MC said:
>>>>>it's just a matter of we have to select a smaller number of documents for the web index. If more people were linking to your site, for example, I'd expect more of your pages to be in the main web index<<<<<<
That statement is very telling. I see into the future where anything less than a pr5 page will be shuffled into the supplemental index...to be seen only when the site: operator is used.
I understand that a lot of webbers are not going to want to face facts...but the days of 50,000 plus page sites bringing in an income for them (thru ads) are going to go by the wayside....unless they can get all those pages a pr5 or better.
I have a theory that any page on a site that doesn't have a significant backlink(s) is going supplemental. I have 3 sites I am managing now that have gone entirely supplemental except for the index page. All of the sites have one thing in common: all links point to the .com and not to any interior pages.
One of the sites is a brand new domain as of 10 months ago and has no dupe content issues. That I am positive of. And another has had a 301 since the first week it was up and all original content.
Those two sites are merely for very small businesses and haven't had much backlink production.
Not one single speck of duplication, what's Supplemental and what isn't on the test site(s) is 100% dependent on the amount of link love the pages are getting.
People who are looking for dup issues where none exist are, unfortunately, chasing their tails.
ibl's do matter.. we know that..however, just blindly stating that pages without many ibl's will go supplemental i don't agree with..
yes, the main index is going to get compacted and as i said before, the supplemental will be a recycle bin of sorts.. there is just no way that google will throw _every_ page without many ibl's in the supplemental without severely impacting relevancy (which translates to ad revenue for them so no way!)
ibl's is just one of the criteria in the ranking algo, albeit a relatively high weight one as it seems..
content is still king.. keep your pages relevant and _updated_ and see what that does to those in the supplemental index..
also, i think the main trend is that google is going to rely less on the "traditional" methods such as tags, links, old pr etc. (not saying they won't at all, just to a lesser degree) and more on user data acquired from the desktop/toolbar (on every Dell PC) and also the Firefox browser sync extension, which lets you sync your browsing history between computers.. nifty feature.. also lots of browsing data for Google..
This post [webmasterworld.com...] says that original intent of the supplemental index was "to augment the results for obscure queries". I interpret this to mean "these are low priority pages". The topic they're about is something rarely searched for. They're not *popular*. They're cached, but they are indexed only as time permits. Whether they're good or not really doesn't matter; no one requests them, and that's that. Google ranking and indexing *is* a popularity contest (popularity measured by inbound links, which will be greater if the site topic is a popular one). It's not a measure of quality or topic expertise. Google's real market is their search engine users, not webmasters, and they will give highest priority to whatever the largest percentage of their searchers want the largest percentage of the time, to give *them* the best possible user experience.
Adsense for Search is useless now except for the ads.
I also vote strongly with the many others who have said the removal tool is way too drastic for this situation. Poor indexing is a lot better than no indexing.
Yes, by including URLs that did have content, but are now serving a redirect, or a 404, or the domain has expired. They also include old cached data for live URLs, so you get a result for the previous version of a page. Only a small amount are "unimportant", low PR, untrusted sites. Most of it is simply stuff stuck in the Recycle Bin waiting to be dumped after being in there for a year.
In another thread, there is a comment about supplemental URLs being "not good enough for the regular index". How does Google decide what's GOOD? This relates to my comment above about popularity. Google doesn't peer-review sites; it doesn't even human-review them. They have no algorithm whatever to determine the quality or authoritativeness of a site with respect to its subject matter. All they can rank is popularity as measured by IBL. So in other words, when they use the phrase "not good enough", they imply that they do have some algorithm to determine a site's quality, which is an impression they'd no doubt like to foster, but it's not true. They don't. All they can measure is popularity.
[edited by: SteveWh at 1:42 am (utc) on Oct. 11, 2006]
Personally, I don't see that happening with any of the stuff that I am looking at.
However, I have made the mistake of ignoring something that only a very few people report (and which I don't see), before, only to see it become widespread only a few weeks or months later. So, I will be looking for that effect...
>> - For a brief while last week, site: only returned three results from a host. Someone mentioned it to me by email, but the first web report I saw was by DaveN on Friday (there’s your link, Dave). Fixed/working by the end that day, I think. It was related to a binary/executable that was going out, but a different binary than the one mentioned above. <<
[mattcutts.com...]
>> PageRank is the primary factor determining whether a url is in the main web index vs. the supplemental results, so I’d concentrate on good backlinks more than worrying about varying page layouts, etc. <<
[mattcutts.com...]
Note: The "primary" factor.
Jeez. No mention of Duplicate Content, and Redirects and 404 URLs at all.
Ah, but maybe he only means for "live" URLs, or maybe redirects and 404s no longer have any PageRank associated with them.
Whatever, it agrees with what you're saying: and I guess that fuels another link frenzy to start all over again.
MC's comment above seems to me very consistent with other comments G has dropped over time, including comments about storage and what they wish to focus on, and with what we've actually been seeing. Plus my own hunch that they would not want to keep showing forever the evidence of what they've trashed. They keep obscuring windows into their machinery. One particularly interesting thing in the past few weeks was that lots of pages were newly supp, off and on, for a while.
I was gonna call this a prediction ... but it's getting closer to just a run of the mill opinion in light of MC's comment above:
Supps have just become less "bad" than they are generally perceived to be, and more like how they've been referred to by G off and on since their inception. Essentially it looks like they are becoming a back up set of pages to be shown when more prominent pages won't do for a given query. Now, that's sorta been true all along, but as we know, there were lots of other situations that could get pages into the Supps ... most of them bad. Various forms of dup stuff, crappy feed pages, etc.
But what if in the future, Supps are just a repository for weaker pages, and not really page hell? So Supps in future might be more like, for example: Pages with very low PR, pages with no changes to the page in X period of time, weak pages with dup titles/meta but that represent some possibility of providing useful content, etc.
Where will the truly useless/garbage pages and URL's go, e.g., exact dup content URL's? A deeper form of dungeon: The place pages really don't want to be.
The coming hierarchy: Normal, Supplemental, Oblivion. Or, maybe not. Lot of hunches combined with observation and listening to G over the past several months. ;-)
If there are more supplementals out there, the results for searches should show a higher number, not a smaller one.
A smaller number represents the opposite: fewer pages either indexed or supplemental. A smaller number would reflect the greatly preferable situation of URLs falling out of the index, instead of sticking as supplementals.
To me, it just makes sense that G would go in the direction of having a starting line-up of pages, and a bench. Why would they want to keep giving clues into what pages got dumped to the darkest depths? Throw them away from public view. Down to AA ball or worse.
If it goes as I'm (admittedly) guessing, it would also potentially have the effect of motivating sites to behave in ways that might heighten the value of the Supp pages. Downside: Linkfest 2.0. Upside: Webmasters revisit how to make those under-loved pages more valuable. ;-)
OK, back to things I think I probably know, instead of things I think might be so at some point, hehe.
What Inktomi spidered they called their "web map," and what they displayed they called their "best of the web."
If Google had done it that way from the start, there might be less screaming now (well no, maybe not ;) ).
Supps... Essentially it looks like they are becoming a back up set of pages to be shown when more prominent pages won't do for a given query.
Some random thoughts about this....
I think this is very much like the way Google treats dupes... and that's an extension of the way it treats rankings in general. If a page isn't differentiated in ways the algo likes from the rest of the pack for a given query, then it doesn't rank. Similarly, the rankings of dupes on Google can be very query dependent, and in ways that are surprising. Depends on how the query relates to the inbound links, the page content, to the PageRank, etc etc.
Duped pages may, eg, drop out on a keyword search for a long common text string that they all share... even though it's not a very competitive search... but then return for searches that are more competitive but which are influenced by inbound anchor text.
In this regard, it might help to think of the "site:" operator as a query to find differentiated pages in a site. When those pages are not differentiated from each other in important ways on the site level (eg, they have identical titles, or not enough distinct content), the pages go supplemental on a "site:" search. But if specific pages are recognized by enough outside links, then they stay in the visible index for the "site:" search.
I don't think there's one simple answer to why a page disappears. It's sort of the other end of asking why it ranks. But it's query dependent, I feel, and there are 100 or 200 fabled factors.
How Inktomi works...
...although some consider Inktomi doesn't "work" at all
[webmasterworld.com...]
There is another level above that. It's the "click for omitted results" feature. Those "omitted results" are pages in the normal index that don't cut it for the search term, as well as all the supplementals.
Most supplementals are for duplicate content, and for URLs that now redirect or are 404. I'll be looking more closely for those that are there only for low PR reasons.
There is another level above that. It's the "click for omitted results" feature. Those "omitted results" are pages in the normal index that don't cut it for the search term, as well as all the supplementals.
Playing with these is in fact the "inspiration" for my theorizing about how Google treats marginal results.
I think your inktomi analogy is close to the mark. However, I think Google has put one twist on it.
texasville - How are you determining that your inner pages are "supplemental?" Do you mean by this that they don't show up with the "site:" operator? If so, do they display if you repeat the search with the omitted results included, or are they never visible to you in the index?
I think it's perfectly consistent with the Inktomi model that Google may rank the home page and not display the internal pages on a "site:" search. Also, if Google displays your home page, there's no reason it couldn't be ranking, regardless of how the inner pages perform.
As for the inner pages dropping out, you may simply have too many pages to support your small PR, coupled perhaps with a nav structure that doesn't distribute your PR wisely.
But I agree that there is a twist, and maybe lots of them. It appears that the Google model involves a combination of factors that might make "supplemental" pages or dupes show up on one query and not on another. I don't think that ever happened with Inktomi.
A URL might be returned as a normal result for one query and as a Supplemental Result for some other query; those I call "historical supplemental results". That effect is NOT a problem.
When you do a search, typically site:domain.com, some URLs will be hidden behind the "click for omitted results" link. That is because their title and snippet were deemed too similar to those already listed OR they are "fully" supplemental results, supplemental whatever the query is (i.e. they are for URLs that are now redirects, or for URLs that are 404, or deemed totally unimportant). When you click the link, they all magically appear again.
It helps to think in terms of two main things here:
- what is stored in the main index, and what is stored in the supplemental index,
and,
- what do they actually show for a search query, which index does each item come from, and what is still hidden behind the "click for omitted results" link, and why?
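For anyone who finds it easier to reason about this in code, here is a rough toy model of the two questions above. It is only a sketch of the mental model described in this thread, not anything Google has published: the `Url` class, the `index` labels, the `fully_supplemental` flag and the "same title means too similar" rule are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Url:
    address: str
    index: str                        # which index it is stored in: "main" or "supplemental" (hypothetical labels)
    title: str
    fully_supplemental: bool = False  # assumed flag: URL now redirects, returns 404, or is deemed unimportant


def site_search(urls):
    """Split a site: listing into (shown, omitted) under the toy model.

    A URL is omitted if it is "fully" supplemental, or if its title matches
    one already shown (a crude stand-in for "title and snippet too similar").
    """
    shown, omitted = [], []
    seen_titles = set()
    for url in urls:
        too_similar = url.title in seen_titles
        if url.fully_supplemental or too_similar:
            omitted.append(url)
        else:
            shown.append(url)
            seen_titles.add(url.title)
    return shown, omitted


# Made-up example site
urls = [
    Url("example.com/", "main", "Home"),
    Url("example.com/a", "main", "Widgets"),
    Url("example.com/a?var1=foo", "supplemental", "Widgets"),
    Url("example.com/old", "supplemental", "Old page", fully_supplemental=True),
]
shown, omitted = site_search(urls)
shown_addresses = [u.address for u in shown]
omitted_addresses = [u.address for u in omitted]
```

In this model, "click for omitted results" is simply `shown + omitted`. The point is that what is stored where (`index`) and what gets displayed for a given query are tracked separately, which is the distinction the list above is drawing.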