Forum Moderators: Robert Charlton & goodroi
I recall that last year we discussed whether suddenly adding too many pages might trigger a flag or some kind of "sandboxing". At the time we were only guessing.
However, our kind fellow member Matt Cutts recently posted a very interesting remark on his blog [mattcutts.com] which might confirm what we were guessing:
"We saw so many urls suddenly showing up on spaces.live.com that it triggered a flag in our system which requires more trust in individual urls in order for them to rank (this is despite the crawl guys trying to increase our hostload thresholds and taking similar measures to make the migration go smoothly for Spaces). We cleared that flag, and things look much better now."
So it seems that we should be very careful in future when adding too many pages at the same time, otherwise sandboxing of our new pages would be a high possibility!
Thoughts?
but still actually work
It's wise to keep an eye on what works today, but it's even wiser to try to understand what will still work next week, or next year. That's where I want to focus my efforts.
I interpret Matt's comments as a signpost indicating the direction they're trying to go, rather than a milestone for how far they've come so far. If you take his advice you'll be fairly close to the right path a year from now.
First off, it isn't my intention in this particular post to start a Google-bashing campaign. Rather, I wish to share a few thoughts with you and hopefully trigger some objective feedback.
It seems that the majority here agree that Google might sandbox sites that suddenly add a relatively large amount of files/content.
Reasons for such action could be fighting spam and/or resolving capacity problems on Google's boxes. And of course there could be other reasons too.
Regardless of the reasons, the result of such a Google policy could be to limit the legitimate content growth of legitimate websites, and accordingly to limit the growth and exchange of available information on the web. I.e., Google might be playing the role of information-growth inhibitor instead of "to organize the world's information and make it universally accessible and useful", unfortunately. And I really doubt that Google's founders Larry Page and Sergey Brin would accept or encourage such a policy once they are made aware of it.
This could be a sign of a very serious threat to the availability and flow of information on the web. I don't wish to accuse Google of anything at the moment, but I do wish that Google would pay much more serious attention to such a serious, or rather critical, matter.
Therefore, at this point in the discussion we need some feedback from Google employees on how to resolve the problem of legitimate sites being sandboxed once they suddenly add a relatively large number of legitimate files, which limits the availability of legitimate content on the web.
...the result of such a Google policy could be to limit the legitimate content growth of legitimate websites, and accordingly to limit the growth and exchange of available information on the web. I.e., Google might be playing the role of information-growth inhibitor instead of "to organize the world's information and make it universally accessible and useful"
- Just because a site is "legitimate" doesn't mean its content is intrinsically valuable or unique. For Google, usable, uncluttered search results are more important than an indexed site's legitimacy as a business.
- SERPs, like most other things in life, involve trade-offs. If Google's experience and statistical analysis suggested that the addition of 10,000 or 100,000 new pages overnight was a negative "signal of quality," then it wouldn't be unreasonable for Google to judge those pages by a higher standard than it might do under normal circumstances.
- I personally doubt that a site is going to get whacked only because it added a bunch of pages at once--or even that large numbers of new pages from every site will be sandboxed. And if either of those things is happening, it's probably only until Google's profiling of "whole buncha pages added" sites is refined.
SERPs, like most other things in life, involve trade-offs. If Google's experience and statistical analysis suggested that the addition of 10,000 or 100,000 new pages overnight was a negative "signal of quality," then it wouldn't be unreasonable for Google to judge those pages by a higher standard than it might do under normal circumstances.
You assume that the filters that trigger flags are intelligent ones, which I doubt is the case. IMO, we are dealing with very primitive filters which flag files based on proportional numbers .. not the quality of the files.
In his article, Matt confirmed that it was a numbers game, not a quality one.
We saw so many urls suddenly showing up on spaces.live.com that it triggered a flag in our system
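For illustration only, a purely proportional flag of the kind described above could be sketched like this in Python. Nothing here comes from Google: the function name `growth_flag`, its parameters, and the threshold of 10 are all hypothetical, picked only to show a filter that looks at counts and never at page quality.

```python
def growth_flag(existing_urls: int, new_urls: int, max_ratio: float = 10.0) -> bool:
    """Hypothetical ratio-based flag: True when the number of newly seen URLs
    is more than max_ratio times the number of URLs already known.

    The threshold is an arbitrary assumption for illustration; no real
    search engine's values or logic are known or implied.
    """
    # Treat a host with no known URLs as having one, to avoid dividing by zero.
    baseline = max(existing_urls, 1)
    return new_urls / baseline > max_ratio


# Example: a 2,000-page site suddenly gaining 100,000 pages
print(growth_flag(2_000, 100_000))
```

Note that a check like this says nothing about whether the new pages are any good, which is exactly the objection being raised.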
We had guessed that adding these pages might help with Y! and MSN but worried that Google would see it as spam. So far it appears that Google has not penalized us at all. We've even climbed a few positions on our main search phrase, though I obviously can't point only to the added pages as the reason. But it appears that adding 15k related pages to a site with under 2k pages does not trigger the alarm, at least in our case.
You assume that the filters that trigger flags are intelligent ones, which I doubt is the case. IMO, we are dealing with very primitive filters which flag files based on proportional numbers .. not the quality of the files.
Google has historical data on every page and associated site...if a site (legitimate/established in the SERPs) suddenly shows an unusual spike of new urls being either crawled or submitted (who submits pages these days?)...then this could certainly trigger a filter (suppression, penalty, exclusion) and possibly even a manual review...as to what is going on with this established site and why there are suddenly 100,000 (example) new pages...
(How are these new pages adding value to the respective sector?...and from a usability perspective ... for positive click thru indicators from Google's SERPs-to-page traffic tracking techniques...)
Regarding the quality of the files...this can be determined when the files are crawled and run through the ranking algo variables...do these pages indicate over optimization...or are they set up for the human visitor? (easy for Google to get this)...
- "Trusted Sites" are able to add too many files suddenly without triggering a flag.
- All other sites can't add the same number of files without triggering a flag and being sandboxed as a result. I.e., such sites have the inherent disadvantage of being equated with spam sites: they are judged guilty until they prove otherwise!
It's that way of thinking from our friends at the Googleplex which I question. As such, this thread isn't only about technicalities, filters and flags; it's also about moral issues which the friends at the plex need to address!
As such, this thread isn't only about technicalities, filters and flags; it's also about moral issues which the friends at the plex need to address!
What moral issues?
If some sites (or types of sites) are more trusted than others, what's immoral about that?
It's legitimate to question whether "whole buncha pages added" filters are implemented correctly, or whether they work as well as they should, but what's morally wrong about discriminating between pages that are statistically likely to be junk and pages that aren't? Webmasters need to get over the idea that they're entitled to listings no matter what. Search results are based on editorial judgments (whether made by humans or by algorithms that use criteria set by humans), and SEs like Google have the right to make editorial decisions just as we do.
I don't think anyone is saying it isn't. What they are saying is that the criteria can cause massive amounts of collateral damage.
It's pretty tough to follow all the rules when the rules are constantly changing. It's even tougher when you have to read between the cryptic lines of the unofficial blog. On top of that, it seems everyone thinks the rules are different based on trustrank, a concept we really know nothing about. Throw that all together and you have a lot of confused people who are being told, by some of the experts on this site, that Matt Cutts' videos aren't really accurate or truthful.
It's pretty tough to follow all the rules when the rules are constantly changing.
The "rules" are constantly changing because the demand in the marketplace at Google's level is constantly changing...
One rule that never changes...know everything you can about your target audience...and speak to them directly through your web design, information design, copy...etc..etc..stop chasing the algos...and you think smoking shortens your life?
It just so happens that Google is targeting everyone on the planet...so their need to constantly adjust their approach to satisfy this is important...plus their insatiable appetite to reap staggering profits every quarter..
:)
[edited by: Aforum at 8:49 pm (utc) on Sep. 7, 2006]
This is yet another example where there is nothing to be confused about, but webmaster FUD starts on a Chicken Little spiral.
Adding a few hundred thousand pages in a day will probably get them viewed as what they are: not very important.
Nor are a lot of folks who set up a site only to discover that they are being treated as spammers.
There are too many cooks in the process and the chicken has been over spiced and under cooked.
I know of many members here who could all of a sudden look like they were spammers.
In short, we have met the enemy, and they are we.
MSN's free host pages are "fly by night". The addition of hundreds of thousands of pages on MSN Spaces is simply the addition of free host pages, which should always be viewed very, very skeptically.
The real problem of course is Google's TrustSpam algo continues to love these free host pages. It seems suddenly they woke up, duh, and noticed their index is full of putrid garbage on free web hosts, that ranks due to hundreds of thousands of links from other free webpages/blogs.
They like free hosts. They like blogs. They are so far up their wrong end in this regard they will need a year or more to extract their heads. Finally noticing, duh again, that spammers are exploiting their incredibly stupid algo leads to a simplistic response wholly inadequate to deal with the problem they have created.
If someone is confident enough to contradict Matt Cutts on an indexing issue, they have labeled themselves.
I'll assume you mean me. And as oft uttered, The SERPS speak for themselves.
MC is not God, nor even all-knowing as to how the 1000s of algo factors interact with each other.
So I'll repeat again: anyone can be an "expert" on what really works if they study the various SERPS for various keywords thoroughly, whether it's for white, grey, or black hat techniques.
--------
Note- if you're worried about "where" you rank for any given page and trying to "avoid" sandboxes, filters, or whatever, by definition you are trying to "game" Google. I have no problem with that per se, but at least be honest about it.
True "legitimate" white hatters focus on building authority sites that can live off of word-of-mouth, bookmarks, natural memes, etc. The top rankings come naturally... whenever they come.
if one is trying to "rank legitimately" and needs to add 100k "legitimate" pages to their 2k page site....
So you have a relatively small 2K site...and you discover overnight that in order for your site to rank in your sector you need to add 100K pages immediately..(and forget "legitimate")...adding this many pages constitutes spam...plain and simple...it is not for your user base...but as you describe...for ranking purposes....you are spamming the index ... plain and simple...and this should trigger some sort of suppression filter (at the least)...
but as you describe... for ranking purposes....you are spamming the index ... plain and simple
Lol, you put quite a few words in my mouth.
1.) I never said it was spamming or not spamming.
2.) When did I say for "ranking purposes"?
3.) Who says that G is going to see it as spamming?
MC says it will be seen as spamming, but the results for certain sites currently ranking with this technique say otherwise.
My point is...make your own informed judgement call on what will happen. Don't take MC's proclamation as the final decision.
I recently added 50k pages to a 2k "authority" site as I wanted to add a backend "cafepress-like" store to it.
No ranking issues for the existing pages.
New pages are ranking probably as they should considering how new they are.
Will this work for your site?
How the heck do I know!?
Do your research and take an educated guess.
Do you know if your site is "trusted" by G?
Are your pages able to pass a visual inspection?
Or take MC's comments as gospel and throw the pages up on a different domain and be sandboxed anyways.
I can't make anybody do their own research for what will or will not work for their site.
Actually, no, I didn't at all. It's more of a general statement, as I've seen numerous people contradict what Matt says.
haha, no worries.
Just remember, no matter how nice (or knowledgeable) a guy he may be, his paycheck is signed by G. And their interests and goals don't always coincide with the people on this board.
He has every motive to "claim" they are cracking down on sites adding gobs of pages because currently that's what every good black hatter is getting away with.
Whether it's true or not may be a different story.
The whole point was "IF" a sudden influx can create a sandbox effect or trigger a flag. Most experienced people here think it does.
I just hate to see those who were told to submit sitemaps, and who then show a sudden influx of links from a site that had not previously used one, be penalized for following Google's own advice.
Let's take a look again at Matt's statement:
By the way, it looks like the primary issue with the Windows Live Writer blog was the large-scale migration from spaces.msn.com to spaces.live.com about a month ago. We saw so many urls suddenly showing up on spaces.live.com that it triggered a flag in our system which requires more trust in individual urls in order for them to rank (this is despite the crawl guys trying to increase our hostload thresholds and taking similar measures to make the migration go smoothly for Spaces). We cleared that flag, and things look much better now.
There is no doubt whatsoever that so many urls suddenly showing up on a site triggered a flag in Google's "system".
Then we have a site (Windows Live Writer) which was sandboxed because of that.
Then somebody writes about the problem on two popular blogs. The same matter is discussed by a person visiting the Googleplex.
Then the kind folks at the plex resolve the problem MANUALLY.
Now... allow me just to ask you a few simple questions:
Do you think the same problem would have been resolved in the same manner if the owner of the site were an ordinary mom-and-pop webmaster?
Do you think that if the same problem happened to your site, the folks at the plex would resolve it as they did in the case of Windows Live Writer?
So what do they do? Panic because they just LOVE this sort of utterly useless puke in their index. So what happens? spaces.msn.com spam had been put a great deal under control after being so dominant earlier this year, but now... Google races to highly rank a lot of spam on spaces.live.com
What kind of Kool-Aid did they serve down there at the plex the first week of July that got everybody to go along with the group lobotomy?
The answers to your questions are of course no. Mom-and-pop, small business, small site, whatever...does not have that kind of access to the plex to get a manual review over anything. Had the same thing happened, as I'm sure it has thousands of times, to a webmaster without the ability to go visit Google, he/she would be posting here, there, and everywhere. They'd be told to write good original content and get great links that are not bought or traded, check their titles and descriptions, correct the 301 redirect...blah blah.
Of course, if my site gets banned and I then fix it, it probably won't be back in the index tomorrow either, as happened in the BMW incident. Google doesn't accept phone calls or emails regarding this sort of request; they only have a reinclusion request, which gives no feedback on whether it was read, denied, ignored, or deleted.
Thanks for a truthful reply. Much appreciated.
I must say that I'm both surprised and disappointed at Google's discriminatory policy at the moment.
What Google considers spamming the index is allowed if you are a multi-million company, a popular figure, or the owner of a popular website. You can suddenly add all the files you wish, or even add all the gateway pages you can think of. In that case you aren't only allowed to spam; you will also proudly receive free promotion on a Google employee's blog.
You might even be invited to the Googleplex to discuss your issues.
While if you are just an ordinary mom or pop, nobody cares. No invitation to the Googleplex. No discussion of issues. Filing a reinclusion request is all you get.
Talk about Google needing to address a few moral issues!
[edited by: reseller at 3:18 pm (utc) on Sep. 8, 2006]