Forum Moderators: Robert Charlton & goodroi
Something is now becoming clear, and I think it's time we put away the name "950 Penalty". The first people to notice this were the heaviest hit, and 950 was a descriptive name for those instances.
But thanks to the community here, the many examples shared in our monthly "Google SERP Changes" threads and in the "950 Penalty" threads themselves, we can now see a clearer pattern. The demotion can be by almost any amount, small or large -- or it might even mean removal from the SERP altogether.
It's not exactly an "OOP" and it's not the "End of Results" penalty. From the examples I've seen, it's definitely not an "MSSA Penalty" -- as humorous as that idea is. (Please use Google to find that acronym's definition.)
It's also not just a Local Rank phenomenon, although there are definitely some similarities. What it seems to be is some kind of "Phrase-Based Reranking" - possibly related (we're still probing here) to the Spam Detection Patent [webmasterworld.com] invented by Googler Anna Lynn Patterson.
So let's continue scrutinizing this new critter - we may not yet have it nailed, but I'm pretty sure we're closer. The discussion continues:
[edited by: tedster at 9:18 pm (utc) on Feb. 27, 2008]
Page-based penalty, not phrase-based. It definitely seems like Google is assessing the penalty after the URLs are retrieved, like the spam patent says.
I have done nothing, and yet on IP 72.3.232.139 one of the phrases has bounced up to 45th. The other phrase that we rank for still remains 6th in Google, untouched by this whatsoever. Both rankings are for the same page.
[edited by: Marcia at 9:19 am (utc) on Feb. 9, 2007]
Is it possible this is a time-orientated 'zap' of some kind, whereby pages are instantly demoted to the back end and 'asked' to start the climb again based on x factors? Or is this some kind of reshuffling whereby we will see these sites return again?
I know this is not phrase-based but this keeps popping to the forefront of my mind too as I read the various experiences and accounts.
Matt has said on various occasions - and repeated just the other day in his DLTV interview - words to the effect that G is constantly updating every day. The way he detailed it the other day seemed to suggest a "ratcheting effect", which he further demonstrated through the use of hand gestures.
... which makes it very difficult for me to describe!
I don't think Cain is the only person to have said he's done nothing and is seeing a comeback. Others have said they've done something and seen a comeback.
Anyone who has made changes very recently and then seen SERP changes can't be sure that particular change was the reason. IMO, the changes probably won't have gone through the mill yet; the movement was more likely influenced by a freshness score, or was due to happen anyway, and the page might experience a further change down the line to reflect the last change/tweak.
I don't want to potentially further pollute this thread but I think the stories from those who have made no changes, and seen pages rise back up again (even if left stranded on page 2 or wherever), are probably the more interesting story at this stage of the cycle. It could, after all, be telling us that there is a new, more granular analysis of the entire index being performed, bit by bit, and what we may be seeing is a ripple effect which will calm down.
That might be then the time to look at phrase-ranking etc...
Identical HTML? Maybe that falls somewhere under the "boilerplate" that Adam mentioned not too long ago?
"If there's a mistake or error then they either all have it, or none have it."
Ahh, the key is to find the errors, fix the errors, then sit back and wait to see it get reindexed. The good thing is that if there are HTML errors on a template, it's usually very easy to fix every page.
If you have errors, you have to clean those up first before speculating on reasons why things are not ranking well. Fixing the errors is the easiest and best thing any site can do! Both HTML and SPELLING.
Tedster made a great comment the other day about the keyword "amature" and all of its variations and how the safe search filter is looking for those types of words.
"Red widgets great jones bills companies sing loudly really great"
So Google is probably looking for proper grammar as well as spelling in the filter. We recently had some pages drop -100 and the first thing we noted on the page was fragmented sentences and misspellings. We fixed it and it bounced right back to the top 10.
JerryRB, couldn't it be phrase-based penalties for specific pages, filtered after the URLs are retrieved at query time by a few-millisecond lookup against a pre-processed list of pages predetermined to be "bad" for the phrase, with the quick lookup accomplished by identifying the pages by means of the DocID?
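To make the mechanism in that question concrete, here is a minimal sketch of a post-retrieval lookup. It is only an illustration of the idea as worded above, not anything taken from the patent or known about Google's systems: the FLAGGED table, the DocIDs in it and the rerank function are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical pre-processed table: phrase -> DocIDs judged "bad" for that phrase.
# In the scenario described above this would be built offline, not at query time.
FLAGGED: Dict[str, Set[int]] = {
    "blue widgets": {102, 307},
}


def rerank(query: str, ranked_doc_ids: List[int],
           flagged: Dict[str, Set[int]] = FLAGGED) -> List[int]:
    """Move DocIDs flagged for this exact query to the end of the results.

    The normal ranking is assumed to have already happened; this step only
    reorders, keeping the relative order inside each group.
    """
    bad = flagged.get(query.lower().strip(), set())
    kept = [doc_id for doc_id in ranked_doc_ids if doc_id not in bad]
    demoted = [doc_id for doc_id in ranked_doc_ids if doc_id in bad]
    return kept + demoted
```

With this toy table, rerank("blue widgets", [102, 5, 307, 9]) pushes 102 and 307 to the back, while a query that is not in the table leaves the order untouched. That would at least be consistent with a page tanking for one phrase and holding its spot for a variation.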
I doubt it and let me explain why. I have unique content on each page, usually around 500 or so words. If I copy an entire sentence from the content and search for it, the page is end of results. If the page is about red widgets, even if the sentence doesn't mention red widgets, it is still end of results.
If I put the same sentence in quotes the page returns #1 of 1, simply because that sentence does not exist anywhere else on the web. When this happens there is not any preview of the searched content on the SERP (for any unpenalized page a search for a sentence would bring up the page as a result with the sentence in bold as the description).
This would mean discussing possibly related issues – possible phrase-based rerankings and/or the patent. Since this is related, I’m on topic. So please quit trashing up this thread trying to get people to stay on topic with what your opinion of the topic is.
Now, I did state my thoughts, I don’t think people are being affected by it – not to the level that this post is taking it. I think the “ranking” issues we are discussing in this topic have to do with other possible issues and not with keyword phrases or waiting for your site to re-index.
The patent says the amount of data they have to gather and use this on is large. Google is known for rolling things out slowly, and it’s safe to assume based on data size (which they comment on) that they wouldn’t target every set of keywords or spam phrases – even though the patent uses the page / links, etc. to build its possible spam phrases from – not all cross data is kept – so it’s pretty safe to assume that they will start in one direction, on a set of phrases, and let the engine build.
So staying on topic – Are we seeing a new Google patent take effect or are the recent mass ranking issues something else?
but first some quick background - i consider my sites as historical "trusted sites" - we have 1,000's of natural inbounds, been up for 8 years, have never been adversely affected by an update (ie Florida), have consistently ranked well on competitive kws
1. Our upper level pages (which have significant natural inbound links) have been largely unaffected.
2. Lower level pages have been variously affected - more than half are -150, -250, bottom, or nowhere on their (typically) 2 word phrases
3. If, however, you add another word to the 2 word phrase (fuzzy blue widgets instead of blue widgets) the page pops to #1 - even though it is "optimized" (in the traditional 26 steps fashion) for blue widgets
this tells me that, from what i see at least, the penalty is not directory or url based - for me, it's phrases (btw, all of our pages validate perfectly)
for me, the spam filter explanation holds - i think that pages that are highly optimized using the 26 steps approach are likely to be hit - in my case, the pages that aren't affected, tend to have fewer keywords on the page (this can happen randomly because our pages are database generated - some will have higher keyword density than others).
we are rapidly deconstructing 8 years worth of work and effectively making our pages worse - i believe that the 26 steps methodology actually leads to better html and more targeted (and relevant) pages - it is this understanding (and matt has always - in so many words - embraced the 26 steps approach) - in effect there was a partnership and trust between knowing developers and google - the result of that partnership was relevant serps and prosperity for all - now that is jeopardized - just take a good hard look at the junk in the serps
So many unknown factors could do this.
You state "highly optimized using the 26 steps approach" and "the pages that aren't affected, tend to have fewer keywords on the page"
This sounds more like errors. Errors don't have to be on site html problems, that’s just a mistake :) Errors happen in site flow, inbound links, errors in the listings, 404 pages, and even keyword rich pages produced across an entire website.
The Webmaster Guidelines are where I get the idea of errors from: having a flow of pages, all keyword rich, with no mistakes, perfectly optimized, is in no way built for human interaction in the eyes of Google, and that's an error.
However, it could be phrase keyword spamming or a general keyword spam filter. Again, so many unknown factors.
These pages used to rank fine because (i assume) of our trusted site status
The pages that are generally faring better are those where the category has maybe only 3 items and where the item names don't all contain the phrase blue widgets - these appear not to have tripped the filter - since we have 10s of thousands of pages, it is difficult to hand tune each page - hence the random comment in my earlier post
seems like the trust factor has been decreased in the overall algo in favor of the new spam detection filter component - a dangerous departure in my view....
"seems like the trust factor has been decreased in the overall algo in favor of the new spam detection filter component - a dangerous departure in my view.... "
What is the overall % that the phrase "blue widget" or even the word "widget" appears? If the % is much higher than for all other words, that could be a general keyword spam filter on that keyword or "phrase". But I still don't see the phrase patent being what is affecting your pages (based on the very limited info I have).
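For anyone who wants to put a number on that percentage, here is a rough sketch of one way to measure it. The tokenizing rule and the phrase_share function are my own assumptions for illustration; nobody outside Google knows how (or whether) such a figure is computed or what threshold, if any, would matter.

```python
import re


def phrase_share(text: str, phrase: str) -> float:
    """Return the fraction of a page's words taken up by occurrences of a phrase.

    Words are lowercased runs of letters, digits and apostrophes; this is a
    simplifying assumption, not a known search-engine tokenizer.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    target = re.findall(r"[a-z0-9']+", phrase.lower())
    if not words or not target:
        return 0.0
    n = len(target)
    # Count every position where the phrase's words appear consecutively.
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)
    return hits * n / len(words)
```

Run it on the visible text of a page that tanked and on one that held, e.g. phrase_share(page_text, "blue widgets"), and compare the two figures rather than reading anything into a single value.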
There is other evidence suggesting it is in play in the form of the solution to the 'GoogleBomb' affair... combined with this and other evidence, it does seem quite likely.
1. Affiliates
2. People chasing links -30 penalty
3. Sites that Google does not think are useful to visitors -950
We have seen sites hit with this thing that don't fall into # 1 or # 2. Sites having nothing to do with affiliates and chasing links.
As far as # 3, that very well could be, beauty is in the eye of the beholder (or algorithm) but that would mean the Google "algorithm" calculated pretty consistently for 4 years that certain sites were useful, then one day the "algorithm" calculated there were 949 sites that were more useful. That’s a pretty big swing, and means their definition of “useful” has changed. In a way it doesn’t matter if your site is useless, what matters is why the sudden change of heart.
The trick to seeing this thing is to forget about your own sites, and pre-conceived notions about what you think or don’t think it is. Go and search 20 or 30 reasonably competitive terms and look at positions around 930 to 1,000. Do that and you will see sites that any person would consider useful, and in fact out of place if lined up against the other 950 sites that are being ranked.
Even if your sites have not been affected by this, dismissing it is a lost opportunity to think about the types of changes Google makes and what they are up to right now. A routine purging of affiliate-based or useless sites? Boy, I don’t know.
But the thing is, imho, Google is always rocking the search boat around Dec, Jan, I think it's some kind of reshuffle. It affects the good content sites, not the ones that rely on links. After a while the affected sites return better than before.
Tedster seems to be the master of ceremonies in these matters, but I would say that an experienced and clever man like him (and most probably closer to the kitchen than I am) would spot these recurring, longer term considerations.
Instead of getting bogged down in details, why don't we ask ourselves the question why these proceedings have to last 3 months?
[edited by: Martin40 at 6:44 pm (utc) on Feb. 9, 2007]
Call me crazy, but if I was shopping for that product, I would find that site useful.
* I hope this does not violate the TOS, if it does apologies and delete away.
*Found it at # 798 on some dc's so I do believe you can be hit by this issue and not have to be at 950.
Now, I did state my thoughts, I don’t think people are being affected by it – not to the level that this post is taking it.
Then you have not done any research on it. Type in a keyword, then go to the back of the results and find sites that were normally listed on page one or two.
Randle - I do see pages that were at 950 now being reshuffled to various points throughout the SERPs. I do still see websites at the end of the shuffle as well, but there are not as many. It is as if the index was re-sorted, as Tedster said, based on some secondary refiling process.
I am seeing associated pages moving up as the root pages or pages attached by linking structure move up.
I should clear a few things up:
My site incurred the 950 penalty, currently is now 51st.
I do not affiliate or partner with anyone on any of my sites, hence we can theoretically throw that one out.
On this particular page one phrase has been holding the number 6 spot for months while the rest of the phrases tanked to 950. The phrase that stayed is a variation of the ones that tanked.
The abstract of the patent says, "Phrases are identified that predict the presence of other phrases in documents. Documents are indexed according to their included phrases. A spam document is identified based on the number of related phrases included in a document."
It doesn't say "a spam document is identified based on an unusually high number of related phrases included in the document."
Why can't it be exactly the opposite? I think it's equally likely that Google is punishing sites for not having related phrases on a page.
Let's assume that:
1. Google finds a page and determines correctly that it is about blue widgets
and
2. Because of its knowledge of semantic relationships, Google assumes that any page about blue widgets should also mention "whatzits", "thingamajigs" and "what-do-ya-call-'ems", all of which are closely related to blue widgets.
What kinds of pages probably won't have those related phrases?
Scraped pages, boilerplate pages and computer-generated pages. In short, any page that doesn't really offer good information about blue widgets, whether human or computer-generated.
To summarize, wouldn't it also make sense to read the abstract as saying, "a spam document is identified when there are too few related phrases included in a document"?
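The two readings of the abstract can be put side by side in a small sketch. Everything here is hypothetical: the RELATED list reuses the made-up phrases from the example above, and the too_few / too_many thresholds are placeholders I picked, since the abstract as quoted gives no numbers and does not say which direction counts as spam.

```python
# Hypothetical related phrases for "blue widgets", taken from the example above.
RELATED = ["whatzits", "thingamajigs", "what-do-ya-call-'ems"]


def related_phrase_count(page_text, related_phrases):
    """Count how many of the related phrases appear at least once in the page."""
    text = page_text.lower()
    return sum(1 for phrase in related_phrases if phrase.lower() in text)


def flag_page(page_text, related_phrases, too_few=1, too_many=20):
    """Apply both readings of the abstract; the thresholds are pure assumptions."""
    n = related_phrase_count(page_text, related_phrases)
    if n < too_few:
        return "too few related phrases"
    if n > too_many:
        return "too many related phrases"
    return "not flagged"
```

A bare "Blue widgets for sale" page trips the "too few" reading, while a page that also mentions whatzits and thingamajigs passes unless too_many is set very low. Which reading, if either, matches what Google does is exactly the open question.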
Scraped pages, boilerplate pages and computer-generated pages. In short, any page that doesn't really offer good information about blue widgets, whether human or computer-generated.
Scraper sites, if done right, generally contain an amazing amount of related phrases, and links to pages focused on those related phrases.
I tried to get the Z site to rank for anything involving either "life" or "insurance" and couldn't. Thoughts?
For just the term “insurance” which is on the site, and has a return of 426,000,000 it ranks # 870
For the term “financial services” which has a return of 412,000,000 it ranks # 24.
When you search “financial services” and find it, then click similar pages, all the major insurance companies come up (the Hartford, Progressive, Chubb, AIG, CAN, Travelers, MetLife, etc.). Conversely, if you search “insurance” and find the “Z” site and click similar pages you get many of the same sites returned. So Google must relate it to insurance in some way.
But to search “insurance” or “life insurance” Google returns it at the end of the line, way past what we all can categorize as some serious junk.
A few facts:
PR 6
Domain first registered in 1994
12,000 back links
3,700 pages
Cached on Feb. 8th
If you search for it even using a close proximity to its name you get a listing with all the nice sub links under it.
Wayback Machine shows it up since at least 1996
It does have some errors and validation problems, and is extremely weak for allinanchor:insurance. However, it is an interesting case. There very well could be an explanation (the lack of anchor text), but position # 870 for one of the very products it sells?
If Google thinks it’s worthy of position # 24 for “financial services” out of over 400,000,000 returns, and Google knows the site is similar to lots of other “trusted” old time insurance sites, why does it place it at # 870 for people searching for insurance?
Again, maybe a clear and easy explanation somewhere, (wouldn’t surprise me though if the owners of the site would be rather interested in that answer)
Maybe a lot of this does have to do with an ultimate re-ranking based upon the quality of the linkers being on topic and anchor text. Could it be that specific, to take a site this strong and, due to a weakness in rote anchor text, push it to the 800's?
We are talking about:
- Spam pages on .edu domains (most of which are 404 now)
- Domains with almost zero content on the ranked page and in one case, a domain that doesn't resolve
- Sites so far off-topic that the searcher would be hard-pressed to find anything of value
Back to the 950 vs. Reranking.
Is there not some validity to it being both? For the one site I have 950ed, I was top 20 for a wide range of 2 and 3-word phrases, all thematically related. Many of which I did *not* target in any way, including IBLs, page titles, content, etc.
In late Dec. I dropped to the end for all the major ones I tested. Around the same time Google de-listed nearly half of my pages and threw 95% of what was left in the supplemental hell. Also, Googlebot slowly started decreasing visits to my site.
Clearly, in my case, Google decided to penalize my site for just about every phrase in a topical theme. Sorry, if this doesn't add anything new to the thread but I can't see how this change isn't penalty-related - why else dump sites to the very back (in some cases)?
Edit: Should add, the 950 drop in my case came about 2 weeks *before* the dropped pages/supplemental shake-up.
if they wanted to rank for life, that word would be on visible text on the page and not just in the meta description. should they be google bombed into that business or car or health? they mention "global network" in the description, should search engines come to the conclusion they sell bandwidth or long distance insurance?
have a great weekend everybody!
[webmasterworld.com...]
No, dropping 300 spots has little in common with the end of the results penalty.
"Which is why tedster wanted to change the name from "The -950 Penalty" to something more appropriate"
Hijacking a thread doesn't make it so, or even sensible. This thread was about the 950 penalty before it was hijacked and turned into an "is it apples or pajamas" thread.
Comments are all over the place here because people are talking about completely different phenomena. Some URLs are pinned to the bottom of the results for any search. Some URLs rank for some terms but are penalized for others. They aren't the same thing. Likewise the content on a URL and links to it are irrelevant sometimes (in other words, redoing a page with completely different words and linking can't make that URL Lazarus and raise it from the dead), while obviously a phrase penalty would change if every single aspect of content and linking was 100% different.
There is a phrase based indexing post now:
[webmasterworld.com...]
so perhaps people wanting to talk about that can go there, while people trying to diagnose the 950 penalty can discuss that here.