Forum Moderators: Robert Charlton & goodroi


Matt Cutts: Adding Too Many URLs Triggers A Flag!

Shouldn't We Be More Careful When Adding New Contents?

         

reseller

10:19 pm on Sep 3, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hi Folks

I recall discussing last year whether adding too many pages suddenly might trigger a flag or some kind of "Sandboxing". And we were guessing at that time.

However, our kind fellow member Matt Cutts has recently posted a very interesting remark on his blog [mattcutts.com] which might confirm what we were guessing:

"We saw so many urls suddenly showing up on spaces.live.com that it triggered a flag in our system which requires more trust in individual urls in order for them to rank (this is despite the crawl guys trying to increase our hostload thresholds and taking similar measures to make the migration go smoothly for Spaces). We cleared that flag, and things look much better now."

So it seems that we should be very careful in future when adding too many pages at the same time, otherwise sandboxing of our new pages would be a high possibility!

Thoughts?

theBear

3:13 pm on Sep 8, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes steveb, the flagging system worked, and yes reseller and others, the mom and pop folks probably don't have the same access to the folks at the 'plex (maybe they do; however, a form from BMW may mean something that a form from a mom and pop webmaster wouldn't). Now, I don't know if Spaces is spam central or not (if it is, then MSN should clean it up). If it was, then it should not have ranked to start with, and then they would not have even noticed the switch.

In short, the failure is in the ranking to start with, not the flagging of vast increases in pages (whatever "vast" means for an individual site), which wouldn't even need to happen if the ranking was correct.

You can argue any or all of the above, however I have yet to see a computerized system that is capable of understanding most wetware concepts. Even wetware makes major blunders in this area.

I can see the next WebmasterWorld thread topic now:

Help: Adding thousands of off page links tanked my site.
Link creation flag triggered, site deep sixed (glub).

(I just added a number of off page links on a couple of thousand pages of content, will I get zapped by the link creation flag? Is there also a link destruction flag? The pages that I'm linking to have existed from yesterday all the way back to June of 1999. Come join us on the next episode of As The Web Wobbles).

europeforvisitors

3:56 pm on Sep 8, 2006 (gmt 0)



The pages that I'm linking to have existed from yesterday all the way back to June of 1999.

Just don't link to any pre-1989 Web pages unless you're willing to risk being caught by the "Huh?" filter.

reseller

9:37 pm on Sep 10, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hi Folks,

Just wish to add, for future reference, a very relevant post [webmasterworld.com] to this thread, posted on Sep. 10, 2006 by our kind fellow member ryanfromaustin:

We have a subdirectory on our site that contains most of our static content. As of about a month ago, we had only about 100 pages in the directory and all were indexed in Google. We then added several thousand pages at once into this directory. Now, Google appears to not be indexing any files in that directory, though they are still indexing files in our root directory as well as other subdirectories.

reseller

6:34 am on Sep 11, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hi Folks

To conclude this thread with something constructive and positive, and to be of help to other kind fellow members, I ask you kindly to suggest solutions for:

How to avoid sandboxing your site/subdirectories while adding large number of files of contents.

Thanks!

whitenight

6:57 am on Sep 11, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Let's make this simple...

If you absolutely, positively NEED to add 50k+ pages to maintain the integrity of your site then do it.

I simply don't BUY the original premise that it's an automatic sandboxing, but whatever...

If you don't absolutely, positively NEED to add 50k+ pages for your users, then expect those pages to be sandboxed.
Or put it on a different domain.

Why are we making this so difficult?

decaff

9:55 am on Sep 11, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



How to avoid sandboxing your site/subdirectories while adding large number of files of contents.

You only need to answer this question one way:
What is your intent for adding large numbers of pages? (If your site is the type of site where the user base builds your content for you by interacting with each other, then this type of site does not factor into my comments. These types of sites can grow very rapidly if they take off in their target market.)

If your intent is to go after listings by flooding the index with a high volume of new content (proportionally out of sync with your normal page growth rate)....then expect to run into problems...

If you are building out for your user base...then chances are the build would be more incremental and not raise any eyebrows...

Aforum

2:01 pm on Sep 11, 2006 (gmt 0)

10+ Year Member



I simply don't BUY the original premise that it's an automatic sandboxing, but whatever...

Easy way to find out. Go try it.

I've seen it happen with less than 1000 pages simply because someone fixed their meta tags.

[edited by: Aforum at 2:14 pm (utc) on Sep. 11, 2006]

Aforum

2:08 pm on Sep 11, 2006 (gmt 0)

10+ Year Member



Why are we making this so difficult?

I don't think anyone is making it difficult, I think most people are just upset at the prospect of getting flagged and sandboxed for actually fixing a site.

[edited by: Aforum at 2:14 pm (utc) on Sep. 11, 2006]

Aforum

2:13 pm on Sep 11, 2006 (gmt 0)

10+ Year Member



What is your intent for adding large numbers of pages

Maybe fixing a site that wasn't indexed properly?

Mod_Rewrite?

Not using the sitemaps program?

Fixing meta and title tags that cause pages not to be indexed?

There are many scenarios in which you can fix a site but cause a large number of URLs to be submitted.

Not everyone is trying to flood the index.

[edited by: Aforum at 2:18 pm (utc) on Sep. 11, 2006]

financialhost

5:29 pm on Sep 11, 2006 (gmt 0)

10+ Year Member



Aforum, that is exactly what I was thinking. In actual fact, most large companies will flood the index at some point due to general maintenance, so the theory of better SERPs through this kind of filter is flawed.

You can't have your cake & eat it. Let's hope there is an air of sensibility about this prospect & that innocent sites don't get kicked by mistake. If that is the case, it will only take so long before the kick arrives back at Google itself.

It's never worked to remove 10 spammers but accidentally remove 2 good quality sites. You just end up with an index which is full of average sites, and it's at this point the users start to leave.

So - the point is, this affects Google's bottom line if they get it wrong.

tiori

8:49 pm on Sep 11, 2006 (gmt 0)

10+ Year Member



So - the point is, this affects Google's bottom line if they get it wrong.

Yes, it does, but Google has such a large "bottom line" that just about everyone else will go broke before them including the big boys and mom and pops!

carminejg3

8:54 pm on Sep 11, 2006 (gmt 0)

10+ Year Member



Wish we knew this 2 weeks ago.... We have a directory portion of our site where we hand edit links. It has about 50k pages. It seems like the filter was triggered when I updated all pages; all that was changed was a link attribute, from target="_blank" to target="_top".

The sad part is that all Google will catch with this filter is mostly innocent sites just making updates, since most "index spammers" will know what to watch out for and have the time and resources to make daily changes. They probably read MC's blog.

I simply don't have the time to keep up with the Google dance, read MC's blog, or worry about SEO tricks when I'm simply trying to update a fairly large site on a fairly low budget....

Does anyone know how long the sandbox period is, or is it until the spiders can check out the changes and see if they merit a hold?

caveman

9:48 pm on Sep 11, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



<rant>

Time for my twice annual rant on "the sandbox." ;-)

This thread is one good example of why I hate the term "sandbox" so much. (Those who have used the term, please take no offense; none intended.)

"The Sandbox" -- as was most commonly understood initially -- was really just a series of algo elements that in tandem, conspired to keep new sites (in most but not all cases) from ranking well immediately after launch. The presumed reason for "the sandbox" is simply that the spammers got too good at launching sites rapidly and cleaning up in the SERP's. So G implemented certain hurdles or barriers to entry, to make it harder for new sites to rank. The criteria have changed over time, but the bottom line now is that many if not most sites still need time before they can rank well for important terms, and that time varies from site to site, depending upon: The contents and structure of the site itself, the category the site is in (I think), and links into the site.

But there are lots of algo elements at play, and the age of a site (or age of backlinks, or both) is just one measure of a site's quality and "rankability." This is why it is good that people think in terms of "trust" and not just age. There is a lot more to trust than age.

If one thinks about why it takes newer sites longer to rank now, it's not a big leap to understand that there are lots of things a site might do to improve or fall off in the rankings: Positive things, or potentially negative things. We try to understand the importance of individual measures within the algo, but a site's ranking is the combined scoring of all the positive and negative traits it displays, more or less.

Making it even more complicated, there are certain hurdles that sometimes must be cleared; so, sometimes a small change in one area of a site's profile (especially if a site is not very strong) can have a large effect, while in other cases (say, a stronger site) a comparable change might have very little effect.

Most of the time, when a site or parts of a site tank, the issue is strictly algo based, and if fixed, resolves itself. Some issues seem to involve a time penalty, which makes them seem more "sandbox like." But then again, it is ALL RELATIVE, because if certain other aspects of a site profile change (for example, the appearance of a lot of new high quality inbound), then the time-based penalties seem to evaporate.

Basically, G is looking for "signals of quality" and "absence of negatives (and potential negatives)." Sometimes, a site introduces changes that represent a high degree of probability that the change is spam related. Perhaps a site only barely clears the hurdles needed to prove itself but then does something that calls it into question again. That may take time to overcome.

Perhaps a more worthy site (by virtue of age, links, whatever) does something that calls it into question again, but because of the intensity of proof the site has previously displayed, a "red flag" is triggered that requires attention of some sort, and the negative consequences, if the site is clear, might be very short lived.

Granted, the unfortunate part of all that has been happening with G for the past 18 months or so is that a seemingly growing percentage of quality sites, especially quality niche sites, have trouble from time to time with G's algos: the collateral damage that has been referenced in this thread. G has come to rely too heavily on age and links, IMHO.

But, that's life. They are a business; they can do what they want within legal guidelines. If they go too far they already know there might be a price to be paid. So far, I see very few signs that they are in much trouble. ;-)

With G's current algo, it pays to be a bit cautious and conservative, unless you're a spammer who really knows the in's and out's and can tolerate a high site mortality rate.

My unsolicited advice: Forget about "the sandbox." Think in terms of quality and trust; positives and negatives. Scan the landscape, learn your category, grow your site organically. And when faced with decisions that involve large scale changes that might signal spam -- or attempts to overly optimize the site -- exercise caution. Take it in steps. Test if you can. It makes all the difference, and site owners tend to learn things that way too. ;-)

</rant>
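The "take it in steps" advice above can be pictured as splitting one large upload into smaller releases. What follows is only a minimal sketch in Python: the function name `plan_batches`, the URL pattern, and the batch size of 500 are all hypothetical, and nothing in this thread says that any particular batch size avoids a flag.

```python
def plan_batches(urls, batch_size):
    """Split a list of new URLs into consecutive batches for a staged rollout."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Slice the list into chunks of at most batch_size items each.
    return [urls[i:i + batch_size] for i in range(0, len(urls), batch_size)]


# Hypothetical example: 5,000 new pages released 500 at a time.
new_urls = [f"/articles/page-{n}.html" for n in range(1, 5001)]
batches = plan_batches(new_urls, 500)

print(len(batches))      # 10
print(len(batches[0]))   # 500
print(batches[-1][-1])   # /articles/page-5000.html
```

Each batch could then be published and watched before the next one goes live, which is the "test if you can" part of the advice.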

reseller

3:54 pm on Sep 12, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Great post, caveman! Thanks a bunch.

That has inspired me to think it could be another "unintended" effect of the "algos-mix" that is sandboxing some sites that suddenly add a large number of files.

Which also reminds me of a great thread last year.

Matt Cutts on the Google Sandbox - Secrets of the sandbox revealed at Pubcon? [webmasterworld.com]

And as you might have noticed in that thread, Matt hasn't revealed all the secrets of the sandbox yet. Maybe he will do it after recovering from that nasty "BRON-FRIGGIN-CHITIS" [mattcutts.com]. Let's hope so ;-)

And of course we wish Matt a speedy recovery.

hvacdirect

4:12 pm on Sep 12, 2006 (gmt 0)

10+ Year Member



or attempts to overly optimize the site

I've seen this phrase used many times, what exactly does it mean?

Should I deliberately have some broken code, stuff some keywords? Or does it have to do with off-page factors like unnatural link building?

decaff

8:46 pm on Sep 12, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



some sites that add large number of files suddenly...

I think this will be sector and site/technology style driven...if in one sector a very clear set of development markers are present and then one site shows an unusual spike in activity (where 100s of pages were added over time and then suddenly 1000s or even 100s of 1000s of pages show up)...then something is wrong ... and this could trigger a look or filter...
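To make the "unusual spike" idea concrete, here is a toy sketch in Python that compares a new batch of pages against a site's own past growth. It is purely illustrative: the function `is_growth_spike`, the use of a median baseline, and the 10x threshold are assumptions made for this example, and nobody outside Google knows what the real filter measures.

```python
from statistics import median


def is_growth_spike(history, new_pages, ratio_threshold=10.0):
    """Return True if new_pages is far out of line with past additions.

    history: pages added in each earlier period (hypothetical data).
    ratio_threshold: assumed multiple of the median that counts as a spike.
    """
    if not history:
        return False  # no baseline to compare against
    baseline = median(history)
    if baseline <= 0:
        return new_pages > 0
    return new_pages / baseline >= ratio_threshold


# Hypothetical site that normally adds about 100 pages a month.
monthly_additions = [80, 120, 100, 90, 110]

print(is_growth_spike(monthly_additions, 150))    # False
print(is_growth_spike(monthly_additions, 5000))   # True
```

The point of the sketch is only that a spike is relative to a site's own history, so the same 5,000 pages could look normal on a site that adds thousands every month.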

Should I deliberately have some broken code, stuff some keywords? Or does it have to do with off-page factors like unnatural link building?

If you are in the hvac sector .. then you are in a very well mapped out sector from Google's perspective for many of the very SEO variables that people clamour over...

Breaking code in your site...stuffing keywords..etc...to disrupt some filters is not the way to go...you should continually refocus your efforts on how you serve your customers through their interaction with your site...this will bring in much better returns ...

Regarding "unnatural link building"...this will be well established in the hvac sector as a quickly identified anomaly if a site suddenly shows a spike in inbounds...

Of course...only the engines (Google) really know the exact factors that set up filter trips and all...but one can certainly hope along the way...

This 106 message thread spans 4 pages.