20 to 25% of the queries we see today, we have never seen before
That's pretty mind-blowing when you think about the challenge it poses for every search engine. And here we sit, all focused on those high-volume searches!
The article has some other interesting points, but this one just jumped out. Thanks to Barry Schwartz at SearchEngineLand (http://searchengineland.com/070622-085337.php) for tipping us off to the ReadWriteWeb article [readwriteweb.com].
And just imagine trying to coordinate all those PhDs and other engineers! That first page of ten search results looks like such a simple thing, but think of what must go on behind the scenes to create it. Sheesh!
This talk by Udi Manber reminds me of another paper from Googler Anna Lynn Patterson, Why Writing Your Own Search Engine is Hard [acmqueue.com].
20 to 25% of the queries we see today, we have never seen before
That just seems like an astonishingly high percentage.
I'm trying to wrap my head around what a number like that means when it comes to writing content. Of course, some of those new queries will be related to current events in the news. But that probably leaves more than a few that relate to existing content on the web.
How do you capture those?
I don't know that there's any specific approach to target those search terms, other than doing the same things you would do to attract long-tail terms and watching your logs for remarkably new hits.
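For the "watch your logs" part, here is a minimal sketch of flagging queries you have never seen before and measuring what share of a day's queries are new. It assumes you have already extracted the search queries from your logs (for example from referrer strings) into plain strings; the function names and the lowercase/whitespace normalization are illustrative choices, not any tool's API.

```python
from typing import Iterable, List


def normalize(query: str) -> str:
    """Lowercase and collapse whitespace so trivial variants count as one query."""
    return " ".join(query.lower().split())


def new_queries(history: Iterable[str], today: Iterable[str]) -> List[str]:
    """Return today's queries (normalized, deduplicated, first-seen order)
    that never appeared in the historical queries."""
    seen_before = {normalize(q) for q in history}
    result: List[str] = []
    emitted = set()
    for q in today:
        n = normalize(q)
        if n and n not in seen_before and n not in emitted:
            emitted.add(n)
            result.append(n)
    return result


def novelty_rate(history: Iterable[str], today: Iterable[str]) -> float:
    """Fraction of today's distinct (normalized) queries that are brand new."""
    today = list(today)
    distinct = {normalize(q) for q in today} - {""}
    if not distinct:
        return 0.0
    return len(new_queries(history, today)) / len(distinct)
```

With a history of `["blue widgets", "Red Widgets"]` and today's `["blue  widgets", "red widgets repair", "green widget"]`, the last two come back as new, a novelty rate of 2/3.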
Since SERPs are weighted by word order and number of words, adding a non-competitive word in front of, in the middle of, or after a competitive phrase gives you a control group that will help you pinpoint the problems.
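The control-group trick above can be made repeatable with a tiny helper that builds the three variants to search by hand and compare against the plain phrase. This is a sketch under assumptions: the modifier word (e.g. "zebra") is a hypothetical stand-in for whatever non-competitive word you pick, and "middle" is taken to mean the midpoint of the word list.

```python
from typing import List


def control_variants(phrase: str, modifier: str) -> List[str]:
    """Insert a non-competitive modifier before, in the middle of,
    and after a competitive phrase; returns the distinct variants."""
    words = phrase.split()
    mid = len(words) // 2
    variants = [
        " ".join([modifier] + words),                      # in front
        " ".join(words[:mid] + [modifier] + words[mid:]),  # in the middle
        " ".join(words + [modifier]),                      # at the back
    ]
    # A one-word phrase makes "middle" identical to "front", so drop repeats.
    unique: List[str] = []
    for v in variants:
        if v not in unique:
            unique.append(v)
    return unique
```

For example, `control_variants("cheap blue widgets", "zebra")` yields "zebra cheap blue widgets", "cheap zebra blue widgets" and "cheap blue widgets zebra".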
Out of 10 searches I make for SEO, 9 are like this.
...
On the other hand, when I just do a simple web search, I often get results so bad that I have to make the query so obscure, so long and so detailed that it doesn't send me to the first spammy forum. I mostly do technical searches. Three years ago, copy-pasting an error message I'd never seen before, or entering just the keywords or phrases, would get me relevant results and solutions; nowadays it's increasingly hard to dig into on-topic discussions and technical advice, because trusted sites come first (too generic), spammy ones come second (no info), and official ones come third (sometimes they help, sometimes they don't)...
So yes, I can imagine that at LEAST 20-25% of all searches are brand new.
That's the result of half a million "SEOs" working in an increasingly overpopulated and filtered market, and of Google, with every update, inching closer to becoming a generic directory instead of the expert search engine it used to be. I hope they steer away from that path as soon as they find the right balance to do so.
Computer science isn't really about programming; it's more a branch of mathematics in which one calculates the time and memory required to perform computations on very large data sets.
Software engineering, for the most part, is quite unrelated to that; it has been very rare in my twenty-year career that I've had to employ algorithm analysis in my work.
But Google, unlike the vast majority of software companies, has to deal with enormous data sets for just about everything it does.
Typical interview questions involve figuring out how to do things when the data won't all fit in memory, or how to distribute a problem across thousands of computers in a way that the load is shared equally by each.
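One textbook answer to the "data won't all fit in memory" question is an external merge sort: sort memory-sized chunks, spill each sorted run to disk, then stream a k-way merge of the runs. The sketch below is illustrative, not Google's method; `max_in_memory` is a hypothetical knob standing in for your real memory budget, and it assumes the input lines contain no embedded newlines.

```python
import heapq
import os
import tempfile
from typing import Iterable, Iterator, List


def _write_run(sorted_lines: List[str]) -> str:
    """Write one sorted chunk to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".run", text=True)
    with os.fdopen(fd, "w") as f:
        for line in sorted_lines:
            f.write(line + "\n")
    return path


def _read_run(path: str) -> Iterator[str]:
    """Stream a run back from disk one line at a time."""
    with open(path) as f:
        for line in f:
            yield line.rstrip("\n")


def external_sort(lines: Iterable[str], max_in_memory: int) -> Iterator[str]:
    """Sort more lines than fit in memory.

    Phase 1: hold at most max_in_memory lines, sort them, spill to disk.
    Phase 2: heapq.merge keeps only one pending line per run in memory."""
    if max_in_memory < 1:
        raise ValueError("max_in_memory must be at least 1")
    paths: List[str] = []
    chunk: List[str] = []
    for line in lines:
        chunk.append(line)
        if len(chunk) >= max_in_memory:
            paths.append(_write_run(sorted(chunk)))
            chunk = []
    if chunk:
        paths.append(_write_run(sorted(chunk)))
    readers = [_read_run(p) for p in paths]
    try:
        yield from heapq.merge(*readers)
    finally:
        for r in readers:
            r.close()  # closes the underlying file if the merge stopped early
        for p in paths:
            os.remove(p)
```

The design choice is that peak memory is bounded by one chunk in phase 1 and one line per run in phase 2, at the cost of writing everything to disk once and reading it back once.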
To understand Google's computational problem: to do its work, a search engine has to store a copy of essentially the entire World Wide Web on its own hard drives, and draw on all of it, through indexes built in advance, to answer each query it gets.
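What makes "all of the web, for every query" tractable is doing most of the work ahead of time: an inverted index maps each term to the documents that contain it, so a query only touches the posting lists for its own terms. Below is a minimal sketch of that general idea, assuming a toy in-memory collection and a naive tokenizer; it does plain boolean AND retrieval and leaves out ranking, sharding across machines, and everything else a real engine does.

```python
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Naive tokenizer: lowercase, split on whitespace, strip punctuation."""
    words = (w.strip(".,!?;:\"'()") for w in text.lower().split())
    return [w for w in words if w]


def build_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Done once, offline: map each term to the ids of documents containing it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index: Dict[str, Set[int]], query: str) -> List[int]:
    """Done per query: intersect the posting sets of the query terms
    (smallest first) without rescanning any document text."""
    terms = tokenize(query)
    if not terms:
        return []
    postings = sorted((index.get(t, set()) for t in terms), key=len)
    result = set(postings[0])
    for p in postings[1:]:
        result &= p
    return sorted(result)
```

With documents `{1: "Why writing your own search engine is hard", 2: "Search engine spam in forums", 3: "Writing long tail content"}`, a search for "search engine" touches two posting sets and returns `[1, 2]`.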