A European Union committee is questioning how Google Inc. manages and stores the personal data it collects from consumers who use the company's popular search engine. The inquiry, confirmed in a letter sent to Google, concerns the massive amount of information the powerful Mountain View business gathers about its users, and what could be done with it.
Google: EU Wants Answers on Google Data Storage [sfgate.com]
Google handled nearly 3.8 million queries in April in the United States, or about 55 percent of all search traffic, according to Nielsen/NetRatings.
That is about 1.5 queries per second. You don't need a dozen datacenters for that :) Journalists should first investigate and analyze, and only then write an article.
[nielsen-netratings.com...] (the report)
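For anyone who wants to check the arithmetic, here is a quick Python sketch. It takes the "3.8 million" figure exactly as the article printed it and spreads it over the 30 days of April; the function name is just something made up for illustration.

```python
# Figure as printed in the article ("nearly 3.8 million queries in April").
queries_in_april = 3_800_000
# April has 30 days.
seconds_in_april = 30 * 24 * 60 * 60


def queries_per_second(total_queries: int, seconds: int) -> float:
    """Average query rate over the given period."""
    return total_queries / seconds


rate = queries_per_second(queries_in_april, seconds_in_april)
print(f"{rate:.2f} queries per second")
```

A rate that low for the market leader suggests the printed figure is off by a few orders of magnitude.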
EU's concern stems from Google's decision in March to keep that data, but to "anonymize" it -- for instance, deleting eight digits from the computer's network address -- after 18 to 24 months.
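To make the "anonymize" step concrete, here is a minimal Python sketch. It assumes that "eight digits" means the last eight bits (the final octet) of an IPv4 address, which is only one reading of the article; the function name and the sample address are hypothetical, and this is not Google's actual implementation.

```python
import ipaddress


def anonymize_ipv4(address: str, bits_to_drop: int = 8) -> str:
    """Zero the lowest `bits_to_drop` bits of an IPv4 address.

    Assumption: "deleting eight digits" means clearing the last 8 bits.
    """
    ip = ipaddress.IPv4Address(address)
    # Build a 32-bit mask with the low bits cleared.
    mask = (0xFFFFFFFF << bits_to_drop) & 0xFFFFFFFF
    return str(ipaddress.IPv4Address(int(ip) & mask))


# 192.0.2.0/24 is a range reserved for documentation examples.
print(anonymize_ipv4("192.0.2.137"))  # 192.0.2.0
```

Under this reading, up to 256 addresses collapse into the same stored value, so a log entry can no longer be tied to a single address from the IP alone.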
The obsessive privacy concerns regarding Google are, IMO, ignorant and potentially damaging to everyone who operates a website.
There are a lot of very good reasons to store usage data. Some sites could not operate without that data (in unadulterated form). No site could operate as effectively without it.
Google is an easy target because they're big, but the truth is that they do more to protect privacy than most, if not all, other major sites.
I don't know of any other site that has a policy of anonymizing user data after X amount of time.
Everyone keeps web logs, or visitor stats, or usage data, whatever you want to call it. Some sites do more with it, some sites delete it more frequently than others, but the data is the same.
Should they not be allowed to do that?
To me that just doesn't make sense... And let's be clear: if a major website gets legislated or sued into changing the way it stores data, then that affects everyone, because we're all doing exactly the same thing. Even those of us who are unaware of it.
The difference is that G has that info categorized and can map your entire life
That's definitely true. But it's also a separate issue.
The issue here, unless I misread the article, is that Google has chosen to store usage data and then anonymize it in the future.
Again, all websites store usage data, except possibly one whose operator has specifically told the web server and firewall not to. That would be a mistake, as the server in question would then have little defense against DoS and brute-force attacks.
Beyond that, any website that needs to fight fraud or abuse absolutely has to store usage data (including IP addresses); it really has no choice.
I'm not saying that there isn't a real possibility that Google will at some point become irresponsible about the way they use data. At that point governments would need to step in, but the answer should not have anything to do with what they are allowed to store. There is already little enough data available to websites about their visitors.
Google knows all the websites you reached through Google Search. It knows what you did on a website if that website uses Google Analytics. Google knows who you are if you use one of its services, like Google Checkout. And Google even has access to your emails if you are a Gmail user.
And they want to store all that information for an undefined period of time.
In my opinion this is simply too much information for one company to handle without any supervision.
And when I think of it now that is a lot more information than I want any company to have about me.
The thing is, aside from search, all of this data is collected by services where users sign up (or download an application) and give Google the data of their own free will. Most of the services could not function without the data collection, and storage thereof.
No one is in any way required to use these services.
The same thing applies to other major web service providers such as Yahoo and MSN which both have a similar array of services/web properties. Both also have more traffic than Google, at least according to all the metrics I've seen.
Heck, Microsoft makes Google look like Uncle Joe's Sausages on a Stick.com by comparison. They've got the Windows automated updates service running on PCs beyond count sending them IP addresses AND a unique ID associated with the OS installation every single day. It would be a simple thing for them to cross reference this with data collected from their various web services. Talk about privacy issues.
Any large Internet company has enough data to make people uncomfortable.
The point, as I see it, is that they need this data to function and it's perfectly reasonable for them to collect and keep it. Users are always free to opt out.
I agree that limits as to what they can ultimately do with this data should be considered. It does not make sense, however, to talk about limiting their ability to collect and store the data.
You cannot have a catch-all rule that separates data storage rules from the power and reach of the data collection process.
The question is... how can you quantify the ability of each organization to collect data? Tough one. This may be a whole new science one day.
[edited by: SlyOldDog at 9:10 pm (utc) on May 29, 2007]
Basically we're talking about three kinds of data here:
1) User details from a signup or application form. So then would a large web property be compelled to use signup forms that ask for less data?
2) Access logs. Basically IP Address, Date/Time, User Agent and Page Visited. That's not much data. Which parts of it would a company with a large reach be disallowed from collecting/storing? Keep in mind that the most personal of these (IP address) is required for fraud and abuse control.
3) Historical usage data. This is probably the one that I can see it making the most sense to limit. The problem though is that a lot of businesses rely on that kind of data. Amazon.com, for instance, owes a lot of its success to innovative use of historical shopping data.
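To show how little is in the second category, here is a rough Python sketch that pulls those four fields out of a single access log line. It assumes the common Apache-style "combined" log layout; the sample line, the regular expression, and the `AccessRecord` name are all made up for illustration, and real server logs may be laid out differently.

```python
import re
from typing import NamedTuple, Optional


class AccessRecord(NamedTuple):
    ip: str
    timestamp: str
    page: str
    user_agent: str


# Assumed layout: ip ident user [time] "METHOD page PROTO" status size "referer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?:\S+) (?P<page>\S+)[^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<user_agent>[^"]*)"'
)


def parse_line(line: str) -> Optional[AccessRecord]:
    """Return the four fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    return AccessRecord(**match.groupdict())


# Hypothetical log line using a documentation-only IP address.
sample = ('192.0.2.137 - - [29/May/2007:21:10:00 +0000] '
          '"GET /search?q=privacy HTTP/1.1" 200 5120 '
          '"-" "Mozilla/5.0 (example)"')
print(parse_line(sample))
```

Note that for a search engine the "page visited" field can carry the query string itself, which is part of why the IP address attached to it is the sensitive piece.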
Search engines don't really need historical data to be associated with individual users in order to operate effectively (at least not long term) so it's fair to ask that they don't store the data in that way.
But then isn't anonymizing 100% of all data after X months reasonable?
No one is in any way required to use these services.
This is an argument that does not count. Just because you offer a service does not mean you can do what you want and tell people just to beat it if they do not like it.
Imagine phone companies started to tape and analyze your phone calls (bad enough that governments are doing this) to profile your behaviour and send your data to telemarketers. Hey, if you do not like it, don't use your phone anymore.
But that is not an option. Especially when services have become indispensable (like search or email have) - and companies have reached a certain market share (like Google has) it is a normal thing to regulate what they may and may not do.
Besides, many countries already have rules about what user data you may collect and how long you may store it. For example in Germany you are not even allowed to offer a newsletter service on your webpage and make information like name or address a requirement for subscription. You have to offer the possibility to subscribe anonymously.
Just because you offer a service does not mean you can do what you want and tell people just to beat it if they do not like it
I may have missed something but I'm pretty sure that none of the major web service providers are ignoring widespread user concerns and telling people to "beat it" :-)
In fact, ironically, the EU's issues are a reaction to Google taking a step to protect privacy completely on its own initiative in response to concerns.
Imagine phone companies...
I understand what you're getting at here, but it's not reasonable to compare web-based email and search to a public utility.
...a certain market share (like Google has)...
Google's large market share is pretty much restricted to search. Other companies dwarf them in other areas (with the possible exception of YouTube).
So going back to public utilities... Search is not a communication channel, there are no conversations being recorded. The data being collected and stored is part of the necessary operation of the service. Phone companies log every call you make.
For example in Germany...
Wow, I hadn't heard that. Entertaining, thanks.
Look up "EU Data Retention Directive" and "NSA data snooping" if you aren't familiar with the issues.