In February, the National Science Foundation announced the Cluster Exploratory, a program that funds research designed to run on a large-scale distributed computing platform developed by Google and IBM in conjunction with six pilot universities. The cluster will consist of 1,600 processors, several terabytes of memory, and hundreds of terabytes of storage, along with the software, including Google File System, IBM's Tivoli, and an open source version of Google's MapReduce. Early CluE projects will include simulations of the brain and the nervous system and other biological research that lies somewhere between wetware and software.
[wired.com...]
The entire article is an excellent read. The idea is that with enough data, we no longer need to create and test theories. At the O'Reilly Emerging Technology Conference this past March, Google's research director Peter Norvig observed: "All models are wrong, and increasingly you can succeed without them."
Likewise, the gene tracking would be impossible without established models of gene behaviour.
On the subject of using empirical data correlation in place of theoretical equations, that's not science, that's engineering and it's been going on for a very very long time.
An engineer, given the task of determining the thickness of a new material as part of a car, will test the material at various thicknesses, draw a curve, read off the minimum predicted thickness and then add a nice safety margin. That's all that this article is talking about - and it certainly is not science.
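To make the engineer's procedure concrete, here is a minimal sketch of it in Python. Everything in it is an assumption for illustration: the function name, the test measurements, the required load and the 1.5 safety factor are all made up, and it assumes failure load rises with thickness. It simply interpolates between neighbouring test points and pads the result.

```python
def minimum_thickness(test_points, required_load, safety_factor=1.5):
    """Estimate a design thickness from test data alone (no theory of the material).

    test_points: (thickness, failure_load) pairs measured on samples.
    required_load: the load the part must survive.
    safety_factor: multiplier applied to the interpolated thickness
        (1.5 is an arbitrary illustrative value).
    """
    points = sorted(test_points)
    # Walk along the measured curve and find the two test points
    # that bracket the required load.
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if f0 <= required_load <= f1:
            if f1 == f0:
                thickness = t0
            else:
                # Linear interpolation: "read off" the curve.
                thickness = t0 + (required_load - f0) * (t1 - t0) / (f1 - f0)
            # Add the safety margin.
            return thickness * safety_factor
    # Without a model there is no basis for extrapolating past the tests.
    raise ValueError("required load lies outside the tested range")


# Hypothetical measurements: (thickness in mm, failure load in newtons).
measurements = [(1.0, 200.0), (2.0, 400.0), (3.0, 600.0)]
design = minimum_thickness(measurements, required_load=500.0)
print(design)  # 3.75
```

Note that the sketch refuses to answer outside the tested range. That is the practical limit of a purely empirical curve: it says nothing about conditions nobody measured.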
Google's preference toward models for the sake of models, the inbound link formula being representative of it, is all too prevalent in the modern world, where we are faced with such an influx of data that nobody knows how to analyze sensibly. Google does a hash job of it, and nobody seems to mind - but turning the web into a popularity contest is no basis on which science can be prosecuted.
For instance, a genius biologist (a subscriber to the avant garde biology Wired tells us of?) published the cure to cancer on his personal website last year. In fact the page is still online. But since he tragically got hit by a bus an hour after uploading the paper, he never got around to telling anyone else, nor to all that annoying business of search engine optimization. Googlebot traversed the page, but lacking any number of external inbound links, ignored it...
In science, consensus is utterly meaningless. All that matters are the facts as can be proven. In terms of data presentation, peer review is the best we have - Google's PageRank should be entirely discarded.
In other words, this idea of patterns being the be-all and end-all of science, and models for the sake of models, is very dangerous.
The future ultimately has to be about much more intelligent computing, and pulling apart patterns - smashing through illusions - to see the underlying realities.
Physics Nobel Prize winner Richard Feynman warned of this in his Caltech commencement speech about Cargo Cult Science [lhup.edu]. What is cargo cult science? I'll let the late great Professor Feynman explain it himself.
In the South Seas there is a cargo cult of people. During the war they saw airplanes land with lots of good materials, and they want the same thing to happen now. So they've arranged to imitate things like runways, to put fires along the sides of the runways, to make a wooden hut for a man to sit in, with two wooden pieces on his head like headphones and bars of bamboo sticking out like antennas--he's the controller--and they wait for the airplanes to land. They're doing everything right. The form is perfect. It looks exactly the way it looked before. But it doesn't work. No airplanes land. So I call these things cargo cult science, because they follow all the apparent precepts and forms of scientific investigation, but they're missing something essential, because the planes don't land.
The analogy may not be perfect with respect to what's discussed in the article, but the idea is the same. We're bound to run into trouble if we don't have a good model that explains the data that we're looking at.
Also, don't forget that truly good models do more than explain the data that we have. They predict new things and explain other current data that no one previously understood. This happened often in the development of particle physics, where a new model based on symmetry predicted particles that had not yet been seen and the particles were later discovered. The added benefit was that we came to understand that symmetry is a fundamental basis for how things work in the universe. If it wasn't for hypotheses and models, if we had just looked at the data, we would have missed out on this.
That brings me to my next point, that there is a group of people who have had more data than they know what to do with for quite a while now - the particle physicists! Over six petabytes of particle physics data are stored at Fermilab [isgtw.org]. However, when physicists want to study the data they don't just grab a bunch of it and run a statistical analysis without any reason behind the analysis (as proposed in the article). In fact, physicists are very careful to develop an advanced analysis first (it is tested on randomly generated 'Monte Carlo' data). This is called a blind analysis (pdf) [slac.stanford.edu] and is the particle physics equivalent of a double-blind randomized clinical trial in medical research. Only once they know that the analysis is correctly targeted to test a certain hypothesis do they actually run it on real data. Once the analysis is run on real data it is final. There is no going back to tweak it to try to get a 'better' answer. If the answer is surprising or seems incorrect it should not be thrown out because it might just be a new discovery! This is the only way to truly do what one can call science.
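For anyone who wants to see the shape of that workflow, here is a toy sketch in Python. It is not how any real experiment's software works. The event generator, the s / sqrt(s + b) figure of merit, the class name and the "real" measurements are all assumptions made up for illustration. The point is only the order of operations: tune on simulated data, freeze the analysis, then look at the real data once.

```python
import random


def simulate_events(n, signal_fraction, rng):
    """Generate toy Monte Carlo events as (value, is_signal) pairs."""
    events = []
    for _ in range(n):
        is_signal = rng.random() < signal_fraction
        mean = 5.0 if is_signal else 2.0  # made-up signal and background shapes
        events.append((rng.gauss(mean, 1.0), is_signal))
    return events


def choose_cut(mc_events, candidates):
    """Pick the threshold that maximizes s / sqrt(s + b) on simulated events only."""
    best_cut, best_score = None, -1.0
    for cut in candidates:
        s = sum(1 for v, sig in mc_events if v > cut and sig)
        b = sum(1 for v, sig in mc_events if v > cut and not sig)
        score = s / (s + b) ** 0.5 if s + b else 0.0
        if score > best_score:
            best_cut, best_score = cut, score
    return best_cut


class BlindAnalysis:
    """Holds a frozen cut and allows exactly one look at the real data."""

    def __init__(self, cut):
        self._cut = cut
        self._unblinded = False

    def unblind(self, real_values):
        if self._unblinded:
            raise RuntimeError("already run on real data; no re-tuning allowed")
        self._unblinded = True
        # Count the real events that pass the frozen selection.
        return sum(1 for v in real_values if v > self._cut)


rng = random.Random(0)
mc = simulate_events(5000, 0.1, rng)
cut = choose_cut(mc, [x / 10 for x in range(80)])  # tuned on simulation only
analysis = BlindAnalysis(cut)

real_data = [1.0, 2.0, 6.0, 7.0, 1.5]  # hypothetical measurements
passed = analysis.unblind(real_data)
print(cut, passed)
```

A second call to `analysis.unblind(...)` raises an error, which is the code's way of saying there is no going back to tweak the analysis for a 'better' answer.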
Now that isn't to say that one cannot learn things by looking at a large corpus of data and trying to find patterns. I think statisticians have been doing this for quite a while. No doubt some interesting things can be learned, such as finding new species, which was mentioned in the article. But that's not the scientific method and won't replace science. For instance, I didn't see anything mentioned in the article about the scientist discovering a new model for the evolution of species and proving that the model is correct. In fact, though, one might even see some patterns and say "Hey, look at that" and form a new hypothesis based on it. Then to be a real scientific theory that model would have to be tested against other data to see if it still held true. To be a great theory it would need to predict new phenomena which have not yet been observed. A truly great theory will even teach us something fundamental about the nature of the universe, or of the body, or human nature, etc.
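Here is a small Python sketch of why a pattern has to be tested against other data. All of the data is synthetic random noise generated on the spot, and the sizes (500 variables, 40 rows) and the seed are arbitrary assumptions. It mines one half of the data for the strongest correlation with a target, then checks that same "discovery" on the half it never looked at.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


rng = random.Random(42)
n_rows, n_vars = 40, 500

# Pure noise: by construction no variable has any real link to the target.
target = [rng.gauss(0, 1) for _ in range(n_rows)]
variables = [[rng.gauss(0, 1) for _ in range(n_rows)] for _ in range(n_vars)]

half = n_rows // 2


def corr_on(var, lo, hi):
    """Correlation between one variable and the target on rows lo..hi."""
    return pearson(var[lo:hi], target[lo:hi])


# Step 1: "Hey, look at that" - mine the first half for the strongest pattern.
best = max(range(n_vars), key=lambda i: abs(corr_on(variables[i], 0, half)))
explore_r = corr_on(variables[best], 0, half)

# Step 2: test that one hypothesis on the untouched second half.
confirm_r = corr_on(variables[best], half, n_rows)

print("mined correlation:   ", round(explore_r, 2))
print("held-out correlation:", round(confirm_r, 2))
```

With this many variables and so few rows, the mined correlation tends to look impressive even though nothing real is there, while the held-out value is just one more random draw. Only the second number says anything about whether the pattern holds.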
In summary, the new tools for analyzing large data sets in a statistical way will no doubt be very useful. But they will not replace the scientific method.
As I see it, we are moving to penetrate into the "sub-causal" world of mental models, in parallel with the physical penetration of the sub-atomic "reality". If you can make valid predictions with an approach, then where's the problem in not having a model?
The applications of complexity theory to biology may well depend on this kind of approach. For example, contemporary pacemakers, for the heart and other organs including the brain, are still quite brutish. They deliver a mega whack to the organ, smacking it back in line to get the desired behavior. But complexity theory describes the phenomenon of a strange attractor. With the gentlest of touches at exactly the right moment, the heart or brain may be nudged into a completely different (and more healthful) state.
We know the two states exist and that transitions between them can be profound in their effect. It appears that we don't need to address the "cause" of a heart attack or a seizure in order to provide relief, if we have a large enough data model to work from.
There has been some remarkable work from Stephen Wolfram in addressing macro-scale phenomena with micro-scale data approaches. When I read his book "A New Kind Of Science", it opened me to a wildly different understanding of the phenomenal world.
Traditional science will long be with us, most definitely. Newton serves just fine in much of the practical world, and Einstein needs to get hauled out in other situations. Just so with cause-and-effect theory models compared to no-model data analysis.
This all gets fuzzy around the edges - but so does matter/energy/space/time when you observe it in depth. And none of it accounts for consciousness itself very well.
I doubt that Google's systems will be able to do much better at that, either. AI and machine learning are not the same as consciousness itself, the very ground where mind stuff appears. Science is worked through the manipulation of mind stuff, but consciousness itself is prior, or senior to it all. The tail cannot wag the dog! (Am I at all coherent at this point?)
Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity.
The models we were taught in school about "dominant" and "recessive" genes steering a strictly Mendelian process have turned out to be an even greater simplification of reality than Newton's laws...
Speaking of the human genome, a disclaimer here that, as someone who thinks science is helped by people sharing their research results, I have no love lost on Venter. I'll choose Francis Collins to be in charge of my genome, thank you:
Venter can tell you almost nothing about the species he found. He doesn't know what they look like, how they live, or much of anything else about their morphology. ... By analyzing [data] with Google-quality computing resources, though, Venter has advanced biology more than anyone else of his generation.
Of course, you can ask anyone who uses AdWords or who tries to produce good SEO about the definition of "Google quality." Is that really what you want your doctor to base his treatment decisions on?
___________
Once a week or so, I change the quote I have hanging on my office door. One of my favorites is from Isaac Asimov (who actually wrote a lot more scientific books than he did science fiction):
My personal belief is that, no matter how much data we manage to aggregate, it will still take a human mind to look at it and say, 'That's funny...' and then to go on and make sense of it.
patterns after all are but a subdivision of math.
True, but massive computing power is also setting the traditional world of math on its head.
For instance, the "four color theorem" has been proved only via automated computer work on a massive scale, and that proof has also been verified only through similar massive computation. It seems this proof cannot be made or checked "by hand", and many mathematicians are more than a bit disturbed by the implications of all that.