Our analysis identifies several parameters from the drive's self-monitoring facility (SMART) that correlate highly with failures. Despite this high correlation, we conclude that models based on SMART parameters alone are unlikely to be useful for predicting individual drive failures. Surprisingly, we found that temperature and activity levels were much less correlated with drive failures than previously reported.
I confess that I was surprised. Though I knew that G had already stated that it uses "cheap" hardware, I had still assumed that that would mean SCSI drives. Not at all:
More than one hundred thousand disk drives were used for all the results presented here. The disks are a combination of serial and parallel ATA consumer-grade hard disk drives, ranging in speed from 5400 to 7200 rpm, and in size from 80 to 400 GB. All units in this study were put into production in or after 2001. The population contains several models from many of the largest disk drive manufacturers and from at least nine different models ... They were deployed in rack-mounted servers
Failure rates are known to be highly correlated with drive models, manufacturers and vintages. Our results do not contradict this fact ... in this paper, we do not show a breakdown of drives per manufacturer, model, or vintage due to the proprietary nature of these data.
> scsi
No. According to numerous and varied past reports, G uses the barest of barebones consumer-grade commodity motherboards and components. You can buy a better PC at Walmart than Google uses.
Failure rates are known to be highly correlated with drive models, manufacturers and vintages
Amen to that. I was in the industry off and on for years. The only predictor of hard drive failure I recall was multiple engineering changes, manifested as little wires tack-soldered to the circuit board. A hot drive in and of itself isn't so much a problem as long as the case and the room are well ventilated. A hot drive in a badly ventilated case will probably kill the CPU before the drive.
According to reports, G uses the barest of barebones consumer grade boxes
Same here. I use disks to store the data for the long haul, but for selects from the db the results are almost always in memory. The only time they are not is when the query has never been run before. I'm guessing Google does the same, because when you do a search that Google would not have cached in memory, the search can take 0.5 seconds or so, and if you immediately run it again it takes 0.06 seconds.
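The "slow the first time, fast on repeat" pattern is what an in-memory cache in front of a disk-bound lookup produces. A minimal sketch using Python's `functools.lru_cache`; `run_query` and the `disk_reads` counter are hypothetical stand-ins, not anything Google or a particular database actually exposes:

```python
from functools import lru_cache

# Counts how many times the (hypothetical) slow, disk-bound path actually runs.
disk_reads = 0

@lru_cache(maxsize=1024)
def run_query(query: str) -> str:
    """Hypothetical query: only a cache miss reaches the slow disk path."""
    global disk_reads
    disk_reads += 1  # stands in for seeking and reading from disk
    return f"results for {query!r}"

run_query("widget prices")  # first run: cache miss, goes to "disk"
run_query("widget prices")  # repeat: served from memory, no disk access
print(disk_reads)           # 1
```

Only the first call increments the counter; the repeat is answered from memory, which is why a second identical search can come back an order of magnitude faster.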
JAG
SCSI hasn't offered any significant performance advantages (over ATA) for a very long time - at least ten years. Manufacturers have traditionally sold their newest/largest drives as SCSI only (at a price premium) simply to milk the gullible.
Gomvents, unless they sponsor the event, I don't think either of the manufacturers you mention is ever likely to win an award for reliability.
Kaled.
SCSI hasn't offered any significant performance advantages (over ATA) for a very long time
The seek time and maximum data transfer rate of a U320 SCSI drive will outdo almost all ATA and SATA drives you can buy.
You've just associated a mechanical characteristic (seek time) with an electronic interface!
Consider this: there is no fundamental reason why SCSI interfaces should be more expensive than ATA; therefore, if SCSI were faster, motherboard manufacturers would have ditched ATA years ago. Also, most super-fast PCs submitted for magazine review/testing use ATA disks.
There has been much nonsense spoken about IDE/ATA/SCSI interfaces over the years. I remember a review of a SCSI CD-ROM drive that credited its fast access time to its SCSI interface (nonsense), and I remember many, many people who said that you shouldn't put a CD-ROM drive and a hard disk on the same cable. This was because the lambs just repeated what the sheep said, and the sheep remembered something about master disks slowing down if a slow slave drive was fitted; however, disk synchronization ceased in the 1980s!
However, if you still don't believe me, consider this: the maximum transfer (burst) speed is only achieved when retrieving data from the cache; otherwise the maximum speed is determined by mechanical components (rotational velocity, data density and number of heads/platters). Arguably, SCSI did have an advantage years ago in multitasking operating systems due to DMA (direct memory access, where the CPU is not required to copy the data across), but ATA has supported DMA for at least ten years.
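The burst-versus-sustained argument boils down to taking the minimum of two ceilings. A minimal sketch, assuming nominal interface maxima of 133 MB/s for ATA/133 and 320 MB/s for Ultra320 SCSI, and a hypothetical platter-limited media rate of 70 MB/s; it ignores command overhead and other real-world losses:

```python
# Nominal interface ceilings in MB/s.
ATA133 = 133.0
ULTRA320_SCSI = 320.0

def sustained_rate(media_mb_s: float, interface_mb_s: float) -> float:
    """Reads from the platters are capped by the slower of media and interface."""
    return min(media_mb_s, interface_mb_s)

def burst_rate(interface_mb_s: float) -> float:
    """Reads served from the drive's cache are limited only by the interface."""
    return interface_mb_s

media = 70.0  # hypothetical mechanical (platter-limited) rate for one drive
print(sustained_rate(media, ATA133), sustained_rate(media, ULTRA320_SCSI))  # 70.0 70.0
print(burst_rate(ATA133), burst_rate(ULTRA320_SCSI))                        # 133.0 320.0
```

As long as the media rate sits below both interface ceilings, the faster interface only shows up in cache bursts, not in sustained transfers.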
AlexK,
In your particular example, copying data from one disk to another, there might be a benefit, but only if the SCSI controller itself has a large memory buffer and the operation is optimised so that motherboard memory/CPU is hardly used. If you are using specialist software, this is a possibility, but I do not believe a bog-standard Windows installation would do this (Linux might, possibly).
Kaled.
What is the sustained transfer rate of your SCSI drive? SCSI may be faster at transferring data into and out of the drive's onboard cache, so when writing data blocks up to the size of that cache there should, theoretically, be a performance gain (provided you don't want to access the disk again until the data has been written to the disk), but in practice this will not be noticeable. If you want to test this in Windows, try disabling write-behind caching and run a few tests (not quite the same thing, but near enough).
Clearly, Google don't feel that SCSI represents value for money. As for performance figures, do you also believe the speeds claimed by printer manufacturers?
Comparing SCSI and ATA is a bit like comparing Intel and AMD CPUs: Intel chips rarely achieve the figures claimed of them, but there are still people out there who believe they are faster.
Question
Why do manufacturers release their newest and fastest drives with SCSI interfaces (I assume they still do but I haven't checked)?
1) Because people believe SCSI means faster and will pay more.
2) To perpetuate the myth that SCSI really is faster (in order to achieve 1)).
Question
On the same motherboard, have any scsi-philes actually conducted realistic benchmark tests with identical ATA and SCSI drives? My guess is that Google have.
Kaled.
Today I own a Raptor and I'm getting a second one too. But I'd bet if I threw in that old (8 years old now...) SCSI it would still beat the Raptor.
Anyway, I think it's unfair to say that Google uses cheap hardware. I'd rather we call it "inexpensive." For the cost of $800 on a SCSI drive, you could RAID a lot of inexpensive SATA drives. And at the scale Google operates at, there are probably a lot of ways to optimize the use of SATA drives.
Google may use "consumer grade" components, but I think describing them as "consumer grade boxes" is misleading. I doubt that they have thousands of desktop PCs sitting on benches, I rather suspect they have hundreds of cabinets with dozens of boards/drives in each (per data center).
And this was 1999. Chances are their philosophies have evolved some (they now have many datacenters of their own), but this is their roots.
(a tangent thought here is how much power consumption G was responsible for, and how good G was at moving from one host to another as hosts would go bankrupt ... open question as to whether G had anything to do with many hosts going bankrupt)
The SCSI drives spin at 15k compared to 7.2k for almost all IDE drives
If the answer to this question is no, then my point is proven (that manufacturers use SCSI interfaces on their newest/fastest/largest disks to achieve premium prices rather than because the ATA interface isn't quick enough).
Just to be clear, I don't doubt the experiences of people that say their SCSI drives are faster than their ATA drives, but that extra speed is more likely to be due to mechanical design than the SCSI interface.
Kaled.
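The 15k vs 7.2k rpm figure quoted above is a mechanical difference, and one way it shows up regardless of interface is rotational latency: on average the platter must turn half a revolution before the wanted sector reaches the head. A minimal sketch of that standard calculation, ignoring seek time; the 10,000 rpm entry is included only as an intermediate speed:

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency: half a revolution, in milliseconds."""
    seconds_per_rev = 60.0 / rpm
    return seconds_per_rev / 2 * 1000.0

for rpm in (7200, 10000, 15000):
    print(rpm, round(avg_rotational_latency_ms(rpm), 2))
# 7200 4.17
# 10000 3.0
# 15000 2.0
```

Doubling the spindle speed halves this component of access time, which matters most for random I/O and has nothing to do with whether the drive speaks SCSI or ATA.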
Thing is, my experience has confirmed the hype/reputation. Work on my (Linux, SCSI) server is blisteringly fast on disk-bound operations, whilst the same operations on my (Windows, ATA) desktop stumble along. And of course I'm not comparing like-for-like:
Colo Server:
It seems to me that for someone in my position, with a single machine in a remote Colo, I would go for SCSI because of its reputation for speed and reliability (which has, so far, been confirmed).
BTW: does anyone know whether SCSI drives report scan errors, sector reallocations, offline reallocations or sector probational counts and, even more importantly, how to pick up that info under Linux?
Comment: with hindsight, and in G's position, I would have gone for the cheapest discs also. They have simply taken the idea of RAID (remember: "Redundant Array of Inexpensive Discs") to its ultimate conclusion and run with it. Good on 'em.
Acme Drives Inc. produces two almost identical drives. One runs at 7,200 RPM and has 4 independent heads and one runs at 14,400 RPM and has 2 independent heads. Both achieve precisely the same sustained data transfer rates but the 7K2 drive uses less power and runs cooler. Both drives cost $100 but which drive will fly off the shelves and which will gather dust?
Kaled.
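The arithmetic behind the Acme thought experiment: the media transfer rate is revolutions per second times data per track times the number of heads reading at once. A minimal sketch, assuming a hypothetical 250 KB per track for both drives and taking the "independent heads" reading in parallel as given by the hypothetical (real drives usually transfer through one head at a time):

```python
def sustained_mb_s(rpm: float, track_kb: float, parallel_heads: int) -> float:
    """Media rate = revs per second x data per track x heads reading at once."""
    revs_per_second = rpm / 60.0
    return revs_per_second * track_kb * parallel_heads / 1000.0

TRACK_KB = 250.0  # hypothetical data per track, identical for both drives
slow = sustained_mb_s(7200, TRACK_KB, parallel_heads=4)
fast = sustained_mb_s(14400, TRACK_KB, parallel_heads=2)
print(slow, fast)  # 120.0 120.0
```

Halving the spindle speed while doubling the heads leaves the product unchanged, so both drives sustain the same rate. The sketch covers sequential throughput only and says nothing about rotational latency.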
SCSI hasn't offered any significant performance advantages (over ATA) for a very long time
I was then accused of being "nuts". Is anyone saying that SCSI hard disks can sustain transfer rates beyond the capacity of ATA? If not, then I think my original comment is valid.
Having said that, I could actually be wrong, I haven't bothered to check the specs of the fastest drives (or run tests) for several years. Nevertheless, no one has provided figures that contradict what I've said.
Kaled.
The standard may be as fast, but the drives aren't.
SCSI drives are faster, more reliable, etc. not because of the interface; you're right. But if you want the fastest, enterprise-class drives, S/ATA is simply not going to be an option for heavy random-read and random-write IO; SCSI drives blow them away.
Again, you don't pay the premium for the interface. You pay the premium because the enterprise-level drives just happen to only come in SCSI formats.
I've never seen figures that support your claim that SCSI drives are more reliable (nor have I ever read such a claim in any reputable publication).
I don't believe that many drives perform mostly random IO (unless horribly fragmented) but, again, there would be no advantage to a SCSI interface.
the enterprise level drives just happen to only come in SCSI formats.
I've explained why that is (several posts ago) but if you don't believe me, that's fine.
Kaled.