Friday, August 3, 2007

HARD DRIVE Guide

Hard Disk Drive (HDD) - Guide
Describing the Hard Disk
Hard disks are magnetic media: data is stored as tiny magnetized regions on rigid circular metal plates called platters. As the technology has matured, areal density (the amount of data stored per unit of platter surface) has increased, allowing for larger amounts of storage. There is an end to conventional storage in sight, though: the Super Paramagnetic Limit. Once this limit is passed, the magnetized regions begin to interact with each other in ways that corrupt the data on the disk.
Form Factors
A single platter is not enough for the storage needs of today, and it never really was. The super-paramagnetic limit restricts the amount of data each platter holds, so hard disk companies need to use common tricks to get around it: using more than one platter, or increasing the diameter of a platter. Because of these tricks, there are two distinctly different hard drive form factors:
3.5" - The same shape as a 3.5" floppy drive.
5.25" - The same shape as a CD-ROM Drive.
For Notebooks, the sizes need to be even smaller. The most common size is 2.5 inches, but some are 1 inch in size! IBM's new matchbox drive is literally the same size as a match box, with a single platter supporting in excess of 300 megabytes.
The problems with increasing platter size are threefold. Not only are there PC case size limitations, but larger platters need more powerful motors and are more susceptible to vibration damage. With the effects of gravity, uniform flatness is also a serious issue.
Capacity
The largest consumer level drive today has 19.2 Gigabytes of space. These totals can be deceptive though - the actual formatted and usable storage area is often less than what is advertised on the boxes of today's hard disks. It's not that the manufacturers are outright lying; instead, they are taking advantage of the fact that there's no standard set for how to describe a drive's storage capacity. Here's an explanation snipped from a hard drive review at ZDNet:
This results from a definitional difference among the terms kilobyte (K), megabyte (MB), and gigabyte (GB). In short, here we use the base-two definition favored by most of the computer industry and used within Windows itself, whereas hard drive vendors favor the base-10 definitions. With the base-two definition, a kilobyte equals 1,024 (2^10) bytes; a megabyte totals 1,048,576 (2^20) bytes, or 1,024 kilobytes; and a gigabyte equals 1,073,741,824 (2^30) bytes, or 1,024 megabytes. With the base-10 definition used by storage companies, a kilobyte equals 1,000 bytes, a megabyte equals 1,000,000 bytes, and a gigabyte equals 1,000,000,000 bytes.
Put another way, to a hard drive manufacturer, a drive that holds 6,400,000,000 bytes of data holds 6.4GB; to software that uses the base-two definition--including CHKDSK, and portions of Windows 95 and Windows 98--the same drive holds 6GB of data, or 6,104MB.
So, be prepared when you format that new 6.4GB drive and find only 6GB of usable storage space. Isn't marketing wonderful?
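The arithmetic behind the two definitions can be checked with a short sketch. The function name and dictionary keys below are made up for illustration; only the divisors (10^9, 2^30, 2^20) come from the explanation above.

```python
def advertised_vs_reported(num_bytes: int) -> dict:
    """Express one byte count in base-10 and base-2 units."""
    return {
        "decimal_gb": num_bytes / 10**9,  # what the box says
        "binary_gb": num_bytes / 2**30,   # what base-two software reports
        "binary_mb": num_bytes / 2**20,
    }


sizes = advertised_vs_reported(6_400_000_000)
print(f"{sizes['decimal_gb']:.1f} GB advertised")
print(f"{sizes['binary_gb']:.2f} GB reported (base two)")
print(f"{sizes['binary_mb']:,.0f} MB reported (base two)")
```

For the 6.4GB drive in the example, the base-two figures come out just under 6 GB, or about 6,104 MB.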
Rotations per Minute (RPM)
The platters in a hard disk are mounted on a central spindle driven by a motor. The motor spins the platters at a fixed rate, measured in rotations per minute (RPM). Higher speeds allow data to be read and written faster and reduce rotational latency (the wait for the right sector to pass under the head). However, higher speeds also mean more heat. The following list shows the typical RPM of today's hard disks:
3600 RPM (Pre-IDE)
5200 RPM (IDE)
5400 RPM (IDE/SCSI)
7200 RPM (IDE/SCSI)
10000 RPM (SCSI)
At higher speeds there is more stress and heat on the platters and electronics; 10000 RPM seems to be the current maximum. With the advent of Ultra-ATA 66, manufacturers are working hard to release 10000 RPM IDE drives. They promise even lower access times and higher transfer rates, but are still not nearly as effective as their SCSI counterparts in terms of CPU utilization and throughput.
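One way to see what spindle speed buys you is average rotational latency: on average the head waits half a revolution for the wanted sector to come around. The sketch below assumes only that rule of thumb; the function name is illustrative, and real access times also include head movement, which is not modeled here.

```python
def avg_rotational_latency_ms(rpm: int) -> float:
    """Average wait for a sector: half of one revolution, in milliseconds."""
    seconds_per_revolution = 60.0 / rpm
    return seconds_per_revolution / 2 * 1000


for rpm in (3600, 5400, 7200, 10000):
    print(f"{rpm:>5} RPM -> {avg_rotational_latency_ms(rpm):.2f} ms")
```

Going from 5400 to 7200 RPM trims the average wait from roughly 5.6 ms to roughly 4.2 ms.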
The Hard Disk Cache
The cache is a small amount of memory incorporated into the hard disk electronics to accelerate read/write times. When the computer requests data from the hard disk and that data is already in the cache, there is a performance boost directly related to the speed of the cache.
An analogy: Imagine you are assembling something and have a box of different size screws. You need eight identical screws for this step in the project. When you look into the box (hard disk) for the first screw, you happen to see five of the eight that you need, so you grab those five and put them onto the table (cache). Now when you need the next screw you won't have to dig into the box; instead you grab one from the table (cache). That is much faster than digging into the box each time.
The hard disk cache controller works in a similar manner, except that instead of seeing the data needed, the cache controller guesses: it reads a small amount of data just before and just after the data that was requested. When the program requests more data, the hard drive first looks into the cache to see if the data is there.
The Hard Disk Cache is also used as a queue - if there is more than one operation to carry out, the instructions can be left in the Cache.
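The read-ahead idea can be sketched in a few lines. This is a toy model rather than how any real drive firmware works: the class name, the `window` parameter and the list standing in for the platters are all assumptions, and the cache never evicts anything.

```python
class ReadAheadCache:
    """Toy read-ahead cache: a miss also fetches the neighbouring sectors."""

    def __init__(self, disk, window=2):
        self.disk = disk        # list standing in for the platters
        self.window = window    # sectors fetched before and after a miss
        self.cache = {}         # sector number -> data
        self.disk_reads = 0     # how often we had to go to the platters

    def read(self, sector):
        if sector in self.cache:
            return self.cache[sector]  # fast path: served from the cache
        # Miss: one trip to the platters, grabbing the neighbours as well.
        self.disk_reads += 1
        first = max(0, sector - self.window)
        last = min(len(self.disk), sector + self.window + 1)
        for s in range(first, last):
            self.cache[s] = self.disk[s]
        return self.cache[sector]


disk = [f"data{i}" for i in range(100)]
cache = ReadAheadCache(disk)
for sector in (10, 11, 12, 9, 8):
    cache.read(sector)
print(cache.disk_reads)  # only the first request touched the platters
```

Like the screws on the table, the four follow-up requests are served without going back to the box.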
Technology and Specifications
Technical Settings
Although you may never need to know the specific settings for your hard disk, your BIOS does. The settings are essentially map guides - detailing how much room (in bytes), how many tracks, sectors, heads and cylinders are on your hard disk.
Tracks: Hard Disk platters arrange data into concentric circles, rather than one large spiral, as some other mediums use. Each circle is called a Track.
Sectors: The smallest addressable unit on a Track. Sectors are normally 512 bytes in size, and there can be hundreds of sectors per track, depending on location.
Heads: The devices used to write and read data on each platter.
Cylinders: The number of tracks on a platter. This one is a bit hard to explain: Platters on a hard disk are stacked up, and so are the heads. Because of this, all of the heads move simultaneously, so they read separate tracks that sit at the same physical position, only on different platters. If you combine the concentric circles on each platter being accessed by the drive heads, you get a Cylinder.
The Cylinder number keeps track of how many there are from inside to out, although it means the same thing as the amount of tracks on one platter.
Interrupt 13h Problem
Early on in the evolution of PCs, the standard was for the BIOS to use Interrupt 13h for setting hard disk information. However, the IDE interface also needs to set the information, but does not use the same number of bits for each part! Because of this, each number is reduced to the lower of the two, as shown in the table below.
                        BIOS      HARD DISK   RESULT
Maximum Sectors/Track   63        255         63
Number of Heads         255       16          16
Number of Cylinders     1024      65536       1024
Maximum Capacity        8.4GB     136.9GB     528MB
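The capacity figures follow from multiplying cylinders, heads, sectors per track and the 512-byte sector size. The sketch below assumes 255 sectors per track on the IDE side, which is the value that reproduces the 136.9GB figure; the names are illustrative.

```python
SECTOR_BYTES = 512


def chs_capacity(cylinders: int, heads: int, sectors_per_track: int) -> int:
    """Bytes addressable with the given cylinder/head/sector limits."""
    return cylinders * heads * sectors_per_track * SECTOR_BYTES


bios = {"cylinders": 1024, "heads": 255, "sectors_per_track": 63}
ide = {"cylinders": 65536, "heads": 16, "sectors_per_track": 255}
# Each field is cut down to the smaller of the two limits.
combined = {field: min(bios[field], ide[field]) for field in bios}

for name, limits in (("BIOS", bios), ("IDE", ide), ("Combined", combined)):
    print(f"{name:<9}{chs_capacity(**limits) / 10**9:8.3f} GB")
```

The combined limits give 528,482,304 bytes, which is where the 528MB barrier comes from.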
Interrupt 13h Workaround
The workaround breaks through the 528 MB barrier through the use of a Logical Block Address (LBA). By modifying the BIOS to translate the information that is received into a 28-bit LBA, and instructing the BIOS to load the LBA driver from the hard disk, the hard disk is given enough room (bitspace) to report all of its information regardless of BIOS limits.
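For illustration, here is the usual formula for turning a cylinder/head/sector triple into a single block number, and back again. The function names are hypothetical and the 16-head, 63-sector geometry is just an example. Note that sectors are numbered from 1 while cylinders and heads are numbered from 0.

```python
def chs_to_lba(c: int, h: int, s: int, heads: int, sectors_per_track: int) -> int:
    """Flatten a (cylinder, head, sector) address into one block number."""
    return (c * heads + h) * sectors_per_track + (s - 1)


def lba_to_chs(lba: int, heads: int, sectors_per_track: int) -> tuple:
    """Inverse mapping, for a drive with the given geometry."""
    c, remainder = divmod(lba, heads * sectors_per_track)
    h, s_zero_based = divmod(remainder, sectors_per_track)
    return c, h, s_zero_based + 1


print(chs_to_lba(1, 0, 1, heads=16, sectors_per_track=63))
print(lba_to_chs(1008, heads=16, sectors_per_track=63))
print(2**28 * 512)  # bytes reachable with a 28-bit block number
```

With 28 bits of block number and 512-byte sectors, the addressable space is 137,438,953,472 bytes.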
Maximum Cylinder Problem
PCs ran into problems again with the Maximum Cylinder Limit. Although most systems by then employed translation to get past the first Int13h problem, the amount of bitspace allocated was not enough to get past 4096 cylinders, which was quickly being surpassed. This limited hard disk space to 1.97 Gigabytes (base two) or 2.1 Gigabytes (base ten). The problem was only solved through a new BIOS translation mode, or a new BIOS altogether.
The 8GB BIOS problem
The final and most pervasive limit to hard disks. The problem is no longer truncated numbers, but the actual total numbers the BIOS can recognize at all. The only way to get past this problem was through changes to the BIOS to enable Int13h extensions. It's problematic for some disk utilities, but with newer operating systems such as Windows 95 and Windows 98, the OS is already set up to recognize it.
Super Paramagnetic Limit
The density at which opposite magnetic charges begin to degrade each other's signatures, resulting in data loss. The limit is at roughly 20 Gigabits per square inch, which is 4 times greater than today's popular density of 5 Gigabits per square inch.
Read/Write Heads
Essentially electronically controlled magnets. The heads are responsible for converting electrical signals into magnetic data streams on the hard disk and vice versa. Despite their importance, the R/W heads are also the most volatile part of the hard disk assembly.
Because each head is literally a microscopic distance from touching the platters, there is a danger of collision between the two. This is called a Head Crash, and can have catastrophic effects on your disk, including data loss or worse - physical platter damage. Although today's disks use heads that are even closer to the platters, superior shock suppression and disk enclosure technologies keep problems to a minimum.
There have been several different technologies for Read/Write heads, and each of them has brought dramatic increases in storage size. The most recently used is IBM's Giant Magnetoresistive Head (GMR), which has made the latest 14-19 gigabyte drive capacities possible. Newer GMR models have the ability to handle 10 gigabits (not gigabytes) of data per square inch.
The latest and most powerful technology, however, comes from Seagate's subsidiary, Quinta. Using a new 'Optically Assisted Winchester Technology' (OAW), they have managed to squeeze an incredible 40 gigabits of data into a square inch, well beyond the super paramagnetic limit. This breakthrough is obtained through the combination of fiber-optics, MEMS mirrors and specific RE-TM platter media. Quinta estimates the ability to hold at least 100,000 sectors per track while lowering part costs and increasing interface speed.
File Allocation Table Systems - FAT
Every computer needs a system to keep track of files on the hard disk - otherwise there are just random sectors on the disk with no way to interpret them. The system used is called the File Allocation Table.
FAT16: The file system used for MS-DOS. 16-bit numbers are used to represent cluster numbers, which allows for partitions of up to 2 Gigabytes. It was efficient for its time, but cluster sizes are far too large for today's large hard disks. It is also restricted in cluster size.
FAT32 / VFAT: The preferred file system for Microsoft Windows 95 and Windows 98. It is an extension to FAT16, providing 32-bit numbers for clusters. Like FAT16, there are cluster size restrictions.
NTFS: The file system for Microsoft Windows NT. NTFS is considerably better than FAT32 and has no cluster size restrictions, although at a certain point the slack space consumes so much of the hard disk that alternate systems are needed. NTFS also provides permission controls and RAID support.
These are the mainstream PC file systems - other notable systems include HPFS - IBM's OS/2 File System, the Unix File System, and the 64-bit BeOS file system.
Slack Space - The Cluster Size Problem
With any file system, each file is allocated 'clusters' to be placed into. Since each cluster is a fixed size, not every one can be filled. If a system uses 16 KB clusters but the file is only 2 KB, the remaining space, 14 KB, is automatically wasted and unusable for other files. This can result in hundreds of megabytes in lost space.
The only solution is to partition your hard disks or use the FAT32 file system. By creating multiple partitions you lower the potential amount of lost space - you still lose some, but since the cluster size on each drive is lower you can greatly reduce the loss.
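To make the slack space arithmetic concrete, here is a small sketch. The function name and the sample file sizes are made up for illustration; the only rule assumed is that a file always occupies a whole number of clusters.

```python
KB = 1024


def slack_bytes(file_size: int, cluster_size: int) -> int:
    """Bytes wasted when a file is stored in whole clusters."""
    clusters_needed = -(-file_size // cluster_size)  # ceiling division
    return clusters_needed * cluster_size - file_size


print(slack_bytes(2 * KB, 16 * KB) // KB, "KB wasted")

# The same (hypothetical) set of files on large and small clusters:
files = [2 * KB, 17 * KB, 500, 40 * KB]
for cluster in (32 * KB, 16 * KB, 4 * KB):
    wasted = sum(slack_bytes(size, cluster) for size in files)
    print(f"{cluster // KB:>2} KB clusters: {wasted} bytes lost")
```

The first line reproduces the 14 KB loss from the example above, and the loop shows how the loss shrinks as the cluster size drops.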
FAT16 Cluster Size    Maximum Partition Size
2 KB                  128 MB
4 KB                  256 MB
8 KB                  512 MB
16 KB                 1 GB
32 KB                 2 GB
The limits of FAT16 are obvious. Partitions of 512 MB or less are the most attractive. However, trying to manage 512MB partitions on multi-gigabyte drives is a real pain; not many people want to deal with 12 logical drives on a 6.5GB hard disk. On top of that, with several programs installed, the Windows9X folder itself can easily grow to over 500MB.
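The FAT16 partition limits follow from the 16-bit cluster numbers: 2^16 clusters multiplied by the cluster size. The sketch below uses that round figure as an assumption; a real FAT16 volume reserves a few cluster values, so the true maximum is slightly lower.

```python
KB = 1024
MB = 1024 * KB


def fat16_max_partition(cluster_size: int) -> int:
    """Approximate largest FAT16 partition for a given cluster size."""
    return cluster_size * 2**16  # 16-bit cluster numbers


for cluster_kb in (2, 4, 8, 16, 32):
    limit_mb = fat16_max_partition(cluster_kb * KB) // MB
    print(f"{cluster_kb:>2} KB clusters -> {limit_mb} MB")
```

Each doubling of the cluster size doubles the partition limit, ending at 2048 MB (2 GB) for 32 KB clusters.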
FAT32 Cluster Size    Maximum Partition Size
4 KB                  8 GB
8 KB                  16 GB
16 KB                 32 GB
32 KB                 64 GB
FAT32 is the solution for large disk drives. There is a small performance loss in using FAT32 because of the increased number of clusters; however, the benefits far outweigh the performance loss. Managing one or two partitions is much easier than trying to figure out which drive your information is on or asking yourself "Did I back up that seventh partition or not?".
Hard Disk Interfaces
Integrated Drive Electronics (IDE)
The de facto consumer standard for drive interfaces. It's beaten out by SCSI in almost every way, but it wins because of the price.
Today's IDE interface has two channels which allow for two devices each, whether they be hard disks, CD-ROMs, or other storage drives. Transfer speeds automatically drop to the speed and capabilities of the slowest drive on a channel for compatibility reasons.
The original form of IDE is self named. It only allowed hard disks on the channel, while offering a measly 2-3 MB/s of average transfer rate. Most IDE boards had only one channel, allowing only two drives (CD-ROM drives of the time used a floppy-drive-like interface based off of a sound card).
EIDE
A substantial improvement on IDE, made in order to keep SCSI from the mainstream. It improves drive throughput and capacity, and integrates dual channels for up to 4 devices combined. Non-HD support was also added by the first AT Attachment Packet Interface (ATAPI) mode, which added support for devices like CD-ROM and tape drives.
The throughput problem was solved basically by moving the IDE interface from the ISA to PCI/VLB bus. It also adds support for Direct Memory Access (DMA) mode, where the hard disk can transfer to RAM directly without the CPU being involved. With the PCI bus EIDE allows throughput of 6.66 MB/s, 8.33 MB/s, 13 MB/s, and 16 MB/s.
Ultra DMA (AKA DMA-33, Ultra ATA-33, Fast ATA-2)
The current step in the evolution of IDE. Ultra DMA doubles the burst transfer rate to 33.3 MB/s, while integrating Cyclic Redundancy Check (CRC) support. For this mode to operate, however, the drive, BIOS, motherboard chipset and software drivers must all support it. In DOS, it simply reverts to EIDE. There is also an 18 inch cable limit.
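For readers who have not met a CRC before, the sketch below shows the general idea: sender and receiver both run the data through the same bit-shuffling routine and compare the 16-bit results. This is a generic bitwise CRC-16 using the common 0x1021 polynomial. The starting value and byte-by-byte processing are assumptions chosen for illustration, not the exact procedure an Ultra DMA drive uses.

```python
def crc16(data: bytes, init: int = 0xFFFF, poly: int = 0x1021) -> int:
    """Generic bitwise CRC-16, most significant bit first."""
    crc = init
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ poly) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


sent = b"123456789"
checksum = crc16(sent)
print(hex(checksum))

# A single flipped bit on the cable changes the checksum.
corrupted = b"123456788"
print(crc16(corrupted) != checksum)
```

If the checksum computed at the receiving end differs from the one the sender computed, the transfer is known to be bad and can be repeated.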
Ultra DMA-66 (Ultra ATA-66)
The next step in IDE evolution, invented by Quantum Corporation. The maximum theoretical transfer rate rises to 66.6 MB/s. Again, revised BIOS, chipsets, drivers, and DMA-66 hard drives are needed to support this new mode. Its future viability seems up in the air right now, because the performance gains do not seem to be as extreme as the theoretical throughput assumes.
Small Computer System Interface (SCSI)
SCSI (pronounced 'Scuzzy') is the do-everything high speed bus interface. It provides support for literally dozens of devices simultaneously along with high speed transfer rates, multithreading, parity checking, and bus mastering. For the cost of an expansion slot and SCSI hard disk, CPU utilization can be dramatically reduced, especially in Windows NT.
The key to allowing so many devices is termination - the host adapter (beginning of the chain) and last device (end of the chain) must be terminated in order to keep the connection intact. The general difficulty involved in properly terminating devices (as well as configuring) has kept SCSI as a workstation/server solution. For most consumers, IDE provides a much cheaper, easier to maintain solution.
Specifications
Level                      Speed        Width      Devices   SE         HVD       LVD
SCSI-1                     5 MB/s       8 bits     8         6m         25m       N/A
SCSI-2 (Fast SCSI)         10 MB/s      8 bits     8         6m         25m       N/A
SCSI-3 (Ultra SCSI)        20 MB/s      8 bits     8/4       1.5/3m     25/0m     N/A
SCSI-3 (Fast Wide SCSI)    20 MB/s      16 bits    16        6m         25m       N/A
SCSI-3 (Wide Ultra SCSI)   40 MB/s      16 bits    16/8/4    0/1.5/3m   25/0/0m   N/A
SCSI-3 (Ultra 2 SCSI)      40/80 MB/s   8/16 bits  8/2       N/A        12/25m    12/25m
SCSI-3 (Wide Ultra 2)      80 MB/s      16 bits    16/2      N/A        12/25m    12/25m
SCSI-3 (Ultra 3)           160 MB/s     16 bits    ??        N/A        ??        ??
SCSI-1
The original SCSI specification. SCSI-1 is rarely used now, if at all, because of the low transfer rates, bus width, and terrible maximum cable length support. At the time however, it was enough to break through the powerhouse 'Enhanced Small Disk Interface' (ESDI) spec.
SCSI-2
The current 'bottom level' of the SCSI specification - it's generally used for scanners and CD-R drives. The new spec added support for Tag-Queuing, which allowed instruction use regardless of whether data was currently travelling through the bus. The instructions could also be prioritized for out of order execution.
Fast SCSI-2 allowed for a doubled transfer rate of 10 MB/s through a 50-pin connector, but was meant for differential SCSI in order to combat noise on the bus.
SCSI-3
The current standard is actually a family of different commands:
Fast/Wide SCSI: The Bus width, transfer rate (with differential support), and device support doubled over SCSI-2.
Ultra SCSI: With a doubled clock speed over SCSI-2 and backwards compatibility, it allowed for double the transfer rate over the stale 8-bit-wide bus. This was the basis for further Ultra SCSI improvements.
Ultra Wide SCSI: Simply put, this is the combination of Ultra SCSI with the 16-bit data path of Fast/Wide.
Ultra 2 SCSI: The most widely used high speed SCSI technology at present. It is the first to implement LVD signals for less noise and higher transfer rates - as high as 80 MB/s!
Ultra 3 SCSI: Just recently ratified by the SCSI Trade Association. It doubles the transfer rate yet again to an astounding 160 MB/s while adding advanced CRC support and easy hot-plugging technology (Installing devices without reboot).
According to Quantum, the Ultra 3 SCSI interface is basically the same improvement that's involved in DDR SDRAM.
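The transfer rates of the SCSI family can be read as bus clock times bus width, with Ultra 3 moving data on both edges of the clock (the DDR-style trick mentioned above). The clock figures in this sketch are assumptions chosen to match the published rates, not values taken from this guide.

```python
def bus_rate_mb_s(clock_mhz: float, width_bits: int, transfers_per_clock: int = 1) -> float:
    """Peak bus rate in MB/s: clock x transfers per clock x bytes per transfer."""
    return clock_mhz * transfers_per_clock * width_bits / 8


# (name, assumed clock in MHz, width in bits, transfers per clock)
variants = [
    ("SCSI-1", 5, 8, 1),
    ("Fast SCSI", 10, 8, 1),
    ("Fast Wide SCSI", 10, 16, 1),
    ("Ultra SCSI", 20, 8, 1),
    ("Wide Ultra SCSI", 20, 16, 1),
    ("Ultra 2 SCSI", 40, 8, 1),
    ("Wide Ultra 2 SCSI", 40, 16, 1),
    ("Ultra 3 SCSI", 40, 16, 2),  # data on both clock edges
]

for name, clock, width, per_clock in variants:
    print(f"{name:<18}{bus_rate_mb_s(clock, width, per_clock):6.0f} MB/s")
```

Doubling either the clock, the width, or the transfers per clock doubles the peak rate, which is how the family climbs from 5 MB/s to 160 MB/s.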
Transceivers
A transceiver is a device which determines how the data will travel between adapter and drive. There are currently 3 types: Single Ended (SE), High Voltage Differential (HVD), and, recently, Low Voltage Differential (LVD). The most popular is also the cheapest - Single Ended. SE signal transmission requires that the current for all devices comes from the same source, which makes it prone to "line noise" and limits the possible speed.
HVD provides improved termination abilities, less "noise", and more cable length to work with, but at the cost of increased power usage and a lack of available part types (like CD-ROM). The power levels also required higher cost parts to cope with the temperatures involved, as most of the voltage is supplied through the cable, rather than from a separate power connector.
The LVD transceiver is the latest electrical signal based technology, using both the powerful differential technology and the low power consumption of SE. The Ultra-2 specification also allows a bus speed of 80 MB/s and the ability to support SE devices simultaneously (at the cost of speed). Most companies get around this by bundling multiple buses to handle SE and LVD separately.
Incidentally, there is a 4th standard - but it's not as clear cut as the other technologies. The Fibre Channel Physical and Signaling Interface (FC-PH) uses fiber-optics to transfer signals (electrical signals are still supported with hybrid cables). FC-PH offers high bandwidth with almost no problems in signal colliding - heavy duty performance for the network crowd. The performance does come at substantial cost though, and isn't really suited for the consumer crowd.
SCAM Technology
SCAM stands for SCSI Configured AutoMatically. When SCAM compliant devices are attached, software can automatically allocate IDs for each device.
Redundant Array of Independent Disks (RAID)
A subset of SCSI/IDE technology that allows the combining of two or more hard disks in various fashions to provide redundancy and additional speed.
Almost all of the levels work off the theory of 'striping', where blocks of data are written across drives. A single hard disk must read/write data sequentially - one block after another to the same disk. RAID avoids this by writing concurrently to separate disks - in a 4 disk array that would allow 4 blocks to be written/read at once!
There are several levels of RAID, each with their own number. Those numbers are merely for identification - RAID 4 is not necessarily better than RAID 1, or vice versa.
0 - Striped Disk Array without Fault Tolerance - Breaks up files into blocks that run across drives; the hard disks are combined to act like one. Because information is written and read alternately to each drive, speed is increased. There is no fault-tolerance however, so if a drive were to fail, all data on the array would be lost.
1 - Mirroring and Duplexing - The RAID controller essentially writes to two or more disks simultaneously. Each drive contains the same information at all times, which provides you with a backup in case a drive fails.
2 - Hamming Code ECC - Data is striped across an array of disks by bits rather than blocks. For each word (2 bytes) written, a connected array with an equal number of disks simultaneously writes a "Hamming Code ECC word". This provides absolute fault tolerance with on-the-fly correction, but at the cost of a large amount of drives.
3 - Parallel transfer with parity - Data is striped across disks at the byte level rather than in whole blocks, and a separate drive is used to hold checksums (parity information).
4 - Independent Data disks with shared Parity disk - Entire Data Blocks are striped across disks, except that a separate drive is used to hold checksums (parity information).
5 - Independent Data disks with distributed parity blocks - Much like RAID 0 - except that parity is added to protect the data blocks written to the drives. The parity is written across the drives along with the data blocks, unlike RAID 4.
6 - Independent Data disks with two independent distributed parity schemes - This is basically an extension of RAID 5. Data Blocks + Parity Blocks are written across the drives, but there is also a second set of parity blocks to cross check the disks for errors. A rarely used solution.
7 - Optimized Asynchrony - A proprietary RAID setup involving the use of a hard coded real time operating system.
10 - Striping + Mirroring - Two or more sets of paired RAID 1 arrays are combined using RAID 0 to provide a single array that is redundant against failure. This means that even if one entire set of drives fails, there is a complete backup.
53 - Striped Array of Parallel Disks with Parity - RAID 0 is used to stripe data across RAID 3 arrays, which means that the Parity Drives are also striped.
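Two ideas carry most of the RAID levels above: striping (dealing blocks out to the disks in turn) and parity (an extra block from which a lost one can be rebuilt). The sketch below illustrates both using XOR parity, which is the usual choice for the parity-based levels; the function names and the tiny 4-byte blocks are made up for illustration.

```python
from functools import reduce


def stripe(blocks, num_disks):
    """Deal blocks out round-robin, RAID 0 style."""
    disks = [[] for _ in range(num_disks)]
    for index, block in enumerate(blocks):
        disks[index % num_disks].append(block)
    return disks


def xor_parity(blocks):
    """Bytewise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


print(stripe(list(range(8)), 4))

data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_parity(data)

# Pretend the disk holding data[1] died: XOR the survivors with the parity.
rebuilt = xor_parity([data[0], data[2], parity])
print(rebuilt == data[1])
```

The same XOR that produces the parity block also recovers a missing data block, which is why a parity array can survive the loss of any single drive.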
IEEE 1394 - FireWire
The next 'consumer-level' bus that is bound for motherboard integration and replacement of IDE. IEEE 1394 is a serial bus that promises to offer transfer rates of up to 50 MB/s with guaranteed (isochronous) or asynchronous transfers. It also supports up to 16 devices per channel, hot-swapping and automatic termination/ID assignment.
IEEE 1394 is geared to support all media drives, digital cameras/video cameras, and laser printers. It is currently available as a PCI card for digital video camera users, but is not expected to go mainstream for another year or two so the technology can mature. Although it offers small footprint ports (sort of like stunted serial ports) and fairly stable transfers, the needed silicon has to be shrunk further, and another internal channel needs to be added (or so it looks) before Intel plans on integrating it into their core logic chipsets.
Why not just call it 'FireWire'? The term is actually exclusive to Apple and their PowerPC enabled FireWire computers. The name is great slang, but for legal reasons don't expect to see the name on store shelves.
Links to more information
SCSI Trade Organization
http://www.scsita.org/
A good site for SCSI resources and information.
SCSI T10 Committee
http://www.symbios.com/t10/
The group responsible for setting all SCSI standards.
Quantum
www.quantum.com
Hard Disk manufacturer - site also contains good information on Ultra 3 SCSI.
Maxtor
www.maxtor.com
Hard Disk Manufacturer
Seagate
www.seagate.com
Hard Disk Manufacturer
Questions and Answers
What does MTBF stand for?
MTBF stands for Mean Time Between Failures. MTBF numbers represent the average number of working hours (when the disk is being used) before the device is expected to fail. Usually the number is in the 100,000s but can go up to 6 or 7 times higher.
Don't misunderstand the number though - it's only an average. Your drive could stop functioning after only days, or even surpass the average by a large amount. The only real thing that is going to help you is the warranty.
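One way to put an MTBF figure in perspective is to turn it into a rough chance of failure per year. The sketch below assumes a constant failure rate (the simple exponential model) and a drive that runs around the clock. Both are simplifying assumptions rather than anything a manufacturer promises, and the function name is illustrative.

```python
import math

HOURS_PER_YEAR = 8760  # 24 hours x 365 days


def annual_failure_probability(mtbf_hours: float, hours_used: float = HOURS_PER_YEAR) -> float:
    """Chance of at least one failure in a year, constant failure rate assumed."""
    return 1 - math.exp(-hours_used / mtbf_hours)


for mtbf in (100_000, 300_000, 600_000):
    chance = annual_failure_probability(mtbf)
    print(f"MTBF {mtbf:>7,} h -> about {chance:.1%} per year")
```

Even a 300,000 hour MTBF works out to a chance of a few percent per year under these assumptions, so it says nothing about how long any one drive will last.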
What does SMART stand for?
SMART stands for Self-Monitoring, Analysis and Reporting Technology. It is an additional silicon feature on some hard disks, providing a way to learn beforehand if a drive is malfunctioning. SMART uses techniques to detect bad sector formation, error correction levels, and more. The actual techniques used are up to the manufacturer though, so effects may differ.
What is Thermal Recalibration?
Some parts of the hard disk are vulnerable to heat - enough to affect size or shape temporarily. Because of this, a common hard disk has built in Thermal Recalibration - where the drive checks itself to confirm and readjust measurements between sectors and tracks. This method takes time though - enough to cause slight pauses in operation. For data intensive tasks such as creating CD-ROMs or playing back video, thermal recalibration is a problem, causing dropped frames or buffer underruns when writing CD-Rs.
The only way to really get around the problem is to use large data buffers (common for A/V drives), or bypass it and use completely different methods altogether.
What are Computer Viruses?
Computer viruses are malignant code and can come up in thousands of ways. The PCGuide (www.pcguide.com) sums it up best, though: "... in order to be a virus, the program must have the ability to do all of the following:
Run without the user wanting it to and/or create effects that the programmer wants but that the user did not want or request.
Have the ability to "infect" or modify other files or disk structures.
Replicate itself so it can spread to other files or systems."
Most of the time, viruses are destructive - they attempt to damage or replace data on your hard disk. However, that is not the only example of a virus - they can also do something as simple as playing music through your speakers or writing a message on screen.
More to the point, Computer Viruses are always 'Executables', or files that do something only when the computer follows its instructions. Executables are one of the most commonly used files on any computer, but their only security measure is that *you* have to choose to run it, or choose to automatically let the computer run it, and so on. You cannot be infected by a virus by simply reading text, because nothing is executed when doing so.
There are several major types of viruses:
Trojan Horses: Aptly named because these are executables which do something the user does not want when used. They typically look like normal executables and when run have the ability to turn other files into Trojan horses.
Worms: Self contained executables. When run, they multiply and spread through networks independently, usually without affecting current data.
Droppers: Self contained Virus delivery systems. They are typically executables that contain data encrypted (scrambled) so that virus scanners cannot tell what is inside. Once they are used, droppers can install and run virus files.
'Macro' Viruses are related to Word Processing programs. 'Macros' are shortcuts - key combinations or scripts designed to do something for you. However, just like normal computer executables, macros are also run as such and can be just as vulnerable.
Viruses are usually structured to attack other executables or your disk's Boot Record. The Boot Record is code used every time the computer is started up, in order to relay disk information. Because it has to be loaded every time the computer is on, it makes for a very likely target. Bootable external disks are also vulnerable for the same reason.
The most recent form of virus attacks your computer's BIOS. Because most BIOSes are Flash ready (which means they can be overwritten with new information), a successful strike will erase the BIOS and leave your computer unbootable.
