
Disk array technology


The full name of disk array technology is Redundant Array of Inexpensive Disks, abbreviated RAID. It is a disk redundancy technology proposed by Professor David Patterson of the University of California, Berkeley, in 1988. Since then, disk array technology has developed rapidly, matured, and gained wide acceptance. It was originally divided into levels 0 to 5, and newer combined levels such as RAID 10, 30 and 50 have since been developed. Put simply, the advantages of RAID are high security, high speed and large capacity; certain RAID levels can raise throughput to 400% of a single hard drive. A disk array connects multiple hard disk drives so that they work together, greatly improving speed while raising the reliability of the disk system to a near error-free level. Such "fault-tolerant" systems are both very fast and very reliable.

From the perspective of the disk array

The most important specification of a disk array is its speed, which depends on the type of CPU on the array controller. SCSI has evolved from SCSI-2 (Narrow, 8-bit, 10 MB/s) through SCSI-3 (Wide, 16-bit, 20 MB/s), Ultra Wide (16-bit, 40 MB/s), Ultra2 (80 MB/s) and Ultra3 (160 MB/s), and from parallel SCSI on to serial I/O such as Fibre Channel (FC-AL, Fibre Channel Arbitrated Loop, 100-200 MB/s) and SSA (Serial Storage Architecture, 80-160 MB/s). In the days of 40 MB/s Ultra Wide SCSI disk arrays, the CPU did not need to be especially fast, because SCSI itself was not fast. Once SCSI reached Ultra2 at 80 MB/s, however, the CPU became critical: a general-purpose CPU (such as a 586) has to give way to a high-speed RISC CPU (such as Intel's i960RD, 32-bit, or i960RN, 64-bit), and even among RISC CPUs there is a difference between 32-bit and 64-bit parts. The gap between a 586 and a RISC CPU is considerable. That is the view from the disk array itself.

From the perspective of the server

Server architecture has moved from the traditional I/O structure to I2O (Intelligent I/O). Its purpose is to reduce the load on the server CPU by separating system I/O processing from it. Intel therefore proposed the I2O architecture, in which I/O is handled by a RISC CPU (i960RD or i960RN). If a RISC i960 CPU already handles I/O in the server while the disk array still uses a 586 CPU, the array becomes the bottleneck and the system as a whole cannot be fast.

From the perspective of the operating system


SCO OpenServer 5.0: 32-bit

Microsoft Windows NT: 32-bit

SCO UnixWare 7.x: 64-bit

Microsoft Windows 2000: 32-bit and 64-bit

Sun Solaris: 64-bit

Other operating systems

Operating systems are moving from 32-bit to 64-bit, so the CPU on the disk array must be an Intel i960 RISC CPU to meet the speed requirements. A 586 CPU is not enough!


Does the disk array connect its hard disks through an SCA-II integrated backplane, or just with SCSI cables? And does the SCA-II backplane have an isolation chip? Without one, the voltage spikes and drops a hard disk produces while being hot-plugged can feed back into the system, causing instability and data loss.

This deserves attention, because many hard drives in a disk array share the same SCSI bus, and hot-swapping one disk must not affect the others. What is hot swapping? Hard disks come in hot-swappable and non-hot-swappable versions: 80-pin (SCA) drives are hot-swappable, while 68-pin drives are not. Whether a drive supports hot plugging depends on its circuit design, namely whether it includes a protection circuit. Two drive trays can look identical and still differ as real and fake hot-swap designs.

Does the disk array require the hard drives to stay in a fixed order? In other words, if the drives are reinserted into the array out of order, can the data still be accessed normally? Many people consider this unimportant and unlikely to happen, but since it can happen, we must guard against it.

Suppose six hard disks form an array. At initialization, the six disks are placed in the array in order as the first, second, ... sixth disk, so there is an order. If the disk array you buy requires that order, be careful: if you ever take the disks out, for example for cleaning, you must reinsert them in their original order. Otherwise your data may be lost, because the disk order no longer matches the original, the SCSI ID numbers of the disks are mixed up, and the array controller no longer recognizes the array. Some disk array products impose no disk order at all, precisely to prevent such incidents.

Below we discuss these technologies and the advantages and disadvantages of the different RAID levels. Rather than going into deep technical detail, the aim is to introduce disk arrays and RAID to readers who are not familiar with them; this should help you choose the right RAID technology.

Eight RAID levels

The following eight levels are generally recognized.

RAID0 (Level 0 Disk Array)

RAID0, also called data striping, distributes data across multiple disks without any fault-tolerance measures. Its capacity and data transfer rate are N times those of a single disk, where N is the number of disk drives in the array. The I/O transfer rate is high, but the mean time to failure (MTTF) is only 1/N of that of a single disk drive, so RAID0 has the worst reliability of all the levels.
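The 1/N figure is easy to compute. The sketch below is a minimal illustration that assumes each disk fails independently at a constant rate (an exponential failure model); the function name and the 500,000-hour rating are hypothetical, not taken from this article.

```python
def raid0_mttf(disk_mttf_hours: float, n_disks: int) -> float:
    """Approximate MTTF of an N-disk RAID0 array.

    Assumes independent disks with constant (exponential) failure rates.
    Losing any one disk loses the array, so the failure rates add up to
    N / disk_mttf, which gives an array MTTF of disk_mttf / N.
    """
    if n_disks < 1:
        raise ValueError("a RAID0 array needs at least one disk")
    return disk_mttf_hours / n_disks


if __name__ == "__main__":
    # Hypothetical rating: 500,000 hours MTTF per disk.
    for n in (1, 2, 4, 8):
        print(f"{n} disk(s): {raid0_mttf(500_000, n):,.0f} hours")
```

Doubling the number of disks doubles capacity and bandwidth but halves the expected time until the array is lost.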

RAID1 (Level 1 Disk Array)

RAID1, also known as disk mirroring, improves reliability through mirrored fault tolerance: each working disk has a mirror disk. Data is written to the working disk and the mirror disk at the same time, but read only from the working disk. If the working disk fails, the system immediately switches to the mirror disk and reads from it, and later restores the correct data onto the working disk.

Data can therefore be reconstructed this way, but the working disk and the mirror disk must correspond one to one. This kind of array is highly reliable, but its effective capacity is only half of the total capacity. RAID1 is therefore often used where errors are least tolerated, such as in finance.
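The write-both, read-one, switch-on-failure behaviour described above can be modelled in a few lines. This is a toy sketch, not a controller implementation: the class and method names are hypothetical, and each "disk" is just a Python dict of block numbers to bytes.

```python
class MirrorPair:
    """Toy RAID1 pair: index 0 is the working disk, index 1 the mirror disk.

    Each "disk" is a dict mapping block numbers to bytes.
    """

    def __init__(self) -> None:
        self.disks = [{}, {}]
        self.failed = [False, False]
        self.working = 0  # the disk that currently serves reads

    def write(self, block: int, data: bytes) -> None:
        # Every write goes to both disks at the same time (skipping a dead one).
        for i, disk in enumerate(self.disks):
            if not self.failed[i]:
                disk[block] = data

    def read(self, block: int) -> bytes:
        # Reads are served only by the working disk.
        if self.failed[self.working]:
            raise IOError("both copies have failed")
        return self.disks[self.working][block]

    def fail(self, i: int) -> None:
        self.failed[i] = True
        if i == self.working:
            self.working = 1 - i  # switch over to the mirror

    def replace_and_rebuild(self, i: int) -> None:
        # A new, empty disk i is filled with a copy of the surviving disk.
        survivor = 1 - i
        self.disks[i] = dict(self.disks[survivor])
        self.failed[i] = False


if __name__ == "__main__":
    pair = MirrorPair()
    pair.write(7, b"ledger")
    pair.fail(0)          # the working disk dies
    print(pair.read(7))   # still readable from the mirror
    pair.replace_and_rebuild(0)
```

The one-to-one correspondence the text mentions shows up directly: rebuilding is nothing more than copying the survivor.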

RAID2 (Level 2 Disk Array)

RAID2, also called bit interleaving, uses a Hamming code for error checking, so there is no need for a CRC (Cyclic Redundancy Check) after each sector. A Hamming code is an (n, k) linear block code, where n is the codeword length, k is the number of data bits, and r is the number of check bits, so that $n = 2^r - 1$ and $r = n - k$.

Bit interleaving is therefore the arrangement best suited to Hamming code checking. This type of array suits reading and writing large blocks of data, but the overhead of the redundant information is still too large, which has prevented its wide use.
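The relations n = 2^r - 1 and r = n - k make the overhead easy to tabulate. The sketch assumes, as in bit-interleaved RAID2, that every bit of a codeword sits on its own disk, so r is also the number of check disks needed for k data disks; the function name is illustrative.

```python
def hamming_params(r: int) -> tuple[int, int]:
    """Return (n, k) for a Hamming code with r check bits."""
    if r < 2:
        raise ValueError("a Hamming code needs at least 2 check bits")
    n = 2 ** r - 1  # codeword length
    k = n - r       # data bits per codeword
    return n, k


if __name__ == "__main__":
    for r in range(3, 7):
        n, k = hamming_params(r)
        # With one codeword bit per disk: k data disks need r check disks.
        print(f"({n},{k}) code: {k} data disks + {r} check disks, "
              f"overhead {r / n:.0%}")
```

A (7,4) code, for instance, spends three check disks on four data disks, which is the kind of redundancy overhead the paragraph refers to.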

RAID3 (Level 3 disk array)


RAID3 is a parallel-transfer array with single-disk fault tolerance. Its distinguishing feature is that it reduces the check disks to one (RAID2 uses several check disks, and RAID1 uses them 1:1), while data is stored on the disks bit by bit or byte by byte (the data of one sector is spread across all the disks in the group). Its advantage is that it can use the full bandwidth of the array to shorten bulk data transfers; its disadvantage is that every read or write involves the entire group, so only one I/O can be completed at a time.
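Byte interleaving with a single parity disk can be sketched as follows. This is an illustrative model under stated assumptions: disks are in-memory byte arrays, the function names are hypothetical, and a short final row is zero-padded.

```python
def raid3_write(data: bytes, n_data_disks: int) -> list[bytearray]:
    """Byte-interleave data over n_data_disks and add one XOR parity disk.

    Returns n_data_disks + 1 byte arrays; the last one is the parity disk.
    A short final row is padded with zero bytes (a choice made for this sketch).
    """
    disks = [bytearray() for _ in range(n_data_disks + 1)]
    for start in range(0, len(data), n_data_disks):
        row = data[start:start + n_data_disks].ljust(n_data_disks, b"\x00")
        parity = 0
        for i, byte in enumerate(row):
            disks[i].append(byte)  # byte i of the row goes to data disk i
            parity ^= byte
        disks[-1].append(parity)   # the single parity disk
    return disks


def raid3_read(disks: list[bytearray], length: int) -> bytes:
    """Reassemble the data; note that every read touches every data disk."""
    n_data_disks = len(disks) - 1
    out = bytearray()
    for pos in range(len(disks[0])):
        for i in range(n_data_disks):
            out.append(disks[i][pos])
    return bytes(out[:length])


if __name__ == "__main__":
    disks = raid3_write(b"HELLO RAID3", 4)
    print(raid3_read(disks, 11))
```

Because consecutive bytes land on different disks, even a tiny read needs every data disk, which is why RAID3 completes only one I/O at a time.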

RAID4 (Level 4 Disk Array)

RAID4 is an array in which each disk in the group can be read and written independently. It has only one check disk.

RAID4 differs from RAID3 in that RAID3 interleaves by bit or byte, whereas RAID4 accesses data by block (sector) and can operate on a single disk. In RAID3 even a small I/O operation involves the whole group; in RAID4 a small read touches only one data disk, and a small write only two disk drives (one data disk and the check disk). This improves the I/O rate for small amounts of data.
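Why only two disks? A common way to update parity for a small write is to XOR out the old data and XOR in the new data, so the other data disks are never read. The sketch below assumes that technique (the article does not name one); the function name is hypothetical, and ints stand in for whole blocks.

```python
def raid4_small_write(disks: list[list[int]], parity_disk: int,
                      data_disk: int, stripe: int, new_value: int) -> int:
    """Update one block using only its data disk and the parity disk.

    Each disk is a list with one int per stripe; XOR on ints stands in for
    XOR over whole blocks. Returns the number of disk I/Os performed.
    """
    old_value = disks[data_disk][stripe]     # read 1: old data
    old_parity = disks[parity_disk][stripe]  # read 2: old parity
    # XOR out the old data and XOR in the new; other data disks are untouched.
    new_parity = old_parity ^ old_value ^ new_value
    disks[data_disk][stripe] = new_value     # write 1: new data
    disks[parity_disk][stripe] = new_parity  # write 2: new parity
    return 4


if __name__ == "__main__":
    # Three data disks (0-2) and a dedicated parity disk (3), one stripe.
    array = [[5], [9], [12], [5 ^ 9 ^ 12]]
    ios = raid4_small_write(array, parity_disk=3, data_disk=1, stripe=0, new_value=7)
    print(ios, array)
```

Note that every small write in RAID4 goes through the same parity disk; that shared disk is the contention RAID5 removes.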

RAID5 (Level 5 Disk Array)

RAID5 is an independent-access array with rotating parity. Unlike RAID1 through RAID4, which keep their redundancy on dedicated disks, it has no fixed parity disk; instead, the redundant parity information is distributed evenly across all disks in the array according to a fixed rule, so each disk drive holds both data and parity. This removes contention for a single parity disk and allows multiple write operations within the same group. RAID5 therefore suits both large-volume data operations and many kinds of transaction processing. It is a fast, large-capacity, fault-tolerant array with a sensible distribution of data.
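One simple rotation rule puts stripe 0's parity on the last disk and moves it one disk to the left on each following stripe. The sketch below assumes that rule for illustration; real controllers may rotate parity differently, and the function name is hypothetical.

```python
def raid5_layout(n_disks: int, n_stripes: int) -> list[list[str]]:
    """Return a stripe-by-disk map: 'P' marks parity, 'Dk' marks data block k.

    Parity starts on the last disk and moves one disk left per stripe;
    data blocks fill the remaining disks from left to right.
    """
    layout, block = [], 0
    for stripe in range(n_stripes):
        parity = (n_disks - 1 - stripe) % n_disks
        row = []
        for disk in range(n_disks):
            if disk == parity:
                row.append("P")
            else:
                row.append(f"D{block}")
                block += 1
        layout.append(row)
    return layout


if __name__ == "__main__":
    for row in raid5_layout(4, 4):
        print("  ".join(f"{cell:>3}" for cell in row))
```

Over four stripes each of the four disks holds parity exactly once, so parity writes are spread over the whole array instead of queuing on one disk.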

RAID6 (Level 6 Disk Array)


RAID6 is an independent-access disk array with two-dimensional (dual) parity. Its redundancy and error-correction information is distributed evenly across all disks, while data is still striped in blocks of variable size. This type of array can tolerate the failure of two disks.
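The article does not say how the second parity is computed. One widely used scheme, assumed here, is "P+Q": P is plain XOR parity and Q is a Reed-Solomon syndrome over GF(2^8) with generator 2 and polynomial 0x11D (the arithmetic used, for example, by the Linux md RAID6 driver). The sketch models each disk as one byte; a real array applies the same arithmetic to every byte of a block. Function names are hypothetical.

```python
from typing import Optional

POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) (carry-less multiply, reduced by POLY)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= POLY
        b >>= 1
    return result


def gf_pow(a: int, e: int) -> int:
    result = 1
    for _ in range(e):
        result = gf_mul(result, a)
    return result


def gf_inv(a: int) -> int:
    # Every non-zero a satisfies a**255 == 1 in GF(2^8), so a**254 is its inverse.
    return gf_pow(a, 254)


def pq_syndromes(data: list[int]) -> tuple[int, int]:
    """P is the plain XOR; Q weights data disk i by g**i with g = 2."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gf_mul(gf_pow(2, i), d)
    return p, q


def recover_two(data: list[Optional[int]], p: int, q: int) -> list[int]:
    """Rebuild exactly two missing data bytes (marked None) from P and Q."""
    x, y = [i for i, d in enumerate(data) if d is None]
    pxy, qxy = p, q
    for i, d in enumerate(data):
        if d is not None:
            pxy ^= d
            qxy ^= gf_mul(gf_pow(2, i), d)
    # Now pxy = Dx ^ Dy and qxy = g^x*Dx ^ g^y*Dy: two equations, two unknowns.
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    dx = gf_mul(qxy ^ gf_mul(gy, pxy), gf_inv(gx ^ gy))
    dy = pxy ^ dx
    rebuilt = list(data)
    rebuilt[x], rebuilt[y] = dx, dy
    return rebuilt


if __name__ == "__main__":
    data = [0x11, 0x22, 0x33, 0x44, 0x55]
    p, q = pq_syndromes(data)
    print(recover_two([0x11, None, 0x33, None, 0x55], p, q) == data)
```

With XOR parity alone, two lost disks give one equation in two unknowns; the independent Q syndrome supplies the second equation.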

RAID7 (Level 7 Disk Array)

RAID7 builds on RAID6 and adds cache technology, which greatly improves transfer rate and response time. A cache is high-speed buffer memory: data is written into the cache before it is written to the disk array. The cache block size is usually the same as the array's data block size, so one cache block corresponds to one disk block. On a write, the data goes into two independent caches, so it is not lost even if one cache fails. The write is acknowledged at the cache level and only then passed on to the disk array. When data moves from the cache to the array, all data for the same track is written in a single operation, avoiding many separate block writes and improving speed. On a read, the host reads directly from the cache rather than from the array disks, which reduces the number of disk reads and makes full use of disk bandwidth.

Combining a cache with a disk array makes up for the array's weaknesses (such as its poor response to block write requests), giving users an efficient, fast, large-capacity, highly reliable, flexible and convenient storage system that meets current technical needs, especially those of multimedia systems.
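The article does not specify how the RAID7 cache is organized, so the following is only a generic sketch of the behaviour described: writes acknowledged once two cache copies hold them, dirty blocks flushed one track at a time, and reads served from the cache. The class, its parameters and the dict-based "array" are all hypothetical.

```python
from collections import defaultdict


class WriteBackCache:
    """Hypothetical dual-copy write-back cache in front of a disk array.

    Writes are acknowledged once both cache copies hold the block; flush()
    later writes dirty blocks to the array grouped by track, one operation
    per track. Reads are served from the cache when the block is there.
    """

    def __init__(self, array: dict, blocks_per_track: int) -> None:
        self.array = array                 # block number -> data on the disks
        self.blocks_per_track = blocks_per_track
        self.copies = [{}, {}]             # two independent cache copies
        self.dirty = set()                 # blocks not yet on the disks
        self.disk_writes = 0               # write operations sent to the array
        self.disk_reads = 0                # read operations sent to the array

    def write(self, block: int, data: bytes) -> None:
        for copy in self.copies:
            copy[block] = data             # acknowledged at cache level
        self.dirty.add(block)

    def read(self, block: int) -> bytes:
        for copy in self.copies:           # either surviving copy will do
            if block in copy:
                return copy[block]
        self.disk_reads += 1               # miss: fetch from the array
        data = self.array[block]
        for copy in self.copies:
            copy[block] = data
        return data

    def flush(self) -> None:
        by_track = defaultdict(dict)
        for block in self.dirty:
            by_track[block // self.blocks_per_track][block] = self.copies[0][block]
        for blocks in by_track.values():
            self.array.update(blocks)      # one operation writes a whole track
            self.disk_writes += 1
        self.dirty.clear()
```

For example, ten single-block writes with eight blocks per track reach the disks as two track writes instead of ten block writes.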


Data Spanning (Spanning)

Data spanning makes multiple hard drives work like a single drive, letting users combine existing disks or add new ones to break through the capacity limit of an individual disk at low cost.

Figure 2 shows four 300-megabyte hard drives connected to form a SCSI system. Instead of seeing four 300-megabyte drives C, D, E and F, the user sees a single 1200-megabyte drive C. In such an environment, the system administrator no longer has to worry about running out of space on one particular disk, because all 1200 megabytes are on one volume (drive C). The administrator can set up whatever file system structure is required without planning around the limits of several separate disks. Data spanning is not a RAID level in itself and improves neither reliability nor speed, but it has the advantage that several small, inexpensive disks can be added to the disk subsystem as needed.
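Spanning amounts to concatenating the disks' address ranges. The sketch below maps an offset on the single large volume back to a physical disk; it assumes a megabyte of 2**20 bytes, and the function name is hypothetical.

```python
MB = 1024 * 1024  # this sketch treats a megabyte as 2**20 bytes


def span_locate(offset: int, disk_sizes: list[int]) -> tuple[int, int]:
    """Map an offset on the spanned volume to (disk index, offset on that disk).

    The disks are simply concatenated: disk 0 is used first, then disk 1, etc.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    for disk, size in enumerate(disk_sizes):
        if offset < size:
            return disk, offset
        offset -= size
    raise ValueError("offset is beyond the end of the spanned volume")


if __name__ == "__main__":
    drives = [300 * MB] * 4  # the four 300 MB drives of Figure 2
    print(sum(drives) // MB, "MB volume")
    print(span_locate(650 * MB, drives))
```

Consecutive data stays on one disk, which is why spanning adds capacity but not speed.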

Hard Disk Striping (Disk Striping, RAID 0)

Disk striping writes data to several hard disks instead of just one; it is also called RAID 0. In a disk array subsystem, data is written to the disks in units called "segments", whose size is set by the system. For example, segment 1 is written to disk 0, segment 2 to disk 1, segment 3 to disk 2, and so on. After the last disk has been written, writing continues from the next available segment on disk 0, and the cycle repeats until all the data has been written. Segments are made up of blocks, and blocks are made up of bytes. So if a segment is 4 blocks and a block is 256 bytes, a segment is 1024 bytes: bytes 1 to 1024 go to disk 0, bytes 1025 to 2048 go to disk 1, and so on. If the disk subsystem has 5 hard disks and we write 20,000 bytes, the data is stored as shown in Figure 3.
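The 20,000-byte example can be worked out directly. The sketch below (a hypothetical helper, segments numbered from 0 rather than 1) deals 1024-byte segments round-robin over the disks and reports how many bytes each disk receives.

```python
def stripe_bytes(total: int, n_disks: int, block_size: int = 256,
                 blocks_per_segment: int = 4) -> list[int]:
    """Return how many bytes of a `total`-byte write land on each disk."""
    segment = block_size * blocks_per_segment      # 4 x 256 = 1024 bytes
    per_disk = [0] * n_disks
    offset, seg_no = 0, 0
    while offset < total:
        chunk = min(segment, total - offset)       # last segment may be partial
        per_disk[seg_no % n_disks] += chunk        # segments go round-robin
        offset += chunk
        seg_no += 1
    return per_disk


if __name__ == "__main__":
    print(stripe_bytes(20_000, 5))  # prints [4096, 4096, 4096, 4096, 3616]
```

20,000 bytes make 19 full segments plus a 544-byte remainder; disks 0 to 3 each get four full segments, and disk 4 gets three full segments plus the partial one.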

In short, because striping reads and writes data on several disks at once, it is relatively fast. The data is actually transferred sequentially, but the individual read (or write) operations overlap. While segment 1 is being written to drive 0, the write of segment 2 to drive 1 begins; while segment 2 is still being written to drive 1, segment 3 has already been sent to drive 2; and so on, so that several disks (though not necessarily all) are writing at the same time. This works because data can be sent to a disk drive much faster than the drive can write it to the physical disk, so control software written to exploit this can keep several writes in progress at once.

Unfortunately, RAID 0 provides no redundant data, which is very dangerous: the computer can work normally only if every disk in the subsystem works. For example, if a file has segment 1 on drive 0, segment 2 on drive 1 and segment 3 on drive 2, then the failure of any one of drives 0, 1 or 2 causes problems; if drive 1 fails, only segments 1 and 3 can still be read. Fortunately there is a solution, combining striping with data redundancy, which is discussed in a later subsection.

Hard disk mirroring

Hard disk mirroring (RAID 1) is the most traditional form of fault-tolerant disk array technology and is well known in the industry. Its main advantage is 100% data redundancy. RAID 1 achieves redundancy simply by copying all the data on one disk to a second disk (or equivalent storage device). The method is simple and relatively easy to implement, but it costs twice as much as a single non-redundant drive, because a second drive must be bought to mirror the first.

The simplest form of hard disk mirroring connects two hard disks to one controller, as Figure 4 illustrates. Whenever data is written to one disk, it is written to its mirror at the same time. If one drive fails, the computer keeps working, because it can operate on the data on the remaining good disk.

Because the two disks mirror each other, it does not matter which one fails, and because both always hold the same data, either can serve as the working disk. Even this simple RAID method allows some speed optimizations, such as balancing the load of read requests: when several users request data at once, the reads can be split between the two disks so that the read load is spread evenly. This considerably improves read performance, because the two drives read different data at the same time. Mirroring does not, however, improve write performance. A mirrored disk can also be mirrored to another kind of storage device, such as a rewritable optical drive. An optical mirror is slower than a hard disk, but it still reduces the risk of data loss compared with having no mirror at all.
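Read load balancing can be sketched as a greedy scheduler: send each read to whichever copy will be free soonest. The sketch assumes all reads arrive at once and uses hypothetical service times; it is an illustration, not a specific controller's policy.

```python
def schedule_reads(durations: list[float], n_copies: int = 2) -> tuple[list[int], float]:
    """Greedy read balancing across mirrored copies.

    Every read is assumed to arrive at time 0; each one is sent to the copy
    that will be free soonest. Returns (copy chosen per read, finish time).
    """
    free_at = [0.0] * n_copies
    chosen = []
    for d in durations:
        copy = min(range(n_copies), key=lambda i: free_at[i])
        free_at[copy] += d
        chosen.append(copy)
    return chosen, max(free_at)


if __name__ == "__main__":
    reads = [10.0, 10.0, 10.0, 10.0]  # hypothetical service times in ms
    print(schedule_reads(reads, n_copies=1))  # one disk: reads queue up
    print(schedule_reads(reads, n_copies=2))  # mirror pair: load is shared
```

Four equal reads finish in half the time on a mirror pair. Writes get no such benefit, because each write must go to both copies.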

In short, a mirroring system offers very good fault tolerance and faster reads; its disadvantage is that it needs twice as many disks, so it costs more.

Striping with redundancy


Disk striping improves the performance of the disk subsystem, because read and write speed grows in proportion to the number of disks, but its disadvantage is that the failure of any one disk brings down the entire computer system. One remedy is to mirror the entire striped subsystem: if 4 disks are striped, 4 more striped disks can be added as a mirror of the original 4. This is obviously expensive (though possibly cheaper than mirroring one expensive large disk). Instead of mirroring, other forms of data redundancy can provide high fault tolerance. A parity scheme achieves this: you can either add a disk dedicated to parity (as in RAID 3) or distribute the parity data across all the disks in the array. An example of distributed parity (RAID 5) is shown in Figure 5.

Whichever parity-based RAID level is used, the disk array generates parity data with the exclusive OR (XOR) operation, and when a disk in the subsystem fails, XOR is also used to reconstruct its data. The following briefly shows how XOR works.

Disk:   A   B   C   Parity (A XOR B XOR C)

Data:   1   0   1   0

First, recall that the XOR of two bits is true (1) exactly when one of them is 1 and the other is 0. Suppose disk B fails. XOR-ing A, C and the parity data yields B's lost value, 0. Likewise, if disk C fails, XOR-ing A, B and the parity data yields C's original value, 1.

If it is extended to the hard disk subsystem of 7 disks:

Disk:   A   B   C   D   E   F   Parity

Data:   0   0   0   1   0   1   0

If the data on disk B is lost, XOR-ing A, C, D, E, F and the parity bit recovers B's value, 0. Likewise, XOR-ing A, B, C, E, F and the parity bit recovers D's value, 1.
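Both examples can be checked mechanically. The sketch below computes the parity bit as the XOR of the data bits and rebuilds a missing member by XOR-ing all survivors, parity included; the helper names are illustrative.

```python
from functools import reduce
from operator import xor


def parity(bits: list[int]) -> int:
    """The parity bit is the XOR of all data bits."""
    return reduce(xor, bits, 0)


def recover(stripe: dict[str, int], lost: str) -> int:
    """Rebuild the lost member by XOR-ing every surviving member, parity included."""
    return reduce(xor, (v for k, v in stripe.items() if k != lost), 0)


# The 3-disk example: A, B, C plus parity.
three = {"A": 1, "B": 0, "C": 1}
three["P"] = parity(list(three.values()))

# The 7-disk example: A-F plus parity.
seven = dict(zip("ABCDEF", [0, 0, 0, 1, 0, 1]))
seven["P"] = parity(list(seven.values()))

print(three["P"], recover(three, "B"), recover(three, "C"))  # prints 0 0 1
print(seven["P"], recover(seven, "B"), recover(seven, "D"))  # prints 0 0 1
```

The same two functions work for any number of disks, which is why one parity disk's worth of space protects an array of any width against a single failure.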

Using a dedicated parity disk (as described above, RAID 3), when multiple write operations occur at the same time, each operation must write to the parity disk. This will produce an I/O bottleneck effect.

RAID 5 spreads the parity information across all the disks in the subsystem instead of using a dedicated parity disk, which relieves the parity-disk bottleneck of RAID 3 described above. Figure 5 shows one RAID 5 configuration, with the parity information spread over every disk in the subsystem: part of each disk holds parity, so parity writes are distributed fairly evenly across all disks. One user may thus write a data segment to disk A and its parity to disk B, while a second user writes data to disk C and parity to disk D. This also shows why RAID 5 performs better.

This method increases the transaction processing speed of the disk subsystem. Transaction processing means handling many disk I/O operations from many different users. Because many users may access the disks at once, writing data quickly and sometimes almost simultaneously, a distributed parity scheme is less likely to create a bottleneck than a dedicated parity disk. For writes, however, RAID 5 is slower than plain striping (RAID 0 without parity), because generating and storing parity requires extra work. For example, when data on one disk is modified, the corresponding segments on the other disks (even unrelated data) must be read into the host to generate the new parity. Once the parity segment has been generated (which takes some time), the updated data segment and parity segment must be written to disk. This is usually called a read-modify-write strategy. So although RAID 5 is superior to RAID 0 in fault tolerance, it is inferior to RAID 0 in write performance.
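The write penalty can be counted. A controller can recompute parity by reading the rest of the stripe, as the paragraph describes, or by reading only the old data and old parity and patching the parity; the sketch counts the disk operations for each (the strategy names and function are illustrative, and `n_disks` includes the parity disk).

```python
def small_write_ios(n_disks: int, strategy: str) -> tuple[int, int]:
    """Return (reads, writes) needed to update one block of one stripe.

    n_disks counts every disk in the stripe, including the parity disk.
    """
    if strategy == "raid0":
        return 0, 1  # no parity: write the block and stop
    if n_disks < 3:
        raise ValueError("a parity stripe needs at least 3 disks")
    if strategy == "reconstruct":
        # Read the other n_disks - 2 data blocks, recompute parity from scratch,
        # then write the new data block and the new parity block.
        return n_disks - 2, 2
    if strategy == "read-modify-write":
        # Read old data and old parity, patch parity with old XOR new, write both.
        return 2, 2
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    for n in (3, 5, 10):
        print(n, "disks:",
              "reconstruct", small_write_ios(n, "reconstruct"),
              "read-modify-write", small_write_ios(n, "read-modify-write"))
```

Either way, a small parity write costs several disk operations where RAID 0 needs one; a controller can pick whichever strategy needs fewer reads for a given stripe width.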

Both mirroring (RAID 1) and striping with parity (RAID 5) generate redundant information in the disk subsystem. In RAID 1, however, all data is copied to a second, identical disk, whereas RAID 5 stores the XOR of the data rather than the data itself, a very compact representation from which data lost in a disk failure can still be recovered.

With RAID 5, an array of 5 disks uses about 20% of its space for parity, while an array of 10 disks uses only about 10%. Measured against total formatted capacity, the more disks in the array, the smaller the share spent on parity and the more money the system saves.
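The 20% and 10% figures follow from RAID 5 spending one disk's worth of space per array on parity. The sketch below (a hypothetical helper that ignores hot spares and metadata) puts that next to the other levels discussed in this article.

```python
def usable_fraction(level: int, n_disks: int) -> float:
    """Fraction of raw capacity left for data (ignores spares and metadata)."""
    if level == 0:
        return 1.0
    if level == 1:
        return 0.5                        # every block is stored twice
    if level == 5:
        return (n_disks - 1) / n_disks    # one disk's worth of parity
    if level == 6:
        return (n_disks - 2) / n_disks    # two disks' worth of parity
    raise ValueError(f"level {level} not covered by this sketch")


if __name__ == "__main__":
    for n in (5, 10):
        print(f"RAID 5, {n} disks: {1 - usable_fraction(5, n):.0%} of space for parity")
```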

In short, RAID 5 combines the advantages of striping and parity redundancy. Such a disk system is particularly suited to transaction processing environments such as airline ticket offices, car rental counters and point-of-sale terminals. In some cases RAID 1 may be preferred (where data is written more often than it is read), but in many cases RAID 5 offers a solution that combines high performance, low cost and data security.

Failure Recovery

Mirroring and RAID provide new ways to recover data from disk failures. Because every part of the data is redundant, data availability stays very high even when a disk fails. Another important advantage is that recovery does not have to happen immediately, because the system keeps working normally with one failed disk. In that state, however, the system is no longer fault-tolerant, so to avoid data loss the data must be restored before a second disk fails. After the failed disk is replaced, its data must be rebuilt. In a mirroring system, the "mirror" disk holds a full copy, so the failed disk (primary or mirror) can be rebuilt with a simple disk-to-disk copy, as shown in Figure 6. This copy is much faster than restoring data from tape.

In a RAID 5 disk subsystem, the failed disk's data is reconstructed from the error-correction (parity) information stored on the surviving disks: the data on the good disks (including the parity) is read, the lost data is computed, and the result is written to the replacement disk. This process, shown in Figure 7, is also much faster than restoring data from tape.

Because a disk array can be reconfigured flexibly, the replacement disk need not have the same address as the failed disk, as shown in Figure 8. This flexibility simplifies installation. A spare disk can even be connected to the system before any disk fails, so that it is ready for immediate use; such a disk is usually called a "hot spare".


Reliability and Availability

Although these two terms are related, they describe two different aspects of disk failure. Reliability concerns the probability that a disk fails under given conditions. Availability is the proportion of time a disk can be used for its intended purpose. With these two terms, we can see how a disk array raises the reliability of a disk system to nearly 100%. Because the data on one disk can be reproduced from the data on the other disks (as in RAID 5), the chance that the whole disk system fails is small, so the reliability of the disk subsystem is greatly improved.
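To put a number on that improvement, the sketch below uses the standard first-order estimate of mean time to data loss for one single-parity group: data is lost only if a second disk fails before the first has been rebuilt. It assumes independent, exponentially distributed failures and a repair time much shorter than the disk MTTF; the 100,000-hour MTTF and 24-hour repair time are hypothetical inputs.

```python
def mttdl_single_parity(disk_mttf: float, n_disks: int, mttr: float) -> float:
    """Approximate mean time to data loss (hours) for one RAID5-style group.

    first_failure: expected time until any of the n disks fails.
    p_second: chance that one of the remaining n - 1 disks also fails
    during the mttr-hour rebuild window (valid when mttr << disk_mttf).
    """
    if n_disks < 2:
        raise ValueError("a parity group needs at least 2 disks")
    first_failure = disk_mttf / n_disks
    p_second = (n_disks - 1) * mttr / disk_mttf
    return first_failure / p_second  # = disk_mttf**2 / (n * (n - 1) * mttr)


if __name__ == "__main__":
    single = 100_000.0  # hypothetical single-disk MTTF in hours
    group = mttdl_single_parity(single, n_disks=5, mttr=24.0)
    print(f"single disk: {single:,.0f} h, 5-disk RAID5 group: {group:,.0f} h")
```

The squared MTTF in the result is what makes redundancy so effective, and the MTTR in the denominator is why fast rebuilds (and hot spares) matter.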

Chart 9 compares the reliability of a RAID disk subsystem with that of a single-disk subsystem.

We must also consider system availability. A single-disk system has better availability than a disk array without redundancy, and a redundant disk array has much better availability than a single disk, because it keeps working normally when one disk fails. In addition, recovery time after a disk failure is greatly reduced (compared with restoring from tape). Finally, a rebuilt disk contains the data as it stood at the moment of failure, whereas a disk restored from tape contains only the data of the last backup (for example, the previous night's). For complete fault tolerance, the other components of the disk subsystem must also be redundant, for example two power supplies or dual disk controllers. Without redundancy in those components, even a very reliable disk subsystem cannot fully prevent the computer system from failing.

Fault-tolerant system

As mentioned earlier, a plain striped subsystem (RAID 0) greatly increases read and write speed relative to a single disk, because data is spread across several disks and disk operations can proceed in parallel.

Mirroring two striped subsystems effectively creates a fully redundant, fast disk subsystem. Its reads are even faster than those of a plain striped subsystem, because it can serve two reads at once (one from each copy), and its writes are almost as fast as those of an unmirrored striped subsystem, because writing to both copies at once adds little overhead.

Using the concepts described earlier, such as duplexing (dual controllers, dual power supplies, etc.), redundancy can be improved further. Dual controllers also allow higher data transfer rates, because the controller is then less likely to become the subsystem's performance bottleneck.

Technical terms

Disk Mirroring: The simplest form of hard disk mirroring is a host controller with two hard disks that mirror each other. Data is written to two hard disks at the same time, and the data on the two hard disks are exactly the same, so when one hard disk fails, the other hard disk can provide data.

Disk Spanning: Makes several hard disks look like one large disk; this virtual disk stores data across the different physical disks, and the user does not need to know which disk holds the data he needs.

Disk Striping: Data is stored across several disks. The first segment goes to disk 0, the second to disk 1, and so on until the last disk in the chain is reached; the next logical segment then goes to disk 0 again, the one after that to disk 1, and so on until the write is complete.

Duplexing: This refers to the use of two controllers to drive a hard disk subsystem. When one controller fails, the other controller immediately controls the operation of the hard disk. In addition, if the appropriate controller software is written, different hard drives can work at the same time.

Fault Tolerant (Fault Tolerant): The machine with fault tolerance function has the ability to resist failure. For example, the RAID 1 mirroring system is fault-tolerant. If one of the mirrored disks fails, the hard disk subsystem can still work normally.

Host Adapter: This refers to the control component (such as SCSI controller) that enables the host and peripherals to exchange data.

Hot Fix: Refers to using a hard disk hot backup to replace a failed hard disk. It should be noted that the failed disk is not actually physically replaced. The disk used as a hot backup is loaded with the original data of the failed disk, and then the system resumes work.

Hot Patch: A system with hard disk hot backup that can replace failed disks at any time.

Hot Spare: A disk that is already electrically connected to the system and can take the place of a failed disk. The difference from a cold spare is that a cold spare is usually not connected to the machine and replaces the failed disk only after a failure occurs.

Mean Time Between Data Loss (MTBDL-Mean Time Between Data Loss): The average time between data loss events.

Mean Time Between Failure (MTBF-Mean Time Between Failure or MTIF): Mean time between failures of equipment.

RAID-Redundant Array of Inexpensive Drives: A technology that combines multiple inexpensive hard disks into a fast, fault-tolerant hard disk subsystem.

Reconstruction or Rebuild: After a hard disk fails, the process of restoring the data of the failed disk from other correct hard disk data and parity information.

Reconstruction Time: The time required to rebuild data for the failed disk.

Single Large Expensive Disk (SLED): A single large, high-capacity, expensive disk, the traditional alternative to an array of inexpensive disks.

Transfer Rate: Refers to the speed of accessing data under different conditions.

Virtual Disk: Similar to virtual storage, a virtual disk is a conceptual disk, and the user does not need to care about which physical disk his data is written on. Virtual disks generally span several physical disks, but the user sees only one disk.
