Understanding How RAID Storage Works

03 November 2021 by Ontrack Team


Redundant Array of Independent Disks (RAID) is a term used for computer data storage systems that spread and/or replicate data across multiple drives. RAID technology has revolutionized enterprise data storage and was designed with two key goals: to increase data reliability and to increase I/O (input/output) performance.

Unfortunately, RAID storage is not a perfect technology, and data loss can still occur when using these systems. In this post we’ll explore how RAID levels work and how data can be stored (and lost) using this type of storage.

How does RAID work?

A RAID combines physical disks into a single logical unit by using dedicated hardware or software. Hardware RAID solutions come in various styles, from controllers built onto the motherboard or add-in cards to large enterprise NAS or SAN servers. With hardware RAID, the operating system (OS) is unaware of the technical workings of the array, whereas software RAID is typically implemented within the OS itself.

RAID is traditionally used on servers, but can also be used on workstations. The latter is typical in storage-intensive computers such as those used for video and audio editing, where high storage capacities and data transfer speeds are required.

Commonly used RAID storage terms

Let’s take a look at some of the technical terms that are commonly used to describe aspects of RAID storage:

  • RAID – A technology that supports various hard drive configurations to achieve greater performance, reliability and larger volume sizes by consolidating disk resources and using parity calculations.

  • Parity – Distributed information which allows the recreation of data stored within a RAID array, even if a disk fails.

  • Mirroring – When data from one or more hard drives is duplicated onto one or more other physical disks.

  • Striping – A method where data is written across multiple disks. Data is written to the drives in sequential order until it reaches the last drive. It then jumps back to the first drive and starts a second stripe, and so on.

  • Block – The logical space on each disk where the data is written; the amount of space is set by the RAID controller.

  • Left / right symmetry – Symmetry in a RAID controls how the data and parity are distributed across the drives. There are four main styles of symmetry (left-symmetric, left-asymmetric, right-symmetric and right-asymmetric), and the one used depends on the RAID vendor. Some companies also create proprietary styles to suit their business needs.

  • Hot spare – There are a few different methods for dealing with drive failures within a RAID. One method is the use of a ‘hot spare’, which is a spare disk kept available in the array so that it can take the place of a failed disk.

  • Degraded mode – This happens when a drive in the RAID becomes unreadable. The drive is then considered bad and is withdrawn from the RAID. New data and parity are then written to the remaining drives within the RAID. If any data is requested from the failed drive, it is reconstructed from the data and parity on the remaining drives. Running with fewer drives, plus this extra reconstruction work, degrades the performance of the RAID.
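The striping behaviour described in the list above can be sketched in a few lines of Python. This is only an illustration of plain striping with one block per disk per stripe; the function name `locate_block` is hypothetical, and real controllers add details such as configurable block sizes and parity placement.

```python
from typing import Tuple


def locate_block(logical_block: int, num_disks: int) -> Tuple[int, int]:
    """Map a logical block number to (disk index, stripe index).

    Assumes plain striping: blocks fill disk 0, 1, 2, ... in order and
    wrap back to disk 0 to start the next stripe.
    """
    if num_disks < 2:
        raise ValueError("striping needs at least two disks")
    if logical_block < 0:
        raise ValueError("logical block number must not be negative")
    stripe, disk = divmod(logical_block, num_disks)
    return disk, stripe


# Lay out the first seven logical blocks across three disks.
layout = [locate_block(block, 3) for block in range(7)]
for block, (disk, stripe) in enumerate(layout):
    print(f"block {block} -> disk {disk}, stripe {stripe}")
```

With three disks, blocks 0 to 2 land on the first stripe and block 3 wraps back to the first disk to begin the second stripe.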

Now let’s examine the three key concepts in RAID (mirroring, striping and error correction) and explore how some of the traditional level configurations work.

RAID storage levels

As previously mentioned, mirroring involves copying data to more than one disk, while striping occurs when data is split across more than one disk. Error correction occurs when redundant data is stored so that problems can be detected and possibly fixed (known as fault tolerance). One or more of these techniques can be used within different RAID setups, depending on the system requirements.
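To make the error-correction idea more concrete, here is a minimal sketch of parity in Python. It assumes XOR parity, which is the usual choice for single-parity schemes; the article itself does not specify the calculation, and the helper name `xor_blocks` and the sample data are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    if not blocks:
        raise ValueError("at least one block is required")
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must be the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# Three data blocks, as if stored on three different disks.
data_blocks = [b"RAID", b"DATA", b"DISK"]

# The parity block would be stored on a further disk.
parity = xor_blocks(data_blocks)

# Simulate losing the second block and recreate it from the survivors plus parity.
recovered = xor_blocks([data_blocks[0], data_blocks[2], parity])
assert recovered == data_blocks[1]
```

Because XOR-ing a value with itself cancels out, combining the surviving blocks with the parity block leaves exactly the missing block.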

Standard RAID configurations are referred to as levels. Five levels were originally created; however, many more variations have evolved, including several nested levels and many non-standard (mostly proprietary) levels. The industry has already seen levels expand from RAID 0 to RAID 51 (and beyond). Because different levels provide different types of redundancy, a trade-off usually has to be made between fault tolerance and performance, depending on the application.

Basic RAID levels include:

  • RAID 0 – Often called ‘striping’, this is considered the most basic RAID level. It offers no redundancy but excellent performance. Data is striped across at least two disks, and every disk added increases the read/write performance and storage capacity compared with a single drive.

  • RAID 1 – This level is also called ‘mirroring’ because, as the name suggests, it mirrors the same data across two disks, providing the most basic form of RAID redundancy. This level offers up to double the read performance of a single drive, but no increase in write speed. Stored data remains accessible as long as one disk is still working.

  • RAID 5 – This is a common configuration that offers a decent compromise between data protection and performance. It requires at least three disks and provides a gain in read speeds but no increase in write performance. RAID 5 introduces ‘parity’ to the array, which takes up the space of one disk in total. This level can tolerate one disk failure.

  • RAID 6 – This takes the concept of RAID 5 a step further. A minimum of four disks is required and dual parity is introduced, meaning data can be recreated even if two disks fail within the array.
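The capacity and fault-tolerance figures for the levels listed above can be worked out with simple arithmetic. The sketch below assumes all disks are the same size and treats RAID 1 as a mirror in which every disk holds a full copy; the function name `raid_summary` is hypothetical, and real arrays lose a little extra space to metadata.

```python
def raid_summary(level: int, num_disks: int, disk_size_tb: float):
    """Return (usable capacity in TB, disk failures tolerated) for a basic RAID level."""
    minimum_disks = {0: 2, 1: 2, 5: 3, 6: 4}
    if level not in minimum_disks:
        raise ValueError(f"unsupported RAID level: {level}")
    if num_disks < minimum_disks[level]:
        raise ValueError(
            f"RAID {level} needs at least {minimum_disks[level]} disks"
        )
    if level == 0:
        # Pure striping: all space is usable, nothing is redundant.
        return num_disks * disk_size_tb, 0
    if level == 1:
        # Assumption: every disk is a full mirror of the same data.
        return disk_size_tb, num_disks - 1
    if level == 5:
        # Parity takes up the space of one disk in total.
        return (num_disks - 1) * disk_size_tb, 1
    # RAID 6: dual parity takes up the space of two disks.
    return (num_disks - 2) * disk_size_tb, 2


for level in (0, 1, 5, 6):
    capacity, tolerated = raid_summary(level, 4, 2.0)
    print(f"RAID {level}: {capacity} TB usable, survives {tolerated} failure(s)")
```

For four 2 TB disks this shows the trade-off directly: RAID 0 keeps all 8 TB but survives no failures, while RAID 6 keeps 4 TB and survives two.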

Modern RAID arrays

There are many ways to get more out of your RAID system. However, given the highly complicated and technical nature of modern arrays (and how they can be utilized with other complex systems for significant efficiency and cost benefits, such as virtualization), it is not uncommon for one of these technologies to suffer a fault. This can cause significant data loss due to the interconnectivity of multiple systems and can cost businesses millions in downtime as a result.

Modern RAID arrays can also use multiple file systems, such as BTRFS or ZFS at the storage level, with NTFS or HFS layered over the top for application support via virtualization.

RAID data recovery challenges

RAID arrays are highly complex. This complexity is often intensified within enterprise IT infrastructures, as RAID systems are used mostly for business-critical applications, where availability and efficiency are crucial factors. Add-on technologies like virtualization or database applications can also spell disaster for a business if the system fails.

From a data recovery perspective, it would usually be necessary to reconstruct the RAID file system and bypass any physical failures as well as assess any virtualized architecture that may exist. This process can often make a recovery attempt extremely complex and time-consuming; however, in many cases data recovery can be very successful.

Prepare for drive failure

Unfortunately, drives can (and will) fail at some point in their lifetime. If a failure involves an individual drive (assuming it is a RAID 1 or greater), the faulty drive can simply be replaced with a new one and the data can be rebuilt onto it with zero data loss. However, if drive failures exceed the redundancy capacity of the RAID, a professional RAID data recovery specialist should be contacted immediately for the best chance of recovering your data. It is imperative to make sure that your chosen provider has the tools and expertise to recover from any configuration or data loss situation. You should also assess whether they have direct partnerships with storage vendors and development capabilities for accommodating new or custom configurations.
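As a rough illustration of the difference between a rebuildable failure and one that exceeds the array's redundancy, the sketch below rebuilds a replaced drive in a small single-parity array. It assumes XOR parity, which this post does not spell out, and the names `xor_blocks` and `rebuild_disk` and the sample data are invented for the example. Note that the rebuild does not need to know which block in each stripe is parity: the missing block is always the XOR of the surviving ones.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def rebuild_disk(disks):
    """Rebuild the single missing disk (marked as None) in a single-parity array.

    Each disk is a list of equal-length byte blocks, one block per stripe.
    """
    missing = [index for index, disk in enumerate(disks) if disk is None]
    if len(missing) != 1:
        raise ValueError("single parity can only rebuild exactly one missing disk")
    survivors = [disk for disk in disks if disk is not None]
    # For every stripe, XOR the surviving blocks to recreate the lost block.
    return [xor_blocks(stripe) for stripe in zip(*survivors)]


# A three-disk array with two stripes and parity rotating between disks.
p0 = xor_blocks([b"AAAA", b"BBBB"])  # parity for stripe 0
p1 = xor_blocks([b"CCCC", b"DDDD"])  # parity for stripe 1
healthy = [
    [b"AAAA", p1],
    [b"BBBB", b"CCCC"],
    [p0, b"DDDD"],
]

# The second disk fails and is swapped for a blank replacement.
degraded = [healthy[0], None, healthy[2]]
rebuilt = rebuild_disk(degraded)
assert rebuilt == healthy[1]
```

If a second disk is marked as `None`, `rebuild_disk` raises an error instead of guessing, which mirrors the situation where the failure exceeds what the array can repair on its own.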

Have you recently experienced RAID data loss? Contact the experts at Ontrack for help retrieving your mission-critical data. Our staff is standing by 24/7/365 to help with your needs, from everyday to once-in-a-lifetime data loss scenarios.