This guide describes what to do in case of disk failure in a RAID setup.

Guide assumptions

The guide assumes the RAID setup is similar to the one described in HOWTO Gentoo Install on Software RAID mirror and LVM2 on top of RAID:

RAID device   Physical partitions    Mount point
/dev/md0      /dev/hda1 /dev/hdc1    /boot
/dev/md1      /dev/hda3 /dev/hdc3    /
/dev/md2      /dev/hda4 /dev/hdc4    LVM2

/dev/hda2 and /dev/hdc2 are both swap partitions and are not mirrored.

It is also assumed that mdadm is installed.

How to identify disk failure

Automated notification

The preferred way is to set up automatic e-mail notification.
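
As a minimal sketch of what such a setup relies on: mdadm's monitor mode mails alerts to the address given on a MAILADDR line in its configuration file. The helper below is hypothetical (it is not part of mdadm), and both the default path /etc/mdadm.conf and the address used in the usage note are assumptions. It only reports which address is configured, so a missing line is noticed before a disk fails.

```shell
# Hypothetical helper: print the address from the first MAILADDR line
# of an mdadm configuration file, or return non-zero if none is set.
# Assumption: the file is /etc/mdadm.conf unless a path is given
# (some systems use /etc/mdadm/mdadm.conf), and the keyword is
# written in upper case.
mdadm_mailaddr() {
    awk '$1 == "MAILADDR" { print $2; found = 1; exit }
         END { if (!found) exit 1 }' "${1:-/etc/mdadm.conf}"
}
```

With a line such as "MAILADDR admin@example.com" in place (a placeholder address), running mdadm --monitor --scan --oneshot --test as root should send one test message per array. The monitor still has to run permanently for real alerts, typically through the distribution's mdadm init script.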

Manual checking

Check the status of our arrays with cat /proc/mdstat:

Code: example output
Personalities : [raid1] 
md1 : active raid1 hdc3[0]
      10008384 blocks [2/1] [U_]

md2 : active raid1 hdc4[0]
      145669312 blocks [2/1] [U_]

md0 : active raid1 hdc1[0]
      104320 blocks [2/1] [U_]
unused devices: <none>

Each [U_] indicates that one of the two disks in the array is down (the underscore marks the missing member). Here the hdc partitions are present and the hda partitions are missing from all three arrays. If hda is actually faulty, power down, replace it with a drive of equal or larger capacity, boot up, and partition the new disk as described below (or hot-swap the drive if the hardware supports it). The arrays are not expected to start rebuilding onto a blank replacement disk by themselves after the reboot, so its partitions have to be added to the arrays manually.
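
The manual check can be scripted. The following is a minimal sketch, assuming the /proc/mdstat layout shown above; the function name degraded_arrays is made up for illustration. It prints the name of every array whose status field contains an underscore.

```shell
# Hypothetical helper: list md arrays that have a missing member.
# Reads /proc/mdstat by default, or the file given as first argument.
degraded_arrays() {
    awk '
        # Remember the array name from lines such as "md1 : active raid1 ...".
        /^ *md[0-9]+ *:/ { name = $1 }
        # A status field like [U_] or [_U] contains at least one underscore.
        /\[[U_]*_[U_]*\]/ { print name }
    ' "${1:-/proc/mdstat}"
}
```

Called without arguments on the system from the example output, degraded_arrays would list md1, md2 and md0. An empty result means no array reports a missing member.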

Repairing the RAID setup

Recreate partitions on current or new disk

After verifying that the current disk is functional, or replacing it, rebuild the partition table on it:

sfdisk -d <source device> | sfdisk <destination device>

where <source device> is a disk currently in the RAID, and <destination device> is the verified or new disk.
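
Because swapping the two arguments would overwrite the partition table of the healthy disk, it can help to wrap the copy in a small function. This is only a sketch: the name copy_partition_table is hypothetical, and the guard merely rejects missing or identical arguments. It cannot tell which disk is the healthy one.

```shell
# Hypothetical wrapper around the sfdisk copy.
# Usage: copy_partition_table <source device> <destination device>
copy_partition_table() {
    src=$1
    dst=$2
    # Refuse to run with missing arguments or with source == destination.
    if [ -z "$src" ] || [ -z "$dst" ] || [ "$src" = "$dst" ]; then
        echo "usage: copy_partition_table <source device> <destination device>" >&2
        return 1
    fi
    # Dump the source layout to a file first, so it can be inspected later.
    dump=$(mktemp) || return 1
    sfdisk -d "$src" > "$dump" || return 1
    echo "partition dump saved in $dump" >&2
    # Write the dumped layout to the destination disk.
    sfdisk "$dst" < "$dump"
}
```

With the layout assumed in this guide, the call would be copy_partition_table /dev/hdc /dev/hda, run as root. Since the swap partitions are not mirrored, a brand-new disk would also need its swap area initialized, for example with mkswap /dev/hda2.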

Resynchronizing the Arrays

Add the disk's partitions to the RAID arrays:

Code: Adding partitions to existing RAID
# mdadm --add /dev/md0 /dev/hda1
# mdadm --add /dev/md1 /dev/hda3
# mdadm --add /dev/md2 /dev/hda4

Check the rebuild status with watch -n 6 cat /proc/mdstat:

Code: Sample output
 Every 6.0s: cat /proc/mdstat                            Fri Oct 13 15:39:22 2006
 Personalities : [raid1]
 md1 : active raid1 hda3[1] hdc3[0]
       10008384 blocks [2/2] [UU]
 md2 : active raid1 hda4[2] hdc4[0]
       145669312 blocks [2/1] [U_]
       [>....................]  recovery =  4.3% (6329664/145669312) finish=43.7min speed=53078K/sec
 md0 : active raid1 hda1[1] hdc1[0]
       104320 blocks [2/2] [UU]
 unused devices: <none>

Warning: Do not reboot until synchronization is complete.
Note: If possible, reboot the system after synchronization is complete to verify the RAID is set up correctly.
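
To avoid rebooting too early, the end of the synchronization can be waited for in a script instead of watching the output by hand. The sketch below assumes the /proc/mdstat format shown in the sample output; the function names resync_active and wait_for_resync are made up for illustration.

```shell
# Hypothetical helper: succeed while any array reports a rebuild in progress.
# Reads /proc/mdstat by default, or the file given as first argument.
resync_active() {
    grep -Eq '(recovery|resync|reshape) *=' "${1:-/proc/mdstat}"
}

# Hypothetical helper: block until no rebuild is reported any more.
# $1: mdstat file (default /proc/mdstat), $2: poll interval in seconds (default 6).
wait_for_resync() {
    while resync_active "$1"; do
        sleep "${2:-6}"
    done
}
```

For example, wait_for_resync && echo "synchronization complete" returns once no recovery line is left. mdadm also offers a --wait option for a single array (for example mdadm --wait /dev/md2), which may be preferable where it is available.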
