I upgrade the disks in my servers a lot, and this often means replacing three or four drives at a time. Throwing the old drives out would be a huge waste, so I bring them back to my office and put them in a separate Linux file server with a ton of drive bays. I wrote about the file server previously.

In the file server, I configure the drives into multiple RAID 5 volumes. Right now, I have 3 RAID volumes, each with four drives. Yesterday, one of the disks in an older volume went bad, so that volume is currently running degraded on 3 out of 4 drives. No data loss yet, which is good. Since this is an older RAID volume, I’ve decided not to replace the failed drive. Instead, I’ll just shrink the RAID from 4 disks to 3. It was quite a hassle to figure out how to do this by researching online, so I thought I would document the entire process here, step by step, to save other people some time in the future. It should go without saying that you should have a recent backup of everything on the volume you are about to change.

  1. Make sure the old disk really is removed from the array. The device name shouldn’t show up in /proc/mdstat, and mdadm --detail should list its slot as “removed”. If not, mark the device as failed with mdadm --fail and then take it out of the array with mdadm --remove.
    # cat /proc/mdstat
    Personalities : [linear] [multipath] [raid0] [raid1] [raid6]... 
    md0 : active raid5 sdh2[1] sdj2[0] sdi2[3]
          1452572928 blocks level 5, 64k chunk, algorithm 2 [4/3] [UU_U]
          unused devices: <none>
    # mdadm --detail /dev/md0
            Version : 0.90
      Creation Time : Wed Apr  8 12:24:35 2009
         Raid Level : raid5
         Array Size : 1452572928 (1385.28 GiB 1487.43 GB)
      Used Dev Size : 484190976 (461.76 GiB 495.81 GB)
       Raid Devices : 4
      Total Devices : 3
    Preferred Minor : 0
        Persistence : Superblock is persistent
        Update Time : Tue Aug 16 13:33:25 2011
              State : clean, degraded
     Active Devices : 3
    Working Devices : 3
     Failed Devices : 0
      Spare Devices : 0
             Layout : left-symmetric
         Chunk Size : 64K
               UUID : 02f177d1:cb919a65:cb0d4135:3973d77d
             Events : 0.323834
        Number   Major   Minor   RaidDevice State
           0       8      146        0      active sync   /dev/sdj2
           1       8      114        1      active sync   /dev/sdh2
           2       0        0        2      removed
           3       8      130        3      active sync   /dev/sdi2
  2. Unmount the filesystem:
    # umount /dev/md0
  3. Run fsck on the filesystem:
    # e2fsck -f /dev/md0
  4. Shrink the filesystem so that it is comfortably smaller than the array will be once the disk is removed. Here I resized the filesystem to 800 GB, which leaves plenty of breathing room inside a RAID 5 of three 500 GB drives. We’ll expand the filesystem to fill the array again later.
    # resize2fs /dev/md0 800G
  5. Now we need to actually reconfigure the array to use one fewer disk. This takes two steps: first truncate the array to its new size, then reshape it onto the smaller number of disks. To find out how big the new array needs to be, ask mdadm for the reshape directly (replace -n3 with the number of disks in the new array). It refuses the change, but the message reports the size to use:
    # mdadm --grow -n3 /dev/md0
    mdadm: this change will reduce the size of the array.
           use --grow --array-size first to truncate array.
           e.g. mdadm --grow /dev/md0 --array-size 968381952
  6. This gives our new size as 968381952 (in KiB). Use this to resize the array:
    # mdadm --grow /dev/md0 --array-size 968381952
  7. Now that the array has been truncated, we set it to reside on one fewer disk:
    # mdadm --grow -n3 /dev/md0 --backup-file /root/mdadm.backup
  8. Check to make sure the array is rebuilding. You should see something like this:
    # cat /proc/mdstat
    Personalities : [linear] [multipath] [raid0] [raid1] [raid6]... 
    md0 : active raid5 sdh2[1] sdj2[0] sdi2[3]
          968381952 blocks super 0.91 level 5, 64k chunk, algorithm 2 [3/2] [UU_]
          [>....................]  reshape =  1.8% (9186496/484190976) 
                                      finish=821.3min speed=9638K/sec
  9. At this point, you probably want to wait until the array finishes rebuilding. However, Linux software RAID is smart enough to figure things out if you don’t want to wait. Run fsck again before expanding your filesystem back to its maximum size (resize2fs requires this).
    # e2fsck -f /dev/md0
  10. Now do the actual expansion so the filesystem uses the complete RAID volume (resize2fs will use the max size if a size isn’t specified):
    # resize2fs /dev/md0
  11. (Optional) Run fsck one last time to make sure everything is still sane:
    # e2fsck -f /dev/md0
  12. Finally, remount the filesystem:
    # mount /dev/md0
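
The sizes in steps 4 through 6 can be sanity-checked by hand before touching the array. The sketch below is only an illustration, not part of the procedure: the helper names raid5_array_size and fs_fits are hypothetical, and it assumes that mdadm reports sizes in KiB and that resize2fs treats the G suffix as GiB. A RAID 5 array gives up one disk's worth of space to parity, so the new size should be the "Used Dev Size" from mdadm --detail multiplied by the new disk count minus one.

```shell
#!/bin/sh
# Sketch: size arithmetic for shrinking a RAID 5 array.
# All array sizes are in KiB (assumed to be the unit mdadm uses for
# "Used Dev Size" and --array-size).

# raid5_array_size DEV_SIZE_KIB NUM_DISKS   (hypothetical helper)
# RAID 5 spends one disk's worth of space on parity, so the usable
# capacity is DEV_SIZE_KIB * (NUM_DISKS - 1).
raid5_array_size() {
    echo $(( $1 * ($2 - 1) ))
}

# fs_fits FS_SIZE_GIB ARRAY_SIZE_KIB   (hypothetical helper)
# Succeeds if a filesystem of FS_SIZE_GIB fits inside the array.
# Assumes a "G" size passed to resize2fs means GiB (1024 * 1024 KiB).
fs_fits() {
    [ $(( $1 * 1024 * 1024 )) -le "$2" ]
}

# Values taken from the mdadm --detail output in step 1.
new_size=$(raid5_array_size 484190976 3)
echo "new array size: $new_size KiB"

if fs_fits 800 "$new_size"; then
    echo "an 800G filesystem fits"
else
    echo "an 800G filesystem is too large; shrink it further first"
fi
```

With the numbers from this array, the script prints a new size of 968381952 KiB, which matches what mdadm suggested in step 5, and confirms that the 800G filesystem from step 4 fits inside it. If the two numbers ever disagree, trust mdadm's suggestion over this arithmetic.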

Everything went smoothly for me while going through this process. I could have just destroyed the entire old array and created a new one, but this process was easier and I didn’t have to move a bunch of data around. If you are using a larger array, going from 10 disks to 9 or something along those lines, the benefits of this process are even greater.