Is it possible to shrink an ext4 filesystem and the underlying RAID 5 array in a safe way?
I would like to shrink my 15 TB / 6 drive RAID 5 array that contains an ext4 filesystem.
Before I do that on the live system, I decided to try it in a test environment first. I wrote a script that simulates the raid+filesystem lifecycle (assemble, mkfs, resize2fs, shrink, ...), but in some cases it corrupts the filesystem. I ran the script on two different distros (one of them was CentOS 8).
I tried to understand the failures and, unless I am missing something, mdadm knows nothing about the ext4 filesystem during the RAID shrink process (`mdadm --grow`), and it seems there is no way to make the tool behave properly.
In my scenario, the script simulates the following process:

- selects a random number `num_devices` (between 5 and 10) - this determines the number of devices in the test array
- selects a random number `device_size` (between 300 and 350) - the size (in MiB) of a single device
- creates and assembles /dev/md0 - a RAID 5 array (in my case with 0.90 metadata) - the size of the array is `array_size=($num_devices-1)*$device_size`
- creates ext4 filesystem on /dev/md0 and mounts it to /mnt
- copies a reference file (in my case one of the kernel images from /boot) `$num_devices` times to /mnt (to have some data for validating filesystem integrity) - the filesystem has about 80% free space available
- the filesystem is unmounted, fscked (`e2fsck -f`), then shrunk (either `resize2fs -M` for minimum size, or `resize2fs /dev/md0 {calculated_size}`), and fscked again
- the script waits for the mdadm rebuild process to finish (by watching /proc/mdstat)
- the new array size is calculated: `new_array_size=($num_devices-2)*$device_size`
- a hard disk failure is simulated by `mdadm --manage /dev/md0 --fail /dev/loop3` followed by `mdadm --manage /dev/md0 --remove /dev/loop3`
- the script waits for the reshape process to finish
Once the reshape process is finished, /dev/loop3 is marked as removed and another loop device (e.g. /dev/loop2) is marked as spare.
- the script determines the spare and re-adds it to the array (`mdadm --manage /dev/md0 --remove /dev/loop2` followed by `mdadm --manage /dev/md0 --add /dev/loop2`)
- the script waits for the RAID rebuild to finish (watching /proc/mdstat)
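For reference, the bookkeeping in the steps above can be sketched like this (the concrete numbers and the loop-device setup are my assumptions here, not taken verbatim from the actual script):

```shell
# Sketch of the test setup; in the real script num_devices and device_size
# are chosen at random, here they are fixed for illustration.
num_devices=6
device_size=320                                        # MiB per member device
array_size=$(( (num_devices - 1) * device_size ))      # RAID 5 capacity: (n-1) * size
new_array_size=$(( (num_devices - 2) * device_size ))  # target capacity after shrink
echo "before: ${array_size} MiB, after: ${new_array_size} MiB"

# The device/array setup itself needs root, so it is only illustrated here:
# for i in $(seq 0 $(( num_devices - 1 ))); do
#     truncate -s "${device_size}M" "disk${i}.img"
#     losetup "/dev/loop${i}" "disk${i}.img"
# done
# mdadm --create /dev/md0 --level=5 --metadata=0.90 \
#       --raid-devices="$num_devices" /dev/loop{0..5}
# mkfs.ext4 /dev/md0 && mount /dev/md0 /mnt
```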
At this point the corruption shows up:
- filesystem is mounted again at /mnt
- the md5 checksum comparison between the reference file and the copies on the shrunk filesystem either succeeds, or fails for 1-2 files
- the filesystem is unmounted, fscked (`e2fsck -f`), grown back to maximum (`resize2fs`), and fscked again - the corruption is still present
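For completeness, the kind of integrity check that detects the corruption can be sketched as follows (the paths are assumptions; in the real test the copies live on the shrunken filesystem mounted at /mnt):

```shell
# Build a reference file and some copies, then verify every copy's md5sum
# against the reference, the same way the script validates /mnt.
workdir=$(mktemp -d)
mkdir "$workdir/mnt"
head -c 4096 /dev/urandom > "$workdir/reference"
for i in 1 2 3; do
    cp "$workdir/reference" "$workdir/mnt/copy$i"
done

ref_sum=$(md5sum "$workdir/reference" | cut -d' ' -f1)
status=ok
for f in "$workdir"/mnt/copy*; do
    sum=$(md5sum "$f" | cut -d' ' -f1)
    if [ "$sum" != "$ref_sum" ]; then
        echo "checksum mismatch: $f"
        status=corrupt
    fi
done
echo "result: $status"
rm -rf "$workdir"
```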
Am I doing something wrong, or is the RAID 5 shrink process really unsupported? Or is the 0.90 metadata format the reason?