One of ZFS’s most compelling features is its detection of bit rot. Though rare, rot is almost impossible to detect until well after the event occurs, by which point rotating backups have propagated the error.
One ends up with photos where the top half is fine but the bottom is semi-random junk. Or glitches in home movies.
ZFS checksums everything – your data and its own metadata – and verifies on every access that what it reads from disk matches the checksum.
If you have mirroring or RAIDZ, then not only can ZFS tell you about the error, it can pull the correct data from a good disk and overwrite the bad data.
Bit rot, or data degradation, happens on hard drives via humidity, cosmic rays (we are bathed in the light of 150 million stars, one of them quite close), and loss of magnetism. Hard drives have a limited lifetime, and individual disk sectors can go bad long before the entire drive dies. Manufacturer specifications quote an unrecoverable read error rate – typically on the order of one bad bit per 10^14 bits read, or roughly one error per dozen terabytes. Multi-terabyte drives are now commonplace; multiply the rate by your amount of data, give it enough time, and sooner or later you will experience it. ZFS to the rescue!
The Setup
- Linux Mint – I’m using version 17.
- ZFS-on-Linux installed.
- No need for special hardware – we’ll fake this all in the local filesystem.
- A binary file editor – I use wxHexEditor.
Let’s create a RAIDZ1 zpool backed by 3 files.
truncate -s 200M file1
truncate -s 200M file2
truncate -s 200M file3
sudo zpool create mypool raidz1 \
    /home/graham/Documents/test/file1 \
    /home/graham/Documents/test/file2 \
    /home/graham/Documents/test/file3
sudo chown -R graham:graham /mypool
Full paths to the files must be given to zpool create, and we need to take ownership of the pool’s mountpoint (/mypool) so our ordinary user can create files in it.
We can check the zpool is healthy:
$ sudo zpool status
  pool: mypool
 state: ONLINE
  scan: none requested
config:

        NAME                                   STATE     READ WRITE CKSUM
        mypool                                 ONLINE       0     0     0
          raidz1-0                             ONLINE       0     0     0
            /home/graham/Documents/test/file1  ONLINE       0     0     0
            /home/graham/Documents/test/file2  ONLINE       0     0     0
            /home/graham/Documents/test/file3  ONLINE       0     0     0

errors: No known data errors
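As an aside, ZFS checksums every block with fletcher4 by default (the property value ‘on’); the algorithm is an ordinary dataset property you can inspect, or change for new writes, with the zfs command. A quick look at our pool – the sha256 line is optional and not needed for this walkthrough:

zfs get checksum mypool              # reports 'on', i.e. the fletcher4 default
sudo zfs set checksum=sha256 mypool  # optional: have new writes checksummed with sha256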
The Test
Let’s create a small text file:
echo abcdefghijk > /mypool/a.txt
That gives us text that’s easily found in the pool’s data files:
$ grep abcde file*
Binary file file1 matches
Binary file file3 matches
The string shows up on 2 of the 3 storage devices. We’re using RAIDZ1, which stores one parity block per stripe (like RAID 5) rather than a full mirror; for a write this tiny the stripe holds a single data sector, so its parity is effectively a copy of the data, and the pool can survive the loss of any one device.
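As an optional extra check (my own addition, not required for the test), note the file’s hash now; after the corruption and repair below, the same command should print an identical hash:

md5sum /mypool/a.txt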
Cosmic Ray Simulation!
We can flip some bits if we first offline one of the drives:
sudo zpool offline mypool /home/graham/Documents/test/file1
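If you check the pool now, it should report itself as DEGRADED, with file1 marked OFFLINE (output omitted here):

sudo zpool status mypool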
Now load up file1 in wxHexEditor, press Ctrl-F, and search for abcd.
Type over abcd to change it to anything you like.
You’ll need to change the file mode to “Overwrite”, so you can then File->Save.
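If you’d rather stay on the command line, something like this sed one-liner does the same job – a sketch that assumes GNU sed and deliberately replaces the string with one of identical length so the file size doesn’t change:

sed -i 's/abcdefghijk/XXXXXXXXXXX/' file1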

Once saved, file1 no longer contains the correct data:
$ grep abcde file*
Binary file file3 matches
Online file1 again, so the zpool is whole:
sudo zpool online mypool /home/graham/Documents/test/file1
Results – Catastrophe?
ZFS hasn’t read the affected data yet, so it doesn’t yet realize anything is wrong (the small resilver below only re-syncs what was written while the device was offline – here a little pool metadata – not blocks already on disk, so our tampering slipped past it):
$ sudo zpool status
  pool: mypool
 state: ONLINE
  scan: resilvered 2.50K in 0h0m with 0 errors on Sun May 24 10:30:30 2015
config:

        NAME                                   STATE     READ WRITE CKSUM
        mypool                                 ONLINE       0     0     0
          raidz1-0                             ONLINE       0     0     0
            /home/graham/Documents/test/file1  ONLINE       0     0     0
            /home/graham/Documents/test/file2  ONLINE       0     0     0
            /home/graham/Documents/test/file3  ONLINE       0     0     0

errors: No known data errors
Though we introduced a data error, ZFS gives good data:
$ cat /mypool/a.txt
abcdefghijk
No sign of any error! In my case, a status report still shows no errors, perhaps because the data came from a cache or because it read the data from file3, which we didn’t change.
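To rule out the cache, one option (a sketch, not needed for this test) is to export and re-import the pool, which discards its cached data; with file-backed vdevs the import has to be told which directory to search:

sudo zpool export mypool
sudo zpool import -d /home/graham/Documents/test mypool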
We can force ZFS to check integrity with the scrub command:
$ sudo zpool scrub mypool
$ sudo zpool status
  pool: mypool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: scrub repaired 512 in 0h0m with 0 errors on Sun May 24 10:34:42 2015
config:

        NAME                                   STATE     READ WRITE CKSUM
        mypool                                 ONLINE       0     0     0
          raidz1-0                             ONLINE       0     0     1
            /home/graham/Documents/test/file1  ONLINE       0     0     1
            /home/graham/Documents/test/file2  ONLINE       0     0     0
            /home/graham/Documents/test/file3  ONLINE       0     0     0

errors: No known data errors
ZFS has certainly sat up and noticed now!
Note the changed status and the ‘1’ in the ‘CKSUM’ column for our cosmically zapped file1. ZFS wrote good data over bad (“scrub repaired”):
$ grep abcde file*
Binary file file1 matches
Binary file file3 matches
“Applications are unaffected.” Music to my ears – thanks to RAIDZ1’s parity we had redundancy, and ZFS was able to recover our data.
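In real deployments you wouldn’t wait to scrub by hand; a periodic cron job is the usual approach. A minimal sketch (the zpool path and schedule are assumptions – adjust for your system):

# /etc/cron.d/zfs-scrub – scrub at 02:00 on the first of each month
0 2 1 * * root /sbin/zpool scrub mypool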
We can safely clear the error. If these were real disks, we would now decide whether to replace the potentially failing drive.
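Clearing takes one command, and replacing a failing disk is barely more – the device names in the comment are hypothetical placeholders:

sudo zpool clear mypool
# with real hardware, something like:
# sudo zpool replace mypool /dev/disk/by-id/ata-OLD-DISK /dev/disk/by-id/ata-NEW-DISK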
Catastrophe avoided!
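Once you’ve finished experimenting, the toy pool can be torn down and its backing files removed (assuming there’s nothing in it you want to keep):

sudo zpool destroy mypool
rm /home/graham/Documents/test/file1 /home/graham/Documents/test/file2 /home/graham/Documents/test/file3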

