Testing Bit Rot

One of ZFS’ most compelling features is its detection of bit rot. Though rare, rot is almost impossible to detect until well after the event occurs and rotating backups have propagated the error.

One ends up with photos where the top half is fine but the bottom is semi-random junk. Or glitches in home movies.

ZFS checksums everything – your data and its own metadata – and verifies that the data on disk matches the checksum on every access.

If you have mirroring or RAIDZ, then not only can ZFS tell you about the error, it can reconstruct the correct data from the remaining good copies or parity and overwrite the bad data.

Bit rot, or data degradation, happens on hard drives via humidity, cosmic rays (we are bathed in the light of 150 million stars, one of them quite close), and loss of magnetism. Hard drives have a limited lifetime, and individual disk sectors can go bad before the entire drive dies. Manufacturers publish error rates for their drives – perhaps one unrecoverable error per several terabytes read – and nowadays multi-terabyte drives are common; multiply the rate by your amount of data, give it enough time, and sooner or later you will experience it. ZFS to the rescue!
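
As a rough worked example (the exact figure varies by model, so treat the number as an assumption), consumer drives commonly quote an unrecoverable read error rate of about 1 in 10^14 bits, which works out to one bad bit per roughly 12.5 TB read:

# back-of-the-envelope: 10^14 bits per error, converted to gigabytes read
echo $(( 10**14 / 8 / 10**9 ))   # 12500 GB, i.e. about 12.5 TB per expected error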

The Setup

Let’s create a RAIDZ1 zpool from three 200 MB files.

truncate -s 200M file1
truncate -s 200M file2
truncate -s 200M file3
sudo zpool create mypool raidz1 \
  /home/graham/Documents/test/file1 \
  /home/graham/Documents/test/file2 \
  /home/graham/Documents/test/file3
sudo chown -R graham:graham /mypool

Full paths to the backing files must be given to zpool create, and we need to take ownership of the pool’s mount point so we can create files in it as an ordinary user.
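
The pool is created and mounted at /mypool straight away, which zfs list will confirm:

sudo zfs list mypool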

We can check the zpool is healthy:

$ sudo zpool status 
  pool: mypool
 state: ONLINE
  scan: none requested
config:

	NAME                                   STATE     READ WRITE CKSUM
	mypool                                 ONLINE       0     0     0
	  raidz1-0                             ONLINE       0     0     0
	    /home/graham/Documents/test/file1  ONLINE       0     0     0
	    /home/graham/Documents/test/file2  ONLINE       0     0     0
	    /home/graham/Documents/test/file3  ONLINE       0     0     0

errors: No known data errors
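
As an aside, we can also see which checksum algorithm the pool’s dataset uses – it should report on (the default), which in current ZFS on Linux means fletcher4:

sudo zfs get checksum mypool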

The Test

Let’s create a small text file:

echo abcdefghijk > /mypool/a.txt

That gives us text that’s easily found in the pool’s data files:

$ grep abcde file*
Binary file file1 matches
Binary file file3 matches

ZFS has written the data to two of the three storage devices: RAIDZ1 stores one parity block alongside the data, and a file this small needs only a single data block plus its parity.

Cosmic Ray Simulation!

We can flip some bits if we first offline one of the drives:

sudo zpool offline mypool /home/graham/Documents/test/file1
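
With a device offline the pool keeps working, but zpool status should now report a DEGRADED state with file1 marked OFFLINE:

sudo zpool status mypool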

Now load up file1 in wxHexEditor, press Ctrl-F, and search for abcd.


Type over abcd to change it to anything you like.


You’ll need to change the file mode to “Overwrite”, and then you can File->Save.
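
If you’d rather skip the GUI, here is a rough command-line equivalent – a sketch assuming GNU grep and coreutils dd, and that the string occurs only once in file1:

# find the byte offset of our string in file1, then overwrite it in place;
# conv=notrunc stops dd truncating the rest of the file
offset=$(grep -abo 'abcdefghijk' file1 | head -n1 | cut -d: -f1)
printf 'XXXXXXXXXXX' | dd of=file1 bs=1 seek="$offset" conv=notrunc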

Once the change is saved, file1 no longer contains the correct data:

$ grep abcde file*
Binary file file3 matches

Online file1 again, so the zpool is whole:

sudo zpool online mypool /home/graham/Documents/test/file1

Results – Catastrophe?

ZFS hasn’t read the damaged blocks yet, so it doesn’t yet realize there is an error. (The small resilver in the status below merely caught the device up on anything written while it was offline; it doesn’t verify existing data.)

$ sudo zpool status 
  pool: mypool
 state: ONLINE
  scan: resilvered 2.50K in 0h0m with 0 errors on Sun May 24 10:30:30 2015
config:

	NAME                                   STATE     READ WRITE CKSUM
	mypool                                 ONLINE       0     0     0
	  raidz1-0                             ONLINE       0     0     0
	    /home/graham/Documents/test/file1  ONLINE       0     0     0
	    /home/graham/Documents/test/file2  ONLINE       0     0     0
	    /home/graham/Documents/test/file3  ONLINE       0     0     0

errors: No known data errors

Though we introduced a data error, ZFS still returns the correct data:

$ cat /mypool/a.txt 
abcdefghijk

No sign of any error! In my case a status report still shows no errors, perhaps because the data came from the cache, or because ZFS happened to read the good copy from file3, which we didn’t change.

We can force ZFS to verify the integrity of every block with the scrub command:

$ sudo zpool scrub mypool 
$ sudo zpool status 
  pool: mypool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
	attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
	using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: scrub repaired 512 in 0h0m with 0 errors on Sun May 24 10:34:42 2015
config:

	NAME                                   STATE     READ WRITE CKSUM
	mypool                                 ONLINE       0     0     0
	  raidz1-0                             ONLINE       0     0     0
	    /home/graham/Documents/test/file1  ONLINE       0     0     1
	    /home/graham/Documents/test/file2  ONLINE       0     0     0
	    /home/graham/Documents/test/file3  ONLINE       0     0     0

errors: No known data errors

ZFS has certainly sat up and noticed now!

Note the changed status and the ‘1’ in the ‘CKSUM’ column for our cosmically zapped file1. ZFS wrote good data over bad (“scrub repaired”):

$ grep abcde file*
Binary file file1 matches
Binary file file3 matches

“Applications are unaffected.” Music to my ears – thanks to RAIDZ1’s parity we had redundancy and ZFS was able to recover our data.

We can safely clear the error. If these were real disks, we would now decide whether to replace the potentially failing drive.
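
As the status output itself suggests, resetting the error counters is a one-liner; on real hardware you would reach for zpool replace instead if the drive looked like it was failing:

sudo zpool clear mypool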

Catastrophe avoided!
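
Once you have finished experimenting, the test pool and its backing files can be removed – note that zpool destroy is irreversible, so double-check the pool name first:

sudo zpool destroy mypool
rm file1 file2 file3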