Replacing a drive

I’ll replace one of the 3TB drives with a 4TB drive. This allows for a size upgrade eventually, once all the drives have been replaced with 4TB drives. It also means I’m rotating out older drives for new ones. I added a sticker to the drive to show the date, so future me can see which drives are oldest.

With the old drive removed, zpool status shows the pool as “degraded” with a drive missing.


NAME                     STATE     READ WRITE CKSUM
tank                     DEGRADED     0     0     0
  raidz2-0               DEGRADED     0     0     0
    ada0                 ONLINE       0     0     0
    ada1                 ONLINE       0     0     0
    ada2                 ONLINE       0     0     0
    ada3                 ONLINE       0     0     0
    ada5                 ONLINE       0     0     0
    9875896178717210589  UNAVAIL      0     0     0  was /dev/ada6

Plugging in the new drive makes no change here, so it’s off to NAS4Free’s “Disks -> Management” screen. It shows a warning that the physical devices have changed, and says to import disks with the “clear configuration” option enabled. Do that, then Apply Changes. The disk is now listed normally, but with the Filesystem marked “unknown or unformatted”.

Now to the “ZFS -> Tools” screen. Select ‘replace a device’. Select the pool, tap next. Select “ada6” and tap next. It ran “zpool replace 'tank' '/dev/ada6'”, and now the status shows it resilvering the new drive.


  pool: tank
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Fri Jan 19 16:29:06 2018
        144G scanned out of 5.07T at 538M/s, 2h40m to go
        24.0G resilvered, 2.77% done
config:

NAME                       STATE     READ WRITE CKSUM
tank                       DEGRADED     0     0     0
  raidz2-0                 DEGRADED     0     0     0
    ada0                   ONLINE       0     0     0
    ada1                   ONLINE       0     0     0
    ada2                   ONLINE       0     0     0
    ada3                   ONLINE       0     0     0
    ada5                   ONLINE       0     0     0
    replacing-5            UNAVAIL      0     0     0
      9875896178717210589  UNAVAIL      0     0     0  was /dev/ada6/old
      ada6                 ONLINE       0     0     0  (resilvering)
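The “2h40m to go” estimate can be sanity-checked from the other numbers on the same line. Here is a small sketch of the arithmetic, assuming the T, G and M suffixes are binary units (powers of 1024). The function name resilver_eta is made up for this example.

```shell
# Estimate the time left in a resilver from the figures zpool status prints.
# Assumption: T, G and M are binary units (powers of 1024).
resilver_eta() {
    # $1 = total to scan in T, $2 = scanned so far in G, $3 = rate in M/s
    awk -v total_t="$1" -v scanned_g="$2" -v rate_m="$3" 'BEGIN {
        remaining_m = total_t * 1024 * 1024 - scanned_g * 1024
        secs = remaining_m / rate_m
        printf "%dh%dm\n", int(secs / 3600), int((secs % 3600) / 60)
    }'
}

resilver_eta 5.07 144 538   # figures from the status output above: prints 2h40m
```

That agrees with the estimate in the status output, though the real figure will move around as the scan rate changes.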

Testing Bit Rot

One of ZFS’ most compelling features is detection of bit rot. Though rare, rot is almost impossible to detect until well after the event occurs and rotating backups have propagated the error.

One ends up with photos where the top half is fine but the bottom is semi-random junk. Or glitches in home movies.

ZFS checksums everything – your data, its own metadata – and verifies that the data on disk matches the checksum on every read.

If you have mirroring or RAIDZ, then not only can ZFS tell you about the error, it can pull the correct data from a good disk and overwrite the bad data.
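The detect-and-repair idea can be mimicked with ordinary files. The following is only a toy model using cksum and two copies of a file, not a description of how ZFS lays out data; the file names and the read_with_repair function are invented for the illustration.

```shell
# Toy model of checksum-based self-healing, using plain files and cksum.
# This is not how ZFS stores data; it only illustrates the idea.
workdir=$(mktemp -d)
echo "abcdefghijk" > "$workdir/copy1"
cp "$workdir/copy1" "$workdir/copy2"

# Record the checksum at write time.
good_sum=$(cksum < "$workdir/copy1")

# Simulate rot in the first copy.
echo "abcdefghijX" > "$workdir/copy1"

# On read, verify each copy against the stored checksum and repair a bad one.
read_with_repair() {
    if [ "$(cksum < "$workdir/copy1")" = "$good_sum" ]; then
        cat "$workdir/copy1"
    elif [ "$(cksum < "$workdir/copy2")" = "$good_sum" ]; then
        cp "$workdir/copy2" "$workdir/copy1"   # overwrite bad data with good
        cat "$workdir/copy1"
    else
        echo "both copies are damaged" >&2
        return 1
    fi
}

result=$(read_with_repair)
echo "$result"
rm -r "$workdir"
```

The reader gets the original text back even though the first copy was damaged, and the damaged copy is rewritten along the way.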

Bit rot, or data degradation, happens on hard drives via humidity, cosmic rays (we are bathed in the light of 150 million stars, one of them quite close), and loss of magnetism. Hard drives have a limited lifetime, and individual disk sectors can go bad before the entire drive dies. Manufacturers’ specifications give error rates for their drives. This might be one error in several terabytes read, but nowadays terabyte drives are common; multiply the rate by your amount of data, give it enough time, and sooner or later you will experience it. ZFS to the rescue!
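To put a rough number on “multiply the rate by your amount of data”, here is a sketch of the arithmetic. The rate of one unrecoverable error per 10^14 bits read is an assumption (a figure often quoted on consumer drive datasheets), not something taken from this article, and the function name is made up.

```shell
# Expected number of unrecoverable read errors for a given amount of data read.
# Assumption: an error rate of 1 per 10^14 bits read, a figure often quoted
# on consumer drive datasheets; it is not taken from this article.
expected_read_errors() {
    # $1 = data read in TB (decimal, 10^12 bytes), $2 = exponent of bits per error
    awk -v tb="$1" -v rate_exp="$2" 'BEGIN {
        bits = tb * 1e12 * 8
        printf "%.2f\n", bits / (10 ^ rate_exp)
    }'
}

expected_read_errors 4 14    # one full read of a 4TB drive: prints 0.32
```

Under that assumed rate, reading a 4TB drive end to end a handful of times would be expected to hit an error or so. Real drives vary, so treat this as an order-of-magnitude guide only.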

The Set Up

Let’s create a zpool, RAIDZ1, from 3 files.

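# Three 200M files to stand in for disks (run from /home/graham/Documents/test)
truncate -s 200M file1
truncate -s 200M file2
truncate -s 200M file3
# Build a RAIDZ1 pool named mypool from the three files
sudo zpool create mypool raidz1 \
  /home/graham/Documents/test/file1 \
  /home/graham/Documents/test/file2 \
  /home/graham/Documents/test/file3
# Let the normal user write to the pool's mount point
sudo chown -R graham:graham /mypool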

Full paths to the files must be given to zpool create, and we need to take ownership of the pool’s mount point (/mypool) so we can create files in it without sudo.
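A side note on the backing files: truncate sets a file’s length without writing data, so on most filesystems the 200M files start out sparse and take almost no real space. A quick way to see this, using a scratch directory rather than the test directory above (the variable names are just for this sketch):

```shell
# Create a 200M backing file the same way, in a scratch directory, and compare
# its apparent size with the space it actually occupies.
workdir=$(mktemp -d)
truncate -s 200M "$workdir/file1"

apparent_bytes=$(( $(wc -c < "$workdir/file1") ))
used_kb=$(du -k "$workdir/file1" | cut -f1)

echo "apparent size: $apparent_bytes bytes"
echo "space used:    $used_kb KB"

rm -r "$workdir"
```

The apparent size should be 209715200 bytes. The space used depends on the filesystem, and is typically close to zero until ZFS starts writing into the file.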

We can check the zpool is healthy:

$ sudo zpool status 
  pool: mypool
 state: ONLINE
  scan: none requested
config:

	NAME                                   STATE     READ WRITE CKSUM
	mypool                                 ONLINE       0     0     0
	  raidz1-0                             ONLINE       0     0     0
	    /home/graham/Documents/test/file1  ONLINE       0     0     0
	    /home/graham/Documents/test/file2  ONLINE       0     0     0
	    /home/graham/Documents/test/file3  ONLINE       0     0     0

errors: No known data errors

The Test

Let’s create a small text file:

echo abcdefghijk > /mypool/a.txt

That gives us text that’s easily found in the pool’s data files:

$ grep abcde file*
Binary file file1 matches
Binary file file3 matches

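ZFS is storing the text in 2 of the 3 storage devices. RAIDZ1 doesn’t mirror data; it stores parity. A file this small fits in a single data sector, though, and the parity of a single sector is identical to the sector itself, so the parity most likely accounts for the second match. Either way, any one of the devices can be lost without losing the data.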
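The redundancy in RAIDZ1 is parity, which can be illustrated with XOR on single bytes. The sketch below is a simplification (real RAIDZ works on whole sectors, and the variable names are made up). It rebuilds a lost value from parity, and shows why the parity for a lone data value looks exactly like a copy of it.

```shell
# Single-parity sketch with one byte per "disk", using shell integer XOR.
# Real RAIDZ works on whole sectors and variable-width stripes.

# Case 1: a block with two data columns plus parity.
d1=$(( 0x61 ))            # 'a'
d2=$(( 0x62 ))            # 'b'
parity=$(( d1 ^ d2 ))
# Lose d1, then rebuild it from the surviving column and the parity.
rebuilt_d1=$(( parity ^ d2 ))
printf 'rebuilt d1: 0x%x\n' "$rebuilt_d1"

# Case 2: a block small enough to need only one data column.
# XOR starts from zero, so parity over a single value is the value itself.
only=$(( 0x61 ))
small_parity=$(( 0 ^ only ))
printf 'data 0x%x, parity 0x%x\n' "$only" "$small_parity"
```

Case 2 is the situation a tiny text file is likely to be in, which would explain why grep finds the same string on two devices.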

Cosmic Ray Simulation!

We can flip some bits if we first offline one of the drives:

sudo zpool offline mypool /home/graham/Documents/test/file1

Now load up file1 in wxHexEditor, press Ctrl-F, and search for abcd.

(Screenshot: before edit)

Type over abcd to change it to anything you like.

(Screenshot: after edit)

You’ll need to change the file mode to “Overwrite” before you can File->Save.

(Screenshot: menu)

Once saved, file1 no longer contains the correct data:

$ grep abcde file*
Binary file file3 matches
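If a hex editor isn’t handy, the same kind of in-place overwrite can be scripted with grep and dd. The sketch below works on a throwaway scratch file with made-up contents rather than a real pool member. To use the idea on file1, the device would still need to be offlined first, as above.

```shell
# Overwrite a known string in place with dd, on a scratch file rather than a
# real pool member. The file name and contents here are made up for the demo.
workdir=$(mktemp -d)
scratch="$workdir/scratch.img"
printf 'some padding before abcdefghijk and some after\n' > "$scratch"

# Byte offset of the first match (-a: treat binary as text, -b: print offset,
# -o: print only the match).
offset=$(grep -abo 'abcd' "$scratch" | head -n 1 | cut -d: -f1)

# conv=notrunc keeps the rest of the file; bs=1 lets seek count in bytes.
printf 'ZZZZ' | dd of="$scratch" bs=1 seek="$offset" conv=notrunc 2>/dev/null

after=$(cat "$scratch")
echo "$after"
rm -r "$workdir"
```

Only the four matched bytes change; the file keeps its length and everything around the match is untouched.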

Online file1 again, so the zpool is whole:

sudo zpool online mypool /home/graham/Documents/test/file1

Results – Catastrophe?

ZFS hasn’t accessed any data yet, so it hasn’t noticed the error:

$ sudo zpool status
  pool: mypool
 state: ONLINE
  scan: resilvered 2.50K in 0h0m with 0 errors on Sun May 24 10:30:30 2015
config:

	NAME                                   STATE     READ WRITE CKSUM
	mypool                                 ONLINE       0     0     0
	  raidz1-0                             ONLINE       0     0     0
	    /home/graham/Documents/test/file1  ONLINE       0     0     0
	    /home/graham/Documents/test/file2  ONLINE       0     0     0
	    /home/graham/Documents/test/file3  ONLINE       0     0     0

errors: No known data errors

Though we introduced a data error, ZFS gives good data:

$ cat /mypool/a.txt 
abcdefghijk

No sign of any error! In my case a status report still shows no errors, perhaps because the data came from a cache, or because ZFS read it from file3, which we didn’t change.

We can force ZFS to check integrity with the scrub command:

$ sudo zpool scrub mypool 
$ sudo zpool status 
  pool: mypool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
	attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
	using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: scrub repaired 512 in 0h0m with 0 errors on Sun May 24 10:34:42 2015
config:

	NAME                                   STATE     READ WRITE CKSUM
	mypool                                 ONLINE       0     0     0
	  raidz1-0                             ONLINE       0     0     0
	    /home/graham/Documents/test/file1  ONLINE       0     0     1
	    /home/graham/Documents/test/file2  ONLINE       0     0     0
	    /home/graham/Documents/test/file3  ONLINE       0     0     0

errors: No known data errors

ZFS has certainly sat up and noticed now!

Note the changed status and the ‘1’ in the ‘CKSUM’ column for our cosmically zapped file1. ZFS wrote good data over bad (“scrub repaired”):

$ grep abcde file*
Binary file file1 matches
Binary file file3 matches
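On a pool with more devices, picking the non-zero counters out of the table by eye gets tedious. A small awk filter can do it. This is a sketch that assumes the column layout shown in the status output above, and devices_with_errors is a made-up name.

```shell
# Print devices whose READ, WRITE or CKSUM counter is non-zero.
# Reads `zpool status` text on stdin; assumes the column layout shown above.
devices_with_errors() {
    awk '
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }
        in_config && NF == 0          { in_config = 0 }
        in_config && NF >= 5 && ($3 + $4 + $5) > 0 {
            print $1, "read=" $3, "write=" $4, "cksum=" $5
        }
    '
}

# Sample lines copied from the scrub output above.
devices_with_errors <<'EOF'
    NAME                                   STATE     READ WRITE CKSUM
    mypool                                 ONLINE       0     0     0
      raidz1-0                             ONLINE       0     0     0
        /home/graham/Documents/test/file1  ONLINE       0     0     1
        /home/graham/Documents/test/file2  ONLINE       0     0     0
        /home/graham/Documents/test/file3  ONLINE       0     0     0

errors: No known data errors
EOF
```

Against a live pool it would be used as: sudo zpool status mypool | devices_with_errors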

“Applications are unaffected.” Music to my ears – thanks to RAIDZ1’s parity we had redundancy, and ZFS was able to recover our data.

We can safely clear the error. If these were real disks, we would now decide whether to replace the potentially failing drive.

Catastrophe avoided!

Experimenting with ZFS Failures

While waiting for all the drives to arrive, I built a 3-disk RAIDZ1 configuration to perform tests on. Each of the 3 drives has a capacity of 3TB. RAIDZ1 means one disk’s worth of space is used for redundancy, so instead of 9TB of storage space there’s only 6TB. For that capacity loss we gain resilience to any one of the drives failing. If a drive were to fail, we could simply replace it and ZFS would continue as if nothing had happened! Let’s try some experiments and see how that works.
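The capacity arithmetic generalises to other RAIDZ levels: usable space is roughly (number of disks minus parity disks) times the disk size. A minimal sketch, ignoring metadata and padding overhead, with a made-up function name:

```shell
# Usable capacity of a RAIDZ vdev, ignoring metadata and padding overhead.
raidz_usable_tb() {
    # $1 = number of disks, $2 = size of each disk in TB, $3 = parity disks
    echo $(( ($1 - $3) * $2 ))
}

raidz_usable_tb 3 3 1   # the 3-disk RAIDZ1 test rig: prints 6
raidz_usable_tb 6 3 2   # six 3TB disks as RAIDZ2: prints 12
```

Real-world usable space comes out somewhat lower than these figures once filesystem overhead is counted.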

Working Configuration

These shots from NAS4Free show the three disks, configured into one Virtual Device (vdev), inside one pool; the pool has one dataset.

(Screenshot: three disks)

(Screenshot: inside one pool)

(Screenshot: the pool has one “virtual device”, a RAID-Z1)

The disks are bound into one dataset in the pool.

To begin with, the pool is ‘ONLINE’ and all three disks are working fine.

Unplugging cables

Let’s simulate a drive going bad; what happens if we unplug the SATA cable from one of the drives?

 

The pool is now DEGRADED and one of the drives is marked UNAVAIL. Uh oh! ZFS tells us to use zpool online to bring the drive back.

Even in this degraded state, with only 2 out of 3 drives running, I’m able to access my data. In fact our home folder (~) is located on this dataset and is operating perfectly. (If we were to lose a second drive, our data would be inaccessible.)

Let’s re-attach the 3rd drive’s SATA cable and tell ZFS to online the drive.
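From a shell, onlining a drive is a single zpool command. The wrapper below is a sketch; the pool and device names are arguments because the ones used in this experiment aren’t shown in the text, and online_device is a made-up name.

```shell
# Bring a re-attached device back into its pool and show the result.
# Pool and device names are arguments because the ones used in this
# experiment are not shown in the text.
online_device() {
    pool=$1
    device=$2
    zpool online "$pool" "$device" && zpool status "$pool"
}

# Example call (hypothetical names, needs root):
#   online_device tank ada2
```

The status output afterwards should show whether a resilver ran and how much data it covered.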

It all works great. I wondered why it says it “resilvered 68K” when there are several MB of data in the pool. ZFS only needs to resilver what changed while the drive was away, so the 68K is perhaps just some metadata.

Moving cables

How about we unplug a SATA cable from the motherboard and connect it to a different SATA port?

ZFS didn’t blink an eye; we didn’t even have to online the drive.

Let’s pretend the entire motherboard needed replacing, and we forgot which drives were plugged in where. I shut down, unplug all the SATA cables, and reconnect them to different ports. Power on and what happens?

Switching connections around made no difference at all! If that had been a real motherboard replacement, I would not have had to worry about where the drives were connected.

New boot drive

Things get more interesting here. I’ve been booting this box off a 16GB USB flash drive. What if this drive went bad?

The boot drive contains the ZFS configuration; losing that means the fresh NAS4Free installation will need to discover what state ZFS and the drives were in. NAS4Free does recommend backing up your configuration, but let’s say you forgot to…

I installed NAS4Free on a new 8GB flash drive. I told it to configure the network card (option 2 on the main menu) and then visited the displayed IP address from my Mac’s browser.

Uh oh. Nothing here in ZFS-land! No pool, no disks!

Here’s the configuration screen, with no pool but a useful button labeled “Import on-disk ZFS config”. Later we can see what ZFS commands the button runs.

After clicking that – look what happened!

The ZFS pool, vdev, and dataset are back! While the Pools and Datasets web pages still show nothing, we can fix that, too – read on.

At this point I enabled root SSH access so I could have a poke around and try to access some data, and I was able to navigate to the ZFS dataset directory!

zpool status shows the pool as ONLINE.

It’s educational to run zpool history, which shows all the commands that were used to create the pool, and also which command was executed when we imported after booting from the fresh USB drive.
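For anyone doing this over SSH instead of the WebGUI, the generic sequence would look roughly like the sketch below. Note that the history subcommand belongs to zpool. What the NAS4Free button actually ran isn’t reproduced here, so treat this as the plain zpool equivalent, with a made-up wrapper name and the pool name passed in.

```shell
# List pools that can be imported, import one by name, then show its history.
# What the NAS4Free button actually runs is not reproduced here; this is the
# generic zpool sequence.
import_and_review() {
    pool=$1
    zpool import              # with no arguments: list importable pools
    zpool import "$pool"      # import the named pool
    zpool history "$pool"     # commands previously run against the pool
}

# Example call (hypothetical pool name, needs root):
#   import_and_review tank
```

If the pool was last used by a different installation, zpool import may refuse without -f. Whether that was needed in this case isn’t stated.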

Though NAS4Free’s WebGUI showed no pools or datasets, I was able to fix that using the ZFS -> Configuration page, which has a “Synchronize” button. After using that, the rest of the WebGUI shows the pool and dataset correctly. This also fixed the Disks -> Management page, which had been showing no disks. As far as I can tell, that puts everything back the way it was (as far as ZFS goes – you may have had Samba shares etc. too, so remember to back up your NAS4Free configuration!)

Summary

I simulated loss of a hard drive, loss of a motherboard, and loss of the boot USB drive. These simulated failures all turned out to be recoverable! No data was lost at any step, which is great news for anyone with data they want to keep safe.

Note that even ZFS is not a substitute for backups, preferably off site, e.g. at a family member’s house or in a bank vault. If an errant script or an accidental manual deletion removes a file, ZFS will faithfully replicate that deletion across its RAID. ZFS snapshots could help here, but even so, your box is still vulnerable to flood, lightning strike, power surge, or brownout, any of which could damage one or all of the hardware components.

So far though I’m very pleased that ZFS, FreeBSD, and NAS4Free have lived up to their claims and provided a safe haven for my data!
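On the snapshot point above: a dated snapshot is one command, and it gives a way back from an accidental deletion (though not from a dead box). A sketch with a made-up wrapper name, where the dataset name is an argument and “tank/home” is only an example:

```shell
# Take a dated snapshot of a dataset and list existing snapshots.
# The dataset name is passed in; "tank/home" below is a made-up example.
snapshot_today() {
    dataset=$1
    zfs snapshot "$dataset@$(date +%Y-%m-%d)" && zfs list -t snapshot -r "$dataset"
}

# Example call (hypothetical dataset, needs root):
#   snapshot_today tank/home
```

Snapshots live in the same pool as the data, so they complement off-site backups; they don’t replace them.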

I’ll be adding three more drives and setting up RAIDZ2. This will allow data access even with two drives gone. A commonly cited concern is that RAIDZ1 is not as safe an option as you might think: once one drive goes bad, the odds of a second following it shoot up, and the second failure may arrive before a replacement disk has finished resilvering.

References:

http://www.nas4free.org

General overview and advice for ZFS, FreeNAS, and configuration: https://forums.freenas.org/index.php?threads/slideshow-explaining-vdev-zpool-zil-and-l2arc-for-noobs.7775/