SMART: Disks maintenance

ZFS mirror and

by ross at 09:02:52 on November 18, 2012

Instead of hunting down and dd'ing every pending sector, as shown on the previous page, you can have ZFS rewrite the partition for you.

Pending sectors are reallocated only on write. Detaching and reattaching the partition makes ZFS resilver it, which rewrites every sector that holds pool data.

# gpart show -l ada1
=>      34  72303773  ada1  GPT  (34G)
        34       128     1  (null)  (64k)
       162   4194304     2  swap0  (2.0G)
   4194466  68109341     3  system0  (32G)
# zpool status system
  pool: system
 state: ONLINE
 scan: scrub repaired 0 in 0h35m with 0 errors on Thu Dec  1 18:05:34 2011
config:

        NAME             STATE     READ WRITE CKSUM
        system           ONLINE       0     0     0
          mirror-0       ONLINE       0     0     0
            gpt/system0  ONLINE       0     0     0
            gpt/system1  ONLINE       0     0     0

errors: No known data errors
# zpool detach system gpt/system0
# zpool attach system gpt/system1 gpt/system0

The order of the devices in the zpool attach command above is important. The first device (gpt/system1 here) is the one that is already online in the pool; the second one (gpt/system0) is the new one, which gets resilvered.
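To make the argument order harder to get wrong, the two commands can be wrapped in a small function. This is only a sketch: the function name rewrite_mirror_member and the DRY_RUN switch are made up for illustration, and it uses nothing beyond the zpool detach and zpool attach calls shown above. Keep in mind that a two-way mirror has no redundancy from the detach until the resilver completes.

```shell
#!/bin/sh
# Sketch only: rewrite_mirror_member and DRY_RUN are illustrative names,
# not part of ZFS.
#
# usage: rewrite_mirror_member POOL HEALTHY_DEV TARGET_DEV
#   HEALTHY_DEV stays in the pool; TARGET_DEV is detached and resilvered.
rewrite_mirror_member() {
    if [ "$#" -ne 3 ]; then
        echo "usage: rewrite_mirror_member pool healthy_dev target_dev" >&2
        return 2
    fi
    pool=$1 healthy=$2 target=$3
    if [ "$healthy" = "$target" ]; then
        echo "healthy and target devices must differ" >&2
        return 2
    fi
    # With DRY_RUN set, print the commands instead of running them.
    run=${DRY_RUN:+echo}
    $run zpool detach "$pool" "$target" || return 1
    # Existing (online) device first, device to be resilvered second.
    $run zpool attach "$pool" "$healthy" "$target"
}
```

Running it once with DRY_RUN=1 set, for example as `DRY_RUN=1 rewrite_mirror_member system gpt/system1 gpt/system0`, prints the two zpool commands so they can be checked before anything is detached.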

# zpool status system
  pool: system
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scan: resilver in progress since Fri Dec  2 15:50:43 2011
    1023M scanned out of 22.1G at 6.69M/s, 0h53m to go
    1023M resilvered, 4.51% done
config:

        NAME             STATE     READ WRITE CKSUM
        system           ONLINE       0     0     0
          mirror-0       ONLINE       0     0     0
            gpt/system1  ONLINE       0     0     0
            gpt/system0  ONLINE       0     0     0  (resilvering)

errors: No known data errors
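Rather than rerunning zpool status by hand, one could poll it until the resilver line disappears. The sketch below assumes the status text contains "resilver in progress" while a resilver is running, as in the output above; the function names and the 60-second default interval are arbitrary choices for illustration.

```shell
#!/bin/sh
# Sketch only: function names and the default interval are illustrative.

# Reads `zpool status` output on stdin; succeeds while a resilver is running.
resilver_in_progress() {
    grep -q 'resilver in progress'
}

# usage: wait_for_resilver POOL [SECONDS_BETWEEN_CHECKS]
# Returns once `zpool status` no longer reports a running resilver.
# Note: it also returns if `zpool status` itself fails.
wait_for_resilver() {
    pool=$1 interval=${2:-60}
    while zpool status "$pool" | resilver_in_progress; do
        sleep "$interval"
    done
}
```

For the pool in this post that would be `wait_for_resilver system`, optionally followed by a final `zpool status system` to confirm that no errors were reported.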

Some time later:

# smartctl -A /dev/ada1
smartctl 5.42 2011-10-20 r3458 [FreeBSD 9.0-RC1 amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   167   166   021    Pre-fail  Always       -       2691
  4 Start_Stop_Count        0x0032   099   099   040    Old_age   Always       -       1740
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000a   200   200   051    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   067   067   000    Old_age   Always       -       24147
 10 Spin_Retry_Count        0x0012   100   100   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0012   100   100   051    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1622
194 Temperature_Celsius     0x0022   096   094   000    Old_age   Always       -       47
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0012   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always       -       1
200 Multi_Zone_Error_Rate   0x0008   200   200   051    Old_age   Offline      -       1

Current_Pending_Sector is back to 0, so the sector was overwritten and cleared while the device resilvered. Don't reboot until the resilver has finished.
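To check a single attribute without reading the whole table, the RAW_VALUE column can be pulled out with awk. This is a sketch that assumes the ten-column attribute layout printed by smartctl -A above; the function name smart_raw_value is made up for illustration.

```shell
#!/bin/sh
# Sketch only: smart_raw_value is an illustrative name.
#
# Reads `smartctl -A` output on stdin and prints the RAW_VALUE (10th column)
# of the attribute whose name is given as $1.
# Exits non-zero if no such attribute is listed.
smart_raw_value() {
    awk -v name="$1" '
        $2 == name { print $10; found = 1 }
        END { exit !found }
    '
}
```

Usage would look like `smartctl -A /dev/ada1 | smart_raw_value Current_Pending_Sector`; run it before the detach and again after the resilver to compare the two values.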

Comments
I had a similar problem but was unable to force the disk to resilver except by wiping the entire drive. However, I was offlining the device and then trying to replace it, rather than using detach/attach as you have done.

Is the order special? Does attaching the first vdev (system0) after the second cause the first to be resilvered from the second?
-- Errol
Thursday, February 9, 2012, 14:42:08
Sorry, I don't get it. Order of what?

Order of the devices in this command?
# zpool attach system gpt/system1 gpt/system0

If so, then yes, order is important. The first device is currently part of the mirror and is online. The second one is the new device. ZFS resilvers it immediately. See zpool(8).

There is no need to take the device offline. While a device is offline, ZFS makes no attempt to read from or write to it, and you need the faulty sector to be rewritten, because pending sectors are reallocated only on write.
-- ross
Friday, February 10, 2012, 4:40:23
Yes, I was asking if that order was important. I thought that when I had tried that before, ZFS just recognized that the recently reattached drive already contained a copy of the mirror and didn't force a resilver.

I'll try this again next time.
-- Errol
Wednesday, February 15, 2012, 4:46:22