SMART: Disks maintenance

Table of Contents

Configuration
Current_Pending_Sector
ZFS mirror and Offline_Uncorrectable

Current_Pending_Sector

by ross at 10:50:46 on November 18, 2012

If you have enabled the smartd daemon, one day you may receive an email like this:

This email was generated by the smartd daemon running on:
   host name: coffin.local
  DNS domain: local
  NIS domain:

The following warning/error was logged by the smartd daemon:
Device: /dev/ad4, 1 Currently unreadable (pending) sectors

For details see host's SYSLOG.

You can also use the smartctl utility for further investigation.
No additional email messages about this problem will be sent.

Check the SMART values:

# smartctl -A /dev/ad4
smartctl 5.41 2011-06-09 r3365 [FreeBSD 8.2-STABLE amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   200   200   051    Pre-fail  Always       -       17
  3 Spin_Up_Time            0x0003   172   165   021    Pre-fail  Always       -       2383
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1538
  5 Reallocated_Sector_Ct   0x0033   190   190   140    Pre-fail  Always       -       75
  7 Seek_Error_Rate         0x000f   200   200   051    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   057   057   000    Old_age   Always       -       32073
 10 Spin_Retry_Count        0x0013   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0012   100   100   051    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1509
194 Temperature_Celsius     0x0022   103   080   000    Old_age   Always       -       40
196 Reallocated_Event_Count 0x0032   195   195   000    Old_age   Always       -       5
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0010   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   168   000    Old_age   Always       -       230
200 Multi_Zone_Error_Rate   0x0009   200   200   051    Pre-fail  Offline      -       0

We see that there is one Current Pending Sector. Run the extended self-test:

# smartctl -t long /dev/ad4
smartctl 5.41 2011-06-09 r3365 [FreeBSD 8.2-STABLE amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 35 minutes for test to complete.
Test will complete after Fri Jul 29 08:53:29 2011

Use smartctl -X to abort test.

Wait until the test completes, or until you get an email stating that the self-test failed ("Self-Test Log error count increased from 0 to 1"), then run:

# smartctl -l selftest /dev/ad4
smartctl 5.41 2011-06-09 r3365 [FreeBSD 8.2-STABLE amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%     32074         37831812
# 2  Extended offline    Completed without error       00%     25300         -

We see that the LBA of the sector in question is 37831812.
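
Copying the LBA by hand is error-prone, so it can help to pull it out of the log with a small filter. The following is only a sketch: the function name first_error_lba is made up for illustration, and it assumes the log layout shown above, where the LBA is the last column of a "read failure" entry.

```shell
#!/bin/sh
# Print LBA_of_first_error from the first (newest) self-test log entry
# that reports a read failure. Reads `smartctl -l selftest` output on stdin.
# Prints nothing if no entry reports a read failure.
# Assumption: the LBA is the last whitespace-separated field of the entry.
first_error_lba() {
    awk '/^# *[0-9]+ / && /read failure/ { print $NF; exit }'
}
```

Typical use would be: smartctl -l selftest /dev/ad4 | first_error_lba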

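First, disable the GEOM protection that blocks writes to a disk that is in use, then try to refresh the sector by reading it and writing it back in place, because the drive reallocates a bad sector only on write: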

# sysctl kern.geom.debugflags=16
kern.geom.debugflags: 0 -> 16
# dd if=/dev/ad4 of=/dev/ad4 bs=512 count=1 iseek=37831812 oseek=37831812 conv=noerror,sync
1+0 records in
1+0 records out
512 bytes transferred in 0.000346 secs (1478983 bytes/sec)
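
The sector number appears twice in the dd command, and one mistyped digit makes dd rewrite a different sector. A possible safeguard is to build the command from a single variable and print it for review before running it. This is a minimal sketch: refresh_sector_cmd is a hypothetical helper, and it assumes 512-byte sectors as in the command above.

```shell
#!/bin/sh
# Build (but do not run) the dd command that rewrites one 512-byte sector
# in place. Usage: refresh_sector_cmd DEVICE LBA
# Hypothetical helper; it only prints the command line.
refresh_sector_cmd() {
    dev=$1
    lba=$2
    # Reject anything that is not a plain decimal sector number.
    case $lba in
        ''|*[!0-9]*) echo "invalid LBA: $lba" >&2; return 1 ;;
    esac
    echo "dd if=$dev of=$dev bs=512 count=1 iseek=$lba oseek=$lba conv=noerror,sync"
}
```

Running refresh_sector_cmd /dev/ad4 37831812 prints the full dd line, which can be checked against the self-test log and then pasted into a root shell.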

Check the SMART values:

# smartctl -A /dev/ad4
smartctl 5.41 2011-06-09 r3365 [FreeBSD 8.2-STABLE amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   200   200   051    Pre-fail  Always       -       17
  3 Spin_Up_Time            0x0003   172   165   021    Pre-fail  Always       -       2383
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1538
  5 Reallocated_Sector_Ct   0x0033   190   190   140    Pre-fail  Always       -       75
  7 Seek_Error_Rate         0x000f   200   200   051    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   057   057   000    Old_age   Always       -       32073
 10 Spin_Retry_Count        0x0013   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0012   100   100   051    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1509
194 Temperature_Celsius     0x0022   108   080   000    Old_age   Always       -       35
196 Reallocated_Event_Count 0x0032   195   195   000    Old_age   Always       -       5
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   168   000    Old_age   Always       -       230
200 Multi_Zone_Error_Rate   0x0009   200   200   051    Pre-fail  Offline      -       0

We see that there are no more Current Pending Sectors and that the Reallocated Sector Count did not increase. So the drive rewrote the sector in place without remapping it, and we have just cured the sector.
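
Rather than scanning the whole table by eye, the two counters can be extracted and compared before and after the write. The sketch below assumes the ten-column layout of smartctl -A shown above, with RAW_VALUE in the tenth column; smart_raw is an illustrative name, not part of smartmontools.

```shell
#!/bin/sh
# Print the RAW_VALUE column for one attribute ID.
# Reads `smartctl -A` output on stdin. Usage: smart_raw ID
# Assumption: RAW_VALUE is the 10th whitespace-separated field.
smart_raw() {
    awk -v id="$1" '$1 == id && NF >= 10 { print $10; exit }'
}
```

For example, smartctl -A /dev/ad4 | smart_raw 197 prints the Current_Pending_Sector count, and smart_raw 5 prints the Reallocated_Sector_Ct count.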

Run the self-test again. If you get another read error, repeat the dd command for the newly reported sector. If you get "Status: Completed without error", you are done.
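
The decision after each test run can be sketched as a filter over the newest log entry. This is an illustration only: next_action is a hypothetical name, and it assumes the newest entry is the one numbered "# 1" and that the status strings match those in the log shown earlier.

```shell
#!/bin/sh
# Decide what to do next from the newest self-test log entry ("# 1").
# Reads `smartctl -l selftest` output on stdin and prints one of:
#   done            newest test completed without error
#   refresh <LBA>   newest test hit a read failure at <LBA>
#   unknown         any other status (aborted, interrupted, ...)
next_action() {
    awk '/^# *1 / {
        if (/Completed without error/) {
            print "done"
        } else if (/read failure/) {
            print "refresh " $NF
        } else {
            print "unknown"
        }
        exit
    }'
}
```

It only reports what to do; rewriting the sector and starting the next test remain manual steps.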
