Gentoo Wiki

Smartmontools


This article is part of the HOWTO series.


Introduction

The aim of this howto is to use SMART technology (supported by virtually every hard disk nowadays) to check whether a disk is healthy. SMART-enabled hard disks continuously monitor their own health and alert the user if an anomaly is detected, and most of them can also carry out specific self-tests for deeper analysis.

Warning: Before going on, always back up your important data, regardless of what SMART says! Even though SMART is very reliable, it is sometimes wrong. Hard disks also often die unexpectedly, and even if SMART has told you something is wrong you may not have enough time to move your data to a safe place.

Installation Procedure

First of all make sure SMART is enabled in the BIOS. For example, in my BIOS I have this:

Code: BIOS
S.M.A.R.T. for Hard Disk: Enabled

Some BIOSes don't have the option and report S.M.A.R.T. as disabled, but don't worry: smartctl can enable it (see below). Carefully read the SMART instructions for your motherboard. Sometimes this option may be intentionally hidden, as shown in this example.

Now let's install the smartmontools package:

# emerge -av smartmontools

Finally, you have to check if your hard disk(s) support SMART:

# smartctl -i /dev/hda

For SATA drives:

# smartctl -i -d ata /dev/sda

To enable SMART on IDE drives:

# smartctl -s on /dev/hda

To enable SMART on SATA drives:

# smartctl -s on -d ata /dev/sda
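
The identify output can also be checked from a script. The following is a minimal sketch, assuming the output of smartctl -i contains the usual "SMART support is:" lines; the function name smart_support_state is made up for this example and is not part of smartmontools.

```shell
#!/bin/sh
# Hypothetical helper: reads `smartctl -i` output on stdin and prints
# "enabled", "disabled", "unavailable" or "unknown".
smart_support_state() {
        awk '
                /SMART support is:/ && /Unavailable/ { state = "unavailable" }
                /SMART support is:/ && /Enabled/     { state = "enabled" }
                /SMART support is:/ && /Disabled/    { state = "disabled" }
                END { print (state == "" ? "unknown" : state) }
        '
}

# Usage (assumes smartctl is installed and you are root):
#   smartctl -i /dev/hda | smart_support_state
```

If the result is "disabled", the -s on commands above should switch SMART on.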

Using smartctl

SMART Health Status

Let's check the SMART Health Status:

# smartctl -H /dev/hda

If you read PASSED the disk is ok, but if you read FAILED you have to back up your data now: the disk has either already failed or is predicted to fail within 24 hours!
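
Besides the printed text, smartctl reports problems through its exit status, which is a bit mask described in the smartctl man page. The sketch below decodes that mask; the function name describe_smartctl_status and the wording of the messages are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: decode the bit mask that smartctl returns as its
# exit status (bit meanings as listed in the smartctl man page).
describe_smartctl_status() {
        rc=$1
        [ "$rc" -eq 0 ] && echo "no problems reported" && return 0
        [ $(( rc & 1 )) -ne 0 ] && echo "command line did not parse"
        [ $(( rc & 2 )) -ne 0 ] && echo "device open failed"
        [ $(( rc & 4 )) -ne 0 ] && echo "a SMART command failed"
        [ $(( rc & 8 )) -ne 0 ] && echo "health status: DISK FAILING"
        [ $(( rc & 16 )) -ne 0 ] && echo "prefail attribute at or below threshold"
        [ $(( rc & 32 )) -ne 0 ] && echo "attribute was at or below threshold in the past"
        [ $(( rc & 64 )) -ne 0 ] && echo "error log contains errors"
        [ $(( rc & 128 )) -ne 0 ] && echo "self-test log contains errors"
        return 0
}

# Usage (assumes smartctl is installed and you are root):
#   smartctl -H /dev/hda; describe_smartctl_status $?
```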

SMART Error Log

Now let's check the SMART Error Log (it's a list of errors detected by SMART during the disk's life):

# smartctl -l error /dev/hda

If you read No Errors Logged it's ok. If there are a few errors (and they are not recent) you don't have to worry too much. If there are a lot of errors, back up your data as soon as you can.
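
To turn that rule of thumb into a number a script can act on, the error count can be pulled out of the log output. This is a rough sketch, assuming the output contains either "No Errors Logged" or an "ATA Error Count: N" line; ata_error_count is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: reads `smartctl -l error` output on stdin and prints
# the number of logged ATA errors (0 when "No Errors Logged" is present,
# "unknown" when neither line is found).
ata_error_count() {
        awk '
                /No Errors Logged/ { found = 1; count = 0 }
                /ATA Error Count:/ { found = 1; count = $4 }
                END { if (found) print count; else print "unknown" }
        '
}

# Usage (assumes smartctl is installed and you are root):
#   smartctl -l error /dev/hda | ata_error_count
```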

Reading the SMART Health Status and the SMART Error Log is not enough: you really should do some other specific tests.

SMART Testing

These tests don't interfere with the normal functioning of the disk and they can be carried out when you want. I'll only describe here how to launch them and read their reports; if you want to learn more go here and/or read the man page.

First you should know which tests are supported by your drive:

# smartctl -c /dev/hda

The same output also tells you how much time each of them requires.
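
A script can look for a specific capability in the same output. The sketch below checks for the Conveyance Self Test, assuming the capability list contains either "Conveyance Self-test supported." or "No Conveyance Self-test supported."; supports_conveyance is a name chosen for this example.

```shell
#!/bin/sh
# Hypothetical helper: reads `smartctl -c` output on stdin and succeeds
# (exit status 0) only if the drive advertises the Conveyance Self Test.
supports_conveyance() {
        grep 'Conveyance Self-test supported' | grep -qv 'No Conveyance'
}

# Usage (assumes smartctl is installed and you are root):
#   smartctl -c /dev/hda | supports_conveyance && echo "conveyance test available"
```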

Now let's execute the SMART Immediate Offline Test (if supported, of course):

# smartctl -t offline /dev/hda

You only have to wait (smartctl will show you how long). When it finishes, you should check the SMART Error Log again for the report.
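
The waiting time can be read from the command output so that a script knows how long to sleep. This is a minimal sketch, assuming the output contains a line of the form "Please wait N seconds for test to complete." or "Please wait N minutes for test to complete."; test_wait_seconds is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: reads `smartctl -t ...` output on stdin and prints the
# announced waiting time in seconds (0 if no "Please wait" line is found).
test_wait_seconds() {
        awk '
                /Please wait/ {
                        n = $3
                        if ($4 ~ /^minute/) n = n * 60
                        print n
                        found = 1
                        exit
                }
                END { if (!found) print 0 }
        '
}

# Usage (assumes smartctl is installed and you are root):
#   sleep "$(smartctl -t offline /dev/hda | test_wait_seconds)"
```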

If you need to check multiple disks, you could use a small script like this one, which dumps the relevant SMART logs into appropriately named files after all the tests have completed. Run the script with the test type(s) you want as its arguments. If you do not understand what this script does, do not use it.

Code: smart.sh
#!/bin/sh

# Script by Meliorator. irc://irc.freenode.net/Meliorator

# At least one test type is required.
[ $# -eq 0 ] && echo "Usage: $0 type [type] [type]" && exit 1

[ ! -e smart-logs ] && mkdir smart-logs
[ ! -d smart-logs ] && echo "Can not create smart-logs dir" && exit 1

for t in "$@"; do

        # s converts the reported waiting time to seconds, l is the log to dump.
        case "$t" in
                offline) s=1 && l=error;;
                short|long) s=60 && l=selftest;;
                *) echo "$t is an unrecognised test type. Skipping..." && continue;;
        esac

        # Start the test on every drive and remember the longest waiting time.
        a=0
        for hd in /dev/hd*[!0-9]; do
                w=$(smartctl -t "$t" "$hd" | grep 'Please wait' | awk '{print $3}')
                r=$(( ${w:-0} * s ))
                echo "Check $hd - $t test in $(( r / 60 )) minute(s)"
                [ "$r" -gt "$a" ] && a=$r
        done

        echo "Waiting $(( a / 60 )) minute(s) for all tests to complete"
        sleep "$a"

        # Append both stdout and stderr of the log dump to the per-drive file.
        for hd in /dev/hd*[!0-9]; do
                smartctl -l "$l" "$hd" >> "smart-logs/smart-${t}-${hd##*/}.log" 2>&1
        done

done

# Ring the terminal bell ten times.
i=0
while [ "$i" -lt 10 ]; do
        sleep .01
        printf '\a'
        i=$(( i + 1 ))
done

echo "All tests have completed"

Don't forget to make the script executable.

# chmod +x smart.sh

Now let's carry out the SMART Short Self Test or the SMART Extended Self Test (again, only if they are supported by your drive). They are similar, but the second one is more thorough than the first:

# smartctl -t short /dev/hda
# smartctl -t long /dev/hda

Then check the SMART Self Test Error Log:

# smartctl -l selftest /dev/hda
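
The most recent entry in that log is the one numbered 1. The sketch below reports whether it finished cleanly, assuming log lines that start with "# 1" and contain the status text "Completed without error"; last_selftest_ok is a hypothetical name. A test that is still in progress counts as not ok.

```shell
#!/bin/sh
# Hypothetical helper: reads `smartctl -l selftest` output on stdin and
# succeeds (exit status 0) only if entry number 1, the most recent test,
# is marked "Completed without error".
last_selftest_ok() {
        awk '
                /^# *1 / { seen = 1; if ($0 ~ /Completed without error/) ok = 1 }
                END { if (seen && ok) exit 0; exit 1 }
        '
}

# Usage (assumes smartctl is installed and you are root):
#   smartctl -l selftest /dev/hda | last_selftest_ok || echo "check the log"
```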

Now let's execute the SMART Conveyance Self Test:

# smartctl -t conveyance /dev/hda

Then check the SMART Self Test Error Log again:

# smartctl -l selftest /dev/hda

Automatically monitor your drive(s)

If you want to automatically monitor your drive(s) you have to configure the smartd daemon and have it launched at boot.

If you use SATA or SCSI drives, the drive devices may move around during boot, so you should not use /dev/sd? to find your drives. The kernel assigns these names as it sees fit, so there's no guarantee that /dev/sda will always refer to the same physical device.

You should use the symlinks in /dev/disk/by-id/:

lrwxrwxrwx 1 root root  9 2008-02-20 17:16 scsi-SDNS0P6B00FED -> ../../sdb
lrwxrwxrwx 1 root root 10 2008-02-20 17:16 scsi-SDNS0P6B00FED-part1 -> ../../sdb1
lrwxrwxrwx 1 root root  9 2008-02-20 17:16 scsi-SDNS0P6B00FTH -> ../../sda
lrwxrwxrwx 1 root root 10 2008-02-20 17:16 scsi-SDNS0P6B00FTH-part1 -> ../../sda1
lrwxrwxrwx 1 root root  9 2008-02-20 17:16 scsi-S_5QHZ0BRZ -> ../../sdc
lrwxrwxrwx 1 root root 10 2008-02-20 17:16 scsi-S_5QHZ0BRZ-part1 -> ../../sdc1
lrwxrwxrwx 1 root root  9 2008-02-20 17:16 scsi-S_5QH02EWC -> ../../sdf
lrwxrwxrwx 1 root root 10 2008-02-20 17:16 scsi-S_5QH02EWC-part1 -> ../../sdf1
lrwxrwxrwx 1 root root  9 2008-02-20 17:16 scsi-S_5QH02EX3 -> ../../sdh
lrwxrwxrwx 1 root root 10 2008-02-20 17:16 scsi-S_5QH02EX3-part1 -> ../../sdh1
lrwxrwxrwx 1 root root  9 2008-02-20 17:16 scsi-S_5QH02EYT -> ../../sdd
lrwxrwxrwx 1 root root 10 2008-02-20 17:16 scsi-S_5QH02EYT-part1 -> ../../sdd1
lrwxrwxrwx 1 root root  9 2008-02-20 17:16 scsi-S_9QG4MSPC -> ../../sdg
lrwxrwxrwx 1 root root 10 2008-02-20 17:16 scsi-S_9QG4MSPC-part1 -> ../../sdg1
lrwxrwxrwx 1 root root  9 2008-02-20 17:16 scsi-S_9QG56Q28 -> ../../sdi
lrwxrwxrwx 1 root root 10 2008-02-20 17:16 scsi-S_9QG56Q28-part1 -> ../../sdi1
lrwxrwxrwx 1 root root 10 2008-02-20 17:16 usb-PLEXTOR_CORPORATION._PLEXTOR_USB2.0-ATA.ATAPI_Bridge_000000002BDC -> ../../scd0
lrwxrwxrwx 1 root root  9 2008-02-20 17:16 usb-ST316002_1A_200509223316-0:0 -> ../../sde
lrwxrwxrwx 1 root root 10 2008-02-20 17:16 usb-ST316002_1A_200509223316-0:0-part1 -> ../../sde1

My system has 8 drives, 2 SCSI and 6 SATA, as well as a USB DVD player and a USB drive. The symlinks let me set up smartd.conf so that the physical hardwired devices are addressed.
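
To see which stable name belongs to which kernel device, the whole-disk links can be listed without their partition entries. This is a small sketch; list_disks_by_id is a made-up name, and it assumes readlink -f is available (as in GNU coreutils).

```shell
#!/bin/sh
# Hypothetical helper: print "link -> device" for every whole-disk entry in a
# by-id style directory (default /dev/disk/by-id), skipping "-partN" links.
list_disks_by_id() {
        dir=${1:-/dev/disk/by-id}
        for link in "$dir"/*; do
                [ -L "$link" ] || continue
                case "$link" in
                        *-part[0-9]*) continue;;
                esac
                echo "$link -> $(readlink -f "$link")"
        done
}

# Usage:
#   list_disks_by_id
```

The left-hand side of each printed line is the path to put in smartd.conf.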

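The example below checks the SMART Health Status, reports new entries in the SMART Error Log and the SMART Self Test Error Log, schedules an Immediate Offline Test, an Extended Self Test and a Conveyance Self Test every Friday (at 11:00, 13:00 and 15:00 respectively), and runs a script when a problem is detected.

The smartd daemon's configuration file is /etc/smartd.conf (if it doesn't exist you have to create it).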

File: /etc/smartd.conf
...
#DEVICESCAN
...
/dev/disk/by-id/scsi-SDNS0P6B00FTH \
-H \
-l error -l selftest \
-s (O/../../5/11|L/../../5/13|C/../../5/15) \
-m ThisIsNotUsed -M exec /usr/local/bin/smartd.sh

This is the content of the script:

File: /usr/local/bin/smartd.sh
#!/bin/bash
LOGFILE="/var/log/smartd.log"
echo -e "$(date)\n$SMARTD_MESSAGE\n" >> "$LOGFILE"
shutdown -h now

Obviously, make the script executable:

# chmod +x /usr/local/bin/smartd.sh
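
If shutting the machine down is too drastic, the script can simply record more detail. The following is a sketch of a logging-only variant: format_smartd_event is a hypothetical name, while SMARTD_DEVICE, SMARTD_FAILTYPE and SMARTD_MESSAGE are environment variables that smartd sets for -M exec scripts.

```shell
#!/bin/sh
# Hypothetical variant of smartd.sh that only formats a log entry.
# Falls back to placeholder text when a variable is not set.
format_smartd_event() {
        printf '%s [%s] %s: %s\n' "$(date)" \
                "${SMARTD_FAILTYPE:-unknown}" \
                "${SMARTD_DEVICE:-unknown device}" \
                "${SMARTD_MESSAGE:-no message}"
}

# In a real script, append the entry to the log file used by smartd.sh:
#   format_smartd_event >> /var/log/smartd.log
```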

The configuration and script above are only an example; adapt them to your own needs and preferences. If you want to learn more you can read the man page:

$ man smartd.conf
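
The -s directive in the example takes a regular expression that smartd compares against strings of the form T/MM/DD/d/HH (test type, month, day of month, day of week with 1 = Monday, hour). The sketch below imitates that comparison with grep so a schedule can be tried out by hand; schedule_matches is a hypothetical name, and smartd's own matching may differ in detail.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if the schedule regular
# expression in $1 matches the whole T/MM/DD/d/HH string in $2.
schedule_matches() {
        regex=$1
        stamp=$2
        printf '%s\n' "$stamp" | grep -Eq "^${regex}\$"
}

# Usage: would a long test start during the current hour?
#   schedule_matches '(O/../../5/11|L/../../5/13|C/../../5/15)' \
#           "L/$(date +%m/%d/%u/%H)" && echo "long test due"
```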

To test everything you should append -M test to smartd.conf's last line and launch the daemon (note that this will shut down your machine):

# /etc/init.d/smartd start

If something is wrong you can check /var/log/messages:

# tail /var/log/messages

Now remove the -M test option and have smartd launched at boot:

# rc-update add smartd default

Finished!


Retrieved from "http://www.gentoo-wiki.info/Smartmontools"

Last modified: Sat, 30 Aug 2008 22:58:00 +0000 Hits: 84,005