Command to identify a specific physical disk in a server with many disks

hard-disk hardware

I have a server containing 10 hard disks. Device /dev/sdh is reporting uncorrectable read errors on btrfs scrub. How can I determine which physical disk corresponds to /dev/sdh?

I know I can get the disks' model numbers and serial numbers with hdparm -I /dev/sd? and I can get mountpoints with findmnt or lsblk. However, I am not finding a way to connect /dev/sdh to a hard disk by serial number, which is what I need.

Best Answer

lsscsi

On servers where I have a lot of HDDs I've traditionally used lsscsi to determine which HDD is plugged into which port.

The -g option lists each disk's vendor and model together with its device name and generic SCSI device name:

$ lsscsi -g
[0:0:0:0]    disk    ATA      Hitachi HDT72101 A3AA  /dev/sda   /dev/sg0
[2:0:0:0]    disk    ATA      Hitachi HDS72101 A39C  /dev/sdb   /dev/sg1
[4:0:0:0]    disk    ATA      Maxtor 6L200P0   1G20  /dev/sdc   /dev/sg2
[12:0:0:0]   disk    WD       My Passport 25E2 4005  /dev/sde   /dev/sg5
[12:0:0:1]   enclosu WD       SES Device       4005  -         /dev/sg6

The -H option lists the hosts (the ports on your motherboard or controller) that the devices above are attached to:

$ lsscsi -H
[0]    ahci
[1]    ahci
[2]    ahci
[3]    ahci
[4]    pata_atiixp
[5]    pata_atiixp
[12]    usb-storage

You can also use the verbose output instead:

$ lsscsi --verbose
[0:0:0:0]    disk    ATA      Hitachi HDT72101 A3AA  /dev/sda
  dir: /sys/bus/scsi/devices/0:0:0:0  [/sys/devices/pci0000:00/0000:00:11.0/host0/target0:0:0/0:0:0:0]
[2:0:0:0]    disk    ATA      Hitachi HDS72101 A39C  /dev/sdb
  dir: /sys/bus/scsi/devices/2:0:0:0  [/sys/devices/pci0000:00/0000:00:11.0/host2/target2:0:0/2:0:0:0]
[4:0:0:0]    disk    ATA      Maxtor 6L200P0   1G20  /dev/sdc
  dir: /sys/bus/scsi/devices/4:0:0:0  [/sys/devices/pci0000:00/0000:00:14.1/host4/target4:0:0/4:0:0:0]
[12:0:0:0]   disk    WD       My Passport 25E2 4005  /dev/sde
  dir: /sys/bus/scsi/devices/12:0:0:0  [/sys/devices/pci0000:00/0000:00:13.2/usb2/2-3/2-3:1.0/host12/target12:0:0/12:0:0:0]
[12:0:0:1]   enclosu WD       SES Device       4005  -
  dir: /sys/bus/scsi/devices/12:0:0:1  [/sys/devices/pci0000:00/0000:00:13.2/usb2/2-3/2-3:1.0/host12/target12:0:0/12:0:0:1]

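NOTE: The port a disk is plugged into is the first number in its bracketed block, and it matches the host number in the lsscsi -H output. For example, /dev/sda is on host [0] (ahci), while /dev/sdc is on host [4] (pata_atiixp).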
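The lookup described in the note can be scripted. The following is a minimal sketch, not part of lsscsi itself: scsi_host_of is a hypothetical helper name, and it assumes plain lsscsi output (without -g), where the device name is the last field on each line.

```shell
# Hypothetical helper: print the SCSI host number for one device,
# given plain `lsscsi` output on stdin (device name is the last field).
scsi_host_of() {
    awk -v dev="$1" '
        $NF == dev {
            h = $1              # e.g. "[12:0:0:0]"
            sub(/^\[/, "", h)   # drop the leading bracket
            split(h, a, ":")    # a[1] is the host number
            print a[1]
        }'
}

# Typical use on a live system (requires lsscsi to be installed):
#   lsscsi | scsi_host_of /dev/sdh
```

The number printed can then be matched against the bracketed entries in the lsscsi -H output to find the controller or port.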

lshw

I've also used lshw for this. It reports which controller and port each HDD is attached to, which makes it easier to tell the disks apart in a system that has several. Below you can see /dev/sda along with its serial number:

$ lshw -c disk -c storage
  *-storage
       description: SATA controller
       product: SB7x0/SB8x0/SB9x0 SATA Controller [IDE mode]
       vendor: Advanced Micro Devices, Inc. [AMD/ATI]
       physical id: 11
       bus info: pci@0000:00:11.0
       logical name: scsi0
       logical name: scsi2
       version: 00
       width: 32 bits
       clock: 66MHz
       capabilities: storage pm ahci_1.0 bus_master cap_list emulated
       configuration: driver=ahci latency=64
       resources: irq:22 ioport:c000(size=8) ioport:b000(size=4) ioport:a000(size=8) ioport:9000(size=4) ioport:8000(size=16) memory:fbbff800-fbbffbff
     *-disk:0
          description: ATA Disk
          product: Hitachi HDT72101
          vendor: Hitachi
          physical id: 0
          bus info: scsi@0:0.0.0
          logical name: /dev/sda
          version: A3AA
          serial: STF604MH0AD4PB
          size: 931GiB (1TB)
          capabilities: partitioned partitioned:dos
          configuration: ansiversion=5 logicalsectorsize=512 sectorsize=512 signature=0005edc1

You can figure out which is which based on the coordinates of their respective bus info & physical id.
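If you only need the serial number for one device, the lshw output can be filtered. This is a rough sketch under the assumption that, as in the output above, each disk block prints its "logical name:" line before its "serial:" line; lshw_serial_of is a hypothetical helper name.

```shell
# Hypothetical helper: print the serial number that lshw reports for a device.
# Reads `lshw -c disk` output on stdin; assumes "logical name:" comes
# before "serial:" within each "*-disk" block.
lshw_serial_of() {
    awk -v dev="$1" '
        /\*-/ { found = 0 }                                  # a new block starts
        $1 == "logical" && $2 == "name:" && $3 == dev { found = 1 }
        found && $1 == "serial:" { print $2; exit }
    '
}

# Typical use on a live system (lshw normally needs root for full details):
#   lshw -c disk | lshw_serial_of /dev/sdh
```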

smartctl

The other method I've used in the past is smartctl. You can query each device independently to find out its serial number, make and model, and then identify the physical disk by its label once you open up the case.

$ smartctl -i /dev/sda
smartctl 5.43 2016-09-28 r4347 [x86_64-linux-2.6.32-642.6.2.el6.x86_64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Hitachi Deskstar 7K1000.B
Device Model:     Hitachi HDT721010SLA360
Serial Number:    STF604MH0AD4PB
LU WWN Device Id: 5 000cca 349c4b953
Firmware Version: ST6OA3AA
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Thu Aug  2 21:11:01 2018 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
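To get a device-to-serial table for every disk at once, the smartctl query can be wrapped in a loop. This is a sketch only: extract_serial and list_serials are hypothetical helper names, and it assumes the disks appear as /dev/sd? and that smartctl prints a "Serial Number:" line as shown above.

```shell
# Print the value of the "Serial Number:" field from `smartctl -i` output on stdin.
extract_serial() {
    awk -F': *' '/^Serial Number:/ { print $2 }'
}

# Hypothetical helper: print "device<TAB>serial" for every /dev/sd? block device.
# Requires smartctl and usually root.
list_serials() {
    for dev in /dev/sd?; do
        [ -b "$dev" ] || continue
        printf '%s\t%s\n' "$dev" "$(smartctl -i "$dev" | extract_serial)"
    done
}

# Typical use on a live system:
#   list_serials
```

The serial printed next to /dev/sdh is the one to look for on the drive labels.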

ledctl/ledmon

On higher-end rack-mounted servers you can use ledctl to light up the LED for a given HDD through its /dev/ device name.

ledctl usage

# ledctl locate=/dev/rssda will blink drive LED
# ledctl locate={ /dev/rssda /dev/rssdb } will blink both drive LEDs
# ledctl locate_off=/dev/rssda will turn off the locate LED
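The locate and locate_off commands above can be paired so the LED is not left blinking after you have found the disk. The following is a minimal sketch; blink_disk and the LEDCTL override variable are hypothetical names introduced here, and it assumes the enclosure is supported by ledctl and that the command is run as root.

```shell
# Hypothetical helper: blink the locate LED for a device, wait, then turn it off.
# Set LEDCTL=echo for a dry run that only prints the arguments.
blink_disk() {
    dev=$1
    secs=${2:-30}                               # default: blink for 30 seconds
    ${LEDCTL:-ledctl} "locate=$dev" || return 1
    sleep "$secs"
    ${LEDCTL:-ledctl} "locate_off=$dev"
}

# Typical use on a live system:
#   blink_disk /dev/sdh 60
```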

Related Solutions

LVM not coming up after reboot, couldn’t find device with uuid

If I understood correctly, you have already fixed the volume, even though you have a lost+found directory which may or may not have critical files.

What is going on now that's blocking the VM from booting? It still can't find the boot device?

Your fdisk -l output seems a bit off to me. Have you considered the possibility that only the partition table was damaged? In this scenario, your snapshot may be helpful, and in the best case you won't even need a(nother) fsck. But we'll need something to try to find the partition offsets - I've used testdisk successfully more than once.

In the worst case scenario, if you need to scrape anything from the volume, forensic tools like PhotoRec or Autopsy/The Sleuth Kit may prove useful.

If none of this works, give us a lsblk -o NAME,RM,SIZE,RO,TYPE,MAJ:MIN -fat too (these flags are just to show as much information as possible), and relevant dmesg output, if any.

mdadm – Rebuilding IMSM RAID-0 Array from Disk Images Using mdadm

Looking at the partition table for /dev/loop0 and the disk image sizes reported for /dev/loop0 and /dev/loop1, I'm inclined to suggest that the two disks were simply bolted together and then the partition table was built for the resulting virtual disk:

Disk /dev/loop0: 298.1 GiB, 320072933376 bytes, 625142448 sectors

Device       Boot   Start        End    Sectors   Size Id Type
/dev/loop0p1 *       2048    4196351    4194304     2G  7 HPFS/NTFS/exFAT
/dev/loop0p2      4196352 1250273279 1246076928 594.2G  7 HPFS/NTFS/exFAT

and

Disk /dev/loop1: 298.1 GiB, 320072933376 bytes, 625142448 sectors

If we take the two disks at 298.1 GiB and 298.1 GiB we get 596.2 GiB total. If we then take the sizes of the two partitions 2G + 594.2G we also get 596.2 GiB. (This assumes the "G" indicates GiB.)
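The same check can be made in sectors rather than rounded GiB, using only the figures from the fdisk output above. This is a small sanity-check sketch; the variable names are made up for illustration.

```shell
# Figures copied from the fdisk output above.
disk_sectors=625142448      # size of each of /dev/loop0 and /dev/loop1
p2_end=1250273279           # last sector of /dev/loop0p2

total=$((disk_sectors * 2))
echo "combined size: $total sectors"

# The second partition ends beyond one disk but within the two disks combined.
if [ "$p2_end" -ge "$disk_sectors" ] && [ "$p2_end" -lt "$total" ]; then
    echo "partition table describes the combined device"
fi
```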

You have already warned that you cannot get mdadm to recognise the superblock information, so purely on the basis of the disk partition labels I would attempt to build the array like this:

mdadm --build /dev/md0 --raid-devices=2 --level=0 --chunk=128 /dev/loop0 /dev/loop1
cat /proc/mdstat

I have used a chunk size of 128KiB to match the chunk size described by the metadata still present on the disks.

If that works you can then proceed to access the partition in the resulting RAID0.

ld=$(losetup --show --find --offset=$((4196352*512)) /dev/md0)
echo loop device is $ld
mkdir -p /mnt/dsk
mount -t ntfs -o ro $ld /mnt/dsk

We already have a couple of loop devices in use, so I've avoided assuming the name of the next free loop device and instead asked the losetup command to tell me the one it has used; this is put into $ld. The offset of 4196352 sectors (each of 512 bytes) corresponds to the offset into the image of the second partition. We could equally have omitted the offset from the losetup command and added it to the mount options.
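For reference, the byte offset works out as follows. The commented mount line is an assumed equivalent using the loop and offset mount options; it is untested here and needs root and a working /dev/md0.

```shell
# Start sector of the second partition, from the partition table above.
start_sector=4196352
sector_size=512

offset=$((start_sector * sector_size))
echo "$offset"

# Assumed alternative to the losetup + mount pair:
# mount -t ntfs -o ro,loop,offset="$offset" /dev/md0 /mnt/dsk
```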