dmesg – How to Interpret Entries in dmesg Logs

dmesg kernel logs

It's interesting to look at the entries in dmesg, but how can I find out what they all mean? I did man dmesg, but I can't find anything about decoding the messages themselves.

I wonder: Is there a way to drill down and find out the meaning and origin of each entry? For example which driver that wrote it (if it was a driver), and what the message means in detail?


Example of dmesg output:

[101466.656676] Read(10): 28 00 00 07 c4 25 00 00 01 00
[101466.656706] end_request: I/O error, dev sr0, sector 2035860
[101466.656722] Buffer I/O error on device sr0, logical block 508965
[101471.444586] sr 1:0:0:0: [sr0] Unhandled sense code
[101471.444607] sr 1:0:0:0: [sr0]  
[101471.444616] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[101471.444627] sr 1:0:0:0: [sr0]  
[101471.444634] Sense Key : Medium Error [current] 
[101471.444649] sr 1:0:0:0: [sr0]  
[101471.444657] Add. Sense: No seek complete
[101471.444668] sr 1:0:0:0: [sr0] CDB: 
[101471.444675] Read(10): 28 00 00 07 c4 24 00 00 01 00
[101471.444705] end_request: I/O error, dev sr0, sector 2035856
[101471.444721] Buffer I/O error on device sr0, logical block 508964

Best Answer

There's no easy way. These messages are intended for kernel developers and experienced system administrators, not for ordinary users. There's no general structure to them (apart from the number in brackets, which is the number of seconds since the kernel booted).
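The one regular part, the bracketed timestamp, is easy to separate from the free-form text. The following is a minimal Python sketch, assuming the default dmesg output format shown in the question; the function name split_entry is made up for this illustration.

```python
import re

# Matches the "[seconds.fraction]" prefix that dmesg prints by default.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s?(.*)$")

def split_entry(line):
    """Return (seconds_since_boot, message), or (None, line) if there is no timestamp."""
    m = LINE_RE.match(line)
    if m is None:
        return None, line
    return float(m.group(1)), m.group(2)

secs, msg = split_entry("[101466.656706] end_request: I/O error, dev sr0, sector 2035860")
print(msg)
print(f"about {secs / 3600:.1f} hours after boot")  # about 28.2 hours after boot
```

Everything after the timestamp is whatever string the kernel code chose to print, which is why the rest of the message has to be interpreted case by case.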

You can look for the message text in the kernel source code. That can provide useful information even if you don't know the C programming language: at the very least, finding which file contains the message tells you which driver is responsible. Either keep a local copy (most distributions have a package with the source of the kernel, e.g. sudo apt-get install linux-source-X.XX && cd /usr/src && sudo tar xf linux-source-X.XX.tar.xz on Debian and derivatives), or use an online browser such as LXR at Free Electrons or LXR at linux.no (better search but often down).
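With a local copy, a plain text search is usually enough. Here is a rough Python sketch of such a search (a recursive grep does the same job); the helper name find_message and the source path are placeholders, and the search fragment is chosen from the fixed wording of the message rather than from parts that look like device names or numbers.

```python
import os

def find_message(root, fragment):
    """Yield (path, line_number, line) for each .c/.h line that contains fragment."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith((".c", ".h")):
                continue
            path = os.path.join(dirpath, name)
            # errors="replace" avoids crashing on the odd non-UTF-8 source file.
            with open(path, errors="replace") as f:
                for lineno, line in enumerate(f, start=1):
                    if fragment in line:
                        yield path, lineno, line.rstrip()

if __name__ == "__main__":
    # Placeholder path: wherever the kernel source was unpacked.
    for path, lineno, line in find_message("/usr/src/linux-source-X.XX", "error, dev"):
        print(f"{path}:{lineno}: {line}")
```

The directory of each hit (block/, drivers/scsi/, and so on) is often the quickest clue to which subsystem or driver wrote the message.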

When searching, keep in mind that messages often do not appear in the source code literally. They are usually composed from a template and parameters, so search for the fixed parts of the text rather than for device names or numbers. For example, the second line comes from the blk_update_request function in block/blk-core.c:

     printk_ratelimited(KERN_ERR "end_request: %s error, dev %s, sector %llu\n",
                        error_type, req->rq_disk ?
                        req->rq_disk->disk_name : "?",
                        (unsigned long long)blk_rq_pos(req));

The first %s in the template is replaced by the value of error_type, the second %s is replaced by req->rq_disk->disk_name (or a ? if this is not set), and the %llu is replaced by the integer returned by blk_rq_pos(req). Given the file the message is in, it concerns a block device. The disk name tells you which device: sr0. If you look at the standard device names, that's “First SCSI CD-ROM” (actually, first optical drive that talks a SCSI-like protocol, including most IDE/SATA and USB drives).
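To make the substitution concrete, here is a small Python sketch that turns the template into a regular expression and recovers the three parameters from the logged line. It only handles the few conversion specifiers listed in SPEC_RE, which is an assumption made for brevity; real printk formats use many more.

```python
import re

# Only a handful of printf-style specifiers are handled in this sketch.
SPEC_RE = re.compile(r"%(?:s|d|u|lu|llu)")

def template_to_regex(template):
    """Build a regex that matches messages rendered from a printf-style template."""
    template = template.rstrip("\n")  # the log line has no trailing newline
    pattern = ""
    pos = 0
    for m in SPEC_RE.finditer(template):
        pattern += re.escape(template[pos:m.start()])  # literal text, escaped
        pattern += r"(.+?)" if m.group() == "%s" else r"(\d+)"
        pos = m.end()
    pattern += re.escape(template[pos:])
    return re.compile(pattern)

TEMPLATE = "end_request: %s error, dev %s, sector %llu\n"
MESSAGE = "end_request: I/O error, dev sr0, sector 2035860"

match = template_to_regex(TEMPLATE).fullmatch(MESSAGE)
error_type, disk_name, sector = match.groups()
print(error_type, disk_name, sector)  # I/O sr0 2035860
```

Going the other way (from a logged message back to its template) is what you do by eye when searching the source: ignore the parts that look like parameters and search for the fixed text around them.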

You can continue exploring the messages, but there's an evident pattern here: they're all related to sr. All of them are caused by the same problem: an error reading the DVD around sector 2035860 (about 1 GB in, since a sector here is 512 bytes). The computer was suddenly told that there was no disk present (or an unreadable disk); it then tried a neighbouring sector (2035856), and that read failed as well.
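The arithmetic behind the "about 1 GB" figure can be checked in a few lines of Python. The numbers are taken from the log above; the 512-byte sector size is the one stated in the answer, and the variable names are just for illustration.

```python
SECTOR_SIZE = 512          # bytes per sector, as stated above

sector = 2035860           # from "end_request: I/O error, dev sr0, sector 2035860"
logical_block = 508965     # from "Buffer I/O error on device sr0, logical block 508965"

offset = sector * SECTOR_SIZE
print(offset)                        # 1042360320
print(round(offset / 10**9, 2))      # 1.04 (GB into the disk)

# Both lines describe the same spot: one logical block spans 4 sectors.
sectors_per_block = sector // logical_block
print(sectors_per_block, sectors_per_block * SECTOR_SIZE)  # 4 2048
```

The second pair of lines agrees as well: sector 2035856 divided by 4 is logical block 508964, so the two failed reads are on adjacent 2048-byte blocks.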

This could be a speck of dust, or a scratched or otherwise damaged disk. Other problems, such as a damaged drive or a bad cable, could also cause read errors, but those would affect reading all the time, not just a particular area of a particular disk.
