SATA vs SCSI – How SATA ‘Talks’ SCSI and Shared Features

hard-disk, linux-kernel, protocols

It is nothing new, to me at least, that SATA actually "talks" SCSI, which is why SATA devices show up as SCSI devices in Linux.

A related question has been asked before, e.g. Why do my SATA devices show up under /proc/scsi/scsi?

However, what the discussions I've seen fail to mention is exactly in what sense SATA relates to SCSI, and how the two differ.

I assume it is taken for granted that they differ on the physical layer, as they do not share compatible cables.

However what about higher up on the stack? I am aware of how Linux represents SATA and even IDE disks on modern kernels as just SCSI to the SCSI subsystem. But what about the actual protocol that is used on the bus?

I also know that ATAPI is an encapsulation for SCSI, but what about regular ATA? I've noticed that features from SCSI such as NCQ, FUA, DPO, etc (if I don't remember incorrectly) have been adopted from SCSI. But it is unclear how "much" of the SCSI command set is actually shared or similar.

Do modern SATA devices with their ATA specification implement a subset of the SCSI command set, but encapsulated (as in ATAPI)? An identical set? A superset? Or perhaps only selected features are implemented as variants that are not directly identical?

Where can I find some clear information on this, and especially how it relates to the Linux kernel? Some kind of tutorial for driver development would be nice, but even just an overview that doesn't completely skip over all the details would suffice. I am aware I can just read the actual specification, but it is far too detailed, it is hard to find what you're really looking for in it, and reading it in full is simply not realistic, time-wise, for me and probably for most other users.

Best Answer

SCSI and ATA are entirely different standards. They are currently both developed under the aegis of the INCITS standards organization but by different groups. SCSI is under technical committee T10, while ATA is under T13.¹

ATA was designed with only hard disk drives in mind. SCSI is both broader and older, being a standard way of controlling mass storage devices, tape drives, removable optical media drives (CD, DVD, Blu-Ray...), scanners, and many other device types.

It wasn't obvious in the mid-1980s — when IDE was introduced to the PC world — that SCSI would get pushed to the margins of the computing world. SCSI was well-established and more capable. Unix workstations and Macintosh computers shipped with SCSI hard disk drives for decades. High-end PCs often had a SCSI card for peripherals at least, and often for the system HDD, too. The early CD-ROM and tape drives for personal computers came out in SCSI form first.

The PC industry being what it is, though, there was a push to use the less expensive ATA standard instead of SCSI. The initial compromise was called ATAPI, an extension to ATA that allows a device that understands SCSI internally to receive those SCSI commands over an ATA interface. More on this below.

Several years later, SCSI got the ATA command pass-through feature, basically the inverse of ATAPI, allowing ATA commands over a SCSI bus. One use for this facility is to tunnel ATA SMART commands over SCSI. smartmontools does this, for example.
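To make the pass-through idea concrete, here is a minimal sketch of how an ATA command can be wrapped in a SCSI ATA PASS-THROUGH (16) command descriptor block. The function name and parameters are my own for illustration, the field layout is my reading of the SAT standard, and the sketch only builds the bytes; actually sending them (via the Linux SG_IO ioctl, for example) is left out.

```python
SCSI_ATA_PASS_THROUGH_16 = 0x85  # SCSI opcode that carries an ATA command
PROTOCOL_PIO_DATA_IN = 4         # SAT protocol value: device sends data via PIO


def ata_pass_through16(command: int, feature: int = 0, count: int = 0,
                       lba: int = 0) -> bytes:
    """Build a 16-byte CDB wrapping a 28-bit, PIO data-in ATA command.

    Hypothetical helper: it covers only the simple non-EXTEND case.
    """
    cdb = bytearray(16)
    cdb[0] = SCSI_ATA_PASS_THROUGH_16
    cdb[1] = PROTOCOL_PIO_DATA_IN << 1       # protocol in bits 4:1, EXTEND = 0
    # T_DIR = 1 (from device), BYTE_BLOCK = 1 (length counted in blocks),
    # T_LENGTH = 2 (length is taken from the sector count field)
    cdb[2] = (1 << 3) | (1 << 2) | 0x02
    cdb[4] = feature & 0xFF                  # ATA Features (7:0)
    cdb[6] = count & 0xFF                    # ATA Sector Count (7:0)
    cdb[8] = lba & 0xFF                      # ATA LBA low
    cdb[10] = (lba >> 8) & 0xFF              # ATA LBA mid
    cdb[12] = (lba >> 16) & 0xFF             # ATA LBA high
    cdb[14] = command & 0xFF                 # the ATA command itself
    return bytes(cdb)


# IDENTIFY DEVICE (ATA command 0xEC), one 512-byte block of data back
identify_cdb = ata_pass_through16(command=0xEC, count=1)

# SMART READ DATA: command 0xB0, feature 0xD0, signature 0xC2/0x4F in LBA high/mid
smart_cdb = ata_pass_through16(command=0xB0, feature=0xD0, count=1, lba=0xC24F00)
```

Note that the SCSI layer never interprets the ATA command byte here; it just carries it to whatever sits at the other end and speaks ATA.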

Later still, the INCITS T10 committee developed a standard called the SCSI/ATA Translation (SAT), which translates SCSI commands to ATA commands and vice versa.² The Linux kernel's libata library provides a SAT implementation for Linux, among other things.

There is some logical overlap in the SCSI and ATA protocols, since they both control hard disk drives. Both obviously need a way to seek to a particular hard drive sector, retrieve that sector's contents, etc. Nevertheless, the command formats are entirely different; otherwise, we wouldn't need these translation and pass-through mechanisms.
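As a rough illustration of how different the formats are, the sketch below encodes the same request ("read 8 blocks starting at a given LBA") once as a SCSI READ(10) command descriptor block and once as an ATA READ DMA EXT command inside a SATA Register Host-to-Device FIS. The function names are mine, and the byte layouts are my reading of the respective standards, so treat this as a sketch rather than a reference.

```python
import struct


def scsi_read10_cdb(lba: int, blocks: int) -> bytes:
    """SCSI READ(10): 10 bytes, big-endian fields, opcode 0x28."""
    # opcode, flags, 32-bit LBA, group number, 16-bit transfer length, control
    return struct.pack(">BBIBHB", 0x28, 0x00, lba, 0x00, blocks, 0x00)


def ata_read_dma_ext_fis(lba: int, sectors: int) -> bytes:
    """ATA READ DMA EXT (0x25) in a 20-byte Register Host-to-Device FIS."""
    fis = bytearray(20)
    fis[0] = 0x27                    # FIS type: Register Host to Device
    fis[1] = 0x80                    # C bit set: this FIS carries a command
    fis[2] = 0x25                    # ATA command: READ DMA EXT
    fis[4] = lba & 0xFF              # LBA bits 7:0
    fis[5] = (lba >> 8) & 0xFF       # LBA bits 15:8
    fis[6] = (lba >> 16) & 0xFF      # LBA bits 23:16
    fis[7] = 0x40                    # device register: LBA addressing
    fis[8] = (lba >> 24) & 0xFF      # LBA bits 31:24
    fis[9] = (lba >> 32) & 0xFF      # LBA bits 39:32
    fis[10] = (lba >> 40) & 0xFF     # LBA bits 47:40
    fis[12] = sectors & 0xFF         # sector count bits 7:0
    fis[13] = (sectors >> 8) & 0xFF  # sector count bits 15:8
    return bytes(fis)


cdb = scsi_read10_cdb(lba=0x12345678, blocks=8)
fis = ata_read_dma_ext_fis(lba=0x12345678, sectors=8)
print(cdb.hex(" "))
print(fis.hex(" "))
```

The request is the same, but the opcode, the byte order, the field widths and the field positions all differ. Bridging that gap is the kind of work a SAT layer has to do.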

> SATA actually "talks" SCSI

That is about as true as the assertion that "Cars are pink." Some cars are pink.

ATAPI, ATA pass-through, and SAT are only part of the story. Read on.

> I assume it is taken for granted that they differ on the physical layer, as they do not share compatible cables.

That was true in the old parallel SCSI world, but just as SATA replaced PATA, SAS replaced parallel SCSI.

SAS and SATA share the same drive connectors, and they are electrically compatible. A SAS controller can talk to SAS and SATA devices, but a SAS drive cannot work with a SATA-only controller. The difference is in the negotiation, and in the commands you can use after the devices on each end of the cable figure out what they are talking to.

In fact, a lot of "SATA RAID" controllers are really SAS RAID controllers. Such controllers often have one or more SFF-8087 SAS mating connectors on the card, but you can connect SATA drives to them with an SFF-8087 to 4× SATA breakout cable. So, a SAS/SATA RAID card with two SFF-8087 mating connectors controls up to 8 drives.³

Another common situation is a hot-swap drive enclosure or computer case with a SAS backplane. The backplane usually has an SFF-8087 connector on it, allowing use of a simple 8087-to-8087 cable from the backplane to the disk controller. If the drives in the hot-swap trays are SATA, that's of no matter. The SAS controller can talk to them over the SAS cabling, as they sit in drive sleds that plug the drives into the SAS backplane. The drives are still SATA drives, though, speaking the ATA protocol, not SCSI.

> I also know that ATAPI is an encapsulation for SCSI

True, but ATAPI is only used for devices other than hard disk drives. The main reason this standard exists is to allow an ATA interface to transport SCSI commands like the streaming data commands for a tape drive, the "eject media" command for an optical disk drive, or the "play track" command for a CD audio disc.
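For the "eject media" case, the encapsulation can be sketched as follows: a 6-byte SCSI START STOP UNIT command is padded out to the 12-byte packet that an ATAPI device receives after the host issues the ATA PACKET command (0xA0). The helper names are hypothetical, and the sketch only builds the bytes without talking to any hardware.

```python
ATA_CMD_PACKET = 0xA0          # ATA command that announces "a SCSI packet follows"
SCSI_START_STOP_UNIT = 0x1B    # SCSI opcode used to load or eject media


def eject_cdb() -> bytes:
    """6-byte SCSI START STOP UNIT CDB with LoEj=1, Start=0 (eject)."""
    cdb = bytearray(6)
    cdb[0] = SCSI_START_STOP_UNIT
    cdb[4] = 0x02              # bit 1 = LoEj (load/eject), bit 0 = Start (0 = stop)
    return bytes(cdb)


def atapi_packet(cdb: bytes, packet_len: int = 12) -> bytes:
    """Zero-pad a SCSI CDB to the fixed ATAPI command packet size."""
    if len(cdb) > packet_len:
        raise ValueError("CDB does not fit in the ATAPI command packet")
    return cdb.ljust(packet_len, b"\x00")


packet = atapi_packet(eject_cdb())
```

The ATA side contributes only the PACKET command and the transport; the bytes the drive actually acts on are plain SCSI.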

This fact is becoming less relevant as the non-HDD devices that used to speak SCSI over ATAPI disappear or move on to other interfaces. Low-end tape drives no longer exist, so tape drives are all SAS now.⁴ Scanners are pretty much USB-only these days. Optical media drives are moving outside the computer case to be connected via USB, or disappearing entirely, leaving just the increasingly rare internal optical drives speaking ATAPI.

Regardless, a SATA device that understands SCSI over ATAPI is a "SCSI device" only in a limited way. Such devices will not benefit from most of the advantages of SAS over SATA, and those advantages are what keep SAS distinctly valuable compared to SATA, ATAPI notwithstanding.

If you want another car analogy, the fact that I can run my car on an oval race track does not make it a race car.

> I've noticed that features from SCSI such as NCQ, FUA, DPO, etc (if I don't remember incorrectly) have been adopted from SCSI. But it is unclear how "much" of the SCSI command set is actually shared or similar.

Mostly this amounts to low-end mimicry. NCQ is not the same thing as TCQ, for example. You will only get a hard drive with TCQ if it is a SAS device. Plug an NCQ-capable SATA drive into a SAS controller, and it doesn't suddenly gain TCQ capability.

That said, a modern SATA device may well be much more capable than a SCSI device from a decade ago. It is certainly going to be capable of much higher levels of I/O.

All of this is confusing and overlapping because that's the nature of the PC hardware world. There aren't clear lines because optical drive manufacturers — just to pick on one sub-industry — really don't want to build two entirely different drives, one speaking SAS to its highest expression, and the other speaking SATA. So, they compromise. They lobby in the committees defining such standards to create a single standard that lets them drop their SATA drive on a SAS bus, and everyone's mostly happy.

> Where can I find some clear information on this, and especially how it relates to the Linux kernel?

Ultimately, you want to read the Linux sources. The libATA Developer's Guide should also be helpful.

I'm not aware of any easy summary of how all this works. It wasn't designed to be easy. It was designed to accommodate three decades of hardware evolution, competing standards, and disparate goals. Further, it was designed without magical levels of foresight. In short, it's a mess. The only people who really have to know how the mess works are those building the OS kernels, those designing the hardware, and to a lesser extent, those writing the drivers for the OS kernels. For such a small cadre of highly capable people, standards and working code are sufficient.

Today, Linux calls most rewritable mass-storage devices /dev/sd?. "SD" once stood for "SCSI disk," and existed merely to differentiate from /dev/hd? generically meaning "Hard Disk," but implying PATA in most cases. This distinction is another practical irrelevancy today. Now we have SSDs, USB thumb drives, virtual hard drives, iSCSI devices and more all called /dev/sd?. I suggest you start thinking of "SD" as short for "storage device," rather than worrying about whether the device speaks ATA over SATA, ATA over Ethernet, SCSI over USB, SCSI over ATAPI, SCSI over SAS, SCSI over IP (iSCSI), or what have you.

The core problem is that naming schemes often outlast the reason behind the scheme. You see this in /dev/scd0. The device connected to that /dev node is more likely to be a DVD or Blu-Ray drive than a Compact Disc drive these days.

The alternative — where you name each /dev node after the exact device type that's connected to it — has its own problems. Would it really be better if we named the /dev node after the low-level protocol it used? /dev/atapi0, /dev/sas0, etc? Or maybe you'd prefer /dev/atapibluray0 and such? What about multi-media drives? Does the same driver also need to expose /dev/atapicd0 in case you slide a Compact Disc into the Blu-Ray drive? That just replaces one confusing scheme with another.

Linux's /dev/sd? abstraction is not perfect, but it is useful. For instance, you can assume that /dev/sda is most likely the boot drive without bothering to worry about what cabling, interface protocol, and media are behind that name. If I tell you that a given Linux box has a single system drive, an optical drive, and sometimes has a USB thumb drive plugged into it, you can confidently guess that they are called /dev/sda, /dev/scd0 (or /dev/sr0 on current systems) and /dev/sdb, respectively. The optical drive does not take a /dev/sd? name, so the thumb drive gets the next free letter.
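If you do want a hint about what sits behind a given /dev/sd? name, sysfs exposes the vendor string that the SCSI layer sees. The sketch below reads it for every sd* block device. The function name is mine, and it rests on an assumption: that libata reports the fixed vendor string "ATA" for the disks it translates, which is what I have seen in practice, while USB and SAS devices report their own vendor.

```python
from pathlib import Path


def scsi_vendor_strings(sys_block: str = "/sys/block") -> dict:
    """Map each sd* block device to the SCSI vendor string in sysfs."""
    vendors = {}
    for dev in sorted(Path(sys_block).glob("sd*")):
        vendor_file = dev / "device" / "vendor"
        if vendor_file.is_file():
            # The kernel pads the vendor field with spaces; strip them.
            vendors[dev.name] = vendor_file.read_text().strip()
    return vendors
```

On a machine with one SATA disk and one USB stick, I would expect this to return something like "ATA" for sda and the stick's vendor name for sdb, but treat the vendor string as a hint and not as a guarantee.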

Footnotes:

¹ SCSI and ATA didn't start out sharing a parent standards organization. They both started out as proprietary hard disk controllers. SCSI evolved from Shugart Associates' SASI, and ATA/IDE came out of a much later design collaboration between Western Digital, Compaq and CDC.

ANSI later standardized both, with ATA-1 following SCSI-1 about 8 years later.

INCITS is a kind of sister organization to ANSI. INCITS publishes final standards through ANSI in the US, and ISO/IEC JTC 1 worldwide.

² The current standard is SAT-3, published in May 2015, with SAT-4 and SAT-5 in progress as I write this in mid-July 2018. The latter link takes you to drafts of the in-progress versions.

³ I'm ignoring SATA port multipliers, SAS expanders, etc.

⁴ Excepting the models made for compatibility with old parallel SCSI systems.
