The code comment within drivers/nvme/host/core.c in the Linux kernel source seems to explain it best:
static int nvme_configure_apst(struct nvme_ctrl *ctrl)
{
	/*
	 * APST (Autonomous Power State Transition) lets us program a
	 * table of power state transitions that the controller will
	 * perform automatically. We configure it with a simple
	 * heuristic: we are willing to spend at most 2% of the time
	 * transitioning between power states. Therefore, when running
	 * in any given state, we will enter the next lower-power
	 * non-operational state after waiting 50 * (enlat + exlat)
	 * microseconds, as long as that state's exit latency is under
	 * the requested maximum latency.
	 *
	 * We will not autonomously enter any non-operational state for
	 * which the total latency exceeds ps_max_latency_us. Users
	 * can set ps_max_latency_us to zero to turn off APST.
	 */
[...]
So, APST is a feature that allows the NVMe controller (within the NVMe SSD) to switch between power management states autonomously, following configurable rules. The NVMe controller specifies how many microseconds it needs to enter and exit each power-save state; the kernel uses this information to configure the state transition rules within the NVMe controller.
- What and where is the specific flaw causing the problem?
It looks like this particular Kingston NVMe SSD is either way too optimistic in its wake-up time estimates, or fails to wake up at all (without fully resetting the controller) after entering a deep enough power saving state. When given the permission to use APST, it apparently goes into some power saving state and then fails to return to operational state within the specified time, which makes the kernel unhappy.
- What does the workaround change to prevent the presentation of the flaw?
It tells the kernel that the maximum allowed latency for waking up from APST power management states is exactly 0 microseconds, which causes the APST feature to be disabled.
- What functionality or other desired effect is lost due to such a workaround?
If the NVMe controller's autonomous power management feature cannot be used, the controller will only be allowed to enter power-saving states when specifically requested by the kernel. This means the power savings most likely won't be as great as with APST in use.
- And especially, what is required to be fixed, the kernel, the storage-media firmware, the system firmware (i.e. UEFI/BIOS), or some other component, for users to experience a proper resolution?
The optimal fix would be for Kingston to provide an NVMe disk firmware update that either makes the APST power management work correctly, or at minimum, makes the drive not promise something it cannot deliver, i.e. not announce APST modes with overly optimistic transition times, and/or not announce any APST modes at all that will cause the controller to fail if used.
If it turns out the problem can be avoided by e.g. programming APST to avoid the deepest power-saving state completely, it might be possible to create a more specific kernel-level workaround. Many device drivers in the Linux kernel have "quirk tables" specifying workarounds for specific hardware models. In the case of NVMe, you can find one in drivers/nvme/host/pci.c within the Linux kernel source:
static const struct pci_device_id nvme_id_table[] = {
	{ PCI_VDEVICE(INTEL, 0x0953),	/* Intel 750/P3500/P3600/P3700 */
		.driver_data = NVME_QUIRK_STRIPE_SIZE |
				NVME_QUIRK_DEALLOCATE_ZEROES, },
	{ PCI_VDEVICE(INTEL, 0x0a53),	/* Intel P3520 */
		.driver_data = NVME_QUIRK_STRIPE_SIZE |
				NVME_QUIRK_DEALLOCATE_ZEROES, },
	{ PCI_VDEVICE(INTEL, 0x0a54),	/* Intel P4500/P4600 */
		.driver_data = NVME_QUIRK_STRIPE_SIZE |
				NVME_QUIRK_DEALLOCATE_ZEROES, },
	{ PCI_VDEVICE(INTEL, 0x0a55),	/* Dell Express Flash P4600 */
		.driver_data = NVME_QUIRK_STRIPE_SIZE |
				NVME_QUIRK_DEALLOCATE_ZEROES, },
	{ PCI_VDEVICE(INTEL, 0xf1a5),	/* Intel 600P/P3100 */
		.driver_data = NVME_QUIRK_NO_DEEPEST_PS |
				NVME_QUIRK_MEDIUM_PRIO_SQ |
				NVME_QUIRK_NO_TEMP_THRESH_CHANGE |
				NVME_QUIRK_DISABLE_WRITE_ZEROES, },
[...]
Here the various NVME_QUIRK_ settings trigger various pieces of workaround code within the driver.
Note that there already exists a quirk setting named NVME_QUIRK_NO_DEEPEST_PS, which prevents state transitions to the deepest power management state. If the APST problem of your Kingston NVMe turns out to have the same workaround as already implemented for Intel 600P/P3100 and ADATA SX8200PNP, then all it would take is writing a new quirk table entry like this (replacing the things within <angle brackets> with appropriate values; you can get them with lspci -nn):
	{ PCI_DEVICE(<PCI vendor ID>, <PCI product ID of the SSD>),	/* <specify make/model of SSD here> */
		.driver_data = NVME_QUIRK_NO_DEEPEST_PS, },
and recompiling the kernel with this modification.
Obviously, someone who actually has this exact SSD model is needed to test this. If you happen to be familiar with C programming basics and how to compile custom kernels, this could be your chance to get your name onto the long list of Linux kernel contributors! If you are interested, you should probably read kernelnewbies.org for more details.
Kernel programming is not always deeply intricate: there are a lot of simple parts that just need a person with the right kind of hardware and some basic programming knowledge. I've submitted a few minor patches just like this.
If setting NVME_QUIRK_NO_DEEPEST_PS turns out not to fix the problem, then implementing a new quirk might be needed. That could be more complicated, and might require some experimentation or ideally information from Kingston to find out what exactly needs to be done to avoid this problem, and perhaps discussion with the Linux NVMe driver maintainer on the best way to implement it.
Best Answer
I just recently added an SX8200 to my existing system (which is installed on a SATA SSD) and the new drive wasn't recognized correctly. I have a very similar setup on an X399 Taichi and saw the same errors in dmesg. I'm on Ubuntu 18.04.1 with kernel 4.15.0-36-generic.
I don't know if there's a proper fix out there, but I was able to get past some of the errors with the same workaround that was used for some Samsung drives. You can try adding the following parameter to the kernel boot command line:
nvme_core.default_ps_max_latency_us=0
As I understand it, this will disable APST, which is a power saving feature. I haven't experimented with larger values; a nonzero value might avoid the error while still saving some power.
After this workaround I still get the other errors, but the APST one is gone and the drive seems to work. I was able to mount it and read files (I formatted it to NTFS in Windows before).