Linux Tracing – How to Trace DMA

linux monitoring tracing

I am working on software that communicates with a PCI card through direct memory access (DMA) transactions. My programs use a suite of drivers and a library that handles the DMA. Everything runs on Red Hat Linux.

To test and measure the performance of my programs, I would like to trace the start and end of the DMA transactions. Currently I do this by looking at a couple of functions in the library:

  • dma_from_host and dma_to_host, which initiate the transactions by configuring the registers of the card and writing 1 to a register called DMA_DESC_ENABLE;
  • dma_wait, which waits until the transaction has finished by continuously polling the value of the DMA_DESC_ENABLE register.

But I would like a more robust confirmation that a transaction has started and a more precise signal that it has ended. Something from Linux or from the hardware itself would be best.

I understand that in principle this is a cumbersome situation. The idea of DMA is that the hardware (the PCI card or the DMA controller on the motherboard) copies things directly into the memory of the process, bypassing the CPU and the OS. But I hope that it does not just copy things into RAM without notifying the CPU somehow. Are there standard ways to trace these transactions, or is it very platform-specific?

Are there special interrupts that notify the CPU about the start and end of the DMA? I could not spot anything like that in the drivers that I use. But I am not experienced with drivers, so I could easily have looked in the wrong places.

Another idea: are there any PMU-like hardware monitors that could provide this information? Something that just counts transactions on PCI lanes?

One more idea: do I understand correctly that one could write a custom DMA tracer as a Linux module or a BPF program that would continuously check the value of that DMA_DESC_ENABLE register? Is this a viable approach? Are there known tracers like that?

Best Answer

Encouraged by the comment from @dirkt, I looked more closely at the drivers and found the PCI MSI-X interrupts that correspond to these DMA transactions.

The driver enables these interrupts with the call

pci_enable_msix(.., msixTable,..)

that sets up the struct msix_entry msixTable[MAXMSIX]. Then it assigns them to the handler static irqreturn_t irqHandler() by calling request_irq() in a loop:

request_irq(msixTable[interrupt].vector, irqHandler, 0, devName,...)

The handler just counts the interrupts in a local int array. These counters are exported in the /proc/<devName> file that this driver creates for diagnostics etc. In fact, that proc file is where I started the search for the interrupts.

But there is a better way: the /proc/interrupts file. The enabled MSI-X interrupts show up there in lines like these:

$ cat /proc/interrupts 
            CPU0       CPU1  ...  CPU5       CPU6       CPU7       
  66:          0          0  ...     0          0          0  IR-PCI-MSI-edge      <devName>
  67:          0          0  ...     0          0          0  IR-PCI-MSI-edge      <devName>
  68:         33          0  ...     0          0          0  IR-PCI-MSI-edge      <devName>
  69:          0          0  ...     0          0          0  IR-PCI-MSI-edge      <devName>
  70:          0          0  ...     0          0          0  IR-PCI-MSI-edge      <devName>
  71:          0          0  ...     0          0          0  IR-PCI-MSI-edge      <devName>
  72:          0          0  ...     0          0          0  IR-PCI-MSI-edge      <devName>
  73:          0          0  ...     0          0          0  IR-PCI-MSI-edge      <devName>

Another way is to find the PCI address of the card in the lspci output and check the interrupts assigned to the card under the /sys directory:

$ ls  /sys/bus/pci/devices/0000:17:00.0/msi_irqs
66  67  68  69  70  71  72  73

# but the irq attribute itself just reports 0
$ cat  /sys/bus/pci/devices/0000:17:00.0/irq
0
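To see at a glance which of the card's vectors actually fires, the two sources above can be combined. The following is a minimal sketch, not part of the driver or of any standard tool: the function name msi_irq_counts is made up for illustration, and it assumes that the per-CPU counters in /proc/interrupts are the leading numeric fields after the "NN:" label, as in the listing above.

```shell
# msi_irq_counts DEVDIR [FILE]   (hypothetical helper, for illustration only)
# For each MSI/MSI-X vector listed under DEVDIR/msi_irqs, print "IRQ TOTAL",
# where TOTAL is the sum of the per-CPU counters found in FILE.
# FILE defaults to /proc/interrupts; a saved copy can be passed instead.
msi_irq_counts() {
    dev_dir=$1
    ints=${2:-/proc/interrupts}
    for path in "$dev_dir"/msi_irqs/*; do
        [ -e "$path" ] || continue      # no MSI vectors allocated
        irq=${path##*/}                 # the entry name is the IRQ number
        awk -v irq="$irq" '
            $1 == (irq ":") {
                total = 0
                # Assumption: counters are the numeric fields that follow the
                # label, up to the interrupt controller name.
                for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) total += $i
                print irq, total
            }
        ' "$ints"
    done
}
```

With the address from the example above it would be called as msi_irq_counts /sys/bus/pci/devices/0000:17:00.0, and a vector with a non-zero total is a candidate for the DMA completion interrupt.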

Interrupt number 68 fires at the end of each transaction. Interrupt handlers in Linux have a static tracepoint, irq:irq_handler_entry. The tracepoint format in /sys/kernel/debug/tracing/events/irq/irq_handler_entry/format exposes the interrupt number in the int irq field. Hence, this interrupt can be traced with the standard Linux facilities by using this tracepoint with a filter condition:

# set up ftrace
trace-cmd start -e irq:irq_handler_entry -f "irq == 68"
# for a live stream
cat /sys/kernel/debug/tracing/trace_pipe
# or just stop, dump the buffer, and clean up
trace-cmd stop
trace-cmd show
trace-cmd reset

# with perf (system-wide; stop with Ctrl-C, then inspect with perf script)
perf record -a -e "irq:irq_handler_entry" --filter "irq == 68"
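For performance measurements, the timestamps in the trace are usually what matters. As a rough sketch, the text output of trace-cmd show or trace_pipe can be reduced to the intervals between consecutive completion interrupts. The helper name irq_intervals is made up here, and the code assumes the default ftrace line layout, in which the "TIMESTAMP:" field immediately precedes "irq_handler_entry: irq=N name=...".

```shell
# irq_intervals IRQ [FILE]   (hypothetical helper, for illustration only)
# Print the time in seconds between consecutive irq_handler_entry events
# for IRQ. Input is ftrace text output; FILE defaults to stdin.
irq_intervals() {
    awk -v irq="$1" '
        {
            # Locate the event name; the field before it is "TIMESTAMP:".
            for (i = 2; i < NF; i++) {
                if ($i == "irq_handler_entry:" && $(i + 1) == ("irq=" irq)) {
                    ts = $(i - 1)
                    sub(/:$/, "", ts)           # drop the trailing colon
                    if (seen) printf "%.6f\n", ts - prev
                    prev = ts
                    seen = 1
                    break
                }
            }
        }
    ' "${2:--}"
}
```

It can be fed directly from the trace buffer, for example trace-cmd show | irq_intervals 68. Pairing these timestamps with the moment dma_to_host or dma_from_host is called would still need a marker from the user-space side.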

With this, one thing still worth confirming is that these interrupts are essential to the DMA, so that I am monitoring something inherent to the system rather than just a handy counter for the proc file that might not be implemented in another setup. However, I could not spot any other relevant interrupts by watching how the counters increment in /proc/interrupts. There are interrupts for the devices dmar[0123], which look DMA-related (as far as I understand, they belong to the IOMMU's DMA remapping units and mainly report faults), but they never incremented. That is to be expected, since in this case the DMA engine must be implemented as an FPGA core on the PCI card itself.
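Watching the counters by eye is error-prone, so the comparison can be scripted. The sketch below diffs two saved copies of /proc/interrupts and prints only the lines whose total changed. The name irq_diff is hypothetical, and the code again assumes that the per-CPU counters are the leading numeric fields of each line.

```shell
# irq_diff BEFORE AFTER   (hypothetical helper, for illustration only)
# Compare two saved copies of /proc/interrupts and print
# "LABEL DELTA LAST_FIELD" for every line whose summed counters changed.
irq_diff() {
    awk '
        # Sum the leading numeric fields (per-CPU counters) of the current line.
        function total(   i, t) {
            t = 0
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) t += $i
            return t
        }
        $1 !~ /:$/ { next }                      # skip the CPU header line
        NR == FNR  { before[$1] = total(); next } # first file: remember totals
        {
            delta = total() - before[$1]
            if (delta != 0) {
                label = $1
                sub(/:$/, "", label)
                print label, delta, $NF
            }
        }
    ' "$1" "$2"
}
```

A typical use would be to copy /proc/interrupts to one file, run a single DMA transfer, copy it to a second file, and call irq_diff on the two. Unrelated activity (timers, network, disks) will show up as well, so the output still needs to be read with the device name in mind.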
