Monday, February 19, 2018

Notes on PCI on s390x

As QEMU 2.12 will finally support PCI devices under s390x/tcg, I thought now is a good time to talk about some of the peculiarities of PCI on the mainframe.

Oddities of PCI on s390x architecture

Oddity #1: No MMIO, but instructions

Everywhere else, you use MMIO when interacting with PCI devices. Not on s390x; you have a set of instructions instead. For example, if you want to read or write memory, you will need to use the PCILG or PCISTG instructions, and for refreshing translations, you will need to use the RPCIT instructions. Fortunately, these instructions can be matched to the primitives in the Linux kernel; unfortunately, all those instructions are privileged, which leads us to

Oddity #2: No user space I/O

As any interaction with PCI devices needs to be done through privileged instructions, Linux user space can't interact with the devices directly; the Linux kernel needs to be involved in every case. This means that there are none of the PCI user space implementations popular on other platforms available on s390x.

Oddity #3: No topology, but FID and UID

Usually, you'll find busses, slots and functions when you identify a certain PCI function. The PCI instructions on s390x, however, don't expose any topology to the caller. This means that an operating system will get a simple list of functions, with a function id (FID) that can be mapped to a physical slot and an UID, which the Linux kernel will map to a domain number. A PCI identifier under Linux on s390x will therefore always be of the form <domain>:00:00.0.

Implications for the QEMU implementation of PCI on s390x

In order to support PCI on s390x in QEMU, some specialties had to be implemented.

Instruction handlers

Under KVM, every PCI instruction intercepts and is routed to user space. QEMU does the heavy lifting of emulating the operations and mapping to generic PCI code. This also implied that PCI under tcg did not work until the instructions had been wired up; this has now finally happened and will be in the 2.12 release.

Modelling and (lack of) topology

QEMU PCI code expects the normal topology present on other platforms. However, this (made-up) topology will be invisible to guests, as the PCI instructions do not relay it. Instead, there is a special "zpci" device with "fid" and "uid" properties that can be linked to a normal PCI device. If no zpci device is specified, QEMU will autogenerate the FID and the UID.

How can I try this out?

If you do not have a real mainframe with a real PCI card, you can use virtio-pci devices as of QEMU 2.12 (or current git as of the time of this writing). If you do have a mainframe and a PCI card, you can use vfio-pci (but not yet via libvirt).
Here's an example of how to specify a virtio-net-pci device for s390x, using tcg:
s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on (...) -device zpci,uid=12,fid=2,target=vpci02,id=zpci2 -device virtio-net-pci,id="vpci02",addr=0x2
Some notes on this:
  • You need to explicitly enable the "zpci" feature in the qemu cpu model. Two other features, "aen" and "ais", are enabled by default ("aen" and "zpci" are mandatory, "ais" is needed for Linux guest kernels prior to 4.15. If you use KVM, the host kernel also needs support for ais.)
  • The zpci device is joined with the PCI device via the "target" property. The virtio-net-pci device does not know anything about zpci devices.
  • Only virtio-pci devices using MSI-X will work on s390x.
In the guest, this device will show up in lspci -v as
000c:00:00.0 Ethernet controller: Red Hat, Inc. Virtio network device
Subsystem: Red Hat, Inc. Device 0001
Physical Slot: 00000002
 Note how the uid of 12 shows up as domain 000c and the fid of 2 as physical slot 00000002.