We continue our effort to enable IOMMU, and as a side effect I have to play with various technologies to build a reliable development environment based on RTE. In this blog post I would like to present a semi-automated technique for debugging firmware, Xen and the Linux kernel. The goal is to have a set of tools that help in enabling various features in a Debian-based dom0. We would like to:

  • update the Linux kernel, which is exposed over an HTTP server
  • update the rootfs provided through NFS

I will use the following components:

  • PC Engines apu2c
  • RTE
  • pxe-server – our dockerized HTTP and NFS server
  • Xen 4.8
  • Linux kernel 4.14.y

My workstation environment is QubesOS 4.0 with Debian stretch VMs, but it should not make any difference. I had to work around one obstacle related to our environment, which is behind a VPN, while I also wanted to access the outside world in my fw-dev VM. More information about that can be found here.

First, I assume that you have a working pxe-server and an RTE connected to the apu2. We will start with automating Linux kernel deployment, since this is crucial while debugging. Initially this blog post was motivated by the coreboot development effort to enable IOMMU. The error I get with the 4.14.50 kernel and the mentioned coreboot patches:

```
[    0.176137] Translation was enabled for IOMMU:0 but we are not in kdump mode
[    0.184000] AMD-Vi: Command buffer timeout
[    0.184000] AMD-Vi: Command buffer timeout
[    0.184000] AMD-Vi: Command buffer timeout
[    0.184000] AMD-Vi: Command buffer timeout
[    0.184000] AMD-Vi: Command buffer timeout
[    0.184000] AMD-Vi: Command buffer timeout
[    0.184000] AMD-Vi: Command buffer timeout
```

Building kernel with Xen support for apu2

The config fetched from GitHub has a couple of features enabled that make it work as dom0, e.g. CONFIG_XEN_DOM0. An option worth enabling when debugging IOMMU support in the Linux kernel is Enable IOMMU debugging, aka CONFIG_IOMMU_DEBUG.
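To flip such symbols without a full menuconfig round-trip, a small helper along these lines can be used. This is only a sketch: in a real kernel tree scripts/config does this job, and the `.config.demo` file below is fabricated purely for illustration.

```shell
# Sketch: enable a kernel config symbol in a .config file before rebuilding.
# In a real kernel tree, prefer scripts/config; .config.demo is a fabricated demo.
enable_opt() {
  opt="$1"; cfg="$2"
  if grep -q "^# $opt is not set" "$cfg"; then
    # Symbol present but disabled: rewrite the "is not set" line
    sed -i "s/^# $opt is not set/$opt=y/" "$cfg"
  elif ! grep -q "^$opt=" "$cfg"; then
    # Symbol absent entirely: append it
    echo "$opt=y" >> "$cfg"
  fi
}

printf 'CONFIG_XEN_DOM0=y\n# CONFIG_IOMMU_DEBUG is not set\n' > .config.demo
enable_opt CONFIG_IOMMU_DEBUG .config.demo
cat .config.demo
```

Remember to follow the change with `make olddefconfig` so dependent options get resolved.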

Ansible for Linux kernel development

Typically, the manual procedure of deploying a new kernel and rootfs to pxe-server and NFS would look like this:

  1. Compile new kernel as described above
  2. Update rootfs using *.deb packages from point 1
  3. Update kernel using bzImage from point 1
  4. Boot new system over iPXE

The *.deb packages and bzImage have to be deployed to the NFS server and installed inside the rootfs, which typically means a chroot. Installation with the system booted over NFS is much slower. We assume that the server we are working with is dedicated to developers. In our infrastructure we have 2 VMs: one with the production pxe-server and one with the development pxe-server-dev. After exercising configuration changes on pxe-server-dev, we apply them to production.
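The manual flow can be sketched as a dry-run script. The pxe-server hostname and the /srv/nfs/rootfs path are assumptions about the layout; only /var/netboot/kernels/vmlinuz-dev comes from our actual setup:

```shell
# Dry-run sketch of the manual deployment; "pxe-server" and /srv/nfs/rootfs
# are assumed names. Swap `run` for direct execution to really deploy.
run() { echo "+ $*"; }

# 1. copy the freshly built kernel package and bzImage to the server
run scp linux-image-4.14.50_amd64.deb root@pxe-server:/srv/nfs/rootfs/tmp/
run scp arch/x86/boot/bzImage root@pxe-server:/var/netboot/kernels/vmlinuz-dev
# 2. install the package inside the NFS-exported rootfs via chroot
run ssh root@pxe-server 'chroot /srv/nfs/rootfs dpkg -i /tmp/linux-image-4.14.50_amd64.deb'
```

In a real run you would also bind-mount /proc, /sys and /dev into the chroot before `dpkg -i` and unmount them afterwards, exactly as the playbook steps below do.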

Flat Ansible playbook

I'm not familiar with Ansible design patterns, so for now I made it a flat playbook. The rough steps of what is done in the scripts below are:

  1. Copy all *.deb files mentioned in command line to pxe-server
  2. Mount required subsystems into Debian Dom0 rootfs
  3. Create a script that will be executed in the chroot (upgrade and kernel installation)
  4. Unmount the subsystems from point 2
  5. Copy bzImage to /var/netboot/kernels/vmlinuz-dev
  6. Force-update netboot to a revision that has support for the *-dev menu options

Things left out:

  • automatic selection of the *.deb packages that were created by the build process
  • cleanup of previous kernels in the rootfs
  • modification of menu.ipxe – right now we rely on a branch in the netboot repository, which is not the best solution, because all modifications have to go through the repository

My Xen rootfs looks like that:
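The exact contents depend on your Debian release; as a sketch, the skeleton of such an NFS-exported rootfs is just the standard FHS layout (the directories below are created here purely for illustration):

```shell
# Sketch only: skeleton of a Debian dom0 rootfs as exported over NFS;
# the directory set is the standard FHS layout, not our exact tree.
ROOTFS=$(mktemp -d)
for d in bin boot dev etc home lib proc root sys usr var; do
  mkdir -p "$ROOTFS/$d"
done
ls "$ROOTFS"
```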

Running the above code with a command similar to:
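The playbook name, inventory and variable names below are assumptions, so treat this as a template rather than the verbatim command:

```shell
# Hypothetical invocation – playbook, inventory and variable names are
# assumptions; adjust them to match your playbook.
DEPLOY_CMD='ansible-playbook -i hosts.ini deploy-kernel.yml \
  --extra-vars "debs=linux-image-4.14.50_amd64.deb bzimage=arch/x86/boot/bzImage"'
echo "$DEPLOY_CMD"
```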

Run Xen Linux dev with RTE

Internally we developed extensive infrastructure that can leverage various features of RTE, for example:

  • reserve the device under test so no one else will intercept test execution – this is great in a shared environment
  • check the hardware configuration to see whether it makes sense to run a given test
  • automatically support all OSes exposed by pxe-server

To verify our new kernel we would like to use the last feature. The simplest dev.robot may look like this:

```
*** Settings ***
Library           SSHLibrary    timeout=20 seconds
Library           Telnet        timeout=20 seconds
Library           Process
Library           OperatingSystem
Library           String
Library           RequestsLibrary
Library           Collections
```

  • we use rtectrl-rest-api/rtectrl-rest-api.robot – this library provides control over GPIOs; it is quite easy to implement your own if you have an RTE, since it is just interaction with sysfs
  • Run iPXE shell, iPXE menu and iPXE boot entry come from our iPXE library, which just parses the PC Engines apu2 serial output; it works only with pxe-server and the firmware released by 3mdeb at pcengines.github.io.
  • Serial root login Linux is just a login prompt wrapper that takes the password as a parameter

Also, to run the above test you need a modified Robot Framework, which you can find here. If you are interested in RTE usage, please feel free to contact us. With an RTE you can achieve the same goal using various other methods (without our RF scripts). We plan to provide some working examples of RTE and Robot Framework during our workshop session at the Open Source Firmware Conference.
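Beyond the Settings section shown earlier, a minimal test case built on those keywords could be drafted like this. The test body is a hypothetical sketch; only the keyword and library names come from the description above:

```shell
# Generate a minimal, hypothetical dev.robot skeleton. Keyword names follow
# the libraries described above; the boot entry name and password are assumptions.
cat > dev.robot <<'EOF'
*** Settings ***
Library           SSHLibrary    timeout=20 seconds
Library           Telnet        timeout=20 seconds
Resource          rtectrl-rest-api/rtectrl-rest-api.robot

*** Test Cases ***
Boot Xen Linux dev over iPXE
    Run iPXE boot entry    Xen Linux dev
    Serial root login Linux    password
EOF
cat dev.robot
```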

What does an RTE-supported development workflow look like?

Typically you work on your kernel modification and want to run it on hardware, so you point the above Ansible playbook at pxe-server to deploy the code. You may ask:

why use some external pxe-server and not just install everything locally? The local approach implies a couple of problems:

  • the target hardware has to be connected to your local network
  • every time you reboot your computer you have some additional steps to finish the setup
  • you can start the container automatically, but it still consumes resources on your local machine which you may want for other purposes (e.g. compilation)

RTE is first about remote access and second about automation. Of course, RTE and pxe-server should always be behind a VPN.

Getting back to the workflow, it may look like this:

  • build a custom kernel as described above – time highly depends on your hardware
  • deploy the kernel to pxe-server – time: 1min15s
  • run a test – e.g. booting Xen Linux dev over iPXE – RTE time: 1min40s
  • rebuild firmware – assuming you use pce-fw-builder – RTE time: ~5min
  • firmware flashing and verification – RTE time: 2min (depends on how many SPI blocks are different)
  • firmware flashing without verification – RTE time: 27s

Please note that:

  • rebuilding firmware is not just building coreboot, but putting together all components (memtest, SeaBIOS, sortbootorder, iPXE) to make sure we didn't mess something up

  • pce-fw-builder performs distclean every time; we plan to change that so it can optionally reuse cached repositories – please track this issue
  • verification means booting over iPXE to the OS and checking if the flashed version is the same as the version exposed by the provided binary

Then you can run dev.robot to see what the boot log looks like. In my case, mentioned at the beginning, I initially wanted to get better logs from the kernel to continue the investigation of the repeating:

```
AMD-Vi: Command buffer timeout
```

Summary

We strongly believe in automation in firmware and embedded systems development. We think there is not enough validation in the coming IoT era. Security requires reproducibility and validation. Because of that we try to automate our workflows; this is time consuming, but leaves us with automation that can always be helpful in streamlining everyday work. We never know how many iterations a given debugging session will take, so why not automate it? Or even better, why not try Test-Driven Bug Fixing? If you think we can help in validating your firmware, or you are looking for someone who can boost your product by leveraging advanced features of the hardware platform you use, feel free to drop us an email at