Friday, 27 May 2022

Linux for ARM on QEMU (virt) - System Information

Linux for ARM

Processor designs from Arm Ltd. are used in a plethora of microprocessors and SoC (System on a Chip) components, which power a wide range of devices including smartphones, tablets, PDAs, network routers, NAS systems, set-top boxes, etc. Some (non-exhaustive) lists of devices using the ARM architecture can be found in:

The initial port of the Linux kernel to ARM began back in 1994, at that time targeting an Acorn A5000 running RISC OS, and grew from there to become part of the mainline Linux kernel.

The use of ARM cores in microprocessors, microcontrollers and SoC devices by many different vendors meant that supporting Linux on a device would often require a kernel built specifically for that device. This limited the availability of general-purpose Linux distributions, while making vendor-specific embedded Linux kernels common. Fortunately, since most user-space applications use the kernel abstractions to access devices, the same user-space can be used with any kernel built for the same flavor of the architecture (see ArmPorts - Debian Wiki).

Support for a selection of ARM based systems ('arm') appeared in the Debian GNU/Linux 2.2 ('potato') release in 2000. The current Debian Linux 11 (bullseye) supports ARM through the 'armel', 'armhf' and 'arm64' (aka 'aarch64') ports.

QEMU

The diversity of devices using the ARM processor means the QEMU system emulators for ARM provide a large number of emulated systems, with the QEMU 5.2.0 build I'm using listing 90 systems for the 64-bit system emulator (qemu-system-aarch64), and 84 for the 32-bit system emulator (qemu-system-arm). While most systems appear in the lists for both emulators (suggesting they could be implemented with 32-bit or 64-bit processor cores) a small set of systems are 64-bit only.
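The counts above can be reproduced from the emulators' own machine lists. This is a minimal sketch, assuming both system emulators are installed and that '-machine help' prints one header line followed by one machine per line (aliases included). The helper name count_machines is made up for illustration.

```shell
# Hypothetical helper: count the machine entries in '-machine help' output
# read from stdin. Assumes the first line is a header and skips blank lines.
count_machines() {
    awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }'
}

# Usage (needs the QEMU system emulators on the PATH):
#   qemu-system-aarch64 -machine help | count_machines
#   qemu-system-arm     -machine help | count_machines
```

The exact numbers depend on the QEMU version and build options, so they may not match the 90 and 84 quoted for QEMU 5.2.0.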

While most of the available systems correspond to physical hardware, the "QEMU ARM Virtual Machine" system ('virt') is a virtual system based on the use of paravirtualized devices. This provides performance improvements, particularly for I/O, and is useful for software development and testing where specific hardware features are not required.

Emulation Command

Our 'virt' system was run with the QEMU command:

$ qemu-system-arm \
    -machine virt \
    -m 1024M \
    -drive if=none,file=hda_debian10_virt.qcow2,format=qcow2,id=hd \
    -device virtio-blk-device,drive=hd \
    -netdev 'user,guestfwd=:10.0.2.1:22-cmd:netcat 127.0.0.1 22,hostfwd=::2222-:22,id=mynet' \
    -device virtio-net-device,netdev=mynet \
    -kernel live-vmlinuz \
    -initrd live-initrd.img \
    -append 'root=/dev/vda2' \
    -no-reboot \
    -name 'Debian Linux 10 (buster) for armhf on QEMU (virt)'

By default the console is on the serial port (use ctrl-alt-2 to switch to the serial port, or the "View" menu if using the GUI). The storage and networking are specified as 'virtio' devices. The '-kernel' and '-initrd' parameters are used to boot the Linux kernel directly without having to worry about system firmware. The network forwarding rules provide host/guest ssh access.
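With the 'hostfwd=::2222-:22' rule in place, connections to TCP port 2222 on the host are passed on to port 22 in the guest. A minimal sketch of a wrapper, assuming an SSH server is running in the guest; the function name guest_ssh and the account name in the usage line are placeholders:

```shell
# Hypothetical helper: run ssh against the guest through the forwarded port.
# Assumes QEMU was started on this host with hostfwd=::2222-:22.
# $1 is the guest account name; any further arguments are passed to ssh.
guest_ssh() {
    user=$1
    shift
    ssh -p 2222 "$user@localhost" "$@"
}

# Usage (placeholder account name):
#   guest_ssh someuser uname -m
```

In the other direction, the 'guestfwd' rule means that an ssh connection made inside the guest to 10.0.2.1 port 22 is handed to netcat on the host, which relays it to the host's own port 22.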

System Information

So let's see what Linux has to say about the system...

uname & lsb_release

Operating system release and version information:

$ uname -a
Linux deb-virt 4.19.0-17-armmp-lpae #1 SMP Debian 4.19.194-1 (2021-06-10) armv7l GNU/Linux

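So a "Linux" kernel, on a node named "deb-virt", kernel release "4.19.0-17-armmp-lpae" (Debian's ABI name for its 4.19 series kernel, here the 'armmp-lpae' flavour), version "#1 SMP Debian 4.19.194-1 (2021-06-10)" (the package version, showing it is built from upstream 4.19.194), machine type "armv7l" for operating system "GNU/Linux".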
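Each field of the 'uname -a' line can also be requested on its own, which makes the breakdown easier to follow. A small sketch; the function name uname_fields is mine, and the '-o' option is a GNU extension that some other uname implementations may not have (in which case that value is left empty):

```shell
# Print the main fields of 'uname -a' one per line, with the option that
# selects each of them.
uname_fields() {
    for pair in 's:kernel name' 'n:node name' 'r:kernel release' \
                'v:kernel version' 'm:machine' 'o:operating system'; do
        opt=${pair%%:*}      # text before the first ':' (the option letter)
        label=${pair#*:}     # text after the first ':' (the description)
        printf '%-17s (-%s): %s\n' "$label" "$opt" "$(uname "-$opt" 2>/dev/null)"
    done
}

uname_fields
```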

Distribution information from Linux Standard Base (LSB):

$ lsb_release -a
No LSB modules are available.
Distributor ID:	Debian
Description:	Debian GNU/Linux 10 (buster)
Release:	10
Codename:	buster

Looking at the Debian release information files:

$ cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 10 (buster)"
NAME="Debian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
$ cat /etc/debian_version
10.10

Confirmed as a Debian Linux 10 (buster) distribution.

/proc/cpuinfo & lscpu

Processor information:

$ lscpu
Architecture:        armv7l
Byte Order:          Little Endian
CPU(s):              1
On-line CPU(s) list: 0
Thread(s) per core:  1
Core(s) per socket:  1
Socket(s):           1
Vendor ID:           ARM
Model:               1
Model name:          Cortex-A15
Stepping:            r2p1
BogoMIPS:            125.00
Flags:               half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm
$ cat /proc/cpuinfo
processor	: 0
model name	: ARMv7 Processor rev 1 (v7l)
BogoMIPS	: 125.00
Features	: half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm 
CPU implementer	: 0x41
CPU architecture: 7
CPU variant	: 0x2
CPU part	: 0xc0f
CPU revision	: 1

Hardware	: Generic DT based system
Revision	: 0000
Serial		: 0000000000000000

While the processor is a Cortex-A15 (Wikipedia), it shows only a single core rather than the more usual 2 or 4 cores. This is due to the QEMU default for the number of CPUs being one, and can be overridden with the '-smp' option.
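The implementer, variant, part and revision values in '/proc/cpuinfo' are fields of the processor's Main ID Register, which the kernel prints in brackets in its boot messages (412fc0f1 for this system). A minimal sketch of the decoding, assuming the ARMv7 field layout; the function name decode_midr is made up for illustration:

```shell
# Hypothetical helper: split a 32-bit MIDR value into its fields.
# Assumed ARMv7 layout: implementer [31:24], variant [23:20],
# architecture [19:16], part number [15:4], revision [3:0].
decode_midr() {
    midr=$(( $1 ))
    printf 'implementer=0x%02x variant=0x%x architecture=0x%x part=0x%03x revision=%d\n' \
        $(( (midr >> 24) & 0xff )) \
        $(( (midr >> 20) & 0xf )) \
        $(( (midr >> 16) & 0xf )) \
        $(( (midr >> 4) & 0xfff )) \
        $(( midr & 0xf ))
}

decode_midr 0x412fc0f1
# prints: implementer=0x41 variant=0x2 architecture=0xf part=0xc0f revision=1
```

Implementer 0x41 is ARM and part 0xc0f is the Cortex-A15, while variant 2 and revision 1 give the "r2p1" stepping that 'lscpu' reports.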

/proc/meminfo

Memory information:

$ cat /proc/meminfo
MemTotal:        1021640 kB
MemFree:          841148 kB
MemAvailable:     900876 kB
Buffers:           13044 kB
Cached:           110268 kB
SwapCached:            0 kB
Active:            81756 kB
Inactive:          54012 kB
Active(anon):      12524 kB
Inactive(anon):     1352 kB
Active(file):      69232 kB
Inactive(file):    52660 kB
Unevictable:           0 kB
Mlocked:               0 kB
HighTotal:        262144 kB
HighFree:         134604 kB
LowTotal:         759496 kB
LowFree:          706544 kB
SwapTotal:        997372 kB
SwapFree:         997372 kB
Dirty:                 4 kB
Writeback:             0 kB
AnonPages:         12428 kB
Mapped:            13288 kB
Shmem:              1444 kB
Slab:              23380 kB
SReclaimable:      14820 kB
SUnreclaim:         8560 kB
KernelStack:         504 kB
PageTables:          712 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     1508192 kB
Committed_AS:      80592 kB
VmallocTotal:     245760 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
Percpu:              128 kB
AnonHugePages:      2048 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
CmaTotal:          16384 kB
CmaFree:           12260 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB

So 1.0 GiB RAM (of which about 998 MiB shows up as MemTotal), with about 974 MiB of swap.
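The "kB" values in '/proc/meminfo' are units of 1024 bytes, so converting a field to MiB is a division by 1024. A small sketch; the helper name meminfo_mib is mine, and it reads the meminfo text from stdin so it can be tried on saved output as well:

```shell
# Hypothetical helper: print one /proc/meminfo field in MiB.
# $1 is the field name without the trailing colon, e.g. MemTotal.
meminfo_mib() {
    awk -v key="$1:" '$1 == key { printf "%.1f MiB\n", $2 / 1024 }'
}

# Values taken from the output above:
printf 'MemTotal:        1021640 kB\n' | meminfo_mib MemTotal    # prints 997.7 MiB
printf 'SwapTotal:        997372 kB\n' | meminfo_mib SwapTotal   # prints 974.0 MiB
```

On a live system the same helper can be fed directly, for example with `meminfo_mib MemTotal < /proc/meminfo`.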

lspci and lsusb

Details of installed PCI and USB devices:

# lspci -v
# lsusb

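In this particular case there are no installed PCI or USB devices (the system uses virtio devices instead) so these commands don't report anything. The 'dmesg' output further down also shows the generic PCIe host bridge failing to probe ("ECAM ioremap failed"), which would leave 'lspci' with nothing to list on this boot. However the 'virt' machine can have PCI and USB devices installed if desired.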

report-hw

Report hardware information using commands:

$ report-hw 
uname -a: Linux deb-virt 4.19.0-17-armmp-lpae #1 SMP Debian 4.19.194-1 (2021-06-10) armv7l GNU/Linux
lsmod: Module                  Size  Used by
lsmod: evdev                  24576  1
lsmod: ip_tables              24576  0
lsmod: x_tables               24576  1 ip_tables
lsmod: autofs4                40960  2
lsmod: ext4                  618496  2
lsmod: crc16                  16384  1 ext4
lsmod: mbcache                16384  1 ext4
lsmod: jbd2                  102400  1 ext4
lsmod: crc32c_generic         16384  3
lsmod: fscrypto               28672  1 ext4
lsmod: ecb                    16384  0
lsmod: virtio_net             45056  0
lsmod: net_failover           20480  1 virtio_net
lsmod: virtio_blk             20480  4
lsmod: failover               16384  1 net_failover
lsmod: virtio_mmio            20480  0
lsmod: virtio_ring            24576  3 virtio_blk,virtio_net,virtio_mmio
lsmod: virtio                 16384  3 virtio_blk,virtio_net,virtio_mmio
df: Filesystem     1K-blocks   Used Available Use% Mounted on
df: udev              491544      0    491544   0% /dev
df: tmpfs             102164   1440    100724   2% /run
df: /dev/vda2        6715744 978756   5376132  16% /
df: tmpfs             510820      0    510820   0% /dev/shm
df: tmpfs               5120      0      5120   0% /run/lock
df: tmpfs             510820      0    510820   0% /sys/fs/cgroup
df: /dev/vda1         482922  30205    427783   7% /boot
df: tmpfs             102164      0    102164   0% /run/user/1000
free:               total        used        free      shared  buff/cache   available
free: Mem:        1021640       44976      676632        1440      300032      894452
free: Swap:        997372           0      997372
/proc/cmdline: root=/dev/vda2
/proc/cpuinfo: processor	: 0
/proc/cpuinfo: model name	: ARMv7 Processor rev 1 (v7l)
/proc/cpuinfo: BogoMIPS	: 125.00
/proc/cpuinfo: Features	: half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm 
/proc/cpuinfo: CPU implementer	: 0x41
/proc/cpuinfo: CPU architecture: 7
/proc/cpuinfo: CPU variant	: 0x2
/proc/cpuinfo: CPU part	: 0xc0f
/proc/cpuinfo: CPU revision	: 1
/proc/cpuinfo: 
/proc/cpuinfo: Hardware	: Generic DT based system
/proc/cpuinfo: Revision	: 0000
/proc/cpuinfo: Serial		: 0000000000000000
/proc/iomem: 00000000-00000000 : pl011@9000000
/proc/iomem:   00000000-00000000 : pl011@9000000
/proc/iomem: 00000000-00000000 : pl031@9010000
/proc/iomem:   00000000-00000000 : rtc-pl031
/proc/iomem: 00000000-00000000 : pl061@9030000
/proc/iomem:   00000000-00000000 : pl061@9030000
/proc/iomem: 00000000-00000000 : a003c00.virtio_mmio
/proc/iomem: 00000000-00000000 : a003e00.virtio_mmio
/proc/iomem: 00000000-00000000 : System RAM
/proc/iomem:   00000000-00000000 : Kernel code
/proc/iomem:   00000000-00000000 : Kernel data
/proc/interrupts:            CPU0       
/proc/interrupts:  18:     114697     GIC-0  27 Level     arch_timer
/proc/interrupts:  50:       3127     GIC-0  78 Edge      virtio0
/proc/interrupts:  51:      18988     GIC-0  79 Edge      virtio1
/proc/interrupts:  53:          0     GIC-0  34 Level     rtc-pl031
/proc/interrupts:  54:          0     GIC-0  33 Level     uart-pl011
/proc/interrupts:  55:          0  9030000.pl061   3 Edge      GPIO Key Poweroff
/proc/interrupts: IPI0:          0  CPU wakeup interrupts
/proc/interrupts: IPI1:          0  Timer broadcast interrupts
/proc/interrupts: IPI2:          0  Rescheduling interrupts
/proc/interrupts: IPI3:          0  Function call interrupts
/proc/interrupts: IPI4:          0  CPU stop interrupts
/proc/interrupts: IPI5:          0  IRQ work interrupts
/proc/interrupts: IPI6:          0  completion interrupts
/proc/interrupts: Err:          0
/proc/meminfo: MemTotal:        1021640 kB
/proc/meminfo: MemFree:          676764 kB
/proc/meminfo: MemAvailable:     894604 kB
/proc/meminfo: Buffers:           23384 kB
/proc/meminfo: Cached:           254244 kB
/proc/meminfo: SwapCached:            0 kB
/proc/meminfo: Active:           107632 kB
/proc/meminfo: Inactive:         182748 kB
/proc/meminfo: Active(anon):      12840 kB
/proc/meminfo: Inactive(anon):     1348 kB
/proc/meminfo: Active(file):      94792 kB
/proc/meminfo: Inactive(file):   181400 kB
/proc/meminfo: Unevictable:           0 kB
/proc/meminfo: Mlocked:               0 kB
/proc/meminfo: HighTotal:        262144 kB
/proc/meminfo: HighFree:          90984 kB
/proc/meminfo: LowTotal:         759496 kB
/proc/meminfo: LowFree:          585780 kB
/proc/meminfo: SwapTotal:        997372 kB
/proc/meminfo: SwapFree:         997372 kB
/proc/meminfo: Dirty:                 4 kB
/proc/meminfo: Writeback:             0 kB
/proc/meminfo: AnonPages:         12756 kB
/proc/meminfo: Mapped:            13164 kB
/proc/meminfo: Shmem:              1440 kB
/proc/meminfo: Slab:              32760 kB
/proc/meminfo: SReclaimable:      22416 kB
/proc/meminfo: SUnreclaim:        10344 kB
/proc/meminfo: KernelStack:         504 kB
/proc/meminfo: PageTables:          736 kB
/proc/meminfo: NFS_Unstable:          0 kB
/proc/meminfo: Bounce:                0 kB
/proc/meminfo: WritebackTmp:          0 kB
/proc/meminfo: CommitLimit:     1508192 kB
/proc/meminfo: Committed_AS:      45448 kB
/proc/meminfo: VmallocTotal:     245760 kB
/proc/meminfo: VmallocUsed:           0 kB
/proc/meminfo: VmallocChunk:          0 kB
/proc/meminfo: Percpu:              152 kB
/proc/meminfo: AnonHugePages:      2048 kB
/proc/meminfo: ShmemHugePages:        0 kB
/proc/meminfo: ShmemPmdMapped:        0 kB
/proc/meminfo: CmaTotal:          16384 kB
/proc/meminfo: CmaFree:            7032 kB
/proc/meminfo: HugePages_Total:       0
/proc/meminfo: HugePages_Free:        0
/proc/meminfo: HugePages_Rsvd:        0
/proc/meminfo: HugePages_Surp:        0
/proc/meminfo: Hugepagesize:       2048 kB
/proc/meminfo: Hugetlb:               0 kB
/proc/bus/input/devices: I: Bus=0019 Vendor=0001 Product=0001 Version=0100
/proc/bus/input/devices: N: Name="gpio-keys"
/proc/bus/input/devices: P: Phys=gpio-keys/input0
/proc/bus/input/devices: S: Sysfs=/devices/platform/gpio-keys/input/input0
/proc/bus/input/devices: U: Uniq=
/proc/bus/input/devices: H: Handlers=kbd event0 
/proc/bus/input/devices: B: PROP=0
/proc/bus/input/devices: B: EV=3
/proc/bus/input/devices: B: KEY=100000 0 0 0
/proc/bus/input/devices: 

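Since this machine is mostly based on virtio devices, the main hints about the devices are from the loaded kernel modules. The '/proc/iomem' ranges all read as 00000000-00000000 because 'report-hw' was run as an unprivileged user, and the kernel hides the real addresses unless the file is read as root.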
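Another place to look for the virtio devices is sysfs, where each one has a directory with a 'device' file holding its virtio device ID. A minimal sketch, assuming the usual '/sys/bus/virtio/devices' layout; the function name list_virtio is made up, and only the two IDs relevant to this system (1 for a network card, 2 for a block device) are translated:

```shell
# Hypothetical helper: list virtio devices and their device IDs.
# $1 optionally overrides the sysfs directory to scan.
list_virtio() {
    root=${1:-/sys/bus/virtio/devices}
    for dev in "$root"/virtio*; do
        # Skip the unexpanded pattern when no virtio devices are present.
        [ -r "$dev/device" ] || continue
        id=$(cat "$dev/device")
        case $id in
            0x0001) kind="network card" ;;
            0x0002) kind="block device" ;;
            *)      kind="other" ;;
        esac
        printf '%s: device id %s (%s)\n' "${dev##*/}" "$id" "$kind"
    done
}

list_virtio
```

On this system I would expect it to list virtio0 as the network card and virtio1 as the block device, matching the virtio_net and virtio_blk modules above.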

lshw

Report on system hardware:

# lshw
deb-virt                    
    description: ARMv7 Processor rev 1 (v7l)
    width: 32 bits
  *-core
       description: Motherboard
       physical id: 0
     *-cpu
          description: CPU
          product: cpu
          physical id: 0
          bus info: cpu@0
          capabilities: half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm
     *-memory
          description: System memory
          physical id: 1
          size: 997MiB
     *-virtio0
          description: Ethernet interface
          physical id: 2
          bus info: virtio@0
          logical name: eth0
          serial: 52:54:00:12:34:56
          capabilities: ethernet physical
          configuration: autonegotiation=off broadcast=yes driver=virtio_net driverversion=1.0.0 ip=10.0.2.15 link=yes multicast=yes
     *-virtio1
          description: Virtual I/O device
          physical id: 3
          bus info: virtio@1
          logical name: /dev/vda
          size: 8GiB (8589MB)
          capabilities: partitioned partitioned:dos
          configuration: driver=virtio_blk logicalsectorsize=512 sectorsize=512 signature=baa19ca7
        *-volume:0
             description: Linux filesystem partition
             vendor: Linux
             physical id: 1
             bus info: virtio@1,1
             logical name: /dev/vda1
             logical name: /boot
             version: 1.0
             serial: e8a52b96-9cbb-473a-8e57-ca46f5beb36e
             size: 487MiB
             capacity: 487MiB
             capabilities: primary bootable extended_attributes large_files ext2 initialized
             configuration: filesystem=ext2 lastmountpoint=/ modified=2021-06-21 15:37:12 mount.fstype=ext2 mount.options=rw,relatime mounted=2021-06-21 15:37:10 state=mounted
        *-volume:1
             description: EXT4 volume
             vendor: Linux
             physical id: 2
             bus info: virtio@1,2
             logical name: /dev/vda2
             logical name: /
             version: 1.0
             serial: c431a7b7-8678-4326-9c99-9208aa9d221b
             size: 6728MiB
             capacity: 6728MiB
             capabilities: primary journaled extended_attributes large_files huge_files dir_nlink recover 64bit extents ext4 ext2 initialized
             configuration: created=2021-06-21 12:43:37 filesystem=ext4 lastmountpoint=/ modified=2021-06-21 15:37:43 mount.fstype=ext4 mount.options=rw,relatime,errors=remount-ro mounted=2021-06-21 15:37:54 state=mounted
        *-volume:2
             description: Extended partition
             physical id: 3
             bus info: virtio@1,3
             logical name: /dev/vda3
             size: 974MiB
             capacity: 974MiB
             capabilities: primary extended partitioned partitioned:extended
           *-logicalvolume
                description: Linux swap volume
                physical id: 5
                logical name: /dev/vda5
                version: 1
                serial: ea76383f-0df9-4da0-86a0-c454504fa5c5
                size: 974MiB
                capacity: 974MiB
                capabilities: nofs swap initialized
                configuration: filesystem=swap pagesize=4096

The approach taken by 'lshw' does a good job of picking up the virtio devices and providing a bit of information about them.

dmesg

System messages:

# dmesg
[    0.000000] Booting Linux on physical CPU 0x0
[    0.000000] Linux version 4.19.0-17-armmp-lpae (debian-kernel@lists.debian.org) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP Debian 4.19.194-1 (2021-06-10)
[    0.000000] CPU: ARMv7 Processor [412fc0f1] revision 1 (ARMv7), cr=30c5387d
[    0.000000] CPU: div instructions available: patching division code
[    0.000000] CPU: PIPT / VIPT nonaliasing data cache, PIPT instruction cache
[    0.000000] OF: fdt: Machine model: linux,dummy-virt
[    0.000000] Memory policy: Data cache writealloc
[    0.000000] efi: Getting EFI parameters from FDT:
[    0.000000] efi: UEFI not found.
[    0.000000] cma: Reserved 16 MiB at 0x000000007f000000
[    0.000000] On node 0 totalpages: 262144
[    0.000000]   DMA zone: 1728 pages used for memmap
[    0.000000]   DMA zone: 0 pages reserved
[    0.000000]   DMA zone: 196608 pages, LIFO batch:63
[    0.000000]   HighMem zone: 65536 pages, LIFO batch:15
[    0.000000] psci: probing for conduit method from DT.
[    0.000000] psci: PSCIv0.2 detected in firmware.
[    0.000000] psci: Using standard PSCI v0.2 function IDs
[    0.000000] psci: Trusted OS migration not required
[    0.000000] random: get_random_bytes called from start_kernel+0x9c/0x52c with crng_init=0
[    0.000000] percpu: Embedded 17 pages/cpu s40460 r8192 d20980 u69632
[    0.000000] pcpu-alloc: s40460 r8192 d20980 u69632 alloc=17*4096
[    0.000000] pcpu-alloc: [0] 0 
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 260416
[    0.000000] Kernel command line: root=/dev/vda2
[    0.000000] Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
[    0.000000] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
[    0.000000] Memory: 983088K/1048576K available (10240K kernel code, 1136K rwdata, 2640K rodata, 2048K init, 319K bss, 49104K reserved, 16384K cma-reserved, 245760K highmem)
[    0.000000] Virtual kernel memory layout:
                   vector  : 0xffff0000 - 0xffff1000   (   4 kB)
                   fixmap  : 0xffc00000 - 0xfff00000   (3072 kB)
                   vmalloc : 0xf0800000 - 0xff800000   ( 240 MB)
                   lowmem  : 0xc0000000 - 0xf0000000   ( 768 MB)
                   pkmap   : 0xbfe00000 - 0xc0000000   (   2 MB)
                   modules : 0xbf000000 - 0xbfe00000   (  14 MB)
                     .text : 0x(ptrval) - 0x(ptrval)   (12256 kB)
                     .init : 0x(ptrval) - 0x(ptrval)   (2048 kB)
                     .data : 0x(ptrval) - 0x(ptrval)   (1137 kB)
                      .bss : 0x(ptrval) - 0x(ptrval)   ( 320 kB)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[    0.000000] ftrace: allocating 33878 entries in 100 pages
[    0.000000] rcu: Hierarchical RCU implementation.
[    0.000000] rcu: 	RCU restricting CPUs from NR_CPUS=8 to nr_cpu_ids=1.
[    0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
[    0.000000] NR_IRQS: 16, nr_irqs: 16, preallocated irqs: 16
[    0.000000] GICv2m: range[mem 0x08020000-0x08020fff], SPI[80:143]
[    0.000000] arch_timer: cp15 timer(s) running at 62.50MHz (virt).
[    0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x1cd42e208c, max_idle_ns: 881590405314 ns
[    0.000310] sched_clock: 56 bits at 62MHz, resolution 16ns, wraps every 4398046511096ns
[    0.000610] Switching to timer-based delay loop, resolution 16ns
[    0.013952] Console: colour dummy device 80x30
[    0.017228] console [tty0] enabled
[    0.020224] Calibrating delay loop (skipped), value calculated using timer frequency.. 125.00 BogoMIPS (lpj=250000)
[    0.020617] pid_max: default: 32768 minimum: 301
[    0.023237] Security Framework initialized
[    0.023513] Yama: disabled by default; enable with sysctl kernel.yama.*
[    0.029414] AppArmor: AppArmor initialized
[    0.031051] Mount-cache hash table entries: 2048 (order: 1, 8192 bytes)
[    0.031226] Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes)
[    0.063210] CPU: Testing write buffer coherency: ok
[    0.066335] CPU0: Spectre v2: firmware did not set auxiliary control register IBE bit, system vulnerable
[    0.091535] /cpus/cpu@0 missing clock-frequency property
[    0.092076] CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
[    0.105087] Setting up static identity map for 0x40400000 - 0x404000a0
[    0.109599] rcu: Hierarchical SRCU implementation.
[    0.123175] EFI services will not be available.
[    0.127936] smp: Bringing up secondary CPUs ...
[    0.128125] smp: Brought up 1 node, 1 CPU
[    0.128280] SMP: Total of 1 processors activated (125.00 BogoMIPS).
[    0.128447] CPU: All CPU(s) started in SVC mode.
[    0.150130] devtmpfs: initialized
[    0.172709] VFP support v0.3: implementor 41 architecture 4 part 30 variant f rev 0
[    0.215612] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[    0.217325] futex hash table entries: 256 (order: 2, 16384 bytes)
[    0.242761] pinctrl core: initialized pinctrl subsystem
[    0.264268] DMI not present or invalid.
[    0.282301] NET: Registered protocol family 16
[    0.308507] DMA: preallocated 256 KiB pool for atomic coherent allocations
[    0.311179] audit: initializing netlink subsys (disabled)
[    0.326825] audit: type=2000 audit(0.256:1): state=initialized audit_enabled=0 res=1
[    0.333292] No ATAGs?
[    0.347263] hw-breakpoint: found 5 (+1 reserved) breakpoint and 4 watchpoint registers.
[    0.347783] hw-breakpoint: maximum watchpoint size is 8 bytes.
[    0.354175] Serial: AMBA PL011 UART driver
[    0.402540] 9000000.pl011: ttyAMA0 at MMIO 0x9000000 (irq = 54, base_baud = 0) is a PL011 rev1
[    0.409453] console [ttyAMA0] enabled
[    0.466939] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
[    0.487892] vgaarb: loaded
[    0.492973] media: Linux media interface: v0.10
[    0.493361] videodev: Linux video capture interface: v2.00
[    0.493998] pps_core: LinuxPPS API ver. 1 registered
[    0.494137] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
[    0.494436] PTP clock support registered
[    0.524612] clocksource: Switched to clocksource arch_sys_counter
[    0.820729] VFS: Disk quotas dquot_6.6.0
[    0.821736] VFS: Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
[    0.831274] AppArmor: AppArmor Filesystem Enabled
[    0.872644] NET: Registered protocol family 2
[    0.887376] tcp_listen_portaddr_hash hash table entries: 512 (order: 0, 6144 bytes)
[    0.888210] TCP established hash table entries: 8192 (order: 3, 32768 bytes)
[    0.888637] TCP bind hash table entries: 8192 (order: 4, 65536 bytes)
[    0.889268] TCP: Hash tables configured (established 8192 bind 8192)
[    0.891389] UDP hash table entries: 512 (order: 2, 16384 bytes)
[    0.891850] UDP-Lite hash table entries: 512 (order: 2, 16384 bytes)
[    0.895884] NET: Registered protocol family 1
[    0.896943] NET: Registered protocol family 44
[    0.897298] PCI: CLS 0 bytes, default 64
[    0.907466] Unpacking initramfs...
[    4.169145] Freeing initrd memory: 20120K
[    4.169876] kvm [1]: HYP mode not available
[    4.180099] Initialise system trusted keyrings
[    4.184120] Key type blacklist registered
[    4.185554] workingset: timestamp_bits=14 max_order=18 bucket_order=4
[    4.212610] zbud: loaded
[   12.669135] Key type asymmetric registered
[   12.669512] Asymmetric key parser 'x509' registered
[   12.669969] bounce: pool size: 64 pages
[   12.670435] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 248)
[   12.672725] io scheduler noop registered
[   12.672931] io scheduler deadline registered
[   12.673869] io scheduler cfq registered (default)
[   12.674025] io scheduler mq-deadline registered
[   12.697603] pl061_gpio 9030000.pl061: PL061 GPIO chip @0x0000000009030000 registered
[   12.705054] pci-host-generic 4010000000.pcie: host bridge /pcie@10000000 ranges:
[   12.706226] pci-host-generic 4010000000.pcie:    IO 0x3eff0000..0x3effffff -> 0x00000000
[   12.707245] pci-host-generic 4010000000.pcie:   MEM 0x10000000..0x3efeffff -> 0x10000000
[   12.707451] pci-host-generic 4010000000.pcie:   MEM 0x8000000000..0xffffffffff -> 0x8000000000
[   12.715777] vmap allocation for size 1052672 failed: use vmalloc=<size> to increase size
[   12.716368] pci-host-generic 4010000000.pcie: ECAM ioremap failed
[   12.801081] pci-host-generic: probe of 4010000000.pcie failed with error -12
[   12.821734] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[   12.829543] Serial: AMBA driver
[   12.848173] libphy: Fixed MDIO Bus: probed
[   12.869160] mousedev: PS/2 mouse device common for all mice
[   12.879635] rtc-pl031 9010000.pl031: rtc core: registered pl031 as rtc0
[   12.891945] ledtrig-cpu: registered to indicate activity on CPUs
[   12.898750] NET: Registered protocol family 10
[   13.493137] Segment Routing with IPv6
[   13.494253] mip6: Mobile IPv6
[   13.494558] NET: Registered protocol family 17
[   13.495705] mpls_gso: MPLS GSO support
[   13.496845] ThumbEE CPU extension supported.
[   13.497077] Registering SWP/SWPB emulation handler
[   13.500863] registered taskstats version 1
[   13.501062] Loading compiled-in X.509 certificates
[   14.257325] Loaded X.509 cert 'Debian Secure Boot CA: 6ccece7e4c6c0d1f6149f3dd27dfcc5cbb419ea1'
[   14.258391] Loaded X.509 cert 'Debian Secure Boot Signer 2021 - linux: 4b6ef5abca669825178e052c84667ccbc0531f8c'
[   14.260934] zswap: loaded using pool lzo/zbud
[   14.263395] AppArmor: AppArmor sha1 policy hashing enabled
[   14.276770] input: gpio-keys as /devices/platform/gpio-keys/input/input0
[   14.280592] rtc-pl031 9010000.pl031: setting system clock to 2021-06-21 14:37:34 UTC (1624286254)
[   14.280826] sr_init: No PMIC hook to init smartreflex
[   14.293803] uart-pl011 9000000.pl011: no DMA platform data
[   14.467052] Freeing unused kernel memory: 2048K
[   14.478748] Run /init as init process
[   21.290454] virtio_blk virtio1: [vda] 16777216 512-byte logical blocks (8.59 GB/8.00 GiB)
[   21.326099]  vda: vda1 vda2 vda3 < vda5 >
[   22.452198] PM: Image not found (code -22)
[   24.154099] random: fast init done
[   24.184340] EXT4-fs (vda2): mounted filesystem with ordered data mode. Opts: (null)
[   26.220623] systemd[1]: Inserted module 'autofs4'
[   26.356649] systemd[1]: systemd 241 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN -PCRE2 default-hierarchy=hybrid)
[   26.360218] systemd[1]: Detected virtualization qemu.
[   26.363874] systemd[1]: Detected architecture arm.
[   26.441264] systemd[1]: Set hostname to <deb-virt>.
[   30.925919] random: systemd: uninitialized urandom read (16 bytes read)
[   30.978126] random: systemd: uninitialized urandom read (16 bytes read)
[   30.983044] systemd[1]: Started Forward Password Requests to Wall Directory Watch.
[   31.000374] random: systemd: uninitialized urandom read (16 bytes read)
[   31.013527] systemd[1]: Listening on Journal Socket (/dev/log).
[   31.024261] systemd[1]: Listening on Journal Socket.
[   31.121713] systemd[1]: Starting Load Kernel Modules...
[   31.203157] systemd[1]: Starting Set the console keyboard layout...
[   31.330925] systemd[1]: Starting Create list of required static device nodes for the current kernel...
[   31.369992] systemd[1]: Listening on Journal Audit Socket.
[   31.394703] systemd[1]: Started Dispatch Password Requests to Console Directory Watch.
[   31.409528] systemd[1]: Reached target Paths.
[   31.474747] systemd[1]: Created slice User and Session Slice.
[   31.494176] systemd[1]: Reached target Slices.
[   31.522608] systemd[1]: Listening on fsck to fsckd communication Socket.
[   31.581500] systemd[1]: Created slice system-getty.slice.
[   31.620261] systemd[1]: Created slice system-serial\x2dgetty.slice.
[   31.661822] systemd[1]: Listening on udev Kernel Socket.
[   31.714700] systemd[1]: Created slice system-systemd\x2dfsck.slice.
[   34.141708] EXT4-fs (vda2): re-mounted. Opts: errors=remount-ro
[   36.957671] systemd[1]: Started Journal Service.
[   38.783574] systemd-journald[157]: Received request to flush runtime journal from PID 1
[   50.157370] Adding 997372k swap on /dev/vda5.  Priority:-2 extents:1 across:997372k FS
[   51.882176] EXT4-fs (vda1): mounting ext2 file system using the ext4 subsystem
[   51.919406] EXT4-fs (vda1): mounted filesystem without journal. Opts: (null)
[   54.565974] audit: type=1400 audit(1624286294.780:2): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/bin/man" pid=216 comm="apparmor_parser"
[   54.568052] audit: type=1400 audit(1624286294.784:3): apparmor="STATUS" operation="profile_load" profile="unconfined" name="man_filter" pid=216 comm="apparmor_parser"
[   54.568337] audit: type=1400 audit(1624286294.784:4): apparmor="STATUS" operation="profile_load" profile="unconfined" name="man_groff" pid=216 comm="apparmor_parser"
[   54.752918] audit: type=1400 audit(1624286294.968:5): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=217 comm="apparmor_parser"
[   54.753422] audit: type=1400 audit(1624286294.972:6): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=217 comm="apparmor_parser"
[   63.633226] random: crng init done
[   63.633552] random: 7 urandom warning(s) missed due to ratelimiting

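Lots of stuff in here. A few highlights: the kernel sees 262144 pages (1 GiB) of memory, the architected timer runs at 62.50MHz, the generic PCIe host bridge fails to probe because a vmap allocation fails, and virtio_blk finds the 8 GiB disk with its three partitions.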

Benchmark

Since this is a machine targeted at providing 32-bit ARM based virtual machines, how well does it perform?

BogoMips

The BogoMips pseudo-benchmark is used by the Linux kernel to calibrate a wait loop. The value obtained at boot is reported by '/proc/cpuinfo', 'lscpu' and 'dmesg' (see above).

Calibrating delay loop (skipped), value calculated using timer frequency.. 125.00 BogoMIPS (lpj=250000)

This BogoMips result is derived from a timer rather than using the delay loop calibration, and doesn't tell us anything about the processor performance. So an alternative benchmark is required to gauge performance.
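The 125.00 figure can be reproduced from the 62.50MHz timer frequency in the 'dmesg' output. A worked sketch of the arithmetic, assuming the kernel tick rate HZ is 250 (an assumption about this Debian kernel's configuration, though it is consistent with the logged lpj=250000) and that the kernel reports BogoMIPS as lpj * HZ / 500000; the function names are mine:

```shell
# Loops per jiffy when the delay loop is driven by a timer:
# timer ticks per second divided by jiffies per second (HZ).
lpj_from_timer() {
    awk -v freq="$1" -v hz="$2" 'BEGIN { printf "%d\n", freq / hz }'
}

# BogoMIPS as printed by the kernel, derived from loops per jiffy.
bogomips_from_lpj() {
    awk -v lpj="$1" -v hz="$2" 'BEGIN { printf "%.2f\n", lpj * hz / 500000 }'
}

lpj_from_timer 62500000 250      # prints 250000
bogomips_from_lpj 250000 250     # prints 125.00
```

With these assumptions the result works out as twice the timer frequency in MHz, whatever the speed of the processor (real or emulated) behind it.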

OpenSSL

The OpenSSL cryptographic library comes with a command-line tool ('openssl') that exposes the library methods, and its 'speed' command runs a built-in speed test. Since I'm mostly interested in older systems I'm going to focus on the common RSA and MD5 methods.
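The summary figures that 'openssl speed' prints can be recomputed from the raw counts in its "Doing ..." lines. A small sketch using counts from the runs in this section; the function names are made up for illustration:

```shell
# Digest throughput in 1000s of bytes per second:
# operations * block size / elapsed seconds / 1000.
throughput_k() {
    awk -v n="$1" -v size="$2" -v secs="$3" \
        'BEGIN { printf "%.2fk\n", n * size / secs / 1000 }'
}

# RSA operations per second: operations / elapsed seconds.
ops_per_sec() {
    awk -v n="$1" -v secs="$2" 'BEGIN { printf "%.1f\n", n / secs }'
}

throughput_k 1353924 16 2.99     # md5, 16 byte blocks:    prints 7245.08k
throughput_k 29876 16384 3.00    # md5, 16384 byte blocks: prints 163162.79k
ops_per_sec 7005 10.00           # rsa 512 bits sign/s:    prints 700.5
ops_per_sec 64954 9.99           # rsa 512 bits verify/s:  prints 6501.9
```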

$ openssl speed md5
Doing md5 for 3s on 16 size blocks: 1353924 md5's in 2.99s
Doing md5 for 3s on 64 size blocks: 1168093 md5's in 3.00s
Doing md5 for 3s on 256 size blocks: 794642 md5's in 3.00s
Doing md5 for 3s on 1024 size blocks: 356521 md5's in 2.99s
Doing md5 for 3s on 8192 size blocks: 59039 md5's in 3.00s
Doing md5 for 3s on 16384 size blocks: 29876 md5's in 3.00s
OpenSSL 1.1.1d  10 Sep 2019
built on: Mon Mar 22 23:08:47 2021 UTC
options:bn(64,32) rc4(char) des(long) aes(partial) blowfish(ptr) 
compiler: gcc -fPIC -pthread -Wa,--noexecstack -Wall -Wa,--noexecstack -g -O2 -fdebug-prefix-map=/build/openssl-pp1hfQ/openssl-1.1.1d=. -fstack-protector-strong -Wformat -Werror=format-security -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG -Wdate-time -D_FORTIFY_SOURCE=2
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
md5               7245.08k    24919.32k    67809.45k   122099.50k   161215.83k   163162.79k
$ openssl speed rsa
Doing 512 bits private rsa's for 10s: 7005 512 bits private RSA's in 10.00s
Doing 512 bits public rsa's for 10s: 64954 512 bits public RSA's in 9.99s
Doing 1024 bits private rsa's for 10s: 1443 1024 bits private RSA's in 10.00s
Doing 1024 bits public rsa's for 10s: 31001 1024 bits public RSA's in 9.99s
Doing 2048 bits private rsa's for 10s: 289 2048 bits private RSA's in 10.02s
Doing 2048 bits public rsa's for 10s: 11082 2048 bits public RSA's in 10.00s
Doing 3072 bits private rsa's for 10s: 103 3072 bits private RSA's in 10.02s
Doing 3072 bits public rsa's for 10s: 5493 3072 bits public RSA's in 10.00s
Doing 4096 bits private rsa's for 10s: 49 4096 bits private RSA's in 10.04s
Doing 4096 bits public rsa's for 10s: 3232 4096 bits public RSA's in 9.99s
Doing 7680 bits private rsa's for 10s: 9 7680 bits private RSA's in 10.64s
Doing 7680 bits public rsa's for 10s: 980 7680 bits public RSA's in 10.01s
Doing 15360 bits private rsa's for 10s: 2 15360 bits private RSA's in 17.50s
Doing 15360 bits public rsa's for 10s: 254 15360 bits public RSA's in 10.03s
OpenSSL 1.1.1d  10 Sep 2019
built on: Mon Mar 22 23:08:47 2021 UTC
options:bn(64,32) rc4(char) des(long) aes(partial) blowfish(ptr) 
compiler: gcc -fPIC -pthread -Wa,--noexecstack -Wall -Wa,--noexecstack -g -O2 -fdebug-prefix-map=/build/openssl-pp1hfQ/openssl-1.1.1d=. -fstack-protector-strong -Wformat -Werror=format-security -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG -Wdate-time -D_FORTIFY_SOURCE=2
                  sign    verify    sign/s verify/s
rsa  512 bits 0.001428s 0.000154s    700.5   6501.9
rsa 1024 bits 0.006930s 0.000322s    144.3   3103.2
rsa 2048 bits 0.034671s 0.000902s     28.8   1108.2
rsa 3072 bits 0.097282s 0.001820s     10.3    549.3
rsa 4096 bits 0.204898s 0.003091s      4.9    323.5
rsa 7680 bits 1.182222s 0.010214s      0.8     97.9
rsa 15360 bits 8.750000s 0.039488s      0.1     25.3

Extracting the relevant figures for comparisons (see OpenSSL Speed Results):

  • OpenSSL speed MD5 8,192 bytes: 161,215.83k
  • OpenSSL speed RSA 4,096 bits sign/s: 4.9
  • OpenSSL speed RSA 4,096 bits verify/s: 323.5

What do these results tell us about the performance of the emulated system?
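
These summary figures can be recomputed from the raw counts in the 'Doing ...' lines of the output: throughput is count × block size / seconds (in 1000s of bytes), and the RSA rates are count / seconds. A small sketch of that arithmetic follows. The function names are my own, not part of OpenSSL.

```python
def throughput_k(count: int, block_bytes: int, seconds: float) -> float:
    """Thousands of bytes per second, as in the 'openssl speed' summary table."""
    return count * block_bytes / seconds / 1000.0

def ops_per_second(count: int, seconds: float) -> float:
    """Operations per second, as in the sign/s and verify/s columns."""
    return count / seconds

# Raw counts copied from the md5 and rsa runs above
md5_8k = throughput_k(59039, 8192, 3.00)   # md5, 8192 byte blocks
rsa_sign = ops_per_second(49, 10.04)       # 4096 bit private (sign)
rsa_verify = ops_per_second(3232, 9.99)    # 4096 bit public (verify)

print(f"md5 8192 bytes: {md5_8k:.2f}k")
print(f"rsa 4096 sign/s: {rsa_sign:.1f}")
print(f"rsa 4096 verify/s: {rsa_verify:.1f}")
```

The rounded values agree with the table rows for md5 and 'rsa 4096 bits'.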

Thoughts

The use of paravirtualized devices mostly shows performance improvement for I/O, which makes a significant difference when building software. That makes the use of the 'virt' machine attractive for handling software compiles or other I/O intensive operations. There is also additional flexibility since the emulated machine is not constrained by the limitations of the physical hardware, making the use of large memory and SMP machines an option for development and testing.

Further Sources


Supplemental: OpenSSL 1.1.1d Results

Debian Linux 10 (buster) for ARM ('armhf') provides a build of OpenSSL 1.1.1d:

$ openssl version -a
OpenSSL 1.1.1d  10 Sep 2019
built on: Mon Mar 22 23:08:47 2021 UTC
platform: debian-armhf
options:  bn(64,32) rc4(char) des(long) blowfish(ptr) 
compiler: gcc -fPIC -pthread -Wa,--noexecstack -Wall -Wa,--noexecstack -g -O2 -fdebug-prefix-map=/build/openssl-pp1hfQ/openssl-1.1.1d=. -fstack-protector-strong -Wformat -Werror=format-security -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG -Wdate-time -D_FORTIFY_SOURCE=2
OPENSSLDIR: "/usr/lib/ssl"
ENGINESDIR: "/usr/lib/arm-linux-gnueabihf/engines-1.1"
Seeding source: os-specific

The compile flags show that this build includes assembler implementations for various methods (the '-D*_ASM' flags).
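
To list those flags without reading the whole line by eye, the compiler string can be filtered for '-D' definitions containing '_ASM'. This is only an illustrative sketch: the string below is the '-D...' portion of the 'compiler:' line above, and the variable names are my own.

```python
# The -D portion of the 'compiler:' line from 'openssl version -a' above
COMPILER_FLAGS = (
    "-DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ "
    "-DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM "
    "-DSHA512_ASM -DKECCAK1600_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM "
    "-DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG -Wdate-time -D_FORTIFY_SOURCE=2"
)

# Keep macro definitions whose name contains '_ASM' (this also picks up
# OPENSSL_BN_ASM_MONT and OPENSSL_BN_ASM_GF2m, which do not end in '_ASM')
asm_flags = [
    flag[2:]
    for flag in COMPILER_FLAGS.split()
    if flag.startswith("-D") and "_ASM" in flag
]

for name in asm_flags:
    print(name)
print(len(asm_flags), "assembler-related definitions")
```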

OpenSSL 1.1.1d speed

For reference, a full run of the methods provided by OpenSSL on this QEMU system ('openssl speed') gives the following results:

OpenSSL 1.1.1d  10 Sep 2019
built on: Mon Mar 22 23:08:47 2021 UTC
options:bn(64,32) rc4(char) des(long) aes(partial) blowfish(ptr) 
compiler: gcc -fPIC -pthread -Wa,--noexecstack -Wall -Wa,--noexecstack -g -O2 -fdebug-prefix-map=/build/openssl-pp1hfQ/openssl-1.1.1d=. -fstack-protector-strong -Wformat -Werror=format-security -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG -Wdate-time -D_FORTIFY_SOURCE=2
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
md2                  0.00         0.00         0.00         0.00         0.00         0.00 
mdc2                 0.00         0.00         0.00         0.00         0.00         0.00 
md4               2022.36k     7899.95k    26851.93k    69816.32k   130902.70k   139597.14k
md5               7200.52k    25794.99k    68818.52k   123324.76k   163203.75k   163561.47k
hmac(md5)         2046.50k     7701.38k    26572.20k    72404.99k   144195.64k   153032.02k
sha1              5815.29k    16865.98k    38440.36k    62111.74k    74887.40k    74825.73k
rmd160            1582.09k     5660.25k    15072.04k    25933.85k    33785.03k    34295.70k
rc4              27083.19k    33698.10k    38254.99k    39255.50k    39921.62k    40099.70k
des cbc           6977.67k     7657.51k     7798.67k     7900.84k     7912.54k     7946.24k
des ede3          2524.98k     2712.38k     2740.74k     2755.21k     2766.17k     2681.51k
idea cbc             0.00         0.00         0.00         0.00         0.00         0.00 
seed cbc         11746.65k    13988.74k    14682.45k    14954.15k    14647.30k    14734.68k
rc2 cbc          11376.46k    13449.37k    14163.31k    14397.78k    14327.81k    14394.91k
rc5-32/12 cbc        0.00         0.00         0.00         0.00         0.00         0.00 
blowfish cbc     13669.49k    16554.71k    17353.05k    17363.29k    17679.93k    18219.67k
cast cbc         11018.32k    13716.52k    14944.85k    15305.39k    15305.39k    15400.96k
aes-128 cbc      14755.46k    18263.47k    19698.39k    20151.30k    20335.27k    20310.70k
aes-192 cbc      13592.21k    16617.82k    17430.35k    17790.29k    17866.23k    17934.73k
aes-256 cbc      12605.05k    14901.25k    15636.82k    15807.83k    15941.63k    15521.11k
camellia-128 cbc    12887.26k    15468.54k    16288.68k    16565.25k    16652.50k    16685.38k
camellia-192 cbc    10960.87k    12776.04k    13388.71k    13552.07k    13567.49k    13576.87k
camellia-256 cbc    10920.62k    12753.73k    13370.99k    13537.69k    13592.14k    13578.45k
sha256            4642.14k    13129.05k    28556.97k    40522.41k    46093.65k    46383.10k
sha512            3119.22k    12307.09k    23328.09k    37438.81k    45405.53k    46333.95k
whirlpool          486.99k     1011.37k     1632.77k     1944.58k     2069.85k     2068.41k
aes-128 ige      13847.30k    17011.39k    18064.95k    18294.44k    18330.97k    18317.31k
aes-192 ige      12449.25k    15320.59k    15992.66k    16213.33k    16121.86k    16231.08k
aes-256 ige      11396.25k    13600.58k    14330.20k    14554.11k    14472.53k    14543.53k
ghash            14472.61k    16817.75k    17603.13k    18055.85k    17853.10k    18093.40k
rand               249.23k      899.76k     2912.81k     6291.71k     9618.63k     9731.44k
                  sign    verify    sign/s verify/s
rsa  512 bits 0.001416s 0.000144s    706.1   6924.6
rsa 1024 bits 0.006638s 0.000318s    150.7   3146.5
rsa 2048 bits 0.034792s 0.000909s     28.7   1100.3
rsa 3072 bits 0.096990s 0.001887s     10.3    529.9
rsa 4096 bits 0.216170s 0.003241s      4.6    308.5
rsa 7680 bits 1.236667s 0.010707s      0.8     93.4
rsa 15360 bits 9.095000s 0.041577s      0.1     24.1
                  sign    verify    sign/s verify/s
dsa  512 bits 0.003025s 0.002209s    330.6    452.8
dsa 1024 bits 0.005071s 0.004170s    197.2    239.8
dsa 2048 bits 0.012364s 0.011050s     80.9     90.5
                              sign    verify    sign/s verify/s
 160 bits ecdsa (secp160r1)   0.0112s   0.0100s     89.1    100.0
 192 bits ecdsa (nistp192)   0.0149s   0.0124s     67.1     80.6
 224 bits ecdsa (nistp224)   0.0194s   0.0162s     51.6     61.8
 256 bits ecdsa (nistp256)   0.0015s   0.0041s    675.1    244.0
 384 bits ecdsa (nistp384)   0.0567s   0.0426s     17.6     23.5
 521 bits ecdsa (nistp521)   0.1235s   0.0878s      8.1     11.4
 163 bits ecdsa (nistk163)   0.0101s   0.0200s     98.8     50.1
 233 bits ecdsa (nistk233)   0.0175s   0.0352s     57.3     28.4
 283 bits ecdsa (nistk283)   0.0304s   0.0600s     32.9     16.7
 409 bits ecdsa (nistk409)   0.0651s   0.1283s     15.4      7.8
 571 bits ecdsa (nistk571)   0.1434s   0.2844s      7.0      3.5
 163 bits ecdsa (nistb163)   0.0107s   0.0210s     93.5     47.5
 233 bits ecdsa (nistb233)   0.0189s   0.0378s     52.9     26.4
 283 bits ecdsa (nistb283)   0.0329s   0.0649s     30.4     15.4
 409 bits ecdsa (nistb409)   0.0730s   0.1446s     13.7      6.9
 571 bits ecdsa (nistb571)   0.1627s   0.3219s      6.1      3.1
 256 bits ecdsa (brainpoolP256r1)   0.0184s   0.0175s     54.3     57.1
 256 bits ecdsa (brainpoolP256t1)   0.0184s   0.0164s     54.2     61.0
 384 bits ecdsa (brainpoolP384r1)   0.0574s   0.0455s     17.4     22.0
 384 bits ecdsa (brainpoolP384t1)   0.0567s   0.0429s     17.6     23.3
 512 bits ecdsa (brainpoolP512r1)   0.0805s   0.0649s     12.4     15.4
 512 bits ecdsa (brainpoolP512t1)   0.0806s   0.0585s     12.4     17.1
                              op      op/s
 160 bits ecdh (secp160r1)   0.0110s     91.0
 192 bits ecdh (nistp192)   0.0143s     69.9
 224 bits ecdh (nistp224)   0.0185s     54.0
 256 bits ecdh (nistp256)   0.0029s    342.5
 384 bits ecdh (nistp384)   0.0542s     18.4
 521 bits ecdh (nistp521)   0.1187s      8.4
 163 bits ecdh (nistk163)   0.0094s    106.0
 233 bits ecdh (nistk233)   0.0170s     58.8
 283 bits ecdh (nistk283)   0.0294s     34.0
 409 bits ecdh (nistk409)   0.0635s     15.8
 571 bits ecdh (nistk571)   0.1387s      7.2
 163 bits ecdh (nistb163)   0.0101s     99.3
 233 bits ecdh (nistb233)   0.0184s     54.5
 283 bits ecdh (nistb283)   0.0315s     31.8
 409 bits ecdh (nistb409)   0.0709s     14.1
 571 bits ecdh (nistb571)   0.1608s      6.2
 256 bits ecdh (brainpoolP256r1)   0.0180s     55.5
 256 bits ecdh (brainpoolP256t1)   0.0180s     55.4
 384 bits ecdh (brainpoolP384r1)   0.0558s     17.9
 384 bits ecdh (brainpoolP384t1)   0.0548s     18.2
 512 bits ecdh (brainpoolP512r1)   0.0776s     12.9
 512 bits ecdh (brainpoolP512t1)   0.0769s     13.0
 253 bits ecdh (X25519)   0.0049s    204.1
 448 bits ecdh (X448)   0.0171s     58.4
                              sign    verify    sign/s verify/s
 253 bits EdDSA (Ed25519)   0.0018s   0.0056s    545.5    177.1
 456 bits EdDSA (Ed448)   0.0079s   0.0183s    126.8     54.5

This version of OpenSSL supports accessing the Linux kernel cryptography implementations (see Linux Kernel Crypto API) via the 'afalg' engine:

$ openssl engine afalg -c
(afalg) AFALG engine support
 [AES-128-CBC, AES-192-CBC, AES-256-CBC]

The engine only supports the AES-CBC methods, so first a baseline run with the engine specified:

$ openssl speed -engine afalg aes-256-cbc
engine "afalg" set.
Doing aes-256 cbc for 3s on 16 size blocks: 2231421 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 64 size blocks: 640874 aes-256 cbc's in 2.99s
Doing aes-256 cbc for 3s on 256 size blocks: 176351 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 1024 size blocks: 45237 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 8192 size blocks: 5053 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 16384 size blocks: 2670 aes-256 cbc's in 3.00s
OpenSSL 1.1.1d  10 Sep 2019
built on: Mon Mar 22 23:08:47 2021 UTC
options:bn(64,32) rc4(char) des(long) aes(partial) blowfish(ptr) 
compiler: gcc -fPIC -pthread -Wa,--noexecstack -Wall -Wa,--noexecstack -g -O2 -fdebug-prefix-map=/build/openssl-pp1hfQ/openssl-1.1.1d=. -fstack-protector-strong -Wformat -Werror=format-security -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG -Wdate-time -D_FORTIFY_SOURCE=2
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256 cbc      11900.91k    13717.70k    15048.62k    15440.90k    13798.06k    14581.76k

In this version of OpenSSL the engine specific method is only called when the '-evp' option is used:

$ openssl speed -engine afalg -evp  aes-256-cbc
engine "afalg" set.
Doing aes-256-cbc for 3s on 16 size blocks: 8234 aes-256-cbc's in 0.34s
Doing aes-256-cbc for 3s on 64 size blocks: 9833 aes-256-cbc's in 0.34s
Doing aes-256-cbc for 3s on 256 size blocks: 9103 aes-256-cbc's in 0.36s
Doing aes-256-cbc for 3s on 1024 size blocks: 7141 aes-256-cbc's in 0.26s
Doing aes-256-cbc for 3s on 8192 size blocks: 2147 aes-256-cbc's in 0.08s
Doing aes-256-cbc for 3s on 16384 size blocks: 1171 aes-256-cbc's in 0.04s
OpenSSL 1.1.1d  10 Sep 2019
built on: Mon Mar 22 23:08:47 2021 UTC
options:bn(64,32) rc4(char) des(long) aes(partial) blowfish(ptr) 
compiler: gcc -fPIC -pthread -Wa,--noexecstack -Wall -Wa,--noexecstack -g -O2 -fdebug-prefix-map=/build/openssl-pp1hfQ/openssl-1.1.1d=. -fstack-protector-strong -Wformat -Werror=format-security -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG -Wdate-time -D_FORTIFY_SOURCE=2
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-cbc        387.48k     1850.92k     6473.24k    28124.55k   219852.80k   479641.60k

Here the reported throughput for 8 KB blocks rises from about 13.8 MB/s to about 219.9 MB/s.

The kernel method implementation can use hardware acceleration (if available) and processor specific features, which often significantly improves performance. Typically this manifests at larger block sizes, with the smaller block sizes seeing poor performance when the kernel methods are used.
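
The timing lines deserve a second look. The '-evp' run reports intervals of 0.04s to 0.36s for a nominal 3s test, which suggests the divisor is CPU user time and not wall-clock time, so time spent inside the kernel is not counted. The sketch below recomputes the 8 KB and 16 KB figures both ways. It assumes (this is an assumption, not a measurement) that each test really ran for its nominal 3 seconds of wall-clock time.

```python
def throughput_k(count: int, block_bytes: int, seconds: float) -> float:
    """Thousands of bytes per second, as in the 'openssl speed' summary table."""
    return count * block_bytes / seconds / 1000.0

NOMINAL_SECONDS = 3.0  # ASSUMED wall-clock duration of each test

# (block size, count, reported seconds) copied from the '-evp' run above
evp_runs = [(8192, 2147, 0.08), (16384, 1171, 0.04)]

for block, count, reported in evp_runs:
    as_reported = throughput_k(count, block, reported)
    wall_clock = throughput_k(count, block, NOMINAL_SECONDS)
    print(f"{block:>6} bytes: reported {as_reported:.2f}k, "
          f"wall-clock estimate {wall_clock:.2f}k")
```

Under that assumption the engine path would come out slower than the baseline on this emulated system. Re-running the test with the '-elapsed' option, which makes 'openssl speed' use wall-clock time, would settle the question.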

