Saturday, 14 May 2022

Notes on OpenSSL Performance

OpenSSL

OpenSSL can use three different implementations of the cryptographic methods, providing different performance tiers:

  1. Portable C-based methods: most portable but typically the slowest.
  2. Processor family assembler-based methods: faster but less portable, and may have problems on some compatible implementations of a processor family.
  3. Methods utilizing hardware acceleration: typically the fastest option, but with specific hardware requirements.

A simple compile-time option ('no-asm') disables the assembler implementations used by default, so options 1 and 2 are simple to test. For option 3, appropriate hardware is required. Fortunately, three of the hardware acceleration options supported by OpenSSL are available to us:

  1. VIA PadLock (Wikipedia): as implemented in the VIA C3 Nehemiah processors (Wikipedia)
  2. Intel Advanced Encryption Standard New Instructions (AES-NI; Wikipedia): implemented on Intel x86_64 since 2010 and AMD x86_64 processors since 2011
  3. Intel SHA extensions (SHA Ext.; Wikipedia): implemented on Intel x86_64 processors from 2016 and AMD Ryzen from 2017
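
Whether a given Linux system offers any of these can usually be read from the CPU flags. The following is a minimal sketch rather than a definitive check: crypto_flags is a helper name invented here, and it assumes the flag names Linux normally reports in /proc/cpuinfo (aes for AES-NI, sha_ni for the SHA extensions, ace_en and rng_en for an enabled PadLock ACE and RNG).

```shell
# crypto_flags FLAGS: report which crypto-related CPU flags appear in a
# space-separated flags string (hypothetical helper; flag names are assumed
# to match what Linux prints in /proc/cpuinfo).
crypto_flags() {
    for flag in aes sha_ni ace_en rng_en; do
        case " $1 " in
            *" $flag "*) echo "$flag: yes" ;;
            *)           echo "$flag: no" ;;
        esac
    done
}

# Example with a made-up flags string:
crypto_flags "fpu sse2 aes sha_ni"

# On a live Linux system (not run here):
# crypto_flags "$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)"
```

A "no" only means the flag was not listed; it does not say whether OpenSSL was built with the matching engine or assembler support.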

To illustrate the effect on performance, a set of methods has been selected that shows the effect of each type of implementation.

Using the results for 8,192 byte blocks:

VIA Luke @ 1.0 GHz

OpenSSL 1.1.0l on Debian Linux 11 for x86
Method                 AES-256 CBC        IDEA CBC             MD5           SHA-1         SHA-256         SHA-512
no-asm                   6,916.78k       9,684.21k      80,191.49k      36,410.71k      12,064.09k       1,832.28k
asm                     11,122.01k       9,661.10k     106,332.16k      45,978.97k      21,534.04k       9,393.49k
Kernel [4,5]            11,127.92k       9,655.64k     104,336.04k      45,566.63k      21,515.64k       9,374.38k
Padlock [2]            520,596.10k       9,662.71k     104,357.28k      45,472.50k      21,460.31k       9,376.82k

OpenSSL 3.0.2 on Debian Linux 11 for x86
Method                 AES-256 CBC    IDEA [1] CBC             MD5           SHA-1         SHA-256         SHA-512
no-asm                   6,916.47k       9,662.71k      78,665.19k      36,050.26k      12,125.81k       1,829.55k
asm                     11,127.47k       9,655.64k     103,931.19k      45,509.29k      21,493.73k       9,368.92k
Kernel [4]           2,033,909.76k       9,627.65k     104,328.02k      45,512.02k      21,442.15k       9,366.19k
Padlock [2]            515,844.78k       9,658.37k     103,765.33k      45,520.21k      21,435.73k       9,368.92k

AMD Ryzen 5 3600

OpenSSL 1.1.0l with Debian Linux 11 for x86_64
Method                 AES-256 CBC        IDEA CBC             MD5           SHA-1         SHA-256         SHA-512
no-asm                 212,366.68k     119,010.65k     749,469.70k     845,922.30k     307,301.03k     553,937.58k
asm                    232,721.07k     119,545.86k     793,971.37k   1,050,842.45k     470,701.40k     604,972.40k
AES-NI & SHA Ext.    1,086,559.57k     119,250.94k     790,495.23k   1,027,295.91k     470,671.36k     605,863.94k
Kernel [4,5]         1,092,332.20k     120,416.94k     787,464.19k   1,029,693.44k     475,791.36k     601,159.00k

OpenSSL 3.0.2 with Debian Linux 11 for x86_64
Method                 AES-256 CBC    IDEA [1] CBC             MD5           SHA-1         SHA-256         SHA-512
no-asm                 210,668.20k     117,587.97k     740,698.79k     832,288.09k     310,059.01k     559,205.03k
asm [3]              1,097,910.95k     119,048.87k     789,848.06k   1,036,997.97k     473,503.06k     599,274.84k
Kernel [4]          37,086,822.40k     120,220.33k     793,990.49k   1,041,334.27k     474,685.44k     598,278.14k

Bracketed numbers refer to the notes below.

Notes:

  1. In OpenSSL 3.0.2, access to the IDEA method requires the legacy provider (to use it without installing: $ LD_LIBRARY_PATH=`pwd` apps/openssl speed -provider-path ./providers/ -provider legacy -provider default idea).
  2. The OpenSSL PadLock engine only supports AES on our VIA Luke (C3 Nehemiah) based system; more recent versions of the VIA PadLock hardware provide additional methods, including SHA.
  3. On systems that support AES-NI and/or SHA Ext., the standard assembler implementations in OpenSSL 3.0.2 detect and use the instruction set extensions to accelerate the methods.
  4. The OpenSSL 'afalg' engine (used for "Kernel") uses the Linux Kernel Crypto API (AF_ALG) to access the methods in the Linux kernel, which can make use of hardware acceleration and processor features beyond those used by the standard assembler implementations in OpenSSL.
  5. The OpenSSL 1.1.0 implementation of the 'afalg' engine only supports use of the kernel methods for AES-128-CBC.
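
To put the table figures in proportion, it can help to express one row as a multiple of another. The sketch below assumes a POSIX shell with awk; speedup is a helper name invented for this example, and it simply strips the thousands separators and the trailing 'k' before dividing.

```shell
# speedup BASE ACCEL: ratio of two 'openssl speed' figures written in the
# table style, e.g. "11,122.01k" (hypothetical helper, not part of OpenSSL).
speedup() {
    awk -v base="$1" -v accel="$2" 'BEGIN {
        gsub(/[,k]/, "", base)      # drop thousands separators and the k suffix
        gsub(/[,k]/, "", accel)
        printf "%.1f\n", accel / base
    }'
}

# VIA Luke, OpenSSL 1.1.0l, AES-256 CBC: asm vs Padlock
speedup "11,122.01k" "520,596.10k"   # prints 46.8

# VIA Luke, OpenSSL 1.1.0l, SHA-512: no-asm vs asm
speedup "1,832.28k" "9,393.49k"      # prints 5.1
```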

Additional Notes & Raw Results

A. VIA Luke running Debian Linux 11, OpenSSL 1.1.0l

The second generation of VIA's Corefusion (Wikipedia) x86 processor, VIA Luke, features a VIA C3 Nehemiah core with the VIA PadLock (Wikipedia) cryptographic accelerator. The PadLock implementation in Luke provides a hardware Random Number Generator (RNG) and the Advanced Cryptography Engine (ACE) supporting AES (Advanced Encryption Standard; Wikipedia).

OpenSSL implemented a 'padlock' engine to access the ACE acceleration of AES back in 2005 with the release of OpenSSL 0.9.8. The VIA C7 enhanced ACE, adding SHA (Secure Hash Algorithms) and PMM (PadLock Montgomery Multiplier), and support was added in OpenSSL in 2006. Some Zhaoxin processors feature PadLock with SM3 (Wikipedia) and SM4 (SM4) added.

VIA Luke @ 1.0 GHz
OpenSSL 1.1.0l on Debian Linux 11 for x86
Method                 AES-256 CBC        IDEA CBC             MD5           SHA-1         SHA-256         SHA-512
no-asm                   6,916.78k       9,684.21k      80,191.49k      36,410.71k      12,064.09k       1,832.28k
asm                     11,122.01k       9,661.10k     106,332.16k      45,978.97k      21,534.04k       9,393.49k
Kernel                  11,127.92k       9,655.64k     104,336.04k      45,566.63k      21,515.64k       9,374.38k
Padlock                520,596.10k       9,662.71k     104,357.28k      45,472.50k      21,460.31k       9,376.82k

Notes:

  • During the build tests (make test) the 'fuzz' test fails due to a missing update (see repo/gentoo.git - Official Gentoo ebuild repository for a patch that fixes the issue).
  • A runtime issue with the Elliptic Curve methods means some ECDH methods fail to give meaningful results, so the benchmark builds disable these methods (no-ec).
  • The 'afalg' engine in OpenSSL 1.1.0l only supports AES-128 CBC, so no improvement is seen in our selected comparison methods. In a test run, 'afalg' increases the reported throughput for AES-128 CBC from ~15 MB/s to ~2.7 GB/s on this system.

A.1. VIA Luke, OpenSSL 1.1.0l, C methods only (no-asm)

Downloading the source distribution from the OpenSSL site and building with the no-assembler option ($ ./config no-asm no-ec) gives a build using the portable C implementations of the methods.

Running the OpenSSL speed test gives ($ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed):

OpenSSL 1.1.0l  10 Sep 2019
built on: reproducible build, date unspecified
options:bn(64,32) rc4(int) des(long) aes(partial) idea(int) blowfish(ptr) 
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\"" 
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
md2                  0.00         0.00         0.00         0.00         0.00         0.00 
mdc2              1216.39k     1349.70k     1389.96k     1400.27k     1403.56k     1403.56k
md4               6043.15k    19801.11k    51304.53k    85481.47k   106300.82k   108276.39k
md5              11623.79k    29433.43k    56385.29k    73045.33k    80191.49k    80820.40k
hmac(md5)         4272.43k    13891.37k    36658.69k    61732.47k    77897.97k    79107.41k
sha1              6812.00k    15794.66k    27587.24k    33936.04k    36410.71k    36616.59k
rmd160            3816.28k    10080.21k    20185.69k    26923.35k    29886.96k    30086.49k
rc4              30881.65k    34105.09k    34865.40k    35200.68k    35291.14k    35039.91k
des cbc           8691.31k     9165.61k     9332.22k     9374.04k     9390.56k     9360.73k
des ede3          3266.32k     3333.37k     3353.60k     3359.40k     3361.45k     3354.03k
idea cbc          8881.84k     9454.71k     9608.36k     9648.13k     9684.21k     9628.33k
seed cbc         12766.92k    13690.90k    13939.80k    14007.30k    14032.90k    13959.39k
rc2 cbc           6628.12k     6978.54k     7073.79k     7098.37k     7085.12k     7094.27k
rc5-32/12 cbc        0.00         0.00         0.00         0.00         0.00         0.00 
blowfish cbc     14833.34k    16211.05k    16626.90k    16685.40k    16722.60k    16635.22k
cast cbc         11473.94k    12236.09k    12494.85k    12571.15k    12520.11k    12539.22k
aes-128 cbc       8837.10k     9250.20k     9391.10k     9461.14k     9437.18k     9404.42k
aes-192 cbc       7549.75k     7851.81k     7950.34k     7975.59k     7984.47k     7934.46k
aes-256 cbc       6590.26k     6816.68k     6893.06k     6919.26k     6916.78k     6897.66k
camellia-128 cbc    13046.99k    14029.55k    14296.15k    14382.05k    14398.81k    14314.15k
camellia-192 cbc    10317.05k    10891.34k    11078.14k    11128.83k    11143.85k    11097.95k
camellia-256 cbc    10315.41k    10888.94k    11080.02k    11129.30k    11130.20k    11091.97k
sha256            2720.35k     5750.36k     9502.46k    11372.93k    12064.09k    12118.70k
sha512             228.66k      914.00k     1232.04k     1649.66k     1832.28k     1841.15k
whirlpool         1152.68k     2356.39k     3841.72k     4559.87k     4825.09k     4849.66k
aes-128 ige       8656.05k     9140.01k     9289.98k     9328.98k     9338.33k     9273.34k
aes-192 ige       7420.33k     7761.34k     7877.67k     7905.96k     7910.74k     7864.32k
aes-256 ige       6492.42k     6749.46k     6838.70k     6855.25k     6853.24k     6834.00k
ghash             4417.29k     4504.66k     4537.30k     4544.51k     4549.29k     4546.83k
                  sign    verify    sign/s verify/s
rsa  512 bits 0.003229s 0.000257s    309.7   3888.2
rsa 1024 bits 0.018160s 0.000814s     55.1   1228.5
rsa 2048 bits 0.113258s 0.002835s      8.8    352.7
rsa 3072 bits 0.339667s 0.006174s      2.9    162.0
rsa 4096 bits 0.744286s 0.010050s      1.3     99.5
rsa 7680 bits 4.436667s 0.035426s      0.2     28.2
rsa 15360 bits 33.060000s 0.137260s      0.0      7.3
                  sign    verify    sign/s verify/s
dsa  512 bits 0.004530s 0.003398s    220.7    294.3
dsa 1024 bits 0.011801s 0.010539s     84.7     94.9
dsa 2048 bits 0.037955s 0.034567s     26.3     28.9

These results give a performance baseline for this platform.
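
The summary tables use the 8192-byte column of this output. A small sketch for pulling that column out of a saved run follows; col_8192 is a name made up here, and it assumes the six block-size columns printed by the OpenSSL 1.1.0l output above (a build that prints a different set of sizes would need the column offset adjusted).

```shell
# col_8192: read 'openssl speed' output on stdin and print
# "<type>: <8192-byte figure>" for each throughput row.
# Assumes six size columns, with 8192 bytes second from last.
col_8192() {
    awk '$NF ~ /^[0-9.]+k$/ && NF >= 7 {
        name = $1
        for (i = 2; i <= NF - 6; i++) name = name " " $i   # names may contain spaces
        print name ": " $(NF - 1)
    }'
}

# Two rows copied from the run above:
col_8192 <<'EOF'
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256 cbc       6590.26k     6816.68k     6893.06k     6919.26k     6916.78k     6897.66k
sha512             228.66k      914.00k     1232.04k     1649.66k     1832.28k     1841.15k
EOF
```

Rows without a 'k' suffix, such as the disabled md2 and rc5 entries, are skipped by the pattern.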

A.2. VIA Luke, OpenSSL 1.1.0l, assembler methods

Downloading the source distribution from the OpenSSL site and building with the default options ($ ./config no-ec) gives a build with the assembler methods enabled.

Running the OpenSSL speed test gives ($ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed):

OpenSSL 1.1.0l  10 Sep 2019
built on: reproducible build, date unspecified
options:bn(64,32) rc4(4x,int) des(long) aes(partial) idea(int) blowfish(ptr) 
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\""  -Wa,--noexecstack
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
md2                  0.00         0.00         0.00         0.00         0.00         0.00 
mdc2              1231.24k     1376.94k     1418.75k     1429.82k     1433.60k     1435.65k
md4               5709.55k    19328.21k    51612.33k    88431.27k   112065.46k   114300.25k
md5              11881.41k    34809.17k    70811.75k    95242.24k   106332.16k   107266.05k
hmac(md5)         4605.06k    15539.84k    43113.64k    78534.31k   103164.22k   105600.34k
sha1              7529.77k    18275.74k    33563.32k    42401.79k    45978.97k    46293.05k
rmd160            3527.73k     9388.31k    18618.20k    24743.59k    27368.98k    27574.27k
rc4              35951.06k    42033.39k    45003.94k    45754.31k    45828.78k    45760.51k
des cbc          10149.31k    10655.74k    10806.87k    10843.14k    10858.52k    10857.13k
des ede3          3651.79k     3706.65k     3735.02k     3739.65k     3741.01k     3741.01k
idea cbc          8881.83k     9455.85k     9599.91k     9648.13k     9661.10k     9659.96k
seed cbc         12766.79k    13693.85k    13935.27k    13996.84k    14036.36k    14030.17k
rc2 cbc           6625.95k     6977.98k     7073.71k     7122.11k     7085.12k     7105.19k
rc5-32/12 cbc        0.00         0.00         0.00         0.00         0.00         0.00 
blowfish cbc     19068.84k    20448.28k    20788.66k    20916.06k    20930.56k    20938.75k
cast cbc         11475.89k    12226.88k    12492.97k    12563.46k    12587.64k    12582.91k
aes-128 cbc       6850.50k     7110.91k     7285.13k    15298.01k    15447.38k    15444.65k
aes-192 cbc       5687.63k     5958.40k     6061.99k    13012.99k    13093.55k    13090.71k
aes-256 cbc       4919.92k     5087.45k     5143.11k    11076.38k    11122.01k    11130.20k
camellia-128 cbc    11987.41k    15300.51k    16387.67k    16721.92k    16823.84k    16849.77k
camellia-192 cbc     9761.94k    11774.58k    12487.17k    12682.92k    12741.29k    12744.33k
camellia-256 cbc     9733.36k    11775.45k    12530.90k    12686.60k    12738.56k    12741.29k
sha256            3757.12k     8365.14k    15961.51k    19976.93k    21534.04k    21681.49k
sha512            1121.85k     4511.76k     6224.90k     8425.81k     9393.49k     9473.03k
whirlpool         2548.74k     5454.61k     9206.70k    11109.37k    11818.33k    11872.94k
aes-128 ige       6576.01k     6873.34k     6970.37k     6994.94k     6995.97k     7011.47k
aes-192 ige       5494.31k     5741.35k     5845.39k     5869.23k     5876.39k     5857.69k
aes-256 ige       4778.57k     4928.83k     4981.33k     4994.73k     4994.92k     5013.83k
ghash            14667.08k    20156.33k    22210.15k    22831.45k    23035.90k    23046.83k
                  sign    verify    sign/s verify/s
rsa  512 bits 0.002296s 0.000200s    435.6   5011.3
rsa 1024 bits 0.014245s 0.000698s     70.2   1432.7
rsa 2048 bits 0.098431s 0.002584s     10.2    386.9
rsa 3072 bits 0.303030s 0.005677s      3.3    176.2
rsa 4096 bits 0.685333s 0.009980s      1.5    100.2
rsa 7680 bits 4.226667s 0.034567s      0.2     28.9
rsa 15360 bits 32.490000s 0.141408s      0.0      7.1
                  sign    verify    sign/s verify/s
dsa  512 bits 0.003438s 0.002715s    290.9    368.3
dsa 1024 bits 0.009980s 0.008943s    100.2    111.8
dsa 2048 bits 0.034704s 0.032226s     28.8     31.0

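The performance gains from the assembler implementations are evident for most methods, and the results indicate which methods have assembler implementations: idea, seed and rc2, for example, are unchanged. Not every figure improves, however: the aes cbc methods are slower for blocks of 256 bytes or less, and the aes ige and rmd160 figures are lower at every block size.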

A.3. VIA Luke, OpenSSL 1.1.0l, use kernel methods (AF_ALG)

The default compile (see A.2.) includes the 'afalg' engine, which provides access to the Linux Kernel Crypto API (AF_ALG) method implementations. Details of the available kernel crypto methods can be found in /proc/crypto:

$ cat /proc/crypto | grep '^name'
name         : cbc(aes)
name         : ecb(aes)
name         : aes
name         : crc32c
name         : crct10dif
name         : pkcs1pad(rsa,sha256)
name         : hmac(sha256)
name         : hmac(sha1)
name         : lzo-rle
name         : lzo-rle
name         : lzo
name         : lzo
name         : zlib-deflate
name         : deflate
name         : deflate
name         : sha224
name         : sha256
name         : sha1
name         : md5
name         : ecb(cipher_null)
name         : digest_null
name         : compress_null
name         : cipher_null
name         : rsa
name         : dh
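
The names alone do not show which implementation backs each method. Each /proc/crypto entry also carries driver and priority fields, so a short filter can list them side by side. This is a sketch only: crypto_drivers is an invented helper, and the sample entry fed to it is illustrative, not taken from this system.

```shell
# crypto_drivers: read /proc/crypto style text on stdin and print
# "name driver priority" for each entry (hypothetical helper).
crypto_drivers() {
    awk -F' *: *' '
        $1 == "name"     { name = $2 }
        $1 == "driver"   { driver = $2 }
        $1 == "priority" { print name, driver, $2 }
    '
}

# Illustrative entry (driver name and priority are made up):
crypto_drivers <<'EOF'
name         : cbc(aes)
driver       : cbc-aes-example
module       : kernel
priority     : 100
EOF

# On a live Linux system (not run here):
# crypto_drivers < /proc/crypto
```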

However the 'afalg' engine only supports a subset:

$ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl engine afalg -c
(afalg) AFALG engine support
 [AES-128-CBC]

So this implementation only supports the kernel methods for AES-128 CBC:

$ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -engine afalg -evp aes-128-cbc
engine "afalg" set.
Doing aes-128-cbc for 3s on 16 size blocks: 89213 aes-128-cbc's in 0.25s
Doing aes-128-cbc for 3s on 64 size blocks: 89304 aes-128-cbc's in 0.27s
Doing aes-128-cbc for 3s on 256 size blocks: 84505 aes-128-cbc's in 0.23s
Doing aes-128-cbc for 3s on 1024 size blocks: 83619 aes-128-cbc's in 0.24s
Doing aes-128-cbc for 3s on 8192 size blocks: 53771 aes-128-cbc's in 0.16s
Doing aes-128-cbc for 3s on 16384 size blocks: 33297 aes-128-cbc's in 0.11s
OpenSSL 1.1.0l  10 Sep 2019
built on: reproducible build, date unspecified
options:bn(64,32) rc4(4x,int) des(long) aes(partial) idea(int) blowfish(ptr) 
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\""  -Wa,--noexecstack
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-cbc       5709.63k    21168.36k    94057.74k   356774.40k  2753075.20k  4959436.80k

In this version of OpenSSL the -evp option is required to use the accelerated implementation; without it the normal version of the method is used:

$ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -engine afalg aes-128-cbc
engine "afalg" set.
Doing aes-128 cbc for 3s on 16 size blocks: 1276875 aes-128 cbc's in 2.99s
Doing aes-128 cbc for 3s on 64 size blocks: 333337 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 256 size blocks: 84598 aes-128 cbc's in 2.98s
Doing aes-128 cbc for 3s on 1024 size blocks: 44947 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 8192 size blocks: 5658 aes-128 cbc's in 2.99s
Doing aes-128 cbc for 3s on 16384 size blocks: 2828 aes-128 cbc's in 3.00s
OpenSSL 1.1.0l  10 Sep 2019
built on: reproducible build, date unspecified
options:bn(64,32) rc4(4x,int) des(long) aes(partial) idea(int) blowfish(ptr) 
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\""  -Wa,--noexecstack
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128 cbc       6832.78k     7111.19k     7267.48k    15341.91k    15501.78k    15444.65k

So the reported difference for AES-128 CBC is very significant: ~15 MB/s normally and ~2.7 GB/s with the kernel method.
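
The reported figures follow directly from the 'Doing ...' lines: operations multiplied by block size, divided by the time shown. The sketch below reproduces that arithmetic (speed_k is a name invented here; awk is assumed). One caveat, offered as a likely explanation rather than something measured in this post: by default openssl speed divides by CPU user time, not wall-clock time (its -elapsed option selects the latter), and with 'afalg' most of the work happens in the kernel, which would account for only 0.16 s being charged to a nominal 3 s run. The second call therefore gives a rough wall-clock estimate for the same operation count.

```shell
# speed_k COUNT BLOCK_BYTES SECONDS: throughput in 1000s of bytes per second,
# formatted like the 'openssl speed' summary row (hypothetical helper).
speed_k() {
    awk -v n="$1" -v size="$2" -v secs="$3" \
        'BEGIN { printf "%.2fk\n", n * size / secs / 1000 }'
}

# afalg run above: 53771 operations on 8192-byte blocks "in 0.16s"
speed_k 53771 8192 0.16   # prints 2753075.20k, matching the reported figure

# Same operation count spread over the nominal 3 s run (an estimate only)
speed_k 53771 8192 3      # prints 146830.68k
```

On that estimate the kernel path would still be roughly ten times the ~15 MB/s of the assembler implementation for AES-128 CBC, but a run with -elapsed would be needed to confirm it.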

Sadly AES-128 CBC isn't one of our comparison methods, and the comparison methods themselves aren't accelerated by this engine. Still, let's collect figures for them:

$ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -engine afalg -evp aes-256-cbc
engine "afalg" set.
Doing aes-256-cbc for 3s on 16 size blocks: 855812 aes-256-cbc's in 2.96s
Doing aes-256-cbc for 3s on 64 size blocks: 234527 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 59972 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 32368 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 4048 aes-256-cbc's in 2.98s
Doing aes-256-cbc for 3s on 16384 size blocks: 2038 aes-256-cbc's in 3.00s
OpenSSL 1.1.0l  10 Sep 2019
built on: reproducible build, date unspecified
options:bn(64,32) rc4(4x,int) des(long) aes(partial) idea(int) blowfish(ptr) 
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\""  -Wa,--noexecstack
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-cbc       4626.01k     5003.24k     5117.61k    11048.28k    11127.92k    11130.20k
$ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -engine afalg -evp idea
engine "afalg" set.
Doing idea-cbc for 3s on 16 size blocks: 1513297 idea-cbc's in 3.00s
Doing idea-cbc for 3s on 64 size blocks: 431592 idea-cbc's in 3.00s
Doing idea-cbc for 3s on 256 size blocks: 111836 idea-cbc's in 3.00s
Doing idea-cbc for 3s on 1024 size blocks: 28037 idea-cbc's in 2.98s
Doing idea-cbc for 3s on 8192 size blocks: 3536 idea-cbc's in 3.00s
Doing idea-cbc for 3s on 16384 size blocks: 1769 idea-cbc's in 3.00s
OpenSSL 1.1.0l  10 Sep 2019
built on: reproducible build, date unspecified
options:bn(64,32) rc4(4x,int) des(long) aes(partial) idea(int) blowfish(ptr) 
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\""  -Wa,--noexecstack
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
idea-cbc          8070.92k     9207.30k     9543.34k     9634.19k     9655.64k     9661.10k
$ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -engine afalg -evp md5
engine "afalg" set.
Doing md5 for 3s on 16 size blocks: 1085845 md5's in 3.00s
Doing md5 for 3s on 64 size blocks: 895226 md5's in 2.98s
Doing md5 for 3s on 256 size blocks: 587697 md5's in 3.00s
Doing md5 for 3s on 1024 size blocks: 243445 md5's in 2.99s
Doing md5 for 3s on 8192 size blocks: 38209 md5's in 3.00s
Doing md5 for 3s on 16384 size blocks: 19331 md5's in 2.98s
OpenSSL 1.1.0l  10 Sep 2019
built on: reproducible build, date unspecified
options:bn(64,32) rc4(4x,int) des(long) aes(partial) idea(int) blowfish(ptr) 
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\""  -Wa,--noexecstack
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
md5               5791.17k    19226.33k    50150.14k    83373.81k   104336.04k   106281.58k
$ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -engine afalg -evp sha1
engine "afalg" set.
Doing sha1 for 3s on 16 size blocks: 806595 sha1's in 3.00s
Doing sha1 for 3s on 64 size blocks: 580960 sha1's in 2.98s
Doing sha1 for 3s on 256 size blocks: 323753 sha1's in 3.00s
Doing sha1 for 3s on 1024 size blocks: 115538 sha1's in 2.98s
Doing sha1 for 3s on 8192 size blocks: 16687 sha1's in 3.00s
Doing sha1 for 3s on 16384 size blocks: 8434 sha1's in 3.00s
OpenSSL 1.1.0l  10 Sep 2019
built on: reproducible build, date unspecified
options:bn(64,32) rc4(4x,int) des(long) aes(partial) idea(int) blowfish(ptr) 
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\""  -Wa,--noexecstack
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sha1              4301.84k    12476.99k    27626.92k    39701.65k    45566.63k    46060.89k
$ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -engine afalg -evp sha256
engine "afalg" set.
Doing sha256 for 3s on 16 size blocks: 510554 sha256's in 3.00s
Doing sha256 for 3s on 64 size blocks: 322074 sha256's in 3.00s
Doing sha256 for 3s on 256 size blocks: 168798 sha256's in 2.98s
Doing sha256 for 3s on 1024 size blocks: 56691 sha256's in 3.00s
Doing sha256 for 3s on 8192 size blocks: 7853 sha256's in 2.99s
Doing sha256 for 3s on 16384 size blocks: 3957 sha256's in 2.99s
OpenSSL 1.1.0l  10 Sep 2019
built on: reproducible build, date unspecified
options:bn(64,32) rc4(4x,int) des(long) aes(partial) idea(int) blowfish(ptr) 
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\""  -Wa,--noexecstack
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sha256            2722.95k     6870.91k    14500.77k    19350.53k    21515.64k    21682.77k
$ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -engine afalg -evp sha512
engine "afalg" set.
Doing sha512 for 3s on 16 size blocks: 187900 sha512's in 3.00s
Doing sha512 for 3s on 64 size blocks: 187783 sha512's in 3.00s
Doing sha512 for 3s on 256 size blocks: 69529 sha512's in 2.98s
Doing sha512 for 3s on 1024 size blocks: 24333 sha512's in 3.00s
Doing sha512 for 3s on 8192 size blocks: 3433 sha512's in 3.00s
Doing sha512 for 3s on 16384 size blocks: 1732 sha512's in 3.00s
OpenSSL 1.1.0l  10 Sep 2019
built on: reproducible build, date unspecified
options:bn(64,32) rc4(4x,int) des(long) aes(partial) idea(int) blowfish(ptr) 
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\""  -Wa,--noexecstack
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sha512            1002.13k     4006.04k     5972.96k     8305.66k     9374.38k     9459.03k

Sadly no improvement here.

A.4. VIA Luke, OpenSSL 1.1.0l, use VIA Padlock

The default compile (see A.2.) includes the 'padlock' engine, which provides access to the VIA PadLock (Wikipedia) method implementations utilizing the hardware acceleration in this processor:

$ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl engine padlock -c
(padlock) VIA PadLock (no-RNG, ACE)
 [AES-128-ECB, AES-128-CBC, AES-128-CFB, AES-128-OFB, AES-128-CTR, AES-192-ECB, AES-192-CBC, AES-192-CFB, AES-192-OFB, AES-192-CTR, AES-256-ECB, AES-256-CBC, AES-256-CFB, AES-256-OFB, AES-256-CTR]

The VIA C3 Nehemiah core in the VIA Luke processor only supports AES in its version of ACE. Unlike the 'afalg' engine, however, the 'padlock' engine supports more of the AES methods, including our selected comparison method:

$ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -engine padlock -evp aes-256-cbc
engine "padlock" set.
Doing aes-256-cbc for 3s on 16 size blocks: 7997043 aes-256-cbc's in 2.98s
Doing aes-256-cbc for 3s on 64 size blocks: 6521328 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 3643683 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 1325294 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 189377 aes-256-cbc's in 2.98s
Doing aes-256-cbc for 3s on 16384 size blocks: 96281 aes-256-cbc's in 3.00s
OpenSSL 1.1.0l  10 Sep 2019
built on: reproducible build, date unspecified
options:bn(64,32) rc4(4x,int) des(long) aes(partial) idea(int) blowfish(ptr) 
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\""  -Wa,--noexecstack
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-cbc      42937.14k   139121.66k   310927.62k   452367.02k   520596.10k   525822.63k

Note that the -evp option is required to use the accelerated implementation; without it the normal version of the method is used:

$ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -engine padlock aes-256-cbc
engine "padlock" set.
Doing aes-256 cbc for 3s on 16 size blocks: 922597 aes-256 cbc's in 2.99s
Doing aes-256 cbc for 3s on 64 size blocks: 238365 aes-256 cbc's in 2.99s
Doing aes-256 cbc for 3s on 256 size blocks: 58655 aes-256 cbc's in 2.93s
Doing aes-256 cbc for 3s on 1024 size blocks: 32214 aes-256 cbc's in 2.98s
Doing aes-256 cbc for 3s on 8192 size blocks: 4076 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 16384 size blocks: 2039 aes-256 cbc's in 3.00s
OpenSSL 1.1.0l  10 Sep 2019
built on: reproducible build, date unspecified
options:bn(64,32) rc4(4x,int) des(long) aes(partial) idea(int) blowfish(ptr) 
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\""  -Wa,--noexecstack
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256 cbc       4936.97k     5102.13k     5124.81k    11069.51k    11130.20k    11135.66k

This shows the difference the hardware makes: 520.6 MB/s with hardware and 11.1 MB/s without.

Doing this for each of the other methods in the comparison:

$ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -engine padlock -evp idea
engine "padlock" set.
Doing idea-cbc for 3s on 16 size blocks: 1513517 idea-cbc's in 3.00s
Doing idea-cbc for 3s on 64 size blocks: 428713 idea-cbc's in 2.98s
Doing idea-cbc for 3s on 256 size blocks: 111843 idea-cbc's in 3.00s
Doing idea-cbc for 3s on 1024 size blocks: 28216 idea-cbc's in 3.00s
Doing idea-cbc for 3s on 8192 size blocks: 3515 idea-cbc's in 2.98s
Doing idea-cbc for 3s on 16384 size blocks: 1758 idea-cbc's in 2.99s
OpenSSL 1.1.0l  10 Sep 2019
built on: reproducible build, date unspecified
options:bn(64,32) rc4(4x,int) des(long) aes(partial) idea(int) blowfish(ptr) 
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\""  -Wa,--noexecstack
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
idea-cbc          8072.09k     9207.26k     9543.94k     9631.06k     9662.71k     9633.13k
$ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -engine padlock -evp md5
engine "padlock" set.
Doing md5 for 3s on 16 size blocks: 1078848 md5's in 2.98s
Doing md5 for 3s on 64 size blocks: 900872 md5's in 3.00s
Doing md5 for 3s on 256 size blocks: 587686 md5's in 3.00s
Doing md5 for 3s on 1024 size blocks: 244869 md5's in 3.00s
Doing md5 for 3s on 8192 size blocks: 37962 md5's in 2.98s
Doing md5 for 3s on 16384 size blocks: 19446 md5's in 3.00s
OpenSSL 1.1.0l  10 Sep 2019
built on: reproducible build, date unspecified
options:bn(64,32) rc4(4x,int) des(long) aes(partial) idea(int) blowfish(ptr) 
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\""  -Wa,--noexecstack
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
md5               5792.47k    19218.60k    50149.21k    83581.95k   104357.28k   106201.09k
$ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -engine padlock -evp sha1
engine "padlock" set.
Doing sha1 for 3s on 16 size blocks: 801140 sha1's in 2.98s
Doing sha1 for 3s on 64 size blocks: 586278 sha1's in 3.00s
Doing sha1 for 3s on 256 size blocks: 324677 sha1's in 3.00s
Doing sha1 for 3s on 1024 size blocks: 116434 sha1's in 3.00s
Doing sha1 for 3s on 8192 size blocks: 16486 sha1's in 2.97s
Doing sha1 for 3s on 16384 size blocks: 8384 sha1's in 2.98s
OpenSSL 1.1.0l  10 Sep 2019
built on: reproducible build, date unspecified
options:bn(64,32) rc4(4x,int) des(long) aes(partial) idea(int) blowfish(ptr) 
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\""  -Wa,--noexecstack
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sha1              4301.42k    12507.26k    27705.77k    39742.81k    45472.50k    46095.12k
$ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -engine padlock -evp sha256
engine "padlock" set.
Doing sha256 for 3s on 16 size blocks: 510056 sha256's in 2.99s
Doing sha256 for 3s on 64 size blocks: 321581 sha256's in 3.00s
Doing sha256 for 3s on 256 size blocks: 169890 sha256's in 3.00s
Doing sha256 for 3s on 1024 size blocks: 55999 sha256's in 2.97s
Doing sha256 for 3s on 8192 size blocks: 7859 sha256's in 3.00s
Doing sha256 for 3s on 16384 size blocks: 3960 sha256's in 3.00s
OpenSSL 1.1.0l  10 Sep 2019
built on: reproducible build, date unspecified
options:bn(64,32) rc4(4x,int) des(long) aes(partial) idea(int) blowfish(ptr) 
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\""  -Wa,--noexecstack
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sha256            2729.40k     6860.39k    14497.28k    19307.40k    21460.31k    21626.88k
$ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -engine padlock -evp sha512
engine "padlock" set.
Doing sha512 for 3s on 16 size blocks: 186202 sha512's in 2.98s
Doing sha512 for 3s on 64 size blocks: 187168 sha512's in 3.00s
Doing sha512 for 3s on 256 size blocks: 69920 sha512's in 3.00s
Doing sha512 for 3s on 1024 size blocks: 24184 sha512's in 2.98s
Doing sha512 for 3s on 8192 size blocks: 3411 sha512's in 2.98s
Doing sha512 for 3s on 16384 size blocks: 1732 sha512's in 3.00s
OpenSSL 1.1.0l  10 Sep 2019
built on: reproducible build, date unspecified
options:bn(64,32) rc4(4x,int) des(long) aes(partial) idea(int) blowfish(ptr) 
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\""  -Wa,--noexecstack
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sha512             999.74k     3992.92k     5966.51k     8310.21k     9376.82k     9459.03k

As expected, the other comparison methods show no acceleration.
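
The figures in the summary rows follow directly from the "Doing ..." progress lines: operations times block size, divided by the measured time, in 1000s of bytes per second. A small Python sketch of that arithmetic (the regular expression and the function name throughput_k are my own, not part of OpenSSL):

```python
import re

# Matches progress lines such as:
#   Doing idea-cbc for 3s on 8192 size blocks: 3515 idea-cbc's in 2.98s
LINE = re.compile(
    r"Doing (?P<name>\S+) for \d+s on (?P<size>\d+) size blocks: "
    r"(?P<count>\d+) \S+ in (?P<secs>[\d.]+)s"
)


def throughput_k(line: str) -> float:
    """Return throughput in 1000s of bytes per second for one progress line."""
    m = LINE.match(line)
    if m is None:
        raise ValueError(f"not a speed progress line: {line!r}")
    total_bytes = int(m["count"]) * int(m["size"])
    return total_bytes / float(m["secs"]) / 1000.0


sample = "Doing idea-cbc for 3s on 8192 size blocks: 3515 idea-cbc's in 2.98s"
print(f"{throughput_k(sample):.2f}k")
```

For the sample line this reproduces the 9662.71k reported for idea-cbc at 8192 bytes.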

B. VIA Luke running Debian Linux 11, OpenSSL 3.0.2

The second generation of VIA's Corefusion (Wikipedia) x86 processor, VIA Luke, features a VIA C3 Nehemiah core with the VIA PadLock (Wikipedia) cryptographic accelerator. The PadLock implementation in Luke provides a hardware Random Number Generator (RNG) and the Advanced Cryptography Engine (ACE) supporting AES (Advanced Encryption Standard; Wikipedia).

OpenSSL implemented a 'padlock' engine to access the ACE acceleration of AES back in 2005 with the release of OpenSSL 0.9.8. The VIA C7 enhanced ACE, adding SHA (Secure Hash Algorithms) and PMM (PadLock Montgomery Multiplier), and support was added in OpenSSL in 2006. Some Zhaoxin processors feature PadLock with SM3 (Wikipedia) and SM4 (SM4) added.

VIA Luke @ 1.0 GHz
OpenSSL 3.0.2 on Debian Linux 11 for x86
Method      AES-256 CBC       IDEA CBC            MD5          SHA-1        SHA-256        SHA-512
no-asm        6,916.47k      9,662.71k     78,665.19k     36,050.26k     12,125.81k      1,829.55k
asm          11,127.47k      9,655.64k    103,931.19k     45,509.29k     21,493.73k      9,368.92k
Kernel    2,033,909.76k      9,627.65k    104,328.02k     45,512.02k     21,442.15k      9,366.19k
Padlock     515,844.78k      9,658.37k    103,765.33k     45,520.21k     21,435.73k      9,368.92k

Notes:

  • While OpenSSL 3.0.2 changes the APIs, the command-line interface remains the same.
  • OpenSSL 3.x considers IDEA a "legacy" method. To include "legacy" methods in "speed" runs, use the -provider legacy -provider default options.
  • The 'afalg' engine in OpenSSL 3.0.2 supports AES-128, AES-192 and AES-256 CBC (see B.3.), so among our selected comparison methods the Kernel row shows acceleration for AES-256 CBC only: at 8,192-byte blocks throughput rises from ~11 MB/s to ~2.0 GB/s on this system (and from ~15 MB/s to ~2.7 GB/s for AES-128 CBC).
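
Since every 3.0.2 run below repeats the same provider options, it can help to assemble the command line in one place. The following Python sketch only builds the argument list and environment; the function names speed_command and build_env are hypothetical, while the options and environment variables are the ones used in the commands in this post.

```python
import os
from typing import Dict, List, Optional


def speed_command(algorithm: Optional[str] = None,
                  engine: Optional[str] = None,
                  use_evp: bool = False,
                  openssl: str = "apps/openssl") -> List[str]:
    """Build an 'openssl speed' argument list with the legacy provider enabled."""
    cmd = [openssl, "speed",
           "-provider-path", "./providers/",
           "-provider", "legacy",
           "-provider", "default"]
    if engine:
        cmd += ["-engine", engine]
    if algorithm:
        # '-evp' is optional in OpenSSL 3.0 but still accepted.
        cmd += ["-evp", algorithm] if use_evp else [algorithm]
    return cmd


def build_env(build_dir: str) -> Dict[str, str]:
    """Point the loader and OpenSSL at an uninstalled build tree (assumed layout)."""
    env = dict(os.environ)
    env["LD_LIBRARY_PATH"] = build_dir
    env["OPENSSL_ENGINES"] = os.path.join(build_dir, "engines") + "/"
    return env


print(" ".join(speed_command("aes-256-cbc", engine="padlock")))
```

From the top of the build tree the list could then be passed to subprocess.run(cmd, env=build_env(os.getcwd())).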

B.1. VIA Luke, OpenSSL 3.0.2, C methods only (no-asm)

Downloading the source distribution from the OpenSSL site and building it with the no-assembler option ($ ./config no-asm) gives a build that uses the portable C implementations of the methods.

Running the OpenSSL speed test gives ($ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -provider-path ./providers/ -provider legacy -provider default):

version: 3.0.2
built on: Sun May  1 13:04:48 2022 UTC
options: bn(64,32)
compiler: gcc -fPIC -pthread -m32 -Wall -O3 -fomit-frame-pointer -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_BUILDING_OPENSSL -DNDEBUG
CPUINFO: N/A
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
mdc2              1170.95k     1337.15k     1385.13k     1398.10k     1403.56k     1401.99k
md4               5395.54k    18021.31k    48302.34k    83558.74k   105521.15k   106501.46k
md5               5088.44k    16306.17k    40686.59k    64930.47k    78665.19k    79335.95k
sha1              3698.00k    10608.15k    22751.79k    31770.38k    36050.26k    36301.48k
rmd160            3510.66k     9560.31k    19649.45k    26672.81k    29721.34k    29975.02k
sha256            2098.78k     4985.83k     8995.70k    11200.98k    12125.81k    12173.31k
sha512             222.62k      891.72k     1221.41k     1643.86k     1829.55k     1839.80k
whirlpool         1011.46k     2198.19k     3729.83k     4519.94k     4818.99k     4838.74k
hmac(md5)         3755.22k    12565.82k    34336.85k    60485.41k    77701.12k    78888.96k
des-cbc           7854.77k     8906.50k     9268.14k     9356.97k     9390.56k     9374.07k
des-ede3          3146.78k     3299.69k     3345.83k     3358.24k     3361.45k     3358.72k
rc4              23275.45k    31302.89k    34076.84k    34981.28k    35233.79k    35187.37k
idea-cbc          8044.02k     9173.76k     9541.03k     9630.72k     9662.71k     9648.97k
seed-cbc         11114.26k    13166.34k    13754.22k    14015.44k    14024.70k    13991.94k
rc2-cbc           6113.20k     6845.26k     7035.31k     7088.13k     7082.38k     7118.00k
blowfish         12537.10k    15413.68k    16277.32k    16642.41k    16708.95k    16684.37k
cast-cbc          9889.85k    11743.85k    12360.45k    12531.71k    12579.39k    12561.07k
aes-128-cbc       8048.27k     9040.85k     9332.57k     9411.24k     9437.29k     9420.80k
aes-192-cbc       6972.55k     7686.14k     7889.63k     7961.94k     7981.74k     7968.09k
aes-256-cbc       6146.59k     6692.78k     6861.23k     6903.13k     6916.47k     6908.59k
camellia-128-cbc    11408.51k    13525.30k    14158.86k    14340.44k    14396.07k    14363.31k
camellia-192-cbc     9265.28k    10571.78k    10997.08k    11112.81k    11138.39k    11119.27k
camellia-256-cbc     9264.07k    10574.88k    10995.11k    11116.93k    11138.39k    11122.43k
ghash             4173.53k     4440.10k     4518.40k     4540.07k     4546.56k     4546.83k
rand               728.00k     2201.17k     4438.57k     5941.95k     6594.56k     6640.98k
                  sign    verify    sign/s verify/s
rsa  512 bits 0.003164s 0.000250s    316.0   3996.5
rsa 1024 bits 0.018032s 0.000807s     55.5   1239.1
rsa 2048 bits 0.112921s 0.002828s      8.9    353.6
rsa 3072 bits 0.339333s 0.006157s      2.9    162.4
rsa 4096 bits 0.743571s 0.010040s      1.3     99.6
rsa 7680 bits 4.446667s 0.035406s      0.2     28.2
                  sign    verify    sign/s verify/s
dsa  512 bits 0.004443s 0.003235s    225.1    309.1
dsa 1024 bits 0.011718s 0.010071s     85.3     99.3
dsa 2048 bits 0.037879s 0.034687s     26.4     28.8
                              sign    verify    sign/s verify/s
 160 bits ecdsa (secp160r1)   0.0118s   0.0095s     84.5    105.7
 192 bits ecdsa (nistp192)   0.0121s   0.0092s     82.4    108.8
 224 bits ecdsa (nistp224)   0.0159s   0.0121s     62.9     82.8
 256 bits ecdsa (nistp256)   0.0178s   0.0134s     56.3     74.4
 384 bits ecdsa (nistp384)   0.0469s   0.0330s     21.3     30.3
 521 bits ecdsa (nistp521)   0.1487s   0.0953s      6.7     10.5
 163 bits ecdsa (nistk163)   0.0099s   0.0188s    101.4     53.3
 233 bits ecdsa (nistk233)   0.0185s   0.0357s     54.0     28.0
 283 bits ecdsa (nistk283)   0.0332s   0.0642s     30.1     15.6
 409 bits ecdsa (nistk409)   0.0753s   0.1463s     13.3      6.8
 571 bits ecdsa (nistk571)   0.1728s   0.3353s      5.8      3.0
 163 bits ecdsa (nistb163)   0.0106s   0.0203s     94.3     49.3
 233 bits ecdsa (nistb233)   0.0203s   0.0392s     49.3     25.5
 283 bits ecdsa (nistb283)   0.0368s   0.0714s     27.2     14.0
 409 bits ecdsa (nistb409)   0.0849s   0.1656s     11.8      6.0
 571 bits ecdsa (nistb571)   0.1971s   0.3830s      5.1      2.6
 256 bits ecdsa (brainpoolP256r1)   0.0266s   0.0227s     37.6     44.1
 256 bits ecdsa (brainpoolP256t1)   0.0266s   0.0207s     37.6     48.3
 384 bits ecdsa (brainpoolP384r1)   0.0763s   0.0619s     13.1     16.2
 384 bits ecdsa (brainpoolP384t1)   0.0759s   0.0561s     13.2     17.8
 512 bits ecdsa (brainpoolP512r1)   0.1557s   0.1235s      6.4      8.1
 512 bits ecdsa (brainpoolP512t1)   0.1552s   0.1116s      6.4      9.0
                              op      op/s
 160 bits ecdh (secp160r1)   0.0109s     91.4
 192 bits ecdh (nistp192)   0.0111s     90.4
 224 bits ecdh (nistp224)   0.0145s     69.1
 256 bits ecdh (nistp256)   0.0163s     61.4
 384 bits ecdh (nistp384)   0.0429s     23.3
 521 bits ecdh (nistp521)   0.1386s      7.2
 163 bits ecdh (nistk163)   0.0090s    111.4
 233 bits ecdh (nistk233)   0.0172s     58.2
 283 bits ecdh (nistk283)   0.0310s     32.3
 409 bits ecdh (nistk409)   0.0706s     14.2
 571 bits ecdh (nistk571)   0.1621s      6.2
 163 bits ecdh (nistb163)   0.0097s    102.9
 233 bits ecdh (nistb233)   0.0189s     52.8
 283 bits ecdh (nistb283)   0.0346s     28.9
 409 bits ecdh (nistb409)   0.0805s     12.4
 571 bits ecdh (nistb571)   0.1854s      5.4
 256 bits ecdh (brainpoolP256r1)   0.0251s     39.8
 256 bits ecdh (brainpoolP256t1)   0.0251s     39.8
 384 bits ecdh (brainpoolP384r1)   0.0722s     13.8
 384 bits ecdh (brainpoolP384t1)   0.0720s     13.9
 512 bits ecdh (brainpoolP512r1)   0.1476s      6.8
 512 bits ecdh (brainpoolP512t1)   0.1474s      6.8
 253 bits ecdh (X25519)   0.0045s    223.7
 448 bits ecdh (X448)   0.0198s     50.4
                              sign    verify    sign/s verify/s
 253 bits EdDSA (Ed25519)   0.0017s   0.0052s    601.0    193.2
 456 bits EdDSA (Ed448)   0.0084s   0.0221s    119.5     45.3
                              sign    verify    sign/s verify/s
 256 bits SM2 (CurveSM2)   0.0266s   0.0196s     37.6     51.0
                       op     op/s
2048 bits ffdh   0.3654s      2.7
3072 bits ffdh   1.1711s      0.9
4096 bits ffdh   2.5200s      0.4
6144 bits ffdh   8.6400s      0.1

These results form a baseline for the performance of the methods.
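
To compare this baseline with later runs it is convenient to turn the throughput rows into a lookup table. Here is a minimal Python sketch, assuming the output layout shown above (a "type" header row followed by rows of values ending in "k"); the function name parse_speed_table is made up for this example.

```python
import re
from typing import Dict

VALUE = re.compile(r"(\d+\.\d+)k")


def parse_speed_table(text: str) -> Dict[str, Dict[int, float]]:
    """Map method name -> {block size in bytes: throughput in 1000s of bytes/s}."""
    sizes = []
    results: Dict[str, Dict[int, float]] = {}
    for line in text.splitlines():
        if line.startswith("type"):
            # Header row lists the block sizes, e.g. '16 bytes  64 bytes ...'
            sizes = [int(n) for n in re.findall(r"(\d+) bytes", line)]
            continue
        values = VALUE.findall(line)
        if sizes and len(values) == len(sizes):
            name = line[:VALUE.search(line).start()].strip()
            results[name] = dict(zip(sizes, map(float, values)))
    return results


sample = """\
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sha512             222.62k      891.72k     1221.41k     1643.86k     1829.55k     1839.80k
aes-256-cbc       6146.59k     6692.78k     6861.23k     6903.13k     6916.47k     6908.59k
rsa  512 bits 0.003164s 0.000250s    316.0   3996.5
"""
table = parse_speed_table(sample)
print(table["aes-256-cbc"][8192])
```

Rows without "k" values, such as the RSA timings, are skipped, so only the symmetric and digest results end up in the table.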

B.2. VIA Luke, OpenSSL 3.0.2, assembler methods

Downloading the source distribution from the OpenSSL site and building it with the default options (target linux-x86) should give a build with the assembler methods enabled. But there is a problem with the default build on our i686-compatible x86 processor:

$ LD_LIBRARY_PATH=`pwd` apps/openssl version
Illegal instruction

Doing a little debugging, this appears to be due to the use of 'ENDBR32' instructions (see assembly - How do old CPUs execute the new ENDBR64 and ENDBR32 instructions? - Stack Overflow), which are not supported by the VIA C3. These instructions are present in the assembler that OpenSSL generates. Since they are not supported by i386, i486, i586 and some i686 processors, it seems odd that OpenSSL generates them by default for both the regular x86 assembler and the explicit 386 assembler. Commenting out the line that generates the ENDBR32 opcodes in 'crypto/perlasm/x86asm.pl' (method sub ::endbranch, line 117 # &::data_byte(0xf3,0x0f,0x1e,0xfb);) is sufficient as a workaround to get a build that does not show this issue.
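
A quick way to see whether a build still contains the opcode is to scan the binary for the four bytes quoted above. This Python sketch is a crude check under stated assumptions: a raw byte match can also hit data or instruction operands, so a disassembler gives a more reliable answer, and the function names are my own.

```python
from pathlib import Path

# ENDBR32 encoding, as emitted by data_byte(0xf3,0x0f,0x1e,0xfb) in x86asm.pl
ENDBR32 = bytes([0xF3, 0x0F, 0x1E, 0xFB])


def count_endbr32(blob: bytes) -> int:
    """Count occurrences of the ENDBR32 byte sequence in a byte string."""
    return blob.count(ENDBR32)


def count_in_file(path: str) -> int:
    """Count occurrences of the ENDBR32 byte sequence in a file on disk."""
    return count_endbr32(Path(path).read_bytes())


# Tiny synthetic example: nop, endbr32, ret, endbr32, nop
demo = b"\x90" + ENDBR32 + b"\xc3" + ENDBR32 + b"\x90"
print(count_endbr32(demo))
```

Running count_in_file on the freshly built libcrypto shared object (the exact file name depends on the build) before and after the workaround should show the count dropping.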

With a workaround in place we can see the performance...

Running the OpenSSL speed test gives ($ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -provider-path ./providers/ -provider legacy -provider default):

version: 3.0.2
built on: Sat Apr 30 10:20:30 2022 UTC
options: bn(64,32)
compiler: gcc -fPIC -pthread -m32 -Wa,--noexecstack -Wall -O3 -fomit-frame-pointer -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_BUILDING_OPENSSL -DNDEBUG
CPUINFO: OPENSSL_ia32cap=0x381bf3f:0x0
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
mdc2              1200.92k     1371.68k     1416.45k     1428.82k     1432.23k     1430.87k
md4               5257.54k    17644.57k    47449.86k    82751.49k   105796.95k   107249.66k
md5               5257.45k    17681.34k    47454.98k    81704.28k   103931.19k   105414.66k
sha1              4053.67k    12019.24k    27080.28k    39416.15k    45509.29k    45880.70k
rmd160            3324.45k     9072.49k    18345.05k    24597.16k    27355.23k    27508.74k
sha256            2652.97k     6705.41k    14340.10k    19297.56k    21493.73k    21550.42k
sha512             993.46k     3970.23k     5952.60k     8320.77k     9368.92k     9445.54k
whirlpool         1824.34k     4539.10k     8518.27k    10928.97k    11922.09k    11987.63k
hmac(md5)         4105.93k    14171.82k    40664.40k    76169.90k   102596.61k   104666.45k
des-cbc           8860.91k    10261.33k    10702.08k    10818.56k    10853.03k    10835.29k
des-ede3          3463.89k     3677.75k     3723.00k     3748.39k     3731.61k     3751.22k
rc4              24488.32k    36922.26k    43253.43k    45180.59k    45774.17k    45765.97k
idea-cbc          8000.61k     9179.52k     9568.92k     9634.19k     9655.64k     9627.65k
seed-cbc         11136.58k    13181.47k    13790.29k    14020.24k    14027.43k    14025.36k
rc2-cbc           6005.72k     6670.74k     6861.74k     6912.00k     6922.24k     6924.97k
blowfish         15662.13k    19262.18k    20483.18k    20859.80k    20930.56k    20914.34k
cast-cbc         10123.87k    11847.43k    12347.25k    12572.94k    12622.26k    12571.99k
aes-128-cbc       6036.35k     6679.00k     6935.13k    15294.02k    15444.65k    15433.73k
aes-192-cbc       5087.79k     5637.65k     5821.82k    13020.90k    13088.09k    13035.97k
aes-256-cbc       4452.21k     4830.24k     4928.51k    11047.17k    11127.47k    11119.27k
camellia-128-cbc    10217.49k    14540.18k    16149.50k    16660.82k    16810.09k    16744.45k
camellia-192-cbc     8600.76k    11301.48k    12345.30k    12644.01k    12735.83k    12697.60k
camellia-256-cbc     8573.67k    11309.98k    12352.60k    12636.16k    12738.83k    12697.60k
ghash            11669.10k    18477.14k    21751.61k    22697.30k    23022.82k    23003.58k
rand               624.36k     1801.88k     3367.44k     4291.24k     4685.06k     4686.59k
                  sign    verify    sign/s verify/s
rsa  512 bits 0.002285s 0.000195s    437.7   5128.6
rsa 1024 bits 0.014217s 0.000694s     70.3   1441.6
rsa 2048 bits 0.098333s 0.002579s     10.2    387.7
rsa 3072 bits 0.302727s 0.005672s      3.3    176.3
rsa 4096 bits 0.685333s 0.009980s      1.5    100.2
rsa 7680 bits 4.220000s 0.034567s      0.2     28.9
                  sign    verify    sign/s verify/s
dsa  512 bits 0.003380s 0.002566s    295.9    389.7
dsa 1024 bits 0.009930s 0.008754s    100.7    114.2
dsa 2048 bits 0.034653s 0.032745s     28.9     30.5
                              sign    verify    sign/s verify/s
 160 bits ecdsa (secp160r1)   0.0072s   0.0061s    138.8    165.2
 192 bits ecdsa (nistp192)   0.0101s   0.0083s     98.9    120.0
 224 bits ecdsa (nistp224)   0.0147s   0.0117s     68.2     85.5
 256 bits ecdsa (nistp256)   0.0019s   0.0052s    528.9    193.8
 384 bits ecdsa (nistp384)   0.0542s   0.0402s     18.5     24.9
 521 bits ecdsa (nistp521)   0.1497s   0.1081s      6.7      9.3
 163 bits ecdsa (nistk163)   0.0088s   0.0172s    113.9     58.2
 233 bits ecdsa (nistk233)   0.0170s   0.0333s     58.8     30.0
 283 bits ecdsa (nistk283)   0.0309s   0.0605s     32.4     16.5
 409 bits ecdsa (nistk409)   0.0706s   0.1381s     14.2      7.2
 571 bits ecdsa (nistk571)   0.1626s   0.3184s      6.2      3.1
 163 bits ecdsa (nistb163)   0.0095s   0.0185s    105.6     54.0
 233 bits ecdsa (nistb233)   0.0187s   0.0366s     53.4     27.3
 283 bits ecdsa (nistb283)   0.0343s   0.0673s     29.2     14.9
 409 bits ecdsa (nistb409)   0.0798s   0.1567s     12.5      6.4
 571 bits ecdsa (nistb571)   0.1856s   0.3636s      5.4      2.8
 256 bits ecdsa (brainpoolP256r1)   0.0194s   0.0165s     51.5     60.7
 256 bits ecdsa (brainpoolP256t1)   0.0194s   0.0154s     51.6     65.1
 384 bits ecdsa (brainpoolP384r1)   0.0542s   0.0439s     18.5     22.8
 384 bits ecdsa (brainpoolP384t1)   0.0539s   0.0396s     18.6     25.2
 512 bits ecdsa (brainpoolP512r1)   0.1250s   0.0979s      8.0     10.2
 512 bits ecdsa (brainpoolP512t1)   0.1244s   0.0888s      8.0     11.3
                              op      op/s
 160 bits ecdh (secp160r1)   0.0067s    148.2
 192 bits ecdh (nistp192)   0.0096s    104.3
 224 bits ecdh (nistp224)   0.0139s     71.9
 256 bits ecdh (nistp256)   0.0037s    268.4
 384 bits ecdh (nistp384)   0.0515s     19.4
 521 bits ecdh (nistp521)   0.1420s      7.0
 163 bits ecdh (nistk163)   0.0083s    120.0
 233 bits ecdh (nistk233)   0.0162s     61.9
 283 bits ecdh (nistk283)   0.0294s     34.0
 409 bits ecdh (nistk409)   0.0670s     14.9
 571 bits ecdh (nistk571)   0.1549s      6.5
 163 bits ecdh (nistb163)   0.0090s    110.9
 233 bits ecdh (nistb233)   0.0178s     56.1
 283 bits ecdh (nistb283)   0.0329s     30.4
 409 bits ecdh (nistb409)   0.0766s     13.1
 571 bits ecdh (nistb571)   0.1772s      5.6
 256 bits ecdh (brainpoolP256r1)   0.0184s     54.4
 256 bits ecdh (brainpoolP256t1)   0.0184s     54.4
 384 bits ecdh (brainpoolP384r1)   0.0515s     19.4
 384 bits ecdh (brainpoolP384t1)   0.0512s     19.5
 512 bits ecdh (brainpoolP512r1)   0.1189s      8.4
 512 bits ecdh (brainpoolP512t1)   0.1185s      8.4
 253 bits ecdh (X25519)   0.0045s    223.8
 448 bits ecdh (X448)   0.0199s     50.3
                              sign    verify    sign/s verify/s
 253 bits EdDSA (Ed25519)   0.0015s   0.0051s    669.9    197.6
 456 bits EdDSA (Ed448)   0.0075s   0.0220s    132.8     45.5
                              sign    verify    sign/s verify/s
 256 bits SM2 (CurveSM2)   0.0195s   0.0146s     51.4     68.3
                       op     op/s
2048 bits ffdh   0.3360s      3.0
3072 bits ffdh   1.0860s      0.9
4096 bits ffdh   2.5250s      0.4
6144 bits ffdh   8.3400s      0.1

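With the assembler in place many methods improve: at 8,192-byte blocks SHA-512 is about 5.1x faster (1,829.55k to 9,368.92k), SHA-256 about 1.8x and AES-256 CBC about 1.6x, and nistp256 ECDSA signing rises from 56.3 to 528.9 signs/s. A few methods (rmd160, rc2-cbc, rand) are slower than in the no-asm build.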
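
To put numbers on that, the 8,192-byte columns of the B.1. and B.2. runs can be compared directly. A short Python sketch using figures copied from the two outputs above (the selection of methods is mine):

```python
# 8192-byte throughput in 1000s of bytes/s, copied from the B.1. and B.2. runs.
no_asm = {"md5": 78665.19, "sha1": 36050.26, "sha256": 12125.81,
          "sha512": 1829.55, "aes-256-cbc": 6916.47,
          "idea-cbc": 9662.71, "rmd160": 29721.34}
asm = {"md5": 103931.19, "sha1": 45509.29, "sha256": 21493.73,
       "sha512": 9368.92, "aes-256-cbc": 11127.47,
       "idea-cbc": 9655.64, "rmd160": 27355.23}

# Ratio > 1 means the assembler build is faster than the portable C build.
ratios = {name: asm[name] / no_asm[name] for name in no_asm}

for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:12s} {ratio:5.2f}x")
```

SHA-512 gains the most, IDEA is unchanged (it has no assembler implementation in this build), and rmd160 comes out slightly slower.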

B.3. VIA Luke, OpenSSL 3.0.2, use kernel methods (AF_ALG)

The default compile (see B.2.) includes the 'afalg' engine, which provides access to the Linux Kernel Crypto API (AF_ALG) method implementations. Details of the available kernel crypto methods can be found in /proc/crypto:

$ cat /proc/crypto | grep '^name'
name         : cbc(aes)
name         : ecb(aes)
name         : aes
name         : crc32c
name         : crct10dif
name         : pkcs1pad(rsa,sha256)
name         : hmac(sha256)
name         : hmac(sha1)
name         : lzo-rle
name         : lzo-rle
name         : lzo
name         : lzo
name         : zlib-deflate
name         : deflate
name         : deflate
name         : sha224
name         : sha256
name         : sha1
name         : md5
name         : ecb(cipher_null)
name         : digest_null
name         : compress_null
name         : cipher_null
name         : rsa
name         : dh

Note that on this system the kernel has loaded a module with support for VIA PadLock ACE, so hardware acceleration may be in use:

$ lsmod | grep padlock
padlock_aes            16384  0
libaes                 16384  1 padlock_aes
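
Whether the PadLock driver actually backs cbc(aes) can be read from the driver and priority fields of /proc/crypto, since the kernel prefers the highest-priority implementation registered under a name. A minimal Python sketch follows; the SAMPLE text is made up for illustration (driver names and priorities are assumptions, not copied from this system), and the function names are my own.

```python
import re
from typing import Dict, List, Optional

# Hypothetical excerpt in the /proc/crypto layout: 'key : value' lines,
# one blank line between entries.
SAMPLE = """\
name         : cbc(aes)
driver       : cbc-aes-padlock
module       : padlock_aes
priority     : 400

name         : cbc(aes)
driver       : cbc(aes-generic)
module       : kernel
priority     : 100
"""


def parse_proc_crypto(text: str) -> List[Dict[str, str]]:
    """Split /proc/crypto style text into one dict per registered algorithm."""
    entries = []
    for block in re.split(r"\n\s*\n", text.strip()):
        entry = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                entry[key.strip()] = value.strip()
        if entry:
            entries.append(entry)
    return entries


def preferred_driver(entries: List[Dict[str, str]], name: str) -> Optional[str]:
    """Return the driver with the highest priority for the given algorithm name."""
    matches = [e for e in entries if e.get("name") == name]
    if not matches:
        return None
    best = max(matches, key=lambda e: int(e.get("priority", "0")))
    return best.get("driver")


print(preferred_driver(parse_proc_crypto(SAMPLE), "cbc(aes)"))
```

On a real system the text would come from reading /proc/crypto instead of SAMPLE.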

The 'afalg' engine in OpenSSL supports a subset of the available methods:

$ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl engine afalg -c
(afalg) AFALG engine support
 [AES-128-CBC, AES-192-CBC, AES-256-CBC]
00877BB7:error:1280006A:DSO support routines:dlfcn_bind_func:could not bind to the requested symbol name:crypto/dso/dso_dlfcn.c:188:symname(EVP_PKEY_base_id): /home/hamish/src/openssl-3.0.2-asm_no_endbr32/engines/afalg.so: undefined symbol: EVP_PKEY_base_id
00877BB7:error:1280006A:DSO support routines:DSO_bind_func:could not bind to the requested symbol name:crypto/dso/dso_lib.c:176:

Unlike our test with OpenSSL 1.1.0l, this version of the 'afalg' engine supports the AES-256 CBC method used in the comparisons. Also OpenSSL 3.0.2 uses the EVP methods by default in the openssl program, so a single run can give results for all the methods ($ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -provider-path ./providers/ -provider legacy -provider default -engine afalg):

version: 3.0.2
built on: Sat Apr 30 10:20:30 2022 UTC
options: bn(64,32)
compiler: gcc -fPIC -pthread -m32 -Wa,--noexecstack -Wall -O3 -fomit-frame-pointer -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_BUILDING_OPENSSL -DNDEBUG
CPUINFO: OPENSSL_ia32cap=0x381bf3f:0x0
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
mdc2              1203.49k     1368.03k     1416.28k     1429.16k     1433.60k     1430.18k
md4               5224.81k    17555.18k    47432.85k    82940.92k   105802.41k   107071.36k
md5               5256.77k    17709.38k    47522.56k    81770.15k   104328.02k   105551.19k
sha1              4050.69k    12006.21k    27084.11k    39410.35k    45512.02k    45880.66k
rmd160            3313.78k     9059.78k    18292.99k    24585.22k    27314.86k    27511.92k
sha256            2643.44k     6703.43k    14320.90k    19273.73k    21442.15k    21588.65k
sha512             992.24k     3964.20k     5947.99k     8280.06k     9366.19k     9445.54k
whirlpool         1825.70k     4532.37k     8526.69k    10942.02k    11922.09k    11976.70k
hmac(md5)         4127.19k    14260.58k    40742.23k    76256.04k   102768.64k   104748.37k
des-cbc           8853.75k    10257.81k    10701.65k    10816.51k    10855.77k    10846.21k
des-ede3          3462.86k     3664.04k     3734.67k     3735.21k     3728.58k     3741.01k
rc4              24405.02k    36925.59k    43289.77k    45176.15k    45757.78k    45744.13k
idea-cbc          8050.62k     9199.81k     9542.23k     9656.08k     9627.65k     9650.18k
seed-cbc         11173.68k    13136.48k    13794.22k    14021.61k    14071.61k    13972.98k
rc2-cbc           6004.93k     6675.95k     6861.48k     6912.34k     6924.97k     6924.97k
blowfish         15661.21k    19324.82k    20470.53k    20824.41k    20925.10k    20897.85k
cast-cbc         10140.27k    11779.85k    12382.72k    12537.47k    12580.18k    12571.99k
aes-128-cbc       7395.12k    26822.40k    93369.60k   251783.53k  2702028.80k  6126523.73k
aes-192-cbc       4927.35k    22058.67k    71679.12k   277274.48k  2596556.80k  3803194.51k
aes-256-cbc       7853.33k    20938.19k    82461.39k   289389.46k  2033909.76k  5132615.68k
camellia-128-cbc    10163.22k    14395.93k    16187.85k    16708.67k    16807.25k    16790.85k
camellia-192-cbc     8520.49k    11297.22k    12348.05k    12608.56k    12730.37k    12761.99k
camellia-256-cbc     8494.83k    11262.12k    12342.19k    12641.62k    12733.34k    12719.45k
ghash            11647.89k    18520.58k    21678.25k    22770.47k    22921.16k    22948.52k
rand               628.16k     1800.77k     3364.98k     4306.62k     4674.90k     4702.21k
                  sign    verify    sign/s verify/s
rsa  512 bits 0.002285s 0.000195s    437.6   5135.9
rsa 1024 bits 0.014217s 0.000694s     70.3   1441.8
rsa 2048 bits 0.098333s 0.002579s     10.2    387.8
rsa 3072 bits 0.303030s 0.005672s      3.3    176.3
rsa 4096 bits 0.684667s 0.009980s      1.5    100.2
rsa 7680 bits 4.226667s 0.034567s      0.2     28.9
                  sign    verify    sign/s verify/s
dsa  512 bits 0.003380s 0.002556s    295.9    391.3
dsa 1024 bits 0.009930s 0.008754s    100.7    114.2
dsa 2048 bits 0.034653s 0.032395s     28.9     30.9
                              sign    verify    sign/s verify/s
 160 bits ecdsa (secp160r1)   0.0072s   0.0060s    138.6    167.3
 192 bits ecdsa (nistp192)   0.0101s   0.0083s     98.8    120.6
 224 bits ecdsa (nistp224)   0.0147s   0.0119s     68.1     84.3
 256 bits ecdsa (nistp256)   0.0019s   0.0052s    530.3    193.7
 384 bits ecdsa (nistp384)   0.0542s   0.0400s     18.4     25.0
 521 bits ecdsa (nistp521)   0.1497s   0.1078s      6.7      9.3
 163 bits ecdsa (nistk163)   0.0088s   0.0172s    114.0     58.3
 233 bits ecdsa (nistk233)   0.0170s   0.0333s     58.8     30.0
 283 bits ecdsa (nistk283)   0.0308s   0.0604s     32.4     16.5
 409 bits ecdsa (nistk409)   0.0707s   0.1381s     14.1      7.2
 571 bits ecdsa (nistk571)   0.1626s   0.3184s      6.2      3.1
 163 bits ecdsa (nistb163)   0.0095s   0.0185s    105.7     53.9
 233 bits ecdsa (nistb233)   0.0187s   0.0366s     53.6     27.3
 283 bits ecdsa (nistb283)   0.0343s   0.0674s     29.2     14.8
 409 bits ecdsa (nistb409)   0.0798s   0.1567s     12.5      6.4
 571 bits ecdsa (nistb571)   0.1854s   0.3636s      5.4      2.8
 256 bits ecdsa (brainpoolP256r1)   0.0194s   0.0164s     51.6     61.1
 256 bits ecdsa (brainpoolP256t1)   0.0194s   0.0154s     51.6     65.0
 384 bits ecdsa (brainpoolP384r1)   0.0542s   0.0435s     18.5     23.0
 384 bits ecdsa (brainpoolP384t1)   0.0539s   0.0395s     18.5     25.3
 512 bits ecdsa (brainpoolP512r1)   0.1250s   0.0980s      8.0     10.2
 512 bits ecdsa (brainpoolP512t1)   0.1247s   0.0882s      8.0     11.3
                              op      op/s
 160 bits ecdh (secp160r1)   0.0067s    148.2
 192 bits ecdh (nistp192)   0.0096s    104.2
 224 bits ecdh (nistp224)   0.0139s     71.9
 256 bits ecdh (nistp256)   0.0037s    268.4
 384 bits ecdh (nistp384)   0.0515s     19.4
 521 bits ecdh (nistp521)   0.1420s      7.0
 163 bits ecdh (nistk163)   0.0083s    120.2
 233 bits ecdh (nistk233)   0.0162s     61.9
 283 bits ecdh (nistk283)   0.0294s     34.0
 409 bits ecdh (nistk409)   0.0671s     14.9
 571 bits ecdh (nistk571)   0.1551s      6.4
 163 bits ecdh (nistb163)   0.0090s    111.0
 233 bits ecdh (nistb233)   0.0178s     56.1
 283 bits ecdh (nistb283)   0.0329s     30.4
 409 bits ecdh (nistb409)   0.0765s     13.1
 571 bits ecdh (nistb571)   0.1774s      5.6
 256 bits ecdh (brainpoolP256r1)   0.0184s     54.3
 256 bits ecdh (brainpoolP256t1)   0.0184s     54.2
 384 bits ecdh (brainpoolP384r1)   0.0515s     19.4
 384 bits ecdh (brainpoolP384t1)   0.0513s     19.5
 512 bits ecdh (brainpoolP512r1)   0.1190s      8.4
 512 bits ecdh (brainpoolP512t1)   0.1185s      8.4
 253 bits ecdh (X25519)   0.0045s    223.6
 448 bits ecdh (X448)   0.0199s     50.4
                              sign    verify    sign/s verify/s
 253 bits EdDSA (Ed25519)   0.0015s   0.0051s    670.5    196.1
 456 bits EdDSA (Ed448)   0.0075s   0.0220s    133.0     45.5
                              sign    verify    sign/s verify/s
 256 bits SM2 (CurveSM2)   0.0195s   0.0148s     51.4     67.6
                       op     op/s
2048 bits ffdh   0.3360s      3.0
3072 bits ffdh   1.0870s      0.9
4096 bits ffdh   2.5225s      0.4
6144 bits ffdh   8.3400s      0.1

Here the use of PadLock acceleration by the kernel is evident in how much more throughput the AES methods have.

B.4. VIA Luke, OpenSSL 3.0.2, use VIA Padlock

The default compile (see B.2.) also supports the 'padlock' engine, which gives access to the VIA Padlock acceleration for AES:

$ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl engine padlock -c
(padlock) VIA PadLock (no-RNG, ACE)
 [AES-128-ECB, AES-128-CBC, AES-128-CFB, AES-128-OFB, AES-128-CTR, AES-192-ECB, AES-192-CBC, AES-192-CFB, AES-192-OFB, AES-192-CTR, AES-256-ECB, AES-256-CBC, AES-256-CFB, AES-256-OFB, AES-256-CTR]
008785B7:error:1280006A:DSO support routines:dlfcn_bind_func:could not bind to the requested symbol name:crypto/dso/dso_dlfcn.c:188:symname(EVP_PKEY_base_id): /home/hamish/src/openssl-3.0.2-asm_no_endbr32/engines/padlock.so: undefined symbol: EVP_PKEY_base_id
008785B7:error:1280006A:DSO support routines:DSO_bind_func:could not bind to the requested symbol name:crypto/dso/dso_lib.c:176:

The errors here look like those seen when the default provider is not loaded; I am not sure why they happen in this case.

In previous versions of OpenSSL both the '-evp' option and the Padlock engine had to be specified to use the accelerated method. This still works (with providers specified):

$ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -provider-path ./providers/ -provider legacy -provider default -engine padlock -evp aes-256-cbc
Engine "padlock" set.
Doing AES-256-CBC for 3s on 16 size blocks: 4665275 AES-256-CBC's in 2.98s
Doing AES-256-CBC for 3s on 64 size blocks: 4121198 AES-256-CBC's in 3.00s
Doing AES-256-CBC for 3s on 256 size blocks: 2758722 AES-256-CBC's in 3.00s
Doing AES-256-CBC for 3s on 1024 size blocks: 1183837 AES-256-CBC's in 3.00s
Doing AES-256-CBC for 3s on 8192 size blocks: 186153 AES-256-CBC's in 2.98s
Doing AES-256-CBC for 3s on 16384 size blocks: 95474 AES-256-CBC's in 3.00s
version: 3.0.2
built on: Sat Apr 30 10:20:30 2022 UTC
options: bn(64,32)
compiler: gcc -fPIC -pthread -m32 -Wa,--noexecstack -Wall -O3 -fomit-frame-pointer -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_BUILDING_OPENSSL -DNDEBUG
CPUINFO: OPENSSL_ia32cap=0x381bf3f:0x0
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
AES-256-CBC      25048.46k    87918.89k   235410.94k   404083.03k   511733.35k   521415.34k
Segmentation fault

That segfault doesn't look good... but the method ran okay, and the figures show acceleration, so I'm guessing the fault comes from clean-up code when returning from the Padlock engine (see [WIP] Add a test case for the engine crash with AES-256-CTR by bernd-edlinger · Pull Request #18024 · openssl/openssl).
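
The throughput figures in the summary row follow directly from the 'Doing ...' lines: operations completed, times block size, divided by the reported time, in 1000s of bytes per second. A minimal sketch of that arithmetic (the function name throughput_k and the regular expression are illustrative, not part of OpenSSL):

```python
import re

# Matches lines such as:
# "Doing AES-256-CBC for 3s on 8192 size blocks: 186153 AES-256-CBC's in 2.98s"
DOING_RE = re.compile(
    r"Doing (?P<name>.+?) for \d+s on (?P<size>\d+) size blocks: "
    r"(?P<count>\d+) .+? in (?P<secs>[\d.]+)s"
)


def throughput_k(line):
    """Throughput in 1000s of bytes per second for one 'Doing ...' line."""
    m = DOING_RE.search(line)
    if m is None:
        raise ValueError("not a 'Doing ...' line: " + line)
    return int(m["count"]) * int(m["size"]) / float(m["secs"]) / 1000.0


line = "Doing AES-256-CBC for 3s on 8192 size blocks: 186153 AES-256-CBC's in 2.98s"
print(f"{throughput_k(line):.2f}k")  # 511733.35k, matching the 8192 bytes column
```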

In OpenSSL 3.0 the -evp option is now optional, since the EVP methods are used by default:

$ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -provider-path ./providers/ -provider legacy -provider default -engine padlock aes-256-cbc
Engine "padlock" set.
Doing aes-256-cbc for 3s on 16 size blocks: 5783292 aes-256-cbc's in 2.98s
Doing aes-256-cbc for 3s on 64 size blocks: 4927575 aes-256-cbc's in 2.98s
Doing aes-256-cbc for 3s on 256 size blocks: 3110981 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 1245940 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 187401 aes-256-cbc's in 2.98s
Doing aes-256-cbc for 3s on 16384 size blocks: 95841 aes-256-cbc's in 3.00s
version: 3.0.2
built on: Sat Apr 30 10:20:30 2022 UTC
options: bn(64,32)
compiler: gcc -fPIC -pthread -m32 -Wa,--noexecstack -Wall -O3 -fomit-frame-pointer -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_BUILDING_OPENSSL -DNDEBUG
CPUINFO: OPENSSL_ia32cap=0x381bf3f:0x0
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-cbc      31051.23k   105827.11k   265470.38k   425280.85k   515164.09k   523419.65k
Segmentation fault

While a full 'speed' run should work, there are issues with the public-key methods when using the 'padlock' engine that mean a full run doesn't complete. So we'll just run the comparison methods this time ($ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -provider-path ./providers/ -provider legacy -provider default -engine padlock aes-256-cbc idea md5 sha1 sha256 sha512):

version: 3.0.2
built on: Sat Apr 30 10:20:30 2022 UTC
options: bn(64,32)
compiler: gcc -fPIC -pthread -m32 -Wa,--noexecstack -Wall -O3 -fomit-frame-pointer -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_BUILDING_OPENSSL -DNDEBUG
CPUINFO: OPENSSL_ia32cap=0x381bf3f:0x0
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
md5               5226.37k    17711.27k    47580.84k    81752.41k   103765.33k   105966.25k
sha1              4069.17k    12035.98k    26991.73k    39550.03k    45520.21k    46011.73k
sha256            2642.86k     6697.94k    14379.07k    19284.76k    21435.73k    21610.50k
sha512             992.36k     3967.08k     5949.53k     8293.38k     9368.92k     9456.54k
idea-cbc          8059.19k     9200.92k     9572.09k     9631.06k     9658.37k     9654.46k
aes-256-cbc      31047.67k   105690.01k   265461.16k   425332.83k   515844.78k   523709.10k
Segmentation fault

As expected, the Padlock engine only accelerates the AES methods on this hardware; the other methods show the same performance as without the engine. Interestingly, the 'padlock' engine shows better performance for AES with small block sizes than the 'afalg' engine using the kernel implementation, which also utilizes the PadLock hardware.
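
For comparing runs like these it helps to pull the summary rows into a data structure. The sketch below is one way to do that, assuming the table layout shown above (a 'type' header naming the block sizes, then one row per method); parse_speed_table is a hypothetical helper name.

```python
import re


def parse_speed_table(text):
    """Map method name -> {block size in bytes: throughput in 1000s of bytes/s}."""
    sizes = []
    results = {}
    for line in text.splitlines():
        if line.startswith("type"):
            sizes = [int(s) for s in re.findall(r"(\d+) bytes", line)]
            continue
        fields = line.split()
        if not sizes or len(fields) <= len(sizes):
            continue  # not a data row (e.g. "Segmentation fault")
        values = fields[-len(sizes):]
        if not all(re.fullmatch(r"[\d.]+k?", v) for v in values):
            continue  # e.g. the RSA/DSA timing rows
        # Method names may contain spaces ("aes-256 cbc"), so join the rest.
        name = " ".join(fields[:-len(sizes)])
        results[name] = {s: float(v.rstrip("k")) for s, v in zip(sizes, values)}
    return results


# Rows copied from the run above.
sample = """type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
md5               5226.37k    17711.27k    47580.84k    81752.41k   103765.33k   105966.25k
idea-cbc          8059.19k     9200.92k     9572.09k     9631.06k     9658.37k     9654.46k
aes-256-cbc      31047.67k   105690.01k   265461.16k   425332.83k   515844.78k   523709.10k
Segmentation fault
"""

table = parse_speed_table(sample)
for name, row in table.items():
    print(name, row[8192])
```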

C. Notes for Debian Linux 11 and OpenSSL 1.1.0l on AMD Ryzen

AMD's Ryzen processors implement Intel's Advanced Encryption Standard (AES) New Instructions (AES-NI) and SHA Extensions (SHA Ext.) instruction set extensions for the acceleration of AES and SHA cryptographic methods.

Due to issues with the selection of AES assembler modes in the OpenSSL 1.1.1 series, OpenSSL 1.1.0l is being used instead of the current OpenSSL 1.1.1 series. This serves to better illustrate the differences in AES performance for the C method implementations (i.e. no assembler) and processor family assembler comparisons.

AMD Ryzen 5 3600
OpenSSL 1.1.0l on Debian Linux 11 for x86_64
Method                AES-256 CBC       IDEA CBC            MD5          SHA-1        SHA-256        SHA-512
no-asm                212,366.68k    119,010.65k    749,469.70k    845,922.30k    307,301.03k    553,937.58k
asm                   232,721.07k    119,545.86k    793,971.37k  1,050,842.45k    470,701.40k    604,972.40k
AES-NI & SHA Ext.   1,086,559.57k    119,250.94k    790,495.23k  1,027,295.91k    470,671.36k    605,863.94k
Kernel              1,092,332.20k    120,416.94k    787,464.19k  1,029,693.44k    475,791.36k    601,159.00k

Notes:

  • During the build tests (make test) the 'fuzz' test fails due to a missing update (see repo/gentoo.git - Official Gentoo ebuild repository for a patch that fixes the issue).
  • A runtime issue with the Elliptic Curve methods means some ECDH methods fail to give meaningful results, so the benchmark builds disable these methods (no-ec).
  • The OpenSSL 1.1.0l assembler SHA method implementations automatically utilize the SHA extensions if they are available, so the 'asm' and SHA Ext. results are similar for those methods.
  • The 'afalg' engine in OpenSSL 1.1.0l only supports AES-128 CBC, so no improvement is seen in our selected comparison methods. From a test run (see C.4.), 'afalg' increases the reported throughput for AES-128 CBC from ~305 MB/s to ~23.3 GB/s on this system. Since using the 'afalg' engine requires the '-evp' option, the other methods in the 'Kernel' row show performance that includes acceleration from AES-NI and SHA Ext.
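
To make the table easier to read, each row can be expressed as a speed-up over the 'no-asm' baseline. The following is a minimal sketch using the 8,192-byte figures from the table above; the names RESULTS and speedups are illustrative only.

```python
# Throughput for 8,192-byte blocks in 1000s of bytes/s, copied from the table.
COLUMNS = ["AES-256 CBC", "IDEA CBC", "MD5", "SHA-1", "SHA-256", "SHA-512"]
RESULTS = {
    "no-asm": [212366.68, 119010.65, 749469.70, 845922.30, 307301.03, 553937.58],
    "asm": [232721.07, 119545.86, 793971.37, 1050842.45, 470701.40, 604972.40],
    "AES-NI & SHA Ext.": [1086559.57, 119250.94, 790495.23, 1027295.91, 470671.36, 605863.94],
    "Kernel": [1092332.20, 120416.94, 787464.19, 1029693.44, 475791.36, 601159.00],
}


def speedups(results, baseline="no-asm"):
    """Divide every row by the baseline row, column by column."""
    base = results[baseline]
    return {
        method: [value / ref for value, ref in zip(values, base)]
        for method, values in results.items()
    }


print(f"{'Method':<18}" + "".join(f"{c:>13}" for c in COLUMNS))
for method, ratios in speedups(RESULTS).items():
    print(f"{method:<18}" + "".join(f"{r:12.2f}x" for r in ratios))
```

On these figures AES-256 CBC gains roughly 5x from AES-NI over the portable C build, while IDEA stays at about 1x in every row.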

C.1. Debian Linux 11, OpenSSL 1.1.0l, no assembler compile

Downloading the source distribution from the OpenSSL site and building with the no-assembler option ($ ./config no-asm no-ec) gives a build using the portable C implementations of the methods.

Running the OpenSSL speed test gives ($ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed):

OpenSSL 1.1.0l  10 Sep 2019
built on: reproducible build, date unspecified
options:bn(64,64) rc4(int) des(int) aes(partial) idea(int) blowfish(ptr) 
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\"" 
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
md2                  0.00         0.00         0.00         0.00         0.00         0.00 
mdc2             20730.02k    22381.89k    22966.87k    23125.67k    23177.90k    23101.44k
md4             116766.41k   351480.79k   790411.01k  1157575.68k  1333168.81k  1353602.39k
md5             156742.81k   342366.40k   581831.32k   701704.19k   749469.70k   753330.86k
hmac(md5)        69585.79k   204529.32k   447552.26k   638072.83k   739137.58k   739519.15k
sha1            173182.02k   377457.00k   646010.28k   789712.55k   845922.30k   851148.80k
rmd160           55190.68k   131163.46k   236064.09k   296644.27k   319621.80k   321574.23k
rc4             385952.59k   392689.51k   392603.82k   398494.04k   396479.15k   401604.61k
des cbc          93933.23k    97041.13k    98196.31k    98509.82k    98492.42k    98435.07k
des ede3         36440.89k    36739.35k    36494.93k    37111.13k    37102.36k    37295.45k
idea cbc        114402.18k   117464.04k   118522.45k   119008.60k   119010.65k   119007.91k
seed cbc        106145.94k   109854.22k   109158.91k   110865.07k   110428.16k   109690.88k
rc2 cbc          56211.93k    57472.96k    58497.19k    58632.53k    58684.76k    58621.95k
rc5-32/12 cbc        0.00         0.00         0.00         0.00         0.00         0.00 
blowfish cbc    152106.21k   161719.22k   163615.91k   164193.96k   163667.97k   164080.30k
cast cbc        134783.27k   137901.70k   138407.34k   137374.72k   137805.82k   137620.14k
aes-128 cbc     268837.55k   279241.41k   283237.21k   284580.86k   286363.83k   285250.90k
aes-192 cbc     233378.87k   241100.22k   239992.66k   240700.76k   245841.92k   244094.29k
aes-256 cbc     203023.76k   210830.19k   210040.41k   209316.86k   212366.68k   213166.76k
camellia-128 cbc   192874.25k   196834.09k   198303.66k   198697.98k   197648.38k   195930.79k
camellia-192 cbc   148299.39k   152704.83k   150698.15k   152804.01k   152867.10k   152895.49k
camellia-256 cbc   148925.98k   152378.58k   153129.98k   153295.53k   152136.36k   151202.47k
sha256           69916.46k   147095.52k   240012.71k   287934.81k   307301.03k   305905.66k
sha512           62274.78k   254578.13k   347497.91k   490810.03k   553937.58k   570299.73k
whirlpool        38258.02k    78932.71k   129490.35k   153168.21k   162155.02k   162409.13k
aes-128 ige     240440.92k   262720.19k   270912.39k   272708.61k   272829.10k   273110.36k
aes-192 ige     214961.81k   231054.78k   235207.59k   233662.81k   231488.39k   234067.29k
aes-256 ige     188522.23k   201385.24k   206791.34k   206300.16k   205572.78k   208071.32k
ghash           335113.40k   341217.81k   343153.24k   346411.01k   348973.74k   350332.66k
                  sign    verify    sign/s verify/s
rsa  512 bits 0.000112s 0.000006s   8900.5 175828.4
rsa 1024 bits 0.000546s 0.000017s   1830.4  60009.2
rsa 2048 bits 0.003192s 0.000055s    313.3  18201.4
rsa 3072 bits 0.007916s 0.000122s    126.3   8207.5
rsa 4096 bits 0.019550s 0.000196s     51.2   5092.3
rsa 7680 bits 0.098922s 0.000749s     10.1   1335.7
rsa 15360 bits 0.716429s 0.002938s      1.4    340.3
                  sign    verify    sign/s verify/s
dsa  512 bits 0.000143s 0.000087s   6983.6  11526.8
dsa 1024 bits 0.000337s 0.000247s   2967.9   4051.4
dsa 2048 bits 0.000975s 0.000900s   1026.0   1110.8

These results give a performance baseline for this platform.

C.2. Debian Linux 11, OpenSSL 1.1.0l, default compile

Downloading the source distribution from the OpenSSL site and building with the default options ($ ./config no-ec) gives a build with the assembler methods enabled.

Running the OpenSSL speed test gives ($ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed):

OpenSSL 1.1.0l  10 Sep 2019
built on: reproducible build, date unspecified
options:bn(64,64) rc4(8x,int) des(int) aes(partial) idea(int) blowfish(ptr) 
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\""  -Wa,--noexecstack
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
md2                  0.00         0.00         0.00         0.00         0.00         0.00 
mdc2             20798.84k    22593.45k    23178.67k    23328.09k    23166.98k    23358.12k
md4              95277.68k   297859.71k   719539.71k  1120059.05k  1334834.52k  1350079.83k
md5              97994.57k   340445.14k   597550.93k   741420.03k   793971.37k   789266.43k
hmac(md5)        61321.42k   186359.87k   430417.17k   649776.47k   782076.59k   791876.95k
sha1            103850.11k   409047.21k   757470.63k   967015.77k  1050842.45k  1058881.54k
rmd160           48997.42k   119448.55k   221065.56k   283963.73k   309026.82k   310028.97k
rc4             544631.54k   601012.33k   546238.46k   511943.34k   499725.65k   500438.36k
des cbc          87352.36k    89708.48k    91152.30k    91687.94k    91854.17k    91805.01k
des ede3         33937.44k    34380.59k    33691.82k    34263.38k    34463.74k    33878.41k
idea cbc        114699.64k   117182.55k   118750.38k   118914.73k   119545.86k   119444.82k
seed cbc        106397.13k   109433.28k   110279.17k   110304.60k   110235.47k   110269.78k
rc2 cbc          56623.22k    57481.81k    57929.39k    57963.52k    58387.11k    58507.26k
rc5-32/12 cbc        0.00         0.00         0.00         0.00         0.00         0.00 
blowfish cbc    150240.14k   160040.47k   163246.08k   163654.66k   163779.93k   163616.09k
cast cbc        134531.68k   137505.19k   137351.00k   137942.36k   138179.93k   136642.56k
aes-128 cbc     143370.76k   181033.49k   190430.63k   301337.17k   303351.67k   304562.18k
aes-192 cbc     127582.04k   154253.95k   161869.06k   257087.15k   265120.43k   265540.95k
aes-256 cbc     113777.54k   133252.31k   138981.80k   232166.74k   232721.07k   235678.38k
camellia-128 cbc   183278.21k   218869.42k   225665.79k   228573.87k   232745.64k   233619.46k
camellia-192 cbc   145994.45k   163625.87k   169293.82k   173766.66k   174830.93k   175484.15k
camellia-256 cbc   144564.23k   166374.21k   170964.03k   174196.74k   175087.62k   175155.88k
sha256           69818.45k   206040.64k   359991.13k   441089.02k   470701.40k   472733.01k
sha512           59340.73k   234639.83k   370854.91k   528685.04k   604972.40k   615967.17k
whirlpool        39840.33k    99675.18k   165740.89k   195083.95k   205299.71k   209704.28k
aes-128 ige     169776.44k   181493.42k   184641.88k   184963.75k   184407.38k   183833.94k
aes-192 ige     140920.70k   147661.03k   153481.90k   155284.14k   156060.33k   155178.33k
aes-256 ige     126160.81k   131790.38k   132928.38k   134006.33k   133819.05k   133283.84k
ghash          1829075.23k  4999172.03k  7964922.39k  8924905.47k  9241427.97k  9231335.42k
                  sign    verify    sign/s verify/s
rsa  512 bits 0.000034s 0.000003s  29238.3 397281.6
rsa 1024 bits 0.000098s 0.000006s  10256.2 154937.1
rsa 2048 bits 0.000505s 0.000021s   1979.2  46636.7
rsa 3072 bits 0.002279s 0.000046s    438.7  21625.7
rsa 4096 bits 0.005269s 0.000079s    189.8  12586.5
rsa 7680 bits 0.045917s 0.000274s     21.8   3654.4
rsa 15360 bits 0.255750s 0.001085s      3.9    922.0
                  sign    verify    sign/s verify/s
dsa  512 bits 0.000063s 0.000040s  15846.7  24996.5
dsa 1024 bits 0.000114s 0.000091s   8753.2  10981.2
dsa 2048 bits 0.000313s 0.000288s   3194.5   3472.4

The performance gains from the assembler implementations are evident for some methods, although less distinct than in the 32-bit tests, and they indicate which methods have assembler implementations.

C.3. Debian Linux 11, OpenSSL 1.1.0l, default compile: AES-NI & SHA-NI

The default compile (see C.2.) also supports the hardware acceleration of AES and SHA using the AES-NI and SHA-NI instruction set extensions. To invoke an accelerated method, it has to be selected with the '-evp' option:

$ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -evp aes-256-cbc
Doing aes-256-cbc for 3s on 16 size blocks: 177455009 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 49476779 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 12693951 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 3195267 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 397910 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 16384 size blocks: 200785 aes-256-cbc's in 3.00s
OpenSSL 1.1.0l  10 Sep 2019
built on: reproducible build, date unspecified
options:bn(64,64) rc4(8x,int) des(int) aes(partial) idea(int) blowfish(ptr) 
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\""  -Wa,--noexecstack
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-cbc     946426.71k  1055504.62k  1083217.15k  1090651.14k  1086559.57k  1096553.81k

Note that the -evp option is required to use the accelerated implementation; without it the normal version of the method is used:

$ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed aes-256-cbc
Doing aes-256 cbc for 3s on 16 size blocks: 21434583 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 64 size blocks: 6237975 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 256 size blocks: 1633547 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 1024 size blocks: 690046 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 8192 size blocks: 86806 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 16384 size blocks: 43459 aes-256 cbc's in 3.00s
OpenSSL 1.1.0l  10 Sep 2019
built on: reproducible build, date unspecified
options:bn(64,64) rc4(8x,int) des(int) aes(partial) idea(int) blowfish(ptr) 
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\""  -Wa,--noexecstack
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256 cbc     114317.78k   133076.80k   139396.01k   235535.70k   237038.25k   237344.09k

This shows the difference between the normal assembler implementation (~237 MB/s) and the AES-NI accelerated version (~1.09 GB/s).

Doing this for each of the other methods in the comparison:

$ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -evp idea
Doing idea-cbc for 3s on 16 size blocks: 20963668 idea-cbc's in 3.00s
Doing idea-cbc for 3s on 64 size blocks: 5496543 idea-cbc's in 3.00s
Doing idea-cbc for 3s on 256 size blocks: 1388950 idea-cbc's in 3.00s
Doing idea-cbc for 3s on 1024 size blocks: 345461 idea-cbc's in 3.00s
Doing idea-cbc for 3s on 8192 size blocks: 43671 idea-cbc's in 3.00s
Doing idea-cbc for 3s on 16384 size blocks: 21834 idea-cbc's in 3.00s
OpenSSL 1.1.0l  10 Sep 2019
built on: reproducible build, date unspecified
options:bn(64,64) rc4(8x,int) des(int) aes(partial) idea(int) blowfish(ptr) 
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\""  -Wa,--noexecstack
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
idea-cbc        111806.23k   117259.58k   118523.73k   117917.35k   119250.94k   119242.75k
$ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -evp md5
Doing md5 for 3s on 16 size blocks: 15666020 md5's in 3.00s
Doing md5 for 3s on 64 size blocks: 10908270 md5's in 3.00s
Doing md5 for 3s on 256 size blocks: 5874182 md5's in 3.00s
Doing md5 for 3s on 1024 size blocks: 2048347 md5's in 3.00s
Doing md5 for 3s on 8192 size blocks: 289488 md5's in 3.00s
Doing md5 for 3s on 16384 size blocks: 146239 md5's in 3.00s
OpenSSL 1.1.0l  10 Sep 2019
built on: reproducible build, date unspecified
options:bn(64,64) rc4(8x,int) des(int) aes(partial) idea(int) blowfish(ptr) 
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\""  -Wa,--noexecstack
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
md5              83552.11k   232709.76k   501263.53k   699169.11k   790495.23k   798659.93k
$ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -evp sha1
Doing sha1 for 3s on 16 size blocks: 15591881 sha1's in 3.00s
Doing sha1 for 3s on 64 size blocks: 11524163 sha1's in 3.00s
Doing sha1 for 3s on 256 size blocks: 6873254 sha1's in 3.00s
Doing sha1 for 3s on 1024 size blocks: 2584282 sha1's in 3.00s
Doing sha1 for 3s on 8192 size blocks: 376207 sha1's in 3.00s
Doing sha1 for 3s on 16384 size blocks: 191879 sha1's in 3.00s
OpenSSL 1.1.0l  10 Sep 2019
built on: reproducible build, date unspecified
options:bn(64,64) rc4(8x,int) des(int) aes(partial) idea(int) blowfish(ptr) 
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\""  -Wa,--noexecstack
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sha1             83156.70k   245848.81k   586517.67k   882101.59k  1027295.91k  1047915.18k
$ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -evp sha256
Doing sha256 for 3s on 16 size blocks: 11599385 sha256's in 3.00s
Doing sha256 for 3s on 64 size blocks: 7423843 sha256's in 3.00s
Doing sha256 for 3s on 256 size blocks: 3742404 sha256's in 3.00s
Doing sha256 for 3s on 1024 size blocks: 1249273 sha256's in 3.00s
Doing sha256 for 3s on 8192 size blocks: 172365 sha256's in 3.00s
Doing sha256 for 3s on 16384 size blocks: 87592 sha256's in 3.00s
OpenSSL 1.1.0l  10 Sep 2019
built on: reproducible build, date unspecified
options:bn(64,64) rc4(8x,int) des(int) aes(partial) idea(int) blowfish(ptr) 
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\""  -Wa,--noexecstack
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sha256           61863.39k   158375.32k   319351.81k   426418.52k   470671.36k   478369.11k
$ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -evp sha512
Doing sha512 for 3s on 16 size blocks: 8468652 sha512's in 3.00s
Doing sha512 for 3s on 64 size blocks: 8380446 sha512's in 3.00s
Doing sha512 for 3s on 256 size blocks: 3874261 sha512's in 3.00s
Doing sha512 for 3s on 1024 size blocks: 1491164 sha512's in 3.00s
Doing sha512 for 3s on 8192 size blocks: 221874 sha512's in 3.00s
Doing sha512 for 3s on 16384 size blocks: 112352 sha512's in 3.00s
OpenSSL 1.1.0l  10 Sep 2019
built on: reproducible build, date unspecified
options:bn(64,64) rc4(8x,int) des(int) aes(partial) idea(int) blowfish(ptr) 
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\""  -Wa,--noexecstack
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sha512           45166.14k   178782.85k   330603.61k   508983.98k   605863.94k   613591.72k

The IDEA and MD5 methods show the standard assembler performance. The SHA methods show the same performance as before, which suggests the regular assembler implementations use the SHA instructions when they are available.

C.4. Debian Linux 11, OpenSSL 1.1.0l, use kernel methods (AF_ALG)

The default compile (see C.2.) includes the 'afalg' engine, which provides access to the Linux Kernel Crypto API (AF_ALG) method implementations. Details of the available kernel crypto methods can be found in /proc/crypto:

$ cat /proc/crypto | grep '^name'
name         : __ghash
name         : ghash
name         : __ghash
name         : __gcm(aes)
name         : gcm(aes)
name         : __rfc4106(gcm(aes))
name         : rfc4106(gcm(aes))
name         : __gcm(aes)
name         : __rfc4106(gcm(aes))
name         : __xts(aes)
name         : xts(aes)
name         : __ctr(aes)
name         : ctr(aes)
name         : __cbc(aes)
name         : cbc(aes)
name         : __ecb(aes)
name         : ecb(aes)
name         : __xts(aes)
name         : __ctr(aes)
name         : __cbc(aes)
name         : __ecb(aes)
name         : aes
name         : crc32c
name         : crct10dif
name         : crct10dif
name         : crc32
name         : crc32c
name         : pkcs1pad(rsa,sha256)
name         : hmac(sha256)
name         : hmac(sha1)
name         : lzo-rle
name         : lzo-rle
name         : lzo
name         : lzo
name         : zlib-deflate
name         : deflate
name         : deflate
name         : sha224
name         : sha256
name         : sha1
name         : md5
name         : ecb(cipher_null)
name         : digest_null
name         : compress_null
name         : cipher_null
name         : rsa
name         : dh
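
The listing above repeats several names, because each /proc/crypto entry describes one driver and more than one driver can register the same algorithm. A minimal sketch for reducing such a listing to unique algorithm names follows; kernel_algorithms is a hypothetical helper, and skipping names that begin with '__' rests on the assumption that these are internal helper variants not meant to be requested directly.

```python
def kernel_algorithms(proc_crypto_text, include_internal=False):
    """Return the sorted, de-duplicated 'name' values from /proc/crypto text."""
    names = set()
    for line in proc_crypto_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep or key.strip() != "name":
            continue  # skip driver, module, priority, ... lines
        name = value.strip()
        # Assumption: a leading "__" marks an internal-only variant.
        if name.startswith("__") and not include_internal:
            continue
        names.add(name)
    return sorted(names)


# A few lines in the same format as the listing above.
sample = """name         : __cbc(aes)
name         : cbc(aes)
name         : __cbc(aes)
name         : sha256
name         : sha1
"""

print(kernel_algorithms(sample))
```

On a Linux system the input could come from open("/proc/crypto").read() instead of the sample string.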

However the 'afalg' engine in this version only supports a subset:

$ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl engine afalg -c
(afalg) AFALG engine support
 [AES-128-CBC]

So this implementation only supports the kernel methods for AES-128 CBC:

$ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -engine afalg -evp aes-128-cbc
engine "afalg" set.
Doing aes-128-cbc for 3s on 16 size blocks: 1643546 aes-128-cbc's in 0.42s
Doing aes-128-cbc for 3s on 64 size blocks: 1632116 aes-128-cbc's in 0.41s
Doing aes-128-cbc for 3s on 256 size blocks: 1508914 aes-128-cbc's in 0.46s
Doing aes-128-cbc for 3s on 1024 size blocks: 1199547 aes-128-cbc's in 0.42s
Doing aes-128-cbc for 3s on 8192 size blocks: 398876 aes-128-cbc's in 0.14s
Doing aes-128-cbc for 3s on 16384 size blocks: 223060 aes-128-cbc's in 0.12s
OpenSSL 1.1.0l  10 Sep 2019
built on: reproducible build, date unspecified
options:bn(64,64) rc4(8x,int) des(int) aes(partial) idea(int) blowfish(ptr) 
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\""  -Wa,--noexecstack
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-cbc      62611.28k   254769.33k   839743.44k  2924609.83k 23339944.23k 30455125.33k

In this version of OpenSSL the -evp option is required to use the accelerated implementation; without it the normal version of the method is used:

$ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -engine afalg aes-128-cbc
engine "afalg" set.
Doing aes-128 cbc for 3s on 16 size blocks: 27509274 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 64 size blocks: 8428142 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 256 size blocks: 2272506 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 1024 size blocks: 876181 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 8192 size blocks: 111665 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 16384 size blocks: 56092 aes-128 cbc's in 3.00s
OpenSSL 1.1.0l  10 Sep 2019
built on: reproducible build, date unspecified
options:bn(64,64) rc4(8x,int) des(int) aes(partial) idea(int) blowfish(ptr) 
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\""  -Wa,--noexecstack
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128 cbc     146716.13k   179800.36k   193920.51k   299069.78k   304919.89k   306337.11k

So the difference for AES-128 CBC is very significant: ~305 MB/s normally and ~23.3 GB/s with the kernel method.
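
One caveat with the 'afalg' figure: the 'Doing ...' lines for that run report times of 0.12 s to 0.46 s for nominal 3 s runs. By default 'speed' divides by user CPU time rather than wall-clock time (the -elapsed option switches to wall-clock), so time spent inside the kernel may not be counted. The sketch below recomputes the 8,192-byte figure under the assumption that the run really lasted the nominal 3 s; that duration is an assumption, not something measured here.

```python
def throughput_k(count, block_size, seconds):
    """Throughput in 1000s of bytes per second."""
    return count * block_size / seconds / 1000.0


# From the afalg run above: 398876 operations on 8192-byte blocks "in 0.14s".
count, block_size = 398876, 8192

reported = throughput_k(count, block_size, 0.14)  # divisor used by 'speed'
wall = throughput_k(count, block_size, 3.0)       # ASSUMED 3 s wall-clock run

print(f"as reported (0.14 s CPU time): {reported:,.2f}k")
print(f"assuming 3 s wall-clock:       {wall:,.2f}k")
```

Under that assumption the kernel method works out to roughly 1.09 GB/s, close to the AES-NI result for AES-256 CBC in C.3., rather than ~23.3 GB/s. Re-running with -elapsed would be the way to confirm this.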

Sadly this isn't one of our comparison methods, and those are not in the list of 'afalg' supported methods. Still let's collect figures for them:

$ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -engine afalg -evp aes-256-cbc
engine "afalg" set.
Doing aes-256-cbc for 3s on 16 size blocks: 177999118 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 49697601 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 12686659 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 3192273 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 400024 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 16384 size blocks: 200009 aes-256-cbc's in 3.00s
OpenSSL 1.1.0l  10 Sep 2019
built on: reproducible build, date unspecified
options:bn(64,64) rc4(8x,int) des(int) aes(partial) idea(int) blowfish(ptr) 
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\""  -Wa,--noexecstack
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-cbc     949328.63k  1060215.49k  1082594.90k  1089629.18k  1092332.20k  1092315.82k
$ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -engine afalg -evp idea
engine "afalg" set.
Doing idea-cbc for 3s on 16 size blocks: 21078310 idea-cbc's in 3.00s
Doing idea-cbc for 3s on 64 size blocks: 5561659 idea-cbc's in 3.00s
Doing idea-cbc for 3s on 256 size blocks: 1405715 idea-cbc's in 3.00s
Doing idea-cbc for 3s on 1024 size blocks: 352417 idea-cbc's in 3.00s
Doing idea-cbc for 3s on 8192 size blocks: 44098 idea-cbc's in 3.00s
Doing idea-cbc for 3s on 16384 size blocks: 22062 idea-cbc's in 3.00s
OpenSSL 1.1.0l  10 Sep 2019
built on: reproducible build, date unspecified
options:bn(64,64) rc4(8x,int) des(int) aes(partial) idea(int) blowfish(ptr) 
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\""  -Wa,--noexecstack
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
idea-cbc        112417.65k   118648.73k   119954.35k   120291.67k   120416.94k   120487.94k
$ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -engine afalg -evp md5
engine "afalg" set.
Doing md5 for 3s on 16 size blocks: 15277496 md5's in 2.99s
Doing md5 for 3s on 64 size blocks: 10886163 md5's in 3.00s
Doing md5 for 3s on 256 size blocks: 5820065 md5's in 3.00s
Doing md5 for 3s on 1024 size blocks: 2034742 md5's in 3.00s
Doing md5 for 3s on 8192 size blocks: 288378 md5's in 3.00s
Doing md5 for 3s on 16384 size blocks: 145376 md5's in 3.00s
OpenSSL 1.1.0l  10 Sep 2019
built on: reproducible build, date unspecified
options:bn(64,64) rc4(8x,int) des(int) aes(partial) idea(int) blowfish(ptr) 
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\""  -Wa,--noexecstack
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
md5              81752.49k   232238.14k   496645.55k   694525.27k   787464.19k   793946.79k
$ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -engine afalg -evp sha1
engine "afalg" set.
Doing sha1 for 3s on 16 size blocks: 15013924 sha1's in 3.00s
Doing sha1 for 3s on 64 size blocks: 11614002 sha1's in 3.00s
Doing sha1 for 3s on 256 size blocks: 6838376 sha1's in 3.00s
Doing sha1 for 3s on 1024 size blocks: 2583074 sha1's in 3.00s
Doing sha1 for 3s on 8192 size blocks: 377085 sha1's in 3.00s
Doing sha1 for 3s on 16384 size blocks: 190971 sha1's in 3.00s
OpenSSL 1.1.0l  10 Sep 2019
built on: reproducible build, date unspecified
options:bn(64,64) rc4(8x,int) des(int) aes(partial) idea(int) blowfish(ptr) 
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\""  -Wa,--noexecstack
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sha1             80074.26k   247765.38k   583541.42k   881689.26k  1029693.44k  1042956.29k
$ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -engine afalg -evp sha256
engine "afalg" set.
Doing sha256 for 3s on 16 size blocks: 11335573 sha256's in 3.00s
Doing sha256 for 3s on 64 size blocks: 7378903 sha256's in 3.00s
Doing sha256 for 3s on 256 size blocks: 3674405 sha256's in 3.00s
Doing sha256 for 3s on 1024 size blocks: 1245088 sha256's in 3.00s
Doing sha256 for 3s on 8192 size blocks: 174240 sha256's in 3.00s
Doing sha256 for 3s on 16384 size blocks: 87867 sha256's in 3.00s
OpenSSL 1.1.0l  10 Sep 2019
built on: reproducible build, date unspecified
options:bn(64,64) rc4(8x,int) des(int) aes(partial) idea(int) blowfish(ptr) 
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\""  -Wa,--noexecstack
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sha256           60456.39k   157416.60k   313549.23k   424990.04k   475791.36k   479870.98k
$ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -engine afalg -evp sha512
engine "afalg" set.
Doing sha512 for 3s on 16 size blocks: 8480500 sha512's in 3.00s
Doing sha512 for 3s on 64 size blocks: 8412202 sha512's in 3.00s
Doing sha512 for 3s on 256 size blocks: 3837396 sha512's in 3.00s
Doing sha512 for 3s on 1024 size blocks: 1487457 sha512's in 3.00s
Doing sha512 for 3s on 8192 size blocks: 220151 sha512's in 3.00s
Doing sha512 for 3s on 16384 size blocks: 110360 sha512's in 3.00s
OpenSSL 1.1.0l  10 Sep 2019
built on: reproducible build, date unspecified
options:bn(64,64) rc4(8x,int) des(int) aes(partial) idea(int) blowfish(ptr) 
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\""  -Wa,--noexecstack
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sha512           45229.33k   179460.31k   327457.79k   507718.66k   601159.00k   602712.75k

So, as expected, no change here.
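
As a cross-check on how to read these transcripts, each figure in the final table can be recomputed from the matching "Doing ..." line: operations completed, times block size, divided by elapsed seconds, expressed in 1000s of bytes. A minimal sketch follows; throughput_k is a hypothetical helper name, not part of OpenSSL.

```python
def throughput_k(operations, block_size, seconds):
    """Throughput in 1000s of bytes per second, the unit 'openssl speed' prints."""
    return operations * block_size / seconds / 1000.0

# "Doing aes-256-cbc for 3s on 8192 size blocks: 400024 aes-256-cbc's in 3.00s"
aes_8192 = throughput_k(400024, 8192, 3.00)

# "Doing md5 for 3s on 16 size blocks: 15277496 md5's in 2.99s"
md5_16 = throughput_k(15277496, 16, 2.99)

print(f"aes-256-cbc, 8192 bytes: {aes_8192:.2f}k")
print(f"md5, 16 bytes:           {md5_16:.2f}k")
```

The two results agree with the 1092332.20k and 81752.49k entries in the tables above.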

D. Notes for Debian Linux 11 and OpenSSL 3.0.2 on AMD Ryzen

The AMD Ryzen family of processors supports the AES-NI and SHA-NI instruction set extensions first proposed by Intel. Support for AES-NI was added in OpenSSL 1.0.1 (2012), and for SHA-NI in OpenSSL 1.0.2 (2015).

The latest release version of OpenSSL, 3.0.2, contains a lot of changes but retains the command-line interface used for testing.

OpenSSL 3.0.2 with Debian Linux 11 for x86_64 on AMD Ryzen 5 3600
Method       AES-256 CBC        IDEA CBC             MD5           SHA-1         SHA-256         SHA-512
no-asm       210,668.20k     117,587.97k     740,698.79k     832,288.09k     310,059.01k     559,205.03k
asm        1,097,910.95k     119,048.87k     789,848.06k   1,036,997.97k     473,503.06k     599,274.84k
Kernel    37,086,822.40k     120,220.33k     793,990.49k   1,041,334.27k     474,685.44k     598,278.14k

Notes:

  • OpenSSL 3.x considers IDEA a "legacy" method. To include "legacy" methods in "speed" runs use the -provider legacy -provider default options.
  • For the purposes of these tests I am sacrificing some performance by using a VirtualBox VM running Debian Linux 11 on a MS Windows 10 host. While this introduces some overhead, and means the virtual system isn't as capable as the host, it does give access to the instructions, and since the "speed" benchmarks are single threaded the performance should be indicative.
  • Running from the source/build directories used a command-line like: LD_LIBRARY_PATH=`pwd` apps/openssl speed -provider-path ./providers/ -provider legacy -provider default
  • OpenSSL 3.0.2 assembler implementations of AES and SHA methods use the AES-NI and SHA Ext. instruction set extensions if they are available
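
The summary table above is built from the 8,192 byte column of each "speed" run. A minimal sketch of pulling that column out of saved output is shown below. parse_speed_table is a hypothetical helper name, and the sample text is a two-row excerpt of the D.1 results rather than a full run.

```python
import re

def parse_speed_table(text):
    """Parse the throughput table printed by 'openssl speed'.

    Returns {method: {block_size: thousands_of_bytes_per_second}}.
    Lines without one 'NNN.NNk' figure per block size are ignored.
    """
    sizes = []
    results = {}
    for line in text.splitlines():
        if line.startswith("type"):
            # Header row: "type   16 bytes   64 bytes ..."
            sizes = [int(n) for n in re.findall(r"(\d+) bytes", line)]
            continue
        values = re.findall(r"(\d+\.\d+)k", line)
        if sizes and len(values) == len(sizes):
            # The method name is everything before the first figure.
            name = line[: line.index(values[0])].strip()
            results[name] = dict(zip(sizes, map(float, values)))
    return results

sample = """\
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
md5              75762.36k   220376.19k   474277.03k   658416.35k   740698.79k   746968.41k
aes-256-cbc     197046.28k   211595.69k   210573.14k   211770.03k   210668.20k   212779.72k
"""

table = parse_speed_table(sample)
for method in ("aes-256-cbc", "md5"):
    print(method, table[method][8192])
```

The RSA, DSA and elliptic curve sections of the output carry no "k" figures, so this sketch skips them.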

D.1. Debian Linux 11, OpenSSL 3.0.2, no assembler compile

Downloading the source distribution from the OpenSSL site and building with the no-assembler option ($ ./config no-asm) gives a build that uses the portable C implementations of the methods.

Running the OpenSSL speed test gives ($ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -provider-path ./providers/ -provider legacy -provider default):

version: 3.0.2
built on: Tue May  3 10:21:48 2022 UTC
options: bn(64,64)
compiler: gcc -fPIC -pthread -m64 -Wall -O3 -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_BUILDING_OPENSSL -DNDEBUG
CPUINFO: N/A
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
mdc2             20562.45k    23065.00k    24045.06k    24235.01k    24171.86k    24053.45k
md4              93440.70k   292054.72k   713656.32k  1107414.36k  1316017.49k  1335416.15k
md5              75762.36k   220376.19k   474277.03k   658416.35k   740698.79k   746968.41k
sha1             74135.27k   207818.20k   484524.97k   713803.09k   832288.09k   838565.89k
rmd160           48934.47k   123549.25k   228486.49k   294425.26k   320845.14k   322808.49k
sha256           44194.68k   109362.69k   215166.29k   281563.14k   310059.01k   311987.29k
sha512           39869.37k   156745.30k   285906.52k   459415.21k   559205.03k   565084.16k
whirlpool        28946.23k    66454.40k   117860.44k   150848.17k   162491.05k   162441.10k
hmac(md5)        57372.63k   178631.87k   415191.55k   626126.85k   741441.54k   748607.28k
des-cbc          89745.45k    94976.34k    96349.27k    96561.15k    98486.95k    98407.77k
des-ede3         35742.68k    36740.82k    36624.04k    36697.43k    37090.65k    37300.91k
rc4             404351.10k   399215.17k   401447.08k   403255.98k   399493.80k   395624.45k
idea-cbc        111618.99k   117232.90k   117311.49k   114899.29k   117587.97k   117522.43k
seed-cbc        104352.96k   109517.72k   110598.91k   111045.63k   110875.99k   109679.96k
rc2-cbc          55675.72k    57482.73k    58371.50k    57400.66k    57888.32k    58212.35k
blowfish        146209.32k   157064.23k   162900.39k   163886.08k   163973.80k   163998.38k
cast-cbc        129929.11k   133614.19k   137071.70k   136841.56k   136303.96k   137059.51k
aes-128-cbc     259058.78k   277107.73k   277100.80k   281938.60k   281853.95k   283421.43k
aes-192-cbc     225402.85k   239174.83k   239161.77k   238028.12k   243007.49k   242679.81k
aes-256-cbc     197046.28k   211595.69k   210573.14k   211770.03k   210668.20k   212779.72k
camellia-128-cbc   189300.94k   196735.68k   195763.88k   195276.46k   194786.65k   197225.13k
camellia-192-cbc   148085.38k   151104.77k   152280.66k   151895.38k   151928.83k   150574.42k
camellia-256-cbc   146215.45k   149354.67k   152370.18k   152790.36k   152002.56k   151333.55k
ghash           289559.97k   327118.70k   335364.52k   347649.37k   349069.31k   347755.86k
rand             15451.66k    50478.65k   117848.60k   175663.51k   202408.16k   207183.63k
                  sign    verify    sign/s verify/s
rsa  512 bits 0.000106s 0.000006s   9438.1 180236.3
rsa 1024 bits 0.000547s 0.000016s   1827.7  60764.7
rsa 2048 bits 0.003244s 0.000055s    308.3  18215.1
rsa 3072 bits 0.008071s 0.000121s    123.9   8230.5
rsa 4096 bits 0.019841s 0.000200s     50.4   4998.0
rsa 7680 bits 0.101000s 0.000784s      9.9   1275.7
rsa 15360 bits 0.732143s 0.003008s      1.4    332.5
                  sign    verify    sign/s verify/s
dsa  512 bits 0.000134s 0.000076s   7467.2  13183.0
dsa 1024 bits 0.000332s 0.000239s   3013.1   4185.7
dsa 2048 bits 0.000970s 0.000904s   1031.3   1106.6
                              sign    verify    sign/s verify/s
 160 bits ecdsa (secp160r1)   0.0004s   0.0004s   2317.2   2732.4
 192 bits ecdsa (nistp192)   0.0004s   0.0003s   2315.7   2866.1
 224 bits ecdsa (nistp224)   0.0006s   0.0005s   1539.4   1933.3
 256 bits ecdsa (nistp256)   0.0007s   0.0006s   1338.4   1685.6
 384 bits ecdsa (nistp384)   0.0015s   0.0011s    650.5    873.3
 521 bits ecdsa (nistp521)   0.0030s   0.0020s    331.5    490.3
 163 bits ecdsa (nistk163)   0.0004s   0.0007s   2770.9   1426.6
 233 bits ecdsa (nistk233)   0.0005s   0.0010s   1874.8    969.2
 283 bits ecdsa (nistk283)   0.0011s   0.0022s    885.9    454.2
 409 bits ecdsa (nistk409)   0.0024s   0.0046s    413.6    215.4
 571 bits ecdsa (nistk571)   0.0049s   0.0096s    202.4    104.6
 163 bits ecdsa (nistb163)   0.0004s   0.0007s   2657.6   1353.5
 233 bits ecdsa (nistb233)   0.0006s   0.0011s   1802.5    937.5
 283 bits ecdsa (nistb283)   0.0012s   0.0024s    818.3    419.7
 409 bits ecdsa (nistb409)   0.0026s   0.0052s    378.6    191.1
 571 bits ecdsa (nistb571)   0.0056s   0.0107s    178.2     93.2
 256 bits ecdsa (brainpoolP256r1)   0.0008s   0.0008s   1180.4   1314.4
 256 bits ecdsa (brainpoolP256t1)   0.0008s   0.0007s   1182.8   1447.8
 384 bits ecdsa (brainpoolP384r1)   0.0019s   0.0016s    516.2    623.5
 384 bits ecdsa (brainpoolP384t1)   0.0019s   0.0015s    532.5    683.6
 512 bits ecdsa (brainpoolP512r1)   0.0046s   0.0039s    216.9    259.7
 512 bits ecdsa (brainpoolP512t1)   0.0046s   0.0034s    217.0    290.2
                              op      op/s
 160 bits ecdh (secp160r1)   0.0004s   2433.5
 192 bits ecdh (nistp192)   0.0004s   2505.3
 224 bits ecdh (nistp224)   0.0006s   1650.3
 256 bits ecdh (nistp256)   0.0007s   1424.5
 384 bits ecdh (nistp384)   0.0014s    702.2
 521 bits ecdh (nistp521)   0.0028s    362.3
 163 bits ecdh (nistk163)   0.0003s   2945.9
 233 bits ecdh (nistk233)   0.0005s   2022.0
 283 bits ecdh (nistk283)   0.0011s    933.7
 409 bits ecdh (nistk409)   0.0023s    443.2
 571 bits ecdh (nistk571)   0.0047s    211.7
 163 bits ecdh (nistb163)   0.0004s   2817.3
 233 bits ecdh (nistb233)   0.0005s   1938.5
 283 bits ecdh (nistb283)   0.0012s    863.2
 409 bits ecdh (nistb409)   0.0025s    394.7
 571 bits ecdh (nistb571)   0.0052s    191.7
 256 bits ecdh (brainpoolP256r1)   0.0008s   1248.1
 256 bits ecdh (brainpoolP256t1)   0.0008s   1252.6
 384 bits ecdh (brainpoolP384r1)   0.0019s    538.5
 384 bits ecdh (brainpoolP384t1)   0.0018s    558.9
 512 bits ecdh (brainpoolP512r1)   0.0044s    226.8
 512 bits ecdh (brainpoolP512t1)   0.0043s    231.4
 253 bits ecdh (X25519)   0.0000s  24870.5
 448 bits ecdh (X448)   0.0002s   6318.3
                              sign    verify    sign/s verify/s
 253 bits EdDSA (Ed25519)   0.0000s   0.0001s  33423.2   9964.8
 456 bits EdDSA (Ed448)   0.0004s   0.0002s   2382.6   4835.4
                              sign    verify    sign/s verify/s
 256 bits SM2 (CurveSM2)   0.0009s   0.0007s   1075.4   1425.0
                       op     op/s
2048 bits ffdh   0.0098s    101.7
3072 bits ffdh   0.0268s     37.4
4096 bits ffdh   0.0633s     15.8
6144 bits ffdh   0.1965s      5.1
8192 bits ffdh   0.4333s      2.3

Okay so this establishes a baseline for performance of the methods on this platform.

D.2. Debian Linux 11, OpenSSL 3.0.2, default compile

Downloading the source distribution from the OpenSSL site and building with the default options gives a build with the assembler methods enabled.

Running the OpenSSL speed test gives ($ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -provider-path ./providers/ -provider legacy -provider default):

version: 3.0.2
built on: Tue May  3 11:08:16 2022 UTC
options: bn(64,64)
compiler: gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -O3 -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_BUILDING_OPENSSL -DNDEBUG
CPUINFO: OPENSSL_ia32cap=0xdef82203078bffff:0x840021
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
mdc2             19088.32k    21383.15k    22120.11k    22114.65k    22066.52k    22336.85k
md4              86959.79k   274045.21k   686398.72k  1086177.28k  1302282.24k  1318507.86k
md5              74624.08k   221166.83k   481884.16k   689194.67k   789848.06k   795716.27k
sha1             68835.40k   211919.64k   530604.20k   850401.62k  1036997.97k  1052246.02k
rmd160           45938.43k   118583.77k   225689.86k   292971.52k   320547.50k   320629.42k
sha256           56456.28k   151513.83k   312126.29k   423563.26k   473503.06k   468751.70k
sha512           42409.53k   173236.01k   326201.60k   500586.84k   599274.84k   607879.17k
whirlpool        34184.97k    83005.12k   152824.75k   192139.95k   207866.54k   209338.37k
hmac(md5)        54607.51k   175414.14k   427773.27k   652594.86k   774367.91k   792139.09k
des-cbc          86150.46k    90269.29k    90673.16k    91110.06k    87848.28k    90357.76k
des-ede3         33675.40k    34008.00k    32592.38k    34458.28k    34021.38k    34155.18k
rc4             461580.29k   545079.64k   521484.71k   501519.02k   496358.74k   499171.33k
idea-cbc        110334.51k   115313.34k   117287.51k   117309.44k   119048.87k   119138.99k
seed-cbc        104858.23k   108716.29k   110551.72k   111048.02k   109797.38k   110302.55k
rc2-cbc          55981.69k    56614.93k    56955.99k    57540.61k    57696.26k    57333.08k
blowfish        146308.27k   159094.66k   161397.33k   163182.59k   163012.61k   164080.30k
cast-cbc        131535.55k   136160.68k   137896.70k   138655.06k   138483.03k   138199.04k
aes-128-cbc    1232361.96k  1423773.50k  1448355.75k  1462991.53k  1474143.60k  1484817.81k
aes-192-cbc    1055904.59k  1199071.38k  1250383.96k  1261934.25k  1267545.43k  1262824.11k
aes-256-cbc     956115.02k  1054160.70k  1081789.53k  1089324.03k  1097910.95k  1084456.96k
camellia-128-cbc   164696.63k   210310.44k   226814.21k   227034.79k   230888.79k   231669.76k
camellia-192-cbc   132917.51k   162760.79k   170745.51k   174113.45k   175030.27k   171911.85k
camellia-256-cbc   130447.28k   162488.66k   169590.44k   169819.14k   168951.81k   174331.22k
ghash          1015241.07k  3052006.70k  6412202.58k  8288329.39k  8981708.80k  9173494.44k
rand             23134.64k    90360.31k   348861.17k  1238087.79k  4722876.39k  5988869.69k
                  sign    verify    sign/s verify/s
rsa  512 bits 0.000033s 0.000002s  30318.4 440171.6
rsa 1024 bits 0.000098s 0.000006s  10207.1 160257.8
rsa 2048 bits 0.000510s 0.000022s   1962.1  45548.0
rsa 3072 bits 0.002315s 0.000047s    432.0  21473.4
rsa 4096 bits 0.005269s 0.000083s    189.8  12105.3
rsa 7680 bits 0.046055s 0.000278s     21.7   3603.5
rsa 15360 bits 0.253500s 0.001109s      3.9    902.0
                  sign    verify    sign/s verify/s
dsa  512 bits 0.000051s 0.000032s  19656.4  30843.9
dsa 1024 bits 0.000103s 0.000085s   9723.2  11707.9
dsa 2048 bits 0.000301s 0.000278s   3325.2   3591.6
                              sign    verify    sign/s verify/s
 160 bits ecdsa (secp160r1)   0.0002s   0.0002s   6005.0   6014.6
 192 bits ecdsa (nistp192)   0.0002s   0.0002s   4834.5   5073.3
 224 bits ecdsa (nistp224)   0.0003s   0.0003s   3353.4   3639.4
 256 bits ecdsa (nistp256)   0.0000s   0.0001s  50187.2  16141.0
 384 bits ecdsa (nistp384)   0.0008s   0.0007s   1300.1   1522.5
 521 bits ecdsa (nistp521)   0.0018s   0.0015s    541.4    684.8
 163 bits ecdsa (nistk163)   0.0002s   0.0003s   5822.0   2960.9
 233 bits ecdsa (nistk233)   0.0002s   0.0004s   4482.1   2309.0
 283 bits ecdsa (nistk283)   0.0004s   0.0008s   2608.5   1329.3
 409 bits ecdsa (nistk409)   0.0006s   0.0013s   1545.2    772.5
 571 bits ecdsa (nistk571)   0.0014s   0.0028s    694.7    360.9
 163 bits ecdsa (nistb163)   0.0002s   0.0003s   5628.4   2857.7
 233 bits ecdsa (nistb233)   0.0002s   0.0005s   4322.1   2218.3
 283 bits ecdsa (nistb283)   0.0004s   0.0008s   2497.4   1250.1
 409 bits ecdsa (nistb409)   0.0007s   0.0013s   1461.3    757.9
 571 bits ecdsa (nistb571)   0.0015s   0.0029s    656.3    344.4
 256 bits ecdsa (brainpoolP256r1)   0.0003s   0.0003s   3025.4   3109.1
 256 bits ecdsa (brainpoolP256t1)   0.0003s   0.0003s   3001.6   3316.0
 384 bits ecdsa (brainpoolP384r1)   0.0008s   0.0007s   1288.8   1469.8
 384 bits ecdsa (brainpoolP384t1)   0.0008s   0.0006s   1296.9   1590.2
 512 bits ecdsa (brainpoolP512r1)   0.0013s   0.0011s    755.6    890.6
 512 bits ecdsa (brainpoolP512t1)   0.0013s   0.0011s    760.4    951.9
                              op      op/s
 160 bits ecdh (secp160r1)   0.0002s   6464.6
 192 bits ecdh (nistp192)   0.0002s   5105.8
 224 bits ecdh (nistp224)   0.0003s   3564.4
 256 bits ecdh (nistp256)   0.0000s  21233.2
 384 bits ecdh (nistp384)   0.0007s   1372.1
 521 bits ecdh (nistp521)   0.0017s    576.3
 163 bits ecdh (nistk163)   0.0002s   6098.0
 233 bits ecdh (nistk233)   0.0002s   4869.1
 283 bits ecdh (nistk283)   0.0004s   2794.8
 409 bits ecdh (nistk409)   0.0006s   1673.1
 571 bits ecdh (nistk571)   0.0013s    756.7
 163 bits ecdh (nistb163)   0.0002s   5946.4
 233 bits ecdh (nistb233)   0.0002s   4702.5
 283 bits ecdh (nistb283)   0.0004s   2652.9
 409 bits ecdh (nistb409)   0.0006s   1588.7
 571 bits ecdh (nistb571)   0.0014s    713.5
 256 bits ecdh (brainpoolP256r1)   0.0003s   3166.2
 256 bits ecdh (brainpoolP256t1)   0.0003s   3147.1
 384 bits ecdh (brainpoolP384r1)   0.0007s   1338.4
 384 bits ecdh (brainpoolP384t1)   0.0007s   1375.6
 512 bits ecdh (brainpoolP512r1)   0.0013s    790.6
 512 bits ecdh (brainpoolP512t1)   0.0013s    792.7
 253 bits ecdh (X25519)   0.0000s  29235.1
 448 bits ecdh (X448)   0.0002s   6306.2
                              sign    verify    sign/s verify/s
 253 bits EdDSA (Ed25519)   0.0000s   0.0001s  31993.6  10019.0
 456 bits EdDSA (Ed448)   0.0002s   0.0002s   5386.3   4842.7
                              sign    verify    sign/s verify/s
 256 bits SM2 (CurveSM2)   0.0003s   0.0003s   3003.2   3320.2
                       op     op/s
2048 bits ffdh   0.0025s    394.7
3072 bits ffdh   0.0084s    119.4
4096 bits ffdh   0.0194s     51.5
6144 bits ffdh   0.0646s     15.5
8192 bits ffdh   0.1529s      6.5

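With the assembler in place, the methods with assembler implementations show improved performance: for 8,192 byte blocks AES-256 CBC rises from 210,668.20k to 1,097,910.95k (about 5.2 times) and SHA-256 from 310,059.01k to 473,503.06k (about 1.5 times), while methods such as IDEA CBC are essentially unchanged.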
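
The CPUINFO line in the output above gives the capability vector that OpenSSL detected, which indicates which instruction set extensions the assembler code can use. The sketch below decodes a few bits of it. The bit positions are taken from my reading of the OPENSSL_ia32cap manual page and should be treated as an assumption to check against the page for the OpenSSL version in use; decode_ia32cap and FEATURE_BITS are hypothetical names.

```python
# Bit positions within the capability vector (assumed from the
# OPENSSL_ia32cap manual page; verify for your OpenSSL version).
FEATURE_BITS = {
    "PCLMULQDQ": 33,      # first 64-bit word
    "AES-NI": 57,
    "AVX": 60,
    "AVX2": 64 + 5,       # second 64-bit word
    "SHA": 64 + 29,
}

def decode_ia32cap(value):
    """Decode a 'word0:word1' hex string as printed on the CPUINFO line."""
    vector = 0
    for index, part in enumerate(value.split(":")):
        # Each colon-separated part is one 64-bit word, lowest word first.
        vector |= int(part, 16) << (64 * index)
    return {name: bool((vector >> bit) & 1) for name, bit in FEATURE_BITS.items()}

flags = decode_ia32cap("0xdef82203078bffff:0x840021")
for name, present in flags.items():
    print(f"{name:10s} {'yes' if present else 'no'}")
```

With these bit positions the value reports PCLMULQDQ, AES-NI, AVX and AVX2 as present and SHA as absent. If that reading is right, the VirtualBox guest may not have been exposing the SHA extensions, which would be worth confirming on the host before drawing conclusions from the SHA figures.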

D.3. Debian Linux 11, OpenSSL 3.0.2, default compile, kernel methods

The default compile (see D.2.) includes the 'afalg' engine, which provides access to the Linux Kernel Crypto API (AF_ALG) method implementations. Details of the available kernel crypto methods can be found in /proc/crypto:

$ cat /proc/crypto | grep '^name'
name         : __ghash
name         : ghash
name         : __ghash
name         : __gcm(aes)
name         : gcm(aes)
name         : __rfc4106(gcm(aes))
name         : rfc4106(gcm(aes))
name         : __gcm(aes)
name         : __rfc4106(gcm(aes))
name         : __xts(aes)
name         : xts(aes)
name         : __ctr(aes)
name         : ctr(aes)
name         : __cbc(aes)
name         : cbc(aes)
name         : __ecb(aes)
name         : ecb(aes)
name         : __xts(aes)
name         : __ctr(aes)
name         : __cbc(aes)
name         : __ecb(aes)
name         : aes
name         : crc32c
name         : crct10dif
name         : crct10dif
name         : crc32
name         : crc32c
name         : pkcs1pad(rsa,sha256)
name         : hmac(sha256)
name         : hmac(sha1)
name         : lzo-rle
name         : lzo-rle
name         : lzo
name         : lzo
name         : zlib-deflate
name         : deflate
name         : deflate
name         : sha224
name         : sha256
name         : sha1
name         : md5
name         : ecb(cipher_null)
name         : digest_null
name         : compress_null
name         : cipher_null
name         : rsa
name         : dh
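
The listing above repeats names, because several implementations can register the same algorithm, and it includes entries prefixed with "__", which appear to be internal helper implementations. A minimal sketch for reducing such a listing to the distinct public names follows; public_algorithms is a hypothetical helper, and the sample is a short excerpt in the same format.

```python
def public_algorithms(proc_crypto_text):
    """Return sorted, de-duplicated 'name' values, skipping '__' entries."""
    names = set()
    for line in proc_crypto_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "name":
            name = value.strip()
            if not name.startswith("__"):
                names.add(name)
    return sorted(names)

sample = """\
name         : __cbc(aes)
name         : cbc(aes)
name         : __cbc(aes)
name         : sha256
name         : sha1
name         : md5
"""

print(public_algorithms(sample))
```

On a Linux system the same function can be given the contents of /proc/crypto, for example open("/proc/crypto").read().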

Note that on this system the kernel has loaded modules with various crypto functions:

$ lsmod | grep -E '(alg)|(aes)|(sha)|(crypt)'
algif_skcipher         16384  0
af_alg                 32768  1 algif_skcipher
aesni_intel           368640  0
libaes                 16384  1 aesni_intel
crypto_simd            16384  1 aesni_intel
cryptd                 24576  2 crypto_simd,ghash_clmulni_intel
glue_helper            16384  1 aesni_intel

The 'afalg' engine in OpenSSL supports a subset of the available methods:

$ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl engine afalg -c
(afalg) AFALG engine support
 [AES-128-CBC, AES-192-CBC, AES-256-CBC]
80327193287F0000:error:1280006A:DSO support routines:dlfcn_bind_func:could not bind to the requested symbol name:crypto/dso/dso_dlfcn.c:188:symname(EVP_PKEY_base_id): /home/hamish/src/openssl-3.0.2-asm/engines/afalg.so: undefined symbol: EVP_PKEY_base_id
80327193287F0000:error:1280006A:DSO support routines:DSO_bind_func:could not bind to the requested symbol name:crypto/dso/dso_lib.c:176:

The errors here are likely related to the move to EVP methods and the deprecation of the OpenSSL engine APIs.

Unlike our test with OpenSSL 1.1.0l, this version of the 'afalg' engine supports the AES-256 CBC method used in the comparisons. Also OpenSSL 3.0.2 uses the EVP methods by default in the openssl program, so a single run can give results for all the methods ($ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -provider-path ./providers/ -provider legacy -provider default -engine afalg):

version: 3.0.2
built on: Tue May  3 11:08:16 2022 UTC
options: bn(64,64)
compiler: gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -O3 -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_BUILDING_OPENSSL -DNDEBUG
CPUINFO: OPENSSL_ia32cap=0xdef82203078bffff:0x840021
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
mdc2             19213.87k    21249.32k    22246.57k    22619.14k    22702.76k    22620.84k
md4              87693.94k   279738.60k   692753.07k  1097476.78k  1334378.50k  1349844.99k
md5              74869.91k   220405.33k   484849.15k   693844.31k   793990.49k   801641.81k
sha1             68785.21k   211651.69k   531813.89k   856263.68k  1041334.27k  1056615.08k
rmd160           46334.88k   119616.47k   229049.86k   295681.37k   323463.85k   325659.31k
sha256           55986.87k   150935.30k   311471.53k   424374.61k   474685.44k   475125.08k
sha512           42818.01k   171608.66k   322486.86k   500541.10k   598278.14k   611521.88k
whirlpool        34328.44k    83261.67k   153432.32k   194215.59k   211009.54k   211763.20k
hmac(md5)        55910.06k   181670.91k   434200.75k   666086.74k   788709.38k   799091.37k
des-cbc          86662.59k    91052.35k    91887.19k    91826.18k    91171.50k    92558.68k
des-ede3         33983.00k    34592.13k    34692.34k    34864.81k    34843.31k    34242.56k
rc4             463650.38k   572220.84k   533739.09k   514668.54k   503463.94k   500553.05k
idea-cbc        111304.33k   116668.42k   119646.46k   120101.21k   120220.33k   120351.40k
seed-cbc        106637.95k   109966.85k   110053.55k   111550.81k   112091.14k   112039.25k
rc2-cbc          57108.33k    58368.49k    59152.90k    59224.75k    59299.16k    59271.85k
blowfish        150639.86k   162052.50k   165114.37k   165712.90k   165997.23k   165199.87k
cast-cbc        132735.63k   137822.12k   139187.20k   140039.85k   139911.17k   139973.97k
aes-128-cbc      58259.80k   241961.82k   996687.43k  5113851.73k 27540343.47k 46165811.20k
aes-192-cbc      63910.40k   210796.15k  1092007.50k  3762384.10k 41412081.37k 45620311.77k
aes-256-cbc      59532.76k   257748.16k  1073952.18k  4130459.50k 37086822.40k 56257085.44k
camellia-128-cbc   169613.70k   214561.05k   230152.19k   233813.33k   234029.06k   233411.93k
camellia-192-cbc   137169.80k   164018.60k   171968.77k   175241.56k   176095.23k   176237.23k
camellia-256-cbc   138009.25k   164407.06k   172199.42k   175273.30k   176316.42k   174609.75k
ghash          1024894.92k  3138248.59k  6438084.44k  8391383.38k  9201382.74k  9217447.25k
rand             22825.32k    89219.80k   341466.76k  1244163.94k  4706436.84k  5927709.06k
                  sign    verify    sign/s verify/s
rsa  512 bits 0.000032s 0.000002s  31006.3 453078.5
rsa 1024 bits 0.000096s 0.000006s  10417.9 161517.6
rsa 2048 bits 0.000498s 0.000021s   2009.9  47395.1
rsa 3072 bits 0.002284s 0.000046s    437.8  21741.9
rsa 4096 bits 0.005227s 0.000079s    191.3  12703.3
rsa 7680 bits 0.045500s 0.000272s     22.0   3679.6
rsa 15360 bits 0.251750s 0.001074s      4.0    930.9
                  sign    verify    sign/s verify/s
dsa  512 bits 0.000051s 0.000032s  19621.3  30836.6
dsa 1024 bits 0.000103s 0.000082s   9733.5  12182.3
dsa 2048 bits 0.000301s 0.000275s   3325.7   3634.2
                              sign    verify    sign/s verify/s
 160 bits ecdsa (secp160r1)   0.0002s   0.0002s   6099.3   6111.0
 192 bits ecdsa (nistp192)   0.0002s   0.0002s   4927.0   5071.7
 224 bits ecdsa (nistp224)   0.0003s   0.0003s   3380.9   3694.4
 256 bits ecdsa (nistp256)   0.0000s   0.0001s  49883.8  16261.6
 384 bits ecdsa (nistp384)   0.0008s   0.0006s   1307.9   1570.4
 521 bits ecdsa (nistp521)   0.0018s   0.0014s    543.0    703.4
 163 bits ecdsa (nistk163)   0.0002s   0.0003s   5835.4   2960.4
 233 bits ecdsa (nistk233)   0.0002s   0.0004s   4479.9   2301.3
 283 bits ecdsa (nistk283)   0.0004s   0.0008s   2612.7   1324.9
 409 bits ecdsa (nistk409)   0.0007s   0.0013s   1534.9    798.4
 571 bits ecdsa (nistk571)   0.0014s   0.0027s    718.1    368.4
 163 bits ecdsa (nistb163)   0.0002s   0.0003s   5626.4   2866.1
 233 bits ecdsa (nistb233)   0.0002s   0.0004s   4399.1   2245.8
 283 bits ecdsa (nistb283)   0.0004s   0.0008s   2478.7   1269.1
 409 bits ecdsa (nistb409)   0.0007s   0.0013s   1480.2    758.4
 571 bits ecdsa (nistb571)   0.0015s   0.0029s    666.0    343.7
 256 bits ecdsa (brainpoolP256r1)   0.0003s   0.0003s   3039.1   3138.8
 256 bits ecdsa (brainpoolP256t1)   0.0003s   0.0003s   2996.7   3245.1
 384 bits ecdsa (brainpoolP384r1)   0.0008s   0.0007s   1303.4   1485.5
 384 bits ecdsa (brainpoolP384t1)   0.0008s   0.0006s   1322.8   1577.9
 512 bits ecdsa (brainpoolP512r1)   0.0013s   0.0011s    756.6    891.3
 512 bits ecdsa (brainpoolP512t1)   0.0013s   0.0011s    766.0    950.8
                              op      op/s
 160 bits ecdh (secp160r1)   0.0002s   6369.8
 192 bits ecdh (nistp192)   0.0002s   5122.3
 224 bits ecdh (nistp224)   0.0003s   3562.5
 256 bits ecdh (nistp256)   0.0000s  21288.6
 384 bits ecdh (nistp384)   0.0007s   1377.7
 521 bits ecdh (nistp521)   0.0017s    578.9
 163 bits ecdh (nistk163)   0.0002s   6127.8
 233 bits ecdh (nistk233)   0.0002s   4843.5
 283 bits ecdh (nistk283)   0.0004s   2792.3
 409 bits ecdh (nistk409)   0.0006s   1673.8
 571 bits ecdh (nistk571)   0.0013s    767.4
 163 bits ecdh (nistb163)   0.0002s   5935.1
 233 bits ecdh (nistb233)   0.0002s   4694.9
 283 bits ecdh (nistb283)   0.0004s   2656.3
 409 bits ecdh (nistb409)   0.0006s   1564.5
 571 bits ecdh (nistb571)   0.0014s    713.8
 256 bits ecdh (brainpoolP256r1)   0.0003s   3217.3
 256 bits ecdh (brainpoolP256t1)   0.0003s   3215.5
 384 bits ecdh (brainpoolP384r1)   0.0007s   1372.4
 384 bits ecdh (brainpoolP384t1)   0.0007s   1394.3
 512 bits ecdh (brainpoolP512r1)   0.0013s    791.7
 512 bits ecdh (brainpoolP512t1)   0.0013s    797.6
 253 bits ecdh (X25519)   0.0000s  29052.1
 448 bits ecdh (X448)   0.0002s   6299.4
                              sign    verify    sign/s verify/s
 253 bits EdDSA (Ed25519)   0.0000s   0.0001s  32158.7  10184.2
 456 bits EdDSA (Ed448)   0.0002s   0.0002s   5142.2   4923.4
                              sign    verify    sign/s verify/s
 256 bits SM2 (CurveSM2)   0.0003s   0.0003s   2981.1   3336.0
                       op     op/s
2048 bits ffdh   0.0026s    392.1
3072 bits ffdh   0.0082s    121.4
4096 bits ffdh   0.0193s     51.9
6144 bits ffdh   0.0641s     15.6
8192 bits ffdh   0.1520s      6.6

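Here the kernel's use of acceleration is evident in the AES results: for 8,192 byte blocks AES-256 CBC reports 37,086,822.40k against 1,097,910.95k for the assembler build, roughly 34 times the throughput. The other comparison methods are unchanged, as 'afalg' only provides the AES CBC methods.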
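
The reported gain depends strongly on block size. The sketch below compares the aes-256-cbc rows from D.2 (assembler) and D.3 ('afalg' engine) column by column; the figures are copied from those tables, and the variable and function names are illustrative only.

```python
BLOCK_SIZES = [16, 64, 256, 1024, 8192, 16384]

# aes-256-cbc rows in 1000s of bytes per second, copied from D.2 and D.3.
ASM = [956115.02, 1054160.70, 1081789.53, 1089324.03, 1097910.95, 1084456.96]
AFALG = [59532.76, 257748.16, 1073952.18, 4130459.50, 37086822.40, 56257085.44]

def ratios(new, base):
    """Element-wise ratio new/base for two equally sized rows."""
    return [n / b for n, b in zip(new, base)]

afalg_vs_asm = ratios(AFALG, ASM)
for size, ratio in zip(BLOCK_SIZES, afalg_vs_asm):
    print(f"{size:6d} bytes: {ratio:6.2f}x")
```

For blocks of 16 to 256 bytes the 'afalg' figures are below the assembler figures, and only from 1,024 bytes upwards does the engine report higher throughput.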

Z. Notes for NetBSD 9.2 and OpenSSL 1.1.0l

Because of issues with the selection of AES assembler modes in the OpenSSL 1.1.1 series, the OpenSSL 1.1.0 series gives a better illustration of the difference in AES performance between the C-only (no assembler) build and the processor family assembler build.

The OpenSSL in a base install of NetBSD is a patched OpenSSL 1.1.1k. The patch forces the use of an assembler implementation for AES from the 1.1.0 series. Judging from the history in the NetBSD git repository, this appears to be related to retaining an API for AES.

An issue with the 'fuzz' test required a patch from Gentoo (see repo/gentoo.git - Official Gentoo ebuild repository). A runtime issue with the Elliptic Curve methods means some ECDH methods fail to give meaningful results.

OpenSSL 1.1.0l with NetBSD 9.2 i386 on VIA C3 Nehemiah @ 1.33 GHz
Method    AES-256 CBC     IDEA CBC          MD5        SHA-1      SHA-256      SHA-512
no-asm      7,892.62k   13,011.94k   85,639.17k   48,430.78k   14,884.40k    2,275.25k
asm        14,802.94k   13,011.94k  141,019.43k   60,963.72k   28,576.74k   12,443.13k
Padlock   690,180.08k   13,040.78k  138,036.56k   60,386.74k   28,448.83k   12,415.91k
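
As a rough check on the table above, the ratios between the tiers can be worked out directly from the 8,192-byte figures. The sketch below hard-codes the three rows with the thousands separators removed; the function name `tier_ratios` and the hyphenated column labels are only illustrative choices.

```shell
#!/bin/sh
# Ratios between implementation tiers, using the 8,192-byte figures
# from the table above (values in 1000s of bytes per second).
tier_ratios() {
    awk '
    NR == 1 { for (i = 2; i <= NF; i++) name[i] = $i; next }
    { for (i = 2; i <= NF; i++) v[$1, i] = $i; n = NF }
    END {
        for (i = 2; i <= n; i++)
            printf "%-12s asm/no-asm %5.2f  padlock/asm %6.2f\n",
                   name[i], v["asm", i] / v["no-asm", i],
                   v["padlock", i] / v["asm", i]
    }' <<'EOF'
method AES-256-CBC IDEA-CBC MD5 SHA-1 SHA-256 SHA-512
no-asm 7892.62 13011.94 85639.17 48430.78 14884.40 2275.25
asm 14802.94 13011.94 141019.43 60963.72 28576.74 12443.13
padlock 690180.08 13040.78 138036.56 60386.74 28448.83 12415.91
EOF
}

tier_ratios
```

Only the AES-256 CBC column has a Padlock ratio far from 1, which is what one would expect if the engine accelerates AES alone.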

Z.1. NetBSD 9.2, OpenSSL 1.1.0l, no assembler compile

Downloading the source distribution from the OpenSSL site and building with the 'no-asm' option gives a build that uses the C implementations of the methods.
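
A minimal sketch of such a build, assuming an unpacked 1.1.0 source tree and the standard `./config` and `make` steps. The helper name `build_openssl`, the `DRY_RUN` switch and the directory name in the example are hypothetical; the exact commands used for these notes are not recorded here.

```shell
#!/bin/sh
# Sketch: configure and build an OpenSSL 1.1.0 source tree.
# Arguments after the source directory are passed to ./config, so
# "no-asm" selects the C-only build and no extra argument gives the default.
# Set DRY_RUN=1 to print the commands instead of running them.
build_openssl() {
    srcdir=$1
    shift
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "cd $srcdir"
        echo "./config $*"
        echo "make"
        return 0
    fi
    ( cd "$srcdir" && ./config "$@" && make )
}

# Example (the directory name is an assumption):
#   build_openssl openssl-1.1.0l no-asm
```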

Running the OpenSSL speed test gives:

OpenSSL 1.1.0l  10 Sep 2019
built on: reproducible build, date unspecified
options:bn(64,32) rc4(int) des(long) aes(partial) idea(int) blowfish(ptr) 
compiler: cc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\""
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
md2                  0.00         0.00         0.00         0.00         0.00         0.00 
mdc2              1541.42k     1720.60k     1775.70k     1788.66k     1788.09k     1784.88k
md4               7361.29k    23712.28k    60744.29k    99464.99k   122539.80k   124578.28k
md5              13428.07k    32972.76k    61493.84k    78324.09k    85639.17k    85915.30k
hmac(md5)         5100.78k    16146.10k    41441.04k    67855.48k    83574.73k    85289.64k
sha1              9030.46k    20720.69k    36506.13k    45023.00k    48430.78k    48735.55k
rmd160            4542.97k    11893.45k    23608.98k    31373.86k    34807.81k    34972.49k
rc4              31293.06k    34036.09k    34675.07k    34927.52k    35026.92k    35063.95k
des cbc          11425.19k    12086.47k    12309.26k    12368.63k    12388.70k    12388.70k
des ede3          4235.64k     4307.92k     4337.12k     4349.43k     4346.39k     4349.11k
idea cbc         11784.23k    12685.65k    12925.87k    12994.25k    13011.94k    13058.05k
seed cbc         14215.47k    15110.97k    15342.14k    15461.70k    15439.61k    15442.33k
rc2 cbc           7590.42k     7962.56k     8062.81k     8087.56k     8077.31k     8131.72k
rc5-32/12 cbc        0.00         0.00         0.00         0.00         0.00         0.00 
blowfish cbc     19871.74k    22029.69k    22586.68k    22856.02k    22913.02k    22875.88k
cast cbc         12686.39k    13533.68k    13805.91k    13954.05k    13937.29k    13940.01k
aes-128 cbc      10089.55k    10507.69k    10686.11k    10686.00k    10701.31k    10701.31k
aes-192 cbc       8644.86k     8953.49k     9049.64k     9076.52k     9081.96k     9084.68k
aes-256 cbc       7556.37k     7785.93k     7865.75k     7912.45k     7892.62k     7875.24k
camellia-128 cbc    16805.35k    18147.19k    18487.71k    18603.80k    18663.51k    18637.48k
camellia-192 cbc    13414.01k    14169.91k    14437.80k    14567.42k    14541.48k    14538.76k
camellia-256 cbc    13417.33k    14173.21k    14465.03k    14518.69k    14541.48k    14538.76k
sha256            3346.99k     7060.69k    11702.26k    14020.98k    14884.40k    15002.28k
sha512             283.34k     1133.54k     1530.81k     2052.79k     2275.25k     2289.42k
whirlpool         1506.65k     3072.92k     5011.39k     5958.59k     6305.93k     6330.43k
aes-128 ige       9972.33k    10428.88k    10583.21k    10588.36k    10595.17k    10581.56k
aes-192 ige       8552.80k     8865.68k     8974.97k     9012.22k     9011.20k     8997.59k
aes-256 ige       7484.17k     7726.03k     7809.11k     7849.64k     7835.47k     7821.68k
ghash             5237.51k     5369.79k     5385.95k     5396.24k     5399.64k     5399.64k
                  sign    verify    sign/s verify/s
rsa  512 bits 0.003018s 0.000249s    331.4   4013.7
rsa 1024 bits 0.017744s 0.000825s     56.4   1212.3
rsa 2048 bits 0.114091s 0.002937s      8.8    340.5
rsa 3072 bits 0.363929s 0.006782s      2.7    147.4
rsa 4096 bits 0.767143s 0.010638s      1.3     94.0
rsa 7680 bits 4.896667s 0.039683s      0.2     25.2
rsa 15360 bits 36.930000s 0.154923s      0.0      6.5
                  sign    verify    sign/s verify/s
dsa  512 bits 0.004313s 0.003315s    231.8    301.7
dsa 1024 bits 0.011826s 0.010695s     84.6     93.5
dsa 2048 bits 0.039141s 0.036410s     25.5     27.5
                              sign    verify    sign/s verify/s
 160 bit ecdsa (secp160r1)   0.0131s   0.0079s     76.3    127.1
 192 bit ecdsa (nistp192)   0.0123s   0.0072s     81.0    138.6
 224 bit ecdsa (nistp224)   0.0164s   0.0094s     60.9    106.8
 256 bit ecdsa (nistp256)   0.0186s   0.0108s     53.7     93.0
 384 bit ecdsa (nistp384)   0.0532s   0.0278s     18.8     36.0
 521 bit ecdsa (nistp521)   0.1733s   0.0809s      5.8     12.4
 163 bit ecdsa (nistk163)   0.0312s   0.0147s     32.1     68.1
 233 bit ecdsa (nistk233)   0.0711s   0.0292s     14.1     34.2
 283 bit ecdsa (nistk283)   0.1168s   0.0521s      8.6     19.2
 409 bit ecdsa (nistk409)   0.3088s   0.1229s      3.2      8.1
 571 bit ecdsa (nistk571)   0.8000s   0.2792s      1.2      3.6
 163 bit ecdsa (nistb163)   0.0312s   0.0157s     32.1     63.5
 233 bit ecdsa (nistb233)   0.0710s   0.0320s     14.1     31.2
 283 bit ecdsa (nistb283)   0.1167s   0.0584s      8.6     17.1
 409 bit ecdsa (nistb409)   0.3097s   0.1392s      3.2      7.2
 571 bit ecdsa (nistb571)   0.7992s   0.3206s      1.3      3.1
                              op      op/s
 160 bit ecdh (secp160r1)   0.0121s     82.5
 192 bit ecdh (nistp192)   0.0113s     88.2
 224 bit ecdh (nistp224)   0.0151s     66.2
 256 bit ecdh (nistp256)   0.0171s     58.3
 384 bit ecdh (nistp384)   0.0487s     20.5
 521 bit ecdh (nistp521)   0.1618s      6.2
 163 bit ecdh (nistk163)   0.0071s    140.4
 233 bit ecdh (nistk233)   0.0143s     69.8
 283 bit ecdh (nistk283)   0.0258s     38.8
 409 bit ecdh (nistk409)   0.0608s     16.5
 571 bit ecdh (nistk571)   0.1389s      7.2
 163 bit ecdh (nistb163)   0.0077s    129.3
 233 bit ecdh (nistb233)   0.0157s     63.6
 283 bit ecdh (nistb283)   0.0288s     34.7
 409 bit ecdh (nistb409)   0.0692s     14.5
 571 bit ecdh (nistb571)   0.1597s      6.3
 253 bit ecdh (X25519)   0.0000s      inf

Z.2. NetBSD 9.2, OpenSSL 1.1.0l, default compile

Downloading the source distribution from the OpenSSL site and building with the default options gives a build with the assembler methods enabled.
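
One way to tell the two builds apart is the `compiler:` line of the report: the default build carries defines such as -DAES_ASM and -DSHA1_ASM, while the no assembler build has none. The sketch below reads that output on stdin; the function name `build_kind` is hypothetical.

```shell
#!/bin/sh
# Reads `openssl version -a` (or `openssl speed`) output on stdin and reports
# whether the "compiler:" line carries any *_ASM defines.
build_kind() {
    awk '
    /^compiler:/ { seen = 1; if ($0 ~ /-D[A-Z0-9_]*_ASM/) asm = 1 }
    END {
        if (!seen)    print "unknown"
        else if (asm) print "asm"
        else          print "no-asm"
    }'
}

# Example:
#   /usr/local/bin/openssl version -a | build_kind
```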

Running the OpenSSL speed test gives:

OpenSSL 1.1.0l  10 Sep 2019
built on: reproducible build, date unspecified
options:bn(64,32) rc4(4x,int) des(long) aes(partial) idea(int) blowfish(ptr) 
compiler: cc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\""  -Wa,--noexecstack
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
md2                  0.00         0.00         0.00         0.00         0.00         0.00 
mdc2              1592.23k     1790.83k     1854.81k     1863.27k     1869.74k     1867.01k
md4               7049.91k    23028.14k    59961.41k   100087.13k   124314.28k   126537.82k
md5              16389.98k    45268.69k    93029.04k   126255.68k   141019.43k   142159.78k
hmac(md5)         6002.41k    20050.12k    56516.72k   103284.86k   136639.83k   140026.05k
sha1              9861.10k    23650.08k    44021.62k    56210.43k    60963.72k    61350.19k
rmd160            4569.69k    12237.65k    24478.02k    32722.07k    36279.64k    36578.23k
rc4              47127.60k    55720.09k    59427.33k    60404.09k    60762.32k    60986.71k
des cbc          13358.21k    14007.62k    14168.49k    14225.78k    14239.38k    14233.94k
des ede3          4790.65k     4874.25k     4914.18k     4919.98k     4907.04k     4904.28k
idea cbc         11787.40k    12681.63k    12929.53k    13031.77k    13011.94k    13017.42k
seed cbc         14236.65k    15112.89k    15342.90k    15410.18k    15442.33k    15442.33k
rc2 cbc           7615.33k     7961.03k     8063.06k     8088.58k     8093.81k     8099.47k
rc5-32/12 cbc        0.00         0.00         0.00         0.00         0.00         0.00 
blowfish cbc     25389.32k    27125.91k    27561.42k    27719.44k    27765.71k    27852.80k
cast cbc         12761.46k    13561.98k    13844.84k    13916.53k    13891.14k    13945.58k
aes-128 cbc       9093.77k     9420.40k     9632.15k    20335.08k    20480.00k    20488.16k
aes-192 cbc       7528.09k     7919.27k     8037.80k    17244.84k    17355.61k    17369.22k
aes-256 cbc       6517.73k     6741.56k     6814.28k    14680.62k    14802.94k    14712.05k
camellia-128 cbc    15823.70k    20227.86k    21718.75k    22229.33k    22298.03k    22311.63k
camellia-192 cbc    12898.81k    15586.43k    16552.14k    16812.99k    16890.22k    16890.22k
camellia-256 cbc    12930.67k    15644.59k    16554.01k    16811.29k    16890.22k    16880.98k
sha256            5000.94k    10959.15k    21098.41k    26459.68k    28576.74k    28745.48k
sha512            1406.32k     5639.51k     8080.58k    11090.84k    12443.13k    12542.43k
whirlpool         3430.66k     7301.38k    12303.90k    14892.23k    15925.25k    15948.54k
aes-128 ige       8749.00k     9079.59k     9207.24k     9241.86k     9253.42k     9247.98k
aes-192 ige       7313.20k     7619.67k     7726.95k     7760.63k     7772.87k     7766.02k
aes-256 ige       6333.32k     6525.36k     6592.61k     6600.55k     6610.75k     6613.48k
ghash            19307.14k    26654.55k    29423.33k    30269.44k    30544.46k    30563.51k
                  sign    verify    sign/s verify/s
rsa  512 bits 0.001787s 0.000151s    559.5   6607.0
rsa 1024 bits 0.010917s 0.000528s     91.6   1895.6
rsa 2048 bits 0.075149s 0.001950s     13.3    512.7
rsa 3072 bits 0.230227s 0.004287s      4.3    233.3
rsa 4096 bits 0.520000s 0.007532s      1.9    132.8
rsa 7680 bits 3.192500s 0.026070s      0.3     38.4
rsa 15360 bits 24.550000s 0.103402s      0.0      9.7
                  sign    verify    sign/s verify/s
dsa  512 bits 0.002641s 0.002100s    378.6    476.3
dsa 1024 bits 0.007581s 0.006748s    131.9    148.2
dsa 2048 bits 0.026273s 0.024595s     38.1     40.7
                              sign    verify    sign/s verify/s
 160 bit ecdsa (secp160r1)   0.0071s   0.0045s    141.2    221.2
 192 bit ecdsa (nistp192)   0.0098s   0.0063s    102.3    159.5
 224 bit ecdsa (nistp224)   0.0139s   0.0088s     71.7    113.0
 256 bit ecdsa (nistp256)   0.0015s   0.0037s    651.4    271.7
 384 bit ecdsa (nistp384)   0.0500s   0.0288s     20.0     34.7
 521 bit ecdsa (nistp521)   0.1358s   0.0774s      7.4     12.9
 163 bit ecdsa (nistk163)   0.0291s   0.0126s     34.4     79.1
 233 bit ecdsa (nistk233)   0.0670s   0.0248s     14.9     40.3
 283 bit ecdsa (nistk283)   0.1096s   0.0449s      9.1     22.3
 409 bit ecdsa (nistk409)   0.2911s   0.1045s      3.4      9.6
 571 bit ecdsa (nistk571)   0.7600s   0.2379s      1.3      4.2
 163 bit ecdsa (nistb163)   0.0291s   0.0136s     34.4     73.6
 233 bit ecdsa (nistb233)   0.0669s   0.0275s     14.9     36.3
 283 bit ecdsa (nistb283)   0.1098s   0.0502s      9.1     19.9
 409 bit ecdsa (nistb409)   0.2917s   0.1182s      3.4      8.5
 571 bit ecdsa (nistb571)   0.7586s   0.2732s      1.3      3.7
                              op      op/s
 160 bit ecdh (secp160r1)   0.0066s    151.0
 192 bit ecdh (nistp192)   0.0093s    107.1
 224 bit ecdh (nistp224)   0.0133s     75.3
 256 bit ecdh (nistp256)   0.0028s    355.1
 384 bit ecdh (nistp384)   0.0477s     21.0
 521 bit ecdh (nistp521)   0.1290s      7.8
 163 bit ecdh (nistk163)   0.0061s    164.0
 233 bit ecdh (nistk233)   0.0120s     83.4
 283 bit ecdh (nistk283)   0.0218s     45.9
 409 bit ecdh (nistk409)   0.0507s     19.7
 571 bit ecdh (nistk571)   0.1161s      8.6
 163 bit ecdh (nistb163)   0.0066s    151.3
 233 bit ecdh (nistb233)   0.0132s     75.5
 283 bit ecdh (nistb283)   0.0244s     40.9
 409 bit ecdh (nistb409)   0.0577s     17.3
 571 bit ecdh (nistb571)   0.1332s      7.5
 253 bit ecdh (X25519)   0.0000s      inf

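With the assembler implementations included, most of the digests and several of the ciphers gain throughput. At 8,192-byte blocks MD5 rises from 85,639.17k to 141,019.43k, SHA-256 from 14,884.40k to 28,576.74k, SHA-512 from 2,275.25k to 12,443.13k and AES-256 CBC from 7,892.62k to 14,802.94k, while IDEA CBC is unchanged at 13,011.94k. The AES CBC gain only appears from 1,024-byte blocks upwards; at 16 to 256 bytes the assembler build is slightly slower than the C build.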
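
To compare two runs such as Z.1 and Z.2, it can help to reduce each raw report to its 8192-byte column first. The following is a sketch only: the function name `col8192` is hypothetical, and it assumes the row layout shown above, with six throughput columns each ending in `k`.

```shell
#!/bin/sh
# Print "name<TAB>value" for the 8192-byte column of `openssl speed` rows.
# Throughput rows end in six columns (16 ... 16384 bytes), each with a "k"
# suffix. Method names may contain spaces ("aes-256 cbc"), so fields are
# counted from the right. Rows without the suffix (0.00, rsa, dsa) are skipped.
col8192() {
    awk '
    NF >= 7 && $(NF) ~ /^[0-9.]+k$/ && $(NF-5) ~ /^[0-9.]+k$/ {
        name = $1
        for (i = 2; i <= NF - 6; i++) name = name " " $i
        val = $(NF-1)
        sub(/k$/, "", val)
        printf "%s\t%s\n", name, val
    }'
}

# Example:
#   /usr/local/bin/openssl speed 2>/dev/null | col8192
```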

Z.3. NetBSD 9.2, OpenSSL 1.1.0l, default compile: VIA Padlock

The default compile (see Z.2.) also supports the Padlock engine, which gives access to the VIA Padlock acceleration for AES:

$ /usr/local/bin/openssl engine padlock -c
(padlock) VIA PadLock (no-RNG, ACE)
 [AES-128-ECB, AES-128-CBC, AES-128-CFB, AES-128-OFB, AES-128-CTR, AES-192-ECB, AES-192-CBC, AES-192-CFB, AES-192-OFB, AES-192-CTR, AES-256-ECB, AES-256-CBC, AES-256-CFB, AES-256-OFB, AES-256-CTR]
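
The capability list can also be checked from a script. This sketch assumes the bracketed, comma-separated format shown above and reads the `openssl engine ... -c` output on stdin; the function name `engine_lists` is hypothetical.

```shell
#!/bin/sh
# Reads `openssl engine <name> -c` output on stdin and succeeds if the
# bracketed capability list contains the given algorithm name exactly.
engine_lists() {
    want=$1
    sed 's/[][]//g' | tr ',' '\n' | sed 's/^ *//; s/ *$//' |
        grep -qxF -- "$want"
}

# Example:
#   /usr/local/bin/openssl engine padlock -c | engine_lists AES-256-CBC &&
#       echo "padlock offers AES-256-CBC"
```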

To invoke the Padlock accelerated method, the engine has to be specified and the required method accessed with the -evp option:

$ /usr/local/bin/openssl speed -engine padlock -evp aes-256-cbc
engine "padlock" set.
Doing aes-256-cbc for 3s on 16 size blocks: 10662518 aes-256-cbc's in 3.01s
Doing aes-256-cbc for 3s on 64 size blocks: 8597677 aes-256-cbc's in 3.01s
Doing aes-256-cbc for 3s on 256 size blocks: 4848074 aes-256-cbc's in 3.01s
Doing aes-256-cbc for 3s on 1024 size blocks: 1706932 aes-256-cbc's in 2.91s
Doing aes-256-cbc for 3s on 8192 size blocks: 253594 aes-256-cbc's in 3.01s
Doing aes-256-cbc for 3s on 16384 size blocks: 127179 aes-256-cbc's in 2.98s
OpenSSL 1.1.0l  10 Sep 2019
built on: reproducible build, date unspecified
options:bn(64,32) rc4(4x,int) des(long) aes(partial) idea(int) blowfish(ptr) 
compiler: cc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\""  -Wa,--noexecstack
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-cbc      56677.84k   182807.75k   412327.89k   600652.36k   690180.08k   699228.43k
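
The figures in the final row follow from the "Doing ..." lines: operations times block size, divided by the elapsed time and by 1000. A small sketch of that arithmetic (the function name `throughput_k` is hypothetical):

```shell
#!/bin/sh
# Throughput in 1000s of bytes per second, as `openssl speed` reports it:
#   operations * block size / elapsed seconds / 1000
throughput_k() {
    awk -v ops="$1" -v size="$2" -v secs="$3" \
        'BEGIN { printf "%.2fk\n", ops * size / secs / 1000 }'
}

# 8192-byte line from the run above: 253594 operations in 3.01s
throughput_k 253594 8192 3.01    # prints 690180.08k
```

The result matches the 8192 bytes column reported for aes-256-cbc.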

Note that -evp is required to use the accelerated implementation. Without the option, the normal version of the method is used:

$ /usr/local/bin/openssl speed -engine padlock aes-256-cbc
engine "padlock" set.
Doing aes-256 cbc for 3s on 16 size blocks: 1224119 aes-256 cbc's in 3.01s
Doing aes-256 cbc for 3s on 64 size blocks: 317144 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 256 size blocks: 80115 aes-256 cbc's in 3.01s
Doing aes-256 cbc for 3s on 1024 size blocks: 43153 aes-256 cbc's in 3.01s
Doing aes-256 cbc for 3s on 8192 size blocks: 5421 aes-256 cbc's in 3.01s
Doing aes-256 cbc for 3s on 16384 size blocks: 2710 aes-256 cbc's in 3.01s
OpenSSL 1.1.0l  10 Sep 2019
built on: reproducible build, date unspecified
options:bn(64,32) rc4(4x,int) des(long) aes(partial) idea(int) blowfish(ptr) 
compiler: cc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\""  -Wa,--noexecstack
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256 cbc       6506.94k     6765.74k     6813.77k    14680.62k    14753.76k    14751.04k

For methods without a specific engine implementation the regular implementations are used. This can be seen by running a command that invokes the engine for each of the other methods in our comparison:

$ /usr/local/bin/openssl speed -engine padlock -evp idea
engine "padlock" set.
Doing idea-cbc for 3s on 16 size blocks: 2005747 idea-cbc's in 3.00s
Doing idea-cbc for 3s on 64 size blocks: 580638 idea-cbc's in 3.01s
Doing idea-cbc for 3s on 256 size blocks: 150989 idea-cbc's in 3.01s
Doing idea-cbc for 3s on 1024 size blocks: 38130 idea-cbc's in 3.01s
Doing idea-cbc for 3s on 8192 size blocks: 4712 idea-cbc's in 2.96s
Doing idea-cbc for 3s on 16384 size blocks: 2350 idea-cbc's in 2.96s
OpenSSL 1.1.0l  10 Sep 2019
built on: reproducible build, date unspecified
options:bn(64,32) rc4(4x,int) des(long) aes(partial) idea(int) blowfish(ptr) 
compiler: cc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\""  -Wa,--noexecstack
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
idea-cbc         10697.32k    12345.79k    12841.59k    12971.80k    13040.78k    13007.57k
$ /usr/local/bin/openssl speed -engine padlock -evp md5 
engine "padlock" set.
Doing md5 for 3s on 16 size blocks: 1370335 md5's in 3.00s
Doing md5 for 3s on 64 size blocks: 1095540 md5's in 2.90s
Doing md5 for 3s on 256 size blocks: 755228 md5's in 3.00s
Doing md5 for 3s on 1024 size blocks: 321409 md5's in 3.00s
Doing md5 for 3s on 8192 size blocks: 50719 md5's in 3.01s
Doing md5 for 3s on 16384 size blocks: 25852 md5's in 3.01s
OpenSSL 1.1.0l  10 Sep 2019
built on: reproducible build, date unspecified
options:bn(64,32) rc4(4x,int) des(long) aes(partial) idea(int) blowfish(ptr) 
compiler: cc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\""  -Wa,--noexecstack
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
md5               7308.45k    24177.43k    64446.12k   109707.61k   138036.56k   140717.33k
$ /usr/local/bin/openssl speed -engine padlock -evp sha1
engine "padlock" set.
Doing sha1 for 3s on 16 size blocks: 1036264 sha1's in 3.00s
Doing sha1 for 3s on 64 size blocks: 755219 sha1's in 3.01s
Doing sha1 for 3s on 256 size blocks: 424166 sha1's in 3.01s
Doing sha1 for 3s on 1024 size blocks: 153876 sha1's in 3.01s
Doing sha1 for 3s on 8192 size blocks: 22188 sha1's in 3.01s
Doing sha1 for 3s on 16384 size blocks: 11202 sha1's in 3.00s
OpenSSL 1.1.0l  10 Sep 2019
built on: reproducible build, date unspecified
options:bn(64,32) rc4(4x,int) des(long) aes(partial) idea(int) blowfish(ptr) 
compiler: cc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\""  -Wa,--noexecstack
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sha1              5526.74k    16057.81k    36075.25k    52348.51k    60386.74k    61177.86k
$ /usr/local/bin/openssl speed -engine padlock -evp sha256
engine "padlock" set.
Doing sha256 for 3s on 16 size blocks: 663483 sha256's in 3.01s
Doing sha256 for 3s on 64 size blocks: 420057 sha256's in 3.01s
Doing sha256 for 3s on 256 size blocks: 217137 sha256's in 2.92s
Doing sha256 for 3s on 1024 size blocks: 74607 sha256's in 2.98s
Doing sha256 for 3s on 8192 size blocks: 10453 sha256's in 3.01s
Doing sha256 for 3s on 16384 size blocks: 5268 sha256's in 3.01s
OpenSSL 1.1.0l  10 Sep 2019
built on: reproducible build, date unspecified
options:bn(64,32) rc4(4x,int) des(long) aes(partial) idea(int) blowfish(ptr) 
compiler: cc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\""  -Wa,--noexecstack
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sha256            3526.82k     8931.44k    19036.67k    25636.77k    28448.83k    28674.72k
$ /usr/local/bin/openssl speed -engine padlock -evp sha512
engine "padlock" set.
Doing sha512 for 3s on 16 size blocks: 234871 sha512's in 3.00s
Doing sha512 for 3s on 64 size blocks: 235262 sha512's in 3.01s
Doing sha512 for 3s on 256 size blocks: 90985 sha512's in 3.01s
Doing sha512 for 3s on 1024 size blocks: 32115 sha512's in 3.01s
Doing sha512 for 3s on 8192 size blocks: 4562 sha512's in 3.01s
Doing sha512 for 3s on 16384 size blocks: 2286 sha512's in 2.98s
OpenSSL 1.1.0l  10 Sep 2019
built on: reproducible build, date unspecified
options:bn(64,32) rc4(4x,int) des(long) aes(partial) idea(int) blowfish(ptr) 
compiler: cc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\""  -Wa,--noexecstack
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sha512            1252.65k     5002.25k     7738.26k    10925.50k    12415.91k    12568.40k
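
The per-method invocations above could be scripted with a loop along these lines. This is a sketch only: the function name `engine_speed_all` and the `OPENSSL` override variable are hypothetical, and the default path is the binary used in these notes.

```shell
#!/bin/sh
# Run `openssl speed -engine ENGINE -evp METHOD` for each METHOD in turn.
# OPENSSL may be set to point at a different binary.
engine_speed_all() {
    engine=$1
    shift
    for method in "$@"; do
        "${OPENSSL:-/usr/local/bin/openssl}" speed -engine "$engine" -evp "$method"
    done
}

# Example:
#   engine_speed_all padlock idea md5 sha1 sha256 sha512
```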

Unsurprisingly these show the same performance as the regular methods, since the Padlock engine can only accelerate AES on this platform.

Further Sources

