OpenSSL
OpenSSL can use three different implementations of the cryptographic methods, providing different performance tiers:
- Portable C-based methods: most portable but typically the slowest.
- Processor family assembler based methods: faster but less portable, and may have problems on some compatible processor implementations
- Methods utilizing hardware acceleration: typically the fastest option, but with specific hardware requirements
A simple compile-time option ('no-asm') disables the assembler implementations used by default, so the first two are simple to test. The third requires appropriate hardware. Fortunately, three of the options supported by OpenSSL are available to us:
- VIA PadLock (Wikipedia): as implemented in the VIA C3 Nehemiah processors (Wikipedia)
- Intel Advanced Encryption Standard New Instructions (AES-NI; Wikipedia): implemented on Intel x86_64 since 2010 and AMD x86_64 processors since 2011
- Intel SHA extensions (SHA Ext.; Wikipedia): implemented on Intel x86_64 processors from 2016 and AMD Ryzen from 2017
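Before testing the hardware-accelerated methods it can help to confirm which of these features the processor actually reports. The following is a minimal sketch, assuming a Linux system and assuming the kernel's feature flag names are `aes` for AES-NI, `sha_ni` for the SHA extensions and `ace_en` for an enabled PadLock ACE unit; the helper name `accel_features` is made up for illustration and is not part of OpenSSL.

```shell
# accel_features FLAGS: print which acceleration features appear in a
# space-separated CPU flag list (hypothetical helper, not part of OpenSSL).
# The flag names below are assumed to match those used by the Linux kernel.
accel_features() {
    for flag in $1; do
        case "$flag" in
            aes)    echo "AES-NI" ;;
            sha_ni) echo "SHA extensions" ;;
            ace_en) echo "VIA PadLock ACE" ;;
        esac
    done
}

# Usage on Linux (assumes /proc/cpuinfo contains a "flags" line):
# accel_features "$(grep '^flags' /proc/cpuinfo | head -n 1 | cut -d: -f2)"
```

Taking the flag list as an argument rather than reading /proc/cpuinfo directly keeps the helper usable on a saved copy of the flags from another machine.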
To illustrate the effect on performance, a set of methods has been selected that shows the effect of each type of implementation.
Using the results for 8,192 byte blocks (figures are in 1000s of bytes per second):
VIA Luke @ 1.0 GHz, OpenSSL 1.1.0l on Debian Linux 11 for x86

Method | AES-256 CBC | IDEA CBC | MD5 | SHA-1 | SHA-256 | SHA-512 |
---|---|---|---|---|---|---|
no-asm | 6,916.78k | 9,684.21k | 80,191.49k | 36,410.71k | 12,064.09k | 1,832.28k |
asm | 11,122.01k | 9,661.10k | 106,332.16k | 45,978.97k | 21,534.04k | 9,393.49k |
Kernel (d, e) | 11,127.92k | 9,655.64k | 104,336.04k | 45,566.63k | 21,515.64k | 9,374.38k |
Padlock (b) | 520,596.10k | 9,662.71k | 104,357.28k | 45,472.50k | 21,460.31k | 9,376.82k |

VIA Luke @ 1.0 GHz, OpenSSL 3.0.2 on Debian Linux 11 for x86

Method | AES-256 CBC | IDEA (a) CBC | MD5 | SHA-1 | SHA-256 | SHA-512 |
---|---|---|---|---|---|---|
no-asm | 6,916.47k | 9,662.71k | 78,665.19k | 36,050.26k | 12,125.81k | 1,829.55k |
asm | 11,127.47k | 9,655.64k | 103,931.19k | 45,509.29k | 21,493.73k | 9,368.92k |
Kernel (d) | 2,033,909.76k | 9,627.65k | 104,328.02k | 45,512.02k | 21,442.15k | 9,366.19k |
Padlock (b) | 515,844.78k | 9,658.37k | 103,765.33k | 45,520.21k | 21,435.73k | 9,368.92k |

AMD Ryzen 5 3600, OpenSSL 1.1.0l with Debian Linux 11 for x86_64

Method | AES-256 CBC | IDEA CBC | MD5 | SHA-1 | SHA-256 | SHA-512 |
---|---|---|---|---|---|---|
no-asm | 212,366.68k | 119,010.65k | 749,469.70k | 845,922.30k | 307,301.03k | 553,937.58k |
asm | 232,721.07k | 119,545.86k | 793,971.37k | 1,050,842.45k | 470,701.40k | 604,972.40k |
AES-NI & SHA Ext. | 1,086,559.57k | 119,250.94k | 790,495.23k | 1,027,295.91k | 470,671.36k | 605,863.94k |
Kernel (d, e) | 1,092,332.20k | 120,416.94k | 787,464.19k | 1,029,693.44k | 475,791.36k | 601,159.00k |

AMD Ryzen 5 3600, OpenSSL 3.0.2 with Debian Linux 11 for x86_64

Method | AES-256 CBC | IDEA (a) CBC | MD5 | SHA-1 | SHA-256 | SHA-512 |
---|---|---|---|---|---|---|
no-asm | 210,668.20k | 117,587.97k | 740,698.79k | 832,288.09k | 310,059.01k | 559,205.03k |
asm (c) | 1,097,910.95k | 119,048.87k | 789,848.06k | 1,036,997.97k | 473,503.06k | 599,274.84k |
Kernel (d) | 37,086,822.40k | 120,220.33k | 793,990.49k | 1,041,334.27k | 474,685.44k | 598,278.14k |
Notes:
- a. In OpenSSL 3.0.2, access to the IDEA method requires the legacy provider. To use it without installing: $ LD_LIBRARY_PATH=`pwd` apps/openssl speed -provider-path ./providers/ -provider legacy -provider default idea
- b. The OpenSSL PadLock engine only supports AES on our VIA Luke (C3 Nehemiah) based system; more recent versions of the VIA PadLock hardware provide additional methods, including SHA.
- c. On systems that support AES-NI and/or SHA Ext., the standard assembler implementations in OpenSSL 3.0.2 detect and use the instruction set extensions to accelerate the methods.
- d. The OpenSSL 'afalg' engine (used for "Kernel") uses the Linux Kernel Crypto API (AF_ALG) to access the methods in the Linux kernel, which make use of hardware acceleration and processor features beyond those used by the standard assembler implementations in OpenSSL.
- e. The OpenSSL 1.1.0 implementation of the 'afalg' engine only supports use of the kernel methods for AES-128-CBC.
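To put the table figures in proportion, the ratio between two rows can be worked out directly. This is a small sketch only; the helper name `speedup` is hypothetical, and it assumes `awk` is available and that figures are passed as printed in the tables (with thousands separators and the trailing "k").

```shell
# speedup BASE NEW: print NEW/BASE to one decimal place.
# Accepts figures as printed in the tables, e.g. "520,596.10k"
# (hypothetical helper for illustration).
speedup() {
    awk -v base="$1" -v new="$2" 'BEGIN {
        gsub(/[,k]/, "", base)   # strip thousands separators and the "k" suffix
        gsub(/[,k]/, "", new)
        printf "%.1fx\n", new / base
    }'
}

speedup 11,122.01k 520,596.10k      # Padlock vs asm, AES-256 CBC, VIA Luke, 1.1.0l
speedup 232,721.07k 1,086,559.57k   # AES-NI vs asm, AES-256 CBC, Ryzen, 1.1.0l
```

With the table values above, the PadLock row comes out at roughly 47 times the assembler row for AES-256 CBC, and the AES-NI row at a little under 5 times.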
Additional Notes & Raw Results
- A. VIA Luke running Debian Linux 11, OpenSSL 1.1.0l
- A.1. VIA Luke, OpenSSL 1.1.0l, C methods only (no-asm)
- A.2. VIA Luke, OpenSSL 1.1.0l, assembler methods
- A.3. VIA Luke, OpenSSL 1.1.0l, use kernel methods (AF_ALG)
- A.4. VIA Luke, OpenSSL 1.1.0l, use VIA Padlock
- B. VIA Luke running Debian Linux 11, OpenSSL 3.0.2
- B.1. VIA Luke, OpenSSL 3.0.2, C methods only (no-asm)
- B.2. VIA Luke, OpenSSL 3.0.2, assembler methods
- B.3. VIA Luke, OpenSSL 3.0.2, use kernel methods (AF_ALG)
- B.4. VIA Luke, OpenSSL 3.0.2, use VIA Padlock
- C. AMD Ryzen running Debian Linux 11, OpenSSL 1.1.0l
- C.1. AMD Ryzen, OpenSSL 1.1.0l, C methods only (no-asm)
- C.2. AMD Ryzen, OpenSSL 1.1.0l, assembler methods
- C.3. AMD Ryzen, OpenSSL 1.1.0l, use VIA Padlock
- C.4. AMD Ryzen, OpenSSL 1.1.0l, use kernel methods (AF_ALG)
- D. AMD Ryzen running Debian Linux 11, OpenSSL 3.0.2
- D.1. AMD Ryzen, OpenSSL 3.0.2, C methods only (no-asm)
- D.2. AMD Ryzen, OpenSSL 3.0.2, assembler methods
- D.3. AMD Ryzen, OpenSSL 3.0.2, use VIA Padlock
- D.4. AMD Ryzen, OpenSSL 3.0.2, use kernel methods (AF_ALG)
- Z. VIA C3 Nehemiah running NetBSD 9.2 and OpenSSL 1.1.0l
- Z.1. NetBSD 9.2, OpenSSL 1.1.0l, no assembler compile
- Z.2. NetBSD 9.2, OpenSSL 1.1.0l, default compile
- Z.3. NetBSD 9.2, OpenSSL 1.1.0l, default compile VIA Padlock
A. VIA Luke running Debian Linux 11, OpenSSL 1.1.0l
The second generation of VIA's Corefusion (Wikipedia) x86 processor, VIA Luke, features a VIA C3 Nehemiah core with the VIA PadLock (Wikipedia) cryptographic accelerator. The PadLock implementation in Luke provides a hardware Random Number Generator (RNG) and the Advanced Cryptography Engine (ACE) supporting AES (Advanced Encryption Standard; Wikipedia).
OpenSSL gained a 'padlock' engine to access the ACE acceleration of AES in 2005, with the release of OpenSSL 0.9.8. The VIA C7 enhanced ACE by adding SHA (Secure Hash Algorithms) and PMM (PadLock Montgomery Multiplier), and OpenSSL added support for this in 2006. Some Zhaoxin processors feature PadLock with SM3 (Wikipedia) and SM4 added.
VIA Luke @ 1.0 GHz, OpenSSL 1.1.0l on Debian Linux 11 for x86

Method | AES-256 CBC | IDEA CBC | MD5 | SHA-1 | SHA-256 | SHA-512 |
---|---|---|---|---|---|---|
no-asm | 6,916.78k | 9,684.21k | 80,191.49k | 36,410.71k | 12,064.09k | 1,832.28k |
asm | 11,122.01k | 9,661.10k | 106,332.16k | 45,978.97k | 21,534.04k | 9,393.49k |
Kernel | 11,127.92k | 9,655.64k | 104,336.04k | 45,566.63k | 21,515.64k | 9,374.38k |
Padlock | 520,596.10k | 9,662.71k | 104,357.28k | 45,472.50k | 21,460.31k | 9,376.82k |
Notes:
- During the build tests (make test) the 'fuzz' test fails due to a missing update (see repo/gentoo.git - Official Gentoo ebuild repository for a patch that fixes the issue).
- A runtime issue with the Elliptic Curve methods means some ECDH methods fail to give meaningful results, so the benchmark builds disable these methods (no-ec).
- The 'afalg' engine in OpenSSL 1.1.0l only supports AES-128 CBC, so no improvement is seen in our selected comparison methods. In a test run, 'afalg' increased the reported throughput for AES-128 CBC from ~15 MB/s to ~2.7 GB/s on this system.
A.1. VIA Luke, OpenSSL 1.1.0l, C methods only (no-asm)
Downloading the source distribution from the OpenSSL site and building with the no assembler option ($ ./config no-asm no-ec), gives a build using the portable C implementations for the methods.
Running the OpenSSL speed test gives ($ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed):
    OpenSSL 1.1.0l 10 Sep 2019
    built on: reproducible build, date unspecified
    options:bn(64,32) rc4(int) des(long) aes(partial) idea(int) blowfish(ptr)
    compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\""
    The 'numbers' are in 1000s of bytes per second processed.
    type                 16 bytes    64 bytes   256 bytes  1024 bytes  8192 bytes 16384 bytes
    md2                      0.00        0.00        0.00        0.00        0.00        0.00
    mdc2                 1216.39k    1349.70k    1389.96k    1400.27k    1403.56k    1403.56k
    md4                  6043.15k   19801.11k   51304.53k   85481.47k  106300.82k  108276.39k
    md5                 11623.79k   29433.43k   56385.29k   73045.33k   80191.49k   80820.40k
    hmac(md5)            4272.43k   13891.37k   36658.69k   61732.47k   77897.97k   79107.41k
    sha1                 6812.00k   15794.66k   27587.24k   33936.04k   36410.71k   36616.59k
    rmd160               3816.28k   10080.21k   20185.69k   26923.35k   29886.96k   30086.49k
    rc4                 30881.65k   34105.09k   34865.40k   35200.68k   35291.14k   35039.91k
    des cbc              8691.31k    9165.61k    9332.22k    9374.04k    9390.56k    9360.73k
    des ede3             3266.32k    3333.37k    3353.60k    3359.40k    3361.45k    3354.03k
    idea cbc             8881.84k    9454.71k    9608.36k    9648.13k    9684.21k    9628.33k
    seed cbc            12766.92k   13690.90k   13939.80k   14007.30k   14032.90k   13959.39k
    rc2 cbc              6628.12k    6978.54k    7073.79k    7098.37k    7085.12k    7094.27k
    rc5-32/12 cbc            0.00        0.00        0.00        0.00        0.00        0.00
    blowfish cbc        14833.34k   16211.05k   16626.90k   16685.40k   16722.60k   16635.22k
    cast cbc            11473.94k   12236.09k   12494.85k   12571.15k   12520.11k   12539.22k
    aes-128 cbc          8837.10k    9250.20k    9391.10k    9461.14k    9437.18k    9404.42k
    aes-192 cbc          7549.75k    7851.81k    7950.34k    7975.59k    7984.47k    7934.46k
    aes-256 cbc          6590.26k    6816.68k    6893.06k    6919.26k    6916.78k    6897.66k
    camellia-128 cbc    13046.99k   14029.55k   14296.15k   14382.05k   14398.81k   14314.15k
    camellia-192 cbc    10317.05k   10891.34k   11078.14k   11128.83k   11143.85k   11097.95k
    camellia-256 cbc    10315.41k   10888.94k   11080.02k   11129.30k   11130.20k   11091.97k
    sha256               2720.35k    5750.36k    9502.46k   11372.93k   12064.09k   12118.70k
    sha512                228.66k     914.00k    1232.04k    1649.66k    1832.28k    1841.15k
    whirlpool            1152.68k    2356.39k    3841.72k    4559.87k    4825.09k    4849.66k
    aes-128 ige          8656.05k    9140.01k    9289.98k    9328.98k    9338.33k    9273.34k
    aes-192 ige          7420.33k    7761.34k    7877.67k    7905.96k    7910.74k    7864.32k
    aes-256 ige          6492.42k    6749.46k    6838.70k    6855.25k    6853.24k    6834.00k
    ghash                4417.29k    4504.66k    4537.30k    4544.51k    4549.29k    4546.83k
                          sign     verify   sign/s  verify/s
    rsa   512 bits   0.003229s  0.000257s    309.7    3888.2
    rsa  1024 bits   0.018160s  0.000814s     55.1    1228.5
    rsa  2048 bits   0.113258s  0.002835s      8.8     352.7
    rsa  3072 bits   0.339667s  0.006174s      2.9     162.0
    rsa  4096 bits   0.744286s  0.010050s      1.3      99.5
    rsa  7680 bits   4.436667s  0.035426s      0.2      28.2
    rsa 15360 bits  33.060000s  0.137260s      0.0       7.3
                          sign     verify   sign/s  verify/s
    dsa   512 bits   0.004530s  0.003398s    220.7     294.3
    dsa  1024 bits   0.011801s  0.010539s     84.7      94.9
    dsa  2048 bits   0.037955s  0.034567s     26.3      28.9
These results give a performance baseline for this platform.
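The summary tables use only the 8,192 byte column of each run. One way to pull that column out of a saved `openssl speed` log is sketched below; the helper name `col_8192` is hypothetical, and the sketch assumes one result row per line (as `openssl speed` prints them) with the six block sizes shown in this section, so that the 8,192 byte figure is the second-to-last field.

```shell
# col_8192: read `openssl speed` result rows on stdin and print
# "method<TAB>8192-byte figure" (hypothetical helper for illustration).
# Assumes six figures per row, so the 8192-byte value is field NF-1.
# Rows that do not end in a "...k" figure (headers, unsupported methods
# reported as 0.00, RSA/DSA timings) are skipped.
col_8192() {
    awk 'NF >= 7 && $NF ~ /^[0-9.]+k$/ {
        name = $1
        # method names such as "aes-256 cbc" contain a space
        for (i = 2; i <= NF - 6; i++) name = name " " $i
        printf "%s\t%s\n", name, $(NF - 1)
    }'
}

# Usage, assuming the speed output was saved to a file named speed.log:
# col_8192 < speed.log
```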
A.2. VIA Luke, OpenSSL 1.1.0l, assembler methods
Downloading the source distribution from the OpenSSL site and building with the default options ($ ./config no-ec), gives a build with the assembler methods enabled.
Running the OpenSSL speed test gives ($ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed):
    OpenSSL 1.1.0l 10 Sep 2019
    built on: reproducible build, date unspecified
    options:bn(64,32) rc4(4x,int) des(long) aes(partial) idea(int) blowfish(ptr)
    compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\"" -Wa,--noexecstack
    The 'numbers' are in 1000s of bytes per second processed.
    type                 16 bytes    64 bytes   256 bytes  1024 bytes  8192 bytes 16384 bytes
    md2                      0.00        0.00        0.00        0.00        0.00        0.00
    mdc2                 1231.24k    1376.94k    1418.75k    1429.82k    1433.60k    1435.65k
    md4                  5709.55k   19328.21k   51612.33k   88431.27k  112065.46k  114300.25k
    md5                 11881.41k   34809.17k   70811.75k   95242.24k  106332.16k  107266.05k
    hmac(md5)            4605.06k   15539.84k   43113.64k   78534.31k  103164.22k  105600.34k
    sha1                 7529.77k   18275.74k   33563.32k   42401.79k   45978.97k   46293.05k
    rmd160               3527.73k    9388.31k   18618.20k   24743.59k   27368.98k   27574.27k
    rc4                 35951.06k   42033.39k   45003.94k   45754.31k   45828.78k   45760.51k
    des cbc             10149.31k   10655.74k   10806.87k   10843.14k   10858.52k   10857.13k
    des ede3             3651.79k    3706.65k    3735.02k    3739.65k    3741.01k    3741.01k
    idea cbc             8881.83k    9455.85k    9599.91k    9648.13k    9661.10k    9659.96k
    seed cbc            12766.79k   13693.85k   13935.27k   13996.84k   14036.36k   14030.17k
    rc2 cbc              6625.95k    6977.98k    7073.71k    7122.11k    7085.12k    7105.19k
    rc5-32/12 cbc            0.00        0.00        0.00        0.00        0.00        0.00
    blowfish cbc        19068.84k   20448.28k   20788.66k   20916.06k   20930.56k   20938.75k
    cast cbc            11475.89k   12226.88k   12492.97k   12563.46k   12587.64k   12582.91k
    aes-128 cbc          6850.50k    7110.91k    7285.13k   15298.01k   15447.38k   15444.65k
    aes-192 cbc          5687.63k    5958.40k    6061.99k   13012.99k   13093.55k   13090.71k
    aes-256 cbc          4919.92k    5087.45k    5143.11k   11076.38k   11122.01k   11130.20k
    camellia-128 cbc    11987.41k   15300.51k   16387.67k   16721.92k   16823.84k   16849.77k
    camellia-192 cbc     9761.94k   11774.58k   12487.17k   12682.92k   12741.29k   12744.33k
    camellia-256 cbc     9733.36k   11775.45k   12530.90k   12686.60k   12738.56k   12741.29k
    sha256               3757.12k    8365.14k   15961.51k   19976.93k   21534.04k   21681.49k
    sha512               1121.85k    4511.76k    6224.90k    8425.81k    9393.49k    9473.03k
    whirlpool            2548.74k    5454.61k    9206.70k   11109.37k   11818.33k   11872.94k
    aes-128 ige          6576.01k    6873.34k    6970.37k    6994.94k    6995.97k    7011.47k
    aes-192 ige          5494.31k    5741.35k    5845.39k    5869.23k    5876.39k    5857.69k
    aes-256 ige          4778.57k    4928.83k    4981.33k    4994.73k    4994.92k    5013.83k
    ghash               14667.08k   20156.33k   22210.15k   22831.45k   23035.90k   23046.83k
                          sign     verify   sign/s  verify/s
    rsa   512 bits   0.002296s  0.000200s    435.6    5011.3
    rsa  1024 bits   0.014245s  0.000698s     70.2    1432.7
    rsa  2048 bits   0.098431s  0.002584s     10.2     386.9
    rsa  3072 bits   0.303030s  0.005677s      3.3     176.2
    rsa  4096 bits   0.685333s  0.009980s      1.5     100.2
    rsa  7680 bits   4.226667s  0.034567s      0.2      28.9
    rsa 15360 bits  32.490000s  0.141408s      0.0       7.1
                          sign     verify   sign/s  verify/s
    dsa   512 bits   0.003438s  0.002715s    290.9     368.3
    dsa  1024 bits   0.009980s  0.008943s    100.2     111.8
    dsa  2048 bits   0.034704s  0.032226s     28.8      31.0
The performance gains from the assembler implementations are evident, and they show which methods have assembler implementations: SHA-512, for example, improves roughly fivefold at 8,192 byte blocks, while IDEA CBC is unchanged.
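A per-method comparison of two runs makes the methods with assembler implementations stand out. The sketch below joins two lists of 8,192 byte figures and prints the percentage change; the helper name `asm_gain` and the tab-separated "method, figure" input format are assumptions made for illustration, not something produced by OpenSSL itself.

```shell
# asm_gain BASE_FILE NEW_FILE: each file holds "method<TAB>figure" lines,
# where the figure may carry the trailing "k". Prints the percentage change
# from BASE to NEW for every method present in both files
# (hypothetical helper for illustration).
asm_gain() {
    awk -F '\t' '
        { sub(/k$/, "", $2) }                 # drop the "k" suffix
        NR == FNR { base[$1] = $2; next }     # first file: remember baseline
        ($1 in base) && base[$1] + 0 > 0 {
            printf "%s\t%+.1f%%\n", $1, ($2 / base[$1] - 1) * 100
        }' "$1" "$2"
}

# Usage, assuming noasm.tsv and asm.tsv were prepared from the two runs:
# asm_gain noasm.tsv asm.tsv
```

Methods whose change stays within a fraction of a percent, such as IDEA CBC in the runs above, are presumably still using the C implementation.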
A.3. VIA Luke, OpenSSL 1.1.0l, use kernel methods (AF_ALG)
The default compile (see A.2.) includes the 'afalg' engine, which provides access to the Linux Kernel Crypto API (AF_ALG) method implementations. Details of the available kernel crypto methods can be found in /proc/crypto:
    $ cat /proc/crypto | grep '^name'
    name         : cbc(aes)
    name         : ecb(aes)
    name         : aes
    name         : crc32c
    name         : crct10dif
    name         : pkcs1pad(rsa,sha256)
    name         : hmac(sha256)
    name         : hmac(sha1)
    name         : lzo-rle
    name         : lzo-rle
    name         : lzo
    name         : lzo
    name         : zlib-deflate
    name         : deflate
    name         : deflate
    name         : sha224
    name         : sha256
    name         : sha1
    name         : md5
    name         : ecb(cipher_null)
    name         : digest_null
    name         : compress_null
    name         : cipher_null
    name         : rsa
    name         : dh
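Each /proc/crypto entry also carries `driver` and `priority` fields, which indicate which kernel implementation backs a given name. The sketch below lists them side by side; the helper name `crypto_drivers` is hypothetical, and it assumes the usual "key : value" layout with `name`, `driver` and `priority` appearing in that order within each entry.

```shell
# crypto_drivers: read /proc/crypto-style text on stdin and print
# "name driver priority" for each entry (hypothetical helper).
# Assumes "key : value" lines with name, driver and priority in that order.
crypto_drivers() {
    awk -F ' *: *' '
        $1 == "name"     { name = $2 }
        $1 == "driver"   { driver = $2 }
        $1 == "priority" { print name, driver, $2 }
    '
}

# Usage on Linux:
# crypto_drivers < /proc/crypto
```

A name that appears more than once (as `lzo` and `deflate` do above) should then show a distinct driver for each entry.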
However the 'afalg' engine only supports a subset:
    $ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl engine afalg -c
    (afalg) AFALG engine support
     [AES-128-CBC]
So this implementation only supports the kernel methods for AES-128 CBC:
    $ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -engine afalg -evp aes-128-cbc
    engine "afalg" set.
    Doing aes-128-cbc for 3s on 16 size blocks: 89213 aes-128-cbc's in 0.25s
    Doing aes-128-cbc for 3s on 64 size blocks: 89304 aes-128-cbc's in 0.27s
    Doing aes-128-cbc for 3s on 256 size blocks: 84505 aes-128-cbc's in 0.23s
    Doing aes-128-cbc for 3s on 1024 size blocks: 83619 aes-128-cbc's in 0.24s
    Doing aes-128-cbc for 3s on 8192 size blocks: 53771 aes-128-cbc's in 0.16s
    Doing aes-128-cbc for 3s on 16384 size blocks: 33297 aes-128-cbc's in 0.11s
    OpenSSL 1.1.0l 10 Sep 2019
    built on: reproducible build, date unspecified
    options:bn(64,32) rc4(4x,int) des(long) aes(partial) idea(int) blowfish(ptr)
    compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\"" -Wa,--noexecstack
    The 'numbers' are in 1000s of bytes per second processed.
    type                 16 bytes    64 bytes   256 bytes  1024 bytes  8192 bytes 16384 bytes
    aes-128-cbc          5709.63k   21168.36k   94057.74k  356774.40k 2753075.20k 4959436.80k
In this version of OpenSSL the -evp option is required to use the accelerated implementation; without it, the normal version of the method is used:
    $ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -engine afalg aes-128-cbc
    engine "afalg" set.
    Doing aes-128 cbc for 3s on 16 size blocks: 1276875 aes-128 cbc's in 2.99s
    Doing aes-128 cbc for 3s on 64 size blocks: 333337 aes-128 cbc's in 3.00s
    Doing aes-128 cbc for 3s on 256 size blocks: 84598 aes-128 cbc's in 2.98s
    Doing aes-128 cbc for 3s on 1024 size blocks: 44947 aes-128 cbc's in 3.00s
    Doing aes-128 cbc for 3s on 8192 size blocks: 5658 aes-128 cbc's in 2.99s
    Doing aes-128 cbc for 3s on 16384 size blocks: 2828 aes-128 cbc's in 3.00s
    OpenSSL 1.1.0l 10 Sep 2019
    built on: reproducible build, date unspecified
    options:bn(64,32) rc4(4x,int) des(long) aes(partial) idea(int) blowfish(ptr)
    compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\"" -Wa,--noexecstack
    The 'numbers' are in 1000s of bytes per second processed.
    type                 16 bytes    64 bytes   256 bytes  1024 bytes  8192 bytes 16384 bytes
    aes-128 cbc          6832.78k    7111.19k    7267.48k   15341.91k   15501.78k   15444.65k
So the difference for AES-128 CBC is very significant: ~15 MB/s normally and ~2.7 GB/s with the kernel method.
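The reported figures follow directly from the "Doing ..." lines: operations times block size, divided by the time shown, in 1000s of bytes. The sketch below reproduces that arithmetic for the two 8,192 byte runs; the helper name `throughput_k` is made up for illustration.

```shell
# throughput_k COUNT BLOCK_BYTES SECONDS: reproduce the figure that
# `openssl speed` reports, in 1000s of bytes per second
# (hypothetical helper for illustration).
throughput_k() {
    awk -v n="$1" -v size="$2" -v secs="$3" \
        'BEGIN { printf "%.2fk\n", n * size / secs / 1000 }'
}

throughput_k 53771 8192 0.16   # 'afalg' run with -evp
throughput_k 5658 8192 2.99    # run without -evp
```

Note that the 'afalg' run divides by 0.16s rather than by roughly 3s. As documented for `openssl speed`, the default divisor is CPU user time and the `-elapsed` option switches to wall-clock time, so time spent inside the kernel may not be counted here and the ~2.7 GB/s figure could overstate the real throughput. Whether `-elapsed` changes the picture on this system has not been tested.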
Sadly this isn't one of our comparison methods, and none of those are accelerated by this engine. Still, let's collect figures for them:
    $ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -engine afalg -evp aes-256-cbc
    engine "afalg" set.
    Doing aes-256-cbc for 3s on 16 size blocks: 855812 aes-256-cbc's in 2.96s
    Doing aes-256-cbc for 3s on 64 size blocks: 234527 aes-256-cbc's in 3.00s
    Doing aes-256-cbc for 3s on 256 size blocks: 59972 aes-256-cbc's in 3.00s
    Doing aes-256-cbc for 3s on 1024 size blocks: 32368 aes-256-cbc's in 3.00s
    Doing aes-256-cbc for 3s on 8192 size blocks: 4048 aes-256-cbc's in 2.98s
    Doing aes-256-cbc for 3s on 16384 size blocks: 2038 aes-256-cbc's in 3.00s
    OpenSSL 1.1.0l 10 Sep 2019
    built on: reproducible build, date unspecified
    options:bn(64,32) rc4(4x,int) des(long) aes(partial) idea(int) blowfish(ptr)
    compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\"" -Wa,--noexecstack
    The 'numbers' are in 1000s of bytes per second processed.
    type                 16 bytes    64 bytes   256 bytes  1024 bytes  8192 bytes 16384 bytes
    aes-256-cbc          4626.01k    5003.24k    5117.61k   11048.28k   11127.92k   11130.20k

    $ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -engine afalg -evp idea
    engine "afalg" set.
    Doing idea-cbc for 3s on 16 size blocks: 1513297 idea-cbc's in 3.00s
    Doing idea-cbc for 3s on 64 size blocks: 431592 idea-cbc's in 3.00s
    Doing idea-cbc for 3s on 256 size blocks: 111836 idea-cbc's in 3.00s
    Doing idea-cbc for 3s on 1024 size blocks: 28037 idea-cbc's in 2.98s
    Doing idea-cbc for 3s on 8192 size blocks: 3536 idea-cbc's in 3.00s
    Doing idea-cbc for 3s on 16384 size blocks: 1769 idea-cbc's in 3.00s
    OpenSSL 1.1.0l 10 Sep 2019
    built on: reproducible build, date unspecified
    options:bn(64,32) rc4(4x,int) des(long) aes(partial) idea(int) blowfish(ptr)
    compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\"" -Wa,--noexecstack
    The 'numbers' are in 1000s of bytes per second processed.
    type                 16 bytes    64 bytes   256 bytes  1024 bytes  8192 bytes 16384 bytes
    idea-cbc             8070.92k    9207.30k    9543.34k    9634.19k    9655.64k    9661.10k

    $ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -engine afalg -evp md5
    engine "afalg" set.
    Doing md5 for 3s on 16 size blocks: 1085845 md5's in 3.00s
    Doing md5 for 3s on 64 size blocks: 895226 md5's in 2.98s
    Doing md5 for 3s on 256 size blocks: 587697 md5's in 3.00s
    Doing md5 for 3s on 1024 size blocks: 243445 md5's in 2.99s
    Doing md5 for 3s on 8192 size blocks: 38209 md5's in 3.00s
    Doing md5 for 3s on 16384 size blocks: 19331 md5's in 2.98s
    OpenSSL 1.1.0l 10 Sep 2019
    built on: reproducible build, date unspecified
    options:bn(64,32) rc4(4x,int) des(long) aes(partial) idea(int) blowfish(ptr)
    compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\"" -Wa,--noexecstack
    The 'numbers' are in 1000s of bytes per second processed.
    type                 16 bytes    64 bytes   256 bytes  1024 bytes  8192 bytes 16384 bytes
    md5                  5791.17k   19226.33k   50150.14k   83373.81k  104336.04k  106281.58k

    $ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -engine afalg -evp sha1
    engine "afalg" set.
    Doing sha1 for 3s on 16 size blocks: 806595 sha1's in 3.00s
    Doing sha1 for 3s on 64 size blocks: 580960 sha1's in 2.98s
    Doing sha1 for 3s on 256 size blocks: 323753 sha1's in 3.00s
    Doing sha1 for 3s on 1024 size blocks: 115538 sha1's in 2.98s
    Doing sha1 for 3s on 8192 size blocks: 16687 sha1's in 3.00s
    Doing sha1 for 3s on 16384 size blocks: 8434 sha1's in 3.00s
    OpenSSL 1.1.0l 10 Sep 2019
    built on: reproducible build, date unspecified
    options:bn(64,32) rc4(4x,int) des(long) aes(partial) idea(int) blowfish(ptr)
    compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\"" -Wa,--noexecstack
    The 'numbers' are in 1000s of bytes per second processed.
    type                 16 bytes    64 bytes   256 bytes  1024 bytes  8192 bytes 16384 bytes
    sha1                 4301.84k   12476.99k   27626.92k   39701.65k   45566.63k   46060.89k

    $ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -engine afalg -evp sha256
    engine "afalg" set.
    Doing sha256 for 3s on 16 size blocks: 510554 sha256's in 3.00s
    Doing sha256 for 3s on 64 size blocks: 322074 sha256's in 3.00s
    Doing sha256 for 3s on 256 size blocks: 168798 sha256's in 2.98s
    Doing sha256 for 3s on 1024 size blocks: 56691 sha256's in 3.00s
    Doing sha256 for 3s on 8192 size blocks: 7853 sha256's in 2.99s
    Doing sha256 for 3s on 16384 size blocks: 3957 sha256's in 2.99s
    OpenSSL 1.1.0l 10 Sep 2019
    built on: reproducible build, date unspecified
    options:bn(64,32) rc4(4x,int) des(long) aes(partial) idea(int) blowfish(ptr)
    compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\"" -Wa,--noexecstack
    The 'numbers' are in 1000s of bytes per second processed.
    type                 16 bytes    64 bytes   256 bytes  1024 bytes  8192 bytes 16384 bytes
    sha256               2722.95k    6870.91k   14500.77k   19350.53k   21515.64k   21682.77k

    $ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -engine afalg -evp sha512
    engine "afalg" set.
    Doing sha512 for 3s on 16 size blocks: 187900 sha512's in 3.00s
    Doing sha512 for 3s on 64 size blocks: 187783 sha512's in 3.00s
    Doing sha512 for 3s on 256 size blocks: 69529 sha512's in 2.98s
    Doing sha512 for 3s on 1024 size blocks: 24333 sha512's in 3.00s
    Doing sha512 for 3s on 8192 size blocks: 3433 sha512's in 3.00s
    Doing sha512 for 3s on 16384 size blocks: 1732 sha512's in 3.00s
    OpenSSL 1.1.0l 10 Sep 2019
    built on: reproducible build, date unspecified
    options:bn(64,32) rc4(4x,int) des(long) aes(partial) idea(int) blowfish(ptr)
    compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\"" -Wa,--noexecstack
    The 'numbers' are in 1000s of bytes per second processed.
    type                 16 bytes    64 bytes   256 bytes  1024 bytes  8192 bytes 16384 bytes
    sha512               1002.13k    4006.04k    5972.96k    8305.66k    9374.38k    9459.03k
Sadly no improvement here.
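The six runs above differ only in the method name, so they can be generated from a list. The sketch below prints the commands rather than running them, using the in-tree invocation from this section; the helper name `speed_cmds` is hypothetical.

```shell
# speed_cmds ENGINE: print (not run) the in-tree `openssl speed` command
# for each comparison method used in this section (hypothetical helper).
speed_cmds() {
    for m in aes-256-cbc idea md5 sha1 sha256 sha512; do
        echo 'LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -engine '"$1"' -evp '"$m"
    done
}

speed_cmds afalg
```

Piping the output to `sh` from the top of the OpenSSL build directory should run the whole series; printing first makes it easy to check the commands before spending three seconds per block size on each.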
A.4. VIA Luke, OpenSSL 1.1.0l, use VIA Padlock
The default compile (see A.2.) includes the 'padlock' engine, which provides access to the VIA PadLock (Wikipedia) method implementations utilizing the hardware acceleration in this processor:
    $ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl engine padlock -c
    (padlock) VIA PadLock (no-RNG, ACE)
     [AES-128-ECB, AES-128-CBC, AES-128-CFB, AES-128-OFB, AES-128-CTR, AES-192-ECB, AES-192-CBC, AES-192-CFB, AES-192-OFB, AES-192-CTR, AES-256-ECB, AES-256-CBC, AES-256-CFB, AES-256-OFB, AES-256-CTR]
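The bracketed capability list printed by `openssl engine <id> -c` can be checked for a particular method before benchmarking it. This is a minimal sketch, assuming the list is comma-separated inside a single pair of square brackets as in the listings in this section; the helper name `engine_supports` is hypothetical.

```shell
# engine_supports LISTING NAME: succeed if NAME appears in the bracketed
# capability list of an `openssl engine <id> -c` listing
# (hypothetical helper; assumes one "[A, B, C]" list in LISTING).
engine_supports() {
    list=${1#*\[}        # drop everything up to the opening bracket
    list=${list%\]*}     # drop the closing bracket and anything after it
    list=$(printf '%s' "$list" | tr -d ' \n')
    case ",$list," in
        *",$2,"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Usage, from the top of the OpenSSL build directory:
# listing=$(LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl engine padlock -c)
# engine_supports "$listing" AES-256-CBC && echo "AES-256-CBC is offered by this engine"
```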
The VIA C3 Nehemiah core in the VIA Luke processor only supports AES in its version of ACE. Unlike the 'afalg' engine, however, the 'padlock' engine supports more of the AES methods, including our selected comparison method:
    $ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -engine padlock -evp aes-256-cbc
    engine "padlock" set.
    Doing aes-256-cbc for 3s on 16 size blocks: 7997043 aes-256-cbc's in 2.98s
    Doing aes-256-cbc for 3s on 64 size blocks: 6521328 aes-256-cbc's in 3.00s
    Doing aes-256-cbc for 3s on 256 size blocks: 3643683 aes-256-cbc's in 3.00s
    Doing aes-256-cbc for 3s on 1024 size blocks: 1325294 aes-256-cbc's in 3.00s
    Doing aes-256-cbc for 3s on 8192 size blocks: 189377 aes-256-cbc's in 2.98s
    Doing aes-256-cbc for 3s on 16384 size blocks: 96281 aes-256-cbc's in 3.00s
    OpenSSL 1.1.0l 10 Sep 2019
    built on: reproducible build, date unspecified
    options:bn(64,32) rc4(4x,int) des(long) aes(partial) idea(int) blowfish(ptr)
    compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\"" -Wa,--noexecstack
    The 'numbers' are in 1000s of bytes per second processed.
    type                 16 bytes    64 bytes   256 bytes  1024 bytes  8192 bytes 16384 bytes
    aes-256-cbc         42937.14k  139121.66k  310927.62k  452367.02k  520596.10k  525822.63k
Note that the -evp option is required to use the accelerated implementation; without it, the normal version of the method is used:
    $ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -engine padlock aes-256-cbc
    engine "padlock" set.
    Doing aes-256 cbc for 3s on 16 size blocks: 922597 aes-256 cbc's in 2.99s
    Doing aes-256 cbc for 3s on 64 size blocks: 238365 aes-256 cbc's in 2.99s
    Doing aes-256 cbc for 3s on 256 size blocks: 58655 aes-256 cbc's in 2.93s
    Doing aes-256 cbc for 3s on 1024 size blocks: 32214 aes-256 cbc's in 2.98s
    Doing aes-256 cbc for 3s on 8192 size blocks: 4076 aes-256 cbc's in 3.00s
    Doing aes-256 cbc for 3s on 16384 size blocks: 2039 aes-256 cbc's in 3.00s
    OpenSSL 1.1.0l 10 Sep 2019
    built on: reproducible build, date unspecified
    options:bn(64,32) rc4(4x,int) des(long) aes(partial) idea(int) blowfish(ptr)
    compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\"" -Wa,--noexecstack
    The 'numbers' are in 1000s of bytes per second processed.
    type                 16 bytes    64 bytes   256 bytes  1024 bytes  8192 bytes 16384 bytes
    aes-256 cbc          4936.97k    5102.13k    5124.81k   11069.51k   11130.20k   11135.66k
This shows the difference the hardware makes at 8,192 byte blocks: 520.6 MB/s with hardware and 11.1 MB/s without.
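The MB/s values are the reported "k" figures divided by 1000, taking 1 MB as 10^6 bytes. A small sketch of that conversion follows; the helper name `to_mbps` is made up for illustration.

```shell
# to_mbps FIGURE: convert an `openssl speed` figure (1000s of bytes per
# second, e.g. "520596.10k") to MB/s, with 1 MB = 10^6 bytes
# (hypothetical helper for illustration).
to_mbps() {
    awk -v v="$1" 'BEGIN { sub(/k$/, "", v); printf "%.1f MB/s\n", v / 1000 }'
}

to_mbps 520596.10k   # PadLock, aes-256-cbc with -evp, 8192 byte blocks
to_mbps 11130.20k    # without -evp, 8192 byte blocks
```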
Doing this for each of the other methods in the comparison:
    $ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -engine padlock -evp idea
    engine "padlock" set.
    Doing idea-cbc for 3s on 16 size blocks: 1513517 idea-cbc's in 3.00s
    Doing idea-cbc for 3s on 64 size blocks: 428713 idea-cbc's in 2.98s
    Doing idea-cbc for 3s on 256 size blocks: 111843 idea-cbc's in 3.00s
    Doing idea-cbc for 3s on 1024 size blocks: 28216 idea-cbc's in 3.00s
    Doing idea-cbc for 3s on 8192 size blocks: 3515 idea-cbc's in 2.98s
    Doing idea-cbc for 3s on 16384 size blocks: 1758 idea-cbc's in 2.99s
    OpenSSL 1.1.0l 10 Sep 2019
    built on: reproducible build, date unspecified
    options:bn(64,32) rc4(4x,int) des(long) aes(partial) idea(int) blowfish(ptr)
    compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\"" -Wa,--noexecstack
    The 'numbers' are in 1000s of bytes per second processed.
    type                 16 bytes    64 bytes   256 bytes  1024 bytes  8192 bytes 16384 bytes
    idea-cbc             8072.09k    9207.26k    9543.94k    9631.06k    9662.71k    9633.13k

    $ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -engine padlock -evp md5
    engine "padlock" set.
    Doing md5 for 3s on 16 size blocks: 1078848 md5's in 2.98s
    Doing md5 for 3s on 64 size blocks: 900872 md5's in 3.00s
    Doing md5 for 3s on 256 size blocks: 587686 md5's in 3.00s
    Doing md5 for 3s on 1024 size blocks: 244869 md5's in 3.00s
    Doing md5 for 3s on 8192 size blocks: 37962 md5's in 2.98s
    Doing md5 for 3s on 16384 size blocks: 19446 md5's in 3.00s
    OpenSSL 1.1.0l 10 Sep 2019
    built on: reproducible build, date unspecified
    options:bn(64,32) rc4(4x,int) des(long) aes(partial) idea(int) blowfish(ptr)
    compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\"" -Wa,--noexecstack
    The 'numbers' are in 1000s of bytes per second processed.
    type                 16 bytes    64 bytes   256 bytes  1024 bytes  8192 bytes 16384 bytes
    md5                  5792.47k   19218.60k   50149.21k   83581.95k  104357.28k  106201.09k

    $ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -engine padlock -evp sha1
    engine "padlock" set.
    Doing sha1 for 3s on 16 size blocks: 801140 sha1's in 2.98s
    Doing sha1 for 3s on 64 size blocks: 586278 sha1's in 3.00s
    Doing sha1 for 3s on 256 size blocks: 324677 sha1's in 3.00s
    Doing sha1 for 3s on 1024 size blocks: 116434 sha1's in 3.00s
    Doing sha1 for 3s on 8192 size blocks: 16486 sha1's in 2.97s
    Doing sha1 for 3s on 16384 size blocks: 8384 sha1's in 2.98s
    OpenSSL 1.1.0l 10 Sep 2019
    built on: reproducible build, date unspecified
    options:bn(64,32) rc4(4x,int) des(long) aes(partial) idea(int) blowfish(ptr)
    compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\"" -Wa,--noexecstack
    The 'numbers' are in 1000s of bytes per second processed.
    type                 16 bytes    64 bytes   256 bytes  1024 bytes  8192 bytes 16384 bytes
    sha1                 4301.42k   12507.26k   27705.77k   39742.81k   45472.50k   46095.12k

    $ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -engine padlock -evp sha256
    engine "padlock" set.
    Doing sha256 for 3s on 16 size blocks: 510056 sha256's in 2.99s
    Doing sha256 for 3s on 64 size blocks: 321581 sha256's in 3.00s
    Doing sha256 for 3s on 256 size blocks: 169890 sha256's in 3.00s
    Doing sha256 for 3s on 1024 size blocks: 55999 sha256's in 2.97s
    Doing sha256 for 3s on 8192 size blocks: 7859 sha256's in 3.00s
    Doing sha256 for 3s on 16384 size blocks: 3960 sha256's in 3.00s
    OpenSSL 1.1.0l 10 Sep 2019
    built on: reproducible build, date unspecified
    options:bn(64,32) rc4(4x,int) des(long) aes(partial) idea(int) blowfish(ptr)
    compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\"" -Wa,--noexecstack
    The 'numbers' are in 1000s of bytes per second processed.
    type                 16 bytes    64 bytes   256 bytes  1024 bytes  8192 bytes 16384 bytes
    sha256               2729.40k    6860.39k   14497.28k   19307.40k   21460.31k   21626.88k

    $ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -engine padlock -evp sha512
    engine "padlock" set.
Doing sha512 for 3s on 16 size blocks: 186202 sha512's in 2.98s Doing sha512 for 3s on 64 size blocks: 187168 sha512's in 3.00s Doing sha512 for 3s on 256 size blocks: 69920 sha512's in 3.00s Doing sha512 for 3s on 1024 size blocks: 24184 sha512's in 2.98s Doing sha512 for 3s on 8192 size blocks: 3411 sha512's in 2.98s Doing sha512 for 3s on 16384 size blocks: 1732 sha512's in 3.00s OpenSSL 1.1.0l 10 Sep 2019 built on: reproducible build, date unspecified options:bn(64,32) rc4(4x,int) des(long) aes(partial) idea(int) blowfish(ptr) compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\"" -Wa,--noexecstack The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes sha512 999.74k 3992.92k 5966.51k 8310.21k 9376.82k 9459.03k
As expected, the other comparison methods show no acceleration with the 'padlock' engine.
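The throughput column can be reproduced from the raw counts that 'speed' prints. A minimal Python sketch, assuming the figure is simply operations × block size ÷ measured seconds, expressed in 1000s of bytes per second (the function name is ours, not part of OpenSSL):

```python
def throughput_k(operations: int, block_size: int, seconds: float) -> float:
    """Return throughput in 1000s of bytes per second, the unit 'speed' reports."""
    return operations * block_size / seconds / 1000.0


# Counts taken from the 'padlock' runs above (8192 byte blocks).
idea = throughput_k(3515, 8192, 2.98)
md5 = throughput_k(37962, 8192, 2.98)
print(f"idea-cbc: {idea:.2f}k")  # 9662.71k, matching the idea-cbc row
print(f"md5:      {md5:.2f}k")   # 104357.28k, matching the md5 row
```

Both values agree with the 8192 byte column of the runs above, which suggests the summary rows carry no information beyond the per-block-size counts and timings.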
B. VIA Luke running Debian Linux 11, OpenSSL 3.0.2
The second generation of VIA's Corefusion (Wikipedia) x86 processor, VIA Luke, features a VIA C3 Nehemiah core with the VIA PadLock (Wikipedia) cryptographic accelerator. The PadLock implementation in Luke provides a hardware Random Number Generator (RNG) and the Advanced Cryptography Engine (ACE) supporting AES (Advanced Encryption Standard; Wikipedia).
OpenSSL implemented a 'padlock' engine to access the ACE acceleration of AES back in 2005 with the release of OpenSSL 0.9.8. The VIA C7 enhanced ACE, adding SHA (Secure Hash Algorithms) and PMM (PadLock Montgomery Multiplier); support for these was added to OpenSSL in 2006. Some Zhaoxin processors feature PadLock with SM3 (Wikipedia) and SM4 (SM4) added.
VIA Luke @ 1.0 GHz | ||||||
---|---|---|---|---|---|---|
OpenSSL 3.0.2 on Debian Linux 11 for x86 | ||||||
Method | AES-256 CBC | IDEA CBC | MD5 | SHA-1 | SHA-256 | SHA-512 |
no-asm | 6,916.47k | 9,662.71k | 78,665.19k | 36,050.26k | 12,125.81k | 1,829.55k |
asm | 11,127.47k | 9,655.64k | 103,931.19k | 45,509.29k | 21,493.73k | 9,368.92k |
Kernel | 2,033,909.76k | 9,627.65k | 104,328.02k | 45,512.02k | 21,442.15k | 9,366.19k |
Padlock | 515,844.78k | 9,658.37k | 103,765.33k | 45,520.21k | 21,435.73k | 9,368.92k |
Notes:
- While OpenSSL 3.0.2 changes the APIs, the command-line interface remains the same.
- OpenSSL 3.x considers IDEA a "legacy" method. To include "legacy" methods in "speed" runs use the -provider legacy -provider default options.
- The 'afalg' engine in OpenSSL 3.0.2 supports AES-128, AES-192 and AES-256 CBC (see B.3.), so unlike with OpenSSL 1.1.0l the Kernel row shows an improvement for AES-256 CBC. For AES-128 CBC, 'afalg' increases throughput from ~15 MB/s to ~2.7 GB/s for 8,192 byte blocks on this system.
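To put the table in proportion, each row can be expressed as a speed-up over the no-asm baseline. The sketch below hard-codes two columns from the table above; the dictionary layout and the function name are illustrative choices, not anything OpenSSL provides.

```python
# Throughput in 1000s of bytes/s for 8,192 byte blocks, copied from the table above.
RESULTS = {
    "no-asm":  {"AES-256 CBC": 6916.47,    "SHA-256": 12125.81},
    "asm":     {"AES-256 CBC": 11127.47,   "SHA-256": 21493.73},
    "Kernel":  {"AES-256 CBC": 2033909.76, "SHA-256": 21442.15},
    "Padlock": {"AES-256 CBC": 515844.78,  "SHA-256": 21435.73},
}


def speedup(method: str, column: str, baseline: str = "no-asm") -> float:
    """Ratio of a method's throughput to the baseline's for one column."""
    return RESULTS[method][column] / RESULTS[baseline][column]


for method in RESULTS:
    row = ", ".join(f"{col}: {speedup(method, col):.1f}x" for col in RESULTS[method])
    print(f"{method:8s} {row}")
```

Seen this way, the assembler roughly doubles SHA-256 throughput regardless of engine, while the AES-256 CBC column is where the engines separate.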
B.1. VIA Luke, OpenSSL 3.0.2, C methods only (no-asm)
Downloading the source distribution from the OpenSSL site and building with the no assembler option ($ ./config no-asm) gives a build using the portable C implementations of the methods.
Running the OpenSSL speed test gives ($ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -provider-path ./providers/ -provider legacy -provider default):
version: 3.0.2 built on: Sun May 1 13:04:48 2022 UTC options: bn(64,32) compiler: gcc -fPIC -pthread -m32 -Wall -O3 -fomit-frame-pointer -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_BUILDING_OPENSSL -DNDEBUG CPUINFO: N/A The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes mdc2 1170.95k 1337.15k 1385.13k 1398.10k 1403.56k 1401.99k md4 5395.54k 18021.31k 48302.34k 83558.74k 105521.15k 106501.46k md5 5088.44k 16306.17k 40686.59k 64930.47k 78665.19k 79335.95k sha1 3698.00k 10608.15k 22751.79k 31770.38k 36050.26k 36301.48k rmd160 3510.66k 9560.31k 19649.45k 26672.81k 29721.34k 29975.02k sha256 2098.78k 4985.83k 8995.70k 11200.98k 12125.81k 12173.31k sha512 222.62k 891.72k 1221.41k 1643.86k 1829.55k 1839.80k whirlpool 1011.46k 2198.19k 3729.83k 4519.94k 4818.99k 4838.74k hmac(md5) 3755.22k 12565.82k 34336.85k 60485.41k 77701.12k 78888.96k des-cbc 7854.77k 8906.50k 9268.14k 9356.97k 9390.56k 9374.07k des-ede3 3146.78k 3299.69k 3345.83k 3358.24k 3361.45k 3358.72k rc4 23275.45k 31302.89k 34076.84k 34981.28k 35233.79k 35187.37k idea-cbc 8044.02k 9173.76k 9541.03k 9630.72k 9662.71k 9648.97k seed-cbc 11114.26k 13166.34k 13754.22k 14015.44k 14024.70k 13991.94k rc2-cbc 6113.20k 6845.26k 7035.31k 7088.13k 7082.38k 7118.00k blowfish 12537.10k 15413.68k 16277.32k 16642.41k 16708.95k 16684.37k cast-cbc 9889.85k 11743.85k 12360.45k 12531.71k 12579.39k 12561.07k aes-128-cbc 8048.27k 9040.85k 9332.57k 9411.24k 9437.29k 9420.80k aes-192-cbc 6972.55k 7686.14k 7889.63k 7961.94k 7981.74k 7968.09k aes-256-cbc 6146.59k 6692.78k 6861.23k 6903.13k 6916.47k 6908.59k camellia-128-cbc 11408.51k 13525.30k 14158.86k 14340.44k 14396.07k 14363.31k camellia-192-cbc 9265.28k 10571.78k 10997.08k 11112.81k 11138.39k 11119.27k camellia-256-cbc 9264.07k 10574.88k 10995.11k 11116.93k 11138.39k 11122.43k ghash 4173.53k 4440.10k 4518.40k 4540.07k 4546.56k 4546.83k rand 728.00k 2201.17k 4438.57k 5941.95k 6594.56k 6640.98k 
sign verify sign/s verify/s rsa 512 bits 0.003164s 0.000250s 316.0 3996.5 rsa 1024 bits 0.018032s 0.000807s 55.5 1239.1 rsa 2048 bits 0.112921s 0.002828s 8.9 353.6 rsa 3072 bits 0.339333s 0.006157s 2.9 162.4 rsa 4096 bits 0.743571s 0.010040s 1.3 99.6 rsa 7680 bits 4.446667s 0.035406s 0.2 28.2 sign verify sign/s verify/s dsa 512 bits 0.004443s 0.003235s 225.1 309.1 dsa 1024 bits 0.011718s 0.010071s 85.3 99.3 dsa 2048 bits 0.037879s 0.034687s 26.4 28.8 sign verify sign/s verify/s 160 bits ecdsa (secp160r1) 0.0118s 0.0095s 84.5 105.7 192 bits ecdsa (nistp192) 0.0121s 0.0092s 82.4 108.8 224 bits ecdsa (nistp224) 0.0159s 0.0121s 62.9 82.8 256 bits ecdsa (nistp256) 0.0178s 0.0134s 56.3 74.4 384 bits ecdsa (nistp384) 0.0469s 0.0330s 21.3 30.3 521 bits ecdsa (nistp521) 0.1487s 0.0953s 6.7 10.5 163 bits ecdsa (nistk163) 0.0099s 0.0188s 101.4 53.3 233 bits ecdsa (nistk233) 0.0185s 0.0357s 54.0 28.0 283 bits ecdsa (nistk283) 0.0332s 0.0642s 30.1 15.6 409 bits ecdsa (nistk409) 0.0753s 0.1463s 13.3 6.8 571 bits ecdsa (nistk571) 0.1728s 0.3353s 5.8 3.0 163 bits ecdsa (nistb163) 0.0106s 0.0203s 94.3 49.3 233 bits ecdsa (nistb233) 0.0203s 0.0392s 49.3 25.5 283 bits ecdsa (nistb283) 0.0368s 0.0714s 27.2 14.0 409 bits ecdsa (nistb409) 0.0849s 0.1656s 11.8 6.0 571 bits ecdsa (nistb571) 0.1971s 0.3830s 5.1 2.6 256 bits ecdsa (brainpoolP256r1) 0.0266s 0.0227s 37.6 44.1 256 bits ecdsa (brainpoolP256t1) 0.0266s 0.0207s 37.6 48.3 384 bits ecdsa (brainpoolP384r1) 0.0763s 0.0619s 13.1 16.2 384 bits ecdsa (brainpoolP384t1) 0.0759s 0.0561s 13.2 17.8 512 bits ecdsa (brainpoolP512r1) 0.1557s 0.1235s 6.4 8.1 512 bits ecdsa (brainpoolP512t1) 0.1552s 0.1116s 6.4 9.0 op op/s 160 bits ecdh (secp160r1) 0.0109s 91.4 192 bits ecdh (nistp192) 0.0111s 90.4 224 bits ecdh (nistp224) 0.0145s 69.1 256 bits ecdh (nistp256) 0.0163s 61.4 384 bits ecdh (nistp384) 0.0429s 23.3 521 bits ecdh (nistp521) 0.1386s 7.2 163 bits ecdh (nistk163) 0.0090s 111.4 233 bits ecdh (nistk233) 0.0172s 58.2 283 bits ecdh (nistk283) 
0.0310s 32.3 409 bits ecdh (nistk409) 0.0706s 14.2 571 bits ecdh (nistk571) 0.1621s 6.2 163 bits ecdh (nistb163) 0.0097s 102.9 233 bits ecdh (nistb233) 0.0189s 52.8 283 bits ecdh (nistb283) 0.0346s 28.9 409 bits ecdh (nistb409) 0.0805s 12.4 571 bits ecdh (nistb571) 0.1854s 5.4 256 bits ecdh (brainpoolP256r1) 0.0251s 39.8 256 bits ecdh (brainpoolP256t1) 0.0251s 39.8 384 bits ecdh (brainpoolP384r1) 0.0722s 13.8 384 bits ecdh (brainpoolP384t1) 0.0720s 13.9 512 bits ecdh (brainpoolP512r1) 0.1476s 6.8 512 bits ecdh (brainpoolP512t1) 0.1474s 6.8 253 bits ecdh (X25519) 0.0045s 223.7 448 bits ecdh (X448) 0.0198s 50.4 sign verify sign/s verify/s 253 bits EdDSA (Ed25519) 0.0017s 0.0052s 601.0 193.2 456 bits EdDSA (Ed448) 0.0084s 0.0221s 119.5 45.3 sign verify sign/s verify/s 256 bits SM2 (CurveSM2) 0.0266s 0.0196s 37.6 51.0 op op/s 2048 bits ffdh 0.3654s 2.7 3072 bits ffdh 1.1711s 0.9 4096 bits ffdh 2.5200s 0.4 6144 bits ffdh 8.6400s 0.1
These results form a baseline for the performance of the methods.
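To compare this baseline with later runs, it can help to turn the throughput rows into data. A minimal Python sketch, assuming one row per line as 'speed' prints it in a terminal (a method name followed by six figures ending in 'k'); the parser is our own illustration and deliberately ignores the public-key sections.

```python
import re

BLOCK_SIZES = (16, 64, 256, 1024, 8192, 16384)
# A method name (may contain spaces or parentheses) followed by six "<number>k" fields.
ROW = re.compile(r"^(?P<name>\S.*?)\s+(?P<vals>(?:\d+\.\d+k\s*){6})$")


def parse_speed_rows(text: str) -> dict:
    """Map method name -> {block size: throughput in 1000s of bytes/s}."""
    results = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            values = [float(v.rstrip("k")) for v in match.group("vals").split()]
            results[match.group("name")] = dict(zip(BLOCK_SIZES, values))
    return results


sample = """\
sha512 222.62k 891.72k 1221.41k 1643.86k 1829.55k 1839.80k
aes-256-cbc 6146.59k 6692.78k 6861.23k 6903.13k 6916.47k 6908.59k
"""
rows = parse_speed_rows(sample)
print(rows["aes-256-cbc"][8192])  # 6916.47, the no-asm figure used in the summary table
```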
B.2. VIA Luke, OpenSSL 3.0.2, assembler methods
Downloading the source distribution from the OpenSSL site and building with the default options (target linux-x86) should give a build with assembler methods enabled. However, the default build has a problem on our i686-compatible x86 processor:
$ LD_LIBRARY_PATH=`pwd` apps/openssl version
Illegal instruction
A little debugging suggests this is due to the use of 'ENDBR32' instructions (see assembly - How do old CPUs execute the new ENDBR64 and ENDBR32 instructions? - Stack Overflow), which are not supported by the VIA C3. These instructions are present in the assembler generated by OpenSSL. Since they are not supported by i386, i486, i586 and some i686 processors, it seems odd that OpenSSL generates them by default for both the regular x86 assembler and the explicit 386 assembler. Commenting out the line generating the ENDBR32 opcodes in 'crypto/perlasm/x86asm.pl' (sub ::endbranch, line 117: # &::data_byte(0xf3,0x0f,0x1e,0xfb);) is sufficient as a workaround to get a build that does not show this issue.
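One way to check whether a build still contains the offending opcode is to look for its byte encoding, F3 0F 1E FB (the bytes passed to data_byte in the line quoted above). The Python sketch below does a plain byte search. It is only a heuristic, since the same four bytes could appear in data or straddle instruction boundaries, and the helper names are ours.

```python
ENDBR32 = bytes([0xF3, 0x0F, 0x1E, 0xFB])


def count_endbr32(blob: bytes) -> int:
    """Count non-overlapping occurrences of the ENDBR32 byte sequence in blob."""
    return blob.count(ENDBR32)


def count_in_file(path: str) -> int:
    """Read a binary file (path supplied by the caller) and count ENDBR32 sequences."""
    with open(path, "rb") as handle:
        return count_endbr32(handle.read())


# Tiny self-check: ENDBR32 followed by a one-byte 'ret' (0xC3), twice.
sample = ENDBR32 + b"\xc3" + ENDBR32 + b"\xc3"
print(count_endbr32(sample))  # 2
```

Running count_in_file on the library from the default build and again on the patched build would give a rough before/after comparison; a disassembler remains the reliable check.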
With a workaround in place we can see the performance...
Running the OpenSSL speed test gives ($ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -provider-path ./providers/ -provider legacy -provider default):
version: 3.0.2 built on: Sat Apr 30 10:20:30 2022 UTC options: bn(64,32) compiler: gcc -fPIC -pthread -m32 -Wa,--noexecstack -Wall -O3 -fomit-frame-pointer -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_BUILDING_OPENSSL -DNDEBUG CPUINFO: OPENSSL_ia32cap=0x381bf3f:0x0 The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes mdc2 1200.92k 1371.68k 1416.45k 1428.82k 1432.23k 1430.87k md4 5257.54k 17644.57k 47449.86k 82751.49k 105796.95k 107249.66k md5 5257.45k 17681.34k 47454.98k 81704.28k 103931.19k 105414.66k sha1 4053.67k 12019.24k 27080.28k 39416.15k 45509.29k 45880.70k rmd160 3324.45k 9072.49k 18345.05k 24597.16k 27355.23k 27508.74k sha256 2652.97k 6705.41k 14340.10k 19297.56k 21493.73k 21550.42k sha512 993.46k 3970.23k 5952.60k 8320.77k 9368.92k 9445.54k whirlpool 1824.34k 4539.10k 8518.27k 10928.97k 11922.09k 11987.63k hmac(md5) 4105.93k 14171.82k 40664.40k 76169.90k 102596.61k 104666.45k des-cbc 8860.91k 10261.33k 10702.08k 10818.56k 10853.03k 10835.29k des-ede3 3463.89k 3677.75k 3723.00k 3748.39k 3731.61k 3751.22k rc4 24488.32k 36922.26k 43253.43k 45180.59k 45774.17k 45765.97k idea-cbc 8000.61k 9179.52k 9568.92k 9634.19k 9655.64k 9627.65k seed-cbc 11136.58k 13181.47k 13790.29k 14020.24k 14027.43k 14025.36k rc2-cbc 6005.72k 6670.74k 6861.74k 6912.00k 6922.24k 6924.97k blowfish 15662.13k 19262.18k 20483.18k 20859.80k 20930.56k 20914.34k cast-cbc 10123.87k 11847.43k 12347.25k 12572.94k 12622.26k 12571.99k aes-128-cbc 6036.35k 6679.00k 6935.13k 15294.02k 15444.65k 15433.73k aes-192-cbc 5087.79k 5637.65k 5821.82k 13020.90k 13088.09k 13035.97k aes-256-cbc 4452.21k 4830.24k 4928.51k 11047.17k 11127.47k 11119.27k camellia-128-cbc 10217.49k 14540.18k 16149.50k 16660.82k 16810.09k 16744.45k camellia-192-cbc 8600.76k 11301.48k 12345.30k 12644.01k 12735.83k 12697.60k camellia-256-cbc 8573.67k 11309.98k 12352.60k 12636.16k 12738.83k 12697.60k ghash 11669.10k 18477.14k 21751.61k 22697.30k 
23022.82k 23003.58k rand 624.36k 1801.88k 3367.44k 4291.24k 4685.06k 4686.59k sign verify sign/s verify/s rsa 512 bits 0.002285s 0.000195s 437.7 5128.6 rsa 1024 bits 0.014217s 0.000694s 70.3 1441.6 rsa 2048 bits 0.098333s 0.002579s 10.2 387.7 rsa 3072 bits 0.302727s 0.005672s 3.3 176.3 rsa 4096 bits 0.685333s 0.009980s 1.5 100.2 rsa 7680 bits 4.220000s 0.034567s 0.2 28.9 sign verify sign/s verify/s dsa 512 bits 0.003380s 0.002566s 295.9 389.7 dsa 1024 bits 0.009930s 0.008754s 100.7 114.2 dsa 2048 bits 0.034653s 0.032745s 28.9 30.5 sign verify sign/s verify/s 160 bits ecdsa (secp160r1) 0.0072s 0.0061s 138.8 165.2 192 bits ecdsa (nistp192) 0.0101s 0.0083s 98.9 120.0 224 bits ecdsa (nistp224) 0.0147s 0.0117s 68.2 85.5 256 bits ecdsa (nistp256) 0.0019s 0.0052s 528.9 193.8 384 bits ecdsa (nistp384) 0.0542s 0.0402s 18.5 24.9 521 bits ecdsa (nistp521) 0.1497s 0.1081s 6.7 9.3 163 bits ecdsa (nistk163) 0.0088s 0.0172s 113.9 58.2 233 bits ecdsa (nistk233) 0.0170s 0.0333s 58.8 30.0 283 bits ecdsa (nistk283) 0.0309s 0.0605s 32.4 16.5 409 bits ecdsa (nistk409) 0.0706s 0.1381s 14.2 7.2 571 bits ecdsa (nistk571) 0.1626s 0.3184s 6.2 3.1 163 bits ecdsa (nistb163) 0.0095s 0.0185s 105.6 54.0 233 bits ecdsa (nistb233) 0.0187s 0.0366s 53.4 27.3 283 bits ecdsa (nistb283) 0.0343s 0.0673s 29.2 14.9 409 bits ecdsa (nistb409) 0.0798s 0.1567s 12.5 6.4 571 bits ecdsa (nistb571) 0.1856s 0.3636s 5.4 2.8 256 bits ecdsa (brainpoolP256r1) 0.0194s 0.0165s 51.5 60.7 256 bits ecdsa (brainpoolP256t1) 0.0194s 0.0154s 51.6 65.1 384 bits ecdsa (brainpoolP384r1) 0.0542s 0.0439s 18.5 22.8 384 bits ecdsa (brainpoolP384t1) 0.0539s 0.0396s 18.6 25.2 512 bits ecdsa (brainpoolP512r1) 0.1250s 0.0979s 8.0 10.2 512 bits ecdsa (brainpoolP512t1) 0.1244s 0.0888s 8.0 11.3 op op/s 160 bits ecdh (secp160r1) 0.0067s 148.2 192 bits ecdh (nistp192) 0.0096s 104.3 224 bits ecdh (nistp224) 0.0139s 71.9 256 bits ecdh (nistp256) 0.0037s 268.4 384 bits ecdh (nistp384) 0.0515s 19.4 521 bits ecdh (nistp521) 0.1420s 7.0 163 bits 
ecdh (nistk163) 0.0083s 120.0 233 bits ecdh (nistk233) 0.0162s 61.9 283 bits ecdh (nistk283) 0.0294s 34.0 409 bits ecdh (nistk409) 0.0670s 14.9 571 bits ecdh (nistk571) 0.1549s 6.5 163 bits ecdh (nistb163) 0.0090s 110.9 233 bits ecdh (nistb233) 0.0178s 56.1 283 bits ecdh (nistb283) 0.0329s 30.4 409 bits ecdh (nistb409) 0.0766s 13.1 571 bits ecdh (nistb571) 0.1772s 5.6 256 bits ecdh (brainpoolP256r1) 0.0184s 54.4 256 bits ecdh (brainpoolP256t1) 0.0184s 54.4 384 bits ecdh (brainpoolP384r1) 0.0515s 19.4 384 bits ecdh (brainpoolP384t1) 0.0512s 19.5 512 bits ecdh (brainpoolP512r1) 0.1189s 8.4 512 bits ecdh (brainpoolP512t1) 0.1185s 8.4 253 bits ecdh (X25519) 0.0045s 223.8 448 bits ecdh (X448) 0.0199s 50.3 sign verify sign/s verify/s 253 bits EdDSA (Ed25519) 0.0015s 0.0051s 669.9 197.6 456 bits EdDSA (Ed448) 0.0075s 0.0220s 132.8 45.5 sign verify sign/s verify/s 256 bits SM2 (CurveSM2) 0.0195s 0.0146s 51.4 68.3 op op/s 2048 bits ffdh 0.3360s 3.0 3072 bits ffdh 1.0860s 0.9 4096 bits ffdh 2.5250s 0.4 6144 bits ffdh 8.3400s 0.1
With the assembler in place many methods have improved performance.
B.3. VIA Luke, OpenSSL 3.0.2, use kernel methods (AF_ALG)
The default compile (see B.2.) includes the 'afalg' engine, which provides access to the Linux Kernel Crypto API (AF_ALG) method implementations. Details of the available kernel crypto methods can be found in /proc/crypto:
$ cat /proc/crypto | grep '^name'
name : cbc(aes)
name : ecb(aes)
name : aes
name : crc32c
name : crct10dif
name : pkcs1pad(rsa,sha256)
name : hmac(sha256)
name : hmac(sha1)
name : lzo-rle
name : lzo-rle
name : lzo
name : lzo
name : zlib-deflate
name : deflate
name : deflate
name : sha224
name : sha256
name : sha1
name : md5
name : ecb(cipher_null)
name : digest_null
name : compress_null
name : cipher_null
name : rsa
name : dh
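The name lines alone don't say which implementation backs each method. Each /proc/crypto entry also carries fields such as driver and priority, with entries separated by blank lines, so a small parser can pair them up. The sketch below is illustrative: the sample text and its driver names are made up for the example rather than captured from this system.

```python
def parse_proc_crypto(text: str) -> list:
    """Split /proc/crypto style text into one dict of fields per algorithm entry."""
    entries = []
    current = {}
    for line in text.splitlines():
        if not line.strip():          # a blank line ends an entry
            if current:
                entries.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        entries.append(current)
    return entries


# Illustrative sample only; real entries carry more fields.
sample = """\
name         : cbc(aes)
driver       : cbc-aes-padlock
priority     : 300

name         : sha256
driver       : sha256-generic
priority     : 100
"""
for entry in parse_proc_crypto(sample):
    print(entry["name"], "->", entry["driver"])
```

On a live system the input would come from reading /proc/crypto directly.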
Note that on this system the kernel has loaded a module with support for VIA PadLock ACE, so the hardware acceleration may be in use:
$ lsmod | grep padlock
padlock_aes 16384 0
libaes 16384 1 padlock_aes
The 'afalg' engine in OpenSSL supports a subset of the available methods:
$ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl engine afalg -c
(afalg) AFALG engine support
 [AES-128-CBC, AES-192-CBC, AES-256-CBC]
00877BB7:error:1280006A:DSO support routines:dlfcn_bind_func:could not bind to the requested symbol name:crypto/dso/dso_dlfcn.c:188:symname(EVP_PKEY_base_id): /home/hamish/src/openssl-3.0.2-asm_no_endbr32/engines/afalg.so: undefined symbol: EVP_PKEY_base_id
00877BB7:error:1280006A:DSO support routines:DSO_bind_func:could not bind to the requested symbol name:crypto/dso/dso_lib.c:176:
Unlike our test with OpenSSL 1.1.0l, this version of the 'afalg' engine supports the AES-256 CBC method used in the comparisons. Also OpenSSL 3.0.2 uses the EVP methods by default in the openssl program, so a single run can give results for all the methods ($ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -provider-path ./providers/ -provider legacy -provider default -engine afalg):
version: 3.0.2 built on: Sat Apr 30 10:20:30 2022 UTC options: bn(64,32) compiler: gcc -fPIC -pthread -m32 -Wa,--noexecstack -Wall -O3 -fomit-frame-pointer -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_BUILDING_OPENSSL -DNDEBUG CPUINFO: OPENSSL_ia32cap=0x381bf3f:0x0 The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes mdc2 1203.49k 1368.03k 1416.28k 1429.16k 1433.60k 1430.18k md4 5224.81k 17555.18k 47432.85k 82940.92k 105802.41k 107071.36k md5 5256.77k 17709.38k 47522.56k 81770.15k 104328.02k 105551.19k sha1 4050.69k 12006.21k 27084.11k 39410.35k 45512.02k 45880.66k rmd160 3313.78k 9059.78k 18292.99k 24585.22k 27314.86k 27511.92k sha256 2643.44k 6703.43k 14320.90k 19273.73k 21442.15k 21588.65k sha512 992.24k 3964.20k 5947.99k 8280.06k 9366.19k 9445.54k whirlpool 1825.70k 4532.37k 8526.69k 10942.02k 11922.09k 11976.70k hmac(md5) 4127.19k 14260.58k 40742.23k 76256.04k 102768.64k 104748.37k des-cbc 8853.75k 10257.81k 10701.65k 10816.51k 10855.77k 10846.21k des-ede3 3462.86k 3664.04k 3734.67k 3735.21k 3728.58k 3741.01k rc4 24405.02k 36925.59k 43289.77k 45176.15k 45757.78k 45744.13k idea-cbc 8050.62k 9199.81k 9542.23k 9656.08k 9627.65k 9650.18k seed-cbc 11173.68k 13136.48k 13794.22k 14021.61k 14071.61k 13972.98k rc2-cbc 6004.93k 6675.95k 6861.48k 6912.34k 6924.97k 6924.97k blowfish 15661.21k 19324.82k 20470.53k 20824.41k 20925.10k 20897.85k cast-cbc 10140.27k 11779.85k 12382.72k 12537.47k 12580.18k 12571.99k aes-128-cbc 7395.12k 26822.40k 93369.60k 251783.53k 2702028.80k 6126523.73k aes-192-cbc 4927.35k 22058.67k 71679.12k 277274.48k 2596556.80k 3803194.51k aes-256-cbc 7853.33k 20938.19k 82461.39k 289389.46k 2033909.76k 5132615.68k camellia-128-cbc 10163.22k 14395.93k 16187.85k 16708.67k 16807.25k 16790.85k camellia-192-cbc 8520.49k 11297.22k 12348.05k 12608.56k 12730.37k 12761.99k camellia-256-cbc 8494.83k 11262.12k 12342.19k 12641.62k 12733.34k 12719.45k ghash 11647.89k 18520.58k 
21678.25k 22770.47k 22921.16k 22948.52k rand 628.16k 1800.77k 3364.98k 4306.62k 4674.90k 4702.21k sign verify sign/s verify/s rsa 512 bits 0.002285s 0.000195s 437.6 5135.9 rsa 1024 bits 0.014217s 0.000694s 70.3 1441.8 rsa 2048 bits 0.098333s 0.002579s 10.2 387.8 rsa 3072 bits 0.303030s 0.005672s 3.3 176.3 rsa 4096 bits 0.684667s 0.009980s 1.5 100.2 rsa 7680 bits 4.226667s 0.034567s 0.2 28.9 sign verify sign/s verify/s dsa 512 bits 0.003380s 0.002556s 295.9 391.3 dsa 1024 bits 0.009930s 0.008754s 100.7 114.2 dsa 2048 bits 0.034653s 0.032395s 28.9 30.9 sign verify sign/s verify/s 160 bits ecdsa (secp160r1) 0.0072s 0.0060s 138.6 167.3 192 bits ecdsa (nistp192) 0.0101s 0.0083s 98.8 120.6 224 bits ecdsa (nistp224) 0.0147s 0.0119s 68.1 84.3 256 bits ecdsa (nistp256) 0.0019s 0.0052s 530.3 193.7 384 bits ecdsa (nistp384) 0.0542s 0.0400s 18.4 25.0 521 bits ecdsa (nistp521) 0.1497s 0.1078s 6.7 9.3 163 bits ecdsa (nistk163) 0.0088s 0.0172s 114.0 58.3 233 bits ecdsa (nistk233) 0.0170s 0.0333s 58.8 30.0 283 bits ecdsa (nistk283) 0.0308s 0.0604s 32.4 16.5 409 bits ecdsa (nistk409) 0.0707s 0.1381s 14.1 7.2 571 bits ecdsa (nistk571) 0.1626s 0.3184s 6.2 3.1 163 bits ecdsa (nistb163) 0.0095s 0.0185s 105.7 53.9 233 bits ecdsa (nistb233) 0.0187s 0.0366s 53.6 27.3 283 bits ecdsa (nistb283) 0.0343s 0.0674s 29.2 14.8 409 bits ecdsa (nistb409) 0.0798s 0.1567s 12.5 6.4 571 bits ecdsa (nistb571) 0.1854s 0.3636s 5.4 2.8 256 bits ecdsa (brainpoolP256r1) 0.0194s 0.0164s 51.6 61.1 256 bits ecdsa (brainpoolP256t1) 0.0194s 0.0154s 51.6 65.0 384 bits ecdsa (brainpoolP384r1) 0.0542s 0.0435s 18.5 23.0 384 bits ecdsa (brainpoolP384t1) 0.0539s 0.0395s 18.5 25.3 512 bits ecdsa (brainpoolP512r1) 0.1250s 0.0980s 8.0 10.2 512 bits ecdsa (brainpoolP512t1) 0.1247s 0.0882s 8.0 11.3 op op/s 160 bits ecdh (secp160r1) 0.0067s 148.2 192 bits ecdh (nistp192) 0.0096s 104.2 224 bits ecdh (nistp224) 0.0139s 71.9 256 bits ecdh (nistp256) 0.0037s 268.4 384 bits ecdh (nistp384) 0.0515s 19.4 521 bits ecdh (nistp521) 
0.1420s 7.0 163 bits ecdh (nistk163) 0.0083s 120.2 233 bits ecdh (nistk233) 0.0162s 61.9 283 bits ecdh (nistk283) 0.0294s 34.0 409 bits ecdh (nistk409) 0.0671s 14.9 571 bits ecdh (nistk571) 0.1551s 6.4 163 bits ecdh (nistb163) 0.0090s 111.0 233 bits ecdh (nistb233) 0.0178s 56.1 283 bits ecdh (nistb283) 0.0329s 30.4 409 bits ecdh (nistb409) 0.0765s 13.1 571 bits ecdh (nistb571) 0.1774s 5.6 256 bits ecdh (brainpoolP256r1) 0.0184s 54.3 256 bits ecdh (brainpoolP256t1) 0.0184s 54.2 384 bits ecdh (brainpoolP384r1) 0.0515s 19.4 384 bits ecdh (brainpoolP384t1) 0.0513s 19.5 512 bits ecdh (brainpoolP512r1) 0.1190s 8.4 512 bits ecdh (brainpoolP512t1) 0.1185s 8.4 253 bits ecdh (X25519) 0.0045s 223.6 448 bits ecdh (X448) 0.0199s 50.4 sign verify sign/s verify/s 253 bits EdDSA (Ed25519) 0.0015s 0.0051s 670.5 196.1 456 bits EdDSA (Ed448) 0.0075s 0.0220s 133.0 45.5 sign verify sign/s verify/s 256 bits SM2 (CurveSM2) 0.0195s 0.0148s 51.4 67.6 op op/s 2048 bits ffdh 0.3360s 3.0 3072 bits ffdh 1.0870s 0.9 4096 bits ffdh 2.5225s 0.4 6144 bits ffdh 8.3400s 0.1
Here the kernel's use of PadLock acceleration is evident in the much higher throughput of the AES methods.
B.4. VIA Luke, OpenSSL 3.0.2, use VIA Padlock
The default compile (see B.2.) also supports the 'padlock' engine, which gives access to the VIA Padlock acceleration for AES:
$ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl engine padlock -c
(padlock) VIA PadLock (no-RNG, ACE)
 [AES-128-ECB, AES-128-CBC, AES-128-CFB, AES-128-OFB, AES-128-CTR, AES-192-ECB, AES-192-CBC, AES-192-CFB, AES-192-OFB, AES-192-CTR, AES-256-ECB, AES-256-CBC, AES-256-CFB, AES-256-OFB, AES-256-CTR]
008785B7:error:1280006A:DSO support routines:dlfcn_bind_func:could not bind to the requested symbol name:crypto/dso/dso_dlfcn.c:188:symname(EVP_PKEY_base_id): /home/hamish/src/openssl-3.0.2-asm_no_endbr32/engines/padlock.so: undefined symbol: EVP_PKEY_base_id
008785B7:error:1280006A:DSO support routines:DSO_bind_func:could not bind to the requested symbol name:crypto/dso/dso_lib.c:176:
The errors here look like those seen when the default provider is not loaded; I'm not sure why they happen in this case.
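Errors aside, the bracketed list in the 'engine -c' output is the part that matters when choosing comparison methods. A small Python sketch for pulling it out, assuming the capabilities always appear as one comma-separated list in square brackets as in the outputs above (the function name is ours):

```python
import re


def engine_capabilities(output: str) -> list:
    """Return the names inside the first [...] list of 'openssl engine -c' output."""
    match = re.search(r"\[([^\]]*)\]", output)
    if not match:
        return []
    return [name.strip() for name in match.group(1).split(",") if name.strip()]


afalg = "(afalg) AFALG engine support [AES-128-CBC, AES-192-CBC, AES-256-CBC]"
caps = engine_capabilities(afalg)
print("AES-256-CBC" in caps)  # True
```

The same call on the 'padlock' output would list the fifteen AES modes shown above.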
In previous versions of OpenSSL, both the '-evp' option and the Padlock engine had to be specified to use the accelerated method. This still works (with providers specified):
$ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -provider-path ./providers/ -provider legacy -provider default -engine padlock -evp aes-256-cbc
Engine "padlock" set.
Doing AES-256-CBC for 3s on 16 size blocks: 4665275 AES-256-CBC's in 2.98s
Doing AES-256-CBC for 3s on 64 size blocks: 4121198 AES-256-CBC's in 3.00s
Doing AES-256-CBC for 3s on 256 size blocks: 2758722 AES-256-CBC's in 3.00s
Doing AES-256-CBC for 3s on 1024 size blocks: 1183837 AES-256-CBC's in 3.00s
Doing AES-256-CBC for 3s on 8192 size blocks: 186153 AES-256-CBC's in 2.98s
Doing AES-256-CBC for 3s on 16384 size blocks: 95474 AES-256-CBC's in 3.00s
version: 3.0.2
built on: Sat Apr 30 10:20:30 2022 UTC
options: bn(64,32)
compiler: gcc -fPIC -pthread -m32 -Wa,--noexecstack -Wall -O3 -fomit-frame-pointer -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_BUILDING_OPENSSL -DNDEBUG
CPUINFO: OPENSSL_ia32cap=0x381bf3f:0x0
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
AES-256-CBC     25048.46k    87918.89k   235410.94k   404083.03k   511733.35k   521415.34k
Segmentation fault
That segfault doesn't look good... but the method ran okay, and the figures show acceleration, so I'm guessing the fault comes from clean-up code when returning from the Padlock engine (see [WIP] Add a test case for the engine crash with AES-256-CTR by bernd-edlinger · Pull Request #18024 · openssl/openssl).
In OpenSSL 3.0 the -evp option is now optional, since the EVP methods are used by default:
$ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -provider-path ./providers/ -provider legacy -provider default -engine padlock aes-256-cbc
Engine "padlock" set.
Doing aes-256-cbc for 3s on 16 size blocks: 5783292 aes-256-cbc's in 2.98s
Doing aes-256-cbc for 3s on 64 size blocks: 4927575 aes-256-cbc's in 2.98s
Doing aes-256-cbc for 3s on 256 size blocks: 3110981 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 1245940 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 187401 aes-256-cbc's in 2.98s
Doing aes-256-cbc for 3s on 16384 size blocks: 95841 aes-256-cbc's in 3.00s
version: 3.0.2
built on: Sat Apr 30 10:20:30 2022 UTC
options: bn(64,32)
compiler: gcc -fPIC -pthread -m32 -Wa,--noexecstack -Wall -O3 -fomit-frame-pointer -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_BUILDING_OPENSSL -DNDEBUG
CPUINFO: OPENSSL_ia32cap=0x381bf3f:0x0
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-cbc     31051.23k   105827.11k   265470.38k   425280.85k   515164.09k   523419.65k
Segmentation fault
While a full 'speed' run should work, there are issues with the public-key methods when using the 'padlock' engine that mean a full run doesn't complete. So we'll just run the comparison methods this time ($ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -provider-path ./providers/ -provider legacy -provider default -engine padlock aes-256-cbc idea md5 sha1 sha256 sha512):
version: 3.0.2
built on: Sat Apr 30 10:20:30 2022 UTC
options: bn(64,32)
compiler: gcc -fPIC -pthread -m32 -Wa,--noexecstack -Wall -O3 -fomit-frame-pointer -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_BUILDING_OPENSSL -DNDEBUG
CPUINFO: OPENSSL_ia32cap=0x381bf3f:0x0
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
md5              5226.37k    17711.27k    47580.84k    81752.41k   103765.33k   105966.25k
sha1             4069.17k    12035.98k    26991.73k    39550.03k    45520.21k    46011.73k
sha256           2642.86k     6697.94k    14379.07k    19284.76k    21435.73k    21610.50k
sha512            992.36k     3967.08k     5949.53k     8293.38k     9368.92k     9456.54k
idea-cbc         8059.19k     9200.92k     9572.09k     9631.06k     9658.37k     9654.46k
aes-256-cbc     31047.67k   105690.01k   265461.16k   425332.83k   515844.78k   523709.10k
Segmentation fault
As expected, the Padlock engine only accelerates the AES methods on this hardware; the other methods show the same performance as without the engine. Interestingly, the 'padlock' engine shows better performance for AES with small block sizes than the 'afalg' engine using the kernel implementation, which also utilizes the PadLock hardware.
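The block size at which the two engines swap places can be read off the reported aes-256-cbc rows. A short Python sketch using the figures from the 'afalg' and 'padlock' runs above (the tuple and function names are illustrative):

```python
BLOCK_SIZES = (16, 64, 256, 1024, 8192, 16384)
# aes-256-cbc throughput in 1000s of bytes/s, copied from the runs above.
PADLOCK = (31047.67, 105690.01, 265461.16, 425332.83, 515844.78, 523709.10)
AFALG = (7853.33, 20938.19, 82461.39, 289389.46, 2033909.76, 5132615.68)


def faster_engine_by_block_size() -> dict:
    """Map each block size to the engine with the higher reported throughput."""
    return {
        size: "padlock" if p > a else "afalg"
        for size, p, a in zip(BLOCK_SIZES, PADLOCK, AFALG)
    }


for size, engine in faster_engine_by_block_size().items():
    print(f"{size:6d} bytes: {engine}")
```

On these figures 'padlock' leads up to 1024 byte blocks and 'afalg' leads from 8192 bytes upward. One plausible reading, which is an assumption rather than something measured here, is that each 'afalg' request carries a fixed cost that only pays off on larger blocks.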
C. Notes for Debian Linux 11 and OpenSSL 1.1.0l on AMD Ryzen
AMD's Ryzen processors implement Intel's Advanced Encryption Standard (AES) New Instructions (AES-NI) and SHA Extensions (SHA Ext.) instruction set extensions for the acceleration of AES and SHA cryptographic methods.
Due to issues with the selection of AES assembler modes in the OpenSSL 1.1.1 series, OpenSSL 1.1.0l is being used instead of the current OpenSSL 1.1.1 series. This serves to better illustrate the differences in AES performance for the C method implementations (i.e. no assembler) and processor family assembler comparisons.
AMD Ryzen 5 3600 | ||||||
---|---|---|---|---|---|---|
OpenSSL 1.1.0l on Debian Linux 11 for x86_64 | ||||||
Method | AES-256 CBC | IDEA CBC | MD5 | SHA-1 | SHA-256 | SHA-512 |
no-asm | 212,366.68k | 119,010.65k | 749,469.70k | 845,922.30k | 307,301.03k | 553,937.58k |
asm | 232,721.07k | 119,545.86k | 793,971.37k | 1,050,842.45k | 470,701.40k | 604,972.40k |
AES-NI & SHA Ext. | 1,086,559.57k | 119,250.94k | 790,495.23k | 1,027,295.91k | 470,671.36k | 605,863.94k |
Kernel | 1,092,332.20k | 120,416.94k | 787,464.19k | 1,029,693.44k | 475,791.36k | 601,159.00k |
Notes:
- During the build tests (make test) the 'fuzz' test fails due to a missing update (see repo/gentoo.git - Official Gentoo ebuild repository for a patch that fixes the issue).
- A runtime issue with the Elliptic Curve methods means some ECDH methods fail to give meaningful results, so the benchmark builds disable these methods (no-ec).
- The OpenSSL 1.1.0l assembler SHA method implementations automatically utilize the SHA extensions if they are available, making the 'asm' and SHA Ext. results similar for those methods.
- The 'afalg' engine in OpenSSL 1.1.0l only supports AES-128 CBC, so no improvement is seen in our selected comparison methods. From a test run 'afalg' increases throughput for AES-128 CBC from ~15 MB/s to ~2.7 GB/s on this system. Since using the 'afalg' engine requires the '-evp' option, the Kernel results include acceleration from AES-NI and SHA Ext.
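The note that the 'asm' and SHA Ext. rows are similar can be checked numerically. A minimal Python sketch using the SHA columns of the Ryzen table above; the choice of measure (difference as a fraction of the larger value) is ours.

```python
# Throughput in 1000s of bytes/s for 8,192 byte blocks, copied from the Ryzen table above.
ASM = {"SHA-1": 1050842.45, "SHA-256": 470701.40, "SHA-512": 604972.40}
SHA_EXT = {"SHA-1": 1027295.91, "SHA-256": 470671.36, "SHA-512": 605863.94}


def relative_difference(a: float, b: float) -> float:
    """Absolute difference as a fraction of the larger value."""
    return abs(a - b) / max(a, b)


for method in ASM:
    diff = relative_difference(ASM[method], SHA_EXT[method])
    print(f"{method}: {diff:.2%}")
```

All three differences come out below 3%, which is small next to the gap between the no-asm and asm rows.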
C.1. Debian Linux 11, OpenSSL 1.1.0l, no assembler compile
Downloading the source distribution from the OpenSSL site and building with the no assembler option ($ ./config no-asm no-ec), gives a build using the portable C implementations for the methods.
Running the OpenSSL speed test gives ($ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed):
OpenSSL 1.1.0l 10 Sep 2019 built on: reproducible build, date unspecified options:bn(64,64) rc4(int) des(int) aes(partial) idea(int) blowfish(ptr) compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\"" The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes md2 0.00 0.00 0.00 0.00 0.00 0.00 mdc2 20730.02k 22381.89k 22966.87k 23125.67k 23177.90k 23101.44k md4 116766.41k 351480.79k 790411.01k 1157575.68k 1333168.81k 1353602.39k md5 156742.81k 342366.40k 581831.32k 701704.19k 749469.70k 753330.86k hmac(md5) 69585.79k 204529.32k 447552.26k 638072.83k 739137.58k 739519.15k sha1 173182.02k 377457.00k 646010.28k 789712.55k 845922.30k 851148.80k rmd160 55190.68k 131163.46k 236064.09k 296644.27k 319621.80k 321574.23k rc4 385952.59k 392689.51k 392603.82k 398494.04k 396479.15k 401604.61k des cbc 93933.23k 97041.13k 98196.31k 98509.82k 98492.42k 98435.07k des ede3 36440.89k 36739.35k 36494.93k 37111.13k 37102.36k 37295.45k idea cbc 114402.18k 117464.04k 118522.45k 119008.60k 119010.65k 119007.91k seed cbc 106145.94k 109854.22k 109158.91k 110865.07k 110428.16k 109690.88k rc2 cbc 56211.93k 57472.96k 58497.19k 58632.53k 58684.76k 58621.95k rc5-32/12 cbc 0.00 0.00 0.00 0.00 0.00 0.00 blowfish cbc 152106.21k 161719.22k 163615.91k 164193.96k 163667.97k 164080.30k cast cbc 134783.27k 137901.70k 138407.34k 137374.72k 137805.82k 137620.14k aes-128 cbc 268837.55k 279241.41k 283237.21k 284580.86k 286363.83k 285250.90k aes-192 cbc 233378.87k 241100.22k 239992.66k 240700.76k 245841.92k 244094.29k aes-256 cbc 203023.76k 210830.19k 210040.41k 209316.86k 212366.68k 213166.76k camellia-128 cbc 192874.25k 196834.09k 198303.66k 198697.98k 197648.38k 195930.79k camellia-192 cbc 148299.39k 152704.83k 150698.15k 152804.01k 152867.10k 152895.49k camellia-256 cbc 148925.98k 152378.58k 153129.98k 
153295.53k 152136.36k 151202.47k sha256 69916.46k 147095.52k 240012.71k 287934.81k 307301.03k 305905.66k sha512 62274.78k 254578.13k 347497.91k 490810.03k 553937.58k 570299.73k whirlpool 38258.02k 78932.71k 129490.35k 153168.21k 162155.02k 162409.13k aes-128 ige 240440.92k 262720.19k 270912.39k 272708.61k 272829.10k 273110.36k aes-192 ige 214961.81k 231054.78k 235207.59k 233662.81k 231488.39k 234067.29k aes-256 ige 188522.23k 201385.24k 206791.34k 206300.16k 205572.78k 208071.32k ghash 335113.40k 341217.81k 343153.24k 346411.01k 348973.74k 350332.66k sign verify sign/s verify/s rsa 512 bits 0.000112s 0.000006s 8900.5 175828.4 rsa 1024 bits 0.000546s 0.000017s 1830.4 60009.2 rsa 2048 bits 0.003192s 0.000055s 313.3 18201.4 rsa 3072 bits 0.007916s 0.000122s 126.3 8207.5 rsa 4096 bits 0.019550s 0.000196s 51.2 5092.3 rsa 7680 bits 0.098922s 0.000749s 10.1 1335.7 rsa 15360 bits 0.716429s 0.002938s 1.4 340.3 sign verify sign/s verify/s dsa 512 bits 0.000143s 0.000087s 6983.6 11526.8 dsa 1024 bits 0.000337s 0.000247s 2967.9 4051.4 dsa 2048 bits 0.000975s 0.000900s 1026.0 1110.8
These results give a performance baseline for this platform.
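Since the summary table only uses the 8,192 byte column, it can be convenient to pull that column out of a saved run. The sketch below is one way to do it, not the procedure used for this page. The helper name `extract_8192` is hypothetical, and it assumes the usual terminal layout of `openssl speed` (one method per line, six block-size columns) rather than the flattened listing shown above.

```shell
# Hypothetical helper: read 'openssl speed' output on stdin and print
# "<method><TAB><8192 byte figure>" for each throughput row.
extract_8192() {
    awk '
        # Throughput rows end in six figures, each with a trailing "k";
        # the method name may itself contain spaces ("aes-256 cbc").
        NF >= 7 && $NF ~ /^[0-9.]+k$/ {
            name = $1
            for (i = 2; i <= NF - 6; i++) name = name " " $i
            print name "\t" $(NF - 1)
        }'
}

# Demo on three lines taken from the no-asm run above.
extract_8192 <<'EOF'
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
idea cbc 114402.18k 117464.04k 118522.45k 119008.60k 119010.65k 119007.91k
aes-256 cbc 203023.76k 210830.19k 210040.41k 209316.86k 212366.68k 213166.76k
EOF
```

The header line and rows without a trailing `k` (such as the disabled md2 method, or the RSA and DSA timings) are skipped by the pattern.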
C.2. Debian Linux 11, OpenSSL 1.1.0l, default compile
Downloading the source distribution from the OpenSSL site and building with the default options ($ ./config no-ec), gives a build with the assembler methods enabled.
Running the OpenSSL speed test gives ($ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed):
OpenSSL 1.1.0l 10 Sep 2019 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(int) aes(partial) idea(int) blowfish(ptr) compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\"" -Wa,--noexecstack The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes md2 0.00 0.00 0.00 0.00 0.00 0.00 mdc2 20798.84k 22593.45k 23178.67k 23328.09k 23166.98k 23358.12k md4 95277.68k 297859.71k 719539.71k 1120059.05k 1334834.52k 1350079.83k md5 97994.57k 340445.14k 597550.93k 741420.03k 793971.37k 789266.43k hmac(md5) 61321.42k 186359.87k 430417.17k 649776.47k 782076.59k 791876.95k sha1 103850.11k 409047.21k 757470.63k 967015.77k 1050842.45k 1058881.54k rmd160 48997.42k 119448.55k 221065.56k 283963.73k 309026.82k 310028.97k rc4 544631.54k 601012.33k 546238.46k 511943.34k 499725.65k 500438.36k des cbc 87352.36k 89708.48k 91152.30k 91687.94k 91854.17k 91805.01k des ede3 33937.44k 34380.59k 33691.82k 34263.38k 34463.74k 33878.41k idea cbc 114699.64k 117182.55k 118750.38k 118914.73k 119545.86k 119444.82k seed cbc 106397.13k 109433.28k 110279.17k 110304.60k 110235.47k 110269.78k rc2 cbc 56623.22k 57481.81k 57929.39k 57963.52k 58387.11k 58507.26k rc5-32/12 cbc 0.00 0.00 0.00 0.00 0.00 0.00 blowfish cbc 150240.14k 160040.47k 163246.08k 163654.66k 163779.93k 163616.09k cast cbc 134531.68k 137505.19k 137351.00k 137942.36k 138179.93k 136642.56k aes-128 cbc 143370.76k 181033.49k 190430.63k 301337.17k 303351.67k 304562.18k aes-192 cbc 127582.04k 154253.95k 161869.06k 257087.15k 265120.43k 265540.95k aes-256 cbc 113777.54k 133252.31k 
138981.80k 232166.74k 232721.07k 235678.38k camellia-128 cbc 183278.21k 218869.42k 225665.79k 228573.87k 232745.64k 233619.46k camellia-192 cbc 145994.45k 163625.87k 169293.82k 173766.66k 174830.93k 175484.15k camellia-256 cbc 144564.23k 166374.21k 170964.03k 174196.74k 175087.62k 175155.88k sha256 69818.45k 206040.64k 359991.13k 441089.02k 470701.40k 472733.01k sha512 59340.73k 234639.83k 370854.91k 528685.04k 604972.40k 615967.17k whirlpool 39840.33k 99675.18k 165740.89k 195083.95k 205299.71k 209704.28k aes-128 ige 169776.44k 181493.42k 184641.88k 184963.75k 184407.38k 183833.94k aes-192 ige 140920.70k 147661.03k 153481.90k 155284.14k 156060.33k 155178.33k aes-256 ige 126160.81k 131790.38k 132928.38k 134006.33k 133819.05k 133283.84k ghash 1829075.23k 4999172.03k 7964922.39k 8924905.47k 9241427.97k 9231335.42k sign verify sign/s verify/s rsa 512 bits 0.000034s 0.000003s 29238.3 397281.6 rsa 1024 bits 0.000098s 0.000006s 10256.2 154937.1 rsa 2048 bits 0.000505s 0.000021s 1979.2 46636.7 rsa 3072 bits 0.002279s 0.000046s 438.7 21625.7 rsa 4096 bits 0.005269s 0.000079s 189.8 12586.5 rsa 7680 bits 0.045917s 0.000274s 21.8 3654.4 rsa 15360 bits 0.255750s 0.001085s 3.9 922.0 sign verify sign/s verify/s dsa 512 bits 0.000063s 0.000040s 15846.7 24996.5 dsa 1024 bits 0.000114s 0.000091s 8753.2 10981.2 dsa 2048 bits 0.000313s 0.000288s 3194.5 3472.4
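The performance gains from the assembler implementations are evident for some methods (for 8,192 byte blocks SHA-256 rises from ~307 MB/s to ~471 MB/s and SHA-1 from ~846 MB/s to ~1.05 GB/s), although they are not as distinct as in the 32-bit tests. The results also indicate which methods have assembler implementations.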
C.3. Debian Linux 11, OpenSSL 1.1.0l, default compile: AES-NI & SHA-NI
The default compile (see C.2.) also supports the hardware acceleration of AES and SHA using the AES-NI and SHA-NI instruction set extensions. To invoke the accelerated methods the required method has to be accessed with the EVP option:
$ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -evp aes-256-cbc
Doing aes-256-cbc for 3s on 16 size blocks: 177455009 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 49476779 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 12693951 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 3195267 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 397910 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 16384 size blocks: 200785 aes-256-cbc's in 3.00s
OpenSSL 1.1.0l 10 Sep 2019
built on: reproducible build, date unspecified
options:bn(64,64) rc4(8x,int) des(int) aes(partial) idea(int) blowfish(ptr)
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\"" -Wa,--noexecstack
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
aes-256-cbc 946426.71k 1055504.62k 1083217.15k 1090651.14k 1086559.57k 1096553.81k
Note that the -evp option is required to use the accelerated implementation; without it the normal version of the method is used:
$ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed aes-256-cbc
Doing aes-256 cbc for 3s on 16 size blocks: 21434583 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 64 size blocks: 6237975 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 256 size blocks: 1633547 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 1024 size blocks: 690046 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 8192 size blocks: 86806 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 16384 size blocks: 43459 aes-256 cbc's in 3.00s
OpenSSL 1.1.0l 10 Sep 2019
built on: reproducible build, date unspecified
options:bn(64,64) rc4(8x,int) des(int) aes(partial) idea(int) blowfish(ptr)
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\"" -Wa,--noexecstack
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
aes-256 cbc 114317.78k 133076.80k 139396.01k 235535.70k 237038.25k 237344.09k
This shows the difference, for 8,192 byte blocks, between the normal assembler implementation (~237 MB/s) and the AES-NI accelerated version (~1.09 GB/s).
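The figures in the summary rows follow from the "Doing ..." lines: the number of operations times the block size, divided by the reported time, in units of 1000 bytes per second. A minimal sketch of that arithmetic follows; the helper name `throughput_k` is hypothetical and is not something OpenSSL provides.

```shell
# Hypothetical helper: throughput in 1000s of bytes per second from
# an operation count, a block size in bytes and a time in seconds.
throughput_k() {
    awk -v n="$1" -v size="$2" -v secs="$3" \
        'BEGIN { printf "%.2f\n", n * size / secs / 1000 }'
}

# 8192 byte lines from the two runs above
throughput_k 397910 8192 3.00   # with -evp (AES-NI)
throughput_k 86806 8192 3.00    # without -evp
```

These reproduce the 1086559.57k and 237038.25k entries in the 8192 bytes column of the two runs.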
Doing this for each of the other methods in the comparison:
$ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -evp idea Doing idea-cbc for 3s on 16 size blocks: 20963668 idea-cbc's in 3.00s Doing idea-cbc for 3s on 64 size blocks: 5496543 idea-cbc's in 3.00s Doing idea-cbc for 3s on 256 size blocks: 1388950 idea-cbc's in 3.00s Doing idea-cbc for 3s on 1024 size blocks: 345461 idea-cbc's in 3.00s Doing idea-cbc for 3s on 8192 size blocks: 43671 idea-cbc's in 3.00s Doing idea-cbc for 3s on 16384 size blocks: 21834 idea-cbc's in 3.00s OpenSSL 1.1.0l 10 Sep 2019 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(int) aes(partial) idea(int) blowfish(ptr) compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\"" -Wa,--noexecstack The 'numbers' are in 1000s of bytes per second processed. 
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes idea-cbc 111806.23k 117259.58k 118523.73k 117917.35k 119250.94k 119242.75k $ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -evp md5 Doing md5 for 3s on 16 size blocks: 15666020 md5's in 3.00s Doing md5 for 3s on 64 size blocks: 10908270 md5's in 3.00s Doing md5 for 3s on 256 size blocks: 5874182 md5's in 3.00s Doing md5 for 3s on 1024 size blocks: 2048347 md5's in 3.00s Doing md5 for 3s on 8192 size blocks: 289488 md5's in 3.00s Doing md5 for 3s on 16384 size blocks: 146239 md5's in 3.00s OpenSSL 1.1.0l 10 Sep 2019 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(int) aes(partial) idea(int) blowfish(ptr) compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\"" -Wa,--noexecstack The 'numbers' are in 1000s of bytes per second processed. 
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes md5 83552.11k 232709.76k 501263.53k 699169.11k 790495.23k 798659.93k $ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -evp sha1 Doing sha1 for 3s on 16 size blocks: 15591881 sha1's in 3.00s Doing sha1 for 3s on 64 size blocks: 11524163 sha1's in 3.00s Doing sha1 for 3s on 256 size blocks: 6873254 sha1's in 3.00s Doing sha1 for 3s on 1024 size blocks: 2584282 sha1's in 3.00s Doing sha1 for 3s on 8192 size blocks: 376207 sha1's in 3.00s Doing sha1 for 3s on 16384 size blocks: 191879 sha1's in 3.00s OpenSSL 1.1.0l 10 Sep 2019 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(int) aes(partial) idea(int) blowfish(ptr) compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\"" -Wa,--noexecstack The 'numbers' are in 1000s of bytes per second processed. 
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes sha1 83156.70k 245848.81k 586517.67k 882101.59k 1027295.91k 1047915.18k $ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -evp sha256 Doing sha256 for 3s on 16 size blocks: 11599385 sha256's in 3.00s Doing sha256 for 3s on 64 size blocks: 7423843 sha256's in 3.00s Doing sha256 for 3s on 256 size blocks: 3742404 sha256's in 3.00s Doing sha256 for 3s on 1024 size blocks: 1249273 sha256's in 3.00s Doing sha256 for 3s on 8192 size blocks: 172365 sha256's in 3.00s Doing sha256 for 3s on 16384 size blocks: 87592 sha256's in 3.00s OpenSSL 1.1.0l 10 Sep 2019 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(int) aes(partial) idea(int) blowfish(ptr) compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\"" -Wa,--noexecstack The 'numbers' are in 1000s of bytes per second processed. 
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes sha256 61863.39k 158375.32k 319351.81k 426418.52k 470671.36k 478369.11k $ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -evp sha512 Doing sha512 for 3s on 16 size blocks: 8468652 sha512's in 3.00s Doing sha512 for 3s on 64 size blocks: 8380446 sha512's in 3.00s Doing sha512 for 3s on 256 size blocks: 3874261 sha512's in 3.00s Doing sha512 for 3s on 1024 size blocks: 1491164 sha512's in 3.00s Doing sha512 for 3s on 8192 size blocks: 221874 sha512's in 3.00s Doing sha512 for 3s on 16384 size blocks: 112352 sha512's in 3.00s OpenSSL 1.1.0l 10 Sep 2019 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(int) aes(partial) idea(int) blowfish(ptr) compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\"" -Wa,--noexecstack The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes sha512 45166.14k 178782.85k 330603.61k 508983.98k 605863.94k 613591.72k
The IDEA and MD5 methods show the standard assembler performance. The SHA methods show the same performance as before, which suggests the regular assembler can use the SHA instructions if they are available.
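Whether the processor reports these instruction set extensions can be checked before benchmarking. The sketch below assumes a Linux x86 system where /proc/cpuinfo has a "flags" line, and that the kernel names the two extensions `aes` and `sha_ni`; the helper name `has_cpu_flag` is hypothetical.

```shell
# Hypothetical helper: succeed if the first "flags" line of a
# cpuinfo-style file lists the given flag as a whole word.
# Assumes the Linux x86 layout: "flags : fpu vme ...".
has_cpu_flag() {
    awk -v f="$1" '
        /^flags/ {
            for (i = 3; i <= NF; i++) if ($i == f) found = 1
            exit
        }
        END { exit !found }
    ' "${2:-/proc/cpuinfo}" 2>/dev/null
}

# Assumed flag names: aes (AES-NI) and sha_ni (SHA Ext.)
for flag in aes sha_ni; do
    if has_cpu_flag "$flag"; then
        echo "$flag: present"
    else
        echo "$flag: not reported"
    fi
done
```

Inside a virtual machine the result depends on what the hypervisor passes through, so it is worth running the check in the guest rather than on the host.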
C.4. Debian Linux 11, OpenSSL 1.1.0l, use kernel methods (AF_ALG)
The default compile (see C.2.) includes the 'afalg' engine, which provides access to the Linux Kernel Crypto API (AF_ALG) method implementations. Details of the available kernel crypto methods can be found in /proc/crypto:
$ cat /proc/crypto | grep '^name'
name : __ghash
name : ghash
name : __ghash
name : __gcm(aes)
name : gcm(aes)
name : __rfc4106(gcm(aes))
name : rfc4106(gcm(aes))
name : __gcm(aes)
name : __rfc4106(gcm(aes))
name : __xts(aes)
name : xts(aes)
name : __ctr(aes)
name : ctr(aes)
name : __cbc(aes)
name : cbc(aes)
name : __ecb(aes)
name : ecb(aes)
name : __xts(aes)
name : __ctr(aes)
name : __cbc(aes)
name : __ecb(aes)
name : aes
name : crc32c
name : crct10dif
name : crct10dif
name : crc32
name : crc32c
name : pkcs1pad(rsa,sha256)
name : hmac(sha256)
name : hmac(sha1)
name : lzo-rle
name : lzo-rle
name : lzo
name : lzo
name : zlib-deflate
name : deflate
name : deflate
name : sha224
name : sha256
name : sha1
name : md5
name : ecb(cipher_null)
name : digest_null
name : compress_null
name : cipher_null
name : rsa
name : dh
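The repeated names in this list are separate implementations registered under the same method name. As a rough sketch, assuming the usual /proc/crypto record layout in which each entry has "name", "driver" and "priority" fields in that order, the driver and priority can be listed alongside each name. The helper name `crypto_drivers` is hypothetical.

```shell
# Hypothetical helper: print "<name><TAB><driver><TAB><priority>" for
# each record of a /proc/crypto-style file (default: /proc/crypto).
crypto_drivers() {
    awk -F ' *: *' '
        $1 == "name"     { name = $2 }
        $1 == "driver"   { driver = $2 }
        $1 == "priority" { print name "\t" driver "\t" $2 }
    ' "${1:-/proc/crypto}"
}

# Show the entries registered for cbc(aes), if any.
crypto_drivers 2>/dev/null | grep -F 'cbc(aes)' || echo "no cbc(aes) entry found"
```

When several entries share a name, the kernel normally prefers the one with the highest priority, which gives a hint as to which implementation a request for that method is likely to reach.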
However the 'afalg' engine in this version only supports a subset:
$ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl engine afalg -c
(afalg) AFALG engine support
 [AES-128-CBC]
So this implementation only supports the kernel methods for AES-128 CBC:
$ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -engine afalg -evp aes-128-cbc
engine "afalg" set.
Doing aes-128-cbc for 3s on 16 size blocks: 1643546 aes-128-cbc's in 0.42s
Doing aes-128-cbc for 3s on 64 size blocks: 1632116 aes-128-cbc's in 0.41s
Doing aes-128-cbc for 3s on 256 size blocks: 1508914 aes-128-cbc's in 0.46s
Doing aes-128-cbc for 3s on 1024 size blocks: 1199547 aes-128-cbc's in 0.42s
Doing aes-128-cbc for 3s on 8192 size blocks: 398876 aes-128-cbc's in 0.14s
Doing aes-128-cbc for 3s on 16384 size blocks: 223060 aes-128-cbc's in 0.12s
OpenSSL 1.1.0l 10 Sep 2019
built on: reproducible build, date unspecified
options:bn(64,64) rc4(8x,int) des(int) aes(partial) idea(int) blowfish(ptr)
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\"" -Wa,--noexecstack
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
aes-128-cbc 62611.28k 254769.33k 839743.44k 2924609.83k 23339944.23k 30455125.33k
In this version of OpenSSL the -evp option is required to use the accelerated implementation; without it the normal version of the method is used:
$ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -engine afalg aes-128-cbc
engine "afalg" set.
Doing aes-128 cbc for 3s on 16 size blocks: 27509274 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 64 size blocks: 8428142 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 256 size blocks: 2272506 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 1024 size blocks: 876181 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 8192 size blocks: 111665 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 16384 size blocks: 56092 aes-128 cbc's in 3.00s
OpenSSL 1.1.0l 10 Sep 2019
built on: reproducible build, date unspecified
options:bn(64,64) rc4(8x,int) des(int) aes(partial) idea(int) blowfish(ptr)
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\"" -Wa,--noexecstack
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
aes-128 cbc 146716.13k 179800.36k 193920.51k 299069.78k 304919.89k 306337.11k
So the difference for AES-128 CBC is very significant: ~305 MB/s normally and ~23.3 GB/s with the kernel method.
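A caution on reading this figure, offered as an interpretation rather than something measured in these tests: the "Doing ..." lines of the 'afalg' run report times of 0.12s to 0.46s instead of 3.00s. By default `openssl speed` divides by user CPU time, and work done inside the kernel is not counted there, so the reported rate may overstate wall-clock throughput. The sketch below rescales the reported figure under the assumption that the run really lasted the nominal 3 seconds; the helper name `rescale_k` is hypothetical.

```shell
# Hypothetical helper: rescale a reported rate (in 1000s of bytes/s)
# from the CPU time printed by 'openssl speed' to an assumed
# wall-clock duration in seconds.
rescale_k() {
    awk -v k="$1" -v cpu="$2" -v wall="$3" \
        'BEGIN { printf "%.2f\n", k * cpu / wall }'
}

# 8192 byte figure of the afalg run: 0.14s reported, 3s assumed
rescale_k 23339944.23 0.14 3.00
```

Under that assumption the 8,192 byte rate comes out near 1.09 GB/s, which is of the same order as the AES-NI figures in C.3. Re-running with the `-elapsed` option of `openssl speed`, which uses wall-clock time as the divisor, would be one way to check this.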
Sadly this isn't one of our comparison methods, and those are not in the list of 'afalg' supported methods. Still let's collect figures for them:
$ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -engine afalg -evp aes-256-cbc engine "afalg" set. Doing aes-256-cbc for 3s on 16 size blocks: 177999118 aes-256-cbc's in 3.00s Doing aes-256-cbc for 3s on 64 size blocks: 49697601 aes-256-cbc's in 3.00s Doing aes-256-cbc for 3s on 256 size blocks: 12686659 aes-256-cbc's in 3.00s Doing aes-256-cbc for 3s on 1024 size blocks: 3192273 aes-256-cbc's in 3.00s Doing aes-256-cbc for 3s on 8192 size blocks: 400024 aes-256-cbc's in 3.00s Doing aes-256-cbc for 3s on 16384 size blocks: 200009 aes-256-cbc's in 3.00s OpenSSL 1.1.0l 10 Sep 2019 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(int) aes(partial) idea(int) blowfish(ptr) compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\"" -Wa,--noexecstack The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes aes-256-cbc 949328.63k 1060215.49k 1082594.90k 1089629.18k 1092332.20k 1092315.82k $ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -engine afalg -evp idea engine "afalg" set. 
Doing idea-cbc for 3s on 16 size blocks: 21078310 idea-cbc's in 3.00s Doing idea-cbc for 3s on 64 size blocks: 5561659 idea-cbc's in 3.00s Doing idea-cbc for 3s on 256 size blocks: 1405715 idea-cbc's in 3.00s Doing idea-cbc for 3s on 1024 size blocks: 352417 idea-cbc's in 3.00s Doing idea-cbc for 3s on 8192 size blocks: 44098 idea-cbc's in 3.00s Doing idea-cbc for 3s on 16384 size blocks: 22062 idea-cbc's in 3.00s OpenSSL 1.1.0l 10 Sep 2019 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(int) aes(partial) idea(int) blowfish(ptr) compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\"" -Wa,--noexecstack The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes idea-cbc 112417.65k 118648.73k 119954.35k 120291.67k 120416.94k 120487.94k $ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -engine afalg -evp md5 engine "afalg" set. 
Doing md5 for 3s on 16 size blocks: 15277496 md5's in 2.99s Doing md5 for 3s on 64 size blocks: 10886163 md5's in 3.00s Doing md5 for 3s on 256 size blocks: 5820065 md5's in 3.00s Doing md5 for 3s on 1024 size blocks: 2034742 md5's in 3.00s Doing md5 for 3s on 8192 size blocks: 288378 md5's in 3.00s Doing md5 for 3s on 16384 size blocks: 145376 md5's in 3.00s OpenSSL 1.1.0l 10 Sep 2019 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(int) aes(partial) idea(int) blowfish(ptr) compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\"" -Wa,--noexecstack The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes md5 81752.49k 232238.14k 496645.55k 694525.27k 787464.19k 793946.79k $ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -engine afalg -evp sha1 engine "afalg" set. 
Doing sha1 for 3s on 16 size blocks: 15013924 sha1's in 3.00s Doing sha1 for 3s on 64 size blocks: 11614002 sha1's in 3.00s Doing sha1 for 3s on 256 size blocks: 6838376 sha1's in 3.00s Doing sha1 for 3s on 1024 size blocks: 2583074 sha1's in 3.00s Doing sha1 for 3s on 8192 size blocks: 377085 sha1's in 3.00s Doing sha1 for 3s on 16384 size blocks: 190971 sha1's in 3.00s OpenSSL 1.1.0l 10 Sep 2019 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(int) aes(partial) idea(int) blowfish(ptr) compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\"" -Wa,--noexecstack The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes sha1 80074.26k 247765.38k 583541.42k 881689.26k 1029693.44k 1042956.29k $ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -engine afalg -evp sha256 engine "afalg" set. 
Doing sha256 for 3s on 16 size blocks: 11335573 sha256's in 3.00s Doing sha256 for 3s on 64 size blocks: 7378903 sha256's in 3.00s Doing sha256 for 3s on 256 size blocks: 3674405 sha256's in 3.00s Doing sha256 for 3s on 1024 size blocks: 1245088 sha256's in 3.00s Doing sha256 for 3s on 8192 size blocks: 174240 sha256's in 3.00s Doing sha256 for 3s on 16384 size blocks: 87867 sha256's in 3.00s OpenSSL 1.1.0l 10 Sep 2019 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(int) aes(partial) idea(int) blowfish(ptr) compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\"" -Wa,--noexecstack The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes sha256 60456.39k 157416.60k 313549.23k 424990.04k 475791.36k 479870.98k $ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -engine afalg -evp sha512 engine "afalg" set. 
Doing sha512 for 3s on 16 size blocks: 8480500 sha512's in 3.00s Doing sha512 for 3s on 64 size blocks: 8412202 sha512's in 3.00s Doing sha512 for 3s on 256 size blocks: 3837396 sha512's in 3.00s Doing sha512 for 3s on 1024 size blocks: 1487457 sha512's in 3.00s Doing sha512 for 3s on 8192 size blocks: 220151 sha512's in 3.00s Doing sha512 for 3s on 16384 size blocks: 110360 sha512's in 3.00s OpenSSL 1.1.0l 10 Sep 2019 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(int) aes(partial) idea(int) blowfish(ptr) compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\"" -Wa,--noexecstack The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes sha512 45229.33k 179460.31k 327457.79k 507718.66k 601159.00k 602712.75k
So, as expected, no change here.
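The runs in C.3. and C.4. repeat one command line for each of the six comparison methods. As a convenience, a small loop can generate those command lines. The following is only a sketch: the helper name `speed_cmds` is hypothetical, and it prints the commands rather than running them, so they can be reviewed first.

```shell
# Hypothetical helper: print one 'openssl speed -evp' command line per
# comparison method; any arguments are inserted before -evp.
speed_cmds() {
    extra="$*"
    for m in aes-256-cbc idea md5 sha1 sha256 sha512; do
        # Backticks are kept literal so `pwd` is evaluated when run.
        printf 'LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed%s -evp %s\n' \
            "${extra:+ $extra}" "$m"
    done
}

speed_cmds                 # the C.3. runs
speed_cmds -engine afalg   # the C.4. runs
```

Piping the output to `sh` from the OpenSSL build directory would execute the runs in sequence.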
D. Notes for Debian Linux 11 and OpenSSL 3.0.2 on AMD Ryzen
The AMD Ryzen family of processors supports the AES-NI and SHA-NI instruction set extensions first proposed by Intel. Support for AES-NI was added in OpenSSL 1.0.1 (2012), and for SHA-NI in OpenSSL 1.0.2 (2015).
The latest release version of OpenSSL, 3.0.2, contains a lot of changes but retains the command-line interface used for testing.
OpenSSL 3.0.2 with Debian Linux 11 for x86_64 on AMD Ryzen 5 3600 | ||||||
---|---|---|---|---|---|---|
Method | AES-256 CBC | IDEA CBC | MD5 | SHA-1 | SHA-256 | SHA-512 |
no-asm | 210,668.20k | 117,587.97k | 740,698.79k | 832,288.09k | 310,059.01k | 559,205.03k |
asm | 1,097,910.95k | 119,048.87k | 789,848.06k | 1,036,997.97k | 473,503.06k | 599,274.84k |
Kernel | 37,086,822.40k | 120,220.33k | 793,990.49k | 1,041,334.27k | 474,685.44k | 598,278.14k |
Notes:
- OpenSSL 3.x considers IDEA a "legacy" method. To include "legacy" methods in "speed" runs use the -provider legacy -provider default options.
- For the purposes of these tests I am sacrificing some performance by using a VirtualBox VM running Debian Linux 11 on a MS Windows 10 host. While this introduces some overhead, and means the virtual system isn't as capable as the host, it does give access to the instructions, and since the "speed" benchmarks are single threaded the performance should be indicative.
- Running from the source/build directories used a command-line like: LD_LIBRARY_PATH=`pwd` apps/openssl speed -provider-path ./providers/ -provider legacy -provider default
- OpenSSL 3.0.2 assembler implementations of AES and SHA methods use the AES-NI and SHA Ext. instruction set extensions if they are available
D.1. Debian Linux 11, OpenSSL 3.0.2, no assembler compile
Downloading the source distribution from the OpenSSL site and building with the no assembler option ($ ./config no-asm), gives a build using the portable C implementations for the methods.
Running the OpenSSL speed test gives ($ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -provider-path ./providers/ -provider legacy -provider default):
version: 3.0.2 built on: Tue May 3 10:21:48 2022 UTC options: bn(64,64) compiler: gcc -fPIC -pthread -m64 -Wall -O3 -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_BUILDING_OPENSSL -DNDEBUG CPUINFO: N/A The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes mdc2 20562.45k 23065.00k 24045.06k 24235.01k 24171.86k 24053.45k md4 93440.70k 292054.72k 713656.32k 1107414.36k 1316017.49k 1335416.15k md5 75762.36k 220376.19k 474277.03k 658416.35k 740698.79k 746968.41k sha1 74135.27k 207818.20k 484524.97k 713803.09k 832288.09k 838565.89k rmd160 48934.47k 123549.25k 228486.49k 294425.26k 320845.14k 322808.49k sha256 44194.68k 109362.69k 215166.29k 281563.14k 310059.01k 311987.29k sha512 39869.37k 156745.30k 285906.52k 459415.21k 559205.03k 565084.16k whirlpool 28946.23k 66454.40k 117860.44k 150848.17k 162491.05k 162441.10k hmac(md5) 57372.63k 178631.87k 415191.55k 626126.85k 741441.54k 748607.28k des-cbc 89745.45k 94976.34k 96349.27k 96561.15k 98486.95k 98407.77k des-ede3 35742.68k 36740.82k 36624.04k 36697.43k 37090.65k 37300.91k rc4 404351.10k 399215.17k 401447.08k 403255.98k 399493.80k 395624.45k idea-cbc 111618.99k 117232.90k 117311.49k 114899.29k 117587.97k 117522.43k seed-cbc 104352.96k 109517.72k 110598.91k 111045.63k 110875.99k 109679.96k rc2-cbc 55675.72k 57482.73k 58371.50k 57400.66k 57888.32k 58212.35k blowfish 146209.32k 157064.23k 162900.39k 163886.08k 163973.80k 163998.38k cast-cbc 129929.11k 133614.19k 137071.70k 136841.56k 136303.96k 137059.51k aes-128-cbc 259058.78k 277107.73k 277100.80k 281938.60k 281853.95k 283421.43k aes-192-cbc 225402.85k 239174.83k 239161.77k 238028.12k 243007.49k 242679.81k aes-256-cbc 197046.28k 211595.69k 210573.14k 211770.03k 210668.20k 212779.72k camellia-128-cbc 189300.94k 196735.68k 195763.88k 195276.46k 194786.65k 197225.13k camellia-192-cbc 148085.38k 151104.77k 152280.66k 151895.38k 151928.83k 150574.42k camellia-256-cbc 146215.45k 149354.67k 
152370.18k 152790.36k 152002.56k 151333.55k ghash 289559.97k 327118.70k 335364.52k 347649.37k 349069.31k 347755.86k rand 15451.66k 50478.65k 117848.60k 175663.51k 202408.16k 207183.63k sign verify sign/s verify/s rsa 512 bits 0.000106s 0.000006s 9438.1 180236.3 rsa 1024 bits 0.000547s 0.000016s 1827.7 60764.7 rsa 2048 bits 0.003244s 0.000055s 308.3 18215.1 rsa 3072 bits 0.008071s 0.000121s 123.9 8230.5 rsa 4096 bits 0.019841s 0.000200s 50.4 4998.0 rsa 7680 bits 0.101000s 0.000784s 9.9 1275.7 rsa 15360 bits 0.732143s 0.003008s 1.4 332.5 sign verify sign/s verify/s dsa 512 bits 0.000134s 0.000076s 7467.2 13183.0 dsa 1024 bits 0.000332s 0.000239s 3013.1 4185.7 dsa 2048 bits 0.000970s 0.000904s 1031.3 1106.6 sign verify sign/s verify/s 160 bits ecdsa (secp160r1) 0.0004s 0.0004s 2317.2 2732.4 192 bits ecdsa (nistp192) 0.0004s 0.0003s 2315.7 2866.1 224 bits ecdsa (nistp224) 0.0006s 0.0005s 1539.4 1933.3 256 bits ecdsa (nistp256) 0.0007s 0.0006s 1338.4 1685.6 384 bits ecdsa (nistp384) 0.0015s 0.0011s 650.5 873.3 521 bits ecdsa (nistp521) 0.0030s 0.0020s 331.5 490.3 163 bits ecdsa (nistk163) 0.0004s 0.0007s 2770.9 1426.6 233 bits ecdsa (nistk233) 0.0005s 0.0010s 1874.8 969.2 283 bits ecdsa (nistk283) 0.0011s 0.0022s 885.9 454.2 409 bits ecdsa (nistk409) 0.0024s 0.0046s 413.6 215.4 571 bits ecdsa (nistk571) 0.0049s 0.0096s 202.4 104.6 163 bits ecdsa (nistb163) 0.0004s 0.0007s 2657.6 1353.5 233 bits ecdsa (nistb233) 0.0006s 0.0011s 1802.5 937.5 283 bits ecdsa (nistb283) 0.0012s 0.0024s 818.3 419.7 409 bits ecdsa (nistb409) 0.0026s 0.0052s 378.6 191.1 571 bits ecdsa (nistb571) 0.0056s 0.0107s 178.2 93.2 256 bits ecdsa (brainpoolP256r1) 0.0008s 0.0008s 1180.4 1314.4 256 bits ecdsa (brainpoolP256t1) 0.0008s 0.0007s 1182.8 1447.8 384 bits ecdsa (brainpoolP384r1) 0.0019s 0.0016s 516.2 623.5 384 bits ecdsa (brainpoolP384t1) 0.0019s 0.0015s 532.5 683.6 512 bits ecdsa (brainpoolP512r1) 0.0046s 0.0039s 216.9 259.7 512 bits ecdsa (brainpoolP512t1) 0.0046s 0.0034s 217.0 290.2 op op/s 
160 bits ecdh (secp160r1) 0.0004s 2433.5 192 bits ecdh (nistp192) 0.0004s 2505.3 224 bits ecdh (nistp224) 0.0006s 1650.3 256 bits ecdh (nistp256) 0.0007s 1424.5 384 bits ecdh (nistp384) 0.0014s 702.2 521 bits ecdh (nistp521) 0.0028s 362.3 163 bits ecdh (nistk163) 0.0003s 2945.9 233 bits ecdh (nistk233) 0.0005s 2022.0 283 bits ecdh (nistk283) 0.0011s 933.7 409 bits ecdh (nistk409) 0.0023s 443.2 571 bits ecdh (nistk571) 0.0047s 211.7 163 bits ecdh (nistb163) 0.0004s 2817.3 233 bits ecdh (nistb233) 0.0005s 1938.5 283 bits ecdh (nistb283) 0.0012s 863.2 409 bits ecdh (nistb409) 0.0025s 394.7 571 bits ecdh (nistb571) 0.0052s 191.7 256 bits ecdh (brainpoolP256r1) 0.0008s 1248.1 256 bits ecdh (brainpoolP256t1) 0.0008s 1252.6 384 bits ecdh (brainpoolP384r1) 0.0019s 538.5 384 bits ecdh (brainpoolP384t1) 0.0018s 558.9 512 bits ecdh (brainpoolP512r1) 0.0044s 226.8 512 bits ecdh (brainpoolP512t1) 0.0043s 231.4 253 bits ecdh (X25519) 0.0000s 24870.5 448 bits ecdh (X448) 0.0002s 6318.3 sign verify sign/s verify/s 253 bits EdDSA (Ed25519) 0.0000s 0.0001s 33423.2 9964.8 456 bits EdDSA (Ed448) 0.0004s 0.0002s 2382.6 4835.4 sign verify sign/s verify/s 256 bits SM2 (CurveSM2) 0.0009s 0.0007s 1075.4 1425.0 op op/s 2048 bits ffdh 0.0098s 101.7 3072 bits ffdh 0.0268s 37.4 4096 bits ffdh 0.0633s 15.8 6144 bits ffdh 0.1965s 5.1 8192 bits ffdh 0.4333s 2.3
This establishes a baseline for the performance of the methods on this platform.
D.2. Debian Linux 11, OpenSSL 3.0.2, default compile
Downloading the source distribution from the OpenSSL site and building with the default options gives a build with assembler methods enabled.
Running the OpenSSL speed test gives ($ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -provider-path ./providers/ -provider legacy -provider default):
version: 3.0.2 built on: Tue May 3 11:08:16 2022 UTC options: bn(64,64) compiler: gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -O3 -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_BUILDING_OPENSSL -DNDEBUG CPUINFO: OPENSSL_ia32cap=0xdef82203078bffff:0x840021 The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes mdc2 19088.32k 21383.15k 22120.11k 22114.65k 22066.52k 22336.85k md4 86959.79k 274045.21k 686398.72k 1086177.28k 1302282.24k 1318507.86k md5 74624.08k 221166.83k 481884.16k 689194.67k 789848.06k 795716.27k sha1 68835.40k 211919.64k 530604.20k 850401.62k 1036997.97k 1052246.02k rmd160 45938.43k 118583.77k 225689.86k 292971.52k 320547.50k 320629.42k sha256 56456.28k 151513.83k 312126.29k 423563.26k 473503.06k 468751.70k sha512 42409.53k 173236.01k 326201.60k 500586.84k 599274.84k 607879.17k whirlpool 34184.97k 83005.12k 152824.75k 192139.95k 207866.54k 209338.37k hmac(md5) 54607.51k 175414.14k 427773.27k 652594.86k 774367.91k 792139.09k des-cbc 86150.46k 90269.29k 90673.16k 91110.06k 87848.28k 90357.76k des-ede3 33675.40k 34008.00k 32592.38k 34458.28k 34021.38k 34155.18k rc4 461580.29k 545079.64k 521484.71k 501519.02k 496358.74k 499171.33k idea-cbc 110334.51k 115313.34k 117287.51k 117309.44k 119048.87k 119138.99k seed-cbc 104858.23k 108716.29k 110551.72k 111048.02k 109797.38k 110302.55k rc2-cbc 55981.69k 56614.93k 56955.99k 57540.61k 57696.26k 57333.08k blowfish 146308.27k 159094.66k 161397.33k 163182.59k 163012.61k 164080.30k cast-cbc 131535.55k 136160.68k 137896.70k 138655.06k 138483.03k 138199.04k aes-128-cbc 1232361.96k 1423773.50k 1448355.75k 1462991.53k 1474143.60k 1484817.81k aes-192-cbc 1055904.59k 1199071.38k 1250383.96k 1261934.25k 1267545.43k 1262824.11k aes-256-cbc 956115.02k 1054160.70k 1081789.53k 1089324.03k 1097910.95k 1084456.96k camellia-128-cbc 164696.63k 210310.44k 226814.21k 227034.79k 230888.79k 231669.76k camellia-192-cbc 132917.51k 162760.79k 170745.51k 
174113.45k 175030.27k 171911.85k camellia-256-cbc 130447.28k 162488.66k 169590.44k 169819.14k 168951.81k 174331.22k ghash 1015241.07k 3052006.70k 6412202.58k 8288329.39k 8981708.80k 9173494.44k rand 23134.64k 90360.31k 348861.17k 1238087.79k 4722876.39k 5988869.69k sign verify sign/s verify/s rsa 512 bits 0.000033s 0.000002s 30318.4 440171.6 rsa 1024 bits 0.000098s 0.000006s 10207.1 160257.8 rsa 2048 bits 0.000510s 0.000022s 1962.1 45548.0 rsa 3072 bits 0.002315s 0.000047s 432.0 21473.4 rsa 4096 bits 0.005269s 0.000083s 189.8 12105.3 rsa 7680 bits 0.046055s 0.000278s 21.7 3603.5 rsa 15360 bits 0.253500s 0.001109s 3.9 902.0 sign verify sign/s verify/s dsa 512 bits 0.000051s 0.000032s 19656.4 30843.9 dsa 1024 bits 0.000103s 0.000085s 9723.2 11707.9 dsa 2048 bits 0.000301s 0.000278s 3325.2 3591.6 sign verify sign/s verify/s 160 bits ecdsa (secp160r1) 0.0002s 0.0002s 6005.0 6014.6 192 bits ecdsa (nistp192) 0.0002s 0.0002s 4834.5 5073.3 224 bits ecdsa (nistp224) 0.0003s 0.0003s 3353.4 3639.4 256 bits ecdsa (nistp256) 0.0000s 0.0001s 50187.2 16141.0 384 bits ecdsa (nistp384) 0.0008s 0.0007s 1300.1 1522.5 521 bits ecdsa (nistp521) 0.0018s 0.0015s 541.4 684.8 163 bits ecdsa (nistk163) 0.0002s 0.0003s 5822.0 2960.9 233 bits ecdsa (nistk233) 0.0002s 0.0004s 4482.1 2309.0 283 bits ecdsa (nistk283) 0.0004s 0.0008s 2608.5 1329.3 409 bits ecdsa (nistk409) 0.0006s 0.0013s 1545.2 772.5 571 bits ecdsa (nistk571) 0.0014s 0.0028s 694.7 360.9 163 bits ecdsa (nistb163) 0.0002s 0.0003s 5628.4 2857.7 233 bits ecdsa (nistb233) 0.0002s 0.0005s 4322.1 2218.3 283 bits ecdsa (nistb283) 0.0004s 0.0008s 2497.4 1250.1 409 bits ecdsa (nistb409) 0.0007s 0.0013s 1461.3 757.9 571 bits ecdsa (nistb571) 0.0015s 0.0029s 656.3 344.4 256 bits ecdsa (brainpoolP256r1) 0.0003s 0.0003s 3025.4 3109.1 256 bits ecdsa (brainpoolP256t1) 0.0003s 0.0003s 3001.6 3316.0 384 bits ecdsa (brainpoolP384r1) 0.0008s 0.0007s 1288.8 1469.8 384 bits ecdsa (brainpoolP384t1) 0.0008s 0.0006s 1296.9 1590.2 512 bits ecdsa 
(brainpoolP512r1) 0.0013s 0.0011s 755.6 890.6 512 bits ecdsa (brainpoolP512t1) 0.0013s 0.0011s 760.4 951.9 op op/s 160 bits ecdh (secp160r1) 0.0002s 6464.6 192 bits ecdh (nistp192) 0.0002s 5105.8 224 bits ecdh (nistp224) 0.0003s 3564.4 256 bits ecdh (nistp256) 0.0000s 21233.2 384 bits ecdh (nistp384) 0.0007s 1372.1 521 bits ecdh (nistp521) 0.0017s 576.3 163 bits ecdh (nistk163) 0.0002s 6098.0 233 bits ecdh (nistk233) 0.0002s 4869.1 283 bits ecdh (nistk283) 0.0004s 2794.8 409 bits ecdh (nistk409) 0.0006s 1673.1 571 bits ecdh (nistk571) 0.0013s 756.7 163 bits ecdh (nistb163) 0.0002s 5946.4 233 bits ecdh (nistb233) 0.0002s 4702.5 283 bits ecdh (nistb283) 0.0004s 2652.9 409 bits ecdh (nistb409) 0.0006s 1588.7 571 bits ecdh (nistb571) 0.0014s 713.5 256 bits ecdh (brainpoolP256r1) 0.0003s 3166.2 256 bits ecdh (brainpoolP256t1) 0.0003s 3147.1 384 bits ecdh (brainpoolP384r1) 0.0007s 1338.4 384 bits ecdh (brainpoolP384t1) 0.0007s 1375.6 512 bits ecdh (brainpoolP512r1) 0.0013s 790.6 512 bits ecdh (brainpoolP512t1) 0.0013s 792.7 253 bits ecdh (X25519) 0.0000s 29235.1 448 bits ecdh (X448) 0.0002s 6306.2 sign verify sign/s verify/s 253 bits EdDSA (Ed25519) 0.0000s 0.0001s 31993.6 10019.0 456 bits EdDSA (Ed448) 0.0002s 0.0002s 5386.3 4842.7 sign verify sign/s verify/s 256 bits SM2 (CurveSM2) 0.0003s 0.0003s 3003.2 3320.2 op op/s 2048 bits ffdh 0.0025s 394.7 3072 bits ffdh 0.0084s 119.4 4096 bits ffdh 0.0194s 51.5 6144 bits ffdh 0.0646s 15.5 8192 bits ffdh 0.1529s 6.5
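With the assembler in place, the methods with assembler implementations show improved performance. At 8,192-byte blocks, for example, sha256 rises from 310,059.01k to 473,503.06k and aes-256-cbc from 210,668.20k to 1,097,910.95k, while idea-cbc is essentially unchanged (117,587.97k to 119,048.87k).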
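The size of the improvement can be expressed as a ratio per method. The following is a minimal Python sketch, using the 8,192-byte figures copied from the two speed runs above; the dictionary names and the `speedup` helper are illustrative assumptions and not part of OpenSSL.

```python
# Throughput at 8,192-byte blocks, in 1000s of bytes per second,
# copied from the no assembler run and the default compile run above.
NO_ASM = {"md5": 740698.79, "sha1": 832288.09, "sha256": 310059.01,
          "sha512": 559205.03, "idea-cbc": 117587.97, "aes-256-cbc": 210668.20}
ASM = {"md5": 789848.06, "sha1": 1036997.97, "sha256": 473503.06,
       "sha512": 599274.84, "idea-cbc": 119048.87, "aes-256-cbc": 1097910.95}


def speedup(baseline, candidate):
    """Return the candidate/baseline throughput ratio for each method.

    Hypothetical helper for illustration only.
    """
    return {name: candidate[name] / baseline[name] for name in baseline}


if __name__ == "__main__":
    # Print the methods ordered from largest to smallest gain.
    ratios = speedup(NO_ASM, ASM)
    for name, ratio in sorted(ratios.items(), key=lambda kv: -kv[1]):
        print(f"{name:12s} {ratio:5.2f}x")
```

A ratio close to 1.0 suggests the method has no faster implementation available in the default build on this processor.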
D.3. Debian Linux 11, OpenSSL 3.0.2, default compile, kernel methods
The default compile (see D.2.) includes the 'afalg' engine, which provides access to the Linux Kernel Crypto API (AF_ALG) method implementations. Details of the available kernel crypto methods can be found in /proc/crypto:
$ cat /proc/crypto | grep '^name'
name : __ghash
name : ghash
name : __ghash
name : __gcm(aes)
name : gcm(aes)
name : __rfc4106(gcm(aes))
name : rfc4106(gcm(aes))
name : __gcm(aes)
name : __rfc4106(gcm(aes))
name : __xts(aes)
name : xts(aes)
name : __ctr(aes)
name : ctr(aes)
name : __cbc(aes)
name : cbc(aes)
name : __ecb(aes)
name : ecb(aes)
name : __xts(aes)
name : __ctr(aes)
name : __cbc(aes)
name : __ecb(aes)
name : aes
name : crc32c
name : crct10dif
name : crct10dif
name : crc32
name : crc32c
name : pkcs1pad(rsa,sha256)
name : hmac(sha256)
name : hmac(sha1)
name : lzo-rle
name : lzo-rle
name : lzo
name : lzo
name : zlib-deflate
name : deflate
name : deflate
name : sha224
name : sha256
name : sha1
name : md5
name : ecb(cipher_null)
name : digest_null
name : compress_null
name : cipher_null
name : rsa
name : dh
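Several names appear more than once in this listing. As a rough sketch, the same information can be collected without duplicates in Python; the `algorithm_names` helper is a hypothetical name introduced here, and the code assumes /proc/crypto uses "key : value" lines as in the grep output above.

```python
from pathlib import Path


def algorithm_names(proc_crypto_text):
    """Return the sorted, de-duplicated 'name' values from /proc/crypto text.

    Hypothetical helper: assumes each entry contains a line of the form
    'name : <algorithm>'.
    """
    names = set()
    for line in proc_crypto_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "name":
            names.add(value.strip())
    return sorted(names)


if __name__ == "__main__":
    path = Path("/proc/crypto")  # only present on Linux
    if path.exists():
        for name in algorithm_names(path.read_text()):
            print(name)
```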
Note that on this system the kernel has loaded modules with various crypto functions:
$ lsmod | grep -E '(alg)|(aes)|(sha)|(crypt)'
algif_skcipher 16384 0
af_alg 32768 1 algif_skcipher
aesni_intel 368640 0
libaes 16384 1 aesni_intel
crypto_simd 16384 1 aesni_intel
cryptd 24576 2 crypto_simd,ghash_clmulni_intel
glue_helper 16384 1 aesni_intel
The 'afalg' engine in OpenSSL supports a subset of the available methods:
$ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl engine afalg -c
(afalg) AFALG engine support
 [AES-128-CBC, AES-192-CBC, AES-256-CBC]
80327193287F0000:error:1280006A:DSO support routines:dlfcn_bind_func:could not bind to the requested symbol name:crypto/dso/dso_dlfcn.c:188:symname(EVP_PKEY_base_id): /home/hamish/src/openssl-3.0.2-asm/engines/afalg.so: undefined symbol: EVP_PKEY_base_id
80327193287F0000:error:1280006A:DSO support routines:DSO_bind_func:could not bind to the requested symbol name:crypto/dso/dso_lib.c:176:
The errors here are likely related to the move to EVP methods and the deprecation of the OpenSSL engine APIs.
Unlike our test with OpenSSL 1.1.0l, this version of the 'afalg' engine supports the AES-256 CBC method used in the comparisons. Also OpenSSL 3.0.2 uses the EVP methods by default in the openssl program, so a single run can give results for all the methods ($ LD_LIBRARY_PATH=`pwd` OPENSSL_ENGINES=`pwd`/engines/ apps/openssl speed -provider-path ./providers/ -provider legacy -provider default -engine afalg):
version: 3.0.2 built on: Tue May 3 11:08:16 2022 UTC options: bn(64,64) compiler: gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -O3 -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_BUILDING_OPENSSL -DNDEBUG CPUINFO: OPENSSL_ia32cap=0xdef82203078bffff:0x840021 The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes mdc2 19213.87k 21249.32k 22246.57k 22619.14k 22702.76k 22620.84k md4 87693.94k 279738.60k 692753.07k 1097476.78k 1334378.50k 1349844.99k md5 74869.91k 220405.33k 484849.15k 693844.31k 793990.49k 801641.81k sha1 68785.21k 211651.69k 531813.89k 856263.68k 1041334.27k 1056615.08k rmd160 46334.88k 119616.47k 229049.86k 295681.37k 323463.85k 325659.31k sha256 55986.87k 150935.30k 311471.53k 424374.61k 474685.44k 475125.08k sha512 42818.01k 171608.66k 322486.86k 500541.10k 598278.14k 611521.88k whirlpool 34328.44k 83261.67k 153432.32k 194215.59k 211009.54k 211763.20k hmac(md5) 55910.06k 181670.91k 434200.75k 666086.74k 788709.38k 799091.37k des-cbc 86662.59k 91052.35k 91887.19k 91826.18k 91171.50k 92558.68k des-ede3 33983.00k 34592.13k 34692.34k 34864.81k 34843.31k 34242.56k rc4 463650.38k 572220.84k 533739.09k 514668.54k 503463.94k 500553.05k idea-cbc 111304.33k 116668.42k 119646.46k 120101.21k 120220.33k 120351.40k seed-cbc 106637.95k 109966.85k 110053.55k 111550.81k 112091.14k 112039.25k rc2-cbc 57108.33k 58368.49k 59152.90k 59224.75k 59299.16k 59271.85k blowfish 150639.86k 162052.50k 165114.37k 165712.90k 165997.23k 165199.87k cast-cbc 132735.63k 137822.12k 139187.20k 140039.85k 139911.17k 139973.97k aes-128-cbc 58259.80k 241961.82k 996687.43k 5113851.73k 27540343.47k 46165811.20k aes-192-cbc 63910.40k 210796.15k 1092007.50k 3762384.10k 41412081.37k 45620311.77k aes-256-cbc 59532.76k 257748.16k 1073952.18k 4130459.50k 37086822.40k 56257085.44k camellia-128-cbc 169613.70k 214561.05k 230152.19k 233813.33k 234029.06k 233411.93k camellia-192-cbc 137169.80k 164018.60k 171968.77k 
175241.56k 176095.23k 176237.23k camellia-256-cbc 138009.25k 164407.06k 172199.42k 175273.30k 176316.42k 174609.75k ghash 1024894.92k 3138248.59k 6438084.44k 8391383.38k 9201382.74k 9217447.25k rand 22825.32k 89219.80k 341466.76k 1244163.94k 4706436.84k 5927709.06k sign verify sign/s verify/s rsa 512 bits 0.000032s 0.000002s 31006.3 453078.5 rsa 1024 bits 0.000096s 0.000006s 10417.9 161517.6 rsa 2048 bits 0.000498s 0.000021s 2009.9 47395.1 rsa 3072 bits 0.002284s 0.000046s 437.8 21741.9 rsa 4096 bits 0.005227s 0.000079s 191.3 12703.3 rsa 7680 bits 0.045500s 0.000272s 22.0 3679.6 rsa 15360 bits 0.251750s 0.001074s 4.0 930.9 sign verify sign/s verify/s dsa 512 bits 0.000051s 0.000032s 19621.3 30836.6 dsa 1024 bits 0.000103s 0.000082s 9733.5 12182.3 dsa 2048 bits 0.000301s 0.000275s 3325.7 3634.2 sign verify sign/s verify/s 160 bits ecdsa (secp160r1) 0.0002s 0.0002s 6099.3 6111.0 192 bits ecdsa (nistp192) 0.0002s 0.0002s 4927.0 5071.7 224 bits ecdsa (nistp224) 0.0003s 0.0003s 3380.9 3694.4 256 bits ecdsa (nistp256) 0.0000s 0.0001s 49883.8 16261.6 384 bits ecdsa (nistp384) 0.0008s 0.0006s 1307.9 1570.4 521 bits ecdsa (nistp521) 0.0018s 0.0014s 543.0 703.4 163 bits ecdsa (nistk163) 0.0002s 0.0003s 5835.4 2960.4 233 bits ecdsa (nistk233) 0.0002s 0.0004s 4479.9 2301.3 283 bits ecdsa (nistk283) 0.0004s 0.0008s 2612.7 1324.9 409 bits ecdsa (nistk409) 0.0007s 0.0013s 1534.9 798.4 571 bits ecdsa (nistk571) 0.0014s 0.0027s 718.1 368.4 163 bits ecdsa (nistb163) 0.0002s 0.0003s 5626.4 2866.1 233 bits ecdsa (nistb233) 0.0002s 0.0004s 4399.1 2245.8 283 bits ecdsa (nistb283) 0.0004s 0.0008s 2478.7 1269.1 409 bits ecdsa (nistb409) 0.0007s 0.0013s 1480.2 758.4 571 bits ecdsa (nistb571) 0.0015s 0.0029s 666.0 343.7 256 bits ecdsa (brainpoolP256r1) 0.0003s 0.0003s 3039.1 3138.8 256 bits ecdsa (brainpoolP256t1) 0.0003s 0.0003s 2996.7 3245.1 384 bits ecdsa (brainpoolP384r1) 0.0008s 0.0007s 1303.4 1485.5 384 bits ecdsa (brainpoolP384t1) 0.0008s 0.0006s 1322.8 1577.9 512 bits ecdsa 
(brainpoolP512r1) 0.0013s 0.0011s 756.6 891.3 512 bits ecdsa (brainpoolP512t1) 0.0013s 0.0011s 766.0 950.8 op op/s 160 bits ecdh (secp160r1) 0.0002s 6369.8 192 bits ecdh (nistp192) 0.0002s 5122.3 224 bits ecdh (nistp224) 0.0003s 3562.5 256 bits ecdh (nistp256) 0.0000s 21288.6 384 bits ecdh (nistp384) 0.0007s 1377.7 521 bits ecdh (nistp521) 0.0017s 578.9 163 bits ecdh (nistk163) 0.0002s 6127.8 233 bits ecdh (nistk233) 0.0002s 4843.5 283 bits ecdh (nistk283) 0.0004s 2792.3 409 bits ecdh (nistk409) 0.0006s 1673.8 571 bits ecdh (nistk571) 0.0013s 767.4 163 bits ecdh (nistb163) 0.0002s 5935.1 233 bits ecdh (nistb233) 0.0002s 4694.9 283 bits ecdh (nistb283) 0.0004s 2656.3 409 bits ecdh (nistb409) 0.0006s 1564.5 571 bits ecdh (nistb571) 0.0014s 713.8 256 bits ecdh (brainpoolP256r1) 0.0003s 3217.3 256 bits ecdh (brainpoolP256t1) 0.0003s 3215.5 384 bits ecdh (brainpoolP384r1) 0.0007s 1372.4 384 bits ecdh (brainpoolP384t1) 0.0007s 1394.3 512 bits ecdh (brainpoolP512r1) 0.0013s 791.7 512 bits ecdh (brainpoolP512t1) 0.0013s 797.6 253 bits ecdh (X25519) 0.0000s 29052.1 448 bits ecdh (X448) 0.0002s 6299.4 sign verify sign/s verify/s 253 bits EdDSA (Ed25519) 0.0000s 0.0001s 32158.7 10184.2 456 bits EdDSA (Ed448) 0.0002s 0.0002s 5142.2 4923.4 sign verify sign/s verify/s 256 bits SM2 (CurveSM2) 0.0003s 0.0003s 2981.1 3336.0 op op/s 2048 bits ffdh 0.0026s 392.1 3072 bits ffdh 0.0082s 121.4 4096 bits ffdh 0.0193s 51.9 6144 bits ffdh 0.0641s 15.6 8192 bits ffdh 0.1520s 6.6
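Here the kernel's use of acceleration shows in the reported AES throughput: at 8,192-byte blocks aes-256-cbc reports 37,086,822.40k, against 1,097,910.95k for the default compile without the engine. For 16-byte blocks the engine is slower (59,532.76k against 956,115.02k), and the methods the engine does not implement are essentially unchanged.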
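The block size at which the engine starts to report higher throughput can be found by comparing the two aes-256-cbc rows column by column. This is a small Python sketch with the figures copied from the two runs above; `break_even` is a hypothetical helper name.

```python
# aes-256-cbc throughput in 1000s of bytes per second, copied from the
# default compile run (no engine) and the run with '-engine afalg'.
BLOCK_SIZES = [16, 64, 256, 1024, 8192, 16384]
DEFAULT = [956115.02, 1054160.70, 1081789.53, 1089324.03, 1097910.95, 1084456.96]
AFALG = [59532.76, 257748.16, 1073952.18, 4130459.50, 37086822.40, 56257085.44]


def break_even(sizes, baseline, candidate):
    """Return the smallest block size where candidate beats baseline, or None."""
    for size, base, cand in zip(sizes, baseline, candidate):
        if cand > base:
            return size
    return None


if __name__ == "__main__":
    for size, base, cand in zip(BLOCK_SIZES, DEFAULT, AFALG):
        print(f"{size:6d} bytes: afalg/default = {cand / base:6.2f}")
    print("break-even block size:", break_even(BLOCK_SIZES, DEFAULT, AFALG))
```

With these figures the engine is behind up to 256-byte blocks and ahead from 1,024-byte blocks upwards.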
Z. Notes for NetBSD 9.2 and OpenSSL 1.1.0l
Issues with the selection of AES assembler modes in the OpenSSL 1.1.1 series mean that the OpenSSL 1.1.0 series serves as a better illustration of the differences in AES performance between the C method (no assembler) and processor family assembler builds.
The OpenSSL available in a base install of NetBSD is a patched OpenSSL 1.1.1k. The patch forces the use of an assembler implementation for AES from the 1.1.0 series; from a look at the history in the NetBSD git repository, this appears to be related to retaining an API for AES.
An issue with the 'fuzz' test required a patch from Gentoo (see repo/gentoo.git - Official Gentoo ebuild repository). A runtime issue with the Elliptic Curve methods means some ECDH methods fail to give meaningful results.
OpenSSL 1.1.0l with NetBSD 9.2 i386 on VIA C3 Nehemiah @ 1.33 GHz | ||||||
---|---|---|---|---|---|---|
Method | AES-256 CBC | IDEA CBC | MD5 | SHA-1 | SHA-256 | SHA-512 |
no-asm | 7,892.62k | 13,011.94k | 85,639.17k | 48,430.78k | 14,884.40k | 2,275.25k |
asm | 14,802.94k | 13,011.94k | 141,019.43k | 60,963.72k | 28,576.74k | 12,443.13k |
Padlock | 690,180.08k | 13,040.78k | 138,036.56k | 60,386.74k | 28,448.83k | 12,415.91k |
Z.1. NetBSD 9.2, OpenSSL 1.1.0l, no assembler compile
Downloading the source distribution from the OpenSSL site and building with the 'no-asm' option gives a build using the C method implementations.
Running the OpenSSL speed test gives:
OpenSSL 1.1.0l 10 Sep 2019 built on: reproducible build, date unspecified options:bn(64,32) rc4(int) des(long) aes(partial) idea(int) blowfish(ptr) compiler: cc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\"" The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes md2 0.00 0.00 0.00 0.00 0.00 0.00 mdc2 1541.42k 1720.60k 1775.70k 1788.66k 1788.09k 1784.88k md4 7361.29k 23712.28k 60744.29k 99464.99k 122539.80k 124578.28k md5 13428.07k 32972.76k 61493.84k 78324.09k 85639.17k 85915.30k hmac(md5) 5100.78k 16146.10k 41441.04k 67855.48k 83574.73k 85289.64k sha1 9030.46k 20720.69k 36506.13k 45023.00k 48430.78k 48735.55k rmd160 4542.97k 11893.45k 23608.98k 31373.86k 34807.81k 34972.49k rc4 31293.06k 34036.09k 34675.07k 34927.52k 35026.92k 35063.95k des cbc 11425.19k 12086.47k 12309.26k 12368.63k 12388.70k 12388.70k des ede3 4235.64k 4307.92k 4337.12k 4349.43k 4346.39k 4349.11k idea cbc 11784.23k 12685.65k 12925.87k 12994.25k 13011.94k 13058.05k seed cbc 14215.47k 15110.97k 15342.14k 15461.70k 15439.61k 15442.33k rc2 cbc 7590.42k 7962.56k 8062.81k 8087.56k 8077.31k 8131.72k rc5-32/12 cbc 0.00 0.00 0.00 0.00 0.00 0.00 blowfish cbc 19871.74k 22029.69k 22586.68k 22856.02k 22913.02k 22875.88k cast cbc 12686.39k 13533.68k 13805.91k 13954.05k 13937.29k 13940.01k aes-128 cbc 10089.55k 10507.69k 10686.11k 10686.00k 10701.31k 10701.31k aes-192 cbc 8644.86k 8953.49k 9049.64k 9076.52k 9081.96k 9084.68k aes-256 cbc 7556.37k 7785.93k 7865.75k 7912.45k 7892.62k 7875.24k camellia-128 cbc 16805.35k 18147.19k 18487.71k 18603.80k 18663.51k 18637.48k camellia-192 cbc 13414.01k 14169.91k 14437.80k 14567.42k 14541.48k 14538.76k camellia-256 cbc 13417.33k 14173.21k 14465.03k 14518.69k 14541.48k 14538.76k sha256 3346.99k 7060.69k 11702.26k 14020.98k 14884.40k 15002.28k sha512 283.34k 1133.54k 1530.81k 
2052.79k 2275.25k 2289.42k whirlpool 1506.65k 3072.92k 5011.39k 5958.59k 6305.93k 6330.43k aes-128 ige 9972.33k 10428.88k 10583.21k 10588.36k 10595.17k 10581.56k aes-192 ige 8552.80k 8865.68k 8974.97k 9012.22k 9011.20k 8997.59k aes-256 ige 7484.17k 7726.03k 7809.11k 7849.64k 7835.47k 7821.68k ghash 5237.51k 5369.79k 5385.95k 5396.24k 5399.64k 5399.64k sign verify sign/s verify/s rsa 512 bits 0.003018s 0.000249s 331.4 4013.7 rsa 1024 bits 0.017744s 0.000825s 56.4 1212.3 rsa 2048 bits 0.114091s 0.002937s 8.8 340.5 rsa 3072 bits 0.363929s 0.006782s 2.7 147.4 rsa 4096 bits 0.767143s 0.010638s 1.3 94.0 rsa 7680 bits 4.896667s 0.039683s 0.2 25.2 rsa 15360 bits 36.930000s 0.154923s 0.0 6.5 sign verify sign/s verify/s dsa 512 bits 0.004313s 0.003315s 231.8 301.7 dsa 1024 bits 0.011826s 0.010695s 84.6 93.5 dsa 2048 bits 0.039141s 0.036410s 25.5 27.5 sign verify sign/s verify/s 160 bit ecdsa (secp160r1) 0.0131s 0.0079s 76.3 127.1 192 bit ecdsa (nistp192) 0.0123s 0.0072s 81.0 138.6 224 bit ecdsa (nistp224) 0.0164s 0.0094s 60.9 106.8 256 bit ecdsa (nistp256) 0.0186s 0.0108s 53.7 93.0 384 bit ecdsa (nistp384) 0.0532s 0.0278s 18.8 36.0 521 bit ecdsa (nistp521) 0.1733s 0.0809s 5.8 12.4 163 bit ecdsa (nistk163) 0.0312s 0.0147s 32.1 68.1 233 bit ecdsa (nistk233) 0.0711s 0.0292s 14.1 34.2 283 bit ecdsa (nistk283) 0.1168s 0.0521s 8.6 19.2 409 bit ecdsa (nistk409) 0.3088s 0.1229s 3.2 8.1 571 bit ecdsa (nistk571) 0.8000s 0.2792s 1.2 3.6 163 bit ecdsa (nistb163) 0.0312s 0.0157s 32.1 63.5 233 bit ecdsa (nistb233) 0.0710s 0.0320s 14.1 31.2 283 bit ecdsa (nistb283) 0.1167s 0.0584s 8.6 17.1 409 bit ecdsa (nistb409) 0.3097s 0.1392s 3.2 7.2 571 bit ecdsa (nistb571) 0.7992s 0.3206s 1.3 3.1 op op/s 160 bit ecdh (secp160r1) 0.0121s 82.5 192 bit ecdh (nistp192) 0.0113s 88.2 224 bit ecdh (nistp224) 0.0151s 66.2 256 bit ecdh (nistp256) 0.0171s 58.3 384 bit ecdh (nistp384) 0.0487s 20.5 521 bit ecdh (nistp521) 0.1618s 6.2 163 bit ecdh (nistk163) 0.0071s 140.4 233 bit ecdh (nistk233) 0.0143s 69.8 283 
bit ecdh (nistk283) 0.0258s 38.8 409 bit ecdh (nistk409) 0.0608s 16.5 571 bit ecdh (nistk571) 0.1389s 7.2 163 bit ecdh (nistb163) 0.0077s 129.3 233 bit ecdh (nistb233) 0.0157s 63.6 283 bit ecdh (nistb283) 0.0288s 34.7 409 bit ecdh (nistb409) 0.0692s 14.5 571 bit ecdh (nistb571) 0.1597s 6.3 253 bit ecdh (X25519) 0.0000s inf
Z.2. NetBSD 9.2, OpenSSL 1.1.0l, default compile
Downloading the source distribution from the OpenSSL site and building with the default options gives a build with assembler methods enabled.
Running the OpenSSL speed test gives:
OpenSSL 1.1.0l 10 Sep 2019 built on: reproducible build, date unspecified options:bn(64,32) rc4(4x,int) des(long) aes(partial) idea(int) blowfish(ptr) compiler: cc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\"" -Wa,--noexecstack The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes md2 0.00 0.00 0.00 0.00 0.00 0.00 mdc2 1592.23k 1790.83k 1854.81k 1863.27k 1869.74k 1867.01k md4 7049.91k 23028.14k 59961.41k 100087.13k 124314.28k 126537.82k md5 16389.98k 45268.69k 93029.04k 126255.68k 141019.43k 142159.78k hmac(md5) 6002.41k 20050.12k 56516.72k 103284.86k 136639.83k 140026.05k sha1 9861.10k 23650.08k 44021.62k 56210.43k 60963.72k 61350.19k rmd160 4569.69k 12237.65k 24478.02k 32722.07k 36279.64k 36578.23k rc4 47127.60k 55720.09k 59427.33k 60404.09k 60762.32k 60986.71k des cbc 13358.21k 14007.62k 14168.49k 14225.78k 14239.38k 14233.94k des ede3 4790.65k 4874.25k 4914.18k 4919.98k 4907.04k 4904.28k idea cbc 11787.40k 12681.63k 12929.53k 13031.77k 13011.94k 13017.42k seed cbc 14236.65k 15112.89k 15342.90k 15410.18k 15442.33k 15442.33k rc2 cbc 7615.33k 7961.03k 8063.06k 8088.58k 8093.81k 8099.47k rc5-32/12 cbc 0.00 0.00 0.00 0.00 0.00 0.00 blowfish cbc 25389.32k 27125.91k 27561.42k 27719.44k 27765.71k 27852.80k cast cbc 12761.46k 13561.98k 13844.84k 13916.53k 13891.14k 13945.58k aes-128 cbc 9093.77k 9420.40k 9632.15k 20335.08k 20480.00k 20488.16k aes-192 cbc 7528.09k 7919.27k 8037.80k 17244.84k 17355.61k 17369.22k aes-256 cbc 6517.73k 6741.56k 6814.28k 14680.62k 14802.94k 14712.05k camellia-128 cbc 15823.70k 20227.86k 
21718.75k 22229.33k 22298.03k 22311.63k camellia-192 cbc 12898.81k 15586.43k 16552.14k 16812.99k 16890.22k 16890.22k camellia-256 cbc 12930.67k 15644.59k 16554.01k 16811.29k 16890.22k 16880.98k sha256 5000.94k 10959.15k 21098.41k 26459.68k 28576.74k 28745.48k sha512 1406.32k 5639.51k 8080.58k 11090.84k 12443.13k 12542.43k whirlpool 3430.66k 7301.38k 12303.90k 14892.23k 15925.25k 15948.54k aes-128 ige 8749.00k 9079.59k 9207.24k 9241.86k 9253.42k 9247.98k aes-192 ige 7313.20k 7619.67k 7726.95k 7760.63k 7772.87k 7766.02k aes-256 ige 6333.32k 6525.36k 6592.61k 6600.55k 6610.75k 6613.48k ghash 19307.14k 26654.55k 29423.33k 30269.44k 30544.46k 30563.51k sign verify sign/s verify/s rsa 512 bits 0.001787s 0.000151s 559.5 6607.0 rsa 1024 bits 0.010917s 0.000528s 91.6 1895.6 rsa 2048 bits 0.075149s 0.001950s 13.3 512.7 rsa 3072 bits 0.230227s 0.004287s 4.3 233.3 rsa 4096 bits 0.520000s 0.007532s 1.9 132.8 rsa 7680 bits 3.192500s 0.026070s 0.3 38.4 rsa 15360 bits 24.550000s 0.103402s 0.0 9.7 sign verify sign/s verify/s dsa 512 bits 0.002641s 0.002100s 378.6 476.3 dsa 1024 bits 0.007581s 0.006748s 131.9 148.2 dsa 2048 bits 0.026273s 0.024595s 38.1 40.7 sign verify sign/s verify/s 160 bit ecdsa (secp160r1) 0.0071s 0.0045s 141.2 221.2 192 bit ecdsa (nistp192) 0.0098s 0.0063s 102.3 159.5 224 bit ecdsa (nistp224) 0.0139s 0.0088s 71.7 113.0 256 bit ecdsa (nistp256) 0.0015s 0.0037s 651.4 271.7 384 bit ecdsa (nistp384) 0.0500s 0.0288s 20.0 34.7 521 bit ecdsa (nistp521) 0.1358s 0.0774s 7.4 12.9 163 bit ecdsa (nistk163) 0.0291s 0.0126s 34.4 79.1 233 bit ecdsa (nistk233) 0.0670s 0.0248s 14.9 40.3 283 bit ecdsa (nistk283) 0.1096s 0.0449s 9.1 22.3 409 bit ecdsa (nistk409) 0.2911s 0.1045s 3.4 9.6 571 bit ecdsa (nistk571) 0.7600s 0.2379s 1.3 4.2 163 bit ecdsa (nistb163) 0.0291s 0.0136s 34.4 73.6 233 bit ecdsa (nistb233) 0.0669s 0.0275s 14.9 36.3 283 bit ecdsa (nistb283) 0.1098s 0.0502s 9.1 19.9 409 bit ecdsa (nistb409) 0.2917s 0.1182s 3.4 8.5 571 bit ecdsa (nistb571) 0.7586s 0.2732s 1.3 3.7 
op op/s 160 bit ecdh (secp160r1) 0.0066s 151.0 192 bit ecdh (nistp192) 0.0093s 107.1 224 bit ecdh (nistp224) 0.0133s 75.3 256 bit ecdh (nistp256) 0.0028s 355.1 384 bit ecdh (nistp384) 0.0477s 21.0 521 bit ecdh (nistp521) 0.1290s 7.8 163 bit ecdh (nistk163) 0.0061s 164.0 233 bit ecdh (nistk233) 0.0120s 83.4 283 bit ecdh (nistk283) 0.0218s 45.9 409 bit ecdh (nistk409) 0.0507s 19.7 571 bit ecdh (nistk571) 0.1161s 8.6 163 bit ecdh (nistb163) 0.0066s 151.3 233 bit ecdh (nistb233) 0.0132s 75.5 283 bit ecdh (nistb283) 0.0244s 40.9 409 bit ecdh (nistb409) 0.0577s 17.3 571 bit ecdh (nistb571) 0.1332s 7.5 253 bit ecdh (X25519) 0.0000s inf
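With the assembler implementations included, several methods show a performance gain: at 8,192-byte blocks MD5 rises from 85,639.17k to 141,019.43k, AES-256 CBC from 7,892.62k to 14,802.94k and SHA-512 from 2,275.25k to 12,443.13k, while IDEA CBC is unchanged at 13,011.94k.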
Z.3. NetBSD 9.2, OpenSSL 1.1.0l, default compile: VIA Padlock
The default compile (see Z.2.) also supports the Padlock engine, which gives access to the VIA Padlock acceleration for AES:
$ /usr/local/bin/openssl engine padlock -c
(padlock) VIA PadLock (no-RNG, ACE)
 [AES-128-ECB, AES-128-CBC, AES-128-CFB, AES-128-OFB, AES-128-CTR, AES-192-ECB, AES-192-CBC, AES-192-CFB, AES-192-OFB, AES-192-CTR, AES-256-ECB, AES-256-CBC, AES-256-CFB, AES-256-OFB, AES-256-CTR]
To invoke the Padlock accelerated method the engine has to be specified and the required method accessed with the EVP option:
$ /usr/local/bin/openssl speed -engine padlock -evp aes-256-cbc
engine "padlock" set.
Doing aes-256-cbc for 3s on 16 size blocks: 10662518 aes-256-cbc's in 3.01s
Doing aes-256-cbc for 3s on 64 size blocks: 8597677 aes-256-cbc's in 3.01s
Doing aes-256-cbc for 3s on 256 size blocks: 4848074 aes-256-cbc's in 3.01s
Doing aes-256-cbc for 3s on 1024 size blocks: 1706932 aes-256-cbc's in 2.91s
Doing aes-256-cbc for 3s on 8192 size blocks: 253594 aes-256-cbc's in 3.01s
Doing aes-256-cbc for 3s on 16384 size blocks: 127179 aes-256-cbc's in 2.98s
OpenSSL 1.1.0l 10 Sep 2019
built on: reproducible build, date unspecified
options:bn(64,32) rc4(4x,int) des(long) aes(partial) idea(int) blowfish(ptr)
compiler: cc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\"" -Wa,--noexecstack
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-cbc     56677.84k   182807.75k   412327.89k   600652.36k   690180.08k   699228.43k
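The summary row follows from the "Doing aes-256-cbc" lines: the printed figures are consistent with blocks processed, multiplied by the block size, divided by the measured seconds and then by 1000. A minimal Python sketch of that arithmetic, with the counts and timings copied from the output above (`throughput_k` is a hypothetical helper name):

```python
def throughput_k(blocks, block_size, seconds):
    """Throughput in 1000s of bytes per second for one 'Doing ...' line."""
    return blocks * block_size / seconds / 1000.0


# (block size, blocks processed, seconds) from the "Doing aes-256-cbc" lines.
RUNS = [
    (16, 10662518, 3.01),
    (64, 8597677, 3.01),
    (256, 4848074, 3.01),
    (1024, 1706932, 2.91),
    (8192, 253594, 3.01),
    (16384, 127179, 2.98),
]

if __name__ == "__main__":
    for size, blocks, secs in RUNS:
        print(f"{size:6d} bytes: {throughput_k(blocks, size, secs):12.2f}k")
```

Because the timings are printed rounded to two decimal places, a recomputed value may differ slightly from the summary row in the last digit.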
Note that the -evp option is required to use the accelerated implementation; without it the normal version of the method is used:
$ /usr/local/bin/openssl speed -engine padlock aes-256-cbc
engine "padlock" set.
Doing aes-256 cbc for 3s on 16 size blocks: 1224119 aes-256 cbc's in 3.01s
Doing aes-256 cbc for 3s on 64 size blocks: 317144 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 256 size blocks: 80115 aes-256 cbc's in 3.01s
Doing aes-256 cbc for 3s on 1024 size blocks: 43153 aes-256 cbc's in 3.01s
Doing aes-256 cbc for 3s on 8192 size blocks: 5421 aes-256 cbc's in 3.01s
Doing aes-256 cbc for 3s on 16384 size blocks: 2710 aes-256 cbc's in 3.01s
OpenSSL 1.1.0l 10 Sep 2019
built on: reproducible build, date unspecified
options:bn(64,32) rc4(4x,int) des(long) aes(partial) idea(int) blowfish(ptr)
compiler: cc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\"" -Wa,--noexecstack
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256 cbc      6506.94k     6765.74k     6813.77k    14680.62k    14753.76k    14751.04k
For methods without a specific engine implementation the regular implementations are used. This can be seen by running a command that invokes the engine for each of the other methods in our comparison:
$ /usr/local/bin/openssl speed -engine padlock -evp idea
engine "padlock" set.
Doing idea-cbc for 3s on 16 size blocks: 2005747 idea-cbc's in 3.00s
Doing idea-cbc for 3s on 64 size blocks: 580638 idea-cbc's in 3.01s
Doing idea-cbc for 3s on 256 size blocks: 150989 idea-cbc's in 3.01s
Doing idea-cbc for 3s on 1024 size blocks: 38130 idea-cbc's in 3.01s
Doing idea-cbc for 3s on 8192 size blocks: 4712 idea-cbc's in 2.96s
Doing idea-cbc for 3s on 16384 size blocks: 2350 idea-cbc's in 2.96s
OpenSSL 1.1.0l 10 Sep 2019
built on: reproducible build, date unspecified
options:bn(64,32) rc4(4x,int) des(long) aes(partial) idea(int) blowfish(ptr)
compiler: cc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\"" -Wa,--noexecstack
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
idea-cbc        10697.32k    12345.79k    12841.59k    12971.80k    13040.78k    13007.57k

$ /usr/local/bin/openssl speed -engine padlock -evp md5
engine "padlock" set.
Doing md5 for 3s on 16 size blocks: 1370335 md5's in 3.00s
Doing md5 for 3s on 64 size blocks: 1095540 md5's in 2.90s
Doing md5 for 3s on 256 size blocks: 755228 md5's in 3.00s
Doing md5 for 3s on 1024 size blocks: 321409 md5's in 3.00s
Doing md5 for 3s on 8192 size blocks: 50719 md5's in 3.01s
Doing md5 for 3s on 16384 size blocks: 25852 md5's in 3.01s
OpenSSL 1.1.0l 10 Sep 2019
built on: reproducible build, date unspecified
options:bn(64,32) rc4(4x,int) des(long) aes(partial) idea(int) blowfish(ptr)
compiler: cc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\"" -Wa,--noexecstack
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
md5               7308.45k    24177.43k    64446.12k   109707.61k   138036.56k   140717.33k

$ /usr/local/bin/openssl speed -engine padlock -evp sha1
engine "padlock" set.
Doing sha1 for 3s on 16 size blocks: 1036264 sha1's in 3.00s
Doing sha1 for 3s on 64 size blocks: 755219 sha1's in 3.01s
Doing sha1 for 3s on 256 size blocks: 424166 sha1's in 3.01s
Doing sha1 for 3s on 1024 size blocks: 153876 sha1's in 3.01s
Doing sha1 for 3s on 8192 size blocks: 22188 sha1's in 3.01s
Doing sha1 for 3s on 16384 size blocks: 11202 sha1's in 3.00s
OpenSSL 1.1.0l 10 Sep 2019
built on: reproducible build, date unspecified
options:bn(64,32) rc4(4x,int) des(long) aes(partial) idea(int) blowfish(ptr)
compiler: cc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\"" -Wa,--noexecstack
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sha1              5526.74k    16057.81k    36075.25k    52348.51k    60386.74k    61177.86k

$ /usr/local/bin/openssl speed -engine padlock -evp sha256
engine "padlock" set.
Doing sha256 for 3s on 16 size blocks: 663483 sha256's in 3.01s
Doing sha256 for 3s on 64 size blocks: 420057 sha256's in 3.01s
Doing sha256 for 3s on 256 size blocks: 217137 sha256's in 2.92s
Doing sha256 for 3s on 1024 size blocks: 74607 sha256's in 2.98s
Doing sha256 for 3s on 8192 size blocks: 10453 sha256's in 3.01s
Doing sha256 for 3s on 16384 size blocks: 5268 sha256's in 3.01s
OpenSSL 1.1.0l 10 Sep 2019
built on: reproducible build, date unspecified
options:bn(64,32) rc4(4x,int) des(long) aes(partial) idea(int) blowfish(ptr)
compiler: cc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\"" -Wa,--noexecstack
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sha256            3526.82k     8931.44k    19036.67k    25636.77k    28448.83k    28674.72k

$ /usr/local/bin/openssl speed -engine padlock -evp sha512
engine "padlock" set.
Doing sha512 for 3s on 16 size blocks: 234871 sha512's in 3.00s
Doing sha512 for 3s on 64 size blocks: 235262 sha512's in 3.01s
Doing sha512 for 3s on 256 size blocks: 90985 sha512's in 3.01s
Doing sha512 for 3s on 1024 size blocks: 32115 sha512's in 3.01s
Doing sha512 for 3s on 8192 size blocks: 4562 sha512's in 3.01s
Doing sha512 for 3s on 16384 size blocks: 2286 sha512's in 2.98s
OpenSSL 1.1.0l 10 Sep 2019
built on: reproducible build, date unspecified
options:bn(64,32) rc4(4x,int) des(long) aes(partial) idea(int) blowfish(ptr)
compiler: cc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\"" -Wa,--noexecstack
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sha512            1252.65k     5002.25k     7738.26k    10925.50k    12415.91k    12568.40k
Unsurprisingly, these show the same performance as the regular methods, since the PadLock engine can only accelerate AES on this platform.
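Since the comparison in this post uses the 8,192 byte column, it can help to pull that column out of each summary row rather than reading it by eye. The sketch below is one possible way to do that; parse_summary_row and SIZES are hypothetical names, and the parsing assumes a summary row ends with exactly six "k"-suffixed figures, as in the transcripts above.

```python
# Block sizes in the order the summary row lists them (taken from the header above).
SIZES = (16, 64, 256, 1024, 8192, 16384)


def parse_summary_row(row):
    """Split one summary row into (method name, {block size: 1000s of bytes/s}).

    Hypothetical helper: assumes the last len(SIZES) fields are the
    throughput figures and everything before them is the method name
    (which may contain a space, e.g. "aes-256 cbc").
    """
    fields = row.split()
    count = len(SIZES)
    name = " ".join(fields[:-count])
    values = [float(field.rstrip("k")) for field in fields[-count:]]
    return name, dict(zip(SIZES, values))


# Summary rows copied from the runs above.
rows = [
    "idea-cbc 10697.32k 12345.79k 12841.59k 12971.80k 13040.78k 13007.57k",
    "md5 7308.45k 24177.43k 64446.12k 109707.61k 138036.56k 140717.33k",
    "sha1 5526.74k 16057.81k 36075.25k 52348.51k 60386.74k 61177.86k",
    "sha256 3526.82k 8931.44k 19036.67k 25636.77k 28448.83k 28674.72k",
    "sha512 1252.65k 5002.25k 7738.26k 10925.50k 12415.91k 12568.40k",
]

for row in rows:
    name, result = parse_summary_row(row)
    print(f"{name:<10} {result[8192]:>12,.2f}k")
```

The same function could be applied to the output of a run without the engine option, so the two sets of 8,192 byte figures can be placed side by side.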
Further Sources
- How to check if AES-NI is enabled for OpenSSL on Linux
- repo/gentoo.git - Official Gentoo ebuild repository: a patch for test/recipes/90-test_fuzz.t which addresses a test failure
- OpenSSL Performance with a VIA Eden (padlock)
- Linux Kernel Crypto API — The Linux Kernel documentation
- VIA PadLock support for Linux
- VIA Padlock - Crypto++ Wiki
- Enable padlock and viadrm on NetBSD | NetBSD