macOS Arm64 Apple M4 Cipher Benchmarks
Machine Profile
Machine Specification
The benchmarks were run on the following machine:
BenchmarkDotNet v0.15.8, macOS Tahoe 26.3.1 (a) (25D771280a) [Darwin 25.3.0]
Apple M4, 1 CPU, 10 logical and 10 physical cores
.NET SDK 10.0.201
[Host] : .NET 10.0.5 (10.0.5, 10.0.526.15411), Arm64 RyuJIT armv8.0-a
.NET 10.0 : .NET 10.0.5 (10.0.5, 10.0.526.15411), Arm64 RyuJIT armv8.0-a
Method=TryComputeHash Job=.NET 10.0 Runtime=.NET 10.0
Toolchain=net10.0
Note: Results are machine-specific and may vary between systems. Run benchmarks locally for your specific hardware.
BenchmarkDotNet measurements for all cipher algorithm implementations in CryptoHives.Foundation.Security.Cryptography. Each algorithm is benchmarked across representative payload sizes (17 bytes through 128 KiB) to capture both latency and throughput characteristics.
Implementation Variants
Each cipher family exposes multiple acceleration tiers. The runtime automatically selects the fastest tier supported by the host CPU via SimdSupport detection. Callers can also force a specific tier through the Create(SimdSupport) factory for testing or compatibility.
AES Family
| Variant | Instructions | .NET Target | When Selected | Description |
|---|---|---|---|---|
| Managed | Scalar | All | No ARM Crypto support | T-table AES using scalar uint arithmetic. Fully portable, zero-allocation. In the AES-128-CBC results, ~3.3× slower than ArmAes for encrypt and ~17–36× slower for decrypt, depending on payload size. |
| ArmAes | AES (ARM Crypto Ext.) | .NET 8+ | ArmBase.IsSupported | Hardware AES round instructions (AESD, AESE, AESMC, AESIMC). For CBC, uses 8-block interleaved decrypt for maximum instruction-level parallelism: all 8 ciphertext blocks are decrypted simultaneously via parallel AESD dispatch. For GCM/CCM, accelerates counter-mode encryption and CBC-MAC. CBC decrypt is ~8.5× faster than OS at 128 B; at bulk sizes Apple CommonCrypto leads, likely via Apple Silicon–specific AES pipelining. |
| ArmAes+ArmPmull | AES + PMULL (ARM Crypto Ext.) | .NET 8+ | AdvSimd.Arm64.IsSupported | Adds carry-less polynomial multiplication (PMULL/PMULL2) for hardware-accelerated GHASH over GF(2¹²⁸). PMULL operates on 64-bit polynomial operands to produce 128-bit products; PMULL2 reads from the upper halves of 128-bit NEON registers (a free lane-select requiring no additional instruction). Uses the same 8-block stitched AES+GHASH pipeline as the x86 PClMul path. Modular reduction uses a 2-PMULL SymCrypt-style MODREDUCE. Pre-computes Karatsuba cross-term halves for H¹–H⁸ powers. ~32× faster than OS at 17 B; OS leads at ≥8 KiB due to Apple Silicon–specific bulk AES acceleration. |
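The PMULL-based multiplication described in the table can be modelled in a few lines of Python. This is a minimal sketch, not the library's code: `clmul64` stands in for a single PMULL or PMULL2 instruction, and all function names are hypothetical. It shows how three 64-bit carry-less multiplies (the Karatsuba trick) yield the full product of two 128-bit GHASH operands, before any modular reduction is applied.

```python
MASK64 = (1 << 64) - 1

def clmul(a: int, b: int) -> int:
    """Schoolbook carry-less multiplication of two polynomials over GF(2)."""
    result = 0
    while b:
        if b & 1:
            result ^= a  # XOR replaces addition, so no carries propagate
        a <<= 1
        b >>= 1
    return result

def clmul64(a: int, b: int) -> int:
    """Models one PMULL: 64-bit x 64-bit -> 128-bit carry-less product."""
    assert a <= MASK64 and b <= MASK64
    return clmul(a, b)

def clmul128_karatsuba(x: int, y: int) -> int:
    """128-bit x 128-bit carry-less product using three 64-bit multiplies."""
    x0, x1 = x & MASK64, x >> 64
    y0, y1 = y & MASK64, y >> 64
    lo = clmul64(x0, y0)                       # PMULL on the lower halves
    hi = clmul64(x1, y1)                       # PMULL2 on the upper halves
    mid = clmul64(x0 ^ x1, y0 ^ y1) ^ lo ^ hi  # Karatsuba cross term
    return (hi << 128) ^ (mid << 64) ^ lo
```

The value `y0 ^ y1` depends only on the hash key power, which is presumably why the cross-term halves can be computed once per key rather than once per block.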
ChaCha20 Family
| Variant | Instructions | .NET Target | When Selected | Description |
|---|---|---|---|---|
| Managed | Scalar | All | No NEON support | Quarter-round operations using scalar uint arithmetic. Fully portable. ~4× slower than Neon at all payload sizes. |
| Neon | AdvSIMD (NEON) | .NET 8+ | AdvSimd.IsSupported | Maps the 4×4 ChaCha state to four Vector128<uint> rows. Uses ARM NEON shift-left, shift-right, and byte-table permute instructions for the 16-bit, 12-bit, 8-bit, and 7-bit rotations. Diagonal rounds use AdvSimd.ExtractVector128 to rotate rows 1–3 by one, two, and three elements. Processes one 64-byte keystream block per iteration. ~4× faster than Managed; ~1.24× faster than BouncyCastle at 128 KiB. |
When to Use Each Variant
- Small messages (≤256 B): AES-GCM with ArmAes+ArmPmull is ~32× faster than OS at 17 B and ~14× at 128 B. The managed path needs no P/Invoke, so it avoids the ~1.7–1.9 μs fixed per-call cost of the OS path entirely. ChaCha20-Poly1305 NEON is ~3× faster than OS at 128 B.
- Medium messages (256 B–4 KB): ArmAes+ArmPmull leads through ~1 KiB. ChaCha20-Poly1305 NEON remains competitive at 1 KiB (~1.25× faster than OS). This range covers QUIC (~1.4 KB), WireGuard (~1.4 KB), and IPsec packets.
- Large messages (8 KB–128 KB): Apple CommonCrypto dominates. For AES-GCM, OS is ~2.2× faster at 8 KiB and ~4.8× faster at 128 KiB; for ChaCha20-Poly1305, OS is ~1.7× faster. This is likely due to Apple Silicon–specific AES/PMULL micro-architectural pipelining that .NET's current ARMv8 paths do not yet fully exploit. This range covers TLS records (1–16 KB) and OPC UA chunks (8 KB default).
- No hardware AES: Use ChaCha20-Poly1305 NEON — it outperforms Managed AES-GCM by 3–10× depending on payload size and is always zero-allocation.
- IoT / constrained devices: AES-CCM with ArmAes provides ~4× speedup over BouncyCastle at 128 KiB. Supports variable nonce (7–13 bytes) and tag sizes.
Highlights
| Family | Leader | Key Insight |
|---|---|---|
| ChaCha20 | Neon | NEON ~4× faster than Managed; ~1.24× faster than BouncyCastle at 128 KiB; zero allocation |
| ChaCha20-Poly1305 | Neon | ~3× faster than OS at 128 B; OS leads at ≥8 KiB; zero allocation |
| XChaCha20-Poly1305 | Neon | ~3.3× faster than Managed at 128 KiB; zero allocation |
| AES-CBC | ArmAes | Decrypt ~8.5× faster than OS at 128 B; OS leads at ≥8 KiB (Apple Silicon bulk path); zero allocation |
| AES-GCM | ArmAes+ArmPmull | ~32× faster than OS at 17 B; ~14× at 128 B; OS leads at ≥8 KiB; 8-block stitched AES+GHASH pipeline |
| AES-CCM | ArmAes | ~4× faster than BouncyCastle at 128 KiB; zero allocation; no OS adapter available |
Stream Ciphers
ChaCha20
ChaCha20 is a stream cipher designed by Daniel J. Bernstein. Two acceleration tiers are available on ARM:
- Neon: Single-block processing. Maps the 4×4 ChaCha state matrix to four Vector128<uint> rows. Uses ARM NEON vshl/vsri (shift-and-insert) and vtbl (byte-table permute) instructions for the four rotation widths (16-bit, 12-bit, 8-bit, 7-bit). Diagonal rounds use AdvSimd.ExtractVector128 to rotate rows 1–3 by one, two, and three elements. Yields ~770 MB/s throughput at 128 KiB (128 KiB in ~170 μs); ~1.24× faster than BouncyCastle.
- Managed: Scalar uint quarter-round arithmetic. Fully portable across all .NET targets. ~4.1× slower than Neon at 128 KiB.
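The row-based layout used by the Neon tier can be illustrated with a small Python model. This is a sketch under stated assumptions, not the library's implementation: Python lists stand in for Vector128<uint> rows, `rotate_row` stands in for the lane rotation done with AdvSimd.ExtractVector128, and all function names are hypothetical. It shows that a column round followed by a lane-rotated "diagonal" round gives the same result as the eight scalar quarter-rounds defined in RFC 8439.

```python
MASK32 = 0xFFFFFFFF

def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits (shift left, shift right, combine)."""
    return ((x << n) | (x >> (32 - n))) & MASK32

def quarter_round(a, b, c, d):
    """ChaCha quarter-round with the 16/12/8/7-bit rotations."""
    a = (a + b) & MASK32; d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32; b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32; d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32; b = rotl32(b ^ c, 7)
    return a, b, c, d

def rotate_row(row, n):
    """Rotate a 4-lane row left by n lanes."""
    return row[n:] + row[:n]

def lanewise_qr(r0, r1, r2, r3):
    """Apply the quarter-round to all four lanes at once (one SIMD step each)."""
    lanes = [quarter_round(*lane) for lane in zip(r0, r1, r2, r3)]
    return tuple(list(row) for row in zip(*lanes))

def double_round_rows(rows):
    """Column round, then diagonal round via lane rotation of rows 1-3."""
    r0, r1, r2, r3 = lanewise_qr(*rows)
    r1, r2, r3 = rotate_row(r1, 1), rotate_row(r2, 2), rotate_row(r3, 3)
    r0, r1, r2, r3 = lanewise_qr(r0, r1, r2, r3)
    # Undo the rotation so the state is back in row-major order.
    return [r0, rotate_row(r1, 3), rotate_row(r2, 2), rotate_row(r3, 1)]

def double_round_scalar(state):
    """Reference: eight scalar quarter-rounds on a flat 16-word state."""
    x = list(state)
    for a, b, c, d in [(0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15),
                       (0, 5, 10, 15), (1, 6, 11, 12), (2, 7, 8, 13), (3, 4, 9, 14)]:
        x[a], x[b], x[c], x[d] = quarter_round(x[a], x[b], x[c], x[d])
    return x
```

A full ChaCha20 block applies ten such double rounds and then adds the original state word by word.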
Key observations:
- Neon is the fastest at all sizes; ~1.24× faster than BouncyCastle at 128 KiB; ~1.35× at 1 KiB
- BouncyCastle allocates 96 B per call; NaCl.Core allocates 24 B per call
- Managed and Neon paths are zero-allocation
| Description | TestDataSize | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|---|
| Decrypt · ChaCha20 (Neon) | 128B | 170.2 ns | 0.05 ns | 0.05 ns | - |
| Decrypt · ChaCha20 (BouncyCastle) | 128B | 304.1 ns | 1.49 ns | 1.32 ns | 96 B |
| Decrypt · ChaCha20 (NaCl.Core) | 128B | 521.2 ns | 0.19 ns | 0.18 ns | 24 B |
| Decrypt · ChaCha20 (Managed) | 128B | 692.3 ns | 2.27 ns | 2.12 ns | - |
| Encrypt · ChaCha20 (Neon) | 128B | 170.2 ns | 0.07 ns | 0.06 ns | - |
| Encrypt · ChaCha20 (BouncyCastle) | 128B | 299.9 ns | 4.36 ns | 4.08 ns | 96 B |
| Encrypt · ChaCha20 (NaCl.Core) | 128B | 521.1 ns | 0.20 ns | 0.19 ns | 24 B |
| Encrypt · ChaCha20 (Managed) | 128B | 698.0 ns | 2.51 ns | 2.35 ns | - |
| Decrypt · ChaCha20 (Neon) | 1KB | 1,336.0 ns | 0.72 ns | 0.64 ns | - |
| Decrypt · ChaCha20 (BouncyCastle) | 1KB | 1,812.7 ns | 22.35 ns | 19.81 ns | 96 B |
| Decrypt · ChaCha20 (NaCl.Core) | 1KB | 2,935.6 ns | 0.94 ns | 0.78 ns | 24 B |
| Decrypt · ChaCha20 (Managed) | 1KB | 5,466.8 ns | 13.05 ns | 11.57 ns | - |
| Encrypt · ChaCha20 (Neon) | 1KB | 1,335.9 ns | 0.90 ns | 0.84 ns | - |
| Encrypt · ChaCha20 (BouncyCastle) | 1KB | 1,868.0 ns | 36.32 ns | 37.29 ns | 96 B |
| Encrypt · ChaCha20 (NaCl.Core) | 1KB | 2,935.4 ns | 0.89 ns | 0.83 ns | 24 B |
| Encrypt · ChaCha20 (Managed) | 1KB | 5,495.7 ns | 17.69 ns | 16.55 ns | - |
| Decrypt · ChaCha20 (Neon) | 8KB | 10,652.9 ns | 2.54 ns | 2.12 ns | - |
| Decrypt · ChaCha20 (BouncyCastle) | 8KB | 13,452.3 ns | 196.17 ns | 183.50 ns | 96 B |
| Decrypt · ChaCha20 (NaCl.Core) | 8KB | 22,244.7 ns | 11.08 ns | 10.37 ns | 24 B |
| Decrypt · ChaCha20 (Managed) | 8KB | 43,589.3 ns | 170.58 ns | 159.56 ns | - |
| Encrypt · ChaCha20 (Neon) | 8KB | 10,655.7 ns | 5.51 ns | 5.15 ns | - |
| Encrypt · ChaCha20 (BouncyCastle) | 8KB | 13,947.8 ns | 22.50 ns | 21.05 ns | 96 B |
| Encrypt · ChaCha20 (NaCl.Core) | 8KB | 22,251.6 ns | 5.66 ns | 4.72 ns | 24 B |
| Encrypt · ChaCha20 (Managed) | 8KB | 43,758.3 ns | 189.25 ns | 177.03 ns | - |
| Decrypt · ChaCha20 (Neon) | 128KB | 170,370.4 ns | 30.76 ns | 27.27 ns | - |
| Decrypt · ChaCha20 (BouncyCastle) | 128KB | 211,614.5 ns | 273.29 ns | 255.64 ns | 96 B |
| Decrypt · ChaCha20 (NaCl.Core) | 128KB | 353,412.1 ns | 32.71 ns | 27.31 ns | 24 B |
| Decrypt · ChaCha20 (Managed) | 128KB | 697,996.7 ns | 2,113.03 ns | 1,976.53 ns | - |
| Encrypt · ChaCha20 (Neon) | 128KB | 170,355.3 ns | 79.47 ns | 66.36 ns | - |
| Encrypt · ChaCha20 (BouncyCastle) | 128KB | 212,054.6 ns | 185.30 ns | 173.33 ns | 96 B |
| Encrypt · ChaCha20 (NaCl.Core) | 128KB | 353,326.8 ns | 199.21 ns | 186.34 ns | 24 B |
| Encrypt · ChaCha20 (Managed) | 128KB | 699,802.4 ns | 3,056.58 ns | 2,859.13 ns | - |
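The throughput and speedup figures quoted for ChaCha20 can be re-derived from the Mean column of the table above. The following sketch assumes that the "128KB" label means 128 KiB (131,072 bytes), as the prose suggests; the helper names are hypothetical.

```python
KIB = 1024

def throughput_mb_per_s(payload_bytes: int, mean_ns: float) -> float:
    """Decimal MB/s. Bytes per nanosecond equals GB/s, so scale by 1000."""
    return payload_bytes / mean_ns * 1000.0

def speedup(baseline_ns: float, candidate_ns: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_ns / candidate_ns

# Decrypt rows at 128KB, copied from the table above (mean time in ns).
neon_ns = 170_370.4
bouncycastle_ns = 211_614.5
managed_ns = 697_996.7

print(round(throughput_mb_per_s(128 * KIB, neon_ns)))  # 769
print(round(speedup(bouncycastle_ns, neon_ns), 2))     # 1.24
print(round(speedup(managed_ns, neon_ns), 1))          # 4.1
```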
Block Ciphers
AES-128-CBC
AES-CBC (Cipher Block Chaining) is the most widely deployed AES mode. Two acceleration tiers are available on Apple M4:
- ArmAes: Uses ARM Cryptography Extension AESD/AESE/AESMC/AESIMC instructions. Decrypt uses 8-block interleaving: 8 ciphertext blocks are loaded and decrypted simultaneously via parallel AESD dispatch. Each block decrypts independently, requiring only the preceding ciphertext block as an XOR mask (10 rounds × 8 blocks = 80 AESD instructions in flight). Encrypt remains serial because each plaintext block must be XORed with the previous ciphertext before the next AESE can proceed.
- Managed: T-table AES using four 256-entry lookup tables per round. Fully portable, zero-allocation. Comparable to BouncyCastle at large sizes.
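The asymmetry between CBC encrypt and decrypt can be seen in a short Python model. This is a sketch only: `toy_encrypt_block` is a deliberately trivial, insecure stand-in for the AES block function (Python's standard library has no AES), and every name here is hypothetical. What matters is the data flow: encryption needs the previous ciphertext block before it can start the next one, while decryption can run the block function on a whole batch of ciphertext blocks before applying the XOR masks.

```python
BLOCK = 16

def xor(x: bytes, y: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(x, y))

def toy_encrypt_block(block: bytes, key: bytes) -> bytes:
    """Toy invertible 16-byte permutation. NOT AES and NOT secure."""
    mixed = bytes((b + k) & 0xFF for b, k in zip(block, key))
    return mixed[1:] + mixed[:1]

def toy_decrypt_block(block: bytes, key: bytes) -> bytes:
    mixed = block[-1:] + block[:-1]
    return bytes((b - k) & 0xFF for b, k in zip(mixed, key))

def cbc_encrypt(plaintext: bytes, key: bytes, iv: bytes) -> bytes:
    """Serial chain: block i cannot start until block i-1 is finished."""
    prev, out = iv, []
    for i in range(0, len(plaintext), BLOCK):
        prev = toy_encrypt_block(xor(plaintext[i:i + BLOCK], prev), key)
        out.append(prev)
    return b"".join(out)

def cbc_decrypt_interleaved(ciphertext: bytes, key: bytes, iv: bytes, lanes: int = 8) -> bytes:
    """Decrypt in batches of `lanes` blocks; block operations in a batch are independent."""
    blocks = [ciphertext[i:i + BLOCK] for i in range(0, len(ciphertext), BLOCK)]
    prev, out = iv, []
    for start in range(0, len(blocks), lanes):
        batch = blocks[start:start + lanes]
        # These calls do not depend on one another, so hardware can overlap them.
        decrypted = [toy_decrypt_block(c, key) for c in batch]
        # The XOR mask for each block is simply the preceding ciphertext block.
        masks = [prev] + batch[:-1]
        out.extend(xor(d, m) for d, m in zip(decrypted, masks))
        prev = batch[-1]
    return b"".join(out)
```

With `lanes=8` the batch structure mirrors the 8-block interleaving described above; `lanes=1` degenerates to the block-at-a-time form and produces the same output.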
Key observations:
- ArmAes Decrypt: ~8.5× faster than OS at 128 B; near OS at 4 KiB; OS leads from ~8 KiB (Apple Silicon uses a wider AES pipeline at bulk sizes)
- ArmAes Encrypt: ~1.5× faster than OS at 128 B; OS leads from 1 KiB (CBC encrypt is inherently serial; CommonCrypto uses NEON-assisted interleaving for partial parallelism)
- Managed: Zero-allocation T-table AES; comparable to BouncyCastle at large sizes
- OS: Allocates 72 B per call (P/Invoke marshalling overhead)
| Description | TestDataSize | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|---|
| Decrypt · AES-128-CBC (ArmAes) | 128B | 22.08 ns | 0.068 ns | 0.061 ns | - |
| Decrypt · AES-128-CBC (OS) | 128B | 187.81 ns | 1.061 ns | 0.828 ns | 72 B |
| Decrypt · AES-128-CBC (Managed) | 128B | 386.28 ns | 0.583 ns | 0.545 ns | - |
| Decrypt · AES-128-CBC (BouncyCastle) | 128B | 604.78 ns | 1.863 ns | 1.652 ns | 832 B |
| Encrypt · AES-128-CBC (ArmAes) | 128B | 129.02 ns | 0.636 ns | 0.564 ns | - |
| Encrypt · AES-128-CBC (OS) | 128B | 192.84 ns | 1.322 ns | 1.104 ns | 72 B |
| Encrypt · AES-128-CBC (Managed) | 128B | 428.98 ns | 0.876 ns | 0.777 ns | - |
| Encrypt · AES-128-CBC (BouncyCastle) | 128B | 558.80 ns | 3.133 ns | 2.931 ns | 832 B |
| Decrypt · AES-128-CBC (ArmAes) | 1KB | 87.13 ns | 0.535 ns | 0.500 ns | - |
| Decrypt · AES-128-CBC (OS) | 1KB | 229.65 ns | 2.508 ns | 2.346 ns | 72 B |
| Decrypt · AES-128-CBC (Managed) | 1KB | 2,702.80 ns | 1.275 ns | 1.193 ns | - |
| Decrypt · AES-128-CBC (BouncyCastle) | 1KB | 3,378.77 ns | 3.735 ns | 3.494 ns | 832 B |
| Encrypt · AES-128-CBC (OS) | 1KB | 541.56 ns | 1.345 ns | 1.192 ns | 72 B |
| Encrypt · AES-128-CBC (ArmAes) | 1KB | 914.69 ns | 5.226 ns | 4.888 ns | - |
| Encrypt · AES-128-CBC (Managed) | 1KB | 3,112.53 ns | 6.331 ns | 5.922 ns | - |
| Encrypt · AES-128-CBC (BouncyCastle) | 1KB | 3,241.94 ns | 3.052 ns | 2.549 ns | 832 B |
| Decrypt · AES-128-CBC (OS) | 8KB | 560.77 ns | 4.027 ns | 3.767 ns | 72 B |
| Decrypt · AES-128-CBC (ArmAes) | 8KB | 610.05 ns | 3.382 ns | 3.164 ns | - |
| Decrypt · AES-128-CBC (Managed) | 8KB | 21,239.52 ns | 7.107 ns | 6.648 ns | - |
| Decrypt · AES-128-CBC (BouncyCastle) | 8KB | 25,311.13 ns | 45.050 ns | 42.140 ns | 832 B |
| Encrypt · AES-128-CBC (OS) | 8KB | 3,286.63 ns | 25.573 ns | 22.669 ns | 72 B |
| Encrypt · AES-128-CBC (ArmAes) | 8KB | 7,177.20 ns | 18.455 ns | 16.360 ns | - |
| Encrypt · AES-128-CBC (Managed) | 8KB | 24,539.18 ns | 25.378 ns | 21.192 ns | - |
| Encrypt · AES-128-CBC (BouncyCastle) | 8KB | 24,691.65 ns | 18.235 ns | 17.057 ns | 832 B |
| Decrypt · AES-128-CBC (OS) | 128KB | 6,436.87 ns | 25.556 ns | 23.905 ns | 72 B |
| Decrypt · AES-128-CBC (ArmAes) | 128KB | 9,613.76 ns | 33.784 ns | 31.601 ns | - |
| Decrypt · AES-128-CBC (Managed) | 128KB | 341,935.68 ns | 504.550 ns | 471.956 ns | - |
| Decrypt · AES-128-CBC (BouncyCastle) | 128KB | 402,159.16 ns | 961.030 ns | 898.948 ns | 832 B |
| Encrypt · AES-128-CBC (OS) | 128KB | 50,556.34 ns | 34.506 ns | 30.589 ns | 72 B |
| Encrypt · AES-128-CBC (ArmAes) | 128KB | 119,683.72 ns | 506.758 ns | 395.644 ns | - |
| Encrypt · AES-128-CBC (Managed) | 128KB | 393,501.61 ns | 265.260 ns | 221.504 ns | - |
| Encrypt · AES-128-CBC (BouncyCastle) | 128KB | 393,912.18 ns | 499.123 ns | 466.880 ns | 832 B |
AES-256-CBC
AES-256-CBC uses 14 rounds (vs 10 for AES-128), adding ~25–30% overhead. The same 8-block interleaved decrypt and serial encrypt architecture applies via ArmAes. Decrypt is ~9.1× faster than OS at 128 B; OS leads from ~8 KiB. Encrypt is ~1.65× faster than OS at 128 B but slower than OS from 1 KiB (serial CBC encrypt bottleneck on Apple Silicon).
| Description | TestDataSize | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|---|
| Decrypt · AES-256-CBC (ArmAes) | 128B | 24.49 ns | 0.058 ns | 0.054 ns | - |
| Decrypt · AES-256-CBC (OS) | 128B | 223.79 ns | 2.652 ns | 2.481 ns | 72 B |
| Decrypt · AES-256-CBC (Managed) | 128B | 518.96 ns | 0.478 ns | 0.447 ns | - |
| Decrypt · AES-256-CBC (BouncyCastle) | 128B | 795.90 ns | 0.699 ns | 0.584 ns | 1024 B |
| Encrypt · AES-256-CBC (ArmAes) | 128B | 147.67 ns | 0.518 ns | 0.484 ns | - |
| Encrypt · AES-256-CBC (OS) | 128B | 244.19 ns | 1.642 ns | 1.536 ns | 72 B |
| Encrypt · AES-256-CBC (Managed) | 128B | 568.90 ns | 0.157 ns | 0.140 ns | - |
| Encrypt · AES-256-CBC (BouncyCastle) | 128B | 727.30 ns | 3.226 ns | 3.018 ns | 1024 B |
| Decrypt · AES-256-CBC (ArmAes) | 1KB | 105.71 ns | 0.536 ns | 0.501 ns | - |
| Decrypt · AES-256-CBC (OS) | 1KB | 278.84 ns | 1.092 ns | 1.022 ns | 72 B |
| Decrypt · AES-256-CBC (Managed) | 1KB | 3,658.68 ns | 1.154 ns | 0.963 ns | - |
| Decrypt · AES-256-CBC (BouncyCastle) | 1KB | 4,423.07 ns | 2.201 ns | 1.951 ns | 1024 B |
| Encrypt · AES-256-CBC (OS) | 1KB | 726.35 ns | 1.198 ns | 1.121 ns | 72 B |
| Encrypt · AES-256-CBC (ArmAes) | 1KB | 1,089.55 ns | 3.812 ns | 3.379 ns | - |
| Encrypt · AES-256-CBC (Managed) | 1KB | 4,093.74 ns | 2.524 ns | 2.361 ns | - |
| Encrypt · AES-256-CBC (BouncyCastle) | 1KB | 4,264.13 ns | 4.516 ns | 3.771 ns | 1024 B |
| Decrypt · AES-256-CBC (OS) | 8KB | 713.66 ns | 3.539 ns | 3.137 ns | 72 B |
| Decrypt · AES-256-CBC (ArmAes) | 8KB | 750.17 ns | 2.570 ns | 2.404 ns | - |
| Decrypt · AES-256-CBC (Managed) | 8KB | 28,820.45 ns | 3.599 ns | 3.005 ns | - |
| Decrypt · AES-256-CBC (BouncyCastle) | 8KB | 33,282.54 ns | 35.367 ns | 31.352 ns | 1024 B |
| Encrypt · AES-256-CBC (OS) | 8KB | 4,420.51 ns | 3.907 ns | 3.463 ns | 72 B |
| Encrypt · AES-256-CBC (ArmAes) | 8KB | 8,531.38 ns | 45.876 ns | 42.912 ns | - |
| Encrypt · AES-256-CBC (Managed) | 8KB | 32,252.99 ns | 15.854 ns | 14.830 ns | - |
| Encrypt · AES-256-CBC (BouncyCastle) | 8KB | 32,451.74 ns | 19.617 ns | 18.350 ns | 1024 B |
| Decrypt · AES-256-CBC (OS) | 128KB | 8,453.37 ns | 31.046 ns | 29.040 ns | 72 B |
| Decrypt · AES-256-CBC (ArmAes) | 128KB | 11,843.55 ns | 24.484 ns | 22.902 ns | - |
| Decrypt · AES-256-CBC (Managed) | 128KB | 461,650.22 ns | 193.805 ns | 171.803 ns | - |
| Decrypt · AES-256-CBC (BouncyCastle) | 128KB | 527,391.86 ns | 2,011.466 ns | 1,881.527 ns | 1024 B |
| Encrypt · AES-256-CBC (OS) | 128KB | 68,785.66 ns | 68.572 ns | 60.787 ns | 72 B |
| Encrypt · AES-256-CBC (ArmAes) | 128KB | 136,660.36 ns | 528.923 ns | 494.755 ns | - |
| Encrypt · AES-256-CBC (Managed) | 128KB | 515,053.11 ns | 540.111 ns | 505.221 ns | - |
| Encrypt · AES-256-CBC (BouncyCastle) | 128KB | 518,864.32 ns | 247.465 ns | 231.479 ns | 1024 B |
AEAD Ciphers (Authenticated Encryption)
Authenticated Encryption with Associated Data (AEAD) ciphers provide both confidentiality and authenticity in a single operation. All CryptoHives AEAD implementations are zero-allocation.
AES-128-GCM
AES-GCM combines counter-mode AES encryption (GCTR) with GHASH polynomial authentication over GF(2¹²⁸). Two acceleration tiers are available on Apple M4:
- ArmAes+ArmPmull (.NET 8+): Uses ARM Cryptography Extension AESD/AESE for counter-mode encryption and PMULL/PMULL2 for GHASH polynomial multiplication. PMULL operates on 64-bit polynomial operands to produce 128-bit products; PMULL2 reads from the upper halves of 128-bit NEON registers (a free lane-select requiring no additional instruction). Uses an 8-block stitched loop that interleaves AES rounds with lagged GHASH of the previous 8 blocks. Modular reduction uses a 2-PMULL SymCrypt-style MODREDUCE. Pre-computes Karatsuba cross-term halves for H¹–H⁸ powers. Small payloads use the non-stitched path (≤8 blocks). ~32× faster than OS at 17 B; ~14× at 128 B. At bulk sizes (≥8 KiB), Apple CommonCrypto leads, likely due to Apple Silicon–specific AES pipelining not yet accessible to the .NET ARM intrinsics layer.
- Managed: Scalar T-table AES with 4-bit Shoup table GHASH (16-entry reduction table, byte-by-byte multiplication). Fully portable, zero-allocation.
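Why pre-computed powers H¹–H⁸ allow eight blocks to be hashed together can be shown with a bit-level reference model. This is a sketch, not the library's code: `gf128_mul` follows the bitwise multiplication from NIST SP 800-38D rather than PMULL, and the function names are hypothetical. It checks that the serial Horner form of GHASH and the 8-block aggregated form produce the same value.

```python
R = 0xE1 << 120  # GCM reduction constant for x^128 + x^7 + x^2 + x + 1

def gf128_mul(x: int, y: int) -> int:
    """Bitwise GF(2^128) multiply in GCM bit order (bit 0 is the MSB)."""
    z, v = 0, y
    for i in range(128):
        if (x >> (127 - i)) & 1:
            z ^= v
        v = (v >> 1) ^ R if v & 1 else v >> 1
    return z

def ghash_serial(h: int, blocks) -> int:
    """Horner form: every multiply waits for the previous one."""
    y = 0
    for x in blocks:
        y = gf128_mul(y ^ x, h)
    return y

def ghash_aggregated(h: int, blocks, width: int = 8) -> int:
    """Process up to `width` blocks per step using powers of H."""
    powers = [h]
    for _ in range(width - 1):
        powers.append(gf128_mul(powers[-1], h))  # powers[k] = H^(k+1)
    y = 0
    for start in range(0, len(blocks), width):
        group = blocks[start:start + width]
        n = len(group)
        # (y ^ x0)*H^n ^ x1*H^(n-1) ^ ... ^ x(n-1)*H: the products are independent.
        acc = gf128_mul(y ^ group[0], powers[n - 1])
        for j in range(1, n):
            acc ^= gf128_mul(group[j], powers[n - 1 - j])
        y = acc
    return y
```

In this model every product is reduced immediately; an optimized implementation would typically XOR the unreduced products first and reduce once per group.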
Key observations:
- ArmAes+ArmPmull: ~32× faster than OS at 17 B encrypt; ~14× at 128 B; ~2.5× at 1 KiB; OS leads from ~4–8 KiB
- ArmAes+ArmPmull at 128 KiB: OS is ~4.8× faster for both encrypt and decrypt
- Managed: Uses 4-bit Shoup table GHASH, T-table AES; zero allocation
- BouncyCastle: Uses ARM AES + PMULL internally on ARM64; allocates ~1.5 KB per call
| Description | TestDataSize | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|---|
| Decrypt · AES-128-GCM (ArmAes+ArmPmull) | 17B | 83.46 ns | 0.140 ns | 0.131 ns | - |
| Decrypt · AES-128-GCM (Managed) | 17B | 349.10 ns | 0.676 ns | 0.599 ns | - |
| Decrypt · AES-128-GCM (BouncyCastle) | 17B | 571.92 ns | 1.796 ns | 1.592 ns | 1536 B |
| Decrypt · AES-128-GCM (OS) | 17B | 1,876.11 ns | 7.974 ns | 7.459 ns | - |
| Encrypt · AES-128-GCM (ArmAes+ArmPmull) | 17B | 52.02 ns | 0.134 ns | 0.125 ns | - |
| Encrypt · AES-128-GCM (Managed) | 17B | 313.23 ns | 0.600 ns | 0.561 ns | - |
| Encrypt · AES-128-GCM (BouncyCastle) | 17B | 489.69 ns | 2.913 ns | 2.725 ns | 1520 B |
| Encrypt · AES-128-GCM (OS) | 17B | 1,667.75 ns | 5.798 ns | 4.841 ns | - |
| Decrypt · AES-128-GCM (ArmAes+ArmPmull) | 65B | 117.63 ns | 0.446 ns | 0.417 ns | - |
| Decrypt · AES-128-GCM (Managed) | 65B | 608.52 ns | 0.821 ns | 0.768 ns | - |
| Decrypt · AES-128-GCM (BouncyCastle) | 65B | 770.15 ns | 2.003 ns | 1.874 ns | 1536 B |
| Decrypt · AES-128-GCM (OS) | 65B | 1,878.06 ns | 10.164 ns | 9.507 ns | - |
| Encrypt · AES-128-GCM (ArmAes+ArmPmull) | 65B | 81.43 ns | 0.412 ns | 0.385 ns | - |
| Encrypt · AES-128-GCM (Managed) | 65B | 571.55 ns | 0.691 ns | 0.647 ns | - |
| Encrypt · AES-128-GCM (BouncyCastle) | 65B | 700.34 ns | 1.246 ns | 1.165 ns | 1520 B |
| Encrypt · AES-128-GCM (OS) | 65B | 1,670.34 ns | 8.758 ns | 8.192 ns | - |
| Decrypt · AES-128-GCM (ArmAes+ArmPmull) | 128B | 150.28 ns | 0.719 ns | 0.673 ns | - |
| Decrypt · AES-128-GCM (Managed) | 128B | 865.62 ns | 1.100 ns | 0.918 ns | - |
| Decrypt · AES-128-GCM (BouncyCastle) | 128B | 973.42 ns | 1.716 ns | 1.521 ns | 1536 B |
| Decrypt · AES-128-GCM (OS) | 128B | 1,892.75 ns | 15.790 ns | 14.770 ns | - |
| Encrypt · AES-128-GCM (ArmAes+ArmPmull) | 128B | 114.82 ns | 0.541 ns | 0.506 ns | - |
| Encrypt · AES-128-GCM (Managed) | 128B | 834.97 ns | 0.197 ns | 0.174 ns | - |
| Encrypt · AES-128-GCM (BouncyCastle) | 128B | 916.07 ns | 1.968 ns | 1.840 ns | 1520 B |
| Encrypt · AES-128-GCM (OS) | 128B | 1,675.67 ns | 7.713 ns | 7.215 ns | - |
| Decrypt · AES-128-GCM (ArmAes+ArmPmull) | 152B | 180.33 ns | 0.965 ns | 0.903 ns | - |
| Decrypt · AES-128-GCM (Managed) | 152B | 1,049.62 ns | 2.411 ns | 2.138 ns | - |
| Decrypt · AES-128-GCM (BouncyCastle) | 152B | 1,097.85 ns | 1.020 ns | 0.954 ns | 1536 B |
| Decrypt · AES-128-GCM (OS) | 152B | 1,915.24 ns | 23.454 ns | 20.792 ns | - |
| Encrypt · AES-128-GCM (ArmAes+ArmPmull) | 152B | 141.78 ns | 0.931 ns | 0.871 ns | - |
| Encrypt · AES-128-GCM (Managed) | 152B | 998.72 ns | 1.057 ns | 0.937 ns | - |
| Encrypt · AES-128-GCM (BouncyCastle) | 152B | 1,044.79 ns | 2.014 ns | 1.884 ns | 1520 B |
| Encrypt · AES-128-GCM (OS) | 152B | 1,693.58 ns | 9.880 ns | 9.242 ns | - |
| Decrypt · AES-128-GCM (ArmAes+ArmPmull) | 256B | 245.16 ns | 2.051 ns | 1.919 ns | - |
| Decrypt · AES-128-GCM (BouncyCastle) | 256B | 1,505.57 ns | 21.598 ns | 20.202 ns | 1536 B |
| Decrypt · AES-128-GCM (Managed) | 256B | 1,576.52 ns | 7.472 ns | 6.623 ns | - |
| Decrypt · AES-128-GCM (OS) | 256B | 1,928.80 ns | 9.036 ns | 8.011 ns | - |
| Encrypt · AES-128-GCM (ArmAes+ArmPmull) | 256B | 206.31 ns | 0.522 ns | 0.488 ns | - |
| Encrypt · AES-128-GCM (BouncyCastle) | 256B | 1,457.01 ns | 2.930 ns | 2.741 ns | 1520 B |
| Encrypt · AES-128-GCM (Managed) | 256B | 1,537.20 ns | 0.764 ns | 0.677 ns | - |
| Encrypt · AES-128-GCM (OS) | 256B | 1,699.98 ns | 9.487 ns | 8.874 ns | - |
| Decrypt · AES-128-GCM (ArmAes+ArmPmull) | 1KB | 825.49 ns | 8.153 ns | 7.626 ns | - |
| Decrypt · AES-128-GCM (OS) | 1KB | 2,043.39 ns | 11.923 ns | 11.152 ns | - |
| Decrypt · AES-128-GCM (BouncyCastle) | 1KB | 4,507.99 ns | 4.555 ns | 4.038 ns | 1536 B |
| Decrypt · AES-128-GCM (Managed) | 1KB | 5,632.19 ns | 3.065 ns | 2.559 ns | - |
| Encrypt · AES-128-GCM (ArmAes+ArmPmull) | 1KB | 773.43 ns | 0.084 ns | 0.070 ns | - |
| Encrypt · AES-128-GCM (OS) | 1KB | 1,846.66 ns | 8.340 ns | 7.802 ns | - |
| Encrypt · AES-128-GCM (BouncyCastle) | 1KB | 4,720.98 ns | 2.953 ns | 2.618 ns | 1520 B |
| Encrypt · AES-128-GCM (Managed) | 1KB | 5,488.35 ns | 1.785 ns | 1.669 ns | - |
| Decrypt · AES-128-GCM (OS) | 8KB | 2,980.11 ns | 8.988 ns | 7.968 ns | - |
| Decrypt · AES-128-GCM (ArmAes+ArmPmull) | 8KB | 6,154.80 ns | 54.469 ns | 48.286 ns | - |
| Decrypt · AES-128-GCM (BouncyCastle) | 8KB | 32,353.34 ns | 14.646 ns | 12.230 ns | 1536 B |
| Decrypt · AES-128-GCM (Managed) | 8KB | 43,234.55 ns | 74.420 ns | 65.972 ns | - |
| Encrypt · AES-128-GCM (OS) | 8KB | 2,759.67 ns | 23.114 ns | 21.621 ns | - |
| Encrypt · AES-128-GCM (ArmAes+ArmPmull) | 8KB | 6,031.61 ns | 1.374 ns | 1.218 ns | - |
| Encrypt · AES-128-GCM (BouncyCastle) | 8KB | 34,591.45 ns | 15.798 ns | 14.004 ns | 1520 B |
| Encrypt · AES-128-GCM (Managed) | 8KB | 42,968.62 ns | 31.414 ns | 29.385 ns | - |
| Decrypt · AES-128-GCM (OS) | 128KB | 20,417.54 ns | 92.491 ns | 81.991 ns | - |
| Decrypt · AES-128-GCM (ArmAes+ArmPmull) | 128KB | 98,744.15 ns | 770.397 ns | 720.630 ns | - |
| Decrypt · AES-128-GCM (BouncyCastle) | 128KB | 509,846.38 ns | 3,538.195 ns | 2,954.553 ns | 1536 B |
| Decrypt · AES-128-GCM (Managed) | 128KB | 687,721.49 ns | 520.607 ns | 434.731 ns | - |
| Encrypt · AES-128-GCM (OS) | 128KB | 20,606.77 ns | 188.091 ns | 175.940 ns | - |
| Encrypt · AES-128-GCM (ArmAes+ArmPmull) | 128KB | 97,818.19 ns | 888.611 ns | 742.030 ns | - |
| Encrypt · AES-128-GCM (BouncyCastle) | 128KB | 548,420.82 ns | 578.240 ns | 540.886 ns | 1520 B |
| Encrypt · AES-128-GCM (Managed) | 128KB | 686,949.55 ns | 4,014.906 ns | 3,352.628 ns | - |
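The payload size at which the OS path overtakes ArmAes+ArmPmull can be estimated from the table above with a simple linear cost model (fixed per-call cost plus per-byte cost). This is a rough sketch: it assumes cost grows linearly with payload size, fits each line through only the 17 B and 128 KiB encrypt rows, and uses hypothetical helper names.

```python
def linear_fit(p1, p2):
    """Return (slope ns/byte, intercept ns) of the line through two (bytes, ns) points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1

def crossover_bytes(line_a, line_b) -> float:
    """Payload size at which two linear cost models are equal."""
    slope_a, intercept_a = line_a
    slope_b, intercept_b = line_b
    return (intercept_b - intercept_a) / (slope_a - slope_b)

# Encrypt · AES-128-GCM rows from the table above: (payload bytes, mean ns).
arm = linear_fit((17, 52.02), (128 * 1024, 97_818.19))
os_path = linear_fit((17, 1_667.75), (128 * 1024, 20_606.77))

print(f"ArmAes+ArmPmull: {arm[1]:.0f} ns fixed + {arm[0]:.3f} ns/byte")
print(f"OS:              {os_path[1]:.0f} ns fixed + {os_path[0]:.3f} ns/byte")
print(f"Estimated crossover: {crossover_bytes(arm, os_path):.0f} bytes")
```

Under these assumptions the estimate lands at roughly 2.7 KB, which sits between the measured 1 KiB row (where ArmAes+ArmPmull is still ahead) and the 8 KiB row (where OS leads).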
AES-192-GCM
AES-192-GCM uses 12 rounds (vs 10 for AES-128), adding ~10-15% overhead. The same ArmAes+ArmPmull pipeline applies. The performance pattern mirrors AES-128-GCM: dominant over OS at small payloads, OS leads at bulk sizes.
| Description | TestDataSize | Mean | Error | StdDev | Median | Allocated |
|---|---|---|---|---|---|---|
| Decrypt · AES-192-GCM (ArmAes+ArmPmull) | 17B | 88.64 ns | 0.816 ns | 1.386 ns | 88.48 ns | - |
| Decrypt · AES-192-GCM (Managed) | 17B | 384.55 ns | 2.477 ns | 2.317 ns | 383.82 ns | - |
| Decrypt · AES-192-GCM (BouncyCastle) | 17B | 644.88 ns | 2.070 ns | 1.936 ns | 644.89 ns | 1640 B |
| Decrypt · AES-192-GCM (OS) | 17B | 1,961.31 ns | 9.998 ns | 9.352 ns | 1,963.25 ns | - |
| Encrypt · AES-192-GCM (ArmAes+ArmPmull) | 17B | 54.83 ns | 0.363 ns | 0.339 ns | 54.80 ns | - |
| Encrypt · AES-192-GCM (Managed) | 17B | 337.99 ns | 0.361 ns | 0.301 ns | 337.96 ns | - |
| Encrypt · AES-192-GCM (BouncyCastle) | 17B | 539.33 ns | 1.369 ns | 1.213 ns | 539.13 ns | 1624 B |
| Encrypt · AES-192-GCM (OS) | 17B | 1,724.67 ns | 11.283 ns | 10.002 ns | 1,725.28 ns | - |
| Decrypt · AES-192-GCM (ArmAes+ArmPmull) | 65B | 127.00 ns | 0.526 ns | 0.466 ns | 126.98 ns | - |
| Decrypt · AES-192-GCM (Managed) | 65B | 678.65 ns | 3.988 ns | 3.535 ns | 678.16 ns | - |
| Decrypt · AES-192-GCM (BouncyCastle) | 65B | 875.62 ns | 6.169 ns | 5.469 ns | 873.15 ns | 1640 B |
| Decrypt · AES-192-GCM (OS) | 65B | 1,961.01 ns | 8.769 ns | 8.203 ns | 1,960.41 ns | - |
| Encrypt · AES-192-GCM (ArmAes+ArmPmull) | 65B | 85.93 ns | 0.357 ns | 0.334 ns | 86.04 ns | - |
| Encrypt · AES-192-GCM (Managed) | 65B | 621.38 ns | 5.008 ns | 4.439 ns | 620.10 ns | - |
| Encrypt · AES-192-GCM (BouncyCastle) | 65B | 768.43 ns | 1.952 ns | 1.630 ns | 767.94 ns | 1624 B |
| Encrypt · AES-192-GCM (OS) | 65B | 1,699.25 ns | 6.290 ns | 5.884 ns | 1,697.84 ns | - |
| Decrypt · AES-192-GCM (ArmAes+ArmPmull) | 128B | 165.14 ns | 0.953 ns | 0.892 ns | 165.07 ns | - |
| Decrypt · AES-192-GCM (Managed) | 128B | 963.21 ns | 5.582 ns | 4.948 ns | 964.87 ns | - |
| Decrypt · AES-192-GCM (BouncyCastle) | 128B | 1,123.41 ns | 2.551 ns | 2.386 ns | 1,123.69 ns | 1640 B |
| Decrypt · AES-192-GCM (OS) | 128B | 2,013.48 ns | 25.533 ns | 23.884 ns | 2,016.24 ns | - |
| Encrypt · AES-192-GCM (ArmAes+ArmPmull) | 128B | 119.79 ns | 0.434 ns | 0.406 ns | 119.74 ns | - |
| Encrypt · AES-192-GCM (Managed) | 128B | 900.18 ns | 1.336 ns | 1.185 ns | 899.74 ns | - |
| Encrypt · AES-192-GCM (BouncyCastle) | 128B | 1,011.73 ns | 1.451 ns | 1.357 ns | 1,011.96 ns | 1624 B |
| Encrypt · AES-192-GCM (OS) | 128B | 1,721.26 ns | 9.321 ns | 8.719 ns | 1,720.09 ns | - |
| Decrypt · AES-192-GCM (ArmAes+ArmPmull) | 152B | 197.90 ns | 0.479 ns | 0.425 ns | 197.89 ns | - |
| Decrypt · AES-192-GCM (Managed) | 152B | 1,186.42 ns | 3.950 ns | 3.695 ns | 1,187.44 ns | - |
| Decrypt · AES-192-GCM (BouncyCastle) | 152B | 1,269.90 ns | 7.636 ns | 7.143 ns | 1,269.82 ns | 1640 B |
| Decrypt · AES-192-GCM (OS) | 152B | 2,026.10 ns | 21.096 ns | 19.733 ns | 2,028.72 ns | - |
| Encrypt · AES-192-GCM (ArmAes+ArmPmull) | 152B | 148.07 ns | 0.826 ns | 0.773 ns | 147.96 ns | - |
| Encrypt · AES-192-GCM (Managed) | 152B | 1,083.81 ns | 2.450 ns | 2.172 ns | 1,083.46 ns | - |
| Encrypt · AES-192-GCM (BouncyCastle) | 152B | 1,156.93 ns | 1.140 ns | 1.011 ns | 1,156.88 ns | 1624 B |
| Encrypt · AES-192-GCM (OS) | 152B | 1,886.51 ns | 37.631 ns | 77.714 ns | 1,921.00 ns | - |
| Decrypt · AES-192-GCM (ArmAes+ArmPmull) | 256B | 274.20 ns | 1.219 ns | 1.140 ns | 274.24 ns | - |
| Decrypt · AES-192-GCM (BouncyCastle) | 256B | 1,716.26 ns | 5.456 ns | 5.104 ns | 1,715.83 ns | 1640 B |
| Decrypt · AES-192-GCM (Managed) | 256B | 1,777.79 ns | 4.789 ns | 4.480 ns | 1,777.46 ns | - |
| Decrypt · AES-192-GCM (OS) | 256B | 2,006.89 ns | 5.787 ns | 5.414 ns | 2,006.84 ns | - |
| Encrypt · AES-192-GCM (ArmAes+ArmPmull) | 256B | 245.46 ns | 3.782 ns | 3.538 ns | 246.90 ns | - |
| Encrypt · AES-192-GCM (BouncyCastle) | 256B | 1,784.07 ns | 13.793 ns | 12.902 ns | 1,785.44 ns | 1624 B |
| Encrypt · AES-192-GCM (Managed) | 256B | 1,826.82 ns | 16.124 ns | 15.082 ns | 1,832.15 ns | - |
| Encrypt · AES-192-GCM (OS) | 256B | 1,942.43 ns | 26.472 ns | 23.467 ns | 1,944.04 ns | - |
| Decrypt · AES-192-GCM (ArmAes+ArmPmull) | 1KB | 919.98 ns | 17.348 ns | 23.747 ns | 910.43 ns | - |
| Decrypt · AES-192-GCM (OS) | 1KB | 2,129.58 ns | 29.742 ns | 24.836 ns | 2,125.08 ns | - |
| Decrypt · AES-192-GCM (BouncyCastle) | 1KB | 5,155.56 ns | 83.072 ns | 119.140 ns | 5,093.43 ns | 1640 B |
| Decrypt · AES-192-GCM (Managed) | 1KB | 6,220.61 ns | 20.487 ns | 19.164 ns | 6,220.82 ns | - |
| Encrypt · AES-192-GCM (ArmAes+ArmPmull) | 1KB | 911.15 ns | 15.261 ns | 14.275 ns | 914.45 ns | - |
| Encrypt · AES-192-GCM (OS) | 1KB | 2,084.34 ns | 18.503 ns | 16.402 ns | 2,084.09 ns | - |
| Encrypt · AES-192-GCM (BouncyCastle) | 1KB | 5,806.63 ns | 41.236 ns | 38.572 ns | 5,816.79 ns | 1624 B |
| Encrypt · AES-192-GCM (Managed) | 1KB | 6,625.54 ns | 56.436 ns | 52.790 ns | 6,646.75 ns | - |
| Decrypt · AES-192-GCM (OS) | 8KB | 3,192.92 ns | 41.167 ns | 38.507 ns | 3,197.00 ns | - |
| Decrypt · AES-192-GCM (ArmAes+ArmPmull) | 8KB | 6,861.46 ns | 96.269 ns | 85.340 ns | 6,846.07 ns | - |
| Decrypt · AES-192-GCM (BouncyCastle) | 8KB | 36,794.07 ns | 157.620 ns | 147.438 ns | 36,746.45 ns | 1640 B |
| Decrypt · AES-192-GCM (Managed) | 8KB | 47,855.32 ns | 115.413 ns | 107.957 ns | 47,853.18 ns | - |
| Encrypt · AES-192-GCM (OS) | 8KB | 3,216.31 ns | 25.133 ns | 22.280 ns | 3,216.80 ns | - |
| Encrypt · AES-192-GCM (ArmAes+ArmPmull) | 8KB | 7,275.47 ns | 143.373 ns | 147.234 ns | 7,293.49 ns | - |
| Encrypt · AES-192-GCM (BouncyCastle) | 8KB | 42,470.61 ns | 342.962 ns | 320.807 ns | 42,536.57 ns | 1624 B |
| Encrypt · AES-192-GCM (Managed) | 8KB | 51,352.26 ns | 1,025.780 ns | 1,566.475 ns | 51,974.95 ns | - |
| Decrypt · AES-192-GCM (OS) | 128KB | 21,617.22 ns | 41.413 ns | 38.738 ns | 21,625.07 ns | - |
| Decrypt · AES-192-GCM (ArmAes+ArmPmull) | 128KB | 108,670.67 ns | 1,473.814 ns | 3,502.675 ns | 108,040.67 ns | - |
| Decrypt · AES-192-GCM (BouncyCastle) | 128KB | 569,932.15 ns | 197.979 ns | 175.504 ns | 569,919.62 ns | 1640 B |
| Decrypt · AES-192-GCM (Managed) | 128KB | 747,764.92 ns | 339.076 ns | 283.143 ns | 747,864.68 ns | - |
| Encrypt · AES-192-GCM (OS) | 128KB | 23,587.42 ns | 129.504 ns | 114.802 ns | 23,599.07 ns | - |
| Encrypt · AES-192-GCM (ArmAes+ArmPmull) | 128KB | 112,051.01 ns | 2,663.548 ns | 7,246.378 ns | 113,045.82 ns | - |
| Encrypt · AES-192-GCM (BouncyCastle) | 128KB | 643,812.16 ns | 7,453.016 ns | 6,971.556 ns | 644,847.05 ns | 1624 B |
| Encrypt · AES-192-GCM (Managed) | 128KB | 796,955.76 ns | 5,329.920 ns | 4,450.725 ns | 797,598.18 ns | - |
AES-256-GCM
AES-256-GCM uses 14 rounds (vs 10 for AES-128), adding ~20-30% overhead per block. The same 2-tier architecture (ArmAes+ArmPmull → Managed) applies. Encrypt is ~14-16× faster than OS at 128 B; OS leads from ~4–8 KiB. The large-payload gap mirrors AES-128-GCM — Apple CommonCrypto likely exploits Apple Silicon–specific AES/PMULL execution units that are not yet accessible through the .NET ARMv8 intrinsics layer.
| Description | TestDataSize | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|---|
| Decrypt · AES-256-GCM (ArmAes+ArmPmull) | 17B | 85.36 ns | 0.119 ns | 0.112 ns | - |
| Decrypt · AES-256-GCM (Managed) | 17B | 390.48 ns | 0.448 ns | 0.397 ns | - |
| Decrypt · AES-256-GCM (BouncyCastle) | 17B | 663.59 ns | 0.935 ns | 0.875 ns | 1744 B |
| Decrypt · AES-256-GCM (OS) | 17B | 1,922.85 ns | 9.287 ns | 8.687 ns | - |
| Encrypt · AES-256-GCM (ArmAes+ArmPmull) | 17B | 55.83 ns | 0.110 ns | 0.098 ns | - |
| Encrypt · AES-256-GCM (Managed) | 17B | 360.85 ns | 0.454 ns | 0.379 ns | - |
| Encrypt · AES-256-GCM (BouncyCastle) | 17B | 591.05 ns | 3.246 ns | 2.711 ns | 1728 B |
| Encrypt · AES-256-GCM (OS) | 17B | 1,755.88 ns | 7.393 ns | 6.915 ns | - |
| Decrypt · AES-256-GCM (ArmAes+ArmPmull) | 65B | 119.10 ns | 0.537 ns | 0.503 ns | - |
| Decrypt · AES-256-GCM (Managed) | 65B | 701.72 ns | 0.245 ns | 0.217 ns | - |
| Decrypt · AES-256-GCM (BouncyCastle) | 65B | 902.60 ns | 1.008 ns | 0.943 ns | 1744 B |
| Decrypt · AES-256-GCM (OS) | 65B | 1,915.24 ns | 11.849 ns | 11.083 ns | - |
| Encrypt · AES-256-GCM (ArmAes+ArmPmull) | 65B | 86.45 ns | 0.318 ns | 0.298 ns | - |
| Encrypt · AES-256-GCM (Managed) | 65B | 663.52 ns | 0.391 ns | 0.305 ns | - |
| Encrypt · AES-256-GCM (BouncyCastle) | 65B | 843.46 ns | 1.208 ns | 1.009 ns | 1728 B |
| Encrypt · AES-256-GCM (OS) | 65B | 1,748.38 ns | 3.972 ns | 3.317 ns | - |
| Decrypt · AES-256-GCM (ArmAes+ArmPmull) | 128B | 154.90 ns | 1.016 ns | 0.950 ns | - |
| Decrypt · AES-256-GCM (Managed) | 128B | 1,001.89 ns | 0.404 ns | 0.378 ns | - |
| Decrypt · AES-256-GCM (BouncyCastle) | 128B | 1,153.02 ns | 0.897 ns | 0.839 ns | 1744 B |
| Decrypt · AES-256-GCM (OS) | 128B | 1,935.36 ns | 11.800 ns | 11.038 ns | - |
| Encrypt · AES-256-GCM (ArmAes+ArmPmull) | 128B | 122.88 ns | 0.824 ns | 0.730 ns | - |
| Encrypt · AES-256-GCM (Managed) | 128B | 965.58 ns | 0.905 ns | 0.707 ns | - |
| Encrypt · AES-256-GCM (BouncyCastle) | 128B | 1,105.18 ns | 1.522 ns | 1.271 ns | 1728 B |
| Encrypt · AES-256-GCM (OS) | 128B | 1,765.75 ns | 6.769 ns | 6.001 ns | - |
| Decrypt · AES-256-GCM (ArmAes+ArmPmull) | 152B | 185.18 ns | 1.329 ns | 1.179 ns | - |
| Decrypt · AES-256-GCM (Managed) | 152B | 1,206.84 ns | 0.975 ns | 0.912 ns | - |
| Decrypt · AES-256-GCM (BouncyCastle) | 152B | 1,308.12 ns | 1.040 ns | 0.973 ns | 1744 B |
| Decrypt · AES-256-GCM (OS) | 152B | 1,940.98 ns | 15.149 ns | 14.171 ns | - |
| Encrypt · AES-256-GCM (ArmAes+ArmPmull) | 152B | 150.99 ns | 0.686 ns | 0.642 ns | - |
| Encrypt · AES-256-GCM (Managed) | 152B | 1,169.38 ns | 0.932 ns | 0.871 ns | - |
| Encrypt · AES-256-GCM (BouncyCastle) | 152B | 1,267.30 ns | 1.184 ns | 1.107 ns | 1728 B |
| Encrypt · AES-256-GCM (OS) | 152B | 1,775.11 ns | 10.951 ns | 9.708 ns | - |
| Decrypt · AES-256-GCM (ArmAes+ArmPmull) | 256B | 251.56 ns | 1.575 ns | 1.473 ns | - |
| Decrypt · AES-256-GCM (BouncyCastle) | 256B | 1,781.33 ns | 0.660 ns | 0.618 ns | 1744 B |
| Decrypt · AES-256-GCM (Managed) | 256B | 1,819.14 ns | 0.465 ns | 0.435 ns | - |
| Decrypt · AES-256-GCM (OS) | 256B | 1,927.66 ns | 10.525 ns | 8.789 ns | - |
| Encrypt · AES-256-GCM (ArmAes+ArmPmull) | 256B | 221.30 ns | 0.766 ns | 0.716 ns | - |
| Encrypt · AES-256-GCM (BouncyCastle) | 256B | 1,767.94 ns | 0.785 ns | 0.735 ns | 1728 B |
| Encrypt · AES-256-GCM (Managed) | 256B | 1,780.59 ns | 0.407 ns | 0.361 ns | - |
| Encrypt · AES-256-GCM (OS) | 256B | 1,786.94 ns | 8.376 ns | 7.835 ns | - |
| Decrypt · AES-256-GCM (ArmAes+ArmPmull) | 1KB | 821.40 ns | 4.078 ns | 3.615 ns | - |
| Decrypt · AES-256-GCM (OS) | 1KB | 2,066.99 ns | 13.375 ns | 12.511 ns | - |
| Decrypt · AES-256-GCM (BouncyCastle) | 1KB | 5,529.15 ns | 1.817 ns | 1.700 ns | 1744 B |
| Decrypt · AES-256-GCM (Managed) | 1KB | 6,580.74 ns | 2.554 ns | 2.264 ns | - |
| Encrypt · AES-256-GCM (ArmAes+ArmPmull) | 1KB | 798.05 ns | 4.140 ns | 3.873 ns | - |
| Encrypt · AES-256-GCM (OS) | 1KB | 1,899.13 ns | 17.089 ns | 15.985 ns | - |
| Encrypt · AES-256-GCM (BouncyCastle) | 1KB | 5,748.85 ns | 1.430 ns | 1.338 ns | 1728 B |
| Encrypt · AES-256-GCM (Managed) | 1KB | 6,458.19 ns | 1.358 ns | 1.270 ns | - |
| Decrypt · AES-256-GCM (OS) | 8KB | 3,085.76 ns | 25.035 ns | 23.417 ns | - |
| Decrypt · AES-256-GCM (ArmAes+ArmPmull) | 8KB | 6,200.93 ns | 5.322 ns | 4.718 ns | - |
| Decrypt · AES-256-GCM (BouncyCastle) | 8KB | 39,942.50 ns | 28.876 ns | 27.010 ns | 1744 B |
| Decrypt · AES-256-GCM (Managed) | 8KB | 50,698.65 ns | 35.145 ns | 29.347 ns | - |
| Encrypt · AES-256-GCM (OS) | 8KB | 2,974.20 ns | 11.999 ns | 11.224 ns | - |
| Encrypt · AES-256-GCM (ArmAes+ArmPmull) | 8KB | 6,201.55 ns | 15.969 ns | 14.156 ns | - |
| Encrypt · AES-256-GCM (BouncyCastle) | 8KB | 42,596.76 ns | 13.218 ns | 12.364 ns | 1728 B |
| Encrypt · AES-256-GCM (Managed) | 8KB | 50,671.48 ns | 26.373 ns | 24.670 ns | - |
| Decrypt · AES-256-GCM (OS) | 128KB | 22,147.61 ns | 96.588 ns | 90.349 ns | - |
| Decrypt · AES-256-GCM (ArmAes+ArmPmull) | 128KB | 98,473.75 ns | 154.333 ns | 120.493 ns | - |
| Decrypt · AES-256-GCM (BouncyCastle) | 128KB | 631,948.75 ns | 279.856 ns | 261.777 ns | 1744 B |
| Decrypt · AES-256-GCM (Managed) | 128KB | 808,117.93 ns | 135.905 ns | 120.477 ns | - |
| Encrypt · AES-256-GCM (OS) | 128KB | 22,816.94 ns | 86.248 ns | 80.676 ns | - |
| Encrypt · AES-256-GCM (ArmAes+ArmPmull) | 128KB | 99,847.51 ns | 1,357.781 ns | 1,270.069 ns | - |
| Encrypt · AES-256-GCM (BouncyCastle) | 128KB | 672,293.47 ns | 362.406 ns | 302.626 ns | 1728 B |
| Encrypt · AES-256-GCM (Managed) | 128KB | 806,702.94 ns | 324.443 ns | 287.610 ns | - |
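
The crossover between the ArmAes+ArmPmull tier and the OS implementation can be recomputed directly from the encrypt rows of the table above. The following Python sketch does only that arithmetic; the dictionary layout and the helper name are illustrative and not part of the benchmark suite.

```python
# Mean encrypt times in ns, copied from the AES-256-GCM table above.
# The layout and names here are illustrative only.
ENCRYPT_NS = {
    "128B":  {"ArmAes+ArmPmull": 122.88,   "OS": 1765.75},
    "1KB":   {"ArmAes+ArmPmull": 798.05,   "OS": 1899.13},
    "8KB":   {"ArmAes+ArmPmull": 6201.55,  "OS": 2974.20},
    "128KB": {"ArmAes+ArmPmull": 99847.51, "OS": 22816.94},
}


def os_over_arm(size: str) -> float:
    """OS time divided by ArmAes+ArmPmull time (>1 means ArmAes+ArmPmull is faster)."""
    row = ENCRYPT_NS[size]
    return row["OS"] / row["ArmAes+ArmPmull"]


ratios = {size: round(os_over_arm(size), 2) for size in ENCRYPT_NS}
for size, ratio in ratios.items():
    winner = "ArmAes+ArmPmull" if ratio > 1 else "OS"
    print(f"{size:>6}: OS/ArmAes+ArmPmull = {ratio} ({winner} faster)")
```

A ratio above 1 means the ArmAes+ArmPmull tier wins at that size; the ratio drops below 1 between the 1 KiB and 8 KiB rows, which is where the OS implementation overtakes it on this machine.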
AES-128-CCM
AES-CCM (Counter with CBC-MAC) combines CTR mode encryption with CBC-MAC authentication. Unlike GCM, CCM requires two sequential passes (encrypt + MAC or MAC + decrypt), making it inherently less parallelizable. It is widely used in IoT protocols (Bluetooth LE, ZigBee, Thread) and supports variable nonce (7–13 bytes) and tag sizes (4–16 bytes). Two acceleration tiers are available:
- ArmAes: ARM Cryptography Extension AESD/AESE instructions for all block operations: counter-mode encryption, CBC-MAC computation, and AAD processing. Uses Vector128<byte> round keys via MemoryMarshal.Cast from the shared uint[] key schedule. Dispatched via the _useAesNi bool flag (shared with x86 dispatch; indicates hardware AES availability on any ISA).
- Managed: T-table AES for all block operations. Fully portable, zero-allocation.
Key observations:
- ArmAes: ~4× faster than Managed at 128 KiB; ~4.3× faster than BouncyCastle; zero allocation
- Managed: T-table AES; comparable to BouncyCastle at large sizes
- BouncyCastle: Allocates ~2.4–2.5 KB per call
- No OS adapter available for comparison (System.Security.Cryptography does not expose AES-CCM on all platforms)
| Description | TestDataSize | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|---|
| Decrypt · AES-128-CCM (ArmAes) | 128B | 273.4 ns | 1.13 ns | 1.05 ns | - |
| Decrypt · AES-128-CCM (Managed) | 128B | 957.4 ns | 0.69 ns | 0.65 ns | - |
| Decrypt · AES-128-CCM (BouncyCastle) | 128B | 1,427.7 ns | 2.83 ns | 2.65 ns | 2424 B |
| Encrypt · AES-128-CCM (ArmAes) | 128B | 238.9 ns | 1.22 ns | 1.14 ns | - |
| Encrypt · AES-128-CCM (Managed) | 128B | 912.3 ns | 0.48 ns | 0.40 ns | - |
| Encrypt · AES-128-CCM (BouncyCastle) | 128B | 1,384.6 ns | 3.04 ns | 2.85 ns | 2464 B |
| Decrypt · AES-128-CCM (ArmAes) | 1KB | 1,538.2 ns | 3.70 ns | 3.46 ns | - |
| Decrypt · AES-128-CCM (Managed) | 1KB | 5,995.5 ns | 2.76 ns | 2.58 ns | - |
| Decrypt · AES-128-CCM (BouncyCastle) | 1KB | 6,843.3 ns | 18.32 ns | 17.14 ns | 2424 B |
| Encrypt · AES-128-CCM (ArmAes) | 1KB | 1,502.6 ns | 3.89 ns | 3.64 ns | - |
| Encrypt · AES-128-CCM (Managed) | 1KB | 5,953.0 ns | 1.37 ns | 1.28 ns | - |
| Encrypt · AES-128-CCM (BouncyCastle) | 1KB | 6,744.4 ns | 11.18 ns | 10.46 ns | 2464 B |
| Decrypt · AES-128-CCM (ArmAes) | 8KB | 11,687.0 ns | 36.11 ns | 33.78 ns | - |
| Decrypt · AES-128-CCM (Managed) | 8KB | 46,739.8 ns | 396.13 ns | 370.54 ns | - |
| Decrypt · AES-128-CCM (BouncyCastle) | 8KB | 49,745.9 ns | 87.41 ns | 81.76 ns | 2424 B |
| Encrypt · AES-128-CCM (ArmAes) | 8KB | 11,540.1 ns | 33.28 ns | 31.13 ns | - |
| Encrypt · AES-128-CCM (Managed) | 8KB | 46,157.0 ns | 39.38 ns | 32.89 ns | - |
| Encrypt · AES-128-CCM (BouncyCastle) | 8KB | 49,590.7 ns | 112.17 ns | 99.44 ns | 2464 B |
| Decrypt · AES-128-CCM (ArmAes) | 128KB | 184,210.5 ns | 452.86 ns | 423.60 ns | - |
| Decrypt · AES-128-CCM (Managed) | 128KB | 736,430.0 ns | 459.91 ns | 430.20 ns | - |
| Decrypt · AES-128-CCM (BouncyCastle) | 128KB | 792,925.7 ns | 809.34 ns | 717.46 ns | 2424 B |
| Encrypt · AES-128-CCM (ArmAes) | 128KB | 183,919.0 ns | 630.04 ns | 589.34 ns | - |
| Encrypt · AES-128-CCM (Managed) | 128KB | 736,115.3 ns | 191.11 ns | 169.41 ns | - |
| Encrypt · AES-128-CCM (BouncyCastle) | 128KB | 801,690.1 ns | 591.05 ns | 493.55 ns | 2464 B |
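
The 7–13 byte nonce range mentioned for CCM follows from the block format defined in NIST SP 800-38C and RFC 3610: the nonce and the message-length field share 15 bytes of the first block, so a longer nonce leaves fewer bytes to encode the payload length. The helper below is a minimal sketch of that relationship; its name is hypothetical and it is not a CryptoHives API.

```python
def ccm_max_payload_bytes(nonce_len: int) -> int:
    """Largest payload length that CCM can encode for a given nonce length.

    CCM packs the nonce (N bytes) and the length field (L bytes) into
    15 bytes, so L = 15 - N and the payload must be shorter than 2**(8 * L).
    """
    if not 7 <= nonce_len <= 13:
        raise ValueError("CCM nonce length must be between 7 and 13 bytes")
    length_field_bytes = 15 - nonce_len
    return 2 ** (8 * length_field_bytes) - 1


for n in (7, 12, 13):
    print(f"nonce {n:>2} bytes -> max payload {ccm_max_payload_bytes(n)} bytes")
```

With a 13-byte nonce the payload is capped at 65,535 bytes, so a 128 KiB payload such as the one in the last rows of the table requires a nonce of 12 bytes or fewer.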
AES-256-CCM
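AES-256-CCM uses 14 rounds (vs 10 for AES-128). The same ArmAes / Managed dispatch applies. On the Apple M4 the additional rounds add ~10–12% on the ArmAes path (e.g. 204.2 μs vs 183.9 μs for 128 KiB encrypt) and ~31–33% on the Managed path, where every round is computed in software.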
| Description | TestDataSize | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|---|
| Decrypt · AES-256-CCM (ArmAes) | 128B | 299.9 ns | 1.13 ns | 1.06 ns | - |
| Decrypt · AES-256-CCM (Managed) | 128B | 1,252.1 ns | 1.36 ns | 1.27 ns | - |
| Decrypt · AES-256-CCM (BouncyCastle) | 128B | 1,785.1 ns | 3.98 ns | 3.73 ns | 2808 B |
| Encrypt · AES-256-CCM (ArmAes) | 128B | 265.4 ns | 1.55 ns | 1.45 ns | - |
| Encrypt · AES-256-CCM (Managed) | 128B | 1,208.7 ns | 0.83 ns | 0.65 ns | - |
| Encrypt · AES-256-CCM (BouncyCastle) | 128B | 1,743.7 ns | 4.22 ns | 3.94 ns | 2848 B |
| Decrypt · AES-256-CCM (ArmAes) | 1KB | 1,707.3 ns | 5.60 ns | 5.24 ns | - |
| Decrypt · AES-256-CCM (Managed) | 1KB | 7,946.1 ns | 3.66 ns | 3.42 ns | - |
| Decrypt · AES-256-CCM (BouncyCastle) | 1KB | 8,898.1 ns | 2.81 ns | 2.49 ns | 2808 B |
| Encrypt · AES-256-CCM (ArmAes) | 1KB | 1,670.1 ns | 6.02 ns | 5.63 ns | - |
| Encrypt · AES-256-CCM (Managed) | 1KB | 7,898.7 ns | 2.85 ns | 2.53 ns | - |
| Encrypt · AES-256-CCM (BouncyCastle) | 1KB | 8,859.8 ns | 1.49 ns | 1.32 ns | 2848 B |
| Decrypt · AES-256-CCM (ArmAes) | 8KB | 12,868.8 ns | 32.20 ns | 30.12 ns | - |
| Decrypt · AES-256-CCM (Managed) | 8KB | 61,446.4 ns | 25.35 ns | 23.72 ns | - |
| Decrypt · AES-256-CCM (BouncyCastle) | 8KB | 65,644.3 ns | 34.18 ns | 30.30 ns | 2808 B |
| Encrypt · AES-256-CCM (ArmAes) | 8KB | 12,807.3 ns | 44.59 ns | 41.71 ns | - |
| Encrypt · AES-256-CCM (Managed) | 8KB | 61,295.6 ns | 22.59 ns | 20.03 ns | - |
| Encrypt · AES-256-CCM (BouncyCastle) | 8KB | 65,412.6 ns | 33.97 ns | 31.78 ns | 2848 B |
| Decrypt · AES-256-CCM (ArmAes) | 128KB | 205,913.9 ns | 612.81 ns | 543.24 ns | - |
| Decrypt · AES-256-CCM (Managed) | 128KB | 979,175.3 ns | 592.66 ns | 554.37 ns | - |
| Decrypt · AES-256-CCM (BouncyCastle) | 128KB | 1,040,518.4 ns | 673.52 ns | 630.01 ns | 2808 B |
| Encrypt · AES-256-CCM (ArmAes) | 128KB | 204,195.7 ns | 643.92 ns | 602.32 ns | - |
| Encrypt · AES-256-CCM (Managed) | 128KB | 977,042.6 ns | 740.09 ns | 656.07 ns | - |
| Encrypt · AES-256-CCM (BouncyCastle) | 128KB | 1,038,874.9 ns | 506.46 ns | 473.74 ns | 2848 B |
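
The cost of the four extra AES-256 rounds differs by tier and can be recomputed from the 128 KiB encrypt rows of the AES-128-CCM and AES-256-CCM tables. The sketch below is plain arithmetic on those published means; the variable and function names are illustrative.

```python
# Mean 128 KiB encrypt times in ns, copied from the two CCM tables.
AES128_CCM_NS = {"ArmAes": 183919.0, "Managed": 736115.3, "BouncyCastle": 801690.1}
AES256_CCM_NS = {"ArmAes": 204195.7, "Managed": 977042.6, "BouncyCastle": 1038874.9}


def overhead_percent(tier: str) -> float:
    """Extra time of AES-256-CCM relative to AES-128-CCM, in percent."""
    return (AES256_CCM_NS[tier] / AES128_CCM_NS[tier] - 1.0) * 100.0


overheads = {tier: round(overhead_percent(tier), 1) for tier in AES128_CCM_NS}
for tier, pct in overheads.items():
    print(f"{tier:>12}: +{pct}% for AES-256 vs AES-128")
```

The hardware tier absorbs most of the extra rounds, while the two software implementations pay close to the 14/10 round ratio minus the fixed per-call work.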
ChaCha20-Poly1305
ChaCha20-Poly1305 is a software-friendly AEAD cipher (RFC 8439) that combines ChaCha20 stream encryption with Poly1305 MAC authentication. It is the recommended AEAD cipher when hardware AES acceleration is unavailable. Two acceleration tiers are available on ARM:
- Neon: Single-block ChaCha20 via Vector128<uint> combined with Poly1305 donna-64 MAC (3×44-bit limbs, 9 multiplications per 16-byte block using Math.BigMul). ~3× faster than OS at 128 B; competitive with OS at 1 KiB; OS leads at ≥8 KiB.
- Managed: Scalar ChaCha20 + Poly1305 donna-32 (5×26-bit limbs, 25 multiplications per block on .NET Framework / .NET Standard). Fully portable.
Key observations:
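- Encrypt, Neon vs OS: Neon is ~3× faster at 128 B and ~1.25× faster at 1 KiB; OS is ~1.45× faster at 8 KiB and ~1.67× faster at 128 KiB
- BouncyCastle is faster than Neon at 128 B (493.7 ns vs 655.5 ns encrypt, 684.7 ns vs 725.1 ns decrypt), most likely because fixed Neon setup overhead dominates at that granularity; from 1 KiB upward the two are within ~10% of each other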
- Managed and Neon paths are zero-allocation
- BouncyCastle allocates 336–416 B per call; NaCl.Core allocates 48–72 B per call
| Description | TestDataSize | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|---|
| Decrypt · ChaCha20-Poly1305 (BouncyCastle) | 128B | 684.7 ns | 0.99 ns | 0.93 ns | 416 B |
| Decrypt · ChaCha20-Poly1305 (Neon) | 128B | 725.1 ns | 3.68 ns | 3.44 ns | - |
| Decrypt · ChaCha20-Poly1305 (NaCl.Core) | 128B | 822.1 ns | 0.42 ns | 0.33 ns | 48 B |
| Decrypt · ChaCha20-Poly1305 (Managed) | 128B | 1,314.5 ns | 9.14 ns | 8.55 ns | - |
| Decrypt · ChaCha20-Poly1305 (OS) | 128B | 2,262.0 ns | 21.27 ns | 18.85 ns | - |
| Encrypt · ChaCha20-Poly1305 (BouncyCastle) | 128B | 493.7 ns | 1.23 ns | 1.15 ns | 336 B |
| Encrypt · ChaCha20-Poly1305 (Neon) | 128B | 655.5 ns | 2.59 ns | 2.29 ns | - |
| Encrypt · ChaCha20-Poly1305 (NaCl.Core) | 128B | 790.9 ns | 0.15 ns | 0.14 ns | 48 B |
| Encrypt · ChaCha20-Poly1305 (Managed) | 128B | 1,207.0 ns | 18.02 ns | 16.85 ns | - |
| Encrypt · ChaCha20-Poly1305 (OS) | 128B | 1,933.5 ns | 21.88 ns | 20.47 ns | - |
| Decrypt · ChaCha20-Poly1305 (Neon) | 1KB | 2,333.8 ns | 0.98 ns | 0.82 ns | - |
| Decrypt · ChaCha20-Poly1305 (BouncyCastle) | 1KB | 2,395.5 ns | 4.30 ns | 4.02 ns | 416 B |
| Decrypt · ChaCha20-Poly1305 (OS) | 1KB | 3,155.5 ns | 18.22 ns | 17.04 ns | - |
| Decrypt · ChaCha20-Poly1305 (NaCl.Core) | 1KB | 3,668.8 ns | 1.21 ns | 1.13 ns | 72 B |
| Decrypt · ChaCha20-Poly1305 (Managed) | 1KB | 6,721.7 ns | 20.85 ns | 19.50 ns | - |
| Encrypt · ChaCha20-Poly1305 (BouncyCastle) | 1KB | 2,195.2 ns | 4.44 ns | 3.70 ns | 336 B |
| Encrypt · ChaCha20-Poly1305 (Neon) | 1KB | 2,278.3 ns | 0.86 ns | 0.80 ns | - |
| Encrypt · ChaCha20-Poly1305 (OS) | 1KB | 2,848.6 ns | 21.18 ns | 19.81 ns | - |
| Encrypt · ChaCha20-Poly1305 (NaCl.Core) | 1KB | 3,625.7 ns | 0.83 ns | 0.77 ns | 72 B |
| Encrypt · ChaCha20-Poly1305 (Managed) | 1KB | 6,658.5 ns | 22.85 ns | 21.37 ns | - |
| Decrypt · ChaCha20-Poly1305 (OS) | 8KB | 10,635.2 ns | 42.47 ns | 39.73 ns | - |
| Decrypt · ChaCha20-Poly1305 (Neon) | 8KB | 14,726.0 ns | 9.83 ns | 9.19 ns | - |
| Decrypt · ChaCha20-Poly1305 (BouncyCastle) | 8KB | 15,766.2 ns | 42.70 ns | 39.94 ns | 416 B |
| Decrypt · ChaCha20-Poly1305 (NaCl.Core) | 8KB | 26,270.6 ns | 9.37 ns | 8.76 ns | 72 B |
| Decrypt · ChaCha20-Poly1305 (Managed) | 8KB | 48,278.6 ns | 137.24 ns | 128.37 ns | - |
| Encrypt · ChaCha20-Poly1305 (OS) | 8KB | 10,135.5 ns | 49.14 ns | 45.96 ns | - |
| Encrypt · ChaCha20-Poly1305 (Neon) | 8KB | 14,754.8 ns | 4.52 ns | 3.77 ns | - |
| Encrypt · ChaCha20-Poly1305 (BouncyCastle) | 8KB | 15,683.5 ns | 40.95 ns | 38.30 ns | 336 B |
| Encrypt · ChaCha20-Poly1305 (NaCl.Core) | 8KB | 26,311.4 ns | 8.07 ns | 7.15 ns | 72 B |
| Encrypt · ChaCha20-Poly1305 (Managed) | 8KB | 47,890.3 ns | 173.96 ns | 162.72 ns | - |
| Decrypt · ChaCha20-Poly1305 (OS) | 128KB | 147,445.9 ns | 855.86 ns | 800.57 ns | - |
| Decrypt · ChaCha20-Poly1305 (Neon) | 128KB | 228,291.2 ns | 170.33 ns | 159.33 ns | - |
| Decrypt · ChaCha20-Poly1305 (BouncyCastle) | 128KB | 247,358.2 ns | 713.67 ns | 667.57 ns | 416 B |
| Decrypt · ChaCha20-Poly1305 (NaCl.Core) | 128KB | 414,167.8 ns | 183.58 ns | 171.72 ns | 72 B |
| Decrypt · ChaCha20-Poly1305 (Managed) | 128KB | 761,690.2 ns | 3,038.17 ns | 2,841.91 ns | - |
| Encrypt · ChaCha20-Poly1305 (OS) | 128KB | 136,972.2 ns | 815.14 ns | 762.49 ns | - |
| Encrypt · ChaCha20-Poly1305 (Neon) | 128KB | 228,956.6 ns | 57.95 ns | 54.20 ns | - |
| Encrypt · ChaCha20-Poly1305 (BouncyCastle) | 128KB | 249,403.3 ns | 567.57 ns | 530.90 ns | 336 B |
| Encrypt · ChaCha20-Poly1305 (NaCl.Core) | 128KB | 414,330.5 ns | 165.82 ns | 147.00 ns | 72 B |
| Encrypt · ChaCha20-Poly1305 (Managed) | 128KB | 760,649.4 ns | 3,111.39 ns | 2,910.40 ns | - |
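
The multiplication counts quoted for the two Poly1305 variants (9 for 3×44-bit limbs, 25 for 5×26-bit limbs) are simply the number of limb pairs in a schoolbook product. The Python sketch below illustrates this with arbitrary 130-bit operands. It is not the donna code itself: real implementations also fold the reduction modulo 2^130 − 5 into the products, which is omitted here, and all function names are illustrative.

```python
def split_limbs(x: int, limb_bits: int, count: int) -> list[int]:
    """Split a non-negative integer into `count` little-endian limbs."""
    mask = (1 << limb_bits) - 1
    return [(x >> (i * limb_bits)) & mask for i in range(count)]


def schoolbook_mul(x: int, y: int, limb_bits: int, count: int) -> tuple[int, int]:
    """Multiply via limbs; return (product, number of limb multiplications)."""
    xs = split_limbs(x, limb_bits, count)
    ys = split_limbs(y, limb_bits, count)
    product = 0
    multiplications = 0
    for i, xi in enumerate(xs):
        for j, yj in enumerate(ys):
            # Each limb pair contributes one partial product at its bit offset.
            product += (xi * yj) << ((i + j) * limb_bits)
            multiplications += 1
    return product, multiplications


# Arbitrary operands below 2**130, standing in for the accumulator and key r.
a = (1 << 129) + 0x1234_5678_9ABC_DEF0
r = (1 << 124) - 3

for limb_bits, count in ((44, 3), (26, 5)):
    product, muls = schoolbook_mul(a, r, limb_bits, count)
    assert product == a * r  # limb arithmetic matches big-integer arithmetic
    print(f"{count} x {limb_bits}-bit limbs: {muls} multiplications")
```

Fewer, wider limbs need fewer multiplications per block, but each one is a 64×64→128-bit product, which is why the 3-limb form depends on a primitive such as Math.BigMul.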
XChaCha20-Poly1305
XChaCha20-Poly1305 extends ChaCha20-Poly1305 with a 24-byte nonce (vs 12 bytes), making random nonce generation safe against collisions: the birthday bound for a 192-bit nonce is ~2⁹⁶ messages, versus ~2⁴⁸ for the 96-bit nonce of ChaCha20-Poly1305. The implementation prepends an HChaCha20 key derivation step that derives a subkey from the key and the first 16 bytes of the nonce; the remaining 8 nonce bytes are passed to the inner cipher. The same Neon / Managed acceleration tiers apply to the inner ChaCha20-Poly1305 operation.
Key observations:
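- For the Neon and Managed tiers, HChaCha20 adds a roughly constant ~400 ns over ChaCha20-Poly1305: negligible at 8 KiB and above, but a sizable share of the total at 128 B (Neon encrypt: 1.096 μs vs 655.5 ns)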
- Neon ~3.3× faster than Managed at 128 KiB; ~3.3× faster than NaCl.Core at 128 KiB
- No OS or BouncyCastle implementations available for comparison
- NaCl.Core allocates 48–72 B per call
- Managed and Neon paths are zero-allocation
| Description | TestDataSize | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|---|
| Decrypt · XChaCha20-Poly1305 (Neon) | 128B | 1.167 μs | 0.0064 μs | 0.0060 μs | - |
| Decrypt · XChaCha20-Poly1305 (NaCl.Core) | 128B | 1.480 μs | 0.0002 μs | 0.0002 μs | 48 B |
| Decrypt · XChaCha20-Poly1305 (Managed) | 128B | 1.722 μs | 0.0056 μs | 0.0052 μs | - |
| Encrypt · XChaCha20-Poly1305 (Neon) | 128B | 1.096 μs | 0.0075 μs | 0.0070 μs | - |
| Encrypt · XChaCha20-Poly1305 (NaCl.Core) | 128B | 1.448 μs | 0.0003 μs | 0.0002 μs | 48 B |
| Encrypt · XChaCha20-Poly1305 (Managed) | 128B | 1.621 μs | 0.0049 μs | 0.0045 μs | - |
| Decrypt · XChaCha20-Poly1305 (Neon) | 1KB | 2.699 μs | 0.0012 μs | 0.0011 μs | - |
| Decrypt · XChaCha20-Poly1305 (NaCl.Core) | 1KB | 6.635 μs | 0.0023 μs | 0.0021 μs | 72 B |
| Decrypt · XChaCha20-Poly1305 (Managed) | 1KB | 7.086 μs | 0.0254 μs | 0.0238 μs | - |
| Encrypt · XChaCha20-Poly1305 (Neon) | 1KB | 2.671 μs | 0.0012 μs | 0.0012 μs | - |
| Encrypt · XChaCha20-Poly1305 (NaCl.Core) | 1KB | 6.597 μs | 0.0015 μs | 0.0013 μs | 72 B |
| Encrypt · XChaCha20-Poly1305 (Managed) | 1KB | 7.048 μs | 0.0183 μs | 0.0171 μs | - |
| Decrypt · XChaCha20-Poly1305 (Neon) | 8KB | 15.129 μs | 0.0049 μs | 0.0041 μs | - |
| Decrypt · XChaCha20-Poly1305 (NaCl.Core) | 8KB | 47.608 μs | 0.0093 μs | 0.0087 μs | 72 B |
| Decrypt · XChaCha20-Poly1305 (Managed) | 8KB | 48.467 μs | 0.2036 μs | 0.1905 μs | - |
| Encrypt · XChaCha20-Poly1305 (Neon) | 8KB | 15.204 μs | 0.0069 μs | 0.0065 μs | - |
| Encrypt · XChaCha20-Poly1305 (NaCl.Core) | 8KB | 47.521 μs | 0.0060 μs | 0.0050 μs | 72 B |
| Encrypt · XChaCha20-Poly1305 (Managed) | 8KB | 48.376 μs | 0.2060 μs | 0.1927 μs | - |
| Decrypt · XChaCha20-Poly1305 (Neon) | 128KB | 228.965 μs | 0.1020 μs | 0.0954 μs | - |
| Decrypt · XChaCha20-Poly1305 (NaCl.Core) | 128KB | 751.208 μs | 0.1381 μs | 0.1292 μs | 72 B |
| Decrypt · XChaCha20-Poly1305 (Managed) | 128KB | 757.817 μs | 3.0047 μs | 2.8106 μs | - |
| Encrypt · XChaCha20-Poly1305 (Neon) | 128KB | 229.493 μs | 0.0862 μs | 0.0806 μs | - |
| Encrypt · XChaCha20-Poly1305 (NaCl.Core) | 128KB | 750.458 μs | 0.1387 μs | 0.1298 μs | 72 B |
| Encrypt · XChaCha20-Poly1305 (Managed) | 128KB | 756.772 μs | 2.3731 μs | 2.2198 μs | - |
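
The HChaCha20 step that accounts for the fixed overhead in these rows can be sketched in a few lines of Python. This follows the construction described in RFC 8439 (quarter round) and the XChaCha IETF draft (subkey derivation); it is an illustration of the technique, not the CryptoHives implementation, and the function names are hypothetical.

```python
import struct

MASK32 = 0xFFFFFFFF


def _rotl32(v: int, n: int) -> int:
    return ((v << n) & MASK32) | (v >> (32 - n))


def quarter_round(s: list[int], a: int, b: int, c: int, d: int) -> None:
    """ChaCha quarter round (RFC 8439, section 2.1), applied in place."""
    s[a] = (s[a] + s[b]) & MASK32
    s[d] = _rotl32(s[d] ^ s[a], 16)
    s[c] = (s[c] + s[d]) & MASK32
    s[b] = _rotl32(s[b] ^ s[c], 12)
    s[a] = (s[a] + s[b]) & MASK32
    s[d] = _rotl32(s[d] ^ s[a], 8)
    s[c] = (s[c] + s[d]) & MASK32
    s[b] = _rotl32(s[b] ^ s[c], 7)


def hchacha20(key: bytes, nonce16: bytes) -> bytes:
    """Derive a 32-byte subkey from a 32-byte key and 16 nonce bytes."""
    if len(key) != 32 or len(nonce16) != 16:
        raise ValueError("expected a 32-byte key and a 16-byte nonce prefix")
    state = [0x61707865, 0x3320646E, 0x79622D32, 0x6B206574]  # "expand 32-byte k"
    state += struct.unpack("<8I", key)
    state += struct.unpack("<4I", nonce16)
    for _ in range(10):  # 10 double rounds = 20 rounds
        quarter_round(state, 0, 4, 8, 12)   # column rounds
        quarter_round(state, 1, 5, 9, 13)
        quarter_round(state, 2, 6, 10, 14)
        quarter_round(state, 3, 7, 11, 15)
        quarter_round(state, 0, 5, 10, 15)  # diagonal rounds
        quarter_round(state, 1, 6, 11, 12)
        quarter_round(state, 2, 7, 8, 13)
        quarter_round(state, 3, 4, 9, 14)
    # Unlike the ChaCha20 block function, the initial state is not added back;
    # the subkey is the first and last rows of the permuted state.
    return struct.pack("<8I", *(state[0:4] + state[12:16]))


def xchacha20_inputs(key: bytes, nonce24: bytes) -> tuple[bytes, bytes]:
    """Map a 24-byte XChaCha20 nonce to (subkey, 12-byte inner nonce)."""
    if len(nonce24) != 24:
        raise ValueError("expected a 24-byte nonce")
    subkey = hchacha20(key, nonce24[:16])
    inner_nonce = b"\x00" * 4 + nonce24[16:]
    return subkey, inner_nonce


subkey, inner_nonce = xchacha20_inputs(bytes(range(32)), bytes(24))
print(len(subkey), len(inner_nonce))
```

Because the derivation runs one ChaCha permutation per message regardless of payload size, its cost shows up as a constant offset rather than a change in per-byte throughput.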
Regional Block Ciphers
Regional block ciphers implement national cryptographic standards. All operate on 128-bit blocks in CBC mode. Benchmarks compare Managed implementations against BouncyCastle where available.
SM4-CBC (China)
SM4 is the Chinese national block cipher (GB/T 32907-2016). It uses a 128-bit key with 32 rounds of nonlinear key mixing.
- Managed: Lookup-table implementation with 32-bit word operations. Zero allocation.
| Description | TestDataSize | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|---|
| Decrypt · SM4-CBC (Managed) | 128B | 1.329 μs | 0.0058 μs | 0.0054 μs | - |
| Decrypt · SM4-CBC (BouncyCastle) | 128B | 1.418 μs | 0.0106 μs | 0.0099 μs | 40 B |
| Encrypt · SM4-CBC (Managed) | 128B | 1.447 μs | 0.0049 μs | 0.0046 μs | - |
| Encrypt · SM4-CBC (BouncyCastle) | 128B | 1.486 μs | 0.0035 μs | 0.0031 μs | 40 B |
| Decrypt · SM4-CBC (BouncyCastle) | 1KB | 8.802 μs | 0.0361 μs | 0.0338 μs | 40 B |
| Decrypt · SM4-CBC (Managed) | 1KB | 9.392 μs | 0.0300 μs | 0.0280 μs | - |
| Encrypt · SM4-CBC (BouncyCastle) | 1KB | 9.618 μs | 0.0402 μs | 0.0356 μs | 40 B |
| Encrypt · SM4-CBC (Managed) | 1KB | 10.431 μs | 0.0371 μs | 0.0329 μs | - |
| Decrypt · SM4-CBC (BouncyCastle) | 8KB | 67.695 μs | 0.2983 μs | 0.2790 μs | 40 B |
| Decrypt · SM4-CBC (Managed) | 8KB | 73.865 μs | 0.3488 μs | 0.3262 μs | - |
| Encrypt · SM4-CBC (BouncyCastle) | 8KB | 74.971 μs | 0.2233 μs | 0.2089 μs | 40 B |
| Encrypt · SM4-CBC (Managed) | 8KB | 82.340 μs | 0.4462 μs | 0.4173 μs | - |
| Decrypt · SM4-CBC (BouncyCastle) | 128KB | 1,078.473 μs | 6.0526 μs | 5.3655 μs | 40 B |
| Decrypt · SM4-CBC (Managed) | 128KB | 1,179.205 μs | 5.2554 μs | 4.9159 μs | - |
| Encrypt · SM4-CBC (BouncyCastle) | 128KB | 1,195.655 μs | 3.5399 μs | 3.1381 μs | 40 B |
| Encrypt · SM4-CBC (Managed) | 128KB | 1,317.563 μs | 7.5847 μs | 7.0948 μs | - |
ARIA-128-CBC (Korea)
ARIA is a Korean national cipher (KS X 1213) with an involutional SPN structure. ARIA-128 uses 12 rounds.
- Managed: S-box substitution with byte-level diffusion layer. Zero allocation.
| Description | TestDataSize | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|---|
| Decrypt · ARIA-128-CBC (Managed) | 128B | 2.221 μs | 0.0083 μs | 0.0073 μs | - |
| Decrypt · ARIA-128-CBC (BouncyCastle) | 128B | 2.339 μs | 0.0087 μs | 0.0073 μs | 1288 B |
| Encrypt · ARIA-128-CBC (Managed) | 128B | 2.197 μs | 0.0079 μs | 0.0070 μs | - |
| Encrypt · ARIA-128-CBC (BouncyCastle) | 128B | 2.228 μs | 0.0071 μs | 0.0059 μs | 1288 B |
| Decrypt · ARIA-128-CBC (BouncyCastle) | 1KB | 14.343 μs | 0.0478 μs | 0.0424 μs | 3528 B |
| Decrypt · ARIA-128-CBC (Managed) | 1KB | 15.985 μs | 0.0751 μs | 0.0703 μs | - |
| Encrypt · ARIA-128-CBC (BouncyCastle) | 1KB | 14.107 μs | 0.2760 μs | 0.2711 μs | 3528 B |
| Encrypt · ARIA-128-CBC (Managed) | 1KB | 15.848 μs | 0.0734 μs | 0.0613 μs | - |
| Decrypt · ARIA-128-CBC (BouncyCastle) | 8KB | 109.475 μs | 0.3934 μs | 0.3487 μs | 21448 B |
| Decrypt · ARIA-128-CBC (Managed) | 8KB | 126.115 μs | 0.3413 μs | 0.2850 μs | - |
| Encrypt · ARIA-128-CBC (BouncyCastle) | 8KB | 106.352 μs | 0.2397 μs | 0.2002 μs | 21448 B |
| Encrypt · ARIA-128-CBC (Managed) | 8KB | 125.575 μs | 0.4582 μs | 0.4062 μs | - |
| Decrypt · ARIA-128-CBC (BouncyCastle) | 128KB | 1,719.477 μs | 5.5217 μs | 4.8948 μs | 328648 B |
| Decrypt · ARIA-128-CBC (Managed) | 128KB | 2,023.756 μs | 9.2452 μs | 8.1956 μs | - |
| Encrypt · ARIA-128-CBC (BouncyCastle) | 128KB | 1,696.951 μs | 5.7161 μs | 5.0672 μs | 328648 B |
| Encrypt · ARIA-128-CBC (Managed) | 128KB | 2,021.430 μs | 8.8185 μs | 7.8173 μs | - |
ARIA-256-CBC (Korea)
ARIA-256 uses 16 rounds for 256-bit key security. The same SPN structure applies with additional rounds.
| Description | TestDataSize | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|---|
| Decrypt · ARIA-256-CBC (Managed) | 128B | 2.969 μs | 0.0142 μs | 0.0126 μs | - |
| Decrypt · ARIA-256-CBC (BouncyCastle) | 128B | 2.991 μs | 0.0063 μs | 0.0053 μs | 1496 B |
| Encrypt · ARIA-256-CBC (BouncyCastle) | 128B | 2.907 μs | 0.0053 μs | 0.0047 μs | 1496 B |
| Encrypt · ARIA-256-CBC (Managed) | 128B | 2.974 μs | 0.0052 μs | 0.0046 μs | - |
| Decrypt · ARIA-256-CBC (BouncyCastle) | 1KB | 18.676 μs | 0.0366 μs | 0.0306 μs | 3736 B |
| Decrypt · ARIA-256-CBC (Managed) | 1KB | 21.258 μs | 0.0321 μs | 0.0284 μs | - |
| Encrypt · ARIA-256-CBC (BouncyCastle) | 1KB | 18.359 μs | 0.0685 μs | 0.0572 μs | 3736 B |
| Encrypt · ARIA-256-CBC (Managed) | 1KB | 21.345 μs | 0.0424 μs | 0.0397 μs | - |
| Decrypt · ARIA-256-CBC (BouncyCastle) | 8KB | 139.550 μs | 0.2432 μs | 0.1899 μs | 21656 B |
| Decrypt · ARIA-256-CBC (Managed) | 8KB | 168.287 μs | 0.6729 μs | 0.5965 μs | - |
| Encrypt · ARIA-256-CBC (BouncyCastle) | 8KB | 140.725 μs | 0.3938 μs | 0.3491 μs | 21656 B |
| Encrypt · ARIA-256-CBC (Managed) | 8KB | 168.559 μs | 0.2917 μs | 0.2586 μs | - |
| Decrypt · ARIA-256-CBC (BouncyCastle) | 128KB | 2,275.573 μs | 6.0435 μs | 5.3574 μs | 328856 B |
| Decrypt · ARIA-256-CBC (Managed) | 128KB | 2,691.327 μs | 9.0075 μs | 8.4256 μs | - |
| Encrypt · ARIA-256-CBC (BouncyCastle) | 128KB | 2,247.459 μs | 6.1531 μs | 5.7556 μs | 328856 B |
| Encrypt · ARIA-256-CBC (Managed) | 128KB | 2,704.634 μs | 8.3292 μs | 6.9553 μs | - |
Camellia-128-CBC (Japan)
Camellia is a Japanese CRYPTREC/NESSIE cipher (RFC 3713) with a Feistel structure and FL/FL⁻¹ key-dependent layers.
- Managed: Pre-computed SP-box tables with 6 S-boxes. Zero allocation.
| Description | TestDataSize | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|---|
| Decrypt · Camellia-128-CBC (BouncyCastle) | 128B | 930.9 ns | 1.39 ns | 1.23 ns | 576 B |
| Decrypt · Camellia-128-CBC (Managed) | 128B | 1,442.9 ns | 5.45 ns | 5.10 ns | - |
| Encrypt · Camellia-128-CBC (BouncyCastle) | 128B | 906.0 ns | 0.89 ns | 0.74 ns | 576 B |
| Encrypt · Camellia-128-CBC (Managed) | 128B | 1,541.0 ns | 18.37 ns | 17.18 ns | - |
| Decrypt · Camellia-128-CBC (BouncyCastle) | 1KB | 5,819.7 ns | 12.68 ns | 11.86 ns | 2816 B |
| Decrypt · Camellia-128-CBC (Managed) | 1KB | 10,318.2 ns | 59.01 ns | 55.20 ns | - |
| Encrypt · Camellia-128-CBC (BouncyCastle) | 1KB | 5,935.5 ns | 12.76 ns | 11.93 ns | 2816 B |
| Encrypt · Camellia-128-CBC (Managed) | 1KB | 10,840.3 ns | 21.60 ns | 20.20 ns | - |
| Decrypt · Camellia-128-CBC (BouncyCastle) | 8KB | 45,077.4 ns | 123.42 ns | 115.45 ns | 20736 B |
| Decrypt · Camellia-128-CBC (Managed) | 8KB | 81,270.2 ns | 575.14 ns | 537.99 ns | - |
| Encrypt · Camellia-128-CBC (BouncyCastle) | 8KB | 45,478.1 ns | 138.37 ns | 129.43 ns | 20736 B |
| Encrypt · Camellia-128-CBC (Managed) | 8KB | 85,170.7 ns | 244.50 ns | 228.70 ns | - |
| Decrypt · Camellia-128-CBC (BouncyCastle) | 128KB | 737,510.6 ns | 1,487.94 ns | 1,391.82 ns | 327936 B |
| Decrypt · Camellia-128-CBC (Managed) | 128KB | 1,299,903.3 ns | 5,020.73 ns | 4,450.74 ns | - |
| Encrypt · Camellia-128-CBC (BouncyCastle) | 128KB | 719,839.6 ns | 2,435.91 ns | 2,278.56 ns | 327936 B |
| Encrypt · Camellia-128-CBC (Managed) | 128KB | 1,377,271.3 ns | 8,525.38 ns | 7,974.65 ns | - |
Camellia-256-CBC (Japan)
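Camellia-256 uses 24 rounds (vs 18 for 128-bit). Measured times are roughly 25–40% higher than Camellia-128, which tracks the 24/18 round ratio; the additional FL/FL⁻¹ layers add minimal overhead on top of that.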
| Description | TestDataSize | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|---|
| Decrypt · Camellia-256-CBC (BouncyCastle) | 128B | 1.169 μs | 0.0037 μs | 0.0035 μs | 592 B |
| Decrypt · Camellia-256-CBC (Managed) | 128B | 1.904 μs | 0.0088 μs | 0.0082 μs | - |
| Encrypt · Camellia-256-CBC (BouncyCastle) | 128B | 1.160 μs | 0.0040 μs | 0.0035 μs | 592 B |
| Encrypt · Camellia-256-CBC (Managed) | 128B | 2.014 μs | 0.0070 μs | 0.0066 μs | - |
| Decrypt · Camellia-256-CBC (BouncyCastle) | 1KB | 7.787 μs | 0.0386 μs | 0.0361 μs | 2832 B |
| Decrypt · Camellia-256-CBC (Managed) | 1KB | 13.417 μs | 0.0418 μs | 0.0370 μs | - |
| Encrypt · Camellia-256-CBC (BouncyCastle) | 1KB | 8.160 μs | 0.1230 μs | 0.1027 μs | 2832 B |
| Encrypt · Camellia-256-CBC (Managed) | 1KB | 14.476 μs | 0.0520 μs | 0.0461 μs | - |
| Decrypt · Camellia-256-CBC (BouncyCastle) | 8KB | 58.380 μs | 0.1671 μs | 0.1396 μs | 20752 B |
| Decrypt · Camellia-256-CBC (Managed) | 8KB | 107.567 μs | 0.6596 μs | 0.6170 μs | - |
| Encrypt · Camellia-256-CBC (BouncyCastle) | 8KB | 59.039 μs | 0.1537 μs | 0.1362 μs | 20752 B |
| Encrypt · Camellia-256-CBC (Managed) | 8KB | 114.540 μs | 0.4344 μs | 0.4063 μs | - |
| Decrypt · Camellia-256-CBC (BouncyCastle) | 128KB | 933.643 μs | 1.8411 μs | 1.6321 μs | 327952 B |
| Decrypt · Camellia-256-CBC (Managed) | 128KB | 1,697.288 μs | 8.4801 μs | 7.9323 μs | - |
| Encrypt · Camellia-256-CBC (BouncyCastle) | 128KB | 935.850 μs | 2.3988 μs | 2.2438 μs | 327952 B |
| Encrypt · Camellia-256-CBC (Managed) | 128KB | 1,830.016 μs | 6.0824 μs | 5.6895 μs | - |
Kuznyechik-CBC (Russia)
Kuznyechik (GOST R 34.12-2015) is the modern Russian cipher with a 256-bit key and 10 rounds. It replaces the older GOST 28147-89.
- Managed: Pre-computed S-box and linear transformation tables. Zero allocation.
| Description | TestDataSize | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|---|
| Decrypt · Kuznyechik-CBC (Managed) | 128B | 384.4 μs | 7.64 μs | 10.95 μs | - |
| Encrypt · Kuznyechik-CBC (Managed) | 128B | 377.6 μs | 7.01 μs | 6.22 μs | - |
| Decrypt · Kuznyechik-CBC (Managed) | 1KB | 3,181.6 μs | 12.86 μs | 12.03 μs | - |
| Encrypt · Kuznyechik-CBC (Managed) | 1KB | 2,957.0 μs | 20.45 μs | 18.13 μs | - |
| Decrypt · Kuznyechik-CBC (Managed) | 8KB | 26,392.5 μs | 41.45 μs | 38.77 μs | - |
| Encrypt · Kuznyechik-CBC (Managed) | 8KB | 25,439.2 μs | 29.89 μs | 26.50 μs | - |
| Decrypt · Kuznyechik-CBC (Managed) | 128KB | 412,074.8 μs | 732.89 μs | 685.54 μs | - |
| Encrypt · Kuznyechik-CBC (Managed) | 128KB | 404,593.0 μs | 472.23 μs | 418.62 μs | - |
Kalyna-128-CBC (Ukraine)
Kalyna (DSTU 7624:2014) is the Ukrainian national cipher paired with the Kupyna hash family. Uses MDS matrix diffusion.
- Managed: S-box substitution with MDS matrix multiplication. Zero allocation.
| Description | TestDataSize | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|---|
| Decrypt · Kalyna-128-CBC (Managed) | 128B | 2.250 μs | 0.0008 μs | 0.0007 μs | - |
| Decrypt · Kalyna-128-CBC (BouncyCastle) | 128B | 2.417 μs | 0.0038 μs | 0.0033 μs | 872 B |
| Encrypt · Kalyna-128-CBC (BouncyCastle) | 128B | 1.270 μs | 0.0019 μs | 0.0018 μs | 872 B |
| Encrypt · Kalyna-128-CBC (Managed) | 128B | 2.037 μs | 0.0020 μs | 0.0016 μs | - |
| Decrypt · Kalyna-128-CBC (BouncyCastle) | 1KB | 15.392 μs | 0.0216 μs | 0.0202 μs | 872 B |
| Decrypt · Kalyna-128-CBC (Managed) | 1KB | 16.165 μs | 0.0157 μs | 0.0131 μs | - |
| Encrypt · Kalyna-128-CBC (BouncyCastle) | 1KB | 7.156 μs | 0.0095 μs | 0.0084 μs | 872 B |
| Encrypt · Kalyna-128-CBC (Managed) | 1KB | 14.600 μs | 0.0194 μs | 0.0172 μs | - |
| Decrypt · Kalyna-128-CBC (BouncyCastle) | 8KB | 119.049 μs | 0.0755 μs | 0.0669 μs | 872 B |
| Decrypt · Kalyna-128-CBC (Managed) | 8KB | 127.465 μs | 0.0254 μs | 0.0199 μs | - |
| Encrypt · Kalyna-128-CBC (BouncyCastle) | 8KB | 54.206 μs | 0.0761 μs | 0.0712 μs | 872 B |
| Encrypt · Kalyna-128-CBC (Managed) | 8KB | 114.945 μs | 0.1831 μs | 0.1623 μs | - |
| Decrypt · Kalyna-128-CBC (BouncyCastle) | 128KB | 1,898.162 μs | 1.5546 μs | 1.4542 μs | 872 B |
| Decrypt · Kalyna-128-CBC (Managed) | 128KB | 2,040.730 μs | 0.7159 μs | 0.5978 μs | - |
| Encrypt · Kalyna-128-CBC (BouncyCastle) | 128KB | 861.432 μs | 0.6880 μs | 0.6099 μs | 872 B |
| Encrypt · Kalyna-128-CBC (Managed) | 128KB | 1,840.886 μs | 2.2577 μs | 2.0014 μs | - |
Kalyna-256-CBC (Ukraine)
Kalyna-256 uses 14 rounds (vs 10 for 128-bit key). The same MDS-based architecture applies.
| Description | TestDataSize | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|---|
| Decrypt · Kalyna-256-CBC (Managed) | 128B | 3.098 μs | 0.0014 μs | 0.0012 μs | - |
| Decrypt · Kalyna-256-CBC (BouncyCastle) | 128B | 3.292 μs | 0.0033 μs | 0.0031 μs | 1112 B |
| Encrypt · Kalyna-256-CBC (BouncyCastle) | 128B | 1.706 μs | 0.0020 μs | 0.0018 μs | 1112 B |
| Encrypt · Kalyna-256-CBC (Managed) | 128B | 2.787 μs | 0.0021 μs | 0.0016 μs | - |
| Decrypt · Kalyna-256-CBC (BouncyCastle) | 1KB | 21.163 μs | 0.0130 μs | 0.0115 μs | 1112 B |
| Decrypt · Kalyna-256-CBC (Managed) | 1KB | 22.254 μs | 0.0094 μs | 0.0079 μs | - |
| Encrypt · Kalyna-256-CBC (BouncyCastle) | 1KB | 9.790 μs | 0.0135 μs | 0.0126 μs | 1112 B |
| Encrypt · Kalyna-256-CBC (Managed) | 1KB | 20.026 μs | 0.0286 μs | 0.0267 μs | - |
| Decrypt · Kalyna-256-CBC (BouncyCastle) | 8KB | 163.975 μs | 0.0928 μs | 0.0775 μs | 1112 B |
| Decrypt · Kalyna-256-CBC (Managed) | 8KB | 175.451 μs | 0.0991 μs | 0.0828 μs | - |
| Encrypt · Kalyna-256-CBC (BouncyCastle) | 8KB | 74.237 μs | 0.1661 μs | 0.1473 μs | 1112 B |
| Encrypt · Kalyna-256-CBC (Managed) | 8KB | 156.759 μs | 0.1026 μs | 0.0909 μs | - |
| Decrypt · Kalyna-256-CBC (BouncyCastle) | 128KB | 2,612.778 μs | 2.0607 μs | 1.8268 μs | 1112 B |
| Decrypt · Kalyna-256-CBC (Managed) | 128KB | 2,807.778 μs | 1.5034 μs | 1.2554 μs | - |
| Encrypt · Kalyna-256-CBC (BouncyCastle) | 128KB | 1,177.886 μs | 1.7797 μs | 1.4862 μs | 1112 B |
| Encrypt · Kalyna-256-CBC (Managed) | 128KB | 2,522.515 μs | 2.0303 μs | 1.6954 μs | - |
SEED-CBC (Korea)
SEED is a Korean cipher (RFC 4269, KISA) with a 128-bit key and 16-round Feistel structure. Its key-schedule constants are derived from the golden ratio.
- Managed: Pre-computed 32-bit SS-boxes (SS0–SS3). Zero allocation.
| Description | TestDataSize | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|---|
| Decrypt · SEED-CBC (Managed) | 128B | 1.316 μs | 0.0142 μs | 0.0126 μs | - |
| Decrypt · SEED-CBC (BouncyCastle) | 128B | 1.400 μs | 0.0069 μs | 0.0064 μs | 152 B |
| Encrypt · SEED-CBC (BouncyCastle) | 128B | 1.428 μs | 0.0050 μs | 0.0044 μs | 152 B |
| Encrypt · SEED-CBC (Managed) | 128B | 1.439 μs | 0.0052 μs | 0.0049 μs | - |
| Decrypt · SEED-CBC (Managed) | 1KB | 9.363 μs | 0.0453 μs | 0.0424 μs | - |
| Decrypt · SEED-CBC (BouncyCastle) | 1KB | 9.601 μs | 0.0523 μs | 0.0489 μs | 152 B |
| Encrypt · SEED-CBC (BouncyCastle) | 1KB | 9.960 μs | 0.0510 μs | 0.0477 μs | 152 B |
| Encrypt · SEED-CBC (Managed) | 1KB | 10.463 μs | 0.0413 μs | 0.0386 μs | - |
| Decrypt · SEED-CBC (Managed) | 8KB | 73.523 μs | 0.2633 μs | 0.2463 μs | - |
| Decrypt · SEED-CBC (BouncyCastle) | 8KB | 75.218 μs | 0.3217 μs | 0.3009 μs | 152 B |
| Encrypt · SEED-CBC (BouncyCastle) | 8KB | 78.222 μs | 0.4169 μs | 0.3899 μs | 152 B |
| Encrypt · SEED-CBC (Managed) | 8KB | 82.674 μs | 0.3556 μs | 0.3327 μs | - |
| Decrypt · SEED-CBC (Managed) | 128KB | 1,178.190 μs | 5.9217 μs | 5.5392 μs | - |
| Decrypt · SEED-CBC (BouncyCastle) | 128KB | 1,200.086 μs | 5.2922 μs | 4.9504 μs | 152 B |
| Encrypt · SEED-CBC (BouncyCastle) | 128KB | 1,250.964 μs | 5.3121 μs | 4.9690 μs | 152 B |
| Encrypt · SEED-CBC (Managed) | 128KB | 1,324.827 μs | 6.9881 μs | 6.5367 μs | - |
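
The regional-cipher tables report time per call, which makes it hard to compare ciphers at a glance. Converting the 128 KiB Managed encrypt means to throughput puts them on one scale. The sketch below uses only values copied from the tables above; the helper name is illustrative.

```python
def throughput_mib_per_s(payload_bytes: int, mean_us: float) -> float:
    """Convert one call's payload size and mean time (in μs) to MiB/s."""
    mib = payload_bytes / (1024 * 1024)
    seconds = mean_us / 1_000_000
    return mib / seconds


PAYLOAD_BYTES = 128 * 1024

# Managed encrypt means at 128 KiB, in μs (Camellia converted from ns).
MANAGED_ENCRYPT_US = {
    "SM4-CBC": 1317.563,
    "ARIA-128-CBC": 2021.430,
    "Camellia-128-CBC": 1377.2713,
    "Kuznyechik-CBC": 404593.0,
    "Kalyna-128-CBC": 1840.886,
    "SEED-CBC": 1324.827,
}

throughput = {
    name: throughput_mib_per_s(PAYLOAD_BYTES, us)
    for name, us in MANAGED_ENCRYPT_US.items()
}
for name, value in sorted(throughput.items(), key=lambda kv: -kv[1]):
    print(f"{name:<18} {value:8.2f} MiB/s")
```

On this scale SM4, SEED, and Camellia-128 land close together, ARIA and Kalyna are somewhat slower, and Kuznyechik is more than two orders of magnitude behind the rest.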
Allocation Summary
All CryptoHives cipher implementations achieve zero heap allocation for both encrypt and decrypt operations across all payload sizes. This is critical for high-throughput scenarios such as network packet processing, where GC pressure directly impacts tail latency.
| Implementation | Allocation | Notes |
|---|---|---|
| CryptoHives (all variants) | 0 B | All tiers (Managed, ArmAes, ArmAes+ArmPmull, Neon) are zero-allocation at all payload sizes |
| OS (.NET) — GCM / ChaCha20-Poly1305 | 0 B | OS AEAD implementations are zero-allocation |
| OS (.NET) — CBC | 72 B | Fixed P/Invoke marshalling overhead per call, independent of payload size |
| BouncyCastle — CBC | 832–1,024 B | Fixed per-call allocation (832 B for AES-128, 1,024 B for AES-256) |
| BouncyCastle — GCM | 1,520–1,744 B | Fixed per-call allocation (1,520 B for AES-128 encrypt, 1,744 B for AES-256 decrypt) |
| BouncyCastle — CCM | 2,424–2,848 B | Fixed per-call allocation (2,424 B for AES-128 decrypt, 2,848 B for AES-256 encrypt) |
| BouncyCastle — ChaCha20-Poly1305 | 336–416 B | 336 B for encrypt, 416 B for decrypt; independent of payload size |
| BouncyCastle — ChaCha20 | 96 B | Fixed per-call allocation |
| NaCl.Core — ChaCha20 | 24 B | Small fixed allocation |
| NaCl.Core — ChaCha20-Poly1305 / XChaCha20-Poly1305 | 48–72 B | 48 B at 128 B payloads, 72 B at 1 KiB and above |
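
To see why a fixed per-call allocation matters for GC pressure, the figures above can be turned into an allocation rate. The sketch below assumes the operation runs back-to-back on a single core, so the result is an upper bound for one thread rather than a measurement; the function name is illustrative.

```python
def allocation_mib_per_s(bytes_per_call: int, mean_ns: float) -> float:
    """Heap bytes produced per second if calls run back-to-back on one core."""
    calls_per_second = 1e9 / mean_ns
    return bytes_per_call * calls_per_second / (1024 * 1024)


# BouncyCastle AES-256-GCM encrypt at 128 B: 1,728 B per call, 1,105.18 ns mean.
bc_gcm = allocation_mib_per_s(1728, 1105.18)

# CryptoHives ArmAes+ArmPmull AES-256-GCM encrypt at 128 B: no allocation.
ch_gcm = allocation_mib_per_s(0, 122.88)

print(f"BouncyCastle GCM: {bc_gcm:.0f} MiB/s of garbage")
print(f"CryptoHives GCM:  {ch_gcm:.0f} MiB/s of garbage")
```

Even a sub-2 KB allocation per call adds up to more than a gigabyte of short-lived objects per second at small-packet rates, which is the scenario the zero-allocation tiers are designed for.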