Cipher Algorithm Benchmarks
BenchmarkDotNet measurements for all cipher algorithm implementations in CryptoHives.Foundation.Security.Cryptography. Each algorithm is benchmarked across representative payload sizes (17 bytes through 128 KiB) to capture both latency and throughput characteristics.
Implementation Variants
Each cipher family exposes multiple acceleration tiers. The runtime automatically selects the fastest tier supported by the host CPU via SimdSupport detection. Callers can also force a specific tier through the Create(SimdSupport) factory for testing or compatibility.
AES Family
| Variant | Instructions | .NET Target | When Selected | Description |
|---|---|---|---|---|
| Managed | Scalar | All | No AES-NI support | T-table AES (AESENC/AESDEC emulated via lookup tables). Fully portable, zero-allocation. ~3–44× slower than AES-NI depending on mode, direction, and payload size (CBC decrypt shows the largest gap). |
| AES-NI | AES-NI | .NET 8+ | Aes.IsSupported | Hardware AES round instructions (AESENC, AESDEC, AESIMC). For CBC, uses 8-block interleaved decrypt for maximum instruction-level parallelism. For GCM/CCM, accelerates counter-mode encryption and CBC-MAC. |
| AES-NI+PClMul | AES-NI, PCLMULQDQ | .NET 8+ | Pclmulqdq.IsSupported | Adds carry-less multiplication for hardware-accelerated GHASH (GCM authentication). Uses an 8-block stitched pipeline that interleaves AES rounds with GHASH CLMUL operations across alternating CPU ports. Modular reduction uses a 2-CLMUL approach (SymCrypt-style MODREDUCE), replacing 26 shift/XOR operations with 2 carry-less multiplies + 6 vector ops. Pre-computes Karatsuba cross-term halves for H¹–H⁸ powers. |
| AES-NI+PClMulV256 | AES-NI, VPCLMULQDQ, AVX2 | .NET 10+ | Pclmulqdq.V256.IsSupported | Extends PClMul with 256-bit carry-less multiply (VPCLMULQDQ) and AVX2 256-bit loads/stores. Processes two 128-bit GHASH blocks per CLMUL instruction. Counter blocks are generated in batches of 8 using Vector256 increments. Best path for payloads ≥256 bytes on CPUs with VPCLMULQDQ support (Ice Lake+). |
ChaCha20 Family
| Variant | Instructions | .NET Target | When Selected | Description |
|---|---|---|---|---|
| Managed | Scalar | All | No SSSE3 support | Quarter-round operations using scalar uint arithmetic. Fully portable. ~3–6× slower than SIMD paths. |
| SSSE3 | SSSE3 | .NET 8+ | Ssse3.IsSupported | Maps the 4×4 ChaCha state to four Vector128<uint> rows. Uses Ssse3.Shuffle byte masks for 16-bit and 8-bit rotations (1 instruction vs 3 for shift+or). Diagonal rounds use Sse2.Shuffle to rotate rows. Processes one 64-byte block per iteration. |
| AVX2 | AVX2, SSSE3 | .NET 8+ | Avx2.IsSupported | Dual-block processing: encrypts two 64-byte blocks per iteration using Vector256<uint>. Falls back to SSSE3 for a remaining single block. ~1.9× faster than SSSE3 at 128 KiB. |
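The "When Selected" columns of the two tables above read as a priority chain: the highest tier whose required CPU features are all present wins. A minimal Python sketch of that reading follows. The feature strings and both function names are hypothetical stand-ins for the library's `SimdSupport` detection, and the .NET target requirement is deliberately ignored.

```python
# Illustrative only: feature labels mirror the "Instructions" column, and
# select_aes_tier / select_chacha_tier are hypothetical helpers, not library API.

def select_aes_tier(features):
    """Return the highest AES tier whose required CPU features are all present."""
    if {"aes", "pclmulqdq", "vpclmulqdq", "avx2"} <= features:
        return "AES-NI+PClMulV256"
    if {"aes", "pclmulqdq"} <= features:
        return "AES-NI+PClMul"
    if "aes" in features:
        return "AES-NI"
    return "Managed"


def select_chacha_tier(features):
    """Return the highest ChaCha20 tier whose required CPU features are all present."""
    if {"avx2", "ssse3"} <= features:
        return "AVX2"
    if "ssse3" in features:
        return "SSSE3"
    return "Managed"


# A CPU with AES-NI, PCLMULQDQ and AVX2 but no VPCLMULQDQ:
cpu = {"aes", "pclmulqdq", "avx2", "ssse3"}
print(select_aes_tier(cpu))     # AES-NI+PClMul
print(select_chacha_tier(cpu))  # AVX2
```

Checking from the most capable tier downwards keeps each rule independent of the others, which is also what makes forcing a lower tier for testing straightforward.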
When to Use Each Variant
- Small messages (≤128 B): AES-GCM with AES-NI is ~2× faster than OS due to zero P/Invoke overhead and no kernel transition. ChaCha20-Poly1305 AVX2 is competitive with OS at these sizes.
- Medium messages (256 B–1 KB): The AES-GCM V256 stitched pipeline engages at ≥256 B (≥16 blocks) and provides the best throughput. Typical QUIC (~1.4 KB), WireGuard (~1.4 KB), and IPsec packets fall in or just above this range.
- Large messages (8 KB–128 KB): AES-GCM V256 decrypt stays within 1.24× of OS. ChaCha20-Poly1305 AVX2 is ~1.7× faster than OS. This range covers TLS records (1–16 KB) and OPC UA chunks (8 KB default).
- No hardware AES: Use ChaCha20-Poly1305 — it is designed for software-only execution and outperforms managed AES-GCM by 10–20×.
- IoT / constrained devices: AES-CCM with AES-NI provides ~3× speedup over managed. Supports variable nonce (7–13 bytes) and tag sizes.
Machine Profile
Machine Specification
The benchmarks were run on the following machine:
BenchmarkDotNet v0.15.8, Windows 11 (10.0.26200.7840/25H2/2025Update/HudsonValley2)
AMD Ryzen 5 7600X 4.70GHz, 1 CPU, 12 logical and 6 physical cores
.NET SDK 10.0.103
[Host] : .NET 10.0.3 (10.0.3, 10.0.326.7603), X64 RyuJIT x86-64-v4
.NET 10.0 : .NET 10.0.3 (10.0.3, 10.0.326.7603), X64 RyuJIT x86-64-v4
Job=.NET 10.0 Runtime=.NET 10.0 Toolchain=net10.0
Note: All benchmarks and SIMD optimizations have been developed and measured on this AMD Ryzen 5 / Windows 11 platform only. No results are available yet for Linux, macOS, or ARM processors (e.g. Apple Silicon, AWS Graviton). Performance characteristics — particularly SIMD dispatch paths and OS-backed implementations (CNG vs OpenSSL) — may differ significantly on other platforms. Run benchmarks locally for your specific hardware.
Highlights
| Family | Leader | Key Insight |
|---|---|---|
| ChaCha20 | Managed AVX2 | AVX2 ~3× faster than BouncyCastle; SSSE3 ~1.7×; zero allocation |
| ChaCha20-Poly1305 | Managed AVX2 | ~1.7× faster than OS at 128 KiB; zero allocation |
| XChaCha20-Poly1305 | Managed AVX2 | Same core as ChaCha20-Poly1305; negligible overhead for extended nonce |
| AES-CBC | AES-NI | Decrypt on par with OS at 128 KiB; ~8× faster than OS at 128 B; zero allocation |
| AES-GCM | AES-NI+PClMulV256 | ~2× faster than OS at 128 B encrypt; V256 decrypt within 1.24× of OS at 128 KiB; 8-block stitched AES+GHASH pipeline |
| AES-CCM | AES-NI | ~3× faster than Managed; zero allocation; no OS adapter available |
Stream Ciphers
ChaCha20
ChaCha20 is a stream cipher designed by Daniel J. Bernstein. Three acceleration tiers are available:
- AVX2: Dual-block processing that encrypts two 64-byte keystream blocks per iteration using `Vector256<uint>`. Each block consists of 20 quarter-rounds operating on 8 lanes simultaneously. Falls back to SSSE3 for a remaining single block. Yields ~2 GB/s throughput at 128 KiB.
- SSSE3: Single-block processing that maps the 4×4 state matrix to four `Vector128<uint>` rows. Uses `Ssse3.Shuffle` byte masks for 16-bit and 8-bit rotations (1 instruction vs 3 for shift+or). Yields ~1 GB/s throughput.
- Managed: Scalar `uint` quarter-round arithmetic. Fully portable across all .NET targets. ~3.5× slower than SSSE3.
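For reference, the scalar quarter-round that all three tiers ultimately compute can be sketched in a few lines. This is a plain Python transcription of the ChaCha quarter-round as specified in RFC 8439, not the library's code; the helper names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) & MASK32) | (x >> (32 - n))


def quarter_round(a, b, c, d):
    """One ChaCha quarter-round: add, XOR, rotate by 16, 12, 8 and 7 bits."""
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 7)
    return a, b, c, d


# Quarter-round test vector from RFC 8439, section 2.1.1.
result = quarter_round(0x11111111, 0x01020304, 0x9B8D6F43, 0x01234567)
print([hex(w) for w in result])
# ['0xea2a92f4', '0xcb1cf8ce', '0x4581472e', '0x5881c4bb']
```

The SIMD tiers apply the same eight steps to whole rows of the state at once, so one vector instruction replaces four (SSSE3) or eight (AVX2) of these scalar operations.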
Key observations:
- AVX2 is the fastest at all sizes, ~3× faster than BouncyCastle at 128 KiB
- SSSE3 is ~1.7× faster than BouncyCastle, processes single blocks via `Vector128<uint>`
- BouncyCastle allocates 96 B per call; NaCl.Core allocates 24 B per call
- Managed, SSSE3, and AVX2 paths are zero-allocation
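The reason a byte shuffle can stand in for the 16-bit and 8-bit rotations is that rotating a 32-bit word by a multiple of 8 bits only permutes its bytes. A short Python sketch of that equivalence, assuming little-endian lane layout; the mask tuples are illustrative per-word indices, not the actual 16-byte shuffle constants used by the library:

```python
import struct


def rotl32(x, n):
    """Rotate a 32-bit word left by n bits using shift + shift + or."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF


def rotl_by_byte_shuffle(x, mask):
    """Rotate a 32-bit word by permuting its little-endian bytes.

    Output byte i is taken from input byte mask[i], which is how a
    byte-shuffle instruction interprets its control mask.
    """
    src = struct.pack("<I", x)
    return struct.unpack("<I", bytes(src[i] for i in mask))[0]


ROTL16_MASK = (2, 3, 0, 1)  # rotate left by 16 bits
ROTL8_MASK = (3, 0, 1, 2)   # rotate left by 8 bits

word = 0x11223344
print(hex(rotl_by_byte_shuffle(word, ROTL16_MASK)))  # 0x33441122
print(hex(rotl_by_byte_shuffle(word, ROTL8_MASK)))   # 0x22334411
```

The 12-bit and 7-bit rotations are not byte-aligned, so they still need the shift+or sequence.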
| Description | TestDataSize | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|---|
| Decrypt · ChaCha20 (AVX2) | 128B | 74.67 ns | 0.142 ns | 0.126 ns | - |
| Decrypt · ChaCha20 (SSSE3) | 128B | 127.79 ns | 0.497 ns | 0.441 ns | - |
| Decrypt · ChaCha20 (NaCl.Core) | 128B | 277.98 ns | 0.637 ns | 0.565 ns | 24 B |
| Decrypt · ChaCha20 (BouncyCastle) | 128B | 307.08 ns | 1.682 ns | 1.573 ns | 96 B |
| Decrypt · ChaCha20 (Managed) | 128B | 455.01 ns | 0.964 ns | 0.901 ns | - |
| Encrypt · ChaCha20 (AVX2) | 128B | 73.84 ns | 0.283 ns | 0.221 ns | - |
| Encrypt · ChaCha20 (SSSE3) | 128B | 127.72 ns | 0.407 ns | 0.381 ns | - |
| Encrypt · ChaCha20 (NaCl.Core) | 128B | 277.93 ns | 0.823 ns | 0.770 ns | 24 B |
| Encrypt · ChaCha20 (BouncyCastle) | 128B | 308.18 ns | 1.069 ns | 1.000 ns | 96 B |
| Encrypt · ChaCha20 (Managed) | 128B | 453.89 ns | 0.718 ns | 0.671 ns | - |
| Decrypt · ChaCha20 (AVX2) | 1KB | 524.12 ns | 2.837 ns | 2.369 ns | - |
| Decrypt · ChaCha20 (SSSE3) | 1KB | 996.60 ns | 2.730 ns | 2.279 ns | - |
| Decrypt · ChaCha20 (NaCl.Core) | 1KB | 1,501.58 ns | 4.661 ns | 4.360 ns | 24 B |
| Decrypt · ChaCha20 (BouncyCastle) | 1KB | 1,769.47 ns | 5.448 ns | 5.096 ns | 96 B |
| Decrypt · ChaCha20 (Managed) | 1KB | 3,529.41 ns | 10.726 ns | 10.033 ns | - |
| Encrypt · ChaCha20 (AVX2) | 1KB | 525.86 ns | 1.604 ns | 1.422 ns | - |
| Encrypt · ChaCha20 (SSSE3) | 1KB | 999.89 ns | 6.203 ns | 5.803 ns | - |
| Encrypt · ChaCha20 (NaCl.Core) | 1KB | 1,507.15 ns | 2.222 ns | 1.969 ns | 24 B |
| Encrypt · ChaCha20 (BouncyCastle) | 1KB | 1,773.93 ns | 4.289 ns | 4.012 ns | 96 B |
| Encrypt · ChaCha20 (Managed) | 1KB | 3,528.09 ns | 5.091 ns | 4.513 ns | - |
| Decrypt · ChaCha20 (AVX2) | 8KB | 4,124.02 ns | 17.872 ns | 16.717 ns | - |
| Decrypt · ChaCha20 (SSSE3) | 8KB | 7,982.12 ns | 37.310 ns | 31.156 ns | - |
| Decrypt · ChaCha20 (NaCl.Core) | 8KB | 11,363.04 ns | 13.686 ns | 12.802 ns | 24 B |
| Decrypt · ChaCha20 (BouncyCastle) | 8KB | 13,441.57 ns | 42.868 ns | 40.099 ns | 96 B |
| Decrypt · ChaCha20 (Managed) | 8KB | 28,207.56 ns | 76.489 ns | 71.548 ns | - |
| Encrypt · ChaCha20 (AVX2) | 8KB | 4,121.93 ns | 12.895 ns | 11.431 ns | - |
| Encrypt · ChaCha20 (SSSE3) | 8KB | 7,986.06 ns | 32.093 ns | 30.020 ns | - |
| Encrypt · ChaCha20 (NaCl.Core) | 8KB | 11,322.71 ns | 27.803 ns | 26.007 ns | 24 B |
| Encrypt · ChaCha20 (BouncyCastle) | 8KB | 13,436.35 ns | 34.656 ns | 28.939 ns | 96 B |
| Encrypt · ChaCha20 (Managed) | 8KB | 28,169.65 ns | 64.434 ns | 60.271 ns | - |
| Decrypt · ChaCha20 (AVX2) | 128KB | 65,898.91 ns | 214.841 ns | 167.734 ns | - |
| Decrypt · ChaCha20 (SSSE3) | 128KB | 127,801.24 ns | 533.152 ns | 445.206 ns | - |
| Decrypt · ChaCha20 (NaCl.Core) | 128KB | 180,025.82 ns | 405.332 ns | 379.148 ns | 24 B |
| Decrypt · ChaCha20 (BouncyCastle) | 128KB | 214,098.75 ns | 561.224 ns | 524.969 ns | 96 B |
| Decrypt · ChaCha20 (Managed) | 128KB | 449,581.94 ns | 1,529.613 ns | 1,430.801 ns | - |
| Encrypt · ChaCha20 (AVX2) | 128KB | 65,936.12 ns | 356.337 ns | 315.883 ns | - |
| Encrypt · ChaCha20 (SSSE3) | 128KB | 127,836.06 ns | 445.971 ns | 395.342 ns | - |
| Encrypt · ChaCha20 (NaCl.Core) | 128KB | 178,200.56 ns | 445.600 ns | 416.815 ns | 24 B |
| Encrypt · ChaCha20 (BouncyCastle) | 128KB | 213,946.74 ns | 693.002 ns | 614.328 ns | 96 B |
| Encrypt · ChaCha20 (Managed) | 128KB | 448,639.98 ns | 1,018.874 ns | 903.205 ns | - |
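The throughput and speedup figures quoted for ChaCha20 follow directly from the Mean column. A small sketch of the arithmetic, assuming the `128KB` rows use 131,072-byte payloads (the 128 KiB stated in the introduction) and copying the decrypt means from the table above:

```python
def throughput_gb_per_s(payload_bytes, mean_ns):
    """Bytes per nanosecond is numerically equal to GB/s (10^9 bytes per second)."""
    return payload_bytes / mean_ns


PAYLOAD = 128 * 1024  # assumed payload size for the 128KB rows

# Decrypt means at 128KB, in nanoseconds, copied from the table.
decrypt_128k_ns = {
    "AVX2": 65_898.91,
    "SSSE3": 127_801.24,
    "NaCl.Core": 180_025.82,
    "BouncyCastle": 214_098.75,
    "Managed": 449_581.94,
}

baseline = decrypt_128k_ns["BouncyCastle"]
for name, ns in decrypt_128k_ns.items():
    gbps = throughput_gb_per_s(PAYLOAD, ns)
    print(f"{name:13s} {gbps:5.2f} GB/s  {baseline / ns:4.2f}x vs BouncyCastle")
```

With these inputs AVX2 lands at roughly 2 GB/s and SSSE3 at roughly 1 GB/s, matching the figures in the tier descriptions.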
Block Ciphers
AES-128-CBC
AES-CBC (Cipher Block Chaining) is the most widely deployed AES mode. Two acceleration tiers are available:
- AES-NI: Uses hardware `AESENC`/`AESDEC` instructions. Decrypt uses 8-block interleaving: 8 ciphertext blocks are loaded and decrypted simultaneously, exploiting the fact that CBC decrypt is embarrassingly parallel (each block decrypts independently using only its predecessor as the XOR mask). This saturates the AES execution unit pipeline (10 rounds × 8 blocks = 80 `AESDEC` instructions in flight). Encrypt remains serial because each plaintext block must be XORed with the previous ciphertext before encryption.
- Managed: T-table AES using four 256-entry lookup tables per round. Fully portable, zero-allocation. Outperforms BouncyCastle by ~22%.
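The asymmetry between serial encrypt and parallel decrypt comes from the CBC chaining rule itself, independent of the block cipher. The sketch below illustrates this with a deliberately trivial toy block function standing in for AES (it is not secure and not the library's code); all names are illustrative.

```python
BLOCK = 16


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt_block(block, key):
    """Stand-in for AES: XOR with the key, then rotate the bytes left by one."""
    mixed = xor(block, key)
    return mixed[1:] + mixed[:1]


def toy_decrypt_block(block, key):
    """Inverse of toy_encrypt_block: rotate right by one, then XOR with the key."""
    return xor(block[-1:] + block[:-1], key)


def cbc_encrypt(plaintext, key, iv):
    """Serial by construction: block i needs ciphertext block i-1 as input."""
    prev, out = iv, []
    for i in range(0, len(plaintext), BLOCK):
        prev = toy_encrypt_block(xor(plaintext[i:i + BLOCK], prev), key)
        out.append(prev)
    return b"".join(out)


def cbc_decrypt_block(ciphertext, key, iv, index):
    """Decrypt one block using only ciphertext that is already available."""
    start = index * BLOCK
    prev = iv if index == 0 else ciphertext[start - BLOCK:start]
    return xor(toy_decrypt_block(ciphertext[start:start + BLOCK], key), prev)


key = bytes((7 * i + 3) % 256 for i in range(BLOCK))
iv = bytes(range(100, 100 + BLOCK))
plaintext = bytes(range(128))  # 8 blocks
ciphertext = cbc_encrypt(plaintext, key, iv)

# Decrypt the 8 blocks in reverse order: any order works, so they can be interleaved.
blocks = {i: cbc_decrypt_block(ciphertext, key, iv, i) for i in reversed(range(8))}
recovered = b"".join(blocks[i] for i in range(8))
assert recovered == plaintext
```

Because every input to `cbc_decrypt_block` is known up front, eight such calls have no data dependency on each other, which is what allows the rounds of eight blocks to be interleaved.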
Key observations:
- AES-NI: Fastest overall — on par with OS at 128 KiB decrypt, ~8× faster at 128 B
- AES-NI Encrypt: ~2× slower than OS at large sizes (CBC encrypt is inherently serial; OS may use kernel-level optimizations)
- Managed: Zero-allocation T-table AES, outperforms BouncyCastle by ~22%
- OS: Allocates 128 B per call (P/Invoke marshalling overhead)
| Description | TestDataSize | Mean | Error | StdDev | Median | Allocated |
|---|---|---|---|---|---|---|
| Decrypt · AES-128-CBC (AES-NI) | 128B | 29.90 ns | 0.382 ns | 0.358 ns | 29.78 ns | - |
| Decrypt · AES-128-CBC (OS) | 128B | 246.44 ns | 2.508 ns | 2.346 ns | 246.90 ns | 128 B |
| Decrypt · AES-128-CBC (Managed) | 128B | 439.76 ns | 2.413 ns | 2.257 ns | 439.81 ns | - |
| Decrypt · AES-128-CBC (BouncyCastle) | 128B | 692.44 ns | 4.885 ns | 4.570 ns | 690.71 ns | 832 B |
| Encrypt · AES-128-CBC (AES-NI) | 128B | 171.33 ns | 3.327 ns | 3.112 ns | 171.54 ns | - |
| Encrypt · AES-128-CBC (OS) | 128B | 274.43 ns | 2.993 ns | 2.800 ns | 274.02 ns | 128 B |
| Encrypt · AES-128-CBC (Managed) | 128B | 445.61 ns | 2.419 ns | 2.020 ns | 445.70 ns | - |
| Encrypt · AES-128-CBC (BouncyCastle) | 128B | 636.11 ns | 1.403 ns | 1.096 ns | 635.65 ns | 832 B |
| Decrypt · AES-128-CBC (AES-NI) | 1KB | 89.56 ns | 1.163 ns | 1.087 ns | 88.75 ns | - |
| Decrypt · AES-128-CBC (OS) | 1KB | 303.75 ns | 1.647 ns | 1.460 ns | 303.25 ns | 128 B |
| Decrypt · AES-128-CBC (Managed) | 1KB | 3,111.00 ns | 31.577 ns | 29.537 ns | 3,102.28 ns | - |
| Decrypt · AES-128-CBC (BouncyCastle) | 1KB | 3,895.86 ns | 31.179 ns | 29.165 ns | 3,884.31 ns | 832 B |
| Encrypt · AES-128-CBC (OS) | 1KB | 698.81 ns | 3.463 ns | 3.070 ns | 697.70 ns | 128 B |
| Encrypt · AES-128-CBC (AES-NI) | 1KB | 1,175.13 ns | 5.550 ns | 5.192 ns | 1,174.18 ns | - |
| Encrypt · AES-128-CBC (Managed) | 1KB | 3,120.67 ns | 23.789 ns | 21.088 ns | 3,119.42 ns | - |
| Encrypt · AES-128-CBC (BouncyCastle) | 1KB | 3,725.55 ns | 27.872 ns | 26.072 ns | 3,724.62 ns | 832 B |
| Decrypt · AES-128-CBC (AES-NI) | 8KB | 568.19 ns | 2.719 ns | 2.543 ns | 566.98 ns | - |
| Decrypt · AES-128-CBC (OS) | 8KB | 732.64 ns | 4.745 ns | 4.207 ns | 733.30 ns | 128 B |
| Decrypt · AES-128-CBC (Managed) | 8KB | 24,283.63 ns | 122.578 ns | 108.662 ns | 24,280.46 ns | - |
| Decrypt · AES-128-CBC (BouncyCastle) | 8KB | 29,166.53 ns | 261.978 ns | 245.054 ns | 29,099.30 ns | 832 B |
| Encrypt · AES-128-CBC (OS) | 8KB | 4,278.31 ns | 53.715 ns | 50.245 ns | 4,266.29 ns | 128 B |
| Encrypt · AES-128-CBC (AES-NI) | 8KB | 9,100.45 ns | 64.454 ns | 60.290 ns | 9,084.92 ns | - |
| Encrypt · AES-128-CBC (Managed) | 8KB | 24,549.96 ns | 159.960 ns | 149.627 ns | 24,565.28 ns | - |
| Encrypt · AES-128-CBC (BouncyCastle) | 8KB | 29,835.58 ns | 592.595 ns | 1,419.820 ns | 29,307.96 ns | 832 B |
| Decrypt · AES-128-CBC (OS) | 128KB | 8,302.64 ns | 55.443 ns | 51.861 ns | 8,291.44 ns | 128 B |
| Decrypt · AES-128-CBC (AES-NI) | 128KB | 8,849.10 ns | 89.091 ns | 83.335 ns | 8,857.57 ns | - |
| Decrypt · AES-128-CBC (Managed) | 128KB | 391,412.22 ns | 3,635.462 ns | 3,400.614 ns | 390,479.98 ns | - |
| Decrypt · AES-128-CBC (BouncyCastle) | 128KB | 461,918.15 ns | 2,776.376 ns | 2,597.024 ns | 461,417.43 ns | 832 B |
| Encrypt · AES-128-CBC (OS) | 128KB | 65,933.87 ns | 421.309 ns | 373.479 ns | 65,859.27 ns | 128 B |
| Encrypt · AES-128-CBC (AES-NI) | 128KB | 144,081.35 ns | 590.215 ns | 523.210 ns | 143,899.71 ns | - |
| Encrypt · AES-128-CBC (Managed) | 128KB | 391,274.38 ns | 4,268.993 ns | 3,993.219 ns | 389,712.01 ns | - |
| Encrypt · AES-128-CBC (BouncyCastle) | 128KB | 451,990.97 ns | 3,184.885 ns | 2,486.549 ns | 452,597.80 ns | 832 B |
AES-256-CBC
AES-256-CBC uses 14 rounds (vs 10 for AES-128), adding ~25-30% overhead. The same 8-block interleaved decrypt and serial encrypt architecture applies. The AES-NI decrypt path achieves parity with OS at 128 KiB.
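The overhead estimate can be checked against the measurements. The sketch below compares the raw round-count ratio with the measured ratio, using the 128 KiB decrypt means reported in the AES-128-CBC and AES-256-CBC tables:

```python
# Round counts from the text; means in nanoseconds copied from the two CBC tables.
rounds_ratio = 14 / 10

aesni_decrypt_ratio = 11_227.40 / 8_849.10        # AES-256 vs AES-128, AES-NI tier
managed_decrypt_ratio = 496_567.17 / 391_412.22   # AES-256 vs AES-128, Managed tier

print(f"round-count ratio:      {rounds_ratio:.2f}")
print(f"AES-NI decrypt ratio:   {aesni_decrypt_ratio:.2f}")
print(f"Managed decrypt ratio:  {managed_decrypt_ratio:.2f}")
```

Both measured ratios come out near 1.27, below the 1.40 that the round counts alone would suggest, presumably because not all per-block work scales with the number of rounds.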
| Description | TestDataSize | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|---|
| Decrypt · AES-256-CBC (AES-NI) | 128B | 34.88 ns | 0.091 ns | 0.081 ns | - |
| Decrypt · AES-256-CBC (OS) | 128B | 251.67 ns | 1.635 ns | 1.449 ns | 128 B |
| Decrypt · AES-256-CBC (Managed) | 128B | 555.63 ns | 3.409 ns | 3.189 ns | - |
| Decrypt · AES-256-CBC (BouncyCastle) | 128B | 879.28 ns | 4.828 ns | 4.516 ns | 1024 B |
| Encrypt · AES-256-CBC (AES-NI) | 128B | 196.19 ns | 0.857 ns | 0.759 ns | - |
| Encrypt · AES-256-CBC (OS) | 128B | 311.13 ns | 1.697 ns | 1.587 ns | 128 B |
| Encrypt · AES-256-CBC (Managed) | 128B | 561.94 ns | 3.284 ns | 3.071 ns | - |
| Encrypt · AES-256-CBC (BouncyCastle) | 128B | 791.98 ns | 5.975 ns | 5.589 ns | 1024 B |
| Decrypt · AES-256-CBC (AES-NI) | 1KB | 109.58 ns | 0.451 ns | 0.400 ns | - |
| Decrypt · AES-256-CBC (OS) | 1KB | 329.48 ns | 0.626 ns | 0.523 ns | 128 B |
| Decrypt · AES-256-CBC (Managed) | 1KB | 3,934.52 ns | 10.216 ns | 9.056 ns | - |
| Decrypt · AES-256-CBC (BouncyCastle) | 1KB | 4,795.54 ns | 16.574 ns | 14.692 ns | 1024 B |
| Encrypt · AES-256-CBC (OS) | 1KB | 903.77 ns | 3.537 ns | 3.308 ns | 128 B |
| Encrypt · AES-256-CBC (AES-NI) | 1KB | 1,347.88 ns | 6.644 ns | 6.215 ns | - |
| Encrypt · AES-256-CBC (Managed) | 1KB | 4,000.46 ns | 29.168 ns | 27.284 ns | - |
| Encrypt · AES-256-CBC (BouncyCastle) | 1KB | 4,747.25 ns | 17.633 ns | 16.494 ns | 1024 B |
| Decrypt · AES-256-CBC (AES-NI) | 8KB | 709.38 ns | 4.531 ns | 4.238 ns | - |
| Decrypt · AES-256-CBC (OS) | 8KB | 939.84 ns | 1.744 ns | 1.546 ns | 128 B |
| Decrypt · AES-256-CBC (Managed) | 8KB | 30,856.03 ns | 105.914 ns | 82.691 ns | - |
| Decrypt · AES-256-CBC (BouncyCastle) | 8KB | 36,057.01 ns | 342.150 ns | 320.048 ns | 1024 B |
| Encrypt · AES-256-CBC (OS) | 8KB | 5,918.35 ns | 33.719 ns | 31.541 ns | 128 B |
| Encrypt · AES-256-CBC (AES-NI) | 8KB | 10,595.50 ns | 51.341 ns | 48.025 ns | - |
| Encrypt · AES-256-CBC (Managed) | 8KB | 31,308.13 ns | 304.560 ns | 284.886 ns | - |
| Encrypt · AES-256-CBC (BouncyCastle) | 8KB | 36,366.53 ns | 284.769 ns | 252.440 ns | 1024 B |
| Decrypt · AES-256-CBC (AES-NI) | 128KB | 11,227.40 ns | 47.328 ns | 41.955 ns | - |
| Decrypt · AES-256-CBC (OS) | 128KB | 11,252.46 ns | 30.748 ns | 28.762 ns | 128 B |
| Decrypt · AES-256-CBC (Managed) | 128KB | 496,567.17 ns | 2,924.653 ns | 2,735.722 ns | - |
| Decrypt · AES-256-CBC (BouncyCastle) | 128KB | 570,601.91 ns | 3,691.083 ns | 3,272.049 ns | 1024 B |
| Encrypt · AES-256-CBC (OS) | 128KB | 91,298.06 ns | 181.081 ns | 160.524 ns | 128 B |
| Encrypt · AES-256-CBC (AES-NI) | 128KB | 169,352.33 ns | 232.770 ns | 217.733 ns | - |
| Encrypt · AES-256-CBC (Managed) | 128KB | 498,926.23 ns | 2,985.918 ns | 2,793.030 ns | - |
| Encrypt · AES-256-CBC (BouncyCastle) | 128KB | 574,323.46 ns | 2,775.502 ns | 2,596.207 ns | 1024 B |
AEAD Ciphers (Authenticated Encryption)
Authenticated Encryption with Associated Data (AEAD) ciphers provide both confidentiality and authenticity in a single operation. All CryptoHives AEAD implementations are zero-allocation.
AES-128-GCM
AES-GCM combines counter-mode AES encryption (GCTR) with GHASH polynomial authentication over GF(2¹²⁸). Four acceleration tiers are available:
- AES-NI+PClMulV256 (.NET 10+): The fastest path on CPUs with VPCLMULQDQ (Ice Lake and later). Uses 256-bit carry-less multiply instructions (`VPCLMULQDQ`) to process two GHASH blocks per CLMUL instruction. Counter blocks are generated in batches of 8 using `Vector256<uint>` increments. The stitched loop interleaves 8 blocks of AES encryption with lagged GHASH of the previous 8 ciphertext blocks: AES rounds execute on port 0 while CLMUL operations execute on port 5, achieving near-full utilization of both execution units. Modular reduction uses a 2-CLMUL SymCrypt-style `MODREDUCE` (constant `0xc200000000000000` compensates for reflected bit order). Pre-computed H¹–H⁸ Karatsuba cross-term halves eliminate redundant XORs in the aggregated multiply. Only engages for payloads >128 B (>8 blocks); smaller payloads use the non-stitched path to avoid method call overhead.
- AES-NI+PClMul (.NET 8+): Uses 128-bit `PCLMULQDQ` for GHASH with the same 8-block stitched architecture. Falls back to this path when VPCLMULQDQ is unavailable (Haswell through Cannon Lake). Within 1.67× of OS at 128 KiB.
- AES-NI (.NET 8+): Hardware AES round instructions for GCTR without CLMUL-accelerated GHASH. Uses the managed 4-bit Shoup table for authentication. 14–16× faster than fully managed at 128 KiB.
- Managed: Scalar T-table AES with 4-bit Shoup table GHASH (16-entry reduction table, byte-by-byte multiplication). Fully portable, zero-allocation.
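The role of the pre-computed H¹–H⁸ powers can be illustrated with a bit-level reference model. The sketch below is a slow, portable Python model of GHASH using the multiplication defined in NIST SP 800-38D; it is not the library's CLMUL code, and the function names are illustrative. It shows that folding 8 blocks at once with H⁸…H¹ gives the same result as the serial block-by-block update.

```python
import random

R = 0xE1 << 120  # reduction constant in the bit order used by the GCM specification


def gf_mul(x, y):
    """Multiply two GF(2^128) elements (128-bit ints, GCM bit order)."""
    z, v = 0, y
    for i in range(127, -1, -1):  # walk x from its most significant bit
        if (x >> i) & 1:
            z ^= v
        v = (v >> 1) ^ R if v & 1 else v >> 1
    return z


def ghash_serial(h, blocks, x=0):
    """Horner form: one dependent multiply per block."""
    for b in blocks:
        x = gf_mul(x ^ b, h)
    return x


def ghash_aggregated8(h, blocks, x=0):
    """Fold up to 8 blocks per step using pre-computed powers H^1..H^8."""
    powers = [h]  # powers[k] holds H^(k+1)
    for _ in range(7):
        powers.append(gf_mul(powers[-1], h))
    for i in range(0, len(blocks), 8):
        group = blocks[i:i + 8]
        n = len(group)
        acc = 0
        for j, b in enumerate(group):
            term = (x ^ b) if j == 0 else b  # running state joins the first block
            acc ^= gf_mul(term, powers[n - 1 - j])
        x = acc
    return x


rng = random.Random(0)
h = rng.getrandbits(128)
data = [rng.getrandbits(128) for _ in range(20)]  # 2 full groups plus a 4-block tail
assert ghash_aggregated8(h, data) == ghash_serial(h, data)
```

In the aggregated form the eight multiplies within a group are independent of each other, so they can be issued back to back and reduced once, instead of waiting on a chain of eight dependent multiply-and-reduce steps.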
Key observations:
- AES-NI+PClMulV256: ~2× faster than OS at 128 B encrypt; within 1.24× of OS at 128 KiB decrypt
- AES-NI+PClMul: ~2× faster than OS at 128 B encrypt; within 1.67× of OS at 128 KiB
- AES-NI: 14–16× faster than Managed at 128 KiB, zero allocation
- Managed: Uses 4-bit Shoup table GHASH, T-table AES
- BouncyCastle: Uses AES-NI + PCLMULQDQ internally on .NET Core 3.0+; allocates ~1.6 KB per call
| Description | TestDataSize | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|---|
| Decrypt · AES-128-GCM (AES-NI+PClMulV256) | 17B | 102.95 ns | 0.331 ns | 0.293 ns | - |
| Decrypt · AES-128-GCM (AES-NI+PClMul) | 17B | 104.40 ns | 0.371 ns | 0.347 ns | - |
| Decrypt · AES-128-GCM (OS) | 17B | 114.99 ns | 0.637 ns | 0.565 ns | - |
| Decrypt · AES-128-GCM (Managed) | 17B | 340.88 ns | 1.797 ns | 1.681 ns | - |
| Decrypt · AES-128-GCM (BouncyCastle) | 17B | 559.75 ns | 3.006 ns | 2.812 ns | 1624 B |
| Encrypt · AES-128-GCM (AES-NI+PClMul) | 17B | 62.08 ns | 0.189 ns | 0.167 ns | - |
| Encrypt · AES-128-GCM (AES-NI+PClMulV256) | 17B | 62.62 ns | 0.138 ns | 0.123 ns | - |
| Encrypt · AES-128-GCM (OS) | 17B | 122.74 ns | 0.571 ns | 0.534 ns | - |
| Encrypt · AES-128-GCM (Managed) | 17B | 313.01 ns | 1.462 ns | 1.368 ns | - |
| Encrypt · AES-128-GCM (BouncyCastle) | 17B | 499.26 ns | 3.590 ns | 2.998 ns | 1608 B |
| Decrypt · AES-128-GCM (AES-NI+PClMulV256) | 65B | 101.25 ns | 0.527 ns | 0.493 ns | - |
| Decrypt · AES-128-GCM (AES-NI+PClMul) | 65B | 103.23 ns | 0.438 ns | 0.410 ns | - |
| Decrypt · AES-128-GCM (OS) | 65B | 117.66 ns | 0.763 ns | 0.714 ns | - |
| Decrypt · AES-128-GCM (Managed) | 65B | 581.40 ns | 3.339 ns | 3.123 ns | - |
| Decrypt · AES-128-GCM (BouncyCastle) | 65B | 739.54 ns | 2.530 ns | 2.367 ns | 1624 B |
| Encrypt · AES-128-GCM (AES-NI+PClMul) | 65B | 68.42 ns | 0.196 ns | 0.183 ns | - |
| Encrypt · AES-128-GCM (AES-NI+PClMulV256) | 65B | 68.77 ns | 0.340 ns | 0.301 ns | - |
| Encrypt · AES-128-GCM (OS) | 65B | 122.76 ns | 0.407 ns | 0.340 ns | - |
| Encrypt · AES-128-GCM (Managed) | 65B | 548.80 ns | 3.678 ns | 3.441 ns | - |
| Encrypt · AES-128-GCM (BouncyCastle) | 65B | 648.90 ns | 6.516 ns | 5.777 ns | 1608 B |
| Decrypt · AES-128-GCM (AES-NI+PClMulV256) | 128B | 95.70 ns | 0.407 ns | 0.380 ns | - |
| Decrypt · AES-128-GCM (AES-NI+PClMul) | 128B | 98.45 ns | 0.321 ns | 0.300 ns | - |
| Decrypt · AES-128-GCM (OS) | 128B | 116.39 ns | 0.314 ns | 0.262 ns | - |
| Decrypt · AES-128-GCM (Managed) | 128B | 818.32 ns | 3.375 ns | 2.991 ns | - |
| Decrypt · AES-128-GCM (BouncyCastle) | 128B | 876.64 ns | 2.989 ns | 2.796 ns | 1624 B |
| Encrypt · AES-128-GCM (AES-NI+PClMulV256) | 128B | 57.24 ns | 0.106 ns | 0.094 ns | - |
| Encrypt · AES-128-GCM (AES-NI+PClMul) | 128B | 59.48 ns | 0.540 ns | 0.505 ns | - |
| Encrypt · AES-128-GCM (OS) | 128B | 120.30 ns | 0.801 ns | 0.749 ns | - |
| Encrypt · AES-128-GCM (Managed) | 128B | 789.27 ns | 3.525 ns | 3.297 ns | - |
| Encrypt · AES-128-GCM (BouncyCastle) | 128B | 792.65 ns | 5.102 ns | 4.523 ns | 1608 B |
| Decrypt · AES-128-GCM (AES-NI+PClMulV256) | 152B | 120.05 ns | 0.240 ns | 0.188 ns | - |
| Decrypt · AES-128-GCM (AES-NI+PClMul) | 152B | 123.28 ns | 0.365 ns | 0.305 ns | - |
| Decrypt · AES-128-GCM (OS) | 152B | 132.80 ns | 0.332 ns | 0.277 ns | - |
| Decrypt · AES-128-GCM (BouncyCastle) | 152B | 999.24 ns | 4.913 ns | 4.595 ns | 1624 B |
| Decrypt · AES-128-GCM (Managed) | 152B | 1,002.47 ns | 10.254 ns | 9.592 ns | - |
| Encrypt · AES-128-GCM (AES-NI+PClMulV256) | 152B | 82.86 ns | 0.210 ns | 0.196 ns | - |
| Encrypt · AES-128-GCM (AES-NI+PClMul) | 152B | 84.51 ns | 0.161 ns | 0.143 ns | - |
| Encrypt · AES-128-GCM (OS) | 152B | 131.56 ns | 0.614 ns | 0.545 ns | - |
| Encrypt · AES-128-GCM (BouncyCastle) | 152B | 903.64 ns | 5.494 ns | 5.139 ns | 1608 B |
| Encrypt · AES-128-GCM (Managed) | 152B | 953.86 ns | 4.207 ns | 3.729 ns | - |
| Decrypt · AES-128-GCM (AES-NI+PClMulV256) | 256B | 111.85 ns | 0.334 ns | 0.312 ns | - |
| Decrypt · AES-128-GCM (OS) | 256B | 122.60 ns | 0.568 ns | 0.504 ns | - |
| Decrypt · AES-128-GCM (AES-NI+PClMul) | 256B | 124.00 ns | 0.866 ns | 0.810 ns | - |
| Decrypt · AES-128-GCM (BouncyCastle) | 256B | 1,274.52 ns | 5.696 ns | 5.049 ns | 1624 B |
| Decrypt · AES-128-GCM (Managed) | 256B | 1,462.78 ns | 9.769 ns | 9.138 ns | - |
| Encrypt · AES-128-GCM (AES-NI+PClMulV256) | 256B | 72.90 ns | 0.432 ns | 0.404 ns | - |
| Encrypt · AES-128-GCM (AES-NI+PClMul) | 256B | 77.77 ns | 0.266 ns | 0.248 ns | - |
| Encrypt · AES-128-GCM (OS) | 256B | 120.86 ns | 0.584 ns | 0.546 ns | - |
| Encrypt · AES-128-GCM (BouncyCastle) | 256B | 1,187.06 ns | 12.003 ns | 11.228 ns | 1608 B |
| Encrypt · AES-128-GCM (Managed) | 256B | 1,430.67 ns | 7.493 ns | 7.009 ns | - |
| Decrypt · AES-128-GCM (OS) | 1KB | 176.06 ns | 0.851 ns | 0.796 ns | - |
| Decrypt · AES-128-GCM (AES-NI+PClMulV256) | 1KB | 202.11 ns | 0.745 ns | 0.660 ns | - |
| Decrypt · AES-128-GCM (AES-NI+PClMul) | 1KB | 250.02 ns | 1.481 ns | 1.385 ns | - |
| Decrypt · AES-128-GCM (BouncyCastle) | 1KB | 3,646.17 ns | 17.730 ns | 16.585 ns | 1624 B |
| Decrypt · AES-128-GCM (Managed) | 1KB | 5,321.67 ns | 26.885 ns | 25.148 ns | - |
| Encrypt · AES-128-GCM (AES-NI+PClMulV256) | 1KB | 170.82 ns | 1.061 ns | 0.992 ns | - |
| Encrypt · AES-128-GCM (OS) | 1KB | 171.72 ns | 0.928 ns | 0.868 ns | - |
| Encrypt · AES-128-GCM (AES-NI+PClMul) | 1KB | 194.79 ns | 0.744 ns | 0.660 ns | - |
| Encrypt · AES-128-GCM (BouncyCastle) | 1KB | 3,533.12 ns | 7.379 ns | 6.162 ns | 1608 B |
| Encrypt · AES-128-GCM (Managed) | 1KB | 5,275.04 ns | 32.511 ns | 30.411 ns | - |
| Decrypt · AES-128-GCM (OS) | 8KB | 749.33 ns | 2.528 ns | 2.241 ns | - |
| Decrypt · AES-128-GCM (AES-NI+PClMulV256) | 8KB | 1,024.88 ns | 4.731 ns | 4.194 ns | - |
| Decrypt · AES-128-GCM (AES-NI+PClMul) | 8KB | 1,365.80 ns | 5.493 ns | 4.587 ns | - |
| Decrypt · AES-128-GCM (BouncyCastle) | 8KB | 25,495.69 ns | 130.849 ns | 122.396 ns | 1624 B |
| Decrypt · AES-128-GCM (Managed) | 8KB | 41,842.83 ns | 244.320 ns | 228.537 ns | - |
| Encrypt · AES-128-GCM (OS) | 8KB | 665.39 ns | 4.516 ns | 4.224 ns | - |
| Encrypt · AES-128-GCM (AES-NI+PClMulV256) | 8KB | 1,083.19 ns | 5.775 ns | 4.822 ns | - |
| Encrypt · AES-128-GCM (AES-NI+PClMul) | 8KB | 1,296.43 ns | 3.035 ns | 2.691 ns | - |
| Encrypt · AES-128-GCM (BouncyCastle) | 8KB | 25,328.18 ns | 142.092 ns | 125.961 ns | 1608 B |
| Encrypt · AES-128-GCM (Managed) | 8KB | 40,995.91 ns | 131.972 ns | 116.990 ns | - |
| Decrypt · AES-128-GCM (OS) | 128KB | 10,769.09 ns | 70.491 ns | 65.937 ns | - |
| Decrypt · AES-128-GCM (AES-NI+PClMulV256) | 128KB | 16,651.25 ns | 325.878 ns | 362.213 ns | - |
| Decrypt · AES-128-GCM (AES-NI+PClMul) | 128KB | 20,730.36 ns | 114.943 ns | 107.517 ns | - |
| Decrypt · AES-128-GCM (BouncyCastle) | 128KB | 401,251.41 ns | 1,495.452 ns | 1,325.680 ns | 1624 B |
| Decrypt · AES-128-GCM (Managed) | 128KB | 656,117.23 ns | 3,867.558 ns | 3,229.586 ns | - |
| Encrypt · AES-128-GCM (OS) | 128KB | 9,868.98 ns | 64.336 ns | 60.180 ns | - |
| Encrypt · AES-128-GCM (AES-NI+PClMulV256) | 128KB | 16,782.50 ns | 103.115 ns | 96.453 ns | - |
| Encrypt · AES-128-GCM (AES-NI+PClMul) | 128KB | 20,316.93 ns | 99.132 ns | 92.728 ns | - |
| Encrypt · AES-128-GCM (BouncyCastle) | 128KB | 401,508.72 ns | 2,259.139 ns | 2,002.668 ns | 1608 B |
| Encrypt · AES-128-GCM (Managed) | 128KB | 659,908.63 ns | 4,853.744 ns | 4,540.195 ns | - |
AES-192-GCM
AES-192-GCM uses 12 rounds (vs 10 for AES-128), adding ~15-20% overhead. The same stitched pipeline and SIMD dispatch tiers apply. Performance characteristics fall between AES-128-GCM and AES-256-GCM.
| Description | TestDataSize | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|---|
| Decrypt · AES-192-GCM (AES-NI+PClMulV256) | 17B | 105.91 ns | 0.257 ns | 0.228 ns | - |
| Decrypt · AES-192-GCM (AES-NI+PClMul) | 17B | 106.22 ns | 0.426 ns | 0.398 ns | - |
| Decrypt · AES-192-GCM (OS) | 17B | 118.06 ns | 0.637 ns | 0.596 ns | - |
| Decrypt · AES-192-GCM (Managed) | 17B | 360.41 ns | 0.989 ns | 0.772 ns | - |
| Decrypt · AES-192-GCM (BouncyCastle) | 17B | 602.39 ns | 4.475 ns | 4.186 ns | 1728 B |
| Encrypt · AES-192-GCM (AES-NI+PClMulV256) | 17B | 66.04 ns | 0.185 ns | 0.173 ns | - |
| Encrypt · AES-192-GCM (AES-NI+PClMul) | 17B | 66.27 ns | 0.247 ns | 0.231 ns | - |
| Encrypt · AES-192-GCM (OS) | 17B | 125.46 ns | 0.659 ns | 0.616 ns | - |
| Encrypt · AES-192-GCM (Managed) | 17B | 326.77 ns | 1.672 ns | 1.564 ns | - |
| Encrypt · AES-192-GCM (BouncyCastle) | 17B | 550.55 ns | 2.682 ns | 2.508 ns | 1712 B |
| Decrypt · AES-192-GCM (AES-NI+PClMulV256) | 65B | 103.80 ns | 0.357 ns | 0.334 ns | - |
| Decrypt · AES-192-GCM (AES-NI+PClMul) | 65B | 106.50 ns | 0.568 ns | 0.504 ns | - |
| Decrypt · AES-192-GCM (OS) | 65B | 123.77 ns | 0.418 ns | 0.391 ns | - |
| Decrypt · AES-192-GCM (Managed) | 65B | 625.27 ns | 2.828 ns | 2.646 ns | - |
| Decrypt · AES-192-GCM (BouncyCastle) | 65B | 809.36 ns | 3.047 ns | 2.701 ns | 1728 B |
| Encrypt · AES-192-GCM (AES-NI+PClMul) | 65B | 73.00 ns | 0.427 ns | 0.399 ns | - |
| Encrypt · AES-192-GCM (AES-NI+PClMulV256) | 65B | 73.43 ns | 0.380 ns | 0.355 ns | - |
| Encrypt · AES-192-GCM (OS) | 65B | 127.68 ns | 0.408 ns | 0.381 ns | - |
| Encrypt · AES-192-GCM (Managed) | 65B | 588.70 ns | 2.439 ns | 2.162 ns | - |
| Encrypt · AES-192-GCM (BouncyCastle) | 65B | 716.15 ns | 2.949 ns | 2.614 ns | 1712 B |
| Decrypt · AES-192-GCM (AES-NI+PClMulV256) | 128B | 105.56 ns | 0.655 ns | 0.581 ns | - |
| Decrypt · AES-192-GCM (AES-NI+PClMul) | 128B | 111.23 ns | 0.779 ns | 0.729 ns | - |
| Decrypt · AES-192-GCM (OS) | 128B | 119.17 ns | 0.718 ns | 0.671 ns | - |
| Decrypt · AES-192-GCM (Managed) | 128B | 883.02 ns | 2.748 ns | 2.571 ns | - |
| Decrypt · AES-192-GCM (BouncyCastle) | 128B | 987.49 ns | 5.565 ns | 5.205 ns | 1728 B |
| Encrypt · AES-192-GCM (AES-NI+PClMulV256) | 128B | 59.96 ns | 0.221 ns | 0.206 ns | - |
| Encrypt · AES-192-GCM (AES-NI+PClMul) | 128B | 62.26 ns | 0.214 ns | 0.189 ns | - |
| Encrypt · AES-192-GCM (OS) | 128B | 122.73 ns | 0.545 ns | 0.510 ns | - |
| Encrypt · AES-192-GCM (Managed) | 128B | 851.09 ns | 5.556 ns | 5.197 ns | - |
| Encrypt · AES-192-GCM (BouncyCastle) | 128B | 880.26 ns | 4.912 ns | 4.594 ns | 1712 B |
| Decrypt · AES-192-GCM (AES-NI+PClMulV256) | 152B | 124.49 ns | 0.607 ns | 0.538 ns | - |
| Decrypt · AES-192-GCM (AES-NI+PClMul) | 152B | 126.86 ns | 0.361 ns | 0.302 ns | - |
| Decrypt · AES-192-GCM (OS) | 152B | 145.64 ns | 0.639 ns | 0.598 ns | - |
| Decrypt · AES-192-GCM (Managed) | 152B | 1,052.51 ns | 3.535 ns | 2.952 ns | - |
| Decrypt · AES-192-GCM (BouncyCastle) | 152B | 1,104.38 ns | 7.629 ns | 7.136 ns | 1728 B |
| Encrypt · AES-192-GCM (AES-NI+PClMulV256) | 152B | 87.23 ns | 0.318 ns | 0.282 ns | - |
| Encrypt · AES-192-GCM (AES-NI+PClMul) | 152B | 88.65 ns | 0.452 ns | 0.401 ns | - |
| Encrypt · AES-192-GCM (OS) | 152B | 137.30 ns | 0.695 ns | 0.651 ns | - |
| Encrypt · AES-192-GCM (BouncyCastle) | 152B | 1,008.18 ns | 6.388 ns | 5.976 ns | 1712 B |
| Encrypt · AES-192-GCM (Managed) | 152B | 1,021.50 ns | 5.500 ns | 4.593 ns | - |
| Decrypt · AES-192-GCM (AES-NI+PClMulV256) | 256B | 114.10 ns | 0.436 ns | 0.408 ns | - |
| Decrypt · AES-192-GCM (AES-NI+PClMul) | 256B | 127.45 ns | 0.725 ns | 0.643 ns | - |
| Decrypt · AES-192-GCM (OS) | 256B | 128.91 ns | 0.607 ns | 0.538 ns | - |
| Decrypt · AES-192-GCM (BouncyCastle) | 256B | 1,438.80 ns | 6.776 ns | 6.339 ns | 1728 B |
| Decrypt · AES-192-GCM (Managed) | 256B | 1,572.05 ns | 8.249 ns | 6.888 ns | - |
| Encrypt · AES-192-GCM (AES-NI+PClMulV256) | 256B | 77.67 ns | 0.191 ns | 0.160 ns | - |
| Encrypt · AES-192-GCM (AES-NI+PClMul) | 256B | 83.38 ns | 0.305 ns | 0.271 ns | - |
| Encrypt · AES-192-GCM (OS) | 256B | 124.39 ns | 0.484 ns | 0.429 ns | - |
| Encrypt · AES-192-GCM (BouncyCastle) | 256B | 1,330.59 ns | 6.732 ns | 5.968 ns | 1712 B |
| Encrypt · AES-192-GCM (Managed) | 256B | 1,541.04 ns | 8.856 ns | 8.284 ns | - |
| Decrypt · AES-192-GCM (OS) | 1KB | 186.56 ns | 0.633 ns | 0.592 ns | - |
| Decrypt · AES-192-GCM (AES-NI+PClMulV256) | 1KB | 207.95 ns | 1.180 ns | 1.046 ns | - |
| Decrypt · AES-192-GCM (AES-NI+PClMul) | 1KB | 254.63 ns | 1.238 ns | 1.098 ns | - |
| Decrypt · AES-192-GCM (BouncyCastle) | 1KB | 4,145.35 ns | 19.536 ns | 17.318 ns | 1728 B |
| Decrypt · AES-192-GCM (Managed) | 1KB | 5,758.13 ns | 43.629 ns | 38.676 ns | - |
| Encrypt · AES-192-GCM (OS) | 1KB | 174.77 ns | 0.923 ns | 0.819 ns | - |
| Encrypt · AES-192-GCM (AES-NI+PClMulV256) | 1KB | 185.61 ns | 0.532 ns | 0.498 ns | - |
| Encrypt · AES-192-GCM (AES-NI+PClMul) | 1KB | 205.37 ns | 0.722 ns | 0.640 ns | - |
| Encrypt · AES-192-GCM (BouncyCastle) | 1KB | 4,103.11 ns | 37.661 ns | 35.228 ns | 1712 B |
| Encrypt · AES-192-GCM (Managed) | 1KB | 5,705.90 ns | 25.707 ns | 22.788 ns | - |
| Decrypt · AES-192-GCM (OS) | 8KB | 780.13 ns | 4.673 ns | 4.371 ns | - |
| Decrypt · AES-192-GCM (AES-NI+PClMulV256) | 8KB | 1,065.81 ns | 5.804 ns | 4.846 ns | - |
| Decrypt · AES-192-GCM (AES-NI+PClMul) | 8KB | 1,465.01 ns | 6.443 ns | 5.381 ns | - |
| Decrypt · AES-192-GCM (BouncyCastle) | 8KB | 29,478.84 ns | 88.623 ns | 82.898 ns | 1728 B |
| Decrypt · AES-192-GCM (Managed) | 8KB | 44,667.82 ns | 282.174 ns | 263.946 ns | - |
| Encrypt · AES-192-GCM (OS) | 8KB | 691.27 ns | 2.886 ns | 2.558 ns | - |
| Encrypt · AES-192-GCM (AES-NI+PClMulV256) | 8KB | 1,206.25 ns | 5.555 ns | 4.925 ns | - |
| Encrypt · AES-192-GCM (AES-NI+PClMul) | 8KB | 1,360.40 ns | 5.647 ns | 5.006 ns | - |
| Encrypt · AES-192-GCM (BouncyCastle) | 8KB | 29,302.83 ns | 87.959 ns | 82.277 ns | 1712 B |
| Encrypt · AES-192-GCM (Managed) | 8KB | 44,741.58 ns | 298.890 ns | 279.582 ns | - |
| Decrypt · AES-192-GCM (OS) | 128KB | 11,566.06 ns | 45.235 ns | 42.313 ns | - |
| Decrypt · AES-192-GCM (AES-NI+PClMulV256) | 128KB | 16,649.51 ns | 314.432 ns | 322.899 ns | - |
| Decrypt · AES-192-GCM (AES-NI+PClMul) | 128KB | 22,003.14 ns | 62.833 ns | 55.700 ns | - |
| Decrypt · AES-192-GCM (BouncyCastle) | 128KB | 464,310.77 ns | 1,443.731 ns | 1,279.830 ns | 1728 B |
| Decrypt · AES-192-GCM (Managed) | 128KB | 709,900.30 ns | 2,517.967 ns | 2,355.308 ns | - |
| Encrypt · AES-192-GCM (OS) | 128KB | 10,228.13 ns | 48.590 ns | 45.451 ns | - |
| Encrypt · AES-192-GCM (AES-NI+PClMulV256) | 128KB | 18,451.56 ns | 82.729 ns | 77.384 ns | - |
| Encrypt · AES-192-GCM (AES-NI+PClMul) | 128KB | 21,223.44 ns | 154.737 ns | 144.741 ns | - |
| Encrypt · AES-192-GCM (BouncyCastle) | 128KB | 465,860.25 ns | 2,875.950 ns | 2,690.165 ns | 1712 B |
| Encrypt · AES-192-GCM (Managed) | 128KB | 725,079.20 ns | 2,145.340 ns | 1,901.788 ns | - |
AES-256-GCM
AES-256-GCM uses 14 rounds (vs 10 for AES-128), adding ~20–30% overhead per block. The same 4-tier acceleration architecture (V256 → PClMul → AES-NI → Managed) applies. The V256 stitched decrypt path reaches parity with OS at 1 KiB (0.99×) and is ~1.23× slower at 128 KiB. Encrypt is ~1.9× faster than OS at 128 B due to the lagged GHASH pipeline and zero P/Invoke overhead, but trails OS by ~1.9× at 128 KiB. The remaining gap at large sizes is primarily due to OS/SymCrypt using VAES (256-bit AES round instructions), which CryptoHives does not yet use.
| Description | TestDataSize | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|---|
| Decrypt · AES-256-GCM (AES-NI+PClMulV256) | 17B | 108.72 ns | 0.249 ns | 0.233 ns | - |
| Decrypt · AES-256-GCM (AES-NI+PClMul) | 17B | 109.00 ns | 0.546 ns | 0.456 ns | - |
| Decrypt · AES-256-GCM (OS) | 17B | 122.81 ns | 0.686 ns | 0.642 ns | - |
| Decrypt · AES-256-GCM (Managed) | 17B | 382.74 ns | 1.482 ns | 1.237 ns | - |
| Decrypt · AES-256-GCM (BouncyCastle) | 17B | 644.88 ns | 3.685 ns | 3.447 ns | 1832 B |
| Encrypt · AES-256-GCM (AES-NI+PClMul) | 17B | 68.81 ns | 0.122 ns | 0.108 ns | - |
| Encrypt · AES-256-GCM (AES-NI+PClMulV256) | 17B | 69.39 ns | 0.136 ns | 0.121 ns | - |
| Encrypt · AES-256-GCM (OS) | 17B | 130.51 ns | 0.584 ns | 0.546 ns | - |
| Encrypt · AES-256-GCM (Managed) | 17B | 345.12 ns | 1.129 ns | 1.056 ns | - |
| Encrypt · AES-256-GCM (BouncyCastle) | 17B | 600.85 ns | 3.506 ns | 3.280 ns | 1816 B |
| Decrypt · AES-256-GCM (AES-NI+PClMul) | 65B | 109.08 ns | 0.549 ns | 0.513 ns | - |
| Decrypt · AES-256-GCM (AES-NI+PClMulV256) | 65B | 109.20 ns | 0.213 ns | 0.189 ns | - |
| Decrypt · AES-256-GCM (OS) | 65B | 126.78 ns | 0.585 ns | 0.547 ns | - |
| Decrypt · AES-256-GCM (Managed) | 65B | 662.58 ns | 4.395 ns | 4.112 ns | - |
| Decrypt · AES-256-GCM (BouncyCastle) | 65B | 884.33 ns | 3.945 ns | 3.690 ns | 1832 B |
| Encrypt · AES-256-GCM (AES-NI+PClMulV256) | 65B | 77.05 ns | 0.135 ns | 0.105 ns | - |
| Encrypt · AES-256-GCM (AES-NI+PClMul) | 65B | 77.37 ns | 0.222 ns | 0.197 ns | - |
| Encrypt · AES-256-GCM (OS) | 65B | 131.87 ns | 0.591 ns | 0.553 ns | - |
| Encrypt · AES-256-GCM (Managed) | 65B | 626.16 ns | 3.679 ns | 3.441 ns | - |
| Encrypt · AES-256-GCM (BouncyCastle) | 65B | 795.94 ns | 3.402 ns | 3.182 ns | 1816 B |
| Decrypt · AES-256-GCM (AES-NI+PClMulV256) | 128B | 107.19 ns | 0.540 ns | 0.479 ns | - |
| Decrypt · AES-256-GCM (AES-NI+PClMul) | 128B | 108.44 ns | 0.309 ns | 0.274 ns | - |
| Decrypt · AES-256-GCM (OS) | 128B | 121.92 ns | 0.490 ns | 0.434 ns | - |
| Decrypt · AES-256-GCM (Managed) | 128B | 948.20 ns | 5.615 ns | 5.252 ns | - |
| Decrypt · AES-256-GCM (BouncyCastle) | 128B | 1,056.85 ns | 6.403 ns | 5.676 ns | 1832 B |
| Encrypt · AES-256-GCM (AES-NI+PClMulV256) | 128B | 63.54 ns | 0.213 ns | 0.189 ns | - |
| Encrypt · AES-256-GCM (AES-NI+PClMul) | 128B | 71.46 ns | 0.224 ns | 0.209 ns | - |
| Encrypt · AES-256-GCM (OS) | 128B | 123.50 ns | 0.612 ns | 0.573 ns | - |
| Encrypt · AES-256-GCM (Managed) | 128B | 904.82 ns | 3.732 ns | 3.308 ns | - |
| Encrypt · AES-256-GCM (BouncyCastle) | 128B | 1,012.70 ns | 4.668 ns | 4.366 ns | 1816 B |
| Decrypt · AES-256-GCM (AES-NI+PClMul) | 152B | 131.08 ns | 0.858 ns | 0.802 ns | - |
| Decrypt · AES-256-GCM (AES-NI+PClMulV256) | 152B | 131.99 ns | 0.787 ns | 0.698 ns | - |
| Decrypt · AES-256-GCM (OS) | 152B | 142.00 ns | 0.315 ns | 0.280 ns | - |
| Decrypt · AES-256-GCM (Managed) | 152B | 1,127.07 ns | 6.519 ns | 6.098 ns | - |
| Decrypt · AES-256-GCM (BouncyCastle) | 152B | 1,214.17 ns | 9.240 ns | 8.191 ns | 1832 B |
| Encrypt · AES-256-GCM (AES-NI+PClMulV256) | 152B | 92.68 ns | 0.178 ns | 0.158 ns | - |
| Encrypt · AES-256-GCM (AES-NI+PClMul) | 152B | 94.24 ns | 0.395 ns | 0.370 ns | - |
| Encrypt · AES-256-GCM (OS) | 152B | 148.24 ns | 0.851 ns | 0.796 ns | - |
| Encrypt · AES-256-GCM (Managed) | 152B | 1,096.46 ns | 5.063 ns | 4.228 ns | - |
| Encrypt · AES-256-GCM (BouncyCastle) | 152B | 1,122.65 ns | 6.721 ns | 6.287 ns | 1816 B |
| Decrypt · AES-256-GCM (AES-NI+PClMulV256) | 256B | 122.88 ns | 1.284 ns | 1.201 ns | - |
| Decrypt · AES-256-GCM (AES-NI+PClMul) | 256B | 131.46 ns | 0.889 ns | 0.832 ns | - |
| Decrypt · AES-256-GCM (OS) | 256B | 134.13 ns | 0.575 ns | 0.509 ns | - |
| Decrypt · AES-256-GCM (BouncyCastle) | 256B | 1,590.66 ns | 6.639 ns | 5.885 ns | 1832 B |
| Decrypt · AES-256-GCM (Managed) | 256B | 1,698.82 ns | 11.399 ns | 10.663 ns | - |
| Encrypt · AES-256-GCM (AES-NI+PClMulV256) | 256B | 83.18 ns | 0.284 ns | 0.237 ns | - |
| Encrypt · AES-256-GCM (AES-NI+PClMul) | 256B | 88.55 ns | 0.305 ns | 0.286 ns | - |
| Encrypt · AES-256-GCM (OS) | 256B | 127.40 ns | 0.420 ns | 0.372 ns | - |
| Encrypt · AES-256-GCM (BouncyCastle) | 256B | 1,481.70 ns | 5.179 ns | 4.591 ns | 1816 B |
| Encrypt · AES-256-GCM (Managed) | 256B | 1,650.46 ns | 7.357 ns | 6.144 ns | - |
| Decrypt · AES-256-GCM (AES-NI+PClMulV256) | 1KB | 218.63 ns | 0.611 ns | 0.541 ns | - |
| Decrypt · AES-256-GCM (OS) | 1KB | 220.69 ns | 0.655 ns | 0.613 ns | - |
| Decrypt · AES-256-GCM (AES-NI+PClMul) | 1KB | 267.59 ns | 0.883 ns | 0.783 ns | - |
| Decrypt · AES-256-GCM (BouncyCastle) | 1KB | 4,687.06 ns | 27.773 ns | 25.979 ns | 1832 B |
| Decrypt · AES-256-GCM (Managed) | 1KB | 6,157.86 ns | 22.056 ns | 20.631 ns | - |
| Encrypt · AES-256-GCM (OS) | 1KB | 183.41 ns | 0.504 ns | 0.472 ns | - |
| Encrypt · AES-256-GCM (AES-NI+PClMulV256) | 1KB | 201.37 ns | 0.787 ns | 0.736 ns | - |
| Encrypt · AES-256-GCM (AES-NI+PClMul) | 1KB | 223.81 ns | 0.669 ns | 0.593 ns | - |
| Encrypt · AES-256-GCM (BouncyCastle) | 1KB | 4,614.02 ns | 32.285 ns | 28.619 ns | 1816 B |
| Encrypt · AES-256-GCM (Managed) | 1KB | 6,118.87 ns | 27.491 ns | 25.715 ns | - |
| Decrypt · AES-256-GCM (OS) | 8KB | 936.73 ns | 3.258 ns | 3.047 ns | - |
| Decrypt · AES-256-GCM (AES-NI+PClMulV256) | 8KB | 1,168.22 ns | 5.238 ns | 4.644 ns | - |
| Decrypt · AES-256-GCM (AES-NI+PClMul) | 8KB | 1,554.83 ns | 8.923 ns | 8.347 ns | - |
| Decrypt · AES-256-GCM (BouncyCastle) | 8KB | 33,577.14 ns | 190.100 ns | 168.518 ns | 1832 B |
| Decrypt · AES-256-GCM (Managed) | 8KB | 47,817.30 ns | 235.168 ns | 208.470 ns | - |
| Encrypt · AES-256-GCM (OS) | 8KB | 729.36 ns | 5.497 ns | 5.142 ns | - |
| Encrypt · AES-256-GCM (AES-NI+PClMulV256) | 8KB | 1,305.50 ns | 5.511 ns | 4.602 ns | - |
| Encrypt · AES-256-GCM (AES-NI+PClMul) | 8KB | 1,477.81 ns | 4.976 ns | 4.654 ns | - |
| Encrypt · AES-256-GCM (BouncyCastle) | 8KB | 33,284.19 ns | 149.876 ns | 132.861 ns | 1816 B |
| Encrypt · AES-256-GCM (Managed) | 8KB | 47,789.42 ns | 95.041 ns | 84.251 ns | - |
| Decrypt · AES-256-GCM (OS) | 128KB | 14,265.22 ns | 47.633 ns | 44.556 ns | - |
| Decrypt · AES-256-GCM (AES-NI+PClMulV256) | 128KB | 17,535.12 ns | 220.158 ns | 205.936 ns | - |
| Decrypt · AES-256-GCM (AES-NI+PClMul) | 128KB | 23,406.55 ns | 129.596 ns | 114.883 ns | - |
| Decrypt · AES-256-GCM (BouncyCastle) | 128KB | 530,424.36 ns | 4,877.317 ns | 4,562.246 ns | 1832 B |
| Decrypt · AES-256-GCM (Managed) | 128KB | 765,189.91 ns | 3,844.945 ns | 3,408.444 ns | - |
| Encrypt · AES-256-GCM (OS) | 128KB | 10,592.03 ns | 68.171 ns | 63.767 ns | - |
| Encrypt · AES-256-GCM (AES-NI+PClMulV256) | 128KB | 20,264.57 ns | 71.643 ns | 59.825 ns | - |
| Encrypt · AES-256-GCM (AES-NI+PClMul) | 128KB | 23,047.55 ns | 195.404 ns | 182.781 ns | - |
| Encrypt · AES-256-GCM (BouncyCastle) | 128KB | 526,690.43 ns | 2,718.741 ns | 2,543.112 ns | 1816 B |
| Encrypt · AES-256-GCM (Managed) | 128KB | 764,229.17 ns | 2,364.723 ns | 2,211.963 ns | - |
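The speed ratios quoted for AES-256-GCM can be recomputed from the Mean column. The sketch below is illustrative Python, not part of the benchmark suite; the helper name `ratio` is hypothetical, and the input pairs are copied from the table above.

```python
def ratio(candidate_ns: float, baseline_ns: float) -> float:
    """Return how many times longer the candidate takes than the baseline."""
    return candidate_ns / baseline_ns


# Mean values in nanoseconds, copied from the AES-256-GCM table.
decrypt_1k = ratio(218.63, 220.69)          # V256 vs OS, decrypt, 1KB
decrypt_128k = ratio(17_535.12, 14_265.22)  # V256 vs OS, decrypt, 128KB
encrypt_128b = ratio(123.50, 63.54)         # OS vs V256, encrypt, 128B
encrypt_128k = ratio(20_264.57, 10_592.03)  # V256 vs OS, encrypt, 128KB

for label, value in [
    ("decrypt 1KB, V256/OS", decrypt_1k),
    ("decrypt 128KB, V256/OS", decrypt_128k),
    ("encrypt 128B, OS/V256", encrypt_128b),
    ("encrypt 128KB, V256/OS", encrypt_128k),
]:
    print(f"{label}: {value:.2f}x")
```

A ratio below 1.0 means the first-named implementation is the faster one for that row.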
AES-128-CCM
AES-CCM (Counter with CBC-MAC) combines CTR mode encryption with CBC-MAC authentication. Unlike GCM, CCM requires two sequential passes (encrypt + MAC or MAC + decrypt), making it inherently less parallelizable. It is widely used in IoT protocols (Bluetooth LE, ZigBee, Thread) and supports variable nonce (7–13 bytes) and tag sizes (4–16 bytes). Two acceleration tiers are available:
- AES-NI: Hardware AESENC instructions for all block operations: counter-mode encryption, CBC-MAC computation, and AAD processing. Uses Vector128<byte> round keys via MemoryMarshal.Cast from the shared uint[] key schedule. Dispatched via the _useAesNi bool flag.
- Managed: T-table AES for all block operations. Fully portable, zero-allocation.
Key observations:
- AES-NI: ~2.8× faster than Managed at 128 KiB, zero allocation
- Managed: T-table AES, takes ~17–22% less time than BouncyCastle at 1 KiB and above (~38% less at 128 B)
- BouncyCastle: Allocates 2,424 B (decrypt) or 2,464 B (encrypt) per call
- No OS adapter available for comparison (System.Security.Cryptography does not expose AES-CCM on all platforms)
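The nonce and tag limits of CCM follow from the layout of its first CBC-MAC block (B0) in RFC 3610 / NIST SP 800-38C: one flags byte, the nonce, and a length field that occupies whatever bytes the nonce leaves free. The sketch below is illustrative Python, not CryptoHives code; the function names are hypothetical.

```python
def ccm_length_field_size(nonce_len: int) -> int:
    """Return the number of bytes available to encode the payload length."""
    if not 7 <= nonce_len <= 13:
        raise ValueError("CCM nonce must be 7 to 13 bytes")
    return 15 - nonce_len


def ccm_b0(nonce: bytes, payload_len: int, tag_len: int, has_aad: bool) -> bytes:
    """Build the 16-byte B0 block: flags || nonce || payload length."""
    if tag_len not in (4, 6, 8, 10, 12, 14, 16):
        raise ValueError("CCM tag must be an even length from 4 to 16 bytes")
    q = ccm_length_field_size(len(nonce))
    if not 0 <= payload_len < (1 << (8 * q)):
        raise ValueError("payload too long for this nonce size")
    # Flags: bit 6 = AAD present, bits 5..3 = (tag_len - 2) / 2, bits 2..0 = q - 1.
    flags = (0x40 if has_aad else 0) | (((tag_len - 2) // 2) << 3) | (q - 1)
    return bytes([flags]) + nonce + payload_len.to_bytes(q, "big")


# 13-byte nonce, 23-byte payload, 8-byte tag, with associated data.
b0 = ccm_b0(bytes.fromhex("00000003020100a0a1a2a3a4a5"), 23, 8, True)
print(b0.hex())
```

A 13-byte nonce leaves a 2-byte length field, which caps a single message at 65,535 bytes. A 128 KiB payload therefore needs a nonce of 12 bytes or fewer.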
| Description | TestDataSize | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|---|
| Decrypt · AES-128-CCM (AES-NI) | 128B | 400.5 ns | 0.98 ns | 0.92 ns | - |
| Decrypt · AES-128-CCM (Managed) | 128B | 992.3 ns | 6.26 ns | 5.55 ns | - |
| Decrypt · AES-128-CCM (BouncyCastle) | 128B | 1,598.8 ns | 13.94 ns | 13.04 ns | 2424 B |
| Encrypt · AES-128-CCM (AES-NI) | 128B | 345.1 ns | 0.78 ns | 0.73 ns | - |
| Encrypt · AES-128-CCM (Managed) | 128B | 957.3 ns | 6.80 ns | 6.36 ns | - |
| Encrypt · AES-128-CCM (BouncyCastle) | 128B | 1,575.8 ns | 10.40 ns | 9.73 ns | 2464 B |
| Decrypt · AES-128-CCM (AES-NI) | 1KB | 2,287.3 ns | 3.87 ns | 3.43 ns | - |
| Decrypt · AES-128-CCM (Managed) | 1KB | 6,625.7 ns | 32.52 ns | 30.42 ns | - |
| Decrypt · AES-128-CCM (BouncyCastle) | 1KB | 8,013.6 ns | 15.61 ns | 13.03 ns | 2424 B |
| Encrypt · AES-128-CCM (AES-NI) | 1KB | 2,242.2 ns | 4.89 ns | 4.34 ns | - |
| Encrypt · AES-128-CCM (Managed) | 1KB | 6,225.2 ns | 21.61 ns | 18.05 ns | - |
| Encrypt · AES-128-CCM (BouncyCastle) | 1KB | 7,962.1 ns | 36.31 ns | 28.35 ns | 2464 B |
| Decrypt · AES-128-CCM (AES-NI) | 8KB | 17,431.8 ns | 48.56 ns | 45.43 ns | - |
| Decrypt · AES-128-CCM (Managed) | 8KB | 48,366.1 ns | 246.63 ns | 218.63 ns | - |
| Decrypt · AES-128-CCM (BouncyCastle) | 8KB | 59,152.2 ns | 279.28 ns | 247.57 ns | 2424 B |
| Encrypt · AES-128-CCM (AES-NI) | 8KB | 17,368.0 ns | 47.69 ns | 44.61 ns | - |
| Encrypt · AES-128-CCM (Managed) | 8KB | 48,453.4 ns | 158.32 ns | 140.34 ns | - |
| Encrypt · AES-128-CCM (BouncyCastle) | 8KB | 59,074.0 ns | 180.39 ns | 168.74 ns | 2464 B |
| Decrypt · AES-128-CCM (AES-NI) | 128KB | 276,834.7 ns | 477.47 ns | 446.62 ns | - |
| Decrypt · AES-128-CCM (Managed) | 128KB | 772,132.2 ns | 3,646.06 ns | 3,410.53 ns | - |
| Decrypt · AES-128-CCM (BouncyCastle) | 128KB | 932,033.8 ns | 2,603.81 ns | 2,308.21 ns | 2424 B |
| Encrypt · AES-128-CCM (AES-NI) | 128KB | 276,979.9 ns | 358.57 ns | 299.42 ns | - |
| Encrypt · AES-128-CCM (Managed) | 128KB | 772,513.1 ns | 3,687.35 ns | 3,268.74 ns | - |
| Encrypt · AES-128-CCM (BouncyCastle) | 128KB | 936,123.7 ns | 4,669.35 ns | 4,367.71 ns | 2464 B |
AES-256-CCM
AES-256-CCM uses 14 rounds (vs 10 for AES-128). The same AES-NI / Managed dispatch applies. At 8 KiB and above, the additional rounds add ~20% overhead on the AES-NI path and ~26–28% on the Managed and BouncyCastle paths.
| Description | TestDataSize | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|---|
| Decrypt · AES-256-CCM (AES-NI) | 128B | 443.7 ns | 0.71 ns | 0.59 ns | - |
| Decrypt · AES-256-CCM (Managed) | 128B | 1,252.7 ns | 4.40 ns | 4.12 ns | - |
| Decrypt · AES-256-CCM (BouncyCastle) | 128B | 1,954.9 ns | 9.34 ns | 8.28 ns | 2808 B |
| Encrypt · AES-256-CCM (AES-NI) | 128B | 409.6 ns | 0.69 ns | 0.65 ns | - |
| Encrypt · AES-256-CCM (Managed) | 128B | 1,213.6 ns | 3.42 ns | 2.86 ns | - |
| Encrypt · AES-256-CCM (BouncyCastle) | 128B | 1,916.6 ns | 14.40 ns | 13.47 ns | 2848 B |
| Decrypt · AES-256-CCM (AES-NI) | 1KB | 2,709.6 ns | 6.36 ns | 5.95 ns | - |
| Decrypt · AES-256-CCM (Managed) | 1KB | 7,991.9 ns | 34.96 ns | 32.71 ns | - |
| Decrypt · AES-256-CCM (BouncyCastle) | 1KB | 10,110.7 ns | 58.75 ns | 54.95 ns | 2808 B |
| Encrypt · AES-256-CCM (AES-NI) | 1KB | 2,674.9 ns | 5.33 ns | 4.73 ns | - |
| Encrypt · AES-256-CCM (Managed) | 1KB | 7,949.5 ns | 17.26 ns | 14.42 ns | - |
| Encrypt · AES-256-CCM (BouncyCastle) | 1KB | 10,052.9 ns | 48.15 ns | 42.68 ns | 2848 B |
| Decrypt · AES-256-CCM (AES-NI) | 8KB | 20,836.2 ns | 23.93 ns | 19.98 ns | - |
| Decrypt · AES-256-CCM (Managed) | 8KB | 61,844.9 ns | 436.10 ns | 386.59 ns | - |
| Decrypt · AES-256-CCM (BouncyCastle) | 8KB | 74,781.8 ns | 352.01 ns | 329.27 ns | 2808 B |
| Encrypt · AES-256-CCM (AES-NI) | 8KB | 20,798.7 ns | 34.71 ns | 32.47 ns | - |
| Encrypt · AES-256-CCM (Managed) | 8KB | 61,876.2 ns | 303.40 ns | 253.35 ns | - |
| Encrypt · AES-256-CCM (BouncyCastle) | 8KB | 74,650.1 ns | 338.46 ns | 316.60 ns | 2848 B |
| Decrypt · AES-256-CCM (AES-NI) | 128KB | 331,843.4 ns | 819.39 ns | 766.46 ns | - |
| Decrypt · AES-256-CCM (Managed) | 128KB | 981,880.8 ns | 3,010.77 ns | 2,816.27 ns | - |
| Decrypt · AES-256-CCM (BouncyCastle) | 128KB | 1,183,622.0 ns | 5,562.39 ns | 5,203.06 ns | 2808 B |
| Encrypt · AES-256-CCM (AES-NI) | 128KB | 331,552.7 ns | 600.17 ns | 501.17 ns | - |
| Encrypt · AES-256-CCM (Managed) | 128KB | 985,507.2 ns | 5,331.94 ns | 4,726.63 ns | - |
| Encrypt · AES-256-CCM (BouncyCastle) | 128KB | 1,181,070.8 ns | 4,758.17 ns | 3,973.29 ns | 2848 B |
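The round counts come from FIPS 197, where the number of rounds is the key length in 32-bit words plus 6. The sketch below (illustrative Python; `aes_rounds` and `overhead_percent` are hypothetical helper names) compares the increase in round count with the overhead measured for 128KB decrypt in the two CCM tables.

```python
def aes_rounds(key_bits: int) -> int:
    """Number of AES rounds for a given key size (FIPS 197: Nr = Nk + 6)."""
    if key_bits not in (128, 192, 256):
        raise ValueError("AES keys are 128, 192, or 256 bits")
    return key_bits // 32 + 6


def overhead_percent(slow: float, fast: float) -> float:
    """Extra cost of `slow` relative to `fast`, in percent."""
    return (slow / fast - 1.0) * 100.0


# Increase in round count when moving from AES-128 to AES-256.
round_increase = overhead_percent(aes_rounds(256), aes_rounds(128))

# Mean decrypt times (ns) at 128KB, copied from the AES-128-CCM and AES-256-CCM tables.
aes_ni = overhead_percent(331_843.4, 276_834.7)
managed = overhead_percent(981_880.8, 772_132.2)

print(f"rounds: +{round_increase:.0f}%")
print(f"AES-NI: +{aes_ni:.0f}%")
print(f"Managed: +{managed:.0f}%")
```

The measured overhead is smaller than the increase in round count, which is consistent with part of each call being spent outside the AES rounds.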
ChaCha20-Poly1305
ChaCha20-Poly1305 is a software-friendly AEAD cipher (RFC 8439) that combines ChaCha20 stream encryption with Poly1305 MAC authentication. It is the recommended AEAD cipher when hardware AES acceleration is unavailable. Three acceleration tiers are available:
- AVX2: Dual-block ChaCha20 encryption via Vector256<uint>, combined with Poly1305 donna-64 MAC (3×44-bit limbs, 9 multiplications per 16-byte block using Math.BigMul). ~1.7× faster than OS at 128 KiB.
- SSSE3: Single-block ChaCha20 via Vector128<uint> with the same Poly1305 donna-64 MAC. ~1.14× faster than OS at 128 KiB (~12% less time).
- Managed: Scalar ChaCha20 + Poly1305 donna-32 (5×26-bit limbs, 25 multiplications per block on .NET Framework / .NET Standard). Fully portable.
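The multiplication counts for the two Poly1305 variants follow from schoolbook multiplication: with n limbs per operand, multiplying the accumulator by r takes n × n limb products, so 3 limbs give 9 and 5 limbs give 25. The sketch below checks this in Python against big-integer arithmetic modulo 2¹³⁰ - 5. It is an illustration under simplifying assumptions, not the library's code: it uses uniform limb widths (donna-64 implementations typically use 44/44/42-bit limbs), Python integers instead of fixed-width words, and hypothetical function names.

```python
P = (1 << 130) - 5  # Poly1305 prime


def to_limbs(value: int, limb_bits: int, limb_count: int) -> list:
    """Split value into limb_count little-endian limbs of limb_bits each."""
    mask = (1 << limb_bits) - 1
    return [(value >> (limb_bits * i)) & mask for i in range(limb_count)]


def limb_mul_mod_p(h: int, r: int, limb_bits: int, limb_count: int) -> tuple:
    """Return (h * r mod P, number of limb-by-limb multiplications)."""
    # 2^(limb_bits * limb_count) reduced mod P: 5 for 5x26 bits, 20 for 3x44 bits.
    wrap = (1 << (limb_bits * limb_count)) % P
    h_limbs = to_limbs(h, limb_bits, limb_count)
    r_limbs = to_limbs(r, limb_bits, limb_count)
    columns = [0] * limb_count
    multiplications = 0
    for i, h_limb in enumerate(h_limbs):
        for j, r_limb in enumerate(r_limbs):
            product = h_limb * r_limb
            multiplications += 1
            if i + j >= limb_count:
                # High columns fold back into low ones. Real implementations
                # precompute r_limb * wrap once per key, so this scaling is
                # not counted as a per-block multiplication.
                columns[i + j - limb_count] += product * wrap
            else:
                columns[i + j] += product
    total = sum(col << (limb_bits * k) for k, col in enumerate(columns))
    return total % P, multiplications


H = (1 << 129) + 0x0123456789ABCDEF0123456789ABCDEF  # sample accumulator below P
R = 0x0806D5400E52447C036D555408BED685              # sample clamped r value

result64, muls64 = limb_mul_mod_p(H, R, 44, 3)
result32, muls32 = limb_mul_mod_p(H, R, 26, 5)
print(muls64, muls32, result64 == result32 == (H * R) % P)
```

Both layouts produce the same product; the 64-bit layout simply needs fewer, wider multiplications, which is where a 64×64→128-bit multiply such as Math.BigMul fits in.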
Key observations:
- AVX2 takes ~40% less time than OS at 128 KiB (~1.7× faster); SSSE3 takes ~12% less time
- At 128 B, OS is the fastest implementation (~343 ns vs 432–484 ns for AVX2) due to lower per-call overhead
- Managed, SSSE3, and AVX2 paths are zero-allocation
- BouncyCastle allocates 336 B (encrypt) or 416 B (decrypt) per call; NaCl.Core allocates 48 B at 128 B and 72 B at 1 KiB and above
| Description | TestDataSize | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|---|
| Decrypt · ChaCha20-Poly1305 (OS) | 128B | 342.9 ns | 2.01 ns | 1.88 ns | - |
| Decrypt · ChaCha20-Poly1305 (AVX2) | 128B | 484.4 ns | 2.05 ns | 1.92 ns | - |
| Decrypt · ChaCha20-Poly1305 (SSSE3) | 128B | 537.9 ns | 1.59 ns | 1.48 ns | - |
| Decrypt · ChaCha20-Poly1305 (NaCl.Core) | 128B | 580.0 ns | 2.76 ns | 2.58 ns | 48 B |
| Decrypt · ChaCha20-Poly1305 (BouncyCastle) | 128B | 706.4 ns | 2.50 ns | 2.34 ns | 416 B |
| Decrypt · ChaCha20-Poly1305 (Managed) | 128B | 857.5 ns | 1.64 ns | 1.54 ns | - |
| Encrypt · ChaCha20-Poly1305 (OS) | 128B | 345.3 ns | 1.94 ns | 1.81 ns | - |
| Encrypt · ChaCha20-Poly1305 (BouncyCastle) | 128B | 414.3 ns | 1.39 ns | 1.30 ns | 336 B |
| Encrypt · ChaCha20-Poly1305 (AVX2) | 128B | 432.2 ns | 1.11 ns | 0.98 ns | - |
| Encrypt · ChaCha20-Poly1305 (SSSE3) | 128B | 492.3 ns | 0.86 ns | 0.80 ns | - |
| Encrypt · ChaCha20-Poly1305 (NaCl.Core) | 128B | 534.7 ns | 1.08 ns | 0.96 ns | 48 B |
| Encrypt · ChaCha20-Poly1305 (Managed) | 128B | 815.6 ns | 2.51 ns | 2.35 ns | - |
| Decrypt · ChaCha20-Poly1305 (AVX2) | 1KB | 1,321.4 ns | 4.15 ns | 3.88 ns | - |
| Decrypt · ChaCha20-Poly1305 (BouncyCastle) | 1KB | 1,707.0 ns | 5.67 ns | 5.31 ns | 416 B |
| Decrypt · ChaCha20-Poly1305 (OS) | 1KB | 1,761.9 ns | 5.32 ns | 4.98 ns | - |
| Decrypt · ChaCha20-Poly1305 (SSSE3) | 1KB | 1,786.6 ns | 4.79 ns | 4.49 ns | - |
| Decrypt · ChaCha20-Poly1305 (NaCl.Core) | 1KB | 2,558.2 ns | 12.78 ns | 11.33 ns | 72 B |
| Decrypt · ChaCha20-Poly1305 (Managed) | 1KB | 4,361.5 ns | 17.12 ns | 16.01 ns | - |
| Encrypt · ChaCha20-Poly1305 (AVX2) | 1KB | 1,277.0 ns | 2.91 ns | 2.72 ns | - |
| Encrypt · ChaCha20-Poly1305 (BouncyCastle) | 1KB | 1,407.6 ns | 6.31 ns | 5.90 ns | 336 B |
| Encrypt · ChaCha20-Poly1305 (SSSE3) | 1KB | 1,740.6 ns | 3.75 ns | 3.51 ns | - |
| Encrypt · ChaCha20-Poly1305 (OS) | 1KB | 1,757.5 ns | 4.80 ns | 4.01 ns | - |
| Encrypt · ChaCha20-Poly1305 (NaCl.Core) | 1KB | 2,515.5 ns | 4.03 ns | 3.77 ns | 72 B |
| Encrypt · ChaCha20-Poly1305 (Managed) | 1KB | 4,304.4 ns | 13.47 ns | 12.60 ns | - |
| Decrypt · ChaCha20-Poly1305 (AVX2) | 8KB | 8,040.8 ns | 26.56 ns | 24.84 ns | - |
| Decrypt · ChaCha20-Poly1305 (BouncyCastle) | 8KB | 9,469.8 ns | 45.91 ns | 42.94 ns | 416 B |
| Decrypt · ChaCha20-Poly1305 (SSSE3) | 8KB | 11,700.6 ns | 21.87 ns | 19.39 ns | - |
| Decrypt · ChaCha20-Poly1305 (OS) | 8KB | 13,100.2 ns | 33.91 ns | 30.06 ns | - |
| Decrypt · ChaCha20-Poly1305 (NaCl.Core) | 8KB | 18,339.1 ns | 67.88 ns | 60.17 ns | 72 B |
| Decrypt · ChaCha20-Poly1305 (Managed) | 8KB | 32,291.3 ns | 167.18 ns | 156.38 ns | - |
| Encrypt · ChaCha20-Poly1305 (AVX2) | 8KB | 8,004.1 ns | 16.82 ns | 15.73 ns | - |
| Encrypt · ChaCha20-Poly1305 (BouncyCastle) | 8KB | 9,355.4 ns | 48.10 ns | 42.64 ns | 336 B |
| Encrypt · ChaCha20-Poly1305 (SSSE3) | 8KB | 11,657.5 ns | 13.83 ns | 11.55 ns | - |
| Encrypt · ChaCha20-Poly1305 (OS) | 8KB | 13,077.6 ns | 31.24 ns | 29.22 ns | - |
| Encrypt · ChaCha20-Poly1305 (NaCl.Core) | 8KB | 18,221.4 ns | 36.85 ns | 32.67 ns | 72 B |
| Encrypt · ChaCha20-Poly1305 (Managed) | 8KB | 32,279.0 ns | 84.84 ns | 75.21 ns | - |
| Decrypt · ChaCha20-Poly1305 (AVX2) | 128KB | 123,266.9 ns | 559.46 ns | 523.32 ns | - |
| Decrypt · ChaCha20-Poly1305 (BouncyCastle) | 128KB | 146,214.9 ns | 541.17 ns | 506.21 ns | 416 B |
| Decrypt · ChaCha20-Poly1305 (SSSE3) | 128KB | 181,971.8 ns | 365.01 ns | 341.43 ns | - |
| Decrypt · ChaCha20-Poly1305 (OS) | 128KB | 206,938.3 ns | 453.29 ns | 424.01 ns | - |
| Decrypt · ChaCha20-Poly1305 (NaCl.Core) | 128KB | 294,267.0 ns | 766.28 ns | 679.29 ns | 72 B |
| Decrypt · ChaCha20-Poly1305 (Managed) | 128KB | 511,743.0 ns | 2,026.45 ns | 1,895.54 ns | - |
| Encrypt · ChaCha20-Poly1305 (AVX2) | 128KB | 123,349.9 ns | 476.94 ns | 446.13 ns | - |
| Encrypt · ChaCha20-Poly1305 (BouncyCastle) | 128KB | 148,023.3 ns | 388.90 ns | 344.75 ns | 336 B |
| Encrypt · ChaCha20-Poly1305 (SSSE3) | 128KB | 181,722.8 ns | 382.18 ns | 319.14 ns | - |
| Encrypt · ChaCha20-Poly1305 (OS) | 128KB | 206,895.1 ns | 280.81 ns | 234.49 ns | - |
| Encrypt · ChaCha20-Poly1305 (NaCl.Core) | 128KB | 287,722.4 ns | 1,433.12 ns | 1,340.54 ns | 72 B |
| Encrypt · ChaCha20-Poly1305 (Managed) | 128KB | 511,277.0 ns | 1,459.14 ns | 1,364.88 ns | - |
XChaCha20-Poly1305
XChaCha20-Poly1305 extends ChaCha20-Poly1305 with a 24-byte nonce (vs 12 bytes), which makes randomly generated nonces safe against collisions: the birthday bound rises from about 2⁴⁸ messages for a 96-bit nonce to about 2⁹⁶ for a 192-bit nonce. The implementation prepends an HChaCha20 key derivation step that derives a subkey from the key and the first 16 bytes of the nonce; the remaining 8 bytes, prefixed with four zero bytes, form the 12-byte nonce of the inner operation. The same AVX2 / SSSE3 / Managed acceleration tiers apply to the inner ChaCha20-Poly1305 operation.
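The subkey derivation can be sketched in a few lines. The Python below follows the HChaCha20 definition from the XChaCha IETF draft (draft-irtf-cfrg-xchacha): 20 ChaCha rounds over the constants, the key, and a 16-byte nonce prefix, returning the first and last state rows without the final feed-forward addition. It is a readable reference sketch, not the CryptoHives implementation, and the function names are hypothetical.

```python
import struct

MASK32 = 0xFFFFFFFF
SIGMA = (0x61707865, 0x3320646E, 0x79622D32, 0x6B206574)  # "expand 32-byte k"


def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) & MASK32) | (x >> (32 - n))


def quarter_round(s: list, a: int, b: int, c: int, d: int) -> None:
    """Apply the RFC 8439 quarter round in place to four state words."""
    s[a] = (s[a] + s[b]) & MASK32
    s[d] = rotl32(s[d] ^ s[a], 16)
    s[c] = (s[c] + s[d]) & MASK32
    s[b] = rotl32(s[b] ^ s[c], 12)
    s[a] = (s[a] + s[b]) & MASK32
    s[d] = rotl32(s[d] ^ s[a], 8)
    s[c] = (s[c] + s[d]) & MASK32
    s[b] = rotl32(s[b] ^ s[c], 7)


def hchacha20(key: bytes, nonce16: bytes) -> bytes:
    """Derive a 32-byte subkey from a 32-byte key and a 16-byte nonce prefix."""
    if len(key) != 32 or len(nonce16) != 16:
        raise ValueError("HChaCha20 needs a 32-byte key and a 16-byte nonce")
    s = list(SIGMA) + list(struct.unpack("<8I", key)) + list(struct.unpack("<4I", nonce16))
    for _ in range(10):  # 10 double rounds = 20 rounds
        quarter_round(s, 0, 4, 8, 12)   # column rounds
        quarter_round(s, 1, 5, 9, 13)
        quarter_round(s, 2, 6, 10, 14)
        quarter_round(s, 3, 7, 11, 15)
        quarter_round(s, 0, 5, 10, 15)  # diagonal rounds
        quarter_round(s, 1, 6, 11, 12)
        quarter_round(s, 2, 7, 8, 13)
        quarter_round(s, 3, 4, 9, 14)
    # No addition of the initial state: output is rows 0 and 3 only.
    return struct.pack("<8I", *(s[0:4] + s[12:16]))


def xchacha20_inner_inputs(key: bytes, nonce24: bytes) -> tuple:
    """Return (subkey, 12-byte nonce) for the inner ChaCha20-Poly1305 call."""
    if len(nonce24) != 24:
        raise ValueError("XChaCha20 needs a 24-byte nonce")
    subkey = hchacha20(key, nonce24[:16])
    inner_nonce = b"\x00" * 4 + nonce24[16:]
    return subkey, inner_nonce


subkey, inner_nonce = xchacha20_inner_inputs(bytes(range(32)), bytes(range(24)))
print(subkey.hex(), inner_nonce.hex())
```

Because this step processes a single 64-byte state regardless of message length, its cost is constant per call.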
Key observations:
- Performance nearly identical to ChaCha20-Poly1305 (HChaCha20 adds ~200 ns constant overhead)
- No OS or BouncyCastle implementations available for comparison
- NaCl.Core allocates 48–72 B per call
- Managed, SSSE3, and AVX2 paths are zero-allocation
| Description | TestDataSize | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|---|
| Decrypt · XChaCha20-Poly1305 (AVX2) | 128B | 703.5 ns | 1.32 ns | 1.23 ns | - |
| Decrypt · XChaCha20-Poly1305 (SSSE3) | 128B | 760.0 ns | 2.12 ns | 1.98 ns | - |
| Decrypt · XChaCha20-Poly1305 (NaCl.Core) | 128B | 906.8 ns | 2.78 ns | 2.60 ns | 48 B |
| Decrypt · XChaCha20-Poly1305 (Managed) | 128B | 1,075.9 ns | 4.32 ns | 3.61 ns | - |
| Encrypt · XChaCha20-Poly1305 (AVX2) | 128B | 657.8 ns | 1.28 ns | 1.14 ns | - |
| Encrypt · XChaCha20-Poly1305 (SSSE3) | 128B | 710.5 ns | 2.26 ns | 2.12 ns | - |
| Encrypt · XChaCha20-Poly1305 (NaCl.Core) | 128B | 873.6 ns | 1.55 ns | 1.45 ns | 48 B |
| Encrypt · XChaCha20-Poly1305 (Managed) | 128B | 1,033.9 ns | 2.27 ns | 2.01 ns | - |
| Decrypt · XChaCha20-Poly1305 (AVX2) | 1KB | 1,536.6 ns | 5.78 ns | 4.83 ns | - |
| Decrypt · XChaCha20-Poly1305 (SSSE3) | 1KB | 2,000.7 ns | 6.87 ns | 6.43 ns | - |
| Decrypt · XChaCha20-Poly1305 (NaCl.Core) | 1KB | 4,036.5 ns | 8.34 ns | 7.39 ns | 72 B |
| Decrypt · XChaCha20-Poly1305 (Managed) | 1KB | 4,558.1 ns | 5.82 ns | 4.86 ns | - |
| Encrypt · XChaCha20-Poly1305 (AVX2) | 1KB | 1,493.3 ns | 3.93 ns | 3.67 ns | - |
| Encrypt · XChaCha20-Poly1305 (SSSE3) | 1KB | 1,955.5 ns | 4.97 ns | 4.65 ns | - |
| Encrypt · XChaCha20-Poly1305 (NaCl.Core) | 1KB | 4,003.9 ns | 11.72 ns | 10.39 ns | 72 B |
| Encrypt · XChaCha20-Poly1305 (Managed) | 1KB | 4,516.6 ns | 10.57 ns | 8.83 ns | - |
| Decrypt · XChaCha20-Poly1305 (AVX2) | 8KB | 8,284.5 ns | 31.68 ns | 29.63 ns | - |
| Decrypt · XChaCha20-Poly1305 (SSSE3) | 8KB | 11,922.9 ns | 29.96 ns | 28.03 ns | - |
| Decrypt · XChaCha20-Poly1305 (NaCl.Core) | 8KB | 29,205.6 ns | 48.94 ns | 40.86 ns | 72 B |
| Decrypt · XChaCha20-Poly1305 (Managed) | 8KB | 32,367.0 ns | 90.79 ns | 75.81 ns | - |
| Encrypt · XChaCha20-Poly1305 (AVX2) | 8KB | 8,219.1 ns | 30.35 ns | 28.39 ns | - |
| Encrypt · XChaCha20-Poly1305 (SSSE3) | 8KB | 11,884.7 ns | 30.49 ns | 28.52 ns | - |
| Encrypt · XChaCha20-Poly1305 (NaCl.Core) | 8KB | 28,973.5 ns | 84.13 ns | 65.68 ns | 72 B |
| Encrypt · XChaCha20-Poly1305 (Managed) | 8KB | 32,474.5 ns | 91.77 ns | 81.36 ns | - |
| Decrypt · XChaCha20-Poly1305 (AVX2) | 128KB | 123,746.1 ns | 348.59 ns | 326.07 ns | - |
| Decrypt · XChaCha20-Poly1305 (SSSE3) | 128KB | 182,123.4 ns | 354.64 ns | 331.73 ns | - |
| Decrypt · XChaCha20-Poly1305 (NaCl.Core) | 128KB | 460,058.9 ns | 1,313.45 ns | 1,228.60 ns | 72 B |
| Decrypt · XChaCha20-Poly1305 (Managed) | 128KB | 510,279.3 ns | 1,301.11 ns | 1,217.06 ns | - |
| Encrypt · XChaCha20-Poly1305 (AVX2) | 128KB | 123,615.0 ns | 287.51 ns | 254.87 ns | - |
| Encrypt · XChaCha20-Poly1305 (SSSE3) | 128KB | 182,337.8 ns | 476.24 ns | 445.47 ns | - |
| Encrypt · XChaCha20-Poly1305 (NaCl.Core) | 128KB | 459,045.8 ns | 891.74 ns | 834.13 ns | 72 B |
| Encrypt · XChaCha20-Poly1305 (Managed) | 128KB | 509,389.8 ns | 2,092.12 ns | 1,956.97 ns | - |
Allocation Summary
All CryptoHives cipher implementations achieve zero heap allocation for both encrypt and decrypt operations across all payload sizes. This is critical for high-throughput scenarios such as network packet processing, where GC pressure directly impacts tail latency.
| Implementation | Allocation | Notes |
|---|---|---|
| CryptoHives (all variants) | 0 B | All tiers (Managed, AES-NI, PClMul, V256, SSSE3, AVX2) are zero-allocation at all payload sizes |
| OS (.NET): GCM / ChaCha20-Poly1305 | 0 B | OS AEAD implementations are zero-allocation |
| OS (.NET): CBC | 128 B | Fixed P/Invoke marshalling overhead per call, independent of payload size |
| BouncyCastle: CBC | 832–1,024 B | Fixed per-call allocation (832 B for AES-128, 1,024 B for AES-256) |
| BouncyCastle: GCM | 1,608–1,832 B | Fixed per-call allocation that grows with key size (1,608 B for AES-128, 1,712–1,728 B for AES-192, 1,816–1,832 B for AES-256) |
| BouncyCastle: CCM | 2,424–2,848 B | Fixed per-call allocation (2,424–2,464 B for AES-128, 2,808–2,848 B for AES-256; encrypt allocates 40 B more than decrypt) |
| BouncyCastle: ChaCha20-Poly1305 | 336–416 B | 336 B for encrypt, 416 B for decrypt; independent of payload size |
| BouncyCastle: ChaCha20 | 96 B | Fixed per-call allocation |
| NaCl.Core: ChaCha20 | 24 B | Small fixed allocation |
| NaCl.Core: ChaCha20-Poly1305 / XChaCha20-Poly1305 | 48–72 B | 48 B at 128 B, 72 B at 1 KiB and above |