macOS Arm64 Apple M4 Cipher Benchmarks
Machine Profile
Machine Specification
The benchmarks were run on the following machine:
BenchmarkDotNet v0.15.8, macOS Tahoe 26.4.1 (25E253) [Darwin 25.4.0]
Apple M4, 1 CPU, 10 logical and 10 physical cores
.NET SDK 10.0.201
[Host] : .NET 10.0.5 (10.0.5, 10.0.526.15411), Arm64 RyuJIT armv8.0-a
.NET 10.0 : .NET 10.0.5 (10.0.5, 10.0.526.15411), Arm64 RyuJIT armv8.0-a
Method=TryComputeHash Job=.NET 10.0 Runtime=.NET 10.0
Toolchain=net10.0
Note: Results are machine-specific and may vary between systems. Run benchmarks locally for your specific hardware.
BenchmarkDotNet measurements for all cipher algorithm implementations in CryptoHives.Foundation.Security.Cryptography. Each algorithm is benchmarked across representative payload sizes (17 bytes through 128 KiB) to capture both latency and throughput characteristics.
Implementation Variants
Each cipher family exposes multiple acceleration tiers. The runtime automatically selects the fastest tier supported by the host CPU via SimdSupport detection. Callers can also force a specific tier through the Create(SimdSupport) factory for testing or compatibility.
AES Family
| Variant | Instructions | .NET Target | When Selected | Description |
|---|---|---|---|---|
| Managed | Scalar | All | No ARM Crypto support | T-table AES using scalar uint arithmetic. Fully portable, zero-allocation. ~10–16× slower than ArmAes depending on mode and payload size. |
| ArmAes | AES (ARM Crypto Ext.) | .NET 8+ | ArmBase.IsSupported | Hardware AES round instructions (AESD, AESE, AESMC, AESIMC). For CBC, uses 8-block interleaved decrypt for maximum instruction-level parallelism: 8 ciphertext blocks are decrypted simultaneously via parallel AESD dispatch. For GCM/CCM, accelerates counter-mode encryption and CBC-MAC. Decrypt is ~8.7× faster than OS at 128 B; at bulk sizes Apple CommonCrypto leads via Apple Silicon–specific AES pipelining. |
| ArmAes+ArmPmull | AES + PMULL (ARM Crypto Ext.) | .NET 8+ | AdvSimd.Arm64.IsSupported | Adds carry-less polynomial multiplication (PMULL/PMULL2) for hardware-accelerated GHASH over GF(2¹²⁸). PMULL operates on 64-bit polynomial operands to produce 128-bit products; PMULL2 reads from the upper halves of 128-bit NEON registers (a free lane-select requiring no additional instruction). Uses the same 8-block stitched AES+GHASH pipeline as the x86 PClMul path. Modular reduction uses a 2-PMULL SymCrypt-style MODREDUCE. Pre-computes Karatsuba cross-term halves for H¹–H⁸ powers. Encrypt: ~120× faster than OS at 17 B; ~14× at 128 B (OS CommonCrypto incurs ~8 μs per-call overhead at these sizes). OS leads at ≥8 KiB due to Apple Silicon–specific bulk AES acceleration. |
ChaCha20 Family
| Variant | Instructions | .NET Target | When Selected | Description |
|---|---|---|---|---|
| Managed | Scalar | All | No NEON support | Quarter-round operations using scalar uint arithmetic. Fully portable. ~4× slower than Neon at all payload sizes. |
| Neon | AdvSIMD (NEON) | .NET 8+ | AdvSimd.IsSupported | Maps the 4×4 ChaCha state to four Vector128<uint> rows. Uses ARM NEON shift-left, shift-right, and byte-table permute instructions for the 16-bit, 12-bit, 8-bit, and 7-bit rotations. Diagonal rounds use AdvSimd.ExtractVector128 to rotate rows by one element. Processes one 64-byte keystream block per iteration. ~4× faster than Managed; faster than BouncyCastle at all sizes up to 1 KiB; OS leads from ~8 KiB. |
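The "fastest supported tier wins" rule from the two tables above can be sketched as an ordered lookup. This is only an illustration in Python: the feature strings (`"aes"`, `"pmull"`, `"neon"`) and the `select_tier` helper are hypothetical names invented for the example and are not part of the library, whose real detection goes through SimdSupport.

```python
# Hypothetical sketch: tiers are listed fastest-first, each with the CPU
# features it needs. The feature names are assumptions for illustration.
AES_TIERS = [
    ("ArmAes+ArmPmull", frozenset({"aes", "pmull"})),
    ("ArmAes", frozenset({"aes"})),
    ("Managed", frozenset()),  # portable fallback, no requirements
]
CHACHA_TIERS = [
    ("Neon", frozenset({"neon"})),
    ("Managed", frozenset()),
]


def select_tier(tiers, cpu_features):
    """Return the first (fastest) tier whose required features are all present."""
    for name, required in tiers:
        if required <= cpu_features:  # subset test
            return name
    raise ValueError("tier list must end with a tier that has no requirements")


m4_like = frozenset({"aes", "pmull", "neon"})
aes_choice = select_tier(AES_TIERS, m4_like)
chacha_choice = select_tier(CHACHA_TIERS, m4_like)
```

Forcing a slower tier for testing corresponds to passing a reduced feature set, for example `select_tier(AES_TIERS, frozenset())`.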
When to Use Each Variant
- Small messages (≤256 B): AES-GCM with ArmAes+ArmPmull eliminates the ~8 μs CommonCrypto per-call overhead entirely — ~120× faster than OS at 17 B encrypt and ~14× at 128 B. ChaCha20-Poly1305 NEON is ~5× faster than OS at 128 B.
- Medium messages (256 B–4 KB): ArmAes+ArmPmull leads through ~1 KiB. ChaCha20-Poly1305 NEON remains competitive at 1 KiB (~1.4× faster than OS). This range covers QUIC (~1.4 KB), WireGuard (~1.4 KB), and IPsec packets.
- Large messages (8 KB–128 KB): Apple CommonCrypto dominates. For AES-GCM, OS is ~2× faster at 8 KiB and ~5× faster at 128 KiB; for ChaCha20-Poly1305, OS is ~1.54× faster. The AES-GCM gap is likely due to Apple Silicon–specific AES/PMULL micro-architectural pipelining that .NET's current ARMv8 paths do not yet fully exploit. This range covers TLS records (1–16 KB) and OPC UA chunks (8 KB default).
- No hardware AES: Use ChaCha20-Poly1305 NEON — it outperforms Managed AES-GCM by 3–10× depending on payload size and is always zero-allocation.
- IoT / constrained devices: AES-CCM with ArmAes provides ~4× speedup over BouncyCastle at 128 KiB. Supports variable nonce (7–13 bytes) and tag sizes.
Highlights
| Family | Leader | Key Insight |
|---|---|---|
| ChaCha20 | Neon | NEON ~4× faster than Managed; faster than BouncyCastle at all measured sizes (128 B–128 KiB); zero allocation |
| ChaCha20-Poly1305 | Neon | ~5× faster than OS at 128 B; OS leads at ≥8 KiB; Neon on par with BouncyCastle at 128 KiB; zero allocation |
| XChaCha20-Poly1305 | Neon | ~3.3× faster than Managed at 128 KiB; zero allocation |
| AES-CBC | ArmAes | Decrypt ~8.7× faster than OS at 128 B; OS leads at ≥8 KiB (Apple Silicon bulk path); zero allocation |
| AES-GCM | ArmAes+ArmPmull | ~120× faster than OS encrypt at 17 B; ~14× at 128 B; OS leads at ≥8 KiB; 8-block stitched AES+GHASH pipeline |
| AES-CCM | ArmAes | ~4× faster than BouncyCastle at 128 KiB; zero allocation; no OS adapter available |
Stream Ciphers
ChaCha20
ChaCha20 is a stream cipher designed by Daniel J. Bernstein. Two acceleration tiers are available on ARM:
- Neon: Single-block processing. Maps the 4×4 ChaCha state matrix to four `Vector128<uint>` rows. Uses ARM NEON `vshl`/`vsri` (shift-and-insert) and `vtbl` (byte-table permute) instructions for the four rotation widths (16-bit, 12-bit, 8-bit, 7-bit). Diagonal rounds use `AdvSimd.ExtractVector128` to rotate rows by one element. Yields ~148 MB/s throughput at 128 KiB (131,072 B in ~888 μs); ~1.13× faster than BouncyCastle.
- Managed: Scalar `uint` quarter-round arithmetic. Fully portable across all .NET targets. ~4× slower than Neon at 128 KiB.
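For reference, the quarter-round that both tiers implement can be written as a minimal scalar sketch in Python, checked against the test vector in RFC 8439, section 2.1.1. It is an illustration rather than the library's code; `rotl32` and `quarter_round` are names chosen for this example.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits (shift left OR shift right)."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def quarter_round(a: int, b: int, c: int, d: int):
    """One ChaCha quarter-round: add, XOR, rotate by 16, 12, 8, then 7 bits."""
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 7)
    return a, b, c, d


# Input words from the RFC 8439 section 2.1.1 quarter-round test vector.
result = quarter_round(0x11111111, 0x01020304, 0x9B8D6F43, 0x01234567)
expected = (0xEA2A92F4, 0xCB1CF8CE, 0x4581472E, 0x5881C4BB)
```

The 16-bit and 8-bit rotations move whole bytes, which is why a byte-table permute can serve them, while the 12-bit and 7-bit rotations need the shift pair that `rotl32` spells out.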
Key observations:
- Neon is the fastest at all sizes; ~1.13× faster than BouncyCastle at 128 KiB; ~1.25× at 1 KiB
- BouncyCastle allocates 96 B per call; NaCl.Core allocates 24 B per call
- Managed and Neon paths are zero-allocation
| Description | TestDataSize | Mean | Error | StdDev | Median | Allocated |
|---|---|---|---|---|---|---|
| Decrypt · ChaCha20 (CryptoHives-Neon) | 128B | 885.6 ns | 0.98 ns | 0.91 ns | 885.4 ns | - |
| Decrypt · ChaCha20 (BouncyCastle) | 128B | 1,472.0 ns | 28.32 ns | 26.49 ns | 1,478.2 ns | 96 B |
| Decrypt · ChaCha20 (NaCl.Core) | 128B | 2,712.0 ns | 1.87 ns | 1.56 ns | 2,711.6 ns | 24 B |
| Decrypt · ChaCha20 (CryptoHives-Scalar) | 128B | 3,487.6 ns | 1.19 ns | 0.99 ns | 3,487.3 ns | - |
| Encrypt · ChaCha20 (CryptoHives-Neon) | 128B | 885.6 ns | 0.56 ns | 0.52 ns | 885.5 ns | - |
| Encrypt · ChaCha20 (BouncyCastle) | 128B | 1,465.2 ns | 20.58 ns | 17.19 ns | 1,467.5 ns | 96 B |
| Encrypt · ChaCha20 (NaCl.Core) | 128B | 2,713.6 ns | 3.25 ns | 3.04 ns | 2,712.2 ns | 24 B |
| Encrypt · ChaCha20 (CryptoHives-Scalar) | 128B | 3,479.6 ns | 1.46 ns | 1.29 ns | 3,479.4 ns | - |
| Decrypt · ChaCha20 (CryptoHives-Neon) | 1KB | 6,969.0 ns | 6.62 ns | 5.87 ns | 6,967.8 ns | - |
| Decrypt · ChaCha20 (BouncyCastle) | 1KB | 8,688.3 ns | 173.89 ns | 374.31 ns | 8,979.5 ns | 96 B |
| Decrypt · ChaCha20 (NaCl.Core) | 1KB | 15,278.2 ns | 1.60 ns | 1.25 ns | 15,277.8 ns | 24 B |
| Decrypt · ChaCha20 (CryptoHives-Scalar) | 1KB | 27,515.6 ns | 23.72 ns | 19.81 ns | 27,508.5 ns | - |
| Encrypt · ChaCha20 (CryptoHives-Neon) | 1KB | 6,971.7 ns | 5.91 ns | 5.24 ns | 6,972.1 ns | - |
| Encrypt · ChaCha20 (BouncyCastle) | 1KB | 8,747.7 ns | 174.74 ns | 352.98 ns | 8,948.1 ns | 96 B |
| Encrypt · ChaCha20 (NaCl.Core) | 1KB | 15,280.2 ns | 3.34 ns | 2.96 ns | 15,279.7 ns | 24 B |
| Encrypt · ChaCha20 (CryptoHives-Scalar) | 1KB | 27,505.9 ns | 11.81 ns | 10.47 ns | 27,502.4 ns | - |
| Decrypt · ChaCha20 (CryptoHives-Neon) | 8KB | 55,596.4 ns | 57.71 ns | 48.19 ns | 55,597.1 ns | - |
| Decrypt · ChaCha20 (BouncyCastle) | 8KB | 63,145.6 ns | 25.00 ns | 22.17 ns | 63,150.0 ns | 96 B |
| Decrypt · ChaCha20 (NaCl.Core) | 8KB | 116,123.6 ns | 68.27 ns | 57.01 ns | 116,113.7 ns | 24 B |
| Decrypt · ChaCha20 (CryptoHives-Scalar) | 8KB | 219,494.8 ns | 59.29 ns | 52.56 ns | 219,487.6 ns | - |
| Encrypt · ChaCha20 (CryptoHives-Neon) | 8KB | 55,590.6 ns | 33.51 ns | 27.99 ns | 55,579.6 ns | - |
| Encrypt · ChaCha20 (BouncyCastle) | 8KB | 64,638.2 ns | 1,286.61 ns | 1,885.90 ns | 63,799.6 ns | 96 B |
| Encrypt · ChaCha20 (NaCl.Core) | 8KB | 116,157.5 ns | 32.48 ns | 28.79 ns | 116,150.5 ns | 24 B |
| Encrypt · ChaCha20 (CryptoHives-Scalar) | 8KB | 219,433.8 ns | 97.01 ns | 81.01 ns | 219,461.3 ns | - |
| Decrypt · ChaCha20 (CryptoHives-Neon) | 128KB | 888,142.6 ns | 290.36 ns | 226.69 ns | 888,114.3 ns | - |
| Decrypt · ChaCha20 (BouncyCastle) | 128KB | 1,007,533.6 ns | 2,335.69 ns | 2,184.80 ns | 1,008,171.6 ns | 96 B |
| Decrypt · ChaCha20 (NaCl.Core) | 128KB | 1,836,344.0 ns | 880.80 ns | 735.51 ns | 1,836,603.8 ns | 24 B |
| Decrypt · ChaCha20 (CryptoHives-Scalar) | 128KB | 3,512,558.6 ns | 2,277.03 ns | 2,129.93 ns | 3,511,660.2 ns | - |
| Encrypt · ChaCha20 (CryptoHives-Neon) | 128KB | 888,519.1 ns | 795.00 ns | 743.64 ns | 888,202.7 ns | - |
| Encrypt · ChaCha20 (BouncyCastle) | 128KB | 1,007,309.4 ns | 1,247.26 ns | 1,166.69 ns | 1,007,598.7 ns | 96 B |
| Encrypt · ChaCha20 (NaCl.Core) | 128KB | 1,839,740.1 ns | 625.78 ns | 522.56 ns | 1,839,583.2 ns | 24 B |
| Encrypt · ChaCha20 (CryptoHives-Scalar) | 128KB | 3,511,311.3 ns | 1,720.62 ns | 1,525.28 ns | 3,510,710.0 ns | - |
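The Mean column converts to throughput with one division. The sketch below applies it to the 128 KiB decrypt rows above, assuming that "128KB" in the table means 131,072 bytes and reporting decimal megabytes per second; the helper name is chosen for this example.

```python
def throughput_mb_per_s(payload_bytes: int, mean_ns: float) -> float:
    """Decimal megabytes per second for one benchmark row."""
    bytes_per_second = payload_bytes / mean_ns * 1e9
    return bytes_per_second / 1e6


SIZE_128K = 128 * 1024  # assumption: "128KB" in the table is 131,072 bytes

# Mean values copied from the Decrypt rows at 128KB.
neon_128k = throughput_mb_per_s(SIZE_128K, 888_142.6)
bouncy_128k = throughput_mb_per_s(SIZE_128K, 1_007_533.6)
scalar_128k = throughput_mb_per_s(SIZE_128K, 3_512_558.6)

neon_vs_bouncy = neon_128k / bouncy_128k
neon_vs_scalar = neon_128k / scalar_128k
```

With these rows the Neon path works out to roughly 148 MB/s, about 1.13× BouncyCastle and a little under 4× the scalar path.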
Block Ciphers
AES-128-CBC
AES-CBC (Cipher Block Chaining) is the most widely deployed AES mode. Two acceleration tiers are available on Apple M4:
- ArmAes: Uses ARM Cryptography Extension `AESD`/`AESE`/`AESMC`/`AESIMC` instructions. Decrypt uses 8-block interleaving: 8 ciphertext blocks are loaded and decrypted simultaneously via parallel `AESD` dispatch. Each block decrypts independently, requiring only the preceding ciphertext block as an XOR mask (10 rounds × 8 blocks = 80 `AESD` instructions in flight). Encrypt remains serial because each plaintext block must be XORed with the previous ciphertext before the next `AESE` can proceed.
- Managed: T-table AES using four 256-entry lookup tables per round. Fully portable, zero-allocation. Comparable to BouncyCastle at large sizes.
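The asymmetry between CBC encrypt and decrypt can be shown with a small Python sketch. It uses a deliberately trivial toy block function in place of AES (an assumption made so the example needs no external library; it is not secure and is not the library's code). The structure is the point: encryption carries `prev` from one block to the next, while decryption can process any batch of blocks independently.

```python
BLOCK = 16


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt_block(block: bytes, key: bytes) -> bytes:
    """Toy stand-in for the AES block function: XOR key, rotate bytes left."""
    x = xor_bytes(block, key)
    return x[1:] + x[:1]


def toy_decrypt_block(block: bytes, key: bytes) -> bytes:
    """Inverse of toy_encrypt_block: rotate bytes right, XOR key."""
    x = block[-1:] + block[:-1]
    return xor_bytes(x, key)


def cbc_encrypt(plaintext: bytes, key: bytes, iv: bytes) -> bytes:
    out, prev = [], iv
    for i in range(0, len(plaintext), BLOCK):
        # Serial dependency: this block needs the previous ciphertext block.
        prev = toy_encrypt_block(xor_bytes(plaintext[i:i + BLOCK], prev), key)
        out.append(prev)
    return b"".join(out)


def cbc_decrypt_interleaved(ciphertext: bytes, key: bytes, iv: bytes, lanes: int = 8) -> bytes:
    blocks = [ciphertext[i:i + BLOCK] for i in range(0, len(ciphertext), BLOCK)]
    masks = [iv] + blocks[:-1]  # XOR mask for block i is ciphertext block i-1
    out = []
    for start in range(0, len(blocks), lanes):
        batch = blocks[start:start + lanes]
        # No block in the batch depends on another, so hardware can overlap them.
        decrypted = [toy_decrypt_block(c, key) for c in batch]
        out.extend(xor_bytes(d, m) for d, m in zip(decrypted, masks[start:start + lanes]))
    return b"".join(out)


key = bytes(range(16))
iv = bytes(16)
plaintext = bytes(range(256))  # 16 blocks
ciphertext = cbc_encrypt(plaintext, key, iv)
roundtrip = cbc_decrypt_interleaved(ciphertext, key, iv)
```

In Python the batch still runs sequentially; the list comprehension only marks the work that a CPU with several AES instructions in flight can execute side by side.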
Key observations:
- ArmAes Decrypt: ~8.7× faster than OS at 128 B; ~2.6× at 1 KiB; near parity at 8 KiB (OS ~6% faster); OS ~1.5× faster at 128 KiB (Apple Silicon uses a wider AES pipeline at bulk sizes)
- ArmAes Encrypt: ~1.5× faster than OS at 128 B; OS leads from 1 KiB (CBC encrypt is inherently serial; CommonCrypto uses NEON-assisted interleaving for partial parallelism)
- Managed: Zero-allocation T-table AES; comparable to BouncyCastle at large sizes
- OS: Allocates 72 B per call (P/Invoke marshalling overhead)
| Description | TestDataSize | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|---|
| Decrypt · AES-128-CBC (CryptoHives-ARM-AES) | 128B | 22.59 ns | 0.039 ns | 0.037 ns | - |
| Decrypt · AES-128-CBC (OS) | 128B | 197.10 ns | 0.822 ns | 0.729 ns | 72 B |
| Decrypt · AES-128-CBC (CryptoHives-Scalar) | 128B | 385.61 ns | 0.187 ns | 0.175 ns | - |
| Decrypt · AES-128-CBC (BouncyCastle) | 128B | 615.04 ns | 0.511 ns | 0.478 ns | 832 B |
| Encrypt · AES-128-CBC (CryptoHives-ARM-AES) | 128B | 136.85 ns | 0.727 ns | 0.680 ns | - |
| Encrypt · AES-128-CBC (OS) | 128B | 202.67 ns | 0.612 ns | 0.542 ns | 72 B |
| Encrypt · AES-128-CBC (CryptoHives-Scalar) | 128B | 436.00 ns | 0.121 ns | 0.107 ns | - |
| Encrypt · AES-128-CBC (BouncyCastle) | 128B | 574.91 ns | 0.358 ns | 0.335 ns | 832 B |
| Decrypt · AES-128-CBC (CryptoHives-ARM-AES) | 1KB | 88.84 ns | 0.117 ns | 0.098 ns | - |
| Decrypt · AES-128-CBC (OS) | 1KB | 234.61 ns | 0.825 ns | 0.772 ns | 72 B |
| Decrypt · AES-128-CBC (CryptoHives-Scalar) | 1KB | 2,705.18 ns | 0.601 ns | 0.533 ns | - |
| Decrypt · AES-128-CBC (BouncyCastle) | 1KB | 3,382.96 ns | 4.409 ns | 4.124 ns | 832 B |
| Encrypt · AES-128-CBC (OS) | 1KB | 557.71 ns | 2.487 ns | 2.326 ns | 72 B |
| Encrypt · AES-128-CBC (CryptoHives-ARM-AES) | 1KB | 984.41 ns | 3.148 ns | 2.945 ns | - |
| Encrypt · AES-128-CBC (CryptoHives-Scalar) | 1KB | 3,133.08 ns | 0.250 ns | 0.195 ns | - |
| Encrypt · AES-128-CBC (BouncyCastle) | 1KB | 3,271.53 ns | 0.931 ns | 0.826 ns | 832 B |
| Decrypt · AES-128-CBC (OS) | 8KB | 591.12 ns | 3.471 ns | 3.247 ns | 72 B |
| Decrypt · AES-128-CBC (CryptoHives-ARM-AES) | 8KB | 627.20 ns | 1.189 ns | 1.112 ns | - |
| Decrypt · AES-128-CBC (CryptoHives-Scalar) | 8KB | 21,309.84 ns | 33.386 ns | 29.596 ns | - |
| Decrypt · AES-128-CBC (BouncyCastle) | 8KB | 25,321.70 ns | 43.447 ns | 40.641 ns | 832 B |
| Encrypt · AES-128-CBC (OS) | 8KB | 3,284.32 ns | 8.210 ns | 7.679 ns | 72 B |
| Encrypt · AES-128-CBC (CryptoHives-ARM-AES) | 8KB | 9,902.54 ns | 49.818 ns | 46.600 ns | - |
| Encrypt · AES-128-CBC (BouncyCastle) | 8KB | 24,664.10 ns | 3.629 ns | 3.217 ns | 832 B |
| Encrypt · AES-128-CBC (CryptoHives-Scalar) | 8KB | 24,696.69 ns | 3.173 ns | 2.968 ns | - |
| Decrypt · AES-128-CBC (OS) | 128KB | 6,686.33 ns | 60.404 ns | 53.546 ns | 72 B |
| Decrypt · AES-128-CBC (CryptoHives-ARM-AES) | 128KB | 9,844.99 ns | 4.951 ns | 4.631 ns | - |
| Decrypt · AES-128-CBC (CryptoHives-Scalar) | 128KB | 341,916.59 ns | 89.519 ns | 83.736 ns | - |
| Decrypt · AES-128-CBC (BouncyCastle) | 128KB | 402,781.64 ns | 1,109.311 ns | 983.375 ns | 832 B |
| Encrypt · AES-128-CBC (OS) | 128KB | 50,596.49 ns | 26.602 ns | 23.582 ns | 72 B |
| Encrypt · AES-128-CBC (CryptoHives-ARM-AES) | 128KB | 123,352.94 ns | 940.343 ns | 879.597 ns | - |
| Encrypt · AES-128-CBC (BouncyCastle) | 128KB | 394,324.31 ns | 81.897 ns | 68.387 ns | 832 B |
| Encrypt · AES-128-CBC (CryptoHives-Scalar) | 128KB | 394,511.75 ns | 103.156 ns | 96.492 ns | - |
AES-256-CBC
AES-256-CBC uses 14 rounds (vs 10 for AES-128), adding ~25-30% overhead. The same 8-block interleaved decrypt and serial encrypt architecture applies via ArmAes. Decrypt is ~9.1× faster than OS at 128 B and ~2.7× at 1 KiB; OS leads from ~8 KiB. Encrypt is ~1.7× faster than OS at 128 B but slower than OS from 1 KiB (serial CBC encrypt bottleneck on Apple Silicon).
| Description | TestDataSize | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|---|
| Decrypt · AES-256-CBC (CryptoHives-ARM-AES) | 128B | 24.89 ns | 0.022 ns | 0.018 ns | - |
| Decrypt · AES-256-CBC (OS) | 128B | 226.71 ns | 1.738 ns | 1.452 ns | 72 B |
| Decrypt · AES-256-CBC (CryptoHives-Scalar) | 128B | 519.76 ns | 0.421 ns | 0.328 ns | - |
| Decrypt · AES-256-CBC (BouncyCastle) | 128B | 796.15 ns | 2.271 ns | 1.773 ns | 1024 B |
| Encrypt · AES-256-CBC (CryptoHives-ARM-AES) | 128B | 152.05 ns | 0.659 ns | 0.617 ns | - |
| Encrypt · AES-256-CBC (OS) | 128B | 254.59 ns | 0.770 ns | 0.683 ns | 72 B |
| Encrypt · AES-256-CBC (CryptoHives-Scalar) | 128B | 569.41 ns | 0.174 ns | 0.162 ns | - |
| Encrypt · AES-256-CBC (BouncyCastle) | 128B | 739.33 ns | 0.335 ns | 0.261 ns | 1024 B |
| Decrypt · AES-256-CBC (CryptoHives-ARM-AES) | 1KB | 103.67 ns | 1.002 ns | 0.888 ns | - |
| Decrypt · AES-256-CBC (OS) | 1KB | 277.33 ns | 1.808 ns | 1.510 ns | 72 B |
| Decrypt · AES-256-CBC (CryptoHives-Scalar) | 1KB | 3,665.74 ns | 0.654 ns | 0.546 ns | - |
| Decrypt · AES-256-CBC (BouncyCastle) | 1KB | 4,431.11 ns | 4.299 ns | 3.811 ns | 1024 B |
| Encrypt · AES-256-CBC (OS) | 1KB | 725.48 ns | 6.036 ns | 5.041 ns | 72 B |
| Encrypt · AES-256-CBC (CryptoHives-ARM-AES) | 1KB | 1,111.63 ns | 1.337 ns | 1.250 ns | - |
| Encrypt · AES-256-CBC (CryptoHives-Scalar) | 1KB | 4,080.11 ns | 6.491 ns | 5.754 ns | - |
| Encrypt · AES-256-CBC (BouncyCastle) | 1KB | 4,278.77 ns | 1.812 ns | 1.415 ns | 1024 B |
| Decrypt · AES-256-CBC (OS) | 8KB | 718.94 ns | 5.224 ns | 4.631 ns | 72 B |
| Decrypt · AES-256-CBC (CryptoHives-ARM-AES) | 8KB | 757.18 ns | 7.508 ns | 6.656 ns | - |
| Decrypt · AES-256-CBC (CryptoHives-Scalar) | 8KB | 28,852.85 ns | 11.885 ns | 9.279 ns | - |
| Decrypt · AES-256-CBC (BouncyCastle) | 8KB | 33,210.94 ns | 35.105 ns | 32.837 ns | 1024 B |
| Encrypt · AES-256-CBC (OS) | 8KB | 4,427.06 ns | 2.474 ns | 2.193 ns | 72 B |
| Encrypt · AES-256-CBC (CryptoHives-ARM-AES) | 8KB | 8,449.00 ns | 125.718 ns | 104.980 ns | - |
| Encrypt · AES-256-CBC (CryptoHives-Scalar) | 8KB | 32,271.16 ns | 15.984 ns | 14.951 ns | - |
| Encrypt · AES-256-CBC (BouncyCastle) | 8KB | 32,441.36 ns | 9.565 ns | 8.479 ns | 1024 B |
| Decrypt · AES-256-CBC (OS) | 128KB | 8,427.32 ns | 66.965 ns | 55.919 ns | 72 B |
| Decrypt · AES-256-CBC (CryptoHives-ARM-AES) | 128KB | 12,018.59 ns | 19.040 ns | 14.865 ns | - |
| Decrypt · AES-256-CBC (CryptoHives-Scalar) | 128KB | 460,842.07 ns | 2,400.630 ns | 2,004.635 ns | - |
| Decrypt · AES-256-CBC (BouncyCastle) | 128KB | 527,934.17 ns | 2,460.032 ns | 2,054.238 ns | 1024 B |
| Encrypt · AES-256-CBC (OS) | 128KB | 69,445.56 ns | 565.076 ns | 528.573 ns | 72 B |
| Encrypt · AES-256-CBC (CryptoHives-ARM-AES) | 128KB | 140,490.51 ns | 1,373.498 ns | 1,146.933 ns | - |
| Encrypt · AES-256-CBC (CryptoHives-Scalar) | 128KB | 515,748.26 ns | 457.872 ns | 357.476 ns | - |
| Encrypt · AES-256-CBC (BouncyCastle) | 128KB | 517,875.88 ns | 244.278 ns | 203.984 ns | 1024 B |
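The round counts quoted for AES-128 and AES-256 follow the FIPS 197 rule Nr = Nk + 6, where Nk is the key length in 32-bit words. The Python sketch below compares the resulting increase in rounds with the measured AES-256 versus AES-128 overhead in two 128 KiB rows from the CBC tables; the helper names are chosen for this example.

```python
def aes_rounds(key_bits: int) -> int:
    """AES round count per FIPS 197: Nr = Nk + 6, Nk = key length in 32-bit words."""
    if key_bits not in (128, 192, 256):
        raise ValueError("AES key size must be 128, 192, or 256 bits")
    return key_bits // 32 + 6


def overhead_percent(slower: float, faster: float) -> float:
    """How much larger `slower` is than `faster`, in percent."""
    return (slower / faster - 1.0) * 100.0


# Extra rounds of AES-256 relative to AES-128 (14 vs 10).
round_overhead = overhead_percent(aes_rounds(256), aes_rounds(128))

# Measured means in ns at 128KB, AES-256-CBC vs AES-128-CBC.
scalar_encrypt_overhead = overhead_percent(515_748.26, 394_511.75)  # CryptoHives-Scalar
arm_decrypt_overhead = overhead_percent(12_018.59, 9_844.99)        # CryptoHives-ARM-AES
```

For these rows the measured overhead stays below the 40% increase in round count: roughly 31% on the scalar encrypt path and roughly 22% on the ARM-AES decrypt path, since not all of the per-block cost scales with the number of rounds.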
AEAD Ciphers (Authenticated Encryption)
Authenticated Encryption with Associated Data (AEAD) ciphers provide both confidentiality and authenticity in a single operation. All CryptoHives AEAD implementations are zero-allocation.
AES-128-GCM
AES-GCM combines counter-mode AES encryption (GCTR) with GHASH polynomial authentication over GF(2¹²⁸). Two acceleration tiers are available on Apple M4:
- ArmAes+ArmPmull (.NET 8+): Uses ARM Cryptography Extension `AESD`/`AESE` for counter-mode encryption and `PMULL`/`PMULL2` for GHASH polynomial multiplication. `PMULL` operates on 64-bit polynomial operands to produce 128-bit products; `PMULL2` reads from the upper halves of 128-bit NEON registers (a free lane-select requiring no additional instruction). Uses an 8-block stitched loop that interleaves AES rounds with lagged GHASH of the previous 8 blocks. Modular reduction uses a 2-PMULL SymCrypt-style `MODREDUCE`. Pre-computes Karatsuba cross-term halves for H¹–H⁸ powers. Small payloads use the non-stitched path (≤8 blocks). ~120× faster than OS encrypt at 17 B; ~14× at 128 B (OS CommonCrypto incurs ~8 μs per-call overhead for small payloads). At bulk sizes (≥8 KiB), Apple CommonCrypto leads, due to Apple Silicon–specific AES pipelining not accessible via the .NET ARM intrinsics layer.
- Managed: Scalar T-table AES with 4-bit Shoup table GHASH (16-entry reduction table, byte-by-byte multiplication). Fully portable, zero-allocation.
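The carry-less multiplication and the Karatsuba cross term mentioned above can be illustrated in Python with plain integers. This sketch covers only the 128×128-bit carry-less product built from three 64×64-bit multiplies; it omits GHASH's bit ordering and the modular reduction, and the function names are chosen for this example rather than taken from the library.

```python
MASK64 = (1 << 64) - 1


def clmul(a: int, b: int) -> int:
    """Carry-less product: like long multiplication, but partial products are XORed."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def clmul128_karatsuba(x: int, y: int) -> int:
    """128x128 -> 256-bit carry-less product using three 64x64 multiplies."""
    x_lo, x_hi = x & MASK64, x >> 64
    y_lo, y_hi = y & MASK64, y >> 64
    lo = clmul(x_lo, y_lo)  # low halves (the role of PMULL)
    hi = clmul(x_hi, y_hi)  # high halves (the role of PMULL2)
    # Cross term: (x_lo ^ x_hi) * (y_lo ^ y_hi) = lo ^ hi ^ cross in GF(2).
    cross = clmul(x_lo ^ x_hi, y_lo ^ y_hi) ^ lo ^ hi
    return (hi << 128) ^ (cross << 64) ^ lo


h = 0x66E94BD4EF8A2C3B884CFA59CA342B2E  # arbitrary 128-bit sample operands
block = 0x0388DACE60B6A392F328C2B971B2FE78
product = clmul128_karatsuba(h, block)
```

The value `x_lo ^ x_hi` depends only on one operand, so for a fixed hash key power it can be computed once and reused, which is the kind of saving that pre-computed cross-term halves provide.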
Key observations:
- ArmAes+ArmPmull: ~120× faster than OS encrypt at 17 B; ~14× at 128 B; ~2.2× at 1 KiB; OS leads from ~4–8 KiB
- ArmAes+ArmPmull at 128 KiB: OS is ~5.4× faster for encrypt and ~5.2× faster for decrypt
- Managed: Uses 4-bit Shoup table GHASH, T-table AES; zero allocation
- BouncyCastle: Uses ARM AES + PMULL internally on ARM64; allocates ~1.5 KB per call
| Description | TestDataSize | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|---|
| Decrypt · AES-128-GCM (CryptoHives-ARM-AES+PMULL) | 17B | 389.43 ns | 3.280 ns | 3.068 ns | - |
| Decrypt · AES-128-GCM (CryptoHives-Scalar) | 17B | 1,638.59 ns | 4.881 ns | 4.327 ns | - |
| Decrypt · AES-128-GCM (BouncyCastle) | 17B | 2,699.87 ns | 2.009 ns | 1.678 ns | 1536 B |
| Decrypt · AES-128-GCM (OS) | 17B | 8,929.34 ns | 89.738 ns | 79.550 ns | - |
| Encrypt · AES-128-GCM (CryptoHives-ARM-AES+PMULL) | 17B | 65.32 ns | 0.204 ns | 0.181 ns | - |
| Encrypt · AES-128-GCM (CryptoHives-Scalar) | 17B | 1,492.51 ns | 4.913 ns | 4.355 ns | - |
| Encrypt · AES-128-GCM (BouncyCastle) | 17B | 2,332.59 ns | 4.123 ns | 3.443 ns | 1520 B |
| Encrypt · AES-128-GCM (OS) | 17B | 7,954.80 ns | 35.601 ns | 31.559 ns | - |
| Decrypt · AES-128-GCM (CryptoHives-ARM-AES+PMULL) | 65B | 552.80 ns | 1.273 ns | 1.063 ns | - |
| Decrypt · AES-128-GCM (CryptoHives-Scalar) | 65B | 2,867.56 ns | 3.519 ns | 3.120 ns | - |
| Decrypt · AES-128-GCM (BouncyCastle) | 65B | 3,626.11 ns | 6.691 ns | 6.259 ns | 1536 B |
| Decrypt · AES-128-GCM (OS) | 65B | 8,821.32 ns | 67.076 ns | 59.461 ns | - |
| Encrypt · AES-128-GCM (CryptoHives-ARM-AES+PMULL) | 65B | 407.28 ns | 0.229 ns | 0.179 ns | - |
| Encrypt · AES-128-GCM (CryptoHives-Scalar) | 65B | 2,704.24 ns | 3.921 ns | 3.668 ns | - |
| Encrypt · AES-128-GCM (BouncyCastle) | 65B | 3,340.67 ns | 5.560 ns | 5.201 ns | 1520 B |
| Encrypt · AES-128-GCM (OS) | 65B | 7,976.92 ns | 40.196 ns | 31.383 ns | - |
| Decrypt · AES-128-GCM (CryptoHives-ARM-AES+PMULL) | 128B | 727.20 ns | 5.243 ns | 4.904 ns | - |
| Decrypt · AES-128-GCM (CryptoHives-Scalar) | 128B | 4,066.16 ns | 4.142 ns | 3.875 ns | - |
| Decrypt · AES-128-GCM (BouncyCastle) | 128B | 4,559.98 ns | 4.211 ns | 3.288 ns | 1536 B |
| Decrypt · AES-128-GCM (OS) | 128B | 8,866.51 ns | 47.055 ns | 41.713 ns | - |
| Encrypt · AES-128-GCM (CryptoHives-ARM-AES+PMULL) | 128B | 593.66 ns | 0.250 ns | 0.209 ns | - |
| Encrypt · AES-128-GCM (CryptoHives-Scalar) | 128B | 3,941.76 ns | 2.220 ns | 1.733 ns | - |
| Encrypt · AES-128-GCM (BouncyCastle) | 128B | 4,371.03 ns | 6.346 ns | 5.299 ns | 1520 B |
| Encrypt · AES-128-GCM (OS) | 128B | 8,138.36 ns | 80.187 ns | 75.007 ns | - |
| Decrypt · AES-128-GCM (CryptoHives-ARM-AES+PMULL) | 152B | 903.42 ns | 14.675 ns | 13.727 ns | - |
| Decrypt · AES-128-GCM (CryptoHives-Scalar) | 152B | 4,933.29 ns | 4.163 ns | 3.690 ns | - |
| Decrypt · AES-128-GCM (BouncyCastle) | 152B | 5,166.28 ns | 7.061 ns | 6.260 ns | 1536 B |
| Decrypt · AES-128-GCM (OS) | 152B | 8,954.32 ns | 47.750 ns | 42.329 ns | - |
| Encrypt · AES-128-GCM (CryptoHives-ARM-AES+PMULL) | 152B | 732.69 ns | 0.223 ns | 0.174 ns | - |
| Encrypt · AES-128-GCM (CryptoHives-Scalar) | 152B | 4,709.54 ns | 7.031 ns | 6.232 ns | - |
| Encrypt · AES-128-GCM (BouncyCastle) | 152B | 4,975.79 ns | 10.718 ns | 9.502 ns | 1520 B |
| Encrypt · AES-128-GCM (OS) | 152B | 8,257.26 ns | 129.869 ns | 121.479 ns | - |
| Decrypt · AES-128-GCM (CryptoHives-ARM-AES+PMULL) | 256B | 250.03 ns | 5.113 ns | 6.087 ns | - |
| Decrypt · AES-128-GCM (BouncyCastle) | 256B | 1,474.91 ns | 1.244 ns | 1.039 ns | 1536 B |
| Decrypt · AES-128-GCM (CryptoHives-Scalar) | 256B | 1,578.76 ns | 24.977 ns | 23.364 ns | - |
| Decrypt · AES-128-GCM (OS) | 256B | 1,851.45 ns | 14.041 ns | 12.447 ns | - |
| Encrypt · AES-128-GCM (CryptoHives-ARM-AES+PMULL) | 256B | 1,072.68 ns | 0.643 ns | 0.537 ns | - |
| Encrypt · AES-128-GCM (BouncyCastle) | 256B | 6,919.93 ns | 4.011 ns | 3.350 ns | 1520 B |
| Encrypt · AES-128-GCM (CryptoHives-Scalar) | 256B | 7,247.23 ns | 2.920 ns | 2.280 ns | - |
| Encrypt · AES-128-GCM (OS) | 256B | 8,277.33 ns | 124.031 ns | 116.019 ns | - |
| Decrypt · AES-128-GCM (CryptoHives-ARM-AES+PMULL) | 1KB | 804.89 ns | 1.422 ns | 1.330 ns | - |
| Decrypt · AES-128-GCM (OS) | 1KB | 2,062.07 ns | 15.769 ns | 13.979 ns | - |
| Decrypt · AES-128-GCM (BouncyCastle) | 1KB | 4,503.30 ns | 2.430 ns | 2.029 ns | 1536 B |
| Decrypt · AES-128-GCM (CryptoHives-Scalar) | 1KB | 5,624.93 ns | 16.460 ns | 12.851 ns | - |
| Encrypt · AES-128-GCM (CryptoHives-ARM-AES+PMULL) | 1KB | 4,024.61 ns | 4.514 ns | 3.769 ns | - |
| Encrypt · AES-128-GCM (OS) | 1KB | 8,791.51 ns | 38.908 ns | 32.490 ns | - |
| Encrypt · AES-128-GCM (BouncyCastle) | 1KB | 22,295.05 ns | 13.812 ns | 11.533 ns | 1520 B |
| Encrypt · AES-128-GCM (CryptoHives-Scalar) | 1KB | 25,923.08 ns | 12.712 ns | 9.925 ns | - |
| Decrypt · AES-128-GCM (OS) | 8KB | 2,902.27 ns | 21.412 ns | 17.880 ns | - |
| Decrypt · AES-128-GCM (CryptoHives-ARM-AES+PMULL) | 8KB | 6,172.11 ns | 36.613 ns | 34.248 ns | - |
| Decrypt · AES-128-GCM (BouncyCastle) | 8KB | 32,444.51 ns | 11.323 ns | 10.037 ns | 1536 B |
| Decrypt · AES-128-GCM (CryptoHives-Scalar) | 8KB | 43,153.23 ns | 22.199 ns | 17.331 ns | - |
| Encrypt · AES-128-GCM (OS) | 8KB | 13,204.40 ns | 115.049 ns | 107.617 ns | - |
| Encrypt · AES-128-GCM (CryptoHives-ARM-AES+PMULL) | 8KB | 31,403.84 ns | 14.656 ns | 11.442 ns | - |
| Encrypt · AES-128-GCM (BouncyCastle) | 8KB | 163,736.64 ns | 566.036 ns | 501.776 ns | 1520 B |
| Encrypt · AES-128-GCM (CryptoHives-Scalar) | 8KB | 202,917.77 ns | 288.201 ns | 255.483 ns | - |
| Decrypt · AES-128-GCM (OS) | 128KB | 19,683.11 ns | 129.639 ns | 121.264 ns | - |
| Decrypt · AES-128-GCM (CryptoHives-ARM-AES+PMULL) | 128KB | 101,850.67 ns | 721.677 ns | 639.748 ns | - |
| Decrypt · AES-128-GCM (BouncyCastle) | 128KB | 509,399.33 ns | 200.182 ns | 187.250 ns | 1536 B |
| Decrypt · AES-128-GCM (CryptoHives-Scalar) | 128KB | 686,087.41 ns | 109.640 ns | 97.193 ns | - |
| Encrypt · AES-128-GCM (OS) | 128KB | 92,595.84 ns | 1,003.219 ns | 938.412 ns | - |
| Encrypt · AES-128-GCM (CryptoHives-ARM-AES+PMULL) | 128KB | 503,971.26 ns | 643.796 ns | 502.634 ns | - |
| Encrypt · AES-128-GCM (BouncyCastle) | 128KB | 2,584,321.85 ns | 1,926.173 ns | 1,608.442 ns | 1520 B |
| Encrypt · AES-128-GCM (CryptoHives-Scalar) | 128KB | 3,232,210.80 ns | 1,524.435 ns | 1,190.179 ns | - |
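The ~8 μs per-call overhead attributed to the OS path can be estimated from the encrypt rows above with a two-point linear model, time = fixed + per_byte × size. The Python sketch below is a rough illustration under that assumption (real timings are not exactly linear, and "128KB" is taken to mean 131,072 bytes); the helper names are chosen for this example.

```python
def fit_linear(size_a: int, ns_a: float, size_b: int, ns_b: float):
    """Two-point fit of time_ns = fixed_ns + per_byte_ns * size."""
    per_byte_ns = (ns_b - ns_a) / (size_b - size_a)
    fixed_ns = ns_a - per_byte_ns * size_a
    return fixed_ns, per_byte_ns


SIZE_128K = 128 * 1024  # assumption: "128KB" in the table is 131,072 bytes

# Encrypt means in ns at 17B and 128KB, copied from the table.
os_fixed, os_per_byte = fit_linear(17, 7_954.80, SIZE_128K, 92_595.84)
arm_fixed, arm_per_byte = fit_linear(17, 65.32, SIZE_128K, 503_971.26)

# Payload size where the two fitted lines cross.
break_even_bytes = (os_fixed - arm_fixed) / (arm_per_byte - os_per_byte)
```

Under this model the OS path has a fixed cost just under 8 μs and a lower per-byte cost, while the ARM-AES+PMULL path has almost no fixed cost. The fitted lines cross between the 1 KB and 8 KB rows, consistent with the measured change of leader between those two sizes.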
AES-192-GCM
AES-192-GCM uses 12 rounds (vs 10 for AES-128), adding ~10-15% overhead. The same ArmAes+ArmPmull pipeline applies. The performance pattern mirrors AES-128-GCM: dominant over OS at small payloads, OS leads at bulk sizes.
| Description | TestDataSize | Mean | Error | StdDev | Median | Allocated |
|---|---|---|---|---|---|---|
| Decrypt · AES-192-GCM (CryptoHives-ARM-AES+PMULL) | 17B | 392.45 ns | 3.110 ns | 2.909 ns | 393.96 ns | - |
| Decrypt · AES-192-GCM (CryptoHives-Scalar) | 17B | 1,758.46 ns | 3.325 ns | 3.110 ns | 1,758.40 ns | - |
| Decrypt · AES-192-GCM (BouncyCastle) | 17B | 2,938.89 ns | 2.402 ns | 2.129 ns | 2,939.04 ns | 1640 B |
| Decrypt · AES-192-GCM (OS) | 17B | 8,777.06 ns | 50.778 ns | 39.644 ns | 8,784.71 ns | - |
| Encrypt · AES-192-GCM (CryptoHives-ARM-AES+PMULL) | 17B | 57.95 ns | 1.177 ns | 2.182 ns | 58.07 ns | - |
| Encrypt · AES-192-GCM (CryptoHives-Scalar) | 17B | 369.48 ns | 7.030 ns | 10.945 ns | 375.66 ns | - |
| Encrypt · AES-192-GCM (BouncyCastle) | 17B | 633.13 ns | 12.309 ns | 15.567 ns | 625.88 ns | 1624 B |
| Encrypt · AES-192-GCM (OS) | 17B | 1,945.43 ns | 38.579 ns | 93.173 ns | 1,923.46 ns | - |
| Decrypt · AES-192-GCM (CryptoHives-ARM-AES+PMULL) | 65B | 553.25 ns | 3.749 ns | 3.507 ns | 553.77 ns | - |
| Decrypt · AES-192-GCM (CryptoHives-Scalar) | 65B | 3,066.36 ns | 4.271 ns | 3.786 ns | 3,066.78 ns | - |
| Decrypt · AES-192-GCM (BouncyCastle) | 65B | 3,921.21 ns | 2.323 ns | 1.813 ns | 3,921.44 ns | 1640 B |
| Decrypt · AES-192-GCM (OS) | 65B | 8,863.75 ns | 93.957 ns | 87.888 ns | 8,857.16 ns | - |
| Encrypt · AES-192-GCM (CryptoHives-ARM-AES+PMULL) | 65B | 93.13 ns | 1.876 ns | 3.832 ns | 93.18 ns | - |
| Encrypt · AES-192-GCM (CryptoHives-Scalar) | 65B | 704.40 ns | 13.883 ns | 23.195 ns | 694.92 ns | - |
| Encrypt · AES-192-GCM (BouncyCastle) | 65B | 928.02 ns | 18.266 ns | 17.086 ns | 927.66 ns | 1624 B |
| Encrypt · AES-192-GCM (OS) | 65B | 1,975.34 ns | 38.805 ns | 58.082 ns | 1,964.51 ns | - |
| Decrypt · AES-192-GCM (CryptoHives-ARM-AES+PMULL) | 128B | 742.05 ns | 4.259 ns | 3.557 ns | 742.57 ns | - |
| Decrypt · AES-192-GCM (CryptoHives-Scalar) | 128B | 4,368.73 ns | 6.113 ns | 5.718 ns | 4,368.56 ns | - |
| Decrypt · AES-192-GCM (BouncyCastle) | 128B | 4,996.76 ns | 15.475 ns | 14.475 ns | 4,993.00 ns | 1640 B |
| Decrypt · AES-192-GCM (OS) | 128B | 9,045.71 ns | 149.994 ns | 140.304 ns | 8,985.30 ns | - |
| Encrypt · AES-192-GCM (CryptoHives-ARM-AES+PMULL) | 128B | 124.62 ns | 0.125 ns | 0.111 ns | 124.58 ns | - |
| Encrypt · AES-192-GCM (CryptoHives-Scalar) | 128B | 1,069.40 ns | 12.551 ns | 11.740 ns | 1,074.00 ns | - |
| Encrypt · AES-192-GCM (BouncyCastle) | 128B | 1,241.79 ns | 17.618 ns | 16.479 ns | 1,239.67 ns | 1624 B |
| Encrypt · AES-192-GCM (OS) | 128B | 2,136.51 ns | 22.193 ns | 19.674 ns | 2,130.53 ns | - |
| Decrypt · AES-192-GCM (CryptoHives-ARM-AES+PMULL) | 152B | 918.21 ns | 10.882 ns | 10.179 ns | 918.23 ns | - |
| Decrypt · AES-192-GCM (CryptoHives-Scalar) | 152B | 5,311.08 ns | 17.759 ns | 16.611 ns | 5,305.41 ns | - |
| Decrypt · AES-192-GCM (BouncyCastle) | 152B | 5,674.82 ns | 15.316 ns | 14.327 ns | 5,669.71 ns | 1640 B |
| Decrypt · AES-192-GCM (OS) | 152B | 8,969.10 ns | 79.931 ns | 74.767 ns | 8,949.62 ns | - |
| Encrypt · AES-192-GCM (CryptoHives-ARM-AES+PMULL) | 152B | 164.19 ns | 3.243 ns | 4.952 ns | 165.42 ns | - |
| Encrypt · AES-192-GCM (CryptoHives-Scalar) | 152B | 5,117.36 ns | 15.223 ns | 13.495 ns | 5,123.91 ns | - |
| Encrypt · AES-192-GCM (BouncyCastle) | 152B | 5,457.81 ns | 7.070 ns | 6.267 ns | 5,457.61 ns | 1624 B |
| Encrypt · AES-192-GCM (OS) | 152B | 8,187.88 ns | 82.140 ns | 76.834 ns | 8,206.96 ns | - |
| Decrypt · AES-192-GCM (CryptoHives-ARM-AES+PMULL) | 256B | 1,269.01 ns | 13.239 ns | 12.384 ns | 1,261.99 ns | - |
| Decrypt · AES-192-GCM (BouncyCastle) | 256B | 7,635.70 ns | 3.393 ns | 2.833 ns | 7,636.31 ns | 1640 B |
| Decrypt · AES-192-GCM (CryptoHives-Scalar) | 256B | 7,991.88 ns | 6.588 ns | 6.163 ns | 7,989.93 ns | - |
| Decrypt · AES-192-GCM (OS) | 256B | 9,162.01 ns | 59.437 ns | 55.597 ns | 9,152.20 ns | - |
| Encrypt · AES-192-GCM (CryptoHives-ARM-AES+PMULL) | 256B | 1,086.05 ns | 0.968 ns | 0.756 ns | 1,086.30 ns | - |
| Encrypt · AES-192-GCM (BouncyCastle) | 256B | 7,593.12 ns | 5.763 ns | 5.108 ns | 7,592.27 ns | 1624 B |
| Encrypt · AES-192-GCM (CryptoHives-Scalar) | 256B | 7,811.69 ns | 5.031 ns | 4.201 ns | 7,810.45 ns | - |
| Encrypt · AES-192-GCM (OS) | 256B | 8,337.81 ns | 69.314 ns | 64.837 ns | 8,349.16 ns | - |
| Decrypt · AES-192-GCM (CryptoHives-ARM-AES+PMULL) | 1KB | 854.23 ns | 17.011 ns | 22.120 ns | 860.14 ns | - |
| Decrypt · AES-192-GCM (OS) | 1KB | 1,960.22 ns | 39.188 ns | 40.243 ns | 1,947.46 ns | - |
| Decrypt · AES-192-GCM (BouncyCastle) | 1KB | 5,056.71 ns | 39.262 ns | 32.786 ns | 5,042.00 ns | 1640 B |
| Decrypt · AES-192-GCM (CryptoHives-Scalar) | 1KB | 6,134.89 ns | 73.339 ns | 61.241 ns | 6,102.35 ns | - |
| Encrypt · AES-192-GCM (CryptoHives-ARM-AES+PMULL) | 1KB | 4,036.06 ns | 3.988 ns | 3.535 ns | 4,035.07 ns | - |
| Encrypt · AES-192-GCM (OS) | 1KB | 8,762.99 ns | 79.900 ns | 74.739 ns | 8,755.16 ns | - |
| Encrypt · AES-192-GCM (BouncyCastle) | 1KB | 24,634.84 ns | 12.668 ns | 10.578 ns | 24,632.61 ns | 1624 B |
| Encrypt · AES-192-GCM (CryptoHives-Scalar) | 1KB | 28,191.66 ns | 11.438 ns | 10.140 ns | 28,191.53 ns | - |
| Decrypt · AES-192-GCM (OS) | 8KB | 2,955.64 ns | 17.672 ns | 15.666 ns | 2,952.98 ns | - |
| Decrypt · AES-192-GCM (CryptoHives-ARM-AES+PMULL) | 8KB | 6,083.40 ns | 4.775 ns | 3.728 ns | 6,084.46 ns | - |
| Decrypt · AES-192-GCM (BouncyCastle) | 8KB | 36,111.73 ns | 17.922 ns | 16.764 ns | 36,117.73 ns | 1640 B |
| Decrypt · AES-192-GCM (CryptoHives-Scalar) | 8KB | 46,917.34 ns | 9.490 ns | 8.412 ns | 46,917.63 ns | - |
| Encrypt · AES-192-GCM (OS) | 8KB | 13,389.90 ns | 71.936 ns | 56.163 ns | 13,401.61 ns | - |
| Encrypt · AES-192-GCM (CryptoHives-ARM-AES+PMULL) | 8KB | 31,544.06 ns | 9.340 ns | 7.292 ns | 31,544.54 ns | - |
| Encrypt · AES-192-GCM (BouncyCastle) | 8KB | 181,571.55 ns | 235.114 ns | 196.331 ns | 181,621.22 ns | 1624 B |
| Encrypt · AES-192-GCM (CryptoHives-Scalar) | 8KB | 220,715.77 ns | 117.549 ns | 91.774 ns | 220,697.41 ns | - |
| Decrypt · AES-192-GCM (OS) | 128KB | 19,516.26 ns | 88.794 ns | 83.058 ns | 19,509.62 ns | - |
| Decrypt · AES-192-GCM (CryptoHives-ARM-AES+PMULL) | 128KB | 98,492.34 ns | 815.194 ns | 722.648 ns | 98,489.86 ns | - |
| Decrypt · AES-192-GCM (BouncyCastle) | 128KB | 569,394.54 ns | 356.216 ns | 297.456 ns | 569,324.58 ns | 1640 B |
| Decrypt · AES-192-GCM (CryptoHives-Scalar) | 128KB | 747,695.84 ns | 550.855 ns | 459.989 ns | 747,488.81 ns | - |
| Encrypt · AES-192-GCM (OS) | 128KB | 96,747.02 ns | 119.204 ns | 93.067 ns | 96,742.32 ns | - |
| Encrypt · AES-192-GCM (CryptoHives-ARM-AES+PMULL) | 128KB | 504,290.32 ns | 347.637 ns | 308.171 ns | 504,308.11 ns | - |
| Encrypt · AES-192-GCM (BouncyCastle) | 128KB | 2,872,363.98 ns | 1,840.506 ns | 1,536.906 ns | 2,872,103.19 ns | 1624 B |
| Encrypt · AES-192-GCM (CryptoHives-Scalar) | 128KB | 3,519,943.02 ns | 2,355.070 ns | 1,966.590 ns | 3,518,982.10 ns | - |
AES-256-GCM
AES-256-GCM uses 14 rounds (vs 10 for AES-128), adding ~20–30% overhead per block. The same two-tier architecture (ArmAes+ArmPmull → Managed) applies. The ArmAes+ArmPmull tier encrypts ~14–16× faster than OS at 128 B; OS leads from ~4–8 KiB. The large-payload gap mirrors AES-128-GCM: Apple CommonCrypto likely exploits Apple Silicon–specific AES/PMULL execution units that are not yet accessible through the .NET ARMv8 intrinsics layer.
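For orientation, the OS rows presumably exercise the platform AEAD exposed through System.Security.Cryptography. The following is a minimal sketch of an AES-256-GCM round trip with that API, not the benchmark harness itself; the class name OsAesGcmSketch is hypothetical, and the two-argument AesGcm constructor assumes .NET 8 or later.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical helper, for illustration only; not part of the benchmarked library.
public static class OsAesGcmSketch
{
    // Encrypts with AES-256-GCM and returns the random nonce, ciphertext, and 16-byte tag.
    public static (byte[] Nonce, byte[] Ciphertext, byte[] Tag) Encrypt(byte[] key, byte[] plaintext)
    {
        if (key.Length != 32)
            throw new ArgumentException("AES-256 needs a 32-byte key.", nameof(key));

        byte[] nonce = RandomNumberGenerator.GetBytes(12); // 96-bit nonce, must never repeat per key
        byte[] ciphertext = new byte[plaintext.Length];
        byte[] tag = new byte[16];

        using var gcm = new AesGcm(key, tagSizeInBytes: 16);
        gcm.Encrypt(nonce, plaintext, ciphertext, tag);
        return (nonce, ciphertext, tag);
    }

    // Decrypts and verifies the tag; throws a CryptographicException on mismatch.
    public static byte[] Decrypt(byte[] key, byte[] nonce, byte[] ciphertext, byte[] tag)
    {
        byte[] plaintext = new byte[ciphertext.Length];
        using var gcm = new AesGcm(key, tagSizeInBytes: 16);
        gcm.Decrypt(nonce, ciphertext, tag, plaintext);
        return plaintext;
    }
}
```

The sketch creates a new AesGcm instance per call to stay short. Whether the benchmark reuses one instance across iterations is not stated here and would affect small-payload timings.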
| Description | TestDataSize | Mean | Error | StdDev | Median | Allocated |
|---|---|---|---|---|---|---|
| Decrypt · AES-256-GCM (CryptoHives-ARM-AES+PMULL) | 17B | 392.48 ns | 4.065 ns | 3.803 ns | 391.99 ns | - |
| Decrypt · AES-256-GCM (CryptoHives-Scalar) | 17B | 1,848.11 ns | 2.697 ns | 2.522 ns | 1,848.36 ns | - |
| Decrypt · AES-256-GCM (BouncyCastle) | 17B | 3,120.12 ns | 7.685 ns | 6.813 ns | 3,118.81 ns | 1744 B |
| Decrypt · AES-256-GCM (OS) | 17B | 9,008.65 ns | 85.134 ns | 79.634 ns | 9,021.45 ns | - |
| Encrypt · AES-256-GCM (CryptoHives-ARM-AES+PMULL) | 17B | 55.91 ns | 0.054 ns | 0.045 ns | 55.92 ns | - |
| Encrypt · AES-256-GCM (CryptoHives-Scalar) | 17B | 360.84 ns | 0.204 ns | 0.191 ns | 360.85 ns | - |
| Encrypt · AES-256-GCM (BouncyCastle) | 17B | 586.76 ns | 0.718 ns | 0.672 ns | 586.60 ns | 1728 B |
| Encrypt · AES-256-GCM (OS) | 17B | 1,739.71 ns | 12.648 ns | 11.831 ns | 1,733.40 ns | - |
| Decrypt · AES-256-GCM (CryptoHives-ARM-AES+PMULL) | 65B | 562.57 ns | 2.518 ns | 2.355 ns | 561.77 ns | - |
| Decrypt · AES-256-GCM (CryptoHives-Scalar) | 65B | 3,299.54 ns | 4.311 ns | 3.821 ns | 3,299.90 ns | - |
| Decrypt · AES-256-GCM (BouncyCastle) | 65B | 4,257.31 ns | 4.490 ns | 4.200 ns | 4,258.70 ns | 1744 B |
| Decrypt · AES-256-GCM (OS) | 65B | 9,069.33 ns | 44.404 ns | 41.536 ns | 9,060.56 ns | - |
| Encrypt · AES-256-GCM (CryptoHives-ARM-AES+PMULL) | 65B | 87.64 ns | 0.030 ns | 0.028 ns | 87.63 ns | - |
| Encrypt · AES-256-GCM (CryptoHives-Scalar) | 65B | 663.07 ns | 0.171 ns | 0.160 ns | 663.07 ns | - |
| Encrypt · AES-256-GCM (BouncyCastle) | 65B | 840.77 ns | 0.835 ns | 0.697 ns | 840.93 ns | 1728 B |
| Encrypt · AES-256-GCM (OS) | 65B | 1,967.93 ns | 35.177 ns | 80.115 ns | 1,950.23 ns | - |
| Decrypt · AES-256-GCM (CryptoHives-ARM-AES+PMULL) | 128B | 768.34 ns | 5.102 ns | 4.773 ns | 767.17 ns | - |
| Decrypt · AES-256-GCM (CryptoHives-Scalar) | 128B | 4,746.61 ns | 5.049 ns | 4.476 ns | 4,745.02 ns | - |
| Decrypt · AES-256-GCM (BouncyCastle) | 128B | 5,417.20 ns | 3.699 ns | 3.279 ns | 5,416.63 ns | 1744 B |
| Decrypt · AES-256-GCM (OS) | 128B | 9,212.45 ns | 106.131 ns | 99.275 ns | 9,235.57 ns | - |
| Encrypt · AES-256-GCM (CryptoHives-ARM-AES+PMULL) | 128B | 126.78 ns | 0.120 ns | 0.106 ns | 126.80 ns | - |
| Encrypt · AES-256-GCM (CryptoHives-Scalar) | 128B | 963.68 ns | 0.282 ns | 0.220 ns | 963.66 ns | - |
| Encrypt · AES-256-GCM (BouncyCastle) | 128B | 1,290.47 ns | 25.576 ns | 39.819 ns | 1,295.46 ns | 1728 B |
| Encrypt · AES-256-GCM (OS) | 128B | 1,967.29 ns | 24.487 ns | 20.448 ns | 1,962.75 ns | - |
| Decrypt · AES-256-GCM (CryptoHives-ARM-AES+PMULL) | 152B | 938.06 ns | 4.264 ns | 3.329 ns | 938.90 ns | - |
| Decrypt · AES-256-GCM (CryptoHives-Scalar) | 152B | 5,673.72 ns | 4.763 ns | 4.455 ns | 5,672.28 ns | - |
| Decrypt · AES-256-GCM (BouncyCastle) | 152B | 6,151.40 ns | 7.209 ns | 6.744 ns | 6,152.46 ns | 1744 B |
| Decrypt · AES-256-GCM (OS) | 152B | 9,095.95 ns | 55.664 ns | 52.069 ns | 9,098.71 ns | - |
| Encrypt · AES-256-GCM (CryptoHives-ARM-AES+PMULL) | 152B | 156.25 ns | 0.152 ns | 0.135 ns | 156.29 ns | - |
| Encrypt · AES-256-GCM (CryptoHives-Scalar) | 152B | 1,360.08 ns | 26.254 ns | 26.961 ns | 1,360.72 ns | - |
| Encrypt · AES-256-GCM (BouncyCastle) | 152B | 1,545.96 ns | 30.671 ns | 31.496 ns | 1,553.18 ns | 1728 B |
| Encrypt · AES-256-GCM (OS) | 152B | 2,099.86 ns | 23.709 ns | 22.178 ns | 2,101.45 ns | - |
| Decrypt · AES-256-GCM (CryptoHives-ARM-AES+PMULL) | 256B | 1,297.43 ns | 12.461 ns | 11.656 ns | 1,297.96 ns | - |
| Decrypt · AES-256-GCM (BouncyCastle) | 256B | 1,777.07 ns | 1.360 ns | 1.272 ns | 1,776.82 ns | 1744 B |
| Decrypt · AES-256-GCM (CryptoHives-Scalar) | 256B | 1,840.39 ns | 36.503 ns | 61.985 ns | 1,810.38 ns | - |
| Decrypt · AES-256-GCM (OS) | 256B | 1,972.13 ns | 15.607 ns | 14.599 ns | 1,968.01 ns | - |
| Encrypt · AES-256-GCM (CryptoHives-ARM-AES+PMULL) | 256B | 233.89 ns | 0.628 ns | 0.524 ns | 233.80 ns | - |
| Encrypt · AES-256-GCM (CryptoHives-Scalar) | 256B | 2,093.08 ns | 40.431 ns | 53.974 ns | 2,108.54 ns | - |
| Encrypt · AES-256-GCM (BouncyCastle) | 256B | 2,122.70 ns | 41.953 ns | 60.168 ns | 2,111.18 ns | 1728 B |
| Encrypt · AES-256-GCM (OS) | 256B | 2,191.74 ns | 43.816 ns | 38.841 ns | 2,198.54 ns | - |
| Decrypt · AES-256-GCM (CryptoHives-ARM-AES+PMULL) | 1KB | 837.58 ns | 8.276 ns | 6.911 ns | 837.76 ns | - |
| Decrypt · AES-256-GCM (OS) | 1KB | 2,099.22 ns | 19.172 ns | 16.995 ns | 2,094.14 ns | - |
| Decrypt · AES-256-GCM (BouncyCastle) | 1KB | 5,522.34 ns | 4.760 ns | 3.975 ns | 5,521.94 ns | 1744 B |
| Decrypt · AES-256-GCM (CryptoHives-Scalar) | 1KB | 6,580.40 ns | 2.054 ns | 1.604 ns | 6,580.24 ns | - |
| Encrypt · AES-256-GCM (CryptoHives-ARM-AES+PMULL) | 1KB | 889.83 ns | 17.806 ns | 16.656 ns | 889.44 ns | - |
| Encrypt · AES-256-GCM (OS) | 1KB | 9,040.18 ns | 119.524 ns | 111.802 ns | 8,986.19 ns | - |
| Encrypt · AES-256-GCM (BouncyCastle) | 1KB | 27,106.41 ns | 11.764 ns | 10.429 ns | 27,103.38 ns | 1728 B |
| Encrypt · AES-256-GCM (CryptoHives-Scalar) | 1KB | 30,003.83 ns | 1,111.541 ns | 3,153.258 ns | 30,457.34 ns | - |
| Decrypt · AES-256-GCM (OS) | 8KB | 3,099.95 ns | 13.097 ns | 12.251 ns | 3,095.90 ns | - |
| Decrypt · AES-256-GCM (CryptoHives-ARM-AES+PMULL) | 8KB | 6,362.26 ns | 83.183 ns | 69.462 ns | 6,371.20 ns | - |
| Decrypt · AES-256-GCM (BouncyCastle) | 8KB | 40,008.45 ns | 8.312 ns | 6.490 ns | 40,009.26 ns | 1744 B |
| Decrypt · AES-256-GCM (CryptoHives-Scalar) | 8KB | 50,781.50 ns | 35.053 ns | 27.367 ns | 50,771.57 ns | - |
| Encrypt · AES-256-GCM (OS) | 8KB | 13,975.05 ns | 175.416 ns | 164.084 ns | 13,928.83 ns | - |
| Encrypt · AES-256-GCM (CryptoHives-ARM-AES+PMULL) | 8KB | 32,094.53 ns | 48.704 ns | 38.025 ns | 32,120.38 ns | - |
| Encrypt · AES-256-GCM (BouncyCastle) | 8KB | 200,098.54 ns | 122.654 ns | 102.422 ns | 200,085.24 ns | 1728 B |
| Encrypt · AES-256-GCM (CryptoHives-Scalar) | 8KB | 238,659.34 ns | 145.934 ns | 129.367 ns | 238,671.31 ns | - |
| Decrypt · AES-256-GCM (OS) | 128KB | 20,784.60 ns | 76.921 ns | 71.952 ns | 20,791.28 ns | - |
| Decrypt · AES-256-GCM (CryptoHives-ARM-AES+PMULL) | 128KB | 103,150.76 ns | 472.990 ns | 442.435 ns | 103,087.03 ns | - |
| Decrypt · AES-256-GCM (BouncyCastle) | 128KB | 632,521.95 ns | 243.138 ns | 215.535 ns | 632,514.93 ns | 1744 B |
| Decrypt · AES-256-GCM (CryptoHives-Scalar) | 128KB | 808,029.66 ns | 101.426 ns | 94.874 ns | 807,995.85 ns | - |
| Encrypt · AES-256-GCM (OS) | 128KB | 102,155.07 ns | 597.475 ns | 529.646 ns | 101,828.32 ns | - |
| Encrypt · AES-256-GCM (CryptoHives-ARM-AES+PMULL) | 128KB | 511,507.58 ns | 233.359 ns | 182.192 ns | 511,515.99 ns | - |
| Encrypt · AES-256-GCM (BouncyCastle) | 128KB | 3,181,223.93 ns | 1,684.265 ns | 1,493.057 ns | 3,181,190.92 ns | 1728 B |
| Encrypt · AES-256-GCM (CryptoHives-Scalar) | 128KB | 3,806,613.70 ns | 1,725.147 ns | 1,529.298 ns | 3,805,917.32 ns | - |
AES-128-CCM
AES-CCM (Counter with CBC-MAC) combines CTR mode encryption with CBC-MAC authentication. Unlike GCM, CCM requires two sequential passes (encrypt + MAC or MAC + decrypt), making it inherently less parallelizable. It is widely used in IoT protocols (Bluetooth LE, ZigBee, Thread) and supports variable nonce (7–13 bytes) and tag sizes (4–16 bytes). Two acceleration tiers are available:
- ArmAes: ARM Cryptography Extension AESD/AESE instructions for all block operations: counter-mode encryption, CBC-MAC computation, and AAD processing. Uses Vector128<byte> round keys via MemoryMarshal.Cast from the shared uint[] key schedule. Dispatched via _useAesNi bool flag (shared with x86 dispatch; indicates hardware AES availability on any ISA).
- Managed: T-table AES for all block operations. Fully portable, zero-allocation.
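The round-key reinterpretation described for the ArmAes tier can be sketched as follows. This is an illustration under assumptions rather than the library's code: RoundKeyView and AsRoundKeys are hypothetical names, and the schedule is assumed to hold four 32-bit words per round key, already in the byte order the AES instructions expect.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

// Hypothetical helper, for illustration only.
public static class RoundKeyView
{
    // Views a uint[] key schedule (4 words per round key) as one Vector128<byte>
    // per round without copying. Assumption: the words are already laid out in the
    // byte order the hardware AES instructions expect; a big-endian word layout
    // would need a byte swap first.
    public static ReadOnlySpan<Vector128<byte>> AsRoundKeys(ReadOnlySpan<uint> schedule)
    {
        if (schedule.Length % 4 != 0)
            throw new ArgumentException("Key schedule must hold whole 128-bit round keys.", nameof(schedule));

        return MemoryMarshal.Cast<uint, Vector128<byte>>(schedule);
    }
}
```

For an AES-128 schedule of 44 words this yields 11 vectors. Because the cast is only a view, the scalar and hardware paths can share one expanded key.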
Key observations:
- ArmAes: ~4× faster than Managed at 128 KiB; ~4.3× faster than BouncyCastle; zero allocation
- Managed: T-table AES; comparable to BouncyCastle at large sizes
- BouncyCastle: Allocates ~2.4–2.5 KB per call
- No OS adapter available for comparison (System.Security.Cryptography does not expose AES-CCM on all platforms)
| Description | TestDataSize | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|---|
| Decrypt · AES-128-CCM (CryptoHives-ARM-AES) | 128B | 273.1 ns | 1.38 ns | 1.07 ns | - |
| Decrypt · AES-128-CCM (CryptoHives-Scalar) | 128B | 955.0 ns | 0.57 ns | 0.44 ns | - |
| Decrypt · AES-128-CCM (BouncyCastle) | 128B | 1,441.3 ns | 2.64 ns | 2.20 ns | 2424 B |
| Encrypt · AES-128-CCM (CryptoHives-ARM-AES) | 128B | 237.8 ns | 4.33 ns | 3.84 ns | - |
| Encrypt · AES-128-CCM (CryptoHives-Scalar) | 128B | 907.0 ns | 2.51 ns | 2.09 ns | - |
| Encrypt · AES-128-CCM (BouncyCastle) | 128B | 1,381.0 ns | 6.24 ns | 5.84 ns | 2464 B |
| Decrypt · AES-128-CCM (CryptoHives-ARM-AES) | 1KB | 1,562.1 ns | 3.02 ns | 2.52 ns | - |
| Decrypt · AES-128-CCM (CryptoHives-Scalar) | 1KB | 5,997.1 ns | 1.73 ns | 1.61 ns | - |
| Decrypt · AES-128-CCM (BouncyCastle) | 1KB | 6,842.8 ns | 2.05 ns | 1.91 ns | 2424 B |
| Encrypt · AES-128-CCM (CryptoHives-ARM-AES) | 1KB | 1,509.6 ns | 9.00 ns | 7.98 ns | - |
| Encrypt · AES-128-CCM (CryptoHives-Scalar) | 1KB | 5,953.8 ns | 1.47 ns | 1.31 ns | - |
| Encrypt · AES-128-CCM (BouncyCastle) | 1KB | 6,562.9 ns | 40.12 ns | 33.50 ns | 2464 B |
| Decrypt · AES-128-CCM (CryptoHives-ARM-AES) | 8KB | 11,859.9 ns | 4.50 ns | 4.21 ns | - |
| Decrypt · AES-128-CCM (CryptoHives-Scalar) | 8KB | 46,249.3 ns | 7.77 ns | 6.89 ns | - |
| Decrypt · AES-128-CCM (BouncyCastle) | 8KB | 50,068.6 ns | 18.44 ns | 17.25 ns | 2424 B |
| Encrypt · AES-128-CCM (CryptoHives-ARM-AES) | 8KB | 11,361.4 ns | 59.40 ns | 55.56 ns | - |
| Encrypt · AES-128-CCM (CryptoHives-Scalar) | 8KB | 45,878.6 ns | 537.76 ns | 419.84 ns | - |
| Encrypt · AES-128-CCM (BouncyCastle) | 8KB | 49,030.8 ns | 174.47 ns | 145.69 ns | 2464 B |
| Decrypt · AES-128-CCM (CryptoHives-ARM-AES) | 128KB | 188,074.5 ns | 317.22 ns | 296.73 ns | - |
| Decrypt · AES-128-CCM (CryptoHives-Scalar) | 128KB | 736,377.3 ns | 117.59 ns | 109.99 ns | - |
| Decrypt · AES-128-CCM (BouncyCastle) | 128KB | 793,130.6 ns | 451.95 ns | 422.76 ns | 2424 B |
| Encrypt · AES-128-CCM (CryptoHives-ARM-AES) | 128KB | 182,973.9 ns | 1,031.92 ns | 965.26 ns | - |
| Encrypt · AES-128-CCM (CryptoHives-Scalar) | 128KB | 726,173.1 ns | 5,011.64 ns | 4,184.95 ns | - |
| Encrypt · AES-128-CCM (BouncyCastle) | 128KB | 799,450.8 ns | 6,038.70 ns | 5,042.59 ns | 2464 B |
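The nonce and tag-size ranges quoted for CCM come from how the first CBC-MAC block encodes them. The sketch below builds that block's flags octet following RFC 3610, section 2.2; the class and method names are hypothetical and are not taken from the library.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class CcmParameters
{
    // Builds the flags octet of the first CBC-MAC block (B0) per RFC 3610, section 2.2.
    public static byte FirstBlockFlags(int nonceLength, int tagLength, bool hasAssociatedData)
    {
        if (nonceLength < 7 || nonceLength > 13)
            throw new ArgumentOutOfRangeException(nameof(nonceLength), "CCM nonces are 7 to 13 bytes.");
        if (tagLength < 4 || tagLength > 16 || (tagLength & 1) != 0)
            throw new ArgumentOutOfRangeException(nameof(tagLength), "CCM tags are 4, 6, ..., 16 bytes.");

        int lengthFieldSize = 15 - nonceLength;       // L: bytes that encode the message length (2..8)
        int flags = (hasAssociatedData ? 0x40 : 0)    // Adata bit
                  | (((tagLength - 2) / 2) << 3)      // M' = (M - 2) / 2 in bits 3..5
                  | (lengthFieldSize - 1);            // L' = L - 1 in bits 0..2
        return (byte)flags;
    }
}
```

A 13-byte nonce with an 8-byte tag and associated data gives 0x59, the leading octet of B0 in the first RFC 3610 packet vector. The nonce and the length field share 15 bytes, so a longer nonce lowers the maximum message length.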
AES-256-CCM
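AES-256-CCM uses 14 rounds (vs 10 for AES-128). The same ArmAes / Managed dispatch applies. Compared with the AES-128-CCM table above, the additional rounds cost the ArmAes tier ~14–16% on Encrypt up to 8 KiB and ~24–27% on Decrypt and at 128 KiB; the Managed tier pays ~33–62%. Several AES-256-CCM rows have a much larger StdDev than their AES-128-CCM counterparts, so these ratios are approximate.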
| Description | TestDataSize | Mean | Error | StdDev | Median | Allocated |
|---|---|---|---|---|---|---|
| Decrypt · AES-256-CCM (CryptoHives-ARM-AES) | 128B | 337.8 ns | 6.49 ns | 5.75 ns | 339.0 ns | - |
| Decrypt · AES-256-CCM (CryptoHives-Scalar) | 128B | 1,437.3 ns | 28.78 ns | 63.78 ns | 1,407.8 ns | - |
| Decrypt · AES-256-CCM (BouncyCastle) | 128B | 2,113.7 ns | 42.01 ns | 118.48 ns | 2,078.6 ns | 2808 B |
| Encrypt · AES-256-CCM (CryptoHives-ARM-AES) | 128B | 272.3 ns | 0.33 ns | 0.31 ns | 272.5 ns | - |
| Encrypt · AES-256-CCM (CryptoHives-Scalar) | 128B | 1,209.3 ns | 0.45 ns | 0.43 ns | 1,209.4 ns | - |
| Encrypt · AES-256-CCM (BouncyCastle) | 128B | 1,759.0 ns | 1.36 ns | 1.27 ns | 1,758.8 ns | 2848 B |
| Decrypt · AES-256-CCM (CryptoHives-ARM-AES) | 1KB | 1,988.8 ns | 39.16 ns | 60.97 ns | 1,964.0 ns | - |
| Decrypt · AES-256-CCM (CryptoHives-Scalar) | 1KB | 9,733.2 ns | 193.41 ns | 180.91 ns | 9,743.4 ns | - |
| Decrypt · AES-256-CCM (BouncyCastle) | 1KB | 10,563.3 ns | 206.98 ns | 383.65 ns | 10,490.3 ns | 2808 B |
| Encrypt · AES-256-CCM (CryptoHives-ARM-AES) | 1KB | 1,716.5 ns | 1.43 ns | 1.34 ns | 1,716.7 ns | - |
| Encrypt · AES-256-CCM (CryptoHives-Scalar) | 1KB | 7,904.6 ns | 1.19 ns | 1.06 ns | 7,904.5 ns | - |
| Encrypt · AES-256-CCM (BouncyCastle) | 1KB | 8,874.2 ns | 3.47 ns | 3.24 ns | 8,873.8 ns | 2848 B |
| Decrypt · AES-256-CCM (CryptoHives-ARM-AES) | 8KB | 14,769.1 ns | 228.42 ns | 368.85 ns | 14,659.5 ns | - |
| Decrypt · AES-256-CCM (CryptoHives-Scalar) | 8KB | 74,334.0 ns | 1,429.61 ns | 1,701.85 ns | 74,752.2 ns | - |
| Decrypt · AES-256-CCM (BouncyCastle) | 8KB | 74,908.0 ns | 619.45 ns | 579.43 ns | 74,797.7 ns | 2808 B |
| Encrypt · AES-256-CCM (CryptoHives-ARM-AES) | 8KB | 13,200.3 ns | 4.76 ns | 3.98 ns | 13,201.8 ns | - |
| Encrypt · AES-256-CCM (CryptoHives-Scalar) | 8KB | 64,871.5 ns | 1,275.19 ns | 2,165.37 ns | 65,645.0 ns | - |
| Encrypt · AES-256-CCM (BouncyCastle) | 8KB | 71,548.6 ns | 1,389.83 ns | 1,300.05 ns | 71,899.2 ns | 2848 B |
| Decrypt · AES-256-CCM (CryptoHives-ARM-AES) | 128KB | 238,909.2 ns | 4,709.22 ns | 8,491.69 ns | 235,102.1 ns | - |
| Decrypt · AES-256-CCM (CryptoHives-Scalar) | 128KB | 1,147,014.8 ns | 22,832.17 ns | 32,745.21 ns | 1,157,165.6 ns | - |
| Decrypt · AES-256-CCM (BouncyCastle) | 128KB | 1,281,403.6 ns | 13,744.75 ns | 12,856.85 ns | 1,276,111.1 ns | 2808 B |
| Encrypt · AES-256-CCM (CryptoHives-ARM-AES) | 128KB | 226,366.6 ns | 1,819.51 ns | 1,612.95 ns | 226,620.7 ns | - |
| Encrypt · AES-256-CCM (CryptoHives-Scalar) | 128KB | 1,083,294.2 ns | 19,719.07 ns | 24,938.32 ns | 1,086,819.6 ns | - |
| Encrypt · AES-256-CCM (BouncyCastle) | 128KB | 1,156,635.3 ns | 17,336.29 ns | 20,637.62 ns | 1,156,524.3 ns | 2848 B |
ChaCha20-Poly1305
ChaCha20-Poly1305 is a software-friendly AEAD cipher (RFC 8439) that combines ChaCha20 stream encryption with Poly1305 MAC authentication. It is the recommended AEAD cipher when hardware AES acceleration is unavailable. Two acceleration tiers are available on ARM:
- Neon: Single-block ChaCha20 via Vector128<uint> combined with Poly1305 donna-64 MAC (3×44-bit limbs, 9 multiplications per 16-byte block using Math.BigMul). ~5× faster than OS at 128 B (1.84 μs vs 9.58 μs); competitive with OS at 1 KiB; OS leads from 8 KiB. At 128 KiB, Neon (~1.19 ms) is ~1.54× slower than OS (~0.77 ms) and on par with BouncyCastle (~1.18 ms). A dual-block NEON path (comparable to the x86 AVX2 path) would be required to close this gap.
- Managed: Scalar ChaCha20 + Poly1305 donna-32 (5×26-bit limbs, 25 multiplications per block on .NET Framework / .NET Standard). Fully portable.
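To make the single-block Vector128<uint> layout concrete, here is a minimal sketch of one ChaCha20 column round in which each vector holds one row of the 4×4 state. It assumes .NET 7 or later (for the Vector128 operators and shift helpers); ChaChaVectorSketch is a hypothetical name, and the library's actual kernel may be organised differently.

```csharp
using System.Runtime.Intrinsics;

// Hypothetical helper, for illustration only.
public static class ChaChaVectorSketch
{
    // Rotates every 32-bit lane left by `count` bits.
    private static Vector128<uint> RotateLeft(Vector128<uint> v, int count) =>
        Vector128.ShiftLeft(v, count) | Vector128.ShiftRightLogical(v, 32 - count);

    // One column round: rows a..d each hold four state words, so the RFC 8439
    // quarter round is applied to all four columns at once.
    public static void ColumnRound(ref Vector128<uint> a, ref Vector128<uint> b,
                                   ref Vector128<uint> c, ref Vector128<uint> d)
    {
        a += b; d ^= a; d = RotateLeft(d, 16);
        c += d; b ^= c; b = RotateLeft(b, 12);
        a += b; d ^= a; d = RotateLeft(d, 8);
        c += d; b ^= c; b = RotateLeft(b, 7);
    }
}
```

The diagonal round reuses the same routine after rotating rows b, c, and d by one, two, and three lanes. A dual-block variant would run two such states side by side to keep more of the vector pipeline busy.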
Key observations:
- Neon ~5× faster than OS at 128 B; ~1.4× faster than OS at 1 KiB; OS leads from ~8 KiB (~1.54× faster at 128 KiB)
- Neon beats BouncyCastle at 128 B (~1.3–1.5×) and narrowly at 1 KiB (~3–7%); the two are within ~1.5% of each other at 8 KiB and 128 KiB (potential improvement area: a dual-block NEON path)
- Managed and Neon paths are zero-allocation
- BouncyCastle allocates 336–416 B per call; NaCl.Core allocates 48–72 B per call
| Description | TestDataSize | Mean | Error | StdDev | Median | Allocated |
|---|---|---|---|---|---|---|
| Decrypt · ChaCha20-Poly1305 (CryptoHives-Neon) | 128B | 2.156 μs | 0.0046 μs | 0.0043 μs | 2.157 μs | - |
| Decrypt · ChaCha20-Poly1305 (BouncyCastle) | 128B | 3.288 μs | 0.0058 μs | 0.0049 μs | 3.287 μs | 416 B |
| Decrypt · ChaCha20-Poly1305 (NaCl.Core) | 128B | 4.272 μs | 0.0011 μs | 0.0010 μs | 4.272 μs | 48 B |
| Decrypt · ChaCha20-Poly1305 (CryptoHives-Scalar) | 128B | 6.557 μs | 0.1285 μs | 0.1428 μs | 6.672 μs | - |
| Decrypt · ChaCha20-Poly1305 (OS) | 128B | 11.071 μs | 0.1061 μs | 0.0941 μs | 11.038 μs | - |
| Encrypt · ChaCha20-Poly1305 (CryptoHives-Neon) | 128B | 1.836 μs | 0.0108 μs | 0.0101 μs | 1.840 μs | - |
| Encrypt · ChaCha20-Poly1305 (BouncyCastle) | 128B | 2.344 μs | 0.0040 μs | 0.0033 μs | 2.344 μs | 336 B |
| Encrypt · ChaCha20-Poly1305 (NaCl.Core) | 128B | 4.112 μs | 0.0013 μs | 0.0012 μs | 4.112 μs | 48 B |
| Encrypt · ChaCha20-Poly1305 (CryptoHives-Scalar) | 128B | 6.232 μs | 0.0103 μs | 0.0092 μs | 6.229 μs | - |
| Encrypt · ChaCha20-Poly1305 (OS) | 128B | 9.582 μs | 0.1183 μs | 0.1106 μs | 9.581 μs | - |
| Decrypt · ChaCha20-Poly1305 (CryptoHives-Neon) | 1KB | 10.548 μs | 0.0649 μs | 0.0607 μs | 10.533 μs | - |
| Decrypt · ChaCha20-Poly1305 (BouncyCastle) | 1KB | 11.339 μs | 0.0041 μs | 0.0032 μs | 11.338 μs | 416 B |
| Decrypt · ChaCha20-Poly1305 (OS) | 1KB | 15.822 μs | 0.0718 μs | 0.0672 μs | 15.797 μs | - |
| Decrypt · ChaCha20-Poly1305 (NaCl.Core) | 1KB | 19.073 μs | 0.0068 μs | 0.0056 μs | 19.074 μs | 72 B |
| Decrypt · ChaCha20-Poly1305 (CryptoHives-Scalar) | 1KB | 33.905 μs | 0.0143 μs | 0.0112 μs | 33.902 μs | - |
| Encrypt · ChaCha20-Poly1305 (CryptoHives-Neon) | 1KB | 10.092 μs | 0.0111 μs | 0.0104 μs | 10.090 μs | - |
| Encrypt · ChaCha20-Poly1305 (BouncyCastle) | 1KB | 10.421 μs | 0.0099 μs | 0.0083 μs | 10.418 μs | 336 B |
| Encrypt · ChaCha20-Poly1305 (OS) | 1KB | 14.138 μs | 0.1184 μs | 0.1108 μs | 14.106 μs | - |
| Encrypt · ChaCha20-Poly1305 (NaCl.Core) | 1KB | 18.914 μs | 0.0143 μs | 0.0111 μs | 18.917 μs | 72 B |
| Encrypt · ChaCha20-Poly1305 (CryptoHives-Scalar) | 1KB | 33.723 μs | 0.0110 μs | 0.0098 μs | 33.721 μs | - |
| Decrypt · ChaCha20-Poly1305 (OS) | 8KB | 54.149 μs | 0.1266 μs | 0.1123 μs | 54.176 μs | - |
| Decrypt · ChaCha20-Poly1305 (BouncyCastle) | 8KB | 75.021 μs | 0.1526 μs | 0.1191 μs | 75.002 μs | 416 B |
| Decrypt · ChaCha20-Poly1305 (CryptoHives-Neon) | 8KB | 75.338 μs | 0.2684 μs | 0.2095 μs | 75.238 μs | - |
| Decrypt · ChaCha20-Poly1305 (NaCl.Core) | 8KB | 137.033 μs | 0.0552 μs | 0.0517 μs | 137.046 μs | 72 B |
| Decrypt · ChaCha20-Poly1305 (CryptoHives-Scalar) | 8KB | 243.532 μs | 0.2211 μs | 0.1726 μs | 243.494 μs | - |
| Encrypt · ChaCha20-Poly1305 (OS) | 8KB | 51.838 μs | 0.1845 μs | 0.1636 μs | 51.836 μs | - |
| Encrypt · ChaCha20-Poly1305 (BouncyCastle) | 8KB | 74.306 μs | 0.0370 μs | 0.0328 μs | 74.303 μs | 336 B |
| Encrypt · ChaCha20-Poly1305 (CryptoHives-Neon) | 8KB | 75.394 μs | 0.1183 μs | 0.0988 μs | 75.344 μs | - |
| Encrypt · ChaCha20-Poly1305 (NaCl.Core) | 8KB | 136.922 μs | 0.1618 μs | 0.1434 μs | 136.941 μs | 72 B |
| Encrypt · ChaCha20-Poly1305 (CryptoHives-Scalar) | 8KB | 243.441 μs | 0.0565 μs | 0.0472 μs | 243.437 μs | - |
| Decrypt · ChaCha20-Poly1305 (OS) | 128KB | 773.679 μs | 2.9226 μs | 2.4405 μs | 774.563 μs | - |
| Decrypt · ChaCha20-Poly1305 (BouncyCastle) | 128KB | 1,174.941 μs | 2.0728 μs | 1.7309 μs | 1,174.399 μs | 416 B |
| Decrypt · ChaCha20-Poly1305 (CryptoHives-Neon) | 128KB | 1,189.233 μs | 3.4307 μs | 3.2091 μs | 1,188.708 μs | - |
| Decrypt · ChaCha20-Poly1305 (NaCl.Core) | 128KB | 2,173.125 μs | 5.2605 μs | 4.6633 μs | 2,172.153 μs | 72 B |
| Decrypt · ChaCha20-Poly1305 (CryptoHives-Scalar) | 128KB | 3,835.770 μs | 3.5903 μs | 3.1827 μs | 3,835.275 μs | - |
| Encrypt · ChaCha20-Poly1305 (OS) | 128KB | 720.085 μs | 2.2119 μs | 2.0690 μs | 720.637 μs | - |
| Encrypt · ChaCha20-Poly1305 (BouncyCastle) | 128KB | 1,180.734 μs | 2.7441 μs | 2.4326 μs | 1,181.029 μs | 336 B |
| Encrypt · ChaCha20-Poly1305 (CryptoHives-Neon) | 128KB | 1,193.358 μs | 1.4276 μs | 1.3354 μs | 1,192.924 μs | - |
| Encrypt · ChaCha20-Poly1305 (NaCl.Core) | 128KB | 2,158.831 μs | 0.5012 μs | 0.4443 μs | 2,158.781 μs | 72 B |
| Encrypt · ChaCha20-Poly1305 (CryptoHives-Scalar) | 128KB | 3,840.596 μs | 3.5656 μs | 3.1608 μs | 3,840.163 μs | - |
XChaCha20-Poly1305
XChaCha20-Poly1305 extends ChaCha20-Poly1305 with a 24-byte nonce (vs 12 bytes), making random nonce generation safe against collisions (birthday bound of ~2⁹⁶ messages vs ~2⁴⁸ for the 96-bit ChaCha20-Poly1305 nonce). The implementation prepends an HChaCha20 key derivation step that derives a subkey from the key and the first 16 bytes of the nonce. The same Neon / Managed acceleration tiers apply to the inner ChaCha20-Poly1305 operation.
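The nonce handling behind this construction can be sketched as below. The first 16 bytes feed HChaCha20, and the remaining 8 bytes, prefixed with four zero bytes, become the 12-byte nonce of the inner ChaCha20-Poly1305 call. XChaChaNonceSketch is a hypothetical name, and the HChaCha20 computation itself is left out.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class XChaChaNonceSketch
{
    // Splits a 24-byte XChaCha20 nonce into the 16-byte HChaCha20 input and the
    // 12-byte inner nonce (4 zero bytes followed by the last 8 nonce bytes).
    public static (byte[] HChaChaInput, byte[] InnerNonce) Split(ReadOnlySpan<byte> nonce)
    {
        if (nonce.Length != 24)
            throw new ArgumentException("XChaCha20 nonces are 24 bytes.", nameof(nonce));

        byte[] hchachaInput = nonce.Slice(0, 16).ToArray();
        byte[] innerNonce = new byte[12];                 // first 4 bytes stay zero
        nonce.Slice(16, 8).CopyTo(innerNonce.AsSpan(4));
        return (hchachaInput, innerNonce);
    }
}
```

Subkey derivation happens once per message, so its cost is independent of payload size.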
Key observations:
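- Same inner code path as ChaCha20-Poly1305, with HChaCha20 adding a constant ~400 ns; the absolute timings below are nonetheless ~2.5–5× lower than those in the ChaCha20-Poly1305 table (NaCl.Core included), so the two tables appear to come from different runs and should not be compared directly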
- Neon ~3.3× faster than Managed at 128 KiB; ~3.3× faster than NaCl.Core at 128 KiB
- No OS or BouncyCastle implementations available for comparison
- NaCl.Core allocates 48–72 B per call
- Managed and Neon paths are zero-allocation
| Description | TestDataSize | Mean | Error | StdDev | Median | Allocated |
|---|---|---|---|---|---|---|
| Decrypt · XChaCha20-Poly1305 (CryptoHives-Neon) | 128B | 870.8 ns | 7.85 ns | 6.96 ns | 869.1 ns | - |
| Decrypt · XChaCha20-Poly1305 (NaCl.Core) | 128B | 1,494.1 ns | 1.85 ns | 1.64 ns | 1,493.5 ns | 48 B |
| Decrypt · XChaCha20-Poly1305 (CryptoHives-Scalar) | 128B | 1,777.1 ns | 10.78 ns | 9.00 ns | 1,774.8 ns | - |
| Encrypt · XChaCha20-Poly1305 (CryptoHives-Neon) | 128B | 724.6 ns | 5.29 ns | 4.95 ns | 726.2 ns | - |
| Encrypt · XChaCha20-Poly1305 (NaCl.Core) | 128B | 1,448.5 ns | 1.33 ns | 1.18 ns | 1,447.9 ns | 48 B |
| Encrypt · XChaCha20-Poly1305 (CryptoHives-Scalar) | 128B | 1,681.4 ns | 9.47 ns | 8.39 ns | 1,681.9 ns | - |
| Decrypt · XChaCha20-Poly1305 (CryptoHives-Neon) | 1KB | 2,468.2 ns | 2.50 ns | 2.21 ns | 2,467.3 ns | - |
| Decrypt · XChaCha20-Poly1305 (NaCl.Core) | 1KB | 6,662.1 ns | 54.00 ns | 45.10 ns | 6,647.6 ns | 72 B |
| Decrypt · XChaCha20-Poly1305 (CryptoHives-Scalar) | 1KB | 7,248.1 ns | 28.61 ns | 23.89 ns | 7,255.3 ns | - |
| Encrypt · XChaCha20-Poly1305 (CryptoHives-Neon) | 1KB | 2,360.2 ns | 1.34 ns | 1.12 ns | 2,360.2 ns | - |
| Encrypt · XChaCha20-Poly1305 (NaCl.Core) | 1KB | 6,632.1 ns | 91.79 ns | 81.37 ns | 6,590.8 ns | 72 B |
| Encrypt · XChaCha20-Poly1305 (CryptoHives-Scalar) | 1KB | 7,140.9 ns | 22.92 ns | 20.32 ns | 7,141.7 ns | - |
| Decrypt · XChaCha20-Poly1305 (CryptoHives-Neon) | 8KB | 14,941.5 ns | 12.66 ns | 10.57 ns | 14,938.7 ns | - |
| Decrypt · XChaCha20-Poly1305 (NaCl.Core) | 8KB | 48,078.7 ns | 590.11 ns | 551.99 ns | 47,752.9 ns | 72 B |
| Decrypt · XChaCha20-Poly1305 (CryptoHives-Scalar) | 8KB | 49,552.9 ns | 159.09 ns | 141.03 ns | 49,545.3 ns | - |
| Encrypt · XChaCha20-Poly1305 (CryptoHives-Neon) | 8KB | 14,876.7 ns | 4.58 ns | 3.58 ns | 14,875.9 ns | - |
| Encrypt · XChaCha20-Poly1305 (CryptoHives-Scalar) | 8KB | 49,262.7 ns | 197.17 ns | 184.43 ns | 49,166.9 ns | - |
| Encrypt · XChaCha20-Poly1305 (NaCl.Core) | 8KB | 49,715.8 ns | 987.01 ns | 2,402.51 ns | 48,239.9 ns | 72 B |
| Decrypt · XChaCha20-Poly1305 (CryptoHives-Neon) | 128KB | 230,374.7 ns | 1,734.54 ns | 1,448.42 ns | 230,310.9 ns | - |
| Decrypt · XChaCha20-Poly1305 (NaCl.Core) | 128KB | 753,041.0 ns | 2,679.29 ns | 2,375.12 ns | 752,726.2 ns | 72 B |
| Decrypt · XChaCha20-Poly1305 (CryptoHives-Scalar) | 128KB | 778,248.7 ns | 5,998.17 ns | 5,317.22 ns | 777,958.1 ns | - |
| Encrypt · XChaCha20-Poly1305 (CryptoHives-Neon) | 128KB | 232,627.7 ns | 3,798.88 ns | 3,553.47 ns | 231,496.5 ns | - |
| Encrypt · XChaCha20-Poly1305 (NaCl.Core) | 128KB | 754,021.8 ns | 2,403.71 ns | 2,130.83 ns | 753,460.4 ns | 72 B |
| Encrypt · XChaCha20-Poly1305 (CryptoHives-Scalar) | 128KB | 785,492.5 ns | 3,332.60 ns | 2,782.87 ns | 784,970.5 ns | - |
Regional Block Ciphers
Regional block ciphers implement national cryptographic standards. All operate on 128-bit blocks in CBC mode. Benchmarks compare Managed implementations against BouncyCastle where available.
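Since every cipher in this section is measured in CBC mode, a minimal sketch of the chaining may help when reading the Encrypt and Decrypt rows. The block cipher is abstracted as a delegate; CbcSketch and BlockTransform are hypothetical names, padding is omitted, and the input and output buffers are assumed not to overlap.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class CbcSketch
{
    public const int BlockSize = 16;

    // Transforms exactly one 16-byte block; stands in for SM4, ARIA, Camellia, and so on.
    public delegate void BlockTransform(ReadOnlySpan<byte> input, Span<byte> output);

    public static void Encrypt(BlockTransform encryptBlock, ReadOnlySpan<byte> iv,
                               ReadOnlySpan<byte> plaintext, Span<byte> ciphertext)
    {
        Validate(iv, plaintext, ciphertext);
        Span<byte> chain = stackalloc byte[BlockSize];
        iv.CopyTo(chain);
        for (int offset = 0; offset < plaintext.Length; offset += BlockSize)
        {
            // XOR the plaintext block into the previous ciphertext block (or the IV).
            for (int i = 0; i < BlockSize; i++) chain[i] ^= plaintext[offset + i];
            // The result feeds the next iteration, so encryption is inherently serial.
            Span<byte> output = ciphertext.Slice(offset, BlockSize);
            encryptBlock(chain, output);
            output.CopyTo(chain);
        }
    }

    public static void Decrypt(BlockTransform decryptBlock, ReadOnlySpan<byte> iv,
                               ReadOnlySpan<byte> ciphertext, Span<byte> plaintext)
    {
        Validate(iv, ciphertext, plaintext);
        ReadOnlySpan<byte> previous = iv;
        for (int offset = 0; offset < ciphertext.Length; offset += BlockSize)
        {
            ReadOnlySpan<byte> block = ciphertext.Slice(offset, BlockSize);
            Span<byte> output = plaintext.Slice(offset, BlockSize);
            // Each block decrypts independently; only the final XOR needs its neighbour.
            decryptBlock(block, output);
            for (int i = 0; i < BlockSize; i++) output[i] ^= previous[i];
            previous = block;
        }
    }

    private static void Validate(ReadOnlySpan<byte> iv, ReadOnlySpan<byte> input, Span<byte> output)
    {
        if (iv.Length != BlockSize) throw new ArgumentException("IV must be 16 bytes.");
        if (input.Length % BlockSize != 0) throw new ArgumentException("Input must be a whole number of blocks.");
        if (output.Length < input.Length) throw new ArgumentException("Output buffer is too small.");
    }
}
```

The lack of a serial dependency on the decrypt side may contribute to the slightly lower Decrypt times in several of the tables below, although that is an inference from the mode's structure rather than something measured here.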
SM4-CBC (China)
SM4 is the Chinese national block cipher (GB/T 32907-2016). It uses a 128-bit key with 32 rounds of nonlinear key mixing.
- Managed: Lookup-table implementation with 32-bit word operations. Zero allocation.
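As an illustration of the 32-bit word operations involved, the sketch below shows the linear transform L from the SM4 round function as defined in GB/T 32907-2016. Sm4Sketch is a hypothetical name; how the library combines this step with the S-box in its lookup tables is not described here.

```csharp
using System.Numerics;

// Hypothetical helper, for illustration only.
public static class Sm4Sketch
{
    // Linear transform L of the SM4 round function:
    // L(B) = B ^ (B <<< 2) ^ (B <<< 10) ^ (B <<< 18) ^ (B <<< 24).
    public static uint LinearTransform(uint b) =>
        b ^ BitOperations.RotateLeft(b, 2)
          ^ BitOperations.RotateLeft(b, 10)
          ^ BitOperations.RotateLeft(b, 18)
          ^ BitOperations.RotateLeft(b, 24);
}
```

Table-driven implementations commonly fold this transform into precomputed 256-entry word tables so that each round reduces to a few lookups and XORs.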
| Description | TestDataSize | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|---|
| Decrypt · SM4-CBC (CryptoHives-Scalar) | 128B | 921.5 ns | 12.28 ns | 10.89 ns | - |
| Decrypt · SM4-CBC (BouncyCastle) | 128B | 1,411.9 ns | 8.01 ns | 6.25 ns | 40 B |
| Encrypt · SM4-CBC (CryptoHives-Scalar) | 128B | 1,036.8 ns | 4.76 ns | 4.45 ns | - |
| Encrypt · SM4-CBC (BouncyCastle) | 128B | 1,472.8 ns | 5.07 ns | 4.49 ns | 40 B |
| Decrypt · SM4-CBC (CryptoHives-Scalar) | 1KB | 6,479.2 ns | 26.79 ns | 22.37 ns | - |
| Decrypt · SM4-CBC (BouncyCastle) | 1KB | 8,835.6 ns | 115.11 ns | 107.67 ns | 40 B |
| Encrypt · SM4-CBC (CryptoHives-Scalar) | 1KB | 7,430.8 ns | 16.52 ns | 15.45 ns | - |
| Encrypt · SM4-CBC (BouncyCastle) | 1KB | 9,525.7 ns | 71.14 ns | 63.07 ns | 40 B |
| Decrypt · SM4-CBC (CryptoHives-Scalar) | 8KB | 51,202.7 ns | 428.44 ns | 400.76 ns | - |
| Decrypt · SM4-CBC (BouncyCastle) | 8KB | 67,298.9 ns | 334.28 ns | 279.13 ns | 40 B |
| Encrypt · SM4-CBC (CryptoHives-Scalar) | 8KB | 58,660.9 ns | 176.29 ns | 147.21 ns | - |
| Encrypt · SM4-CBC (BouncyCastle) | 8KB | 73,962.4 ns | 398.15 ns | 352.95 ns | 40 B |
| Decrypt · SM4-CBC (CryptoHives-Scalar) | 128KB | 826,478.1 ns | 13,701.75 ns | 15,778.96 ns | - |
| Decrypt · SM4-CBC (BouncyCastle) | 128KB | 1,076,688.9 ns | 4,939.38 ns | 4,378.63 ns | 40 B |
| Encrypt · SM4-CBC (CryptoHives-Scalar) | 128KB | 937,807.3 ns | 1,932.93 ns | 1,713.49 ns | - |
| Encrypt · SM4-CBC (BouncyCastle) | 128KB | 1,176,637.9 ns | 6,588.28 ns | 6,162.68 ns | 40 B |
ARIA-128-CBC (Korea)
ARIA is a Korean national cipher (KS X 1213) with an involutional SPN structure. ARIA-128 uses 12 rounds.
- Managed: S-box substitution with byte-level diffusion layer. Zero allocation.
| Description | TestDataSize | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|---|
| Decrypt · ARIA-128-CBC (CryptoHives-Scalar) | 128B | 927.8 ns | 1.82 ns | 1.70 ns | - |
| Decrypt · ARIA-128-CBC (BouncyCastle) | 128B | 2,324.1 ns | 8.46 ns | 7.91 ns | 1288 B |
| Encrypt · ARIA-128-CBC (CryptoHives-Scalar) | 128B | 941.6 ns | 4.68 ns | 3.91 ns | - |
| Encrypt · ARIA-128-CBC (BouncyCastle) | 128B | 2,240.6 ns | 7.23 ns | 6.76 ns | 1288 B |
| Decrypt · ARIA-128-CBC (CryptoHives-Scalar) | 1KB | 6,627.7 ns | 4.24 ns | 3.76 ns | - |
| Decrypt · ARIA-128-CBC (BouncyCastle) | 1KB | 14,412.4 ns | 40.69 ns | 36.07 ns | 3528 B |
| Encrypt · ARIA-128-CBC (CryptoHives-Scalar) | 1KB | 6,779.2 ns | 12.17 ns | 10.79 ns | - |
| Encrypt · ARIA-128-CBC (BouncyCastle) | 1KB | 14,089.4 ns | 36.20 ns | 33.86 ns | 3528 B |
| Decrypt · ARIA-128-CBC (CryptoHives-Scalar) | 8KB | 52,162.7 ns | 46.58 ns | 43.57 ns | - |
| Decrypt · ARIA-128-CBC (BouncyCastle) | 8KB | 109,308.5 ns | 199.95 ns | 187.04 ns | 21448 B |
| Encrypt · ARIA-128-CBC (CryptoHives-Scalar) | 8KB | 52,727.6 ns | 693.84 ns | 649.02 ns | - |
| Encrypt · ARIA-128-CBC (BouncyCastle) | 8KB | 106,426.1 ns | 272.56 ns | 241.62 ns | 21448 B |
| Decrypt · ARIA-128-CBC (CryptoHives-Scalar) | 128KB | 830,361.3 ns | 2,946.62 ns | 2,756.27 ns | - |
| Decrypt · ARIA-128-CBC (BouncyCastle) | 128KB | 1,746,905.1 ns | 4,124.45 ns | 3,858.02 ns | 328648 B |
| Encrypt · ARIA-128-CBC (CryptoHives-Scalar) | 128KB | 855,540.8 ns | 724.33 ns | 565.51 ns | - |
| Encrypt · ARIA-128-CBC (BouncyCastle) | 128KB | 1,719,585.5 ns | 3,806.57 ns | 3,560.67 ns | 328648 B |
ARIA-256-CBC (Korea)
ARIA-256 uses 16 rounds for 256-bit key security. The same SPN structure applies with additional rounds.
| Description | TestDataSize | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|---|
| Decrypt · ARIA-256-CBC (CryptoHives-Scalar) | 128B | 1.210 μs | 0.0005 μs | 0.0005 μs | - |
| Decrypt · ARIA-256-CBC (BouncyCastle) | 128B | 3.003 μs | 0.0050 μs | 0.0047 μs | 1496 B |
| Encrypt · ARIA-256-CBC (CryptoHives-Scalar) | 128B | 1.245 μs | 0.0011 μs | 0.0010 μs | - |
| Encrypt · ARIA-256-CBC (BouncyCastle) | 128B | 2.914 μs | 0.0058 μs | 0.0054 μs | 1496 B |
| Decrypt · ARIA-256-CBC (CryptoHives-Scalar) | 1KB | 8.639 μs | 0.0104 μs | 0.0097 μs | - |
| Decrypt · ARIA-256-CBC (BouncyCastle) | 1KB | 18.666 μs | 0.0537 μs | 0.0476 μs | 3736 B |
| Encrypt · ARIA-256-CBC (CryptoHives-Scalar) | 1KB | 8.922 μs | 0.0042 μs | 0.0039 μs | - |
| Encrypt · ARIA-256-CBC (BouncyCastle) | 1KB | 18.424 μs | 0.0485 μs | 0.0454 μs | 3736 B |
| Decrypt · ARIA-256-CBC (CryptoHives-Scalar) | 8KB | 68.079 μs | 0.0810 μs | 0.0758 μs | - |
| Decrypt · ARIA-256-CBC (BouncyCastle) | 8KB | 143.145 μs | 0.3001 μs | 0.2807 μs | 21656 B |
| Encrypt · ARIA-256-CBC (CryptoHives-Scalar) | 8KB | 70.319 μs | 0.1612 μs | 0.1346 μs | - |
| Encrypt · ARIA-256-CBC (BouncyCastle) | 8KB | 140.554 μs | 0.2828 μs | 0.2645 μs | 21656 B |
| Decrypt · ARIA-256-CBC (CryptoHives-Scalar) | 128KB | 1,088.538 μs | 2.2789 μs | 2.1317 μs | - |
| Decrypt · ARIA-256-CBC (BouncyCastle) | 128KB | 2,266.721 μs | 6.7677 μs | 6.3305 μs | 328856 B |
| Encrypt · ARIA-256-CBC (CryptoHives-Scalar) | 128KB | 1,124.559 μs | 0.7575 μs | 0.7085 μs | - |
| Encrypt · ARIA-256-CBC (BouncyCastle) | 128KB | 2,246.388 μs | 6.5943 μs | 6.1683 μs | 328856 B |
Camellia-128-CBC (Japan)
Camellia is a Japanese CRYPTREC/NESSIE cipher (RFC 3713) with a Feistel structure and FL/FL⁻¹ key-dependent layers.
- Managed: Pre-computed SP-box tables with 6 S-boxes. Zero allocation.
| Description | TestDataSize | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|---|
| Decrypt · Camellia-128-CBC (CryptoHives-Scalar) | 128B | 583.4 ns | 2.61 ns | 2.44 ns | - |
| Decrypt · Camellia-128-CBC (BouncyCastle) | 128B | 900.0 ns | 3.16 ns | 2.96 ns | 576 B |
| Encrypt · Camellia-128-CBC (CryptoHives-Scalar) | 128B | 635.4 ns | 3.71 ns | 3.47 ns | - |
| Encrypt · Camellia-128-CBC (BouncyCastle) | 128B | 892.1 ns | 2.70 ns | 2.52 ns | 576 B |
| Decrypt · Camellia-128-CBC (CryptoHives-Scalar) | 1KB | 4,109.9 ns | 20.70 ns | 19.36 ns | - |
| Decrypt · Camellia-128-CBC (BouncyCastle) | 1KB | 5,837.9 ns | 17.67 ns | 16.53 ns | 2816 B |
| Encrypt · Camellia-128-CBC (CryptoHives-Scalar) | 1KB | 4,670.0 ns | 14.73 ns | 13.78 ns | - |
| Encrypt · Camellia-128-CBC (BouncyCastle) | 1KB | 5,871.9 ns | 20.41 ns | 19.09 ns | 2816 B |
| Decrypt · Camellia-128-CBC (CryptoHives-Scalar) | 8KB | 32,465.9 ns | 140.94 ns | 131.84 ns | - |
| Decrypt · Camellia-128-CBC (BouncyCastle) | 8KB | 45,009.8 ns | 137.31 ns | 128.44 ns | 20736 B |
| Encrypt · Camellia-128-CBC (CryptoHives-Scalar) | 8KB | 37,064.0 ns | 164.05 ns | 153.45 ns | - |
| Encrypt · Camellia-128-CBC (BouncyCastle) | 8KB | 45,261.6 ns | 154.66 ns | 144.67 ns | 20736 B |
| Decrypt · Camellia-128-CBC (CryptoHives-Scalar) | 128KB | 521,839.9 ns | 2,451.14 ns | 2,292.80 ns | - |
| Decrypt · Camellia-128-CBC (BouncyCastle) | 128KB | 728,615.9 ns | 2,845.84 ns | 2,662.00 ns | 327936 B |
| Encrypt · Camellia-128-CBC (CryptoHives-Scalar) | 128KB | 595,950.4 ns | 2,549.53 ns | 2,384.83 ns | - |
| Encrypt · Camellia-128-CBC (BouncyCastle) | 128KB | 718,452.6 ns | 2,779.84 ns | 2,321.29 ns | 327936 B |
Camellia-256-CBC (Japan)
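Camellia-256 uses 24 rounds (vs 18 for 128-bit) and one more FL/FL⁻¹ layer (three vs two). The extra layer adds minimal overhead; the ~37–43% slowdown relative to Camellia-128 in the tables tracks the 33% increase in rounds.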
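The FL/FL⁻¹ functions are cheap because they use only AND, OR, XOR, and a one-bit rotate. The following sketch follows the definitions in RFC 3713; CamelliaFlSketch is a hypothetical name and the code is not taken from the library.

```csharp
using System.Numerics;

// Hypothetical helper, for illustration only.
public static class CamelliaFlSketch
{
    // FL from RFC 3713: mixes a 64-bit half-state with a 64-bit subkey.
    public static ulong FL(ulong x, ulong subkey)
    {
        uint x1 = (uint)(x >> 32), x2 = (uint)x;
        uint k1 = (uint)(subkey >> 32), k2 = (uint)subkey;
        x2 ^= BitOperations.RotateLeft(x1 & k1, 1);
        x1 ^= x2 | k2;
        return ((ulong)x1 << 32) | x2;
    }

    // FL⁻¹ from RFC 3713: undoes FL for the same subkey.
    public static ulong FLInverse(ulong y, ulong subkey)
    {
        uint y1 = (uint)(y >> 32), y2 = (uint)y;
        uint k1 = (uint)(subkey >> 32), k2 = (uint)subkey;
        y1 ^= y2 | k2;
        y2 ^= BitOperations.RotateLeft(y1 & k1, 1);
        return ((ulong)y1 << 32) | y2;
    }
}
```

Neither function needs a table lookup, which is consistent with the additional layer costing little next to six more Feistel rounds.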
| Description | TestDataSize | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|---|
| Decrypt · Camellia-256-CBC (CryptoHives-Scalar) | 128B | 831.3 ns | 3.36 ns | 2.98 ns | - |
| Decrypt · Camellia-256-CBC (BouncyCastle) | 128B | 1,196.5 ns | 3.71 ns | 3.29 ns | 592 B |
| Encrypt · Camellia-256-CBC (CryptoHives-Scalar) | 128B | 888.3 ns | 4.69 ns | 4.39 ns | - |
| Encrypt · Camellia-256-CBC (BouncyCastle) | 128B | 1,167.4 ns | 5.89 ns | 5.51 ns | 592 B |
| Decrypt · Camellia-256-CBC (CryptoHives-Scalar) | 1KB | 5,884.4 ns | 71.64 ns | 63.50 ns | - |
| Decrypt · Camellia-256-CBC (BouncyCastle) | 1KB | 7,653.6 ns | 18.85 ns | 17.63 ns | 2832 B |
| Encrypt · Camellia-256-CBC (CryptoHives-Scalar) | 1KB | 6,439.5 ns | 20.14 ns | 18.84 ns | - |
| Encrypt · Camellia-256-CBC (BouncyCastle) | 1KB | 7,675.2 ns | 25.75 ns | 20.11 ns | 2832 B |
| Decrypt · Camellia-256-CBC (CryptoHives-Scalar) | 8KB | 46,431.4 ns | 179.93 ns | 168.31 ns | - |
| Decrypt · Camellia-256-CBC (BouncyCastle) | 8KB | 59,034.3 ns | 99.45 ns | 93.03 ns | 20752 B |
| Encrypt · Camellia-256-CBC (CryptoHives-Scalar) | 8KB | 51,102.3 ns | 236.79 ns | 209.91 ns | - |
| Encrypt · Camellia-256-CBC (BouncyCastle) | 8KB | 59,348.3 ns | 200.40 ns | 187.46 ns | 20752 B |
| Decrypt · Camellia-256-CBC (CryptoHives-Scalar) | 128KB | 742,813.1 ns | 3,121.44 ns | 2,606.54 ns | - |
| Decrypt · Camellia-256-CBC (BouncyCastle) | 128KB | 939,621.1 ns | 2,158.14 ns | 1,802.14 ns | 327952 B |
| Encrypt · Camellia-256-CBC (CryptoHives-Scalar) | 128KB | 818,136.6 ns | 4,710.09 ns | 3,933.14 ns | - |
| Encrypt · Camellia-256-CBC (BouncyCastle) | 128KB | 946,057.0 ns | 4,722.34 ns | 3,943.36 ns | 327952 B |
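
The cost of the larger key can be related to the round count by comparing the 128 KB CryptoHives-Scalar rows of the two Camellia tables. The Python sketch below is illustrative arithmetic on the published means, not part of the benchmark suite; the dictionary and function names are hypothetical.

```python
# Mean times in ns at 128 KB (CryptoHives-Scalar), copied from the Camellia tables.
CAMELLIA_128_NS = {"Decrypt": 521_839.9, "Encrypt": 595_950.4}
CAMELLIA_256_NS = {"Decrypt": 742_813.1, "Encrypt": 818_136.6}

# Round counts stated in the text: 18 for Camellia-128, 24 for Camellia-256.
ROUND_RATIO = 24 / 18


def key_size_slowdown(operation: str) -> float:
    """Ratio of Camellia-256 to Camellia-128 mean time for one operation."""
    return CAMELLIA_256_NS[operation] / CAMELLIA_128_NS[operation]


for operation in ("Decrypt", "Encrypt"):
    print(f"{operation}: {key_size_slowdown(operation):.2f}x measured, "
          f"{ROUND_RATIO:.2f}x expected from round count alone")
```

On these numbers the measured slowdown is about 1.42× for decrypt and 1.37× for encrypt, slightly above the 1.33× round ratio.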
Kuznyechik-CBC (Russia)
Kuznyechik (GOST R 34.12-2015) is the modern Russian cipher with a 256-bit key and 10 rounds. It replaces the older GOST 28147-89.
- Managed: Pre-computed S-box and linear transformation tables. Zero allocation.
| Description | TestDataSize | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|---|
| Decrypt · Kuznyechik-CBC (CryptoHives-Scalar) | 128B | 391.7 μs | 7.41 μs | 6.57 μs | - |
| Encrypt · Kuznyechik-CBC (CryptoHives-Scalar) | 128B | 408.6 μs | 8.11 μs | 15.23 μs | - |
| Decrypt · Kuznyechik-CBC (CryptoHives-Scalar) | 1KB | 3,107.0 μs | 17.59 μs | 14.68 μs | - |
| Encrypt · Kuznyechik-CBC (CryptoHives-Scalar) | 1KB | 3,284.4 μs | 28.57 μs | 22.31 μs | - |
| Decrypt · Kuznyechik-CBC (CryptoHives-Scalar) | 8KB | 26,156.3 μs | 186.13 μs | 155.42 μs | - |
| Encrypt · Kuznyechik-CBC (CryptoHives-Scalar) | 8KB | 27,158.4 μs | 532.26 μs | 612.95 μs | - |
| Decrypt · Kuznyechik-CBC (CryptoHives-Scalar) | 128KB | 418,542.0 μs | 5,012.39 μs | 4,185.57 μs | - |
| Encrypt · Kuznyechik-CBC (CryptoHives-Scalar) | 128KB | 420,882.6 μs | 2,505.93 μs | 2,221.44 μs | - |
Kalyna-128-CBC (Ukraine)
Kalyna (DSTU 7624:2014) is the Ukrainian national cipher paired with the Kupyna hash family. Uses MDS matrix diffusion.
- Managed: S-box substitution with MDS matrix multiplication. Zero allocation.
| Description | TestDataSize | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|---|
| Decrypt · Kalyna-128-CBC (CryptoHives-Scalar) | 128B | 798.7 ns | 2.73 ns | 2.42 ns | - |
| Decrypt · Kalyna-128-CBC (BouncyCastle) | 128B | 2,425.8 ns | 3.09 ns | 2.89 ns | 872 B |
| Encrypt · Kalyna-128-CBC (CryptoHives-Scalar) | 128B | 396.9 ns | 1.65 ns | 1.38 ns | - |
| Encrypt · Kalyna-128-CBC (BouncyCastle) | 128B | 1,252.7 ns | 4.56 ns | 4.26 ns | 872 B |
| Decrypt · Kalyna-128-CBC (CryptoHives-Scalar) | 1KB | 5,642.5 ns | 17.20 ns | 16.09 ns | - |
| Decrypt · Kalyna-128-CBC (BouncyCastle) | 1KB | 15,461.9 ns | 12.48 ns | 11.06 ns | 872 B |
| Encrypt · Kalyna-128-CBC (CryptoHives-Scalar) | 1KB | 2,886.6 ns | 12.01 ns | 11.23 ns | - |
| Encrypt · Kalyna-128-CBC (BouncyCastle) | 1KB | 7,027.7 ns | 23.47 ns | 21.96 ns | 872 B |
| Decrypt · Kalyna-128-CBC (CryptoHives-Scalar) | 8KB | 44,408.3 ns | 191.24 ns | 169.53 ns | - |
| Decrypt · Kalyna-128-CBC (BouncyCastle) | 8KB | 119,614.1 ns | 88.51 ns | 78.46 ns | 872 B |
| Encrypt · Kalyna-128-CBC (CryptoHives-Scalar) | 8KB | 22,890.9 ns | 66.92 ns | 62.60 ns | - |
| Encrypt · Kalyna-128-CBC (BouncyCastle) | 8KB | 53,186.3 ns | 138.19 ns | 122.50 ns | 872 B |
| Decrypt · Kalyna-128-CBC (CryptoHives-Scalar) | 128KB | 706,983.4 ns | 2,881.66 ns | 2,249.81 ns | - |
| Decrypt · Kalyna-128-CBC (BouncyCastle) | 128KB | 1,902,282.2 ns | 2,917.11 ns | 2,277.49 ns | 872 B |
| Encrypt · Kalyna-128-CBC (CryptoHives-Scalar) | 128KB | 369,102.1 ns | 873.51 ns | 817.08 ns | - |
| Encrypt · Kalyna-128-CBC (BouncyCastle) | 128KB | 847,327.0 ns | 1,901.11 ns | 1,685.28 ns | 872 B |
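
The per-call means in the table can be converted into throughput and into a speedup over BouncyCastle. The Python sketch below does this for the 128 KB rows; it assumes "128KB" means 128 KiB (131,072 bytes), and the helper names are hypothetical rather than part of any library.

```python
# Mean times in ns at 128 KB, copied from the Kalyna-128-CBC table.
# Assumption: "128KB" means 128 KiB = 131,072 bytes.
PAYLOAD_BYTES = 128 * 1024

MEAN_NS = {
    ("Decrypt", "CryptoHives-Scalar"): 706_983.4,
    ("Decrypt", "BouncyCastle"): 1_902_282.2,
    ("Encrypt", "CryptoHives-Scalar"): 369_102.1,
    ("Encrypt", "BouncyCastle"): 847_327.0,
}


def throughput_mib_per_s(mean_ns: float, payload_bytes: int = PAYLOAD_BYTES) -> float:
    """Convert a per-call mean time into MiB processed per second."""
    seconds = mean_ns * 1e-9
    return payload_bytes / (1024 * 1024) / seconds


def speedup(operation: str) -> float:
    """How many times faster CryptoHives-Scalar is than BouncyCastle."""
    return (MEAN_NS[(operation, "BouncyCastle")]
            / MEAN_NS[(operation, "CryptoHives-Scalar")])


for operation in ("Decrypt", "Encrypt"):
    ch = throughput_mib_per_s(MEAN_NS[(operation, "CryptoHives-Scalar")])
    print(f"{operation}: {ch:.0f} MiB/s, {speedup(operation):.2f}x vs BouncyCastle")
```

For these rows the sketch gives roughly 177 MiB/s for decrypt and 339 MiB/s for encrypt, which is about 2.7× and 2.3× faster than BouncyCastle respectively.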
Kalyna-256-CBC (Ukraine)
Kalyna-256 uses 14 rounds (vs 10 for 128-bit key). The same MDS-based architecture applies.
| Description | TestDataSize | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|---|
| Decrypt · Kalyna-256-CBC (CryptoHives-Scalar) | 128B | 1,128.5 ns | 3.81 ns | 3.18 ns | - |
| Decrypt · Kalyna-256-CBC (BouncyCastle) | 128B | 3,303.0 ns | 2.29 ns | 2.03 ns | 1112 B |
| Encrypt · Kalyna-256-CBC (CryptoHives-Scalar) | 128B | 554.4 ns | 2.69 ns | 2.10 ns | - |
| Encrypt · Kalyna-256-CBC (BouncyCastle) | 128B | 1,699.2 ns | 3.38 ns | 2.82 ns | 1112 B |
| Decrypt · Kalyna-256-CBC (CryptoHives-Scalar) | 1KB | 8,036.5 ns | 31.95 ns | 24.95 ns | - |
| Decrypt · Kalyna-256-CBC (BouncyCastle) | 1KB | 21,269.0 ns | 16.57 ns | 12.93 ns | 1112 B |
| Encrypt · Kalyna-256-CBC (CryptoHives-Scalar) | 1KB | 4,010.7 ns | 17.98 ns | 15.01 ns | - |
| Encrypt · Kalyna-256-CBC (BouncyCastle) | 1KB | 9,626.2 ns | 19.34 ns | 17.15 ns | 1112 B |
| Decrypt · Kalyna-256-CBC (CryptoHives-Scalar) | 8KB | 63,253.3 ns | 244.99 ns | 229.16 ns | - |
| Decrypt · Kalyna-256-CBC (BouncyCastle) | 8KB | 164,958.9 ns | 155.34 ns | 145.31 ns | 1112 B |
| Encrypt · Kalyna-256-CBC (CryptoHives-Scalar) | 8KB | 31,585.1 ns | 164.62 ns | 145.93 ns | - |
| Encrypt · Kalyna-256-CBC (BouncyCastle) | 8KB | 73,257.0 ns | 363.53 ns | 322.26 ns | 1112 B |
| Decrypt · Kalyna-256-CBC (CryptoHives-Scalar) | 128KB | 1,005,918.9 ns | 3,380.08 ns | 3,161.73 ns | - |
| Decrypt · Kalyna-256-CBC (BouncyCastle) | 128KB | 2,628,385.5 ns | 1,761.63 ns | 1,375.36 ns | 1112 B |
| Encrypt · Kalyna-256-CBC (CryptoHives-Scalar) | 128KB | 506,426.0 ns | 1,779.25 ns | 1,577.25 ns | - |
| Encrypt · Kalyna-256-CBC (BouncyCastle) | 128KB | 1,164,810.6 ns | 4,690.58 ns | 4,158.07 ns | 1112 B |
SEED-CBC (Korea)
SEED is a Korean cipher (RFC 4269, KISA) with a 128-bit key and 16-round Feistel structure. Its key-schedule constants are derived from the golden ratio.
- Managed: Pre-computed 32-bit SS-boxes (SS0–SS3). Zero allocation.
| Description | TestDataSize | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|---|
| Decrypt · SEED-CBC (CryptoHives-Scalar) | 128B | 1.352 μs | 0.0154 μs | 0.0128 μs | - |
| Decrypt · SEED-CBC (BouncyCastle) | 128B | 1.438 μs | 0.0145 μs | 0.0121 μs | 152 B |
| Encrypt · SEED-CBC (BouncyCastle) | 128B | 1.475 μs | 0.0066 μs | 0.0058 μs | 152 B |
| Encrypt · SEED-CBC (CryptoHives-Scalar) | 128B | 1.493 μs | 0.0072 μs | 0.0067 μs | - |
| Decrypt · SEED-CBC (CryptoHives-Scalar) | 1KB | 9.553 μs | 0.0354 μs | 0.0314 μs | - |
| Decrypt · SEED-CBC (BouncyCastle) | 1KB | 9.780 μs | 0.0552 μs | 0.0490 μs | 152 B |
| Encrypt · SEED-CBC (BouncyCastle) | 1KB | 10.390 μs | 0.1815 μs | 0.1698 μs | 152 B |
| Encrypt · SEED-CBC (CryptoHives-Scalar) | 1KB | 11.119 μs | 0.1949 μs | 0.2733 μs | - |
| Decrypt · SEED-CBC (CryptoHives-Scalar) | 8KB | 74.943 μs | 0.3096 μs | 0.2896 μs | - |
| Decrypt · SEED-CBC (BouncyCastle) | 8KB | 76.362 μs | 0.3769 μs | 0.3342 μs | 152 B |
| Encrypt · SEED-CBC (BouncyCastle) | 8KB | 80.960 μs | 1.2714 μs | 1.0617 μs | 152 B |
| Encrypt · SEED-CBC (CryptoHives-Scalar) | 8KB | 85.853 μs | 0.4465 μs | 0.3958 μs | - |
| Decrypt · SEED-CBC (CryptoHives-Scalar) | 128KB | 1,192.777 μs | 8.1407 μs | 7.2165 μs | - |
| Decrypt · SEED-CBC (BouncyCastle) | 128KB | 1,225.473 μs | 8.5432 μs | 7.9913 μs | 152 B |
| Encrypt · SEED-CBC (BouncyCastle) | 128KB | 1,286.279 μs | 4.3548 μs | 3.8605 μs | 152 B |
| Encrypt · SEED-CBC (CryptoHives-Scalar) | 128KB | 1,368.396 μs | 9.5930 μs | 7.4896 μs | - |
Allocation Summary
All CryptoHives cipher implementations achieve zero heap allocation for both encrypt and decrypt operations across all payload sizes. This is critical for high-throughput scenarios such as network packet processing, where GC pressure directly impacts tail latency.
| Implementation | Allocation | Notes |
|---|---|---|
| CryptoHives (all variants) | 0 B | All tiers (Managed, ArmAes, ArmAes+ArmPmull, Neon) are zero-allocation at all payload sizes |
| OS (.NET) — GCM / ChaCha20-Poly1305 | 0 B | OS AEAD implementations are zero-allocation |
| OS (.NET) — CBC | 72 B | Fixed P/Invoke marshalling overhead per call, independent of payload size |
| BouncyCastle — AES-CBC | 832–1,024 B | Fixed per-call allocation (832 B for AES-128, 1,024 B for AES-256); other BouncyCastle CBC ciphers differ, see the per-cipher tables |
| BouncyCastle — GCM | 1,520–1,744 B | Fixed per-call allocation (1,520 B for AES-128 encrypt, 1,744 B for AES-256 decrypt) |
| BouncyCastle — CCM | 2,424–2,848 B | Fixed per-call allocation (2,424 B for AES-128 decrypt, 2,848 B for AES-256 encrypt) |
| BouncyCastle — ChaCha20-Poly1305 | 336–416 B | Varies slightly by payload size |
| BouncyCastle — ChaCha20 | 96 B | Fixed per-call allocation |
| NaCl.Core — ChaCha20 | 24 B | Small fixed allocation |
| NaCl.Core — ChaCha20-Poly1305 / XChaCha20 | 48–72 B | Small allocation, varies by payload size |
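
The GC-pressure argument can be made concrete by multiplying the per-call allocation by the call rate. The Python sketch below estimates an upper bound on the garbage produced by one thread calling back-to-back, using the 1KB decrypt rows from the Camellia-256-CBC and Kalyna-128-CBC tables. The function name is hypothetical, and real allocation rates depend on the workload.

```python
def allocation_rate_mib_per_s(allocated_bytes: int, mean_ns: float) -> float:
    """Upper-bound garbage rate for one thread calling back-to-back."""
    calls_per_second = 1e9 / mean_ns
    return allocated_bytes * calls_per_second / (1024 * 1024)


# (allocated bytes per call, mean ns per call) from the 1KB decrypt rows.
rows = {
    "BouncyCastle Camellia-256-CBC": (2832, 7_653.6),
    "BouncyCastle Kalyna-128-CBC": (872, 15_461.9),
    "CryptoHives Camellia-256-CBC": (0, 5_884.4),
}

for name, (allocated, mean_ns) in rows.items():
    rate = allocation_rate_mib_per_s(allocated, mean_ns)
    print(f"{name}: {rate:.1f} MiB/s allocated")
```

Under these assumptions the BouncyCastle rows correspond to roughly 353 MiB/s and 54 MiB/s of short-lived allocations per thread, while the zero-allocation path produces none.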