Boost.Corosio Performance Benchmarks
Executive Summary
This report presents comprehensive performance benchmarks comparing Boost.Corosio, Boost.Asio with coroutines (co_spawn/use_awaitable), and Boost.Asio with callbacks on Windows using the IOCP (I/O Completion Ports) backend. The benchmarks cover handler dispatch, socket throughput, socket latency, HTTP server workloads, timers, and connection churn.
Bottom Line
Corosio outperforms Asio coroutines in handler dispatch (9-50% faster) and scales dramatically better under multi-threaded load. It delivers equivalent performance in socket I/O, latency, and HTTP server workloads. Asio callbacks achieve the highest raw single-threaded dispatch throughput, but Corosio closes the gap as thread counts increase.
Where Corosio Excels
- Multi-threaded handler scaling: Best scaling of all three, retaining 89% of single-thread throughput at 8 threads vs 58% (Asio coroutines) and 53% (Asio callbacks)
- Concurrent post and run: 46% faster than Asio coroutines (2.35 Mops/s vs 1.61 Mops/s)
- Interleaved post/run: 34% faster than Asio coroutines (2.14 Mops/s vs 1.60 Mops/s)
- HTTP concurrent connections: 2-7% higher throughput than Asio coroutines (5-7% up to 16 connections)
Where Asio Callbacks Leads
- Single-threaded handler post: 51% faster than Corosio (2.59 Mops/s vs 1.71 Mops/s)
- Bidirectional socket throughput: 2.6× higher at large buffers (5.74 GB/s vs 2.18 GB/s at 64KB)
Where Asio Has an Edge
- Timer schedule/cancel: 10× faster (35-38 Mops/s vs 3.44 Mops/s)
- Bidirectional socket throughput at large buffers: Asio coroutines 2.5× faster than Corosio
Where They’re Equal
- Unidirectional socket throughput: Within about 12% across all buffer sizes, and within 6% at 16 KB and above
- Socket latency: Ping-pong mean within 5%, p99 within 3%
- HTTP server throughput: Corosio within 5% of both Asio variants at all thread counts
- Concurrent timer latency: Mean within 0.06 ms across all implementations
Key Insights
| Component | Assessment |
|---|---|
| Handler Dispatch | Corosio 9-50% faster than Asio coroutines; Asio callbacks fastest single-threaded |
| Multi-threaded Scaling | Corosio scales best; only implementation to improve at 2 threads |
| Socket Throughput | Equivalent unidirectional; Asio faster bidirectional at large buffers |
| Socket Latency | Equivalent across all three |
| HTTP Server | Equivalent across all three |
| Timers | Asio faster at schedule/cancel; equivalent fire rate and concurrent behavior |
Detailed Results
Handler Dispatch Summary
| Scenario | Corosio | Asio Coroutines | Asio Callbacks | Winner |
|---|---|---|---|---|
| Single-threaded post | 1.71 Mops/s | 1.57 Mops/s | 2.59 Mops/s | Callbacks |
| Multi-threaded (8 threads) | 1.54 Mops/s | 1.03 Mops/s | 1.51 Mops/s | Corosio |
| Interleaved post/run | 2.14 Mops/s | 1.60 Mops/s | 2.88 Mops/s | Callbacks |
| Concurrent post/run | 2.35 Mops/s | 1.61 Mops/s | 2.58 Mops/s | Callbacks |
Socket Throughput Summary
| Scenario | Corosio | Asio Coroutines | Asio Callbacks | Winner |
|---|---|---|---|---|
| Unidirectional 1KB | 85.68 MB/s | 78.63 MB/s | 77.33 MB/s | Corosio (+9%) |
| Unidirectional 64KB | 2.19 GB/s | 2.24 GB/s | 2.31 GB/s | Tie |
| Bidirectional 1KB | 84.34 MB/s | 73.13 MB/s | 191.75 MB/s | Callbacks |
| Bidirectional 64KB | 2.18 GB/s | 5.56 GB/s | 5.74 GB/s | Callbacks |
Socket Latency Summary
| Scenario | Corosio | Asio Coroutines | Asio Callbacks | Winner |
|---|---|---|---|---|
| Ping-pong mean (64B) | 10.78 μs | 10.98 μs | 10.52 μs | Tie |
| Ping-pong p99 (64B) | 15.00 μs | 15.10 μs | 14.70 μs | Tie |
| 16 concurrent pairs mean | 180.64 μs | 180.71 μs | 174.83 μs | Tie |
Test Environment
| Parameter | Value |
|---|---|
| Platform | Windows (IOCP backend) |
| Duration | 3 seconds per benchmark |
| Comparison | Asio coroutines (co_spawn/use_awaitable) and Asio callbacks |
| Measurement | Client-side latency and throughput |
Handler Dispatch Benchmarks
These benchmarks measure raw handler posting and execution throughput, isolating the scheduler from I/O completion overhead.
Single-Threaded Handler Post
Each implementation posts and runs handlers from a single thread for 3 seconds.
| Metric | Corosio | Asio Coroutines | Asio Callbacks |
|---|---|---|---|
| Handlers | 5,134,000 | 4,712,000 | 7,764,000 |
| Elapsed | 3.001 s | 3.000 s | 3.000 s |
| Throughput | 1.71 Mops/s | 1.57 Mops/s | 2.59 Mops/s |
Key finding: Asio callbacks achieve the highest single-threaded dispatch rate. Corosio is 9% faster than Asio coroutines, providing a meaningful advantage for coroutine users.
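The throughput row follows from the two rows above it: handlers divided by elapsed seconds. The sketch below reproduces that arithmetic and the relative-speed percentages quoted in this report. The helper names `mops` and `percent_faster` are illustrative and are not part of the benchmark harness.

```python
def mops(handlers: int, elapsed_s: float) -> float:
    """Throughput in millions of operations per second."""
    return handlers / elapsed_s / 1e6


def percent_faster(a: float, b: float) -> float:
    """How much faster rate `a` is than rate `b`, in percent."""
    return (a / b - 1.0) * 100.0


# Handler counts and elapsed times copied from the table above.
corosio = mops(5_134_000, 3.001)
asio_coro = mops(4_712_000, 3.000)
asio_cb = mops(7_764_000, 3.000)

print(f"{corosio:.2f} {asio_coro:.2f} {asio_cb:.2f} Mops/s")
print(f"Corosio vs Asio coroutines: {percent_faster(corosio, asio_coro):+.0f}%")
print(f"Asio callbacks vs Corosio:  {percent_faster(asio_cb, corosio):+.0f}%")
```

Rounded to two decimals, the three rates match the table, and the two percentages come out at 9% and 51%.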
Multi-Threaded Scaling
Multiple threads running handlers concurrently.
| Threads | Corosio | Asio Coroutines | Asio Callbacks |
|---|---|---|---|
| 1 | 1.72 Mops/s | 1.78 Mops/s | 2.82 Mops/s |
| 2 | 2.10 Mops/s (1.23×) | 1.40 Mops/s (0.78×) | 2.33 Mops/s (0.83×) |
| 4 | 2.02 Mops/s (1.18×) | 1.25 Mops/s (0.70×) | 2.10 Mops/s (0.74×) |
| 8 | 1.54 Mops/s (0.89×) | 1.03 Mops/s (0.58×) | 1.51 Mops/s (0.53×) |
Scaling Analysis
Throughput vs Thread Count:
| Threads | Corosio | Asio Coro | Asio CB | Best Scaling |
|---|---|---|---|---|
| 1 | 1.72 M | 1.78 M | 2.82 M | - |
| 2 | 2.10 M | 1.40 M | 2.33 M | Corosio (1.23×) |
| 4 | 2.02 M | 1.25 M | 2.10 M | Corosio (1.18×) |
| 8 | 1.54 M | 1.03 M | 1.51 M | Corosio (0.89×) |
Notable observations:
- Corosio is the only implementation that improves at 2 threads (1.23× speedup)
- Both Asio approaches degrade immediately at 2 threads (0.78×, 0.83×)
- At 8 threads, Corosio surpasses Asio callbacks despite starting from a lower baseline
- Corosio retains 89% of single-thread throughput at 8 threads, vs 58% (Asio coroutines) and 53% (Asio callbacks)
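The scaling factors are each run's throughput divided by the same implementation's 1-thread throughput. A minimal sketch of that calculation follows, using the rounded Mops/s values from the scaling table. Because the inputs are rounded to two decimals, the recomputed factors can differ from the reported ones by about 0.01. The function name `scaling_factors` is illustrative.

```python
def scaling_factors(throughput_by_threads: dict[int, float]) -> dict[int, float]:
    """Throughput at each thread count relative to the 1-thread run."""
    baseline = throughput_by_threads[1]
    return {n: rate / baseline for n, rate in throughput_by_threads.items()}


# Mops/s values copied from the multi-threaded scaling table.
results = {
    "Corosio": {1: 1.72, 2: 2.10, 4: 2.02, 8: 1.54},
    "Asio coroutines": {1: 1.78, 2: 1.40, 4: 1.25, 8: 1.03},
    "Asio callbacks": {1: 2.82, 2: 2.33, 4: 2.10, 8: 1.51},
}

for name, rates in results.items():
    factors = scaling_factors(rates)
    print(name, {n: round(f, 2) for n, f in factors.items()})
```

A factor above 1.0 means adding threads helped; only Corosio shows this, at 2 and 4 threads.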
Interleaved Post/Run
Alternating between posting batches of 100 handlers and running them.
| Metric | Corosio | Asio Coroutines | Asio Callbacks |
|---|---|---|---|
| Handlers/iter | 100 | 100 | 100 |
| Total handlers | 6,408,000 | 4,792,100 | 8,651,900 |
| Elapsed | 3.000 s | 3.000 s | 3.000 s |
| Throughput | 2.14 Mops/s | 1.60 Mops/s | 2.88 Mops/s |
Key finding: Corosio is 34% faster than Asio coroutines in this common real-world pattern.
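To make the interleaved pattern concrete, here is a toy single-threaded model: post a batch of handlers, drain the queue, repeat. It sketches the access pattern only. The `deque` stands in for the scheduler's queue and does not reflect how Corosio or Asio are implemented, and `interleaved_post_run` is a hypothetical name.

```python
from collections import deque


def interleaved_post_run(iterations: int, batch_size: int = 100) -> int:
    """Alternate between posting a batch of handlers and draining the queue.

    Returns the number of handlers executed.
    """
    queue = deque()
    executed = 0

    def handler() -> None:
        nonlocal executed
        executed += 1

    for _ in range(iterations):
        for _ in range(batch_size):  # "post" phase: enqueue a full batch
            queue.append(handler)
        while queue:                 # "run" phase: execute until the queue is empty
            queue.popleft()()
    return executed
```

Each iteration posts and runs a complete batch, so the executed count is always a multiple of the batch size. That is consistent with the handler totals in the table, which are all multiples of 100.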
Concurrent Post and Run
Four threads simultaneously posting and running handlers.
| Metric | Corosio | Asio Coroutines | Asio Callbacks |
|---|---|---|---|
| Threads | 4 | 4 | 4 |
| Total handlers | 7,130,000 | 4,870,000 | 7,830,000 |
| Elapsed | 3.029 s | 3.024 s | 3.030 s |
| Throughput | 2.35 Mops/s | 1.61 Mops/s | 2.58 Mops/s |
Key finding: Corosio is 46% faster than Asio coroutines and within 9% of Asio callbacks in this multi-producer scenario.
Socket Throughput Benchmarks
Unidirectional Throughput
Single direction transfer with varying buffer sizes.
| Buffer Size | Corosio | Asio Coroutines | Asio Callbacks |
|---|---|---|---|
| 1024 bytes | 85.68 MB/s | 78.63 MB/s | 77.33 MB/s |
| 4096 bytes | 259.30 MB/s | 265.84 MB/s | 291.03 MB/s |
| 16384 bytes | 956.58 MB/s | 947.64 MB/s | 997.23 MB/s |
| 65536 bytes | 2.19 GB/s | 2.24 GB/s | 2.31 GB/s |
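Observation: Unidirectional throughput is within about 12% across all three implementations at every buffer size, and within 6% at 16 KB and above. Corosio has a 9-11% edge at the smallest buffer size, while Asio callbacks leads by 9-12% at 4 KB. All three are bounded by the same kernel socket path.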
Bidirectional Throughput
Simultaneous transfer in both directions.
| Buffer Size | Corosio | Asio Coroutines | Asio Callbacks |
|---|---|---|---|
| 1024 bytes | 84.34 MB/s | 73.13 MB/s | 191.75 MB/s |
| 4096 bytes | 258.49 MB/s | 401.06 MB/s | 674.75 MB/s |
| 16384 bytes | 979.91 MB/s | 2.20 GB/s | 2.33 GB/s |
| 65536 bytes | 2.18 GB/s | 5.56 GB/s | 5.74 GB/s |
Observation: Bidirectional throughput at larger buffer sizes reveals a gap. Corosio’s combined bidirectional throughput is comparable to its unidirectional throughput, while both Asio implementations scale beyond their unidirectional numbers. At 64KB, Asio achieves 2.5-2.6× higher bidirectional throughput than Corosio.
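One way to quantify this observation is the ratio of combined bidirectional throughput to unidirectional throughput at the same buffer size. A ratio near 1.0 means the second direction added no throughput. The sketch below uses the 65536-byte rows from the two throughput tables; `duplex_gain` is an illustrative name.

```python
# Throughput in GB/s at the 65536-byte buffer size, from the two throughput tables.
unidirectional = {"Corosio": 2.19, "Asio coroutines": 2.24, "Asio callbacks": 2.31}
bidirectional = {"Corosio": 2.18, "Asio coroutines": 5.56, "Asio callbacks": 5.74}


def duplex_gain(name: str) -> float:
    """Combined bidirectional throughput relative to unidirectional throughput."""
    return bidirectional[name] / unidirectional[name]


for name in unidirectional:
    print(f"{name}: {duplex_gain(name):.2f}x")
```

Corosio's ratio is essentially 1.0, while both Asio variants land near 2.5.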
Socket Latency Benchmarks
Ping-Pong Round-Trip Latency
A single socket pair exchanges messages for 3 seconds.
| Message Size | Corosio Mean | Asio Coroutines Mean | Asio Callbacks Mean |
|---|---|---|---|
| 1 byte | 10.75 μs | 10.90 μs | 10.56 μs |
| 64 bytes | 10.78 μs | 10.98 μs | 10.52 μs |
| 1024 bytes | 11.05 μs | 11.09 μs | 10.79 μs |
Latency Distribution (64-byte messages)
| Percentile | Corosio | Asio Coroutines | Asio Callbacks |
|---|---|---|---|
| p50 | 10.40 μs | 10.60 μs | 10.20 μs |
| p90 | 10.70 μs | 10.80 μs | 10.40 μs |
| p99 | 15.00 μs | 15.10 μs | 14.70 μs |
| p99.9 | 119.50 μs | 128.67 μs | 110.56 μs |
| min | 9.10 μs | 9.20 μs | 9.40 μs |
| max | 1.98 ms | 1.22 ms | 927.80 μs |
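Observation: Mean, p50, p90, and p99 latency are within 5% of each other across all three implementations. Asio callbacks has marginally better tail latency through p99.9. The differences up to p99 are small enough to be within measurement noise; p99.9 and max vary more (110.56-128.67 μs and 0.93-1.98 ms) because they are driven by rare outliers.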
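For readers less familiar with percentile rows, the sketch below shows one common definition, the nearest-rank method. The report does not say which method the benchmark harness uses, so treat this as an assumption. The sample values are hypothetical and are not measured data.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100.0 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical round-trip times in microseconds; one slow outlier.
latencies_us = [10.2, 10.4, 10.3, 15.0, 10.1, 10.5, 10.2, 10.6, 10.3, 120.0]

p50 = percentile(latencies_us, 50)
p99 = percentile(latencies_us, 99)
mean = sum(latencies_us) / len(latencies_us)
print(p50, p99, round(mean, 2))
```

In this sample the single outlier roughly doubles the mean but leaves p50 untouched, which is why the table reports percentiles alongside the mean.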
Concurrent Socket Pairs
Multiple socket pairs operating concurrently (64-byte messages).
| Pairs | Corosio Mean | Asio Coro Mean | Asio CB Mean | Corosio p99 | Asio Coro p99 | Asio CB p99 |
|---|---|---|---|---|---|---|
| 1 | 10.78 μs | 10.94 μs | 10.57 μs | 15.30 μs | 15.30 μs | 14.70 μs |
| 4 | 44.71 μs | 45.04 μs | 43.46 μs | 94.00 μs | 93.23 μs | 87.97 μs |
| 16 | 180.64 μs | 180.71 μs | 174.83 μs | 377.77 μs | 353.27 μs | 368.23 μs |
Observation: All three implementations scale similarly. Asio callbacks has a marginal edge in mean latency. At 16 pairs, Asio coroutines has slightly better p99.
HTTP Server Benchmarks
Single Connection (Sequential Requests)
| Metric | Corosio | Asio Coroutines | Asio Callbacks |
|---|---|---|---|
| Completed | 261,715 | 255,257 | 264,158 |
| Throughput | 87.04 Kops/s | 84.74 Kops/s | 87.79 Kops/s |
| Mean latency | 11.46 μs | 11.76 μs | 11.36 μs |
| p99 latency | 16.30 μs | 16.30 μs | 15.90 μs |
Observation: Single-connection HTTP performance is comparable across all three. Corosio and Asio callbacks are within 1%.
Concurrent Connections (Single Thread)
| Connections | Corosio Throughput | Asio Coro Throughput | Asio CB Throughput | Corosio Mean | Asio Coro Mean | Asio CB Mean |
|---|---|---|---|---|---|---|
| 1 | 86.79 Kops/s | 81.50 Kops/s | 85.65 Kops/s | 11.49 μs | 12.24 μs | 11.65 μs |
| 4 | 85.34 Kops/s | 80.11 Kops/s | 83.02 Kops/s | 46.84 μs | 49.85 μs | 48.15 μs |
| 16 | 83.40 Kops/s | 79.30 Kops/s | 82.80 Kops/s | 191.79 μs | 201.13 μs | 193.20 μs |
| 32 | 80.07 Kops/s | 78.47 Kops/s | 81.71 Kops/s | 399.56 μs | 406.99 μs | 391.54 μs |
Observation: Corosio outperforms Asio coroutines at every connection count, by 5-7% up to 16 connections and by 2% at 32. Corosio leads Asio callbacks by up to 3% through 16 connections, and callbacks leads by 2% at 32.
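Throughput that stays nearly flat while mean latency grows with the connection count is what Little's law predicts for a closed-loop client. The sketch below checks this against the Corosio column, assuming each connection keeps exactly one request in flight. That is an assumption, since the report does not describe the client. The function name `requests_in_flight` is illustrative.

```python
def requests_in_flight(throughput_kops: float, mean_latency_us: float) -> float:
    """Little's law: average concurrency = completion rate x mean time in system."""
    return (throughput_kops * 1e3) * (mean_latency_us * 1e-6)


# (connections, Kops/s, mean latency in microseconds) for Corosio, from the table above.
corosio_rows = [
    (1, 86.79, 11.49),
    (4, 85.34, 46.84),
    (16, 83.40, 191.79),
    (32, 80.07, 399.56),
]

for connections, kops, mean_us in corosio_rows:
    print(connections, round(requests_in_flight(kops, mean_us), 2))
```

The product comes out within 1% of the connection count in every row. Under the stated assumption, the rising latency here is queueing behind other connections on the single thread rather than slower per-request processing.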
Multi-Threaded HTTP (32 Connections)
| Threads | Corosio Throughput | Asio Coroutines Throughput | Asio Callbacks Throughput |
|---|---|---|---|
| 1 | 81.31 Kops/s | 77.49 Kops/s | 83.36 Kops/s |
| 2 | 115.80 Kops/s | 114.29 Kops/s | 118.18 Kops/s |
| 4 | 196.40 Kops/s | 194.05 Kops/s | 201.64 Kops/s |
| 8 | 319.24 Kops/s | 325.73 Kops/s | 327.99 Kops/s |
| 16 | 422.10 Kops/s | 422.20 Kops/s | 426.31 Kops/s |
Multi-Threaded Latency
| Threads | Corosio Mean | Asio Coro Mean | Asio CB Mean | Corosio p99 | Asio Coro p99 | Asio CB p99 |
|---|---|---|---|---|---|---|
| 1 | 393.50 μs | 412.09 μs | 383.85 μs | 656.65 μs | 730.44 μs | 682.81 μs |
| 2 | 276.23 μs | 279.53 μs | 270.69 μs | 424.65 μs | 509.19 μs | 423.52 μs |
| 4 | 162.81 μs | 163.85 μs | 158.52 μs | 230.55 μs | 230.66 μs | 224.11 μs |
| 8 | 100.10 μs | 97.77 μs | 97.44 μs | 139.12 μs | 134.07 μs | 144.19 μs |
| 16 | 75.61 μs | 75.33 μs | 74.57 μs | 99.86 μs | 94.40 μs | 94.93 μs |
Key finding: All three implementations converge at high thread counts, reaching ~422-426 Kops/s at 16 threads. All three scale well with thread count, gaining roughly 5.1-5.4× throughput from 1 to 16 threads. Corosio's mean latency sits between the two Asio variants at 1 to 4 threads, and all three are within about 3% of each other at 8 and 16 threads.
Timer Benchmarks
Timer Schedule/Cancel
Measures the rate of creating and cancelling timers without firing them.
| Metric | Corosio | Asio Coroutines | Asio Callbacks |
|---|---|---|---|
| Timers | 10,328,000 | 107,190,000 | 114,149,000 |
| Elapsed | 3.000 s | 3.000 s | 3.000 s |
| Throughput | 3.44 Mops/s | 35.73 Mops/s | 38.05 Mops/s |
Observation: Asio is approximately 10× faster at scheduling and cancelling timers. This benchmark isolates the timer data structure operations without involving I/O completion.
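Converting the rates to an average cost per operation gives a sense of the absolute scale of this gap. The sketch below assumes the benchmark runs on a single thread, so that the inverse of throughput is the per-operation time. The report does not state the thread count for this test, and `ns_per_op` is an illustrative name.

```python
def ns_per_op(mops: float) -> float:
    """Average nanoseconds per operation for a rate given in Mops/s."""
    return 1e3 / mops


# Schedule/cancel throughput in Mops/s, from the table above.
rates = [("Corosio", 3.44), ("Asio coroutines", 35.73), ("Asio callbacks", 38.05)]

for name, rate in rates:
    print(f"{name}: {ns_per_op(rate):.1f} ns per schedule/cancel")
```

Under that assumption the gap is roughly 290 ns versus 26-28 ns per schedule/cancel pair.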
Timer Fire Rate
Measures the rate of timers that actually expire and fire their handlers.
| Metric | Corosio | Asio Coroutines | Asio Callbacks |
|---|---|---|---|
| Fires | 331,398 | 356,602 | 361,523 |
| Elapsed | 3.012 s | 3.012 s | 3.018 s |
| Throughput | 110.03 Kops/s | 118.39 Kops/s | 119.80 Kops/s |
Observation: When timers actually fire, the gap narrows to ~8%. The bottleneck shifts from the timer data structure to the I/O completion mechanism.
Concurrent Timers
Multiple timers firing at 15 ms intervals concurrently.
| Timers | Corosio Mean | Asio Coro Mean | Asio CB Mean | Corosio p99 | Asio Coro p99 | Asio CB p99 |
|---|---|---|---|---|---|---|
| 10 | 15.39 ms | 15.40 ms | 15.42 ms | 18.23 ms | 16.89 ms | 17.29 ms |
| 100 | 15.43 ms | 15.40 ms | 15.40 ms | 17.02 ms | 16.59 ms | 17.61 ms |
| 1000 | 15.45 ms | 15.39 ms | 15.41 ms | 16.71 ms | 17.47 ms | 18.17 ms |
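Observation: Mean concurrent timer latency is effectively identical across all three implementations. The means span only 0.06 ms (15.39-15.45 ms), about 0.4 ms above the 15 ms interval, regardless of concurrency level. p99 varies by up to 1.5 ms between implementations with no consistent leader; Corosio has the best p99 at 1000 concurrent timers.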
Analysis
Handler Dispatch
The handler dispatch results tell a nuanced story across the three implementations.
| Pattern | Corosio vs Asio Coro | Corosio vs Asio CB | Notes |
|---|---|---|---|
| Single-threaded | +9% | -34% | Callbacks benefit from lower per-handler overhead |
| Multi-threaded (8T) | +49% | +2% | Corosio’s scaling advantage closes the gap |
| Interleaved | +34% | -26% | Common real-world pattern |
| Concurrent | +46% | -9% | Multi-producer scenario |
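The most telling result is multi-threaded scaling. Both Asio variants lose throughput as soon as a second thread is added, while Corosio gains at 2 and 4 threads and drops below its single-thread rate only at 8 threads. At 8 threads every implementation is below its single-thread rate, but Corosio degrades the least: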
Throughput retained at 8 threads (vs 1 thread):
Corosio: 89%
Asio Coroutines: 58%
Asio Callbacks: 53%
This makes Corosio the best choice for applications that distribute work across threads.
Socket I/O
Unidirectional socket throughput is equivalent across all three implementations, confirming that the kernel socket path — not the user-space framework — is the bottleneck.
Bidirectional throughput reveals a difference: Asio implementations achieve significantly higher combined throughput at larger buffer sizes. Corosio’s bidirectional throughput is comparable to its unidirectional throughput, suggesting serialization between the read and write paths. This is an area for future optimization.
Socket Latency
Latency results are tightly clustered across all three. Mean ping-pong latencies differ by less than 0.5 μs. Tail latencies (p99) differ by at most 0.6 μs at the single-pair level. These differences are within measurement noise.
HTTP Server
HTTP server performance is comparable across all three implementations at all concurrency levels and thread counts. At 16 threads with 32 connections, all three converge to ~422-426 Kops/s. This confirms that for real-world HTTP workloads, the choice of framework has minimal performance impact.
Timers
Timer schedule/cancel throughput is a notable gap — Asio’s timer operations are approximately 10× faster. However, the gap narrows substantially for timer fire rate (8%) and disappears entirely for concurrent timer latency accuracy. Applications that create and cancel timers at very high rates may notice this difference; applications that primarily use timers for timeouts and delays will not.
Summary
| Component | Assessment |
|---|---|
| Handler Dispatch (vs Asio Coro) | Corosio 9-50% faster |
| Handler Dispatch (vs Asio CB) | Callbacks faster single-threaded; Corosio matches at 8 threads |
| Multi-threaded Scaling | Corosio best; only one that improves at 2 threads |
| Socket Throughput (unidirectional) | Equivalent |
| Socket Throughput (bidirectional) | Asio 2.5× faster at large buffers |
| Socket Latency | Equivalent |
| HTTP Throughput | Equivalent |
| Timer Schedule/Cancel | Asio 10× faster |
| Timer Fire/Concurrent | Equivalent |
Conclusions
Summary
Corosio delivers equivalent or better performance compared to Asio coroutines across the majority of benchmarks:
- Handler dispatch: Corosio is 9-50% faster than Asio coroutines
- Multi-threaded scaling: Corosio retains 89% throughput at 8 threads vs 58% for Asio coroutines
- Socket I/O: Equivalent unidirectional throughput, equivalent latency
- HTTP server: Equivalent throughput and latency
- Bidirectional throughput: Asio faster at large buffers; area for optimization
- Timer schedule/cancel: Asio faster; area for optimization
Asio callbacks achieve the highest raw single-threaded dispatch rate, but this advantage diminishes under multi-threaded load where Corosio matches or exceeds it.
Recommendations
| Workload | Recommendation |
|---|---|
| Handler-intensive (single-threaded) | Asio callbacks fastest; Corosio 9% faster than Asio coroutines |
| Handler-intensive (multi-threaded) | Corosio scales best |
| Socket I/O (unidirectional) | All equivalent |
| Socket I/O (bidirectional, large buffers) | Asio currently faster |
| HTTP servers | All equivalent |
| Timer-heavy workloads | Asio faster at schedule/cancel; equivalent for firing |
Key Takeaway
For coroutine-based async programming on Windows (IOCP), Corosio provides equivalent or better performance compared to Asio coroutines in every category except bidirectional socket throughput and timer schedule/cancel. Corosio’s superior multi-threaded scaling makes it particularly well-suited for applications that distribute work across threads. Bidirectional throughput and timer operations are identified areas for future optimization.
Appendix: Raw Data
Corosio Results
Backend: iocp
Duration: 3 s per benchmark
=== Single-threaded Handler Post (Corosio) ===
Handlers: 5134000
Elapsed: 3.001 s
Throughput: 1.71 Mops/s
=== Multi-threaded Scaling (Corosio) ===
1 thread(s): 1.72 Mops/s
2 thread(s): 2.10 Mops/s (speedup: 1.23x)
4 thread(s): 2.02 Mops/s (speedup: 1.18x)
8 thread(s): 1.54 Mops/s (speedup: 0.89x)
=== Interleaved Post/Run (Corosio) ===
Handlers/iter: 100
Total handlers: 6408000
Elapsed: 3.000 s
Throughput: 2.14 Mops/s
=== Concurrent Post and Run (Corosio) ===
Threads: 4
Total handlers: 7130000
Elapsed: 3.029 s
Throughput: 2.35 Mops/s
=== Unidirectional Throughput (Corosio) ===
Buffer size: 1024 bytes: 85.68 MB/s
Buffer size: 4096 bytes: 259.30 MB/s
Buffer size: 16384 bytes: 956.58 MB/s
Buffer size: 65536 bytes: 2.19 GB/s
=== Bidirectional Throughput (Corosio) ===
Buffer size: 1024 bytes: 84.34 MB/s (combined)
Buffer size: 4096 bytes: 258.49 MB/s (combined)
Buffer size: 16384 bytes: 979.91 MB/s (combined)
Buffer size: 65536 bytes: 2.18 GB/s (combined)
=== Ping-Pong Round-Trip Latency (Corosio) ===
1 byte: mean=10.75 us, p50=10.30 us, p99=15.00 us
64 bytes: mean=10.78 us, p50=10.40 us, p99=15.00 us
1024 bytes: mean=11.05 us, p50=10.60 us, p99=15.30 us
=== Concurrent Socket Pairs Latency (Corosio) ===
1 pair: mean=10.78 us, p99=15.30 us
4 pairs: mean=44.71 us, p99=94.00 us
16 pairs: mean=180.64 us, p99=377.77 us
=== HTTP Single Connection (Corosio) ===
Throughput: 87.04 Kops/s
Latency: mean=11.46 us, p99=16.30 us
=== HTTP Concurrent Connections (Corosio, single thread) ===
1 conn: 86.79 Kops/s, mean=11.49 us, p99=16.60 us
4 conns: 85.34 Kops/s, mean=46.84 us, p99=105.41 us
16 conns: 83.40 Kops/s, mean=191.79 us, p99=403.74 us
32 conns: 80.07 Kops/s, mean=399.56 us, p99=679.69 us
=== HTTP Multi-threaded (Corosio, 32 connections) ===
1 thread: 81.31 Kops/s, mean=393.50 us, p99=656.65 us
2 threads: 115.80 Kops/s, mean=276.23 us, p99=424.65 us
4 threads: 196.40 Kops/s, mean=162.81 us, p99=230.55 us
8 threads: 319.24 Kops/s, mean=100.10 us, p99=139.12 us
16 threads: 422.10 Kops/s, mean=75.61 us, p99=99.86 us
=== Timer Schedule/Cancel (Corosio) ===
Timers: 10328000, Throughput: 3.44 Mops/s
=== Timer Fire Rate (Corosio) ===
Fires: 331398, Throughput: 110.03 Kops/s
=== Concurrent Timers (Corosio) ===
10 timers: mean=15.39 ms, p99=18.23 ms
100 timers: mean=15.43 ms, p99=17.02 ms
1000 timers: mean=15.45 ms, p99=16.71 ms
=== Sequential Accept Churn (Corosio) ===
Cycles: 14452, Throughput: 4.80 Kops/s
Latency: mean=208.28 us, p99=457.55 us
Asio Coroutines Results
=== Single-threaded Handler Post (Asio Coroutines) ===
Handlers: 4712000
Elapsed: 3.000 s
Throughput: 1.57 Mops/s
=== Multi-threaded Scaling (Asio Coroutines) ===
1 thread(s): 1.78 Mops/s
2 thread(s): 1.40 Mops/s (speedup: 0.78x)
4 thread(s): 1.25 Mops/s (speedup: 0.70x)
8 thread(s): 1.03 Mops/s (speedup: 0.58x)
=== Interleaved Post/Run (Asio Coroutines) ===
Handlers/iter: 100
Total handlers: 4792100
Elapsed: 3.000 s
Throughput: 1.60 Mops/s
=== Concurrent Post and Run (Asio Coroutines) ===
Threads: 4
Total handlers: 4870000
Elapsed: 3.024 s
Throughput: 1.61 Mops/s
=== Unidirectional Throughput (Asio Coroutines) ===
Buffer size: 1024 bytes: 78.63 MB/s
Buffer size: 4096 bytes: 265.84 MB/s
Buffer size: 16384 bytes: 947.64 MB/s
Buffer size: 65536 bytes: 2.24 GB/s
=== Bidirectional Throughput (Asio Coroutines) ===
Buffer size: 1024 bytes: 73.13 MB/s (combined)
Buffer size: 4096 bytes: 401.06 MB/s (combined)
Buffer size: 16384 bytes: 2.20 GB/s (combined)
Buffer size: 65536 bytes: 5.56 GB/s (combined)
=== Ping-Pong Round-Trip Latency (Asio Coroutines) ===
1 byte: mean=10.90 us, p50=10.50 us, p99=15.10 us
64 bytes: mean=10.98 us, p50=10.60 us, p99=15.10 us
1024 bytes: mean=11.09 us, p50=10.50 us, p99=15.30 us
=== Concurrent Socket Pairs Latency (Asio Coroutines) ===
1 pair: mean=10.94 us, p99=15.30 us
4 pairs: mean=45.04 us, p99=93.23 us
16 pairs: mean=180.71 us, p99=353.27 us
=== HTTP Single Connection (Asio Coroutines) ===
Throughput: 84.74 Kops/s
Latency: mean=11.76 us, p99=16.30 us
=== HTTP Concurrent Connections (Asio Coroutines, single thread) ===
1 conn: 81.50 Kops/s, mean=12.24 us, p99=24.10 us
4 conns: 80.11 Kops/s, mean=49.85 us, p99=104.69 us
16 conns: 79.30 Kops/s, mean=201.13 us, p99=398.32 us
32 conns: 78.47 Kops/s, mean=406.99 us, p99=645.61 us
=== HTTP Multi-threaded (Asio Coroutines, 32 connections) ===
1 thread: 77.49 Kops/s, mean=412.09 us, p99=730.44 us
2 threads: 114.29 Kops/s, mean=279.53 us, p99=509.19 us
4 threads: 194.05 Kops/s, mean=163.85 us, p99=230.66 us
8 threads: 325.73 Kops/s, mean=97.77 us, p99=134.07 us
16 threads: 422.20 Kops/s, mean=75.33 us, p99=94.40 us
=== Timer Schedule/Cancel (Asio Coroutines) ===
Timers: 107190000, Throughput: 35.73 Mops/s
=== Timer Fire Rate (Asio Coroutines) ===
Fires: 356602, Throughput: 118.39 Kops/s
=== Concurrent Timers (Asio Coroutines) ===
10 timers: mean=15.40 ms, p99=16.89 ms
100 timers: mean=15.40 ms, p99=16.59 ms
1000 timers: mean=15.39 ms, p99=17.47 ms
Asio Callbacks Results
=== Single-threaded Handler Post (Asio Callbacks) ===
Handlers: 7764000
Elapsed: 3.000 s
Throughput: 2.59 Mops/s
=== Multi-threaded Scaling (Asio Callbacks) ===
1 thread(s): 2.82 Mops/s
2 thread(s): 2.33 Mops/s (speedup: 0.83x)
4 thread(s): 2.10 Mops/s (speedup: 0.74x)
8 thread(s): 1.51 Mops/s (speedup: 0.53x)
=== Interleaved Post/Run (Asio Callbacks) ===
Handlers/iter: 100
Total handlers: 8651900
Elapsed: 3.000 s
Throughput: 2.88 Mops/s
=== Concurrent Post and Run (Asio Callbacks) ===
Threads: 4
Total handlers: 7830000
Elapsed: 3.030 s
Throughput: 2.58 Mops/s
=== Unidirectional Throughput (Asio Callbacks) ===
Buffer size: 1024 bytes: 77.33 MB/s
Buffer size: 4096 bytes: 291.03 MB/s
Buffer size: 16384 bytes: 997.23 MB/s
Buffer size: 65536 bytes: 2.31 GB/s
=== Bidirectional Throughput (Asio Callbacks) ===
Buffer size: 1024 bytes: 191.75 MB/s (combined)
Buffer size: 4096 bytes: 674.75 MB/s (combined)
Buffer size: 16384 bytes: 2.33 GB/s (combined)
Buffer size: 65536 bytes: 5.74 GB/s (combined)
=== Ping-Pong Round-Trip Latency (Asio Callbacks) ===
1 byte: mean=10.56 us, p50=10.30 us, p99=14.70 us
64 bytes: mean=10.52 us, p50=10.20 us, p99=14.70 us
1024 bytes: mean=10.79 us, p50=10.40 us, p99=15.10 us
=== Concurrent Socket Pairs Latency (Asio Callbacks) ===
1 pair: mean=10.57 us, p99=14.70 us
4 pairs: mean=43.46 us, p99=87.97 us
16 pairs: mean=174.83 us, p99=368.23 us
=== HTTP Single Connection (Asio Callbacks) ===
Throughput: 87.79 Kops/s
Latency: mean=11.36 us, p99=15.90 us
=== HTTP Concurrent Connections (Asio Callbacks, single thread) ===
1 conn: 85.65 Kops/s, mean=11.65 us, p99=19.40 us
4 conns: 83.02 Kops/s, mean=48.15 us, p99=106.16 us
16 conns: 82.80 Kops/s, mean=193.20 us, p99=361.47 us
32 conns: 81.71 Kops/s, mean=391.54 us, p99=638.11 us
=== HTTP Multi-threaded (Asio Callbacks, 32 connections) ===
1 thread: 83.36 Kops/s, mean=383.85 us, p99=682.81 us
2 threads: 118.18 Kops/s, mean=270.69 us, p99=423.52 us
4 threads: 201.64 Kops/s, mean=158.52 us, p99=224.11 us
8 threads: 327.99 Kops/s, mean=97.44 us, p99=144.19 us
16 threads: 426.31 Kops/s, mean=74.57 us, p99=94.93 us
=== Timer Schedule/Cancel (Asio Callbacks) ===
Timers: 114149000, Throughput: 38.05 Mops/s
=== Timer Fire Rate (Asio Callbacks) ===
Fires: 361523, Throughput: 119.80 Kops/s
=== Concurrent Timers (Asio Callbacks) ===
10 timers: mean=15.42 ms, p99=17.29 ms
100 timers: mean=15.40 ms, p99=17.61 ms
1000 timers: mean=15.41 ms, p99=18.17 ms
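The raw output above follows a regular line layout, so it can be parsed for further analysis. As a sketch, the following handles the multi-threaded scaling lines only; the regular expression is written against the lines shown in this appendix and is not an official output format. `parse_scaling_line` is a hypothetical helper.

```python
import re

# Matches lines such as "8 thread(s): 1.54 Mops/s (speedup: 0.89x)".
# The speedup suffix is optional because the 1-thread line omits it.
SCALING_LINE = re.compile(
    r"^\s*(\d+) thread\(s\):\s*([\d.]+) Mops/s(?:\s*\(speedup:\s*([\d.]+)x\))?\s*$"
)


def parse_scaling_line(line: str):
    """Return (threads, mops, speedup or None), or None if the line does not match."""
    match = SCALING_LINE.match(line)
    if match is None:
        return None
    threads, mops, speedup = match.groups()
    return int(threads), float(mops), float(speedup) if speedup else None


for line in ["1 thread(s): 1.72 Mops/s", "8 thread(s): 1.54 Mops/s (speedup: 0.89x)"]:
    print(parse_scaling_line(line))
```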