Feedback

Theory vs. Practice

Diagnosis is not the end, but the beginning of practice. Martin H. Fischer





Critical wrk and wrk2 bugs: all wrk/wrk2 benchmarks since 2012 are bogus

Nowadays, benchmarking is not a walk in the park. As yet another coincidence, wrk and wrk2 (2012) were created to complement weighttp (2006) and the IBM Apache Benchmark (1996). The world wide web, totally flat, was much simpler before G-WAN and its first 2010-2013 benchmarks brought new, unknown heights to this otherwise boring, infinitely self-complacent industry.

That's the G-WAN/cache October 2025 version – tested with a corrected wrk2, renamed wrk3:

              G-WAN RPS            NGINX RPS         G-WAN is N times faster
  -----  -------------------   ------------------    -----------------------
  users  10s  30s  3m   30m    10s  30s  3m   30m     (all 4 tests combined)
  -----  -------------------   ------------------    -----------------------
     1   151k 142k 152k 141k   104k 103k 103k 104k     586 /  414 =    1.42
    10   977k 996k 945k 927k   623k 616k 608k 601k    3845 / 2448 =    1.57
    1k   2.0m 1.9m 1.8m 1.8m   1.0m 963k 964k 956k    7.6m / 3.9m =    1.93
   10k   3.4m 1.8m 896k 729k   789k 729k 696k 716k    6.8m / 2930 = 2,334.73
   20k   5.6m 3.7m 1.0m 713k   755k 724k 671k 682k   11.1m / 2832 = 3,948.96
   30k   9.0m 4.2m 1.1m 723k      Terminated (OOM)   15.1m /    0 =  infinity
   40k  15.0m 5.8m 2.1m 723k      Terminated (OOM)   23.6m /    0 =  infinity

  G-WAN is 1.42 to 3,948 times faster than NGINX with 1 to 20k users and 10sec to 30min tests.
  G-WAN keeps going while NGINX has to stop, due to a lack of memory on a 192 GB RAM PC
  (the wrk family architecture consumes 190 GB RAM at 30-40k users).

  G-WAN running for 30 minutes is faster than NGINX running for 10 seconds – on all concurrencies.
  G-WAN runs marathons faster than NGINX runs 100m sprints (with 1-20k users).

  So G-WAN is faster than NGINX in short, middle-distance and long test runs, beating the market leader
  (with wrk, the NGINX benchmark tool) over the 100m sprint, 1.5km, 5km, half-marathon and marathon races.

What follows is the (long) why, the how, and the wrk3 source code fixing 4 major wrk2 multi-threading programming errors.

In 2023-2024, I first used wrk (based on the NGINX HTTP parser), which takes forever to complete benchmarks with a fast server because wrk waits for all the server replies before sending new requests: if the server needs 10 seconds' worth of work to complete the test, and wrk is 500 times slower than the server, then wrk will need 500 * 10 seconds = 5,000 seconds = 1 hour 23 minutes to complete a "10-second" test.

Late 2024, an engineer suggested the "slower but more reliable" wrk2, which stops at the specified time (instead of taking forever... and delivering such extreme volatility that it invites cherry-picking – acceptable for NGINX, its author, but unacceptable for G-WAN, apparently):

./wrk -t5k -c5k "http://127.0.0.1:8080/100.html"
Running 10s test @ http://127.0.0.1:8080/100.html
  5000 threads and 5000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     3.59ms    2.45ms  96.87ms   60.26%
    Req/Sec   316.65    484.74    24.66k    91.62%
  27896487 requests in 10.22s, 8.60GB read
Requests/sec: 2730160.42
Transfer/sec:    861.82MB

./wrk -t5k -c5k "http://127.0.0.1:8080/100.html"
Running 10s test @ http://127.0.0.1:8080/100.html
  5000 threads and 5000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     3.20ms    1.95ms  56.34ms   65.67%
    Req/Sec   242.80    239.83    19.81k    92.76%
  89898562 requests in 10.37s, 27.71GB read
Requests/sec: 8673159.26 ................. 3.18× higher score!
Transfer/sec:      2.67GB
---------------------------------------------------------------
./wrk -t10k -c10k "http://127.0.0.1:8080/100.html"
Running 10s test @ http://127.0.0.1:8080/100.html
  10000 threads and 10000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    11.04ms   71.67ms   2.00s    99.23%
    Req/Sec   155.65    294.40    47.14k    97.57%
  49782723 requests in 10.57s, 15.35GB read
  Socket errors: connect 0, read 0, write 0, timeout 5349
Requests/sec: 4711830.98
Transfer/sec:      1.45GB

./wrk -t10k -c10k "http://127.0.0.1:8080/100.html"
Running 10s test @ http://127.0.0.1:8080/100.html
  10000 threads and 10000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     6.11ms    4.49ms 610.32ms   67.86%
    Req/Sec   130.23    175.12    26.26k    93.32%
  279233782 requests in 10.56s, 86.08GB read
Requests/sec: 26436985.15 ................. 5.61× higher score!
Transfer/sec:      8.15GB

So, in April 2025, I published new [1k-40k users] wrk2 benchmarks (G-WAN reaching 242m RPS at 10k users). But a few months later, I discovered that wrk2, installed on new machines, was crashing at... 10k users.

This was odd because 10k users is the concurrency where G-WAN (at 242m RPS) vaporizes NGINX and the others (which top out below 1m RPS at 1k users). But I did not have time to fix wrk2, and I thought that writing a G-WAN-based benchmark would be a much better value proposition than fixing the slow, obscure and large code of wrk2 (5,316 lines of code).

Around September 2025, I noticed that an OS update had slowed G-WAN down from 242m RPS to 8m RPS (so I wrote the G-WAN cache to bypass a suddenly 'faulty' Linux kernel syscall – restoring G-WAN performance to 281m RPS at 10k users, with wrk2 again).

I thought I was safe from that point. But in April 2026 I was told that:

  • my wrk2 [1-40k] users G-WAN benchmarks are "not honest because they last a mere 10 seconds" (the default wrk2 duration).

    At the Olympic Games, a 100m sprint is as "honest" as a 42 km marathon: nobody would dare pretend that a sprinter is not faster than a marathon winner... or that a runner able to win both short and long races should be disqualified.

    For computers, longer test durations make server scores converge, because the OS kernel becomes the bottleneck. Then every server delivers equal performance because you are no longer testing the server: you are benchmarking the OS kernel.

    So the interesting question is why, in a benchmark supposedly differentiating the fast from the slow HTTP servers, some "experts" insist on disqualifying the sprint competition.

    The ones who benefit from such a decision are... the slowest HTTP servers. When the "neutral experts" resort to fallacies, all discussion is vain.

    G-WAN's RPS grow with the number of users (in contrast with NGINX, whose RPS decline as the number of users grows)... that is, until you grow NOT THE NUMBER OF USERS, BUT RATHER THE DURATION OF THE TEST (proving that the argument is a fallacy).



  • I should use another benchmark tool – the fastest I have been sent was an asthmatic newborn, a 12 MB Rust executable called "Oha":

      oha -c 1 -z 10s -w "http://127.0.0.1:8080/100.html"
      Requests/sec: 85219.9506
      Requests/sec: 92941.4151 --no-tui
    
      oha -c 1500 -z 10s -w "http://127.0.0.1:8080/100.html"
      Requests/sec: 644058.2775
      Requests/sec: 869730.2572 --no-tui
    
      oha -c 10000 -z 10s -w "http://127.0.0.1:8080/100.html"
      Requests/sec: 546039.7608
      Requests/sec: 581882.1891 --no-tui
      
      ----- on top of disastrous single-core and multi-core performance, the "Oha" Rust tool
      ----- has a proudly stated agenda that can hardly be associated with fairness:
    
      $ oha -h
      "-z 
       Duration of application to send requests.
    
       On HTTP/1, When the duration is reached, ongoing requests
       are aborted and counted as "aborted due to deadline".
    
       Currently, on HTTP/2, When the duration is reached,
       ongoing requests are waited."
       
     You can avoid the above penalty by using the -w switch. Why not make it the default behavior?


    Here again, if you use a slow benchmark tool, then the server can't reply faster than the requests are sent to it, and the benchmark tool is the bottleneck. Then every server delivers equal performance because you are no longer testing the server: you are testing the benchmark tool.

    It's crazy to see how many of these "experts" resort to fallacies. Seriously, there should be public policies to disqualify these recurring outright liars from any public/private educational, research, media, legal and judicial activities.



  • creating many threads could take so much time that wrk2 could leave no time for the actual benchmark. The following patch was provided, where stop_at is computed after start (wrk2 was computing stop_at before start and before the creation of the threads, completely ignoring the event-loop and thread creation times!):
 --- a/src/wrk.c
 +++ b/src/wrk.c
 @@ -122,7 +122,8 @@

      uint64_t connections = cfg.connections / cfg.threads;
      double throughput    = (double)cfg.rate / cfg.threads;
 -    uint64_t stop_at     = time_us() + (cfg.duration * 1000000);
 +    uint64_t start       = time_us();
 +    uint64_t stop_at     = start + (cfg.duration * 1000000);

      for (uint64_t i = 0; i < cfg.threads; i++) {
          thread *t = &threads[i];
 @@ -163,7 +164,6 @@
      printf("  %"PRIu64" threads and %"PRIu64" connections\n",
              cfg.threads, cfg.connections);

 -    uint64_t start    = time_us();
      uint64_t complete = 0;
      uint64_t bytes    = 0;
      errors errors     = { 0 };

In comparison, wrk (from which wrk2 is derived) creates the worker threads and then waits for the specified benchmark duration. While not technically correct (threads all start and stop at different times), it's far less wrong than what wrk2 felt the urge to do:

 uint64_t start = time_us(); // GPG: after threads have been created
 sleep(cfg.duration);        // GPG: wait for the specified benchmark duration (that's incorrect)
 stop = 1;                   // GPG: tell any remaining threads to stop now (that's bad)

But wrk and wrk2, despite being well-promoted and widely praised, are not exactly what I would call champions:

At 30-50k users (depending on the RAM footprint of the tested HTTP server) the kernel OOM kill-switch "Terminates" wrk/wrk2 for using 190+ GB of my 192 GB RAM machine.

In contrast, G-WAN, which is doing many more things, consumes around 700 MB of RAM (a client being simpler than a server, this fact alone reveals how much expertise and care the best-funded "scalability and benchmark experts" dedicate to benchmark tools).

That's why I felt the need to write my own benchmark tool, which will be integrated into and published with G-WAN. With it, it will be possible to benchmark high concurrencies on miniPCs with 4 GB of RAM. A much (much) welcome change for the unfunded crowds in a world of ever-rising acquisition and operating costs (hardware, energy, floor-space, etc.).

Nevertheless, I had promised to investigate the wrk2 issue further, and I discovered that the situation was much worse than presented, as the proposed patch would not fix the main issue:

(1) wrk2's thread calibration takes as much time as the benchmark itself (default for both: 10 seconds – the benchmark duration can be specified on the command line... but the calibration duration is silently extended: calibrate_delay = 10_seconds + (thread->connections * 5), total nonsense for all concurrencies, carefully hidden behind MACROS!).

(2) wrk2's main() sets up a stop_at time before creating the threads and a start time after creating and calibrating the threads, so benchmark_effective_duration = benchmark_specified_duration - calibration_duration (what could possibly go wrong in wonderland, right?).

(3) wrk2's main() does the RPS calculation req_per_s = complete / runtime_s, which effectively turns the division into a multiplication (leading to bogus values) when the actual benchmark time (default: 10 seconds), reduced by the calibration time (default: 10 seconds), falls below 1 second (that's the first parallelization bug).

This deadly issue happens most of the time because the calibration time and actual benchmarking time are nearly identical!

The obvious fix was to do this in wrk.c – not in main() but in the threads' function (for both wrk and wrk2):

 thread->start   = time_us();
 thread->stop_at = thread->start + (cfg.duration * 1e6); // GPG: <= THE FIX
 aeMain(loop); // GPG: => the actual benchmark starts here, AFTER thread calibration was done
 thread->stop_at = time_us(); // GPG: save the REAL (not planned) thread exit

Now we tell every thread to run for (at least) the user-specified time. wrk2 benchmarks will last longer than before because the thread calibration time is no longer subtracted from the thread's benchmarking execution time (they are now cumulative).

And, most probably, like in real life, not all client threads will start and end at the same time, making benchmarks last even longer (than the default duration, or the one specified on the command line)... so I have reduced the crappy calibration duration from 10+ seconds to 1 second. (The fairy tale of measuring latencies is a scam anyway, since the wrk2 client is massively slower than any decent server, with CPU-starved "ready" connections queuing in its event loop, waiting for their turn to get some CPU cycles.)

But since the starting time and execution time are different for each thread, we can't calculate the RPS in main() the way wrk and wrk2 have been doing it since 2012 (wrk2 actually doing much worse by swapping start and stop): by taking the start of the first thread and the end of the last one (that's the second parallelization bug).

Doing so is necessarily false (due to OS task and thread scheduling, background processes, etc., threads do not start and stop at the same time, nor do they all have the same lifespan) – that's basic parallelism synchronization, a discipline publicly normalized with the 1995 POSIX threads publication. In 2026, 30 years later, there is no excuse for getting it wrong by design to such an extent... in a tool supposedly benchmarking high-performance multi-threaded servers!

Instead, the RPS must be accounted for in each thread – which in turn reports the final server performance (in RPS) more faithfully, since each thread's execution duration now closely matches the specified benchmark time.

I modified wrk2 to do this properly, and... wrk2 still failed to report coherent thread benchmark lifespans: wrk2 now reported that G-WAN delivers 469m RPS at 10k users. Why?

Unfortunately, wrk2 has added yet another deadly issue to wrk: it stops the worker threads well before the planned time, and for no obvious reason. The potential causes are multiple: broken connections, event-loop errors, signals, and even more bugs in all of these organs (hence, probably, the loss of options for end-users trying to make sense of the resulting incoherence).

This explains the erratic performance, and the "elegant bypass" of the author (who decided to bury the problem rather than resolve it, by picking out of his hat, as seen previously, a nonsensical thread lifespan computed in main() without any consideration for the thread calibration overhead), generating the dire consequences exposed here in all of their splendor:

./wrk2 -d10s -t3 -c3 -R100m "http://127.0.0.1:8080/100.html"
Created 3 event-loop(s) in 0.000 seconds
Created 3 thread(s)     in 0.000 seconds
Running 10s test @ http://127.0.0.1:8080/100.html
  3 threads and 3 connections
- thread #? PLANNED start: 1,776,343,474.613 sec, stop:1,776,343,484.613 sec, duration:10.000 sec, cfg.dur:10
- thread #? PLANNED start: 1,776,343,474.613 sec, stop:1,776,343,484.613 sec, duration:10.000 sec, cfg.dur:10
- thread #? PLANNED start: 1,776,343,474.613 sec, stop:1,776,343,484.613 sec, duration:10.000 sec, cfg.dur:10
- thread #0 benchmark time: 0.018 sec (18,200 usec)
- thread #1 benchmark time: 0.029 sec (29,228 usec)
- thread #2 benchmark time: 0.034 sec (34,120 usec)

There's no more wondering why the wrk2 results may be nonsensical: a benchmark supposedly lasting 10 seconds (or 10 minutes) can stop before the worker threads have been running for ONE single second.

Now comes the real problem, because what we see here should never, ever have happened. Something is broken, somewhere, in that horrible mess of 5,316 lines of code called wrk2. And your mission, should you accept it, is to find it (well, nobody has ever done this in the past 14 years, not even the wrk2 author).

This "something" is so horrible that the author himself has abandoned all hopes and has left his broken work "as is", without even warning the world about the abomination that has been delivered, packed with a gift-card. That tells a lot about the quality standards of these people – the authors and their friends, but also all the open-source "for profit" companies that have documented and distributed these broken test tools for decades, without the most elementary examination, maintenance, or technical support (despite plenty of github incidents, so some have worked on this code, without seeing anything bad).

I am not paid by anyone. My products and papers have been censored for 3 decades by the friends of the above geniuses. Yet I did the work they failed to do – for their own benchmark tools. And I have documented the culprit: disabling the offending organ (presented as a "major achievement", a recurring pattern it seems) resolved the problem:

  // GPG: let's give away the "Crown Jewels" for the sake of... operability:
  //aeCreateTimeEvent(loop, calibrate_delay, calibrate, thread, NULL);
  aeCreateTimeEvent(loop, timeout_delay, check_timeouts, thread, NULL);

The tiny thread-lifespan issue is resolved (remember, I set up and check all thread benchmark times from within the threads):

./wrk2 -d10s -t3 -c3 -R100m "http://127.0.0.1:8080/100.html"
Created 3 event-loop(s) in 0.000 seconds
Created 3 thread(s)     in 0.000 seconds
Running 10s test @ http://127.0.0.1:8080/100.html
  3 threads and 3 connections
- thread #? PLANNED  start: 1,776,344,751.151 sec, stop:1,776,344,761.151 sec, duration:10.000 sec, cfg.dur:10
- thread #? PLANNED  start: 1,776,344,751.151 sec, stop:1,776,344,761.151 sec, duration:10.000 sec, cfg.dur:10
- thread #? PLANNED  start: 1,776,344,751.151 sec, stop:1,776,344,761.151 sec, duration:10.000 sec, cfg.dur:10
- thread #? ACTUAL    STOP: 1,776,344,761.151 sec, thread lifespan: 10.000 sec
- thread #? ACTUAL    STOP: 1,776,344,761.151 sec, thread lifespan: 10.000 sec
- thread #? ACTUAL    STOP: 1,776,344,761.151 sec, thread lifespan: 10.000 sec
- thread #0 benchmark time: 10.000 sec (10,000,005 usec)
- thread #1 benchmark time: 10.000 sec (10,000,006 usec)
- thread #2 benchmark time: 10.000 sec (10,000,004 usec)

See? By just removing the "Crown Jewels" of wrk2 (its pointless and broken yet celebrated calibration), you get a reliable tool back. The advantage of wrk2 is that it stops, more or less, at the specified time instead of taking forever to complete the benchmark, as wrk does (when it is slower than the tested HTTP server).

wrk2 was first published in 2012 by Gil Tene. In 2026, these 4 major by-design flaws are 14 years old – in something presented as "A constant throughput, correct latency recording variant of wrk".

Yet wrk, created by Will Glozer, only slightly miscalculates the RPS and has no calibration/benchmark-time flaw at all (its only gap, beyond its slow architecture, is that it does not stop at the specified time when a server like G-WAN is faster than the benchmark tool – something that some may have interpreted as a G-WAN flaw: "Oh, you see, this server is so slow that the test lasts forever!" when actually the opposite was true: wrk is slow, not G-WAN).

It would be very interesting to hear why Gil Tene felt the need to extend the calibration time so much in wrk2, to the point where it completely defeats the purpose of benchmarking... while claiming that wrk2 is "more exact" than wrk!

Examining the wrk2 source code reveals very (very) strange things: variables and functions implemented but never used, redundant slow function calls, and... purposely misleading messages like "Initialised %d threads in %.3f ms" while the timing was for the event-loop creation (thread creation was neither timed nor reported) – here we are not talking about a skills gap: fairness has been absent, for decades.

The wrk2 source code stinks and would deserve a complete rewrite (were it not badly designed in the first place). Its only purpose seems to be to be as slow, faulty and inefficient as possible... while carefully hiding its sins behind pointless layers, redundancy, chaos and complexity (as in "a haystack is required to hide a needle")... while eventually boosting benchmark scores via unpardonable, elementary thread-synchronization programming mistakes.

More generally, using event queues for high-latency networks, low concurrencies and mostly-idle clients works (slowly), but this model quickly shows its limits on localhost (or on fast networks) and generates VERY HIGH latencies ("ready" queued connections are starved while only one is processed at a time), while higher concurrencies (more users) hit the small 2-second wrk2 timeouts:

  #define SOCKET_TIMEOUT_MS     60000 // GPG: 1 minute, was 2000 (2 seconds)
  #define TIMEOUT_INTERVAL_MS   60000 // GPG: 1 minute, was 2000 (2 seconds)

In the same spirit, calibrate_delay = 10_seconds + (thread->connections * 5); is absolute nonsense, especially at high concurrencies (and has disastrous consequences when subtracted from the actual benchmark time, as wrk2 very wrongly does).

Last but not least, the calibration disaster should have been made optional by its author – at least without it, wrk2 would have been useful (instead of a major nuisance for decades, for myriads of end-users all over the world).

Either these "widely praised scalability experts" are not familiar with the concept of arithmetic overflow and compiler warnings... or they knew very well what they were doing. In both cases, their source code is not trustworthy – and the fact that nobody has felt the need to correct it (in 14 years) reveals how serious is the whole self-congratulating cohort (that feels the need to exclude anyone doing better).

I have quickly corrected the most deadly bugs, added some useful comments and printed messages, added thousands separators for the readability of RPS and timings, a crash handler to show where and why the animal fails, etc., but I don't see the point of wasting more time on the outrageously amateurish wrk2 codebase. Saying "amateurish" is much nicer than "criminal", because there are many hints that all this mediocrity and these bad design choices were a plan rather than utter incompetence: one cannot at the same time do difficult things and fail miserably on the most basic things... that are critical for the whole to operate correctly.

So much for the "plausible deniability" too often presented as an excuse by the serial wrongdoers: "don't attribute to maliciousness what can be attributed to stupidity". Yes, right. Strangely, the "stupid guys" are reaping the bounty every single time, by censoring, denigration and sabotaging the competition... and they only make mistakes that actually benefit them. Stupidity is supposedly enjoying a more random distribution than unleashed greed enjoying infinite impunity.

Reminder: wrk is built on the NGINX HTTP parser. wrk2 enthusiastically went even further in the promotion of multi-threading errors (some among us consider that cheating is always right). G-WAN, while faster than all the others since 2009, was constantly censored, denigrated and sabotaged. And NGINX was sold to a server vendor for $670m in 2019.
On one side, relentless funding and promotion; on the other, 17 years of discreet but constant murders. Call this "accidental" if you can.

SO, SINCE 2012 MOST WRK AND WRK2 BENCHMARKS ARE BOGUS – AND NOBODY HAS EVER NOTICED... IN 14 YEARS!

After fixing wrk2's latest available source code and recompiling it, I quickly tested it and... it crashed at 10k users. Wow – nobody seemed to have addressed the bug I experienced a year earlier.

I re-downloaded wrk2 from several sources to compare it with the version I had downloaded in October 2024. In the 2024 source code, the RPS flaws were already there... but at least that 2024 version of wrk2 (published before the April 2025 G-WAN benchmarks) had no problem testing up to 40k users without crashing.

In the newest versions of wrk2 available on GitHub, in the Ubuntu repositories, etc., the Makefile has also been heavily rewritten (so they have time for this, but not to make better tools, or to correct the faulty ones they distribute). The resulting executable is now 10 times smaller than before (it no longer embeds the libraries it relies on, so it will fail when copied to another machine, due to GNU glibc incompatibilities and shared-library versioning) – and all these new versions crash at... 10k users!

If someone wanted to sabotage the tool that allows G-WAN (or any reasonably performing server) to shine (and that reveals the defects of NGINX and all other servers), the exact same things would have had to be done.

I am sure there will be people claiming that all of these mistakes are "accidental" – but I hardly see why or how crashing at 10k+ users was a necessary feature for a multicore benchmark tool widely considered and celebrated as the "best of its class".

If there's no outright fraud here, I can't understand why it is so difficult to find a reasonably designed, performing and reliable benchmark tool for servers: all the other tools, including the most recent ones made in Go and Rust, are even slower and less capable than wrk2... so ever-degraded quality and spiraling budgets are presented as "the inescapable march of progress"! Cui bono?

I have called wrk3 this redesigned version of wrk2 (without the 3 bogus-RPS by-design flaws), which doesn't crash at 10k users... and which is much easier to compile since (1) it comes with all its dependencies and (2) has a Makefile that uses them.

wrk3 gives "exact numbers" (that is, far less volatile and far less inflated than wrk2 and wrk): G-WAN now tops at 15m RPS at 40k users on the same machine where the same (relatively old) version of G-WAN topped at 281m RPS at 10k users and 63m RPS at 40k users... with the original (now known as faulty) wrk2 (and 500+m RPS with NGINX's wrk) benchmark tools published by the main Linux distributions and countless "scalability experts" and hosting web sites.

The 2026 G-WAN version is now faster than the 2025 version tested here, but that will be for another blog post, where we will compare G-WAN server benchmarks made with wrk3 (wrk2 being too broken to produce useful benchmarks) and with the integrated G-WAN benchmark tool.

I share wrk3 to let people test their own works and G-WAN... because we all deserve better tools than the ones provided by the best-funded and well-promoted "experts" of this "big-tech" industry. Wake up: small is beautiful (and reliable, maybe because it can't afford to buy favorable media, fiscal and legal exposure).