How to Choose a Multicore CPU in 2025 – and How Much It Will Matter
When I was young, CPUs were single-core, and the CPU frequency was the indicator of their speed. As frequency doubled every 2 years, code execution speed also doubled (sometimes a bit more, thanks to new CPU instructions). The faster the CPU frequency, the better – things were simple:


After 2001 (and the CPU frequency halt), things became more difficult because CPU vendors started to market multicore CPUs under names that were no longer related to their capabilities.
Today, larger reference numbers do not necessarily imply better performance: for example, there are Intel i5 or i7 CPUs that are faster than some i9 or Xeon CPUs.
CPU frequency remains a relevant criterion – but since CPU frequencies no longer grow beyond the 5-6 GHz limit, there are other important metrics to consider.
Higher clock speeds imply higher energy consumption, that's true.
But CPU "governors" (powersave vs. performance) and a user-defined variable clock-speed range (from 800 MHz to 5500 MHz for my i9 CPU) let you decide how much your CPU will consume (so a larger range is more desirable).
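To see (or script) these settings, here is a minimal C sketch, assuming the standard Linux cpufreq sysfs interface (writing the same files, as root, is how the governor and the clock-speed range are changed):

// print the governor and clock-speed range of CPU Core 0 (Linux cpufreq)
#include <stdio.h>

static void show(const char *path, const char *label)
{
   char buf[128];
   FILE *f = fopen(path, "r");
   if (f)
   {
      if (fgets(buf, sizeof(buf), f))
         printf("%s %s", label, buf); // sysfs values already end with '\n'
      fclose(f);
   }
}

int main(void)
{
   show("/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor", "governor:");
   show("/sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq", "min kHz :");
   show("/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq", "max kHz :");
   return 0;
}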
From 2009 to 2024, G-WAN was benchmarked on the 2008 Intel Xeon 6-Core CPU that tops the charts on the left.
Purchased in 2024, my 2022 Intel Core i9 is 8.36 times faster overall, and its processing speed per Core (or thread) is 4.07 times higher than that of the 2008 Xeon.
So the problem that everyone faces is... how to navigate the hundreds of CPU references to pick the one you need?
Is it better to have many Cores – or fewer CPU Cores and a higher CPU frequency? Are there other criteria to take into account?
For Desktop machines, this maze is further obscured by the fact that PC vendors may forcibly add a (high-margin) graphics card... that is redundant with your CPU's features (my i9 CPU embeds an "Intel Corporation Raptor Lake-S GT1 [UHD Graphics 770]").
Unless you need a discrete GPU (my i9 runs 3D video games effortlessly), a PCIe 3D graphics card pointlessly consumes a lot of power.
We will resolve these questions, explain what the gains are, and show how to reduce your purchase and operating costs.
Why Bother?
Well, it's all about costs – a fast, many-Core CPU can save you many computers – that is, if your software is able to exploit parallelism. As the chart below illustrates, in 2020 the GLibC memory allocator and Microsoft Research's Mimalloc scaled poorly compared to TWD's SLIMalloc, used by G-WAN v15+ (a sketch to reproduce this kind of test follows the chart):

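To reproduce this kind of allocator-contention test yourself, here is a minimal sketch (the block size, loop count and thread count are arbitrary choices of mine): if the allocator scaled perfectly, the wall-clock time would stay flat as you add threads, since each thread does the same fixed amount of work.

// build with: gcc -O2 -pthread test.c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define LOOPS 1000000 // allocations per thread

static void *worker(void *arg)
{
   (void)arg;
   for (int i = 0; i < LOOPS; i++)
   {
      void *p = malloc(64); // small blocks stress the allocator's locks
      free(p);
   }
   return 0;
}

int main(int argc, char *argv[])
{
   int n = (argc > 1) ? atoi(argv[1]) : 4; // thread count, eg: ./a.out 24
   if (n < 1 || n > 64) n = 4;
   pthread_t t[64];
   for (int i = 0; i < n; i++)
      pthread_create(&t[i], 0, worker, 0);
   for (int i = 0; i < n; i++)
      pthread_join(t[i], 0);
   printf("%d threads done\n", n);
   return 0;
}

Time it with 'time ./a.out 1' and then 'time ./a.out 24' to see how your system's allocator scales.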
A popular (yet inefficient) alternative to scalable software is to use Virtual Machines.
Virtual Machines let you split a computer into several 'virtual' computers... but this feature comes at a cost (a 3-5x slowdown compared to bare-metal hardware).
Also, hosted servers using Virtual Machines don't let you tune the CPU and Linux kernel options (aggravating the performance losses).
So, whenever you can, select software that scales on multicore systems: it will do the job much faster, on far fewer computers.
Thanks to SLIMalloc, on top of adding memory-safety, G-WAN removed the GLibC bottleneck related to memory allocation.
And, since G-WAN (1) is self-contained (statically-linked) and (2) does its own irrevocable sandboxing (its code and servlets cannot exit the /gwan folder), it already isolates itself from any other program running on the same system.
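For the curious, here is the classic Unix way to confine a process to a folder and make the confinement irrevocable (a generic sketch of the technique, not necessarily how G-WAN implements it):

// must start as root; after these calls, the process cannot escape
#include <stdio.h>
#include <unistd.h>

int main(void)
{
   if (chroot("/gwan") || chdir("/")) // '/' is now the /gwan folder
   {
      perror("chroot"); return 1;
   }
   if (setgid(65534) || setuid(65534)) // drop root for good (65534: 'nobody')
   {
      perror("setuid"); return 1;
   }
   puts("confined to /gwan, running unprivileged");
   return 0;
}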
Since G-WAN starts with a 500-700 KB memory footprint (the "RSS", Resident Set Size), you can run many, many G-WAN instances that are secure because they are completely isolated from each other (and from the system).
Under higher load, G-WAN allocates more memory – and it frees it once the traffic goes down (here is a short excerpt of the G-WAN "status" page for one of the 6 instances that have been running for a week, alongside other network applications, on the same bare-metal server):
- server uptime .... 00.00.07 04:03:39
- traffic total .... in:54.4 MB, out:7.2 GB
- traffic daily .... in:7.8 MB, out:994.4 MB
- traffic today .... in:3.9 MB, out:512.1 MB
- VM address-space . 3'694'592 bytes (3.5 MB)
- RSS physical RAM . 851'968 bytes (832.0 KB)
- RSS peak ......... 7'200'768 bytes (6.8 MB)
- pages reclaims ... 172'220'416 bytes (164.2 MB)
- RAM free/total ... 91% of 15.2 GB
- Disk free/total .. 99% of 417.8 GB
- request time ..... 52 µs [27-106]
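To check the same numbers for your own programs, here is a minimal sketch using standard Linux interfaces (/proc and getrusage() – the "status" page above is G-WAN's own reporting, this is just how anyone can read these values):

#include <stdio.h>
#include <sys/resource.h>
#include <unistd.h>

int main(void)
{
   long vm_pages = 0, rss_pages = 0;
   FILE *f = fopen("/proc/self/statm", "r"); // sizes in memory pages
   if (f)
   {
      if (fscanf(f, "%ld %ld", &vm_pages, &rss_pages) != 2)
         vm_pages = rss_pages = 0;
      fclose(f);
   }
   const long page = sysconf(_SC_PAGESIZE);
   printf("VM address-space . %ld bytes\n", vm_pages  * page);
   printf("RSS physical RAM . %ld bytes\n", rss_pages * page);

   struct rusage ru; // ru_maxrss is in KB on Linux: the "RSS peak" above
   if (!getrusage(RUSAGE_SELF, &ru))
      printf("RSS peak ......... %ld bytes\n", ru.ru_maxrss * 1024L);
   return 0;
}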
Of course, with such a low RAM and CPU consumption, Virtual Machines can be used with G-WAN, but they no longer serve any purpose (they just slow down G-WAN, consume more memory than necessary, and inject more vulnerabilities than are already present in the OS).
How to pick the best CPUs? (largest caches, highest frequency, largest number of Cores)
I have used the 'Passmark' CPU benchmark for almost two decades because it reliably reflects the real-world performance of CPUs. There are many lists – for Desktop machines, low-consumption devices, servers – so you don't have to waste hours finding what you are looking for.
For example, here are the CPUs I used for benchmarks from 2008 to 2024, and the one used since 2024:

Everything (clock speed, number of Cores, cache sizes, energy consumption) has improved... but the most game-changing criteria for me are the single-thread rating (4.07 times better!) and, of course, the overall score (8.36 times better – due to the 4x number of Cores).
Before 2001, a factor of four in single-thread performance was reached in 4 years! Here, we had to wait from 2008 to 2022 (14 years!) to get such a performance boost. This is why, IMHO, you should not miss such an opportunity when it becomes available.
I repeat: the single-thread rating defines the speed at which your code will run (faster or slower). So, for me, that's the most important metric (the number of Cores matters, but having many SLOW Cores is pointless).
- Here is a list of the most powerful CPUs (some have very high prices).
- This list is for more reasonable prices.
- Today, the top (with Intel CPUs at a reasonable price) is the 14th generation.
You have to consult these lists because CPU reference names can be close – yet designate very different capabilities:


Since most software programs do not scale on multicore, it is more rewarding to have a higher Passmark "single-thread" score than more (slower) CPU Cores.
And, with programs that scale on multicore systems (like G-WAN), target the highest number of Cores and the highest Passmark "single-thread" score (this results in the highest Passmark overall score).
If you have many machines, the promise of energy savings may tempt you to pick lower CPU frequencies – but that's a mistake because you will just waste energy by needing many more machines than necessary (that is, if you are using software that scales on multicore, like G-WAN).
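This trade-off can be quantified with Amdahl's law (my addition here, not a Passmark metric): the speedup of a program on N Cores is limited by its parallel fraction p, which is why slow Cores cannot be compensated by their number unless p is close to 100%.

// speedup(N) = 1 / ((1 - p) + p / N) -- Amdahl's law
#include <stdio.h>

static double amdahl(double p, int cores)
{
   return 1.0 / ((1.0 - p) + p / cores);
}

int main(void)
{
   // a program that is only 50% parallel barely benefits from 24 Cores:
   printf("p=0.50, 24 Cores: x%.2f\n", amdahl(0.50, 24)); // ~x1.9
   // a 95%-parallel program does much better, but still not x24:
   printf("p=0.95, 24 Cores: x%.2f\n", amdahl(0.95, 24)); // ~x11.2
   return 0;
}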
How CPU specs impact program execution – and how code quality matters
The reason for this multicore 'tax' (the inability of programs to scale) is easier to visualize with a schema of a computer's hardware memory layout (see how much faster data access is when made from the CPU caches, compared to system memory or, even worse, disks):

And what is true for data is also true for code: organizing data and code properly, in compact and contiguous areas for each CPU Core, has dramatic consequences (eg: saving TLB entries – the cached translations between the system's physical memory and the process' virtual memory address-space) – just like the CPU cache sizes: here, larger is better:

Even worse, CPUs are much, much faster than system memory, so CPUs waste most of their time waiting for data to crunch (or code to execute) – hence the value of keeping code and data together rather than spending too much time reloading the CPU caches (the sketch after the chart below lets you measure this):

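The effect is easy to measure: this sketch (sizes are arbitrary) sums the same 64 MB matrix twice; the first loop order walks memory contiguously (cache-friendly), the second jumps a full row between each access (cache-hostile) and typically runs several times slower.

#include <stdio.h>
#include <time.h>

#define N 4096 // 4096 x 4096 ints = 64 MB, far larger than any CPU cache

static double now(void)
{
   struct timespec ts;
   clock_gettime(CLOCK_MONOTONIC, &ts);
   return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void)
{
   static int m[N][N];
   for (int i = 0; i < N; i++)
      for (int j = 0; j < N; j++)
         m[i][j] = i + j;

   long sum = 0;
   double t = now();
   for (int i = 0; i < N; i++)      // contiguous: the next int is adjacent
      for (int j = 0; j < N; j++)
         sum += m[i][j];
   printf("row-major: %.3f s (sum: %ld)\n", now() - t, sum);

   t = now();
   for (int j = 0; j < N; j++)      // strided: each access is 16 KB away
      for (int i = 0; i < N; i++)
         sum += m[i][j];
   printf("col-major: %.3f s (sum: %ld)\n", now() - t, sum);
   return 0;
}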
How this helps to explain the performance difference between G-WAN and NGINX
Already in late 2008 (before the first public release in 2009), G-WAN was faster than NGINX on a single CPU Core (or a single thread). The code was just better written: simpler, less convoluted.
On my 2008 Intel Xeon 6-Core CPU, I could already appreciate how much better G-WAN was performing, due to better thread isolation (NGINX uses shared memory between several worker processes).
At the time, since the GLibC offered neither memory pools nor per-thread allocation, I added these features to G-WAN. This reached a new level with SLIMalloc in 2020 because SLIMalloc was designed from day one to perform on single-core CPUs, and to scale on multicore systems.
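The core idea of per-thread allocation fits in a few lines – here is a toy sketch (not G-WAN's nor SLIMalloc's actual code): each thread carves memory from its own arena, so the fast path needs no locks and threads never contend:

#include <stdio.h>
#include <stdlib.h>

typedef struct { char *base; size_t used, size; } arena_t;

static _Thread_local arena_t tls_arena; // one private arena per thread

static void *arena_alloc(size_t len)
{
   arena_t *a = &tls_arena;
   if (!a->base) // lazy init: grab a private slab for this thread
   {
      a->size = 1 << 20; // 1 MB, an arbitrary value for this sketch
      a->base = malloc(a->size);
      a->used = 0;
   }
   len = (len + 15) & ~(size_t)15; // keep 16-byte alignment
   if (!a->base || a->used + len > a->size)
      return 0; // a real allocator would chain a new slab here
   void *p = a->base + a->used;
   a->used += len;
   return p;
}

int main(void)
{
   char *s = arena_alloc(32);
   if (s)
   {
      snprintf(s, 32, "thread-local arena works");
      puts(s);
   }
   return 0;
}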
Over the years, I rewrote G-WAN and replaced the GLibC with the above constraints in mind: simpler, smaller, more efficient. The result is now much, much more visible than in 2009 because my code has improved, but also because Intel CPUs now embed many more Cores – which execute code much faster.
G-WAN respects the architecture of multicore CPUs better than NGINX does, so it runs hundreds of times faster. That's mere computer science.