Zhaoxin, a China-based CPU developer with an x86 license, has yet to officially unveil its next-generation KaiSheng KH-40000 CPUs with up to 16 cores for data centers. However, it has already started submitting benchmark results to the Geekbench 5 database. The new processors show noticeable microarchitecture-related performance improvements over their predecessors, but they still fall well short of modern processors from AMD and Intel.
Zhaoxin, co-owned by Via Technologies and the Shanghai Municipal Government, has been gradually adopting microarchitectures designed by Via (or rather Centaur) since the mid-2010s, and its upcoming KaiSheng KH-40000 series processors for data centers are based on the CentaurHauls microarchitecture, which some claim resembles Intel’s 2013 Haswell microarchitecture.
The KaiSheng KH-40000/16 and KaiSheng KH-40000/12 processors run at 2.20 GHz, have 16 and 12 cores, and are equipped with 32MB and 24MB of L3 cache, respectively. In addition, the 16-core model appears to feature simultaneous multi-threading (SMT) technology, so it can handle up to 32 threads simultaneously, assuming Geekbench 5 correctly reads its capabilities. Based on the specifications of the KaiSheng KH-40000/16 and KaiSheng KH-40000/12 published in the Geekbench 5 database, these processors look very similar to Centaur’s never-released CHA processor discovered earlier this year.
However, there are differences: the CHA has eight cores, does not support SMT, and is designed for TSMC’s N16 node, while the KaiSheng KH-40000 has up to 16 cores, appears to include SMT, and is believed to be designed for TSMC’s N7 manufacturing process. Also, the CPU ID on both KH-40000 CPUs reads “CentaurHauls Family 7 Model 11 Step 3”, while the Centaur CHA’s CPU ID is “CentaurHauls Family 6 Model 71 Stepping 2”, so the processors in question use different silicon.
The odd thing, though, is that both the CHA and the KH-40000 run at 2.20 GHz, so if we didn’t know the CPU ID, we could speculate that the KH-40000/16 model uses two eight-core CHA dies made on TSMC’s N16 node and glued together in a single package.
For Zhaoxin, CentaurHauls should be a significant microarchitectural advance over the LuJiazui microarchitecture of 2019. Additionally, the higher core count should make the KaiSheng KH-40000 processors more competitive in the server market. So, let’s take a look at the performance numbers submitted by the processor developer.
| | Zhaoxin KH-40000/16 | Zhaoxin KH-40000/12 | Centaur CHA | Zhaoxin KX-U6780A | AMD FX-8350 | Core i9-12900K | Ryzen 9 5950X |
|---|---|---|---|---|---|---|---|
| General specifications | 16C/32T, 2.20 GHz, 32MB L3 | 12C/12T, 2.20 GHz, 24MB L3 | 8C/8T, 2.20 GHz, 16MB L3 | 8C/8T, 2.70 GHz, 8MB L3 | 4C/8T | 8P + 8E, 3.20 ~ 5.10 GHz, 30MB | 16C, 3.40 ~ 5.0 GHz, 64MB |
| Microarchitecture | CentaurHauls | CentaurHauls | CentaurHauls | LuJiaZui | Bulldozer/Piledriver | Golden Cove + Gracemont | Zen 3 |
| OS | UnionTech OS DT 20 Pro | Windows 10 Pro | Windows 10 Pro | Windows 10 Pro | ? | Windows 11 Pro | Windows 10 Pro |
| Single-core integer | 450 | 439 | 476 | 366 | 670 | 1830 | 1435 |
| Single-core floating point | 559 | 538 | 541 | 318 | 607 | 2189 | 1881 |
| Single-core crypto | 1039 | 934 | 782 | 583 | 1040 | 6064 | 4089 |
| Single-core score | 512 | 493 | 511 | 362 | 670 | 2149 | 1702 |
| Multi-core integer | 9293 | 3452 | 3307 | 2364 | 3570 | 20631 | 16695 |
| Multi-core floating point | 11875 | 4176 | 3723 | 2089 | 3563 | 23205 | 18695 |
| Multi-core crypto | 5233 | 2119 | 4825 | 3390 | 2431 | 17413 | 8145 |
| Multi-core score | 9915 | 3603 | 3508 | 2333 | 3511 | 21242 | 16868 |
When it comes to single-threaded performance, Zhaoxin’s (or rather Centaur’s) CentaurHauls microarchitecture significantly outperforms the company’s previous-generation LuJiazui microarchitecture in both integer (by roughly 23%) and floating-point workloads (by roughly 76%), even though the new processor operates at 2.20 GHz while the older one runs at 2.70 GHz. The increase in FPU performance looks pretty dramatic, but you have to remember that we’re dealing with a synthetic benchmark.
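The single-threaded comparison can be rechecked from the single-core scores in the table. The sketch below is only an illustration: the helper names are hypothetical, and the per-GHz figure simply divides each score by the listed clock, which assumes both chips actually ran at those clocks during the test.

```python
# Single-core Geekbench 5 subscores and listed clocks, copied from the table.
SCORES = {
    "KH-40000/16": {"clock_ghz": 2.20, "integer": 450, "float": 559},
    "KX-U6780A": {"clock_ghz": 2.70, "integer": 366, "float": 318},
}


def uplift_percent(new: float, old: float) -> float:
    """Percentage by which `new` exceeds `old`."""
    return (new / old - 1.0) * 100.0


def per_ghz(chip: str, test: str) -> float:
    """Score divided by listed clock: a crude per-clock proxy (assumption)."""
    return SCORES[chip][test] / SCORES[chip]["clock_ghz"]


for test in ("integer", "float"):
    raw = uplift_percent(SCORES["KH-40000/16"][test], SCORES["KX-U6780A"][test])
    clk = uplift_percent(per_ghz("KH-40000/16", test), per_ghz("KX-U6780A", test))
    print(f"{test}: {raw:.1f}% raw uplift, {clk:.1f}% uplift per GHz")
```

Because the newer chip is clocked lower, the per-GHz uplift comes out larger than the raw one, which is the point the paragraph makes about the 2.20 GHz versus 2.70 GHz clocks.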
Although the new microarchitecture is significantly better than the previous one, the KaiSheng KH-40000 processors with 12 and 16 cores cannot compete with any modern processors. Also, their single-threaded performance is even lower than that of AMD’s ill-fated Bulldozer/Piledriver architecture from mid-2012.
In terms of multi-threaded performance, we see a rather odd advantage that Zhaoxin’s 16-core KaiSheng KH-40000/16 with SMT has over the 12-core KaiSheng KH-40000/12 CPU. While the 16C/32T chip can theoretically handle 2.67 times as many threads as its 12C/12T counterpart (and we’ve never seen that kind of SMT efficiency from any well-known CPU microarchitecture before), its actual performance advantage is higher than even the hypothetical 2.67X (2.69X in integer, 2.84X in float). Since we are dealing with a situation where one CPU has only four more cores than its rival, but its performance is almost three times higher, we believe that factors beyond the number of cores are affecting the results.
Considering that Windows 10/11 does not always schedule threads optimally on unfamiliar multi-core processors, we believe that the results of the 12-core KaiSheng KH-40000/12 processor obtained in Windows 10 Pro do not reflect its true potential.
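The scaling oddity is easy to reproduce from the multi-core rows of the table. This is a minimal sketch with hypothetical names; the "efficiency" value is just the score ratio divided by the thread ratio, an illustrative metric rather than anything Geekbench reports.

```python
# Thread counts and multi-core Geekbench 5 subscores, copied from the table.
KH16 = {"threads": 32, "integer": 9293, "float": 11875}
KH12 = {"threads": 12, "integer": 3452, "float": 4176}


def ratio(key: str) -> float:
    """How many times larger the 16-core chip's value is for `key`."""
    return KH16[key] / KH12[key]


thread_ratio = ratio("threads")
print(f"thread ratio: {thread_ratio:.2f}x")

for test in ("integer", "float"):
    score_ratio = ratio(test)
    # A value above 1.0 means the score scaled faster than the thread count.
    efficiency = score_ratio / thread_ratio
    print(f"{test}: {score_ratio:.2f}x score, {efficiency:.2f} per-thread scaling")
```

A per-thread scaling value above 1.0 would mean every extra hardware thread, including SMT siblings, contributed more than a full core's worth of throughput, which is why the 12-core result looks understated.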
Still, even under Windows 10 Pro and without SMT, CentaurHauls is significantly faster than LuJiazui in multithreaded integer workloads (by 40%) and multithreaded floating-point workloads (by 78%). The problem is that the absolute performance values demonstrated by the KaiSheng KH-40000 and Centaur CHA processors are insufficient by today’s standards.
Interestingly, the multi-threaded performance values demonstrated by Zhaoxin’s 12-core KaiSheng KH-40000/12 under Windows and without SMT are comparable to those of AMD’s FX-8350 processor (four modules, eight threads), which the company once sold as an eight-core processor. We can hardly call the performance of a decade-old processor competitive by today’s standards, at least in Geekbench 5, which is not the best benchmark.
While 12-core and 16-core configurations seem suitable for entry-level desktops and servers, Zhaoxin’s 12- and 16-core chips do not deliver performance comparable to 12-core or 16-core processors from AMD and Intel. Under Windows, and judging by Geekbench 5 scores alone, Zhaoxin appears to be a decade behind AMD and Intel in terms of performance. Even if Zhaoxin enables SMT on its upcoming CentaurHauls-based processors (for client and server applications) and Windows “learns” how to properly use these cores, the KaiSheng KH-40000/16 will still be roughly half as fast as the 2021 processors from AMD and Intel with the same number of cores.
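The size of that gap can be estimated from the overall multi-core scores in the table. The sketch below uses hypothetical names and takes the best published Zhaoxin result (the SMT-enabled 16-core run) as the baseline, so it is a rough bound rather than a like-for-like measurement.

```python
# Overall multi-core Geekbench 5 scores for the 16-core parts, from the table.
MULTI_CORE = {
    "KH-40000/16": 9915,
    "Core i9-12900K": 21242,
    "Ryzen 9 5950X": 16868,
}


def gap(rival: str, baseline: str = "KH-40000/16") -> float:
    """How many times faster `rival` scores than `baseline`."""
    return MULTI_CORE[rival] / MULTI_CORE[baseline]


for rival in ("Core i9-12900K", "Ryzen 9 5950X"):
    print(f"{rival}: {gap(rival):.2f}x the KH-40000/16 multi-core score")
```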