As always with caching questions, the answer is "it entirely depends on your workload". The cache only helps if your running processes spend a significant amount of time accessing memory, exhibit noticeable locality of reference in their memory addressing, and have working sets that do not already fit into the smaller per-core L1/L2 caches.
Running a high number of memory-hungry threads increases the odds of thrashing the shared cache and thus diminishes gains that might otherwise have been achieved. This is also why cache sizes grow with core count - the more memory-competing threads you have running, the larger the shared cache needs to be in order to be useful at all.
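To make that thrashing concrete, here is a toy simulation (illustrative only, not a model of real hardware): treat the shared cache as an LRU set, and watch the hit rate collapse once the competing threads' combined working sets exceed its capacity.

```python
from collections import OrderedDict

def lru_hit_rate(cache_lines, n_threads, working_set, rounds=50):
    """Simulate n_threads looping over disjoint working sets through
    one shared LRU cache; return the overall hit rate."""
    cache = OrderedDict()          # OrderedDict doubles as an LRU cache
    hits = accesses = 0
    for _ in range(rounds):
        for t in range(n_threads):
            for addr in range(working_set):
                key = (t, addr)    # each thread touches its own addresses
                accesses += 1
                if key in cache:
                    hits += 1
                    cache.move_to_end(key)         # mark most-recently-used
                else:
                    cache[key] = True
                    if len(cache) > cache_lines:
                        cache.popitem(last=False)  # evict least-recently-used
    return hits / accesses

# One thread whose working set fits: near-perfect hit rate after warm-up.
print(lru_hit_rate(cache_lines=1024, n_threads=1, working_set=512))
# Eight such threads overflow the same cache and thrash: hit rate drops to zero.
print(lru_hit_rate(cache_lines=1024, n_threads=8, working_set=512))
```

The round-robin access pattern is the worst case for LRU: by the time a thread comes back to its data, the other threads have evicted every line of it.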
There is an oldish article from Tom's Hardware comparing two old P4 chips with and without L3 cache for a number of rendering / graphical workloads. The numbers are rubbish, as is the whole benchmark, but it contains a nice explanation of the caching architecture in general and L3 caching in particular.
The bottom line: you likely would not notice the difference, but if you need the exact numbers, you would have to purchase both CPUs and run your workload on both of them to compare runtimes.
The A12X is an enormous CPU built with far newer technology than the i7-4790T, which dates from 2014.
The first difference is the manufacturing process:
the A12X is a 7 nm chip, while the i7-4790T Haswell-DT is built on an older
22 nm process. Smaller transistors mean less space, lower operating power,
and faster signals across shorter chip paths.
The A12X has a whopping 10 billion
transistors, while the i7-4790T has only 1.4 billion.
This allows the A12X to have six integer execution pipelines, among which
two are complex units, two load and store units, two branch ports,
and three FP/vector pipelines, for an estimated total of 13 execution ports,
far more than the eight execution ports of the Haswell-DT architecture.
As for cache sizes on the A12X: each big core has 128 KB of L1 cache, plus
8 MB of L2 shared within the big-core cluster. Each little core has 32 KB of L1,
with 2 MB of L2 shared within the little-core cluster. There is also an
additional 8 MB SoC-wide system cache (also used for other things).
The Haswell architecture has 64 KB of L1 cache per core, 256 KB of L2 per core,
and a shared L3 cache of 2-40 MB depending on the model (8 MB on the i7-4790T).
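Adding up the figures quoted above (the core counts - four big and four little on the A12X, four on the i7-4790T - and the 4790T's 8 MB L3 are filled in here; shared caches are counted once):

```python
KB, MB = 1024, 1024 ** 2

# Cache figures as quoted above, times the number of cores where per-core.
a12x_total = (4 * 128 * KB      # L1, four big cores
              + 8 * MB          # L2, shared by the big-core cluster
              + 4 * 32 * KB     # L1, four little cores
              + 2 * MB          # L2, shared by the little-core cluster
              + 8 * MB)         # SoC-wide system cache

i7_4790t_total = (4 * 64 * KB     # L1, per core, four cores
                  + 4 * 256 * KB  # L2, per core
                  + 8 * MB)       # shared L3 on this model

print(f"A12X:     {a12x_total / MB:.2f} MB")
print(f"i7-4790T: {i7_4790t_total / MB:.2f} MB")
```

By this rough tally the A12X carries roughly twice the on-chip cache of the i7-4790T.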
It can be seen that the A12X beats the i7-4790T on all points and by a large margin.
Regarding RISC vs CISC architecture, this is now a moot point on modern processors.
Both architectures have evolved to the point where they now
emulate each other’s features to a degree in order to mitigate weaknesses.
I quote here the chart comparing the A12 to the Xeon 8192, i7-6700K,
and AMD EPYC 7601 CPUs, compiled on Reddit (link below),
where the A12 compares well even against desktop processors:
(This post is asking for speculation and I'm happy to oblige.)
The problem is that current technology has hit its limits, so only minor performance improvements are now possible. Improvements of 10-20% just don't sound very convincing.
On the other hand, manufacturers do not wish to fall behind Moore's law, which as popularly stated says that computer chip performance roughly doubles every 18 months (with no increase in power consumption). That requires an improvement factor of 100%, and such single-core technology simply does not exist.
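Reading those 10-20% figures as annual gains (my assumption), simple compounding shows how far they fall short of a doubling every 18 months:

```python
import math

def years_to_double(annual_gain):
    """Years needed for performance to double at a compounding annual gain."""
    return math.log(2) / math.log(1 + annual_gain)

for gain in (0.10, 0.20, 1.00):
    print(f"{gain:.0%} per year -> doubles in {years_to_double(gain):.1f} years")

# Annual gain needed to double every 18 months (1.5 years):
needed = 2 ** (1 / 1.5) - 1
print(f"doubling every 18 months needs {needed:.0%} per year")
```

At 10% a year, doubling takes over seven years; an 18-month doubling would need nearly 60% a year, compounded.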
Solution: double the number of cores and quote the sum of their capacities as proof that performance is still doubling fast enough.
In real life this theoretical increase in the number of cores is not guaranteed to increase total performance, since some computer resources, such as RAM, the bus, and the disk, are shared and may become bottlenecks.
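The effect of such shared bottlenecks is commonly estimated with Amdahl's law: if a fraction of the work is serialized on a shared resource, adding cores yields rapidly diminishing returns.

```python
def amdahl_speedup(n_cores, serial_fraction):
    """Upper bound on speedup with n cores when serial_fraction of the
    work is serialized on shared resources (Amdahl's law)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cores)

# Even with only 10% of the work bottlenecked on shared resources,
# doubling from 4 to 8 cores gains much less than 2x:
print(amdahl_speedup(4, 0.10))       # ~3.08x
print(amdahl_speedup(8, 0.10))       # ~4.71x
print(amdahl_speedup(10**6, 0.10))   # caps just under 10x, however many cores
```

The cap is 1/serial_fraction: with a 10% serialized share, no core count ever gets past a 10x speedup.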
Increasing the number of cores also cannot go on indefinitely, chiefly because of power consumption. A core needs more power to run faster, so the more cores you have, the smaller each core's share of the total power budget, and the slower each must run.
The solution here is turbo mode, whereby one core gets most of the available power budget. So you have one fast core, with the others either turned off or slowed down. But since one core cannot sustain that mode indefinitely, turbo is switched between cores in rotation.
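As a back-of-the-envelope sketch of that power constraint (assuming, as a rough rule of thumb, that a core's power draw grows with roughly the cube of its clock, since voltage must rise with frequency), splitting a fixed budget across more active cores forces each one to clock lower:

```python
def max_frequency(power_budget, active_cores, base_freq=1.0, base_power=1.0):
    """Toy model: power per core ~ f^3, so each core's sustainable clock
    is the cube root of its share of the total power budget."""
    per_core = power_budget / active_cores
    return base_freq * (per_core / base_power) ** (1 / 3)

budget = 8.0  # arbitrary units: enough for 8 cores at the base clock
for cores in (1, 2, 4, 8):
    print(f"{cores} active core(s): {max_frequency(budget, cores):.2f}x base clock")
```

Under this model one active core can clock twice as high as eight active cores on the same budget, which is exactly the single-core turbo behaviour described above.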
In general, for comparable technology, a CPU with fewer cores may prove faster core-for-core than a multi-core CPU. Other factors also influence speed, but the trade-off between core count and single-core performance is often the real question, along with whether turbo mode suits the workload.