The trend towards multiple cores is an engineering approach that helps CPU designers avoid the power consumption problem that came with ever-increasing frequency scaling. As CPU speeds rose into the 3-4 GHz range, the amount of electrical power required to go faster started to become prohibitive. The technical reasons for this are complex, but factors like heat losses and leakage current (power that simply passes through the circuitry without doing anything useful) increase disproportionately as frequencies rise. While it's certainly possible to build a 6 GHz general-purpose x86 CPU, it has not proven economical to do so. That's why the move to multi-core started, and it's why we will see that trend continue at least until the parallelization issues become insurmountable. In the server arena the trend towards virtualization has helped, as it lets us parallelize aggregate workloads efficiently, for the moment at any rate.
As a practical example, the E5640 Xeon (4 cores @ 2.66 GHz) has a power envelope of 95 watts while the L5630 (4 cores @ 2.13 GHz) requires only 40 watts. That's 137% more electrical power for roughly 25% more clock speed, between CPUs that are for the most part feature-compatible. The X5677 pushes the speed up to 3.46 GHz with some more features, but that's only around 60% more processing power for 225% more electrical power.
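For reference, a quick sketch of where those percentages come from; the wattages and clocks are the figures quoted above, and the X5677's 130-watt envelope is inferred from the quoted 225% figure rather than stated in the text:

```c
#include <stdio.h>

int main(void) {
    double l5630_w = 40.0,  l5630_ghz = 2.13;  /* L5630: 4 cores @ 2.13 GHz */
    double e5640_w = 95.0,  e5640_ghz = 2.66;  /* E5640: 4 cores @ 2.66 GHz */
    double x5677_w = 130.0, x5677_ghz = 3.46;  /* X5677: TDP inferred, see above */

    printf("E5640 vs L5630: %.1f%% more power for %.1f%% more clock\n",
           100.0 * (e5640_w / l5630_w - 1.0), 100.0 * (e5640_ghz / l5630_ghz - 1.0));
    printf("X5677 vs L5630: %.1f%% more power for %.1f%% more clock\n",
           100.0 * (x5677_w / l5630_w - 1.0), 100.0 * (x5677_ghz / l5630_ghz - 1.0));
    return 0;
}
```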
Now compare the X5560 (2.8 GHz, 4 cores, 95 watts) with the newer X5660 (2.8 GHz, 6 cores, 95 watts) and there's 50% extra computing power in the socket (potentially, assuming Amdahl's law is kind to us) without requiring any additional electrical power. AMD's 6100-series CPUs see similar gains in aggregate performance over the 2400/8400 series while keeping electrical power consumption flat.
For single-threaded tasks this is a problem, but if your requirement is to deliver large amounts of aggregate CPU power to a distributed processing cluster or a virtualization cluster, then this is a reasonable approach. It means that for most server environments today, scaling out the number of cores in each CPU is a much better approach than trying to build faster/better single-core CPUs.
The trend will continue for a while, but there are challenges, and continually scaling out the number of cores is not easy (keeping memory bandwidth high enough and managing caches gets much harder as the number of cores grows). That means the current fairly explosive growth in the number of cores per socket will have to slow down within a couple of generations, and we will see some other approach take over.
One such approach is heterogeneous multiprocessing (HMP), which is widely adopted in mobile devices. In ARM-based devices that implement big.LITTLE, the processor contains cores with different performance and power profiles: some cores run fast but draw lots of power (a faster architecture and/or higher clocks), while others are energy-efficient but slow (a slower architecture and/or lower clocks). This is useful because power usage tends to increase disproportionately as you push performance past a certain point. The idea here is to get performance when you need it and battery life when you don't.
On desktop platforms, power consumption is much less of an issue so this is not truly necessary. Most applications expect each core to have similar performance characteristics, and scheduling processes for HMP systems is much more complex than scheduling for traditional SMP systems. (Windows 10 technically has support for HMP, but it's mainly intended for mobile devices that use ARM big.LITTLE.)
Also, most desktop and laptop processors today are not thermally or electrically limited to the point where some cores need to run faster than others even for short bursts. We've basically hit a wall on how fast we can make individual cores, so replacing some cores with slower ones won't allow the remaining cores to run faster.
While there are a few desktop processors that have one or two cores capable of running faster than the others, this capability is currently limited to certain very high-end Intel processors (as Turbo Boost Max Technology 3.0) and only involves a slight gain in performance for those cores that can run faster.
While it is certainly possible to design a traditional x86 processor with both large, fast cores and smaller, slower cores to optimize for heavily-threaded workloads, this would add considerable complexity to the processor design and applications are unlikely to properly support it.
Take a hypothetical processor with two fast Kaby Lake (7th-generation Core) cores and eight slow Goldmont (Atom) cores. You'd have a total of 10 cores, and heavily-threaded workloads optimized for this kind of processor may see a gain in performance and efficiency over a normal quad-core Kaby Lake processor. However, the different types of cores have wildly different performance levels, and the slow cores don't even support some of the instructions the fast cores support, like AVX. (ARM avoids this issue by requiring both the big and LITTLE cores to support the same instructions.)
Again, most Windows-based multithreaded applications assume that every core has the same, or nearly the same, level of performance and can execute the same instructions, so this kind of asymmetry is likely to result in less-than-ideal performance, and perhaps even crashes if an application uses instructions not supported by the slow cores. While Intel could modify the slow cores to add advanced instruction support so that all cores can execute all instructions, this would not resolve issues with software support for heterogeneous processors.
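As an illustration of the instruction-support problem, here is a hedged sketch of runtime feature detection using the __builtin_cpu_supports builtin available in GCC and Clang on x86. Note that on the hypothetical heterogeneous chip described above, a check performed while the thread happens to be on a fast core would say nothing about the core it is scheduled onto next:

```c
#include <stdio.h>

int main(void) {
    /* Populate the CPU feature flags, then test for AVX before choosing a
     * vectorised code path. On a hypothetical big/small x86 chip, this test
     * only reflects the core the code happens to be running on right now. */
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx")) {
        printf("AVX reported: an AVX code path could be selected.\n");
    } else {
        printf("No AVX: fall back to a scalar or SSE code path.\n");
    }
    return 0;
}
```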
A different approach to application design, closer to what you're probably thinking about in your question, would use the GPU for acceleration of highly parallel portions of applications. This can be done using APIs like OpenCL and CUDA. As for a single-chip solution, AMD promotes hardware support for GPU acceleration in its APUs, which combine a traditional CPU and a high-performance integrated GPU onto the same chip, as Heterogeneous System Architecture, though this has not seen much industry uptake outside of a few specialized applications.
The main reason why a quad-core 3 GHz processor is never as fast as a single-core 12 GHz processor comes down to how the task running on that processor works, i.e. whether it is single-threaded or multi-threaded. Amdahl's Law is important when considering the types of tasks you are running.
If you have a task that is inherently linear and has to be done precisely step by step, such as a grossly simple program of the following kind
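(a minimal sketch, written in C purely for illustration; the variables and loop count are made up):

```c
#include <stdio.h>

int main(void) {
    long a = 0, b = 0;
    for (long i = 0; i < 100000000; i++) {
        a = a + 1;   /* needs the 'a' produced by the previous pass      */
        b = a + 2;   /* and needs the 'a' that was written just above it */
    }
    printf("a=%ld b=%ld\n", a, b);
    return 0;
}
```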
Then the task depends heavily on the result of the previous pass and cannot run multiple copies of itself without corrupting the value of 'a', as each copy would be reading the value of 'a' at different times and writing it back differently. This restricts the task to a single thread, so the task can only ever be running on a single core at any given time; if it were to run on multiple cores, that synchronisation corruption would occur. This limits it to 1/2 of the CPU power of a dual-core system, or 1/4 in a quad-core system.

Now take a task of this form
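(again a minimal C sketch; the four variables and the loop count are made up for illustration):

```c
#include <stdio.h>

int main(void) {
    long a = 0, b = 0, c = 0, d = 0;
    for (long i = 0; i < 100000000; i++) {
        a = a + 1;   /* none of these four updates reads a value     */
        b = b + 1;   /* written by any of the others, so each update */
        c = c + 1;   /* could run in its own thread on its own core  */
        d = d + 1;
    }
    printf("%ld %ld %ld %ld\n", a, b, c, d);
    return 0;
}
```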
All of these lines are independent and could be split into four separate programs like the first and run at the same time, each able to make effective use of the full power of one core without any synchronisation problem. This is where Amdahl's Law comes into it.
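As a hedged sketch of what that split might look like in practice, the same four updates could be handed to separate cores with OpenMP sections (compile with -fopenmp; the construct and loop counts are illustrative, not a prescription):

```c
#include <stdio.h>

int main(void) {
    long a = 0, b = 0, c = 0, d = 0;

    /* Each section below is independent, so the OpenMP runtime is free to
     * hand each one to a different core and run all four simultaneously. */
    #pragma omp parallel sections
    {
        #pragma omp section
        { for (long i = 0; i < 100000000; i++) a += 1; }
        #pragma omp section
        { for (long i = 0; i < 100000000; i++) b += 1; }
        #pragma omp section
        { for (long i = 0; i < 100000000; i++) c += 1; }
        #pragma omp section
        { for (long i = 0; i < 100000000; i++) d += 1; }
    }

    printf("%ld %ld %ld %ld\n", a, b, c, d);
    return 0;
}
```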
So if you have a single-threaded application doing brute-force calculations, the single 12 GHz processor would win hands down. If you can somehow split the task into separate parts and make it multi-threaded, then the 4 cores could come close to, but never quite reach, the same performance, as per Amdahl's Law.
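To put illustrative numbers on that, here is a small sketch of Amdahl's Law; the parallel fractions are made up, and clock speed is treated as a crude proxy for per-core performance:

```c
#include <stdio.h>

/* Amdahl's Law: speedup(N) = 1 / ((1 - p) + p / N), where p is the fraction
 * of the work that can run in parallel and N is the number of cores. */
static double amdahl_speedup(double p, int n) {
    return 1.0 / ((1.0 - p) + p / n);
}

int main(void) {
    /* A single 12 GHz core is simply 4x one 3 GHz core, whatever the workload. */
    printf("1 x 12 GHz vs 1 x 3 GHz: 4.00x\n");

    /* Four 3 GHz cores only reach that same 4x when the work is 100% parallel. */
    double fractions[] = { 0.50, 0.90, 0.95, 1.00 };
    for (int i = 0; i < 4; i++) {
        double p = fractions[i];
        printf("4 x 3 GHz, %3.0f%% parallel: %.2fx vs one 3 GHz core\n",
               100.0 * p, amdahl_speedup(p, 4));
    }
    return 0;
}
```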
The main thing that a multi-CPU system gives you is responsiveness. On a single-core machine that is working hard, the system can seem sluggish, as most of the time may be consumed by one task while the other tasks run only in short bursts in between, resulting in a system that feels slow or juddery. On a multi-core system, the heavy task gets one core and all the other tasks play on the other cores, doing their jobs quickly and efficiently.
The argument of "6 cores x 0.2 GHz = 1.2 GHz" is rubbish in every situation except where tasks are perfectly parallel and independent. There are a good number of tasks that are highly parallel, but they still require some form of synchronisation. Handbrake is a video transcoder that is very good at using all the CPUs available, but it does require a core process to keep the other threads filled with data and to collect the data they are done with.
Each core is capable of doing x calculations per second, assuming the workload is suitably parallel; on a linear program, all you effectively have is one core.
I think it is a fallacy to believe that 4 x 3 GHz = 12 GHz. Granted, the maths works, but you're comparing apples to oranges and the sums just aren't right; GHz can't simply be added together for every situation. I would change it to 4 x 3 GHz = 4 x 3 GHz.