What’s the difference between a superscalar and a vector processor?

cpu, cpu-architecture

They can both process multiple instructions at the same time, but I suppose there is a fundamental difference which explains why there are two names and why we haven't just switched to always using superscalar ones?

Also, if I understood correctly, both scalar and vector instructions are present in a modern CPU, so I suppose those two are not mutually exclusive (scalar instructions such as mov or add will be executed superscalar-ly and e.g. dot product will be calculated vector-ly in some special black magic-kind of way)?

Best Answer

A superscalar processor is capable of executing multiple instructions within a single program in parallel. It does this by analyzing the instruction stream to determine which instructions do not depend on each other, and having multiple execution units within the processor to do the work simultaneously (e.g. multiple ALUs). Compiler support is generally not required to optimize code for superscalar processors as the functionality is typically implemented entirely in hardware.[1]
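To make "instructions that do not depend on each other" concrete, here is a minimal sketch in C. The function names `sum_dependent` and `sum_independent` are made up for illustration. The first loop feeds every addition into the next one, while the second keeps two accumulators whose additions are independent, which is the kind of work a superscalar core can hand to two ALUs at once. Whether that actually happens depends on the processor and on what the compiler emits, so treat this as an illustration of the dependency structure rather than a benchmark.

```c
#include <stddef.h>

/* One accumulator: each addition needs the result of the previous one,
   so the additions form a single dependency chain. */
long sum_dependent(const int *a, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++) {
        s += a[i];
    }
    return s;
}

/* Two accumulators: the additions into s0 and s1 do not depend on each
   other, so hardware with several ALUs is free to overlap them. */
long sum_independent(const int *a, size_t n) {
    long s0 = 0, s1 = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += a[i];
        s1 += a[i + 1];
    }
    if (i < n) {
        s0 += a[i]; /* leftover element when n is odd */
    }
    return s0 + s1;
}
```

Both functions return the same sum for the same input; only the shape of the dependency graph differs. Note that even in the first version the loads and the loop counter updates are independent of the running sum, so a superscalar core still finds some parallelism there.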

A vector processor contains instructions specifically designed to operate on whole groups of data values at once (called arrays or vectors). Most modern high-performance processors contain some form of vector processing capability; for example, the SSE ADDPS instruction available in most x86 processors computes the element-wise sum of two vectors, each containing four single-precision values. Compiler, developer, and operating system support are typically required to use vector instructions, and not every processor, even in current generations, supports the most advanced vector instructions (e.g. Intel Celeron and Pentium processors, even as of Kaby Lake, do not support AVX).
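As a rough sketch of what ADDPS looks like from C, the snippet below adds two groups of four floats, once with a scalar loop and once with the `_mm_add_ps` intrinsic from `<xmmintrin.h>`, which compilers normally translate to ADDPS. The function names `add4_scalar` and `add4_vector` are hypothetical, and the `__SSE__` check assumes a GCC- or Clang-style compiler; on other targets the vector version simply falls back to the scalar loop.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h> /* SSE intrinsics: __m128, _mm_add_ps, ... */
#endif

/* Scalar version: four separate single-precision additions. */
void add4_scalar(const float *a, const float *b, float *out) {
    for (size_t i = 0; i < 4; i++) {
        out[i] = a[i] + b[i];
    }
}

/* Vector version: one instruction adds all four lanes at once. */
void add4_vector(const float *a, const float *b, float *out) {
#if defined(__SSE__)
    __m128 va = _mm_loadu_ps(a);            /* load 4 floats (unaligned) */
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb)); /* element-wise add, store 4 floats */
#else
    add4_scalar(a, b, out);                 /* fallback when SSE is unavailable */
#endif
}
```

This is also why the two ideas are not mutually exclusive: a superscalar core can issue a vector instruction like this alongside other independent instructions in the same cycle.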

More technical information about how today's processors achieve high performance is available in this answer.


[1] An alternative, and rather unusual, design approach is to have multiple execution units but let the compiler determine which instructions to issue to each execution unit in each clock cycle. This is called a very long instruction word (VLIW) architecture and is typically found only on specialized processors.