How to find a list of all SSE instructions? What happens if a CPU doesn’t support SSE?

computer-architecture, cpu, cpu-architecture

So I've been reading about how processors work. Now I'm on the instruction set extensions (SSE, SSE2, etc.), which are pretty interesting.

I have a lot of questions (I've been reading this stuff on Wikipedia):

  1. I've seen the names of some instructions that were added in SSE, but there's no explanation of any of them (Maybe SSE4? They're not even listed on Wikipedia). Where can I read about what they do?

  2. How do I know which of these instructions are being used?

  3. If we do know which are being used, let's say I'm doing a comparison (this may be the most stupid question I've ever asked, as I don't know assembly): is it possible to use the instruction directly in assembly code? (I've been looking at this: http://asm.inightmare.org/opcodelst/index.php?op=CMP)

  4. How does the processor interpret the instructions?

  5. What would happen if I had a processor without any of the SSE instructions? (I suppose in the case we want to do a comparison, we wouldn't be able to, right?)

Best Answer

I've seen the names of some instructions that were added in SSE, but there's no explanation of any of them (Maybe SSE4? They're not even listed on Wikipedia). Where can I read about what they do?

The best source would be straight from the people who designed the extensions: Intel. The definitive references are the Intel® 64 and IA-32 Architectures Software Developer's Manuals; I would recommend that you download the combined Volumes 1 through 3C (first download link on that page). You may want to look at Vol. 1, Ch. 12 - Programming with SSE3, SSSE3, SSE4 and AESNI. To look up specific instructions, see Vol. 2, Ch. 3-4. (Appendix B is also helpful.)


How do I know which of these instructions are being used?

The instructions are only used if a program you're running actually uses them (i.e. the machine code corresponding to the various SSE4 instructions is actually executed). To find out which instructions a program uses, you need to use a disassembler.


If we do know which are being used, let's say I'm doing a comparison (this may be the most stupid question I've ever asked, as I don't know assembly): is it possible to use the instruction directly in assembly code? (I've been looking at this: http://asm.inightmare.org/opcodelst/index.php?op=CMP)

How does the processor interpret the instructions?

You may want to have a look at my answer to the question, "How does a CPU 'know' what commands and instructions actually mean?". When you write out assembly code by hand, to make an executable, you pass the "human readable" assembly code to an assembler, which turns the instructions into the actual 0's and 1's the processor executes.
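To make the "0's and 1's" part concrete, here is a rough sketch in C of what an assembler does for one instruction. It encodes the register-to-register form of paddsb (for example, paddsb mm0, mm1) using the MMX encoding 0F EC followed by a ModR/M byte, as listed in Intel's instruction reference. The function name encode_paddsb_mm is made up for this illustration and is not part of any real assembler; a real one also handles memory operands, prefixes, and many other cases.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (illustration only): encode "paddsb mmDST, mmSRC"
 * into machine code. The MMX form of PADDSB is the two opcode bytes
 * 0F EC followed by one ModR/M byte that names the registers.
 */
size_t encode_paddsb_mm(unsigned dst, unsigned src, uint8_t out[3])
{
    /* ModR/M layout: mod (2 bits) | reg (3 bits) | r/m (3 bits).
       mod = 11b selects the register-to-register form; the destination
       goes in the reg field and the source in the r/m field. */
    uint8_t modrm = (uint8_t)(0xC0u | ((dst & 7u) << 3) | (src & 7u));

    out[0] = 0x0F;   /* escape byte for two-byte opcodes */
    out[1] = 0xEC;   /* PADDSB */
    out[2] = modrm;  /* which registers to operate on */
    return 3;        /* number of bytes written */
}
```

Calling encode_paddsb_mm(0, 1, buf) fills buf with the three bytes 0F EC C1, which is what the processor actually fetches and decodes for paddsb mm0, mm1.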


What would happen if I had a processor without any of the SSE instructions? (I suppose in the case we want to do a comparison, we wouldn't be able to, right?)

Since your computer is Turing complete, it can compute any mathematical function with a software algorithm even when it lacks dedicated hardware for it. Obviously, doing intense parallel or matrix mathematics in hardware is much faster than in software (which requires many loops of instructions), so this would cause a slowdown for the end user. Depending on how the program was created, it's possible that it requires a particular instruction (e.g. one from the SSE4 set); if such a program executes that instruction on a processor that does not implement it, the processor raises an invalid-opcode exception and the operating system will typically terminate the program. Since the same thing can be done in software (making the program usable on more processors), this practice is rare.
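Programs that want the faster instructions without giving up older processors usually check for the feature at run time and then pick a code path. A minimal sketch, assuming GCC or Clang on x86: __builtin_cpu_init and __builtin_cpu_supports are compiler-specific builtins, and on any other compiler or architecture this sketch simply reports "not supported". The names HAVE_CPU_QUERY, cpu_has_sse42 and choose_code_path are made up for the example.

```c
/* The feature-query builtins only exist on GCC/Clang when targeting x86. */
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#define HAVE_CPU_QUERY 1
#else
#define HAVE_CPU_QUERY 0
#endif

/* Returns 1 if the running CPU reports SSE4.2 support, 0 otherwise
   (including when we have no way to ask). */
int cpu_has_sse42(void)
{
#if HAVE_CPU_QUERY
    __builtin_cpu_init();  /* populate the compiler's CPU feature data */
    return __builtin_cpu_supports("sse4.2") ? 1 : 0;
#else
    return 0;
#endif
}

/* Hypothetical dispatcher: names the implementation a program would use. */
const char *choose_code_path(void)
{
    return cpu_has_sse42() ? "sse4.2" : "generic";
}
```

A real program would store a function pointer to the chosen implementation instead of a string, but the decision is the same: ask the processor first, and only then execute the optional instructions.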


As an example of the above, you may recall when processors first came out with the MMX instruction set extension. Let's say we want to add two 8-element, signed 8-bit vectors together (so each vector is 64 bits, equal to a single MMX register), or in other words, A + B = C. This could be done with a single MMX instruction called paddsb (packed add of signed bytes with saturation, meaning each sum is clamped to the range -128 to 127 instead of wrapping around). For brevity, let's say our vectors are held at memory locations A, B, and C as well. Our equivalent assembly code would be:

movq   MM0, [A]
paddsb MM0, [B]
movq   [C], MM0

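However, this operation could also easily be done in software. For example, the following C code performs the equivalent operation (a signed char is 8 bits wide, and each sum is clamped to the signed 8-bit range to match the saturating behaviour of paddsb):

#define LEN 8
signed char A[LEN], B[LEN], C[LEN];
int i, sum;

/* Code to initialize vectors A and B... */

for (i = 0; i < LEN; i++)
{
    sum = A[i] + B[i];            /* operands are promoted to int, so no overflow */
    if (sum > 127)  sum = 127;    /* saturate at the largest signed 8-bit value */
    if (sum < -128) sum = -128;   /* saturate at the smallest signed 8-bit value */
    C[i] = (signed char)sum;
}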

You can probably guess how the assembly code of the above loop would look, but it's clear that it would contain significantly more instructions (as we now need a loop to handle adding the vectors), and thus the processor would need to fetch and execute that many more instructions. This is similar to how the word length of a processor affects a computer's performance (the purpose of MMX/SSEx is to provide both larger registers and the ability to perform the same instruction on multiple pieces of data).
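You don't have to write assembly to get at these instructions from C. Compilers ship intrinsics that map almost one-to-one onto them. Below is a minimal sketch using the SSE2 intrinsic _mm_adds_epi8 from <emmintrin.h>, which is the 128-bit counterpart of the MMX paddsb above (sixteen bytes per register instead of eight). The function name add_sat_i8x16 is made up for the example, and the use of the __SSE2__ macro to select the path assumes GCC or Clang; when SSE2 is not enabled at compile time, the same function falls back to the plain loop.

```c
#include <stdint.h>
#ifdef __SSE2__
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Add two vectors of sixteen signed 8-bit values with saturation. */
void add_sat_i8x16(const int8_t *a, const int8_t *b, int8_t *c)
{
#ifdef __SSE2__
    /* One 128-bit register holds all sixteen bytes, so the whole
       addition is a single packed saturating add. */
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)c, _mm_adds_epi8(va, vb));
#else
    /* Software fallback: one element at a time, clamped to [-128, 127]. */
    for (int i = 0; i < 16; i++) {
        int sum = a[i] + b[i];
        if (sum > INT8_MAX) sum = INT8_MAX;
        if (sum < INT8_MIN) sum = INT8_MIN;
        c[i] = (int8_t)sum;
    }
#endif
}
```

Both paths produce the same results; for instance, 100 + 100 saturates to 127 and -100 + -100 saturates to -128 either way. The only difference is how many instructions the processor has to execute to get there.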
