Latency of CPU instructions on x86 and x64 processors

Tags: 64-bit, assembly, cpu, latency, x86

I'm looking for a table or similar reference that would help me estimate the efficiency of assembly code.

As far as I know, a bit shift takes 1 CPU clock cycle, but I'd like to know how long addition takes (subtraction should take the same), how long multiplication takes, and how to estimate division time if I know the values being divided.

I mainly need information about integer operations, but floating-point execution times are welcome too.

Best Answer

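In general, simple integer operations such as addition, subtraction, and shifts also have a latency of a single clock cycle when their arguments are already in registers. Multiplication typically takes a few cycles, and division takes considerably more; the exact count depends on the microarchitecture and, for division, on the operand size and values.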

What do you mean by latency? How many cycles an operation spends in the ALU?

You might find this table useful: http://www.agner.org/optimize/instruction_tables.pdf

Since modern processors are superscalar and can execute out of order, you can often sustain more than one instruction per cycle overall. Where an instruction's operands come from (registers, cache, or memory) matters most, but the operation itself also matters: a divide takes far longer than an XOR, which has a latency of one cycle or less.
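To make the latency versus throughput distinction concrete, here is a minimal sketch in C. It is not taken from the instruction tables; the names `chain_dependent`, `chain_independent`, and `time_chain`, and the multiplier constant, are made up for illustration. The first function forms one long dependency chain, so each multiply has to wait for the previous result. The second does the same number of multiplies spread over four independent chains, which an out-of-order core can overlap. Whether the timing difference shows up depends on the compiler, the optimization flags, and the CPU.

```c
#include <stdint.h>
#include <time.h>

/* Arbitrary odd 64-bit multiplier, chosen only for illustration. */
#define K 0x9E3779B97F4A7C15ULL

/* One dependency chain: every multiply needs the previous result,
   so the loop is expected to be bound by multiply (plus add) latency. */
uint64_t chain_dependent(uint64_t seed, uint64_t n) {
    uint64_t x = seed;
    for (uint64_t i = 0; i < n; i++)
        x = x * K + 1;
    return x;
}

/* Four independent chains: about the same total number of multiplies
   (n is effectively rounded up to a multiple of 4), but the chains do
   not depend on each other, so the CPU may execute them in parallel. */
uint64_t chain_independent(uint64_t seed, uint64_t n) {
    uint64_t a = seed, b = seed + 1, c = seed + 2, d = seed + 3;
    for (uint64_t i = 0; i < n; i += 4) {
        a = a * K + 1;
        b = b * K + 1;
        c = c * K + 1;
        d = d * K + 1;
    }
    return a ^ b ^ c ^ d;
}

typedef uint64_t (*chain_fn)(uint64_t seed, uint64_t n);

/* Runs fn once and returns the elapsed CPU time in seconds.
   The result is stored through *result so the work is not discarded. */
double time_chain(chain_fn fn, uint64_t seed, uint64_t n, uint64_t *result) {
    clock_t start = clock();
    *result = fn(seed, n);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling `time_chain` on both functions with the same large `n` (say, a few hundred million) and comparing the two times gives a rough idea of how much instruction-level parallelism the processor extracts. For actual per-instruction cycle counts, the tables linked above are the better source.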

Many x86 instructions can take multiple cycles to complete some pipeline stages if they are complex (REP-prefixed instructions or, worse, MWAIT, for example).
