A superscalar processor can issue and execute more than one instruction per clock cycle from a single instruction stream. To do so, it includes redundant execution resources, such as multiple floating-point units, arithmetic logic units and integer shifters. This type of processor exploits instruction-level parallelism, often combined with speculative execution, entirely in hardware, so ordinary sequential programs benefit without the need for special software. It can improve the execution speed of many processor-intensive applications by finding independent instructions at run time, dispatching them to separate execution units and, in many designs, reordering them. Superscalar designs are used in most classes of general-purpose computers, including servers, desktops and laptops.
Although some aspects of the architecture have been used in processors since the 1960s, true superscalar processors did not reach the market until the late 1980s. Some Reduced Instruction Set Computing (RISC) processors sold in the late 1980s and early 1990s were superscalar. Their simple cores and fixed-length instructions made dispatching and scheduling of parallel instructions relatively easy. Many non-RISC processors manufactured since the 1990s, such as the Intel Pentium introduced in 1993, have superscalar architectures as well. Embedded, low-power and other specialty processors are often exceptions, optimizing other aspects of their designs instead of parallel execution.
Some processors let instructions from multiple execution threads share a single pipeline, a technique known as super-threading. When a functional unit would otherwise sit idle because the currently executing thread is stalled, it can execute an instruction from another thread in the meantime. This technique improves utilization of the processor but is not as efficient as simultaneous multithreading (SMT). Using SMT, a single superscalar processor can execute instructions from multiple threads in the same clock cycle. Simultaneously executing threads compete for shared processor resources such as caches and execution units, however, which may slow each individual thread down.
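
The difference between the two approaches can be illustrated with a toy model of issue slots. This is only a sketch under simplifying assumptions: the function name, the workload numbers and the 4-wide core are hypothetical, instructions that are not issued in a cycle do not carry over, and the one-thread-per-cycle policy is given the best case of always picking the busiest thread.

```python
def issue_slots_used(ready_per_cycle, width, simultaneous):
    """Count issue slots filled over a run of cycles.

    ready_per_cycle: one tuple per cycle, giving how many instructions each
                     thread has ready in that cycle (hypothetical input).
    width:           issue slots available per cycle.
    simultaneous:    True models SMT (all threads share the slots of a cycle);
                     False models issuing from only one thread per cycle.
    """
    used = 0
    for ready in ready_per_cycle:
        if simultaneous:
            candidates = sum(ready)  # any thread may fill any slot
        else:
            candidates = max(ready)  # best case: pick the busiest thread
        used += min(width, candidates)
    return used


# Two threads, four cycles, a 4-wide core (all numbers are made up).
workload = [(2, 1), (0, 3), (1, 1), (4, 2)]
one_thread_per_cycle = issue_slots_used(workload, width=4, simultaneous=False)
smt = issue_slots_used(workload, width=4, simultaneous=True)
```

With this workload, the SMT policy fills 12 of the 16 available slots, against 10 for the one-thread-per-cycle policy. A real processor would also have to model the resource contention mentioned above, which this sketch ignores.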
Processors with multi-stage pipelines may execute multiple instructions simultaneously as long as they are at different stages of execution. In contrast, a superscalar processor can have several instructions in the same stage at the same time. A superscalar processor also resembles a multi-core processor in some ways, but the two are not the same: a superscalar core extracts parallelism from a single instruction stream, whereas a multi-core processor contains several complete processors, called cores, in one device, each running its own instruction stream. Each core of a multi-core processor is usually superscalar and may include several parallel pipelines.
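
The gap between a plain pipeline and a superscalar one can be estimated with simple arithmetic. The sketch below assumes an ideal machine with no stalls, no dependencies and one cycle per stage, which real processors never achieve; the function names and parameters are illustrative only.

```python
import math


def unpipelined_cycles(n_instructions, stages):
    """Each instruction passes through every stage before the next one starts."""
    return n_instructions * stages


def pipelined_cycles(n_instructions, stages, width=1):
    """Ideal cycle count for a pipeline that starts `width` instructions per cycle.

    The first group needs `stages` cycles to drain; every later group
    finishes one cycle after the group before it.
    """
    if n_instructions == 0:
        return 0
    groups = math.ceil(n_instructions / width)
    return stages + groups - 1


# 100 instructions on a hypothetical 5-stage design.
no_pipeline = unpipelined_cycles(100, stages=5)        # 500 cycles
scalar = pipelined_cycles(100, stages=5, width=1)      # 104 cycles
two_wide = pipelined_cycles(100, stages=5, width=2)    # 54 cycles
```

Setting width=1 gives the ordinary pipelined case; a width above 1 is what makes the design superscalar. In practice, dependencies between instructions keep the achieved figure well below this ideal.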
Many superscalar processors can also execute program code out of order for efficiency reasons. To do this, the processor must track each instruction's dependencies on the others. If an instruction needs the result of another, overwrites a value that another still needs, or requires the same execution resource, the two generally cannot be executed in parallel. Techniques such as register renaming eliminate some types of dependencies, but an instruction that truly depends on an earlier result, or that needs a busy execution unit, must wait until the result or resource is available.
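
The dependency check described above can be sketched for register operands. The three cases are commonly called read-after-write (RAW), write-after-read (WAR) and write-after-write (WAW). The example also shows register renaming as one technique that removes the WAR and WAW cases; the Instr type, the register names and the "p" prefix for renamed registers are assumptions made for illustration, and memory and execution-unit conflicts are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    dest: str     # register written by the instruction
    srcs: tuple   # registers read by the instruction


def hazards(first, second):
    """Return the register dependencies of `second` on the earlier `first`."""
    found = set()
    if first.dest in second.srcs:
        found.add("RAW")  # second reads what first writes (true dependency)
    if second.dest in first.srcs:
        found.add("WAR")  # second overwrites a value first still reads
    if first.dest == second.dest:
        found.add("WAW")  # both write the same register
    return found


def rename(program):
    """Give every write a fresh register; reads follow the latest mapping."""
    mapping = {}
    renamed = []
    for index, ins in enumerate(program):
        srcs = tuple(mapping.get(s, s) for s in ins.srcs)
        fresh = f"p{index}"
        mapping[ins.dest] = fresh
        renamed.append(Instr(fresh, srcs))
    return renamed


add = Instr("r1", ("r2", "r3"))   # r1 = r2 + r3
use = Instr("r4", ("r1", "r5"))   # reads r1
clobber = Instr("r2", ("r6", "r7"))  # overwrites r2, which add reads
```

Here hazards(add, use) reports a RAW dependency that survives renaming, while the WAR dependency between add and clobber disappears once rename gives each write its own register.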
Out-of-order execution must also guarantee the proper contents of registers, flags and other resources if an interrupt or exception occurs. In this case, the system state must look exactly as if the code had been executed sequentially, in program order. Another consideration is how many instructions the processor should examine at once to find opportunities for parallel execution. The greater the number, the more parallelism the processor can find, although examining more instructions requires more complex hardware. Looking past branches in the examined code also involves speculative execution: the processor executes instructions beyond a branch before the outcome is known and discards their results if it guessed wrong.
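
The effect of examining more instructions can be shown with a small scheduling sketch. It assumes every instruction takes one cycle, that each dependency points to an earlier instruction, and that results become usable in the cycle after issue; the function name, the window and width parameters and the sample program are all hypothetical. Branches, speculation and in-order retirement are left out.

```python
def schedule(deps, window, width):
    """Return the issue cycle of each instruction.

    deps[i] is the set of earlier instruction indices that instruction i
    depends on. Each cycle the scheduler looks at the `window` oldest
    instructions that have not yet issued and issues up to `width` of them.
    """
    n = len(deps)
    issue_cycle = [None] * n
    cycle = 0
    while any(c is None for c in issue_cycle):
        pending = [i for i in range(n) if issue_cycle[i] is None][:window]
        ready = [
            i for i in pending
            if all(issue_cycle[d] is not None and issue_cycle[d] < cycle
                   for d in deps[i])
        ]
        for i in ready[:width]:
            issue_cycle[i] = cycle
        cycle += 1
    return issue_cycle


def cycles_needed(deps, window, width):
    """Total cycles until the last instruction has issued."""
    return max(schedule(deps, window, width)) + 1


# Two independent chains of four instructions, placed one after the other.
program = [set(), {0}, {1}, {2}, set(), {4}, {5}, {6}]
```

For this program on a 2-wide machine, a window of 1 needs 8 cycles, a window of 4 needs 5, and a window of 8 needs 4, because only the larger windows can see the second chain while the first is still in progress.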