A stack register is a computer central processor register whose purpose is to keep track of a call stack. On an accumulator-based architecture machine, this may be a dedicated register. On a machine with multiple general-purpose registers, it may be a register that is reserved by convention, such as on the IBM System/360 through z/Architecture architecture and RISC architectures, or it may be a register that procedure call and return instructions are hardwired to use, such as on the PDP-11, VAX, and Intel x86 architectures. Some designs such as the Data General Eclipse had no dedicated register, but used a reserved hardware memory address for this function.

Machines before the late 1960s—such as the PDP-8 and HP 2100—did not have compilers which supported recursion. Their subroutine instructions typically would save the current location in the jump address, and then set the program counter to the next address.^[1] While this is simpler than maintaining a stack, since there is only one return location per subroutine code section, there cannot be recursion without considerable effort on the part of the programmer.

A stack machine has 2 or more stack registers — one of them keeps track of a call stack, the other(s) keep track of other stack(s).

Stack registers in x86

In 8086, the main stack register is called "stack pointer" (SP). The stack segment register (SS) is usually used to store information about the memory segment that stores the call stack of currently executed program. SP points to current stack top. By default, the stack grows downward in memory, so newer values are placed at lower memory addresses. To save a value to the stack, the PUSH instruction is used. To retrieve a value from the stack, the POP instruction is used.

Example: Assuming that SS = 1000h and SP = 0xF820. This means that current stack top is the physical address 0x1F820 (this is due to memory segmentation in 8086). The next two machine instructions of the program are:

PUSH AX
PUSH BX

These first instruction shall push the value stored in AX (16-bit register) to the stack. This is done by subtracting a value of 2 (2 bytes) from SP.
The new value of SP becomes 0xF81E. The CPU then copies the value of AX to the memory word whose physical address is 0x1F81E.
When "PUSH BX" is executed, SP is set to 0xF81C and BX is copied to 0x1F81C.^[2]

This illustrates how PUSH works. Usually, the running program pushes registers to the stack to make use of the registers for other purposes, like to call a routine that may change the current values of registers. To restore the values stored at the stack, the program shall contain machine instructions like this:

POP BX
POP AX

POP BX copies the word at 0x1F81C (which is the old value of BX) to BX, then increases SP by 2. SP now is 0xF81E.
POP AX copies the word at 0x1F81E to AX, then sets SP to 0xF820.^{[nb 1]}^{[nb 2]}

Stack engine

Simpler processors store the stack pointer in a regular hardware register and use the arithmetic logic unit (ALU) to manipulate its value. Typically push and pop are translated into multiple micro-ops, to separately add/subtract the stack pointer, and perform the load/store in memory.^[3]

Newer processors contain a dedicated stack engine to optimize stack operations. Pentium M was the first x86 processor to introduce a stack engine. In its implementation, the stack pointer is split among two registers: ESP_O, which is a 32-bit register, and ESP_d, an 8-bit delta value that is updated directly by stack operations. PUSH, POP, CALL and RET opcodes operate directly with the ESP_d register. If ESP_d is near overflow or the ESP register is referenced from other instructions (when ESP_d ≠ 0), a synchronisation micro-op is inserted that updates the ESP_O using the ALU and resets ESP_d to 0. This design has remained largely unmodified in later Intel processors, although ESP_O has been expanded to 64 bits.^[4]

A stack engine similar to Intel's was also adopted in the AMD K8 microarchitecture. In Bulldozer, the need for synchronization micro-ops was removed, but the internal design of the stack engine is not known.^[4]

Notes

References

^ Salomon, David (February 1993) [1992]. Written at California State University, Northridge, California, USA. Chivers, Ian D. (ed.). Assemblers and Loaders (PDF). Ellis Horwood Series In Computers And Their Applications (1 ed.). Chicester, West Sussex, UK: Ellis Horwood Limited / Simon & Schuster International Group. ISBN 0-13-052564-2. Archived (PDF) from the original on 2020-03-23. Retrieved 2008-10-01. Most computers save the return address in either the stack, in one of the registers, or in the first word of the procedure (in which case the first executable instruction of the procedure should be stored in the second word). If the latter method is used, a return from the procedure is a jump to the memory location whose address is contained in the first word of the procedure. (xiv+294+4 pages)
^ Howard, Brian. "Assembly Tutorial - Instructions". Computer Science Department, DePauw University. Retrieved 2013-07-19.
^ Stokes, Jon "Hannibal" (2004-02-25). "A Look at Centrino's Core: The Pentium M". archive.arstechnica.com. p. 5.
^ ^a ^b Fog, Agner. "The microarchitecture of Intel, AMD and VIA CPUs" (PDF). Technical University of Denmark.

Processor technologies

Models

Architecture

Instruction set
architectures

Types	Orthogonal instruction set CISC RISC Application-specific EDGE TRIPS VLIW EPIC MISC OISC NISC ZISC VISC architecture Quantum computing Comparison Addressing modes
Instruction sets	Motorola 68000 series VAX PDP-11 x86 ARM Stanford MIPS MIPS MIPS-X Power POWER PowerPC Power ISA Clipper architecture SPARC SuperH DEC Alpha ETRAX CRIS M32R Unicore Itanium OpenRISC RISC-V MicroBlaze LMC System/3x0 S/360 S/370 S/390 z/Architecture Tilera ISA VISC architecture Epiphany architecture Others

Execution

Instruction pipelining	Pipeline stall Operand forwarding Classic RISC pipeline
Hazards	Data dependency Structural Control False sharing
Out-of-order	Scoreboarding Tomasulo's algorithm Reservation station Re-order buffer Register renaming Wide-issue
Speculative	Branch prediction Memory dependence prediction

Parallelism

Level	Bit Bit-serial Word Instruction Pipelining Scalar Superscalar Task Thread Process Data Vector Memory Distributed
Multithreading	Temporal Simultaneous Hyperthreading Simultaneous and heterogenous Speculative Preemptive Cooperative
Flynn's taxonomy	SISD SIMD Array processing (SIMT) Pipelined processing Associative processing SWAR MISD MIMD SPMD

Processor
performance

Transistor count
Instructions per cycle (IPC)
- Cycles per instruction (CPI)
Instructions per second (IPS)
Floating-point operations per second (FLOPS)
Transactions per second (TPS)
Synaptic updates per second (SUPS)
Performance per watt (PPW)
Cache performance metrics
Computer performance by orders of magnitude

Types

By application	Embedded system Microprocessor Microcontroller Mobile Ultra-low-voltage ASIP Soft microprocessor
Systems on chip	System on a chip (SoC) Multiprocessor (MPSoC) Cypress PSoC Network on a chip (NoC)
Hardware accelerators	Coprocessor AI accelerator Graphics processing unit (GPU) Image processor Vision processing unit (VPU) Physics processing unit (PPU) Digital signal processor (DSP) Tensor Processing Unit (TPU) Secure cryptoprocessor Network processor Baseband processor

Word size

Core count

Components

Functional units	Arithmetic logic unit (ALU) Address generation unit (AGU) Floating-point unit (FPU) Memory management unit (MMU) Load–store unit Translation lookaside buffer (TLB) Branch predictor Branch target predictor Integrated memory controller (IMC) Memory management unit Instruction decoder
Logic	Combinational Sequential Glue Logic gate Quantum Array
Registers	Processor register Status register Stack register Register file Memory buffer Memory address register Program counter
Control unit	Hardwired control unit Instruction unit Data buffer Write buffer Microcode ROM Counter
Datapath	Multiplexer Demultiplexer Adder Multiplier CPU Binary decoder Address decoder Sum-addressed decoder Barrel shifter
Circuitry	Integrated circuit 3D Mixed-signal Power management Boolean Digital Analog Quantum Switch

Power
management