In computer science, a calling convention is an implementation-level (low-level) scheme for how subroutines receive parameters from their caller and how they return a result. Differences in various implementations include where parameters, return values, return addresses and scope links are placed (registers, stack or memory etc.), and how the tasks of preparing for a function call and restoring the environment afterwards are divided between the caller and the callee.
Calling conventions may be related to a particular programming language's evaluation strategy, but most often are not considered part of it (or vice versa), as the evaluation strategy is usually defined on a higher abstraction level and seen as a part of the language rather than as a low-level implementation detail of a particular language's compiler.
Calling conventions may differ in:
In some cases, differences also include the following:
Many architectures have only one widely used calling convention, often suggested by the architect. For RISCs including SPARC, MIPS, and RISC-V, register names based on this calling convention are often used. For example, MIPS registers $4 through $7 have the "ABI names" $a0 through $a3, reflecting their use for parameter passing in the standard calling convention. (RISC CPUs have many equivalent general-purpose registers, so there is typically no hardware reason for giving them names other than numbers.)
Although some programming languages may partially specify the calling sequence in the language specification, or in a pivotal implementation, different implementations of such languages (i.e. different compilers) may still use various calling conventions, and an implementation may offer a choice of more than one calling convention. Reasons for this include performance, adaptation to the conventions of other popular languages (with or without technical reasons), and restrictions or conventions imposed by various "computing platforms".
Main article: x86 calling conventions
The x86 architecture is used with many different calling conventions. Due to the small number of architectural registers, and a historical focus on simplicity and small code size, many x86 calling conventions pass arguments on the stack. The return value (or a pointer to it) is returned in a register. Some conventions use registers for the first few parameters, which may improve performance, especially for short, simple leaf routines (i.e. routines that do not call other routines) that are invoked very frequently.
    push EAX            ; pass some register result
    push dword [EBP+20] ; pass some memory variable (FASM/TASM syntax)
    push 3              ; pass some constant
    call calc           ; the returned result is now in EAX
Typical callee structure (some or all of the instructions below, except ret, may be optimized away in simple procedures). Some conventions leave the parameter space allocated, using plain ret instead of ret imm16; in that case, the caller could execute add esp,12 in this example, or otherwise deal with the change to ESP.
calc:
    push EBP            ; save old frame pointer
    mov EBP,ESP         ; get new frame pointer
    sub ESP,localsize   ; reserve stack space for locals
    .
    .                   ; perform calculations, leave result in EAX
    .
    mov ESP,EBP         ; free space for locals
    pop EBP             ; restore old frame pointer
    ret paramsize       ; free parameter space and return.
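The difference between the two clean-up styles can be illustrated with a small simulation. The following Python sketch is only a toy model of the stack pointer, not real machine behaviour; the class `Stack`, the start address `0x1000`, the dummy return address and the argument values are all assumptions made for the illustration.

```python
class Stack:
    """Toy model of a downward-growing x86 stack of 4-byte slots."""

    def __init__(self, top=0x1000):
        self.esp = top   # stack pointer (hypothetical start address)
        self.mem = {}    # address -> 32-bit value

    def push(self, value):
        self.esp -= 4
        self.mem[self.esp] = value

    def pop(self):
        value = self.mem.pop(self.esp)
        self.esp += 4
        return value


def calc(stack, callee_cleans):
    """Callee: reads three stack arguments and returns their sum (as 'EAX')."""
    ret_slot = stack.esp  # [ESP] holds the return address on entry
    args = [stack.mem[ret_slot + 4 * i] for i in (1, 2, 3)]
    eax = sum(args)
    stack.pop()           # 'ret' pops the return address
    if callee_cleans:
        # 'ret 12': also release the 12 bytes of parameter space
        for addr in range(stack.esp, stack.esp + 12, 4):
            stack.mem.pop(addr)
        stack.esp += 12
    return eax


def call_calc(callee_cleans):
    """Caller: pushes three arguments, calls calc, checks ESP afterwards."""
    stack = Stack()
    start = stack.esp
    for arg in (10, 20, 3):   # three 'push' instructions
        stack.push(arg)
    stack.push(0xDEAD)        # 'call' pushes a (dummy) return address
    eax = calc(stack, callee_cleans)
    if not callee_cleans:
        stack.esp += 12       # caller executes 'add esp,12'
    return eax, stack.esp == start
```

In both variants the stack pointer ends where it started; the only difference is which side performs the 12-byte adjustment.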
The standard 32-bit ARM calling convention allocates the 16 general-purpose registers (r0 to r15) as:
If the value returned is too large to fit in r0 to r3, or its size cannot be determined statically at compile time, then the caller must allocate space for that value at run time, and pass a pointer to that space in r0.
Subroutines must preserve the contents of r4 to r11 and the stack pointer (perhaps by saving them to the stack in the function prologue, then using them as scratch space, then restoring them from the stack in the function epilogue). In particular, subroutines that call other subroutines must save the return address in the link register r14 to the stack before calling those other subroutines. However, such subroutines do not need to return that value to r14—they merely need to load that value into r15, the program counter, to return.
The ARM calling convention mandates using a full-descending stack.
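The register-preservation rules and the full-descending stack can be sketched in Python as follows. This is a simplified model, not ARM code: the helper names `push` and `pop`, the addresses and the register values are invented for the example, and passing the two arguments in r0 and r1 with the result in r0 is an assumption based on the procedure call standard rather than on the text above.

```python
def push(regs, mem, names):
    """Full-descending push: decrement SP first, then store."""
    # Store the last-listed register first so that the first-listed
    # register ends up at the lowest address.
    for name in reversed(names):
        regs["sp"] -= 4
        mem[regs["sp"]] = regs[name]


def pop(regs, mem, names):
    """Matching pop: load from SP, then increment."""
    for name in names:
        regs[name] = mem[regs["sp"]]
        regs["sp"] += 4


def subroutine(regs, mem):
    push(regs, mem, ["r4", "lr"])         # prologue: save r4 and the return address
    regs["r4"] = 123                      # r4 is now free to use as scratch space
    regs["lr"] = 0xBAD                    # a nested call would overwrite lr
    regs["r0"] = regs["r0"] + regs["r1"]  # leave the result in r0
    pop(regs, mem, ["r4", "pc"])          # epilogue: restore r4, load saved lr into pc


def demo():
    regs = {"r0": 2, "r1": 5, "r4": 77, "sp": 0x8000, "lr": 0x1234, "pc": 0}
    subroutine(regs, {})
    return regs
```

After `demo()`, r4 and the stack pointer hold their original values, and the program counter holds the return address that was in the link register on entry, without that value ever being written back to r14.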
This calling convention causes a "typical" ARM subroutine to:
The 64-bit ARM (AArch64) calling convention allocates the 31 general-purpose registers as:
All registers starting with x have a corresponding 32-bit register prefixed with w. Thus, a 32-bit x0 is called w0.
Similarly, the 32 floating-point registers are allocated as:
The PowerPC architecture has a large number of registers, so most functions can pass all arguments in registers for single-level calls. Additional arguments are passed on the stack, and space for register-based arguments is also always allocated on the stack as a convenience to the called function in case multi-level calls are used (recursive or otherwise) and the registers must be saved. This is also of use in variadic functions, such as printf(), where the function's arguments need to be accessed as an array. A single calling convention is used for all procedural languages.
Main article: MIPS architecture § Calling conventions
The O32 ABI is the most commonly used ABI, owing to its status as the original System V ABI for MIPS. It is strictly stack-based, with only four registers, $a0-$a3, available to pass arguments. This perceived slowness, along with an antique floating-point model with only 16 registers, has encouraged the proliferation of many other calling conventions. The ABI took shape in 1990 and has not been updated since 1994. It is only defined for 32-bit MIPS, but GCC has created a 64-bit variation called O64.
For 64-bit, the N64 ABI (not related to Nintendo 64) by Silicon Graphics is most commonly used. The most important improvement is that eight registers are now available for argument passing; it also increases the number of floating-point registers to 32. There is also an ILP32 version called N32, which uses 32-bit pointers for smaller code, analogous to the x32 ABI. Both run under the 64-bit mode of the CPU.
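The practical effect of having four versus eight argument registers can be shown with a small Python sketch. It handles integer arguments only and ignores floating-point arguments, alignment and stack layout; the function name `place_integer_args` is hypothetical, and the register names $a4-$a7 for the additional N64 argument registers are an assumption not stated in the text above.

```python
ARG_REGISTERS = {
    "O32": ["$a0", "$a1", "$a2", "$a3"],
    # Assumed names for the eight N32/N64 argument registers.
    "N64": ["$a0", "$a1", "$a2", "$a3", "$a4", "$a5", "$a6", "$a7"],
}


def place_integer_args(abi, count):
    """Return where each of `count` integer arguments would be passed."""
    regs = ARG_REGISTERS[abi]
    # Arguments beyond the available registers are passed on the stack.
    return [regs[i] if i < len(regs) else "stack" for i in range(count)]
```

For a six-argument call, `place_integer_args("O32", 6)` places the last two arguments on the stack, whereas under `"N64"` all six fit in registers.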
A few attempts have been made to replace O32 with a 32-bit ABI that resembles N32 more. A 1995 conference came up with MIPS EABI, for which the 32-bit version was quite similar. EABI inspired MIPS Technologies to propose a more radical "NUBI" ABI that additionally reuses argument registers for the return value. MIPS EABI is supported by GCC but not LLVM; neither supports NUBI.
For all of O32 and N32/N64, the return address is stored in the $ra register. This is automatically set with the use of the JAL (jump and link) or JALR (jump and link register) instructions. The stack grows downwards.
The SPARC architecture, unlike most RISC architectures, is built on register windows. There are 24 accessible registers in each register window: 8 are the "in" registers (%i0-%i7), 8 are the "local" registers (%l0-%l7), and 8 are the "out" registers (%o0-%o7). The "in" registers are used to pass arguments to the function being called, and any additional arguments need to be pushed onto the stack. However, space is always allocated by the called function to handle a potential register window overflow, local variables, and (on 32-bit SPARC) returning a struct by value. To call a function, one places the arguments for the function to be called in the "out" registers; when the function is called, the "out" registers become the "in" registers and the called function accesses the arguments in its "in" registers. When the called function completes, it places the return value in the first "in" register, which becomes the first "out" register when the called function returns.
The System V ABI, which most modern Unix-like systems follow, passes the first six arguments in "in" registers %i0 through %i5, reserving %i6 for the frame pointer and %i7 for the return address.
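The overlap between a caller's "out" registers and the callee's "in" registers can be modelled in a few lines of Python. This is a toy sketch under stated assumptions: the class `WindowedRegisters`, the field `cwp` and the methods `save` and `restore` are illustrative names, the global registers are omitted, window overflow is not handled, and the direction in which the window index moves is arbitrary here.

```python
class WindowedRegisters:
    """Toy register file in which each window's "out" registers are the
    next window's "in" registers."""

    def __init__(self, windows=8):
        # Every window adds 8 "local" and 8 "out" slots; its "in" slots
        # are shared with the previous window's "out" slots.
        self.regs = [0] * (16 * windows + 8)
        self.cwp = 0  # index of the current window

    def _index(self, name):
        kind, n = name[1], int(name[2])  # e.g. "%o3" -> ("o", 3)
        return 16 * self.cwp + {"i": 0, "l": 8, "o": 16}[kind] + n

    def get(self, name):
        return self.regs[self._index(name)]

    def set(self, name, value):
        self.regs[self._index(name)] = value

    def save(self):
        self.cwp += 1  # callee gets a fresh window

    def restore(self):
        self.cwp -= 1  # back to the caller's window


def add(rf):
    """Callee: adds its first two arguments."""
    rf.save()
    rf.set("%l0", rf.get("%i0") + rf.get("%i1"))  # locals are private to this window
    rf.set("%i0", rf.get("%l0"))                  # return value in the first "in" register
    rf.restore()


def caller():
    rf = WindowedRegisters()
    rf.set("%l0", 99)  # caller's local, untouched by the callee
    rf.set("%o0", 4)   # arguments go in the "out" registers
    rf.set("%o1", 38)
    add(rf)
    return rf.get("%o0"), rf.get("%l0")  # result appears in the caller's %o0
```

No values are copied at the call or at the return; only the window index changes, which is why the callee's %i0 and the caller's %o0 refer to the same storage.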
The IBM System/360 is another architecture without a hardware stack. The examples below illustrate the calling convention used by OS/360 and successors prior to the introduction of 64-bit z/Architecture; other operating systems for System/360 might have different calling conventions.
         LA    1,ARGS        Load argument list address
         L     15,=A(SUB)    Load subroutine address
         BALR  14,15         Branch to called routine
         ...
ARGS     DC    A(FIRST)      Address of 1st argument
         DC    A(SECOND)
         ...
         DC    A(THIRD)+X'80000000' Last argument
SUB EQU * This is the entry point of the subprogram
Standard entry sequence:
         USING *,15
         STM   14,12,12(13)  Save registers
         ST    13,SAVE+4     Save caller's savearea addr
         LA    12,SAVE       Chain saveareas
         ST    12,8(13)
         LR    13,12
         ...
Standard return sequence:
         L     13,SAVE+4
         LM    14,12,12(13)
         L     15,RETVAL
         BR    14            Return to caller
SAVE     DS    18F           Savearea
The BALR instruction stores the address of the next instruction (the return address) in the register specified by its first operand (register 14) and branches to the address held in its second operand (register 15).
The STM instruction saves registers 14, 15, and 0 through 12 in a 72-byte area provided by the caller, called a save area, which is pointed to by register 13. The called routine provides its own save area for use by subroutines it calls; the address of this area is normally kept in register 13 throughout the routine. The instructions following STM update the forward and backward chains linking this save area to the caller's save area.
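The save-area chaining performed by the entry and return sequences can be traced with a short Python model. This is a sketch only: memory is a dictionary of word addresses, the addresses `0x1000`, `0x2000` and `0x0500` are hypothetical, and the helper names are invented for the illustration. Each line is annotated with the instruction it stands for.

```python
SAVED_ORDER = [14, 15] + list(range(13))  # registers 14, 15, 0..12


def entry(regs, mem, save):
    """Standard entry sequence of the called routine."""
    for i, r in enumerate(SAVED_ORDER):    # STM 14,12,12(13)
        mem[regs[13] + 12 + 4 * i] = regs[r]
    mem[save + 4] = regs[13]               # ST  13,SAVE+4  (backward chain)
    regs[12] = save                        # LA  12,SAVE
    mem[regs[13] + 8] = regs[12]           # ST  12,8(13)   (forward chain)
    regs[13] = regs[12]                    # LR  13,12


def leave(regs, mem, save, retval):
    """Standard return sequence; returns the branch target."""
    regs[13] = mem[save + 4]               # L   13,SAVE+4
    for i, r in enumerate(SAVED_ORDER):    # LM  14,12,12(13)
        regs[r] = mem[regs[13] + 12 + 4 * i]
    regs[15] = retval                      # L   15,RETVAL
    return regs[14]                        # BR  14


def demo():
    regs = [100 + r for r in range(16)]
    regs[13] = 0x1000  # caller's save area
    regs[14] = 0x0500  # return address, as left by BALR 14,15
    before = list(regs)
    mem = {}
    entry(regs, mem, save=0x2000)
    for r in range(13):
        regs[r] = -1   # the called routine freely uses registers 0..12
    target = leave(regs, mem, save=0x2000, retval=7)
    return regs, before, mem, target
```

After the return sequence, registers 0 through 12 and register 13 hold the caller's values again, register 15 carries the return value, and the two save areas point at each other through the words at offsets 4 and 8.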
In the System/390 ABI and the z/Architecture ABI, used in Linux:
Main article: SuperH
|Register||Windows CE 5.0||gcc||Renesas|
|R0||Return values. Temporary for expanding assembly pseudo-instructions. Implicit source/destination for 8/16-bit operations. Not preserved.||Return value, caller saves||Variables/temporary. Not guaranteed|
|R1..R3||Serves as temporary registers. Not preserved.||Caller saved scratch. Structure address (caller save, by default)||Variables/temporary. Not guaranteed|
|R4..R7||First four words of integer arguments. The argument build area provides space into which R4 through R7 holding arguments may spill. Not preserved.||Parameter passing, caller saves||Arguments. Not guaranteed.|
|R8..R13||Serves as permanent registers. Preserved.||Callee Saves||Variables/temporary. Guaranteed.|
|R14||Default frame pointer. (R8-R13 may also serve as frame pointer and leaf routines may use R1–R3 as frame pointer.) Preserved.||Frame Pointer, FP, callee saves||Variables/temporary. Guaranteed.|
|R15||Serves as stack pointer or as a permanent register. Preserved.||Stack Pointer, SP, callee saves||Stack pointer. Guaranteed.|
Note: "preserved" refers to callee saving; the same goes for "guaranteed".
The most common calling convention for the Motorola 68000 series is:
Main article: IBM 1130
The IBM 1130 was a small 16-bit word-addressable machine. It had only six registers plus condition indicators, and no stack. The registers are Instruction Address Register (IAR), Accumulator (ACC), Accumulator Extension (EXT), and three index registers X1–X3. The calling program is responsible for saving ACC, EXT, X1, and X2. There are two pseudo-operations for calling subroutines: CALL, for non-relocatable subroutines directly linked with the main program, and LIBF, for relocatable library subroutines called through a transfer vector. Both pseudo-ops resolve to a Branch and Store IAR (BSI) machine instruction that stores the address of the next instruction at its effective address (EA) and branches to EA+1.
Arguments follow the BSI (usually these are one-word addresses of arguments), so the called routine must know how many arguments to expect in order to skip over them on return. Alternatively, arguments can be passed in registers. Function routines returned the result in ACC for integer results, or, for real results, in a memory location referred to as the Real Number Pseudo-Accumulator (FAC). Arguments and the return address were addressed using an offset to the IAR value stored in the first location of the subroutine.
*                   1130 subroutine example
      ENT  SUB      Declare "SUB" an external entry point
SUB   DC   0        Reserved word at entry point, conventionally coded "DC *-*"
*                   Subroutine code begins here
*                   If there were arguments the addresses can be loaded indirectly from the return address
      LDX  I 1 SUB  Load X1 with the address of the first argument (for example)
 ...
*                   Return sequence
      LD      RES   Load integer result into ACC
*                   If no arguments were provided, indirect branch to the stored return address
      B    I  SUB   If no arguments were provided
      END  SUB
Subroutines in IBM 1130, CDC 6600 and PDP-8 (all three computers were introduced in 1965) store the return address in the first location of a subroutine.
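Storing the return address in the first word of the subroutine, with argument addresses placed inline after the call, can be modelled in Python as below. The sketch makes several simplifying assumptions: the BSI is treated as occupying a single word, memory is a plain list of words, the subroutine merely sums its arguments, and all addresses and names (`bsi`, `call_sub`, `demo`) are invented for the example.

```python
def bsi(mem, iar, ea):
    """Branch and Store IAR: store the address of the next word at EA
    and continue execution at EA+1."""
    mem[ea] = iar + 1  # assumes a one-word BSI
    return ea + 1


def call_sub(mem, call_site, sub_entry, nargs):
    """Call a subroutine that sums `nargs` arguments passed by address."""
    start = bsi(mem, call_site, sub_entry)
    assert start == sub_entry + 1  # code begins after the reserved word

    # Callee: the stored word points at the first inline argument address.
    first_arg_slot = mem[sub_entry]
    arg_addrs = [mem[first_arg_slot + k] for k in range(nargs)]
    acc = sum(mem[a] for a in arg_addrs)  # integer result left in ACC

    # Return: skip over the inline argument words.
    return_to = mem[sub_entry] + nargs
    return acc, return_to


def demo():
    mem = [0] * 64
    mem[11], mem[12] = 40, 41  # argument addresses following the BSI at 10
    mem[40], mem[41] = 5, 6    # the argument values themselves
    return call_sub(mem, call_site=10, sub_entry=20, nargs=2)
```

Because the return address lives in a fixed word of the subroutine rather than on a stack, a second call made before the first returns would overwrite it; this is why such a scheme does not support recursion without extra work.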
This variability must be considered when combining modules written in multiple languages, or when calling operating system or library APIs from a language other than the one in which they are written; in these cases, special care must be taken to coordinate the calling conventions used by caller and callee. Even a program using a single programming language may use multiple calling conventions, either chosen by the compiler, for code optimization, or specified by the programmer.
Main article: Threaded code
Threaded code places all the responsibility for setting up for and cleaning up after a function call on the called code. The calling code does nothing but list the subroutines to be called. This puts all the function setup and clean-up code in one place—the prologue and epilogue of the function—rather than in the many places that function is called. This makes threaded code the most compact calling convention.
Threaded code passes all arguments on the stack. All return values are returned on the stack. This makes naive implementations slower than calling conventions that keep more values in registers. However, threaded code implementations that cache several of the top stack values in registers—in particular, the return address—are usually faster than subroutine calling conventions that always push and pop the return address to the stack.
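A minimal Python sketch of this idea follows. It is not a faithful model of any particular threaded-code system: the words `lit`, `dup`, `add` and `mul`, and the helpers `run` and `define`, are hypothetical names, and Python function calls stand in for the inner interpreter. What it does show is that the calling code is nothing but a list of subroutines, and that every argument and result travels on the data stack.

```python
def lit(value):
    """Build a word that pushes a constant onto the data stack."""
    def word(stack):
        stack.append(value)
    return word


def dup(stack):
    stack.append(stack[-1])


def add(stack):
    b = stack.pop()
    a = stack.pop()
    stack.append(a + b)


def mul(stack):
    b = stack.pop()
    a = stack.pop()
    stack.append(a * b)


def run(thread, stack=None):
    """Execute a thread: a plain list of words, each taking the stack."""
    stack = [] if stack is None else stack
    for word in thread:
        word(stack)
    return stack


def define(*thread):
    """Define a new word as a thread of existing words."""
    def word(stack):
        run(thread, stack)
    return word


square = define(dup, mul)
program = [lit(3), lit(4), add, square]  # computes (3 + 4) squared
```

Running `run(program)` leaves a single value, 49, on the stack. Note that `program` contains no per-call setup or clean-up code at all; each word handles its own stack traffic.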
The default calling convention for programs written in the PL/I language passes all arguments by reference, although other conventions may optionally be specified. The arguments are handled differently for different compilers and platforms, but typically the argument addresses are passed via an argument list in memory. A final, hidden address may be passed, pointing to an area to contain the return value. Because of the wide variety of data types supported by PL/I, a data descriptor may also be passed to define, for example, the lengths of character or bit strings, the dimension and bounds of arrays (dope vectors), or the layout and contents of a data structure. Dummy arguments are created for arguments which are constants or which do not agree with the type of argument the called procedure expects.
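The effect of dummy arguments can be illustrated with a small Python analogy. It is only an analogy: the class `Ref` stands in for a storage address, `call` for the compiler-generated calling sequence, and the type-mismatch case, descriptors and the hidden return-value address are all left out.

```python
class Ref:
    """Stand-in for the address of a variable."""

    def __init__(self, value):
        self.value = value


def pass_arg(arg):
    """Pass variables by reference; wrap anything else in a dummy argument."""
    return arg if isinstance(arg, Ref) else Ref(arg)


def call(proc, *args):
    """Model of a call: every parameter the callee sees is a reference."""
    proc(*[pass_arg(a) for a in args])


def increment(x):
    x.value += 1  # the callee assigns through the reference


def demo():
    n = Ref(1)
    call(increment, n)      # variable: the caller's storage is updated
    const = 5
    call(increment, const)  # constant: only the dummy copy is updated
    return n.value, const
```

`demo()` returns `(2, 5)`: the assignment reaches the caller's variable, while the constant is protected because the callee only ever saw the address of a temporary copy.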
On the Motorola 68000 series, all registers except d0, d1, a0, a1 and a7 should be preserved across a call.
On the 6809 or Zilog Super8, direct-threaded code (DTC) is faster than subroutine-threaded code (STC).
Although direct-threaded interpreters are known to have poor branch prediction properties... the latency of a call and return may be greater than an indirect jump.