This article has multiple issues. Please help improve it or discuss these issues on the talk page. (Learn how and when to remove these template messages) This article needs additional citations for verification. Please help improve this article by adding citations to reliable sources. Unsourced material may be challenged and removed.Find sources: "Branch target predictor" – news · newspapers · books · scholar · JSTOR (October 2007) (Learn how and when to remove this template message) This article's factual accuracy may be compromised due to out-of-date information. Please help update this article to reflect recent events or newly available information. (March 2017) (Learn how and when to remove this template message)


In computer architecture, a branch target predictor is the part of a processor that predicts the target, i.e. the address of the instruction that is executed next, of a taken conditional branch or an unconditional branch instruction before the target of the branch instruction is computed by the execution unit of the processor.

Branch target prediction is not the same as branch prediction which attempts to guess whether a conditional branch will be taken or not-taken (i.e., binary).

In more parallel processor designs, as the instruction cache latency grows longer and the fetch width grows wider, branch target extraction becomes a bottleneck. The recurrence is:

In machines where this recurrence takes two cycles, the machine loses one full cycle of fetch after every predicted taken branch. As predicted branches happen every 10 instructions or so, this can force a substantial drop in fetch bandwidth. Some machines with longer instruction cache latencies would have an even larger loss. To ameliorate the loss, some machines implement branch target prediction: given the address of a branch, they predict the target of that branch. A refinement of the idea predicts the start of a sequential run of instructions given the address of the start of the previous sequential run of instructions.

This predictor reduces the recurrence above to:

As the predictor RAM can be 5–10% of the size of the instruction cache, the fetch happens much faster than the instruction cache fetch, and so this recurrence is much faster. If it were not fast enough, it could be parallelized, by predicting target addresses of target branches.

See also

Further reading