Next: Appendices Up: Out of Order Processor Previous: Cache Hierarchy Contents Subsections
Branch Prediction Introduction PTLsim provides a variety of branch predictors in
branchpred.cpp. The branch prediction subsystem is relatively independent of the core simulator and can be treated as a black box, so long as it implements the interfaces in
branchpred.h.
The branch prediction subsystem always contains at least three distinct predictors for the three main classes of branches:
- Conditional Branch Predictor returns a boolean (taken or not taken) for each conditional branch (br.cc uop)
- Branch Target Buffer (BTB) predicts indirect branch (jmp uop) targets
- Return Address Stack (RAS) predicts return instructions (i.e. specially marked indirect jmp uops) based on prior calls
- Unconditional branches (bru) are never predicted since their destination is explicitly encoded.
All these predictors are accessed by the core through the
BranchPredictorInterface object. Based on the opcode and other uop information, the core determines the type flags of each branch uop:
- BRANCH_HINT_UNCOND for unconditional branches. These are never predicted since the destination is implied.
- BRANCH_HINT_COND for conditional branches.
- BRANCH_HINT_INDIRECT for indirect branches, including returns.
- BRANCH_HINT_CALL for calls (both direct and indirect). This implies that the return address of the call should be a should be pushed on the RAS.
- BRANCH_HINT_RET for returns (indirect branches). This implies that the return address should be taken from the top RAS stack entry, not the BTB.
Multiple flags may be present for each uop (for instance,
BRANCH_HINT_RET and
BRANCH_HINT_INDIRECT are both used for the
jmp uop terminating an x86
ret instruction).
To make a prediction at fetch time, the core calls the
BranchPredictorInterface::predict() method, passing it a
PredictorUpdate structure. This structure is carried along with each uop until it retires, and contains all the information needed to eventually update the branch predictor at the end of the pipeline. The contents will vary depending on the predictor chosen, but in general this structure contains pointers into internal predictor counter tables and various flags. The
predict()method fills in this structure.
As each uop commits, the
BranchPredictorInterface::update() method is passed the uop's saved
PredictorUpdate structure and the branch outcome (expected target RIP versus real target RIP) so the branch predictor can be updated. In PTLsim, predictor updates only occur at retirement to avoid corruption caused by speculative instructions.
Conditional Branch Predictor The PTLsim conditional branch predictor is the most flexible predictor, since it can be easily replaced. The default predictor implemented in
branchpred.cpp is a selection based predictor. In essence, two separate predictors are maintained. The
history predictor hashes a shift register of previously predicted branches into a table slot; this slot returns whether or not the branch with that history is predicted as taken. PTLsim supports various combinations of the history and branch address to provide
gshare based semantics. The
bimodal predictor is simpler; it uses 2-bit saturating counters to predict if a given branch is likely to be taken. Finally, a
selection predictor specifies which of the two predictors is more accurate and should be used for future predictions. This style of predictor, sometimes called a
McFarling predictor, has been described extensively in the literature and variations are used by most modern processors.
Through the
CombinedPredictor template class, the user can specify the sizes of all the tables (history, bimodal, selector), the history depth, the method in which the global history and branch address are combined and so on. Alternatively, the conditional branch predictor can be replaced with something entirely different if desired.
Branch Target Buffer The Branch Target Buffer (BTB) is essentially a small cache that maps indirect branch RIP addresses (i.e.,
jmp uops) into predicted target RIP addresses. It is set associative, with a user configurable number of sets and ways. In PTLsim, the BTB does not take into account any indirect branch history information. The BTB is a nearly universal structure in branch prediction; see the literature for more information.
Return Address Stack The Return Address Stack (RAS) predicts the target address of indirect jumps marked with the
BRANCH_HINT_RET flag. Whenever the
BRANCH_HINT_RET flag is passed to the predict() method, the top RAS stack entry is returned as the predicted target, overriding anything in the BTB.
Unlike the conditional branch predictor and BTB, the RAS updated speculatively in the frontend pipeline, before the outcome of calls and returns are known. This allows better performance when closely spaced calls and returns must be predicted as they are fetched, before either the call or corresponding return have actually executed. However, when called with the
BRANCH_HINT_RET flag, the
predict() method only returns the RIP at the top of the RAS, but does not push or pop the RAS. This must be done after the corresponding
bru or
jmp (for direct and or indirect calls, respectively) or
jmp (for returns) uop is actually allocated in the ROB.
This approach is required since the RAS is speculatively updated: if uops must be annulled (because of branch mispredictions or mis-speculations), the annulment occurs by walking backwards in the ROB until the excepting uop is encountered. However, if the RAS were updated during the fetch stage, some uops may not be in the ROB yet and hence the rollback logic cannot undo speculative changes made to the RAS by these uops. This causes the RAS to get out of alignment and performance suffers.
To solve this problem, the RAS is only updated in the allocate stage immediately after fetch. In the out of order core's
rename() function, the
BranchPredictorInterface::updateras() method is called to either push or pop an entry from the RAS (calls push entries, returns pop entries). Unlike the conditional branch predictor and BTB, this is the only place the RAS is updated, rather than performing updates at commit time.
If uops must be annulled, the
ReorderBufferEntry::annul() method calls the
BranchPredictorInterface::annulras() method with the
PredictorUpdate structure for each uop it encounters in reverse program order. This method effectively undoes whatever change was made to the RAS when the
updateras() method was called with the same
PredictorUpdate information during renaming and allocation. This is possible because
updateras() saves checkpoint information (namely, the old RAS top of stack and the value at that stack slot) before updating the RAS; this allows the RAS state to be rolled backwards in time as uops are annulled in reverse program order. At the end of the annulment process when fetching is restarted at the correct RIP, the RAS state should be identical to the state that existed before the last uop to be annulled was originally fetched.
Next: Appendices Up: Out of Order Processor Previous: Cache Hierarchy Contents Matt T Yourst 2007-09-26