Advanced Computer Architecture Design and Its Applications in Data Centers and Cloud Computing

Where Are the Branches?
• To predict a branch, we must first find the branch
[Figure: the PC indexes the L1-I, which returns the fetch group as raw bits: 1001010101011010101001 0101001010110101001010 0101010101101010010010 0000100100111001001010]
• Where is the branch in the fetch group?

Simplistic Fetch Engine
[Figure: Fetch PC → L1-I → four partial-decode (PD) units → Dir Pred and Target Pred, indexed by the branch's PC; the fall-through path is the branch's PC + sizeof(inst)]
• Huge latency (reduces clock frequency)
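
The serial dependence in this design can be sketched in Python. This is an illustrative model under assumptions, not the course's implementation: the 4-byte instruction size, the 4-slot fetch group, the toy `is_branch` opcode test, and the dictionary-based `icache`, `dir_pred`, and `target_pred` are all hypothetical.

```python
INST_SIZE = 4    # assumed fixed instruction size in bytes
GROUP_SIZE = 4   # assumed number of instructions per fetch group

def is_branch(inst):
    """Toy partial decode (PD): assume a top nibble of 0xB marks a branch."""
    return (inst >> 28) == 0xB

def next_fetch_pc(fetch_pc, icache, dir_pred, target_pred):
    # Step 1: read the fetch group from the L1-I (missing words read as 0).
    group = [icache.get(fetch_pc + i * INST_SIZE, 0) for i in range(GROUP_SIZE)]
    # Step 2: partially decode every slot to find the first branch.
    for slot, inst in enumerate(group):
        if is_branch(inst):
            branch_pc = fetch_pc + slot * INST_SIZE
            # Step 3: only now can the predictors be indexed by the branch's PC.
            if dir_pred.get(branch_pc, False):
                return target_pred[branch_pc]   # predicted taken
            return branch_pc + INST_SIZE        # predicted not taken
    # No branch found: assume fetch continues with the next group.
    return fetch_pc + GROUP_SIZE * INST_SIZE
```

Each step needs the result of the previous one, so the three delays add up within a single next-PC computation, which is the latency problem the slide points at.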

Branch Identification
• Predecode branches on fill from L2
• Store 1 bit per inst, set if inst is a branch
• Partial-decode logic removed
• High latency (L1-I on the critical path)
[Figure: the L1-I supplies the branch's PC to Dir Pred and Target Pred; the fall-through path is the branch's PC + sizeof(inst)]
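
A minimal Python sketch of the predecode idea, assuming a toy branch encoding and a dictionary standing in for the L1-I. The names `fill_line` and `first_branch_pc` and the 4-byte instruction size are hypothetical, chosen only for illustration.

```python
def is_branch(inst):
    """Toy decode: assume a top nibble of 0xB marks a branch."""
    return (inst >> 28) == 0xB

def fill_line(l1i, line_addr, insts_from_l2):
    """On an L1-I fill, decode once and store 1 bit per instruction."""
    branch_bits = [is_branch(inst) for inst in insts_from_l2]
    l1i[line_addr] = (list(insts_from_l2), branch_bits)

def first_branch_pc(l1i, line_addr, inst_size=4):
    """At fetch time, read the stored bits; no decode on this path."""
    _, branch_bits = l1i[line_addr]
    for slot, bit in enumerate(branch_bits):
        if bit:
            return line_addr + slot * inst_size
    return None  # no branch in this line
```

The decode cost moves to the fill path, but the branch's PC is still only known after the L1-I has been read, so the cache access stays on the critical path.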

Line Granularity
• Predict fetch group without location of branches
  – With one branch in fetch group, does it matter where it is?
[Figure: X X T X X N X X T N, comparing one predictor entry per instruction PC with one predictor entry per fetch group]
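
One way to picture a predictor with one entry per fetch group is the Python sketch below. It is a toy model under assumptions: the 16-byte group size, the `LinePredictor` class, and its dictionary table are hypothetical, and real predictors use finite, hashed tables.

```python
GROUP_BYTES = 16  # assumed fetch-group size in bytes (power of two)

def group_addr(pc):
    """Drop the offset bits so every PC in a group maps to one entry."""
    return pc & ~(GROUP_BYTES - 1)

class LinePredictor:
    def __init__(self):
        self.table = {}  # group address -> (taken, target)

    def predict(self, fetch_pc):
        base = group_addr(fetch_pc)
        taken, target = self.table.get(base, (False, None))
        # Not taken (or unknown): fall through to the next fetch group.
        return target if taken else base + GROUP_BYTES

    def update(self, branch_pc, taken, target):
        # The branch's slot within the group is deliberately not recorded.
        self.table[group_addr(branch_pc)] = (taken, target)
```

Because the lookup uses only the group address, it can start without knowing which slot holds the branch.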

Predicting by Line
[Figure: the cache line address indexes the L1-I, Dir Pred, and Target Pred; the line holds br1 (target X) and br2 (target Y); the fall-through path is the cache line address + sizeof($-line)]

  br1   br2   Correct Dir Pred   Correct Target Pred
  N     N     N                  -
  N     T     T                  Y
  T     -     T                  X

• This is still challenging: we may need to choose between multiple targets for the same cache line
• Latency determined by branch predictor
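
The outcome table on this slide can be written as a small Python function. This is a sketch assuming a line with exactly two branches, br1 before br2, with targets X and Y; the function names and the 64-byte line size are hypothetical.

```python
LINE_BYTES = 64  # assumed cache-line size in bytes

def correct_line_prediction(br1_taken, br2_taken, x, y):
    """Return the (direction, target) a per-line predictor should produce."""
    if br1_taken:
        return ("T", x)    # br2 is never reached, so its outcome is irrelevant
    if br2_taken:
        return ("T", y)
    return ("N", None)     # neither branch taken: no target needed

def next_line_pc(line_addr, br1_taken, br2_taken, x, y):
    direction, target = correct_line_prediction(br1_taken, br2_taken, x, y)
    # On "N", fetch falls through to cache line address + sizeof($-line).
    return target if direction == "T" else line_addr + LINE_BYTES
```

The same line address can legitimately map to X, Y, or the fall-through address, which is why a single target entry per line is not always enough.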