Advanced Computer Architecture Design and Its Applications in Data Centers and Cloud Computing
Lecture 8: Instruction Fetch and Branch Prediction
Fetch Rate is an ILP Upper Bound
• Instruction fetch limits performance
  – To sustain an IPC of N, the machine must sustain a fetch rate of N instructions per cycle
    • Analogy: if you consume 1500 calories per day but burn 2000 calories per day, you will eventually starve.
  – Need to fetch N on average, not on every cycle
• An N-wide superscalar ideally fetches N insns. per cycle
• This doesn't happen in practice due to:
  – Instruction cache organization
  – Branches
  – ... and the interaction between the two
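The "N on average, not on every cycle" point can be illustrated with a toy queue model. This is only a sketch under stated assumptions: the function name `sustained_ipc`, the unbounded fetch queue, and the idea that fetch may deliver more than `width` instructions in one cycle are all hypothetical simplifications, not something the slides specify.

```python
def sustained_ipc(fetched_per_cycle, width):
    """Toy model (assumption): fetched instructions enter an unbounded queue,
    and the back end drains at most `width` instructions per cycle."""
    queue = 0
    retired = 0
    for fetched in fetched_per_cycle:
        queue += fetched              # instructions delivered by fetch this cycle
        done = min(queue, width)      # back end cannot exceed its width
        queue -= done
        retired += done
    return retired / len(fetched_per_cycle)


steady = sustained_ipc([4] * 8, width=4)      # fetch 4 every cycle
bursty = sustained_ipc([8, 0] * 4, width=4)   # fetch 4 on average, in bursts
starved = sustained_ipc([3, 1] * 4, width=4)  # fetch 2 on average
print(steady, bursty, starved)
```

In this model the bursty stream sustains the same IPC as the steady one because the queue absorbs the gaps, while the starved stream can never exceed its average fetch rate of 2.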
Instruction Cache Organization
• To fetch N instructions per cycle...
  – L1-I line must be wide enough for N instructions
• PC register selects L1-I line
• A fetch group is the set of insns. starting at PC
  – For an N-wide machine, [PC, PC+N-1]
[Figure: the PC feeds a decoder that selects one L1-I cache line; each line holds a Tag followed by four Inst slots.]
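A minimal sketch of how a PC maps to a fetch group and to a cache line. It assumes the PC counts instructions rather than bytes, that each line holds four instructions (matching the four Inst slots drawn per line), and it ignores tags and the number of sets. The names `fetch_group`, `line_index`, and `line_offset` are illustrative, not from the lecture.

```python
LINE_INSNS = 4  # instructions per L1-I line (assumed from the figure)


def fetch_group(pc, n):
    """Instruction addresses in the ideal fetch group [pc, pc+n-1]."""
    return list(range(pc, pc + n))


def line_index(pc, line_insns=LINE_INSNS):
    """Which cache line the instruction at `pc` falls in."""
    return pc // line_insns


def line_offset(pc, line_insns=LINE_INSNS):
    """Slot within the line for the instruction at `pc`."""
    return pc % line_insns


group = fetch_group(0b01000, 4)
lines = {line_index(addr) for addr in group}
print([format(addr, "05b") for addr in group], lines)
```

With a PC whose offset is zero, the whole group sits in a single line, so one line access can deliver all N instructions.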
Fetch Misalignment (1/2)
• If PC = xxx01001, N=4:
  – Ideal fetch group is xxx01001 through xxx01100 (inclusive)
[Figure: PC: xxx01001 drives the decoder, which selects among lines 000, 001, 010, 011, ..., 111; each line is a Tag plus four Inst slots at offsets 00, 01, 10, 11. Labels: Line width, Fetch group.]
• Misalignment reduces fetch width
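The width loss for this example can be worked out with a short sketch. It assumes the low two PC bits are the offset within a four-instruction line (as the 00/01/10/11 labels in the figure suggest) and that one access returns instructions from a single line only. The function name `insns_from_one_access` is hypothetical.

```python
def insns_from_one_access(pc, n, line_insns=4):
    """Instructions a single line access can deliver for a group starting at pc."""
    offset = pc % line_insns              # slot of the first instruction in its line
    return min(n, line_insns - offset)    # cannot read past the end of the line


misaligned = insns_from_one_access(0b01001, 4)  # offset 01 in line 010
aligned = insns_from_one_access(0b01000, 4)     # offset 00 in line 010
worst = insns_from_one_access(0b01011, 4)       # offset 11 in line 010
print(misaligned, aligned, worst)
```

For PC = xxx01001 the access to line 010 yields only three instructions; the fourth (xxx01100) sits at offset 00 of line 011.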
Fetch Misalignment (2/2)
• Now takes two cycles to fetch N instructions
  – ½ fetch bandwidth!
[Figure: Cycle 1 accesses the L1-I with PC: xxx01001 and Cycle 2 with PC: xxx01100; the two accesses together deliver the fetch group.]
• The loss might not be a full ½ if the second access is combined with the next fetch
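A rough sketch of the bandwidth cost for straight-line code, under assumptions not stated in the slides: one line access per cycle, four instructions per line, no branches, and a hypothetical `combine` mode in which each access simply takes everything from the PC to the end of its line so that leftover slots serve the next group. The function name `fetch_cycles` is illustrative.

```python
def fetch_cycles(start_pc, total_insns, n=4, line_insns=4, combine=False):
    """Cycles to fetch `total_insns` sequential instructions starting at start_pc."""
    pc, remaining, cycles = start_pc, total_insns, 0
    while remaining > 0:
        if combine:
            # One access per cycle; take up to n instructions to the end of the line.
            got = min(n, line_insns - pc % line_insns, remaining)
            cycles += 1
        else:
            # Finish the whole group before starting the next; one cycle per line touched.
            got = min(n, remaining)
            first_line = pc // line_insns
            last_line = (pc + got - 1) // line_insns
            cycles += last_line - first_line + 1
        pc += got
        remaining -= got
    return cycles


no_combine = fetch_cycles(0b01001, 16)               # every group straddles two lines
with_combine = fetch_cycles(0b01001, 16, combine=True)
aligned = fetch_cycles(0b01000, 16)
print(16 / no_combine, 16 / with_combine, 16 / aligned)
```

In this model the misaligned stream without combining averages 2 instructions per cycle (half of N=4), while combining recovers most of the loss because only the first access is short.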