Advanced Computer Architecture Design and Its Applications in Data Centers and Cloud Computing
Indirect Branch Frequency in Java Processing
• Java: uniformly high (due to OO features + JVM)
• C: highest in gcc, perl and li (compilation and interpretation workloads)

% of indirect branches, Java (SPECjvm98):
Benchmark     Interpretation   JIT
db            3.0              2.5
jess          3.3              2.4
javac         2.6              1.9
jack          2.5              2.1
mtrt          2.7              2.0
compress      4.3              1.6

% of indirect branches, C (SPECInt95):
Benchmark     %
go            0.7
compress      0.4
m88ksim       0.8
gcc           1.1
ijpeg         0.2
li            2.0
perl          2.2
vortex        0.9
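The contrast between "uniformly high" for Java and "gcc, perl and li" for C can be checked directly against the numbers in the table. The sketch below transcribes those values and flags the benchmarks that reach a 1.0% cut-off. The threshold (`THRESHOLD_PCT`) and the helper name `heavy` are illustrative assumptions, not part of the study.

```python
from statistics import mean

# Indirect-branch percentages transcribed from the table above.
JAVA_INTERP = {"db": 3.0, "jess": 3.3, "javac": 2.6, "jack": 2.5, "mtrt": 2.7, "compress": 4.3}
JAVA_JIT = {"db": 2.5, "jess": 2.4, "javac": 1.9, "jack": 2.1, "mtrt": 2.0, "compress": 1.6}
C_INT95 = {"go": 0.7, "compress": 0.4, "m88ksim": 0.8, "gcc": 1.1,
           "ijpeg": 0.2, "li": 2.0, "perl": 2.2, "vortex": 0.9}

THRESHOLD_PCT = 1.0  # hypothetical cut-off for "indirect-branch heavy" (an assumption)


def heavy(suite, threshold=THRESHOLD_PCT):
    """Return the sorted names of benchmarks whose indirect-branch share is >= threshold."""
    return sorted(name for name, pct in suite.items() if pct >= threshold)


for label, suite in (("Java interp", JAVA_INTERP), ("Java JIT", JAVA_JIT), ("C", C_INT95)):
    print(f"{label:<12} mean {mean(suite.values()):.2f}%  heavy: {heavy(suite)}")
```

At this cut-off every Java benchmark qualifies under both interpretation and JIT. Among the C programs, only gcc, li and perl qualify, which matches the slide's observation.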
Problem Statement
• The impact of OS activity on the branch behavior of Java is not well understood.
• Indirect branch behavior in Java is not well understood.
• Do the characteristics of OS branches or indirect branches motivate architectural enhancements in processors? If so, what modifications?
Characterization of OS branch behavior
– Kernel invocations are short-lived, and the kernel executes fewer branches per context.
[Figure: Executed Branches in User and Kernel Contexts (5,000 Sampling Contexts on SPECjvm98 Benchmark jack). Panels: jack (user) and jack (kernel); x-axis: user / kernel context serial number (0 to 5,000); y-axis: number of executed branches (log scale, 1 to 1,000).]
[Figure: Average Number of Executed Branches per Context in User and Kernel Modes. Series: kernel, user; y-axis: 0 to 80 branches.]
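The next sketch shows how the per-context branch counts and per-mode averages discussed in these slides could be derived. It assumes a hypothetical trace format in which each executed branch is tagged `"U"` (user) or `"K"` (kernel), for example as emitted by a full-system simulator. It treats a context as a maximal run of consecutive branches in the same mode. The trace format and the functions `contexts` and `avg_branches_per_context` are illustrative assumptions, not the study's tooling.

```python
from itertools import groupby
from statistics import mean


def contexts(trace):
    """Split a per-branch mode trace into (mode, branch_count) contexts.

    A context is a maximal run of consecutive branches executed in the same
    mode, so every user/kernel transition starts a new context.
    """
    return [(mode, sum(1 for _ in run)) for mode, run in groupby(trace)]


def avg_branches_per_context(trace):
    """Average number of branches per context for each mode in the trace."""
    per_mode = {}
    for mode, count in contexts(trace):
        per_mode.setdefault(mode, []).append(count)
    return {mode: mean(counts) for mode, counts in per_mode.items()}


# Toy trace (made up): long user contexts interleaved with short kernel ones.
toy = ["U"] * 120 + ["K"] * 8 + ["U"] * 300 + ["K"] * 12 + ["U"] * 60
print(contexts(toy))
print(avg_branches_per_context(toy))
```

On real traces, the short kernel runs would show up as small per-context counts, which is the pattern the slides report for jack.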
The Predictability of OS branches
Each scheme indexes a Branch History Table (BHT) of 2^(j+k) entries using j bits of the branch address (PC) and k bits from a Branch History Shift Register (BHSR). Further PC bits select which BHSR is used. Scheme size is 2^i K with i = 1..9.

Scheme          PC bits for        PC bits for       BHSR bits for     Total size
                BHSR selection     BHT index (j)     BHT index (k)     (# of BHT entries)
2bc.2^i K       0                  i+10              0                 2^i K
GAg.2^i K       0                  0                 i+10              2^i K
GAs.2^i K       0                  i+6               4                 2^i K
Gshare.2^i K    0                  0                 i+10              2^i K
SAs.2^i K       i+8                i+5               4                 2^i K

[Figure: Misprediction rate (%) of kernel branches for 2bc, GAs, SAs, GAg and Gshare at 2K, 4K, 8K, 16K and 32K entries (y-axis 0 to 14%). Caption: Branch Predictor Configurations and Predictability of Kernel Branches [ISPASS'01].]
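The configurations in the table differ only in how the BHT index is formed. The sketch below models them as one parameterized two-level predictor and builds each scheme from the bit widths listed in the table. Several details are assumptions rather than details taken from the study:

* 2-bit counters start weakly taken.
* The predictor uses the low-order PC bits without dropping alignment bits.
* Histories are updated with the actual outcome, with no speculative update.
* Gshare uses the usual XOR of PC bits into the global history.

The names `TwoLevelPredictor`, `make` and `misprediction_rate` are hypothetical.

```python
class TwoLevelPredictor:
    """Two-level predictor: a BHT of 2**(j + k) two-bit saturating counters.

    sel_bits  -- low-order PC bits selecting one of 2**sel_bits BHSRs
    pc_bits   -- j: low-order PC bits forming the upper part of the BHT index
    hist_bits -- k: bits of the selected BHSR forming the lower part of the index
    xor_pc    -- gshare-style hashing: XOR k PC bits into the history bits
    """

    def __init__(self, sel_bits, pc_bits, hist_bits, xor_pc=False):
        self.sel_mask = (1 << sel_bits) - 1
        self.pc_mask = (1 << pc_bits) - 1
        self.hist_bits = hist_bits
        self.hist_mask = (1 << hist_bits) - 1
        self.xor_pc = xor_pc
        self.bhsr = [0] * (1 << sel_bits)              # branch history shift registers
        self.bht = [2] * (1 << (pc_bits + hist_bits))  # counters start weakly taken (assumed)

    def _index(self, pc):
        reg = pc & self.sel_mask
        hist = self.bhsr[reg]
        if self.xor_pc:
            hist ^= pc & self.hist_mask
        return reg, ((pc & self.pc_mask) << self.hist_bits) | hist

    def predict_and_update(self, pc, taken):
        """Predict the branch at pc, train on the real outcome, return True if correct."""
        reg, idx = self._index(pc)
        counter = self.bht[idx]
        correct = (counter >= 2) == taken
        self.bht[idx] = min(counter + 1, 3) if taken else max(counter - 1, 0)
        self.bhsr[reg] = ((self.bhsr[reg] << 1) | int(taken)) & self.hist_mask
        return correct


def make(scheme, i):
    """Build a scheme from the (BHSR selection, j, k) widths listed in the table."""
    widths = {
        "2bc": (0, i + 10, 0),
        "GAg": (0, 0, i + 10),
        "GAs": (0, i + 6, 4),
        "Gshare": (0, 0, i + 10),
        "SAs": (i + 8, i + 5, 4),
    }[scheme]
    return TwoLevelPredictor(*widths, xor_pc=(scheme == "Gshare"))


def misprediction_rate(predictor, trace):
    """trace: sequence of (pc, taken) pairs; returns mispredictions in percent."""
    outcomes = [predictor.predict_and_update(pc, taken) for pc, taken in trace]
    return 100.0 * outcomes.count(False) / len(outcomes)


# Toy trace (made up): one branch that alternates taken / not taken.
toy = [(0x40, n % 2 == 0) for n in range(1000)]
for name in ("2bc", "GAg", "GAs", "Gshare", "SAs"):
    print(name, misprediction_rate(make(name, 1), toy))
```

On this alternating branch the history-less 2bc mispredicts every not-taken outcome. The history-based schemes learn the pattern after a short warm-up. Reproducing the kernel-branch results in the figure would require a kernel branch trace, which is not included here.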