Computer Architecture

A Modern Memory Hierarchy
• Register File: 32 words, sub-nsec
• Memory Abstraction
  – L1 cache: ~32 KB, ~nsec
  – L2 cache: 512 KB ~ 1 MB, many nsec
  – L3 cache, ...
  – Main memory (DRAM): GB, ~100 nsec
  – Swap Disk: 100 GB, ~10 msec
• Data movement between levels
  – Register File ↔ Memory Abstraction: manual/compiler register spilling
  – caches ↔ Main memory: automatic HW cache management
  – Main memory ↔ Swap Disk: automatic demand paging

Hierarchical Latency Analysis
• A given memory hierarchy level i has a technology-intrinsic access time t_i. The perceived access time T_i is longer than t_i.
• Except for the outermost hierarchy level, when looking for a given address there is
  – a chance (hit-rate h_i) you "hit" and the access time is t_i
  – a chance (miss-rate m_i) you "miss" and the access time is t_i + T_{i+1}
  – h_i + m_i = 1
• Thus
    T_i = h_i·t_i + m_i·(t_i + T_{i+1})
    T_i = t_i + m_i·T_{i+1}
• Keep in mind: h_i and m_i are defined to be the hit-rate and miss-rate of just the references that missed at L_{i-1}.
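
The recursion T_i = t_i + m_i·T_{i+1} can be evaluated from the outermost level inward. The following is a minimal sketch, not part of the original slides: the function names and the three-level example values are illustrative assumptions, and it takes T_n = t_n for the outermost level (which the slide excludes from the hit/miss analysis). The second helper reflects the "keep in mind" note: because m_i is a local miss rate, the fraction of all references that reach level i+1 is the product m_1·...·m_i.

```python
def perceived_access_times(t, m):
    """Return [T_1, ..., T_n] given intrinsic times t and local miss rates m.

    t: intrinsic access time t_i for each level, innermost level first.
    m: local miss rate m_i for every level except the outermost,
       so len(m) == len(t) - 1.
    """
    if len(m) != len(t) - 1:
        raise ValueError("need one miss rate per level except the outermost")
    T = [0.0] * len(t)
    T[-1] = float(t[-1])  # assumption: the outermost level always hits
    for i in range(len(t) - 2, -1, -1):
        T[i] = t[i] + m[i] * T[i + 1]  # T_i = t_i + m_i * T_{i+1}
    return T


def global_miss_fractions(m):
    """Fraction of ALL references that miss at levels 1..i, for each i."""
    fractions = []
    running = 1.0
    for m_i in m:
        running *= m_i  # only references that missed earlier reach this level
        fractions.append(running)
    return fractions


# Hypothetical three-level hierarchy, times in cycles.
example_t = [1, 10, 100]
example_m = [0.1, 0.2]
example_T = perceived_access_times(example_t, example_m)
example_global = global_miss_fractions(example_m)
print("perceived times:", example_T)
print("global miss fractions:", example_global)
```

With these assumed values, T_2 is 10 + 0.2·100 and T_1 is 1 + 0.1·T_2, so the innermost level sees roughly four cycles on average even though main memory costs 100.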

Hierarchy Design Considerations
• Recursive latency equation
    T_i = t_i + m_i·T_{i+1}
• The goal: achieve the desired T_1 within the allowed cost
• T_i ≈ t_i is desirable
• Keep m_i low
  – increasing capacity C_i lowers m_i, but beware of increasing t_i
  – lower m_i by smarter management (replacement: anticipate what you don't need; prefetching: anticipate what you will need)
• Keep T_{i+1} low
  – faster lower hierarchies, but beware of increasing cost
  – introduce intermediate hierarchies as a compromise
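
The trade-offs listed above can be compared numerically with the recursive latency equation. The sketch below is illustrative only: every latency and miss rate in it is a made-up value chosen to show the shape of the trade-off, not a measurement, and `top_level_latency` is a hypothetical helper name.

```python
def top_level_latency(t, m):
    """Perceived access time T_1 from T_i = t_i + m_i * T_{i+1}.

    t: intrinsic access times, innermost level first (one per level).
    m: local miss rates for every level except the outermost.
    """
    if len(m) != len(t) - 1:
        raise ValueError("need one miss rate per level except the outermost")
    perceived = t[-1]  # assumption: the outermost level always hits
    for t_i, m_i in zip(reversed(t[:-1]), reversed(m)):
        perceived = t_i + m_i * perceived
    return perceived


# All numbers are hypothetical, in cycles.
# Baseline: a small, fast L1 directly in front of main memory.
two_level = top_level_latency([1, 200], [0.05])

# Larger L1: lower miss rate m_1, but a slower intrinsic access time t_1.
bigger_l1 = top_level_latency([2, 200], [0.03])

# Intermediate L2 between the original L1 and memory: lowers T_2 instead.
with_l2 = top_level_latency([1, 10, 200], [0.05, 0.2])

for name, value in [("L1 + memory", two_level),
                    ("bigger L1 + memory", bigger_l1),
                    ("L1 + L2 + memory", with_l2)]:
    print(f"{name:20s} T_1 = {value:.2f} cycles")
```

Under these assumed numbers the intermediate level helps most, because it attacks the m_1·T_2 term without slowing L1 hits. With different assumptions, for example a much larger drop in m_1 from the bigger L1, the ranking could change.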

Intel Pentium 4 Example
• 90nm P4, 3.6 GHz
• L1 D-cache
  – C_1 = 16 KB
  – t_1 = 4 cyc int / 9 cyc fp
• L2 D-cache
  – C_2 = 1024 KB
  – t_2 = 18 cyc int / 18 cyc fp
• Main memory
  – t_3 = ~50 ns or 180 cyc
• Perceived latencies in cycles, using the integer t_1 and t_2
  – if m_1 = 0.1,  m_2 = 0.1:   T_1 = 7.6,  T_2 = 36
  – if m_1 = 0.01, m_2 = 0.01:  T_1 = 4.2,  T_2 = 19.8
  – if m_1 = 0.05, m_2 = 0.01:  T_1 = 5.00, T_2 = 19.8
  – if m_1 = 0.01, m_2 = 0.50:  T_1 = 5.08, T_2 = 108
• Notice
  – best case latency is not 1
  – worst case access latencies are into 500+ cycles
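
The four scenarios in the Pentium 4 example can be reproduced by plugging the integer latencies into T_i = t_i + m_i·T_{i+1}. This is a small sketch for checking the arithmetic. The constant and function names are illustrative, and it assumes main memory always hits, so T_3 = t_3 = 180 cycles.

```python
# Intrinsic latencies from the example, in cycles (integer path).
T1_INTRINSIC = 4     # L1 D-cache
T2_INTRINSIC = 18    # L2 D-cache
T3_INTRINSIC = 180   # main memory, ~50 ns at 3.6 GHz


def p4_perceived(m1, m2):
    """Return (T_1, T_2) for local miss rates m1 (L1) and m2 (L2)."""
    big_t2 = T2_INTRINSIC + m2 * T3_INTRINSIC  # T_2 = t_2 + m_2 * T_3
    big_t1 = T1_INTRINSIC + m1 * big_t2        # T_1 = t_1 + m_1 * T_2
    return big_t1, big_t2


scenarios = [(0.1, 0.1), (0.01, 0.01), (0.05, 0.01), (0.01, 0.50)]
results = [p4_perceived(m1, m2) for m1, m2 in scenarios]

for (m1, m2), (big_t1, big_t2) in zip(scenarios, results):
    print(f"m1={m1:<5} m2={m2:<5} T1={big_t1:.3f} T2={big_t2:.1f}")
```

The computed values match the example to within rounding: the second scenario evaluates to 4.198 (listed as 4.2) and the third to 4.99 (listed as 5.00).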

Cache Basics and Operation