Advanced Computer Architecture Design and Its Applications in Data Centers and Cloud Computing

Cache Terminology
• block (cache line): minimum unit that may be cached
• frame: cache storage location that holds one block
• hit: the block is found in the cache
• miss: the block is not found in the cache
• miss ratio: fraction of references that miss
• hit time: time to access the cache
• miss penalty: time to replace a block on a miss
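Because a block is the minimum unit that is cached, every byte address belongs to exactly one block. Below is a minimal sketch of how an address splits into a block number and an offset within that block. It assumes byte addressing and a power-of-two line size. The function name `split_address` is hypothetical. Two addresses with the same block number are served by the same cached block.

```python
def split_address(addr: int, line_size: int) -> tuple[int, int]:
    """Split a byte address into (block number, byte offset within the block).

    Assumes line_size is a power of two, as is typical for cache lines.
    """
    if line_size <= 0 or line_size & (line_size - 1):
        raise ValueError("line_size must be a positive power of two")
    offset_bits = line_size.bit_length() - 1   # log2(line_size)
    block = addr >> offset_bits                # drop the offset bits
    offset = addr & (line_size - 1)            # keep only the offset bits
    return block, offset


# With 8-byte lines, 0x10000 and 0x10004 share a block; 0x10008 starts a new one.
for a in (0x10000, 0x10004, 0x10008):
    block, offset = split_address(a, 8)
    print(hex(a), "-> block", hex(block), "offset", offset)
```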
Cache Example
• Address sequence from core (assume 8-byte lines):
  0x10000  Miss  (line 0x10000 fetched from memory)
  0x10004  Hit   (same 8-byte line as 0x10000)
  0x10120  Miss  (line 0x10120 fetched from memory)
  0x10008  Miss  (starts a new 8-byte line; line 0x10008 fetched from memory)
  0x10124  Hit   (same 8-byte line as 0x10120)
  0x10004  Hit
• Final miss ratio is 50% (3 misses out of 6 references)
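The example can be replayed with a small simulator. This is a minimal sketch that makes the same implicit assumption as the example: the cache is large enough that no block is ever evicted, so the only misses are first touches of a line. The name `simulate` is hypothetical.

```python
def simulate(addresses, line_size=8):
    """Replay byte addresses through a cache that never evicts.

    Returns the per-access outcome ("Hit"/"Miss") and the miss ratio.
    Assumes unlimited capacity, so a block stays cached once fetched.
    """
    cached_blocks = set()   # block numbers currently held in cache frames
    outcomes = []
    for addr in addresses:
        block = addr // line_size          # which cache line the address falls in
        if block in cached_blocks:
            outcomes.append("Hit")
        else:
            outcomes.append("Miss")
            cached_blocks.add(block)       # fetch the whole line from memory
    miss_ratio = outcomes.count("Miss") / len(outcomes)
    return outcomes, miss_ratio


trace = [0x10000, 0x10004, 0x10120, 0x10008, 0x10124, 0x10004]
outcomes, ratio = simulate(trace)
for addr, result in zip(trace, outcomes):
    print(hex(addr), result)
print(f"miss ratio: {ratio:.0%}")   # miss ratio: 50%
```

With a finite cache, later accesses could also miss because their block was evicted. Modeling that would require a capacity and a replacement policy, which the example does not specify.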
AMAT (1/2)
• Average memory access time (AMAT) is a very powerful tool for estimating performance
• If...
  cache hit is 10 cycles (core to L1 and back)
  memory access is 100 cycles (core to mem and back)
• Then...
  at 50% miss ratio, avg. access: 0.5×10 + 0.5×100 = 55 cycles
  at 10% miss ratio, avg. access: 0.9×10 + 0.1×100 = 19 cycles
  at 1% miss ratio, avg. access: 0.99×10 + 0.01×100 ≈ 11 cycles
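The arithmetic above can be reproduced with a short helper. It follows the slide's model: a hit costs the cache round trip, and a miss costs the full memory round trip. This is equivalent to the common form hit time + miss ratio × miss penalty, with a miss penalty of 100 − 10 = 90 cycles. The function name `amat_one_level` is hypothetical.

```python
def amat_one_level(miss_ratio: float, hit_cycles: float, mem_cycles: float) -> float:
    """Average access time when a hit costs hit_cycles and a miss costs mem_cycles."""
    return (1 - miss_ratio) * hit_cycles + miss_ratio * mem_cycles


for m in (0.5, 0.1, 0.01):
    print(f"miss ratio {m:.0%}: {amat_one_level(m, 10, 100):.1f} cycles")
```

The 1% case evaluates to 10.9 cycles, which the slide rounds to about 11.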
AMAT (2/2)
• Generalizes nicely to a hierarchy of any depth
• If...
  L1 cache hit is 5 cycles (core to L1 and back)
  L2 cache hit is 20 cycles (core to L2 and back)
  memory access is 100 cycles (core to mem and back)
• Then...
  at 20% miss ratio in L1 and 40% miss ratio in L2 (measured over the accesses that reach L2, i.e., L1 misses)
  avg. access: 0.8×5 + 0.2×(0.6×20 + 0.4×100) = 14.4 ≈ 14 cycles
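In the formula above, the parenthesized L1-miss term is the average access time of everything below L1. The calculation therefore generalizes by folding from memory upward, one level at a time. Below is a minimal sketch that assumes each level is described by its round-trip hit latency and its local miss ratio (the fraction of accesses reaching that level that miss). The name `amat_hierarchy` is hypothetical.

```python
def amat_hierarchy(levels, mem_cycles):
    """AMAT for a multi-level hierarchy.

    levels: list of (hit_cycles, local_miss_ratio), ordered from L1 downward.
    mem_cycles: round-trip latency to main memory.
    """
    t = mem_cycles                       # cost of an access that reaches memory
    for hit_cycles, miss_ratio in reversed(levels):
        # Access time as seen from this level: hit here, or pay for the levels below.
        t = (1 - miss_ratio) * hit_cycles + miss_ratio * t
    return t


# L1: 5 cycles, 20% miss; L2: 20 cycles, 40% miss; memory: 100 cycles
print(f"{amat_hierarchy([(5, 0.2), (20, 0.4)], 100):.1f} cycles")
```

Adding an L3 only requires appending another (hit_cycles, miss_ratio) pair to the list.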
Memory Organization (1/3)
[Figure: processor memory hierarchy, from Registers, through the L1 I-Cache (with I-TLB) and L1 D-Cache (with D-TLB), the L2 Cache, and the L3 Cache (LLC), down to Main Memory (DRAM).]