Advanced Computer Architecture Design and Its Applications in Data Centers and Cloud Computing (高级计算机体系结构设计及其在数据中心和云计算的应用)

Software Prefetching (1/4)
• Compiler/programmer places prefetch instructions
• Put prefetched value into:
  – Register (binding, also called "hoisting")
    • May prevent instructions from committing
  – Cache (non-binding)
    • Requires ISA support
    • May get evicted from cache before the demand access
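Software Prefetching (2/4)
(Figure: the same code path through basic blocks A, B, C, shown three ways; cache misses in red.)
• Original: C contains the load R1 = [R2] followed by its consumer R3 = R1+4
• Hoisting: R1 = [R2] is moved up into A, and R3 = R1+4 stays in C
  – Hopefully the load miss is serviced by the time we get to the consumer
  – Hoisting must be aware of dependencies: B contains R1 = R1-1, which now modifies the loaded value, so R3 = R1+4 no longer computes [R2]+4
• Prefetch: A contains PREFETCH [R2]; B keeps R1 = R1-1; C keeps R1 = [R2] and R3 = R1+4
  – Using a prefetch instruction can avoid problems with data dependencies, because the prefetch writes no register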
Software Prefetching (3/4)
    for (I = 1; I < rows; I++) {
        for (J = 1; J < columns; J++) {
            prefetch(&x[I+1][J]);   /* request the same column one row ahead */
            sum = sum + x[I][J];    /* demand access to the current element */
        }
    }
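A runnable version of the loop above might look like the following sketch. It assumes GCC or Clang (for `__builtin_prefetch`), a matrix stored row-major in a flat array, and 0-based indices; the function name `sum_with_prefetch` is hypothetical. The guard on the last row keeps the computed address inside the array, and because a prefetch is only a hint, the sum is the same with or without it.

```c
#include <assert.h>
#include <stddef.h>

/* Sums a rows x columns matrix stored row-major in x, prefetching the
   element one row ahead of the current one, as on the slide. */
long sum_with_prefetch(const int *x, size_t rows, size_t columns) {
    assert(x != NULL);
    long sum = 0;
    for (size_t i = 0; i < rows; i++) {
        for (size_t j = 0; j < columns; j++) {
            /* Skip the prefetch on the last row so the address stays in bounds. */
            if (i + 1 < rows)
                __builtin_prefetch(&x[(i + 1) * columns + j],
                                   0 /* read */, 3 /* high temporal locality */);
            sum += x[i * columns + j];
        }
    }
    return sum;
}
```

Like the slide, this issues one prefetch per element, so several prefetches hit the same cache line; a tuned loop would typically prefetch once per line.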
Software Prefetching (4/4)
• Pros:
  – Gives programmer control and flexibility
  – Allows time for complex (compiler) analysis
  – No (major) hardware modifications needed
• Cons:
  – Hard to perform timely prefetches
    • At IPC=2 and 100-cycle memory latency, the load must move 200 instructions earlier
    • The current function might not even have 200 instructions
  – Prefetching earlier and more often leads to low accuracy
    • The program may go down a different path
  – Prefetch instructions increase code footprint
    • May cause more I$ (instruction cache) misses and code alignment issues
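The timeliness arithmetic above is instructions-ahead = IPC × memory latency = 2 × 100 = 200. A small sketch of that calculation follows, also converting the distance into loop iterations for a loop body of a given length. The function names, the integer IPC, and the per-iteration instruction count are simplifying assumptions for illustration only.

```c
#include <assert.h>

/* Instructions that must execute between a prefetch and its use so the
   miss is fully hidden: instructions per cycle times memory latency. */
unsigned instructions_ahead(unsigned ipc, unsigned latency_cycles) {
    return ipc * latency_cycles;
}

/* The same distance in loop iterations, rounded up so the prefetch is
   not late. insts_per_iter is the (hypothetical) loop body length. */
unsigned iterations_ahead(unsigned ipc, unsigned latency_cycles,
                          unsigned insts_per_iter) {
    assert(insts_per_iter > 0);
    unsigned insts = instructions_ahead(ipc, latency_cycles);
    return (insts + insts_per_iter - 1) / insts_per_iter;
}
```

For the slide's numbers, a hypothetical 10-instruction loop body would need prefetches about 20 iterations ahead; if the surrounding function has fewer than 200 instructions, no such placement exists.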
Hardware Prefetching (1/3)
• Hardware monitors memory accesses
  – Looks for common patterns
• Guessed addresses are placed into a prefetch queue
  – The queue is checked when no demand accesses are waiting
• Prefetches look like READ requests to the hierarchy
  – Although they may get a special "prefetched" flag in the state bits
• Prefetchers trade bandwidth for latency
  – Extra bandwidth is used only when guessing incorrectly
  – Latency is reduced only when guessing correctly
• No need to change software
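The monitor-then-queue flow above can be sketched as a toy software model. It assumes one common pattern (a constant stride, confirmed after it is seen twice in a row) and a small circular prefetch queue that is drained only when no demand access is waiting. `PQ_SIZE`, the confirmation rule, and every name here are hypothetical; this is not a description of any real prefetcher.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PQ_SIZE 8 /* hypothetical prefetch-queue capacity */

typedef struct {
    uint64_t last_addr;     /* previous demand address */
    int64_t  last_stride;   /* previous address difference */
    bool     have_last;     /* true once one access has been seen */
    uint64_t queue[PQ_SIZE];/* circular queue of guessed addresses */
    size_t   head, count;
} prefetcher;

void pf_init(prefetcher *p) { *p = (prefetcher){0}; }

/* Called on every demand access: learns the stride and, once the same
   stride repeats, enqueues a guess for the next address. */
void pf_observe(prefetcher *p, uint64_t addr) {
    if (p->have_last) {
        int64_t stride = (int64_t)(addr - p->last_addr);
        /* Drop the guess if the queue is already full. */
        if (stride != 0 && stride == p->last_stride && p->count < PQ_SIZE) {
            p->queue[(p->head + p->count) % PQ_SIZE] = addr + (uint64_t)stride;
            p->count++;
        }
        p->last_stride = stride;
    }
    p->last_addr = addr;
    p->have_last = true;
}

/* Called by the cache each cycle: issues one queued guess as a READ
   only when no demand access is waiting. Returns true if one was issued. */
bool pf_issue(prefetcher *p, bool demand_waiting, uint64_t *out) {
    assert(out != NULL);
    if (demand_waiting || p->count == 0)
        return false;
    *out = p->queue[p->head];
    p->head = (p->head + 1) % PQ_SIZE;
    p->count--;
    return true;
}
```

After demand accesses to 100, 164 and 228, the 64-byte stride repeats, so 292 is queued; it is issued only on a call where no demand access is waiting. A wrong guess would cost only the bandwidth of that extra READ.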