– 11 – Processor
Operating a Pipeline
[Figure 4.35, P312: three-stage pipeline (Comb. logic A, B, and C at 100 ps each, each followed by a 20 ps register, all driven by Clock), drawn at times 239, 241, 300, and 359 ps. Timing diagram: OP1, OP2, OP3 each pass through stages A, B, C; time axis marked 0, 120, 240, 360, 480, 640.]
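The timing diagram can be reproduced with a short sketch. It assumes an ideal pipeline in which every stage occupies one 120 ps clock cycle (100 ps of logic plus 20 ps to load the register) and a new operation enters stage A every cycle. The function name `pipeline_schedule` and its parameters are illustrative, not from the slides.

```python
def pipeline_schedule(num_ops, stages=("A", "B", "C"), cycle_ps=120):
    """Return {op_name: [(stage, start_ps, end_ps), ...]} for an ideal pipeline.

    Assumption: operation i enters the first stage at cycle i and advances
    exactly one stage per clock cycle, with no stalls.
    """
    schedule = {}
    for i in range(num_ops):
        schedule[f"OP{i + 1}"] = [
            (stage, (i + s) * cycle_ps, (i + s + 1) * cycle_ps)
            for s, stage in enumerate(stages)
        ]
    return schedule


if __name__ == "__main__":
    for op, slots in pipeline_schedule(3).items():
        print(op, slots)
```

Under these assumptions OP2 occupies stage A while OP1 is in stage B (120 to 240 ps), which is the overlap the figure illustrates.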
– 12 – Processor
Limitations: Nonuniform Delays
◼ Throughput limited by slowest stage
◼ Other stages sit idle for much of the time
◼ Challenging to partition system into balanced stages
[Figure 4.36, P313: three-stage pipeline with Comb. logic A (50 ps), B (150 ps), and C (100 ps), each followed by a 20 ps register, all driven by Clock. Timing diagram: OP1, OP2, OP3 each pass through stages A, B, C.]
Delay = 510 ps, Throughput = 5.88 GOPS
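The delay and throughput figures on this slide follow from the slowest stage setting the clock period. The sketch below works through that arithmetic, assuming a single shared clock and a 20 ps register load after every stage. The function name `pipeline_metrics` is illustrative.

```python
def pipeline_metrics(stage_delays_ps, reg_delay_ps=20):
    """Clock period, latency, throughput, and idle time for a pipeline.

    Assumption: one shared clock, so the period is set by the slowest
    stage plus the register load time.
    """
    clock_ps = max(stage_delays_ps) + reg_delay_ps
    latency_ps = clock_ps * len(stage_delays_ps)
    # One operation completes per cycle; 1000 / ps gives giga-ops per second.
    throughput_gops = 1000.0 / clock_ps
    # Time per cycle that each stage's logic waits for the slowest stage.
    idle_ps = [max(stage_delays_ps) - d for d in stage_delays_ps]
    return clock_ps, latency_ps, throughput_gops, idle_ps


if __name__ == "__main__":
    clock, latency, gops, idle = pipeline_metrics([50, 150, 100])
    print(f"clock = {clock} ps, delay = {latency} ps, "
          f"throughput = {gops:.2f} GOPS, idle per stage = {idle} ps")
```

With stage delays of 50, 150, and 100 ps this gives a 170 ps clock, a 510 ps delay, and about 5.88 GOPS; stages A and C sit idle for 100 ps and 50 ps of every cycle.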
– 13 – Processor
Limitations: Register Overhead
◼ As we try to deepen the pipeline, the overhead of loading registers becomes more significant
◼ Percentage of clock cycle spent loading register:
  ⚫ 1-stage pipeline: 6.25%
  ⚫ 3-stage pipeline: 16.67%
  ⚫ 6-stage pipeline: 28.57%
◼ High speeds of modern processor designs are obtained through very deep pipelining
[Figure 4.37, P315: six-stage pipeline; each stage is 50 ps of Comb. logic followed by a 20 ps register, all driven by Clock.]
Delay = 420 ps, Throughput = 14.29 GOPS
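The percentages above can be checked with a small calculation. This sketch assumes 300 ps of total combinational logic (consistent with three 100 ps stages or six 50 ps stages) split evenly across the stages, with a 20 ps register load per stage. The function name `register_overhead` is illustrative.

```python
def register_overhead(num_stages, total_logic_ps=300, reg_delay_ps=20):
    """Return (clock_ps, delay_ps, throughput_gops, overhead_percent).

    Assumption: the logic divides evenly, so every stage has
    total_logic_ps / num_stages of combinational delay.
    """
    clock_ps = total_logic_ps / num_stages + reg_delay_ps
    delay_ps = clock_ps * num_stages
    throughput_gops = 1000.0 / clock_ps
    overhead_percent = 100.0 * reg_delay_ps / clock_ps
    return clock_ps, delay_ps, throughput_gops, overhead_percent


if __name__ == "__main__":
    for n in (1, 3, 6):
        clock, delay, gops, pct = register_overhead(n)
        print(f"{n}-stage: clock = {clock:.0f} ps, delay = {delay:.0f} ps, "
              f"throughput = {gops:.2f} GOPS, register overhead = {pct:.2f}%")
```

The register load time stays fixed at 20 ps while the logic per stage shrinks, so each additional split buys less throughput and adds to total delay.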
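– 14 – Processor
Data Dependencies
◼ Each operation depends on result from preceding one
[Figure 4.38 A), B), P316: unpipelined system labeled "System", consisting of Combinational logic and a register driven by Clock. Timing diagram: OP1, OP2, OP3 execute one after another.]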
– 15 – Processor
Data Hazards
◼ Result does not feed back around in time for next operation
◼ Pipelining has changed behavior of system
[Figure 4.38 C), P316: three-stage pipeline (Comb. logic A, B, and C, each followed by a register, all driven by Clock). Timing diagram: OP1, OP2, OP3, OP4 each pass through stages A, B, C.]
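A toy simulation can show how pipelining changes the behavior of a system with feedback. It is only a sketch under stated assumptions: each operation computes `f` of the most recent result visible in a feedback register, one operation starts per cycle, and a result becomes visible `depth` cycles after its operation starts. The names `run_feedback`, `depth`, and the choice `f(x) = x + 1` are hypothetical, chosen for illustration.

```python
def run_feedback(num_ops, depth, f=lambda x: x + 1, initial=0):
    """Simulate operations that each read a fed-back result.

    depth=1 models the unpipelined system: each result is visible before
    the next operation starts. depth=3 models a three-stage pipeline with
    no hazard handling: a result is visible only three cycles later.
    """
    feedback = initial      # value currently visible to a starting operation
    pending = {}            # cycle at which a result becomes visible -> value
    results = []
    for cycle in range(num_ops):
        if cycle in pending:
            feedback = pending.pop(cycle)
        result = f(feedback)
        pending[cycle + depth] = result
        results.append(result)
    return results


if __name__ == "__main__":
    print("unpipelined:", run_feedback(4, depth=1))
    print("3-stage pipeline:", run_feedback(4, depth=3))
```

In this model the unpipelined run yields 1, 2, 3, 4, while the three-stage run yields 1, 1, 1, 2: OP2 and OP3 start before OP1's result has fed back, and OP4 is the first operation to see it.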