– 6 – Processor
Problem of SEQ and SEQ+
Too slow
◼ Too many tasks must finish in one clock cycle
◼ Signals need a long time to propagate through all of the stages
◼ The clock must run slowly enough to allow for this
Does not make good use of hardware units
◼ Every unit is active for only part of the total clock cycle
Real-World Pipelines: Car Washes
Idea
◼ Divide the process into independent stages
◼ Move objects through the stages in sequence
◼ At any given time, multiple objects are being processed
Figure panels: Sequential, Parallel, Pipelined
Computational Example
System
◼ Computation requires a total of 300 picoseconds
◼ Additional 20 picoseconds to save the result in the register
◼ Must have a clock cycle of at least 320 ps
Figure 4.32 P310: Combinational logic (300 ps) feeding a register, Reg (20 ps), driven by Clock. Delay = 320 ps, Throughput = 3.12 GOPS
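The arithmetic behind these figures can be checked with a minimal sketch. The function name throughput_gops and the variable names are illustrative assumptions, not taken from the slides; the only inputs are the 300 ps and 20 ps delays stated above. The exact result is 3.125 GOPS, which the slide shows as 3.12.

```python
def throughput_gops(cycle_ps: float) -> float:
    """Throughput in GOPS (10^9 operations per second), one operation per cycle.

    One operation every cycle_ps picoseconds is 10^12 / cycle_ps operations
    per second, which is 1000 / cycle_ps when expressed in units of 10^9.
    """
    return 1000.0 / cycle_ps


# Delays from the slide (hypothetical variable names).
logic_ps = 300  # combinational logic delay
reg_ps = 20     # time to save the result in the register

cycle_ps = logic_ps + reg_ps  # minimum clock cycle for the unpipelined system
print(f"clock cycle >= {cycle_ps} ps")
print(f"throughput  = {throughput_gops(cycle_ps):.3f} GOPS")
```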
3-Way Pipelined Version
System
◼ Divide combinational logic into 3 blocks of 100 ps each
◼ Can begin a new operation as soon as the previous one passes through stage A
⚫ Begin a new operation every 120 ps
◼ Overall latency increases
⚫ 360 ps from start to finish
Figure 4.33 A) P310: Comb. logic A (100 ps), Reg (20 ps), Comb. logic B (100 ps), Reg (20 ps), Comb. logic C (100 ps), Reg (20 ps), all registers driven by Clock. Delay = 360 ps, Throughput = 8.33 GOPS
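The trade-off between latency and throughput can be reproduced with a short sketch. It assumes the combinational logic splits evenly across the stages and that each stage ends in a register with the same 20 ps delay, as in this slide; the function name pipeline_metrics is a hypothetical helper, not something from the source.

```python
def pipeline_metrics(logic_ps: float, reg_ps: float, stages: int):
    """Return (cycle_ps, latency_ps, throughput_gops) for an evenly split pipeline.

    Assumption: the logic divides into `stages` equal blocks, each followed
    by a register that costs `reg_ps`.
    """
    cycle_ps = logic_ps / stages + reg_ps  # slowest (here: every) stage sets the clock
    latency_ps = stages * cycle_ps         # one operation passes through every stage
    gops = 1000.0 / cycle_ps               # one new operation starts per cycle
    return cycle_ps, latency_ps, gops


for n in (1, 3):
    cycle, latency, gops = pipeline_metrics(300, 20, n)
    print(f"{n} stage(s): cycle {cycle:.0f} ps, latency {latency:.0f} ps, {gops:.2f} GOPS")
```

With one stage this gives the unpipelined numbers (320 ps cycle and latency); with three stages the cycle drops to 120 ps while the latency rises to 360 ps, because every operation now pays the register delay three times.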
Pipeline Diagrams
Figure 4.33 B) P310
Unpipelined
◼ Cannot start a new operation until the previous one completes
    OP1 | OP2 | OP3      (Time →)
3-Way Pipelined
◼ Up to 3 operations in process simultaneously
    OP1   A B C
    OP2     A B C
    OP3       A B C      (Time →)
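The two diagrams can also be read as start and finish times. The sketch below is an illustration under stated assumptions: it reuses the 320 ps unpipelined cycle and the 120 ps, 3-stage pipelined cycle from the earlier slides, and the function name schedule is hypothetical.

```python
def schedule(n_ops: int, cycle_ps: int, stages: int) -> list[tuple[int, int]]:
    """Return (start_ps, finish_ps) for each operation.

    A new operation starts every clock cycle and needs `stages` cycles to
    finish. stages=1 models the unpipelined system, where the next operation
    cannot start until the previous one completes.
    """
    return [(i * cycle_ps, i * cycle_ps + stages * cycle_ps) for i in range(n_ops)]


unpipelined = schedule(3, cycle_ps=320, stages=1)
pipelined = schedule(3, cycle_ps=120, stages=3)

for name, times in (("Unpipelined", unpipelined), ("3-way pipelined", pipelined)):
    print(name)
    for i, (start, finish) in enumerate(times, start=1):
        print(f"  OP{i}: starts at {start} ps, finishes at {finish} ps")
```

Under these assumptions three operations finish at 960 ps unpipelined and at 600 ps pipelined, even though each pipelined operation individually takes longer (360 ps instead of 320 ps).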