Computer Architecture

Remember the Single-Cycle Uarch

[Figure: single-cycle MIPS datapath with control: PC, instruction memory, register file, sign extend, ALU and ALU control, data memory, and the next-PC muxes selected by PCSrc1 = Jump and PCSrc2 = Br Taken. Every instruction completes in one clock of period T.]

BW ≈ 1/T (one instruction completes per clock period T)
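A minimal sketch of why a single-cycle design gets BW ≈ 1/T with T fixed by its slowest instruction. The stage latencies are the ones annotated on the "Dividing Into Stages" slide; which stages each instruction class uses (the `USES` table) is an illustrative MIPS-style assumption, and all names here are hypothetical.

```python
# Stage latencies in ps, as annotated on the "Dividing Into Stages" slide.
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}

# Assumption (illustrative, MIPS-style): datapath stages each class needs.
USES = {
    "lw":  ["IF", "ID", "EX", "MEM", "WB"],
    "sw":  ["IF", "ID", "EX", "MEM"],
    "add": ["IF", "ID", "EX", "WB"],
    "beq": ["IF", "ID", "EX"],
}


def path_ps(cls: str) -> int:
    """Combinational delay of the datapath portion this class uses."""
    return sum(STAGE_PS[s] for s in USES[cls])


def single_cycle_period_ps() -> int:
    """One clock must be long enough for the slowest instruction class."""
    return max(path_ps(c) for c in USES)


def throughput_per_s(period_ps: float) -> float:
    """BW ~= 1/T: one instruction completes per clock period."""
    return 1.0 / (period_ps * 1e-12)


if __name__ == "__main__":
    T = single_cycle_period_ps()
    print(T, f"{throughput_per_s(T):.3g} instr/s")
    for c in USES:
        print(c, "needs", path_ps(c), "ps but gets", T, "ps")
```

Under these assumptions every class is clocked at the lw path, so shorter instructions (add, sw, beq) waste part of each cycle.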
Dividing Into Stages

[Figure: the single-cycle datapath divided into five stages, annotated with stage latencies:
IF: Instruction fetch (200 ps)
ID: Instruction decode / register file read (100 ps)
EX: Execute / address calculation (200 ps)
MEM: Memory access (200 ps)
WB: Write back, i.e., RF write (100 ps)
The next-PC (branch) path back to the PC is marked "ignore for now".]

Is this the correct partitioning? Why not 4 or 6 stages? Why not different boundaries?
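One way to explore these questions is to brute-force every grouping of the five annotated blocks into k contiguous stages and take the clock as the slowest group. This sketch assumes the annotated latencies cannot be split further and ignores pipeline-register overhead; `best_partition` is a hypothetical helper.

```python
from itertools import combinations

# Annotated block latencies (ps), in datapath order: IF, ID, EX, MEM, WB.
BLOCKS_PS = [200, 100, 200, 200, 100]


def best_partition(latencies, k):
    """Return (clock_ps, stage_latencies) for the best split into k stages,
    or None if k stages cannot be formed without splitting a block."""
    n = len(latencies)
    best = None
    # Choose k-1 cut points between adjacent blocks.
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0, *cuts, n)
        stages = [sum(latencies[a:b]) for a, b in zip(bounds, bounds[1:])]
        clock = max(stages)  # the slowest stage sets the clock period
        if best is None or clock < best[0]:
            best = (clock, stages)
    return best


if __name__ == "__main__":
    for k in range(1, 7):
        print(k, best_partition(BLOCKS_PS, k))
```

Under these assumptions, 3 and 4 stages both bottom out at a 300 ps clock, 5 stages reach 200 ps, and 6 stages are impossible without splitting one of the blocks.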
Instruction Pipeline Throughput

[Figure: three loads (lw $1, 100($0); lw $2, 200($0); lw $3, 300($0)) plotted in program execution order against time in ps. Top: single-cycle execution, where each instruction passes through Instruction fetch, Reg, ALU, Data access, Reg and a new instruction starts every 800 ps. Bottom: 5-stage pipelined execution, where every stage occupies a 200 ps slot and a new instruction starts every 200 ps.]

5-stage speedup is 4, not 5 as predicted by the ideal model. Why?
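The arithmetic behind the question, as a minimal sketch assuming the stage latencies from the "Dividing Into Stages" slide, a single-cycle period equal to their sum (800 ps), no hazards, and zero pipeline-register overhead. All function names are hypothetical.

```python
STAGE_PS = [200, 100, 200, 200, 100]  # IF, ID, EX, MEM, WB latencies (ps)
SINGLE_CYCLE_PS = sum(STAGE_PS)       # 800 ps: one instruction per long clock
K = len(STAGE_PS)                     # 5 stages


def single_cycle_time_ps(n: int) -> int:
    return n * SINGLE_CYCLE_PS


def pipelined_time_ps(n: int, clock_ps: float) -> float:
    # The first instruction needs K cycles to fill the pipeline;
    # every later instruction completes one cycle after the previous one.
    return (K + n - 1) * clock_ps


real_clock = max(STAGE_PS)          # 200 ps: the slowest stage sets the clock
ideal_clock = SINGLE_CYCLE_PS / K   # 160 ps: perfectly balanced stages

for n in (3, 1_000_000):
    real = single_cycle_time_ps(n) / pipelined_time_ps(n, real_clock)
    ideal = single_cycle_time_ps(n) / pipelined_time_ps(n, ideal_clock)
    print(f"n={n}: speedup {real:.2f} (unbalanced) vs {ideal:.2f} (ideal)")
```

For the three loads in the figure the pipeline finishes at 1400 ps and the fill time dominates; for long programs the speedup approaches 800/200 = 4 rather than 800/160 = 5.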
Enabling Pipelined Processing: Pipeline Registers

[Figure: the single-cycle datapath (latency T) next to the 5-stage pipelined datapath, where pipeline registers IF/ID, ID/EX, EX/MEM and MEM/WB separate the IF, ID, EX, MEM and WB stages and each stage takes T/k ps. Pipeline register contents are labeled IR_D, PC_F, PC_D+4, PC_E+4, nPC_M, A_E, B_E, Imm_E, Aout_M, B_M, MDR_W and Aout_W.]

No resource is used by more than 1 stage!
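A cycle-level sketch of what the pipeline registers make possible, under simplifying assumptions (no hazards, no stalls, one instruction fetched per cycle): each stage works on a different instruction, and all registers load together on the clock edge. `run_pipeline` is a hypothetical helper, and `latches[i]` simply records which instruction stage i is working on in a given cycle.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def run_pipeline(program):
    """Return one dict per clock cycle mapping each stage to the
    instruction it holds (None if the stage is empty)."""
    pending = list(program)
    latches = [None] * len(STAGES)  # latches[i]: instruction in STAGES[i]
    timeline = []
    while True:
        fetched = pending.pop(0) if pending else None
        # On the clock edge every pipeline register captures the output of
        # the stage before it at the same time, so values move one stage.
        latches = [fetched] + latches[:-1]
        if all(x is None for x in latches):
            break  # pipeline has drained
        timeline.append(dict(zip(STAGES, latches)))
    return timeline


if __name__ == "__main__":
    prog = ["lw $1, 100($0)", "lw $2, 200($0)", "lw $3, 300($0)"]
    for cycle, row in enumerate(run_pipeline(prog), start=1):
        print(cycle, row)
```

Since each hardware resource belongs to exactly one stage, having one instruction per stage means no resource is shared between instructions in the same cycle.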
Pipelined Operation Example

[Figure: a lw instruction moving through the pipelined datapath one stage per cycle: Instruction fetch, Instruction decode, Execution, Memory, Write back. Source: 97108/Patterson Figure 06.15]

All instruction classes must follow the same path and timing through the pipeline stages. Any performance impact?
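A rough sketch of one such impact, assuming MIPS-style stage usage per class (the `USES` table is an illustrative assumption, and the names are hypothetical): in the pipeline every instruction spends K cycles of the 200 ps clock, even when the datapath blocks it actually needs are faster.

```python
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}
CLOCK_PS = max(STAGE_PS.values())  # 200 ps: set by the slowest stage
K = len(STAGE_PS)                  # 5 stages

# Assumption: stages that do useful work for each instruction class.
USES = {
    "lw":  ["IF", "ID", "EX", "MEM", "WB"],
    "sw":  ["IF", "ID", "EX", "MEM"],
    "add": ["IF", "ID", "EX", "WB"],
    "beq": ["IF", "ID", "EX"],
}


def needed_ps(cls: str) -> int:
    """Delay of the datapath blocks the class really needs."""
    return sum(STAGE_PS[s] for s in USES[cls])


def pipelined_latency_ps(cls: str) -> int:
    """Every class walks through all K stages, one clock per stage,
    so the result deliberately does not depend on cls."""
    return K * CLOCK_PS


for cls in USES:
    print(f"{cls}: needs {needed_ps(cls)} ps, "
          f"latency in pipeline {pipelined_latency_ps(cls)} ps")
```

In this model per-instruction latency grows for every class (even lw goes from 800 ps to 1000 ps), while throughput stays at one instruction per 200 ps cycle once the pipeline is full.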