Interleaved Vector Memory System

[Figure labels: Base, Stride, + (adder), Address Generator, Vector Registers, Memory Banks 0 to F]

• Cray-1: 16 banks, 4-cycle bank busy time, 12-cycle latency
• Bank busy time: the time before a bank is ready to accept the next request

2021/2/1 Computer Architecture
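
The effect of bank busy time on strided accesses can be illustrated with a small model. This is only a sketch under assumptions not stated on the slide: memory is word-addressed, banks are selected by the low-order address bits (bank = address mod 16), and the address generator tries to issue one request per cycle. The function name and parameters are hypothetical.

```python
def strided_access_stalls(num_elements, stride, num_banks=16, busy_time=4, base=0):
    """Count stall cycles for a strided vector access to interleaved banks.

    Assumptions (not from the slide): word addressing, bank = address % num_banks,
    and at most one request issued per cycle.
    """
    bank_free_at = [0] * num_banks  # first cycle at which each bank accepts a request
    cycle = 0
    stalls = 0
    for i in range(num_elements):
        bank = (base + i * stride) % num_banks
        if bank_free_at[bank] > cycle:
            # The target bank is still busy: wait until it becomes free.
            stalls += bank_free_at[bank] - cycle
            cycle = bank_free_at[bank]
        bank_free_at[bank] = cycle + busy_time
        cycle += 1  # the next request can issue in the following cycle at the earliest
    return stalls


if __name__ == "__main__":
    for stride in (1, 4, 8, 16):
        print(stride, strided_access_stalls(64, stride))
```

Under these assumptions a unit stride never stalls, because each bank is revisited only every 16 cycles, while a stride of 16 sends every request to the same bank and pays the busy time on each access.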

Vector Unit Structure

[Figure labels: Lane, Functional Unit, Vector Registers, Memory Subsystem]

The vector register elements are divided across four lanes:
• Elements 0, 4, 8, …
• Elements 1, 5, 9, …
• Elements 2, 6, 10, …
• Elements 3, 7, 11, …

T0 Vector Microprocessor (UCB/ICSI, 1995)

Vector register elements are striped over the lanes (one row per lane):

[0] [8]  [16] [24]
[1] [9]  [17] [25]
[2] [10] [18] [26]
[3] [11] [19] [27]
[4] [12] [20] [28]
[5] [13] [21] [29]
[6] [14] [22] [30]
[7] [15] [23] [31]
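
The striping shown above follows a simple rule: element i is held by lane i mod (number of lanes). A minimal sketch of that mapping is given here; the function name is hypothetical, and the default values of 32 elements and 8 lanes are read off the layout above.

```python
def stripe_elements(vector_length=32, num_lanes=8):
    """Return, for each lane, the list of element indices it holds."""
    lanes = [[] for _ in range(num_lanes)]
    for element in range(vector_length):
        # Consecutive elements go to consecutive lanes, wrapping around.
        lanes[element % num_lanes].append(element)
    return lanes


if __name__ == "__main__":
    for lane, elements in enumerate(stripe_elements()):
        print(lane, elements)
```

Calling `stripe_elements(12, 4)` gives the four-lane pattern (0, 4, 8 / 1, 5, 9 / …) from the Vector Unit Structure slide.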

Vector Instruction Parallelism

• Multiple vector instructions can execute in an overlapped fashion (chaining)
• Example: each vector has 32 elements, and there are 8 lanes

[Figure: instruction issue over time; load, mul, and add instructions overlap in the Load Unit, Multiply Unit, and Add Unit]

Complete 24 operations/cycle while issuing 1 short instruction/cycle
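
The 24 operations/cycle figure can be checked with a little arithmetic. The sketch below assumes, as the figure suggests but the slide does not state outright, that the load, multiply, and add units each span all 8 lanes and that all three are busy in steady state. The function name is hypothetical.

```python
def chained_throughput(vector_length=32, num_lanes=8, num_units=3):
    """Return (cycles each instruction occupies a unit, peak operations per cycle)."""
    # Each unit processes num_lanes elements per cycle; round up for partial groups.
    cycles_per_instruction = -(-vector_length // num_lanes)
    # With every unit busy, each of them completes num_lanes operations per cycle.
    ops_per_cycle = num_units * num_lanes
    return cycles_per_instruction, ops_per_cycle


if __name__ == "__main__":
    print(chained_throughput())
```

With the slide's numbers, each 32-element instruction keeps its unit busy for 4 cycles, and three units of 8 lanes each account for 3 x 8 = 24 operations per cycle.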

“DLXV” Vector Instructions

Instr.  Operands    Operation                 Comment
ADDV    V1,V2,V3    V1=V2+V3                  vector + vector
ADDSV   V1,F0,V2    V1=F0+V2                  scalar + vector
MULTV   V1,V2,V3    V1=V2xV3                  vector x vector
MULSV   V1,F0,V2    V1=F0xV2                  scalar x vector
LV      V1,R1       V1=M[R1..R1+63]           load, stride=1
LVWS    V1,R1,R2    V1=M[R1..R1+63*R2]        load, stride=R2
LVI     V1,R1,V2    V1=M[R1+V2i, i=0..63]     indirect ("gather")
CeqV    VM,V1,V2    VMASKi = (V1i=V2i)?       compare, set mask
MOV     VLR,R1      Vec. Len. Reg. = R1       set vector length
MOV     VM,R1       Vec. Mask = R1            set vector mask
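
The operations in the table can be mimicked with a rough functional model. This is an illustrative sketch, not a DLXV simulator: memory is a Python list indexed by word (an assumption; the table does not specify addressing units), the `vlr` argument stands in for the vector length register, and masked execution is not modeled. All function names are hypothetical.

```python
MVL = 64  # maximum vector length, from the 0..63 index range in the table


def lv(mem, r1, vlr=MVL):
    """LV: unit-stride load, V1[i] = M[R1 + i]."""
    return [mem[r1 + i] for i in range(vlr)]


def lvws(mem, r1, r2, vlr=MVL):
    """LVWS: strided load, V1[i] = M[R1 + i*R2]."""
    return [mem[r1 + i * r2] for i in range(vlr)]


def lvi(mem, r1, v2, vlr=MVL):
    """LVI: indexed load ("gather"), V1[i] = M[R1 + V2[i]]."""
    return [mem[r1 + v2[i]] for i in range(vlr)]


def addv(v2, v3):
    """ADDV: element-wise vector + vector."""
    return [a + b for a, b in zip(v2, v3)]


def mulsv(f0, v2):
    """MULSV: scalar x vector."""
    return [f0 * a for a in v2]


def ceqv(v1, v2):
    """CeqV: compare element-wise and return the resulting mask bits."""
    return [a == b for a, b in zip(v1, v2)]


if __name__ == "__main__":
    mem = list(range(100))
    print(lvws(mem, 0, 3, vlr=4))
    print(lvi(mem, 5, [0, 2, 4], vlr=3))
```

The three loads differ only in how the address of element i is formed, which is the distinction the table's Operation column draws.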