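Vector Unit Structure
[Figure: each lane contains a functional unit and a slice of the vector registers; all lanes share the memory subsystem]
• Lane 0: elements 0, 4, 8, …
• Lane 1: elements 1, 5, 9, …
• Lane 2: elements 2, 6, 10, …
• Lane 3: elements 3, 7, 11, …
2021/1/29 Computer Architecture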
Vector Instruction Execution
ADDV C,A,B
• Executed using one pipelined functional unit: one element pair enters the pipeline per cycle. In the snapshot shown, C[0], C[1], and C[2] are complete, while A[3]+B[3] through A[6]+B[6] occupy the pipeline stages.
• Executed using four pipelined functional units: four element pairs enter per cycle, with unit k handling elements k, k+4, k+8, … In the snapshot shown, C[0] through C[11] are complete, while A[12]+B[12] through A[27]+B[27] occupy the pipeline stages.
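A minimal Python sketch of the two snapshots above. It assumes a 4-stage pipeline (the depth the figure suggests), one element pair entering each unit per cycle, and element i assigned to unit i mod L; the names `addv_snapshot` and `addv_cycles` are hypothetical, chosen for illustration only.

```python
def addv_snapshot(n, lanes, depth, t):
    """Classify elements of C = A + B at cycle t (illustrative timing model).

    Assumption: element i enters its unit's pipeline at cycle i // lanes
    and its result is complete `depth` cycles later.
    """
    completed, in_flight = [], []
    for i in range(n):
        entered = i // lanes          # cycle in which A[i], B[i] enter the pipeline
        if entered > t:
            continue                  # not yet issued
        if entered <= t - depth:
            completed.append(i)       # C[i] already written
        else:
            in_flight.append(i)       # still in one of the pipeline stages
    return completed, in_flight


def addv_cycles(n, lanes, depth):
    """First cycle at which all n results are complete under the same model."""
    return (n - 1) // lanes + depth


one_done, one_busy = addv_snapshot(28, lanes=1, depth=4, t=6)
four_done, four_busy = addv_snapshot(28, lanes=4, depth=4, t=6)
print(one_done, one_busy)                             # [0, 1, 2] [3, 4, 5, 6]
print(four_done[-1], four_busy[0], four_busy[-1])     # 11 12 27
print(addv_cycles(28, 1, 4), addv_cycles(28, 4, 4))   # 31 10
```

At cycle 6 the model reproduces both snapshots in the figure. Under these assumptions, four units cut the time for 28 elements from 31 to 10 cycles, and the pipeline fill cost (the depth) is paid once regardless of how many units there are.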
Interleaved Vector Memory System
[Figure: an address generator combines Base and Stride to produce successive addresses, which are spread across memory banks 0 to F; the data flow to and from the vector registers]
• Cray-1: 16 banks, 4-cycle bank busy time, 12-cycle latency
• Bank busy time: the time before a bank is ready to accept the next request
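A simplified Python model of strided loads on this memory system, using the Cray-1 figures from the slide (16 banks, 4-cycle busy time, 12-cycle latency). The timing rules are assumptions for illustration, not the Cray-1's exact behavior: word address base + i·stride maps to bank (address mod 16), the address generator issues at most one request per cycle in order, and it stalls while the target bank is busy. `strided_load_cycles` is a hypothetical name.

```python
def strided_load_cycles(n, base, stride, banks=16, busy=4, latency=12):
    """Cycle in which the last of n strided words arrives (simplified model)."""
    bank_free_at = [0] * banks    # earliest cycle each bank accepts a new request
    next_issue = 0                # earliest cycle the address generator can issue
    last_issue = 0
    for i in range(n):
        bank = (base + i * stride) % banks            # low-order interleaving
        issue = max(next_issue, bank_free_at[bank])   # stall while the bank is busy
        bank_free_at[bank] = issue + busy
        last_issue = issue
        next_issue = issue + 1
    return last_issue + latency


for stride in (1, 4, 8, 16):
    print(stride, strided_load_cycles(64, base=0, stride=stride))
# 1 75
# 4 75
# 8 137
# 16 264
```

In this model, a stride that returns to the same bank no sooner than every 4 cycles (strides 1 and 4 here) streams one word per cycle, while strides that touch only two banks (8) or one bank (16) stall on the bank busy time.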
T0 Vector Microprocessor (UCB/ICSI, 1995)
Vector register elements are striped over the lanes (8 lanes, 32 elements):
• Lane 0: [0] [8] [16] [24]
• Lane 1: [1] [9] [17] [25]
• Lane 2: [2] [10] [18] [26]
• Lane 3: [3] [11] [19] [27]
• Lane 4: [4] [12] [20] [28]
• Lane 5: [5] [13] [21] [29]
• Lane 6: [6] [14] [22] [30]
• Lane 7: [7] [15] [23] [31]
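The striping in this figure, and in the four-lane Vector Unit Structure figure, is round-robin: element i lives in lane i mod L at slot ⌊i/L⌋ of that lane's register slice. A minimal Python sketch; the function names are hypothetical.

```python
def lane_of(i, lanes):
    """Lane that stores element i (elements striped round-robin)."""
    return i % lanes


def slot_of(i, lanes):
    """Position of element i inside its lane's slice of the vector register."""
    return i // lanes


def stripe(vlen, lanes):
    """Element indices held by each lane, in slot order."""
    layout = [[] for _ in range(lanes)]
    for i in range(vlen):
        layout[lane_of(i, lanes)].append(i)
    return layout


for lane, elems in enumerate(stripe(32, 8)):   # T0: 8 lanes, 32 elements
    print(f"lane {lane}: {elems}")
```

Each lane therefore holds vlen / lanes = 4 elements, and the same function with 4 lanes gives the assignment 0, 4, 8, … / 1, 5, 9, … from the Vector Unit Structure slide.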
Vector Instruction Parallelism
• Multiple vector instructions can execute overlapped (chaining)
− Example: 32 elements per vector, 8 lanes
[Figure: successive load, mul, and add instructions overlap in time across the Load Unit, Multiply Unit, and Add Unit, from instruction issue to completion]
Complete 24 operations/cycle while issuing 1 short instruction/cycle
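A rough Python sketch of where the 24 operations/cycle comes from. It models only functional-unit occupancy and makes several assumptions: in-order issue of at most one instruction per cycle, a stall only when the instruction's unit is still busy, data dependences fully covered by chaining, and pipeline latency ignored. The function names and the `"load"`/`"mul"`/`"add"` unit labels are hypothetical.

```python
def overlapped_schedule(program, vlen=32, lanes=8):
    """Issue cycle of each instruction under the simplified occupancy model."""
    assert vlen % lanes == 0, "model assumes vlen is a multiple of lanes"
    occupancy = vlen // lanes          # cycles one instruction holds its unit
    unit_free_at = {}                  # unit name -> first free cycle
    issue_times, next_issue = [], 0
    for unit in program:
        t = max(next_issue, unit_free_at.get(unit, 0))   # wait for a free unit
        issue_times.append(t)
        unit_free_at[unit] = t + occupancy
        next_issue = t + 1             # at most one issue per cycle
    return issue_times, occupancy


def ops_per_cycle(program, vlen=32, lanes=8):
    """Element operations performed in each cycle (one per lane per busy unit)."""
    issue_times, occupancy = overlapped_schedule(program, vlen, lanes)
    counts = [0] * (max(issue_times) + occupancy)
    for t in issue_times:
        for c in range(t, t + occupancy):
            counts[c] += lanes
    return counts


program = ["load", "mul", "add"] * 4
print(overlapped_schedule(program)[0])   # [0, 1, 2, 4, 5, 6, 8, 9, 10, 12, 13, 14]
print(max(ops_per_cycle(program)))       # 24
```

With three units and eight lanes, all three units are busy in steady state, giving 3 × 8 = 24 operations per cycle. Because each instruction holds its unit for 32 / 8 = 4 cycles, this model issues three instructions every four cycles, so an issue rate of at most one instruction per cycle is enough to sustain that throughput.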