Practical vectorization
S. Ponce - CERN

Practical look at assembly

Extract assembly code
  - Run objdump -d -C on your executable or library
  - Search for your function name

Check for vectorization
  - For avx2, look for ymm
  - For avx512, look for zmm
  - Otherwise look for instructions with ps or pd at the end
      - but ignore mov operations
      - only concentrate on arithmetic ones
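The checklist above can be automated in a few lines. The following is only a minimal sketch, assuming AT&T-syntax text as printed by objdump -d -C; the helper names parse_instruction and classify_disassembly are hypothetical, and the sketch applies the "ignore mov" rule to every register width, which is slightly stricter than the slide.

```python
import re

ADDRESS = re.compile(r"^[0-9a-f]+:$")      # e.g. "d18:"
HEX_BYTE = re.compile(r"^[0-9a-f]{2}$")    # e.g. "c5"


def parse_instruction(line):
    """Return (mnemonic, operands) for one disassembly line, or None."""
    tokens = line.split()
    if not tokens or not ADDRESS.match(tokens[0]):
        return None  # label, blank line, section header, ...
    rest = tokens[1:]
    i = 0
    while i < len(rest) and HEX_BYTE.match(rest[i]):
        i += 1       # skip the raw instruction bytes
    if i == len(rest):
        return None  # continuation line holding bytes only
    return rest[i], " ".join(rest[i + 1:])


def classify_disassembly(text):
    """Apply the slide's heuristic; returns 'avx512', 'avx', 'sse' or 'not vectorized'."""
    found = set()
    for line in text.splitlines():
        parsed = parse_instruction(line)
        if parsed is None:
            continue
        mnemonic, operands = parsed
        if "mov" in mnemonic:
            continue                      # moves are ignored, only arithmetic counts
        if "%zmm" in operands:
            found.add("avx512")           # 512-bit registers
        elif "%ymm" in operands:
            found.add("avx")              # 256-bit registers (AVX or AVX2)
        elif mnemonic.endswith(("ps", "pd")):
            found.add("sse")              # packed arithmetic on xmm registers
    for level in ("avx512", "avx", "sse"):
        if level in found:
            return level
    return "not vectorized"
```

In practice the text passed to classify_disassembly would be the part of the objdump output that belongs to the function of interest, not the whole binary, since other functions may be vectorized differently.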
Exercise 1

Code
  d18: c5 fc 59 d8       vmulps %ymm0,%ymm0,%ymm3
  d1c: c5 fc 58 c0       vaddps %ymm0,%ymm0,%ymm0
  d20: c5 e4 5c de       vsubps %ymm6,%ymm3,%ymm3
  d24: c4 c1 7c 59 c0    vmulps %ymm8,%ymm0,%ymm0
  d29: c4 c1 64 58 da    vaddps %ymm10,%ymm3,%ymm3
  d2e: c4 41 7c 58 c3    vaddps %ymm11,%ymm0,%ymm8
  d33: c5 e4 59 d3       vmulps %ymm3,%ymm3,%ymm2
  d37: c4 c1 3c 59 f0    vmulps %ymm8,%ymm8,%ymm6
  d3c: c5 ec 58 d6       vaddps %ymm6,%ymm2,%ymm2

Solution
  - Presence of ymm
  - Vectorized, AVX level
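To put a number on "Vectorized, AVX level": a ymm register is 256 bits wide and the ps suffix means packed single precision (32-bit floats), so each arithmetic instruction in the listing works on 8 values at once. The sketch below does that arithmetic; elements_per_instruction is a hypothetical helper that only handles mnemonics ending in ss, sd, ps or pd.

```python
REGISTER_BITS = {"xmm": 128, "ymm": 256, "zmm": 512}
ELEMENT_BITS = {"s": 32, "d": 64}  # single / double precision


def elements_per_instruction(mnemonic, register):
    """Number of floating-point values one instruction operates on."""
    packing, precision = mnemonic[-2], mnemonic[-1]
    if precision not in ELEMENT_BITS or packing not in ("s", "p"):
        raise ValueError(f"unsupported mnemonic: {mnemonic}")
    if packing == "s":
        return 1  # scalar: one element, whatever the register width
    return REGISTER_BITS[register] // ELEMENT_BITS[precision]


if __name__ == "__main__":
    print(elements_per_instruction("vmulps", "ymm"))  # 256 / 32 = 8
    print(elements_per_instruction("mulss", "xmm"))   # scalar = 1
```

The same function gives 4 for packed single precision on xmm and 16 on zmm, which is why the register name is the first thing to look for.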
Exercise 2

Code
  b97: 0f 28 e5          movaps %xmm5,%xmm4
  b9a: f3 0f 59 e5       mulss %xmm5,%xmm4
  b9e: f3 0f 58 ed       addss %xmm5,%xmm5
  ba2: f3 0f 59 ee       mulss %xmm6,%xmm5
  ba6: f3 0f 5c e7       subss %xmm7,%xmm4
  baa: 0f 28 f5          movaps %xmm5,%xmm6
  bad: f3 41 0f 58 e0    addss %xmm8,%xmm4
  bb2: f3 0f 58 f2       addss %xmm2,%xmm6
  bb6: 0f 28 ec          movaps %xmm4,%xmm5

Solution
  - Presence of xmm but ps only in mov
  - Not vectorized
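The "ps only in mov" observation is easy to confirm by tallying the mnemonic suffixes of the listing, separating moves from arithmetic. This is just an illustrative sketch: suffix_tally and EXERCISE_2 are hypothetical names, and the parsing assumes the address / bytes / mnemonic layout shown in the exercise.

```python
import re
from collections import Counter

HEX_BYTE = re.compile(r"^[0-9a-f]{2}$")

EXERCISE_2 = """\
b97: 0f 28 e5 movaps %xmm5,%xmm4
b9a: f3 0f 59 e5 mulss %xmm5,%xmm4
b9e: f3 0f 58 ed addss %xmm5,%xmm5
ba2: f3 0f 59 ee mulss %xmm6,%xmm5
ba6: f3 0f 5c e7 subss %xmm7,%xmm4
baa: 0f 28 f5 movaps %xmm5,%xmm6
bad: f3 41 0f 58 e0 addss %xmm8,%xmm4
bb2: f3 0f 58 f2 addss %xmm2,%xmm6
bb6: 0f 28 ec movaps %xmm4,%xmm5
"""


def suffix_tally(listing):
    """Count (kind, suffix) pairs, where kind is 'mov' or 'arithmetic'."""
    tally = Counter()
    for line in listing.splitlines():
        tokens = line.split()
        # The mnemonic is the first token after the address that is not a raw byte.
        mnemonic = next((t for t in tokens[1:] if not HEX_BYTE.match(t)), None)
        if mnemonic is None:
            continue
        kind = "mov" if "mov" in mnemonic else "arithmetic"
        tally[(kind, mnemonic[-2:])] += 1
    return tally


if __name__ == "__main__":
    for (kind, suffix), count in sorted(suffix_tally(EXERCISE_2).items()):
        print(kind, suffix, count)
```

For this listing every arithmetic instruction carries the ss suffix (scalar single precision) and the only ps instructions are the three movaps, which matches the "Not vectorized" verdict.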