Practical vectorization
S. Ponce - CERN
Am I using vector registers?
Yes, you are: vector registers are used even for scalar operations.
Remember Andrzej's picture [Figure: a vector register split into its used part and its wasted part]

Am I efficiently using vector registers?
Here we have to look at the generated assembly code, looking for:
- specific instructions
- or the use of specific register names
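To put numbers on Andrzej's picture, the sketch below computes how much of a register a single instruction keeps busy. It assumes the usual register widths (128-bit xmm, 256-bit ymm, 512-bit zmm) and 32-bit single / 64-bit double elements; the function names are purely illustrative.

```python
# Register widths in bits (xmm: SSE, ymm: AVX/AVX2, zmm: AVX-512).
REGISTER_BITS = {"xmm": 128, "ymm": 256, "zmm": 512}
# Element sizes in bits for IEEE single and double precision.
ELEMENT_BITS = {"single": 32, "double": 64}


def lanes(register: str, precision: str) -> int:
    """Number of elements of the given precision that fit in the register."""
    return REGISTER_BITS[register] // ELEMENT_BITS[precision]


def used_fraction(register: str, precision: str, active_lanes: int = 1) -> float:
    """Fraction of the register doing useful work; a scalar op has 1 active lane."""
    return active_lanes / lanes(register, precision)


# A scalar double-precision operation measured against a 512-bit register:
print(lanes("zmm", "double"), used_fraction("zmm", "double"))  # 8 0.125
```

A scalar instruction keeps a single lane busy whatever the register width, so the share of the hardware left idle grows as registers get wider.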
Side note: what to look at?
What you should look at:
- specific, CPU-intensive pieces of code
- the most time-consuming functions
- a very small subset of your code (often < 5%)
Where you should not waste your time:
- trying to get an overall picture of vectorization in your application, as most of the code won't use vectors anyway
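One way to turn "the most time-consuming functions" into a concrete list is to rank profiler output by time and keep only the functions that cover most of the runtime. This is a minimal sketch that assumes you already have per-function times in a dict (from whatever profiler you use); the 90% default threshold, the function `hot_functions` and the timings in the example are hypothetical.

```python
def hot_functions(profile: dict[str, float], coverage: float = 0.9) -> list[str]:
    """Return the most expensive functions that together account for
    at least `coverage` (a fraction between 0 and 1) of the total time."""
    total = sum(profile.values())
    selected, accumulated = [], 0.0
    # Walk functions from most to least expensive until the target is reached.
    for name, seconds in sorted(profile.items(), key=lambda kv: kv[1], reverse=True):
        if accumulated >= coverage * total:
            break
        selected.append(name)
        accumulated += seconds
    return selected


# Hypothetical per-function timings in seconds.
profile = {"fit_track": 70.0, "propagate": 20.0, "read_input": 6.0, "setup": 4.0}
print(hot_functions(profile, coverage=0.8))  # ['fit_track', 'propagate']
```

Only the functions returned here are worth inspecting at the assembly level.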
Crash course in SIMD assembly
Register names:
- SSE: xmm0 to xmm15 (128 bits)
- AVX/AVX2: ymm0 to ymm15 (256 bits)
- AVX-512: zmm0 to zmm31 (512 bits)
In scalar mode, SSE (xmm) registers are used.
Floating-point instruction names follow the pattern <op><simd or not><raw type>, where:
- <op> is something like vmul, vadd, vmov or vfmadd
- <simd or not> is either 's' for scalar or 'p' for packed (i.e. vector)
- <raw type> is either 's' for single precision or 'd' for double precision
Typical examples: vmulss, vmovaps, vaddpd, vfmaddpd (FMA3 code shows the latter as vfmadd132pd, vfmadd213pd or vfmadd231pd)
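The naming rule above can be applied mechanically. This sketch decodes a mnemonic into its three parts with a regular expression. It is a heuristic: it only knows the s/p and s/d suffixes, skips the three-digit operand-order suffix of FMA3 names such as vfmadd231pd, and returns None for anything else (integer instructions, conversions, and so on). For moves, the alignment letter stays in <op>, so vmovaps decodes with op "vmova".

```python
import re

# <op><simd or not><raw type>, with an optional 3-digit FMA3 operand order
# (132/213/231) between <op> and the suffix.
MNEMONIC = re.compile(r"^(?P<op>[a-z]+?)(?:\d{3})?(?P<simd>[sp])(?P<raw>[sd])$")


def decode(mnemonic: str):
    """Split a floating-point mnemonic into (op, 'scalar'|'packed', 'single'|'double').

    Returns None when the name does not follow the pattern.
    """
    match = MNEMONIC.match(mnemonic.lower())
    if match is None:
        return None
    mode = "packed" if match["simd"] == "p" else "scalar"
    precision = "single" if match["raw"] == "s" else "double"
    return match["op"], mode, precision


for name in ("vmulss", "vaddpd", "vfmadd231pd", "vpaddd"):
    print(name, decode(name))
```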
Practical look at assembly
Extract assembly code:
- run objdump -d -C on your executable or library (-C demangles C++ symbol names)
- search for your function name
Check for vectorization:
- for AVX2, look for ymm
- for AVX-512, look for zmm
- otherwise, look for instructions ending in ps or pd, but ignore mov operations and concentrate on arithmetic ones
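The checks above can be scripted. This sketch assumes the text output of objdump -d -C in GNU objdump's default AT&T syntax (address, tab, raw bytes, tab, instruction; Intel syntax drops the % prefixes and would need adjusting). Within one function it counts lines using ymm or zmm registers and packed versus scalar floating-point arithmetic, skipping moves as suggested. The function name is matched with a simple substring test, and `vector_report` and its keys are hypothetical.

```python
import re

# Function header line, e.g. "0000000000001139 <saxpy(float, float*)>:"
HEADER = re.compile(r"^[0-9a-f]+ <(?P<name>.+)>:$")
# <op><simd or not><raw type>, optionally with an FMA3 operand-order suffix.
FP_OP = re.compile(r"^(?P<op>[a-z]+?)(?:\d{3})?(?P<simd>[sp])(?P<raw>[sd])$")


def vector_report(objdump_text: str, function: str) -> dict[str, int]:
    """Summarize SIMD usage of one function in `objdump -d -C` output."""
    report = {"ymm": 0, "zmm": 0, "packed_arith": 0, "scalar_arith": 0}
    inside = False
    for line in objdump_text.splitlines():
        header = HEADER.match(line)
        if header:
            # Start (or stop) counting at each new function header.
            inside = function in header["name"]
            continue
        if not inside:
            continue
        parts = line.split("\t")
        if len(parts) < 3 or not parts[2].strip():
            continue  # blank line or continuation bytes of a long instruction
        mnemonic, _, operands = parts[2].strip().partition(" ")
        report["ymm"] += int("%ymm" in operands)
        report["zmm"] += int("%zmm" in operands)
        match = FP_OP.match(mnemonic)
        if match and "mov" not in match["op"]:  # ignore moves, keep arithmetic
            key = "packed_arith" if match["simd"] == "p" else "scalar_arith"
            report[key] += 1
    return report
```

In a hot function, packed arithmetic dominating scalar arithmetic is a good sign, and the ymm and zmm counts hint at which instruction set the compiler actually used.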