
Chinese Academy of Sciences: thematic CERN School of Computing, "T-CSC Data Storage" course materials (lecture notes): Practical vectorization (presentation)

1. Introduction
2. Measuring vectorization
3. Vectorization prerequisites
4. Vectorizing techniques in C++
   - Autovectorization
   - Inline assembly
   - Intrinsics
   - Compiler extensions
   - Libraries
5. What to expect?


Document contents (approx. 63 pages)


Practical vectorization, S. Ponce, CERN

Am I using vector registers?
- Yes, you are, as vector registers are also used for scalar operations
- Remember Andrzej's picture:
  [Figure: a vector register with its used and wasted parts marked]
Am I efficiently using vector registers?
- Here we have to look at the generated assembly code
  - Looking for specific instructions
  - Or for the use of specific register names
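To try this out, a minimal inspection target might look like the following. The function name `saxpy` and the build command in the comment are illustrative assumptions, not taken from the slides. With GCC or Clang at `-O3 -mavx2`, a simple loop like this is typically autovectorized, so its disassembly would be expected to show packed instructions on ymm registers, whereas an `-O0` build of the same file should only show scalar instructions.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical inspection target (not from the slides): y[i] = a * x[i] + y[i].
// Build it as an object file and disassemble it, e.g.
//   g++ -O3 -mavx2 -c saxpy.cpp && objdump -d -C saxpy.o
// then repeat with -O0 and compare the two listings.
void saxpy(float a, const float* x, float* y, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Comparing the optimized and unoptimized listings side by side is a quick way to see the difference between used and wasted register width.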



Side note: what to look at?
What you should look at:
- Specific, CPU-intensive pieces of code
- The most time-consuming functions
- A very small subset of your code (often < 5%)
Where you should not waste your time:
- Trying to get an overall picture of vectorization across your whole application, as most of the code won't use vectors anyway


Crash course in SIMD assembly
Register names:
- SSE: xmm0 to xmm15 (128 bits)
- AVX/AVX2: ymm0 to ymm15 (256 bits)
- AVX-512: zmm0 to zmm31 (512 bits)
- In scalar mode, SSE (xmm) registers are used
Floating point instruction names follow the pattern <op><simd or not><raw type>, where:
- <op> is something like vmul, vadd, vmov or vfmadd
- <simd or not> is either 's' for scalar or 'p' for packed (i.e. vector)
- <raw type> is either 's' for single precision or 'd' for double precision
Typical examples: vmulss, vmovaps (the 'a' stands for aligned), vaddpd, vfmaddpd (the FMA3 forms emitted by current compilers also carry an operand-order number, e.g. vfmadd231pd)
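The naming rule can be made concrete with a small decoder. This is a hypothetical helper written for illustration (the names `FpKind` and `decodeMnemonic` are not from the slides): it only looks at the last two letters of a mnemonic, and it rejects mnemonics starting with `p` or `vp`, because integer SIMD instructions such as `vpmaxsd` can end in the same letters. Treat it as a heuristic rather than a complete x86 decoder.

```cpp
#include <cassert>
#include <string>

// Classification of an x86 floating point mnemonic, following the
// <op><simd or not><raw type> naming rule (e.g. vmulss, vaddpd).
struct FpKind {
    bool isFloatingPoint;  // false if the suffix rule does not apply
    bool packed;           // 'p' = packed (vector), 's' = scalar
    bool singlePrecision;  // 's' = float, 'd' = double
};

// Heuristic decoder (illustration only): inspect the last two letters.
// Mnemonics starting with "p" or "vp" are integer SIMD instructions whose
// endings can mimic the floating point suffixes, so they are rejected.
FpKind decodeMnemonic(const std::string& m) {
    const FpKind notFp{false, false, false};
    if (m.size() < 3) return notFp;
    if (m.rfind("p", 0) == 0 || m.rfind("vp", 0) == 0) return notFp;
    const char simd = m[m.size() - 2];
    const char type = m[m.size() - 1];
    if ((simd != 'p' && simd != 's') || (type != 's' && type != 'd')) {
        return notFp;
    }
    return FpKind{true, simd == 'p', type == 's'};
}
```

For instance, `decodeMnemonic("vfmadd231pd")` reports a packed, double precision instruction, since the operand-order digits sit before the suffix.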


Practical look at assembly
Extract the assembly code:
- Run objdump -d -C on your executable or library (-C demangles C++ symbol names)
- Search for your function name
Check for vectorization:
- For AVX2, look for ymm
- For AVX-512, look for zmm
- Otherwise, look for instructions ending in ps or pd
  - But ignore mov operations; concentrate only on arithmetic ones
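The checklist above can be automated with a rough sketch like the following, which reads the text output of `objdump -d -C`. The helper names, the tab-separated line layout it expects, and the list of arithmetic stems are assumptions made for this illustration; other objdump versions or options may format lines differently and need adjustments.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Counts of floating point arithmetic instructions found in one function.
struct VecReport {
    int packed = 0;  // ps/pd suffix: vector arithmetic
    int scalar = 0;  // ss/sd suffix: scalar arithmetic
    int ymm = 0;     // arithmetic touching ymm registers (AVX/AVX2)
    int zmm = 0;     // arithmetic touching zmm registers (AVX-512)
};

// Arithmetic stems, checked after dropping a leading 'v' (AVX encoding).
// mov, shuffle and logical instructions are deliberately not listed,
// following the advice to ignore mov operations.
bool isFpArithmetic(const std::string& mnem) {
    static const char* const stems[] = {"add", "sub", "mul", "div", "sqrt", "min",
                                        "max", "fmadd", "fmsub", "fnmadd", "fnmsub"};
    const std::string m = (!mnem.empty() && mnem[0] == 'v') ? mnem.substr(1) : mnem;
    for (const char* s : stems) {
        if (m.rfind(s, 0) == 0) return true;
    }
    return false;
}

// Scans `objdump -d -C` text, looking only at the function whose demangled
// name starts with `func`. Instruction lines are assumed to have the layout
// "address:<TAB>bytes<TAB>mnemonic operands".
VecReport scanObjdump(const std::string& text, const std::string& func) {
    VecReport r;
    bool inFunc = false;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        // Symbol headers look like "0000000000001130 <saxpy(float, ...)>:".
        if (line.size() >= 2 && line.compare(line.size() - 2, 2, ">:") == 0) {
            inFunc = line.find("<" + func) != std::string::npos;
            continue;
        }
        if (!inFunc) continue;
        const auto t1 = line.find('\t');
        if (t1 == std::string::npos) continue;
        const auto t2 = line.find('\t', t1 + 1);
        if (t2 == std::string::npos) continue;  // blank or byte-continuation line
        std::istringstream insn(line.substr(t2 + 1));
        std::string mnem, operands;
        insn >> mnem;
        std::getline(insn, operands);
        if (mnem.size() < 3 || !isFpArithmetic(mnem)) continue;
        const std::string sfx = mnem.substr(mnem.size() - 2);
        if (sfx == "ps" || sfx == "pd") {
            ++r.packed;
        } else if (sfx == "ss" || sfx == "sd") {
            ++r.scalar;
        } else {
            continue;  // e.g. integer add/sub, which have no ps/pd/ss/sd suffix
        }
        if (operands.find("ymm") != std::string::npos) ++r.ymm;
        if (operands.find("zmm") != std::string::npos) ++r.zmm;
    }
    return r;
}
```

As a usage idea, one could save the output of `objdump -d -C mylib.so` to a file, read it into a string and call `scanObjdump(text, "myFunction")` (both names hypothetical). Non-zero `ymm` or `zmm` counts suggest AVX or AVX-512 arithmetic in that function, while a report with only `scalar` counts suggests it was not vectorized.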


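Documents you may also be interested in

Chinese Academy of Sciences: thematic CERN School of Computing, "T-CSC Data Storage" course materials (lecture notes): Writing Parallel software (booklet)
Chinese Academy of Sciences: thematic CERN School of Computing, "T-CSC Data Storage" course materials (lecture notes): Writing Parallel software (presentation)
CAS Institute of High Energy Physics, Computing Center: Data technologies hands-on session, Data Technologies – CERN School of Computing 2019
CAS Institute of High Energy Physics, Computing Center: Data technologies course, CSC 2018 Data Technologies Exercises (CSC DT 2018 Introduction)
CAS Institute of High Energy Physics, Computing Center: Storage and management of high energy physics data (Wang Lu)
Nanjing University: "Data Structures" course materials (PPT slides), Chapter 9: Sorting
Nanjing University: "Data Structures" course materials (PPT slides), Chapter 8: Graphs
Nanjing University: "Data Structures" course materials (PPT slides), Chapter 7: Search structures
Nanjing University: "Data Structures" course materials (PPT slides), Chapter 6: Sets and dictionaries
Nanjing University: "Data Structures" course materials (PPT slides), Chapter 5: Trees
Nanjing University: "Data Structures" course materials (PPT slides), Chapter 4: Arrays, strings and generalized lists
Nanjing University: "Data Structures" course materials (PPT slides), Chapter 3: Stacks and queues
Chinese Academy of Sciences: thematic CERN School of Computing, "T-CSC Data Storage" course materials (lecture notes): Practical vectorization (booklet)
Chinese Academy of Sciences: thematic CERN School of Computing, "T-CSC Data Storage" course materials (lecture notes): Modern programming languages for HEP (presentation)
Chinese Academy of Sciences: thematic CERN School of Computing, "T-CSC Data Storage" course materials (lecture notes): Modern programming languages for HEP (booklet)
Chinese Academy of Sciences: thematic CERN School of Computing, "T-CSC Data Storage" course materials (lecture notes): Optimizing existing large codebase (presentation)
Chinese Academy of Sciences: thematic CERN School of Computing, "T-CSC Data Storage" course materials (lecture notes): Optimizing existing large codebase (booklet)
Chinese Academy of Sciences: thematic CERN School of Computing, "T-CSC Data Storage" course materials (lecture notes): Structuring data for efficient I/O (presentation)
Chinese Academy of Sciences: thematic CERN School of Computing, "T-CSC Data Storage" course materials (lecture notes): Structuring data for efficient I/O (booklet)
Chinese Academy of Sciences: thematic CERN School of Computing, "T-CSC Data Storage" course materials (lecture notes): Many ways to store data (presentation)
Chinese Academy of Sciences: thematic CERN School of Computing, "T-CSC Data Storage" course materials (lecture notes): Many ways to store data (booklet)
Chinese Academy of Sciences: thematic CERN School of Computing, "T-CSC Data Storage" course materials (lecture notes): Preserving data (presentation)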
