
Chinese Academy of Sciences: CERN Thematic School of Computing, "T-CSC Data Storage" course materials (lecture notes): Optimizing existing large codebase-pres

File format: PDF; file size: 1.25 MB; about 91 pages.

Optimizing existing large codebase (S. Ponce - CERN)

Defining our performance
Key question: what is performance?
- simply going faster?
  - not at all costs (money, physics results)
- making better use of the hardware
  - most of the time, hardware is cheaper than people!
- you need to define your "Key Performance Indicators"
  - e.g. for a trigger: number of events / s / $, with constant manpower
- and get a clear idea of your different costs
  - FLOPS/$ of your machines, including network, cabling, cooling, buildings, ...
  - human costs
  - cost of transition
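To make the KPI idea concrete, here is a minimal sketch that turns a required throughput and a cost breakdown into events/s/$. The `FarmCosts` class, its cost categories (taken from the list above), and every number below are hypothetical assumptions for illustration; a real KPI would use your own measured rates and accounted costs.

```python
from dataclasses import dataclass


@dataclass
class FarmCosts:
    """Hypothetical cost breakdown of a computing farm, all amounts in $."""
    hardware: float          # purchase price of the machines
    infrastructure: float    # network, cabling, cooling, buildings, ...
    people_per_year: float   # human costs (kept constant between options)
    transition: float        # one-off cost of modernizing the code
    lifetime_years: float    # period over which costs are counted

    def total(self) -> float:
        """Total cost of ownership over the farm lifetime."""
        return (self.hardware + self.infrastructure + self.transition
                + self.people_per_year * self.lifetime_years)


def events_per_s_per_dollar(events_per_s: float, costs: FarmCosts) -> float:
    """KPI: sustained event throughput divided by the total cost."""
    return events_per_s / costs.total()


# Two hypothetical ways of reaching the same trigger rate:
# buy more machines, or pay for a code transition that halves the hardware.
more_hw = FarmCosts(hardware=2_000_000, infrastructure=500_000,
                    people_per_year=300_000, transition=0, lifetime_years=5)
optimized = FarmCosts(hardware=1_000_000, infrastructure=250_000,
                      people_per_year=300_000, transition=600_000,
                      lifetime_years=5)

rate = 1_000_000  # events/s required by the (hypothetical) trigger
for name, c in [("more hardware", more_hw), ("code transition", optimized)]:
    print(f"{name}: {events_per_s_per_dollar(rate, c):.4f} events/s/$")
# more hardware: 0.2500 events/s/$
# code transition: 0.2985 events/s/$
```

With manpower held constant, the option with the higher events/s/$ is preferable even though it carries a transition cost; with other numbers the conclusion can flip, which is why each cost has to be known.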

Measuring our software
Many parameters can be measured:
- overall timing
- memory usage and cache efficiency
- CPU efficiency (cycles per instruction, vectorization level)
- level of parallelism, usage of the different cores
- I/O limitations, if any
For each of them, you need both overall data and a detailed split per code unit:
- per item, per core, and full-machine measurement
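A minimal sketch of "overall data plus a split per code unit", assuming plain wall-clock timing is enough: a hypothetical `UnitTimer` accumulates time and call counts per named unit, so each unit's share of the total and its cost per item can be reported. `decode` and `reconstruct` are made-up stand-ins for real processing steps; per-core and full-machine measurements would need additional tooling.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class UnitTimer:
    """Accumulates wall-clock time and call counts per named code unit."""

    def __init__(self):
        self.seconds = defaultdict(float)
        self.calls = defaultdict(int)

    @contextmanager
    def measure(self, unit: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.seconds[unit] += time.perf_counter() - start
            self.calls[unit] += 1

    def report(self, total: float) -> None:
        """Print each unit's share of the overall timing, largest first."""
        for unit, sec in sorted(self.seconds.items(), key=lambda kv: -kv[1]):
            per_item = sec / self.calls[unit]
            print(f"{unit:<12} {sec:8.4f} s  {100 * sec / total:5.1f} %  "
                  f"{1e6 * per_item:8.1f} us/item")


def decode(n):          # hypothetical stand-in for a processing step
    return sum(range(n))


def reconstruct(n):     # hypothetical stand-in for a processing step
    return sorted(range(n, 0, -1))


timer = UnitTimer()
t0 = time.perf_counter()
for event in range(200):                 # one iteration per item (event)
    with timer.measure("decode"):
        decode(2_000)
    with timer.measure("reconstruct"):
        reconstruct(5_000)
total = time.perf_counter() - t0         # overall timing
timer.report(total)
```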

How to measure
The counters approach
- use CPU counters to find out what happened during actual execution
- to avoid slowing down execution, only sample
The software instrumentation
- run your code in a "virtual" environment
- measure everything precisely
- at the cost of speed
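Python cannot read the CPU's hardware counters directly, so this sketch swaps in a timer-driven sampler to illustrate the statistical idea behind the counters approach: interrupt the program periodically and record what is running, instead of tracking every step. (`perf record` follows the same principle but typically samples on hardware events such as CPU cycles.) It assumes a Unix system and must run in the main thread; `hot`, `cold`, and `workload` are hypothetical.

```python
import collections
import signal


def sample_profile(func, interval=0.001):
    """Run func() while a CPU-time timer interrupts it every `interval`
    seconds; each interrupt records which Python function was executing.
    Unix-only (SIGPROF / ITIMER_PROF), main thread only."""
    samples = collections.Counter()

    def on_tick(signum, frame):
        # `frame` is the frame that was running when the signal arrived
        samples[frame.f_code.co_name] += 1

    previous = signal.signal(signal.SIGPROF, on_tick)
    signal.setitimer(signal.ITIMER_PROF, interval, interval)
    try:
        func()
    finally:
        signal.setitimer(signal.ITIMER_PROF, 0)   # disarm the timer
        signal.signal(signal.SIGPROF, previous)
    return samples


def hot():
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total


def cold():
    total = 0
    for i in range(100_000):
        total += i
    return total


def workload():
    hot()
    cold()


samples = sample_profile(workload)
n = sum(samples.values())
for name, count in samples.most_common():
    print(f"{name:<10} {count:5d} samples  ({100 * count / n:.0f} %)")
```

The program runs at nearly full speed, but the result is only a statistical estimate: the sample counts change from run to run.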

Counters approach in practice
- gives precise timing of a realistic execution on your CPU
  - using real cache prediction, actual vectorization, ...
  - using real CPU behavior (e.g. downclocking when overheating...)
- allows measuring CPI (Cycles Per Instruction) and low-level behavior in general (caching, pipelining)
- but data is only statistical, so you need sufficient statistics
- also not always reproducible, so hard to compare
  - e.g. first test on a cold processor, second on a warm one
Main tools available: perf and variants, Intel VTune
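Two consequences of the counters approach can be sketched in a few lines, as an illustration only. CPI is simply the ratio of two counter values (for instance the `cycles` and `instructions` events reported by `perf stat -e cycles,instructions ./prog`). And because a single timing is statistical and depends on whether the hardware is cold or warm, measurements are better repeated, with warm-up runs discarded and the spread reported. The repeat counts and the counter values below are arbitrary assumptions.

```python
import statistics
import time


def cpi(cycles: int, instructions: int) -> float:
    """Cycles Per Instruction from two hardware counter readings."""
    return cycles / instructions


def repeated_timing(func, repeats=15, warmup=3):
    """Time func() several times, drop the warm-up runs (cold caches,
    clock ramp-up), and summarize the remaining runs statistically."""
    times = []
    for i in range(warmup + repeats):
        start = time.perf_counter()
        func()
        elapsed = time.perf_counter() - start
        if i >= warmup:                  # keep only "warm" measurements
            times.append(elapsed)
    median = statistics.median(times)
    rel_spread = statistics.stdev(times) / statistics.mean(times)
    return median, rel_spread, times


def workload():                          # hypothetical code under test
    return sorted(range(200_000, 0, -1))


median, rel_spread, runs = repeated_timing(workload)
print(f"median {1e3 * median:.2f} ms over {len(runs)} runs, "
      f"relative spread {100 * rel_spread:.1f} %")
# Hypothetical counter values, as they might be read from perf stat:
print(f"CPI = {cpi(3_000_000_000, 2_000_000_000):.2f}")  # CPI = 1.50
```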

Software instrumentation in practice
- gives precise measurements of where you spend instructions
  - including many details
- reproducible, so you can compare results
- but not always realistic
  - no real timing, only instruction counts
  - memory caching is only simulated, often far from the real case
  - no clue on low-level efficiency (CPI in particular)
  - and no clue on hardware / OS behavior
Main tool available: valgrind family
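As an analogy to valgrind-style instrumentation (not valgrind itself, which works on machine instructions), this sketch uses Python's `sys.settrace` to count every executed source line per function. The counts are exact and identical from run to run, which is what makes instrumented results comparable, while the tracing overhead illustrates the "cost of speed". `work` and `helper` are hypothetical.

```python
import sys
from collections import Counter


def count_lines(func, *args):
    """Instrumented run of func(*args): count every executed source line,
    per function. Exact and repeatable, but far slower than a normal run."""
    counts = Counter()

    def tracer(frame, event, arg):
        if event == "line":
            counts[frame.f_code.co_name] += 1
        return tracer              # keep receiving events for this frame

    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(None)         # stop instrumenting
    return counts


def helper(x):
    return x % 7


def work(n):
    total = 0
    for i in range(n):
        total += helper(i)
    return total


first = count_lines(work, 1_000)
second = count_lines(work, 1_000)
print(dict(first))
print("reproducible:", first == second)
```

As with valgrind, these counts say nothing about real timing, cache misses, or CPI.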


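Documents you may be interested in

Chinese Academy of Sciences: CERN Thematic School of Computing, "T-CSC Data Storage" course materials (lecture notes): Preserving data-pres
Chinese Academy of Sciences: CERN Thematic School of Computing, "T-CSC Data Storage" course materials (lecture notes): Many ways to store data-booklet
Chinese Academy of Sciences: CERN Thematic School of Computing, "T-CSC Data Storage" course materials (lecture notes): Many ways to store data-pres
Chinese Academy of Sciences: CERN Thematic School of Computing, "T-CSC Data Storage" course materials (lecture notes): Structuring data for efficient I/O-booklet
Chinese Academy of Sciences: CERN Thematic School of Computing, "T-CSC Data Storage" course materials (lecture notes): Structuring data for efficient I/O-pres
Chinese Academy of Sciences: CERN Thematic School of Computing, "T-CSC Data Storage" course materials (lecture notes): Optimizing existing large codebase-booklet
Chinese Academy of Sciences: CERN Thematic School of Computing, "T-CSC Data Storage" course materials (lecture notes): Optimizing existing large codebase-pres
Chinese Academy of Sciences: CERN Thematic School of Computing, "T-CSC Data Storage" course materials (lecture notes): Modern programming languages for HEP-booklet
Chinese Academy of Sciences: CERN Thematic School of Computing, "T-CSC Data Storage" course materials (lecture notes): Modern programming languages for HEP-pres
Chinese Academy of Sciences: CERN Thematic School of Computing, "T-CSC Data Storage" course materials (lecture notes): Practical vectorization-booklet
Chinese Academy of Sciences: CERN Thematic School of Computing, "T-CSC Data Storage" course materials (lecture notes): Practical vectorization-pres
Chinese Academy of Sciences: CERN Thematic School of Computing, "T-CSC Data Storage" course materials (lecture notes): Writing Parallel software (booklet)
Chinese Academy of Sciences: CERN Thematic School of Computing, "T-CSC Data Storage" course materials (lecture notes): Preserving data-booklet
Chinese Academy of Sciences: CERN Thematic School of Computing, "T-CSC Data Storage" course materials (lecture notes): Key ingredients to achieve effective I/O-pres
Chinese Academy of Sciences: CERN Thematic School of Computing, "T-CSC Data Storage" course materials (lecture notes): Key ingredients to achieve effective I/O-booklet
Chinese Academy of Sciences: CERN Thematic School of Computing, "T-CSC Data Storage" course materials (lecture notes): Data storage and preservation-pres
Chinese Academy of Sciences: CERN Thematic School of Computing, "T-CSC Data Storage" course materials (lecture notes): Data storage and preservation-booklet
Xidian University: "Digital Image Processing" course materials (lecture plans): Chapter 1, Introduction (Xu Luping)
Xidian University: "Digital Image Processing" course materials (lecture plans): Chapter 2, Fundamentals of Digital Image Processing
Xidian University: "Digital Image Processing" course materials (lecture plans): Chapter 3, Image Transforms
Xidian University: "Digital Image Processing" course materials (lecture plans): Chapter 4, Image Enhancement
Xidian University: "Digital Image Processing" course materials (lecture plans): Chapter 5, Image Restoration
Xidian University: "Digital Image Processing" course materials (lecture plans): Chapter 6, Image Compression Coding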
