Chinese Academy of Sciences: CERN Thematic School of Computing, "T-CSC Data Storage" course materials (lecture notes): Structuring data for efficient I/O - booklet

1 Data format
    Row vs Column
2 Compressing data
    Compression algorithms
    Efficiency and use cases
3 Data addressing
    Hierarchical namespaces
    Limitations
    Flat namespaces
4 Stateful interfaces
    POSIX
    Limitations
    Stateless interfaces
5 Conclusion

Structuring data for efficient I/O (S. Ponce - CERN)

Data structure by example - row storage

Naive structure: you arrange your sensors in a sequential order according to the detector geometry. Each minute, you create a new "row" of data, with 10K floats representing the temperatures given by the sensors, in that order.

Time (min) | Sensor 1 | Sensor 2 | ... | Sensor c
0          | a0       | b0       | ... | z0
1          | a1       | b1       | ... | z1
...        | ...      | ...      | ... | ...
n          | an       | bn       | ... | zn

File content: a0 b0 ... z0 a1 b1 ... z1 ... an bn ... zn
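
The row layout can be sketched in Python as follows. This is a minimal illustration, not code from the course, and it rests on these assumptions:
- each temperature is a native 32-bit float (`array` typecode `'f'`, 4 bytes on common platforms);
- an in-memory `io.BytesIO` buffer stands in for the real file;
- `write_rows` and `value_offset` are hypothetical helper names.

```python
import io
from array import array

ITEM = array('f').itemsize  # bytes per stored temperature (4 for a C float on common platforms)


def write_rows(out, rows):
    """Append each row (one temperature per sensor, in sensor order) as consecutive floats."""
    for row in rows:
        out.write(array('f', row).tobytes())


def value_offset(t, sensor, n_sensors):
    """Byte offset of sensor `sensor` at minute `t` in a row-ordered file."""
    return (t * n_sensors + sensor) * ITEM


# Usage: 3 minutes of data for 4 sensors (hypothetical values)
buf = io.BytesIO()
write_rows(buf, [[20.0, 21.0, 22.0, 23.0],
                 [20.5, 21.5, 22.5, 23.5],
                 [30.0, 31.0, 32.0, 33.0]])
buf.seek(value_offset(1, 2, 4))
print(array('f', buf.read(ITEM))[0])  # 22.5
```

All sensors of one minute are adjacent, so the position of any value follows from a single multiplication.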

Data structure by example - access

Find out overheated devices at a given time:
- find the offset of that time in the file (minute t starts at byte t × 40K, since each row holds 10K 4-byte floats)
- read 10K numbers
- apply a simple filter

Cost: one seek and one read of 10K floats. This is efficient!
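
A minimal sketch of this access pattern, with these assumptions:
- native 4-byte floats;
- an in-memory `io.BytesIO` buffer in place of the file;
- hypothetical helper names and a hypothetical threshold.

Locating minute `t` costs one seek to `t * n_sensors * 4` bytes, and the whole row comes back in a single read.

```python
import io
from array import array

ITEM = array('f').itemsize  # bytes per stored temperature


def make_row_file(rows):
    """Build an in-memory row-ordered file: row t holds all sensors at minute t."""
    buf = io.BytesIO()
    for row in rows:
        buf.write(array('f', row).tobytes())
    buf.seek(0)
    return buf


def overheated_at(f, t, n_sensors, threshold):
    """Return the indices of sensors hotter than `threshold` at minute `t`."""
    f.seek(t * n_sensors * ITEM)             # one seek to the start of row t
    row = array('f')
    row.frombytes(f.read(n_sensors * ITEM))  # one contiguous read of the whole row
    return [i for i, temp in enumerate(row) if temp > threshold]  # simple filter


# Usage: 2 minutes of data for 3 sensors, hypothetical threshold of 50 degrees
f = make_row_file([[20.0, 55.0, 21.0],
                   [60.0, 22.0, 70.0]])
print(overheated_at(f, 1, 3, 50.0))  # [0, 2]
```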

Data structure by example - access (2)

Graph the temperature evolution of a given device:
- read 43.2K numbers from the file (one value per minute; 43.2K minutes is 30 days), one every 40K bytes
- graph them

Cost:
- 43.2K reads of 4 bytes and 43.2K seeks!
- on top of that, the typical block size in a filesystem is 8K: every 4-byte read pulls in a whole block, i.e. 8K out of every 40K-byte row, so you will effectively read about 20% of the file!
- actually, reading the whole file sequentially will be more efficient

Here the structure of our data is a killer.
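
The cost of this strided access can be made concrete with a small sketch. It assumes native 4-byte floats, uses an in-memory buffer in place of the file, and its helper names are hypothetical.
- `series_row_storage` performs one seek and one 4-byte read per minute.
- `fraction_of_file_read` reproduces the rough estimate above: when a row is larger than a block, each tiny read still pulls in one whole block per row.

The estimate deliberately ignores block alignment, caching and read-ahead.

```python
import io
from array import array

ITEM = array('f').itemsize  # bytes per stored temperature


def make_row_file(rows):
    """Build an in-memory row-ordered file."""
    buf = io.BytesIO()
    for row in rows:
        buf.write(array('f', row).tobytes())
    buf.seek(0)
    return buf


def series_row_storage(f, sensor, n_sensors, n_rows):
    """Read one sensor's history from a row-ordered file; return (values, seeks issued)."""
    values, seeks = array('f'), 0
    for t in range(n_rows):
        f.seek((t * n_sensors + sensor) * ITEM)  # jump over the other sensors of row t
        seeks += 1
        values.frombytes(f.read(ITEM))           # read a single 4-byte value
    return list(values), seeks


def fraction_of_file_read(row_bytes, block_bytes):
    """Rough share of the file pulled from disk: one whole block per row touched."""
    return min(block_bytes, row_bytes) / row_bytes


# Numbers from the slide: 10K floats per row (40K bytes), 8K filesystem blocks
print(fraction_of_file_read(10_000 * 4, 8 * 1024))  # 0.2048, i.e. about 20%
```

The number of seeks grows with the number of rows, not with the amount of useful data.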

Column storage

Time (min) | Sensor 1 | Sensor 2 | ... | Sensor c
0          | a0       | b0       | ... | z0
1          | a1       | b1       | ... | z1
...        | ...      | ...      | ... | ...
n          | an       | bn       | ... | zn

File content: a0 a1 ... an b0 b1 ... bn ... z0 z1 ... zn

Back to efficient reads: one seek, then one contiguous read of a sensor's whole history.
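
A sketch of the column layout, assuming native 4-byte floats and an in-memory buffer in place of the file (helper names are hypothetical). All values of one sensor are contiguous, so its whole history costs one seek to `sensor * n_rows * 4` bytes followed by one read.

```python
import io
from array import array

ITEM = array('f').itemsize  # bytes per stored temperature


def make_column_file(rows):
    """Build an in-memory column-ordered file: all minutes of sensor 0, then sensor 1, ..."""
    n_sensors = len(rows[0])
    buf = io.BytesIO()
    for i in range(n_sensors):
        buf.write(array('f', (row[i] for row in rows)).tobytes())
    buf.seek(0)
    return buf


def series_column_storage(f, sensor, n_rows):
    """Read one sensor's full history with a single seek and a single contiguous read."""
    f.seek(sensor * n_rows * ITEM)
    values = array('f')
    values.frombytes(f.read(n_rows * ITEM))
    return list(values)


# Usage: 4 minutes x 3 sensors, given as rows
rows = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [10.0, 11.0, 12.0]]
print(series_column_storage(make_column_file(rows), 1, len(rows)))  # [2.0, 5.0, 8.0, 11.0]
```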

Row vs column storage

Definition
- Row storage respects the internal structure of the data and puts the different items one after the other in a sequence
- Column storage breaks the internal structure of the data to collate similar pieces

Why use column storage?
- to optimize I/O in general and avoid scattered reads
- to optimize data compression
- to optimize parallelization of processing

Drawbacks of column storage
- a column-organized file cannot be updated easily (adding one time step means inserting one value into every column)
- column storage is usually created from row storage in a post-processing phase
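
The post-processing step can be sketched as a single sequential pass over the row file that re-emits every sensor's values contiguously. This is an illustration only, assuming native 4-byte floats and in-memory buffers; `rows_to_columns` is a hypothetical name. A real converter for large files would work in chunks rather than loading everything into memory.

```python
import io
from array import array


def rows_to_columns(row_file, n_sensors):
    """Convert a row-ordered float file into a column-ordered one; return (file, n_rows)."""
    data = array('f')
    data.frombytes(row_file.read())              # one sequential read of the whole row file
    n_rows = len(data) // n_sensors
    out = io.BytesIO()
    for i in range(n_sensors):
        out.write(data[i::n_sensors].tobytes())  # every n_sensors-th value = one sensor's history
    out.seek(0)
    return out, n_rows


# Usage: 2 minutes x 3 sensors, stored row by row
row_file = io.BytesIO(array('f', [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]).tobytes())
col_file, n_rows = rows_to_columns(row_file, 3)
print(list(array('f', col_file.read())), n_rows)  # [1.0, 4.0, 2.0, 5.0, 3.0, 6.0] 2
```

The sketch also suggests why in-place updates are awkward. Appending one minute to the column file touches every column, not just the end of the file, so regenerating the file from the row data is often the simpler option.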
