Chinese Academy of Sciences: thematic CERN School of Computing, "T-CSC Data Storage" course teaching resources (lecture notes): Data storage and preservation (booklet)

1 Storage devices
    Existing devices
2 Parallelizing files' storage
    Striping
    Introduction to Map/Reduce
3 Risks of data loss and corruption
4 Data consistency
    Checksums
    Practical usage
5 Data safety
    Redundancy
    Parity
    Erasure coding
6 Conclusion

Data storage and preservation (S. Ponce - CERN)

Risks for my data - Hardware

For disks
- probability of losing a disk per year: few %, up to 10%
  - with 60K disks, it's around 10 per day and all files are lost
- one unrecoverable bit error in 10^14 bits read/written
  - for 10 GB files, that's one file corrupted per 1000 files written

For tapes
- probability of losing a tape per year: 10^-4, and you recover most of the data on it
  - net result is 10^-7 file loss per year
- one unrecoverable bit error in 10^19 bits read/written
  - for 10 GB files, that's one file corrupted per 100M files written
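The orders of magnitude quoted for hardware risks can be checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope check, not part of the original slides: the function names are hypothetical, the 5% annual failure rate is an assumed stand-in for "few %", and 10 GB is taken as 10^10 bytes.

```python
BITS_PER_BYTE = 8


def disk_failures_per_day(n_disks: int, annual_failure_rate: float) -> float:
    """Expected number of disks lost per day in a fleet of n_disks."""
    return n_disks * annual_failure_rate / 365


def files_per_bit_error(file_size_bytes: float, bits_per_error: float) -> float:
    """Average number of files written for each unrecoverable bit error."""
    return bits_per_error / (file_size_bytes * BITS_PER_BYTE)


FILE_SIZE = 10e9  # 10 GB, assuming decimal gigabytes

# 5% per year is an assumption chosen inside the quoted "few %, up to 10%" range.
print(f"disk failures per day: {disk_failures_per_day(60_000, 0.05):.1f}")
print(f"disk: one corrupted file per {files_per_bit_error(FILE_SIZE, 1e14):,.0f} files")
print(f"tape: one corrupted file per {files_per_bit_error(FILE_SIZE, 1e19):,.0f} files")
```

Under these assumptions the script reports about 8.2 disk failures per day, one corrupted file per 1,250 files on disk and one per 125,000,000 files on tape, which matches the rounded figures of "around 10 per day", "per 1000 files" and "per 100M files".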

2 Parallelizing files' storage
    Striping
    Introduction to Map/Reduce

Why parallelize storage?

To work around limitations:
- individual device speed (think disk)
  - a file is typically stored on a single device
- network cards' speed
  - 1 Gbit networks are still present
  - network congestion on a node reduces the bandwidth per stream
- core network throughput
  - switches and routers are expensive
  - machines may have less throughput than their card(s) allow
- hot data congestion
  - and the black hole it can generate, as slower transfers allow more transfers to accumulate

Parallelizing through striping

Main idea
- use several devices in parallel for a single stream
- moving the limitations up by summing performances

Basic striping: divide and conquer for storage
- split data into chunks (aka stripes) on different devices
- access in parallel

Disk 1     Disk 2     Disk 3     Disk 4     Disk 5
File A.1   File A.2   File A.3   File A.4   File B.1
File B.2   File B.3   File C.1   File C.2   File C.3
File C.4   File C.5   File C.6
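The chunk placement in the striping figure is consistent with a simple round-robin assignment that continues from one file to the next. The sketch below illustrates that reading; the function name `stripe_files` and the idea of describing a file only by its chunk count are assumptions made for the example, not something the slides specify.

```python
from collections import defaultdict


def stripe_files(files: dict[str, int], n_disks: int) -> dict[int, list[str]]:
    """Place chunks on disks round-robin, continuing across files.

    files maps a file name to its number of chunks (stripes).
    Returns a mapping from 1-based disk number to the chunk labels it holds.
    """
    layout = defaultdict(list)
    slot = 0  # running chunk counter shared by all files
    for name, n_chunks in files.items():
        for i in range(1, n_chunks + 1):
            disk = slot % n_disks + 1
            layout[disk].append(f"{name}.{i}")
            slot += 1
    return dict(layout)


# Files A, B and C with 4, 3 and 6 chunks, as in the figure.
layout = stripe_files({"A": 4, "B": 3, "C": 6}, n_disks=5)
for disk in sorted(layout):
    print(f"Disk {disk}: {', '.join(layout[disk])}")
```

With these inputs, Disk 1 receives A.1, B.2 and C.4, and Disk 5 receives B.1 and C.3, so reading file C touches all five disks and its chunks can be fetched in parallel.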

RAID 0

RAID stands for "Redundant Array of Inexpensive Disks":
- a set of configurations that employ the techniques of striping, mirroring, or parity to create large reliable data stores from multiple general-purpose computer hard disk drives (Wikipedia)

Useful RAID levels
- RAID 0: striping
- RAID 1: mirroring
- RAID 5: parity
- RAID 6: double parity
See the Data Safety part.

RAID can be implemented in hardware or software.
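To make the trade-off between the four levels concrete, the sketch below computes usable capacity and the number of disk failures each level tolerates. It relies on the usual textbook definitions rather than anything stated in the slides, assumes identical disks, and takes RAID 1 to mean that every disk holds a full copy; the function name `raid_summary` is hypothetical.

```python
def raid_summary(level: int, n_disks: int, disk_tb: float) -> tuple[float, int]:
    """Return (usable capacity in TB, number of disk failures tolerated).

    Covers only RAID 0, 1, 5 and 6 with n_disks identical disks of disk_tb TB.
    """
    minimum_disks = {0: 2, 1: 2, 5: 3, 6: 4}
    if level not in minimum_disks:
        raise ValueError(f"unsupported RAID level: {level}")
    if n_disks < minimum_disks[level]:
        raise ValueError(f"RAID {level} needs at least {minimum_disks[level]} disks")

    if level == 0:  # striping only: full capacity, no redundancy
        return n_disks * disk_tb, 0
    if level == 1:  # mirroring: one disk of capacity, survives all but one failure
        return disk_tb, n_disks - 1
    if level == 5:  # single parity: one disk's worth of space goes to parity
        return (n_disks - 1) * disk_tb, 1
    return (n_disks - 2) * disk_tb, 2  # RAID 6: double parity


for level in (0, 1, 5, 6):
    capacity, tolerated = raid_summary(level, n_disks=4, disk_tb=2.0)
    print(f"RAID {level}: {capacity:.0f} TB usable, tolerates {tolerated} failure(s)")
```

For four 2 TB disks this gives 8 TB with no protection for RAID 0, 2 TB for RAID 1, 6 TB for RAID 5 and 4 TB for RAID 6, which shows how much capacity each level trades for safety.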
