Slide 22: Between extremes...
- There are intermediate solutions between:
  - File systems (no database)
  - Storage systems with a database lookup placed at the "file" level
- Examples of high-performance scalable solutions:
  - AFS / DFS
    - Database placed at the domain / host level (same as the web)
    - Very scalable
    - But within the domain (e.g. "cern.ch"), identical to a file system, with physical files directly mapped to logical filenames
  - XROOTD / Scalla / NFS / clustered storage / cloud storage
    - Database (somehow) placed at the volume level (this is a simplified statement)
    - Similar scalability, with more flexibility in terms of data management: physical files are "calculated" from logical filenames, requiring no database lookup below the "volume" level (a sketch of this idea follows after this group of slides)
  - Federated storage

Slide 23: Agenda
- Introduction to data management
- Data Workflows in scientific computing
- Storage Models
- Data management components
  - Name Servers and databases
  - Data Access protocols
- Reliability
- Availability
- Access Control and Security
  - Cryptography
  - Authentication, Authorization, Accounting
- Scalability
  - Cloud storage
  - Block storage
  - Analytics
- Data Replication
- Data Caching
- Monitoring, Alarms
- Quota
- Summary

Slide 24: Access protocols
- File-level (POSIX) access is the starting point: Open, Stat, Read, Write, Delete, ... (a short example follows after this group of slides)
- Several extensions are "implementation specific" and cannot be mapped to POSIX calls:
  - Pre-staging data from slow storage into fast storage
  - Managing pool creation / sizes
  - Reading or changing access permissions
  - Interpretation of extended data attributes and metadata
- Some parts of the POSIX standard are not scalable (ls, chdir, ...)
- Not all storage systems implement POSIX entirely
- Various protocols offer file access: rfio (Castor, DPM, ...), dcap (dCache), xroot (Scalla, Castor, DPM, EOS, ...), NFS, AFS, S3, ...
- Various protocols handle bulk data movement over wide-area networks: GridFTP, ...

Slide 25: Dataflows across sites
- Storage in scientific computing is distributed across multiple data centres
- Data flows from the experiments to all data centres where there is CPU available to process the data
- [Diagram: Tier-0 (scientific experiments) feeding the Tier-1 centres, which in turn feed the Tier-2 centres]
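To make the slide-22 remark about physical locations being "calculated" from logical filenames concrete, here is a minimal sketch that maps a name to a server by hashing, so no per-file database lookup is needed below the volume level. The server names and the path layout are invented for the example; real systems such as Scalla/XRootD implement placement and redirection with their own logic.

```python
import hashlib

# Hypothetical disk servers forming one storage "volume"; a real deployment
# would discover these from its own configuration or cluster manager.
SERVERS = ["fst01.example.org", "fst02.example.org", "fst03.example.org"]

def physical_location(logical_name: str) -> str:
    """Derive the physical location from the logical filename by hashing,
    so no per-file database lookup is needed below the volume level."""
    digest = hashlib.sha1(logical_name.encode()).hexdigest()
    server = SERVERS[int(digest, 16) % len(SERVERS)]  # deterministic server choice
    return f"root://{server}//data/{digest[:2]}/{logical_name.lstrip('/')}"

# Every client computes the same location for the same logical name.
print(physical_location("/store/run2019/event_0001.root"))
```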
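As a reminder of the file-level (POSIX) starting point listed on slide 24, the basic calls map directly onto the operating-system interface; the implementation-specific extensions (pre-staging, pool management, ...) have no such mapping. A minimal sketch, with an arbitrarily chosen scratch path:

```python
import os

# Scratch file path chosen arbitrarily for the demonstration.
path = "/tmp/csc_posix_demo.dat"

fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)  # Open (create for writing)
os.write(fd, b"event data")                           # Write
os.close(fd)

info = os.stat(path)                                  # Stat
print("size in bytes:", info.st_size)

fd = os.open(path, os.O_RDONLY)                       # Open (read-only)
print(os.read(fd, info.st_size))                      # Read
os.close(fd)

os.unlink(path)                                       # Delete
```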
Slide 26: Efficiency
- A key parameter in distributed scientific computing is efficiency
- High efficiency requires the CPUs to be co-located with the data to analyze, using the network
- Whenever a site has...
  - idle CPUs (because no data is available to process), or
  - an excess of data (because there is no CPU left for analysis), or
  - idle or saturated networks
- ... the efficiency drops

Slide 27: Data distribution
- Analysis made with high efficiency requires the data to be pre-placed where the CPUs are available
- [Diagram: data flowing from Tier-0 (scientific experiments) to the Tier-1 and Tier-2 centres]

Slide 28: Data distribution
- Analysis made with high efficiency requires the data to be pre-placed where the CPUs are available...
- ... or peer-to-peer data transfer must be allowed
  - This lets sites with an excess of CPU schedule the pre-fetching of data that is missing locally, or access it remotely if the analysis application has been designed to cope with high latency
- [Diagram: peer-to-peer transfers among Tier-1 and Tier-2 centres, in addition to the Tier-0 flows]

Slide 29: Data distribution
- Both approaches coexist in High Energy Physics:
  - Data is pre-placed
    - This is the role of the experiments, which plan the analysis
  - Data is globally accessible and federated in a global namespace
    - The middleware always attempts to use the local data, and falls back to an access protocol that redirects to the nearest remote copy when the local data is not available (a sketch follows after this group of slides)
    - All middleware and jobs are designed to minimize the impact of the additional latency that the redirection requires
- Using access protocols that allow global data federation is essential: http, xroot
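A minimal sketch of the "use the local copy, otherwise redirect" behaviour described on slide 29. The mount point and redirector hostname are invented, and the sketch only builds the access URL: actually opening a root:// or http:// URL from a job requires the corresponding client library.

```python
import os

# Hypothetical local mount point and federation redirector; the actual names
# depend on the site and on the experiment's data federation.
LOCAL_PREFIX = "/data/localsite"
REDIRECTOR = "root://federation.example.org/"

def access_url(logical_name: str) -> str:
    """Prefer the local replica; otherwise point the job at the federation
    redirector, which forwards it to the nearest site holding a copy."""
    local_path = LOCAL_PREFIX + logical_name
    if os.path.exists(local_path):
        return local_path                 # local copy: no redirection latency
    return REDIRECTOR + logical_name      # remote copy, reached via redirection

print(access_url("/store/run2019/event_0001.root"))
```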
Slide 30: Agenda
- (Same agenda as slide 23; the course now moves on to Reliability)

Slide 31: Storage Reliability
- Reliability is related to the probability of losing data
  - Def: "the probability that a storage device will perform an arbitrarily large number of I/O operations without data loss during a specified period of time"
- Reliability of the "service" depends on the environment (energy, cooling, people, ...)
  - This is not discussed further here
- Reliability of the "service" starts from the reliability of the underlying hardware
  - Example of disk servers with simple disks: reliability of service = reliability of disks
- But data management solutions can increase the reliability beyond that of the hardware, at the expense of performance and/or additional hardware / software (a back-of-the-envelope sketch follows after this group of slides):
  - Disk Mirroring
  - Redundant Array of Inexpensive Disks (RAID)

Slide 32: Hardware reliability
- Do we need tapes?
- Tapes have a bad reputation in some use cases:
  - Slow in random access mode: high latency in the mounting process and when seeking data (F-FWD, REW)
  - Inefficient for small files (in some cases)
  - Comparable cost per (peta)byte to hard disks
- Tapes also have some advantages:
  - Fast in sequential access mode: > 2x faster than disk, with physical read-after-write verification
  - Several orders of magnitude more reliable than disks: a few hundred GB lost per year on an 80 PB tape repository, versus a few hundred TB lost per year on a 50 PB disk repository
  - No power required to preserve the data
  - Less physical volume required per (peta)byte
  - The inefficiency for small files has been resolved by recent developments
  - Nobody can delete hundreds of PB in minutes
- Bottom line: if not used for random access, tapes have a clear role in the architecture

Slide 33: Reminder: types of RAID
- RAID 0: disk striping
- RAID 1: disk mirroring
- RAID 5: parity information is distributed across all disks
- RAID 6: uses Reed–Solomon error correction, allowing the loss of 2 disks in the array without data loss
- Reference: http://en.wikipedia.org/wiki/RAID
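To put a rough number on the slide-31 statement that mirroring buys reliability at the cost of extra hardware, a back-of-the-envelope sketch. The failure probability is invented and the model deliberately crude (independent failures, no replacement within the year); real reliability estimates account for repair times and correlated failures.

```python
# Back-of-the-envelope comparison of a single disk and a RAID1 mirror.
# Assumptions (all invented for the illustration): every disk fails
# independently within one year with probability p, and a failed disk is,
# pessimistically, never replaced during that year.
p = 0.03                      # assumed annual failure probability of one disk

single_disk_loss = p          # the only copy fails -> data lost
mirror_loss = p * p           # both mirrored disks fail -> data lost

print(f"single disk : {single_disk_loss:.4%} probability of data loss per year")
print(f"RAID1 mirror: {mirror_loss:.4%} probability of data loss per year")
```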
Slides 34-37: Reminder: types of RAID (continued)
- The same four RAID levels, illustrated slide by slide with layout diagrams:
  - RAID 0: disk striping
  - RAID 1: disk mirroring
  - RAID 5: parity information is distributed across all disks (a toy parity sketch follows after these slides)
  - RAID 6: uses Reed–Solomon error correction, allowing the loss of 2 disks in the array without data loss
- Reference: http://en.wikipedia.org/wiki/RAID
- [Diagrams: block layouts of RAID 0, RAID 1, RAID 4/5 and RAID 6, one per slide]
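The distributed-parity idea behind RAID 5 can be illustrated with plain XOR: the parity strip is the XOR of the data strips, so any single lost strip is recoverable from the survivors. A toy sketch with made-up strip contents (RAID 6 extends the idea with Reed–Solomon coding so that two simultaneous losses can be tolerated):

```python
# RAID 5 in miniature: the parity strip is the XOR of the data strips, so any
# single lost strip can be rebuilt from the surviving strips plus the parity.
# The 4-byte strip contents are made up for the example.
def xor_strips(strips):
    out = bytearray(len(strips[0]))
    for strip in strips:
        for i, byte in enumerate(strip):
            out[i] ^= byte
    return bytes(out)

data = [b"AAAA", b"BBBB", b"CCCC"]   # data strips on three disks
parity = xor_strips(data)            # parity strip on a fourth disk

# The disk holding data[1] fails: rebuild its strip from the survivors.
rebuilt = xor_strips([data[0], data[2], parity])
assert rebuilt == data[1]
print("recovered strip:", rebuilt)
```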
Slide 38: Understanding error correction
- A line is defined by 2 numbers: a, b
  - (a, b) is the information
  - y = ax + b
- Instead of transmitting a and b, transmit some points on the line at known abscissae. 2 points define a line; if more points are transmitted, they should all be aligned.
- [Diagram: the same line drawn through 2, 3 and 4 transmitted points]

Slide 39: If we lose some information ...
- If we transmit more than 2 points, we can lose any point, provided the total number of points left is >= 2
- [Diagram: 1 point left instead of 2 means the information is lost; 2 points left instead of 3, or 2-3 points left instead of 4, still recover the line]

Slide 40: If we have an error ...
- If there is an error, I can detect it if I have transmitted more than 2 points, and correct it if I have transmitted more than 3 points
- [Diagram: with 2 points the information is lost and you do not notice; with 3 points the error is detected (information lost, but you notice); with 4 points the error is corrected and the information is recovered]

Slide 41: If you have checksumming on data ...
- You can detect errors by verifying the consistency of the data with the respective checksums, so errors can be detected independently of the redundant points ...
- ... and all the redundancy can be used for error correction
- [Diagram: with checksums flagging the bad points, 2 error corrections become possible and the information is recovered]
- A worked sketch of this point-on-a-line scheme follows below.
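A worked version of the point-on-a-line picture from slides 38-40, with arbitrarily chosen values for a, b and for the corrupted point: any 2 surviving points recover the information, and with 4 transmitted points a single wrong value can be corrected by keeping the line on which most received points lie.

```python
from itertools import combinations

# The information is the pair (a, b) of the line y = a*x + b; it is transmitted
# as points at known abscissae. The values below are arbitrary.
a, b = 3, 7
xs = [0, 1, 2, 3]
points = [(x, a * x + b) for x in xs]            # 4 transmitted points

def line_through(p, q):
    """Slope and intercept of the line through two points."""
    (x1, y1), (x2, y2) = p, q
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1

# Erasure: any 2 surviving points are enough to recover (a, b).
print(line_through(points[0], points[3]))        # -> (3.0, 7.0)

# Error: one received value is wrong. With 4 points we can still correct it by
# keeping the candidate line on which most of the received points actually lie.
received = list(points)
received[2] = (2, 99)                            # corrupted point

def best_line(pts):
    candidates = [line_through(p, q) for p, q in combinations(pts, 2)]
    return max(candidates,
               key=lambda ln: sum(abs(ln[0] * x + ln[1] - y) < 1e-9 for x, y in pts))

print(best_line(received))                       # -> (3.0, 7.0) despite the error
```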