Slide 22: Between extremes...
- There are intermediate solutions between:
  - File systems (no database)
  - Storage systems with a database lookup placed at the "file" level
- Examples of high-performance scalable solutions:
  - AFS / DFS
    - Database placed at the domain / host level (same as the web)
    - Very scalable
    - But within the domain (e.g. "cern.ch"), identical to a file system, with physical files directly mapped to logical filenames
  - XROOTD / Scalla / NFS / clustered storage / cloud storage
    - Database (somehow) placed at the volume level (this is a simplified statement)
    - Similar scalability, with more flexibility in terms of data management: physical files are "calculated" from logical filenames, requiring no database lookup below the "volume" level (a sketch of this idea follows after this group of slides)
  - Federated storage

Slide 23: Agenda
- Introduction to data management
- Data Workflows in scientific computing
- Storage Models
- Data management components
  - Name Servers and databases
  - Data Access protocols
- Reliability
- Availability
- Access Control and Security
  - Cryptography
  - Authentication, Authorization, Accounting
- Scalability
  - Cloud storage
  - Block storage
  - Analytics
- Data Replication
- Data Caching
- Monitoring, Alarms
- Quota
- Summary

Slide 24: Access protocols
- File-level (POSIX) access is the starting point: Open, Stat, Read, Write, Delete, ... (a short example follows after this group of slides)
- Several extensions are "implementation specific" and cannot be mapped to POSIX calls:
  - Pre-staging data from slow storage into fast storage
  - Managing pool creation / sizes
  - Reading or changing access permissions
  - Interpretation of extended data attributes and metadata
- Some parts of the POSIX standard are not scalable (ls, chdir, ...)
- Not all storage systems implement POSIX entirely
- Various protocols offer file access: rfio (Castor, DPM, ...), dcap (dCache), xroot (Scalla, Castor, DPM, EOS, ...), NFS, AFS, S3, ...
- Various protocols handle bulk data movement over wide-area networks: GridFTP, ...

Slide 25: Dataflows across sites
- Storage in scientific computing is distributed across multiple data centres
- Data flows from the experiments to all data centres where there is CPU available to process the data
- [Diagram: Tier-0 (scientific experiments) feeding the Tier-1 centres, which in turn feed the Tier-2 centres]
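To make the slide-22 remark about physical locations being "calculated" from logical filenames concrete, here is a minimal sketch that maps a name to a server by hashing, so no per-file database lookup is needed below the volume level. The server names and the path layout are invented for the example; real systems such as Scalla/XRootD implement placement and redirection with their own logic.

```python
import hashlib

# Hypothetical disk servers forming one storage "volume"; a real deployment
# would discover these from its own configuration or cluster manager.
SERVERS = ["fst01.example.org", "fst02.example.org", "fst03.example.org"]

def physical_location(logical_name: str) -> str:
    """Derive the physical location from the logical filename by hashing,
    so no per-file database lookup is needed below the volume level."""
    digest = hashlib.sha1(logical_name.encode()).hexdigest()
    server = SERVERS[int(digest, 16) % len(SERVERS)]  # deterministic server choice
    return f"root://{server}//data/{digest[:2]}/{logical_name.lstrip('/')}"

# Every client computes the same location for the same logical name.
print(physical_location("/store/run2019/event_0001.root"))
```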
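As a reminder of the file-level (POSIX) starting point listed on slide 24, the basic calls map directly onto the operating-system interface; the implementation-specific extensions (pre-staging, pool management, ...) have no such mapping. A minimal sketch, with an arbitrarily chosen scratch path:

```python
import os

# Scratch file path chosen arbitrarily for the demonstration.
path = "/tmp/csc_posix_demo.dat"

fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)  # Open (create for writing)
os.write(fd, b"event data")                           # Write
os.close(fd)

info = os.stat(path)                                  # Stat
print("size in bytes:", info.st_size)

fd = os.open(path, os.O_RDONLY)                       # Open (read-only)
print(os.read(fd, info.st_size))                      # Read
os.close(fd)

os.unlink(path)                                       # Delete
```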
Slide 26: Efficiency
- A key parameter in distributed scientific computing is efficiency
- High efficiency requires the CPUs to be co-located with the data to analyze, using the network
- Whenever a site has...
  - idle CPUs (because no data is available to process), or
  - an excess of data (because there is no CPU left for analysis), or
  - idle or saturated networks
- ... the efficiency drops

Slide 27: Data distribution
- Analysis made with high efficiency requires the data to be pre-placed where the CPUs are available
- [Diagram: data flowing from Tier-0 (scientific experiments) to the Tier-1 and Tier-2 centres]

Slide 28: Data distribution
- Analysis made with high efficiency requires the data to be pre-placed where the CPUs are available...
- ... or peer-to-peer data transfer must be allowed
  - This lets sites with an excess of CPU schedule the pre-fetching of data that is missing locally, or access it remotely if the analysis application has been designed to cope with high latency
- [Diagram: peer-to-peer transfers among Tier-1 and Tier-2 centres, in addition to the Tier-0 flows]

Slide 29: Data distribution
- Both approaches coexist in High Energy Physics:
  - Data is pre-placed
    - This is the role of the experiments, which plan the analysis
  - Data is globally accessible and federated in a global namespace
    - The middleware always attempts to use the local data, and falls back to an access protocol that redirects to the nearest remote copy when the local data is not available (a sketch follows after this group of slides)
    - All middleware and jobs are designed to minimize the impact of the additional latency that the redirection requires
- Using access protocols that allow global data federation is essential: http, xroot
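A minimal sketch of the "use the local copy, otherwise redirect" behaviour described on slide 29. The mount point and redirector hostname are invented, and the sketch only builds the access URL: actually opening a root:// or http:// URL from a job requires the corresponding client library.

```python
import os

# Hypothetical local mount point and federation redirector; the actual names
# depend on the site and on the experiment's data federation.
LOCAL_PREFIX = "/data/localsite"
REDIRECTOR = "root://federation.example.org/"

def access_url(logical_name: str) -> str:
    """Prefer the local replica; otherwise point the job at the federation
    redirector, which forwards it to the nearest site holding a copy."""
    local_path = LOCAL_PREFIX + logical_name
    if os.path.exists(local_path):
        return local_path                 # local copy: no redirection latency
    return REDIRECTOR + logical_name      # remote copy, reached via redirection

print(access_url("/store/run2019/event_0001.root"))
```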
Slide 30: Agenda
- (Same agenda as slide 23; the course now moves on to Reliability)

Slide 31: Storage Reliability
- Reliability is related to the probability of losing data
  - Def: "the probability that a storage device will perform an arbitrarily large number of I/O operations without data loss during a specified period of time"
- Reliability of the "service" depends on the environment (energy, cooling, people, ...)
  - This is not discussed further here
- Reliability of the "service" starts from the reliability of the underlying hardware
  - Example of disk servers with simple disks: reliability of service = reliability of disks
- But data management solutions can increase the reliability beyond that of the hardware, at the expense of performance and/or additional hardware / software (a back-of-the-envelope sketch follows after this group of slides):
  - Disk Mirroring
  - Redundant Array of Inexpensive Disks (RAID)

Slide 32: Hardware reliability
- Do we need tapes?
- Tapes have a bad reputation in some use cases:
  - Slow in random access mode: high latency in the mounting process and when seeking data (F-FWD, REW)
  - Inefficient for small files (in some cases)
  - Comparable cost per (peta)byte to hard disks
- Tapes also have some advantages:
  - Fast in sequential access mode: > 2x faster than disk, with physical read-after-write verification
  - Several orders of magnitude more reliable than disks: a few hundred GB lost per year on an 80 PB tape repository, versus a few hundred TB lost per year on a 50 PB disk repository
  - No power required to preserve the data
  - Less physical volume required per (peta)byte
  - The inefficiency for small files has been resolved by recent developments
  - Nobody can delete hundreds of PB in minutes
- Bottom line: if not used for random access, tapes have a clear role in the architecture

Slide 33: Reminder: types of RAID
- RAID 0: disk striping
- RAID 1: disk mirroring
- RAID 5: parity information is distributed across all disks
- RAID 6: uses Reed–Solomon error correction, allowing the loss of 2 disks in the array without data loss
- Reference: http://en.wikipedia.org/wiki/RAID
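To put a rough number on the slide-31 statement that mirroring buys reliability at the cost of extra hardware, a back-of-the-envelope sketch. The failure probability is invented and the model deliberately crude (independent failures, no replacement within the year); real reliability estimates account for repair times and correlated failures.

```python
# Back-of-the-envelope comparison of a single disk and a RAID1 mirror.
# Assumptions (all invented for the illustration): every disk fails
# independently within one year with probability p, and a failed disk is,
# pessimistically, never replaced during that year.
p = 0.03                      # assumed annual failure probability of one disk

single_disk_loss = p          # the only copy fails -> data lost
mirror_loss = p * p           # both mirrored disks fail -> data lost

print(f"single disk : {single_disk_loss:.4%} probability of data loss per year")
print(f"RAID1 mirror: {mirror_loss:.4%} probability of data loss per year")
```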
Slides 34-37: Reminder: types of RAID (continued)
- The same four RAID levels, illustrated slide by slide with layout diagrams:
  - RAID 0: disk striping
  - RAID 1: disk mirroring
  - RAID 5: parity information is distributed across all disks (a toy parity sketch follows after these slides)
  - RAID 6: uses Reed–Solomon error correction, allowing the loss of 2 disks in the array without data loss
- Reference: http://en.wikipedia.org/wiki/RAID
- [Diagrams: block layouts of RAID 0, RAID 1, RAID 4/5 and RAID 6, one per slide]
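The distributed-parity idea behind RAID 5 can be illustrated with plain XOR: the parity strip is the XOR of the data strips, so any single lost strip is recoverable from the survivors. A toy sketch with made-up strip contents (RAID 6 extends the idea with Reed–Solomon coding so that two simultaneous losses can be tolerated):

```python
# RAID 5 in miniature: the parity strip is the XOR of the data strips, so any
# single lost strip can be rebuilt from the surviving strips plus the parity.
# The 4-byte strip contents are made up for the example.
def xor_strips(strips):
    out = bytearray(len(strips[0]))
    for strip in strips:
        for i, byte in enumerate(strip):
            out[i] ^= byte
    return bytes(out)

data = [b"AAAA", b"BBBB", b"CCCC"]   # data strips on three disks
parity = xor_strips(data)            # parity strip on a fourth disk

# The disk holding data[1] fails: rebuild its strip from the survivors.
rebuilt = xor_strips([data[0], data[2], parity])
assert rebuilt == data[1]
print("recovered strip:", rebuilt)
```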
Slide 38: Understanding error correction
- A line is defined by 2 numbers: a, b
  - (a, b) is the information
  - y = ax + b
- Instead of transmitting a and b, transmit some points on the line at known abscissae. 2 points define a line; if more points are transmitted, they should all be aligned.
- [Diagram: the same line drawn through 2, 3 and 4 transmitted points]

Slide 39: If we lose some information ...
- If we transmit more than 2 points, we can lose any point, provided the total number of points left is >= 2
- [Diagram: 1 point left instead of 2 means the information is lost; 2 points left instead of 3, or 2-3 points left instead of 4, still recover the line]

Slide 40: If we have an error ...
- If there is an error, I can detect it if I have transmitted more than 2 points, and correct it if I have transmitted more than 3 points
- [Diagram: with 2 points the information is lost and you do not notice; with 3 points the error is detected (information lost, but you notice); with 4 points the error is corrected and the information is recovered]

Slide 41: If you have checksumming on data ...
- You can detect errors by verifying the consistency of the data with the respective checksums, so errors can be detected independently of the redundant points ...
- ... and all the redundancy can be used for error correction
- [Diagram: with checksums flagging the bad points, 2 error corrections become possible and the information is recovered]
- A worked sketch of this point-on-a-line scheme follows below.
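A worked version of the point-on-a-line picture from slides 38-40, with arbitrarily chosen values for a, b and for the corrupted point: any 2 surviving points recover the information, and with 4 transmitted points a single wrong value can be corrected by keeping the line on which most received points lie.

```python
from itertools import combinations

# The information is the pair (a, b) of the line y = a*x + b; it is transmitted
# as points at known abscissae. The values below are arbitrary.
a, b = 3, 7
xs = [0, 1, 2, 3]
points = [(x, a * x + b) for x in xs]            # 4 transmitted points

def line_through(p, q):
    """Slope and intercept of the line through two points."""
    (x1, y1), (x2, y2) = p, q
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1

# Erasure: any 2 surviving points are enough to recover (a, b).
print(line_through(points[0], points[3]))        # -> (3.0, 7.0)

# Error: one received value is wrong. With 4 points we can still correct it by
# keeping the candidate line on which most of the received points actually lie.
received = list(points)
received[2] = (2, 99)                            # corrupted point

def best_line(pts):
    candidates = [line_through(p, q) for p, q in combinations(pts, 2)]
    return max(candidates,
               key=lambda ln: sum(abs(ln[0] * x + ln[1] - y) < 1e-9 for x, y in pts))

print(best_line(received))                       # -> (3.0, 7.0) despite the error
```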