Data storage and preservation

Sébastien Ponce
sebastien.ponce@cern.ch
CERN

Thematic CERN School of Computing 2019
Outline

1 Storage devices
   - Existing devices
2 Parallelizing files' storage
   - Striping
   - Introduction to Map/Reduce
3 Risks of data loss and corruption
4 Data consistency
   - Checksums
   - Practical usage
5 Data safety
   - Redundancy
   - Parity
   - Erasure coding
6 Conclusion
1 Storage devices

Existing devices
A variety of storage devices

Main differences
- Capacities from 1 GB to 10 TB per unit
- Prices per TB varying by a factor of up to about 300
- Very different reliability
- Very different speeds

Typical numbers in 2019

Device  Capacity/unit  Latency  Price ($/TB)  Speed     Reliability
RAM     16 GB          10 ns    7000          10 GB/s   volatile
SSD     500 GB         10 µs    200           1 GB/s    poor
HD      6 TB           3 ms     25            150 MB/s  average
Tape    20 TB          100 s    20            500 MB/s  good
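The 2019 figures above lend themselves to back-of-the-envelope arithmetic. The following is a minimal Python sketch that assumes decimal units (1 GB = 10^9 bytes) and a deliberately simple cost model: one access latency followed by a sequential transfer at the quoted speed. The `DEVICES` dictionary and the `access_time`, `full_read_time` and `price_ratio` helpers are hypothetical names introduced here for illustration only.

```python
# Hypothetical helpers; figures copied from the "Typical numbers in 2019" table.
# Assumption: decimal units (1 GB = 1e9 bytes, 1 TB = 1e12 bytes).
DEVICES = {
    # name: (capacity_bytes, latency_s, usd_per_tb, speed_bytes_per_s)
    "RAM":  (16e9,  10e-9, 7000, 10e9),
    "SSD":  (500e9, 10e-6, 200,  1e9),
    "HD":   (6e12,  3e-3,  25,   150e6),
    "Tape": (20e12, 100.0, 20,   500e6),
}


def access_time(device, size_bytes):
    """Simplified model: one access latency plus a sequential transfer."""
    _, latency, _, speed = DEVICES[device]
    return latency + size_bytes / speed


def full_read_time(device):
    """Time, in seconds, to read a whole unit once from start to end."""
    capacity = DEVICES[device][0]
    return access_time(device, capacity)


def price_ratio():
    """Ratio between the most and the least expensive device per TB."""
    prices = [d[2] for d in DEVICES.values()]
    return max(prices) / min(prices)


if __name__ == "__main__":
    print(f"price ratio: {price_ratio():.0f}")  # 7000 / 20 = 350
    for name in DEVICES:
        print(f"{name:5s} full read: {full_read_time(name) / 3600:.2f} h, "
              f"1 MB read: {access_time(name, 1e6) * 1e3:.3f} ms")
```

Under these assumptions, reading a full hard disk or a full tape both take about 11 hours, whereas a single 1 MB read from tape is dominated by the 100 s latency. The model ignores seek patterns, caching and concurrent access, so the results should only be read as orders of magnitude.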