Preserving data

Sébastien Ponce
sebastien.ponce@cern.ch
CERN
Thematic CERN School of Computing 2018

In the previous episodes...
- We've found out how to store data efficiently
- And how to distribute it
- And even how to distribute the computation

Today
- Let's make sure we do not lose or corrupt our nice data!

Outline
1 Risks of data loss and corruption
2 Data consistency
  - Checksums
  - Block checksums
3 Data safety
  - Redundancy
  - Parity
  - Erasure coding
4 Conclusion

1 Risks of data loss and corruption

Risks for my data - Hardware

Some numbers for disks
- probability of losing a disk per year: a few %, up to 10%
  - with 60K disks, that's around 10 disks per day
  - and all files on a lost disk are lost
- one unrecoverable bit error in 10^14 bits read/written
  - for 1 GB files, that's one file corrupted per 10K files written

Some numbers for tapes
- probability of losing a tape per year: 10^-4
  - and you recover most of the data on it
  - net result is 10^-7 file loss per year
- one unrecoverable bit error in 10^19 bits read/written
  - for 1 GB files, that's one file corrupted per 1G files written
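
The orders of magnitude above can be checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope check, not part of the original slides: the function names are hypothetical, 1 GB is assumed to be 10^9 bytes, and the 3%, 6% and 10% annual failure rates are sample values picked from the "few %, up to 10%" range.

```python
# Back-of-the-envelope check of the disk and tape figures.
# Assumption: 1 GB = 10^9 bytes, i.e. 8 * 10^9 bits.
BITS_PER_GB = 8 * 10**9


def disk_failures_per_day(n_disks: int, annual_failure_rate: float) -> float:
    """Expected number of disk failures per day in a fleet of n_disks."""
    return n_disks * annual_failure_rate / 365


def files_per_bit_error(bits_per_error: float, file_size_gb: float = 1.0) -> float:
    """Average number of files read or written per unrecoverable bit error."""
    return bits_per_error / (file_size_gb * BITS_PER_GB)


if __name__ == "__main__":
    # Sample annual failure rates (assumed values within the quoted range).
    for afr in (0.03, 0.06, 0.10):
        per_day = disk_failures_per_day(60_000, afr)
        print(f"AFR {afr:.0%}: {per_day:.1f} disk failures per day")

    # One unrecoverable bit error per 10^14 bits (disk) or 10^19 bits (tape).
    print(f"disk: one corrupted file per {files_per_bit_error(1e14):,.0f} files")
    print(f"tape: one corrupted file per {files_per_bit_error(1e19):,.0f} files")
```

With 60K disks, a 6% annual failure rate gives roughly 10 failures per day, and the bit error rates give about 1.25 x 10^4 files per error on disk and 1.25 x 10^9 on tape, consistent with the rounded "10K" and "1G" figures.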