Preserving data
S. Ponce - CERN

Risks for my data - Software bugs
- in your software, e.g. scheduling the same transfer twice, receiving no data on the second run and overwriting the correct file with an empty one
- in your dependencies, e.g. the transfer protocol used does not support checksums, so corruption in transit is caught only by TCP's checksum, which is just 16 bits (roughly one corrupted packet in 65536 goes through undetected)
- in the OS or common libraries, e.g. libc locks not being atomic
- in the hardware, that is in the microcode running inside it, e.g. RAID controllers
- in your admin tools, e.g. recycling a tape that was not empty
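The 16-bit TCP checksum mentioned above is a ones' complement sum of 16-bit words (the algorithm described in RFC 1071). The minimal Python sketch below uses a made-up payload. Beyond the odds for random corruption quoted above, it shows one structured error this sum can never catch: two 16-bit words swapped in place.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum in the style of RFC 1071 (used by TCP)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add the next 16-bit big-endian word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in (end-around carry)
    return ~total & 0xFFFF


# Hypothetical payload; swapping its first two 16-bit words corrupts it...
original = b"TRANSFER-BLOCK-0001"
corrupted = original[2:4] + original[0:2] + original[4:]
assert corrupted != original

# ...but the sum does not depend on word order, so the checksum is identical.
print(internet_checksum(original) == internet_checksum(corrupted))  # True
```

Stronger end-to-end checksums (CRC-32, Adler-32, cryptographic hashes) computed by the application are the usual way around this.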
Risks for my data - Human factor
Real-life cases that went wrong:
- "Reinstall (and wipe) old machine p23425a4752." Oh no, I actually meant p42532a8779... bad cut and paste.
- rm -rf /top/data/alltimes /2015/04/crap
  One space too many and all the data are gone...
- "Activate garbage collection on pool XYZ, it's full." Wasn't it backed up to tape? No? Oops...
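The rm case above is two arguments, not one: the stray space turns a single path into /top/data/alltimes plus /2015/04/crap. Below is a minimal sketch of a pre-flight check that a hypothetical admin wrapper could run before executing such a command. The function check_rm, the constant ALLOWED_ROOT and the one-target policy are all assumptions made for illustration.

```python
import shlex
from pathlib import PurePosixPath

# Hypothetical policy: deletions must stay strictly below this directory,
# and a command may name only one target.
ALLOWED_ROOT = PurePosixPath("/top/data/alltimes")


def check_rm(command: str) -> list[str]:
    """Return the problems found in an `rm` command line (an empty list means it passed)."""
    args = shlex.split(command)  # shell-like splitting (quoting only, no globbing or expansion)
    targets = [a for a in args[1:] if not a.startswith("-")]
    problems = []
    if len(targets) != 1:
        problems.append(f"expected 1 target, got {len(targets)}: {targets}")
    for t in targets:
        p = PurePosixPath(t)
        # Refuse ".." (not resolved by PurePosixPath), the root itself, and anything outside it.
        if ".." in p.parts or ALLOWED_ROOT not in p.parents:
            problems.append(f"refusing to delete {t}")
    return problems


print(check_rm("rm -rf /top/data/alltimes /2015/04/crap"))  # three problems reported
print(check_rm("rm -rf /top/data/alltimes/2015/04/crap"))   # []
```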
Risks for my data - Conclusion
You will lose or corrupt data!
- better to be able to know when, and what
- even better if you can repair it
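Knowing what was lost, and when, requires a reference to compare against. The minimal sketch below assumes SHA-256 digests kept in a manifest (a hypothetical in-memory format) and a second replica to repair from. Running verify on a schedule narrows down when the damage happened to the interval between two runs. File names and contents are invented for the demo, which replays the "overwritten with an empty file" bug from the software-bugs list.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def make_manifest(root: Path) -> dict[str, str]:
    """Record the digest of every file under root, keyed by relative path."""
    return {str(p.relative_to(root)): sha256_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def verify(root: Path, manifest: dict[str, str]) -> dict[str, str]:
    """Tell what is damaged: {relative_path: problem} for each bad or missing file."""
    problems = {}
    for rel, expected in manifest.items():
        p = root / rel
        if not p.is_file():
            problems[rel] = "missing"
        elif sha256_of(p) != expected:
            problems[rel] = "checksum mismatch"
    return problems


def repair(root: Path, replica: Path, manifest: dict[str, str], problems: dict[str, str]) -> None:
    """Restore damaged files from a replica, but only if the replica copy is itself good."""
    for rel in problems:
        src = replica / rel
        if src.is_file() and sha256_of(src) == manifest[rel]:
            (root / rel).parent.mkdir(parents=True, exist_ok=True)
            shutil.copyfile(src, root / rel)


# Usage: a good file is overwritten with an empty one, detected, then repaired.
with tempfile.TemporaryDirectory() as tmp:
    primary, replica = Path(tmp, "primary"), Path(tmp, "replica")
    for d in (primary, replica):
        d.mkdir()
        (d / "run1.dat").write_bytes(b"event data")
    manifest = make_manifest(primary)
    (primary / "run1.dat").write_bytes(b"")
    found = verify(primary, manifest)
    repair(primary, replica, manifest, found)
    after = verify(primary, manifest)

print(found)  # {'run1.dat': 'checksum mismatch'}
print(after)  # {}
```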
Data consistency
1 Risks of data loss and corruption
2 Data consistency
  - Checksums
  - Block checksums
3 Data safety
4 Conclusion
Checksum
Definition: "small-size datum from a block of digital data for the purpose of detecting errors"
[Figure: data made of n blocks a1, a2, a3, a4, ..., ai, ..., an, reduced to a single checksum CS]
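The definition can be made concrete with a minimal sketch. It assumes CRC-32 (Python's zlib.crc32) as the checksum function and a hypothetical 4-byte block size; the definition above does not name a specific function. The code computes one CS over the n blocks a1..an. As a preview of block checksums, it also keeps one checksum per block, so a damaged block can be located rather than only detected.

```python
import zlib

BLOCK_SIZE = 4  # hypothetical block size in bytes, kept tiny for the demo


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut data into the blocks a1, a2, ..., an (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def checksum(data: bytes) -> int:
    """A single small datum CS computed from all n blocks (here: CRC-32)."""
    cs = 0
    for block in split_blocks(data):
        cs = zlib.crc32(block, cs)  # continue the running CRC over the next block
    return cs


def bad_blocks(data: bytes, expected: list[int]) -> list[int]:
    """Indices of blocks whose per-block CRC-32 no longer matches the stored one."""
    return [i for i, (b, cs) in enumerate(zip(split_blocks(data), expected))
            if zlib.crc32(b) != cs]


data = b"a1a2a3a4a5a6a7a8"  # n = 4 blocks of 4 bytes
cs = checksum(data)  # one CS for the whole data
per_block = [zlib.crc32(b) for b in split_blocks(data)]

damaged = data[:9] + b"X" + data[10:]  # change one byte inside block 2 (0-based)
print(checksum(damaged) != cs)  # True: the error is detected
print(bad_blocks(damaged, per_block))  # [2]: and located
```

Chaining zlib.crc32 across blocks gives the same value as one CRC-32 over the whole data, so the single CS does not depend on how the data is cut into blocks.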