Preserving data 13 / 43 S. Ponce - CERN

Adler like checksums

Computation
  Input bytes: a_1 a_2 a_3 a_4 ... a_i ... a_n
  b = 8 bit, w = 32 bit
  $$CS_{high} = \sum_{i=1}^{n} a_i \qquad CS_{low} = \sum_{i=1}^{n} i \, a_i$$

Pros and cons
  - easy to compute
  - detects most corruptions and inversions
  - weak for small files
  - easy to fake in case of intentional corruption
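The two sums can be sketched in a few lines of Python. This is only an illustration of the Adler-like scheme on the slide, not the real Adler-32 algorithm. The function name `adler_like`, the reduction modulo 2**16, and the way the two halves are packed into 32 bits are assumptions made for the example.

```python
def adler_like(data: bytes) -> int:
    """Adler-like checksum built from the two sums on the slide.

    Assumption (not from the slide): each sum is kept in 16 bits by
    reducing it modulo 2**16, and the two halves are packed into 32 bits.
    """
    cs_high = 0  # sum of a_i
    cs_low = 0   # sum of i * a_i, with i starting at 1
    for i, a in enumerate(data, start=1):
        cs_high = (cs_high + a) % 65536
        cs_low = (cs_low + i * a) % 65536
    return (cs_high << 16) | cs_low


original = b"abc"
swapped = b"acb"  # same bytes, with the last two inverted

# The plain sum is the same for both inputs, so only the
# position-weighted sum can tell them apart.
print(hex(adler_like(original)))
print(hex(adler_like(swapped)))
```

The plain sum alone would miss the inversion; the position-weighted sum is what makes the order of the bytes matter.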
(Crypt)Analysis of Adler

Weaknesses
  - 32 bits is short
      - about one corruption in 4 billion ($2^{32}$) will go undetected
  - it is actually worse for small files
      - for less than 256 bytes, not all bits of the sum are even used
  - they can be easily bypassed
      - one can easily change the last 16 bytes and reach any checksum
  - so intentional corruptions are not covered
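The small-file weakness can be checked with the real Adler-32 from Python's standard `zlib` module. In Adler-32 the low 16 bits hold 1 plus the sum of the bytes, modulo 65521, so a short input cannot reach the upper bits of that half. The helper `low_half_bits_needed` is a hypothetical name introduced for this sketch.

```python
import zlib


def low_half_bits_needed(n_bytes: int) -> int:
    """Bits needed for the largest low half an n-byte input can produce.

    Adler-32's low half is 1 + (sum of bytes) modulo 65521. Each byte is
    at most 255, so before any wrap-around the value is at most
    1 + 255 * n_bytes.
    """
    return (1 + 255 * n_bytes).bit_length()


for n in (16, 64, 128):
    # An input made only of 0xFF bytes gives the largest possible byte sum.
    worst_case = zlib.adler32(b"\xff" * n) & 0xFFFF
    print(f"{n:4d} bytes: largest low half = {worst_case}, "
          f"bits needed = {low_half_bits_needed(n)} of 16")
```

For the input sizes shown, the top bits of the low half are never set, so the checksum carries fewer than 32 bits of information for such files.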
Cryptographic checksums

What is it?
  - checksums that cannot be faked (easily)
  - they are based on non-reversible cryptographic functions

Most used ones
  md5     1991, 128 bits, by Rivest. Not considered secure anymore, as complete collisions have been discovered.
  sha1    1995, 160 bits, by NSA. Collision in $2^{61}$ operations
  sha256  2001, 256 bits, by NSA. Collision in $2^{128}$ operations
  sha512  2001, 512 bits, by NSA. Collision in $2^{256}$ operations

Drawback
  - more costly to compute
  - although modern processors have dedicated instructions
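As an illustration, the four algorithms listed above are all available through Python's standard `hashlib` module. The sketch below computes each digest for the same input and prints its width in bits. The helper name `digests` and the sample input are assumptions made for the example, and `md5` may be disabled on some security-hardened Python builds.

```python
import hashlib

ALGORITHMS = ("md5", "sha1", "sha256", "sha512")


def digests(data: bytes) -> dict[str, str]:
    """Return the hex digest of `data` for each algorithm listed above."""
    return {name: hashlib.new(name, data).hexdigest() for name in ALGORITHMS}


for name, hexdigest in digests(b"some file content").items():
    # Each hex character encodes 4 bits of the digest.
    print(f"{name:7s} {len(hexdigest) * 4:4d} bits  {hexdigest}")
```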
Comparison of main checksums

  Name      MB/s on Intel Core 2   Cycles per byte
  Adler32   920                    1.9
  MD5       255                    6.8
  SHA-1     153                    11.4
  SHA-256   111                    15.8
  SHA-512   99                     17.7
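The figures above are specific to one processor. A rough way to get comparable numbers on another machine is sketched below, using only Python's standard library. The function names, the buffer size and the best-of-N timing are assumptions made for this example; the results depend on the interpreter and on the underlying C libraries, so they should not be expected to match the table.

```python
import hashlib
import time
import zlib


def throughput_mb_per_s(func, data: bytes, repeats: int = 5) -> float:
    """Best-of-`repeats` throughput of `func(data)`, in MB/s (1 MB = 10**6 bytes)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(data)
        best = min(best, time.perf_counter() - start)
    # Guard against a zero reading from a coarse timer.
    return len(data) / max(best, 1e-9) / 1e6


def benchmark(data: bytes) -> dict[str, float]:
    """Measure each checksum from the table on the same buffer."""
    candidates = {
        "Adler32": zlib.adler32,
        "MD5": lambda d: hashlib.md5(d).digest(),
        "SHA-1": lambda d: hashlib.sha1(d).digest(),
        "SHA-256": lambda d: hashlib.sha256(d).digest(),
        "SHA-512": lambda d: hashlib.sha512(d).digest(),
    }
    return {name: throughput_mb_per_s(f, data) for name, f in candidates.items()}


for name, rate in benchmark(bytes(8 * 1024 * 1024)).items():
    print(f"{name:8s} {rate:10.1f} MB/s")
```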
Practical usage of checksums

Simple approach
  - compute checksum in memory when creating/writing file
  - store checksum in a DB
  - check it in memory on full file reads

Problems
  - corrupted data is only found when read back, and goes unnoticed otherwise
  - one needs to fully read the file to be able to check
  - file updates are not supported: one needs to read back the whole file
  - file append suffers the same limitation
  - multi-stream, out-of-order writing is not supported
  - losing the DB loses all checksums
  - renaming a file implies changing the entry in the DB, with a double commit issue
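The simple approach can be sketched as follows. This is a minimal illustration, not a description of any particular storage system. It assumes Adler-32 from `zlib` as the checksum and an in-memory SQLite table as the DB; the table layout and the function names `write_with_checksum` and `read_and_verify` are hypothetical.

```python
import os
import sqlite3
import tempfile
import zlib

# Assumption: a single table keyed by file path plays the role of the DB.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE checksums (path TEXT PRIMARY KEY, adler32 INTEGER)")


def write_with_checksum(path: str, data: bytes) -> None:
    """Compute the checksum in memory, write the file, record the checksum."""
    checksum = zlib.adler32(data)
    with open(path, "wb") as f:
        f.write(data)
    db.execute("INSERT OR REPLACE INTO checksums VALUES (?, ?)", (path, checksum))
    db.commit()


def read_and_verify(path: str) -> bytes:
    """Read the whole file and compare it against the stored checksum."""
    with open(path, "rb") as f:
        data = f.read()
    row = db.execute(
        "SELECT adler32 FROM checksums WHERE path = ?", (path,)
    ).fetchone()
    if row is None:
        raise KeyError(f"no checksum recorded for {path}")
    if zlib.adler32(data) != row[0]:
        raise ValueError(f"checksum mismatch for {path}")
    return data


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.bin")
    write_with_checksum(path, b"some file content")
    print(read_and_verify(path))

    # Simulate silent corruption on disk: it is only noticed on the next full read.
    with open(path, "wb") as f:
        f.write(b"some file c0ntent")
    try:
        read_and_verify(path)
    except ValueError as err:
        print("detected:", err)
```

The sketch shows several of the listed problems directly: verification needs a full read, the checksum is keyed by path so a rename needs a second DB update, and the checksums are lost if the DB is lost.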