Data storage and preservation
S. Ponce - CERN

A variety of storage devices

You cannot have everything

[Figure: storage devices (RAM, SSD, HD, Tape) placed between three competing properties: cheap, reliability, speed]
Reliability in real world (CERN)

For disks
  - probability of losing a disk per year: a few %, up to 10%
      - with 60K disks, that is around 10 disks per day
      - and all files on a lost disk are lost
  - one unrecoverable bit error in 10^14 bits read/written
      - for 10 GB files, that is one file corrupted per 1000 files written

For tapes
  - probability of losing a tape per year: 10^-4
      - and you recover most of the data on it
      - net result is 10^-7 file loss per year
  - one unrecoverable bit error in 10^19 bits read/written
      - for 10 GB files, that is one file corrupted per 100M files written
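The orders of magnitude on this slide can be checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope check, not part of the original slides: it assumes "a few %" means 5%, decimal gigabytes (10 GB = 8 x 10^10 bits), and the function names are hypothetical.

```python
# Back-of-the-envelope check of the reliability figures.
# Assumptions (not from the slides): "a few %" is taken as 5 %,
# and 10 GB is taken as 10 * 10**9 bytes.

DISKS = 60_000
DISK_LOSS_PER_YEAR = 0.05
FILE_SIZE_BITS = 10 * 10**9 * 8  # one 10 GB file, in bits


def disk_failures_per_day(n_disks, annual_loss_probability):
    """Expected number of disks lost per day across the whole farm."""
    return n_disks * annual_loss_probability / 365


def files_per_corruption(bits_per_error, file_size_bits):
    """Average number of files written per unrecoverable bit error."""
    return bits_per_error / file_size_bits


print(disk_failures_per_day(DISKS, DISK_LOSS_PER_YEAR))  # a bit over 8 per day
print(files_per_corruption(10**14, FILE_SIZE_BITS))      # disks: 1250 files
print(files_per_corruption(10**19, FILE_SIZE_BITS))      # tapes: 1.25e8 files
```

With these assumptions the results land close to the rounded values quoted above: roughly 10 disks per day, one corrupted file per 1000 written on disk, and one per 100M written on tape.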
Parallelizing files' storage

1 Storage devices
2 Parallelizing files' storage
    Striping
    Introduction to Map/Reduce
3 Risks of data loss and corruption
4 Data consistency
5 Data safety
6 Conclusion
Why parallelize storage?

  - to work around limitations
      - individual device speed (think disk)
          - a file is typically stored on a single device
      - network cards' speed
          - 1 Gbit networks are still present
          - network congestion on a node reduces bandwidth per stream
      - core network throughput
          - switches / routers are expensive
          - machines may have less throughput than their card(s) allow
  - hot data congestion
      - and the black hole it can generate, as slower transfers allow more transfers to accumulate
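The "black hole" effect can be illustrated with a toy simulation. This is a minimal sketch under assumptions that are not in the slides: a single node whose bandwidth is shared equally among active transfers, files of identical size, and one new request every fixed number of time steps. The function `simulate` and all numbers are hypothetical.

```python
def simulate(node_bandwidth, file_size, arrival_interval, steps):
    """Return the number of active transfers after each time step.

    node_bandwidth:   data the node can serve per step, shared equally
                      among all active transfers (assumed sharing model)
    file_size:        size of every requested file (same unit)
    arrival_interval: a new transfer starts every `arrival_interval` steps
    """
    remaining = []  # data still to send for each active transfer
    history = []
    for step in range(steps):
        if step % arrival_interval == 0:
            remaining.append(file_size)
        if remaining:
            share = node_bandwidth / len(remaining)
            # Keep only the transfers that are not finished after this step.
            remaining = [r - share for r in remaining if r - share > 1e-9]
        history.append(len(remaining))
    return history


# Requested load below the node capacity: transfers finish before the next starts.
underloaded = simulate(node_bandwidth=100, file_size=1000, arrival_interval=20, steps=200)
# Requested load above the node capacity: transfers slow each other down and pile up.
overloaded = simulate(node_bandwidth=100, file_size=1000, arrival_interval=5, steps=200)

print(max(underloaded))  # stays small
print(overloaded[-1])    # keeps growing with the number of steps
```

In the overloaded case every new transfer reduces the share of all the others, so each one stays active longer and the backlog grows. Spreading hot data over several nodes raises the aggregate bandwidth available to those requests.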