Many ways to store data
S. Ponce - CERN

A variety of storage devices

You cannot have everything: low cost, reliability, and speed.
Devices compared: RAM, SSD, HD, Tape
Reliability in real world (CERN)

For disks
- probability of losing a disk per year: a few %, up to 10%
  - with 60K disks, that is around 10 per day, and all files on them are lost
- one unrecoverable bit error in 10^14 bits read/written
  - for 10 GB files, that is one file corrupted per 1000 files written

For tapes
- probability of losing a tape per year: 10^-4, and you recover most of the data on it
  - net result is 10^-7 file loss per year
- one unrecoverable bit error in 10^19 bits read/written
  - for 10 GB files, that is one file corrupted per 100M files written
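The orders of magnitude on this slide can be checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope check: the function names are made up for illustration, "a few %" is assumed to mean 5%, and 10 GB is taken as 10 x 10^9 bytes.

```python
# Rough check of the disk and tape reliability figures quoted on the slide.
# Assumptions (not from the slide): 5% stands in for "a few %",
# and 1 GB = 10**9 bytes.

GB = 10**9


def disk_failures_per_day(n_disks, annual_failure_rate):
    """Expected number of disks lost per day across the whole fleet."""
    return n_disks * annual_failure_rate / 365


def files_per_corruption(bits_per_error, file_size_bytes):
    """Average number of files written for one unrecoverable bit error."""
    return bits_per_error / (file_size_bytes * 8)


if __name__ == "__main__":
    print(disk_failures_per_day(60_000, 0.05))      # roughly 8 disks per day
    print(files_per_corruption(10**14, 10 * GB))    # disks: 1250 files
    print(files_per_corruption(10**19, 10 * GB))    # tapes: 125 million files
```

With a 5% annual failure rate the fleet loses about 8 disks per day, consistent with the "around 10 per day" on the slide, and the bit error rates give about 1000 files per corruption for disks and about 100M for tapes.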
Practical Mass Storage - Real Big Data

when you count in 100s of PetaBytes...

The constraints
- disks or tapes are the only possible solutions
- disks are unreliable at that scale, and need redundancy
  - we'll see that extensively
- tapes are cheaper for long-term storage, by a factor of 2 to 2.5
- tape latency means that data must be accessed from disk
Specificities of tape storage

Key points
- 500 MB/s in sequential read/write
  - 4x the speed of a disk
  - who said tape is slow?
- latency/seek time in the order of minutes!
  - due to mount time and robot arm moving
  - due to positioning
- storage is cheap, I/O is not
  - 20$/TB for storage capacity
  - 25K$ for each drive
Tape efficiency

Computation

\[
\text{efficiency} = \frac{\text{I/O time}}{\text{mount time} + \text{I/O time}}
\]

\[
\text{mount size} = \text{mount time} \times \text{drive speed}
\]

\[
\text{efficiency} = \frac{1}{1 + \dfrac{\text{mount size}}{\text{data size}}}
\]

\[
\text{mount size} \approx 50\ \text{GB}
\]
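To get a feel for the efficiency formula, here is a minimal sketch that evaluates it for a few data sizes. The function names are illustrative, and the 100 s mount time is an assumption chosen so that, with the 500 MB/s drive speed quoted earlier, the mount size comes out at about 50 GB.

```python
# Tape efficiency = I/O time / (mount time + I/O time).
# Assumptions (not from the slide): mount time of 100 s, decimal units.

MB = 10**6
GB = 10**9

DRIVE_SPEED = 500 * MB   # bytes per second, sequential read/write
MOUNT_TIME = 100         # seconds, assumed so that mount size is about 50 GB


def mount_size(mount_time_s, drive_speed):
    """Amount of data the drive could have moved during one mount."""
    return mount_time_s * drive_speed


def tape_efficiency(data_size, mount_time_s=MOUNT_TIME, drive_speed=DRIVE_SPEED):
    """Fraction of the drive's time spent doing useful I/O for one mount."""
    io_time = data_size / drive_speed
    return io_time / (mount_time_s + io_time)


if __name__ == "__main__":
    print(mount_size(MOUNT_TIME, DRIVE_SPEED) / GB)  # 50.0 (GB)
    for size_gb in (10, 50, 500):
        print(size_gb, "GB:", round(tape_efficiency(size_gb * GB), 3))
```

Under these assumptions, reading 10 GB per mount uses the drive at about 17% efficiency, 50 GB at 50%, and 500 GB at about 91%: the data moved per mount has to be well above the mount size for the drive to be used efficiently.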