Many ways to store data (S. Ponce, CERN)

Specificities of tape storage

Key points
- 500 MB/s in sequential read/write
  - 4x the speed of a disk
  - who said tape is slow?
- latency/seek time in the order of minutes!
  - due to mount time and robot arm movement
  - due to positioning
- storage is cheap, I/O is not
  - $20/TB for storage capacity
  - $25K for each drive
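The trade-off between high throughput and high latency can be made concrete with a small sketch. It compares the time to read one file from tape and from disk and solves for the file size at which both take equally long. The 100 s tape latency and the 10 ms disk seek are illustrative assumptions (the slide only says "in the order of minutes"); the 125 MB/s disk speed follows from the slide's "4x" figure. The function names are hypothetical.

```python
def read_time_s(size_mb: float, latency_s: float, speed_mb_s: float) -> float:
    """Time to read one file: fixed latency plus sequential transfer time."""
    return latency_s + size_mb / speed_mb_s


def break_even_size_mb(tape_latency_s: float, tape_speed_mb_s: float,
                       disk_latency_s: float, disk_speed_mb_s: float) -> float:
    """File size at which tape and disk read times are equal.

    Solves  tape_latency + s / tape_speed = disk_latency + s / disk_speed  for s.
    Only meaningful when the tape drive streams faster than the disk.
    """
    if tape_speed_mb_s <= disk_speed_mb_s:
        raise ValueError("tape must be faster than disk for a break-even to exist")
    return (tape_latency_s - disk_latency_s) / (
        1.0 / disk_speed_mb_s - 1.0 / tape_speed_mb_s
    )


# Assumed values: 100 s mount + positioning, 10 ms disk seek (not from the slide).
TAPE_LATENCY_S = 100.0
DISK_LATENCY_S = 0.01
# From the slide: 500 MB/s tape, 4x the speed of a disk.
TAPE_SPEED_MB_S = 500.0
DISK_SPEED_MB_S = TAPE_SPEED_MB_S / 4

if __name__ == "__main__":
    size = break_even_size_mb(TAPE_LATENCY_S, TAPE_SPEED_MB_S,
                              DISK_LATENCY_S, DISK_SPEED_MB_S)
    print(f"break-even file size: {size / 1000:.1f} GB")
```

Under these assumptions a single tape drive only beats a single disk for reads of well over ten gigabytes, which is why tape suits large sequential transfers rather than small random accesses.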
Tape efficiency

Computation

\[ \text{efficiency} = \frac{\text{I/O time}}{\text{mount time} + \text{I/O time}} \]

\[ \text{mount size} = \text{mount time} \times \text{drive speed} \]

\[ \text{efficiency} = \frac{1}{1 + \dfrac{\text{mount size}}{\text{data size}}} \]

\[ \text{mount size} \approx 50\ \text{GB} \]
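The formulas above can be turned into a short sketch. The default mount size of 50 GB is the slide's figure; the 100 s mount time used in the example is an assumption, chosen because it reproduces 50 GB at the 500 MB/s drive speed. The function names are hypothetical.

```python
def mount_size_from_time(mount_time_s: float, drive_speed_mb_s: float) -> float:
    """Mount size in GB: the data a drive could have streamed during one mount."""
    return mount_time_s * drive_speed_mb_s / 1000.0


def tape_efficiency(data_size_gb: float, mount_size_gb: float = 50.0) -> float:
    """Fraction of drive time spent on I/O for one mount.

    efficiency = 1 / (1 + mount size / data size)
    """
    if data_size_gb <= 0:
        raise ValueError("data_size_gb must be positive")
    return 1.0 / (1.0 + mount_size_gb / data_size_gb)


if __name__ == "__main__":
    # Assumed 100 s mount time at 500 MB/s gives the slide's 50 GB mount size.
    print(mount_size_from_time(100.0, 500.0))
    for size_gb in (0.01, 1.0, 200.0, 1000.0):
        print(f"{size_gb:>8} GB -> {tape_efficiency(size_gb):.2%}")
```

Reading exactly one mount size worth of data (50 GB) gives 50% efficiency, and 200 GB gives about 80%.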
Tape efficiency

[Plot: efficiency (axis ticks at 1‰, 1%, 10%, 80%) versus data size (axis ticks at 10 MB, 1 GB, 200 GB, 1 TB)]

No mount for less than 100 GB!
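The 100 GB rule of thumb can be related to a target efficiency by inverting the efficiency formula, efficiency = 1 / (1 + mount size / data size). The sketch below assumes the 50 GB mount size quoted in this talk; the function name is hypothetical.

```python
def min_data_size_gb(target_efficiency: float, mount_size_gb: float = 50.0) -> float:
    """Smallest amount of data per mount that reaches the target efficiency.

    Inverting  e = 1 / (1 + mount_size / data_size)
    gives      data_size = mount_size * e / (1 - e).
    """
    if not 0.0 < target_efficiency < 1.0:
        raise ValueError("target_efficiency must be strictly between 0 and 1")
    return mount_size_gb * target_efficiency / (1.0 - target_efficiency)


if __name__ == "__main__":
    for target in (0.01, 0.10, 0.50, 0.80):
        print(f"{target:.0%} efficiency needs {min_data_size_gb(target):.1f} GB per mount")
```

With a 50 GB mount size, 80% efficiency needs about 200 GB per mount, and a 100 GB transfer corresponds to roughly 67% efficiency.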
Hierarchical storage

Layers
- tape as primary storage
- disks used as a cache in front of the tape system
- SSD used as cache in front of disks (or inside them)

CERN's case (2018)
- tape capacity: 330 PB
- tape usage: 240 PB
- disk raw capacity: 250 PB
- disk usage: 182 PB for 91 PB of data
Some consequences

Disk cache management is needed
- the disk cache needs garbage collection
- different algorithms are used depending on usage
  - FIFO - First In First Out
  - LRU - Least Recently Used

Users need to adapt their usage
- data needs to be prefetched from tape before access
  - preferably in bulk
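The two garbage-collection policies and the bulk prefetch pattern can be illustrated with a toy disk cache. This is a minimal sketch, not how any real HSM system is implemented: the `DiskCache` class, its methods, and the idea of passing a dictionary of file sizes to `prefetch` are all hypothetical.

```python
from collections import OrderedDict


class DiskCache:
    """Toy disk cache in front of tape, with FIFO or LRU garbage collection."""

    def __init__(self, capacity_gb: float, policy: str = "LRU"):
        if policy not in ("FIFO", "LRU"):
            raise ValueError("policy must be 'FIFO' or 'LRU'")
        self.capacity_gb = capacity_gb
        self.policy = policy
        self.files = OrderedDict()  # file name -> size in GB, oldest first
        self.used_gb = 0.0
        self.evicted = []           # names removed by garbage collection

    def _make_room(self, size_gb: float) -> None:
        # Evict from the front of the ordering until the new file fits.
        while self.files and self.used_gb + size_gb > self.capacity_gb:
            name, freed = self.files.popitem(last=False)
            self.used_gb -= freed
            self.evicted.append(name)

    def prefetch(self, requests: dict) -> None:
        """Stage a batch of files from tape (one bulk recall)."""
        for name, size_gb in requests.items():
            if name in self.files:
                continue
            if size_gb > self.capacity_gb:
                raise ValueError(f"{name} is larger than the cache")
            self._make_room(size_gb)
            self.files[name] = size_gb
            self.used_gb += size_gb

    def read(self, name: str) -> str:
        """Read a staged file; fails if it was never prefetched or was evicted."""
        if name not in self.files:
            raise KeyError(f"{name} is not on disk; prefetch it from tape first")
        if self.policy == "LRU":
            # A read makes the file the most recently used one.
            self.files.move_to_end(name)
        return name
```

With a 10 GB cache holding files `a` and `b` of 4 GB each, reading `a` and then prefetching a third 4 GB file evicts `b` under LRU but `a` under FIFO, since FIFO ignores accesses and only looks at arrival order.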