Data storage and preservation - S. Ponce - CERN

Practical striping - the stripe size

Desired picture

Disk 1: File A.1, File B.2, File C.4
Disk 2: File A.2, File B.3, File C.5
Disk 3: File A.3, File C.1, File C.6
Disk 4: File A.4, File C.2
Disk 5: File B.1, File C.3
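The desired picture corresponds to a plain round-robin placement: each new stripe goes to the next disk in turn. The following is a minimal sketch of that placement, not the author's implementation; the function name `place_stripes` and the `(name, n_stripes)` input format are hypothetical, chosen only for illustration.

```python
def place_stripes(files, n_disks):
    """Assign stripes to disks round-robin.

    files   -- list of (file_name, number_of_stripes) pairs (hypothetical format)
    n_disks -- number of disks in the array
    Returns one list of stripe labels per disk.
    """
    disks = [[] for _ in range(n_disks)]
    next_slot = 0  # global stripe counter, shared by all files
    for name, n_stripes in files:
        for i in range(1, n_stripes + 1):
            disks[next_slot % n_disks].append(f"{name}.{i}")
            next_slot += 1
    return disks


# Files A (4 stripes), B (3 stripes) and C (6 stripes) on 5 disks,
# as in the "Desired picture" slide.
layout = place_stripes([("A", 4), ("B", 3), ("C", 6)], n_disks=5)
for disk_no, stripes in enumerate(layout, start=1):
    print(f"Disk {disk_no}:", ", ".join(stripes))
```

With these inputs every file is spread over at least three disks, so reads of any of them can proceed in parallel.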
Practical striping - the stripe size

Stripes too big

[Figure: Disks 1 to 5 holding only four large stripes: File A.1, File B.1, File C.1, File C.2]

RAID has practically no effect.
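The reason oversized stripes cancel the benefit is that a file cannot occupy more disks than it has stripes. A small sketch of that count follows; the helper `disks_used` and the sizes used in the example are illustrative assumptions, not values from the slides.

```python
def disks_used(file_size, stripe_size, n_disks):
    """Number of disks a file is spread over with round-robin striping."""
    n_stripes = (file_size + stripe_size - 1) // stripe_size  # ceiling division
    return min(n_stripes, n_disks)


MiB = 1024 * 1024
file_size = 10 * MiB  # hypothetical file size
for stripe_size in (64 * MiB, 4 * MiB, 1 * MiB):
    n = disks_used(file_size, stripe_size, n_disks=5)
    print(f"{stripe_size // MiB:3d} MiB stripes -> {n} disk(s)")
```

When the stripe is larger than the file, the whole file sits on one disk and is read at single-disk speed.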
Practical striping - the stripe size

Stripes too small

[Figure: Disks 1 to 5 holding many small stripes laid out round-robin: File A.1 to A.16, File B.1 to B.12, File C.1 to C.20]

RAID will only kill performance by forcing the disks to seek far too often.
How to choose the stripe size

Size of the stripe
  - must be as small as possible, to let small reads benefit from parallelization
  - must not be too small
      - to avoid having to deal with too much metadata
      - to avoid too much disk seeking
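The two constraints can be made concrete with a toy cost model. This is only a sketch under stated assumptions: every stripe read costs one seek (the worst case, when stripes of other files are interleaved), all disks work in parallel, and the seek time (10 ms) and per-disk throughput (100 MB/s) are illustrative round numbers, not measurements. Metadata cost is not modeled.

```python
def read_time(file_size, stripe_size, n_disks=5,
              seek_s=0.010, throughput_bps=100_000_000):
    """Rough upper bound (in seconds) on the time to read a whole file.

    Assumes one seek per stripe and that the busiest disk dominates.
    All default parameters are hypothetical.
    """
    n_stripes = (file_size + stripe_size - 1) // stripe_size   # ceiling division
    stripes_on_busiest = (n_stripes + n_disks - 1) // n_disks  # ceiling division
    bytes_on_busiest = min(stripes_on_busiest * stripe_size, file_size)
    return stripes_on_busiest * seek_s + bytes_on_busiest / throughput_bps


file_size = 100_000_000  # a hypothetical 100 MB file
for label, stripe_size in (("1 GB", 1_000_000_000),
                           ("20 MB", 20_000_000),
                           ("4 kB", 4_000)):
    print(f"stripe {label:>5}: {read_time(file_size, stripe_size):.2f} s")
```

In this model the oversized stripe gives single-disk speed, the tiny stripe is dominated by seeks, and the intermediate size is the fastest of the three.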
A generic solution for the stripe size

Idea
  - disentangle "stripe size" from "object size"
  - "stripe size" is the size of one slice of data
  - "object size" is the size of one block of data on disk
  - several stripes are put together into one bigger object
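One possible way to realize this idea is sketched below. The slides only state that several stripes are grouped into one bigger object; the specific layout used here (stripes go round-robin over the disks, and consecutive stripes landing on the same disk are concatenated into one object until it is full) is an assumption, as are the function name `locate` and its parameters.

```python
def locate(offset, stripe_size, object_size, n_disks):
    """Map a byte offset in a file to (disk, object number, offset in object).

    Assumed layout: round-robin stripes; each on-disk object holds
    object_size // stripe_size stripes of the same file.
    """
    if object_size % stripe_size != 0:
        raise ValueError("object_size must be a multiple of stripe_size")
    stripes_per_object = object_size // stripe_size

    stripe_no = offset // stripe_size        # which slice of the file
    disk = stripe_no % n_disks               # round-robin over the disks
    row = stripe_no // n_disks               # earlier stripes of this file on that disk
    object_set = row // stripes_per_object   # group of n_disks objects being filled
    stripe_in_object = row % stripes_per_object

    object_no = object_set * n_disks + disk
    offset_in_object = stripe_in_object * stripe_size + offset % stripe_size
    return disk, object_no, offset_in_object


MiB = 1024 * 1024
# Hypothetical sizes: 1 MiB stripes grouped into 4 MiB objects on 5 disks.
for off in (0, 5 * MiB, 7 * MiB + 10, 20 * MiB):
    print(off, "->", locate(off, stripe_size=1 * MiB, object_size=4 * MiB, n_disks=5))
```

With this separation the stripe size can stay small enough for parallel reads, while the number of on-disk objects (and so the metadata) is governed by the larger object size.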