Structuring data for efficient I/O (S. Ponce - CERN, slide 9 / 42)

Column storage

Time (min)   Captor 1   Captor 2   ...   Captor c
0            a0         b0         ...   z0
1            a1         b1         ...   z1
...          ...        ...        ...   ...
n            an         bn         ...   zn

File content: a0 a1 ... an b0 b1 ... bn ... z0 z1 ... zn

Back to efficient read [diagram: seek, read]
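
The layout above can be sketched in a few lines of Python. This is only an illustration, not the format used in the slides: it assumes every reading is stored as one little-endian 8-byte double, and the helper names `write_columns` and `read_column` are hypothetical. It shows why a whole captor history costs a single seek followed by a single contiguous read.

```python
import io
import struct

VALUE = struct.Struct("<d")  # assumption: each reading is one little-endian double


def write_columns(rows, out):
    """Write all values of captor 1, then all values of captor 2, and so on."""
    for column in zip(*rows):  # transpose the rows into columns
        for value in column:
            out.write(VALUE.pack(value))


def read_column(f, index, n_rows):
    """Read one captor's full history with one seek and one read."""
    f.seek(index * n_rows * VALUE.size)  # jump to the start of the column
    data = f.read(n_rows * VALUE.size)   # one contiguous read
    return [v[0] for v in VALUE.iter_unpack(data)]


# Hypothetical sample: 3 time steps, 3 captors.
rows = [(1.0, 10.0, 100.0), (2.0, 20.0, 200.0), (3.0, 30.0, 300.0)]
buf = io.BytesIO()  # stands in for a file on disk
write_columns(rows, buf)
captor_2 = read_column(buf, 1, len(rows))
```

With a row layout, the same query would need one seek and one small read per time step, because the values of a given captor are separated by the values of all the other captors.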
Row vs column storage

Definition
Row storage respects the internal structure of the data and stores the items of each record one after another in sequence.
Column storage breaks the internal structure of the data to collate similar pieces.

Why use column storage?
- to optimize I/O in general and avoid scattered reads
- to optimize data compression
- to optimize parallelization of processing

Drawbacks of column storage
- a column-organized file cannot be updated easily
- column storage is usually created from row storage in a postprocessing phase
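
The compression argument and the postprocessing step can be illustrated with a small, deliberately contrived sketch. Everything in it is an assumption made for the example: two captors stored as doubles, one stuck at a constant value and one producing pure noise, compressed with `zlib` from the Python standard library. The same values are laid out row-wise, regrouped column-wise, and both layouts are compressed.

```python
import random
import struct
import zlib

RECORD = struct.Struct("<dd")  # assumption: two captors, one double each

rng = random.Random(0)
# Hypothetical data: captor 1 is stuck at a constant value, captor 2 is noise.
rows = [(20.0, rng.random()) for _ in range(1000)]

# Row storage: records written one after another, as they were acquired.
row_bytes = b"".join(RECORD.pack(*r) for r in rows)

# Postprocessing step: regroup the same values column by column.
columns = list(zip(*rows))
col_bytes = b"".join(struct.pack(f"<{len(col)}d", *col) for col in columns)

row_size = len(zlib.compress(row_bytes))
col_size = len(zlib.compress(col_bytes))
print(len(row_bytes), row_size, col_size)
```

Both layouts hold exactly the same bytes in a different order. In the column layout the constant captor forms one long repetitive run, which the compressor encodes very cheaply, whereas in the row layout that run is interrupted by noise every 8 bytes. Real data is rarely this extreme, so the gain will vary with the content.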