Structuring data for efficient I/O 11 / 42 S. Ponce - CERN

The parallelization view

Row/Column
  naming: Row storage, Column storage
  goal: Storage efficiency, data on disk

AoS/SoA
  naming: Array of structs (AoS), Struct of arrays (SoA)
  goal: Algorithmic efficiency, data in RAM

A lot in common
  Same advantages/disadvantages
  Same compromises
  Same ultimate solution? Column per block / AoSoA
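To make the row/column (AoS/SoA) distinction concrete, here is a minimal sketch in Python that serializes the same records both ways. The sample data, the field layout (one 32-bit integer and two 64-bit floats) and all function names are assumptions made for illustration; they are not taken from the slides.

```python
import struct

# Hypothetical sample data: (time, captor_a, captor_b) for three measurements.
records = [(0, 1.0, 10.0), (1, 2.0, 20.0), (2, 3.0, 30.0)]

# One record: int32 + two float64, little-endian, no padding (20 bytes).
RECORD = struct.Struct("<idd")

def to_row_storage(recs):
    """Row storage / AoS: all fields of record 0, then all fields of record 1, ..."""
    return b"".join(RECORD.pack(*r) for r in recs)

def to_column_storage(recs):
    """Column storage / SoA: all times, then all 'a' values, then all 'b' values."""
    n = len(recs)
    times, a_vals, b_vals = zip(*recs)
    return struct.pack(f"<{n}i{n}d{n}d", *times, *a_vals, *b_vals)

def read_a_from_rows(buf, n):
    """Reading one field from row storage needs one strided access per record."""
    return [struct.unpack_from("<d", buf, i * RECORD.size + 4)[0] for i in range(n)]

def read_a_from_columns(buf, n):
    """Reading one field from column storage is a single contiguous read."""
    return list(struct.unpack_from(f"<{n}d", buf, 4 * n))

row_bytes = to_row_storage(records)
col_bytes = to_column_storage(records)
```

Both buffers hold the same bytes in a different order. The difference lies in the access pattern: extracting one field touches every record in the row layout, but only one contiguous range in the column layout.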
Column per block storage

Time (min)   Captor 1   Captor 2   ...   Captor c
0            a0         b0         ...   z0
...          ...        ...        ...   ...
p-1          ap-1       bp-1       ...   zp-1
p            ap         bp         ...   zp
...          ...        ...        ...   ...

File content
  a0 a1 ... ap-1  b0 b1 ... bp-1  ...  z0 z1 ... zp-1  ap ap+1 ... a2p-1  bp bp+1 ... b2p-1  ...

Advantages
  limits number of seeks per data to 1/p
  allows updates, providing a small cache
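The file ordering above can be sketched as follows. This is a minimal illustration, not the author's implementation: the function names `column_per_block` and `file_index` are hypothetical, values are kept as list elements rather than bytes, and the number of rows is assumed to be a multiple of the block size p.

```python
def column_per_block(columns, p):
    """Flatten c columns into file order: for each block of p rows,
    write p values of column 0, then p values of column 1, and so on."""
    n_rows = len(columns[0])
    out = []
    for start in range(0, n_rows, p):
        for col in columns:
            out.extend(col[start:start + p])
    return out

def file_index(row, col, p, c):
    """Position in the flattened file of the value at (row, col),
    assuming every block holds exactly p rows of c columns."""
    block, within = divmod(row, p)
    return block * c * p + col * p + within

# Hypothetical example with c = 3 captors, 4 time steps and blocks of p = 2 rows.
columns = [
    ["a0", "a1", "a2", "a3"],
    ["b0", "b1", "b2", "b3"],
    ["z0", "z1", "z2", "z3"],
]
layout = column_per_block(columns, p=2)
```

Within a block, the p values of one captor are contiguous, so a single seek serves p values of that column. Appending new rows only requires buffering the current block of p rows, which is the small cache mentioned in the advantages.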
Compressing data

1 Data format
2 Compressing data
    Compression algorithms
    Efficiency and use cases
3 Data addressing
4 Stateful interfaces
5 Conclusion
Data compression

Main idea
  eliminate redundancy (e.g. LZ77)
  optimize encoding (Huffman coding)
  to squeeze more information into fewer bytes
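As a small illustration of both ideas, the sketch below uses Python's standard `zlib` module, whose DEFLATE format combines LZ77-style matching with Huffman coding. The choice of zlib and the sample inputs are assumptions made for this example; the slides do not prescribe a particular library.

```python
import os
import zlib

# Highly redundant input: the same 16-byte pattern repeated many times.
redundant = b"captor=42;time=0" * 1000

# Input with essentially no redundancy: random bytes of the same length.
random_data = os.urandom(len(redundant))

compressed_redundant = zlib.compress(redundant)
compressed_random = zlib.compress(random_data)

# Compression is lossless: decompressing restores the original bytes.
restored = zlib.decompress(compressed_redundant)

print(len(redundant), "->", len(compressed_redundant), "bytes (redundant input)")
print(len(random_data), "->", len(compressed_random), "bytes (random input)")
```

The redundant input shrinks to a small fraction of its size, while the random input does not shrink in any meaningful way: compression can only remove redundancy that is actually present in the data.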