Data Technologies – CERN School of Computing 2019

Chunk size and physical blocks

The storage overhead of the checksum is typically a few hundred bytes. It can easily be neglected compared to the chunk size, which is a few megabytes.

To guarantee high efficiency when transferring chunks, it is essential that the chunk size plus its checksum is an exact multiple or divisor of the physical block size of the storage.

At all costs, avoid choosing a chunk size equal to the physical disk block size: this leaves no space to store the checksum in the same physical block.

[Figure: physical block size; correct chunk size (fits with its checksum in one physical block); incorrect chunk size (requires 2 physical blocks)]

Types of arbitrary reliability (summary)

- Plain (reliability of the service = reliability of the hardware)
- Replication
  Reliable, maximum performance, but heavy storage overhead
  Example: 3 copies, 200% overhead
  [Figure: original data 100%, storage on disk 300%; any of the 3 copies is enough to reconstruct the data]
- Double parity / Diagonal parity
  Example 4+2: can lose any 2, the remaining 4 are enough to reconstruct; only 50% storage overhead
  [Figure: data 100%, storage 150%; any 4 of the 6 chunks can reconstruct the data]
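As a rough illustration of the alignment rule for chunks and physical blocks, the Python sketch below counts how many physical blocks one chunk plus its checksum spans and checks the "exact multiple or divisor" condition. The 4 MiB block size and 256-byte checksum are assumptions chosen only to match the orders of magnitude given above, and `blocks_needed` / `is_aligned` are hypothetical helper names.

```python
# Assumed sizes for illustration only: the slide gives orders of magnitude
# (checksum: a few hundred bytes; chunk: a few megabytes), not these values.
BLOCK_SIZE = 4 * 1024 * 1024  # hypothetical physical block size (4 MiB)
CHECKSUM_SIZE = 256           # hypothetical checksum size in bytes


def blocks_needed(chunk_size: int, checksum_size: int, block_size: int) -> int:
    """Physical blocks spanned by one chunk plus its checksum, starting on a block boundary."""
    total = chunk_size + checksum_size
    return -(-total // block_size)  # ceiling division


def is_aligned(chunk_size: int, checksum_size: int, block_size: int) -> bool:
    """True if chunk + checksum is an exact multiple or divisor of the block size."""
    total = chunk_size + checksum_size
    return total % block_size == 0 or block_size % total == 0


if __name__ == "__main__":
    candidates = {
        "correct (block size minus checksum)": BLOCK_SIZE - CHECKSUM_SIZE,
        "incorrect (equal to block size)": BLOCK_SIZE,
    }
    for label, chunk in candidates.items():
        print(label,
              blocks_needed(chunk, CHECKSUM_SIZE, BLOCK_SIZE),
              is_aligned(chunk, CHECKSUM_SIZE, BLOCK_SIZE))
```

The second candidate reproduces the warning on the slide: a chunk exactly one block long pushes its checksum into a second block, so reading or writing a single chunk touches two physical blocks instead of one.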
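The overhead percentages quoted for replication and for k+m parity schemes follow from one ratio: extra stored data divided by original data. A minimal Python sketch, assuming all chunks have the same size; the helper names `overhead`, `replication` and `k_plus_m` are hypothetical.

```python
from fractions import Fraction


def overhead(data_units: int, stored_units: int) -> Fraction:
    """Extra storage relative to the original data: (stored - data) / data."""
    return Fraction(stored_units - data_units, data_units)


def replication(copies: int) -> Fraction:
    """n full copies: n units stored for every 1 unit of data."""
    return overhead(1, copies)


def k_plus_m(k: int, m: int) -> Fraction:
    """k data chunks plus m extra chunks: k + m chunks stored for k chunks of data."""
    return overhead(k, k + m)


if __name__ == "__main__":
    for label, value in [("3 copies", replication(3)), ("4+2", k_plus_m(4, 2))]:
        # e.g. 200% overhead means 300% of the data size ends up on disk
        print(f"{label}: {float(value):.0%} overhead, "
              f"{float(1 + value):.0%} of the data size on disk")
```

The same `k_plus_m` ratio applies to the other k+m examples in this summary. What distinguishes the code families is how many lost chunks they tolerate and how expensive reconstruction is, not this storage arithmetic.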
Types of arbitrary reliability (summary, continued)

- Reed-Solomon, double, triple parity, NetRaid5, NetRaid6
  Maximum reliability, minimum storage overhead
  Example 10+3: can lose any 3, the remaining 10 are enough to reconstruct; only 30% storage overhead
  [Figure: data 100%, storage 130%; any 10 of the 13 chunks are enough to reconstruct the data]
- Low Density Parity Check (LDPC) / Fountain Codes / Raptor Codes
  Excellent performance, more storage overhead
  Example 8+6: can lose any 3, the remaining 11 are enough to reconstruct; 75% storage overhead (see the example below)

Example: 8+6 LDPC

- 0 .. 7: original data
- 8 .. 13: data XOR-ed following the arrows in the graph
- Any 11 of the 14 chunks are enough to reconstruct the data, using only XOR operations (very fast)
- You are allowed to lose any 3 chunks (21%)
- Sizes: 100% (original data size), 138% (minimum size required to reconstruct), 175% (total size on disk)

[Figure: graph linking data chunks 0 .. 7 to XOR-ed chunks 8 .. 13]

Types of arbitrary reliability (summary)

- Plain (reliability of the service = reliability of the hardware)
- Replication
  Reliable, maximum performance, but heavy storage overhead
  Example: 3 copies, 200% overhead
- Reed-Solomon, double, triple parity, NetRaid5, NetRaid6
  Maximum reliability, minimum storage overhead
  Example 4+2: can lose any 2, the remaining 4 are enough to reconstruct; 50% storage overhead
  Example 10+3: can lose any 3, the remaining 10 are enough to reconstruct; 30% storage overhead
- Low Density Parity Check (LDPC) / Fountain Codes
  Excellent performance, more storage overhead
  Example 8+6: can lose any 3, the remaining 11 are enough to reconstruct; 75% storage overhead

In addition:
- File checksums (available today)
- Block-level checksums (available today)
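To make reconstruction "using only XOR operations" concrete, here is a small Python sketch of XOR-based encoding with a peeling decoder, a standard way to decode LDPC-style codes after erasures. The parity graph is hypothetical: a 4+3 toy example, not the 8+6 graph from the slide (which is not reproduced here), so it does not share that graph's tolerance of any 3 losses. The decoder repeatedly finds a parity equation with exactly one missing member and rebuilds that member as the XOR of the others.

```python
from functools import reduce
from typing import Dict, List, Optional

# Hypothetical parity graph (NOT the slide's 8+6 graph): data chunks 0..3,
# parity chunks 4..6, each parity chunk = XOR of the listed data chunks.
GRAPH: Dict[int, List[int]] = {
    4: [0, 1],
    5: [1, 2, 3],
    6: [0, 2, 3],
}
N_DATA = 4


def xor_bytes(chunks: List[bytes]) -> bytes:
    """Bytewise XOR of equally sized chunks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def encode(data: List[bytes]) -> List[bytes]:
    """Return the data chunks followed by parity chunks 4, 5, 6."""
    assert len(data) == N_DATA
    parity = [xor_bytes([data[i] for i in GRAPH[p]]) for p in sorted(GRAPH)]
    return data + parity


def decode(stored: List[Optional[bytes]]) -> List[Optional[bytes]]:
    """Peeling decoder: lost chunks are None; unrecoverable ones stay None."""
    chunks = list(stored)
    # Each equation lists chunks whose XOR is zero: a parity chunk plus its inputs.
    equations = [GRAPH[p] + [p] for p in GRAPH]
    progress = True
    while progress:
        progress = False
        for eq in equations:
            missing = [i for i in eq if chunks[i] is None]
            if len(missing) == 1:
                present = [chunks[i] for i in eq if chunks[i] is not None]
                chunks[missing[0]] = xor_bytes(present)
                progress = True
    return chunks


if __name__ == "__main__":
    stored = encode([b"AAAA", b"BBBB", b"CCCC", b"DDDD"])
    damaged: List[Optional[bytes]] = list(stored)
    damaged[0] = damaged[3] = None  # lose two data chunks
    print(decode(damaged) == stored)
```

With this toy graph every single loss is recoverable, and so is losing chunks 0 and 3 together, but losing chunks 2 and 3 together is not: every remaining equation that involves them only determines their XOR. Which loss patterns survive depends entirely on the graph; the slide states that its 8+6 graph tolerates the loss of any 3 chunks.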