21 Block Size Representative datasets and queries: ▪ Datasets: • Heavy text • Light text • A mix of heavy text and light text ▪ Queries: • High Selectivity • Medium Selectivity • Low Selectivity
22 Block Size [Figure: Querying Time (sec) versus Block Size (# data records; ticks at 10, 100, 1000, 10000) for SwissProt-L, SwissProt-M, SwissProt-H, XMark-L, XMark-M, XMark-H, OMIM-L, OMIM-M, OMIM-H. Values labeled on the plot: 12.9, 13.6, 600.]
23 Structure of Compressed-Data ▪ Block size? • Determined by an empirical study • Querying Time ▪ near-optimal range : 600-1000 data items/block (average optimal: 950) • Compression Ratio ▪ Not improved much after 150 KB/block (usually contain more than 1000 items) • ≥ 1000 data items/block
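The trade-off on this slide can be explored with a small sketch. It assumes that data items are grouped into fixed-count blocks and that each block is compressed independently; `zlib` is used only as a stand-in compressor, and the names `make_blocks`, `compression_ratio`, and `read_record` are hypothetical, not taken from XQzip. Larger blocks tend to compress better, while reading a single item requires decompressing its whole block, which is why querying time favors moderate block sizes.

```python
import zlib

SEP = b"\x00"  # assumption: this byte never occurs inside a record


def make_blocks(records, items_per_block):
    """Group records into consecutive blocks and compress each block separately."""
    blocks = []
    for start in range(0, len(records), items_per_block):
        chunk = records[start:start + items_per_block]
        raw = SEP.join(r.encode("utf-8") for r in chunk)
        blocks.append(zlib.compress(raw))
    return blocks


def compression_ratio(records, items_per_block):
    """Compressed size divided by original size (smaller is better)."""
    original = sum(len(r.encode("utf-8")) for r in records)
    compressed = sum(len(b) for b in make_blocks(records, items_per_block))
    return compressed / original


def read_record(blocks, items_per_block, index):
    """Fetch one record; only the block that holds it is decompressed."""
    block = blocks[index // items_per_block]
    items = zlib.decompress(block).split(SEP)
    return items[index % items_per_block].decode("utf-8")
```

Calling `compression_ratio` over a range of block sizes (for example 10, 100, 1000, 10000) on a representative dataset, and timing `read_record` for the same sizes, gives a rough version of the empirical study described above.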
24 Outline ▪ Introduction ▪ XQzip [EDBT 2004] • Indexing • Data Compression • Query Evaluation • Performance Evaluation ▪ Conclusion
25 XQzip Query Coverage ▪ All XPath axes except the sideways axes (e.g. preceding-sibling, following-sibling) ▪ Multiple and nested predicates • and / or / not expressions ▪ Aggregations: sum, count, average, max, min ▪ Group queries: e.g. (L1 (L2 + L3 + L4)) • L1 : //a[b = “Crete”] (prefix) L2 : c • L3 : d[f/count() >100] L4 : e[//g]
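The group query (L1 (L2 + L3 + L4)) can be illustrated with a minimal sketch that evaluates the shared prefix L1 once and then runs the three sub-queries relative to each match. It uses Python's standard `xml.etree.ElementTree` on an uncompressed document, not XQzip's index or compressed blocks. The function name `group_query` is hypothetical, and the readings of L3 ("d with more than 100 f children") and L4 ("e with at least one descendant g") are assumptions about the slide's shorthand.

```python
import xml.etree.ElementTree as ET


def group_query(root):
    """Evaluate L1(L2 + L3 + L4), sharing the prefix L1 = //a[b = "Crete"]."""
    results = {"L2": [], "L3": [], "L4": []}
    # L1: the common prefix is evaluated only once.
    for a in root.findall(".//a[b='Crete']"):
        # L2: c
        results["L2"].extend(a.findall("c"))
        # L3: d[f/count() > 100], assumed to mean more than 100 f children
        results["L3"].extend(
            d for d in a.findall("d") if len(d.findall("f")) > 100
        )
        # L4: e[//g], assumed to mean at least one descendant g
        results["L4"].extend(
            e for e in a.findall("e") if e.find(".//g") is not None
        )
    return results
```

Sharing the prefix means the `//a[b = "Crete"]` step is not repeated for each of L2, L3, and L4, which is the point of grouping the queries.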