Managing XML and Semistructured Data, Part 4: Compressing XML Data
In this section
▪ XML Compression
  • Motivation
  • The State-of-the-Art
    ▪ Queriable compressors
    ▪ Non-queriable compressors

Resources
▪ XMill: An Efficient Compressor for XML Data, by Liefke and Suciu, in SIGMOD 2000
▪ Others: XGrind, XPRESS, XQueC, XMLZip, …
▪ XCQ: From my publications
▪ XQZip: From my publications
▪ MQX: From my publications
Introduction
▪ More and more XML data is created
  • Duplicate structures (tags, paths, …)
  • Data inflation: data in XML is much larger than raw data
  • Compression: storage and data transfer
▪ General-purpose compressors (e.g. gzip)
  • Characteristics of XML data not utilized
  • Unqueriable
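The data-inflation point can be checked with a small experiment. The sketch below is only an illustration: the log-like records, the tag names (`log`, `entry`, `host`, `status`, `bytes`) and the helper functions are all hypothetical, not taken from any benchmark. It writes the same records once as comma-separated raw text and once as XML, then compares sizes before and after gzip.

```python
import gzip


def make_records(n):
    # Hypothetical log-like rows: (host, status, bytes sent).
    return [(f"10.0.0.{i % 250}", 200 if i % 7 else 404, 1000 + i) for i in range(n)]


def as_raw(records):
    # Application-specific format: one comma-separated line per record.
    return "\n".join(f"{h},{s},{b}" for h, s, b in records).encode("utf-8")


def as_xml(records):
    # Same data in XML: every value is wrapped in repeated tags.
    rows = "".join(
        f"<entry><host>{h}</host><status>{s}</status><bytes>{b}</bytes></entry>"
        for h, s, b in records
    )
    return f"<log>{rows}</log>".encode("utf-8")


def sizes(n=1000):
    records = make_records(n)
    raw, xml = as_raw(records), as_xml(records)
    return {
        "raw": len(raw),
        "xml": len(xml),
        "raw_gzip": len(gzip.compress(raw)),
        "xml_gzip": len(gzip.compress(xml)),
    }


if __name__ == "__main__":
    for name, size in sizes().items():
        print(f"{name:>8}: {size} bytes")
```

Running it prints the four sizes. The XML form is several times larger than the raw form because of the repeated tags. gzip shrinks it again, but it treats the document as a flat byte stream, so the output has to be fully decompressed before any query can run on it.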
Compression: The Problem
▪ XML for exchange (space or time)
▪ But XML is verbose and inflated due to
  • Duplicated tags and paths
▪ Users prefer application-specific formats:
  • E.g. web server logs
▪ Is XML doomed to fail?
▪ Solution: XML-specific compressor
  • Non-queriable: XMill
  • Queriable: XQZip
XML-Specific Compressors
▪ Unqueriable compression (e.g. XMill):
  • Full-chunked: data commonalities eliminated
  • Very good compression ratio
▪ Queriable compression (e.g. XGrind, XPRESS):
  • Fine-grained: data commonalities ignored
  • Inadequate compression ratio and time
  • Support simple path queries with atomic predicates
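To make the "full-chunked" idea concrete, here is a minimal sketch of the general principle behind XMill-style compressors: separate the structure from the data, group data values that share a root-to-element path into one container, and compress each container as a whole chunk. It is not XMill's actual format or API. The function names, the token encoding and the sample document are assumptions made for illustration, and attributes and mixed content are ignored.

```python
import zlib
import xml.etree.ElementTree as ET


def split_into_containers(xml_text):
    """Split a document into a structure token list and per-path value containers."""
    root = ET.fromstring(xml_text)
    structure = []   # tag tokens only, no data values
    containers = {}  # path -> list of text values found at that path

    def walk(elem, path):
        path = f"{path}/{elem.tag}"
        structure.append(f"<{elem.tag}")
        text = (elem.text or "").strip()
        if text:
            containers.setdefault(path, []).append(text)
            structure.append("#")  # placeholder: "next value of this path's container"
        for child in elem:
            walk(child, path)
        structure.append(">")

    walk(root, "")
    return structure, containers


def compress_containers(structure, containers):
    # Each container is compressed as one chunk, so similar values sit together.
    # Container paths start with "/", so they cannot clash with the "structure" key.
    packed = {"structure": zlib.compress(" ".join(structure).encode("utf-8"))}
    for path, values in containers.items():
        packed[path] = zlib.compress("\n".join(values).encode("utf-8"))
    return packed


# Hypothetical sample document.
SAMPLE = (
    "<log>"
    "<entry><host>a.example</host><status>200</status></entry>"
    "<entry><host>b.example</host><status>404</status></entry>"
    "</log>"
)

if __name__ == "__main__":
    structure, containers = split_into_containers(SAMPLE)
    packed = compress_containers(structure, containers)
    for key, blob in packed.items():
        print(key, len(blob), "bytes")
```

Grouping all `host` values into one chunk and all `status` values into another is what lets the back-end compressor exploit commonalities among similar values. The price is that a whole container must be decompressed to reach a single value, which is why this style is unqueriable. Fine-grained schemes compress each value on its own, so individual items can be tested against a predicate, at the cost of a poorer compression ratio.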