Structure Index Tree(SIT) Effective elimination of duplicate structures in theⅩ ML data mergE ging of nodes that have the same incoming path the same ordered set of paths of their descendants SIT Construction a linear scan of the xml document Merging of the subtree that we are constructing into its equivalent subtree in the base tree 16
16 Structure Index Tree (SIT) ▪ Effective elimination of duplicate structures in the XML data ▪ Merging of nodes that have • the same incoming path • the same ordered set of paths of their descendants ▪ SIT Construction • A linear scan of the XML document • Merging of the subtree that we are constructing into its equivalent subtree in the base tree
SIT Construction C 2 2,75,6 910 3,8,104,9 8,109
17 / d b d a b d e c c e / d a b d e c e c d c b d SIT Construction 0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 ,6 67 ,8,10 ,9 8 9 10 ,7 ,10
XQzip architecture XQzip Repository Query Compressed blocks Compressor Pars (gzIp) Query Executor Hashtable ac…b Result XML Parser Document Index Buffer Constructor Ma 6o6o Buffer Pool SIT Query Processo Index constructor construct the sit a Compressor Group semantically related items in blocks Compress each block by gzip Query Processor: evaluate query Parser Executor: apply the sIt to evaluate query Buffer Manager(By Lru)
18 XQzip Architecture Input XML Document SAX Parser Compressor (gzip) Index Constructor b1 a1 c1 a2 ... bi ck aj a c ... b a5 c7 ... b9 Parser Executor Buffer Manager SIT Hashtable Compressed blocks Query Processor Query Query Result Buffer Pool XQzip Repository ▪ Index Constructor: construct the SIT ▪ Compressor • Group semantically related items in blocks • Compress each block by gzip ▪ Query Processor: evaluate query • Parser • Executor: apply the SIT to evaluate query • Buffer Manager (By LRU)
SIT Construction Complexity N: Total number of elements in the input XML document Time Complexity Worst-case: O(N SIT D) Average-case: O(N) Space complexity Base tree and the subtree being merged: <2 SIT I Space for storing ids of eliminated nodes: O( 19
19 SIT Construction Complexity N: Total number of elements in the input XML document ▪ Time Complexity: • Worst-case: O(N │SIT │) • Average-case: O(N) ▪ Space Complexity: • Base tree and the subtree being merged: ≤ 2│SIT │ • Space for storing ids of eliminated nodes: O(N)
Data Compression a balance between full-chunked and fine grained compression a distinct data container for each distinct element Each container compressed (using gzip) into many smaller blocks Block size? Too small: query time compression ratio Too large: query time Compression ratiot Only can be determined by an empirical study
20 Data Compression ▪ A balance between full-chunked and finegrained compression • A distinct data container for each distinct element • Each container compressed (using gzip) into many smaller blocks ▪ Block size? • Too small: query time ↑compression ratio↓ • Too large: query time ↓compression ratio↑ • Only can be determined by an empirical study