Optimizations of LSM ■Rolling merge LSM/Stepped Merge often implemented on a partitioned relation Each partition size set to some max,split if over-sized Spread partitions over multiple machines Database System Concepts-7th Edition 24.7 @Silberschatz,Korth and Sudarshan
Database System Concepts - 7 24.7 ©Silberschatz, Korth and Sudarshan th Edition ▪ Rolling merge ▪ LSM/Stepped Merge often implemented on a partitioned relation • Each partition size set to some max, split if over-sized • Spread partitions over multiple machines Optimizations of LSM
Stepped Merge Index Stepped-merge index:variant of LSM tree with k trees at each level on disk Memory When all k indices exist at a level,merge them into one Disk index of next level. ·Reduces write cost compared to LSM tree But queries are even more expensive since many trees need to be queries Optimization for point lookups Compute Bloom filter for each tree and store in- memory Query a tree only if Bloom filter returns a positive result Database System Concepts-7th Edition 24.8 @Silberschatz,Korth and Sudarshan
Database System Concepts - 7 24.8 ©Silberschatz, Korth and Sudarshan th Edition Stepped Merge Index ▪ Stepped-merge index: variant of LSM tree with k trees at each level on disk • When all k indices exist at a level, merge them into one index of next level. • Reduces write cost compared to LSM tree ▪ But queries are even more expensive since many trees need to be queries ▪ Optimization for point lookups • Compute Bloom filter for each tree and store inmemory • Query a tree only if Bloom filter returns a positive result
LSM Trees (Cont.) Deletion handled by adding special "delete"entries Lookups will find both original entry and the delete entry,and must return only those entries that do not have matching delete entry When trees are merged,if we find a delete entry matching an original entry,both are dropped. Update handled using insert delete LSM trees were introduced for disk-based indices But useful to minimize erases with flash-based indices The stepped-merge variant of LSM trees is used in many BigData storage systems Google BigTable,Apache Cassandra,MongoDB And more recently in SQLite4,LevelDB,and MyRocks storage engine of MySQL Database System Concepts-7th Edition 24.9 ©Silberscha乜,Korth and Sudarshan
Database System Concepts - 7 24.9 ©Silberschatz, Korth and Sudarshan th Edition LSM Trees (Cont.) ▪ Deletion handled by adding special “delete” entries • Lookups will find both original entry and the delete entry, and must return only those entries that do not have matching delete entry • When trees are merged, if we find a delete entry matching an original entry, both are dropped. ▪ Update handled using insert + delete ▪ LSM trees were introduced for disk-based indices • But useful to minimize erases with flash-based indices • The stepped-merge variant of LSM trees is used in many BigData storage systems ▪ Google BigTable, Apache Cassandra, MongoDB ▪ And more recently in SQLite4, LevelDB, and MyRocks storage engine of MySQL
Buffer Tree Alternative to LSM tree Key idea:each internal node of B+-tree has a buffer to store inserts Inserts are moved to lower levels when buffer is full With a large buffer,many records are moved to lower level each time Per record I/O decreases correspondingly ■Benefits Less overhead on queries Can be used with any tree index structure Used in PostgreSQL Generalized Search Tree(GiST)indices Drawback:more random l/O than LSM tree Internal node Buffer Database System Concepts-7th Edition 24.10 @Silberschatz,Korth and Sudarshan
Database System Concepts - 7 24.10 ©Silberschatz, Korth and Sudarshan th Edition Buffer Tree ▪ Alternative to LSM tree ▪ Key idea: each internal node of B+ -tree has a buffer to store inserts • Inserts are moved to lower levels when buffer is full • With a large buffer, many records are moved to lower level each time • Per record I/O decreases correspondingly ▪ Benefits • Less overhead on queries • Can be used with any tree index structure • Used in PostgreSQL Generalized Search Tree (GiST) indices ▪ Drawback: more random I/O than LSM tree
Bitmap Indices Bitmap indices are a special type of index designed for efficient querying on multiple keys Records in a relation are assumed to be numbered sequentially from,say,0 Given a number n it must be easy to retrieve record n Particularly easy if records are of fixed size Applicable on attributes that take on a relatively small number of distinct values E.g.,gender,country,state,... 。 E.g.,income-level (income broken up into a small number of levels such as0-9999,10000-19999,20000-50000,50000-infinity) A bitmap is simply an array of bits Database System Concepts-7th Edition 24.11 ©Silberscha乜,Korth and Sudarshan
Database System Concepts - 7 24.11 ©Silberschatz, Korth and Sudarshan th Edition Bitmap Indices ▪ Bitmap indices are a special type of index designed for efficient querying on multiple keys ▪ Records in a relation are assumed to be numbered sequentially from, say, 0 • Given a number n it must be easy to retrieve record n ▪ Particularly easy if records are of fixed size ▪ Applicable on attributes that take on a relatively small number of distinct values • E.g., gender, country, state, … • E.g., income-level (income broken up into a small number of levels such as 0-9999, 10000-19999, 20000-50000, 50000- infinity) ▪ A bitmap is simply an array of bits