Multidimensional Data and Warehouse Schemas Data in warehouses can usually be divided into Fact tables,which are large E.g,sales(item_id,store_id,customer_id,date,number,price) Dimension tables,which are relatively small Store extra information about stores,items,etc. Attributes of fact tables can be usually viewed as ·Measure attributes measure some value,and can be aggregated upon e.g.,the attributes numberor price of the sales relation Dimension attributes dimensions on which measure attributes are viewed e.g.,attributes item_id,color,and size of the sales relation Usually small ids that are foreign keys to dimension tables Database System Concepts-7th Edition 11.12 ©Silberscha乜,Korth and Sudarshan
Database System Concepts - 7 11.12 ©Silberschatz, Korth and Sudarshan th Edition Multidimensional Data and Warehouse Schemas ▪ Data in warehouses can usually be divided into • Fact tables, which are large ▪ E.g, sales(item_id, store_id, customer_id, date, number, price) • Dimension tables, which are relatively small ▪ Store extra information about stores, items, etc. ▪ Attributes of fact tables can be usually viewed as • Measure attributes ▪ measure some value, and can be aggregated upon ▪ e.g., the attributes number or price of the sales relation • Dimension attributes ▪ dimensions on which measure attributes are viewed ▪ e.g., attributes item_id, color, and size of the sales relation ▪ Usually small ids that are foreign keys to dimension tables
Data Warehouse Schema item info store item id store id itemname city color sales state size country category item _id store id customer_id date customer number customer id date info price name date street month city quarter state year zipcode country Database System Concepts-7th Edition 11.13 @Silberschatz,Korth and Sudarshan
Database System Concepts - 7 11.13 ©Silberschatz, Korth and Sudarshan th Edition Data Warehouse Schema
Multidimensional Data and Warehouse Schemas Resultant schema is called a star schema More complicated schema structures Snowflake schema:multiple levels of dimension tables May have multiple fact tables Typically fact table joined with dimension tables and then group-by on dimension table attributes,and then aggregation on measure attributes of fact table Some applications do not find it worthwhile to bring data to a common schema Data lakes are repositories which allow data to be stored in multiple formats,without schema integration Less upfront effort,but more effort during querying Database System Concepts-7th Edition 11.14 ©Silberscha乜,Korth and Sudarshan
Database System Concepts - 7 11.14 ©Silberschatz, Korth and Sudarshan th Edition Multidimensional Data and Warehouse Schemas ▪ Resultant schema is called a star schema • More complicated schema structures ▪ Snowflake schema: multiple levels of dimension tables ▪ May have multiple fact tables ▪ Typically • fact table joined with dimension tables and then • group-by on dimension table attributes, and then • aggregation on measure attributes of fact table ▪ Some applications do not find it worthwhile to bring data to a common schema • Data lakes are repositories which allow data to be stored in multiple formats, without schema integration • Less upfront effort, but more effort during querying
Database Support for Data Warehouses Data in warehouses usually append only,not updated Can avoid concurrency control overheads Data warehouses often use column-oriented storage E.g.,a sequence of sales tuples is stored as follows Values of item id attribute are stored as an array Values of store id attribute are stored as an array, And so on Arrays are compressed,reducing storage,1O and memory costs significantly 。 Queries can fetch only attributes that they care about,reducing lO and memory cost More details in Section 13.6 国 Data warehouses often use parallel storage and query processing infrastructure Distributed file systems,Map-Reduce,Hive,... Database System Concepts-7th Edition 11.15 ©Silberscha乜,Korth and Sudarshan
Database System Concepts - 7 11.15 ©Silberschatz, Korth and Sudarshan th Edition Database Support for Data Warehouses ▪ Data in warehouses usually append only, not updated • Can avoid concurrency control overheads ▪ Data warehouses often use column-oriented storage • E.g., a sequence of sales tuples is stored as follows ▪ Values of item_id attribute are stored as an array ▪ Values of store_id attribute are stored as an array, ▪ And so on • Arrays are compressed, reducing storage, IO and memory costs significantly • Queries can fetch only attributes that they care about, reducing IO and memory cost • More details in Section 13.6 ▪ Data warehouses often use parallel storage and query processing infrastructure • Distributed file systems, Map-Reduce, Hive, …
OLAP
OLAP