OLTP VS OLAP OLTP OLAP users clerk, IT professional knowledge worker function day to day operations decision support dB design application-oriented subject-oriented current, up-to-date historical detailed. flat relational summarized multidimensional isolated integrated, consolidated usage repetitive ad-hoc access read/write lots of scans index/hash on prim. key unit of work short, simple transaction complex query ## records accessed tens millions users thousands hundreds DB size 100MB-GB 100GB-TB metric transaction throughput query throughput, response 11
11 OLTP vs. OLAP OLTP OLAP users clerk, IT professional knowledge worker function day to day operations decision support DB design application-oriented subject-oriented data current, up-to-date detailed, flat relational isolated historical, summarized, multidimensional integrated, consolidated usage repetitive ad-hoc access read/write index/hash on prim. key lots of scans unit of work short, simple transaction complex query # records accessed tens millions #users thousands hundreds DB size 100MB-GB 100GB-TB metric transaction throughput query throughput, response
Why a Separate Data Warehouse? High performance for both systems DBMS- tuned for OLTP: access methods, indexing concurrency control, recovery Warehouse-tuned for OLAP: complex oLAP queries multidimensional view consolidation Different functions and different data missing data: Decision support requires historical data which operational dBs do not typically maintain data consolidation: DS requires consolidation(aggregation, summarization) of data from heterogeneous sources data quality: different sources typically use inconsistent data representations codes and formats which have to be reconciled Note: There are more and more systems which perform OLAP analysis directly on relational databases
12 Why a Separate Data Warehouse? ◼ High performance for both systems ◼ DBMS— tuned for OLTP: access methods, indexing, concurrency control, recovery ◼ Warehouse—tuned for OLAP: complex OLAP queries, multidimensional view, consolidation ◼ Different functions and different data: ◼ missing data: Decision support requires historical data which operational DBs do not typically maintain ◼ data consolidation: DS requires consolidation (aggregation, summarization) of data from heterogeneous sources ◼ data quality: different sources typically use inconsistent data representations, codes and formats which have to be reconciled ◼ Note: There are more and more systems which perform OLAP analysis directly on relational databases
Data Warehouse:A Multi-Tiered Architecture Monitor & OLAP Server Other Metadata sources Integrator Analysis Operational Extract Query DBS Transform Data Serve Reports Warehouse Refresh Data mining Data marts Data Sources Data Storage OlaP Engine Front-End Tools
13 Data Warehouse: A Multi-Tiered Architecture Data Warehouse Extract Transform Load Refresh OLAP Engine Analysis Query Reports Data mining Monitor & Integrator Metadata Data Sources Front-End Tools Serve Data Marts Operational DBs Other sources Data Storage OLAP Server
Three data warehouse models Enterprise warehouse collects all of the information about subjects spanning the entire organization Data Mart a subset of corporate-wide data that is of value to a specific groups of users. Its scope is confined to specific selected groups, such as marketing data mart Independent vs dependent directly from warehouse )data mart Virtual warehouse a set of views over operational databases Only some of the possible summary views may be materialized 14
14 Three Data Warehouse Models ◼ Enterprise warehouse ◼ collects all of the information about subjects spanning the entire organization ◼ Data Mart ◼ a subset of corporate-wide data that is of value to a specific groups of users. Its scope is confined to specific, selected groups, such as marketing data mart ◼ Independent vs. dependent (directly from warehouse) data mart ◼ Virtual warehouse ◼ A set of views over operational databases ◼ Only some of the possible summary views may be materialized
Data Warehouse Development: A Recommended Approach Multi-Tier data Warehouse Distributed Data marts Data Data Enterprise Mart Dat Mart Warehouse Model refinement Model refinement Define a high-level corporate data model 15
15 Data Warehouse Development: A Recommended Approach Define a high-level corporate data model Data Mart Data Mart Distributed Data Marts Multi-Tier Data Warehouse Enterprise Data Warehouse Model refinement Model refinement