Extraction, Transformation, and Loading (ETL) Data extraction get data from multiple heterogeneous and external Sources Data cleaning detect errors in the data and rectify them when possible Data transformation convert data from legacy or host format to warehouse format Load sort, summarize consolidate compute views, check integrity, and build indicies and partitions Refresh propagate the updates from the data sources to the warehouse 16
16 Extraction, Transformation, and Loading (ETL) ◼ Data extraction ◼ get data from multiple, heterogeneous, and external sources ◼ Data cleaning ◼ detect errors in the data and rectify them when possible ◼ Data transformation ◼ convert data from legacy or host format to warehouse format ◼ Load ◼ sort, summarize, consolidate, compute views, check integrity, and build indicies and partitions ◼ Refresh ◼ propagate the updates from the data sources to the warehouse
Metadata Repository Meta data is the data defining warehouse objects. It stores Description of the structure of the data warehouse schema, view dimensions, hierarchies derived data definition data mart locations and contents Operational meta-data data lineage(history of migrated data and transformation path), currency of data(active, archived or purged), monitoring information (warehouse usage statistics error reports, audit trails The algorithms used for summarization The mapping from operational environment to the data warehouse Data related to system performance warehouse schema, view and derived data definitions Business data business terms and definitions ownership of data, charging policies 17
17 Metadata Repository ◼ Meta data is the data defining warehouse objects. It stores: ◼ Description of the structure of the data warehouse ◼ schema, view, dimensions, hierarchies, derived data definition, data mart locations and contents ◼ Operational meta-data ◼ data lineage (history of migrated data and transformation path), currency of data (active, archived, or purged), monitoring information (warehouse usage statistics, error reports, audit trails) ◼ The algorithms used for summarization ◼ The mapping from operational environment to the data warehouse ◼ Data related to system performance ◼ warehouse schema, view and derived data definitions ◼ Business data ◼ business terms and definitions, ownership of data, charging policies
From Tables and spreadsheets to Data cubes a data warehouse is based on a multidimensional data model Which views data in the form of a data cube a data cube, such as sales allows data to be modeled and viewed in multiple dimensions Dimension tables, such as item(itemname, brand type),or time(day week, month, quarter, year) Fact table contains measures(such as dollars_ sold) and keys to each of the related dimension tables In data warehousing literature, an n-d base cube is called a base cuboid. The top most 0-d cuboid, which holds the highest-level of summarization, is called the apex cuboid. The lattice of cuboids forms a data cube
18 From Tables and Spreadsheets to Data Cubes ◼ A data warehouse is based on a multidimensional data model which views data in the form of a data cube ◼ A data cube, such as sales, allows data to be modeled and viewed in multiple dimensions ◼ Dimension tables, such as item (item_name, brand, type), or time(day, week, month, quarter, year) ◼ Fact table contains measures (such as dollars_sold) and keys to each of the related dimension tables ◼ In data warehousing literature, an n-D base cube is called a base cuboid. The top most 0-D cuboid, which holds the highest-level of summarization, is called the apex cuboid. The lattice of cuboids forms a data cube
Cube:a lattice of cuboids o-D(apex)cuboid time Iter location supplier 1-D cuboids timle, locatioA Citemlocation location, supplier 2-D cuboids timelsupplier item supplier fiRe, location, supplier 3-D cuboids time, item location time, item, supplier item, location, supplier 4-D(base)cuboid time, item, location, supplier
19 Cube: A Lattice of Cuboids time,item time,item,location time, item, location, supplier all time item location supplier time,location time,supplier item,location item,supplier location,supplier time,item,supplier time,location,supplier item,location,supplier 0-D (apex) cuboid 1-D cuboids 2-D cuboids 3-D cuboids 4-D (base) cuboid