Chapter 2: Getting to Know Your Data
◼ Data Objects and Attribute Types
◼ Basic Statistical Descriptions of Data
◼ Data Visualization
◼ Measuring Data Similarity and Dissimilarity
◼ Summary

Types of Data Sets
◼ Record
    ◼ Relational records
    ◼ Data matrix, e.g., numerical matrix, crosstabs
    ◼ Document data: text documents represented as term-frequency vectors
    ◼ Transaction data
◼ Graph and network
    ◼ World Wide Web
    ◼ Social or information networks
    ◼ Molecular structures
◼ Ordered
    ◼ Video data: sequence of images
    ◼ Temporal data: time series
    ◼ Sequential data: transaction sequences
    ◼ Genetic sequence data
◼ Spatial, image and multimedia
    ◼ Spatial data: maps
    ◼ Image data
    ◼ Video data

Document data (term-frequency vectors):

            team  coach  play  ball  score  game  win  lost  timeout  season
Document 1     3      0     5     0      2     6    0     2        0       2
Document 2     0      7     0     2      1     0    0     3        0       0
Document 3     0      1     0     0      1     2    2     0        3       0

Transaction data:

TID  Items
1    Bread, Coke, Milk
2    Beer, Bread
3    Beer, Coke, Diaper, Milk
4    Beer, Bread, Diaper, Milk
5    Coke, Diaper, Milk
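
The two record-style examples on this slide, term-frequency vectors and transaction data, can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `term_frequency_vector`, the five-term vocabulary, and the sample sentence are assumptions made up for the example, while the transactions are the five listed on the slide.

```python
from collections import Counter


def term_frequency_vector(text, vocabulary):
    """Count how often each vocabulary term occurs in a whitespace-split text."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]


# Hypothetical vocabulary and document, invented for illustration only.
vocabulary = ["team", "coach", "play", "ball", "score"]
document = "The coach told the team to play the ball and the team did score"

vector = term_frequency_vector(document, vocabulary)
print(dict(zip(vocabulary, vector)))

# Transaction data from the slide: each TID maps to a set of items.
transactions = {
    1: {"Bread", "Coke", "Milk"},
    2: {"Beer", "Bread"},
    3: {"Beer", "Coke", "Diaper", "Milk"},
    4: {"Beer", "Bread", "Diaper", "Milk"},
    5: {"Coke", "Diaper", "Milk"},
}

# Number of transactions that contain each item.
item_counts = Counter(item for items in transactions.values() for item in items)
print(item_counts["Milk"])
```

A document becomes a fixed-length vector only relative to a chosen vocabulary, which is why the vocabulary is passed in explicitly. Transactions are kept as sets because their lengths vary from one TID to the next.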

Important Characteristics of Structured Data
◼ Dimensionality
    ◼ Curse of dimensionality
◼ Sparsity
    ◼ Only presence counts
◼ Resolution
    ◼ Patterns depend on the scale
◼ Distribution
    ◼ Centrality and dispersion
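
Dimensionality and sparsity can be measured directly on a data matrix. The sketch below reuses the three document rows from the Types of Data Sets slide. The variable names and the dictionary-based sparse form are assumptions chosen for illustration, not a prescribed representation.

```python
# Document-term counts from the "Types of Data Sets" slide
# (rows are Document 1 to Document 3; columns are the ten terms).
matrix = [
    [3, 0, 5, 0, 2, 6, 0, 2, 0, 2],
    [0, 7, 0, 2, 1, 0, 0, 3, 0, 0],
    [0, 1, 0, 0, 1, 2, 2, 0, 3, 0],
]

# Dimensionality: the number of attributes (columns) per data object.
dimensionality = len(matrix[0])

# Sparsity: the fraction of cells that are zero.
total_cells = len(matrix) * dimensionality
zero_cells = sum(value == 0 for row in matrix for value in row)
sparsity = zero_cells / total_cells

# "Only presence counts": keep just the non-zero entries, keyed by column index.
sparse_rows = [
    {col: value for col, value in enumerate(row) if value != 0}
    for row in matrix
]

print(dimensionality, sparsity)  # 10 0.5
print(sparse_rows[1])            # {1: 7, 3: 2, 4: 1, 7: 3}
```

Half the cells here are zero. Real document-term matrices are typically far sparser, which is what makes a store-only-the-non-zeros layout worthwhile.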

Data Objects
◼ Data sets are made up of data objects.
◼ A data object represents an entity.
◼ Examples:
    ◼ sales database: customers, store items, sales
    ◼ medical database: patients, treatments
    ◼ university database: students, professors, courses
◼ Also called samples, examples, instances, data points, objects, tuples.
◼ Data objects are described by attributes.
◼ Database rows -> data objects; columns -> attributes
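
The rows-to-objects, columns-to-attributes mapping can be sketched with a tiny in-memory table. The customer records below are hypothetical values invented for the example; only the idea that each row is one data object described by the column attributes comes from the slide.

```python
# Hypothetical customer table: the column names are illustrative attributes,
# and every value is invented.
columns = ["customer_ID", "name", "address"]
rows = [
    (1, "Ada", "12 Oak St"),
    (2, "Grace", "34 Elm St"),
    (3, "Alan", "56 Pine St"),
]

# Each database row becomes one data object: a mapping from attribute to value.
data_objects = [dict(zip(columns, row)) for row in rows]

# The attributes describing every object are exactly the table's columns.
attributes = list(data_objects[0].keys())

print(len(data_objects), "data objects described by", attributes)
print(data_objects[1]["name"])
```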

Attributes
◼ Attribute (or dimension, feature, variable): a data field representing a characteristic or feature of a data object.
    ◼ E.g., customer_ID, name, address
◼ Types:
    ◼ Nominal
    ◼ Binary
    ◼ Numeric: quantitative
        ◼ Interval-scaled
        ◼ Ratio-scaled
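
One way to make the attribute taxonomy explicit in code is to tag each attribute with its type. This is a sketch under stated assumptions: the `AttributeType` enum, the `is_numeric` helper, and the example schema are hypothetical names introduced here, and the type assigned to each sample attribute is an illustrative choice.

```python
from enum import Enum


class AttributeType(Enum):
    """Attribute types listed on the slide."""
    NOMINAL = "nominal"
    BINARY = "binary"
    INTERVAL_SCALED = "interval-scaled"
    RATIO_SCALED = "ratio-scaled"


# On the slide, "Numeric: quantitative" covers the two scaled types.
NUMERIC_TYPES = {AttributeType.INTERVAL_SCALED, AttributeType.RATIO_SCALED}


def is_numeric(attribute_type):
    """Return True for interval-scaled and ratio-scaled attributes."""
    return attribute_type in NUMERIC_TYPES


# Hypothetical schema: attribute names and their assigned types are examples.
schema = {
    "customer_ID": AttributeType.NOMINAL,
    "is_member": AttributeType.BINARY,
    "temperature_celsius": AttributeType.INTERVAL_SCALED,
    "purchase_amount": AttributeType.RATIO_SCALED,
}

numeric_attributes = [name for name, kind in schema.items() if is_numeric(kind)]
print(numeric_attributes)
```

Note that customer_ID is tagged nominal even though it may be stored as a number: its values act as labels, so arithmetic on them carries no meaning.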