School of Software Engineering, Tongji University
Attributes
◼ Attribute (or dimension, feature, variable): a data field representing a characteristic or feature of a data object.
  ◆ E.g., customer_ID, name, address
◼ Types:
  ◆ Nominal
  ◆ Binary
  ◆ Ordinal
  ◆ Numeric: quantitative
    □ Interval-scaled
    □ Ratio-scaled
Attribute Types
◼ Nominal: categories, states, or "names of things"
  ◆ Hair_color = {auburn, black, blond, brown, grey, red, white}
  ◆ Marital status, occupation, ID numbers, zip codes
◼ Binary
  ◆ Nominal attribute with only 2 states (0 and 1)
  ◆ Symmetric binary: both outcomes equally important
    □ E.g., gender
  ◆ Asymmetric binary: outcomes not equally important
    □ E.g., medical test (positive vs. negative)
    □ Convention: assign 1 to the most important outcome (e.g., HIV positive)
◼ Ordinal
  ◆ Values have a meaningful order (ranking), but the magnitude between successive values is not known.
  ◆ Size = {small, medium, large}, grades, army rankings
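The three categorical types above differ mainly in which operations make sense on their values. The sketch below is one possible way to encode them in Python; the helper names (`one_hot`, `encode_asymmetric_binary`, `encode_ordinal`) and the constant `SIZE_ORDER` are illustrative assumptions, not part of the slides.

```python
# Illustrative encodings for nominal, binary, and ordinal attributes.
# All names here are hypothetical helpers chosen for this sketch.

SIZE_ORDER = ["small", "medium", "large"]  # assumed ranking for an ordinal attribute


def one_hot(value, categories):
    """Nominal: categories have no order, so use one indicator per category."""
    return [1 if value == c else 0 for c in categories]


def encode_asymmetric_binary(result, important="positive"):
    """Asymmetric binary: by convention the more important outcome is coded 1."""
    return 1 if result == important else 0


def encode_ordinal(value, order=SIZE_ORDER):
    """Ordinal: map a value to its rank; ranks can be compared but not subtracted meaningfully."""
    return order.index(value)


hair = one_hot("black", ["auburn", "black", "blond"])
test_flag = encode_asymmetric_binary("positive")
rank_small = encode_ordinal("small")
rank_large = encode_ordinal("large")
```

Only the comparison `rank_small < rank_large` is meaningful for the ordinal codes; the difference between two ranks says nothing about how much larger "large" is than "small".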
Numeric Attribute Types
◼ Quantity (integer or real-valued)
◼ Interval
  □ Measured on a scale of equal-sized units
  □ Values have order
    ➢ E.g., temperature in °C or °F, calendar dates
  □ No true zero-point
◼ Ratio
  □ Inherent zero-point
  □ We can speak of a value as being a multiple (or ratio) of another value (e.g., 10 K is twice as high as 5 K).
    ➢ E.g., temperature in Kelvin, length, counts, monetary quantities
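The practical difference between interval and ratio scales can be checked with a short calculation. The sketch below, using a hypothetical helper `celsius_to_kelvin`, shows that differences survive the change of scale while ratios are only meaningful once the values are on a scale with a true zero-point.

```python
# Interval vs. ratio scale, illustrated with temperature.
# celsius_to_kelvin is a helper defined for this sketch only.

def celsius_to_kelvin(c):
    """Convert degrees Celsius (interval scale) to Kelvin (ratio scale)."""
    return c + 273.15


c_low, c_high = 5.0, 10.0
k_low, k_high = celsius_to_kelvin(c_low), celsius_to_kelvin(c_high)

# Differences are meaningful on both scales and agree with each other.
diff_c = c_high - c_low
diff_k = k_high - k_low

# The Celsius ratio is arithmetically 2.0 but has no physical meaning,
# because 0 °C is not a true zero-point.
naive_ratio = c_high / c_low

# The Kelvin ratio is meaningful: 10 °C is only about 1.8% "hotter" than 5 °C.
true_ratio = k_high / k_low

# On the Kelvin scale itself, 10 K really is twice 5 K.
kelvin_ratio = 10.0 / 5.0
```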
Discrete vs. Continuous Attributes
◼ Discrete Attribute
  ◆ Has only a finite or countably infinite set of values
    □ E.g., zip codes, profession, or the set of words in a collection of documents
  ◆ Sometimes represented as integer variables
  ◆ Note: binary attributes are a special case of discrete attributes
◼ Continuous Attribute
  ◆ Has real numbers as attribute values
    □ E.g., temperature, height, or weight
  ◆ In practice, real values can only be measured and represented using a finite number of digits
  ◆ Continuous attributes are typically represented as floating-point variables
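Both representation choices mentioned above can be seen in a few lines of Python. This is a minimal sketch: the two-document collection `docs` is made-up sample data, and `word_id` is a hypothetical name for the integer mapping.

```python
import math

# Discrete: the set of words in a (made-up) document collection is finite,
# so each word can be represented by an integer id.
docs = ["data mining", "data objects"]
vocabulary = sorted({word for doc in docs for word in doc.split()})
word_id = {word: i for i, word in enumerate(vocabulary)}

# Continuous: real values are stored as floating-point numbers with a
# finite number of digits, so exact equality tests can fail.
total = 0.1 + 0.2
exactly_equal = (total == 0.3)                # False: rounding error in the stored digits
approximately_equal = math.isclose(total, 0.3)  # True: compare within a tolerance instead
```

For this reason, continuous attributes are usually compared with a tolerance rather than with `==`.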
Getting to Know Your Data
◼ Data Objects and Attribute Types
◼ Basic Statistical Descriptions of Data
◼ Data Visualization
◼ Measuring Data Similarity and Dissimilarity
◼ Summary