Properties of Normal Distribution Curve
◼ The normal (distribution) curve
◼ From μ−σ to μ+σ: contains about 68% of the measurements (μ: mean, σ: standard deviation)
◼ From μ−2σ to μ+2σ: contains about 95% of the measurements
◼ From μ−3σ to μ+3σ: contains about 99.7% of the measurements
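The three percentages can be checked numerically. The sketch below is illustrative and not part of the original slides: it assumes an exactly normal distribution and uses the standard identity that the probability of falling within k standard deviations of the mean is erf(k/√2). The function name `normal_coverage` is hypothetical.

```python
import math

def normal_coverage(k: float) -> float:
    """Probability that a normally distributed value lies within
    k standard deviations of the mean (assumes an exact normal curve)."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    # Prints the share of measurements between mu - k*sigma and mu + k*sigma.
    print(f"within {k} sigma: {normal_coverage(k):.4f}")
```

The values come out near 0.68, 0.95, and 0.997, matching the rounded figures on the slide. Real data that is only approximately normal will deviate from these proportions.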
Graphic Displays of Basic Statistical Descriptions
◼ Boxplot: graphic display of the five-number summary
◼ Histogram: the x-axis shows values, the y-axis represents frequencies
◼ Quantile plot: each value xi is paired with fi, indicating that approximately 100 fi% of the data are ≤ xi
◼ Quantile-quantile (q-q) plot: graphs the quantiles of one univariate distribution against the corresponding quantiles of another
◼ Scatter plot: each pair of values is treated as a pair of coordinates and plotted as a point in the plane
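As a rough illustration of the q-q plot idea, the sketch below pairs the quartiles of one sample with the corresponding quartiles of another. It is not from the slides: the helper name `qq_pairs` and the sample data are hypothetical, and the "inclusive" quantile method of Python's `statistics.quantiles` is only one of several conventions.

```python
import statistics

def qq_pairs(sample_a, sample_b, n=4):
    """Return (quantile of a, quantile of b) pairs at the same n-1 cut points.
    Plotting these pairs as points gives a simple q-q plot."""
    qa = statistics.quantiles(sample_a, n=n, method="inclusive")
    qb = statistics.quantiles(sample_b, n=n, method="inclusive")
    return list(zip(qa, qb))

# Hypothetical data: sample_b is sample_a with every value doubled.
sample_a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
sample_b = [2 * x for x in sample_a]
print(qq_pairs(sample_a, sample_b))
```

Here every pair lies on the line y = 2x, which suggests the two samples have the same shape but a different scale. Points that bend away from a straight line would indicate differently shaped distributions.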
Histogram Analysis
◼ Histogram: graph display of tabulated frequencies, shown as bars
◼ It shows what proportion of cases fall into each of several categories
◼ Differs from a bar chart in that it is the area of the bar that denotes the value, not the height as in bar charts, a crucial distinction when the categories are not of uniform width
◼ The categories are usually specified as non-overlapping intervals of some variable. The categories (bars) must be adjacent
[Figure: histogram with x-axis ticks from 10000 to 90000 and y-axis frequencies from 0 to 40]
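The area-versus-height distinction can be made concrete with bins of unequal width. The following is a minimal sketch, not the slides' own method: the function name `histogram_heights`, the bin edges, and the data are all hypothetical, and the bar height is taken as count divided by bin width so that each bar's area equals its count.

```python
def histogram_heights(values, edges):
    """Count values per bin and return (counts, heights), where
    height = count / bin width, so bar area equals the count.
    Bins are [edges[i], edges[i+1]); the last bin also includes its right edge."""
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for v in values:
        for i in range(n_bins):
            is_last = i == n_bins - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break  # adjacent, non-overlapping bins: at most one match
    widths = [edges[i + 1] - edges[i] for i in range(n_bins)]
    heights = [c / w for c, w in zip(counts, widths)]
    return counts, heights

# Hypothetical data with one bin twice as wide as the others.
values = [1, 2, 3, 4, 12, 15, 18, 19]
edges = [0, 5, 10, 20]
counts, heights = histogram_heights(values, edges)
print(counts, heights)
```

The first and last bins hold the same number of values, but the wider last bin gets a bar half as tall. Drawing both bars at the same height would overstate the density of the wider interval.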
Histograms Often Tell More than Boxplots
◼ The two histograms shown on the left may have the same boxplot representation
◼ That is, the same values for: min, Q1, median, Q3, max
◼ But they have rather different data distributions
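A small made-up example shows how this can happen. The two datasets below are hypothetical (they are not the ones pictured on the slide), and quartiles are computed with the "inclusive" method of `statistics.quantiles`, which is an assumption since quartile conventions differ.

```python
import statistics
from collections import Counter

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using inclusive quartiles."""
    ordered = sorted(data)
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return (ordered[0], q1, median, q3, ordered[-1])

# Hypothetical data: one set clusters around 3 and 7, the other around 5.
two_peaks = [1, 3, 3, 3, 5, 7, 7, 7, 9]
one_peak = [1, 1, 3, 5, 5, 5, 7, 9, 9]

print(five_number_summary(two_peaks))
print(five_number_summary(one_peak))
# The frequency tables (what a histogram would draw) are clearly different.
print(sorted(Counter(two_peaks).items()))
print(sorted(Counter(one_peak).items()))
```

Both datasets yield the same five numbers, so their boxplots would be identical, yet the frequency counts reveal two peaks in one and a single central peak in the other.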
Data Mining: Concepts and Techniques
Quantile Plot
◼ Displays all of the data (allowing the user to assess both the overall behavior and unusual occurrences)
◼ Plots quantile information
◼ For data xi sorted in increasing order, fi indicates that approximately 100 fi% of the data are below or equal to the value xi
[Figure: quantile plot of data values (y-axis) against f-value (x-axis, 0.000 to 1.000)]
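The points of a quantile plot can be computed in a few lines. The slide does not give a formula for fi, so the sketch below assumes the common convention fi = (i − 0.5)/N for the i-th smallest of N values. The function name `quantile_plot_points` and the data are hypothetical.

```python
def quantile_plot_points(data):
    """Return (f_i, x_i) pairs for a quantile plot.
    Assumes f_i = (i - 0.5) / N for the i-th smallest value (1-based)."""
    ordered = sorted(data)
    n = len(ordered)
    return [((i - 0.5) / n, x) for i, x in enumerate(ordered, start=1)]

# Hypothetical data; each pair is one point, with f on the x-axis.
for f, x in quantile_plot_points([40, 10, 30, 20]):
    print(f"{f:.3f} -> {x}")
```

Plotting x against f shows every observation, so both the overall trend and any unusually large or small values remain visible.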