December 17, 2021 Data Mining: Concepts and Techniques
Symmetric vs. Skewed Data
◼ Median, mean and mode of symmetric, positively skewed, and negatively skewed data
[Figure: three distribution curves (symmetric, positively skewed, negatively skewed) with the positions of the mean, median, and mode marked]
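The relationship between the three measures can be checked numerically. The sketch below uses three hypothetical toy samples (not taken from the slides) and Python's standard `statistics` module. The ordering it shows (mode < median < mean for positive skew, reversed for negative skew) is a common rule of thumb for unimodal data rather than a guarantee.

```python
from statistics import mean, median, mode

# Hypothetical toy samples, chosen only to illustrate the three shapes.
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
positively_skewed = [1, 1, 1, 2, 2, 3, 4, 6, 10]    # long right tail
negatively_skewed = [1, 5, 7, 8, 9, 9, 10, 10, 10]  # long left tail


def summarize(values):
    """Return (mean, median, mode) for a list of numbers."""
    return mean(values), median(values), mode(values)


for name, sample in [
    ("symmetric", symmetric),
    ("positively skewed", positively_skewed),
    ("negatively skewed", negatively_skewed),
]:
    m, md, mo = summarize(sample)
    print(f"{name}: mean={m:.2f} median={md} mode={mo}")
```

In the symmetric sample all three measures coincide; in the skewed samples the mean is pulled furthest toward the long tail.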
Measuring the Dispersion of Data
◼ Quartiles, outliers and boxplots
  ◼ Quartiles: Q1 (25th percentile), Q3 (75th percentile)
  ◼ Inter-quartile range: IQR = Q3 − Q1
  ◼ Five-number summary: min, Q1, median, Q3, max
  ◼ Boxplot: ends of the box are the quartiles; median is marked; add whiskers, and plot outliers individually
  ◼ Outlier: usually, a value more than 1.5 × IQR above Q3 or below Q1
◼ Variance and standard deviation (sample: s, population: σ)
  ◼ Variance (algebraic, scalable computation):
    $$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n}x_i^2 - \frac{1}{n}\Big(\sum_{i=1}^{n}x_i\Big)^2\right]$$
    $$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i-\mu)^2 = \frac{1}{N}\sum_{i=1}^{N}x_i^2 - \mu^2$$
  ◼ Standard deviation s (or σ) is the square root of variance s² (or σ²)
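As an illustration of these measures, here is a minimal Python sketch. The sample values are hypothetical, and the quartiles use the "inclusive" interpolation method of `statistics.quantiles`, which is only one of several conventions (other methods give slightly different Q1 and Q3). The one-pass variance functions follow the sum and sum-of-squares form of the formulas, which is what makes the computation scalable: only three running totals are needed.

```python
import math
from statistics import quantiles, variance, pvariance

# Hypothetical toy sample (assumption: not from the slides).
data = sorted([30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110])

# Quartiles under the "inclusive" convention (an assumption; conventions vary).
q1, med, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
five_number = (min(data), q1, med, q3, max(data))

# Usual 1.5 x IQR rule: flag values beyond the fences around the box.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]


def sample_variance_one_pass(values):
    """s^2 from running totals: n, sum of x, and sum of x^2."""
    n, total, total_sq = 0, 0.0, 0.0
    for x in values:
        n += 1
        total += x
        total_sq += x * x
    return (total_sq - total * total / n) / (n - 1)


def population_variance_one_pass(values):
    """sigma^2 = (1/N) * sum(x^2) - mu^2, from the same running totals."""
    n, total, total_sq = 0, 0.0, 0.0
    for x in values:
        n += 1
        total += x
        total_sq += x * x
    mu = total / n
    return total_sq / n - mu * mu


s = math.sqrt(sample_variance_one_pass(data))  # sample standard deviation
print("five-number summary:", five_number)
print("IQR:", iqr, "outliers:", outliers)
print("sample std dev:", round(s, 3))
```

The one-pass form can lose precision through cancellation when the mean is large relative to the spread, so library routines such as `statistics.variance` are a useful cross-check.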
Boxplot Analysis
◼ Five-number summary of a distribution
  ◼ Minimum, Q1, Median, Q3, Maximum
◼ Boxplot
  ◼ Data is represented with a box
  ◼ The ends of the box are at the first and third quartiles, i.e., the height of the box is the IQR
  ◼ The median is marked by a line within the box
  ◼ Whiskers: two lines outside the box extended to Minimum and Maximum
  ◼ Outliers: points beyond a specified outlier threshold, plotted individually
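The pieces of a boxplot can be computed before anything is drawn. The sketch below is one possible reading of the description above, with several assumptions: the function name `boxplot_stats` and its `threshold` parameter are hypothetical, quartiles use the "inclusive" convention of `statistics.quantiles`, and whiskers stop at the most extreme values that are not flagged as outliers (a common plotting convention when outliers are drawn individually). With no outliers, the whiskers reach the minimum and maximum.

```python
from statistics import quantiles


def boxplot_stats(values, threshold=1.5):
    """Return the quantities a boxplot displays (hypothetical helper).

    threshold is the outlier multiplier applied to the IQR (assumed 1.5).
    """
    ordered = sorted(values)
    q1, med, q3 = quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - threshold * iqr
    high_fence = q3 + threshold * iqr
    # Points inside the fences determine where the whiskers end.
    inside = [x for x in ordered if low_fence <= x <= high_fence]
    return {
        "q1": q1,                    # lower end of the box
        "median": med,               # line within the box
        "q3": q3,                    # upper end of the box
        "iqr": iqr,                  # height of the box
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [x for x in ordered if x < low_fence or x > high_fence],
    }


# Hypothetical toy sample with one unusually large value.
stats = boxplot_stats([2, 4, 4, 5, 6, 7, 8, 9, 10, 25])
for key, value in stats.items():
    print(f"{key}: {value}")
```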
Graphic Displays of Basic Statistical Descriptions
◼ Boxplot: graphic display of the five-number summary
◼ Histogram: the x-axis shows values, the y-axis represents frequencies
◼ Quantile plot: each value x_i is paired with f_i, indicating that approximately 100 f_i % of the data are ≤ x_i
◼ Quantile-quantile (q-q) plot: graphs the quantiles of one univariate distribution against the corresponding quantiles of another
◼ Scatter plot: each pair of values is treated as a pair of coordinates and plotted as a point in the plane
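The coordinates behind a quantile plot and a q-q plot can be sketched without any plotting library. Both function names below are hypothetical. The choice f_i = (i − 0.5)/n for the i-th smallest value is an assumption (one common convention, which the slide does not specify), and the q-q points are built from the "inclusive" method of `statistics.quantiles`.

```python
import math
from statistics import quantiles


def quantile_plot_points(values):
    """Pair each sorted value x_i with f_i = (i - 0.5) / n (assumed convention)."""
    ordered = sorted(values)
    n = len(ordered)
    return [((i - 0.5) / n, x) for i, x in enumerate(ordered, start=1)]


def qq_plot_points(sample_a, sample_b, n_quantiles=10):
    """Pair corresponding quantiles of two univariate samples."""
    qa = quantiles(sample_a, n=n_quantiles, method="inclusive")
    qb = quantiles(sample_b, n=n_quantiles, method="inclusive")
    return list(zip(qa, qb))


# Hypothetical toy samples: the second is the first shifted up by 5.
prices_a = [40, 43, 47, 52, 56, 60, 65, 70, 74, 80, 88, 95]
prices_b = [x + 5 for x in prices_a]

print(quantile_plot_points(prices_a)[:3])
for qa, qb in qq_plot_points(prices_a, prices_b, n_quantiles=4):
    print(qa, qb)
```

If the two samples follow the same distribution, the q-q points fall near the line y = x; a constant shift like the one here moves every point the same distance above that line.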