Figure 1.8 repeats the scatter plot for the original variables but with males marked by solid circles and females by open circles. Clearly, males are typically larger than females.

[Figure 1.8 3D scatter plot of male and female lizards.]

p Points in n Dimensions. The n observations of the p variables can also be regarded as p points in n-dimensional space. Each column of X determines one of the points. The ith column, [x1i, x2i, ..., xni]', consisting of all n measurements on the ith variable, determines the ith point. In Chapter 3, we show how the closeness of points in n dimensions can be related to measures of association between the corresponding variables.

1.4 Data Displays and Pictorial Representations

The rapid development of powerful personal computers and workstations has led to a proliferation of sophisticated statistical software for data analysis and graphics. It is often possible, for example, to sit at one's desk and examine the nature of multidimensional data with clever computer-generated pictures. These pictures are valuable aids in understanding data and often prevent many false starts and subsequent inferential problems.

As we shall see in Chapters 8 and 12, there are several techniques that seek to represent p-dimensional observations in few dimensions such that the original distances (or similarities) between pairs of observations are (nearly) preserved. In general, if multidimensional observations can be represented in two dimensions, then outliers, relationships, and distinguishable groupings can often be discerned by eye. We shall discuss and illustrate several methods for displaying multivariate data in two dimensions. One good source for more discussion of graphical methods is [11].
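The "p points in n dimensions" view described earlier can be made concrete with a small sketch. It assumes that each column is first centered at its mean; under that assumption, the cosine of the angle between two column vectors equals the sample correlation of the corresponding variables, which is one way in which closeness of points in n dimensions relates to association. The data values and the function names (`column`, `deviations`, `cosine`, `sample_correlation`) are invented for illustration and do not come from the text.

```python
import math

# Hypothetical data matrix X: n = 5 observations (rows) on p = 2 variables
# (columns). The numbers are invented for illustration only.
X = [
    [42.0, 4.0],
    [52.0, 5.0],
    [48.0, 4.0],
    [58.0, 3.0],
    [50.0, 4.0],
]


def column(X, i):
    """Return the ith column of X: one point in n-dimensional space."""
    return [row[i] for row in X]


def deviations(v):
    """Center a column at its mean (assumption: we compare centered columns)."""
    mean = sum(v) / len(v)
    return [a - mean for a in v]


def cosine(u, v):
    """Cosine of the angle between two vectors in n dimensions."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def sample_correlation(x, y):
    """Usual sample correlation, computed from variances and the covariance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    s_xx = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    s_yy = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    return s_xy / math.sqrt(s_xx * s_yy)


x1, x2 = column(X, 0), column(X, 1)
angle_cosine = cosine(deviations(x1), deviations(x2))
r12 = sample_correlation(x1, x2)
print(angle_cosine, r12)  # the two numbers agree up to rounding error
```

Two centered columns that point in nearly the same direction correspond to strongly positively correlated variables; nearly perpendicular columns correspond to nearly uncorrelated variables.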
Chapter 1 Aspects of Multivariate Analysis

Linking Multiple Two-Dimensional Scatter Plots

One of the more exciting new graphical procedures involves electronically connecting many two-dimensional scatter plots.

Example 1.8 (Linked scatter plots and brushing) To illustrate linked two-dimensional scatter plots, we refer to the paper-quality data in Table 1.2. These data represent measurements on the variables x1 = density, x2 = strength in the machine direction, and x3 = strength in the cross direction. Figure 1.9 shows two-dimensional scatter plots for pairs of these variables organized as a 3 × 3 array. For example, the picture in the upper left-hand corner of the figure is a scatter plot of the pairs of observations (x1, x3). That is, the x1 values are plotted along the horizontal axis, and the x3 values are plotted along the vertical axis. The lower right-hand corner of the figure contains a scatter plot of the observations (x3, x1). That is, the axes are reversed. Corresponding interpretations hold for the other scatter plots in the figure. Notice that the variables and their three-digit ranges are indicated in the boxes along the SW-NE diagonal.

The operation of marking (selecting) the obvious outlier in the (x1, x3) scatter plot of Figure 1.9 creates Figure 1.10(a), where the outlier is labeled as specimen 25 and the same data point is highlighted in all the scatter plots. Specimen 25 also appears to be an outlier in the (x1, x2) scatter plot but not in the (x2, x3) scatter plot. The operation of deleting this specimen leads to the modified scatter plots of Figure 1.10(b).

From Figure 1.10, we notice that some points in, for example, the (x2, x3) scatter plot seem to be disconnected from the others. Selecting these points, using the (dashed) rectangle (see page 22), highlights the selected points in all of the other scatter plots and leads to the display in Figure 1.11(a).
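The linking described above can be sketched in a few lines: a selection made in one panel is stored as a set of observation indices, and every other panel highlights the same indices. The sketch below is only an illustration under stated assumptions. The data are invented (they are not the values of Table 1.2), the function names `select_rectangle`, `linked_panels`, and `brush` are hypothetical, and no actual drawing is done; each panel is represented by the lists of points it would highlight or leave plain.

```python
from itertools import permutations

# Hypothetical measurements (x1 = density, x2 = machine strength,
# x3 = cross strength). These are NOT the values of Table 1.2.
DATA = [
    (0.80, 121.4, 70.4),
    (0.82, 127.7, 72.5),
    (0.84, 129.2, 78.2),
    (0.80, 131.8, 74.9),
    (0.76, 110.0, 52.0),
    (0.97, 113.0, 49.0),
]


def select_rectangle(data, i, j, i_range, j_range):
    """Indices of observations inside a rectangle drawn in the (x_i, x_j) panel."""
    (i_lo, i_hi), (j_lo, j_hi) = i_range, j_range
    return {
        k for k, row in enumerate(data)
        if i_lo <= row[i] <= i_hi and j_lo <= row[j] <= j_hi
    }


def linked_panels(data, selected):
    """For every off-diagonal panel, split the points into highlighted and plain.

    The same index set is used in every panel, which is what links the plots.
    """
    p = len(data[0])
    panels = {}
    for i, j in permutations(range(p), 2):
        panels[(i, j)] = {
            "highlighted": [(r[i], r[j]) for k, r in enumerate(data) if k in selected],
            "plain": [(r[i], r[j]) for k, r in enumerate(data) if k not in selected],
        }
    return panels


def brush(data, i, width, starts):
    """Slide a window of the given width along variable x_i.

    Yields one set of selected indices (a snapshot) per window position.
    """
    for lo in starts:
        yield {k for k, row in enumerate(data) if lo <= row[i] <= lo + width}


# Select a rectangle in the (x2, x3) panel; variables are indexed from 0.
selected = select_rectangle(DATA, 1, 2, (105.0, 115.0), (45.0, 55.0))
panels = linked_panels(DATA, selected)

# Move a brush along x1 and record a snapshot at each position.
snapshots = list(brush(DATA, 0, 0.05, [0.745, 0.795, 0.845]))
```

Keeping the selection as a set of row indices, rather than as coordinates in one panel, is the design choice that makes deletion (dropping those rows) and rescaling of the remaining observations straightforward.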
Further checking revealed that specimens 16-21, specimen 34, and specimens 38-41 were actually specimens

[Figure 1.9 Scatter plots for the paper-quality data of Table 1.2. Diagonal boxes: Density (x1), .758 to .971; Machine (x2), 104 to 135; Cross (x3), 48.9 to 80.3.]
[Figure 1.10 Modified scatter plots for the paper-quality data with outlier (25) (a) selected and (b) deleted. Diagonal boxes in both panels: Density (x1), .758 to .971; Machine (x2), 104 to 135; Cross (x3), 48.9 to 80.3.]
[Figure 1.11 Modified scatter plots with (a) group of points selected and (b) points, including specimen 25, deleted and the scatter plots rescaled. Diagonal boxes in (a): Density (x1), .758 to .971; Machine (x2), 104 to 135; Cross (x3), 48.9 to 80.3. Diagonal boxes in (b): Density (x1), .788 to .845; Machine (x2), 114 to 135; Cross (x3), 68.1 to 80.3.]
from an older roll of paper that was included in order to have enough plies in the cardboard being manufactured. Deleting the outlier and the cases corresponding to the older paper and adjusting the ranges of the remaining observations leads to the scatter plots in Figure 1.11(b).

The operation of highlighting points corresponding to a selected range of one of the variables is called brushing. Brushing could begin with a rectangle, as in Figure 1.11(a), but then the brush could be moved to provide a sequence of highlighted points. The process can be stopped at any time to provide a snapshot of the current situation.

Scatter plots like those in Example 1.8 are extremely useful aids in data analysis. Another important new graphical technique uses software that allows the data analyst to view high-dimensional data as slices of various three-dimensional perspectives. This can be done dynamically and continuously until informative views are obtained. A comprehensive discussion of dynamic graphical methods is available in [1]. A strategy for on-line multivariate exploratory graphical analysis, motivated by the need for a routine procedure for searching for structure in multivariate data, is given in [32].

Example 1.9 (Rotated plots in three dimensions) Four different measurements of lumber stiffness are given in Table 4.3, page 186. In Example 4.14, specimen (board) 16 and possibly specimen (board) 9 are identified as unusual observations. Figures 1.12(a), (b), and (c) contain perspectives of the stiffness data in the x1, x2, x3 space. These views were obtained by continually rotating and turning the three-dimensional coordinate axes. Spinning the coordinate axes allows one to get a better

[Figure 1.12 Three-dimensional perspectives for the lumber stiffness data. Panels: (a) Outliers clear. (b) Outliers masked. (c) Specimen 9 large. (d) Good view of x2, x3, x4 space.]
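The effect of rotating the coordinate axes, and the reason one perspective can mask an outlier that another reveals, can be sketched as follows. This is a minimal illustration under stated assumptions, not the software used to produce Figure 1.12: the points are invented (they are not the lumber stiffness data), the names `rotate` and `view` are hypothetical, and the "screen" is taken to be the first two coordinates after rotation, with the third treated as depth.

```python
import math


def rotate(point, yaw, pitch):
    """Rotate a 3D point about the third axis by yaw, then about the first by pitch."""
    x, y, z = point
    # Rotation about the third coordinate axis.
    x1 = x * math.cos(yaw) - y * math.sin(yaw)
    y1 = x * math.sin(yaw) + y * math.cos(yaw)
    # Rotation about the first coordinate axis.
    y2 = y1 * math.cos(pitch) - z * math.sin(pitch)
    z2 = y1 * math.sin(pitch) + z * math.cos(pitch)
    return (x1, y2, z2)


def view(points, yaw, pitch):
    """Orthographic view: rotate, then keep the first two coordinates (drop depth)."""
    return [rotate(p, yaw, pitch)[:2] for p in points]


# Hypothetical points: a typical observation at the origin and an unusual one
# that differs from it only in the third coordinate.
typical = (0.0, 0.0, 0.0)
unusual = (0.0, 0.0, 5.0)

# Viewed straight down the third axis, the two points coincide on screen.
masked = view([typical, unusual], yaw=0.0, pitch=0.0)

# After turning the axes by a quarter turn, the unusual point separates.
clear = view([typical, unusual], yaw=0.0, pitch=math.pi / 2)
```

Because a rotation preserves distances between points, nothing about the data changes from one view to the next; only the direction that is hidden as depth changes, which is why spinning the axes can bring an unusual observation into plain sight.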