Exploratory Multivariate Analysis by Example Using R

two individuals $i$ and $l$ is expressed as

$$d(i, l) = \sqrt{\sum_{k=1}^{K} (x_{ik} - x_{lk})^2}.$$

If two individuals have similar values within the table of all $K$ variables, they are also close in the space $\mathbb{R}^K$. Thus, the study of the data table can be conducted geometrically by studying the distances between individuals. We are therefore interested in all of the individuals in $\mathbb{R}^K$, that is, the cloud of individuals (denoted $N_I$). Analysing the distances between individuals is therefore tantamount to studying the shape of the cloud of points. Figure 1.3 illustrates a cloud of points within a space $\mathbb{R}^K$ for $K = 3$.

FIGURE 1.3 Flight of a flock of starlings illustrating a scatterplot in $\mathbb{R}^K$.

The shape of the cloud $N_I$ remains the same even when translated. The data are also centred, which corresponds to considering $x_{ik} - \bar{x}_k$ rather than $x_{ik}$. Geometrically, this is tantamount to making the centre of mass of the cloud $G_I$ (with coordinates $\bar{x}_k$ for $k = 1, \ldots, K$) coincide with the origin of reference (see Figure 1.4). Centring presents technical advantages and is always conducted in PCA.

The operation of reduction (also referred to as standardising), which consists of considering $(x_{ik} - \bar{x}_k)/s_k$ rather than $x_{ik}$, modifies the shape of the cloud by harmonising its variability in all the directions of the original vectors (i.e., the $K$ variables). Geometrically, it means choosing standard deviation
Principal Component Analysis

$s_k$ as a unit of measurement in direction $k$. This operation is essential if the variables are not expressed in the same units. Even when the units of measurement do not differ, this operation is generally preferable as it attaches the same importance to each variable. Therefore, we will assume this to be the case from here on in. Standardised PCA occurs when the variables are centred and reduced, and unstandardised PCA when the variables are only centred. When not otherwise specified, it may be assumed that we are using standardised PCA.

FIGURE 1.4 Scatterplot of the individuals in $\mathbb{R}^K$.

Comment: Weighting Individuals
So far we have assumed that all individuals have the same weight. This applies to almost all applications and is always assumed to be the case. Nevertheless, generalisation to arbitrary weights poses no conceptual or practical problems (double weight is equivalent to two identical individuals) and most software packages, including FactoMineR, envisage this possibility (FactoMineR is a package dedicated to Factor Analysis and Data Mining with R; see Section A.2.3 in the appendix). For example, it may be useful to assign a different weight to each individual after having rectified a sample. In all cases, it is convenient to consider that the sum of the weights is equal to 1. If all individuals are assumed to have the same weight, each is assigned a weight of $1/I$.

1.3.2 Fitting the Cloud of Individuals

1.3.2.1 Best Plane Representation of $N_I$

The aim of PCA is to represent the cloud of points in a space with reduced dimensions in an "optimal" manner, that is to say, by distorting the distances
between individuals as little as possible. Figure 1.5 gives two representations of three different fruits. The viewpoints chosen for the images of the fruits on the top row make them difficult to identify. On the second row, the fruits can be more easily recognised. What differentiates the views of each fruit between the first and the second rows? In the pictures on the second row, the distances are less distorted and the representations take up more space on the image. The image is a projection of a three-dimensional object in a two-dimensional space.

FIGURE 1.5 Two-dimensional representations of fruits: from left to right, an avocado, a melon, and a banana; each row corresponds to a different representation.

For a representation to be successful, it must select an appropriate viewpoint. More generally, PCA means searching for the best representational space (of reduced dimension), thus enabling optimal visualisation of the shape of a cloud with $K$ dimensions. We often use a plane representation alone, which can prove inadequate when dealing with particularly complex data.

To obtain this representation, the cloud $N_I$ is projected on a plane of $\mathbb{R}^K$ denoted $P$, chosen in such a manner as to minimise distortion of the cloud of points. Plane $P$ is selected so that the distances between the projected points are as close as possible to the distances between the initial points. Since, in projection, distances can only decrease, we try to make the projected distances as large as possible. Denoting by $H_i$ the projection of individual $i$ on plane $P$, the problem consists of finding $P$ such that

$$\sum_{i=1}^{I} OH_i^2$$

is maximum. The convention for notation uses mechanical terms: $O$ is the centre of gravity, $OH_i$ is a vector, and the criterion is the inertia of the projection of $N_I$. The criterion, which consists of maximising the variance of the projected points, is perfectly appropriate.
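As a rough numerical illustration of this criterion, the following Python sketch (standard library only, not the book's R code) standardises a small made-up table, projects the cloud on two candidate planes, and compares the projected inertia $\sum_i OH_i^2$ with the total inertia. The data values, the two planes, and all function names are assumptions chosen for the example.

```python
import math
from itertools import combinations

def centre_and_reduce(table):
    """Centre each column and divide it by its standard deviation (1/I variance)."""
    n, k = len(table), len(table[0])
    means = [sum(row[j] for row in table) / n for j in range(k)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in table) / n)
           for j in range(k)]
    return [[(row[j] - means[j]) / sds[j] for j in range(k)] for row in table]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def project(point, basis):
    """Coordinates of the orthogonal projection of `point` on the subspace
    spanned by the orthonormal vectors in `basis`."""
    return [dot(point, u) for u in basis]

def inertia(points):
    """Sum of squared distances to the origin O (unweighted criterion)."""
    return sum(dot(p, p) for p in points)

# Hypothetical table: I = 5 individuals, K = 3 variables.
raw = [[1.0, 2.0, 0.5], [2.0, 1.0, 1.5], [3.0, 4.0, 1.0],
       [4.0, 3.0, 2.5], [5.0, 5.0, 2.0]]
cloud = centre_and_reduce(raw)
total = inertia(cloud)

# Two candidate planes, each described by an orthonormal pair of vectors.
s, r = 1 / math.sqrt(3), 1 / math.sqrt(2)
planes = {
    "a": [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    "b": [(s, s, s), (r, -r, 0.0)],
}

projected_inertia = {}
distances_shrink = {}
for name, basis in planes.items():
    images = [project(p, basis) for p in cloud]
    projected_inertia[name] = inertia(images)
    # Projection can only shorten the distance between two individuals.
    distances_shrink[name] = all(
        math.dist(h_i, h_l) <= math.dist(p_i, p_l) + 1e-12
        for (h_i, p_i), (h_l, p_l) in combinations(list(zip(images, cloud)), 2)
    )
    print(name, round(projected_inertia[name], 3), "of", round(total, 3),
          distances_shrink[name])
```

Because the table is standardised with the 1/I variance, the total inertia here is $I \times K = 15$, and plane "a", which simply keeps the first two variables, retains 10 of it. PCA looks for the plane that retains the most.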
Remark
If the individuals are weighted with different weights $p_i$, the maximised criterion is $\sum_{i=1}^{I} p_i OH_i^2$.

In some rare cases, it might be interesting to search for the best axial representation of cloud $N_I$ alone. This best axis is obtained in the same way: find the component $u_1$ for which $\sum_{i=1}^{I} OH_i^2$ is maximum (where $H_i$ is the projection of $i$ on $u_1$). It can be shown that plane $P$ contains component $u_1$ (the "best" plane contains the "best" component): in this case, these representations are said to be nested. An illustration of this property is presented in Figure 1.6. Planets, which are in a three-dimensional space, are traditionally represented on a component. This component determines their positions as well as possible in terms of their distances from one another (in terms of inertia of the projected cloud). We can also represent planets on a plane according to the same principle: to maximise the inertia of the projected scatterplot (on the plane). This best plane representation also contains the best axial representation.

FIGURE 1.6 The best axial representation is nested in the best plane representation of the solar system (18 February 2008).

We define plane $P$ by two non-collinear vectors chosen as follows: vector $u_1$, which defines the best axis (and which is included in $P$), and vector $u_2$ of the plane $P$ orthogonal to $u_1$. Vector $u_2$ corresponds to the vector which expresses the greatest variability of $N_I$ once that which is expressed by $u_1$ is
removed. In other words, the variability expressed by $u_2$ is the best complement to, and is independent of, that expressed by $u_1$.

1.3.2.2 Sequence of Axes for Representing $N_I$

More generally, let us look for nested subspaces of dimensions $s = 1$ to $S$ so that each subspace is of maximum inertia for the given dimension $s$. The subspace of dimension $s$ is obtained by maximising $\sum_{i=1}^{I} (OH_i)^2$ (where $H_i$ is the projection of $i$ in the subspace of dimension $s$). As the subspaces are nested, it is possible to choose vector $u_s$ as a vector of the subspace orthogonal to all of the vectors $u_t$ (with $1 \le t < s$) which define the smaller subspaces.

The first plane (defined by $u_1$, $u_2$), i.e., the plane of best representation, is often sufficient for visualising cloud $N_I$. When $S$ is greater than or equal to 3, we may need to visualise cloud $N_I$ in the subspace of dimension $S$ by using a number of plane representations: the representation on ($u_1$, $u_2$) but also that on ($u_3$, $u_4$), which is the most complementary to that on ($u_1$, $u_2$). However, in certain situations, we might choose to associate ($u_2$, $u_3$), for example, in order to highlight a particular phenomenon which appears on these two components.

1.3.2.3 How Are the Components Obtained?

Components in PCA are obtained through diagonalisation of the correlation matrix, which extracts the associated eigenvectors and eigenvalues. The eigenvectors correspond to vectors $u_s$, each of which is associated with the eigenvalue of rank $s$ (denoted $\lambda_s$), as the eigenvalues are ranked in descending order. The eigenvalue $\lambda_s$ is interpreted as the inertia of cloud $N_I$ projected on the component of rank $s$ or, in other words, the "explained variance" for the component of rank $s$.

If all of the eigenvectors are calculated ($S = K$), the PCA recreates a basis for the space $\mathbb{R}^K$. In this sense, PCA can be seen as a change of basis in which the first vectors of the new basis play an important role.
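As a minimal sketch of this diagonalisation, written in Python rather than with the book's R tools, the block below builds the correlation matrix of a small made-up table and extracts its eigenvectors by power iteration with deflation. Power iteration is a stand-in chosen so that the example needs only the standard library; it is not claimed to be the method used by FactoMineR. The data values and all function names are assumptions for the illustration.

```python
import math

def standardise(table):
    """Centre and reduce each column, using the 1/I variance."""
    n, k = len(table), len(table[0])
    means = [sum(row[j] for row in table) / n for j in range(k)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in table) / n)
           for j in range(k)]
    return [[(row[j] - means[j]) / sds[j] for j in range(k)] for row in table]

def correlation_matrix(z):
    """Correlation matrix of an already standardised table z."""
    n, k = len(z), len(z[0])
    return [[sum(row[a] * row[b] for row in z) / n for b in range(k)]
            for a in range(k)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mat_vec(matrix, v):
    return [dot(row, v) for row in matrix]

def top_eigenpair(matrix, iterations=500):
    """Dominant eigenvalue and unit eigenvector of a symmetric positive
    semi-definite matrix, by power iteration."""
    k = len(matrix)
    v = [float(j + 1) for j in range(k)]   # arbitrary starting vector
    norm = math.sqrt(dot(v, v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = mat_vec(matrix, v)
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:                     # nothing left to extract
            break
        v = [x / norm for x in w]
    return dot(v, mat_vec(matrix, v)), v    # Rayleigh quotient, unit vector

def principal_axes(matrix, n_axes):
    """Eigenpairs in descending order, obtained by repeated deflation."""
    work = [row[:] for row in matrix]
    eigenvalues, axes = [], []
    for _ in range(n_axes):
        lam, u = top_eigenpair(work)
        eigenvalues.append(lam)
        axes.append(u)
        # Remove the inertia already expressed by u before the next search.
        work = [[work[a][b] - lam * u[a] * u[b] for b in range(len(u))]
                for a in range(len(u))]
    return eigenvalues, axes

# Hypothetical table: I = 5 individuals, K = 3 variables.
raw = [[1.0, 2.0, 0.5], [2.0, 1.0, 1.5], [3.0, 4.0, 1.0],
       [4.0, 3.0, 2.5], [5.0, 5.0, 2.0]]
z = standardise(raw)
corr = correlation_matrix(z)
eigenvalues, axes = principal_axes(corr, n_axes=len(corr))

for s, lam in enumerate(eigenvalues, start=1):
    share = 100 * lam / len(corr)
    print(f"component {s}: eigenvalue {lam:.4f} ({share:.2f}% of inertia)")
```

In this standardised setting the eigenvalues sum to $K = 3$, the trace of the correlation matrix, so each $\lambda_s / K$ can be read as the share of total inertia carried by component $s$. With the $1/I$ weights, the inertia of the cloud projected on $u_1$, $\frac{1}{I}\sum_i (z_i \cdot u_1)^2$, equals $\lambda_1$.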
Remark
When variables are centred but not standardised, the matrix to be diagonalised is the variance-covariance matrix.

1.3.2.4 Example

The distance between two orange juices is calculated using their seven sensory descriptors. We decided to standardise the data to attribute each descriptor equal influence. Figure 1.7 is obtained from the first two components of the PCA and corresponds to the best plane for representing the cloud of individuals in terms of projected inertia. The inertia projected on the plane is the sum of the first two eigenvalues, that is, 86.82% (= 67.77% + 19.05%) of the total inertia of the cloud of points.

The first principal component, that is, the principal axis of variability