Chapter 2

Estimation

2.1 Example

Let's start with an example. Suppose that Y is the fuel consumption of a particular model of car in m.p.g. Suppose that the predictors are

1. X1 - the weight of the car
2. X2 - the horse power
3. X3 - the number of cylinders

X3 is discrete but that's OK. Using country of origin, say, as a predictor would not be possible within the current development (we will see how to do this later in the course). Typically the data will be available in the form of an array like this

    y1   x11   x12   x13
    y2   x21   x22   x23
    ...  ...   ...   ...
    yn   xn1   xn2   xn3

where n is the number of observations or cases in the dataset.

2.2 Linear Model

One very general form for the model would be

    Y = f(X1, X2, X3) + ε

where f is some unknown function and ε is the error in this representation, which is additive in this instance. Since we usually don't have enough data to try to estimate f directly, we usually have to assume that it has some more restricted form, perhaps linear as in

    Y = β0 + β1 X1 + β2 X2 + β3 X3 + ε

where βi, i = 0, 1, 2, 3, are unknown parameters. β0 is called the intercept term. Thus the problem is reduced to the estimation of four values rather than the complicated infinite-dimensional f.

In a linear model the parameters enter linearly; the predictors themselves do not have to be. For example,

    Y = β0 + β1 X1 + β2 log X2 + ε

is linear, but

    Y = β0 + β1 X1^β2 + ε

is not. Some relationships can be transformed to linearity; for example, y = β0 x^β1 ε can be linearized by taking logs. Linear models seem rather restrictive, but because the predictors can be transformed and combined in any way, they are actually very flexible. Truly non-linear models are rarely absolutely necessary and most often arise from a theory about the relationships between the variables rather than from an empirical investigation.
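To make the setup concrete, here is a minimal Python/numpy sketch (not from the text) that simulates data of this form; the sample size, parameter values and error scale are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50                                   # hypothetical number of cars

# Hypothetical predictors: weight, horse power, number of cylinders
x1 = rng.uniform(1500, 4500, n)          # X1 - weight
x2 = rng.uniform(50, 250, n)             # X2 - horse power
x3 = rng.choice([4.0, 6.0, 8.0], n)      # X3 - no. of cylinders (discrete is OK)

# Made-up "true" parameters and an additive error
beta = np.array([50.0, -0.005, -0.03, -1.0])   # beta0, beta1, beta2, beta3
eps = rng.normal(0.0, 2.0, n)

# Y = beta0 + beta1*X1 + beta2*X2 + beta3*X3 + eps
y = beta[0] + beta[1] * x1 + beta[2] * x2 + beta[3] * x3 + eps

# The data array: each of the n rows is one case (y_i, x_i1, x_i2, x_i3)
data = np.column_stack([y, x1, x2, x3])
print(data.shape)                        # (50, 4)

# A transformed predictor, e.g. log(x2), could be used in the same way
# and the model would still be linear in the parameters.
```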
2.3 Matrix Representation

Given the actual data, we may write

    yi = β0 + β1 x1i + β2 x2i + β3 x3i + εi,    i = 1, ..., n

but the use of subscripts becomes inconvenient and conceptually obscure. We will find it simpler, both notationally and theoretically, to use a matrix/vector representation. The regression equation is written as

    y = Xβ + ε

where y = (y1, ..., yn)^T, ε = (ε1, ..., εn)^T, β = (β0, ..., β3)^T and

\[
X = \begin{pmatrix}
1 & x_{11} & x_{12} & x_{13} \\
1 & x_{21} & x_{22} & x_{23} \\
\vdots & \vdots & \vdots & \vdots \\
1 & x_{n1} & x_{n2} & x_{n3}
\end{pmatrix}
\]

The column of ones incorporates the intercept term. A couple of examples of using this notation are the simple no-predictor, mean-only model y = µ + ε,

\[
\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}
= \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix} \mu
+ \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{pmatrix}
\]

We can assume that Eε = 0 since, if this were not so, we could simply absorb the non-zero expectation for the error into the mean µ to get a zero expectation. For the two-sample problem, with a treatment group having responses y1, ..., ym with mean µy and a control group having responses z1, ..., zn with mean µz, we have

\[
\begin{pmatrix} y_1 \\ \vdots \\ y_m \\ z_1 \\ \vdots \\ z_n \end{pmatrix}
= \begin{pmatrix}
1 & 0 \\ \vdots & \vdots \\ 1 & 0 \\
0 & 1 \\ \vdots & \vdots \\ 0 & 1
\end{pmatrix}
\begin{pmatrix} \mu_y \\ \mu_z \end{pmatrix}
+ \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_{m+n} \end{pmatrix}
\]
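As a small illustration (not from the text), the following numpy sketch builds the design matrices for these two examples; the group sizes are arbitrary.

```python
import numpy as np

# Mean-only model y = mu + eps: the design matrix is a single column of ones
n = 5
X_mean = np.ones((n, 1))

# Two-sample problem: m treatment responses with mean mu_y and
# n control responses with mean mu_z, so beta = (mu_y, mu_z)^T
m, n = 3, 4
X_two = np.zeros((m + n, 2))
X_two[:m, 0] = 1.0        # rows for the treatment group
X_two[m:, 1] = 1.0        # rows for the control group
print(X_two)
```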
2.4 Estimating β

We have the regression equation y = Xβ + ε: what estimate of β would best separate the systematic component Xβ from the random component ε? Geometrically speaking, y ∈ ℝ^n while β ∈ ℝ^p, where p is the number of parameters (if we include the intercept then p is the number of predictors plus one).

[Figure 2.1: Geometric representation of the estimation of β. The data vector y, which lies in n dimensions, is projected orthogonally onto the p-dimensional model space spanned by X. The fit is represented by the projection ŷ = Xβ̂, and the difference between the fit and the data is the residual vector ε̂, which lies in n - p dimensions.]

The problem is to find β̂ such that Xβ̂ is close to y. The best choice of β̂ is apparent in the geometrical representation shown in Figure 2.1.

β̂ is in some sense the best estimate of β within the model space. The response predicted by the model is ŷ = Xβ̂ or Hy, where H is an orthogonal projection matrix. The difference between the actual response and the predicted response is denoted by ε̂: the residuals.

The conceptual purpose of the model is to represent, as accurately as possible, something complex, y, which is n-dimensional, in terms of something much simpler, the model, which is p-dimensional. Thus if our model is successful, the structure in the data should be captured in those p dimensions, leaving just random variation in the residuals, which lie in an (n - p)-dimensional space. We have

    Data          =  Systematic Structure  +  Random Variation
    n dimensions  =  p dimensions          +  (n - p) dimensions

2.5 Least squares estimation

The estimation of β can be considered from a non-geometric point of view. We might define the best estimate of β as that which minimizes the sum of the squared errors, ε^T ε. That is to say that the least squares estimate of β, called β̂, minimizes

    Σ εi² = ε^T ε = (y - Xβ)^T (y - Xβ)

Expanding this out, we get

    y^T y - 2β^T X^T y + β^T X^T X β

Differentiating with respect to β gives -2X^T y + 2X^T X β; setting this to zero, we find that β̂ satisfies

    X^T X β̂ = X^T y

These are called the normal equations. We can derive the same result using the geometric approach. Now, provided X^T X is invertible,

    β̂ = (X^T X)^{-1} X^T y
    Xβ̂ = X (X^T X)^{-1} X^T y = Hy

H = X(X^T X)^{-1}X^T is called the "hat matrix" and is the orthogonal projection of y onto the space spanned by X. H is useful for theoretical manipulations, but you usually don't want to compute it explicitly, as it is an n × n matrix.

- Predicted values: ŷ = Hy = Xβ̂.
- Residuals: ε̂ = y - Xβ̂ = y - ŷ = (I - H)y.
- Residual sum of squares: ε̂^T ε̂ = y^T(I - H)(I - H)y = y^T(I - H)y.

Later we will show that the least squares estimate is the best possible estimate of β when the errors ε are uncorrelated and have equal variance, i.e. var ε = σ²I.
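As a sketch of these formulae in code (on simulated data, with all values invented for illustration), the least squares estimate can be obtained by solving the normal equations directly or, preferably, by a least squares routine that never forms (X^T X)^{-1} or the n × n matrix H explicitly.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 4                                # illustrative sizes
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([1.0, 2.0, -1.0, 0.5])  # made-up parameters
y = X @ beta_true + rng.normal(0.0, 1.0, n)

# Solve the normal equations X^T X beta = X^T y directly
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Equivalent (and numerically preferable): a least squares solver
beta_hat_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

y_hat = X @ beta_hat          # fitted values, y_hat = H y, without forming H
resid = y - y_hat             # residuals, eps_hat = (I - H) y
rss = resid @ resid           # residual sum of squares
print(beta_hat, rss)
```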
2.6 Examples of calculating β̂

1. When y = µ + ε, X = 1 (a column of ones) and β = µ, so X^T X = 1^T 1 = n and

       β̂ = (X^T X)^{-1} X^T y = (1/n) 1^T y = ȳ

2. Simple linear regression (one predictor):

       yi = α + β xi + εi

   \[
   \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}
   = \begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}
   \begin{pmatrix} \alpha \\ \beta \end{pmatrix}
   + \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{pmatrix}
   \]

   We can now apply the formula, but a simpler approach is to rewrite the equation as

       yi = α' + β(xi - x̄) + εi,    where α' = α + βx̄,

   so that now

   \[
   X = \begin{pmatrix} 1 & x_1 - \bar{x} \\ \vdots & \vdots \\ 1 & x_n - \bar{x} \end{pmatrix},
   \qquad
   X^T X = \begin{pmatrix} n & 0 \\ 0 & \sum_{i=1}^n (x_i - \bar{x})^2 \end{pmatrix}
   \]

   Now work through the rest of the calculation to reconstruct the familiar estimates, i.e.

       β̂ = Σ (xi - x̄) yi / Σ (xi - x̄)²

In higher dimensions it is usually not possible to find such explicit formulae for the parameter estimates unless X^T X happens to have a simple form.

2.7 Why is β̂ a good estimate?

1. It results from an orthogonal projection onto the model space. It makes sense geometrically.
2. If the errors are independent and identically normally distributed, it is the maximum likelihood estimator. Loosely put, the maximum likelihood estimate is the value of β that maximizes the probability of the data that was observed.
3. The Gauss-Markov theorem states that it is the best linear unbiased estimate (BLUE).
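The two worked examples are easy to check numerically; the sketch below (with made-up data) confirms that the general formula reproduces the sample mean in the first case and the centred-slope formula in the second.

```python
import numpy as np

rng = np.random.default_rng(2)

# Example 1: mean-only model, beta_hat is just the sample mean
y = rng.normal(10.0, 3.0, size=25)
X = np.ones((25, 1))
print(np.linalg.solve(X.T @ X, X.T @ y)[0], y.mean())    # the two agree

# Example 2: simple linear regression via the centred formula
x = rng.uniform(0.0, 10.0, size=25)
y = 2.0 + 3.0 * x + rng.normal(0.0, 1.0, size=25)        # made-up alpha, beta
slope = np.sum((x - x.mean()) * y) / np.sum((x - x.mean()) ** 2)

X = np.column_stack([np.ones_like(x), x])
slope_ls = np.linalg.lstsq(X, y, rcond=None)[0][1]
print(slope, slope_ls)                                   # the two agree
```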
2.8 Gauss-Markov Theorem

First we need to understand the concept of an estimable function. A linear combination of the parameters ψ = c^T β is estimable if and only if there exists a linear combination a^T y such that

    E a^T y = c^T β    for all β

Estimable functions include predictions of future observations, which explains why they are worth considering. If X is of full rank (which it usually is for observational data), then all linear combinations are estimable.

Gauss-Markov theorem

Suppose Eε = 0 and var ε = σ²I. Suppose also that the structural part of the model, EY = Xβ, is correct. Let ψ = c^T β be an estimable function; then in the class of all unbiased linear estimates of ψ, ψ̂ = c^T β̂ has the minimum variance and is unique.

Proof:

We start with a preliminary calculation. Suppose a^T y is some unbiased estimate of c^T β, so that

    E a^T y = c^T β    for all β
    a^T Xβ = c^T β     for all β

which means that a^T X = c^T. This implies that c must be in the range space of X^T, which in turn implies that c is also in the range space of X^T X, which means there exists a λ such that

    c = X^T X λ
    c^T β̂ = λ^T X^T X β̂ = λ^T X^T y

Now we can show that the least squares estimator has the minimum variance: pick an arbitrary estimable function a^T y and compute its variance:

    var(a^T y) = var(a^T y - c^T β̂ + c^T β̂)
               = var(a^T y - λ^T X^T y + c^T β̂)
               = var(a^T y - λ^T X^T y) + var(c^T β̂) + 2 cov(a^T y - λ^T X^T y, λ^T X^T y)

but

    cov(a^T y - λ^T X^T y, λ^T X^T y) = (a^T - λ^T X^T) σ²I Xλ
                                      = (a^T X - λ^T X^T X) σ² λ
                                      = (c^T - c^T) σ² λ = 0

so

    var(a^T y) = var(a^T y - λ^T X^T y) + var(c^T β̂)

Now, since variances cannot be negative, we see that

    var(a^T y) ≥ var(c^T β̂)

In other words, c^T β̂ has minimum variance. It now remains to show that it is unique. There will be equality in the above relation if var(a^T y - λ^T X^T y) = 0, which would require that a^T - λ^T X^T = 0, which means that a^T y = λ^T X^T y = c^T β̂. So equality occurs only if a^T y = c^T β̂, and hence the estimator is unique.
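As an illustration of the theorem (not part of the text), the sketch below compares, on a made-up fixed design, the variance of the least squares estimate of the slope with that of another linear unbiased estimate, the two-point slope. Both are linear in y and unbiased, but the least squares weights give the smaller variance, as the Gauss-Markov theorem guarantees.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 30
x = np.sort(rng.uniform(0.0, 10.0, n))
X = np.column_stack([np.ones(n), x])        # fixed, made-up design
c = np.array([0.0, 1.0])                    # estimable function: the slope
sigma2 = 1.0                                # error variance sigma^2

# Weights giving the least squares estimate: c^T beta_hat = a_ls^T y
a_ls = X @ np.linalg.solve(X.T @ X, c)

# A competing linear unbiased estimate of the slope:
# the two-point slope (y_n - y_1) / (x_n - x_1)
a_alt = np.zeros(n)
a_alt[0] = -1.0 / (x[-1] - x[0])
a_alt[-1] = 1.0 / (x[-1] - x[0])

# Both sets of weights satisfy a^T X = c^T, so both estimates are unbiased
print(a_ls @ X, a_alt @ X)                  # both are (approximately) (0, 1)

# Under var eps = sigma^2 I, var(a^T y) = sigma^2 * a^T a;
# the least squares weights give the smaller value
print(sigma2 * (a_ls @ a_ls), sigma2 * (a_alt @ a_alt))
```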