Ch. 3 Estimation

1 The Nature of Statistical Inference

It is argued that it is important to develop a mathematical model purporting to provide a generalized description of the data generating process. A probability model in the form of the parametric family of density functions Φ = {f(x; θ), θ ∈ Θ}, together with its various ramifications formulated in the last chapter, provides such a mathematical model. By postulating Φ as a probability model for the distribution of the observations of interest, we can go on to consider questions about the unknown parameter θ (via estimation and hypothesis testing) as well as about further observations from the probability model (prediction).

In the next section the important concept of a sampling model is introduced as a way to link the probability model postulated, say Φ = {f(x; θ), θ ∈ Θ}, to the observed data x ≡ (x1, ..., xn)′ available. The sampling model provides the second important ingredient needed to define a statistical model, the starting point of any "parametric" statistical inference.

In short, a statistical model is defined as comprising
(a). a probability model Φ = {f(x; θ), θ ∈ Θ}; and
(b). a sampling model x ≡ (X1, ..., Xn)′.

The concept of a statistical model provides the starting point of all forms of statistical inference to be considered in the sequel. To be more precise, the concept of a statistical model forms the basis of what is known as parametric inference. There is also a branch of statistical inference known as non-parametric inference, where no Φ is assumed a priori.

1.1 The sampling model

A sampling model is introduced as a way to link the probability model postulated, say Φ = {f(x; θ), θ ∈ Θ}, and the observed data x ≡ (x1, ..., xn)′ available. It is designed to model the relationship between them and refers to the way the
observed data can be viewed in relation to Φ.

Definition 1:
A sample is defined to be a set of random variables (X1, X2, ..., Xn) whose density functions coincide with the "true" density function f(x; θ0) as postulated by the probability model.

Data are generally drawn in one of two settings. A cross-section sample is a sample of a number of observational units all drawn at the same point in time. A time-series sample is a set of observations drawn on the same observational unit at a number of (usually evenly spaced) points in time. Many recent studies have been based on time-series cross sections, which generally consist of the same cross section observed at several points in time. The term panel data set is usually fitting for this sort of study.

Given that a sample is a set of r.v.'s related to Φ, it must have a distribution, which we call the distribution of the sample.

Definition 2:
The distribution of the sample x ≡ (X1, X2, ..., Xn)′ is defined to be the joint distribution of the r.v.'s X1, X2, ..., Xn, denoted by fx(x1, ..., xn; θ) ≡ f(x; θ).

The distribution of the sample incorporates both forms of relevant information, the probability as well as the sample information. It must come as no surprise to learn that f(x; θ) plays a very important role in statistical inference. The form of f(x; θ) depends crucially on the nature of the sampling model as well as on Φ. The simplest but most widely used form of a sampling model is the one based on the idea of a random experiment E and is called a random sample.

Definition 3:
A set of random variables (X1, X2, ..., Xn) is called a random sample from f(x; θ) if the r.v.'s X1, X2, ..., Xn are independently and identically distributed (i.i.d.). In
this case the distribution of the sample takes the form

f(x1, ..., xn; θ) = ∏_{i=1}^{n} f(xi; θ) = (f(x; θ))^n,

the first equality due to independence and the second due to the fact that the r.v.'s are identically distributed.

A less restrictive form of a sampling model is what we call an independent sample, where the identically distributed condition of the random sample is relaxed.

Definition 4:
A set of random variables (X1, X2, ..., Xn) is said to be an independent sample from f(xi; θi), i = 1, 2, ..., n, respectively, if the r.v.'s X1, X2, ..., Xn are independent. In this case the distribution of the sample takes the form

f(x1, ..., xn; θ) = ∏_{i=1}^{n} f(xi; θi).

Usually the density functions f(xi; θi), i = 1, 2, ..., n, belong to the same family, but their numerical characteristics (moments, etc.) may differ.

If we relax the independence assumption as well, we have what we can call a non-random sample.

Definition 5:
A set of random variables (X1, X2, ..., Xn)¹ is said to be a non-random sample from f(x1, x2, ..., xn; θ) if the r.v.'s X1, X2, ..., Xn are non-i.i.d. In this case the only decomposition of the distribution of the sample possible is

f(x1, ..., xn; θ) = ∏_{i=1}^{n} f(xi | x1, ..., xi−1; θi), given x0,

where f(xi | x1, ..., xi−1; θi), i = 1, 2, ..., n, represents the conditional distribution of Xi given X1, X2, ..., Xi−1.

¹ Here, we must regard this set of random variables as a sample of size 'one' from a multivariate point of view.
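The factorizations above can be checked numerically. The sketch below uses hypothetical data and parameter values, and assumes normal densities purely for illustration (the definitions themselves do not require any particular family). It computes the log of the distribution of the sample both as a random-sample (i.i.d.) product, ∏ f(xi; θ), and as an independent-sample product, ∏ f(xi; θi):

```python
import math

def normal_logpdf(x, mu, sigma):
    # log density of N(mu, sigma^2) evaluated at x
    return -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

sample = [0.3, -1.2, 0.8, 2.1]   # hypothetical observed data x = (x1, ..., xn)'
mu, sigma = 0.0, 1.0             # a fixed value of theta

# Random sample (Definition 3): the joint log-density is a sum of
# identical marginal log-densities, i.e. log of prod_i f(xi; theta).
log_joint_iid = sum(normal_logpdf(x, mu, sigma) for x in sample)

# Independent sample (Definition 4): each Xi may carry its own theta_i;
# here all theta_i happen to coincide, so the two computations agree.
thetas = [(0.0, 1.0), (0.0, 1.0), (0.0, 1.0), (0.0, 1.0)]
log_joint_indep = sum(normal_logpdf(x, m, s) for x, (m, s) in zip(sample, thetas))

print(log_joint_iid)   # approx. -6.9658
```

For a non-random sample (Definition 5) no such product of marginals is available; each factor would instead be a conditional density f(xi | x1, ..., xi−1; θi) of Xi given the preceding observations.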
In the context of statistical inference we need to postulate both a probability as well as a sampling model, and thus we define a statistical model as comprising both.

Definition 6:
A statistical model is defined as comprising
(a). a probability model Φ = {f(x; θ), θ ∈ Θ}; and
(b). a sampling model x ≡ (X1, X2, ..., Xn)′.

It must be emphasized that the two important components of a statistical model, the probability and sampling models, are clearly interrelated. For example, we cannot postulate the probability model Φ = {f(x; θ), θ ∈ Θ} if the sample x is non-random. This is because if the r.v.'s X1, X2, ..., Xn are not independent, the probability model must be defined in terms of their joint distribution, i.e. Φ = {f(x1, x2, ..., xn; θ), θ ∈ Θ} (for example, stock prices). Moreover, in the case of an independent but not identically distributed sample we need to specify the individual density functions for each r.v. in the sample, i.e. Φ = {fk(xk; θ), θ ∈ Θ, k = 1, 2, ..., n}. The most important implication of this relationship is that when the sampling model postulated is found to be inappropriate, the probability model has to be re-specified as well.

1.2 An overview of statistical inference

The statistical model in conjunction with the observed data enables us to consider the following questions:

(A). Are the observed data consistent with the postulated statistical model? (model misspecification)

(B). Assuming that the postulated statistical model is consistent with the observed data, what can we infer about the unknown parameter θ ∈ Θ?

(a). Can we decrease the uncertainty about θ by reducing the parameter space from Θ to Θ0, where Θ0 is a subset of Θ? (confidence estimation)

(b). Can we decrease the uncertainty about θ by choosing a particular value in Θ, say θ̂, as providing the most representative value of θ? (point estimation)
(c). Can we consider the question of whether θ belongs to some subset Θ0 of Θ? (hypothesis testing)

(C). Assuming that a particular representative value θ̂ of θ has been chosen, what can we infer about further observations from the data generating process (DGP) as described by the postulated statistical model? (prediction)
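To make the distinction between point estimation in (b) and prediction in (C) concrete, here is a minimal sketch under an assumed simple statistical model (hypothetical data; Xi taken as i.i.d. N(θ, 1), a special case the chapter has not yet derived): the sample mean serves as the point estimate θ̂, and the model with θ fixed at θ̂ is then used to predict a further observation.

```python
sample = [1.9, 2.4, 2.1, 1.6, 2.0]   # hypothetical observed data x = (x1, ..., xn)'

# (b) Point estimation: for an i.i.d. N(theta, 1) sample, the maximum
# likelihood point estimate of theta is the sample mean.
theta_hat = sum(sample) / len(sample)

# (C) Prediction: with theta fixed at theta_hat, the minimum mean squared
# error prediction of a further observation X_{n+1} is theta_hat itself.
prediction = theta_hat

print(theta_hat)   # 2.0 for this data
```

Confidence estimation and hypothesis testing, by contrast, work with sets of parameter values (Θ0 ⊂ Θ) rather than a single representative value, and are taken up in later sections.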