Bias
◼ $\hat{\Theta}_n$ is unbiased if $b_\theta(\hat{\Theta}_n) = 0$.
  ❑ a desirable property.
◼ $\hat{\Theta}_n$ is asymptotically unbiased if $\lim_{n \to \infty} E_\theta[\hat{\Theta}_n] = \theta$, for every possible value of $\theta$.
  ❑ $\hat{\Theta}_n$ becomes unbiased as the number $n$ of observations increases.
  ❑ this is desirable when $n$ is large.
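A minimal Python sketch of the distinction (not from the slides; the normal data and all constants are assumptions for illustration): the variance estimator that divides by $n$ has expectation $\sigma^2 (n-1)/n$, so it is biased for every finite $n$ but asymptotically unbiased.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 4.0  # true variance of the underlying normal samples

# Approximate E_theta[estimator] by Monte Carlo for growing sample sizes n.
for n in [5, 50, 500]:
    samples = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(20_000, n))
    v_hat = samples.var(axis=1, ddof=0)  # divides by n: biased for finite n
    print(n, round(v_hat.mean(), 3))     # approx sigma2 * (n - 1) / n -> 4.0
```

The printed averages move from about 3.2 at $n = 5$ toward the true value 4.0, matching the definition of asymptotic unbiasedness.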
Consistent
◼ $\hat{\Theta}_n$ is consistent if the sequence $\hat{\Theta}_n$ converges to the true value $\theta$, in probability, for every possible value of $\theta$.
◼ Recall:
  ❑ $X_n$ converges to $a$ in probability if $\forall \epsilon > 0$, $P(|X_n - a| \ge \epsilon) \to 0$ as $n \to \infty$.
  ❑ $X_n$ converges to $a$ with probability 1 (or almost surely) if $P(\lim_{n \to \infty} X_n = a) = 1$.
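A Monte Carlo sketch of convergence in probability (the exponential distribution and constants are assumptions chosen for illustration): the sample mean of i.i.d. observations is a consistent estimator of the true mean $\mu$, so $P(|\bar{X}_n - \mu| \ge \epsilon)$ should shrink as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, eps = 2.0, 0.1  # true mean and the tolerance epsilon

# Estimate P(|sample_mean_n - mu| >= eps) by Monte Carlo for growing n.
for n in [10, 100, 1000]:
    means = rng.exponential(scale=mu, size=(20_000, n)).mean(axis=1)
    prob = np.mean(np.abs(means - mu) >= eps)
    print(n, prob)  # shrinks toward 0, as the definition requires
```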
◼ Mean squared error: $E_\theta[\tilde{\Theta}_n^2]$.
◼ This is related to the bias and the variance of $\hat{\Theta}_n$: $E_\theta[\tilde{\Theta}_n^2] = b_\theta^2(\hat{\Theta}_n) + \mathrm{var}_\theta(\hat{\Theta}_n)$.
  ❑ Reason: $E[X^2] = (E[X])^2 + \mathrm{var}(X)$, applied to $X = \tilde{\Theta}_n = \hat{\Theta}_n - \theta$, whose mean is the bias $b_\theta(\hat{\Theta}_n)$.
◼ In many statistical problems, there is a tradeoff between the two terms on the right-hand side.
◼ Often a reduction in the variance is accompanied by an increase in the bias.
◼ Of course, a good estimator is one that manages to keep both terms small.
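A sketch verifying the decomposition and the tradeoff numerically (the shrinkage estimator $c\bar{X}_n$ and all constants are assumptions for illustration): shrinking the sample mean toward zero lowers the variance but introduces bias, and in every case $b^2 + \mathrm{var}$ matches the directly estimated MSE.

```python
import numpy as np

rng = np.random.default_rng(2)
theta, n = 3.0, 20  # true mean and sample size

samples = rng.normal(loc=theta, scale=2.0, size=(100_000, n))
for c in [1.0, 0.9, 0.7]:
    est = c * samples.mean(axis=1)      # shrinkage estimator: c * sample mean
    bias = est.mean() - theta           # Monte Carlo estimate of b_theta
    var = est.var()                     # Monte Carlo estimate of var_theta
    mse = np.mean((est - theta) ** 2)   # direct estimate of E_theta[(est - theta)^2]
    print(c, round(bias**2 + var, 4), round(mse, 4))  # the two columns agree
```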
Maximum Likelihood Estimation (MLE)
◼ Let the vector of observations $X = (X_1, \dots, X_n)$ be described by a joint PMF $p_X(x; \theta)$.
  ❑ Note that $p_X(x; \theta)$ is the PMF of $X$ only, not a joint distribution of $X$ and $\theta$.
◼ Recall that $\theta$ is just a fixed parameter, not a random variable.
◼ $p_X(x; \theta)$ depends on $\theta$.
◼ Suppose we observe a particular value $x = (x_1, \dots, x_n)$ of $X$.
◼ A maximum likelihood estimate (MLE) is a value of the parameter that maximizes the numerical function $p_X(x_1, \dots, x_n; \theta)$ over all $\theta$:
$$\hat{\theta}_n = \arg\max_\theta \, p_X(x_1, \dots, x_n; \theta).$$
◼ The above is for the case of discrete $X$. If $X$ is continuous, then the MLE is
$$\hat{\theta}_n = \arg\max_\theta \, f_X(x_1, \dots, x_n; \theta).$$
[Figure: the observation process generates $x$ from $p_X(x; \theta)$; maximizing $p_X(x; \theta)$ over $\theta$ yields the ML estimate.]
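A minimal sketch of the definition in action, using i.i.d. Bernoulli($\theta$) observations (a standard example; the data and the grid search are assumptions for illustration): we maximize the log of the joint PMF over a grid of $\theta$ values and compare with the known closed-form Bernoulli MLE, the sample mean.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.binomial(n=1, p=0.35, size=200)  # i.i.d. Bernoulli(theta) observations

# Log-likelihood of the joint PMF p_X(x_1, ..., x_n; theta) on a grid of theta.
thetas = np.linspace(0.001, 0.999, 999)
loglik = x.sum() * np.log(thetas) + (len(x) - x.sum()) * np.log(1 - thetas)

theta_hat = thetas[np.argmax(loglik)]  # grid-search maximizer, i.e., the MLE
print(theta_hat, x.mean())             # matches the closed-form MLE: the sample mean
```

Maximizing the log-likelihood rather than the likelihood itself is the usual choice: the logarithm is monotone, so the maximizer is unchanged, and products of many small probabilities become numerically stable sums.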