◼ In many applications, the observations $X_i$ are assumed to be independent.
◼ Then $p_X(x_1, \ldots, x_n; \theta) = \prod_{i=1}^{n} p_{X_i}(x_i; \theta)$.
◼ It is often analytically or computationally convenient to maximize its logarithm, called the log-likelihood function (over $\theta$):
$$\log p_X(x_1, \ldots, x_n; \theta) = \sum_{i=1}^{n} \log p_{X_i}(x_i; \theta)$$
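One computational reason to prefer the log-likelihood can be sketched as follows. The example assumes a Bernoulli model with made-up data and an arbitrarily chosen $\theta = 0.5$; the helper `pmf` and all variable names are illustrative and do not come from the slides. With 5000 observations, the direct product of the factors underflows to zero in double precision, while the sum of logs stays finite (roughly -3465.7).

```python
import math

theta = 0.5          # assumed parameter value, chosen only for illustration
xs = [1, 0] * 2500   # made-up data: 5000 independent Bernoulli observations


def pmf(x, theta):
    """Bernoulli PMF (hypothetical helper): theta if x == 1, else 1 - theta."""
    return theta if x == 1 else 1.0 - theta


# Direct product of the n factors: underflows to 0.0 in double precision.
likelihood = 1.0
for x in xs:
    likelihood *= pmf(x, theta)

# Sum of the n log-factors: stays finite and is safe to maximize.
log_likelihood = sum(math.log(pmf(x, theta)) for x in xs)

print(likelihood)
print(log_likelihood)
```

Because the logarithm is strictly increasing, the value of $\theta$ that maximizes the log-likelihood also maximizes the likelihood itself.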
◼ The term "likelihood" needs to be interpreted properly.
◼ Having observed the value $x$ of $X$, $p_X(x; \theta)$ is not the probability that the unknown parameter is equal to $\theta$.
◼ It is the probability that the observed value $x$ can arise when the parameter is equal to $\theta$.
◼ Thus, in maximizing the likelihood, we are asking the following question:
◼ "What is the value of $\theta$ under which the observations we have seen are most likely to arise?"
Comparison with Bayesian MAP
◼ Recall MAP: $\max_{\theta} \, p_\Theta(\theta) \, p_{X|\Theta}(x \mid \theta)$.
◼ Thus we can interpret MLE as MAP estimation with a flat prior.
❑ i.e., a prior which is the same for all $\theta$,
❑ indicating the absence of any useful prior knowledge.
◼ In the case of continuous $\theta$ with a bounded range, MLE is MAP with a uniform prior: $f_\Theta(\theta) = c$ for all $\theta$ and some constant $c$.
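The equivalence can be checked numerically. The sketch below is an illustration under assumptions that are not in the slides: a Gaussian model with unknown mean $\theta$ and unit variance, $\theta$ restricted to $[-5, 5]$, made-up data, and a simple grid search. Since a flat prior only adds the constant $\log c$ to the log-likelihood, both objectives peak at the same grid point.

```python
import math

xs = [1.2, 0.7, 1.9, 1.0, 1.2]   # made-up observations


def log_likelihood(theta, xs):
    """Gaussian log-likelihood with unit variance (illustrative model choice)."""
    return sum(-0.5 * (x - theta) ** 2 - 0.5 * math.log(2 * math.pi) for x in xs)


# Grid over the assumed bounded range [-5, 5] for theta.
grid = [-5.0 + 0.01 * k for k in range(1001)]

c = 1.0 / 10.0   # uniform prior density on an interval of length 10

# MLE: maximize the log-likelihood alone.
mle = max(grid, key=lambda t: log_likelihood(t, xs))

# MAP with a flat prior: maximize log prior + log-likelihood.
map_flat = max(grid, key=lambda t: math.log(c) + log_likelihood(t, xs))

print(mle, map_flat)
```

Here both estimates land on the grid point closest to the sample mean of the made-up data. A non-constant prior would generally move the MAP estimate away from the MLE.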
Estimating parameter of exponential
◼ Customers arrive at a facility, with the $i$th customer arriving at time $Y_i$.
◼ We assume that the $i$th interarrival time, $X_i = Y_i - Y_{i-1}$, is exponentially distributed with parameter $\theta$,
❑ with the convention $Y_0 = 0$.
◼ Assume that $X_1, \ldots, X_n$ are independent.
◼ We wish to estimate the value of $\theta$ (interpreted as the arrival rate), on the basis of the observations $X_1, \ldots, X_n$.
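For this model the log-likelihood is $n \log \theta - \theta \sum_{i=1}^{n} x_i$, and setting its derivative to zero gives the standard closed form $\hat{\theta} = n / \sum_{i=1}^{n} x_i$. A minimal simulation sketch follows; the true rate, sample size, seed, and variable names are all assumptions chosen for illustration and are not part of the slides.

```python
import random
from itertools import accumulate

random.seed(0)    # fixed seed so the run is repeatable
true_rate = 2.0   # assumed "true" arrival rate used to simulate data
n = 10_000        # assumed number of customers

# Simulate arrival times Y_1, ..., Y_n as cumulative sums of exponential gaps.
gaps = [random.expovariate(true_rate) for _ in range(n)]
arrival_times = list(accumulate(gaps))

# Recover interarrival times X_i = Y_i - Y_{i-1}, with the convention Y_0 = 0.
previous = [0.0] + arrival_times[:-1]
interarrivals = [y - y_prev for y, y_prev in zip(arrival_times, previous)]

# Maximum likelihood estimate of the rate: n divided by the sum of the X_i.
theta_hat = n / sum(interarrivals)

print(theta_hat)
```

Since the interarrival times sum to $Y_n$, the estimate is essentially $n / Y_n$: the number of arrivals divided by the total observation time, which matches the interpretation of $\theta$ as an arrival rate.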