◼ In many applications, the observations $X_i$ are assumed to be independent.
◼ Then $p_X(x_1, \dots, x_n; \theta) = \prod_{i=1}^{n} p_{X_i}(x_i; \theta)$.
◼ It is often analytically or computationally convenient to maximize its logarithm, called the log-likelihood function (over $\theta$): $\log p_X(x_1, \dots, x_n; \theta) = \sum_{i=1}^{n} \log p_{X_i}(x_i; \theta)$.
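One practical reason the logarithm is "computationally convenient": a product of thousands of probabilities below 1 underflows to zero in floating point, while the sum of their logarithms stays finite and has the same maximizer. The sketch below assumes a Bernoulli model and simulated data, neither of which comes from the slides; `likelihood`, `log_likelihood`, and `bernoulli_pmf` are hypothetical helper names.

```python
import math
import random

def likelihood(xs, pmf, theta):
    """Product of p_{X_i}(x_i; theta) over independent observations."""
    prod = 1.0
    for x in xs:
        prod *= pmf(x, theta)
    return prod

def log_likelihood(xs, pmf, theta):
    """Sum of log p_{X_i}(x_i; theta); same maximizer as the product."""
    return sum(math.log(pmf(x, theta)) for x in xs)

def bernoulli_pmf(x, theta):
    # Assumed model: p(x; theta) = theta if x == 1, else 1 - theta
    return theta if x == 1 else 1.0 - theta

random.seed(0)
# Hypothetical data: 5000 independent Bernoulli(0.3) draws
data = [1 if random.random() < 0.3 else 0 for _ in range(5000)]

print(likelihood(data, bernoulli_pmf, 0.3))      # underflows to 0.0
print(log_likelihood(data, bernoulli_pmf, 0.3))  # finite negative number
```

With 5000 observations, each factor is at most 0.7, so the product is at most $0.7^{5000} \approx e^{-1783}$, far below the smallest positive double; the log-likelihood is an ordinary negative number.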
◼ The term "likelihood" needs to be interpreted properly.
◼ Having observed the value $x$ of $X$, $p_X(x; \theta)$ is not the probability that the unknown parameter is equal to $\theta$.
◼ It is the probability that the observed value $x$ can arise when the parameter is equal to $\theta$.
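A quick numerical check of this distinction, assuming a hypothetical i.i.d. Bernoulli sequence of observations: for a fixed $\theta$, $p_X(x; \theta)$ summed over all possible observation sequences $x$ equals 1, but for a fixed observed $x$, the same expression integrated over $\theta \in [0, 1]$ need not equal 1, so it is not a distribution over $\theta$. The helper names are illustrative only.

```python
from itertools import product

def bernoulli_sequence_likelihood(xs, theta):
    """p_X(x; theta) for an i.i.d. Bernoulli sequence xs (assumed model)."""
    k = sum(xs)
    n = len(xs)
    return theta**k * (1.0 - theta) ** (n - k)

def integrate_over_theta(xs, steps=100_000):
    """Midpoint-rule integral of the likelihood over theta in [0, 1]."""
    h = 1.0 / steps
    return sum(bernoulli_sequence_likelihood(xs, (j + 0.5) * h) for j in range(steps)) * h

# Fixed theta, summed over every possible sequence of 3 observations: a distribution over x
total = sum(bernoulli_sequence_likelihood(list(xs), 0.4) for xs in product([0, 1], repeat=3))
print(total)  # 1.0 up to rounding

# Fixed observation, integrated over theta: not a distribution over theta
observed = [1, 1, 0]  # hypothetical data: heads, heads, tails
area = integrate_over_theta(observed)
print(area)  # about 1/12 = 0.0833..., not 1
```

The exact value of the second quantity is $\int_0^1 \theta^2 (1 - \theta)\, d\theta = 1/3 - 1/4 = 1/12$.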
◼ Thus, in maximizing the likelihood, we are asking the following question:
◼ "What is the value of $\theta$ under which the observations we have seen are most likely to arise?"
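The question can be answered numerically by evaluating the log-likelihood on a grid of candidate $\theta$ values and keeping the best one. This sketch assumes independent Poisson observations (a model chosen for illustration, not taken from the slides) and hypothetical count data; for this model the maximizer is known to be the sample mean, so the grid search can be checked against it.

```python
import math

def poisson_log_likelihood(theta, xs):
    # sum_i [x_i log theta - theta - log(x_i!)], using lgamma(x + 1) = log(x!)
    return sum(x * math.log(theta) - theta - math.lgamma(x + 1) for x in xs)

counts = [2, 3, 0, 4, 1]  # hypothetical observations; sample mean is 2.0

# Candidate theta values 0.001, 0.002, ..., 10.0
grid = [j / 1000 for j in range(1, 10_001)]
best = max(grid, key=lambda t: poisson_log_likelihood(t, counts))

print(best)  # 2.0, the sample mean
```

A grid search is crude, but it makes the question literal: every candidate $\theta$ is scored by how probable it makes the observed data, and the highest score wins.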
Comparison with Bayesian MAP
◼ Recall MAP: $\max_{\theta}\, p_\Theta(\theta)\, p_{X|\Theta}(x \mid \theta)$.
◼ Thus we can interpret MLE as MAP estimation with a flat prior,
  ❑ i.e., a prior which is the same for all $\theta$,
  ❑ indicating the absence of any useful prior knowledge.
◼ In the case of continuous $\theta$ with a bounded range, MLE is MAP with a uniform prior: $f_\Theta(\theta) = c$ for all $\theta$ and some constant $c$.
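The equivalence can be seen on a grid: adding a constant log-prior leaves the argmax unchanged, while a non-flat prior moves it. The sketch assumes a Bernoulli model with hypothetical data (7 successes in 10 trials) and, for contrast, an unnormalized Beta(5, 5)-shaped prior; neither choice comes from the slides.

```python
import math

def log_lik(theta, k, n):
    # Bernoulli log-likelihood: k successes in n independent trials
    return k * math.log(theta) + (n - k) * math.log(1.0 - theta)

def log_flat_prior(theta):
    # Uniform density f(theta) = c = 1 on (0, 1); log c = 0 for every theta
    return 0.0

def log_beta55_prior(theta):
    # Unnormalized Beta(5, 5) density theta^4 (1 - theta)^4; favors theta near 0.5
    return 4 * math.log(theta) + 4 * math.log(1.0 - theta)

def argmax_on_grid(objective, grid):
    return max(grid, key=objective)

grid = [j / 1000 for j in range(1, 1000)]  # theta in (0, 1)
k, n = 7, 10  # hypothetical data

mle = argmax_on_grid(lambda t: log_lik(t, k, n), grid)
map_flat = argmax_on_grid(lambda t: log_flat_prior(t) + log_lik(t, k, n), grid)
map_beta = argmax_on_grid(lambda t: log_beta55_prior(t) + log_lik(t, k, n), grid)

print(mle, map_flat, map_beta)  # 0.7 0.7 0.611
```

The flat prior reproduces the MLE $k/n = 0.7$; the Beta(5, 5)-shaped prior pulls the estimate toward 0.5, to about $11/18 \approx 0.611$.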
Estimating the parameter of an exponential
◼ Customers arrive at a facility, with the $i$th customer arriving at time $Y_i$.
◼ We assume that the $i$th interarrival time, $X_i = Y_i - Y_{i-1}$, is exponentially distributed with parameter $\theta$,
  ❑ with the convention $Y_0 = 0$.
◼ Assume that $X_1, \dots, X_n$ are independent.
◼ We wish to estimate the value of $\theta$ (interpreted as the arrival rate) on the basis of the observations $X_1, \dots, X_n$.
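With the exponential density $p_{X_i}(x_i; \theta) = \theta e^{-\theta x_i}$ for $x_i \ge 0$, the log-likelihood is $n \log \theta - \theta \sum_{i=1}^{n} x_i$; setting its derivative $n/\theta - \sum_i x_i$ to zero gives $\hat\theta = n / \sum_{i=1}^{n} x_i$. The sketch below checks this against a grid search on simulated arrivals. The true rate of 2.5, the sample size, and the helper name are assumptions made only to generate data.

```python
import math
import random
from itertools import accumulate

def exp_log_likelihood(theta, n, total):
    # log prod theta * exp(-theta x_i) = n log theta - theta * sum(x_i)
    return n * math.log(theta) - theta * total

random.seed(1)
true_rate = 2.5  # hypothetical rate, used only to simulate data
# Arrival times Y_1, ..., Y_n as cumulative sums of exponential gaps
arrivals = list(accumulate(random.expovariate(true_rate) for _ in range(10_000)))

# Interarrival times X_i = Y_i - Y_{i-1}, with the convention Y_0 = 0
xs = [y - y_prev for y, y_prev in zip(arrivals, [0.0] + arrivals[:-1])]
n, total = len(xs), sum(xs)

theta_hat = n / total  # closed-form maximizer

grid = [j / 1000 for j in range(1, 10_001)]  # candidate rates 0.001 .. 10.0
theta_grid = max(grid, key=lambda t: exp_log_likelihood(t, n, total))

print(theta_hat, theta_grid)  # both close to the simulated rate
```

Since $\sum_i X_i = Y_n$, the estimate is simply the number of arrivals divided by the time of the last arrival.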