Outline (Level 1-2)
1 Definition of Statistical Learning
  - Definition of "Statistics"
  - Definition of "Learning"
  - Definition of Statistical Learning
  - Definition of Pattern
1.2. Definition of "Learning"
(1) "The acquisition of knowledge or skills through study, experience, or being taught" (Oxford Dictionary).
(2) T. M. Mitchell: A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
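Mitchell's definition can be made concrete with a toy sketch. Everything here is an illustrative assumption, not part of the slides: the task T is deciding whether an integer x is at least 50, the performance measure P is accuracy on the inputs 0..100, the experience E is a list of labelled examples, and the learner is a 1-nearest-neighbour rule.

```python
def one_nn_predict(train, x):
    """Predict the label of x by copying the label of the nearest training point."""
    nearest = min(train, key=lambda example: abs(example[0] - x))
    return nearest[1]


def accuracy(train):
    """Performance measure P: fraction of test inputs 0..100 labelled correctly."""
    test_inputs = range(101)
    correct = sum(one_nn_predict(train, x) == int(x >= 50) for x in test_inputs)
    return correct / len(test_inputs)


# Experience E: labelled examples (x, y), where y = 1 exactly when x >= 50.
few = [(0, 0), (60, 1)]
more = few + [(40, 0)]  # one additional example

acc_few = accuracy(few)    # P measured with less experience
acc_more = accuracy(more)  # P measured with more experience
print(acc_few, acc_more)
```

In this hand-picked case the extra example moves the decision boundary to the right place, so P improves with E. A single added example need not help in general; the definition refers to the overall trend.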
Experience, E
1 Supervised learning: $\{x_i, y_i\}_{i=1}^{N}$
2 Unsupervised learning: $\{x_i\}_{i=1}^{N}$
3 Reinforcement learning: modeled as a Markov decision process:
  - $S$, a set of states of the environment and agent
  - $A$, a set of actions of the agent
  - $P_a(s, s') = P(s_{t+1} = s' \mid s_t = s, a_t = a)$, the probability of transition from state $s$ to state $s'$ under action $a$
  - $R_a(s, s')$, the (expected) immediate reward after the transition from $s$ to $s'$
  - Goal: find a policy $\pi : S \times A \to [0, 1]$ with maximum expected reward $E(R)$, where $\pi(a \mid s) = P(a_t = a \mid s_t = s)$
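The MDP ingredients above can be written down as plain tables. The sketch below uses a hypothetical two-state, two-action MDP with made-up numbers, and it evaluates only the one-step expected reward of a policy in a given state, $\sum_a \pi(a \mid s) \sum_{s'} P_a(s, s') R_a(s, s')$. The slide does not fix a horizon or a discount factor, so the one-step view is an assumption made here for brevity.

```python
# Hypothetical MDP: state/action names and all numbers are illustrative only.
S = ["s0", "s1"]
A = ["stay", "move"]

# P[a][s][s2] = P(s_{t+1} = s2 | s_t = s, a_t = a); each inner row sums to 1.
P = {
    "stay": {"s0": {"s0": 1.0, "s1": 0.0}, "s1": {"s0": 0.0, "s1": 1.0}},
    "move": {"s0": {"s0": 0.25, "s1": 0.75}, "s1": {"s0": 0.75, "s1": 0.25}},
}

# R[a][s][s2] = immediate reward after the transition s -> s2 under action a.
R = {
    "stay": {"s0": {"s0": 0.0, "s1": 0.0}, "s1": {"s0": 0.0, "s1": 1.0}},
    "move": {"s0": {"s0": 0.0, "s1": 2.0}, "s1": {"s0": -1.0, "s1": 1.0}},
}


def expected_one_step_reward(pi, s):
    """Return sum_a pi(a|s) * sum_s2 P_a(s, s2) * R_a(s, s2)."""
    return sum(
        pi[s][a] * sum(P[a][s][s2] * R[a][s][s2] for s2 in S)
        for a in A
    )


# pi[s][a] = pi(a|s); each row is a probability distribution over actions.
uniform = {s: {"stay": 0.5, "move": 0.5} for s in S}
greedy = {"s0": {"stay": 0.0, "move": 1.0}, "s1": {"stay": 1.0, "move": 0.0}}

for s in S:
    print(s, expected_one_step_reward(uniform, s), expected_one_step_reward(greedy, s))
```

Here `greedy` places all probability on the action with the larger one-step expected reward in each state, so it scores at least as well as `uniform` on this one-step criterion.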
Tasks, T
1 Classification: the computer program is asked to specify which of $k$ categories some input belongs to. To solve this task, the learning algorithm is usually asked to produce a function $f : \mathbb{R}^n \to \{1, \dots, k\}$. Output is discrete.
2 Regression: the computer program is asked to predict a numerical value given some input. To solve this task, the learning algorithm is asked to output a function $f : \mathbb{R}^n \to \mathbb{R}$. Output is continuous.
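The difference between the two function types can be seen in a small sketch. The nearest-centroid classifier and the linear regressor below are stand-ins chosen for illustration, with hand-picked (not learned) parameters; the slides do not prescribe either model.

```python
from typing import Sequence


def classify(x: Sequence[float], centroids: Sequence[Sequence[float]]) -> int:
    """f: R^n -> {1, ..., k}. Return the 1-based index of the nearest centroid."""
    def sq_dist(c: Sequence[float]) -> float:
        return sum((xi - ci) ** 2 for xi, ci in zip(x, c))

    return 1 + min(range(len(centroids)), key=lambda j: sq_dist(centroids[j]))


def regress(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """f: R^n -> R. A linear function w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


# Hypothetical parameters for n = 2 inputs and k = 3 classes.
centroids = [(0.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
x = (3.0, 3.5)

label = classify(x, centroids)              # discrete output in {1, 2, 3}
value = regress(x, w=(2.0, -1.0), b=0.5)    # continuous output in R
print(label, value)
```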
Performance Measures, P
1 0-1 Loss Function (Classification): $L(y, f(x)) = \begin{cases} 1, & y \neq f(x) \\ 0, & y = f(x) \end{cases}$
2 Square Loss Function: $L(y, f(x)) = (y - f(x))^2$
3 Absolute Loss Function: $L(y, f(x)) = |y - f(x)|$
4 Log (Likelihood) Loss Function: $L(y, P(y \mid x)) = -\log P(y \mid x)$
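The four losses translate directly into code. This is a minimal per-example sketch: the function names are chosen here for illustration, and `log_loss` assumes the caller passes the model's probability for the true label, $P(y \mid x)$, and that the logarithm is natural.

```python
import math


def zero_one_loss(y, y_pred):
    """0-1 loss: 1 if the prediction differs from the label, else 0."""
    return 0 if y == y_pred else 1


def square_loss(y, y_pred):
    """Square loss: (y - f(x))^2."""
    return (y - y_pred) ** 2


def absolute_loss(y, y_pred):
    """Absolute loss: |y - f(x)|."""
    return abs(y - y_pred)


def log_loss(p_true):
    """Log loss: -log P(y|x), where p_true is the probability given to the true label."""
    if not 0.0 < p_true <= 1.0:
        raise ValueError("p_true must lie in (0, 1]")
    return -math.log(p_true)


print(zero_one_loss(1, 2), square_loss(3.0, 1.0), absolute_loss(3.0, 1.0), log_loss(0.5))
```

The 0-1 loss applies to discrete predictions, the square and absolute losses to numerical predictions, and the log loss to predicted probabilities. Averaging any of them over a dataset gives an empirical performance measure.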