Word2vec
• Tomáš Mikolov, et al. 2013
• A word vector learning framework

Basic idea
• A large text corpus.
• Represent each word in a fixed-size vocabulary as a vector.
• At each position t in the text, there is a center word c and a context word o.
• Use the word-vector similarity of c and o to calculate the probability of o given c, and vice versa.
• Maximize this probability by continuously adjusting the word vectors.

1. Mikolov T, Chen K, Corrado G, et al. Efficient estimation of word representations in vector space[J]. arXiv preprint arXiv:1301.3781, 2013.
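The basic idea can be sketched in a few lines of Python. This is a toy illustration, not the actual training code: the vocabulary, embedding dimension, and the randomly initialized center/context matrices `V` and `U` are all assumptions made for the example.

```python
import numpy as np

# Toy vocabulary and randomly initialized word vectors (assumptions).
vocab = ["problems", "turning", "into", "banking", "crises", "as"]
word_to_id = {w: i for i, w in enumerate(vocab)}

rng = np.random.default_rng(0)
dim = 8                                   # embedding dimension (toy choice)
V = rng.normal(size=(len(vocab), dim))    # center-word vectors
U = rng.normal(size=(len(vocab), dim))    # context-word vectors

def p_context_given_center(o, c):
    """P(o | c): softmax over dot-product similarities of c with every word."""
    scores = U @ V[word_to_id[c]]          # similarity of c to each vocab word
    exp = np.exp(scores - scores.max())    # numerically stabilized softmax
    return exp[word_to_id[o]] / exp.sum()

probs = [p_context_given_center(w, "banking") for w in vocab]
assert abs(sum(probs) - 1.0) < 1e-9       # a valid probability distribution
```

Training then adjusts `V` and `U` so that observed (center, context) pairs get high probability.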
Word2vec

Example
• Compute the window and sampling process of P(w_{t+j} | w_t).

[Figure: a window of size 2 at position t in the sentence "… problems turning into banking crises as …". "banking" is the center word w_t; the outside context words are w_{t-2}, w_{t-1}, w_{t+1}, w_{t+2}, sampled with probabilities P(w_{t-2} | w_t), P(w_{t-1} | w_t), P(w_{t+1} | w_t), P(w_{t+2} | w_t).]
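The windowing step in the example can be sketched as follows: slide over each position t and pair the center word with every context word within distance m. The function name `window_pairs` is a made-up helper for illustration.

```python
def window_pairs(tokens, m=2):
    """Yield (center, context) pairs for every position t with window size m."""
    pairs = []
    for t, center in enumerate(tokens):
        for j in range(-m, m + 1):
            if j != 0 and 0 <= t + j < len(tokens):   # skip j=0 and edges
                pairs.append((center, tokens[t + j]))
    return pairs

sentence = "problems turning into banking crises as".split()
for c, o in window_pairs(sentence, m=2):
    if c == "banking":
        print(c, "->", o)
# context of "banking": turning, into, crises, as
```

Each printed pair is one training sample for estimating P(w_{t+j} | w_t).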
Word2vec

Objective function
• For each position t = 1, …, T, given the center word w_t, predict the context words in a window of fixed size m.

L(θ) = Likelihood = ∏_{t=1}^{T} ∏_{-m ≤ j ≤ m, j ≠ 0} P(w_{t+j} | w_t; θ)

where θ is all the variables to be optimized.
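The objective above can be evaluated directly. Since the product of many small probabilities underflows, a sketch typically works with the average log-likelihood, (1/T) Σ_t Σ_{-m≤j≤m, j≠0} log P(w_{t+j} | w_t; θ), which is maximized by the same θ. The uniform `prob` model below is a stand-in assumption for the softmax probabilities, used only to make the example runnable.

```python
import math

def log_likelihood(tokens, m, prob):
    """Average log-likelihood: (1/T) sum_t sum_{-m<=j<=m, j!=0} log P(w_{t+j} | w_t)."""
    total = 0.0
    for t in range(len(tokens)):
        for j in range(-m, m + 1):
            if j != 0 and 0 <= t + j < len(tokens):
                total += math.log(prob(tokens[t + j], tokens[t]))
    return total / len(tokens)

tokens = "problems turning into banking crises as".split()
uniform = lambda o, c: 1.0 / 6    # toy stand-in model (assumption)
print(log_likelihood(tokens, m=2, prob=uniform))
```

Training would adjust θ (the word vectors) by gradient ascent on this quantity.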