16 Questions
• What are the fundamentals underlying the different models? or
• Can we unify the different network embedding approaches?
17 Unifying DeepWalk, LINE, PTE, and node2vec into Matrix Forms

Closed matrix forms:
• DeepWalk: \log\left(\mathrm{vol}(G)\left(\frac{1}{T}\sum_{r=1}^{T}(D^{-1}A)^{r}\right)D^{-1}\right)-\log b
• LINE: \log\left(\mathrm{vol}(G)\,D^{-1}AD^{-1}\right)-\log b
• PTE: \log\left(\begin{bmatrix}\alpha\,\mathrm{vol}(G_{ww})(D_{row}^{ww})^{-1}A_{ww}(D_{col}^{ww})^{-1}\\ \beta\,\mathrm{vol}(G_{dw})(D_{row}^{dw})^{-1}A_{dw}(D_{col}^{dw})^{-1}\\ \gamma\,\mathrm{vol}(G_{lw})(D_{row}^{lw})^{-1}A_{lw}(D_{col}^{lw})^{-1}\end{bmatrix}\right)-\log b
• node2vec: \log\left(\frac{\frac{1}{2T}\sum_{r=1}^{T}\left(\sum_{u}X_{w,u}P_{c,w,u}^{r}+\sum_{u}X_{c,u}P_{w,c,u}^{r}\right)}{\left(\sum_{u}X_{w,u}\right)\left(\sum_{u}X_{c,u}\right)}\right)-\log b

Notation:
• A: A \in \mathbb{R}_{+}^{|V|\times|V|} is G's adjacency matrix, with A_{i,j} the edge weight between vertices i and j
• D_row: D_{row} = \mathrm{diag}(Ae) is the diagonal matrix of row sums of A
• D_col: D_{col} = \mathrm{diag}(A^{T}e) is the diagonal matrix of column sums of A
• D: for undirected graphs (A^{T}=A), D_{col}=D_{row}; for brevity, D denotes both, with D=\mathrm{diag}(d_{1},\ldots,d_{|V|}), where d_{i} is the generalized degree of vertex i
• vol(G): \mathrm{vol}(G)=\sum_{i}\sum_{j}A_{i,j}=\sum_{i}d_{i} is the volume of the weighted graph G
• T & b: the context window size and the number of negative samples in skip-gram, respectively

1. Qiu et al. Network embedding as matrix factorization: unifying deepwalk, line, pte, and node2vec. WSDM'18. The most cited paper in WSDM'18 as of May 2019.
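To make the DeepWalk row of the table concrete, here is a minimal numpy sketch (not from the paper) that evaluates the closed form \log(\mathrm{vol}(G)(\frac{1}{T}\sum_{r=1}^{T}(D^{-1}A)^{r})D^{-1})-\log b densely on a small undirected graph; the function name deepwalk_matrix is illustrative, and truncating the element-wise log at entries below 1 is a simplifying assumption of this sketch rather than the paper's exact treatment.

```python
import numpy as np

def deepwalk_matrix(A, T=10, b=1):
    """Dense sketch of log(vol(G) (1/T sum_r (D^-1 A)^r) D^-1) - log b.

    A : symmetric adjacency matrix (numpy array)
    T : context window size, b : number of negative samples
    """
    d = A.sum(axis=1)                       # generalized degrees d_i
    vol = d.sum()                           # vol(G) = sum_ij A_ij
    P = A / d[:, None]                      # random-walk transition matrix D^-1 A
    S = np.zeros_like(A, dtype=float)
    Pr = np.eye(A.shape[0])
    for _ in range(T):                      # accumulate sum_{r=1}^{T} (D^-1 A)^r
        Pr = Pr @ P
        S += Pr
    M = (vol / (b * T)) * (S / d[None, :])  # vol(G)/(bT) * sum_r P^r * D^-1
    # element-wise log, truncating entries <= 1 to 0 (assumption of this sketch)
    return np.log(np.maximum(M, 1.0))

# tiny usage example on a 4-vertex path graph
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(deepwalk_matrix(A, T=3, b=1))
```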
18 Starting with DeepWalk
[Figure: DeepWalk pipeline. Input: network G = (V, E) → Random Walk → Skip-gram → Output: node embeddings]
19 DeepWalk Algorithm

Algorithm 1: DeepWalk
for n = 1, 2, ..., N do
    Pick w_1^n according to a probability distribution P(w_1)
    Generate a vertex sequence (w_1^n, ..., w_L^n) of length L by a random walk on network G
    for j = 1, 2, ..., L - T do
        for r = 1, ..., T do
            Add vertex-context pair (w_j^n, w_{j+r}^n) to multiset D
            Add vertex-context pair (w_{j+r}^n, w_j^n) to multiset D
Run SGNS on D with b negative samples
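The following is a minimal Python sketch (not from the paper) of the pair-collection loop in Algorithm 1, assuming the graph is given as an adjacency list; the name deepwalk_pairs is illustrative, and taking the start distribution P(w_1) to be uniform over vertices is an assumption, since the algorithm leaves P(w_1) unspecified.

```python
import random
from collections import Counter

def deepwalk_pairs(adj, N=10, L=40, T=5, seed=0):
    """Collect the multiset D of vertex-context pairs as in Algorithm 1.

    adj : dict mapping each vertex to a list of its neighbors
    N   : number of random walks, L : walk length, T : context window size
    The start distribution P(w1) is taken to be uniform here (an assumption).
    """
    rng = random.Random(seed)
    vertices = list(adj)
    D = Counter()                                # multiset of (vertex, context) pairs
    for _ in range(N):
        walk = [rng.choice(vertices)]            # pick w1
        while len(walk) < L:                     # random walk of length L on the network
            walk.append(rng.choice(adj[walk[-1]]))
        for j in range(L - T):
            for r in range(1, T + 1):
                D[(walk[j], walk[j + r])] += 1   # add (w_j, w_{j+r})
                D[(walk[j + r], walk[j])] += 1   # add (w_{j+r}, w_j)
    return D                                     # SGNS with b negative samples is then run on D

# usage on a tiny triangle graph
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
D = deepwalk_pairs(adj, N=5, L=10, T=2)
```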
20 Skip-gram with Negative Sampling
• SGNS maintains a multiset \mathcal{D} that counts the occurrence of each word-context pair (w, c)
• Objective
  \mathcal{L} = \sum_{w}\sum_{c}\left(\#(w,c)\log g(x_{w}^{T}x_{c}) + \frac{b\,\#(w)\#(c)}{|\mathcal{D}|}\log g(-x_{w}^{T}x_{c})\right)
  where x_w and x_c are d-dimensional vectors
• For a sufficiently large dimension d, the objective above is equivalent to factorizing the PMI matrix [1]
  \log\frac{\#(w,c)\,|\mathcal{D}|}{b\,\#(w)\#(c)}

1. Levy and Goldberg. Neural word embeddings as implicit matrix factorization. In NIPS 2014.
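As a rough illustration of the Levy and Goldberg result, the sketch below (not from either paper) builds the shifted PMI matrix \log(\#(w,c)|\mathcal{D}|/(b\,\#(w)\#(c))) from a multiset of pairs such as the one produced by the deepwalk_pairs sketch above; the name shifted_pmi_matrix is illustrative, and leaving zero-count entries at -inf, rather than handling them as the original papers do, is an assumption of this sketch.

```python
import numpy as np
from collections import Counter

def shifted_pmi_matrix(D, num_vertices, b=1):
    """Build the matrix with entries log(#(w,c)|D| / (b #(w)#(c))) from a pair multiset D.

    D : Counter of (w, c) pairs, e.g. the output of deepwalk_pairs above
    b : number of negative samples; zero-count entries stay at -inf (sketch assumption)
    """
    C = np.zeros((num_vertices, num_vertices))   # co-occurrence counts #(w, c)
    for (w, c), n in D.items():
        C[w, c] += n
    total = C.sum()                              # |D|
    row = C.sum(axis=1, keepdims=True)           # #(w)
    col = C.sum(axis=0, keepdims=True)           # #(c)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.log(C * total / (b * row * col))

# usage with a tiny pair multiset over 3 vertices
D = Counter({(0, 1): 4, (1, 0): 4, (1, 2): 3, (2, 1): 3})
M = shifted_pmi_matrix(D, num_vertices=3, b=1)
```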