IEEE TRANSACTIONS ON INFORMATION THEORY

for any vector $x \in \mathbb{R}^n$, the Laplacian matrix $L$ is a positive semidefinite symmetric matrix whose diagonal elements form the degree sequence $d$ and whose eigenvalues form the spectrum $\lambda$. Therefore, the majorization $\lambda \succ d$ implies that there exists some doubly stochastic matrix $P = (p_{ij}) \in [0,1]^{n \times n}$ such that $P\lambda = d$. Using the relation $P\lambda = d$ and the convexity of $f(x)$ from Lemma 2, we can now proceed to prove Theorem 1.

Proof of Theorem 1: For each $i \in V$, we define a discrete random variable $X_i$ whose probability mass function is given by $\sum_{j=1}^{n} p_{ij}\,\delta_{\lambda_j}(x)$, where $\delta_a(x)$ is the Kronecker delta function. Then the expectation is $\mathbb{E}[X_i] = \sum_{j=1}^{n} p_{ij}\lambda_j = d_i$ and the variance is $\operatorname{var}(X_i) = \sum_{j=1}^{n} p_{ij}(\lambda_j - d_i)^2 = \sum_{j=1}^{n} p_{ij}\lambda_j^2 - d_i^2$.

First, we express the entropy gap in terms of the Laplacian spectrum and the degree sequence. Since
$$H_1(G) = -\sum_{i=1}^{n} \frac{d_i}{\operatorname{vol}(G)} \log_2 \frac{d_i}{\operatorname{vol}(G)} = -\frac{1}{\operatorname{vol}(G)}\left(\sum_{i=1}^{n} f(d_i) - \sum_{i=1}^{n} d_i \log_2(\operatorname{vol}(G))\right) = \log_2(\operatorname{vol}(G)) - \frac{\sum_{i=1}^{n} f(d_i)}{\operatorname{vol}(G)}, \quad (2)$$
and similarly
$$H_{\mathrm{vn}}(G) = \log_2(\operatorname{vol}(G)) - \frac{\sum_{i=1}^{n} f(\lambda_i)}{\operatorname{vol}(G)}, \quad (3)$$
we have
$$\Delta H(G) = H_1(G) - H_{\mathrm{vn}}(G) = \frac{\sum_{i=1}^{n} f(\lambda_i) - \sum_{i=1}^{n} f(d_i)}{\operatorname{vol}(G)}. \quad (4)$$

Second, we use Jensen's inequality to prove $\Delta H(G) > 0$. Since $f(x)$ is convex, $f(d_i) = f(\mathbb{E}[X_i]) \le \mathbb{E}[f(X_i)]$ for any $i \in \{1, \dots, n\}$. By summing over $i$, we have
$$\sum_{i=1}^{n} f(d_i) \le \sum_{i=1}^{n} \mathbb{E}[f(X_i)] = \sum_{i=1}^{n} \sum_{j=1}^{n} p_{ij} f(\lambda_j) = \sum_{j=1}^{n} f(\lambda_j).$$
Therefore, $\Delta H(G) \ge 0$ for any undirected graph. In fact, $\Delta H(G)$ cannot be $0$. To see this, suppose that the Laplacian matrix is decomposed as $L = U \Lambda U^{\top}$, where $\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$ and $U = (u_{\cdot,1} | \cdots | u_{\cdot,n})$ is a unitary matrix. Note that $\lambda_1 = 0$ and $u_{\cdot,1} = \mathbf{1}_n / \sqrt{n}$. The $i$-th diagonal element of the matrix $U \Lambda U^{\top} = \sum_{j=1}^{n} \lambda_j u_{\cdot,j} u_{\cdot,j}^{\top}$ is given by $\sum_{j=1}^{n} \lambda_j |u_{ij}|^2$. Since the $i$-th diagonal element of $L$ is $d_i$ and $L = U \Lambda U^{\top}$, we have $d_i = \sum_{j=1}^{n} \lambda_j |u_{ij}|^2$ for each $i \in [n]$. Recall that $P\lambda = d$ where $P$ is a doubly stochastic matrix; therefore we may take $p_{ij} = |u_{ij}|^2$.
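The construction $p_{ij} = |u_{ij}|^2$ can be checked numerically. Below is a minimal sketch (assuming NumPy; the path graph $P_4$ is an arbitrary illustrative choice) that builds $P$ from the Laplacian eigenvectors and confirms it is doubly stochastic with $P\lambda = d$:

```python
import numpy as np

# A small unweighted graph: the path graph P_4 (illustrative choice).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d = A.sum(axis=1)                  # degree sequence
L = np.diag(d) - A                 # combinatorial Laplacian
lam, U = np.linalg.eigh(L)         # spectrum and orthonormal eigenvectors

P = U**2                           # p_ij = |u_ij|^2

# P is doubly stochastic: every row and every column sums to 1 ...
assert np.allclose(P.sum(axis=0), 1) and np.allclose(P.sum(axis=1), 1)
# ... and it maps the spectrum onto the degree sequence: P @ lam = d.
assert np.allclose(P @ lam, d)
```

Because $U$ is orthogonal, each row and column of the entrywise square of $U$ sums to 1, which is exactly the doubly stochastic property that the majorization $\lambda \succ d$ guarantees abstractly.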
Now suppose for contradiction that $\Delta H(G) = 0$, which means that $f(\mathbb{E}[X_i]) = \mathbb{E}[f(X_i)]$ for each $i \in [n]$. Due to the strict convexity of $f(\cdot)$, the discrete random variable $X_i$ has to be deterministic. Since $\mathbb{E}[X_i] = d_i > 0$, we have $\mathbb{P}(X_i = d_i) = 1$. However, this contradicts the fact that $\mathbb{P}(X_i = 0) \ge \mathbb{P}(X_i = \lambda_1) = p_{i,1} = \frac{1}{n}$. The contradiction implies that $\Delta H(G) > 0$ for any undirected graph.

Finally, we use the Jensen gap to prove the upper bound on $\Delta H(G)$ in (1). Applying the Jensen gap to $X_i$ and $f(x)$,
$$\mathbb{E}[f(X_i)] - f(\mathbb{E}[X_i]) \le \sup_{x \in [0, \operatorname{vol}(G)]} \{h_i(x)\} \cdot \operatorname{var}(X_i), \quad (5)$$
where
$$h_i(x) = \frac{f(x) - f(\mathbb{E}[X_i])}{(x - \mathbb{E}[X_i])^2} - \frac{f'(\mathbb{E}[X_i])}{x - \mathbb{E}[X_i]}.$$
Since $f'(x)$ is concave, $h_i(x)$ is monotonically decreasing in $x$. Therefore, $\sup_{x \in [0, \operatorname{vol}(G)]} \{h_i(x)\} = h_i(0)$. Since
$$h_i(0) = \frac{f(0) - f(d_i)}{d_i^2} + \frac{f'(d_i)}{d_i} = \frac{\log_2 e}{d_i} \le \frac{\log_2 e}{\delta},$$
the inequality in (5) can be simplified to
$$\sum_{j=1}^{n} p_{ij} f(\lambda_j) - f(d_i) \le \frac{\log_2 e}{\delta} \cdot \left( \sum_{j=1}^{n} p_{ij} \lambda_j^2 - d_i^2 \right). \quad (6)$$
By summing both sides of inequality (6) over $i$, we get an upper bound $\mathrm{UB}$ on $\sum_{j=1}^{n} f(\lambda_j) - \sum_{i=1}^{n} f(d_i)$:
$$\mathrm{UB} = \frac{\log_2 e}{\delta} \cdot \sum_{i=1}^{n}\left( \sum_{j=1}^{n} p_{ij} \lambda_j^2 - d_i^2 \right) = \frac{\log_2 e}{\delta} \cdot \left( \sum_{j=1}^{n} \lambda_j^2 - \sum_{i=1}^{n} d_i^2 \right) = \frac{\log_2 e}{\delta} \cdot \left( \operatorname{tr}(L^2) - \operatorname{tr}(D^2) \right) = \frac{\log_2 e}{\delta} \cdot \left( \operatorname{tr}(A^2) - \operatorname{tr}(AD) - \operatorname{tr}(DA) \right) = \frac{\log_2 e}{\delta} \cdot \operatorname{tr}(A^2).$$
As a result, $\Delta H(G) = \frac{\sum_{i=1}^{n} f(\lambda_i) - \sum_{i=1}^{n} f(d_i)}{\operatorname{vol}(G)} \le \frac{\log_2 e}{\delta} \cdot \frac{\operatorname{tr}(A^2)}{\operatorname{vol}(G)}$. ∎

To illustrate the tightness of the bounds in Theorem 1, we further derive bounds on the entropy gap for unweighted graphs, especially regular graphs. Via multiplicative error analysis, we show that the structural information converges to the von Neumann graph entropy as the graph size grows.

Corollary 1 (Constant bounds on the entropy gap): For any unweighted, undirected graph $G$, $0 < \Delta H(G) \le \log_2 e$ holds.

Proof: In an unweighted graph $G$,
$$\operatorname{tr}(A^2) = \sum_{i=1}^{n} \sum_{j=1}^{n} A_{ij} A_{ji} = \sum_{i=1}^{n} \sum_{j=1}^{n} A_{ij} = \sum_{i=1}^{n} d_i = \operatorname{vol}(G),$$
and $\delta \ge 1$; therefore $0 < \Delta H(G) \le \frac{\log_2 e}{\delta} \cdot \frac{\operatorname{tr}(A^2)}{\operatorname{vol}(G)} = \frac{\log_2 e}{\delta} \le \log_2 e$. ∎
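Equations (2)-(4) and the bound just derived are straightforward to verify numerically. The sketch below (assuming NumPy; the 5-vertex graph, a triangle with a pendant path, is an arbitrary illustrative example) computes $H_1$, $H_{\mathrm{vn}}$, and their gap, then checks Theorem 1 and Corollary 1:

```python
import numpy as np

def f(x):
    """f(x) = x * log2(x), with the convention f(0) = 0."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    nz = x > 0
    out[nz] = x[nz] * np.log2(x[nz])
    return out

def entropies(A):
    """Return (H1, Hvn) computed via Eqs. (2) and (3)."""
    d = A.sum(axis=1)
    vol = d.sum()
    lam = np.linalg.eigvalsh(np.diag(d) - A)   # Laplacian spectrum
    lam = np.clip(lam, 0.0, None)              # guard tiny negative round-off
    H1  = np.log2(vol) - f(d).sum() / vol      # structural information, Eq. (2)
    Hvn = np.log2(vol) - f(lam).sum() / vol    # von Neumann entropy, Eq. (3)
    return H1, Hvn

# Illustrative 5-vertex graph: a triangle with a pendant path.
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
H1, Hvn = entropies(A)
gap = H1 - Hvn                                  # entropy gap, Eq. (4)

d = A.sum(axis=1)
delta = d[d > 0].min()                          # minimum positive degree
ub = np.log2(np.e) / delta * np.trace(A @ A) / d.sum()   # Theorem 1 bound
assert 0 < gap <= ub <= np.log2(np.e)           # Theorem 1 and Corollary 1
```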
Corollary 2 (Entropy gap of regular graphs): For any unweighted, undirected, regular graph $G_d$ of degree $d$, the inequality $0 < \Delta H(G_d) \le \frac{\log_2 e}{d}$ holds.

Proof sketch: In any unweighted, regular graph $G_d$, $\delta = d$. ∎
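As a concrete check of Corollary 2, the ring graph $C_8$ is 2-regular, so the entropy gap should lie in $(0, \log_2(e)/2]$. A minimal sketch, assuming NumPy:

```python
import numpy as np

def f(x):
    """f(x) = x * log2(x) with f(0) = 0."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    out[x > 0] = x[x > 0] * np.log2(x[x > 0])
    return out

# Ring graph C_8: every vertex has degree d = 2.
n, dreg = 8, 2
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1

d = A.sum(axis=1)
vol = d.sum()
lam = np.clip(np.linalg.eigvalsh(np.diag(d) - A), 0.0, None)

gap = (f(lam).sum() - f(d).sum()) / vol     # entropy gap via Eq. (4)
assert 0 < gap <= np.log2(np.e) / dreg      # Corollary 2 bound
```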
Theorem 2 (Convergence of the multiplicative approximation error): For almost all unweighted graphs $G$ of order $n$, $\frac{H_1(G)}{H_{\mathrm{vn}}(G)} - 1 > 0$ and decays to $0$ at the rate of $o(1/\log_2(n))$.

Proof: Dairyko et al. [49] proved that for almost all unweighted graphs $G$ of order $n$, $H_{\mathrm{vn}}(G) \ge H_{\mathrm{vn}}(K_{1,n-1})$, where $K_{1,n-1}$ stands for the star graph. Since $H_{\mathrm{vn}}(K_{1,n-1}) = \log_2(2n-2) - \frac{n}{2n-2} \log_2 n = 1 + \frac{1}{2} \log_2 n + o(1)$, we have
$$\frac{H_1(G)}{H_{\mathrm{vn}}(G)} - 1 = \frac{\Delta H(G)}{H_{\mathrm{vn}}(G)} \le \frac{\log_2 e}{H_{\mathrm{vn}}(K_{1,n-1})} = o\!\left(\frac{1}{\log_2 n}\right). \quad ∎$$

B. Sharpened Bounds on the Entropy Gap

Though the constant bounds on the entropy gap are tight enough for applications, we can still sharpen the bounds on the entropy gap in unweighted graphs using more advanced majorizations.

Theorem 3 (Sharpened lower bound on the entropy gap): For any unweighted, undirected graph $G$, $\Delta H(G)$ is lower bounded by $\left(f(d_{\max}+1) - f(d_{\max}) + f(\delta-1) - f(\delta)\right)/\operatorname{vol}(G)$, where $d_{\max}$ is the maximum degree and $\delta$ is the minimum positive degree.

Proof: The proof is based on the advanced majorization [50] $\lambda \succ (d_1 + 1, d_2, \dots, d_n - 1)$, where $d_1 \ge d_2 \ge \cdots \ge d_n$ is the sorted degree sequence of the unweighted undirected graph $G$. Similar to the proof of Theorem 1, we have $\sum_{i=1}^{n} f(\lambda_i) \ge f(d_1 + 1) + f(d_n - 1) + \sum_{i=2}^{n-1} f(d_i)$. Then the sharpened lower bound follows from equation (4), since $d_1 = d_{\max}$ and $d_n = \delta$. ∎

Theorem 4 (Sharpened upper bound on the entropy gap): For any unweighted, undirected graph $G = (V, E)$, $\Delta H(G)$ is upper bounded by $\min\{\log_2 e, b_1, b_2\}$, where $b_1 = \frac{\sum_{i=1}^{n} f(d_i^*)}{\operatorname{vol}(G)} - \frac{\sum_{i=1}^{n} f(d_i)}{\operatorname{vol}(G)}$ and $b_2 = \log_2\!\left(1 + \sum_{i=1}^{n} d_i^2 / \operatorname{vol}(G)\right) - \frac{\sum_{i=1}^{n} f(d_i)}{\operatorname{vol}(G)}$. Here $(d_1^*, \dots, d_n^*)$ is the conjugate degree sequence of $G$, where $d_k^* = |\{i \mid d_i \ge k\}|$.

Proof: We first prove $\Delta H(G) \le b_1$ using the Grone-Merris majorization [22]: $(d_1^*, \dots, d_n^*) \succ \lambda$. Similar to the proof of Theorem 1, we have $\sum_{i=1}^{n} f(d_i^*) \ge \sum_{i=1}^{n} f(\lambda_i)$, thus $b_1 \ge \frac{\sum_{i=1}^{n} f(\lambda_i) - \sum_{i=1}^{n} f(d_i)}{\operatorname{vol}(G)} = \Delta H(G)$. We then prove $\Delta H(G) \le b_2$. Since
$$\frac{\sum_{i=1}^{n} f(\lambda_i)}{\operatorname{vol}(G)} = \sum_{i=1}^{n} \left( \frac{\lambda_i}{\sum_{j=1}^{n} \lambda_j} \right) \log_2 \lambda_i \le \log_2\!\left( \frac{\sum_{i=1}^{n} \lambda_i^2}{\sum_{j=1}^{n} \lambda_j} \right)$$
and
$$\frac{\sum_{i=1}^{n} \lambda_i^2}{\sum_{i=1}^{n} \lambda_i} = \frac{\operatorname{tr}(L^2)}{\operatorname{vol}(G)} = 1 + \frac{\sum_{i=1}^{n} d_i^2}{\operatorname{vol}(G)},$$
we have $\Delta H(G) = \frac{\sum_{i=1}^{n} \left(f(\lambda_i) - f(d_i)\right)}{\operatorname{vol}(G)} \le b_2$. ∎

C. Entropy Gap of Various Types of Graphs

As summarized in Table III, we analyze the entropy gap of various types of graphs, including the complete graph, the complete bipartite graph, the path graph, and the ring graph; the proofs can be found in Appendix B.

V. APPLICATIONS AND ALGORITHMS

As a measure of the structural complexity of a graph, the von Neumann entropy has been applied in a variety of applications. For example, the von Neumann graph entropy is exploited to measure the edge centrality [15] and the vertex centrality [18] in complex networks. As another example, the von Neumann graph entropy can also be used to measure the distance between graphs for graph classification and anomaly detection [9], [14]. In addition, the von Neumann graph entropy is used in the context of graph representation learning [27] to learn low-dimensional feature representations of nodes. We observe that, in these applications, the von Neumann graph entropy is used to address the following primitive tasks:

• Entropy-based network design: Design a new graph by minimally perturbing the existing graph to meet the entropic requirement. For example, Minello et al. [19] use the von Neumann entropy to explore potential network growth models via experiments.

• Graph similarity measure: Compute the similarity score between two graphs, which is represented by a positive real number. For example, Domenico et al. [13] use the von Neumann graph entropy to compute the Jensen-Shannon distance between graphs for the purpose of compressing multilayer networks.

Both of these primitive tasks require exact computation of the von Neumann graph entropy. To reduce the computational complexity and preserve the interpretability, we can use the accurate proxy, structural information, to approximately solve these tasks.

A. Entropy-based network design

Network design aims to minimally perturb the network to fulfill some goal. Considering the goal of maximizing the von Neumann entropy of a graph, it helps to understand how different structural patterns influence the entropy value. The entropy-based network design problem is formulated as follows.

Problem 1 (MaxEntropy): Given an unweighted, undirected graph $G = (V, E)$ of order $n$ and an integer budget $k$, find a set $F$ of non-existing edges of $G$ whose addition to $G$ creates the largest increase of the von Neumann graph entropy, with $|F| \le k$.

Due to the spectral nature of the von Neumann graph entropy, it is not easy to find an effective strategy to perturb the graph, especially since there is an exponential number of combinations for the subset $F$. If we use the structural information as a proxy of the von Neumann entropy, Problem 1 can be reduced to maximizing $H_1(G')$ where $G' = (V, E \cup F)$ such that $|F| \le k$. To further alleviate the computational pressure rooted in the exponential size of the search space for $F$, we adopt the greedy method in which the new edges are added one by one until either the structural information attains its maximum value $\log_2 n$ or $k$ new edges have already been added. We denote the graph with $l$ new edges as $G_l = (V, E_l)$, so $G_0 = G$. Now suppose that we have $G_l$ whose structural information is less than $\log_2 n$; then we want to find a new edge $e_{l+1} = (u, v)$ such that
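Since $H_1$ depends on $G$ only through its degree sequence, each greedy candidate edge can be scored by updating just two degrees. The following sketch makes the greedy method concrete (NumPy is assumed, and the function names are illustrative rather than taken from the paper):

```python
import numpy as np
from itertools import combinations

def structural_information(degrees):
    """H1(G) = -sum_i (d_i/vol) * log2(d_i/vol), over positive degrees."""
    vol = degrees.sum()
    p = degrees[degrees > 0] / vol
    return -(p * np.log2(p)).sum()

def greedy_max_entropy(A, k):
    """Greedily add up to k non-edges, each time picking the edge whose
    addition maximizes the structural information H1 of the new graph."""
    A = A.copy()
    n = len(A)
    added = []
    for _ in range(k):
        d = A.sum(axis=1)
        if np.isclose(structural_information(d), np.log2(n)):
            break                          # H1 already at its maximum log2(n)
        best, best_h = None, -np.inf
        for u, v in combinations(range(n), 2):
            if A[u, v] == 0:               # candidate non-edge
                d2 = d.copy()
                d2[u] += 1; d2[v] += 1     # only two degrees change
                h = structural_information(d2)
                if h > best_h:
                    best_h, best = h, (u, v)
        if best is None:
            break                          # graph is already complete
        u, v = best
        A[u, v] = A[v, u] = 1
        added.append(best)
    return A, added

# Illustrative usage on the star K_{1,4}: greedy additions connect leaves,
# flattening the degree distribution and raising H1 toward log2(n).
A = np.zeros((5, 5))
A[0, 1:] = 1
A[1:, 0] = 1
A2, F = greedy_max_entropy(A, k=2)
assert len(F) <= 2
assert structural_information(A2.sum(axis=1)) >= structural_information(A.sum(axis=1))
```

Scoring a candidate via the degree sequence alone avoids any eigendecomposition, which is precisely the computational advantage of the structural-information proxy over the von Neumann entropy.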