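2.4 Derivatives of Traces

2.4.1 First Order

\begin{align*}
\frac{\partial}{\partial X}\mathrm{Tr}(X) &= I \\
\frac{\partial}{\partial X}\mathrm{Tr}(XB) &= B^T \tag{16} \\
\frac{\partial}{\partial X}\mathrm{Tr}(BXC) &= B^T C^T \\
\frac{\partial}{\partial X}\mathrm{Tr}(BX^T C) &= CB \\
\frac{\partial}{\partial X}\mathrm{Tr}(X^T C) &= C \\
\frac{\partial}{\partial X}\mathrm{Tr}(BX^T) &= B
\end{align*}

2.4.2 Second Order

\begin{align*}
\frac{\partial}{\partial X}\mathrm{Tr}(X^2) &= 2X^T \\
\frac{\partial}{\partial X}\mathrm{Tr}(X^2 B) &= (XB + BX)^T \\
\frac{\partial}{\partial X}\mathrm{Tr}(X^T B X) &= BX + B^T X \\
\frac{\partial}{\partial X}\mathrm{Tr}(X B X^T) &= XB^T + XB \\
\frac{\partial}{\partial X}\mathrm{Tr}(X^T X) &= 2X \\
\frac{\partial}{\partial X}\mathrm{Tr}(B X X^T) &= (B + B^T)X \\
\frac{\partial}{\partial X}\mathrm{Tr}(B^T X^T C X B) &= C^T X B B^T + C X B B^T \\
\frac{\partial}{\partial X}\mathrm{Tr}\left[X^T B X C\right] &= BXC + B^T X C^T \\
\frac{\partial}{\partial X}\mathrm{Tr}(A X B X^T C) &= A^T C^T X B^T + C A X B \\
\frac{\partial}{\partial X}\mathrm{Tr}\left[(AXb + c)(AXb + c)^T\right] &= 2A^T (AXb + c) b^T
\end{align*}

See [7].

Petersen & Pedersen, The Matrix Cookbook (Version: January 5, 2005), Page 11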
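Trace-derivative identities of this kind can be spot-checked against a finite-difference gradient. The sketch below is an illustration only and not part of the original text: it assumes NumPy, uses arbitrary random 4 x 4 test matrices, and the helper name `numerical_gradient` is made up for this example. It compares equation (16) and two of the second-order results with central differences.

```python
import numpy as np

def numerical_gradient(f, X, h=1e-5):
    """Central-difference estimate of the matrix of partials df/dX_ij."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = h  # perturb a single entry of X
            G[i, j] = (f(X + E) - f(X - E)) / (2 * h)
    return G

rng = np.random.default_rng(0)  # fixed seed; the test data are arbitrary
n = 4
A, B, C, X = (rng.standard_normal((n, n)) for _ in range(4))

# Equation (16): d/dX Tr(XB) = B^T
g_first = numerical_gradient(lambda M: np.trace(M @ B), X)
first_ok = np.allclose(g_first, B.T, atol=1e-6)

# d/dX Tr(X^T B X) = B X + B^T X
g_quad = numerical_gradient(lambda M: np.trace(M.T @ B @ M), X)
quad_ok = np.allclose(g_quad, B @ X + B.T @ X, atol=1e-6)

# d/dX Tr(A X B X^T C) = A^T C^T X B^T + C A X B
g_mixed = numerical_gradient(lambda M: np.trace(A @ M @ B @ M.T @ C), X)
mixed_ok = np.allclose(g_mixed, A.T @ C.T @ X @ B.T + C @ A @ X @ B, atol=1e-6)

print(first_ok, quad_ok, mixed_ok)
```

Each flag compares the closed form with the numerical gradient entry by entry. The same helper can be pointed at any other scalar function of X in this section.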
2.4.3 Higher Order

\begin{align*}
\frac{\partial}{\partial X}\mathrm{Tr}(X^k) &= k (X^{k-1})^T \\
\frac{\partial}{\partial X}\mathrm{Tr}(A X^k) &= \sum_{r=0}^{k-1} (X^r A X^{k-r-1})^T \\
\frac{\partial}{\partial X}\mathrm{Tr}\left[B^T X^T C X X^T C X B\right] &= C X X^T C X B B^T + C^T X B B^T X^T C^T X \\
&\quad + C X B B^T X^T C X + C^T X X^T C^T X B B^T
\end{align*}

2.4.4 Other

\[
\frac{\partial}{\partial X}\mathrm{Tr}(A X^{-1} B) = -(X^{-1} B A X^{-1})^T = -X^{-T} A^T B^T X^{-T}
\]

Assume B and C to be symmetric, then

\begin{align*}
\frac{\partial}{\partial X}\mathrm{Tr}\left[(X^T C X)^{-1} A\right] &= -(C X (X^T C X)^{-1})(A + A^T)(X^T C X)^{-1} \\
\frac{\partial}{\partial X}\mathrm{Tr}\left[(X^T C X)^{-1} (X^T B X)\right] &= -2 C X (X^T C X)^{-1} X^T B X (X^T C X)^{-1} + 2 B X (X^T C X)^{-1}
\end{align*}

See [7].

2.5 Derivatives of Structured Matrices

Assume that the matrix A has some structure, e.g. is symmetric, Toeplitz, etc. In that case the derivatives of the previous section do not apply in general. Instead, consider the following general rule for differentiating a scalar function f(A):

\[
\frac{df}{dA_{ij}} = \sum_{kl} \frac{\partial f}{\partial A_{kl}} \frac{\partial A_{kl}}{\partial A_{ij}} = \mathrm{Tr}\left[\left[\frac{\partial f}{\partial A}\right]^T \frac{\partial A}{\partial A_{ij}}\right]
\]

The matrix differentiated with respect to itself is in this document referred to as the structure matrix of A and is defined simply by

\[
\frac{\partial A}{\partial A_{ij}} = S^{ij}
\]

If A has no special structure we have simply S^{ij} = J^{ij}, that is, the structure matrix is simply the single-entry matrix. Many structures have a representation in single-entry matrices; see Sec. 8.2.7 for more examples of structure matrices.
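As a small consistency check on the general rule (a worked sketch added for illustration, not taken from the original), consider the unstructured case. Here the structure matrix is the single-entry matrix J^{ij}, which has a one in position (i, j) and zeros elsewhere, so that (J^{ij})_{kl} = \delta_{ik}\delta_{jl}. Substituting into the rule gives

\[
\frac{df}{dA_{ij}}
= \mathrm{Tr}\left[\left[\frac{\partial f}{\partial A}\right]^T J^{ij}\right]
= \sum_{kl} \frac{\partial f}{\partial A_{kl}} \, \delta_{ik}\delta_{jl}
= \frac{\partial f}{\partial A_{ij}},
\]

so the rule reduces to the ordinary entry-wise derivative, as expected when no entries of A are tied together.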
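2.5.1 Symmetric

If A is symmetric, then S^{ij} = J^{ij} + J^{ji} - J^{ij} J^{ij} and therefore

\[
\frac{df}{dA} = \left[\frac{\partial f}{\partial A}\right] + \left[\frac{\partial f}{\partial A}\right]^T - \mathrm{diag}\left[\frac{\partial f}{\partial A}\right]
\]

That is, e.g., ([5], [16]):

\begin{align*}
\frac{\partial \mathrm{Tr}(AX)}{\partial X} &= A + A^T - (A \circ I), \quad \text{see (20)} \tag{17} \\
\frac{\partial \det(X)}{\partial X} &= \det(X)\left(2X^{-1} - (X^{-1} \circ I)\right) \tag{18} \\
\frac{\partial \ln \det(X)}{\partial X} &= 2X^{-1} - (X^{-1} \circ I) \tag{19}
\end{align*}

2.5.2 Diagonal

If X is diagonal, then ([10]):

\[
\frac{\partial \mathrm{Tr}(AX)}{\partial X} = A \circ I \tag{20}
\]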
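The symmetric results can be checked numerically by treating X_ij and X_ji as one variable and perturbing both together. The sketch below is illustrative rather than part of the original text: it assumes NumPy, the helper name `symmetric_gradient` is invented for this example, and the test matrix is made positive definite only so that ln det(X) is well defined. It covers equations (17) and (19).

```python
import numpy as np

def symmetric_gradient(f, X, h=1e-5):
    """Central-difference df/dX_ij when X_ij and X_ji are the same variable."""
    n = X.shape[0]
    G = np.zeros_like(X)
    for i in range(n):
        for j in range(n):
            E = np.zeros_like(X)
            E[i, j] = h
            E[j, i] = h  # tied entry; for i == j this is the same element
            G[i, j] = (f(X + E) - f(X - E)) / (2 * h)
    return G

rng = np.random.default_rng(1)  # fixed seed; the test data are arbitrary
n = 3
A = rng.standard_normal((n, n))
M = rng.standard_normal((n, n))
X = M @ M.T + n * np.eye(n)  # symmetric positive definite test point

# Equation (17): dTr(AX)/dX = A + A^T - (A o I), with "o" the Hadamard product
g_trace = symmetric_gradient(lambda S: np.trace(A @ S), X)
trace_ok = np.allclose(g_trace, A + A.T - A * np.eye(n), atol=1e-6)

# Equation (19): d ln det(X)/dX = 2 X^-1 - (X^-1 o I)
X_inv = np.linalg.inv(X)
g_logdet = symmetric_gradient(lambda S: np.linalg.slogdet(S)[1], X)
logdet_ok = np.allclose(g_logdet, 2 * X_inv - X_inv * np.eye(n), atol=1e-6)

print(trace_ok, logdet_ok)
```

Perturbing a single entry instead would reproduce the unstructured derivatives of the previous section, which is the difference the structure matrix accounts for.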
3 Inverses

3.1 Exact Relations

3.1.1 The Woodbury identity

\[
(A + C B C^T)^{-1} = A^{-1} - A^{-1} C (B^{-1} + C^T A^{-1} C)^{-1} C^T A^{-1}
\]

If P, R are positive definite, then (see [17])

\[
(P^{-1} + B^T R^{-1} B)^{-1} B^T R^{-1} = P B^T (B P B^T + R)^{-1}
\]

3.1.2 The Kailath Variant

\[
(A + BC)^{-1} = A^{-1} - A^{-1} B (I + C A^{-1} B)^{-1} C A^{-1}
\]

See [4] page 153.

3.1.3 The Searle Set of Identities

The following set of identities can be found in [13], page 151:

\begin{align*}
(I + A^{-1})^{-1} &= A (A + I)^{-1} \\
(A + B B^T)^{-1} B &= A^{-1} B (I + B^T A^{-1} B)^{-1} \\
(A^{-1} + B^{-1})^{-1} &= A (A + B)^{-1} B = B (A + B)^{-1} A \\
A - A (A + B)^{-1} A &= B - B (A + B)^{-1} B \\
A^{-1} + B^{-1} &= A^{-1} (A + B) B^{-1} \\
(I + AB)^{-1} &= I - A (I + BA)^{-1} B \\
(I + AB)^{-1} A &= A (I + BA)^{-1}
\end{align*}

3.2 Implication on Inverses

\[
(A + B)^{-1} = A^{-1} + B^{-1} \quad \Rightarrow \quad A B^{-1} A = B A^{-1} B
\]

See [13].

3.2.1 A PosDef identity

Assume P, R to be positive definite and invertible, then

\[
(P^{-1} + B^T R^{-1} B)^{-1} B^T R^{-1} = P B^T (B P B^T + R)^{-1}
\]

See [?].
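The inverse identities above are easy to spot-check on random matrices. The following sketch is an illustration under stated assumptions, not part of the original text: it assumes NumPy, the helper `random_spd` and the sizes n = 5, k = 2 are arbitrary choices, and the names H, U and V stand in for the matrices called B and C in the positive definite identity and the Kailath variant, to avoid reusing one name with different shapes.

```python
import numpy as np

inv = np.linalg.inv
rng = np.random.default_rng(2)  # fixed seed; the test data are arbitrary

def random_spd(size):
    """Random symmetric positive definite matrix (hypothetical helper)."""
    M = rng.standard_normal((size, size))
    return M @ M.T + size * np.eye(size)

n, k = 5, 2

# Woodbury: (A + C B C^T)^-1 = A^-1 - A^-1 C (B^-1 + C^T A^-1 C)^-1 C^T A^-1
A, B = random_spd(n), random_spd(k)
C = rng.standard_normal((n, k))
A_inv = inv(A)
woodbury_lhs = inv(A + C @ B @ C.T)
woodbury_rhs = A_inv - A_inv @ C @ inv(inv(B) + C.T @ A_inv @ C) @ C.T @ A_inv
woodbury_ok = np.allclose(woodbury_lhs, woodbury_rhs)

# Positive definite identity, with H in the role of B:
# (P^-1 + H^T R^-1 H)^-1 H^T R^-1 = P H^T (H P H^T + R)^-1
P, R = random_spd(n), random_spd(k)
H = rng.standard_normal((k, n))
pd_lhs = inv(inv(P) + H.T @ inv(R) @ H) @ H.T @ inv(R)
pd_rhs = P @ H.T @ inv(H @ P @ H.T + R)
pd_ok = np.allclose(pd_lhs, pd_rhs)

# Kailath variant, with U, V in the roles of B, C. They are scaled down so
# that A + U V is a small perturbation of A and stays safely invertible.
U = 0.1 * rng.standard_normal((n, k))
V = 0.1 * rng.standard_normal((k, n))
kailath_lhs = inv(A + U @ V)
kailath_rhs = A_inv - A_inv @ U @ inv(np.eye(k) + V @ A_inv @ U) @ V @ A_inv
kailath_ok = np.allclose(kailath_lhs, kailath_rhs)

print(woodbury_ok, pd_ok, kailath_ok)
```

In each case the right-hand side only inverts a k x k matrix beyond the inverse of A that is assumed known, which is the usual reason for preferring these forms when k is much smaller than n.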
3.3 Approximations

\[
(I + A)^{-1} = I - A + A^2 - A^3 + \ldots
\]

The series converges when all eigenvalues of A have magnitude below one.

\[
A - A (I + A)^{-1} A \cong I - A^{-1} \quad \text{if } A \text{ large and symmetric}
\]

If \sigma^2 is small then

\[
(Q + \sigma^2 M)^{-1} \cong Q^{-1} - \sigma^2 Q^{-1} M Q^{-1}
\]

3.4 Generalized Inverse

3.4.1 Definition

A generalized inverse matrix of the matrix A is any matrix A^- such that

\[
A A^- A = A
\]

The matrix A^- is not unique.

3.5 Pseudo Inverse

3.5.1 Definition

The pseudo inverse (or Moore-Penrose inverse) of a matrix A is the matrix A^+ that fulfils

I. A A^+ A = A
II. A^+ A A^+ = A^+
III. A A^+ symmetric
IV. A^+ A symmetric

The matrix A^+ is unique and always exists.

3.5.2 Properties

Assume A^+ to be the pseudo-inverse of A, then (see [3])

\begin{align*}
(A^+)^+ &= A \\
(A^T)^+ &= (A^+)^T \\
(cA)^+ &= (1/c) A^+ \quad (c \neq 0) \\
(A^T A)^+ &= A^+ (A^T)^+ \\
(A A^T)^+ &= (A^T)^+ A^+
\end{align*}

Assume A to have full rank, then

\begin{align*}
(A A^+)(A A^+) &= A A^+ \\
(A^+ A)(A^+ A) &= A^+ A \\
\mathrm{Tr}(A A^+) &= \mathrm{rank}(A A^+) \quad \text{(see [14])} \\
\mathrm{Tr}(A^+ A) &= \mathrm{rank}(A^+ A) \quad \text{(see [14])}
\end{align*}
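The defining conditions I to IV and the trace property can be checked on a concrete matrix. The sketch below is illustrative and not part of the original text: it assumes NumPy and its `np.linalg.pinv` routine as one way of computing A^+, and the 5 x 3 random test matrix (full column rank with probability one) is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(3)  # fixed seed; the test data are arbitrary
A = rng.standard_normal((5, 3))  # tall matrix, full column rank almost surely
A_pinv = np.linalg.pinv(A)       # Moore-Penrose pseudo-inverse, shape (3, 5)

# The four defining conditions of section 3.5.1
cond_1 = np.allclose(A @ A_pinv @ A, A)
cond_2 = np.allclose(A_pinv @ A @ A_pinv, A_pinv)
cond_3 = np.allclose(A @ A_pinv, (A @ A_pinv).T)
cond_4 = np.allclose(A_pinv @ A, (A_pinv @ A).T)

# Projector property and Tr(A A^+) = rank(A A^+) for a full-rank A.
# An explicit tolerance keeps rounding noise from inflating the rank.
proj = A @ A_pinv
idempotent = np.allclose(proj @ proj, proj)
rank_proj = np.linalg.matrix_rank(proj, tol=1e-10)
trace_matches_rank = np.isclose(np.trace(proj), rank_proj)

print(cond_1, cond_2, cond_3, cond_4, idempotent, trace_matches_rank)
```

Since this A has full column rank, A^+ A is the 3 x 3 identity here, while A A^+ is a rank-3 projector onto the column space of A.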