Gradient and Derivatives (First Order)

For a scalar function $f:\mathbb{R}\to\mathbb{R}$, the gradient and the derivative coincide. For a function of a vector variable $f:\mathcal{C}\subseteq\mathbb{R}^d\to\mathbb{R}$, the derivative is the transpose of the gradient. We focus on the "gradient" language (i.e., column vectors).

Definition 2 (Gradient). Let $f:\mathcal{C}\subseteq\mathbb{R}^d\to\mathbb{R}$ be a differentiable function, and let $x=[x_1,\dots,x_d]^\top$. Then the gradient of $f$ at $x$ is a vector in $\mathbb{R}^d$, denoted by $\nabla f(x)$ and defined by
\[
\nabla f(x) = \begin{bmatrix} \frac{\partial f(x)}{\partial x_1} \\ \vdots \\ \frac{\partial f(x)}{\partial x_d} \end{bmatrix}.
\]
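To make Definition 2 concrete, here is a minimal NumPy sketch that approximates each partial derivative $\partial f(x)/\partial x_i$ by a central finite difference; the helper name numerical_gradient and the step size eps are my own choices, not part of the lecture.

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-6):
    """Approximate the gradient of f at x (column-vector convention)
    by a central finite difference on each coordinate."""
    grad = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        e = np.zeros_like(x, dtype=float)
        e[i] = eps
        grad[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return grad
```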
Example

Example 1. The gradient of $f(x) = \|x\|_2^2 = \sum_{i=1}^d x_i^2$ is
\[
\nabla f(x) = \begin{bmatrix} 2x_1 \\ \vdots \\ 2x_d \end{bmatrix} = 2x.
\]

Example 2. The gradient of $f(x) = -\sum_{i=1}^d x_i \ln x_i$ is
\[
\nabla f(x) = \begin{bmatrix} -(\ln x_1 + 1) \\ \vdots \\ -(\ln x_d + 1) \end{bmatrix}.
\]
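Both example gradients can be checked numerically; a small sketch using SciPy's approx_fprime (forward finite differences), with an arbitrary test point of my choosing:

```python
import numpy as np
from scipy.optimize import approx_fprime

x = np.array([1.0, 2.0, 3.0])

# Example 1: f(x) = ||x||_2^2; analytic gradient is 2x.
f1 = lambda x: np.sum(x ** 2)
print(approx_fprime(x, f1, 1e-8))  # approximately [2. 4. 6.]
print(2 * x)

# Example 2: f(x) = -sum_i x_i ln x_i; analytic gradient is -(ln x_i + 1).
f2 = lambda x: -np.sum(x * np.log(x))
print(approx_fprime(x, f2, 1e-8))
print(-(np.log(x) + 1.0))
```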
Hessian (Second Order)

Definition 3 (Hessian). Let $f:\mathcal{C}\subseteq\mathbb{R}^d\to\mathbb{R}$ be a twice differentiable function, and let $x=[x_1,\dots,x_d]^\top$. Then the Hessian of $f$ at $x$ is the matrix in $\mathbb{R}^{d\times d}$, denoted by $\nabla^2 f(x)$ and defined by
\[
[\nabla^2 f(x)]_{ij} = \frac{\partial^2 f(x)}{\partial x_i \partial x_j}, \quad i,j = 1,\dots,d.
\]

Example 3. The Hessian of $f(x) = -\sum_{i=1}^d x_i \ln x_i$ is $\nabla^2 f(x) = \mathrm{diag}\big(-\frac{1}{x_1},\dots,-\frac{1}{x_d}\big)$.

Example 4. The Hessian of $f(x) = x_1^3 x_2^2 - 3 x_1 x_2^3 + 1$ is
\[
\nabla^2 f(x) = \begin{bmatrix} 6x_1 x_2^2 & 6x_1^2 x_2 - 9x_2^2 \\ 6x_1^2 x_2 - 9x_2^2 & 2x_1^3 - 18 x_1 x_2 \end{bmatrix}.
\]
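As with the gradient examples, the Hessian in Example 4 can be verified numerically; a sketch using a standard second-order central-difference stencil (the helper numerical_hessian and the test point are my own, not from the lecture):

```python
import numpy as np

def numerical_hessian(f, x, eps=1e-4):
    """Approximate [Hess f(x)]_{ij} = d^2 f / (dx_i dx_j)
    by a central-difference stencil."""
    d = x.size
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            ei = np.zeros(d); ei[i] = eps
            ej = np.zeros(d); ej[j] = eps
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * eps ** 2)
    return H

# Example 4: f(x) = x1^3 x2^2 - 3 x1 x2^3 + 1
f = lambda x: x[0] ** 3 * x[1] ** 2 - 3 * x[0] * x[1] ** 3 + 1
x = np.array([1.0, 2.0])
print(numerical_hessian(f, x))

# Analytic Hessian from the slide, evaluated at the same point:
x1, x2 = x
print(np.array([[6 * x1 * x2 ** 2,           6 * x1 ** 2 * x2 - 9 * x2 ** 2],
                [6 * x1 ** 2 * x2 - 9 * x2 ** 2, 2 * x1 ** 3 - 18 * x1 * x2]]))
```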
Chain Rule

Consider scalar functions for simplicity.

Chain Rule. For $h(x) = f(g(x))$,
- the gradient of $h(x)$ is $h'(x) = f'(g(x))\, g'(x)$;
- the Hessian of $h(x)$ is $h''(x) = f''(g(x))\, (g'(x))^2 + f'(g(x))\, g''(x)$.
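A quick numerical check of both chain-rule formulas; the choices $f(u) = u^2$ and $g(x) = \sin x$ are illustrative, not from the lecture:

```python
import math

f   = lambda u: u ** 2
fp  = lambda u: 2 * u          # f'
fpp = lambda u: 2.0            # f''
g   = lambda x: math.sin(x)
gp  = lambda x: math.cos(x)    # g'
gpp = lambda x: -math.sin(x)   # g''

h = lambda x: f(g(x))          # h(x) = sin^2(x)
x, eps = 0.7, 1e-5

# First derivative: h'(x) = f'(g(x)) g'(x)
print(fp(g(x)) * gp(x))
print((h(x + eps) - h(x - eps)) / (2 * eps))        # central difference

# Second derivative: h''(x) = f''(g(x)) g'(x)^2 + f'(g(x)) g''(x)
print(fpp(g(x)) * gp(x) ** 2 + fp(g(x)) * gpp(x))
print((h(x + eps) - 2 * h(x) + h(x - eps)) / eps ** 2)
```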
Reference: The Matrix Cookbook

The derivatives of vectors, matrices, norms, determinants, etc. can be found therein.

[Figure: excerpted pages from The Matrix Cookbook, Section 2 "Derivatives" — general rules for differentials (e.g., $\partial(X+Y) = \partial X + \partial Y$, $\partial(XY) = (\partial X)Y + X(\partial Y)$, $\partial(X^{-1}) = -X^{-1}(\partial X)X^{-1}$) and first-order derivatives with respect to a matrix (e.g., $\frac{\partial\, a^\top X b}{\partial X} = ab^\top$).]

https://www.math.uwaterloo.ca/~hwolkowi/matrixcookbook.pdf
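One of the Cookbook's first-order identities visible in the excerpt, $\frac{\partial\, a^\top X b}{\partial X} = ab^\top$ (equation (70) there), can be checked entrywise with finite differences; the random test data below is my own:

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = rng.standard_normal(3), rng.standard_normal(4)
X = rng.standard_normal((3, 4))
f = lambda X: a @ X @ b        # scalar-valued function of the matrix X

# Entrywise central differences with respect to X.
eps = 1e-6
G = np.zeros_like(X)
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        E = np.zeros_like(X); E[i, j] = eps
        G[i, j] = (f(X + E) - f(X - E)) / (2 * eps)

print(np.allclose(G, np.outer(a, b)))  # True: matches a b^T
```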