第11卷第1期 智能系统学报 Vol.11 No.1 2016年2月 CAAI Transactions on Intelligent Systems Feh.2016 D0I:10.11992/is.201507073 网络出版地址:htp:/www.cmki.net/kcms/detail/23.1538.tp.201509030.1456.002.html Efficient tracker based on sparse coding with Euclidean local structure-based constraint WANG Hongyuan',ZHANG Ji',CHEN Fuhua2 (1.School of Information Science and Engineering,Changzhou University,Changzhou,Jiangsu,China 213164;2.Department of Nat- ural Science and Mathematics,West Liberty University,West Virginia,United States 26074) Abstract:Sparse coding (SC)based visual tracking(I-tracker)is gaining increasing attention,and many related algorithms are developed.In these algorithms,each candidate region is sparsely represented as a set of target tem- plates.However,the structure connecting these candidate regions is usually ignored.Lu proposed an NLSSC-tracker with non-local self-similarity sparse coding to address this issue,which has a high computational cost.In this study, we propose an Euclidean local-structure constraint based sparse coding tracker with a smoothed Euclidean local structure.With this tracker,the optimization procedure is transformed to a small-scale I,-optimization problem,sig- nificantly reducing the computational cost.Extensive experimental results on visual tracking demonstrate the effectiveness and efficiency of the proposed algorithm. Keywords:euclidean local-structure constraint;I,-tracker;sparse coding;target tracking CLC Number:TP18;TP301.6 Document Code:A Article ID:1673-4785(2016)01-0136-12 Citation:WANG Hongyuan,ZHANG Ji,CHEN Fuhua.Efficient tracker based on sparse coding with Euclidean local struc- ture-based constraint[J].CAAI Transactions on Intelligent Systems,2016,11(1):136-147. Recently,visual target tracking was widely used Based on sparse coding (SC;also referred to as in security surveillance,navigation,human-computer sparse sensing or compressive sensing)(7,Mei pro- interaction,and other applications(2.In a video se- posed an I-tracker for generative trackings,ad- quence,targets for tracking often change dynamically dressing occlusion,corruption,and some other chal- and uncertainly because of disturbance phenomena lenging issues.However,this tracker incurs a very such as occlusion,noisy and varying illumination,and high computational cost to achieve efficient tracking object appearance.Many tracking algorithms were pro- (see section 2.1 and Fig.1 for details),and the local posed in the last twenty years that can be divided into structures of similar regions are ignored,which may two categories:generative tracking and discriminant cause the instability and even failure of the I-tracker. tracking algorithms Generative algorithms (e.g., Indeed,the sparse coefficients,for representing six eigen tracker,mean-shift tracker,incremental tracker, similar regions (CR-CR)under ten template regions covariance tracker[2])adopt appearance models to ex- (T-To)with original l,-tracker,are diversified (Fig. press the target observations,whereas discriminant al- 3).Considering CR,and CR,for example,we can gorithms (e.g.,TLD3],ensemble tracking,and see that although the latter is almost the partial occlu- MILTrack[s])view tracking as a classification prob- sion version of the former,their sparse representations lem,thus attempting to distinguish the target from the are very different.Tracking CR(the woman's face) backgrounds.Here,we present a new generative algo- may fail,because the tracker is likely to incorrectly rithm. consider the region Ts(the book)as its target. Received Date:2015-07-31.Online Pulication:2015-09-30. Contrary to expectations,Xu proved that a sparse Foundation Item:National Natural Foundation of China under Grant (61572085,61502058). algorithm cannot be stable and that similar signals may Corresponding Author:Hongyuan Wang.E-mail:hywang@cczu.edu.cn. not exhibit similar sparse coefficients Thus,a
第 11 卷第 1 期 智 能 系 统 学 报 Vol.11 №.1 2016 年 2 月 CAAI Transactions on Intelligent Systems Feb. 2016 DOI:10.11992 / tis.201507073 网络出版地址:http: / / www.cnki.net / kcms/ detail / 23.1538.tp.201509030.1456.002.html Efficient tracker based on sparse coding with Euclidean local structure⁃based constraint WANG Hongyuan 1 , ZHANG Ji 1 , CHEN Fuhua 2 (1. School of Information Science and Engineering, Changzhou University, Changzhou, Jiangsu, China 213164; 2. Department of Nat⁃ ural Science and Mathematics, West Liberty University, West Virginia, United States 26074) Abstract:Sparse coding (SC) based visual tracking (l 1 ⁃tracker) is gaining increasing attention, and many related algorithms are developed. In these algorithms, each candidate region is sparsely represented as a set of target tem⁃ plates. However, the structure connecting these candidate regions is usually ignored. Lu proposed an NLSSC⁃tracker with non⁃local self⁃similarity sparse coding to address this issue, which has a high computational cost. In this study, we propose an Euclidean local⁃structure constraint based sparse coding tracker with a smoothed Euclidean local structure. With this tracker, the optimization procedure is transformed to a small⁃scale l 1 ⁃optimization problem, sig⁃ nificantly reducing the computational cost. Extensive experimental results on visual tracking demonstrate the effectiveness and efficiency of the proposed algorithm. Keywords:euclidean local⁃structure constraint; l 1 ⁃tracker; sparse coding; target tracking CLC Number:TP18; TP301.6 Document Code:A Article ID:1673⁃4785(2016)01⁃0136⁃12 Citation:WANG Hongyuan, ZHANG Ji, CHEN Fuhua. Efficient tracker based on sparse coding with Euclidean local struc⁃ ture⁃based constraint[J]. CAAI Transactions on Intelligent Systems, 2016, 11(1): 136⁃147. Received Date:2015⁃07⁃31. Online Pulication:2015⁃09⁃30. Foundation Item: National Natural Foundation of China under Grant (61572085,61502058). Corresponding Author:Hongyuan Wang. E⁃mail: hywang@ cczu.edu.cn. Recently, visual target tracking was widely used in security surveillance, navigation, human⁃computer interaction, and other applications [1⁃2] . In a video se⁃ quence, targets for tracking often change dynamically and uncertainly because of disturbance phenomena such as occlusion, noisy and varying illumination, and object appearance. Many tracking algorithms were pro⁃ posed in the last twenty years that can be divided into two categories: generative tracking and discriminant tracking algorithms [1⁃2] . Generative algorithms ( e. g., eigen tracker, mean⁃shift tracker, incremental tracker, covariance tracker [2] ) adopt appearance models to ex⁃ press the target observations, whereas discriminant al⁃ gorithms ( e. g., TLD [3] , ensemble tracking [4] , and MILTrack [5] ) view tracking as a classification prob⁃ lem, thus attempting to distinguish the target from the backgrounds. Here, we present a new generative algo⁃ rithm. Based on sparse coding ( SC; also referred to as sparse sensing or compressive sensing) [6⁃7] , Mei pro⁃ posed an l 1 ⁃tracker for generative tracking [8⁃9] , ad⁃ dressing occlusion, corruption, and some other chal⁃ lenging issues. However, this tracker incurs a very high computational cost to achieve efficient tracking (see section 2.1 and Fig.1 for details), and the local structures of similar regions are ignored, which may cause the instability and even failure of the l 1 ⁃tracker. Indeed, the sparse coefficients, for representing six similar regions (CR1 -CR6 ) under ten template regions ( T1 -T10 ) with original l 1 ⁃tracker, are diversified (Fig. 3). Considering CR1 and CR4 , for example, we can see that although the latter is almost the partial occlu⁃ sion version of the former, their sparse representations are very different. Tracking CR4 ( the womans face) may fail, because the tracker is likely to incorrectly consider the region T8(the book) as its target. Contrary to expectations, Xu proved that a sparse algorithm cannot be stable and that similar signals may not exhibit similar sparse coefficients [10] . Thus, a
1 WANG Hongyuan,et al:Efficient tracker based on sparse coding with Euclidean local structure-based constraint ·137. trade-off occurs between sparsity and stability when de- 1 Related works signing a learning algorithm.In addition,instability in the I,-optimization problem affects the performance of 1.1 Sparse coding and the I,-tracker the 1-tracker. Sparse coding is an attractive signal reconstruction Lu developed a NLSSS-tracker NLSSST)based method proposed by Candes that reconstructs a sig- on SC applying a non-local self-similarity constraint by nal y R"x with an over-complete dictionary D introducing the geometrical information of the set of R)with a sparse coefficient vectoreR.The candidates as a smoothing term to alleviate the instabil- SC formulation can be written as the lo-norm-constrain- ity of the -tracker However,its low efficiency (e- ed optimization problem as follows: ven slower than the original l-tracker,Table 4)re- min,lly-Dcli+all cllo (1) which is NP-hard,where‖·‖edenotes the vector's stricts its applicability in real-time tracking.In this Frobenius norm(i.e.,l2-nom),and‖·‖locounts study,motivated by the robustness of the I-tracker and the number of non-zero elements of the vector.Candes stability of NISSST,we propose a novel tracker, proved that the I-norm Il is the tightest upper called ELSS-tracker (ELSST),that is both robust and bound of the lo-nom‖·Io,and thus,Eq.(1)can efficient.The main contributions of this study are as be rewritten as the following I-optimization prob- follows: lem[67]: 1)An efficient tracker,i.e.,ELSST,is developed min,lly-Dcl+a‖lcli (2) by considering the local structure of the set of target Based on SC,Mei presented a nice I-tracker for candidates.In contrast to the Lu5s and Mei5s-track- robust trackingFig.1).Considering that the target e our tracker is more stable and sparse. is located in the latest frame,the I-tracker is initial- ized in the new arrival frame and N candidate regions 2)The proposed tracker shows excellent perform- are generated with Bayesian inference (Fig.la,b). ance in tracking different video sequences with regard With n templates learned from previous tracking and to scale,occlusion,pose variations,background clut- 2m trivial templates (m positive ones and m negative ter,and illumination changes. ones,where m is the dimension of 1D stretched image, The rest of this study is organized as follows:I- Fig.Ic),Eq.(2)can be solved Fig.1d,e,f).With and NLSSS-tracker are introduced in section 2:in sec- positive and negative trivial templates,Mei added a tion 3,we analyze the disadvantages of these two track- non-negative constraint c0 in Eq.(2),with which ers and propose our tracker;experimental results with the reconstruction errors of all candidate regions with our tracker and four comparison algorithms are reported SC coefficients can be used to determine the weights for each candidate,and the object in the new arrival frame in section 4;the conclusion and future work are sum- can be located with the sum of the weighted candidates. marized in section 5. The dictionaries updating strategies can be seen in target template T Trival Template/ (a)New arrival frame (b)N candidate (c)Over-complete regions dictionary Cr∈RW1 C∈RW C∈RmN cwith sparity constraint minllcll (d)Sample (e)Dictionaries (f)Coefficient of represention Fig.1 Original I-tracker algorithm
trade⁃off occurs between sparsity and stability when de⁃ signing a learning algorithm. In addition, instability in the l 1 ⁃optimization problem affects the performance of the l 1 ⁃tracker. Lu developed a NLSSS⁃tracker (NLSSST) based on SC applying a non⁃local self⁃similarity constraint by introducing the geometrical information of the set of candidates as a smoothing term to alleviate the instabil⁃ ity of the l 1 ⁃tracker [11] . However, its low efficiency (e⁃ ven slower than the original l 1 ⁃tracker, Table 4) re⁃ stricts its applicability in real⁃time tracking. In this study, motivated by the robustness of the l 1 ⁃tracker and stability of NLSSST, we propose a novel tracker, called ELSS⁃tracker (ELSST), that is both robust and efficient. The main contributions of this study are as follows: 1)An efficient tracker, i.e., ELSST, is developed by considering the local structure of the set of target candidates. In contrast to the Lu5s [11] and Mei5s⁃track⁃ er [8⁃9] , our tracker is more stable and sparse. 2) The proposed tracker shows excellent perform⁃ ance in tracking different video sequences with regard to scale, occlusion, pose variations, background clut⁃ ter, and illumination changes. The rest of this study is organized as follows: l 1 ⁃ and NLSSS⁃tracker are introduced in section 2; in sec⁃ tion 3, we analyze the disadvantages of these two track⁃ ers and propose our tracker; experimental results with our tracker and four comparison algorithms are reported in section 4; the conclusion and future work are sum⁃ marized in section 5. 1 Related works 1.1 Sparse coding and the l 1 ⁃tracker Sparse coding is an attractive signal reconstruction method proposed by Candes [6⁃7] that reconstructs a sig⁃ nal y ∈ R m×1 with an over⁃complete dictionary D ∈ R m×(n+2m) with a sparse coefficient vectorc ∈ R n×1 . The SC formulation can be written as the l 0 ⁃norm⁃constrain⁃ ed optimization problem as follows: minc‖y - Dc‖2 F + α‖c‖0 (1) which is NP⁃hard, where ‖·‖F denotes the vector’s Frobenius norm ( i. e., l 2 ⁃norm), and ‖·‖0 counts the number of non⁃zero elements of the vector. Candes proved that the l 1 ⁃norm ‖·‖1 is the tightest upper bound of the l 0 ⁃norm ‖·‖0 , and thus, Eq.(1) can be rewritten as the following l 1 ⁃optimization prob⁃ lem [6⁃7] : minc ‖y - Dc‖2 F + α ‖c‖1 (2) Based on SC, Mei presented a nice l 1 ⁃tracker for robust tracking [8⁃9] (Fig. 1). Considering that the target is located in the latest frame, the l 1 ⁃tracker is initial⁃ ized in the new arrival frame and N candidate regions are generated with Bayesian inference ( Fig. 1a, b). With n templates learned from previous tracking and 2m trivial templates (m positive ones and m negative ones, where m is the dimension of 1D stretched image, Fig. 1c), Eq.(2) can be solved (Fig. 1d,e,f). With positive and negative trivial templates, Mei added a non⁃negative constraint c≥0 in Eq. (2), with which the reconstruction errors of all candidate regions with SC coefficients can be used to determine the weights for each candidate, and the object in the new arrival frame can be located with the sum of the weighted candidates. The dictionaries updating strategies can be seen in [8⁃9] . Fig.1 Original l 1 ⁃tracker algorithm 第 1 期 WANG Hongyuan, et al: Efficient tracker based on sparse coding with Euclidean local structure⁃based constraint ·137·
·138. 智能系统学报 第11卷 1.2 Non-local self-similarity based sparse coding rameter enforcing similarity,and s,is the normalization for tracking (NLSSST) factor.The weight measures the similarity between Recently,Xu indicated the trade-off between the K-neighborhood of y;and y:.Lu's algorithm actual- sparsity and stability in sparse regularized algo- ly attempts to solve the following: rithms[].Moreover,Yang pointed out the same A-op- timization issue in pattern classification Based on minc I Y-DC+a cll+B II C-CWIl the fact that lots of similar regions exist in all N candi- (4) dates generated by Bayesian inference,Lu proposed Taking the solution of the I-tracker from Eq.(2) his tracker with the non-local self-similarity constraint as the initial coefficients co,Eq.(4)can be solved as 含s-含6P=1c-cw1 through iterative computations.However,the high (3) computational cost of the original I,-tracker and itera- i=1 where c;and c,are the sparse coefficients corresponding tive procedure for maintaining the neighborhood con- to the candidate regions y;and y,respectively,and w straints of sparse coefficients make NLSSST inefficient is the weight assigned to c.Given N m-dimensional in achieving real-timing tracking.In contrast to Fig.1, candidates Y=[y.yx]E RTxN,the first K-closest the schematic diagram of NLSSST presented in Fig.2, candidate point aroundy,is denoted by N(y),and includes an additional neighborhood constraint between 1_Iwy-W2☑ the weight w=一e where h is a pa- y;and N(y:). target template T trival template t 图图 (a)N Candidate regions (b)Over-complete dictionary Cr∈RxN C+∈RmxN C∈RmxN with sparity yNx()y\{y,N(y》 constraint min c (c)Sample (d)Dictionaries (e)Coefficient of representation Fig.2 Lu's NLSSST Algorithm 2 Euclidean local structure-based sparse known that the 12-norm is much more commonly used for measuring the distance between two vectors and is coding for tracking (ELSST) much easier to optimize than the I-norm.Thus,we To circumvent the heavy computation burden of take the former to measure the relationships between the 1-tracker and NISSST (Table 4),we propose an the sparse coefficient vectors,which are close to each efficient tracker,called ELSST,that considers the lo- other,i.e.,the Euclidean local-structure constraint, cal Euclidean structures of the candidates. and the latter I-norm of C to maintain the sparsity of 2.1 Original euclidean local structure constraint the optimization as follows: sparse coding Original ELSSC) It is evident from Eq.(4)that NLSSST attempts minc I Y-DC+a C+B I C-CWll to solve a double I-norm problem.However,it is well (5)
1.2 Non⁃local self⁃similarity based sparse coding for tracking (NLSSST) Recently, Xu indicated the trade⁃off between sparsity and stability in sparse regularized algo⁃ rithms [10] . Moreover, Yang pointed out the same A⁃op⁃ timization issue in pattern classification [12] . Based on the fact that lots of similar regions exist in all N candi⁃ dates generated by Bayesian inference, Lu proposed his tracker with the non⁃local self⁃similarity constraint as ∑ n i = 1 ci - ∑ K j = 1 wji cj 2 = ‖C - CW‖2 F (3) where ci and cj are the sparse coefficients corresponding to the candidate regions yi and yj, respectively, and wji is the weight assigned to cj . Given N m⁃dimensional candidates Y = y1 … yN [ ] ∈ R m×N , the first K⁃closest candidate point aroundyj is denoted by NK ( yj ), and the weight wji = 1 sj e - ‖NK (y j ) -NK (y i )‖2 h , where h is a pa⁃ rameter enforcing similarity, and sj is the normalization factor. The weight wji measures the similarity between the K⁃neighborhood of yj and yi . Lu’s algorithm actual⁃ ly attempts to solve the following: minC ‖Y - DC‖2 F + α ‖C‖1 + β ‖C - CW‖1 (4) Taking the solution of the l 1 ⁃tracker from Eq.(2) as the initial coefficients c0 , Eq. ( 4) can be solved through iterative computations [11] . However, the high computational cost of the original l 1 ⁃tracker and itera⁃ tive procedure for maintaining the neighborhood con⁃ straints of sparse coefficients make NLSSST inefficient in achieving real⁃timing tracking. In contrast to Fig. 1, the schematic diagram of NLSSST presented in Fig. 2, includes an additional neighborhood constraint between yi and NK(yi). Fig. 2 Lus NLSSST Algorithm 2 Euclidean local structure⁃based sparse coding for tracking (ELSST) To circumvent the heavy computation burden of the l 1 ⁃tracker and NLSSST (Table 4), we propose an efficient tracker, called ELSST, that considers the lo⁃ cal Euclidean structures of the candidates. 2.1 Original euclidean local structure constraint sparse coding (Original ELSSC) It is evident from Eq. (4) that NLSSST attempts to solve a double l 1 ⁃norm problem. However, it is well known that the l 2 ⁃norm is much more commonly used for measuring the distance between two vectors and is much easier to optimize than the l 1 ⁃norm. Thus, we take the former to measure the relationships between the sparse coefficient vectors, which are close to each other, i. e., the Euclidean local⁃structure constraint, and the latter l 1 ⁃norm of C to maintain the sparsity of the optimization as follows: minC ‖Y - DC‖2 F + α ‖C‖1 + β ‖C - CW‖2 F (5) ·138· 智 能 系 统 学 报 第 11 卷
1 WANG Hongyuan,et al:Efficient tracker based on sparse coding with Euclidean local structure-based constraint ·139. Table 1 Optimization for ELS constraint based SC(ELSSC) Input:Given N data points Y=[yY]ER,over-complete dictionary DR) Output:Sparse matrix C=[ce]R(x Parameters:Maxiumn iteration number J=10,neighborhood size K=5,all-zero vector co,a=0.01,B=0.5,y=0.001 1:For each point y,compute the nearest K neighborhoods N(y)and weights 2:Compute the SVD-decomposition of D UJVT,where VER()(2m) 3:compute ID'D‖,and set‖D'D|+2 B randomly 4:For i=1:N 5:For t=1:J 6:If lec)<,break inner iteration 7:0omie0=,6-,0=[Dy+290,-w+(y-28)c-DD,e-"],md0=VeR 8:Repesent”thparse colficient vector”,ie,optimize子l”-lke”tallo"l, 9:End 10:End Equation (5)is the objective function of our Eu- clidean local structure constraint-based SC and can be +7291a-ol-宁Ie-nl店 solved through iterative computation.In particular,at 1 lI3-6:D9〉+aIe0,-2pc9,-)+ the t-th iteration,for a single candidate y:in Y,Eq. (5)can be written as follows: 子"1-g-9e)+Y,81o+ 2 mineof(c()=min Iy:-De+ a ll e :+B lc)-0)(6) 〈De”,D〉-2ID,l3+BI-wI经= where.At the t-th iteration for the optimization of c,c,j is fixed.Therefore,we can 子1e"3-De)-29(c0,g-)- regard 1)as a constant.To solve Eq.(6),we intro- (y-2B)c.co+(Def Dco)+a llel+ duce the following surrogate function as presented in [11]: (兮%店+7291l+811》- 6o)=分1e-6l-71n0-l目 子Ic”-0+aI0,+R (7) (8) where is convex.According to Daubechies,when where AI-D'D is a strictly positive definite matrix,c,co) 1 =[D'y:+280+(y-28)c-D De-] 2 is strictly convex for any co with respect to c.Hence,in and our experiments,the constant A is set accordingly (A y-28;Table 1).Once the over-complete dictionary D is R=lxl+Y291o1-1l+ fixed,we can derive the following convex objective func- tion from Eq.(7): BIwI-子I” )lyDeare fixed at the t-th iteration.Thus,we can simplify
Table 1 Optimization for ELS constraint based SC(ELSSC) Input:Given N data points Y= [ y1… yN ]∈R m×N ,over⁃complete dictionary D∈R m×(n+2m) Output:Sparse matrix C= [c1… cN ]∈R (n+2m)×N Parameters: Maxiumn iteration number J = 10,neighborhood size K= 5,all⁃zero vector c0 ,α= 0.01,β = 0.5,γ = 0.001 1:For each point yi,compute the nearest K neighborhoods NK(yi) and weights wji 2:Compute the SVD⁃decomposition of D = UΣV T ,where V∈R (n+2m)×(n+2m) 3:compute ‖D TD‖, and set‖D TD‖+2β randomly 4:For i = 1:N 5: For t = 1:J 6: If ‖c (t) i - c (t-1) i ‖2 < τ , break inner iteration 7:Compute θ (t) i = ∑j wji c (t-1) j ,v (t) i = 1 γ [D T yi + 2βθi (t-1) + (γ - 2β)ci t-1 - D TDi c t-1 ] ,and x (t) i =Vv (t) i ∈R (n+2m)×1 8: Represent x (t) i with sparse coefficient vector c (t) i ,i.e.,optimize γ 2 ‖x (t) i -Vc (t) i ‖2 2 +α‖c (t) i ‖1 9: End 10:End Equation (5) is the objective function of our Eu⁃ clidean local structure constraint⁃based SC and can be solved through iterative computation. In particular, at the t⁃th iteration, for a single candidate yi in Y, Eq. (5) can be written as follows: minci (t) f(ci (t) ) = minci (t) ‖yi - Dci (t)‖2 2 + α ‖c (t) i ‖1 + β ‖c (t) i - θ (t-1) i ‖2 2 (6) where θ (t) i = ∑ jwji c (t-1) j . At the t⁃th iteration for the optimization of ci,cj,i ≠ j is fixed. Therefore, we can regard θ (t-1) i as a constant. To solve Eq. (6), we intro⁃ duce the following surrogate function as presented in [11]: ψ(ci,c0) = λ 2 ‖c (t) i - c0‖2 2 - 1 2 ‖Dc (t) i - Dc0‖2 2 (7) where λ is convex. According to Daubechies [13] , when λI - D TD is a strictly positive definite matrix, ψ(ci,c0) is strictly convex for any c0 with respect to ci . Hence, in our experiments, the constant λ is set accordingly ( λ = γ - 2β; Table 1). Once the over⁃complete dictionary D is fixed, we can derive the following convex objective func⁃ tion from Eq. (7): f(c (t) i ) = 1 2 ‖yi - Dc (t) i ‖2 2 + α ‖c (t) i ‖1 + β ‖c (t) i - θ (t-1) i ‖2 2 + γ - 2β 2 ‖c (t) i - c0‖2 2 - 1 2 ‖Dc (t) i - Dc0‖2 2 = 1 2 ‖yi‖2 2 - 〈yi,Dc (t) i 〉 + α ‖c (t) i ‖1 - 2β〈c (t) i ,θ (t-1) i 〉 + γ 2 ‖c (t) i ‖2 2 - (γ - 2β)〈c (t) i ,c0〉 + γ - 2β 2 ‖c0‖2 2 + 〈Dc (t) i ,Dc0 〉 - 1 2 ‖Dc0‖2 2 + β ‖θ (t-1) i ‖2 2 = γ 2 ‖c (t) i ‖2 2 - 〈yi,Dc (t) i 〉 - 2β〈c (t) i ,θ (t-1) i 〉 - (γ - 2β)〈c (t) i ,c0〉 + 〈Dc (t) i ,Dc0〉 + α ‖c (t) i ‖1 + ( 1 2 ‖yi‖2 2 + γ - 2β 2 ‖c0‖2 2 + β ‖θ (t-1) i ‖2 2 ) = γ 2 ‖c (t) i - v (t) i ‖2 2 + α ‖c (t) i ‖1 + R (8) where v (t) i = 1 γ D T yi + 2βθ (t-1) i + (γ - 2β)c t-1 i - D TDc t-1 [ ] and R = 1 2 ‖yi‖2 2 + γ - 2β 2 ‖c0‖ - 1 2 ‖Dc0‖2 2 + β ‖θ (t-1) i ‖2 2 - γ 2 ‖v (t) i ‖2 2 are fixed at the t⁃th iteration. Thus, we can simplify 第 1 期 WANG Hongyuan, et al: Efficient tracker based on sparse coding with Euclidean local structure⁃based constraint ·139·
·140· 智能系统学报 第11卷 Eg.(8)as follows: e)=子Ixw-reoI+alel )=+a (9) (12) To solve Eq.(9)using SVD,we decompose the over- where V"=[V'I'-I"]E R"x3,I'denotes the n-or- complete dictionary DR(asD=UV,where dered identity matrix,andx'is the first n rows of x;in U∈RmXm,∑∈Rmx(a+2m)andV∈Ra+2m)x(a+2). Eq.(10). Since V is an orthogonal matrix,Eq.(9)can be re- 2.3 Original and improved ELSSC-tracker written as fc")=子I9-e”+a1c”, Based on the above algorithm,our tracker can be obtained with the framework of the original I,-track- (10) er(Table 2).We need to iteratively solve the large- where x=Vo).Consequently,we can transform the scale l,-optimization problem in Eq.(10)twice,up to optimization problem with the Euclidean local structure three times for each candidate in the algorithm,and constraint in Eq.(6)to a pure l-optimization problem more than five times in NLSSST.The initial sparse in Eq.(10),i.e.,to represent the given signalxwith coefficients co are considered as all-zero vectors and it- sparse coefficientsc under the new dictionary V eratively solve the problem without any I,-optimization R(2m)x(+2m).The procedure of Euclidean local-struc- issues,as in Table 1 in [11].Nevertheless,we find ture constraint based sparse coding (ELSSC)is sum- that,in NLSSST,it is more effective and accurate to marized in Table 1 and is very different from the opti- initialize co as the solution of the I-optimization prob- mization procedure followed for NISSSC],even lem.Therefore,the computation complexity of our though the difference between their objective functions tracker is of the same order of magnitude as that of the seems very small (Eqs.(4)and (5),respectively).1-tracker and NLSSST.When we resize all n 10 tar- 2.2 Improved euclidean local structure constraint gets and N=200 candidate regions to 40 x 40,i.e., sparse coding (Improved ELSSC) m 1 600 Figs.1 and 2),then the over-complete If m in Eq.(10)is large,it is time-consuming to dictionary D is 1 600 x 3 210 and the orthogonal ma- obtain the optimization result c,as that in I-optimiza- trix V is 3 210 x 3 210 in Eq.(10).It is very difficult tion and NISSSC.Fortunately,in terms of SVD and to solve the corresponding I-optimization problem with the structure of D Figs.1 and 2),we have such a D in 1-tracker and NISSST)or V in our D UEV [UEV,I,-I]=[T,I,-I] ELSST). (11) With the improved ELSSC,is the first ten rows where I denotes the m-ordered identity matrix.'is the of and V'consists of the first ten rows and first ten first n rows of V'consists of the first n rows and the columns of V.Thus,each iteration of each candidate first n columns of V,and mn.As a result,when region in ELSST can be reduced from the large-scale constructing the dictionary V in Eq.(10),only the I-optimization problem to a much smaller one because first n rows and first n columns of V must be prepared,of the much smaller scale V E Rox10.To overcome whereas the remaining parts of V are not considered to the problem of occlusions in tracking,the analogous make any contribution to the target templates T.Thus,trivial templates are used to construct the new dictiona- the large scale optimization in Eq.(10)can be re-ryRx,i.e.,a ten-ordered identity matrix and duced to a much smaller one as follows: ten-ordered negative identity matrix
Eq. (8) as follows: f(c (t) i ) = γ 2 ‖c (t) i - v (t) i ‖2 2 + α ‖c (t) i ‖1 (9) To solve Eq. (9) using SVD, we decompose the over⁃ complete dictionary D∈R m×(n+2m) as D = UΣV T ,where U ∈ R m×m ,Σ ∈ R m×(n+2m) and V ∈ R (n+2m) ×(n+2m) . Since V is an orthogonal matrix, Eq. (9) can be re⁃ written as f(c (t) i ) = γ 2 ‖x (t) i - Vc (t) i ‖2 2 + α ‖c (t) i ‖1 (10) where x (t) i = Vv (t) i . Consequently, we can transform the optimization problem with the Euclidean local structure constraint in Eq. (6) to a pure l 1 ⁃optimization problem in Eq. (10), i.e., to represent the given signalx (t) i with sparse coefficientsc (t) i under the new dictionary V ∈ R (n+2m) ×(n+2m) . The procedure of Euclidean local⁃struc⁃ ture constraint based sparse coding (ELSSC) is sum⁃ marized in Table 1 and is very different from the opti⁃ mization procedure followed for NLSSSC [11] , even though the difference between their objective functions seems very small (Eqs. (4) and (5), respectively). 2.2 Improved euclidean local structure constraint sparse coding (Improved ELSSC) If m in Eq. (10) is large, it is time⁃consuming to obtain the optimization result ci, as that in l 1 ⁃optimiza⁃ tion and NLSSSC. Fortunately, in terms of SVD and the structure of D (Figs. 1 and 2), we have D = UΣV T = UΣV′ T [ ,I, - I] = [T,I, - I] (11) where I denotes the m⁃ordered identity matrix. Σ′ is the first n rows of Σ, V′ consists of the first n rows and the first n columns of V, and m≫n. As a result, when constructing the dictionary V in Eq. ( 10), only the first n rows and first n columns of V must be prepared, whereas the remaining parts of V are not considered to make any contribution to the target templates T. Thus, the large scale optimization in Eq. ( 10) can be re⁃ duced to a much smaller one as follows: f(c′ (t) i ) = γ 2 ‖x′ (t) i - V″c′ (t) i ‖2 2 + α ‖c′ (t) i ‖1 , (12) where V″ = [V′ I′ - I″] ∈ R n×3n ,I′ denotes the n⁃or⁃ dered identity matrix, and xi ′ is the first n rows of xi in Eq.(10). 2.3 Original and improved ELSSC⁃tracker Based on the above algorithm, our tracker can be obtained with the framework of the original l 1 ⁃track⁃ er [8⁃9] (Table 2). We need to iteratively solve the large⁃ scale l 1 ⁃optimization problem in Eq. (10) twice, up to three times for each candidate in the algorithm, and more than five times in NLSSST. The initial sparse coefficients c0 are considered as all⁃zero vectors and it⁃ eratively solve the problem without any l 1 ⁃optimization issues, as in Table 1 in [11]. Nevertheless, we find that, in NLSSST, it is more effective and accurate to initialize c0 as the solution of the l 1 ⁃optimization prob⁃ lem. Therefore, the computation complexity of our tracker is of the same order of magnitude as that of the l 1 ⁃tracker and NLSSST. When we resize all n = 10 tar⁃ gets and N = 200 candidate regions to 40 × 40, i.e., m = 1 600 ( Figs. 1 and 2), then the over⁃complete dictionary D is 1 600 × 3 210 and the orthogonal ma⁃ trix V is 3 210 × 3 210 in Eq. (10). It is very difficult to solve the corresponding l 1 ⁃optimization problem with such a D ( in l 1 ⁃tracker and NLSSST) or V ( in our ELSST). With the improved ELSSC, Σ′ is the first ten rows of Σ, and V′ consists of the first ten rows and first ten columns of V. Thus, each iteration of each candidate region in ELSST can be reduced from the large⁃scale l 1 ⁃optimization problem to a much smaller one because of the much smaller scale V′ ∈ R 10×10 . To overcome the problem of occlusions in tracking, the analogous trivial templates are used to construct the new dictiona⁃ ryV″ ∈ R 10×30 , i.e., a ten⁃ordered identity matrix and ten⁃ordered negative identity matrix. ·140· 智 能 系 统 学 报 第 11 卷