Training procedure
• We train the JFA matrices in the following order [Kenny et al., 2007a] (sketched in code below):
1. Train the eigenvoice matrix V, assuming that U and D are zero
2. Train the eigenchannel matrix U given the estimate of V, assuming that D is zero
3. Train the residual matrix D given the estimates of V and U
• Using these matrices, we compute the speaker factors y, the channel factors x, and the residual factors z
• We compute the final score using these matrices and factors
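The full EM updates for V, U, and D are given in [Kenny et al., 2007a]; the Python sketch below only makes the three-stage order and data flow concrete, substituting crude PCA/least-squares stand-ins for the real maximum-likelihood updates. All dimensions, data, and helper names are illustrative, not part of the original recipe.

```python
# A minimal sketch of the three-stage JFA training order described above.
# Each stage uses a crude principal-directions stand-in for the EM updates,
# only to show what is trained on what, and in which order.
import numpy as np

rng = np.random.default_rng(0)
n_spk, n_sess, sv_dim = 20, 5, 100      # speakers, sessions per speaker, supervector dim
rv, rc = 10, 5                          # ranks of V (eigenvoice) and U (eigenchannel)

# Centered GMM supervectors, one per (speaker, session)
supervecs = rng.normal(size=(n_spk, n_sess, sv_dim))

def top_directions(X, rank):
    """Leading principal directions of the rows of X: a stand-in for EM."""
    _, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    return Vt[:rank].T                   # (sv_dim, rank)

# 1. Train V on speaker averages, with U and D assumed zero.
spk_means = supervecs.mean(axis=1)                     # (n_spk, sv_dim)
V = top_directions(spk_means, rv)

# 2. Train U on within-speaker (channel) variation, given V, with D zero.
y = spk_means @ V                                      # speaker factors y
speaker_part = y @ V.T                                 # (n_spk, sv_dim)
channel_resid = (supervecs - speaker_part[:, None, :]).reshape(-1, sv_dim)
U = top_directions(channel_resid, rc)

# 3. Train the diagonal residual D on what V and U leave unexplained.
x = channel_resid @ U                                  # channel factors x
leftover = channel_resid - x @ U.T
D = np.sqrt(leftover.var(axis=0))                      # diagonal of D

print(V.shape, U.shape, D.shape)                       # (100, 10) (100, 5) (100,)
```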
Total variability
• Subspaces U and V are not completely independent
• A combined total variability space was used [Dehak et al., 2011]
[Figure: the JFA supervector model u = m + Vy + Ux + Dz (separate speaker and channel factors) versus the total variability model u = m + Tw, where T is the total variability matrix and w is the i-vector]
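To make the contrast concrete, here is a minimal numpy sketch of the two decompositions side by side; the matrices are random placeholders used purely to illustrate the shapes involved.

```python
# JFA keeps speaker and channel in separate subspaces; the total variability
# model folds both into the single matrix T.  Values are illustrative only.
import numpy as np

rng = np.random.default_rng(1)
sv_dim, rv, rc, rt = 100, 10, 5, 15
m = rng.normal(size=sv_dim)                  # UBM mean supervector

# JFA: u = m + V y + U x + D z
V, U = rng.normal(size=(sv_dim, rv)), rng.normal(size=(sv_dim, rc))
D = rng.normal(size=sv_dim)                  # diagonal of D
y, x, z = rng.normal(size=rv), rng.normal(size=rc), rng.normal(size=sv_dim)
u_jfa = m + V @ y + U @ x + D * z

# Total variability: u = m + T w   (w is the i-vector)
T = rng.normal(size=(sv_dim, rt))
w = rng.normal(size=rt)
u_tv = m + T @ w
```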
i-vector
• An i-vector system uses a set of low-dimensional total variability factors (w) to represent each conversation side. Each factor controls an eigen-dimension of the total variability matrix (T), and the resulting factor vectors are known as i-vectors.
• Unlike JFA or other FA methods, the i-vector approach does not make a distinction between speaker and channel
• It defines a total variability space that contains speaker and channel variabilities simultaneously
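Because each conversation side becomes a single fixed-length vector, a verification trial reduces to comparing two i-vectors; cosine similarity is the scoring commonly used with i-vectors [Dehak et al., 2011]. A minimal sketch, with random vectors standing in for real i-vectors:

```python
# Scoring a trial as the cosine similarity of two i-vectors.
import numpy as np

def cosine_score(w1, w2):
    """Cosine similarity between two i-vectors."""
    return float(w1 @ w2 / (np.linalg.norm(w1) * np.linalg.norm(w2)))

rng = np.random.default_rng(2)
w_enrol, w_test = rng.normal(size=400), rng.normal(size=400)
print(cosine_score(w_enrol, w_test))
```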
Training total variability space
• The rank of T is set prior to training
• T and w are latent variables; the EM algorithm is used, with random initialization for T
• Training the total variability matrix T is similar to training V, except that all utterances from a given speaker are treated as if they were produced by different speakers
• A UBM diagonal covariance matrix Σ (MD×MD) is introduced to model the residual variability not captured by T
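Below is a compact sketch of one EM iteration for T, assuming the zeroth-order (N) and centralized first-order (F) Baum-Welch statistics introduced on the next slide have already been collected per utterance. It follows the usual posterior and update formulas for the total variability model, but with synthetic statistics and no convergence loop; treat it as an illustration of the structure, not a tested recipe.

```python
# One EM iteration for T.  Every utterance is its own class (the point made
# above), so the statistics are simply indexed per utterance.
import numpy as np

rng = np.random.default_rng(3)
C, D_feat, R, n_utt = 8, 4, 5, 50        # mixtures, feature dim, rank of T, utterances
MD = C * D_feat                          # supervector dimension

Sigma = np.abs(rng.normal(size=MD)) + .5 # diagonal of the UBM covariance Σ
T = rng.normal(size=(MD, R)) * .1        # random initialization for T
N = rng.uniform(1, 20, size=(n_utt, C))  # zeroth-order stats (synthetic)
F = rng.normal(size=(n_utt, MD))         # centralized first-order stats (synthetic)

def e_step(T, Sigma, N_u, F_u):
    """Posterior of w for one utterance: mean E[w] and correlation E[w w^T]."""
    NN = np.repeat(N_u, D_feat)                        # expand N_c to MD dims
    L = np.eye(R) + T.T @ ((NN / Sigma)[:, None] * T)  # precision of w
    Cov = np.linalg.inv(L)
    Ew = Cov @ T.T @ (F_u / Sigma)
    return Ew, Cov + np.outer(Ew, Ew)

# Accumulate over utterances, then solve per mixture for the rows of T.
A = np.zeros((C, R, R))                  # sum_u N_c(u) E[w w^T]
B = np.zeros((MD, R))                    # sum_u F(u) E[w]^T
for u in range(n_utt):
    Ew, Eww = e_step(T, Sigma, N[u], F[u])
    A += N[u][:, None, None] * Eww
    B += np.outer(F[u], Ew)
for c in range(C):
    rows = slice(c * D_feat, (c + 1) * D_feat)
    T[rows] = B[rows] @ np.linalg.inv(A[c])            # M-step update
```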
i-vector extraction
• 0th-order statistics: N_c(u) = Σ_t γ_c(o_t), where γ_c(o_t) = p(c | o_t, UBM)
• 1st-order statistics: F_c(u) = Σ_t γ_c(o_t) o_t
• 2nd-order statistics: S_c(u) = diag(Σ_t γ_c(o_t) o_t o_tᵀ)
• Centralized 1st- and 2nd-order statistics: F̃_c(u) = Σ_t γ_c(o_t)(o_t − m_c) and S̃_c(u) = diag(Σ_t γ_c(o_t)(o_t − m_c)(o_t − m_c)ᵀ), where m_c is the subvector of the UBM mean supervector corresponding to mixture component c
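As an illustration of how these statistics are gathered, the sketch below uses scikit-learn's GaussianMixture as a stand-in UBM (trained here on random data); gamma[t, c] is p(c | o_t, UBM) from the bullets above. This is illustrative, not an optimized front end.

```python
# Collecting per-utterance Baum-Welch statistics under a (stand-in) UBM.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(4)
frames = rng.normal(size=(200, 13))              # o_1..o_T, e.g. MFCC frames

ubm = GaussianMixture(n_components=8, covariance_type="diag", random_state=0)
ubm.fit(rng.normal(size=(2000, 13)))             # stand-in background data

gamma = ubm.predict_proba(frames)                # (T, C) frame posteriors
N = gamma.sum(axis=0)                            # 0th order: N_c
F = gamma.T @ frames                             # 1st order: F_c
F_tilde = F - N[:, None] * ubm.means_            # centralized: Σ_t γ (o_t - m_c)
S = gamma.T @ frames**2                          # diag 2nd order, per mixture
```

Flattening F_tilde row-wise into an MD-dimensional supervector yields the centralized first-order statistic consumed by the EM sketch on the previous slide.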