Introduction: Feature extraction
• Converting the raw speech signal into a sequence of acoustic feature vectors carrying characteristic information about the signal
[Figure: per-frame processing chain: signal, framing, windowing, FFT, Mel filterbank (26 outputs), log, DCT]
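The per-frame chain on this slide (framing, windowing, FFT, Mel filterbank, log, DCT) can be sketched as below. This is a minimal illustration rather than the presenter's implementation: it uses only the Python standard library, so a naive DFT stands in for the FFT. The 26 filters follow the figure; the Hamming window, frame length, hop size, 13 cepstral coefficients, and all function names are assumptions made for the example.

```python
import cmath
import math


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def split_frames(signal, frame_len, hop):
    """Cut the signal into overlapping frames of frame_len samples."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def power_spectrum(frame):
    """Apply a Hamming window, then a naive DFT (stand-in for an FFT)."""
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters with centres spaced evenly on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(top * i / (num_filters + 1))
                for i in range(num_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * f / sample_rate)) for f in edges_hz]
    bank = []
    for m in range(1, num_filters + 1):
        lo, mid, hi = bins[m - 1], bins[m], bins[m + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(lo, mid):          # rising edge of the triangle
            row[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):          # falling edge of the triangle
            row[k] = (hi - k) / (hi - mid)
        bank.append(row)
    return bank


def mfcc(signal, sample_rate, frame_len=256, hop=128,
         num_filters=26, num_ceps=13):
    """Return one cepstral feature vector per frame."""
    bank = mel_filterbank(num_filters, frame_len, sample_rate)
    features = []
    for frame in split_frames(signal, frame_len, hop):
        power = power_spectrum(frame)
        # Log filterbank energies, floored to avoid log(0)
        log_e = [math.log(max(sum(w * p for w, p in zip(row, power)), 1e-10))
                 for row in bank]
        # DCT-II of the log energies gives the cepstral coefficients
        ceps = [sum(e * math.cos(math.pi * c * (j + 0.5) / num_filters)
                    for j, e in enumerate(log_e))
                for c in range(num_ceps)]
        features.append(ceps)
    return features


# Toy input: a 440 Hz tone sampled at 8 kHz (illustrative only)
sample_rate = 8000
signal = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(1024)]
feats = mfcc(signal, sample_rate)
print(len(feats), len(feats[0]))
```

In practice a numerical library with a real FFT would replace `power_spectrum`; the structure of the chain stays the same.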
Introduction: Pattern matching
• Main approaches in pattern matching for speaker recognition
  • Template matching
    • Vector quantization [F. Soong, 1985]
  • Probabilistic model
    • Gaussian Mixture Model [A. Reynolds, 2003]
    • Joint factor analysis [P. Kenny, 2006]
    • i-vector [N. Dehak, 2011]
  • Artificial neural network
    • d-vector [Variani, 2014]
    • End-to-end [G. Heigold, 2016]
The i-vector methodology of speaker recognition
• Over recent years, the i-vector approach has demonstrated state-of-the-art performance for speaker recognition.
[Figure labels: GMM-UBM framework, JFA, i-vector, LDA, Cosine, PLDA]
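As a small illustration of the "Cosine" label on this slide, the sketch below scores a trial by the cosine similarity between an enrollment i-vector and a test i-vector. The function names and the threshold value are hypothetical, and the slide itself does not specify how scoring is configured.

```python
import math


def cosine_score(w_enroll, w_test):
    """Cosine similarity between two i-vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(w_enroll, w_test))
    norm = (math.sqrt(sum(a * a for a in w_enroll))
            * math.sqrt(sum(b * b for b in w_test)))
    return dot / norm


def accept(w_enroll, w_test, threshold=0.5):
    """Accept the trial when the score reaches an assumed threshold."""
    return cosine_score(w_enroll, w_test) >= threshold


# Toy 3-dimensional vectors; real i-vectors have far more dimensions
print(accept([1.0, 2.0, 0.5], [0.9, 2.1, 0.4]))
```

Because the score depends only on the angle between the vectors, rescaling either vector leaves the decision unchanged.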
Joint factor analysis
• A speaker's supervector should be decomposable into speaker-independent, speaker-dependent, channel-dependent, and residual components
• Each component is represented by low-dimensional factors, which operate along the principal dimensions of the corresponding component
• The speaker-dependent component is known as the eigenvoice; each speaker factor controls an eigendimension of the eigenvoice matrix
[Figure: eigenvoice matrix (V1, V2, ...) and the low-dimensional eigenvoice factors]
• GMM supervector u for a speaker can be decomposed as
  u = m + Vy + Ux + Dz
  where m is the speaker-independent component, Vy the speaker-dependent component, Ux the channel-dependent component, and Dz the speaker-dependent residual component:
  • m is a speaker-independent supervector from the UBM
  • V is the eigenvoice matrix
  • y ∼ N(0, I) is the speaker factor vector
  • U is the eigenchannel matrix
  • x ∼ N(0, I) is the channel factor vector
  • D is the residual matrix, and is diagonal
  • z ∼ N(0, I) is the speaker-specific residual factor vector
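The decomposition u = m + Vy + Ux + Dz can be made concrete with a toy numerical sketch. The dimensions, random values, and helper names below are assumptions chosen for illustration only. In a real system m, V, U, and D are estimated from training data and the supervector is far larger.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def jfa_supervector(m, V, y, U, x, d_diag, z):
    """Compose u = m + V y + U x + D z, with D given by its diagonal."""
    Vy = matvec(V, y)                            # speaker-dependent component
    Ux = matvec(U, x)                            # channel-dependent component
    Dz = [d * zi for d, zi in zip(d_diag, z)]    # residual component
    return [mi + a + b + c for mi, a, b, c in zip(m, Vy, Ux, Dz)]


rng = random.Random(0)
dim, n_speaker, n_channel = 6, 2, 3   # toy sizes, assumed for the example


def gauss_vec(n):
    """Draw n independent standard normal values, i.e. a sample from N(0, I)."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


m = gauss_vec(dim)                              # stand-in for the UBM supervector
V = [gauss_vec(n_speaker) for _ in range(dim)]  # eigenvoice matrix
U = [gauss_vec(n_channel) for _ in range(dim)]  # eigenchannel matrix
d_diag = [0.1] * dim                            # diagonal of D

y = gauss_vec(n_speaker)   # speaker factors
z = gauss_vec(dim)         # speaker-specific residual factors
x1 = gauss_vec(n_channel)  # channel factors, session 1
x2 = gauss_vec(n_channel)  # channel factors, session 2

# Same speaker (same y and z) recorded over two different channels
u1 = jfa_supervector(m, V, y, U, x1, d_diag, z)
u2 = jfa_supervector(m, V, y, U, x2, d_diag, z)

# The two supervectors differ only by U (x1 - x2)
channel_shift = matvec(U, [a - b for a, b in zip(x1, x2)])
print(max(abs((a - b) - c) for a, b, c in zip(u1, u2, channel_shift)))
```

The final line prints a value at floating-point rounding level, which shows that session-to-session variation for a fixed speaker is confined to the span of the eigenchannel matrix U under this model.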