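Study on Speaker Recognition Based on HHT Advisors: Prof. 謝傳璋, Prof. 王昭男 Student: 吳明弦 Date: 98/12/10
Outline 1. Abstract 2. Instantaneous frequency 3. EMD & IMF 4. Speech signal preprocessing 5. Vector quantization 6. Conclusion 7. References
Abstract Speech signals are nonlinear and non-stationary, whereas traditional Fourier analysis is linear, so the Hilbert transform (for both linear and nonlinear data) is needed to see how frequency content changes over time. Speaker recognition is a broad discipline closely tied to psychology, signal processing, computer science, and phonetics; it is used to enable communication between machines and people and to improve the accuracy of identity verification. The concept of empirical mode decomposition is also covered: because real-world signals are made up of many frequency components, the original signal is represented as a finite number of intrinsic mode functions plus one trend signal. The Hilbert transform has already been applied successfully to speaker recognition, for example in speech endpoint detection and feature extraction, which support the design of speaker recognition systems that reach the desired accuracy. It is also applied today to earthquakes, railway tracks, and financial management, where it has contributed a great deal.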
Speaker Recognition Process: speech signal → preprocessing → feature extraction → speech signal features → comparison with speaker database → decision (yes or no)
HHT Process: input data → Empirical Mode Decomposition (EMD): sifting process → Intrinsic Mode Function (IMF) → residue is a trend or constant? (no: sift again) → Hilbert transform → Hilbert spectrum → marginal spectrum
Fourier analysis x=0.5*sin(2*pi*15*t)+2*sin(2*pi*40*t)
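The two-tone signal on this slide can be checked numerically. The sketch below is a minimal example, assuming a sampling step of 1/400 s (the dt quoted on the instantaneous-frequency slide) and a one-second record, which is an arbitrary choice. It computes a one-sided amplitude spectrum with NumPy and picks out the two largest bins, which should sit at 15 Hz and 40 Hz with amplitudes near 0.5 and 2.

```python
import numpy as np

fs = 400.0                        # sampling rate in Hz, assuming dt = 1/400 s
n = 400                           # one second of data (duration is an assumption)
t = np.arange(n) / fs
x = 0.5 * np.sin(2 * np.pi * 15 * t) + 2 * np.sin(2 * np.pi * 40 * t)

# One-sided amplitude spectrum, scaled so a sine of amplitude A peaks at A
# (the scaling is not meant for the DC and Nyquist bins).
amplitude = np.abs(np.fft.rfft(x)) * 2 / n
freqs = np.fft.rfftfreq(n, d=1 / fs)

# The two largest bins recover the two sinusoidal components.
top = np.argsort(amplitude)[-2:][::-1]
peaks = [(float(freqs[k]), round(float(amplitude[k]), 3)) for k in top]
print(peaks)
```

The spectrum shows which frequencies are present but not when they occur, which is the limitation the following slides address with the Hilbert transform.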
Analytic signal
Hilbert transform
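As a reminder of how the two previous slides fit together, the analytic signal is z(t) = x(t) + i·H[x](t), where H[x] is the Hilbert transform of x. A minimal sketch, assuming a uniformly sampled real signal, builds z by zeroing the negative-frequency half of the FFT and doubling the positive half. The function name `analytic_signal` and the 15 Hz test tone are illustrative choices, not taken from the slides.

```python
import numpy as np

def analytic_signal(x):
    """Return x + i*H[x] by suppressing the negative-frequency half of the FFT."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    gain = np.zeros(n)
    gain[0] = 1.0                          # keep the DC bin unchanged
    if n % 2 == 0:
        gain[n // 2] = 1.0                 # keep the Nyquist bin unchanged
        gain[1:n // 2] = 2.0               # double the positive frequencies
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * gain)

fs = 400.0
t = np.arange(400) / fs
z = analytic_signal(np.cos(2 * np.pi * 15 * t))

# The Hilbert transform of a cosine is the matching sine, so the imaginary
# part of z should follow sin(2*pi*15*t).
max_error = float(np.max(np.abs(z.imag - np.sin(2 * np.pi * 15 * t))))
```

The magnitude and angle of z give the instantaneous amplitude and phase used on the next slides.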
Instantaneous frequency 1. Mean value = 0 (dt = 1/400)
Instantaneous frequency 2. Mean value < 1
Instantaneous frequency 3. Mean value > 1
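The three cases can be reproduced with a short experiment. The sketch below assumes they refer to a unit-amplitude sine with a constant offset added (0, 0.5, and 1.5 are illustrative values), sampled with dt = 1/400 s. The 5 Hz tone and the helper names are hypothetical. Instantaneous frequency is taken as the time derivative of the unwrapped phase of the analytic signal.

```python
import numpy as np

def analytic_signal(x):
    """FFT-based analytic signal for an even-length real sequence."""
    n = len(x)
    gain = np.zeros(n)
    gain[0] = gain[n // 2] = 1.0           # DC and Nyquist bins stay as they are
    gain[1:n // 2] = 2.0                   # positive frequencies are doubled
    return np.fft.ifft(np.fft.fft(x) * gain)

def instantaneous_frequency(x, fs):
    """Derivative of the unwrapped analytic phase, in Hz."""
    phase = np.unwrap(np.angle(analytic_signal(x)))
    return np.diff(phase) * fs / (2 * np.pi)

fs = 400.0                                 # dt = 1/400 s
t = np.arange(400) / fs
f0 = 5.0                                   # tone frequency in Hz (assumption)

results = {}
for offset in (0.0, 0.5, 1.5):             # mean = 0, mean < 1, mean > 1
    f = instantaneous_frequency(offset + np.sin(2 * np.pi * f0 * t), fs)
    results[offset] = (float(f.min()), float(f.max()))
```

With zero mean the instantaneous frequency stays at the tone frequency. With an offset below 1 it fluctuates but remains positive. With an offset above 1 it becomes negative for part of each cycle, which has no physical meaning. This is why the signal is first decomposed into zero-mean components before the Hilbert transform is applied.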
EMD sifting process for x(t): Oscillation modes are defined by their characteristic time scales; the time lapse between successive maxima and minima is used to analyze local properties. Sifting x(t): 1. Find all local maxima and minima of x(t), then connect the maxima and the minima with cubic splines to form the upper and lower envelopes. 2. Average the upper and lower envelopes to obtain the mean envelope m1(t). 3. Compute h1(t) = x(t) - m1(t) as the first component, which completes the first sifting. If h1(t) does not satisfy the IMF conditions, sift again and repeat until it does.
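The three steps above can be sketched as a single sifting pass. This is an illustration under stated assumptions rather than a full EMD: the envelopes use a natural cubic spline written out with NumPy, end effects are not treated (the spline is simply extrapolated past the first and last extremum), and the two-tone test signal and all function names are hypothetical.

```python
import numpy as np

def natural_cubic_spline(xk, yk, xq):
    """Evaluate the natural cubic spline through knots (xk, yk) at points xq."""
    n = len(xk)
    h = np.diff(xk)
    A = np.zeros((n, n))
    rhs = np.zeros(n)
    A[0, 0] = A[-1, -1] = 1.0              # zero second derivative at both ends
    for i in range(1, n - 1):
        A[i, i - 1] = h[i - 1]
        A[i, i] = 2.0 * (h[i - 1] + h[i])
        A[i, i + 1] = h[i]
        rhs[i] = 6.0 * ((yk[i + 1] - yk[i]) / h[i] - (yk[i] - yk[i - 1]) / h[i - 1])
    m = np.linalg.solve(A, rhs)            # second derivatives at the knots
    j = np.clip(np.searchsorted(xk, xq) - 1, 0, n - 2)
    a = xk[j + 1] - xq
    b = xq - xk[j]
    return ((m[j] * a**3 + m[j + 1] * b**3) / (6.0 * h[j])
            + (yk[j] / h[j] - m[j] * h[j] / 6.0) * a
            + (yk[j + 1] / h[j] - m[j + 1] * h[j] / 6.0) * b)

def local_extrema(x):
    """Indices of local maxima and local minima of a sampled signal."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    return maxima, minima

def sift_once(t, x):
    """One sifting pass: returns h1 = x - m1 and the mean envelope m1."""
    maxima, minima = local_extrema(x)
    upper = natural_cubic_spline(t[maxima], x[maxima], t)   # step 1
    lower = natural_cubic_spline(t[minima], x[minima], t)
    m1 = 0.5 * (upper + lower)                              # step 2
    return x - m1, m1                                       # step 3

fs = 1000.0
t = np.arange(1000) / fs
fast = np.sin(2 * np.pi * 20 * t)          # component expected to end up in h1
slow = 2 * np.sin(2 * np.pi * 1 * t)       # component expected to end up in m1
h1, m1 = sift_once(t, fast + slow)

inner = (t > 0.3) & (t < 0.7)              # stay away from the untreated ends
max_error = float(np.max(np.abs(h1[inner] - fast[inner])))
```

In a complete implementation `sift_once` would be called repeatedly on h1 until the IMF conditions hold, then the IMF would be subtracted from the signal and the procedure repeated on the residue.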
Sifting process 1. x(t)
Sifting process 2. m1(t), h1(t)
IMF Purposes of the sifting process: 1. Remove riding waves (leave a single oscillation mode). 2. Make the waveform more symmetric (avoid uneven oscillation). IMF properties (each component obtained by sifting satisfies): 1. The number of local extrema (maxima and minima) equals the number of zero crossings, or differs from it by at most one. 2. The mean of the envelopes defined by the local maxima and the local minima is zero.
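The first IMF property is easy to test on sampled data. The sketch below is a minimal check with hypothetical function names; it counts slope sign changes as extrema and value sign changes as zero crossings. The second property is not checked here because it needs the spline envelopes.

```python
import numpy as np

def count_extrema(x):
    """Number of local maxima plus minima, counted as slope sign changes."""
    slope = np.sign(np.diff(x))
    slope = slope[slope != 0]              # ignore flat steps
    return int(np.sum(slope[:-1] != slope[1:]))

def count_zero_crossings(x):
    """Number of sign changes of the signal itself."""
    s = np.sign(x)
    s = s[s != 0]                          # ignore samples that are exactly zero
    return int(np.sum(s[:-1] != s[1:]))

def satisfies_count_condition(x):
    """IMF property 1: the two counts are equal or differ by one."""
    return abs(count_extrema(x) - count_zero_crossings(x)) <= 1

t = np.arange(400) / 400.0
zero_mean = np.sin(2 * np.pi * 5 * t + 0.3)
shifted = zero_mean + 1.5                  # never crosses zero

ok_zero_mean = satisfies_count_condition(zero_mean)
ok_shifted = satisfies_count_condition(shifted)
```

A zero-mean sine passes the check, while the same sine shifted above zero keeps all its extrema but loses every zero crossing and therefore fails.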
Hilbert Spectrum
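Speech signal production: voiced excitation (periodic impulses) or unvoiced excitation (aperiodic) → vocal tract → speech signal
End-point detection principles 1. Energy e(i) of frame i: voiced speech has more energy than unvoiced speech, but an unvoiced segment may contain strong background noise and so may also show very large energy.
End-point detection principles 2. Zero-crossing rate ZCR(i) of frame i: voiced → small zero-crossing rate; unvoiced → large zero-crossing rate. Decision rule: a frame whose energy exceeds the threshold is given index 1. If the A frames after it also exceed the threshold, that frame may be the start of speech. Then look back over the preceding B frames: where the value is below the threshold, the frame is given index 0 and the start of speech is confirmed.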
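A minimal sketch of frame-wise energy follows. The slide's own formula for e(i) is not reproduced here; the code assumes the common definition of short-time energy as the sum of squared samples in each frame. The frame length, hop size, and synthetic test signal are illustrative choices.

```python
import numpy as np

def split_frames(x, frame_len, hop):
    """Stack overlapping frames as rows; samples that do not fill a frame are dropped."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop:i * hop + frame_len] for i in range(n_frames)])

def short_time_energy(x, frame_len=200, hop=100):
    """Sum of squared samples per frame (an assumed definition of e(i))."""
    frames = split_frames(np.asarray(x, dtype=float), frame_len, hop)
    return np.sum(frames ** 2, axis=1)

fs = 8000
tone = np.sin(2 * np.pi * 100 * np.arange(4000) / fs)   # stand-in for voiced speech
x = np.concatenate([np.zeros(4000), tone, np.zeros(4000)])
energy = short_time_energy(x)
```

Frames inside the tone carry far more energy than the silent frames around it. Energy alone is less reliable once background noise is present, as the slide notes.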
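The zero-crossing rate and the energy rule can be sketched together. This is a simplified illustration, not the slide's exact procedure. The energy threshold (10% of the largest frame energy) and run length (A = 3 frames) are hypothetical values. The look-back over B frames is left out, because its thresholds are not given. The test signal is weak noise followed by a 100 Hz tone.

```python
import numpy as np

def split_frames(x, frame_len, hop):
    """Stack overlapping frames as rows."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop:i * hop + frame_len] for i in range(n_frames)])

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs in each frame whose signs differ."""
    signs = np.where(frames >= 0, 1, -1)
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

def first_sustained_frame(flags, run_len):
    """First frame that begins a run of run_len flagged frames, or None."""
    for i in range(len(flags) - run_len + 1):
        if flags[i:i + run_len].all():
            return i
    return None

rng = np.random.default_rng(0)
fs = 8000
x = 0.01 * rng.standard_normal(8000)                    # background noise only
x[4000:] += np.sin(2 * np.pi * 100 * np.arange(4000) / fs)  # tone starts at sample 4000

frames = split_frames(x, frame_len=200, hop=100)
energy = np.sum(frames ** 2, axis=1)
zcr = zero_crossing_rate(frames)

energy_threshold = 0.1 * energy.max()                   # hypothetical threshold
start = first_sustained_frame(energy > energy_threshold, run_len=3)
```

The noise-only frames show a high zero-crossing rate and low energy, while the tone frames show the opposite. The detected start frame lies where the tone begins.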
End-point detection methods 1. Frequency change (dt = 0.1)
End-point detection methods 2. Phase change (dt = 0.1)
Pre-emphasis & silence removal: signal amplitude < 1/10 of the maximum amplitude → silence
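Both operations are short enough to sketch. The slide gives no filter coefficient, so the code assumes the widely used first-order pre-emphasis y[n] = x[n] - 0.95·x[n-1]. The silence rule is applied per frame rather than per sample (an interpretation, so that zero crossings inside speech are not discarded), with a hypothetical frame length of 200 samples.

```python
import numpy as np

def pre_emphasis(x, coeff=0.95):
    """First-order high-pass: y[n] = x[n] - coeff * x[n-1] (coeff is an assumption)."""
    x = np.asarray(x, dtype=float)
    return np.append(x[0], x[1:] - coeff * x[:-1])

def remove_silence(x, frame_len=200, ratio=0.1):
    """Drop frames whose peak amplitude is below ratio times the global peak."""
    x = np.asarray(x, dtype=float)
    limit = ratio * np.max(np.abs(x))
    kept = [x[i:i + frame_len] for i in range(0, len(x), frame_len)
            if np.max(np.abs(x[i:i + frame_len])) >= limit]
    return np.concatenate(kept) if kept else np.array([])

fs = 8000
tone = np.sin(2 * np.pi * 100 * np.arange(800) / fs)
quiet = 0.01 * np.ones(400)
x = np.concatenate([quiet, tone, quiet])

trimmed = remove_silence(x)                # only the 800 tone samples remain
emphasized = pre_emphasis([1.0, 1.0, 1.0])
```

Pre-emphasis attenuates slowly varying content: a constant input of 1 is reduced to 0.05 after the first sample.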
Before pre-emphasis and after pre-emphasis
Feature extraction: Speaker 1 speech signal ("hello")
Instantaneous frequency
Instantaneous frequency
Hilbert Spectrum
Speaker 2 Speech Signal
Instantaneous frequency
Instantaneous frequency
Hilbert Spectrum
Speaker 1 Speech Signal
Instantaneous frequency
Instantaneous frequency
Hilbert Spectrum
Pulse code modulation 1. Uniform quantization (source: 王小川, 語音訊號處理 [Speech Signal Processing])
Scalar quantization 2. Non-uniform quantization (source: 王小川, 語音訊號處理 [Speech Signal Processing])
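A uniform quantizer can be sketched in a few lines. The version below is one possible variant (mid-rise, with levels at the centers of equal-width cells), assuming the input is scaled to the range [-1, 1]. The bit depth and function name are illustrative and not taken from the cited textbook.

```python
import numpy as np

def uniform_quantize(x, n_bits, x_max=1.0):
    """Mid-rise uniform quantizer with 2**n_bits levels over [-x_max, x_max]."""
    levels = 2 ** n_bits
    step = 2.0 * x_max / levels
    index = np.floor((np.asarray(x, dtype=float) + x_max) / step)
    index = np.clip(index, 0, levels - 1)  # the top edge falls into the last cell
    return (index + 0.5) * step - x_max    # reconstruct at the cell center

x = np.linspace(-1.0, 1.0, 1001)
q = uniform_quantize(x, n_bits=3)          # 8 levels, step 0.25
max_error = float(np.max(np.abs(x - q)))
```

For in-range input the error never exceeds half a step (0.125 here), regardless of the signal level. Non-uniform quantization relaxes exactly this property.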
Vector quantization: conditions for the smallest mean quantization error: (1) nearest-neighbor selection rule; (2) quantization value of each region (its centroid)
Building the vector quantization codebook: centroid splitting algorithm. 1. Initialization: compute the centroid of all training data as the initial codebook. 2. Splitting: after n splitting stages there are 2^n centroids. Each input vector is compared with all centroids and assigned to the region of the nearest one, then the centroids are recomputed. Splitting continues until the codebook reaches the desired size.
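The two conditions can be illustrated on a toy data set. The sketch below assumes squared Euclidean distance as the error measure. The four training vectors, the starting codebook, and the function names are made up for the example. It applies the nearest-neighbor rule once, replaces each codeword by the centroid of its region, and compares the mean error before and after.

```python
import numpy as np

def nearest_codeword(vectors, codebook):
    """Condition 1: index of the closest codeword for every vector."""
    diff = vectors[:, None, :] - codebook[None, :, :]
    return np.argmin(np.sum(diff ** 2, axis=2), axis=1)

def centroids(vectors, labels, codebook):
    """Condition 2: each codeword becomes the mean of the vectors in its region."""
    updated = codebook.copy()
    for k in range(len(codebook)):
        members = vectors[labels == k]
        if len(members) > 0:               # leave empty regions unchanged
            updated[k] = members.mean(axis=0)
    return updated

def mean_error(vectors, codebook):
    """Mean squared distance between each vector and its nearest codeword."""
    labels = nearest_codeword(vectors, codebook)
    return float(np.mean(np.sum((vectors - codebook[labels]) ** 2, axis=1)))

vectors = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 4.0], [5.0, 4.0]])
codebook = np.array([[1.0, 1.0], [3.0, 3.0]])

before = mean_error(vectors, codebook)
labels = nearest_codeword(vectors, codebook)
codebook = centroids(vectors, labels, codebook)
after = mean_error(vectors, codebook)
```

On this data the mean error drops from 2.5 to 0.25 after a single update. Alternating the two conditions can only keep the error the same or lower it.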
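The splitting procedure can be sketched as follows. Several details are assumptions because the slide does not specify them: each centroid is split by adding and subtracting a small constant `eps`, the refinement runs for a fixed number of iterations instead of testing convergence, and the distance is squared Euclidean. The two-cluster training set is synthetic.

```python
import numpy as np

def train_codebook(vectors, n_stages, eps=0.01, n_iter=20):
    """Centroid splitting: returns a codebook with 2**n_stages codewords."""
    codebook = vectors.mean(axis=0, keepdims=True)       # step 1: global centroid
    for _ in range(n_stages):
        # Step 2: split every centroid into a slightly perturbed pair.
        codebook = np.concatenate([codebook + eps, codebook - eps])
        for _ in range(n_iter):
            # Assign each vector to the region of its nearest centroid.
            diff = vectors[:, None, :] - codebook[None, :, :]
            labels = np.argmin(np.sum(diff ** 2, axis=2), axis=1)
            # Recompute the centroid of every non-empty region.
            for k in range(len(codebook)):
                members = vectors[labels == k]
                if len(members) > 0:
                    codebook[k] = members.mean(axis=0)
    return codebook

rng = np.random.default_rng(0)
cluster_a = rng.normal(loc=[1.0, 1.0], scale=0.1, size=(50, 2))
cluster_b = rng.normal(loc=[5.0, 5.0], scale=0.1, size=(50, 2))
vectors = np.concatenate([cluster_a, cluster_b])

codebook = train_codebook(vectors, n_stages=1)
codebook = codebook[np.argsort(codebook[:, 0])]          # sort for easy comparison
```

After one splitting stage the two codewords settle near the two cluster centers. For speaker recognition, one such codebook would be trained per speaker from that speaker's feature vectors.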
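Conclusion This talk briefly introduced empirical mode decomposition, intrinsic mode functions, the Hilbert spectrum, the concept of speech recognition, and speech preprocessing. The feature extraction method for speaker recognition presented here is based on the Hilbert transform, which suits nonlinear, non-stationary speech signals. The extracted features show when a speaker is talking, and comparing distances against the codebooks of a speech database built by vector quantization shows which speaker is talking. This demonstrates the importance of instantaneous frequency.
References 1. N. E. Huang, Z. Shen, S. R. Long, M. C. Wu, H. H. Shih, Q. Zheng, N.-C. Yen, C. C. Tung, and H. H. Liu, The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. 2. 方建, 基於HHT語音識別技術研究 [Research on HHT-based speech recognition technology], master's thesis, Institute of Communication and Information Systems, Harbin Engineering University, 2006. 3. 許豔紅, HHT變換在說話人識別中的應用 [Application of the HHT to speaker recognition], master's thesis, Institute of Electronic Information and Technology, Zhejiang University, 2005. 4. 王小川, 語音訊號處理 [Speech Signal Processing], 2007.
Next steps 1. Design the speaker recognition system 2. Find a speaker database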
Thank you