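1 Speech Recognition (語音辨識)
Jyh-Shing Roger Jang (張智星) jang@cs.nthu.edu.tw http://www.cs.nthu.edu.tw/~jang
Multimedia Information Retrieval Laboratory, Department of Computer Science, National Tsing Hua University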

2 Speech Processing
Speech Recognition -- converting speech into text, based on the input speech and on prior acoustic and textual analyses
Speaker Recognition -- verifying a person's identity or associating a person with a voice
Speech Coding -- digital coding (compression) of speech for efficient, secure storage and transmission
Speech Synthesis -- automatic generation of a speech signal starting from a normal textual input
Speech Enhancement -- processing a speech signal that is subject to certain degradations in order to increase its intelligibility and/or its quality

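3 Introduction to Speech Recognition
Goal:
  Recognize, by voice, words drawn from a specific, restricted vocabulary.
Characteristics:
  High technical barrier: requires familiarity with digital signal processing, acoustic models, matching methods, language models, and so on.
  Corpus collection requires a great deal of manual labor.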

4 Technical Difficulties and Considerations in Speech Recognition
Is the system required to recognize a specific individual or multiple speakers?
What is the size of the vocabulary?
Is the speech to be entered in discrete units with distinct pauses among them, or as a continuous utterance?
What is the extent of ambiguity and acoustic confusability in the vocabulary?
Is the system to be operated in a quiet or noisy environment, and what is the nature of the environmental noise if it exists?
What are the linguistic constraints placed upon the speech, and what linguistic knowledge is built into the recognizer?

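5 Applications of Speech Recognition
Song selection by voice
Automatic voice-operated telephone switchboard
Full-text retrieval system with a speech interface
  Example: full-text retrieval and speech positioning for "Harry Potter" (「哈莉波特」)
Lyrics retrieval system
  Example: 「潮起又潮落」
Any other application that can use speech as its interface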

6 Classification of Speech Recognition
Vocabulary Size
  Small Vocabulary --- below 100 words
  Medium Vocabulary --- from 100 to 1000 words
  Large Vocabulary --- more than 1000 words
Speaker Dependence
  Speaker-Dependent
  Speaker-Independent

7 Classification of Speech Recognition
Speaking Style
  Isolated Words
  Connected Words
  Continuous Speech
Environment
  Clean Speech
  Noisy Speech
  Channel Distorted
  Microphone Mismatched

8 Directions in Speech Recognition
Methodology of the 1980s and 1990s
  Hidden Markov Models
  Neural Networks
The Trends
  Large Vocabulary Continuous Speech Recognition
  Robust Speech Recognition
  Real-Time Speaker Adaptive Speech Recognition
  Language Modeling

9 Speech Recognition Workflow
Recording and feature extraction
Matching
  Dynamic Time Warping (DTW)
  Hidden Markov Model (HMM)
Display of the matching results

10 Schematic of Speech Recognition
Isolated Word Problem Concept
Vocabulary: New York, Taipei, Taichung, ..., Los Angeles

11 Feature Extraction for Speech Recognition
MFCC: Mel-frequency Cepstral Coefficients
Block diagram (labels recovered from the figure): speech signal; pre-emphasis; frame blocking; Hamming window; triangular filters; Mel log spectrum; Discrete Cosine Transform; log energy; differentiator; 13-D; 39-D feature vector
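The first three blocks of the diagram (pre-emphasis, frame blocking, Hamming window) can be sketched in plain Python. This is only an illustrative sketch: the processing order, the pre-emphasis coefficient 0.97, the 400-sample frame length, the 160-sample hop, and all function names are assumptions, not values taken from the slides. The later blocks (triangular filters, log, DCT, differentiator) are not covered here.

```python
import math

def pre_emphasis(signal, alpha=0.97):
    """y[n] = x[n] - alpha * x[n-1]; alpha = 0.97 is an assumed, commonly used value."""
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]

def frame_blocking(signal, frame_len, hop):
    """Cut the signal into overlapping frames; a trailing partial frame is dropped."""
    return [signal[s:s + frame_len]
            for s in range(0, len(signal) - frame_len + 1, hop)]

def hamming_window(frame):
    """Multiply a frame by a Hamming window of the same length."""
    n = len(frame)
    return [x * (0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)))
            for k, x in enumerate(frame)]

# Hypothetical input: 0.1 s of a 440 Hz sine sampled at 16 kHz.
signal = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(1600)]
frames = [hamming_window(f)
          for f in frame_blocking(pre_emphasis(signal), frame_len=400, hop=160)]
```

Each element of `frames` would then go through the spectral stages of the diagram to produce one feature vector per frame.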

12 Dynamic Time Warping
Characteristics:
  Pattern-matching-based approach
  Requires less computation
  Difficult to achieve speaker independence
  Suitable for small to medium vocabularies
  Suitable for microprocessor/chip implementation
Applications:
  Mobile phones, car phones, toys, voice recorders

13 Dynamic Time Warping (DTW)
t: input MFCC matrix
r: reference MFCC matrix
DTW recurrence (figure): grid with the input frames t(i-1), t(i) along the i axis and the reference frames r(j-1), r(j) along the j axis

14 DTW Paths of "Match Ends"
We assume the speed of a user's acoustic input falls between 1/2 and 2 times that of the intended sentence.
Both ends are fixed. (End point detection is critical.)
Suitable for voice command applications.
(Figure axes: i, j)
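One way to realize a "match ends" alignment with a speed ratio between 1/2 and 2 is to allow only the local steps (1,1), (2,1) and (1,2). The sketch below assumes that step set, a Euclidean frame distance, and the function name `dtw_match_ends`; the exact recurrence used in the slides is not recoverable from the transcript, so treat this as one plausible variant.

```python
import math

def dtw_match_ends(t, r):
    """DTW distance between input t and reference r (lists of feature vectors).

    Both ends are fixed: the path starts at (0, 0) and ends at (len(t)-1, len(r)-1).
    Allowed predecessors of (i, j) are (i-1, j-1), (i-2, j-1), (i-1, j-2),
    which keeps the path slope between 1/2 and 2 (an assumed constraint set).
    Returns float('inf') when no such path exists.
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    inf = float("inf")
    n, m = len(t), len(r)
    table = [[inf] * m for _ in range(n)]
    table[0][0] = dist(t[0], r[0])
    for i in range(1, n):
        for j in range(1, m):
            best = table[i - 1][j - 1]
            if i >= 2:
                best = min(best, table[i - 2][j - 1])
            if j >= 2:
                best = min(best, table[i - 1][j - 2])
            if best < inf:
                table[i][j] = dist(t[i], r[j]) + best
    return table[n - 1][m - 1]

same = dtw_match_ends([[0.0], [1.0], [2.0]], [[0.0], [1.0], [2.0]])
too_slow = dtw_match_ends([[0.0]] * 4, [[0.0]] * 9)  # more than 2x longer
```

With this step set, a reference more than about twice as long as the input cannot be reached at all, which is why accurate end point detection matters.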

15 DTW Paths of "Match Anywhere"
Both ends are free to move.
Suitable for personal voice retrieval applications, such as voice recorders and personal spoken documents.
(Figure axes: i, j)
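A "match anywhere" search can be sketched by letting the path start at any reference frame and end at any reference frame. The version below is an assumption-laden illustration: it treats t as a short query matched against a longer recording r, reuses the (1,1), (2,1), (1,2) step set, and uses a hypothetical function name.

```python
import math

def dtw_match_anywhere(t, r):
    """Best alignment of query t against any stretch of r.

    The start is free (the first row is initialised with plain frame distances)
    and the end is free (the minimum of the last row is taken).
    Returns (cost, end_index_in_r).
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    inf = float("inf")
    n, m = len(t), len(r)
    table = [[inf] * m for _ in range(n)]
    for j in range(m):
        table[0][j] = dist(t[0], r[j])  # free start point
    for i in range(1, n):
        for j in range(1, m):
            best = table[i - 1][j - 1]
            if i >= 2:
                best = min(best, table[i - 2][j - 1])
            if j >= 2:
                best = min(best, table[i - 1][j - 2])
            if best < inf:
                table[i][j] = dist(t[i], r[j]) + best
    end = min(range(m), key=lambda j: table[n - 1][j])  # free end point
    return table[n - 1][end], end

query = [[5.0], [6.0], [7.0]]
recording = [[0.0], [0.0], [5.0], [6.0], [7.0], [0.0]]
cost, end = dtw_match_anywhere(query, recording)
```

In this toy example the query is found inside the recording with zero cost, ending at reference frame 4.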

16 Example DTW Path of “Match Ends”

17 DTW Demos Match-ends (asr/demoDTW.m) Match-anywhere (asr/demoVIR.m)

18 Hidden Markov Model
Characteristics:
  Statistics-based approach
  Requires more computation
  Can achieve speaker independence
  Suitable for large vocabularies
  Difficult to implement on a microprocessor/chip
Applications:
  Spoken full-text retrieval, dictation machines

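19 Example of HMM
An example: recognizing the word "紐約" (New York)
1. Segment the word and convert it to Chang Gung pinyin (長庚拼音)
  niou-Ye 紐約 0 (*.syl)
2. Find the model corresponding to each syllable
  niou: sil+n n+i i+o o+u u+sil
  Ye: sil+Y Y+e e+sil
3. Read the state information from macros
  sil+*: 3 states; all others: 5 states
(Figure: syllable models for niou and Ye)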
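The expansion from a syllable to its unit models, and the resulting state counts, can be illustrated as follows. The sketch assumes that each letter of the pinyin spelling is one phone symbol, which happens to reproduce the two examples on the slide but is not stated there; the function names are hypothetical.

```python
def syllable_to_units(syllable):
    """Expand a syllable into 'left+right' units, padded with silence.

    Assumption: every character of the spelling is a separate phone symbol.
    """
    symbols = ["sil"] + list(syllable) + ["sil"]
    return [f"{a}+{b}" for a, b in zip(symbols, symbols[1:])]

def state_count(unit):
    """sil+* units have 3 states, all other units have 5 (as stated on the slide)."""
    return 3 if unit.startswith("sil+") else 5

word = "niou-Ye"
units_per_syllable = {s: syllable_to_units(s) for s in word.split("-")}
states_per_syllable = {s: sum(state_count(u) for u in units)
                       for s, units in units_per_syllable.items()}
```

Under these assumptions the syllable model for niou has 3 + 4 x 5 = 23 states and the one for Ye has 3 + 2 x 5 = 13 states.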

20 n+i i+h h+a a+u u+sil

21 Viterbi Search in HMM
Dynamic programming via table filling. The entry at cell (i, j) is:
hmmTable(i, j) = max{ hmmTable(i-1, j) + transitionProb(j, j), hmmTable(i-1, j-1) + transitionProb(j-1, j) } + stateProb(i, j)
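The table-filling recurrence can be sketched directly in Python. Because the scores are added, the sketch assumes that transitionProb and stateProb hold log probabilities; it also assumes a left-to-right model that starts in the first state and ends in the last one. The function name and the toy numbers are illustrative only.

```python
import math

NEG_INF = float("-inf")

def viterbi_left_to_right(state_logprob, trans_logprob):
    """Fill hmmTable for a left-to-right HMM and return the best final score.

    state_logprob[i][j]: log probability of frame i in state j (stateProb).
    trans_logprob[a][b]: log probability of moving from state a to b (transitionProb).
    """
    n_frames, n_states = len(state_logprob), len(state_logprob[0])
    table = [[NEG_INF] * n_states for _ in range(n_frames)]
    table[0][0] = state_logprob[0][0]  # assumed: the path starts in state 0
    for i in range(1, n_frames):
        for j in range(n_states):
            stay = table[i - 1][j] + trans_logprob[j][j]
            move = (table[i - 1][j - 1] + trans_logprob[j - 1][j]
                    if j > 0 else NEG_INF)
            table[i][j] = max(stay, move) + state_logprob[i][j]
    return table[-1][-1]  # assumed: the path ends in the last state

# Toy 2-state model observed over 3 frames (made-up numbers).
state_lp = [[math.log(0.9), math.log(0.1)],
            [math.log(0.5), math.log(0.5)],
            [math.log(0.1), math.log(0.9)]]
trans_lp = [[math.log(0.5), math.log(0.5)],
            [NEG_INF, 0.0]]
score = viterbi_left_to_right(state_lp, trans_lp)
```

For this toy input the best path is state 0, then state 1, then state 1, with probability 0.9 x 0.5 x 0.5 x 1 x 0.9 = 0.2025.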

22 EM in HMM
The acoustic parameters of each state are estimated with the Baum-Welch algorithm, which is a variant of EM (Expectation Maximization).
To obtain a suitable set of parameters, we need a balanced corpus of recordings from a variety of speakers.

23 Speedup Mechanisms in HMM
Search strategies:
  Beam search in Viterbi search
  Tree lexicon instead of linear lexicon
Implementation:
  Fixed-point instead of floating-point operations
  Many other tricks...
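Beam search inside a Viterbi pass can be illustrated by pruning one column of scores: any state whose log score falls too far below the best one is deactivated. The helper below is a minimal sketch; the function name and the beam width are assumptions, and real decoders usually track active states in a more elaborate way.

```python
NEG_INF = float("-inf")

def prune_column(scores, beam_width):
    """Keep only the states whose log score is within beam_width of the best one.

    Pruned states are set to -inf so later frames never extend them.
    """
    best = max(scores)
    return [s if s >= best - beam_width else NEG_INF for s in scores]

column = [-12.0, -3.5, -4.0, -30.0]
pruned = prune_column(column, beam_width=5.0)
active = [j for j, s in enumerate(pruned) if s > NEG_INF]
```

A narrower beam saves more computation but increases the risk of discarding the path that would have turned out best.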

24 What Is a Linear Net?
Each name is annotated with its pinyin, and every entry forms its own path from Sil to Sil:
  陳惠操 CrN-huei-cau
  陳建智 CrN-jieN-Jy
  陳雅姿 CrN-ia-jy
  陳雅秀 CrN-ia-siou
  陳雅玲 CrN-ia-liG
  蔡茂豐 cai-mau-fG
  孫愛玲 suN-ai-liG

25 What Is a Tree Net?
The same names (陳惠操 陳建智 陳雅姿 陳雅秀 陳雅玲 蔡茂豐 孫愛玲), annotated with pinyin, share their common prefixes between Sil and Sil:
  CrN
    huei-cau
    jieN-Jy
    ia
      jy
      siou
      liG
  cai-mau-fG
  suN-ai-liG
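The saving from prefix sharing can be checked with a small trie built over the syllable sequences listed on the slides. This is an illustrative sketch with hypothetical function names; it counts syllable nodes only and ignores the Sil and word-end nodes of a real recognition network.

```python
def build_tree_lexicon(entries):
    """Build a nested-dict trie in which entries with a common prefix share nodes."""
    root = {}
    for syllables in entries:
        node = root
        for syllable in syllables:
            node = node.setdefault(syllable, {})
    return root

def count_nodes(tree):
    """Number of syllable nodes in the trie."""
    return sum(1 + count_nodes(child) for child in tree.values())

names = ["CrN-huei-cau", "CrN-jieN-Jy", "CrN-ia-jy", "CrN-ia-siou",
         "CrN-ia-liG", "cai-mau-fG", "suN-ai-liG"]
entries = [name.split("-") for name in names]

linear_nodes = sum(len(e) for e in entries)   # one path per name
tree_nodes = count_nodes(build_tree_lexicon(entries))
```

For these seven names the linear net needs 21 syllable nodes, while the tree net needs 15, so fewer models have to be evaluated per frame.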

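26 How to Convert from Linear to Tree
Original list: 陳惠操 蔡茂豐 陳雅秀 陳雅玲 孫愛玲 陳建智 陳雅姿
Step 1: sort the entries column by column: 陳惠操 陳建智 陳雅姿 陳雅秀 陳雅玲 蔡茂豐
Step 2: remove the duplicated items in each column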
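The two steps (sort by column, then remove duplicates) can be sketched on the pinyin form of the names. The code below is an assumption about how the procedure might be implemented, not the slides' own code: it sorts the rows with Python's default ordering, which differs from the order shown on the slide, and blanks out every leading cell that repeats the row above.

```python
def linear_to_tree_rows(entries):
    """Sort rows column by column, then blank the cells shared with the previous row."""
    rows = sorted(tuple(e) for e in entries)
    result, previous = [], ()
    for row in rows:
        shared = 0
        while (shared < min(len(row), len(previous))
               and row[shared] == previous[shared]):
            shared += 1
        result.append(("",) * shared + row[shared:])
        previous = row
    return result

names = ["CrN-huei-cau", "cai-mau-fG", "CrN-ia-siou", "CrN-ia-liG",
         "suN-ai-liG", "CrN-jieN-Jy", "CrN-ia-jy"]
tree_rows = linear_to_tree_rows(name.split("-") for name in names)
remaining = sum(1 for row in tree_rows for cell in row if cell)
```

The non-empty cells that remain correspond to the nodes of the tree net; the blanked cells are the duplicates that were merged.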

27 Tree Net Structure
Figure nodes: Sil, !N, 陳, 惠, 操, 建, 智, 雅, 姿, 蔡, 茂, 豐, 孫, 愛, 玲, !N, Sil
!N = !NULL

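28 Problems Encountered in Phonetic Annotation
Polyphonic characters (破音字)
  Approach: look them up in a lexicon that has already been phonetically annotated (about 90,000 words)
Examples where problems may occur:
  我們三人參加會議 ("The three of us attend the meeting")
  朝辭白帝彩雲間 ("At dawn I left Baidi amid the colored clouds")
  朝如青絲暮成雪 ("Like black silk in the morning, turned to snow by evening")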

29 Robust Speech Recognition
Block diagram of speech feature extraction
Cepstral Mean Subtraction
Signal Bias Removal
Stochastic Matching
Spectral Subtraction
Noise Masking
Temporal filter (i.e., delta filter)
Missing feature method
Improved filter shapes for feature extraction
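Of the techniques listed, Cepstral Mean Subtraction is the simplest to illustrate: the per-dimension mean over all frames of an utterance is removed from every frame, which cancels a constant offset in the cepstral domain. The sketch below uses a hypothetical function name and made-up feature values.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-dimension mean over all frames from every frame.

    frames: list of equal-length cepstral vectors for one utterance.
    """
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(frame[d] for frame in frames) / n for d in range(dim)]
    return [[frame[d] - mean[d] for d in range(dim)] for frame in frames]

features = [[1.0, 2.0], [3.0, 6.0]]
normalized = cepstral_mean_subtraction(features)
```

After normalization each dimension has zero mean over the utterance, so two recordings that differ only by a fixed cepstral offset yield the same features.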

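30 HMM Demos
Spoken full-text retrieval system
  Personal names: about 60 entries
  Taipei City streets: about 900 roads
  Three Hundred Tang Poems (唐詩三百首): about 3,200 lines
  Dream of the Red Chamber (紅樓夢): about 110,000 sentences
  The Six Codes (六法全書): about 300,000 sentences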
