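Speech Recognition. Roger Jang (張智星), jang@cs.nthu.edu.tw, http://www.cs.nthu.edu.tw/~jang. Multimedia Information Retrieval Laboratory, Department of Computer Science, National Tsing Hua University.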

Speech Processing
  Speech Recognition: converting speech into text, based on the input speech and on prior acoustic and textual analyses
  Speaker Recognition: verifying a person's identity or associating a person with a voice
  Speech Coding: digital coding (compression) of speech for efficient, secure storage and transmission
  Speech Synthesis: automatic generation of a speech signal from ordinary text input
  Speech Enhancement: processing a degraded speech signal to increase its intelligibility and/or quality

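Introduction to Speech Recognition
  Goal: recognize spoken words drawn from a specific, limited vocabulary
  Characteristics:
    High technical barrier: requires familiarity with digital signal processing, acoustic models, matching methods, language models, and related topics
    Collecting a speech corpus takes a great deal of manpower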

Speech Recognition: Technical Difficulties and Considerations
  Is the system required to recognize a specific individual or multiple speakers?
  What is the size of the vocabulary?
  Is the speech to be entered in discrete units with distinct pauses between them, or as a continuous utterance?
  What is the extent of ambiguity and acoustic confusability in the vocabulary?
  Is the system to be operated in a quiet or a noisy environment, and what is the nature of the environmental noise, if any?
  What linguistic constraints are placed on the speech, and what linguistic knowledge is built into the recognizer?

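Speech Recognition: Applications
  Song selection by voice
  Automatic voice-operated telephone switchboard
  Full-text retrieval systems with a speech interface
    Example: full-text retrieval and audio positioning for "Harry Potter"
  Lyrics retrieval systems
    Example: 「潮起又潮落」
  Any other application that can use speech as its interface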

Classification of Speech Recognition
  Vocabulary Size
    Small vocabulary: below 100 words
    Medium vocabulary: from 100 to 1000 words
    Large vocabulary: more than 1000 words
  Speaker Dependence
    Speaker-dependent
    Speaker-independent

Classification of Speech Recognition
  Speaking Style
    Isolated words
    Connected words
    Continuous speech
  Environment
    Clean speech
    Noisy speech
    Channel distorted
    Microphone mismatched

Directions in Speech Recognition
  1980s and 1990s Methodology
    Hidden Markov Models
    Neural Networks
  The Trends
    Large Vocabulary Continuous Speech Recognition
    Robust Speech Recognition
    Real-Time Speaker Adaptive Speech Recognition
    Language Modeling

Speech Recognition Flow
  1. Recording and feature extraction
  2. Matching
       Dynamic Time Warping
       Hidden Markov Model
  3. Display of the matching result

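Speech Recognition Schematic
  Isolated Word Problem Concept
  [Figure: an input utterance is matched against a vocabulary: 紐約 (New York), 台北 (Taipei), 台中 (Taichung), ..., 洛杉磯 (Los Angeles)]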

Feature Extraction for Speech Recognition
  MFCC: Mel-frequency Cepstral Coefficients
  [Block diagram. Blocks: speech signal, frame blocking, pre-emphasis, Hamming window, triangular filters, Mel log spectrum, discrete cosine transform, log energy, differentiator. Labeled outputs: 13-D and 39-D feature vector]
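The blocks named on this slide can be sketched in pure Python as follows. This is a minimal illustration under assumptions, not the lab's implementation: the power-spectrum step between the Hamming window and the triangular filters, the mel formula, and every parameter value (frame size 256, hop 128, pre-emphasis coefficient 0.95, 20 filters, 12 cepstral coefficients) are common textbook choices rather than values taken from the slides. The 39-D vector is assumed to be 13 static features plus their first and second differences.

```python
import cmath
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def frame_blocking(signal, size, hop):
    return [signal[s:s + size] for s in range(0, len(signal) - size + 1, hop)]

def pre_emphasis(frame, a=0.95):
    # y[n] = x[n] - a * x[n-1]; the first sample is kept unchanged
    return [frame[0]] + [frame[n] - a * frame[n - 1] for n in range(1, len(frame))]

def hamming(frame):
    n = len(frame)
    return [x * (0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)))
            for k, x in enumerate(frame)]

def power_spectrum(frame):
    # Naive DFT (assumed step); fine for a demo, use an FFT in practice
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))) ** 2
            for k in range(n // 2 + 1)]

def mel_filterbank(num_filters, n_fft, fs):
    # Triangular filters with edges equally spaced on the mel scale
    top = hz_to_mel(fs / 2.0)
    edges = [mel_to_hz(top * i / (num_filters + 1)) * n_fft / fs
             for i in range(num_filters + 2)]
    bank = []
    for m in range(1, num_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for k in range(n_fft // 2 + 1):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def mfcc_frame(frame, bank, num_ceps=12):
    spec = power_spectrum(hamming(pre_emphasis(frame)))
    log_mel = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
               for row in bank]
    m = len(log_mel)
    # DCT-II of the mel log spectrum, skipping the 0th coefficient
    ceps = [sum(log_mel[j] * math.cos(math.pi * i * (j + 0.5) / m)
                for j in range(m)) for i in range(1, num_ceps + 1)]
    log_energy = math.log(sum(x * x for x in frame) + 1e-10)
    return ceps + [log_energy]  # 13-D

def mfcc(signal, fs, size=256, hop=128, num_filters=20, num_ceps=12):
    bank = mel_filterbank(num_filters, size, fs)
    return [mfcc_frame(f, bank, num_ceps)
            for f in frame_blocking(signal, size, hop)]

def add_deltas(feats):
    # "Differentiator": central differences with clamped edges, applied twice
    def delta(seq):
        n = len(seq)
        return [[(seq[min(t + 1, n - 1)][d] - seq[max(t - 1, 0)][d]) / 2.0
                 for d in range(len(seq[0]))] for t in range(n)]
    d1 = delta(feats)
    d2 = delta(d1)
    return [f + a + b for f, a, b in zip(feats, d1, d2)]  # 39-D
```

Calling `add_deltas(mfcc(signal, fs))` on a list of samples returns one 39-element list per frame.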

Dynamic Time Warping
  Characteristics:
    Pattern-matching-based approach
    Requires less computation
    Difficult to achieve speaker independence
    Suitable for small to medium vocabularies
    Suitable for microprocessor/chip implementation
  Applications:
    Mobile phones, car phones, toys, voice recorders

Dynamic Time Warping (DTW)
  t: input MFCC matrix
  r: reference MFCC matrix
  DTW recurrence: [shown as a figure on the original slide; the i axis carries t(i-1), t(i) and the j axis carries r(j-1), r(j)]

DTW Paths of "Match Ends"
  We assume the speed of a user's acoustic input is between 1/2 and 2 times that of the intended sentence.
  Both ends are fixed. (End point detection is critical.)
  Suitable for voice command applications
  [Figure: allowed paths in the (i, j) plane]
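One way to realize fixed ends together with the 1/2 to 2 speed range is a dynamic-programming table whose local steps have slopes 1/2, 1, and 2. The sketch below assumes that step pattern and a Euclidean frame distance; neither is confirmed by the transcript, and the function name `dtw_match_ends` is hypothetical.

```python
import math

def dtw_match_ends(t, r):
    """DTW distance with both ends fixed.

    t, r: lists of feature vectors (frames), e.g. MFCC frames.
    Assumed local steps into (i, j): from (i-1, j-1), (i-1, j-2), (i-2, j-1),
    which keeps the overall speed ratio between 1/2 and 2.
    Returns float('inf') when no allowed path joins the two ends.
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    inf = float("inf")
    m, n = len(t), len(r)
    table = [[inf] * n for _ in range(m)]
    table[0][0] = dist(t[0], r[0])          # fixed start point
    for i in range(1, m):
        for j in range(1, n):
            best = table[i - 1][j - 1]
            if j >= 2:
                best = min(best, table[i - 1][j - 2])
            if i >= 2:
                best = min(best, table[i - 2][j - 1])
            if best < inf:
                table[i][j] = dist(t[i], r[j]) + best
    return table[m - 1][n - 1]              # fixed end point
```

Because the start and end cells are pinned, any leading or trailing silence left in by poor end point detection is forced into the alignment, which is why the slide calls end point detection critical.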

DTW Paths of "Match Anywhere"
  Both ends are free to move.
  Suitable for personal voice retrieval applications, such as voice recorders and personal voice documents
  [Figure: allowed paths in the (i, j) plane]
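A sketch of the free-ends variant follows. It assumes that the whole query t is matched against any contiguous stretch of a longer reference r, so the path may start and end at any reference frame; the slide does not spell out which sequence has the free ends, and the step pattern and distance are the same assumed ones as for fixed ends. The name `dtw_match_anywhere` is hypothetical.

```python
import math

def dtw_match_anywhere(t, r):
    """DTW with free start and end along the reference r.

    Returns (distance, end_index), where end_index is the frame of r
    at which the best-matching stretch ends.
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    inf = float("inf")
    m, n = len(t), len(r)
    table = [[inf] * n for _ in range(m)]
    # Free start: the first query frame may align with any reference frame
    table[0] = [dist(t[0], r[j]) for j in range(n)]
    for i in range(1, m):
        for j in range(1, n):
            best = table[i - 1][j - 1]
            if j >= 2:
                best = min(best, table[i - 1][j - 2])
            if i >= 2:
                best = min(best, table[i - 2][j - 1])
            if best < inf:
                table[i][j] = dist(t[i], r[j]) + best
    # Free end: take the best cell in the last query row
    end = min(range(n), key=lambda j: table[m - 1][j])
    return table[m - 1][end], end
```

For a retrieval application, the returned end index locates the matched passage inside the recording.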

Example DTW Path of “Match Ends”

DTW Demos
  Match-ends (asr/demoDTW.m)
  Match-anywhere (asr/demoVIR.m)

Hidden Markov Model
  Characteristics:
    Statistics-based approach
    Requires more computation
    Can achieve speaker independence
    Suitable for large vocabularies
    Difficult to implement on a microprocessor/chip
  Applications:
    Spoken full-text retrieval, dictation machines

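Example of HMM
  An example: recognizing the word "紐約" (New York)
  1. Segment the word and convert it to Chang Gung pinyin (長庚拼音)
       niou-Ye 紐約 0 (*.syl)
  2. Find the model for each syllable
       niou: sil+n n+i i+o o+u u+sil
       Ye: sil+Y Y+e e+sil
  3. Read the state information from macros
       Sil+*: 3 states; all others: 5 states
  [Figure: syllable models for niou and Ye]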
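Step 2 can be illustrated with a small helper that expands a syllable into the context-dependent unit names shown on the slide. The sketch assumes that every letter of the romanized syllable is one unit and that each unit is written as the unit plus its right neighbor, with `sil` padding both ends; this is read off the two examples above and is not a documented rule. `expand_syllable` is a hypothetical name.

```python
def expand_syllable(syllable):
    """Expand e.g. 'niou' into ['sil+n', 'n+i', 'i+o', 'o+u', 'u+sil']."""
    units = ["sil"] + list(syllable) + ["sil"]
    # Pair every unit with the unit that follows it
    return [left + "+" + right for left, right in zip(units, units[1:])]

def expand_word(pinyin):
    """Expand a hyphen-separated word such as 'niou-Ye' syllable by syllable."""
    return {syl: expand_syllable(syl) for syl in pinyin.split("-")}
```

With the slide's state counts, the model for `niou` would have 3 + 5 + 5 + 5 + 5 = 23 states if only the `sil+*` unit uses 3 states.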

[Figure: unit sequence for 你好 ("hello"): n+i i+h h+a a+u u+sil]

Viterbi Search in HMM
  Dynamic programming via table filling. For cell (i, j):
  hmmTable(i, j) = max{ hmmTable(i-1, j) + transitionProb(j, j), hmmTable(i-1, j-1) + transitionProb(j-1, j) } + stateProb(i, j)
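The table-filling recurrence can be sketched as below. Since the terms are added rather than multiplied, the sketch assumes all quantities are log probabilities. It also assumes that i indexes frames and j indexes states, that the path starts in the first state, and that it ends in the last state; these conventions are not stated on the slide, and the function name is hypothetical.

```python
def viterbi_left_to_right(state_logprob, trans_logprob):
    """Fill hmmTable for a left-to-right HMM and return (table, final score).

    state_logprob[i][j]: log probability of frame i under state j
    trans_logprob[a][b]: log probability of moving from state a to state b
    """
    ninf = float("-inf")
    n_frames = len(state_logprob)
    n_states = len(state_logprob[0])
    table = [[ninf] * n_states for _ in range(n_frames)]
    table[0][0] = state_logprob[0][0]            # assumed: start in state 0
    for i in range(1, n_frames):
        for j in range(n_states):
            stay = table[i - 1][j] + trans_logprob[j][j]
            move = ninf
            if j > 0:
                move = table[i - 1][j - 1] + trans_logprob[j - 1][j]
            table[i][j] = max(stay, move) + state_logprob[i][j]
    return table, table[-1][-1]                  # assumed: end in last state
```

Storing which of `stay` or `move` won in each cell would additionally allow the best state sequence to be traced back.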

EM in HMM
  Acoustic parameters for each state are identified via the Baum-Welch algorithm, which is a variant of EM (Expectation Maximization).
  To identify a suitable set of parameters, we need a balanced corpus of recordings from various people.

Speedup Mechanism in HMM
  Search strategies:
    Beam search in Viterbi search
    Tree lexicon instead of linear lexicon
  Implementation:
    Fixed-point instead of floating-point operations
    Many other tricks...
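Beam search is commonly implemented by discarding, after each frame, every state whose score falls too far below the best score of that frame. The helper below is a minimal sketch of that pruning step on one column of log scores; the name `prune_column` and the fixed `beam_width` threshold are illustrative assumptions, not details from the slides.

```python
def prune_column(scores, beam_width):
    """Keep scores within beam_width of the best one; deactivate the rest.

    scores: log-probability scores of all states for one frame.
    Pruned states are set to -inf so later frames never extend them.
    """
    best = max(scores)
    ninf = float("-inf")
    return [s if s >= best - beam_width else ninf for s in scores]
```

A narrower beam saves more computation but raises the risk of pruning the path that would have won in the end.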

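What Is a Linear Net
  Each name is annotated with its pinyin:
    陳惠操  CrN-huei-cau
    陳建智  CrN-jieN-Jy
    陳雅姿  CrN-ia-jy
    陳雅秀  CrN-ia-siou
    陳雅玲  CrN-ia-liG
    蔡茂豐  cai-mau-fG
    孫愛玲  suN-ai-liG
  [Figure: the entries laid out side by side between a Sil node at each end]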

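What Is a Tree Net
  Names: 陳惠操 陳建智 陳雅姿 陳雅秀 陳雅玲 蔡茂豐 孫愛玲
  After pinyin annotation, shared leading syllables are stored only once:
    CrN
      huei-cau
      jieN-Jy
      ia
        jy
        siou
        liG
    cai-mau-fG
    suN-ai-liG
  [Figure: the tree placed between a Sil node at each end]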

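How to Convert from Linear to Tree
  Original order: 陳惠操 蔡茂豐 陳雅秀 陳雅玲 孫愛玲 陳建智 陳雅姿
  1. Sort the entries column by column: 陳惠操 陳建智 陳雅姿 陳雅秀 陳雅玲 蔡茂豐
  2. Remove the duplicated entries in each column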

Tree Net Structure
  陳
    惠 操
    建 智
    雅
      姿
      秀
      玲
  蔡 茂 豐
  孫 愛 玲
  [Figure labels: Sil and !N nodes, where !N = !NULL]
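The sort-and-merge construction amounts to building a prefix tree over the syllable sequences. The sketch below works at the syllable level with nested dictionaries; the function names are hypothetical, and a real decoder would presumably attach HMM states and the Sil and !NULL nodes, which are omitted here.

```python
def build_tree_net(entries):
    """Build a prefix tree from hyphen-separated syllable strings."""
    root = {}
    for entry in sorted(entries):        # sorting mirrors the slide's first step
        node = root
        for syllable in entry.split("-"):
            # Reuse an existing branch, so shared prefixes are stored once
            node = node.setdefault(syllable, {})
    return root

def count_nodes(tree):
    """Number of syllable nodes in the tree."""
    return sum(1 + count_nodes(child) for child in tree.values())

names = [
    "CrN-huei-cau", "CrN-jieN-Jy", "CrN-ia-jy", "CrN-ia-siou",
    "CrN-ia-liG", "cai-mau-fG", "suN-ai-liG",
]
tree = build_tree_net(names)
linear_size = sum(len(n.split("-")) for n in names)
print(linear_size, count_nodes(tree))    # prints: 21 15
```

For these seven names the linear net needs 21 syllable nodes while the tree needs 15, which is the saving the tree lexicon exploits during search.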

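Problems Encountered in Phonetic Annotation
  Polyphonic characters (破音字)
    Look them up in a lexicon that is already phonetically annotated (about 90,000 words)
  Examples where problems may occur:
    我們三人參加會議 ("The three of us attend the meeting")
    朝辭白帝彩雲間 ("At dawn I left Baidi amid rosy clouds")
    朝如青絲暮成雪 ("Black silk in the morning, snow by evening")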

Robust Speech Recognition
  [Block diagram of speech feature extraction]
  Cepstral Mean Subtraction
  Signal Bias Removal
  Stochastic Matching
  Spectral Subtraction
  Noise Masking
  Temporal filter (i.e., delta filter)
  Missing feature method
  Improved filter shapes for computing the feature parameters
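As an illustration of the first technique in this list, cepstral mean subtraction removes the per-utterance average from every cepstral dimension, which cancels a constant channel effect that shows up as an additive offset in the cepstral domain. The sketch below assumes the features are given as a list of equal-length frames; the function name is hypothetical.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the utterance-level mean from each cepstral dimension.

    frames: list of feature vectors (one list of floats per frame).
    """
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(frame[d] for frame in frames) / n for d in range(dim)]
    return [[frame[d] - mean[d] for d in range(dim)] for frame in frames]
```

After the subtraction every dimension has zero mean over the utterance, so two recordings of the same speech through different fixed channels yield closer features.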

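HMM Demos
  Spoken full-text retrieval system
    Personal names: about 60 entries
    Taipei City streets: about 900 roads
    Three Hundred Tang Poems (唐詩三百首): about 3,200 lines
    Dream of the Red Chamber (紅樓夢): about 110,000 sentences
    Complete Book of Six Codes (六法全書): about 300,000 sentences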