Introduction to Digital Speech Processing (數位語音處理概論), Lecture 1

Presentation transcript:

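Digital Speech Processing (數位語音處理概論), Lecture 1: Introduction to Digital Speech Processing. 2018/9/13. Instructor: Prof. Lin-Shan Lee (李琳山), Department of Electrical Engineering, National Taiwan University. [Unless otherwise noted, this work is released under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Taiwan license.]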

Speech Signal Processing
Major Application Areas
- Speech Coding: Digitization and Compression. Considerations: 1) bit rate (bps), 2) recovered quality, 3) computational complexity/feasibility
- Voice-based Network Access: User Interface, Content Analysis, User-content Interaction
Speech Signals
- Carry Linguistic Knowledge and Human Information: Characters, Words, Phrases, Sentences, Concepts, etc.
- Double Levels of Information: Acoustic Signal Level / Symbolic or Linguistic Level
- Processing and Interaction of the Double-level Information
[Diagram: x(t) → LPF → x[n] → Processing Algorithms → output]
[Diagram: x[n] → Processing → x_k (110101…) → Storage/transmission → Inverse Processing → x̂[n]]
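
The coding chain on this slide (x[n] → bits → x̂[n]) and the bit-rate consideration can be illustrated with a minimal sketch. It assumes plain uniform quantization of samples in [-1, 1], which covers digitization only; practical speech coders compress much further. The function names and the 8 kHz / 8-bit figures are illustrative assumptions, not taken from the slides.

```python
def bit_rate_bps(sample_rate_hz, bits_per_sample):
    """Bit rate of uncompressed samples: samples per second times bits per sample."""
    return sample_rate_hz * bits_per_sample


def quantize(x, bits):
    """'Processing': map a sample x in [-1, 1] to an integer code in [0, 2**bits - 1]."""
    levels = 2 ** bits
    clipped = max(-1.0, min(1.0, x))
    # Scale to [0, levels - 1] and round to the nearest integer code.
    return int((clipped + 1.0) / 2.0 * (levels - 1) + 0.5)


def dequantize(code, bits):
    """'Inverse Processing': map an integer code back to a value in [-1, 1]."""
    levels = 2 ** bits
    return code / (levels - 1) * 2.0 - 1.0


if __name__ == "__main__":
    print("bit rate:", bit_rate_bps(8000, 8), "bps")
    sample = 0.3
    code = quantize(sample, 8)
    print("code bits:", format(code, "08b"))
    print("recovered:", round(dequantize(code, 8), 4))
```

The gap between `sample` and the recovered value is the quantization error, which is one simple view of the "recovered quality" consideration: fewer bits lower the bit rate but enlarge that error.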

Sampling of Signals
[Figure: continuous-time signal x(t) plotted against t, and its sampled sequence x[n] plotted against n]
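
As a rough illustration of sampling, the sketch below evaluates a continuous-time signal at evenly spaced instants, x[n] = x(n / fs). The 440 Hz tone standing in for x(t), the 16 kHz sampling rate, and all function names are assumptions chosen for the example.

```python
import math


def x_continuous(t, freq_hz=440.0):
    """Hypothetical stand-in for the analog signal x(t): a pure tone."""
    return math.sin(2.0 * math.pi * freq_hz * t)


def sample(signal, sample_rate_hz, num_samples):
    """Return the sequence x[n] = x(n / fs) for n = 0 .. num_samples - 1."""
    return [signal(n / sample_rate_hz) for n in range(num_samples)]


x_n = sample(x_continuous, sample_rate_hz=16000.0, num_samples=8)

if __name__ == "__main__":
    for n, value in enumerate(x_n):
        print(f"x[{n}] = {value:+.4f}")
```

In a real front end the low-pass filter (LPF) precedes this step so that content above half the sampling rate does not alias; the sketch omits it because the test tone is far below that limit.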

Double Levels of Information
- Character (字), Word (詞), Sentence (句)
- Example: the sentence 人人用電腦 ("everyone uses computers") and the word 電腦 ("computer")

Speech Signal Processing: Processing of Double-Level Information
[Diagram: Speech Signal → Sampling → Processing Algorithm (on Chips or Computers)]
- Characters: 今 天 的 天 氣 非 常 好 ("The weather is very nice today")
- Linguistic Structure: 今天 / 的 / 天氣 / 非常 / 好, with 今天的 grouped as a phrase
- Linguistic Knowledge: Lexicon, Grammar

Well-Known Application Examples of Speech and Language Technologies: Speaking Personal Assistant
Example requests
- Weather in New York next week?
- Who is the president of the US? What did he say today?
- How can I go to National Taiwan University?
- Short messaging, personal scheduling, etc.
- Special questions: Tang and Song poetry (唐詩宋詞), the Chu Shi Biao (出師表), "tell me a joke" (說個笑話)…
[Diagram components: Input Speech, Speech Recognition, Speech Understanding, Dialogue Manager, Language Generation, Speech Synthesis, Output Speech Signals, Information Retrieval, Question Answering, Machine Translation, Knowledge Graph, Wikipedia]
Examples: Siri (Apple), Google Now (Google), Cortana (Microsoft)

Voice-based Network Access
[Diagram: Internet, with User Interface, Content Analysis, and User-Content Interaction]
- User Interface: when keyboards/mice are inadequate
- Content Analysis: helps in browsing/retrieval of multimedia content
- User-Content Interaction: all text-based interaction can be accomplished by spoken language

User Interface: Wireless Communications Technologies have Created a Whole Variety of User Terminals
[Diagram: user terminals accessing Text Content and Multimedia Content on the Internet]
- Access at Any Time, from Anywhere
- Smart phones, Hand-held Devices, Notebooks, Vehicular Electronics, Hands-free Interfaces, Home Appliances, Wearable Devices…
- Small in Size, Light in Weight, Ubiquitous, Invisible…
- Post-PC Era: Keyboard/Mouse, Most Convenient for PCs, are no longer Convenient, because human fingers never shrink and the application environment has changed
- Service Requirements Growing Exponentially
- Voice is the Only Interface Convenient for ALL User Terminals at Any Time, from Anywhere, and to the point in one utterance
- Speech Processing is the only less mature part in the Technology Chain

Content Analysis: Multimedia Technologies have Created a World of Multimedia Content
Network content on the Internet
- Real-time Information: weather, traffic, flight schedules, stock prices, sports scores
- Special Services: Google, Facebook, YouTube, Amazon
- Knowledge Archives: digital libraries, virtual museums
- Private Services: personal notebooks, business databases, home appliances, network entertainment
- Intelligent Working Environment: e-mail processors, intelligent agents, teleconferencing, distance learning, electronic commerce
Key points
- The Most Attractive Form of Network Content is Multimedia, which usually Includes Speech Information (but Probably not Text)
- Multimedia Content is Difficult to Summarize and Show on the Screen, and thus Difficult to Browse
- The Speech Information, if Included, usually Tells the Subjects, Topics and Concepts of the Multimedia Content, and thus Becomes the Key for Browsing and Retrieval
- Multimedia Content Analysis based on Speech Information

User-Content Interaction: Wireless and Multimedia Technologies are Creating an Era of Network Access by Spoken Language Processing
[Diagram: hand-held devices with voice input/output reach Text Content and Multimedia Content over the Internet through Text-to-Speech Synthesis, Spoken and Multi-modal Dialogue, Voice-based Information Retrieval, Text Information Retrieval, and Multimedia Content Analysis; the content carries text information and voice information]
- Hand-held Devices with Multimedia Functionalities are Commonly Used Today
- Network Access is Primarily Text-based today, but almost all Roles of Text can be Accomplished by Speech
- User-Content Interaction can be Accomplished by Spoken and Multi-modal Dialogues
- Using Speech Instructions to Access Multimedia Content whose Key Concepts are Specified by Speech Information

Voice-based Information Retrieval
[Diagram: Voice Instructions (e.g. 請問鼎泰豐的地址? "What is the address of Din Tai Fung?") or Text Instructions are matched against documents d1, d2, d3 that hold Text Information or Voice Information (e.g. 鼎泰豐台北101分店在… "The Din Tai Fung Taipei 101 branch is at…")]
- Both the User Instructions and the Network Content Can be in the form of Speech

Spoken and Multi-modal Dialogues
- Almost All User-Content Interaction can be Accomplished by Spoken or Multi-modal Dialogues
[Diagram: Users send Input Speech over Wireless Networks to a Dialogue Server. Speech Recognition and Understanding extracts the User's Intention; the Dialogue Manager, using the Discourse Context and Databases reached through the Internet, produces a Response to the user; Sentence Generation and Speech Synthesis turns it into Output Speech]

Outline
- Both Theoretical Issues and Practical Problems will be Discussed
- Starting with Fundamentals, but Entering Research Topics in the Second Half
Part I: Fundamental Topics
1.0 Introduction to Digital Speech Processing
2.0 Fundamentals of Speech Recognition
3.0 Map of Subject Areas
4.0 More about Hidden Markov Models
5.0 Acoustic Modeling
6.0 Language Modeling
7.0 Speech Signals and Front-end Processing
8.0 Search Algorithms for Speech Recognition
Part II: Advanced Topics
9.0 Speech Recognition Updates
10.0 Speech-based Information Retrieval
11.0 Spoken Document Understanding and Organization for User-content Interaction
12.0 Computer-assisted Language Learning (CALL)
13.0 Speaker Variabilities: Adaptation and Recognition
14.0 Latent Topic Analysis
15.0 Robustness for Acoustic Environment
16.0 Some Fundamental Problem-solving Approaches
17.0 Spoken Dialogues
18.0 Conclusion

References
Textbook: none
Main reference books:
- X. Huang, A. Acero, H. Hon, "Spoken Language Processing", Prentice Hall, 2001 (local distributor: 松瑞)
- F. Jelinek, "Statistical Methods for Speech Recognition", MIT Press, 1999
- L. Rabiner, B.H. Juang, "Fundamentals of Speech Recognition", Prentice Hall, 1993 (local distributor: 民全)
- C. Becchetti, L. Prina Ricotti, "Speech Recognition: Theory and C++ Implementation", John Wiley and Sons, 1999 (local distributor: 民全)
- D. Jurafsky, J. Martin, "Speech and Language Processing: An Introduction to Natural Language Processing, Speech Recognition, and Computational Linguistics", 2nd edition, Prentice-Hall, 2009
- G. Tur, R. De Mori, "Spoken Language Understanding: Systems for Extracting Semantic Information from Speech", John Wiley & Sons, 2011
Other references will be provided in class.

Other Information
- Course materials: available on the web before the day of class (http://speech.ee.ntu.edu.tw)
- Intended level: third- and fourth-year students (Electrical Engineering; Computer Science and Information Engineering)
- Grading: Midterm Exam 25%; Homework (I), (II), (III) 15%, 5%, 15%; Final Exam 10%; Term Project 30%

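Goals
Course objectives: to give students the basic knowledge needed to enter this new field full of opportunities and challenges; to experience how mathematical models and software programs complement each other; to learn the path from fundamentals to research when entering a new field; and to gain experience in absorbing unstructured knowledge.
[Diagram labels: Unstructured Knowledge (A, B, C, D); Math & Programming; Mathematical Models, Programming, Hardware]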

1.0 Introduction: A Brief Summary of Core Technologies and Example Application Scenarios
References for 1.0
1. "Speech and Language Processing over the Web", IEEE Signal Processing Magazine, May 2008

Speech Recognition as a Pattern Recognition Problem
[Diagram: unknown speech signal x(t) → Feature Extraction → feature vector sequence X → Pattern Matching → Decision Making → output word W. Training speech y(t) → Feature Extraction → Y → Reference Patterns, which are used by Pattern Matching]
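
The three stages in this diagram can be sketched as a toy template matcher. This is only an illustration under strong assumptions: the "feature" is per-frame average energy, matching is a Euclidean distance between equal-length feature sequences, and the decision is the nearest reference pattern. The signals, labels, and function names are all hypothetical.

```python
import math


def extract_features(signal, frame_len=4):
    """Toy feature extraction: average energy of each non-overlapping frame."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    return [sum(s * s for s in frame) / frame_len for frame in frames]


def distance(a, b):
    """Euclidean distance between two feature sequences of equal length."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def recognize(unknown_signal, reference_patterns):
    """Pattern matching plus decision making: return the closest reference label."""
    features = extract_features(unknown_signal)
    return min(reference_patterns,
               key=lambda word: distance(features, reference_patterns[word]))


# "Training speech" y(t): one made-up example per word, turned into reference patterns Y.
training_speech = {
    "loud-soft": [1.0] * 4 + [0.1] * 4,
    "soft-loud": [0.1] * 4 + [1.0] * 4,
}
reference_patterns = {word: extract_features(y) for word, y in training_speech.items()}

# Unknown speech x(t): resembles the "loud-soft" pattern.
unknown = [0.9] * 4 + [0.2] * 4

if __name__ == "__main__":
    print(recognize(unknown, reference_patterns))
```

Real systems replace each stage with something far richer (spectral features, statistical models, search over word sequences), but the flow from x(t) to X to W is the same.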

Basic Approach for Large Vocabulary Speech Recognition
A Simplified Block Diagram
[Diagram: Input Speech → Front-end Signal Processing → Feature Vectors → Linguistic Decoding and Search Algorithm → Output Sentence. The decoder uses the Acoustic Models (from Acoustic Model Training on Speech Corpora), the Lexicon, and the Language Model (from Language Model Construction on Text Corpora)]
Example
- Input Sentence: this is speech
- Acoustic Models (聲學模型): (th-ih-s-ih-z-s-p-iy-ch)
- Lexicon: (th-ih-s) → this, (ih-z) → is, (s-p-iy-ch) → speech
- Language Model (語言模型): (this) - (is) - (speech), scored as P(this) P(is | this) P(speech | this is)
- P(w_i | w_{i-1}): bi-gram language model
- P(w_i | w_{i-1}, w_{i-2}): tri-gram language model, etc.
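
The bi-gram language model on this slide can be made concrete with a minimal sketch. It assumes maximum-likelihood estimates from raw counts with no smoothing, a made-up three-sentence corpus, and sentence-boundary tokens `<s>` and `</s>` (so P(this | <s>) plays the role of P(this)); none of these details come from the slides.

```python
from collections import Counter

# Hypothetical toy text corpus, already split into words.
corpus = [
    ["this", "is", "speech"],
    ["this", "is", "text"],
    ["that", "is", "speech"],
]

history_counts = Counter()
bigram_counts = Counter()
for sentence in corpus:
    tokens = ["<s>"] + sentence + ["</s>"]
    history_counts.update(tokens[:-1])             # every token that has a successor
    bigram_counts.update(zip(tokens, tokens[1:]))  # adjacent word pairs


def bigram_prob(word, prev):
    """Maximum-likelihood estimate of P(word | prev); 0.0 for an unseen history."""
    if history_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / history_counts[prev]


def sentence_prob(sentence):
    """Product of bi-gram probabilities over the sentence, including boundaries."""
    tokens = ["<s>"] + sentence + ["</s>"]
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        prob *= bigram_prob(word, prev)
    return prob


if __name__ == "__main__":
    print("P(is | this) =", bigram_prob("is", "this"))
    print("P(this is speech) =", sentence_prob(["this", "is", "speech"]))
```

Without smoothing, any sentence containing an unseen word pair gets probability zero, which is one reason practical language models are not built from raw counts alone.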

Speech Recognition Technologies, Applications and Problems
- Word Recognition: voice commands/instructions
- Keyword Spotting: identifying keywords from a pre-defined keyword set in input voice utterances
- Large Vocabulary Continuous Speech Recognition: entering longer texts; remote dictation/automatic transcription
- Speaker Dependent/Independent/Adaptive
- Acoustic Reception/Background Noise/Channel Distortion
- Read/Spontaneous/Conversational Speech

Text-to-speech Synthesis
- Transforming any input text into corresponding speech signals
- E-mail/Web page reading
- Prosodic modeling
- Basic voice units/rule-based, non-uniform units/corpus-based, model-based
[Diagram: Input Text → Text Analysis and Letter-to-sound Conversion (Lexicon and Rules) → Prosody Generation (Prosodic Model) → Signal Processing and Concatenation (Voice Unit Database) → Output Speech Signal]

Speech Understanding
- Understanding the Speaker's Intention rather than Transcribing into Word Strings
- Limited Domains/Finite Tasks
[Diagram: input utterance → Syllable Recognition (acoustic models) → syllable lattice → Key Phrase Matching (phrase lexicon) → phrase graph → Semantic Decoding (phrase/concept language model) → concept graph, concept set → understanding results]
An Example
- utterance: 請幫我查一下 台灣銀行 的 電話號碼 是幾號? ("Please look up the phone number of the Bank of Taiwan for me.")
- key phrases: (查一下) - (台灣銀行) - (電話號碼)
- concepts: (inquiry) - (target) - (phone number)
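
The example on this slide can be mimicked with a small sketch of key phrase matching. It assumes the utterance is already available as text (skipping syllable recognition and the lattice and graph stages entirely) and uses a greedy longest-match scan against a hypothetical three-entry phrase lexicon. The phrase-to-concept pairs follow the slide; everything else is an illustrative assumption.

```python
# Hypothetical phrase lexicon: key phrase -> concept label (pairs taken from the slide).
PHRASE_LEXICON = {
    "查一下": "inquiry",
    "台灣銀行": "target",
    "電話號碼": "phone number",
}


def match_key_phrases(utterance, lexicon):
    """Scan left to right, preferring the longest lexicon phrase at each position."""
    phrases = sorted(lexicon, key=len, reverse=True)
    matches = []
    i = 0
    while i < len(utterance):
        for phrase in phrases:
            if utterance.startswith(phrase, i):
                matches.append((phrase, lexicon[phrase]))
                i += len(phrase)
                break
        else:
            i += 1  # no key phrase starts here; skip this filler character
    return matches


utterance = "請幫我查一下台灣銀行的電話號碼是幾號?"
matches = match_key_phrases(utterance, PHRASE_LEXICON)

if __name__ == "__main__":
    for phrase, concept in matches:
        print(phrase, "->", concept)
```

Filler such as 請幫我 and 是幾號 is simply skipped, which mirrors the point that understanding targets the speaker's intention rather than a full word-by-word transcription.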

Speaker Verification
- Verifying the speaker as claimed
- Applications requiring verification
- Text dependent/independent
- Integrated with other verification schemes
[Diagram: input speech → Feature Extraction → Verification (using Speaker Models) → yes/no]
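
The yes/no decision in this diagram can be sketched as a threshold test. This is a deliberately simplified illustration: each speaker model is assumed to be a single mean feature vector, the score is a Euclidean distance, and the threshold value is arbitrary. The speaker names, vectors, and function name are hypothetical, and real systems use statistical speaker models rather than one vector per speaker.

```python
import math

# Hypothetical speaker models: one mean feature vector per enrolled speaker.
SPEAKER_MODELS = {
    "alice": [1.0, 2.0, 3.0],
    "bob": [3.0, 1.0, 0.5],
}


def verify(features, claimed_speaker, models, threshold=1.0):
    """Accept the identity claim if the features lie close enough to the claimed model."""
    model = models[claimed_speaker]
    dist = math.sqrt(sum((f - m) ** 2 for f, m in zip(features, model)))
    return dist <= threshold


# Features assumed to come from the Feature Extraction stage.
observed = [1.1, 2.1, 2.9]

if __name__ == "__main__":
    print("claim 'alice':", "yes" if verify(observed, "alice", SPEAKER_MODELS) else "no")
    print("claim 'bob':", "yes" if verify(observed, "bob", SPEAKER_MODELS) else "no")
```

Unlike identification, the system never searches over all speakers: it only checks the features against the one model named in the claim, and the threshold trades false acceptances against false rejections.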

Voice-based Information Retrieval
- Speech Instructions
- Speech Documents (or Multi-media Documents including Speech Information)
[Diagram: a speech instruction (e.g. 請問鼎泰豐的地址? "What is the address of Din Tai Fung?") or a text instruction is matched against text documents and speech documents d1, d2, d3 (e.g. 鼎泰豐台北101分店在… "The Din Tai Fung Taipei 101 branch is at…")]
- Locate exactly the desired utterances
- Text descriptions not needed for indexing/retrieving purposes

Spoken Dialogue Systems
- Almost all human-network interactions can be accomplished by spoken dialogue
- Speech understanding, speech synthesis, dialogue management
- System/user/mixed initiatives
- Reliability/efficiency, dialogue modeling/flow control
- Transaction success rate/average number of dialogue turns
[Diagram: Users send Input Speech over Networks to a Dialogue Server. Speech Recognition and Understanding extracts the User's Intention; the Dialogue Manager, using the Discourse Context and Databases reached through the Internet, produces a Response to the user; Sentence Generation and Speech Synthesis turns it into Output Speech]
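
The two evaluation measures named on this slide, transaction success rate and average number of dialogue turns, can be computed as in the sketch below. The log format (one record per dialogue with a turn count and a success flag) and the sample values are assumptions made for illustration.

```python
# Hypothetical dialogue logs: one record per completed dialogue session.
dialogue_logs = [
    {"turns": 4, "success": True},
    {"turns": 7, "success": False},
    {"turns": 5, "success": True},
    {"turns": 4, "success": True},
]


def transaction_success_rate(logs):
    """Fraction of dialogues in which the user's task was completed."""
    return sum(1 for d in logs if d["success"]) / len(logs)


def average_dialogue_turns(logs):
    """Mean number of turns per dialogue."""
    return sum(d["turns"] for d in logs) / len(logs)


if __name__ == "__main__":
    print("transaction success rate:", transaction_success_rate(dialogue_logs))
    print("average dialogue turns:", average_dialogue_turns(dialogue_logs))
```

Read together, the two numbers reflect the reliability/efficiency pairing on the slide: a system can raise its success rate by confirming every item, but only at the cost of more turns.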

Spoken Document Understanding and Organization
- Unlike Written Documents, which are easily shown on the screen for the user to browse and select, Spoken Documents are just Audio Signals. The user cannot listen to each one from beginning to end while browsing, so better approaches for understanding and organizing spoken documents become necessary.
- Spoken Document Segmentation: automatically segmenting a spoken document into short paragraphs, each with a central topic
- Spoken Document Summarization: automatically generating a summary (in text or speech form) for each short paragraph
- Title Generation for Spoken Documents: automatically generating a title (in text or speech form) for each short paragraph
- Key Term Extraction and Key Term Graph Construction for Spoken Documents: automatically extracting a set of key terms for each spoken document, and constructing key term graphs for a collection of spoken documents
- Semantic Structuring of Spoken Documents: organizing the semantic structure of spoken documents into graphical hierarchies

Multi-lingual Functionalities
- Code-Switching Problem
  - English words/phrases inserted in spoken Chinese sentences, for example: 人人都用Computers,家家都上Internet ("everyone uses computers, every household is on the Internet"); OK不OK?OK啦! ("OK or not? OK!")
  - the whole sentence switched from Chinese to English, for example: 準備好了嗎?Let's go! ("Are you ready? Let's go!")
- Cross-language Information Processing
  - globalized network with multi-lingual content/users
  - cross-language network information processing with a certain input language
- Dialects/Accents
  - hundreds of Chinese dialects, as an example
  - code-switching problem: Chinese dialects mixed with Mandarin (or plus English), as an example
  - Mandarin with a variety of strong accents, as an example
- Global/Local Languages
- Language Dependent/Independent Technologies
- Code-Switching Speech Processing, Speech-to-speech Translation, Computer-assisted Language Learning
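
To show what a code-switched sentence looks like to a program, the sketch below splits the slide's example into Chinese and English runs. It works on a text transcript only, using Unicode ranges as a rough script test, whereas the problem described on the slide concerns the speech signal itself. The function names and the choice of the CJK Unified Ideographs block (U+4E00 to U+9FFF) are assumptions for this illustration.

```python
from itertools import groupby


def script_of(ch):
    """Rough per-character script test: 'zh', 'en', or 'other' (punctuation, digits, spaces)."""
    if "\u4e00" <= ch <= "\u9fff":
        return "zh"
    if ch.isascii() and ch.isalpha():
        return "en"
    return "other"


def segment_by_language(text):
    """Group consecutive characters of the same script, dropping 'other' runs."""
    return [(script, "".join(chars))
            for script, chars in groupby(text, key=script_of)
            if script != "other"]


segments = segment_by_language("人人都用Computers,家家都上Internet")

if __name__ == "__main__":
    for script, run in segments:
        print(script, run)
```

Each boundary between a "zh" run and an "en" run is a switch point, where a recognizer would need acoustic models, a lexicon, and a language model that cover both languages.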

Computer-Assisted Language Learning
- Globalized World: everyone needs to learn one or more languages in addition to the native language
- Language Learning: one-to-one tutoring is most effective, but at high cost
- Computers are not as good as Human Tutors, but software is reproduced easily, can be used repeatedly at any time and anywhere, and never gets tired or bored
- Learning of pronunciation, vocabulary, grammar, sentences, dialogues, etc., sometimes in the form of games

Copyright notice: This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Taiwan license. Author / source: Prof. Lin-Shan Lee, Department of Electrical Engineering, National Taiwan University.
