Lecture 3: Overview of Speech Synthesis. Background, goals, basic problems, technical history, typical systems.


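Background
The spread of computers; human-computer interaction in natural language; the role of speech synthesis in human-computer interaction systems; other applications of speech synthesis.
Diagram (spoken dialog system): Speech In -> Speech Recognition -> Natural Language Understanding -> Dialog Manager (backed by an Information Database) -> Natural Language Generation -> Speech Synthesis -> Speech Out.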

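Goals: "make the computer speak like a human"
TTS (Text-To-Speech), from text to speech: the current stage.
CTS (Concept-To-Speech), from concept to speech: forward-looking.
ITS (Intention-To-Speech), from intention to speech: waiting for the right time.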

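Basic problems
Cause: information is lost when speech is written down as text.
From text to speech (TTS) therefore takes two steps:
1. From text to a pronunciation description: what sounds to produce, and how to produce them.
2. From the pronunciation description to synthesized speech.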

Technical history
1937: Voder, Bell Labs, H. Dudley
1962: cascade formant synthesizer, KTH, G. Fant
1970s: hybrid formant synthesizer, MIT, D. Klatt
1986: PSOLA, F. Charpentier
2000s: unit selection, N. Campbell & A. Black
Chart (quality over time, 1970s to 2000s): formant synthesis, then PSOLA, then unit selection (segment-oriented, then prosody-oriented). Quality scale: bad (unacceptable), fair (acceptable), excellent (human-like).
Focus along the timeline: timbre of isolated segments; timbre of isolated words; timbre and prosody of sentences; prosody of sentences.

Typical system: structure of a unit-selection TTS system (prosody-oriented)
Two modules:
Frontend: text processing, from text to a pronunciation description.
Backend: speech processing, from the pronunciation description to synthesized speech.
One interface: the pronunciation description.
Database: the synthesis units.
Diagram: Input text -> Text Normalization -> Parser -> Prosodic Event Predictor -> Phonetizer -> Interface (prosodic & phonemic context) -> Prosodic Acoustics Predictor -> Segment Acoustics Predictor -> Unit Selection -> Speech Synthesizer -> Output speech.
Resources: Dictionary (lexicon, rules, homographs); Corpora (speech, phonetic alignment, prosodic parameters).

Typical system, example: 北京交通大学成立于1896年。 ("Beijing Jiaotong University was founded in 1896.")
Frontend, Text Normalization: 北京交通大学成立于1896年
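A minimal sketch of one normalization rule, assuming years are read digit by digit, as the later form 一八九六年 in this example suggests. The function name `normalize_years` and the choice of 零 for the digit 0 are assumptions for illustration; the slides do not describe the actual rule set.

```python
import re

# Digit-to-character table for reading a number digit by digit (0 as 零 is an assumption).
DIGITS = dict(zip("0123456789", "零一二三四五六七八九"))

def normalize_years(text: str) -> str:
    """Rewrite a four-digit year followed by 年 as digit-by-digit Chinese numerals."""
    def spell(match: re.Match) -> str:
        return "".join(DIGITS[d] for d in match.group(1)) + "年"
    # [0-9] rather than \d, so that only ASCII digits present in DIGITS are matched.
    return re.sub(r"([0-9]{4})年", spell, text)

print(normalize_years("北京交通大学成立于1896年"))  # 北京交通大学成立于一八九六年
```

A real normalizer would also need rules for quantities, dates, phone numbers, and symbols, where digits are read differently.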

Typical system, example (continued)
Parser, with POS (part-of-speech) tags: 北京(npr) 交通(ng) 大学(ng) 成立(vgo) 于(pg) 1896年(t)
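The slides do not say how the parser segments and tags the text. As a stand-in, the sketch below uses forward maximum matching against a toy lexicon whose entries and tags are copied from the example; `LEXICON`, `segment_and_tag`, and the `unk` tag are hypothetical names.

```python
# Toy lexicon: word -> POS tag, copied from the example on the slide.
LEXICON = {
    "北京": "npr", "交通": "ng", "大学": "ng",
    "成立": "vgo", "于": "pg", "1896年": "t",
}
MAX_LEN = max(len(word) for word in LEXICON)

def segment_and_tag(text):
    """Forward maximum matching: at each position take the longest lexicon word."""
    result, i = [], 0
    while i < len(text):
        for size in range(min(MAX_LEN, len(text) - i), 0, -1):
            word = text[i:i + size]
            if word in LEXICON:
                result.append((word, LEXICON[word]))
                break
        else:
            # No lexicon word starts here: emit one character with a placeholder tag.
            word = text[i]
            result.append((word, "unk"))
        i += len(word)
    return result

print(segment_and_tag("北京交通大学成立于1896年"))
```

Greedy matching cannot resolve true segmentation ambiguity; it only illustrates the input and output of this stage.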

Typical system, example (continued)
Prosodic Event Predictor output:
PWord layer: 北京 ng 交通 ng 大学 ng 成立于 vg_pg 一八九六年 t
PPhrase layer: ## 北京交通大学 ## 成立于 ## 一八九六年
IPhrase layer: ## 北京交通大学成立于一八九六年
Sentence layer: ## 北京交通大学成立于一八九六年

Typical system, example (continued)
Phonetizer output:
北 bei3 (BL: 北京), 京 jing1 (BL: 北京)
交 jiao1 (BL: 交通), 通 tong1 (BL: 交通)
大 da4 (BL: 大学), 学 xue2 (BL: 大学)
成 cheng2 (BL: 成立), 立 li4 (BL: 成立)
于 yu2 (BL: 于)
一 yi1, 八 ba1, 九 jiu3, 六 liu4, 年 nian2 (BL: 一八九六年)
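A minimal sketch of the phonetizer as a word-level lexicon lookup, using only the pronunciations listed in the example. Reading the BL field as "the word this syllable belongs to" is an assumption, and `WORD_PINYIN` and `phonetize` are hypothetical names. Looking up whole words rather than single characters is what lets the surrounding word pick the reading of a character with several pronunciations.

```python
# Toy pronunciation lexicon: word -> one pinyin syllable (with tone digit) per character.
WORD_PINYIN = {
    "北京": ["bei3", "jing1"],
    "交通": ["jiao1", "tong1"],
    "大学": ["da4", "xue2"],
    "成立": ["cheng2", "li4"],
    "于": ["yu2"],
    "一八九六年": ["yi1", "ba1", "jiu3", "liu4", "nian2"],
}

def phonetize(words):
    """Return (character, pinyin, containing word) for every character."""
    result = []
    for word in words:
        syllables = WORD_PINYIN[word]  # raises KeyError outside the toy lexicon
        for char, pinyin in zip(word, syllables):
            result.append((char, pinyin, word))
    return result

for char, pinyin, word in phonetize(["北京", "交通", "大学", "成立", "于", "一八九六年"]):
    print(f"{char} {pinyin} (BL: {word})")
```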

Typical system, example (continued)
Interface (the pronunciation description handed from frontend to backend):
{2 {1 ^2 %0
{0 ^2 %0 ^2 %0 ( #bei3 &MC $北 ) ( #jing1 &MC $京 ) > ] ^2 %0 ( #jiao1 &MC $交 ) ( #tong1 &MC $通 ) > ] ^2 %0 ( #da4 &MC $大 ) ( #xue2 &MC $学 ) > ] 0}
{0 ^2 %0 ^2 %0 ( #cheng2 &MC $成 ) ( #li4 &MC $立 ) > ( #yu2 &MC $于 ) > ] 0}
{0 ^2 %0 ^2 %0 ( #yi1 &MC $一 ) ( #ba1 &MC $八 ) ( #jiu3 &MC $九 ) ( #liu4 &MC $六 ) ( #nian2 &MC $年 | ) > ] 0}
1} 2}
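The slides do not define the interface syntax, so the sketch below only pulls out the part that is self-evident from the example: each parenthesized group carries a `#` pinyin field, an `&` field, and a `$` character field. The regular expression and the name `syllables` are assumptions; the bracket, `^`, `%`, `>`, `]`, and `|` symbols are left uninterpreted.

```python
import re

# One syllable group looks like "( #bei3 &MC $北 )"; extra symbols before ")" are skipped.
SYLLABLE = re.compile(r"\(\s*#(\S+)\s+&(\S+)\s+\$(\S+)[^)]*\)")

def syllables(interface: str):
    """Extract (pinyin, &-field, character) triples from an interface string."""
    return SYLLABLE.findall(interface)

# Last phrase group of the example, copied from the slide.
group = ("{0 ^2 %0 ^2 %0 ( #yi1 &MC $一 ) ( #ba1 &MC $八 ) ( #jiu3 &MC $九 ) "
         "( #liu4 &MC $六 ) ( #nian2 &MC $年 | ) > ] 0}")
print(syllables(group))
```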

Typical system, example (continued): backend
Prosodic Acoustic Predictor: GMM(bei3) GMM(jing1) GMM(jiao1) GMM(tong1) GMM(da4) GMM(xue2) GMM(cheng2) GMM(li4) GMM(yu2) GMM(yi1) GMM(ba1) GMM(jiu3) GMM(liu4) GMM(nian2)
Segment Acoustic Predictor: occ(bei3) occ(jing1) occ(jiao1) occ(tong1) occ(da4) occ(xue2) occ(cheng2) occ(li4) occ(yu2) occ(yi1) occ(ba1) occ(jiu3) occ(liu4) occ(nian2)

Typical system, example (continued): backend
Unit Selection: argmin cost(sam(bei3), sam(jing1), sam(jiao1), …)
Corpus: bei3 jing1
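The argmin over sample sequences is commonly solved with dynamic programming. The sketch below assumes a cost made of a target cost (distance from the predicted prosody) plus a concatenation cost (mismatch between neighbouring units); the slides do not give the actual cost function, and the `(f0, duration)` features, the weights, and `select_units` are hypothetical.

```python
def select_units(targets, candidates, w_concat=1.0):
    """Viterbi search for the cheapest sequence of corpus samples.

    targets:    one predicted (f0_hz, duration_ms) pair per syllable.
    candidates: for each syllable, a list of corpus samples as (f0_hz, duration_ms).
    Returns (total_cost, list of chosen candidate indices).
    """
    def target_cost(target, unit):
        return abs(target[0] - unit[0]) + abs(target[1] - unit[1])

    def concat_cost(prev_unit, unit):
        return w_concat * abs(prev_unit[0] - unit[0])

    # best[j] = (cost of the cheapest path ending in candidate j, that path)
    best = [(target_cost(targets[0], u), [j]) for j, u in enumerate(candidates[0])]
    for i in range(1, len(targets)):
        new_best = []
        for j, unit in enumerate(candidates[i]):
            cost, path = min(
                (best[k][0] + concat_cost(candidates[i - 1][k], unit), best[k][1])
                for k in range(len(best))
            )
            new_best.append((cost + target_cost(targets[i], unit), path + [j]))
        best = new_best
    return min(best)

# Two syllables (bei3, jing1) with two corpus samples each; all numbers are made up.
targets = [(200, 200), (180, 250)]
candidates = [[(210, 190), (200, 260)], [(150, 250), (185, 240)]]
print(select_units(targets, candidates))  # (60.0, [0, 1])
```

The search is linear in the number of syllables and quadratic in the number of candidates per syllable, which is why it scales to full sentences.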

Typical system, example (continued): backend
Speech Synthesizer: joins the selected samples (Corpus: bei3 jing1 …) into the output speech for 北京交通大学成立于1896年。

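Follow-up topics
Fundamentals: prosody, principles and analysis.
Key technologies: database construction; text processing; acoustic modeling; optimal search / synthesizer.
Related research: timbre adjustment / conversion; HMM synthesizer.
Challenges.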

Data-driven prosody modeling
Two trainable components, both trained on an annotated corpus:
Prosodic event predictor
Prosodic parameter predictor

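Functions of prosody: prosodic structure, intonation, accent, mood
Prosodic structure. Ex. 1: 已经取得文凭的和尚未取得文凭的干部 ("cadres who have obtained a diploma and those who have not yet"; the phrasing decides whether the listener hears 和 / 尚未 "and / not yet" or 和尚 / 未 "monk / not").
Intonation. Ex. 4: 明天是个晴天,最高气温... ("Tomorrow will be sunny, the high will be...", flat). Ex. 5: 明天是个晴天!我们可以... ("Tomorrow will be sunny! We can...", glad). Ex. 6: 明天是个晴天? ("Tomorrow will be sunny?", interrogative).
Accent. Ex. 7: 明天是个晴天 vs. 明天是个晴天 (same words, accent placed on different words).
Mood. Ex. 8: 明天是个晴天 (glad) vs. 明天是个晴天 (sad). Mood is not carried by prosody alone; the timbre changes as well.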

Acoustic realization of prosody
Acoustically, prosody shows up as variation in:
pitch
duration, intensity, pause
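To make the four correlates concrete, here is a minimal sketch that measures them on a synthetic signal: frame intensity as RMS, pitch from the autocorrelation peak, pauses as frames below an energy threshold, and duration as a frame count. The sample rate, frame length, thresholds, and function names are assumptions; real systems use far more robust pitch trackers.

```python
import math

SR = 8000  # sample rate in Hz (assumed)

def tone(freq, n_samples, amp=0.5):
    """Synthetic stand-in for a voiced syllable: a pure sine wave."""
    return [amp * math.sin(2 * math.pi * freq * i / SR) for i in range(n_samples)]

def rms(frame):
    """Intensity of a frame as root-mean-square amplitude."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def estimate_f0(frame, fmin=75, fmax=400):
    """Pitch as the autocorrelation peak among lags corresponding to fmin..fmax."""
    best_lag, best_val = None, 0.0
    for lag in range(SR // fmax, SR // fmin + 1):
        val = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return SR / best_lag if best_lag else 0.0

def describe(signal, frame_len=320, silence=0.01):
    """Per-frame (intensity, f0); f0 is 0.0 for frames treated as pause."""
    out = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        level = rms(frame)
        out.append((level, estimate_f0(frame) if level > silence else 0.0))
    return out

# 0.2 s at 200 Hz, a 0.12 s pause, then 0.2 s at 150 Hz.
signal = tone(200, 1600) + [0.0] * 960 + tone(150, 1600)
frames = describe(signal)
pause_frames = sum(1 for _, f0 in frames if f0 == 0.0)
voiced_seconds = (len(frames) - pause_frames) * 320 / SR
print(len(frames), pause_frames, voiced_seconds)
```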

Prosody description: C-ToBI
First: prosodic structure. The levels of perceptual judgment correspond to the levels of the prosodic hierarchy.
Coming next: accent index.

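Prosodic hierarchy of Chinese
Prosodic structure annotation describes each stretch of speech on three prosodic levels: intonational phrase, intermediate phrase, and foot / prosodic word.
Intonational phrase: a stretch of speech with a complete intonation contour that could stand alone as a sentence to the ear.
Foot: the basic unit of rhythm, usually two or three syllables, occasionally one.
Prosodic word: every syntactic word; short phrases whose tone-sandhi and word-stress patterns resemble those of a word; any other structure that forms a single foot. It spans 1 to 4 syllables, overwhelmingly 2 or 3, with a few monosyllabic and four-syllable cases.
Intermediate phrase: a rhythmic unit between the intonational phrase and the prosodic word, made up of one or more prosodic words. Intermediate phrases may be nested.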

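Prosodic annotation
Boundary types are judged by ear, supported by conventions for specific cases.
Perceptual cues: pitch reset, lengthening of the boundary-final syllable, pauses, and changes of rhythm. Each stretch of speech has to be examined from a global, hierarchical point of view.
Annotation symbols:
BP2: intonational phrase boundary
BP1: intermediate phrase boundary
BP0: boundary between feet / prosodic words that carries a clear pause
space: boundary between feet / prosodic words
*: foot boundary inside a prosodic word
Conventions for specific cases:
A function word that sits at a phrase boundary, sounds unstressed, and acts as a transition between phrases tends to be assigned to the following phrase.
BP0 marks a foot boundary with a clear pause, and tends to be assigned strictly.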

An example of prosodic structure annotation
S1 编者按(BP2)世界上(BP1) 有些事是相似的(BP2)甚至(BP0)惊人地相似
Problem of consistency: training; acceptable.
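The annotated sentence above can be read mechanically. The sketch below splits it at the BP labels and regroups the chunks into intonational phrases (closed by BP2); `parse_boundaries` and `intonational_phrases` are hypothetical names, and whitespace around chunks is simply stripped rather than treated as a prosodic-word boundary.

```python
import re

def parse_boundaries(annotated: str):
    """Split at (BP0)/(BP1)/(BP2) and pair each chunk with the label that follows it."""
    parts = re.split(r"\((BP[012])\)", annotated)  # chunks at even, labels at odd indices
    chunks = parts[0::2]
    labels = parts[1::2] + [None]  # the final chunk has no explicit label
    return [(c.strip(), label) for c, label in zip(chunks, labels) if c.strip()]

def intonational_phrases(pairs):
    """Concatenate chunks until a BP2 boundary closes the intonational phrase."""
    phrases, current = [], ""
    for chunk, label in pairs:
        current += chunk
        if label == "BP2":
            phrases.append(current)
            current = ""
    if current:
        phrases.append(current)
    return phrases

s1 = "编者按(BP2)世界上(BP1) 有些事是相似的(BP2)甚至(BP0)惊人地相似"
pairs = parse_boundaries(s1)
print(pairs)
print(intonational_phrases(pairs))  # ['编者按', '世界上有些事是相似的', '甚至惊人地相似']
```

Comparing such parsed labels across annotators is one simple way to quantify the consistency problem mentioned above.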

Deeper prosodic annotation: the Accent Index (AI)
What is AI: a prosodic feature that reflects semantic emphasis and focus.
Sample: 催眠师有相当的威望 ("The hypnotist enjoys considerable prestige.")
Domains: word level (lexical stress); sentence level (prominence, focus, emphasis, accent).
Why is AI needed: a smoother voice; a more expressive synthetic voice.
Acoustic realization of AI: relativity (accented vs. unaccented is relative); universal: integrate.
Prosodic functions of AI: new topic; focus; stress pattern (技术 / 计数).

Preliminary AI experiments
Automatic detection of the accent index, based on the hierarchical prosodic structure, with the prosodic approximation ratio of the syllable as the indicator (ref. Xu Yi's work).
Prosodic parameters predicted with AI.
Samples: 催眠师有相当的威望

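Course report 4: speech synthesis survey and topical reading
Reading: 《现代语音技术-基础与应用》 (Modern Speech Technology: Fundamentals and Applications), Chapter 5, 蔡莲红 (Cai Lianhong) et al., Tsinghua University Press, 2003; 王仁华 (Wang Renhua), "语音合成技术最新研究进展及其应用展望" (Recent research progress in speech synthesis technology and its application prospects); 初敏 (Chu Min); Interspeech, IEEE SSW, ICASSP, Speech Prosody.
Online demos: 科大讯飞 (iFLYTEK), 捷通华声 (SinoVoice).
Report 1, survey report (due 3-31). Questions to consider: the task of a text-to-speech system; the implementation modules of a database-based text-to-speech system. Requirements: at least 3 references; cite the source of any analysis taken from them.
Report 2, topical report (due 4-14). Choose one of four topics: database, text analysis, prosody model, waveform concatenation / synthesizer; cover a specific algorithm. Length: 2 pages (size 5 font). File name: studentID_name_reportTitle.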
