1 Introduction Prof. Lin-Shan Lee.

2 Introduction of the Project
Speech recognition with the Kaldi toolkit

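3 Phase 1 Project
Goal: by building a basic large-vocabulary speech recognition system, students gain a concrete understanding of speech recognition and a foundation for further study of advanced techniques.
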
4 How to do recognition?
How to map speech O to a word sequence W?
P(O|W): acoustic model
P(W): language model
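The mapping from O to W can be written as the standard maximum a posteriori decision rule, which shows where the two models enter. The symbol W* for the recognized word sequence is introduced here for illustration and does not appear on the slide.

```latex
\begin{align}
W^{*} &= \arg\max_{W} P(W \mid O) \\
      &= \arg\max_{W} \frac{P(O \mid W)\,P(W)}{P(O)} \\
      &= \arg\max_{W} P(O \mid W)\,P(W)
\end{align}
```

The last step drops P(O) because it does not depend on W, leaving the product of the acoustic model and the language model.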

5 Language model P(W)
W = w1, w2, w3, …, wn
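One common way to compute P(W) for W = w1, …, wn is the chain rule followed by an n-gram approximation. The bigram case below is an assumption chosen for illustration; the slide does not state the n-gram order used in the project.

```latex
\begin{align}
P(W) &= P(w_1, w_2, \ldots, w_n)
      = \prod_{i=1}^{n} P(w_i \mid w_1, \ldots, w_{i-1}) \\
     &\approx \prod_{i=1}^{n} P(w_i \mid w_{i-1})
      \qquad \text{(bigram approximation; } w_0 \text{ is a sentence-start symbol)}
\end{align}
```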

6 Language model examples
Probability in log scale
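Storing probabilities in log scale turns the product over words into a sum, which avoids numerical underflow on long sentences. The sketch below sums per-word log probabilities; the function name sentence_logprob and the three values are made up for illustration, and base-10 logs are assumed (as in ARPA-format language model files).

```shell
#!/usr/bin/env bash
# Sum log10 probabilities of the words in one sentence.
# The input values are illustration numbers, not from a real language model.

sentence_logprob() {
    # Reads one log10 probability per line on stdin and prints their sum.
    awk '{ total += $1 } END { printf "%.2f\n", total }'
}

# log10 P(w1), log10 P(w2|w1), log10 P(w3|w2)
printf '%s\n' -1.00 -0.50 -2.00 | sentence_logprob
```

Here the sentence score is -3.50, which corresponds to a probability of 10^-3.5.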

7 Acoustic Model P(O|W)
Model of a phone: Markov Model, Gaussian Mixture Model

8 Feature Extraction

9 MFCC (Mel-frequency cepstral coefficients)
13-dimensional vector

10 Lexicon

11 Speech Recognition System
Use Kaldi as tool
[Block diagram] Input Speech → Front-end Signal Processing → Feature Vectors → Linguistic Decoding and Search Algorithm → Output Sentence
Speech Corpora → Acoustic Model Training → Acoustic Models
Text Corpora → Language Model Construction → Language Model
Other blocks: Lexicon, Lexical Knowledge-base, Grammar

12 Linux Introduction

13 Vim
To create a file: vim hello.txt
Once inside, press "i" to enter insert (editing) mode.
Press ESC to return to normal mode, where editor commands can be entered.

14 Screen
In brief: to keep a long-running program from failing halfway because your connection drops, run it inside screen.

15 Linux Shell Script Basics
echo "Hello" (prints "Hello" on the screen)
a=ABC (assigns ABC to a)
echo $a (prints ABC on the screen)
b=$a.log (assigns ABC.log to b)
cat $b > testfile (writes the contents of the file ABC.log to testfile)
<command> -h (prints the help information)
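The commands above can be tried together in a small script. This is a minimal sketch: the scratch directory and the file name name.txt are assumptions added so that the example does not touch existing files. It also contrasts echo, which writes the value of a variable, with cat, which writes the contents of the file the variable names.

```shell
#!/usr/bin/env bash
# Scratch directory so the example does not touch existing files.
workdir=$(mktemp -d)

a=ABC                                    # assign ABC to a (no spaces around "=")
b=$a.log                                 # b is now ABC.log
echo "$b"                                # prints ABC.log

echo "Hello" > "$workdir/$b"             # create the file ABC.log containing Hello
echo "$b" > "$workdir/name.txt"          # echo writes the file *name*, ABC.log
cat "$workdir/$b" > "$workdir/testfile"  # cat writes the file *contents*, Hello

cat "$workdir/name.txt"                  # prints ABC.log
cat "$workdir/testfile"                  # prints Hello
```

Quoting variables as "$b" is not required for these simple values, but it keeps the script safe if a value ever contains spaces.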

16 Feature Extraction
02.extract.feat.sh

17 Feature Extraction - MFCC

18 Extract Feature (02.extract.feat.sh)
[Table: Input, Output, and Archive directory for the Training Set, Development Set, and Testing Set]

19 Kaldi rspecifier & wspecifier format
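ark:<ark file>
An archive of many small files; it may be a collection of wav files, MFCC files, or statistics.
scp:<scp file>
A table of file locations; entries may point to individual files (as in our material/train.wav.scp) or to positions inside an ark file.
ark,t:<ark file>
Writes the ark as text. On input, t has no effect; without ,t the output defaults to binary format.
ark,scp:<ark file>,<scp file>
Writes an ark file and an scp file at the same time.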
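To make the two kinds of scp entries concrete, the sketch below reads a Kaldi-style scp table and reports whether each entry points to an individual file or to a byte offset inside an ark file. The function name describe_scp, the utterance IDs, and the paths are hypothetical. It assumes the plain "<key> <location>" line format and does not handle entries that are piped commands.

```shell
#!/usr/bin/env bash
# Describe each entry of a Kaldi-style scp table read from stdin.
# Each line is "<key> <location>"; a location ending in ":<digits>" is
# treated as a byte offset into an ark file.
describe_scp() {
    while read -r key location; do
        case $location in
            *:*)
                offset=${location##*:}          # text after the last ":"
                case $offset in
                    ''|*[!0-9]*) echo "$key -> file $location" ;;
                    *) echo "$key -> ark ${location%:*} at offset $offset" ;;
                esac ;;
            *) echo "$key -> file $location" ;;
        esac
    done
}

# Hypothetical utterance IDs and paths, for illustration only.
describe_scp <<'EOF'
utt001 wav/utt001.wav
utt002 feat/train.ark:1234
EOF
```

The first entry is reported as an individual file and the second as position 1234 inside feat/train.ark.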

20 Extract Feature (extract.feat.sh)
add-deltas, compute-cmvn-stats, apply-cmvn

21 MFCC – Add delta
add-deltas: Deltas and Delta-Deltas
Appends the Δ and ΔΔ of the MFCCs (roughly the first and second derivatives) to the features, bringing the total dimension to 39.
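A common way to define the deltas is a regression over neighboring frames. The formula below is a sketch of that definition; the symbols c_t (the static MFCC vector at frame t) and N (the half-window size) are assumptions introduced here, and the exact window depends on the tool's options.

```latex
\begin{align}
\Delta c_t &= \frac{\sum_{n=1}^{N} n\,\bigl(c_{t+n} - c_{t-n}\bigr)}{2\sum_{n=1}^{N} n^{2}} \\
\Delta\Delta c_t &= \frac{\sum_{n=1}^{N} n\,\bigl(\Delta c_{t+n} - \Delta c_{t-n}\bigr)}{2\sum_{n=1}^{N} n^{2}} \\
\dim\bigl[c_t;\ \Delta c_t;\ \Delta\Delta c_t\bigr] &= 13 + 13 + 13 = 39
\end{align}
```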

22 MFCC – CMVN
CMVN: Cepstral Mean and Variance Normalization
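As a sketch of what the normalization does, each feature dimension d is shifted to zero mean and scaled to unit variance. The symbols x_t[d], T, μ[d] and σ[d] are introduced here for illustration, and computing the statistics over the T frames of one utterance (rather than one speaker) is an assumption.

```latex
\begin{align}
\mu[d] &= \frac{1}{T}\sum_{t=1}^{T} x_t[d], &
\sigma^{2}[d] &= \frac{1}{T}\sum_{t=1}^{T}\bigl(x_t[d] - \mu[d]\bigr)^{2}, &
\hat{x}_t[d] &= \frac{x_t[d] - \mu[d]}{\sigma[d]}
\end{align}
```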

23 MFCC – CMVN
compute-cmvn-stats and apply-cmvn
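The three tools can be chained as sketched below. This is not the course's 02.extract.feat.sh, only an illustration under stated assumptions: the function name extract_feats and the output file names are hypothetical, compute-mfcc-feats is assumed as the MFCC extractor with its default settings, statistics are computed per utterance, and the step order follows the order in which the slides list the tools.

```shell
#!/usr/bin/env bash
# Sketch of an MFCC + delta + CMVN pipeline using Kaldi command-line tools.
# Output file names are hypothetical; adapt them to your own setup.
extract_feats() {
    local wav_scp="$1" out_dir="$2"
    local tool
    # Stop early if the Kaldi binaries are not on PATH.
    for tool in compute-mfcc-feats add-deltas compute-cmvn-stats apply-cmvn; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing Kaldi tool: $tool" >&2
            return 1
        fi
    done
    mkdir -p "$out_dir" || return 1

    # MFCCs from the wav list, then append deltas and delta-deltas.
    compute-mfcc-feats "scp:$wav_scp" ark:- |
        add-deltas ark:- "ark:$out_dir/delta.ark" || return 1

    # Per-utterance mean and variance statistics.
    compute-cmvn-stats "ark:$out_dir/delta.ark" "ark:$out_dir/cmvn.ark" || return 1

    # Normalize and write both an ark file and an scp index.
    apply-cmvn --norm-vars=true "ark:$out_dir/cmvn.ark" "ark:$out_dir/delta.ark" \
        "ark,scp:$out_dir/feats.ark,$out_dir/feats.scp"
}
```

A call such as `extract_feats material/train.wav.scp feat/train` would process the training list; the output directory feat/train is only an example name.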

24 Hint (Important!!)

25 Homework
Linux, background knowledge
01.format.sh, 02.extract.feat.sh

26 Homework
If you have no experience using a Linux system, please study the basic Linux commands in advance.
Reference: 鳥哥的Linux 私房菜

27 Homework (optional)
Reading: "使用加權有限狀態轉換器的基於混合詞與次詞以文字及語音指令偵測口語詞彙", Chapter 3

28 Data
Log in to the workstation: pietty/putty/Xshell, ssh, port 22
Copy the compressed archive into your own subdirectory

29 To Do
Step 1: Execute the following command:
Step 2:

30 Schedule
Week / Progress / Group
Week 1: Introduction; Linux basics + Feature extraction
Week 2: Acoustic model training: monophone & triphone
Week 3: Language model training + Decoding (Group A)
Week 4: Progress Report (Group B)
Week 5:
Week 6:

31 Notes
If you have any problem ……
Leave the account name you want opened on the project workstation, along with your e-mail and FB account.
32 Happy Birthday to Professor Lin-Shan Lee

