1 Introduction Prof. Lin-Shan Lee TA: Chun-Hsuan Wang
Outline
  Project Introduction
  Linux and Bash Introduction
  Feature Extraction
  Homework
Project Introduction
Phase 1 Project
Goal: by building a basic large-vocabulary speech recognition system, students gain a concrete understanding of speech recognition and a foundation for studying more advanced techniques.
Input Speech -> Speech Recognition System -> Output Sentence (e.g. 今天, "today")
We are going to learn what is inside the black box of the speech recognition system.
Speech Recognition System
Conventional ASR (Automatic Speech Recognition) system:
[Block diagram: Input Speech -> Front-end Signal Processing -> Feature Vectors -> Linguistic Decoding and Search Algorithm -> Output Sentence (今天). Speech Corpora -> Acoustic Model Training -> Acoustic Model. Text Corpora -> Language Model Construction -> Language Model. Lexicon.]
Deep learning based ASR system
Speech Recognition System
Conventional ASR system: widely used in commercial systems
Deep learning based ASR system: still to be studied
Both will be implemented in this project with the Kaldi toolkit.
Kaldi is one of the most widely used ASR toolkits.
Schedule
Week  Progress                                          Report Group
1     Introduction + Linux intro + Feature extraction
2     Acoustic model training: monophone & triphone
3     Language model training + Decoding                A
4     Live demo                                         B
5     Deep Neural Network
6     Progress Report
7     ...                                               ...
Phase 1 ... Phase 2
Speech Recognition System
Conventional ASR (Automatic Speech Recognition) system:
[Block diagram of the conventional ASR system (Input Speech, Front-end Signal Processing, Feature Vectors, Linguistic Decoding and Search Algorithm, Output Sentence, Speech Corpora, Acoustic Model Training, Acoustic Model, Text Corpora, Language Model Construction, Language Model, Lexicon), annotated with the labels Week 1, Week 2, Week 3 and Week 4.]
Deep learning based ASR system: Week 5
How to do recognition?
How do we map speech O to a word sequence W?
  P(O|W): acoustic model
  P(W): language model
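The two models are usually combined through Bayes' rule. The following is a sketch of the standard decision rule, not a formula taken from the slide; the symbol W* (the recognized word sequence) is introduced here for illustration. P(O) can be dropped because it does not depend on W.

```latex
W^{*} = \arg\max_{W} P(W \mid O)
      = \arg\max_{W} \frac{P(O \mid W)\, P(W)}{P(O)}
      = \arg\max_{W} P(O \mid W)\, P(W)
```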
Language model P(W)
W = w_1, w_2, w_3, …, w_n
$$P(W) = P(w_1)\, P(w_2 \mid w_1) \prod_{i=3}^{n} P(w_i \mid w_{i-2}, w_{i-1})$$
Language model examples
[Figure: example language model entries. The "log Prob" column gives the probability in log scale.]
Acoustic Model P(O|W)
Model of a phone
  Markov Model
  Gaussian Mixture Model
Lexicon
Linux and Bash Introduction
Vim
Creating a file: vim hello.txt
Once inside, press "i" to enter insert mode, then type whatever you want.
Press ESC to return to normal mode, where you can:
  type "/word" to search for a word
  type ":w" to save
  type ":wq" to save and quit
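Screen
In short: to keep a program from failing halfway because your connection dropped, you can use screen. Basic usage:
1. After logging in, type "screen" to enter screen mode; everything works the same as in a normal shell.
2. To close the screen, type "exit".
3. If a program is still running and you want to leave without stopping it, press "Ctrl + a" then "d" to detach from screen (the program keeps running even if you log out and shut down your computer).
4. The next time you log in, type "screen -r" to return to the screen you left open.
5. If "screen -r" lists several detached screens, run "screen -r <screen id>" with the id you want (a larger id is usually newer).
This way your work keeps running even after you turn off your computer.
You can also use tmux, which is like screen with more features.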
Linux Shell Script Basics
  echo "Hello"         (prints "Hello" on the screen)
  a=ABC                (assigns ABC to a; no spaces around "=")
  echo $a              (prints ABC on the screen)
  b=$a.log             (assigns ABC.log to b)
  echo $b > testfile   (writes "ABC.log" to testfile; cat $b > testfile would instead copy the contents of the file ABC.log)
  <command> -h         (prints the help information of the command)
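The difference between writing a variable's value and writing a file's contents can be checked with a small sketch. The scratch directory and the file names below are illustrative assumptions, not part of the project material.

```shell
#!/usr/bin/env bash
# Sketch only: works in a scratch directory so nothing in $HOME is touched.
workdir=$(mktemp -d)

a=ABC                                   # assign the string ABC to a
b=$a.log                                # b now holds the string ABC.log

echo "$b" > "$workdir/testfile"         # testfile contains the text "ABC.log"

echo "some log text" > "$workdir/$b"    # create a file whose name is stored in b
cat "$workdir/$b" > "$workdir/copyfile" # copyfile contains the contents of ABC.log

first=$(cat "$workdir/testfile")
second=$(cat "$workdir/copyfile")
echo "testfile holds: $first"
echo "copyfile holds: $second"
```

Quoting the variables ("$b") is not required for these simple values, but it avoids surprises when a value contains spaces.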
Bash Example
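Bash script
[ condition ] uses "test" to check. Ex. test -e ~/tmp; echo $? (prints 0 if the test succeeded, non-zero otherwise)
File: [ -e filename ]
  -e  does the file name exist?
  -f  does the file name exist and is it a regular file?
  -d  does the file name exist and is it a directory?
Number: [ n1 -eq n2 ]
  -eq  equal (n1==n2)
  -ne  not equal (n1!=n2)
  -gt  greater than (n1>n2)
  -lt  less than (n1<n2)
  -ge  greater than or equal (n1>=n2)
  -le  less than or equal (n1<=n2)
SPACES COUNT!!!! Put a space after "[", before "]", and around every operator.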
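A minimal sketch of the file and number tests in use. The scratch directory, the file name a.txt and the variable names are assumptions made for this example.

```shell
#!/usr/bin/env bash
# Sketch only: file tests and integer comparisons with [ ... ].
tmpdir=$(mktemp -d)
touch "$tmpdir/a.txt"

# -f: exists and is a regular file
if [ -f "$tmpdir/a.txt" ]; then file_check=file; else file_check=missing; fi

# -d: exists and is a directory
if [ -d "$tmpdir" ]; then dir_check=dir; else dir_check=missing; fi

# -lt: integer "less than"
n1=3
n2=5
if [ "$n1" -lt "$n2" ]; then cmp="less"; else cmp="not less"; fi

# The exit status of test is 0 on success and non-zero on failure.
status=0
test -e "$tmpdir/no_such_file" || status=$?

echo "a.txt: $file_check, tmpdir: $dir_check, $n1 vs $n2: $cmp, status: $status"
```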
Bash script
Logic
  -a  and
  -o  or
  !   negation
[ "$yn" == "Y" -o "$yn" == "y" ]
[ "$yn" == "Y" ] || [ "$yn" == "y" ]
Don't forget the spaces and the double quotes!!!!
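The yes/no check above can be wrapped in a small function. This is a sketch; the function name is_yes is hypothetical. It uses "=", the POSIX spelling of string equality, which bash treats the same as "==" inside [ ... ].

```shell
#!/usr/bin/env bash
# Sketch only: returns success (0) when the argument is "Y" or "y".
is_yes() {
    yn=$1
    [ "$yn" = "Y" ] || [ "$yn" = "y" ]
}

if is_yes "y"; then lower=yes; else lower=no; fi
if is_yes "n"; then other=yes; else other=no; fi

# With the double quotes, an empty value is still a valid (false) comparison.
if is_yes ""; then empty=yes; else empty=no; fi

echo "y -> $lower, n -> $other, empty -> $empty"
```

Without the double quotes, an empty $yn would leave the test with a missing operand and the shell would report a syntax error instead of simply evaluating to false.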
Bash script
` operation (command substitution: the command between the backquotes is replaced by its output)
  echo `ls`
  my_date=`date`
  echo $my_date
&& || ; operation (&& runs the next command only if the previous one succeeded, || only if it failed, ; always)
  echo hello || echo no~
  echo hello && echo no~
  [ -f tmp ] && cat tmp || echo "file not found"
  [ -f tmp ] ; cat tmp ; echo "file not found"
Some useful commands: grep, sed, touch, awk, ln
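The behaviour of the four command lines above can be captured and compared in a short sketch. The path stored in missing is a hypothetical file name assumed not to exist; $(...) is used as the equivalent modern form of the backquotes.

```shell
#!/usr/bin/env bash
# Sketch only: capture the output of each combination to see which commands ran.
my_date=`date`                       # backquote form, same as my_date=$(date)
echo "now: $my_date"

r_or=$(echo hello || echo no~)       # first command succeeds, so the second is skipped
r_and=$(echo hello && echo no~)      # first command succeeds, so the second also runs

missing=/nonexistent/tmp.$$          # hypothetical path that should not exist
r_chain=$([ -f "$missing" ] && cat "$missing" || echo "file not found")

# With ';' every command runs regardless of the previous exit status.
r_semi=$([ -f "$missing" ] ; cat "$missing" 2>/dev/null ; echo "file not found")

echo "or:    $r_or"
echo "and:   $r_and"
echo "chain: $r_chain"
echo "semi:  $r_semi"
```

Note that a && b || c is not a full if/else: c also runs when a succeeds but b fails.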
Bash script
Pipeline (the standard output of each program becomes the standard input of the next)
  program1 | program2 | program3
  echo "hello" | tee log   (prints "hello" on the screen and also writes it to the file log)
More information about pipelines:
http://www.gnu.org/software/bash/manual/html_node/Pipelines.html
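A small sketch of a three-stage pipeline with tee in the middle. The input lines and the temporary log file are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: sort the lines, drop duplicates, keep a copy in a log file,
# and count what is left.
logfile=$(mktemp)

count=$(printf 'b\na\nb\n' | sort | uniq | tee "$logfile" | wc -l | tr -d ' ')

echo "unique lines: $count"
echo "copy kept by tee:"
cat "$logfile"
```

tee is what the project commands use to show a script's output on the screen while also saving it to a log file.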
Bash script
Input / output for bash:
  cmd > logfile                                   # send stdout to logfile; stderr is printed on the screen
  cmd > logfile 2>&1                              # send both stdout and stderr to logfile
  cmd <inputfile 2>errorfile | grep stdoutfile    # read stdin from inputfile, send stderr to errorfile, pipe stdout to grep
More information about bash input/output:
http://tldp.org/LDP/Bash-Beginners-Guide/html/sect_08_02.html
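The redirections can be tried with a command that writes to both streams. In this sketch the function noisy and the temporary file names are hypothetical stand-ins for cmd and logfile.

```shell
#!/usr/bin/env bash
# Sketch only: noisy writes one line to stdout and one line to stderr.
noisy() {
    echo "to stdout"
    echo "to stderr" >&2
}

out_only=$(mktemp)
err_only=$(mktemp)
both=$(mktemp)

noisy > "$out_only" 2> "$err_only"   # stdout and stderr go to separate files
noisy > "$both" 2>&1                 # stderr follows stdout into the same file

echo "stdout file: $(cat "$out_only")"
echo "stderr file: $(cat "$err_only")"
echo "lines in combined file: $(wc -l < "$both" | tr -d ' ')"
```

The order matters: 2>&1 must come after > logfile, because it copies wherever stdout points at that moment.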
Feature Extraction
02.extract.feat.sh
[Block diagram of the conventional ASR system: Input Speech -> Front-end Signal Processing -> Feature Vectors -> Linguistic Decoding and Search Algorithm -> Output Sentence (今天), together with Speech Corpora, Acoustic Model Training, Acoustic Model, Text Corpora, Language Model Construction, Language Model and Lexicon.]
Feature Extraction - MFCC
MFCC (Mel-frequency cepstral coefficients)
13-dimensional vector
See Chapter 2 of the Digital Speech Processing course (數位語音).
Extract Feature (02.extract.feat.sh)
[Annotated script screenshot. Labels: Training Set, Development Set, Testing Set, Input, Output, Archive directory.]
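Kaldi rspecifier & wspecifier format
ark:<ark file>
  An archive that bundles many small files; it may be a collection of wav files, MFCC files, or statistics.
scp:<scp file>
  A table of file locations; its entries may point to individual files (as in our material/train.wav.scp) or to positions inside an ark file.
ark,t:<ark file>
  Writes the ark in text format. When reading, t has no effect; without ,t the output defaults to binary format.
ark,scp:<ark file>,<scp file>
  Writes an ark file and an scp file at the same time.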
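As a sketch of how these specifiers appear on a command line, the following uses Kaldi's copy-feats tool to convert features to text and to write an ark/scp pair. The directory feat, the set name train and the output file names are assumptions made for this example; the commands only run if copy-feats is on the PATH and the input scp exists.

```shell
#!/usr/bin/env bash
# Sketch only: path, target and the output file names are hypothetical.
path=feat
target=train

in_rspecifier="scp:$path/$target.39.cmvn.scp"                           # read via an scp table
text_wspecifier="ark,t:$path/$target.39.cmvn.txt"                       # write a text-format ark
both_wspecifier="ark,scp:$path/$target.copy.ark,$path/$target.copy.scp" # write ark + scp together

if command -v copy-feats >/dev/null 2>&1 && [ -f "$path/$target.39.cmvn.scp" ]; then
    copy-feats "$in_rspecifier" "$text_wspecifier"
    copy-feats "$in_rspecifier" "$both_wspecifier"
else
    echo "copy-feats or the input scp was not found; the commands would be:"
    echo "copy-feats $in_rspecifier $text_wspecifier"
    echo "copy-feats $in_rspecifier $both_wspecifier"
fi
```

Writing a text-format copy is a convenient way to inspect features by eye, since the default binary ark is not human-readable.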
Extract Feature (02.extract.feat.sh)
The script uses four Kaldi tools:
  compute-mfcc-feats
  add-deltas
  compute-cmvn-stats
  apply-cmvn
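MFCC – Add delta
add-deltas
Deltas and Delta-Deltas
Appends the Δ and ΔΔ of the MFCCs (roughly the first and second derivatives over time) to the features, so that the total dimension becomes 39 (13 × 3).
Usage: add-deltas [options] in-rspecifier out-wspecifier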
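For reference, a commonly used regression formula for delta features is sketched below. It is not taken from the slide: c_t denotes the 13-dimensional MFCC vector of frame t, and the window half-width N is an assumption (the window actually used by add-deltas depends on its options).

```latex
\Delta c_t = \frac{\sum_{n=1}^{N} n\,\bigl(c_{t+n} - c_{t-n}\bigr)}{2\sum_{n=1}^{N} n^{2}},
\qquad
\Delta\Delta c_t = \frac{\sum_{n=1}^{N} n\,\bigl(\Delta c_{t+n} - \Delta c_{t-n}\bigr)}{2\sum_{n=1}^{N} n^{2}},
\qquad
\dim\bigl[c_t;\ \Delta c_t;\ \Delta\Delta c_t\bigr] = 13 + 13 + 13 = 39
```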
MFCC – CMVN
CMVN: Cepstral Mean and Variance Normalization
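In its usual form, CMVN shifts and scales each feature dimension so that it has zero mean and unit variance over a set of frames. The sketch below is the textbook definition rather than a formula from the slide; x_{t,d} (dimension d of frame t) and T (the number of frames the statistics are computed over, e.g. one utterance) are notation introduced here.

```latex
\mu_d = \frac{1}{T}\sum_{t=1}^{T} x_{t,d},
\qquad
\sigma_d^{2} = \frac{1}{T}\sum_{t=1}^{T} \bigl(x_{t,d} - \mu_d\bigr)^{2},
\qquad
\hat{x}_{t,d} = \frac{x_{t,d} - \mu_d}{\sigma_d}
```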
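MFCC – CMVN
compute-cmvn-stats
Usage: compute-cmvn-stats [options] <feats-rspecifier> <stats-wspecifier>
apply-cmvn
Usage: apply-cmvn [options] <cmvn-stats-rspecifier> <feats-rspecifier> <feats-wspecifier>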
Hint (Important!!)
The output of compute-mfcc-feats is ark:$path/$target.13.ark
add-deltas [input] [add_deltas]
  [input] = ark:$path/$target.13.ark
compute-cmvn-stats [add_deltas] [compute_result]
apply-cmvn [compute_result] [add_deltas] [output]
  [output] MUST BE ark,t,scp:$path/$target.39.cmvn.ark,$path/$target.39.cmvn.scp
rm -f [add_deltas] [compute_result]
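Put together, the hint can be sketched as the function below. The values of path and target and the intermediate file names ($target.39.ark, $target.cmvn.ark) are hypothetical choices for illustration; the actual 02.extract.feat.sh may name and organize things differently. The Kaldi commands only run if the tools and the 13-dimensional MFCC archive are present.

```shell
#!/usr/bin/env bash
# Sketch only: path, target and the intermediate file names are assumptions.
path=feat
target=train

mfcc13="$path/$target.13.ark"          # output of compute-mfcc-feats
delta39="$path/$target.39.ark"         # hypothetical intermediate: MFCC + deltas
cmvn_stats="$path/$target.cmvn.ark"    # hypothetical intermediate: CMVN statistics
output="ark,t,scp:$path/$target.39.cmvn.ark,$path/$target.39.cmvn.scp"

extract_39dim() {
    # 13-dim MFCC -> 39-dim (MFCC + deltas + delta-deltas)
    add-deltas "ark:$mfcc13" "ark:$delta39" &&
    # accumulate mean/variance statistics on the 39-dim features
    compute-cmvn-stats "ark:$delta39" "ark:$cmvn_stats" &&
    # normalize and write the final ark + scp
    apply-cmvn "ark:$cmvn_stats" "ark:$delta39" "$output" &&
    # remove the intermediate files
    rm -f "$delta39" "$cmvn_stats"
}

if command -v add-deltas >/dev/null 2>&1 && [ -f "$mfcc13" ]; then
    extract_39dim
else
    echo "Kaldi tools or $mfcc13 not found; nothing was run."
fi
```

Chaining the steps with && stops the sequence at the first failure, so the intermediate files are kept for inspection if something goes wrong.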
Homework
  Linux, background knowledge
  01.format.sh, 02.extract.feat.sh
Homework
If you have no experience with Linux, please study the basic Linux commands in advance.
鳥哥的Linux 私房菜:
  Chapter 7, Linux file and directory management: http://linux.vbird.org/linux_basic/0220filemanager.php
  Chapter 10, the vim editor: http://linux.vbird.org/linux_basic/0310vi.php
Homework (optional)
Reading: Chapter 3 of the thesis "使用加權有限狀態轉換器的基於混合詞與次詞以文字及語音指令偵測口語詞彙" (roughly: spoken term detection with text and spoken queries based on hybrid words and subwords using weighted finite-state transducers)
https://www.dropbox.com/s/dsaqh6xa9dp3dzw/wfst_thesis.pdf
Kaldi documentation: http://kaldi-asr.org/doc/tools.html
Login Workstation
By pietty/putty/Xshell:
  ssh 140.112.21.80, port 22
By terminal:
  ssh -p 22 username@140.112.21.80
Data
Copy the archive to your home directory:
  cp /share/proj1.ASTMIC.subset.tar.gz ~/.
Extract it:
  tar -zxvf proj1.ASTMIC.subset.tar.gz
To Do
Step 1: Execute the following commands:
  script/01.format.sh | tee log/01.format.log
  script/02.extract.feat.sh | tee log/02.extract.feat.sh.log
Step 2:
  Add-delta
  CMVN
  Observe the output and report
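Workstation Notes
Please do not write programs that repeatedly and aggressively query external websites or scrape data. If the computer center detects this behavior it will ban the IP, and nobody will be able to connect to the workstation.
If the corpus you need for training takes more than 50 GB, please email me so that disk usage on the project workstation can be kept under control.
Computing resources on the workstation are limited. Please do not use it to train models for personal coursework; leave the resources for project research.
The Week 3 and Week 5 experiments in this project need a lot of computation and time. Please start early: if everyone waits until the last day or two, the limited resources will cause all jobs to stall and nobody will be able to run experiments.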
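Workstation Notes
Several versions of the CUDA library are installed. If you need a particular version, add it to your path yourself; if the version you need is not installed, email me and I will install it.
For the Phase 2 project we recommend using a virtual environment (e.g. virtualenv, conda). You can also use pip install --user to install the packages you need locally.
Please do not run interactive programs such as ipython on the workstation, since they consume shared resources; run Python files directly instead.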
Other Notes
Problems about the project:
  Facebook Group: 數位語音專題
  DSP Website: http://speech.ee.ntu.edu.tw/courses.html
  Week 1 TA: 王君璇 r07942076@ntu.edu.tw
Problems about the workstation:
  Workstation TA: 王君璇 r07942076@ntu.edu.tw
Other Notes
Please be sure to fill in your personal information at: https://goo.gl/Qm81M2
All announcements will be posted in the Facebook group and also emailed to everyone.