Advanced Query by Humming System(QBH) 電信所一 王治皓
參考 Advanced Query by Humming System Using Diffused Hidden Markov Model and Tempo Based Dynamic Programming (Chiao-Wei Lin, Jian-Jiun Ding, and Che-Ming Hu) Improved Onset Detection Algorithm based on Fractional Power Envelope match Filter (Jian-Jiun Ding , Che-Ming Hu , Ta Hsien , and Chi-Jung Tseng ) Youtube影片
大綱 onset detection(起始檢測) pitch estimation(音調提取) melody matching(旋律匹配)
Abstract QBH是基於內容(content-based)來判斷哪首歌 本文改利用音符(note-based)來判斷 透過使用時頻分析來找出在Time domain難找到的起始點 除了音調,速度也是可以拿來判斷的依據
Introduction 把信號分割成一個一個音符 signal note
The note-based methods mainly include three parts: onset detection,1(起始檢測) pitch extraction2(音調提取) and melody matching3(旋律匹配). 5.In order to enhance the system speed and accuracy, we propose a two-stage matching QBH system. We apply the hidden Markov model (HMM) as the first stage to filter out the unlikely songs in target. The flowchart of our proposed system is shown as in Fig. 1.
onset detection 困難處 當信號能量變化不大時,難以檢測出onset point Thrill(顫抖音)和 end tone of the music signal 容易使我們誤判 onset point The note-based methods mainly include three parts: onset detection,1(起始檢測) pitch extraction2(音調提取) and melody matching3(旋律匹配). 5.In order to enhance the system speed and accuracy, we propose a two-stage matching QBH system. We apply the hidden Markov model (HMM) as the first stage to filter out the unlikely songs in target. The flowchart of our proposed system is shown as in Fig. 1.
Proposed method (for onset detection) Difference of magnitude Short-term energy HFC method Surf method Proposed method 與方法1較類似 difference of magnitude 是去計算出兩個時槽間的envelope amplitude 的差值 Short-term energy 類似於第一個方法,是算出兩個時槽能量的差值 HFC 是透過觀察頻率有沒有高頻的成分來判斷 Surf 觀察envelope的斜率來判斷
(Step1)find the envelope amplitude for each time slot (Step1)find the envelope amplitude for each time slot. 類似於Difference of magnitude 可以提step1少掉LPF是為了減少計算時間,但是同時也還會減少noise的影響 N0是timeslot寬度 STEP2之前有先做過normalize
(Step2)Take the fractional power of the envelope amplitude. 假設Ak = 0 (Step2)Take the fractional power of the envelope amplitude. 假設Ak = 0.12, Ak1 = 0, Ah = 0.52, and Ah1 = 0.4. = 0.7 Dk =Ak –Ak1 =0.12 Dh = Ah –Ah1 =0.12 Bk =0.1995 Bh =0.1062 可以提step1少掉LPF是為了減少計算時間,但是同時也還會減少noise的影響 N0是timeslot寬度 STEP2之前有先做過normalize
(Step3)跟match filter做convolution (Step3)跟match filter做convolution. f[n] = [3, 3, 4, 4, -1, -1, -2, -2, -2, -2, -2, -2]. (Step4) 如果convolution的值大於thd,我們就可以把第K點當 作onsetpossible1. 可以提step1少掉LPF是為了減少計算時間,但是同時也還會減少noise的影響 N0是timeslot寬度
(Step5) Ck :convolution of envelope and envelope matched filter Max(i) and Min(i) : the i-th peak in Ck. 計算差值,如果大於THD則判斷為onsetpossible2
時域上可能的起始點會在onsetPossible1 or onsetPossible2 如果Ck<thd1 但 > thd2 代表可能有缺失onset point,我們可 以用STFT去觀察頻率是否顯著變化考慮增加新的onset point, 及代表出現新的note。
The note-based methods mainly include three parts: onset detection,1(起始檢測) pitch extraction2(音調提取) and melody matching3(旋律匹配). 5.In order to enhance the system speed and accuracy, we propose a two-stage matching QBH system. We apply the hidden Markov model (HMM) as the first stage to filter out the unlikely songs in target. The flowchart of our proposed system is shown as in Fig. 1.
Proposed method (for pitch estimation) 對每個音符做傅立葉轉換,並取得前三個local maximum。最 低的選擇作為基本頻率。
Tempo Feature 除了音調之外,速度也是歌曲的重要組成部分。如果有兩個相 同的音調序列,每個音符的速度不同,它們仍然是不同的歌曲。
飄向北方 一是鄧子棋&黃明志 二是小小兵 三是0.9倍速
The note-based methods mainly include three parts: onset detection,1(起始檢測) pitch extraction2(音調提取) and melody matching3(旋律匹配). 5.In order to enhance the system speed and accuracy, we propose a two-stage matching QBH system. We apply the hidden Markov model (HMM) as the first stage to filter out the unlikely songs in target. The flowchart of our proposed system is shown as in Fig. 1.
Hidden Markov Model 傳統HMM的小麻煩 解決方案 :使用diffusion matrix 傳統HMM對唱錯音跟音符有多或少都不太能work得太好 當轉移發生的時候,我們會把可能發生的機率範圍擴大
The note-based methods mainly include three parts: onset detection,1(起始檢測) pitch extraction2(音調提取) and melody matching3(旋律匹配). 5.In order to enhance the system speed and accuracy, we propose a two-stage matching QBH system. We apply the hidden Markov model (HMM) as the first stage to filter out the unlikely songs in target. The flowchart of our proposed system is shown as in Fig. 1.
Dynamic Programming 為了解決人們可能用不同key唱歌的問題,使用音符的音調間隔 作為音高特徵。 由於人們對音調比節奏更敏感,所以DP的匹配過程主要是基於音 調間隔矩陣。 PitchInterval表示query and target半音不同的數量 MappingBeat表示query and target速度的差距。
為甚麼要用two stage melody matching? HMM :速度快,幫助我們快速篩選不可能的歌曲。 DP :速度慢,但識別率很高。 (第一階段HMM被用作在數據庫中篩選不可能的歌曲,數據庫中只有一小部分歌曲將存活到下一階段,第二階段是DP處理以進行更準確的匹配。)
The note-based methods mainly include three parts: onset detection,1(起始檢測) pitch extraction2(音調提取) and melody matching3(旋律匹配). 5.In order to enhance the system speed and accuracy, we propose a two-stage matching QBH system. We apply the hidden Markov model (HMM) as the first stage to filter out the unlikely songs in target. The flowchart of our proposed system is shown as in Fig. 1.
output 在確定了第二階段的匹配分數之後,最終的結果將會參考第一階 段HMM的分數與第二階段DP的分數綜合起來決定。 由於HMM影響較少,故DP分數的權重較HMM高。
其他能改進的地方 根據人們唱歌的習慣,可以把onset point 設在一開始或副歌。
The note-based methods mainly include three parts: onset detection,1(起始檢測) pitch extraction2(音調提取) and melody matching3(旋律匹配). 5.In order to enhance the system speed and accuracy, we propose a two-stage matching QBH system. We apply the hidden Markov model (HMM) as the first stage to filter out the unlikely songs in target. The flowchart of our proposed system is shown as in Fig. 1.
其 他 方式 Precision :正确被检索的item(TP)"占所有"实际被检索到的(TP+FP)"的比例. Recall :正确被检索的item(TP)"占所有"应该检索到的item(TP+FN)"的比例。
The note-based methods mainly include three parts: onset detection,1(起始檢測) pitch extraction2(音調提取) and melody matching3(旋律匹配). 5.In order to enhance the system speed and accuracy, we propose a two-stage matching QBH system. We apply the hidden Markov model (HMM) as the first stage to filter out the unlikely songs in target. The flowchart of our proposed system is shown as in Fig. 1.
10-second case. 20-second case. Method MRR DP 0.725 HMM 0.577 LS 0.639 [14] 0.728 [15] 0.750 [16] 0.742 Proposed Method 0.774 Method MRR DP 0.803 HMM 0.743 LS 0.754 [14] 0.831 [15] 0.941 [16] 0.929 Proposed Method 0.983 MRR是一個messurement用來衡量結果的正確性
回顧
總結 透過使用時頻分析(STFT)觀察頻率的變化,找出在Time domain難找到的onset point。 人們對於音調的變化比起節奏的變化更為敏感。 使用2階段melody matching的原因是因為HMM(快速篩掉不 對的歌曲)與DP(速度較慢但精準高)的特性。