1
Music Genre Classification 音樂曲風分類
Jyh-Shing Roger Jang (張智星) MIR Lab (多媒體資訊檢索實驗室) CSIE Dept, National Taiwan University
2
Intro. to Music Genre Classification
- Goal of MGC (music genre classification)
  - Classify an audio music clip into the right genre
- Approach
  - Typical two stages of training and test
- Similar applications
  - Mood classification (for playlist generation)
  - Artist/composer identification
  - Music therapy…
Note on GTZAN: The files were collected from a variety of sources, including personal CDs, radio, and microphone recordings, in order to represent a variety of recording conditions.
3
Commonly Used Datasets for MGC
- George’s dataset
  - second clips
  - 10 genres
  - 100 clips/genre
- MIREX 2007 genre dataset
  - second clips
  - 700 clips/genre
- Unique dataset
  - ~ sec clips
  - More than 10 genres
- Hainsworth dataset
  - 222 clips
  - 6 genres
- Million song dataset
- Many many more…
4
10 Genres of George’s Dataset
Blues, Classical, Country, Disco, Hip-hop, Jazz, Metal, Pop, Reggae, Rock
5
Features vs. Classifier
- Features
  - MFCC
    - Mean, var, min, max, etc.
  - Spectrogram
    - Spectral centroid, flux, rolloff, skewness, kurtosis
  - Gabor filters
  - Gaussian super vector
  - Octave-based spectral contrast
  - …
- Classifiers
  - Support vector machines
  - Nearest-neighbor classifiers
  - Gaussian mixture models
  - Naïve Bayes classifiers
  - Quadratic classifiers
  - Decision trees
  - Random forests
  - Deep neural networks
  - …
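As a rough illustration of the spectrogram statistics named on this slide, the sketch below computes spectral centroid, rolloff, and flux for a single magnitude spectrum with NumPy. The function names, the 0.85 rolloff fraction, and the Euclidean form of flux are assumptions chosen for illustration; the slides do not state which definitions were used, and several variants exist in the literature.

```python
import numpy as np

def spectral_centroid(magnitude, freqs):
    """Magnitude-weighted mean frequency of one spectrum frame."""
    total = magnitude.sum()
    return float((freqs * magnitude).sum() / total) if total > 0 else 0.0

def spectral_rolloff(magnitude, freqs, fraction=0.85):
    """Lowest frequency below which `fraction` of the total magnitude lies.

    The 0.85 default is an assumed, commonly seen choice.
    """
    cumulative = np.cumsum(magnitude)
    if cumulative[-1] == 0:
        return 0.0
    idx = int(np.searchsorted(cumulative, fraction * cumulative[-1]))
    return float(freqs[idx])

def spectral_flux(prev_magnitude, magnitude):
    """Euclidean distance between two consecutive magnitude spectra (one of several definitions)."""
    return float(np.sqrt(np.sum((magnitude - prev_magnitude) ** 2)))

# Toy spectrum with four frequency bins (hypothetical values).
freqs = np.array([0.0, 100.0, 200.0, 300.0])
frame = np.array([0.0, 1.0, 1.0, 0.0])
centroid = spectral_centroid(frame, freqs)
rolloff = spectral_rolloff(frame, freqs)
flux = spectral_flux(np.array([0.0, 0.0]), np.array([3.0, 4.0]))
```

In a full pipeline these per-frame values would typically be summarized over the clip (for example by mean and variance) before being passed to a classifier.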
6
A Baseline Classifier for MGC
- Features
  - Mean, variance, min, and max of MFCC
  - 39*4 features for each clip
- Classifiers
  - SVM, quadratic classifiers, GMM-based classifiers, sparse-representation classifiers…
- Performance
  - 77.00% via leave-one-out cross-validation of SVM classifier
- Reference
  - Published document
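The baseline feature vector can be sketched as follows, assuming each frame carries a 39-dimensional MFCC vector (which is how the slide's 39*4 count is read here). The function name `clip_level_features` and the random stand-in data are hypothetical; MFCC extraction itself is not shown.

```python
import numpy as np

def clip_level_features(mfcc_frames):
    """Summarize a (num_frames, 39) MFCC matrix into one clip-level vector.

    Concatenates per-dimension mean, variance, min, and max,
    giving 39 * 4 = 156 values per clip.
    """
    frames = np.asarray(mfcc_frames, dtype=float)
    return np.concatenate([
        frames.mean(axis=0),
        frames.var(axis=0),
        frames.min(axis=0),
        frames.max(axis=0),
    ])

# Stand-in for real MFCC frames: 200 frames of 39 coefficients (random, for illustration only).
rng = np.random.default_rng(0)
fake_mfcc = rng.normal(size=(200, 39))
features = clip_level_features(fake_mfcc)
```

One fixed-length vector per clip is what lets a standard classifier such as an SVM be applied regardless of clip duration.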
7
Combining Acoustic and Multi-level Visual Features for MGC
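- Combine the decisions from acoustic and visual features based on the proposed confidence-based late fusion
- Status
  - Ranked #1 in MIREX MGC contests of 2011, 2012, and 2013.
  - Published in ACM Trans.
[Figure panels: Classical, Hiphop, Disco]
Notes: Music genre classification aims to let a computer learn to recognize the genre of a song, such as classical, hip-hop, or disco, from its audio signal. The ultimate goal is to identify the genre of music the computer has never heard before. The distinguishing feature of our method is that we turn an auditory recognition problem directly into a visual recognition problem, because we found that genres differ markedly once the music is converted from a 1D signal into a 2D spectrogram. We therefore designed a set of visual features for the spectrogram, together with a confidence measure that helps combine the visual features with traditional acoustic features for music genre classification.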
8
Our Methods for MGC
- References
  - Ming-Ju Wu and Jyh-Shing Roger Jang, "Combining Acoustic and Multilevel Visual Features for Music Genre Classification", ACM Transactions on Multimedia Computing, Communications, and Applications, 2015.
  - Ming-Ju Wu and Jyh-Shing Roger Jang, "Combining Visual and Acoustic Features for Music Genre Classification", The Tenth International Conference on Machine Learning and Applications, Honolulu, Hawaii, USA, Dec 2011.
9
Detailed System Overview
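- Multi-level Visual Features
  - Based on spectrograms segmented by beat tracking
- Confidence-based Late Fusion
  - Factor 1: The distance between the test instance and the hyperplane in the Hilbert space
  - Factor 2: The distance between the test instance and its nearest neighbor in the Hilbert space
  - Confidence measure = Factor 1 / Factor 2
Notes: The visual features are divided into song-level and beat-level features. The beat information comes from beat tracking: the computer automatically detects the beats of the music, and the spectrogram is then segmented in time accordingly. We also compute the heterogeneity between beats. For confidence-based late fusion, our classifier is an SVM with an RBF kernel, and the RBF kernel maps data points into a Hilbert space. We assume that the farther a test song lies from the hyperplane in the Hilbert space, the higher the confidence. Likewise, the closer the test song lies to the training data in the Hilbert space, the more it resembles data seen during training, so the confidence is also higher. The final prediction is then chosen by comparing the confidence of the traditional acoustic features (GSV) with that of our visual features.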
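The confidence measure above can be sketched in a few lines, under stated assumptions. For an RBF kernel k, the squared feature-space distance between two points is k(x,x) + k(t,t) - 2k(x,t) = 2 - 2k(x,t), which gives Factor 2 without an explicit mapping. Factor 1 is passed in as a number here, since computing the distance to the hyperplane depends on the trained SVM, which is not shown. All function names, the `gamma` parameter, the small `eps` guard, and the tie-breaking rule are hypothetical and not taken from the published system.

```python
import numpy as np

def nearest_neighbor_feature_distance(x, train, gamma):
    """Distance from x to its nearest training point in the RBF-induced feature space."""
    sq_input = np.sum((np.asarray(train) - np.asarray(x)) ** 2, axis=1)
    kernel = np.exp(-gamma * sq_input)       # k(x, t) for every training point t
    sq_feature = 2.0 - 2.0 * kernel          # ||phi(x) - phi(t)||^2 for an RBF kernel
    return float(np.sqrt(max(sq_feature.min(), 0.0)))

def confidence_measure(hyperplane_distance, nn_distance, eps=1e-12):
    """Factor 1 / Factor 2; eps (an assumption) avoids division by zero."""
    return hyperplane_distance / (nn_distance + eps)

def late_fusion(acoustic, visual):
    """Each argument is (predicted_label, confidence); keep the more confident one.

    Ties go to the acoustic prediction, an arbitrary choice for this sketch.
    """
    return acoustic[0] if acoustic[1] >= visual[1] else visual[0]

# Toy usage with made-up numbers.
x = np.array([0.0, 0.0])
train = np.array([[1.0, 0.0], [0.0, 2.0]])
nn_dist = nearest_neighbor_feature_distance(x, train, gamma=0.5)
conf_visual = confidence_measure(hyperplane_distance=1.2, nn_distance=nn_dist)
label = late_fusion(("jazz", 0.8), ("rock", conf_visual))
```

Dividing by the nearest-neighbor distance rewards test songs that both sit far from the decision boundary and resemble something seen in training, which matches the two assumptions stated in the notes.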
10
Performance Evaluation
- MIREX task of music genre classification
  - 10 genres: Blues, Jazz, Country/Western, Baroque, Classical, Romantic, Electronica, Hip-Hop, Rock, HardRock/Metal
  - 7000 songs, with 3-fold cross validation and artist filtering
- MIREX results
  - Our submission is ranked no. 1 in 2011, 2012, and 2013.

Submission          Ranking (# of submissions)  Year  Accuracy
Our submission      1 (11)                      2013  76.23%
Our submission      1 (16)                      2012  76.13%
Our submission      1 (15)                      2011  75.57%
Seyerlehner et al.  1 (24)                      2010  73.64%
Cao and Li          1 (31)                      2009  73.33%
MARSYAS             1 (13)                      2008  66.41%
IMIRSEL M2K         1 (7)                       2007  68.29%

Notes: We took part in the music genre classification task of MIREX. MIREX is organized by Prof. Stephen Downie of the University of Illinois and is arguably the most important contest in the music information retrieval community. Compared with the winners of earlier years, our method took first place in three years, which shows its strength.
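The idea behind artist filtering is that all songs by one artist stay in the same fold, so a classifier is never tested on an artist it saw in training. The sketch below shows one simple way to build such folds. It is not MIREX's actual procedure, which the slides do not describe, and the function name and greedy balancing rule are assumptions.

```python
from collections import defaultdict

def artist_filtered_folds(song_artists, n_folds=3):
    """Assign each song a fold index so that no artist spans two folds.

    song_artists: list with one artist name per song.
    Artists with the most songs are placed first, each into the currently smallest fold.
    """
    songs_by_artist = defaultdict(list)
    for song_index, artist in enumerate(song_artists):
        songs_by_artist[artist].append(song_index)

    fold_sizes = [0] * n_folds
    fold_of_song = [None] * len(song_artists)
    ordered = sorted(songs_by_artist, key=lambda a: (-len(songs_by_artist[a]), a))
    for artist in ordered:
        target = fold_sizes.index(min(fold_sizes))
        for song_index in songs_by_artist[artist]:
            fold_of_song[song_index] = target
        fold_sizes[target] += len(songs_by_artist[artist])
    return fold_of_song

# Toy usage with hypothetical artist labels.
artists = ["a", "a", "b", "c", "c", "c", "d"]
folds = artist_filtered_folds(artists, n_folds=3)
```

Without this constraint, accuracy can be inflated because a model may recognize an artist's production style rather than the genre.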
11
Comparison of Audio Features
Quiz!
- Reproducible (can be used to reproduce the perceptible part of the original audio)
  - QBSH
    - Pitch
  - Speech recognition
    - MFCC: reproducible to some extent
- Not reproducible
  - Audio fingerprinting
    - Landmarks
  - Music genre/mood classification
    - Statistics over spectrogram, such as spectral centroid, flux, rolloff, skewness, kurtosis, etc.
12
Demo of MGC
- Demo of MGC (MIR Lab)