

									(IJCSIS) International Journal of Computer Science and Information Security, Vol. 2, No. 1, 2009

Analysis of Automatic Music Genre Classification System
S.S. Manvi Rajinder Kumar Math
Department of ECE B.L.D.E.A’S College of Engineering Bijapur, Karnataka, India

A. V. Sutagundar
Department of ECE, Basaveshwar Engineering College Bagalkot, Karnataka, India

Department of ECE, Wireless Information Networks Research Laboratory, Reva Institute of Technology and Management, Bangalore, Karnataka, India

Abstract— This paper analyzes an automatic Music Genre Classification System that classifies Mp3 files quickly. The designed system not only provides accurate classification but also helps search and manage music files. The design is divided into two parts: feature extraction and classification. In the feature extraction part, the important features of the Mp3 file are extracted and statistics are obtained for the extracted features. Histograms are plotted for all statistics corresponding to each feature. A simple algorithm is used for measuring the similarity between different classes of music. In this work, a classifier sorts the music files into genres by employing a histogram-based class separability measure. The classifier estimates the probability density function of each class from the feature statistics and from it calculates the histogram error of each class against the other classes. The probability of error is evaluated and used to group the Mp3 files into genre classes. Finally, a ranked list of classified genres is generated.

Keywords: Information Retrieval; Genre Classification; automatic genre classification.

I. INTRODUCTION

Information Retrieval (IR) is concerned with the processes involved in the representation, storage, searching and finding of information that is relevant to a requirement of a human user. Content based information retrieval is important to content based navigation because the issue of content representation has to be addressed. Techniques for indexing and approximate matching are also prevalent in this field. As such, before considering content based navigation, it is useful to investigate content based information retrieval systems. Retrieval has an additional use as a means of locating an initial media document before the document space can be navigated.

Music Information Retrieval (MIR) is an emerging research area devoted to fulfilling users' music information needs. Despite the emphasis on retrieval in its name, MIR encompasses a number of different approaches aimed at music management, easy access, and enjoyment. The term “digital audio sample” is used here to refer to short, descriptive sounds such as a bird chirping, a bell ringing or a car starting, as in [1] and [2]. Content based retrieval is important to content based navigation because the issues of content representation, and of techniques for indexing and performing approximate matching on that representation, have to be considered. Indeed, the difference between retrieval and navigation is quite subtle: items containing the query are located in content based retrieval, while items relating to the query are returned in navigation.

The term genre comes from the Latin word genus, which means kind or class. A genre might be described as a type or category, defined by structural, thematic, or functional criteria [3]. It is important to note that these criteria are not determined by objective facts but are part of a dynamic cultural process. Automatic music genre classification is the classification of a piece of music into its corresponding genre (such as jazz or rock) by a computer. It is considered to be a cornerstone of the research area Music Information Retrieval (MIR) and is closely linked to the other areas in MIR. It is thought that MIR will be a key element in the processing, searching and retrieval of digital music in the near future.

There have been a number of interesting research works on automatic genre classification of audio files. The work in [4] and [5] used a variety of low-level features to achieve a success rate of 61% when classifying among ten genres. Additional research based on audio recordings has been performed in [6], where the music was successfully classified into one of six categories 92% of the time. In [7] a system was constructed that correctly classified among three categories 75% of the time. The paper [8] correctly classified 90.8% of recordings into five genres. The work in [9] achieved a success rate of 88% with three genres. In [10] a success rate of 73.3% was achieved when classifying among five categories. The paper [11] achieved a success rate of 93% with four categories. The work in [12] achieved a success rate of 74% with seven categories. The work in [13] correctly performed three-way classifications 63% of the time. Paper [14] achieved success rates between 64% and 84% for two-way classifications. Although these studies are very interesting, they focus more on pattern classification techniques than on features.


Paper [15] combines features derived from audio recordings with “community metadata” derived from text data mined from the web. Although beyond the scope of this paper, this line of research holds a great deal of potential, despite the problems related to finding, parsing and interpreting the metadata.

When working on a complex problem, it is often desirable to see if it can be split into individual parts that can be tackled independently. Music genre recognition is a classification problem, and as such consists of two basic steps: feature extraction and classification. The goal of the first step, feature extraction, is to get the essential information out of the input data. The second step, classification, is to find which combinations of feature values correspond to which categories. The two steps can be clearly separated: the output of the feature extraction step is the input for the classification step.

This work is concerned with music genre classification systems, and in particular with systems that use the raw audio signal and its content as input to estimate the corresponding genre [4]. The considered systems can basically be seen as a feature representation of the song followed by a classification system that predicts the genre.

The rest of the paper is organized as follows: section II presents the music genre types, section III depicts the feature extraction, section IV describes classification, section V explains the system model, section VI presents the results and discussions and finally section VII concludes the work.

II. MUSIC GENRE TYPES

A genre might be described as a type or category, defined by structural, thematic, or functional criteria [4]. A genre is a patterning of communication created by a combination of the individual (cognitive), social, and technical forces implicit in a recurring communicative situation. A genre provides communication by creating shared expectations about the form and content of the interaction, thus easing the burden of production and interpretation. The list of genres turns out to be never ending, so some basic genres are defined that are found in almost all music collections.

A. Classical
The songs of this class have a predominance of classical instruments like violin, cello, piano, flute, and so forth. This class has the following divisions.
(1) Instrumental: the songs of this genre do not have any vocal elements.
• Piano: songs dominated by or composed exclusively for piano.
• Orchestra: songs played by orchestras.

B. Pop/Rock
This is the largest class, including a wide variety of songs. It is divided according to the following criteria.
(1) Organic: this class has a prevalence of electric guitars and drums; electronic elements are mild or not present.
• Rock: songs with a strong predominance of electric guitars and drums.
• Heavy metal: this genre is noisy, fast, and often has very aggressive vocals.
(2) Country: songs typical of the southern United States, with elements from both rock and blues. Electric guitars and vocals are predominant.
(3) Electronic: most of the songs of this class have a predominance of electronic elements, usually generated by synthesizers.
• Pop: the songs of this class are characterized by the presence of electronic elements. The beat is slower and less repetitive than in techno songs, and vocals often play an important role.
• Disco: songs typical of the late 1970s, with a very particular beat. Electronic elements are also present here, but they are less marked than in pop songs.
• Techno: this class has a fast and repetitive electronic beat.

C. Dance
The songs that compose this third and last general musical class have strong percussive elements and a very marked beat. This class is divided according to the following rules.
(1) Vocal: vocals play the central role in this class of songs. Percussive elements also have a strong presence, but are not as significant as vocals.
• Hip-Hop: these songs have a strong predominance of vocals and a very marked rhythm.
• R & B: the songs of this genre are soft and slow.
• Rap: this genre presents strongly marked vocals, sometimes resembling pure speech.
(2) Reggae: typical music of Jamaica that has a very particular beat and rhythm.
i. Jazz: this class is characterized by the predominance of instruments like piano and saxophone. Electric guitars and drums can also be present.
• Blues: a vocal and instrumental genre with a strong presence of guitars, piano and harmonica.
• Fusion: a mix of jazz and rock elements.
ii. Latin: this class is composed of Latin rhythms like salsa, mambo, and samba. The songs of this genre have a strong presence of percussion instruments and, sometimes, guitars.
• Mambo/Salsa: dancing Caribbean rhythms with a strong presence of percussive drums and tambours.
• Samba: strongly percussive Brazilian genre with predominance of instruments like tambourines, small and guitars.

III. FEATURE EXTRACTION


Musical feature extraction is the process of recording and generating numerical representations of traits of the recording that are, ideally, characteristic of the category or categories to which it should be assigned. These features can then be grouped together into feature statistics that serve as the input to classification systems. Choosing the appropriate features can be a difficult task, as it is often unclear ahead of time which features will be useful. Furthermore, it is not always easy to extract the kind of features that one would ideally like. Feature extraction is one of two commonly used preprocessing techniques in classification; it means that new features are generated from the raw data by applying one or more transformations. The other technique is feature selection, the process of identifying a subset of features within the input data that can be used for effective classification. Feature selection can be applied to the original data set or to the output of a feature extraction process. A classification system might use either or both of these techniques. Theoretically, it is also possible to use the raw data, if these are already in a format suitable for classification. A schematic overview of the connection between feature selection and feature extraction is shown in Figure 1, which shows how feature statistics are generated from the extracted features that are important for classification.

Figure 1. Generating feature statistics from an input data set.

The selection of appropriate features is very important because it plays a major role in the classification. In this work, the majority of the short-time features are chosen together. The next sections are devoted to the description of the features chosen for the classification.

A. Frequency Spectrum
The frequency distribution of a signal can be calculated through the Fourier transform. Similar processes also happen in the auditory perception system of humans and other vertebrates, which indicates that the spectral composition of a signal is indeed the primary carrier of information. The frequency spectrum is one of the essential features for music genre classification, and is used as the basis for deriving many other features.

B. Mel Frequency Cepstral Coefficients (MFCCs)
MFCCs are a well-known compact representation of speech. They are the most common representation used for spectra in Music Information Retrieval (MIR). A brief algorithm for their computation is shown below:
Begin
1. Apply Hamming window function
2. Compute Discrete Fourier Transform (DFT)
3. Apply Mel Scale
4. Apply Log Scale
5. Apply Discrete Cosine Transform (DCT)
End.
MFCCs are the features we used to represent a song model. They are computed directly from an audio waveform using the approach in Figure 2.

Figure 2. Computation of MFCCs from an audio signal.

The audio signal is first divided into frames of equal length. In this work, the frame length was 50 milliseconds (genre classifier). These frame lengths were based on the default used in MATLAB to calculate MFCCs; different MFCC calculation approaches were used for the genre and artist classifiers. Successive frames overlap the same segment of the time-domain signal by 50%. This is done to smooth the transition from frame to frame. A Hamming window, which smooths the edges, is then applied to each frame. The Discrete Fourier Transform (DFT) is then taken for each frame. The steps shown in Figure 2 basically compute the spectrogram of the time-domain signal. The spectrogram gives a frequency vs. time visualization of the input signal. The cause of the variations of the frequencies over time can be determined by listening to the song concurrently with the spectrogram visualization. Next, the logarithm of the amplitude spectrum is taken for each frame, and then the Mel scale is applied to remove the linearity within the frequency scale. Taking the logarithm of the amplitude spectrum provides information about the spectral envelope of the signal, and computing this over all frames shows how the spectral envelope varies with time.

The Mel scale is a nearly logarithmic scale that gives a higher emphasis to the lower frequencies. It is based on a mapping between actual frequency and the pitch perceived by the human auditory system. The Mel scale sections a range of frequency values into a number of bins. The number of bins used to adjust the frequency scale can vary and depends on the application. The bins are also logarithmically spaced, and the number of bins defines the dimensionality of the MFCC feature. In order to account for both the low and high varying spectral envelope, which includes timbre and pitch measures, this work incorporates 20-dimensional Mel-frequency Cepstral Coefficients. The timbre of the song can be represented by the first 8 bins of an MFCC vector, while the last 10 bins can represent the pitch of a song. The first channel, or bin, of the MFCC matrix is the overall energy of the signal. It generally defines how the loudness of the song varies with time. The final step in computing the MFCCs of a signal consists of taking the Discrete Cosine Transform (DCT) to reduce the number of parameters in the system and to remove the correlation between the Mel-spectral vectors calculated for each frame. The MFCCs have important advantages: they are simple, fast and well tested. Moreover, they offer a compressed and flexible (i.e. easy to handle) representation.

C. Zero-Crossing Rate (ZCR)
A zero-crossing occurs when successive samples in a digital signal have different signs. Therefore, the rate of zero-crossings can be used as a simple measure of a signal’s frequency content. The Zero-Crossing Rate (ZCR) also has a background in speech analysis. This very common short-time feature has been used for music genre classification. It is simply the number of time-domain zero-crossings in a time window. The ZCR curves are calculated as follows:

$$\mathrm{ZCR} = Z_n = \sum_{m} \left| \operatorname{sgn}[x(m)] - \operatorname{sgn}[x(m-1)] \right| \, w(n-m),$$

where

$$\operatorname{sgn}[x(n)] = \begin{cases} 1, & x(n) \ge 0 \\ -1, & x(n) < 0 \end{cases}$$

and

$$w(n) = \begin{cases} \tfrac{1}{2}, & 0 \le n \le N-1 \\ 0, & \text{otherwise.} \end{cases}$$

N is the window size in this short-time function.

D. Short-Time Energy (STE)
Short-Time Energy is correlated to the Fourier Transform, which maps the audio signal into the frequency domain. For each audio sample set, we compute the Fast Fourier Transform (FFT). Short-Time Energy (STE) has been used in speech and music analysis as well as many other areas. It is used to distinguish between speech and silence, but is mostly useful at high signal-to-noise ratios. It is a very common short-time feature in music genre classification and was used in one of the earliest approaches to sound classification to distinguish between (among other things) different musical instrument sounds. Short-Time Energy is calculated as

$$\mathrm{STE} = \frac{1}{N} \sum_{t=n-N+1}^{n} x(t)^{2}$$

E. Spectral Centroid (SC)
The centroid is an interesting psycho-acoustical feature that measures the mean spectral frequency in relation to the amplitude; in other words, the position in Hz of the center of mass of the spectrum. It is useful as a measure of the brightness of the sound. It is given as:

$$\mathrm{SC} = C = \frac{\sum_{k=0}^{N/2} f_k \, |X(k)|}{\sum_{k=0}^{N/2} |X(k)|}$$

where N is the FFT size, X(k), k = 0, …, N, is the FFT of the input signal, and f_k, k = 1, …, N, is the k-th frequency bin.

F. Spectral Roll-off (SR)
The spectral roll-off is the frequency below which 85% of the magnitude of the spectrum is contained. It also provides a measure of the density of low frequencies in the signal. Mathematically, the roll-off is X_R in the equation:

$$\sum_{f=1}^{X_R} S(f) = 0.85 \sum_{f=1}^{M} S(f), \qquad \mathrm{SR} = X_R$$

G. Spectral Flux (SF)
The spectral flux measures the difference between spectra of subsequent time frames. It is defined as:

$$\mathrm{SF} = X_F = \sum_{f=1}^{M} \left( S(f) - S(f-1) \right)^{2}$$
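The frame-level features of subsections C to G can be illustrated with a short sketch in plain Python. This is not the authors' MATLAB code; all function names are hypothetical. It assumes a naive DFT in place of an FFT, takes STE as the mean squared amplitude of the frame, returns the roll-off as a bin index rather than a frequency, and computes the flux between the magnitude spectra of two consecutive frames, following the verbal definition in subsection G.

```python
import cmath
import math


def sign(v):
    # sgn as defined in the text: 1 for v >= 0, -1 otherwise
    return 1 if v >= 0 else -1


def zero_crossing_rate(frame):
    # Each sign change contributes |sgn - sgn| = 2; the weight w = 1/2
    # turns that into one count per crossing.
    changes = sum(abs(sign(frame[m]) - sign(frame[m - 1]))
                  for m in range(1, len(frame)))
    return changes / 2


def short_time_energy(frame):
    # Assumption: mean of the squared samples in the frame
    return sum(v * v for v in frame) / len(frame)


def magnitude_spectrum(frame):
    # Naive O(N^2) DFT, magnitudes of bins 0..N/2 (fine for short frames)
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]


def spectral_centroid(mag, sample_rate, n_fft):
    # Magnitude-weighted mean frequency in Hz; bin k sits at k * fs / N
    total = sum(mag)
    if total == 0:
        return 0.0
    return sum(k * sample_rate / n_fft * m for k, m in enumerate(mag)) / total


def spectral_rolloff(mag, fraction=0.85):
    # Smallest bin index below which `fraction` of the magnitude lies
    target = fraction * sum(mag)
    running = 0.0
    for k, m in enumerate(mag):
        running += m
        if running >= target:
            return k
    return len(mag) - 1


def spectral_flux(mag, prev_mag):
    # Squared difference between spectra of two consecutive frames
    return sum((a - b) ** 2 for a, b in zip(mag, prev_mag))
```

As a usage example, a 32-sample sine with four cycles per frame, analysed with `sample_rate=32`, concentrates its magnitude in bin 4, so both the centroid and the roll-off point to that bin.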

H. Rhythm and Beat
Tapping a foot along with a piece of music is a common task for human listeners, yet it has turned out to be remarkably difficult for automated systems. Beat tracking and rhythm detection is a large and rapidly evolving research area in itself. An in-depth discussion is not possible within the limitations of this work. Refer to [16] for an overview of beat tracking methods; a system for tracking musical beats in real time is proposed in [17]. There are a number of factors that make beat tracking difficult, but that are no problem for the human auditory perception system. For instance, we can easily determine the beat even if tempo and metrical structure are not explicitly specified at the beginning of the song, something that most commercially available beat tracking systems depend on. If the tempo changes throughout the song, we can easily adapt to this within seconds, while most current automatic systems are unable to adjust. Furthermore, beat tracking systems usually cope poorly with noise, i.e. deviations from the expected timing. Another important issue is dealing with syncopation: sections where salient events occur between the beats and not on the beat [16].

IV. CLASSIFICATION


Statistical pattern classification refers to a series of techniques where the extracted feature statistics are assigned to one of the classes. Classification algorithms are divided into two types, unsupervised and supervised. In the supervised case, a labeled set of training samples is provided that is used to “train” the classification algorithm. Unsupervised classification, or clustering, tries to group the data into meaningful clusters without the use of a labeled training set. Another way of dividing classification algorithms is into parametric and non-parametric algorithms. In parametric approaches, the functional form of the underlying probability distribution of the feature vectors for each class is known. In non-parametric approaches, no assumption about the functional form is made and the underlying probability distribution is approximated locally based on the training set.

The feature extractor computes feature vectors representing the data to be classified. These feature vectors are then used to assign each object to a specific category. This is the classification part, which constitutes the second basic building block of a music genre classification system. There is a large variety of classification techniques to choose from. The fact that many of these can be used as black-box algorithms makes it tempting to just apply one without understanding it. However, a basic understanding of the subject is essential in order to choose the right classifier. A short overview is therefore given below, including the general concepts, mathematical background, and basic ideas behind the most widely used classifiers. Classification is a subfield of decision theory. It relies on the basic assumption that each observed pattern belongs to a category, which can be thought of as a prototype for the pattern.
Regardless of the differences between the individual patterns, there is a set of features that are similar in patterns belonging to the same class, and different between patterns from different classes. These features can be used to determine class membership. For all types of classifiers, it is important to use an efficient algorithm that classifies the audio inputs with low computational complexity while preserving accuracy. Classifiers can be broadly divided into two types: (a) supervised classifiers, which require training data, and (b) unsupervised classifiers, which do not require any training data. The three most commonly used supervised classifiers are the Gaussian Mixture Model (GMM) classifier, the K-Nearest Neighbor (KNN) classifier and the Artificial Neural Network (ANN) classifier. K-means is a simple unsupervised classifier.
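As a concrete instance of a supervised classifier, a K-Nearest Neighbor vote over labeled feature vectors might be sketched as follows. This is a minimal illustration only: the function name `knn_classify`, the Euclidean distance and the two-dimensional toy feature vectors are assumptions made for the example and are not taken from the system analyzed in this paper.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Return the majority label among the k training vectors closest to query.

    train: list of (feature_vector, label) pairs (the labeled training set)
    query: feature vector of the item to classify
    """
    # Sort the training samples by Euclidean distance to the query
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Majority vote among the k nearest neighbors
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy training set: two made-up feature statistics per song
train = [
    ((0.0, 0.0), "classical"),
    ((0.0, 1.0), "classical"),
    ((5.0, 5.0), "rock"),
    ((5.0, 6.0), "rock"),
    ((6.0, 5.0), "rock"),
]
label = knn_classify(train, (0.2, 0.3), k=3)
```

With k = 1 this reduces to the nearest-neighbor rule; larger k trades sensitivity to individual training samples for smoother decision boundaries.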

The Gaussian classifier assumes each class can be represented as a multidimensional Gaussian distribution in feature space. The parameters of the distribution (means and covariance matrix) are estimated using the labeled data of the training set. This classifier is typical of parametric statistical classifiers that assume a particular form for the underlying class probability density functions. It is more appropriate when the feature distribution is unimodal. The locus of points at equal distance from a particular Gaussian classifier is an ellipse, which is axis-aligned if the covariance matrix is diagonal.

The Gaussian Mixture Model (GMM) classifier models each class as a fixed-size weighted mixture of Gaussian distributions. The GMM classifier is characterized by the mixture weights and by the means and covariance matrices of each mixture component. For efficiency reasons, the covariance matrices are typically assumed to be diagonal.

Unlike parametric classifiers, the K-Nearest Neighbor (KNN) classifier directly uses the training set for classification without assuming any mathematical form for the underlying class probability density functions. In the nearest-neighbor rule, each sample is classified according to the class of its nearest neighbor in the training data set. In the KNN classifier, the K nearest neighbors of the point to be classified are found and voting is used to determine the class. An interesting result about the nearest-neighbor classifier is that, no matter what classifier is used, we can never do better than to cut the error rate in half over the nearest-neighbor classifier.

All the previously described classifiers use a labeled training set (supervised learning). The K-means algorithm is a well-known technique for unsupervised learning, where no labels are provided and the system automatically forms clusters based solely on the structure of the training data. In the K-means algorithm each cluster is represented by its centroid and the clusters are iteratively improved. The same algorithm can also be used for Vector Quantization (VQ), a technique where each feature vector is coded as an integer index into a codebook of representative feature vectors corresponding to the means of the clusters.

V. SYSTEM MODEL

In this section the system model of the proposed Automatic Music Genre Classification System is discussed using the block diagram shown in Figure 3, followed by an algorithm.

A. Feature Extraction
In the feature extraction part, the directory containing the different classes of Mp3 files is input to the system. Each Mp3 is read and its header information, which consists of the artist name, album name and song title, is extracted. Each Mp3 file is broken into short-term non-overlapping Hamming-windowed frames of 50 milliseconds. For each frame, the Fast Fourier Transform is applied and 8 features are calculated: Energy Entropy, Energy, Zero Crossing Rate, Spectral Roll-off, Spectral Centroid, Spectral Flux, MFCCs and beats. This step leads to 8 feature sequences for the whole Mp3 signal. Then, for each of the 8 feature sequences, a simple statistic is calculated, for example the standard deviation of the Spectral Roll-off sequence. This step leads to 8 single statistic values (one for each feature sequence). Those 8 values are the final feature values that characterize the input Mp3 signal. MATLAB programs were written for computing the features and feature statistics for Mp3 files under different classes. In order to compute the 8 feature statistics for a specific Mp3 file, the MATLAB function computeAllStatistics (fileName, win, step) is used.

B. Classification
We have used a histogram-based measure for class separability. Once the feature statistics for each class are obtained, the histograms of each feature statistic for all classes are estimated. The proposed classification error estimation method is based on estimating the probability density function of each class using histograms. MATLAB functions are used for classification. The function that estimates class separability is computeHistError(); theoreticalError() computes the theoretical error for each class as opposed to the other classes; and testClassSeperability() calls these two functions and displays the results. It has to be noted that computeHistError() can be used for any kind of class distribution, since it estimates the pdf of each class using the histogram method.

Figure 3. Block diagram of the designed Automatic Music Genre Classification System.

C. Algorithm
Algorithm Description: The Music Genre Classification algorithm is divided into two modules: (a) the Feature Extraction module and (b) the Classification module. The output of the feature extraction module is the input to the classification module. These modules are discussed one by one.

Module 1: Feature Extraction
Description: This module reads the Mp3 file, extracts the features corresponding to the file, computes statistics for the extracted features and finally computes the histograms for the statistics. In the block diagram, steps 1 through 8 correspond to the feature extraction module.
Nomenclature: Dirmp3 = directory of Mp3 files, Mp3i = Mp3 file in the directory, where i is the index value of the Mp3 file, H = Hamming window, FFT = Fast Fourier Transformation, EE = Energy Entropy, ZCR = Zero Crossing Rate, STE = Short Time Energy, SR = Spectral Rolloff, SF = Spectral Flux, SC = Spectral Centroid, MFCCs = Mel Frequency Cepstral Coefficients, pl = number of window frames, where l is the index value of p, n = number of files in the directory, Hist = histogram of statistics, Cep = cepstrum of Mp3 signal, Std = standard deviation.
Begin
Step 1: Input the Mp3 directory to the system.
Step 2: Fetch one Mp3 file.
For i = 1 to n do,
Begin
Step 3: Read Mp3i and extract the header information.
Step 4: The header gives the ID3v1 tag information in the form of artist name, album name and song title.
Step 5: Apply the Hamming window, H = hamming (Mp3i)
Step 6: For l = 1 to number of frames do, apply the FFT to the read file, given as FFT = fft (MP3i)
Step 7: Compute the following features and thereby obtain feature statistics for the Mp3 file.
• Compute Spectral Rolloff for the Mp3 file as SR = Spectral Rolloff (Mp3i); StatSR = Std (SR)
• Compute Short Time Energy for the Mp3 file as STE = Short Time Energy (Mp3i). Compute the standard deviation by mean ratio for STE as StatSTE = Std/mean ratio (STE)
• Compute Spectral Centroid for the Mp3 file as SC = Spectral Centroid (Mp3i). Compute the standard deviation of SC as StatSC = Std (SC)
• Compute Spectral Flux for the Mp3 file as SF = Spectral Flux (Mp3i)


Compute the standard deviation by mean ratio for SF as StatSF = Std/mean ratio (SF)
• Compute the Zero Crossing Rate for the Mp3 file as ZCR = Zero Crossing Rate (Mp3i). Compute the standard deviation of the Zero Crossing Rate as StatZCR = Std (ZCR)
• Compute Energy Entropy for the Mp3 file as EE = Energy Entropy (Mp3i). Compute the standard deviation of EE as StatEE = Std (EE)
• Compute Beats: statistical measures of the beat histogram, such as the mean, standard deviation, mean of the derivative and standard deviation of the derivative, are evaluated to obtain an overall measure of beat strength. B = Beats (Mp3i). Compute the standard deviation of B as StatB = Std (B)
• Compute MFCCs as MFCC = Mel Frequency Cepstral Coefficients (Mp3i). Obtain the first five vectors and compute statistics as StatMFCC = Std (MFCC)
End.
Step 8: Compute the histogram of all the statistics obtained in Step 7, given as Hist = histogram (StatSR, StatSTE, StatSC, StatSF, StatZCR, StatEE, StatB, StatMFCC).
End.

Module 2: Classification
Description: This module classifies the Mp3 songs into different genres based on a simple similarity measure algorithm. The classifier employed here is unsupervised and does not require any training data. The histogram-based class separability measure is used for the classification. The input to the classifier is the histogram of statistics generated in Step 8. In the block diagram, steps 8 and 9 are the input and output of the classifier module respectively.
Nomenclature: PDF = probability density function of the music classes, computeHistError = computes the histogram error for each class, theoreticalError = computes the theoretical error, testClassSeperability = computes the class separability in terms of theoretical error and histogram error. E1 = probability that a sample from class 1 is (mis)classified to class n, En = probability that a sample from class n is (mis)classified to class 1, H1 = histogram of Mp3 in class 1, Hn = histogram of Mp3 in class n, x = histogram bin, m1 = average value of the Gaussian distribution of the 1st class, s1 = standard deviation of the Gaussian distribution of the 1st class, mn = average value of the Gaussian distribution of the nth class, sn = standard deviation of the Gaussian distribution of the nth class, PDF1 = probability density function of class 1, PDFn = probability density function of class n.
Step 9: Compute the probability density function for each feature-statistic class in the directory as
• PDF = pdf (Mp31, Mp32, …, Mp3n) by using the histograms of feature statistics.
• Initialize E1 = 0, En = 0; for each bin i do: if (H1(i) < Hn(i)) E1 = E1 + H1(i) else En = En + Hn(i)
• Compute the normalized histogram error for each class as ComputeHistError = (E1+En)/(H1+Hn)
• Compute the theoretical error by calculating the PDF of the classes
• Generate PDFs for class 1 and class n

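$$\mathrm{PDF}_1(x) = \frac{1}{\sqrt{2\pi}\, s_1} \exp\!\left(-\frac{(x - m_1)^2}{2 s_1^2}\right)$$

$$\mathrm{PDF}_n(x) = \frac{1}{\sqrt{2\pi}\, s_n} \exp\!\left(-\frac{(x - m_n)^2}{2 s_n^2}\right)$$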
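The histogram error of Step 9 and a Gaussian theoretical error can be sketched in Python as follows. This is a minimal illustration, not the authors' MATLAB implementation. The function names hist_error, gaussian_pdf and theoretical_error are hypothetical. The denominator (H1 + Hn) is assumed to mean the total count of both histograms, and the theoretical error is assumed to be the equal-prior Bayes error, i.e. half the area under the pointwise minimum of the two Gaussian densities, which the text does not specify. The feature values in the demonstration are synthetic.

```python
import numpy as np


def hist_error(h1, hn):
    """Normalized histogram error, following the loop of Step 9.

    For each bin, H1(i) is added to E1 when H1(i) < Hn(i); otherwise
    Hn(i) is added to En. Assumption: (H1 + Hn) in the text denotes the
    total count of both histograms.
    """
    h1 = np.asarray(h1, dtype=float)
    hn = np.asarray(hn, dtype=float)
    h1_is_smaller = h1 < hn
    e1 = h1[h1_is_smaller].sum()
    en = hn[~h1_is_smaller].sum()
    return (e1 + en) / (h1.sum() + hn.sum())


def gaussian_pdf(x, m, s):
    """Gaussian density with mean m and standard deviation s."""
    return np.exp(-((x - m) ** 2) / (2.0 * s ** 2)) / (np.sqrt(2.0 * np.pi) * s)


def theoretical_error(m1, s1, mn, sn, num=20001):
    """Assumed definition: 0.5 * integral of min(PDF1, PDFn) over x."""
    lo = min(m1 - 8.0 * s1, mn - 8.0 * sn)
    hi = max(m1 + 8.0 * s1, mn + 8.0 * sn)
    x = np.linspace(lo, hi, num)
    overlap = np.minimum(gaussian_pdf(x, m1, s1), gaussian_pdf(x, mn, sn))
    dx = x[1] - x[0]
    # Simple Riemann sum; the integrand is negligible at both ends of the grid.
    return 0.5 * overlap.sum() * dx


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for one feature statistic (e.g. StatZCR) of two classes.
    class1 = rng.normal(0.10, 0.02, 500)
    classn = rng.normal(0.16, 0.03, 500)
    edges = np.linspace(0.0, 0.3, 31)  # shared bin edges for both histograms
    h1, _ = np.histogram(class1, bins=edges)
    hn, _ = np.histogram(classn, bins=edges)
    print("histogram error:", hist_error(h1, hn))
    print("theoretical error:",
          theoretical_error(class1.mean(), class1.std(), classn.mean(), classn.std()))
```

With the accumulation rule of Step 9, E1 + En equals the sum over bins of the smaller of the two counts, so identical histograms give a value of 0.5 and non-overlapping histograms give 0 under the assumed normalization.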
End.

VI.



To evaluate the performance of our system, we use datasets of known genres. Each dataset is given to the system as a directory, and the results are evaluated based on the correct classification of the provided genre classes. For small databases containing only a few different styles of music, the system can very accurately sort the database into a list of genre groupings. It is less successful for large databases containing a wide range of genres; however, it still produces broad genre groupings (though these may overlap with other genres). Four different datasets were used for testing. The datasets used for the evaluation were downloaded from All the music files are in Mp3 format with bit rates of 128 kbps and 192 kbps.

DATASET A
TABLE I shows the distribution of songs in the experimental DATASET A, which is a collection of 11 songs. The genre classes were known, and the songs were kept in different folders named after the genre class to which they belong. The accuracy of the system was evaluated by inputting the directory in which these Mp3 files are stored. For DATASET A, the system was able to successfully classify 9 of the 11 songs into their respective classes. The remaining two songs were classified as “others”.

TABLE I. Song distribution of DATASET A

Genre        No. of songs
Classical    5
Blue         2
Country      1
Hip-hop      2
World        1
Total        11

DATASET C
TABLE III shows the genre and song composition of DATASET C, which has 10 Mp3 songs belonging to 6 different genre classes. When DATASET C was given as input to the developed system, the system successfully classified 7 of the 10 songs into their respective genre classes and identified 5 of the 6 genre classes.

TABLE III. Genre and song composition of DATASET C

Genre        No. of songs
Classical    1
Jazz         3
Rap          1
Punk         2
Techno       2
Country      1
Total        10

For DATASET A, which has five genre classes and a total of 11 songs, the system thus reaches an accuracy of about 80% (9 of 11 songs, or 81.8%).

DATASET B
TABLE II shows the genre and song composition of DATASET B, which has 20 Mp3 songs belonging to 14 different genre classes. When DATASET B was given as input to the developed system, the system successfully classified 16 of the 20 songs into their respective genre classes and identified 13 of the 14 genre classes. Thus, with DATASET B as the input, the system obtains an accuracy of 80% for 14 genre classes and 20 songs.

Thus, with DATASET C as the input, the system obtains an accuracy of 70% for 6 genre classes and 10 songs.

DATASET D
TABLE IV shows the genre and song composition of DATASET D, which has 20 Mp3 songs belonging to 12 different genre classes. When DATASET D was given as input to the developed system, the system successfully classified 13 of the 20 songs into their respective genre classes and identified 10 of the 12 genre classes. Thus, with DATASET D as the input, the system obtains an accuracy of 65% for 12 genre classes and 20 songs.

TABLE IV. Genre and song composition of DATASET D

Genre          No. of songs
Jazz           3
Rap            1
Punk Rock      1
Blues          2
Dance          2
Pop            3
Hip Hop        1
Rock           2
Power Ballad   1
Trip Hop       1
Hindi          2
Reggeaton      1
Total          20

The overall accuracy of the developed system is plotted in Figure 4, which shows the individual accuracies for DATASET A, DATASET B, DATASET C and DATASET D as well as the overall accuracy of the system.

The designed music genre classification system has a number of strengths in terms of simple design and user friendliness. At present it can classify only music files in Mp3 format; as an extension, the system can be modified to handle music files in other formats such as MIDI and WAVE. The system uses MATLAB programs for the classification of music, which are platform independent. It uses a simple unsupervised classifier that does not require any training data. As discussed earlier, the Gaussian Mixture Model (GMM), K-Nearest Neighbor (KNN) and Artificial Neural Network (ANN) classifiers are all supervised classifiers, which require a set of training data that is difficult to provide beforehand and makes the design complex.
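The reported accuracies can be checked from the counts of correctly classified songs (9 of 11, 16 of 20, 7 of 10 and 13 of 20). The short Python sketch below recomputes them. The text does not state how the overall accuracy in Figure 4 is obtained, so both a pooled figure (all correct songs over all songs) and a plain mean of the four dataset accuracies are shown as assumptions; the variable names are illustrative.

```python
# (correctly classified songs, total songs) as reported for each dataset
results = {
    "DATASET A": (9, 11),
    "DATASET B": (16, 20),
    "DATASET C": (7, 10),
    "DATASET D": (13, 20),
}


def accuracy(correct, total):
    """Classification accuracy in percent."""
    return 100.0 * correct / total


per_dataset = {name: accuracy(c, t) for name, (c, t) in results.items()}

# Assumption 1: overall accuracy pooled over all songs.
pooled = accuracy(sum(c for c, _ in results.values()),
                  sum(t for _, t in results.values()))

# Assumption 2: overall accuracy as the unweighted mean of the datasets.
mean_of_datasets = sum(per_dataset.values()) / len(per_dataset)

for name, value in per_dataset.items():
    print(f"{name}: {value:.1f}%")
print(f"pooled over all songs: {pooled:.1f}%")
print(f"mean of dataset accuracies: {mean_of_datasets:.1f}%")
```

The two overall figures differ slightly because the datasets are of different sizes, so the choice between them should be stated whenever a single overall accuracy is reported.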

TABLE II. Genre and song composition of DATASET B

Genre        No. of songs
Classical    1
Blue         1
Ambient      4
Country      1
Dance        1
Hip-hop      1
Latin        1
Metal        2
Pop          1
Rap          1
Rock         2
Indie        1
Gothic       1
Punk rock    2
Total        20

The results of the classification system show a highly satisfactory classification rate for ten musical genres as compared to related work in the literature. Besides, the system is user friendly and provides a good opportunity for personalization and management of the user’s music files. The automatic classification is implemented on the Matlab platform and shows relatively good timing performance, which makes it suitable for possible use in real-time applications.

VII. CONCLUSION AND FUTURE WORK
The proposed system not only outperforms the conventional state-of-the-art cover song identification system in terms of accuracy, but also requires very low memory and computing power. It reduced the search time significantly, by up to 5 times, and improved accuracy by a relative 15%. These features of the proposed music classification are promising because it can be implemented on portable devices and personal computers with very limited computing power and memory. From the results, we conclude that music signals can be effectively classified into different classes by using the content of the signal itself; what is required is a set of appropriate features on which the classification can be based. In future work, we will explore the usefulness of the proposed music classification in other related applications such as automated composer and mood classification, assuming that the musical information captured by the proposed music classifier provides the required discriminative information. The main function of the application to be developed is to provide a framework through which the algorithms that have been produced can be easily utilized. The first step in the process is writing the program as MATLAB M-file code. The code is then compiled into a COM (Component Object Model) object, which can be easily incorporated into other software environments.
Future work will also include the classification of music in different formats such as WAVE or MIDI. It may also include methods to examine broader ranges of musical categories, in order to establish the suitability of the proposed features, as well as the incorporation of the notion of patterns in genre classification.

REFERENCES

Figure 4. Accuracies for individual datasets (in %) and overall accuracy of the system.



[1] Musica. The international database of choral repertoire.
[2] MuseData. An electronic library of classical music scores.
[3] H. Deshpande, R. Singh, and U. Nam. “Classification of music signals in the visual domain”. In Proceedings of the COST-G6 Conference on Digital Audio Effects, 2001.
[4] Tzanetakis, G., and P. Cook. “Musical genre classification of audio signals”. IEEE Transactions on Speech and Audio Processing, vol. 10, no. 5, pp. 293–302, 2002.
[5] Tzanetakis, George, Essl, Georg, and Cook, Perry. “Automatic Musical Genre Classification of Audio Signals”. International Symposium on Music Information Retrieval (ISMIR), Bloomington, Indiana, pp. 205–210, 2001.
[6] Pye, D. “Content-based methods for the management of digital music”. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, pp. 2437–2440, 2000.
[7] Leen Breure. Development of the genre concept. URL, Development.htm, August 2001.
[8] Jiang, D. N., L. Lu, H. J. Zhang, J. H. Tao, and L. H. Cai. “Music type classification by spectral contrast feature”. In Proceedings of Intelligent Computation in Manufacturing Engineering, pp. 113–116, 2002.
[9] Kosina, K. “Music genre recognition”. Diploma thesis, Technical College of Hagenberg, Austria, 2002.
[10] Grimaldi, M., A. Kokaram, and P. Cunningham. “Classifying music by genre using a discrete wavelet transform and a round robin ensemble”. Work Report, Trinity College, University of Dublin, Ireland, 2003.
[11] Xu, C., N. C. Maddage, X. Shao, F. Cao, and Q. Tian. “Musical genre classification using support vector machines”. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing, vol. 5, pp. 429–432, 2003.
[12] McKinney, M. F., and J. Breebaart. “Features for audio and music classification”. In Proceedings of the International Symposium on Music Information Retrieval, pp. 151–158, 2003.
[13] Chai, W., and B. Vercoe. “Folk music classification using hidden Markov models”. In Proceedings of the International Conference on Artificial Intelligence, 2001.
[14] Shan, M. K., and F. F. Kuo. “Music style mining and classification by melody”. IEICE Transactions on Information and Systems E86-D, vol. 3, pp. 655–659, 2003.
[15] Tagg, P. “Analyzing popular music: Theory, method and practice”. Popular Music 2, pp. 37–67, 1982.
[16] Whitman, B., and P. Smaragdis. “Combining musical and cultural features for intelligent style detection”. In Proceedings of the International Symposium on Music Information Retrieval, pp. 47–52, 2002.
[17] Simon Dixon. “A beat tracking system for audio signals”. In Proceedings of the Diderot Forum on Mathematics and Music, Austrian Computer Society, pp. 101–110, 1999.
[18] Paul E. Allen and Roger B. Dannenberg. “Tracking musical beats in real time”. In S. Arnold and G. Hair, ICMC Glasgow Proceedings, International Computer Music Association, pp. 140–143, 1990.



A. V. SUTAGUNDAR completed his M. Tech from Visvesvaraya Technological University, Belgaum, Karnataka. He is pursuing his PhD in the area of Content Based Information Retrieval in Wireless Networks using Mobile Agents. Presently he is serving as an Assistant Professor in the Department of Electronics and Communication Engineering, Basaveshwar Engineering College, Bagalkot, Karnataka. His areas of interest include Signals and Systems, Digital Signal Processing, Digital Image Processing, Multimedia Networks, Computer Communication Networks, Wireless Networks, Mobile Ad-hoc Networks and Agent Technology. He has published 10 papers in refereed National/International Conferences.

S. S. MANVI completed his PhD from the Indian Institute of Science, Bangalore. Presently he is serving as a Professor of Electronics and Communication Engineering, REVA Institute of Technology and Management, Bangalore. His areas of interest include wireless/wired networks, AI applications in network management, E-commerce, Grid Computing and Multimedia Communications. He has published over 25 papers in refereed National/International Journals and 60 papers in refereed National/International Conferences. He has coauthored the books “Communication Protocol Engineering” and “Computer Concepts and C Programming”, published by PHI. He is a reviewer for several national/international journals. He is a member of IEEE, USA, a Fellow of IETE, India, a Fellow of IE, India, and a member of ISTE, India. He was included in Marquis Who’s Who in the World and International Biographies of Cambridge, London in the year 2006.







RAJINDER KUMAR MATH completed his B.E. in Electronics and Communication Engineering from Karnataka University, Dharwad, Karnataka in the year 2001. Presently he is serving as a Lecturer in the Department of Electronics and Communication Engineering, B.L.D.E.A’s College of Engineering and Technology, Bijapur, Karnataka. His areas of interest include Image Processing, Network Analysis, Computer Networks and Electronic Circuits.
