January 9, 2024 | Nantalira Niar Wijaya, De Rosal Ignatius Moses Setiadi*, and Ahmad Rofiqul Muslikh
This research proposes a music genre classification method using Bidirectional Long Short-Term Memory (BiLSTM) and Mel-Frequency Cepstral Coefficient (MFCC) features. The method was tested on the GTZAN and ISMIR2004 datasets; the ISMIR2004 audio was processed to match the 30-second clip duration of GTZAN. Preprocessing steps included converting audio formats, removing silent parts, and stretching audio to normalize the input duration. MFCC features were extracted with the Librosa library, comprising 20 MFCC coefficients plus their delta and delta-delta (second-order) derivatives. The BiLSTM model was implemented as a sequential architecture with a normalization layer and a softmax output layer, and was trained and validated using the Keras library, achieving 93.10% accuracy on GTZAN and 93.69% on ISMIR2004. The results showed that the BiLSTM model outperformed previous methods, including LSTM and MCLNN, in classification accuracy. The model demonstrated high accuracy on both balanced and imbalanced datasets: GTZAN reached 99.87% training accuracy and 94.60% test accuracy, while ISMIR2004 reached 100.00% training accuracy and 94.65% test accuracy. The study highlights the effectiveness of BiLSTM in music genre classification and suggests future research into other feature extraction methods and updated datasets.
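As a concrete illustration of the feature-extraction step, the following is a minimal Python sketch using Librosa. The file path track.wav, the silence-trimming call, and the default Librosa frame parameters are assumptions for illustration; the abstract only specifies 30-second clips and 20 MFCC coefficients with their delta and delta-delta features.

```python
# Sketch of MFCC + delta + delta-delta extraction, as described in the abstract.
# "track.wav" is a hypothetical input file; trim() and default hop/window
# settings are assumptions, not the paper's confirmed configuration.
import numpy as np
import librosa

y, sr = librosa.load("track.wav", duration=30.0)    # load, truncate to 30 seconds
y, _ = librosa.effects.trim(y)                      # assumed: remove silent edges

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)  # 20 MFCC coefficients
delta = librosa.feature.delta(mfcc)                 # first-order (delta) features
delta2 = librosa.feature.delta(mfcc, order=2)       # second-order (delta-delta)

# Stack into one feature matrix and transpose to (n_frames, 60) so each time
# frame becomes one step of the input sequence for a recurrent model.
features = np.concatenate([mfcc, delta, delta2], axis=0).T
print(features.shape)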
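And here is a hedged sketch of a BiLSTM classifier of the kind the abstract describes: a Keras sequential model with a normalization layer, stacked bidirectional LSTM layers over the MFCC frame sequence, and a softmax output over genre classes. The layer sizes, optimizer, loss, and sequence length are illustrative assumptions, not the paper's reported configuration.

```python
# Sketch of a sequential BiLSTM genre classifier under assumed hyperparameters.
# n_frames ~ 1292 corresponds to 30 s of audio at Librosa's defaults
# (sr=22050, hop_length=512); num_genres=10 matches GTZAN.
import tensorflow as tf
from tensorflow.keras import layers, models

n_frames, n_features, num_genres = 1292, 60, 10

model = models.Sequential([
    layers.Input(shape=(n_frames, n_features)),
    layers.BatchNormalization(),                    # normalize the input features
    layers.Bidirectional(layers.LSTM(128, return_sequences=True)),  # assumed size
    layers.Bidirectional(layers.LSTM(64)),          # assumed size
    layers.Dense(64, activation="relu"),
    layers.Dense(num_genres, activation="softmax")  # softmax output over genres
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

The Bidirectional wrapper lets each LSTM layer read the frame sequence in both temporal directions, which is what distinguishes this model from the unidirectional LSTM baseline the abstract compares against.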