Please use this identifier to cite or link to this item: http://nopr.niscpr.res.in/handle/123456789/65578
metadata.dc.identifier.doi: https://doi.org/10.56042/jsir.v84i03.13751
Title: DUALBIGRU-UCSA: Deep Learning based Music Emotion Recognition Model
Authors: Man, Szeto Chung
Kumar, Alok
Tiwari, Ajay
Srivastava, Prateek
Verma, Deepak Kumar
Mamoria, Pushpa
Singh, Vineeta
Kumar, Chandra Shekhar
Seth, Amit
Joshi, Kapil
Kaushik, Vandana Dixit
Keywords: Dual bidirectional gated recurrent unit;Mel frequency cepstral coefficients;Music emotion recognition;Unified contextual shuffle attention fusion;Weighted categorical cross-entropy
Issue Date: Mar-2025
Publisher: NIScPR-CSIR, India
Abstract: Music Emotion Recognition (MER) is the task of classifying the emotions perceived in a given piece of music using computational models. Existing models face several challenges arising from the subjective perception of emotions, individual differences, and cultural diversity. To overcome these challenges, we developed a Dual Bidirectional Gated Recurrent Unit with Unified Contextual Shuffle Attention Fusion (DualBiGRU-UCSA) model. The primary contribution lies in the practical combination of bidirectional gated recurrent units with the developed attention mechanisms to meet the requirements of understanding and perceiving complex musical features. Using bidirectional GRUs, the model captures both past and future context in music sequences while refining features that describe temporal dynamics and emotion. To further enhance performance, the bidirectional GRU outputs are passed to the UCSA module, which combines Shuffle Attention and Multi-Head Location-Aware Attention to suppress unimportant feature representations while emphasizing salient patterns and contextual cues. The proposed model outperforms recent state-of-the-art techniques, achieving an accuracy, F1-score, negative predictive value, positive predictive value, and recall of 96.28%, 96.32%, 96.26%, 96.60%, and 96.27%, respectively.
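The abstract's core idea — a bidirectional GRU reading the music sequence in both directions, followed by a channel-shuffle step as used in Shuffle Attention — can be sketched as follows. This is an illustrative minimal sketch in plain NumPy, not the authors' implementation: the function names (`gru_step`, `bigru`, `channel_shuffle`), hidden sizes, and omitted biases are all assumptions made for brevity.

```python
import numpy as np

def gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    # Standard GRU cell update (biases omitted for brevity).
    z = 1.0 / (1.0 + np.exp(-(x @ Wz + h @ Uz)))   # update gate
    r = 1.0 / (1.0 + np.exp(-(x @ Wr + h @ Ur)))   # reset gate
    h_cand = np.tanh(x @ Wh + (r * h) @ Uh)        # candidate state
    return (1 - z) * h + z * h_cand

def bigru(seq, params_fwd, params_bwd, hidden):
    # Run one GRU over the sequence forward and one backward, then
    # concatenate the two hidden states at each time step, so every
    # step sees both past and future context.
    T = seq.shape[0]
    hf = np.zeros(hidden)
    hb = np.zeros(hidden)
    fwd, bwd = [], []
    for t in range(T):
        hf = gru_step(seq[t], hf, *params_fwd)
        fwd.append(hf)
    for t in reversed(range(T)):
        hb = gru_step(seq[t], hb, *params_bwd)
        bwd.append(hb)
    bwd.reverse()
    return np.concatenate([np.stack(fwd), np.stack(bwd)], axis=1)

def channel_shuffle(feats, groups):
    # Core permutation used in Shuffle Attention: split the channel
    # dimension into groups, transpose, and flatten back, so that
    # information mixes across channel groups.
    T, C = feats.shape
    return feats.reshape(T, groups, C // groups).transpose(0, 2, 1).reshape(T, C)
```

The concatenated output has twice the hidden width (forward plus backward states); the shuffle only permutes channels per time step, leaving the values themselves unchanged.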
Page(s): 308-323
ISSN: 0022-4456 (Print); 0975-1084 (Online)
Appears in Collections:JSIR Vol.84(03) [March 2025]

Files in This Item:
File: JSIR 84(3) 308-323.pdf
Size: 7.68 MB
Format: Adobe PDF


Items in NOPR are protected by copyright, with all rights reserved, unless otherwise indicated.