OATAO - Open Archive Toulouse Archive Ouverte

Cosine-similarity penalty to discriminate sound classes in weakly-supervised sound event detection

Pellegrini, Thomas and Cances, Léo. Cosine-similarity penalty to discriminate sound classes in weakly-supervised sound event detection. (2019) In: International Joint Conference on Neural Networks (IJCNN 2019), 14 July 2019 - 19 July 2019 (Budapest, Hungary).

(Document in English)


Official URL: https://doi.org/10.1109/IJCNN.2019.8852143


The design of new methods and models when only weakly-labeled data are available is of paramount importance in order to reduce the costs of manual annotation and the considerable human effort associated with it. In this work, we address Sound Event Detection in the case where a weakly annotated dataset is available for training. The weak annotations provide tags of audio events but do not provide temporal boundaries. The objective is twofold: 1) audio tagging, i.e. multi-label classification at the recording level, and 2) sound event detection, i.e. localization of the event boundaries within the recordings. This work focuses mainly on the second objective. We explore an approach inspired by Multiple Instance Learning, in which we train a convolutional recurrent neural network to give predictions at the frame level, using a custom loss function based on the weak labels and the statistics of the frame-based predictions. Since some sound classes cannot be distinguished with this approach, we improve the method by penalizing similarity between the predictions of the positive classes during training. On the test set used in the DCASE 2018 challenge, consisting of 288 recordings and 10 sound classes, the addition of a penalty resulted in a localization F-score of 34.75%, a 10% relative improvement compared to not using the penalty. Our best model achieved a 26.20% F-score on the official DCASE 2018 Eval subset, close to the 10-system ensemble approach that ranked second in the challenge with a 29.9% F-score.
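As a rough illustration of the idea of penalizing similarity between the predictions of the positive classes, the sketch below sums pairwise cosine similarities between the frame-level prediction curves of the classes tagged in the weak label. The abstract does not give the exact formulation, so the function names (`cosine_similarity`, `similarity_penalty`), the pairwise summation, and the `eps` stabilizer are assumptions made for illustration, not the authors' implementation.

```python
import math


def cosine_similarity(u, v, eps=1e-8):
    """Cosine similarity between two equal-length sequences of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # eps (an assumption) avoids division by zero for all-zero curves.
    return dot / (norm_u * norm_v + eps)


def similarity_penalty(frame_probs, weak_labels):
    """Sum of pairwise cosine similarities between positive-class curves.

    frame_probs: list of T frames, each a list of C class probabilities.
    weak_labels: list of C values in {0, 1} (recording-level tags).
    Hypothetical helper: the pairwise sum is an illustrative choice.
    """
    positive = [c for c, y in enumerate(weak_labels) if y == 1]
    # One temporal curve per positive class: its probability at every frame.
    curves = {c: [frame[c] for frame in frame_probs] for c in positive}
    penalty = 0.0
    for i, ci in enumerate(positive):
        for cj in positive[i + 1:]:
            penalty += cosine_similarity(curves[ci], curves[cj])
    return penalty
```

Under these assumptions, two positive classes with identical temporal curves contribute a penalty close to 1, while classes active on disjoint frames contribute 0. Added to a weak-label loss with some weight, such a term would push the network to assign co-occurring classes to different frames.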

Item Type: Conference or Workshop Item (Paper)
Additional Information: Thanks to the IEEE editor. The definitive version is available at http://ieeexplore.ieee.org. This paper appears in the Proceedings of IJCNN 2019 (paper N-19523). Electronic ISBN: 978-1-7281-1985-4. Electronic ISSN: 2161-4407. The original PDF of the article can be found at: https://ieeexplore.ieee.org/document/8852143. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Audience (conference): International conference proceedings
Institution: Université de Toulouse > Institut National Polytechnique de Toulouse - Toulouse INP (FRANCE)
French research institutions > Centre National de la Recherche Scientifique - CNRS (FRANCE)
Université de Toulouse > Université Toulouse III - Paul Sabatier - UT3 (FRANCE)
Université de Toulouse > Université Toulouse - Jean Jaurès - UT2J (FRANCE)
Université de Toulouse > Université Toulouse 1 Capitole - UT1 (FRANCE)
Deposited On: 21 Jan 2020 15:28
