OATAO - Open Archive Toulouse Archive Ouverte

Inferring phonemic classes from CNN activation maps using clustering techniques

Pellegrini, Thomas and Mouysset, Sandrine. Inferring phonemic classes from CNN activation maps using clustering techniques. (2016) In: Annual conference Interspeech (INTERSPEECH 2016), 9 September 2016 - 12 September 2016 (San Francisco, United States).

(Document in English)


Official URL: http://dx.doi.org/10.21437/Interspeech.2016-1299


Today's state of the art in speech recognition involves deep neural networks (DNNs). In recent years, some research effort has been invested in characterizing the feature representations learned by DNNs. In this paper, we focus on convolutional neural networks (CNNs) trained for phoneme recognition in French. We report clustering experiments performed on activation maps extracted from the different layers of a CNN composed of two convolution and sub-sampling layers followed by three dense layers. Our goal was to gain insight into phone separability and the phonemic categories inferred by the network, and into how they vary across the successive layers. Two directions were explored with both linear and non-linear clustering techniques. First, we fixed the number of classes at 33, equal to the number of context-independent phone models for French, in order to assess the phoneme separability power of the different layers. As expected, we observed that this power increases with layer depth in the network: from 34% to 74% in F-measure from the first convolution layer to the last dense layer, when using spectral clustering. Second, optimal numbers of classes were automatically inferred through inter- and intra-cluster measure criteria. We analyze these classes in terms of standard French phonological features.
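As an illustration of how a clustering can be scored against reference phone labels, here is a minimal sketch of one common clustering F-measure: each reference class is matched to the cluster that gives it the highest F-score, and the per-class scores are averaged with class-size weights. The abstract does not state which F-measure variant the authors used, so this definition, the function name `clustering_f_measure`, and the toy labels are all assumptions made for illustration.

```python
from collections import Counter


def clustering_f_measure(true_labels, cluster_ids):
    """Class-weighted best-match F-measure between classes and clusters.

    Assumed definition (not taken from the paper): for each reference
    class, keep the best F-score over all clusters, then average these
    scores weighted by class size.
    """
    if len(true_labels) != len(cluster_ids):
        raise ValueError("label sequences must have the same length")
    n = len(true_labels)
    if n == 0:
        raise ValueError("empty input")

    class_sizes = Counter(true_labels)
    cluster_sizes = Counter(cluster_ids)
    joint = Counter(zip(true_labels, cluster_ids))  # co-occurrence counts

    total = 0.0
    for cls, n_cls in class_sizes.items():
        best = 0.0
        for clu, n_clu in cluster_sizes.items():
            overlap = joint[(cls, clu)]
            if overlap == 0:
                continue
            precision = overlap / n_clu  # share of the cluster in this class
            recall = overlap / n_cls     # share of the class in this cluster
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_cls / n) * best
    return total


# Toy example with hypothetical phone labels and cluster assignments.
phones = ["a", "a", "a", "i", "i", "s", "s", "s"]
clusters = [0, 0, 1, 1, 1, 2, 2, 2]
score = clustering_f_measure(phones, clusters)
print(f"F-measure: {score:.3f}")
```

In an experiment like the one described, `cluster_ids` would come from a clustering of the activation maps of one layer (for example spectral clustering with 33 clusters), and the score would be computed once per layer to compare separability across depths.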

Item Type:Conference or Workshop Item (Paper)
Additional Information:Thanks to the International Speech Communication Association (ISCA). The original PDF is available at: http://www.isca-speech.org/archive/Interspeech_2016/pdfs/1299.PDF
HAL Id:hal-01474886
Audience (conference):National conference proceedings
Uncontrolled Keywords:
Institution:Université de Toulouse > Institut National Polytechnique de Toulouse - Toulouse INP (FRANCE)
French research institutions > Centre National de la Recherche Scientifique - CNRS (FRANCE)
Université de Toulouse > Université Toulouse III - Paul Sabatier - UT3 (FRANCE)
Université de Toulouse > Université Toulouse - Jean Jaurès - UT2J (FRANCE)
Université de Toulouse > Université Toulouse 1 Capitole - UT1 (FRANCE)
Laboratory name:
Deposited On:31 Jan 2017 13:18
