Filstroff, Louis. Contributions to probabilistic nonnegative matrix factorization - Maximum marginal likelihood estimation and Markovian temporal models. PhD, Signal, Image, Acoustique et Optimisation, Institut National Polytechnique de Toulouse, 2019.

(Document in English)
Nonnegative matrix factorization (NMF) has become a popular dimensionality reduction technique, and has found applications in many different fields, such as audio signal processing, hyperspectral imaging, or recommender systems. In its simplest form, NMF aims at finding an approximation of a nonnegative data matrix (i.e., with nonnegative entries) as the product of two nonnegative matrices, called the factors. One of these two matrices can be interpreted as a dictionary of characteristic patterns of the data, and the other one as activation coefficients of these patterns. This low-rank approximation is traditionally retrieved by optimizing a measure of fit between the data matrix and its approximation. As it turns out, for many choices of measures of fit, the problem can be shown to be equivalent to the joint maximum likelihood estimation of the factors under a certain statistical model describing the data. This leads us to an alternative paradigm for NMF, where the learning task revolves around probabilistic models whose observation density is parametrized by the product of nonnegative factors. This general framework, coined probabilistic NMF, encompasses many well-known latent variable models of the literature, such as models for count data. In this thesis, we consider specific probabilistic NMF models in which a prior distribution is assumed on the activation coefficients, but the dictionary remains a deterministic variable. The objective is then to maximize the marginal likelihood in these semi-Bayesian NMF models, i.e., the integrated joint likelihood over the activation coefficients. This amounts to learning the dictionary only; the activation coefficients may be inferred in a second step if necessary. We proceed to study in greater depth the properties of this estimation process. In particular, two scenarios are considered. In the first one, we assume the independence of the activation coefficients sample-wise. Previous experimental work showed that dictionaries learned with this approach exhibited a tendency to automatically regularize the number of components, a favorable property which was left unexplained. In the second one, we lift this standard assumption, and consider instead Markov structures to add statistical correlation to the model, in order to better analyze temporal data.
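As an illustration of the equivalence between a measure of fit and joint maximum likelihood mentioned in the abstract, the sketch below fits an NMF by minimizing the generalized Kullback-Leibler divergence with the classical multiplicative updates, which corresponds to joint maximum likelihood estimation of both factors under a Poisson observation model. It is a minimal sketch of that classical baseline only, not the marginal likelihood estimators studied in the thesis. The names `kl_nmf`, `kl_divergence`, `V`, `W`, `H` and `K`, as well as the synthetic data, are assumptions made for the example.

```python
import numpy as np


def kl_divergence(V, WH, eps=1e-10):
    """Generalized Kullback-Leibler divergence D(V | WH), summed over entries."""
    return float(np.sum(V * np.log((V + eps) / (WH + eps)) - V + WH))


def kl_nmf(V, K, n_iter=200, eps=1e-10, seed=0):
    """Approximate a nonnegative matrix V (F x N) as W @ H with K components.

    W (F x K) plays the role of the dictionary and H (K x N) that of the
    activation coefficients. Minimizing the KL divergence is equivalent to
    joint maximum likelihood under V[f, n] ~ Poisson((W @ H)[f, n]).
    """
    rng = np.random.default_rng(seed)
    F, N = V.shape
    # Strictly positive random initialization (hypothetical choice).
    W = rng.random((F, K)) + eps
    H = rng.random((K, N)) + eps
    for _ in range(n_iter):
        # Multiplicative update of the activations, dictionary fixed.
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        # Multiplicative update of the dictionary, activations fixed.
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H


# Synthetic count data drawn from a Poisson model with a rank-3 mean.
rng = np.random.default_rng(42)
W_true = rng.gamma(shape=1.0, scale=1.0, size=(20, 3))
H_true = rng.gamma(shape=1.0, scale=1.0, size=(3, 50))
V = rng.poisson(W_true @ H_true).astype(float)

W, H = kl_nmf(V, K=3)
print("KL divergence after fitting:", kl_divergence(V, W @ H))
```

Because the updates are multiplicative, factors that start nonnegative stay nonnegative without any explicit projection step. In the semi-Bayesian setting described above, `H` would instead be given a prior and integrated out, so that only `W` is estimated.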
Institution: Université de Toulouse > Institut National Polytechnique de Toulouse - Toulouse INP (FRANCE)
Research Director:  Févotte, Cédric 
Deposited On:  05 Mar 2020 08:12 
