Fast Hyperspectral Unmixing in Presence of Nonlinearity or Mismodelling Effects

This paper presents two novel hyperspectral mixture models and associated unmixing algorithms. The two models assume a linear mixing model corrupted by an additive term whose expression can be adapted to account for multiple scattering nonlinearities (NL), or mismodelling effects (ME). The NL model generalizes bilinear models by taking into account higher order interaction terms. The ME model accounts for different effects such as endmember variability or the presence of outliers. The abundance and residual parameters of these models are estimated by considering a convex formulation suitable for fast estimation algorithms. This formulation accounts for constraints such as the sum-to-one and non-negativity of the abundances, the non-negativity of the nonlinearity coefficients, the spectral smoothness of the ME terms and the spatial sparseness of the residuals. The resulting convex problem is solved using the alternating direction method of multipliers (ADMM) whose convergence is ensured theoretically. The proposed mixture models and their unmixing algorithms are validated on both synthetic and real images showing competitive results regarding the quality of the inference and the computational complexity when compared to the state-of-the-art algorithms.


I. INTRODUCTION
Hyperspectral imaging is a remote sensing technology that collects three dimensional data cubes composed of 2D spatial images acquired in numerous contiguous spectral bands.
Assuming that each pixel spectrum is a mixture of several pure materials (endmembers), spectral unmixing consists of recovering the spectral signatures (endmembers) of the materials present in the scene, and quantifying their proportions within each hyperspectral image pixel [1]. More precisely, unmixing hyperspectral images consists of three stages: (i) determining the number of endmembers and possibly projecting the data onto a subspace of reduced dimension [2]- [4], (ii) identifying the endmembers using an endmember extraction algorithm (EEA) such as vertex component analysis (VCA) [5], and N-FINDR [6] and (iii) estimating their abundances [7]- [9]. Akin to [7], [9]- [11], this paper considers a supervised unmixing scenario which aims at estimating the abundances while assuming that the two first unmixing steps have been successfully implemented.
As a result of its simplicity, the linear mixing model (LMM) is used by many of the hyperspectral unmixing algorithms presented in the literature [7], [9]. This is generally justified when considering flat scenes without component interactions, and a fixed endmember spectra for all the pixels. However, an inherent limitation of the LMM occurs in presence of volumetric scattering, terrain relief, or intimate mixtures of materials which require the definition of new sophisticated models, to take these effects into account. Nonlinear mixture models are an alternative to better account for those effects [12], [13] and we distinguish between two main families: the first is signal processing based and seeks to construct flexible models that can represent a wide range of nonlinearities. The second is physical based models that include the intimate mixture models [14] and those accounting for bilinear interactions [15]- [20]. This paper considers a physical based nonlinearity which generalizes the bilinear formulation in [10], [21] to account for multiple scattering effects. A second inherent limitation of the LMM appears when the endmember spectra vary spectrally and spatially causing what is known as endmember variability [22], [23]. In this case, and under a supervised SU scenario, the endmember fluctuation can not be captured by traditional EEA algorithms which affect the LMM by the presence of an additional spectrally smooth residual component [21]. A third LMM limitation is related to the presence of sparse outliers, e.g. due to the presence of impulse noise, horizontal or vertical line stripes, dead lines, and others types of noise [24], [25]. The latter two LMM limitations can be solved separately July 20, 2016 DRAFT by considering specialized algorithms that deal with EV [26]- [28] or outliers [24], [25]. In this paper, we adopt the same strategy as in [21], [29] and propose a robust algorithm that encompasses the first two effects described above.
The first contribution of this paper is the introduction of two mixture models to deal with NL and ME. The models proposed are based on the residual component principle [30] and are closely relate to the RCA-NL and RCA-ME models introduced in [21]. More precisely, the proposed NL generalizes RCA-NL by accounting for multiple scattering effects. Indeed, the residual term is assumed to be a linear combination of high order interaction spectra. Due to the high number of interactions, the non-negative nonlinearity coefficients are assumed sparse so that only a few interactions are active for each pixel. The resulting formulation is then general, and covers many NL models [15]- [20], [31]. In a similar fashion to RCA-ME, the proposed ME model assumes a spectrally smooth residual term. However, in contrast with RCA-ME that adopts a statistical approach to account for this smooth property, the proposed model assumes the residual term to be a sparse linear combination over a dictionary (such as the discrete cosine transform (DCT), or the spline decomposition). For both models, the corrupted pixels are assumed spatially sparse meaning that only a small number of nonlinear or outlier pixels are present, as previously suggested in [31], [32] for NL and in [24], [25] for outliers. This effect has been introduced by considering the well known collaborative sparse regression strategy [24], [32]- [35] since it promotes group-sparsity over the residual terms while using the information of the residuals in all the pixels. Note that the first motivation for these new reformulations is that both models assume a residual term that is written as a linear combination of sparse coefficients, which is suitable for the development of a joint formulation to achieve the unmixing strategy. The second motivation is related to the unmixing problem that is significantly simplified by considering separable variables (between the abundances and the residual coefficients) as well as a linear expression for both the LMM term and the residual term.
The second contribution of this paper is the introduction of a convex formulation for unmixing the proposed observation models. The convexity is obtained thanks to the linearity of the observation models with respect to the unknown parameters, as well as the considered regularization terms. Indeed, the formulation accounts for the known physical constraints on the estimated parameters such as the sum-to-one and non-negativity of the abundances, the non-negativity of the nonlinearity coefficients, the spectral smoothness of the ME terms and the spatial sparseness of the residuals. The resulting convex problem is solved using July 20, 2016 DRAFT the alternating direction method of multipliers (ADMM) whose convergence is theoretically ensured. More precisely, we propose two algorithms denoted as NUSAL-K for nonlinear unmixing by variable splitting and augmented Lagrangian with order K, and RUSAL for robust unmixing by variable splitting and augmented Lagrangian. Note that the ADMM algorithms are well adapted for large scale problems, i.e., with a large number of parameters to be estimated [36], [37]. Moreover, this method offers good performance at a reduced computational cost as already shown in many hyperspectral unmixing works [9], [34], [35].
The proposed mixture models and estimation algorithms are validated using synthetic and real hyperspectral images. The results obtained are very promising and show the potential of the proposed mixture models and associated inference algorithms with respect to the estimation quality and the computational cost.
The paper is structured as follows. Section II presents the proposed NL and ME mixture models considered in this study. Section III introduces the convex unmixing formulations and the ADMM-based optimization algorithms associated with the two mixture models. Section IV analyzes the performance of the proposed algorithms when applied to synthetic images with known ground truth. Results on real hyperspectral images are presented in Section V and conclusions and future work are reported in Section VI.

II. MIXTURE MODELS
As a result of its simplicity, the LMM is widely used in hyperspectral images. However, the LMM has some limitations in the presence of nonlinearity or outlier effects. This paper deals with these issues by considering the observation model proposed in [21], itself inspired from the residual component analysis model described in [30]. This model introduces a general formulation that is expressed as the sum of a linear model and a residual term that accounts for the remaining effects. The general observation model for the (L × 1) pixel spectrum y n , where L is the number of spectral bands, is given by a r,n m r + φ n (M , a n , x n ) + e n = M a n + φ n (M , a n , x n ) + e n , where a n = (a 1,n , · · · , a R,n ) T is an (R × 1) vector of abundances associated with the nth pixel, x n = (x 1,n , · · · , x D,n ) T is a (D × 1) vector of residual coefficients associated with the nth pixel, R (resp. D) is the number of endmembers (resp. residual coefficients), e n ∼ N (0, Σ) is a centered Gaussian noise and φ n is a residual term that might depends July 20, 2016 DRAFT on the endmembers, the abundances or residual coefficients to account for the additional mismodelling effect. In model (1), the endmembers matrix M is fixed (extracted using an EEA) and endmember variability can be accounted for by the pixel dependent residual term φ n . Moreover, the paper deals with the supervised case in which we assume the endmembers to be known and we only estimate the abundances and the residual terms. Due to physical constraints, the abundance vector a n satisfies the following abundance non-negativity (ANC) and abundance sum-to-one (ASC) constraints a r,n ≥ 0, ∀r ∈ {1, . . . , R} and R r=1 a r,n = 1.
Eq. (1) shows a general model that can be adapted to account for different physical phenomena. The next sections present in details the considered model variants that will account for NL and ME.

1) Effects of Nonlinearity (NL):
Nonlinear mixing models provide a powerful tool to deal with the inherent limitations of the LMM. Many nonlinear models have been introduced in the literature and we can divide them into two categories: physical based models (including bilinear and intimate mixture models) and signal processing models (such as the PPNMM [15], [38]). This paper considers a physical based model to deal with the multiple scattering effects. More precisely, the model considered accounts for higher order interactions between the endmembers and reduces to [10], [21] when only the bilinear second order interactions are considered. Note that bilinear models assume that the effect of the interaction terms decreases as the order increases, as suggested in [16], [18], [19]. However, in this paper, we include higher order interaction terms in the proposed model/algorithm to highlight their benefit as recently shown in [39]. The proposed NL model considering the Kth order of interactions is given by where the residual component is with γ n = γ considering only second order interaction terms (i.e., K = 2) leads to and a residual term similar to [10] as follows where m i,j = m i m j , and the interaction terms are weighted by the coefficient √ 2 obtained by comparison with a homogeneous polynomial kernel of the 2nd degree (see Appendix A for more details regarding these coefficients). In what follows, and for brevity, we drop the order index (K) for general statements (related to all interaction orders) and only include it when dealing with specific orders. The model proposed in (3) reduces to the LMM for γ n = 0, ∀n and has many links to state-of-the-art models. Indeed, model (3) with K = 2 is similar to [10] and has a close relation to the RCA model [31] (as shown in [10]). Moreover, it generalizes the GBM model [16], [17] by accounting for self-interaction between the endmembers, and also generalizes the PPNMM [15] by considering different weights for the bilinear terms. Overall, model (3) is of a similar polynomial form as the bilinear models (GBM [16], PPNMM [15], Nascimento [18], Fan [19], and Meganem [20] models) with the main difference due to the introduction of higher order interaction terms, and the nonnegativity and sum-to-one constraints associated with each model. In contrast with the model described in [39], which accounts for all the interactions by using only one parameter, the model (3) includes a different coefficient for each interaction term, which enables analysis of the interaction between any specific physical components (i.e., availability of interaction maps).
Note that the nonlinear behavior generally affects some pixels of the image as already exploited in [31], [32], which suggest a spatial sparsity of the nonlinear pixels. Moreover, it makes sense to assume that the elements of the nonlinear vector γ n will not be active at the same time, meaning that the vector is sparse. This can be explained since the lowest order of interactions have often a higher effect [16], [18], [19] and all the interactions between endmembers are not likely to be active at the same time. These sparsity properties are of great importance and will be exploited when designing the unmixing algorithm associated with model (3) in Section III.
2) Mismodelling effects (ME) or outliers: In recent years, there has been considerable interest in robust hyperspectral unmixing to enable adaptation of the simple LMM to realistic scenes which often present outliers or other unknown effects [40]. This goal can be achieved July 20, 2016 DRAFT using different strategies such as adapting the optimization cost function [41] or changing the observation model by introducing a residual term that accounts for the mismodelling effects [21], [29], [32]. The latter strategy is adopted in this paper by considering spectrally smooth residuals as for the ME model introduced in [21]. More precisely, the model is where the residual component is with F is a D × L matrix gathering the first D rows of the DCT, b n is a vector of DCT coefficients and φ ME n is a smooth spectral function. In this paper, the smooth property of φ ME n is obtained by imposing sparsity on the elements of each vector b n , ∀n. Model (6) reduces to the LMM for b n = 0 L , ∀n. Moreover, the residual terms φ ME 1 , · · · , φ ME N are assumed to be spatially sparse to approximate sparse nonlinear effects, endmember variability effects or other mismodelling effects such as outliers. In the following, we highlight the link between model (6) and each of these phenomena. Consider first the NL model (3) with In this special case, the nonlinear term reduces to φ NL n (M ) = γ n D d=1 q d , where q d represents the dth column of Q. Thus, as a result of the smooth spectral property of the interaction spectra q d , the nonlinear term φ NL n can be approximated by the term φ ME n . This means that model (6) links to the NL model (3) for this special case. Second, the EV-model proposed in [21] and assuming pixel dependent endmembers s r,n = m r,n + k r,n , reduces to model (6) when the same variability affects the different endmembers (i.e., k r,n = k r ,n , ∀r = r ). Third, spatially sparse outliers can be present in hyperspectral images as shown in [24], [25], [32], and can also be approximated using φ ME n . This illustrates how the model described by (6) can be used to process hyperspectral images with a combination of different effects such as NL, EV and/or outliers. The next section introduces the proposed estimation algorithms associated with these NL and ME models.

NUSAL-K, AND RUSAL
This section introduces the unmixing algorithms used to estimate the abundances and the residual coefficients of the proposed models. To this end, we adopt an optimization approach that minimizes a regularized data fidelity cost function. More precisely, considering an July 20, 2016 DRAFT independent and identically distributed (i.i.d.) Gaussian noise (Σ proportional to the identity matrix) in model (1) leads to the following negative log-likelihood (referred to as data fidelity term in what follows, and defined up to a multiplicative constant which is the noise variance) denotes the Frobenius norm. Note that P = Q and x = γ (resp. P = F and x = b) when considering the NL model (resp. the ME model). Estimating the abundances and the residual coefficients is an ill-posed inverse problem that requires the introduction of prior knowledge (or regularization terms) about these parameters of interest.
Therefore, we propose to solve the following regularized optimization problem where i R + (A) = N n=1 i R + (a n ) is the indicator function that imposes the ANC (i R + (a n ) = 0 if a n belongs to the non-negative orthant and +∞ otherwise) a n is the indicator function that imposes the ASC to each abundance vector a n , 1 (i,j) denotes the i × j vector of 1s, ψ (X) = i R + (X) when considering the NL model and ψ (X) = 0 for the ME model. The first line of (9) is a sum of the quadratic data fidelity term associated with the Gaussian noise statistics and two convex terms imposing the abundance constraints.
The second line of (9) accounts for the sparsity behavior of the residual coefficients. The first convex term ||X|| 1 = N n=1 ||x n || 1 is an 1 norm that promotes element-wise sparsity on the D × N matrix X. This behavior is illustrated in Fig. 1 (left) which shows a point-wise repartition of the active elements of X. The second convex term x T n x n is the 21 mixed norm of X which promotes sparsity among the columns of X, i.e., it promotes solutions of (9) with a small number of nonlinear or outlier pixels.
This regularization term has received increasing interest in recent years [24], [32]- [35] and is known as a collaborative regularization since it uses information about the residuals in all the pixels to promote group-sparsity over the columns of X. The effect of this mixed norm is illustrated in Fig. 1 (middle). Equation (9) includes a combination of the 1 norm and the 21 mixed norm which leads to a slightly different effect as highlighted in Fig. 1 (right). Indeed, this combination allows for sparsity among the elements of the columns of X. Finally the cost function (9) is a sum of convex functions that is solved using the ADMM algorithm proposed in [36], [42] and described in the next section.

A. The ADMM algorithm
Consider the optimization problem where Z ∈ R (R+D)×N , g j : R p j ×N → R are closed, proper, convex functions, and H j ∈ R p j ×(R+D) are arbitrary matrices. After denoting U j = H j Z ∈ R p j ×N and introducing the auxiliary variable D j ∈ R p j ×N , the authors in [36], [42] introduced the ADMM variant summarized in Algo. 1 to solve (10)  Another important point to note is that the setting of µ has a strong impact on the convergence speed of the algorithm. In this paper, µ is updated using the adaptive procedure described in [34], [37], whose objective is to keep the ratio between the ADMM primal and dual residual norms within a given positive interval, as they both converge to zero. Note finally that the algorithms are stopped if the primal or dual residual norms are lower than a given threshold [37]. We refer the reader to [34], [36], [37], [42] for more details regarding the ADMM algorithm.

B. The NUSAL-K algorithm
This section presents the optimization problem considered for estimating the parameters of the NL model (3). We first recall the two assumptions: (i) the nonlinearity appears in some  6: end for 7: Linear system of equations 8: j , 9: Moreau proximity operators 10: for j=1:J do 11: 13: end for 14: Update Lagrange multipliers 15: for j=1:J do 16: where Γ = [γ 1 , · · · , γ N ] is a (D K ×N ) matrix of nonlinear coefficients, and Z = A , Γ .
The mixed norm 21 imposes sparsity on the nonlinear pixels, i.e., it imposes sparsity on the columns of Γ (see Fig. 1). In addition, the 1 norm further enforces sparsity on the nonlinear interactions in the active nonlinear pixels as highlighted in Fig. 1 (right). Using the same notation as in (10), problem (11) can be expressed as the sum of J = 5 convex terms given by where I n denotes the n × n identity matrix and 0 (i,j) denotes the i × j matrix of zeros. For this problem, the matrix G is given by G = diag [31 (1,R) , 41 (1,D K ) ] which is clearly of full rank. This matrix and the properties of g i , i ∈ {1, · · · , J} ensures the algorithm convergence.

C. The RUSAL algorithm
The optimization problem used to estimate the parameters of the ME model (6) is based on following assumptions: (i) the outliers appear at some pixels of the image, (ii) the residual spectra are smooth (i.e., the DCT coefficients are sparse). Under these considerations, we propose to solve the following optimization problem where B = [b 1 , · · · , b N ], and Z = A , B . In a similar fashion to NUSAL-K, the mixed norm 21 ensures spatial sparsity of the mismodelling coefficients B. In addition, the 1 norm further enforces sparsity on the DCT coefficients of each active pixel to impose spectral smoothness of the residuals. Using the same notation as in (10), problem (13) can be expressed as the sum of J = 5 convex terms given by For this problem, the full rank matrix G is given by G = 3I (R+D) which, in addition to the properties of g i , i ∈ {1, · · · , J}, ensures the algorithm convergence.

D. Computational complexity
The ADMM algorithm involves the iterative update of the matrices Z ∈ R (R+D)×N (line 8 in algo. 1) and U ∈ R p j ×N (line 12 in algo. 1), where the details of the optimizations with respect to U j , j ∈ {1, · · · , 5} are provided in the Appendix B. The computational complexity of Algo. 1 per iteration is O ((R + D) 2 N ), which is related to the most expensive step introduced by the calculus of U 1 . Finally, it is interesting to note that the matrices to inverse involve low complexity since the matrix G in line 8 is diagonal, and the matrix to inverse to update U 1 is fixed and then can be precomputed outside the iterative loop.

IV. SIMULATION RESULTS ON SYNTHETIC DATA
This section evaluates the performance of the proposed algorithms with synthetic data.
This enables the performance of the algorithms to be compared on data with a known ground truth. All simulations have been implemented using MATLAB R2015a on a computer with Intel(R) Core(TM) i7-4790 CPU@3.60GHz and 32GB RAM. The section is divided into three parts whose objectives are: 1) introducing the criteria used for the evaluation of the unmixing quality, 2) description of the synthetic images considered in the experiments, and 3) evaluating and comparing the proposed NUSAL-K and RUSAL algorithms with other state-of-the-art algorithms.

A. Evaluation criteria
The performance of the algorithm has been assessed in terms of abundance estimation by comparing the estimated and actual abundances using the average root mean square error (aRMSE) defined by aRMSE (A) = 1 N R N n=1 a n −â n 2 2 . As a measure of fit, we consider the following reconstruction error RE =

B. Description of the synthetic images
The proposed unmixing algorithms are evaluated on two images with different parameters.
The images of size 100 × 100 pixels and L = 207 spectral bands have been generated using R endmembers corresponding to spectral signatures available in the ENVI software library [45]. All images have been corrupted by i.i.d. Gaussian noise of variance σ 2 whose level is July 20, 2016 DRAFT adjusted to obtain SNR= 25 dB where SNR = 10 log The images have been generated using different mixture models as follows • Linear+Nonlinear models: image I 1 has been generated with 4 linear/nonlinear models.
An image partition into 4 classes has been generated by considering a Potts-Markov random field (with granularity parameter β = 0.8) as shown in Fig. 2 (left). The four spatial classes are associated with the LMM, NL-3 model (3)  respectively. Note that the generated nonlinear coefficients γ n are not sparse, which is a challenging scenario for the NUSAL-K algorithm. Finally, the abundances have been generated uniformly in the simplex of ANC and ASC.
• Mismodelling effects: image I 2 has been partitioned into 3 classes by considering a Potts-Markov random field as shown in Fig. 2 (right). Pixels of the first class have been generated according to the LMM model, and the pixels of class 2 have been generated while considering EV. This has been achieved by varying the endmembers in each pixel of the image. Indeed, a pixel dependent smooth spectral function p rn ∈ R L×1 has been added to each endmember to model EV. As in [21], the smooth functions were generated as follows p rn ∼ N (0 L×1 , 2 Σ p ), where Σ p is an (L × L) squaredexponential covariance matrix modeling the spectral correlations and 2 = 0.001. The pixels of class 3 have been generated according to the ME model proposed in [21], since it leads to smooth spectral residuals as in (6). More precisely, the residuals have been generated as follows φ ME n ∼ N (0 L×1 , 2 Σ p ), with 2 = 0.002. Finally, the abundances have been generated uniformly in the simplex of ANC and ASC.
Note that both images have been generated with the number of endmembers varying in the interval {3, 6}.

C. Performance of the proposed algorithms
The proposed RUSAL and NUSAL-K algorithms are compared to state-of-the-art algorithms by processing the generated synthetic images. We consider the two variants NUSAL-2 and NUSAL-3 to study the effect of high order interaction terms. The comparison algorithms are associated with different mixture models as follows • Linear unmixing: the abundances are estimated using the FCLS algorithm [7] and the SUNSAL algorithm [9].
For comparison purposes, the endmembers of these algorithms have been fixed to the actual spectra used to generate the data (the endmember update step in RNMF has been removed).
Moreover, the CDA algorithms have been used while fixing the illumination coefficient to the value #1 to provide a fair comparison with the remaining algorithms. Note also that the RNMF, NUSAL-K and RUSAL algorithms require the regularization parameters to be set. In this study, we provide the best performance (in terms of abundance RMSE) of these algorithms when varying the regularization parameters as follows: λ of RNMF varies in  Table I clearly shows that the NUSAL-2 and NUSAL-3 algorithms are faster than the NL state-of-the-art algorithms, i.e., CDA-NL and SKhype. It is also shown that NUSAL-3 requires more computational time than NUSAL-2, while it performs slightly better. This highlights the effect of the third order nonlinear interaction terms that improve the unmixing at a price of a higher computational time. As expected, the mismodelling-based algorithms CDA-ME, RNMF and RUSAL provide an intermediate performance between the LMM algorithms and the NL algorithms. Indeed, these algorithms are designed to deal with different effects including the NL effect. Moreover, RUSAL is less sensitive to the variation of the endmember number R than NUSAL-K and CDA-NL. Indeed, the latter algorithms account for interaction terms whose number increases with R, while RUSAL use a flexible residual formulation that is not related to R (it simply accounts for the spectral smoothness of the residuals). Table II shows the obtained results when processing the second image that includes pixels with LMM, EV and ME. The best RMSE performance are generally obtained with the CDA-ME algorithm. The proposed RUSAL algorithm provides competitive abundance RMSE with RNMF with the advantage of a reduced computational time. In contrast with SKhype which demonstrates robust behavior, the NL based algorithms are not well adapted to these data and provide a lower unmixing quality than the ME algorithms. These results highlight the benefit of the NUSAL-K and RUSAL algorithms that show competitive results when compared to the other algorithms.
Moreover, the proposed algorithms exhibit a reduced computational cost that is suitable for real world applications. While both NUSAL-K and CDA-NL algorithms are sensitive to an increase in the number of endmembers, this effect is more important for CDA-NL while it is reduced for the NUSAL-K algorithms that remain faster than SKhype for R = 6 and K = 3 (i.e., 77 interaction terms as reported in Table V). Note finally that additional experiments, conducted with SNR= 15 dB, show a reduction of the unmixing quality for all algorithms.
However, the algorithms relative behavior is similar to the studied case, and the conclusions remain valid. These results are not provided here for brevity.

V. RESULTS ON REAL DATA
This section illustrates the performance of the proposed algorithms when applied to three real hyperspectral images. The first hyperspectral image has received much attention in the remote sensing community [16], [46]. This image was acquired over Moffett Field, CA, in 1997 by AVIRIS. The dataset contains 100 × 100 pixels, L = 152 spectral bands (after removing water absorption bands) acquired in the interval 0.4−2.5µm, has a spatial resolution of 100m and is mainly composed of three components: water, soil, and vegetation (see Fig. 3 (a)). This image is interesting since it is known to include bilinear scattering effects [16], [21], [32] which makes it suitable for the assessment of the NUSAL-K and RUSAL algorithms presented in this paper. The second image, denoted as Madonna, was acquired in 2010 by the Hyspex hyperspectral scanner over Villelongue, France (00 03'W and 4257'N). The dataset contains L = 160 spectral bands recorded from the visible to near infrared (400 − 1000nm) with a spatial resolution of 0.5m [47]. It has previously been studied in [21], [48], [49] showing NL effects (between the trees and the soil), EV effects (mainly for the vegetation) and shadow effect. The subimage considered contains 160 × 200 pixels and is composed of R = 4 components: tree, grass, soil and shadow (see Fig. 3 (b)). For these two images, the VCA algorithm [5] was used to extract the corresponding endmembers, i.e., R = 3 endmembers for the Moffett image and R = 4 endmembers for the Madonna image. The third image was acquired by the AVIRIS sensor, in 1998, over Salinas Valley, California (see Fig. 3 (c)). The dataset contains 86 × 83 pixels, 204 spectral bands with the same spectral resolution and spectral range as the Moffett image (the water absorption bands were removed) and a spatial resolution of 3.7 m. This image is interesting since it includes different species of vegetables showing endmember variability, which makes it suitable for the assessment of the RUSAL algorithm. According to the ground truth information [50], this image contains 6 classes that are: Broccoli, Corn senesced green weeds, lettuce of different ages (4, 5, 6, and 7 weeks). As a result of the similarity between the different spectra and the presence of highly mixed pixels [51], we have manually extracted 4 endmembers associated with these classes: Corn senesced green weeds + lettuce-4-5, Broccoli, lettuce-6, and lettuce-7 2 . Indeed, these endmembers have a different shape (minimum pairwise angle of 9 degrees) while the remaining fluctuations can be associated with the effect of EV. Table III shows the unmixing performance for the different algorithms. Overall, the NL and robust algorithms provide a better fit than the LMM-based ones. Among the sophisticated algorithms, the proposed NUSAL-2, NUSAL-3 and RUSAL algorithms provide the best performance for the computational cost. The algorithms all generated similar abundance maps for the Moffett image and we only show those of NUSAL-2, NUSAL-3 and RUSAL in Fig.   4, for brevity. Fig. 5 presents the residual maps associated with the NL algorithms (left column) and the robust algorithms (right column). This figure highlights good agreement between the NL algorithms that detect nonlinearity in the coastal region (as in [16]). In addition to this region, the robust algorithms (RNMF, ME, RUSAL) detect other mismodelling effects probably due to EV as already reported in [21]. The NL coefficients estimated by NUSAL-2 and NUSAL-3 are reported in Fig. 6. This figure shows good agreement between the estimated bilinear coefficients when considering NUSAL-2 and NUSAL-3. Moreover, it highlights the sparse behavior of the nonlinear coefficients and clearly shows that they are mainly due to the second order interactions. Indeed, Fig. 6 (bottom-right) shows that the average of the nonlinear coefficients over all the pixels is higher for the first six terms, i.e., the second order terms. The abundances obtained for the Madonna scene are displayed in Fig. 7 for SKhype, RNMF, RUSAL, NUSAL-2, and NUSAL-3 (the other algorithms provided similar maps to NUSAL/RUSAL and were not displayed for brevity). This figure shows a slight difference between the RNMF soil map and the other algorithms. Similar differences are observed when considering the residual maps in Fig. 8 since RNMF detected a higher residual effect in the soil area (bottom-left corner in the RNMF image) than ME and RUSAL.
Apart this, the robust algorithms detected residuals in the shadow areas and in trees. The latter is mainly due to the presence of multiple scattering effects as highlighted by the NL algorithms that show similar maps (see left figures). In a similar manner to Moffett field, the NUSAL-2 and NUSAL-3 estimated NL coefficients are mainly due to bilinear interactions as highlighted in Fig. 9. This justifies the good behavior of the bilinear models [15]- [20], [31] that assume that the effect of the interaction terms decreases when increasing the interaction orders. The abundances obtained for the Salinas scene are displayed in Fig. 10 for SKhype, CDA-NL, RUSAL, NUSAL-2, and NUSAL-3 (the other algorithms provided similar maps to NUSAL/RUSAL and were not displayed for brevity). Because of the high EV effect, both CDA-NL and SKhype fails to extract the abundances of Broccoli, and lettuce-6. The residual maps shown in Fig. 11 confirm this since NL algorithms detect a reduced effect while CDA-EV, CDA-ME and RUSAL detect more EV effect especially in the region of the lettuce.
These results highlight the ability of CDA-ME and RUSAL to capture EV effects. Fig. 12 shows some randomly selected outlier spectra obtained with CDA-EV, CDA-ME and RUSAL algorithms. These spectra show a similar global shape while they highlight the properties of each algorithm. Indeed, it can be seen that the CDA-EV and CDA-ME algorithms provide rougher spectra that are more realistic than RUSAL. However, the RUSAL algorithm allows the absence of outliers (null spectra) thanks to the sparsity promoting property imposed on the outliers. To summarize, the obtained results highlighted the benefit of RUSAL/NUSAL-K that estimate abundance and residual maps which are in good agreement with state-of-the-art algorithms, but at a lower computational cost. NUSAL-K generalizes the common bilinear models and provide NL coefficient maps associated with different interaction orders. This provides a useful tool to better analyze the scattering effect between the physical elements.
RUSAL provides a flexible tool to capture different mismodelling effects due to EV, NL or July 20, 2016 DRAFT outliers. It is more robust than NUSAL-K with respect to the variation of the endmember number R, but provide less information regarding the interaction terms. Table IV finally summarizes the main characteristics of the nonlinear and robust algorithms considered in this paper.

VI. CONCLUSIONS
This paper has introduced two mixture models and their supervised unmixing algorithms.
The two models accounted for the presence of nonlinearity or mismodelling effects by considering a residual term in addition to the linear mixture of endmembers. The residual term was expressed as a sparse linear combination of some signals, thus, the proposed models reduced to a linear combination with respect to the abundances and the residual coefficients.
The unknown parameters associated with these models were estimated using an optimization approach that included convex regularization terms. More precisely, the non-negativity and sum-to-one constraints were imposed on the abundances and the residual terms were assumed to be spatially sparse by considering a collaborative sparse regression approach. The resulting convex problem was solved using an alternating direction method of multipliers whose convergence was theoretically ensured. The proposed algorithms showed good performance when processing synthetic data generated with the linear model or other more sophisticated models. Results on real data confirmed the good performance of the proposed algorithms and showed their ability to extract different features in the observed scenes, with a reduced computational cost. These results confirmed that most vegetation nonlinearity can be captured by bilinear interactions and that endmember variability is mainly located in vegetation areas.
Future work includes the introduction of spatial correlation on the abundances. Considering endmember variability jointly with nonlinearity is also an interesting issue which would deserve to be investigated. .
The number of interaction spectra is given by 16 (see Table V gathers the interaction spectra of the ith order. The size of Q (K) is then obtained by summing the size of the interaction spectra D K (i) associated with the ith order, as follows where x! = 1 · · · (x − 2)(x − 1)x, denotes the factorial of x.  [10], [11], [21], each interaction term in Q The details of the MPOs can be found, for example, in [44].
• Linear system of equations: ] for the NL model and G = 3I (R+D) for the ME model • MPO for g 1 (U 1 ) = L P (U 1 ): where soft(.) denotes the soft threshold operator given by soft V , τ µ = sign(V ) max |V | − τ µ , 0 , sign(.) denotes the element-wise application of the sign function, |V | denotes the matrix of absolute values of the elements of V , max(.) is the element-wise maximum operator, and vect-soft(.) is the well known vect-soft-threshold operator given by vect-soft v, τ Note finally that P = Q for NUSAL and P = F for RUSAL.