Stochastic Contrast Measures for SAR Data: A Survey (in English)

Alejandro C. Frery


  • CLC number: TN957.7

    Author Bio: Alejandro C. Frery (S’92–SM’03) received a B.Sc. degree in Electronic and Electrical Engineering from the Universidad de Mendoza, Mendoza, Argentina. His M.Sc. degree was in Applied Mathematics (Statistics) from the Instituto de Matemática Pura e Aplicada (IMPA, Rio de Janeiro) and his Ph.D. degree was in Applied Computing from the Instituto Nacional de Pesquisas Espaciais (INPE, São José dos Campos, Brazil). He is currently the leader of LaCCAN – Laboratório de Computação Científica e Análise Numérica, Universidade Federal de Alagoas, Maceió, Brazil, and holds a Huashan Scholar position (2019–2021) with the Key Lab of Intelligent Perception and Image Understanding of the Ministry of Education, Xidian University, Xi’an, China. His research interests are statistical computing and stochastic modeling.
    Corresponding author: Laboratório de Computação Científica e Análise Numérica – LaCCAN, Universidade Federal de Alagoas – Ufal, 57072-900 Maceió, AL – Brazil, and the Key Lab of Intelligent Perception and Image Understanding of the Ministry of Education, Xidian University, Xi’an, China. Email:

  • Figure 1.  Mind map of this review's contents

    Figure 2.  Exponential densities with mean 1/2, 1, and 2 (red, black and blue, resp.) in linear and semilogarithmic scales

    Figure 3.  Unitary mean Gamma densities with 1, 3, and 8 looks (black, red, and blue, resp.) in linear and semilogarithmic scales

    Figure 4.  Densities in linear and semilogarithmic scales of the $ {\rm E}(1) $ (black) and $ {{\cal{G}}^0} $ distributions with unitary mean and $ \alpha\in\{-1.5,-3.0,-8.0\} $ in red, green, and blue, resp.

    Figure 5.  Densities in linear and semilogarithmic scales of the $ {\cal{G}}^0(-5,4,L) $ distributions with unitary mean and $ L\in\{1,3,8\} $ in red, green, and blue, resp.

    Figure 6.  Equalized intensity data with grid

    Figure 7.  Regression analysis for the estimation of the equivalent number of looks

    Figure 8.  Strips of 10 × 500 pixels with samples from two $ {\cal{G}}^0 $ distributions

    Figure 9.  Illustration of edge detection by maximum likelihood

    Figure 10.  Illustration of parameter estimation by distance minimization

    Figure 11.  Illustration of the Nonlocal Means approach

    Table 1.  ($h,\phi$)-divergences and related functions $\phi$ and $h$

    | $(h,\phi)$-divergence | $h(y)$ | $\phi(x)$ |
    |---|---|---|
    | Kullback-Leibler | $y$ | $x\ln(x)$ |
    | Rényi (order $\beta$) | $\dfrac{1}{\beta-1}\ln\left((\beta-1)y+1\right),\;0\leq y < \dfrac{1}{1-\beta}$ | $\dfrac{x^{\beta}-\beta(x-1)-1}{\beta-1},\;0 < \beta<1$ |
    | Hellinger | $y/2,\;0\leq y<2$ | $(\sqrt{x}-1)^2$ |
    | Bhattacharyya | $-\ln(1-y),\;0\leq y < 1$ | $-\sqrt{x}+\dfrac{x+1}{2}$ |
    | Jensen-Shannon | $y$ | $x\ln\left(\dfrac{2x}{x+1}\right)$ |
    | Arithmetic-geometric | $y$ | $\left(\dfrac{x+1}{2}\right)\ln \dfrac{x+1}{2x}$ |
    | Triangular | $y,\;0\leq y <2$ | $\dfrac{(x-1)^2}{x+1}$ |
    | Harmonic-mean | $-\ln\left(-\dfrac{y}{2}+1\right),\;0\leq y < 2$ | $\dfrac{(x-1)^2}{x+1}$ |

    Table 2.  $(h,\phi)$-entropies and related functions $\phi$ and $h$

    | $(h,\phi)$-entropy | $h(y)$ | $\phi(x)$ |
    |---|---|---|
    | Shannon[35] | $y$ | $-x\ln x$ |
    | Restricted Tsallis (order $\beta \in \mathbb{R}_{+}\,:\,\beta\neq 1$)[39] | $y$ | $\dfrac{x^\beta-x}{1-\beta}$ |
    | Rényi (order $\beta \in \mathbb{R}_+\,:\,\beta\neq 1$)[29] | $\dfrac{\ln y}{1-\beta}$ | $x^\beta$ |
    | Arimoto of order $\beta$ | $\dfrac{\beta-1}{y^\beta-1}$ | $x^{1/\beta}$ |
    | Sharma-Mittal of order $\beta$ | $\dfrac{\exp\{(\beta-1)y\}}{\beta-1}$ | $x\ln x$ |
  • [1] LEE J S and POTTIER E. Polarimetric Radar Imaging: From Basics to Applications[M]. Boca Raton: CRC, 2009.
    [2] TUR M, CHIN K C, and GOODMAN J W. When is speckle noise multiplicative?[J]. Applied Optics, 1982, 21(7): 1157–1159. doi: 10.1364/AO.21.001157
    [3] ARGENTI F, LAPINI A, BIANCHI T, et al. A tutorial on speckle reduction in synthetic aperture radar images[J]. IEEE Geoscience and Remote Sensing Magazine, 2013, 1(3): 6–35. doi: 10.1109/MGRS.2013.2277512
    [4] GOMEZ L, OSPINA R, and FRERY A C. Unassisted quantitative evaluation of despeckling filters[J]. Remote Sensing, 2017, 9(4): 389. doi: 10.3390/rs9040389
    [5] HARALICK R M. Statistical and structural approaches to texture[J]. Proceedings of the IEEE, 1979, 67(5): 786–804. doi: 10.1109/PROC.1979.11328
    [6] VITALE S, COZZOLINO D, SCARPA G, et al. Guided patchwise nonlocal SAR despeckling[J]. IEEE Transactions on Geoscience and Remote Sensing, 2019, 57(9): 6484–6498. doi: 10.1109/TGRS.2019.2906412
    [7] GOMEZ L, OSPINA R, and FRERY A C. Statistical properties of an unassisted image quality index for SAR imagery[J]. Remote Sensing, 2019, 11(4): 385. doi: 10.3390/rs11040385
    [8] FRERY A and WU J C. Operational statistics for SAR imagery[EB/OL]., 2019.
    [9] MANSKI C F. Analog Estimation Methods in Econometrics[M]. New York: Chapman & Hall, 1988.
    [10] MEJAIL M E, JACOBO-BERLLES J C, FRERY A C, et al. Classification of SAR images using a general and tractable multiplicative model[J]. International Journal of Remote Sensing, 2003, 24(18): 3565–3582. doi: 10.1080/0143116021000053274
    [11] CINTRA R J, FRERY A C, and NASCIMENTO A D C. Parametric and nonparametric tests for speckled imagery[J]. Pattern Analysis and Applications, 2013, 16(2): 141–161. doi: 10.1007/s10044-011-0249-3
    [12] TSYBAKOV A B. Introduction to Nonparametric Estimation[M]. New York: Springer, 2009.
    [13] WASSERMAN L. All of Nonparametric Statistics[M]. New York: Springer, 2006.
    [14] GIBBONS J D and CHAKRABORTI S. Nonparametric Statistical Inference[M]. 4th ed. New York: Marcel Dekker, 2003.
    [15] PALACIO M G, FERRERO S B, and FRERY A C. Revisiting the effect of spatial resolution on information content based on classification results[J]. International Journal of Remote Sensing, 2019, 40(12): 4489–4505. doi: 10.1080/01431161.2019.1569278
    [16] NEGRI R G, FRERY A C, SILVA W B, et al. Region-based classification of PolSAR data using radial basis kernel functions with stochastic distances[J]. International Journal of Digital Earth, 2019, 12(6): 699–719. doi: 10.1080/17538947.2018.1474958
    [17] FRERY A C, SANT’ANNA S J S, MASCARENHAS N D A, et al. Robust inference techniques for speckle noise reduction in 1-look amplitude SAR images[J]. Applied Signal Processing, 1997, 4(2): 61–76.
    [18] CHAN D, REY A, GAMBINI J, et al. Low-cost robust estimation for the single-look GI0 model using the Pareto distribution[J]. IEEE Geoscience and Remote Sensing Letters, 2019. doi: 10.1109/LGRS.2019.2956635
    [19] BUSTOS O H, LUCINI M M, and FRERY A C. M-estimators of roughness and scale for ${\cal{G}}_A^0 $ -modelled SAR imagery[J]. EURASIP Journal on Advances in Signal Processing, 2002, 2002(1): 105–114.
    [20] MOSCHETTI E, PALACIO M G, PICCO M, et al. On the use of Lee’s protocol for speckle-reducing techniques[J]. Latin American Applied Research, 2006, 36(2): 115–121.
    [21] ALLENDE H, FRERY A C, GALBIATI J, et al. M-estimators with asymmetric influence functions: The ${\cal{G}}_A^0$ distribution case[J]. Journal of Statistical Computation and Simulation, 2006, 76(11): 941–956. doi: 10.1080/10629360600569154
    [22] CASELLA G and BERGER R L. Statistical Inference[M]. 2nd ed. Pacific Grove: Duxbury, 2002.
    [23] NASCIMENTO A D C, CINTRA R J, and FRERY A C. Hypothesis testing in speckled data with stochastic distances[J]. IEEE Transactions on Geoscience and Remote Sensing, 2010, 48(1): 373–385. doi: 10.1109/TGRS.2009.2025498
    [24] GOUDAIL F and RÉFRÉGIER P. Contrast definition for optical coherent polarimetric images[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004, 26(7): 947–951. doi: 10.1109/TPAMI.2004.22
    [25] ALI S M and SILVEY S D. A general class of coefficients of divergence of one distribution from another[J]. Journal of the Royal Statistical Society: Series B (Methodological), 1966, 28(1): 131–142.
    [26] CSISZÁR I. Information-type measures of difference of probability distributions and indirect observations[J]. Studia Scientiarum Mathematicarum Hungarica, 1967, 2: 299–318.
    [27] SALICRÚ M, MORALES D, MENÉNDEZ M L, et al. On the applications of divergence type measures in testing statistical hypotheses[J]. Journal of Multivariate Analysis, 1994, 51(2): 372–391. doi: 10.1006/jmva.1994.1068
    [28] COVER T M and THOMAS J A. Elements of Information Theory[M]. 2nd ed. New York: John Wiley & Son, 1991.
    [29] RÉNYI A. On measures of entropy and information[C]. The 4th Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, USA, 1961: 547–561.
    [30] FUKUNAGA K. Introduction to Statistical Pattern Recognition[M]. 2nd ed. San Diego: Academic, 1990.
    [31] DIACONIS P and ZABEL S L. Updating subjective probability[J]. Journal of the American Statistical Association, 1982, 77(380): 822–830. doi: 10.1080/01621459.1982.10477893
    [32] BURBEA J and RAO C. On the convexity of some divergence measures based on entropy functions[J]. IEEE Transactions on Information Theory, 1982, 28(3): 489–495. doi: 10.1109/TIT.1982.1056497
    [33] BURBEA J and RAO C R. Entropy differential metric, distance and divergence measures in probability spaces: A unified approach[J]. Journal of Multivariate Analysis, 1982, 12(4): 575–596. doi: 10.1016/0047-259X(82)90065-3
    [34] SEGHOUANE A K and AMARI S I. The AIC criterion and symmetrizing the Kullback-Leibler divergence[J]. IEEE Transactions on Neural Networks, 2007, 18(1): 97–106. doi: 10.1109/TNN.2006.882813
    [35] SALICRÚ M, MENÉNDEZ M L, MORALES D, et al. Asymptotic distribution of (h, ϕ)-entropy[J]. Communications in Statistics-Theory and Methods, 1993, 22(7): 2015–2031. doi: 10.1080/03610929308831131
    [36] PARDO L, MORALES D, SALICRÚ M, et al. Generalized divergence measures: Information matrices, amount of information, asymptotic distribution, and its applications to test statistical hypotheses[J]. Information Sciences, 1995, 84(3/4): 181–198.
    [37] PARDO L, MORALES D, SALICRÚ M, et al. Large sample behavior of entropy measures when parameters are estimated[J]. Communications in Statistics – Theory and Methods, 1997, 26(2): 483–501. doi: 10.1080/03610929708831929
    [38] FRERY A C, CINTRA R J, and NASCIMENTO A D C. Entropy-based statistical analysis of PolSAR data[J]. IEEE Transactions on Geoscience and Remote Sensing, 2013, 51(6): 3733–3743. doi: 10.1109/TGRS.2012.2222029
    [39] HAVRDA J and CHARVÁT F. Quantification method of classification processes: Concept of structural α-entropy[J]. Kybernetika, 1967, 3: 30–35.
    [40] ATKINSON C and MITCHELL A F S. Rao’s distance measure[J]. Sankhyā: The Indian Journal of Statistics, Series A, 1981, 43(3): 345–365.
    [41] MENÉNDEZ M L, MORALES D, PARDO L, et al. Statistical tests based on geodesic distances[J]. Applied Mathematics Letters, 1995, 8(1): 65–69. doi: 10.1016/0893-9659(94)00112-P
    [42] NARANJO-TORRES J, GAMBINI J, and FRERY A C. The geodesic distance between ${\cal{G}}_I^0$ models and its application to region discrimination[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2017, 10(3): 987–997. doi: 10.1109/JSTARS.2017.2647846
    [43] FRERY A C and GAMBINI J. Comparing samples from the ${\cal{G}}^0$ distribution using a geodesic distance[J]. TEST, 2019. doi: 10.1007/s11749-019-00658-2
    [44] GAO Gui. Statistical modeling of SAR images: A survey[J]. Sensors, 2010, 10(1): 775–795. doi: 10.3390/s100100775
    [45] FRERY A C, MÜLLER H J, YANASSE C C F, et al. A model for extremely heterogeneous clutter[J]. IEEE Transactions on Geoscience and Remote Sensing, 1997, 35(3): 648–659. doi: 10.1109/36.581981
    [46] CHAN D, REY A, GAMBINI J, et al. Sampling from the ${\cal{G}}_I^0$ distribution[J]. Monte Carlo Methods and Applications, 2018, 24(4): 271–287. doi: 10.1515/mcma-2018-2023
    [47] HORN R. The DLR airborne SAR PROJECT E-SAR[C]. 1996 IEEE International Geoscience and Remote Sensing Symposium, Lincoln, USA, 1996: 1624–1628.
    [48] GAMBINI J, MEJAIL M E, JACOBO-BERLLES J, et al. Feature extraction in speckled imagery using dynamic B-spline deformable contours under the ${\cal{G}}^0$ model[J]. International Journal of Remote Sensing, 2006, 27(22): 5037–5059. doi: 10.1080/01431160600702616
    [49] GAMBINI J, MEJAIL M E, JACOBO-BERLLES J, et al. Accuracy of edge detection methods with local information in speckled imagery[J]. Statistics and Computing, 2008, 18(1): 15–26. doi: 10.1007/s11222-007-9034-y
    [50] FRERY A C, JACOBO-BERLLES J, GAMBINI J, et al. Polarimetric SAR image segmentation with B-Splines and a new statistical model[J]. Multidimensional Systems and Signal Processing, 2010, 21(4): 319–342. doi: 10.1007/s11045-010-0113-4
    [51] GAMBINI J, CASSETTI J, LUCINI M M, et al. Parameter estimation in SAR imagery using stochastic distances and asymmetric kernels[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2015, 8(1): 365–375. doi: 10.1109/JSTARS.2014.2346017
    [52] BUADES A, COLL B, and MOREL J M. A review of image denoising algorithms, with a new one[J]. Multiscale Modeling & Simulation, 2005, 4(2): 490–530.
    [53] BUADES A, COLL B, and MOREL J M. Image denoising methods: A new nonlocal principle[J]. SIAM Review, 2010, 52(1): 113–147. doi: 10.1137/090773908
    [54] TEUBER T and LANG A. A new similarity measure for nonlocal filtering in the presence of multiplicative noise[J]. Computational Statistics & Data Analysis, 2012, 56(12): 3821–3842. doi: 10.1016/j.csda.2012.05.009
    [55] PENNA P A A and MASCARENHAS N D A. SAR speckle nonlocal filtering with statistical modeling of Haar wavelet coefficients and stochastic distances[J]. IEEE Transactions on Geoscience and Remote Sensing, 2019, 57(9): 7194–7208. doi: 10.1109/TGRS.2019.2912153
    [56] FERRAIOLI G, PASCAZIO V, and SCHIRINZI G. Ratio-based nonlocal anisotropic despeckling approach for SAR images[J]. IEEE Transactions on Geoscience and Remote Sensing, 2019, 57(10): 7785–7798. doi: 10.1109/TGRS.2019.2916465
    [57] LEE J S, HOPPEL K W, MANGO S A, et al. Intensity and phase statistics of multilook polarimetric and interferometric SAR imagery[J]. IEEE Transactions on Geoscience and Remote Sensing, 1994, 32(5): 1017–1028. doi: 10.1109/36.312890
    [58] HAGEDORN M, SMITH P J, BONES P J, et al. A trivariate chi-squared distribution derived from the complex Wishart distribution[J]. Journal of Multivariate Analysis, 2006, 97(3): 655–674. doi: 10.1016/j.jmva.2005.05.014
    [59] DENG Xinping, LÓPEZ-MARTÍNEZ C, CHEN Jinsong, et al. Statistical modeling of polarimetric SAR data: A survey and challenges[J]. Remote Sensing, 2017, 9(4): 348. doi: 10.3390/rs9040348
    [60] R CORE TEAM. R: A language and environment for statistical computing[EB/OL]. R Foundation for Statistical Computing, Vienna, Austria, 2019.
    [61] ANFINSEN S N, DOULGERIS A P, and ELTOFT T Ø. Estimation of the equivalent number of looks in polarimetric synthetic aperture radar imagery[J]. IEEE Transactions on Geoscience and Remote Sensing, 2009, 47(11): 3795–3809. doi: 10.1109/TGRS.2009.2019269
    [62] FRERY A C, NASCIMENTO A D C, and CINTRA R J. Analytic expressions for stochastic distances between relaxed complex Wishart distributions[J]. IEEE Transactions on Geoscience and Remote Sensing, 2014, 52(2): 1213–1226. doi: 10.1109/TGRS.2013.2248737
    [63] MENÉNDEZ M L, MORALES D, PARDO L, et al. (h, $\varPhi $ )-entropy differential metric[J]. Applications of Mathematics, 1997, 42(2): 81–98. doi: 10.1023/A:1022214326758
    [64] NASCIMENTO A D C, FRERY A C, and CINTRA R J. Detecting changes in fully polarimetric SAR imagery with statistical information theory[J]. IEEE Transactions on Geoscience and Remote Sensing, 2019, 57(3): 1380–1392. doi: 10.1109/TGRS.2018.2866367
    [65] COELHO D F G, CINTRA R J, FRERY A C, et al. Fast matrix inversion and determinant computation for polarimetric synthetic aperture radar[J]. Computers & Geosciences, 2018, 119: 109–114.
    [66] TORRES L, SANT’ANNA S J S, DA COSTA FREITAS C, et al. Speckle reduction in polarimetric SAR imagery with stochastic distances and nonlocal means[J]. Pattern Recognition, 2014, 47(1): 141–157. doi: 10.1016/j.patcog.2013.04.001
    [67] DELEDALLE C A, DENIS L, and TUPIN F. Iterative weighted maximum likelihood denoising with probabilistic patch-based weights[J]. IEEE Transactions on Image Processing, 2009, 18(12): 2661–2672. doi: 10.1109/TIP.2009.2029593
    [68] CHEN Jiong, CHEN Yilun, AN Wentao, et al. Nonlocal filtering for polarimetric SAR data: A pretest approach[J]. IEEE Transactions on Geoscience and Remote Sensing, 2011, 49(5): 1744–1754. doi: 10.1109/TGRS.2010.2087763
    [69] ZHONG Hua, LI Yongwei, and JIAO Licheng. SAR image despeckling using Bayesian nonlocal means filter with sigma preselection[J]. IEEE Geoscience and Remote Sensing Letters, 2011, 8(4): 809–813. doi: 10.1109/LGRS.2011.2112331
    [70] DELEDALLE C A, DUVAL V, and SALMON J. Non-local methods with shape-adaptive patches (NLM-SAP)[J]. Journal of Mathematical Imaging and Vision, 2012, 43(2): 103–120. doi: 10.1007/s10851-011-0294-y
    [71] SILVA W B, FREITAS C C, SANT’ANNA S J S, et al. Classification of segments in PolSAR imagery by minimum stochastic distances between Wishart distributions[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2013, 6(3): 1263–1273. doi: 10.1109/JSTARS.2013.2248132
    [72] GOMEZ L, ALVAREZ L, MAZORRA L, et al. Classification of complex Wishart matrices with a diffusion-reaction system guided by stochastic distances[J]. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 2015, 373(2056): 20150118. doi: 10.1098/rsta.2015.0118
    [73] GOMEZ L, ALVAREZ L, MAZORRA L, et al. Fully PolSAR image classification using machine learning techniques and reaction-diffusion systems[J]. Neurocomputing, 2017, 255: 52–60. doi: 10.1016/j.neucom.2016.08.140
    [74] NASCIMENTO A D C, HORTA M M, FRERY A C, et al. Comparing edge detection methods based on stochastic entropies and distances for PolSAR imagery[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2014, 7(2): 648–663. doi: 10.1109/JSTARS.2013.2266319
    [75] De BORBA A A, MARENGONI M, and FRERY A C. Fusion of evidences for edge detection in PolSAR images[C]. 2019 TENGARSS, Kochi, India, 2019, in press.
    [76] BHATTACHARYA A, MUHURI A, DE S, et al. Modifying the Yamaguchi four-component decomposition scattering powers using a stochastic distance[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2015, 8(7): 3497–3506. doi: 10.1109/JSTARS.2015.2420683
    [77] CONRADSEN K, NIELSEN A A, SCHOU J, et al. A test statistic in the complex Wishart distribution and its application to change detection in polarimetric SAR data[J]. IEEE Transactions on Geoscience and Remote Sensing, 2003, 41(1): 4–19. doi: 10.1109/TGRS.2002.808066
    [78] NIELSEN A A, CONRADSEN K, and SKRIVER H. Change detection in full and dual polarization, single-and multifrequency SAR data[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2015, 8(8): 4041–4048. doi: 10.1109/JSTARS.2015.2416434
    [79] RATHA D, BHATTACHARYA A, and FRERY A C. Unsupervised classification of PolSAR data using a scattering similarity measure derived from a geodesic distance[J]. IEEE Geoscience and Remote Sensing Letters, 2018, 15(1): 151–155. doi: 10.1109/LGRS.2017.2778749
    [80] RATHA D, GAMBA P, BHATTACHARYA A, et al. Novel techniques for built-up area extraction from polarimetric SAR images[J]. IEEE Geoscience and Remote Sensing Letters, 2019. doi: 10.1109/LGRS.2019.2914913
    [81] RATHA D, MANDAL D, KUMAR V, et al. A generalized volume scattering model-based vegetation index from polarimetric SAR data[J]. IEEE Geoscience and Remote Sensing Letters, 2019, 16(11): 1791–1795. doi: 10.1109/LGRS.2019.2907703
    [82] RATHA D, POTTIER E, BHATTACHARYA A, et al. A PolSAR scattering power factorization framework and novel roll-invariant parameters based unsupervised classification scheme using a geodesic distance[J]. arXiv:1906.11577, 2019.
    [83] FERNANDES D and FRERY A C. Statistical properties of geodesic distances between samples and elementary scatterers in PolSAR imagery[C]. 2019 TENGARSS, Kochi, India, 2019, in press.
    [84] YUE D X, XU F, FRERY A C, et al. A generalized Gaussian coherent scatterer model for correlated SAR texture[J]. IEEE Transactions on Geoscience and Remote Sensing, in press.
  • Received: 2019-12-05
  • Accepted: 2019-12-20
  • Published: 2019-12-28


Abstract

    • Synthetic Aperture Radar (SAR) has been widely used as an important system for information extraction in remote sensing applications. The main advantages of such active microwave sensors are that (i) their operation depends neither on sunlight nor on weather conditions, and (ii) they provide high spatial image resolution.

      In recent years, the interest in understanding this type of imagery has increased. However, since the acquired images stem from a coherent illumination process, they are affected by a signal-dependent granular noise called "speckle"[1]. Such noise is multiplicative in nature, and its intensity does not follow the Gaussian law. Moreover, the availability of fully polarimetric images requires processing data in the form of complex matrices. Thus, analyzing SAR images requires tailored image processing based on the statistical properties of speckled data.

      There is a vast literature on techniques for SAR image processing and analysis that employ a diversity of approaches for the related problems these images pose. Among them, advances in modeling with Statistical Information Theory and Geometry have led to several exciting solutions for the problems of denoising, edge detection, segmentation, classification, parameter estimation, change detection, and feature selection. Moreover, these approaches provide unique insights on the properties of the data and the information they convey.

      This paper presents a survey of those results.

      This work is organized in nine sections following this Introduction.

      It starts with Section 2, where we show that it is possible to assess the quality of despeckling filters by measuring the distance between the denoised image and the ideal model.

      Once we see that the approach provides interesting results, we move to Section 3, where we review the main inference techniques, namely analogy (Sec. 3.1), maximum likelihood (Sec. 3.2), and Nonparametric and Robust estimation (Sec. 3.3).

      Section 4 discusses the main approaches for computing differences between distributions: the logarithm of the ratio of likelihoods (Sec. 4.1), tests based on divergences (Sec. 4.2), tests that rely on the quadratic difference of entropies (Sec. 4.3), and tests that employ geodesic distances between distributions (Sec. 4.4).

      Section 5 reviews the main models that describe SAR data in intensity format, and provides details about the Gamma (Sec. 5.1) and $ {\cal{G}}^0 $ (Sec. 5.2) distributions.

      Section 6 presents applications of the concepts discussed in Section 4 to the intensity models of the previous one: edge detection by maximum likelihood (Sec. 6.2) and by maximum geodesic distance between samples (Sec. 6.3); parameter estimation by distance minimization (Sec. 6.4); and speckle reduction (Sec. 6.5).

      Section 7 reviews the main distribution for fully polarimetric data: the Wishart law. We present the reduced log-likelihood of a sample from such a model and estimation by maximum likelihood.

      In Section 8, we study applications of measures of difference to polarimetric data: we propose a nonlocal filter whose weights are proportional to the closeness between samples (Sec. 8.1); we classify segments by minimizing the distance between samples and prototypes (Sec. 8.2); we go back to the problem of edge detection in this kind of data (Sec. 8.3); we show a technique for polarimetric decomposition, which corrects the effects of the orientation angle (Sec. 8.4), and we conclude showing how to detect changes in multitemporal images with a test based on likelihoods and by comparing entropies (Sec. 8.5).

      Section 9 presents the idea of projecting polarimetric data onto the surface of a $ 16 $-dimensional sphere, measuring distances and, with this, performing decompositions and clustering.

      Fig. 1 shows the mind map that guided the writing of this survey. The arrows denote the model and/or technique used to tackle each problem.

      Figure 1.  Mind map of this review's contents

      The paper concludes with Section 10, where we suggest research avenues and further readings, and put the value of these techniques in perspective in an era dominated by the Deep Learning approach.

    • The multiplicative model is widely accepted as a good descriptor for SAR data[2]. It assumes that the observed intensity $ Z $ in each pixel is the product of two independent random variables: $ X $, which describes the unobserved backscatter (the quantity of interest), and $ Y $, the speckle.

      One can safely assume that the speckle obeys a Gamma law with unitary mean and shape parameter $ L $, the number of looks, which depends on the processing level and is related to the signal-to-noise ratio.

      Without making any assumption about the distribution of $ X $, except that it has positive support, any despeckling filter $ {\widehat {X}} $ may be seen as an estimator of $ X $ based solely on the observation of $ Z $, i.e., $ {\widehat {X}} = {\widehat {X}} (Z) $. Argenti et al.[3] present a comprehensive survey of such filters, along with measures of their performance.

      Image processing platforms suited for SAR, e.g., ENVI and Snap, offer users a plethora of despeckling filters, and all of them require stipulating parameters. The final user seldom chooses measures of quality, as they capture different features, and there is no obvious way to combine them.

      Gomez et al.[4] noted that the ideal filter $ {\widetilde {X}} $ should produce the actual backscatter $ X $ as output. The ideal ratio image $ \widetilde R = Z/{\widetilde {X}} = Z/X = $$XY/X = Y $ is a collection of independent identically distributed (iid) Gamma deviates with unitary mean and shape parameter $ L $. The worse the filter $ {\widehat {X}} $ is, the further (less adherent) the ratio image $ \widehat R = Z/{\widehat {X}} $ will be from this model. They then proposed measuring this distance with two components: one, based on first-order statistics (mean and looks preservation), and the other based on second-order statistics (how independent the observations are). This distance can also be used to fine-tune the parameters of a filter in order to obtain optimized results.
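      The first-order part of this check is easy to sketch numerically. The following Python fragment is illustrative only (the uniform backscatter model, the seed, and all names are assumptions, not part of Ref. [4]): it simulates the multiplicative model, then compares the mean and a moment-based looks estimate of the ratio images produced by the ideal filter and by a crude one.

```python
import numpy as np

rng = np.random.default_rng(42)

L = 4                                       # nominal number of looks
shape = (128, 128)
X = rng.uniform(50.0, 200.0, shape)         # synthetic backscatter (any positive law works)
Y = rng.gamma(L, 1.0 / L, shape)            # unit-mean Gamma speckle with shape L
Z = X * Y                                   # observed intensity: multiplicative model

def first_order(ratio):
    """Mean and moment-based looks estimate of a ratio image."""
    m = ratio.mean()
    enl = m ** 2 / ratio.var()              # for unit-mean Gamma data, ENL = mean^2 / variance
    return m, enl

m_ideal, enl_ideal = first_order(Z / X)           # ideal filter: the ratio is pure speckle
m_crude, enl_crude = first_order(Z / Z.mean())    # crude filter: structure leaks into the ratio

print(m_ideal, enl_ideal)    # approximately 1 and L
print(m_crude, enl_crude)    # looks estimate drifts away from L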

      The first component comprises quadratic differences between (i) local means, which should be 1, and (ii) local estimates of the shape of the Gamma distribution, which should equal the original number of looks.

      The second component was initially measured through the homogeneity of Haralick's co-occurrence matrices[5]. The original proposal used as reference an ensemble of shuffled ratio images. Vitale et al.[6] improved this step by comparing the joint and marginal distributions of the co-occurrence matrices.

      Later, Gomez et al.[7] studied the statistical properties of this measure, leading to a test whose null hypothesis is that the filter is ideal. This approach led to a powerful tool that assumes no distribution for $ X $ and, thus, none for $ Z $.

      However, we need potent models in order to extract information from SAR images. These models will be reviewed in Sections 5 and 7.

    • This Section is based on the book by Frery and Wu[8], freely available with code and data.

      Assume we have data and models for them. We will see ways of using the former to make inferences about the latter.

      Without loss of generality, consider the parametric model $ (\varOmega, {\cal{A}}, \Pr) $ for real independent random variables $ X $ and $ { X} = (X_1,X_2, \dots ) $, in which $ \varOmega $ is the sample space, $ {\cal{A}} $ is the smallest $ \sigma $-algebra of events, and $ \Pr $ is a probability measure defined on $ {\cal{A}} $. The distribution of $ X $ is indexed by $ { \theta}\in{{\varTheta}}\subset {\mathbb R}^p $, where $ p\geq 1 $ is the dimension of the parametric space $ {{\varTheta}} $.

      Parametric statistical inferences consist in making assertions about $ { \theta} $ using the information provided by the sample $ { X} $.

    • This approach requires the knowledge of the expected value of transformations of the random variable $ X $:

      $ {\rm E}[{ \psi}(X)] = \big( {\rm E}[\psi_1(X)], {\rm E}[\psi_2(X)], \dots , {\rm E}[\psi_p(X)] \big) $  (1)

      where each $ \psi_j $ is a measurable function $ \psi_j: {\mathbb R}\to{\mathbb R} $. Each element of Eq. (1) is given by

      $ {\rm E}[\psi_j(X)] = \int_{{\mathbb R}} \psi_j(x) {\rm d}F(x) $

      and $ F $ is the cumulative distribution function of $ X $. If $ \psi(X) = X^k $, we say that $ {\rm E}(X^k) $ is the $ k $-th order moment of $ X $ (if it exists).

      The quantity $ {\rm E}(X-{\rm E}X)^k $ is called the $ k $-th central moment of $ X $, if it exists. The second central moment $ {\rm E}\big(X-{\rm E}(X)\big)^2 = {\rm E}(X^2)-\big({\rm E}(X)\big)^2 $ is called the variance of $ X $. We denote it $ {\rm Var}(X) $.

      The Law of Large Numbers is the basis for inference by analogy[9]. This law states that, under relatively mild conditions,

      $ \mathop {\lim }\limits_{n \to \infty } \frac{1}{n}\sum\limits_{i = 1}^n \psi ({X_i}) = {\rm{E}}[\psi (X)] $

      provided $ X,X_1,X_2,\dots $ are iid.

      With this in mind, and assuming one has a large sample, it seems reasonable to equate sample quantities (the left-hand side) and parametric expressions (the right-hand side).
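      The convergence behind this reasoning can be visualized with a short simulation. The following is a minimal Python sketch (the language and the parameter choices are ours, not the review’s): it draws unitary-mean Exponential deviates, for which $ {\rm E}(X^2) = 2 $, and tracks the running average of $ \psi(X_i) = X_i^2 $.

```python
import numpy as np

# Monte Carlo illustration of the Law of Large Numbers: the sample mean of
# psi(X_i) approaches E[psi(X)] as n grows. Here X ~ E(1), psi(x) = x^2,
# and E[X^2] = 2 for the unitary-mean Exponential distribution.
rng = np.random.default_rng(seed=42)

def running_mean(psi, sample):
    """Cumulative averages (1/n) * sum_{i<=n} psi(X_i)."""
    values = psi(sample)
    return np.cumsum(values) / np.arange(1, values.size + 1)

x = rng.exponential(scale=1.0, size=100_000)
estimates = running_mean(lambda t: t**2, x)
print(estimates[99], estimates[-1])  # n = 100 versus n = 100000
```

      With the seed fixed the experiment is reproducible, and the terminal estimate settles near the true value 2.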

      When we have $ p $ parameters, i.e. $ { \theta}\in{ \varTheta}\subset{\mathbb R}^p $, we need $ p $ linearly independent equations to form an estimator of $ { \theta} $. Such an estimator has the following form:

      $ \left. \begin{array}{c} g_1\left({{\widehat{\theta}}}_1,{{\widehat{\theta}}}_2,\dots,{{\widehat{\theta}}}_p\right) - \dfrac{1}{n} \displaystyle\sum_{i = 1}^n \psi_1(X_i) = 0\\ \vdots\\ g_p \left({{\widehat{\theta}}}_1, {{\widehat{\theta}}}_2, \dots ,{{\widehat{\theta}}}_p\right) - \dfrac{1}{n} \displaystyle\sum_{i = 1}^n \psi_p(X_i) = 0\\ \end{array} \right\} $

      More often than not, one has to rely on numerical routines to obtain $ {\widehat{{ \theta}}} $ with this approach. In some cases, Eq. (4) has no solution; cf.[10].
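      As an illustration of such numerical routines, the sketch below (Python with NumPy and SciPy; an assumption of ours, not a prescription of the text) forms the two moment equations for the Gamma model of Eq. (19), whose first two moments are $ {\rm E}(Z) = \mu $ and $ {\rm E}(Z^2) = \mu^2(L+1)/L $, and solves them with a general-purpose root finder.

```python
import numpy as np
from scipy.optimize import fsolve

# Inference by analogy for the Gamma model of Eq. (19), solved numerically:
# equate g_1(mu, L) = E[Z] = mu and g_2(mu, L) = E[Z^2] = mu^2 (L+1)/L
# to the corresponding sample moments.
rng = np.random.default_rng(seed=0)
mu_true, L_true, n = 2.0, 4.0, 50_000
z = rng.gamma(shape=L_true, scale=mu_true / L_true, size=n)

m1, m2 = z.mean(), np.mean(z**2)

def moment_equations(theta):
    mu, L = theta
    return [mu - m1,                    # first moment equation
            mu**2 * (L + 1) / L - m2]   # second moment equation

mu_hat, L_hat = fsolve(moment_equations, x0=[m1, 1.0])
print(mu_hat, L_hat)
```

      In this particular case the system also has a closed-form solution (the sample mean, and the reciprocal squared coefficient of variation for $ L $); the numerical route is shown because, for other models, no such solution exists.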

    • Consider again the sample of iid random variables $ { X} = (X_1,X_2,\dots ,X_n) $ each with the same distribution characterized (without loss of generality) by the density $ f(X_i;{ \theta}) $. The likelihood function is

      $ {\cal L}({ \theta};{ X}) = \prod\limits_{i = 1}^{n} f(X_i;{ \theta}) $

      Notice that $ {\cal L} $ is not a joint density function: it is regarded as a function of $ { \theta} $, with the sample held fixed.

      The principle of maximum likelihood proposes as estimator for $ { \theta} $ the parameter that maximizes Eq. (5):

      $ {{\widehat{{ \theta}}}}_{\rm{ML}} = \arg\max\limits_{{ \theta}\in{{\Theta}}} \Big\{ {\cal L}({ \theta};{ X}) \Big\} $

      that is, the point in $ {{ \varTheta}} $ that makes the observations most plausible. It sometimes coincides with some analogy estimators.

      The two most widely used approaches for maximizing Eq. (5) are by optimization and by the solution of a system of equations. They both operate on the reduced log-likelihood rather than on Eq. (5).

      Using the fact that Eq. (5) is a product of positive functions, one may consider

      $ {{\widehat{{ \theta}}}}_{\rm{ML}} = \arg\max\limits_{{ \theta}\in{{\Theta}}} \Big\{ \ln{\cal L}({ \theta};{ X}) \Big\} $

      instead of Eq. (6). If we now discard from $ \ln{\cal L}({ \theta};{ X}) $ the terms that do not depend on $ { \theta} $, we obtain the reduced log-likelihood $ \ell({ \theta};{ X}) $, our function of interest.

      A maximum likelihood estimator may be found by maximizing $ \ell({ \theta};{ X}) $, often with numerical tools, or by solving the system of equations given by

      $ \nabla \ell({{\widehat{{ \theta}}}};{ X}) = { 0} $

      i.e., the partial derivatives with respect to each $ \theta_i $ equated to zero.

      Both approaches often require an initial solution, and a general recommendation is towards using an analogy estimator (if available) for that purpose.

      Provided there are no doubts about the validity of the model, the maximum likelihood estimators should be preferred over those obtained by analogy.

    • There are two remarkable approaches to statistical inference that differ from analogy and maximum likelihood: nonparametric and robust estimation.

      The nonparametric approach makes few or no assumptions about the underlying distribution of the data. Cintra et al.[11] employed such techniques with success for the problem of finding edges in SAR imagery. Nonparametric estimators are resistant to contamination, at the expense of some efficiency. They should be part of every practitioner’s toolbox. The interested reader is referred to the books listed in Refs. [12–14] for details about this approach.

      The reader must always bear in mind the underlying hypotheses used to devise an inference procedure. All the techniques discussed above assume that the sample is comprised of observations from iid random variables. A middle path between the parametric (analogy and maximum likelihood) and nonparametric approaches is the robust approach.

      Robust inference assumes there is a model of interest $ {\cal D}({ \theta}) $, and that we aim at estimating $ { \theta} $ with a sample $ { Z} = (Z_1,Z_2,\dots ,Z_n) $. The sample may or may not be produced by $ {\cal D}({ \theta}) $ as hypothesized.

      A robust estimator for $ { \theta} $ should behave well when the model is correct, although not as well as the maximum likelihood estimator $ {\widehat{{ \theta}}}_{\rm{ML}} $ (by definition). However, unlike $ {\widehat{{ \theta}}}_{\rm{ML}} $, it should provide acceptable estimates when $ { Z} $ originates from a different distribution, i.e., under the presence of contamination.

      Contamination may describe several departures from the primary hypothesis as, for instance, a few gross errors when registering the observations, or the presence of data originated from a different law. Contamination happens, for instance, when computing local statistics for designing a filter. The underlying assumption that $ { Z} $ consists of observations from iid random variables breaks when the estimation window is transitioning over two or more different areas. The works by Palacio et al.[15] and by Negri et al.[16] examine the impact that mild violations of this hypothesis have on the behavior of subsequent operations when the inference procedure is unaware of such departure.

      Frery et al.[17] proposed robust estimators for speckle reduction in single-look amplitude data. The authors used an empirical approach and devised estimators based on trimmed statistics, on the median absolute deviation, on the inter-quartile range, on the median, and on the best linear unbiased estimator. They compared the performance of these techniques with respect to estimators based on analogy (moments) and maximum likelihood and concluded that it is advisable to consider robust procedures.

      More recently, Chan et al.[18] used a connection between the model defined by Eq. (21) and the Pareto distribution to propose robust estimators with low computational cost. The authors show applications to region classification in the presence of extreme values caused by, for instance, corner reflectors.

      Refs. [19–21] discuss other robust techniques for inference with small samples under contamination.

      The visible gain in using robust estimators in speckle reduction techniques is a reduced blurring of the edges and the preservation of small details in the image. This advantage usually comes at the cost of more computational resources.
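      As a minimal illustration of the trade-off (a Python sketch under our own assumptions, not the estimators of Ref. [17]): for single-look intensity data $ Z\sim{\rm E}(\mu) $ the median equals $ \mu\ln 2 $, so the rescaled sample median is a simple robust competitor of the sample mean.

```python
import numpy as np

# Robustness illustration: for Z ~ E(mu) the median is mu * ln(2), so
# median(Z) / ln(2) estimates the mean. A handful of gross outliers
# (e.g., corner-reflector-like returns) wrecks the sample mean but
# barely moves the median-based estimate.
rng = np.random.default_rng(seed=11)
z = rng.exponential(scale=1.0, size=1_000)         # clean E(1) sample
z_cont = np.concatenate([z, np.full(10, 500.0)])   # ~1% gross contamination

mean_estimate = z_cont.mean()                      # heavily biased upwards
robust_estimate = np.median(z_cont) / np.log(2)    # stays close to mu = 1
print(mean_estimate, robust_estimate)
```

      The contaminated sample mean is several times the true value, whereas the median-based estimate remains close to it.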

    • Several relevant problems in signal and image processing and analysis can be solved by formulating them in the form of the following hypothesis test:

      Consider the $ N\geq2 $ samples $ { z}_1,{ z}_2,\dots,{ z}_N $ of the form $ { z}_j = z_{1,j}, z_{2,j},\dots , z_{n_j,j} $ for every $ 1\leq j\leq N $. Is there enough evidence to reject the null hypothesis that they were produced by the same model $ {\cal D}({{ \theta}}) $?

      This problem assumes that the $ N $ samples might be of different sizes $ n_1,n_2, \dots , n_N $, and that the model $ {\cal D}({{ \theta}}) $ is a probability distribution indexed by the unknown $ q $-dimensional parameter $ {{ \theta}}\in{ \varTheta}\subset {\mathbb R}^q $. In some cases, there might be no known model, and it will have to be learned from the data and the underlying hypotheses of the problem.

      In most of this review, with the exception of Section 4.3, we will be handling two samples $ { z}_1,{ z}_2 $ of sizes $ n_1 $ and $ n_2 $ of the form $ { z}_1 = z_{1,1}, z_{2,1}, \dots , z_{n_1,1} $ and $ { z}_2 = z_{1,2},z_{2,2}, \dots , z_{n_2,2} $.

      In the following, we review some of the main approaches in the context of SAR image analysis which have proven effective at handling this problem.

    • The likelihood ratio (LR) statistic is of great importance in inference on parametric models. It is based on comparing two (reduced) likelihoods through a ratio of two maxima: the numerator is the likelihood under the alternative hypothesis $ {\cal H}_1: { \theta}\in{{ \varTheta}}_1 $, and the denominator is the likelihood under the null hypothesis $ {\cal H}_0: { \theta}\in{{ \varTheta}}_0 $. These hypotheses must form a partition of the parametric space: $ {{ \varTheta}}_0\cap{{ \varTheta}}_1 = \varnothing $ and $ {{ \varTheta}}_0\cup{{ \varTheta}}_1 = \varTheta $. Under the null hypothesis, the denominator of the test statistic should be large compared to the numerator, yielding a small value.

      As discussed in Ref. [22], such a statistic has, under $ {\cal H}_0 $, an asymptotic distribution $ \chi^2_q $, where $ q $ is the difference between the dimensions of the parameter spaces under the alternative and the null hypotheses. It is, thus, possible to obtain $ p $-values.
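      A concrete instance (a Python sketch, with the single-look Exponential model chosen by us for simplicity): test whether two samples share the same mean by comparing the likelihoods maximized separately against the likelihood maximized under a pooled rate.

```python
import numpy as np
from scipy import stats

# Likelihood-ratio test that two single-look (Exponential) samples share
# the same mean. Twice the log of the ratio of maximized likelihoods is
# asymptotically chi-square with q = 1 degree of freedom under H0.
rng = np.random.default_rng(seed=1)
z1 = rng.exponential(scale=1.0, size=400)
z2 = rng.exponential(scale=1.0, size=300)   # same law, so H0 holds

def loglik(z, rate):
    """Log-likelihood of an Exponential sample with the given rate."""
    return z.size * np.log(rate) - rate * z.sum()

rate1, rate2 = 1 / z1.mean(), 1 / z2.mean()            # H1: free rates
rate0 = (z1.size + z2.size) / (z1.sum() + z2.sum())    # H0: pooled rate
lr = 2 * (loglik(z1, rate1) + loglik(z2, rate2)
          - loglik(z1, rate0) - loglik(z2, rate0))
p_value = stats.chi2.sf(lr, df=1)
print(lr, p_value)
```

      Since the free maximization can never produce a smaller likelihood than the pooled one, the statistic is nonnegative by construction.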

    • This section is based on the work by Nascimento et al.[23]. Contrast analysis requires quantifying how distinguishable two samples are. Within a statistical framework, this amounts to providing a test statistic for the null hypothesis that the samples were produced by the same distribution $ {\cal D}({ \theta}) $, where $ { \theta}\in{ \varTheta}\subset{\mathbb R}^p $ and $ { \varTheta} $ is the parameter space.

      Information-theoretic tools, collectively known as divergence measures, offer entropy-based methods to statistically discriminate between stochastic distributions[24]. Divergence measures were given a systematic and comprehensive treatment in Refs. [25–27] and, as a result, the class of $ (h,\phi) $-divergences was proposed[27].

      Let $ X $ and $ Y $ be random variables defined over the same probability space, equipped with densities $ f_X(x;{{{{\theta}}_{\bf{1}}}}) $ and $ f_Y(x;{{{{\theta}}_{\bf{2}}}}) $, respectively, where $ {{{{\theta}}_{\bf{1}}}} $ and $ {{{{\theta}}_{\bf{2}}}} $ are parameter vectors. Assuming that both densities share a common support $ I\subset\mathbb R $, the $ (h,\phi) $-divergence between $ f_X $ and $ f_Y $ is defined by

      $ D_{\phi}^h(X,Y) = h\left(\int_{I} \phi\left( \frac{f_X(x;{{{{\theta}}_{\bf{1}}}})}{f_Y(x;{{{{\theta}}_{\bf{2}}}})}\right) f_Y(x;{{{{\theta}}_{\bf{2}}}})\mathrm{d}x\right) $

      where $ \phi\!:\!(0,\infty)\!\rightarrow\![0,\infty) $ is a convex function, $ h\!\!:\!\!(0,\infty)\!\!\rightarrow\!\![0,\infty) $ is a strictly increasing function with $ h(0) = 0 $, and indeterminate forms are assigned value zero.

      By a judicious choice of functions $ h $ and $ \phi $, some well-known divergence measures arise. Tab. 1 shows a selection of functions $ h $ and $ \phi $ that lead to well-known distance measures. Specifically, the following measures were examined: (i) the Kullback-Leibler divergence[28], (ii) the relative Rényi (also known as Chernoff) divergence of order $ \beta $[29,30], (iii) the Hellinger distance[31], (iv) the Bhattacharyya distance, (v) the relative Jensen-Shannon divergence[32], (vi) the relative arithmetic-geometric divergence, (vii) the triangular distance, and (viii) the harmonic-mean distance.

      | $(h,\phi)$-divergence | $h(y)$ | $\phi(x)$ |
      | --- | --- | --- |
      | Kullback-Leibler | $y$ | $x\ln(x)$ |
      | Rényi (order $\beta$) | $\dfrac{1}{\beta-1}\ln\left((\beta-1)y+1\right),\;0\leq y < \dfrac{1}{1-\beta}$ | $\dfrac{x^{\beta}-\beta(x-1)-1}{\beta-1},\;0 < \beta<1$ |
      | Hellinger | $y/2,\;0\leq y<2$ | $(\sqrt{x}-1)^2$ |
      | Bhattacharyya | $-\ln(1-y),\;0\leq y < 1$ | $-\sqrt{x}+\dfrac{x+1}{2}$ |
      | Jensen-Shannon | $y$ | $x\ln\left(\dfrac{2x}{x+1}\right)$ |
      | Arithmetic-geometric | $y$ | $\left(\dfrac{x+1}{2}\right)\ln \dfrac{x+1}{2x}$ |
      | Triangular | $y,\;0\leq y <2$ | $\dfrac{(x-1)^2}{x+1}$ |
      | Harmonic-mean | $-\ln\left(-\dfrac{y}{2}+1\right),\;0\leq y < 2$ | $\dfrac{(x-1)^2}{x+1}$ |

      Table 1.  ($h,\phi$)-divergences and related functions $\phi$ and $h$

      Divergence measures are often not rigorously metrics[33], since the triangle inequality does not necessarily hold; even so, they are mathematically suitable tools for comparing distributions. Additionally, some of the divergence measures lack the symmetry property. Although there are numerous methods to address the symmetry problem[34], a simple solution is to define a new measure $ d_{\phi}^h $ given by

      $ d_{\phi}^h(X,Y) = \frac{D_{\phi}^h(X,Y)+D_{\phi}^h(Y,X)}{2} $

      regardless of whether $ D_{\phi}^h(\cdot,\cdot) $ is symmetric or not. Henceforth, the symmetrized versions of the divergence measures are termed “distances”. By applying the functions of Tab. 1 in Eq. (9) and symmetrizing the resulting divergences, integral formulas for the distance measures are obtained. For simplicity, in the list below we suppress the explicit dependence on $ x $ and the support $ I $ in the notation.

      (i) The Kullback-Leibler distance:

      $ {d_{{\rm{KL}}}}(X,Y) = \frac{1}{2}\int {({f_X} - {f_Y})} \ln \frac{{{f_X}}}{{{f_Y}}} $

      (ii) The Rényi distance of order $ \beta $:

      $ d_{\rm{R}}^\beta (X,Y) = \frac{1}{{\beta - 1}}\ln \frac{{\displaystyle\int {f_X^\beta } f_Y^{1 - \beta } + \int {f_X^{1 - \beta }} f_Y^\beta }}{2} $

      (iii) The Hellinger distance:

      $ {d_{\rm{H}}}(X,Y) = 1 - \int {\sqrt {{f_X}{f_Y}} } = 1 - \exp \left\{ - \frac{1}{2}d_{\rm{R}}^{1/2}(X,Y)\right\} $

      (iv) The Bhattacharyya distance:

      $ {d_{\rm{B}}}(X,Y) = - \ln \int {\sqrt {{f_X}{f_Y}} } = - \ln \left( {1 - {d_{\rm{H}}}(X,Y)} \right) $

      (v) The Jensen-Shannon distance:

      $ \begin{split} {d_{{\rm{JS}}}}(X,Y) =\,& \frac{1}{2}\left[ \int {{f_X}} \ln \frac{{2{f_X}}}{{{f_Y} + {f_X}}} \right.\\ & \left.+ \int {{f_Y}} \ln \frac{{2{f_Y}}}{{{f_Y} + {f_X}}} \right] \end{split} $

      (vi) The arithmetic-geometric distance:

      $ {d_{{\rm{AG}}}}(X,Y) = \frac{1}{2}\int {({f_X} + {f_Y})} \ln \frac{{{f_Y} + {f_X}}}{{2\sqrt {{f_Y}{f_X}} }} $

      (vii) The triangular distance:

      $ {d_{\rm{T}}}(X,Y) = \int {\frac{{{{({f_X} - {f_Y})}^2}}}{{{f_X} + {f_Y}}}} $

      (viii) The harmonic-mean distance:

      $ {d_{{\rm{HM}}}}(X,Y) = - \ln \int {\frac{{2{f_X}{f_Y}}}{{{f_X} + {f_Y}}}} = - \ln \left(1 - \frac{{{d_{\rm{T}}}(X,Y)}}{2}\right) $
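      The relationships among these expressions can be verified numerically. The sketch below (Python with SciPy; the densities are our choice) computes the Bhattacharyya coefficient $ \int\sqrt{f_X f_Y} $ between two unitary-mean Gamma densities of Eq. (19) with different numbers of looks, and checks the identity $ d_{\rm B} = -\ln(1-d_{\rm H}) $ linking items (iii) and (iv).

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

# Hellinger and Bhattacharyya distances between two unitary-mean Gamma
# densities (3 and 8 looks). Both derive from the Bhattacharyya
# coefficient bc = int sqrt(f_X * f_Y), so d_B = -ln(1 - d_H) must hold.
fX = stats.gamma(a=3, scale=1 / 3).pdf    # Gamma with mu = 1, L = 3
fY = stats.gamma(a=8, scale=1 / 8).pdf    # Gamma with mu = 1, L = 8

bc, _ = quad(lambda x: np.sqrt(fX(x) * fY(x)), 0, np.inf)
d_H = 1 - bc
d_B = -np.log(bc)
print(d_H, d_B)
```

      Any of the other integral formulas above can be evaluated in the same way with a one-dimensional quadrature.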

      We conclude this section by stating one of the fundamental results that support all the forthcoming techniques.

      The distances mentioned above are, by themselves, neither comparable nor semantically rich. Refs. [35–37] are pioneering works that connect any $ h $-$ \phi $ distance to a test statistic.

      Consider the model $ {\cal D}({ \theta}) $, with $ { \theta}\in{ \varTheta}\subset{\mathbb R}^M $ the unknown parameter that indexes the distribution. Assume the availability of two samples of iid observations $ { Z}_1 = Z_1,Z_2,\dots,Z_{n_1} $ from $ {\cal D}({ \theta}_1) $, and $ { Z}_2 = Z_{n_1+1},Z_{n_1+2},\dots,Z_{n_1+n_2} $ from $ {\cal D}({ \theta}_2) $, $ { \theta}_1,{ \theta}_2\in { \varTheta} $. Compute the maximum likelihood estimators of $ { \theta}_1 $ and $ { \theta}_2 $, $ {\widehat{{ \theta}}}_1({ Z}_1) $ and $ {\widehat{{ \theta}}}_2({ Z}_2) $. We are interested in verifying if there is enough evidence in ${\widehat { \theta}}_1,{\widehat{ \theta}}_2 $ to reject the null hypothesis $ H_0:{ \theta}_1 = { \theta}_2 $.

      Under the regularity conditions discussed in Ref. [27, p. 380], the following lemma holds:

      Lemma 1: If $ {n_1}/(n_1+n_2) $ converges to a constant in $ (0,1) $ when $ n_1,n_2\rightarrow\infty $, and $ {{\theta}}_1 = {{\theta}}_2 $, then

      $ S_{\phi}^h({\widehat{{\theta}}}_1,{\widehat{{\theta}}}_2) = \frac{2 n_1 n_2}{n_1+n_2}\frac{d^h_{\phi}({\widehat{{\theta}}}_1,{\widehat{{\theta}}}_2)}{ h{'}(0) \phi{''}(1)}{ \mathop \to \limits^{\cal D}}\chi_{M}^2 $

      where “$ { \mathop \to \limits^{\cal D}} $” denotes convergence in distribution and $ \chi_{M}^2 $ represents the chi-square distribution with $ M $ degrees of freedom.

      Based on Lemma 1, we obtain a test for the null hypothesis $ {{\theta}}_1 = {{\theta}}_2 $ in the form of the following proposition.

      Proposition 1: Let $ n_1 $ and $ n_2 $ be large and $ S_{\phi}^h({\widehat{{\theta}}}_1,{\widehat{{\theta}}}_2) = s $, then the null hypothesis $ {{\theta}}_1 = {{\theta}}_2 $ can be rejected at level $ \alpha $ if $ \Pr( \chi^2_{M}>s)\leq \alpha $.

      This result holds for every $ h $-$ \phi $ divergence, hence its generality. The (asymptotic) $ p $-value of the $ S_{\phi}^h $ statistic can be used as a measure of the evidence in favor of $ H_0 $, so it has rich semantic information. As the convergence is towards the same distribution, these values are comparable.
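      For instance (a Python sketch with the Exponential model, our choice): for the Kullback-Leibler distance with $ h(y) = y $ and $ \phi(x) = x\ln x $, one has $ h'(0) = \phi''(1) = 1 $, and for Exponential laws with rates $ \lambda_1,\lambda_2 $ the symmetrized divergence reduces to $ (\lambda_1/\lambda_2+\lambda_2/\lambda_1-2)/2 $, so the statistic of Lemma 1 and its $ p $-value are immediate.

```python
import numpy as np
from scipy import stats

# Test statistic of Lemma 1 with the Kullback-Leibler distance under an
# Exponential model (M = 1): here h'(0) = phi''(1) = 1 and
# d_KL(lam1, lam2) = (lam1/lam2 + lam2/lam1 - 2) / 2 in closed form.
rng = np.random.default_rng(seed=7)
z1 = rng.exponential(scale=1.0, size=500)
z2 = rng.exponential(scale=1.0, size=800)   # same law, H0 holds

lam1, lam2 = 1 / z1.mean(), 1 / z2.mean()   # ML estimates of the rates
d_kl = (lam1 / lam2 + lam2 / lam1 - 2) / 2
n1, n2 = z1.size, z2.size
S = 2 * n1 * n2 / (n1 + n2) * d_kl
p_value = stats.chi2.sf(S, df=1)            # chi-square with M = 1
print(S, p_value)
```

      Large $ p $-values support $ H_0 $; small ones provide evidence that the two samples come from different laws.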

    • The test statistic presented in Section 4.2 compares a pair of maximum likelihood estimators $ {\widehat{{\theta}}}_1,{\widehat{{\theta}}}_2 $. Many applications, as, for instance, change detection in time series, require using more evidence. In such cases, it is possible to compare entropies.

      For such situations, one may rely on tests based on entropies, as described in the following results, which are based on Ref. [38].

      Let $ f(z;{{\theta}}) $ be a probability density function with parameter vector $ {{\theta}} $. The $ h $-$ \phi $ entropy relative to $ {{Z}} $ is defined by

      $ H_{\phi}^h({{\theta}}) = h\Big(\int\phi(f(z;{{\theta}}))\mathrm{d}z\Big) $

      where either $ \phi:[0,\infty) \rightarrow \mathbb{R} $ is concave and $ h:\mathbb{R} \rightarrow \mathbb{R} $ is increasing, or $ \phi $ is convex and $ h $ is decreasing. Tab. 2 shows the specification of five of these entropies: Shannon, Rényi, Arimoto, Sharma-Mittal, and restricted Tsallis.

      | $(h,\phi)$-entropy | $h(y)$ | $\phi(x)$ |
      | --- | --- | --- |
      | Shannon[35] | $y$ | $-x\ln x$ |
      | Restricted Tsallis (order $\beta \in \mathbb{R}_{+}\,:\,\beta\neq 1$)[39] | $y$ | $\dfrac{x^\beta-x}{1-\beta}$ |
      | Rényi (order $\beta \in \mathbb{R}_+\,:\,\beta\neq 1$)[29] | $\dfrac{\ln y}{1-\beta}$ | $x^\beta$ |
      | Arimoto of order $\beta$ | $\dfrac{\beta-1}{y^\beta-1}$ | $x^{1/\beta}$ |
      | Sharma-Mittal of order $\beta$ | $\dfrac{\exp\{(\beta-1)y\}}{\beta-1}$ | $x\ln x$ |

      Table 2.  $h$-$\phi$ entropies and related functions

      The following result, derived by Pardo et al.[37], paves the way for the proposal of asymptotic statistical inference methods based on entropy.

      Lemma 2: Let $ {\widehat{{\theta}}} $ be the ML estimate of the parameter vector $ {{\theta}} $ of dimension $ M $ based on a random sample of size $ n $ under the model $ f(z;{{\theta}}) $. Then

      $ \sqrt{n} \big[H_h^\phi({\widehat{{\theta}}})-H_h^\phi({{\theta}})\big] {\xrightarrow[n\rightarrow \infty]{\cal D}} {\cal N}(0,\sigma_H^2({{\theta}})) $

      where $ {\cal N}(\mu,\sigma^2) $ is the Gaussian distribution with mean $ \mu $ and variance $ \sigma^2 $,

      $ \sigma_H^2({{\theta}}) = {{\delta}}^{\rm T} {\cal K}({{\theta}})^{-1}{{\delta}} $

      $ {\cal K}({{\theta}}) = {\rm E}\{-\partial^2 \ln f(z;{{\theta}})/\partial {{\theta}}^2\} $ is the Fisher information matrix, and $ {{\delta}} = [\delta_1\;\delta_2\;\dots\;\delta_M]^{\rm T} $ such that $ \delta_i = \partial H_h^\phi({{\theta}})/\partial \theta_i $ for $ i = 1,2, \dots,M $.

      We are interested in testing the following hypotheses:

      $ \left. \begin{aligned} &{\cal H}_0 : H_h^\phi({{\theta}}_1) = H_h^\phi({{\theta}}_2) =\dots = H_h^\phi({{\theta}}_N) = v\\ &{\cal H}_1 : H_h^\phi({{\theta}}_i) \neq H_h^\phi({{\theta}}_j)\;{ {\rm{for}}\; {\rm{some}} } \; i \;{\rm{ and }} \;j \end{aligned} \right\} $

      In other words, we seek statistical evidence to assess whether at least one of the $ N $ samples has a different entropy from the remaining ones.

      Let $ {\widehat{{{\theta}}_i}} $ be the ML estimate for $ {{\theta}}_i $ based on a random sample of size $ n_i $ under $ {{Z}}_i $, for $ i = 1,2, \dots ,N $. From Lemma 2 we have that

      $ \frac{\sqrt{n_i}\big(H_h^\phi({\widehat{{{\theta}}_i}})-v\big)}{\sigma_H({\widehat{{{\theta}}_i}})} {\xrightarrow[n_i\rightarrow \infty]{\cal D}} {\cal N}(0,1) $

      for $ i = 1,2, \dots ,N $. Therefore,

      $ \sum\limits_{i = 1}^N\frac{n_i\big(H_h^\phi({\widehat{{{\theta}}_i}})-v\big)^2}{\sigma_H^2({\widehat{{{\theta}}_i}})} {\xrightarrow[n_i\rightarrow \infty]{\cal D}}\chi^2_N $

      Since $ v $ is, in practice, unknown, we modify this test statistic to take this into account. We obtain:

      $ \begin{split} \sum\limits_{i = 1}^N {\frac{{{n_i}{{(H_h^\phi (\widehat {{{{\theta }}_i}}) - v)}^2}}}{{\sigma _H^2(\widehat {{{{\theta }}_i}})}}} = \,& \sum\limits_{i = 1}^N {\frac{{{n_i}{{(H_h^\phi (\widehat {{{{\theta }}_i}}) - \bar v)}^2}}}{{\sigma _H^2(\widehat {{{{\theta }}_i}})}}} \\ &+ \sum\limits_{i = 1}^N {\frac{{{n_i}{{(\bar v - v)}^2}}}{{\sigma _H^2(\widehat {{{{\theta }}_i}})}}} \end{split} $

      where
      $ {\overline{v}} = \bigg[\sum\limits_{i = 1}^N\frac{n_i}{\sigma_H^2({\widehat{{{\theta}}_i}})}\bigg]^{-1}\sum\limits_{i = 1}^N \frac{n_i H_h^\phi({\widehat{{{\theta}}_i}})}{\sigma_H^2({\widehat{{{\theta}}_i}})} $

      Salicrú et al.[35] showed that the second summation in the right-hand side of Eq. (14) is chi-square distributed with one degree of freedom. Since the left-hand side of Eq. (14) is chi-square distributed with $ N $ degrees of freedom (cf. Eq. (13)), we conclude that:

      $ \sum\limits_{i = 1}^N\frac{n_i\big(H_h^\phi({\widehat{{{\theta}}_i}})-{\overline{v}}\big)^2}{\sigma_H^2({\widehat{{{\theta}}_i}})} {\xrightarrow[n_i\rightarrow \infty]{\cal D}} \chi^2_{N-1} $

      In particular, consider the following test statistic:

      $ S_{\phi}^h \left({\widehat{{{\theta}}_1}},{\widehat{{{\theta}}_2}},\ldots,\widehat{{{\theta}}_N}\right) = \sum\limits_{i = 1}^N\frac{n_i\big(H_h^\phi({\widehat{{{\theta}}_i}})-{\overline{v}}\big)^2}{\sigma_H^2({\widehat{{{\theta}}_i}})} $

      We are now in the position to state the following result.

      Proposition 2: Let each sample size $ n_i $, $i = 1, $$ 2,\dots,N $, be sufficiently large. If $ S_{\phi}^h({\widehat{{{\theta}}_1}},{\widehat{{{\theta}}_2}}, \dots ,\widehat{{{\theta}}_N}) =$$ s $, then the null hypothesis $ {\cal H}_0 $ can be rejected at a level $ \alpha $ if $ \Pr\bigl( \chi^2_{N-1}>s\bigr)\leq \alpha $.

      Again, this Proposition transforms a mere distance into a quantity with concrete meaning: a $ p $-value.
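      A worked instance of Proposition 2 (Python, with the Shannon entropy and an Exponential model chosen by us): for $ Z\sim{\rm E}(\lambda) $ the Shannon entropy is $ H = 1-\ln\lambda $, the Fisher information is $ 1/\lambda^2 $, and $ \partial H/\partial\lambda = -1/\lambda $, so $ \sigma^2_H(\lambda) = 1 $ for every $ \lambda $ and the statistic simplifies neatly.

```python
import numpy as np
from scipy import stats

# Entropy-based test of Proposition 2 for N = 3 Exponential samples.
# With the Shannon entropy H(lam) = 1 - ln(lam), sigma_H^2 = 1, so the
# statistic is a weighted sum of squared deviations from the pooled
# entropy, asymptotically chi-square with N - 1 degrees of freedom.
rng = np.random.default_rng(seed=3)
sizes = np.array([400, 600, 500])
samples = [rng.exponential(scale=1.0, size=n) for n in sizes]  # H0 holds

lam_hat = np.array([1 / z.mean() for z in samples])  # ML rate estimates
H_hat = 1 - np.log(lam_hat)                          # plug-in entropies
v_bar = np.sum(sizes * H_hat) / sizes.sum()          # pooled entropy (sigma_H = 1)
S = np.sum(sizes * (H_hat - v_bar) ** 2)
p_value = stats.chi2.sf(S, df=sizes.size - 1)
print(S, p_value)
```

      For models where $ \sigma_H^2 $ depends on the parameter, the weights $ n_i/\sigma_H^2({\widehat{{{\theta}}_i}}) $ must be computed explicitly.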

    • Assume we are characterizing the data by the distribution $ {\cal D}({{ \theta}}) $, so each population is completely described by a particular point $ {{ \theta}} = (\theta_1,\theta_2,\dots,\theta_q)\in{ \varTheta}\subset {\mathbb R}^q $. The information matrix of the model has entries

      $ [g_{ij}({{ \theta}})]_{1\leq i,j\leq q} = \left[ -{\rm E} \left(\frac{\partial^2 \ln f(Z;{{ \theta}})}{\partial {\theta}_i \partial{\theta}_j} \right) \right]_{1\leq i,j\leq q} $

      According to Ref. [40], it can be used to measure the variance of the relative difference between two close models as

      $ {\rm{d}} s^2 = \sum\limits_{i,j = 1}^{q} g_{ij}({{ \theta}}) {\rm{d}}{\theta}_i {\rm{d}}{\theta}_j $

      This quadratic differential form is the basis for computing distances between models with

      $ s({{ \theta}}_1,{{ \theta}}_2) = \left| \int_{t_1}^{t_2} \sqrt{\sum\limits_{i,j = 1}^{q} g_{ij}({{ \theta}}(t)) \frac{{\rm{d}}{\theta}_i(t)}{{\rm{d}} t} \frac{{\rm{d}}{\theta}_j(t)}{{\rm{d}} t}} \,{\rm{d}} t \right| $

      where $ {{\theta}}(t) $ is a curve joining $ {{ \theta}}_1 $ and $ {{ \theta}}_2 $ such that $ {{\theta}}(t_1) = {{ \theta}}_1 $ and $ {{\theta}}(t_2) = {{ \theta}}_2 $. Among all possible curves, we are interested in the shortest one; this leads to the geodesic distance between the models. Finding such a curve is, in general, very difficult, in particular for distributions with more than one parameter ($ q>1 $).

      An alternative approach consists in considering distribution families with a scalar parameter $ { \theta}\in{ \varTheta}\subset{\mathbb R} $, i.e., with $ q = 1 $. With this restriction in mind, we can define a generalized form of distance based on $ h $-$ \phi $ entropies as

      $ s^h_\phi(\theta_1,\theta_2) = \left| \int_{\theta_1}^{\theta_2} \sqrt{g (\theta)} {\rm{d}}\theta \right| $

      where
      $ \begin{split} g(\theta) =\,& h''\left( \int \phi(f(z;\theta))\,{\rm{d}} z \right) \\ & \cdot\left[ \int \phi'(f(z;\theta)) \frac{\partial f(z;\theta)}{\partial\theta}\, {\rm{d}} z \right]^2 \\ & +h'\left( \int \phi(f(z;\theta))\,{\rm{d}} z \right) \\ & \cdot \int \phi''(f(z;\theta)) \left[\frac{\partial f(z;\theta)}{\partial\theta} \right]^2 {\rm{d}} z \end{split} $

      In analogy with the results presented for entropies and divergences, Menéndez et al.[41] proved the following asymptotic result: consider, for simplicity, $ h(y) = y $ and $ \phi $ as in previous sections. The test statistic

      $ S^h_\phi({\widehat{\theta}}_1,{\widehat{\theta}}_2) = \frac{n_1 n_2}{n_1+n_2} \frac{s^h_\phi({\widehat{\theta}}_1,{\widehat{\theta}}_2)}{\phi'(0)} $

      has, under the null hypothesis $ \theta_1 = \theta_2 $ and for sufficiently large $ n_1 $ and $ n_2 $, $ \chi^2_1 $ asymptotic distribution.

      The classical geodesic distance is obtained when $ h $ and $ \phi $ are those leading to the Shannon entropy; cf. Tab. 2. In this case, $ g(\theta) $ becomes $ g_{11}(\theta) $, the only element of Eq. (16).

      This alternative approach is more tractable, but requires a further step, namely joining the $ q $ test statistics into a single measure of dissimilarity. After Naranjo-Torres et al.[42] showed the ability of the Shannon geodesic distance at finding edges in intensity SAR data, Frery and Gambini[43] compared three fusion techniques.
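      The scalar case is often tractable in closed form. As an illustration (Python, with the Exponential family chosen by us): for $ Z\sim{\rm E}(\lambda) $ and the Shannon choice of $ h $ and $ \phi $, $ g(\lambda) $ is the Fisher information $ 1/\lambda^2 $, and the geodesic distance becomes $ |\ln(\lambda_2/\lambda_1)| $; the numerical integral agrees with the closed form.

```python
import numpy as np
from scipy.integrate import quad

# Shannon geodesic distance for the scalar Exponential family E(lam):
# g(lam) is the Fisher information 1/lam^2, and the integral of
# sqrt(g(lam)) between lam1 and lam2 equals |ln(lam2 / lam1)|.
lam1, lam2 = 0.5, 2.0
s_numeric, _ = quad(lambda lam: np.sqrt(1.0 / lam**2), lam1, lam2)
s_closed = abs(np.log(lam2 / lam1))
print(s_numeric, s_closed)
```

      The logarithmic form shows that this distance depends only on the ratio of the two parameters, a natural property for scale families.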

    • The most famous models for intensity SAR data are the Gamma, $ {\cal K} $, and $ {\cal{G}}^0 $ distributions. Ref. [44] presents a survey of models for SAR data. The first is adequate for fully developed speckle, while the other two can describe texture, i.e., variations of the backscatter between pixels, through an additional parameter. Their probability density functions are, respectively:

      $ f(z;\mu,L) = \frac{L^L}{\Gamma(L)\mu^L} z^{L-1} \exp\{-L z / \mu\} \hspace{17pt} $

      $ \begin{split} f(z;\alpha,\lambda,L) =\,& \frac{2L\lambda}{\Gamma(\alpha)\Gamma(L)} (L \lambda z)^{\frac{\alpha + L}{2}-1}\hspace{26pt}\\ & \cdot K_{\alpha-L} (2 \sqrt{L\lambda z}) \end{split} $

      $ f(z;\alpha,\gamma,L) = \frac{L^L \Gamma(L-\alpha)}{\gamma^\alpha \Gamma(L)\Gamma(-\alpha)} \frac{z^{L-1}}{(\gamma+L z)^{L-\alpha}} $

      where, in Eq. (19) $ \mu>0 $ is the mean; in Eq. (20) $ \lambda,\alpha>0 $ are the scale and the texture, and $ K_\nu $ is a modified Bessel function of order $ \nu $; and in Eq. (21) $ \gamma>0 $ is the scale and $ \alpha<0 $ is the texture. In all these expressions, $ L\geq1 $ is the number of looks.

      The $ {\cal{G}}^0 $ law is flexible (it can describe observations with a wide range of texture, and it can approximate both the Gamma and $ {\cal K} $ distributions) and tractable (its density does not involve Bessel functions, while the $ {\cal K} $ does). Parameter estimation under the $ {\cal{G}}^0 $ model may be difficult, however.

      Maximum likelihood estimators require iterative methods that may not converge, and estimators by analogy may not have feasible solutions. Moreover, contamination is a significant issue, mostly when dealing with small samples and/or in the presence of strong backscatter as the one produced by a double bounce.

      The models given by Eq. (19) and Eq. (21) are fundamental to this work. The next subsections present properties that will be useful in the sequel.

    • Denote by $ Z\sim\Gamma(\mu,L) $ a random variable that follows the Gamma distribution characterized by density Eq. (19). The $ k $-th order moment of $ Z $ is

      $ {\rm E}\left(Z^k\right) = \Big(\frac{\mu}{L}\Big)^k \frac{\Gamma(L+k)}{\Gamma(L)} $

      The Gamma model given by Eq. (19) includes the Exponential distribution when $ L = 1 $, i.e., in the single-look case. Fig. 2 shows the densities that characterize this distribution with varying means. The linear scale (Fig. 2(a)) shows the well-known exponential decay, while in the semilogarithmic scale (Fig. 2(b)) they appear as straight lines with a negative slope.

      Figure 2.  Exponential densities with mean 1/2, 1, and 2 (red, black and blue, resp.) in linear and semilogarithmic scales

      Fig. 3 shows the effect of multilooking the data on their distribution: it shows three densities with the same mean ($ \mu = 1 $) and $ L = 1,3,8 $. The larger $ L $ becomes, the smaller the probability of extreme events, as can be seen in the semilogarithmic scale (Fig. 3(b)). Also, as $ L $ increases, the densities become more symmetric; cf. Fig. 3(a).

      Figure 3.  Unitary mean Gamma densities with 1, 3, and 8 looks (black, red, and blue, resp.) in linear and semilogarithmic scales

      Assume one has a sample of $ n $ iid $ \Gamma(\mu,L) $ deviates $ { Z} = (Z_1,Z_2,\dots,Z_n) $. Inference about $ (\mu,L) $ can be made by the analogy method using the moments given in Eq. (22). For instance, choosing to work with the first moment and the variance, one has:

      $ \hat \mu = \frac{1}{n}\sum\limits_{i = 1}^n {{Z_i}} \hspace{30pt} $

      $ {\widehat L} = \frac{\left(\dfrac{1}{n}\displaystyle\sum_{i = 1}^{n}Z_i\right)^2}{\dfrac{1}{n}\displaystyle\sum_{i = 1}^{n}(Z_i-{\widehat\mu} )^2} $

      i.e., the sample mean and the square of the reciprocal sample coefficient of variation.

      Usually, one selects samples from areas where the Gamma model provides a good fit and estimates the number of looks which, then, is employed for the whole image.

      As an alternative to any analogy method, one may rely on maximum likelihood estimation. From Eq. (19), one has that the reduced log-likelihood of the sample $ { Z} $ is

      $ \begin{split} \ell(\mu,L; { Z}) & = L \ln L - \ln \Gamma(L) - L \ln \mu \\ &\quad+\frac{L}n \sum_{i = 1}^{n} \ln Z_i - \frac{L}{\mu n} \sum_{i = 1}^{n} Z_i \end{split} $

      The maximum likelihood estimator of $ (\mu,L) $ is the maximum of Eq. (25). This maximum may be found by optimization of Eq. (25), or by solving the system of equations given by $ \nabla \ell = { 0} $.

      This second approach to obtaining the maximum likelihood estimator of $ (\mu,L) $ leads to Eq. (23) and to the solution of

      $ \ln {\widehat L} + 1 - \psi({\widehat L}) - \ln {\widehat\mu} + \frac{1}{n} \sum\limits_{i = 1}^{n} \ln Z_i - \frac{1}{{\widehat\mu} n} \sum\limits_{i = 1}^{n} Z_i = 0 $

      where $ \psi $ is the digamma function.

      Either way, optimization or finding the roots of Eq. (26), one relies on numerical procedures that, more often than not, require a starting point or initial solution. Moment estimators are convenient initial solutions for this.
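      The recipe above can be sketched as follows (Python; the root finder is our choice). The maximum likelihood estimator of $ L $ solves the well-known profile equation $ \ln L - \psi(L) = \ln\bar Z - \overline{\ln Z} $, obtained by substituting $ {\widehat\mu} = \bar Z $ into the likelihood equations, and the moment estimator of Eq. (24) provides the initial bracket.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import digamma

# Maximum likelihood for the Gamma model of Eq. (19). With mu_hat equal
# to the sample mean, L_hat solves the one-dimensional profile equation
#   ln(L) - digamma(L) = ln(mean(Z)) - mean(ln(Z)),
# with the moment estimator of L as a convenient starting bracket.
rng = np.random.default_rng(seed=5)
mu_true, L_true, n = 1.0, 3.0, 20_000
z = rng.gamma(shape=L_true, scale=mu_true / L_true, size=n)

mu_hat = z.mean()
rhs = np.log(mu_hat) - np.mean(np.log(z))   # > 0 by Jensen's inequality
L_mom = mu_hat**2 / np.var(z)               # moment estimate of L, Eq. (24)
L_hat = brentq(lambda L: np.log(L) - digamma(L) - rhs,
               L_mom / 10, L_mom * 10)      # bracket around the start
print(mu_hat, L_hat)
```

      Since $ \ln L - \psi(L) $ is positive and strictly decreasing, the root is unique whenever the right-hand side is positive, which Jensen’s inequality guarantees.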

    • Frery et al.[45] noticed that the $ {\cal K} $ distribution failed at describing data from extremely textured areas as, for instance, urban targets. The authors then proposed a different model for the backscatter $ X $: the Reciprocal Gamma distribution.

      We say that $ X\sim{\Gamma^{-1}}(\alpha,\gamma) $, with $ \alpha<0 $ and $ \gamma>0 $, follows a Reciprocal Gamma distribution if it is characterized by the density

      $ f_X(x;\alpha,\gamma) = \frac{\gamma^{-\alpha}}{\Gamma(-\alpha)} x^{\alpha-1} \exp\{-\gamma/x\} $

      for $ x>0 $ and zero otherwise.

      Now introducing the Reciprocal Gamma model for the backscatter in the multiplicative model, i.e. by multiplying the independent random variables $ X\sim{\Gamma^{-1}}(\alpha,\gamma) $ and $ Y\sim\Gamma(1,L) $, one obtains the $ {\cal G}^0 $ distribution for the return $ Z = XY $, which is characterized by the density Eq. (21). It is noteworthy that, differently from Eq. (20), this density does not involve Bessel functions.

      We denote $ Z\sim {\cal{G}}^0(\alpha,\gamma,L) $ the situation of $ Z $ following the distribution characterized by Eq. (21). The $ k $-order moments of $ Z $ are

      $ {\rm E}(Z^k) = \Big(\frac{\gamma}{L}\Big)^{k} \frac{\Gamma(L+k)\Gamma(-\alpha-k)}{\Gamma(L)\Gamma(-\alpha)} $

      provided $ -\alpha>k $, and infinite otherwise. Eq. (28) is useful, among other applications, for finding $ \gamma^* = -\alpha-1 $, the scale parameter that yields a unitary mean distribution for each $ \alpha $ and any $ L $. It also allows forming systems of equations for estimating the parameters by analogy.

Fig. 4 shows $ {\cal{G}}^0(\alpha,\gamma^*, 1) $ densities along with an Exponential density, also with unitary mean. The differences in tail behavior are clearly exhibited in the semilogarithmic scale; cf. Fig. 4(b). Whereas the exponential density decreases linearly, the $ {\cal{G}}^0 $ law assigns more probability to larger events, increasing, thus, the variability of the return.

      Figure 4.  Densities in linear and semi-logarithmic scale of the ${\rm E}(1) $ (black) and $ {{\cal{G}}^0} $ distributions with unitary mean and $ \alpha\in\{-1.5,-3.0,-8.0\} $ in red, green, and blue, resp

Fig. 5 shows the effect of varying the number of looks, for the same $ \alpha = -5 $ and $ \gamma = 4 $.

      Figure 5.  Densities in linear and semilogarithmic scale $ {\cal{G}}^0(-5,4,L) $ distributions with unitary mean and $ L\in\{1,3,8\} $ in red, green, and blue, resp

Notice, again, in Fig. 5(b) that multilook processing affects mostly the distribution of minimal values. Such an effect, along with the reduced probability of extremely large values, yields less contrasted images.

The $ {\cal G}^0 $ distribution has the same number of parameters as the $ {\cal K} $ law, but it has been shown to be more apt at modeling returns with extreme variability. Moreover, it is also able to describe the same kind of return from textured areas for which the latter was proposed; Mejail et al.[10] showed that, with proper choices of parameters, the $ {\cal{G}}^0 $ law is able to approximate any $ {\cal K} $ distribution with arbitrarily small error. For these reasons, the $ {\cal{G}}^0 $ distribution is called the Universal Model for SAR data.

The $ {\cal{G}}^0 $ distribution relates to the well-known Fisher-Snedecor law in the following manner:

      $ F_{{\cal{G}}^0(\alpha,\gamma,L)}(t) = \Upsilon_{2L,- 2\alpha}(- \alpha t/\gamma) $

where $ \Upsilon_{u,v} $ is the cumulative distribution function of a Fisher-Snedecor distribution with $ u $ and $ v $ degrees of freedom, and $ F_{{\cal{G}}^0(\alpha,\gamma,L)} $ is the cumulative distribution function of a $ {\cal{G}}^0(\alpha,\gamma,L) $ random variable. Notice that $ \Upsilon $ is readily available in most software platforms for statistical computing. Since such platforms usually also provide implementations of the inverse of cumulative distribution functions, the Inversion Theorem can be used to sample from the $ {\cal{G}}^0 $ law.
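The relation above yields a simple sampler: if $ W $ follows a Fisher-Snedecor law with $ (2L,-2\alpha) $ degrees of freedom, then $ -\gamma W/\alpha $ follows the $ {\cal{G}}^0(\alpha,\gamma,L) $ law. A sketch, with illustrative parameter values only:

```python
import numpy as np
from scipy.stats import f as snedecor_f

def rg0(alpha, gamma, L, size, rng=None):
    """Sample from G0(alpha, gamma, L): if W ~ F(2L, -2*alpha),
    then Z = -gamma * W / alpha follows G0(alpha, gamma, L)."""
    rng = np.random.default_rng(rng)
    u = rng.uniform(size=size)                       # Inversion Theorem
    w = snedecor_f.ppf(u, dfn=2 * L, dfd=-2 * alpha)
    return -gamma * w / alpha

# Unitary-mean example: gamma* = -alpha - 1
z = rg0(alpha=-3.0, gamma=2.0, L=1, size=10_000, rng=0)
```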

Assume that we have a sample $ { Z} = (Z_1,Z_2,\dots,Z_n) $ of iid random variables that follow the $ {\cal{G}}^0(\alpha,\gamma,{\widehat L}) $ distribution, i.e., $ L $ has been already estimated by $ {\widehat L} $. The unknown parameter $ \theta = (\alpha,\gamma) $ lies in $ { \varTheta} = {\mathbb R}_- \times {\mathbb R}_+ $.

The maximum likelihood estimator of $ (\alpha,\gamma) $ is any point that maximizes the reduced log-likelihood:

$ \begin{split} \ell(\alpha,\gamma;{\widehat L}, { Z}) =\,& \ln \frac{\Gamma({\widehat L}-\alpha)}{\gamma^\alpha \Gamma(-\alpha)} + \frac{{\widehat L}}{n} \sum_{i = 1}^n \ln\frac{Z_i}{\gamma+{\widehat L} Z_i} \\ & +\frac{\alpha}{n} \sum_{i = 1}^n \ln (\gamma + {\widehat L} Z_i) \end{split} $

      provided it lies in $ { \varTheta }$. Maximizing Eq. (30) might be a difficult task, in particular in textureless areas where $ \alpha\to-\infty $ and the $ {\cal{G}}^0 $ distribution becomes very close to a Gamma law.
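A minimal sketch of maximizing Eq. (30) numerically, assuming $ {\widehat L} $ known; the starting point and parameter values below are arbitrary illustrative choices, and in practice moment estimates would provide better initial solutions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

def neg_loglik_g0(theta, z, L):
    """Negative reduced log-likelihood of G0(alpha, gamma, L), cf. Eq. (30);
    terms that are constant in (alpha, gamma) are dropped."""
    alpha, gamma = theta
    if alpha > -0.01 or gamma <= 0:      # stay inside the parameter space
        return np.inf
    return -(gammaln(L - alpha) - alpha * np.log(gamma) - gammaln(-alpha)
             + L * np.mean(np.log(z))
             - (L - alpha) * np.mean(np.log(gamma + L * z)))

def fit_g0(z, L, start=(-2.0, 1.0)):
    res = minimize(neg_loglik_g0, x0=start, args=(np.asarray(z, float), L),
                   method="Nelder-Mead")
    return res.x

# Simulated G0(-3, 2, 1) data: product of Reciprocal Gamma and Gamma variates
rng = np.random.default_rng(1)
alpha_true, gamma_true, L = -3.0, 2.0, 1
x = gamma_true / rng.gamma(-alpha_true, 1.0, 5000)   # Reciprocal Gamma backscatter
y = rng.gamma(L, 1.0 / L, 5000)                      # unitary-mean speckle
z = x * y
alpha_hat, gamma_hat = fit_g0(z, L)
```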

      The $ {\cal{G}}^0 $ distribution offers several sampling approaches, and Chan et al.[46] compared them for a variety of situations and platforms regarding accuracy and precision.

• The three models for intensity data presented in Eqs. (19)-(21) have one parameter in common: $ L $, the number of looks. Such a feature is absent in empirical models[44]. This parameter describes the number of independent samples used to form the observation $ Z $ at each pixel, and it is a direct measure of the signal-to-noise ratio and, thus, of the visual quality of the image.

Although the nominal number of looks is usually part of the image information, it is often larger than the value the data support. It is, thus, important to estimate this quantity in order to make a fair statistical description of the observations.

Fig. 6 shows a $ 400\times1000 $ pixels intensity image obtained by the ESAR[47] sensor in L-band and VV polarization. The area consists of agricultural parcels, a river (the dark curvilinear feature to the right), and a bright urban area to the lower right. The data have been equalized for visualization purposes, and we also show a regular grid comprised of squares of $ 25\times25 $ pixels that aids obtaining disjoint samples.

      Figure 6.  Equalized intensity data with grid

Assume we have collected $ N $ samples $ { z}_1,{ z}_2,\dots,{ z}_N $ of the form $ { z}_j = z_{1,j},z_{2,j}, \dots, z_{n_j,j} $ for every $ 1\leq j\leq N $ from areas which do not exhibit any evidence of not belonging to the Gamma distribution (at least, they do not show texture). These samples might be of different sizes $ n_1,n_2,\dots,n_N $. Among the many possible approaches for estimating the number of looks, we will consider two of the simplest and most widely used in the literature. In both cases, we will assume that a $ \Gamma(\mu_j,L) $ model holds for all the samples, i.e., that the mean may vary but not the number of looks.

      In general, it is preferred to take several small disjoint samples with some spatial separation. This alleviates the problem of using correlated rather than independent observations.

      The forthcoming results were obtained with $ N = 12 $ samples of equal size $ n = 625 $ each.

(1) Weighted mean of estimates: Use each sample in Eqs. (23) and (24) to obtain the $ N $ estimates $ {\widehat L}_1,{\widehat L}_2, \dots ,{\widehat L}_N $, then compute the final estimate by the weighted average

      $ {\widehat L}_{\rm{wme}} = \frac{\displaystyle\sum_{j = 1}^{N} n_j {\widehat L}_j }{\displaystyle\sum_{j = 1}^{N} n_j} $

      With the data collected over the image shown in Fig. 6 we obtained $ {\widehat L}_{\rm{wme}} = 0.884 $.

      (2) Regression: From Eq. (22) we have that $ {\rm Var}(Z) = \mu^2/L $, then the coefficient of variation $ {\rm CV} $ of $ Z $ is $ {\rm CV}(Z) = \sqrt{{\rm Var}(Z)}/\mu = L^{-1/2} $. With this, we may form the equation $ \mu = \sqrt{L {\rm Var}(Z)} $.

      We now compute the sample mean $ \bar{\mu}_j $ and variance $ s^2_j $ of each sample $ { z}_j $, and form the following regression model data:

      $ \begin{align} & \bar{\mu}_1 = \sqrt{L} s_1 + \varepsilon_1,\\ & \bar{\mu}_2 = \sqrt{L} s_2 + \varepsilon_2,\\ & \qquad\quad \vdots\\ & \bar{\mu}_N = \sqrt{L} s_N + \varepsilon_N \end{align} $

      Notice that, although equivalent, we preferred to denote the sample mean as $ \bar{\mu}_j $ instead of $ \widehat{\mu}_j $ to elucidate the difference between simple mean and parameter estimation. Assuming further that the errors $ \varepsilon_1, \varepsilon_2, \dots ,\varepsilon_N $ are iid zero mean Gaussian random variables with unknown variance, we are now in position to apply a simple regression model without intercept to estimate $ L $.

Denote the dependent variable $ \bar{{\mu}} = \left( \bar\mu_1,\bar\mu_2,\dots,\bar\mu_N \right)^{\rm{T}} $, the independent variable $ {{s}} = \left( s_1,s_2,\dots,s_N \right)^{\rm{T}} $, and the errors $ {{\varepsilon}} = \left( \varepsilon_1,\varepsilon_2,\dots,\varepsilon_N \right)^{\rm{T}} $, where the superscript “$ \rm{T} $” denotes the transpose. The model is $ \bar{ \mu} = L^{1/2} { s} + { \varepsilon} $, and the estimate of $ L $ by regression is then given by

      $ {\widehat{L}}_{\rm{r}} = \left(\frac{ ({{s}})^{\rm{T}} \bar{{{\mu}}} }{ ({{s}})^{\rm{T}} {{s}} }\right)^2 $

Fig. 7 shows the points $ (\bar{\mu}_i, s_i) $, the adjusted regression line, and its confidence band at the 95% level. With this approach we obtained $ {\widehat{L}}_{\rm{r}} = 0.756 $.
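Both estimation strategies can be sketched on simulated parcels; the number of looks, sample sizes, and parcel means below are hypothetical choices that mimic the setting above.

```python
import numpy as np

rng = np.random.default_rng(2)
L_true, N, n = 3.0, 12, 625
mus = rng.uniform(0.5, 5.0, size=N)              # hypothetical parcel means
samples = [rng.gamma(L_true, m / L_true, size=n) for m in mus]

mu_bar = np.array([s.mean() for s in samples])   # per-sample means
s_dev = np.array([s.std(ddof=1) for s in samples])
n_j = np.full(N, n)

# (1) Weighted mean of the per-sample moment estimates L_j
L_j = mu_bar**2 / s_dev**2
L_wme = (n_j * L_j).sum() / n_j.sum()

# (2) Regression without intercept: mu_bar_j = sqrt(L) * s_j + eps_j
L_r = (s_dev @ mu_bar / (s_dev @ s_dev)) ** 2
```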

      Figure 7.  Regression analysis for the estimation of the equivalent number of looks

      Since the equivalent number of looks is assumed, in many cases, constant over the whole image, it is of paramount importance to make a careful estimation.

      In the two following sections, the notion of “contrast” is the difference of statistical properties between two samples.

• Besides providing estimators with excellent properties, the likelihood function can also be used to find edges. The approach proposed in Refs. [48-50] consists in forming a likelihood that models a strip of data, and finding the point where it is maximized. This point will, then, be the estimator of the edge position.

      Fig. 8 illustrates the rationale behind this proposal. It shows strips of $ 10\times500 $ pixels with samples from two $ {\cal{G}}^0 $ distributions: ${\cal{G}}^0(-1.5, $$ 0.5, 3.0) $ in the left half, and $ {\cal{G}}^0(\alpha_2, -\alpha_2-1, 3) $ in the right half, with $ \alpha_2\in\{-20,-10,-5,-3\} $. The two halves seem distinguishable after image equalization in the first three cases, but the reader must notice that they have the same unitary mean. With this, techniques that rely on the classical notion of contrast are doomed to failure.

      Figure 8.  Strips of 10 × 500 pixels with samples from two $ {\cal{G}}^0 $ distributions

      Instead of relying on the mean, the Gambini Algorithm aims at finding the point where the total log-likelihood of the samples is maximum. It is sketched in Algorithm 1.

      Algorithm 1: The Gambini Algorithm for edge detection using maximum likelihood

      Data: A strip of data $ { z} = (z_1,z_2,\dots,z_N) $ with possibly an edge; an estimate of the number of looks $ {\widehat L} $

      Result: An estimate of the edge position

      Obtain a strip of data $ { z} $

      for $ j $ in each position do

        form the left sample $ { z}_{\rm{L}}=(z_1,z_2,\dots,z_j) $;

        form the right sample $ { z}_{\rm{R}}=(z_{j+1},z_{j+2},\dots,z_N) $;

        estimate $ ({\widehat\alpha},{\widehat\gamma})_{\rm{L}}({ z}_{\rm{L}}) $;

  estimate $ ({\widehat\alpha},{\widehat\gamma})_{\rm{R}}({ z}_{\rm{R}}) $;

        Compute the total log-likelihood using Eq. (30):

        ${\rm{tll}}(j) = \ell(({\widehat\alpha},{\widehat\gamma})_{\rm{L}};{\widehat L}, { z}_{\rm{L}}) + \ell(({\widehat\alpha},{\widehat\gamma})_{\rm{R}};{\widehat L}, { z}_{\rm{R}}) $


      Find the point $ j^\star $ which maximizes the total log-likelihood.
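The steps above can be sketched as follows, simplified to the Gamma model with known $ L $ (instead of the $ {\cal{G}}^0 $ law), so that the per-segment maximum likelihood estimate is just the sample mean; the strip and the margin that guarantees minimal sample sizes are illustrative choices.

```python
import numpy as np
from scipy.stats import gamma as gamma_dist

def gamma_ll(z, L):
    """Gamma(mu, L) log-likelihood evaluated at the ML estimate mu_hat = mean(z)."""
    return gamma_dist.logpdf(z, a=L, scale=z.mean() / L).sum()

def gambini_edge(z, L, margin=50):
    """Algorithm 1: maximize the total log-likelihood over candidate splits."""
    js = range(margin, len(z) - margin)
    tll = [gamma_ll(z[:j], L) + gamma_ll(z[j:], L) for j in js]
    return margin + int(np.argmax(tll))

rng = np.random.default_rng(3)
L = 3
strip = np.concatenate([rng.gamma(L, 1.0 / L, size=250),   # mean 1
                        rng.gamma(L, 3.0 / L, size=250)])  # mean 3
j_star = gambini_edge(strip, L)   # true edge at position 250
```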

      Fig. 9 shows three total log-likelihoods. The bigger the difference between the parameters is, the more pronounced the maximum is. Notice that all the maxima are close to the correct edge position, which is located at $ j = 250 $. The two pronounced local minima are due to the lack of convergence of the optimization algorithm.

      Figure 9.  Illustration of edge detection by maximum likelihood

This approach and the likelihood ratio test are closely related: the maximized total likelihood is the denominator of the latter's test statistic.

      It is noteworthy that the Gambini Algorithm is flexible, as it allows using any measure in place of the total log-likelihood function.

• The previous approach to edge detection maximizes the total likelihood of the two strips. In this sense, the total log-likelihood is the function to be optimized.

      Another approach consists in maximizing a distance between the models to the right and to the left of the moving point. The rationale is the same as before: the models will be maximally different when the samples are not mixed, i.e., at the point which separates them into two different parts.

      Algorithm 2 sketches an implementation of this approach.

      Algorithm 2: The Gambini Algorithm for edge detection using a distance between models

      Data: A strip of data $ { z} = (z_1,z_2,\dots,z_N) $ with possibly an edge; an estimate of the number of looks $ {\widehat L} $

      Result: An estimate of the edge position

      Obtain a strip of data $ { z} $;

      for $ j $ in each position do

        form the left sample $ { z}_{\rm{L}}=(z_1,z_2,\dots,z_j) $;

        form the right sample $ { z}_{\rm{R}}=(z_{j+1},z_{j+2},\dots,z_N) $;

        estimate $ ({\widehat\alpha},{\widehat\gamma})_{\rm{L}}({ z}_{\rm{L}}) $;

  estimate $ ({\widehat\alpha},{\widehat\gamma})_{\rm{R}}({ z}_{\rm{R}}) $;

        Compute a distance between the models:

        ${s(j) = s(f(({\widehat\alpha},{\widehat\gamma})_{\rm{L}};{\widehat L}) , f(({\widehat\alpha},{\widehat\gamma})_{\rm{R}};{\widehat L})) }$


      Find the point $ j^\star $ which maximizes the distance between the models.

      The following points are noteworthy in Algorithm 2:

      • The samples $ { z}_{\rm{L}} $ and $ { z}_{\rm{R}} $ only enter once, when computing the estimates $ ({\widehat\alpha},{\widehat\gamma})_{\rm{L}} $ and $ ({\widehat\alpha},{\widehat\gamma})_{\rm{R}} $, respectively. This leads to faster algorithms than those based on the total log-likelihood, which use the sample twice.

• If the distance $ s $ belongs either to the class of $ h $-$ \phi $ divergences or to the geodesic distances, it can be turned into a test statistic. With this, there is an important additional piece of information: the answer to the question, “Is there enough evidence to consider $ j^\star $ an edge?”

      • Again, any model is suitable for this approach.

      Naranjo-Torres et al.[42] used the Shannon geodesic distance, obtaining remarkable results. They also used this distance to measure the separability of classes.
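A sketch of Algorithm 2 under simplifying assumptions: the Gamma model with known $ L $ in place of the $ {\cal{G}}^0 $ law, and the Hellinger distance, which has a closed form between two Gamma laws with the same shape; strip and margin are illustrative choices.

```python
import numpy as np

def hellinger_gamma(mu1, mu2, L):
    """Closed-form Hellinger distance between Gamma(mu1, L) and Gamma(mu2, L)."""
    l1, l2 = L / mu1, L / mu2                        # rate parameters
    bc = (l1 * l2) ** (L / 2) / ((l1 + l2) / 2) ** L  # Bhattacharyya coefficient
    return np.sqrt(max(0.0, 1.0 - bc))

def edge_by_distance(z, L, margin=50):
    """Algorithm 2: maximize the distance between left/right fitted models."""
    js = range(margin, len(z) - margin)
    d = [hellinger_gamma(z[:j].mean(), z[j:].mean(), L) for j in js]
    return margin + int(np.argmax(d))

rng = np.random.default_rng(4)
L = 3
strip = np.concatenate([rng.gamma(L, 1.0 / L, size=250),   # mean 1
                        rng.gamma(L, 3.0 / L, size=250)])  # mean 3
j_star = edge_by_distance(strip, L)   # true edge at position 250
```

Notice that, as remarked above, each sample enters only once, through its fitted model.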

    • In this section, “contrast” is the lack of adequacy of the model with respect to the observed data.

Nascimento et al.[23] computed several of the distances and statistical tests presented in Sec. 4.2. With the null hypothesis being that two samples come from the same $ {\cal{G}}^0 $ law, they used these statistics as contrast measures.

      The $ h $-$ \phi $ family of statistical tests is not the only way to measure the difference between samples. Still under the $ {\cal{G}}^0 $ model, Naranjo-Torres et al.[42] computed the Shannon Geodesic distance and, using results from Ref. [41] derived another class of tests useful for edge detection. Frery and Gambini advanced these results[43], proposing tests based on the composition of marginal evidence. These tests proved their usefulness in the detection of edges, even in situations of very low contrast.

      Noting that the histogram is a stable density estimator, Gambini et al.[51] proposed an estimation technique which consists in minimizing a distance between the $ {\cal{G}}^0 $ density and a smoothed version of the histogram. The authors analyzed several smoothing techniques and distances obtained from the symmetrization of $ h $-$ \phi $ divergences. The proposal consists of using kernel smoothing, and the triangular distance.

      Fig. 10 illustrates the procedure. The density that characterizes the model $ Z\sim{\cal{G}}^0(-2,1,3) $ is shown in light blue, along with the histogram of a sample of size $ 500 $ from $ Z $. The two black densities are the starting point (dashed line), and an intermediate step. The final estimate is the density in red.

      Figure 10.  Illustration of parameter estimation by distance minimization

      Notice that this procedure may be used with any model.
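The procedure can be sketched, under simplifying assumptions (the Gamma model with known shape instead of the $ {\cal{G}}^0 $ law, and a Gaussian kernel smoother), as the minimization of the triangular distance between the candidate density and the smoothed histogram:

```python
import numpy as np
from scipy.stats import gaussian_kde, gamma as gamma_dist
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(5)
L, mu_true = 3.0, 1.0
z = rng.gamma(L, mu_true / L, size=500)

kde = gaussian_kde(z)                   # smoothed version of the histogram
x = np.linspace(1e-3, z.max(), 512)     # evaluation grid
g = kde(x)
dx = x[1] - x[0]

def triangular(mu):
    """Triangular distance between the Gamma(mu, L) density and the KDE."""
    f = gamma_dist.pdf(x, a=L, scale=mu / L)
    return np.sum((f - g) ** 2 / (f + g + 1e-12)) * dx

mu_hat = minimize_scalar(triangular, bounds=(0.1, 10.0), method="bounded").x
```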

    • In this section, the “contrast” is inversely proportional to the similarity between two samples.

The Nonlocal Means approach is a new way of handling noise reduction[52,53]. The idea is forming a large convolution mask that, differently from the classical location-invariant approach, is computed for every pixel to be filtered. The weight at position $ (k,\ell) $ is proportional to a measure of similarity between the observation $ z_{k,\ell} $ and $ z_{i,j} $, the pixel to be filtered. Such similarity is usually computed using the information in surrounding estimation windows (patches) $ \partial_{k,\ell} $ and $ \partial_{i,j} $ around the pixels. The procedure typically spans a (search) window much larger than the masks employed by classical local filters.

      Fig. 11 illustrates this idea with a search window of size $ 9\times9 $ around the central pixel $ z_{i,j} $, and patches of size $ 3\times3 $.

      Figure 11.  Illustration of the Nonlocal Means approach

The central weight of the convolution mask, $ w_{i,j} $, is set to 1. The weight $ w_{k,\ell} $ will be a measure of the similarity between the values around $ z_{k,\ell} $ and $ z_{i,j} $, highlighted with dark lines. Such similarity may be computed directly between the observations, i.e., $ w_{k,\ell} = f((z_{i-m,j-n},z_{k-m,\ell-n}); -1\leq m,n\leq 1) $, or between descriptors of each patch: $ w_{k,\ell} = g( f(z_{i-m,j-n}; -1\leq m,n\leq 1), f(z_{k-m,\ell-n}; -1\leq m,n\leq 1) ) $.
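A minimal sketch of the classical Nonlocal Means weights with Euclidean patch distances (the model-based weights discussed in the sequel replace this similarity measure); the search and patch sizes and the decay parameter `h` are illustrative choices.

```python
import numpy as np

def nlm(img, search=4, patch=1, h=2.0):
    """Minimal Nonlocal Means sketch: each output pixel is a weighted mean
    over the search window, with weights decaying in the patch distance."""
    pad = search + patch
    p = np.pad(img, pad, mode="reflect")
    out = np.empty_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            ci, cj = i + pad, j + pad
            ref = p[ci - patch:ci + patch + 1, cj - patch:cj + patch + 1]
            w_sum = acc = 0.0
            for di in range(-search, search + 1):
                for dj in range(-search, search + 1):
                    qi, qj = ci + di, cj + dj
                    cand = p[qi - patch:qi + patch + 1, qj - patch:qj + patch + 1]
                    w = np.exp(-np.mean((ref - cand) ** 2) / h ** 2)
                    w_sum += w
                    acc += w * p[qi, qj]
            out[i, j] = acc / w_sum
    return out

# Single-look speckle over a constant region: filtering reduces variability
rng = np.random.default_rng(6)
noisy = rng.gamma(1.0, 1.0, size=(16, 16))
smooth = nlm(noisy)
```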

      Teuber and Lang[54] studied the properties of weights designed for the multiplicative model and how they behave in the presence of additive noise. Penna and Mascarenhas[55] included a Haar decomposition of the weights.

Ferraioli et al.[56] took a different approach. Instead of measuring the distance between the patches' samples, they assumed a Square-Root-of-Gamma distribution for the data. If $ Z $ follows a $ \Gamma $ distribution as in Eq. (19), then $ Z^{1/2} $ is Square-Root-of-Gamma distributed. Furthermore, the ratio of two independent such random variables follows an Amplitude $ {\cal{G}}^0 $ distribution. Using this property, the authors checked how plausible the hypothesis is that the two samples belong to the same distribution, and used this information to compute the weights.

      The Nonlocal Means approach is flexible enough to allow using any conceivable model for the data, and many ways of comparing the samples. In Section 8.1, we comment on applications of this technique for PolSAR data.

    • Polarimetric data are more challenging than their intensity counterparts. A fully polarimetric observation in a single frequency is a $ 3\times3 $ complex Hermitian positive definite matrix. The basic model for such observation is the Wishart distribution, whose probability density function is

      $ f({ z}; \varSigma, L) = \frac{L^{3L} |{ z}|^{L-3}}{|\varSigma|^L \Gamma_3(L)} \exp\{-L {\rm Tr}(\varSigma^{-1} { z})\} $

      where $ L\geq3 $, $ \Gamma_3(L) = \pi^3 \displaystyle\prod_{i = 0}^{2} \Gamma(L-i) $, and $ {\rm Tr} $ is the trace operator. We denote this situation $ { Z}\sim\mathcal W(\Sigma, L) $. Refs. [57,58] provide details and properties of this distribution.

      The Wishart model is the polarimetric version of the Gamma distribution. Similarly to what was presented in Section 5, there are Polarimetric $ {\cal K} $ and $ {\cal{G}}^0 $ distributions. The reader is referred to Ref. [59] for a survey of models for Polarimetric SAR data.

      Consider the sample $ { Z} = { Z}_1,{ Z}_2,\dots, { Z}_n $ from the $ \mathcal W(\varSigma,L) $ distribution. Its reduced log-likelihood is given by

$ \begin{split} \ell(\varSigma, L; { Z}) =\, & 3 L \ln L + \frac{L}{n} \sum\limits_{i = 1}^n \ln |{ Z}_i | - L \ln |\varSigma| \\ & - \ln \Gamma_3(L) - \frac{L}{n} \sum\limits_{i = 1}^n {\rm Tr}(\varSigma^{-1} Z_i) \end{split} $

The maximum likelihood estimator of $ (\varSigma,L) $, based on the sample $ { Z}_1,{ Z}_2,\dots, { Z}_n $ under the Wishart model, can be obtained either by maximizing Eq. (32), or by solving $ \nabla\ell = { 0} $, which amounts to computing:

$ {\widehat \varSigma} = \frac{1}{n} \sum\limits_{i = 1}^{n} { Z}_i $

and the solution of

      $ 3 \ln {\widehat L} + \frac{1}{n} \sum\limits_{i = 1}^{n} \ln |{ Z}_i| - \ln |{\widehat \varSigma}| - \psi^{(0)}_3 ({\widehat L}) = 0 $

where $ \psi^{(0)}_3 $ is the zero-order case of the $ \nu $-th order multivariate polygamma function $ \psi^{(\nu)}_3(L) = \displaystyle\sum\nolimits_{i = 0}^2 \psi^{(\nu)} (L-i) $, and $ \psi^{(\nu)} $ is the ordinary polygamma function:

      $ \psi^{(\nu)}(L) = \frac{\partial^{\nu+1} \ln\Gamma(L)}{\partial L^{\nu+1}} $

      Numerical software as, for instance, Ref. [60] provides these special functions.
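A sketch of the estimator of Eq. (33) on simulated data; complex Wishart observations, under the convention $ {\rm E}({ Z}) = \varSigma $, are generated as scaled sums of outer products of circular complex Gaussian vectors (the $ \varSigma $ below is hypothetical).

```python
import numpy as np

rng = np.random.default_rng(7)
p, L, n = 3, 8, 500

# A Hermitian positive definite Sigma (hypothetical values)
A = rng.normal(size=(p, p)) + 1j * rng.normal(size=(p, p))
Sigma = A @ A.conj().T + p * np.eye(p)
C = np.linalg.cholesky(Sigma)

def rwishart():
    """One relaxed complex Wishart observation: (1/L) * sum of s s^H
    over L vectors s ~ CN(0, Sigma), so that E(Z) = Sigma."""
    s = C @ (rng.normal(size=(p, L)) + 1j * rng.normal(size=(p, L))) / np.sqrt(2)
    return s @ s.conj().T / L

Z = np.array([rwishart() for _ in range(n)])
Sigma_hat = Z.mean(axis=0)    # Eq. (33): the ML estimator under known L
```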

      Section 6.1 presented two simple alternatives for the estimation of $ L $ using intensity data under the Gamma model. The same estimation is more delicate when dealing with polarimetric data. The reader is referred to the work by Anfinsen et al.[61] for a comprehensive discussion of this topic.

      Ref. [24] is one of the pioneering works dealing with measures of contrast between Wishart matrices. Frery et al.[62] derived explicit expressions of several tests based on $ h $-$ \phi $ divergences between these models, and applied them to image classification.

      Analogously to $ h $-$ \phi $ divergences, it is possible to define $ h $-$ \phi $ entropies. These entropies can also be turned into test statistics with known asymptotic distribution[63]. Frery et al.[38] derived several of these entropies under the Wishart model and studied their finite-size sample behavior. Nascimento et al.[64], using these results, proposed change detection techniques for PolSAR imagery.

      It is noteworthy that both $ h $-$ \phi $ divergences and entropies, under the Wishart model, solely rely on two operations over the covariance matrix $ \varSigma $: Inversion $ \varSigma^{-1} $, and the determinants $ |\varSigma| $ and $ |\varSigma^{-1}| $. Coelho et al.[65] obtained accelerated algorithms for computing them by exploiting their property of being Hermitian positive definite.

The Wishart distribution characterized by Eq. (31) can be posed in a more general way, namely

      $ f({ z}; \varSigma, L) = \frac{L^{pL} |{ z}|^{L-p}}{|\varSigma|^L \Gamma_p(L)} \exp\{-L {\rm Tr}(\varSigma^{-1} { z})\} $

      where $ p $ is the number of polarizations. With this, the dual- and quad-pol cases are contemplated by setting $ p = 2 $ and $ p = 4 $, respectively.

    • We already described the Nonlocal Means approach for speckle reduction in Section 6.5.

      Torres et al.[66] used test statistics for computing the weights for polarimetric data filtering. The proposed approach consists of:

      (1) Assuming the Wishart model for the whole image.

      (2) Computing the maximum likelihood estimator in each patch.

      (3) Computing a stochastic distance between the Wishart models indexed by these estimates; the authors employed the Hellinger distance.

(4) Turning the test statistic into a $ p $-value and then applying a soft function. This yields the weights that, after being scaled to add up to 1, are applied as a convolution mask.

This technique produced good results concerning noise reduction, and the preservation of details and radiometry.

      Later, Deledalle et al.[67] extended this idea to other types of SAR data. Chen et al.[68] used a likelihood ratio test, while Zhong et al.[69] employed a Bayesian approach.

As previously said, Nonlocal Means filtering constitutes a very general framework for noise reduction. In particular, patches can be made adaptive to obtain further preservation of details[70].

    • This section describes an application where “contrast” is a measure of how different two samples are.

      Silva et al.[71] proposed the following approach for PolSAR image classification:

      (1) Obtain training samples from the image using all available information. Call these samples prototypes, and with them, estimate the parameters of Wishart distributions.

      (2) Apply transformations to the original image in order to make it a suitable input for a readily available segmentation software.

      (3) Tune the software parameters to obtain a segmentation with an excessive number of segments.

      (4) Estimate the parameters of a Wishart distribution with each of these segments.

      (5) Assign each segment to the prototype with minimum distance.

      The advantage of this approach over other techniques is that steps (2) and (3) above can be applied with any available software, regardless of their appropriateness to PolSAR data.

      Negri et al.[16] used stochastic distances to enhance the performance of region-based classification algorithms based on kernel functions. They also advanced the knowledge about the influence of specifying an erroneous (smaller) number of classes on the overall procedure.

      Gomez et al.[72,73] also used such distances for PolSAR image classification using diffusion-reaction systems and machine learning.

    • Sections 6.2 and 6.3 described two techniques for edge detection based on the maximization of, respectively, the total log-likelihood and a distance between the distributions of the data. This idea was extended to the more difficult case of finding edges between regions of polarimetric data.

      Nascimento et al.[74] proposed edge detectors using test statistics based on stochastic distances and compared the results of several distances with those produced by the total likelihood.

      Edge detectors based on stochastic distances provide the same or better results than the one based on the Wishart likelihood, and at a lower computational cost.

      This work showed that each function to be maximized (negative total log-likelihood or any of the distances) provides results which, if fused, may lead to an improved edge detector. This is the avenue that Ref. [75] starts to explore.

    • The “contrast” here is a measure of how different the original data and the transformed sample are.

With the aid of stochastic distances between Wishart models, Bhattacharya et al.[76] proposed a modification of the Yamaguchi four-component decomposition. This technique alleviates problems related to urban areas aligned with the sensor line of sight and was incorporated into the freely available PolSARpro v. 6 educational software distributed by ESA, the European Space Agency.

    • “Contrast” here is proportional to the change experienced between samples along time.

Consider the situation of having two samples: $ { Z}^{(1)} = { Z}_1,{ Z}_2,\dots, { Z}_m $ and $ { Z}^{(2)} = { Z}_{m+1},{ Z}_{m+2},\dots,{ Z}_{m+n} $ from the Wishart distribution, and the null hypothesis that they come from the same law versus the alternative that there are two different underlying Wishart laws, one for each sample. Elucidating this situation can be posed as a hypothesis test: $ H_0: { Z}_1,{ Z}_2,\dots,{ Z}_{m}, { Z}_{m+1}, { Z}_{m+2},\dots, { Z}_{m+n} $ come from the $ \mathcal W( \varSigma, L) $ law, and $ H_1:{ Z}_1, { Z}_2, \dots ,{ Z}_{m} $ come from $ \mathcal W(\varSigma_1, L_1) $ while $ { Z}_{m+1},{ Z}_{m+2}, \dots, { Z}_{m+n} $ come from $ \mathcal W(\varSigma_2, L_2) $. The maximum likelihood estimators defined in Eqs. (33) and (34) for these two situations are, thus, $ ({\widehat{\varSigma}}^{(0)}, {\widehat{L}}^{(0)})({ Z}_1,{ Z}_2, \dots,{ Z}_{m+n}) $ under $ H_0 $, and $ ({\widehat{\varSigma}}^{(1)},{\widehat{L}}^{(1)}) ({ Z}_1,{ Z}_2,\dots, { Z}_{m}) $ and $ ({\widehat{\varSigma}}^{(2)},{\widehat{L}}^{(2)})({ Z}_{m+1},{ Z}_{m+2}, \dots ,{ Z}_{m+n}) $ under $ H_1 $.

      The log-likelihoods ratio statistic defined in Section 4.1 becomes, under the Wishart model using Eq. (32):

$ \begin{split} &3\left[ {\widehat L}^{(1)} \ln {\widehat L}^{(1)} + {\widehat L}^{(2)} \ln {\widehat L}^{(2)} - {\widehat L}^{(0)} \ln {\widehat L}^{(0)} \right] \\ &\quad + \frac{{\widehat L}^{(1)}}{m} \sum\limits_{i=1}^{m} \ln | Z_i| + \frac{{\widehat L}^{(2)}}{n} \sum\limits_{i=m+1}^{m+n} \ln | Z_i| \\ &\quad - \frac{{\widehat L}^{(0)}}{m+n}\sum\limits_{i=1}^{m+n} \ln | Z_i| - {\widehat L}^{(1)} \ln \big|{\widehat \varSigma}^{(1)}\big| - {\widehat L}^{(2)} \ln \big|{\widehat \varSigma}^{(2)}\big| \\ &\quad+ {\widehat L}^{(0)} \ln \big|{\widehat \varSigma}^{(0)}\big| - \ln \Gamma_3\left({\widehat L}^{(1)}\right) - \ln \Gamma_3\left({\widehat L}^{(2)}\right) \\ &\quad + \ln \Gamma_3({\widehat L}^{(0)}) - \frac{{\widehat L}^{(1)}}{m} \sum\limits_{i=1}^{m} {\rm Tr}\left([{\widehat \varSigma}^{(1)}]^{-1} Z_i\right) \\ &\quad - \frac{{\widehat L}^{(2)}}{n} \sum\limits_{i=m+1}^{m+n} {\rm Tr}\left([{\widehat \varSigma}^{(2)}]^{-1} Z_i \right) \\ &\quad+ \frac{{\widehat L}^{(0)}}{m+n} \sum\limits_{i=1}^{m+n} {\rm Tr}\left([{\widehat \varSigma}^{(0)}]^{-1} Z_i \right) \end{split} $

      Conradsen et al.[77] and Nielsen et al.[78] based their change detection techniques on this test.

      Notice that, differently from all tests based on stochastic distances, Eq. (36) allows its generalization to contrasting $ H_0 $ against an alternative that includes several samples $ { Z}^{(1)},{ Z}^{(2)},{ Z}^{(3)},{ Z}^{(4)},\dots $ in which at least one comes from a different model than the rest. Such extension amounts to adding more terms to Eq. (36) depending on $ ({\widehat{\varSigma}}^{(3)},{\widehat{L}}^{(3)} ), $$ ({\widehat{\varSigma}}^{(4)},{\widehat{L}}^{(4)}),\dots\,$ .
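The test can be sketched numerically without expanding the full expression: with $ L $ known and common to both hypotheses (a simplification; the statistic above also estimates $ L $), one evaluates Eq. (32)-style total log-likelihoods at the corresponding maximum likelihood estimates of $ \varSigma $. The covariance below is hypothetical.

```python
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(8)
p, L = 3, 4

def rwishart(Sigma, size):
    """Relaxed complex Wishart samples with E(Z) = Sigma (known L)."""
    C = np.linalg.cholesky(Sigma)
    s = C @ (rng.normal(size=(size, p, L))
             + 1j * rng.normal(size=(size, p, L))) / np.sqrt(2)
    return s @ np.conj(np.swapaxes(s, 1, 2)) / L

def wishart_ll(Z, L):
    """Total reduced log-likelihood, cf. Eq. (32), at Sigma_hat = mean(Z)."""
    n = len(Z)
    S = Z.mean(axis=0)
    logdet_S = np.linalg.slogdet(S)[1]
    logdet_Z = np.linalg.slogdet(Z)[1].sum()
    tr = np.trace(np.linalg.inv(S) @ Z, axis1=-2, axis2=-1).real.sum()
    lnG3 = 3 * np.log(np.pi) + sum(gammaln(L - i) for i in range(3))
    return (n * (3 * L * np.log(L) - L * logdet_S - lnG3)
            + L * logdet_Z - L * tr)

def llr(Z1, Z2, L):
    """Log-likelihood ratio statistic for H0: a single Wishart law."""
    return wishart_ll(Z1, L) + wishart_ll(Z2, L) \
        - wishart_ll(np.concatenate([Z1, Z2]), L)

Sigma = np.diag([3.0, 2.0, 1.0]).astype(complex)
same = llr(rwishart(Sigma, 200), rwishart(Sigma, 200), L)
changed = llr(rwishart(Sigma, 200), rwishart(2 * Sigma, 200), L)
```

The statistic is close to zero when both samples share the same law, and grows with the change.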

      The recent availability of a plethora of multi-temporal data leads to considering tests that allow checking whether a change has occurred at any instant of the sequence of observations.

Nascimento et al.[38], using the results presented in Section 4.3, devised a test statistic based on entropies for such a situation. The advantage of such a test over the ratio of likelihoods is twofold: (i) it is more general, as it may comprise any $ h $-$ \phi $ entropy, and (ii) it is faster.

    • A polarimetric observation can be projected onto the surface of a $ 16 $ dimensional sphere. Ratha et al.[79] used this mapping, known as Kennaugh representation, to measure geodesic distances between observations and prototypical backscatterers. We call these measures of dissimilarity deterministic because the reference, i.e., the prototypes, is a point rather than a model. The first application targeted unsupervised classification, but they were later used to devise new built-up[80] and vegetation[81] indexes. Such distances are also useful for unsupervised classification in an arbitrary number of classes[79]. These ideas led to the proposal of a new decomposition theorem[82].

      A recent development[83] provides evidence that the distance between carefully selected samples and their closest elementary backscatterer follows a Beta distribution.

• This review presented, in a unified manner, applications of measures of contrast to problems that arise in SAR and PolSAR image processing and analysis. Those problems are: filter assessment (Section 2), edge detection (Sections 6.2, 6.3 and 8.3), parameter estimation (Section 6.4), noise reduction (Sections 6.5 and 8.1), segments classification (Section 8.2), polarimetric decomposition (Section 8.4), and change detection (Section 8.5).

      Statistical tests derived from divergences, entropies, and geodesic distances are beginning to be exploited by the Remote Sensing community. These techniques have several important properties, among them excellent performance even with small samples. Such a feature might be useful in a problem we have not yet tackled, namely, target detection.

      Among the questions which, to date, remain open, one may list the following:

      • Is there any gain in using other less known divergences?

      • How do change detection techniques based on quadratic differences of entropies other than Shannon and Rényi, e.g., Arimoto and Sharma-Mittal, behave?

      • The asymptotic properties of these tests require the use of maximum likelihood estimators. How do the tests behave when other inference techniques, e.g., robust estimators or estimators by analogy, are used?

      • How do these techniques perform when there is no explicit model, and it is inferred from the data with, for instance, kernel estimators?

      • How do these test statistics perform under several types of contamination?

      An important aspect not included in this review is spatial correlation. Such a feature is responsible for, among other characteristics, a higher-level perception of texture. We refer the reader to the work by Yue et al.[84] for a modern model-based approach to texture description and simulation in SAR imagery.

      The approach based on deterministic distances is fairly new, and it has already shown excellent performance. This kind of data representation in terms of their dissimilarity to prototypical backscatterers is flexible and extensible, as it accepts any number of reference points as input.

      We consider that one of the most significant challenges in this area is integrating these techniques with production software as, for instance, PolSARpro and SNAP.

      As a final thought, one may ask why bother with such modeling in a scenario where the Deep Learning approach is taking over. The answer is simple: the statistical approach provides more than results; if well employed, it produces answers that can be interpreted. This is a significant advantage and, moreover, the two approaches are not mutually exclusive; statistical evidence may constitute an information layer to be processed by neural networks.

    • We used Ref. [60] v. 3.1 to produce the plots and statistical analyses, on a MacBook Pro with a 3.5 GHz Dual-Core Intel Core i7 processor and 16 GB of memory, running the macOS Catalina v. 10.15.2 operating system.

      The code, data, and $\rm{\LaTeX} $ sources are available upon request to the author.

    • This work is only possible thanks to my coauthors’ talent, patience, and kindness. Listing all of them would make me incur unfair omissions, so I prefer to let the reader find in the cited papers the names of those who were instrumental in the development of this body of research.
