Robust Model-Based Clustering Using the Symmetric alpha-Stable Distribution for Measurement Error

Moradi, Mozhgan; Zarei, Shaho

doi:10.61186/jss.18.1.11

Journal of Statistical Sciences – Scientific Research Journal of the Iranian Statistical Society

This journal is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).

Volume 18, Issue 1 (8-2024)

JSS 2024, 18(1): 0-0

Mozhgan Moradi, Shaho Zarei*

Abstract:

Model-based clustering is the most widely used statistical clustering method; it divides heterogeneous data into homogeneous groups using inference based on mixture models. Measurement error in the data can degrade clustering quality, for example by causing overfitting and producing spurious clusters. To address this, model-based clustering that assumes normally distributed measurement errors has been introduced. However, outlying measurement errors (values that are too large or too small) cause existing clustering methods to perform poorly. To tackle this problem and build a model that remains stable in the presence of outlying measurement errors, this article proposes a symmetric $\alpha$-stable distribution as a replacement for the normal distribution for measurement errors; the model parameters are estimated using the EM algorithm and numerical methods. Through simulation and real data analysis, the new model is compared with the MCLUST-based model in cases with and without measurement errors, and the performance of the proposed model for clustering data in the presence of various outlying measurement errors is demonstrated.
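The contrast between normal and symmetric $\alpha$-stable measurement errors can be illustrated with a small simulation. The following is a minimal sketch, not the authors' implementation: it draws symmetric $\alpha$-stable variates with the Chambers-Mallows-Stuck method and adds them to two latent groups. The function name `sample_sas`, the group centers, the scales, and the choice $\alpha = 1.5$ are all assumptions made for illustration only.

```python
import numpy as np


def sample_sas(alpha, scale, size, rng):
    """Draw symmetric alpha-stable variates (Chambers-Mallows-Stuck method).

    Hypothetical helper for illustration; alpha = 2 gives a normal law with
    variance 2 * scale**2, and alpha = 1 gives a Cauchy law.
    """
    if not 0 < alpha <= 2:
        raise ValueError("alpha must lie in (0, 2]")
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)  # uniform angle
    w = rng.exponential(1.0, size)                # unit exponential
    if alpha == 1.0:
        x = np.tan(u)                             # Cauchy special case
    else:
        x = (np.sin(alpha * u) / np.cos(u) ** (1.0 / alpha)
             * (np.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))
    return scale * x


rng = np.random.default_rng(2024)
n = 1000

# Two latent groups (assumed centers and spread, for illustration only).
labels = rng.integers(0, 2, n)
x_true = np.array([-3.0, 3.0])[labels] + rng.normal(0.0, 0.5, n)

# Observed data = true value + measurement error under each assumption.
y_normal = x_true + rng.normal(0.0, 0.5, n)
y_sas = x_true + sample_sas(1.5, 0.5 / np.sqrt(2.0), n, rng)

# Heavy tails show up as a few extreme observations far from both groups.
print("largest |y| with normal errors:", np.abs(y_normal).max())
print("largest |y| with SaS errors:   ", np.abs(y_sas).max())
```

With $\alpha < 2$ a handful of observations typically land far from both groups. These are the outlying measurement errors that a normal error model tends to absorb as spurious clusters and that a heavy-tailed error model is intended to accommodate.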

Keywords: Model-based clustering, $\alpha$-stable distribution, Measurement error, EM algorithm

Type of Study: Applied | Subject: Applied Statistics
Received: 2024/02/25 | Accepted: 2024/08/31 | Published: 2024/06/04

Moradi M, Zarei S. Robust Model-Based Clustering Using the Symmetric alpha-Stable Distribution for Measurement Error. JSS 2024; 18 (1)
URL: http://jss.irstat.ir/article-1-888-en.html

Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
