Volume 16, Issue 1 (9-2022)
JSS 2022, 16(1): 1-24
Record Linkage with Machine Learning Methods
Zahra Rezaei Ghahroodi*, Zhina Aghamohamadi
Abstract:

With the advent of big data over the last two decades, the need to integrate databases in order to build a stronger evidence base for policy and service development is felt more than ever. Familiarity with the methodology of data linkage, as one method of data integration, and with the use of machine learning methods to facilitate the linking of records is therefore essential. In addition to introducing the record linkage process and some related methods, this paper presents the machine learning algorithms needed to increase the speed of database integration, reduce costs, and improve record linkage performance. As an application, two databases, one from the Statistical Center of Iran and one from the Social Security Organization, are linked.

Keywords: Record Linkage, Machine Learning, Fellegi-Sunter Model, Jaro and Winkler String Comparison, Official Statistics.
Type of Study: Applied | Subject: Official Statistics
Received: 2021/12/16 | Accepted: 2022/09/1 | Published: 2022/08/2
Rezaei Ghahroodi Z, Aghamohamadi Z. Record Linkage with Machine Learning Methods. JSS 2022; 16(1): 1-24
URL: http://jss.irstat.ir/article-1-789-en.html


Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Journal of Statistical Sciences (scientific research journal of the Iranian Statistical Society)
