Volume 17, Issue 1 (9-2023)
JSS 2023, 17(1): 0-0
Improving Fellegi-Sunter model in record linkage using log-linear model and weight adjustment
Alireza Movaffaghi Ardestani, Zahra Rezaei Ghahroodi *
Abstract:

Today, with growing access to administrative databases and the large volume of data registered in organizations, traditional methods of data collection and analysis are no longer effective because of the response burden. Accordingly, the transition from traditional survey methods to modern methods of data collection and analysis based on register-based statistics has received increasing attention from statistical data analysts. In register-based methods, it is especially important to create an integrated database by linking the records held in the databases of different organizations. Many record linkage algorithms have been developed using the Fellegi-Sunter model. The Fellegi-Sunter model does not leverage the information contained in field values: it ignores which specific value a string variable takes, and so does not distinguish agreement on a more common value from agreement on a less common one. In this article, a method is presented that incorporates these differences among the possible values of a string variable into the Fellegi-Sunter model. On the other hand, both the Fellegi-Sunter model and the method for adjusting the matching weights in frequency-based record linkage used in this paper rely on the assumption of conditional independence. In some applications of record linkage, this assumption does not hold for the agreement or disagreement of the common variables used for matching. One solution in such cases is to use a log-linear model, which allows interactions between the matching variables.

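The weighting idea described above can be illustrated with a minimal Python sketch (the authors themselves work in R). It assumes the standard Fellegi-Sunter log-likelihood-ratio weights under conditional independence, and it uses the simplest frequency-based variant, in which the chance-agreement probability for a specific value is replaced by that value's relative frequency. The function names and all probabilities are hypothetical, and this is not necessarily the weight adjustment proposed in the article.

```python
import math


def field_weight(agree: bool, m: float, u: float) -> float:
    """Log2 likelihood-ratio weight for a single matching field.

    m = P(field agrees | pair is a true match)
    u = P(field agrees | pair is a non-match)
    """
    if agree:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def total_weight(pattern, m_probs, u_probs) -> float:
    """Sum the field weights; valid only under conditional independence."""
    return sum(
        field_weight(agree, m, u)
        for agree, m, u in zip(pattern, m_probs, u_probs)
    )


def frequency_adjusted_agreement_weight(m: float, value_freq: float) -> float:
    """Agreement weight for one specific value (assumed simplification).

    The general u-probability is replaced by the relative frequency of the
    agreed value, so rarer values earn larger agreement weights.
    """
    return math.log2(m / value_freq)


# Hypothetical example: two fields, agreement on the first only.
example = total_weight((True, False), [0.9, 0.8], [0.1, 0.2])
```

For instance, with m = 0.9, agreement on a value carried by 0.1% of records receives a larger weight than agreement on a value carried by 5% of records, which is the distinction between less common and more common values that the plain model ignores.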
In this article, we consider two generalizations of the Fellegi-Sunter model: one that corrects the matching weights, and another that uses a log-linear model with interactions when conditional independence does not hold. The proposed methods are applied to the labour force data set of the Statistical Centre of Iran using R.
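One way to see why interactions matter is to check conditional independence on a 2x2 table of agreement indicators for two matching variables. The Python sketch below assumes a hypothetical set of pairs whose match status is already known (in practice it is unobserved), fits the independence log-linear model, and reports the likelihood-ratio statistic G^2 together with the log odds ratio, the quantity that the interaction term of a saturated log-linear model is a function of. The function name and the counts are illustrative only and are not taken from the article.

```python
import math


def independence_fit(n11: int, n10: int, n01: int, n00: int):
    """Fit the independence log-linear model to a 2x2 agreement table.

    nij = number of pairs with agreement indicator i on field A and
    j on field B (1 = agree, 0 = disagree). All four cells are assumed
    to be positive so that the log odds ratio is defined.
    Returns (expected counts, G^2 statistic, log odds ratio).
    """
    observed = {(1, 1): n11, (1, 0): n10, (0, 1): n01, (0, 0): n00}
    total = sum(observed.values())
    row = {1: n11 + n10, 0: n01 + n00}  # margins for field A
    col = {1: n11 + n01, 0: n10 + n00}  # margins for field B

    # Under independence the expected count is (row margin * column margin) / total.
    expected = {(i, j): row[i] * col[j] / total for (i, j) in observed}

    # Likelihood-ratio (deviance) statistic, 1 degree of freedom for a 2x2 table.
    g2 = 2 * sum(o * math.log(o / expected[k]) for k, o in observed.items())

    # Zero when the two agreement indicators are independent.
    log_odds_ratio = math.log((n11 * n00) / (n10 * n01))
    return expected, g2, log_odds_ratio


# Hypothetical counts among known matches: the two fields tend to agree together.
expected, g2, log_or = independence_fit(45, 5, 5, 45)
```

A G^2 value well above the chi-square critical value for one degree of freedom (about 3.84 at the 5% level) suggests that the independence model fits poorly, which is the situation in which a log-linear model with an interaction term is worth considering.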

Keywords: Fellegi-Sunter model, Frequency-based matching, Adjusting weights, Conditional independence, Log-linear model
Type of Study: Applied | Subject: Official Statistics
Received: 2022/08/11 | Accepted: 2023/09/1 | Published: 2023/07/11
Movaffaghi Ardestani A, Rezaei Ghahroodi Z. Improving Fellegi-Sunter model in record linkage using log-linear model and weight adjustment. JSS 2023; 17 (1)
URL: http://jss.irstat.ir/article-1-813-en.html


Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Journal of Statistical Sciences: scientific research journal of the Iranian Statistical Society