Improved BIRCH Clustering by Chemical Reaction Optimization Algorithm to Health Fraud Detection
Subject Areas : electrical and computer engineering
M. Abdolrazzagh-Nezhad 1 * , M. Kherad 2
1 - Bozorgmehr University of Qaenat
2 -
Keywords: Chemical reaction optimization algorithm, healthcare industry, BIRCH clustering algorithm, fraud detection
Abstract :
Given the scale of its financial transactions and its breadth, the healthcare industry is an attractive target for fraud. Accurately identifying fraudulent data therefore remains a challenge for healthcare providers, even though several fraud detection algorithms exist. In this paper, the BIRCH clustering algorithm, a hierarchical clustering algorithm, is hybridized with the chemical reaction optimization (CRO) algorithm. BIRCH, which has linear time complexity, can cluster large-scale data and identify its noise. CRO, a recent meta-heuristic algorithm inspired by real-world chemical reactions, explores the search space with a dynamic population size using four reactions: on-wall ineffective collision, decomposition, inter-molecular ineffective collision, and synthesis. Because the improved BIRCH-CRO removes the internal clustering process of classic BIRCH and determines the optimal values of its main parameters, computational time decreases and the accuracy and precision of fraud detection increase when its experimental results are compared with those of existing unsupervised algorithms. The proposed fraud detection algorithm can also operate on online and large-scale data, and the obtained results indicate that it performs satisfactorily.
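To make the four reactions concrete, the following is a minimal Python sketch of a CRO-style search loop. It is not the authors' implementation. The `objective` function is a hypothetical stand-in: a real run would score a BIRCH clustering built from the two candidate parameter values. The parameter names (`ke_loss`, `mole_coll`, `alpha`, `beta`), the neighborhood move, and the `[0, 1]` search range are all assumptions chosen for illustration.

```python
import random

def objective(x):
    # Hypothetical stand-in cost with its minimum at (0.3, 0.7). A real run
    # would instead score a BIRCH clustering built from the two parameters.
    return (x[0] - 0.3) ** 2 + (x[1] - 0.7) ** 2

def neighbor(x, rng, step=0.1):
    # Perturb one coordinate with Gaussian noise, clipped to [0, 1].
    y = list(x)
    i = rng.randrange(len(y))
    y[i] = min(1.0, max(0.0, y[i] + rng.gauss(0.0, step)))
    return y

class Molecule:
    def __init__(self, x, ke):
        self.x, self.pe, self.ke = x, objective(x), ke
        self.min_pe = self.pe  # lowest potential energy seen by this molecule
        self.hits = 0          # collisions experienced so far
        self.min_hit = 0       # value of `hits` when min_pe last improved

    def move(self, x, pe, ke):
        self.x, self.pe, self.ke = x, pe, ke
        if pe < self.min_pe:
            self.min_pe, self.min_hit = pe, self.hits

def cro(iters=2000, pop_size=10, init_ke=1.0, ke_loss=0.2, mole_coll=0.3,
        alpha=50, beta=0.05, seed=0):
    rng = random.Random(seed)
    pop = [Molecule([rng.random(), rng.random()], init_ke) for _ in range(pop_size)]
    buffer = 0.0  # energy given up to the surroundings during on-wall collisions
    best = min(pop, key=lambda m: m.pe)
    best_x, best_pe = list(best.x), best.pe
    sizes = []
    for _ in range(iters):
        if len(pop) < 2 or rng.random() > mole_coll:
            m = rng.choice(pop)
            m.hits += 1
            if m.hits - m.min_hit > alpha:
                # Decomposition: a stagnant molecule splits into two.
                x1, x2 = neighbor(m.x, rng, 0.3), neighbor(m.x, rng, 0.3)
                surplus = m.pe + m.ke - objective(x1) - objective(x2)
                if surplus < 0:
                    draw = rng.random() * rng.random() * buffer
                    if surplus + draw >= 0:  # borrow energy from the buffer
                        buffer -= draw
                        surplus += draw
                if surplus >= 0:
                    k = rng.random()
                    pop.remove(m)
                    pop += [Molecule(x1, k * surplus), Molecule(x2, (1 - k) * surplus)]
            else:
                # On-wall ineffective collision: local move, some KE is lost.
                x_new = neighbor(m.x, rng)
                pe_new = objective(x_new)
                excess = m.pe + m.ke - pe_new
                if excess >= 0:
                    q = rng.uniform(ke_loss, 1.0)
                    buffer += excess * (1.0 - q)
                    m.move(x_new, pe_new, excess * q)
        else:
            m1, m2 = rng.sample(pop, 2)
            m1.hits += 1
            m2.hits += 1
            total = m1.pe + m1.ke + m2.pe + m2.ke
            if m1.ke <= beta and m2.ke <= beta:
                # Synthesis: two low-energy molecules merge into one.
                x_new = [rng.choice(pair) for pair in zip(m1.x, m2.x)]
                pe_new = objective(x_new)
                if total >= pe_new:
                    pop.remove(m1)
                    pop.remove(m2)
                    pop.append(Molecule(x_new, total - pe_new))
            else:
                # Inter-molecular ineffective collision: both move locally.
                x1, x2 = neighbor(m1.x, rng), neighbor(m2.x, rng)
                pe1, pe2 = objective(x1), objective(x2)
                surplus = total - pe1 - pe2
                if surplus >= 0:
                    k = rng.random()
                    m1.move(x1, pe1, k * surplus)
                    m2.move(x2, pe2, (1 - k) * surplus)
        cand = min(pop, key=lambda m: m.pe)
        if cand.pe < best_pe:
            best_x, best_pe = list(cand.x), cand.pe
        sizes.append(len(pop))
    return best_x, best_pe, sizes
```

Calling `best_x, best_pe, sizes = cro()` returns the best parameter pair found, its cost, and the population size after each iteration. Any variation in `sizes` comes from decomposition (one molecule becomes two) and synthesis (two become one), which is how the population size changes during the search.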