A Hybrid Feature Selection Method for High-Dimensional Data Based on the Wisdom of Crowds
Subject area: Electrical and Computer Engineering
Amirreza Rouhi 1, Hossein Nezamabadi-pour 2 *
1 - Shahid Bahonar University of Kerman
2 - Shahid Bahonar University of Kerman
Keywords: feature selection, high-dimensional data, hybrid methods, metaheuristic methods, filter methods, ensemble methods
Abstract:
Nowadays, with the advent and proliferation of high-dimensional data, feature selection plays a crucial role in machine learning, and in classification tasks in particular. High-dimensional data such as microarrays suffer from problems including the presence of many irrelevant and redundant features, which reduce classification accuracy, increase computational cost, and lead to the "curse of dimensionality". This paper proposes a hybrid method that uses ensemble (wisdom-of-crowds) approaches for feature selection on high-dimensional data. In the first stage, a filter method reduces the dimensionality of the data; in the second stage, two state-of-the-art wrapper algorithms are applied to the reduced feature set using an ensemble approach, and their results are aggregated. The proposed method is evaluated on 8 microarray datasets, and comparison with several well-known, state-of-the-art feature selection methods confirms its effectiveness.
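The two-stage filter-then-wrapper-ensemble pipeline described in the abstract can be sketched as follows. This is only an illustrative sketch, not the authors' method: the correlation-based filter, the bit-flip hill-climbing "wrapper", the nearest-centroid fitness, and the union aggregation rule are all hypothetical stand-ins for the unspecified filter and metaheuristic wrappers the paper actually uses.

```python
import numpy as np

def filter_stage(X, y, k):
    """Stage 1: rank features by |Pearson correlation| with the label
    and keep the top-k (a simple stand-in for the paper's filter)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    scores = np.abs(Xc.T @ yc) / denom
    return np.argsort(scores)[::-1][:k]

def centroid_accuracy(Xs, y):
    """Wrapper fitness: training accuracy of a nearest-centroid classifier."""
    classes = np.unique(y)
    cents = np.stack([Xs[y == c].mean(axis=0) for c in classes])
    dist = ((Xs[:, None, :] - cents[None]) ** 2).sum(axis=2)
    return (classes[np.argmin(dist, axis=1)] == y).mean()

def wrapper_stage(X, y, fitness, n_iter=50, rng=None):
    """Stage 2 (one run): toy stochastic subset search by random bit-flip
    hill climbing, standing in for one metaheuristic wrapper."""
    rng = np.random.default_rng(rng)
    mask = rng.random(X.shape[1]) < 0.5
    if not mask.any():
        mask[0] = True                        # never evaluate an empty subset
    best = fitness(X[:, mask], y)
    for _ in range(n_iter):
        cand = mask.copy()
        j = rng.integers(X.shape[1])
        cand[j] = ~cand[j]                    # flip one feature in/out
        if cand.any():
            f = fitness(X[:, cand], y)
            if f >= best:
                mask, best = cand, f
    return mask

def ensemble_select(X, y, k, fitness, n_runs=2, seed=0):
    keep = filter_stage(X, y, k)              # stage 1: filter reduction
    votes = np.zeros(k)
    for r in range(n_runs):                   # stage 2: wrapper ensemble
        votes += wrapper_stage(X[:, keep], y, fitness, rng=seed + r)
    return keep[votes > 0]                    # aggregate the runs (union here)

# Synthetic demo: 2 informative features hidden among 200.
rng = np.random.default_rng(0)
X = rng.standard_normal((60, 200))
y = (rng.random(60) < 0.5).astype(int)
X[:, 0] += 3 * y
X[:, 1] -= 3 * y
selected = ensemble_select(X, y, k=20, fitness=centroid_accuracy)
print(sorted(selected))
```

The union rule in `ensemble_select` is one simple aggregation choice; majority voting or intersection of the wrapper runs are equally plausible ways to combine the ensemble's subsets.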