A Non-Parametric Proximity-Based Method for Outlier Detection
Subject area: Electrical and Computer Engineering
Yahya Salehi 1, Negin Daneshpour 2*
1 - Shahid Rajaee Teacher Training University
2 - Shahid Rajaee University
Keywords: non-parametric, outlier detection, proximity-based
Abstract:
Outlier detection is an active research area in data mining and machine learning and an important step in data preprocessing. This paper presents NPOD, a non-parametric method for proximity-based outlier detection. The proposed approach combines distance-based and density-based techniques and can detect both global and local outliers. It requires none of the usual parameters: the neighborhood radius, the threshold on the number of points within that radius, or the number of nearest neighbors. To identify outliers, NPOD introduces a new scoring scheme. Experimental results on UCI datasets show that, despite being parameter-free, the algorithm performs comparably with previous methods and in some cases achieves the best performance.
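To make the general idea of a parameter-free, proximity-based outlier score concrete, the following is a minimal Python sketch. It is not the authors' NPOD scoring function; it only illustrates one way such a score could be built, assuming an adaptively grown neighborhood in the spirit of natural-neighbor search, with points then ranked by their mean distance to that neighborhood. The function name natural_neighbor_outlier_scores and the safeguard parameter max_k are hypothetical names introduced here for illustration.

import numpy as np

def natural_neighbor_outlier_scores(X, max_k=50):
    # Illustrative sketch only, not the NPOD method from the paper.
    # Grows the neighborhood size until every point is the neighbor of at
    # least one other point, or the count of "unreached" points stops
    # changing; then scores each point by its mean distance to the
    # resulting neighborhood.
    n = X.shape[0]
    # Pairwise Euclidean distances (brute force, fine for small UCI-sized data).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))
    np.fill_diagonal(dist, np.inf)           # a point is not its own neighbor
    order = np.argsort(dist, axis=1)         # order[i, r] = (r+1)-th nearest neighbor of i

    in_degree = np.zeros(n, dtype=int)       # how many points list i as a neighbor
    prev_unreached = n
    for r in range(min(max_k, n - 1)):
        np.add.at(in_degree, order[:, r], 1)
        unreached = int((in_degree == 0).sum())
        if unreached == 0 or unreached == prev_unreached:
            break                             # neighborhood search has stabilized
        prev_unreached = unreached
    k = r + 1                                 # adaptively chosen neighborhood size

    # Score: mean distance to the k adaptively found nearest neighbors.
    knn_dist = np.take_along_axis(dist, order[:, :k], axis=1)
    return knn_dist.mean(axis=1)

# Example: the highest scores flag the most likely outliers.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), [[8.0, 8.0]]])   # one obvious outlier
scores = natural_neighbor_outlier_scores(X)
print(np.argsort(scores)[-3:])   # indices of the three most outlying points

Because the neighborhood size is selected by the stabilization criterion rather than supplied by the user, the sketch avoids a fixed k, radius, or count threshold, which is the property the abstract claims for NPOD; the actual scoring rule of the paper may differ substantially.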