Incremental Opinion Mining Using Active Learning over a Stream of Texts

Subject areas: Electrical and Computer Engineering

1 - Khansar University

Received: 1397/03/13 Accepted: 1397/07/29 Published: 1398/01/31

Keywords: data streams, concept drift, opinion mining, incremental learning, active learning

Abstract:

Opinion mining is today one of the most important applications of natural language processing, and because of the high volume and rate at which opinions are produced, it requires special processing methods. Given the data-stream nature of user opinions on social networks and e-commerce sites, using non-incremental classification algorithms causes the performance of the learned model to degrade over time until it becomes practically unusable. In addition, because the number of opinions is unbounded, it is not possible to label all of them in order to create new training examples and update the learned model. Since new opinions may contain new vocabulary, or the distribution of polarity classes may change, concept drift must also be supported in incremental opinion mining. This paper presents a new method for learning the polarity of texts incrementally: it uses stream-based active learning to select the texts that are most valuable for updating the classification model and, once a human expert has labeled them, uses them to improve the model. The proposed method is trained online, without needing to store texts, using a limited number of labeled texts, and it can detect and handle concept drift. The proposed method is compared with and evaluated against prominent incremental and non-incremental methods using well-established datasets and standard evaluation measures.
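
The online loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the classifier or the selection criterion, so the multinomial Naive Bayes model, the confidence threshold, and the labeling budget used here are assumptions, and the names `IncrementalNB`, `run_stream`, `oracle`, `threshold`, and `budget` are hypothetical.

```python
import math
from collections import Counter, defaultdict


class IncrementalNB:
    """Multinomial Naive Bayes updated one labeled document at a time (assumed model)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                        # Laplace smoothing constant
        self.doc_counts = Counter()               # labeled documents seen per class
        self.word_counts = defaultdict(Counter)   # word frequencies per class
        self.vocab = set()                        # grows when new words appear

    def update(self, tokens, label):
        # Only counts are stored, so the text itself can be discarded afterwards.
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokens)
        self.vocab.update(tokens)

    def predict_proba(self, tokens):
        # Requires at least one labeled document; run_stream guarantees this.
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label, n_docs in self.doc_counts.items():
            total_words = sum(self.word_counts[label].values())
            # The extra +1 reserves probability mass for words never seen before.
            denom = total_words + self.alpha * (len(self.vocab) + 1)
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + self.alpha) / denom)
            log_scores[label] = score
        top = max(log_scores.values())
        exp_scores = {c: math.exp(s - top) for c, s in log_scores.items()}
        norm = sum(exp_scores.values())
        return {c: v / norm for c, v in exp_scores.items()}


def run_stream(stream, oracle, threshold=0.7, budget=0.3):
    """Read texts one by one and ask the oracle only about uncertain ones.

    `oracle` stands in for the human expert; `budget` caps the fraction of
    documents that may be sent for labeling (both are hypothetical parameters).
    """
    model = IncrementalNB()
    seen = queried = 0
    for text in stream:
        tokens = text.lower().split()
        seen += 1
        if len(model.doc_counts) < 2:
            uncertain = True   # cold start: fewer than two classes observed
        else:
            uncertain = max(model.predict_proba(tokens).values()) < threshold
        if uncertain and queried < budget * seen:
            model.update(tokens, oracle(text))
            queried += 1
        # The text goes out of scope here; nothing but counts is retained.
    return model, queried
```

A call such as `run_stream(texts, ask_expert, threshold=0.7, budget=0.3)` would return the updated model together with the number of labels requested. Uncertainty sampling under a budget is only one possible way to decide which texts are "valuable"; other selection strategies fit the same loop.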

English abstract:

Today, opinion mining is one of the most important applications of natural language processing, and it requires special methods to process documents because of the high volume of comments produced. Because users' opinions on social networks and e-commerce websites form an evolving stream, applying traditional non-incremental classification algorithms to opinion mining causes the classification model to degrade over time. Moreover, because the volume of user comments is massive, it is not possible to label enough comments to build training data for updating the learned model. Another issue in incremental opinion mining is concept drift, which must be supported in order to handle changing class distributions and evolving vocabulary. In this paper, a new incremental method for polarity detection is proposed that uses stream-based active learning to select the best documents to be labeled by experts and then updates the classifier with them. The proposed method is capable of detecting and handling concept drift using a limited amount of labeled data, without storing the documents. We compare our method with state-of-the-art incremental and non-incremental classification methods using credible datasets and standard evaluation measures. The evaluation results show the effectiveness of the proposed method for polarity detection of opinions.
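
To make the idea of detecting concept drift concrete, the following sketch monitors the classifier's error rate on labeled documents, in the style of the Drift Detection Method (DDM) of Gama et al. The abstract does not state which detector the proposed method uses, so this choice is an assumption, and the class name `DriftDetector` and its parameters `min_samples`, `warn`, and `drift` are hypothetical.

```python
import math


class DriftDetector:
    """Error-rate monitor in the style of DDM (an assumed choice of detector)."""

    def __init__(self, min_samples=30, warn=2.0, drift=3.0):
        self.min_samples = min_samples   # outcomes to observe before testing
        self.warn = warn                 # warning level, in standard deviations
        self.drift = drift               # drift level, in standard deviations
        self.reset()

    def reset(self):
        self.n = 0
        self.errors = 0
        self.p_min = float("inf")        # error rate at the best point so far
        self.s_min = float("inf")        # its standard deviation

    def add(self, mistake):
        """Feed one outcome (True = misclassified) and return the current state."""
        self.n += 1
        self.errors += int(mistake)
        p = self.errors / self.n                  # running error rate
        s = math.sqrt(p * (1 - p) / self.n)       # binomial standard deviation
        if self.n < self.min_samples:
            return "stable"
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s         # remember the best point
        if p + s > self.p_min + self.drift * self.s_min:
            self.reset()                          # start a fresh window
            return "drift"
        if p + s > self.p_min + self.warn * self.s_min:
            return "warning"
        return "stable"
```

In an active-learning setting only the documents labeled by the expert yield an outcome, so a call such as `detector.add(predicted != true_label)` would be made after each query. On a `"drift"` signal, one plausible reaction is to rebuild the classifier from the labels that arrive afterwards; how the proposed method reacts is not specified in the abstract.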

