Incremental Opinion Mining Using Active Learning on Text Streams
Subject areas: Electrical and Computer Engineering
1 - Khansar University
Keywords: data streams, concept drift, opinion mining, incremental learning, active learning
Abstract:
Opinion mining is today one of the most important applications of natural language processing, and because of the high volume and rate at which opinions are produced, it requires specialized processing methods. Since user opinions on social networks and e-commerce websites arrive as a data stream, non-incremental classification algorithms cause the performance of the learned opinion-mining model to degrade over time until it becomes practically unusable. Moreover, because the number of opinions is unbounded, it is not possible to label all of them to create new training examples and update the learned model. Since new opinions may contain new vocabulary, or the distribution of the polarity classes may change, incremental opinion mining must also support concept drift. In this paper, a new method for incremental learning of text polarity is proposed. It uses stream-based active learning to select the texts most valuable for updating the classification model and, once a human expert has labeled them, uses them to improve the model. The proposed method learns online from a limited number of labeled texts, without needing to store the texts, and is able to detect and handle concept drift. The proposed method is compared with prominent incremental and non-incremental methods and evaluated on credible datasets using standard evaluation measures.
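The abstract says the polarity classifier is updated online from a limited set of expert-labeled texts, without storing them, while new vocabulary keeps appearing. The classifier itself is not specified here; as an assumed illustration only, the sketch below uses multinomial Naive Bayes, whose sufficient statistics are per-class word and document counts, so each labeled text can be folded into the model and then discarded. The tokenizer, the smoothing default `alpha=1.0`, the class name and the toy texts are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Deliberately simple tokenizer (an assumption): lowercase word characters only.
    return re.findall(r"\w+", text.lower())


class IncrementalMultinomialNB:
    """Multinomial Naive Bayes kept as running counts.

    learn_one() folds a single labeled document into the counts and then
    forgets it, so no raw text is stored and unseen words simply enlarge
    the vocabulary.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # Laplace (add-alpha) smoothing
        self.doc_counts = Counter()              # labeled documents per class
        self.word_counts = defaultdict(Counter)  # token frequencies per class
        self.total_words = Counter()             # token total per class
        self.vocab = set()                       # grows as new words appear

    def learn_one(self, text, label):
        tokens = tokenize(text)
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokens)
        self.total_words[label] += len(tokens)
        self.vocab.update(tokens)

    def log_scores(self, text):
        # Log prior plus smoothed log likelihood; words never seen in training are skipped.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        n_docs = sum(self.doc_counts.values())
        scores = {}
        for label, n in self.doc_counts.items():
            denom = self.total_words[label] + self.alpha * len(self.vocab)
            score = math.log(n / n_docs)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict_proba(self, text):
        # Softmax over the log scores, shifted by the max for numerical stability.
        scores = self.log_scores(text)
        top = max(scores.values())
        weights = {k: math.exp(s - top) for k, s in scores.items()}
        z = sum(weights.values())
        return {k: w / z for k, w in weights.items()}

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy stream: each labeled text is learned once and then discarded.
nb = IncrementalMultinomialNB()
for text, label in [("great phone, love it", "pos"),
                    ("terrible battery, hate it", "neg"),
                    ("love the screen", "pos"),
                    ("hate the price", "neg")]:
    nb.learn_one(text, label)

print(nb.predict("love this"), nb.predict_proba("love this"))
```

Because only counts are kept, memory grows with the vocabulary rather than with the number of documents seen. The sketch assumes at least one labeled document has been learned before `predict` or `predict_proba` is called.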
Today, opinion mining is one of the most important applications of natural language processing, and the high volume of comments produced requires special methods to process them. Since users' opinions on social networks and e-commerce websites constitute an evolving stream, applying traditional non-incremental classification algorithms to opinion mining leads to the degradation of the classification model as time passes. Moreover, because users' comments are massive, it is not possible to label enough of them to build training data for updating the learned model. Another issue in incremental opinion mining is concept drift, which must be supported to handle changing class distributions and evolving vocabulary. In this paper, a new incremental method for polarity detection is proposed which, using stream-based active learning, selects the best documents to be labeled by experts and uses them to update the classifier. The proposed method detects and handles concept drift using limited labeled data, without storing the documents. We compare our method with state-of-the-art incremental and non-incremental classification methods using credible datasets and standard evaluation measures. The evaluation results show the effectiveness of the proposed method for polarity detection of opinions.
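The abstract states that the method selects documents from the stream for expert labeling and detects concept drift, but it does not name the query strategy or the drift detector. Purely as an illustrative assumption, the sketch below pairs a budgeted query rule modeled on the variable-uncertainty strategy of Zliobaite et al. (2014) with the Drift Detection Method (DDM) of Gama et al. (2004). The class names, the `budget` and `step` defaults, the `"stable"`/`"warning"`/`"drift"` return values and the toy error stream are hypothetical choices, not the paper's.

```python
import math


class VariableUncertainty:
    """Budgeted query rule in the spirit of the variable-uncertainty strategy
    of Zliobaite et al. (2014).

    `max_posterior` is the classifier's confidence in its predicted class.
    The threshold theta shrinks after each query and widens otherwise, so the
    labeling rate adapts instead of depending on a fixed cut-off.
    """

    def __init__(self, budget=0.1, step=0.01):
        self.budget = budget   # maximum fraction of the stream sent to the expert
        self.step = step       # threshold adjustment rate s
        self.theta = 1.0
        self.seen = 0
        self.labeled = 0

    def should_query(self, max_posterior):
        self.seen += 1
        # Simplification: spent budget measured as labeled / seen over the whole stream.
        if self.labeled / self.seen >= self.budget:
            return False
        if max_posterior < self.theta:
            self.theta *= 1 - self.step   # narrow the uncertainty region
            self.labeled += 1
            return True
        self.theta *= 1 + self.step       # widen it so that labeling resumes
        return False


class DDM:
    """Drift Detection Method (Gama et al., 2004) on a stream of 0/1 errors."""

    def __init__(self, min_instances=30):
        self.min_instances = min_instances
        self.reset()

    def reset(self):
        self.n = 0
        self.p = 0.0                 # running error rate
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, error):
        """error = 1 if the model misclassified a labeled document, else 0."""
        self.n += 1
        self.p += (error - self.p) / self.n
        s = math.sqrt(self.p * (1 - self.p) / self.n)
        if self.n < self.min_instances:
            return "stable"
        if self.p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, s
        if self.p + s >= self.p_min + 3 * self.s_min:
            self.reset()             # caller should rebuild or retrain the model
            return "drift"
        if self.p + s >= self.p_min + 2 * self.s_min:
            return "warning"
        return "stable"


# Toy error stream: 20% errors, then the polarity concept changes abruptly.
ddm = DDM()
errors = [1 if (i + 1) % 5 == 0 else 0 for i in range(100)] + [1] * 20
states = [ddm.update(e) for e in errors]

# Toy confidence stream: every document is predicted with posterior 0.6.
vu = VariableUncertainty(budget=0.1)
queried = sum(vu.should_query(0.6) for _ in range(1000))
print(states.index("drift"), queried)
```

In a full loop, a document would be sent to the expert only when `should_query` returns `True`, and DDM would be fed errors on those queried documents only, since the true polarity of the others is unknown; a `"drift"` signal could then trigger rebuilding the classifier from recently labeled texts.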