Sentiment Analysis of Persian Documents by Designing an Optimal Transform Domain

Subject areas: Electrical and Computer Engineering

Asef Pourmasoumi ¹*, Hadi Sadoghi Yazdi ², Hadi Ghaemi ³, Zahra Delkhasteh ⁴

1 - Ferdowsi University of Mashhad
2 - Ferdowsi University of Mashhad
3 - Ferdowsi University of Mashhad
4 - Ferdowsi University of Mashhad

Received: 1396/04/22 Accepted: 1396/04/22 Published: 1395/06/31

Keywords: sentiment analysis, transform domain, spectral energy maximization

Abstract:

With the growth of web-based interactions such as polls, personal blogs, and social networks, sentiment analysis, also called opinion mining, has become an important research area in computer science. Many methods based on machine learning and natural language processing have been proposed for sentiment analysis to date. In this paper, the distribution of words across the collected set of documents is used as a new criterion for detecting the sentiment of a sentence. The proposed method designs an optimal transform domain over the word distribution with two objectives: maximizing the spectral energy of class 1 at low frequencies and maximizing the spectral energy of class 2 at high frequencies. With the designed optimal transform domain, the data are mapped from the word-frequency (count) domain to the Fourier domain. Under this optimal transform, the two-class patterns of optimism and pessimism can be separated easily in the transform domain. To realize the mathematical model, a strategy is presented that uses the sample profile over all signal samples representing class 1, and the problem is solved with it. The spectrum of this profile has low-frequency components, so, assuming spectral contrast between classes 1 and 2, the objective of maximizing the spectral energy of class 2 is also satisfied. The method has been applied to texts in Persian and English.

English abstract:

With the development of web-based interactions such as social networks, personal blogs, surveys, and user comments, sentiment analysis and opinion mining have become an important research domain in computer science. To date, many approaches to sentiment analysis have been proposed using machine learning and natural language processing techniques. In this paper, we use the distribution of words in a collection of documents as a new criterion for analyzing sentiment. In the proposed approach, we design an optimal transform domain over the word distribution with two goals: maximizing the spectral energy of class 1 at low frequencies and maximizing the spectral energy of class 2 at high frequencies. Using the optimal transform domain, we map data from the word-frequency domain into the Fourier domain and can easily distinguish optimism and pessimism patterns. For this purpose, we use the sample profiles of class 1, which have low-frequency components. Assuming spectral contrast between classes 1 and 2, the goal of maximizing the spectral energy of class 2 is then also satisfied. We have applied this approach to English and Persian documents.
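
The idea of separating two classes by where their spectral energy concentrates can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses a plain discrete Fourier transform rather than the optimal transform designed in the paper, and the function names, the toy signals, the cutoff bin, and the 0.5 decision threshold are all hypothetical choices made for this sketch.

```python
import cmath


def dft(signal):
    """Naive discrete Fourier transform of a real-valued sequence."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]


def low_band_energy_ratio(signal, cutoff):
    """Fraction of non-DC spectral energy that falls in bins 1..cutoff.

    Only bins 1..n//2 are used, because the spectrum of a real signal is symmetric.
    """
    spectrum = dft(signal)
    half = len(signal) // 2
    energy = [abs(spectrum[k]) ** 2 for k in range(1, half + 1)]
    total = sum(energy)
    if total == 0:
        return 0.0
    return sum(energy[:cutoff]) / total


# Hypothetical word-occurrence signals over 16 positions (illustrative data only).
smooth_signal = [4, 4, 3, 3, 2, 2, 1, 1, 1, 1, 2, 2, 3, 3, 4, 4]  # slowly varying
spiky_signal = [3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0]   # rapidly alternating

CUTOFF = 2  # assumed boundary between "low" and "high" frequency bins

smooth_ratio = low_band_energy_ratio(smooth_signal, CUTOFF)
spiky_ratio = low_band_energy_ratio(spiky_signal, CUTOFF)

# Assumed rule: class 1 if most energy lies in the low band, otherwise class 2.
label_smooth = 1 if smooth_ratio >= 0.5 else 2
label_spiky = 1 if spiky_ratio >= 0.5 else 2
print(smooth_ratio, label_smooth)
print(spiky_ratio, label_spiky)
```

In this toy setting the slowly varying signal keeps nearly all of its energy in the low band and the alternating signal keeps it in the highest bin, which mirrors the spectral contrast between classes 1 and 2 that the abstract assumes.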

