Key Concept Extraction Using FrameNet and Concept Chains
Subject areas: Electrical and Computer Engineering
Soudabeh Mohamadi 1*, Kambiz Badie 2
1 - Kermanshah University of Technology
2 - Iran Telecommunication Research Center
Keywords: key concept extraction, semantic parser, concept chain, FrameNet
Abstract:
In recent years, many approaches have been proposed for the automatic extraction of keywords and key phrases. Only a few approaches address the automatic extraction of key concepts or key points, however, and most of them rely on statistical methods. Key concept extraction is the process of identifying phrases that express the main concepts of an unstructured text. This paper proposes a new approach to Key Concept Extraction (KCE) that uses FrameNet and is based on natural language processing. In this approach, FrameNet is used for shallow semantic parsing of the original text, and concept chains are then constructed. Each concept is assigned a score vector of four elements, three of which are based on the concept chains. Finally, the concepts whose scores exceed a threshold are extracted as key concepts. Three different thresholds are used in this study and compared with one another. Both objective and human-expert subjective evaluations are performed, and precision and recall are examined. Key concepts are useful in tasks such as electronic document indexing, building digital libraries, text summarization, search engines, clustering, and classification.
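The selection and evaluation steps described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not say how the four scores are defined or combined, so the unweighted mean used here, the function names, the example frame names, the score values, and the three threshold values are all assumptions made for the example.

```python
from statistics import mean


def select_key_concepts(score_vectors, threshold):
    """Return the concepts whose aggregate score exceeds the threshold.

    Assumption: the four scores of a concept are combined by an
    unweighted mean; the paper may combine them differently.
    """
    return {c for c, vec in score_vectors.items() if mean(vec) > threshold}


def precision_recall(extracted, reference):
    """Compare extracted concepts with an expert-provided reference set."""
    hits = len(extracted & reference)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical four-element score vectors, keyed by illustrative frame names.
scores = {
    "Commerce_buy": (0.9, 0.8, 0.7, 0.6),
    "Motion": (0.2, 0.1, 0.3, 0.2),
    "Statement": (0.6, 0.5, 0.4, 0.5),
}
# Hypothetical set of key concepts chosen by a human expert.
expert = {"Commerce_buy", "Statement"}

# Three illustrative thresholds, compared side by side.
for threshold in (0.3, 0.6, 0.8):
    extracted = select_key_concepts(scores, threshold)
    p, r = precision_recall(extracted, expert)
    print(f"threshold={threshold}: {sorted(extracted)} precision={p:.2f} recall={r:.2f}")
```

Raising the threshold shrinks the extracted set, which can lower recall against the expert's reference set; comparing several thresholds makes that trade-off visible.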