افزایش سرعت جستجو در مدل‌‌های مبتنی بر مجاورت

محورهای موضوعی : مهندسی برق و کامپیوتر

جواد پاک‌سيما ^{1
*} , علي‌محمد زارع بيدكي ² , ولي درهمي ³

1 - دانشگاه يزد
2 - دانشگاه يزد
3 - دانشگاه يزد

تاریخ دریافت : 1396/06/17 تاریخ پذیرش : 1396/06/17 تاریخ انتشار : 1396/06/30

کلید واژه: موتور جستجو رتبه‌بندی فاصله مدل مجاورت سرعت بازیابی,

چکیده مقاله :

یکی از اصلی‌‌ترین چالش‌‌های مدل‌‌های مبتنی بر مجاورت مسأله سرعت بازیابی اطلاعات می‌‌باشد. در مدل‌‌های مبتنی بر مجاورت مفهومی به نام فاصله تعریف می‌‌شود که برای محاسبه آن باید موقعیت کلمات پرس و جو در سند استخراج شود. این موضوع یعنی استخراج موقعیت‌‌ها و محاسبه فاصله‌‌ها فرایندی زمان‌‌بر است و چون غالباً در زمان جستجو اجرا می‌‌شود از دید کاربر اهمیت بیشتری دارد. در صورتی که بتوان تعداد اسناد مورد بررسی را کاهش داد بازیابی سریع‌‌تر می‌‌شود. در این مقاله الگوریتمی به نام 3SNTK برای هرس‌کردن پویای اسناد در موقع جستجوی عبارت ارائه گردیده است. برای اجتناب از تخصیص بیش از حد حافظه و کاهش ریسک بروز خطا در موقع بازیابی، امتیاز تعدادی از اسناد بدون هیچ گونه هرسی محاسبه می‌‌شود (Skip-N). در این الگوریتم از سه هرم حداقل برای استخراج اسناد دارای بالاترین امتیازها استفاده شده و آزمایش‌‌ها نشان می‌‌دهد که استفاده از الگوریتم‌‌ ‌‌پیشنهادی باعث بهبود سرعت بازیابی می‌‌گردد.

چکیده انگلیسی:

One of the main challenges in the proximity models is the speed of data retrieval. These models define a distance concept which is calculated based on the positions of query terms in the documents. This means that finding the positions and calculating the distance is a time consuming process and because it usually executed during the search time it has a special importance to users. If we can reduce the number of documents, retrieval process becomes faster. In this paper, the SNTK3 algorithm is proposed to prune documents dynamically. To avoid allocating too much memory and reducing the risk of errors during the retrieval, some documents' scores are calculated without any pruning (Skip-N). The SNTK3 algorithm uses three pyramids to extract documents with the highest scores. Experiments show that the proposed algorithm can improve the speed of retrieval.

منابع و مأخذ:

[1] S. Tatikonda, B. B. Cambazoglu, and F. P. Junqueira, "Posting list intersection on multicore architectures," in Proc. of the 34th International ACM SIGIR Conf. on Research and Development in Information Retrieval, pp. 963-972, Jul. 2011.
[2] J. Teevan, K. Collins-Thompson, R. W. White, S. T. Dumais, and Y. Kim, "Slow search: information retrieval without time constraints," in Proc. of the Symp. on Human-Computer Interaction and Information Retrieval, Article 1, Oct. 2013.
[3] B. B. Cambazoglu, F. P. Junqueira, V. Plachouras, S. Banachowski, B. Cui, S. Lim, and B. Bridge, "A refreshing perspective of search engine caching," in Proc. of the 19th Int.Conf. on World Wide Web, pp. 181-190, Apr. 2010.
[4] Q. Gan and T. Suel, "Improved techniques for result caching in web search engines," in Proc. of the 18th Int. Conf. on World Wide Web, pp. 431-440, Apr. 2009.
[5] R. Blanco Gonzalez, Index Compression for Information Retrieval Systems, Ph.D. Thesis, University of A Coruna, Spain, 2008.
[6] D. Bahle, H. Williams, and J. Zobel, "Compaction techniques for nextword indexes," in Proc. Int. Symp. on String Processing and Information Retrieval, p. 33, Nov. 2001.
[7] I. H. Witten, A. Moffat, and T. C. Bell, Managing Gigabytes: Compressing and Indexing Documents and Images, Morgan Kaufmann, 1999.
[8] E. M. Keen, "The use of term position devices in ranked output experiments," Journal of Documentation, vol. 47, no. 1, pp. 1-22, Mar. 1991.
[9] K. Jiang and Y. Yang, "Efficient dynamic pruning on largest scores first (LSF) retrieval," Front. Inf. Technol. Electron. Eng., vol. 17, pp. 1-14, Jan. 2016.
[10] T. Strohman, H. Turtle, and W. B. Croft, "Optimization strategies for complex queries," in Proc. of the 28th Annual Int. ACM SIGIR Conf. on Research and Development in Information Retrieval, pp. 219-225, Aug. 2005.
[11] D. Carmel and E. Amitay, Juru at TREC 2006: TAAT versus DAAT in the Terabyte Track, in TREC, 2006.
[12] A. Chuklin, P. Serdyukov, and M. De Rijke, "Modeling clicks beyond the first result page," in Proc. of the 22nd ACM Int. Conf. on Information & Knowledge Management, pp. 1217-1220, Oct. 2013.
[13] H. Turtle and J. Flood, "Query evaluation: strategies and optimizations," Inf. Process. Manag., vol. 31, no. 6, pp. 831-850, Nov. 1995.
[14] A. Broschart and R. Schenkel, "High-performance processing of text queries with tunable pruned term and term pair indexes," ACM Trans. Inf. Syst., vol. 30, no. 1, p. 5, Feb. 2012.
[15] R. Schenkel, A. Broschart, S. Hwang, M. Theobald, and G. Weikum, "Efficient text proximity search," in Proc. 14th Int. Conf. on String Processing and Information Retrieval, pp. 287-299, Oct. 2007.
[16] H. Yan, S. Shi, F. Zhang, T. Suel, and J. R. Wen, "Efficient term proximity search with term-pair indexes," in Proc. of the 19th ACM Int. Conf. on Information and Knowledge Management, pp. 1229-1238, ???. 2010.
[17] L. Wang, J. Lin, and D. Metzler, "Learning to efficiently rank," in Proc. of the 33rd Int. ACM SIGIR Conf. on Research and Development in Information Retrieval, pp. 138-145, Oct. 2010.
[18] R. Baeza-Yate, F. Junqueira, V. Plachouras, and H. F. Witschel, "Admission policies for caches of search engine results," in Proc. 14th Int. Conf. on String Processing and Information Retrieval, vol. pp. 74-85, Jul. 2007.
[19] R. Ozcan, I. S. Altingovde, B. B. Cambazoglu, F. P. Junqueira, and O. Ulusoy, "A five-level static cache architecture for web search engines," Inf. Process. Manag., vol. 48, no. 5, pp. 828-840, Sept. 2012.
[20] S. Jonassen, "Improving dynamic index pruning via linear programming," in Proc. of the 2015 Workshop on Large-Scale and Distributed System for Information Retrieval, pp. 21-25, Oct. 2015.
[21] A. Z. Broder, D. Carmel, M. Herscovici, A. Soffer, and J. Zien, "Efficient query evaluation using a two-level retrieval process," in Proc. of the Twelfth Int. Conf. on Information and Knowledge Managemen,t CIKM'03, p. 426, Nov. 2003.
[22] M. Zhu, S. Shi, N. Yu, and J. R. Wen, "Can phrase indexing help to process non-phrase queries?," in Proc. of the 17th ACM Conf. on Information and Knowledge Management, vol. ???, pp. 679-688, ???. 2008.
[23] S. Huston, Indexing Proximity-Based Dependencies for Information Retrieval, Ph.D. Dissertation, University of Massachusetts, 2014.
[24] C. Macdonald, I. Ounis, and N. Tonellotto, "Upper-bound approximations for dynamic pruning," ACM Trans. Inf. Syst., vol. 29, no. 4, p. 17, Oct. 2011.
[25] C. Macdonald, N. Tonellotto, and I. Ounis, "On upper bounds for dynamic pruning," in Proc. Conf. on the Theory of Information Retrieval, vol. 6931, pp. 313-317, 2011.
[26] N. Tonellotto, C. Macdonald, and I. Ounis, "Efficient dynamic pruning with proximity support," in Proc. of the 8th Workshop on Large-Scale Distributed Systems for Information Retrieval, pp. 31-35, Geneva, Switzerland, Jul. 2010.
[27] S. R. Kim and J. Hong, "An efficient and practical algorithm for the many-keyword proximity problem by offsets," in Proc. Int. Workshop on Rough Sets, Fuzzy Sets, Data Mining, and Granular-Soft Computing, vol. 3642, pp. 477-483, 2005.
[28] M. Petri, J. S. Culpepper, and A. Moffat, "Exploring the magic of WAND," in Proc. of the 18th Australasian Document Computing Symp., pp. 58-65, Oct. 2013.
[29] O. Rojas, V. Gil-Costa, and M. Marin, "Running time prediction for web search queries," in Proc. Parallel Processing and Applied Mathematics, pp. 210-220, 2016.

مقالات مرتبط

یک رهیافت فرااکتشافی چندهدفه برای بهبود پوشش و اتصال در شبکه‌های حسگر بی‌سیم
تاریخ چاپ : 1405/02/22
رویکرد ارزیابی هیجان نوین جهت مراقبت از سرطان مبتنی بر مدل‌های زبانی بزرگ
تاریخ چاپ : 1405/02/22
ارائه روشی برای مدیریت منابع در شبکه‌های Fog-DSDN با بهره‌گیری از معماری میکروسرویس و شبکه‌های ESN
تاریخ چاپ : 1405/02/22
چارچوب ترکیبی سبک‌وزن برای امنیت اینترنت اشیا با استفاده از جنگل تصادفی بهینه و انتخاب ویژگی تطبیقی در معماری لبه-ابری
تاریخ چاپ : 1405/02/22
یک چارچوب یادگیری نیمه‌نظارتی جهت دسته‌بندی دقیق موارد آزمون با بهره‌گیری از تعبیه‌های زبانی و ویژگی‌های معنایی متن
تاریخ چاپ : 1405/02/22
تکنیک هوشمند مبتنی بر الگوریتم چتر دریایی برای زمان‌بندی وظایف بر اساس اولویت در شبکه‌های IoT/Fog
تاریخ چاپ : 1405/02/22

اشتراک گذاری

آدرس مقاله

افزایش سرعت جستجو در مدل‌‌های مبتنی بر مجاورت