Video Summarization Using a Hybrid Method of Network Graph and Clustering

Subject areas: Electrical and Computer Engineering

Mahsa Rahimi Resketi ¹, Homayun Motameni ²*, Ebrahim Akbari ³, Hossein Nematzadeh ⁴

1 - Islamic Azad University, Sari Branch
2 - Islamic Azad University, Sari Branch
3 - Islamic Azad University, Sari Branch
4 - Islamic Azad University, Sari Branch

Received: 1401/04/01; Accepted: 1402/03/16; Published: 1402/08/02

Keywords: video mining, video summarization, clustering, K-Medoids, convolutional graph attention network

Abstract:

The spread of cameras and the reach of media in everyday life have produced a staggering volume of video data, so methods that can access and process this volume quickly and efficiently have become especially important. Video summarization addresses this need by condensing a video into a short but meaningful series of frames or clips. In this study, the data are first clustered with the K-Medoids algorithm. A convolutional graph attention network then performs temporal and graph-based separation, and in the next step a connection rejection method removes noise and duplicates. Finally, the summary is produced by merging the results of the two branches, graph-based and temporal. The results were evaluated qualitatively and quantitatively on three datasets: SumMe, TVSum, and OpenCV. In the qualitative evaluation, the method achieved on average an 88% accuracy rate in summarization and a 31% error rate, which is among the highest accuracy rates of the methods compared. In the quantitative evaluation, the proposed method also performed better than existing methods.
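The clustering step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each frame has already been reduced to a feature vector (the abstract does not say how features are extracted), and the function names `k_medoids` and `select_keyframes`, the Euclidean distance, and the simple alternating update rule are all assumptions made for illustration. The graph attention and merging stages are not covered.

```python
import numpy as np


def k_medoids(features, k, n_iter=100, seed=0):
    """Cluster rows of `features` with a simple alternating K-Medoids.

    Returns (medoid_indices, labels). Medoids are actual rows of the
    input, so each one maps back to a real frame.
    """
    rng = np.random.default_rng(seed)
    n = features.shape[0]
    # Pairwise Euclidean distances between all frame descriptors.
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Start from k distinct, randomly chosen frames.
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(n_iter):
        # Assignment step: each frame joins its nearest medoid.
        labels = dist[:, medoids].argmin(axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue  # keep the old medoid for an empty cluster
            # Update step: the member with the smallest total distance
            # to the other members becomes the new medoid.
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[within.argmin()]
        if np.array_equal(new_medoids, medoids):
            break  # converged: no medoid changed
        medoids = new_medoids
    labels = dist[:, medoids].argmin(axis=1)
    return medoids, labels


def select_keyframes(features, k, seed=0):
    """Return the medoid frame indices in temporal order."""
    medoids, _ = k_medoids(features, k, seed=seed)
    return np.sort(medoids)
```

Given an `(n_frames, d)` array of frame descriptors, `select_keyframes(features, k)` returns `k` frame indices in temporal order. Because medoids are real data points rather than averaged centroids, every selected index corresponds to an actual frame, which is one reason K-Medoids is a natural fit for keyframe selection. Like other alternating schemes, this sketch can settle in a local optimum that depends on the random initialization.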

