Extracting Bottlenecks Using Object Detection in Reinforcement Learning
Subject areas: Electrical and Computer Engineering
Behzad Ghazanfari 1*, Nasser Mozayani 2, Mohammad Reza Jahed-Motlagh 3
1 - Iran University of Science and Technology
2 - Iran University of Science and Technology
3 - Iran University of Science and Technology
Keywords: reinforcement learning, object clustering, hierarchical reinforcement learning, temporally extended actions
Abstract:
This paper proposes a new method that automatically extracts bottlenecks for a reinforcement learning agent. The proposed method is inspired by biological systems and by animal behavior and navigation, and operates through the agent's interactions with its environment. Using clustering and hierarchical object recognition, the agent identifies landmarks. If these landmarks are close to each other in the action space, bottlenecks are extracted using the states between them. Experimental results show a considerable improvement in the reinforcement learning process compared with other similar methods.
Extracting bottlenecks considerably improves the speed of learning and the ability to transfer knowledge in reinforcement learning. However, extracting bottlenecks is a challenge in reinforcement learning, and it typically requires prior knowledge and the designer's help. This paper proposes a new method that extracts bottlenecks for a reinforcement learning agent automatically. The method is inspired by biological systems and by animal behavior and navigation, and the agent works on the basis of its interactions with the environment. The agent finds landmarks based on clustering and hierarchical object recognition. If these landmarks are close to each other in the action space, bottlenecks are extracted using the states between them. Experimental results show a considerable improvement in the learning process in comparison to some key methods in the literature.
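
The idea described in the abstract can be illustrated with a small sketch. The code below is not the authors' implementation; it is an assumption-laden example that supposes landmarks have already been found (for instance by clustering and object recognition), that the agent's experience is available as trajectories of state labels, and that "close in the action space" is approximated by a hypothetical max_action_distance threshold on the number of actions separating two landmarks along a trajectory. States lying strictly between two such nearby landmarks are returned as bottleneck candidates.

# Illustrative sketch only: trajectory format, state labels, and the
# max_action_distance threshold are assumptions made for this example.
from itertools import combinations

def extract_bottlenecks(trajectories, landmarks, max_action_distance=3):
    """Return states that lie between two landmarks reachable from each other
    within max_action_distance actions along an observed trajectory."""
    bottlenecks = set()
    for traj in trajectories:
        # indices of landmark states along this trajectory
        positions = [i for i, s in enumerate(traj) if s in landmarks]
        for i, j in combinations(positions, 2):
            if 0 < j - i <= max_action_distance:
                # collect the states strictly between the two nearby landmarks
                bottlenecks.update(traj[i + 1:j])
    return bottlenecks

# Toy usage with hypothetical state labels gathered during exploration.
trajectories = [["a", "b", "door", "c", "d"], ["e", "door", "c"]]
landmarks = {"b", "c"}
print(extract_bottlenecks(trajectories, landmarks))  # prints {'door'}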