Chinese Space Science and Technology ›› 2024, Vol. 44 ›› Issue (5): 75-82. doi: 10.16708/j.cnki.1000-758X.2024.0075

• Article

Improvement and application of the MCTS algorithm in turn-based orbital games

郑鑫宇,张轶,周杰,唐佩佳,彭升人,党朝辉   

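  1. Qian Xuesen Laboratory of Space Technology, China Academy of Space Technology, Beijing 100094
  2. School of Astronautics, Northwestern Polytechnical University, Xi'an 710072
  • Published: 2024-10-25 Online: 2024-10-21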

Improvement and application of MCTS in turn-based orbital games

ZHENG Xinyu, ZHANG Yi, ZHOU Jie, TANG Peijia, PENG Shengren, DANG Zhaohui

  1. Qian Xuesen Laboratory of Space Technology, China Academy of Space Technology, Beijing 100094, China
  2. School of Astronautics, Northwestern Polytechnical University, Xi'an 710072, China
  • Published: 2024-10-25 Online: 2024-10-21

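Abstract: In turn-based pursuit-evasion games between spacecraft, the delay in sensing an opponent's orbital maneuver makes the problem hard to solve with differential game methods, while game algorithms based on deep reinforcement learning have weak interpretability and remain risky in engineering use. For the spacecraft turn-based pursuit-evasion game, a predictive-value-accumulate Monte Carlo tree search (PVA-MCTS) algorithm is proposed. Exploiting the predictability of spacecraft orbital motion, the algorithm predicts and accumulates the value of decisions made during the game, which addresses the sparse rewards and long time span of such games; the adaptive expansion method it adopts improves learning efficiency. The algorithm is applied to the spacecraft turn-based pursuit-evasion game and compared with the results obtained by the Monte Carlo tree search (MCTS) algorithm. The results show that PVA-MCTS shortens the pursuit time by about 27.6% for the pursuing spacecraft and extends the escape time by about 6.8% for the evading spacecraft. The proposed algorithm can help accelerate the practical adoption of orbital game techniques in areas such as non-cooperative target approach and collision avoidance.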

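Key words: spacecraft pursuit-evasion, turn-based pursuit-evasion game, Monte Carlo tree search, sensing delay of orbit change, predictive value accumulation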

Abstract: In the turn-based orbital pursuit-evasion game, the delay in sensing orbit changes makes differential game approaches difficult to apply, and deep reinforcement learning-based algorithms remain risky for engineering applications because of their poor interpretability. A predictive-value-accumulate Monte Carlo tree search (PVA-MCTS) algorithm is proposed for the turn-based orbital pursuit-evasion game. Based on the predictability of spacecraft orbital motion, the algorithm predicts and accumulates decision values during the game. This addresses the sparse rewards and long time span of the turn-based orbital pursuit-evasion game and improves learning efficiency. The algorithm is applied to the turn-based orbital pursuit-evasion game and compared with the results obtained by the Monte Carlo tree search (MCTS) algorithm. The results show that PVA-MCTS reduces the pursuit time by about 27.6% for the pursuer and increases the escape time by about 6.8% for the evader. The PVA-MCTS algorithm is practical for applying orbital games to non-cooperative target approach and collision avoidance.

Key words: pursuit-evasion of spacecraft, turn-based pursuit-evasion game, Monte Carlo tree search, sensing delay of orbit change, predictive value accumulate
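
For readers unfamiliar with the MCTS baseline that the abstract compares against, the following is a minimal sketch of two of its standard steps: UCB1-based selection and value backpropagation for a two-player zero-sum, turn-based game. It is not the PVA-MCTS algorithm; the abstract does not give the details of the value prediction, the accumulation, or the adaptive expansion, so none of these are modeled here. The names `Node`, `ucb1`, `select_child`, and `backup`, the exploration constant `c`, and the action labels "coast" and "burn" are assumptions made purely for illustration.

```python
import math
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass(eq=False, repr=False)
class Node:
    """One node of a search tree (hypothetical structure for illustration)."""
    action: Any = None                  # move that led to this node
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    visits: int = 0                     # number of times this node was backed up
    total_value: float = 0.0            # sum of backed-up values


def ucb1(child: Node, parent_visits: int, c: float = math.sqrt(2)) -> float:
    """UCB1 score: mean value plus an exploration bonus."""
    if child.visits == 0:
        return math.inf                 # unvisited children are tried first
    mean = child.total_value / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def select_child(node: Node, c: float = math.sqrt(2)) -> Node:
    """Selection step: choose the child with the highest UCB1 score."""
    return max(node.children, key=lambda ch: ucb1(ch, node.visits, c))


def backup(leaf: Node, value: float) -> None:
    """Backpropagation step: add `value` along the path from leaf to root.

    The sign flips at every level because the two players (here, pursuer
    and evader) alternate turns and the game is assumed to be zero-sum.
    """
    node: Optional[Node] = leaf
    while node is not None:
        node.visits += 1
        node.total_value += value
        value = -value
        node = node.parent
```

In a full search loop, `select_child` would be applied repeatedly from the root down to a leaf, the leaf would be expanded and evaluated (for example by a rollout), and `backup` would then propagate the result. According to the abstract, PVA-MCTS changes how decision values are obtained and how the tree is expanded; those parts are deliberately left out of this sketch.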