Chinese Space Science and Technology, 2023, Vol. 43, Issue (4): 24-34. doi: 10.16708/j.cnki.1000-758X.2023.0050


Hypercube satellite formation control based on ADDPG strategy

MIAO Jun1, TU Xinying1, YIN Jianfeng1, PENG Jing1, LI Haijin2, CHEN Ziyun3

  1 Qian Xuesen Laboratory of Space Technology, China Academy of Space Technology, Beijing 100094, China
  2 Beijing Institute of Spacecraft Engineering, China Academy of Space Technology, Beijing 100094, China
  3 The No. 66136 Troop of PLA, Beijing 100042, China
  • Published: 2023-08-25 Online: 2023-07-18

Abstract: For high-precision control of large-scale satellite formations, an attraction-based deep deterministic policy gradient (ADDPG) control method is proposed. First, the characteristics of the hypercube topological formation configuration are described, a satellite formation dynamics model is established, and a virtual center of the hypercube satellite formation is designed to measure the overall flight state of the formation. To address the exploration-exploitation balance in model-free deep reinforcement learning, an ε-imitation action selection strategy is designed, and finally the ADDPG-based satellite formation control strategy is proposed. The algorithm does not depend on an environment model; by making full use of existing information, it reduces blind trial and error during the early exploration phase of learning. Simulation results show that the ADDPG strategy achieves higher precision with lower energy consumption: compared with well-known algorithms, it accelerates formation convergence while reducing error by more than 5% and energy consumption by more than 7%, which verifies the effectiveness of the algorithm.

Keywords: ADDPG strategy, virtual center, hypercube topology, satellite formation, deep reinforcement learning
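
The two structural ideas named in the abstract, the hypercube topology and the virtual center, can be illustrated with a minimal sketch. It is not taken from the paper. It assumes the standard definition of a hypercube graph (two nodes are linked when their binary indices differ in exactly one bit) and, as a further assumption, models the virtual center as the arithmetic mean of the satellite positions; the paper may define it differently. The names `hypercube_neighbors` and `virtual_center` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, float, float]


def hypercube_neighbors(node: int, dim: int) -> List[int]:
    """Return the neighbors of `node` in a dim-dimensional hypercube.

    Standard hypercube graph: flipping each of the `dim` bits of the
    node index once yields its `dim` neighbors.
    """
    if not 0 <= node < (1 << dim):
        raise ValueError("node index out of range for this dimension")
    return [node ^ (1 << k) for k in range(dim)]


def virtual_center(positions: Sequence[Vector]) -> Vector:
    """Assumed virtual center: arithmetic mean of all satellite positions.

    This is an illustrative assumption, not the paper's definition.
    """
    if not positions:
        raise ValueError("at least one position is required")
    n = len(positions)
    return tuple(sum(p[i] for p in positions) / n for i in range(3))


if __name__ == "__main__":
    dim = 3  # 2**3 = 8 satellites
    # Hypothetical layout: satellite i sits at the unit-cube corner
    # given by the bits of i.
    corners = [
        (float(i & 1), float((i >> 1) & 1), float((i >> 2) & 1))
        for i in range(1 << dim)
    ]
    for i in range(1 << dim):
        print(i, hypercube_neighbors(i, dim))
    print("virtual center:", virtual_center(corners))
```

In this layout each satellite has exactly `dim` communication links, so the link count per satellite grows only logarithmically with the formation size, which is the usual reason for choosing a hypercube topology for large formations.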