2019, Vol. 39, Issue (4): 36-. doi: 10.16708/j.cnki.1000-758X.2019.0027

• Research and Discussion •


Satellite attitude control method based on deep reinforcement learning

 WANG Yue-Jiao, MA Zhong, YANG Yi-Dai, WANG Zhu-Ping, TANG Lei

  1. Xi'an Microelectronics Technology Institute, Xi'an 710065, China
  • Received: 2018-11-01 Revised: 2019-01-08 Published: 2019-08-25 Online: 2019-04-22
  • About the author: WANG Yue-Jiao (1991-), female, assistant engineer; research interests include deep reinforcement learning, computer vision, and artificial intelligence
  • Supported by:

    National Natural Science Foundation of China (61702413); Aerospace Ninth Academy Technology Innovation Fund (2016JY06)

Abstract: To address the sudden attitude changes that a satellite encounters while performing complex tasks such as discarding a payload or capturing a target, a deep reinforcement learning method is used to control the satellite attitude and restore the satellite to a stable state. Specifically, an attitude dynamics environment of the vehicle is first built, and the continuous control torque output is discretized. The Deep Q Network algorithm is then used to train autonomous attitude control of the satellite, with stabilization of the attitude angular velocity as the reward, to obtain the optimal discrete action output. Simulation tests show that the deep reinforcement learning algorithm for satellite attitude control can stabilize the satellite attitude after a sudden random disturbance, and that it can effectively resolve the dependence of the traditional PD controller on the mass parameters of the controlled object. The proposed method controls the satellite attitude through self-learning, is highly intelligent and reasonably general, and has good application potential for the intelligent control of satellites performing complex space tasks in the future.

Key words: deep reinforcement learning, satellite attitude control, dynamics environment, autonomous attitude control, mass parameters
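
The pipeline summarized in the abstract (a dynamics environment, a discretized torque output, and a reward tied to angular velocity) can be illustrated with a minimal sketch. It is not the authors' implementation. The inertia values, torque levels, time step, and the names `INERTIA`, `TORQUE_LEVELS`, `DT`, `step`, and `td_target` are assumptions made for illustration; the abstract does not give the paper's actual parameters, reward shape, or network. The dynamics use Euler's rigid-body equations with a diagonal inertia tensor, and `td_target` is the standard Deep Q Network learning target.

```python
import itertools
import math

# Hypothetical values for illustration only; not taken from the paper.
INERTIA = (10.0, 12.0, 8.0)        # principal moments of inertia (kg*m^2)
TORQUE_LEVELS = (-0.5, 0.0, 0.5)   # allowed torque per body axis (N*m)
DT = 0.1                           # integration step (s)

# Discretize the continuous torque output: each combination of per-axis
# levels is one discrete action, giving 3**3 = 27 actions.
ACTIONS = list(itertools.product(TORQUE_LEVELS, repeat=3))


def step(omega, action_index):
    """Advance the body angular velocity by one explicit Euler step.

    Euler's equations for a diagonal inertia tensor I:
        I * d(omega)/dt = tau - omega x (I * omega)
    Returns the new angular velocity and a reward that grows as the
    angular rate shrinks (an assumed reward shape).
    """
    ix, iy, iz = INERTIA
    wx, wy, wz = omega
    tx, ty, tz = ACTIONS[action_index]
    dwx = (tx - (iz - iy) * wy * wz) / ix
    dwy = (ty - (ix - iz) * wz * wx) / iy
    dwz = (tz - (iy - ix) * wx * wy) / iz
    new_omega = (wx + DT * dwx, wy + DT * dwy, wz + DT * dwz)
    reward = -math.sqrt(sum(w * w for w in new_omega))
    return new_omega, reward


def td_target(reward, next_q_values, gamma=0.99, done=False):
    """Standard DQN target: r + gamma * max_a' Q(s', a'), or r if terminal."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)
```

In a full training loop, a Q-network would map the observed state to one value per entry of `ACTIONS`, the agent would pick an action (for example, epsilon-greedily), and the network would be regressed toward `td_target`. Because the agent only observes states and rewards, it needs no explicit model of the inertia, which matches the abstract's point about not depending on mass parameters.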