Chinese Space Science and Technology, 2023, Vol. 43, Issue (3): 123-133. doi: 10.16708/j.cnki.1000-758X.2023.0045

• Special column on mega-constellations / large-scale LEO constellations •

Research on dynamic beam resource scheduling for LEO constellations based on the A2C algorithm

LIU Wei, ZHENG Runze, ZHANG Lei, GAO Zihe, TAO Ying, CUI Kaixin

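  1 Innovation Center of Satellite Communication System, China National Space Administration (CNSA), Beijing 100094
  2 Institute of Telecommunication and Navigation Satellites, China Academy of Space Technology, Beijing 100094
  3 Northwestern Polytechnical University, Xi'an 710072
  4 Beijing Institute of Technology, Beijing 100081
  • Publication date: 2023-06-25 Release date: 2023-05-23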

Research of dynamic beam resource scheduling of LEO constellation based on A2C algorithm

LIU Wei, ZHENG Runze, ZHANG Lei, GAO Zihe, TAO Ying, CUI Kaixin

  1 Innovation Center of Satellite Communication System, CNSA, Beijing 100094, China
  2 Institute of Telecommunication and Navigation Satellites, China Academy of Space Technology, Beijing 100094, China
  3 Northwestern Polytechnical University, Xi'an 710072, China
  4 Beijing Institute of Technology, Beijing 100081, China
  • Published: 2023-06-25 Online: 2023-05-23

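Abstract: When a mega LEO constellation provides low-latency, high-capacity communication channels for user spacecraft such as crewed spacecraft, space stations, and remote sensing satellites, it faces a difficult beam resource allocation optimization problem. This paper studies an intelligent optimization framework built on A2C (advantage actor-critic), a discrete-time deep reinforcement learning method. By incorporating the concepts of individuals and genes from genetic algorithms, it derives a beam resource scheduling algorithm that effectively meets multi-user, dynamic, and concurrent access demands. Simulation analysis shows that the proposed algorithm is applicable in a variety of typical scenarios: it completes effective planning for more than 3000 tasks within 20 s, with a task success rate of no less than 91%. Algorithmic optimization reduces complexity, saving more than 45% of the time relative to a traditional genetic algorithm. The work also addresses the convergence problem of the traditional A2C framework, resolving the failure of the traditional fully connected A2C algorithm to converge, and it converges more than 38% faster than the DQN (deep Q-network) framework.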

Keywords: LEO constellation, beam scheduling, task planning, deep reinforcement learning, A2C algorithm

Abstract: Giant low-orbit constellations provide low-latency, large-capacity communication channels for user spacecraft such as manned spacecraft, space stations, and remote sensing satellites, which poses a resource allocation optimization problem for satellite beams. An intelligent optimization framework based on A2C (advantage actor-critic), a discrete-time deep reinforcement learning method, was studied, and a beam resource scheduling algorithm that effectively meets multi-user, dynamic, and concurrent access needs was formed by combining it with the concepts of individuals and genes from genetic algorithms. Simulation and analysis show that the proposed algorithm is applicable in a variety of typical scenarios. The method provides effective scheduling results for more than 3000 tasks within 20 s, with a task success rate of not less than 91%. Algorithm optimization reduces the complexity, saving more than 45% of the time compared with traditional genetic algorithms. In addition, the convergence problem of the traditional A2C algorithm framework was addressed, which solved the non-convergence of the traditional fully connected A2C algorithm. The convergence speed was also more than 38% higher than that of the DQN (deep Q-network) algorithm.

Key words: LEO constellation, beam scheduling, task planning, DRL, A2C algorithm
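
For readers unfamiliar with the A2C framework named in the abstract, the following is a minimal sketch of the generic advantage estimate that actor-critic methods use. It is not the authors' implementation. The abstract gives neither the network structure, the reward design, nor the individual/gene encoding, so the function names, the discount factor `gamma`, and the toy reward of 1.0 for a successfully scheduled task are all assumptions made for illustration.

```python
def discounted_returns(rewards, bootstrap_value, gamma=0.99):
    """Bootstrapped discounted returns: R_t = r_t + gamma * R_{t+1}.

    `bootstrap_value` is the critic's value estimate for the state that
    follows the last reward (0.0 if the episode has ended).
    """
    returns = []
    running = bootstrap_value
    # Walk the trajectory backwards so each return reuses the next one.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages(rewards, values, bootstrap_value, gamma=0.99):
    """Advantage A_t = R_t - V(s_t): how much better the sampled outcome
    was than the critic's prediction for that state."""
    returns = discounted_returns(rewards, bootstrap_value, gamma)
    return [ret - value for ret, value in zip(returns, values)]


# Hypothetical three-step episode: reward 1.0 when a task is scheduled
# successfully, 0.0 otherwise; `values` are made-up critic predictions.
example_rewards = [1.0, 0.0, 1.0]
example_values = [1.0, 0.5, 0.5]
example_advantages = advantages(
    example_rewards, example_values, bootstrap_value=0.0, gamma=0.5
)
```

In an A2C training loop, the actor's policy-gradient term is weighted by these advantages, while the critic is regressed toward the bootstrapped returns. A positive advantage raises the probability of the action that was taken, and a negative one lowers it.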