Giant low-Earth-orbit constellations provide low-latency, high-capacity communication channels for user spacecraft such as crewed spacecraft, space stations, and remote sensing satellites, which raises the problem of optimally allocating satellite beam resources. An intelligent optimization framework based on A2C (advantage actor-critic), a discrete-time deep reinforcement learning method, was studied and combined with the concepts of individuals and genes from genetic algorithms to form a beam resource scheduling algorithm that effectively meets the needs of dynamic, concurrent access by multiple users. Simulation and analysis show that the proposed algorithm is applicable to a variety of typical scenarios. The method provides effective scheduling results for more than 3000 tasks within 20 s, with a task success rate of no less than 91%. Algorithmic optimization reduces the complexity, saving more than 45% of the time compared with traditional genetic algorithms. The method also resolves the non-convergence problem of the traditional fully connected A2C framework, and its convergence speed is more than 38% higher than that of the DQN (deep Q-network) algorithm.
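For readers unfamiliar with A2C, the following is a minimal sketch of the generic advantage actor-critic loss terms, not the scheduling algorithm summarized above. The function names `discounted_returns` and `a2c_losses`, the discount factor, and the toy inputs are all illustrative assumptions; the network architecture, the beam-scheduling state and action encoding, and the genetic-algorithm components are not specified here and are left out.

```python
import numpy as np

def discounted_returns(rewards, gamma=0.99, bootstrap_value=0.0):
    """Discounted return G_t = r_t + gamma * G_{t+1} for one episode."""
    rewards = np.asarray(rewards, dtype=float)
    returns = np.zeros(len(rewards), dtype=float)
    running = bootstrap_value  # critic estimate after the last step (0 if terminal)
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def a2c_losses(log_probs, values, rewards, gamma=0.99):
    """Generic A2C loss terms (hypothetical helper, for illustration only).

    log_probs: log pi(a_t | s_t) of the actions actually taken
    values:    critic estimates V(s_t)
    rewards:   per-step rewards r_t
    """
    log_probs = np.asarray(log_probs, dtype=float)
    values = np.asarray(values, dtype=float)
    returns = discounted_returns(rewards, gamma)
    # Advantage: how much better the observed return was than the critic expected.
    advantages = returns - values
    # Actor: raise the log-probability of actions with positive advantage.
    actor_loss = -np.mean(log_probs * advantages)
    # Critic: regress V(s_t) toward the observed return.
    critic_loss = np.mean(advantages ** 2)
    return actor_loss, critic_loss, advantages

# Toy episode with three steps (made-up numbers).
actor_loss, critic_loss, adv = a2c_losses(
    log_probs=[-1.0, -1.0, -1.0],
    values=[1.0, 0.5, 0.5],
    rewards=[1.0, 0.0, 1.0],
    gamma=0.5,
)
```

In a trained agent the advantages would be treated as constants when differentiating the actor loss, and both terms would be minimized by gradient descent on the network parameters; this sketch only evaluates the loss values.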