Chinese Space Science and Technology, 2025, Vol. 45, Issue (5): 22-32. doi: 10.16708/j.cnki.1000-758X.2025.0073

• Special Topic on Habitable Planet Exploration •

Knowledge-constrained deep learning model for predicting astrochemical reactions

ZHANG Yanan, YANG Peilun, WANG Jiawei*, BU Haili, DUAN Manni, QUAN Donghui

  1. Research Center for Astronomical Computing, Zhejiang Lab, Hangzhou 311100, China
  • Received: 2025-03-21; Revised: 2025-05-22; Accepted: 2025-05-30; Online: 2025-09-17; Published: 2025-10-01

Abstract: In astrochemical research, analyzing how species evolve within an astrophysical region requires reconstructing their reaction pathways under dynamic physical conditions, a process that relies heavily on an accurate and complete astrochemical reaction network. Traditional methods for constructing such networks depend mainly on expert knowledge and experimental validation to identify reactions between species, which is costly in both time and computation. In this context, a deep learning-based prediction method named GraSSCoL-2 is proposed to predict astrochemical reactions efficiently and thereby accelerate the analysis of species evolution. GraSSCoL-2 combines a graph encoder, a sequence decoder, and contrastive learning. Trained on existing reaction data, it predicts both forward and reverse reaction pathways among astrochemical species. On Chemiverse, a recent astrochemical reaction dataset, GraSSCoL-2 achieves Top-1, Top-3, Top-5 and Top-10 accuracies of 81.8%, 91.3%, 92.9% and 93.4%, respectively, for forward reaction prediction, corresponding to relative improvements of 3.5%, 3.6%, 2.9% and 2.5%. For reverse reaction prediction, the corresponding accuracies are 76.2%, 87.6%, 89.9% and 90.5%, with relative gains of 1.9%, 1.8%, 1.8% and 1.2%. A systematic evaluation of how different domain-specific data augmentation strategies affect prediction performance shows that combining SMILES augmentation with the hydrogenation strategy significantly improves prediction accuracy. In addition, the proportion of invalid SMILES generated by GraSSCoL-2 is 3.0% in forward prediction and 3.9% in reverse prediction, substantially lower than the 14.2% and 14.6% observed with GraSSCoL. These results indicate that GraSSCoL-2 improves the validity of generated outputs while maintaining prediction accuracy, further supporting its reliability and applicability to astrochemical reaction prediction.

Key words: biomolecules, species evolution, reaction prediction, data augmentation, conservation laws, graph encoder, contrastive learning
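
The abstract reports two kinds of metrics: Top-k accuracy and the proportion of invalid SMILES. The sketch below illustrates how such metrics are commonly computed; it is not the authors' evaluation code. The function names, the toy data, and the bracket-balance validity check are all assumptions made for illustration. In practice a cheminformatics parser (for example RDKit's `Chem.MolFromSmiles`, which returns `None` for strings it cannot parse) would likely replace the stand-in check, and SMILES would usually be canonicalized before comparison. The abstract does not say whether invalid strings are counted over all candidates or only the top-ranked one; this sketch counts over all candidates.

```python
from typing import Callable, Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    targets: Sequence[str],
    k: int,
) -> float:
    """Fraction of examples whose target appears among the first k candidates."""
    if len(ranked_predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        raise ValueError("at least one example is required")
    hits = sum(
        target in candidates[:k]
        for candidates, target in zip(ranked_predictions, targets)
    )
    return hits / len(targets)


def invalid_fraction(
    ranked_predictions: Sequence[Sequence[str]],
    is_valid: Callable[[str], bool],
) -> float:
    """Fraction of all generated candidate strings rejected by `is_valid`."""
    candidates = [s for ranked in ranked_predictions for s in ranked]
    if not candidates:
        raise ValueError("no candidates to check")
    return sum(not is_valid(s) for s in candidates) / len(candidates)


def brackets_balanced(smiles: str) -> bool:
    """Hypothetical stand-in validity check: parentheses and brackets must nest.

    This is far weaker than real SMILES parsing and is used here only so the
    example runs without third-party dependencies.
    """
    closers = {")": "(", "]": "["}
    stack = []
    for ch in smiles:
        if ch in "([":
            stack.append(ch)
        elif ch in closers:
            if not stack or stack.pop() != closers[ch]:
                return False
    return not stack


# Toy ranked candidates, invented for illustration (not taken from Chemiverse).
predictions = [
    ["O", "[OH]", "C(("],      # target found at rank 1; one malformed string
    ["C=O", "CO", "O=C=O"],    # target found at rank 2
    ["N", "C#N", "[NH2]"],     # target absent from the candidate list
]
targets = ["O", "CO", "N#N"]

for k in (1, 2, 3):
    print(f"Top-{k} accuracy:", top_k_accuracy(predictions, targets, k))
print("Invalid fraction:", invalid_fraction(predictions, brackets_balanced))
```

Because Top-k accuracy only asks whether the target appears anywhere in the first k candidates, it cannot decrease as k grows, which matches the monotone Top-1 to Top-10 figures quoted in the abstract.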