Chinese Space Science and Technology, 2025, Vol. 45, Issue (5): 22-32. doi: 10.16708/j.cnki.1000-758X.2025.0073

Knowledge-constrained deep learning model for predicting astrochemical reactions

ZHANG Yanan, YANG Peilun, WANG Jiawei*, BU Haili, DUAN Manni, QUAN Donghui

1. Research Center for Astronomical Computing, Zhejiang Lab, Hangzhou 311100, China
Received: 2025-03-21; Revision received: 2025-05-22; Accepted: 2025-05-30; Online: 2025-09-17; Published: 2025-10-01

Abstract: In astrochemical research, analyzing how species evolve in astrophysical regions requires reconstructing their reaction pathways under dynamic physical conditions, a process that depends heavily on an accurate and comprehensive astrochemical reaction network. Traditional methods for constructing such networks rely primarily on expert knowledge and experimental validation to identify chemical reactions between species, which incurs high time and computational costs. To address this, a deep learning-based method named GraSSCoL-2 is proposed for efficient prediction of astrochemical reactions, thereby accelerating the analysis of species evolution. GraSSCoL-2 combines a graph encoder, a sequence decoder, and contrastive learning. Trained on existing reaction data, it effectively predicts both forward and reverse reaction pathways among astrochemical species. On Chemiverse, a state-of-the-art astrochemical reaction dataset, GraSSCoL-2 achieves Top-1, Top-3, Top-5, and Top-10 accuracies of 81.8%, 91.3%, 92.9%, and 93.4%, respectively, for forward reaction prediction, corresponding to relative improvements of 3.5%, 3.6%, 2.9%, and 2.5%. For reverse reaction prediction, the corresponding accuracies are 76.2%, 87.6%, 89.9%, and 90.5%, with relative gains of 1.9%, 1.8%, 1.8%, and 1.2%. Experimental results further indicate that combining the SMILES augmentation and hydrogenation strategies significantly improves prediction accuracy. In addition, the proportions of invalid SMILES generated in the forward and reverse prediction tasks are 3.0% and 3.9%, respectively, down substantially from the 14.2% and 14.6% produced by GraSSCoL. These results show that GraSSCoL-2 maintains high prediction accuracy while markedly improving the validity of its outputs, supporting its reliability and applicability to astrochemical reaction prediction.
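The abstract reports Top-k accuracies, relative gains, and invalid-SMILES proportions without defining them. The sketch below shows the conventional reading used in reaction prediction, under stated assumptions: a prediction counts as correct at k when the reference product string appears among the k highest-ranked candidates, all strings are assumed to be canonicalized beforehand, and "relative improvement" is assumed to mean (new - old) / old. The function names are hypothetical, and `brackets_balanced` is a deliberately weak toy stand-in for a real SMILES parser (for example, RDKit's `Chem.MolFromSmiles`, which returns `None` for unparsable input).

```python
from typing import Callable, Sequence


def top_k_accuracy(ranked_preds: Sequence[Sequence[str]],
                   targets: Sequence[str], k: int) -> float:
    """Share of examples whose reference answer is among the first k candidates.

    Assumes all strings are already canonicalized, so exact string equality
    stands in for chemical identity.
    """
    if len(ranked_preds) != len(targets) or not targets:
        raise ValueError("need one candidate list per target, and at least one target")
    hits = sum(1 for preds, tgt in zip(ranked_preds, targets) if tgt in preds[:k])
    return hits / len(targets)


def invalid_fraction(outputs: Sequence[str], is_valid: Callable[[str], bool]) -> float:
    """Share of generated strings rejected by the supplied validity check."""
    return sum(1 for s in outputs if not is_valid(s)) / len(outputs)


def relative_gain(new: float, old: float) -> float:
    """Relative improvement (new - old) / old; 0.035 reads as +3.5% (assumed definition)."""
    return (new - old) / old


def brackets_balanced(smiles: str) -> bool:
    """Toy validity check: only verifies that () and [] are properly nested."""
    closing_to_opening = {")": "(", "]": "["}
    stack = []
    for ch in smiles:
        if ch in "([":
            stack.append(ch)
        elif ch in closing_to_opening:
            if not stack or stack.pop() != closing_to_opening[ch]:
                return False
    return not stack


if __name__ == "__main__":
    # Hypothetical ranked outputs for three reactions, best candidate first.
    preds = [["CO", "C=O", "CC"], ["[OH-]", "C#O", "O"], ["CCO", "CC=O", "C(O"]]
    gold = ["CO", "C#O", "CC#N"]
    for k in (1, 3):
        print(f"Top-{k} accuracy: {top_k_accuracy(preds, gold, k):.3f}")
    flat = [s for cands in preds for s in cands]
    print(f"invalid fraction: {invalid_fraction(flat, brackets_balanced):.3f}")
```

On this toy data the script prints a Top-1 accuracy of 0.333, a Top-3 accuracy of 0.667, and an invalid fraction of 0.111. Top-k accuracy is non-decreasing in k, which is why the reported Top-10 figures are the highest in each task.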
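The abstract names contrastive learning as a component of GraSSCoL-2 but does not state its objective or what is being contrasted. As a hedged illustration only, the following sketch computes an InfoNCE-style loss, one common contrastive objective, over a batch of paired embeddings, for instance two views of the same reaction such as differently augmented SMILES. The pairing scheme, the cosine similarity, the temperature of 0.1, and the names `cosine` and `info_nce` are assumptions for illustration, not details taken from the paper.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / norm


def info_nce(anchors: List[Vector], positives: List[Vector],
             temperature: float = 0.1) -> float:
    """Mean InfoNCE loss: anchors[i] should score highest against positives[i],
    with every other positive in the batch acting as a negative."""
    if len(anchors) != len(positives) or not anchors:
        raise ValueError("need equally sized, non-empty batches")
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], positives[j]) / temperature for j in range(n)]
        peak = max(logits)  # log-sum-exp shift for numerical stability
        log_denominator = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_denominator - logits[i]  # -log softmax probability of the true pair
    return total / n


if __name__ == "__main__":
    anchors = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[0.9, 0.1], [0.1, 0.9]]   # each positive close to its own anchor
    shuffled = [[0.1, 0.9], [0.9, 0.1]]  # positives swapped between anchors
    print(f"aligned:  {info_nce(anchors, aligned):.4f}")
    print(f"shuffled: {info_nce(anchors, shuffled):.4f}")
```

Aligned pairs give a loss near zero while swapped pairs give a large one; minimizing this loss is what pulls matching views together and pushes mismatched ones apart in embedding space.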

Key words: biomolecules, species evolution, reaction prediction, data augmentation, conservation laws, graph encoder, contrastive learning
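The title's "knowledge-constrained" framing and the "conservation laws" keyword point to physical constraints on predicted reactions, though the abstract does not say how GraSSCoL-2 applies them. A minimal sketch of one such constraint, assuming conservation of elements and net charge is the rule being enforced, checks a candidate reaction written in the simple formula notation common in astrochemical rate networks (for example `HCO+`, `H3+`, `e-`) rather than in SMILES. The parser, its limits (no isotopes, grain-surface prefixes, or brackets), and the names `parse_species`, `side_totals`, and `is_balanced` are hypothetical.

```python
import re
from collections import Counter
from typing import Sequence, Tuple

# One element symbol (capital + optional lowercase) followed by an optional count.
ELEMENT = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_species(name: str) -> Tuple[Counter, int]:
    """Return (element counts, net charge) for a simple gas-phase formula.

    Handles names such as 'CH3OH', 'HCO+', 'H3+' and the electron 'e-'.
    Grain-surface prefixes, isotopes, and brackets are NOT supported.
    """
    if name == "e-":
        return Counter(), -1
    body = name.rstrip("+-")
    suffix = name[len(body):]
    charge = suffix.count("+") - suffix.count("-")
    counts: Counter = Counter()
    pos = 0
    while pos < len(body):
        m = ELEMENT.match(body, pos)
        if m is None:
            raise ValueError(f"cannot parse species {name!r}")
        symbol, digits = m.groups()
        counts[symbol] += int(digits) if digits else 1
        pos = m.end()
    return counts, charge


def side_totals(species: Sequence[str]) -> Tuple[Counter, int]:
    """Sum element counts and net charge over one side of a reaction."""
    atoms: Counter = Counter()
    charge = 0
    for name in species:
        counts, q = parse_species(name)
        atoms.update(counts)
        charge += q
    return atoms, charge


def is_balanced(reactants: Sequence[str], products: Sequence[str]) -> bool:
    """True when elements and net charge are conserved across the reaction."""
    return side_totals(reactants) == side_totals(products)


if __name__ == "__main__":
    print(is_balanced(["HCO+", "e-"], ["CO", "H"]))    # True: dissociative recombination
    print(is_balanced(["H3+", "CO"], ["HCO+", "H2"]))  # True: proton transfer
    print(is_balanced(["HCO+", "e-"], ["CO", "H2"]))   # False: hydrogen not conserved
```

A filter like this could discard physically impossible candidates after decoding; whether GraSSCoL-2 applies such constraints during training, decoding, or post-processing is not stated in the abstract.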