CaMol, A Novel Causality Inference Framework for Few-shot Molecular Property Prediction

We present CaMol, a novel architecture for predicting molecular properties in few-shot scenarios, developed by NS Lab, CUK, on a pure PyTorch backend.

Figure: The overall architecture of CaMol.

We aim to build a context-aware graph causality inference framework for few-shot molecular property prediction. Molecular property prediction is becoming one of the major applications of graph learning in Web-based services, e.g., online protein structure prediction and drug discovery. A key challenge arises in few-shot scenarios, where only a few labeled molecules are available for predicting unseen properties. Recently, several studies have used in-context learning to capture relationships among molecules and properties, but they fall short in two respects: (1) exploiting prior knowledge of functional groups that are causally linked to properties and (2) identifying key substructures directly correlated with properties. We propose CaMol, a context-aware graph causality inference framework, to address these limitations from a causal inference perspective, assuming that each molecule contains a latent causal structure that determines a specific property. First, we introduce a context graph that encodes chemical knowledge by linking functional groups, molecules, and properties to guide the discovery of causal substructures. Second, we propose a learnable atom soft-masking strategy to disentangle causal substructures from confounding ones. Third, we introduce a distribution intervener that applies backdoor adjustment by combining causal substructures with chemically grounded confounders, disentangling causal effects from real-world chemical variations. Experiments on diverse molecular datasets showed that CaMol achieved superior accuracy and sample efficiency in few-shot tasks, demonstrating its generalizability to unseen properties. In addition, the discovered causal substructures aligned strongly with chemical knowledge about functional groups, supporting the model's interpretability.
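For readers unfamiliar with backdoor adjustment, the standard textbook form is given below. The symbols are our own notation, not taken from the paper: C is the causal substructure, S is a confounding substructure drawn from a set \mathcal{S}, and Y is the property label. The formula assumes S satisfies the backdoor criterion for the effect of C on Y; how CaMol estimates each term is not described here.

```latex
P\bigl(Y \mid do(C)\bigr) \;=\; \sum_{s \in \mathcal{S}} P\bigl(Y \mid C,\, S = s\bigr)\, P\bigl(S = s\bigr)
```

Intuitively, the prediction for a causal substructure is averaged over many possible confounders rather than conditioned on the single confounder that happens to co-occur with it in one molecule.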

A short description of CaMol:

The idea is based on a causal inference perspective, assuming that each molecule contains a latent causal structure that determines a specific property.
First, we introduce a context graph that encodes chemical knowledge by linking functional groups, molecules, and properties to guide the discovery of causal substructures.
Second, we propose a learnable atom soft-masking strategy to disentangle causal substructures from confounding ones.
Third, we introduce a distribution intervener that applies backdoor adjustment by combining causal substructures with chemically grounded confounders, disentangling causal effects from real-world chemical variations.
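The soft-masking and intervention steps can be illustrated with a minimal, dependency-free Python sketch. This is not the CaMol implementation. Every function name, the mean pooling, the toy logistic predictor, and the uniform average over a small confounder bank are assumptions made for illustration only. In the actual model the atom scores would be learned (for example by a GNN) and the confounders would be chemically grounded.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def soft_mask(atom_scores):
    # Hypothetical: map one real-valued score per atom to a weight in (0, 1).
    # In a trained model these scores would be produced by a learnable network.
    return [sigmoid(s) for s in atom_scores]


def split_substructures(atom_feats, mask):
    # Each atom contributes with weight m to the causal embedding and with
    # weight (1 - m) to the confounding embedding (mean pooling over atoms).
    n, dim = len(atom_feats), len(atom_feats[0])
    causal = [sum(m * f[d] for m, f in zip(mask, atom_feats)) / n
              for d in range(dim)]
    confounding = [sum((1.0 - m) * f[d] for m, f in zip(mask, atom_feats)) / n
                   for d in range(dim)]
    return causal, confounding


def predict(causal, confounder, w_causal, w_conf):
    # Toy logistic predictor over a (causal, confounder) pair.
    return sigmoid(dot(w_causal, causal) + dot(w_conf, confounder))


def backdoor_adjusted(causal, confounder_bank, w_causal, w_conf):
    # Average the prediction over a bank of confounders, treating each one
    # as equally likely (an assumed uniform P(s)).
    preds = [predict(causal, s, w_causal, w_conf) for s in confounder_bank]
    return sum(preds) / len(preds)


# Three atoms with 2-dimensional features (made-up numbers).
atom_feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
mask = soft_mask([4.0, -4.0, 0.0])
causal, confounding = split_substructures(atom_feats, mask)

# Pair the causal embedding with its own and two other confounders.
bank = [confounding, [0.2, 0.1], [0.0, 0.3]]
p = backdoor_adjusted(causal, bank, w_causal=[2.0, -1.0], w_conf=[0.5, 0.5])
print(round(p, 3))
```

Because the mask is soft, the causal and confounding embeddings always sum to the plain mean-pooled molecule embedding, so no atom information is discarded. The mask only decides how each atom is divided between the two parts.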

CaMol is available at:

Cite “CaMol” as:

Please cite our paper if you find CaMol useful in your work:

@misc{hoang2026contextawaregraphcausalityinference,
      title={Context-aware Graph Causality Inference for Few-Shot Molecular Property Prediction}, 
      author={Van Thuy Hoang and O-Joun Lee},
      year={2026},
      eprint={2601.11135},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2601.11135}, 
}

Please also take a look at our related models: UGT, a unified graph transformer that preserves local and global graph structure; CGT, a community-aware graph transformer that mitigates the degree bias problem of the message passing mechanism; and S-CGIB, which builds a pre-trained Graph Neural Network (GNN) model on molecules without human annotations or prior knowledge.

Contributors:
