This is the official code repository for the paper READRetro: Natural Product Biosynthesis Planning with Retrieval-Augmented Dual-View Retrosynthesis (bioRxiv, 2023).
We also provide a web version for ease of use.
Download the necessary data folder READRetro_data from Zenodo to ensure proper execution of the code and demonstrations in this repository.
The directory structure of READRetro_data is as follows:
├── data
│   ├── model_train_data
│   └── multistep_data
├── model
│   ├── bionavi
│   ├── g2s
│   │   └── saved_models
│   ├── megan
│   └── retroformer
│       └── saved_models
├── result
└── scripts
Place READRetro_data into the READRetro directory (i.e., READRetro/READRetro_data) and run sh in READRetro_data to set up the data.
Ensure the data is correctly located in READRetro. Verify that the corresponding directories in READRetro match the following:
READRetro_data/model/retroformer/saved_models
READRetro_data/model/g2s/saved_models
READRetro_data/data/multistep_data
READRetro_data/result
READRetro_data/scripts
The directories READRetro_data/model/bionavi, READRetro_data/model/megan, and READRetro_data/data/model_train_data are required for reproducing the values in the manuscript.
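To double-check the layout before running anything, a small POSIX shell function can test that the directories from the tree above exist under READRetro_data. This is a sketch: check_readretro_data is a hypothetical helper, not part of READRetro. It only inspects READRetro_data itself (not the matching directories inside READRetro), and it reports the three manuscript-reproduction directories as warnings rather than errors, on the assumption that they are not needed for everyday planning.

```sh
#!/bin/sh
# Sketch: check that the READRetro_data layout shown above is in place.
# check_readretro_data is a hypothetical helper, not part of READRetro.
check_readretro_data() {
  root=${1:-READRetro_data}
  missing=0
  # Directories that should match their counterparts in READRetro.
  for d in model/retroformer/saved_models model/g2s/saved_models \
           data/multistep_data result scripts; do
    if [ ! -d "$root/$d" ]; then
      echo "missing: $root/$d" >&2
      missing=1
    fi
  done
  # Directories needed to reproduce the manuscript's values; only warn here.
  for d in model/bionavi model/megan data/model_train_data; do
    [ -d "$root/$d" ] || echo "missing (manuscript reproduction only): $root/$d" >&2
  done
  return "$missing"
}

# Run from the READRetro directory once READRetro_data has been placed there.
if [ -d READRetro_data ]; then
  check_readretro_data READRetro_data && echo "READRetro_data layout looks complete"
fi
```

The function returns a non-zero status when a required directory is missing, so it can gate later commands with `&&`.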
Run the following commands to install the dependencies:
conda create -n readretro python=3.8
conda activate readretro
conda install pytorch==1.12.0 cudatoolkit=11.3 -c pytorch
pip install easydict pandas tqdm numpy==1.22 OpenNMT-py==2.3.0 networkx==2.5
conda install -c conda-forge rdkit=2019.09
Alternatively, you can install the readretro package through pip:
conda create -n readretro python=3.8 -y
conda activate readretro
pip install readretro==1.2.0
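After installing, it can help to confirm that the command-line entry points you plan to use resolve on PATH inside the activated environment. The sketch below assumes the pip package exposes run_readretro (the command used in the planning example of this README); require_cmds is a hypothetical helper, not part of the readretro package.

```sh
#!/bin/sh
# Sketch: report any of the given commands that are not found on PATH.
# require_cmds is a hypothetical helper, not part of the readretro package.
require_cmds() {
  status=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "not found on PATH: $cmd" >&2
      status=1
    fi
  done
  return "$status"
}

# Inside the activated readretro environment, this should succeed after
# `pip install readretro==1.2.0` (assumption: run_readretro is installed as a script).
if require_cmds run_readretro; then
  echo "run_readretro is available"
fi
```

If the check fails, the usual cause is running it outside the environment created with `conda activate readretro`.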
We provide the trained models through Zenodo. You can also use your own models trained with the official Retroformer and Graph2SMILES code.
More detailed instructions can be found in demo.ipynb.
Run the following commands to evaluate the single-step performance of the models:
CUDA_VISIBLE_DEVICES=${gpu_id} python # ensemble
CUDA_VISIBLE_DEVICES=${gpu_id} python -m retroformer # Retroformer
CUDA_VISIBLE_DEVICES=${gpu_id} python -m g2s -s 200 # Graph2SMILES
Run the following command to plan paths of multiple products using multiprocessing:
CUDA_VISIBLE_DEVICES=${gpu_id} python
# e.g., CUDA_VISIBLE_DEVICES=0 python
You can modify other hyperparameters described in
Lower num_threads if you run out of GPU memory.
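The commands in this README pass ${gpu_id} through CUDA_VISIBLE_DEVICES. On a machine with several shared GPUs, one option is to pick the GPU that currently has the most free memory, using nvidia-smi's query mode, before launching a run. This is a sketch: pick_freest_gpu is a hypothetical helper, not part of READRetro, and it assumes the `csv,noheader,nounits` output format, where each line reads "index, free_MiB".

```sh
#!/bin/sh
# Sketch: choose a value for ${gpu_id} based on free GPU memory.
# pick_freest_gpu is a hypothetical helper, not part of READRetro.
# Expects lines like "0, 10240" (GPU index, free memory in MiB) on stdin.
pick_freest_gpu() {
  # Sort numerically on the free-memory field, largest first,
  # then print the index field of the top line.
  sort -t, -k2,2 -nr | head -n 1 | cut -d, -f1
}

if command -v nvidia-smi >/dev/null 2>&1; then
  gpu_id=$(nvidia-smi --query-gpu=index,memory.free --format=csv,noheader,nounits | pick_freest_gpu)
  echo "selected gpu_id=$gpu_id"
fi
```

This only chooses where to run; if the chosen GPU still runs out of memory, lowering num_threads remains the remedy.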
Run the following command to plan the retrosynthesis path of your own molecule:
CUDA_VISIBLE_DEVICES=${gpu_id} python ${product}
# e.g., CUDA_VISIBLE_DEVICES=0 python 'O=C1C=C2C=CC(O)CC2O1'
If you installed the readretro package through pip, use the run_readretro command instead:
run_readretro -rc ${retroformer_ckpt} -gc ${g2s_ckpt} ${product}
# e.g., run_readretro -rc retroformer/saved_models/ -gc g2s/saved_models/ 'O=C1C=C2C=CC(O)CC2O1'
# you can replace the checkpoints with your own trained checkpoints of retroformer and g2s
# you should set the corresponding vocab file as an option if you replace the checkpoints
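With the pip-installed package, one way to plan routes for several molecules is a plain shell loop that calls run_readretro once per SMILES string. This is a sketch under stated assumptions: plan_from_file, the products file, and the READRETRO_CMD override are hypothetical names, the checkpoint paths mirror the run_readretro example, and the loop runs sequentially, unlike the multiprocessing planner described earlier.

```sh
#!/bin/sh
# Sketch: plan routes for every SMILES in a file (one molecule per line).
# plan_from_file and READRETRO_CMD are hypothetical, not part of readretro.
READRETRO_CMD=${READRETRO_CMD:-run_readretro}

plan_from_file() {
  smiles_file=$1
  # Read line by line, also handling a final line without a trailing newline.
  while IFS= read -r smiles || [ -n "$smiles" ]; do
    # Skip blank lines and comment lines starting with '#'.
    case $smiles in
      ''|'#'*) continue ;;
    esac
    # Checkpoint paths follow the run_readretro example; swap in your own
    # (and set the matching vocab file) if you trained new models.
    # stdin is redirected so the command cannot consume the SMILES file.
    "$READRETRO_CMD" -rc retroformer/saved_models/ -gc g2s/saved_models/ "$smiles" < /dev/null
  done < "$smiles_file"
}
```

For example, after `export CUDA_VISIBLE_DEVICES=0`, running `plan_from_file products.txt` plans each product in turn.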
You can modify other hyperparameters described in
Run the following command to evaluate the planned paths of the test molecules:
python ${save_file}
# e.g., python result/debug.txt
You can reproduce the figures and tables presented in the paper, or train your own models, using the provided demo.ipynb.