Modeling an FGFR Kinase Inhibitor with Boltz-1 on DiPhyx: A Sequence-to-Structure-to-Function Workflow

Why This Case Study Matters#

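Aberrant signalling by the FGFR kinases (FGFR1-3), through amplification, activating mutations, or gene fusions, drives a range of cancers. Infigratinib (BGJ-398), a reversible ATP-competitive inhibitor, received accelerated FDA approval in 2021 for previously treated FGFR2 fusion-positive cholangiocarcinoma and has been evaluated in trials for other FGFR-driven cancers. Understanding the structural basis of its activity supports: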

Precision medicine — anticipating resistance mutations and optimizing analogs.
Biomarker discovery — linking structure to downstream gene expression.
In-silico screening — evaluating analogs computationally before synthesis.

Boltz-1, a generative diffusion model, predicts protein-ligand complex structures. This notebook demonstrates how to run Boltz-1 within the DiPhyx platform, integrate molecular modeling and transcriptomics, and interpret biological outcomes.

Pipeline Overview#

Stage	Key Tool	Output
A. Target & ligand prep	UniProt, RDKit	FGFR1-3 kinase sequences; 3D mol of Infigratinib
B. Structure prediction	Boltz-1	PDBs of FGFR–drug complexes + per-model confidence
C. Expression signature	Scanpy + GSEApy (bulk/SC datasets)	Differential gene lists & pathway NES
D. Interpretation	PyMOL, volcano/heat-maps	Structure-function narrative & design hypotheses

Compute recommendations: run this notebook on a GPU-enabled compute unit on DiPhyx. Suitable instances include:

g4dn.4xlarge (16 cores, 64 GB RAM, Tesla T4 GPU)

g4dn.2xlarge (8 cores, 32 GB RAM, Tesla T4 GPU)

g6.2xlarge (8 cores, 32 GB RAM, NVIDIA L4 GPU)
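
Before starting, it can help to confirm that the compute unit actually exposes a GPU. The sketch below is one way to do that, assuming the NVIDIA driver's nvidia-smi utility is on the PATH (typical of GPU images, but not guaranteed); gpu_summary is a hypothetical helper name, not part of DiPhyx or Boltz-1.

```python
import shutil
import subprocess

def gpu_summary():
    """Return one 'name, memory' string per visible NVIDIA GPU (empty list if none)."""
    if shutil.which("nvidia-smi") is None:
        return []  # no NVIDIA driver tools on PATH
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
            capture_output=True, text=True, timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]

gpus = gpu_summary()
print(gpus if gpus else "No NVIDIA GPU detected")
```

An empty list means the prediction step would fall back to CPU or fail, so it is worth switching compute units before going further.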

Practical Walk-through#

Prepare Inputs#

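Fetch the Infigratinib SMILES and generate a 3-D conformer. This cell needs requests and RDKit; if the import fails with ModuleNotFoundError, run pip install rdkit first. The FGFR sequences are fetched in the next step.

"""Fetch the Infigratinib SMILES from PubChem and build a 3-D conformer with
RDKit, all in pure Python so you can run it inside a notebook."""

from pathlib import Path

import requests
from rdkit import Chem
from rdkit.Chem import AllChem

boltz_input_path = Path("boltz_inputs")
boltz_input_path.mkdir(exist_ok=True)

# ▸  Fetch isomeric SMILES for Infigratinib (PubChem CID 50909836) ------
url = "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/50909836/property/IsomericSMILES/JSON"
resp = requests.get(url, timeout=30)
resp.raise_for_status()  # fail early on HTTP errors
props = resp.json()['PropertyTable']['Properties'][0]
# The key is normally "IsomericSMILES"; fall back to "SMILES" in case the
# service labels the field differently.
smiles = props.get('IsomericSMILES') or props['SMILES']
print("SMILES:", smiles[:60], "…")

# ▸  Build 3-D ligand ----------------------------------------------------
mol = Chem.MolFromSmiles(smiles)
if mol is None:
    raise ValueError("RDKit could not parse the SMILES string")
mol = Chem.AddHs(mol)
if AllChem.EmbedMolecule(mol, randomSeed=42) < 0:  # returns -1 on failure
    raise RuntimeError("3-D embedding failed")
AllChem.UFFOptimizeMolecule(mol)
mol_file = boltz_input_path / "Infigratinib.mol"
Chem.MolToMolFile(mol, str(mol_file))  # RDKit expects a str path, not a Path
print("Wrote 3D MOL →", mol_file)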

Generate YAML programmatically#

The snippet below downloads the full FGFR1-3 sequences from UniProt, slices each kinase domain (FGFR1 478–767, FGFR2 481–770, FGFR3 472–761, following the UniProt "Protein kinase" domain annotations), and writes one Boltz‑1 YAML per receptor in boltz_inputs/. It re‑uses the smiles variable created in the previous cell:

import requests, yaml
from pathlib import Path

boltz_input_path = Path("boltz_inputs")
boltz_input_path.mkdir(exist_ok=True)

# ▸  Download full FGFR1-3 sequences from UniProt -----------------------
# name -> (accession, domain start, domain end). Boundaries are 1-based and
# inclusive, taken from the UniProt "Protein kinase" domain annotation;
# check them against the current UniProt entries before relying on them.
kinase_domains = {
    "FGFR1": ("P11362", 478, 767),
    "FGFR2": ("P21802", 481, 770),
    "FGFR3": ("P22607", 472, 761),
}

seq_dict = {}
for name, (uid, start, end) in kinase_domains.items():
    url = f"https://rest.uniprot.org/uniprotkb/{uid}.fasta"
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    full_seq = "".join(l.strip() for l in resp.text.splitlines() if not l.startswith(">"))
    kd_seq = full_seq[start - 1:end]  # 1-based inclusive -> 0-based half-open
    seq_dict[name] = kd_seq
    print(f"{name}: kinase domain length = {len(kd_seq)} aa")

inputs_dict = {}

for name, kd_seq in seq_dict.items():
    yaml_dict = {
        "version": 1,
        "sequences": [
            # Pass the sequence as one unbroken string: line-wrapping it would
            # embed newline characters in the YAML value.
            {"protein": {"id": "A", "sequence": kd_seq}},
            {"ligand":  {"id": "B", "smiles": smiles}},
        ]
    }
    outfile = str(boltz_input_path / f"{name.lower()}_infig.yaml")
    with open(outfile, "w") as fh:
        yaml.safe_dump(yaml_dict, fh, sort_keys=False)
    inputs_dict[name] = outfile
    print("Wrote", outfile)
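
UniProt reports domain boundaries as 1-based, inclusive residue numbers, whereas Python slices are 0-based and half-open, so it is easy to be off by one. The helpers below are a small sketch of that conversion with bounds checking, plus the reverse mapping from a full-length residue number to its position in the sliced sequence. slice_domain, model_resnum, and the toy sequence are illustrative names, not part of Boltz-1 or UniProt tooling.

```python
def slice_domain(full_seq: str, start: int, end: int) -> str:
    """Return residues start..end (1-based, inclusive) of full_seq."""
    if not (1 <= start <= end <= len(full_seq)):
        raise ValueError(
            f"range {start}-{end} is outside a sequence of length {len(full_seq)}"
        )
    return full_seq[start - 1:end]

def model_resnum(full_length_resnum: int, start: int) -> int:
    """Map a full-length residue number to its position in the sliced sequence."""
    return full_length_resnum - start + 1

toy = "MKTAYIAKQR"               # toy 10-residue sequence
domain = slice_domain(toy, 3, 7)  # residues 3..7
print(domain, len(domain))        # TAYIA 5
print(model_resnum(5, 3))         # 3: residue 5 is the third residue of the slice
```

The bounds check is useful when one residue range is applied to several proteins of different lengths, where a plain slice would silently return a truncated or empty string.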

Run Boltz‑1#

The first step is to install Boltz-1. The commands below clone the Boltz GitHub repository and install it in editable mode, which gives you the latest development version (the commented-out pip install boltz line installs the released package instead). Build dependencies such as CMake and the C, C++, and Fortran compilers need to be installed first; see the Boltz-1 GitHub repository for the full installation instructions.

!conda install -y -c conda-forge gfortran_linux-64 compilers git cmake  openblas openblas-devel > /dev/null 2>&1
!pip install rdkit
!pip install pyyaml
# !pip install boltz
!git clone https://github.com/jwohlwend/boltz.git
!cd boltz; pip install -e .
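
After installation, a quick check that the boltz command-line entry point and the Python packages used in this notebook are visible can save a failed run later. This is a minimal sketch: missing_requirements is a hypothetical helper, and the package list simply mirrors the imports used in this walk-through.

```python
import importlib.util
import shutil

def missing_requirements(executables, modules):
    """Return the executables not on PATH and the top-level modules that cannot be found."""
    missing_exe = [exe for exe in executables if shutil.which(exe) is None]
    missing_mod = [m for m in modules if importlib.util.find_spec(m) is None]
    return missing_exe, missing_mod

missing_exe, missing_mod = missing_requirements(
    executables=["boltz"],
    modules=["rdkit", "yaml", "requests"],
)
print("Missing executables:", missing_exe or "none")
print("Missing modules:", missing_mod or "none")
```

If either list is non-empty, restart the kernel after installing so that the new packages are picked up.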

import subprocess
import os

boltz_output_path = "boltz_output"
# Ensure boltz_output_path exists
os.makedirs(boltz_output_path, exist_ok=True)

def run_and_stream(cmd):
    """Run a command, echo its combined stdout/stderr line by line, and return the exit code."""
    process = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
    for line in process.stdout:
        print(line, end='')
    process.wait()
    if process.returncode != 0:
        print(f"Process exited with code {process.returncode}")
    return process.returncode

# Run one prediction per receptor (FGFR1, FGFR2, FGFR3)
for name, yaml_input_file in inputs_dict.items():
    cmd = [
        "boltz", "predict",
        yaml_input_file,
        "--out_dir", boltz_output_path,
        "--recycling_steps", "10",
        "--diffusion_samples", "8",    # eight models per complex
        "--output_format", "pdb",      # write PDB files (the default is mmCIF)
        "--cache", "/volume/boltz_cache",
        "--use_msa_server"
    ]

    print(f"Running command: {' '.join(cmd)}")
    run_and_stream(cmd)
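
Each predicted model comes with a confidence JSON, and comparing these across the diffusion samples is a quick way to choose which poses to inspect. The sketch below collects them into a ranked list. It assumes Boltz-1 writes files named confidence_<input>_model_<n>.json somewhere under the output directory and that each contains a ligand_iptm field; check both against the output of your installed version. rank_models is a hypothetical helper.

```python
import json
from pathlib import Path

def rank_models(out_dir, key="ligand_iptm"):
    """Return (file name, score) pairs for all confidence JSONs, best first."""
    root = Path(out_dir)
    if not root.is_dir():
        return []
    scored = []
    for path in root.rglob("confidence_*.json"):
        with open(path) as fh:
            data = json.load(fh)
        if key in data:  # skip files that lack the requested metric
            scored.append((path.name, float(data[key])))
    return sorted(scored, key=lambda item: item[1], reverse=True)

for fname, score in rank_models("boltz_output"):
    print(f"{fname}\tligand_iptm = {score:.3f}")
```

Passing a different key, such as confidence_score, ranks by another metric, provided that field is present in the JSON.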

Binding Pose Visualization#

To inspect the binding pose of Infigratinib in the FGFR1 kinase domain, load the Boltz-1 model fgfr1_infig_model_0.pdb in PyMOL, a molecular visualization tool for viewing and manipulating 3D structures of proteins and ligands. If Boltz-1 was run with its default mmCIF output, the file ends in .cif instead. You can launch PyMOL on a compute unit of your choice: go to the flow, find PyMOL, and click the “Tryout” button, then select the desired compute unit. This launches a new PyMOL instance in your browser.

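Once the structure is loaded in PyMOL, check the following:

Dichloro-dimethoxyphenyl ring buried in the hydrophobic back pocket next to the gatekeeper Val561. Infigratinib is a reversible inhibitor with no acrylamide warhead, so no covalent bond to Cys488 is expected.
Hinge hydrogen bonds to Ala564 backbone.
Confidence JSON → ligand_iptm > 0.6 suggests a confident pose (a rule of thumb, not a validated cutoff).

Residue numbers above follow full-length FGFR1. The predicted model is typically numbered from the first residue of the input sequence, so apply the corresponding offset when selecting residues.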
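
The hinge contact can also be checked numerically rather than by eye. The sketch below parses PDB coordinates with plain Python and reports the shortest heavy-atom distance between the ligand chain and one protein residue. It assumes chain IDs A (protein) and B (ligand) as set in the YAML, fixed-column PDB records with the element symbol filled in, and a residue number given in the model's own numbering. min_ligand_distance is a hypothetical helper, and a hydrogen-bond cutoff of roughly 3.5 Å is a common rule of thumb, not a Boltz-1 output.

```python
import math

def read_atoms(pdb_text):
    """Yield (chain, resnum, element, (x, y, z)) for ATOM/HETATM records."""
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            chain = line[21]
            resnum = int(line[22:26])
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            element = line[76:78].strip().upper()
            yield chain, resnum, element, xyz

def min_ligand_distance(pdb_text, resnum, protein_chain="A", ligand_chain="B"):
    """Shortest heavy-atom distance (in Å) between the ligand and one residue."""
    atoms = [a for a in read_atoms(pdb_text) if a[2] != "H"]  # drop hydrogens
    residue = [a[3] for a in atoms if a[0] == protein_chain and a[1] == resnum]
    ligand = [a[3] for a in atoms if a[0] == ligand_chain]
    if not residue or not ligand:
        raise ValueError("residue or ligand atoms not found")
    return min(math.dist(p, q) for p in residue for q in ligand)
```

To use it, read the model file into a string and call min_ligand_distance with the hinge residue's number in the model; a value near or below the cutoff is consistent with a hinge hydrogen bond, although only the geometry of specific donor and acceptor atoms confirms one.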

Link Structure to Transcriptional Response#

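Obtain a public RNA-seq dataset in which FGFR-dependent cells are treated with BGJ-398 (e.g. GEO GSE65324), convert it to an AnnData file with a condition column in .obs, then analyse it with Scanpy and GSEApy:

import scanpy as sc
import gseapy as gp

# Expects log-normalised expression and an obs column 'condition' with the
# values 'treated' and 'control'.
adata = sc.read_h5ad("BGJ398_treated_vs_control.h5ad")
sc.tl.rank_genes_groups(adata, 'condition', groups=['treated'], reference='control')
deg = sc.get.rank_genes_groups_df(adata, 'treated')

# Rank genes by log fold change for pre-ranked GSEA (drop genes without a value)
rank = deg[['names','logfoldchanges']].dropna().sort_values('logfoldchanges', ascending=False)
enrich = gp.prerank(rnk=rank, gene_sets='MSigDB_Hallmark_2020', outdir=None, seed=0)
enrich.res2d.head(10)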

Expected results

Observation	Structural rationale
Down-reg of E2F targets, MYC targets	Loss of FGFR/ERK proliferative signalling
Up-reg of p53 pathway, apoptosis	FGFR blockade induces cell-cycle arrest
Feedback ↓ in FGFR1/2 mRNA	Kinase pocket occupancy disrupts receptor recycling

Combining a volcano plot of the DEGs with a PyMOL snapshot of the binding pose gives a coherent narrative from pocket blockade to pathway shutdown.
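
A volcano plot needs two coordinates per gene, the log fold change and −log10 of the adjusted p-value, plus a significance label. The sketch below computes these from plain records. It assumes the column names names, logfoldchanges, and pvals_adj returned by sc.get.rank_genes_groups_df, and the thresholds (|log2FC| ≥ 1, adjusted p ≤ 0.05) are conventional defaults rather than values from this study. volcano_points and the toy gene names are hypothetical.

```python
import math

def volcano_points(records, lfc_cut=1.0, padj_cut=0.05, floor=1e-300):
    """Return (gene, log2FC, -log10 padj, label) tuples for a volcano plot."""
    points = []
    for rec in records:
        lfc = float(rec["logfoldchanges"])
        padj = float(rec["pvals_adj"])
        if math.isnan(lfc) or math.isnan(padj):
            continue  # skip genes without a usable estimate
        neg_log_p = -math.log10(max(padj, floor))  # floor avoids log10(0)
        if padj <= padj_cut and lfc >= lfc_cut:
            label = "up"
        elif padj <= padj_cut and lfc <= -lfc_cut:
            label = "down"
        else:
            label = "ns"
        points.append((rec["names"], lfc, neg_log_p, label))
    return points

# Toy records; with real data, pass deg.to_dict("records") instead
demo = [
    {"names": "GENE_A", "logfoldchanges": -2.0, "pvals_adj": 0.001},
    {"names": "GENE_B", "logfoldchanges": 0.3, "pvals_adj": 0.5},
]
for point in volcano_points(demo):
    print(point)
```

The second and third tuple elements are the x and y coordinates for any plotting library, and colouring by the label separates down-regulated from up-regulated genes at a glance.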