DAlphaBall.gcc Note: Logo is a ChatGPT-modified version of the Netflix animation and is for illustration only.
π Paper: Protein Hunter: exploiting structure hallucination within diffusion for protein design
-
Clone the repository
git clone https://github.com/yehlincho/Protein-Hunter.git cd Protein-Hunter -
Run the automated setup
chmod +x setup.sh ./setup.sh
β οΈ Note: AlphaFold3 setup is not included. Please install it separately by following the official instructions.
The setup script will automatically:
- β Create a conda environment with Python 3.10
- β Install all required dependencies
- β Set up a Jupyter kernel for notebooks
- β Download Boltz and Chai weights
- β Configure LigandMPNN and ProteinMPNN
- β Optionally install PyRosetta
- β AF3 must be installed separately
We have implemented two different AF3-style models in our Protein Hunter pipeline (more models will be added in the future):
- Boltz1/2
- Chai1
π See example usage in run_protein_hunter.py for reference. πβ¨
This will take you from your initial input all the way to final designed protein structures and sequences, all in one pipeline!
The design chain is "A" and other target chains are "B", "C", etc.
π‘ Tips: The original evaluation in the paper used an all-X sequence for initial design. However, to increase the diversity of generated folds, you can mix random amino acids with X residues by setting the
percent_Xparameter (e.g.,--percent_X 50for 50% X and 50% random AAs). Adjusting this ratio helps explore a broader design spaceβbut if you see too many floating or disconnected structures, try decreasingpercent_Xto encourage more structured designs.
--alphafold_dir: Path to your AlphaFold3 installation (default: ~/alphafold3)--af3_docker_name: Name of your AlphaFold3 Docker container--af3_database_settings: Path to AlphaFold3 database--af3_hmmer_path: Path to HMMER--use_alphafold3_validation: Add this flag to enable AlphaFold3-based validation.
To use AlphaFold3 validation, make sure your AlphaFold3 Docker is installed, specify the correct AlphaFold3 directory, and turn on --use_alphafold3_validation.
-
Protein-protein design with all X sequence:
To design a protein-protein complex using an all-X sequence (i.e., X for every residue, encouraging de novo exploration), run:python boltz_ph/design.py --num_designs 3 --num_cycles 7 --protein_seqs AFTVTVPKDLYVVEYGSNMTIECKFPVEKQLDLAALIVYWEMEDKNIIQFVHGEEDLKVQHSSYRQRARLLKDQLSLGNAALQITDVKLQDAGVYRCMISYGGADYKRITVKVNAPYAAALE --msa_mode "mmseqs" --gpu_id 0 --name PDL1_mix_aa_all_X --percent_X 100 --min_protein_length 90 --max_protein_length 150 --high_iptm_threshold 0.7 --use_msa_for_af3 --plotπ‘ Tip: The
--percent_X 100flag ensures all positions use the X (unknown) amino acid code. -
Protein-protein design (mixed X and random amino acids):
To design a protein-protein complex using a mix of X and random amino acids for the designable chain, run:python boltz_ph/design.py --num_designs 3 --num_cycles 7 --protein_seqs AFTVTVPKDLYVVEYGSNMTIECKFPVEKQLDLAALIVYWEMEDKNIIQFVHGEEDLKVQHSSYRQRARLLKDQLSLGNAALQITDVKLQDAGVYRCMISYGGADYKRITVKVNAPYAAALE --msa_mode "mmseqs" --gpu_id 0 --name PDL1_mix_aa --percent_X 50 --min_protein_length 90 --max_protein_length 150 --high_iptm_threshold 0.7 --use_msa_for_af3 --plot- Protein-protein design (sample fewer alanine with alanine bias):
To design a protein-protein complex using a mix of X and random amino acids while discouraging alanine sampling during design, run:
python boltz_ph/design.py --num_designs 3 --num_cycles 7 --protein_seqs AFTVTVPKDLYVVEYGSNMTIECKFPVEKQLDLAALIVYWEMEDKNIIQFVHGEEDLKVQHSSYRQRARLLKDQLSLGNAALQITDVKLQDAGVYRCMISYGGADYKRITVKVNAPYAAALE --msa_mode "mmseqs" --gpu_id 0 --name PDL1_mix_aa_alanine_bias --percent_X 50 --min_protein_length 90 --max_protein_length 150 --high_iptm_threshold 0.7 --alanine_bias --use_msa_for_af3 --plot - Protein-protein design (sample fewer alanine with alanine bias):
-
Protein-protein contact specification design:
By default, contact potentials are disabled. To enable contact-based potentials and specify interface residue positions (e.g., residue positions "2,3,10" in the target chain), add--no_potentials Falseand--contact_residues 2,3,10to your command. For example:python boltz_ph/design.py --num_designs 3 --num_cycles 7 --protein_seqs AFTVTVPKDLYVVEYGSNMTIECKFPVEKQLDLAALIVYWEMEDKNIIQFVHGEEDLKVQHSSYRQRARLLKDQLSLGNAALQITDVKLQDAGVYRCMISYGGADYKRITVKVNAPYAAALE --msa_mode "mmseqs" --gpu_id 0 --name PDL1_mix_aa --min_protein_length 90 --max_protein_length 150 --high_iptm_threshold 0.7 --use_msa_for_af3 --plot --no_potentials False --contact_residues 2,3,10 -
Multimer binder design:
To design a binder for a multimeric protein (e.g., a dimer), separate chain sequences using:python boltz_ph/design.py --num_designs 3 --num_cycles 7 --protein_seqs AGIKVFGHPASIATRRVLIALHEKNLDFELVHVELKDGEHKKEPFLSRNPFGQVPAFEDGDLKLFESRAITQYIAHRYENQGTNLLQTDSKNISQYAIMAIGMQVEDHQFDPVASKLAFEQIFKSIYGLTTDEAVVAEEEAKLAKVLDVYEARLKEFKYLAGETFTLTDLHHIPAIQYLLGTPTKKLFTERPRVNEWVAEITKRPASEKVQ:AGIKVFGHPASIATRRVLIALHEKNLDFELVHVELKDGEHKKEPFLSRNPFGQVPAFEDGDLKLFESRAITQYIAHRYENQGTNLLQTDSKNISQYAIMAIGMQVEDHQFDPVASKLAFEQIFKSIYGLTTDEAVVAEEEAKLAKVLDVYEARLKEFKYLAGETFTLTDLHHIPAIQYLLGTPTKKLFTERPRVNEWVAEITKRPASEKVQ --msa_mode "mmseqs" --gpu_id 0 --name 1GNW_mix_aa --min_protein_length 90 --max_protein_length 150 --high_iptm_threshold 0.7 --use_msa_for_af3 --plot -
Cyclic peptide binder design
python boltz_ph/design.py --num_designs 3 --num_cycles 7 --protein_seqs AFTVTVPKDLYVVEYGSNMTIECKFPVEKQLDLAALIVYWEMEDKNIIQFVHGEEDLKVQHSSYRQRARLLKDQLSLGNAALQITDVKLQDAGVYRCMISYGGADYKRITVKVNAPYAAALE --msa_mode "mmseqs" --gpu_id 0 --name PDL1_cyclic_peptide_binder --min_protein_length 10 --max_protein_length 20 --high_iptm_threshold 0.8 --percent_X 100 --use_msa_for_af3 --plot --cyclic -
Small molecule binder design:
For designing a protein binder for a small molecule (e.g., SAM), use:python boltz_ph/design.py --num_designs 5 --num_cycles 7 --ligand_ccd SAM --gpu_id 2 --name SAM_binder --min_protein_length 130 --max_protein_length 150 --high_iptm_threshold 0.7 --use_msa_for_af3 --plot -
DNA/RNA PDB design:
To design a protein binder for a nucleic acid (e.g., an RNA sequence), run:python boltz_ph/design.py --num_designs 5 --num_cycles 7 --nucleic_seq AGAGAGAGA --nucleic_type rna --gpu_id 0 --name RNA_bind --min_protein_length 130 --max_protein_length 150 --high_iptm_threshold 0.7 --use_msa_for_af3 --plot -
Designs with multiple/heterogeneous target types:
Want to target multiple types of molecules (e.g., a protein with a ligand and a template)? Run:python boltz_ph/design.py --num_designs 5 --num_cycles 7 --protein_seqs AFTVTVPKDLYVVEYGSNMTIECKFPVEKQLDLAALIVYWEMEDKNIIQFVHGEEDLKVQHSSYRQRARLLKDQLSLGNAALQITDVKLQDAGVYRCMISYGGADYKRITVKVNAPYAAALE --msa_mode "mmseqs" --ligand_ccd SAM --gpu_id 0 --name PDL1_SAM --min_protein_length 90 --max_protein_length 150 --high_iptm_threshold 0.8 --use_msa_for_af3 --plot
β οΈ Caution: The Chai version is under active development
-
Support for multiple targets
-
Full AlphaFold (AF3) validation
-
Unconditional protein design:
Generate de novo proteins of a desired length:python chai_ph/design.py --jobname unconditional_design --min_protein_length 150 --max_protein_length 150 --percent_X 0 --seq "" --target_seq ACDEFGHIKLMNPQRSTVWY --n_trials 1 --n_cycles 5 --n_recycles 3 --n_diff_steps 200 --hysteresis_mode templates --repredict --omit_aa "" --temperature 0.1 --scale_temp_by_plddt --render_freq 100 --gpu_id 0 --plot -
Protein binder design:
To design a binder for a specific protein target (e.g., PDL1):python chai_ph/design.py --jobname PDL1_binder --min_protein_length 90 --max_protein_length 150 --percent_X 80 --seq "" --target_seq AFTVTVPKDLYVVEYGSNMTIECKFPVEKQLDLAALIVYWEMEDKNIIQFVHGEEDLKVQHSSYRQRARLLKDQLSLGNAALQITDVKLQDAGVYRCMISYGGADYKRITVKVNAPYAAALE --n_trials 1 --n_cycles 5 --n_recycles 3 --n_diff_steps 200 --hysteresis_mode templates --repredict --omit_aa "" --temperature 0.1 --scale_temp_by_plddt --render_freq 100 --gpu_id 0 --use_msa_for_af3 --plot -
Cyclic peptide binder design:
Design a cyclic peptide binder for a specific protein target:python chai_ph/design.py --jobname PDL1_cyclic_binder --percent_X 80 --seq "" --min_protein_length 10 --max_protein_length 20 --cyclic --target_seq AFTVTVPKDLYVVEYGSNMTIECKFPVEKQLDLAALIVYWEMEDKNIIQFVHGEEDLKVQHSSYRQRARLLKDQLSLGNAALQITDVKLQDAGVYRCMISYGGADYKRITVKVNAPYAAALE --n_trials 1 --n_cycles 5 --n_recycles 3 --n_diff_steps 200 --hysteresis_mode templates --repredict --omit_aa "" --temperature 0.1 --scale_temp_by_plddt --render_freq 100 --gpu_id 0 --use_msa_for_af3 --plot -
Small molecule (ligand) binder design:
To design a protein binder for a small molecule or ligand (SMILES string as target):python chai_ph/design.py --jobname ligand_binder --percent_X 0 --seq "" --min_protein_length 130 --max_protein_length 150 --target_seq O=C(NCc1cocn1)c1cnn(C)c1C(=O)Nc1ccn2cc(nc2n1)c1ccccc1 --n_trials 1 --n_cycles 5 --n_recycles 3 --n_diff_steps 200 --hysteresis_mode esm --repredict --omit_aa "" --temperature 0.01 --scale_temp_by_plddt --render_freq 100 --gpu_id 2 --plot
π οΈ You can provide your initial designs as input and further improve their structures by iteratively redesigning and predicting them. Repeat as needed for optimal results!
See the code in refiner.ipynb for example usage.
For example, you can generate a design using Boltzgen, take the final output, and refine it further using the iterative pipeline.
We have implemented trajectory visualization using LogMD and py2Dmol (developed by Sergey Ovchinnikov).
Final structures are validated using AlphaFold3 for:
- Structure quality assessment
- Confidence scoring
- Cross-validation against design targets
By default, the high_iptm_yaml folder contains only the designs with ipTM above your chosen threshold and alanine percentage below 20%. If you want to browse all generated designs, please check the metrics in summary_all_runs.csv.
After running the pipeline with run_protein_hunter.py, high-confidence designs can be found in:
your_output_folder/high_iptm_yamlyour_output_folder/high_iptm_cif- Metrics summary:
your_output_folder/summary_high_iptm.csv
After running AlphaFold3, the validated (successfully predicted) structures are saved in:
your_output_folder/03_af_pdb_success
- Add cross-validation between Boltz and Chai (both directions) without using AlphaFold3 as an option
- Implement multi-timer support for Protein Hunter Chai edition
- Specify multiple contacts on multiple targets
- Explore other cool applications
Collaboration is always welcome! Email me. Let's chat.
License: MIT License - See LICENSE file for details
Citation: If you use Protein Hunter in your research, please cite:
@article{cho2025protein,
title={Protein Hunter: exploiting structure hallucination within diffusion for protein design},
author={Cho, Yehlin and Rangel, Griffin and Bhardwaj, Gaurav and Ovchinnikov, Sergey},
journal={bioRxiv},
pages={2025--10},
year={2025},
publisher={Cold Spring Harbor Laboratory}
}
Questions or Collaboration: [email protected]
Issues: Please report bugs and feature requests via GitHub Issues
EXPERIMENTAL SOFTWARE: This pipeline is under active development and has NOT been experimentally validated in laboratory settings. We release this code to enable community contributions and collaborative development. Use at your own discretion and validate results independently.
