Host–Guest Docking Tutorial

This tutorial demonstrates how to use ChemRefine for a host–guest docking workflow, followed by machine-learning refinement, DFT validation, and explicit solvation.

We will start with an initial structure (step1.xyz) and progressively refine docking poses through MLFF and DFT optimization.

Overview

Step 1 – Docking (DFT)
Generate 5 initial docking poses of the guest molecule in the host cavity using xTB-level scoring.
Step 2 – MLFF Optimization
Refine docked structures using the UMA-S-1 MLFF model (omol task).
GPU acceleration is enabled (device: cuda).
Retains structures within 10 kcal/mol of the lowest energy.
Step 3 – DFT Re-optimization
The lowest-energy MLFF structure is re-optimized at the DFT level for accuracy.
Step 4 – Solvation
Add explicit solvent molecules around the final optimized host–guest complex for solvation analysis.
Step 5 – DFT Calculations
Run DFT calculations for each solvent molecule to obtain solvation free energies.
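
The energy-window selection used in Step 2 can be illustrated with a short sketch. This is not ChemRefine's implementation: the function name `filter_energy_window` is hypothetical, and the sketch assumes energies are supplied in Hartree and converted to kcal/mol before comparison with the 10 kcal/mol window.

```python
# Conversion factor: 1 Hartree expressed in kcal/mol.
HARTREE_TO_KCAL_PER_MOL = 627.509474


def filter_energy_window(energies_hartree, window_kcal=10.0):
    """Return indices of structures within window_kcal of the lowest energy.

    Hypothetical helper for illustration only; energies are assumed
    to be total energies in Hartree.
    """
    if not energies_hartree:
        return []
    e_min = min(energies_hartree)
    kept = []
    for idx, energy in enumerate(energies_hartree):
        # Relative energy above the minimum, in kcal/mol.
        rel_kcal = (energy - e_min) * HARTREE_TO_KCAL_PER_MOL
        if rel_kcal <= window_kcal:
            kept.append(idx)
    return kept


# Made-up energies for four poses (Hartree).
energies = [-100.000, -99.990, -99.980, -100.002]
kept = filter_energy_window(energies)
print(kept)
```

With these made-up values, the third pose lies roughly 13.8 kcal/mol above the minimum and is dropped, while the other three fall inside the window.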

Input Files

We start with an initial structure located in the templates folder:

input.yaml
step1.xyz

ORCA Input Files

You can find the ORCA input files here

1. Input File

Below is a complete example of an input file (input.yaml) for a docking study:

template_dir: ./templates
scratch_dir: /scratch/ganymede2/dal950773/orca_files/
output_dir: ./fixed_charge
orca_executable: /mfs/io/groups/sterling/software-tools/orca/orca_6_1_0_avx2/orca

# Global system settings
charge: 0
multiplicity: 1

# Optional: Override default initial structure
initial_xyz: ./templates/step1.xyz

# === Step-by-step workflow ===
steps:
  # Step 1: Perform docking with DFT
  - step: 1
    operation: "DOCKER"
    engine: "DFT"
    sample_type:
      method: "integer"
      parameters:
        num_structures: 5   # Generate 5 docked structures

  # Step 2: Refine docking poses with MLFF
  - step: 2
    operation: "OPT+SP"
    engine: "MLFF"
    charge: -1
    multiplicity: 1
    mlff:
      model_name: "uma-s-1"
      task_name: "omol"
      device: "cuda"
    sample_type:
      method: "energy_window"
      parameters:
        energy: 10
        unit: kcal/mol

  # Step 3: Validate best candidates with DFT
  - step: 3
    operation: "OPT+SP"
    engine: "DFT"
    charge: -1
    multiplicity: 1
    sample_type:
      method: "integer"
      parameters:
        num_structures: 1

  # Step 4: Solvation refinement
  - step: 4
    operation: "SOLVATOR"
    engine: "DFT"
    sample_type:
      method: "integer"
      parameters:
        num_structures: 0

  # Step 5: DFT calculations for solvation free energies
  - step: 5
    engine: "DFT"
    operation: "OPT+SP"
    charge: -1
    multiplicity: 1
    sample_type:
      method: "energy_window"
      parameters:
        energy: 10
        unit: kcal/mol
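
Because a misplaced key or a skipped step number is easy to miss in a long YAML file, a small pre-flight check can help before submitting jobs. The sketch below works on a plain dictionary that mirrors the structure of input.yaml (for example, as produced by a YAML loader). The function `check_steps` and the specific rules it applies are assumptions made for illustration; they are not ChemRefine's own validation logic.

```python
def check_steps(config):
    """Return a list of warnings for a workflow config dictionary.

    Hypothetical helper: the rules below are illustrative assumptions.
    """
    warnings = []
    steps = config.get("steps") or []

    # Step numbers are expected to run 1, 2, 3, ... without gaps.
    numbers = [s.get("step") for s in steps]
    if numbers != list(range(1, len(steps) + 1)):
        warnings.append(f"step numbers are not consecutive: {numbers}")

    for s in steps:
        label = f"step {s.get('step')}"
        for key in ("operation", "engine"):
            if key not in s:
                warnings.append(f"{label}: missing '{key}'")

        # An energy window needs both a value and a unit.
        sample = s.get("sample_type") or {}
        if sample.get("method") == "energy_window":
            params = sample.get("parameters") or {}
            for key in ("energy", "unit"):
                if key not in params:
                    warnings.append(f"{label}: energy_window missing '{key}'")
    return warnings


# Example config with a deliberately omitted 'unit' in step 2.
config = {
    "charge": 0,
    "multiplicity": 1,
    "steps": [
        {"step": 1, "operation": "DOCKER", "engine": "DFT",
         "sample_type": {"method": "integer",
                         "parameters": {"num_structures": 5}}},
        {"step": 2, "operation": "OPT+SP", "engine": "MLFF",
         "sample_type": {"method": "energy_window",
                         "parameters": {"energy": 10}}},
    ],
}
problems = check_steps(config)
for p in problems:
    print(p)
```
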

2. Running the Workflow

From the command line:

chemrefine input.yaml --maxcores 16

This runs the workflow locally with up to 16 parallel jobs.

On an HPC cluster with SLURM:

sbatch ./Examples/templates/chemrefine.slurm

3. Expected Outputs

Docked poses from Step 1 in outputs/step1/
Refined MLFF structures with energies in outputs/step2/
Validated DFT structures in outputs/step3/
Final solvated complex in outputs/step4/
Solvation free energies in outputs/step5/

Each step directory contains .out logs, .xyz geometries, and summary files.
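
To inspect a geometry outside the workflow, the .xyz files can be read with a few lines of Python. The sketch below parses the standard XYZ layout (atom count, comment line, then one `symbol x y z` line per atom) from a string; the helper name `parse_xyz` is hypothetical, and only the first structure is read if a file holds several.

```python
def parse_xyz(text):
    """Parse the first structure in XYZ-format text.

    Returns (comment, atoms), where atoms is a list of
    (symbol, x, y, z) tuples with coordinates as floats.
    """
    lines = text.strip().splitlines()
    n_atoms = int(lines[0].split()[0])
    comment = lines[1].strip() if len(lines) > 1 else ""

    atoms = []
    for line in lines[2:2 + n_atoms]:
        symbol, x, y, z = line.split()[:4]
        atoms.append((symbol, float(x), float(y), float(z)))

    if len(atoms) != n_atoms:
        raise ValueError(f"expected {n_atoms} atoms, found {len(atoms)}")
    return comment, atoms


# Small illustrative geometry (a water molecule, coordinates in angstrom).
example = """3
water
O 0.000  0.000  0.117
H 0.000  0.757 -0.471
H 0.000 -0.757 -0.471
"""
comment, atoms = parse_xyz(example)
print(comment, len(atoms))
```

To read a file from disk, pass its contents to the same function, for example `parse_xyz(open(path).read())`, where `path` points to one of the .xyz files in a step directory.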

4. Notes & Tips

Adjust num_structures in Step 1 to explore more docking poses.
Use MLFF refinement for speed, then confirm results with DFT.
To skip solvation, remove Step 4 (and Step 5, which builds on its output).
Large jobs should always be submitted via SLURM.