Conformer Sampling Tutorial
This tutorial demonstrates how to use ChemRefine for conformational sampling with an initial global geometry optimization (GOAT) and ensemble generation.
Overview
Conformer sampling is the first step in exploring molecular flexibility and generating diverse geometries.
ChemRefine automates this process by running a global optimization followed by ensemble generation, producing a set of candidate structures for further refinement.
The workflow:
- Global Optimization (GOAT): Performs a stochastic search of the potential energy surface (PES) to identify low-energy conformers.
- Ensemble Generation: Collects the lowest-energy structures into an ensemble for downstream calculations (e.g., DFT, MLFF).
- Level of theory benchmarking: Refines the ensemble at progressively higher levels of theory, starting from GFN2-xTB and moving through UMA-S-1, PBE-D4, ωB97X-D4, and B2PLYP.
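The ensemble-generation step can be pictured as a simple ranking by energy. The sketch below is an illustration only, not ChemRefine's implementation: it assumes each conformer is a hypothetical (label, energy) pair with energies in Hartree, and it keeps the `num_structures` lowest-energy entries, mirroring the `integer` sampling setting in the YAML input.

```python
from typing import List, Tuple

# Hypothetical representation: (label, energy in Hartree).
Conformer = Tuple[str, float]


def select_lowest(conformers: List[Conformer], num_structures: int) -> List[Conformer]:
    """Return the num_structures lowest-energy conformers, sorted by energy."""
    if num_structures < 1:
        raise ValueError("num_structures must be a positive integer")
    ranked = sorted(conformers, key=lambda conformer: conformer[1])
    return ranked[:num_structures]


if __name__ == "__main__":
    # Made-up energies, purely for demonstration.
    pool = [
        ("conf_a", -10.502),
        ("conf_b", -10.517),
        ("conf_c", -10.498),
        ("conf_d", -10.511),
    ]
    for label, energy in select_lowest(pool, 2):
        print(f"{label}: {energy:.3f} Eh")
```

If fewer conformers are available than requested, the function simply returns all of them.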
Prerequisites
- Installed ChemRefine (see Installation Guide)
- Access to an ORCA executable
- Example molecule and YAML input from the repository
Input Files
For this tutorial, we will use Pd(PPh₃)₄.
ORCA Input Files
You can find the ORCA input files here
YAML Configuration
The YAML input for conformer sampling is also included in the tutorial folder:
➡️ Examples/Tutorials/Conformational Sampling/input.yaml
Example content:
# Sequential ORCA Input Configuration File
# Define each step with its specific parameters.
# This workflow starts with GOAT and then refines the ensemble with increasingly accurate methods.
charge: 0
multiplicity: 1
# Optional: Override default initial structure (default is /template_dir/step1.xyz)
initial_xyz: ./Examples/Tutorials/Conformational Sampling/PdPPh3_4.xyz
template_dir: ./templates
scratch_dir: /scratch/
output_dir: ./outputs
orca_executable: /orca
steps:
  - step: 1
    calculation_type: "GOAT"
    sample_type:
      method: "integer"
      parameters:
        num_structures: 15  # Number of structures to keep
  # Step 2: Using MLFF to refine the calculation
  - step: 2
    calculation_type: "MLFF"
    mlff:
      model_name: "uma-s-1"
      task_name: "omol"
      device: "cuda"
    sample_type:
      method: "integer"
      parameters:
        num_structures: 15
  - step: 3
    calculation_type: "DFT"
    sample_type:
      method: "integer"
      parameters:
        num_structures: 15
  - step: 4
    calculation_type: "DFT"
    sample_type:
      method: "integer"
      parameters:
        num_structures: 15
  - step: 5
    calculation_type: "DFT"
    sample_type:
      method: "integer"
      parameters:
        num_structures: 15
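Key names in the step definitions are easy to misspell (for example `num_strucures` instead of `num_structures`), and how such a key is handled depends on the parser. The following is a minimal sketch of a spelling check, not part of ChemRefine: the allowed-key sets are assumptions taken only from the example above, `parameters` is assumed to be nested under `sample_type`, and the function works on a plain dictionary such as one returned by a YAML loader.

```python
from typing import Any, Dict, List

# Keys seen in this tutorial's example only; the real schema may accept more.
STEP_KEYS = {"step", "calculation_type", "sample_type", "mlff"}
SAMPLE_KEYS = {"method", "parameters"}
PARAM_KEYS = {"num_structures"}


def find_unknown_keys(config: Dict[str, Any]) -> List[str]:
    """Return one message per key that does not appear in the tutorial example."""
    problems: List[str] = []
    for entry in config.get("steps", []):
        label = f"step {entry.get('step', '?')}"
        for key in sorted(set(entry) - STEP_KEYS):
            problems.append(f"{label}: unknown key '{key}'")
        sample = entry.get("sample_type") or {}
        for key in sorted(set(sample) - SAMPLE_KEYS):
            problems.append(f"{label}: unknown sample_type key '{key}'")
        params = sample.get("parameters") or {}
        for key in sorted(set(params) - PARAM_KEYS):
            problems.append(f"{label}: unknown parameter '{key}'")
    return problems


if __name__ == "__main__":
    # In practice the dictionary would come from a YAML loader such as PyYAML's yaml.safe_load.
    config = {
        "steps": [
            {"step": 1, "calculation_type": "GOAT",
             "sample_type": {"method": "integer", "parameters": {"num_structures": 15}}},
            {"step": 3, "calculation_type": "DFT",
             "sample_type": {"method": "integer", "parameters": {"num_strucures": 15}}},
        ]
    }
    for message in find_unknown_keys(config):
        print(message)
```

Other sampling methods may take different parameters, so the `PARAM_KEYS` set would need extending before using a check like this more broadly.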
How to Run
Before running ChemRefine, ensure that:
- The ChemRefine environment is activated
- The ORCA executable is installed and available in your PATH
- The template directory (./templates/) is correctly set up
- The input structure file (e.g., input.xyz) is prepared
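Most of this checklist can be verified automatically before launching a long job. The sketch below is a hypothetical helper, not something shipped with ChemRefine: it assumes the ORCA binary is named `orca`, uses the example paths from the list above, and does not check whether the ChemRefine environment is active.

```python
import shutil
from pathlib import Path
from typing import List


def preflight(template_dir: str, input_xyz: str, orca_name: str = "orca") -> List[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    problems: List[str] = []
    # shutil.which searches the directories listed in PATH.
    if shutil.which(orca_name) is None:
        problems.append(f"'{orca_name}' was not found on PATH")
    if not Path(template_dir).is_dir():
        problems.append(f"template directory '{template_dir}' does not exist")
    if not Path(input_xyz).is_file():
        problems.append(f"input structure '{input_xyz}' does not exist")
    return problems


if __name__ == "__main__":
    issues = preflight("./templates", "input.xyz")
    print("All checks passed" if not issues else "\n".join(issues))
```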
Option 1: Run from the Command Line
You can launch ChemRefine directly from the command line:
chemrefine input.yaml --maxcores <N>
Here, N is the maximum number of cores ChemRefine may use at the same time.
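When the command is launched from a script, the value of N can be chosen programmatically. The following is a small sketch under stated assumptions: `pick_maxcores` and `build_command` are hypothetical helper names, and falling back to the local core count is only a sensible default for a single workstation, not a recommendation from ChemRefine.

```python
import os
from typing import List, Optional


def pick_maxcores(requested: Optional[int] = None) -> int:
    """Use the explicit request if given, else the cores visible on this machine."""
    if requested is None:
        return os.cpu_count() or 1
    if requested < 1:
        raise ValueError("requested cores must be at least 1")
    return requested


def build_command(yaml_path: str, maxcores: int) -> List[str]:
    """Assemble the chemrefine invocation as an argument list."""
    return ["chemrefine", yaml_path, "--maxcores", str(maxcores)]


if __name__ == "__main__":
    print(" ".join(build_command("input.yaml", pick_maxcores())))
```

The resulting list can be passed to `subprocess.run(..., check=True)` to start the run from Python.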
Option 2: Run ChemRefine with a SLURM script
On HPC systems with SLURM, you can submit ChemRefine as a batch job. A ready-to-use SLURM script template is available at:
➡️ Example ChemRefine SLURM script
#!/bin/bash
#SBATCH --partition=<your_partition>
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=2G # Limit memory to allow multiple jobs on the same node
#SBATCH --time=72:00:00
#SBATCH --exclude=g-07-02
#SBATCH --job-name=conformer_search
#SBATCH --output=%x.out
#SBATCH --error=%x.err # Saves error log
# Match the OpenMP thread count to the CPUs allocated to this task
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
# Run the calculation
chemrefine input.yaml --maxcores 480