Conformer Sampling Tutorial

This tutorial demonstrates how to use ChemRefine for conformational sampling with an initial global geometry optimization (GOAT) and ensemble generation.

Overview

Conformer sampling is the first step in exploring molecular flexibility and generating diverse geometries.
ChemRefine automates this process by running a global optimization followed by ensemble generation, producing a set of candidate structures for further refinement.

The workflow:

Global Optimization (GOAT):
Performs a stochastic search of the potential energy surface (PES) to identify low-energy conformers.
Ensemble Generation:
Collects the lowest-energy structures into an ensemble for downstream calculations (e.g., DFT, MLFF).
Level-of-Theory Benchmarking:
Refines the ensemble at progressively higher levels of theory: GFN2-xTB, UMA-S-1, PBE-D4, ωB97X-D4, and B2PLYP.

Prerequisites

Installed ChemRefine (see Installation Guide)
Access to an ORCA executable
Example molecule and YAML input from the repository

Input Files

For this tutorial, we will use Pd(PPh₃)₄.

📄 View Input YAML
📄 View step1.xyz
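
Before starting, it can help to confirm that the structure file is well formed. In the standard XYZ format, the first line gives the atom count, the second line is a comment, and each following line holds one atom. The sketch below compares the declared count with the number of coordinate lines. The helper name xyz_count_matches is hypothetical (it is not part of ChemRefine), the path ./templates/step1.xyz is assumed from this tutorial's configuration, and only single-structure files are handled. For Pd(PPh₃)₄ (1 Pd, 4 P, 72 C, 60 H), both numbers should be 137.

```shell
#!/bin/bash
# Sketch: sanity-check a single-structure XYZ file.
# xyz_count_matches is a hypothetical helper, not part of ChemRefine.

xyz_count_matches() {
    # Line 1 of an XYZ file declares the number of atoms.
    declared=$(head -n 1 "$1" | tr -d '[:space:]')
    # Atom records start on line 3; count the non-blank ones.
    actual=$(tail -n +3 "$1" | grep -c '[^[:space:]]' || true)
    echo "declared=$declared actual=$actual"
    [ "$declared" = "$actual" ]
}

# Assumed location of the tutorial structure; adjust to your checkout.
xyz="./templates/step1.xyz"
if [ -f "$xyz" ]; then
    xyz_count_matches "$xyz" || echo "Atom count mismatch in $xyz"
fi
```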

ORCA Input Files

You can find the ORCA input files here

YAML Configuration

The YAML input for conformer sampling is also included in the tutorial folder:

➡️ Examples/Tutorials/Conformational Sampling/input.yaml

Example content:

# Sequential ORCA Input Configuration File
# Define each step with its specific parameters.
# This workflow runs GOAT and then refines the results with increasingly accurate methods.
charge: 0
multiplicity: 1

# Optional: Override the default initial structure (default is <template_dir>/step1.xyz),
# e.g. ./Examples/Tutorials/Conformational Sampling/PdPPh3_4.xyz
initial_xyz: ./templates/step1.xyz

template_dir: ./templates
scratch_dir: /scratch/
output_dir: ./outputs
orca_executable: /orca

steps:
  - step: 1
    calculation_type: "GOAT"
    sample_type:
      method: "integer"  
      parameters:
        num_structures: 15  # Number of structures to keep

  # Step 2: Refine the GOAT ensemble with an MLFF
  - step: 2
    calculation_type: "MLFF"
    mlff:
      model_name: "uma-s-1"
      task_name: "omol"
      device: "cuda"
    sample_type:
      method: "integer"
      parameters:
        num_structures: 15

  - step: 3
    calculation_type: "DFT"
    sample_type:
      method: "integer"
      parameters:
        num_structures: 15

  - step: 4
    calculation_type: "DFT"
    sample_type:
      method: "integer"
      parameters:
        num_structures: 15

  - step: 5
    calculation_type: "DFT"
    sample_type:
      method: "integer"
      parameters:
        num_structures: 15
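
Nested keys such as num_structures are easy to mistype, and a misspelled key may go unreported, depending on how the configuration is parsed (ChemRefine's handling of unknown keys is not described here, so treat this as an assumption). The sketch below uses grep to list any key that starts with num_ but is not exactly num_structures. The helper name find_misspelled_keys is hypothetical.

```shell
#!/bin/bash
# Sketch: flag keys that look like num_structures but are spelled differently.
# find_misspelled_keys is a hypothetical helper, not part of ChemRefine.

find_misspelled_keys() {
    # First grep: every line whose key starts with "num_" (with line numbers).
    # Second grep: drop the lines where the key is exactly "num_structures".
    grep -nE '^[[:space:]]*num_[A-Za-z_]*:' "$1" \
        | grep -vE '^[0-9]+:[[:space:]]*num_structures:' || true
}

# Prints nothing when every such key is spelled correctly.
if [ -f input.yaml ]; then
    find_misspelled_keys input.yaml
fi
```

Any line it prints includes the line number in the YAML file, so the key can be corrected directly.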

How to Run

Before running ChemRefine, ensure that:

The ChemRefine environment is activated
The ORCA executable is installed and available in your PATH
The template directory (./templates/) is correctly set up
The input structure file (e.g., ./templates/step1.xyz) is in place
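
The checklist above can be scripted as a quick preflight. This is a minimal sketch under stated assumptions: the helper names check_tool and check_path are hypothetical, the command names chemrefine and orca are taken from this tutorial, and the paths match the example configuration and may differ on your system. Nothing is modified; each check only prints a status line.

```shell
#!/bin/bash
# Preflight sketch: report on the checklist items without changing anything.
# check_tool and check_path are hypothetical helpers, not part of ChemRefine.

check_tool() {
    # A tool that resolves on PATH also suggests the right environment is active.
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok: $1 found on PATH"
    else
        echo "missing: $1 not found on PATH"
        return 1
    fi
}

check_path() {
    # $1 is a test flag (-d for a directory, -f for a file); $2 is the path.
    if [ "$1" "$2" ]; then
        echo "ok: $2"
    else
        echo "missing: $2"
        return 1
    fi
}

failures=0
check_tool chemrefine               || failures=$((failures + 1))
check_tool orca                     || failures=$((failures + 1))
check_path -d ./templates           || failures=$((failures + 1))
check_path -f ./templates/step1.xyz || failures=$((failures + 1))
check_path -f input.yaml            || failures=$((failures + 1))
echo "$failures check(s) failed"
```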

Option 1: Run from the Command Line

You can launch ChemRefine directly from the command line:

chemrefine input.yaml --maxcores <N>

Here, N is the maximum number of cores ChemRefine may use at the same time.

Option 2: Run ChemRefine with a SLURM Script

On HPC systems with SLURM, you can submit ChemRefine as a batch job. A ready-to-use SLURM script template is available at:

➡️ Example ChemRefine SLURM script

#!/bin/bash
#SBATCH --partition=<your_partition>
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=2G              # Limit memory to allow multiple jobs on the same node
#SBATCH --time=72:00:00
# Site-specific: excludes node g-07-02; adjust or remove for your cluster
#SBATCH --exclude=g-07-02
#SBATCH --job-name=conformer_search
#SBATCH --output=%x.out   # Saves output log (%x expands to the job name)
#SBATCH --error=%x.err    # Saves error log

# Match the OpenMP thread count to the CPUs allocated to this task
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# Run the calculation
chemrefine input.yaml --maxcores 480
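
Once the script is saved, it can be submitted and monitored with the standard SLURM commands sbatch and squeue. The sketch below assumes the script above was saved under the hypothetical filename chemrefine.slurm. Because %x in the --output and --error patterns expands to the job name, the logs are expected in conformer_search.out and conformer_search.err.

```shell
#!/bin/bash
# Sketch: submit the batch script and locate its logs.
job_name="conformer_search"   # matches --job-name in the script above
script="chemrefine.slurm"     # hypothetical filename for the script above

# %x in --output/--error expands to the job name, giving these log files.
out_log="${job_name}.out"
err_log="${job_name}.err"

if command -v sbatch >/dev/null 2>&1 && [ -f "$script" ]; then
    sbatch "$script"                        # submit the job
    squeue -u "$USER" --name="$job_name"    # show its queue status
else
    echo "sbatch or $script not available; nothing submitted"
fi

echo "stdout log: $out_log"
echo "stderr log: $err_log"
```

While the job runs, tail -f conformer_search.out follows the output log as it is written.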