Update Jun 2025
*I have been working for a company that did not permit me to contribute to open-source projects. I have since left that role and confirmed with my new employer that I am able to contribute to open-source projects. As such, I hope to make further updates to GALLOP when time allows and to continue improving this project. Collaboration and other contributions would be very welcome!
Expect some changes to the installation instructions over the next few days.*
GPU Accelerated Local Optimisation and Particle Swarm: fast crystal structure determination from powder diffraction data.
This code is very much a work in progress and is likely to have many bugs. If you have comments, suggestions, bugs to report or ideas for improvement or new features, please feel free to submit an issue, contact me directly (markspillman at gmail dot com) or submit a pull request.
Updates to the code will be announced on Twitter. Blog posts and tutorials can be found here.
An article describing the approach taken by this code is available from CrystEngComm. If you do not have access to RSC journals, you can also download a copy of the accepted manuscript here. If you find GALLOP useful in your work, please consider citing this article:
*CrystEngComm*, 2021, **23**, 6481-6485
GALLOP is a recently developed hybrid algorithm for crystal structure determination from powder diffraction data (SDPD), which combines fast local optimisation with a particle swarm optimiser. An introduction to the algorithm can be found in this blog post. We have also published an article showing how GALLOP can be used to efficiently solve the conformations of rings. This is also covered in a blog post.
This repository provides an implementation of GALLOP that is capable of running on graphics processing units (GPUs). The use of hardware accelerators provides significant performance advantages over CPU-based methods, allowing complex crystal structures to be solved rapidly. GALLOP has also been tested on TPUs, though at this point there is no significant advantage, and it entails some additional complexity. If you require TPU acceleration, please contact me for more information.
Other software is required for PXRD data indexing and intensity extraction via Pawley refinement. GALLOP currently accepts input files produced by DASH, GSAS-II and TOPAS (experimental). Z-matrices of the molecular fragments of interest are also required, which can be automatically generated in the correct format using DASH.
Once all data is prepared, GALLOP can be used via a convenient browser-based GUI or as part of a Python program.
A video showing GALLOP solving CT-DMF2 (42 degrees of freedom) in real time via the browser-based interface is available in this short YouTube video.
You will need a Google/Kaggle account to run the notebooks linked below. These notebooks will allow you to try GALLOP for free, via your browser. You will not need to download or install anything to your computer, and you do not require local GPU hardware in order to run these notebooks.
| Notebook | Description |
|---|---|
| Colab - Browser interface * | Try the GALLOP browser interface |
| Kaggle - Browser interface * | Try the GALLOP browser interface (in general, Kaggle offers faster GPUs than Colab for free use) |
| Colab - Python mode | Try GALLOP as part of a Python program |
| Kaggle - Python mode | Try GALLOP as part of a Python program (in general, Kaggle offers faster GPUs than Colab for free use) |
* To use the browser interface notebook, you will also need an ngrok authentication key, which can be obtained for free here.
An overview of using the browser interface can be found in this blog post.
An overview of how to use the Python interface can be found in this blog post.
GALLOP is able to read Pawley fitting output files produced by DASH, GSAS-II and TOPAS. It can also read SHELX HKLF4 format single crystal data.
Space groups must be in their standard setting.
- DASH: follow the Pawley fitting procedure as normal, and ensure that the resultant `.sdi`, `.dsl`, `.hcv`, `.pik` and `.tic` files are available.
- GSAS-II: Pawley fit the data as normal. Once satisfied with the fit, unflag all parameters apart from the intensities (i.e. peak shape, unit cell, background etc. should all be fixed). Reset the intensity values (via Pawley create), then ensure that for this final refinement, the optimisation algorithm is set to `analytic Jacobian`. This is essential, as the default `analytic Hessian` optimiser modifies the covariance matrix in ways that produce errors in GALLOP. After saving, GALLOP will read in the `.gpx` file.
- TOPAS (experimental): Pawley fit the data as normal. Once satisfied with the fit, unflag all refined parameters in the `.inp` file (i.e. peak shape, unit cell, background etc. should all be fixed), and delete the intensities (if present). Add the keyword `do_errors` before the `hkl_Is` term, and add the keyword `C_matrix` to the end of the `.inp` file. GALLOP will read in the resultant `.out` file.
- SHELX (experimental): as well as a `.hkl` file conforming to the HKLF4 format, GALLOP also expects a `.ins` or `.cif` file to supply the unit cell and space group information. If using a `.ins` file, the space group should be included as the last item following the TITL keyword. For example: `TITL Famotidine in P121/c1`
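Putting the list above together, a minimal loading sketch is shown below. The DASH call matches the Python example later in this README, but the `source` strings for the other formats ("GSAS", "TOPAS", "SHELX") and the file names are assumptions for illustration only - check the `gallop.structure` documentation for the exact values expected.

```python
from gallop.structure import Structure

struct = Structure(name="Example")
# DASH Pawley output (.sdi plus associated files) - as used later in this README
struct.add_data("./data/example.sdi", source="DASH")
# The source strings below are assumptions, not confirmed API values:
# struct.add_data("./data/example.gpx", source="GSAS")   # GSAS-II project file
# struct.add_data("./data/example.out", source="TOPAS")  # TOPAS output file
# struct.add_data("./data/example.hkl", source="SHELX")  # SHELX HKLF4 (+ .ins/.cif)
```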
GALLOP is able to read Z-matrices that have been produced by the `MakeZmatrix.exe` program that is bundled with DASH.
An example Z-matrix is shown below in the expected format. Lines 1-3 are ignored by GALLOP (but should still be present in the file); from line 4 onwards, each line describes one atom. The columns have the following format:
1. Element symbol
2. Bond length
3. Bond length refinement flag (1 = refine*)
4. Angle
5. Angle refinement flag (1 = refine*)
6. Torsion angle
7. Torsion angle refinement flag (1 = refine)
8. Atom index for bond
9. Atom index for angle
10. Atom index for torsion
11. Debye-Waller factor for atom
12. Occupancy of atom
13. Atom index in file used to generate Z-matrix
14. to 17. Atom labels in file used to generate Z-matrix. Column 14 = current atom, columns 15-17 are linked to the indices in columns 8-10. The index of an atom is equivalent to the zero-indexed order in which it appears in the Z-matrix, i.e. for a Z-matrix with N atoms, the first atom in the ZM = index 0 and the last atom in the ZM = index N-1.
* refinement of bond lengths and angles in GALLOP is not yet supported
Z-matrix generated by ToolKit
1.0 1.0 1.0 90.0 90.0 90.0
35 0
S 0.0000000 0 0.000000 0 0.000000 0 0 0 0 3.0 1.0 2 S2
C 1.8177016 0 0.000000 0 0.000000 0 1 0 0 3.0 1.0 15 C3 S2
C 1.8283979 0 99.579820 0 0.000000 0 1 2 0 3.0 1.0 16 C4 S2 C3
C 1.5232339 0 109.911257 0 -170.684290 1 2 1 3 3.0 1.0 14 C2 C3 S2 C4
H 0.9813685 0 109.693939 0 121.095906 0 2 1 4 6.0 1.0 27 H7 C3 S2 C2
H 0.9572320 0 110.549365 0 -121.387389 0 2 1 4 6.0 1.0 28 H8 C3 S2 C2
C 1.4927147 0 114.065887 0 62.653100 1 3 1 2 3.0 1.0 17 C5 C4 S2 C3
H 0.9759911 0 100.970584 0 122.204015 0 3 1 7 6.0 1.0 29 H9 C4 S2 C5
H 0.9855786 0 108.004016 0 -124.221958 0 3 1 7 6.0 1.0 30 H10 C4 S2 C5
C 1.5152129 0 112.786930 0 67.443801 1 4 2 1 3.0 1.0 13 C1 C2 C3 S2
H 0.9472479 0 110.310184 0 -122.665750 0 4 2 10 6.0 1.0 25 H5 C2 C3 C1
H 0.9803862 0 108.741344 0 117.500021 0 4 2 10 6.0 1.0 26 H6 C2 C3 C1
N 1.3817147 0 118.733461 0 56.911575 1 7 3 1 3.0 1.0 9 N4 C5 C4 S2
C 1.3502925 0 125.467914 0 178.519478 0 7 3 13 3.0 1.0 18 C6 C5 C4 N4
N 1.3219948 0 114.720410 0 51.385586 1 10 4 2 3.0 1.0 7 N2 C1 C2 C3
N 1.3239387 0 116.701385 0 179.823818 0 10 4 15 3.0 1.0 8 N3 C1 C2 N2
C 1.3159161 0 111.226477 0 178.885305 0 13 7 3 3.0 1.0 19 C7 N4 C5 C4
S 1.7276900 0 110.235809 0 -177.494170 0 14 7 3 3.0 1.0 3 S3 C6 C5 C4
H 0.8888039 0 127.959634 0 -179.218947 0 14 7 18 6.0 1.0 31 H11 C6 C5 S3
S 1.6099792 0 124.015478 0 177.152771 0 15 10 4 3.0 1.0 1 S1 N2 C1 C2
H 0.8407330 0 121.656021 0 -179.639174 0 16 10 4 6.0 1.0 23 H3 N3 C1 C2
H 0.8916929 0 118.440482 0 177.399416 0 16 10 21 6.0 1.0 24 H4 N3 C1 H3
N 1.3675353 0 128.791480 0 177.813567 0 17 13 7 3.0 1.0 10 N5 C7 N4 C5
O 1.4401296 0 104.153339 0 176.990530 1 20 15 10 3.0 1.0 4 O1 S1 N2 C1
O 1.4495729 0 111.428796 0 126.708421 0 20 15 24 3.0 1.0 5 O2 S1 N2 O1
N 1.6102226 0 112.985425 0 -115.367010 0 20 15 24 3.0 1.0 6 N1 S1 N2 O1
C 1.3330201 0 119.009846 0 -2.944580 1 23 17 13 3.0 1.0 20 C8 N5 C7 N4
H 0.7875800 0 111.997720 0 59.603967 0 26 20 15 6.0 1.0 21 H1 N1 S1 N2
H 0.8628448 0 115.250209 0 -141.557264 0 26 20 28 6.0 1.0 22 H2 N1 S1 H1
N 1.3393007 0 125.228760 0 -1.576341 0 27 23 17 3.0 1.0 11 N6 C8 N5 C7
N 1.3393308 0 118.202264 0 -179.624495 0 27 23 30 3.0 1.0 12 N7 C8 N5 N6
H 0.8683326 0 116.238032 0 7.990260 0 30 27 23 6.0 1.0 32 H12 N6 C8 N5
H 0.8584227 0 119.641148 0 160.980386 0 30 27 32 6.0 1.0 33 H13 N6 C8 H12
H 0.8440167 0 123.549148 0 177.005037 0 31 27 23 6.0 1.0 34 H14 N7 C8 N5
H 0.8335209 0 114.791543 0 -157.863738 0 31 27 34 6.0 1.0 35 H15 N7 C8 H14
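To make the column layout concrete, a hypothetical reader for the atom lines (line 4 onwards) might look like the sketch below. This is illustrative only - in practice, GALLOP's own `gallop.z_matrix` module should be used.

```python
def parse_zmatrix_atom_line(line):
    """Split one atom line of a DASH-style Z-matrix into named fields.
    Illustrative sketch only - use gallop.z_matrix in practice."""
    fields = line.split()
    return {
        "element":        fields[0],
        "bond_length":    float(fields[1]),
        "refine_bond":    fields[2] == "1",   # not yet supported by GALLOP
        "angle":          float(fields[3]),
        "refine_angle":   fields[4] == "1",   # not yet supported by GALLOP
        "torsion":        float(fields[5]),
        "refine_torsion": fields[6] == "1",
        "bond_atom":      int(fields[7]),     # atom indices defining the bond,
        "angle_atom":     int(fields[8]),     # angle and torsion
        "torsion_atom":   int(fields[9]),
        "debye_waller":   float(fields[10]),
        "occupancy":      float(fields[11]),
        "source_index":   int(fields[12]),    # index in the generating file
        "labels":         fields[13:],        # current atom + connected atoms
    }
```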
A commonly encountered error in GALLOP occurs when a Z-matrix contains torsion angles to be refined that are defined in terms of one or more hydrogen atoms. The same issue arises when using Mogul or MDB for torsion angle restraints in DASH. To fix it, use the following steps (a scripted sketch of step 2 follows the list):
- Produce a CIF of the structure from which the Z-matrix is being generated
- Reorder the atoms in the CIF such that all hydrogen atoms are listed after all non-hydrogen atoms.
- Regenerate the Z-matrices with DASH / MakeZmatrix.exe
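For step 2, the reordering can be scripted. The sketch below uses PyMatGen (already a GALLOP dependency) with placeholder file names; note that a PyMatGen round trip may alter atom labels, so check the output before regenerating the Z-matrices.

```python
from pymatgen.core import Structure as PmgStructure
from pymatgen.io.cif import CifWriter

# Read the CIF, then sort sites so all H atoms come after the non-H atoms.
# sort() relies on False < True, so hydrogens move to the end of the list.
s = PmgStructure.from_file("input.cif")      # placeholder file name
s.sort(key=lambda site: site.specie.symbol == "H")
CifWriter(s).write_file("reordered.cif")     # placeholder file name
```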
Other programs can in principle be used to produce Z-matrices suitable for GALLOP. For more information, see the `gallop.z_matrix` module documentation.
Use this Colab Notebook to try the GALLOP browser interface for free. You will need a Google account to run the notebook, and an ngrok authentication key, which can be obtained for free at https://ngrok.com/. Save a copy of the notebook to your own Google Drive for easy access in the future.
Once GALLOP has been installed successfully, Windows users with Python added to their PATH environment variable can simply double click the supplied `gallop.bat` file to automatically run the following command:
`streamlit run .\gallop_streamlit.py`
Alternatively, those using Anaconda should open the Anaconda prompt, then run the batch file by typing:
`gallop`
A browser window will open, displaying the GALLOP interface.
Use this Colab Notebook to try GALLOP in Python mode for free. You will need a Google account to run the notebook. You can save a copy of the notebook to your own Google Drive for easy access in the future.
In general, the script or notebook should:
- Import the libraries needed
- Create a GALLOP Structure object
- Add data and Z-matrices to the Structure
- Create a GALLOP Swarm object, and generate the initial positions of the particles
- Define the settings needed for the local optimisation
- Optionally find an appropriate `learning_rate` for the local optimiser
- Have a loop that performs some number of local optimisation steps, followed by a swarm update step
A simple example is given below, which will output a CIF of the best structure found after each iteration:
import time
from gallop.structure import Structure
from gallop.optim import local
from gallop.optim import Swarm
# Create a Structure object, then add data and Z-matrices
mystructure = Structure(name="Famotidine", ignore_H_atoms=True)
mystructure.add_data("./gallop/example_data/Famotidine.sdi", source="DASH")
mystructure.add_zmatrix("./gallop/example_data/FOGVIG03_1.zmatrix")
# Create swarm object and get the initial particle positions.
# This will initialise 10k particles, split into
# 10 independent swarms.
swarm = Swarm(mystructure, n_particles=10000, n_swarms=10)
external, internal = swarm.get_initial_positions()
# Get the minimiser settings and optionally modify them
minimiser_settings = local.get_minimiser_settings(mystructure)
minimiser_settings["n_iterations"] = 500
minimiser_settings["save_CIF"] = True
# Automatically set the learning rate (aka step size) for the local optimiser
lr = local.find_learning_rate(mystructure, external=external,
internal=internal, minimiser_settings=minimiser_settings)
minimiser_settings["learning_rate"] = lr[-1]
# Set the total number of iterations for the GALLOP run
gallop_iters = 10
# The main GALLOP loop
start_time = time.time()
for i in range(gallop_iters):
    # Local optimisation of particle positions
    result = local.minimise(mystructure, external=external, internal=internal,
                            run=i, start_time=start_time, **minimiser_settings)
    # Particle swarm update generates new positions to be optimised
    external, internal = swarm.update_position(result=result)
    # Print out the best chi2 value found by each subswarm
    print(swarm.best_subswarm_chi2)
For more sophisticated control of the optimisation, add the following imports:
import torch
from gallop import tensor_prep
from gallop import zm_to_cart
from gallop import intensities
from gallop import chi2
from gallop import files
Then replace the main GALLOP loop with something like:
# The main GALLOP loop
start_time = time.time()
for i in range(gallop_iters):
    tensors = tensor_prep.get_all_required_tensors(mystructure,
                    external=external, internal=internal)
    optimizer = torch.optim.Adam([tensors["zm"]["external"],
                                  tensors["zm"]["internal"]],
                                 lr=minimiser_settings["learning_rate"],
                                 betas=[0.9, 0.9])
    local_iters = range(minimiser_settings["n_iterations"])
    for j in local_iters:
        # Zero the gradients before each iteration, otherwise they accumulate
        optimizer.zero_grad()
        asymmetric_frac_coords = zm_to_cart.get_asymmetric_coords(**tensors["zm"])
        calculated_intensities = intensities.calculate_intensities(
            asymmetric_frac_coords, **tensors["int_tensors"])
        chisquared = chi2.calc_chisqd(calculated_intensities, **tensors["chisqd_tensors"])
        # PyTorch needs a scalar from which to calculate the gradients; here we
        # use the sum of all values. As all of the chi-squared values are
        # independent, the gradients will be correctly propagated to the
        # relevant DoFs.
        L = torch.sum(chisquared)
        # On the last iteration, don't step the optimiser, otherwise the chi2
        # values won't correspond to the DoFs
        if j != minimiser_settings["n_iterations"] - 1:
            # Backward pass through the computational graph gives the gradients
            L.backward()
            optimizer.step()
        # Print out some info during the run
        if j == 0 or (j + 1) % 10 == 0:
            print(i, j, chisquared.min().item())
    # Save the results in a dictionary, which is expected by the files and swarm
    # functions. The tensors should be converted to CPU-based numpy arrays.
    result = {
        "external": tensors["zm"]["external"].detach().cpu().numpy(),
        "internal": tensors["zm"]["internal"].detach().cpu().numpy(),
        "chi_2": chisquared.detach().cpu().numpy(),
        "GALLOP Iter": i
    }
    # Output a CIF of the best result
    files.save_CIF_of_best_result(mystructure, result, start_time)
    # Swarm update step
    external, internal = swarm.update_position(result=result, verbose=False)
The local optimiser can be replaced with any PyTorch-compatible optimiser. GALLOP includes torch-optimizer as a dependency, which provides a wide variety of recently developed optimisers for use in machine learning applications. To try a different algorithm, simply replace the definition of `optimizer`. For example, to use the DiffGrad optimiser, add an import for torch-optimizer:
import torch_optimizer
And then define the optimiser:
optimizer = torch_optimizer.DiffGrad([tensors["zm"]["external"],
                                      tensors["zm"]["internal"]],
                                     lr=minimiser_settings["learning_rate"],
                                     betas=[0.9, 0.9])
Swarm behaviour can also be modified by creating a new swarm class that inherits from the GALLOP Swarm, then overriding the relevant methods to obtain the desired behaviour. The method most likely to be of interest is `PSO_velocity_update(...)`. See the Swarm code and comments for more information.
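As an illustration of the inheritance pattern, a hedged sketch is shown below. The real argument list of `PSO_velocity_update` is not reproduced here, so `*args`/`**kwargs` are used as placeholders; copy the actual signature from the Swarm source.

```python
from gallop.optim import Swarm

class MySwarm(Swarm):
    # Placeholder signature - copy the real one from the GALLOP Swarm source
    def PSO_velocity_update(self, *args, **kwargs):
        # Insert custom velocity logic here, or modify the result returned
        # by the default update rule
        return super().PSO_velocity_update(*args, **kwargs)

# Drop-in replacement for the Swarm object used in the earlier example
swarm = MySwarm(mystructure, n_particles=10000, n_swarms=10)
```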
It is also trivial to replace the swarm element of GALLOP with another optimiser which can return numpy arrays for the external and internal degrees of freedom. For example, the ask-and-tell interface of the pycma implementation of CMA-ES could be used as an alternative to the particle swarm:
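A hedged setup sketch for pycma is given below; it defines the `es`, `i` and `start_time` names used in the loop that follows. The population size and initial step size are illustrative, and `total_internal_degrees_of_freedom` is an assumed attribute mirroring the external one used in the loop - check the Structure object for the real name.

```python
import cma
import numpy as np

n_external = mystructure.total_external_degrees_of_freedom
n_internal = mystructure.total_internal_degrees_of_freedom  # assumed attribute
# Start from the centre of the search space with an illustrative step size
es = cma.CMAEvolutionStrategy(np.zeros(n_external + n_internal), 0.5,
                              {"popsize": 1000})
i = 0
start_time = time.time()
```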
while not es.stop():
    solutions = np.array(es.ask())
    external = solutions[:, :mystructure.total_external_degrees_of_freedom]
    internal = solutions[:, mystructure.total_external_degrees_of_freedom:]
    result = local.minimise(mystructure, external=external, internal=internal,
                            run=i, start_time=start_time, **minimiser_settings)
    # Optionally overwrite the starting positions with the optimised ones
    # solutions = np.hstack([result["external"], result["internal"]])
    es.tell(solutions, result["chi_2"])
    es.disp()
    i += 1
es.result_pretty()
Instructions coming soon
Some users may wish to run GALLOP locally. Whilst these instructions are specific to Windows, the libraries used are cross-platform, so it should be possible to run GALLOP on Linux or macOS environments (the Colab notebooks linked above run on Linux virtual machines, for example). The instructions below assume a Windows-based system; the only major difference expected on other platforms is the C++ build tools. Administrator privileges may be required.
Hardware requirements: For optimal performance, an Nvidia GPU is recommended. However, it may be possible to use some AMD GPUs, provided that ROCm is compatible with the GPU. This has not been tested - if using ROCm, please get in touch!
Older Nvidia GPUs may not work with PyTorch. Your GPU should have a compute capability of 3.5 or higher. See here to check your GPU. It is recommended that your GPU have a large amount of on-board memory (> 6 GB) in order to be able to tackle high-complexity structures in a reasonable time.
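A quick way to confirm that PyTorch can see your GPU and to report its compute capability, using standard PyTorch calls:

```python
import torch

print(torch.cuda.is_available())  # should print True
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
    # Compute capability as a (major, minor) tuple - needs to be >= (3, 5)
    print(torch.cuda.get_device_capability(0))
```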
Windows (64-bit) + CUDA 12+ compatible GPU:
The simplest way to get started with GALLOP is to use the uv package management system. Please follow the instructions on the uv website to install it on your machine. You do not need to download and install CUDA separately. You will need approximately 10 GB of free disk space to proceed, and ideally local administrator privileges. Also make sure that you have C++ build tools installed; on Windows, this means the Visual C++ build tools, which you can obtain from here.
- Install uv according to the PowerShell instructions here.
- Install git from here (do not use the 'thumbdrive edition').
- Once uv and git are installed, download the GALLOP `installer.bat` and `run-gallop.bat` files to your machine.
- Create a folder called "gallop" in a sensible location, e.g. C:\Users\username\gallop, and copy the `installer.bat` and `run-gallop.bat` files into the new folder.
- The first time you want to use GALLOP, you need to run the installer. Simply double click the `installer.bat` file. If this is the first time you have used it on your machine, a Windows Defender dialog may appear; to allow the script to run, click "more info" then "Run anyway".
- Once the installer has finished running, you can run GALLOP by double clicking the `run-gallop.bat` file.
- Confirm it works by running one of the example datasets. If GALLOP runs and makes use of your GPU (check your system monitor to confirm), then everything should be running smoothly.
Other environments:
On Linux or macOS (not tested) environments, it should be possible to install GALLOP into an environment of your choice by running:
`git clone https://github.com/mspillman/gallop.git && cd gallop && pip install .`
Regardless of platform, if C++ build tools are not available, attempting installation is likely to result in an error.
If you make use of GALLOP in your work, please consider citing the CrystEngComm article referenced above.
Internal to Cartesian - GALLOP uses the Natural Extension Reference Frame method for converting internal to Cartesian coordinates.
Correlated Integrated Intensity χ² - this is faster to calculate than Rwp and other goodness-of-fit metrics, but requires the inverse of the covariance matrix obtained from a Pawley refinement performed with one of the programs listed above (DASH, GSAS-II or TOPAS).
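For reference, the correlated integrated intensity figure of merit takes the form below, where $I_h$ are the Pawley-extracted intensities, $|F_h|^2$ the intensities calculated from the trial structure, $c$ a scale factor and $V$ the covariance matrix from the Pawley refinement; see the DASH and GALLOP papers for the precise definition used.

$$\chi^2 = \sum_{h}\sum_{k}\left(I_h - c\,|F_h|^2\right)\left(V^{-1}\right)_{hk}\left(I_k - c\,|F_k|^2\right)$$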
GALLOP makes use of a number of libraries, without which its development would have been significantly more challenging. In particular:
- PyMatGen - the PyMatGen PXRD pattern calculator was used as the inspiration for what eventually became GALLOP. GALLOP still makes use of a number of PyMatGen features to handle things like space group symmetries.
- PyTorch - the code that eventually became GALLOP was originally written using numpy. PyTorch served as a near drop-in replacement that allowed automatic differentiation and running on GPUs/TPUs.
- Streamlit - this allowed the browser-based GUI to be written entirely in Python, which made it easier to integrate with the existing code.
- Kaggle - free access to GPUs and TPUs, with run time capped at 30 hours/week. GPUs = Nvidia Tesla P100 (better than most of the free tier on Colab).
- Google Colaboratory - offers free/cheap access to GPUs and TPUs. Colab Pro is ~$10 per month, which gives priority access to more powerful GPU resources amongst other benefits.
- Amazon SageMaker - free access to GPUs. Session time is limited to 4 hours but, as far as I can see at the moment, there isn't a cap on the number of sessions you can run in a week. GPUs = Nvidia Tesla T4 (slightly faster than the P100 for some workloads, but with slightly lower memory of ~15 GB).
- Paperspace - free access to some GPUs, plus various paid options for more powerful GPUs.
- Runpod.io - cheap access to high-end GPUs, with both community and data-centre cloud options available.
- Vast.ai - cheap access to a wide variety of GPUs not offered by other services. Free trial credit is available to new users.
A wide variety of bigger commercial providers (e.g. AWS, GCP, Azure, Lambda Labs) are also available.