Designing automacs

Automacs is a tool for running GROMACS simulations in a systematic way. I sometimes call it a “reverse” API (application programming interface) because it manipulates GROMACS from the outside in a predictable way, substituting for a traditional API that would expose GROMACS commands to a language like Python. In fairness, the well-organized GROMACS command-line tools already function as a decent API, and automacs contributes extra functionality that helps users run many simulations consistently. The project evolved from simple BASH, Perl, and Python scripts that I and other members of the Radhakrishnan lab have used in our research.

In this article I summarize the key design features of automacs to give a high-level view of the ones we have found most useful.

Automacs runs from the terminal

While automacs can be deployed in a Django-based interface provided by the factory, it is typically run from the terminal. It includes a makefile that routes commands to Python functions. Many modules exist for exposing Python functions to the command line; our somewhat unorthodox solution has two major advantages over them. First, it relies on the ubiquitous make utility, so it requires no central installation, no changes to your PATH, and no ./bin/program invocation. Second, the makefile trick automatically maps the arguments and keyword arguments passed through make onto the target Python function, so the interface is inferred rather than tediously written by hand. This makes it easy to add new functions to the command line.
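The routing idea can be sketched in Python. This is a minimal sketch under stated assumptions, not the automacs implementation: we assume the makefile forwards the words typed after make (GNU make exposes them as MAKECMDGOALS) to a Python entry point, and that a bare word matching a keyword argument whose default is False acts as a flag. The names route, COMMANDS, and go are hypothetical.

```python
import inspect


def go(procedure, clean=False):
    """Hypothetical command exposed to the terminal as `make go ...`."""
    return f"running {procedure} (clean={clean})"


# Hypothetical registry of functions reachable from make.
COMMANDS = {"go": go}


def parse_value(text):
    """Convert simple literals such as 'true' or '4' into Python values."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    try:
        return int(text)
    except ValueError:
        return text


def route(argv, commands=COMMANDS):
    """Dispatch ['go', 'protein', 'clean'] to go('protein', clean=True)."""
    name, *rest = argv
    func = commands[name]
    params = inspect.signature(func).parameters
    args, kwargs = [], {}
    for token in rest:
        if "=" in token:
            # key=value becomes a keyword argument
            key, value = token.split("=", 1)
            kwargs[key] = parse_value(value)
        elif token in params and params[token].default is False:
            # a bare word naming a boolean keyword switches it on
            kwargs[token] = True
        else:
            args.append(token)
    # bind() raises TypeError if the terminal call does not fit the signature
    bound = inspect.signature(func).bind(*args, **kwargs)
    return func(*bound.args, **bound.kwargs)
```

With this scheme, route(["go", "protein", "clean"]) calls go("protein", clean=True). Because the call is checked against the function's own signature, exposing a new command only requires registering a new function.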

Automacs is “assembled” and not installed

Automacs is distributed as a git repository that is also designed to clone other repositories for running specific simulations. Most automacs experiments can be run with two commands. The first command, e.g. make setup proteins, fetches extra code, and the second runs the simulation, e.g. make go protein clean. Because there is no central installation, code updates are available to every new simulation without a costly or clumsy update procedure. Since each new simulation gets fresh code from GitHub, we have been able to rapidly extend and develop the code in separate repositories.

An automacs experiment is written as a simple data structure that points to a script containing functions like solvate or equilibrate. These functions are “magically” imported and distributed to the automacs code modules, and each experiment can request that extra modules be imported and distributed in the same way. Because automacs pulls down extra repositories for specific projects, and other users may maintain that code, the magic import scheme saves the effort of writing many tedious import statements from different paths. Users are free to override the core functions if they prefer a different naming scheme. We have split the code into separate repositories so that development does not need central supervision: you can add functions to automacs without a pull request, and share code with new users by posting it on GitHub.
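The magic import can be approximated with importlib. This sketch is an assumption about the mechanism, not the automacs source: it loads modules from arbitrary file paths and copies their public functions into target modules, with later sources overriding earlier ones so that a user's solvate can replace a core solvate. The helper names load_module and distribute are hypothetical.

```python
import importlib.util
import inspect


def load_module(path, name):
    """Import a Python file from an arbitrary path under the given module name."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def distribute(sources, targets):
    """Copy public functions defined in each source module into every target."""
    shared = {}
    for module in sources:
        for name, obj in vars(module).items():
            if (inspect.isfunction(obj)
                    and not name.startswith("_")
                    and obj.__module__ == module.__name__):  # skip re-imports
                shared[name] = obj  # later sources override earlier ones
    for target in targets:
        for name, func in shared.items():
            setattr(target, name, func)
    return shared
```

Ordering the sources as core modules first and user modules last gives users the override behavior described above without any import statements in their scripts.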

As with the makefile interface described above, the use of multiple repositories is a quirky choice. We could have easily used development branches, a central repository, and pull requests. Instead, we chose a decentralized approach to develop the code rapidly and lower the barriers to alternate, more creative use cases.

Simulations can manipulate a shared state variable

The magic import scheme also attaches two special variables, settings and state, to every imported function. Both are special dictionaries (you can access keys with dots to avoid excessive variable['key'] syntax). The settings variable supplies the original simulation settings, while state records the “state” of the simulation so that different construction steps can keep track of key variables, e.g. composition, size, and associated files.
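Both ingredients can be sketched briefly, assuming a dict subclass for dot access and injection through each function's module globals. The class name AttrDict and the helper attach are hypothetical; automacs may implement this differently.

```python
class AttrDict(dict):
    """A dict whose keys can also be read and written as attributes."""

    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError:
            # Keep hasattr() and getattr(..., default) working as usual.
            raise AttributeError(key) from None

    def __setattr__(self, key, value):
        self[key] = value


def attach(functions, settings, state):
    """Make settings and state visible as globals inside each function's module."""
    for func in functions:
        func.__globals__["settings"] = settings
        func.__globals__["state"] = state
```

A step function can then write state.size instead of state['size'], and every function sharing the same state object sees its updates.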

Each simulation is divided into steps, some of which are modular. For example, any lipid bilayer can be made larger with a generic multiply step that works for both atomistic and coarse-grained simulations. Each step writes its data to a numbered subfolder, e.g. s01-protein and s02-adhere, and its state is preserved in a JSON file. Subsequent steps can access the state of earlier steps by automatically reading this “history” of simulation states. The composition, which every simulation tracks, is read and modified with a single function, component.
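The step folders and JSON history can be sketched as follows. The file name state.json, the generalization of the s01-protein pattern to sNN-name, and the [name, count] layout of the composition are assumptions for illustration; component here is a hypothetical stand-in for the automacs function of the same name.

```python
import json
import re
from pathlib import Path

# Assumed folder pattern, generalized from s01-protein and s02-adhere.
STEP = re.compile(r"^s(\d{2})-(.+)$")


def steps(root):
    """Existing step folders, sorted by their two-digit prefix."""
    found = [p for p in Path(root).iterdir() if p.is_dir() and STEP.match(p.name)]
    return sorted(found, key=lambda p: int(STEP.match(p.name).group(1)))


def new_step(root, name):
    """Create the next numbered step folder, e.g. s02-adhere after s01-protein."""
    path = Path(root) / f"s{len(steps(root)) + 1:02d}-{name}"
    path.mkdir()
    return path


def save_state(step_dir, state):
    (Path(step_dir) / "state.json").write_text(json.dumps(state, indent=2))


def history(root):
    """States of all saved steps, oldest first."""
    return [json.loads((p / "state.json").read_text())
            for p in steps(root) if (p / "state.json").exists()]


def component(state, name, count=None):
    """Read the count of one molecule type, or set it when count is given."""
    composition = state.setdefault("composition", [])
    for entry in composition:
        if entry[0] == name:
            if count is not None:
                entry[1] = count
            return entry[1]
    if count is None:
        raise KeyError(name)
    composition.append([name, count])  # keep insertion order, as in a topology
    return count
```

A later step would typically start from history(root)[-1], adjust the composition with component, and save the result into its own folder.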

Calling GROMACS

Standard functions like solvate typically call many GROMACS binaries through a very simple interface based on Python’s subprocess module. GROMACS commands are issued by a wrapper function called gmx, which converts keyword arguments into the corresponding BASH call. These system calls are logged automatically; in fact, every function run from automacs is written to a log, so users have a written record of the simulation procedure. The format of each GROMACS call is set by a template called gmx_call_templates, which users can modify or override if they want to deviate from the usual format.
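A wrapper of this kind might look like the sketch below. It assumes keyword names equal GROMACS flag names (cp="protein.gro" becomes -cp protein.gro) and uses a hypothetical GMX_CALL_TEMPLATES dictionary standing in for gmx_call_templates, whose real format in automacs may differ. The -cs, -cp, -o, and -p options are standard gmx solvate flags.

```python
import shlex
import subprocess

# Hypothetical call templates: the fixed part of each GROMACS command.
GMX_CALL_TEMPLATES = {
    "solvate": "gmx solvate -cs spc216.gro",
}


def gmx_command(program, **flags):
    """Translate keyword arguments into GROMACS command-line flags."""
    cmd = shlex.split(GMX_CALL_TEMPLATES.get(program, f"gmx {program}"))
    for flag, value in flags.items():
        if value is True:
            cmd.append(f"-{flag}")    # boolean option switched on
        elif value is False:
            cmd.append(f"-no{flag}")  # GROMACS negates booleans with a "no" prefix
        else:
            cmd += [f"-{flag}", str(value)]
    return cmd


def gmx(program, log, dry_run=False, **flags):
    """Log the call, then run it with stdout and stderr appended to the log."""
    cmd = gmx_command(program, **flags)
    with open(log, "a") as handle:
        handle.write("[CALL] " + shlex.join(cmd) + "\n")
        handle.flush()  # keep the record ahead of the program's own output
        if not dry_run:
            subprocess.run(cmd, stdout=handle, stderr=subprocess.STDOUT, check=True)
    return cmd
```

For example, gmx("solvate", "log-solvate", cp="protein.gro", o="solvated.gro", p="topol.top") records and runs gmx solvate -cs spc216.gro -cp protein.gro -o solvated.gro -p topol.top.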

We use very explicit names for the stages of a simulation, e.g. em-solvate-steep.mdp for the steepest-descent minimization of the solvated structure. We always log standard output and error to a text file, which is especially important because simulations can be run in the background by adding the back flag to the terminal call. The same calling scheme lets users set global flags for processor and GPU usage, and GROMACS binaries can easily be mapped to different names on clusters through a centralized configuration.
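Binary mapping and global flags could live in one configuration dictionary, as in this hypothetical sketch. The keys binaries and global_flags, and the helper resolve, are assumptions; gmx_mpi is a common name for MPI builds of GROMACS, and -ntomp and -gpu_id are standard gmx mdrun options.

```python
# Hypothetical centralized configuration for a cluster (values illustrative).
CONFIG = {
    "binaries": {"gmx": "gmx_mpi"},
    "global_flags": {"mdrun": {"ntomp": 8, "gpu_id": "0"}},
}


def resolve(cmd, config=CONFIG):
    """Rename the binary and append global flags for the given subcommand.

    Assumes cmd looks like ["gmx", subcommand, ...].
    """
    binary, subcommand, *rest = cmd
    binary = config["binaries"].get(binary, binary)
    extra = []
    for flag, value in config["global_flags"].get(subcommand, {}).items():
        extra += [f"-{flag}", str(value)]
    return [binary, subcommand, *rest, *extra]
```

Because the mapping is applied at call time, the same experiment runs unchanged on a workstation (with an empty configuration) and on a cluster.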

Organized integrator parameters

Integrator parameters are stored in a large nested dictionary in a single Python file (parameters.py), which lets users request simulation settings concisely. For example, users running a coarse-grained simulation with weak pressure coupling need only a few keywords to get the standard parameters, instead of keeping track of a lengthy set of integrator parameters.
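The nested-dictionary idea can be sketched like this. The section names, keywords, and values below are illustrative assumptions rather than the contents of parameters.py, though the keys themselves (integrator, nsteps, dt, pcoupl, tau_p, ref_p, compressibility) are standard GROMACS .mdp options.

```python
# Hypothetical excerpt in the spirit of parameters.py; values are illustrative.
PARAMETERS = {
    "base": {"integrator": "md", "nsteps": 50000},
    "resolution": {
        "coarse": {"dt": 0.02, "cutoff-scheme": "Verlet"},
        "atomistic": {"dt": 0.002, "constraints": "h-bonds"},
    },
    "pressure": {
        "weak": {"pcoupl": "berendsen", "tau_p": 12.0,
                 "ref_p": 1.0, "compressibility": 3e-4},
        "none": {"pcoupl": "no"},
    },
}


def build_mdp(overrides=None, **choices):
    """Merge the base block with one option from each chosen section."""
    params = dict(PARAMETERS["base"])  # copy so the library stays untouched
    for section, option in choices.items():
        params.update(PARAMETERS[section][option])
    params.update(overrides or {})
    return params


def mdp_text(params):
    """Render parameters in the key = value format of a GROMACS .mdp file."""
    return "".join(f"{key:<20} = {value}\n" for key, value in params.items())
```

A coarse-grained run with weak pressure coupling then needs only build_mdp(resolution="coarse", pressure="weak").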

Automacs resembles a framework

This brief sketch outlines the key features of automacs, but we have included many other bells and whistles to make the package more comprehensive: upload and download functions for sending data to clusters, “experiment” files that concisely override existing, validated experiments, and a set of geometry functions for manipulating three-dimensional structures.
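One way an experiment override could work is a recursive merge onto a validated base experiment. This sketch is an assumption: the experiment fields (script, settings) are hypothetical, and rejecting unknown keys is a design choice added here so that a typo cannot silently introduce a new setting.

```python
import copy


def override_experiment(base, changes):
    """Return a copy of base with nested changes applied; unknown keys are rejected."""
    merged = copy.deepcopy(base)  # never modify the validated original
    for key, value in changes.items():
        if key not in merged:
            raise KeyError(f"{key!r} is not a setting of the base experiment")
        if isinstance(merged[key], dict) and isinstance(value, dict):
            merged[key] = override_experiment(merged[key], value)
        else:
            merged[key] = value
    return merged
```

An override file then only lists what differs, e.g. {"settings": {"nsteps": 100}}, and inherits everything else from the validated experiment.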

The most important feature of the package is the combination of design elements that make it possible to rapidly prototype and deploy codes. At present, these elements are somewhat unintuitive and require some amount of training or guidance from us. We expect that even if GROMACS receives a proper API, the flexible ecosystem we have designed in automacs will be useful to other users who wish to write more durable codes.

Automacs is also tightly integrated with the factory and is currently tested inside Docker containers. The unit tests are summarized on the BioPhysCode portal.