Designing automacs
Automacs is a tool for running GROMACS simulations in a systematic way. I sometimes call it a “reverse” API (application programming interface) because it manipulates GROMACS from the outside in a predictable way and substitutes for a traditional API, which might expose GROMACS commands to a language like Python. In fairness, the well-organized GROMACS command line tools already function as a decent API, and automacs contributes a set of extra functionality that helps users run many simulations in a predictable way. The project evolved from some simple BASH, Perl, and Python scripts which other members of the Radhakrishnan lab and I have used in our research.
In this article I would like to summarize the key design features we have used in automacs in order to give a high-level view of which features we have found useful.
Automacs runs from the terminal
While automacs can be deployed in a Django-based interface provided by the factory, it is typically run from the terminal. It includes a makefile which routes commands to Python functions. There are many different modules for exposing Python functions to the command line; however, our somewhat unorthodox solution has two major advantages over these. First, it uses the omnipresent make utility and hence requires no centralized installation, no manipulation of your PATH, and no use of ./bin/program to run the program. Second, this makefile trick automatically exposes Python functions to arguments and keyword arguments sent through the makefile, which means that the interface is inferred, and not tediously written. This makes it easy to add new functions to the command line.
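The idea of an inferred interface can be sketched in a few lines of Python. This is only an illustration, not the automacs implementation: it assumes the makefile forwards the words typed after make to Python as a list of tokens, and the names dispatch, parse_tokens, and the example solvate signature are all hypothetical.

```python
import inspect

def solvate(structure, water="spc216", verbose=False):
    """Hypothetical target function; the signature is invented for this sketch."""
    return {"structure": structure, "water": water, "verbose": verbose}

def parse_tokens(tokens):
    """Split tokens into bare words and key=value pairs."""
    words, kwargs = [], {}
    for token in tokens:
        if "=" in token:
            key, value = token.split("=", 1)
            kwargs[key] = value
        else:
            words.append(token)
    return words, kwargs

def dispatch(registry, tokens):
    """Route the first token to a function and bind the rest to its signature."""
    func = registry[tokens[0]]
    words, kwargs = parse_tokens(tokens[1:])
    signature = inspect.signature(func)
    positional = []
    for word in words:
        param = signature.parameters.get(word)
        # a bare word that names a boolean keyword is treated as a flag
        if param is not None and isinstance(param.default, bool):
            kwargs[word] = True
        else:
            positional.append(word)
    # bind() raises TypeError if the tokens do not fit the signature
    bound = signature.bind(*positional, **kwargs)
    return func(*bound.args, **bound.kwargs)

registry = {"solvate": solvate}
result = dispatch(registry, ["solvate", "protein.gro", "water=tip3p", "verbose"])
```

Because the binding is driven by the function signature, adding a new command in this sketch only requires adding a function to the registry; no argument parser has to be written by hand.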
Automacs is “assembled” and not installed
Automacs is distributed as a git repository which is also designed to clone other repositories for running specific simulations. Most automacs experiments can be run with two commands. The first command, e.g. make setup proteins, gets a set of extra codes, and the second command runs the simulation via e.g. make go protein clean. Without a central installation, updates to the code are available for any new simulations without running a costly or clumsy update procedure. Since each new simulation gets fresh code from GitHub, we have been able to rapidly extend and develop the code with separate repositories.
An automacs experiment is written to a simple data structure which points to a script that contains functions like solvate or equilibrate. These functions are “magically” imported and distributed to automacs code modules. Each experiment can ask that extra modules are imported and distributed. Since automacs pulls down extra repositories for specific projects, and these codes can be maintained by other users, the magic import scheme saves the effort required to write many tedious import statements from different paths. Users are free to override the core functions if they prefer a different naming scheme. We have fragmented the code into separate repositories so that development is unsupervised. You can add functions to automacs without a pull request, and share codes with new users by posting them on GitHub.
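One way to picture the magic import scheme is as a loop that copies shared names into each module's namespace. The sketch below is an assumption about the general technique, not the automacs code: the distribute helper and the two in-memory modules are hypothetical stand-ins for a real experiment script and code module.

```python
import types

def distribute(payload, modules):
    """Copy every name in payload into the namespace of each module."""
    for module in modules:
        for name, obj in payload.items():
            setattr(module, name, obj)

# stand-in for a script that provides simulation functions
script = types.ModuleType("experiment_script")
exec(
    "def solvate():\n    return 'solvated'\n"
    "def equilibrate():\n    return 'equilibrated'\n",
    script.__dict__,
)

# stand-in for a code module that uses those names without importing them
core = types.ModuleType("core")
exec("def run():\n    return [solvate(), equilibrate()]\n", core.__dict__)

# collect the public callables and hand them to the target module
public = {
    name: obj
    for name, obj in vars(script).items()
    if callable(obj) and not name.startswith("_")
}
distribute(public, [core])
outcome = core.run()
```

In this sketch, overriding a core function amounts to placing a different object under the same name in the payload before it is distributed.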
As with the makefile interface described above, the use of multiple repositories is a quirky choice. We could have easily used development branches, a central repository, and pull requests. Instead, we chose a decentralized approach to develop the code rapidly and lower the barriers to alternate, more creative use cases.
Simulations can manipulate a shared state variable
The magic import scheme also piggybacks two special variables, called settings and state, onto any imported function. These variables are special dictionaries (you can access keys with dots to avoid excessive variable['key'] syntax) that supply the original simulation settings as well as the “state” of the simulation, so that different construction steps can keep track of key variables such as composition, size, and associated files.
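A dictionary with dot access can be written as a small dict subclass. The following is a minimal sketch of that general pattern; the class name DotDict and the example keys and values are assumptions for illustration and are not taken from automacs.

```python
class DotDict(dict):
    """Dictionary whose keys can also be read and written as attributes."""

    def __getattr__(self, key):
        # only called when normal attribute lookup fails
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key) from None

    def __setattr__(self, key, value):
        self[key] = value

    def __delattr__(self, key):
        try:
            del self[key]
        except KeyError:
            raise AttributeError(key) from None

# hypothetical settings and state for one simulation
settings = DotDict(force_field="charmm36", water="tip3p")
state = DotDict()
state.composition = [["POPC", 128]]
state["box"] = [6.0, 6.0, 8.0]
```

Both access styles address the same underlying keys, so state.box and state["box"] are interchangeable, and the object still serializes like an ordinary dictionary.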
Each simulation is divided into steps, some of which are modular. For example, any lipid bilayer can be made larger with a generic multiply step that works on both atomistic and coarse-grained simulations. Each step has data written to a subfolder, e.g. s01-protein and s02-adhere, and the state is preserved in a JSON file. Subsequent steps can access the state of the previous steps by automatically reading the “history” of the simulation states. Common features of the simulation, namely the composition, are read and modified with a single function (component).
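The step folders and JSON state can be sketched as follows. This is a hedged illustration only: the file name state.json, the helpers save_state and load_history, and the signature given to component here are assumptions, and the real automacs function may behave differently.

```python
import json
import tempfile
from pathlib import Path

def save_state(root, step, state):
    """Write one step's state to <root>/<step>/state.json (file name assumed)."""
    folder = Path(root) / step
    folder.mkdir(parents=True, exist_ok=True)
    (folder / "state.json").write_text(json.dumps(state, indent=2))

def load_history(root):
    """Return (step name, state) pairs ordered by step-folder name."""
    history = []
    for folder in sorted(Path(root).glob("s[0-9][0-9]-*")):
        path = folder / "state.json"
        if path.is_file():
            history.append((folder.name, json.loads(path.read_text())))
    return history

def component(state, name, count=None):
    """Read the count of one component, or set it when count is given."""
    composition = state.setdefault("composition", [])
    for entry in composition:
        if entry[0] == name:
            if count is not None:
                entry[1] = count
            return entry[1]
    if count is None:
        raise KeyError(name)
    composition.append([name, count])
    return count

with tempfile.TemporaryDirectory() as root:
    save_state(root, "s01-protein", {"composition": [["protein", 1]]})
    # the next step starts from the most recent state in the history
    previous = load_history(root)[-1][1]
    component(previous, "SOL", 9000)
    save_state(root, "s02-adhere", previous)
    steps = [name for name, _ in load_history(root)]
    final = load_history(root)[-1][1]
```

Keeping the composition as an ordered list of name and count pairs preserves the molecule order, which matters when the list is later written into a topology file.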
Calling GROMACS
Standard functions, like solvate
, typically call many GROMACS binaries using a very simple interface based on Python’s subprocess module. Gromacs commands are called from a wrapper function called gmx
which converts keyword arguments into the right BASH call. The system calls are automatically logged, and in fact every function run from automacs is written to a log so that users have a written record of the simulation procedure. The format for GROMACS calls is set by a template called gmx_call_templates
which can be modified or overridden in case users want to deviate from the usual format of the calls.
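A template-driven wrapper of this kind might look like the sketch below. It is not the automacs gmx function: the CALL_TEMPLATES dictionary, the build_call helper, the dry_run option, and the log file name are hypothetical, and the actual gmx_call_templates format may differ. The grompp and mdrun flags shown (-f, -c, -p, -o, -deffnm) are standard GROMACS options.

```python
import shlex
import subprocess
import tempfile
from pathlib import Path

# hypothetical templates keyed by GROMACS program name
CALL_TEMPLATES = {
    "grompp": "{binary} grompp -f {mdp}.mdp -c {structure}.gro -p {top}.top -o {base}.tpr",
    "mdrun": "{binary} mdrun -deffnm {base}",
}

def build_call(program, binary="gmx", **kwargs):
    """Fill the template for one program and split it into an argument list."""
    template = CALL_TEMPLATES[program]
    return shlex.split(template.format(binary=binary, **kwargs))

def gmx(program, log, dry_run=False, **kwargs):
    """Log the command, then run it with output appended to the same log."""
    command = build_call(program, **kwargs)
    with open(log, "a") as handle:
        handle.write(shlex.join(command) + "\n")
        handle.flush()  # keep the command line ahead of the program output
        if not dry_run:
            subprocess.run(command, stdout=handle, stderr=subprocess.STDOUT, check=True)
    return command

# dry run, so the sketch works without a GROMACS installation
with tempfile.TemporaryDirectory() as folder:
    log = Path(folder) / "log-gmx.txt"
    command = gmx(
        "grompp", log, dry_run=True,
        mdp="em-solvate-steep", structure="solvate",
        top="system", base="em-solvate-steep",
    )
    logged = log.read_text()
```

Passing binary="gmx_mpi" to the same call would swap the executable name without touching the template, which is one simple way to map binaries to different names on a cluster.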
We tend to use very explicit names for stages of the simulation, e.g. em-solvate-steep.mdp for the steepest descent minimization of the solvated structure. We always log standard output and error to a text file, particularly since the simulations can be run in the background by adding the back flag to the terminal call. This scheme also allows users to set global flags for processors and GPU usage. GROMACS binaries can easily be mapped to different names for use on clusters by using a centralized configuration.
Organized integrator parameters
Integrator parameters are stored in a large nested dictionary in a single Python file (parameters.py), which allows users to request simulation settings more elegantly. For example, users running a coarse-grained simulation with weak pressure coupling only need to use a few keywords to get the standard parameters, instead of keeping track of a lengthy set of integrator parameters.
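The keyword lookup can be illustrated with a small nested dictionary. The layout below (a defaults block plus named groups), the keyword names, and all numerical values are placeholders invented for this sketch; they are not the contents of parameters.py and are not recommended settings. The option names themselves (integrator, dt, nsteps, pcoupl, tau_p, ref_p, compressibility) are standard GROMACS mdp options.

```python
# hypothetical parameter table; values are placeholders only
PARAMETERS = {
    "cgmd": {
        "defaults": {"integrator": "md", "dt": 0.02, "nsteps": 50000, "pcoupl": "no"},
        "groups": {
            "npt-weak": {
                "pcoupl": "berendsen",
                "tau_p": 4.0,
                "ref_p": 1.0,
                "compressibility": "3e-4",
            },
            "minimize": {"integrator": "steep", "nsteps": 5000},
        },
    },
}

def resolve(parameters, scheme, keywords=(), overrides=None):
    """Start from the scheme defaults, layer on keyword groups, then overrides."""
    block = parameters[scheme]
    resolved = dict(block["defaults"])
    for keyword in keywords:
        resolved.update(block["groups"][keyword])
    resolved.update(overrides or {})
    return resolved

def to_mdp(resolved):
    """Render the resolved parameters as 'key = value' lines."""
    return "".join(f"{key} = {value}\n" for key, value in resolved.items())

mdp_text = to_mdp(resolve(PARAMETERS, "cgmd", ["npt-weak"], {"nsteps": 100000}))
```

With this layout a user names a scheme and a few keywords, and only the parameters that deviate from the standard set need to be spelled out as overrides.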
Automacs resembles a framework
This brief sketch outlines the key features of automacs. There are many other bells and whistles which we have included to make the package more comprehensive, including automatic upload/download functions for sending data to clusters, “experiment” files which allow you to concisely override existing, validated experiments, and a set of geometry functions which are useful for manipulating three-dimensional structures.
The most important feature of the package is the combination of design elements that make it possible to rapidly prototype and deploy codes. At present, these elements are somewhat unintuitive and require some amount of training or guidance from us. We expect that even if GROMACS receives a proper API, the flexible ecosystem we have designed in automacs will be useful to other users who wish to write more durable codes.
Automacs is also tightly integrated with the factory and is currently tested inside Docker containers. The unit tests are summarized on the BioPhysCode portal.