DENSS: DENsity from Solution Scattering
Author: Thomas Grant | email: email@example.com
If you use DENSS in your work, please cite:
Grant, Thomas D. (2018). Ab initio electron density determination directly from solution scattering data. Nature Methods. http://dx.doi.org/10.1038/nmeth.4581.
DENSS.org is the official home of DENSS. Packed with detailed instructions for installing and running DENSS on your own computer. It also contains useful tips and best practices.
Try out DENSS without installing the code using the DENSSWeb server to perform simple online calculations suitable for most cases. (N is limited to 32 samples for efficiency to allow many users to try it out). For more complex particle shapes, install DENSS and EMAN2 (see below). Thanks to Andrew Bruno and the CCR for DENSSWeb!
New MEMBRANE mode
DENSS now has a new mode for membrane proteins. Membrane proteins are often solubilized in detergents or lipid nanodiscs. The hydrophobic regions of these molecules often have lesser scattering density than the bulk solvent, resulting in a negative contrast relative to the solvent. The default setting for DENSS enforces a positivity restraint that will not allow any density to be negative. While this is appropriate for most standard biomolecules such as proteins and nucleic acids, is it not appropriate for molecules containing regions of negative contrast. To accommodate this scenario, there is a new MEMBRANE mode in addition to the previously available FAST and SLOW modes. This mode disables the positivity restraint and starts shrink-wrap immediately.
New symmetry averaging feature
A new feature in denss v1.4.6 allows for the use of symmetry if known. The options for imposing symmetry are --ncs, --ncs_axis, and --ncs_steps. Currently only symmetry along a single axis is supported, though multiple axes will be supported in the future. Symmetry is imposed by first aligning the principal axes of inertia with the XYZ axes (largest to smallest). Then symmetry averaging is performed along the selected axis at the given step(s). More frequent steps makes for stronger restraint, but consequently more bias. Can select a different axis (in case the largest principal axis is not the symmetry axis). Note that the averaging procedure is still symmetry agnostic. Also, you may need to manually filter sets of maps in case the wrong symmetry axis was chosen in some cases, then perform averaging separately with the denss.align_and_average.py script.
New script for performing simple operations on MRC files
A new script,
denss.mrcops.py, is included in v1.4.5 that includes handy
tools for resampling or reshaping an MRC formatted electron density map.
New refinement script
A new script called
denss.refine.py is available for refining an averaged
electron density map. Final averaged maps from denss.all.py or superdenss are
unlikely to have scattering profiles matching the data (since they are an average
from many different maps). To potentially improve the results, one can input
the averaged map to denss.refine.py using the --rho_start option.
works similarly to denss.py, and will take the same .out or .dat file used
in the original runs of denss.py. This script is still in testing, so use at your
own risk and let me know if you run into bugs or issues.
New Averaging Procedure (beta)
A new procedure for aligning and averaging electron density maps with DENSS is now available with v1.4.3. The new procedure is written entirely in Python from the ground up and no longer requires EMAN2 to be installed. The new procedure can be accessed with a series of new scripts, named as denss.xxx.py where xxx is the name of the script. denss.all.py acts as the old superdenss, running twenty reconstructions, aligning and averaging all maps, including enantiomer generation and selection. The new scripts are still in testing, so use at your own risk. Old scripts using the EMAN2 averaging procedure are still available. Manuals for the new scripts coming soon. Thanks to intern Nhan Nguyen for helping to write the new code.
New interactive GUI for fitting data
A new script (denss.fit_data.py) is provided with DENSS v1.3.0 for calculating smooth fits to experimental data using a convenient interactive GUI.
New script for calculating profiles from MRC files
A new script (denss.rho2dat.py) is provided with DENSS v1.3.0 for calculating scattering profiles from MRC files.
DENSS is an algorithm used for calculating ab initio electron density maps directly from solution scattering data. DENSS implements a novel iterative structure factor retrieval algorithm to cycle between real space density and reciprocal space structure factors, applying appropriate restraints in each domain to obtain a set of structure factors whose intensities are consistent with experimental data and whose electron density is consistent with expected real space properties of particles.
DENSS utilizes the NumPy Fast Fourier Transform for moving between real and reciprocal space domains. Each domain is represented by a grid of points (Cartesian), N x N x N. N is determined by the size of the system and the desired resolution. The real space size of the box is determined by the maximum dimension of the particle, D, and the desired sampling ratio. Larger sampling ratio results in a larger real space box and therefore a higher sampling in reciprocal space (i.e. distance between data points in q). Smaller voxel size in real space corresponds to higher spatial resolution and therefore to larger q values in reciprocal space.
The core functions are stored in the
saxstats.py module. The actual script
to run DENSS is
DENSS can be installed by typing at the command prompt in the directory where you downloaded DENSS:
python setup.py install
DENSS requires that Python 2.7, NumPy (minimum v1.10.0) and SciPy are installed. These packages are are often installed by default, or are available for your operating system using package managers such as PIP or Anaconda. The current code was built using the Anaconda package management system on Mac OS X 10.11. If using Anaconda, install the Python 2.7 version. Alternatively you can have two separate python environments under Anaconda by installing the latest Python 3 version of Anaconda, and create a separate Python 2.7 environment by typing at the command prompt:
conda create -n python2 python=2.7 anaconda
Afterwards you can enable the Python 2.7 environment by typing the following during any terminal session:
source activate python2
Once the Python 2.7 environment is enabled, you can run denss.py in that terminal shell.
DENSS uses smooth fits to experimental scattering profiles (rather than the noisy experimental data). Two file formats are currently acceptable: .dat files or .out files. Files with .dat extensions are expected to be the smoothed (i.e. fitted) curve.
A script called
denss.fit_data.py is provided which can be used to fit experimental data
with a smooth curve based on an extended version of Peter Moore's approach (Moore 1979)
using a trigonometric series. The denss.fit_data.py script includes a simple interactive
GUI for selecting Dmax and the smoothing factor alpha and displays the experimental
data, the smooth fit to the data, and the real space pair distribution function.
denss.fit_data.py will save a .dat file containing the smooth fit to the data which can
then be used as input to denss.py (see below). Additionally, useful parameters
calculated from the fit, such as the radius of gyration and Porod volume, are displayed.
The manuscript describing the mathematical derivation and the algorithm of this
new approach is currently in preparation.
denss.fit_data.py can be run simply from the command line as:
denss.fit_data.py -f <experimental_data.dat>
where <experimental_data.dat> is the noisy scattering profile, given as a three-column
ASCII text file with columns q, I, error. An interactive GUI will appear showing
the experimental scattering profile on the left along with the fit to the data,
and the associated pair distribution function (P(r)) on the right. Two interactive
sliders on the bottom left can be adjusted for Dmax (the maximum particle dimension)
and the alpha smoothing factor. See
denss.fit_data.py -h for more options.
DENSS uses the smoothed curve fit to the experimental data and extrapolated to q = 0, i.e. I(0). You can also use any other smooth and extrapolated curve, such as the output from FoXS or CRYSOL, as long as it is supplied in a three column ASCII text file with columns q, I, error where q is given as 4 pi sin(theta)/lambda in angstroms, I is scattering intensity, and error is the error on the intensity.
lysozyme.out is a GNOM .out file from real lysozyme data.
6lyz.dat is a
simulated scattering profile from lysozyme PDB 6LYZ using FoXS.
a GNOM .out file created from the
6lyz.dat data file. Any of these files can
be used as input to DENSS for testing.
DENSS can be run with basic defaults for an outfile:
denss.py -f <saxs.out>
In this case, DENSS uses the maximum dimension from the .out file. You can override this maximum dimension by specifying the -d parameter. If you are using a .dat file you must specify the -d parameter, as it is not contained in the file:
denss.py -f <saxs.dat> -d <estimated maximum dimension>
For example, using the supplied lysozyme.out data, DENSS can be run with:
denss.py -f lysozyme.out
While using the 6lyz.dat data it can be run as:
denss.py -f 6lyz.dat -d 50.0
On Windows, depending on your setup you may need to type:
python C:\path\to\denss.py -f 6lyz.out
Options you may want to set are:
-h, --help show this help message and exit -f FILE, --file FILE SAXS data file for input (either .dat or .out) -d DMAX, --dmax DMAX Estimated maximum dimension (Default=100) -v VOXEL, --voxel VOXEL Set desired voxel size, setting resolution of map -os OVERSAMPLING, --oversampling OVERSAMPLING Sampling ratio (Default=3) -n NSAMPLES, --nsamples NSAMPLES Number of samples, i.e. grid points, along a single dimension. (Sets voxel size, overridden by --voxel. Best optimization with n=power of 2. Default=64) --ne NE Number of electrons in object (Default=10,000) -ncs NCS, --ncs NCS Rotational symmetry -ncs_steps NCS_STEPS [NCS_STEPS ...], --ncs_steps NCS_STEPS [NCS_STEPS ...] List of steps for applying NCS averaging (default=3000) -ncs_axis NCS_AXIS, --ncs_axis NCS_AXIS Rotational symmetry axis (options: 1, 2, or 3 corresponding to xyz principal axes) -s STEPS, --steps STEPS Maximum number of steps (iterations) -o OUTPUT, --output OUTPUT Output map filename -m MODE, --mode MODE Mode. F(AST) sets default options to run quickly for simple particle shapes. S(LOW) useful for more complex molecules. (default SLOW)
By default DENSS runs in SLOW mode, which is generally suitable for the vast majority of particles, including those with complex shapes. You can override all the default parameters set by the SLOW mode by explicitly setting any of the options.
Additional advanced options are can be seen by typing
As the program runs, the current status will be printed to the screen like so:
Step Chi2 Rg Support Volume ----- --------- ------- -------------- 2259 1.31e+00 14.34 42135
Step represents the number of iterations so far,
Chi2 is the fit of
the calculated scattering of the map to the experimental data,
Rg is the
radius of gyration calculated directly from the electron density map, and
Support Volume is the volume of the support region.
Electron density maps are written in CCP4/MRC format (credit Andrew Bruno) and optionally as Xplor ASCII text format (with the --write_xplor option enabled). These files can be opened directly in some visualization programs such as Chimera and PyMOL. In particular, the PyMOL "volume" function is well suited for displaying these maps with density information displayed as varying color and opacity. Maps can be converted to other formats using tools such as the Situs map2map tool.
Output files include:
output.mrc electron density map (MRC format) output_support.mrc final support volume formatted as unitary electron density map output_stats_by_step.dat statistics as a function of step number. three columns: chi^2, Rg, support volume output_map.fit The fit of the calculated scattering profile to the experimental data. Experimental data has been interpolated to the q values used for scaling intensities and I(0) has been scaled to the square of the number of electrons in the particle. Columns are: q(data), I(data), error(data), q(calc), I(calc) output_*.png If plotting is enabled, these are plots of the results. output.log A log file containing parameters for the calculation and summary of the results.
Alignment, Averaging, and Resolution Estimation
The solutions are non-unique, meaning many different electron density maps will yield the same scattering profile. Different random starting points will return different results. Therefore, running the algorithm many times (>20) is strongly advised.
There now exists two options for performing alignment and averaging. The older option uses EMAN2, the newer option is now built into the latest versions of DENSS (v1.4.1).
New Built-in Method
The new built-in option should run on Mac, Linux and Windows systems (please email me with bugs), and requires no additional programs or modules to be installed (just the already required NumPy and SciPy modules). The built-in method is fully parallelized for taking advantage of multicore machines.
denss.all.py is the primary script for running the full pipeline of DENSS,
including running multiple runs of DENSS (default = 20), aligning, selecting
enantiomers, averaging, and estimating resolution. To run the defaults, which
should be suitable for most applications, simply type:
$ denss.all.py -f 6lyz.out
If you would like to use multiple cores for parallel processing, simply add the -j option:
$ denss.all.py -f 6lyz.out -j 4
for example to run on 4 cores. All options available to denss.py can also be passed
denss.all.py. Some additional options exist as well. Type
denss.all.py -h to
view all of the options available.
Several helper scripts are also supplied for performing various tasks:
denss.align.py - aligns electron density maps to a reference (MRC or PDB file)
denss.align2xyz.py - aligns electron density maps to the XYZ axes
denss.align_by_principal_axes.py - aligns electron density maps to a reference
(MRC or PDB), but performs no minimization.
denss.align_and_average.py - aligns and averages a set of electron density maps
denss.average.py - averages a set of pre-aligned electron density maps
denss.calcfsc.py - calculates the Fourier Shell Correlation curve between two
pre-aligned electron density maps, and estimates resolution.
denss.pdb2mrc.py - calculates an electron density map from a PDB file.
denss.get_info.py - prints basic information about an MRC file, to be used
with denss.pdb2mrc.py, for example, to set box sizes, voxels, etc.
denss.rho2dat.py - calculates a solution scattering profile from an electron density map.
denss.mrcops.py - performs basic operations on MRC file, such as resampling
an electron density map to have a new size or shape.
The older option for alignment and averaging requires installation of EMAN2. Installing is pretty straightforward on Unix systems. Some users have noted difficulty with installation of EMAN2 on Windows systems.
To install EMAN2, download the appropriate EMAN2 binary from this page Then, change to the folder where you would like to install EMAN2 (such as your home folder) and move the eman2 script you downloaded to that directory. Then execute the script. For example:
$ cd $HOME $ mv $HOME/Downloads/eman2.21.MacOS.sh . $ bash eman2.21.MacOS.sh
But, obviously, use the correct filenames and paths for your platform. If you have issues getting EMAN2 installed, see their installation page.
A bash script is provided called
superdenss that runs the EMAN2 pipeline automatically
in parallel assuming EMAN2 and gnu parallel are all installed. To run superdenss
with the default parameters for denss.py, type:
superdenss -f 6lyz.out
superdenss also takes its own options, as well as all of the options accepted by denss.py. The following options are available for superdenss (accessible with the -h option):
------------------------------------------------------------------------------ superdenss is a simple wrapper for denss that automates the process of generating multiple density reconstructions and averaging them with EMAN2. -f: filename of .out GNOM file or .dat solution scattering data -o: the output prefix to name the output directory and all the files. -i: input options for denss exactly as they would be given to denss, including dashed options. Enclose everything in quotes. Dont include --file or --output. -n: the number of reconstructions to run (default 20) -j: the number of cores to use for parallel processing (defaults to ncores - 1) -e: generate and select enantiomers (significantly increases runtime, default=no) -----------------------------------------------------------------------------
For example, to run superdenss while checking for the best enantiomers, type:
superdenss -f 6lyz.dat -e
If you would like to edit the parameters of denss.py, pass them into the -i option of superdenss. This is admittedly a little awkward to use to pass the arguments correctly as you must pass the denss.py options enclosed in double quotes after the -i option: To run the above example using a .dat file (note the quotes in the command line):
superdenss -f 6lyz.dat -e -i " -d 50.0 " -o lysozyme
This will create a folder named "lysozyme" with all the output files for each of the 20 individual runs of DENSS and a folder named spt_avg_01 which will contain the final averaged density. You can add other denss.py options into the -i option in in between the quotes.
The combination of total real space box size (D x oversampling) divided by voxel size determines N. The number of grid points scales as N^3, so large N typically requires long compute times and lots of memory (tests show N>50 may start to slow things down noticeably). Preliminary tests have shown oversampling as low as 2 is often sufficient for accurate reconstruction. However, lesser oversampling also results in low sampling of scattering profile, so direct comparison with experimental data becomes more difficult. Note that D given by the user is only used to determine the size of the box and does not limit the the electron density of the object by default. If one wishes to impose D as a limit, enable --limit_dmax_on (off by default).
While the NumPy implementation of FFT is most efficient when N is a power of two, considerable performance gains can still be gained when N is not a power of two, and there is no requirement in DENSS for N to equal a power of two.
The electron density map is initially set to be random based on the random seed
selected by the program. One can therefore exactly reproduce the results of a previous
calculation by giving the random seed to the program with the
--seed option and
the same input parameters. The parameters of previous runs can all be found in the log file.
denss.rho2dat.py file can be used for calculating scattering profiles from MRC formatted
electron density maps. Currently the input maps must be cubic (i.e. same length
and shape on all sides). Type
denss.rho2dat.py -h for more options.