DENSS: DENsity from Solution Scattering
Author: Thomas Grant | email: firstname.lastname@example.org
If you use DENSS in your work, please cite:
Grant, Thomas D. (2018). Ab initio electron density determination directly from solution scattering data. Nature Methods. http://dx.doi.org/10.1038/nmeth.4581.
tdgrant.com/denss is the official home of DENSS. Packed with detailed instructions for installing and running DENSS on your own computer. It also contains useful tips and best practices.
Try out DENSS without installing the code using the DENSSWeb server. DENSSWeb recently underwent a major update (March 2021). Now run DENSSWeb in standard Slow and Membrane modes suitable for publication (in addition to the previously available Fast mode). Many file types are now accepted, including raw data. You can also enforce symmetry. Thanks to Andrew Bruno and the CCR for DENSSWeb!
New automatic Dmax estimation and fitting of raw data
When provided raw experimental data, DENSS will now automatically estimate the maximum dimension and fit the profile with a smooth curve. This uses the same algorithm used in denss.fit_data.py for fitting. Additionally, denss.fit_data.py now outputs a *.fit file with header information including the Dmax. denss.py can now take this *.fit as input and read the Dmax value directly from the header, thus removing the need to expressly set the -d option in denss.py. Users are still advised to manually run denss.fit_data.py to ensure accurate Dmax and fits are obtained. These features are available in version 1.6.1+.
New DENSS version 1.6.0 released
Shrinkwrap in DENSS has been updated to be more robust, particularly for particles with negative contrast used in MEMBRANE mode. Several bug fixes. Thanks to intern Esther Gallmeier!
New algorithm for denss.pdb2mrc.py
A new, faster and more accurate method for calculating density from PDB files is included. This method calculates density from atomic coordinates in real space as a 5-term Gaussian summation using Cromer-Mann coefficients.
DENSS now updated for GPU acceleration with CUDA
A new --gpu option when running denss.py v1.5.0+ with NVIDIA CUDA capable graphics cards is available. Tests show this option speeds DENSS up by more than 20x compared to CPU-only. Requires CuPy to be installed. Will attempt to update this in the future for more GPUs. Thanks to intern Esther Gallmeier!
DENSS is now python3 compatible!
DENSS has been updated to support python 3 in addition to python 2. I will attempt to maintain python 2 support for as long as possible.
New MEMBRANE mode
DENSS now has a new mode for membrane proteins. Membrane proteins are often solubilized in detergents or lipid nanodiscs. The hydrophobic regions of these molecules often have lesser scattering density than the bulk solvent, resulting in a negative contrast relative to the solvent. The default setting for DENSS enforces a positivity restraint that will not allow any density to be negative. While this is appropriate for most standard biomolecules such as proteins and nucleic acids, is it not appropriate for molecules containing regions of negative contrast. To accommodate this scenario, there is a new MEMBRANE mode in addition to the previously available FAST and SLOW modes. This mode disables the positivity restraint and starts shrink-wrap immediately.
New symmetry averaging feature
A new feature in denss v1.4.6 allows for the use of symmetry if known. The options for imposing symmetry are --ncs, --ncs_axis, and --ncs_steps. Currently only symmetry along a single axis is supported, though multiple axes will be supported in the future. Symmetry is imposed by first aligning the principal axes of inertia with the XYZ axes (largest to smallest). Then symmetry averaging is performed along the selected axis at the given step(s). More frequent steps makes for stronger restraint, but consequently more bias. Can select a different axis (in case the largest principal axis is not the symmetry axis). Note that the averaging procedure is still symmetry agnostic. Also, you may need to manually filter sets of maps in case the wrong symmetry axis was chosen in some cases, then perform averaging separately with the denss.align_and_average.py script.
New script for performing simple operations on MRC files
A new script,
denss.mrcops.py, is included in v1.4.5 that includes handy
tools for resampling or reshaping an MRC formatted electron density map.
New refinement script
A new script called
denss.refine.py is available for refining an averaged
electron density map. Final averaged maps from denss.all.py or superdenss are
unlikely to have scattering profiles matching the data (since they are an average from many different maps). To refine the averaged map so that its scattering profile matches the data, one can input the averaged map to denss.refine.py using the --rho_start option.
denss.refine.py works similarly to denss.py, and will take the same input file used in the original run of denss.all.py.
New Averaging Procedure
A new procedure for aligning and averaging electron density maps with DENSS is now available with v1.4.3. The new procedure is written entirely in Python from the ground up and no longer requires EMAN2 to be installed. The new procedure can be accessed with a series of new scripts, named as denss.xxx.py where xxx is the name of the script. denss.all.py acts as the old superdenss, running twenty reconstructions, aligning and averaging all maps, including enantiomer generation and selection. Thanks to intern Nhan Nguyen for helping to write the new code.
New interactive GUI for fitting data
A new script (denss.fit_data.py) is provided with DENSS v1.3.0 for calculating smooth fits to experimental data using a convenient interactive GUI.
New script for calculating profiles from MRC files
A new script (denss.rho2dat.py) is provided with DENSS v1.3.0 for calculating scattering profiles from MRC files.
DENSS is an algorithm used for calculating ab initio electron density maps directly from solution scattering data. DENSS implements a novel iterative structure factor retrieval algorithm to cycle between real space density and reciprocal space structure factors, applying appropriate restraints in each domain to obtain a set of structure factors whose intensities are consistent with experimental data and whose electron density is consistent with expected real space properties of particles.
DENSS utilizes the NumPy Fast Fourier Transform for moving between real and reciprocal space domains. Each domain is represented by a grid of points (Cartesian), N x N x N. N is determined by the size of the system and the desired resolution. The real space size of the box is determined by the maximum dimension of the particle, D, and the desired sampling ratio. Larger sampling ratio results in a larger real space box and therefore a higher sampling in reciprocal space (i.e. distance between data points in q). Smaller voxel size in real space corresponds to higher spatial resolution and therefore to larger q values in reciprocal space.
The core functions are stored in the
saxstats.py module. The actual script
to run DENSS is
DENSS can be installed by typing at the command prompt in the directory where you downloaded DENSS:
python setup.py install
DENSS requires that Python (2.7/3.6/3.7/3.8), NumPy (v1.10+) and SciPy are installed. These packages are are often installed by default, or are available for your operating system using package managers such as PIP or Anaconda. The current code was built using the Anaconda package management system on Mac OS X.
DENSS uses smooth fits to experimental scattering profiles (rather than the noisy experimental data). Multiple file formats are currently acceptable: .dat files (3-column ASCII text files: q, I, error), .fit files (4-columns: q, I, error, fit), or .out files (GNOM). DENSS will check if the data in .dat format are raw or smoothed data. If raw, DENSS will roughly estimate Dmax and fit the data with a smooth curve. However, it is best to use the denss.fit_data.py script described below. If a *.fit is given (the output of denss.fit_data.py), then Dmax will be read directly from the header of that file. DENSS will extract Dmax from files if possible, otherwise will estimate it automatically.
A script called
denss.fit_data.py is provided which can be used to fit experimental data with a smooth curve based on an extended version of Peter Moore's approach (Moore 1979) using a trigonometric series. The denss.fit_data.py script includes a simple interactive GUI for selecting Dmax and the smoothing factor alpha and displays the experimental data, the smooth fit to the data, and the real space pair distribution function. denss.fit_data.py will save a .fit file containing the smooth fit to the data which can then be used as input to denss.py (see below). Additionally, useful parameters calculated from the fit, such as the radius of gyration and Porod volume, are displayed. The manuscript describing the mathematical derivation and the algorithm of this new approach is currently in preparation.
denss.fit_data.py can be run simply from the command line as:
denss.fit_data.py -f <experimental_data.dat>
where <experimental_data.dat> is the noisy scattering profile, given as a three-column ASCII text file with columns q, I, error. An interactive GUI will appear showing the experimental scattering profile on the left along with the fit to the data, and the associated pair distribution function (P(r)) on the right. Two interactive sliders on the bottom left can be adjusted for Dmax (the maximum particle dimension) and the alpha smoothing factor. See
denss.fit_data.py -h for more options. denss.fit_data.py will output two files, one containing the fitted, smooth scattering profile (ends with .fit) and one containing the corresponding P(r) curve (ends with pr.dat). Parameters such as Dmax, alpha, Rg, volume, etc. are printed to the terminal screen and also are written in the header of the .fit file. The .fit file can be used directly as input to denss.py, which will then read the Dmax value from the header.
lysozyme.out is a GNOM .out file from real lysozyme data.
6lyz.dat is a
simulated scattering profile from lysozyme PDB 6LYZ using FoXS.
a GNOM .out file created from the
6lyz.dat data file. Any of these files can
be used as input to DENSS for testing.
It is best to provide denss.py with a data file where q is defined as 4 pi sin(theta)/lambda in angstroms. However, since some beamlines provide data in 1/nm, the -u option of denss.py can be set to "nm" in that case.
DENSS can be run with basic defaults as follows:
denss.py -f <saxs.out>
In this case, DENSS uses the maximum dimension from the .out file. You can override this maximum dimension by specifying the -d parameter. Similarly you can provide a .fit file from denss.fit_data.py:
denss.py -f <saxs.fit>
If you provide a raw experimental profile that has not been fitted, denss.py will attempt to estimate Dmax for you from the data and fit the data with the same algorithm denss.fit_data.py uses. However, it is advised that you manually run denss.fit_data.py or GNOM to ensure accurate fit and estimation of Dmax.
Examples: using the supplied 6lyz.out data, DENSS can be run with:
denss.py -f 6lyz.out
On Windows, depending on your setup you may need to type:
python C:\path\to\denss.py -f 6lyz.out
By default DENSS assumes data are given in inverse angstroms (q = 4pisin(theta)/lambda, where 2*theta is the scattering angle and lambda is the wavelength of the incident beam). The -u option can be used set the scale to inverse nanometers (-u nm).
Options you may want to set are:
-h, --help show this help message and exit -f FILE, --file FILE SAXS data file for input (either .dat, .fit, or .out) -u UNITS, --units UNITS Angular units ("a" [1/angstrom] or "nm" [1/nanometer]; default="a") -d DMAX, --dmax DMAX Estimated maximum dimension (Default=100) -n NSAMPLES, --nsamples NSAMPLES Number of samples, i.e. grid points, along a single dimension. (Sets voxel size, overridden by --voxel. Best optimization with n=power of 2. Default=64) -ncs NCS, --ncs NCS Rotational symmetry -ncs_steps NCS_STEPS [NCS_STEPS ...], --ncs_steps NCS_STEPS [NCS_STEPS ...] List of steps for applying NCS averaging (default=3000) -ncs_axis NCS_AXIS, --ncs_axis NCS_AXIS Rotational symmetry axis (options: 1, 2, or 3 corresponding to xyz principal axes) -s STEPS, --steps STEPS Maximum number of steps (iterations) -o OUTPUT, --output OUTPUT Output map filename -m MODE, --mode MODE Mode. F(AST) sets default options to run quickly for simple particle shapes. S(LOW) useful for more complex molecules. M(EMBRANE) mode allows for negative contrast. (default SLOW)
By default DENSS runs in SLOW mode, which is generally suitable for the vast majority of particles, including those with complex shapes. You can override all the default parameters set by the SLOW mode by explicitly setting any of the options.
Additional advanced options can be seen by typing
As the program runs, the current status will be printed to the screen like so:
Step Chi2 Rg Support Volume ----- --------- ------- -------------- 2259 1.31e+00 14.34 42135
Step represents the number of iterations so far,
Chi2 is the fit of
the calculated scattering of the map to the experimental data,
Rg is the
radius of gyration calculated directly from the electron density map, and
Support Volume is the volume of the support region.
Electron density maps are written in CCP4/MRC format (credit Andrew Bruno). These files can be opened directly in some visualization programs such as Chimera and PyMOL. In particular, the PyMOL "volume" function is well suited for displaying these maps with density information displayed as varying color and opacity.
Output files include:
output.mrc electron density map (MRC format) output_support.mrc final support volume formatted as unitary electron density map output_stats_by_step.dat statistics as a function of step number. three columns: chi^2, Rg, support volume output_map.fit The fit of the calculated scattering profile to the experimental data. Experimental data has been interpolated to the q values used for scaling intensities and I(0) has been scaled to the square of the number of electrons in the particle. Columns are: q(data), I(data), error(data), I(calc) output_*.png If plotting is enabled, these are plots of the results. output.log A log file containing parameters for the calculation and summary of the results.
Alignment, Averaging, and Resolution Estimation
The solutions are non-unique, meaning many different electron density maps will yield the same scattering profile. Different random starting points will return different results. Therefore, running the algorithm many times (>20) is strongly advised.
The built-in option should run on Mac, Linux and Windows systems (please email me with bugs), and requires no additional programs or modules to be installed (just the already required NumPy and SciPy modules). The built-in method is fully parallelized for taking advantage of multicore machines.
denss.all.py is the primary script for running the full pipeline of DENSS,
including running multiple runs of DENSS (default = 20), aligning, selecting
enantiomers, averaging, and estimating resolution. To run the defaults, which
should be suitable for most applications, simply type:
$ denss.all.py -f 6lyz.out
If you would like to use multiple cores for parallel processing, simply add the -j option:
$ denss.all.py -f 6lyz.out -j 4
for example to run on 4 cores. All options available to denss.py can also be passed to
denss.all.py. Some additional options exist as well. Type
denss.all.py -h to view all of the options available.
Several helper scripts are also supplied for performing various tasks:
denss.align.py - aligns electron density maps to a reference (MRC or PDB file)
denss.align2xyz.py - aligns electron density maps to the XYZ axes
denss.align_by_principal_axes.py - aligns electron density maps to a reference
(MRC or PDB), but performs no minimization.
denss.align_and_average.py - aligns and averages a set of electron density maps
denss.average.py - averages a set of pre-aligned electron density maps
denss.calcfsc.py - calculates the Fourier Shell Correlation curve between two
pre-aligned electron density maps, and estimates resolution.
denss.pdb2mrc.py - calculates an electron density map from a PDB file.
denss.get_info.py - prints basic information about an MRC file, to be used
with denss.pdb2mrc.py, for example, to set box sizes, voxels, etc.
denss.rho2dat.py - calculates a solution scattering profile from an electron density map.
denss.mrcops.py - performs basic operations on MRC file, such as resampling
an electron density map to have a new size or shape.
denss.all.py will create a folder with the name given to the --output option (which is input file basename by default). If a folder of that name already exists, DENSS will create a new folder adding a number at the end and incrementing by one. The folder will contain all the output maps and files of each reconstruction, the aligned maps, the average map, and the FSC curve used to estimate resolution. The final resulting averaged map will be called *_avg.mrc.
The alternative option for alignment and averaging requires installation of EMAN2. Installing is pretty straightforward on Unix systems. Some users have noted difficulty with installation of EMAN2 on Windows systems.
To install EMAN2, download the appropriate EMAN2 binary from this page Then, change to the folder where you would like to install EMAN2 (such as your home folder) and move the eman2 script you downloaded to that directory. Then execute the script. For example:
$ cd $HOME $ mv $HOME/Downloads/eman2.21.MacOS.sh . $ bash eman2.21.MacOS.sh
But, obviously, use the correct filenames and paths for your platform. If you have issues getting EMAN2 installed, see their installation page.
A bash script is provided called
superdenss that runs the EMAN2 pipeline automatically in parallel assuming EMAN2 and gnu parallel are all installed. To run superdenss with the default parameters for denss.py, type:
superdenss -f 6lyz.out
superdenss also takes its own options, as well as all of the options accepted by denss.py. The following options are available for superdenss (accessible with the -h option):
------------------------------------------------------------------------------ superdenss is a simple wrapper for denss that automates the process of generating multiple density reconstructions and averaging them with EMAN2. -f: filename of .out GNOM file or .dat solution scattering data -o: the output prefix to name the output directory and all the files. -i: input options for denss exactly as they would be given to denss, including dashed options. Enclose everything in quotes. Dont include --file or --output. -n: the number of reconstructions to run (default 20) -j: the number of cores to use for parallel processing (defaults to ncores - 1) -e: generate and select enantiomers (significantly increases runtime, default=no) -----------------------------------------------------------------------------
For example, to run superdenss while checking for the best enantiomers, type:
superdenss -f 6lyz.dat -e
If you would like to edit the parameters of denss.py, pass them into the -i option of superdenss. This is admittedly a little awkward to use to pass the arguments correctly as you must pass the denss.py options enclosed in double quotes after the -i option. To run the above example using a .dat file (note the quotes in the command line):
superdenss -f 6lyz.dat -e -i " -d 50.0 " -o lysozyme
This will create a folder named "lysozyme" with all the output files for each of the 20 individual runs of DENSS and a folder named spt_avg_01 which will contain the final averaged density. You can add other denss.py options into the -i option in between the quotes.
The combination of total real space box size (D x oversampling) divided by voxel size determines N. The number of grid points scales as N^3, so large N typically requires long compute times and lots of memory (tests show N>50 may start to slow things down noticeably). Preliminary tests have shown oversampling as low as 2 is often sufficient for accurate reconstruction. However, lesser oversampling also results in low sampling of scattering profile, so direct comparison with experimental data becomes more difficult. Note that D given by the user is only used to determine the size of the box and does not limit the the electron density of the object by default. If one wishes to impose D as a limit, enable --limit_dmax_on (off by default).
While the NumPy implementation of FFT is most efficient when N is a power of two, considerable performance gains can still be gained when N is not a power of two, and there is no requirement in DENSS for N to equal a power of two.
The electron density map is initially set to be random based on the random seed
selected by the program. One can therefore exactly reproduce the results of a previous calculation by giving the random seed to the program with the
--seed option and the same input parameters. The parameters of previous runs can all be found in the log file.
denss.rho2dat.py file can be used for calculating scattering profiles from MRC formatted electron density maps. Currently the input maps must be cubic (i.e. same length and shape on all sides). Type
denss.rho2dat.py -h for more options.