man:w_assign
By Nick Rego
w_assign uses simulation output to assign walkers to user-specified bins and macrostates. These assignments are required for some other simulation tools, namely w_kinetics and w_kinavg.
w_assign supports parallelization (see general work manager options for more on command line options to specify a work manager).
Usage:
w_assign [-h] [-r RCFILE] [--quiet | --verbose | --debug] [--version] [-W WEST_H5FILE] [-o OUTPUT] [--bins-from-system | --bins-from-expr BINS_FROM_EXPR | --bins-from-function BINS_FROM_FUNCTION] [-p MODULE.FUNCTION] [--states STATEDEF [STATEDEF ...] | --states-from-file STATEFILE | --states-from-function STATEFUNC] [--wm-work-manager WORK_MANAGER] [--wm-n-workers N_WORKERS] [--wm-zmq-mode MODE] [--wm-zmq-info INFO_FILE] [--wm-zmq-task-endpoint TASK_ENDPOINT] [--wm-zmq-result-endpoint RESULT_ENDPOINT] [--wm-zmq-announce-endpoint ANNOUNCE_ENDPOINT] [--wm-zmq-listen-endpoint ANNOUNCE_ENDPOINT] [--wm-zmq-heartbeat-interval INTERVAL] [--wm-zmq-task-timeout TIMEOUT] [--wm-zmq-client-comm-mode MODE]
For WESTPA 1.0, run with $WEST_ROOT/bin/w_assign instead of w_assign.
See the general command-line tool reference for more information on the general options.
'''-W, --west-data ''file'''''
- Read simulation result data from file. If no argument is provided, the default is the HDF5 file specified in the configuration file, which is in turn by default west.h5 (assumed to be in the current working directory).
'''-o, --output ''outfile'''''
- Write assignment results to file outfile. (Default: hdf5 file assign.h5)
Specify the binning scheme to be used for assigning walkers to bins.
'''--bins-from-system'''
- Use the binning scheme specified by the system driver; the system driver is specified in the west configuration file, by default named west.cfg. (Default binning.)
'''--bins-from-expr ''bin_expr'''''
- Use the binning scheme specified in bin_expr, which takes the form of a Python list of lists, where each inner list gives the bin boundaries for one dimension of the progress coordinate.
- (For example, "[0,1,2,4,inf],[-inf,0,inf]" specifies bin boundaries for a two-dimensional progress coordinate.) Note that this option accepts the special symbol 'inf' for floating-point infinity.
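As an illustration of that list-of-lists form, the sketch below mimics rectilinear binning with plain numpy. This is a hypothetical stand-in for demonstration only, not WESTPA's actual bin mapper; the function name and coordinates are made up.

```python
import numpy as np

# The same list-of-lists boundary structure that --bins-from-expr accepts,
# with numpy's inf standing in for the 'inf' symbol.
boundaries = [
    [0, 1, 2, 4, np.inf],   # dimension 0
    [-np.inf, 0, np.inf],   # dimension 1
]

def rectilinear_bin(coord, boundaries):
    """Return the per-dimension bin indices for one coordinate tuple."""
    return tuple(
        int(np.searchsorted(np.asarray(b), x, side='right')) - 1
        for x, b in zip(coord, boundaries)
    )

# A point at (1.5, -3.0) lands in bin 1 of dimension 0 (between 1 and 2)
# and bin 0 of dimension 1 (between -inf and 0).
print(rectilinear_bin((1.5, -3.0), boundaries))  # (1, 0)
```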
'''--bins-from-function ''bin_func'''''
- Use bins specified by calling an external function bin_func. bin_func should be formatted as '[PATH:]module.function', where the function 'function' in module 'module' will be used.
You can optionally specify how to assign user-defined macrostates. Note that macrostates must be assigned for subsequent analysis tools, namely w_kinetics and w_kinavg.
'''--states ''statedef [statedef ...]'''''
- Specify a macrostate for a single bin as statedef, formatted as a coordinate tuple; the bin containing that coordinate is assigned to the macrostate. For instance:
- '[1.0, 2.0]' assigns a macrostate to the bin that contains the (two-dimensional) progress coordinates 1.0 and 2.0. Note that a macrostate label can optionally be specified; for instance, 'bound:[1.0, 2.0]' gives the bin containing the given coordinates the macrostate named 'bound'. Multiple assignments can be specified with this option, but only one macrostate per bin is possible; if you wish to assign multiple bins to a single macrostate, use the --states-from-file option.
'''--states-from-file ''statefile'''''
- Read macrostate assignments from yaml file statefile. This option allows you to assign multiple bins to a single macrostate. The following example shows the contents of statefile that specify two macrostates, bound and unbound, over multiple bins with a two-dimensional progress coordinate:
---
states:
  - label: unbound
    coords:
      - [9.0, 1.0]
      - [9.0, 2.0]
  - label: bound
    coords:
      - [0.1, 0.0]
By default, progress coordinate information for each iteration is taken from the pcoord dataset in the specified input file (which, by default, is west.h5). Optionally, you can specify a function to construct the progress coordinate for each iteration; this may be useful to consolidate data from several sources or otherwise preprocess the progress coordinate data.
'''--construct-pcoord ''module.function'', -p ''module.function'''''
- Use the function module.function to construct the progress coordinate for each iteration. This will be called once per iteration as function(n_iter, iter_group) and should return an array indexable as [seg_id][timepoint][dimension]. The default function returns the 'pcoord' dataset for that iteration (i.e. the function executes return iter_group['pcoord'][...]).
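As a sketch of that contract, a plain dict can stand in for the HDF5 iteration group; a custom function just needs to return an array indexable as [seg_id][timepoint][dimension]. The names and shapes below are made up for illustration.

```python
import numpy as np

def construct_pcoord(n_iter, iter_group):
    # Mirrors the default behavior: return the iteration's 'pcoord'
    # array, indexable as [seg_id][timepoint][dimension].
    return iter_group['pcoord'][...]

# Stand-in for an h5py iteration group: 4 segments, 3 timepoints, 2 dims.
fake_iter_group = {'pcoord': np.arange(24.0).reshape(4, 3, 2)}
pcoord = construct_pcoord(1, fake_iter_group)
print(pcoord.shape)  # (4, 3, 2)
```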
The output file (-o/--output, by default "assign.h5") contains the following attributes and datasets:

nbins attribute (Integer):
- Number of valid bins. Bin assignments range from 0 to nbins-1, inclusive.

nstates attribute (Integer):
- Number of valid macrostates (may be zero if no such states are specified). Trajectory ensemble assignments range from 0 to nstates-1, inclusive, when states are defined.

/assignments [iteration][segment][timepoint] (Integer):
- Per-segment and -timepoint assignments (bin indices).

/npts [iteration] (Integer):
- Number of timepoints in each iteration.

/nsegs [iteration] (Integer):
- Number of segments in each iteration.

/labeled_populations [iteration][state][bin] (Floating-point):
- Per-iteration and -timepoint bin populations, labeled by most recently visited macrostate. The last state entry (index nstates) corresponds to trajectories initiated outside of a defined macrostate.

/bin_labels [bin] (String):
- Text labels of bins.

/trajlabels [iteration][segment][timepoint] (Integer):
- Per-segment and -timepoint trajectory labels, indicating the macrostate that each trajectory last visited.

/state_labels [state] (String):
- Labels of states.

/state_map [bin] (Integer):
- Mapping of bin index to the macrostate containing that bin. An entry will contain nstates (one past the last valid state index) if that bin does not fall into a macrostate.
- Datasets indexed by state and bin contain one more entry than the number of valid states or bins. For N bins, axes indexed by bin are of size N+1, and entry N (0-based indexing) corresponds to a walker outside of the defined bin space (which will cause most mappers to raise an error). More importantly, for M states (including the case M=0 where no states are specified), axes indexed by state are of size M+1, and entry M refers to trajectories initiated in a region not corresponding to a defined macrostate.
- Thus, labeled_populations[:,:,:].sum(axis=1)[:,:-1] gives overall per-bin populations for all defined bins, and labeled_populations[:,:,:].sum(axis=2)[:,:-1] gives overall per-trajectory-ensemble populations for all defined states.
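As a quick sketch of those reductions, using made-up shapes (5 iterations, M=2 defined states, N=4 defined bins, so the state axis has size 3 and the bin axis size 5):

```python
import numpy as np

# Hypothetical labeled_populations array, indexed [iteration][state][bin]:
# 5 iterations; 2 defined states plus the "no macrostate" entry;
# 4 defined bins plus the "outside bin space" entry.
labeled_populations = np.random.rand(5, 3, 5)

# Sum over the state axis, then drop the extra bin entry:
per_bin = labeled_populations.sum(axis=1)[:, :-1]    # shape (5, 4)

# Sum over the bin axis, then drop the extra state entry:
per_state = labeled_populations.sum(axis=2)[:, :-1]  # shape (5, 2)
```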
In the following example, a 2D binning scheme is constructed by creating auxiliary YAML files that are passed to w_assign.
#!/bin/bash
WEST=west.h5
AUX_A="1_75_39_c2"
AUX_B="fit_m1_rms_heavy_m2"
SCHEME="C2_M2"
mkdir $SCHEME
cd $SCHEME
# define bins and states with yaml files
cat << EOF > BINS
---
bins:
type: RectilinearBinMapper
boundaries: [[0.0, 45.0, 'inf'], [0.0, 6.25, 'inf']]
EOF
cat << EOF > STATES
---
states:
- label: a
coords:
- [46.0, 6.5]
- label: b
coords:
- [44.0, 6.0]
EOF
# create module.py file to process 1D or 2D scheme
cat << EOF > module.py
#!/usr/bin/env python
import numpy
def pull_data_1d(n_iter, iter_group):
    '''
    This function reshapes auxiliary data for each iteration and returns it.
    '''
    auxdata = iter_group['auxdata']['${AUX_A}'][...]
    data = auxdata[:,:,numpy.newaxis]
    return data

def pull_data_2d(n_iter, iter_group):
    '''
    This function reshapes 2 auxiliary datasets for each iteration and returns them.
    '''
    auxdata1 = iter_group['auxdata']['${AUX_A}'][...]
    auxdata2 = iter_group['auxdata']['${AUX_B}'][...]
    data = numpy.dstack((auxdata1, auxdata2))
    return data
EOF
# run w_assign to assign macrostates based off of defined BINS and STATES
w_assign -W ../${WEST} --bins-from-file BINS --states-from-file STATES -o assign.h5 --construct-dataset module.pull_data_2d --serial
Alternatively, the scheme/bins/states can be defined in the west.cfg file, which can also be passed into w_assign. In your west.cfg file, fill out or add a section for analysis:
# Settings for w_ipa, an interactive analysis program that can also automate analysis.
analysis:
  directory: ANALYSIS      # specify the directory all analysis files should exist in.
  kinetics:                # general options for both kinetics routines.
    step_iter: 1
    evolution: cumulative
    extra: ['disable-correl']
  analysis_schemes:        # Analysis schemes. Required: name, states, and bins
    AB:
      enabled: True
      bins:
        - type: RectilinearBinMapper
          boundaries:
            - [0.0, 4, 17, 'inf']
      states:
        - label: unbound
          coords:
            - [18]
        - label: encounter
          coords:
            - [3.9]
Create a custom function to construct the desired dataset.
#!/usr/bin/env python
import numpy

def load_distances(n_iter, iter_group):
    auxgroup1 = iter_group['auxdata/distance3']
    auxgroup2 = iter_group['auxdata/distance4']
    dataset = numpy.dstack((auxgroup1, auxgroup2))
    return dataset
Then run w_assign.
w_assign -W west.h5 --config-from-file --scheme AB --construct-dataset module.load_distances --serial
In the case where you only want to consider a single dataset, the `numpy.expand_dims()` function can be used to ensure the dataset has the correct dimensions.
import numpy

def load_distances(n_iter, iter_group):
    auxgroup = iter_group['auxdata/distance3']
    dataset = numpy.expand_dims(auxgroup, axis=-1)
    return dataset
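As a sketch of what that reshaping does, with a made-up array standing in for the auxiliary dataset (4 segments with 3 timepoints each):

```python
import numpy

# Stand-in for a scalar-valued auxiliary dataset, indexed [segment][timepoint].
auxdata = numpy.zeros((4, 3))

# Add a trailing "dimension" axis so the result is indexable as
# [segment][timepoint][dimension], as w_assign expects.
dataset = numpy.expand_dims(auxdata, axis=-1)
print(dataset.shape)  # (4, 3, 1)
```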