
man:w_assign

Jeremy M. G. Leung edited this page Mar 18, 2024 · 28 revisions

By Nick Rego

w_assign uses simulation output to assign walkers to user-specified bins and macrostates. These assignments are required for some other simulation tools, namely w_kinetics and w_kinavg.

w_assign supports parallelization (see general work manager options for more on command line options to specify a work manager).

Overview

Usage:

 w_assign [-h] [-r RCFILE] [--quiet | --verbose | --debug] [--version]
               [-W WEST_H5FILE] [-o OUTPUT]
               [--bins-from-system | --bins-from-expr BINS_FROM_EXPR | --bins-from-function BINS_FROM_FUNCTION]
               [-p MODULE.FUNCTION]
               [--states STATEDEF [STATEDEF ...] | --states-from-file STATEFILE | --states-from-function STATEFUNC]
               [--wm-work-manager WORK_MANAGER] [--wm-n-workers N_WORKERS]
               [--wm-zmq-mode MODE] [--wm-zmq-info INFO_FILE]
               [--wm-zmq-task-endpoint TASK_ENDPOINT]
               [--wm-zmq-result-endpoint RESULT_ENDPOINT]
               [--wm-zmq-announce-endpoint ANNOUNCE_ENDPOINT]
               [--wm-zmq-listen-endpoint ANNOUNCE_ENDPOINT]
               [--wm-zmq-heartbeat-interval INTERVAL]
               [--wm-zmq-task-timeout TIMEOUT]
               [--wm-zmq-client-comm-mode MODE]

For WESTPA 1.0, run with $WEST_ROOT/bin/w_assign instead of w_assign.

Command-Line Options

See the general command-line tool reference for more information on the general options.

Input/output Options

  -W, --west-data file
  : Read simulation result data from file. If no argument is provided, the default is the HDF5 file specified in the west configuration file, which is in turn west.h5 by default (assumed to be in the current working directory).

  -o, --output outfile
  : Write assignment results to outfile. (Default: HDF5 file assign.h5)

Binning Options

Specify how bins are to be assigned to the dataset.

  --bins-from-system
  : Use the binning scheme specified by the system driver, which is named in the west configuration file (by default west.cfg). This is the default binning.

  --bins-from-expr bin_expr
  : Use the binning scheme specified in bin_expr, which takes the form of a Python list of lists, where each inner list gives the bin boundaries for one dimension. For example, "[0,1,2,4,inf],[-inf,0,inf]" specifies bin boundaries for a two-dimensional progress coordinate. Note that this option accepts the special symbol 'inf' for floating-point infinity.

  --bins-from-function bin_func
  : Use bins specified by calling an external function bin_func, formatted as '[PATH:]module.function', where the function 'function' in module 'module' will be used.

Macrostate Options

You can optionally specify how to assign user-defined macrostates. Note that macrostates must be assigned for subsequent analysis tools, namely w_kinetics and w_kinavg.

  --states statedef [statedef ...]
  : Specify a macrostate for a single bin as statedef, formatted as a coordinate tuple where each coordinate specifies the bin to which it belongs. For instance, '[1.0, 2.0]' assigns a macrostate to the bin that contains the (two-dimensional) progress coordinate (1.0, 2.0). A macrostate label can optionally be specified; for instance, 'bound:[1.0, 2.0]' names the macrostate for the bin containing the given coordinates 'bound'. Multiple assignments can be specified with this option, but only one macrostate per bin is possible; if you wish to assign multiple bins to a single macrostate, use the --states-from-file option.

  --states-from-file statefile
  : Read macrostate assignments from the YAML file statefile. This option allows you to assign multiple bins to a single macrostate. The following example statefile specifies two macrostates, bound and unbound, over multiple bins of a two-dimensional progress coordinate:
 ---
 states:
   - label: unbound
     coords:
       - [9.0, 1.0]
       - [9.0, 2.0]
   - label: bound
     coords:
       - [0.1, 0.0]

Specifying Progress Coordinate

By default, progress coordinate information for each iteration is taken from pcoord dataset in the specified input file (which, by default is west.h5). Optionally, you can specify a function to construct the progress coordinate for each iteration - this may be useful to consolidate data from several sources or otherwise preprocess the progress coordinate data.

  -p, --construct-pcoord module.function
  : Use the function module.function to construct the progress coordinate for each iteration. The function is called once per iteration as function(n_iter, iter_group) and should return an array indexable as [seg_id][timepoint][dimension]. The default behavior returns the 'pcoord' dataset for that iteration (i.e., the function executes return iter_group['pcoord'][...]).
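As a minimal sketch (module and function names hypothetical), a construct-pcoord function that reproduces the default behavior looks like:

```python
import numpy

def pull_pcoord(n_iter, iter_group):
    # Called once per iteration; returns an array indexable as
    # [seg_id][timepoint][dimension], here simply the stored pcoord dataset.
    return iter_group['pcoord'][...]
```

Saved as module.py, this would be passed on the command line as -p module.pull_pcoord.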

Output format

The output file (-o/--output, by default "assign.h5") contains the following attributes and datasets:

  • nbins attribute (Integer):
    Number of valid bins. Bin assignments range from 0 to nbins-1, inclusive.
  • nstates attribute (Integer):
    Number of valid macrostates (may be zero if no such states are specified). Trajectory ensemble assignments range from 0 to nstates-1, inclusive, when states are defined.
  • /assignments [iteration][segment][timepoint] (Integer):
    Per-segment and -timepoint assignments (bin indices).
  • /npts [iteration] (Integer):
    Number of timepoints in each iteration.
  • /nsegs [iteration] (Integer):
    Number of segments in each iteration.
  • /labeled_populations [iteration][state][bin] (Floating-point):
    Per-iteration bin populations, labeled by most recently visited macrostate. The last state entry (index nstates) corresponds to trajectories initiated outside of a defined macrostate.
  • /bin_labels [bin] (String):
    Text labels of bins.
When macrostate assignments are given, the following additional datasets are present:
  • /trajlabels [iteration][segment][timepoint] (Integer):
    Per-segment and -timepoint trajectory labels, indicating the macrostate which each trajectory last visited.
  • /state_labels [state] (String):
    Labels of states.
  • /state_map [bin] (Integer):
    Mapping of bin index to the macrostate containing that bin. An entry contains the value nstates (the unknown-state index) if that bin does not fall into any macrostate.
Datasets indexed by state and bin contain one more entry than the number of valid states or bins. For N bins, axes indexed by bin are of size N+1, and entry N (0-based indexing) corresponds to a walker outside of the defined bin space (which will cause most bin mappers to raise an error). Similarly, for M states (including the case M=0 where no states are specified), axes indexed by state are of size M+1, and entry M refers to trajectories initiated in a region not corresponding to a defined macrostate.

Thus, labeled_populations[:,:,:].sum(axis=1)[:,:-1] gives overall per-bin populations for all defined bins, and labeled_populations[:,:,:].sum(axis=2)[:,:-1] gives overall per-trajectory-ensemble populations for all defined states.
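The two reductions above can be sketched as a small helper (assuming only the array layout documented here):

```python
import numpy

def collapse_labeled_populations(labeled_populations):
    # labeled_populations has shape [iteration][state][bin], with one extra
    # entry on the state axis (unknown state) and on the bin axis (unknown bin).
    per_bin = labeled_populations.sum(axis=1)[:, :-1]    # drop the unknown-bin entry
    per_state = labeled_populations.sum(axis=2)[:, :-1]  # drop the unknown-state entry
    return per_bin, per_state
```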
For more information on how to work with h5 data, see this page: Accessing data from custom analysis scripts.

Examples

In this example, a 2D binning scheme is defined by creating auxiliary YAML files that are passed to w_assign.

#!/bin/bash

WEST=west.h5
AUX_A="1_75_39_c2"
AUX_B="fit_m1_rms_heavy_m2"
SCHEME="C2_M2"

mkdir $SCHEME
cd $SCHEME

# define bins and states with yaml files
cat << EOF > BINS
---
bins:
    type: RectilinearBinMapper
    boundaries: [[0.0, 45.0, 'inf'], [0.0, 6.25, 'inf']]
EOF

cat << EOF > STATES
---
states:
  - label: a
    coords:
      - [46.0, 6.5]

  - label: b
    coords:
      - [44.0, 6.0]
EOF

# create module.py file to process 1D or 2D scheme
cat << EOF > module.py
#!/usr/bin/env python

import numpy

def pull_data_1d(n_iter, iter_group):
    '''
    This function reshapes auxiliary data for each iteration and returns it.
    '''
    auxdata = iter_group['auxdata']['${AUX_A}'][...]
    data = auxdata[:,:,numpy.newaxis]
    return data

def pull_data_2d(n_iter, iter_group):
    '''
    This function stacks two auxiliary datasets for each iteration and returns them.
    '''
    auxdata1 = iter_group['auxdata']['${AUX_A}'][...]
    auxdata2 = iter_group['auxdata']['${AUX_B}'][...]
    data = numpy.dstack((auxdata1, auxdata2))
    return data
EOF

# run w_assign to assign macrostates based on the defined BINS and STATES
w_assign -W ../${WEST} --bins-from-file BINS --states-from-file STATES -o assign.h5 --construct-dataset module.pull_data_2d --serial

Alternatively, the scheme/bins/states can be defined in the west.cfg file, which can also be passed into w_assign. In your west.cfg file, fill out or add a section for analysis:

# Settings for w_ipa, an interactive analysis program that can also automate analysis.
analysis:
   directory: ANALYSIS                # specify the directory all analysis files should exist in.
   kinetics:                          # general options for both kinetics routines.
     step_iter: 1 
     evolution: cumulative
     extra: [ 'disable-correl' ]
   analysis_schemes:                  # Analysis schemes.  Required: name, states, and bins
     AB:
       enabled: True
       bins:
         - type: RectilinearBinMapper
           boundaries: 
             - [0.0,  4, 17, 'inf']
       states:
         - label: unbound
           coords: 
             - [18]
         - label: encounter
           coords:
             - [3.9]

Create a custom function to construct the desired dataset.

#!/usr/bin/env python

import numpy

def load_distances(n_iter, iter_group):
    auxgroup1 = iter_group['auxdata/distance3']
    auxgroup2 = iter_group['auxdata/distance4']
    dataset = numpy.dstack((auxgroup1, auxgroup2))
    return dataset

Then run w_assign.

w_assign -W west.h5 --config-from-file --scheme AB --construct-dataset module.load_distances --serial

In the case where you only want to consider a single dataset, `numpy.expand_dims()` can be used to add a trailing axis so the dataset has the expected [segment][timepoint][dimension] shape.

import numpy

def load_distances(n_iter, iter_group):
    auxgroup = iter_group['auxdata/distance3']
    # Add a trailing dimension axis: [segment][timepoint] -> [segment][timepoint][1]
    dataset = numpy.expand_dims(auxgroup, axis=-1)
    return dataset