Calculate SMD, Hamming and Jaccard distances between each pair of samples in a set of variant files.
The following software is required to build the tool:
- Reasonably new compilers for C and C++, e.g. GCC 7. C++17 support is required.
- GNU gengetopt (tested with version 2.22.6)
- Ragel State Machine Compiler (tested with version 6.7)
- CMake
- Boost
- libbio (provided as a Git submodule)

On Linux, the following libraries are also required:
- libdispatch (provided as a Git submodule)
- libpthread_workqueue (provided as a Git submodule)
- libkqueue
To build the tool:
- Clone the repository with
      git clone https://github.com/tsnorri/vcfdistances.git
- Change the working directory with
      cd vcfdistances
- Run
      git submodule update --init --recursive
  This clones the missing submodules and updates their working trees.
- Edit local.mk in the repository root to override build variables. Useful variables include CC, CXX, GENGETOPT and RAGEL for the C and C++ compilers, gengetopt and Ragel respectively. BOOST_INCLUDE is used as preprocessor flags when Boost is required. BOOST_LIBS and LIBDISPATCH_LIBS are passed to the linker. See common.mk for additional variables.
- Run make with a suitable number of parallel jobs, e.g.
      make -j4
Useful make targets include:
- all: Build everything.
- clean: Remove build products except for dependencies (in the lib folder).
- clean-all: Remove all build products.
The tool takes one or more Variant Call Format (VCF) files as its input. It reads the variants from the first file and makes pairwise comparisons between all of its samples. The remaining files are processed similarly. The requested distances are output as triangular matrices.
Please see the output of src/vcfdistances --help for command line options.