An ADAM extension library for loading .vcf files annotated with SnpEff and SnpSift.

adam-fx

A Scala library extending ADAM and BDG-formats to load .vcf files annotated with SnpEff.

[WARNING: this library is still under heavy development. Expect versions to break compatibility.]

Get the Maven artifact

Artifacts are published to Bintray.

SBT
resolvers += "bintray-tmoerman" at "http://dl.bintray.com/tmoerman/maven"

libraryDependencies += "org.tmoerman" %% "adam-fx" % "0.5.5"

Spark Notebook
:remote-repo bintray-tmoerman % default % http://dl.bintray.com/tmoerman/maven % maven

:dp org.tmoerman % adam-fx_2.10 % 0.5.5

Zeppelin
%dep

z.addRepo("bintray-tmoerman").url("http://dl.bintray.com/tmoerman/maven")

z.load("org.tmoerman:adam-fx_2.10:0.5.5")

Data model

The AnnotatedVariant and AnnotatedGenotype classes are the "connector" types between the ADAM types and the SnpEffAnnotations.

The class diagrams below are distilled from the Java classes generated from the Avro schema definition.

Overview:

Class diagram

With properties:

Class diagram

Usage

Kryo

adam-fx provides its own KryoRegistrator, which extends the ADAMKryoRegistrator to register the additional Avro data types. Set it when initializing a SparkConf.

val conf = new SparkConf()
    .setAppName("Test")
    .setMaster("local[*]")
    .set("spark.kryo.registrator", "org.tmoerman.adam.fx.serialization.AdamFxKryoRegistrator")
    .set("spark.kryo.referenceTracking", "true")
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    
val sc = new SparkContext(conf)

SnpEffContext

Instantiate a SnpEffContext, passing it a SparkContext.

In a notebook setting you may want to use the @transient annotation in order to prevent serialization issues.

import org.tmoerman.adam.fx.snpeff.SnpEffContext

@transient val ec = new SnpEffContext(sc)

Alternatively, import the implicit conversions and call the loading methods directly on an already instantiated SparkContext.

import org.tmoerman.adam.fx.snpeff.SnpEffContext._

Loading data

Loading Variants with SnpEffAnnotations:

val annotatedVariants: RDD[AnnotatedVariant] = sc.loadAnnotatedVariants(annotatedVcf)

Or Genotypes with SnpEffAnnotations:

val annotatedGenotypes: RDD[AnnotatedGenotype] = sc.loadAnnotatedGenotypes(annotatedVcf)
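
Putting it together

The steps above can be combined into one small application. This is a minimal sketch, not a definitive implementation: it assumes Spark and adam-fx are on the classpath, the object name LoadAnnotatedVcf and the fallback input path are hypothetical, and the result types are left to type inference so that no extra imports are needed.

```scala
import org.apache.spark.{SparkConf, SparkContext}
// Implicit conversions from SparkContext, as described in the SnpEffContext section.
import org.tmoerman.adam.fx.snpeff.SnpEffContext._

// Hypothetical driver object, for illustration only.
object LoadAnnotatedVcf {
  def main(args: Array[String]): Unit = {
    // Hypothetical placeholder path; pass a SnpEff-annotated .vcf file as the first argument.
    val annotatedVcf = args.headOption.getOrElse("/path/to/annotated.vcf")

    // Same Kryo settings as in the Kryo section.
    val conf = new SparkConf()
      .setAppName("adam-fx-example")
      .setMaster("local[*]")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryo.registrator", "org.tmoerman.adam.fx.serialization.AdamFxKryoRegistrator")
      .set("spark.kryo.referenceTracking", "true")

    val sc = new SparkContext(conf)
    try {
      // Both loaders read the same annotated file.
      val annotatedVariants  = sc.loadAnnotatedVariants(annotatedVcf)
      val annotatedGenotypes = sc.loadAnnotatedGenotypes(annotatedVcf)

      // count() forces evaluation, so parsing or serialization problems surface here.
      println(s"annotated variants:  ${annotatedVariants.count()}")
      println(s"annotated genotypes: ${annotatedGenotypes.count()}")
    } finally {
      sc.stop() // release local Spark resources even if loading fails
    }
  }
}
```

In a notebook, where a SparkContext usually already exists, only the import and the two load calls would be needed.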