No longer actively developed.
Current release on Maven Central: 0.1.15 (0.1.16 pending).
ROOT 6.12 (and beyond) updates are not yet integrated.
An Apache Spark Data Source for the ROOT file format. Use Spark Datasets and DataFrames to process ROOT TTrees.
- Pointers are currently not well supported
- Apache Spark 2.0
- Scala 2.11
- root4j (available on Maven Central)
- Download Spark from https://spark.apache.org/downloads.html, unzip, and `cd` into Spark's directory
- Start a Scala shell:
./bin/spark-shell --packages org.diana-hep:spark-root_2.11:0.1.15
- Or start a Python shell:
./bin/pyspark --packages org.diana-hep:spark-root_2.11:0.1.15
You are ready to start analyzing.
- Download a ROOT file from http://opendata.cern.ch/record/391
- Start a Spark shell:
./bin/spark-shell --packages org.diana-hep:spark-root_2.11:0.1.15
import org.dianahep.sparkroot.experimental._
// read in the file
val df = spark.read.root("file:/Users/vk*.root")
df: org.apache.spark.sql.DataFrame = [runNumber: int, eventNumber: int ... 44 more fields]
scala> df.printSchema
root
|-- runNumber: integer (nullable = true)
|-- eventNumber: integer (nullable = true)
|-- channelNumber: integer (nullable = true)
|-- mcWeight: float (nullable = true)
|-- pvxp_n: integer (nullable = true)
|-- vxp_z: float (nullable = true)
|-- scaleFactor_PILEUP: float (nullable = true)
|-- scaleFactor_ELE: float (nullable = true)
|-- scaleFactor_MUON: float (nullable = true)
|-- scaleFactor_BTAG: float (nullable = true)
|-- scaleFactor_TRIGGER: float (nullable = true)
|-- scaleFactor_JVFSF: float (nullable = true)
|-- scaleFactor_ZVERTEX: float (nullable = true)
|-- trigE: boolean (nullable = true)
|-- trigM: boolean (nullable = true)
|-- passGRL: boolean (nullable = true)
|-- hasGoodVertex: boolean (nullable = true)
|-- lep_n: integer (nullable = true)
|-- lep_truthMatched: array (nullable = true)
| |-- element: boolean (containsNull = true)
|-- lep_trigMatched: array (nullable = true)
| |-- element: short (containsNull = true)
|-- lep_pt: array (nullable = true)
| |-- element: float (containsNull = true)
|-- lep_eta: array (nullable = true)
| |-- element: float (containsNull = true)
|-- lep_phi: array (nullable = true)
| |-- element: float (containsNull = true)
|-- lep_E: array (nullable = true)
| |-- element: float (containsNull = true)
|-- lep_z0: array (nullable = true)
| |-- element: float (containsNull = true)
|-- lep_charge: array (nullable = true)
| |-- element: float (containsNull = true)
|-- lep_type: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- lep_flag: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- lep_ptcone30: array (nullable = true)
| |-- element: float (containsNull = true)
|-- lep_etcone20: array (nullable = true)
| |-- element: float (containsNull = true)
|-- lep_trackd0pvunbiased: array (nullable = true)
| |-- element: float (containsNull = true)
|-- lep_tracksigd0pvunbiased: array (nullable = true)
| |-- element: float (containsNull = true)
|-- met_et: float (nullable = true)
|-- met_phi: float (nullable = true)
|-- jet_n: integer (nullable = true)
|-- alljet_n: integer (nullable = true)
|-- jet_pt: array (nullable = true)
| |-- element: float (containsNull = true)
|-- jet_eta: array (nullable = true)
| |-- element: float (containsNull = true)
|-- jet_phi: array (nullable = true)
| |-- element: float (containsNull = true)
|-- jet_E: array (nullable = true)
| |-- element: float (containsNull = true)
|-- jet_m: array (nullable = true)
| |-- element: float (containsNull = true)
|-- jet_jvf: array (nullable = true)
| |-- element: float (containsNull = true)
|-- jet_trueflav: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- jet_truthMatched: array (nullable = true)
| |-- element: integer (containsNull = true)
|-- jet_SV0: array (nullable = true)
| |-- element: float (containsNull = true)
|-- jet_MV1: array (nullable = true)
| |-- element: float (containsNull = true)
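The printed schema shows how spark-root maps TTree branches to DataFrame columns: per-event scalar branches (e.g. `met_et`) become primitive columns, while per-object branches (e.g. `lep_pt`) become array columns with one element per lepton or jet. A minimal pure-Python sketch of that row shape (no Spark needed; all values below are invented for illustration):

```python
# One ROOT TTree entry becomes one DataFrame row: scalar branches map to
# plain fields, per-object branches map to variable-length arrays.
event = {
    "runNumber": 167776,            # scalar branch -> int column (invented value)
    "met_et": 21543.2,              # scalar branch -> float column
    "lep_n": 2,                     # number of leptons in this event
    "lep_pt": [44300.1, 28750.6],   # per-lepton branch -> array column
    "lep_charge": [1.0, -1.0],      # aligned with lep_pt: index i = i-th lepton
}

# The array columns are aligned, so zipping them reconstructs per-lepton objects.
leptons = list(zip(event["lep_pt"], event["lep_charge"]))
print(leptons)  # -> [(44300.1, 1.0), (28750.6, -1.0)]
```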
// show the first 20 entries (show's default) for the column vxp_z
scala> df.select("vxp_z").show
17/12/14 12:09:31 WARN util.Utils: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.debug.maxToStringFields' in SparkEnv.conf.
+----------+
| vxp_z|
+----------+
|-12.316585|
| 22.651913|
| 67.00033|
| 25.114586|
| 5.419942|
|-29.495798|
| -41.16512|
|-34.865147|
| -34.4249|
| -43.56932|
|-10.688576|
|-75.542625|
| 24.876757|
|-21.913155|
| -66.78894|
| -48.26993|
| 56.751675|
|-41.679707|
| 4.6949086|
| 23.715324|
+----------+
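`select` projects the named column out of every row. The same operation expressed on plain Python rows (a sketch; the mock rows below are invented stand-ins for the DataFrame):

```python
# Invented rows standing in for DataFrame rows.
rows = [
    {"vxp_z": -12.3, "lep_n": 2},
    {"vxp_z": 22.7, "lep_n": 1},
    {"vxp_z": 67.0, "lep_n": 2},
]

# df.select("vxp_z") keeps only that column from each row.
vxp_z = [r["vxp_z"] for r in rows]
print(vxp_z)  # -> [-12.3, 22.7, 67.0]
```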
// count the number of rows (events) when there are 2 leptons
scala> df.where("lep_n == 2").count
res2: Long = 647126
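`where` filters rows with a SQL predicate before `count` tallies the survivors. The equivalent logic on plain Python rows (a sketch; the events below are invented, not taken from the open-data file):

```python
# Invented events standing in for DataFrame rows.
events = [
    {"lep_n": 2, "met_et": 15000.0},
    {"lep_n": 1, "met_et": 32000.0},
    {"lep_n": 2, "met_et": 8000.0},
    {"lep_n": 3, "met_et": 41000.0},
]

# df.where("lep_n == 2").count: keep rows passing the predicate, then count them.
n_dilepton = sum(1 for e in events if e["lep_n"] == 2)
print(n_dilepton)  # -> 2
```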
- Image Conversion Pipeline: https://github.com/vkhristenko/MPJRPipeline/blob/master/ipynb/convert2images_python.ipynb
- Feature Engineering: https://github.com/vkhristenko/MPJRPipeline/blob/master/ipynb/preprocessing_python_noudfs.ipynb
- Public CMS Open Data Example: https://github.com/diana-hep/spark-root/blob/master/ipynb/publicCMSMuonia_exampleAnalysis_wROOT.ipynb