Skip to content
Sanchi Mittal edited this page Nov 21, 2022 · 2 revisions

Batch Generator For Training Machine Learning Models (GSoC 2022)

image

Introduction

This Wiki constitutes the Final Project Report for the project under Google Summer of Code 2022.

About the Project Goals

Toolkit for Multivariate Analysis (TMVA) is a multi-purpose machine learning toolkit integrated into the ROOT scientific software framework, used in many particle physics data analysis and applications. Since it is part of the ROOT data analysis framework, it comes with an automatically generated Python interface, which closely follows the C++ interface. The goal of this project is to develop a generator in C++ and Python to read data from the ROOT I/O and input them to the Python machine learning tools such as Tensorflow/Keras and PyTorch. The main aim of the generator is to efficiently input data from the ROOT I/O system to train machine learning models, and keep in memory only the data required to train a batch of events and not all the data set.

Goals Development of a Python generator for getting the data directly from a ROOT TTree using PyROOT interfaces with Documentation Integration with ROOT RDataFrame with Documentation Development of tests and tutorial example

Clone this wiki locally