ts-flint - Time Series Library for PySpark

ts-flint is a collection of modules related to time series analysis for PySpark.

You can install the Python package by running one of the following in the top level of this repo:

python setup.py install
# -or-
pip install .

To build the Flint assembly jar from a clone of this repo, run:

make dist

This will create a jar under target/scala-2.11/flint-assembly-{VERSION}-SNAPSHOT.jar
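Because {VERSION} changes between builds, it can help to look the jar up rather than hard-code its name. The following is a minimal sketch, assuming the target/scala-2.11 layout mentioned above; the helper name find_assembly_jars is hypothetical and not part of ts-flint.

```python
from pathlib import Path


def find_assembly_jars(repo_root="."):
    """Return Flint assembly jars found under repo_root, newest first.

    Hypothetical helper: it only assumes the output path that
    `make dist` is described as producing in this README.
    """
    target = Path(repo_root) / "target" / "scala-2.11"
    jars = target.glob("flint-assembly-*-SNAPSHOT.jar")
    # Sort by modification time so the most recent build comes first.
    return sorted(jars, key=lambda p: p.stat().st_mtime, reverse=True)


# Prints an empty list if nothing has been built yet.
print([str(p) for p in find_assembly_jars(".")])
```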

Running with PySpark

You can use ts-flint with PySpark by passing the assembly jar to both --jars and --py-files:

pyspark --jars /path/to/flint-assembly-{VERSION}-SNAPSHOT.jar --py-files /path/to/flint-assembly-{VERSION}-SNAPSHOT.jar
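If you launch PySpark from a script, the same command line can be assembled as an argument list. This is only a sketch: pyspark_command is a hypothetical helper, and it uses just the --jars, --py-files and --master options that appear in this README.

```python
def pyspark_command(jar_path, master=None):
    """Build the pyspark argument list for a given Flint assembly jar.

    The jar is passed twice: --jars puts it on the JVM classpath and
    --py-files makes the Python package inside it importable.
    """
    args = ["pyspark"]
    if master is not None:
        args.append("--master=" + master)
    args += ["--jars", jar_path, "--py-files", jar_path]
    return args


# {VERSION} is left as a placeholder, exactly as in the README.
cmd = pyspark_command("/path/to/flint-assembly-{VERSION}-SNAPSHOT.jar")
print(cmd)
```

The resulting list can be handed to subprocess.run(cmd) once the placeholder path is replaced with a real jar.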


Once the shell is up, you can check where the ts.flint package was loaded from:

>>> import os
>>> import ts.flint
>>> ts.flint.__file__[len(os.getcwd()):]
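The same check can be done without raising an error when the package is missing. The sketch below uses the standard library's importlib.util.find_spec; locate_module is a hypothetical helper name, not something ts-flint provides.

```python
import importlib.util


def locate_module(name):
    """Return the origin (usually a file path) of a module, or None.

    find_spec raises ModuleNotFoundError when a parent package of a
    dotted name is missing, so that case is mapped to None as well.
    """
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        return None
    return spec.origin if spec is not None else None


# Prints None when ts.flint is not importable in this interpreter.
print(locate_module("ts.flint"))
```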

Running in a notebook

You can also run ts-flint from within a Jupyter notebook. First, create a virtualenv or conda environment containing pandas and jupyter.

conda create -n flint python=3.5 pandas notebook
source activate flint


Make sure pyspark is in your PATH. Then, from the flint project dir, start pyspark with the following options:

export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook --NotebookApp.open_browser=False --NotebookApp.ip='$(hostname)' --NotebookApp.port=8888"
pyspark --master=local --jars /path/to/flint-assembly-{VERSION}-SNAPSHOT.jar --py-files /path/to/flint-assembly-{VERSION}-SNAPSHOT.jar

Then visit http://<hostname>:8888, where <hostname> is the output of the hostname command on that machine.
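The notebook settings can also be prepared from Python instead of shell exports. This is a rough sketch, assuming Spark's standard PYSPARK_DRIVER_PYTHON variable is set to jupyter so that the driver starts a notebook server; notebook_env is a hypothetical helper, and socket.gethostname() stands in for $(hostname).

```python
import os
import socket


def notebook_env(port=8888, ip=None, base_env=None):
    """Return a copy of the environment with the notebook settings added."""
    env = dict(os.environ if base_env is None else base_env)
    host = socket.gethostname() if ip is None else ip
    # Assumption: pyspark runs this program as the driver front end.
    env["PYSPARK_DRIVER_PYTHON"] = "jupyter"
    # Same option string as the shell export in this README.
    env["PYSPARK_DRIVER_PYTHON_OPTS"] = (
        "notebook --NotebookApp.open_browser=False "
        "--NotebookApp.ip='{}' --NotebookApp.port={}".format(host, port)
    )
    return env


env = notebook_env()
print(env["PYSPARK_DRIVER_PYTHON_OPTS"])
```

Passing env=env to subprocess.run together with the pyspark arguments would then start the notebook server on the chosen port.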


The Flint python bindings are documented at