Jupyter Notebooks are wonderful because they let you share code, explanations, and visualizations in the same place, adding narrative to computation. Cells compartmentalize steps and facilitate data analysis, making notebooks an invitation to experiment. If you want to perform analytics or data science tasks on your time series data, a notebook is a great place to start.
Analyze your time series data and experiment with forecasting and anomaly detection algorithms using Jupyter Notebook tutorials (.ipynb files) and corresponding sample data (.csv files). We include tutorials and sample data for the following topics:
- How to get started with InfluxDB Cloud powered by IOx and Pandas with Flight SQL
- Anomaly detection:
  - For multiple time series, including BIRCH, KMEANS, and Median Absolute Deviation (MAD); a minimal MAD sketch follows this list
  - For single time series, including Autoregression, LevelShiftAD, and SeasonalAD
- Forecasting, including FBProphet, LSTM with Keras, and statsmodels' Holt's Method
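As a taste of the anomaly detection material above, here is a minimal sketch of MAD-based outlier flagging on a pandas Series. The threshold and synthetic data are illustrative and not taken from the notebooks.

```python
# A minimal sketch of Median Absolute Deviation (MAD) anomaly detection on a
# pandas Series. The threshold and synthetic data are illustrative only.
import numpy as np
import pandas as pd

def mad_anomalies(series: pd.Series, threshold: float = 3.5) -> pd.Series:
    """Return a boolean mask marking points whose modified z-score exceeds the threshold."""
    median = series.median()
    mad = (series - median).abs().median()
    # 0.6745 scales the MAD so the score is comparable to a standard z-score.
    modified_z = 0.6745 * (series - median) / mad
    return modified_z.abs() > threshold

# Example usage with synthetic data:
values = pd.Series(np.random.normal(0, 1, 500))
values.iloc[100] = 12  # inject an obvious outlier
print(values[mad_anomalies(values)])
```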
These instructions are written for InfluxDB OSS 2.0 (release candidate 0 and later) or InfluxDB Cloud. If you're using InfluxDB Cloud, make sure to change your URL accordingly.
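For example, a minimal sketch using the influxdb-client Python library (not taken from the notebooks; the token, org, and bucket values are placeholders) shows where the URL changes between OSS and Cloud:

```python
# Minimal sketch: the only difference between OSS and Cloud here is the URL.
# Token, org, and bucket values are placeholders.
from influxdb_client import InfluxDBClient

# InfluxDB OSS default:
url = "http://localhost:8086"
# InfluxDB Cloud: use your region's endpoint instead, e.g.
# url = "https://us-west-2-1.aws.cloud2.influxdata.com"

client = InfluxDBClient(url=url, token="my-token", org="my-org")
query_api = client.query_api()

# query_data_frame returns a DataFrame (or a list of DataFrames for multiple tables).
df = query_api.query_data_frame('from(bucket:"my-bucket") |> range(start: -1h)')
print(df)
```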
Python installations can get a bit tricky: different versions of the language, and projects that require different versions of installed libraries, can quickly lead to conflicts. Using a virtual environment is recommended, and additional tooling like virtualenv or pyenv may be useful.
Run `pip install -r requirements.txt` inside your virtual environment to install all of the necessary dependencies.
After cloning this repo, run Jupyter Notebook locally with `jupyter notebook`. This should direct you to the web application, which runs on http://localhost:8888 by default.
After you analyze your time series data with notebooks and select the forecasting or anomaly detection approach that works for you, it's time to implement your solution in production. The following resources could be useful in that next step:
- Using the `http.post()` function in a task together with a serverless compute solution (such as AWS Lambda) to run your code; a minimal handler sketch follows this list.
- Using the Telegraf Execd processor plugin to run an external program. See this example of Machine Learning with the Telegraf Execd processor plugin for more details; a bare-bones execd sketch also follows this list.
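For the serverless route, here is a minimal sketch of an AWS Lambda handler that could sit behind an HTTP endpoint and receive the payload a task sends with `http.post()`. The payload shape, field names, and threshold check are assumptions for illustration, not the repo's approach.

```python
# Minimal sketch of an AWS Lambda handler receiving data posted by an InfluxDB
# task via http.post(). The JSON payload shape and the naive threshold check
# are illustrative assumptions.
import json

def handler(event, context):
    body = json.loads(event.get("body", "{}"))
    values = body.get("values", [])

    # Placeholder for your forecasting or anomaly detection logic.
    flagged = [v for v in values if abs(v) > 3]

    return {
        "statusCode": 200,
        "body": json.dumps({"anomalies": flagged}),
    }
```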
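And for the Telegraf route, a bare-bones sketch of an execd processor program: Telegraf writes each metric to the program's stdin in line protocol and reads (possibly modified) metrics back from stdout. The pass-through logic below is a placeholder, not the approach from the linked example.

```python
# Bare-bones sketch of an external program for the Telegraf execd processor
# plugin: read line protocol from stdin, write (possibly annotated) metrics to
# stdout. The pass-through logic is a placeholder for a real model.
import sys

def main():
    for line in sys.stdin:
        metric = line.rstrip("\n")
        if not metric:
            continue
        # Insert scoring/annotation logic here; this sketch passes metrics through.
        sys.stdout.write(metric + "\n")
        sys.stdout.flush()

if __name__ == "__main__":
    main()
```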
This repo is just a sample of the many algorithms, approaches, and tools for time series forecasting and anomaly detection. Here are additional ML solutions that might interest you:
- STUMPY: A powerful library that efficiently computes the matrix profile of a time series, which can be used for a variety of time series data mining tasks (see the short sketch after this list).
- scikit-multiflow: A machine learning package for streaming data in Python, especially apt for clustering.
- InfluxDB Interpreter for Apache Zeppelin: Apache Zeppelin is another web-based notebook, similar to Jupyter, with built-in Spark integration. Together, Zeppelin and the InfluxDB interpreter enable easy access to and parallelization of large volumes of time series data for quick analysis.
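As a quick taste of STUMPY, here is a short sketch of computing a matrix profile; the window size and synthetic data are illustrative only.

```python
# Short sketch of a STUMPY matrix profile on synthetic data; window size and
# data are illustrative only.
import numpy as np
import stumpy

ts = np.random.normal(0, 1, 1000)
m = 50  # subsequence (window) length
mp = stumpy.stump(ts, m)

# The first column holds the matrix profile values: large values suggest
# discords (potential anomalies), small values suggest repeated motifs.
profile = mp[:, 0].astype(float)
print("Most anomalous subsequence starts at index", int(np.argmax(profile)))
```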