# Overview

In this notebook we will explore the functionality of an MLFlow project. Before continuing, be sure to familiarize yourself with the material in the [README](README.md) and the prerequisite notebooks listed there.

# 1. Creating A Project

## 1.1. Create A Directory
As mentioned in the [README](README.md), an MLFlow Project is a packaging format for storing arbitrary code files. As such, a project is essentially a directory of files. MLFlow also supports using a Git repository (also a directory) as an MLFlow Project.

We will use a simple directory named TestProject.

In [2]:
import os
os.listdir()

['.gitignore',
 '.ipynb_checkpoints',
 'artifacts',
 'images',
 'MLFlow Model Registry API.ipynb',
 'MLFlow Projects API.ipynb',
 'MLFLow Tracking API.ipynb',
 'mlflow.db',
 'mlruns',
 'README.md',
 'TestProject',
 'test_results.png',
 'tmp']
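
Since a project is just a directory, a skeleton can be created with a few lines of standard-library Python. The sketch below is an illustration only: it builds the layout in a temporary directory, and the helper name `create_project_skeleton` and the empty placeholder files are assumptions rather than part of this notebook.

```python
import tempfile
from pathlib import Path


def create_project_skeleton(root, name="TestProject"):
    """Create an empty project directory holding a script and an MLProject file."""
    project_dir = Path(root) / name
    project_dir.mkdir(parents=True, exist_ok=True)
    # Placeholder files only (hypothetical); real contents come in later sections.
    for filename in ("train.py", "MLProject"):
        (project_dir / filename).touch()
    return project_dir


# A temporary directory keeps the experiment away from the real TestProject.
with tempfile.TemporaryDirectory() as tmp:
    project = create_project_skeleton(tmp)
    print(sorted(p.name for p in project.iterdir()))
```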

## 1.2. Create A Script To Do ML
Next, we add a script, train.py, to the project to do the machine learning. It takes a data file and a ticker as command-line arguments:

In [12]:
with open("TestProject/train.py") as f:
    print(f.read())

import argparse

import pandas

# Parse the command-line arguments.
print("Parsing Args")
parser = argparse.ArgumentParser(description='Train a model for a single ticker.')
parser.add_argument('-d', '--data-file', help='A data file to load.', required=True)
parser.add_argument('-t', '--ticker', help='The ticker to model.', required=True)
args = parser.parse_args()

# Load the data, keep only the requested ticker, and sort by date.
print("Loading Data")
pandas_dataframe = pandas.read_csv(args.data_file)
ticker_dataframe = pandas_dataframe[pandas_dataframe["ticker"] == args.ticker]
ticker_dataframe = ticker_dataframe.sort_values(by="date", ascending=True)

print("Showing Data")
print(ticker_dataframe.head())

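We can test the script from the notebook as follows. Note that the output below also includes a training step that is not part of the listing above.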

In [22]:
! python.exe TestProject/train.py -d "..\..\..\Example Data Sets\nasdaq_2019.csv" -t AABA



Parsing Args
Loading Data
Showing Data
       ticker interval       date   open   high    low  close    volume
93620    AABA        D 2019-01-01  57.94  57.94  57.94  57.94         0
96799    AABA        D 2019-01-02  56.78  58.01  56.47  57.49  10532400
99984    AABA        D 2019-01-03  56.48  56.85  55.09  55.53   8506900
103170   AABA        D 2019-01-04  56.50  59.38  56.50  58.72   9438700
106357   AABA        D 2019-01-07  58.90  60.20  58.45  59.64   9004700
Train The Machine Learning Model
The best parameter set (0, 1, 0) yielded an AIC of 409.4944259259052
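
To see how the flags passed on the command line end up in the script, it can help to run the same kind of parser on an explicit argument list. This is a minimal sketch using only `argparse`; the `build_parser` helper and the file name `prices.csv` are hypothetical stand-ins, not part of train.py.

```python
import argparse


def build_parser():
    """Build a parser with the same two required options as train.py."""
    parser = argparse.ArgumentParser(description="Train a model for one ticker.")
    parser.add_argument("-d", "--data-file", help="A data file to load.", required=True)
    parser.add_argument("-t", "--ticker", help="The ticker to model.", required=True)
    return parser


# Passing a list instead of reading sys.argv makes the parser easy to try out.
args = build_parser().parse_args(["--data-file", "prices.csv", "-t", "AABA"])

# argparse turns the dash in --data-file into an underscore on the namespace.
print(args.data_file, args.ticker)
```

Because both options are declared with `required=True`, leaving either one out makes `parse_args` print a usage message and exit instead of continuing with missing values.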




## 1.3. Create An MLProject File
MLFlow determines how to handle the package in one of two ways. First, it looks for an [MLProject file](https://mlflow.org/docs/latest/projects.html#mlproject-file) in the root of the directory. This is a YAML file that explicitly defines the name, entry points, and environment for the project. If this file does not exist, [inferences will be made](https://mlflow.org/docs/latest/projects.html#project-directories) based on the directory and its structure. Using an MLProject file is preferable, because explicitly declaring the environment and entry points is the most reliable way to get reproducible results.

In [6]:
with open("TestProject/MLProject") as f:
    print(f.read())

name: My First MLFlow Project

docker_env:
  image: tschneider/ml-python

entry_points:
  train_my_model:
    parameters:
      data_file:
        type: string
        default: ..\..\..\..\Example Data Sets\nasdaq_2019.csv
      ticker:
        type: string
        default: AABA
    command: "python3 train.py --data-file {data_file} --ticker {ticker}"
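
The `command` entry is a template: each `{name}` placeholder is replaced by the matching parameter value before the command runs. The sketch below imitates that substitution with plain `str.format` and `shlex.quote`. It illustrates the idea and is not MLFlow's actual implementation; the `render_command` helper and the sample values are assumptions.

```python
import shlex


def render_command(template, parameters):
    """Fill {name} placeholders with shell-quoted parameter values."""
    quoted = {name: shlex.quote(str(value)) for name, value in parameters.items()}
    return template.format(**quoted)


template = "python3 train.py --data-file {data_file} --ticker {ticker}"
command = render_command(
    template,
    {"data_file": "Example Data Sets/nasdaq_2019.csv", "ticker": "AABA"},
)

# Quoting keeps the path with spaces together as a single argument.
print(command)
```

With MLFlow installed and Docker available, the entry point itself would typically be launched with something along the lines of `mlflow run TestProject -e train_my_model -P data_file=<path> -P ticker=AABA`, where each `-P` supplies one parameter value.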

