This project aims to build a simple, modular, and extensible machine learning pipeline in Python.
I recommend viewing my notebook first to get a sense of the pipeline as a whole. (You can view it directly on GitHub; if it fails to open because it contains many graphs, just reload the page. Alternatively, you can download it and view it in your own local Jupyter Notebook.) After that, you can jump to my code to see how I designed the functions used in the notebook.
The project builds a complete machine learning pipeline. Specifically, it uses a decision tree classifier, with financial distress prediction as the example. The goal of the example is to predict whether an individual will experience financial distress in the next two years.
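To make the decision tree step concrete, here is a minimal sketch of training and evaluating such a classifier. It is not the code from this repository: it assumes scikit-learn is installed, and it uses synthetic data from `make_classification` as a stand-in for the financial distress data. The feature count, `max_depth`, and split ratio are illustrative choices only.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real data (assumption): six numeric features
# and a binary label, where 1 would mean "distress within two years".
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=4, random_state=0
)

# Hold out a quarter of the rows for evaluation, keeping the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# A shallow tree keeps the model easy to inspect; max_depth=3 is arbitrary.
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X_train, y_train)

# Predict on the held-out rows and report the share predicted correctly.
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"test accuracy: {accuracy:.3f}")
```

On real data with a rare positive class, accuracy alone can be misleading, so precision and recall are usually worth checking as well.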
The pipeline project is organized as follows:
- code: The functions that make up the pipeline. It consists of four Python files, which are described below.
- notebook: An implementation of the pipeline.
- data: The data from the financial distress example.
- Please refer to explore.py and the corresponding part in the notebook.
- Please refer to feature.py and the corresponding part in the notebook.