This project predicts high-potential customers using a Tabular Neural Network model and Anomaly Detection techniques. The model is trained on historical user purchase and engagement data and leverages AutoGluon for efficient hyperparameter tuning. The project includes scripts for training the model (train.py) and performing inference (infer.py) to identify users likely to become high-potential customers.
- train.py: Script to preprocess data, train the Tabular Neural Network and Anomaly Detection models, tune hyperparameters, and save the best-performing model along with the scalers and label encoders.
- infer.py: Script to load the saved model, preprocess new data, and predict high-potential users.
- amc_sql.sql: SQL query to pull historical data for training & inference from AMC.
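For reference, here is a minimal sketch of the kind of AutoGluon tabular training that train.py performs (the label column name `high_potential` and the eval metric are illustrative placeholders, not necessarily the script's exact settings):

```python
# Minimal AutoGluon tabular training sketch (illustrative, not the exact train.py).
from autogluon.tabular import TabularDataset, TabularPredictor

# Load the training CSV produced by the SQL pull (path per the directory layout below).
train_data = TabularDataset("ml/input/data/train/Training Data.csv")

predictor = TabularPredictor(
    label="high_potential",   # hypothetical binary label column
    path="ml/model",          # where trained artifacts are saved
    eval_metric="roc_auc",    # illustrative choice for a ranking use case
).fit(train_data, presets="best_quality")  # AutoGluon handles hyperparameter tuning
```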
Ensure you have the following installed:
- Python 3.7+
- Docker Desktop

Install Docker Desktop: Download and install Docker Desktop from the official Docker website. This application allows you to build, share, and run containerized Python code in your local environment. It provides a streamlined way to manage Docker containers and images, making it easier to develop, test, and deploy applications in a consistent environment.
Directory Structure:
User Churn Prediction/
│
├── ml/
│   ├── code/
│   │   ├── infer/
│   │   │   ├── Dockerfile
│   │   │   ├── infer.py
│   │   │   ├── push_docker.cmd
│   │   │   ├── requirements.txt
│   │   │   ├── test_docker.sh
│   │   │   ├── test_docker.cmd
│   │   │   ├── push_docker.sh
│   │   │   ├── run_docker.sh
│   │   │   └── run_docker.cmd
│   │   │
│   │   └── train/
│   │       ├── Dockerfile
│   │       ├── push_docker.cmd
│   │       ├── push_docker.sh
│   │       ├── requirements.txt
│   │       ├── run_docker.cmd
│   │       ├── run_docker.sh
│   │       └── train.py
│   │
│   ├── input/
│   │   └── data/
│   │       ├── infer/
│   │       │   └── {Inference Data.csv}
│   │       └── train/
│   │           └── {Training Data.csv}
│   │
│   ├── model/
│   └── output/
│       └── data/
│           ├── audiences/
│           └── log/
│
└── SQL/
    └── amc_sql.sql
- Use the training.sql and inference.sql queries in the AMC Sandbox environment. Ensure that the _for_audiences suffix is removed during synthetic data generation. Apply different time windows for the training and inference data, as illustrated below.
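The intended difference between the two pulls is only the date range; a generic illustration (table and column names are hypothetical, and this is standard SQL rather than AMC-specific syntax):

```sql
-- Hypothetical illustration of differing time windows; adapt to the real queries.
-- Training pull: a longer historical window.
SELECT user_id, purchases, engagement_score
FROM user_activity
WHERE event_dt BETWEEN '2023-01-01' AND '2023-12-31';

-- Inference pull: a recent window for scoring.
SELECT user_id, purchases, engagement_score
FROM user_activity
WHERE event_dt BETWEEN '2024-01-01' AND '2024-03-31';
```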
- Once the synthetic data is available, save it in CSV format to the following paths:
  Training: /ml/input/data/train/
  Inference: /ml/input/data/infer/
- Make sure Docker Desktop is started and running properly.
- If there are any issues, restart Docker Desktop before proceeding.
- {Optional} If needed, update train.py and infer.py to fit your use case.
- Update run_docker.cmd in both the train and infer sections of the code with the appropriate paths from your local directory.
- Update push_docker.cmd in both the train and infer sections of the code with the appropriate AWS Account IDs.
- Open a CLI or terminal and navigate to the train section of the code:
  cd /ml/code/train/
- Run run_docker.cmd (or run_docker.sh if you are using a Mac) to train the model and save it locally:
  run_docker.cmd
  ./run_docker.sh
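If you need to adapt the training run script, this sketch shows the usual shape of such a script; the image name and host path are assumptions, so check the actual run_docker.sh in /ml/code/train/:

```sh
#!/bin/bash
# Hypothetical sketch of a training run script; verify against the real run_docker.sh.
# Build the training image from the local Dockerfile.
docker build -t high-potential-train .

# Run the container, mounting the local ml/ directory so the trained model
# and logs written inside the container land on the host.
docker run --rm \
  -v /path/to/your/local/ml:/opt/ml \
  high-potential-train
```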
- Once satisfied with the execution of the training container, run push_docker.cmd (or push_docker.sh if you are using a Mac) to push the container to an ECR repository:
  push_docker.cmd
  ./push_docker.sh
  NOTE: If you get errors while running a .sh file, use chmod to make it executable, as shown below.
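For example, from the script's directory:

```sh
# Mark the helper scripts as executable.
chmod +x run_docker.sh push_docker.sh test_docker.sh
```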
- Open a CLI or terminal and navigate to the infer section of the code:
  cd /ml/code/infer/
- Run run_docker.cmd (or run_docker.sh if you are using a Mac) to start the webserver:
  run_docker.cmd
  ./run_docker.sh
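The inference run script differs from the training one in that it keeps a webserver running; a sketch assuming the container serves HTTP on port 8080 (the port and image name are assumptions, so verify against the real run_docker.sh):

```sh
#!/bin/bash
# Hypothetical sketch of an inference run script; verify against the real run_docker.sh.
docker build -t high-potential-infer .

# Publish the webserver port so the test script can reach the container.
docker run --rm \
  -p 8080:8080 \
  -v /path/to/your/local/ml:/opt/ml \
  high-potential-infer
```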
- Once the webserver is live, run test_docker.cmd (or test_docker.sh if you are using a Mac) to generate inferences from the trained model and save them locally:
  test_docker.cmd
  ./test_docker.sh
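A test script of this kind typically just posts the inference CSV to the running container; the sketch below assumes a SageMaker-style /invocations endpoint on port 8080 (both are assumptions, so check test_docker.sh for the real request):

```sh
# Hypothetical test request; verify the endpoint and port against test_docker.sh.
curl -X POST \
  -H "Content-Type: text/csv" \
  --data-binary @"/path/to/your/local/ml/input/data/infer/Inference Data.csv" \
  -o predictions.csv \
  http://localhost:8080/invocations
```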
- Once satisfied with the execution of the inference container, run push_docker.cmd (or push_docker.sh if you are using a Mac) to push the container to an ECR repository:
  push_docker.cmd
  ./push_docker.sh
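For reference, a push script along these lines is typical; the region, account ID, and repository name are placeholders to replace, matching the AWS Account ID edit mentioned earlier:

```sh
#!/bin/bash
# Hypothetical sketch of pushing an image to ECR; verify against the real push_docker.sh.
REGION=us-east-1            # placeholder region
ACCOUNT_ID=123456789012     # placeholder AWS Account ID
REPO=high-potential-infer   # placeholder ECR repository / local image name

# Authenticate Docker with ECR, then tag and push the image.
aws ecr get-login-password --region "$REGION" | \
  docker login --username AWS --password-stdin "$ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com"
docker tag "$REPO:latest" "$ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com/$REPO:latest"
docker push "$ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com/$REPO:latest"
```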
train.py performs the following steps (a short sketch follows this list):
- Preprocesses Data: Scales numerical features and handles missing data.
- Trains a Tabular Neural Network: Uses AutoGluon to optimize hyperparameters and improve performance.
- Saves the Model: Stores the best-performing model, scalers, and other preprocessing artifacts for use during inference.
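A minimal sketch of the preprocess-and-save pattern described above (the scaler choice and artifact file names are assumptions, not the script's exact artifacts):

```python
# Illustrative preprocessing and artifact saving; names are hypothetical.
import joblib
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("ml/input/data/train/Training Data.csv")

# Handle missing data: fill numeric gaps with the column median.
numeric_cols = df.select_dtypes("number").columns
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())

# Scale numerical features and keep the fitted scaler for reuse at inference time.
scaler = StandardScaler()
df[numeric_cols] = scaler.fit_transform(df[numeric_cols])
joblib.dump(scaler, "ml/model/scaler.joblib")
```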
infer.py performs the following steps (a short sketch follows this list):
- Loads the Model: Retrieves the trained model and preprocessing artifacts.
- Preprocesses New Data: Ensures the new data is consistent with the training data.
- Generates Predictions: Outputs probabilities indicating each user's likelihood of being a high-potential customer.
- Sorts Results: Ranks users by their high-potential probability.
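A minimal sketch of that flow with AutoGluon (the paths and output location are assumptions, so verify against infer.py):

```python
# Illustrative inference flow; verify paths and column names against infer.py.
from autogluon.tabular import TabularDataset, TabularPredictor

predictor = TabularPredictor.load("ml/model")  # trained model + preprocessing artifacts
new_data = TabularDataset("ml/input/data/infer/Inference Data.csv")

# Probability of each user being a high potential.
proba = predictor.predict_proba(new_data)
new_data["high_potential_proba"] = proba.iloc[:, -1]  # assumes positive class is the last column

# Rank users from most to least likely and save the result.
ranked = new_data.sort_values("high_potential_proba", ascending=False)
ranked.to_csv("ml/output/data/audiences/predictions.csv", index=False)
```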
Notes (a snippet illustrating the threshold and validation points follows this list):
- SQL Query Consistency: Use the same SQL query for training and inference, but apply different time windows.
- Prediction Threshold: Adjust the threshold for classifying high-probability high-potential customers based on your business needs.
- Data Validation: Ensure input data does not contain NaN or infinite values before running the scripts.
- Performance Optimization: Use a GPU for faster processing, or adjust the batch size and number of epochs for large datasets.
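For the threshold and validation notes above, a quick illustration (the 0.5 cutoff, file paths, and column name are placeholders):

```python
# Illustrative input validation and threshold application; names are hypothetical.
import numpy as np
import pandas as pd

# Data validation: fail fast on NaN or infinite values before scoring.
df = pd.read_csv("ml/input/data/infer/Inference Data.csv")
numeric = df.select_dtypes("number")
assert not numeric.isna().any().any(), "Input contains NaN values"
assert np.isfinite(numeric.to_numpy()).all(), "Input contains infinite values"

# Prediction threshold: flag users above a business-chosen cutoff.
scored = pd.read_csv("ml/output/data/audiences/predictions.csv")  # output of the infer.py sketch
threshold = 0.5  # starting point only; tune to your business needs
scored["is_high_potential"] = scored["high_potential_proba"] >= threshold
```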
Future Enhancements:
- Experiment with additional models, such as Random Forests or Gradient Boosting, for comparison.
- Introduce advanced feature engineering techniques to capture deeper insights.