# AutoGluon Assistant - Quick Start

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://github.com/autogluon/autogluon-assistant)
[![Open In SageMaker Studio Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://github.com/autogluon/autogluon-assistant)

(Links above are still WIP)

In this tutorial, we will see how to use AutoGluon Assistant (AG-A) to solve machine learning problems **with zero lines of code**. AG-A combines AutoGluon's state-of-the-art AutoML capabilities with Large Language Models (LLMs) to automate the entire data science pipeline.

We will cover:
- Setting up AutoGluon Assistant
- Preparing your data
- Running your first ML project using a toy version of the Titanic dataset
- Understanding the output and predictions
- Customizing the configuration for better results

By the end of this tutorial, you'll be able to transform your data and problem description into highly accurate ML solutions using just natural language instructions. Let's get started with the installation!

## Setting up AutoGluon Assistant
Getting started with AutoGluon Assistant is straightforward. Let's install it directly using pip:

In [None]:
!pip install "git+https://github.com/autogluon/autogluon-assistant.git#egg=autogluon-assistant[dev]"

AutoGluon Assistant supports two LLM providers: AWS Bedrock (default) and OpenAI. Choose one of the following setups:

In [4]:
import os

# Option A: AWS Bedrock (Recommended)
os.environ['BEDROCK_API_KEY'] = '4509...'
os.environ['AWS_DEFAULT_REGION'] = '<your-region>'
os.environ['AWS_ACCESS_KEY_ID'] = '<your-access-key>'
os.environ['AWS_SECRET_ACCESS_KEY'] = '<your-secret-key>'

### OR (keep only one of the two options) ###

# Option B: OpenAI
os.environ['OPENAI_API_KEY'] = 'sk-...'

*Note: If using OpenAI, we recommend a paid API key rather than a free-tier account to avoid rate limiting issues.*
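
Before launching a run, it can help to confirm that the credentials for at least one provider are actually present. The following is a minimal sketch, not part of AutoGluon Assistant: the helper name `configured_providers` is hypothetical, and it assumes that every variable shown in the setup cell above must be set for the corresponding provider.

```python
import os

# Variable names copied from the setup cell above. Treating all of them as
# required is an assumption made for this sketch.
BEDROCK_VARS = (
    "BEDROCK_API_KEY",
    "AWS_DEFAULT_REGION",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
)
OPENAI_VARS = ("OPENAI_API_KEY",)


def configured_providers(env=None):
    """Return the providers whose variables are all set and non-empty."""
    env = os.environ if env is None else env
    providers = []
    if all(env.get(name) for name in BEDROCK_VARS):
        providers.append("bedrock")
    if all(env.get(name) for name in OPENAI_VARS):
        providers.append("openai")
    return providers


# Hypothetical helper: lists which provider setups look complete.
print(configured_providers())
```

An empty list means neither option has been fully configured in the current session.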

Let's verify the installation by importing the package:

In [None]:
import autogluon_assistant
print(autogluon_assistant.__version__)


Now that you have AutoGluon Assistant installed and configured, let's move on to preparing your data directory structure for your first ML project!

## Example Data

For this tutorial, we'll use the classic Titanic dataset, which is well suited to getting started with machine learning. The goal is to predict whether a passenger survived based on characteristics such as age, gender, ticket class, and other features. We sampled 1000 training and test examples from the original data. The sampled dataset makes this tutorial run quickly, but AutoGluon Assistant can handle the full dataset if desired.

In [None]:
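import os
import requests

# Create the data directory and download the example files
os.makedirs("./toy_data", exist_ok=True)
base_url = "https://raw.githubusercontent.com/autogluon/autogluon-assistant/main/toy_data"
for f in ["train.csv", "test.csv", "descriptions.txt"]:
    response = requests.get(f"{base_url}/{f}", timeout=30)
    response.raise_for_status()  # fail loudly instead of saving an error page
    with open(f"toy_data/{f}", "wb") as fh:
        fh.write(response.content)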

That's it! We now have:

- `train.csv`: Training data with labeled examples
- `test.csv`: Test data for making predictions
- `descriptions.txt`: A description of the dataset and task
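
A quick sanity check on the data directory can catch a failed or partial download early. The sketch below uses only the standard library. The helper name `missing_files` is hypothetical, and treating these three files as the expected set is an assumption based on this tutorial's example, not a documented requirement of AutoGluon Assistant.

```python
from pathlib import Path

# File names used in this tutorial's toy dataset.
EXPECTED_FILES = ("train.csv", "test.csv", "descriptions.txt")


def missing_files(data_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are absent or empty in data_dir."""
    root = Path(data_dir)
    return [
        name
        for name in expected
        if not (root / name).is_file() or (root / name).stat().st_size == 0
    ]


# Hypothetical helper: an empty list means all three files are in place.
print(missing_files("./toy_data"))
```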

Let's take a quick look at our training data:

In [None]:
import pandas as pd
train_data = pd.read_csv("toy_data/train.csv")
train_data.head()

## Using AutoGluon Assistant

Now that we have our data ready, let's use AutoGluon Assistant to build our ML model. The simplest way to use AutoGluon Assistant is through the command line - no coding required! After installing the package, you can run it directly from your terminal:

In [None]:
#TODO: remove the requirement of config files
!autogluon-assistant ./toy_data
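
Outside a notebook, the same command can be launched from a script. The sketch below is one possible wrapper using the standard library's `subprocess` module. The helper names `build_command` and `run_assistant` are hypothetical, and the only command-line form it relies on is the one shown in the cell above.

```python
import shutil
import subprocess


def build_command(data_dir):
    """Build the same command line used in the notebook cell above."""
    return ["autogluon-assistant", str(data_dir)]


def run_assistant(data_dir):
    """Run the CLI; return its exit code, or None if it is not on PATH."""
    command = build_command(data_dir)
    if shutil.which(command[0]) is None:
        return None
    # check=False: hand the exit code back to the caller instead of raising.
    return subprocess.run(command, check=False).returncode


# Show the command without executing it.
print(build_command("./toy_data"))
```

Calling `run_assistant("./toy_data")` would then execute the run and return the process exit code.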

Let's also look at how to use AutoGluon Assistant programmatically in Python:

In [None]:
from autogluon_assistant import AutogluonAssistant

# Initialize the assistant
assistant = AutogluonAssistant()

# Run the assistant
output_file = assistant.predict(data_dir="./toy_data")

Model fitting should take a few minutes or less depending on your CPU. You can make training faster by specifying the `time_limit` argument. For example, `fit(..., time_limit=60)` will stop training after 60 seconds. Higher time limits will generally result in better prediction performance, and excessively low time limits will prevent AutoGluon from training and ensembling a reasonable set of models.



## Prediction

Once we have a predictor that is fit on the training dataset, we can load a separate set of data to use for prediction and evaluation.

In [None]:
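from autogluon.tabular import TabularDataset

# Assumes `predictor`, `label`, and `data_url` are already defined,
# e.g. from a prior TabularPredictor fit on the training data.
test_data = TabularDataset(f'{data_url}test.csv')

# Drop the label column so predictions are made from the features only
y_pred = predictor.predict(test_data.drop(columns=[label]))
y_pred.head()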

## Evaluation

We can evaluate the predictor on the test dataset using the `evaluate()` function, which measures how well our predictor performs on data that was not used for fitting the models.

In [None]:
predictor.evaluate(test_data, silent=True)
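
To make the idea of an evaluation score concrete, here is a small standalone illustration of accuracy, a common metric for classification tasks like this one. It is a sketch of the metric itself, not of how `evaluate()` is implemented, and `evaluate()` may report other metrics as well. The function name `accuracy` and the sample labels are made up for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one example is required")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up labels: three of the four predictions are correct.
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```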

AutoGluon's `TabularPredictor` also provides the `leaderboard()` function, which allows us to evaluate the performance of each individual trained model on the test data.

In [None]:
predictor.leaderboard(test_data)

## Conclusion

In this quickstart tutorial we saw AutoGluon's basic fit and predict functionality using `TabularDataset` and `TabularPredictor`. AutoGluon simplifies the model training process by not requiring feature engineering or model hyperparameter tuning. Check out the in-depth tutorials to learn more about AutoGluon's other features like customizing the training and prediction steps or extending AutoGluon with custom feature generators, models, or metrics.