## FastAI tutorial

### 1. Introduction to FastAI

The `fastai` library is a high-level deep learning framework built on top of **PyTorch**, designed to make training neural networks **fast**, **easy**, and **accessible** without sacrificing flexibility or performance. It is widely used in both education and production thanks to its thoughtful design and strong abstractions.

**Key Features**
- High-level API: Simplifies model training for vision, text, tabular, and collaborative filtering.
- Transfer Learning: Makes it easy to fine-tune state-of-the-art pre-trained models.
- Less Code, More Power: Most tasks can be performed with just a few lines of code.

**Installation**
```bash
pip install fastai
```

---

### 2. Core Concepts
To use `fastai` effectively, it is important to understand its **core abstractions**. These include:

#### 2.1) The `Learner`
The `Learner` is the heart of `fastai`. It wraps a model, the data, and a loss function into a single object that can be trained, evaluated, and exported easily.

```python
from fastai.vision.all import *
# `dls` is a DataLoaders object (built as shown in section 2.2)
learn = vision_learner(dls, resnet34, metrics=accuracy)
learn.fine_tune(3)
```

The `Learner` handles:
- Model creation
- Training loops
- Logging and metrics
- Callbacks
- Model saving/loading

#### 2.2) `DataLoaders` and `DataBlock`
`fastai` simplifies data preparation through:
- `DataBlock`: a flexible pipeline builder to transform raw data into a `DataLoaders` object.
- `DataLoaders`: wraps PyTorch dataloaders for training and evaluation sets.

**Example using `DataBlock`**:
```python
dblock = DataBlock(
    blocks = (ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(),
    get_y=parent_label,
    item_tfms=Resize(224)
)
dls = dblock.dataloaders(path)
```

Or use a shortcut:
```python
# item_tfms resizes every image so they can be collated into a batch
dls = ImageDataLoaders.from_folder(path, valid_pct=0.2, item_tfms=Resize(224))
```
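
To make the `get_y=parent_label` and `splitter=RandomSplitter()` steps more concrete, here is a minimal standard-library sketch of the two ideas: labelling a file by its parent folder and holding out a random share of items for validation. The function names and the file list are hypothetical, and this is an illustration of the concept rather than `fastai`'s own implementation.

```python
import random
from pathlib import PurePosixPath

def parent_folder_label(path):
    """Label an item by the name of the folder that contains it."""
    return PurePosixPath(path).parent.name

def random_split(n_items, valid_pct=0.2, seed=None):
    """Shuffle the indices and hold out a valid_pct share for validation."""
    idxs = list(range(n_items))
    random.Random(seed).shuffle(idxs)
    cut = int(valid_pct * n_items)
    return idxs[cut:], idxs[:cut]  # (train indices, validation indices)

# Hypothetical file list standing in for the result of get_image_files(path)
files = [f"dataset/train/{cls}/{i}.jpg" for cls in ("cats", "dogs") for i in range(5)]
labels = [parent_folder_label(f) for f in files]
train_idx, valid_idx = random_split(len(files), valid_pct=0.2, seed=42)

print(labels[0], len(train_idx), len(valid_idx))  # cats 8 2
```

Fixing the seed makes the split reproducible, which is why the later `DataBlock` example passes `seed=42` to `RandomSplitter`.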

#### 2.3) Callbacks and Metrics
`fastai`'s `Learner` supports callbacks that can modify training behaviour - e.g., saving the best model, early stopping, or scheduling learning rates.

```python
from fastai.callback.tracker import SaveModelCallback
learn.fine_tune(5, cbs=SaveModelCallback(monitor='accuracy'))
```

Metrics like `accuracy`, `error_rate`, and `rmse` can be added during `Learner` creation.
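
The idea behind a "save the best model" callback such as `SaveModelCallback` can be sketched in plain Python: after every epoch, compare the monitored metric with the best value seen so far and keep a checkpoint only when it improves. The class name `SaveBestSketch`, its hook signature, and the accuracy values below are hypothetical; the real callback plugs into the `Learner` training loop and writes weights to disk.

```python
class SaveBestSketch:
    """Track the epoch with the best monitored value (illustrative only)."""

    def __init__(self, higher_is_better=True):
        self.higher_is_better = higher_is_better
        self.best_value = None
        self.best_epoch = None

    def after_epoch(self, epoch, value):
        """Return True when `value` beats the best value seen so far."""
        if self.best_value is None:
            improved = True
        elif self.higher_is_better:
            improved = value > self.best_value
        else:
            improved = value < self.best_value
        if improved:
            # A real callback would save the model weights here.
            self.best_value, self.best_epoch = value, epoch
        return improved

# Hypothetical validation accuracy after each epoch
accuracies = [0.81, 0.86, 0.84, 0.88, 0.87]
cb = SaveBestSketch(higher_is_better=True)
for epoch, acc in enumerate(accuracies):
    cb.after_epoch(epoch, acc)

print(cb.best_epoch, cb.best_value)  # 3 0.88
```

For a loss-like metric you would set `higher_is_better=False`, so that lower values count as improvements.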

#### 2.4) Transfer Learning
Most vision and NLP tasks start with a pre-trained model. `fastai` makes transfer learning simple:
```python
learn = vision_learner(dls, resnet50, metrics=accuracy)
learn.fine_tune(4)
```

`fastai` handles:
- Unfreezing layers gradually
- Discriminative learning rates
- Automatic normalization
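
Discriminative learning rates mean that earlier layer groups, which hold general pre-trained features, train with smaller learning rates than the later, task-specific groups. In `fastai` this is typically requested by passing a `slice` as the learning rate, for example `learn.fit_one_cycle(3, lr_max=slice(1e-5, 1e-3))`. The sketch below shows one way such a range can be spread across layer groups, assuming geometric (log-scale) spacing; the helper name `spread_learning_rates` is hypothetical and not a `fastai` function.

```python
def spread_learning_rates(lowest, highest, n_groups):
    """Space learning rates geometrically from the first to the last layer group."""
    if n_groups == 1:
        return [highest]
    ratio = (highest / lowest) ** (1 / (n_groups - 1))
    return [lowest * ratio ** i for i in range(n_groups)]

# Three hypothetical layer groups: early body, late body, head
lrs = spread_learning_rates(1e-5, 1e-3, 3)
for group, lr in zip(("early body", "late body", "head"), lrs):
    print(f"{group}: {lr:.0e}")
```

Each group here trains with roughly ten times the learning rate of the group before it.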

#### 2.5) The `fastai` library structure
`fastai` is modular. The most used modules are:
- `fastai.vision`: image classification, segmentation, etc.
- `fastai.text`: NLP models and datasets
- `fastai.tabular`: structured/tabular data
- `fastai.callback`: training behaviors and utilities
- `fastai.data`: data handling tools
- `fastcore` and `fastdownload`: underlying utility libraries

---

### 3. Image Classification

Image classification is one of the most common and well-supported tasks in `fastai`. It allows you to train a model to recognize images from different categories - for example, distinguishing between cats and dogs or identifying types of diseases from X-rays.

#### 3.1) Prepare Data
Your image dataset should be organized like this:

<pre>
dataset/
├── train/
│   ├── class1/
│   ├── class2/
│   └── ...
</pre>

Each class has its own folder containing images. Alternatively, you can use a CSV file with image paths and labels.

#### 3.2) Load data with `ImageDataLoaders`
You can quickly load your data with:

```python
from fastai.vision.all import *

path = Path('/path/to/your/data/')

dls = ImageDataLoaders.from_folder(
    path,
    valid_pct=0.2,
    item_tfms=Resize(224),
    bs=64
)
```

Or build a more customized pipeline using `DataBlock`:

```python
dblock = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(seed=42),
    get_y=parent_label,
    item_tfms=Resize(224),
    batch_tfms=aug_transforms()
)
dls = dblock.dataloaders(path)
```

#### 3.3) Train a Model with `vision_learner`
Now let's create a model using transfer learning:

```python
learn = vision_learner(dls, resnet34, metrics=accuracy)
learn.fine_tune(3)
```

- `resnet34` is the backbone (you can also use `resnet18`, `resnet50`, etc.)
- `fine_tune(3)` first trains only the new head for one epoch with the pre-trained layers frozen, then unfreezes and trains the whole network for 3 epochs.
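
The two phases of `fine_tune` can be laid out as a simple schedule. This sketch assumes the default of one frozen epoch before the requested epochs and ignores learning-rate scheduling; `fine_tune_schedule` is a hypothetical helper written for illustration, not part of `fastai`.

```python
def fine_tune_schedule(epochs, freeze_epochs=1):
    """List which parts of the network train in each epoch."""
    phases = []
    for epoch in range(freeze_epochs):
        # Phase 1: pre-trained body is frozen, only the new head is updated
        phases.append({"epoch": epoch, "body": "frozen", "head": "training"})
    for epoch in range(epochs):
        # Phase 2: everything is unfrozen and trained together
        phases.append({"epoch": freeze_epochs + epoch, "body": "training", "head": "training"})
    return phases

for phase in fine_tune_schedule(3):
    print(phase)
```

Under these assumptions, `fine_tune(3)` runs four epochs in total: one with the body frozen and three with the whole network training.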

#### 3.4) Evaluate the Model
To see how your model is doing:

```python
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix()
interp.plot_top_losses(5)
```

These tools help identify which classes your model struggles with.
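
If you want to see what a confusion matrix contains, the following standard-library sketch builds one by hand: rows are actual classes, columns are predicted classes, and each cell counts how often that pair occurred. The labels and predictions are made up for illustration.

```python
from collections import Counter

def confusion_matrix(actual, predicted, classes):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]

classes = ["cat", "dog"]
actual = ["cat", "cat", "dog", "dog", "dog"]      # hypothetical ground truth
predicted = ["cat", "dog", "dog", "dog", "cat"]   # hypothetical model output

matrix = confusion_matrix(actual, predicted, classes)
for name, row in zip(classes, matrix):
    print(name, row)
# cat [1, 1]
# dog [1, 2]
```

Off-diagonal cells are the mistakes: here one cat was predicted as a dog and one dog as a cat.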

#### 3.5) Make Predictions

To use your model on a new image:

```python
img = PILImage.create('/path/to/image.jpg')
pred_class, pred_idx, probs = learn.predict(img)
print(f"Prediction: {pred_class}, Probability: {probs[pred_idx]:.4f}")
```
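
The three values returned by `predict` are related: the probabilities cover every class, the index points at the largest probability, and the class name is the vocabulary entry at that index. The sketch below reproduces that relationship in plain Python, assuming a softmax over raw model outputs; the `vocab` and `logits` values are hypothetical.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

vocab = ["cat", "dog"]   # hypothetical class names, in the role of dls.vocab
logits = [0.3, 2.1]      # hypothetical raw model outputs for one image

probs = softmax(logits)
pred_idx = max(range(len(probs)), key=probs.__getitem__)  # index of the largest probability
pred_class = vocab[pred_idx]

print(f"Prediction: {pred_class}, Probability: {probs[pred_idx]:.4f}")
```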

#### 3.6) Export and Reload the Model
To save your model for later inference:

```python
# The file is written under learn.path, which defaults to the dataset path
learn.export('model.pkl')
```

To load it in a new session or environment:

```python
# Pass the location of the exported file (it was saved under learn.path)
learn_inf = load_learner('model.pkl')
learn_inf.predict(PILImage.create('/path/to/image.jpg'))
```

---

### 4. Text Classification / NLP

Text classification, such as categorizing tweets by sentiment or news articles by topic, is a foundational task in natural language processing (NLP). `fastai` offers an intuitive pipeline built on pre-trained language models like AWD-LSTM.


#### 4.1) Prepare Your Text Dataset

For text classification, your dataset should typically be in CSV format, like so:

| text                          | label    |
| ----------------------------- | -------- |
| "I love this movie!"          | positive |
| "This is a terrible product." | negative |

Let’s assume the CSV is named `texts.csv`.


#### 4.2) Load the Data

fastai provides `TextDataLoaders.from_df` to load the data directly from a dataframe:

```python
from fastai.text.all import *

df = pd.read_csv('texts.csv')

dls = TextDataLoaders.from_df(df, text_col='text', label_col='label', valid_pct=0.2)
```

You can also use `from_csv` if you're loading directly from a file:

```python
dls = TextDataLoaders.from_csv(Path('.'), 'texts.csv', text_col='text', label_col='label')
```
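
Before text reaches the model, it is split into tokens and each token is mapped to an integer id from a vocabulary. The sketch below shows that idea with a plain whitespace tokenizer; it is a deliberate simplification (`fastai` uses a more sophisticated tokenizer and adds special tokens), and the helper names are hypothetical.

```python
from collections import Counter

def build_vocab(texts, min_freq=1):
    """Collect tokens seen at least min_freq times; index 0 is for unknown tokens."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    return ["<unk>"] + sorted(tok for tok, n in counts.items() if n >= min_freq)

def numericalize(text, vocab):
    """Replace every token with its position in the vocabulary."""
    index = {tok: i for i, tok in enumerate(vocab)}
    return [index.get(tok, 0) for tok in text.lower().split()]

texts = ["I love this movie!", "This is a terrible product."]
vocab = build_vocab(texts)

print(vocab)
print(numericalize("I love this product.", vocab))
```

Tokens that never appeared in the training texts fall back to index 0, which is how unseen words are handled at prediction time in this sketch.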


#### 4.3) Train a Text Classifier

Training a language model from scratch is resource-intensive, so fastai uses **transfer learning** with pre-trained language models like AWD-LSTM.

```python
learn = text_classifier_learner(dls, AWD_LSTM, metrics=accuracy)
learn.fine_tune(4)
```

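The full transfer-learning workflow for text has two stages:

1. Fine-tuning the pre-trained language model on your own corpus with `language_model_learner` (optional but recommended).
2. Training a classification head on top of it.

The snippet above performs only the second stage, starting from the stock pre-trained AWD-LSTM weights.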


#### 4.4) Interpret the Results

After training, you can view model predictions, a confusion matrix, and a per-class report:

```python
learn.show_results()
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix()
interp.print_classification_report()
```

#### 4.5) Make Predictions

To predict the class of a new sentence:

```python
learn.predict("This phone is amazing!")
```

`predict` returns a tuple containing:

* Predicted class
* Class index
* Probabilities for each class

#### 4.6) Save and Load the Model

Export and reuse your trained model like this:

```python
learn.export('text_classifier.pkl')
```

Then later:

```python
learn_inf = load_learner('text_classifier.pkl')
learn_inf.predict("This app is not working well.")
```

---

### 5. Tabular Data

Tabular data (i.e. data in spreadsheet form) is extremely common in real-world business applications, such as predicting customer churn, loan defaults, or house prices. fastai provides specialized support for working with this kind of structured data.


#### 5.1) Prepare the Dataset

Tabular datasets are usually CSV files with a mix of categorical and continuous variables. For example:

| age | gender | income | purchased |
| --- | ------ | ------ | --------- |
| 25  | male   | 40000  | no        |
| 30  | female | 60000  | yes       |

Here:

* `gender` is **categorical**.
* `age`, `income` are **continuous**.
* `purchased` is the **target** (the dependent variable).


#### 5.2) Load the Data

You define which columns are categorical, which are continuous, and which column to predict:

```python
from fastai.tabular.all import *

path = Path('path/to/data')
df = pd.read_csv(path/'data.csv')

# Define columns
dep_var = 'purchased'
cat_names = ['gender']
cont_names = ['age', 'income']

# Split the data
splits = RandomSplitter(valid_pct=0.2)(range_of(df))

# Create DataLoaders
to = TabularPandas(df, 
                   procs=[Categorify, FillMissing, Normalize],
                   cat_names=cat_names,
                   cont_names=cont_names,
                   y_names=dep_var,
                   splits=splits)

dls = to.dataloaders()
```
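
The three preprocessing steps passed as `procs` can be illustrated with the standard library. This is a conceptual sketch under simplifying assumptions (median fill, population standard deviation, id 0 reserved for missing categories), not `fastai`'s implementation; the function names and sample values are hypothetical.

```python
from statistics import mean, median, pstdev

def categorify(values):
    """Map each category to an integer id; 0 is kept for missing values."""
    vocab = sorted({v for v in values if v is not None})
    ids = {v: i + 1 for i, v in enumerate(vocab)}
    return [ids.get(v, 0) for v in values], vocab

def fill_missing(values):
    """Replace None with the median of the observed values and flag filled rows."""
    fill = median(v for v in values if v is not None)
    return [fill if v is None else v for v in values], [v is None for v in values]

def normalize(values):
    """Rescale to zero mean and unit standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

gender = ["male", "female", None, "female"]   # hypothetical categorical column
income = [40000, 60000, None, 50000]          # hypothetical continuous column

gender_ids, gender_vocab = categorify(gender)
income_filled, income_was_missing = fill_missing(income)
income_scaled = normalize(income_filled)

print(gender_ids)          # [2, 1, 0, 1]
print(income_filled)       # [40000, 60000, 50000, 50000]
print(income_was_missing)  # [False, False, True, False]
```

The order matters: missing values are filled before normalization so that the mean and standard deviation are computed on complete columns.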


#### 5.3) Train a Tabular Model

Now you can create and train the model:

```python
learn = tabular_learner(dls, metrics=accuracy)
learn.fit_one_cycle(5)
```

You can also use `RocAucBinary()`, `rmse`, or other metrics depending on your task (classification or regression).
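
To show what the two most common metrics measure, here is a plain-Python sketch of accuracy (for classification) and root mean squared error (for regression). The helper names and sample values are hypothetical; in practice you would pass `fastai`'s own metric functions to the `Learner`.

```python
import math

def accuracy_score(predicted, actual):
    """Share of predictions that match the target exactly."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

def rmse_score(predicted, actual):
    """Square root of the mean squared difference between prediction and target."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

# Classification: 3 of 4 predictions are correct
print(accuracy_score(["yes", "no", "no", "yes"], ["yes", "no", "yes", "yes"]))  # 0.75

# Regression: errors of 2.0 and 0.0
print(rmse_score([3.0, 5.0], [1.0, 5.0]))
```

Accuracy is higher-is-better, while RMSE is lower-is-better, which matters when a callback monitors one of them.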


#### 5.4) Interpret the Model

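You can inspect sample predictions and the model itself:

```python
learn.show_results()                          # sample rows with targets and predictions
print(learn.model)                            # embedding and linear layers
row, clas, probs = learn.predict(df.iloc[0])  # prediction for a single row
```

Core `fastai` does not appear to include a built-in feature-importance plot for tabular learners. A common substitute is permutation importance, which measures how much a metric degrades when the values of one column are shuffled.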
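
One model-agnostic way to estimate feature importance is permutation importance: shuffle a single column, re-run predictions, and record how much accuracy drops. The sketch below uses a toy prediction function in place of a trained model; `permutation_importance`, `toy_predict`, and the sample rows are all hypothetical and only illustrate the technique.

```python
import random

def permutation_importance(predict, rows, targets, feature, seed=0):
    """Accuracy drop after shuffling one feature column across rows."""
    def accuracy(data):
        return sum(predict(r) == t for r, t in zip(data, targets)) / len(targets)

    baseline = accuracy(rows)
    shuffled_values = [r[feature] for r in rows]
    random.Random(seed).shuffle(shuffled_values)
    # Copy each row, swapping in the shuffled value for this feature only
    shuffled_rows = [{**r, feature: v} for r, v in zip(rows, shuffled_values)]
    return baseline - accuracy(shuffled_rows)

def toy_predict(row):
    """Hypothetical stand-in for a trained model: looks at income only."""
    return "yes" if row["income"] > 50000 else "no"

rows = [
    {"age": 25, "income": 40000}, {"age": 30, "income": 60000},
    {"age": 41, "income": 75000}, {"age": 52, "income": 30000},
]
targets = [toy_predict(r) for r in rows]

for feature in ("age", "income"):
    print(feature, permutation_importance(toy_predict, rows, targets, feature))
```

Because the toy model ignores `age`, shuffling that column never changes its predictions, so its importance is zero. With a real model you would typically average the drop over several shuffles on the validation set.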

#### 5.5) Save and Use the Model

As with other fastai models:

```python
learn.export('tabular_model.pkl')

learn_inf = load_learner('tabular_model.pkl')
learn_inf.predict(df.iloc[0])
```

---