### Installation
The first step is to install the `lema` module and its dependencies.


Once we are out of stealth and the package is published on PyPI, installation will be a single command: `pip install lema[all]`


However, since the repo is still private for now, we need a workaround:
- **Manual upload**: The simplest option is to manually upload the zipped repo, either to Google Drive or directly to the Colab filesystem.
    - If you choose this option, unzip the archive so the code lives in a folder named `lema`, then skip to step 3.
- **Git clone with a read token**: A more convenient alternative is to generate a read-only GitHub token for the repo.
    - The setup only needs to be done once. After that, you can quickly pull new code changes.
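
For the manual-upload route, here is a minimal sketch that extracts the archive with Python's standard `zipfile` module and reports its top-level folder names. Step 3 expects that folder to be called `lema`. The upload path `/content/lema.zip` is only an assumed example. An archive from GitHub's "Download ZIP" button typically unpacks to `lema-<branch>` (e.g. `lema-main`), so rename the folder if needed.

```python
import zipfile
from pathlib import Path


def extract_repo_zip(zip_path: str, dest_dir: str = ".") -> list[str]:
    """Extract an uploaded repo archive and return its top-level entries.

    Step 3 runs `pip install -e lema[all]`, so the extracted folder is
    expected to be named `lema` (rename it if your archive differs).
    """
    dest = Path(dest_dir)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)  # creates intermediate directories as needed
        # Zip member names always use "/", so the first component is the top-level entry.
        top_level = {name.split("/", 1)[0] for name in zf.namelist()}
    return sorted(top_level)
```

Usage, assuming the archive was uploaded to `/content`: `extract_repo_zip("/content/lema.zip", "/content")`.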

#### 1. Setting up a read-only GitHub token
Since the GitHub repository is private, we need to generate a `read-only` personal access token scoped to the `lema` repo.
1. On GitHub.com, go to `Settings -> Developer settings -> Personal access tokens -> Fine-grained tokens -> Generate new token`.
2. See [this example](https://drive.google.com/file/d/1zxd8r7qkPfl34mfGK83m_13oLGFGghW1/view?usp=share_link) for how to fill in the form. The only permission to grant is `Contents`, in `read-only` mode.
3. Add the token to your Colab secrets (key icon in the left sidebar) under the name `repo-token`, which is the name the cloning cell reads, and enable notebook access for it.

This only needs to be done once!

#### 2. Cloning the LeMa repository

In [None]:
from google.colab import userdata

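github_repo_token = userdata.get("repo-token")  # Read the token from your notebook secrets (step 1)
github_username = "<GITHUB_USERNAME>"  # Replace with your GitHub username

# The token is stored in the remote URL in lema/.git/config, so later `git pull`s work without re-entering it
!git clone https://$github_username:$github_repo_token@github.com/openlema/lema.git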
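
The cell above only works inside Colab, because `google.colab.userdata` is Colab-specific. If you also run the notebook elsewhere (e.g. locally), here is a minimal sketch that falls back to an environment variable. `LEMA_REPO_TOKEN` is a hypothetical name, so pick whatever suits your setup.

```python
import os


def get_repo_token(secret_name: str = "repo-token", env_var: str = "LEMA_REPO_TOKEN") -> str:
    """Return the GitHub read token from Colab secrets, or from an env var elsewhere."""
    try:
        from google.colab import userdata  # only importable inside Colab
    except ImportError:
        # Outside Colab: LEMA_REPO_TOKEN is a hypothetical variable name.
        token = os.environ.get(env_var)
        if not token:
            raise RuntimeError(f"Set the {env_var} environment variable to your GitHub token")
        return token
    return userdata.get(secret_name)
```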

#### 3. Installing the LeMa module and dependencies

In [None]:
!pip install -e lema[all]
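
A quick sanity check that the install worked. This sketch imports `lema` and confirms that it exposes `train` and `evaluate`, the two functions used below. Checking for the attributes also catches cases where a bare `lema/` source folder in the working directory is picked up as an empty namespace package.

```python
import importlib


def check_module(name: str, required_attrs: tuple[str, ...] = ()) -> bool:
    """Import `name` and confirm it exposes every attribute in `required_attrs`."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return False
    return all(hasattr(module, attr) for attr in required_attrs)


# If this prints False right after installing, restart the runtime and retry.
print("lema ready:", check_module("lema", ("train", "evaluate")))
```

If it prints `False` immediately after `pip install -e`, restarting the runtime usually helps. Editable installs register their path through a `.pth` file, and Python only reads those files at interpreter startup.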

## Training
Make sure to enable a GPU runtime (Runtime > Change runtime type) for faster training.
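
Before launching training, you can check that the notebook actually sees a GPU. This sketch assumes lema trains with PyTorch. That is likely for a Hugging Face model like Phi-3, but it is not stated here.

```python
import importlib.util


def cuda_available() -> bool:
    """Return True if PyTorch is installed and can see a CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch  # imported lazily so the check also runs without PyTorch

    return torch.cuda.is_available()


if cuda_available():
    import torch

    print("GPU:", torch.cuda.get_device_name(0))
else:
    print("No CUDA GPU visible; select a GPU via Runtime > Change runtime type")
```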

#### Using the `lema` module

In [None]:
import lema

In [None]:
lema.train(
    model_name="microsoft/Phi-3-mini-4k-instruct",
    dataset_name="yahma/alpaca-cleaned",
    preprocessing_function_name="alpaca",
    output_dir="train/",
    trust_remote_code=True,
)
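
`preprocessing_function_name="alpaca"` selects lema's own formatter, whose exact template is not shown here. For intuition only, here is a hypothetical formatter using a template adapted from the Stanford Alpaca project's prompt. The `yahma/alpaca-cleaned` rows have `instruction`, `input` and `output` fields. The output key `prompt` mirrors the `dataset_text_field=prompt` setting in the CLI example.

```python
# Hypothetical Alpaca-style templates (adapted from Stanford Alpaca; lema's may differ).
ALPACA_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
ALPACA_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def format_alpaca(example: dict) -> dict:
    """Turn one alpaca-cleaned row into a single `prompt` string (illustrative)."""
    template = ALPACA_WITH_INPUT if example.get("input") else ALPACA_NO_INPUT
    # str.format ignores the unused `input` keyword for the no-input template.
    prompt = template.format(instruction=example["instruction"], input=example.get("input", ""))
    return {"prompt": prompt + example["output"]}
```

With the Hugging Face `datasets` library installed, you could preview it with `load_dataset("yahma/alpaca-cleaned", split="train").map(format_alpaca)`.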

#### Using the `lema` CLI

In [None]:
!lema-train \
    "data.dataset_name=yahma/alpaca-cleaned" \
    "data.preprocessing_function_name=alpaca" \
    "data.trainer_kwargs.dataset_text_field=prompt" \
    "model.model_name=microsoft/Phi-3-mini-4k-instruct" \
    "model.trust_remote_code=true" \
    "training.output_dir=train/"
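
The dotted CLI arguments map onto a nested config with `data`, `model` and `training` sections. If you prefer to keep one Python dict for both the module and CLI workflows, a small hypothetical helper can flatten it into the same override strings. It only reproduces the keys shown above and is not part of lema.

```python
def to_overrides(config: dict, prefix: str = "") -> list[str]:
    """Flatten a nested config dict into `section.key=value` override strings."""
    overrides = []
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            overrides.extend(to_overrides(value, path))  # recurse into sub-sections
        elif isinstance(value, bool):
            overrides.append(f"{path}={str(value).lower()}")  # lowercase, as in `true` above
        else:
            overrides.append(f"{path}={value}")
    return overrides


config = {
    "data": {
        "dataset_name": "yahma/alpaca-cleaned",
        "preprocessing_function_name": "alpaca",
        "trainer_kwargs": {"dataset_text_field": "prompt"},
    },
    "model": {"model_name": "microsoft/Phi-3-mini-4k-instruct", "trust_remote_code": True},
    "training": {"output_dir": "train/"},
}

args = ["lema-train", *to_overrides(config)]
print(args)
```

To launch training from Python, pass `args` to `subprocess.run(args, check=True)`.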

## Evaluation

#### Using the `lema` module

In [None]:
lema.evaluate(
    model_name="train/best.pt",  # trained model written by the training step
    dataset_name="yahma/alpaca-cleaned",
    preprocessing_function_name="alpaca",
    output_dir="eval/",
    trust_remote_code=True,
)
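
`train/best.pt` is the example's path for the trained model. What the training step actually writes to `output_dir` depends on lema. If it follows the Hugging Face `Trainer` convention of `checkpoint-<step>` subdirectories (an assumption worth checking with `ls train/`), this sketch finds the newest one. `transformers.trainer_utils.get_last_checkpoint` behaves similarly if `transformers` is installed.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed naming convention: Hugging Face Trainer-style "checkpoint-<step>" folders.
_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(output_dir: str) -> Optional[Path]:
    """Return the `checkpoint-<step>` subdirectory with the highest step, if any."""
    root = Path(output_dir)
    if not root.is_dir():
        return None
    candidates = []
    for child in root.iterdir():
        match = _CHECKPOINT_RE.match(child.name)
        if child.is_dir() and match:
            candidates.append((int(match.group(1)), child))
    return max(candidates, key=lambda c: c[0])[1] if candidates else None


ckpt = latest_checkpoint("train/")
print(ckpt if ckpt else "No checkpoint-* directories found in train/")
```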

#### Using the `lema` CLI

In [None]:
!lema-evaluate \
    "data.dataset_name=yahma/alpaca-cleaned" \
    "data.preprocessing_function_name=alpaca" \
    "data.trainer_kwargs.dataset_text_field=prompt" \
    "model.model_name=microsoft/Phi-3-mini-4k-instruct" \
    "model.trust_remote_code=true" \
    "training.output_dir=eval/"
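
Neither example documents which files evaluation writes to `eval/`. Rather than assume file names, this minimal sketch lists whatever ended up there.

```python
from pathlib import Path


def list_outputs(output_dir: str) -> list[str]:
    """Return every file under `output_dir`, relative to it, in sorted order."""
    root = Path(output_dir)
    if not root.is_dir():
        return []
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


for name in list_outputs("eval/"):
    print(name)
```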