## Project Overview

InstructLab uses a novel synthetic data-based alignment tuning method for Large Language Models (LLMs). The "**lab**" in Instruct**Lab** stands for **L**arge-Scale **A**lignment for Chat**B**ots.

It is an outgrowth of the paper [*LAB: Large-Scale Alignment for ChatBots*](https://arxiv.org/abs/2403.01081).

### Getting Started

This notebook covers one step in the InstructLab pipeline. To see what else is involved, check out the [InstructLab repository](https://github.com/instructlab/instructlab).

## Overview of this Notebook

This notebook is a starting point for using `instructlab` from a Red Hat OpenShift AI workbench terminal, taking advantage of GPU instances.

The notebook focuses on the following steps:
* [Initializing InstructLab](https://github.com/instructlab/instructlab?tab=readme-ov-file#%EF%B8%8F-initialize-ilab)
* [Downloading the model](https://github.com/instructlab/instructlab?tab=readme-ov-file#-download-the-model)
* [Serving the model](https://github.com/instructlab/instructlab?tab=readme-ov-file#-serving-the-model)
* [Chatting with the model](https://github.com/instructlab/instructlab?tab=readme-ov-file#-serving-the-model)
* [Contributing knowledge](https://github.com/instructlab/instructlab?tab=readme-ov-file#-contribute-knowledge-or-compositional-skills)
* [Generating a synthetic dataset](https://github.com/instructlab/instructlab?tab=readme-ov-file#-generate-a-synthetic-dataset)
* [Training the model](https://github.com/instructlab/instructlab?tab=readme-ov-file#-training-the-model)
* [Serving the new model](https://github.com/instructlab/instructlab?tab=readme-ov-file#-serve-the-newly-trained-model)

***IMPORTANT***: make sure your workbench has a GPU attached.

## Open a Terminal
* In the JupyterLab interface of your workbench, select **File > New > Terminal**, or click **Terminal** in the Launcher.
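With the terminal open, you can quickly confirm that it actually sees a GPU. This is a minimal sketch assuming an NVIDIA GPU and that `nvidia-smi` is available in the workbench image; the `gpu_found` variable exists only for illustration.

```shell
# Confirm that an NVIDIA GPU is visible from this workbench terminal.
if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi --query-gpu=name,memory.total --format=csv; then
  gpu_found=yes
else
  gpu_found=no
  echo "No NVIDIA GPU detected; check that your workbench was created with a GPU" >&2
fi
echo "GPU found: $gpu_found"
```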

## Create the instructlab directory
* Create a new directory called `instructlab` to store the files the `ilab` CLI needs when it runs.
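One way to do this, assuming you create the directory in the terminal's current working directory (the later `cd instructlab/` step relies on that):

```shell
# Create the working directory; -p avoids an error if it already exists.
mkdir -p instructlab
# Show the directory so you can confirm where it was created.
ls -ld instructlab
```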

## Initialize InstructLab

* This notebook assumes that InstructLab is already installed on your system.
* If it is not, follow these [installation notes](https://github.com/instructlab/instructlab?tab=readme-ov-file#-installing-ilab).
* Run the following commands in the terminal you opened:

In [None]:
# go to the instructlab/ dir
cd instructlab/

# initialize the configuration of ilab
ilab config init
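If `ilab config init` fails with `command not found`, the CLI is probably not on your `PATH`. The check below is a small sketch; the `ilab_found` variable is illustrative, and the hint assumes you installed `ilab` into a Python virtual environment as the installation notes describe.

```shell
# Check whether the ilab CLI is available in this terminal.
if command -v ilab >/dev/null 2>&1; then
  ilab_found=yes
  echo "ilab found at $(command -v ilab)"
else
  ilab_found=no
  # Assumption: ilab lives in a virtual environment; activate it first,
  # e.g. `source venv/bin/activate` if the environment is named venv.
  echo "ilab not found on PATH; activate the virtual environment it was installed into" >&2
fi
```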

## Download the model
* In the terminal you already opened, run the following command to download the default model:

In [None]:
ilab model download

* To download a specific model instead, pass its Hugging Face repository and file name:

In [None]:
HF_TOKEN=<your-hugging-face-token> ilab model download --repository=<hf-repo> --filename=<hf-model-file>
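Typing the token inline also saves it in your shell history. A hedged alternative is a small bash function that prompts for the token instead. `download_hf_model` is a hypothetical helper, not part of `ilab`; it only wraps the `--repository` and `--filename` options used above.

```shell
# Hypothetical helper (not part of ilab): prompts for a Hugging Face token
# without echoing it, then passes it to a single `ilab model download` call.
download_hf_model() {
  local repo="$1" file="$2" token
  if [ -z "$repo" ] || [ -z "$file" ]; then
    echo "usage: download_hf_model <hf-repo> <hf-model-file>" >&2
    return 1
  fi
  # -r: keep backslashes, -s: do not echo input, -p: show a prompt
  read -rsp "Hugging Face token: " token
  echo
  # HF_TOKEN is set only for this one command and is not exported to the session
  HF_TOKEN="$token" ilab model download --repository="$repo" --filename="$file"
}
```

For example, run `download_hf_model <hf-repo> <hf-model-file>` with your own repository and file name in place of the placeholders.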

## Serve the model
* To serve the model, run:

In [None]:
ilab model serve

* To serve a different model, pass its path with `--model-path`:

In [None]:
ilab model serve --model-path models/mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf
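`ilab model serve` keeps running in the foreground. To confirm from another terminal that the server answers requests, you can query its OpenAI-compatible API. This sketch assumes the default listen address `127.0.0.1:8000`; set `SERVER_URL` if your configuration uses a different one.

```shell
# Assumption: the server listens on 127.0.0.1:8000 unless SERVER_URL says otherwise.
SERVER_URL="${SERVER_URL:-http://127.0.0.1:8000}"
# -f: fail on HTTP errors, -s: no progress bar, -S: still print errors
if curl -fsS "$SERVER_URL/v1/models"; then
  echo
  echo "server is up at $SERVER_URL"
else
  echo "no response from $SERVER_URL; is 'ilab model serve' still running?" >&2
fi
```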

## Chat with the model
* In a separate terminal, while `ilab model serve` is still running, chat with the model:

In [None]:
ilab model chat
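Besides the interactive `ilab model chat` session, the running server can be called directly over HTTP, which is handy for scripted checks. This is a sketch assuming the default address and an OpenAI-style `/v1/chat/completions` endpoint; `<model-id>` is a placeholder for the id returned by `/v1/models`.

```shell
SERVER_URL="${SERVER_URL:-http://127.0.0.1:8000}"
# Placeholder: replace with the model id reported by $SERVER_URL/v1/models
MODEL_NAME="${MODEL_NAME:-<model-id>}"
# Send a single user message and print the raw JSON response.
curl -fsS "$SERVER_URL/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d "{\"model\": \"$MODEL_NAME\", \"messages\": [{\"role\": \"user\", \"content\": \"What is InstructLab?\"}]}" \
  || echo "request failed; is 'ilab model serve' still running?" >&2
```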

## Contribute knowledge
* Clone the taxonomy repo:


In [None]:
git clone https://github.com/instructlab/taxonomy.git


* Follow the instructions in the InstructLab [taxonomy docs](https://github.com/instructlab/taxonomy/blob/main/README.md) to add your knowledge to this repository.
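Before generating data, it is worth checking that `ilab` picks up your new files. A minimal sketch, assuming you cloned the taxonomy into `./taxonomy` and that your `ilab` release provides `ilab taxonomy diff` with a `--taxonomy-path` option (run `ilab taxonomy diff --help` to confirm):

```shell
# Assumption: run from the directory where you cloned the taxonomy repository.
TAXONOMY_DIR="${TAXONOMY_DIR:-taxonomy}"
if [ -d "$TAXONOMY_DIR" ]; then
  (
    cd "$TAXONOMY_DIR"
    # Files you have added or modified, according to git
    git status --short
    # Ask ilab to list and validate the new or changed taxonomy files
    ilab taxonomy diff --taxonomy-path .
  )
else
  echo "$TAXONOMY_DIR not found; clone the taxonomy repository first" >&2
fi
```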

## Generate a synthetic dataset
* To generate a synthetic dataset from the knowledge or skills you added to the taxonomy repository, run:

In [None]:
ilab data generate

* To generate the dataset with a non-default model, run:

In [None]:
ilab data generate --model <your-model>
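Generation can print a lot of output. One way to keep a copy for later inspection is to pipe it through `tee`; `generate.log` is an arbitrary file name chosen for this sketch.

```shell
# The subshell enables pipefail only for this pipeline, so the if-test reflects
# ilab's exit status rather than tee's; tee saves a copy of the output.
if ( set -o pipefail; ilab data generate 2>&1 | tee generate.log ); then
  echo "generation finished; output saved in generate.log"
else
  echo "generation failed; see generate.log for details" >&2
fi
```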


## Train the model
* Stop any terminals running `ilab model chat` or `ilab model serve` (press Ctrl+C in each).
* Now that you have generated a synthetic dataset, you're ready to train your model by running:

In [None]:
ilab model train --device=cuda

*Note that this might take a while*
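Because training is long-running, you may prefer to run it in the background so that closing the terminal tab does not stop it. A sketch using standard `pkill`, `nohup`, and output redirection; `train.log` is an arbitrary file name, and the job still stops if the workbench itself is stopped.

```shell
# Stop leftover serve/chat processes (the same as pressing Ctrl+C in their terminals).
pkill -f "ilab model serve" || true
pkill -f "ilab model chat" || true
# nohup ignores the hangup signal sent when the terminal closes;
# all output goes to train.log instead of nohup.out.
nohup ilab model train --device=cuda > train.log 2>&1 &
TRAIN_PID=$!
echo "training started with PID $TRAIN_PID"
```

Follow progress with `tail -f train.log`, and watch GPU usage from another terminal with `watch -n 10 nvidia-smi`.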

## Test and validate the updates
* You can test the model by running:

In [None]:
ilab model test

* Then convert the newly trained model and serve it again:

In [None]:
# convert the trained model
ilab model convert

# serve the new model
ilab model serve --model-path <new-model>

# in a separate terminal, chat with the new model
ilab model chat -m <new-model-name>