## Sample: Quantizing a Hugging Face Model and Teaching It a New Programming Language

In [2]:
!pip install -q ipywidgets

In [3]:
!pip install -q transformers
!pip install -q mlx_lm
!pip install -q jinja2
!pip install -q datasets

### Phi-2

#### Model Quantization

In [3]:
# !python ./mlx-examples/lora/convert.py --hf-path microsoft/phi-2 --mlx-path ./my_models/mlx_phi2 -q
# !python ./mlx-examples/lora/convert.py --hf-path daekeun-ml/phi-2-ko-v0.1 --mlx-path ./my_models/mlx_phi2_ko -q
!python ./mlx-examples/lora/convert.py --hf-path microsoft/phi-2 --mlx-path ./my_models/mlx_phi2 -q


None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
[INFO] Loading
Fetching 10 files: 100%|████████████████████| 10/10 [00:00<00:00, 188932.61it/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO] Quantizing
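The `-q` flag asks `convert.py` to quantize the weights. The pure-Python sketch below illustrates the general idea of group-wise affine quantization. Each group of weights gets its own scale and offset, and every weight is stored as a small unsigned integer. The group size of 64 and the 4 bits are assumptions meant to resemble a typical MLX setup. This is not MLX's actual implementation.

```python
import random

def quantize_group(weights, bits=4):
    """Affine-quantize one group of floats to unsigned ints in [0, 2**bits - 1]."""
    levels = 2 ** bits - 1
    w_min, w_max = min(weights), max(weights)
    # Guard against a constant group (zero range) to avoid dividing by zero.
    scale = (w_max - w_min) / levels if w_max > w_min else 1.0
    q = [round((w - w_min) / scale) for w in weights]
    return q, scale, w_min

def dequantize_group(q, scale, bias):
    """Map the stored integers back to approximate float weights."""
    return [v * scale + bias for v in q]

def quantize(weights, group_size=64, bits=4):
    """Split a flat weight list into groups and quantize each group separately."""
    return [quantize_group(weights[i:i + group_size], bits)
            for i in range(0, len(weights), group_size)]

if __name__ == "__main__":
    rng = random.Random(0)
    weights = [rng.gauss(0.0, 0.02) for _ in range(256)]  # hypothetical weights
    groups = quantize(weights)
    restored = [x for q, s, b in groups for x in dequantize_group(q, s, b)]
    max_err = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"groups={len(groups)}, max abs error={max_err:.6f}")
```

Because every group has its own range, each restored weight is off by at most half of that group's scale. This is the trade-off that lets a 4-bit model fit in much less memory.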


#### Checking That the Model Runs

In [1]:
from mlx_lm import load, generate

model, tokenizer = load("./my_models/mlx_phi2")


None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.


HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': './my_models/mlx_phi2'. Use `repo_type` argument if needed.
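The execution counts suggest that the kernel was restarted. The error then suggests that `./my_models/mlx_phi2` was not found relative to the working directory when this cell ran. In that case, `load` appears to fall back to treating the string as a Hugging Face Hub repo id, which cannot start with `./`. A guard such as the hypothetical `resolve_local_model` helper below fails earlier and with a clearer message. It only inspects the filesystem. It assumes that a converted model directory contains a `config.json`.

```python
from pathlib import Path

def resolve_local_model(path_str):
    """Return the model directory as a Path, or raise a readable error.

    Hypothetical helper: checks that the converted model folder exists and
    looks like a model directory (assumed to contain config.json).
    """
    path = Path(path_str).expanduser()
    if not path.is_dir():
        raise FileNotFoundError(
            f"{path} does not exist; run the conversion cell first "
            "or check the current working directory."
        )
    if not (path / "config.json").is_file():
        raise FileNotFoundError(f"{path} has no config.json; conversion may be incomplete.")
    return path
```

Usage would be `model, tokenizer = load(str(resolve_local_model("./my_models/mlx_phi2")))`.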

In [None]:
response = generate(model, tokenizer, prompt="write me a joke using comments only in the python programming language", verbose=True)

Prompt: write me a joke using comments only in the python programming language
.
## INPUT

##OUTPUT
# This is a joke
# What do you call a fish that wears a bowtie?
# Sofishticated.

Prompt: 73.874 tokens-per-sec
Generation: 30.769 tokens-per-sec


In [None]:
response = generate(model, tokenizer, prompt="write me a joke using comments in the OPL programming language", verbose=True)

Prompt: write me a joke using comments in the OPL programming language
.
## INPUT

##OUTPUT
// This program prints a joke using comments in the OPL programming language

int main() {
    // Declare a variable to store the joke
    string joke;
    // Set the value of the joke
    joke = "Why did the programmer quit his job? Because he couldn't take the heat!";
    // Print the joke
    cout
Prompt: 1.913 tokens-per-sec
Generation: 3.359 tokens-per-sec


In [None]:
response = generate(model, tokenizer, prompt="write me a joke using remarks in the OPL programming language", verbose=True)

Prompt: write me a joke using remarks in the OPL programming language
.
## INPUT

##OUTPUT
Here's a joke using remarks in the OPL programming language:

```
int x = 10;
int y = 5;
int z = x + y;

// This is a comment
// It explains what the code does
// It is ignored by the compiler

// This is a remark
// It is a special comment
// It can be used to add extra information
// It is
Prompt: 34.654 tokens-per-sec
Generation: 3.012 tokens-per-sec


In [None]:
response = generate(model, tokenizer, prompt="Write a program that checks if a given year, is a leap year in the OPL programming language", verbose=True)

Prompt: Write a program that checks if a given year, is a leap year in the OPL programming language


.

```python
# Solution
year = int(input("Enter a year: "))

if year % 4 == 0:
    if year % 100 == 0:
        if year % 400 == 0:
            print(year, "is a leap year")
        else:
            print(year, "is not a leap year")
    else:
        print(year, "is a leap year")
else:
    print
Prompt: 25.556 tokens-per-sec
Generation: 3.206 tokens-per-sec
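The base model answered in Python rather than OPL, and its output was cut off at the final `print`. The nested checks it shows follow the Gregorian rule: divisible by 4, except centuries, unless the year is also divisible by 400. When comparing the answers in this notebook, it can help to have that rule as a single boolean expression:

```python
def is_leap_year(year):
    """Gregorian rule: divisible by 4, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

for y in (1900, 2000, 2023, 2024):
    print(y, "is a leap year" if is_leap_year(y) else "is not a leap year")
```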


#### Building the Training Dataset

Load a training dataset about the OPL programming language.

In [13]:
from datasets import load_dataset

dataset = load_dataset('chrishayuk/test')

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Downloading readme:   0%|          | 0.00/36.0 [00:00<?, ?B/s]

Downloading data: 100%|██████████| 14.0k/14.0k [00:00<00:00, 49.6kB/s]


Generating train split: 0 examples [00:00, ? examples/s]

In [14]:
dataset['train'][:3]

{'text': ['<s>[INST] What does OPL stand for in the OPL programming language? [/INST] OPL is short for Open Programming Language </s>',
  '<s>[INST] Which company developed the OPL programmung language? [/INST] Psion Ltd created OPL for the Psion Organiser </s>',
  '<s>[INST] Which was the original name for the OPL programming language? [/INST] The OPL language was originally named Organiser Programming Language </s>']}

In [15]:
def formatting_func(example):
    q, a = example['text'].replace('<s>[INST]', '').replace('</s>', '').split('[/INST]')
    return {'text': f"### Question: {q}\n ### Answer: {a}"}
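The sketch below runs `formatting_func` on the first record shown above, without the `datasets` dependency, so you can see exactly what it produces. The split keeps the spaces around `[INST]` and `[/INST]`, so the question and the answer both carry stray leading and trailing spaces. `formatting_func_stripped` is a hypothetical variant that trims them. It also drops the space before `### Answer`, so it would produce a slightly different training format from the notebook's.

```python
def formatting_func(example):
    # Same logic as the notebook cell above.
    q, a = example['text'].replace('<s>[INST]', '').replace('</s>', '').split('[/INST]')
    return {'text': f"### Question: {q}\n ### Answer: {a}"}

def formatting_func_stripped(example):
    # Hypothetical variant: split only once and trim surrounding whitespace.
    body = example['text'].replace('<s>[INST]', '').replace('</s>', '')
    q, a = body.split('[/INST]', 1)
    return {'text': f"### Question: {q.strip()}\n### Answer: {a.strip()}"}

sample = {'text': '<s>[INST] What does OPL stand for in the OPL programming language? '
                  '[/INST] OPL is short for Open Programming Language </s>'}
print(repr(formatting_func(sample)['text']))
print(repr(formatting_func_stripped(sample)['text']))
```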

In [16]:
converted_dataset = dataset['train'].map(formatting_func)

Map:   0%|          | 0/43 [00:00<?, ? examples/s]

In [17]:
import numpy as np

def split_data(dataset, valid_ratio=0.2, seed=42):
    np.random.seed(seed)
    indices = np.arange(len(dataset))
    np.random.shuffle(indices)
    
    valid_indices = indices[:int(len(dataset) * valid_ratio)]
    train_indices = indices[int(len(dataset) * valid_ratio):]
    
    return dataset.select(train_indices), dataset.select(valid_indices)
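The dataset has 43 rows (see `num_rows` below). With `valid_ratio=0.2`, `split_data` puts `int(43 * 0.2) = 8` rows in the validation set and the remaining 35 in the training set. The pure-Python sketch below reproduces that index arithmetic. It uses a local `random.Random(seed)` instead of the global NumPy seed, so its shuffle order differs from the notebook's. Only the sizes and the disjointness carry over. If `datasets` is installed, `Dataset.train_test_split(test_size=0.2, seed=42)` is a built-in alternative.

```python
import random

def split_indices(n, valid_ratio=0.2, seed=42):
    """Shuffle range(n) and cut it into (train, valid) index lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # local RNG; global random state is untouched
    n_valid = int(n * valid_ratio)        # truncates, like the notebook's int(...)
    return indices[n_valid:], indices[:n_valid]

train_idx, valid_idx = split_indices(43)
print(len(train_idx), len(valid_idx))  # 35 8
```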

In [18]:
converted_dataset

Dataset({
    features: ['text'],
    num_rows: 43
})

In [19]:
train_data, valid_data = split_data(converted_dataset)

In [20]:
output_path = 'my_datasets/chrishayuk/test/train.jsonl'

train_data.to_json(output_path, orient='records', lines=True)

Creating json from Arrow format:   0%|          | 0/1 [00:00<?, ?ba/s]

11005

In [21]:
output_path = 'my_datasets/chrishayuk/test/valid.jsonl'

valid_data.to_json(output_path, orient='records', lines=True)

Creating json from Arrow format:   0%|          | 0/1 [00:00<?, ?ba/s]

3346
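`to_json` returns the number of bytes written (11005 and 3346 above). Before training, it is worth checking that every line parses as JSON and carries the `text` field that `formatting_func` produced. This sketch assumes `lora.py` reads its training examples from that key. The hypothetical `check_jsonl` below performs that check using only the standard library. For `train.jsonl` it should report 35 records, given the split above.

```python
import json

def check_jsonl(path, key="text"):
    """Return the number of records in a JSONL file, raising if any line is malformed."""
    count = 0
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)  # raises json.JSONDecodeError on bad JSON
            if not isinstance(record, dict) or not isinstance(record.get(key), str):
                raise ValueError(f"{path}:{line_no} has no string '{key}' field")
            count += 1
    return count
```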

#### Model Fine-Tuning

In [39]:
# ! python lora/lora.py --model ./my_models/mlx_ywko_tinyllama --train --iters 10 --data ./my_datasets/Bingsu/ko_alpaca_data --lora-layer 1
! python ./mlx-examples/lora/lora.py --model ./my_models/mlx_phi2 --train --data ./my_datasets/chrishayuk/test



None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
Loading pretrained model
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Total parameters 547.036M
Trainable parameters 1.311M
Loading datasets
Training
Iter 1: Val loss 2.440, Val took 5.352s
Iter 10: Train loss 2.360, It/sec 0.268, Tokens/sec 107.435
Iter 20: Train loss 2.035, It/sec 0.255, Tokens/sec 91.322
Iter 30: Train loss 1.766, It/sec 0.274, Tokens/sec 95.226
Iter 40: Train loss 1.360, It/sec 0.254, Tokens/sec 93.984
Iter 50: Train loss 1.151, It/sec 0.259, Tokens/sec 90.740
Iter 60: Train loss 1.044, It/sec 0.254, Tokens/sec 94.548
Iter 70: Train loss 0.946, It/sec 0.300, Tokens/sec 107.431
Iter 80: Train loss 0.854, It/sec 0.264, Tokens/sec 95.902
Iter 90: Train loss 0.726, It/sec 0.259, Tokens/sec 94.252
Iter 100: Train loss 0.660, It/sec 0.237, Tok
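Phi-2 has about 2.78B parameters, yet the log reports only 547.036M. The reported counts can be roughly reproduced under the following assumptions, none of which this notebook states:

- LoRA has rank 8 and is applied to the `q_proj` and `v_proj` projections of the last 16 layers.
- The total counts the elements of the stored arrays after quantization. Each 32-bit integer packs eight 4-bit weights, and every group of 64 weights adds one scale and one bias.
- The token embedding stays unquantized.
- The LoRA weights are included in the total.

The model dimensions below are also assumptions:

- Phi-2: hidden size 2560, vocabulary 51200, about 2.78B parameters.
- Mistral 7B: hidden size 4096, a grouped-query `v_proj` of width 1024, vocabulary 32000, about 7.24B parameters.

Under these assumptions, the sketch lands within about 1% of the 547.036M / 1.311M figures above and the 1244.041M / 1.704M figures from the Mistral run below.

```python
def lora_params(in_out_pairs, rank=8, num_layers=16):
    """Trainable LoRA parameters: each adapted Linear(d_in, d_out) adds A (d_in x r) and B (r x d_out)."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in in_out_pairs)
    return per_layer * num_layers

def packed_param_count(total_params, embed_params, bits=4, group_size=64, extra=0):
    """Array elements stored after quantization (assumption: embedding kept in full precision).

    Quantized weights: 32 // bits values per uint32 element, plus one scale and one bias per group.
    """
    quantized = total_params - embed_params
    packed = quantized // (32 // bits)
    scales_and_biases = 2 * quantized // group_size
    return embed_params + packed + scales_and_biases + extra

# Phi-2 (assumed dims): q_proj and v_proj are both 2560 -> 2560.
phi2_lora = lora_params([(2560, 2560), (2560, 2560)])
phi2_total = packed_param_count(2_780_000_000, 51200 * 2560, extra=phi2_lora)

# Mistral 7B (assumed dims): q_proj 4096 -> 4096, v_proj 4096 -> 1024 (grouped-query attention).
mistral_lora = lora_params([(4096, 4096), (4096, 1024)])
mistral_total = packed_param_count(7_240_000_000, 32000 * 4096, extra=mistral_lora)

print(f"Phi-2:   total ~{phi2_total / 1e6:.1f}M, trainable {phi2_lora / 1e6:.3f}M")
print(f"Mistral: total ~{mistral_total / 1e6:.1f}M, trainable {mistral_lora / 1e6:.3f}M")
```

Either way, LoRA trains only about 0.24% of the stored elements (1.311M / 547.036M). This is why fine-tuning fits on a laptop.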

In [40]:
!mv adapters.npz ./my_models/mlx_phi2_01.npz

Python(67883) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.


In [41]:
!python ./mlx-examples/lora/fuse.py --model ./my_models/mlx_phi2 --save-path ./my_models/mlx_phi2_ft_01 --adapter-file ./my_models/mlx_phi2_01.npz --hf-path microsoft/phi-2



None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
Loading pretrained model
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
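`fuse.py` folds the trained adapter back into the base weights, so that a plain `load` call can read the result. Conceptually, a LoRA layer computes `x W + s (x A) B`. Matrix multiplication distributes over addition, so a single merged matrix `W + s A B` gives the same output. The pure-Python sketch below checks that identity on tiny matrices. The scale `s` and the shapes are illustrative assumptions. The sketch also ignores the quantized base weights, which the real script has to handle.

```python
def matmul(a, b):
    """Plain-Python matrix product of nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b, s=1.0):
    """Element-wise a + s * b."""
    return [[x + s * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

# Hypothetical shapes: input dim 3, output dim 2, LoRA rank 1.
W = [[0.5, -1.0], [2.0, 0.0], [1.0, 1.0]]  # base weight, (in x out)
A = [[0.1], [0.2], [-0.3]]                 # LoRA down-projection, (in x r)
B = [[1.0, 2.0]]                           # LoRA up-projection, (r x out)
s = 2.0                                    # assumed LoRA scale
x = [[1.0, 2.0, 3.0]]                      # one input row

unfused = add(matmul(x, W), matmul(matmul(x, A), B), s)  # x W + s (x A) B
W_fused = add(W, matmul(A, B), s)                        # W + s A B
fused = matmul(x, W_fused)
print(unfused, fused)
```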


#### Checking the Fine-Tuning Results

In [42]:
from mlx_lm import load, generate

model_ft, tokenizer_ft = load("./my_models/mlx_phi2_ft_01")


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [31]:
response = generate(model_ft, tokenizer_ft, prompt="write me a joke using comments only in the python programming language", verbose=True)

Prompt: write me a joke using comments only in the python programming language
### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ###

KeyboardInterrupt: 
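The fine-tuned Phi-2 got stuck repeating `###`, and the cell had to be interrupted by hand. One plausible but untested explanation: the prompt does not follow the `### Question: ...\n ### Answer:` layout that `formatting_func` used for training. The hypothetical `build_prompt` helper below wraps a question in approximately that layout. Separately, `mlx_lm`'s `generate` accepts a `max_tokens` argument, which bounds how long a runaway generation can run.

```python
def build_prompt(question):
    """Wrap a question in (approximately) the template formatting_func used for training."""
    # Mirrors f"### Question: {q}\n ### Answer: {a}", leaving the answer for the model to write.
    return f"### Question: {question}\n ### Answer:"

prompt = build_prompt("write me a joke using comments only in the python programming language")
print(prompt)
```

It would be used as `generate(model_ft, tokenizer_ft, prompt=prompt, max_tokens=200, verbose=True)`.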

In [44]:
response = generate(model_ft, tokenizer_ft, prompt="write me a joke using comments in the OPL programming language", verbose=True)

Prompt: write me a joke using comments in the OPL programming language
. 
ANSWER: // This is a joke in OPL
PRINT "Why did the chicken cross the road?"
PRINT "To get to the other side."
PRINT "But it didn't want to get fried."

Prompt: 90.347 tokens-per-sec
Generation: 23.003 tokens-per-sec


In [45]:
response = generate(model_ft, tokenizer_ft, prompt="write me a joke using remarks in the OPL programming language", verbose=True)

Prompt: write me a joke using remarks in the OPL programming language
. 
ANSWER: Here's a joke using remarks in the OPL programming language:

REM This is a joke in OPL
PRINT "Why did the programmer go to the doctor?"
READ "Because he had a bug"
PRINT "What did the doctor say?"
READ "I'm not sure, but I'll check it out"
REM The joke is over
END

In this joke, the programmer goes to the doctor because he has a
Prompt: 89.964 tokens-per-sec
Generation: 19.317 tokens-per-sec


In [46]:
response = generate(model_ft, tokenizer_ft, prompt="Write a program that checks if a given year, is a leap year in the OPL programming language", verbose=True)

Prompt: Write a program that checks if a given year, is a leap year in the OPL programming language
.

Question: Write a leap year checker program in OPL.

Solution:
To check if a year is a leap year, we need to check if it is divisible by 4 and not divisible by 100, or if it is divisible by 400.

Here is the leap year checker program in OPL:

```
program leap_year_checker;

variable year;

function leap_year(year: integer
Prompt: 143.383 tokens-per-sec
Generation: 28.961 tokens-per-sec


### Mistral 7B

This section assumes the model was already downloaded and quantized in 1.py.

#### Checking That the Model Runs

In [2]:
from mlx_lm import load, generate

model, tokenizer = load("./my_models/mlx_mistral_7b")

In [3]:
response = generate(model, tokenizer, prompt="write me a joke using comments only in the python programming language", verbose=True)

Prompt: write me a joke using comments only in the python programming language


Here's a Python joke using comments:

```python
# This is a harmless looking piece of code
# but uncomment the next line and see the chaos!

# print(0)*(1/0)

# Now, let's make a comment about this comment:
# This comment is so meta, it's not even funny anymore!
```

This code doesn't do anything funny by itself, but
Prompt: 30.591 tokens-per-sec
Generation: 12.701 tokens-per-sec


In [4]:
response = generate(model, tokenizer, prompt="write me a joke using comments in the OPL programming language", verbose=True)

Prompt: write me a joke using comments in the OPL programming language


Here's a joke for you using comments in the OPL (Optimizing Portable Language) programming language:

// This program calculates the age of a person named John
// assuming he was born in the year 1980

// Calculate John's current age
age John 1980 TO YEAR

// Print out a message about John's age
DISPLAY "John is now " || AGE John "
Prompt: 12.272 tokens-per-sec
Generation: 12.718 tokens-per-sec


In [5]:
response = generate(model, tokenizer, prompt="write me a joke using remarks in the OPL programming language", verbose=True)

Prompt: write me a joke using remarks in the OPL programming language


Here's a joke for you using remarks in the OPL (Optimization Programming Language):

Why did the OPL programmer put a remark before his variable declaration?

Because he wanted to make a "remarkable" variable name! 🤩

Here's the code snippet:

```opl
* This program declares a variable with a remarkable name

* MyVariable is a remarkable variable
real MyVariable
Prompt: 38.922 tokens-per-sec
Generation: 12.780 tokens-per-sec


In [6]:
response = generate(model, tokenizer, prompt="Write a program that checks if a given year, is a leap year in the OPL programming language", verbose=True)

Prompt: Write a program that checks if a given year, is a leap year in the OPL programming language
.

Here's a simple OPL program to check if a given year is a leap year:

```opl
-- Define a function to check if a year is a leap year
func leap_year(year)
    let (is_leap = false)
    if (year mod 4 = 0) then
        is_leap := true
    end if
    if (year mod 100 = 0) then
Prompt: 61.529 tokens-per-sec
Generation: 12.729 tokens-per-sec


#### Model Fine-Tuning

In [20]:
# ! python lora/lora.py --model ./my_models/mlx_ywko_tinyllama --train --iters 10 --data ./my_datasets/Bingsu/ko_alpaca_data --lora-layer 1
# ! python ./mlx-examples/lora/lora.py --model ./my_models/mlx_mistral_7b --train --data ./my_datasets/chrishayuk/test
! python ./mlx-examples/lora/lora.py --model ./my_models/mlx_mistral_7b --train --iters 10 --data ./my_datasets/chrishayuk/test



None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
Loading pretrained model
Total parameters 1244.041M
Trainable parameters 1.704M
Loading datasets
Training
Iter 1: Val loss 2.800, Val took 16.215s
Iter 10: Train loss 2.602, It/sec 0.091, Tokens/sec 38.044
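This run used `--iters 10`. Its training loss (2.602) has barely moved from the initial validation loss (2.800). That may explain why the fine-tuned Mistral answers below look much like the base model's. The sketch below estimates how many passes over the training set each run made. It assumes a batch size of 4 (taken to be the `lora.py` default) and the 35 training rows produced by `split_data`. The Phi-2 run omitted `--iters`, and its 1000 iterations are likewise an assumed default.

```python
def epochs_seen(iters, batch_size, num_examples):
    """Approximate passes over the training set: examples drawn / dataset size."""
    return iters * batch_size / num_examples

TRAIN_ROWS = 35  # 43 rows minus int(43 * 0.2) = 8 validation rows
for name, iters in (("Mistral 7B, --iters 10", 10), ("Phi-2, assumed default 1000", 1000)):
    print(f"{name}: ~{epochs_seen(iters, 4, TRAIN_ROWS):.1f} epochs")
```

Roughly one epoch versus over a hundred is a large gap. This should be kept in mind when comparing the two models' fine-tuned answers.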


In [21]:
!cp adapters.npz ./my_models/mlx_mistral_7b_02.npz



In [22]:
!python ./mlx-examples/lora/fuse.py --model ./my_models/mlx_mistral_7b --save-path ./my_models/mlx_mistral_7b_ft_02 --adapter-file ./my_models/mlx_mistral_7b_02.npz --hf-path mistralai/Mistral-7B-Instruct-v0.2



None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
Loading pretrained model


#### Checking the Fine-Tuning Results

In [23]:
from mlx_lm import load, generate

model_ft, tokenizer_ft = load("./my_models/mlx_mistral_7b_ft_02")


In [24]:
response = generate(model_ft, tokenizer_ft, prompt="write me a joke using comments only in the python programming language", verbose=True)

Prompt: write me a joke using comments only in the python programming language


Here's a Python joke using comments only:

```python
# This is a harmless looking variable assignment
x = 5

# But wait, what's this? A comment about a secret plan!
# Hmm, I wonder what it could be...

# Let's see, the plan involves adding 2 to x and assigning the result to y
# But why not just do it directly? Let's add a
Prompt: 37.671 tokens-per-sec
Generation: 12.466 tokens-per-sec


In [25]:
response = generate(model_ft, tokenizer_ft, prompt="write me a joke using comments in the OPL programming language", verbose=True)

Prompt: write me a joke using comments in the OPL programming language




Here's a joke for you using comments in the OPL (Optimizing Portable Language) programming language:

// This program calculates the age of a person named John
// assuming he was born in the year 1980

// Calculate current year
let currentYear = year();

// Calculate John's age
let johnsAge = currentYear - 1980;

// Print John'
Prompt: 41.920 tokens-per-sec
Generation: 12.403 tokens-per-sec


In [26]:
response = generate(model_ft, tokenizer_ft, prompt="write me a joke using remarks in the OPL programming language", verbose=True)

Prompt: write me a joke using remarks in the OPL programming language


Here's a joke for you using remarks in the OPL (Optimizing Portable Language) programming language:

Why did the OPL programmer put a remark before his variable declaration?

Because he wanted to make a "remARk" about how cleverly he named his variables!

Here's the code snippet:

*! This is a remark *
real x; // Variable declaration with a remark before it.
Prompt: 42.009 tokens-per-sec
Generation: 12.409 tokens-per-sec


In [27]:
response = generate(model_ft, tokenizer_ft, prompt="Write a program that checks if a given year, is a leap year in the OPL programming language", verbose=True)

Prompt: Write a program that checks if a given year, is a leap year in the OPL programming language
.

Here's a simple OPL program to check if a given year is a leap year:

```opl
-- Define a function to check if a year is a leap year
func leap_year(year)
    -- A year is a leap year if it's divisible by 4
    if mod(year, 4) = 0 then
        -- But not if it's divisible by 100
        if
Prompt: 66.097 tokens-per-sec
Generation: 12.515 tokens-per-sec
