# SolverX Knowledge Injection Verification (Gemma-2-27B)
This notebook checks the effect of LoRA fine-tuning on the Gemma-2-27B-it model.
We compare the **Base Model** (pre-trained) with the **Fine-tuned Model** (LoRA-adapted) across three categories:
1.  **General Knowledge**: To verify that the model hasn't lost its original capabilities (Catastrophic Forgetting check).
2.  **Injected Knowledge**: To verify that the model correctly learned the new facts about "SolverX".
3.  **Hallucination Check**: To see how the model behaves when asked about SolverX details that were not part of the training data.
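
The first two checks can be made less subjective by testing each answer for expected keywords. The following is a minimal sketch, not part of the original evaluation: `contains_all_keywords` is a hypothetical helper, and the example uses the first general-knowledge question (the capital of South Korea is Seoul, "서울"). For the injected-knowledge questions, the keywords would have to come from the SolverX training data, which this notebook does not list.

```python
def contains_all_keywords(answer: str, keywords: list[str]) -> bool:
    """Return True if every keyword appears in the answer (case-insensitive)."""
    lowered = answer.lower()
    return all(kw.lower() in lowered for kw in keywords)


# Example with the first general-knowledge question:
# a correct answer should mention Seoul ("서울").
sample_answer = "대한민국의 수도는 서울입니다."
print(contains_all_keywords(sample_answer, ["서울"]))
```

A keyword match is only a rough signal: it cannot tell a correct answer from one that merely mentions the right word, so it complements the side-by-side reading rather than replacing it.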

In [30]:
import mlx.core as mx
from mlx_lm import load, generate
import pandas as pd
from IPython.display import display, HTML

# Configuration
MODEL_PATH = "mlx-community/gemma-2-27b-it-4bit"
ADAPTER_PATH = "adapters_27b"

# Helper function to generate response
def get_response(model, tokenizer, question):
    prompt = f"<start_of_turn>user\n{question}<end_of_turn>\n<start_of_turn>model\n"
    response = generate(model, tokenizer, prompt=prompt, max_tokens=200, verbose=False)
    return response.strip()

In [31]:
# Define Test Questions
questions = [
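    # "What is the capital of South Korea?"
    {"category": "General Knowledge", "question": "대한민국의 수도는 어디인가요?"},
    # "Which function sorts a list in Python?"
    {"category": "General Knowledge", "question": "파이썬에서 리스트를 정렬하는 함수는?"},
    # "Where is SolverX's headquarters located?"
    {"category": "Injected Knowledge", "question": "SolverX의 본사는 어디에 위치하고 있나요?"},
    # "What are the main features of the SolverX Fusion product?"
    {"category": "Injected Knowledge", "question": "SolverX Fusion 제품의 주요 특징은 무엇인가요?"},
    # "Tell me three employee benefits at SolverX."
    {"category": "Hallucination Check", "question": "SolverX의 직원 복지 혜택 3가지를 알려줘."}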
]

results = []

## 1. Base Model Evaluation
First, we load the original **Gemma-2-27b-it** model (the 4-bit MLX build at `MODEL_PATH`) without any adapters. This represents the model's state before learning about SolverX.

In [32]:
print(f"Loading Base Model: {MODEL_PATH}")
model, tokenizer = load(MODEL_PATH)

print("Generating responses with Base Model...")
base_responses = []
for item in questions:
    ans = get_response(model, tokenizer, item["question"])
    base_responses.append(ans)
    print(".", end="", flush=True)  # progress indicator, one dot per question

print("\nBase Model evaluation complete.")

# Free memory before loading the fine-tuned model (optional, but useful if memory is tight)
import gc

del model
gc.collect()

Loading Base Model: mlx-community/gemma-2-27b-it-4bit


Fetching 8 files: 100%|██████████| 8/8 [00:00<00:00, 108240.10it/s]


Generating responses with Base Model...
.....
Base Model evaluation complete.


0

## 2. Fine-tuned Model Evaluation
Now, we load the model **with the LoRA adapters** trained on SolverX data.
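
Before loading, it can help to confirm that the adapter directory actually contains the trained weights, since a missing or mislocated file is an easy mistake to make. The following is a minimal sketch under stated assumptions: `missing_adapter_files` is a hypothetical helper, and the two file names are assumptions about what the LoRA training run wrote, so adjust them to match the actual contents of `adapters_27b`.

```python
from pathlib import Path

# Assumed file names; adjust to whatever the training run actually wrote.
EXPECTED_FILES = ("adapter_config.json", "adapters.safetensors")


def missing_adapter_files(adapter_dir: str) -> list[str]:
    """Return the expected adapter files that are absent from adapter_dir."""
    root = Path(adapter_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


# An empty list means every expected file was found.
print(missing_adapter_files("adapters_27b"))
```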

In [33]:
print(f"Loading Fine-tuned Model with Adapters: {ADAPTER_PATH}")
model, tokenizer = load(MODEL_PATH, adapter_path=ADAPTER_PATH)

print("Generating responses with Fine-tuned Model...")
ft_responses = []
for item in questions:
    ans = get_response(model, tokenizer, item["question"])
    ft_responses.append(ans)
    print(".", end="", flush=True)  # progress indicator, one dot per question

print("\nFine-tuned Model evaluation complete.")

Loading Fine-tuned Model with Adapters: adapters_27b


Fetching 8 files: 100%|██████████| 8/8 [00:00<00:00, 102300.10it/s]


Generating responses with Fine-tuned Model...
.....
Fine-tuned Model evaluation complete.


## 3. Visual Comparison
We will now display the results side-by-side to visualize the impact of knowledge injection.

In [34]:
# Create DataFrame
df = pd.DataFrame({
    "Category": [q["category"] for q in questions],
    "Question": [q["question"] for q in questions],
    "Base Model Answer": base_responses,
    "Fine-tuned Model Answer": ft_responses
})

# Style the DataFrame for better visualization
def highlight_diff(row):
    styles = [''] * len(row)
    if row['Category'] == 'Injected Knowledge':
        # Highlight Fine-tuned answer in green for injected knowledge
        styles[3] = 'background-color: #d4edda; color: #155724; font-weight: bold;'
        styles[2] = 'background-color: #f8d7da; color: #721c24;' # Red for base model failure
    elif row['Category'] == 'Hallucination Check':
        # Highlight potential hallucination in yellow
        styles[3] = 'background-color: #fff3cd; color: #856404;'
    return styles

# Apply styling
styled_df = df.style.apply(highlight_diff, axis=1).set_properties(**{'text-align': 'left', 'white-space': 'pre-wrap'})

# Display
display(HTML("<h3>Comparison Results</h3>"))
display(styled_df)

| | Category | Question | Base Model Answer | Fine-tuned Model Answer |
|---|---|---|---|---|
| 0 | General Knowledge | 대한민국의 수도는 어디인가요? | | |
| 1 | General Knowledge | 파이썬에서 리스트를 정렬하는 함수는? | | |
| 2 | Injected Knowledge | SolverX의 본사는 어디에 위치하고 있나요? | | |
| 3 | Injected Knowledge | SolverX Fusion 제품의 주요 특징은 무엇인가요? | | |
| 4 | Hallucination Check | SolverX의 직원 복지 혜택 3가지를 알려줘. | | |


In [35]:
for _, row in df.iterrows():  # the row index is not used
    print(f"Q: {row['Question']}")
    print(f"A (FT): {row['Fine-tuned Model Answer'][:100]}...") # Truncate to avoid size limit
    print("-" * 20)

Q: 대한민국의 수도는 어디인가요?
A (FT): <pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad>...
--------------------
Q: 파이썬에서 리스트를 정렬하는 함수는?
A (FT): <pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad>...
--------------------
Q: SolverX의 본사는 어디에 위치하고 있나요?
A (FT): <pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad>...
--------------------
Q: SolverX Fusion 제품의 주요 특징은 무엇인가요?
A (FT): <pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad>...
--------------------
Q: SolverX의 직원 복지 혜택 3가지를 알려줘.
A (FT): <pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad>...
--------------------
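
Every fine-tuned answer printed above consists of `<pad>` tokens only, so the comparison cannot yet show any real difference between the two models. The following is a minimal sketch of a guard that flags such degenerate responses before they reach the table. The helper names and the list of special-token strings are assumptions for illustration and are not part of `mlx_lm`.

```python
import re

# Assumed special-token strings; adjust to match the tokenizer actually in use.
SPECIAL_TOKENS = ("<pad>", "<eos>", "<bos>", "<end_of_turn>")


def strip_special_tokens(text: str) -> str:
    """Remove known special-token strings and surrounding whitespace."""
    pattern = "|".join(re.escape(tok) for tok in SPECIAL_TOKENS)
    return re.sub(pattern, "", text).strip()


def is_degenerate(text: str) -> bool:
    """Return True when only special tokens and whitespace remain."""
    return strip_special_tokens(text) == ""


# Example: a response like the ones printed above is flagged.
print(is_degenerate("<pad><pad><pad>"))
```

Applied to `base_responses` and `ft_responses`, a check like this separates "the model produced no usable text" from "the model answered incorrectly". If every response is flagged, the generation setup (for example the prompt format or how the weights were loaded) is worth investigating before drawing conclusions about knowledge injection.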
