# NewAIBench Experiment Runner Tutorial

This notebook walks through running an experiment with the NewAIBench framework step by step, from pulling the git repository to completing the experiment.

## Overview
- Clone the repository and update to the latest code
- Install dependencies
- Create a custom YAML configuration
- Run the experiment and analyze the results

⚠️ **Note**: This notebook is designed to run in an environment that already has Python and Git.

In [None]:
# Remove any previous checkout so the clone below starts clean (-f: no error if it is missing)
!rm -rf /kaggle/working/new_bench

In [None]:
%cd /kaggle/working/

In [None]:
!git clone https://github.com/voicon324/new_bench.git

In [None]:
import os
import sys
import subprocess
import yaml
import json
from pathlib import Path
import datetime
from IPython.display import Markdown, display

# Set the working directory
WORK_DIR = "/kaggle/working/new_bench"
os.chdir(WORK_DIR)
print(f"Working directory: {os.getcwd()}")

## 1. Clone the Repository and Update the Code

First, we clone the repository (if it is not there yet) or pull the latest code from the remote repository.
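
The clone-or-pull decision described above can be sketched as a small helper. This is a minimal sketch, not part of NewAIBench: the names `plan_git_sync` and `sync_repo` are hypothetical, and the choice of `--ff-only` is an assumption made so that a pull never creates a merge commit silently.

```python
import os
import subprocess


def plan_git_sync(repo_dir: str, repo_url: str) -> list[str]:
    """Return the git command that brings repo_dir up to date (hypothetical helper)."""
    if os.path.isdir(os.path.join(repo_dir, ".git")):
        # Existing checkout: fast-forward only, so local commits are never merged silently.
        return ["git", "-C", repo_dir, "pull", "--ff-only"]
    # No checkout yet: clone into repo_dir.
    return ["git", "clone", repo_url, repo_dir]


def sync_repo(repo_dir: str, repo_url: str) -> None:
    """Run the planned command; raises subprocess.CalledProcessError on failure."""
    subprocess.run(plan_git_sync(repo_dir, repo_url), check=True)
```

In this notebook the call would be `sync_repo("/kaggle/working/new_bench", "https://github.com/voicon324/new_bench.git")`. Separating the planning step from the execution step keeps the decision easy to inspect without touching the network.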

In [None]:
# Check whether a git repository already exists here
if os.path.exists('.git'):
    print("✅ Git repository already exists")
    # Pull latest changes, but only if the working tree is clean
    try:
        result = subprocess.run(['git', 'status', '--porcelain'],
                              capture_output=True, text=True, check=True)
        if result.stdout.strip():
            print("⚠️ There are uncommitted changes:")
            print(result.stdout)
            print("\nYou can stash the changes before pulling:")
            print("git stash")
        else:
            print("Working directory clean, pulling latest changes...")
            pull_result = subprocess.run(['git', 'pull'],
                                       capture_output=True, text=True, check=True)
            print(f"✅ Git pull completed: {pull_result.stdout}")
    except subprocess.CalledProcessError as e:
        print(f"❌ Git operation failed: {e}")
        # stderr was captured above, so show git's own error message as well
        if e.stderr:
            print(e.stderr)
else:
    print("❌ No git repository found")
    print("If you need to clone the repository, uncomment and run the command below:")
    print("# git clone <repository_url> .")

In [None]:
# Show the current branch
try:
    branch_result = subprocess.run(['git', 'branch', '--show-current'], 
                                 capture_output=True, text=True, check=True)
    current_branch = branch_result.stdout.strip()
    print(f"🔄 Current branch: {current_branch}")
    
    # Show the most recent commit
    commit_result = subprocess.run(['git', 'log', '-1', '--oneline'], 
                                 capture_output=True, text=True, check=True)
    latest_commit = commit_result.stdout.strip()
    print(f"📝 Latest commit: {latest_commit}")
except subprocess.CalledProcessError as e:
    print(f"❌ Cannot get git info: {e}")

## 2. Install Dependencies

Install all dependencies required by the NewAIBench framework.
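
Before running a full install, it can help to see which requirements are not yet present. The sketch below is an illustration only: `requirement_names` and `missing_packages` are hypothetical helpers, the parser handles only simple `name`, `name==x`, and `name[extra]>=x` lines, and it checks presence, not version constraints.

```python
import re
from importlib import metadata


def requirement_names(text: str) -> list[str]:
    """Extract distribution names from simple requirements.txt content."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):  # skip blanks and pip options such as -r / -e
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names: list[str]) -> list[str]:
    """Return the names that have no installed distribution metadata."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

A typical use would be `missing_packages(requirement_names(open("requirements.txt").read()))`. An empty result only says that each package is present in some version; `pip install -r requirements.txt` remains the authoritative step.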

In [None]:
# Check the Python version
print(f"🐍 Python version: {sys.version}")
print(f"📂 Python executable: {sys.executable}")

# Check whether requirements.txt exists
if os.path.exists('requirements.txt'):
    print("✅ Found requirements.txt")
    with open('requirements.txt', 'r') as f:
        requirements = f.read()
    print("\n📋 Requirements:")
    print(requirements[:500] + "..." if len(requirements) > 500 else requirements)
else:
    print("❌ requirements.txt not found")

In [None]:
# Install dependencies
print("🔧 Installing dependencies...")
try:
    # Upgrade pip first
    subprocess.run([sys.executable, '-m', 'pip', 'install', '--upgrade', 'pip'], 
                  check=True)
    print("✅ Pip upgraded")
    
    # Install requirements
    if os.path.exists('requirements.txt'):
        result = subprocess.run([sys.executable, '-m', 'pip', 'install', '-r', 'requirements.txt'], 
                              capture_output=True, text=True, check=True)
        print("✅ Requirements installed successfully")
        if result.stderr:
            print(f"⚠️ Warnings: {result.stderr[:200]}...")
    
    # Install package in development mode
    if os.path.exists('setup.py'):
        subprocess.run([sys.executable, '-m', 'pip', 'install', '-e', '.'], 
                      check=True)
        print("✅ Package installed in development mode")
        
except subprocess.CalledProcessError as e:
    print(f"❌ Installation failed: {e}")
    # e.stderr always exists but is None when the output was not captured
    print(f"Error output: {e.stderr or 'No error details'}")

In [None]:
# Check whether NewAIBench can be imported
try:
    # Add src to path if needed
    src_path = os.path.join(os.getcwd(), "src")
    if src_path not in sys.path:
        sys.path.insert(0, src_path)
    
    # Test import
    from newaibench.experiment import ExperimentRunner, ExperimentConfig
    print("✅ NewAIBench import successful")
    
    # Check if run_experiment.py exists
    if os.path.exists('run_experiment.py'):
        print("✅ run_experiment.py found")
    else:
        print("❌ run_experiment.py not found")
        
except ImportError as e:
    print(f"❌ Import failed: {e}")
    print("You may need to install additional dependencies or check PYTHONPATH")

## 3. Create a Custom YAML Configuration

Next, we create a custom YAML configuration file for your experiment.
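
The configuration used in this notebook lists several dense models that share the same FAISS settings and differ only in name and model path. As a sketch of how such repeated entries could be generated instead of copied by hand, the hypothetical helper `dense_model_entry` below builds one `models` entry as a plain dictionary. The keys and default values mirror those in this notebook's config; whether NewAIBench accepts other keys is not assumed here.

```python
import json


def dense_model_entry(name: str, model_path: str, **extra_params) -> dict:
    """Build one `models` entry with the FAISS settings shared across this notebook's config."""
    parameters = {
        "normalize_embeddings": True,
        "model_architecture": "sentence_transformer",
        "use_ann_index": True,
        "ann_backend": "faiss",
        "faiss_index_factory_string": "IVF100,Flat",
        "faiss_nprobe": 10,
        "faiss_metric_type": "METRIC_INNER_PRODUCT",
        "max_seq_length": 256,
    }
    parameters.update(extra_params)  # per-model additions, e.g. trust_remote_code
    return {
        "name": name,
        "type": "dense",
        "model_name_or_path": model_path,
        "parameters": parameters,
        "device": "cuda",
        "batch_size": 8,
    }


models = [
    dense_model_entry("jina-embeddings-v3", "jinaai/jina-embeddings-v3", trust_remote_code=True),
    dense_model_entry("bge-m3", "BAAI/bge-m3"),
]
# Preview the first entry; JSON is used here only to keep the sketch stdlib-only.
print(json.dumps(models[0], indent=2))
```

Since `yaml` is already imported in this notebook, `yaml.safe_dump({"models": models}, sort_keys=False)` would turn the same list into YAML text for the config file.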

### 3.1 Browse the available templates

In [None]:
# List the available experiment templates
print("📂 Existing experiment configurations:")
print("\n1. In examples/experiments/:")
examples_exp_dir = Path("examples/experiments")
if examples_exp_dir.exists():
    for yaml_file in examples_exp_dir.glob("*.yaml"):
        print(f"   - {yaml_file.name}")
    for json_file in examples_exp_dir.glob("*.json"):
        print(f"   - {json_file.name}")
else:
    print("   (directory not found)")

print("\n2. In experiments/:")
exp_dir = Path("experiments")
if exp_dir.exists():
    for yaml_file in exp_dir.glob("*.yaml"):
        print(f"   - {yaml_file.name}")
    for json_file in exp_dir.glob("*.json"):
        print(f"   - {json_file.name}")
else:
    print("   (directory not found)")

### 3.2 Customize the Configuration

Next, we customize the YAML file to fit your needs. You can edit the file directly or create a new one.
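
After editing a config by hand, a quick shape check can catch mistakes before a long run starts. The sketch below is an assumption-laden illustration, not NewAIBench's own validation: the four section names are taken from the config in this notebook, `config_problems` is a hypothetical helper, and the `top_k` rule assumes that metrics at a cutoff `k` need at least `k` retrieved documents.

```python
REQUIRED_SECTIONS = ("models", "datasets", "evaluation", "output")


def config_problems(cfg: dict) -> list[str]:
    """Return human-readable problems; an empty list means the basic shape looks fine."""
    problems = [f"missing section: {key}" for key in REQUIRED_SECTIONS if key not in cfg]

    seen = set()
    for i, model in enumerate(cfg.get("models") or []):
        name = model.get("name")
        if not name:
            problems.append(f"models[{i}] has no name")
        elif name in seen:
            problems.append(f"duplicate model name: {name}")
        seen.add(name)
        if model.get("type") == "dense" and not model.get("model_name_or_path"):
            problems.append(f"dense model {name!r} has no model_name_or_path")

    evaluation = cfg.get("evaluation") or {}
    k_values = evaluation.get("k_values", [])
    top_k = evaluation.get("top_k")
    # Assumed rule: the largest metric cutoff should not exceed the retrieval depth.
    if top_k is not None and k_values and max(k_values) > top_k:
        problems.append(f"max k_values ({max(k_values)}) exceeds top_k ({top_k})")
    return problems
```

The function works on a plain dictionary, so it can be fed the result of `yaml.safe_load(open("my_custom_experiment.yaml"))` once that file exists.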

In [None]:
# !pip install flash_attn  --no-build-isolation

In [None]:
%%writefile my_custom_experiment.yaml
# BKAI Legal Data Retrieval Experiment
# Benchmarks several retrieval models on Vietnamese legal data from BKAI.
# Covers dense models with FAISS integration; a BM25 baseline is included but commented out.

description: "BKAI Legal Data Retrieval Benchmark"

models:
  # - name: "bm25_default"
  #   type: "sparse"
  #   model_name_or_path: ""
  #   parameters:
  #     k1: 1.2
  #     b: 0.75
  #     tokenizer: "simple"
  #     lowercase: true
  #     remove_stopwords: false
  #   device: "cpu"
  # #   batch_size: 32
  - name: "jina-embeddings-v3"
    type: "dense" 
    model_name_or_path: "jinaai/jina-embeddings-v3"
    parameters:
      normalize_embeddings: true
      model_architecture: "sentence_transformer"
      use_ann_index: true
      ann_backend: "faiss"
      faiss_index_factory_string: "IVF100,Flat"
      faiss_nprobe: 10
      faiss_metric_type: "METRIC_INNER_PRODUCT"
      max_seq_length: 256
      trust_remote_code: true
      # query_encode_params:
      #   task: "retrieval.query"
      #   prompt_name: "retrieval.query"
      # # Document encoding parameters  
      # document_encode_params:
      #   task: "retrieval.passage"
      #   prompt_name: "retrieval.passage"
    device: "cuda"  # Set explicitly instead of "auto" to avoid a device error
    batch_size: 8
  
  - name: "paraphrase-multilingual-MiniLM-L12-v2"
    type: "dense" 
    model_name_or_path: "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
    parameters:
      normalize_embeddings: true
      model_architecture: "sentence_transformer"
      use_ann_index: true
      ann_backend: "faiss"
      faiss_index_factory_string: "IVF100,Flat"
      faiss_nprobe: 10
      faiss_metric_type: "METRIC_INNER_PRODUCT"
      max_seq_length: 256
    device: "cuda"  # Set explicitly instead of "auto" to avoid a device error
    batch_size: 8
  
  - name: "paraphrase-multilingual-mpnet-base-v2"
    type: "dense" 
    model_name_or_path: "sentence-transformers/paraphrase-multilingual-mpnet-base-v2"
    parameters:
      normalize_embeddings: true
      model_architecture: "sentence_transformer"
      use_ann_index: true
      ann_backend: "faiss"
      faiss_index_factory_string: "IVF100,Flat"
      faiss_nprobe: 10
      faiss_metric_type: "METRIC_INNER_PRODUCT"
      max_seq_length: 256
    device: "cuda"  # Set explicitly instead of "auto" to avoid a device error
    batch_size: 8

  - name: "multilingual-e5-small"
    type: "dense" 
    model_name_or_path: "intfloat/multilingual-e5-small"
    parameters:
      normalize_embeddings: true
      model_architecture: "sentence_transformer"
      use_ann_index: true
      ann_backend: "faiss"
      faiss_index_factory_string: "IVF100,Flat"
      faiss_nprobe: 10
      faiss_metric_type: "METRIC_INNER_PRODUCT"
      max_seq_length: 256
    device: "cuda"  # Set explicitly instead of "auto" to avoid a device error
    batch_size: 8

  - name: "bge-m3"
    type: "dense" 
    model_name_or_path: "BAAI/bge-m3"
    parameters:
      normalize_embeddings: true
      model_architecture: "sentence_transformer"
      use_ann_index: true
      ann_backend: "faiss"
      faiss_index_factory_string: "IVF100,Flat"
      faiss_nprobe: 10
      faiss_metric_type: "METRIC_INNER_PRODUCT"
      max_seq_length: 256
    device: "cuda"  # Set explicitly instead of "auto" to avoid a device error
    batch_size: 8

  - name: "vietnamese-embedding"
    type: "dense" 
    model_name_or_path: "dangvantuan/vietnamese-embedding"
    parameters:
      normalize_embeddings: true
      model_architecture: "sentence_transformer"
      use_ann_index: true
      ann_backend: "faiss"
      faiss_index_factory_string: "IVF100,Flat"
      faiss_nprobe: 10
      faiss_metric_type: "METRIC_INNER_PRODUCT"
      max_seq_length: 256
    device: "cuda"  # Set explicitly instead of "auto" to avoid a device error
    batch_size: 8

  - name: "vietnamese-bi-encoder"
    type: "dense" 
    model_name_or_path: "bkai-foundation-models/vietnamese-bi-encoder"
    parameters:
      normalize_embeddings: true
      model_architecture: "sentence_transformer"
      use_ann_index: true
      ann_backend: "faiss"
      faiss_index_factory_string: "IVF100,Flat"
      faiss_nprobe: 10
      faiss_metric_type: "METRIC_INNER_PRODUCT"
      max_seq_length: 256
    device: "cuda"  # Set explicitly instead of "auto" to avoid a device error
    batch_size: 8

datasets:
  - name: "legal_data"
    type: "text"
    data_dir: "/kaggle/input/bkailawbench/BKAI_law_data/newaibench_formatted_data/legal_data"
    # max_samples: 20000  # Limit the dataset to 20000 samples for faster testing
    config_overrides:
      cache_enabled: true
      validation_enabled: true
      queries_file: "train_queries.jsonl"
      qrels_file: "train_qrels.txt"
      # corpus_file: "corpus_combined.jsonl"

evaluation:
  metrics: ["ndcg", "map", "recall", "precision"]
  k_values: [1, 3, 5, 10, 20, 50]
  relevance_threshold: 1
  include_per_query: true
  top_k: 100
  save_run_file: true
  run_file_format: "trec"

output:
  output_dir: "./results/legal_data"
  experiment_name: "legal_data"
  save_models: false
  save_intermediate: true
  log_level: "DEBUG"  # Changed from INFO to DEBUG
  overwrite: true

In [None]:
!git pull

In [None]:
!python run_experiment.py --config my_custom_experiment.yaml
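
Once the run finishes, the saved run file can be inspected independently of the framework. The sketch below assumes the standard TREC layouts, `qid Q0 docid rank score tag` for run files and `qid iteration docid relevance` for qrels. Whether the files NewAIBench writes and `train_qrels.txt` follow exactly these layouts is an assumption, and the helper names are hypothetical. The data here is inline toy data, not real results.

```python
from collections import defaultdict


def read_trec_run(lines):
    """Parse TREC run lines (qid Q0 docid rank score tag) into ranked doc lists per query."""
    run = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) < 6:
            continue  # skip blank or malformed lines
        qid, _, docid, _rank, score, _tag = parts[:6]
        run[qid].append((docid, float(score)))
    # Rank each query's documents by descending score.
    return {q: [d for d, _ in sorted(docs, key=lambda x: -x[1])] for q, docs in run.items()}


def read_qrels(lines, threshold=1):
    """Parse TREC qrels lines, keeping documents whose relevance is at least `threshold`."""
    qrels = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) >= 4 and int(parts[3]) >= threshold:
            qrels[parts[0]].add(parts[2])
    return dict(qrels)


def mean_recall_at_k(run, qrels, k):
    """Average recall@k over all queries that have at least one relevant document."""
    scores = []
    for qid, relevant in qrels.items():
        retrieved = set(run.get(qid, [])[:k])
        scores.append(len(retrieved & relevant) / len(relevant))
    return sum(scores) / len(scores) if scores else 0.0


# Toy data for illustration only.
run_lines = [
    "q1 Q0 d1 1 0.9 demo",
    "q1 Q0 d2 2 0.8 demo",
    "q1 Q0 d3 3 0.1 demo",
    "q2 Q0 d9 1 0.7 demo",
]
qrel_lines = ["q1 0 d1 1", "q1 0 d3 1", "q2 0 d4 1"]

run = read_trec_run(run_lines)
qrels = read_qrels(qrel_lines)
print(mean_recall_at_k(run, qrels, k=2))  # 0.25
```

For real output, pass the opened run file found under the configured `output_dir` (`./results/legal_data`) and the qrels file from the dataset directory. The `threshold` argument plays the role of `relevance_threshold: 1` in the evaluation section.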