# 🦜 VieNeu-TTS Fine-tuning Notebook

Notebook này tổng hợp toàn bộ code training cho **VieNeu-TTS-0.3B**.  
Bạn có thể thay sang **VieNeu-TTS** ở phần `training_config` (mục 6) nếu muốn.

Trong quá trình training, nếu gặp lỗi hoặc có góp ý, vui lòng tạo **issue** trên GitHub:  
https://github.com/pnnbao97/VieNeu-TTS  

Hoặc liên hệ trực tiếp với tác giả **Phạm Nguyễn Ngọc Bảo** qua:
- Email: pnnbao@gmail.com  
- Facebook: https://www.facebook.com/bao.phamnguyenngoc.5

## 📦 1. Install Dependencies

In [1]:
# Install required packages
!pip install -q transformers peft torch datasets librosa soundfile tqdm phonemizer
!pip install -q git+https://github.com/Neuphonic/NeuCodec.git


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.3.1[0m[39;49m -> [0m[32;49m25.3[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.3.1[0m[39;49m -> [0m[32;49m25.3[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


In [2]:
!apt install espeak-ng -y

Reading package lists... 0%Reading package lists... 0%Reading package lists... 0%Reading package lists... 7%Reading package lists... 7%Reading package lists... 95%Reading package lists... 95%Reading package lists... 96%Reading package lists... 96%Reading package lists... 99%Reading package lists... 99%Reading package lists... Done
Building dependency tree... 0%Building dependency tree... 0%Building dependency tree... 50%Building dependency tree... 50%Building dependency tree... Done
Reading state information... 0% Reading state information... 0%Reading state information... Done
The following additional packages will be installed:
  espeak-ng-data libespeak-ng1 libpcaudio0 libsonic0
The following NEW packages will be installed:
  espeak-ng espeak-ng-data libespeak-ng1 libpcaudio0 libsonic0
0 upgraded, 5 newly installed, 0 to remove and 50 not upgraded.
Need to get 4829 kB of archives.
After this operation, 13.8 MB of additional disk space

## 🔧 2. Setup Utils 

In [3]:
!git clone https://github.com/pnnbao97/VieNeu-TTS

Cloning into 'VieNeu-TTS'...
remote: Enumerating objects: 778, done.[K
remote: Counting objects:   0% (1/358)[Kremote: Counting objects:   1% (4/358)[Kremote: Counting objects:   2% (8/358)[Kremote: Counting objects:   3% (11/358)[Kremote: Counting objects:   4% (15/358)[Kremote: Counting objects:   5% (18/358)[Kremote: Counting objects:   6% (22/358)[Kremote: Counting objects:   7% (26/358)[Kremote: Counting objects:   8% (29/358)[Kremote: Counting objects:   9% (33/358)[Kremote: Counting objects:  10% (36/358)[Kremote: Counting objects:  11% (40/358)[Kremote: Counting objects:  12% (43/358)[Kremote: Counting objects:  13% (47/358)[Kremote: Counting objects:  14% (51/358)[Kremote: Counting objects:  15% (54/358)[Kremote: Counting objects:  16% (58/358)[Kremote: Counting objects:  17% (61/358)[Kremote: Counting objects:  18% (65/358)[Kremote: Counting objects:  19% (69/358)[Kremote: Counting objects:  20% (72/358)[Kremote: Counting objects: 

In [12]:
import sys
import os
from pathlib import Path

def setup_vieneu_tts():
    """Universal setup for VieNeu-TTS - works on any platform"""
    
    # Find VieNeu-TTS
    search_paths = [
        "/root/VieNeu-TTS",
        "/content/VieNeu-TTS",
        "./VieNeu-TTS",
        "../VieNeu-TTS",
    ]
    
    vieneu_path = None
    for path in search_paths:
        if os.path.exists(path) and os.path.exists(os.path.join(path, "utils")):
            vieneu_path = os.path.abspath(path)
            break
    
    if not vieneu_path:
        raise FileNotFoundError(
            "VieNeu-TTS not found! Clone it:\n"
            "  git clone https://github.com/pnnbao97/VieNeu-TTS"
        )
    
    # Clean and add to path
    sys.path = [p for p in sys.path if "VieNeu-TTS" not in p]
    sys.path.insert(0, vieneu_path)
    
    print(f"✅ VieNeu-TTS: {vieneu_path}")
    
    # Import (CORRECT WAY)
    from utils.normalize_text import VietnameseTTSNormalizer
    from utils.phonemize_text import phonemize_with_dict
    
    # Initialize normalizer
    normalizer = VietnameseTTSNormalizer()
    
    # Create wrapper function
    def normalize_text(text):
        return normalizer.normalize(text)
    
    print("✅ Utils loaded!")
    
    return vieneu_path, normalize_text, phonemize_with_dict

# Run setup
VIENEU_PATH, normalize_text, phonemize_with_dict = setup_vieneu_tts()

# ========== TEST ==========
def preprocess_text(text):
    """Complete preprocessing pipeline"""
    normalized = normalize_text(text)
    phonemes = phonemize_with_dict(normalized)
    return {
        "original": text,
        "normalized": normalized,
        "phonemes": phonemes
    }

# Quick test
result = preprocess_text("Tôi có 2.000 mẫu audio, giá 5.000.000đ")
print(f"\n📝 Test result:")
for key, val in result.items():
    print(f"  {key:12s}: {val}")

print("\n🎉 Ready to use!")

✅ VieNeu-TTS: /root/VieNeu-TTS
✅ Utils loaded!

📝 Test result:
  original    : Tôi có 2.000 mẫu audio, giá 5.000.000đ
  normalized  : tôi có hai nghìn mẫu audio, giá năm triệu đồng
  phonemes    : t̪ˈoj kˈɔɜ hˈaːj ŋˈi2n mˈə5w ˈɔːdɪˌoʊ, zˈaːɜ nˈam tʃˈiɛ6w ɗˈo2ŋ

🎉 Ready to use!


## 📥 3. Download Sample Data

Tải dữ liệu mẫu từ Hugging Face (hoặc thay bằng dataset của bạn).  
Trong notebook này, chúng tôi sử dụng bộ dữ liệu mẫu:  
https://huggingface.co/datasets/pnnbao-ump/ngochuyen_voice  

Dataset này được dùng để training giọng đọc **Ngọc Huyền (Vbee)** và **không nằm trong bộ VieNeu-TTS-1000h**,  
vì vậy rất phù hợp để làm ví dụ minh họa cho quá trình fine-tuning.

In [5]:
import io
from datasets import load_dataset, Audio
import soundfile as sf
from tqdm import tqdm

def download_sample_data(output_dir="dataset", num_samples=10):
    raw_audio_dir = os.path.join(output_dir, "raw_audio")
    metadata_path = os.path.join(output_dir, "metadata.csv")
    
    os.makedirs(raw_audio_dir, exist_ok=True)
    
    print(f"🔄 Đang tải dataset từ Hugging Face...")
    dataset = load_dataset("pnnbao-ump/ngochuyen_voice", split="train")
    dataset = dataset.cast_column("audio", Audio(decode=False))
    
    print(f"✅ Bắt đầu lưu {num_samples} mẫu...")
    
    with open(metadata_path, 'w', encoding='utf-8') as f:
        count = 0
        for sample in tqdm(dataset, total=num_samples):
            if count >= num_samples:
                break
            
            try:
                audio_data = sample["audio"]
                audio_bytes = audio_data["bytes"]
                audio_array, sampling_rate = sf.read(io.BytesIO(audio_bytes))
                
                text = sample["transcription"]
                original_filename = sample.get("file_name", f"sample_{count:03d}.wav")
                filename = os.path.basename(original_filename)
                
                file_path = os.path.join(raw_audio_dir, filename)
                sf.write(file_path, audio_array, sampling_rate)
                
                f.write(f"{filename}|{text}\n")
                count += 1
            except Exception as e:
                print(f"\n⚠️ Lỗi mẫu {count}: {e}")
                continue
    
    print(f"\n🦜 Hoàn tất! Đã tạo {count} mẫu tại {output_dir}")
    return metadata_path

# Download data (thay đổi num_samples theo nhu cầu)
metadata_path = download_sample_data(output_dir="dataset", num_samples=7000)

🔄 Đang tải dataset từ Hugging Face...


README.md:   0%|          | 0.00/497 [00:00<?, ?B/s]

data/train-00000-of-00008.parquet:   0%|          | 0.00/412M [00:00<?, ?B/s]

data/train-00001-of-00008.parquet:   0%|          | 0.00/387M [00:00<?, ?B/s]

data/train-00002-of-00008.parquet:   0%|          | 0.00/403M [00:00<?, ?B/s]

data/train-00003-of-00008.parquet:   0%|          | 0.00/411M [00:00<?, ?B/s]

data/train-00004-of-00008.parquet:   0%|          | 0.00/418M [00:00<?, ?B/s]

data/train-00005-of-00008.parquet:   0%|          | 0.00/426M [00:00<?, ?B/s]

data/train-00006-of-00008.parquet:   0%|          | 0.00/407M [00:00<?, ?B/s]

data/train-00007-of-00008.parquet:   0%|          | 0.00/409M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/7540 [00:00<?, ? examples/s]

✅ Bắt đầu lưu 7000 mẫu...


  0%|                                                                                 | 0/7000 [00:00<?, ?it/s]  0%|▎                                                                      | 25/7000 [00:00<00:28, 244.53it/s]  1%|▌                                                                      | 50/7000 [00:00<00:29, 233.48it/s]  1%|▊                                                                      | 77/7000 [00:00<00:28, 244.63it/s]  1%|█                                                                     | 102/7000 [00:00<00:28, 239.24it/s]  2%|█▎                                                                    | 130/7000 [00:00<00:27, 248.81it/s]  2%|█▌                                                                    | 158/7000 [00:00<00:26, 257.17it/s]  3%|█▊                                                                    | 185/7000 [00:00<00:26, 260.16it/s]  3%|██▏                                                                   | 213/7000 [00:00<00:25, 264


🦜 Hoàn tất! Đã tạo 7000 mẫu tại dataset





## 🧹 4. Filter Data

Lọc dữ liệu kém chất lượng (audio hỏng, text rác, quá ngắn/dài)

In [8]:
import re
ACRONYM = re.compile(r"(?:[a-zA-Z]\.){2,}")
ACRONYM_NO_PERIOD = re.compile(r"(?:[A-Z]){2,}")

def text_filter(text: str) -> bool:
    if not text: return False
    if re.search(r"\d", text): return False
    if ACRONYM.search(text) or ACRONYM_NO_PERIOD.search(text): return False
    if text[-1] not in ".,?!": return False
    return True

def filter_dataset(dataset_dir="dataset"):
    metadata_path = os.path.join(dataset_dir, "metadata.csv")
    cleaned_path = os.path.join(dataset_dir, "metadata_cleaned.csv")
    raw_audio_dir = os.path.join(dataset_dir, "raw_audio")
    
    if not os.path.exists(metadata_path):
        print(f"❌ Không tìm thấy {metadata_path}")
        return
    
    print("🧹 Bắt đầu lọc dữ liệu...")
    
    valid_samples = []
    skipped = {"audio_not_found": 0, "audio_error": 0, "duration_out_of_range": 0, "text_invalid": 0}
    
    with open(metadata_path, 'r', encoding='utf-8') as f:
        lines = f.readlines()
    
    for line in tqdm(lines, desc="Filtering"):
        parts = line.strip().split('|')
        if len(parts) < 2:
            continue
        
        filename = parts[0]
        text = parts[1]
        file_path = os.path.join(raw_audio_dir, filename)
        
        if not os.path.exists(file_path):
            skipped["audio_not_found"] += 1
            continue
        
        try:
            info = sf.info(file_path)
            duration = info.duration
            
            if not (3.0 <= duration <= 15.0):
                skipped["duration_out_of_range"] += 1
                continue
        except Exception:
            skipped["audio_error"] += 1
            continue
        
        if not text_filter(text):
            skipped["text_invalid"] += 1
            continue
        
        valid_samples.append(f"{filename}|{text}\n")
    
    with open(cleaned_path, 'w', encoding='utf-8') as f:
        f.writelines(valid_samples)
    
    print(f"\n🦜 KẾT QUẢ LỌC:")
    print(f"   - Tổng: {len(lines)} | Hợp lệ: {len(valid_samples)} ({len(valid_samples)/len(lines)*100:.1f}%)")
    print(f"   - Loại bỏ: {sum(skipped.values())} ({skipped})")
    print(f"✅ Đã lưu: {cleaned_path}")
    return cleaned_path

cleaned_metadata_path = filter_dataset(dataset_dir="dataset")

🧹 Bắt đầu lọc dữ liệu...


Filtering:   0%|                                                                      | 0/7024 [00:00<?, ?it/s]Filtering:  10%|█████▌                                                    | 676/7024 [00:00<00:00, 6753.24it/s]Filtering:  19%|██████████▉                                              | 1352/7024 [00:00<00:00, 6674.82it/s]Filtering:  29%|████████████████▋                                        | 2051/7024 [00:00<00:00, 6814.23it/s]Filtering:  39%|██████████████████████▎                                  | 2751/7024 [00:00<00:00, 6884.16it/s]Filtering:  49%|███████████████████████████▉                             | 3440/7024 [00:00<00:00, 6786.26it/s]Filtering:  59%|█████████████████████████████████▍                       | 4119/7024 [00:00<00:00, 6471.62it/s]Filtering:  68%|██████████████████████████████████████▋                  | 4770/7024 [00:00<00:00, 6483.20it/s]Filtering:  77%|███████████████████████████████████████████▉             | 5421/7024 [00:00<00:00, 5993


🦜 KẾT QUẢ LỌC:
   - Tổng: 7024 | Hợp lệ: 3715 (52.9%)
   - Loại bỏ: 3285 ({'audio_not_found': 0, 'audio_error': 0, 'duration_out_of_range': 1290, 'text_invalid': 1995})
✅ Đã lưu: dataset/metadata_cleaned.csv





## 🔊 5. Encode Audio to VQ Codes

Sử dụng NeuCodec để encode audio thành vector quantized codes

In [11]:
import torch
import librosa
from neucodec import NeuCodec
import json
import random

def encode_dataset(dataset_dir="dataset", max_samples=2000):
    metadata_path = os.path.join(dataset_dir, "metadata_cleaned.csv")
    if not os.path.exists(metadata_path):
        print(f"🦜 Không tìm thấy metadata_cleaned.csv, dùng metadata.csv...")
        metadata_path = os.path.join(dataset_dir, "metadata.csv")
    
    output_path = os.path.join(dataset_dir, "metadata_encoded.csv")
    raw_audio_dir = os.path.join(dataset_dir, "raw_audio")
    
    if not os.path.exists(metadata_path):
        print("🦜 Không tìm thấy metadata!")
        return
    
    print("🦜 Đang tải NeuCodec model...")
    device = "cuda" if torch.cuda.is_available() else "cpu"
    codec = NeuCodec.from_pretrained("neuphonic/neucodec").to(device)
    codec.eval()
    
    print(f"🦜 Encode tối đa {max_samples} mẫu (device: {device})")
    
    lines_to_write = []
    skipped_count = 0
    
    with open(metadata_path, 'r', encoding='utf-8') as f:
        lines = f.readlines()
    
    # Shuffle và lấy max_samples
    random.shuffle(lines)
    if len(lines) > max_samples:
        lines = lines[:max_samples]
    
    for line in tqdm(lines, desc="Encoding"):
        parts = line.strip().split('|')
        if len(parts) < 2:
            continue
        
        filename = parts[0]
        text = parts[1]
        audio_path = os.path.join(raw_audio_dir, filename)
        
        if not os.path.exists(audio_path):
            skipped_count += 1
            continue
        
        try:
            wav, sr = librosa.load(audio_path, sr=16000, mono=True)
            wav_tensor = torch.from_numpy(wav).float().unsqueeze(0).unsqueeze(0)
            
            with torch.no_grad():
                codes = codec.encode_code(wav_tensor)
                codes = codes.squeeze(0).squeeze(0).cpu().numpy().flatten().tolist()
                codes = [int(x) for x in codes]
            
            # Validate
            if not codes or not all(0 <= c < 65536 for c in codes):
                print(f"🦜 Invalid codes: {filename}")
                skipped_count += 1
                continue
            
            codes_json = json.dumps(codes)
            lines_to_write.append(f"{filename}|{text}|{codes_json}\n")
            
        except Exception as e:
            print(f"🦜 Lỗi {filename}: {e}")
            skipped_count += 1
    
    with open(output_path, 'w', encoding='utf-8') as f:
        f.writelines(lines_to_write)
    
    print(f"\n🦜 Hoàn tất! Đã encode {len(lines_to_write)} mẫu")
    print(f"   - Lưu tại: {output_path}")
    print(f"   - Bỏ qua: {skipped_count}")
    return output_path

encoded_metadata_path = encode_dataset(dataset_dir="dataset", max_samples=2000)

🦜 Đang tải NeuCodec model...


Fetching 1 files:   0%|          | 0/1 [00:00<?, ?it/s]

  WeightNorm.apply(module, name, dim)


🦜 Encode tối đa 2000 mẫu (device: cuda)


Encoding:   0%|                                                                       | 0/2000 [00:00<?, ?it/s]Encoding:   0%|                                                               | 2/2000 [00:00<02:15, 14.71it/s]Encoding:   0%|▏                                                              | 4/2000 [00:00<02:42, 12.27it/s]Encoding:   0%|▏                                                              | 6/2000 [00:00<03:08, 10.58it/s]Encoding:   0%|▎                                                              | 8/2000 [00:00<03:30,  9.45it/s]Encoding:   0%|▎                                                             | 10/2000 [00:00<03:26,  9.64it/s]Encoding:   1%|▎                                                             | 11/2000 [00:01<03:40,  9.03it/s]Encoding:   1%|▍                                                             | 13/2000 [00:01<03:10, 10.40it/s]Encoding:   1%|▍                                                             | 15/2000 [00:01<03:08, 10


🦜 Hoàn tất! Đã encode 2000 mẫu
   - Lưu tại: dataset/metadata_encoded.csv
   - Bỏ qua: 0





## 🎯 6. Setup Training

Cấu hình LoRA và Training Arguments

In [13]:
from peft import LoraConfig, TaskType, get_peft_model
from transformers import TrainingArguments

# LoRA Config
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.CAUSAL_LM,
)

# Training Config
training_config = {
    'model': "pnnbao-ump/VieNeu-TTS-0.3B",
    'run_name': "VieNeu-TTS-LoRA",
    'output_dir': "output",
    'per_device_train_batch_size': 2,
    'gradient_accumulation_steps': 1,
    'learning_rate': 2e-4,
    'max_steps': 5000,  # Giảm để test nhanh
    'logging_steps': 50,
    'save_steps': 500,
    'eval_steps': 500,
    'warmup_ratio': 0.05,
    'bf16': True,
}

def get_training_args(config):
    return TrainingArguments(
        output_dir=os.path.join(config['output_dir'], config['run_name']),
        do_train=True,
        do_eval=True,
        max_steps=config['max_steps'],
        per_device_train_batch_size=config['per_device_train_batch_size'],
        gradient_accumulation_steps=config['gradient_accumulation_steps'],
        learning_rate=config['learning_rate'],
        warmup_ratio=config['warmup_ratio'],
        bf16=config['bf16'],
        logging_steps=config['logging_steps'],
        save_steps=config['save_steps'],
        eval_strategy="steps",
        eval_steps=config['eval_steps'],
        save_strategy="steps",
        save_total_limit=2,
        load_best_model_at_end=True,
        metric_for_best_model="eval_loss",
        report_to="none",
        dataloader_num_workers=2,  # Giảm để tránh lỗi
        ddp_find_unused_parameters=False,
    )

print("✅ Training config ready!")

✅ Training config ready!


## 📊 7. Dataset Class & Preprocessing

In [14]:
from torch.utils.data import Dataset

def preprocess_sample(sample, tokenizer, max_len=2048):
    speech_gen_start = tokenizer.convert_tokens_to_ids('<|SPEECH_GENERATION_START|>')
    ignore_index = -100
    
    phones = sample["phones"]
    vq_codes = sample["codes"]
    
    codes_str = "".join([f"<|speech_{i}|>" for i in vq_codes])
    chat = f"""user: Convert the text to speech:<|TEXT_PROMPT_START|>{phones}<|TEXT_PROMPT_END|>\nassistant:<|SPEECH_GENERATION_START|>{codes_str}<|SPEECH_GENERATION_END|>"""
    
    ids = tokenizer.encode(chat)
    
    # Pad/truncate
    if len(ids) < max_len:
        ids = ids + [tokenizer.pad_token_id] * (max_len - len(ids))
    elif len(ids) > max_len:
        ids = ids[:max_len]
    
    input_ids = torch.tensor(ids, dtype=torch.long)
    labels = torch.full_like(input_ids, ignore_index)
    
    # Mask labels before speech generation
    speech_gen_start_idx = (input_ids == speech_gen_start).nonzero(as_tuple=True)[0]
    if len(speech_gen_start_idx) > 0:
        speech_gen_start_idx = speech_gen_start_idx[0]
        labels[speech_gen_start_idx:] = input_ids[speech_gen_start_idx:]
    
    attention_mask = (input_ids != tokenizer.pad_token_id).long()
    
    return {
        "input_ids": input_ids,
        "labels": labels,
        "attention_mask": attention_mask
    }

class VieNeuDataset(Dataset):
    def __init__(self, metadata_path, tokenizer, max_len=2048):
        self.samples = []
        self.tokenizer = tokenizer
        self.max_len = max_len
        
        if not os.path.exists(metadata_path):
            raise FileNotFoundError(f"Missing: {metadata_path}")
        
        with open(metadata_path, 'r', encoding='utf-8') as f:
            for line in f:
                parts = line.strip().split('|')
                if len(parts) >= 3:
                    self.samples.append({
                        "filename": parts[0],
                        "text": parts[1],
                        "codes": json.loads(parts[2])
                    })
        
        print(f"🦜 Loaded {len(self.samples)} samples from {metadata_path}")
    
    def __len__(self):
        return len(self.samples)
    
    def __getitem__(self, idx):
        sample = self.samples[idx]
        text = sample["text"]
        
        try:
            phones = phonemize_with_dict(text)
        except Exception as e:
            print(f"⚠️ Phonemization error: {e}")
            phones = text
        
        data_item = {"phones": phones, "codes": sample["codes"]}
        return preprocess_sample(data_item, self.tokenizer, self.max_len)

print("✅ Dataset class ready!")

✅ Dataset class ready!


## 🚀 8. Train Model

In [15]:
from transformers import AutoTokenizer, AutoModelForCausalLM, Trainer, default_data_collator

model_name = training_config['model']
print(f"🦜 Loading model: {model_name}")

# Load Tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Load Model
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    dtype=torch.bfloat16,
    device_map="auto"
)

# Load Dataset
dataset_path = encoded_metadata_path  # From earlier step
full_dataset = VieNeuDataset(dataset_path, tokenizer)

# Train/Eval split (5%)
val_size = max(1, int(0.05 * len(full_dataset)))
train_size = len(full_dataset) - val_size
train_dataset, eval_dataset = torch.utils.data.random_split(full_dataset, [train_size, val_size])

print(f"🦜 Train: {len(train_dataset)} | Eval: {len(eval_dataset)}")

# Apply LoRA
print("🦜 Applying LoRA...")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# Trainer
args = get_training_args(training_config)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    data_collator=default_data_collator,
)

print("🦜 Starting training! (Good luck)")
trainer.train()

# Save
save_path = os.path.join(training_config['output_dir'], training_config['run_name'])
print(f"🦜 Saving model to: {save_path}")
model.save_pretrained(save_path)
tokenizer.save_pretrained(save_path)

print("✅ Training complete!")

🦜 Loading model: pnnbao-ump/VieNeu-TTS-0.3B


tokenizer_config.json:   0%|          | 0.00/12.1M [00:00<?, ?B/s]

vocab.json: 0.00B [00:00, ?B/s]

merges.txt: 0.00B [00:00, ?B/s]

tokenizer.json:   0%|          | 0.00/24.1M [00:00<?, ?B/s]

added_tokens.json: 0.00B [00:00, ?B/s]

special_tokens_map.json:   0%|          | 0.00/610 [00:00<?, ?B/s]

config.json: 0.00B [00:00, ?B/s]

`torch_dtype` is deprecated! Use `dtype` instead!


model.safetensors:   0%|          | 0.00/588M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/138 [00:00<?, ?B/s]

Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.


🦜 Loaded 2000 samples from dataset/metadata_encoded.csv
🦜 Train: 1900 | Eval: 100
🦜 Applying LoRA...
trainable params: 3,203,072 || all params: 297,404,160 || trainable%: 1.0770
🦜 Starting training!


Step,Training Loss,Validation Loss
500,1.0176,1.132468
1000,1.0513,1.082146
1500,1.0626,1.054762
2000,1.021,1.036368
2500,0.9968,1.023511
3000,1.0406,1.014673
3500,1.0098,1.006806
4000,0.9376,1.00334
4500,0.9883,0.99931
5000,0.8969,0.99767


🦜 Saving model to: output/VieNeu-TTS-LoRA
✅ Training complete!


## 🦜 Done!

Model đã được fine-tune và lưu tại `output/VieNeu-TTS-LoRA/`.

Bạn có thể sử dụng checkpoint này để:
- Inference / generate speech
- Merge LoRA vào model gốc
- Tiếp tục fine-tuning với dataset khác

In [39]:
import os
from huggingface_hub import (
    HfApi,
    create_repo,
    upload_folder
)

# ===================== CONFIG =====================
HF_USERNAME = "pnnbao-ump"  # ⚠️ đổi
REPO_NAME = "VieNeu-TTS-0.3B-lora-ngoc-huyen"
LOCAL_LORA_DIR = "output/VieNeu-TTS-LoRA"
BASE_MODEL = "pnnbao-ump/VieNeu-TTS-0.3B"
DATASET_URL = "https://huggingface.co/datasets/pnnbao-ump/ngochuyen_voice"

# ===================== README =====================
README_CONTENT = f"""
---
language: vi
license: cc-by-nc-4.0
base_model: {BASE_MODEL}
library_name: peft
tags:
  - lora
  - text-to-speech
  - tts
  - vietnamese
  - vieneu-tts
---

# 🦜 VieNeu-TTS-LoRA (Ngọc Huyền)

LoRA adapter được fine-tune từ base model **VieNeu-TTS-0.3B**
để huấn luyện giọng đọc **Ngọc Huyền (Vbee)**.  

Code finetune VieNeu-TTS tại repo: https://github.com/pnnbao97/VieNeu-TTS

---

## 🔗 Base Model
- Base model: `{BASE_MODEL}`
- Repo này **chỉ chứa LoRA adapter**, không bao gồm model gốc.

---

## 📦 Dataset
- {DATASET_URL}

---

## 🚀 Usage

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained(
    "{BASE_MODEL}",
    device_map="auto"
)

model = PeftModel.from_pretrained(
    base_model,
    "{HF_USERNAME}/{REPO_NAME}"
)

## Credits

Base model: Phạm Nguyễn Ngọc Bảo
LORA finetuning: Phạm Nguyễn Ngọc Bảo
"""

In [None]:
from huggingface_hub import login

login(token="hf_xxxxxxxxxxxxxxxxxxxxxxxxxxx") # Huggingface Token của bạn - đảm bảo có quyền write

In [40]:
repo_id = f"{HF_USERNAME}/{REPO_NAME}"
print(f"🦜 Creating repo: {repo_id}")
create_repo(
    repo_id=repo_id,
    repo_type="model",
    exist_ok=True
)

# Write README.md
readme_path = os.path.join(LOCAL_LORA_DIR, "README.md")
with open(readme_path, "w", encoding="utf-8") as f:
    f.write(README_CONTENT.strip())

print("🦜 Uploading LoRA adapter to Hugging Face...")
upload_folder(
    folder_path=LOCAL_LORA_DIR,
    repo_id=repo_id,
    repo_type="model",
    commit_message="Upload VieNeu-TTS LoRA adapter"
)

print("✅ Upload completed successfully!")
print(f"🔗 https://huggingface.co/{repo_id}")

🦜 Creating repo: pnnbao-ump/VieNeu-TTS-0.3B-lora-ngoc-huyen
🦜 Uploading LoRA adapter to Hugging Face...


Processing Files (0 / 0)                : |          |  0.00B /  0.00B            

New Data Upload                         : |          |  0.00B /  0.00B            

  ...-LoRA/checkpoint-4500/rng_state.pth: 100%|##########| 14.6kB / 14.6kB            

  ...S-LoRA/checkpoint-4500/scheduler.pt: 100%|##########| 1.47kB / 1.47kB            

  ...A/checkpoint-4500/training_args.bin: 100%|##########| 5.78kB / 5.78kB            

  ...-LoRA/checkpoint-5000/rng_state.pth: 100%|##########| 14.6kB / 14.6kB            

  ...S-LoRA/checkpoint-5000/scheduler.pt: 100%|##########| 1.47kB / 1.47kB            

  ...A/checkpoint-5000/training_args.bin: 100%|##########| 5.78kB / 5.78kB            

  ...-TTS-LoRA/adapter_model.safetensors: 100%|##########| 12.8MB / 12.8MB            

  ...oint-5000/adapter_model.safetensors: 100%|##########| 12.8MB / 12.8MB            

  ...S-LoRA/checkpoint-4500/optimizer.pt: 100%|##########| 25.8MB / 25.8MB            

  ...S-LoRA/checkpoint-5000/optimizer.pt: 100%|##########| 25.8MB / 25.8MB            

✅ Upload completed successfully!
🔗 https://huggingface.co/pnnbao-ump/VieNeu-TTS-0.3B-lora-ngoc-huyen
