<a href="https://colab.research.google.com/github/vifirsanova/phat-llm/blob/main/model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

0. **Download and Prepare the Data:**
   - Transcribe a set of audio recordings with OpenAI Whisper
   - Annotate the audio files with IPA transcriptions using GPT-4
   - Use speech samples annotated in Praat and ELAN
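ELAN stores its annotations in `.eaf` files, which are plain XML: a `TIME_ORDER` block maps each `TIME_SLOT_ID` to a millisecond `TIME_VALUE`, and each `TIER` holds `ALIGNABLE_ANNOTATION` elements that point at two time slots. The sketch below assumes that layout and uses a hypothetical helper, `read_eaf_segments`, to pull `(start_ms, end_ms, text)` segments out of one tier with the standard library so they can later be paired with audio clips. For a file on disk, `ET.parse(path).getroot()` would replace `ET.fromstring`.

```python
import xml.etree.ElementTree as ET

def read_eaf_segments(eaf_xml, tier_id):
    """Return sorted [(start_ms, end_ms, text), ...] for one ELAN tier (hypothetical helper)."""
    root = ET.fromstring(eaf_xml)
    # Map time-slot ids to milliseconds; slots without TIME_VALUE are unaligned and skipped
    slots = {
        ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
        for ts in root.iter("TIME_SLOT")
        if ts.get("TIME_VALUE") is not None
    }
    segments = []
    for tier in root.iter("TIER"):
        if tier.get("TIER_ID") != tier_id:
            continue
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            if start is None or end is None:
                continue  # annotation anchored to an unaligned slot
            text = ann.findtext("ANNOTATION_VALUE", default="").strip()
            segments.append((start, end, text))
    return sorted(segments)

example = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1250"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="2900"/>
  </TIME_ORDER>
  <TIER TIER_ID="IPA">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>həˈləʊ</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts2" TIME_SLOT_REF2="ts3">
        <ANNOTATION_VALUE>wɜːld</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""

print(read_eaf_segments(example, "IPA"))
# [(0, 1250, 'həˈləʊ'), (1250, 2900, 'wɜːld')]
```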

In [None]:
# accelerate is required by the Trainer; librosa is used to load audio files for inference
!pip install transformers datasets accelerate librosa praatio pydub
!pip install git+https://github.com/huggingface/peft.git
from google.colab import drive
drive.mount('/content/drive')
!git clone https://github.com/vifirsanova/phat-llm.git
%cd phat-llm

1. **Load the Pre-trained Model:**

In [None]:
from transformers import WhisperForConditionalGeneration, WhisperProcessor

model_name = "openai/whisper-base"
model = WhisperForConditionalGeneration.from_pretrained(model_name)
processor = WhisperProcessor.from_pretrained(model_name)
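`WhisperProcessor` turns raw audio into a fixed-size log-Mel spectrogram. Assuming the Whisper feature extractor defaults (16 kHz input, a 30-second window, a hop length of 160 samples, and 80 Mel bins for `whisper-base`), every clip is padded or truncated to 30 s before featurization. The arithmetic below reproduces that; `pad_or_trim` and `feature_shape` are illustrative helpers, not `transformers` functions.

```python
SAMPLING_RATE = 16_000   # Hz, the rate Whisper expects
CHUNK_SECONDS = 30       # Whisper's fixed input window
HOP_LENGTH = 160         # samples between successive STFT frames (10 ms at 16 kHz)
N_MELS = 80              # Mel bins used by whisper-base

def pad_or_trim(samples, length=SAMPLING_RATE * CHUNK_SECONDS):
    """Zero-pad or cut a list of samples to exactly `length` (illustrative helper)."""
    if len(samples) >= length:
        return samples[:length]
    return samples + [0.0] * (length - len(samples))

def feature_shape(n_samples, hop_length=HOP_LENGTH):
    """Shape (n_mels, n_frames) of the log-Mel input for a padded clip."""
    return (N_MELS, n_samples // hop_length)

clip = [0.1] * (SAMPLING_RATE * 5)   # a hypothetical 5-second clip
padded = pad_or_trim(clip)
print(len(padded))                   # 480000 samples = 30 s
print(feature_shape(len(padded)))    # (80, 3000)
```

Because every input is stretched to 30 s, short clips carry mostly padding, while anything beyond 30 s is silently dropped.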

2. **Add LoRA Adapters:**

In [None]:
from peft import LoraConfig, get_peft_model

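lora_config = LoraConfig(
    r=16,  # Rank of the low-rank update matrices A and B
    lora_alpha=32,  # Scaling factor; the update is multiplied by lora_alpha / r
    lora_dropout=0.1,  # Dropout applied to the input of the LoRA branch
    target_modules=["q_proj", "v_proj"],  # Whisper attention projections that receive adapters
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA weights should be trainable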
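LoRA leaves a frozen weight matrix W (d_out × d_in) untouched and learns a low-rank update, so the adapted layer computes h = W x + (lora_alpha / r) · B A x, where A is r × d_in and B is d_out × r. B starts at zero, so the adapted model initially behaves exactly like the pre-trained one, and each adapted matrix adds only r · (d_in + d_out) trainable parameters. The pure-Python sketch below illustrates both points on toy matrices; it is not how `peft` implements the layer internally, and the 512 hidden size for `whisper-base` is an assumption.

```python
def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, r, lora_alpha):
    """h = W x + (lora_alpha / r) * B (A x); W stays frozen, only A and B are trained."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = lora_alpha / r
    return [b + scale * u for b, u in zip(base, update)]

def lora_trainable_params(d_in, d_out, r):
    """A is r x d_in and B is d_out x r."""
    return r * (d_in + d_out)

# Toy example: d_in = d_out = 3, rank r = 1
W = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]]
A = [[0.5, 0.5, 0.5]]             # r x d_in, randomly initialised in practice
B_zero = [[0.0], [0.0], [0.0]]    # d_out x r, zero at initialisation
x = [1.0, 1.0, 1.0]

print(lora_forward(W, A, B_zero, x, r=1, lora_alpha=2))  # [1.0, 2.0, 3.0], identical to W x

B_trained = [[0.1], [0.0], [-0.1]]  # what B might look like after some training
print(lora_forward(W, A, B_trained, x, r=1, lora_alpha=2))

# One 512 x 512 projection (assumed whisper-base hidden size) with r=16
print(512 * 512, lora_trainable_params(512, 512, r=16))  # 262144 16384
```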

3. **Prepare the Training Data:**

In [None]:
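from datasets import load_dataset, Audio

dataset = load_dataset('path/to/your/dataset')
# Decode and resample every clip to the 16 kHz rate Whisper expects
dataset = dataset.cast_column("audio", Audio(sampling_rate=16000))

def preprocess_function(examples):
    # Each "audio" entry is a dict holding the decoded "array" and its "sampling_rate"
    arrays = [audio["array"] for audio in examples["audio"]]
    input_features = processor.feature_extractor(arrays, sampling_rate=16000)["input_features"]
    # Tokenize the target transcripts; padding is left to the data collator
    labels = processor.tokenizer(examples["text"])["input_ids"]
    return {"input_features": input_features, "labels": labels}

# Preprocess every split (train and validation) and drop the raw columns
dataset = dataset.map(preprocess_function, batched=True, remove_columns=dataset["train"].column_names)
train_dataset = dataset["train"]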
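Whisper sees at most 30 seconds of audio per example, so longer annotated recordings (for instance the Praat or ELAN segments from step 0) need to be cut into training clips before building the dataset loaded above. The sketch below uses a hypothetical `group_segments` helper that greedily merges consecutive `(start_ms, end_ms, text)` segments into clips no longer than 30 s. A merged clip spans any pauses between its segments, and a single segment longer than the limit is kept as its own clip (the feature extractor would still truncate it).

```python
MAX_CLIP_MS = 30_000  # Whisper's 30-second input window

def group_segments(segments, max_ms=MAX_CLIP_MS):
    """Greedily merge consecutive (start_ms, end_ms, text) segments into clips of at most max_ms."""
    clips = []
    current = None  # [start_ms, end_ms, [texts]] of the clip being built
    for start, end, text in sorted(segments):
        if current is not None and end - current[0] <= max_ms:
            # Extending the clip keeps it within the limit (pauses included)
            current[1] = end
            current[2].append(text)
        else:
            if current is not None:
                clips.append((current[0], current[1], " ".join(current[2])))
            current = [start, end, [text]]
    if current is not None:
        clips.append((current[0], current[1], " ".join(current[2])))
    return clips

segments = [(0, 12_000, "a"), (12_500, 25_000, "b"), (26_000, 40_000, "c"), (41_000, 45_000, "d")]
print(group_segments(segments))
# [(0, 25000, 'a b'), (26000, 45000, 'c d')]
```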

4. **Train the Model:**

In [None]:
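from dataclasses import dataclass
from typing import Any

from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

@dataclass
class DataCollatorSpeechSeq2SeqWithPadding:
    # Pads audio features and label ids separately; the processor itself is not a collator
    processor: Any

    def __call__(self, features):
        input_features = [{"input_features": f["input_features"]} for f in features]
        batch = self.processor.feature_extractor.pad(input_features, return_tensors="pt")
        # Pad labels, then set padding to -100 so the loss ignores those positions
        label_features = [{"input_ids": f["labels"]} for f in features]
        labels_batch = self.processor.tokenizer.pad(label_features, return_tensors="pt")
        labels = labels_batch["input_ids"].masked_fill(labels_batch["attention_mask"].ne(1), -100)
        # Whisper prepends <|startoftranscript|> itself, so drop it if the tokenizer already added it
        start_id = self.processor.tokenizer.convert_tokens_to_ids("<|startoftranscript|>")
        if (labels[:, 0] == start_id).all():
            labels = labels[:, 1:]
        batch["labels"] = labels
        return batch

training_args = Seq2SeqTrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    eval_strategy="epoch",  # called `evaluation_strategy` in older transformers releases
    save_strategy="epoch",
    save_total_limit=2,
    logging_dir="./logs",
    logging_steps=10,
    learning_rate=5e-5,
    fp16=True,  # requires a CUDA GPU runtime
    remove_unused_columns=False,  # PeftModel.forward hides Whisper's argument names
    label_names=["labels"],  # needed for the same reason
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=dataset["validation"],
    data_collator=DataCollatorSpeechSeq2SeqWithPadding(processor),
)

trainer.train()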
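The data collator has to turn label sequences of different lengths into one rectangular batch. The cross-entropy loss used by `transformers` models skips label positions equal to -100, so padding is written as -100 rather than as a real token id, and a leading start token can be dropped when the model adds its own. The pure-Python sketch below mirrors those two steps; the function names and token ids are made up for illustration.

```python
IGNORE_INDEX = -100  # label value skipped by the cross-entropy loss

def pad_labels(label_seqs, pad_value=IGNORE_INDEX):
    """Right-pad variable-length lists of label ids to the length of the longest one."""
    max_len = max(len(seq) for seq in label_seqs)
    return [seq + [pad_value] * (max_len - len(seq)) for seq in label_seqs]

def drop_leading_token(batch, token_id):
    """Drop the first column when every row starts with token_id."""
    if all(row and row[0] == token_id for row in batch):
        return [row[1:] for row in batch]
    return batch

# Made-up ids: 1 = start token, 2 = end token, 11-13 = text tokens
batch = pad_labels([[1, 11, 12, 2], [1, 13, 2]])
print(batch)                                   # [[1, 11, 12, 2], [1, 13, 2, -100]]
print(drop_leading_token(batch, token_id=1))   # [[11, 12, 2], [13, 2, -100]]
print(sum(tok != IGNORE_INDEX for row in batch for tok in row))  # 7 supervised positions
```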

5. **Evaluate the Model:**

In [None]:
eval_results = trainer.evaluate()
print(eval_results)
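Without a `compute_metrics` function (none is defined in this notebook), `trainer.evaluate()` reports only the validation loss and timing statistics. Speech recognition is usually scored by word error rate (WER): the word-level edit distance (substitutions + deletions + insertions) between hypothesis and reference, divided by the number of reference words. The dependency-free sketch below assumes whitespace tokenization; the character-level variant (CER) may be more informative for IPA strings.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists (substitutions, deletions, insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (free when tokens match)
            )
        prev = curr
    return prev[-1]

def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def character_error_rate(reference, hypothesis):
    """CER = character-level edit distance / number of reference characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)

print(round(word_error_rate("the cat sat on the mat", "the cat sit on mat"), 3))  # 0.333
print(round(character_error_rate("həˈləʊ", "hɛˈləʊ"), 3))                         # 0.167
```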

6. **Inference:**

In [None]:
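import librosa
import torch

def transcribe_audio(audio_path):
    # Load the file as a 16 kHz mono waveform; the processor expects samples, not a file path
    speech, _ = librosa.load(audio_path, sr=16000, mono=True)
    inputs = processor.feature_extractor(speech, sampling_rate=16000, return_tensors="pt")
    model.eval()
    with torch.no_grad():
        # Whisper's encoder consumes log-Mel `input_features`, not `input_ids`
        generated_ids = model.generate(input_features=inputs["input_features"].to(model.device))
    # batch_decode returns one string per input; return the single transcription
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

audio_path = '/path/to/your/audio/file.wav'
transcription = transcribe_audio(audio_path)
print(transcription)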
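The Whisper feature extractor keeps only the first 30 seconds of whatever it is given, so a single call on a long recording transcribes just its beginning. One option is to split the waveform into consecutive 30-second windows and transcribe each one. The sketch below shows only the splitting logic on a plain list of samples; `transcribe_chunk` is a hypothetical stand-in for a variant of `transcribe_audio` that accepts an array, and naive cuts can split words at window boundaries. The `transformers` ASR pipeline's `chunk_length_s` option is an alternative that handles overlapping chunks.

```python
SAMPLING_RATE = 16_000
WINDOW_SECONDS = 30

def split_into_windows(samples, sampling_rate=SAMPLING_RATE, window_seconds=WINDOW_SECONDS):
    """Yield (start_seconds, chunk) pairs covering the whole signal in fixed-size windows."""
    window = sampling_rate * window_seconds
    for start in range(0, len(samples), window):
        yield start / sampling_rate, samples[start:start + window]

def transcribe_long(samples, transcribe_chunk):
    """Transcribe each window with `transcribe_chunk` (hypothetical) and join the non-empty results."""
    pieces = [transcribe_chunk(chunk) for _, chunk in split_into_windows(samples)]
    return " ".join(p.strip() for p in pieces if p.strip())

# Usage with a dummy transcriber on 70 s of silence
audio = [0.0] * (SAMPLING_RATE * 70)
windows = list(split_into_windows(audio))
print([(start, len(chunk)) for start, chunk in windows])
# [(0.0, 480000), (30.0, 480000), (60.0, 160000)]
print(transcribe_long(audio, lambda chunk: f"<{len(chunk) // SAMPLING_RATE}s>"))
# <30s> <30s> <10s>
```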

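7. **Convert to XML through Prompt-Tuning:**

The cell below is a placeholder that wraps the transcription in a single XML element; it does not itself perform prompt-tuning.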

In [None]:
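import xml.etree.ElementTree as ET

def convert_to_xml(transcription, output_path):
    # Placeholder conversion: wrap the whole transcription in one <transcription> element
    if isinstance(transcription, list):  # batch_decode returns a list of strings
        transcription = " ".join(transcription)
    root = ET.Element("transcription")
    root.text = transcription  # ElementTree escapes "<", "&", etc. when writing
    # UTF-8 keeps IPA symbols intact
    ET.ElementTree(root).write(output_path, encoding="utf-8", xml_declaration=True)

output_path = '/path/to/your/output/file.xml'
convert_to_xml(transcription, output_path)
print(f'XML saved to {output_path}')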
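A single `<transcription>` element discards timing information. When segment timestamps are available (for example from the ELAN or Praat annotations in step 0), a slightly richer layout can keep them alongside the orthographic and IPA text. The element and attribute names below (`segment`, `start`, `end`, `text`, `ipa`) are a hypothetical schema chosen for illustration, not a format defined by this project.

```python
import xml.etree.ElementTree as ET

def segments_to_xml(segments):
    """Build an XML string from (start_ms, end_ms, text, ipa) tuples (hypothetical schema)."""
    root = ET.Element("transcription")
    for start, end, text, ipa in segments:
        # Times are stored in seconds with millisecond precision
        seg = ET.SubElement(root, "segment", start=f"{start / 1000:.3f}", end=f"{end / 1000:.3f}")
        ET.SubElement(seg, "text").text = text
        ET.SubElement(seg, "ipa").text = ipa
    return ET.tostring(root, encoding="unicode")

xml_string = segments_to_xml([(0, 1250, "hello", "həˈləʊ"), (1250, 2900, "world", "wɜːld")])
print(xml_string)
```

Building the tree with `ElementTree` rather than f-strings means special characters in transcripts are escaped automatically.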