# Test Speech-to-Text Tools

This notebook tests the tools created for the speech-to-text agent:
- Legacy tool: `SpeechToTextTool`
- LangGraph tool: `LangGraphSpeechToTextTool`

## 1. Import Libraries

In [1]:
import sys
from pathlib import Path

# Add project root to path
project_root = Path.cwd().parents[2]
sys.path.insert(0, str(project_root))

print(f"Project root: {project_root}")

Project root: c:\Users\lammi\Downloads\medscreening
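
`Path.cwd().parents[2]` only resolves correctly when the notebook is launched from one specific directory depth. As a hedged alternative, the sketch below walks upward until it finds a marker directory. The helper name `find_project_root` and the choice of `agents` as the marker are assumptions for illustration (the marker is inferred from the `agents.speech_to_text_process.tools` import used later), not part of the project.

```python
from pathlib import Path
from typing import Optional


def find_project_root(start: Path, marker: str = "agents") -> Optional[Path]:
    """Return the nearest ancestor of `start` (inclusive) that contains `marker`.

    `marker` is a hypothetical choice: any file or directory that only
    exists at the project root would work.
    """
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).exists():
            return candidate
    return None  # No ancestor contains the marker.
```

With this helper, `project_root = find_project_root(Path.cwd())` would replace the hard-coded `parents[2]`, and a `None` result signals that the notebook was started outside the project tree.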


## 2. Import Tools

In [2]:
from agents.speech_to_text_process.tools import (
    SpeechToTextTool,
    LangGraphSpeechToTextTool,
    speech_to_text_tool,
    langgraph_speech_to_text,
    ALL_TOOLS,
    LANGGRAPH_TOOLS
)

print(" Imports successful!")
print(f"Legacy tools count: {len(ALL_TOOLS)}")
print(f"LangGraph tools count: {len(LANGGRAPH_TOOLS)}")

 Imports successful!
Legacy tools count: 1
LangGraph tools count: 1


## 3. Setup Audio Path

Use the same test audio file as in `model_test.ipynb`.

In [3]:
import os

audio_path = "../model/medasr/test_audio.wav"

if os.path.exists(audio_path):
    print(f" Audio file found: {audio_path}")
else:
    print(f" Audio file not found: {audio_path}")
    print("Please ensure the audio file exists before running the test.")

 Audio file found: ../model/medasr/test_audio.wav
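
Checking that the file exists does not confirm that it is in a suitable format. The tool's input schema (shown near the end of this notebook) describes the input as "WAV format, 16kHz recommended", so a quick header check can be useful. The sketch below uses only the standard-library `wave` module, which handles uncompressed PCM WAV files; the helper name `describe_wav` is hypothetical.

```python
import wave


def describe_wav(path: str) -> dict:
    """Read basic header fields from a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_rate_hz": rate,
            # Duration is the frame count divided by frames per second.
            "duration_s": frames / rate if rate else 0.0,
        }
```

Calling `describe_wav(audio_path)` and comparing `sample_rate_hz` against 16000 would flag files that may need resampling before transcription.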


## 4. Test Legacy Tool

Test `SpeechToTextTool` with its `execute()` method.

In [4]:
# Create tool instance with language model path
lm_path = "../model/medasr/lm_6.kenlm"

legacy_tool = SpeechToTextTool(
    model_id="google/medasr",
    lm_path=lm_path if os.path.exists(lm_path) else None
)

print(f"Tool name: {legacy_tool.name}")
print(f"Tool description: {legacy_tool.description}")

Tool name: speech_to_text
Tool description: Convert medical audio recordings to text using MedASR


In [5]:
# Run transcription with the legacy tool
print("Starting transcription with legacy tool...\n")

result = legacy_tool.execute(
    audio_path=audio_path,
    chunk_length_s=20,
    stride_length_s=2,
    beam_width=8
)

print("=" * 80)
print("TRANSCRIPTION RESULT (Legacy Tool):")
print("=" * 80)
print(result)
print("=" * 80)

Starting transcription with legacy tool...



Unigrams not provided and cannot be automatically determined from LM file (only arpa format). Decoding accuracy might be reduced.
No known unigrams provided, decoding results might be a lot worse.
Loading weights: 100%|██████████| 368/368 [00:00<00:00, 1164.46it/s, Materializing param=encoder.subsampler.dense_1.weight]              

TRANSCRIPTION RESULT (Legacy Tool):
[EXAM TYPE] CT chest PE protocol {period} [INDICATION] 54-year-old female, shortness of breath, evaluate for PE {period} [TECHNIQUE] Standard protocol {period} [FINDINGS] {colon} Pulmonary vasculature {colon} The main PA is patent {period} There are filling defects in the segmental branches of the right lower lobe {comma} compatible with acute PE {period} No saddle embolus {period} Lungs {colon} No pneumothorax {period} Small bilateral effusions {comma} right greater than left {period} {new paragraph} [IMPRESSION] {colon} Acute segmental PE, right lower lobe {period}</s>
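
The raw transcript carries dictation tokens such as `{period}`, `{comma}`, `{colon}`, and `{new paragraph}`, plus a trailing `</s>` marker. A minimal post-processing sketch follows. The token-to-punctuation mapping is an assumption inferred only from the tokens visible in the output above (the model may emit others), and `render_dictation` is a hypothetical helper, not part of the tools under test.

```python
import re

# Mapping inferred from the tokens seen in this notebook's output only.
PUNCTUATION_TOKENS = {
    "{period}": ".",
    "{comma}": ",",
    "{colon}": ":",
}


def render_dictation(raw: str) -> str:
    """Replace spoken-punctuation tokens with punctuation marks."""
    text = raw.replace("</s>", "")  # Drop the end-of-sequence marker.
    for token, mark in PUNCTUATION_TOKENS.items():
        # Consume whitespace before the token so the mark attaches to the word.
        text = re.sub(r"\s*" + re.escape(token), mark, text)
    # Turn the paragraph token into a blank line, trimming spaces around it.
    text = re.sub(r"\s*\{new paragraph\}\s*", "\n\n", text)
    return text.strip()
```

Applied to `result`, this would turn `No saddle embolus {period} Lungs {colon}` into `No saddle embolus. Lungs:`. Section labels such as `[FINDINGS]` are left untouched.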


## 5. Test LangGraph Tool

Test `LangGraphSpeechToTextTool` with its `_run()` method.

In [6]:
# Create LangGraph tool instance
langgraph_tool = LangGraphSpeechToTextTool(
    model_id="google/medasr",
    lm_path=lm_path if os.path.exists(lm_path) else None
)

print(f"Tool name: {langgraph_tool.name}")
print(f"Tool description: {langgraph_tool.description[:100]}...")
print(f"Args schema: {langgraph_tool.args_schema.__name__}")

Tool name: speech_to_text
Tool description: Convert medical audio recordings to text transcription. Use this tool when you need to transcribe do...
Args schema: SpeechToTextInput


In [7]:
# Run transcription with the LangGraph tool
print("Starting transcription with LangGraph tool...\n")

result_langgraph = langgraph_tool._run(
    audio_path=audio_path,
    chunk_length_s=20,
    stride_length_s=2,
    beam_width=8
)

print("=" * 80)
print("TRANSCRIPTION RESULT (LangGraph Tool):")
print("=" * 80)
print(result_langgraph)
print("=" * 80)

Starting transcription with LangGraph tool...



Unigrams not provided and cannot be automatically determined from LM file (only arpa format). Decoding accuracy might be reduced.
No known unigrams provided, decoding results might be a lot worse.
Loading weights: 100%|██████████| 368/368 [00:00<00:00, 1125.95it/s, Materializing param=encoder.subsampler.dense_1.weight]              


TRANSCRIPTION RESULT (LangGraph Tool):
[EXAM TYPE] CT chest PE protocol {period} [INDICATION] 54-year-old female, shortness of breath, evaluate for PE {period} [TECHNIQUE] Standard protocol {period} [FINDINGS] {colon} Pulmonary vasculature {colon} The main PA is patent {period} There are filling defects in the segmental branches of the right lower lobe {comma} compatible with acute PE {period} No saddle embolus {period} Lungs {colon} No pneumothorax {period} Small bilateral effusions {comma} right greater than left {period} {new paragraph} [IMPRESSION] {colon} Acute segmental PE, right lower lobe {period}</s>


## 6. Test Pre-initialized Tools


In [8]:
# Test with pre-initialized legacy tool
print("Testing with pre-initialized speech_to_text_tool...\n")

# Note: the pre-initialized tool is not configured with a language model
result_preinitialized = speech_to_text_tool.execute(
    audio_path=audio_path,
    chunk_length_s=20,
    stride_length_s=2,
    beam_width=8
)

print("=" * 80)
print("TRANSCRIPTION RESULT (Pre-initialized Tool):")
print("=" * 80)
print(result_preinitialized)
print("=" * 80)

Testing with pre-initialized speech_to_text_tool...



Unigrams not provided and cannot be automatically determined from LM file (only arpa format). Decoding accuracy might be reduced.
No known unigrams provided, decoding results might be a lot worse.
Loading weights: 100%|██████████| 368/368 [00:00<00:00, 1171.60it/s, Materializing param=encoder.subsampler.dense_1.weight]              


TRANSCRIPTION RESULT (Pre-initialized Tool):
[EXAM TYPE] CT chest PE protocol {period} [INDICATION] 54-year-old female, shortness of breath, evaluate for PE {period} [TECHNIQUE] Standard protocol {period} [FINDINGS] {colon} Pulmonary vasculature {colon} The main PA is patent {period} There are filling defects in the segmental branches of the right lower lobe {comma} compatible with acute PE {period} No saddle embolus {period} Lungs {colon} No pneumothorax {period} Small bilateral effusions {comma} right greater than left {period} {new paragraph} [IMPRESSION] {colon} Acute segmental PE, right lower lobe {period}</s>


## 7. Compare Results

In [9]:
print("COMPARISON:")
print(f"Legacy tool result length: {len(result)} characters")
print(f"LangGraph tool result length: {len(result_langgraph)} characters")
print(f"Pre-initialized tool result length: {len(result_preinitialized)} characters")
print()
print(f"Results are identical (Legacy vs LangGraph): {result == result_langgraph}")

COMPARISON:
Legacy tool result length: 577 characters
LangGraph tool result length: 577 characters
Pre-initialized tool result length: 577 characters

Results are identical (Legacy vs LangGraph): True
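
The cell above checks exact equality only for the legacy and LangGraph results; the pre-initialized result is compared by length alone. A small sketch for comparing every pair is shown here. `pairwise_identical` is a hypothetical helper, and the labels passed to it are arbitrary.

```python
from itertools import combinations


def pairwise_identical(results: dict) -> dict:
    """Map each pair of labels to whether their transcripts are exactly equal."""
    return {
        (first, second): results[first] == results[second]
        for first, second in combinations(results, 2)
    }
```

In this notebook it could be called as `pairwise_identical({"legacy": result, "langgraph": result_langgraph, "preinitialized": result_preinitialized})`.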


## 8. Test Error Handling

Check how the tool handles errors.

In [None]:
print("Testing with non-existent file...\n")

error_result = legacy_tool.execute(
    audio_path="non_existent_file.wav"
)

print("Error handling result:")
print(error_result)

Tool speech_to_text error: Audio file not found: non_existent_file.wav


Testing with non-existent file...

Error handling result:
Error: Unable to retrieve information. Audio file not found: non_existent_file.wav
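
In the run above, the tool reports the failure by returning a string rather than raising an exception, so callers need to inspect the return value. The sketch below assumes that every failure is reported with the `Error:` prefix seen in this single example; other failure modes may be formatted differently, and `is_error_result` is a hypothetical helper.

```python
# Prefix inferred from the one error message observed in this notebook.
ERROR_PREFIX = "Error:"


def is_error_result(result: str) -> bool:
    """Return True if a tool result looks like an error message."""
    return result.lstrip().startswith(ERROR_PREFIX)
```

A caller could then branch on `is_error_result(error_result)` before passing the text on to later processing steps.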


## 9. Inspect Tool Schema (for LangGraph)

In [None]:
import json

schema = langgraph_tool.args_schema.model_json_schema()

print("Tool Input Schema:")
print(json.dumps(schema, indent=2))

Tool Input Schema:
{
  "description": "Input schema for speech-to-text tool.",
  "properties": {
    "audio_path": {
      "description": "Path to audio file (WAV format, 16kHz recommended)",
      "title": "Audio Path",
      "type": "string"
    },
    "chunk_length_s": {
      "default": 20,
      "description": "Chunk length in seconds for audio processing",
      "title": "Chunk Length S",
      "type": "integer"
    },
    "stride_length_s": {
      "default": 2,
      "description": "Stride length in seconds between chunks",
      "title": "Stride Length S",
      "type": "integer"
    },
    "beam_width": {
      "default": 8,
      "description": "Beam search width for decoding (higher = more accurate)",
      "title": "Beam Width",
      "type": "integer"
    }
  },
  "required": [
    "audio_path"
  ],
  "title": "SpeechToTextInput",
  "type": "object"
}
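
To illustrate what this schema implies for a caller, the sketch below fills in the declared defaults and rejects calls that omit a required field. It works on a trimmed copy of the schema printed above and uses only plain dictionaries. It does not reproduce how the tool validates arguments internally, and `apply_schema_defaults` is a hypothetical helper.

```python
# Trimmed copy of the schema printed above (descriptions and titles omitted).
SCHEMA = {
    "properties": {
        "audio_path": {"type": "string"},
        "chunk_length_s": {"default": 20, "type": "integer"},
        "stride_length_s": {"default": 2, "type": "integer"},
        "beam_width": {"default": 8, "type": "integer"},
    },
    "required": ["audio_path"],
}


def apply_schema_defaults(schema: dict, arguments: dict) -> dict:
    """Return `arguments` with schema defaults filled in for omitted fields."""
    missing = [name for name in schema.get("required", []) if name not in arguments]
    if missing:
        raise ValueError(f"Missing required argument(s): {', '.join(missing)}")
    resolved = {
        name: spec["default"]
        for name, spec in schema.get("properties", {}).items()
        if "default" in spec
    }
    resolved.update(arguments)  # Explicit arguments override defaults.
    return resolved
```

For example, `apply_schema_defaults(SCHEMA, {"audio_path": "a.wav", "beam_width": 4})` keeps the explicit `beam_width` and fills `chunk_length_s` and `stride_length_s` from the defaults.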


## Summary

**Tests Completed:**
1. Legacy tool (`SpeechToTextTool`) - Works with `execute()` method
2. LangGraph tool (`LangGraphSpeechToTextTool`) - Works with `_run()` method
3. Pre-initialized tools from `__init__.py`
4. Error handling validation
5. Schema inspection for LangGraph integration

In this run, the legacy and LangGraph tools returned identical transcripts (577 characters each), as expected given that they use the same underlying implementation.