#### SenseVoice non-real-time (offline) speech recognition

In [None]:
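from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess

model_dir = "iic/SenseVoiceSmall"

# SenseVoiceSmall plus the FSMN VAD model, which splits long audio into segments
model = AutoModel(
    model=model_dir,
    vad_model="fsmn-vad",
    vad_kwargs={"max_single_segment_time": 30000},  # max VAD segment length in ms (30 s)
    device="cpu",  # "cpu" or "cuda:0"
)

res = model.generate(
    input=f"{model.model_path}/example/zh.mp3",
    cache={},
    language="auto",  # auto-detect, or one of "zh", "en", "yue", "ja", "ko", "nospeech"
    use_itn=True,  # inverse text normalization: punctuation, numbers written as digits
    batch_size_s=60,  # dynamic batching: total seconds of audio per batch
    merge_vad=True,  # merge the short segments produced by the VAD model
    merge_length_s=15,  # target length of merged segments, in seconds
)
# Convert SenseVoice's special tags (language, emotion, event) into plain text
text = rich_transcription_postprocess(res[0]["text"])
print(text)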

funasr version: 1.2.6.
rtf_avg: 0.063, time_speech: 5.616, time_escape: 0.353

开放时间早上9点至下午5点。
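`rich_transcription_postprocess` turns SenseVoice's raw output into readable text. The raw string is assumed here to carry special tags such as `<|zh|><|NEUTRAL|><|Speech|><|withitn|>` (language, emotion, audio event, ITN) ahead of the transcript. As a minimal sketch of the idea, the hypothetical helper `strip_sensevoice_tags` below deletes every `<|...|>` tag with a regular expression. The library function may do more (for example, rendering emotion and event tags as emoji), so treat this as an approximation rather than its implementation.

```python
import re

# Hypothetical, simplified stand-in for rich_transcription_postprocess:
# it only removes "<|...|>" tags and does not map emotion/event tags to emoji.
_TAG_PATTERN = re.compile(r"<\|[^|]*\|>")


def strip_sensevoice_tags(raw_text: str) -> str:
    """Remove every <|...|> tag and trim surrounding whitespace."""
    return _TAG_PATTERN.sub("", raw_text).strip()


# Assumed raw output shape: language, emotion, event and ITN tags before the text.
raw = "<|zh|><|NEUTRAL|><|Speech|><|withitn|>开放时间早上9点至下午5点。"
print(strip_sensevoice_tags(raw))  # 开放时间早上9点至下午5点。
```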





In [5]:
text

'开放时间早上9点至下午5点。'
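`merge_vad=True` with `merge_length_s=15` asks FunASR to combine the short segments produced by the VAD model before recognition. The greedy rule below only illustrates that idea under assumed semantics (keep extending the current segment while its total span stays within the limit). `merge_segments` is a hypothetical helper, not funasr's code, and the segment times are made up for the example.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # (start_ms, end_ms)


def merge_segments(segments: List[Segment], merge_length_ms: int) -> List[Segment]:
    """Greedily merge adjacent VAD segments while the merged span stays
    within merge_length_ms. Illustration only, not funasr's implementation."""
    merged: List[Segment] = []
    for start, end in sorted(segments):
        # Extend the last merged segment if the result is still short enough
        if merged and end - merged[-1][0] <= merge_length_ms:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged


# Made-up VAD output in milliseconds; 15000 ms mirrors merge_length_s=15
segs = [(0, 4000), (4500, 9000), (9500, 16000), (17000, 20000)]
print(merge_segments(segs, 15000))  # [(0, 9000), (9500, 20000)]
```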

#### Paraformer real-time (streaming) speech recognition

In [6]:
import os
import soundfile
from funasr import AutoModel

chunk_size = [0, 10, 5]  # [0, 10, 5] 600ms, [0, 8, 4] 480ms
encoder_chunk_look_back = 4  # number of chunks to look back for encoder self-attention
decoder_chunk_look_back = 1  # number of encoder chunks to look back for decoder cross-attention

model = AutoModel(model="paraformer-zh-streaming")

wav_file = os.path.join(model.model_path, "example/asr_example.wav")
speech, sample_rate = soundfile.read(wav_file)
chunk_stride = chunk_size[1] * 960  # 600ms: 10 x 960 samples at 16 kHz

cache = {}  # carries the streaming state from one chunk to the next
# Number of chunks = ceil(len(speech) / chunk_stride)
total_chunk_num = int((len(speech) - 1) / chunk_stride + 1)
for i in range(total_chunk_num):
    speech_chunk = speech[i * chunk_stride : (i + 1) * chunk_stride]
    is_final = i == total_chunk_num - 1  # flush the remaining state on the last chunk
    res = model.generate(
        input=speech_chunk,
        cache=cache,
        is_final=is_final,
        chunk_size=chunk_size,
        encoder_chunk_look_back=encoder_chunk_look_back,
        decoder_chunk_look_back=decoder_chunk_look_back,
    )
    print(res)

[{'key': 'rand_key_2yW4Acq9GFz6Y', 'text': ''}]


[{'key': 'rand_key_1t9EwL56nGisi', 'text': ''}]


[{'key': 'rand_key_WgNZq6ITZM5jt', 'text': '欢迎大'}]


[{'key': 'rand_key_gUe52RvEJgwBu', 'text': '家来'}]


[{'key': 'rand_key_NO6n9JEC3HqdZ', 'text': '体验达'}]


[{'key': 'rand_key_6J6afU1zT0YQO', 'text': '摩院推'}]


[{'key': 'rand_key_aNF03vpUuT3em', 'text': '出的语'}]


[{'key': 'rand_key_6KopZ9jZICffu', 'text': '音识'}]


[{'key': 'rand_key_4G7FgtJsThJv0', 'text': '别模型'}]


[{'key': 'rand_key_7In9ZMJLsCfMZ', 'text': ''}]
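The chunk arithmetic in the streaming loop can be checked in isolation. The loop multiplies `chunk_size[1]` by 960, which at an assumed 16 kHz sample rate is 60 ms per unit, so `[0, 10, 5]` gives 9600 samples (600 ms) and `[0, 8, 4]` gives 7680 samples (480 ms). The sketch below, built on the hypothetical helpers `chunk_stride` and `iter_chunks`, reproduces that slicing and the `is_final` flag. It counts chunks as `ceil(len(speech) / stride)`, so a signal whose length is an exact multiple of the stride does not produce an extra empty chunk.

```python
import math

import numpy as np

SAMPLE_RATE = 16000     # assumed: paraformer-zh-streaming works on 16 kHz audio
SAMPLES_PER_UNIT = 960  # 60 ms at 16 kHz, the multiplier used in the loop


def chunk_stride(chunk_size):
    """Samples per streaming step: chunk_size[1] units of 60 ms each."""
    return chunk_size[1] * SAMPLES_PER_UNIT


def iter_chunks(speech: np.ndarray, stride: int):
    """Yield (chunk, is_final) pairs that together cover the whole signal."""
    total = math.ceil(len(speech) / stride)
    for i in range(total):
        yield speech[i * stride:(i + 1) * stride], i == total - 1


stride = chunk_stride([0, 10, 5])
print(stride, stride * 1000 / SAMPLE_RATE)  # 9600 600.0
```

Each yielded pair maps onto the `input=` and `is_final=` arguments of `model.generate` in the loop above.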




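The printed results suggest that each `model.generate` call returns only the text newly decoded for that chunk (empty strings while the model is still buffering audio), so the full transcript is the concatenation of the per-chunk `text` fields. A minimal sketch, using the hypothetical helper `assemble_transcript` and the chunk texts printed above (the `key` fields are omitted):

```python
def assemble_transcript(partial_results):
    """Concatenate the 'text' field of each per-chunk result list."""
    # Each element mirrors one printed `res`: a list holding a single dict
    return "".join(res[0]["text"] for res in partial_results if res)


texts = ["", "", "欢迎大", "家来", "体验达", "摩院推", "出的语", "音识", "别模型", ""]
results = [[{"text": t}] for t in texts]
print(assemble_transcript(results))  # 欢迎大家来体验达摩院推出的语音识别模型
```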