xiaolongzai007/FunASR-APP

 
 


FunASR-APP

FunASR-APP is a speech application toolkit built on FunASR's open-source speech models. It packages those models into ready-to-use applications so that they are easy to run and to integrate into other systems.

What's New

  • 10/17 Fixed a bug where choosing multiple periods returned a video of the wrong length.
  • 10/10 ClipVideo now supports recognition with speaker diarization. Choose the 'yes' button under 'Recognize Speakers' to get recognition results with a speaker ID for each sentence. You can then clip out the periods of one or more speakers (e.g. 'spk0' or 'spk0#spk3') using ClipVideo.

ClipVideo

ClipVideo is the first application in FunASR-APP. It lets users clip .mp4 video files or .wav audio files by choosing text segments from the recognition results generated by the Paraformer-long model.

With ClipVideo you can get video clips in the following steps (in the Gradio service):

  • Step 1: Upload your video file (or try the example videos below)
  • Step 2: Copy the text segments you need into 'Text to Clip'
  • Step 3: Adjust the subtitle settings (if needed)
  • Step 4: Click 'Clip' or 'Clip and Generate Subtitles'

Usage

git clone https://github.com/alibaba-damo-academy/FunASR-APP.git
cd FunASR-APP
# install modelscope
pip install "modelscope[audio_asr]" -f https://modelscope.oss-cn-beijing.aliyuncs.com/releases/repo.html
# python environments
pip install -r ClipVideo/requirments.txt

(Optional) If you want to clip video files with embedded subtitles

  1. ffmpeg and imagemagick are required
  • On Ubuntu
apt-get -y update && apt-get -y install ffmpeg imagemagick
sed -i 's/none/read,write/g' /etc/ImageMagick-6/policy.xml
  • On MacOS
brew install ffmpeg imagemagick
# BSD sed on MacOS needs an explicit (empty) backup suffix after -i;
# the version in the path below depends on your installed imagemagick
sed -i '' 's/none/read,write/g' /usr/local/Cellar/imagemagick/7.1.1-8_1/etc/ImageMagick-7/policy.xml
  2. Download the font file to ClipVideo/font
wget https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ClipVideo/STHeitiMedium.ttc -O ClipVideo/font/STHeitiMedium.ttc
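
Before clipping with subtitles, it may help to confirm that the optional prerequisites are actually in place. The following is a minimal sketch, not part of ClipVideo: the helper names have_tool and check_subtitle_deps are hypothetical, and the script assumes it is run from the repository root so that the font path matches the wget command above.

```shell
# Sketch only: have_tool and check_subtitle_deps are hypothetical helpers.
# Assumes the current directory is the repository root (FunASR-APP/).

# Succeeds if the given command is available on PATH.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

# Prints one line per missing prerequisite; returns 1 if any is missing.
check_subtitle_deps() {
  missing=0
  have_tool ffmpeg || { echo "missing: ffmpeg"; missing=1; }
  # ImageMagick 6 installs `convert`; ImageMagick 7 installs `magick`.
  have_tool magick || have_tool convert || { echo "missing: imagemagick"; missing=1; }
  [ -f ClipVideo/font/STHeitiMedium.ttc ] || {
    echo "missing: ClipVideo/font/STHeitiMedium.ttc"; missing=1;
  }
  return "$missing"
}
```

Calling check_subtitle_deps prints nothing and returns 0 when ffmpeg, ImageMagick, and the font file are all found.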

Experience ClipVideo in Modelscope

You can try ClipVideo in the ModelScope space: link.

Use ClipVideo as Gradio Service

You can run your own ClipVideo service, identical to the one in the ModelScope space, as follows:

python clipvideo/gradio_service.py

Then visit localhost:7860 to open the Gradio service, where you can use ClipVideo by following the steps listed above.

Use ClipVideo from the command line

ClipVideo can also recognize and clip with commands:

# working in ClipVideo/
# step1: Recognize
python clipvideo/videoclipper.py --stage 1 \
                       --file examples/2022云栖大会_片段.mp4 \
                       --output_dir ./output
# now you can find recognition results and entire SRT file in ./output/
# step2: Clip
python clipvideo/videoclipper.py --stage 2 \
                       --file examples/2022云栖大会_片段.mp4 \
                       --output_dir ./output \
                       --dest_text '我们把它跟乡村振兴去结合起来,利用我们的设计的能力' \
                       --start_ost 0 \
                       --end_ost 100 \
                       --output_file './output/res.mp4'
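
The two stages above can be chained in a small wrapper so that clipping only runs if recognition succeeds. This is a sketch under stated assumptions: run_or_print and clip_by_text are hypothetical helper names, the flags are copied from the commands above, and the working directory is assumed to be ClipVideo/. By default it only prints the commands so they can be inspected first.

```shell
# Sketch only: run_or_print and clip_by_text are hypothetical helper names.
# Assumes the working directory is ClipVideo/, as in the commands above.

# Print the command when DRY_RUN is 1 (the default here); run it otherwise.
run_or_print() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf "'%s' " "$@"
    printf '\n'
  else
    "$@"
  fi
}

# Stage 1 (recognize), then stage 2 (clip) only if stage 1 succeeded.
# $1: input file, $2: text to clip, $3: output directory (default ./output)
clip_by_text() {
  file="$1"; text="$2"; out_dir="${3:-./output}"
  run_or_print python clipvideo/videoclipper.py --stage 1 \
      --file "$file" --output_dir "$out_dir" &&
  run_or_print python clipvideo/videoclipper.py --stage 2 \
      --file "$file" --output_dir "$out_dir" \
      --dest_text "$text" --start_ost 0 --end_ost 100 \
      --output_file "$out_dir/res.mp4"
}
```

To execute rather than print, set DRY_RUN=0 when calling the function, for example with examples/2022云栖大会_片段.mp4 as the first argument and the text to clip as the second.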

Study Speech Related Models in FunASR

FunASR aims to build a bridge between academic research and industrial applications in speech recognition. By supporting the training and fine-tuning of the industrial-grade speech recognition models released on ModelScope, it lets researchers and developers study and productionize speech recognition models more conveniently, and helps the speech recognition ecosystem grow. ASR for Fun!
