This repository implements the Wav2Vec2 model for speech-to-text (STT): it runs a pipeline of noise removal and speech recognition to transcribe the speech in a video file.
Usage is simple: you either download a picture file or specify its link when running the Python script. The output is a text file, and you can immediately preview the result of the conversion on the command line.
A desktop application that transcribes audio from files, microphone input, or YouTube videos, with the option to translate the content and create subtitles.
A real-time video-caption-to-conversation bot that captures frames, generates captions, and creates conversational responses using a large language model to produce interactive video descriptions.