Merge pull request #23 from tomchang25/v3.1
- Fix 3.0, close: Subtitles repeating
- Batch mode in CLI
- Fix filename format error in demucs
tomchang25 committed Apr 16, 2023
2 parents f06773c + d6b45b1 commit 4f40d6a
Showing 15 changed files with 449 additions and 420 deletions.
14 changes: 6 additions & 8 deletions .gitignore
@@ -4,26 +4,24 @@ build/
dist/
dist_onefile/
flagged/

venv/

test_mp4/
batch/
test/
img/
mp4/
out/
wav/
tmp/

project/
repositories/

__pycache__/
pretrained_models/
project/

gui.spec
Demo.mp4

*.srt
# *.srt

main.py
pb_test.py
test.py
**/.openai
130 changes: 130 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,136 @@

No changes to highlight.

## Bug Fixes:

No changes to highlight.

## Documentation Changes:

No changes to highlight.

## Testing and Infrastructure Changes:

No changes to highlight.

## Breaking Changes:

No changes to highlight.

## Full Changelog:

No changes to highlight.

## Contributors Shoutout:

No changes to highlight.

# Version 0.3.1

## New Features:

- Batch mode in CLI

## Bug Fixes:

- Add 'Program Files' warning in `launch.py`.
- Fixed disk errors.
- Fix filename format error in demucs.

## Documentation Changes:

No changes to highlight.

## Testing and Infrastructure Changes:

No changes to highlight.

## Breaking Changes:

No changes to highlight.

## Full Changelog:

No changes to highlight.

## Contributors Shoutout:

No changes to highlight.

# Version 0.3.0

## New Features:

- Vocal extractor
- Much better performance
- Voice activity detection
- Should fix the issue of subtitle repetition

## Bug Fixes:

No changes to highlight.

## Documentation Changes:

No changes to highlight.

## Testing and Infrastructure Changes:

No changes to highlight.

## Breaking Changes:

No changes to highlight.

## Full Changelog:

No changes to highlight.

## Contributors Shoutout:

No changes to highlight.

# Version 0.2.2

## New Features:

No changes to highlight.

## Bug Fixes:

- Remove a module that was uploaded by mistake

## Documentation Changes:

No changes to highlight.

## Testing and Infrastructure Changes:

No changes to highlight.

## Breaking Changes:

No changes to highlight.

## Full Changelog:

No changes to highlight.

## Contributors Shoutout:

No changes to highlight.

# Version 0.2.1

## New Features:

- Update to latest gradio
- Support official video preview
- Support time slice for Audio
- Support download for Video
- More detailed information
- User-friendly UX/UI

## Bug Fixes:

No changes to highlight.
59 changes: 31 additions & 28 deletions README.md
@@ -11,11 +11,10 @@
<div align="center">
<h3 align="center">Easily generate free subtitles for your video</h3>


<a href="https://github.com/tomchang25/whisper-auto-transcribe">
<img src="images/logo.png" alt="Logo" width="400" height="400">
</a>

<p align="center">
<br />
<a href="https://github.com/tomchang25/whisper-auto-transcribe#Demo">View Demo</a>
@@ -25,7 +24,6 @@
<a href="https://github.com/tomchang25/whisper-auto-transcribe/issues">Request Feature</a>
</p>


</div>

<!-- ABOUT THE PROJECT -->
@@ -49,11 +47,11 @@
- Provides support for Background Music Mute, works fine even during heavy metal live performances
- Supports long files, 3-hour files have been tested
- Resolves the issue of subtitle repetition
- Support for batch processing.

### Future feature:

- Subtitle editing
- Easy batch processing function
- Improved translation

The tool is based on [OpenAI-whisper](https://github.com/openai/whisper), the latest project developed by OpenAI.
@@ -64,22 +62,29 @@ For more details, you can check [this](https://cdn.openai.com/papers/whisper.pdf

<!-- GETTING STARTED -->

## Installation
## How to use

### Installation

1. Install [Python 3](https://www.python.org/downloads/) and [Git](https://git-scm.com/downloads)

2. Clone the repo

```sh
# Change the current directory to your home folder
# You can use any other location except "Program Files" and "Program Files (x86)"
cd ~

# Stable version
git clone https://github.com/tomchang25/whisper-auto-transcribe.git
cd whisper-auto-transcribe

# If you want to test the unique feature in v3.0
git clone --branch v3-alpha https://github.com/tomchang25/whisper-auto-transcribe.git whisper-auto-transcribe-v3
cd whisper-auto-transcribe-v3
```

<!-- # If you want to test the unique feature in v3.1
git clone --branch v3-alpha https://github.com/tomchang25/whisper-auto-transcribe.git whisper-auto-transcribe-v3
cd whisper-auto-transcribe-v3
``` -->

3. Open webui.bat

4. Check for any errors and ensure that the final lines are correct.
@@ -91,9 +96,24 @@ For more details, you can check [this](https://cdn.openai.com/papers/whisper.pdf

5. Open your browser and go to `http://127.0.0.1:7860`

<!-- GPU acceleration -->
### (Optional) Command-line interface

1. Open `enable_venv.bat`.

2. Now, you can use the CLI mode.

```sh
# Get help messages
python .\cli.py -h

## (Optional) GPU acceleration (CUDA.11.3)
# A simple example
python .\cli.py .\mp4\1min.mp4 --output .\tmp\123456.srt -lang ja --task translate --model large

# A batch example
python .\cli.py .\mp4 --output .\batch\ --model small --model medium
```
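
The batch example above names an output directory rather than a single file. A minimal sketch of how each input appears to be mapped to a subtitle file, based on the `output_dir / (media_file.stem + ".srt")` expression in the `cli.py` change of this commit; the helper name `planned_subtitle_path` is hypothetical and not part of the project:

```python
from pathlib import Path


def planned_subtitle_path(media_file: Path, output_dir: Path) -> Path:
    """Return the .srt path that batch mode would use for one input file.

    Hypothetical helper: it mirrors the path expression used in cli.py,
    but the function itself does not exist in the repository.
    """
    # The input extension is dropped and replaced with ".srt".
    return output_dir / (media_file.stem + ".srt")


if __name__ == "__main__":
    for name in ("mp4/1min.mp4", "mp4/talk.mkv"):
        print(planned_subtitle_path(Path(name), Path("batch")))
```

Because only the file stem is kept, two inputs that differ only by extension (for example `talk.mp4` and `talk.wav`) would map to the same `.srt` path.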

### (Optional) GPU acceleration (CUDA.11.3)

1. Install [CUDA](https://developer.nvidia.com/cuda-11.3.0-download-archive)
2. Install [CUDNN](https://developer.nvidia.com/rdp/cudnn-archive)
@@ -109,23 +129,6 @@ For more details, you can check [this](https://cdn.openai.com/papers/whisper.pdf

<p align="right">(<a href="#top">back to top</a>)</p>

<!-- How to use -->

## How to use

<img src="images/Demo1.png" alt="How to use" width="800" height="450">

## Command-line interface
```sh
# Get help messages
python .\cli.py -h

# A simple example
python .\cli.py .\mp4\1min.mp4 --output .\tmp\123456.srt -lang ja --task translate --model small
```

<p align="right">(<a href="#top">back to top</a>)</p>

<!-- Demo -->

## Demo
65 changes: 50 additions & 15 deletions cli.py
@@ -1,14 +1,25 @@
import argparse
from src.utils.task import transcribe
from pathlib import Path
import mimetypes


def cli():
parser = argparse.ArgumentParser(description="Whisper Auto Transcribe")

parser.add_argument("input", metavar="input", type=str, help="Input video file")
parser.add_argument(
"input",
metavar="input",
type=str,
help="Input video file(s) or directory containing video files. If a directory is specified, batch work will be performed on all files in the directory.",
)

parser.add_argument(
"--output", metavar="output", type=str, help="Output file name.", required=True
"--output",
metavar="output",
type=str,
help="Output file name or directory. ",
required=True,
)

parser.add_argument(
@@ -49,23 +60,47 @@ def cli():
)

args = parser.parse_args()
subtitle_path = transcribe(
args.input,
subtitle=args.output,
language=args.language,
model_type=args.model,
device=args.device,
task=args.task,
)
input_path = Path(args.input)

print(
("[{task} file is found at [{subtitle_path}].\n").format(
task=args.task, subtitle_path=subtitle_path
)
)
    if input_path.is_dir():
        # Batch mode - process every audio/video file in the input directory
        output_dir = Path(args.output)
        for media_file in input_path.glob("*"):
            media_file_type = mimetypes.guess_type(media_file)[0]
            # Parenthesize the "or": "and" binds tighter, so without the
            # parentheses an unknown type (None) raises TypeError.
            if media_file_type and (
                "audio" in media_file_type or "video" in media_file_type
            ):
                subtitle_path = output_dir / (media_file.stem + ".srt")
                transcribe(
                    str(media_file),
                    subtitle=str(subtitle_path),
                    language=args.language,
                    model_type=args.model,
                    device=args.device,
                    task=args.task,
                )
            else:
                print(f"Skip. Can't transcribe file: {media_file}")
    else:
        # Single-file mode - transcribe the one input file
        media_file = args.input
        media_file_type = mimetypes.guess_type(media_file)[0]
        if media_file_type and (
            "audio" in media_file_type or "video" in media_file_type
        ):
            subtitle_path = transcribe(
                args.input,
                subtitle=args.output,
                language=args.language,
                model_type=args.model,
                device=args.device,
                task=args.task,
            )
        else:
            print(f"Skip. Can't transcribe file: {media_file}")


# python cli.py mp4/1min.mp4 --output out/final.srt --model large
# python cli.py test_mp4 --output batch --model large

if __name__ == "__main__":
cli()
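
The media-type check in this diff is repeated in both branches and depends on operator precedence: in Python, `a and b or c` parses as `(a and b) or c`, so when `mimetypes.guess_type` returns `None` the expression falls through to `"video" in None` and raises `TypeError`. A minimal sketch of one way to factor the check out, assuming a hypothetical helper named `is_media_file` (not part of the repository):

```python
import mimetypes


def is_media_file(path: str) -> bool:
    """Return True if the guessed MIME type is audio/* or video/*.

    Hypothetical helper, not present in cli.py. Files whose type cannot
    be guessed (guess_type returns None) are treated as non-media.
    """
    media_type, _encoding = mimetypes.guess_type(path)
    return media_type is not None and media_type.startswith(("audio/", "video/"))


if __name__ == "__main__":
    for name in ("clip.mp4", "song.mp3", "notes.txt", "no_extension"):
        verdict = "transcribe" if is_media_file(name) else "skip"
        print(f"{name}: {verdict}")
```

Matching on the `audio/` and `video/` prefixes is slightly stricter than the substring test in the diff, which would also accept any type that merely contains the word "audio" or "video".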
5 changes: 5 additions & 0 deletions enable_venv.bat
@@ -0,0 +1,5 @@
@echo off

call venv\Scripts\activate.bat

cmd /k
Binary file added images/v3-tab1.png
Binary file added images/v3-tab2.png
