Merge pull request #23 from tomchang25/v3.1

- Fix 3.0, close: Subtitles repeating
- Batch mode in CLI
- Fix filename format error in demucs
tomchang25 · Apr 16, 2023 · 4f40d6a
2 parents f06773c + d6b45b1

Showing 15 changed files with 449 additions and 420 deletions.
diff --git a/.gitignore b/.gitignore
@@ -4,26 +4,24 @@ build/
 dist/
 dist_onefile/
 flagged/
+
 venv/
 
+test_mp4/
+batch/
 test/
 img/
 mp4/
 out/
 wav/
 tmp/
-
+project/
 repositories/
 
 __pycache__/
 pretrained_models/
-project/
-
-gui.spec
-Demo.mp4
 
-*.srt
+# *.srt
 
-main.py
-pb_test.py
 test.py
+**/.openai
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -4,6 +4,136 @@
 
 No changes to highlight.
 
+## Bug Fixes:
+
+No changes to highlight.
+
+## Documentation Changes:
+
+No changes to highlight.
+
+## Testing and Infrastructure Changes:
+
+No changes to highlight.
+
+## Breaking Changes:
+
+No changes to highlight.
+
+## Full Changelog:
+
+No changes to highlight.
+
+## Contributors Shoutout:
+
+No changes to highlight.
+
+# Version 0.3.1
+
+## New Features:
+
+- Batch mode in CLI
+
+## Bug Fixes:
+
+- Add 'Program Files' warning in `launch.py`.
+- Fix disk errors.
+- Fix filename format error in demucs.
+
+## Documentation Changes:
+
+No changes to highlight.
+
+## Testing and Infrastructure Changes:
+
+No changes to highlight.
+
+## Breaking Changes:
+
+No changes to highlight.
+
+## Full Changelog:
+
+No changes to highlight.
+
+## Contributors Shoutout:
+
+No changes to highlight.
+
+# Version 0.3.0
+
+## New Features:
+
+- Vocal extractor
+  - Much better performance
+- Voice activity detection
+  - Should fix the issue of subtitle repetition
+
+## Bug Fixes:
+
+No changes to highlight.
+
+## Documentation Changes:
+
+No changes to highlight.
+
+## Testing and Infrastructure Changes:
+
+No changes to highlight.
+
+## Breaking Changes:
+
+No changes to highlight.
+
+## Full Changelog:
+
+No changes to highlight.
+
+## Contributors Shoutout:
+
+No changes to highlight.
+
+# Version 0.2.2
+
+## New Features:
+
+No changes to highlight.
+
+## Bug Fixes:
+
+- Remove a mistakenly uploaded module
+
+## Documentation Changes:
+
+No changes to highlight.
+
+## Testing and Infrastructure Changes:
+
+No changes to highlight.
+
+## Breaking Changes:
+
+No changes to highlight.
+
+## Full Changelog:
+
+No changes to highlight.
+
+## Contributors Shoutout:
+
+No changes to highlight.
+
+# Version 0.2.1
+
+## New Features:
+
+- Update to the latest Gradio
+  - Support official video preview
+  - Support time slicing for audio
+  - Support downloading video
+- More detailed information
+- User-friendly UX/UI
+
 ## Bug Fixes:
 
 No changes to highlight.

diff --git a/README.md b/README.md
@@ -11,11 +11,10 @@
 <div align="center">
   <h3 align="center">Easily generate free subtitles for your video</h3>
 
-
   <a href="https://github.com/tomchang25/whisper-auto-transcribe">
     <img src="images/logo.png" alt="Logo" width="400" height="400">
   </a>
- 
+
   <p align="center">
     <br />
     <a href="https://github.com/tomchang25/whisper-auto-transcribe#Demo">View Demo</a>
@@ -25,7 +24,6 @@
     <a href="https://github.com/tomchang25/whisper-auto-transcribe/issues">Request Feature</a>
   </p>
 
-
 </div>
 
 <!-- ABOUT THE PROJECT -->
@@ -49,11 +47,11 @@
 - Provides support for Background Music Mute, works fine even during heavy metal live performances
 - Supports long files, 3-hour files have been tested
 - Resolves the issue of subtitle repetition
+- Supports batch processing
 
 ### Future feature:
 
 - Subtitle editing
-- Easy batch processing function
 - Improved translation
 
 The tool is based on [OpenAI-whisper](https://github.com/openai/whisper), the latest project developed by OpenAI.
@@ -64,22 +62,29 @@ For more details, you can check [this](https://cdn.openai.com/papers/whisper.pdf
 
 <!-- GETTING STARTED -->
 
-## Installation
+## How to use
+
+### Installation
 
 1. Install [Python 3](https://www.python.org/downloads/) and [Git](https://git-scm.com/downloads)
 
 2. Clone the repo
 
    ```sh
+   # Change the current directory to your home folder.
+   # Any other location also works, except "Program Files" and "Program Files (x86)".
+   cd ~
+
    # Stable version
    git clone https://github.com/tomchang25/whisper-auto-transcribe.git
    cd whisper-auto-transcribe
-
-   # If you want to test the unique feature in v3.0
-   git clone --branch v3-alpha https://github.com/tomchang25/whisper-auto-transcribe.git whisper-auto-transcribe-v3
-   cd whisper-auto-transcribe-v3
    ```
 
+     <!-- # If you want to test the unique feature in v3.1
+     git clone --branch v3-alpha https://github.com/tomchang25/whisper-auto-transcribe.git whisper-auto-transcribe-v3
+     cd whisper-auto-transcribe-v3
+     ``` -->
+
 3. Open webui.bat
 
 4. Check for any errors and ensure that the final lines are correct.
@@ -91,9 +96,24 @@ For more details, you can check [this](https://cdn.openai.com/papers/whisper.pdf
 
 5. Open your browser and go to `http://127.0.0.1:7860`
 
-<!-- GPU acceleration -->
+### (Optional) Command-line interface
+
+1.  Open `enable_venv.bat`.
+
+2.  Now, you can use the CLI mode.
+
+    ```sh
+    # Get help messages
+    python .\cli.py -h
 
-## (Optional) GPU acceleration (CUDA.11.3)
+    # A simple example
+    python .\cli.py .\mp4\1min.mp4 --output .\tmp\123456.srt -lang ja --task translate --model large
+
+    # A batch example
+    python .\cli.py .\mp4 --output .\batch\ --model medium
+    ```
+
+### (Optional) GPU acceleration (CUDA 11.3)
 
 1. Install [CUDA](https://developer.nvidia.com/cuda-11.3.0-download-archive)
 2. Install [CUDNN](https://developer.nvidia.com/rdp/cudnn-archive)
@@ -109,23 +129,6 @@ For more details, you can check [this](https://cdn.openai.com/papers/whisper.pdf
 
 <p align="right">(<a href="#top">back to top</a>)</p>
 
-<!-- How to use -->
-
-## How to use
-
-  <img src="images/Demo1.png" alt="How to use" width="800" height="450">
-
-## Command-line interface
-   ```sh
-   # Get help messages
-   python .\cli.py -h
-
-   # A simple example
-   python .\cli.py .\mp4\1min.mp4 --output .\tmp\123456.srt -lang ja --task translate --model small
-   ```
-
-<p align="right">(<a href="#top">back to top</a>)</p>
-
 <!-- Demo -->
 
 ## Demo

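The batch example in the README's CLI section writes one `.srt` per media file into the output directory, named after each input file's stem (the `output_dir / (media_file.stem + ".srt")` rule in `cli.py`). Before a long batch run it can help to preview that mapping. The sketch below is a dry run under stated assumptions: `plan_batch` is a hypothetical helper, not part of the repository; it mirrors the top-level `glob("*")` scan and the audio/video MIME check from `cli.py`, and sorts entries only to make the output stable.

```python
from __future__ import annotations

import mimetypes
from pathlib import Path


def plan_batch(input_dir: str, output_dir: str) -> tuple[list[tuple[Path, Path]], list[Path]]:
    """Dry run of batch mode: return (jobs, skipped) without transcribing anything.

    jobs    -- (media file, subtitle path) pairs, using <output_dir>/<stem>.srt
    skipped -- entries whose guessed MIME type is neither audio nor video
    """
    jobs: list[tuple[Path, Path]] = []
    skipped: list[Path] = []
    # Only the top level is scanned (not recursive); sorted for stable output.
    for media_file in sorted(Path(input_dir).glob("*")):
        mime = mimetypes.guess_type(media_file)[0]  # None when the type is unknown
        if mime and ("audio" in mime or "video" in mime):
            jobs.append((media_file, Path(output_dir) / (media_file.stem + ".srt")))
        else:
            skipped.append(media_file)
    return jobs, skipped


if __name__ == "__main__":
    import sys

    todo, skip = plan_batch(sys.argv[1], sys.argv[2])
    for src, dst in todo:
        print(f"{src} -> {dst}")
    for path in skip:
        print(f"skip: {path}")
```

Saved as a hypothetical `plan_batch.py`, it could be run as `python .\plan_batch.py .\mp4 .\batch` to list the planned jobs without loading a Whisper model.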
diff --git a/cli.py b/cli.py
@@ -1,14 +1,25 @@
 import argparse
 from src.utils.task import transcribe
+from pathlib import Path
+import mimetypes
 
 
 def cli():
     parser = argparse.ArgumentParser(description="Whisper Auto Transcribe")
 
-    parser.add_argument("input", metavar="input", type=str, help="Input video file")
+    parser.add_argument(
+        "input",
+        metavar="input",
+        type=str,
+        help="Input audio/video file, or a directory of media files. If a directory is given, every audio/video file directly inside it is transcribed (batch mode).",
+    )
 
     parser.add_argument(
-        "--output", metavar="output", type=str, help="Output file name.", required=True
+        "--output",
+        metavar="output",
+        type=str,
+        help="Output subtitle file, or the output directory when the input is a directory.",
+        required=True,
     )
 
     parser.add_argument(
@@ -49,23 +60,47 @@ def cli():
     )
 
     args = parser.parse_args()
-    subtitle_path = transcribe(
-        args.input,
-        subtitle=args.output,
-        language=args.language,
-        model_type=args.model,
-        device=args.device,
-        task=args.task,
-    )
+    input_path = Path(args.input)
 
-    print(
-        ("[{task} file is found at [{subtitle_path}].\n").format(
-            task=args.task, subtitle_path=subtitle_path
-        )
-    )
+    if input_path.is_dir():
+        # Batch mode: transcribe every audio/video file directly inside the input directory
+        output_dir = Path(args.output)
+        output_dir.mkdir(parents=True, exist_ok=True)
+        for media_file in input_path.glob("*"):
+            media_file_type = mimetypes.guess_type(media_file)[0]
+            # guess_type() returns None for unknown types, so test for None first
+            if media_file_type and (
+                "audio" in media_file_type or "video" in media_file_type
+            ):
+                subtitle_path = output_dir / (media_file.stem + ".srt")
+                transcribe(
+                    str(media_file),
+                    subtitle=str(subtitle_path),
+                    language=args.language,
+                    model_type=args.model,
+                    device=args.device,
+                    task=args.task,
+                )
+            else:
+                print(f"Skipping {media_file}: not an audio or video file")
+    else:
+        media_file = args.input
+        media_file_type = mimetypes.guess_type(media_file)[0]
+        if media_file_type and (
+            "audio" in media_file_type or "video" in media_file_type
+        ):
+            subtitle_path = transcribe(
+                args.input,
+                subtitle=args.output,
+                language=args.language,
+                model_type=args.model,
+                device=args.device,
+                task=args.task,
+            )
+        else:
+            print(f"Skipping {media_file}: not an audio or video file")
 
 
 # python cli.py mp4/1min.mp4 --output out/final.srt --model large
+# python cli.py test_mp4 --output batch --model large
 
 if __name__ == "__main__":
     cli()
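Why the MIME check needs parentheses: in Python `and` binds tighter than `or`, so `t and "audio" in t or "video" in t` is parsed as `(t and "audio" in t) or ("video" in t)`. `mimetypes.guess_type()` returns `None` as the type for names it cannot classify (for example a subdirectory or an extensionless file picked up by `glob("*")`), and `"video" in None` raises `TypeError`. The standalone comparison below shows both forms; the two function names are illustrative and do not exist in the repository.

```python
import mimetypes
from typing import Optional


def is_media_type_unparenthesized(mime: Optional[str]) -> bool:
    # Parsed as (mime and "audio" in mime) or ("video" in mime):
    # when mime is None the right-hand side still runs and raises TypeError.
    return mime and "audio" in mime or "video" in mime


def is_media_type(mime: Optional[str]) -> bool:
    # Reject None first, then test both substrings as one group.
    return bool(mime) and ("audio" in mime or "video" in mime)


if __name__ == "__main__":
    for name in ["clip.mp4", "song.mp3", "notes.txt", "README"]:
        mime = mimetypes.guess_type(name)[0]
        print(name, mime, is_media_type(mime))
```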
diff --git a/enable_venv.bat b/enable_venv.bat
@@ -0,0 +1,5 @@
+@echo off
+
+call venv\Scripts\activate.bat
+
+cmd /k
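`enable_venv.bat` activates `venv` and then leaves a command prompt open (`cmd /k`), which inherits the activated environment, so `python .\cli.py` should resolve to the project's interpreter. To double-check which interpreter is in use, the standard library is enough: for environments created with `venv`, `sys.prefix` points at the environment while `sys.base_prefix` keeps pointing at the base installation. A minimal sketch; `in_virtualenv` is an illustrative name, not part of the repository.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv-created environment."""
    # venv sets sys.prefix to the environment directory, while
    # sys.base_prefix still refers to the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"venv active: {sys.prefix}")
    else:
        print("venv NOT active; run enable_venv.bat first")
```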
diff --git a/images/v3-tab1.png b/images/v3-tab1.png
diff --git a/images/v3-tab2.png b/images/v3-tab2.png