
PoC: Integrate whisper.cpp into KanTV for a clean-room implementation of real-time AI subtitles with English online TV (OTT TV) #64

Open
zhouwg opened this issue Feb 22, 2024 · 20 comments

zhouwg commented Feb 22, 2024

whisper.cpp is a powerful open-source, device-side AI framework/library for ASR (Automatic Speech Recognition, a sub-field of AI).

I want to integrate the great and powerful whisper.cpp into KanTV to provide real-time English subtitles for English online TV on the Xiaomi 14.

The goal looks like the following screenshots, which were produced by Xiaomi 14's powerful proprietary 6B device-side AI model (aka XiaoAI, Chinese "小爱") plus Xiaomi 14's powerful mobile SoC, the Qualcomm SM8650-AB Snapdragon 8 Gen 3 (4 nm).

(screenshots)

@zhouwg zhouwg changed the title integrate the great and powerful whisper.cpp to KanTV for purpose of real-time subtitle with online TV integrate whisper.cpp to KanTV for purpose of real-time subtitle with online TV Feb 22, 2024
@zhouwg zhouwg changed the title integrate whisper.cpp to KanTV for purpose of real-time subtitle with online TV PoC:integrate whisper.cpp to KanTV for purpose of real-time subtitle with online TV Feb 26, 2024

zhouwg commented Mar 4, 2024

I will start integrating the excellent and amazing whisper.cpp into project KanTV on March 5, 2024, now that v1.2.9 was released on March 4, 2024. Before that, I had spent about two weeks (since Feb 22, 2024) migrating some local personal projects to GitHub.

background study:

GGML is a C library for machine learning (ML): https://github.com/rustformers/llm/blob/main/crates/ggml/README.md

Roadmap and FAQ: ggerganov/whisper.cpp#126

Android example app: ggerganov/whisper.cpp#283

whisper.cpp should support NNAPI on Android: ggerganov/whisper.cpp#1249

Android Inference is too slow: ggerganov/whisper.cpp#1070

Use Android NNAPI to accelerate inference on Android Devices: ggerganov/ggml#88

NPU support in whisper.cpp: ggerganov/whisper.cpp#1557

Support for realtime audio input: ggerganov/whisper.cpp#10

The Whisper model processes the audio in chunks of 30 seconds - this is a hard constraint of the architecture.

However, what seems to work is you can take for example 5 seconds of audio and pad it with 25 seconds of silence. This way you can process shorter chunks.
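A minimal C++ sketch of that padding trick, assuming 16 kHz mono float PCM (the format whisper_full() expects); the helper name is mine, not part of whisper.cpp:

    #include <vector>
    #include "whisper.h"

    // Pad a short chunk of 16 kHz mono float PCM with trailing silence so that it is
    // at least `seconds` long (Whisper's architecture works on 30-second windows).
    static std::vector<float> pad_to_seconds(const std::vector<float> & pcm, int seconds = 30) {
        const size_t target = (size_t) seconds * WHISPER_SAMPLE_RATE; // WHISPER_SAMPLE_RATE == 16000
        std::vector<float> out(pcm);
        if (out.size() < target) {
            out.resize(target, 0.0f); // append silence
        }
        return out;
    }

    // usage (ctx and params obtained elsewhere):
    //   std::vector<float> padded = pad_to_seconds(five_seconds_of_audio);
    //   whisper_full(ctx, params, padded.data(), (int) padded.size());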

silence removal for transcription implemented: ggerganov/whisper.cpp#1649

Can real-time transcription be achieved?: ggerganov/whisper.cpp#1653

How to increase speech to text speed when using whisper cpp?: ggerganov/whisper.cpp#1635

Benchmark results: ggerganov/whisper.cpp#89

Whisper model files in custom ggml format: https://github.com/ggerganov/whisper.cpp/blob/master/models/README.md

GGUF file format specification:

Google Highway: https://github.com/google/highway

Tencent ncnn: https://github.com/Tencent/ncnn

updated on 03-13-2024:

SIGFPE on certain audio files:
ggerganov/whisper.cpp#39

Real-time identification of microphone has no result:
sandrohanea/whisper.net#155

How to handle real-time sound streams:
sandrohanea/whisper.net#25

Updated on 03-20-2024:
Finetuning models for audio_ctx support (VERY important, but it brings a side effect: 4cd35dd):
ggerganov/whisper.cpp#1951
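For reference, audio_ctx is an existing field of whisper_full_params. A minimal sketch of shrinking the encoder's audio context for short chunks; the value 768 is only an illustrative assumption, not a recommendation from the linked issue:

    #include "whisper.h"

    // Sketch: reduce the encoder's audio context to speed up inference on short chunks.
    // The default (0) corresponds to the full 30-second window; smaller values run faster
    // but can increase repetition/hallucination unless the model is finetuned for it,
    // which is exactly the side effect the linked issue discusses.
    static struct whisper_full_params make_fast_params() {
        struct whisper_full_params params = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
        params.audio_ctx  = 768;   // assumption: roughly half of the default context
        params.no_context = true;
        return params;
    }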

Here are some strategies from the original author to reduce repetition and hallucinations:
ggerganov/whisper.cpp#1507


zhouwg commented Mar 5, 2024

Integrate whisper.cpp into KanTV, step 1: migrate the original Android sample from official whisper.cpp into KanTV and study it accordingly.

(screenshot)

How to practice/play with this branch:

adb logcat | grep KANTV
(logs from the Java layer, the JNI layer, and the native layer are displayed with the same prefix, which is helpful for troubleshooting and for tracking source code in whisper.cpp)
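A minimal sketch of how the native side can share that prefix; the macro and tag names here are my assumptions, only __android_log_print from the NDK is a given:

    // Use one Android log tag across the native layer so that
    // `adb logcat | grep KANTV` catches Java, JNI, and whisper.cpp logs alike.
    #include <android/log.h>

    #define KANTV_LOG_TAG "KANTV"
    #define LOGGD(...) __android_log_print(ANDROID_LOG_DEBUG, KANTV_LOG_TAG, __VA_ARGS__)
    #define LOGGE(...) __android_log_print(ANDROID_LOG_ERROR, KANTV_LOG_TAG, __VA_ARGS__)

    // example:
    //   LOGGD("whisper_full() took %d ms", elapsed_ms);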

Screenshot from 2024-03-05 21-56-25


zhouwg commented Mar 5, 2024

Moved to #62 to avoid misunderstanding.


zhouwg commented Mar 6, 2024

I suddenly got an idea for how to implement PoC stage 1 after studying the source code in examples/bench/bench.cpp and examples/main/main.cpp.

    PoC stage-1
  • background study
  • high-level study of whisper.cpp (source-code structure, build system, ...)
  • migrate the original Android example to KanTV
  • reuse code in kantv-core and add a simple event-handling framework for ASRResearchFragment.java to make further work easier

If it works as expected, I'll move to PoC stage 2 (whisper.cpp inference with a pre-loaded audio file by another method, referencing the original Android sample and examples/main/main.cpp); a minimal native sketch of this inference path follows the stage-2 list below.

    PoC stage-2
  • rewrite (or more accurately, refine) the Java part & JNI part of the whisper.cpp JNI, referencing the original Android sample
  • re-implement the native part of the whisper.cpp JNI with a new method based on PoC stage 1, with some code referencing examples/main/main.cpp
  • reuse code in kantv-core and spend some time integrating the customized FFmpeg within kantv-core into whisper.cpp; it might be heavily used and very helpful in the future
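A minimal sketch of what that native inference path boils down to, using only public whisper.cpp API calls; model path, sample buffer, and error handling are simplified assumptions:

    #include <cstdio>
    #include <string>
    #include <vector>
    #include "whisper.h"

    // Transcribe a pre-loaded buffer of 16 kHz mono float PCM and print the segments.
    // This mirrors what examples/main/main.cpp does, stripped down to the essentials.
    static int transcribe(const std::string & model_path, const std::vector<float> & pcm) {
        struct whisper_context * ctx =
            whisper_init_from_file_with_params(model_path.c_str(), whisper_context_default_params());
        if (ctx == nullptr) {
            return -1;
        }

        struct whisper_full_params params = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
        params.language   = "en";
        params.n_threads  = 4;
        params.no_context = true;

        const int ret = whisper_full(ctx, params, pcm.data(), (int) pcm.size());
        if (ret == 0) {
            for (int i = 0; i < whisper_full_n_segments(ctx); i++) {
                printf("%s\n", whisper_full_get_segment_text(ctx, i));
            }
        }

        whisper_free(ctx);
        return ret;
    }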

If it works as expected, I'll merge the previous work into the master branch and create a new branch/baseline accordingly. Then I'll move to PoC stage 3 (ASR with a live stream, i.e. online English TV). PoC stage 3 will be a real challenge for me, so I'll break it down into PoC-S31/PoC-S32/PoC-S33... accordingly.

    PoC stage-3
  • PoC-S31: investigate the performance of mulmat and inference (study the internal details of whisper.cpp)
  • PoC-S32: high-level design (HLD) of real-time subtitles with whisper.cpp in KanTV
  • PoC-S33: coding work on the data path: UI <----> JNI <----> whisper.cpp <----> kantv-play (originally named libkantv-ffmpeg.so; renamed to libkantv-play.so since v1.3.0 to avoid confusion, because the real code of the customized FFmpeg is statically linked into libkantv-core.so) <----> kantv-core. This step is just like designing/coding a pure virtual function and the corresponding virtual function in C++ (see the interface sketch after this list), and every node in the data path should work as expected
  • PoC-S34: reuse code in kantv-core and implement an audio-only record mode
  • PoC-S35: whisper.cpp inference/prediction with a live stream (online English TV), based on PoC-S33 & PoC-S34
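A rough C++ sketch of that "pure virtual function" idea for the data path; all names here are hypothetical placeholders, not the actual KanTV classes:

    #include <cstddef>
    #include <string>

    // Hypothetical interface between kantv-play (the producer of decoded audio)
    // and the whisper.cpp wrapper (the consumer). Each node in the data path only
    // depends on this contract, so every node can be implemented and tested alone.
    class IAudioTranscriber {
    public:
        virtual ~IAudioTranscriber() = default;

        // Called by the player/recorder with 16 kHz mono float PCM.
        virtual void on_audio(const float * samples, size_t n_samples) = 0;

        // Latest recognized text, pulled by the UI layer for subtitle rendering.
        virtual std::string latest_text() const = 0;
    };

    // The concrete implementation would wrap whisper_full() behind this interface,
    // e.g. class WhisperTranscriber : public IAudioTranscriber { ... };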

If it works as expected, I'll move to PoC stage 4 (performance analysis and optimization on Android phones).

    PoC stage-4
  • build optimization
  • code optimization
  • arch/synthesis optimization
  • algorithm optimization
  • assembly optimization
  • bug fixes
  • pre-alpha release (updated on 03-17-2024: done on the master branch, or referenced in branch v1.3.2)
  • alpha release

PoC stage 3 and PoC stage 4 might take place simultaneously.

The final goal is to implement real-time English subtitles for English online TV with KanTV + a customized whisper.cpp, and I'll demo it on a Xiaomi 14 (because the Xiaomi 14 contains a very powerful mobile SoC and I personally purchased one for software development).

Of course, the source code of the customized whisper.cpp can be found in this Android turn-key project. If it's well received and accepted by upstream whisper.cpp, I'll submit a PR accordingly.


zhouwg commented Mar 7, 2024

It works as expected (PoC stage 1 is finished).

(screenshots)


dinvlad commented Mar 7, 2024

How's the performance? Can this be used for real-time transcription on a reasonably old device?


zhouwg commented Mar 8, 2024

How's the performance? Can this be used for real-time transcription on a reasonably old device?

The benchmark of whisper.cpp/GGML's mulmat (matrix multiplication) does not look good on low-end Android phones.

But the performance of whisper.cpp/GGML on iOS is very good, because the original author of whisper.cpp/GGML (the great Georgi Gerganov) spent a lot of time optimizing them with Apple's dedicated machine-learning library (similar to SSE2/SSE3/AVX optimization on the x86 architecture).

I think whisper.cpp could be used for real-time subtitles with online TV on a high-end Android phone (such as the Xiaomi 14; I will demo it later after finishing this PoC successfully), and that is the goal of this PoC (this open issue).

whisper.cpp/GGML might not be reasonable for old devices, because complicated math computation needs either a powerful SoC or highly optimized code (just like what Georgi did for the iOS/Mac platform).

@liam-mceneaney has provided an Android example to demo transcription.

BTW, the following link would be helpful for more information:

ggerganov/whisper.cpp#283


zhouwg commented Mar 8, 2024

It works as expected (PoC stage 2 is finished).

(screenshots)

The above screenshots can't illustrate the exciting progress in this commit. I'd like to say this is a big milestone and to express my sincere thanks to the great whisper.cpp/GGML again: the more familiar I become with whisper.cpp, the more I feel we should all be thankful for the great whisper.cpp/GGML.

Or build the APK from source code (branch kantv-poc-with-whispercpp) with the Android Studio IDE accordingly.


zhouwg commented Mar 10, 2024

ASR/transcription performance on the Xiaomi 14 is about 5x-20x better than on other Android phones (low-end phones from vivo and from Huawei's Honor, which is now a standalone company), but it's still not enough for real-time subtitles with online TV.

(screenshots)

Transcription performance improves by about 1-3 seconds when OpenBLAS is enabled (the exact gain depends on OS load, process scheduling, ...).

The mulmat benchmark also seems to improve significantly when OpenBLAS is enabled.

So I guess Apple's dedicated machine-learning acceleration library might be very important for performance on iOS/Mac, just as Georgi Gerganov said before, and we should study Qualcomm's dedicated/proprietary machine-learning acceleration library accordingly.

Screenshot from 2024-03-10 21-57-20

@zhouwg zhouwg mentioned this issue Mar 10, 2024

zhouwg commented Mar 10, 2024

Updated on 03-10-2024 (2024-03-10, 23:41 Beijing Time / GMT+8):

Screenshot from 2024-03-10 23-17-28

(screenshot)

From 21 seconds down to 3 seconds, thanks to the powerful Xiaomi 14 / Qualcomm Snapdragon 8 Gen 3 and to Google's powerful modern compiler. I'd like to say once again: we should all be thankful for the great GGML; the open-source C/C++ whisper.cpp & llama.cpp have really changed our world.

I think I got the point, although ASR performance is still not enough for real-time subtitles with online TV. We should make the most of the AI engine in Qualcomm's Snapdragon 8 Gen 3 to improve ASR performance further.

Screenshot from 2024-03-10 21-57-20


zhouwg commented Mar 11, 2024

Updated on 03-11-2024 (2024-03-11, 10:40 Beijing Time / GMT+8)

(screenshot)

less than 2 seconds for the first time.

The commit can be found here.

This is exactly one item in the task breakdown list, because it's 2024 (not 1994) and we should trust the powerful modern compilers built by top talents at Google and Linaro.


zhouwg commented Mar 11, 2024

The next step is the coding work of PoC-S33. ASR performance is still not enough for real-time subtitles (the AI engine in Qualcomm's mobile SoC is not utilized yet), but it has improved a lot.

Sincere thanks for the key code snippets showing how to transcribe a single chunk of audio data with whisper.cpp,

from @liam-mceneaney:

https://github.com/ggerganov/whisper.cpp/blob/19b8436ef11bd05201d650c8e08193009ec6bb3c/examples/whisper.android/lib/src/main/jni/whisper/jni.c#L197


 // The below adapted from the Objective-C iOS sample
    struct whisper_full_params params = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    params.print_realtime = true;
    params.print_progress = false;
    params.print_timestamps = true;
    params.print_special = false;
    params.translate = false;
    params.language = "en";
    params.n_threads = num_threads; //how many threads can I use on an S23?
    //potentially use an initial prompt for custom vocabularies?
    // initial_prompt: Optional[str]
    //        Optional text to provide as a prompt for the first window. This can be used to provide, or
    //        "prompt-engineer" a context for transcription, e.g. custom vocabularies or proper nouns
    //        to make it more likely to predict those word correctly.
    //params.initial_prompt = "Transcription of Tactical Combat Casualty Drugs such as Fentanyl, Ibuprofen, Amoxicillin, Epinephrine, TXA, Hextend, Ketamine, Oral Transmucosal Fentanyl Citrate. ";
    params.offset_ms = 0;
    params.no_context = true;
    params.single_segment   = true; //hard code for true, objc example has it based on a button press
    params.no_timestamps    = params.single_segment; //from streaming objc example

    whisper_reset_timings(context);

    LOGI("About to run whisper_full");
    if (whisper_full(context, params, audio_data_arr, audio_data_length) != 0) {
        LOGI("Failed to run the model");
    } else {
        whisper_print_timings(context);
    }

Or from the original author of whisper.cpp, @ggerganov:

https://github.com/ggerganov/whisper.cpp/blob/master/examples/whisper.objc/whisper.objc/ViewController.m#L186

// dispatch the model to a background thread
    dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
        // process captured audio
        // convert I16 to F32
        for (int i = 0; i < self->stateInp.n_samples; i++) {
            self->stateInp.audioBufferF32[i] = (float)self->stateInp.audioBufferI16[i] / 32768.0f;
        }

        // run the model
        struct whisper_full_params params = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);

        // get maximum number of threads on this device (max 8)
        const int max_threads = MIN(8, (int)[[NSProcessInfo processInfo] processorCount]);

        params.print_realtime   = true;
        params.print_progress   = false;
        params.print_timestamps = true;
        params.print_special    = false;
        params.translate        = false;
        params.language         = "en";
        params.n_threads        = max_threads;
        params.offset_ms        = 0;
        params.no_context       = true;
        params.single_segment   = self->stateInp.isRealtime;
        params.no_timestamps    = params.single_segment;

        CFTimeInterval startTime = CACurrentMediaTime();

        whisper_reset_timings(self->stateInp.ctx);

        if (whisper_full(self->stateInp.ctx, params, self->stateInp.audioBufferF32, self->stateInp.n_samples) != 0) {
            NSLog(@"Failed to run the model");
            self->_textviewResult.text = @"Failed to run the model";

            return;
        }

        whisper_print_timings(self->stateInp.ctx);

        CFTimeInterval endTime = CACurrentMediaTime();

        NSLog(@"\nProcessing time: %5.3f, on %d threads", endTime - startTime, params.n_threads);


zhouwg commented Mar 11, 2024

Clarification of why I have said many times that we (programmers) should all be thankful for the great GGML:
  • AI scientists are far away from programmers
  • it's similar to FFmpeg: there is no doubt that FFmpeg is a great open-source project for any programmer in the video/audio/streaming-media field
  • the root cause of the huge ASR performance improvement on the Xiaomi 14 is the highly elegant C/C++ implementation of whisper.cpp by Georgi Gerganov (a powerful modern compiler doesn't work miracles on ordinary code); I didn't do any substantial/hardcore work, meaning I did not make a real contribution to the source code of ggml.c / ggml-quants.c / whisper.cpp, because I know very little about hardcore AI tech and I don't understand the internals of ggml.c / ggml-quants.c / whisper.cpp; I did NOT touch any core internals of ggml/whisper.cpp
  • I had never heard of Georgi Gerganov until I learned about whisper.cpp recently, and as an Android system software programmer I haven't overpraised whisper.cpp
  • what I said is just my personal feeling about the excellent and amazing whisper.cpp; of course, if anyone doesn't like what I said about whisper.cpp, I'd like to see another similar open-source AI project (I tried PaddleSpeech but gave up in the end; I even think Mozilla's deprecated DeepSpeech is more programmer-friendly than PaddleSpeech)


zhouwg commented Mar 11, 2024

Updated on 03-12-2024 (2024-03-12, 00:51)

(screenshot)
Screenshot from 2024-03-12 00-46-14


zhouwg commented Mar 11, 2024

Updated on 03-12-2024 (2024-03-12, 00:51)

Screenshot from 2024-03-12 00-46-14

I will submit the new optimization method (which only works on the Xiaomi 14) in the next commit.

I don't understand why performance with 4 threads is about 2x better than with 8 threads (I'd expect 8 threads to be roughly 2x faster than 4) under the same optimization method. What happens between 4 and 8 threads? What's the detail?
I really don't know where these models came from or what the differences between them are. AI is really a magical technology. Thanks to the great whisper.cpp again.
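One plausible, unverified explanation: the Snapdragon 8 Gen 3 is a heterogeneous big.LITTLE SoC, and ggml's worker threads synchronize per operation, so once extra threads land on the slower efficiency cores the whole operation waits for them. A small sketch of capping the thread count accordingly; the core count below is an assumption, not a measured value:

    #include <algorithm>
    #include <thread>

    // Cap the number of whisper.cpp worker threads at the (assumed) number of
    // performance cores, since efficiency-core workers can drag down every op
    // that waits on a thread barrier.
    static int pick_n_threads() {
        const int n_cpu = (int) std::thread::hardware_concurrency();
        const int n_big = 6; // assumption for Snapdragon 8 Gen 3 (1x X4 + 5x A720)
        return std::max(1, std::min(n_cpu, n_big));
    }

    // usage:
    //   params.n_threads = pick_n_threads();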

Updated on 03-12-2024 (2024-03-12, 21:01, Beijing Time / GMT+8):

The new optimization method for the Xiaomi 14 (ASR performance of less than 1 second; the root cause is the highly elegant, handcrafted C/C++ implementation of whisper.cpp, and of course Google's NDK r26c and the Xiaomi 14 / Qualcomm SM8650-AB Snapdragon 8 Gen 3 are also powerful) can be found in this file or this commit. I think I have finished the coding work of PoC-S33: the data path UI <----> JNI <----> whisper.cpp <----> kantv-play <----> kantv-core.

Some snapshots of the PoC-S33 demo (the third step in PoC stage 3: coding work on the data path UI <----> JNI <----> whisper.cpp <----> kantv-play <----> kantv-core) are below. Or build the APK manually from the source code of this branch; the generated APK should work fine on any mainstream Android phone, because the special optimization for the Xiaomi 14 is disabled by default (it can of course be enabled manually in this file).

(screenshots)

We are getting closer and closer to the final goal of this PoC. Once again, I'd like to express my sincere thanks to the great whisper.cpp, which is really helpful for C/C++ programmers who know very little about AI tech.


zhouwg commented Mar 15, 2024

Updated on 03-15-2024 (2024-03-15, 11:59, Beijing Time / GMT+8):

I have spent about 10 days (10+ hours/day, self-motivated) since 03-05-2024 to achieve the following goal (plus many other minor improvements to this project). It does NOT work perfectly yet, but it's getting closer and closer to the final goal of this PoC.

https://github.com/cdeos/kantv/blob/kantv-poc-with-whispercpp/external/whispercpp/whisper.cpp#L6726

https://github.com/cdeos/kantv/blob/kantv-poc-with-whispercpp/external/whispercpp/whisper.h#L620

https://github.com/cdeos/kantv/blob/kantv-poc-with-whispercpp/external/whispercpp/jni/whispercpp-jni.c

https://github.com/cdeos/kantv/blob/kantv-poc-with-whispercpp/cdeosplayer/cdeosplayer-lib/src/main/java/org/ggml/whispercpp/whispercpp.java

https://github.com/cdeos/kantv/blob/kantv-poc-with-whispercpp/cdeosplayer/kantv/src/main/java/com/cdeos/kantv/ui/fragment/ASRResearchFragment.java

It should have been finished in just ONE WEEK or less, as per my initial estimate/plan (because I had already spent many days investigating online TV recording and implemented it at the end of 2023, so I could reuse much of that code for this PoC). I'm sorry for this. The reasons for the delay:

  • I shouldn't have wasted time on a standalone local branch, which delayed development progress by about 1-2 days
  • GFW: I have to say that the GFW brought many troubles to development activity, and network access outside of China (such as to Google) is very unstable. The GFW may have delayed development progress by about 1-3 days; I was a little unlucky because China was in a highly sensitive period over the last 2-3 weeks.

Updated on 03-22-2024, 19:26: anyway, I paid the price, and I really have NO negative thoughts about my great country, because I'm familiar with the history of the Ming dynasty and I know that running a large and complex country is NOT easy. BUT, at the same time, I respect the fact that, as an ordinary programmer, I have spent about RMB 10,000 (USD 1,500-1,600) to fix network issues caused by the GFW. So I will NOT delete the sentence above.


zhouwg commented Mar 16, 2024

Updated on 03-16-2024 (2024-03-16, 13:28, Beijing Time / GMT+8)

Finally, I did it (although it's NOT a truly "real-time subtitle" yet, and bug fixes are still required) after solving a technical problem in the source code of the customized whisper.cpp.

Parts of the latest source code can be found at (the master branch is preferred for R&D development activity):

https://github.com/cdeos/kantv/blob/kantv-poc-with-whispercpp/external/whispercpp/whisper.cpp#L6727

https://github.com/cdeos/kantv/blob/kantv-poc-with-whispercpp/external/whispercpp/whisper.h#L620

https://github.com/cdeos/kantv/blob/kantv-poc-with-whispercpp/external/whispercpp/jni/whispercpp-jni.c

https://github.com/cdeos/kantv/blob/kantv-poc-with-whispercpp/cdeosplayer/cdeosplayer-lib/src/main/java/org/ggml/whispercpp/whispercpp.java

This PoC currently only works on the Xiaomi 14. The reason is that the Xiaomi 14 contains a very powerful mobile SoC (Qualcomm SM8650-AB Snapdragon 8 Gen 3, 4 nm) and the special build optimization method only works on the Xiaomi 14.

Unexpected behavior such as ANR (Application Not Responding) or app crashes may happen on Android phones driven by other, lower-end Qualcomm mobile SoCs.

(photos)

kantv-realtime-subtitle-demo-with-whispercpp.mp4

The ASR benchmark on the Xiaomi 14 with the special build optimization is less than 1 second (about 700-800 milliseconds), but the performance is about 5-7 seconds in a complicated real scenario, such as when TV playback, TV recording, and ASR audio recording all work at the same time.

Screenshot from 2024-03-16 17-12-24

@ggerganov, the hardware AI engine in the Snapdragon 8 Gen 3 could be utilized in GGML for the purpose of truly "real-time" English subtitles.

Screenshot from 2024-03-10 21-57-20

I don't know why, but today's network is stable and Google is available, and Google search was really helpful for this breakthrough. Anyway, thanks so much. BTW, miniwav (found via Google) was also really helpful during troubleshooting. @mhroth, thanks a lot.

Last of all, I'd like to express my sincere thanks to the great open-source AI project whisper.cpp once again: without the strength and power of the excellent and amazing whisper.cpp, the above scenario in this PoC/project could not have been achieved.


zhouwg commented Mar 16, 2024

Updated on 03-16-2024, 21:19: got better whisper.cpp inference performance (from 6 seconds to 0.7 seconds) in a complicated real scenario (online-TV playback, online-TV transcription, and online-TV audio recording working at the same time) after fine-tuning for the Xiaomi 14 (the commit can be found here).

Screenshot from 2024-03-16 21-18-24


Updated on 03-16-2024, 22:36 (Beijing Time / GMT+8): here is a video of whisper.cpp running on a Xiaomi 14 device - fully offline, on-device (no client-server).

realtime-subtitle-by-whispercpp-demo-on-xiaomi14.mp4

Updated on 03-17-2024, 11:19 (Beijing Time / GMT+8)

FYI:

Parts of the latest source code of this PoC can be found at:

https://github.com/cdeos/kantv/blob/kantv-poc-with-whispercpp/external/whispercpp/whisper.cpp#L6727

https://github.com/cdeos/kantv/blob/kantv-poc-with-whispercpp/external/whispercpp/whisper.h#L620

https://github.com/cdeos/kantv/blob/kantv-poc-with-whispercpp/external/whispercpp/jni/whispercpp-jni.c

https://github.com/cdeos/kantv/blob/kantv-poc-with-whispercpp/cdeosplayer/cdeosplayer-lib/src/main/java/org/ggml/whispercpp/whispercpp.java

The final outcome of this PoC can be found in the kantv-poc-with-whispercpp branch.

The master branch has been preferred for AI experts' or programmers' R&D development activity since 03-17-2024.

BTW:

  • This code (a new and very concise Android example of whisper.cpp, plus the key code for implementing real-time subtitles on an Android device) might not be merged upstream, because it depends heavily on FFmpeg and MIGHT bring the side effect of code pollution into upstream whisper.cpp; but it can be referenced for standalone R&D development activity on Android-based devices.

  • A Xiaomi 14 (Qualcomm Snapdragon 8 Gen 3) or another powerful Android phone (the Qualcomm Snapdragon 8 Gen 4 should be available in Oct 2024) is strongly recommended for this PoC or for R&D development activity, to keep in sync with the updated master branch.

Roadmap:

  • merge the latest code from upstream whisper.cpp to validate inference performance on Android devices (updated on 03-18-2024, 10:34: done)
  • remove the customized ExoPlayer, clean up the code, and then turn this project into Project Whispercpp-Android: an open-source streaming-media + device-side AI project based on the great FFmpeg + the great whisper.cpp, used for studying or practicing state-of-the-art AI technology in a real application / real complicated scenario (updated on 03-18-2024, 10:34: done)
  • study the internal details of ggml/whisper.cpp and the AI engine in Android devices for the purpose of truly "real-time" subtitles in a real complicated scenario (online-TV playback, online-TV transcription (real-time subtitles), online-TV language translation, and online-TV video & audio recording all working at the same time)
  • ...

@zhouwg zhouwg changed the title PoC:integrate whisper.cpp to KanTV for purpose of real-time subtitle with online TV PoC:integrate whisper.cpp to KanTV for purpose of real-time English subtitle with English online-TV(OTT TV) Mar 17, 2024
zhouwg added a commit that referenced this issue Mar 18, 2024
switch to Project Whispercpp-Android successfully according to roadmap after finished PoC #64

this is the new baseline for new Project KanTV(aka Project Whispercpp-Android)
@zhouwg zhouwg closed this as completed Mar 22, 2024

darcyg commented Mar 23, 2024

Congratulations to you


zhouwg commented Mar 23, 2024

Congratulations to you

😄 Thanks. Have fun with the great whisper.cpp (built on the great OpenAI Whisper model); this is truly amazing AI technology brought to us by the genius programmer @ggerganov.

@zhouwg zhouwg reopened this Apr 14, 2024
@zhouwg zhouwg removed the good first issue Good for newcomers label Apr 14, 2024
@zhouwg zhouwg closed this as completed Apr 15, 2024
@zhouwg zhouwg reopened this Apr 15, 2024
@zhouwg zhouwg changed the title PoC:integrate whisper.cpp to KanTV for purpose of real-time English subtitle with English online-TV(OTT TV) PoC:integrate whisper.cpp to KanTV for purpose of implementation of real-time AI subtitle with English online-TV(OTT TV) Apr 15, 2024
@zhouwg zhouwg changed the title PoC:integrate whisper.cpp to KanTV for purpose of implementation of real-time AI subtitle with English online-TV(OTT TV) PoC:Integrate whisper.cpp to KanTV for purpose of clean-room implementation of real-time AI subtitle with English online-TV(OTT TV) Apr 15, 2024
@zhouwg zhouwg self-assigned this Apr 23, 2024