V3 audio transcription: aud.subarray is not a function #845

Closed
1 of 5 tasks
flatsiedatsie opened this issue Jul 10, 2024 · 8 comments
Labels
bug Something isn't working

Comments

@flatsiedatsie

System Info

Cutting edge version of V3 (just compiled)

Environment/Platform

  • Website/web-app
  • Browser extension
  • Server-side (e.g., Node.js, Deno, Bun)
  • Desktop app (e.g., Electron)
  • Other (e.g., VSCode extension)

Description

I attempted a drop-in replacement of the V3 version in a V2 webworker, just to see if it would even work.

And it does! Trying to enable WebGPU is the next step.

However, while it works, I do see an error:

Screenshot 2024-07-10 at 18 18 35

Reproduction

I could share code if need be.

flatsiedatsie added the "bug" label (Something isn't working) on Jul 10, 2024
@flatsiedatsie
Author

I spotted a small typo in the demo:

PipelineSingeton is missing an "l" (it should be PipelineSingleton)

@flatsiedatsie
Author

A small question: is there a downside to simply always using the timestamp version?

@flatsiedatsie
Author

And some more questions:

The V2 demo has a number of values that could be manipulated. Are they still useful?

  • const isDistilWhisper = model.startsWith("distil-whisper/");
  • quantized

and these:

    // Greedy
    //top_k: 0,
    //do_sample: false,

    // Sliding window
    //chunk_length_s: isDistilWhisper ? 20 : 30,
    //stride_length_s: isDistilWhisper ? 3 : 5,

    // Language and task
    //language: language,
    task: subtask,

    // Return timestamps
    //return_timestamps: true,
    //force_full_sequences: false,

@flatsiedatsie
Author

I saw this error once:
Screenshot 2024-07-11 at 00 55 30

@flatsiedatsie
Author

I'm having some trouble getting it to respond with more than 1 word when using WebGPU. It usually thinks it heard 'And'.

Screenshot 2024-07-11 at 01 07 14

@flatsiedatsie
Author

FP32 was the key
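
For readers hitting the same one-word output on WebGPU: the comment above does not say how FP32 was selected. A minimal sketch, assuming it refers to the `dtype` option that Transformers.js v3 accepts alongside `device`; the variable name `webgpuOptions` and the `MODEL_ID` placeholder are illustrative and not taken from this thread.

```javascript
// Assumption: "FP32" maps to the `dtype` option of Transformers.js v3.
const webgpuOptions = {
  device: 'webgpu', // run inference on the GPU via WebGPU
  dtype: 'fp32',    // full-precision weights instead of a quantized variant
};

// Usage (not executed here; `pipeline` comes from the Transformers.js v3 package
// and MODEL_ID is a placeholder for whichever Whisper checkpoint is in use):
// const transcriber = await pipeline('automatic-speech-recognition', MODEL_ID, webgpuOptions);
console.log(webgpuOptions);
```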

@xenova
Owner

xenova commented Jul 11, 2024

Thanks for the report! The function does require Float32Array or Float64Array inputs, but we could use .slice() if .subarray isn't present (for normal arrays)
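
The fallback suggested here could look roughly like the sketch below. It is an illustration only, not the library's actual code: `sliceAudio` is a hypothetical helper name, and the parameter name `aud` is borrowed from the error message in the issue title.

```javascript
// Hypothetical helper, not part of Transformers.js: take the window [start, end)
// of audio samples from either a typed array or a plain array.
function sliceAudio(aud, start, end) {
  if (typeof aud.subarray === 'function') {
    // Typed arrays: subarray() returns a view on the same buffer (no copy).
    return aud.subarray(start, end);
  }
  if (typeof aud.slice === 'function') {
    // Plain arrays: slice() returns a shallow copy of the range.
    return aud.slice(start, end);
  }
  throw new TypeError('Expected a Float32Array, Float64Array, or Array of samples');
}

const typed = new Float32Array([0.1, 0.2, 0.3, 0.4]);
const plain = [0.1, 0.2, 0.3, 0.4];
console.log(sliceAudio(typed, 1, 3).length); // 2
console.log(sliceAudio(plain, 1, 3));        // [ 0.2, 0.3 ]
```

The typed-array branch avoids a copy, while the plain-array branch allocates a new array for every window, so converting the input to a Float32Array once up front may still be the cheaper choice for long recordings.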

@flatsiedatsie
Author

Nah, no worries. With what you're saying I believe the issue was that I was feeding it a fake array to get it to preload. That worked with the V2 version, but no longer worked with the V3 version. But that's fine as there are plenty of new ways to handle pre-loading.
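
If a dummy input is still wanted for warming up the model, one possible approach (not necessarily what the author ended up using) is to pass a real typed array of silence rather than a fake array, since typed arrays provide `.subarray`. The helper `makeSilence` and the `transcriber` variable below are hypothetical; the 16 kHz rate is the sample rate Whisper models expect.

```javascript
// One second of silence at 16 kHz as a genuine Float32Array.
const SAMPLE_RATE = 16000;

// Hypothetical helper: returns a zero-filled Float32Array of the given duration.
function makeSilence(seconds, sampleRate = SAMPLE_RATE) {
  return new Float32Array(Math.round(seconds * sampleRate));
}

const warmupAudio = makeSilence(1);
console.log(warmupAudio.length);          // 16000
console.log(typeof warmupAudio.subarray); // "function"

// Usage (not executed here; `transcriber` is assumed to be an ASR pipeline
// created elsewhere in the worker):
// await transcriber(warmupAudio);
```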
