GitHub - ultrasaurus/mlx-audio-example

This command fails:

  time mlx_audio.tts.generate --file "audio1" --text "The Memex allows a human user to do more conveniently (less energy, more quickly) what he could have done with relatively ordinary photographic equipment and filing systems, but he would have had to spend so much time in the lower-level processes of manipulation that his mental time constants of memory and patience would have rendered the system unusable in the detailed and intimate sense which Bush illustrates." --model mlx-community/Dia-1.6B --ref_audio audio-dia/male-dangerous-phrase.wav  --ref_text "[S1]The most dangerous phrase in the language is: We've always done it this way." --speed 0.9

This shorter sentence also fails:

  time mlx_audio.tts.generate --file "audio2" --text "[S1] The associative trails whose establishment and use within the files he describes at some length provide a beautiful example of a new capsbility in symbol structuring that derives from new artifactprocess capability." --model mlx-community/Dia-1.6B --ref_audio audio-dia/male-dangerous-phrase.wav  --ref_text "[S1]The most dangerous phrase in the language is: We've always done it this way." --speed 0.9

Error loading model: 
Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.10/site-packages/mlx_audio/tts/generate.py", line 319, in generate_audio
    for i, result in enumerate(results):
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.10/site-packages/mlx_audio/tts/models/dia/dia.py", line 256, in generate
    audio, token_count = self._generate(
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.10/site-packages/mlx_audio/tts/models/dia/dia.py", line 484, in _generate
    logits_Bx1xCxV = decode_step(
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.10/site-packages/mlx_audio/tts/models/dia/layers.py", line 727, in decode_step
    x = layer(
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.10/site-packages/mlx_audio/tts/models/dia/layers.py", line 595, in __call__
    sa_out = self.self_attention(
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.10/site-packages/mlx_audio/tts/models/dia/layers.py", line 372, in __call__
    attn_k, attn_v = cache.update_and_fetch(Xk_BxNxSxH, Xv_BxNxSxH)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.10/site-packages/mlx_audio/tts/models/dia/layers.py", line 195, in update_and_fetch
    assert self.current_idx < self.max_len
AssertionError
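
The assertion in the last frame suggests a fixed-size cache that ran out of slots while decoding. The toy class below is not the mlx_audio code. It is a hypothetical sketch that borrows only the names `current_idx`, `max_len`, and `update_and_fetch` from the traceback, to show how such a check trips once more decode steps are requested than the cache can hold.

```python
class ToyCache:
    """Hypothetical fixed-size cache; NOT the mlx_audio implementation."""

    def __init__(self, max_len):
        self.max_len = max_len
        self.current_idx = 0
        self.keys = [None] * max_len
        self.values = [None] * max_len

    def update_and_fetch(self, k, v):
        # Same check as the last frame of the traceback above.
        assert self.current_idx < self.max_len
        self.keys[self.current_idx] = k
        self.values[self.current_idx] = v
        self.current_idx += 1
        return self.keys[: self.current_idx], self.values[: self.current_idx]


def steps_until_full(max_len, steps):
    """Attempt `steps` updates; return how many succeeded before the assert fired."""
    cache = ToyCache(max_len)
    done = 0
    try:
        for i in range(steps):
            cache.update_and_fetch(i, i)
            done += 1
    except AssertionError:
        pass
    return done
```

In this sketch, asking for more steps than `max_len` stops at exactly `max_len` successful updates, which would be consistent with longer input text failing while shorter text succeeds.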

Splitting the sentence into two commands works (the outputs are 6 seconds and 4 seconds long):

  time mlx_audio.tts.generate --file "audio2" --text "[S1] The associative trails whose establishment and use within the files he describes at some length provide a beautiful example" --model mlx-community/Dia-1.6B --ref_audio audio-dia/male-dangerous-phrase.wav  --ref_text "[S1]The most dangerous phrase in the language is: We've always done it this way." --speed 0.9

  time mlx_audio.tts.generate --file "audio2_001" --text "[S1] of a new capsbility in symbol structuring that derives from new artifactprocess capability." --model mlx-community/Dia-1.6B --ref_audio audio-dia/male-dangerous-phrase.wav  --ref_text "[S1]The most dangerous phrase in the language is: We've always done it this way." --speed 0.9
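
The manual split above could be automated. Below is a minimal sketch, assuming that chunking by word count is enough to stay under whatever limit triggers the assertion. The `max_words=20` default is a guess, not a measured threshold, and the helper names and the `prefix_NNN` output naming are hypothetical. The script only builds argument lists using the flags from the commands above; it does not call the tool itself.

```python
MODEL = "mlx-community/Dia-1.6B"
REF_AUDIO = "audio-dia/male-dangerous-phrase.wav"
REF_TEXT = "[S1]The most dangerous phrase in the language is: We've always done it this way."


def split_text(text, max_words=20):
    """Split text into chunks of at most max_words whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def build_commands(text, prefix="audio", max_words=20, speaker="[S1]"):
    """Return one argv list per chunk, mirroring the flags used above."""
    commands = []
    for n, chunk in enumerate(split_text(text, max_words)):
        commands.append([
            "mlx_audio.tts.generate",
            "--file", f"{prefix}_{n:03d}",       # hypothetical naming scheme
            "--text", f"{speaker} {chunk}",      # each chunk gets its own speaker tag
            "--model", MODEL,
            "--ref_audio", REF_AUDIO,
            "--ref_text", REF_TEXT,
            "--speed", "0.9",
        ])
    return commands
```

Each returned list can be passed to `subprocess.run`, or printed with `shlex.join` and pasted into a shell. Splitting on word count can cut a sentence mid-phrase, so splitting at punctuation may sound more natural.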
