ultrasaurus/mlx-audio-example

This command fails:

  time mlx_audio.tts.generate --file "audio1" --text "The Memex allows a human user to do more conveniently (less energy, more quickly) what he could have done with relatively ordinary photographic equipment and filing systems, but he would have had to spend so much time in the lower-level processes of manipulation that his mental time constants of memory and patience would have rendered the system unusable in the detailed and intimate sense which Bush illustrates." --model mlx-community/Dia-1.6B --ref_audio audio-dia/male-dangerous-phrase.wav  --ref_text "[S1]The most dangerous phrase in the language is: We've always done it this way." --speed 0.9

This shorter sentence also fails:

  time mlx_audio.tts.generate --file "audio2" --text "[S1] The associative trails whose establishment and use within the files he describes at some length provide a beautiful example of a new capsbility in symbol structuring that derives from new artifactprocess capability." --model mlx-community/Dia-1.6B --ref_audio audio-dia/male-dangerous-phrase.wav  --ref_text "[S1]The most dangerous phrase in the language is: We've always done it this way." --speed 0.9
Error loading model: 
Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.10/site-packages/mlx_audio/tts/generate.py", line 319, in generate_audio
    for i, result in enumerate(results):
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.10/site-packages/mlx_audio/tts/models/dia/dia.py", line 256, in generate
    audio, token_count = self._generate(
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.10/site-packages/mlx_audio/tts/models/dia/dia.py", line 484, in _generate
    logits_Bx1xCxV = decode_step(
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.10/site-packages/mlx_audio/tts/models/dia/layers.py", line 727, in decode_step
    x = layer(
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.10/site-packages/mlx_audio/tts/models/dia/layers.py", line 595, in __call__
    sa_out = self.self_attention(
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.10/site-packages/mlx_audio/tts/models/dia/layers.py", line 372, in __call__
    attn_k, attn_v = cache.update_and_fetch(Xk_BxNxSxH, Xv_BxNxSxH)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.10/site-packages/mlx_audio/tts/models/dia/layers.py", line 195, in update_and_fetch
    assert self.current_idx < self.max_len
AssertionError
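
The last frame of the traceback fails on `assert self.current_idx < self.max_len`, which suggests a cache with a fixed capacity that gets written once per decode step. The toy class below is only an illustration of that failure pattern, not mlx_audio's code: `ToyCache`, its list storage, and the capacity of 3 are made up, and whether the real cache behaves the same way internally is an assumption.

```python
class ToyCache:
    """Hypothetical fixed-capacity cache; only mirrors the assert seen in the traceback."""

    def __init__(self, max_len):
        self.max_len = max_len
        self.current_idx = 0
        self.keys = [None] * max_len

    def update_and_fetch(self, key):
        # Same condition as the failing line: no free slot means AssertionError.
        assert self.current_idx < self.max_len
        self.keys[self.current_idx] = key
        self.current_idx += 1
        return self.keys[: self.current_idx]


cache = ToyCache(max_len=3)
for step in range(3):
    cache.update_and_fetch(step)  # fills the three available slots

try:
    cache.update_and_fetch(3)  # a fourth step has nowhere to go
    overflowed = False
except AssertionError:
    overflowed = True

print("overflowed:", overflowed)
```

Under this reading, a longer `--text` needs more decode steps than the cache has room for, which would be consistent with the shorter inputs succeeding.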

Splitting the sentence into two commands works (the outputs are 6 seconds and 4 seconds long):

  time mlx_audio.tts.generate --file "audio2" --text "[S1] The associative trails whose establishment and use within the files he describes at some length provide a beautiful example" --model mlx-community/Dia-1.6B --ref_audio audio-dia/male-dangerous-phrase.wav  --ref_text "[S1]The most dangerous phrase in the language is: We've always done it this way." --speed 0.9

  time mlx_audio.tts.generate --file "audio2_001" --text "[S1] of a new capsbility in symbol structuring that derives from new artifactprocess capability." --model mlx-community/Dia-1.6B --ref_audio audio-dia/male-dangerous-phrase.wav  --ref_text "[S1]The most dangerous phrase in the language is: We've always done it this way." --speed 0.9
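
The manual split could be scripted. The sketch below is one possible way to do it, under stated assumptions: `split_for_tts` and `build_commands` are hypothetical helpers, the `_000`, `_001` file suffixes are an arbitrary naming choice, and `max_words=19` is picked only because it reproduces the two chunks used in the commands here, not because it is a known limit of the model. The command-line flags are the ones that appear in the commands in this README.

```python
import shlex

REF_AUDIO = "audio-dia/male-dangerous-phrase.wav"
REF_TEXT = ("[S1]The most dangerous phrase in the language is: "
            "We've always done it this way.")


def split_for_tts(text, max_words=19, tag="[S1]"):
    """Split text into chunks of at most max_words words, each prefixed with tag."""
    body = text.strip()
    if body.startswith(tag):
        body = body[len(tag):]  # drop the leading tag so it is not counted as a word
    words = body.split()
    return [
        f"{tag} " + " ".join(words[start:start + max_words])
        for start in range(0, len(words), max_words)
    ]


def build_commands(text, file_prefix="audio2", max_words=19):
    """Return one shell command string per chunk (hypothetical file naming)."""
    commands = []
    for i, chunk in enumerate(split_for_tts(text, max_words)):
        commands.append(shlex.join([
            "mlx_audio.tts.generate",
            "--file", f"{file_prefix}_{i:03d}",
            "--text", chunk,
            "--model", "mlx-community/Dia-1.6B",
            "--ref_audio", REF_AUDIO,
            "--ref_text", REF_TEXT,
            "--speed", "0.9",
        ]))
    return commands


sentence = ("[S1] The associative trails whose establishment and use within the "
            "files he describes at some length provide a beautiful example of a "
            "new capsbility in symbol structuring that derives from new "
            "artifactprocess capability.")

commands = build_commands(sentence)
for command in commands:
    print(command)
```

Splitting on a fixed word count can cut a phrase in the middle, as the manual split also does; splitting at punctuation may sound more natural, but that has not been tried here.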
