MiSTR: Multi-Modal iEEG-to-Speech Synthesis with Transformer-Based Prosody Prediction and Neural Phase Reconstruction
Mohammed Salah Al-Radhi1, Géza Németh1, Branislav Gerazov2
1Department of Telecommunications and Artificial Intelligence, Budapest University of Technology and Economics, Budapest, Hungary
2Faculty of Electrical Engineering and Information Technologies, University of Ss. Cyril and Methodius, Skopje, Macedonia
🧠 [2025.05.29] We are releasing the full code to support research on MiSTR multi-modal models.
🥳 [2025.05.19, 11:54 AM CET] Our paper has been accepted to Interspeech 2025. Looking forward to seeing you in Rotterdam, The Netherlands at the conference!
- Multi-Modal iEEG Feature Encoding: MiSTR introduces a wavelet-based encoder combined with prosody-aware features (pitch, energy, shimmer, duration, phase variability) to model the neural dynamics of speech production.
- Transformer-Based Prosody Decoder: A novel Transformer architecture captures long-range dependencies in brain activity to predict expressive and fluent Mel spectrograms aligned with speech prosody.
- Neural Phase Vocoder (IHPR): MiSTR proposes Iterative Harmonic Phase Reconstruction (IHPR), ensuring phase continuity and harmonic consistency for high-fidelity audio synthesis without vocoder artifacts.
- State-of-the-Art Performance: Achieves a Pearson correlation of 0.91, STOI of 0.73, and MOSA score of 3.38, outperforming all existing baselines in iEEG-to-speech synthesis.
- Clinically Inspired Design: Designed with speech neuroprosthetics in mind, MiSTR offers a scalable, robust pipeline for restoring natural speech in individuals with severe communication impairments.
- Code and Samples Available: Full implementation, pretrained models, and inference samples are provided in this GitHub repository to support reproducibility and further research.
git clone https://github.com/malradhi/mistr.git
cd mistr
# Recommended: Create a clean environment
conda create -n mistr python=3.10
conda activate mistr
# Install required Python packages
pip install -r requirements.txt
💡 Note: Requires PyTorch ≥ 2.0 and CUDA-compatible GPU for best performance.
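You can quickly verify that PyTorch can see your GPU before running the pipeline. This is an optional sanity check, not part of the repository:

```python
# Optional sanity check: confirm PyTorch is installed and a CUDA device is visible.
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```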
python neural_signal_encoder.py
This will generate the following files for each participant (e.g., sub-XX):
- `*_feat.npy`: Wavelet + prosody features extracted from iEEG
- `*_spec.npy`: Ground-truth Mel spectrogram from the original audio
- `*_prosody.npy`: Extracted prosody features (pitch, energy, shimmer, duration, phase variability)
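If you want to inspect the extracted features before training, a minimal sketch is shown below. The `./features/` location follows the repository layout; the participant ID `sub-01` is a hypothetical example and the array shapes depend on the recording and feature settings:

```python
# Minimal sketch: load and inspect the files produced by neural_signal_encoder.py.
# The participant ID "sub-01" is a hypothetical example; shapes depend on the recording
# and the feature-extraction settings.
import numpy as np

participant = "sub-01"
feat = np.load(f"./features/{participant}_feat.npy")        # wavelet + prosody features
spec = np.load(f"./features/{participant}_spec.npy")        # ground-truth Mel spectrogram
prosody = np.load(f"./features/{participant}_prosody.npy")  # pitch, energy, shimmer, duration, phase variability

print("iEEG features:   ", feat.shape)
print("Mel spectrogram: ", spec.shape)
print("Prosody features:", prosody.shape)
```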
python spectrogram_mapper_transformer.py
This script will:
- Train a neural autoencoder to compress iEEG features into a compact latent space
- Use a Transformer to predict Mel spectrograms from the latent iEEG representations
- Generate audio waveforms from predicted and ground-truth spectrograms
- Save predicted spectrograms as `*_predicted_spec.npy`
- Save synthesized audio files:
  - `*_orig_synthesized.wav` (from the original spectrogram)
  - `*_predicted.wav` (from the predicted spectrogram)
- Save evaluation results in `temporal_attention_results.npy`
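For intuition about the mapping stage, here is a minimal, illustrative sketch of a Transformer encoder that maps latent iEEG feature frames to Mel spectrogram frames. It is not the repository's architecture; the latent dimension, the 80-bin Mel output, the model width, and the depth are all assumptions, and positional encoding is omitted for brevity:

```python
# Illustrative sketch only: a small Transformer that maps latent iEEG frames to Mel frames.
# All dimensions below are assumptions, not the repository's settings.
import torch
import torch.nn as nn

class LatentToMelTransformer(nn.Module):
    def __init__(self, latent_dim=128, n_mels=80, d_model=256, n_heads=4, n_layers=4):
        super().__init__()
        self.input_proj = nn.Linear(latent_dim, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads,
            dim_feedforward=4 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        self.output_proj = nn.Linear(d_model, n_mels)

    def forward(self, latents):            # latents: (batch, time, latent_dim)
        x = self.input_proj(latents)       # (batch, time, d_model)
        x = self.encoder(x)                # self-attention over the time axis
        return self.output_proj(x)         # (batch, time, n_mels)

# Example: predict Mel frames for a batch of latent iEEG sequences.
model = LatentToMelTransformer()
latents = torch.randn(2, 400, 128)         # 2 sequences, 400 frames, 128-dim latents
mel_pred = model(latents)                  # shape: (2, 400, 80)
```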
python harmonic_phase_reconstructor.py
This script applies Iterative Harmonic Phase Reconstruction (IHPR) to refine phase and improve audio quality.
It will:
- Load predicted spectrograms (`*_predicted_spec.npy`)
- Apply harmonic-consistent phase reconstruction
- Save high-fidelity audio waveforms for each participant:
  - `*_predicted.wav` (updated)
  - `*_orig_synthesized.wav` (if regenerated)
- Output `.wav` files in the `/results/` directory
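To convey the idea behind iterative phase refinement, the sketch below runs a plain Griffin-Lim-style loop on a linear magnitude spectrogram recovered from a predicted Mel spectrogram. It is not IHPR itself (IHPR additionally enforces harmonic consistency), and the sampling rate, STFT parameters, file paths, and Mel scaling are assumptions:

```python
# Illustrative sketch: Griffin-Lim-style iterative phase refinement.
# NOT the IHPR algorithm (which adds harmonic-consistency constraints).
# sr, n_fft, hop_length, and the file paths are assumed values.
import numpy as np
import librosa
import soundfile as sf

def iterative_phase_refinement(mag, n_fft=1024, hop_length=256, n_iter=60):
    """Alternate ISTFT/STFT while keeping the target magnitude each round."""
    rng = np.random.default_rng(0)
    stft = mag * np.exp(2j * np.pi * rng.random(mag.shape))
    for _ in range(n_iter):
        audio = librosa.istft(stft, hop_length=hop_length)
        rebuilt = librosa.stft(audio, n_fft=n_fft, hop_length=hop_length)
        stft = mag * np.exp(1j * np.angle(rebuilt))
    return librosa.istft(stft, hop_length=hop_length)

sr = 16000                                             # assumed sampling rate
mel = np.load("./results/sub-01_predicted_spec.npy")   # hypothetical path and ID
# If the spectrogram is stored as (frames, n_mels) or in log scale, transpose /
# exponentiate it first; mel_to_stft expects an (n_mels, frames) power spectrogram.
mag = librosa.feature.inverse.mel_to_stft(mel, sr=sr, n_fft=1024)
audio = iterative_phase_refinement(mag)
sf.write("./results/sub-01_predicted.wav", audio, sr)
```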
python neural_output_visualizer.py
This script generates high-resolution plots and visualizations:
- `results.png`: Participant-wise correlation scores
- `spec_example.png` and `.pdf`: Ground-truth vs. predicted spectrograms
- `wav_example.png` and `.pdf`: Waveform comparison of original vs. reconstructed audio
- `*_prosody_visualization.png`: Plots of extracted prosody features (if available)

All visual outputs are saved in the `/results/` directory.
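As a minimal example of the kind of figure this step produces, the sketch below plots a ground-truth and a predicted Mel spectrogram side by side. The file paths and participant ID follow the patterns above but are assumptions:

```python
# Minimal sketch: ground-truth vs. predicted Mel spectrogram comparison figure.
# Paths and the participant ID are assumptions following the file patterns above.
import numpy as np
import matplotlib.pyplot as plt

participant = "sub-01"  # hypothetical example ID
true_spec = np.load(f"./features/{participant}_spec.npy")
pred_spec = np.load(f"./results/{participant}_predicted_spec.npy")

fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
for ax, spec, title in zip(axes, (true_spec, pred_spec), ("Ground truth", "Predicted")):
    ax.imshow(spec, aspect="auto", origin="lower")  # transpose if stored as (frames, n_mels)
    ax.set_title(title)
    ax.set_xlabel("Frame")
axes[0].set_ylabel("Mel bin")
fig.tight_layout()
fig.savefig("./results/spec_example.png", dpi=300)
```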
./features/                          # Extracted features and prosody files
./results/                           # Output spectrograms, waveforms, and plots
./neural_signal_encoder.py           # Step 1: iEEG feature and prosody extraction
./spectrogram_mapper_transformer.py  # Step 2: Transformer-based spectrogram prediction
./harmonic_phase_reconstructor.py    # Step 3: IHPR phase refinement
./neural_output_visualizer.py        # Step 4: Visualization of results
This work is supported by the European Union’s HORIZON Research and Innovation Programme under grant agreement No. 101120657, project ENFIELD (European Lighthouse to Manifest Trustworthy and Green AI), and by the Ministry of Innovation and Culture and the National Research, Development and Innovation Office of Hungary within the framework of the National Laboratory of Artificial Intelligence.
M.S. Al-Radhi’s research was supported by the project EKÖP-24-4-II-BME-197, through the National Research, Development and Innovation (NKFI) Fund.
We’ve released our code under the MIT License to support open research. If you use it in your work, please consider citing us:
@inproceedings{alradhi2025mistr,
title = {MiSTR: Multi-Modal iEEG-to-Speech Synthesis with Transformer-Based Prosody Prediction and Neural Phase Reconstruction},
author = {Mohammed Salah Al-Radhi and G{\'e}za N{\'e}meth and Branislav Gerazov},
booktitle = {Proceedings of Interspeech 2025},
year = {2025},
address = {Rotterdam, The Netherlands}
}