Restore degraded (LAME-encoded) MP3s with modern machine learning.
ADE (Audio Disambiguation Engine) is a lightweight, experimental tool for recovering perceptual audio quality from lossy-compressed MP3 files. It specifically targets artifacts introduced by the LAME encoder and approaches restoration as a disambiguation problem: given information lost through perceptual coding, the model selects a plausible original consistent with learned musical structure and codec behavior. In internal evaluation, ADE achieves up to an 80% reduction in NMSE relative to the baseline LAME 3.100 decoder on held-out material.
Unlike conventional denoising, ADE explicitly targets the irreversible distortions introduced by perceptual compression, providing a post-hoc restoration path when the original master is unavailable.
- Not generative. ADE is regularized to select a single plausible reconstruction, rather than hallucinating new content.
- Not denoising. MP3 distortion is not additive noise; it is the result of a many-to-one compression process.
- Psychoacoustically regularized. The training objective combines mode-locking with complementary geometric regularizers to encourage musically coherent outputs.
- Evaluated on ~400k unseen music blocks. Peak relative NMSE reduction is approximately 80%, roughly equivalent to an effective bitrate gain of 160 kb/s.
- CBR MP3 only. VBR and ABR files are rejected.
- Demo limit: 5 MB per file.
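The NMSE figures above can be read with a small sketch like the following. It assumes the common definition of NMSE (error energy divided by reference energy); the exact normalization used in ADE's evaluation is not stated here, and the signals are synthetic stand-ins, not ADE outputs.

```python
import numpy as np

def nmse(reference, estimate):
    """Normalized mean squared error: ||ref - est||^2 / ||ref||^2 (assumed definition)."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    return float(np.sum((reference - estimate) ** 2) / np.sum(reference ** 2))

def relative_nmse_reduction(reference, baseline, restored):
    """Fraction by which `restored` lowers NMSE compared with `baseline`."""
    base = nmse(reference, baseline)
    return (base - nmse(reference, restored)) / base

# Synthetic illustration only: a 440 Hz tone plus scaled noise.
rng = np.random.default_rng(0)
t = np.arange(44100) / 44100.0
reference = np.sin(2 * np.pi * 440.0 * t)
noise = rng.standard_normal(reference.shape)
baseline = reference + 0.10 * noise   # stand-in for the plain decoded MP3
restored = reference + 0.05 * noise   # stand-in for a restored signal

print(f"baseline NMSE:      {nmse(reference, baseline):.5f}")
print(f"relative reduction: {relative_nmse_reduction(reference, baseline, restored):.2%}")
```

In this toy case the error amplitude is halved, so the error energy drops to a quarter and the relative reduction is 75%. An 80% reduction means the restored signal's NMSE is one fifth of the baseline's.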
In modern audio workflows, compression is often irreversible in practice. A good codec aims to maximize perceptual quality while minimizing bitrate.
There are two broad directions:
- Pre-hoc methods: improving the encoder itself through better psychoacoustic models and coding schemes, as in AAC and Opus.
- Post-hoc methods: attempting to improve the output of an already-compressed file without access to the original source.
The second problem remains much less explored. In real life, source material is often unavailable: the original master may be lost, archived poorly, or simply inaccessible. That leaves only the compressed file, and the question becomes whether useful structure can still be recovered.
ADE is built for that setting.
MP3 compression is non-injective: multiple different originals can map to the same compressed representation. In geometric terms, each frame corresponds to a family of plausible preimages — a fiber of possible originals.
These fibers are not necessarily smooth or convex. They can contain gaps, discontinuities, and highly structured constraints. That is why naive denoising or regression-to-the-mean tends to fail: averaging over all plausible solutions often produces blurred, weak, or unnatural audio.
ADE treats restoration as a regularized disambiguation problem. It learns musical structure, evaluates plausible candidates under codec-aware priors, and locks onto a single high-probability mode per frame. The goal is not to invent new content, but to recover the most plausible lost detail.
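A toy illustration of why averaging fails and mode selection does not. This is not ADE's model: the "codec" is a made-up sign-dropping quantizer, and the `context` value is a hypothetical stand-in for whatever prior a real system would learn.

```python
import numpy as np

def encode(x):
    """Toy many-to-one 'codec': drop the sign, then quantize to integer steps."""
    return np.round(np.abs(x))

code = 1.0                                  # the compressed value we observe
grid = np.linspace(-2.0, 2.0, 4001)         # candidate originals
fiber = grid[encode(grid) == code]          # all candidates that encode to `code`

# The fiber has two disconnected pieces, one near -1 and one near +1.
# Regression to the mean averages over both and lands in the gap between them.
mean_estimate = fiber.mean()

# Mode selection: commit to the single candidate that best fits a prior.
context = 0.9                               # hypothetical prior, e.g. a neighboring sample
mode_estimate = fiber[np.argmin(np.abs(fiber - context))]

print("mean estimate:", mean_estimate, "-> re-encodes to", encode(mean_estimate))
print("mode estimate:", mode_estimate, "-> re-encodes to", encode(mode_estimate))
```

The averaged estimate sits near zero and no longer re-encodes to the observed code, so it is not a valid original at all. The selected mode stays inside the fiber.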
| Mirror | Notes | Status |
|---|---|---|
| https://audiode.theivanr.duckdns.org/ | Mini Demo | https://stats.uptimerobot.com/iYI12tPzn7 |
- Upload a CBR MP3 with a 44.1 kHz sample rate
- File size limit: 5 MB
- ADE detects the bitrate and selects the closest model automatically
- Download the restored WAV
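The upload rules above (CBR only, 44.1 kHz, size limit, closest model by bitrate) could be checked roughly as follows. This is a sketch, not the code in `web_frontend.py`: `MODEL_BITRATES_KBPS` is a hypothetical set of available models, the 5 MB limit is assumed to mean 5 MiB, and the parser assumes the data starts at the first MPEG-1 Layer III frame (no ID3v2 tag or Xing/Info frame).

```python
MAX_UPLOAD_BYTES = 5 * 1024 * 1024            # assumption: "5 MB" read as 5 MiB
MODEL_BITRATES_KBPS = (128, 192, 256, 320)    # hypothetical model set

# MPEG-1 Layer III header tables. Bitrate index 0 (free format) and 15 (invalid)
# are treated as unsupported, as is the reserved sample-rate index 3.
_BITRATES_KBPS = (None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, None)
_SAMPLE_RATES_HZ = (44100, 48000, 32000, None)

def frame_bitrates(data):
    """Walk consecutive frames; return (list of per-frame kb/s, last sample rate)."""
    rates, sample_rate, pos = [], None, 0
    while pos + 4 <= len(data):
        b1, b2 = data[pos + 1], data[pos + 2]
        # Sync bits, MPEG-1 version bits and Layer III bits give 0xFA or 0xFB.
        if data[pos] != 0xFF or (b1 & 0xFE) != 0xFA:
            break
        kbps = _BITRATES_KBPS[b2 >> 4]
        sr = _SAMPLE_RATES_HZ[(b2 >> 2) & 0x3]
        if kbps is None or sr is None:
            break
        padding = (b2 >> 1) & 0x1
        rates.append(kbps)
        sample_rate = sr
        pos += 144 * kbps * 1000 // sr + padding   # Layer III frame length in bytes
    return rates, sample_rate

def select_model(data, model_bitrates=MODEL_BITRATES_KBPS):
    """Apply the demo's upload rules and return the closest model bitrate in kb/s."""
    if len(data) > MAX_UPLOAD_BYTES:
        raise ValueError("file exceeds the demo size limit")
    rates, sample_rate = frame_bitrates(data)
    if not rates:
        raise ValueError("no MPEG-1 Layer III frames found")
    if sample_rate != 44100:
        raise ValueError("only 44.1 kHz input is supported")
    if len(set(rates)) != 1:
        raise ValueError("VBR/ABR input is rejected; CBR only")
    return min(model_bitrates, key=lambda m: abs(m - rates[0]))

def fake_frame(kbps, padding=0):
    """Build a dummy 44.1 kHz frame (valid header, zeroed payload) for demonstration."""
    b2 = (_BITRATES_KBPS.index(kbps) << 4) | (padding << 1)
    length = 144 * kbps * 1000 // 44100 + padding
    return bytes([0xFF, 0xFB, b2, 0x00]) + bytes(length - 4)

print(select_model(fake_frame(112) * 10))          # CBR input: picks the nearest model
try:
    select_model(fake_frame(128) + fake_frame(192))
except ValueError as err:
    print("rejected:", err)
```

A real implementation would also need to skip tags and decide how to break ties when a bitrate sits exactly between two models; `min` here simply keeps the first candidate.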
NOTE: The hosted demo is intentionally compute-heavy and is not currently optimized for large-scale hosting. It is meant for exploration; for real workloads or larger files, local execution is strongly recommended.
This is the recommended route for larger files and local experimentation.
- Python 3.8+
- Anaconda is recommended, but a standard Python environment also works
git clone <ADE-MP3>
cd ADE-MP3
pip install streamlit onnxruntime soundfile librosa numpy matplotlib scipy
streamlit run web_frontend.py --server.maxUploadSize=100
A detailed white paper is currently under peer review. The core ideas are:
- Geometric framing: MP3 inversion is modeled as selecting a single point from a fiber of plausible originals.
- Mode-locking regularization: The network is trained to prefer one stable solution rather than a diffuse average.
- Psychoacoustic loss: The objective is aligned with perceptual relevance, not just numerical reconstruction error.
- Small footprint: The final model is about 1 MB and runs efficiently via ONNX Runtime on CPU or GPU.
- Contributors are welcome: open an issue on GitHub or write to me.
Audio Disambiguation Engine (ADE) is released under the GNU Affero General Public License v3.0 (AGPLv3).
- You may use ADE freely for personal, research, or commercial purposes.
- If you run ADE as a service (including commercial offerings), you are required to provide full source code and any modifications under the same AGPL license.
- You may redistribute or adapt ADE, but all derivative works must remain open source under AGPL.
- For details, see the included LICENSE file.
