A lightweight Swift-based feature extraction library for transforming raw audio chunks into log-Mel spectrograms, suitable for use in CoreML and on-device inference.
Built with ❤️ for on-device audio intelligence.
You can add OtosakuFeatureExtractor as a Swift Package dependency:
.package(url: "https://github.com/Otosaku/OtosakuFeatureExtractor-iOS.git", from: "1.0.2")
Then add it to the target dependencies:
.target(
    name: "YourApp",
    dependencies: [
        .product(name: "OtosakuFeatureExtractor", package: "OtosakuFeatureExtractor")
    ]
)
[Raw Audio Chunk (Float64)]
↓ pre-emphasis
[Pre-emphasized audio]
↓ STFT (with Hann window)
[STFT result (complex)]
↓ Power Spectrum
[|FFT|^2]
↓ Mel Filterbank Projection (matrix multiply)
[Mel energies]
↓ log(ε + x)
[Log-Mel Spectrogram]
↓ MLMultiArray
[CoreML-compatible tensor]
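Each arrow above corresponds to a simple numeric operation. As a rough illustration (not the library's internal code, which uses Accelerate and pocketfft), the pre-emphasis and mel-projection steps can be sketched in plain Swift; the 0.97 coefficient and ε = 1e-10 are assumed typical values:

```swift
import Foundation

// Illustrative sketch only. Constants mirror the asset shapes:
// a 400-sample Hann window, 201 FFT bins, 80 mel bands.

/// Pre-emphasis: y[n] = x[n] - a * x[n-1]. The coefficient 0.97 is an assumed typical value.
func preEmphasis(_ x: [Double], coefficient: Double = 0.97) -> [Double] {
    guard let first = x.first else { return [] }
    return [first] + (1..<x.count).map { x[$0] - coefficient * x[$0 - 1] }
}

/// Projects one 201-bin power-spectrum frame onto 80 mel bands and applies log(ε + x).
func logMel(powerSpectrum: [Double], filterbank: [[Double]], epsilon: Double = 1e-10) -> [Double] {
    filterbank.map { melRow in
        let energy = zip(melRow, powerSpectrum).reduce(0.0) { $0 + $1.0 * $1.1 }
        return log(epsilon + energy)
    }
}
```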
You must provide a directory containing:
- `filterbank.npy` — shape `[80, 201]`, float32 or float64
- `hann_window.npy` — shape `[400]`, float32 or float64
import OtosakuFeatureExtractor
let extractor = try OtosakuFeatureExtractor(directoryURL: featureFolderURL)
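`featureFolderURL` just needs to point at a folder containing the two `.npy` files. For example, if the downloaded assets are shipped inside the app bundle (the folder name below is an assumption, not part of the library):

```swift
import Foundation
import OtosakuFeatureExtractor

// Resolve the folder that holds filterbank.npy and hann_window.npy.
// "FeatureExtractorAssets" is a placeholder name for a folder added to the app bundle.
guard let featureFolderURL = Bundle.main.url(forResource: "FeatureExtractorAssets",
                                             withExtension: nil) else {
    fatalError("Feature extractor assets not found in the app bundle")
}

let extractor = try OtosakuFeatureExtractor(directoryURL: featureFolderURL)
```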
- 🎛 Feature Extractor Assets
  Download the precomputed `filterbank.npy` and `hann_window.npy` files required by `OtosakuFeatureExtractor`.
  ➡️ OtosakuFeatureExtractor Assets (.zip)
💬 Want a model trained on custom keywords?
Drop me a message at otosaku.dsp@gmail.com — let’s talk!
The input must be a raw audio chunk as `Array<Double>`, typically sampled at 16 kHz.
let logMel: MLMultiArray = try extractor.processChunk(chunk: audioChunk)
`audioChunk` should be at least 400 samples long to match the FFT window size.
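A fuller, illustrative example, assuming audio arrives as a Float32 `AVAudioPCMBuffer` at 16 kHz and must be widened to `Array<Double>` first:

```swift
import AVFoundation
import CoreML
import OtosakuFeatureExtractor

// Sketch: convert one buffer of Float32 samples to Double and run it through the extractor.
func extractLogMel(from buffer: AVAudioPCMBuffer,
                   using extractor: OtosakuFeatureExtractor) throws -> MLMultiArray? {
    guard let channelData = buffer.floatChannelData?[0] else { return nil }
    let frameCount = Int(buffer.frameLength)
    guard frameCount >= 400 else { return nil } // must cover the 400-sample FFT window

    // The extractor expects Array<Double>, so widen the Float32 samples.
    let audioChunk = (0..<frameCount).map { Double(channelData[$0]) }
    return try extractor.processChunk(chunk: audioChunk)
}
```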
saveLogMelToJSON(logMel: features)
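`saveLogMelToJSON` is a debugging helper for inspecting the output. A minimal version of such a helper might look like the sketch below (this implementation is an assumption, shown only to make the call above reproducible): it dumps the values to a JSON file so they can be compared against an offline reference.

```swift
import CoreML
import Foundation

// Hypothetical debugging helper matching the call above: writes the log-Mel values
// to a JSON file in the temporary directory for offline comparison.
func saveLogMelToJSON(logMel: MLMultiArray) {
    let values = (0..<logMel.count).map { logMel[$0].doubleValue }
    let url = FileManager.default.temporaryDirectory.appendingPathComponent("log_mel.json")
    do {
        let data = try JSONEncoder().encode(values)
        try data.write(to: url)
        print("Saved log-Mel spectrogram to \(url.path)")
    } catch {
        print("Failed to save log-Mel spectrogram: \(error)")
    }
}
```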
- Accelerate — for optimized DSP
- CoreML
- pocketfft
- plain-pocketfft
OtosakuFeatureExtractor/
├── Sources/
│   └── OtosakuFeatureExtractor/
│       └── OtosakuFeatureExtractor.swift
├── filterbank.npy
└── hann_window.npy
Project by @otosaku-ai under the Otosaku brand.
MIT License