wwerkk/audio-segment

audio-segment

MFCC-based audio grain sorting

Watch demo on YouTube: https://youtu.be/yyklh6AH8J8
How this script works:


  1. Global parameters:
  • number of MFCCs used,
  • size of the FFT used in calculating MFCCs,
  • length of the frame,
  • 0th MFCC toggle,
  • onset detection toggle,
  • windowing toggle,
  • smoothing toggle,
  • smoothing window width,
  • sigma parameter of the Gaussian filter,
  • segmentation hop length.
  2. Loading a file using a GUI. [ipyfilechooser] / [google.colab]
  3. Segmentation:
  4. Windowing of the signal in each frame using a Hamming function. [numpy]
  5. Calculation of MFCCs for each frame, without centering. [librosa]
  6. Example frame sorting by MFCC values, as follows:
  • using a k-d tree structure: [scipy.spatial]
    • by querying for all the neighbours of frame[0], sorted by distance in ascending order,
    • by finding the shortest path that traverses the entire tree starting from frame[0],
  • using correlation clustering represented as a hierarchical dendrogram, sorted by distance in ascending order. [scipy.cluster]
  7. Output construction via concatenation of frames in the arrangements calculated in step 6, optionally smoothing out signal discontinuities between them using a one-dimensional Gaussian filter. [scipy.ndimage]
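
The framing, windowing and frame-sorting steps above can be sketched roughly as follows. This is a minimal illustration and not the script's actual code: the function names are hypothetical, segmentation uses a fixed hop only (no onset detection), and the feature matrix is random placeholder data standing in for the per-frame MFCCs that the script computes with librosa. Two interpretations are also assumptions: the "shortest path" is approximated by a greedy walk to the nearest unvisited frame, and "correlation clustering" is read as average-linkage hierarchical clustering with a correlation distance.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial import KDTree


def frame_signal(signal, frame_length, hop_length, window=True):
    """Cut a 1-D signal into frames and optionally apply a Hamming window."""
    n_frames = 1 + (len(signal) - frame_length) // hop_length
    starts = np.arange(n_frames) * hop_length
    frames = np.stack([signal[s:s + frame_length] for s in starts])
    if window:
        frames = frames * np.hamming(frame_length)
    return frames


def order_by_neighbours(features):
    """Indices of all frames, sorted by ascending distance from frame 0."""
    tree = KDTree(features)
    _, order = tree.query(features[0], k=len(features))
    return np.atleast_1d(order)


def order_by_greedy_path(features):
    """Greedy walk from frame 0, always moving to the nearest unvisited frame.

    Assumption: this only approximates a shortest path through all frames.
    """
    tree = KDTree(features)
    n = len(features)
    visited = np.zeros(n, dtype=bool)
    visited[0] = True
    path = [0]
    while len(path) < n:
        _, candidates = tree.query(features[path[-1]], k=n)
        nxt = next(int(i) for i in np.atleast_1d(candidates) if not visited[i])
        visited[nxt] = True
        path.append(nxt)
    return np.array(path)


def order_by_clustering(features):
    """Leaf order of a dendrogram, sorted by ascending merge distance."""
    links = linkage(features, method="average", metric="correlation")
    tree = dendrogram(links, no_plot=True, distance_sort="ascending")
    return np.array(tree["leaves"])


# One second of a 220 Hz test tone at an assumed 22050 Hz sample rate.
sr = 22050
signal = np.sin(2 * np.pi * 220 * np.arange(sr) / sr)
frames = frame_signal(signal, frame_length=2048, hop_length=1024)

# Placeholder features: one random 13-dimensional vector per frame,
# standing in for real MFCCs.
rng = np.random.default_rng(0)
features = rng.normal(size=(len(frames), 13))

by_neighbours = order_by_neighbours(features)
by_path = order_by_greedy_path(features)
by_cluster = order_by_clustering(features)
```

Each ordering is a permutation of the frame indices, so any of them can be used to rearrange `frames` before concatenation.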

All results displayed are plotted using matplotlib.
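
For the output-construction step, one possible reading of "optionally smoothing out signal discontinuities" is sketched below. It is an assumption rather than the script's confirmed behaviour: the frames are joined end to end, and a Gaussian-filtered copy of the signal replaces only a narrow region around each join. The function name and the `width` and `sigma` defaults are hypothetical, loosely mirroring the smoothing window width and sigma parameters listed above.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d


def build_output(frames, order, smooth=True, width=64, sigma=4.0):
    """Concatenate frames in the given order, optionally smoothing the joins."""
    frames = np.asarray(frames, dtype=float)
    out = np.concatenate([frames[i] for i in order])
    if not smooth:
        return out

    frame_length = frames.shape[1]
    half = width // 2
    blurred = gaussian_filter1d(out, sigma=sigma)

    # Replace only the samples within `half` of each internal frame boundary,
    # leaving the rest of every frame untouched.
    for boundary in range(frame_length, len(out), frame_length):
        lo = max(boundary - half, 0)
        hi = min(boundary + half, len(out))
        out[lo:hi] = blurred[lo:hi]
    return out


# Tiny demonstration with three 4-sample "frames" and an arbitrary order.
demo_frames = np.arange(12, dtype=float).reshape(3, 4)
demo_order = [2, 0, 1]
raw = build_output(demo_frames, demo_order, smooth=False)
smoothed = build_output(demo_frames, demo_order, smooth=True, width=2, sigma=1.0)
```

Restricting the filter to the joins keeps the interior of each grain unfiltered; applying the filter to the whole output would instead act as a low-pass over the entire signal.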

This project draws on online resources that elaborate on audio feature extraction and math-based workflows in Python, such as musicinformationretrieval.com, as well as the documentation of the matplotlib, numpy, librosa and scipy libraries.