compare audio files
This program compares two audio files to determine if their waveforms closely match, allowing for a relative time offset. The difference metric is the sum of absolute differences between samples.

Run from the source directory or install it (with optimizations if Cython is available) using pip:

[CFLAGS=-march=native] pip install [--user] .


The naïve algorithm is simply to compute the sum of absolute differences at each offset and choose the offset that minimizes it. This is slow: it does work proportional to the track length at every candidate offset. Several optimizations are applied progressively to obtain the same result in a reasonable amount of time.
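For reference, the naïve search can be sketched as follows (an illustrative function, not the program's actual implementation; it only shifts one track against the other, while the real comparison considers offsets in both directions):

```python
def naive_best_offset(a, b, max_offset):
    """Return (offset, metric) minimizing the sum of absolute
    differences between a shifted by `offset` samples and b."""
    best_offset, best_metric = 0, float("inf")
    for offset in range(max_offset + 1):
        n = min(len(a) - offset, len(b))
        # Full sum of absolute differences at this offset.
        metric = sum(abs(a[offset + i] - b[i]) for i in range(n))
        if metric < best_metric:
            best_offset, best_metric = offset, metric
    return best_offset, best_metric
```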

Only matches with a metric below a certain threshold (configured with the -t command-line option) are reported, so there is no need to keep summing absolute differences for a particular offset after this threshold has been exceeded. This is the "limit" in _limited_ds(). Furthermore, as soon as a match is found the threshold can be decreased relative to the metric of the match. This only speeds up searches for tracks that do match, but is especially useful for tracks that match at offset 0, which is checked first.
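A minimal sketch of the early-termination idea (the name and signature are illustrative; the real `_limited_ds()` also handles block ordering, described below):

```python
def limited_ds(a, b, limit):
    """Sum of absolute differences between a and b, aborting as soon
    as the running total exceeds `limit` (returns None in that case)."""
    total = 0
    for x, y in zip(a, b):
        total += abs(x - y)
        if total > limit:
            return None  # this offset can't beat the threshold; stop early
    return total
```

When a match is found, the caller can then lower `limit` for all subsequent offsets, so later non-matches are abandoned even sooner.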

Since silence can be expected at the beginnings of most tracks, summing from the start would accumulate differences slowly and delay crossing the threshold, so absolute differences are calculated in pseudo-randomly ordered blocks. This is the complex part of _limited_ds(). Block size is chosen empirically: smaller blocks allow earlier detection that the threshold has been exceeded, while larger blocks increase throughput.
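The block-ordered variant can be sketched like this (illustrative; block size and the shuffling scheme are assumptions, not the program's actual choices):

```python
import random

def limited_ds_blocks(a, b, limit, block=1024, seed=0):
    """Limited difference sum computed over fixed-size blocks visited
    in a pseudo-random order, so leading silence isn't summed first."""
    n = min(len(a), len(b))
    starts = list(range(0, n, block))
    random.Random(seed).shuffle(starts)  # deterministic pseudo-random order
    total = 0
    for s in starts:
        for i in range(s, min(s + block, n)):
            total += abs(a[i] - b[i])
        if total > limit:  # checked once per block, not per sample
            return None
    return total
```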

To reduce the amount of data being processed at each offset, a heuristic is introduced to find match candidates more quickly in downsampled data. Using |x|+|y|>=|x+y|:

    |a1 - b1| + |a2 - b2| + |a3 - b3| + |a4 - b4|
        >= |(a1 + a2) - (b1 + b2)| + |(a3 + a4) - (b3 + b4)|

The left-hand side is the sum of absolute differences over tracks a and b with four samples, which is the match metric. The right-hand side is a lower bound on it calculated using sums over groups of two samples each. If the lower bound exceeds the threshold, the match candidate is rejected, otherwise it is tested using the actual metric.
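The candidate test can be sketched as follows (illustrative helpers, assuming groups of two samples; the real code chooses the group size empirically):

```python
def group_sums(samples, group=2):
    """Sums over consecutive non-overlapping groups of samples."""
    return [sum(samples[i:i + group]) for i in range(0, len(samples), group)]

def lower_bound(ga, gb):
    """Lower bound on the full difference metric, computed from the
    group sums ga and gb; by |x|+|y| >= |x+y| it never exceeds the
    true metric, so exceeding the threshold safely rejects a candidate."""
    return sum(abs(x - y) for x, y in zip(ga, gb))
```

A candidate offset is tested with `lower_bound()` on a quarter (here: half) of the data first; only if that stays below the threshold is the full metric computed.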

The sample groups must line up, so while the group sums for one of the tracks are calculated once, the group sums for the other track are calculated for each relative shift. This is the complex part of cmp_track(). Group size is chosen empirically. Smaller groups give a better lower bound while larger groups mean less data to process.
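The alignment issue can be illustrated like this: for one track the group sums are fixed, while for the other a separate set of group sums is needed for each alignment phase, i.e. each value of offset modulo the group size (an illustrative sketch, not the actual layout used in cmp_track()):

```python
def shifted_group_sums(samples, group=2):
    """Group sums for every alignment phase: entry p holds the sums of
    groups starting at sample p, used for offsets with offset % group == p."""
    return [
        [sum(samples[i:i + group])
         for i in range(p, len(samples) - group + 1, group)]
        for p in range(group)
    ]
```

At a candidate offset k, the lower bound is then computed against the precomputed phase `k % group`.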

After the group alignment is handled and the group sums are prepared, the actual search involves a straightforward loop with integer variables and arrays. This is encapsulated in _cmp_candidates(). The same is true of _limited_ds(). Both these routines can be significantly optimized by using Cython to convert them to C. Optimized versions of these routines are provided in the _compiled module and imported conditionally if Cython is available.

The inner loop of the Cython version of _limited_ds() is a very simple calculation over fixed-size chunks of data that can be vectorized using SIMD instructions. This isn't done automatically by compilers, so a pure-C (not Cythonized) vectorized version using SSE2 and SSSE3 intrinsics is provided in _sse.h. This is used automatically if the intrinsics are available, e.g. when using the -march=native compiler flag on modern CPUs.

Using Cython provides an additional benefit. The entire loop that calculates lower bounds runs with the GIL released (it is only reacquired to test the few candidate matches using the actual metric), so the search can be parallelized with ordinary Python threads. When a match is found, the decreased threshold isn't propagated immediately to the other threads, so the total search work increases, but wall-clock time decreases substantially. This is a particularly good parallelization point because all CPU cores can share the track data in cache.
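The thread-level split can be sketched as follows (illustrative; `check` stands in for the GIL-releasing Cython routine, which is what makes the threads actually run in parallel — a pure-Python `check` would still serialize on the GIL):

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_search(offsets, check, workers=4):
    """Split candidate offsets across threads; `check(chunk)` is assumed
    to release the GIL internally and return (offset, metric) matches."""
    # Interleave offsets so each thread gets a similar mix of work.
    chunks = [offsets[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(workers) as pool:
        results = pool.map(check, chunks)
    return [match for r in results for match in r]
```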