Beat Tracking

Yohan Chalier edited this page Jul 5, 2023 · 1 revision

Audio Source Selection

By default, BeatViewer uses the default audio input. You can specify an audio device using the -a <device-id> parameter. You can get a list of audio devices by using the -l flag:

$ python -m beatviewer -l
0     2 in, 0 out    0.09 ms - 0.18 ms    44.1 kHz    MME                    Mappeur de sons Microsoft - Input
1<    2 in, 0 out    0.09 ms - 0.18 ms    44.1 kHz    MME                    Mixage stéréo (Realtek(R) Audio
2     2 in, 0 out    0.09 ms - 0.18 ms    44.1 kHz    MME                    Ligne (USB AUDIO  CODEC)
10    2 in, 0 out    0.12 ms - 0.24 ms    44.1 kHz    Windows DirectSound    Pilote de capture audio principal
11    2 in, 0 out    0.12 ms - 0.24 ms    44.1 kHz    Windows DirectSound    Mixage stéréo (Realtek(R) Audio)
12    2 in, 0 out    0.12 ms - 0.24 ms    44.1 kHz    Windows DirectSound    Ligne (USB AUDIO  CODEC)
20    2 in, 8 out    0.01 ms - 0.05 ms    44.1 kHz    ASIO                   ASIO4ALL v2
27    2 in, 0 out    0.00 ms - 0.01 ms    44.1 kHz    Windows WASAPI         Ligne (USB AUDIO  CODEC)
28    2 in, 0 out    0.00 ms - 0.01 ms    44.1 kHz    Windows WASAPI         Mixage stéréo (Realtek(R) Audio)

In the above example, device 1 (marked with <) is the default audio input. To use the line input instead (device 2 in this example), use the following command:

python -m beatviewer -a 2
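If you want to pick a device from a script, the listing can be parsed line by line. The following is a minimal sketch that assumes the column layout shown in the sample output above; the regular expression and the `parse_device_line` helper are illustrative and not part of BeatViewer.

```python
import re

# Pattern inferred from the sample listing above (an assumption, not BeatViewer's code).
DEVICE_LINE = re.compile(
    r"^(?P<id>\d+)(?P<default><?)\s+"
    r"(?P<inputs>\d+) in, (?P<outputs>\d+) out\s+"
    r"(?P<lat_low>[\d.]+) ms - (?P<lat_high>[\d.]+) ms\s+"
    r"(?P<rate>[\d.]+) kHz\s+"
    r"(?P<rest>.+)$"
)


def parse_device_line(line):
    """Parse one line of the `-l` listing into a dict, or return None."""
    match = DEVICE_LINE.match(line.strip())
    if match is None:
        return None
    # The API and the device name are separated by at least two spaces;
    # maxsplit=1 preserves double spaces inside the device name itself.
    parts = re.split(r"\s{2,}", match["rest"], maxsplit=1)
    if len(parts) != 2:
        return None
    api, name = parts
    return {
        "id": int(match["id"]),
        "default": match["default"] == "<",
        "inputs": int(match["inputs"]),
        "outputs": int(match["outputs"]),
        "latency_ms": (float(match["lat_low"]), float(match["lat_high"])),
        "rate_khz": float(match["rate"]),
        "api": api,
        "name": name,
    }


# Two lines copied from the sample listing above.
sample = [
    "1<    2 in, 0 out    0.09 ms - 0.18 ms    44.1 kHz    MME                    Mixage stéréo (Realtek(R) Audio",
    "27    2 in, 0 out    0.00 ms - 0.01 ms    44.1 kHz    Windows WASAPI         Ligne (USB AUDIO  CODEC)",
]
devices = [parse_device_line(line) for line in sample]
default_device = next(d for d in devices if d["default"])
print(default_device["id"], default_device["api"], default_device["name"])
```

The resulting `id` is the value to pass to the -a parameter.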

Tips

Windows Audio Configuration

Newer versions of Windows make the classic sound dialog (which lets you, for instance, activate the Stereo Mix input) harder to reach. You can open it directly by creating a shortcut to mmsys.cpl, or by running mmsys.cpl from a terminal.

Windows Core Audio APIs

Windows offers several audio APIs. A single audio source may appear several times in the audio source selection, once for each API. Choosing between them is a trade-off between latency and compatibility:

API | Behavior
MME (Multimedia Extensions, also known as WinMM) | Oldest API, highest latency, best compatibility
DirectSound | DirectX-related interface
WASAPI (Windows Audio Session API) | Most recent API, lowest latency

Offline Analysis

You can also execute the module offline, by passing the path to an audio file with the -f argument:

python -m beatviewer -f track.wav

For now, only WAVE files are supported. The algorithm is the same as for online tracking: an audio stream is simply emulated from the file. By default, tracking is not realtime and the tracker runs as fast as it can. Use the -t flag to make it realtime, which allows offline visualizations. Offline track analysis can then be performed on the generated output.
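To illustrate what emulating a stream from a file can look like, here is a minimal sketch using Python's standard `wave` module. It is not BeatViewer's implementation: the `iter_wav_blocks` function and its `realtime` argument are hypothetical names, with `realtime=True` mimicking the pacing that the -t flag describes.

```python
import time
import wave


def iter_wav_blocks(path, hop_size=128, realtime=False):
    """Yield (block_index, raw_bytes) blocks of `hop_size` frames from a WAVE file.

    With realtime=False the file is read as fast as possible. With
    realtime=True the generator sleeps for the duration of each block,
    so blocks arrive at roughly the pace of a live audio input.
    """
    with wave.open(path, "rb") as wav:
        block_duration = hop_size / wav.getframerate()
        index = 0
        while True:
            block = wav.readframes(hop_size)
            if not block:
                break  # end of file
            yield index, block
            index += 1
            if realtime:
                time.sleep(block_duration)
```

A consumer would loop over `iter_wav_blocks("track.wav")` exactly as it would over blocks coming from an audio device, which is why the same tracking code can serve both cases.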

Recording

You may record the audio from the selected source by passing a path to an output WAV file to the -r argument:

python -m beatviewer -r ~/Desktop/recording.wav

Output

You may specify an output file with the -o [PATH] argument. It will generate a TSV file listing detected events, with the following columns:

  • Event type: either BEAT (detected beat), ONSET (detected onset) or BPM (change of BPM estimation)
  • Event frame: OSS frame index at which the event occurred
  • Event time: time (in seconds) at which the event occurred; it is the event frame index divided by the OSS sampling rate (which is the audio sampling rate divided by the audio hop size, see the parameter table below)
  • Event value: for BPM events, the associated new BPM value
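The generated file can be post-processed with a few lines of Python. The sketch below assumes the four columns appear in the order listed above, separated by tabs; whether the file has a header row is not specified here, so rows whose frame or time cannot be parsed as numbers are skipped. The sample rows are made up for illustration and are not real tracker output.

```python
import csv
import io


def read_events(stream):
    """Read (type, frame, time_s, value) tuples from a TSV stream."""
    events = []
    for row in csv.reader(stream, delimiter="\t"):
        if len(row) < 3:
            continue
        try:
            frame, time_s = int(row[1]), float(row[2])
        except ValueError:
            continue  # e.g. a header row, if the file has one
        value = float(row[3]) if len(row) > 3 and row[3] else None
        events.append((row[0], frame, time_s, value))
    return events


def frame_to_seconds(frame, sample_rate=44100, hop_size=128):
    """Event time = frame index / OSS sampling rate, with OSS rate = Fs / H."""
    return frame / (sample_rate / hop_size)


# Hypothetical sample data; with a real file, use open(path, newline="").
sample = io.StringIO(
    "BPM\t100\t0.290\t120.0\n"
    "BEAT\t172\t0.499\n"
    "BEAT\t344\t0.998\n"
)
events = read_events(sample)
beat_times = [time_s for kind, _, time_s, _ in events if kind == "BEAT"]
intervals = [later - earlier for earlier, later in zip(beat_times, beat_times[1:])]
for interval in intervals:
    print(f"inter-beat interval {interval:.3f} s, {60 / interval:.1f} BPM")
```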

Graph Analysis

You may visualize processed signals by using the -g flag:

python -m beatviewer -g

This shows the Onset Strength Signal (OSS) with its mean and the detection threshold, the tempo, the Cumulative Beat Strength Signal (CBSS) and the detected period (Δt), the Beat Prediction Signal (BPS) and the beat trigger index (εt). The left part of the graph is the past; the right part is the (predicted) future. Note that the past and future plots are each scaled to the window height independently.

You may change the graph framerate with the -gf [int] argument (default is 30 fps).

Tuning Parameters

The beat tracking algorithm depends on many parameters. To use a specific configuration, use the -c <path-to-config-file> parameter. Use the config.txt file as a starting point.

Parameter | Default | Description
audio_window_size | 1024 | Audio window size for computing the FFT.
audio_hop_size | 128 | Number of new samples in the window at each iteration. This sets the sampling rate of the onset strength signal (OSS). Given the audio sampling rate Fs and the hop size H, the OSS sampling rate is FsO = Fs / H. For Fs = 44100 Hz and H = 128, FsO = 344.53 Hz.
compression_gamma | 1 | The spectral flux is compressed to reduce the dynamic range of the signal and to match human hearing, which is logarithmically sensitive to amplitude. Set to 0 to disable compression. Greater values (e.g. 1000) attenuate strong values and give weaker values more impact.
noise_cancellation_level | -74 | After compression, frequency bins with levels below this threshold are set to zero. The value is specified in dB.
hamming_window_size | 15 | Width of the window function applied to the spectral flux to smooth it. This acts as a low-pass filter. The greater the width, the lower the cutoff frequency. At 15, it is about 7 Hz.
oss_buffer_size | 1024 | Number of OSS samples used to compute the OSS mean and the OSS variance.
onset_threshold | 0.1 | If the OSS rises more than this number of standard deviations above the mean, an onset is detected.
onset_threshold_min | 5.0 | If the variance is too small, this absolute threshold is used instead.
oss_window_size | 2048 | Number of OSS samples used to estimate the tempo.
oss_hop_size | 128 | Number of new samples in the window at each iteration. If FsO is the OSS sampling rate and H is the hop size, a new tempo is estimated at rate FsO / H. With FsO = 344.53 Hz, this yields 2.7 Hz.
frequency_domain_compression | 0.5 | The OSS is autocorrelated to find tempo lag candidates. This is computed by performing an FFT and an IFFT on the OSS. A power compression is applied in the frequency domain. Smaller values increase the lag resolution at the cost of robustness to noise.
min_bpm_detection | 50 | Minimum BPM detected.
max_bpm_detection | 210 | Maximum BPM detected.
tempo_candidates | 10 | Number of tempo candidates considered when estimating the tempo.
tempo_accumulator_decay | 0.9 | Detected tempi are added to an accumulated sum. This sum decays over time to allow tempo variations to be detected. The greater the value (0.99, 0.999), the more stable the estimator is, but the longer it takes for new tempi to be detected.
tempo_accumulator_gaussian_width | 10 | The tempo accumulated sum is made of Gaussian curves centered on each detected tempo. This Gaussian width lets the estimator absorb slight variations.
min_bpm_rescaled | 90 | If the resulting BPM is lower than this, it is doubled.
max_bpm_rescaled | 180 | If the resulting BPM is greater than this, it is halved.
cbss_buffer_size | 512 | Number of CBSS samples used to determine the previous beat location.
cbss_eta | 300 | The log-Gaussian width around previous beat locations.
cbss_alpha | 0.9 | Trade-off between the OSS and a pure periodic signal. It takes values between 0 and 1. At 0, only the OSS is considered. At 1, only the periodic signal is considered.
bps_epsilon_o | 0 | Offline latency correction factor, in number of OSS samples. See Section 6.1 of Musical Robot Swarms and Equilibria (Krzyżaniak, 2020) for details.
bps_epsilon_r | 0 | Realtime latency correction factor, in number of OSS samples. See Section 6.2 of Musical Robot Swarms and Equilibria (Krzyżaniak, 2020) for details.
bps_epsilon_t | 20 | Beat trigger index. Greater values mean detecting beats earlier.
bps_gaussian_width | 10 | Width of the Gaussian representing the next beat locations.
bps_buffer_size | 512 | Number of samples for which beat locations are predicted, in the future. As this is a cumulative process, a bigger buffer results in more stable behavior.
bps_cooldown_ratio | 0.4 | Ratio of samples ignored right after a beat is detected, relative to the tempo lag (i.e. the number of samples between two beats).
key_trigger_beats_earlier | page up | Increase the value of bps_epsilon_t.
key_trigger_beats_later | page down | Decrease the value of bps_epsilon_t.
key_set_mode_regular | f9 | Change tracking mode to default.
key_set_mode_tempo_locked | f10 | Change tracking mode to tempo locked, where the current BPM is locked and the CBSS only depends on the resulting pulse train.
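Several of these parameters are expressed in OSS samples, so it helps to convert them to seconds or BPM when tuning. The sketch below works through that arithmetic with the default values from the table. The function names are illustrative, and `rescale_bpm` is only one reading of the min_bpm_rescaled and max_bpm_rescaled descriptions (a single doubling or halving), not a copy of BeatViewer's code.

```python
def oss_rate(sample_rate=44100, audio_hop_size=128):
    """OSS sampling rate FsO = Fs / H, in Hz."""
    return sample_rate / audio_hop_size


def bpm_to_lag(bpm, rate):
    """Number of OSS samples between two beats at the given tempo."""
    return 60.0 * rate / bpm


def rescale_bpm(bpm, low=90, high=180):
    """Double a tempo below `low`, halve a tempo above `high` (assumed behavior)."""
    if bpm < low:
        return bpm * 2
    if bpm > high:
        return bpm / 2
    return bpm


rate = oss_rate()                    # OSS sampling rate, about 344.53 Hz
tempo_update_rate = rate / 128       # oss_hop_size = 128, about 2.7 Hz
tempo_window_seconds = 2048 / rate   # oss_window_size = 2048 OSS samples
lag_slowest = bpm_to_lag(50, rate)   # lag at min_bpm_detection
lag_fastest = bpm_to_lag(210, rate)  # lag at max_bpm_detection
cooldown = 0.4 * bpm_to_lag(120, rate)  # bps_cooldown_ratio at 120 BPM

print(f"OSS rate: {rate:.2f} Hz, tempo updates: {tempo_update_rate:.2f} Hz")
print(f"Tempo window: {tempo_window_seconds:.2f} s")
print(f"Lag range: {lag_fastest:.1f} to {lag_slowest:.1f} OSS samples")
print(f"Cooldown at 120 BPM: {cooldown:.1f} OSS samples")
print(f"70 BPM rescaled: {rescale_bpm(70)}, 200 BPM rescaled: {rescale_bpm(200)}")
```

With the defaults, the tempo window spans roughly six seconds of audio, which gives a sense of how quickly the estimator can react to a tempo change.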

If the -k flag is set, then PageUp and PageDown keys can be used to increase or decrease bps_epsilon_t while the tracker is running, for manually synchronizing the tracker live.

Tracking Mode

If the -k flag is set, then F9 and F10 keys can be used to switch between two tracking modes:

Key | Mode | Behavior
F9 | Regular | Regular tracking mode
F10 | Tempo locked | The current tempo value is kept and further estimations are discarded until the mode is switched back to regular