Skip to content

egaudrain/vocoder

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

40 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

VOCODER

The code in this section is used to implement various vocoders, generally used to simulate acoustically what cochlear implant users experience through their device. The code base was originally produced by Stuart Rosen, later modified by Bob Carlyon, and since more or less completely rewritten in a more modern style and with many additions.

Disclaimer: This manual is written assuming the reader has some notions of how a vocoder works. If this is not the case, get in touch with someone with experience with vocoders as it is very easy to choose the wrong parameters. Know what you are doing, and do not assume that this code is foolproof. It comes with absolutely no warranty, and is only made available for information purposes. Finally, always check for the last version at https://github.com/egaudrain/vocoder.

The vocode() function is supposed to cover a lot of cases, and therefore receives a rather complex set of parameters. The said function is called like this: [y, fs]=vocode(x,fs,p). x is a (single channel, i.e. mono) signal. fs is the sampling frequency and p is a structure with all the parameters. The function returns y, the vocoded version of x.

This document describes how to setup the parameter structure p.

p has four main sections:

  • analysis_filters: describes the filter bank used for analysis purposes.
  • synthesis_filters: describes the filter bank used for synthesis.
  • envelope: describes how the envelope is extracted.
  • synth: describes the type of carrier that'll be used, and how to combine it with the envelope.

Each of these are documented below, in the same order.

To give you an overview on how to use the code, here's a complete working example:

[x, fs] = audioread('filename.wav'); % Replace filename.wav with the actual path to the wave file

p = struct();
p.analysis_filters  = filter_bands([150, 7000], 8, fs, 'greenwood', 1.5); % Order is multiplied by 4, so 1.5 gives 6
p.synthesis_filters = p.analysis_filters;

p.envelope = struct();
p.envelope.method = 'low-pass';
p.envelope.rectify = 'half-wave';
p.envelope.order = 2; % This produces a filter of order 4. Order 3 is not possible here.
p.envelope.fc = 160;

p.synth = struct();
p.synth.carrier = 'sin';
p.synth.filter_before = false;
p.synth.filter_after  = true;

[y, fs] = vocode(x, fs, p);

Filter banks

The filter banks are described in p.analysis_filters and p.synthesis_filters. If only the former is provided, the analysis filters are also used for synthesis. If both analysis and synthesis filter banks are provided, they need to have the same number of channels.

Analysis filters are used to partition the input signal into frequency bands. These are typically rather sharp. Synthesis filters are used to shape broadband carriers, or eventually to reshape narrow band carriers once they have been modulated by the envelope.

Use the function filter_bands() or the function estfilt_shift() (see below) to generate filter bank structures like this one:

fb.filterA 
  .filterB
  .center
  .lower
  .upper
  .fs
  .type
  .order
  .filter_type

filterA and filterB are cell arrays containing A and B coefficients of IIR filters for each frequency-band. These are later used in the filtfilt() Matlab function. The i-th channel, filterB{i} can also contain a cell array itself. The filters there are then applied sequentially.

center, lower and upper are vectors listing the center, lower and upper frequencies of each band.

type is the type of spacing function that was used (log-frequency, linear, or Greenwood).

filter_type is a string giving the name of the type of filter used (Butterworth; Bingabr et al. 2008...).

Convenience functions

Two help creating these rather complicated structures, two convenience functions are provided. filter_bands() is the easiest, and is actually a wrapper for the second one estfilt_shift().

filter_bands(range, n, fs, type, order, shift) returns a filter structure from the following parameters:

  • range: a two element vector giving the lowest and the highest frequency limit of the vocoder. These will become the lower cutoff frequency of the first band and the upper cutoff frequency of the last frequency band.
  • n: is the number of channels.
  • fs: is the sampling frequency.

These arguments are optional:

  • type: is the type of spacing function used to partition the range. Possible values are 'greenwood' (default), 'log' and 'linear'. Device names 'ci24' and 'hr90k' can also be used there. This overrides the values of range and n provided (so they can be left as []).
  • order: is the order of the filters (default is 3). By default, Butterworth bandpass filters are used, with the filtfilt() function. As a result the order passed as an argument here is effectively multiplied by 4. See below for more details on this.
  • shift: the shift of the frequency range expressed in millimeters along the cochlea (based on Greenwood's function).

By default, Buttherworth filters are used. High order values will produce very sharp filters that may create artifacts if the filters are narrow. Low order valuers will create filters that substantially overlap, which, depending on your carrier, may also create artifacts. To use other filters, provide a cell array of the form {filter_type, filter_parameter} for order. Providing a number for this argument is equivalent to {'butter', order}. Other possible values are 'bingabr_2008' (then the order is the slope in dB/mm) or a function handle of the form:

[b, a] = filter_function(order, [low, high], fs)

Where 'low' and 'high' are calculated from the range, the number of channels and the type of spacing, and must be in normalized frequency (f/(2*fs)).

If you use option 'bingabr_2008' for a publication, do not forget to cite Bingabr et al. (2008).

If the filter type is 'butter' (including if order is simply a number rather than a cell-array) and a fractional order is used (has to be a multiple of 1/2), separate low-pass and high-pass filters are used instead of a bandpass.

Examples

To make a structure of eight 12th order Butterworth filters equally spaced on the cochlea from 200 to 7000 Hz:

p.analysis_filters = filter_bands([200, 7000], 8, fs, 'greenwood');

To create equivalent synthesis filters but 4th order, and shifted by 4 mm towards the apex:

p.synthesis_filters = filter_bands([200, 7000], 8, fs, 'greenwood', 1, 4);

Remember, the order argument passed to the filter_bands function will be multiplied by 4, so we pass 1 to get 4th order filters. If we wanted 6th order filters, we would pass 1.5. If omitted, the value 3 is used, resulting in 12th order filters.

To create a structure of 12th order Butterworth analysis filter matching the frequencies of Cochlear devices:

p.analysis_filters = filter_bands([], [], fs, 'ci24');

To create matched filters with the current spread simulation like in Bingabr et al. (2008):

p.synthesis_filters = filter_bands(mm2frq(frq2mm(1174)+[0 16]), length(p.analysis_filters.center), fs, ...
									'greenwood', {'bingabr_2008', 2.8}, 0);

The [0 16] vector is due to the fact that standard electrode arrays are about 16 mm long. The value of 1174 Hz is the average frequency corresponding to the location of the most apical electrode (Skinner et al. 2002).

Envelope

The default envelope structure is as follows:

p.envelope.method = 'low-pass';
p.envelope.rectify = 'half-wave';
p.envelope.fc = 250;
p.envelope.order = 2;
p.envelope.modifiers = {};

Here is a quick explanation of each of the fields:

  • method: determines how the envelope is extracted. Possible values are 'hilbert' or 'low-pass'.
  • rectify: if the method is 'low-pass', then the signal is rectified, either 'half-wave' or 'full-wave', and then low-passed.
  • fc: if the method is 'low-pass', the rectified signal is low-pass filtered below this cutoff frequency. If a vector is provided, its length must be the number of channels.
  • order: again if the method is 'low-pass', this specifies the order of the filter. Because the filtfilt() function is used, the order is effectively multiplied by 2.
  • modifiers: a cell-array listing modifiers for the envelope. These are detailed below.

Envelope modifiers

The field 'modifiers' contains a list (cell-array) of function names (or handles) modifying the envelope in each channel. A cell array can also be provided to give arguments to the function. Possible values are:

  • 'threshold': applies a threshold to the envelope (equivalent to {'threshold', 0.01}). This is implemented as max(env-thr, 0). If a positive threshold value is given, the threshold is calculated as a proportion of the maximal envelope value (across time and channels). If a negative threshold is given, it is treated as an absolute value. See envelope_mod_threshold() for details of the implementation.

  • 'n-of-m': select only the n first maxima (equivalent to {'n-of-m', 8}, selecting 8 maxima). This is used to simulate strategies like ACE. In these strategies, the maxima can evaluated on every pulse. We do the same here, using a stimulation rate of 900 Hz. This frequency can also be passed as third argument. If this strategy is used, it is strongly advised to use p.synth.filter_after=true to avoid clicks, and a warning will be emitted if this is not the case. See envelope_mod_nofm() for details of the implementation.

  • a function handle. The function will be passed the envelope matrix, the sampling frequency, and extra arguments and must thus have the following signature: Env = fnc(Env, fs, p, ...). This should be passed as {@fnc, ...} (replace the '...' with extra arguments as necessary). The Env matrix has n-sample rows and m-channels columns.

The envelope is extracted either by rectification/low-pass or using the Hilbert-transform. If no modifier is provided, the envelope is then normalized in each channel, then multiplied with the carrier, eventually (re)filtered in the band, and the RMS of the resulting signal is then adjusted to that of the original signal filtered in that same band. Once all the bands are summed, the resulting signal's RMS is adjusted to match that of the original signal filtered in the same frequency range.

However, if modifiers are used, the envelopes are not normalized and RMS adjusted. Only the signal obtained after summation of the channels is scaled. This is particularly important for the n-of-m modifier, for instance.

Examples

This defines an envelope with a cutoff of 50 Hz, with selection of 5 maxima.

p.envelope.method = 'low-pass';
p.envelope.rectify = 'half-wave';
p.envelope.fc = 50;
p.envelope.order = 2;
p.envelope.modifiers = {{'n-of-m', 5}};

Note the use of the double curly-brackets for the modifier list. This is what should be types to add a threshold:

p.envelope.modifiers = {{'n-of-m', 5}, {'threshold', 0.02}};

And this is how a custom function should be used. First define your function.

function Env = envelope_mod_compression(Env, fs, p, alpha)
   Env = Env.^alpha;
end

Then add it to the modifiers.

p.envelope.modifiers = {{'n-of-m', 5}, {'threshold', 0.02}, {@envelope_mod_compression, .5}};

Carrier

The field 'synth' describes how the resynthesis should be performed:

  • carrier: can be 'noise' (default), 'sin', 'low-noise' or 'pshc'.
  • filter_before: if true, the carrier is filtered before multiplication with the envelope (default is false).
  • filter_after: if true, the modulated carrier is refiltered in the band to suppress sidebands (default is true). Keep in mind that if you filter broadband carriers both before and after modulation you may alter the spectral shape of your signal.
  • f0: used when carrier is 'pshc'. There is no default, but 1 is a good value.

Note that for the 'noise', 'low-noise' and 'pshc' carriers, the random stream will be initialized using the field p.random_seed. By default this field is set to sum(100*clock).

How to cite this vocoder

If you want to cite the vocoder itself, release v1.0 "Basal" has a DOI DOI. Cite using this:

To cite a paper where the vocoder was used:

  • E. Gaudrain and D. Başkent (2015) "Factors limiting vocal-tract length discrimination in cochlear implant simulations." J. Acoust. Soc. Am. 137:1298–1308. doi: 10.1121/1.4908235.

If you use the PSHC or low-noise noise carriers, you also have to cite:

  • G. Hilkhuysen and O. Macherey (2014) "Optimizing pulse-spreading harmonic complexes to minimize intrinsic modulations after auditory filtering," J. Acoust. Soc. Am. 136:1281–1294. doi: 10.1121/1.4890642.

Acknowledgements

The original base of the code was written by Stuart Rosen (UCL) and Robert P. Carlyon (MRC CBSU), before being re-written by Etienne Gaudrain. Olivier Macherey, Gaston Hilkhuysen and Quentin Mesnildrey (CNRS LMA), provided useful comments. Nawal El Boghdady (UMCG) found and corrected a number of bugs.

References

About

A versatile, easily configurable vocoder software in MATLAB, for research purposes

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages