
Roemer Says Things


Filtering + Hilbert should always be one-pass zero-phase FIR filtering, not two-pass

TL;DR: two-pass FIR filtering is bad*: it over-smooths and has poor frequency resolution. Use one-pass zero-phase filtering instead.

The issue:

We've often heard the dogma (from Brad and me) that filtering+Hilbert == wavelet == short time-windowed FFT (given appropriate settings). This is true in a mathematical sense, as shown by Bruns in 2004.

We use this dogma to say that it doesn't matter which one you choose, it will be similar.

This is not true.

Well, it is true of course, see Bruns 2004. But in our daily usage, this is not true. That is because all of us treat filtering+Hilbert as two-pass FIR filtering (for those thinking it is fine if this is IIR, come talk to me).

Two-pass FIR filtering is not equivalent to wavelet convolution, nor to the STWFFT. One-pass zero-phase filtering is. Two-pass filtering is equivalent to convolving the data with your wavelet twice, which is a silly thing to do.
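You can check this equivalence numerically. Here is a minimal sketch in Python/scipy (the filter design, signal, and lengths are arbitrary choices for illustration): away from the edges, filtfilt with an FIR kernel h gives the same result as a single convolution with h convolved with its own time-reverse, i.e. the kernel applied twice.

```python
# Minimal sketch (illustrative example): two-pass FIR filtering equals one-pass
# convolution with the kernel convolved with its own time-reverse.
import numpy as np
from scipy.signal import firwin, filtfilt

fs = 1000.0
x = np.random.randn(5000)                          # arbitrary test signal
h = firwin(501, [8, 12], pass_zero=False, fs=fs)   # linear-phase FIR band-pass

two_pass = filtfilt(h, [1.0], x)                   # forward + backward pass
doubled_kernel = np.convolve(h, h[::-1])           # the kernel, applied twice
one_pass_doubled = np.convolve(x, doubled_kernel, mode='same')

# identical away from the edges (the two methods handle edges differently)
core = slice(1500, 3500)
print(np.allclose(two_pass[core], one_pass_doubled[core]))
```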

The problem with two-pass filtering compared to wavelet convolution/STWFFT:

Why is two-pass filtering silly? Well, there are two things we care about when doing spectral estimation. (1) time-domain smoothing, and (2) frequency resolution.

We don't want to smooth over too much data, because we won't be able to see any quick changes anymore. I.e., plowing over your data with a 10-second filter will mush out anything that is 20ms long. We also want to be able to distinguish oscillations at different frequencies. We don't want our 10Hz signal to be influenced by our 5Hz signal or our 20Hz signal. So we want to keep our frequency resolution reasonable.

Both of these are determined by your filter length/order. The filter length determines the smoothing window, and 1/(the length) determines your frequency resolution: a 1-second filter gives 1Hz resolution, a 100ms filter gives 10Hz resolution. So if you smooth with 1 second, you can distinguish 10Hz from 11Hz. If you smooth with only 100ms, you can only distinguish 10Hz from 20Hz, and anything in between is mushed together.

So, we try to pick a filter length that balances between these two.

What do we do when using two-pass filtering? We smooth twice, yet retain the same frequency resolution. If our filter length is 5 cycles of an alpha oscillation (5*100ms = 0.5s), the two passes together smooth with 10 cycles of alpha, making short bursts less visible in the output. Yet we retain the frequency resolution of a 0.5s filter: 1/0.5s = 2Hz. So we smooth with a window we didn't want, and we gain nothing in return, because we can still only distinguish 8Hz from 10Hz, and 10Hz from 12Hz. Using a one-pass filter with the same total smoothing (10 cycles, i.e. 1 second), we would smooth over the same 1 second, but we would be able to distinguish 8Hz from 9Hz, 9Hz from 10Hz, etc.

Not very efficient right? We spent effort to optimize the two parameters, and then just smooth more anyway.

But I thought we had to use two-pass filtering?

This is another dogma, which is only partially true.

We want our filter to be zero-phase (acausal, acyclic). This means that if an event happens at t=0, the filtered event will happen at t=0. A causal filter will cause it to shift in time, say t=500ms. Two-pass filtering will allow any filter to become zero-phase. Even IIR filters, which have their uses.

However, most (!) FIR filters are what is called linear phase. Linear phase means everything is just delayed by a fixed delay, which is exactly half of the filter length. So just pad the data a little bit, filter it once, and cut stuff off. Or filter it and cut a bit less stuff off. Or filter it, replace stuff with NaNs, and shift the time axis. Fixed!

So how can we still use filtering + Hilbert in a way that is equivalent to the other approach?

I kind of gave it away already, but we need to do one-pass zero-phase filtering. How? Two options.

First option: convolution.

  1. A linear phase FIR filter. How do you check this? A linear phase filter is always symmetric. Create a filter of even order, and check whether the first half is the same as the flipped second half. This should be the case for almost all filters you create. (Odd should also be okay, but for now, use even.)
  2. Pretreat the data.
  3. Filter with a single pass (that is, do not use filtfilt).
  4. Fix the data.

Note: Step 4 is dependent on step 2. The delay is always half of the filter length. So either pad or don't pad, shift the axis or not, as long as you deal with the delay appropriately.
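As a concrete illustration of option 1, here is a minimal sketch in Python/numpy/scipy. The 8-12Hz band-pass, filter length, and zero-padding are illustrative assumptions, not prescriptions; the delay bookkeeping is the point.

```python
# Minimal sketch (illustrative example) of option 1: one-pass zero-phase FIR
# filtering by compensating the fixed linear-phase delay (half the filter length).
import numpy as np
from scipy.signal import firwin, hilbert

fs = 1000.0
x = np.random.randn(10000)                          # your data
ntaps = 801                                         # odd length: the delay is an integer
h = firwin(ntaps, [8, 12], pass_zero=False, fs=fs)  # linear-phase FIR band-pass

# 1. check linear phase: a linear-phase FIR kernel is symmetric
assert np.allclose(h, h[::-1])

# 2./3. pad the data, then filter with a single forward pass (no filtfilt)
delay = (ntaps - 1) // 2                            # delay = half the filter length
xp = np.pad(x, delay)                               # zero-padding; other padding also works
y_full = np.convolve(xp, h, mode='full')

# 4. fix the data: drop the delay so events stay at the same sample
y = y_full[2 * delay : 2 * delay + len(x)]

# shortcut: for an odd, symmetric kernel, np.convolve(x, h, mode='same')
# does steps 2-4 in one go (with zero-padded edges)

envelope = np.abs(hilbert(y))                       # amplitude envelope, if that is the goal
```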

Second option: convolution via multiplication in the frequency domain.

If you're unsure you did it correctly, create a vector of zeros and set the center sample to 1 (an impulse). Run it through the same procedure as the data. If you did it correctly, the maximum of the filtered impulse occurs at the same sample.

  1. Get the filter as above. Pad it out with zeros to the same length as your data.
  2. Center the wavelet around zero. I.e., if you look at your filter at the center, everything to the left is 'negative time' and everything to the right is 'positive time'. Negative time then 'wraps around zero', ending up as the samples at the end of the filter. Makes sense? You can use a function like fftshift for this (named the same in matlab/scipy IIRC), or just shift it yourself.
  3. Take the FFT of your data and your filter.
  4. Multiply the two element-wise.
  5. Take the inverse FFT.
  6. Make the first M/2 samples and the last M/2 samples NaNs, or cut them off (M is the length of the filter).
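And a minimal sketch of option 2, in Python/numpy, with the same assumed 8-12Hz FIR as above; it also includes the impulse sanity check described above.

```python
# Minimal sketch (illustrative example) of option 2: zero-phase filtering via
# multiplication in the frequency domain.
import numpy as np
from scipy.signal import firwin

fs = 1000.0
x = np.random.randn(10000)
M = 801                                             # filter length (odd)
h = firwin(M, [8, 12], pass_zero=False, fs=fs)      # same linear-phase FIR as above

# 1. zero-pad the filter to the length of the data
h_pad = np.zeros(len(x))
h_pad[:M] = h

# 2. center the kernel around t=0: the center tap goes to index 0 and the
#    'negative time' half wraps around to the end of the array
h_pad = np.roll(h_pad, -(M - 1) // 2)

# 3.-5. FFT both, multiply element-wise, inverse FFT
y = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(h_pad)))

# 6. the first and last M/2 samples are wrap-around artifacts: NaN them out
y[: M // 2] = np.nan
y[-(M // 2):] = np.nan

# sanity check (the impulse test described above): an impulse stays put
imp = np.zeros(len(x))
imp[len(x) // 2] = 1.0
imp_f = np.real(np.fft.ifft(np.fft.fft(imp) * np.fft.fft(h_pad)))
assert np.argmax(np.abs(imp_f)) == len(x) // 2
```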

Note (the * in the TL;DR): As always, one can conceive of situations where two-pass filtering isn't bad. You can, for example, use it as a crude way to control your frequency smoothing. However, that hasn't been our intention, so I call it bad.


Tips on using Illustrator

1) direct selection tool (white arrow) and group selection tool (black arrow)

Illustrator works with vector art, and vector art consists of parts. E.g., a plot often has a square plot box (consisting of multiple lines), lines in the middle (consisting of connected dots), and some text, etc. These parts can form a group, and multiple groups can form another group, etc. The group selection tool selects an entire group, while the direct selection tool selects the individual parts. This also includes corners of a box, dots of a line, etc. I switch between these two tools constantly.

2) Lock object (cmd+2) and unlock all objects (cmd+opt+2)

THIS IS YOUR FRIEND. I use these things all the time, without them I would be completely lost. Locking an object, i.e. whatever you have selected, prevents you from selecting it afterwards. This is extremely useful ALL OF THE TIME. If you're moving stuff around, lock the surrounding stuff. If you wanna select a tiny thing, lock the surrounding stuff. If something gets in the way, lock that piece of ****. You want to copy axes and tick labels? Copy the entire figure, lock the target figure, paste everything on top of the target and align, then select the extra figure lying on top and delete it. Unlock afterwards.

3) Using groups

Likewise, grouping is also your friend, for the following reason. Some commands operate on groups as a single unit. All the 'align' and 'distribute' buttons for example, which allow you to align plots based on their top/bottom/whatever, or to space plots equally. Group the items you want to align, and do what you want with them. No longer needed? Ungroup. Also important, this interacts with 'transform each', found in the right-click menu. This allows you to e.g. resize objects in place. Want to enlarge a group of plots, but retain the position of each one? Group each plot, select all, 'transform each', and they're all changed in place. If you have a scatter plot with lots of dots, and you want to make each individual dot bigger without changing the plot size, select all dots (after locking the crap you don't want to change), ungroup (they are often grouped by default), 'transform each', and you're happy, as each dot is changed in place.

Also:

  • when resizing, make sure the box with 'scale ticks' etc. is checked, otherwise things will go weird (this is often disabled by default, and it took me a while to realize that years ago)
  • when importing EPSs or other files generated by other software (Matlab in particular) that you want to copy over, clean them up first! They contain lots of objects that you will not use, but that will make you incredibly confused later. So select-delete, select-delete, select-delete, until you can happily copy over only the thing that you want.
  • when ready to export to PDF for the printer, unlock everything, go to the select menu --> object --> all text objects, and then go to the type menu --> create outlines. This will turn all your text into artwork (not works of art, sadly), which doesn't require fonts. Always do this; printers make all sorts of mistakes. Don't save afterwards, of course, because you'll never get your text back to an editable state. Just use this to export your PDF.

Should I use a single filter to characterize time-resolved HFA (High Frequency Activity)? No.

Here's a rationale for why you shouldn't use a single filter, and why using many smaller filters and z-scoring the resulting time series is the right thing to do.

Assumptions

  1. there is a signal in the LFP reflecting neuronal spiking
  2. this signal is present at all frequencies
  3. increases in this signal result in a linear offset/upwards shift in the PSD in log-log-space
  4. this process is not oscillatory (i.e. not visible as wiggles in the time-domain)

We used to call this signal high-gamma, HG. Now we call it HFA, for High Frequency Activity, by decree.

So, how do we extract this signal? We cannot really extract it using Fourier coefficients from the lower frequencies, say <80Hz, as these are often dominated by band-limited oscillatory signals. So, we aim for a higher (80Hz+) frequency range. We can't go all the way up to the Nyquist frequency, because amplifiers get noisy for small variations in the signal, which is exactly what high frequencies are. Another factor we currently do not understand is whether anything is present in the spectrum at high (80Hz+) frequencies that causes power changes unrelated to the HFA signal we are interested in. This is analogous to the problem of band-limited oscillatory signals below 80Hz.

So far, a range of 80-200Hz or so seems to be a safe range. But, this is an empirical range, defined by results and not our understanding of the signal.

Now, when we know we should look for the HFA signal in the range of 80-200Hz (let's assume that for a moment), how do we do this? We could, for example, simply filter our time series in the range of 80-200Hz. Which sounds like it should do the trick, right? It does, but with a strong bias towards the lower end of that range. Why? Because amplitude depends on frequency: the power spectral density (PSD) and amplitude spectral density (ASD) follow a power law of the form 1/f^x. Our 80-200Hz filter does nothing to change this, it just cuts out a part. The power law holds for the PSD, and equally for amplitude in the time domain. The 80Hz part of the signal has much, much higher amplitude than, say, the 150Hz part of the signal. Therefore, any temporal variations of amplitude/power at the higher frequencies in this range will be practically invisible. In practice, that means you might as well filter your data with an 80-100Hz filter; it will look pretty damn similar. Statistically not very powerful, right? Because we totally ignore assumption 3.

How can we improve? It's simple! By making use of assumption 3, which says that it doesn't matter where we look in the spectrum: our signal of interest is equally present everywhere, relative to background power. Just extract time series from smaller frequency ranges, undo the unintended power bias in each of them, and combine them into one signal. How can we undo the power bias? There are multiple ways. We could get the PSD/ASD of the entire recording, fit a line to it, and use the power/amplitude offset as a function of frequency as a correction. Or we could z-score each 'small-band' time series and average the signals from the different bands. Or we could fit a line to the PSD/ASD of tiny time windows the same size as our filter, use this offset as the signal of interest, and bypass the filtering entirely. Each has its advantages and disadvantages.

How to do the z-scoring option?

  1. filter with multiple 20Hz-wide bands (for example), using the same FIR (!) filter order (or use some other spectral analysis with equivalent settings)
  2. get amplitude envelope ( abs(hilbert) )
  3. z-score each filtered signal (over time, over trials)
  4. average z-scored signals
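A minimal sketch of this procedure in Python/numpy/scipy. The band edges, filter order, sampling rate, and z-scoring over one continuous trace are illustrative assumptions (for epoched data you would z-score over time and trials, as in step 3).

```python
# Minimal sketch (illustrative example) of the multi-band HFA procedure.
import numpy as np
from scipy.signal import firwin, hilbert

fs = 1000.0
x = np.random.randn(60 * int(fs))                   # continuous LFP/ECoG trace
bands = [(f, f + 20) for f in range(80, 200, 20)]   # 80-100, 100-120, ... 180-200 Hz
ntaps = 401                                         # same FIR order for every band

hfa_bands = np.zeros((len(bands), len(x)))
for i, (lo, hi) in enumerate(bands):
    h = firwin(ntaps, [lo, hi], pass_zero=False, fs=fs)
    # 1. one-pass zero-phase filtering (odd symmetric kernel, centered convolution)
    xf = np.convolve(x, h, mode='same')
    # 2. amplitude envelope
    env = np.abs(hilbert(xf))
    # 3. z-score each band (here over the whole trace; over time/trials if epoched)
    hfa_bands[i] = (env - env.mean()) / env.std()

# 4. average the z-scored bands into a single HFA time series
hfa = hfa_bands.mean(axis=0)
```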

The results of the above procedure compared to using a single filter can be seen in the figures below.

File:HFAoneband.png ''Figure 1: Event-related HFA computed using a single filter. The different colors are different conditions of a task.''

File:HFAmultiplebands.png ''Figure 2: The same data, but now the event-related HFA computed using multiple filters of 20Hz width.''


Semi-automatically finding line spectra

I made an algorithm for semi-automatically finding line spectra (in MATLAB). It can be found on GitHub: https://github.com/voytekresearch/dutchdelight/blob/master/rmr_findlinespectra.m

This is intended to be used on recordings that are thoroughly contaminated with line spectra of varying width (or, lineliness). This is often the case for recordings in a hospital setting, such as ECoG, PD depths, etc.

The function outputs a set of peak frequencies and half-bandwidths, together with basic filter settings to be used to filter the nasties out, and some other things.

If you wanna try it out, I'd strongly suggest keeping the filters as they are, and playing around with the default z-value threshold and bandwidth steps (which affect the edge artifacts you need to cut off afterwards).

The inspiration for making this was the desire to fit slopes to PSDs up to very high frequencies, e.g. up to the noise floor. Cleaning the data is then important, especially if we wanna fit slopes on small time windows, because the line spectra bleed into neighboring frequencies. Doing this by hand totally sucks (see the example below).

As an example, I attached a figure showing the filtered and unfiltered PSD in normal/log space, with long time windows used to estimate the spectra (10s) and short time windows (75ms). (Notice also the non-problem of crushing spectra to zero, which we talked about briefly in the last group meeting.) Blue == blegh! Green == yummie! File:findinglinespectra.png

Two additional steps are necessary to fit slopes to data filtered using the output of the algorithm:

  1. determine the noise floor (currently no rationale for this yet)
  2. use a very strong low-pass after the fitting range to crush any remaining line spectra (in the example figure, I fit between 80-500, after putting a very sharp low-pass at 540Hz)
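For illustration, here is a rough sketch in Python/scipy of how the output of the line-spectra finder could be used. The peak list and variable names are made up, not the actual output format of rmr_findlinespectra, and the filter order is a placeholder, not a recommendation.

```python
# Rough sketch (hypothetical example): notch out detected line spectra, apply a
# sharp low-pass above the fitting range, then estimate the PSD with long windows.
import numpy as np
from scipy.signal import firwin, fftconvolve, welch

fs = 2000.0
x = np.random.randn(120 * int(fs))                   # raw recording (placeholder)
# (peak frequency Hz, half-bandwidth Hz) pairs: hypothetical, NOT the function's output format
peaks = [(60.0, 1.0), (120.0, 1.5), (180.0, 2.0)]

ntaps = 8001                                         # long FIR -> narrow, steep band-stops (placeholder)
y = x.copy()
for f0, hbw in peaks:
    h_bs = firwin(ntaps, [f0 - hbw, f0 + hbw], fs=fs)   # band-stop (pass_zero defaults to True)
    y = fftconvolve(y, h_bs, mode='same')                # one-pass zero-phase (symmetric, odd kernel)

# step 2: a very sharp low-pass just above the fitting range (e.g. 540Hz before fitting 80-500Hz)
h_lp = firwin(ntaps, 540.0, fs=fs)
y = fftconvolve(y, h_lp, mode='same')

# PSD with long (10s) windows, as in the example figure
f, pxx = welch(y, fs=fs, nperseg=int(10 * fs))
```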

Template brains

Here is a little history, mostly regarding MNI space: http://www.nil.wustl.edu/labs/kevin/man/answers/mnispace.html

The MNI wanted to define a brain that is more representative of the population. They created a new template that was approximately matched to the Talairach brain in a two-stage procedure. First, they took 241 normal MRI scans and manually defined various landmarks, in order to identify a line very similar to the AC-PC line, and the edges of the brain. Each brain was scaled to match the landmarks to equivalent positions on the Talairach atlas. They then took 305 normal MRI scans (all right handed, 239 M, 66 F, age 23.4 +/- 4.1), and used an automated 9 parameter linear algorithm to match the brains to the average of the 241 brains that had been matched to the Talairach atlas. From this they generated an average of the 305 brain scans thus transformed - the MNI305. See Louis Collins' thesis for a detailed description; the conference paper by Evans et al. also describes the process.

The MNI305 was the first MNI template. The current standard MNI template is the ICBM152, which is the average of 152 normal MRI scans that have been matched to the MNI305 using a 9 parameter affine transform. The International Consortium for Brain Mapping adopted this as their standard template; it is the standard template in SPM99.

In addition, one of the MNI lab members, Colin Holmes, was scanned 27 times, and the scans were coregistered and averaged to create a very high detail MRI dataset of one brain. This average was also matched to the MNI305, to create the image known as "colin27". colin27 is used in the MNI brainweb simulator. SPM96 used colin27 as its standard template. You can download a copy of this image at 1mm resolution from our site; SPM96 contains a 2mm resolution copy of the same image, called T1.img in the SPM canonical directory. Note that the images in the SPM "templates" directories have all been presmoothed to 8mm for use with the normalization routines.

The template I gave you (from here) is created from the last MRI mentioned above, colin27. The template I gave you is no longer an MRI (with white matter, grey matter, etc.), but only contains the brain's surface. The surface of the brain can be extracted from any MRI using specialized software, of which Freesurfer is the most widely used (I'm pretty sure).

A bit more detail on the widely used colin27, for the curious, can be found here: http://www.bic.mni.mcgill.ca/ServicesAtlases/Colin27

And here is information on other templates, from which surfaces could in principle be generated, though most people don't bother: http://www.bic.mni.mcgill.ca/ServicesAtlases/HomePage

Take home message

The take-home message from the above is that normalizing a person's brain is complex, non-trivial, and noisy for many reasons. Because of this, highly precise electrode locations are not very useful once you go to a standardized brain, but gross location is (say, to within a couple of mm, give or take). So, when you're plotting electrodes on templates, keep this in the back of your mind. Precise electrode locations on an individual brain are a different matter, and much more useful, be it human, macaque, or whatever. For a nice use of this, see for example this paper.


About visualizing data using color, brightness, and how they mislead us

Hi all,

Perhaps you have heard about it, perhaps not. Mathworks announced a while back that they changed their default colormap from 'jet' to 'parula' (as of R2014b).

An example:

File:bad_jet.jpg

Though this sounds trivial, I think it is absolutely crucial. Visualizing data is our biggest tool for judging data. If we could pick only one type of thing that a paper consists of, it'd be the figures (well, I would anyway, in this high-dimensional, data-driven field of ours).

Why crucial? I did not realize it until a couple of months ago when I heard about it. Jet, and 'rainbow' colormaps in general, are actually pretty misleading. They obscure small differences, and create the illusion of big differences where there are none. The size of a difference is poorly reflected by the change in color. In short, they guide our visual system to the wrong parts of the data.

A large part of why they mislead is that the brightness transitions in a rainbow colormap do not follow the transitions in the data. For example, yellow is much brighter than red, yet dark red marks the maximum. This is quickly noticeable if you print rainbow-colored figures in black and white.
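A quick way to check this yourself: a minimal sketch in Python/matplotlib, using 'viridis' as a stand-in for a perceptually ordered map like parula, and a crude luma approximation rather than proper perceptual lightness.

```python
# Minimal sketch (illustrative example): is brightness monotonic along a colormap?
import numpy as np
import matplotlib.pyplot as plt

vals = np.linspace(0, 1, 256)
for name in ['jet', 'viridis']:
    rgb = plt.get_cmap(name)(vals)[:, :3]
    # rough luma approximation from RGB (Rec. 601 weights)
    luma = rgb @ np.array([0.299, 0.587, 0.114])
    monotonic = np.all(np.diff(luma) >= 0)
    print(f"{name}: brightness monotonically increasing? {monotonic}")
```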

I, for one, am ditching jet.

Very interesting reads, with examples, from Mathworks on the rationale for this change: http://blogs.mathworks.com/steve/2014/10/13/a-new-colormap-for-matlab-part-1-introduction/ http://blogs.mathworks.com/steve/2014/10/20/a-new-colormap-for-matlab-part-2-troubles-with-rainbows/

Cheers, Roemer


Fractalizer

Hello, welcome to another edition of Roemer Says Things, channeled through me this week.

The topic of today’s RST is the fractalyzer algorithm that some of us are using to extract the 1/f component from power spectra. Specifically, when it does not do what you want it to do.

TL;DR: the fractalyzer has two interacting failure modes: oscillations that are too wide, and multiple power law regimes. That being said, it is still good to use; just look at the resulting PSD before blindly fitting. Oh, and use logarithmically spaced frequency points when fitting in log-log.

The algorithm I’m talking about comes from Wen & Liu 2016 (IRASA). I believe it is very similar to the one Biyu He uses, but with arbitrarily more resampling factors. The algorithm is very clever, and is described well by this figure:

File:Gao1.png

In essence, it assumes that the signal is a linear mixture of power law noise plus an oscillation. What the algorithm does is compute various power spectra by “pretending” that the signal has a different sampling rate each time, effectively “relocating” the oscillation peak to a different frequency. Because the underlying power law noise is, well, power law, resampling the signal does not affect its slope.

It does this several times with non-integer resampling factors and their inverses (and interpolates the signal at each rate using a cubic spline), which moves the peak above and below its original frequency by various amounts. Finally, it takes the median of all these estimates. Because a relocated peak lands on any given frequency only once across resamplings, the median provides an unbiased estimate of the underlying baseline activity.

As a concrete example, suppose your signal is sampled at 1000Hz and has a 10Hz oscillation; that means it goes through 1 cycle in 100 samples. If you now pretend your signal was sampled at 500Hz instead, the sine wave still goes through 1 cycle in 100 samples, but those 100 samples now represent 200ms instead of 100ms, i.e. the sine wave is now 5Hz. If we then calculate the PSD, the peak is effectively “moved” to 5Hz. Now use a different resampling factor, rinse and repeat. The algorithm uses non-integer resampling factors to avoid moving a peak onto its own harmonic. If you want more detail, read the paper; it’s fairly straightforward.

Now, the whole point of explaining how that works is to say this: it does not perform magic. It’s not somehow magically detecting the underlying power law fluctuation and separating it from the oscillations. All it does is take the original shape of the PSD, compress/stretch it a bunch of times, and take the median of all of those to find what remains constant, i.e. the power law 1/f activity.
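To make that concrete, here is a compressed sketch of the idea in Python/scipy. The test signal, resampling factors, and PSD settings are illustrative, and this is not the published implementation (which differs in details such as the spline interpolation).

```python
# Minimal sketch (illustrative example) of the IRASA idea: resample up and down,
# take the geometric mean of the pair of PSDs, then the median across factors.
import numpy as np
from fractions import Fraction
from scipy.signal import welch, resample_poly

fs = 1000
t = np.arange(60 * fs) / fs
x = np.cumsum(np.random.randn(t.size))          # brown (1/f^2) noise ...
x += 0.5 * np.sin(2 * np.pi * 10 * t)           # ... plus a 10Hz oscillation

nperseg = 4 * fs
hset = np.arange(1.1, 1.95, 0.05)               # the resampling factors discussed above

freqs, psd = welch(x, fs=fs, nperseg=nperseg)
psds_h = []
for h in hset:
    frac = Fraction(round(h * 100), 100)        # rational approximation of the factor
    up, dn = frac.numerator, frac.denominator
    x_up = resample_poly(x, up, dn)             # 'pretend' faster sampling: peak moves to 10/h Hz
    x_dn = resample_poly(x, dn, up)             # 'pretend' slower sampling: peak moves to 10*h Hz
    _, p_up = welch(x_up, fs=fs, nperseg=nperseg)   # note: PSDs computed with the ORIGINAL fs
    _, p_dn = welch(x_dn, fs=fs, nperseg=nperseg)
    psds_h.append(np.sqrt(p_up * p_dn))         # geometric mean cancels the power shift of the 1/f part

fractal = np.median(psds_h, axis=0)             # median across factors removes the relocated peaks
oscillatory = psd - fractal                     # what's left should be (mostly) the 10Hz peak
```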

It’s a very clever algorithm, I like it a lot, but this is when it fails:

File:Gao2.png

Here’s my favorite signal, channel 1 of rat 1 in the CA1 LFP dataset. It’s immediately clear that the “fractal” estimate has things in it that are not supposed to be there. The theta peak in the original PSD was moved, as discussed, below and above its original 8Hz frequency. The problem here is that since the peak is so wide (its bandwidth, if you will, relative to its center frequency), resampling at [1.1:0.05:1.9] times the original rate causes the relocated peaks to overlap with each other, thereby corrupting the median estimate. In other words, moving the peak in steps of 1.1, 1.15, … times was not enough to get away from the original peak, or from the other relocated peaks.

File:Gao3.png

This is just an illustration of the intermediate process. Each color represents a resampling factor. Notice how the peaks overlap with each other.

Notice, however, that even though the peaks are messed up, the power law portion is faithfully reproduced, with the line noise removed. So when the oscillation is far from the power law regime of interest, it is no problemo, although that calls into question why you’d fractalyze in the first place.

In this example, however, there is a gamma bump right in the power law region. You can see that the algorithm largely removes the peak, but leaves behind two small bumps surrounding it. This should still give you a better estimate of the slope than if you had not fractalyzed, but the point is to keep in mind that it is not perfect.

File:Gao4.png

This artifact, of course, is a function of the resampling factors we choose. Given the center frequency and bandwidth of a prominent oscillation, we need to resample such that the relocated peaks are at least a significant distance away from the original, and from each other. Since we’re taking the median, they don't have to completely clear each other, but they need to be far enough apart to not repeatedly bias the distribution.

The above examples were resampled with factors from 1.1 to 1.9 at 0.05 increments. To move the peak further, we can increase the maximum resampling factor by another 2 fold, to 3.9, which should theoretically make the overlapping peaks less prominent.

File:Gao5.png

Here is an illustration of that with another dataset, this time human ECoG. Blue and red are as before, and again, observe the peaks around 8Hz. Now, if we increase the maximum resampling factor to 3.9 (yellow), you can see that the peaks have moved further away, and with a lower amplitude, improving our previous problem. However, this comes at the cost of a changed slope estimate at the higher frequencies. The reason is that, since we now move everything up and down by a factor of up to ~4, what was 200Hz is now 50Hz, which means that if you have two frequency regions with differing slopes, as we often see in ECoG, the slope of the higher-frequency region will now bias the slope estimate for the intermediate region. This is especially bad if the data have been lowpass filtered; you can already see the dropoff in the yellow trace, which is supposed to happen at a much higher frequency.

So the takeaway here is that there are two failure modes, and they interact depending on how you try to counter one or the other: oscillations that are too wide, and multiple power law regimes. I don’t know how to fix this, but it looks like using the fractalyzer in general still gives a better (or at least smoother) slope estimate. One caveat: this particular algorithm is extremely slow, as it actually interpolates the signal in the time domain before resampling, since it uses non-integer resampling factors.

One more thing, and this is actually Roemer Says, and I quote:

"Also, stupid I didn't think of this sooner, but they also sample the frequency axis in log-log space such that there is no sampling bias towards the higher frequencies when using robust fit, which is a face-palm no-brainer I haven't been doing, and I doubt others in the lab as well?) “

Meaning: when we are fitting 1/f (literally foofing), the x-axis should be sampled evenly in log space, not linearly. For example, if we’re fitting from 10 to 1000Hz at 1Hz resolution, there are only 90 points in the first decade but 900 points in the second decade, which makes the regression favor the high-frequency regime much more, because there is more error there to minimize. On the one hand, the high-frequency regions are usually more strictly power law anyway, so perhaps we do want to favor them. But on the other hand, the correct way to perform regression in the log-log domain is, as Roemer Says, to sample evenly in log.
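A minimal sketch of what that looks like in practice, in Python/numpy. The frequency range, number of points, and plain least-squares fit (rather than a robust fit) are assumptions; freqs and psd are taken to come from some PSD estimate such as welch().

```python
# Minimal sketch (illustrative example): fit the 1/f slope in log-log on
# logarithmically spaced frequencies, so the high-frequency decade does not dominate.
import numpy as np

def fit_slope_loglog(freqs, psd, f_range=(10, 1000), n_points=100):
    """Linear fit of log10(power) vs log10(frequency) on log-spaced points."""
    f_fit = np.logspace(np.log10(f_range[0]), np.log10(f_range[1]), n_points)
    # interpolate the PSD onto the log-spaced grid (working in log-log space)
    p_fit = np.interp(np.log10(f_fit), np.log10(freqs[freqs > 0]),
                      np.log10(psd[freqs > 0]))
    slope, offset = np.polyfit(np.log10(f_fit), p_fit, 1)
    return slope, offset

# usage: slope, offset = fit_slope_loglog(freqs, psd, f_range=(80, 500))
```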

And with that, Roemer Stops Saying.



Python code here: https://github.com/voytekresearch/voytoys/blob/master/torben/fractal_fft.ipynb