Gridtools

A high-level API for handling the different types of data recorded with, and extracted from, electrode grid recordings of wave-type weakly electric fish.

Installation

To install the package, clone the repository and simply install it with pip:

git clone https://github.com/weygoldt/gridtools.git
cd gridtools && pip install .

Usage

Loading, manipulating and saving data

gridtools loads wavetracker datasets, including position estimates, communication signals, and the raw recordings, into a single data model. The model verifies the dataset upon instantiation and automatically loads everything that is present in the directory. It can then be manipulated and saved to disk.

For example, loading a dataset, creating a subset, and saving it to disk is as easy as:

from gridtools.datasets import load, subset, save
data = load('path/to/dataset')
data = subset(data, 0, 1000) # get the first 1000 seconds
save(data, 'path/to/dataset_subset')

Most of this functionality is also implemented as shell scripts and can be used from the command line. For example, to create a subset of a dataset, you can run:

subset-dataset -i /path/to/dataset -o /path/to/save --start 10 --end 20

Because the load function automatically loads as much data as possible (excluding the raw data) and verifies the dataset, it is a good idea to check what the loaded dataset includes before working with it. The pprint method prints a summary of the dataset:

data.pprint()

To also load the raw data as a thunderfish.dataloader.DataLoader object, you can set the grid flag to True:

data = load('path/to/dataset', grid=True)

Building preprocessing pipelines

The first step in electrode grid preprocessing is the wavetracker, which performs ridge detection on sum spectrograms across all electrodes of the grid to estimate how each fish's baseline EODf evolves over time.
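To make the idea of a sum spectrogram concrete, here is a minimal pure-Python sketch. It is not the wavetracker's algorithm: the function names and toy data are hypothetical, and picking the strongest frequency bin per time step is a much simpler stand-in for real ridge detection, which has to follow several fish at once.

```python
from typing import List

# A spectrogram is indexed as [frequency_bin][time_bin] (toy representation).
Spectrogram = List[List[float]]


def sum_spectrograms(spectrograms: List[Spectrogram]) -> Spectrogram:
    """Add up the power of all electrodes, bin by bin."""
    n_freqs = len(spectrograms[0])
    n_times = len(spectrograms[0][0])
    return [
        [sum(spec[f][t] for spec in spectrograms) for t in range(n_times)]
        for f in range(n_freqs)
    ]


def peak_frequency_per_time_bin(spec: Spectrogram, freqs: List[float]) -> List[float]:
    """For every time bin, return the frequency with the highest summed power."""
    n_times = len(spec[0])
    trace = []
    for t in range(n_times):
        best_bin = max(range(len(freqs)), key=lambda f: spec[f][t])
        trace.append(freqs[best_bin])
    return trace


# Toy data: two electrodes, three frequency bins, three time bins.
freqs = [600.0, 610.0, 620.0]
electrode_a = [[0.1, 0.1, 0.1], [0.8, 0.7, 0.1], [0.1, 0.2, 0.6]]
electrode_b = [[0.1, 0.1, 0.1], [0.5, 0.4, 0.2], [0.1, 0.2, 0.9]]

summed = sum_spectrograms([electrode_a, electrode_b])
trace = peak_frequency_per_time_bin(summed, freqs)
print(trace)  # frequency trace of the single strongest signal
```

Summing across electrodes means a fish is visible in one spectrogram wherever it is on the grid, which is why the tracking runs on the sum rather than on single channels.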

gridtools provides a high-level API to build preprocessing pipelines that can be used to filter the data generated by the wavetracker. Common preprocessing steps include:

  • gridtools.preprocessing.remove_unassigned_tracks to remove tracks that were not assigned to a fish identity by the wavetracker
  • gridtools.preprocessing.remove_short_tracks to remove tracks that are shorter than a given threshold
  • gridtools.preprocessing.remove_low_power_tracks to remove tracks that have a low maximum power, i.e. fish that stayed close to the grid but not on the grid.
  • gridtools.preprocessing.remove_poorly_tracked_tracks to remove tracks that have a low tracking coverage.
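The way such steps chain together can be sketched as follows. This is an illustration only and does not use the gridtools API: the Track class, the filter functions, and all thresholds are hypothetical stand-ins chosen to mirror the four steps listed above.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Track:
    """Hypothetical stand-in for one wavetracker track (not a gridtools type)."""

    fish_id: Optional[int]  # None: the track was never assigned to a fish
    duration: float  # track length in seconds
    max_power: float  # maximum power in dB
    coverage: float  # fraction of time bins with a detection, 0 to 1


# A pipeline step takes a list of tracks and returns the tracks it keeps.
TrackFilter = Callable[[List[Track]], List[Track]]


def drop_unassigned(tracks: List[Track]) -> List[Track]:
    return [t for t in tracks if t.fish_id is not None]


def drop_short(min_duration: float) -> TrackFilter:
    return lambda tracks: [t for t in tracks if t.duration >= min_duration]


def drop_low_power(min_power: float) -> TrackFilter:
    return lambda tracks: [t for t in tracks if t.max_power >= min_power]


def drop_poorly_tracked(min_coverage: float) -> TrackFilter:
    return lambda tracks: [t for t in tracks if t.coverage >= min_coverage]


def run_pipeline(tracks: List[Track], steps: List[TrackFilter]) -> List[Track]:
    """Apply each step in order, feeding the output of one into the next."""
    for step in steps:
        tracks = step(tracks)
    return tracks


tracks = [
    Track(fish_id=1, duration=600.0, max_power=-60.0, coverage=0.9),
    Track(fish_id=None, duration=900.0, max_power=-55.0, coverage=0.95),
    Track(fish_id=2, duration=30.0, max_power=-58.0, coverage=0.8),
    Track(fish_id=3, duration=1200.0, max_power=-95.0, coverage=0.9),
    Track(fish_id=4, duration=1500.0, max_power=-62.0, coverage=0.2),
]

# Thresholds are arbitrary example values.
pipeline = [
    drop_unassigned,
    drop_short(60.0),
    drop_low_power(-80.0),
    drop_poorly_tracked(0.5),
]
kept = run_pipeline(tracks, pipeline)
print([t.fish_id for t in kept])
```

Keeping every step as a function from tracks to tracks makes it easy to reorder steps or swap thresholds without touching the other filters.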

Planned additions include power-based position estimation of individuals via triangulation.

Simulations

gridtools provides a toolbox to simulate grid recordings of multiple communicating fish that move across an electrode grid. The simulations can be based on parameters estimated from real recordings.

Under construction.

Contributing

Contributions are welcome. To develop the package, clone the repository and install it in editable mode. Dev dependencies are managed with poetry. To install them, run:

poetry install

Before committing, please run the tests and the linter:

poetry run black gridtools
poetry run pylint gridtools
poetry run pytest

Only commit if all tests pass and the linter does not report any errors. To make this easier, you can install a pre-commit hook that runs the linter and the tests before every commit:

pre-commit install
pre-commit run --all-files

To do

  • For position-pdf use von Mises distribution instead of Gaussian

  • For fakegrid module check out librosa and scaper, the latter specifically designed for soundscape mixing to train DNNs

  • Migrate to datasets subpackage

  • Convert the simulations module into a subpackage and refactor

  • Write a converter that puts all wavetracker and other extracted data into a HDF5 container to clean datasets.

  • Build unit tests for gridtools.datasets.

  • Make a datavis subpackage that provides functions to visualize the data

    • Spec, track, pos terminal commands to quickly visualize datasets
    • Animation suite that takes a dataset and a start and stop time to animate a full dataset
  • Make spectrogram decibel transform cutoff dynamic for an optimal signal-to-noise ratio for each 10 s window

  • Move simulation parameters to a config file

  • Fix the grid function once and for all

  • Fix wrong track times in subset function

  • Fix interpolating the parameter space so that the training dataset is uniformly distributed.

    • Note: Became useless because I don't use the simulations anymore
  • Add augmentations to the chirps that are simulated from the interpolated parameter space, i.e. noise, etc. No real chirp is a perfect Gaussian; only the average of all chirps is Gaussian.

    • Note: Became useless because I work with real chirps now
  • Refactor gridtools datasets to new format without Nonetypes

  • Build training data generation pipeline for Faster R-CNN

  • Test the hybrid grid when data is available

  • Port the chirp annotation gui from the cnn-chirpdetector to gridtools and rewrite input data handling

  • Concerning bounding boxes: They work well for short chirps. I have the impression that either long chirps or chirps with a high kurtosis result in boxes that are too large. I estimate the box width with the standard deviation, so maybe a high kurtosis results in a large standard deviation. But this does not explain why height is a problem as well. The kurtosis could also scale down the amplitude a bit ...

  • The frequency bbox padding needs to be a sum, not a factor, i.e. the frequency bin multiplied by a tuning factor and added to the amplitude: amp + (freq_bin * factor)

  • Make the extractor run on a folder of datasets instead of individual ones

  • Update the parameter estimation function of the extractor to accommodate the simpler model that is fit to the frequency traces

  • Rewrite hybrid grid function to choose windows where no chirps of the real fish are produced, or at least take snippets that happen during the day when the fish are inactive

  • Save the first and last spec image of a dataset in the same dimensions as the others but add zero padding.
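The additive frequency padding from the bounding-box item in the list above, amp + (freq_bin * factor), can be compared with a multiplicative padding in a few lines. This is a sketch under assumptions: amp is taken to be the chirp's frequency excursion in Hz and freq_bin the spectrogram's frequency resolution in Hz; the function names and all numbers are hypothetical.

```python
def additive_bbox_height(amp: float, freq_bin: float, factor: float) -> float:
    """Box height as a sum: the padding is a fixed number of frequency bins."""
    return amp + freq_bin * factor


def multiplicative_bbox_height(amp: float, factor: float) -> float:
    """Box height as a product: the padding grows with the chirp itself."""
    return amp * factor


freq_bin = 5.0  # assumed frequency resolution in Hz
small_chirp, large_chirp = 20.0, 200.0  # assumed frequency excursions in Hz

for amp in (small_chirp, large_chirp):
    additive = additive_bbox_height(amp, freq_bin, factor=2.0)
    multiplicative = multiplicative_bbox_height(amp, factor=1.5)
    print(f"amp={amp:6.1f} Hz  additive={additive:6.1f} Hz  multiplicative={multiplicative:6.1f} Hz")
```

With these example values both variants pad the small chirp by 10 Hz, but the multiplicative variant pads the large chirp by 100 Hz while the additive one stays at 10 Hz, which is the behaviour the to-do item asks for.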

Project log

  • 2023-10-26: Finished datasets module and added documentation.
