A high-level API for handling the different types of data recorded with, and extracted from, electrode grid recordings of wave-type weakly electric fish.
To install the package, clone the repository and simply install it with pip:
git clone https://github.com/weygoldt/gridtools.git
cd gridtools && pip install .

gridtools can easily load wavetracker datasets, including
position estimates, communication signals, and the raw recordings,
into a single data model that verifies the dataset upon instantiation and
automatically loads everything that is present in the directory. The data
model can be manipulated and saved to disk.
For example, loading a dataset, creating a subset, and saving it to disk is as easy as:
from gridtools.datasets import load, subset, save
data = load('path/to/dataset')
data = subset(data, 0, 1000) # get the first 1000 seconds
save(data, 'path/to/dataset_subset')

Most of this functionality is also implemented as shell scripts and can be used from the command line. For example, to create a subset of a dataset, you can run:
subset-dataset -i /path/to/dataset -o /path/to/save --start 10 --end 20

Because the load function automatically loads as much data as possible (excluding the raw data) and verifies the dataset, it is a good idea to check what is actually included in the loaded dataset. This can be done with the pprint method, which prints a summary of the dataset:
data.pprint()

To also load the raw data as a thunderfish.dataloader.DataLoader object, you can set the grid flag to True:
data = load('path/to/dataset', grid=True)

The first step in electrode grid preprocessing is the wavetracker, which
performs ridge detection on sum spectrograms across all electrodes of the grid
to estimate the evolution of each fish's baseline EODf over time.
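To make the idea of a sum spectrogram concrete, here is a minimal sketch that uses only NumPy. It is not the wavetracker's implementation: the function name `sum_spectrogram`, the window parameters, the synthetic signal, and the per-window argmax (a crude stand-in for real ridge detection, which has to follow several fish at once) are all assumptions made for illustration.

```python
import numpy as np


def sum_spectrogram(signals, rate, nfft=1024, hop=512):
    """Sum the power spectrograms of all electrodes (hypothetical helper).

    signals: array of shape (n_samples, n_electrodes)
    Returns (freqs, times, power); power has shape (n_freqs, n_windows).
    """
    window = np.hanning(nfft)
    starts = np.arange(0, signals.shape[0] - nfft + 1, hop)
    freqs = np.fft.rfftfreq(nfft, d=1.0 / rate)
    times = (starts + nfft / 2) / rate
    power = np.zeros((freqs.size, starts.size))
    for i, start in enumerate(starts):
        # Window every electrode's segment and transform along the time axis.
        segment = signals[start:start + nfft] * window[:, None]
        spectrum = np.fft.rfft(segment, axis=0)
        # Summing across electrodes pools the power of each fish over the grid.
        power[:, i] = np.sum(np.abs(spectrum) ** 2, axis=1)
    return freqs, times, power


# Synthetic example: one "fish" with a 600 Hz EODf, seen on 4 electrodes
# with different amplitudes (all values are made up).
rate = 20000
t = np.arange(0, 1.0, 1.0 / rate)
amplitudes = np.array([1.0, 0.5, 0.2, 0.1])
signals = np.sin(2 * np.pi * 600.0 * t)[:, None] * amplitudes[None, :]

freqs, times, power = sum_spectrogram(signals, rate)

# Crude stand-in for ridge detection: the strongest frequency in each window.
peak_freqs = freqs[np.argmax(power, axis=0)]
```

With these settings the frequency resolution is `rate / nfft`, roughly 19.5 Hz, so every entry of `peak_freqs` lies within one frequency bin of the simulated 600 Hz.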
gridtools provides a high-level API to build preprocessing pipelines that
can be used to filter the data generated by the wavetracker. Common preprocessing
steps include:
- gridtools.preprocessing.remove_unassigned_tracks to remove tracks that were not assigned to a fish identity by the wavetracker
- gridtools.preprocessing.remove_short_tracks to remove tracks that are shorter than a given threshold
- gridtools.preprocessing.remove_low_power_tracks to remove tracks that have a low maximum power, i.e. fish that stayed close to the grid but not on the grid
- gridtools.preprocessing.remove_poorly_rtacked_tracks to remove tracks that have a low tracking coverage
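The logic behind these four filtering steps can be sketched in plain Python. This is not the gridtools API: the track representation (a dict with `ident`, `times`, and `powers`), the function name `filter_tracks`, and all thresholds are hypothetical and only illustrate what each step checks.

```python
def filter_tracks(tracks, min_duration=10.0, min_power=-80.0,
                  min_coverage=0.5, dt=1.0):
    """Keep only tracks that pass all four checks (thresholds are made up).

    Each track is a dict with:
      ident  -- fish identity, or None if the track was never assigned
      times  -- sorted time stamps in seconds at which the track was detected
      powers -- power in dB at each of those time stamps
    dt is the expected spacing between consecutive time stamps.
    """
    kept = []
    for track in tracks:
        times = track["times"]
        # 1. Drop tracks that were not assigned to a fish identity.
        if track["ident"] is None or len(times) == 0:
            continue
        duration = times[-1] - times[0]
        # 2. Drop tracks that are shorter than the duration threshold.
        if duration < min_duration:
            continue
        # 3. Drop tracks whose maximum power never exceeds the threshold.
        if max(track["powers"]) < min_power:
            continue
        # 4. Drop tracks with low coverage, i.e. few detections
        #    relative to the number of time steps the track spans.
        expected = duration / dt + 1
        if len(times) / expected < min_coverage:
            continue
        kept.append(track)
    return kept


full = [float(i) for i in range(20)]
tracks = [
    {"ident": 1, "times": full, "powers": [-60.0] * 20},       # passes
    {"ident": None, "times": full, "powers": [-60.0] * 20},    # unassigned
    {"ident": 2, "times": [0.0, 1.0, 2.0], "powers": [-60.0] * 3},   # short
    {"ident": 3, "times": full, "powers": [-95.0] * 20},       # low power
    {"ident": 4, "times": [0.0, 5.0, 19.0], "powers": [-60.0] * 3},  # gaps
]
kept = filter_tracks(tracks)
```

In this toy example only the first track survives; each of the others fails exactly one of the four checks.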
Further planned functions include power-based position estimation of individuals via triangulation.
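Since position estimation is not implemented yet, the following is only a sketch of one simple way to turn per-electrode power into a position: a power-weighted centroid of the strongest electrodes, used here as a stand-in for triangulation. The function name `estimate_position`, the choice of four electrodes, and the assumption that powers are given on a linear (not decibel) scale are all hypothetical.

```python
def estimate_position(electrode_xy, powers, n_strongest=4):
    """Power-weighted centroid of the n strongest electrodes.

    electrode_xy -- list of (x, y) electrode coordinates in meters
    powers       -- linear, positive power of one fish on each electrode
    """
    # Rank electrodes by power and keep the strongest ones.
    ranked = sorted(zip(powers, electrode_xy), reverse=True)[:n_strongest]
    total = sum(power for power, _ in ranked)
    # Weight each electrode position by its share of the total power.
    x = sum(power * xy[0] for power, xy in ranked) / total
    y = sum(power * xy[1] for power, xy in ranked) / total
    return x, y


# 2 x 2 grid with 0.5 m spacing (made-up geometry).
grid = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5)]

# Equal power everywhere places the fish in the center of the grid.
center = estimate_position(grid, [1.0, 1.0, 1.0, 1.0])

# More power on the first electrode pulls the estimate toward it.
biased = estimate_position(grid, [3.0, 1.0, 0.0, 0.0], n_strongest=2)
```

A weighted centroid can never leave the area spanned by the electrodes it uses, which is one reason a real implementation may prefer a proper field model.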
gridtools provides a toolbox to simulate grid recordings of
multiple communicating fish that move across an electrode grid.
The simulations can be based on parameters that are estimated from real
recordings.
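As a rough illustration of what such a simulation involves, the sketch below generates a recording of two fish moving across a small grid with NumPy. It does not use the gridtools simulation code: the function `simulate_fish`, the grid geometry, the inverse-square amplitude falloff, and the Gaussian chirp parameters are all assumptions chosen for brevity, not values estimated from real recordings.

```python
import numpy as np

rate = 20000                          # samples per second
t = np.arange(0, 2.0, 1.0 / rate)     # two seconds of recording

# 2 x 2 electrode grid with 0.5 m spacing (made-up geometry).
gx, gy = np.meshgrid([0.0, 0.5], [0.0, 0.5])
electrodes = np.column_stack([gx.ravel(), gy.ravel()])   # shape (4, 2)


def simulate_fish(eodf, start_xy, end_xy, chirp_time=None):
    """Simulate one fish swimming in a straight line (hypothetical model)."""
    # Baseline EOD frequency with an optional Gaussian chirp on top.
    freq = np.full(t.size, float(eodf))
    if chirp_time is not None:
        freq += 100.0 * np.exp(-0.5 * ((t - chirp_time) / 0.01) ** 2)
    # Integrate the frequency trace to obtain the phase of the EOD.
    phase = 2 * np.pi * np.cumsum(freq) / rate
    eod = np.sin(phase)
    # Linear movement from start_xy to end_xy over the recording.
    frac = (t / t[-1])[:, None]
    path = (1 - frac) * np.asarray(start_xy) + frac * np.asarray(end_xy)
    # Distance of the fish to every electrode at every time step.
    dist = np.linalg.norm(path[:, None, :] - electrodes[None, :, :], axis=2)
    # Assumed falloff: amplitude decreases with the squared distance.
    amplitude = 1.0 / (dist ** 2 + 0.1)
    return eod[:, None] * amplitude


# Two fish with different EODf; the first one chirps after one second.
recording = (
    simulate_fish(600, (-0.5, 0.25), (1.0, 0.25), chirp_time=1.0)
    + simulate_fish(750, (0.25, 1.0), (0.25, -0.5))
)
# Add a little seeded background noise.
rng = np.random.default_rng(0)
recording += 0.01 * rng.standard_normal(recording.shape)
```

The result has one column per electrode, so it has the same layout as a multichannel recording and can be fed into any analysis that expects samples by channels.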
Under construction.
Contributions are welcome. To develop the package, clone the repository and
install it in editable mode. Dev dependencies are managed with poetry.
To install them, run:
poetry install
Before committing, please run the tests and the linter:
poetry run black gridtools
poetry run pylint gridtools
poetry run pytest
Only commit if all tests pass and the linter does not report any errors. To make this easier, you can install a pre-commit hook that runs the linter and the tests before every commit:
pre-commit install
pre-commit run --all-files

- For position-pdf, use a von Mises distribution instead of a Gaussian
- For the fakegrid module, check out librosa and scaper, the latter specifically designed for soundscape mixing to train DNNs
- Migrate to datasets subpackage
- Convert the simulations module into a subpackage and refactor
- Write a converter that puts all wavetracker and other extracted data into an HDF5 container to clean datasets
- Build unit tests for gridtools.datasets
- Make a datavis subpackage that provides functions to visualize the data
  - Spec, track, pos terminal commands to quickly visualize datasets
  - Animation suite that gets a dataset and start and stop time to animate a full dataset
- Make the spectrogram decibel transform cutoff dynamic for an optimal signal-to-noise ratio in each 10 s window
- Move simulation parameters to a config file
- Fix the grid function once and for all
- Fix wrong track times in subset function
- Fix interpolating the parameter space so that the training dataset is uniformly distributed
  - Note: Became useless because I don't use the simulations anymore
- Add augmentations to the chirps that are simulated from the interpolated parameter space, i.e. noise, etc. No real chirp is a perfect Gaussian; only the average of all chirps is Gaussian.
  - Note: Became useless because I work with real chirps now
- Refactor gridtools datasets to new format without NoneTypes
- Build training data generation pipeline for Faster R-CNN
- Test the hybrid grid when data is available
- Port the chirp annotation GUI from the cnn-chirpdetector to gridtools and rewrite input data handling
- Concerning bounding boxes: They work well for short chirps. I have the impression that either long chirps or chirps with a high kurtosis result in boxes that are too large. I estimate the box width with the standard deviation, so maybe a high kurtosis results in a large standard deviation. But this does not explain why height is a problem as well. The kurtosis could also scale down the amplitude a bit ...
- The frequency bbox padding needs to be a sum, not a factor, that is basically the freq bin multiplied by a factor to tune it: amp + (freq_bin * factor)
- Make the extractor run on a folder of datasets instead of individual ones
- Update the parameter estimation function of the extractor to accommodate the simpler model that is fit to the frequency traces
- Rewrite the hybrid grid function to choose windows where no chirps of the real fish are produced, or at least take snippets that happen during the day when the fish are inactive
- Save the first and last spec image of a dataset in the same dimensions as the others but add zero padding
- 2023-10-26: Finished datasets module and added documentation.