Commit

Add section about related work in joss paper

aahlenst authored and ynop committed Jul 1, 2020
1 parent c4340c1 commit 31203e4
Showing 2 changed files with 63 additions and 1 deletion.
48 changes: 48 additions & 0 deletions paper/paper.bib
@@ -60,3 +60,51 @@ @misc{ardila2019common
archivePrefix={arXiv},
primaryClass={cs.CL}
}

@misc{torchaudio,
title = {{torchaudio}},
howpublished = {\url{https://pytorch.org/audio/}},
note = {Accessed: 2020-05-25},
year = {2020}
}

@misc{dataloaders,
title = {dataloaders},
howpublished = {\url{https://github.com/juliagusak/dataloaders}},
note = {Accessed: 2020-05-25},
year = {2020}
}

@misc{audiodatasets,
title = {Audio Datasets},
howpublished = {\url{https://github.com/mcfletch/audiodatasets}},
note = {Accessed: 2020-05-25},
year = {2020}
}

@misc{mirdata,
title = {mirdata},
howpublished = {\url{https://github.com/mir-dataset-loaders/mirdata}},
note = {Accessed: 2020-05-25},
year = {2020}
}

@misc{speechcorpusdownloader,
title = {Speech Corpus Downloader},
howpublished = {\url{https://github.com/mdangschat/speech-corpus-dl}},
note = {Accessed: 2020-05-25},
year = {2020}
}

@inproceedings{tensorflow,
author = {Mart{\'\i}n Abadi and Paul Barham and Jianmin Chen and Zhifeng Chen and Andy Davis and Jeffrey Dean and Matthieu Devin and Sanjay Ghemawat and Geoffrey Irving and Michael Isard and Manjunath Kudlur and Josh Levenberg and Rajat Monga and Sherry Moore and Derek G. Murray and Benoit Steiner and Paul Tucker and Vijay Vasudevan and Pete Warden and Martin Wicke and Yuan Yu and Xiaoqiang Zheng},
title = {TensorFlow: A System for Large-Scale Machine Learning},
booktitle = {12th {USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 16)},
year = {2016},
isbn = {978-1-931971-33-1},
address = {Savannah, GA},
pages = {265--283},
url = {https://www.usenix.org/conference/osdi16/technical-sessions/presentation/abadi},
publisher = {{USENIX} Association},
month = nov,
}
16 changes: 15 additions & 1 deletion paper/paper.md
@@ -78,7 +78,7 @@ Assume that the task is to train a neural network to detect segments in audio st
MUSAN [@musan2015] and GTZAN [@GTZAN] are two suitable datasets for this task because they provide a wide selection of music, speech, and noise samples.
In the example below, we first download MUSAN and GTZAN to the local disk before creating `Loader` instances for each format that allow Audiomate to access both datasets using a unified interface. Then, we instruct Audiomate to merge both datasets.
Afterwards, we use a `Splitter` to partition the merged dataset into a train and test set.
By merely creating views, Audiomate avoids creating unnecessary disk I/O and is therefore ideally suited to work with large datasets in the range of tens or hundreds of gigabytes.
Ultimately, we load the samples and labels by iterating over all utterances.
Alternatively, it is possible to load the samples in batches, which is ideal for feeding them to a deep learning toolkit like PyTorch.
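The view-based design described above can be illustrated with a minimal sketch. This is plain Python, not the Audiomate API: the dictionaries and paths below merely stand in for corpus objects to show why merging and splitting touch only an index, never the audio files themselves.

```python
import random

# A corpus is modelled here as an index: utterance id -> audio file path.
# (Paths and counts are illustrative.)
musan = {f"musan-{i}": f"/data/musan/{i}.wav" for i in range(60)}
gtzan = {f"gtzan-{i}": f"/data/gtzan/{i}.wav" for i in range(40)}

# Merging combines the indices only; no audio is read from disk.
merged = {**musan, **gtzan}

# Splitting partitions the index into train/test views.
ids = sorted(merged)
random.Random(0).shuffle(ids)
cut = int(len(ids) * 0.8)
train = {u: merged[u] for u in ids[:cut]}
test = {u: merged[u] for u in ids[cut:]}

# Samples would only be loaded later, while iterating a view,
# e.g. by reading merged[utt_id] one utterance at a time.
```

Because both operations manipulate only identifiers and paths, a split of a hundred-gigabyte corpus costs the same as a split of a toy corpus.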

@@ -129,4 +129,18 @@ Usually, `Reader` and `Downloader` are implemented for datasets, while `Writer`

Audiomate supports more than a dozen datasets and half as many toolkits.

# Related Work

A variety of frameworks and tools offer functionality similar to Audiomate.

**Data loaders** Data loaders are libraries that focus on downloading and preprocessing data sets to make them easily accessible without requiring a specific tool or framework.
In contrast to Audiomate, they can neither convert between formats nor split or merge data sets.

**@aahlenst** (Author, Collaborator), Jul 1, 2020:
> This section does not mention PyTorch. The lower one does, and explicitly calls out PyTorch's ability to do so.

**@faroit** (Contributor), Jul 1, 2020:
> okay 👍

Examples of libraries in that category are [@mirdata], [@speechcorpusdownloader], and [@audiodatasets].
Furthermore, some of these libraries focus on a particular kind of data, such as music, and do not assist with speech data sets.

**@faroit** (Contributor), Jul 1, 2020:
> that would be a somewhat weak argument until #110 is addressed

**@aahlenst** (Author, Collaborator), Jul 1, 2020:
> Of the currently supported data sets, GTZAN and MUSAN both include music (amongst other things).

**@aahlenst** (Author, Collaborator), Jul 1, 2020:
> urbansound8k and LITIS Rouen Audio provide noise and ambient sound samples from cities; both are supported by Audiomate and not by the others.

**@ynop** (Owner), Jul 1, 2020:
> But in contrast to the tools mentioned, we would support music-based datasets. And we also have different types of datasets (speech, acoustic scenes, classes (music/speech)).

**@faroit** (Contributor), Jul 1, 2020:
> Okay 👍 didn't see that


**Tools for specific frameworks** Various machine learning tools and deep learning frameworks include the necessary infrastructure to make numerous datasets readily available to their users.
One notable example is TensorFlow [@tensorflow], which includes data loaders for different kinds of data, including image, speech, and music data sets, such as [@ardila2019common].
Another one is torchaudio [@torchaudio] for PyTorch, which not only offers data loaders but is also capable of converting between various formats.
In contrast, these tools and libraries each target a specific machine learning or deep learning framework (TensorFlow and PyTorch, respectively), whereas Audiomate is framework-agnostic.

**@faroit** (Contributor), Jul 1, 2020:
> but audiomate does focus on numpy. So I would add and cite this information about numpy here (if not done before)

**@ynop** (Owner), Jul 1, 2020:
> I wouldn't primarily classify numpy as a machine learning/deep learning framework

**@faroit** (Contributor), Jul 1, 2020:
> yes, okay. Maybe then just reference numpy somewhere earlier? I think it is not clear for non-audio people that numpy arrays are the common representation for audio
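The point raised in this thread can be made concrete with a small sketch: in the Python audio ecosystem, a mono recording is conventionally just a one-dimensional numpy array of sample values, so any framework that accepts numpy arrays can consume it without conversion. The sample rate and signal below are invented for illustration only.

```python
import numpy as np

sample_rate = 16000                            # samples per second (assumed)
duration = 0.5                                 # seconds
t = np.arange(int(sample_rate * duration)) / sample_rate
signal = 0.5 * np.sin(2 * np.pi * 440.0 * t)   # a 440 Hz tone

# The array itself is the audio: one float per sample, 8000 samples here.
# This shared representation is what makes a framework-agnostic library viable.
print(signal.shape)   # (8000,)
```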


# References
