Commit
that explicitly documents naming conventions, so that we can reference this elsewhere, as described in #524.
1 parent 1040499, commit 96095c3
Showing 2 changed files with 116 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,115 @@ | ||
(file-naming-conventions)= | ||
|
||
# File naming conventions | ||
|
||
This page documents naming conventions | ||
for data files consumed by `vak`: | ||
audio, annotation, and spectrogram files. | ||
In some cases `vak` generates these files itself, | ||
but they differ from the other files it produces | ||
in that they are the inputs required to produce any output, | ||
e.g., the files that represent training | ||
and test datasets and the files | ||
that represent parameters of trained | ||
neural network models. | ||
|
||
## Audio files | ||
|
||
We assume audio files are the raw data | ||
that all other derived data can be | ||
traced back to. | ||
For this reason, any filename | ||
that is valid on the system is valid for `vak`. | ||
|
||
(annotation-file-naming-convention)= | ||
## Annotation files | ||
|
||
There are two ways that annotation files | ||
can refer to the files they annotate | ||
(below, "annotated" files, | ||
either audio or spectrogram files). | ||
The first is when there is a one-to-one | ||
relationship; each annotated file | ||
has a corresponding annotation file. | ||
The second is when a single annotation | ||
file contains annotations for multiple | ||
annotated files. | ||
|
||
### One annotation file per annotated file | ||
When there is one annotation file | ||
per annotated file, | ||
we assume that | ||
the name of each annotation file contains | ||
the full name of the file that it annotates, | ||
including that file's extension. | ||
|
||
For example, if you have an audio file named | ||
"BB_SGP16-1___20160521_214723.wav", | ||
then the annotation file should | ||
be named "BB_SGP16-1___20160521_214723.wav.csv". | ||
|
||
This convention makes it possible | ||
to have other files with the .csv extension | ||
in the same directory, | ||
e.g., if you are also extracting | ||
features from each audio file | ||
and storing them in a .csv file. | ||
A file named "BB_SGP16-1___20160521_214723.wav.csv" | ||
can coexist with | ||
"BB_SGP16-1___20160521_214723.ftr.csv". | ||
|
||
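The convention above can be sketched in a few lines of Python.
This is a minimal illustration, not code from `vak`:
the helper name `annot_path_for` is hypothetical,
and the `.csv` extension is simply the one used in the example above.

```python
from pathlib import Path


def annot_path_for(annotated_path, annot_ext=".csv"):
    """Return the annotation path expected for one annotated file.

    Hypothetical helper (not part of vak): it appends the annotation
    extension to the *full* name of the annotated file, so the
    original extension (e.g. ".wav") is preserved.
    """
    annotated_path = Path(annotated_path)
    return annotated_path.with_name(annotated_path.name + annot_ext)


audio = Path("BB_SGP16-1___20160521_214723.wav")
annot = annot_path_for(audio)
print(annot.name)  # BB_SGP16-1___20160521_214723.wav.csv

# A features file built from the stem gets a different name,
# so both .csv files can live in the same directory.
features = audio.with_name(audio.stem + ".ftr.csv")
print(features.name)  # BB_SGP16-1___20160521_214723.ftr.csv
```

Because the audio extension stays in the annotation filename,
the annotated file can be recovered by dropping only the final extension.
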
This assumption will be relaxed | ||
after resolving [issue #563](https://github.com/vocalpy/vak/issues/563). | ||
It will then be possible | ||
to alternatively name annotation files | ||
with the "stem" of the annotated file, | ||
i.e., the part of the filename before | ||
the extension. | ||
For example, if you have an audio file named | ||
"BB_SGP16-1___20160521_214723.wav", | ||
then the annotation file could | ||
be named "BB_SGP16-1___20160521_214723.csv". | ||
|
||
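To make the difference between the two conventions concrete,
here is a small sketch of a check that accepts either one.
It is an assumption about how such a check could look,
not the behavior of `vak` or the resolution of the issue linked above;
the function name `annotates` is hypothetical.

```python
from pathlib import Path


def annotates(annot_name, annotated_name):
    """Return True if annot_name could be the annotation for annotated_name.

    Hypothetical check (not part of vak). After dropping the annotation
    file's own extension, what is left must equal either the full name
    of the annotated file (current convention) or its stem (the relaxed
    convention described above).
    """
    annot_stem = Path(annot_name).stem
    annotated = Path(annotated_name)
    return annot_stem in (annotated.name, annotated.stem)


audio = "BB_SGP16-1___20160521_214723.wav"
print(annotates("BB_SGP16-1___20160521_214723.wav.csv", audio))  # True
print(annotates("BB_SGP16-1___20160521_214723.csv", audio))      # True
print(annotates("BB_SGP16-1___20160521_214723.ftr.csv", audio))  # False
```
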
If you are unable to follow | ||
the current naming convention, | ||
a workaround is to convert your | ||
annotations to a single file | ||
that points to multiple annotated files | ||
using the | ||
[`'generic-seq'` format](https://crowsetta.readthedocs.io/en/latest/formats/seq/generic-seq.html#generic-seq) | ||
built into our tool for working with annotations | ||
[`crowsetta`](https://crowsetta.readthedocs.io/en/latest/), | ||
as described in the section | ||
{ref}`howto-user-annot-format-method-2` | ||
on the page | ||
{ref}`howto-user-annot`. | ||
This is one case | ||
of "single annotation file, multiple annotated files", | ||
which is not restricted by a file-naming convention, | ||
as described in the next section. | ||
|
||
### One annotation file, multiple annotated files | ||
|
||
When a single annotation file contains | ||
annotations for multiple files, | ||
there are no restrictions on the naming | ||
of the annotation file. | ||
This is because the annotation file itself | ||
must contain the name of each file that it | ||
annotates. | ||
An example of this format is the | ||
`'generic-seq'` format used by `crowsetta`. | ||
|
||
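The sketch below illustrates why no naming convention is needed in this case:
each row of the annotation file names the file it annotates.
The table is a toy example, and its column names
(including `notated_path`) and values are assumptions made for illustration;
consult the `crowsetta` documentation for the actual `'generic-seq'` schema.
Only the Python standard library is used.

```python
import csv
import io
from collections import defaultdict

# Toy annotation table for illustration only.
# Column names and values are assumptions, not the real schema.
CSV_TEXT = """\
label,onset_s,offset_s,notated_path
a,0.10,0.25,bird1_song1.wav
b,0.30,0.42,bird1_song1.wav
a,0.05,0.20,bird1_song2.wav
"""


def segments_by_file(csv_text):
    """Group (label, onset, offset) tuples by the file each row annotates."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        grouped[row["notated_path"]].append(
            (row["label"], float(row["onset_s"]), float(row["offset_s"]))
        )
    return dict(grouped)


segments = segments_by_file(CSV_TEXT)
print(sorted(segments))  # ['bird1_song1.wav', 'bird1_song2.wav']
```

The annotation file could have any name here,
since the mapping to annotated files comes from its contents.
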
(spectrogram-file-naming-convention)= | ||
## Spectrogram files | ||
|
||
As described in {ref}`spect-file-format`, | ||
the name of each spectrogram file should be the same | ||
as the name of the audio file it was created from, | ||
with the extension of the spectrogram file format added. | ||
|
||
For example, if you have an audio file named | ||
"BB_SGP16-1___20160521_214723.wav", | ||
then the spectrogram file should | ||
be named "BB_SGP16-1___20160521_214723.wav.npz". |
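
As a minimal sketch of this convention,
the mapping between audio and spectrogram filenames can be written in both directions.
The helper names below are hypothetical and not part of `vak`;
the `.npz` extension is the one used in the example above.

```python
from pathlib import Path


def spect_path_for(audio_path, spect_ext=".npz"):
    """Return the spectrogram path expected for one audio file.

    Hypothetical helper: appends the spectrogram extension to the
    full audio filename, keeping the audio extension.
    """
    audio_path = Path(audio_path)
    return audio_path.with_name(audio_path.name + spect_ext)


def audio_name_from_spect(spect_path):
    """Recover the audio filename by dropping only the final extension."""
    return Path(spect_path).stem


spect = spect_path_for("BB_SGP16-1___20160521_214723.wav")
print(spect.name)                    # BB_SGP16-1___20160521_214723.wav.npz
print(audio_name_from_spect(spect))  # BB_SGP16-1___20160521_214723.wav
```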
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -8,5 +8,6 @@ | |
cli | ||
config | ||
filenames | ||
spect_file_format | ||
``` |