Add doc/reference/filenames.md
that explicitly documents naming conventions,
so that we can reference this elsewhere,
as described in #524.
NickleDave committed Aug 17, 2022
1 parent 1040499 commit 96095c3
Showing 2 changed files with 116 additions and 0 deletions.
115 changes: 115 additions & 0 deletions doc/reference/filenames.md
@@ -0,0 +1,115 @@
(file-naming-conventions)=

# File naming conventions

This page documents naming conventions
for data files consumed by `vak`:
audio, annotation, and spectrogram files.
Some of these files may themselves
be generated by `vak`,
but they differ from other files
in that they are required to produce any outputs,
e.g., the files that represent training
and test datasets and the files
that store the parameters of trained
neural network models.

## Audio files

We assume audio files are the raw data
that all other derived data can be
traced back to.
For this reason, any filename
that is valid on the system is valid for `vak`.

(annotation-file-naming-convention)=
## Annotation files

There are two ways that annotation files
can refer to the files they annotate
(below, "annotated" files,
either audio or spectrogram files).
The first is when there is a one-to-one
relationship; each annotated file
has a corresponding annotation file.
The second is when a single annotation
file contains annotations for multiple
annotated files.

### One annotation file per annotated file
When there is one annotation file
per annotated file,
we assume that the name of
each annotation file contains
the full name of the audio file
that it annotates, including its extension.

For example, if you have an audio file named
"BB_SGP16-1___20160521_214723.wav",
then the annotation file should
be named "BB_SGP16-1___20160521_214723.wav.csv".
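
The convention can be sketched in a few lines of Python.
This is only an illustration, not how `vak` itself pairs files:
the helper `expected_annot_name` and the directory listing
are hypothetical, and the `.csv` extension is taken from the example above.

```python
from pathlib import Path


def expected_annot_name(audio_name, annot_ext=".csv"):
    """Hypothetical helper: full audio filename (extension included) + annot_ext."""
    return Path(audio_name).name + annot_ext


audio_name = "BB_SGP16-1___20160521_214723.wav"

# Hypothetical names that might sit side by side in one directory.
dir_listing = [
    "BB_SGP16-1___20160521_214723.wav",
    "BB_SGP16-1___20160521_214723.wav.csv",
    "BB_SGP16-1___20160521_214723.ftr.csv",
]

annot_name = expected_annot_name(audio_name)
# Only an exact match on the expected name counts as the annotation file;
# other .csv files that merely share the stem are ignored.
matches = [name for name in dir_listing if name == annot_name]
print(matches)
```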

This convention makes it possible
to have other files with the .csv extension
in the same directory,
e.g., if you are also extracting
features from each audio file
and storing them in a .csv file.
A file named "BB_SGP16-1___20160521_214723.wav.csv"
can coexist with
"BB_SGP16-1___20160521_214723.ftr.csv".

This assumption will be relaxed
after resolving [issue #563](https://github.com/vocalpy/vak/issues/563).
It will then be possible
to alternatively name annotation files
with the "stem" of the annotated file,
i.e., the part of the filename before
the extension.
For example, if you have an audio file named
"BB_SGP16-1___20160521_214723.wav",
then the annotation file could
be named "BB_SGP16-1___20160521_214723.csv".
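
As a rough sketch of the difference between the two schemes,
the snippet below builds both names with the standard library's `pathlib`.
The variable names are illustrative, and nothing here reflects
how `vak` will implement the relaxed convention.

```python
from pathlib import Path

audio = Path("BB_SGP16-1___20160521_214723.wav")

# Current convention: keep the audio extension, then add the annotation extension.
current_name = audio.name + ".csv"

# Possible future alternative: replace the audio extension,
# i.e., use only the stem of the annotated file.
stem_name = audio.stem + ".csv"

print(current_name)
print(stem_name)
```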

If you are unable to follow
the current naming convention,
a workaround is to convert your
annotations to a single file
that points to multiple annotated files
using the
[`'generic-seq'` format](https://crowsetta.readthedocs.io/en/latest/formats/seq/generic-seq.html#generic-seq)
built into our tool for working with annotations
[`crowsetta`](https://crowsetta.readthedocs.io/en/latest/),
as described in the section
{ref}`howto-user-annot-format-method-2`
on the page
{ref}`howto-user-annot`.
This is one case
of "single annotation file, multiple annotated files",
which is not restricted by a file-naming convention,
as described in the next section.

### One annotation file, multiple annotated files

When a single annotation file contains
annotations for multiple files,
there are no restrictions on the naming
of the annotation file.
This is because the annotation file itself
must contain the name of each file that it
annotates.
An example of this format is the
`'generic-seq'` format used by `crowsetta`.
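
To illustrate why the annotation file's own name does not matter in this case,
here is a minimal sketch that groups annotations by the file they refer to.
It uses only the standard library, not `crowsetta`,
and the column names (`label`, `onset_s`, `offset_s`, `notated_path`)
and file names are assumptions made for the example;
consult the `crowsetta` documentation for the actual `'generic-seq'` schema.

```python
import csv
import io
from collections import defaultdict

# Hypothetical contents of a single annotation file.
# The column names are assumptions for this sketch.
ANNOT_CSV = """\
label,onset_s,offset_s,notated_path
a,0.10,0.25,bird1_0001.wav
b,0.30,0.42,bird1_0001.wav
a,0.05,0.20,bird1_0002.wav
"""

# Each row names the file it annotates, so rows can be grouped
# without looking at the name of the annotation file itself.
segments_by_file = defaultdict(list)
for row in csv.DictReader(io.StringIO(ANNOT_CSV)):
    segments_by_file[row["notated_path"]].append(
        (row["label"], float(row["onset_s"]), float(row["offset_s"]))
    )

for annotated_file, segments in segments_by_file.items():
    print(annotated_file, segments)
```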

(spectrogram-file-naming-convention)=
## Spectrogram file naming convention

As described in {ref}`spect-file-format`,
the name of each spectrogram file should be the same
as the name of the audio file it was created from,
with the extension of the spectrogram file format added.

For example, if you have an audio file named
"BB_SGP16-1___20160521_214723.wav",
then the spectrogram file should
be named "BB_SGP16-1___20160521_214723.wav.npz".
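
A minimal sketch of this mapping, in both directions, is shown below.
The helper names `spect_name_for` and `audio_name_from_spect` are hypothetical,
and the `.npz` extension is taken from the example above.

```python
from pathlib import Path


def spect_name_for(audio_name, spect_ext=".npz"):
    """Hypothetical helper: audio filename plus the spectrogram file extension."""
    return Path(audio_name).name + spect_ext


def audio_name_from_spect(spect_name):
    """Hypothetical helper: drop the final extension to recover the audio filename."""
    return Path(spect_name).stem


audio_name = "BB_SGP16-1___20160521_214723.wav"
spect_name = spect_name_for(audio_name)
print(spect_name)
print(audio_name_from_spect(spect_name))
```

Because the audio extension is kept, the audio filename can be recovered
from the spectrogram filename by removing only the last extension.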
1 change: 1 addition & 0 deletions doc/reference/index.md
@@ -8,5 +8,6 @@
cli
config
filenames
spect_file_format
```
