Add doc/reference/filenames.md

that explicitly documents naming conventions, so that we can reference this elsewhere, as described in #524.
vocalpy · Aug 17, 2022 · 96095c3
1 parent 1040499
Showing 2 changed files with 116 additions and 0 deletions.
diff --git a/doc/reference/filenames.md b/doc/reference/filenames.md
@@ -0,0 +1,115 @@
+(file-naming-conventions)=
+
+# File naming conventions
+
+This page documents naming conventions 
+for data files consumed by `vak`: 
+audio, annotation, and spectrogram files.
+Some of these files may be generated 
+by `vak` itself, but they differ 
+from the other files `vak` produces 
+in that they are required to produce any outputs, 
+e.g., the files that represent training 
+and test datasets and the files 
+that represent parameters of trained 
+neural network models.
+
+## Audio files
+
+We assume audio files are the raw data 
+that all other derived data can be 
+traced back to.
+For this reason, any filename 
+that is valid on your operating system is valid for `vak`. 
+
+(annotation-file-naming-convention)=
+## Annotation files
+
+There are two ways that annotation files 
+can refer to the files they annotate 
+(below, "annotated" files, 
+either audio or spectrogram files).
+The first is when there is a one-to-one 
+relationship; each annotated file 
+has a corresponding annotation file.
+The second is when a single annotation 
+file contains annotations for multiple 
+annotated files.
+
+### One annotation file per annotated file
+When there is one annotation file 
+per annotated file,
+we assume that 
+the name of each annotation file 
+contains the full name of the file 
+that it annotates, including its extension.
+
+For example, if you have an audio file named 
+"BB_SGP16-1___20160521_214723.wav", 
+then the annotation file should 
+be named "BB_SGP16-1___20160521_214723.wav.csv".
+
+This convention makes it possible 
+to have other files with the .csv extension 
+in the same directory, 
+e.g., if you are also extracting 
+features from each audio file 
+and storing them in a .csv file.
+A file named "BB_SGP16-1___20160521_214723.wav.csv" 
+can coexist with 
+"BB_SGP16-1___20160521_214723.ftr.csv".
+
+This assumption will be relaxed 
+after resolving [issue #563](https://github.com/vocalpy/vak/issues/563). 
+It will then be possible 
+to alternatively name annotation files 
+with the "stem" of the annotated file, 
+i.e., the part of the filename before 
+the extension.
+For example, if you have an audio file named 
+"BB_SGP16-1___20160521_214723.wav", 
+then the annotation file could 
+be named "BB_SGP16-1___20160521_214723.csv".
+
+If you are unable to follow 
+the current naming convention, 
+a workaround is to convert your 
+annotations to a single file 
+that points to multiple annotated files
+using the 
+[`'generic-seq'` format](https://crowsetta.readthedocs.io/en/latest/formats/seq/generic-seq.html#generic-seq) 
+built into our tool for working with annotations
+[`crowsetta`](https://crowsetta.readthedocs.io/en/latest/),
+as described in the section 
+{ref}`howto-user-annot-format-method-2`
+on the page 
+{ref}`howto-user-annot`.
+This is one case 
+of "single annotation file, multiple annotated files", 
+which is not restricted by a file-naming convention, 
+as described in the next section.
+
+### One annotation file, multiple annotated files
+
+When a single annotation file contains 
+annotations for multiple files, 
+there are no restrictions on the naming 
+of the annotation file. 
+This is because the annotation file itself 
+must contain the name of each file that it 
+annotates.
+An example of this format is the 
+`'generic-seq'` format used by `crowsetta`.
+
+(spectrogram-file-naming-convention)=
+## Spectrogram file naming convention
+
+As described in {ref}`spect-file-format`, 
+the name of each spectrogram file should be the same 
+as the name of the audio file it was created from, 
+with the extension of the spectrogram file format added.
+
+For example, if you have an audio file named 
+"BB_SGP16-1___20160521_214723.wav", 
+then the spectrogram file should 
+be named "BB_SGP16-1___20160521_214723.wav.npz".
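
Note on the conventions documented in this file: both the one-to-one annotation convention and the spectrogram convention amount to appending an extension to the full name of the source file, keeping its original extension. The sketch below illustrates that rule in Python. The helper names `annotation_filename` and `spectrogram_filename` are hypothetical and are not part of the `vak` API, and the default extensions are simply the ones used in the examples in the page.

```python
from pathlib import Path


def annotation_filename(annotated_file: str, annot_ext: str = ".csv") -> str:
    """Hypothetical helper (not part of vak): name of the annotation file
    for one annotated file, formed by appending the annotation extension
    to the full filename, original extension included."""
    return Path(annotated_file).name + annot_ext


def spectrogram_filename(audio_file: str, spect_ext: str = ".npz") -> str:
    """Hypothetical helper (not part of vak): name of the spectrogram file
    for one audio file, formed by appending the spectrogram file
    extension to the full audio filename."""
    return Path(audio_file).name + spect_ext


audio = "BB_SGP16-1___20160521_214723.wav"
print(annotation_filename(audio))
print(spectrogram_filename(audio))
```

Because the original extension is kept, a name built this way cannot collide with a sibling file such as "BB_SGP16-1___20160521_214723.ftr.csv".
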
diff --git a/doc/reference/index.md b/doc/reference/index.md
@@ -8,5 +8,6 @@
 
 cli
 config
+filenames
 spect_file_format
 ```