MRG: #216 from vocalpy/add-revise-vignettes

Add / revise vignettes
vocalpy · Jan 28, 2023 · f36a493
2 parents 9d435f6 + bdc578b
commit f36a493

Showing 14 changed files with 1,506 additions and 124 deletions.
diff --git a/doc/conf.py b/doc/conf.py
@@ -62,8 +62,7 @@
 # The suffix(es) of source filenames.
 # You can specify multiple suffix as a list of string:
 #
-# source_suffix = ['.rst', '.md']
-source_suffix = '.rst'
+source_suffix = ['.rst', '.md']
 
 # The master toctree document.
 master_doc = 'index'
@@ -84,7 +83,7 @@
 pygments_style = None
 
 myst_enable_extensions = [
-    # "dollarmath",
+    "dollarmath",
     # "amsmath",
     # "deflist",
     # "html_admonition",
@@ -111,6 +110,7 @@
 #
 html_theme_options = {
     "logo_only": True,
+    "show_toc_level": 1,
 }
 
 # Add any paths that contain custom static files (such as style sheets) here,
@@ -217,6 +217,20 @@
     "pandera": ("https://pandera.readthedocs.io/en/stable/", None)
 }
 
+# -- Options for nitpicky mode -----------------------------------------------
+
+# ensure that all references in the docs resolve.
+nitpicky = True
+nitpick_ignore = []
+
+with open('nitpick-ignore.txt') as fh:
+    for line in fh:
+        if line.strip() == "" or line.startswith("#"):
+            continue
+        dtype, target = line.split(None, 1)
+        nitpick_ignore.append((dtype, target.strip()))
+
+
 # -- Options for todo extension ----------------------------------------------
 
 # If true, `todo` and `todoList` produce output, else they produce nothing.

diff --git a/doc/data/giraudon-et-al-2021/.gitkeep b/doc/data/giraudon-et-al-2021/.gitkeep
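The nitpick-ignore loop added to `doc/conf.py` reads one `<type> <target>` pair per line and builds the `(type, target)` tuples that Sphinx expects in `nitpick_ignore`. Below is a minimal sketch of the same parsing as a standalone function, so the behavior can be checked without running Sphinx. It makes a few assumptions. The function name `parse_nitpick_ignore` is hypothetical. The sample entries are made up. The `py:class <target>` line layout is a guess at what `nitpick-ignore.txt` contains. Like the original loop, the sketch treats a line as a comment only when `#` is its very first character.

```python
import io
from typing import Iterable, List, Tuple


def parse_nitpick_ignore(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Turn '<type> <target>' lines into (type, target) tuples.

    Hypothetical helper that mirrors the loop in doc/conf.py.
    """
    pairs = []
    for line in lines:
        # skip blank lines and lines starting with '#', as the conf.py loop does
        if line.strip() == "" or line.startswith("#"):
            continue
        # split once on the first run of whitespace: type first, then the target
        dtype, target = line.split(None, 1)
        # strip the trailing newline and any trailing spaces from the target
        pairs.append((dtype, target.strip()))
    return pairs


# hypothetical file contents, not taken from the repository
example = io.StringIO(
    "# references Sphinx cannot resolve\n"
    "\n"
    "py:class pandas.DataFrame\n"
    "py:class  numpy.ndarray  \n"
)
print(parse_nitpick_ignore(example))
```

Splitting with `maxsplit=1` keeps the whole remainder of the line as the target, so a target containing spaces would still be kept in one piece.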
diff --git a/doc/howto.md b/doc/howto.md
@@ -9,4 +9,6 @@ This section shows you how to use crowsetta for specific tasks.
 
 howto/howto-user-format
 howto/convert-generic-seq
+howto/convert-simple-seq
+howto/remove-silent-labels-textgrid
 ```
diff --git a/doc/howto/convert-generic-seq.md b/doc/howto/convert-generic-seq.md
@@ -4,26 +4,48 @@ jupytext:
     extension: .md
     format_name: myst
     format_version: 0.13
-    jupytext_version: 1.13.8
+    jupytext_version: 1.14.4
 kernelspec:
   display_name: Python 3 (ipykernel)
   language: python
   name: python3
+execution:
+  timeout: 120
 ---
 
 (howto-convert-to-generic-seq)=
 # How to convert any sequence-like format to `'generic-seq'`
 
-The `'generic-seq'` format is 
-meant to be a generic sequence-like format
-(as suggested by its name) 
-that all other formats can be converted to.
-As explained on its  
-{ref}`documentation <generic-seq>` page,
-a set of `generic-seq` annotations is 
-literally a set of `crowsetta.Annotation` instances 
-where each `Annotation` has a `Sequence`.
-
+A goal of crowsetta is to make it easier to share annotations 
+for a dataset of animal vocalizations or other bioacoustics data.
+One way to achieve this is to 
+convert the annotations to a single flat csv file,
+which is easy to share and work with, 
+e.g., using the [pandas](https://pandas.pydata.org/) library.
+For {ref}`sequence-like <formats-seq-like>` annotations, 
+this can be done by converting them to the `'generic-seq'` format.
+
+This how-to walks you through converting 
+annotations to the `'generic-seq'` format and 
+then saving those annotations as a csv file.
+As its name suggests,
+the `'generic-seq'` format is meant to be a generic sequence-like format
+that all other sequence-like formats can be converted to.
+
+## Workflow
+
+Here's the general workflow. We'll see a few different ways to achieve it below.
+1. Load annotations in your format.
+2. Convert those to {class}`crowsetta.Annotation` instances.
+3. Make a {class}`crowsetta.formats.seq.GenericSeq <crowsetta.formats.seq.generic.GenericSeq>` 
+   from those `Annotation`s.
+4. Save to a csv file using the 
+   {meth}`crowsetta.formats.seq.generic.GenericSeq.to_file <crowsetta.formats.seq.generic.GenericSeq.to_file>` 
+   method.
+
+This works because `crowsetta` represents a set of annotations in `generic-seq` format 
+as a list of {class}`crowsetta.Annotation` instances 
+where each `Annotation` has a {class}`crowsetta.Sequence`.
 Since all sequence-like formats have a `to_annot` 
 method, they can all be converted to `'generic-seq'`.
 In turn, this means that any sequence-like format 
@@ -32,52 +54,48 @@ by creating a `'generic-seq'` instance with the
 `Annotations` produced by calling `to_annot` 
 and then calling the `to_file` method of 
 the `'generic-seq'` instance.
-Saving annotations to a single flat .csv file 
-may make it easier to share and 
-work with them  
-(e.g., using the {ref}`pandas <https://pandas.pydata.org/>` library).
-
-## Converting a sequence-like format with multiple annotations per file
-
-Some formats contain multiple annotations per file, 
-and the `to_annot` method of the corresponding class 
-will return multiple `crowsetta.Annotation` instances. 
-To convert this format to `'generic-seq'`, 
-just pass in those `Annotation`s when 
-creating an instance of `'generic-seq'`
-
-```{code-cell} ipython3
-import crowsetta
-
-example = crowsetta.data.get('birdsong-recognition-dataset')
-birdsongrec = crowsetta.formats.seq.BirdsongRec.from_file(example.annot_path)
-annots = birdsongrec.to_annot()
-print(
-    f"Number of annotation instances in example 'birdsong-recognition-dataset' file: {len(annots)}"
-) 
-
-# pass in annots when creating generic-seq instance
-generic = crowsetta.formats.seq.GenericSeq(annots=annots)
-print(
-    f"Converted to 'generic-seq':\n{generic}"
-)
-```
 
 ## Converting a sequence-like format with a single annotation file per annotated file
 
-When the convention for a format 
-is to have a one-to-one mapping 
-from annotated file to annotation file, 
-and we want to put multiple such annotations 
-into a single generic sequence file,
-we need to go through an additional step.
-That step consists of collecting all the annotations into a list.
-
-For this example, 
-we use the same dataset we used in the {ref}`tutorial`, 
+The first example covers what is probably the most common case, 
+where each annotated file has a single annotation file.
+This is likely to be the case if you are using apps like Praat or Audacity.
+An example of such a format is the Audacity 
+[standard label track format](https://manual.audacityteam.org/man/importing_and_exporting_labels.html#Standard_.28default.29_format), 
+i.e., the .txt files you get by exporting annotations made with 
+[region labels](https://manual.audacityteam.org/man/label_tracks.html#type).
+This format is represented by the 
+{class}`crowsetta.formats.seq.AudTxt <crowsetta.formats.seq.audtxt.AudTxt>` 
+class in crowsetta.
+
+As described above,
+all you need to do is load your sequence-like annotations 
+with crowsetta, 
+and then call the `to_annot` method 
+to convert them to {class}`crowsetta.Annotation` instances.
+When working with a format 
+where there's one annotation file per annotated file, 
+this *does* mean you need to load **each** file 
+and convert it into a separate annotation instance.
+(Below we'll see an example of a format 
+where annotations for multiple files 
+are contained in a single annotation file, 
+and so we only need to call `to_annot` once 
+after loading it to get a list of 
+{class}`crowsetta.Annotation`s.)
+For this first example, 
+where we have multiple annotation files, 
+we use a loop to load each one and convert it to a 
+{class}`crowsetta.Annotation` instance.
+
+We use the same dataset we used in the {ref}`tutorial` for this example, 
 ["Labeled songs of domestic canary M1-2016-spring (Serinus canaria)"](https://zenodo.org/record/6521932)
 by Giraudon et al., 2021, 
-annotated with {ref}`Audacity Labeltrack {aud-txt}` files.
+annotated with {ref}`Audacity Labeltrack <aud-txt>` files.
+
+```{code-cell} ipython3
+cd ..
+```
 
 ```{code-cell} ipython3
 !curl --no-progress-meter -L 'https://zenodo.org/record/6521932/files/M1-2016-spring_audacity_annotations.zip?download=1' -o './data/M1-2016-spring_audacity_annotations.zip'
@@ -88,6 +106,8 @@ import shutil
 shutil.unpack_archive('./data/M1-2016-spring_audacity_annotations.zip', './data/')
 ```
 
+% TODO: show with scribe and then with class, explain difference
+
 ```{code-cell} ipython3
 import pathlib
 import crowsetta
@@ -97,16 +117,64 @@ audtxt_paths = sorted(pathlib.Path('./data/audacity-annotations').glob('*.txt'))
 annots = []
 for audtxt_path in audtxt_paths:
     annots.append(
-        crowsetta.formats.seq.AudTxt.from_file(audtxt_path).to_annot
+        crowsetta.formats.seq.AudTxt.from_file(audtxt_path).to_annot()
     )
 
 print(
-    f"Number of annotation instances from Giraudon et al. 2021: {len(annots}"
+    f"Number of annotation instances from dataset: {len(annots)}"
 ) 
+```
 
+```{code-cell} ipython3
 # pass in annots when creating generic-seq instance
 generic = crowsetta.formats.seq.GenericSeq(annots=annots)
+print("Created 'generic-seq' from annotations")
+df = generic.to_df()
+print("First five rows of annotations (converted to pandas.DataFrame)")
+df.head()
+```
+
+```{code-cell} ipython3
+print("Last five rows of annotations (converted to pandas.DataFrame)")
+df.tail()
+```
+
+## Converting a sequence-like format with multiple annotations per file
+
+Some formats contain multiple annotations per file, 
+and the `to_annot` method of the corresponding class 
+will return multiple `crowsetta.Annotation` instances. 
+To convert such a format to `'generic-seq'`, 
+just pass in those `Annotation`s when 
+creating a `'generic-seq'` instance.
+
+```{code-cell} ipython3
+:tags: [hide-cell]
+
+import crowsetta
+
+crowsetta.data.extract_data_files()
+```
+
+```{code-cell} ipython3
+import crowsetta
+
+example = crowsetta.data.get('birdsong-recognition-dataset')
+birdsongrec = crowsetta.formats.seq.BirdsongRec.from_file(example.annot_path)
+annots = birdsongrec.to_annot()
 print(
-    f"Converted to 'generic-seq':\n{generic}"
-)
+    f"Number of annotation instances in example 'birdsong-recognition-dataset' file: {len(annots)}"
+) 
+
+# pass in annots when creating generic-seq instance
+generic = crowsetta.formats.seq.GenericSeq(annots=annots)
+print("Created 'generic-seq' from annotations")
+df = generic.to_df()
+print("First five rows of annotations (converted to pandas.DataFrame)")
+df.head()
+```
+
+```{code-cell} ipython3
+print("Last five rows of annotations (converted to pandas.DataFrame)")
+df.tail()
 ```
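The crowsetta cells in these vignettes need the package and the downloaded data. The sketch below shows the same idea using only the standard library. It reads each Audacity label track file, collects every segment into one list, and writes a single flat csv with one row per segment. It is not crowsetta's implementation: the helper functions and column names are hypothetical, and they only loosely resemble a `'generic-seq'` table. The sketch also assumes each label file contains only tab-separated `start<TAB>end<TAB>label` region lines, with times in seconds and no frequency data.

```python
import csv
import io
from typing import Dict, List

# Hypothetical column names, loosely modeled on a 'generic-seq'-style table;
# crowsetta's actual GenericSeq schema may differ.
FIELDNAMES = ["annot_path", "onset_s", "offset_s", "label"]


def read_label_track(text: str, annot_path: str) -> List[Dict[str, object]]:
    """Parse label track text with one 'start<TAB>end<TAB>label' line per region."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        onset, offset, label = line.split("\t", 2)
        rows.append(
            {
                "annot_path": annot_path,  # remember which file each segment came from
                "onset_s": float(onset),
                "offset_s": float(offset),
                "label": label,
            }
        )
    return rows


def to_flat_csv(files: Dict[str, str]) -> str:
    """Load each annotation file, collect all segments, and write one flat csv."""
    rows = []
    for annot_path in sorted(files):  # one annotation file per annotated file
        rows.extend(read_label_track(files[annot_path], annot_path))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# made-up example data, not from the Giraudon et al. 2021 dataset
files = {
    "song1.txt": "0.100\t0.350\tA\n0.400\t0.620\tB\n",
    "song2.txt": "0.050\t0.200\tA\n",
}
print(to_flat_csv(files))
```

In the vignettes, `GenericSeq.to_file` (or `to_df` followed by pandas' `DataFrame.to_csv`) performs this step. The sketch only illustrates why a single flat table works as a sharing format. Every row carries the path of the annotation file it came from, so segments from many files can live together in one csv.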