Merge de02c2a into b1330dd
timmahrt committed Jan 8, 2023
2 parents b1330dd + de02c2a commit 449de6e
Showing 6 changed files with 266 additions and 22 deletions.
71 changes: 66 additions & 5 deletions README.md
@@ -20,15 +20,16 @@ of speech. [Praat can be downloaded here](<http://www.fon.hum.uva.nl/praat/>)
# Table of contents
1. [Documentation](#documentation)
2. [Tutorials](#tutorials)
3. [Version History](#version-history)
3. [Version history](#version-history)
4. [Requirements](#requirements)
5. [Installation](#installation)
6. [Upgrading major versions](#upgrading)
7. [Usage](#usage)
8. [Common Use Cases](#common-use-cases)
9. [Tests](#tests)
10. [Citing praatIO](#citing-praatio)
11. [Acknowledgements](#acknowledgements)
8. [Common use cases](#common-use-cases)
9. [Output types](#output-types)
10. [Tests](#tests)
11. [Citing praatIO](#citing-praatio)
12. [Acknowledgements](#acknowledgements)

## Documentation

@@ -143,6 +144,66 @@ What can you do with this library?
- `alignBoundariesAcrossTiers()`: for handmade textgrids, sometimes entries may look as if they are aligned at the same time but actually are off by a small amount, this will correct them


## Output types

PraatIO supports four textgrid output file types: short textgrid, long textgrid, JSON, and textgrid-like JSON.

Short textgrids and long textgrids are both natively supported by Praat.
Short textgrids are more concise, while long textgrids are more human-readable.
For more information on these file formats, see [Praat's official documentation](https://www.fon.hum.uva.nl/praat/manual/TextGrid_file_formats.html).

JSON and textgrid-like JSON are more developer-friendly formats, but they are not supported by Praat.
The default JSON format is more minimal, while the textgrid-like JSON carries information similar to a textgrid file.

The default JSON format cannot represent one case.
A textgrid has a specified minimum and maximum timestamp, and each of its tiers also has its own minimum and maximum timestamp.
Under most circumstances these are the same, but the user can set them to differ, and Praat will respect this.
The default JSON format stores only the textgrid-level values, so if you have such textgrids, you should use the textgrid-like JSON.

Here is the schema for the JSON output file:
```
{
  "start": 0.0,
  "end": 1.8,
  "tiers": {
    "phone": {
      "type": "IntervalTier",
      "entries": [[0.0, 0.3, ""], [0.3, 0.38, "m"]]
    },
    "pitch": {
      "type": "TextTier",
      "entries": [[0.32, "120"], [0.37, "85"]]
    }
  }
}
```
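
Because this layout is plain JSON, it can be read without praatIO. The following is a minimal sketch using only Python's standard `json` module; `summarize_tiers` and the inline sample string are hypothetical and exist only for illustration, they are not part of praatIO.

```python
import json

# Sample document in the minimal JSON layout shown above (illustrative data).
minimal_json = """
{
  "start": 0.0,
  "end": 1.8,
  "tiers": {
    "phone": {"type": "IntervalTier", "entries": [[0.0, 0.3, ""], [0.3, 0.38, "m"]]},
    "pitch": {"type": "TextTier", "entries": [[0.32, "120"], [0.37, "85"]]}
  }
}
"""


def summarize_tiers(doc: dict) -> dict:
    """Return {tier name: list of non-empty labels} for a minimal-JSON textgrid.

    Hypothetical helper, not part of praatIO.
    """
    summary = {}
    for name, tier in doc["tiers"].items():
        # The label is the last element of an entry in both tier types:
        # [start, end, label] for an IntervalTier, [time, label] for a TextTier.
        summary[name] = [entry[-1] for entry in tier["entries"] if entry[-1] != ""]
    return summary


doc = json.loads(minimal_json)
labels = summarize_tiers(doc)
print(labels)
```

Since `tiers` is keyed by tier name here, a tier can be looked up directly, for example `doc["tiers"]["phone"]`.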

Here is the schema for the textgrid-like JSON output file.
Notably, `tiers` is a list of hashes, rather than a hash of hashes.
Also, each tier specifies its name and its own min and max time.
```
{
  "xmin": 0.0,
  "xmax": 1.8,
  "tiers": [
    {
      "class": "IntervalTier",
      "name": "phone",
      "xmin": 0.0,
      "xmax": 1.8,
      "entries": [[0.0, 0.3, ""], [0.3, 0.38, "m"]]
    },
    {
      "class": "TextTier",
      "name": "pitch",
      "xmin": 0.0,
      "xmax": 1.8,
      "entries": [[0.32, "120"], [0.37, "85"]]
    }
  ]
}
```
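
Because each tier in this layout carries its own `xmin` and `xmax`, a short check can tell whether a textgrid can be written as the default JSON without losing those values. The following is a minimal sketch that assumes the dictionary layout shown above; `needs_textgrid_json` is a hypothetical helper, not part of praatIO, and the sample data is illustrative.

```python
def needs_textgrid_json(tg: dict) -> bool:
    """Return True if any tier's bounds differ from the textgrid's bounds.

    Hypothetical helper, not part of praatIO. Uses exact float comparison,
    which is enough for values read back unchanged from a file.
    """
    return any(
        tier["xmin"] != tg["xmin"] or tier["xmax"] != tg["xmax"]
        for tier in tg["tiers"]
    )


# Tier bounds match the textgrid bounds: the default JSON loses nothing.
same_bounds = {
    "xmin": 0.0,
    "xmax": 1.8,
    "tiers": [
        {
            "class": "IntervalTier",
            "name": "phone",
            "xmin": 0.0,
            "xmax": 1.8,
            "entries": [[0.0, 0.3, ""], [0.3, 0.38, "m"]],
        }
    ],
}

# Tier bounds are narrower than the textgrid bounds: tier times would be lost.
constrained = {
    "xmin": 0.0,
    "xmax": 1.869687,
    "tiers": [
        {
            "class": "IntervalTier",
            "name": "phone",
            "xmin": 0.3154201182247563,
            "xmax": 1.5182538944627297,
            "entries": [[0.3154201182247563, 0.38526757369599995, "m"]],
        }
    ],
}

# Pick one of the two JSON format names accordingly.
chosen_format = "textgrid_json" if needs_textgrid_json(constrained) else "json"
print(chosen_format)
```

The result is one of the format names, `"json"` or `"textgrid_json"`, which can then be used when saving.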

## Tests

I run tests with the following command (this requires pytest and pytest-cov to be installed):
12 changes: 7 additions & 5 deletions praatio/data_classes/textgrid.py
@@ -429,7 +429,7 @@ def new(self) -> "Textgrid":
def save(
self,
fn: str,
format: Literal["short_textgrid", "long_textgrid", "json"],
format: Literal["short_textgrid", "long_textgrid", "json", "textgrid_json"],
includeBlankSpaces: bool,
minTimestamp: Optional[float] = None,
maxTimestamp: Optional[float] = None,
@@ -440,7 +440,10 @@ def save(
Args:
fn: the fullpath filename of the output
format: one of ['short_textgrid', 'long_textgrid', 'json']
format: one of ['short_textgrid', 'long_textgrid', 'json', 'textgrid_json']
'short_textgrid' and 'long_textgrid' are both used by praat
'json' and 'textgrid_json' are two json variants. 'json' cannot represent
tiers with different min and max timestamps than the textgrid.
includeBlankSpaces: if True, blank sections in interval
tiers will be filled in with an empty interval
(with a label of ""). If you are unsure, True is recommended
@@ -474,6 +477,7 @@ def save(
self.validate(reportingMode)

tgAsDict = _tgToDictionary(self)

textgridStr = textgrid_io.getTextgridAsStr(
tgAsDict,
format,
@@ -576,6 +580,4 @@ def _tgToDictionary(tg: Textgrid) -> dict:
}
tiers.append(tierDict)

tgAsDict = {"xmin": tg.minTimestamp, "xmax": tg.maxTimestamp, "tiers": tiers}

return tgAsDict
return {"xmin": tg.minTimestamp, "xmax": tg.maxTimestamp, "tiers": tiers}
3 changes: 2 additions & 1 deletion praatio/utilities/constants.py
@@ -46,8 +46,9 @@ class TextgridFormats:
LONG_TEXTGRID: Final = "long_textgrid"
SHORT_TEXTGRID: Final = "short_textgrid"
JSON: Final = "json"
TEXTGRID_JSON: Final = "textgrid_json"

validOptions = [LONG_TEXTGRID, SHORT_TEXTGRID, JSON]
validOptions = [LONG_TEXTGRID, SHORT_TEXTGRID, JSON, TEXTGRID_JSON]


class DataPointTypes:
58 changes: 49 additions & 9 deletions praatio/utilities/textgrid_io.py
@@ -155,6 +155,8 @@ def parseTextgridStr(data: str, includeEmptyIntervals: bool = False) -> Dict:

try:
tgAsDict = json.loads(data)
if "start" in tgAsDict.keys(): # Using simplified json format
tgAsDict = _upconvertDictionaryFromJson(tgAsDict)
except ValueError:
caseA = "ooTextFile short" in data
caseB = "item [" not in data
@@ -172,7 +174,7 @@ def parseTextgridStr(data: str, includeEmptyIntervals: bool = False) -> Dict:

def getTextgridAsStr(
tg: Dict,
format: Literal["short_textgrid", "long_textgrid", "json"],
format: Literal["short_textgrid", "long_textgrid", "json", "textgrid_json"],
includeBlankSpaces: bool,
minTimestamp: Optional[float] = None,
maxTimestamp: Optional[float] = None,
@@ -182,7 +184,7 @@ def getTextgridAsStr(
Args:
tg: the textgrid to convert to a string
format: one of ['short_textgrid', 'long_textgrid', 'json']
format: one of ['short_textgrid', 'long_textgrid', 'json', 'textgrid_json']
includeBlankSpaces: if True, blank sections in interval
tiers will be filled in with an empty interval
(with a label of "")
@@ -204,13 +206,7 @@ def getTextgridAsStr(
a string representation of the textgrid
"""

validFormats = [
TextgridFormats.LONG_TEXTGRID,
TextgridFormats.SHORT_TEXTGRID,
TextgridFormats.JSON,
]
if format not in validFormats:
raise errors.WrongOption("format", format, validFormats)
utils.validateOption("format", format, TextgridFormats)

tg = _prepTgForSaving(
tg, includeBlankSpaces, minTimestamp, maxTimestamp, minimumIntervalLength
@@ -221,11 +217,55 @@ def getTextgridAsStr(
elif format == TextgridFormats.SHORT_TEXTGRID:
outputTxt = _tgToShortTextForm(tg)
elif format == TextgridFormats.JSON:
outputTxt = _tgToJson(_downconvertDictionaryForJson(tg))
elif format == TextgridFormats.TEXTGRID_JSON:
outputTxt = _tgToJson(tg)

return outputTxt


def _upconvertDictionaryFromJson(tgAsDict: dict) -> dict:
    """
    Convert from the sparse json format to the one shaped more literally like a textgrid
    """
    transformedDict = {}
    transformedDict["xmin"] = tgAsDict["start"]
    transformedDict["xmax"] = tgAsDict["end"]
    transformedDict["tiers"] = []

    for tierName in tgAsDict["tiers"].keys():
        tier = tgAsDict["tiers"][tierName]
        transformedDict["tiers"].append(
            {
                "class": tier["type"],
                "name": tierName,
                "xmin": tgAsDict["start"],
                "xmax": tgAsDict["end"],
                "entries": tier["entries"],
            }
        )

    return transformedDict


def _downconvertDictionaryForJson(tgAsDict: Dict) -> dict:
    """
    Convert from the textgrid-shaped json format to a more minimal json format
    """
    tiers = {}
    for tier in tgAsDict["tiers"]:
        tiers[tier["name"]] = {
            "type": tier["class"],
            "entries": tier["entries"],
        }

    return {
        "start": tgAsDict["xmin"],
        "end": tgAsDict["xmax"],
        "tiers": tiers,
    }
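
Note on the two helpers above: a round trip through the minimal layout keeps the entries but resets every tier's bounds to the textgrid's bounds. The following is a minimal, standalone sketch of that behavior; `downconvert` and `upconvert` are hypothetical stand-ins written for illustration, not the private praatIO functions themselves, and the sample values are taken from the test file added in this commit.

```python
def downconvert(tg: dict) -> dict:
    """Textgrid-like layout -> minimal layout (tier-level xmin/xmax are dropped)."""
    return {
        "start": tg["xmin"],
        "end": tg["xmax"],
        "tiers": {
            tier["name"]: {"type": tier["class"], "entries": tier["entries"]}
            for tier in tg["tiers"]
        },
    }


def upconvert(doc: dict) -> dict:
    """Minimal layout -> textgrid-like layout (tiers inherit the textgrid bounds)."""
    return {
        "xmin": doc["start"],
        "xmax": doc["end"],
        "tiers": [
            {
                "class": tier["type"],
                "name": name,
                "xmin": doc["start"],
                "xmax": doc["end"],
                "entries": tier["entries"],
            }
            for name, tier in doc["tiers"].items()
        ],
    }


# A textgrid whose tier starts later and ends earlier than the textgrid itself.
original = {
    "xmin": 0.0,
    "xmax": 1.869687,
    "tiers": [
        {
            "class": "TextTier",
            "name": "pitch",
            "xmin": 0.3154201182247563,
            "xmax": 1.5182538944627297,
            "entries": [[0.5978689404359245, "120"], [0.8264598697308528, "85"]],
        }
    ],
}

round_tripped = upconvert(downconvert(original))
print(round_tripped["tiers"][0]["xmin"], round_tripped["tiers"][0]["xmax"])
```

This is the case the README describes: textgrids with constrained tier times should be saved as `textgrid_json` rather than `json`.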


def _sortEntries(tg: Dict) -> None:
for tier in tg["tiers"]:
tier["entries"] = sorted(tier["entries"])
84 changes: 84 additions & 0 deletions tests/files/mary_with_constrained_tier_times.TextGrid
@@ -0,0 +1,84 @@
File type = "ooTextFile"
Object class = "TextGrid"

0
1.869687
<exists>
3
"IntervalTier"
"phone"
0.3154201182247563
1.5182538944627297
14
0.3154201182247563
0.38526757369599995
"m"
0.38526757369599995
0.4906833231456586
"ə"
0.4906833231456586
0.5687114623227726
"r"
0.5687114623227726
0.6755499913498981
"i"
0.6755499913498981
0.8142925170069999
"r"
0.8142925170069999
0.854201814059
"o"
0.854201814059
0.9240430839
"l"
0.9240430839
0.9839070294779999
"d"
0.9839070294779999
1.0164729379083655
"θ"
1.0164729379083655
1.063725623583
"ə"
1.063725623583
1.1152822781165286
"b"
1.1152822781165286
1.2325508617834506
"œ"
1.2325508617834506
1.3345876591689074
"r"
1.3345876591689074
1.5182538944627297
"l"
"IntervalTier"
"word"
0.3154201182247563
1.5182538944627297
4
0.3154201182247563
0.6755499913498981
"mary"
0.6755499913498981
0.9839070294779999
"rolled"
0.9839070294779999
1.063725623583
"the"
1.063725623583
1.5182538944627297
"barrel"
"TextTier"
"pitch"
0.3154201182247563
1.5182538944627297
4
0.5978689404359245
"120"
0.8264598697308528
"85"
1.0195797927558785
"97"
1.2008760470242699
"104"