# Pysle - a python interface to the ISLEX dictionary

*Introduction to Pysle*

<hr>
TABLE OF CONTENTS

**Introduction**
- [What is ISLEX?](#what-is-islex)
- [What is Pysle?](#what-is-pysle)

**Tutorial**
- [Installation](#installation)
- [First example](#first-example)

[Summary](#summary)

## Introduction

### What is ISLEX?

**ISLEX** (https://github.com/uiuc-sst/g2ps/tree/master/English-US) is an English pronunciation dictionary.  Each entry in ISLEX looks like:

    islands(+island+s,nnp,nnps,nns,vbz) # ˈɑɪ . l n̩ d z #

There are three components:
- the word
- a list containing possible parts of speech (using those defined by the Penn Treebank https://cs.nyu.edu/~grishman/jet/guide/PennPOS.html)
- the pronunciation (rendered in IPA https://en.wikipedia.org/wiki/International_Phonetic_Alphabet)
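To make the three components concrete, here is a minimal sketch that splits one ISLEX line into them with plain Python. It is not how Pysle parses the file. The `parse_islex_line` helper and `IslexLine` class are hypothetical, and the sketch assumes the layout shown above: syllables separated by `.` and phones separated by spaces. It also assumes that items beginning with `+`, such as `+island+s`, describe the word's morphology rather than a part of speech, so they are skipped.

```python
from dataclasses import dataclass


@dataclass
class IslexLine:
    """Hypothetical container for one parsed ISLEX line (not Pysle's Entry class)."""
    word: str
    pos_list: list[str]
    syllables: list[list[str]]


def parse_islex_line(line: str) -> IslexLine:
    """Split one line of the form word(items) # pronunciation # into its parts."""
    head, _, rest = line.partition("(")
    items_text, _, pron_text = rest.partition(")")
    # Assumption: '+'-prefixed items (e.g. '+island+s') are morphology, not POS tags.
    pos_list = [item for item in items_text.split(",") if not item.startswith("+")]
    # The pronunciation sits between the two '#' delimiters.
    pronunciation = pron_text.strip().strip("#").strip()
    # '.' separates syllables; whitespace separates phones within a syllable.
    syllables = [syl.split() for syl in pronunciation.split(".") if syl.strip()]
    return IslexLine(word=head.strip(), pos_list=pos_list, syllables=syllables)


entry = parse_islex_line("islands(+island+s,nnp,nnps,nns,vbz) # ˈɑɪ . l n̩ d z #")
print(entry.word)       # islands
print(entry.pos_list)   # ['nnp', 'nnps', 'nns', 'vbz']
print(entry.syllables)  # [['ˈɑɪ'], ['l', 'n̩', 'd', 'z']]
```

Pysle's own parsing may differ in its details; the point is only to show where each component lives on the line.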

A word can have multiple entries in the dictionary, one per pronunciation. Here is another entry for *islands*:

    islands(+island+s,nnp,nnps,nns,vbz) # ˈɑɪ s . l n̩ d z #
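Because one word can have several lines, a lookup by word naturally returns a list. The hypothetical `index_by_word` helper below (not part of Pysle) sketches the idea by grouping raw lines on their headword, i.e. the text before the opening parenthesis.

```python
from collections import defaultdict


def index_by_word(lines: list[str]) -> dict[str, list[str]]:
    """Group raw ISLEX lines by headword; one word may map to several lines."""
    index: dict[str, list[str]] = defaultdict(list)
    for line in lines:
        word = line.partition("(")[0].strip()  # headword precedes '('
        index[word].append(line)
    return dict(index)


islex_lines = [
    "islands(+island+s,nnp,nnps,nns,vbz) # ˈɑɪ . l n̩ d z #",
    "islands(+island+s,nnp,nnps,nns,vbz) # ˈɑɪ s . l n̩ d z #",
]
index = index_by_word(islex_lines)
print(len(index["islands"]))  # 2
```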

### What is Pysle?

Pysle is an interface to the ISLEX dictionary. In addition to fetching entries from the dictionary, it provides some high-level functions, which will mostly be covered in later tutorials.

In this tutorial we'll cover the basics needed to understand how to use Pysle.


## Tutorial

### Installation

Before working with Pysle, we need to install it. The easiest way is with pip, as shown below. For other installation options, see the main GitHub page for Pysle.

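```
%pip install pysle --upgrade
```

If you're working in a notebook, you may need to restart the kernel afterwards to use the updated package. From a regular terminal, run `pip install --upgrade pysle` instead.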



### First example

The first thing we'll typically need to do is create an instance of **Isle**. This class holds the main functionality for working with the ISLEX dictionary. Instantiating it with **isletool.Isle()** loads the ISLEX dictionary into memory. The file is quite large, so this can take some time.

> Fully loading ISLEX into Pysle takes about 20 seconds. To speed things up, Pysle loads data lazily: data is only partially loaded until it is needed, at which point it is fully loaded. As a result, the first search over the ISLEX dictionary will be slow because everything needs to be loaded, but subsequent searches should be quick. Looking up specific words (as opposed to searching) should be very quick, even right after startup.
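The lazy-loading idea in the note above can be illustrated with a toy class. This is a hypothetical sketch of the general technique, not Pysle's implementation: it indexes raw lines by word up front (cheap), parses a word's pronunciations only when that word is requested, and caches the result. The first search therefore pays for parsing everything, while later searches and single-word lookups reuse the cache. The `cat` line is invented in the ISLEX layout purely for illustration.

```python
class LazyDictionary:
    """Toy lazy-loading dictionary (hypothetical; not Pysle's implementation)."""

    def __init__(self, lines: list[str]) -> None:
        # Cheap up-front step: group raw lines by headword without parsing them.
        self._raw: dict[str, list[str]] = {}
        for line in lines:
            word = line.partition("(")[0].strip()
            self._raw.setdefault(word, []).append(line)
        self._parsed: dict[str, list[list[str]]] = {}
        self.parse_count = 0  # number of words parsed so far

    def _pronunciations(self, word: str) -> list[list[str]]:
        # Parse a word's lines on first use (phones only, '.' markers dropped),
        # then serve them from the cache.
        if word not in self._parsed:
            self._parsed[word] = [
                [tok for tok in line.split("#")[1].split() if tok != "."]
                for line in self._raw.get(word, [])
            ]
            self.parse_count += 1
        return self._parsed[word]

    def lookup(self, word: str) -> list[list[str]]:
        # Only touches the requested word, so it stays fast.
        return self._pronunciations(word)

    def search(self, phone: str) -> list[str]:
        # Visits every word, so the first search parses everything.
        return [w for w in self._raw if any(phone in p for p in self._pronunciations(w))]


lines = [
    "islands(+island+s,nnp,nnps,nns,vbz) # ˈɑɪ . l n̩ d z #",
    "islands(+island+s,nnp,nnps,nns,vbz) # ˈɑɪ s . l n̩ d z #",
    "cat(nn) # k ˈæ t #",  # invented line in the ISLEX layout, for illustration only
]
d = LazyDictionary(lines)
print(d.lookup("islands"))  # [['ˈɑɪ', 'l', 'n̩', 'd', 'z'], ['ˈɑɪ', 's', 'l', 'n̩', 'd', 'z']]
print(d.parse_count)        # 1
print(d.search("s"))        # ['islands']
print(d.parse_count)        # 2
print(d.search("t"))        # ['cat']
print(d.parse_count)        # 2 (served from the cache)
```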

```python
from pysle import isletool

isle = isletool.Isle()  # loads the ISLEX dictionary (may take a while)
words = isle.lookup("islands")  # one Entry per pronunciation of "islands"

print(words)
```

```
[<pysle.phonetics.Entry object at 0x10c286e60>, <pysle.phonetics.Entry object at 0x10c286e00>]
```


So looking up *islands* returned two `Entry` objects. A single `Entry` represents one line in the ISLEX dictionary, like the ones we saw earlier. Recall that each line has three parts: the word, the part-of-speech list, and the pronunciation.

```python
island = words[0]

print(island.word)
print(island.posList)
print(island.toList())
```

```
islands
['nnp', 'nns', 'vbz']
[[['ˈɑɪ'], ['l', 'n̩', 'd', 'z']]]
```
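The tags in `posList` are lowercase Penn Treebank tags. A small lookup table, a hypothetical helper rather than anything Pysle provides, can make them easier to read. It covers only the tags that appear in the *islands* entries above.

```python
# Descriptions for the Penn Treebank tags in the "islands" entries (not exhaustive).
PENN_TAG_DESCRIPTIONS = {
    "nns": "noun, plural",
    "nnp": "proper noun, singular",
    "nnps": "proper noun, plural",
    "vbz": "verb, 3rd person singular present",
}


def describe_tags(pos_list: list[str]) -> list[str]:
    """Map each tag to a readable description, leaving unknown tags unchanged."""
    return [PENN_TAG_DESCRIPTIONS.get(tag.lower(), tag) for tag in pos_list]


print(describe_tags(["nnp", "nns", "vbz"]))
# ['proper noun, singular', 'noun, plural', 'verb, 3rd person singular present']
```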


An `Entry` has a `syllabificationList` attribute, which holds the actual pronunciation:

```python
print(island.syllabificationList)
```

```
[<pysle.phonetics.Syllabification object at 0x10c287580>]
```


So an `Entry` has a list of `Syllabification` objects, a `Syllabification` has a list of `Syllable` objects, and each `Syllable` contains a list of phones (represented as plain strings). Going from an `Entry` to its phones this way is neither quick nor easy, which is why `Entry` has the `toList()` method.

But let's do it anyway!

```python
islandSyllabification = island.syllabificationList[0]
islandFirstSyllable = islandSyllabification.syllables[0]
print(islandFirstSyllable.phonemes)
```

```
['ˈɑɪ']
```


As an aside: the *islands* entry has only a single syllabification. That will be the case for most words, so you'll usually just use `entry.syllabificationList[0]` as above. If you're wondering why entries have a syllabification *list* at all, it's because ISLEX contains multiword entries. Those entries have one syllabification in the list for each word. For example:

```python
multiwordEntry = isle.lookup('island-dotted')[0]
print(multiwordEntry.syllabificationList[0].toList())
print(multiwordEntry.syllabificationList[1].toList())
```

```
[['ˈɑɪ'], ['l', 'n̩', 'd']]
[['d', 'ˈɑ'], ['ɾ', 'ɪ', 'd']]
```
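For a hyphenated entry like this one, the component words and the syllabifications line up one to one. The sketch below pairs them using the nested lists printed above. The `pair_words_with_syllabifications` helper is hypothetical, and splitting the headword on `-` is an assumption that may not hold for every multiword entry in ISLEX.

```python
def pair_words_with_syllabifications(headword, syllabifications):
    """Pair each hyphen-separated part of a headword with its syllabification.

    Hypothetical helper; assumes the hyphen marks the word boundaries.
    """
    parts = headword.split("-")
    if len(parts) != len(syllabifications):
        raise ValueError(f"{len(parts)} words but {len(syllabifications)} syllabifications")
    return list(zip(parts, syllabifications))


# Nested lists copied from the printed toList() output above.
syllabifications = [
    [["ˈɑɪ"], ["l", "n̩", "d"]],
    [["d", "ˈɑ"], ["ɾ", "ɪ", "d"]],
]
for word, syllables in pair_words_with_syllabifications("island-dotted", syllabifications):
    print(word, syllables)
# island [['ˈɑɪ'], ['l', 'n̩', 'd']]
# dotted [['d', 'ˈɑ'], ['ɾ', 'ɪ', 'd']]
```

Raising an error on a length mismatch makes it obvious when the hyphen assumption breaks, instead of silently dropping words the way a bare `zip` would.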


As we saw at the start, an `Entry` has a `toList()` method, and a `Syllabification` has one as well. A `Syllable` doesn't, because its contents can be accessed directly, e.g. `syllable.phonemes`, as we saw earlier.
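To summarize how these pieces nest, the sketch below models the hierarchy with plain dataclasses. `ToyEntry`, `ToySyllabification`, and `ToySyllable` are simplified stand-ins that reuse the attribute and method names from this tutorial; they are not Pysle's classes, and the real ones carry more functionality. Each `toList()` simply unwraps one level of nesting.

```python
from dataclasses import dataclass, field


@dataclass
class ToySyllable:
    phonemes: list[str]  # phones are plain strings


@dataclass
class ToySyllabification:
    syllables: list[ToySyllable]

    def toList(self) -> list[list[str]]:
        # One inner list of phones per syllable.
        return [syllable.phonemes for syllable in self.syllables]


@dataclass
class ToyEntry:
    # camelCase names mirror the attributes used in this tutorial.
    word: str
    posList: list[str]
    syllabificationList: list[ToySyllabification] = field(default_factory=list)

    def toList(self) -> list[list[list[str]]]:
        # One nested list per syllabification (i.e. per word in the entry).
        return [s.toList() for s in self.syllabificationList]


island = ToyEntry(
    word="islands",
    posList=["nnp", "nns", "vbz"],
    syllabificationList=[
        ToySyllabification([ToySyllable(["ˈɑɪ"]), ToySyllable(["l", "n̩", "d", "z"])])
    ],
)
print(island.toList())  # [[['ˈɑɪ'], ['l', 'n̩', 'd', 'z']]]
```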

A `Syllabification` has one more useful method:

```python
print(islandSyllabification.desyllabify())
```

```
<pysle.phonetics.PhonemeList object at 0x10c2b7c70>
```


A `PhonemeList` and a `Syllable` are essentially the same thing: a list of phones. However, a `Syllable` must be a valid syllable (a syllable cannot contain a VCV sequence, for example), whereas a `PhonemeList` is unconstrained and can contain any sequence of phones.

We already know how to get the phonemes from a `Syllable`, and the same technique works for a `PhonemeList`:

```python
islandPhonemeList = islandSyllabification.desyllabify()
print(islandPhonemeList.phonemes)
```

```
['ˈɑɪ', 'l', 'n̩', 'd', 'z']
```
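As the output shows, the result is just the syllables joined end to end, with the syllable boundaries dropped. On plain nested lists (like those returned by `toList()`), that flattening can be sketched with `itertools.chain`. The `flatten_syllables` helper below is hypothetical and only illustrates the idea; it is not Pysle's implementation.

```python
from itertools import chain


def flatten_syllables(syllables: list[list[str]]) -> list[str]:
    """Join syllables end to end, discarding the syllable boundaries."""
    return list(chain.from_iterable(syllables))


print(flatten_syllables([["ˈɑɪ"], ["l", "n̩", "d", "z"]]))
# ['ˈɑɪ', 'l', 'n̩', 'd', 'z']

# A hypothetical alternative split flattens to the same phones, so the
# boundaries cannot be recovered from the flat list alone.
print(flatten_syllables([["ˈɑɪ", "l"], ["n̩", "d", "z"]]) == flatten_syllables([["ˈɑɪ"], ["l", "n̩", "d", "z"]]))
# True
```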


## Summary

To sum up so far:
- each line in the ISLEX dictionary is represented by an Entry
- an Entry has a word, a part of speech list, and pronunciation information
- `Entry.toList()` can be used to quickly see the syllable structure of the word
- for lower-level detail:
  - `Entry.syllabificationList` can be used to get the Syllabifications for the word
    - there is one Syllabification per word
    - there is usually one Syllabification per entry
        - except for multi-word entries like "island-dotted"
  - `Syllabification.toList()` can be used to see the syllable structure of the Syllabification
  - `Syllabification.syllables` can be used to access the syllables for the Syllabification
  - `Syllabification.desyllabify()` can be used to get the phones for the Syllabification
  - `syllable.phonemes` or `phonemeList.phonemes` can be used to access the phonemes for those data structures



`Entry`, `Syllabification`, `Syllable`, and `PhonemeList` all have various helper methods (e.g. `syllable.hasStress()`) that we won't cover here. If you've gotten this far, you understand the basics of using `Isle` and of working with the results it returns. For more information, the generated documentation (http://timmahrt.github.io/pysle/) is a good place to get a feel for the available functionality.

In the next tutorials we'll look at some high-level tools that use this basic functionality including:
- using pysle in conjunction with textgrids
  - automatically filling in tiers with syllable, stress, or phone labels
- searching ISLEX based on pronunciation
- using ISLEX's dictionary pronunciations to analyze pronunciations found in a corpus

If you have any feedback on this tutorial, please share it!