no dictionary #8

PasaOpasen · 2020-05-30T11:26:49Z

from pysle import isletool

isletool.LexicalTool('ISLEdict.txt').lookup('cat')

I think you should list all data files in the MANIFEST.in file.

See https://realpython.com/pypi-publish-python-package/

I do something similar for the Persian language: https://github.com/PasaOpasen/PersianG2P


timmahrt · 2020-05-30T14:45:35Z

The dictionary is not licensed, so I don't think I can include it in my library. I had a similar problem with my praatio library.

PasaOpasen · 2020-05-30T16:46:41Z

But you can convert this file into a Python dictionary object and save it as a serialized Python object or as JSON. It will load faster and will be easier to upload.

PasaOpasen · 2020-05-30T16:50:33Z

Sorry, I forgot to add the error message:

from pysle import isletool

isletool.LexicalTool('ISLEdict.txt').lookup('cat')

Traceback (most recent call last):

  File "<ipython-input-1-c4a023343ec3>", line 3, in <module>
    isletool.LexicalTool('ISLEdict.txt').lookup('cat')

  File "C:\ProgramData\Anaconda3\lib\site-packages\pysle\isletool.py", line 74, in __init__
    self.data = self._buildDict()

  File "C:\ProgramData\Anaconda3\lib\site-packages\pysle\isletool.py", line 81, in _buildDict
    with io.open(self.islePath, "r", encoding='utf-8') as fd:

FileNotFoundError: [Errno 2] No such file or directory: 'ISLEdict.txt'

timmahrt · 2020-05-30T17:00:04Z

As I said, the issue of including it is a legal one. Perhaps I can ask if the data can be released under some license?

Either way, maybe I could cache the results or store in some intermediate format to make loading faster.

PasaOpasen · 2020-05-30T17:07:02Z

So how should I use this package without the necessary data?

It would be very good if you converted the necessary files into some JSON/binary format to avoid legacy problems.

timmahrt · 2020-05-30T23:47:09Z

Oh sorry, the link to the necessary data is in the requirements section:
https://raw.githubusercontent.com/uiuc-sst/g2ps/master/English/ISLEdict.txt

The data is derived from an academic project.

If there is a way for me to make that clearer, please let me know.

timmahrt · 2020-05-31T01:52:14Z

I have made a new release, 2.1.1. If you try to load the ISLEdict but the file does not exist, the error message now states where to find it:

>>> from pysle import isletool
>>> a = isletool.LexicalTool('bloop.txt')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "isletool.py", line 87, in __init__
    raise IsleDictDoesNotExist()
pysle.isletool.IsleDictDoesNotExist: The path to the ISLE dictionary file does not exist.
The ISLE dictionary is an external resource that must  be downloaded separately.  ISLEdict.txt can be found here:
https://github.com/uiuc-sst/g2ps/tree/master/English/ISLEdict.txt
Please see the requirements section in the README file for more details.

What do you think?
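For readers who want the same behavior in their own loaders, here is a minimal sketch of the pattern described above: check for the file up front and raise an exception whose message says where to get it. The class name `DictionaryFileNotFound`, the helper `check_dictionary_path`, and the `download_url` parameter are hypothetical names chosen for illustration; this is not the code used inside pysle.

```python
import os


class DictionaryFileNotFound(Exception):
    """Hypothetical exception for illustration; not the class used in pysle."""

    def __init__(self, path, download_url):
        self.path = path
        self.download_url = download_url
        super().__init__(path, download_url)

    def __str__(self):
        # Put the download hint in the message so it shows up in the traceback.
        return (
            f"The dictionary file '{self.path}' does not exist.\n"
            "It is an external resource that must be downloaded separately, "
            f"for example from:\n{self.download_url}"
        )


def check_dictionary_path(path, download_url):
    """Return path unchanged if the file exists, otherwise raise with a hint."""
    if not os.path.isfile(path):
        raise DictionaryFileNotFound(path, download_url)
    return path
```

Raising a dedicated exception instead of letting `FileNotFoundError` propagate keeps the fix local to the loader while still letting callers catch the specific case.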

It takes about 1 to 1.5 seconds to load ISLEdict.txt into memory. If we pickle the data and then load it, it takes about the same amount of time. I was a bit surprised. Maybe I did something wrong?

timmahrt · 2020-05-31T08:05:58Z

I also tried serializing/deserializing with json and it wasn't any faster. I think it is slow to load simply because the dictionary file is large (~16 MB).
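The comparison discussed here can be reproduced with a small benchmark. The sketch below is an assumption-laden stand-in, not the measurement made in this thread: it generates synthetic data in a made-up `word<TAB>pronunciation` format (not the real ISLEdict layout), and all function names are hypothetical. It times parsing the text file against loading the same data from pickle and from JSON.

```python
import json
import os
import pickle
import tempfile
import time


def parse_text_dictionary(path):
    """Parse a hypothetical 'word<TAB>pronunciation' file into a dict of lists."""
    entries = {}
    with open(path, "r", encoding="utf-8") as fd:
        for line in fd:
            line = line.rstrip("\n")
            if not line:
                continue
            word, pronunciation = line.split("\t", 1)
            entries.setdefault(word, []).append(pronunciation)
    return entries


def load_pickle(path):
    with open(path, "rb") as fd:
        return pickle.load(fd)


def load_json(path):
    with open(path, "r", encoding="utf-8") as fd:
        return json.load(fd)


def time_call(func, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def compare_load_times(num_words=50000):
    """Time text parsing vs. pickle and JSON loading on synthetic data."""
    with tempfile.TemporaryDirectory() as tmp:
        txt_path = os.path.join(tmp, "dict.txt")
        pkl_path = os.path.join(tmp, "dict.pkl")
        json_path = os.path.join(tmp, "dict.json")

        # Synthetic entries only; real dictionary lines look different.
        with open(txt_path, "w", encoding="utf-8") as fd:
            for i in range(num_words):
                fd.write(f"word{i}\tp r o n {i}\n")

        parsed, t_text = time_call(parse_text_dictionary, txt_path)

        with open(pkl_path, "wb") as fd:
            pickle.dump(parsed, fd, protocol=pickle.HIGHEST_PROTOCOL)
        with open(json_path, "w", encoding="utf-8") as fd:
            json.dump(parsed, fd)

        from_pickle, t_pickle = time_call(load_pickle, pkl_path)
        from_json, t_json = time_call(load_json, json_path)

    # All three routes should yield the same mapping.
    assert parsed == from_pickle == from_json
    return {"text": t_text, "pickle": t_pickle, "json": t_json}


if __name__ == "__main__":
    for name, seconds in compare_load_times().items():
        print(f"{name}: {seconds:.3f} s")
```

Results will vary with machine, Python version, and the shape of the stored objects, so a single run should be read as a rough indication only.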

PasaOpasen · 2020-05-31T09:47:15Z

Okay, now I know the path to the dictionary, thank you. I will try to use it.

So will your package work with other languages if I use their dictionaries? In g2ps I cannot find files like ISLEdict.txt for other languages.

In my project I saved my dictionary (50 KB) as JSON and loading it was over 5 times faster. I don't know how you transform the ISLEdict.txt file; I transformed mine into a Python dict(). Anyway, 1.5 seconds is not bad, don't worry.

timmahrt · 2020-05-31T12:47:20Z

Unfortunately, I don't know of a similar resource file for other languages. If you know of any, please share!

PasaOpasen · 2020-05-31T14:56:44Z

So I found this collection. Is it exactly what your package needs?

timmahrt · 2020-05-31T16:14:02Z

That collection is useful but actually it comes with its own python code. I think if you want to access the wikipron dataset, you should use their python library:

https://github.com/kylebgorman/wikipron

If you need languages other than English, you should use wikipron.

If you only need English, ISLEdict has 10 times as many words and includes syllable information. wikipron does not include syllable boundaries.

sevagh · 2020-11-16T17:39:29Z

The upstream project is now MIT licensed. Perhaps that means the file can now be included in this repo? uiuc-sst/g2ps@8abb736

timmahrt · 2020-11-17T00:11:57Z

Nice find! I should be able to put out a release with it included today or tomorrow. Thanks!

timmahrt · 2020-11-18T04:18:51Z

Release v.2.2.0 is out. You can now just do:

from pysle import isletool
isleDict = isletool.LexicalTool()
isleDict.lookup('pumpkin')

etc

No need to deal with the ISLEX dictionary, as it is now included in the library. Thank you @sevagh
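One common way to make the path argument optional once a data file ships with a package is to fall back to a file located next to the module. The sketch below illustrates that idea only; `resolve_dictionary_path`, `DEFAULT_DICT_NAME`, and the assumed directory layout are hypothetical and are not taken from the pysle source.

```python
import os

# Hypothetical file name and layout, used only for this illustration.
DEFAULT_DICT_NAME = "ISLEdict.txt"


def resolve_dictionary_path(path=None, package_dir=None):
    """Return the user-supplied path, or the bundled default if none is given."""
    if path is not None:
        return path
    if package_dir is None:
        # Assume the data file sits in the same directory as this module.
        package_dir = os.path.dirname(os.path.abspath(__file__))
    return os.path.join(package_dir, DEFAULT_DICT_NAME)
```

For the fallback to work after installation, the data file also has to be declared as package data in the build configuration (for example via MANIFEST.in, as suggested earlier in this thread).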

Thanks!

PasaOpasen closed this as completed May 31, 2020

timmahrt reopened this Nov 17, 2020

timmahrt closed this as completed Nov 18, 2020
