
voikko-fi/common/voikkoutils.py: process_wordlist: use XML parser for whole word list #33

Open
hatapitk opened this issue Oct 18, 2017 · 0 comments

Comments

@hatapitk
Member

When the code was written, xml.dom.minidom was far too slow to parse the whole word list, and SAX parsers were considered too complicated. So we decided to parse each word element separately with minidom and simply assume that the word tags appear on their own lines. As a result, the parser does not accept every valid XML document; it makes extra assumptions about formatting.
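To illustrate the limitation, here is a rough, hypothetical sketch of the line-based strategy described above. It is not the actual `process_wordlist` code. The function name `parse_words_line_based` and the element name `word` are assumptions made for illustration. Lines are collected from a `<word>` line to a `</word>` line, and each chunk is parsed with minidom. A document that is perfectly valid XML but keeps everything on one line yields no words at all.

```python
import xml.dom.minidom


def parse_words_line_based(lines):
    """Hypothetical sketch of the line-based approach (not the real code).

    Assumes each <word> start tag and </word> end tag sit on their own lines;
    any other formatting is silently mishandled.
    """
    buffer = None  # lines of the word element currently being collected
    for line in lines:
        stripped = line.strip()
        if buffer is None:
            # Only a line that *starts* with a <word> tag opens a new chunk.
            if stripped == "<word>" or stripped.startswith("<word "):
                buffer = [line]
            continue
        buffer.append(line)
        if stripped == "</word>":
            # Parse just this one element with minidom, then reset.
            yield xml.dom.minidom.parseString("".join(buffer)).documentElement
            buffer = None


if __name__ == "__main__":
    pretty = ["<wordlist>\n", "<word>\n",
              "<forms><form>kissa</form></forms>\n", "</word>\n", "</wordlist>\n"]
    print(len(list(parse_words_line_based(pretty))))    # one word found
    one_line = ["<wordlist><word><forms><form>kissa</form></forms></word></wordlist>\n"]
    print(len(list(parse_words_line_based(one_line))))  # valid XML, but no words found
```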

Better parsers may be available these days, so it would be good to rewrite this function to use one of them.
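One possible direction, offered only as a sketch: the standard library's `xml.etree.ElementTree.iterparse` streams a document incrementally, so the whole word list can go through a real XML parser without building the full tree in memory and without any assumptions about line breaks. The function name `iter_words`, the element name `word`, and the assumption that word elements are direct children of the root element are all hypothetical here, not taken from the voikko-fi sources.

```python
import io
import xml.etree.ElementTree as ET


def iter_words(source, tag="word"):
    """Stream <word> elements from a word list (hypothetical sketch).

    source: a filename or a binary file object.
    Assumes word elements are direct children of the root element.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield elem  # the element is complete, including all its children
            # Detach processed words from the root so memory stays bounded.
            root.clear()


if __name__ == "__main__":
    data = (b"<wordlist>\n  <word>\n    <forms><form>kissa</form></forms>\n"
            b"  </word><word><forms><form>koira</form></forms></word>\n</wordlist>\n")
    for word in iter_words(io.BytesIO(data)):
        print(word.find("forms/form").text)
```

Because every byte goes through a conforming parser, the pretty-printed word and the one-line word above are handled the same way.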
