Look at library size #17
Comments
I don't really fully understand how unorm works; I just did some general code improvements :) Before trying to optimize for size as well, we should build a benchmark suite. That could be a separate, first-priority issue. Maybe some good input files are already available somewhere.
True, a benchmark suite would be a nice first step.
Still looking to do this?
5 years later... is anyone still interested in taking a look at this issue? :) I'm having some trouble with the package size.
@claudiolcastro Unfortunately I haven't had much time to look at this project. If anyone is ready to step up and review/merge PRs, I'm ready to hand out commit access and/or publishing rights on npm. There are many small things that could be done to improve the library.
I was also concerned about the size of this package. Since it is only a polyfill in my project, I didn't want all users to have to download it. The cleanest and easiest solution I found was to use Polyfill.io to include it only when necessary: `<script crossorigin="anonymous" src="https://polyfill.io/v3/polyfill.min.js?flags=gated&features=String.prototype.normalize"></script>` You can use their URL builder here: https://polyfill.io/v3/url-builder/
We could potentially compress the `udata` better. I've been researching this a bit, and we could shave off a good number of bytes by changing the data layout and storing it in base 36 (which is fast for JavaScript to decode with `parseInt`).

I also think it's an issue that the code points are laid out in this binary format: `yyyyyxxxxxxxxyyyyyyyy`. This makes the x=0 section quite big, but many times you'd only use `latin1` characters and not characters outside the BMP. A better format would be `xxxxxxxxxxxxxyyyyyyyy`. This creates more data rows, but you have to decompress less data on average, based on the assumption that normal text only revolves around a few Unicode scripts. Or maybe we should split the way BMP and outside-BMP data is stored.

I just need to look at my research files again and write down the points of my research in this issue.
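
To make the two ideas concrete, here is a rough sketch. It is not unorm's actual code: the function names are made up for illustration, the row keys are my reading of the two bit patterns above (x bits select the data row, y bits are the rest), and the comma-separated base-36 format is just one possible encoding.

```javascript
// Hypothetical sketch; none of these helpers exist in unorm.

// Current layout yyyyyxxxxxxxxyyyyyyyy: the row key is bits 8..15 only,
// so code points from every plane can land in the same row.
function currentRowKey(codePoint) {
  return (codePoint >>> 8) & 0xff;
}

// Proposed layout xxxxxxxxxxxxxyyyyyyyy: the row key is everything above
// the low 8 bits, so each row covers one contiguous run of 256 code points.
function proposedRowKey(codePoint) {
  return codePoint >>> 8;
}

// Base-36 storage (assumed format: comma-separated non-negative integers).
function encodeBase36(values) {
  return values.map((v) => v.toString(36)).join(",");
}

function decodeBase36(text) {
  if (text === "") return [];
  return text.split(",").map((s) => parseInt(s, 36));
}

// U+00E9 (latin1) and U+200E9 (outside the BMP) share row 0 today,
// but get separate rows under the proposed layout.
console.log(currentRowKey(0xe9), currentRowKey(0x200e9));
console.log(proposedRowKey(0xe9), proposedRowKey(0x200e9));

// Round trip through the base-36 text form.
const packed = encodeBase36([0, 233, 0x1f600]);
console.log(packed, decodeBase36(packed));
```

With the proposed key, text that stays within latin1 only ever needs row 0 decoded, and that row no longer carries entries from other planes.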