Significantly decrease parser file size by compacting parser table #234

Merged
zaach merged 4 commits into zaach:master from RubenVerborgh:compact-table on Aug 17, 2014

Contributor

RubenVerborgh commented Aug 17, 2014

Summary

This pull request nearly halves the gzipped size of generated parsers.

Problem

The largest part of a Jison-generated parser is its table: a large array containing objects with numeric keys and (arrays of) numeric values.

Two kinds of patterns occur frequently in such a table:

  1. repeated long numerical arrays

    example: tables = [{ 5: [1,3,4,6,7,8,15], 6: 7}, { 20: [1,3,4,6,7,8,15], 9: 8 }]
  2. objects where all keys have the same value

    example: tables = [{ 5: [200,204], 17: [200,204], 20: [200,204], 21: [200,204] }]

Solution

I tackled the first case by storing frequently occurring arrays into temporary variables:

var a = [1,3,4,6,7,8,15],
    tables = [{ 5: a, 6: 7 }, { 20: a, 9: 8 }];
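As a rough illustration of this first optimization, the sketch below counts how often each array occurs in a table and emits source code in which repeated arrays are replaced by a shared variable. The helper name compactRepeatedArrays, the $V variable prefix, and the returned property names are assumptions made for this example; this is not the code from the pull request.

```javascript
// Hypothetical helper, not Jison's actual implementation.
// Returns JavaScript source for `table`, hoisting arrays that occur more than once.
function compactRepeatedArrays(table) {
  var counts = {};
  // First pass: count occurrences of each array, identified by its JSON form.
  table.forEach(function (row) {
    Object.keys(row).forEach(function (key) {
      if (Array.isArray(row[key])) {
        var id = JSON.stringify(row[key]);
        counts[id] = (counts[id] || 0) + 1;
      }
    });
  });

  var names = {};         // JSON form of an array -> variable name
  var declarations = [];  // "name=[...]" fragments for the var statement
  // Second pass: emit each row, substituting variables for repeated arrays.
  var rows = table.map(function (row) {
    var entries = Object.keys(row).map(function (key) {
      var code = JSON.stringify(row[key]);
      if (Array.isArray(row[key]) && counts[code] > 1) {
        if (!(code in names)) {
          names[code] = '$V' + declarations.length;
          declarations.push(names[code] + '=' + code);
        }
        code = names[code];
      }
      return key + ':' + code;
    });
    return '{' + entries.join(',') + '}';
  });

  return {
    variablesCode: declarations.length ? 'var ' + declarations.join(',') + ';' : '',
    tableCode: '[' + rows.join(',') + ']'
  };
}

var result = compactRepeatedArrays([
  { 5: [1,3,4,6,7,8,15], 6: 7 },
  { 20: [1,3,4,6,7,8,15], 9: 8 }
]);
console.log(result.variablesCode);
console.log(result.tableCode);
```

JavaScript enumerates integer-like keys in ascending order, so the second row is emitted with key 9 before key 20. Evaluating the generated code yields a table in which both rows reference the same array object.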

I tackled the second case by creating such objects with an auxiliary function o:

var tables = [o([200,204], [5, 17, 20, 21])];

The key lists passed to o are themselves long numeric arrays, which can in turn be optimized under the first case.
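For readers wondering what the auxiliary function o might look like, here is a minimal sketch. It assumes the argument order of the example above (shared value first, list of keys second); the body is an illustration and not necessarily the function shipped in the generated parsers.

```javascript
// Sketch of an auxiliary function like `o` (assumed signature: value, keys).
// Builds an object in which every listed key maps to the same value.
function o(value, keys) {
  var object = {};
  for (var i = 0; i < keys.length; i++)
    object[keys[i]] = value;
  return object;
}

var tables = [o([200,204], [5, 17, 20, 21])];
console.log(tables[0][17] === tables[0][21]); // all keys share one array object
```

Because every key points to the same array, the value is stored once in memory instead of once per key.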

Not only does this significantly decrease the file size of the parser, it also lowers memory usage, as it avoids keeping multiple copies of the same array in memory.

To support such chunks of reusable code, the generateModule_ function has been updated to return an object with commonCode and moduleCode (instead of only moduleCode).

Results

A parser I am working on benefited significantly from the new table generation function:

  • before: 173kb (generated), 155kb (minified), 36kb (gzipped)
  • optimization 1: 138kb (generated), 112kb (minified), 36kb (gzipped)
  • optimizations 1 and 2: 91kb (generated), 71kb (minified), 19kb (gzipped)

The decrease from 36kb to 19kb is a 47% reduction.

Contributor Author

RubenVerborgh commented Aug 17, 2014

There are two immediate cases left for further optimization:

  1. recognition of sublists
    e.g., [1,11,12,13,14,15,100] and [2,11,12,13,14,15,200] share the majority of elements
  2. objects with almost all identical values
    e.g., {1: X, 2: X, 3: X, 4: X, 5: Y} is almost a candidate for optimization 2
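One possible way to handle the second case is to let the auxiliary function start from an object that already holds the exceptions. The sketch below is an assumption made for illustration: the third parameter base and the sample values for X and Y are hypothetical, and an eventual implementation may differ.

```javascript
// Hypothetical variant of `o` with an optional `base` object for the exceptions.
function o(value, keys, base) {
  var object = base || {};
  for (var i = 0; i < keys.length; i++)
    object[keys[i]] = value;
  return object;
}

// Sample values standing in for X and Y from the example above.
var X = [200, 204], Y = [1, 3];

// Equivalent to {1: X, 2: X, 3: X, 4: X, 5: Y}
var row = o(X, [1, 2, 3, 4], { 5: Y });
console.log(Object.keys(row).length);
```

The shared value is still written only once in the source; only the deviating keys are spelled out.
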
} while (id !== 0);
return name;
}
var nextVariableId = 0;

zaach Aug 17, 2014

Owner

Should generateTableCode reset this to 0? If you were creating multiple parsers the second one would start at an arbitrary position, if that matters.

RubenVerborgh Aug 17, 2014

Author Contributor

Indeed, new parsers can reset this to 0, but it is not necessary. If they do, variable names can be shorter (as they then only have to be unique within a single parser, not across all parsers). However, minification will reassign variable names anyway, making them as short as possible.

Summarizing: you could make createVariable a member function with nextVariableId as a member variable, so names will be shorter in unminified versions.

RubenVerborgh Aug 17, 2014

Author Contributor

Or maybe the best and easiest option: generateModule_ can set nextVariableId to 0.
(That way, other methods can also create and use new variables.)
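To make the trade-off concrete, here is a minimal sketch of a counter-based name generator that is reset at the start of each generation run. Only the names createVariable and nextVariableId and the do/while loop shape come from this thread; the alphabet, the $V prefix, and the generateModuleSketch stand-in are assumptions.

```javascript
// Hypothetical sketch; alphabet, prefix, and generateModuleSketch are assumptions.
var variableChars = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
var nextVariableId = 0;

// Returns a fresh variable name derived from the running counter.
function createVariable() {
  var id = nextVariableId++;
  var name = '$V';
  do {
    name += variableChars[id % variableChars.length];
    id = Math.floor(id / variableChars.length);
  } while (id !== 0);
  return name;
}

// Stand-in for the module generator: resetting the counter here means
// every generated parser starts again from the shortest names.
function generateModuleSketch() {
  nextVariableId = 0;
  return [createVariable(), createVariable()];
}

console.log(generateModuleSketch());
console.log(generateModuleSketch()); // same names again, thanks to the reset
```

Without the reset, the second call would continue from wherever the counter stopped, which is harmless but yields longer names in unminified output.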

zaach Aug 17, 2014

Owner

I like that 👍. In general it's nice to keep functions as "pure" as possible.

RubenVerborgh Aug 17, 2014

Author Contributor

True. Maybe the cleanest would have been to make it a member function, but that would add unnecessary complexity. The important thing is that there are no side-effects.

Owner

zaach commented Aug 17, 2014

Two thumbs up 👍 👍

zaach added a commit that referenced this pull request Aug 17, 2014
Significantly decrease parser file size by compacting parser table
zaach merged commit 8543cc4 into zaach:master Aug 17, 2014
1 check failed
continuous-integration/travis-ci The Travis CI build could not complete due to an error

Contributor Author

RubenVerborgh commented Aug 17, 2014

Yihaa, thanks for merging! Any chance you could publish a new version to npm?

I plan to have a look next week at the other optimizations suggested in my comment above. Probably they won't be as spectacular, but we might still shave a few kilobytes off.

RubenVerborgh deleted the RubenVerborgh:compact-table branch Aug 17, 2014

Contributor Author

RubenVerborgh commented Aug 18, 2014

Pull request #235 implements the suggestion “objects with almost all identical values”.

I also tried “recognition of sublists”, but this doesn't bring the gzipped size down (as expected); it also doesn't significantly change the minified version. Therefore, I haven't included it.

ericprud commented Dec 28, 2014

Hi, this looks really cool and I'm trying to understand the impact on the development process. The build script for your SPARQL parser calls jison directly:

./node_modules/jison/lib/cli.js ./lib/sparql.jison -p slr -o ./lib/SparqlParser.js

Should I see a call to a minimizer which would replace identical sequences in the generated table?

Contributor Author

RubenVerborgh commented Dec 28, 2014

There is no explicit call to a minimizer; it is part of the Jison build process.
Concretely, generateModule_ uses the minimized table.
