Speed up projects.json file #273
Comments
This parser only handles hashes, arrays and strings, and will return Nil if *anything* goes wrong. It is now plugged into "R:I:JSON.from-json", which first attempts the very fast parser; if that fails, it falls back to the original, slow, grammar-based parser. How much faster? At least 10x.
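The fast-path-with-fallback idea described above can be sketched roughly as follows. This is a minimal illustration in Python, not the Rakudo implementation: `fast_parse` and `from_json` are hypothetical names, the restricted parser rejects any string containing a backslash (an assumption made to keep it short), and `None` stands in for Nil.

```python
import json


def fast_parse(text):
    """Parse JSON restricted to objects, arrays and escape-free strings.

    Returns None if anything at all goes wrong (hypothetical helper).
    It is lenient: it does not check for control characters in strings.
    """
    pos = 0
    n = len(text)

    def skip_ws():
        nonlocal pos
        while pos < n and text[pos] in " \t\r\n":
            pos += 1

    def value():
        nonlocal pos
        skip_ws()
        ch = text[pos]  # IndexError at end of input is caught below
        if ch == '"':
            end = text.index('"', pos + 1)  # ValueError if unterminated
            s = text[pos + 1:end]
            if "\\" in s:
                raise ValueError("escapes are left to the slow parser")
            pos = end + 1
            return s
        if ch == "[":
            pos += 1
            items = []
            skip_ws()
            if text[pos] == "]":
                pos += 1
                return items
            while True:
                items.append(value())
                skip_ws()
                if text[pos] == ",":
                    pos += 1
                elif text[pos] == "]":
                    pos += 1
                    return items
                else:
                    raise ValueError("expected ',' or ']'")
        if ch == "{":
            pos += 1
            obj = {}
            skip_ws()
            if text[pos] == "}":
                pos += 1
                return obj
            while True:
                skip_ws()
                if text[pos] != '"':
                    raise ValueError("expected a string key")
                key = value()
                skip_ws()
                if text[pos] != ":":
                    raise ValueError("expected ':'")
                pos += 1
                obj[key] = value()
                skip_ws()
                if text[pos] == ",":
                    pos += 1
                elif text[pos] == "}":
                    pos += 1
                    return obj
                else:
                    raise ValueError("expected ',' or '}'")
        # Numbers, booleans and null are deliberately unsupported.
        raise ValueError("unsupported value")

    try:
        result = value()
        skip_ws()
        if pos != n:
            raise ValueError("trailing data")
        return result
    except Exception:
        return None


def from_json(text):
    """Try the fast parser first; fall back to the full parser."""
    result = fast_parse(text)
    if result is None:
        result = json.loads(text)
    return result
```

Because the restricted parser never produces `None` as a legitimate value (it has no notion of `null`), `None` is an unambiguous "give up" signal, and callers of `from_json` never need to know which path was taken.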
Technically this works, but only for the ecosystem data specifically generated and served by me. It wouldn’t work on the ecosystem data generated by ecosystem-api.p6c.org (a default) or any darkpan-like.
On Thursday, 20 September 2018 14:32:43 CEST Nick Logan wrote:
Technically this works, but only for the ecosystem data specifically
generated and served by me. It wouldn’t work on the ecosystem data
generated by ecosystem-api.p6c.org (a default) or any darkpan-like.
There's a way out: you can use the slow, generic JSON parser after downloading
the ecosystem data from wherever, and then have it generate a more easily
parseable version suitable for the fast parser.
That's kind of a low-cost in-between solution that would get you most of the
benefit of what I'd do, which is just putting the data into an SQLite database
after downloading and parsing it.
I'm content with the trade-off of slower parsing for less technical debt.
Please find below another version of the parallel parsing. This version does not depend on any particular format: it parses both CPAN and ecosystem JSON files. On my machine this reduces the parse time from 2.3 seconds to about 0.9. I think this is a reasonable trade-off between speed increase and technical debt, as it basically only chunks the JSON files and then uses whatever parser is around to parse the chunks in parallel. The only thing it expects is the first non-whitespace char to be a
Never mind: I just realized we could hook this into R:I:from-json, which would relieve the technical debt from
Hmm, but I do not see a speed difference between 2018.08 and blead for fwiw I get ~7s with or without the JSON improvement, and 4.8s with JSON::Fast
I see a speedup for
Please find below the code I wrote to hyper the parsing of the "projects.json" file. This drops the parsing from 2.2 seconds to 1 second on my machine (without the new quick parser on rakudo) and to 0.3 seconds with the new quick parser.
It basically depends on the way the "projects.json" file is created, with the first line being "[" and the last line being "]", and the entries separated by lines consisting of just "," in between.
I'm not sure where to hook that into zef, so I'm adding it here as an issue.
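The chunking approach described in this post can be sketched as follows. This is a Python illustration under stated assumptions, not the Raku code referred to above: `split_chunks` and `parse_projects` are hypothetical names, the chunk size and worker count are arbitrary, and the input is assumed to have "[" as its first line, "]" as its last line, and lines consisting of just "," between entries. Anything that does not match falls back to parsing the whole text in one go.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def split_chunks(text, chunk_size=100):
    """Split the file into standalone JSON arrays of up to chunk_size entries.

    Returns None when the text does not have the expected line layout.
    """
    lines = text.strip().split("\n")
    if len(lines) < 2 or lines[0].strip() != "[" or lines[-1].strip() != "]":
        return None
    records, current = [], []
    for line in lines[1:-1]:
        if line.strip() == ",":
            # A line holding only a comma separates two entries.
            records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    if current:
        records.append("\n".join(current))
    return [
        "[" + ",".join(records[i:i + chunk_size]) + "]"
        for i in range(0, len(records), chunk_size)
    ]


def parse_projects(text, workers=4):
    """Parse chunks in parallel with whatever parser is around."""
    chunks = split_chunks(text)
    if chunks is None:
        return json.loads(text)
    try:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            parts = pool.map(json.loads, chunks)
            # map() preserves chunk order, so entry order is preserved too.
            return [item for part in parts for item in part]
    except ValueError:
        # A chunk was not valid JSON on its own: parse everything at once.
        return json.loads(text)
```

Note that in CPython a thread pool mainly illustrates the structure; `json.loads` is unlikely to run truly in parallel across threads, so a process pool would be the closer analogue of the speedup reported here.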