New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get data object from Config service by reference, not deep clone #108

Merged
merged 1 commit into from Oct 15, 2015

Conversation

Projects
None yet
3 participants
@kadamwhite
Contributor

kadamwhite commented Oct 14, 2015

Previously, any call to genClusters incurred a deep clone of the data object. In the case of large datasets (e.g. a ~1mb CSV file), this could result in over a second of additional processing, as well as the creation of a very large duplicate object in-memory:

image

Altering the Config service to return the data object by reference, rather than as a deep clone, reduces the processing needed to the extent that it no longer even shows up on the profiler:

image

The total run time of genClusters drops by about a second in this instance.

Because the data should be consistent across all charts, and it is just the chart configuration that will vary on a plot-by-plot basis, passing the data by reference improves both the memory usage and the performance of the data load process.

@kadamwhite

This comment has been minimized.

Contributor

kadamwhite commented Oct 14, 2015

@kanitw @domoritz this is ready for review

@domoritz

This comment has been minimized.

Member

domoritz commented Oct 15, 2015

This makes sense to me. @kanitw objections?

Get data object from Config service by reference, not deep clone
Previously, any call to `genClusters` incurred a deep clone of the data
object. In the case of large datasets (*e.g.* a ~1mb CSV file), this
could result in over a second of additional processing, as well as the
creation of a very large duplicate object in-memory:

![image](https://cloud.githubusercontent.com/assets/442115/10492615/049d8c66-727b-11e5-9ce7-7d835a9ac5d2.png)

Altering the Config service to return the data object by reference,
rather than as a deep clone, reduces the processing needed to the extent
that it no longer even shows up on the profiler:

![image](https://cloud.githubusercontent.com/assets/442115/10492663/493c5a50-727b-11e5-91b8-237ec1ba3cdc.png)

The total run time of `genClusters` drops by about a second in this case.

Because the data should be consistent across all charts, and it is just
the chart configuration that will vary on a plot-by-plot basis, passing
the data by reference improves both the memory usage and the performance
of the data load process.
@kadamwhite

This comment has been minimized.

Contributor

kadamwhite commented Oct 15, 2015

@domoritz, I think @kanitw had expressed this concern yesterday in Slack:

we need to try to recall why we add _.cloneDeep() there
I kinda remember that it was just Config.data before
but something was wrong and we change it — but I might get confused with other changes.
This was long time ago.

If a specific issue can be identified I'd like to prioritize taking a look at that, since this is a big time saver when loading large data. If nothing is identified my inclination is to merge this and revisit it later if something does turn up, but I don't want to do that without your go-ahead because that attitude may be a bit too cavalier :)

@kanitw

This comment has been minimized.

Member

kanitw commented Oct 15, 2015

Okay. Just checked, this change is fine. Only Config.getConfig() needs to be cloned given current code path. Although we can refactor that later. (I'm creating an issue to eliminate the need to clone in Config.getConfig() (#117)

kanitw added a commit that referenced this pull request Oct 15, 2015

Merge pull request #108 from vega/kaw/pass-data-by-reference
Get data object from Config service by reference, not deep clone

@kanitw kanitw merged commit 616d35a into master Oct 15, 2015

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details

@kanitw kanitw deleted the kaw/pass-data-by-reference branch Oct 15, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment