fix #537 ValueError race condition when running multiprocessing with describe1d #549
Codecov Report
@@             Coverage Diff             @@
##           develop     #549      +/-   ##
===========================================
- Coverage    87.73%   87.62%   -0.11%
===========================================
  Files          120      121       +1
  Lines         3115     3152      +37
===========================================
+ Hits          2733     2762      +29
- Misses         382      390       +8
…l by expliciting dropping na here
Double-checked the Travis build: it is failing test_issue147.py on the reset_index() call (error: cannot convert float NaN to integer), which is actually a bug in pandas 1.1.0, tracked in pandas-dev/pandas#35657.
…g with describe1d (ydataai#549) * include tests for issue 537 * fix hidden side effect from previous series.fillna(in_place=True) call by expliciting dropping na
…e, tests and CI up, and with visions integration pulled in Update integrations.rst (ydataai#544) fix ydataai#537 ValueError race condition when running multiprocessing with describe1d (ydataai#549) * include tests for issue 537 * fix hidden side effect from previous series.fillna(in_place=True) call by expliciting dropping na Give visibility to our support (ydataai#536) * Add support mention Change formatters for overview (ydataai#535) Fix 523 (ydataai#533) * Fix 523 Incompatible with pandas 1.1.0 (ydataai#557) Notebook update instructions (ydataai#556) Fix 545 and test pandas 1.0.5 and >=1.1 (ydataai#558) * Fix 545 and test pandas 1.0.5 and >=1.1 Bump visions[type_image_path] from 0.4.4 to 0.5.0 (ydataai#547) Bumps [visions[type_image_path]](https://github.com/dylan-profiler/visions) from 0.4.4 to 0.5.0. - [Release notes](https://github.com/dylan-profiler/visions/releases) - [Commits](dylan-profiler/visions@v0.4.4...0.5.0) Update frequent issues (ydataai#564) Fix warning from cmap (ydataai#565) Feature/distinct unique (ydataai#566) * Fix ydataai#539 v2.9.0 details (ydataai#567) [skip ci] Code formatting Visions integration Build summary from graph structure Fix a few more tests Typeset changes + test updates Type checking Correlations Handler, warning structure, random sample, test fix Test fix Fixes Fix warning Captions missing diagrams Fix 51 Unhashable Process comments Fix tests Update messages.py Add threshold to all correlation configs Remove unused renderers (ydataai#580) * Remove unused rendered Update README.md Fix check for infinite values (ydataai#588) * Fix check for infinite values Bump visions[type_image_path] from 0.5.0 to 0.6.0 Bumps [visions[type_image_path]](https://github.com/dylan-profiler/visions) from 0.5.0 to 0.6.0. 
- [Release notes](https://github.com/dylan-profiler/visions/releases) - [Commits](dylan-profiler/visions@0.5.0...v0.6.0) Signed-off-by: dependabot-preview[bot] <support@dependabot.com> Update get_scatter_matrix for sparse dataframes For a dataframe like: A B C 0 1.0 7.0 NaN 1 2.0 8.0 NaN 2 3.0 9.0 NaN 3 4.0 NaN 13.0 4 5.0 NaN 14.0 5 6.0 NaN 15.0 6 NaN 10.0 16.0 7 NaN 11.0 17.0 8 NaN 12.0 18.0 the 'Interactions' tab would not display any data (as all rows contain NaN's) while any pair of columns would contain valid data to plot. This change allows columns A, B, and C to be pairwise plotted against each other by only removing rows with NaN's between the pairwise columns. Update plot.py Notation
References issue #537
Problem:
A ValueError is raised when running ProfileReport on large datasets with multiprocessing enabled (pool_size > 1). The likely cause is the call series.fillna(np.nan, inplace=True) in summary.py, which appears to mutate the underlying DataFrame in place through the passed series reference. When two workers try to write to the DataFrame at the same time, some kind of race condition results and the ValueError occurs. This would also explain why setting pool_size to 1 fixes the issue and why the error doesn't always occur: you probably need enough data and enough workers to hit the race condition.
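To make the suspected side effect concrete, here is a minimal sketch of how an in-place fillna reaches back into the caller's object, while rebinding the result does not. The helper names fill_in_place and fill_rebound are hypothetical, and the fill value 0.0 is an assumption chosen only so the mutation is visible (the code in this PR fills with np.nan). It does not reproduce the race itself.

```python
import numpy as np
import pandas as pd


def fill_in_place(series):
    # Mutates the object the caller passed in (hypothetical helper).
    series.fillna(0.0, inplace=True)
    return series


def fill_rebound(series):
    # Rebinds the local name to a new Series; the caller's object is untouched.
    series = series.fillna(0.0)
    return series


original = pd.Series([1.0, np.nan, 3.0])

fill_rebound(original)
missing_after_rebound = int(original.isna().sum())  # caller still sees its NaN

fill_in_place(original)
missing_after_in_place = int(original.isna().sum())  # caller's NaN has been overwritten
```

If several workers hold references to the same underlying data, only the second style leaves room for them to interfere with each other.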
Solution:
Replace series.fillna(np.nan, inplace=True) with series = series.fillna(np.nan), avoiding any side effects from mutation.
Add a test case that runs describe1d with multiprocessing enabled to cover this code path.
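The shape of such a test could look roughly like the sketch below. It is not the test added in this PR: describe_column and describe_all are hypothetical stand-ins for describe1d and its caller, and the use of multiprocessing.pool.ThreadPool is an assumption about how the workers are run. The idea is to run the same non-mutating summary with one worker and with several, then check that the results agree and the input frame is unchanged.

```python
from multiprocessing.pool import ThreadPool

import numpy as np
import pandas as pd


def describe_column(item):
    # Hypothetical stand-in for describe1d: summarise one column without mutating it.
    name, series = item
    series = series.fillna(np.nan)  # rebinding, as in the proposed fix
    return name, {
        "n_missing": int(series.isna().sum()),
        "mean": float(series.mean()),
    }


def describe_all(df, pool_size):
    # Fan the columns out over a pool of workers and collect the summaries.
    items = [(name, df[name]) for name in df.columns]
    with ThreadPool(pool_size) as pool:
        return dict(pool.map(describe_column, items))


rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(1000, 8)), columns=[f"c{i}" for i in range(8)])
df.iloc[::10, :] = np.nan  # every tenth row is missing in all columns
before = df.copy()

single = describe_all(df, pool_size=1)
multi = describe_all(df, pool_size=4)
```

A real regression test would call the library's own describe1d and likely use a larger frame, since the original report suggests the error only shows up with enough data and workers.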