fix: update duplicates_pandas.py #1427
Fixes bug report ydataai#1384: a dataset with categorical features causes a memory error, even when the dataset is tiny.
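The description does not show the change itself. As a minimal sketch of one way categorical columns can exhaust memory on a tiny dataset, assume the duplicate detection groups rows by their columns with pandas `groupby` (an assumption; the toy frame and variable names below are hypothetical). With `observed=False`, pandas creates one group for every combination of declared categories, whether or not that combination occurs in the data, so the number of groups grows multiplicatively with each categorical column.

```python
import pandas as pd

# Hypothetical toy frame: two categorical columns with 3 declared categories
# each, but only 2 distinct rows actually present in the data.
df = pd.DataFrame(
    {
        "a": pd.Categorical(["x", "x", "y", "y"], categories=["x", "y", "z"]),
        "b": pd.Categorical(["p", "p", "q", "q"], categories=["p", "q", "r"]),
    }
)

# observed=False: one group per combination of declared categories (3 * 3 = 9),
# including combinations that never appear in the data.
all_combos = df.groupby(["a", "b"], observed=False).size()

# observed=True: only the combinations that actually occur (2).
seen_only = df.groupby(["a", "b"], observed=True).size()

print(len(all_combos), len(seen_only))  # 9 2
```

With many categorical columns, or columns with many categories, the first form can require far more memory than the data itself, which would be consistent with the symptom reported in the issue. Whether this is the exact cause addressed by this PR is not stated here.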
Thank you for reporting the issue and solving it!
Codecov Report
Patch and project coverage have no change.

@@           Coverage Diff            @@
##           develop    #1427   +/-   ##
========================================
  Coverage    89.71%   89.71%
========================================
  Files          194      194
  Lines         6319     6319
========================================
  Hits          5669     5669
  Misses         650      650
* Update duplicates_pandas.py (#1427)
  Fixing Bug Report #1384 Dataset with categorical features causes memory error even on tiny dataset.
* chore(actions): update sonarsource/sonarqube-scan-action action to v2.0.1
* chore(actions): update actions/checkout action to v4
* docs: setup new docs with mkdocs (#1418)
* chore(actions): update actions/checkout action to v4
* fix: remove the duplicated cardinality threshold under categorical and text settings
* fix: fixate matplotlib upper version
* docs: change from `zap` to `sparkles` (#1447)
  Co-authored-by: Fabiana <30911746+fabclmnt@users.noreply.github.com>
* fix: template {{ file_name }} error in HTML wrapper (#1380)
* Update javascript.html
* Update style.html
* feat: add density histogram (#1458)
* feat: add histogram density option
* test: add unit test
* fix: discard weights if exceed max_bins
* docs: update README.html (#1461)
  Update url of use cases, main integrations, and common issues.
* fix: bug when creating a new report (#1440)
* fix: gen wordcloud only for non-empty cols (#1459)
* fix: table template ignoring text format (#1462)
* fix: table template ignoring text format
* fix: timeseries unit test
* fix(linting): code formatting
  ---------
  Co-authored-by: Azory YData Bot <azory@ydata.ai>
* fix: to_category misshandling pd.NA (#1464)
* docs: add 📊 for Key features (#1451)
  See also #1445 (comment)
* docs: fix hyperlink - related to package name change (#1457)
  Co-authored-by: Martin Mokry <martin-kokos@users.noreply.github.com>
* chore(deps): increase numpy upper limit (#1467)
* chore(deps): increase numpy upper limit
* chore(deps): fixate numpy version for spark
* chore(deps): fix numba package version, and filter warns (#1468)
* chore: fix numba package version, and filter warns
* fix: skip isort linter on init
* chore(deps): update dependency typeguard to v4 (#1324)
* chore(deps): update dependency typeguard to v4
  ---------
  Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
  Co-authored-by: Maciej Bukczynski <maciej@darkhorseanalytics.com>
* docs: update docs with advent of code
* docs: update links for fabric
* chore(actions): update actions/setup-python action to v5
* docs: add information about PII classification & management.

---------

Co-authored-by: boris-kogan <139680785+boris-kogan@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: Vasco Ramos <vasco.ramos@ydata.ai>
Co-authored-by: ricardodcpereira <ricardo.pereira@ydata.ai>
Co-authored-by: Anselm Hahn <Anselm.Hahn@gmail.com>
Co-authored-by: Joge <87136119+jogecodes@users.noreply.github.com>
Co-authored-by: Alex Barros <alexbarros@users.noreply.github.com>
Co-authored-by: Miriam Seoane Santos <68821478+miriamspsantos@users.noreply.github.com>
Co-authored-by: Chris Mahoney <44449504+chrimaho@users.noreply.github.com>
Co-authored-by: Azory YData Bot <azory@ydata.ai>
Co-authored-by: martin-kokos <4807476+martin-kokos@users.noreply.github.com>
Co-authored-by: Martin Mokry <martin-kokos@users.noreply.github.com>
Co-authored-by: Maciej Bukczynski <maciej@darkhorseanalytics.com>
Co-authored-by: Fabiana Clemente <fabianaclemente@Fabianas-MacBook-Air.local>