fix: update duplicates_pandas.py #1427

boris-kogan · 2023-08-15T11:36:55Z

Fixes bug report #1384: a dataset with categorical features causes a memory error even when the dataset is tiny.
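The description above does not show the diff, so the following is only a sketch of one way a tiny categorical dataset can exhaust memory in pandas. It is not confirmed to be the code path touched by this PR. It assumes the duplicate detection groups rows by their column values, and uses a hypothetical frame `df` with hypothetical columns `a` and `b`. With `observed=False`, `groupby` on categorical columns produces one group per combination of declared categories rather than per combination that actually occurs, so the result grows with the product of category counts instead of the row count.

```python
import pandas as pd

# Hypothetical data: 3 rows, but each categorical column declares 200 categories.
categories = [f"c{i}" for i in range(200)]
df = pd.DataFrame(
    {
        "a": pd.Categorical(["c0", "c0", "c1"], categories=categories),
        "b": pd.Categorical(["c0", "c0", "c1"], categories=categories),
    }
)

# observed=False: one group per declared category combination (200 * 200),
# regardless of how few rows the frame has.
all_combinations = df.groupby(["a", "b"], observed=False).size()

# observed=True: only combinations that actually appear in the data.
seen_combinations = df.groupby(["a", "b"], observed=True).size()

print(len(all_combinations), len(seen_combinations))
```

With more columns or more declared categories the first result grows multiplicatively, which is consistent with a memory error on a dataset that has only a handful of rows.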

aquemy · 2023-08-16T06:48:22Z

Thank you for reporting the issue and solving it!

codecov-commenter · 2023-08-16T07:14:06Z

Codecov Report

Patch and project coverage have no change.

Comparison is base (7fb4fc5) 89.71% compared to head (f0a9840) 89.71%.

Additional details and impacted files

@@           Coverage Diff            @@
##           develop    #1427   +/-   ##
========================================
  Coverage    89.71%   89.71%           
========================================
  Files          194      194           
  Lines         6319     6319           
========================================
  Hits          5669     5669           
  Misses         650      650

| Flag | Coverage Δ |
|---|---|
| py3.8-ubuntu-22.04-pandas | `89.71% <ø> (ø)` |

| Files Changed | Coverage Δ |
|---|---|
| .../ydata_profiling/model/pandas/duplicates_pandas.py | `100.00% <ø> (ø)` |

* Update duplicates_pandas.py (#1427)
  Fixing Bug Report #1384 Dataset with categorical features causes memory error even on tiny dataset.
* chore(actions): update sonarsource/sonarqube-scan-action action to v2.0.1
* chore(actions): update actions/checkout action to v4
* docs: setup new docs with mkdocs (#1418)
* chore(actions): update actions/checkout action to v4
* fix: remove the duplicated cardinality threshold under categorical and text settings
* fix: fixate matplotlib upper version
* docs: change from `zap` to `sparkles` (#1447)
  Co-authored-by: Fabiana <30911746+fabclmnt@users.noreply.github.com>
* fix: template {{ file_name }} error in HTML wrapper (#1380)
* Update javascript.html
* Update style.html
* feat: add density histogram (#1458)
* feat: add histogram density option
* test: add unit test
* fix: discard weights if exceed max_bins
* docs: update README.html (#1461)
  Update url of use cases, main integrations, and common issues.
* fix: bug when creating a new report (#1440)
* fix: gen wordcloud only for non-empty cols (#1459)
* fix: table template ignoring text format (#1462)
* fix: table template ignoring text format
* fix: timeseries unit test
* fix(linting): code formatting
  Co-authored-by: Azory YData Bot <azory@ydata.ai>
* fix: to_category misshandling pd.NA (#1464)
* docs: add 📊 for Key features (#1451)
  See also #1445 (comment)
* docs: fix hyperlink - related to package name change (#1457)
  Co-authored-by: Martin Mokry <martin-kokos@users.noreply.github.com>
* chore(deps): increase numpy upper limit (#1467)
* chore(deps): increase numpy upper limit
* chore(deps): fixate numpy version for spark
* chore(deps): fix numba package version, and filter warns (#1468)
* chore: fix numba package version, and filter warns
* fix: skip isort linter on init
* chore(deps): update dependency typeguard to v4 (#1324)
* chore(deps): update dependency typeguard to v4
  Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
  Co-authored-by: Maciej Bukczynski <maciej@darkhorseanalytics.com>
* docs: update docs with advent of code
* docs: update links for fabric
* chore(actions): update actions/setup-python action to v5
* docs: add information about PII classification & management.

Co-authored-by: boris-kogan <139680785+boris-kogan@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: Vasco Ramos <vasco.ramos@ydata.ai>
Co-authored-by: ricardodcpereira <ricardo.pereira@ydata.ai>
Co-authored-by: Anselm Hahn <Anselm.Hahn@gmail.com>
Co-authored-by: Joge <87136119+jogecodes@users.noreply.github.com>
Co-authored-by: Alex Barros <alexbarros@users.noreply.github.com>
Co-authored-by: Miriam Seoane Santos <68821478+miriamspsantos@users.noreply.github.com>
Co-authored-by: Chris Mahoney <44449504+chrimaho@users.noreply.github.com>
Co-authored-by: Azory YData Bot <azory@ydata.ai>
Co-authored-by: martin-kokos <4807476+martin-kokos@users.noreply.github.com>
Co-authored-by: Martin Mokry <martin-kokos@users.noreply.github.com>
Co-authored-by: Maciej Bukczynski <maciej@darkhorseanalytics.com>
Co-authored-by: Fabiana Clemente <fabianaclemente@Fabianas-MacBook-Air.local>

Update duplicates_pandas.py

f0a9840

Fixing Bug Report ydataai#1384 Dataset with categorical features causes memory error even on tiny dataset.

boris-kogan changed the title ~~Update duplicates_pandas.py~~ fix: update duplicates_pandas.py Aug 15, 2023

aquemy approved these changes Aug 16, 2023

alexbarros merged commit 07d5819 into ydataai:develop Aug 21, 2023
22 of 23 checks passed

aquemy pushed a commit that referenced this pull request Oct 10, 2023

Update duplicates_pandas.py (#1427)

d015ead

Fixing Bug Report #1384 Dataset with categorical features causes memory error even on tiny dataset.

aquemy pushed a commit that referenced this pull request Oct 10, 2023

fix: update duplicates_pandas.py (#1427)

56a6641

Fixing Bug Report #1384 Dataset with categorical features causes memory error even on tiny dataset.
