fix: update duplicates_pandas.py #1427

Merged: 1 commit merged into ydataai:develop on Aug 21, 2023.


boris-kogan (Contributor)

Fixes bug report #1384: a dataset with categorical features causes a memory error, even when the dataset is tiny.
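The PR page does not show the diff itself, so the following is an assumption about the mechanism rather than the confirmed change. In pandas, grouping by categorical columns with `observed=False` (historically the default) creates one group for every combination of category levels, whether or not that combination occurs in the data. A duplicate-row table built with `groupby(...).size()` can therefore grow to the product of the category counts on a frame with only a few rows, while `observed=True` keeps only the combinations that actually appear. The helper `count_duplicate_rows` and the toy data below are hypothetical and are not the ydata-profiling implementation.

```python
from typing import List

import pandas as pd


def count_duplicate_rows(df: pd.DataFrame, columns: List[str], observed: bool) -> pd.DataFrame:
    """Count how often each fully duplicated row occurs (hypothetical helper)."""
    # Keep every row that has at least one exact duplicate across `columns`.
    dup_mask = df.duplicated(subset=columns, keep=False)
    # Group the duplicated rows; `observed` controls whether unused category
    # combinations of categorical columns are emitted as empty groups.
    return (
        df[dup_mask]
        .groupby(columns, observed=observed)
        .size()
        .reset_index(name="count")
    )


# Tiny frame: 6 rows, but three categorical columns with 50 levels each.
levels = [f"c{i}" for i in range(50)]
df = pd.DataFrame(
    {
        "a": pd.Categorical(["c0", "c0", "c1", "c1", "c2", "c3"], categories=levels),
        "b": pd.Categorical(["c5", "c5", "c6", "c6", "c7", "c8"], categories=levels),
        "c": pd.Categorical(["c9", "c9", "c9", "c9", "c1", "c2"], categories=levels),
    }
)
cols = ["a", "b", "c"]

full = count_duplicate_rows(df, cols, observed=False)  # every level combination
small = count_duplicate_rows(df, cols, observed=True)  # only combinations present

print(len(full))   # 125000 rows (50 * 50 * 50), almost all with count 0
print(len(small))  # 2 rows: the two duplicated groups
```

On this toy frame the unobserved variant materializes 125,000 groups to describe 4 duplicated rows. Each additional categorical column multiplies that number again, which is consistent with the symptom reported in #1384.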

boris-kogan changed the title from "Update duplicates_pandas.py" to "fix: update duplicates_pandas.py" on Aug 15, 2023.
aquemy (Contributor) commented Aug 16, 2023

Thank you for reporting the issue and solving it!

codecov-commenter

Codecov Report

Patch and project coverage have no change.

Comparison is base (7fb4fc5) 89.71% compared to head (f0a9840) 89.71%.


Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #1427   +/-   ##
========================================
  Coverage    89.71%   89.71%           
========================================
  Files          194      194           
  Lines         6319     6319           
========================================
  Hits          5669     5669           
  Misses         650      650           
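The coverage figures are internally consistent. Codecov computes coverage as hits divided by tracked lines (hits plus misses plus partials; no partials are listed in this report). A quick check of the numbers in the diff:

```python
# Figures copied from the Codecov diff (identical for develop and #1427).
hits, misses = 5669, 650
lines = hits + misses
coverage_pct = round(100 * hits / lines, 2)
print(lines, coverage_pct)  # 6319 89.71
```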
Flag                          Coverage Δ
py3.8-ubuntu-22.04-pandas     89.71% <ø> (ø)


Files Changed                                            Coverage Δ
.../ydata_profiling/model/pandas/duplicates_pandas.py    100.00% <ø> (ø)


alexbarros merged commit 07d5819 into ydataai:develop on Aug 21, 2023 (22 of 23 checks passed).
aquemy pushed a commit that referenced this pull request Oct 10, 2023
Fixing Bug Report #1384
Dataset with categorical features causes memory error even on tiny dataset.
fabclmnt added a commit that referenced this pull request Dec 7, 2023
* Update duplicates_pandas.py (#1427)

Fixing Bug Report #1384
Dataset with categorical features causes memory error even on tiny dataset.

* chore(actions): update sonarsource/sonarqube-scan-action action to v2.0.1

* chore(actions): update actions/checkout action to v4

* docs: setup new docs with mkdocs (#1418)

* chore(actions): update actions/checkout action to v4

* fix: remove the duplicated cardinality threshold under categorical and text settings

* fix: fixate matplotlib upper version

* docs: change from `zap` to `sparkles` (#1447)

Co-authored-by: Fabiana <30911746+fabclmnt@users.noreply.github.com>

* fix: template {{ file_name }} error in HTML wrapper (#1380)

* Update javascript.html

* Update style.html

* feat: add density histogram (#1458)

* feat: add histogram density option

* test: add unit test

* fix: discard weights if exceed max_bins

* docs: update README.html (#1461)

Update url of use cases, main integrations, and common issues.

* fix: bug when creating a new report (#1440)

* fix: gen wordcloud only for non-empty cols (#1459)

* fix: table template ignoring text format (#1462)

* fix: table template ignoring text format

* fix: timeseries unit test

* fix(linting): code formatting

---------

Co-authored-by: Azory YData Bot <azory@ydata.ai>

* fix: to_category mishandling pd.NA (#1464)

* docs: add 📊 for Key features (#1451)

See also #1445 (comment)

* docs: fix hyperlink - related to package name change (#1457)

Co-authored-by: Martin Mokry <martin-kokos@users.noreply.github.com>

* chore(deps): increase numpy upper limit (#1467)

* chore(deps): increase numpy upper limit

* chore(deps): fixate numpy version for spark

* chore(deps): fix numba package version, and filter warns (#1468)

* chore: fix numba package version, and filter warns

* fix: skip isort linter on init

* chore(deps): update dependency typeguard to v4 (#1324)

* chore(deps): update dependency typeguard to v4

---------

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: Maciej Bukczynski <maciej@darkhorseanalytics.com>

* docs: update docs with advent of code

* docs: update links for fabric

---------

Co-authored-by: boris-kogan <139680785+boris-kogan@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: Vasco Ramos <vasco.ramos@ydata.ai>
Co-authored-by: ricardodcpereira <ricardo.pereira@ydata.ai>
Co-authored-by: Anselm Hahn <Anselm.Hahn@gmail.com>
Co-authored-by: Joge <87136119+jogecodes@users.noreply.github.com>
Co-authored-by: Alex Barros <alexbarros@users.noreply.github.com>
Co-authored-by: Miriam Seoane Santos <68821478+miriamspsantos@users.noreply.github.com>
Co-authored-by: Chris Mahoney <44449504+chrimaho@users.noreply.github.com>
Co-authored-by: Azory YData Bot <azory@ydata.ai>
Co-authored-by: martin-kokos <4807476+martin-kokos@users.noreply.github.com>
Co-authored-by: Martin Mokry <martin-kokos@users.noreply.github.com>
Co-authored-by: Maciej Bukczynski <maciej@darkhorseanalytics.com>
Co-authored-by: Fabiana Clemente <fabianaclemente@Fabianas-MacBook-Air.local>
fabclmnt added a commit that referenced this pull request Dec 7, 2023
* Update duplicates_pandas.py (#1427)

Fixing Bug Report #1384
Dataset with categorical features causes memory error even on tiny dataset.

* chore(actions): update sonarsource/sonarqube-scan-action action to v2.0.1

* chore(actions): update actions/checkout action to v4

* docs: setup new docs with mkdocs (#1418)

* chore(actions): update actions/checkout action to v4

* fix: remove the duplicated cardinality threshold under categorical and text settings

* fix: fixate matplotlib upper version

* docs: change from `zap` to `sparkles` (#1447)

Co-authored-by: Fabiana <30911746+fabclmnt@users.noreply.github.com>

* fix: template {{ file_name }} error in HTML wrapper (#1380)

* Update javascript.html

* Update style.html

* feat: add density histogram (#1458)

* feat: add histogram density option

* test: add unit test

* fix: discard weights if exceed max_bins

* docs: update README.html (#1461)

Update url of use cases, main integrations, and common issues.

* fix: bug when creating a new report (#1440)

* fix: gen wordcloud only for non-empty cols (#1459)

* fix: table template ignoring text format (#1462)

* fix: table template ignoring text format

* fix: timeseries unit test

* fix(linting): code formatting

---------

Co-authored-by: Azory YData Bot <azory@ydata.ai>

* fix: to_category mishandling pd.NA (#1464)

* docs: add 📊 for Key features (#1451)

See also #1445 (comment)

* docs: fix hyperlink - related to package name change (#1457)

Co-authored-by: Martin Mokry <martin-kokos@users.noreply.github.com>

* chore(deps): increase numpy upper limit (#1467)

* chore(deps): increase numpy upper limit

* chore(deps): fixate numpy version for spark

* chore(deps): fix numba package version, and filter warns (#1468)

* chore: fix numba package version, and filter warns

* fix: skip isort linter on init

* chore(deps): update dependency typeguard to v4 (#1324)

* chore(deps): update dependency typeguard to v4

---------

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: Maciej Bukczynski <maciej@darkhorseanalytics.com>

* docs: update docs with advent of code

* docs: update links for fabric

* chore(actions): update actions/setup-python action to v5

* docs: add information about PII classification & management.

---------

Co-authored-by: boris-kogan <139680785+boris-kogan@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: Vasco Ramos <vasco.ramos@ydata.ai>
Co-authored-by: ricardodcpereira <ricardo.pereira@ydata.ai>
Co-authored-by: Anselm Hahn <Anselm.Hahn@gmail.com>
Co-authored-by: Joge <87136119+jogecodes@users.noreply.github.com>
Co-authored-by: Alex Barros <alexbarros@users.noreply.github.com>
Co-authored-by: Miriam Seoane Santos <68821478+miriamspsantos@users.noreply.github.com>
Co-authored-by: Chris Mahoney <44449504+chrimaho@users.noreply.github.com>
Co-authored-by: Azory YData Bot <azory@ydata.ai>
Co-authored-by: martin-kokos <4807476+martin-kokos@users.noreply.github.com>
Co-authored-by: Martin Mokry <martin-kokos@users.noreply.github.com>
Co-authored-by: Maciej Bukczynski <maciej@darkhorseanalytics.com>
Co-authored-by: Fabiana Clemente <fabianaclemente@Fabianas-MacBook-Air.local>
Labels: None yet
Projects: None yet
Development: issues this pull request may close when merged: None yet
4 participants