BUG: Integrated pandas can't Read CSV while latest pandas can #730

Open
charliedream1 opened this issue Oct 3, 2023 · 1 comment
Labels
bug Something isn't working

@charliedream1

Describe the bug

  • Problem 1: The latest pandas loads a 25 GB CSV file correctly, but xorbits.pandas ("import xorbits.pandas as pd") fails on the same file with an EOF error.
  • Problem 2: A DataFrame loaded with the latest pandas can't be passed to the dedup function (from xorbits.experimental import dedup).
  • Problem 3: The dedup function can't handle very long strings, e.g. lengths between 4,000 and 100,000; it fails with a "too many open files" error.

To Reproduce

To help us to reproduce this bug, please provide information below:

  1. Your Python version: 3.10
  2. The version of Xorbits you use: 0.6.3
  3. Versions of crucial packages, such as numpy, scipy and pandas: numpy 1.26.0, scipy 1.11.3, pandas 2.1.1
  4. Full stack of the error.
  5. Minimized code to reproduce the error.

Expected behavior

A clear and concise description of what you expected to happen.

Additional context

Add any other context about the problem here.

@XprobeBot XprobeBot added the bug Something isn't working label Oct 3, 2023
@XprobeBot XprobeBot added this to the v0.7.0 milestone Oct 3, 2023
@codingl2k1
Contributor

Problem 1: Is your CSV file on a local disk or at a remote location (accessed by URL)?
Problem 2: Are you loading the CSV with pandas and then constructing a Xorbits DataFrame from the pandas DataFrame? If so, this could be an out-of-memory crash, because the full data will be serialized and sent to the worker.
Problem 3: The "too many open files" error can be fixed by raising the open-file limit with ulimit.
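
For Problem 3, here is a minimal sketch of raising the open-file limit from inside Python, as an alternative to running `ulimit -n` in the shell before launching. It uses only the standard-library `resource` module (POSIX only). The helper name `raise_open_file_limit` and the default target of 65536 are illustrative assumptions, not part of Xorbits, and whether the new limit reaches the workers depends on how they are started.

```python
import resource


def raise_open_file_limit(target=65536):
    """Raise the soft open-file limit toward `target` (hypothetical helper).

    The soft limit can only be raised up to the hard limit without extra
    privileges, so the request is capped there. Returns the (soft, hard)
    pair in effect after the attempt.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    # Cap the request at the hard limit unless the hard limit is unlimited.
    if hard == resource.RLIM_INFINITY:
        new_soft = target
    else:
        new_soft = min(target, hard)

    # Only ever raise the limit; leave an unlimited or higher soft limit alone.
    if soft != resource.RLIM_INFINITY and new_soft > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
        except (ValueError, OSError):
            # Some platforms reject values above a kernel-level maximum;
            # in that case the previous limit stays in place.
            pass

    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print(raise_open_file_limit())
```

Resource limits are inherited by child processes, so the call would need to run before any worker processes are spawned from this process. For workers started separately (for example on other machines), the limit has to be raised in the environment that launches them.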

@XprobeBot XprobeBot modified the milestones: v0.7.0, v0.7.1 Oct 23, 2023
@XprobeBot XprobeBot modified the milestones: v0.7.1, v0.7.2 Nov 21, 2023
@XprobeBot XprobeBot modified the milestones: v0.7.2, v0.7.3 Jan 5, 2024
3 participants