npyio.py: genfromtxt() handles comments incorrectly with names=True (Trac #2184) #5974

Open
numpy-gitbot opened this Issue Oct 16, 2012 · 5 comments

@numpy-gitbot
Collaborator

Original ticket http://projects.scipy.org/numpy/ticket/2184 on 2012-07-11 by trac user khaeru, assigned to unknown.

The documentation for genfromtxt() reads:

When the variables are named (either by a flexible dtype or with names), there must not be any header in the file (else a ValueError exception is raised).

and also:

If names is True, the field names are read from the first valid line after the first skip_header lines.

The cause of this seems to be in numpy/lib/npyio.py at lines 1347-9 (https://github.com/numpy/numpy/blob/master/numpy/lib/npyio.py#L1347):

    if names is True:
        if comments in first_line:
            first_line = asbytes('').join(first_line.split(comments)[1:])

The last line should read first_line = first_line.split(comments)[0].

With the current code, the input line:

    # Example comment line

will be transformed to:

    Example comment line

resulting in columns named 'Example', 'comment' and 'line' (this is what the warning in the documentation is about).

But also the input line:

    ColumnA ColumnB ColumnC # the column names precede this comment

will be transformed to:

    the column names precede this comment

resulting in columns named 'the', 'column', 'names', etc. In this instance, the actual column names present in the file are inappropriately discarded.
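
To make the two cases concrete, here is a minimal standalone sketch that applies the quoted expression and the proposed replacement to the two example lines. The helper names `strip_current` and `strip_proposed` are hypothetical and do not exist in numpy, a plain `b""` stands in for `asbytes('')`, and whitespace splitting stands in for the real line splitter.

```python
def strip_current(first_line, comments=b"#"):
    # Mirrors the expression quoted above: keeps the text AFTER the comment marker.
    if comments in first_line:
        first_line = b"".join(first_line.split(comments)[1:])
    return first_line


def strip_proposed(first_line, comments=b"#"):
    # Proposed replacement: keeps the text BEFORE the comment marker.
    if comments in first_line:
        first_line = first_line.split(comments)[0]
    return first_line


header_only = b"# Example comment line"
names_then_comment = b"ColumnA ColumnB ColumnC # the column names precede this comment"

# Whitespace splitting is an assumption standing in for the real splitter.
current_names = strip_current(names_then_comment).split()
proposed_names = strip_proposed(names_then_comment).split()

print(strip_current(header_only).split())   # words of the comment become names
print(strip_proposed(header_only).split())  # empty list: nothing usable on this line
print(current_names)                        # words of the trailing comment
print(proposed_names)                       # the actual column names
```

With the proposed expression, a line that is entirely a comment yields no names at all, while a line with a trailing comment yields only the names that precede it.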

By taking the [0] portion of the split instead of [1:]:

  • Lines beginning with comments result in an empty string being passed to split_lines() on L1350, producing no usable output and causing the while not first_values loop to try the next line.
  • Partial-line comments following actual heading names are discarded, instead of the names themselves.
  • As a result, files can have commented headers of any length and column names, simultaneously.
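
The three points above can be sketched as a simplified loop. This is not the code in npyio.py: `read_names` is a hypothetical helper, it works on a list of `str` lines rather than a file handle, and `str.split()` is assumed in place of the real splitter. It only illustrates how taking the `[0]` portion lets the loop skip fully commented lines and then pick up the names.

```python
def read_names(lines, comments="#"):
    """Return the first non-empty list of names, ignoring comment text.

    Simplified stand-in for the name-reading loop; returns [] if no line
    contains anything before a comment marker.
    """
    line_iter = iter(lines)
    first_values = []
    while not first_values:
        try:
            first_line = next(line_iter)
        except StopIteration:
            return []
        if comments in first_line:
            # Proposed fix: keep what precedes the comment marker.
            first_line = first_line.split(comments)[0]
        # An empty result here makes the loop try the next line.
        first_values = first_line.split()
    return first_values


sample = [
    "# A commented header",
    "# that spans two lines",
    "ColumnA ColumnB ColumnC # the column names precede this comment",
    "1 2 3",
]
names = read_names(sample)
print(names)
```

In this sketch the two commented header lines are skipped, and the trailing comment on the names line is dropped.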

trac user khaeru wrote on 2012-07-11

Sorry, bad title. Also, what's the difference between the Trac issues list and https://github.com/numpy/numpy/issues ?


Title changed from Remove to npyio.py: genfromtxt() handles comments incorrectly with names=True by trac user khaeru on 2012-07-11


@rgommers wrote on 2012-07-12

We opened GitHub issues only a few weeks ago, and we're in the process of transitioning all Trac tickets to it. When that's done we'll close Trac, or make it read-only. For now you can use either one.


@rgommers wrote on 2012-07-12

Suggested fix looks correct.


trac user khaeru wrote on 2012-07-12

Oh, I see — well, I also posted a branch with this fix and a pull request: numpy/numpy#351
