csvgrep -r doesn't find matches across newlines #368

Closed
gabebw opened this issue Dec 13, 2014 · 11 comments

@gabebw

gabebw commented Dec 13, 2014

I'm using csvkit 0.9.0.

Given the following CSV in test.csv:

user_id,body
gabebw,"IRC
AIM"

csvgrep can find the row when matching on IRC, which is on the first line:

$ csvgrep -c body -r IRC test.csv
user_id,body
gabebw,"IRC
AIM"

But when I try to match on AIM, which is on a newline but still part of the row, csvgrep does not find the row:

$ csvgrep -c body -r AIM test.csv
user_id,body

Am I using -r incorrectly, or maybe passing it the wrong input?

@gabebw
Author

gabebw commented Dec 13, 2014

I also tried adding the (?m) flag to the regex, but both of these commands give me an empty result set:

  • csvgrep -c body -r '(?m).*AIM' test.csv
  • csvgrep -c body -r '(?m)AIM' test.csv
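For reference, a minimal sketch of how these patterns behave in Python's re module. It assumes csvgrep applied the pattern with re.match, as a later comment in this thread reports for 0.9.x; it is not csvkit's own code, and the variable names are illustrative only.

```python
import re

body = "IRC\nAIM"  # the multi-line cell value from test.csv

# (?m) only changes where ^ and $ match. It does not let "." cross a newline,
# and re.match still has to start matching at position 0.
m_flag = re.match(r"(?m)AIM", body)
m_flag_dot = re.match(r"(?m).*AIM", body)

# (?s) lets "." match "\n", so ".*" can reach the second line.
s_flag_dot = re.match(r"(?s).*AIM", body)

# re.search scans the whole string, so no flag is needed.
searched = re.search(r"AIM", body)

print(m_flag, m_flag_dot)                            # None None
print(s_flag_dot is not None, searched is not None)  # True True
```

Under that assumption, (?m) cannot help here, because the match is still anchored to the start of the cell.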

@gabebw
Author

gabebw commented Dec 14, 2014

csvkit from GitHub (revision fa6bade) also fails to find data after a newline. I ran these commands to install it on OS X:

  • pip uninstall csvkit
  • pip install -r requirements-py2.txt
  • python setup.py develop

@onyxfish
Collaborator

Hi Gabe! This looks like a legit bug. I probably need to add a flag to the regex compile. I'll look into it! Thanks for opening an issue!

onyxfish added this to the 1.0 milestone Jan 24, 2015
@gabebw
Author

gabebw commented Jan 24, 2015

Absolutely. If you need someone to test new code or anything, just let me know. I know how hard it is to keep up with even a semi-popular open source project.

@edwardros

I'm not sure that this is the issue:

csvgrep -c body -r RC test.csv

also returns no rows. This is because (at least as of 0.9.1/0.9.2) it uses re.match instead of re.search.

Simply changing match to search on lines 105 and 116 of grep.py resolves this issue.
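The difference between the two calls can be sketched as follows. This is a standalone illustration, not the code in grep.py; row_matches is a hypothetical helper introduced only for this example.

```python
import re

def row_matches(pattern, cell, matcher):
    """Return True if matcher (re.match or re.search) finds pattern in cell."""
    return matcher(pattern, cell) is not None

cell = "IRC\nAIM"  # the body value from test.csv

for pattern in ("IRC", "RC", "AIM"):
    # re.match only succeeds when the pattern matches at the start of the cell;
    # re.search succeeds when it matches anywhere, including after a newline.
    print(pattern, row_matches(pattern, cell, re.match), row_matches(pattern, cell, re.search))
```

Only IRC matches with re.match, which is consistent with both the original report and the RC example above.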

jpmckinney pushed a commit that referenced this issue Jan 23, 2016
@jpmckinney
Member

Thanks @edwardros! match has been there since the tool was introduced in b8c6bb9. Fixed in #516.

jpmckinney pushed a commit that referenced this issue Jan 23, 2016
Add failing test demonstrating #368
jpmckinney pushed a commit that referenced this issue Jan 23, 2016
@antonkryvko

antonkryvko commented Jun 21, 2019

Hi! I think the problem is still here:
csvgrep -c 2 -r '\d' test.csv
returns only the header instead of the matching rows. It probably has something to do with flags. The CSV is UTF-8 encoded, but it contains Cyrillic characters.

@jpmckinney
Member

What is your test.csv?

@antonkryvko

It's a data table that mixes text and digits. Something like this:

full_name,info,aliments,okrug,cancel_registration,registration_date,nomination,oblast,
Кістіон Володимир Євсевійович,"народився 31 05 1965 року в селі Довжок Ямпільського району Вінницької області, громадянин України, протягом останніх п’яти років проживає на території України, освіта вища, Віце-прем’єр-міністр України, безпартійний, проживає в місті Києві, судимість відсутня, самовисування.",,11, ,12.06.2019,Самовисування,Вінницька область

I want to return the date from the second column using (\d{1,2})\s(\d{2})\s(\d{4}). The pattern works on https://pythex.org/, but doesn't work in csvgrep.
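One way to check the pattern outside csvgrep is the sketch below. It uses a shortened copy of the sample row (only the first two columns, with the info text truncated) and assumes the installed csvgrep applies the pattern with re.match, as reported earlier in this thread for 0.9.x.

```python
import csv
import io
import re

# Shortened copy of the sample row from this thread (info text truncated).
sample = (
    "full_name,info\n"
    'Кістіон Володимир Євсевійович,"народився 31 05 1965 року в селі Довжок"\n'
)
date_re = re.compile(r"(\d{1,2})\s(\d{2})\s(\d{4})")

rows = list(csv.reader(io.StringIO(sample)))
info = rows[1][1]  # second column of the first data row

# The cell starts with text, so a match anchored at position 0 fails.
print(date_re.match(info))   # None

# Searching anywhere in the cell finds the date.
found = date_re.search(info)
print(found.groups())        # ('31', '05', '1965')
```

If that assumption holds, the Cyrillic text is not the cause: the pattern fails only because the date is not at the start of the cell.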

@jpmckinney
Member

When I run:

echo 'full_name,info,aliments,okrug,cancel_registration,registration_date,nomination,oblast,
Кістіон Володимир Євсевійович,"народився 31 05 1965 року в селі Довжок Ямпільського району Вінницької області, громадянин України, протягом останніх п’яти років проживає на території України, освіта вища, Віце-прем’єр-міністр України, безпартійний, проживає в місті Києві, судимість відсутня, самовисування.",,11, ,12.06.2019,Самовисування,Вінницька област
' | csvgrep -c 2 -r '\d'

The row matches.

@antonkryvko

My apologies: I was using version 0.9.1-2 from the Ubuntu repository, which has this bug. It is resolved in the current version. I've now installed 1.0.4 from PyPI and everything works.

lcorbasson pushed a commit to lcorbasson/csvkit that referenced this issue Sep 7, 2020