
Implement csplit #501

Closed
jerzozwierz opened this issue Jan 16, 2015 · 7 comments

Comments

@jerzozwierz

I'll be working on this.

@kuzminva

kuzminva commented Aug 1, 2017

I am going to start working on it. ETA:

  • initial release ~20 Aug 2017

@thepotatocoder

Going to start working on this.

@scampi
Contributor

scampi commented Jun 18, 2018

@jerzozwierz @kuzminva @thepotatocoder If it's no bother, I'll give this a try.

To everybody: I have a question regarding the --suffix-format option, which lets you set a format for the file counter. That format uses sprintf syntax.
Since I would have to use some unsafe code to call it just to get a formatted string, I was reluctant to do so.
Because the format! macro in stable Rust only accepts a format string that is a literal known at compile time, this option cannot simply be built on it, so I was thinking of leaving it out for now.
Another solution would be to decouple this util from language-specific pattern syntaxes (C's sprintf, Rust's fmt) and use a template library, although that would add a dependency.
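A third route, avoiding both sprintf and an extra dependency, would be to parse a restricted subset of the format by hand. The sketch below assumes that only a single integer conversion (%d, %i or %u) with an optional 0 flag and a width, plus a literal %%, needs to be supported; format_suffix is a hypothetical name. GNU csplit's default suffix is %02d, and GNU also accepts o, x and X conversions and further flags, so a real port would need to decide how much of that to cover.

```rust
/// Hypothetical helper: expands a printf-style suffix format such as the
/// default "%02d" for the file counter, without calling libc's sprintf.
/// Supports exactly one integer conversion (%d, %i or %u) with an optional
/// '0' flag and width, plus literal "%%"; anything else returns None.
fn format_suffix(fmt: &str, n: u64) -> Option<String> {
    let mut out = String::new();
    let mut chars = fmt.chars().peekable();
    let mut conversions = 0;
    while let Some(c) = chars.next() {
        if c != '%' {
            out.push(c);
            continue;
        }
        // "%%" is a literal percent sign.
        if chars.peek() == Some(&'%') {
            chars.next();
            out.push('%');
            continue;
        }
        // Optional '0' flag: pad with zeros instead of spaces.
        let zero_pad = chars.peek() == Some(&'0');
        if zero_pad {
            chars.next();
        }
        // Optional decimal field width.
        let mut width = 0usize;
        while let Some(d) = chars.peek().and_then(|c| c.to_digit(10)) {
            width = width * 10 + d as usize;
            chars.next();
        }
        match chars.next() {
            Some('d') | Some('i') | Some('u') => {
                conversions += 1;
                if zero_pad {
                    out.push_str(&format!("{:0width$}", n, width = width));
                } else {
                    out.push_str(&format!("{:width$}", n, width = width));
                }
            }
            // Unsupported conversion, or a trailing '%'.
            _ => return None,
        }
    }
    // The suffix must contain exactly one conversion.
    if conversions == 1 {
        Some(out)
    } else {
        None
    }
}

fn main() {
    println!("{}", format_suffix("%02d", 3).unwrap()); // prints "03"
    println!("{}", format_suffix("part-%03d.txt", 7).unwrap()); // prints "part-007.txt"
}
```

Rejecting unknown conversions up front keeps the behaviour predictable, at the cost of not matching every format GNU accepts.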

What do you think?

@thepotatocoder

thepotatocoder commented Jun 18, 2018

@scampi I tried it a little while ago but haven't gotten to it recently. Using unsafe features while you wait for the language to catch up isn't bad if it's only temporary, imo.

@scampi
Contributor

scampi commented Jun 22, 2018

There is a behaviour that is confusing me.

When using a linenum-based pattern (i.e., copy up to a line number), the lines that were read but left over from a previous pattern are taken into account. This is not the case with a regex-based pattern, however.

GNU's csplit

  • 4.16.13-2-ARCH x86_64 GNU/Linux
  • csplit (GNU coreutils) 8.29
  • tested with numbers50.txt, a file containing the numbers 1 to 50, one per line

Regex-based pattern followed by a linenum-based pattern

Read up to the line matching /15/ but rewind 5 lines, and then up to the 12th line.

❯ csplit tests/fixtures/csplit/numbers50.txt /15/-5 12
18
6
117

❯ head xx*
==> xx00 <==
1
2
3
4
5
6
7
8
9

==> xx01 <==
10
11

==> xx02 <==
12
13
14
15
16
17
18
19
20
21

Two consecutive regex-based patterns

Read up to the line matching /15/ but rewind 5 lines, and then read up to the line matching /12/.

❯ csplit tests/fixtures/csplit/numbers50.txt /15/-5 /12/
18
csplit: ‘/12/’: match not found
123

Observation

I have the feeling that the second run should succeed and produce the same output as the first.

Looking at the GNU code, this seems to be because the two patterns do not start reading the input from the same line:

  • for the regex-based pattern, find_line starts the iteration at ++current_line
  • with the linenum-based pattern, the iteration seems to start at the "logical first line" (see the comment on remove_line).

It feels like the regex-based pattern should also start at the logical first line, which seems to take the previous pattern's offset into account.

Does that make sense?
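The two runs above can be reproduced with a small model that computes the byte count of each chunk (the numbers csplit prints). This is a sketch derived only from the observed outputs, not from GNU's source. Pattern and split_sizes are hypothetical names, a plain substring match stands in for the regex, and the handling of positive offsets and out-of-range errors is an assumption. The from_logical_first flag switches between the behaviour observed with GNU csplit 8.29 (a regex search resumes after the previous match) and the alternative proposed here (it resumes at the first line not yet written).

```rust
/// Hypothetical model of a csplit pattern; lines are 1-based.
#[derive(Clone, Copy)]
enum Pattern<'a> {
    /// `/pat/offset`, with a substring match standing in for the regex.
    Regex(&'a str, i64),
    /// A bare line number such as `12`.
    LineNum(usize),
}

/// Returns the byte size of each output chunk, newlines included.
/// `from_logical_first == false` mimics the observed GNU behaviour;
/// `true` models the proposed alternative.
fn split_sizes(
    lines: &[String],
    patterns: &[Pattern<'_>],
    from_logical_first: bool,
) -> Result<Vec<usize>, String> {
    let total = lines.len();
    // Bytes in lines first..=last; an empty range yields 0.
    let bytes = |first: usize, last: usize| -> usize {
        lines[first - 1..last].iter().map(|l| l.len() + 1).sum()
    };
    let mut sizes = Vec::new();
    let mut next_out = 1; // "logical first line": first line not yet written
    let mut current = 0; // last line examined by a regex search
    for p in patterns {
        // break_line is the first line of the next chunk.
        let break_line = match *p {
            Pattern::Regex(pat, offset) => {
                let start = if from_logical_first { next_out } else { current + 1 };
                let found = (start..=total).find(|&n| lines[n - 1].contains(pat));
                let m = found.ok_or_else(|| format!("'/{}/': match not found", pat))?;
                let b = m as i64 + offset;
                if b < next_out as i64 || b > total as i64 + 1 {
                    return Err(format!("'/{}/{:+}': line number out of range", pat, offset));
                }
                // Assumption: never search again behind lines already written.
                current = m.max(b as usize - 1);
                b as usize
            }
            Pattern::LineNum(n) => {
                if n < next_out || n > total {
                    return Err(format!("'{}': line number out of range", n));
                }
                current = current.max(n - 1);
                n
            }
        };
        sizes.push(bytes(next_out, break_line - 1));
        next_out = break_line;
    }
    // Whatever is left goes into the last chunk.
    sizes.push(bytes(next_out, total));
    Ok(sizes)
}

fn main() {
    // Same content as numbers50.txt: the numbers 1 to 50, one per line.
    let lines: Vec<String> = (1..=50).map(|n: u32| n.to_string()).collect();
    let run1 = [Pattern::Regex("15", -5), Pattern::LineNum(12)];
    let run2 = [Pattern::Regex("15", -5), Pattern::Regex("12", 0)];
    println!("{:?}", split_sizes(&lines, &run1, false)); // Ok([18, 6, 117])
    println!("{:?}", split_sizes(&lines, &run2, false)); // Err("'/12/': match not found")
    println!("{:?}", split_sizes(&lines, &run2, true)); // Ok([18, 6, 117])
}
```

With the flag off, the second run fails the same way GNU's did, because the search for /12/ starts at line 16; with it on, both runs yield 18, 6 and 117 bytes.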

@Arcterus
Collaborator

Arcterus commented Jul 4, 2018

I would suggest following the GNU behavior, as we are trying to be drop-in compatible. On a side note, you should not be looking at the GNU code, since doing so can cause problems with respect to the GPL and our more permissive license.

@sylvestre
Sponsor Contributor

Done here:
#1672
