feature request: match columns by name #47

Closed
bergey opened this issue Apr 18, 2022 · 5 comments

@bergey

bergey commented Apr 18, 2022

I'd like choose to take an option under which it treats the first row as headers and allows selecting columns by name. This saves counting, and is especially nice in scripts, where it makes clearer what is going on and is somewhat more resilient to changes in the input format. I'm not sure whether matching should be exact, by prefix, or case-insensitive.

I'm happy to open a PR if you like this idea.

@theryangeary
Owner

Hi @bergey, this is an interesting idea. It should be pretty easy to implement by updating the code that runs at the beginning of the program to create the choice (https://github.com/theryangeary/choose/blob/master/src/choice/mod.rs#L13) so that it is based on the first line of the file.

I think to flesh out the design/use case and home in on what would be good re: exact-match, prefix, case sensitivity, it would be great to look at some example use cases. Do you have any on hand already? It would also be great to look at existing tools that produce similar header row lines, although the output of existing tools will probably be stable enough that exact-match would be good enough for those use cases.
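
To make the three matching modes under discussion concrete, here is a minimal Python sketch. It is not code from choose (which is written in Rust); the function name `match_header` and the mode names are hypothetical, chosen only for illustration. It uses the `kubectl get pods` header to show that prefix matching can hit more than one column.

```python
def match_header(headers, selector, mode="exact"):
    """Return the indices of all headers matched by `selector`.

    `mode` is a hypothetical parameter covering the three options
    raised in this thread: exact, prefix, and case-insensitive.
    """
    if mode == "exact":
        return [i for i, h in enumerate(headers) if h == selector]
    if mode == "prefix":
        return [i for i, h in enumerate(headers) if h.startswith(selector)]
    if mode == "case-insensitive":
        return [i for i, h in enumerate(headers) if h.lower() == selector.lower()]
    raise ValueError(f"unknown mode: {mode}")


headers = "NAME READY STATUS RESTARTS AGE".split()

print(match_header(headers, "STATUS"))                        # [2]
print(match_header(headers, "RE", mode="prefix"))             # [1, 3]
print(match_header(headers, "age", mode="case-insensitive"))  # [4]
```

The prefix example matches both READY and RESTARTS, so any non-exact mode would also need a rule for ambiguous selectors.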

excited to hear more thoughts on this!

@bergey
Author

bergey commented Apr 19, 2022

Here is output from some commands with which I envision using this feature. (Header & first line of data only).

$ kubectl get pods
NAME                                                     READY   STATUS      RESTARTS   AGE
add-postgres-logical-backup-retention-1650155400-c5t6l   0/1     Completed   0          2d11h

$ docker images
REPOSITORY                                            TAG                                                     IMAGE ID       CREATED         SIZE
simspace/elide-server                                 <none>                                                  8d1ab3fcfc99   2 weeks ago     364MB

# vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 32362660  27724 286516    0    0   253   121  125  161  1  1 97  0  0

# pidstat
Linux 5.10.0-13-cloud-amd64 (rusty) 	04/19/22 	_x86_64_	(8 CPU)

11:37:18      UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
11:37:18        0         1    1.09    1.37    0.00    0.04    2.46     1  systemd
  • The first two seem easy enough.
  • vmstat and pidstat print extra rows before the header row that I would want to match against. I think I'd rather pipe through something else to remove those lines than complicate choose.
  • pidstat makes selecting the first column by name hard, since the timestamp will change every invocation. Maybe we can support mixing positional and by-name selectors?
  • docker has one header here that has a space, and spaces in the CREATED column. I don't think we should try to handle that. In practice, name matching would work for columns up to IMAGE, just like with numeric selectors.
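
The docker case can be illustrated with a short Python sketch. It assumes fields are split on runs of whitespace, which is a simplification and not a description of choose's actual implementation; `by_name` is a hypothetical helper, and the column spacing in the sample lines is shortened.

```python
def by_name(header_line, data_line, name):
    """Look up `name` in the header row and return the field at that index."""
    headers = header_line.split()   # split on runs of whitespace
    fields = data_line.split()
    return fields[headers.index(name)]


header = "REPOSITORY             TAG     IMAGE ID       CREATED       SIZE"
row = "simspace/elide-server  <none>  8d1ab3fcfc99   2 weeks ago   364MB"

print(header.split())
# ['REPOSITORY', 'TAG', 'IMAGE', 'ID', 'CREATED', 'SIZE']

print(by_name(header, row, "TAG"))    # <none>
print(by_name(header, row, "IMAGE"))  # 8d1ab3fcfc99
print(by_name(header, row, "SIZE"))   # ago  (misaligned; the real size is 364MB)
```

Columns up to IMAGE line up, but the two-word IMAGE ID header shifts every later name by one position, and the spaces inside the CREATED value shift the data as well.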

I'm thinking exact-match is the right place to start. The other options I mentioned would save a little typing in some circumstances, but I'm not sure if they'd be worth the added learning curve. I imagine we can add more flexible matching later if we wish for it once we have the exact-match version. So something like:

kubectl get pods | choose NAME AGE

Or should we hide this new behavior behind a flag?

kubectl get pods | choose -N NAME AGE

What do you think about mixing numeric & text selectors?

pidstat | tail -n+3 | choose 0 PID %CPU Command

I worry that it's ambiguous if some column has an all-digit header, but maybe that's rare enough in practice? Or maybe the 11:58:45 of pidstat is too close to the 0:3 range syntax of choose?
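
A minimal Python sketch of the mixed-selector idea, under stated assumptions: exact-match names, whitespace-separated fields, all-digit selectors treated as positions, no range syntax, and the header row dropped from the output. The names `resolve` and `choose_columns` are hypothetical and do not come from the choose codebase.

```python
def resolve(selector, headers):
    """Turn one selector into a column index.

    An all-digit selector is taken as a position; this is exactly where a
    column whose header is all digits would become ambiguous.
    """
    if selector.isdigit():
        return int(selector)
    return headers.index(selector)  # exact match; ValueError if absent


def choose_columns(lines, selectors):
    """Treat lines[0] as the header row and select columns from the rest."""
    headers = lines[0].split()
    indices = [resolve(s, headers) for s in selectors]
    return [[fields[i] for i in indices]
            for fields in (line.split() for line in lines[1:])]


# pidstat output as it would look after `tail -n+3`
lines = [
    "11:37:18      UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command",
    "11:37:18        0         1    1.09    1.37    0.00    0.04    2.46     1  systemd",
]

print(choose_columns(lines, ["0", "PID", "%CPU", "Command"]))
# [['11:37:18', '1', '2.46', 'systemd']]
```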

@theryangeary
Owner

this is a great start and good examples, thanks for finding them. to directly address some of the open-ended parts of your question:

> I think I'd rather pipe through something else to remove those lines rather than complicate choose

strong agree!

> docker has one header here that has a space, and spaces in the CREATED column.

this unfortunately could lead to a pretty bad user experience. in the situation where a two word header is at the beginning of the line, if we don't handle that somehow the entire line's headers will be useless, and even worse that will be pretty unintuitive to the user.

> Or should we hide this new behavior behind a flag?

I think that would be wise, and would allow support for numerical header fields without breaking backwards compatibility.

> What do you think about mixing numeric & text selectors?

I think that would be good. In order to mix and match while headers are behind a flag, I think each header string would need to be prefaced by the flag, i.e. that flag can be repeated many times. Otherwise you would have an issue where there is ambiguity between numbers and numerical headers, OR you would have to have all numerical choices come before all header choices (if you had one -N flag and all choices following it were headers, then you would have to put numerical choices before the -N flag).
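
The repeated-flag design could be sketched in Python as follows. This is an illustration only: `-N` is the flag name floated earlier in this thread, not an existing choose option, `parse_selectors` is a hypothetical helper, and choose's range syntax (such as 0:3) is left out for brevity.

```python
def parse_selectors(args):
    """Split command-line selectors into header names and positions.

    Every header name must be prefaced by its own -N; anything else is
    parsed as a numeric position.
    """
    selectors = []
    it = iter(args)
    for arg in it:
        if arg == "-N":
            name = next(it, None)   # the token after -N is the header name
            if name is None:
                raise ValueError("-N requires a header name")
            selectors.append(("name", name))
        else:
            selectors.append(("index", int(arg)))
    return selectors


print(parse_selectors(["0", "-N", "PID", "-N", "%CPU", "-N", "Command"]))
# [('index', 0), ('name', 'PID'), ('name', '%CPU'), ('name', 'Command')]

# A numeric header is unambiguous because the flag marks it explicitly.
print(parse_selectors(["-N", "2", "2"]))
# [('name', '2'), ('index', 2)]
```

Because each name carries its own flag, names and positions can appear in any order without the all-digit ambiguity.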

@darked89

@bergey
Selecting a set of columns by name is already available in another tool, xsv.
Check out xsv select.

Not that I have anything against implementing it in choose. ;)

Hope it helps,

Darek Kedra

@theryangeary
Owner

Thanks @darked89 for pointing that out. If xsv already implements that then I think it's best to leave it out of choose. Let's do one thing and do it well, and not reimplement other tools.
