feature request: match columns by name #47

bergey · 2022-04-18T12:47:36Z

I'd like choose to take an option under which it treats the first row as headers, and allows selecting columns by name. This saves counting, and is especially nice in scripts where it's more clear what is going on and somewhat more resilient to changes in the input format. Not sure if it should be exact-match, prefix, case-insensitive....

I'm happy to open a PR if you like this idea.


theryangeary · 2022-04-18T13:16:35Z

Hi @bergey, this is an interesting idea. Should be pretty easy to implement by updating the code that runs at the beginning of the program to create the choice: https://github.com/theryangeary/choose/blob/master/src/choice/mod.rs#L13 based on the first line of the file.

I think to flesh out the design/use case and hone in on what would be good re: exact-match, prefix, case sensitivity, it would be great to look at some example use cases. Do you have any on hand already? It would also be great to look at existing tools that produce similar header row lines, although existing tools will probably be stable enough that exact-match would be good enough for those use cases.

excited to hear more thoughts on this!

bergey · 2022-04-19T12:02:24Z

Here is output from some commands with which I envision using this feature. (Header & first line of data only).

$ kubectl get pods
NAME                                                     READY   STATUS      RESTARTS   AGE
add-postgres-logical-backup-retention-1650155400-c5t6l   0/1     Completed   0          2d11h

$ docker images
REPOSITORY                                            TAG                                                     IMAGE ID       CREATED         SIZE
simspace/elide-server                                 <none>                                                  8d1ab3fcfc99   2 weeks ago     364MB

# vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 32362660  27724 286516    0    0   253   121  125  161  1  1 97  0  0

# pidstat
Linux 5.10.0-13-cloud-amd64 (rusty) 	04/19/22 	_x86_64_	(8 CPU)

11:37:18      UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
11:37:18        0         1    1.09    1.37    0.00    0.04    2.46     1  systemd

The first two seem easy enough.
vmstat and pidstat print extra rows before the header that I would want to match. I think I'd rather pipe through something else to remove those lines rather than complicate choose
pidstat makes selecting the first column by name hard, since the timestamp will change every invocation. Maybe we can support mixing positional and by-name selectors?
docker has one header here that has a space, and spaces in the CREATED column. I don't think we should try to handle that. In practice, name matching would work for columns up to IMAGE, just like with numeric selectors.
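The docker case can be made concrete with a minimal sketch. It assumes that both the header line and the data line are split on runs of whitespace, which is a guess at how a by-name option might behave and not choose's actual code. `select_by_name` is a hypothetical helper, and the two lines are copied from the `docker images` output above with the spacing shortened.

```python
# Header and first data row from the `docker images` example (spacing shortened).
HEADER = "REPOSITORY   TAG      IMAGE ID       CREATED         SIZE"
ROW = "simspace/elide-server   <none>   8d1ab3fcfc99   2 weeks ago     364MB"


def select_by_name(header_line, row_line, name):
    """Hypothetical helper: exact-match a header token and return the
    data field at the same position (assumes whitespace splitting)."""
    names = header_line.split()
    fields = row_line.split()
    return fields[names.index(name)]


# Look up every header token to see where the alignment breaks.
results = {
    name: select_by_name(HEADER, ROW, name)
    for name in ["REPOSITORY", "TAG", "IMAGE", "ID", "CREATED", "SIZE"]
}
for name, value in results.items():
    print(f"{name:<10} -> {value}")
```

Under these assumptions REPOSITORY, TAG and IMAGE return the expected fields, but ID returns "2", CREATED returns "weeks" and SIZE returns "ago", because "IMAGE ID" adds one header token and "2 weeks ago" adds two data tokens.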

I'm thinking exact-match is the right place to start. The other options I mentioned would save a little typing in some circumstances, but I'm not sure if they'd be worth the added learning curve. I imagine we can add more flexible matching later if we wish for it once we have the exact-match version. So something like:

kubectl get pods | choose NAME AGE

Or should we hide this new behavior behind a flag?

kubectl get pods | choose -N NAME AGE

What do you think about mixing numeric & text selectors?

pidstat | tail -n+3 | choose 0 PID %CPU Command

I worry that it's ambiguous if some table has an all-digit header, but maybe that's rare enough in practice? Or maybe the 11:58:45 of pidstat is too close to the 0:3 range syntax of choose?
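One way to picture mixed numeric and text selectors is the sketch below. It assumes a simple rule that is not taken from choose: an all-digit selector is a zero-based position, and anything else must exactly match a header token. `resolve` is a hypothetical name, and choose's range syntax (such as 0:3) is left out for brevity.

```python
# Header and first data row from the pidstat example, after the
# two leading lines have been removed (e.g. with `tail -n+3`).
HEADER = "11:37:18      UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command"
ROW = "11:37:18        0         1    1.09    1.37    0.00    0.04    2.46     1  systemd"


def resolve(selectors, header_line):
    """Hypothetical resolver: map each selector to a column index.

    All-digit selectors are taken as positions; everything else is
    looked up by exact match in the header. A header that is itself
    all digits can therefore never be selected by name, which is the
    ambiguity raised above.
    """
    names = header_line.split()
    indices = []
    for sel in selectors:
        if sel.isdigit():
            indices.append(int(sel))
        else:
            # Raises ValueError if no header matches exactly.
            indices.append(names.index(sel))
    return indices


indices = resolve(["0", "PID", "%CPU", "Command"], HEADER)
fields = ROW.split()
selected = [fields[i] for i in indices]
print(indices)   # [0, 2, 7, 9]
print(selected)  # ['11:37:18', '1', '2.46', 'systemd']
```

Note that exact matching keeps %CPU (index 7) and CPU (index 8) distinct, which prefix matching would not.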

theryangeary · 2022-04-23T19:06:36Z

this is a great start and good examples, thanks for finding them. to directly address some of the open ended parts of your question:

> I think I'd rather pipe through something else to remove those lines rather than complicate choose

strong agree!

> docker has one header here that has a space, and spaces in the CREATED column.

this unfortunately could lead to a pretty bad user experience. in the situation where a two word header is at the beginning of the line, if we don't handle that somehow the entire line's headers will be useless, and even worse that will be pretty unintuitive to the user.

> Or should we hide this new behavior behind a flag?

I think that would be wise, and would allow support for numerical header fields without breaking backwards compatibility.

> What do you think about mixing numeric & text selectors?

I think that would be good. In order to mix and match while headers are behind a flag, I think each header string would need to be prefaced by the flag, i.e. that flag can be repeated many times. Otherwise you would have an issue where there is ambiguity between numbers and numerical headers, OR you would have to have all numerical choices come before all header choices (if you had one -N flag and all choices following it were headers, then you would have to put numerical choices before the -N flag).
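The repeated-flag idea can be sketched as a small argument parser. This is only an illustration of the proposal: `-N` is the flag suggested in this thread and not an existing choose option, `parse_selectors` is a hypothetical name, and range selectors are omitted.

```python
def parse_selectors(args):
    """Hypothetical parser: every header name must be preceded by its
    own -N, so bare arguments are always numeric positions."""
    selectors = []
    it = iter(args)
    for arg in it:
        if arg == "-N":
            # The argument right after -N is a header name, even if
            # it happens to consist only of digits.
            name = next(it, None)
            if name is None:
                raise ValueError("-N must be followed by a header name")
            selectors.append(("name", name))
        else:
            # Raises ValueError for anything that is not an integer.
            selectors.append(("index", int(arg)))
    return selectors


# Mixed selectors in any order, e.g. `choose 0 -N PID -N %CPU 3`.
mixed = parse_selectors(["0", "-N", "PID", "-N", "%CPU", "3"])
print(mixed)

# A numeric header stays unambiguous because the flag marks it as a name.
numeric_header = parse_selectors(["-N", "0", "0"])
print(numeric_header)
```

With this shape, numeric and named selectors can be interleaved freely, at the cost of typing the flag once per name.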

darked89 · 2022-05-23T11:44:41Z

@bergey
Selecting a set of columns by name is already available in another tool, xsv.
See the xsv select command.

Not that I have anything against implementing it in choose. ;)

Hope it helps,

Darek Kedra

theryangeary · 2022-07-11T11:39:57Z

Thanks @darked89 for pointing that out. If xsv already implements that then I think it's best to leave it out of choose. Let's do one thing and do it well, and not reimplement other tools.

theryangeary closed this as completed Jul 11, 2022
