Skip to content

Conversation

@karajan1001
Copy link
Contributor

@karajan1001 karajan1001 commented Oct 31, 2021

fix #6892

use table for the machine list outputs

Now the outputs looks like

$  dvc machine list
┏━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ name  ┃ cloud ┃ spot_price ┃ instance_hdd_size ┃ instance_type ┃ ssh_private ┃ instance_gpu ┃
┡━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ new   │ aws   │            │ 20                │               │ ***         │              │
│ myaws │ aws   │ 100.00000  │ 10                │ l             │             │ k80          │
└───────┴───────┴────────────┴───────────────────┴───────────────┴─────────────┴──────────────┘

$ dvc machine status foo
┏━━━━━━┳━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ name ┃ instance ┃ status  ┃ cloud ┃ instance_hdd_size ┃ instance_ip     ┃
┡━━━━━━╇━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ foo  │ num_1    │ running │ aws   │ 35                │ 123.123.123.123 │
└──────┴──────────┴─────────┴───────┴───────────────────┴─────────────────┘

Thank you for the contribution - we'll try to review it as soon as possible. 🙏

@karajan1001 karajan1001 requested a review from a team as a code owner October 31, 2021 14:19
@karajan1001 karajan1001 force-pushed the fix6892 branch 3 times, most recently from 338ab28 to 21e8e6e Compare November 1, 2021 07:10
@karajan1001 karajan1001 changed the title Machine: use table for machine list and status outputs [WIP] Machine: use table for machine list and status outputs Nov 1, 2021
@karajan1001
Copy link
Contributor Author

Wait for #6855 and #6869

@dberenbaum
Copy link
Contributor

Nice! Two questions:

  1. Can you drop the cell borders? Unlike dvc exp show, I think the output is simple enough, and it will be easier to parse the output with other commands.
  2. For everyone, what columns do you think we should include? For machine list, there are potentially a lot of fields (see https://github.com/iterative/terraform-provider-iterative#argument-reference), not all of which will always be relevant. For machine status, machine status: include two new field in the output #6869 will add even more fields, but these are all duplicative of machine list (and cloud and instance_hdd_size are already duplicating machine list).

@karajan1001
Copy link
Contributor Author

  • Can you drop the cell borders? Unlike dvc exp show, I think the output is simple enough, and it will be easier to parse the output with other commands.

I guess machine reads requires --csv or --json format, dropping borders will still face a different number of blanks.

@dberenbaum
Copy link
Contributor

I guess machine reads requires --csv or --json format, dropping borders will still face a different number of blanks.

Sorry, I gave dvc remote list as an example, but I guess that's not aligned. I meant more like dvc metrics show/diff. One of our UI principles is to make output both machine and human readable by default where possible. Maybe @skshetry has thoughts on how this should look.

This also will different depending on the number of columns. If we follow the lead of dvc remote list, dvc machine list should only have the name and cloud. However, maybe this is different, since people could have multiple machine configs using the same cloud provider. I'd probably include at least image, but maybe not everything (ssh_private is probably not worth showing since it won't be visible anyway). Again, interested to hear from you or others in @iterative/dvc about what to include exclude in both machine list and machine status.

@karajan1001
Copy link
Contributor Author

karajan1001 commented Nov 4, 2021

Another example is the docker ps and docker images which show in a no border table format.

$ docker images
REPOSITORY                                TAG       IMAGE ID       CREATED         SIZE
ghcr.io/linuxserver/openssh-server        latest    dfcbb139af6d   4 weeks ago     46MB
motoserver/moto                           latest    13087ad4e36c   4 weeks ago     393MB
mcr.microsoft.com/azure-storage/azurite   3.14.2    d54d388ab8a4   8 weeks ago     199MB
mcr.microsoft.com/azure-storage/azurite   3.13.1    bfa88c85ec6b   4 months ago    179MB
rkuprieiev/docker-hdfs                    latest    77e89e3a81a4   13 months ago   1.73GB
rkuprieiev/oss-emulator                   latest    93d6d0f4a3de   16 months ago   843MB

@dberenbaum
Copy link
Contributor

Right, dvc exp show is a special case since we have ascii art all over that makes it already not machine readable. Otherwise, the docker-like output is preferable IMO.

@daavoo
Copy link
Contributor

daavoo commented Nov 4, 2021

I guess machine reads requires --csv or --json format, dropping borders will still face a different number of blanks.

Sorry, I gave dvc remote list as an example, but I guess that's not aligned. I meant more like dvc metrics show/diff. One of our UI principles is to make output both machine and human readable by default where possible. Maybe @skshetry has thoughts on how this should look.

I agree with @karajan1001 about requiring --json . I don't know if any of our tables is actually machine-readable; In the docs we describe --json as: prints the command's output in JSON format (machine-readable) instead of a human-readable table

This also will different depending on the number of columns. If we follow the lead of dvc remote list, dvc machine list should only have the name and cloud. However, maybe this is different, since people could have multiple machine configs using the same cloud provider. I'd probably include at least image, but maybe not everything (ssh_private is probably not worth showing since it won't be visible anyway). Again, interested to hear from you or others in @iterative/dvc about what to include exclude in both machine list and machine status.

I find all current fields, shown in the description, pretty much useful (except for private fields). I can even have two machines with all same fields with the except instance_gpu and I would like to be able to distinguish. There is always the option of adding --only-changed or other stuff to filter.

@dberenbaum
Copy link
Contributor

Machine readable was the wrong term. It's about being able to grep and otherwise parse with unix-like tools. See #6046 and https://clig.dev/#output. A lot of other commands have outputs that were designed before we had these guidelines. Is there a reason not to follow this guideline here?

@karajan1001
Copy link
Contributor Author

@Mergifyio rebase

@mergify
Copy link
Contributor

mergify bot commented Nov 15, 2021

rebase

❌ Base branch update has failed

Git reported the following error:

Rebasing (1/4)
error: could not apply 8ad4b8aa... TabularData: add a new how `arg` to dropna(#6892)
Resolve all conflicts manually, mark them as resolved with
"git add/rm <conflicted_files>", then run "git rebase --continue".
You can instead skip this commit: run "git rebase --skip".
To abort and get back to the state before "git rebase", run "git rebase --abort".
Could not apply 8ad4b8aa... TabularData: add a new how `arg` to dropna(#6892)
Auto-merging tests/unit/test_tabular_data.py
Auto-merging dvc/compare.py
CONFLICT (content): Merge conflict in dvc/compare.py

err-code: 3B142

@karajan1001 karajan1001 force-pushed the fix6892 branch 2 times, most recently from fa49ffe to 4b9a894 Compare November 15, 2021 09:23
@pmrowla
Copy link
Contributor

pmrowla commented Nov 17, 2021

The new table format looks fine to me.

I think it probably looks a bit cleaner without the table borders (and makes it more grep-able), but I don't have a strong opinion here either way.

@karajan1001 karajan1001 changed the title [WIP] Machine: use table for machine list and status outputs Machine: use table for machine list and status outputs Nov 17, 2021
Borrowed this idea from pandas api. Current dropna methods will drop a
column or row if any data is missing. But it is also quite useful to
drop those columns or rows only if all of its elements are missing. This
is mostly used for pruning a table.
fix: treeverse#6892

Current output of machine list is quite urgly, it is originaly used to
show configurations. In this PR, we use a new table format for it.
Because our machine name now is the key value of the machine config
dict. And the machine config value will been overwritten by it when the
config is loaded. We can remove the field `name` from the config.
Change the output format of `dvc machine status` command to tables for a
better readability.
@karajan1001
Copy link
Contributor Author

@dberenbaum
Current format

$ dvc machine list
name    cloud    spot_price    instance_hdd_size    instance_type    ssh_private    instance_gpu
new     aws      -             20                   -                ***            -
myaws   aws      100.00000     10                   l                -              k80

$ dvc machine status
name    instance    status    cloud    instance_ip      instance_type    instance_hdd_size    instance_gpu
bar     -           offline   -        -                -                -                    -
foo     num_1       running   aws      123.123.123.123  m                35                   None

@dberenbaum
Copy link
Contributor

Looks great, thanks @karajan1001!

@karajan1001 karajan1001 merged commit 068b837 into treeverse:master Nov 18, 2021
@karajan1001 karajan1001 deleted the fix6892 branch November 18, 2021 03:22
from dvc.compare import TabularData
from dvc.config import ConfigError
from dvc.exceptions import DvcException
from dvc.types import Dict, List
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should come from typing module.



CellT = Union[str, "RichText"] # RichText is mostly compatible with str
CellT = Union[str, "RichText", None] # RichText is mostly compatible with str
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't extend the types here. Table rendering does not support None for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then what should I set here ?
https://github.com/iterative/dvc/blob/eac030598cdbfaba4c029f8e1131614916a6c707/dvc/command/machine.py#L253
It looks like an empty string didn't work in this case.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use a fill value "-" or just empty string "".

Copy link
Contributor Author

@karajan1001 karajan1001 Nov 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fill value is a bit too hard code I think, and empty string "" it was just because it gives empty result instead of a fill value made me using None in this case. And in

https://github.com/iterative/dvc/blob/230eb7087df7f063ded7422af7ae45bd04eb794a/dvc/compare.py#L31-L32

Only a None value would be filled with FILL_VALUE

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@skshetry How about adding a new type InputCellT which can accept None value for those input functions like append, extend or row_from_dict?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@karajan1001, we'll need to fix that in #7167. Meanwhile, this may introduce unnecessary bugs, as rendering with rich table is not capable of supporting None, could you please roll back the type of CellT, and make adjustments, please?

Copy link
Contributor Author

@karajan1001 karajan1001 Jan 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@karajan1001, we'll need to fix that in #7167. Meanwhile, this may introduce unnecessary bugs, as rendering with rich table is not capable of supporting None, could you please roll back the type of CellT, and make adjustments, please?

No problem.

@efiop efiop added the enhancement Enhances DVC label Nov 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement Enhances DVC

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Machine: use table for machine list and status outputs

6 participants