Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for yaml arrays? #77

Open
MarkEdmondson1234 opened this issue Nov 9, 2019 · 16 comments
Open

Support for yaml arrays? #77

MarkEdmondson1234 opened this issue Nov 9, 2019 · 16 comments
Labels

Comments

@MarkEdmondson1234
Copy link

Hello, I'm trying to make an R implementation that would output this yaml:

steps:
- name: 'gcr.io/cloud-builders/docker'
  args: ['build', '-t', 'gcr.io/[PROJECT_ID]/[IMAGE]', '.']

But I can't find some R object that supports the ['blah','blah'] bit. It keeps putting it as individual entries. It doesn't seem to be reversible in import and output either

e.g.

yaml.load("steps:
+ - name: 'gcr.io/cloud-builders/docker'
+   args: ['build', '-t', 'gcr.io/[PROJECT_ID]/[IMAGE]', '.']")
#$steps
#$steps[[1]]
#$steps[[1]]$name
#[1] "gcr.io/cloud-builders/docker"
#
#$steps[[1]]$args
#[1] "build"                       "-t"                          "gcr.io/[PROJECT_ID]/[IMAGE]"
#[4] "."                          

yaml.load("steps:
+ - name: 'gcr.io/cloud-builders/docker'
+   args: ['build', '-t', 'gcr.io/[PROJECT_ID]/[IMAGE]', '.']") -> foo
as.yaml(foo)

...gives:

steps:
- name: gcr.io/cloud-builders/docker
  args:
  - build
  - -t
  - gcr.io/ddd
  - '.'

Is the desired output possible?

@MarkEdmondson1234
Copy link
Author

Ok, its equivalent. https://stackoverflow.com/questions/23657086/yaml-multi-line-arrays

So it will work, its only readability to favour the former.

@viking
Copy link
Contributor

viking commented Nov 11, 2019

It appears that what you're asking for is a way to tell the as.yaml function to print bracket-style sequences (AKA arrays, but 'sequence' is the YAML name for them). As you've discovered, multi-line arrays/sequences are perfectly valid in YAML. The as.yaml function does not currently have a way to output sequences in the bracket style, but the yaml.load function will read either format.

@viking viking added the feature label Nov 11, 2019
@malcolmbarrett
Copy link

malcolmbarrett commented Nov 11, 2019

Despite their technical equivalence, I would like this feature a lot, too. I think the bracket style is intuitive for R users because it sort of looks like c()

@mister-frostee
Copy link

I would like this feature too! I didn't know this wasn't supported and ended up getting several of my yaml files messed up.

@danbartl
Copy link

Hi would love this feature too, for a big YAML it becomes much much more human readable in array format

@perlpunk
Copy link

perlpunk commented Feb 15, 2022

Some clarification regarding terminology:
YAML basically has two styles: block and flow style. (also see https://www.yaml.info/learn/flowstyle.html)

x:
- block
- style
- sequence
y: [flow, style, sequence]

When emitting, libyaml allows to set the style of the sequence_start event, look for YAML_FLOW_SEQUENCE_STYLE.
Of course, when loading data to a native data structure and then dumping it as YAML again, meta information like style gets lost, unless it is saved in the data structure.
I don't know anything about how r-yaml loads the data, so I can't comment about that.

PyYAML has an option default_flow_style. It's False by default. Everything will be output in block style. When set to None, only the sequences and mappings which only contain scalars will be printed in flow style, and if set to True, everything will be printed in flow style.
I think that's a good compromise when there is no possibility to save the original style information.

spgarbet added a commit that referenced this issue Feb 22, 2022
spgarbet added a commit that referenced this issue Feb 22, 2022
@spgarbet
Copy link
Member

Current branch passes val-grind tests.

@spgarbet
Copy link
Member

Passes tests in data.table as well.


garbetsp@Hubble:~/Projects/cran/data.table$ make test
R -e 'require(data.table); test.data.table()'

R version 4.1.2 (2021-11-01) -- "Bird Hippie"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> require(data.table); test.data.table()
Loading required package: data.table
getDTthreads(verbose=TRUE):
  OpenMP version (_OPENMP)       201511
  omp_get_num_procs()            8
  R_DATATABLE_NUM_PROCS_PERCENT  unset (default 50)
  R_DATATABLE_NUM_THREADS        unset
  R_DATATABLE_THROTTLE           unset (default 1024)
  omp_get_thread_limit()         2147483647
  omp_get_max_threads()          8
  OMP_THREAD_LIMIT               unset
  OMP_NUM_THREADS                unset
  RestoreAfterFork               true
  data.table is using 4 threads with throttle==1024. See ?setDTthreads.
test.data.table() running: /usr/local/lib/R/site-library/data.table/tests/tests.Rraw.bz2 

Tue Feb 22 16:22:29 2022  endian==little, sizeof(long double)==16, longdouble.digits==64, sizeof(pointer)==8, TZ==unset, Sys.timezone()=='America/Chicago', Sys.getlocale()=='LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C', l10n_info()=='MBCS=TRUE; UTF-8=TRUE; Latin-1=FALSE; codeset=UTF-8', getDTthreads()=='OpenMP version (_OPENMP)==201511; omp_get_num_procs()==8; R_DATATABLE_NUM_PROCS_PERCENT==unset (default 50); R_DATATABLE_NUM_THREADS==unset; R_DATATABLE_THROTTLE==unset (default 1024); omp_get_thread_limit()==2147483647; omp_get_max_threads()==8; OMP_THREAD_LIMIT==unset; OMP_NUM_THREADS==unset; RestoreAfterFork==true; data.table is using 4 threads with throttle==1024. See ?setDTthreads.', zlibVersion()==1.2.11 ZLIB_VERSION==1.2.11
10 longest running tests took 17s (41% of 41s)
      ID  time nTest
 1: 2155 3.183     5
 2: 1438 2.355   738
 3: 1888 1.993     9
 4: 1648 1.654    91
 5: 1848 1.618     2
 6: 1652 1.564    91
 7: 1650 1.531    91
 8: 1437 1.162    36
 9: 1912 1.159     2
10: 1644 1.042    91
All 10038 tests (last 2163) in tests/tests.Rraw.bz2 completed ok in 41.3s elapsed (51.8s cpu)

@perlpunk
Copy link

perlpunk commented Feb 23, 2022

I was curious and looked into 276825d and only see sequence_style mentioned, but also mappings can be in flow style, so they should also be affected.
e.g. a flow mapping in a flow sequence: [ { key: value } ] or the other way round { key: [values] }
So a parameter sequence_style in emit_object sounds like flow style mappings are missing.

@spgarbet
Copy link
Member

Maybe two parameters, default_seq_flow and default_map_flow?

@perlpunk
Copy link

Maybe two parameters, default_seq_flow and default_map_flow?

I don't think that is necessary.
What I mean is that you don't pass the parameter to the yaml_mapping_start_event_initialize, only to the yaml_sequence_start_event_initialize.

@spgarbet
Copy link
Member

I thought that's what this patch does, I don't see it in the changeset getting passed to yaml_mapping_start_event_initialize. What I was suggesting is to allow independent control of both.

@perlpunk
Copy link

I thought that's what this patch does, I don't see it in the changeset getting passed to yaml_mapping_start_event_initialize.

And I suggested that it should, but with a different name, not "sequence" in it.

If you want to do something completely different than pyyaml, go ahead :)
But maybe ask users first what they want.

@perlpunk
Copy link

What I was suggesting is to allow independent control of both.

It doesn't make sense to emit all sequences in flow and all mappings in block style. everything under flow style nodes must be flow style itself.

@spgarbet
Copy link
Member

:lol: I never try to make sense of user style requests...

@spgarbet
Copy link
Member

spgarbet commented Feb 23, 2022

This shows what you're talking about https://gist.github.com/perlpunk/377c3a537df861a7736fd3a1b9aec04f

The proposal is for the parameter default_flow_style to control both sequence and map emitting:

  • FALSE => (default) block mode, passes YAML_ANY_SEQUENCE_STYLE and YAML_ANY_MAPPING_STYLE (current behavior)
  • TRUE => flow mode, passes YAML_FLOW_SEQUENCE_STYLE and YAML_FLOW_MAPPING_STYLE
  • NA =>
    • Leaf Nodes: YAML_FLOW_SEQUENCE_STYLE and YAML_FLOW_MAPPING_STYLE for leaf nodes
    • Branch Nodes: YAML_BLOCK_SEQUENCE_STYLE and YAML_BLOCK_MAPPING_STYLE

@MarkEdmondson1234 Can you foresee any reason one would want this controlled separately, i.e. handle lists/maps one way and arrays a different way?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

7 participants