
add 'rdf' (#804) #1010

Closed
wants to merge 3 commits into from

kevinushey
Contributor

Adds rdf, a small helper function for creating tbl_dfs inline, e.g.

rdf(
    a | b | 1 | 2,
    c | d | 3 | 4
)

Comments appreciated (probably the most pertinent question: what about column names?)

Other things to think about:

  • is rdf the best name?
  • should we provide a column-wise analog (i.e., like rdf but transposing the created frame)? Such a function could accept named arguments (for named columns)
  • allow for leading or trailing commas?
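A minimal sketch of how the row syntax above could be parsed, assuming each argument is one row whose cells are joined by |, bare symbols become strings, and literals keep their value. rdf_sketch() and flatten_row() are hypothetical names used for illustration, not the code in this PR; since the column-name question is still open, the sketch assigns placeholder names V1, V2, and so on.

```r
# Hypothetical sketch of the proposed rdf() helper (not the PR's code).
# Each argument is one row; cells are separated by `|`.

# Flatten a left-nested call such as `a | b | 1 | 2` into a list of cells.
flatten_row <- function(expr) {
  if (is.call(expr) && identical(expr[[1L]], as.name("|"))) {
    c(flatten_row(expr[[2L]]), flatten_row(expr[[3L]]))
  } else if (is.name(expr)) {
    list(as.character(expr))  # bare symbol -> character cell
  } else {
    list(eval(expr))          # literal such as 1, "x" or NA -> its value
  }
}

rdf_sketch <- function(...) {
  # Capture arguments unevaluated; otherwise `a | b` would be evaluated
  # as a logical "or" of (undefined) variables.
  rows <- lapply(as.list(substitute(list(...)))[-1L], flatten_row)
  n_col <- length(rows[[1L]])
  if (any(lengths(rows) != n_col)) {
    stop("every row must have the same number of cells")
  }
  # Column j gathers the j-th cell of each row; unlist() picks a common type.
  cols <- lapply(seq_len(n_col), function(j) unlist(lapply(rows, `[[`, j)))
  names(cols) <- paste0("V", seq_len(n_col))  # placeholder column names
  as.data.frame(cols, stringsAsFactors = FALSE)
}

rdf_sketch(
  a | b | 1 | 2,
  c | d | 3 | 4
)
```

Returning a base data.frame keeps the sketch dependency-free; the V1, V2 names only stand in until the header syntax is settled.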

@hadley
Member

hadley commented Mar 10, 2015

Maybe the first row should always be column names?

I definitely don't like rdf() as a function name.

Another approach that would be a bit easier to parse would be:

rdf(
    hdr(a, b, c, d),
    row(a, b, 1, 2),
    row(c, d, 3, 4)
)

Another approach would require a sentinel value at the end of each row:

rdf(
  a, b, c, d, END,
  a, b, 1, e, END,
  c, d, 3, 4
)

(But I don't think you could make that as elegant as the other two approaches)

@kevinushey
Contributor Author

I definitely prefer not having to add 'extra' syntax to make the table; we should keep it as 'markdown-like' as possible.

Perhaps a dummy 'delimiter', or 'delimiter-like' symbol, could be used, e.g. a series of dots:

rdf(
    these | are | names,
    ...................,
    these | are | cells,
)

I would much prefer making the 'parser' more complicated if it means keeping the user experience good and the language simple.

@hadley
Member

hadley commented Mar 10, 2015

I like the series of dots idea.

@kevinushey
Contributor Author

Just added some code to support the dots delimiter; missing arguments are now dropped as well. So we can now write e.g.

df <- rdf(
   , these | are | names
   , ...................
   , these | are | cells
 )

(note that symbols are implicitly converted to character)

Still need a nice name -- any thoughts? Maybe something like make_tbl() or verbatim_tbl()?
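The two changes described above, dropping empty arguments and treating a run of dots as the header separator, can happen before any cell is parsed. A minimal sketch, assuming exactly one header row, a ruler of four or more dots (three dots would be R's own ...), and distinct column names; split_at_ruler() is a hypothetical name, not the function in this PR.

```r
# Hypothetical helper: capture rdf()-style arguments without evaluating them,
# drop empty arguments, and split the rest at a ruler made only of dots.
split_at_ruler <- function(...) {
  args <- as.list(substitute(list(...)))[-1L]

  # A leading or trailing comma leaves an empty argument, captured as the
  # empty symbol (a name whose string form is "").
  is_empty <- vapply(seq_along(args), function(i) {
    is.name(args[[i]]) && !nzchar(as.character(args[[i]]))
  }, logical(1))
  args <- args[!is_empty]

  # The ruler is a bare symbol of four or more dots, e.g. ........
  is_ruler <- vapply(seq_along(args), function(i) {
    is.name(args[[i]]) && grepl("^\\.{4,}$", as.character(args[[i]]))
  }, logical(1))
  if (!identical(which(is_ruler), 2L)) {
    stop("expected one header row followed by a dots ruler")
  }

  list(
    # all.vars() returns the symbols of `these | are | names` in order but
    # drops repeats, so column names are assumed to be distinct.
    names = all.vars(args[[1L]]),
    rows  = args[-(1:2)]  # body rows, still unevaluated
  )
}

split_at_ruler(
  , these | are | names
  , ...................
  , these | are | cells
)
```

Each remaining row could then be split into cells the same way the pipe-separated rows are.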

@coveralls

Coverage Status

Coverage decreased (-0.77%) to 66.26% when pulling cbe46cd on kevinushey:feature/rdf into 2e298fc on hadley:master.

@coveralls

Coverage Status

Coverage decreased (-0.57%) to 66.46% when pulling 0b1c9eb on kevinushey:feature/rdf into 2e298fc on hadley:master.


@lionel-
Member

lionel- commented Mar 11, 2015

doodle() ?

@lionel-
Member

lionel- commented Apr 3, 2015

another idea for specifying column headers:

data <- doodle(
  ha : ho : hi,
   1 |  x | NA,
   3 | NA | NA,
)

@hadley
Member

hadley commented Aug 24, 2015

@jennybc I know this is something that you're interested in. Any thoughts on name, syntax, etc.?

@jennybc
Member

jennybc commented Aug 24, 2015

Here's a mash-up of the above ideas:

tibble(
  these, are, names,
  ---,
  these, are, values,
  these, are, values
)

Who doesn't like tibble()?!?

I am much faster at typing commas than | (tho the nod to markdown is not lost on me).

You could use the markdown-inspired "3 dashes" to signal the presence of variable names. A series of dots might lead to confusion with ...?

Maybe instead of a sentinel value to signal the end of one observation, just require variable names and get the row length from there? That might be less of a drag than repeatedly typing the sentinel.

@lionel-
Member

lionel- commented Aug 24, 2015

tibble() is an awesome name.

I like your --- idea. It could be 3 characters or more, so that a fully formatted table looks like:

tibble(
  these, are,  names,
  ------------------,
  these, are, values,
  these, are, values
)

@hadley
Member

hadley commented Aug 24, 2015

I love the name tibble, but I don't think - will work as a separator because a run of dashes on its own is not a valid R expression. Unless someone else has a really bright idea, I think we're stuck with .......

I like the idea of forcing a header row so that we can automatically determine how many columns there are; then the implicit newline can serve as the row separator.

@lionel-
Member

lionel- commented Aug 24, 2015

ah yes... and with the dots there's another issue: the function won't work with only three dots, since ... already has a special meaning in R.

the following is maybe a bit annoying to type:

tibble(
  these, are,  names,
  `----------------`,
  these, are, values,
  these, are, values
)

This would work but looks like a hack:

tibble(
  these, are,  names,
  -----------------.,
  these, are, values,
  these, are, values
)

@jennybc
Member

jennybc commented Aug 24, 2015

I definitely don't understand exactly what will work as the thing that separates variable names from the data, but I get the general drift.

That said, can the twiddle be pressed into service? It's easy to make a valid expression with it.
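Whether a separator candidate can work comes down to whether the whole call still parses as R, which base R's parse() can check without running anything. A small sketch; the helper parses() and the candidate strings are illustrative, and tibble() appears only inside the parsed text, so it does not need to exist.

```r
# TRUE if `code` is syntactically valid R, FALSE on a parse error.
parses <- function(code) {
  !inherits(try(parse(text = code), silent = TRUE), "try-error")
}

candidates <- c(
  "tibble(a, b, ---, 1, 2)",     # dashes alone: unary minus has no operand
  "tibble(a, b, `---`, 1, 2)",   # backquoted dashes: a (strange) symbol
  "tibble(a, b, ---., 1, 2)",    # minus signs applied to the symbol .
  "tibble(a, b, ~~~, 1, 2)",     # tildes alone: nothing on the right-hand side
  "tibble(a, b, ~~~x, 1, 2)",    # tildes followed by a symbol
  "tibble(a, b, ......., 1, 2)"  # a run of dots is an ordinary symbol
)
vapply(candidates, parses, logical(1))
```

Only the bare dashes and the bare tildes fail to parse. Parsing is not the whole story, though: a bare ... also parses, but inside a call it forwards arguments rather than acting as a separator, which is the three-dots problem raised above.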

@lionel-
Member

lionel- commented Aug 25, 2015

maybe the simplest thing:

tibble(
  'these', 'are', 'names',
   these,   are,   values,
   these,   are,   values
)

possibly with an optional ruler:

tibble(
  'these', 'are', 'names',
  '---------------------',
   these,   are,   values,
   these,   are,   values
)

@hadley
Member

hadley commented Aug 25, 2015

The problem with using an infix function (like ~) is that you need to have something on the right hand side of it (i.e. ~~~~~~~~~~x would work, but ~~~~~~~ would not). I think we need to use something that's a valid R variable name (i.e. it can start with a letter or ., and contain letters, digits, . and _). That doesn't leave many options, but we could do something like:

tibble(
  "col1", "col2",
  HEADER,
  1, 3,
  2, 6
)

I also don't think we want to drop the quotes for strings altogether, because that will confuse people who want to put spaces in their values. So maybe we could flip @lionel-'s idea around and do:

tibble(
  col1, col2,
  1, 2,
  3, 4
)

But then the header doesn't stand out much. Another idea would be to eliminate the NSE altogether and require formula quoting for column headers:

tibble(
  ~col1, ~col2,
  1, "a",
  3, "b"
)

I quite like that.

(I also just thought of another name: frame_data(), because it's a transposed data_frame().)
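The formula-header proposal needs only ordinary evaluation: ~col1 evaluates to a one-sided formula, so the leading formulas supply the column names, their count fixes the row width, and the remaining values are dealt out row by row. A minimal sketch under those assumptions; frame_data_sketch() is a hypothetical name, it returns a base data.frame, and it is not the implementation that went into #1358.

```r
# Hypothetical sketch of the `~col` header idea (standard evaluation only).
frame_data_sketch <- function(...) {
  args <- list(...)

  # Leading one-sided formulas (~name) are the header.
  is_header <- vapply(args, function(x) {
    inherits(x, "formula") && length(x) == 2L
  }, logical(1))
  n_col <- sum(cumprod(is_header))  # count only the leading run of headers
  if (n_col == 0) stop("start with at least one ~column header")
  col_names <- vapply(args[seq_len(n_col)], function(f) as.character(f[[2L]]),
                      character(1))

  # Everything after the header is data, filled in row by row.
  values <- args[-seq_len(n_col)]
  if (length(values) %% n_col != 0) {
    stop("number of values is not a multiple of the number of columns")
  }
  n_row <- length(values) %/% n_col

  # Column j takes every n_col-th value, starting at position j.
  cols <- lapply(seq_len(n_col), function(j) {
    unlist(values[seq(j, by = n_col, length.out = n_row)])
  })
  names(cols) <- col_names
  as.data.frame(cols, stringsAsFactors = FALSE)
}

frame_data_sketch(
  ~col1, ~col2,
  1, "a",
  3, "b"
)
```

Because headers are recognised by type rather than by a separator symbol, none of the parsing problems with dashes, dots, or bare tildes arise.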

@jennybc
Member

jennybc commented Aug 25, 2015

@hadley's last proposal looks really good to me:

tibble(
  ~col1, ~col2,
  1, "a",
  3, "b"
)

@lionel-
Member

lionel- commented Aug 25, 2015

and once again ~ saves the day

@hadley
Member

hadley commented Aug 25, 2015

@jennybc how do you feel about frame_data() as a name?

@kevinushey do you want to take a shot at implementing this? Or should I? I think it should be fairly simple now.

@jennybc
Member

jennybc commented Aug 26, 2015

I like frame_data() and think it's clever. I still like tibble() too: it brings back fond memories of the epic tibble diff discussion. You have a great track record with naming things, so I'm not bothered either way.

@kevinushey
Contributor Author

@hadley I'll give it a shot and update this PR.

@kevinushey
Contributor Author

New PR at #1358.

@kevinushey kevinushey closed this Aug 27, 2015
@lock

lock bot commented Jan 19, 2019

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock lock bot locked and limited conversation to collaborators Jan 19, 2019

5 participants