String interpolation #82

Closed
yannham opened this issue Jun 2, 2020 · 8 comments · Fixed by #197

Comments

@yannham
Member

yannham commented Jun 2, 2020

A Nickel program ultimately exists to produce text in the form of configuration files. A fair amount of time is spent gathering and gluing together various pieces of strings in order to produce this configuration. Hence string interpolation, which makes text substitution easier, is a critical feature.

We propose to unashamedly reuse Nix string interpolation, which is well appreciated in the ecosystem and has proved effective for writing self-contained expressions that mix strings containing shell commands, shell scripts, text descriptions, URLs, and so on. The syntax using $ is common (Shell, JavaScript, Perl, Scala, Kotlin, Julia, etc.), while the required additional { avoids clashing with the simple shell interpolation $var when embedding scripts or commands inside strings.

In the following, we provide a concrete proposal, as a starting point for discussion.

Examples

"here is a simple string"

/* indented multi-line string */
...
    let script = ''
        cd ~/.vim/
        if [ -f "vimrc" ]; then
            mv vimrc vimrc.bk
        fi
        touch vimrc
        '' in foo
/* script = 
"cd ~/.vim/
if [ -f "vimrc" ]; then
    mv vimrc vimrc.bk
fi
touch vimrc" */

/* interpolation */
let gnu = "GNU" in "${gnu} = ${gnu}'s Not Unix"
/* "GNU = GNU's Not Unix" */

/* nested interpolation */
let cmdFlag = "foo" in
let optionsFlag = "-foo" in
let cmdDefault = "bar" in
let options = "-bar" in
"${if flag then "${cmdFlag} ${optionsFlag}" else "${cmdDefault}"} ${options} $out"
/* if flag is true, gives "foo -foo -bar $out"
 * otherwise, "bar -bar $out"
 */

Definition

We propose two ways of defining strings:

  • The usual, existing double quote delimited literals: "foo"
  • Additional double single quote delimited literals: ''foo''

The alternative delimiter '' is intended for enclosing strings that contain numerous ", such as shell commands, which would otherwise be tiresome to escape. For indented strings enclosed by double single quotes, as in the second example of the previous section, a number of spaces corresponding to the minimal indentation of the string as a whole is stripped from the beginning of each line. Also, the first line is removed if it is empty or contains only whitespace. This makes it possible to write multi-line strings at the same level of indentation as the surrounding code without polluting the final string with unnecessary leading spaces.
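
The stripping rule above can be sketched in a few lines of Python. This is only an illustration of the rule as worded here, not Nickel's implementation: the name strip_indent is made up, and the sketch assumes that whitespace-only lines do not count towards the minimal indentation and that only spaces (not tabs) are indentation.

```python
def strip_indent(text: str) -> str:
    """Illustrative sketch of the indentation rule; not Nickel's actual code."""
    lines = text.split("\n")
    # Drop the first line when it is empty or contains only spaces.
    if lines and lines[0].strip(" ") == "":
        lines = lines[1:]
    # Assumption: whitespace-only lines are ignored when computing the minimum.
    indents = [len(line) - len(line.lstrip(" ")) for line in lines if line.strip(" ")]
    min_indent = min(indents, default=0)
    # Remove min_indent leading characters from every line; lines shorter
    # than min_indent (whitespace-only ones) simply become empty.
    return "\n".join(line[min_indent:] for line in lines)


script = """
        cd ~/.vim/
        touch vimrc
        """
print(repr(strip_indent(script)))  # 'cd ~/.vim/\ntouch vimrc\n'
```

Note that in this sketch the whitespace-only line before the closing delimiter is kept as an empty line, so the result ends with a newline. Whether that trailing newline should be kept is a design decision the proposal would need to settle.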

Interpolation allows embedding a Nickel expression, which must evaluate to a string, inside another string using the ${expr} syntax. Such expressions can be arbitrary Nickel expressions and may themselves contain nested interpolated strings.

Syntax

str ::= "strprts" | ''strprts''
strprts ::= ε | strprts STR | strprts ${e}

where e is any Nickel expression, STR is a sequence of string characters (without the special sequence ${, unless escaped), ε denotes an empty string and spaces between tokens denote concatenation.
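
To make the grammar concrete, here is a rough Python sketch that splits the body of a double-quoted literal into STR and ${e} chunks. The function name split_chunks and the tuple representation are invented for illustration. The sketch assumes that the end of an interpolated expression can be found by counting braces, and it handles only the \${ escape (other backslash sequences are passed through untouched), so it is not a real lexer.

```python
def split_chunks(body: str):
    """Split the inside of a "..." literal into ('str', text) / ('expr', source) chunks."""
    chunks, buf, i = [], [], 0
    while i < len(body):
        if body.startswith("\\${", i):
            # Escaped interpolation sequence: keep a literal "${".
            buf.append("${")
            i += 3
        elif body.startswith("${", i):
            if buf:
                chunks.append(("str", "".join(buf)))
                buf = []
            # Assumption: braces inside the expression are balanced.
            depth, j = 1, i + 2
            while j < len(body) and depth > 0:
                if body[j] == "{":
                    depth += 1
                elif body[j] == "}":
                    depth -= 1
                j += 1
            if depth != 0:
                raise ValueError("unterminated ${")
            chunks.append(("expr", body[i + 2 : j - 1]))
            i = j
        else:
            buf.append(body[i])
            i += 1
    if buf:
        chunks.append(("str", "".join(buf)))
    return chunks


print(split_chunks("${gnu} = ${gnu}'s Not Unix"))
# [('expr', 'gnu'), ('str', ' = '), ('expr', 'gnu'), ('str', "'s Not Unix")]
```

Nested interpolation falls out of the brace counting: the whole if ... then ... else expression of the nested example ends up in a single 'expr' chunk, whose inner strings would be split again when that expression is parsed.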

Escaping

Inside ", \ is used as usual to represent special sequences ("\n", "\t", and so on), to escape itself, or to escape the interpolation sequence ${. Inside '', on the other hand, \ alone has no special meaning and must be prefixed with '' to recover it. For example, a string containing a single tab is written ''''\t'' (the opening '', then ''\t, then the closing '').

Sequence | Escape in '' | Escape in "
${       | ''${         | \${
'        | '''          | '
\        | \            | \\

Semantics

Source       | Evaluation                     | Condition
ε            |                                |
strprts STR  | (eval strprts) STR             |
strprts ${e} | (eval strprts) (eval e)        | isStr e
"strprts"    | "eval(strprts)"                |
''strprts''  | ''stripIndent (eval strprts)'' |

where stripIndent implements the behavior described above about smart indentation.
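
As a rough illustration of these rules, the Python sketch below evaluates a list of chunks left to right and enforces the isStr condition on each interpolated expression. The names Lit, Expr, render and command are hypothetical, and a Python callable stands in for a Nickel expression e.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class Lit:
    text: str  # a STR chunk


@dataclass
class Expr:
    thunk: Callable[[], object]  # stands in for an interpolated expression e


def render(chunks: List[Union[Lit, Expr]]) -> str:
    out = ""
    for chunk in chunks:
        if isinstance(chunk, Lit):
            out += chunk.text  # strprts STR
        else:
            value = chunk.thunk()  # eval e
            if not isinstance(value, str):  # the isStr condition
                raise TypeError("interpolated expression must evaluate to a string")
            out += value  # strprts ${e}
    return out


def command(flag: bool) -> str:
    # Mirrors the nested interpolation example from the Examples section.
    inner = [Expr(lambda: "foo"), Lit(" "), Expr(lambda: "-foo")] if flag else [Expr(lambda: "bar")]
    return render([Expr(lambda: render(inner)), Lit(" "), Expr(lambda: "-bar"), Lit(" $out")])


print(command(True))   # foo -foo -bar $out
print(command(False))  # bar -bar $out
```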

Remarks

We may want to improve the behavior of Nix when substituting multi-line strings inside multi-line, indented strings (See #543 for Nix, or #200 for a similar discussion about Dhall).

@Profpatsch

Here’s the dhall standard for multiline strings, which improves on the nix multiline strings: https://github.com/dhall-lang/dhall-lang/blob/master/standard/multiline.md
I’d vote for using their improved version, to not create “another slightly different standard” for no reason.


We have to make sure that our strings are “8-bit clean”, meaning you can put arbitrary bytes in them via escape codes, otherwise people are severely restricted in what they can generate configuration for.
We should default to UTF-8/ASCII (in what encoding we parse and thus make expressible without requiring escape codes).


I’ve noticed that for generating code (in this case generating nickel code) it’s super useful to have raw strings in the target language, which don’t require any escaping.
There’s multiple ways of going about raw strings, but I guess that should go into a separate issue.

@thufschmitt
Contributor

> Here’s the dhall standard for multiline strings, which improves on the nix multiline strings

Care to explain how? (Not that I doubt it, but the standard doesn't explain this very clearly)

@Profpatsch

Profpatsch commented Jun 17, 2020 via email

@edolstra
Contributor

We should probably reconsider the use of '' as delimiters, since they're quite confusing. Alternatives:

  • Make regular string ("string") do the right thing for multiline literals.
  • Use some asymmetric delimiters (e.g. <<string>>).
  • C++11-style user-specified delimiters (e.g. R"foo(string)foo"). These are nice because you can pick a delimiter that doesn't clash with the embedded language.

@yannham
Member Author

yannham commented Jun 23, 2020

Iirc it pertains to when \n is added. I don’t remember the exact differences, but creating a table that compares examples would be a good first step.

After reading Dhall's standard, it seems to me the only difference is that in Dhall the last line, even if it contains only whitespace, is significant for determining the global indentation. That is,

    ''
    foo
    bar
''

desugars to "    foo\n    bar\n" in Dhall, while it gives "foo\nbar\n" in Nix.

Dhall also preserves leading whitespace when interpolating strings containing newlines inside multiline strings, as I mentioned in the issue, but this is not described in the linked document.
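
The difference in the example above can be reproduced with a small Python sketch. It models only that one point, namely whether a whitespace-only last line takes part in computing the common indentation, and nothing else from either language's actual rules. The names indent_of and strip_common_indent and the flag last_line_counts are made up for illustration.

```python
def indent_of(line: str) -> int:
    return len(line) - len(line.lstrip(" "))


def strip_common_indent(body: str, last_line_counts: bool) -> str:
    lines = body.split("\n")
    if lines and lines[0].strip(" ") == "":
        lines = lines[1:]  # drop the blank line right after the opening ''
    # Lines with visible content always take part in the minimum.
    counted = [line for line in lines if line.strip(" ")]
    # Dhall-like rule (as described in this thread): a whitespace-only
    # last line is counted as well.
    if last_line_counts and lines and not lines[-1].strip(" "):
        counted.append(lines[-1])
    n = min((indent_of(line) for line in counted), default=0)
    return "\n".join(line[n:] for line in lines)


# Body of the example: closing '' at column 0, so the last line is empty.
body = "\n    foo\n    bar\n"
print(repr(strip_common_indent(body, last_line_counts=False)))  # 'foo\nbar\n'
print(repr(strip_common_indent(body, last_line_counts=True)))   # '    foo\n    bar\n'
```

With the Dhall-like rule, the position of the closing delimiter lets the author choose how much indentation to keep, which is the practical consequence of the difference.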

@edolstra
Contributor

edolstra commented Jul 3, 2020

Something I didn't see mentioned is whether Nickel should support Nix-style string contexts, i.e. whether the value "${pkgs.bash}/bin/sh" has some metadata denoting that it depends on pkgs.bash. This is necessary to make string interpolation usable in Nix, but it might be too domain-specific for a general-purpose configuration language.
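
For readers unfamiliar with the idea, here is a minimal Python sketch of a string that carries such metadata. It is not how Nix implements string contexts: the class CtxStr, its deps field, and the path /path/to/bash are placeholders invented for this illustration. The point is only that concatenation (and hence interpolation) propagates the set of dependencies.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class CtxStr:
    text: str
    deps: FrozenSet[str] = frozenset()  # what this string depends on

    def __add__(self, other):
        if isinstance(other, str):
            other = CtxStr(other)  # plain strings carry no context
        if not isinstance(other, CtxStr):
            return NotImplemented
        # Concatenation unions the contexts of both sides.
        return CtxStr(self.text + other.text, self.deps | other.deps)

    def __radd__(self, other):
        if isinstance(other, str):
            return CtxStr(other) + self
        return NotImplemented


# Placeholder path; a real store path would come from the package itself.
bash = CtxStr("/path/to/bash", frozenset({"pkgs.bash"}))
sh = bash + "/bin/sh"
print(sh.text)          # /path/to/bash/bin/sh
print(sorted(sh.deps))  # ['pkgs.bash']
```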

@thufschmitt
Contributor

> it might be too domain-specific for a general-purpose configuration language.

String contexts seem to be something that could be tremendously useful for other use cases. At least in Bazel, Starlark doesn't have any equivalent of these, and it has bitten me many times because I was referring to some target in a script but had forgotten to add it to the dependencies.

It might be useful to have something even more dynamic/lazy though, as we could also want something like the following (in a totally hypothetical cloud deployment tool based on Nickel, imagined just for the sake of the argument, although the same might be true of something like Terraform too):

rec {
  my_first_instance = spawn_ec2_instance {...};
  my_other_instance = spawn_ec2_instance {
    system_config = { ... }: {
        networking.hosts.${my_first_instance.ip_address} = "my-first-instance";
        ...
    };
  };
}

In which case my_first_instance.ip_address would be a string with context whose value could only be determined dynamically. That would be a more generic and principled solution to NixOps's two-step deployment, as well as to the drv rewriting in content-addressed derivations.

@yannham
Member Author

yannham commented Sep 3, 2020

I'm reopening the discussion, as interpolated strings landed in #131, and multiline strings are next.

> Make regular string ("string") do the right thing for multiline literals.

@edolstra, do you mean that the usual double-quote-delimited strings would act like multiline strings in Nix, so that

"
  indented
  string
"

should be parsed as "indented\nstring"? In that case, what about " indented\n string": do we differentiate an escaped newline \n from an actual newline in the source?

> C++11-style user-specified delimiters (e.g. R"foo(string)foo"). These are nice because you can pick a delimiter that doesn't clash with the embedded language.

This is probably the better solution. The only concern is that it may be confusing with respect to other languages, where this kind of delimiter is used for raw strings, whereas we probably still want to allow interpolation in our multiline strings.
