Skip to content
This repository has been archived by the owner on Sep 24, 2022. It is now read-only.

Implement round-trippable, editable TOML parsing #64

Closed
wants to merge 10 commits into from

Conversation

vosen
Copy link
Contributor

@vosen vosen commented Jun 25, 2015

This is a very rough implementation of round-tripping TOML parser. It correctly round trips all the TOML test suite (except inline arrays).
It's very unsuitable for merging at the moment (broken formatting, random layout of modules, unimplemented inline arrays, no public API etc.). Since it's a very invasive change, that make internals more complicated (for no real gain if you just want to simply serialize and deserialize), my question is, should I work on getting it into toml-rs or move this work to a fork (with just API for load-edit-save of TOML documents, leaving attribute-based serializing/deserializing here)?

@alexcrichton
Copy link
Collaborator

Sorry but this isn't really quite in a state where I can review it (odd formatting, tests failing, lack of comments, etc). Could you elaborate a little more on what's going on here? Is the goal to preserve comments and whitespace that are otherwise found in a configuration file?

@vosen
Copy link
Contributor Author

vosen commented Jun 29, 2015

Yes, have a parser that preserves comments and whitespace and building API on top of this that allows changes to the parsed document without losing comments and whitespace.

@alexcrichton
Copy link
Collaborator

I think in general that it may be about time that toml-rs had a rewrite of the parser and internals, I've long worried about the number of allocations going on and I suspect it may only get worse over time. I think that retrofitting support (with a parallel AST) for managing comments and such may end up causing too much pain and it'd be better to rewrite the parser from the ground up to handle these cases.

@vosen
Copy link
Contributor Author

vosen commented Jul 17, 2015

I've cleaned up code, documented internal layout and fixed the failing test. It should be reviewable now.
Still not mergeable. Some error spans are wrong (the ones inside insert_array, insert_table, insert_exec* and related. They were wrong before anyway). The public, user-facing API for manipulating parsed table is still missing. I'm still not 100% sure how should it look. I hope to figure it out soon, writing something that actually uses it.
With regards to the number of allocations. I don't quite see how can they be reduced. Rewrite to SAX style?
And as for retrofitting. I don't quite follow, I think the changes here are very invasive (perhaps too invasive, that's why I asked if I should go for merging with toml-rs or forking).

@alexcrichton
Copy link
Collaborator

Yeah this is looking like a whole new library altogether almost, so perhaps it's best to be a standalone dependency? I'd love to clean up the parser here so both projects could share the implementation, however!

@flying-sheep
Copy link

disclaimer: i didn’t try to read and understand this, so i’m generically telling you my thoughts about the API:

i think the parse result should simply be a (possibly indexed?) AST containing whitespace and comment nodes.

the API should allow

  1. re-serializing the whole thing.

  2. extracting sub-ASTs

  3. insert sub-ASTs (which would only accept valid sub-ASTs, e.g. comma+whitespace+int-literal into an int array)

  4. convenience functions to insert data structures (vecs, maps, ints, strings, …)

    this would also validate (only homogeneous arrays allowed, dunno if there’s more), and try to fit the surrounding structure:

    • single quote strings if the surrounding ones are single-quoted and it is possible
    • use inline tables unless there are already subtables or the newly inserted datastructure is large enough

@vosen
Copy link
Contributor Author

vosen commented Feb 1, 2016

I'm closing this PR because I split it all to a separate library here. I didn't write the docs yet and edit operations are missing (insert/delete/set_value_at), but the read side is mostly done.
@flying-sheep I went for a different design. I started from the "API that is simple if you want to just read stuff, but also exposes control over commas/whitespace/ordering etc. if you really want it" and with an eye towards modifying Cargo.toml from Visual Rust, which means small edits that preserve formatting. I didn't think too hard about things outside of this use case, for example reordering tables inside a toml file, which require moving quite a bit of AST around. And I don't plan for any convenience functions for 0.1 because I want this to be usable asap.

@vosen vosen closed this Feb 1, 2016
@flying-sheep
Copy link

ah nice! so to add a string value with the same quoting style, we have to recursively traverse the document and count strings with one or the other, then determine the predominate one, and then find the position to insert to and call string_value.set()?

@alexcrichton
Copy link
Collaborator

Awesome, thanks @vosen!

@vosen
Copy link
Contributor Author

vosen commented Feb 3, 2016

@flying-sheep Yes, the plan is for the two final steps is to look like this:

let mut string_value = doc.insert_string_child(54, "key", "value");
string_value.set_raw("'value'");

@flying-sheep
Copy link

is that duplication with the “value”?

@vosen
Copy link
Contributor Author

vosen commented Feb 3, 2016

No, notice the ' quotes. The idea is that you supply minimum required data (in case of string values insertion index, key and value), we insert new, properly escaped, formatted and enquoted value and return &mut to the value part in case you want to make a modification right now, which for string value is changing leading, trailing "whitespace" or replacing "raw", unescaped value.

@flying-sheep
Copy link

you store the raw syntax, and not the actual [u8] content and sth. like enum QuoteStyle { Single, Double }?

@vosen
Copy link
Contributor Author

vosen commented Feb 3, 2016

Yes, althought I was tempted to store syntax in a more granular fashion. Similarly to your example: splitting string value into enum QuoteStyle { Single(String), Double(String), Triple(String) }, but I couldn't find a good use case for it. Concretely, just changing value of QuoteStyle wouldn't be enough. You'd still need to convert enquoted string between all three representations: e.g. from "C:\\temp" to 'C:\temp'.

@flying-sheep
Copy link

enum QuoteStyle { Single(String), Double(String), Triple(String) }

this makes no sense. either you have only single and double (referring to the number of strokes is the quote character), or single and triple (referring to the number of quote characters). if anything, we should use the meaning. ' means “raw string” and """ means “multiline string”.

enum QuoteStyle {
    Raw(String),    RawMultiline(String), 
    Normal(String), NormalMultiline(String),
}

but I couldn't find a good use case for it

well, “raw” means that it contains all escapes and so on right? so replacing “raw” means that you have to sanitize the string contents to be a valid literal with the same meaning.

but i see why you designed it that way: there’s many many stylistic parts to representing things:

  1. where to put underscores in numeric literals?
  2. multiline strings or single line with \n?
  3. ...

still, i’d wish for a way to choose some of those when first inserting a value instead of basically inserting it twice. maybe:

enum QuoteStyle { Raw, Normal, Smart }
enum LineStyle { Single, Multi, Smart }
enum IndentStyle { None, Break(u32) }
//minimal would only escape e.g. newlines in non-multiline strings and `"` in normal strings
enum EscapeStyle { Minimal, OnlyInvisible,  }
struct StringStyle {
    quotes: QuoteStyle,
    lines: LineStyle,
    indent: IndentStyle,
    escape: EscapeStyle,
}

quotes: Smart means: use raw strings ('), except if the string contains stuff that should be escaped

lines: Smart means: use single line strings unless the string is long and has multiple line breaks, or quotes: Raw is chosen and the string contains '.

trying to serialize a string containing ' to a single-quoted raw string would fail.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants