Implement round-trippable, editable TOML parsing #64

vosen · 2015-06-25T21:27:30Z

This is a very rough implementation of round-tripping TOML parser. It correctly round trips all the TOML test suite (except inline arrays).
It's very unsuitable for merging at the moment (broken formatting, random layout of modules, unimplemented inline arrays, no public API etc.). Since it's a very invasive change, that make internals more complicated (for no real gain if you just want to simply serialize and deserialize), my question is, should I work on getting it into toml-rs or move this work to a fork (with just API for load-edit-save of TOML documents, leaving attribute-based serializing/deserializing here)?

alexcrichton · 2015-06-29T17:55:30Z

Sorry but this isn't really quite in a state where I can review it (odd formatting, tests failing, lack of comments, etc). Could you elaborate a little more on what's going on here? Is the goal to preserve comments and whitespace that are otherwise found in a configuration file?

vosen · 2015-06-29T18:45:06Z

Yes, have a parser that preserves comments and whitespace and building API on top of this that allows changes to the parsed document without losing comments and whitespace.

alexcrichton · 2015-06-29T22:40:05Z

I think in general that it may be about time that toml-rs had a rewrite of the parser and internals, I've long worried about the number of allocations going on and I suspect it may only get worse over time. I think that retrofitting support (with a parallel AST) for managing comments and such may end up causing too much pain and it'd be better to rewrite the parser from the ground up to handle these cases.

vosen · 2015-07-17T07:39:12Z

I've cleaned up code, documented internal layout and fixed the failing test. It should be reviewable now.
Still not mergeable. Some error spans are wrong (the ones inside insert_array, insert_table, insert_exec* and related. They were wrong before anyway). The public, user-facing API for manipulating parsed table is still missing. I'm still not 100% sure how should it look. I hope to figure it out soon, writing something that actually uses it.
With regards to the number of allocations. I don't quite see how can they be reduced. Rewrite to SAX style?
And as for retrofitting. I don't quite follow, I think the changes here are very invasive (perhaps too invasive, that's why I asked if I should go for merging with toml-rs or forking).

alexcrichton · 2015-07-20T15:56:34Z

Yeah this is looking like a whole new library altogether almost, so perhaps it's best to be a standalone dependency? I'd love to clean up the parser here so both projects could share the implementation, however!

flying-sheep · 2015-12-09T13:23:13Z

disclaimer: i didn’t try to read and understand this, so i’m generically telling you my thoughts about the API:

i think the parse result should simply be a (possibly indexed?) AST containing whitespace and comment nodes.

the API should allow

re-serializing the whole thing.
extracting sub-ASTs
insert sub-ASTs (which would only accept valid sub-ASTs, e.g. comma+whitespace+int-literal into an int array)
convenience functions to insert data structures (vecs, maps, ints, strings, …)

this would also validate (only homogeneous arrays allowed, dunno if there’s more), and try to fit the surrounding structure:
- single quote strings if the surrounding ones are single-quoted and it is possible
- use inline tables unless there are already subtables or the newly inserted datastructure is large enough

vosen · 2016-02-01T23:04:01Z

I'm closing this PR because I split it all to a separate library here. I didn't write the docs yet and edit operations are missing (insert/delete/set_value_at), but the read side is mostly done.
@flying-sheep I went for a different design. I started from the "API that is simple if you want to just read stuff, but also exposes control over commas/whitespace/ordering etc. if you really want it" and with an eye towards modifying Cargo.toml from Visual Rust, which means small edits that preserve formatting. I didn't think too hard about things outside of this use case, for example reordering tables inside a toml file, which require moving quite a bit of AST around. And I don't plan for any convenience functions for 0.1 because I want this to be usable asap.

flying-sheep · 2016-02-02T08:44:27Z

ah nice! so to add a string value with the same quoting style, we have to recursively traverse the document and count strings with one or the other, then determine the predominate one, and then find the position to insert to and call string_value.set()?

alexcrichton · 2016-02-03T01:51:01Z

Awesome, thanks @vosen!

vosen · 2016-02-03T07:14:55Z

@flying-sheep Yes, the plan is for the two final steps is to look like this:

let mut string_value = doc.insert_string_child(54, "key", "value");
string_value.set_raw("'value'");

flying-sheep · 2016-02-03T08:35:39Z

is that duplication with the “value”?

vosen · 2016-02-03T20:23:26Z

No, notice the ' quotes. The idea is that you supply minimum required data (in case of string values insertion index, key and value), we insert new, properly escaped, formatted and enquoted value and return &mut to the value part in case you want to make a modification right now, which for string value is changing leading, trailing "whitespace" or replacing "raw", unescaped value.

flying-sheep · 2016-02-03T20:36:37Z

you store the raw syntax, and not the actual [u8] content and sth. like enum QuoteStyle { Single, Double }?

vosen · 2016-02-03T22:29:41Z

Yes, althought I was tempted to store syntax in a more granular fashion. Similarly to your example: splitting string value into enum QuoteStyle { Single(String), Double(String), Triple(String) }, but I couldn't find a good use case for it. Concretely, just changing value of QuoteStyle wouldn't be enough. You'd still need to convert enquoted string between all three representations: e.g. from "C:\\temp" to 'C:\temp'.

flying-sheep · 2016-02-04T08:58:22Z

enum QuoteStyle { Single(String), Double(String), Triple(String) }

this makes no sense. either you have only single and double (referring to the number of strokes is the quote character), or single and triple (referring to the number of quote characters). if anything, we should use the meaning. ' means “raw string” and """ means “multiline string”.

enum QuoteStyle {
    Raw(String),    RawMultiline(String), 
    Normal(String), NormalMultiline(String),
}

but I couldn't find a good use case for it

well, “raw” means that it contains all escapes and so on right? so replacing “raw” means that you have to sanitize the string contents to be a valid literal with the same meaning.

but i see why you designed it that way: there’s many many stylistic parts to representing things:

where to put underscores in numeric literals?
multiline strings or single line with \n?
...

still, i’d wish for a way to choose some of those when first inserting a value instead of basically inserting it twice. maybe:

enum QuoteStyle { Raw, Normal, Smart }
enum LineStyle { Single, Multi, Smart }
enum IndentStyle { None, Break(u32) }
//minimal would only escape e.g. newlines in non-multiline strings and `"` in normal strings
enum EscapeStyle { Minimal, OnlyInvisible,  }
struct StringStyle {
    quotes: QuoteStyle,
    lines: LineStyle,
    indent: IndentStyle,
    escape: EscapeStyle,
}

quotes: Smart means: use raw strings ('), except if the string contains stuff that should be escaped

lines: Smart means: use single line strings unless the string is long and has multiple line breaks, or quotes: Raw is chosen and the string contains '.

trying to serialize a string containing ' to a single-quoted raw string would fail.

vosen added 5 commits June 18, 2015 00:31

Rewrite parsing and in-memory storage

b2c7cb2

Handle root tables with leading and trailing aux text

305f9a4

Add round-trip tests. Implement escaping of keys

9a11eba

Properly handle escaped strings and literals

d4dd2ef

Handle integers, floats and array values

5144aa8

vosen added 5 commits July 9, 2015 12:19

Handle round-trips for inline arrays

ed64b65

Clean up formatting in document module and module layout

abae60f

Document internal structure of doc-style parser. Fix some useless code

00122c0

Fix aux text handling functions and improve some naming

d932536

Refactor inner AST-generation functions and fix a bug with inline tables

66be2f1

vosen mentioned this pull request Aug 10, 2015

Add cargo integration PistonDevelopers/VisualRust#3

Open

5 tasks

alexcrichton force-pushed the master branch from bab05fc to bb12015 Compare September 23, 2015 17:09

rgardner mentioned this pull request Oct 26, 2015

Do not remove comments from the Cargo.toml killercup/cargo-edit#57

Closed

killercup mentioned this pull request Nov 21, 2015

cargo add: Nicer TOML Files killercup/cargo-edit#15

Closed

3 tasks

wraithan mentioned this pull request Jan 2, 2016

Round trip the Cargo.toml better wraithan/cargo-bump#1

Closed

vosen closed this Feb 1, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement round-trippable, editable TOML parsing #64

Implement round-trippable, editable TOML parsing #64

vosen commented Jun 25, 2015

alexcrichton commented Jun 29, 2015

vosen commented Jun 29, 2015

alexcrichton commented Jun 29, 2015

vosen commented Jul 17, 2015

alexcrichton commented Jul 20, 2015

flying-sheep commented Dec 9, 2015

vosen commented Feb 1, 2016

flying-sheep commented Feb 2, 2016

alexcrichton commented Feb 3, 2016

vosen commented Feb 3, 2016

flying-sheep commented Feb 3, 2016

vosen commented Feb 3, 2016

flying-sheep commented Feb 3, 2016

vosen commented Feb 3, 2016

flying-sheep commented Feb 4, 2016

Implement round-trippable, editable TOML parsing #64

Implement round-trippable, editable TOML parsing #64

Conversation

vosen commented Jun 25, 2015

alexcrichton commented Jun 29, 2015

vosen commented Jun 29, 2015

alexcrichton commented Jun 29, 2015

vosen commented Jul 17, 2015

alexcrichton commented Jul 20, 2015

flying-sheep commented Dec 9, 2015

vosen commented Feb 1, 2016

flying-sheep commented Feb 2, 2016

alexcrichton commented Feb 3, 2016

vosen commented Feb 3, 2016

flying-sheep commented Feb 3, 2016

vosen commented Feb 3, 2016

flying-sheep commented Feb 3, 2016

vosen commented Feb 3, 2016

flying-sheep commented Feb 4, 2016