Correct begin and end (line+char) positions for each token and node #23

nene · 2013-06-12T17:50:11Z

Related to the issue I just raised: #22

I replaced the rudimentary @line field in Token and Node classes with @range field that contains the following 2D array:

[[start_line, start_char], [end_line, end_char]]

Unlike the old implementation, all nodes now have this information. And I've kept the #line accessor for backwards compatibility.

Also, I eliminated the whole comments-to-syntax-nodes mapping. Placing the array of all comments into the root node.

There are things to improve here:

~~Maybe the 2D array should be replaced with a hash or object, so it would be a bit more readable. e.g. {:start => {:line => 3, :char => 8}, :end => {:line => 3, :char => 10}}~~
~~Instead of line+character based ranges we could provide a simple index based ranges. Some tools might prefer one way, others a different way. So the best might be to support both.~~
With all comments residing in the root SourceElementsNode - all other node types no more need the @comments field. So maybe it would be better to wrap everything inside a special ProgramNode that contains this @comments field. Or not return the comments at all, instead providing RKelly::Parser#comments accessor to access the comments list if needed.

All this aside, I think it's an improvement over the current implementation which is simply broken.

@range

Previously the @line field that some Nodes had wasn't always the number of the line where a Node started. And it was hard to tell why some nodes had this info, while others didn't. Now both the start and end line of every Token and Node is stored in @range field as two-element array. The #line method is still there for backwards compatibility, but the @range field is the one that holds the actual data. For now I had to disable the tests in test_comments.rb as the #apply_comments (which it effectively tested) relied on only some nodes having the line number data available.

Each Token and Node range is now a 2D array of the following: [[start_line, start_char], [end_line, end_char]] One can now know the exact position of each node.

Instead of attempting to associate all comments with syntax nodes (and actually failing to correctly do so) all comments found are added to the root SoureElementsNode.comments - all other nodes will have an empty #comments array.

Create classes for holding character positions data. - The CharPos contains both line/char and absolute character index. - The CharRange combines two CharPos objects. - Both have #next method to abstract away the creation of the ranges inside Tokenizer.

nene · 2013-06-12T20:12:28Z

Already took care of the two points mentioned and replaced the array-based ranges with CharRange and CharPos objects. Much cleaner.

Alias the #to_s to #inspect so that when using `pp` to inspect the created AST, the ranges will appear nice and compact. Also tweak to #to_s a bit to output its stuff in a bit shorter form.

nene added 5 commits June 5, 2013 19:47

Trim trailing whitespace in tokenizer.rb

f3e4958

Line and character based range.

3e20e39

Each Token and Node range is now a 2D array of the following: [[start_line, start_char], [end_line, end_char]] One can now know the exact position of each node.

Add all comments to root node of the AST.

91ec5d2

Instead of attempting to associate all comments with syntax nodes (and actually failing to correctly do so) all comments found are added to the root SoureElementsNode.comments - all other nodes will have an empty #comments array.

Make CharRange more debugging-friendly.

25c0508

Alias the #to_s to #inspect so that when using `pp` to inspect the created AST, the ranges will appear nice and compact. Also tweak to #to_s a bit to output its stuff in a bit shorter form.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Correct begin and end (line+char) positions for each token and node #23

Correct begin and end (line+char) positions for each token and node #23

nene commented Jun 12, 2013

nene commented Jun 12, 2013

Correct begin and end (line+char) positions for each token and node #23

Are you sure you want to change the base?

Correct begin and end (line+char) positions for each token and node #23

Conversation

nene commented Jun 12, 2013

nene commented Jun 12, 2013