Extracting comments from source code #31
Comments
Not possible. The way Ruby works, you can only simultaneously lex and parse it. For example, compare these code fragments:
division and comment:
    a = 1
    a / 10 # /
call and regexp:
    a / 10 # /
Ruby's (>=1.9.2) grammar is context-sensitive for all intents and purposes, and with the current LALR-parse+context-sensitive-lexer implementation it only ever makes sense to perform parsing and lexing simultaneously. I'm not sure which interface for storing comments would be the most sane. |
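The ambiguity can be reproduced with MRI itself. The following is a minimal sketch, not code from the thread: the `SlashDemo` module is hypothetical, and the fragments are spelled without a space after the slash (`a /10 # /`), which is an assumption on my part, since that is the spelling for which current MRI picks the regexp reading.

```ruby
# Hypothetical demo module; none of these names come from Parser.
module SlashDemo
  # A method named `a`, so that a bare `a` can be read as a method call.
  def self.a(arg = :no_argument)
    arg
  end

  # No local variable `a` is in scope: `/10 # /` is lexed as a regexp
  # literal and passed to the method `a` as its argument.
  def self.as_call
    eval("a /10 # /")
  end

  # A local variable `a` is assigned first: the same text is now lexed
  # as a division, followed by the comment `# /`.
  def self.as_division
    eval("a = 100\na /10 # /")
  end
end

p SlashDemo.as_call      # a Regexp whose source is "10 # "
p SlashDemo.as_division  # 10
```

The only difference between the two strings is whether `a` was assigned earlier, which is information a standalone lexer does not have.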
I see. Maybe comments should be tracked during lex/parse and then be stored in some structure that's finally attached to the source map of the top-level AST node. |
Might be. One of the problems here is that people seem to like comments attached to the def/class/module nodes, and these two approaches do not compose well. (I don't want to have the same data in two disjoint places in my AST.) |
What I'm currently thinking about is to make the #parse API look like this:
Questions:
|
The API change seems reasonable to me. I think we should include The post-processor idea sounds good; we can probably just discard comments that appear inside nodes - seems unlikely that someone would like to map those to nodes anyways. I'm reasonably sure we should map only class/module/def/constant comments to nodes. The other comments would likely be needed just for bulk comment processing and having them only in the |
What is your reasoning for including I believe that YARD allows documenting attr_accessor clauses and similar ones, that's why we must support attaching them to arbitrary nodes. Well, once you have constant assignments, everything else is also there, so it's not much of a problem. What would an algorithm for associating comments to nodes look like? I'm thinking of this:
This way, comments in weird places will get associated to the next nearest node, which sounds reasonable to me. I'm also interested in other opinions... @mbj, @yorickpeterse, @judofyr? |
Wouldn't a map/reduce like system (or just reduce in this case) work as well? Maybe I'm just not entirely understanding the description but consider the following:
In terms of code that would look something like the following:
    # Minimal stand-in for both comments and AST nodes.
    class Container
      attr_reader :type, :value, :line

      def initialize(type, value, line)
        @type  = type
        @value = value
        @line  = line
      end

      def inspect
        return "(#{type} #{value} #{line})"
      end
    end

    comments = [
      Container.new(:comment, 'hello', 1),
      Container.new(:comment, 'world', 5)
    ]

    ast = [
      Container.new(:node, 'a', 2),
      Container.new(:node, 'b', 3),
      Container.new(:node, 'c', 4),
      Container.new(:node, 'd', 6)
    ]

    associations = {}

    # Walk the nodes in source order; a node takes the first pending
    # comment if that comment appears on an earlier line.
    ast.each do |node|
      comment = comments[0]

      if comment and comment.line < node.line
        associations[node] = comments.shift
      end
    end

    p associations
This would result in the following:
Note that we might be talking about the same algorithm here. |
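One limitation of the loop above is that each node takes at most one comment, so two comments in a row before a node would leave the second one for a later node. A possible variant, sketched here with a hypothetical `Item` struct in place of `Container` and a made-up `associate` helper, collects every comment that precedes a node:

```ruby
# Hypothetical stand-in for the Container class used above.
Item = Struct.new(:type, :value, :line)

# Returns a hash mapping each node to the array of comments that appear
# before it and after the previous node. Both inputs must be sorted by line.
def associate(nodes, comments)
  pending = comments.dup
  nodes.each_with_object({}) do |node, result|
    preceding = pending.take_while { |comment| comment.line < node.line }
    pending = pending.drop(preceding.size)
    result[node] = preceding unless preceding.empty?
  end
end

comments = [
  Item.new(:comment, 'hello', 1),
  Item.new(:comment, 'again', 2),
  Item.new(:comment, 'world', 6)
]
nodes = [
  Item.new(:node, 'a', 3),
  Item.new(:node, 'b', 4),
  Item.new(:node, 'c', 5),
  Item.new(:node, 'd', 7)
]

associate(nodes, comments).each do |node, attached|
  puts "#{node.value}: #{attached.map(&:value).join(', ')}"
end
# prints:
# a: hello, again
# d: world
```

Comments that come after the last node stay unattached in this sketch; whether they should go to the preceding node or be dropped is a separate decision.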
@yorickpeterse Kinda the same. |
@whitequark My reasoning is simply that I need to be able to differentiate between |
@whitequark, @yorickpeterse I don't have a better or different idea than yours. |
Hm, I made #parse return [ast, comments] but it's kind of ugly. Maybe make separate #parse and #parse_with_comments? Not sure here. |
@whitequark Why not change the method signature of |
Um, no, I don't like such return value polymorphism at all. It means I don't know what is returned unless I have a hash literal at the call site. |
+1 for #parse_with_comments |
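The difference between the two designs can be seen in a toy example. This sketch uses a hypothetical `ToyParser` that treats every non-comment line as a "node"; it is not Parser's implementation, only an illustration of why two separate methods keep each return type predictable at the call site:

```ruby
# Hypothetical toy, not the Parser gem: "parsing" here just splits lines.
class ToyParser
  # Always returns the list of nodes.
  def parse(source)
    parse_with_comments(source).first
  end

  # Always returns a two-element array: [nodes, comments].
  def parse_with_comments(source)
    lines = source.lines.map(&:strip).reject(&:empty?)
    comments, nodes = lines.partition { |line| line.start_with?('#') }
    [nodes, comments]
  end
end

source = "# greet\nputs 'hi'\n# leave\nexit\n"
parser = ToyParser.new

ast = parser.parse(source)                          # nodes only
ast, comments = parser.parse_with_comments(source)  # nodes and comments
p ast
p comments
```

With a single method whose return shape depends on an options hash, the caller would have to inspect the arguments to know whether to destructure the result.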
Done. Code:
Associations:
|
Not really an issue, just a question I wasn't sure where to ask.
rubocop has some comment style checks that need access to the tokens generated by the lexer, since comments for obvious reasons are not part of the parser AST. Basically I need some equivalent of Ripper.lex. I noticed the Lexer class in Parser's source code and its use to generate the output of ruby-parser -E, but I'm not quite certain how I can interact with it to simply get a list of tokens with their text and locations.
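For comparison, this is roughly what the `Ripper.lex` approach looks like with the standard library. It is a small sketch; the helper name `comment_tokens` is made up for illustration:

```ruby
require 'ripper'

# Hypothetical helper: returns [line, column, text] for every comment token.
# Ripper.lex yields entries that start with [[line, column], type, text].
def comment_tokens(source)
  Ripper.lex(source)
        .select { |_position, type, _text| type == :on_comment }
        .map { |(line, column), _type, text| [line, column, text.chomp] }
end

source = "x = 1 # note\n# standalone\ny = 2\n"
comment_tokens(source).each { |token| p token }
# prints:
# [1, 6, "# note"]
# [2, 0, "# standalone"]
```

Lines are 1-based and columns are 0-based in Ripper's output, and the comment text includes its trailing newline, which the sketch strips with `chomp`.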