Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PoC for the event-based plugin system with YARD parsing plugin #1321

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

okuramasafumi
Copy link
Contributor

This PR introduces the event-based plugin system along with a YARD-parsing plugin as an example of concrete plugins.

It demonstrates how we can build practical plugins while keeping the core code small. In this case, YARD-related logic is encapsulated in a plugin and used only when users specify with --plugins option.
This YARD plugin successfully parses https://github.com/okuramasafumi/alba comments written with YARD, but doesn't print YARD-specific information such as types since current RDoc doesn't support type information.

This PR is completely proof of concept, and is not meant to be merged now.
I'd like to discuss if this is the right way to extend RDoc for developers, and even for core team, and improve maintainability.

Comment on lines +879 to +881
opt.on("--plugins=PLUGINS", "-P", Array, "Use plugins") do |value|
@plugins.concat value
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use multiple --plugin to add multiple plugins.

Suggested change
opt.on("--plugins=PLUGINS", "-P", Array, "Use plugins") do |value|
@plugins.concat value
end
opt.on("--plugin=PLUGIN", "-P", "Use plugin") do |plugin|
@plugins << plugin
end

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure which interface is better.

# This
rdoc --plugins yard_plugin,another_plugin

# Or that
rdoc --plugin yard_plugin --plugin another_plugin

Correct me if I'm wrong, but I think the later way is kind of rare.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "rare" mean here?
The latter is used in many cases. For example, ruby's -I is the latter interface. curl's --data/--header/... are also the latter interface.

The latter style is easyer to use from a script:

args=()
if [ "$USE_X_PLUGIN" = "yes" ]; then
  args+=(--plugin X_plugin)
fi
if [ "$USE_Y_PLUGIN" = "yes" ]; then
  args+=(--plugin Y_plugin)
fi
rdoc "${args[@]}" ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kou Thank you for correcting me, after some investigation, I noticed that we have both styles in Ruby and other tools.
Personally I don't have a strong opinion here. For the future version we can consider YAML configuration as a primary way to enable plugins (like rubocop).
I'm not against using multiple --plugin option here.

@@ -449,6 +455,9 @@ def document options
end
@options.finish

::RDoc::BasePlugin.activate_with(self)
@options.load_plugins
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we load plugins dynamically and use require, plugins will not be loaded to multiple RDoc instance.

Do we need to load plugins for each RDoc::RDoc#document?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, initialize is a better place?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#initialize is better than #document but do we need to load plugins for each RDoc instance?
Can we load plugins only once in one process?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot see that using multiple RDoc::RDoc instance is common. From CLI or Rake task, I think we have only one instance of RDoc::RDoc.
However, we might have multiple instances in the future, probably for concurrency.
So then, where is the ideal place to activate plugins? I through it's RDoc::Rdoc since it's in the execution path anyway.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gem install A B C creates a RDoc::RDoc for each package.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have a plugin system in RDoc::Markup::PreProcess.

Could you explain what is the merit of "the event-based plugin system" you suggested (and problems in the existing system)? Should existing RDoc::Markup::PreProcess be deprecated by this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that RDoc::Markup::PreProcess requires adding directives to existing code, and that is a problem. One of the advantages of a new plugin system is that just enabling plugins works without modifying existing code.

Should existing RDoc::Markup::PreProcess be deprecated by this?

Using directives has its own advantages, so I'd like to keep it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that RDoc::Markup::PreProcess has the "post_process" hook too:

##
# Adds a post-process handler for directives. The handler will be called
# with the result RDoc::Comment (or text String) and the code object for the
# comment (if any).
def self.post_process &block
@post_processors << block
end

Should we deprecate it by suggested plugin system?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can see, this PreProcess works with directives, and this time it's not I want to work with. The goal of this PR is to introduce a way to make RDoc understand YARD without modifying existing code.
That said, we don't have to (or should not) deprecate post_process here since it's about directives and removing it will be backward-incompatible, and cannot be replaced by the plugin system in this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. It seems that the "post_process" hook isn't depend on directives.

Could you try the following?

RDoc::Markup::PreProcess.post_process do |comment, code_object|
  if code_object.is_a?(RDoc::AnyMethod)
    puts "Parsing #{code_object.name}"
  end
end

It seems that the hook is called for all methods.

Comment on lines +14 to +15
# meth.params = parsed_comment.param.map(&:to_s).join("\n")
meth.comment.text = parsed_comment.plain.join("\n")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plugins should update the given objects destructively?
How to extend RDoc by a plugin when existing RDoc objects don't have a place to store additional data? In this case, how to use the parsed type information?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I consider this and don't have a confident conclusion now.
I think there are two ways, one is using env object and another is modifying existing objects (meth in his case).
Using env would be safer, and storing type information would look like:

# In a plugin
env[meth] = DataFromYardPlugin.new(type: type)

# In a generator
type = env[meth].type

Here, DataFromYardPlugin contains various information along with type.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there are multiple plugins that want to attach additional information to meth, the approach may be fragile. (One plugin may overwrite env[meth] set by another plugin.)

Can we provide more safer mechanism?

Comment on lines +54 to +55
current_indentation_level = line[/^#\s*/]&.size || 0
if current_indentation_level >= @base_indentation_level + 2
Copy link
Member

@kou kou Mar 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work when the first line is # ...?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly this indentation-related implementation is not perfect and will fail for many edge cases. I'll definitely add test cases for this.

}
def initialize(comment)
@comment = comment
@parsed_comment = ParsedComment.new([], nil, [], [])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that we can use a local variable not an instance variable for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, thank you for pointing out.

desc = matchdata[:desc1] || matchdata[:desc2]
parsed_type = TypeParser.parse(type)
@parsed_comment[:param] << ParamData.new(type: parsed_type, name: name, desc: desc)
@mode = :param
Copy link
Member

@tompng tompng Mar 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Storing previous_tag here instead of storing mode = :param, appending part will be simple

tag = ParamData.new(type: parsed_type, name: name, desc: desc)
@parsed_comment[:param] << tag
previous_tag = tag
# Append to the previous tag
- data = @mode == :param || @mode == :raise ? @parsed_comment[@mode].last : @parsed_comment[@mode]
- data.append_desc(line)
+ previous_tag.append_desc(line)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, I'll take this!

type_name: /#\w+|((::)?\w+)+/,
literal: /(?:
'(?:\\'|[^'])*' |
"(?:\\"|[^"])*" |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think some regexps are wrong.
"\" (escaped, quote is not closed) matches to this string literal regexp.
Integer and Float literal(integer part) matches to type_name: /#\w+|((::)?\w+)+/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I didn't notice this. I tested this code with some variations of type and it seems working. Tests are missing and we need to add them to ensure this regex working correctly since it's quite complex.

# Without calling this, plugins won't work

def self.activate_with(rdoc = ::RDoc::RDoc.current)
@@rdoc = rdoc
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using class variable for this means any later plugin calls activate_with with its own RDoc will apply that to all other plugins. I don't think this is what we want?

class Base
  @@name = "Base"

  def name
    @@name
  end

  def self.update_name(name)
    @@name = name
  end
end

class Foo < Base
end

class Bar < Base
end

Foo.update_name("Foo")
f = Foo.new

Bar.update_name("Bar")
b = Bar.new

puts f.name # => "Bar"
puts b.name # => "Bar"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it's safe to use a class variable here since we don't have to update rdoc object.
However, some information such as a name of the plugin, should be stored as a class instance variables.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • If we don't expect plugins to call this method, then I don't think it should be in the base class as it is.
  • It doesn't look like the yard plugin needs this attribute yet? If that's the case, let's not introduce this until we have a real use case to help us define the API better.
  • If we want to pass RDoc's states down to plugins, let's avoid passing the RDoc object unless we have a solid use case for it. There's a lot of unnecessary coupling between RDoc's major components and I have been untangling for a while (example). Exposing RDoc directly will potentially make it harder for such work as all of its attributes & methods will be considered public APIs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, I didn't notice that it's not used at all now. I used it at first, but store has enough information for the plugin so I removed it.
Then we can remove this class variable. And I agree, we should avoid RDoc::RDoc object (thank you for untangling it!)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I'd expect we expose Options and Store objects first if we need to share states with plugins.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/ruby/rdoc/pull/1321/files#diff-2458b7fcdf31ccd44fa0681ba7e381360b3e4623c9f5409358222e4826c3c60dR6

Sorry it turned out that we do use rdoc object here because we need some global data store for event handling.
I still agree that we should avoid RDoc::RDoc instance, but then where should we put these event registration logic? Maybe we can introduce something like PluginManager and pass it to plugins?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants