Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Anonymous rules must_err<> #97

Closed
engelant opened this issue Mar 14, 2018 · 4 comments
Closed

Anonymous rules must_err<> #97

engelant opened this issue Mar 14, 2018 · 4 comments
Assignees

Comments

@engelant
Copy link

I think it might be a good idea in terms of convinience to extend the availible match rules by must_err<R, ERR>.
The thought behind that is defining an Error message for an anonymous set of rules.
The implementation of must<R>is described as:

sor< R, raise< R > >

Having a scenario as following:

struct foo : seq< Spacing, string<'v', 'a', 'r'>, must< Spacing, Identifier, Spacing > > {};

I would need to define an error_control for this anonymous segment. Rather that or naming this sequence but I really would like to avoid that. Also this is ambigious, as Spacing, Identifier, Spacing is a very generic term, which may need very different error messages, depending of it's apperance inside a rule.
What I would suggest instead is:

struct foo : seq< Spacing, string<'v', 'a', 'r'>, must_err< seq< Spacing, Identifier, Spacing >, ERR_MSG("var must be followed by an Identifier") > > {};

Where ERR_MSG( "..." ) is a macro like TAO_PEGTL_STRING( "..." ), just returning something more simple than string<> (without match method).

must_err<> then rises an tao::pegtl::parse_error, with msg ERR_MSG( "..." ). While overwriting the content of msg with the custom message (together with filename / linenumber etc..), I would suggest parse_error would get some more methods/members, that give access to:

  • Filename (if any)
  • Line number
  • Character position
  • the "pure" error message
  • maybe some stuff I didn't think of right now

I tried it out but as always my template programming skills are very limited.
For the sake of simplicity I extended the given string<> template with a static data() method and created my own rule must_err<>.

         template< char... Cs >
         struct string
         {
           ... 
            static const std::string data() {
                const std::initializer_list< char >& l = { Cs... };
                const std::string str (l.begin(), l.end());
                return str;
            }
         };

And copied and modified the must<> rule

         template< typename Rule, class ERR_MSG = string<'f', 'o', 'o'> >
         struct must_err
         {
            using analyze_t = typename Rule::analyze_t;

            template< apply_mode A,
                      rewind_mode,
                      template< typename... > class Action,
                      template< typename... > class Control,
                      typename Input,
                      typename... States >
            static bool match( Input& in, States&&... st )
            {
               //Actually a raise should happen here
               std::cout << "Error: " << ERR_MSG::data() <<  std::endl;
               return true;
            }
         };

Looking forward on your thoughts on this topic.

@d-frey d-frey self-assigned this Mar 14, 2018
@d-frey
Copy link
Member

d-frey commented Mar 14, 2018

I'd try to keep things simple. What you really want is a custom error message that is embedded in the grammar. There is no need to tie that to the previous rule that did not match, hence instead of must_err< Rule, ERROR_MACRO("...") >, I'd just use sor< Rule, RAISE_MACRO("...") > where the latter macro throws an exception with a custom error message (like raise<>). Now the question is whether you want/need to route that through the Control class' raise()-method. If you do not need that, you could simply try and add

template< char... Cs >
struct raise_message
{
   using analyze_t = analysis::generic< analysis::rule_type::ANY >;

   template< typename Input >
   static void raise( const Input& in, const std::initializer_list< char >& l )
   {
      throw parse_error( std::string( &*l.begin(), l.size() ), in );
   }

   template< apply_mode A,
             rewind_mode,
             template< typename... > class Action,
             template< typename... > class Control,
             typename Input,
             typename... States >
   static bool match( Input& in, States&&... /*unused*/ )
   {
      raise( in, { Cs... } );
   }
};

#define TAO_PEGTL_RAISE( x ) \
   TAO_PEGTL_INTERNAL_STRING( raise_message, x )

(Note: The above is untested, I currently have some trouble running my development VM)

About your final point: parse_error has what() to access the actual message, and a public member for the positions. The latter is a std::vector<position>, as an error may result from a file included by another file or from expanding a macro or what have you. You can access e.positions.front().source or e.positions.front().line. The information is there, but not as convenient as you hope. OTOH, the nesting makes it kind of wrong to add accessors on a higher level. But I'll discuss it with Colin, let's see if we'll add some convenience accessors.

@engelant
Copy link
Author

The member std::vector<position> inside parse_error makes sense, honestly I overlooked it.

Regarding keeping it simple and the sor< Rule, RAISE_MACRO("...") > construct, yes, that would work. But since there are other rules defined for convenience this leaves me with the question: Am I using this wrong? Am I supposed to split my rules into (wierdly?) named sub rules and define their error_control::message?
If I'm not misusing this library then this would be a common situation, and I suggest It would make sense to add those "convenience" rule and/or macro to the library header.

@d-frey
Copy link
Member

d-frey commented Mar 14, 2018

I don't think it is wrong, it is just another style. We don't want to force any style on you, instead we try to support as many (reasonable) styles as possible. Feel free to experiment with it, having your own custom-rules when using the PEGTL is quite normal.

Our reasons for separating the error message definitions from the actual grammar are: The grammar is shorter and easier to read, must and the convenience rules based on must provide convenient error hooks and help to reason about the grammar itself. Writing grammars is complicated enough already :) Additionally, you could have different error messages, e.g. localized to different languages, without changing the grammar.

We'll consider adding more convenience rules at another time, if you (or us) have more experience with the style you proposed.

@engelant
Copy link
Author

Ok, so be it (for now). I will continue to implement my parser mith my own style and see if it all comes together.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

2 participants