Merge remote-tracking branch 'origin/master' into fix-issue-216
Bjoe committed Dec 3, 2020
2 parents b22fec0 + e32db6b commit be65e10
Showing 12 changed files with 416 additions and 96 deletions.
9 changes: 8 additions & 1 deletion doc/Changelog.md
@@ -1,8 +1,15 @@
# Changelog

## 3.0.1

Released: (not yet)

* Optionally made `analyze()` more verbose to aid finding the rule cycles.


## 3.0.0

**Not yet released**
Released 2020-11-28

* Use the [**migration guide**](Migration-Guide.md#version-300) when updating.
* Infrastructure
37 changes: 2 additions & 35 deletions doc/Contrib-and-Examples.md
@@ -31,36 +31,6 @@ For all questions and remarks contact us at **taocpp(at)icemx.net**.
* Ready for production use.
* Superceeded by `TAO_PEGTL_STRING()`.

###### `<tao/pegtl/contrib/change_action.hpp>`

* Changes the action class template.
* Ready for production use.

###### `<tao/pegtl/contrib/change_action_and_state.hpp>`

* Changes the action class template and the state.
* Ready for production use but might be changed in the future.

###### `<tao/pegtl/contrib/change_control.hpp>`

* Changes the control class template.
* Ready for production use.

###### `<tao/pegtl/contrib/change_state.hpp>`

* Changes the state.
* Ready for production use but might be changed in the future.

###### `<tao/pegtl/contrib/disable_action.hpp>`

* Disables actions.
* Ready for production use.

###### `<tao/pegtl/contrib/enable_action.hpp>`

* Enables actions.
* Ready for production use.

###### `<tao/pegtl/contrib/http.hpp>`

* HTTP 1.1 grammar according to [RFC 7230](https://tools.ietf.org/html/rfc7230).
@@ -98,12 +68,9 @@ For all questions and remarks contact us at **taocpp(at)icemx.net**.

Utility function `to_string<>()` that converts template classes with arbitrary sequences of characters as template arguments into a `std::string` that contains these characters.
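
A minimal sketch of the idea behind `to_string<>()`, written without the PEGTL: a partial specialisation matches any class template whose parameters are a pack of `char`s and expands that pack into a `std::string`. The names `chars` and `to_string_impl` are hypothetical stand-ins, and this is not necessarily how the PEGTL's own header implements it.

```cpp
#include <string>

// Hypothetical stand-in for a rule class template whose template
// arguments are characters (the shape string rules have).
template< char... Cs >
struct chars
{};

// Primary template: intentionally left undefined, so unsupported
// types fail to compile.
template< typename T >
struct to_string_impl;

// Matches any class template taking a pack of chars and expands the
// pack into the characters of a std::string.
template< template< char... > class X, char... Cs >
struct to_string_impl< X< Cs... > >
{
   static std::string get()
   {
      // Empty pack: std::string{} (empty); otherwise the
      // initializer_list< char > constructor is selected.
      return std::string{ Cs... };
   }
};

template< typename T >
std::string to_string()
{
   return to_string_impl< T >::get();
}
```

With this sketch, `to_string< chars< 'a', 'b', 'c' > >()` returns `"abc"`, and an empty pack yields an empty string.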

###### `<tao/pegtl/contrib/tracer.hpp>`
###### `<tao/pegtl/contrib/trace.hpp>`

* Control class that prints a line of information to `std::cerr`
1. when and where a rule is attempted to match,
2. when and where a rule succeeded to match,
3. when and where a rule failed to match.
* See [Tracer](Getting-Started.md#tracer).
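
To illustrate what such a control class does, here is a toy model that uses nothing from the PEGTL: rules are structs with a static `match()`, and a hypothetical `match_rule()` calls `Control< Rule >::start()` before each attempt and `success()` or `failure()` afterwards, rewinding the position on failure. The names `one`, `seq`, `tracing_control`, `match_rule` and `trace_parse` are made up for this sketch (`one` and `seq` only loosely mirror PEGTL rule names); the real control-class interface has different signatures and additional hooks.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <string_view>

// Toy model, not the PEGTL API. A rule provides print() and a static
// match() that advances `pos` through `in` on success.

// Calls the control hooks around one attempt to match Rule.
template< typename Rule, template< typename > class Control >
bool match_rule( std::string_view in, std::size_t& pos, std::ostream& os )
{
   const std::size_t saved = pos;
   Control< Rule >::start( pos, os );
   if( Rule::template match< Control >( in, pos, os ) ) {
      assert( pos <= in.size() );
      Control< Rule >::success( pos, os );
      return true;
   }
   pos = saved;  // a failed rule consumes nothing
   Control< Rule >::failure( pos, os );
   return false;
}

// Matches the single character C.
template< char C >
struct one
{
   static void print( std::ostream& os ) { os << "one<" << C << '>'; }

   template< template< typename > class Control >
   static bool match( std::string_view in, std::size_t& pos, std::ostream& )
   {
      if( pos < in.size() && in[ pos ] == C ) {
         ++pos;
         return true;
      }
      return false;
   }
};

// Matches all sub-rules in order; each goes through match_rule() so the
// hooks also fire for nested rules.
template< typename... Rules >
struct seq
{
   static void print( std::ostream& os ) { os << "seq"; }

   template< template< typename > class Control >
   static bool match( std::string_view in, std::size_t& pos, std::ostream& os )
   {
      return ( match_rule< Rules, Control >( in, pos, os ) && ... );
   }
};

// Control class template that logs start, success and failure per rule.
template< typename Rule >
struct tracing_control
{
   static void start( std::size_t pos, std::ostream& os ) { log( "start", pos, os ); }
   static void success( std::size_t pos, std::ostream& os ) { log( "success", pos, os ); }
   static void failure( std::size_t pos, std::ostream& os ) { log( "failure", pos, os ); }

private:
   static void log( const char* event, std::size_t pos, std::ostream& os )
   {
      Rule::print( os );
      os << ' ' << event << " at " << pos << '\n';
   }
};

// Runs Rule on `in` from position 0 and returns the trace as a string.
template< typename Rule >
std::string trace_parse( std::string_view in )
{
   std::ostringstream os;
   std::size_t pos = 0;
   match_rule< Rule, tracing_control >( in, pos, os );
   return os.str();
}
```

For example, `trace_parse< seq< one< 'a' >, one< 'b' > > >( "ax" )` logs that `one<b>` failed at position 1 and then that `seq` failed at position 0, the position it rewound to.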

###### `<tao/pegtl/contrib/unescape.hpp>`

2 changes: 1 addition & 1 deletion doc/Control-and-Debug.md
@@ -114,7 +114,7 @@ If either produce a (local) failure then `C< R >::failure()` is called.

In all cases where an action is called, the success or failure hooks are invoked after the action returns.

The included class `tao::pegtl::tracer` in `<tao/pegtl/contrib/tracer.hpp>` gives a practical example that can be used as control class to debug grammars.
The included `<tao/pegtl/contrib/trace.hpp>` gives a practical example that shows how the control class can be used to debug grammars.

## Exception Throwing

61 changes: 33 additions & 28 deletions doc/Getting-Started.md
@@ -60,18 +60,19 @@ namespace hello

int main( int argc, char* argv[] )
{
if( argc > 1 ) {
// Start a parsing run of argv[1] with the string
// variable 'name' as additional argument to the
// action; then print what the action put there.
if( argc != 2 ) return 1;

std::string name;
// Start a parsing run of argv[1] with the string
// variable 'name' as additional argument to the
// action; then print what the action put there.

pegtl::argv_input in( argv, 1 );
pegtl::parse< hello::grammar, hello::action >( in, name );
std::string name;

std::cout << "Good bye, " << name << "!" << std::endl;
}
pegtl::argv_input in( argv, 1 );
pegtl::parse< hello::grammar, hello::action >( in, name );

std::cout << "Good bye, " << name << "!" << std::endl;
return 0;
}
```
@@ -105,6 +106,8 @@ The correct way of handling errors is shown at the last paragraph of this page.
## Parsing Expression Grammars

The PEGTL creates parsers according to a [Parsing Expression Grammar](http://en.wikipedia.org/wiki/Parsing_expression_grammar) (PEG).
The table below shows how the PEG combinators map to PEGTL [rule classes](Rule-Reference.md#combinators) (strictly speaking: class templates).
Beyond these standard combinators the PEGTL contains a [large number of additional combinators](Rule-Reference.md) as well as the possibility of [creating custom rules](Rules-and-Grammars.md#creating-new-rules).

| PEG | `tao::pegtl::` |
| --- | --- |
@@ -116,16 +119,16 @@ The PEGTL creates parsers according to a [Parsing Expression Grammar](http://en.
| *e*<sub>1</sub> / *e*<sub>2</sub> | [`sor< R... >`](Rule-Reference.md#sor-r-) <sup>[(combinators)](Rule-Reference.md#combinators)</sup> |
| *e** | [`star< R... >`](Rule-Reference.md#star-r-) <sup>[(combinators)](Rule-Reference.md#combinators)</sup> |

The PEGTL also contains a [large number of atomic rules](Rule-Reference.md) for matching ASCII and Unicode characters, strings, ranges and similar, beginning-of-file or end-of-line and similar, and more...

## Grammar Analysis

Every grammar must be free of cycles that make no progress, i.e. the cycle does not consume any input.
This is a common problem in parsing called [left recursion](https://en.wikipedia.org/wiki/Left_recursion).
Especially with the PEG formalism, it results in an infinite loop and, eventually, in a stack overflow.
Every grammar must be free of cycles that make no progress, i.e. it must not contain unbounded recursive or iterative rules that do not consume any input, as such grammar might enter an infinite loop.
One common pattern for these kinds of problematic grammars is the so-called [left recursion](https://en.wikipedia.org/wiki/Left_recursion) that, while not a problem for less deterministic formalisms like CFGs, must be avoided with PEGs in order to prevent aforementioned infinite loops.

The PEGTL provides a [grammar analysis](Grammar-Analysis.md) with which a grammar can be verified.
Note that this is done at runtime as a pure compile-time analysis would lead to insupportable compile-times.
The analysis, however, is only based on the grammar itself and not on a specific input.
Additionally, the analysis is typically written as a separate program to keep any overhead from your normal applications.
The PEGTL provides a [grammar analysis](Grammar-Analysis.md) which analyses a grammar for cycles that make no progress.
While it could be implemented with compile-time meta-programming, to prevent the compiler from exploding the analysis is done at run-time.
It is best practice to create a separate dedicated program that does nothing else but run the grammar analysis, thus keeping this development and debug aid out of the main application.

```c++
#include <tao/pegtl.hpp>
@@ -148,11 +151,12 @@ int main()
return 0;
}
```
For more information see [Grammar Analysis](Grammar-Analysis.md).

## Tracer

One of the most basic tools when developing a grammar is a tracer that prints every step of a parsing run.
The PEGTL provides a tracer that will print to stderr, as well as allowing users to write their own tracers to output other formats.
A fundamental tool used when developing a grammar is a tracer that prints every step of a parsing run, thereby showing exactly which rule was attempted to match where, and what the result was.
The PEGTL provides a tracer that will print to `stderr`, and of course allows users to write their own tracers with custom output formats.

```c++
#include <tao/pegtl.hpp>
@@ -169,26 +173,25 @@ int main( int argc, char** argv )
{
if( argc != 2 ) return 1;

pegtl::argv_input in( argv, i );
pegtl::argv_input in( argv, 1 );
pegtl::standard_trace< grammar >( in );

return 0;
}
```

In the above each command line parameter is parsed as a JSON string.
As the output gets long quickly, we will not show it here, please have a look at the [Tracer](Tracer.md) documentation.
In the above each command line parameter is parsed as a JSON string and a trace is given to understand how the grammar matches the input.

TODO: Write `Tracer.md`.
For more information see `tao/pegtl/contrib/trace.hpp`.

## Parse Tree / AST

When developing grammars, a common goal is to generate a [parse tree](https://en.wikipedia.org/wiki/Parse_tree) or an [AST](https://en.wikipedia.org/wiki/Abstract_syntax_tree).
When developing parsers, a common goal after creating the grammar is to generate a [parse tree](https://en.wikipedia.org/wiki/Parse_tree) or an [AST](https://en.wikipedia.org/wiki/Abstract_syntax_tree).

The PEGTL provides a [Parse Tree](Parse-Tree.md) builder that can filter and/or transform tree nodes on-the-fly.
Additionally, a helper is provided to print out the resulting data structure in the [DOT](https://en.wikipedia.org/wiki/DOT_(graph_description_language)) format, suitable for creating a graphical representation of the parse tree.
Additionally, a helper is provided to print out the resulting data structure in [DOT](https://en.wikipedia.org/wiki/DOT_(graph_description_language)) format, suitable for creating a graphical representation of the parse tree.

The following example uses a selector to filter the parse tree nodes, as otherwise the graphical representation may become confusing quite quickly.
The following example uses a selector to choose which rules generate parse tree nodes, as the graphical representation will usually be too large and confusing when not using a filter and generating nodes for *all* rules.

```c++
#include <tao/pegtl.hpp>
@@ -220,7 +223,7 @@ int main( int argc, char** argv )
{
if( argc != 2 ) return 1;

pegtl::argv_input in( argv, i );
pegtl::argv_input in( argv, 1 );
const auto root = parse_tree::parse< grammar, selector >( in );
if( root ) {
parse_tree::print_dot( std::cout, *root );
@@ -240,10 +243,12 @@ The above will generate an SVG file with a graphical representation of the parse

![JSON Parse Tree](Json-Parse-Tree.svg)

For more information see [Parse Tree](Parse-Tree.md).

## Error Handling

Although the PEGTL could be used without exceptions, most programs will use input classes or grammars that might throw exceptions.
Typically, the following pattern helps to print the exceptions properly:
Although the PEGTL could be used without exceptions, most programs will use input classes, grammars and/or actions that can throw exceptions.
Typically, the following pattern helps to print the exceptions in a human friendly way:

```c++
// The surrounding try/catch for normal exceptions.
2 changes: 1 addition & 1 deletion doc/Grammar-Analysis.md
@@ -21,7 +21,7 @@ const std::size_t issues = tao::pegtl::analyze< my_grammar >();
```

The `analyze()` function prints some information about the found issues to `std::cout` and returns the total number of issues found.
The output can be suppressed by passing `false` as sole function argument.
The output can be suppressed by passing `-1` as sole function argument, or be extended to give some information about the issues when called with `1`.

Analysing a grammar is usually only done while developing and debugging a grammar, or after changing it.
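
The analysis is easier to follow on a small model. The sketch below is a standalone re-creation of the `work()` recursion from `analyze.hpp` further down in this commit, not the PEGTL code: rule names map to one of the four analysis types plus their sub-rules, a set holds the rules currently being visited, and a vector records the path. Re-entering a rule that is already in the set without having consumed input counts as a problem. Mirroring the diff, a negative `verbose` prints nothing, `0` prints one line per cycle (only from its lexicographically smallest rule), and a positive value also lists the rules on the cycle. The meaning given to the four types is an assumption, and `kind`, `entry`, `cycle_model` and `report_of` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <ostream>
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Assumed meaning of the four analysis types.
enum class kind
{
   any,  // always consumes input when it succeeds
   opt,  // may succeed without consuming input
   seq,  // consumes input if at least one sub-rule does
   sor   // consumes input only if every alternative does
};

struct entry
{
   kind type;
   std::vector< std::string > subs;
};

class cycle_model
{
public:
   cycle_model( std::map< std::string, entry > rules, const int verbose, std::ostream& os )
      : m_rules( std::move( rules ) ), m_verbose( verbose ), m_os( os )
   {}

   // Uses every rule as a starting point and returns the problem count.
   std::size_t problems()
   {
      for( const auto& rule : m_rules ) {
         assert( m_stack.empty() && m_trace.empty() );
         work( rule.first, false );
      }
      return m_problems;
   }

private:
   // Returns whether `name` consumes input when it succeeds; `accum` is
   // whether input was consumed on the path that led here.
   bool work( const std::string& name, const bool accum )
   {
      if( !m_stack.insert( name ).second ) {
         // `name` is already being visited: the path closed a cycle.
         if( !accum ) {
            ++m_problems;
            report( name );
         }
         return accum;
      }
      m_trace.push_back( name );
      const entry& e = m_rules.at( name );
      bool result = false;
      if( e.type == kind::sor ) {
         result = true;  // short-circuits like the diff's `a && work(...)`
         for( const auto& r : e.subs ) {
            result = result && work( r, accum );
         }
      }
      else {
         bool a = false;  // short-circuits like the diff's `a || work(...)`
         for( const auto& r : e.subs ) {
            a = a || work( r, accum || a );
         }
         result = ( e.type == kind::any ) || ( e.type == kind::seq && a );
      }
      m_trace.pop_back();
      m_stack.erase( name );
      return result;
   }

   // Prints a cycle only when it closes at the start rule and no rule on
   // it has a smaller name, so each cycle is printed once.
   void report( const std::string& name )
   {
      assert( !m_trace.empty() );
      if( m_verbose < 0 || m_trace.front() != name ) {
         return;
      }
      for( const auto& r : m_trace ) {
         if( r < name ) {
            return;
         }
      }
      m_os << "problem: cycle without progress detected at rule " << name << '\n';
      if( m_verbose > 0 ) {
         for( const auto& r : m_trace ) {
            m_os << "  involved rule: " << r << '\n';
         }
      }
   }

   std::map< std::string, entry > m_rules;
   const int m_verbose;
   std::ostream& m_os;
   std::size_t m_problems = 0;
   std::set< std::string > m_stack;
   std::vector< std::string > m_trace;
};

// Convenience wrapper: runs the model, stores the count, returns the report.
inline std::string report_of( const std::map< std::string, entry >& rules, const int verbose, std::size_t& count )
{
   std::ostringstream os;
   count = cycle_model( rules, verbose, os ).problems();
   return os.str();
}
```

In this model a left-recursive `expr <- expr plus` yields one problem, whereas `list <- item list` yields none because `item` always consumes input. Since every rule is used as a starting point, a cycle through two rules is counted once per starting rule but printed only once.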

8 changes: 7 additions & 1 deletion doc/README.md
@@ -96,10 +96,16 @@
* [Full Parse Tree](Parse-Tree.md#full-parse-tree)
* [Partial Parse Tree](Parse-Tree.md#partial-parse-tree)
* [Transforming Nodes](Parse-Tree.md#transforming-nodes)
* [Transformers](Parse-Tree.md#transformers)
* [Transformer](Parse-Tree.md#transformer)
* [`tao::pegtl::parse_tree::node`](Parse-Tree.md#taopegtlparse_treenode)
* [Custom Node Class](Parse-Tree.md#custom-node-class)
* [Meta Data and Visit](Meta-Data-and-Visit.md)
* [Internals](Meta-Data-and-Visit.md#internals)
* [Rule Type](Meta-Data-and-Visit.md#rule-type)
* [Sub Rules](Meta-Data-and-Visit.md#sub-rules)
* [Grammar Visit](Meta-Data-and-Visit.md#grammar-visit)
* [Grammar Print](Meta-Data-and-Visit.md#grammar-print)
* [Rule Coverage](Meta-Data-and-Visit.md#rule-coverage)
* [Contrib and Examples](Contrib-and-Examples.md)
* [Contrib](Contrib-and-Examples.md#contrib)
* [Examples](Contrib-and-Examples.md#examples)
67 changes: 41 additions & 26 deletions include/tao/pegtl/contrib/analyze.hpp
@@ -20,6 +20,7 @@
#include "analyze_traits.hpp"

#include "internal/set_stack_guard.hpp"
#include "internal/vector_stack_guard.hpp"

#include "../internal/dependent_false.hpp"

@@ -51,8 +52,9 @@ namespace TAO_PEGTL_NAMESPACE
[[nodiscard]] std::size_t problems()
{
for( auto i = m_info.begin(); i != m_info.end(); ++i ) {
m_results[ i->first ] = work( i, false );
m_cache.clear();
assert( m_trace.empty() );
assert( m_stack.empty() );
m_results[ i->first ] = work( *i, false );
}
return m_problems;
}
@@ -64,72 +66,85 @@ namespace TAO_PEGTL_NAMESPACE
}

protected:
explicit analyze_cycles_impl( const bool verbose ) noexcept
explicit analyze_cycles_impl( const int verbose ) noexcept
: m_verbose( verbose ),
m_problems( 0 )
{}

[[nodiscard]] std::map< std::string_view, analyze_entry >::const_iterator find( const std::string_view name ) const noexcept
[[nodiscard]] const std::pair< const std::string_view, analyze_entry >& find( const std::string_view name ) const noexcept
{
const auto iter = m_info.find( name );
assert( iter != m_info.end() );
return iter;
return *iter;
}

[[nodiscard]] bool work( const std::map< std::string_view, analyze_entry >::const_iterator& start, const bool accum )
[[nodiscard]] bool work( const std::pair< const std::string_view, analyze_entry >& info, const bool accum )
{
if( const auto j = m_cache.find( start->first ); j != m_cache.end() ) {
return j->second;
}
if( const auto g = set_stack_guard( m_stack, start->first ) ) {
switch( start->second.type ) {
if( const auto g = set_stack_guard( m_stack, info.first ) ) {
const auto v = vector_stack_guard( m_trace, info.first );
switch( info.second.type ) {
case analyze_type::any: {
bool a = false;
for( const auto& r : start->second.subs ) {
for( const auto& r : info.second.subs ) {
a = a || work( find( r ), accum || a );
}
return m_cache[ start->first ] = true;
return true;
}
case analyze_type::opt: {
bool a = false;
for( const auto& r : start->second.subs ) {
for( const auto& r : info.second.subs ) {
a = a || work( find( r ), accum || a );
}
return m_cache[ start->first ] = false;
return false;
}
case analyze_type::seq: {
bool a = false;
for( const auto& r : start->second.subs ) {
for( const auto& r : info.second.subs ) {
a = a || work( find( r ), accum || a );
}
return m_cache[ start->first ] = a;
return a;
}
case analyze_type::sor: {
bool a = true;
for( const auto& r : start->second.subs ) {
for( const auto& r : info.second.subs ) {
a = a && work( find( r ), accum );
}
return m_cache[ start->first ] = a;
return a;
}
}
assert( false ); // LCOV_EXCL_LINE
}
assert( !m_trace.empty() );

if( !accum ) {
// LCOV_EXCL_START
++m_problems;
if( m_verbose ) {
std::cerr << "problem: cycle without progress detected at rule class " << start->first << std::endl; // LCOV_EXCL_LINE
if( ( m_verbose >= 0 ) && ( m_trace.front() == info.first ) ) {
for( const auto& r : m_trace ) {
if( r < info.first ) {
return accum;
}
}
std::cerr << "problem: cycle without progress detected at rule " << info.first << std::endl;
if( m_verbose > 0 ) {
for( const auto& r : m_trace ) {
std::cerr << " involved (transformed) rule: " << r << std::endl;
}
}
}
// LCOV_EXCL_END
}
return m_cache[ start->first ] = accum;
return accum;
}

const bool m_verbose;
const int m_verbose;

std::size_t m_problems;

std::map< std::string_view, analyze_entry > m_info;

std::set< std::string_view > m_stack;
std::map< std::string_view, bool > m_cache;
std::vector< std::string_view > m_trace;
std::map< std::string_view, bool > m_results;
};

@@ -156,7 +171,7 @@ namespace TAO_PEGTL_NAMESPACE
: public analyze_cycles_impl
{
public:
explicit analyze_cycles( const bool verbose )
explicit analyze_cycles( const int verbose )
: analyze_cycles_impl( verbose )
{
analyze_insert< Grammar >( m_info );
@@ -166,7 +181,7 @@ namespace TAO_PEGTL_NAMESPACE
} // namespace internal

template< typename Grammar >
[[nodiscard]] std::size_t analyze( const bool verbose = true )
[[nodiscard]] std::size_t analyze( const int verbose = 0 )
{
return internal::analyze_cycles< Grammar >( verbose ).problems();
}
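
The two guard headers included at the top of this file are not part of the diff. The following is a hypothetical sketch of the behaviour `work()` appears to rely on, assuming RAII semantics: one guard inserts the rule name into a set and converts to `true` only if the name was not already present (so a `false` guard means a cycle was closed); the other pushes onto a vector and pops again on scope exit. The class names `set_push_guard` and `vector_push_guard` are made up; the real `set_stack_guard` and `vector_stack_guard` may differ in detail.

```cpp
#include <cassert>
#include <set>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical: keeps a value in a set for the guard's lifetime.
template< typename T >
class set_push_guard
{
public:
   set_push_guard( std::set< T >& s, const T& t )
      : m_set( s ), m_result( s.insert( t ) )
   {}

   set_push_guard( const set_push_guard& ) = delete;
   set_push_guard& operator=( const set_push_guard& ) = delete;

   ~set_push_guard()
   {
      // Only erase what this guard inserted; an already present value
      // belongs to an outer guard further up the recursion.
      if( m_result.second ) {
         m_set.erase( m_result.first );
      }
   }

   // True iff the value was newly inserted, i.e. not yet being visited.
   explicit operator bool() const noexcept
   {
      return m_result.second;
   }

private:
   std::set< T >& m_set;
   const std::pair< typename std::set< T >::iterator, bool > m_result;
};

// Hypothetical: pushes a value onto a vector for the guard's lifetime.
template< typename T >
class vector_push_guard
{
public:
   vector_push_guard( std::vector< T >& v, const T& t )
      : m_vector( v )
   {
      m_vector.push_back( t );
   }

   vector_push_guard( const vector_push_guard& ) = delete;
   vector_push_guard& operator=( const vector_push_guard& ) = delete;

   ~vector_push_guard()
   {
      assert( !m_vector.empty() );
      m_vector.pop_back();
   }

private:
   std::vector< T >& m_vector;
};
```

Thanks to C++17 guaranteed copy elision and class template argument deduction, both can be written in the style used above, e.g. `if( const auto g = set_push_guard( stack, name ) ) { ... }`, even though they are neither copyable nor movable.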