
Join the chat at https://gitter.im/vinzenz/libpypa

libpypa - A Python Parser Library in C++

Introduction

libpypa is a Python parser implemented in pure C++. It uses neither generator tools such as flex, yacc, or bison, nor a parser framework such as Boost.Spirit. Its implementation is plain, hand-written C++ code.

Motivation

I started getting involved in the pyston project, which had an entry in its "getting involved" list for implementing a Python parser. Never having properly tackled the problem of creating a parser library for any language, I decided it might be worth a try, since most of the libraries I found were either just using the builtin Python parser or were implemented in Python itself.

Goal

The first goal of the library is to support Python 2.7 syntax. Support for 3.x syntax might be added later.

Example

An example file:

$ cat hello_world.py
#! /usr/bin/env python
# -*- coding: utf-8 -*-
#

"""
    A "Hello World" example for the pypa parser
"""
import sys

print >> sys.stdout, "Hello", "World!"

And here is the output of the test parser:

$ ./parser-test hello_world.py
Parsing successfull

[Module]
  - body:
    [Suite]
      - items: [
            [DocString]
              - doc:
    A "Hello World" example for the pypa parser


            [Import]
              - names:
                [Alias]
                  - as_name: <NULL>
                  - name:
                    [Name]
                      - context: Load
                      - dotted: False
                      - id: sys

            [Print]
              - destination:
                [Attribute]
                  - attribute:
                    [Name]
                      - context: Load
                      - dotted: False
                      - id: stdout
                  - context: Load
                  - value:
                    [Name]
                      - context: Load
                      - dotted: False
                      - id: sys
              - newline: True
              - values: [
                    [Str]
                      - value: Hello

                    [Str]
                      - value: World!
                    ]
            ]
  - kind: Module

And here is the parse tree produced by Python itself (astdump.py can be found in tools):

[Module]
    - body: [

        [Expr]
            - value:
            [Str]
                - s:
    A "Hello World" example for the pypa parser


        [Import]
            - names: [

                [alias]
                    - asname: None
                    - name: sys
            ]

        [Print]
            - dest:
            [Attribute]
                - attr: stdout
                - ctx: Load
                - value:
                [Name]
                    - ctx: Load
                    - id: sys
            - nl: True
            - values: [

                [Str]
                    - s: Hello

                [Str]
                    - s: World!
            ]
    ]

Error Reporting

The parser also supports SyntaxError and IndentationError reporting.

Let's take a look at the file syntax_error.py, which clearly contains a syntax error:

#! /usr/bin/env python
# -*- coding: utf-8 -*-
"""
    Syntax error example
"""

print x y z

This is the output of the test parser:

$ ./parser-test syntax_error.py
  File "syntax_error.py", line 7
    print x y z
            ^
SyntaxError: Expected new line after statement
-> Reported @pypa/parser/parser.cc:944 in bool pypa::simple_stmt(pypa::{anonymous}::State&, pypa::AstStmt&)

Parsing failed

And this is the output of CPython 2.7:

$ python syntax_error.py
  File "syntax_error.py", line 7
    print x y z
            ^
SyntaxError: invalid syntax

libpypa uses different error messages than Python, in the hope that they make the cause of the error clearer.
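The report layout shown above (file name and line number, the offending source line, a caret under the error column, then the error kind and message) can be sketched in a few lines. This is only an illustration: `ErrorInfo` and `format_error` are hypothetical names invented for this example and are not libpypa's actual API.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical error record; libpypa's real error type may look different.
struct ErrorInfo {
    std::string kind;         // e.g. "SyntaxError" or "IndentationError"
    std::string message;      // human readable description
    std::string file_name;    // name of the parsed file
    std::string source_line;  // text of the offending line, without newline
    std::size_t line;         // 1-based line number
    std::size_t column;       // 0-based column of the error
};

// Builds a Python-style report with a caret under the error column.
std::string format_error(const ErrorInfo& e) {
    // The caret must point into (or just past) the quoted source line.
    assert(e.column <= e.source_line.size());
    std::ostringstream out;
    out << "  File \"" << e.file_name << "\", line " << e.line << "\n"
        << "    " << e.source_line << "\n"
        << "    " << std::string(e.column, ' ') << "^\n"
        << e.kind << ": " << e.message << "\n";
    return out.str();
}
```

Called with the values from the example above (line 7, column 8 counted from zero), this reproduces the first four lines of the parser-test report.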

Requirements

To use libpypa, you need a C++11 compiler. libpypa was developed with g++ 4.8.2 and makes heavy use of C++11 features where appropriate.

libpypa currently does not depend on any libraries other than the C++11 standard library. The one exception is the class FileBuf, which currently uses system libraries but might be changed to use only fopen/fread/fclose.

Structure

libpypa currently consists of 3 major parts:

  1. Lexer
  2. Parser
  3. AST

Lexer

The Lexer portion of the library tokenizes the input and classifies each token by type for the Parser.
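To illustrate what tokenizing and classifying means here, the following is a minimal hand-written tokenizer. It is a sketch under strong simplifying assumptions and not libpypa's Lexer: `TokenKind`, `Token`, and `tokenize` are hypothetical names, and indentation tracking, keywords, escape sequences, and most operators are left out.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token kinds; a real Python lexer needs many more.
enum class TokenKind { Name, Number, String, Operator, NewLine, End };

struct Token {
    TokenKind kind;
    std::string text;
    std::size_t line;    // 1-based
    std::size_t column;  // 0-based
};

std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> tokens;
    std::size_t i = 0, line = 1, line_start = 0;
    auto is_digit = [](char ch) { return std::isdigit(static_cast<unsigned char>(ch)) != 0; };
    auto is_name_char = [](char ch) {
        return std::isalnum(static_cast<unsigned char>(ch)) != 0 || ch == '_';
    };
    while (i < src.size()) {
        const std::size_t start = i;
        const std::size_t column = i - line_start;
        const char c = src[i];
        if (c == '\n') {
            tokens.push_back({TokenKind::NewLine, "\n", line, column});
            ++i; ++line; line_start = i;
        } else if (c == ' ' || c == '\t' || c == '\r') {
            ++i;  // skip horizontal whitespace
        } else if (c == '#') {
            while (i < src.size() && src[i] != '\n') ++i;  // skip comment
        } else if (is_digit(c)) {
            while (i < src.size() && is_digit(src[i])) ++i;
            tokens.push_back({TokenKind::Number, src.substr(start, i - start), line, column});
        } else if (is_name_char(c)) {
            while (i < src.size() && is_name_char(src[i])) ++i;
            tokens.push_back({TokenKind::Name, src.substr(start, i - start), line, column});
        } else if (c == '"' || c == '\'') {
            ++i;  // opening quote
            while (i < src.size() && src[i] != c && src[i] != '\n') ++i;
            std::string body = src.substr(start + 1, i - start - 1);
            if (i < src.size() && src[i] == c) ++i;  // closing quote
            tokens.push_back({TokenKind::String, body, line, column});
        } else {
            // Only ">>" is recognized as a two-character operator in this sketch.
            const std::size_t len = (src.compare(i, 2, ">>") == 0) ? 2 : 1;
            tokens.push_back({TokenKind::Operator, src.substr(i, len), line, column});
            i += len;
        }
        assert(i > start);  // every branch must consume input
    }
    tokens.push_back({TokenKind::End, "", line, i - line_start});
    return tokens;
}
```

Each token carries its line and column so that a later stage can report errors at the right position.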

Parser

The Parser consumes the tokens produced by the Lexer and generates a preliminary AST from them.
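A hand-written parser of this kind is commonly structured as one function per grammar rule, each taking the parser state and an output node and returning whether the rule matched. That shape is visible in the `bool pypa::simple_stmt(State&, AstStmt&)` signature in the error report above. The sketch below applies the same idea to a tiny rule for dotted names such as `sys.stdout`. All names in it (`Node`, `State`, `name`, `attribute`) are hypothetical and simplified, and the token stream is a plain vector of strings rather than real lexer output.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, simplified node: either a Name or an Attribute access.
struct Node {
    std::string kind;             // "Name" or "Attribute"
    std::string id;               // identifier or attribute name
    std::shared_ptr<Node> value;  // object being accessed (Attribute only)
};

// Parser state: the token stream plus the current position.
struct State {
    std::vector<std::string> tokens;
    std::size_t pos;
};

bool is_identifier(const std::string& t) {
    if (t.empty() || std::isdigit(static_cast<unsigned char>(t[0]))) return false;
    for (char c : t) {
        if (!std::isalnum(static_cast<unsigned char>(c)) && c != '_') return false;
    }
    return true;
}

// name: IDENTIFIER
bool name(State& s, std::shared_ptr<Node>& out) {
    if (s.pos >= s.tokens.size() || !is_identifier(s.tokens[s.pos])) return false;
    out = std::make_shared<Node>(Node{"Name", s.tokens[s.pos], nullptr});
    ++s.pos;
    return true;
}

// attribute: name ('.' name)*
bool attribute(State& s, std::shared_ptr<Node>& out) {
    const std::size_t saved = s.pos;  // remembered so a failed match can backtrack
    if (!name(s, out)) return false;
    while (s.pos < s.tokens.size() && s.tokens[s.pos] == ".") {
        ++s.pos;
        std::shared_ptr<Node> attr;
        if (!name(s, attr)) {
            s.pos = saved;  // restore the position on failure
            out.reset();
            return false;
        }
        // Wrap what has been parsed so far as the value of the new Attribute.
        out = std::make_shared<Node>(Node{"Attribute", attr->id, out});
    }
    assert(out != nullptr);
    return true;
}
```

For the tokens `sys`, `.`, `stdout` this yields an Attribute node named `stdout` whose value is the Name node `sys`, which mirrors the nesting in the tree dumps shown earlier.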

AST

The AST contains the definition of all syntax elements in the code. The main parts of the definition are in pypa/ast/ast.hh, which makes heavy use of preprocessor macros to define typedefs, mappings for compile-time type lookups by AstType (an enum class), and an implementation of a switch-based visitor.

The AST types do not implement any methods; they are plain structures holding data. The only exception is the constructor of some of the base types, which sets the type id value and initializes the line and column values.
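The combination described above (one macro list driving an enum class, a compile-time type lookup, and a switch-based visitor over plain data structs) can be sketched as follows. This is an illustration of the general technique and not the contents of pypa/ast/ast.hh: the macro `SKETCH_AST_TYPES`, the lookup template `AstTypeByID`, the function `visit`, and the two node types are assumptions made for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>

// Hypothetical list of node types; a real list would be much longer.
#define SKETCH_AST_TYPES(X) \
    X(Name)                 \
    X(Str)

// The enum class is generated from the list.
enum class AstType {
#define X(T) T,
    SKETCH_AST_TYPES(X)
#undef X
};

// Base type: no methods, only a constructor that sets the type id
// and initializes line and column.
struct Ast {
    explicit Ast(AstType t) : type(t), line(0), column(0) {}
    AstType type;
    std::uint32_t line;
    std::uint32_t column;
};

struct AstName : Ast {
    AstName() : Ast(AstType::Name) {}
    std::string id;
};

struct AstStr : Ast {
    AstStr() : Ast(AstType::Str) {}
    std::string value;
};

// Compile-time lookup from an AstType value to the matching struct.
template <AstType T> struct AstTypeByID;
#define X(T) template <> struct AstTypeByID<AstType::T> { typedef Ast##T Type; };
SKETCH_AST_TYPES(X)
#undef X

static_assert(std::is_same<AstTypeByID<AstType::Name>::Type, AstName>::value,
              "AstType::Name maps to AstName");

// Switch-based visitor: dispatches on the stored type id and passes the
// node to the callable as its concrete type.
template <typename F>
void visit(F&& f, Ast& node) {
    switch (node.type) {
#define X(T) case AstType::T: f(static_cast<Ast##T&>(node)); break;
        SKETCH_AST_TYPES(X)
#undef X
        default: assert(false && "unknown AstType"); break;
    }
}

// Example visitor with one overload per node type.
struct Describe {
    std::string result;
    void operator()(AstName& n) { result = "Name:" + n.id; }
    void operator()(AstStr& s) { result = "Str:" + s.value; }
};
```

Adding a node type then means adding one entry to the list and one struct; the enum, the lookup, and the visitor's switch follow from the macro expansion.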

License

Copyright 2014 Vinzenz Feenstra

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

License for src/double-conversion

Copyright 2006-2011, the V8 project authors. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

    * Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above
      copyright notice, this list of conditions and the following
      disclaimer in the documentation and/or other materials provided
      with the distribution.
    * Neither the name of Google Inc. nor the names of its
      contributors may be used to endorse or promote products derived
      from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.