
# Convert LaTeX Sentence to SymPy Expression

## Author: Ken Sible

## The following module will demonstrate a recursive descent parser for LaTeX.

### NRPy+ Source Code for this module:
1. [latex_parser.py](../edit/latex_parser.py); [\[**tutorial**\]](Tutorial-LaTeX_SymPy_Conversion.ipynb) The latex_parser.py script will convert a LaTeX sentence to a SymPy expression using the following function: parse(sentence).

<a id='toc'></a>

# Table of Contents
$$\label{toc}$$

1. [Step 1](#intro): Introduction: Lexical Analysis and Syntax Analysis
1. [Step 2](#sandbox): Demonstration and Sandbox (LaTeX Parser)
1. [Step 3](#tensor): Tensor Support with Einstein Notation (WIP)
1. [Step 4](#latex_pdf_output): $\LaTeX$ PDF Output

<a id='intro'></a>

# Step 1: Lexical Analysis and Syntax Analysis \[Back to [top](#toc)\]
$$\label{intro}$$

In the following section, we discuss [lexical analysis](https://en.wikipedia.org/wiki/Lexical_analysis) (lexing) and [syntax analysis](https://en.wikipedia.org/wiki/Parsing) (parsing). In the process of lexical analysis, a lexer will tokenize a character string, called a sentence, using substring pattern matching. We implemented a regex-based lexer for NRPy+, which matches each token pattern using a [regular expression](https://en.wikipedia.org/wiki/Regular_expression). In the process of syntax analysis, a parser will receive a token iterator from the lexer and build a parse tree containing all syntactic information of the language, as specified by a [formal grammar](https://en.wikipedia.org/wiki/Formal_grammar). We implemented a [recursive descent parser](https://en.wikipedia.org/wiki/Recursive_descent_parser) for NRPy+, which will build a parse tree in [preorder](https://en.wikipedia.org/wiki/Tree_traversal#Pre-order_(NLR)), starting from the root [nonterminal](https://en.wikipedia.org/wiki/Terminal_and_nonterminal_symbols), using a [right recursive](https://en.wikipedia.org/wiki/Left_recursion) grammar. The following right recursive, [context-free grammar](https://en.wikipedia.org/wiki/Context-free_grammar) was written for parsing [LaTeX](https://en.wikipedia.org/wiki/LaTeX), adhering to the canonical (extended) [BNF](https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form) notation used for describing a context-free grammar:
```
<ROOT>       -> <EXPRESSION> | ( <CONFIG> | <ASSIGNMENT> ) { <LINE_BREAK> ( <CONFIG> | <ASSIGNMENT> ) }*
<CONFIG>     -> '%' <ARRAY> '[' <INTEGER> ']' [ ':' <SYMMETRY> ] { ',' <ARRAY> '[' <INTEGER> ']' [ ':' <SYMMETRY> ] }*
<ASSIGNMENT> -> <VARIABLE> '=' <EXPRESSION>
<EXPRESSION> -> <TERM> { ( '+' | '-' ) <TERM> }*
<TERM>       -> <FACTOR> { [ '/' ] <FACTOR> }*
<FACTOR>     -> <BASE> { '^' <EXPONENT> }*
<BASE>       -> [ '-' ] ( <ATOM> | '(' <EXPRESSION> ')' | '[' <EXPRESSION> ']' )
<EXPONENT>   -> <BASE> | '{' <BASE> '}'
<ATOM>       -> <VARIABLE> | <NUMBER> | <COMMAND>
<VARIABLE>   -> <ARRAY> | <SYMBOL> [ '_' ( <SYMBOL> | <INTEGER> ) ]
<NUMBER>     -> <RATIONAL> | <DECIMAL> | <INTEGER>
<COMMAND>    -> <SQRT> | <FRAC>
<SQRT>       -> '\\sqrt' [ '[' <INTEGER> ']' ] '{' <EXPRESSION> '}'
<FRAC>       -> '\\frac' '{' <EXPRESSION> '}' '{' <EXPRESSION> '}'
<ARRAY>      -> ( <SYMBOL> | <TENSOR> )
                    [ '_' ( <SYMBOL> | '{' { <SYMBOL> }+ '}' ) [ '^' ( <SYMBOL> | '{' { <SYMBOL> }+ '}' ) ]
                    | '^' ( <SYMBOL> | '{' { <SYMBOL> }+ '}' ) [ '_' ( <SYMBOL> | '{' { <SYMBOL> }+ '}' ) ] ]
```

<small>**Source**: Robert W. Sebesta. Concepts of Programming Languages. Pearson Education Limited, 2016.</small>
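
To make the lexing step concrete, here is a minimal sketch of a regex-based lexer. It is not the `Lexer` class from latex_parser.py: the `TOKEN_PATTERNS` table, the `tokenize` function, and the individual patterns are assumptions chosen for illustration. They cover only the handful of tokens needed for one sample sentence.

```python
import re

# Hypothetical token table (illustrative only, not the NRPy+ grammar dictionary).
# Order matters: RATIONAL must be tried before INTEGER so that '2/3' is one token.
TOKEN_PATTERNS = [
    ('RATIONAL',    r'[0-9]+/[1-9][0-9]*'),
    ('INTEGER',     r'[0-9]+'),
    ('SQRT_CMD',    r'\\sqrt'),
    ('SYMBOL',      r'[a-zA-Z]+'),
    ('PLUS',        r'\+'),
    ('CARET',       r'\^'),
    ('LEFT_PAREN',  r'\('),
    ('RIGHT_PAREN', r'\)'),
    ('LEFT_BRACE',  r'\{'),
    ('RIGHT_BRACE', r'\}'),
    ('WHITESPACE',  r'\s+'),
]
# Combine every pattern into one alternation of named groups.
TOKEN_REGEX = re.compile('|'.join('(?P<%s>%s)' % pair for pair in TOKEN_PATTERNS))

def tokenize(sentence):
    """Yield (token, lexeme) pairs, skipping whitespace."""
    position = 0
    while position < len(sentence):
        match = TOKEN_REGEX.match(sentence, position)
        if match is None:
            raise ValueError('unexpected %r at position %d'
                             % (sentence[position], position))
        position = match.end()
        if match.lastgroup != 'WHITESPACE':
            yield match.lastgroup, match.group()

tokens = list(tokenize(r'\sqrt{5}(x + 2/3)^2'))
print(', '.join(name for name, _ in tokens))
```

Because Python tries the alternatives of a regular expression from left to right, the order of the table entries decides which token wins when two patterns could match at the same position.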

**TODO:** DEMONSTRATE PARSE TREE FOR A SIMPLE EXPRESSION

In [1]:
from latex_parser import * # Import NRPy+ module for lexing and parsing LaTeX
from sympy import srepr    # Import SymPy function for expression tree representation

In [2]:
lexer = Lexer(); lexer.initialize(r'\sqrt{5}(x + 2/3)^2')
print(', '.join(token for token in lexer.tokenize()))

SQRT_CMD, LEFT_BRACE, INTEGER, RIGHT_BRACE, LEFT_PAREN, SYMBOL, PLUS, RATIONAL, RIGHT_PAREN, CARET, INTEGER


In [3]:
expr = parse(r'\sqrt{5}(x + 2/3)^2', expression=True)
print(expr, ':', srepr(expr))

sqrt(5)*(x + 2/3)**2 : Mul(Pow(Integer(5), Rational(1, 2)), Pow(Add(Symbol('x'), Rational(2, 3)), Integer(2)))
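
The recursive descent idea can be sketched with one method per nonterminal. The toy parser below handles only a small subset of the grammar (`<EXPRESSION>`, `<TERM>`, `<FACTOR>`, `<BASE>` with integers, single-letter symbols, and parentheses). It returns nested tuples instead of SymPy expressions so that it runs with the standard library alone. All names (`ToyParser`, `parse_to_tree`, the token kinds) are hypothetical and do not mirror the internals of latex_parser.py.

```python
import re

TOKEN_REGEX = re.compile(r'(?P<INTEGER>[0-9]+)|(?P<SYMBOL>[a-zA-Z])|(?P<OP>[-+/^()])')

def tokenize(sentence):
    """Return a list of (kind, lexeme) pairs ending with an ('EOF', '') sentinel."""
    tokens, position = [], 0
    while position < len(sentence):
        if sentence[position].isspace():
            position += 1
            continue
        match = TOKEN_REGEX.match(sentence, position)
        if match is None:
            raise SyntaxError('unexpected %r at position %d'
                              % (sentence[position], position))
        kind, lexeme = match.lastgroup, match.group()
        # Operators and parentheses use their own character as the token kind.
        tokens.append((lexeme if kind == 'OP' else kind, lexeme))
        position = match.end()
    return tokens + [('EOF', '')]

class ToyParser:
    """Recursive descent parser for a subset of the grammar (illustrative only)."""

    def __init__(self, sentence):
        self.tokens, self.index = tokenize(sentence), 0

    def peek(self):
        return self.tokens[self.index][0]

    def accept(self, kind):
        if self.peek() == kind:
            self.index += 1
            return True
        return False

    def expect(self, kind):
        if not self.accept(kind):
            raise SyntaxError('expected %s, found %s' % (kind, self.peek()))

    def lexeme(self):
        # Lexeme of the most recently accepted token.
        return self.tokens[self.index - 1][1]

    def expression(self):  # <EXPRESSION> -> <TERM> { ( '+' | '-' ) <TERM> }*
        node = self.term()
        while self.peek() in ('+', '-'):
            operator = self.peek()
            self.index += 1
            node = (operator, node, self.term())
        return node

    def term(self):        # <TERM> -> <FACTOR> { [ '/' ] <FACTOR> }*
        node = self.factor()
        while self.peek() in ('/', 'INTEGER', 'SYMBOL', '('):
            operator = '/' if self.accept('/') else '*'  # juxtaposition multiplies
            node = (operator, node, self.factor())
        return node

    def factor(self):      # <FACTOR> -> <BASE> { '^' <BASE> }*
        node = self.base()
        while self.accept('^'):
            node = ('^', node, self.base())
        return node

    def base(self):        # <BASE> -> [ '-' ] ( INTEGER | SYMBOL | '(' <EXPRESSION> ')' )
        negate = self.accept('-')
        if self.accept('INTEGER'):
            node = int(self.lexeme())
        elif self.accept('SYMBOL'):
            node = self.lexeme()
        else:
            self.expect('(')
            node = self.expression()
            self.expect(')')
        return ('neg', node) if negate else node

def parse_to_tree(sentence):
    parser = ToyParser(sentence)
    tree = parser.expression()
    parser.expect('EOF')
    return tree

print(parse_to_tree('2(x + 1)^2 - y/3'))
# ('-', ('*', 2, ('^', ('+', 'x', 1), 2)), ('/', 'y', 3))
```

Each method consumes exactly the tokens described by its production and delegates to the method for the next nonterminal, which is why operator precedence follows directly from the nesting of the grammar rules.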


<a id='sandbox'></a>

# Step 2: Demonstration and Sandbox (LaTeX Parser) \[Back to [top](#toc)\]
$$\label{sandbox}$$

We implemented a wrapper function for the parse() method that will accept a LaTeX sentence and return a SymPy expression. Furthermore, the entire parsing module was designed for extensibility. We apply the following procedure for extending parser functionality to include an unsupported LaTeX command:

1. append that command to the grammar dictionary in the Lexer class with the mapping regex:token,
1. write a grammar abstraction (similar to a regular expression) for that command,
1. add the associated nonterminal (the command name) to the command abstraction in the Parser class, and
1. implement the straightforward (private) method for parsing the grammar abstraction.

We shall demonstrate the extension procedure using the `\sqrt` LaTeX command.

```<SQRT> -> '\\sqrt' [ '[' <INTEGER> ']' ] '{' <EXPRESSION> '}'```
```
def _sqrt(self):
    if self.accept('LEFT_BRACKET'):
        integer = self.lexer.lexeme
        self.expect('INTEGER')
        root = Rational(1, integer)
        self.expect('RIGHT_BRACKET')
    else: root = Rational(1, 2)
    self.expect('LEFT_BRACE')
    expr = self.__expr()
    self.expect('RIGHT_BRACE')
    return Pow(expr, root)
```

In [4]:
print(parse(r'\sqrt[3]{\alpha_0}', expression=True))

alpha_0**(1/3)


In addition to expression parsing, we included support for equation parsing, which will produce a dictionary mapping LHS $\mapsto$ RHS, where LHS must be a symbol, and insert that mapping into the global namespace of the previous stack frame, as demonstrated below.

In [5]:
parse(r'x = n\sqrt{2}^n'); print(x)

2**(n/2)*n
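
One way to insert names into the caller's global namespace is through the frame objects exposed by the standard `inspect` module. The sketch below only illustrates that mechanism. The helper `inject_into_caller` and the variable `x_demo` are hypothetical, and this is not necessarily how latex_parser.py does it.

```python
import inspect

def inject_into_caller(namespace):
    """Copy each name -> value pair into the globals of the calling frame."""
    caller = inspect.currentframe().f_back  # the previous stack frame
    caller.f_globals.update(namespace)
    return sorted(namespace)

# After the call, x_demo exists as a global even though it was never assigned here.
names = inject_into_caller({'x_demo': 42})
print(names, x_demo)
```

This convenience comes at a cost: a call can silently overwrite an existing global of the same name, so the returned list of names is worth checking.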


We implemented robust error messaging using the custom `ParseError` exception, which should identify invalid syntax inside of a LaTeX sentence in as much detail as possible. The following are runnable examples of possible error messages (simply uncomment and run the cell):

In [6]:
# parse(r'\sqrt[*]{2}')
    # ParseError: \sqrt[*]{2}
    #                   ^
    # unexpected '*' at position 6

# parse(r'\sqrt[0.5]{2}')
    # ParseError: \sqrt[0.5]{2}
    #                   ^
    # expected token INTEGER at position 6

# parse(r'\command{}')
    # ParseError: \command{}
    #             ^
    # unsupported command '\command' at position 0
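
The caret layout of these messages can be reproduced with a small exception class. The following is a sketch under assumptions, not the `ParseError` implementation itself: the class name `CaretError` and its constructor signature are hypothetical.

```python
class CaretError(Exception):
    """Exception whose message points at the offending position in a sentence."""

    def __init__(self, message, sentence, position):
        pointer = ' ' * position + '^'  # align the caret under sentence[position]
        super().__init__('%s\n%s\n%s at position %d'
                         % (sentence, pointer, message, position))
        self.sentence, self.position = sentence, position

try:
    raise CaretError("unexpected '*'", r'\sqrt[*]{2}', 6)
except CaretError as error:
    report = str(error)
    print(report)
```

Positions are zero-based, so position 6 in `\sqrt[*]{2}` is the `*` that follows the six characters of `\sqrt[`.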

In the sandbox code cell below, you can experiment with the LaTeX parser using the wrapper function parse(sentence), where sentence must be a [raw string](https://docs.python.org/3/reference/lexical_analysis.html) to interpret a backslash as a literal character rather than an [escape sequence](https://en.wikipedia.org/wiki/Escape_sequence).

In [7]:
# Write Sandbox Code Here

<a id='tensor'></a>

# Step 3: Tensor Support with Einstein Notation (WIP) \[Back to [top](#toc)\]
$$\label{tensor}$$

In the following section, we demonstrate the current parser support for tensor notation using the Einstein summation convention. The first example will parse a simple equation for raising an index using the inverse metric tensor, while assuming a 3-dimensional space (i.e. `i` and `j` range over `0, 1, 2`) and $g^{ij}$ symmetric:
$$v^i=g^{ij}v_j.$$
The second example will parse an equation for a simple tensor contraction, while assuming a 4-dimensional space (i.e. `\mu` ranges over `0, 1, 2, 3`) and $h^\mu{}_\nu$ not symmetric:
$$h=h^\mu{}_\mu.$$

**TODO:** REMOVE THE FOLLOWING PARAGRAPH AND REPLACE WITH CONFIGURATION PARAGRAPH

We should mention that a future build of the parser would require a configuration file be specified before parsing a tensorial equation. The process demonstrated below for declaring a tensor, adding that tensor to a namespace, and passing that namespace to the parser would be eliminated.

**Configuration Syntax**: `% <TENSOR> [<DIMENSION>]: <SYMMETRY>, <TENSOR> [<DIMENSION>]: <SYMMETRY>, ... ;`

In [8]:
parse(r"""
    % g^{ij} [3]: sym01, v_j [3];
    v^i = g^{ij}v_j
""")

['gUU', 'vD', 'vU']

In [9]:
print('gUU = %s\nvU  = %s\nvD  = %s' % (gUU, vU, vD))

gUU = [[gUU00, gUU01, gUU02], [gUU01, gUU11, gUU12], [gUU02, gUU12, gUU22]]
vU  = [gUU00*vD0 + gUU01*vD1 + gUU02*vD2, gUU01*vD0 + gUU11*vD1 + gUU12*vD2, gUU02*vD0 + gUU12*vD1 + gUU22*vD2]
vD  = [vD0, vD1, vD2]


In [10]:
parse(r"""
    % h^\mu_\mu [4]: nosym;
    h = h^\mu{}_\mu
""")

['h', 'hUD']

In [11]:
print('h   = %s\nhUD = %s' % (h, hUD))

h   = hUD00 + hUD11 + hUD22 + hUD33
hUD = [[hUD00, hUD01, hUD02, hUD03], [hUD10, hUD11, hUD12, hUD13], [hUD20, hUD21, hUD22, hUD23], [hUD30, hUD31, hUD32, hUD33]]
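
The index bookkeeping behind both examples can be sketched with plain strings, independent of the parser. The helpers `symmetric_rank2` and `rank2` are hypothetical. The component naming (`gUU01`, `hUD12`, ...) copies the printed output above, and the symmetric case assumes that `sym01` means symmetry under exchange of the two indices.

```python
def rank2(name, dimension):
    """Component names for a rank-2 array with no symmetry."""
    return [['%s%d%d' % (name, i, j) for j in range(dimension)]
            for i in range(dimension)]

def symmetric_rank2(name, dimension):
    """Component names for a rank-2 array symmetric in its two indices."""
    return [['%s%d%d' % (name, min(i, j), max(i, j)) for j in range(dimension)]
            for i in range(dimension)]

# v^i = g^{ij} v_j: the repeated index j is summed for each free index i.
gUU = symmetric_rank2('gUU', 3)
vD = ['vD%d' % j for j in range(3)]
vU = [' + '.join('%s*%s' % (gUU[i][j], vD[j]) for j in range(3))
      for i in range(3)]

# h = h^mu_mu: the repeated index mu is summed, leaving no free index.
hUD = rank2('hUD', 4)
h = ' + '.join(hUD[mu][mu] for mu in range(4))

print(vU[1])  # gUU01*vD0 + gUU11*vD1 + gUU12*vD2
print(h)      # hUD00 + hUD11 + hUD22 + hUD33
```

A repeated index turns into a sum and a free index turns into a list dimension, which is all the Einstein summation convention requires for these two equations.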


**TODO**: ADD SECTION ABOUT ERROR HANDLING FOR TENSOR INDEXING

<a id='latex_pdf_output'></a>

# Step 4: Output this notebook to $\LaTeX$-formatted PDF file \[Back to [top](#toc)\]
$$\label{latex_pdf_output}$$

The following code cell converts this Jupyter notebook into a proper, clickable $\LaTeX$-formatted PDF file. After the cell is successfully run, the generated PDF may be found in the root NRPy+ tutorial directory, with filename
[Tutorial-LaTeX_SymPy_Conversion.pdf](Tutorial-LaTeX_SymPy_Conversion.pdf) (Note that clicking on this link may not work; you may need to open the PDF file through another means.)

In [12]:
import cmdline_helper as cmd    # NRPy+: Multi-platform Python command-line interface
cmd.output_Jupyter_notebook_to_LaTeXed_PDF("Tutorial-LaTeX_SymPy_Conversion")

Created Tutorial-LaTeX_SymPy_Conversion.tex, and compiled LaTeX file to PDF
    file Tutorial-LaTeX_SymPy_Conversion.pdf
