# Convert LaTeX Sentence to SymPy Expression

## Author: Ken Sible

## The following module will demonstrate a recursive descent parser for LaTeX.

### NRPy+ Source Code for this module:
1. [latex_parser.py](../edit/latex_parser.py); [\[**tutorial**\]](Tutorial-LaTeX_SymPy_Conversion.ipynb) The `latex_parser.py` script converts a LaTeX sentence to a SymPy expression using the function `parse(sentence)`.

<a id='toc'></a>

# Table of Contents
$$\label{toc}$$

1. [Part 1](#lexparse): Introduction: Lexical Analysis and Syntax Analysis
1. [Part 2](#sandbox): Demonstration and Sandbox (LaTeX Parser)
1. [Part 3](#tensor_support): (Preliminary) Tensor Support
1. [Part 4](#latex_pdf_output): $\LaTeX$ PDF Output

<a id='lexparse'></a>

# Part 1: Lexical Analysis and Syntax Analysis \[Back to [top](#toc)\]
$$\label{lexparse}$$

In the following section, we discuss [lexical analysis](https://en.wikipedia.org/wiki/Lexical_analysis) (lexing) and [syntax analysis](https://en.wikipedia.org/wiki/Parsing) (parsing). During lexical analysis, a lexer tokenizes a character string, called a sentence, using substring pattern matching. We implemented a regex-based lexer for NRPy+, which uses one [regular expression](https://en.wikipedia.org/wiki/Regular_expression) per token pattern. During syntax analysis, a parser receives a token iterator from the lexer and builds a parse tree containing all syntactic information of the language, as specified by a [formal grammar](https://en.wikipedia.org/wiki/Formal_grammar). We implemented a [recursive descent parser](https://en.wikipedia.org/wiki/Recursive_descent_parser) for NRPy+, which builds a parse tree in [preorder](https://en.wikipedia.org/wiki/Tree_traversal#Pre-order_(NLR)), starting from the root [nonterminal](https://en.wikipedia.org/wiki/Terminal_and_nonterminal_symbols), using a [right recursive](https://en.wikipedia.org/wiki/Left_recursion) grammar. The following right recursive, [context-free grammar](https://en.wikipedia.org/wiki/Context-free_grammar) was written for parsing [LaTeX](https://en.wikipedia.org/wiki/LaTeX), adhering to the canonical (extended) [BNF](https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form) notation used for describing a context-free grammar:
```
<ROOT>          -> <VARIABLE> = <EXPR> | <EXPR>
<EXPR>          -> [ - ] <TERM> { ( + | - ) <TERM> }
<TERM>          -> <FACTOR> { [ / ] <FACTOR> }
<FACTOR>        -> <SUBEXPR> { ^( <SUBEXPR> | {<EXPR>} ) }
<SUBEXPR>       -> <OPERAND> | (<EXPR>) | [<EXPR>]
<OPERAND>       -> <VARIABLE> | <NUMBER> | <COMMAND>
<VARIABLE>      -> <ARRAY> | <SYMBOL> [ _( <SYMBOL> | <INTEGER> ) ]
<NUMBER>        -> <RATIONAL> | <DECIMAL> | <INTEGER>
<COMMAND>       -> <SQRT> | <FRAC>
<SQRT>          -> \ sqrt [ [<INTEGER>] ] {<EXPR>}
<FRAC>          -> \ frac {<EXPR>} {<EXPR>}
<ARRAY>         -> <TENSOR> ( _( <SYMBOL> | {{ <SYMBOL> }} ) [ ^( <SYMBOL> | {{ <SYMBOL> }} ) ]
                    | ^( <SYMBOL> | {{ <SYMBOL> }} ) [ _( <SYMBOL> | {{ <SYMBOL> }} ) ] )
```

<small>**Source**: Robert W. Sebesta. Concepts of Programming Languages. Pearson Education Limited, 2016.</small>

In [1]:
from latex_parser import * # Import NRPy+ module for lexing and parsing LaTeX
from sympy import srepr    # Import SymPy function for expression tree representation

In [2]:
lexer = Lexer(); lexer.initialize(r'\sqrt{5}(x + 2/3)^2')
print(', '.join(token for token in lexer.tokenize()))

SQRT_CMD, LEFT_BRACE, INTEGER, RIGHT_BRACE, LEFT_PAREN, SYMBOL, PLUS, RATIONAL, RIGHT_PAREN, CARET, INTEGER
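
For readers without NRPy+ at hand, the token stream above can be reproduced with a small standalone sketch of a regex-based lexer. This is not the `Lexer` class from `latex_parser.py`: the token table `TOKEN_SPEC`, the function `tokenize`, and the exact patterns are assumptions chosen for illustration, with token names borrowed from the output above.

```python
import re

# Hypothetical token table; order matters (RATIONAL must be tried before INTEGER).
TOKEN_SPEC = [
    ('RATIONAL',    r'[0-9]+/[1-9][0-9]*'),
    ('DECIMAL',     r'[0-9]+\.[0-9]+'),
    ('INTEGER',     r'[0-9]+'),
    ('SQRT_CMD',    r'\\sqrt'),
    ('FRAC_CMD',    r'\\frac'),
    ('SYMBOL',      r'[a-zA-Z]+'),
    ('PLUS',        r'\+'),
    ('MINUS',       r'-'),
    ('DIVIDE',      r'/'),
    ('CARET',       r'\^'),
    ('LEFT_PAREN',  r'\('),
    ('RIGHT_PAREN', r'\)'),
    ('LEFT_BRACE',  r'\{'),
    ('RIGHT_BRACE', r'\}'),
    ('WHITESPACE',  r'\s+'),
]
# One master pattern with a named group per token type.
MASTER = re.compile('|'.join('(?P<%s>%s)' % pair for pair in TOKEN_SPEC))

def tokenize(sentence):
    """Yield (token_name, lexeme) pairs, skipping whitespace."""
    pos = 0
    while pos < len(sentence):
        match = MASTER.match(sentence, pos)
        if match is None:
            raise ValueError('unexpected %r at position %d' % (sentence[pos], pos))
        pos = match.end()
        if match.lastgroup != 'WHITESPACE':
            yield match.lastgroup, match.group()

print(', '.join(name for name, _ in tokenize(r'\sqrt{5}(x + 2/3)^2')))
```

Combining the patterns into one alternation with named groups lets `match.lastgroup` report which token type matched, at the cost of making the order of the table significant.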


In [3]:
expr = parse(r'\sqrt{5}(x + 2/3)^2')
print(expr, ':', srepr(expr))

sqrt(5)*(x + 2/3)**2 : Mul(Pow(Integer(5), Rational(1, 2)), Pow(Add(Symbol('x'), Rational(2, 3)), Integer(2)))
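
To make the correspondence between grammar productions and parser methods concrete, here is a minimal sketch of recursive descent for the `<EXPR>`, `<TERM>`, `<FACTOR>`, and `<SUBEXPR>` productions only. It is an illustration under simplifying assumptions, not the NRPy+ parser: operands are restricted to integers, the result is an exact `fractions.Fraction` rather than a SymPy expression, and the names `MiniParser` and `evaluate` are hypothetical.

```python
import re
from fractions import Fraction

class MiniParser:
    """Toy recursive descent parser: one method per grammar production."""

    def __init__(self, sentence):
        # Crude lexer: integers and single-character operators (whitespace is dropped).
        self.tokens = re.findall(r'[0-9]+|[-+/^()]', sentence)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def accept(self, token):
        if self.peek() == token:
            self.pos += 1
            return True
        return False

    def expr(self):
        # <EXPR> -> [ - ] <TERM> { ( + | - ) <TERM> }
        sign = -1 if self.accept('-') else 1
        value = sign * self.term()
        while self.peek() in ('+', '-'):
            if self.accept('+'):
                value += self.term()
            else:
                self.accept('-')
                value -= self.term()
        return value

    def term(self):
        # <TERM> -> <FACTOR> { [ / ] <FACTOR> }  (no '/' means implicit multiplication)
        value = self.factor()
        while self.peek() not in (None, '+', '-', ')'):
            if self.accept('/'):
                value /= self.factor()
            else:
                value *= self.factor()
        return value

    def factor(self):
        # <FACTOR> -> <SUBEXPR> { ^ <SUBEXPR> }
        value = self.subexpr()
        while self.accept('^'):
            value = value ** self.subexpr()
        return value

    def subexpr(self):
        # <SUBEXPR> -> <INTEGER> | ( <EXPR> )
        if self.accept('('):
            value = self.expr()
            if not self.accept(')'):
                raise SyntaxError("expected ')'")
            return value
        token = self.peek()
        if token is None or not token.isdigit():
            raise SyntaxError('unexpected %r' % token)
        self.pos += 1
        return Fraction(int(token))

def evaluate(sentence):
    """Parse the whole sentence and reject trailing tokens."""
    parser = MiniParser(sentence)
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError('unexpected %r' % parser.peek())
    return value

print(evaluate('2(1 + 2/3)^2'))
```

Each method consumes exactly the tokens of its own production and delegates to the method for the next nonterminal, which is why the call structure mirrors the grammar.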


<a id='sandbox'></a>

# Part 2: Demonstration and Sandbox (LaTeX Parser) \[Back to [top](#toc)\]
$$\label{sandbox}$$

We implemented a wrapper function for the `parse()` method that accepts a LaTeX sentence and returns a SymPy expression. Furthermore, the entire parsing module was designed for extensibility. To extend the parser with an unsupported LaTeX command: (1) append that command to the grammar dictionary in the `Lexer` class with the mapping regex:token, (2) write a grammar abstraction (similar to a regular expression) for that command, (3) add the associated nonterminal (the command name) to the command abstraction in the `Parser` class, and (4) implement the straightforward (private) method for parsing the grammar abstraction. We demonstrate the extension procedure using the `\sqrt` LaTeX command.

```<SQRT> -> \ sqrt [ [<INTEGER>] ] {<EXPR>}```
```
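def __sqrt(self):
    # Optional root index: \sqrt[n]{...}
    if self.__accept('LEFT_BRACKET'):
        root = self.lexer.word  # lexeme of the current token (checked to be an INTEGER next)
        self.__expect('INTEGER')
        self.__expect('RIGHT_BRACKET')
    else:
        root = 2  # no index given: default to a square root
    # Mandatory radicand: {<EXPR>}
    self.__expect('LEFT_BRACE')
    expr = self.__expression()
    self.__expect('RIGHT_BRACE')
    return 'Pow(%s, Rational(1, %s))' % (expr, root)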
```
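
The method above relies on helpers (`__accept`, `__expect`, `lexer.word`) that are not shown here. The following standalone sketch walks through the extension procedure end to end under stated assumptions: the dictionary `GRAMMAR`, the class `SketchParser`, and its methods are hypothetical stand-ins for the NRPy+ internals, and the radicand is restricted to a single symbol or integer instead of a full expression.

```python
import re

# Step 1 (assumed layout): map each regex to a token name.
GRAMMAR = {
    r'\\sqrt':    'SQRT_CMD',
    r'\[':        'LEFT_BRACKET',
    r'\]':        'RIGHT_BRACKET',
    r'\{':        'LEFT_BRACE',
    r'\}':        'RIGHT_BRACE',
    r'[0-9]+':    'INTEGER',
    r'[a-zA-Z]+': 'SYMBOL',
}
PATTERN = re.compile('|'.join('(?P<%s>%s)' % (name, regex) for regex, name in GRAMMAR.items()))

class SketchParser:
    def __init__(self, sentence):
        # finditer silently skips unmatched characters; a real lexer would report them.
        self.tokens = [(m.lastgroup, m.group()) for m in PATTERN.finditer(sentence)]
        self.index = 0

    @property
    def token(self):  # name of the current token, or None at end of input
        return self.tokens[self.index][0] if self.index < len(self.tokens) else None

    @property
    def word(self):   # lexeme of the current token, or None at end of input
        return self.tokens[self.index][1] if self.index < len(self.tokens) else None

    def accept(self, name):
        """Consume the current token if it has the given name."""
        if self.token == name:
            self.index += 1
            return True
        return False

    def expect(self, name):
        """Like accept, but a mismatch is a syntax error."""
        if not self.accept(name):
            raise SyntaxError('expected token %s' % name)

    def operand(self):
        word = self.word
        if self.accept('SYMBOL'):
            return "Symbol('%s')" % word
        if self.accept('INTEGER'):
            return 'Integer(%s)' % word
        raise SyntaxError('expected operand')

    def command(self):
        # Step 3: route the command token to its parsing method.
        if self.accept('SQRT_CMD'):
            return self.sqrt()
        raise SyntaxError('unsupported command')

    def sqrt(self):
        # Steps 2 and 4: <SQRT> -> \ sqrt [ [<INTEGER>] ] {<OPERAND>}
        if self.accept('LEFT_BRACKET'):
            root = self.word
            self.expect('INTEGER')
            self.expect('RIGHT_BRACKET')
        else:
            root = 2
        self.expect('LEFT_BRACE')
        expr = self.operand()
        self.expect('RIGHT_BRACE')
        return 'Pow(%s, Rational(1, %s))' % (expr, root)

print(SketchParser(r'\sqrt[3]{x}').command())
```

Reading the lexeme before calling `expect('INTEGER')` mirrors the original method: the value is captured first and then validated, so a non-integer index raises before the value is used.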

In [4]:
print(parse(r'\sqrt[3]{\alpha_0}'))

alpha_0**(1/3)


In addition to expression parsing, we included support for equation parsing, which returns a dictionary mapping LHS $\mapsto$ RHS, where the LHS must be a symbol.

In [5]:
print(parse(r'x = n\sqrt{2}^n'))

{x: 2**(n/2)*n}


In [6]:
eqn_list = [r'x_1 = x + 1', r'x_2 = x + 2', r'x_3 = x + 3']

var_map  = {} # Start empty; every equation (including the first) is merged in the loop below
for eqn in eqn_list:
    var_map.update(parse(eqn))
print(var_map)

{x_1: x + 1, x_2: x + 2, x_3: x + 3}


We implemented detailed error messaging using the custom `ParseError` exception, which is intended to pinpoint invalid syntax inside a LaTeX sentence as precisely as possible. The following are runnable examples of possible error messages (simply uncomment and run the cell):

In [7]:
# parse(r'\sqrt[*]{2}')
    # ParseError: \sqrt[*]{2}
    #                   ^
    # unexpected '*' at position 6

# parse(r'\sqrt[0.5]{2}')
    # ParseError: \sqrt[0.5]{2}
    #                   ^
    # expected token INTEGER at position 6

# parse(r'\command{}')
    # ParseError: \command{}
    #             ^
    # unsupported command '\command' at position 0
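
The caret layout in these messages is easy to reproduce. The sketch below is only a guess at the formatting logic, not the `ParseError` implementation: the function `format_parse_error` and its `prefix` parameter are hypothetical, and the prefix length accounts for the exception name that Python prints before the message.

```python
def format_parse_error(sentence, position, message, prefix='ParseError: '):
    """Return a three-line report: sentence, caret under the offending character, message."""
    # Shift the caret by the prefix length so it lines up under sentence[position].
    pointer = ' ' * (len(prefix) + position) + '^'
    return '%s%s\n%s\n%s at position %d' % (prefix, sentence, pointer, message, position)

print(format_parse_error(r'\sqrt[*]{2}', 6, "unexpected '*'"))
```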

In the sandbox code cell below, you can experiment with the LaTeX parser using the wrapper function parse(sentence), where sentence must be a [raw string](https://docs.python.org/3/reference/lexical_analysis.html) to interpret a backslash as a literal character rather than an [escape sequence](https://en.wikipedia.org/wiki/Escape_sequence).
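
A quick way to see why the raw string matters: in an ordinary string literal, `\f` is the escape sequence for a form feed, so `\frac` silently loses its backslash. The variable names below are illustrative only.

```python
raw = r'\frac{1}{2}'    # raw string: the backslash is kept as a literal character
cooked = '\frac{1}{2}'  # ordinary string: '\f' collapses into a single form feed

print(len(raw), len(cooked))  # the ordinary string is one character shorter
print(raw.startswith('\\'), cooked.startswith('\x0c'))
```

Only the raw version still begins with a backslash, so only it can be recognized as a LaTeX command.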

In [8]:
# Write Sandbox Code Here

<a id='tensor_support'></a>


# Part 3: (Preliminary) Tensor Support \[Back to [top](#toc)\]
$$\label{tensor_support}$$

Here we demonstrate basic tensor support within the parser, including implied summation.

Let's first consider
$$
v^i = g^{ij}v_j,
$$
assuming that we are in 3D (`DIM=3`), and that $g^{ij}$ is symmetric in its indices.

In [9]:
# Note that the following section will be handled with a configuration
#  header in commented-out LaTeX prior to the equation. Basically, if the
#  user inputs the tensorial expression without a configuration, a
#  reasonable default configuration would be implemented. We had something
#  in mind like:
# % DIM=3
# % rank1D: v
# % rank1U: v
# % rank2DD,sym01: g
import indexedexp as ixp
DIM=3 # Currently hardcoded, easy fix
vD = ixp.declarerank1('vD', DIM=DIM)
gUU = ixp.declarerank2('gUU', 'sym01', DIM=DIM)
namespace = {'vD': vD, 'gUU': gUU}
names = 'g v'
# The below should be all the user sees/needs other than the auto-generated/user-modified configuration.
from latex_parser import parse
parse(r'v^i = g^{ij}v_j', names, namespace)

{'vU': [gUU00*vD0 + gUU01*vD1 + gUU02*vD2,
  gUU01*vD0 + gUU11*vD1 + gUU12*vD2,
  gUU02*vD0 + gUU12*vD1 + gUU22*vD2]}
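
The output above is the implied sum over the repeated index $j$, written out for each free index $i$. As a plain-Python stand-in (strings instead of SymPy symbols, and not how the parser or `indexedexp` is implemented), the same expansion can be sketched as follows; the helper `sym_component` is hypothetical and encodes the `sym01` symmetry by always writing the smaller index first.

```python
DIM = 3

def sym_component(name, i, j):
    """Name of a symmetric rank-2 component, e.g. ('gUU', 1, 0) -> 'gUU01'."""
    i, j = sorted((i, j))
    return '%s%d%d' % (name, i, j)

# v^i = g^{ij} v_j: one entry per free index i, summed over the repeated index j.
vU = [' + '.join('%s*vD%d' % (sym_component('gUU', i, j), j) for j in range(DIM))
      for i in range(DIM)]

for component in vU:
    print(component)
```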

Let's try a tensor contraction:
$$
W = h^{jk} R_{jk},
$$
where both $h^{jk}$ and $R_{jk}$ are symmetric in their indices.

In [10]:
import indexedexp as ixp
import sympy as sp
DIM=3 # Currently hardcoded, easy fix
W = sp.symbols('W',real=True)
hUU = ixp.declarerank2('hUU', 'sym01', DIM=DIM)
RDD = ixp.declarerank2('RDD', 'sym01', DIM=DIM)
namespace = {'hUU': hUU, 'RDD': RDD}
names = 'h R'
# The below should be all the user sees/needs other than the auto-generated/user-modified configuration.
from latex_parser import parse
print("Known bug in parser: W does not appear:")
parse(r'W = h^{jk} R_{jk}', names, namespace)

Known bug in parser: W does not appear:


{'': RDD00*hUU00 + 2*RDD01*hUU01 + 2*RDD02*hUU02 + RDD11*hUU11 + 2*RDD12*hUU12 + RDD22*hUU22}
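
The factors of 2 in this result come from the symmetry of both tensors: the $(j,k)$ and $(k,j)$ terms of the double sum are identical whenever $j \neq k$. A minimal sketch of that counting argument, again using strings rather than SymPy symbols and not reflecting the parser's internals:

```python
from collections import Counter

DIM = 3
terms = Counter()
for j in range(DIM):
    for k in range(DIM):
        # Symmetry: component (j, k) equals (k, j), so count both under the sorted pair.
        terms[tuple(sorted((j, k)))] += 1

# W = h^{jk} R_{jk}: diagonal terms appear once, off-diagonal terms twice.
W = ' + '.join(
    ('%d*' % count if count > 1 else '') + 'RDD%d%d*hUU%d%d' % (j, k, j, k)
    for (j, k), count in sorted(terms.items())
)
print(W)
```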

Now let's try out the distributive property, to compute
$$
v^i = \alpha g^{ij}(v_j + u_j)
$$

In [11]:
import indexedexp as ixp
import sympy as sp
DIM=3 # Currently hardcoded, easy fix
alpha = sp.symbols('alpha',real=True)
vD = ixp.declarerank1('vD', DIM=DIM)
uD = ixp.declarerank1('uD', DIM=DIM)
gUU = ixp.declarerank2('gUU', 'sym01', DIM=DIM)
namespace = {'alpha' : alpha,'vD': vD,'uD': uD, 'gUU': gUU}
names = 'g v u alpha'
# The below should be all the user sees/needs other than the auto-generated/user-modified configuration.
from latex_parser import parse
parse(r'v^i = \alpha g^{ij}(v_j + u_j)', names, namespace)

{'vU': [alpha*(gUU00*uD0 + gUU00*vD0 + gUU01*uD1 + gUU01*vD1 + gUU02*uD2 + gUU02*vD2),
  alpha*(gUU01*uD0 + gUU01*vD0 + gUU11*uD1 + gUU11*vD1 + gUU12*uD2 + gUU12*vD2),
  alpha*(gUU02*uD0 + gUU02*vD0 + gUU12*uD1 + gUU12*vD1 + gUU22*uD2 + gUU22*vD2)]}

<a id='latex_pdf_output'></a>

# Part 4: Output this notebook to $\LaTeX$-formatted PDF file \[Back to [top](#toc)\]
$$\label{latex_pdf_output}$$

The following code cell converts this Jupyter notebook into a proper, clickable $\LaTeX$-formatted PDF file. After the cell is successfully run, the generated PDF may be found in the root NRPy+ tutorial directory, with filename
[Tutorial-LaTeX_SymPy_Conversion.pdf](Tutorial-LaTeX_SymPy_Conversion.pdf) (Note that clicking on this link may not work; you may need to open the PDF file through another means.)

In [12]:
import cmdline_helper as cmd    # NRPy+: Multi-platform Python command-line interface
cmd.output_Jupyter_notebook_to_LaTeXed_PDF("Tutorial-LaTeX_SymPy_Conversion")

Created Tutorial-LaTeX_SymPy_Conversion.tex, and compiled LaTeX file to PDF
    file Tutorial-LaTeX_SymPy_Conversion.pdf
