
# Convert LaTeX Sentence to SymPy Expression

## Author: Ken Sible

## The following module will demonstrate a recursive descent parser for LaTeX.

### NRPy+ Source Code for this module:
1. [latex_parser.py](../edit/latex_parser.py); [\[**tutorial**\]](Tutorial-LaTeX_SymPy_Conversion.ipynb) The latex_parser.py script will convert a LaTeX sentence to a SymPy expression using the following function: parse(sentence).

<a id='toc'></a>

# Table of Contents
$$\label{toc}$$

1. [Step 1](#lexparse): Introduction: Lexical Analysis and Syntax Analysis
1. [Step 2](#sandbox): Demonstration and Sandbox (LaTeX Parser)
1. [Step 3](#latex_pdf_output): $\LaTeX$ PDF Output

<a id='lexparse'></a>

# Step 1: Lexical Analysis and Syntax Analysis \[Back to [top](#toc)\]
$$\label{lexparse}$$

In this section, we discuss [lexical analysis](https://en.wikipedia.org/wiki/Lexical_analysis) (lexing) and [syntax analysis](https://en.wikipedia.org/wiki/Parsing) (parsing). During lexical analysis, a lexer tokenizes a character string, called a sentence, by matching substrings against token patterns. We implemented a regex-based lexer for NRPy+, which matches each token pattern using a [regular expression](https://en.wikipedia.org/wiki/Regular_expression). During syntax analysis, a parser receives a token iterator from the lexer and builds a parse tree containing all syntactic information of the language, as specified by a [formal grammar](https://en.wikipedia.org/wiki/Formal_grammar). We implemented a [recursive descent parser](https://en.wikipedia.org/wiki/Recursive_descent_parser) for NRPy+, which builds the parse tree in [preorder](https://en.wikipedia.org/wiki/Tree_traversal#Pre-order_(NLR)), starting from the root [nonterminal](https://en.wikipedia.org/wiki/Terminal_and_nonterminal_symbols), using a [right recursive](https://en.wikipedia.org/wiki/Left_recursion) grammar. The following right recursive, [context-free grammar](https://en.wikipedia.org/wiki/Context-free_grammar) was written for parsing [LaTeX](https://en.wikipedia.org/wiki/LaTeX). It follows [BNF](https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form) notation, extended so that braces `{ }` enclose optional or repeated items and parentheses `( | )` group alternatives:
```
<EXPRESSION> -> { - } <TERM> { ( + | - ) <TERM> }
<TERM>       -> <FACTOR> { { ( / | ^ ) } <FACTOR> }
<FACTOR>     -> <OPERAND> | \(<EXPRESSION>\) | \{<EXPRESSION>\}
<OPERAND>    -> <SYMBOL> | <NUMBER> | <COMMAND>
<SYMBOL>     -> a | ... | z | A | ... | Z
<NUMBER>     -> <RATIONAL> | <DECIMAL> | <INTEGER>
<COMMAND>    -> \ ( <SQRT> | ... )
<SQRT>       -> sqrt { [<INTEGER>] } \{<EXPRESSION>\}
```

<small>**Source**: Robert W. Sebesta. Concepts of Programming Languages. Pearson Education Limited, 2016.</small>
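
A regex-based lexer of the kind described above can be sketched in a few lines of Python. The token names below follow those used in this tutorial, but the regex patterns, the `TOKEN_PATTERNS` table, and the standalone `tokenize` function are assumptions made for illustration; they are not the actual `Lexer` class in latex_parser.py.

```python
import re

# Hypothetical token table of (token name, regex) pairs. Order matters:
# earlier patterns win, so RATIONAL is tried before INTEGER and
# CMD_SQRT before SYMBOL.
TOKEN_PATTERNS = [
    ('RATIONAL',      r'[0-9]+/[1-9][0-9]*'),
    ('DECIMAL',       r'[0-9]+\.[0-9]+'),
    ('INTEGER',       r'[0-9]+'),
    ('PLUS',          r'\+'),
    ('MINUS',         r'-'),
    ('DIVIDE',        r'/'),
    ('SUPERSCRIPT',   r'\^'),
    ('LEFT_PAREN',    r'\('),
    ('RIGHT_PAREN',   r'\)'),
    ('LEFT_BRACE',    r'\{'),
    ('RIGHT_BRACE',   r'\}'),
    ('LEFT_BRACKET',  r'\['),
    ('RIGHT_BRACKET', r'\]'),
    ('CMD_SQRT',      r'sqrt'),
    ('COMMAND',       r'\\'),
    ('SYMBOL',        r'[a-zA-Z]'),
    ('WHITESPACE',    r'\s+'),
]

# Combine every pattern into one regex with a named group per token.
MASTER = re.compile('|'.join('(?P<%s>%s)' % pair for pair in TOKEN_PATTERNS))

def tokenize(sentence):
    """Yield (token, lexeme) pairs for the sentence, skipping whitespace."""
    pos = 0
    while pos < len(sentence):
        match = MASTER.match(sentence, pos)
        if match is None:
            raise ValueError('unexpected character %r at position %d'
                             % (sentence[pos], pos))
        pos = match.end()
        if match.lastgroup != 'WHITESPACE':
            yield match.lastgroup, match.group()

print(', '.join(token for token, _ in tokenize(r'\sqrt{5}(x + 2/3)^2')))
```

Combining the patterns into a single alternation with named groups lets `match.lastgroup` report which token matched, at the cost of making the order of the table significant.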

In [1]:
from latex_parser import * # Import NRPy+ module for lexing and parsing LaTeX
from sympy import srepr    # Import SymPy function for expression tree representation

In [2]:
lexer = Lexer(); lexer.initialize(r'\sqrt{5}(x + 2/3)^2')
print(', '.join(token for token in lexer.tokenize()))

COMMAND, CMD_SQRT, LEFT_BRACE, INTEGER, RIGHT_BRACE, LEFT_PAREN, SYMBOL, PLUS, RATIONAL, RIGHT_PAREN, SUPERSCRIPT, INTEGER


In [3]:
expr = parse(r'\sqrt{5}(x + 2/3)^2')
print(expr, ':', srepr(expr))

sqrt(5)*(x + 2/3)**2 : Mul(Pow(Integer(5), Rational(1, 2)), Pow(Add(Symbol('x'), Rational(2, 3)), Integer(2)))
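
To illustrate how recursive descent maps grammar rules onto mutually recursive methods, here is a minimal, self-contained sketch for a subset of the Step 1 grammar. It is not the NRPy+ implementation: the `SketchParser` class, the `parse_sketch` function, and the one-line tokenizer are hypothetical, and the sketch assumes an extra `power` rule so that `^` binds tighter than multiplication, and that adjacent factors are multiplied.

```python
import re
from sympy import Symbol, Integer, Rational, Pow

# Simplified tokenizer (an assumption): rationals, integers, \sqrt,
# single-letter symbols, and punctuation.
TOKEN = re.compile(r'\s*(\d+/\d+|\d+|\\sqrt|[a-zA-Z]|[-+/^(){}\[\]])')

def tokenize(sentence):
    tokens, pos, sentence = [], 0, sentence.rstrip()
    while pos < len(sentence):
        match = TOKEN.match(sentence, pos)
        if match is None:
            raise ValueError('unexpected character at position %d' % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

class SketchParser:
    def __init__(self, sentence):
        self.tokens, self.pos = tokenize(sentence), 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def accept(self, token):
        # Consume the next token only if it matches.
        if self.peek() == token:
            self.pos += 1
            return True
        return False

    def expect(self, token):
        if not self.accept(token):
            raise SyntaxError('expected %r, found %r' % (token, self.peek()))

    def expression(self):
        # expression -> [ - ] term { ( + | - ) term }
        negate = self.accept('-')
        expr = self.term()
        if negate:
            expr = -expr
        while self.peek() in ('+', '-'):
            if self.accept('+'):
                expr = expr + self.term()
            else:
                self.expect('-')
                expr = expr - self.term()
        return expr

    def term(self):
        # term -> power { [ / ] power }; juxtaposition means multiplication
        expr = self.power()
        while self.peek() is not None and self.peek() not in '+-)}]^':
            if self.accept('/'):
                expr = expr / self.power()
            else:
                expr = expr * self.power()
        return expr

    def power(self):
        # power -> factor [ ^ factor ]
        base = self.factor()
        return Pow(base, self.factor()) if self.accept('^') else base

    def factor(self):
        # factor -> symbol | number | ( expression ) | { expression } | \sqrt ...
        for left, right in (('(', ')'), ('{', '}')):
            if self.accept(left):
                expr = self.expression()
                self.expect(right)
                return expr
        if self.accept('\\sqrt'):
            root = Integer(2)
            if self.accept('['):
                root = self.factor()
                self.expect(']')
            self.expect('{')
            expr = self.expression()
            self.expect('}')
            return Pow(expr, Rational(1, root))
        token = self.peek()
        if token is not None and token.isalpha():
            self.pos += 1
            return Symbol(token)
        if token is not None and token[0].isdigit():
            self.pos += 1
            return Rational(token)  # also handles plain integers
        raise SyntaxError('unexpected token %r' % token)

def parse_sketch(sentence):
    parser = SketchParser(sentence)
    expr = parser.expression()
    if parser.peek() is not None:
        raise SyntaxError('unexpected token %r' % parser.peek())
    return expr

print(parse_sketch(r'\sqrt{5}(x + 2/3)^2'))
```

Each nonterminal becomes one method, and each method consumes exactly the tokens its rule describes before returning, which is why the parser needs no explicit stack.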


<a id='sandbox'></a>

# Step 2: Demonstration and Sandbox (LaTeX Parser) \[Back to [top](#toc)\]
$$\label{sandbox}$$

We implemented a wrapper function for the `parse()` method that accepts a LaTeX sentence and returns a SymPy expression. Furthermore, the entire parsing module was designed for extensibility. To extend the parser to support a new LaTeX command, apply the following procedure: (1) append the command to the grammar dictionary in the Lexer class with the mapping regex:token, (2) write a grammar abstraction (similar to a regular expression) for the command, (3) add the associated nonterminal (the command name) to the command abstraction in the Parser class, and (4) implement the (private) method for parsing the grammar abstraction. We demonstrate the extension procedure using the `\sqrt` LaTeX command.

```<SQRT> -> sqrt { [<INTEGER>] } \{<EXPRESSION>\}```
```
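def __sqrt(self):
    # <SQRT> -> sqrt { [<INTEGER>] } \{<EXPRESSION>\}
    if self.__accept('LEFT_BRACKET'):
        # optional root index, e.g. the 3 in \sqrt[3]{5}
        root = self.__number()
        self.__expect('RIGHT_BRACKET')
    else:
        root = 2  # default to the square root
    self.__expect('LEFT_BRACE')
    expr = self.__expression()
    self.__expect('RIGHT_BRACE')
    # return the SymPy expression as a string: expr**(1/root)
    return 'Pow(%s, Rational(1, %s))' % (expr, root)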
```

In [4]:
print(parse(r'\sqrt[3]{5}'))

5**(1/3)
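
The `__sqrt` method returns a string rather than a SymPy object, so the string must be evaluated somewhere before `parse` returns. How latex_parser.py does this is not shown here. As an assumption, the sketch below uses SymPy's `sympify` to convert a string of the same form into an expression.

```python
from sympy import sympify

# Build the same kind of string that __sqrt returns for \sqrt[3]{5}.
expr_string = 'Pow(%s, Rational(1, %s))' % (5, 3)

# Assumption: sympify stands in for whatever conversion the module performs.
expr = sympify(expr_string)
print(expr_string, '->', expr)
```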


In the sandbox code cell below, you can experiment with the LaTeX parser using the wrapper function `parse(sentence)`, where `sentence` should be a [raw string](https://docs.python.org/3/reference/lexical_analysis.html) so that Python interprets each backslash as a literal character rather than as the start of an [escape sequence](https://en.wikipedia.org/wiki/Escape_sequence).

In [5]:
# Write Sandbox Code Here
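
The raw string requirement matters because several LaTeX commands begin with characters that Python treats as escape sequences. The short example below uses `\nu` and `\tau` purely as illustrative input strings (whether the parser supports those commands is not assumed) to show the difference.

```python
raw    = r'\nu + \tau'   # backslashes kept: \ n u ... \ t a u
cooked = '\nu + \tau'    # '\n' becomes a newline and '\t' becomes a tab

print(len(raw), len(cooked))        # the raw string is two characters longer
print('\\' in raw, '\\' in cooked)  # only the raw string contains a backslash
```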

<a id='latex_pdf_output'></a>

# Step 3: Output this notebook to $\LaTeX$-formatted PDF file \[Back to [top](#toc)\]
$$\label{latex_pdf_output}$$

The following code cell converts this Jupyter notebook into a proper, clickable $\LaTeX$-formatted PDF file. After the cell is successfully run, the generated PDF may be found in the root NRPy+ tutorial directory, with filename
[Tutorial-LaTeX_SymPy_Conversion.pdf](Tutorial-LaTeX_SymPy_Conversion.pdf) (Note that clicking on this link may not work; you may need to open the PDF file through another means.)

In [6]:
import cmdline_helper as cmd    # NRPy+: Multi-platform Python command-line interface
cmd.output_Jupyter_notebook_to_LaTeXed_PDF("Tutorial-LaTeX_SymPy_Conversion")

Created Tutorial-LaTeX_SymPy_Conversion.tex, and compiled LaTeX file to PDF file Tutorial-LaTeX_SymPy_Conversion.pdf
