#### Language cascades | part 1: introduction
Tutorial on neural theorem proving\
Author: Sean Welleck

----------------

Tools such as ChatGPT show the flexibility of modern neural language generators.\
Namely, a single generation system can often perform a task when it is simply given a suitable *prompt*:

$\quad y=f(p_\theta(\cdot|x;P)),$

where $x$ is an input, $P$ is a prompt, $p_\theta$ is the language model's distribution over outputs, and $f(\cdot)$ is a decoding algorithm.
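To make the notation concrete, here is a minimal sketch in which a hand-written probability table stands in for $p_\theta$. The names `toy_distribution`, `greedy_decode`, and `sample_decode` are hypothetical, and the probabilities are invented for illustration; a real model would compute them.

```python
import random

def toy_distribution(x, P):
    """Stand-in for p_theta(. | x; P): maps candidate outputs to probabilities.

    The numbers are invented for illustration and ignore x and P.
    """
    return {"4": 0.7, "5": 0.2, "22": 0.1}

def greedy_decode(dist):
    """A deterministic choice of f: return the most probable output."""
    return max(dist, key=dist.get)

def sample_decode(dist, rng):
    """A stochastic choice of f: draw one output according to its probability."""
    outputs = list(dist)
    weights = [dist[o] for o in outputs]
    return rng.choices(outputs, weights=weights, k=1)[0]

P = "Add two numbers.\n"   # prompt
x = "2+2="                 # input
dist = toy_distribution(x, P)

y_greedy = greedy_decode(dist)
y_sampled = sample_decode(dist, random.Random(0))
print(y_greedy, y_sampled)
```

Greedy decoding always returns the same output for a given distribution, while sampling can return any output with nonzero probability.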


Let's look at one of these functions:


In [1]:
import time
import os
import openai  # this notebook uses the pre-1.0 interface (openai.ChatCompletion)

class LMFunction(object):
    def __init__(self, engine='gpt-3.5-turbo', max_tokens=512):
        self.engine = engine
        self.max_tokens = max_tokens
        self.openai = openai
        openai.api_key = os.environ['OPENAI_API_KEY']

    def _call_api(self, prompt, engine, max_tokens, max_retries=10, retry_wait=2):
        # Retry on API errors, waiting between attempts.
        for _ in range(max_retries):
            try:
                return self.openai.ChatCompletion.create(
                    model=engine,
                    messages=[
                        {"role": "system", "content": "You are a helpful assistant."},
                        {"role": "user", "content": prompt}
                    ],
                    max_tokens=max_tokens,
                    temperature=1.0  # sample, so repeated calls can differ
                )
            except self.openai.error.OpenAIError:
                time.sleep(retry_wait)
        # All retries failed: return an empty response with the same shape.
        return {'choices': [{'message': {'content': ''}}]}

    def _parse_message(self, msg):
        try:
            content = msg['choices'][0]['message']['content']
            # Keep only the first line of the response.
            content = content.strip().split('\n')[0]
        except (IndexError, KeyError):
            content = ''
        return content

    def f(self, prompt, x):
        # The model sees the prompt P followed by the input x.
        msg = self._call_api(
            prompt=prompt+x,
            engine=self.engine,
            max_tokens=self.max_tokens
        )
        return self._parse_message(msg)

In [2]:
prompt = """Multiply two numbers. Here are some examples:
432*342=147744
98*19=1862
"""

p = LMFunction('gpt-4')

# Sample 10 outputs and compare each with the exact product.
outputs = [p.f(prompt, '872*2233=') for _ in range(10)]
[(output, output==str(872*2233)) for output in outputs]

[('1947256', False),
 ('1945256', False),
 ('1946256', False),
 ('1947176', True),
 ('1947176', True),
 ('1947056', False),
 ('1947456', False),
 ('1947256', False),
 ('1947256', False),
 ('1947256', False)]

In [13]:
872*2233

1947176

The result above shows two interesting things:
1. The function is stochastic; it can return different answers each time it is called.
2. The function is capable of producing a correct answer; here, 2 of the 10 samples are correct.

Therefore, one way to make a stochastic function like this useful is to pair it with a reliable verifier that can tell correct outputs from incorrect ones.
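The pairing can be sketched as a sample-and-verify loop. This is a minimal illustration under stated assumptions: `noisy_multiply` is a hypothetical stand-in for the language model (it corrupts the answer most of the time), so the code runs without an API key, and `verify` and `sample_until_verified` are illustrative names rather than part of the notebook.

```python
import random

def noisy_multiply(a, b, rng):
    """Hypothetical stand-in for the language model: correct only some of the time."""
    exact = a * b
    # With probability 0.8, corrupt the answer by a small multiple of 10.
    if rng.random() < 0.8:
        return str(exact + 10 * rng.randint(1, 9))
    return str(exact)

def verify(a, b, candidate):
    """Reliable verifier: exact integer arithmetic."""
    return candidate.isdigit() and int(candidate) == a * b

def sample_until_verified(a, b, rng, max_samples=50):
    """Draw samples until one passes the verifier, or give up and return None."""
    for attempt in range(1, max_samples + 1):
        candidate = noisy_multiply(a, b, rng)
        if verify(a, b, candidate):
            return candidate, attempt
    return None, max_samples

rng = random.Random(0)
answer, attempts = sample_until_verified(872, 2233, rng)
print(answer, attempts)
```

For multiplication the verifier could simply compute the answer itself, so this example is only illustrative. The pattern pays off when checking a candidate is much easier than producing one.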


#### Why is this useful?
The main attraction of these functions is their flexibility. For instance,
it is easy to implement a function that maps language input to a call to SymPy:

```
    sympy_expression ~ f("872 times 2233 =")
    answer = g(sympy_expression)
```
where $g(\cdot)$ is SymPy evaluation.

This yields functionality that is difficult to program manually:

In [8]:
from sympy.parsing.sympy_parser import parse_expr

prompt = """Solve the multiplication problem by writing input to a sympy function.
Do not add additional text. Here are some examples:

432 multiplied by 342 is: 432*342
98* 19 is how much? 98*19
"""

p = LMFunction('gpt-4')

g = parse_expr

sympy_expression = p.f(prompt, 'There are 800+72 apples in a barrel. How many apples in 2233 barrels?')
answer = g(sympy_expression)
print(sympy_expression)
print(answer)

(800+72)*2233
1947176


In [7]:
872*2233

1947176

#### Language cascades

A [language model cascade [Dohan et al. 2022]](https://arxiv.org/abs/2207.10342) formalizes the idea of composing multiple functions, some of which are stochastic functions implemented by a language model.

The result can be seen as a probabilistic program: each sample corresponds to one execution of the composed function, e.g.:

In [56]:
def multiply(x):
    y1 = p.f(prompt, x)
    y2 = g(y1)
    return y2


multiply('I bought 32 cases of apples, with one hundred and 42 apples per case. How many total apples?')

4544

In [57]:
32*142

4544
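Because each call to the cascade is one sample, running it repeatedly gives an empirical distribution over answers. The sketch below mirrors the composition in `multiply`, but replaces the model call with a hypothetical stub, `stub_lm`, that sometimes emits a wrong expression, and replaces SymPy with a tiny product evaluator, so it runs with the standard library alone. The 70% success rate is an invented number.

```python
import random
from collections import Counter

def stub_lm(x, rng):
    """Hypothetical stand-in for p.f(prompt, x): emits an expression string.

    It returns the intended expression 70% of the time and a slip otherwise.
    """
    return "32*142" if rng.random() < 0.7 else "32*124"

def g(expression):
    """Deterministic step, standing in for SymPy evaluation of a product."""
    left, right = expression.split("*")
    return int(left) * int(right)

def cascade(x, rng):
    """One sample from the cascade: stochastic step, then deterministic step."""
    return g(stub_lm(x, rng))

rng = random.Random(0)
question = "32 cases of apples, 142 apples per case. How many apples?"
samples = [cascade(question, rng) for _ in range(200)]
counts = Counter(samples)
print(counts.most_common())
```

Only the first step is random; the deterministic step `g` maps each sampled expression to exactly one value, so the spread in `counts` comes entirely from the stochastic function.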

#### Cascades for neural theorem proving

The two ideas mentioned above, composing multiple functions and using a verifier, make neural theorem proving a natural setting for language cascades.

Namely, the goal will be to decompose theorem proving into different functions, then use the proof assistant to verify the final output.

In the next notebook, we will see a cascade called [Draft, Sketch, Prove [Jiang et al., ICLR 2023]](https://arxiv.org/abs/2210.12283) that does so with three components: \
**draft** an informal proof, **sketch** a formal proof, and **prove** the remaining gaps.

The end result is a model and proof search procedure that is qualitatively very different from the next-step predictors we used in Part I.
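As a purely schematic preview, the shape of such a cascade (three composed functions followed by a verifier) might look like the sketch below. It is not the Draft, Sketch, Prove implementation: `prove_with_cascade` and the `toy_*` functions are hypothetical placeholders that manipulate strings, and no language model or proof assistant is called.

```python
from typing import Callable, Optional

def prove_with_cascade(
    theorem: str,
    draft: Callable[[str], str],
    sketch: Callable[[str, str], str],
    prove: Callable[[str], str],
    verify: Callable[[str], bool],
    max_attempts: int = 5,
) -> Optional[str]:
    """Run draft -> sketch -> prove and return the first output the verifier accepts."""
    for _ in range(max_attempts):
        informal_proof = draft(theorem)                  # would be a language model call
        formal_sketch = sketch(theorem, informal_proof)  # would be a language model call
        candidate = prove(formal_sketch)                 # fills the remaining gaps
        if verify(candidate):                            # would be the proof assistant
            return candidate
    return None

# Toy stand-ins so the sketch runs; they only manipulate strings.
toy_draft = lambda thm: f"informal argument for {thm}"
toy_sketch = lambda thm, informal: f"sketch of {thm} using [{informal}] with <gap>"
toy_prove = lambda sk: sk.replace("<gap>", "gap closed")
toy_verify = lambda proof: "<gap>" not in proof

result = prove_with_cascade("a + 0 = a", toy_draft, toy_sketch, toy_prove, toy_verify)
print(result)
```

Passing the components in as arguments keeps the control flow separate from how each step is implemented, so any step can be swapped out without changing the loop.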