[NeurIPS 2023 Tutorial on Machine Learning for Theorem Proving](https://github.com/lean-dojo/LeanCopilot)
===============================================================

In [1]:
import torch
import random
import numpy as np
from tqdm import tqdm
from transformers import (
  AutoModelForSeq2SeqLM,
  AutoTokenizer,
  Seq2SeqTrainer,
  Seq2SeqTrainingArguments,
  DataCollatorForSeq2Seq,
)
from datasets import Dataset
from typing import List, Optional

# https://arxiv.org/abs/2109.08203
random.seed(3407)
np.random.seed(3407)
torch.manual_seed(3407)

<torch._C.Generator at 0x7f3789be7130>

## Roadmap

* Training the tactic generator
  * Using [**LeanDojo**](https://github.com/lean-dojo/LeanDojo) to extract data (state-tactic pairs) from mathlib.
  * Finetuning a language model for tactic generation
* Searching for proofs
  * Interacting with Lean using [**LeanDojo**](https://github.com/lean-dojo/LeanDojo)
  * Proof search with DFS
* Using the model in Lean with [**Lean Copilot**](https://github.com/lean-dojo/LeanCopilot)

## Data Extraction

We use **[LeanDojo](https://leandojo.org/)** to extract state-tactic pairs from mathlib.

### Trace the Repo

In [2]:
from lean_dojo import *

In [3]:
repo = LeanGitRepo(
    "https://github.com/leanprover-community/mathlib4",
    "3ce43c18f614b76e161f911b75a3e1ef641620ff",
)

In [4]:
traced_repo = trace(repo)  # A few minutes, depending on #CPUs.

2023-12-11 16:57:03.654 | INFO     | lean_dojo.data_extraction.trace:trace:182 - Loading the traced repo from /home/kaiyu/.cache/lean_dojo/leanprover-community-mathlib4-3ce43c18f614b76e161f911b75a3e1ef641620ff/mathlib4
2023-12-11 16:57:05,673	INFO worker.py:1664 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
100%|██████████| 4462/4462 [07:27<00:00,  9.96it/s]
Following Github server redirection from /repos/mhuisi/lean4-cli to /repositories/341363356


### Extract State-Tactic Pairs

`traced_repo` is a data structure containing all data extracted from `repo`. We can post-process it to extract state-tactic pairs.

In [5]:
theorems = traced_repo.get_traced_theorems()
print(f"{len(theorems)} theorems/proofs extracted")

103234 theorems/proofs extracted


In [6]:
state_tactic_pairs = []

for thm in tqdm(theorems):
  for t in thm.get_traced_tactics():
    state_tactic_pairs.append({
        "state": t.state_before, 
        "tactic": t.tactic
    })

print(f"{len(state_tactic_pairs)} state-tactic pairs")

100%|██████████| 103234/103234 [00:10<00:00, 9775.05it/s]

245127 state-tactic pairs





In [7]:
st = state_tactic_pairs[0]
print(st["state"])

α : Type u_1
β : Type u_2
ks : Array α
vs : Array β
h : Array.size ks = Array.size vs
i : Fin (Array.size ks)
j : Fin (Array.size vs)
k : α
v : β
⊢ Array.size (Array.set ks i k) = Array.size (Array.set vs j v)


In [8]:
print(st["tactic"])

simp [h]


## Finetuning Language Models for Tactic Generation

There are many excellent libraries for finetuning tactic generators (e.g., [PyTorch Lightning](https://lightning.ai/), [ReProver](https://github.com/lean-dojo/ReProver)). The code below only illustrates the process. DO NOT USE IT FOR PRODUCTION.

We finetune a [ByT5](https://arxiv.org/abs/2105.13626) model. It is a token-free variant of T5 that operates directly on UTF-8 bytes, with the same encoder-decoder Transformer architecture.
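
To make "operates on bytes" concrete, here is a minimal pure-Python sketch of a ByT5-style encoding. It assumes the usual ByT5 convention that ids 0, 1, and 2 are reserved for padding, end-of-sequence, and unknown, so each UTF-8 byte is shifted by 3. The function names are hypothetical and this is not the library's tokenizer, only an illustration of the mapping.

```python
from typing import List

# Assumed ByT5 convention: ids 0, 1, 2 are pad, eos, unk,
# so every UTF-8 byte value is shifted by 3.
BYTE_OFFSET = 3
EOS_ID = 1


def byt5_style_encode(text: str) -> List[int]:
    """Encode text as shifted UTF-8 bytes followed by an end-of-sequence id."""
    return [b + BYTE_OFFSET for b in text.encode("utf-8")] + [EOS_ID]


def byt5_style_decode(ids: List[int]) -> str:
    """Invert the encoding, skipping any id that does not represent a byte."""
    raw = bytes(
        i - BYTE_OFFSET for i in ids if BYTE_OFFSET <= i < BYTE_OFFSET + 256
    )
    return raw.decode("utf-8", errors="ignore")


ids = byt5_style_encode("α : Type u_1\n")
print(ids)
print(repr(byt5_style_decode(ids)))
```

Non-ASCII symbols such as `α` occupy two or more ids each, which is why Lean states with heavy Unicode notation produce long input sequences.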

In [9]:
model = AutoModelForSeq2SeqLM.from_pretrained("google/byt5-small")
tokenizer = AutoTokenizer.from_pretrained("google/byt5-small")

Let's pick a random subset of 10,000 pairs, tokenize it, and look at one example.

In [10]:
dataset = Dataset.from_list(state_tactic_pairs).shuffle().select(range(10000))

def tokenize(examples):
  model_inputs = tokenizer(examples["state"], max_length=2048, truncation=True)
  labels = tokenizer(text_target=examples["tactic"], max_length=2048, truncation=True)
  model_inputs["labels"] = labels["input_ids"]
  return model_inputs

tokenized_dataset = dataset.map(tokenize, batched=True)

print(tokenized_dataset[0])

Map:   0%|          | 0/10000 [00:00<?, ? examples/s]

{'state': 'α : Type u_1\nβ : Type u_2\nι : Type u_3\nmα : MeasurableSpace α\nρ : Measure (α × ℝ)\ninst✝ : IsFiniteMeasure ρ\n⊢ Tendsto (fun r => ∫⁻ (a : α), preCdf ρ r a ∂Measure.fst ρ) atBot (𝓝 0)', 'tactic': 'convert ρ.tendsto_IicSnd_atBot MeasurableSet.univ', 'input_ids': [209, 180, 35, 61, 35, 87, 124, 115, 104, 35, 120, 98, 52, 13, 209, 181, 35, 61, 35, 87, 124, 115, 104, 35, 120, 98, 53, 13, 209, 188, 35, 61, 35, 87, 124, 115, 104, 35, 120, 98, 54, 13, 112, 209, 180, 35, 61, 35, 80, 104, 100, 118, 120, 117, 100, 101, 111, 104, 86, 115, 100, 102, 104, 35, 209, 180, 13, 210, 132, 35, 61, 35, 80, 104, 100, 118, 120, 117, 104, 35, 43, 209, 180, 35, 198, 154, 35, 229, 135, 160, 44, 13, 108, 113, 118, 119, 229, 159, 160, 35, 61, 35, 76, 118, 73, 108, 113, 108, 119, 104, 80, 104, 100, 118, 120, 117, 104, 35, 210, 132, 13, 229, 141, 165, 35, 87, 104, 113, 103, 118, 119, 114, 35, 43, 105, 120, 113, 35, 117, 35, 64, 65, 35, 229, 139, 174, 229, 132, 190, 35, 43, 100, 35, 61, 35, 209, 180, 4

In [11]:
# Illustration only (2 steps on CPU). This is not a real training run.
training_args = Seq2SeqTrainingArguments(
  output_dir="./results",
  learning_rate=1e-5,
  per_device_train_batch_size=8,
  max_steps=2,
  use_cpu=True,
)

data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=model)

trainer = Seq2SeqTrainer(
  model=model,
  tokenizer=tokenizer,
  args=training_args,
  train_dataset=tokenized_dataset,
  data_collator=data_collator,
)

trainer.train()

Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.




TrainOutput(global_step=2, training_loss=3.8050713539123535, metrics={'train_runtime': 62.8255, 'train_samples_per_second': 0.255, 'train_steps_per_second': 0.032, 'total_flos': 37510822582272.0, 'train_loss': 3.8050713539123535, 'epoch': 0.0})
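
The data collator in the cell above pads each batch to the length of its longest example. The sketch below imitates that behaviour in plain Python lists, assuming a padding id of 0 for inputs and -100 for labels (the value that the loss ignores by default). `pad_batch` is a hypothetical helper; the real `DataCollatorForSeq2Seq` returns tensors and does more work.

```python
from typing import Dict, List

PAD_ID = 0           # assumed padding token id for the inputs
LABEL_PAD_ID = -100  # label positions with this value are ignored by the loss


def pad_batch(examples: List[Dict[str, List[int]]]) -> Dict[str, List[List[int]]]:
    """Pad inputs and labels to the longest sequence in the batch."""
    max_in = max(len(e["input_ids"]) for e in examples)
    max_out = max(len(e["labels"]) for e in examples)
    batch: Dict[str, List[List[int]]] = {
        "input_ids": [],
        "attention_mask": [],
        "labels": [],
    }
    for e in examples:
        n_in, n_out = len(e["input_ids"]), len(e["labels"])
        batch["input_ids"].append(e["input_ids"] + [PAD_ID] * (max_in - n_in))
        # 1 marks a real token, 0 marks padding.
        batch["attention_mask"].append([1] * n_in + [0] * (max_in - n_in))
        batch["labels"].append(e["labels"] + [LABEL_PAD_ID] * (max_out - n_out))
    return batch


batch = pad_batch([
    {"input_ids": [5, 6, 7], "labels": [9]},
    {"input_ids": [5], "labels": [9, 10]},
])
print(batch)
```

Padding labels with a value the loss ignores keeps short tactics from being penalised for the padding positions.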

## Inspecting the Tactic Generator

In [12]:
tokenizer = AutoTokenizer.from_pretrained("kaiyuy/leandojo-lean4-tacgen-byt5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("kaiyuy/leandojo-lean4-tacgen-byt5-small")

In [13]:
def generate_one_tactic(state: str) -> None:
  """Generate a single tactic and print it."""
  tokenized_state = tokenizer(state, return_tensors="pt")
  tactic_ids = model.generate(tokenized_state.input_ids, max_length=1024)
  tactic = tokenizer.decode(tactic_ids[0], skip_special_tokens=True)
  print(tactic, end="\n\n")

In [14]:
generate_one_tactic("∀ (a b c : ℕ), a + b + c = a + c + b")

intro a b c



In [15]:
def generate_tactics(state: str, k: int = 16) -> List[str]:
  """Generate multiple tactics via beam search."""
  tokenized_state = tokenizer(state, return_tensors="pt")
  tactic_candidates_ids = model.generate(
    tokenized_state.input_ids,
    max_length=256,
    num_beams=k,
    length_penalty=0.0,
    do_sample=False,
    num_return_sequences=k,
    early_stopping=False,
  )
  tactic_candidates = tokenizer.batch_decode(
    tactic_candidates_ids, skip_special_tokens=True
  )
  return tactic_candidates

In [16]:
for tac in generate_tactics("∀ (a b c : ℕ), a + b + c = a + c + b"):
  print(tac)

intro a b c
intros
simp
intros a b c
aesop
tauto
intro
simp [add_assoc]
simp [add_comm]
constructor
apply Nat.forall_congr'
exact fun a b c => by simp
apply Nat.zero_add
apply Nat.zero_le
apply fun a b c => intro a b c
apply Nat.strongInductionOn

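The `length_penalty=0.0` setting affects how finished beams are ranked. As a rough sketch, beam search scores a hypothesis by its summed log-probability divided by its length raised to `length_penalty`, so a value of 0.0 leaves the raw sum untouched. The log-probabilities and lengths below are made up for illustration, and `beam_score` and `rank` are hypothetical helpers, not library functions.

```python
from typing import List, Tuple


def beam_score(sum_logprob: float, length: int, length_penalty: float) -> float:
    """Score of a finished hypothesis under a length penalty."""
    return sum_logprob / (length ** length_penalty)


def rank(cands: List[Tuple[str, float, int]], length_penalty: float) -> List[str]:
    """Sort (tactic, sum_logprob, length) candidates from best to worst."""
    ordered = sorted(
        cands,
        key=lambda c: beam_score(c[1], c[2], length_penalty),
        reverse=True,
    )
    return [tactic for tactic, _, _ in ordered]


# Made-up numbers: a short tactic and a longer, less likely one.
candidates = [("simp", -2.0, 4), ("simp [add_comm]", -3.0, 15)]

print(rank(candidates, length_penalty=0.0))
print(rank(candidates, length_penalty=1.0))
```

With no normalisation the shorter, more probable tactic ranks first; dividing by length (`length_penalty=1.0`) favours the longer one in this toy case.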

## Interacting with Lean

[**LeanDojo**](https://github.com/lean-dojo/LeanDojo) supports interacting with Lean in Python. We'll use the `add_abc` theorem as an example.

![add_abc.jpg](./add_abc.jpg)

In [17]:
repo = LeanGitRepo(
    "https://github.com/yangky11/lean4-example",
    "41f6a6aed00cfb71326dd9d941f7427ee3ae0cb7",
)
theorem = Theorem(repo, "Lean4Example.lean", "add_abc")

In [18]:
dojo, s0 = Dojo(theorem).__enter__()



In [19]:
print(s0.pp)

⊢ ∀ (a b c : ℕ), a + b + c = a + c + b


In [20]:
s1 = dojo.run_tac(s0, "intro a b c")

print(s1.pp)

a b c : ℕ
⊢ a + b + c = a + c + b


In [21]:
dojo.run_tac(s1, "rw [Nat.add_right_comm]")

ProofFinished(tactic_state_id=2, message='')

In [22]:
dojo.run_tac(s0, "cases n")

LeanError(error="tactic 'induction' failed, major premise type is not an inductive type \n  ?m.142\nx✝ : ?m.142\n⊢ ∀ (a b c : ℕ), a + b + c = a + c + b")

In [23]:
dojo.run_tac(s1, "hello world!")

LeanError(error='<stdin>:1:1: unknown tactic')
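
The cells above show three kinds of result from `run_tac`: a new tactic state, a finished proof, or an error. The sketch below replays a list of tactics and reports which case ends the run. To stay runnable without Lean it uses stand-in classes and a scripted `toy_run_tac`; all of these names are hypothetical and only mirror the fields visible in the outputs above.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


# Hypothetical stand-ins for the three result types shown above.
@dataclass
class FakeState:
    pp: str


@dataclass
class FakeFinished:
    message: str = ""


@dataclass
class FakeError:
    error: str


Result = Union[FakeState, FakeFinished, FakeError]


def replay(run_tac: Callable[[FakeState, str], Result],
           state: FakeState, tactics: List[str]) -> str:
    """Run tactics in order and describe how the attempt ended."""
    for i, tac in enumerate(tactics):
        result = run_tac(state, tac)
        if isinstance(result, FakeError):
            return f"step {i} failed: {result.error}"
        if isinstance(result, FakeFinished):
            return "proved"
        state = result  # Continue from the new state.
    return "unfinished"


GOAL = "⊢ ∀ (a b c : ℕ), a + b + c = a + c + b"
AFTER_INTRO = "a b c : ℕ\n⊢ a + b + c = a + c + b"


def toy_run_tac(state: FakeState, tac: str) -> Result:
    """Scripted replies that imitate the session above."""
    table = {
        (GOAL, "intro a b c"): FakeState(AFTER_INTRO),
        (AFTER_INTRO, "rw [Nat.add_right_comm]"): FakeFinished(),
    }
    return table.get((state.pp, tac), FakeError("unknown tactic"))


print(replay(toy_run_tac, FakeState(GOAL), ["intro a b c", "rw [Nat.add_right_comm]"]))
print(replay(toy_run_tac, FakeState(GOAL), ["hello world!"]))
```

Because states are plain values, a failed tactic does not disturb earlier states, so the same state can be retried with a different tactic.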

## Proof Search

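We combine the tactic generator with depth-first search (DFS) to search for proofs: at each state we try the generated tactics in order and recurse into every resulting state that is not an error, up to a fixed depth limit.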

In [24]:
Tactic = str
Proof = List[Tactic]

num_candidates = 16
depth_limit = 3

def search(state: TacticState, depth: int) -> Optional[Proof]:
    """Try to prove `state` using depth-first search (DFS)."""
    if depth >= depth_limit:
        return None

    tactics = generate_tactics(state.pp, num_candidates)

    # Run the tactics.
    for tac in tactics:
        next_state = dojo.run_tac(state, tac)
        if isinstance(next_state, ProofFinished):
            return [tac]  # Found a proof!
        elif not isinstance(next_state, LeanError):
            # Call `search` recursively.
            subproof = search(next_state, depth + 1)
            if subproof is not None:
                return [tac] + subproof
    
    return None

In [25]:
proof = search(s0, depth=0)

if proof is not None:
    print("Found a proof!\n")
    print("\n".join(proof))
else:
    print("Failed to find a proof :(")

Found a proof!

intro a b c
rw [Nat.add_right_comm]

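To see the effect of the depth limit without running Lean, the same search can be sketched on a toy problem. Here a "state" is a positive integer, the "tactics" are `halve` and `dec`, and the "proof" is finished when the value reaches 0. The environment `toy_step` and the function `dfs` are hypothetical stand-ins for `dojo.run_tac` and `search`, written under those assumptions.

```python
from typing import Callable, List, Optional, Union

DONE = "done"  # plays the role of ProofFinished
FAIL = "fail"  # plays the role of LeanError


def toy_step(state: int, tactic: str) -> Union[int, str]:
    """Apply a toy tactic to an integer state."""
    if tactic == "halve" and state % 2 == 0:
        nxt = state // 2
    elif tactic == "dec":
        nxt = state - 1
    else:
        return FAIL
    return DONE if nxt == 0 else nxt


def dfs(state: int, depth: int, depth_limit: int,
        candidates: Callable[[int], List[str]]) -> Optional[List[str]]:
    """Depth-limited DFS that returns the first tactic sequence reaching DONE."""
    if depth >= depth_limit:
        return None
    for tac in candidates(state):
        result = toy_step(state, tac)
        if result == FAIL:
            continue  # Skip failing tactics, as with LeanError.
        if result == DONE:
            return [tac]
        sub = dfs(result, depth + 1, depth_limit, candidates)
        if sub is not None:
            return [tac] + sub
    return None


def propose(state: int) -> List[str]:
    """Fixed candidate order, standing in for the tactic generator."""
    return ["halve", "dec"]


print(dfs(4, 0, 3, propose))
print(dfs(5, 0, 3, propose))
print(dfs(5, 0, 4, propose))
```

Starting from 5, no sequence of three steps reaches 0, so the search fails with a limit of 3 and succeeds with a limit of 4. The same trade-off applies to `depth_limit` in the Lean search above: a larger limit reaches longer proofs at the cost of more tactic evaluations.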

## Recap

### Steps

1. Use **[LeanDojo](https://github.com/lean-dojo/LeanDojo)** to extract state-tactic pairs from [mathlib](https://github.com/leanprover-community/mathlib4).
2. Finetune an encoder-decoder Transformer for tactic generation.
3. Combine the tactic generator with search algorithms such as DFS.


### Open-Source Tools

* [LeanDojo](https://github.com/lean-dojo/LeanDojo)
* [ReProver](https://github.com/lean-dojo/ReProver)
* [Lean Copilot](https://github.com/lean-dojo/LeanCopilot)
