Skip to content

zhangir-azerbayev/mathlib-semantic-search

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

32 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Note: Because this repo relied on the now unsupported code-davinci-002, the search demo will no longer work. It is trivial to replace codex with the chat api.

mathlib-semantic-search

Semantic search for Lean mathlib using OpenAI's embeddings API and Faiss.

Tested with python 3.10.4. Packages in requirements.txt.

Running demo

Save an openai api key to the environment variable OPENAI_API_KEY. Then cd into src and run demo_search.py. The mathlib embeddings are pre-computed, but do note embedding a query will cost you $0.0004 / 1K tokens.

Recomputing mathlib embeddings

This step is only documented for reproducibility purposes. It is not necessary to run the demo.

Pull leanprover-community/doc-gen and run the gen_docs script. When generating the docs, we used mathlib commit hash 06d0adfa76594f304b4650d098273d4366ede61b. Move the generated export.json to src/parse_docgen/docgen_export.json.

Then, in the src/parse_docgen directory run python parse.py. Then, cd into the src/embed_mathlib directory and run python embed_mathlib.py and embeddings_to_numpy.py.

How demo_search.py works

At a high-level, the script embeds queries using the embeddings api, then uses Faiss to do a fast kNN search against precomputed embeddings of mathlib declarations.

The search indexes into the file src/parse_docgen/docgen_export_with_formal_statement.jsonl, which is loaded into the docs variable in src/demo_search.py. The entries of docs are in exactly the same format as the entries of the export.json generated by gen_docs, so in theory we should be use to reuse the doc-gen code that maps export.json entries to html.

Each entry of docs actually contains one additional field compared to export.json: the formal_statement field, which is a deparsed string representation of the theorem statement.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published