A simple search engine that allow searching for chess games based on queries about opening names & opening moves. Built with Python 3.10 and python-chess.
-
How To Run:
- Install Python.
- Create a virtual environment & install requirements from requirements.txt.
- Activate the virtual environment. Usually the command is:
.\[venv name]\Scripts\activate
. - Run app.py from the project's root directory:
python app.py
.
-
Main Features:
- Search by Opening Name: Search by querying the name of an opening (e.g. "Queen Gambit"). Casing does not matter.
- Search by Opening Moves: Search by querying opening plies (up to the first 3 moves / 6 plies) (e.g. "e4 e5 Ke2"). No move number necessary.
- Output: Top 20 results, sorted by most recent games by default.
-
Data Source: Data was taken from lichess's database. A total of 121,332 games from Jan 2013 were analyzed.
-
How results are ranked:
-
The goal of the algorithm is to rank results based on the following priorities:
- Priority #1: Documents with more matching terms > documents with fewer matching terms.
- Priority #2: Documents with more important terms > documents with less important terms.
- If two documents have equal weight, fallback to default ranking (by most recent).
-
-
How queries are processed:
-
Step 1:
- Given a query string, a list of n terms [t1, t2,... tn] is extracted.
- Terms are ordered by increasing document frequencies (t1 have the lowest df and therefore is the most significant).
-
Step 2:
- Retrieve a list of postings lists [p1, p2,... pn] based on the terms.
- p1 has the smallest size.
-
Step 3:
- Intersect the lists, starting with intersection of n lists (Find documents that have all n terms). (Priority #1)
- If number of retrieved docs < 20, continue finding intersection of n-1 lists, n-2 lists,... 1 list.
- In each iteration of intersecting n-k lists, prioritize docs that have terms with low df (t1 > t2 > ... > tn). (Priority #2)
- Continue until 20 documents are retrieved.
-
-
How results are ranked:
- The goal of the algorithm is to retrieve only games that match the exact valid move sequences.
- Due to the strict matching, the only ranking priority is: game with more matching plies > game with fewer matching plies.
-
How queries are processed:
-
Step 1:
- A list of maximum 6 plies [ply1, ply2,... ply6] is extracted.
- The entire list of plies must be valid plies.
-
Step 2:
- Retrieve a list of 6 postings lists [pos1, pos2,... pos6] such that: pos[i] = games that contain ply[i] as the [i]th ply.
- For example:
- Ply sequence: [e4, e5, ...].
- Postings list: pos2 = games that have e5 as the second ply.
-
Step 3:
- Intersect the postings lists, starting with the first 6 lists (Find games that match the entire 6 plies / 3 moves).
- If not enough 20 results, continue finding intersection of the first 5 lists, 4 lists,...
- Continue until 20 documents are retrieved.
-
These are not optimized, so probably very large.
Name | Description |
---|---|
Python 3.10 | Main development language. |
python-chess 1.7.0 | Library for handling .pgn files. |
nltk 3.6.5 | Library for natural language processing (tokenizing, stemming). |
PyQt 5 | Framework for building GUI with Python. |
Main screen.
Searching by opening name.
Searching by opening moves.