Lexical subspaces in LLMs

What language do you think in? Is it even a language or some higher level of abstraction? What about LLMs?

Ever since I learned English (to complement my native Russian), I have been puzzled when people try to work out which language we think in. I have a clear feeling that, personally, I do not think in any particular language but rather in abstract concepts that only later find their expression in words, when I try to convey them to another person. At times it is easier to express an idea in one language than in the other, so I quite frequently jump between them.

In mathematical terms, it seems that there is a high-dimensional space of abstract ideas that is projected onto lower-dimensional lexical spaces. The fact that some ideas are easier to express in one language than in another is evidence of the lower dimensionality of the lexical spaces. When we say that "it is easier", we usually mean that the projection captures more information and, therefore, that reconstructing the original idea from the projection incurs a smaller loss.
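The projection picture above can be made concrete with a toy example. This is only an illustrative sketch, not code from this repository: the dimensions `d` and `k`, the random subspaces standing in for two "languages", and the helper names `random_subspace` and `reconstruction_loss` are all assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 64, 8  # assumed sizes of the idea space and of each lexical space

def random_subspace(d, k, rng):
    """Return a d x k matrix whose orthonormal columns span a random subspace."""
    q, _ = np.linalg.qr(rng.standard_normal((d, k)))
    return q

def reconstruction_loss(idea, basis):
    """Squared error after projecting `idea` onto span(basis) and mapping back."""
    coords = basis.T @ idea    # projection: d -> k coordinates
    restored = basis @ coords  # reconstruction: k -> d
    return float(np.sum((idea - restored) ** 2))

# Two hypothetical "languages", each a k-dimensional subspace of the idea space.
lang_a = random_subspace(d, k, rng)
lang_b = random_subspace(d, k, rng)

# An idea that lies mostly inside language A's subspace, plus a little noise.
idea = lang_a @ rng.standard_normal(k) + 0.05 * rng.standard_normal(d)

loss_a = reconstruction_loss(idea, lang_a)
loss_b = reconstruction_loss(idea, lang_b)
print(f"loss in A: {loss_a:.3f}, loss in B: {loss_b:.3f}")
```

In this toy setting the idea is "easier to express" in language A in exactly the sense described above: projecting onto A's subspace loses less of it than projecting onto B's.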

This project attempts to explore the idea of identifying and separating lexical subspaces.

Experiments

Live documents with the results (updated irregularly, whenever I make some progress)

PCA subspace identification (Qwen, mGPT)
Language classifier based on hidden spaces (Qwen)
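To give a feel for what the two experiments listed above are about, here is a minimal sketch on synthetic data. It does not reproduce the notebooks: random vectors stand in for model hidden states, the language labels `"en"` and `"ru"` and all sizes are assumptions, and the classifier (pick the language whose PCA subspace leaves the smallest residual) is a simple stand-in chosen for the illustration, not necessarily the method used in the experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 32, 3, 200  # assumed hidden size, subspace rank, samples per language

def make_states(rng, n, d, k):
    """Synthetic 'hidden states': a per-language mean, rank-k variation, and noise."""
    mean = rng.standard_normal(d)
    basis, _ = np.linalg.qr(rng.standard_normal((d, k)))
    variation = 3.0 * (rng.standard_normal((n, k)) @ basis.T)
    return mean + variation + 0.1 * rng.standard_normal((n, d))

def fit_pca(x, k):
    """Return the mean and the top-k principal directions (as rows) of x."""
    mean = x.mean(axis=0)
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:k]

def residual(x, mean, components):
    """Distance from each row of x to the affine subspace mean + span(components)."""
    centered = x - mean
    projected = centered @ components.T @ components
    return np.linalg.norm(centered - projected, axis=1)

states = {lang: make_states(rng, n, d, k) for lang in ("en", "ru")}
train = {lang: x[: n // 2] for lang, x in states.items()}
held_out = {lang: x[n // 2:] for lang, x in states.items()}

# One PCA subspace per language, fitted on the training half only.
models = {lang: fit_pca(x, k) for lang, x in train.items()}

def classify(x):
    """Assign each row of x to the language whose subspace fits it best."""
    langs = list(models)
    res = np.stack([residual(x, *models[lang]) for lang in langs], axis=1)
    return [langs[i] for i in res.argmin(axis=1)]

accuracy = float(np.mean(
    [pred == lang for lang, x in held_out.items() for pred in classify(x)]
))
print(f"held-out accuracy: {accuracy:.2f}")
```

With real models the hidden states of different languages overlap far more than these synthetic clusters do, so the sketch shows only the mechanics, not what results to expect.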

How to run

Install uv
Run uv sync
Start a Jupyter notebook with one of the experiments

aigoncharov/lexical-reflections
