A simple in-memory inverted index in Python

Inverted Index

A simple in-memory inverted index with a modest query language.

```python
import inverted_index

i = inverted_index.Index()
i.index(1, "this is the day they give babies away with half a pound of tea")
# Indexing document 1 again adds these tokens to it (the AND query below relies on this).
i.index(1, "if you know any ladies who need any babies just send them round to ")
i.index(2, "babies are born in the circle of the sun")

results, err = i.query("babies")
print(results)  # {1, 2}

results, err = i.query("babies AND ladies")
print(results)  # {1}

# A custom tokenizer that lowercases, so "BABIES" is indexed as "babies".
i.index(3, "WHERE ARE THE BABIES", tokenizer=lambda s: s.lower().split())
results, err = i.query("babies")
print(results)  # {1, 2, 3}

# Remove document 3 from the index entirely.
i.unindex(3)
results, err = i.query("babies")
print(results)  # {1, 2}
```
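To make the example's behaviour concrete, here is a minimal sketch of the data structure an in-memory inverted index like this might use: a dictionary mapping each token to the set of documents that contain it, plus a reverse map so a document can be removed cheaply. It is an illustration under those assumptions, not this package's implementation; the class name MiniIndex and its lookup method are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable, Hashable, Iterable


class MiniIndex:
    """Hypothetical sketch of an in-memory inverted index (not the package's code)."""

    def __init__(self) -> None:
        # token -> set of documents containing that token (the "postings")
        self.postings: dict[str, set[Hashable]] = defaultdict(set)
        # document -> set of tokens indexed for it, used by unindex()
        self.doc_tokens: dict[Hashable, set[str]] = defaultdict(set)

    def index(self, doc: Hashable, text: str,
              tokenizer: Callable[[str], Iterable[str]] = str.split) -> None:
        # Tokens accumulate, so indexing the same document twice extends it.
        for token in tokenizer(text):
            self.postings[token].add(doc)
            self.doc_tokens[doc].add(token)

    def unindex(self, doc: Hashable) -> None:
        # Drop the document from every posting set it appears in,
        # deleting posting sets that become empty.
        for token in self.doc_tokens.pop(doc, set()):
            self.postings[token].discard(doc)
            if not self.postings[token]:
                del self.postings[token]

    def lookup(self, token: str) -> set[Hashable]:
        # Return a copy so callers cannot mutate the index by accident.
        return set(self.postings.get(token, set()))


idx = MiniIndex()
idx.index(1, "this is the day they give babies away with half a pound of tea")
idx.index(1, "if you know any ladies who need any babies")  # extends document 1
idx.index(2, "babies are born in the circle of the sun")
idx.index(3, "WHERE ARE THE BABIES", tokenizer=lambda s: s.lower().split())
print(idx.lookup("babies"))  # {1, 2, 3}
print(idx.lookup("ladies"))  # {1}
idx.unindex(3)
print(idx.lookup("babies"))  # {1, 2}
```

Keeping the reverse map costs extra memory, but it means removing a document only touches the posting sets that actually contain it instead of scanning every token.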

Any hashable object can serve as a document, and a custom tokenizer can be passed to control how the indexed text is split into tokens. The add_token and add_tokens methods index individual tokens directly, without going through a tokenizer.
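Why documents must be hashable, and what indexing tokens directly amounts to, can be sketched with a plain dictionary of sets. The functions index_token and index_tokens below are hypothetical standalone analogues of add_token and add_tokens; their signatures are assumptions, not the package's API.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Iterable, Set

# Hypothetical postings map: token -> set of documents containing it.
PostingsMap = DefaultDict[str, Set[Hashable]]


def index_token(postings: PostingsMap, doc: Hashable, token: str) -> None:
    """Index a single token for `doc`; no tokenizer is involved."""
    # Documents are stored in a set, which is why they must be hashable.
    postings[token].add(doc)


def index_tokens(postings: PostingsMap, doc: Hashable, tokens: Iterable[str]) -> None:
    """Index several already-split tokens for `doc`."""
    for token in tokens:
        index_token(postings, doc, token)


postings: PostingsMap = defaultdict(set)
# Any hashable value works as a document; here a tuple and a string.
index_tokens(postings, ("songs", 7), ["babies", "tea"])
index_token(postings, "doc-a", "babies")
print(("songs", 7) in postings["babies"])  # True

# A list is not hashable, so it cannot be used as a document.
try:
    index_token(postings, ["not", "hashable"], "babies")
except TypeError as exc:
    print("rejected:", exc)
```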

The query language is deliberately simple: it understands AND, OR, NOT, and parentheses. For example:

```
term OR term
term AND term OR term
(term AND term) OR term
NOT term
NOT term AND (term OR term)
```

AND, OR, and NOT all have equal precedence (unlike many query languages, AND does not bind more tightly than OR), so use parentheses to make the intended grouping explicit.
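Because all three operators share one precedence level, a query like this can be evaluated in a single left-to-right pass, recursing at each opening parenthesis. The sketch below assumes set semantics (AND as intersection, OR as union, NOT as the complement against all indexed documents) and that NOT applies to the operand immediately after it. The names tokenize_query and evaluate_query are hypothetical, and the package's own parser and error reporting may differ.

```python
import re
from typing import Dict, Hashable, List, Set


def tokenize_query(query: str) -> List[str]:
    # Parentheses become their own tokens; everything else splits on whitespace.
    return re.findall(r"\(|\)|[^\s()]+", query)


def evaluate_query(query: str, postings: Dict[str, Set[Hashable]],
                   all_docs: Set[Hashable]) -> Set[Hashable]:
    tokens = tokenize_query(query)
    pos = 0

    def operand() -> Set[Hashable]:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("query ends where an operand was expected")
        tok = tokens[pos]
        pos += 1
        if tok == "NOT":
            # NOT applies only to the single operand that follows it.
            return all_docs - operand()
        if tok == "(":
            result = expression()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return result
        if tok in ("AND", "OR", ")"):
            raise ValueError(f"unexpected {tok!r}")
        # A plain term: the documents containing it, or nothing.
        return set(postings.get(tok, set()))

    def expression() -> Set[Hashable]:
        nonlocal pos
        result = operand()
        # Equal precedence: fold binary operators strictly left to right.
        while pos < len(tokens) and tokens[pos] in ("AND", "OR"):
            op = tokens[pos]
            pos += 1
            rhs = operand()
            result = result & rhs if op == "AND" else result | rhs
        return result

    result = expression()
    if pos != len(tokens):
        raise ValueError(f"unexpected {tokens[pos]!r}")
    return result


# Postings for a few tokens from documents 1 and 2 of the example above.
postings = {"babies": {1, 2}, "ladies": {1}, "tea": {1}, "sun": {2}}
all_docs = {1, 2}
print(evaluate_query("babies AND ladies", postings, all_docs))        # {1}
print(evaluate_query("NOT ladies", postings, all_docs))               # {2}
print(evaluate_query("sun OR tea AND ladies", postings, all_docs))    # {1}
print(evaluate_query("sun OR (tea AND ladies)", postings, all_docs))  # {1, 2}
```

The last two queries show why the parentheses matter: evaluated left to right, "sun OR tea AND ladies" means (sun OR tea) AND ladies, which is not what a reader used to AND binding more tightly would expect.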

I'm pretty sure you don't want to use this in production code :)