Drop Python 2.7 support
Require Python >= 3.5
Drop references from classifiers
Remove six + future compatibility code
Remove 2.7 from tox.ini
Remove importing of six
Gitignore .eggs
Lint fix
Add lint make recipe
Remove 2.7 builds from Travis
Yeray Diaz Diaz authored and yeraydiazdiaz committed Jun 13, 2020
1 parent 0634322 commit 1aff2ac
Showing 38 changed files with 64 additions and 150 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -1,6 +1,7 @@
__pycache__/
*.pyc
*.egg-info/
.eggs/
.coverage
coverage.xml
htmlcov/
2 changes: 0 additions & 2 deletions .travis.yml
@@ -2,10 +2,8 @@ sudo: false
dist: trusty
language: python
python:
- "2.7"
- "3.5"
- "3.6"
- "pypy2.7-5.8.0"
- "pypy3.5"

before_install:
4 changes: 4 additions & 0 deletions Makefile
@@ -43,3 +43,7 @@ release-pypi: package
read ans && \
[ $${ans:-N} = y ] && \
twine upload dist/*

lint:
flake8 lunr tests
black lunr tests
57 changes: 43 additions & 14 deletions README.md
@@ -11,40 +11,68 @@ A Python implementation of [Lunr.js](https://lunrjs.com) by [Oliver Nightingale]

> A bit like Solr, but much smaller and not as bright.
This Python version of Lunr.js aims to bring the simple and powerful full text search capabilities into Python guaranteeing results as close as the original implementation as possible.
This Python version of Lunr.js aims to bring the simple and powerful full text search
capabilities into Python guaranteeing results as close as the original
implementation as possible.

- [Documentation](http://lunr.readthedocs.io/en/latest/)

## What does this even do?

Lunr is a simple full text search solution for situations where deploying a full scale solution like Elasticsearch isn't possible, viable or you're simply prototyping.
Lunr is a simple full text search solution for situations where deploying a full
scale solution like Elasticsearch isn't possible, viable or you're simply prototyping.
Lunr parses a set of documents and creates an inverted index for quick full text
searches in the same way other more complicated solution.

Lunr parses a set of documents and creates an inverted index for quick full text searches.
The trade-off is that Lunr keeps the inverted index in memory and requires you
to recreate or read the index at the start of your application.

The typical use case is to integrate Lunr in a web application, an example would be the [MkDocs documentation library](http://www.mkdocs.org/). In order to do this, you'd integrate [Lunr.js](https://lunrjs.com) in the Javascript code of your application, which will need to fetch and parse a JSON of your documents and create the index at startup of your application. Depending on the size of your document set this can take some time and potentially block the browser's main thread.
## Interoperability with Lunr.js

Lunr.py provides a backend solution, allowing you to parse the documents ahead of time and create a Lunr.js compatible index you can pass have the browser version read, minimizing start up time of your application.
A core objective of Lunr.py is to provide interoperability with the JavaScript version.

Of course you could also use Lunr.py to power full text search in desktop applications or backend services to search on your documents mimicking Elasticsearch.
An example can be found in the [MkDocs documentation library](http://www.mkdocs.org/).
MkDocs produces a set of documents from the pages of the documentation and uses
[Lunr.js](https://lunrjs.com) in the frontend to power its built-in searching
engine. This set of documents is in the form of a JSON file which needs to be
fetched and parsed by Lunr.js to create the inverted index at startup of your application.

While this is not a problem for most sites, depending on the size of your document
set, this can take some time.

Lunr.py provides a backend solution, allowing you to parse the documents in Python
of time and create a serialized Lunr.js index you can pass have the browser
version read, minimizing start up time of your application.

## Installation

Simply `pip install lunr` for the english only, best compatibility with Lunr.js version.
`pip install lunr`

An optional and experimental support for other languages via the [Natural Language Toolkit](http://www.nltk.org/) stemmers is also available via `pip install lunr[languages]`. Please refer to the [documentation page on languages](https://lunr.readthedocs.io/en/latest/languages/) for more information.
An optional and experimental support for other languages thanks to the
[Natural Language Toolkit](http://www.nltk.org/) stemmers is also available via
`pip install lunr[languages]`. The usage of the language feature is subject to
[NTLK corpus licensing clauses](https://github.com/nltk/nltk#redistributing).

Please refer to the
[documentation page on languages](https://lunr.readthedocs.io/en/latest/languages/)
for more information.

## Current state

Each version of lunr.py [targets a specific version of lunr.js](https://github.com/yeraydiazdiaz/lunr.py/blob/master/lunr/__init__.py#L12) and produces the same results as it both in Python 2.7 and 3 for [non-trivial corpus of documents](https://github.com/yeraydiazdiaz/lunr.py/blob/master/tests/acceptance_tests/fixtures/mkdocs_index.json).

Lunr.py also serializes `Index` instances respecting the [`lunr-schema`](https://github.com/olivernn/lunr-schema) which are consumable by Lunr.js and viceversa.
Each version of lunr.py
[targets a specific version of lunr.js](https://github.com/yeraydiazdiaz/lunr.py/blob/master/lunr/__init__.py#L12)
and produces the same results as it both in Python 2.7 and 3 for
[non-trivial corpus of documents](https://github.com/yeraydiazdiaz/lunr.py/blob/master/tests/acceptance_tests/fixtures/mkdocs_index.json).

The API is in alpha stage and likely to change.
Lunr.py also serializes `Index` instances respecting the
[`lunr-schema`](https://github.com/olivernn/lunr-schema) which are consumable by
Lunr.js and viceversa.

## Usage

You'll need a list of dicts representing the documents you want to search on. These documents must have a unique field which will serve as a reference and a series of fields you'd like to search on.
First, you'll need a list of dicts representing the documents you want to search on.
These documents must have a unique field which will serve as a reference and a
series of fields you'd like to search on.

Lunr provides a convenience `lunr` function to quickly index this set of documents:

@@ -69,4 +97,5 @@ Lunr provides a convenience `lunr` function to quickly index this set of documen
[{'ref': 'b', 'score': 0.23576799568081389, 'match_data': <MatchData "studi">}, {'ref': 'a', 'score': 0.2236629211724517, 'match_data': <MatchData "studi">}]
```

Please refer to the [documentation](http://lunr.readthedocs.io/en/latest/) for more usage examples.
Please refer to the [documentation](http://lunr.readthedocs.io/en/latest/)
for more usage examples.
2 changes: 0 additions & 2 deletions lunr/__init__.py
@@ -1,5 +1,3 @@
from __future__ import unicode_literals

import logging

from lunr.__main__ import lunr
7 changes: 1 addition & 6 deletions lunr/__main__.py
@@ -1,8 +1,3 @@
from __future__ import unicode_literals


from past.builtins import basestring

from lunr import languages as lang
from lunr.builder import Builder
from lunr.stemmer import stemmer
@@ -31,7 +26,7 @@ def lunr(ref, fields, documents, languages=None):
Index: The populated Index ready to search against.
"""
if languages is not None and lang.LANGUAGE_SUPPORT:
if isinstance(languages, basestring):
if isinstance(languages, str):
languages = [languages]

unsupported_languages = set(languages) - set(lang.SUPPORTED_LANGUAGES)
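
With `basestring` gone, a plain `isinstance(..., str)` check is enough on Python 3, where all text is `str`. A minimal sketch of the same normalize-then-validate pattern follows; the `SUPPORTED` set and the `normalize_languages` helper are hypothetical stand-ins for illustration, not lunr.py's actual names or language list.

```python
SUPPORTED = {"en", "es", "fr"}  # hypothetical; not lunr.py's real list


def normalize_languages(languages):
    """Accept a single language code or an iterable of codes."""
    # On Python 3 every text value is `str`, so no `basestring` shim is needed.
    if isinstance(languages, str):
        languages = [languages]
    unsupported = set(languages) - SUPPORTED
    if unsupported:
        raise ValueError("Unsupported: " + ", ".join(sorted(unsupported)))
    return list(languages)


single = normalize_languages("en")
several = normalize_languages(["en", "fr"])
```

Wrapping a lone string in a list first means the set difference that follows never iterates over the characters of the string by accident.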
3 changes: 0 additions & 3 deletions lunr/builder.py
@@ -1,8 +1,5 @@
from __future__ import unicode_literals, division

from collections import defaultdict

from builtins import str, dict # noqa

from lunr.pipeline import Pipeline
from lunr.tokenizer import Tokenizer
3 changes: 0 additions & 3 deletions lunr/exceptions.py
@@ -1,6 +1,3 @@
from __future__ import unicode_literals


class BaseLunrException(Exception):
pass

5 changes: 0 additions & 5 deletions lunr/field_ref.py
@@ -1,11 +1,6 @@
from __future__ import unicode_literals

import six

from lunr.exceptions import BaseLunrException


@six.python_2_unicode_compatible
class FieldRef:

JOINER = "/"
2 changes: 0 additions & 2 deletions lunr/idf.py
@@ -1,5 +1,3 @@
from __future__ import unicode_literals

import math


7 changes: 1 addition & 6 deletions lunr/index.py
@@ -1,12 +1,7 @@
from __future__ import unicode_literals

from collections import defaultdict
import json
import logging

from builtins import str, dict # noqa
from past.builtins import basestring

from lunr.exceptions import BaseLunrException
from lunr.field_ref import FieldRef
from lunr.match_data import MatchData
@@ -341,7 +336,7 @@ def load(cls, serialized_index):
"""Load a serialized index"""
from lunr import __TARGET_JS_VERSION__

if isinstance(serialized_index, basestring):
if isinstance(serialized_index, str):
serialized_index = json.loads(serialized_index)

if serialized_index["version"] != __TARGET_JS_VERSION__:
1 change: 0 additions & 1 deletion lunr/languages/__init__.py
@@ -1,4 +1,3 @@
from __future__ import unicode_literals
from itertools import chain
from functools import partial

2 changes: 0 additions & 2 deletions lunr/languages/trimmer.py
@@ -1,5 +1,3 @@
from __future__ import unicode_literals

import re


2 changes: 0 additions & 2 deletions lunr/match_data.py
@@ -1,5 +1,3 @@
from __future__ import unicode_literals

from copy import deepcopy


9 changes: 2 additions & 7 deletions lunr/pipeline.py
@@ -1,10 +1,5 @@
from __future__ import unicode_literals

import logging

from builtins import str
import six

from lunr.exceptions import BaseLunrException
from lunr.token import Token

@@ -86,7 +81,7 @@ def after(self, existing_fn, new_fn):
index = self._stack.index(existing_fn)
self._stack.insert(index + 1, new_fn)
except ValueError as e:
six.raise_from(BaseLunrException("Cannot find existing_fn"), e)
raise BaseLunrException("Cannot find existing_fn") from e

def before(self, existing_fn, new_fn):
"""Adds a single function before a function that already exists in the
@@ -98,7 +93,7 @@ def before(self, existing_fn, new_fn):
index = self._stack.index(existing_fn)
self._stack.insert(index, new_fn)
except ValueError as e:
six.raise_from(BaseLunrException("Cannot find existing_fn"), e)
raise BaseLunrException("Cannot find existing_fn") from e

def remove(self, fn):
"""Removes a function from the pipeline."""
9 changes: 3 additions & 6 deletions lunr/query.py
@@ -1,6 +1,3 @@
from __future__ import unicode_literals


from enum import Enum


@@ -12,7 +9,7 @@ class QueryPresence(Enum):
PROHIBITED = 3 # documents that contain this term will not be returned


class Query(object):
class Query:
"""A `lunr.Query` provides a programmatic way of defining queries to be
performed against a `lunr.Index`.
@@ -105,7 +102,7 @@ def is_negated(self):
)


class Clause(object):
class Clause:
"""A single clause in a `lunr.Query` contains a term and details on
how to match that term against a `lunr.Index`
@@ -135,7 +132,7 @@ def __init__(
wildcard=Query.WILDCARD_NONE,
presence=QueryPresence.OPTIONAL,
):
super(Clause, self).__init__()
super().__init__()
self.term = term
self.fields = fields or []
self.edit_distance = edit_distance
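
Both simplifications in this file rely on Python 3 semantics: every class is a new-style class without naming `object`, and `super()` needs no arguments inside a method. A small sketch with hypothetical `Base` and `Child` classes (not lunr.py's own):

```python
class Base:
    def __init__(self):
        self.initialized = True


class Child(Base):  # no explicit `object` base is needed on Python 3
    def __init__(self, term):
        super().__init__()  # equivalent to super(Child, self).__init__()
        self.term = term


clause = Child("studi")
print(clause.initialized, clause.term)  # True studi
print(Child.__mro__[-1] is object)  # True
```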
2 changes: 0 additions & 2 deletions lunr/query_lexer.py
@@ -1,5 +1,3 @@
from __future__ import unicode_literals

from lunr.tokenizer import default_separator


8 changes: 2 additions & 6 deletions lunr/query_parser.py
@@ -1,7 +1,3 @@
from __future__ import unicode_literals

import six

from lunr.query_lexer import QueryLexer
from lunr.query import Clause, QueryPresence
from lunr.exceptions import QueryParseError
@@ -132,7 +128,7 @@ def parse_edit_distance(cls, parser):
try:
edit_distance = int(lexeme["string"])
except ValueError as e:
six.raise_from(QueryParseError("Edit distance must be numeric"), e)
raise QueryParseError("Edit distance must be numeric") from e

parser.current_clause.edit_distance = edit_distance

@@ -145,7 +141,7 @@ def parse_boost(cls, parser):
try:
boost = int(lexeme["string"])
except ValueError as e:
six.raise_from(QueryParseError("Boost must be numeric"), e)
raise QueryParseError("Boost must be numeric") from e

parser.current_clause.boost = boost

2 changes: 0 additions & 2 deletions lunr/stemmer.py
@@ -1,5 +1,3 @@
from __future__ import unicode_literals

"""
Implementation of Porter Stemming Algorithm from
https://tartarus.org/martin/PorterStemmer/python.txt
4 changes: 0 additions & 4 deletions lunr/stop_word_filter.py
@@ -1,7 +1,3 @@
from __future__ import unicode_literals

from builtins import str

from lunr.pipeline import Pipeline

WORDS = {
8 changes: 0 additions & 8 deletions lunr/token.py
@@ -1,11 +1,3 @@
from __future__ import unicode_literals

from builtins import str

import six


@six.python_2_unicode_compatible
class Token:
def __init__(self, string="", metadata=None):
self.string = string
8 changes: 0 additions & 8 deletions lunr/token_set.py
@@ -1,11 +1,3 @@
from __future__ import unicode_literals

from builtins import str

import six


@six.python_2_unicode_compatible
class TokenSet:
"""
A token set is used to store the unique list of all tokens
4 changes: 0 additions & 4 deletions lunr/token_set_builder.py
@@ -1,7 +1,3 @@
from __future__ import unicode_literals

from builtins import str

from lunr.token_set import TokenSet
from lunr.exceptions import BaseLunrException

3 changes: 0 additions & 3 deletions lunr/tokenizer.py
@@ -1,6 +1,3 @@
from __future__ import unicode_literals

from builtins import str
from copy import deepcopy

from lunr.token import Token
2 changes: 0 additions & 2 deletions lunr/trimmer.py
@@ -1,5 +1,3 @@
from __future__ import unicode_literals

import re

from lunr.pipeline import Pipeline
5 changes: 0 additions & 5 deletions lunr/utils.py
@@ -1,8 +1,3 @@
from __future__ import unicode_literals

from builtins import str


def as_string(obj):
return "" if not obj else str(obj)

2 changes: 0 additions & 2 deletions lunr/vector.py
@@ -1,5 +1,3 @@
from __future__ import unicode_literals, division

from math import sqrt

from lunr.exceptions import BaseLunrException