Drop Python 2.7 support

- Require Python >= 3.5
- Drop references from classifiers
- Remove six + future compatibility code
- Remove 2.7 from tox.ini
- Remove importing of six
- Gitignore .eggs
- Lint fix
- Add lint make recipe
- Remove 2.7 builds from Travis
yeraydiazdiaz · Jun 13, 2020 · 1aff2ac
1 parent 0634322

Showing 38 changed files with 64 additions and 150 deletions.
diff --git a/.gitignore b/.gitignore
@@ -1,6 +1,7 @@
 __pycache__/
 *.pyc
 *.egg-info/
+.eggs/
 .coverage
 coverage.xml
 htmlcov/

diff --git a/.travis.yml b/.travis.yml
@@ -2,10 +2,8 @@ sudo: false
 dist: trusty
 language: python
 python:
-  - "2.7"
   - "3.5"
   - "3.6"
-  - "pypy2.7-5.8.0"
   - "pypy3.5"
 
 before_install:

diff --git a/Makefile b/Makefile
@@ -43,3 +43,7 @@ release-pypi: package
 		read ans && \
 		[ $${ans:-N} = y ] && \
 		twine upload dist/*
+
+lint:
+	flake8 lunr tests
+	black lunr tests
diff --git a/README.md b/README.md
@@ -11,40 +11,68 @@ A Python implementation of [Lunr.js](https://lunrjs.com) by [Oliver Nightingale]
 
 > A bit like Solr, but much smaller and not as bright.
 
-This Python version of Lunr.js aims to bring the simple and powerful full text search capabilities into Python guaranteeing results as close as the original implementation as possible.
+This Python version of Lunr.js aims to bring the simple and powerful full text search
+capabilities into Python, guaranteeing results as close to the original
+implementation as possible.
 
 - [Documentation](http://lunr.readthedocs.io/en/latest/)
 
 ## What does this even do?
 
-Lunr is a simple full text search solution for situations where deploying a full scale solution like Elasticsearch isn't possible, viable or you're simply prototyping.
+Lunr is a simple full text search solution for situations where deploying a full
+scale solution like Elasticsearch isn't possible, viable or you're simply prototyping.
+Lunr parses a set of documents and creates an inverted index for quick full text
+searches in the same way other more complicated solutions do.
 
-Lunr parses a set of documents and creates an inverted index for quick full text searches.
+The trade-off is that Lunr keeps the inverted index in memory and requires you
+to recreate or read the index at the start of your application.
 
-The typical use case is to integrate Lunr in a web application, an example would be the [MkDocs documentation library](http://www.mkdocs.org/). In order to do this, you'd integrate [Lunr.js](https://lunrjs.com) in the Javascript code of your application, which will need to fetch and parse a JSON of your documents and create the index at startup of your application. Depending on the size of your document set this can take some time and potentially block the browser's main thread.
+## Interoperability with Lunr.js
 
-Lunr.py provides a backend solution, allowing you to parse the documents ahead of time and create a Lunr.js compatible index you can pass have the browser version read, minimizing start up time of your application.
+A core objective of Lunr.py is to provide interoperability with the JavaScript version.
 
-Of course you could also use Lunr.py to power full text search in desktop applications or backend services to search on your documents mimicking Elasticsearch.
+An example can be found in the [MkDocs documentation library](http://www.mkdocs.org/).
+MkDocs produces a set of documents from the pages of the documentation and uses
+[Lunr.js](https://lunrjs.com) in the frontend to power its built-in searching
+engine. This set of documents is in the form of a JSON file which needs to be
+fetched and parsed by Lunr.js to create the inverted index at startup of your application.
+
+While this is not a problem for most sites, depending on the size of your document
+set, this can take some time.
+
+Lunr.py provides a backend solution, allowing you to parse the documents in Python
+ahead of time and create a serialized Lunr.js index you can have the browser
+version read, minimizing start up time of your application.
 
 ## Installation
 
-Simply `pip install lunr` for the english only, best compatibility with Lunr.js version.
+`pip install lunr`
 
-An optional and experimental support for other languages via the [Natural Language Toolkit](http://www.nltk.org/) stemmers is also available via `pip install lunr[languages]`. Please refer to the [documentation page on languages](https://lunr.readthedocs.io/en/latest/languages/) for more information.
+Optional and experimental support for other languages, thanks to the
+[Natural Language Toolkit](http://www.nltk.org/) stemmers, is also available via
+`pip install lunr[languages]`. The usage of the language feature is subject to
+[NLTK corpus licensing clauses](https://github.com/nltk/nltk#redistributing).
 
+Please refer to the
+[documentation page on languages](https://lunr.readthedocs.io/en/latest/languages/)
+for more information.
 
 ## Current state
 
-Each version of lunr.py [targets a specific version of lunr.js](https://github.com/yeraydiazdiaz/lunr.py/blob/master/lunr/__init__.py#L12) and produces the same results as it both in Python 2.7 and 3 for [non-trivial corpus of documents](https://github.com/yeraydiazdiaz/lunr.py/blob/master/tests/acceptance_tests/fixtures/mkdocs_index.json).
-
-Lunr.py also serializes `Index` instances respecting the [`lunr-schema`](https://github.com/olivernn/lunr-schema) which are consumable by Lunr.js and viceversa.
+Each version of lunr.py
+[targets a specific version of lunr.js](https://github.com/yeraydiazdiaz/lunr.py/blob/master/lunr/__init__.py#L12)
+and produces the same results as that version on Python 3 for a
+[non-trivial corpus of documents](https://github.com/yeraydiazdiaz/lunr.py/blob/master/tests/acceptance_tests/fixtures/mkdocs_index.json).
 
-The API is in alpha stage and likely to change.
+Lunr.py also serializes `Index` instances respecting the
+[`lunr-schema`](https://github.com/olivernn/lunr-schema) which are consumable by
+Lunr.js and vice versa.
 
 ## Usage
 
-You'll need a list of dicts representing the documents you want to search on. These documents must have a unique field which will serve as a reference and a series of fields you'd like to search on.
+First, you'll need a list of dicts representing the documents you want to search on.
+These documents must have a unique field which will serve as a reference and a
+series of fields you'd like to search on.
 
 Lunr provides a convenience `lunr` function to quickly index this set of documents:
 
@@ -69,4 +97,5 @@ Lunr provides a convenience `lunr` function to quickly index this set of documen
 [{'ref': 'b', 'score': 0.23576799568081389, 'match_data': <MatchData "studi">}, {'ref': 'a', 'score': 0.2236629211724517, 'match_data': <MatchData "studi">}]
 ```
 
-Please refer to the [documentation](http://lunr.readthedocs.io/en/latest/) for more usage examples.
+Please refer to the [documentation](http://lunr.readthedocs.io/en/latest/)
+for more usage examples.
diff --git a/lunr/__init__.py b/lunr/__init__.py
@@ -1,5 +1,3 @@
-from __future__ import unicode_literals
-
 import logging
 
 from lunr.__main__ import lunr

diff --git a/lunr/__main__.py b/lunr/__main__.py
@@ -1,8 +1,3 @@
-from __future__ import unicode_literals
-
-
-from past.builtins import basestring
-
 from lunr import languages as lang
 from lunr.builder import Builder
 from lunr.stemmer import stemmer
@@ -31,7 +26,7 @@ def lunr(ref, fields, documents, languages=None):
         Index: The populated Index ready to search against.
     """
     if languages is not None and lang.LANGUAGE_SUPPORT:
-        if isinstance(languages, basestring):
+        if isinstance(languages, str):
             languages = [languages]
 
         unsupported_languages = set(languages) - set(lang.SUPPORTED_LANGUAGES)
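
Note on the `basestring` change: on Python 3 every text value is a `str`, so a plain `isinstance(..., str)` check covers what `past.builtins.basestring` used to. A minimal sketch of the normalization this hunk performs, assuming a hypothetical helper name `normalize_languages` that is not part of lunr.py:

```python
def normalize_languages(languages):
    """Return a list of language codes from one code or an iterable of codes."""
    # Python 3 has no separate `unicode` type, so `str` is the only text type to check.
    if isinstance(languages, str):
        return [languages]
    return list(languages)
```

Wrapping a lone string in a list matters here because `set("en")` would otherwise split the code into single characters.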

diff --git a/lunr/builder.py b/lunr/builder.py
@@ -1,8 +1,5 @@
-from __future__ import unicode_literals, division
-
 from collections import defaultdict
 
-from builtins import str, dict  # noqa
 
 from lunr.pipeline import Pipeline
 from lunr.tokenizer import Tokenizer

diff --git a/lunr/exceptions.py b/lunr/exceptions.py
@@ -1,6 +1,3 @@
-from __future__ import unicode_literals
-
-
 class BaseLunrException(Exception):
     pass
 

diff --git a/lunr/field_ref.py b/lunr/field_ref.py
@@ -1,11 +1,6 @@
-from __future__ import unicode_literals
-
-import six
-
 from lunr.exceptions import BaseLunrException
 
 
-@six.python_2_unicode_compatible
 class FieldRef:
 
     JOINER = "/"

diff --git a/lunr/idf.py b/lunr/idf.py
@@ -1,5 +1,3 @@
-from __future__ import unicode_literals
-
 import math
 
 

diff --git a/lunr/index.py b/lunr/index.py
@@ -1,12 +1,7 @@
-from __future__ import unicode_literals
-
 from collections import defaultdict
 import json
 import logging
 
-from builtins import str, dict  # noqa
-from past.builtins import basestring
-
 from lunr.exceptions import BaseLunrException
 from lunr.field_ref import FieldRef
 from lunr.match_data import MatchData
@@ -341,7 +336,7 @@ def load(cls, serialized_index):
         """Load a serialized index"""
         from lunr import __TARGET_JS_VERSION__
 
-        if isinstance(serialized_index, basestring):
+        if isinstance(serialized_index, str):
             serialized_index = json.loads(serialized_index)
 
         if serialized_index["version"] != __TARGET_JS_VERSION__:

diff --git a/lunr/languages/__init__.py b/lunr/languages/__init__.py
@@ -1,4 +1,3 @@
-from __future__ import unicode_literals
 from itertools import chain
 from functools import partial
 

diff --git a/lunr/languages/trimmer.py b/lunr/languages/trimmer.py
@@ -1,5 +1,3 @@
-from __future__ import unicode_literals
-
 import re
 
 

diff --git a/lunr/match_data.py b/lunr/match_data.py
@@ -1,5 +1,3 @@
-from __future__ import unicode_literals
-
 from copy import deepcopy
 
 

diff --git a/lunr/pipeline.py b/lunr/pipeline.py
@@ -1,10 +1,5 @@
-from __future__ import unicode_literals
-
 import logging
 
-from builtins import str
-import six
-
 from lunr.exceptions import BaseLunrException
 from lunr.token import Token
 
@@ -86,7 +81,7 @@ def after(self, existing_fn, new_fn):
             index = self._stack.index(existing_fn)
             self._stack.insert(index + 1, new_fn)
         except ValueError as e:
-            six.raise_from(BaseLunrException("Cannot find existing_fn"), e)
+            raise BaseLunrException("Cannot find existing_fn") from e
 
     def before(self, existing_fn, new_fn):
         """Adds a single function before a function that already exists in the
@@ -98,7 +93,7 @@ def before(self, existing_fn, new_fn):
             index = self._stack.index(existing_fn)
             self._stack.insert(index, new_fn)
         except ValueError as e:
-            six.raise_from(BaseLunrException("Cannot find existing_fn"), e)
+            raise BaseLunrException("Cannot find existing_fn") from e
 
     def remove(self, fn):
         """Removes a function from the pipeline."""

diff --git a/lunr/query.py b/lunr/query.py
@@ -1,6 +1,3 @@
-from __future__ import unicode_literals
-
-
 from enum import Enum
 
 
@@ -12,7 +9,7 @@ class QueryPresence(Enum):
     PROHIBITED = 3  # documents that contain this term will not be returned
 
 
-class Query(object):
+class Query:
     """A `lunr.Query` provides a programmatic way of defining queries to be
     performed against a `lunr.Index`.
 
@@ -105,7 +102,7 @@ def is_negated(self):
         )
 
 
-class Clause(object):
+class Clause:
     """A single clause in a `lunr.Query` contains a term and details on
     how to match that term against a `lunr.Index`
 
@@ -135,7 +132,7 @@ def __init__(
         wildcard=Query.WILDCARD_NONE,
         presence=QueryPresence.OPTIONAL,
     ):
-        super(Clause, self).__init__()
+        super().__init__()
         self.term = term
         self.fields = fields or []
         self.edit_distance = edit_distance
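
Note on the class and `super()` changes: on Python 3 every class is new-style, so the explicit `object` base and the two-argument `super(Clause, self)` form are redundant. A stripped-down sketch, with the real `Clause` constructor reduced to a single assumed argument for illustration:

```python
class Clause:
    def __init__(self, term=None):
        # Zero-argument super() resolves the class and instance automatically.
        super().__init__()
        self.term = term


# The implicit base class is still `object`.
bases = Clause.__mro__
```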

diff --git a/lunr/query_lexer.py b/lunr/query_lexer.py
@@ -1,5 +1,3 @@
-from __future__ import unicode_literals
-
 from lunr.tokenizer import default_separator
 
 

diff --git a/lunr/query_parser.py b/lunr/query_parser.py
@@ -1,7 +1,3 @@
-from __future__ import unicode_literals
-
-import six
-
 from lunr.query_lexer import QueryLexer
 from lunr.query import Clause, QueryPresence
 from lunr.exceptions import QueryParseError
@@ -132,7 +128,7 @@ def parse_edit_distance(cls, parser):
         try:
             edit_distance = int(lexeme["string"])
         except ValueError as e:
-            six.raise_from(QueryParseError("Edit distance must be numeric"), e)
+            raise QueryParseError("Edit distance must be numeric") from e
 
         parser.current_clause.edit_distance = edit_distance
 
@@ -145,7 +141,7 @@ def parse_boost(cls, parser):
         try:
             boost = int(lexeme["string"])
         except ValueError as e:
-            six.raise_from(QueryParseError("Boost must be numeric"), e)
+            raise QueryParseError("Boost must be numeric") from e
 
         parser.current_clause.boost = boost
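
Note on the `six.raise_from` replacements here and in `lunr/pipeline.py`: the native `raise ... from e` statement chains exceptions the same way, storing the original error on `__cause__`. A self-contained sketch, where `QueryParseError` is a local stand-in for the class in `lunr.exceptions` and `parse_boost` is simplified to take a plain string:

```python
class QueryParseError(Exception):
    """Stand-in for lunr.exceptions.QueryParseError to keep the sketch self-contained."""


def parse_boost(text):
    try:
        return int(text)
    except ValueError as e:
        # Chain the original ValueError so it appears in the traceback.
        raise QueryParseError("Boost must be numeric") from e


try:
    parse_boost("abc")
except QueryParseError as err:
    cause = err.__cause__  # the ValueError raised by int()
```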
 

diff --git a/lunr/stemmer.py b/lunr/stemmer.py
@@ -1,5 +1,3 @@
-from __future__ import unicode_literals
-
 """
 Implementation of Porter Stemming Algorithm from
 https://tartarus.org/martin/PorterStemmer/python.txt

diff --git a/lunr/stop_word_filter.py b/lunr/stop_word_filter.py
@@ -1,7 +1,3 @@
-from __future__ import unicode_literals
-
-from builtins import str
-
 from lunr.pipeline import Pipeline
 
 WORDS = {

diff --git a/lunr/token.py b/lunr/token.py
@@ -1,11 +1,3 @@
-from __future__ import unicode_literals
-
-from builtins import str
-
-import six
-
-
-@six.python_2_unicode_compatible
 class Token:
     def __init__(self, string="", metadata=None):
         self.string = string
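
Note on dropping `@six.python_2_unicode_compatible`: the decorator only changes behavior on Python 2, where it moves a text-returning `__str__` to `__unicode__`; on Python 3 it returns the class unchanged. A sketch of why nothing else is needed, assuming for illustration that `__str__` simply returns the token string:

```python
class Token:
    def __init__(self, string="", metadata=None):
        self.string = string
        self.metadata = metadata or {}

    def __str__(self):
        # On Python 3, __str__ returns text (str) directly, non-ASCII included.
        return self.string
```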

diff --git a/lunr/token_set.py b/lunr/token_set.py
@@ -1,11 +1,3 @@
-from __future__ import unicode_literals
-
-from builtins import str
-
-import six
-
-
-@six.python_2_unicode_compatible
 class TokenSet:
     """
     A token set is used to store the unique list of all tokens

diff --git a/lunr/token_set_builder.py b/lunr/token_set_builder.py
@@ -1,7 +1,3 @@
-from __future__ import unicode_literals
-
-from builtins import str
-
 from lunr.token_set import TokenSet
 from lunr.exceptions import BaseLunrException
 

diff --git a/lunr/tokenizer.py b/lunr/tokenizer.py
@@ -1,6 +1,3 @@
-from __future__ import unicode_literals
-
-from builtins import str
 from copy import deepcopy
 
 from lunr.token import Token

diff --git a/lunr/trimmer.py b/lunr/trimmer.py
@@ -1,5 +1,3 @@
-from __future__ import unicode_literals
-
 import re
 
 from lunr.pipeline import Pipeline

diff --git a/lunr/utils.py b/lunr/utils.py
@@ -1,8 +1,3 @@
-from __future__ import unicode_literals
-
-from builtins import str
-
-
 def as_string(obj):
     return "" if not obj else str(obj)
 

diff --git a/lunr/vector.py b/lunr/vector.py
@@ -1,5 +1,3 @@
-from __future__ import unicode_literals, division
-
 from math import sqrt
 
 from lunr.exceptions import BaseLunrException
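
Note on the removed `from __future__ import unicode_literals, division` lines: both behaviors are the default on Python 3, so the imports were no-ops once 2.7 support was dropped. A small illustration using made-up values:

```python
# No __future__ imports: these are the defaults on any Python 3 interpreter.
half = 1 / 2       # true division, yields a float
floored = 1 // 2   # floor division has to be requested explicitly
label = "lunr"     # string literals are already text (str)
```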