Documentation for language support
Yeray Diaz Diaz committed Apr 15, 2018
1 parent c566cca commit 6536880
Showing 8 changed files with 78 additions and 5 deletions.
4 changes: 2 additions & 2 deletions CHANGELOG.md
@@ -1,8 +1,8 @@
# Changelog

-## 0.2.0 (Unreleased)
+## 0.2.0

-- Support for languages via NLTK
+- Experimental support for languages via NLTK. Currently supported languages are arabic, danish, dutch, english, finnish, french, german, hungarian, italian, norwegian, portuguese, romanian, russian, spanish and swedish. Note that compatibility with Lunr.js and lunr-languages is reduced.

## 0.1.2

10 changes: 10 additions & 0 deletions README.md
@@ -26,6 +26,16 @@ Lunr.py provides a backend solution, allowing you to parse the documents ahead o

Of course you could also use Lunr.py to power full text search in desktop applications or backend services to search on your documents mimicking Elasticsearch.

## Installation

Run `pip install lunr` for the English-only version, which has the best compatibility with Lunr.js.

Optional, experimental support for other languages via the [Natural Language Toolkit](http://www.nltk.org/) stemmers is also available with `pip install lunr[languages]`.

Supported languages are arabic, danish, dutch, english, finnish, french, german, hungarian, italian, norwegian, portuguese, romanian, russian, spanish and swedish.

Note that compatibility with Lunr.js is currently not guaranteed when using this experimental feature.

## Current state

Each version of lunr.py [targets a specific version of lunr.js](https://github.com/yeraydiazdiaz/lunr.py/blob/master/lunr/__init__.py#L12) and produces the same results in both Python 2.7 and 3 for a [non-trivial corpus of documents](https://github.com/yeraydiazdiaz/lunr.py/blob/master/tests/acceptance_tests/fixtures/mkdocs_index.json).
10 changes: 10 additions & 0 deletions docs/index.md
@@ -18,6 +18,16 @@ Lunr.py provides a backend solution, allowing you to parse the documents ahead o

Of course you could also use Lunr.py to power full text search in desktop applications or backend services to search on your documents mimicking Elasticsearch.

## Installation

Run `pip install lunr` for the English-only version, which has the best compatibility with Lunr.js.

Optional, experimental support for other languages via the [Natural Language Toolkit](http://www.nltk.org/) stemmers is also available with `pip install lunr[languages]`.

Supported languages are arabic, danish, dutch, english, finnish, french, german, hungarian, italian, norwegian, portuguese, romanian, russian, spanish and swedish.

Note that compatibility with Lunr.js is currently not guaranteed when using this experimental feature.

## Current state

Each version of lunr.py [targets a specific version of lunr.js](https://github.com/yeraydiazdiaz/lunr.py/blob/master/lunr/__init__.py#L12) and produces the same results in both Python 2.7 and 3 for a [non-trivial corpus of documents](https://github.com/yeraydiazdiaz/lunr.py/blob/master/tests/acceptance_tests/fixtures/mkdocs_index.json).
52 changes: 52 additions & 0 deletions docs/languages.md
@@ -0,0 +1,52 @@
# Language support

Lunr.py includes optional, experimental support for other languages via the [Natural Language Toolkit](http://www.nltk.org/) stemmers. To install Lunr with this feature use `pip install lunr[languages]`.
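
As a minimal sketch, an application could check at runtime whether the optional dependency is present before requesting a non-English language. This assumes the `languages` extra installs the `nltk` package; `HAS_LANGUAGE_SUPPORT` is a hypothetical name used only for illustration and is not part of Lunr.py:

```python
# Assumption: `pip install lunr[languages]` pulls in the `nltk` package.
try:
    import nltk  # noqa: F401 (imported only to detect availability)
    HAS_LANGUAGE_SUPPORT = True
except ImportError:
    HAS_LANGUAGE_SUPPORT = False

print("Language support available:", HAS_LANGUAGE_SUPPORT)
```

If the flag is `False`, falling back to the default English-only index is one reasonable option.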

Assuming you have a set of documents in one of the supported languages:

- arabic
- danish
- dutch
- english
- finnish
- french
- german
- hungarian
- italian
- norwegian
- portuguese
- romanian
- russian
- spanish
- swedish

```python
>>> documents = [
...     {
...         "id": "a",
...         "text": (
...             "Este es un ejemplo inventado de lo que sería un documento en el "
...             "idioma que se más se habla en España."),
...         "title": "Ejemplo de documento en español"
...     },
...     {
...         "id": "b",
...         "text": (
...             "Según un estudio que me acabo de inventar porque soy un experto en"
...             "idiomas que se hablan en España."),
...         "title": "Español es el tercer idioma más hablado del mundo"
...     },
... ]
```

Simply specify the [ISO-639-1 code](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) for the language of your documents as the `language` parameter to the `lunr` function:

```python
>>> from lunr import lunr
>>> idx = lunr('id', ['title', 'text'], documents, language='es')
>>> idx.search('inventando')
[{'ref': 'a', 'score': 0.1300928764641503, 'match_data': <MatchData "invent">},
{'ref': 'b', 'score': 0.08967151299297255, 'match_data': <MatchData "invent">}]
```
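
The ISO-639-1 codes for the supported languages listed above can be sketched as a lookup table. `SUPPORTED_LANGUAGES` and `language_name` are hypothetical names for illustration and are not part of the Lunr.py API; only the codes and language names come from the ISO-639-1 list:

```python
# Hypothetical helper: maps ISO-639-1 codes to the supported language names.
SUPPORTED_LANGUAGES = {
    "ar": "arabic",
    "da": "danish",
    "nl": "dutch",
    "en": "english",
    "fi": "finnish",
    "fr": "french",
    "de": "german",
    "hu": "hungarian",
    "it": "italian",
    "no": "norwegian",
    "pt": "portuguese",
    "ro": "romanian",
    "ru": "russian",
    "es": "spanish",
    "sv": "swedish",
}


def language_name(code):
    """Return the language name for an ISO-639-1 code, or raise ValueError."""
    try:
        return SUPPORTED_LANGUAGES[code.lower()]
    except KeyError:
        raise ValueError("Unsupported language code: {!r}".format(code))


print(language_name("es"))
```

Validating the code up front like this gives a clearer error than passing an unsupported value further down.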

Please note compatibility with Lunr.js might be affected when using this feature.
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -3,3 +3,4 @@ site_name: Lunr.py
pages:
- Home: index.md
- Searching: usage.md
- Languages: languages.md
1 change: 1 addition & 0 deletions requirements/dev.txt
@@ -2,3 +2,4 @@
twine==1.11.0
mkdocs==0.17.3
pytest-benchmark==3.1.1
wheel==0.31.0
1 change: 0 additions & 1 deletion requirements/packaging.txt

This file was deleted.

4 changes: 2 additions & 2 deletions setup.py
@@ -35,7 +35,7 @@ def find_version():
name='lunr',
version=find_version(),
url='https://github.com/yeraydiazdiaz/lunr.py',
-license='BSD',
+license='MIT',
description='A Python implementation of Lunr.js',
long_description=LONG_DESCRIPTION,
author='Yeray Diaz Diaz',
@@ -53,7 +53,7 @@ def find_version():
classifiers=[
'Development Status :: 3 - Alpha',
'Intended Audience :: Developers',
-'License :: OSI Approved :: BSD License',
+'License :: OSI Approved :: MIT License',
'Operating System :: OS Independent',
'Programming Language :: Python',
'Programming Language :: Python :: 2',