Sample app intended to illustrate how Natural Language Processing (NLP), specifically Named Entity Recognition (NER), can be used to improve the accuracy of Elasticsearch queries and the overall user experience. There are two key benefits to using NLP alongside Elasticsearch (or any other full-text search engine):
Given the query 'black jacket costing less than $200', we can infer the color and maximum price, and apply these search filters for the user. This concept can be extended to other fields (e.g. brand) and can also support conjunctions, e.g. 'black or dark green barbour jacket'.
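A minimal sketch of that extraction idea, using regexes as a stand-in for the real NER model (the color vocabulary and function name are made up for illustration):

```python
import re

# Hypothetical stand-in for the NER model: pull a maximum price and any
# known colours out of a free-text query using simple pattern matching.
KNOWN_COLORS = {"black", "green", "navy"}  # assumed vocabulary

def parse_query(text: str) -> dict:
    price_to = None
    m = re.search(r"(?:less than|under)\s*\$?(\d+)", text, re.IGNORECASE)
    if m:
        price_to = int(m.group(1))
    colors = [w for w in text.lower().split() if w in KNOWN_COLORS]
    return {"text": text, "price_to": price_to, "colors": colors}

print(parse_query("black jacket costing less than $200"))
# {'text': 'black jacket costing less than $200', 'price_to': 200, 'colors': ['black']}
```

The extracted values can then be turned directly into Elasticsearch filters rather than being matched as free text.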
Imagine you work for an outdoor clothing and equipment store, and you're building a catalog search feature. Given the query 'packable jacket', how should the database choose between a 'packable mosquito net' and a 'lightweight jacket'? Both products partially match. TF-IDF will most likely select the mosquito net, as there will be fewer instances of 'packable' than 'jacket' in the corpus. However, looking at the query it's clear that the lightweight jacket would be the better match.
We typically solve this problem by boosting certain document fields, e.g. by attaching more weight to the title or product-type fields than to the description. This sort of works, but the logic is wrong: we're essentially telling the shopper "based on what we sell, this is what we think is important to you".
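Field boosting typically looks something like this (a sketch; the field names and `^` weights are illustrative, not taken from this repo):

```python
# Illustrative Elasticsearch query body for field boosting: matches in
# title count three times as much as matches in description. The weights
# are arbitrary guesses about what shoppers care about.
boosted_query = {
    "query": {
        "multi_match": {
            "query": "packable jacket",
            "fields": ["title^3", "product_type^2", "description"],
        }
    }
}
```

Note how the weights encode assumptions about the catalog, not the shopper's intent.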
Humans understand that, given the query 'packable jacket', the shopper wants a jacket first and foremost. That's because we understand that 'jacket' is a product type and 'packable' is an attribute of the product. Natural Language Processing (NLP) allows us to apply this same reasoning programmatically. In simple terms, we can perform an Elasticsearch bool query in which we must have a match for 'jacket' and should have a match for 'packable'.
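In Elasticsearch terms, that reasoning translates to something like the following sketch (field names mirror the sample documents in this repo):

```python
# Sketch of the bool query: the product type is a hard requirement
# (must), while attributes like 'packable' only boost relevance (should).
bool_query = {
    "query": {
        "bool": {
            "must": [{"match": {"product_type": "jacket"}}],
            "should": [{"match": {"attrs": "packable"}}],
        }
    }
}
```

A mosquito net can no longer outrank a jacket, because only jackets satisfy the `must` clause.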
Firstly, and most importantly, this is not a production implementation. The NLP model used for this example is very basic. For production use we'd build something far more robust, trained on historic search data. We'd also employ Part-of-Speech Tagging along with Dependency Parsing to get a better understanding of the sentences and fragments of text.
Secondly, the Elasticsearch code is very basic. For production use we'd want custom tokenizers, analysers and synonyms. Of course, we'd also have many more fields and lots more documents.
Finally, there's no error handling!
So please treat this in the spirit in which it was created - a proof of concept!
- Setup your environment
- Fire up an Elasticsearch instance
- Create the index and mapping
- Import some test data
- Fire up a simple webserver to handle search queries
- Cleanup
The Python code needs a 3.9.7+ environment. I recommend running it in a virtualenv using either venv or pyenv/virtualenv:
$ pyenv install 3.9.7
$ pyenv virtualenv 3.9.7 nlp-search-poc
$ pyenv local nlp-search-poc
$ pip install -U pip
$ pip install -r requirements.txt
I've provided a docker-compose.yml file, so you can fire up a simple Elasticsearch instance:
$ docker-compose up -d elasticsearch-7
Python dependencies and paths can be tricky, so I've provided a simple utility to check that everything is working as expected. Note: Elasticsearch can take a few seconds to come online.
$ python -m src.tools ping
Elasticsearch alive: True
$ python -m src.tools create
productRepository INFO Creating products index
productRepository INFO products created
$ python -m src.tools ingest
productRepository INFO Ingesting lightweight black jacket
productRepository INFO Ingesting midweight black jacket
...
I've created a wrapper shell script to fire up uvicorn/FastAPI:
$ bin/server.sh
uvicorn.error INFO Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
...
Make a GET request to http://localhost:8000, passing a JSON body:
{
"query": "lightweight black jacket less than $100"
}
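From Python, building and sending that request looks roughly like this (a sketch; the actual `requests` call is commented out because it assumes the server from bin/server.sh is running locally):

```python
import json

# Build the JSON body shown above.
payload = {"query": "lightweight black jacket less than $100"}
body = json.dumps(payload)
print(body)
# {"query": "lightweight black jacket less than $100"}

# To actually send it (uncomment with the server running):
# import requests
# resp = requests.get("http://localhost:8000", json=payload, timeout=5)
# print(resp.json())
```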
Postman is probably the best tool for this, but I've also included a simple client:
$ python -m src.client 'lightweight black jacket less than $100'
{
"ner_prediction": {
"text": "lightweight black jacket less than $100",
"product": "jacket",
"price_from": null,
"price_to": 100,
"colors": [
"black"
],
"attrs": [
"lightweight"
]
},
"results": [
{
"title": "lightweight black jacket",
"product_type": "jacket",
"price": 100,
"colors": [
"black"
],
"attrs": [
"lightweight"
]
}
]
}
Important: if you choose to use this script, you should enclose your search query in single quotes to avoid shell variable expansion.
Hit Ctrl + C
Don't worry about the asyncio.exceptions.CancelledError - it's caused by the hot-reload feature of the uvicorn server.
$ python -m src.tools drop
productRepository INFO Dropping products index
productRepository INFO products dropped
$ docker-compose down
Stopping elasticsearch-7 ... done
Removing elasticsearch-7 ... done
Removing network nlp-search-poc_default
I've provided a Dockerfile in case you want to run everything inside Docker:
$ docker build -t nlp-search-poc .
Then run Elasticsearch and the server:
$ docker-compose up -d
If you also want to use Docker to ingest the test data into Elasticsearch, you can do so:
$ docker run -it --rm --network nlp-search-poc_default -e "ELASTIC_SEARCH_HOST=elasticsearch-7" nlp-search-poc "python" "-m" "src.tools" "reset"
Note: the network name is determined by Docker's networking rules.
docker-compose.yml exposes the server's port 8000, so you can query as before:
$ python -m src.client 'packable jacket'