# Exploration of Match Query

In [1]:
# Import the Elasticsearch client and its base exception class (elasticsearch-py 7.x)
from elasticsearch import Elasticsearch, ElasticsearchException

In [2]:
host = 'http://localhost:9200/'
user = "elastic"
password = "khan123"
index_name = 'employee101'
doc_type = "_doc"
elastic_obj = Elasticsearch([host], http_auth=(user, password))  # Elasticsearch client object
if not elastic_obj.ping():
    print("Elasticsearch server is not running")
else:
    print("Elastic search engine is running........")


Elastic search engine is running........


In [3]:
# Wrap each row in the action format expected by the bulk helper (target index + document source)

def __elastic_format(row):
    return {"_index": index_name, "_source": row}

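Because each action above carries no `_id`, Elasticsearch generates a fresh id for every document, so re-running the bulk cell (In [6]) indexes the same rows a second time. A minimal sketch of an alternative, assuming you want reruns to overwrite rather than duplicate, is to reuse each row's `id` as the document `_id`. The generator name `elastic_actions` is hypothetical, not part of the notebook or the client library.

```python
def elastic_actions(rows, index):
    """Yield one bulk action per row, using the row's "id" as the document _id.

    Indexing a document with an existing _id replaces that document instead of
    adding a second copy, so the bulk load can be re-run safely.
    """
    for row in rows:
        yield {"_index": index, "_id": row["id"], "_source": row}


sample = [{"id": 1, "name": "Huntlee Dargavel"}]
print(list(elastic_actions(sample, "employee101")))
```

`elasticsearch.helpers.bulk` accepts any iterable of actions, so the generator can be passed directly, e.g. `bulk(elastic_obj, elastic_actions(data_list, index_name))`.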

In [4]:
 
data_mapping = {
    "mappings": {
        "properties": {
            "id": {"type": "integer"},
            "name": {"type": "text"},
            "email": {"type": "keyword"},
            "date_of_birth": {"type": "date", "format": "dd/MM/yyyy"},
            "ip_address": {"type": "ip"},
            "gender": {"type": "keyword"},
            "company": {"type": "text"},
            "position": {"type": "text"},
            "experience": {"type": "integer"},
            "country": {"type": "text"},
            "phrase": {"type": "text"},
            "salary": {"type": "integer"}
        }
    }
}



In [5]:
data_list = [
    
{"id":1,"name":"Huntlee Dargavel","email":"hdargavel0@japanpost.jp","gender":"male","ip_address":"58.11.89.193","date_of_birth":"11/09/1990","company":"Talane","position":"Research Associate","experience":7,"country":"China","phrase":"Multi-channelled coherent leverage","salary":180025},

{"id":2,"name":"Othilia Cathel","email":"ocathel1@senate.gov","gender":"female","ip_address":"3.164.153.228","date_of_birth":"22/07/1987","company":"Edgepulse","position":"Structural Engineer","experience":11,"country":"China","phrase":"Grass-roots heuristic help-desk","salary":193530},

{"id":3,"name":"Winston Waren","email":"wwaren2@4shared.com","gender":"male","ip_address":"202.37.210.94","date_of_birth":"10/11/1985","company":"Yozio","position":"Human Resources Manager","experience":12,"country":"China","phrase":"Versatile object-oriented emulation","salary":50616},

{"id":4,"name":"Alan Thomas","email":"athomas2@example.com","gender":"male","ip_address":"200.47.210.95","date_of_birth":"11/12/1985","company":"Yamaha","position":"Resources Manager","experience":12,"country":"China","phrase":"Emulation of roots heuristic coherent systems","salary":300000}
]
# Convert every row into a bulk action
elastic_data1 = [__elastic_format(row) for row in data_list]
len(elastic_data1)

4

In [6]:
# Create the index with our mapping (ignore=400 suppresses the error if the index already exists)
from elasticsearch.helpers import bulk
res = elastic_obj.indices.create(index=index_name, ignore=400, body=data_mapping)
# Now insert our data in a single bulk request
bulk(elastic_obj, elastic_data1)

(4, [])

In [7]:
def fetch_elastic_data(elastic_index, query):
    """Run a search and return the list of hits (an empty list if the request fails)."""
    try:
        data = elastic_obj.search(index=elastic_index, body=query)
    except ElasticsearchException as e:
        print(str(e))
        return []
    return data['hits']['hits']

In [8]:
import pandas as pd

def show_result(elastic_result):
    """Collect the _source of every hit into a DataFrame."""
    list_dict = [row['_source'] for row in elastic_result]
    return pd.DataFrame(list_dict)
    

In [9]:
query = {
    "size":120,
  "query": {
    "match_all": {}
  }
}
records = fetch_elastic_data(index_name,query)

df_frame = show_result(records)
df_frame.tail(10) # Limit result

| | id | name | email | gender | ip_address | date_of_birth | company | position | experience | country | phrase | salary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Huntlee Dargavel | hdargavel0@japanpost.jp | male | 58.11.89.193 | 11/09/1990 | Talane | Research Associate | 7 | China | Multi-channelled coherent leverage | 180025 |
| 1 | 2 | Othilia Cathel | ocathel1@senate.gov | female | 3.164.153.228 | 22/07/1987 | Edgepulse | Structural Engineer | 11 | China | Grass-roots heuristic help-desk | 193530 |
| 2 | 3 | Winston Waren | wwaren2@4shared.com | male | 202.37.210.94 | 10/11/1985 | Yozio | Human Resources Manager | 12 | China | Versatile object-oriented emulation | 50616 |
| 3 | 4 | Alan Thomas | athomas2@example.com | male | 200.47.210.95 | 11/12/1985 | Yamaha | Resources Manager | 12 | China | Emulation of roots heuristic coherent systems | 300000 |
| 4 | 4 | Jennifer Lawrence | jlaw@example.com | female | 100.37.110.59 | 17/05/1995 | Monsnto | Resources Manager | 10 | Germany | Emulation of roots heuristic complete systems | 300000 |


# Match Query
The “match” query is one of the most basic and commonly used queries in Elasticsearch and functions as a full-text query. We can use this query to search for text, numbers or boolean values.<br>
Let us search for the word “heuristic” contained in the field called “phrase” in the documents we ingested earlier.
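Because “match” is a full-text query, both the query string and the indexed “phrase” values go through the field's analyzer (here the default standard analyzer) before they are compared. The sketch below is only a rough approximation of that analyzer, assuming plain ASCII text: it splits on anything that is not a letter or digit and lowercases the pieces, whereas the real standard analyzer follows Unicode word-boundary rules. The helper name `approx_standard_analyze` is hypothetical.

```python
import re


def approx_standard_analyze(text):
    """Rough stand-in for the standard analyzer: split on non-alphanumerics, then lowercase."""
    return [token.lower() for token in re.findall(r"[0-9A-Za-z]+", text)]


print(approx_standard_analyze("Grass-roots heuristic help-desk"))
# ['grass', 'roots', 'heuristic', 'help', 'desk']
```

To see the tokens the cluster actually produces, the client exposes the `_analyze` API, e.g. `elastic_obj.indices.analyze(body={"analyzer": "standard", "text": "Grass-roots heuristic help-desk"})`.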



In [10]:
query = {
  "query": {
    "match": {
      "phrase": {
        "query" : "heuristic"
      }
    }
  }
}
records = fetch_elastic_data(index_name,query)

df_frame = show_result(records)
df_frame.head(10) # Limit result

| | id | name | email | gender | ip_address | date_of_birth | company | position | experience | country | phrase | salary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | Othilia Cathel | ocathel1@senate.gov | female | 3.164.153.228 | 22/07/1987 | Edgepulse | Structural Engineer | 11 | China | Grass-roots heuristic help-desk | 193530 |
| 1 | 4 | Alan Thomas | athomas2@example.com | male | 200.47.210.95 | 11/12/1985 | Yamaha | Resources Manager | 12 | China | Emulation of roots heuristic coherent systems | 300000 |
| 2 | 4 | Jennifer Lawrence | jlaw@example.com | female | 100.37.110.59 | 17/05/1995 | Monsnto | Resources Manager | 10 | Germany | Emulation of roots heuristic complete systems | 300000 |


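**Of the 4 documents indexed so far, only 2 (Othilia Cathel with id=2 and Alan Thomas with id=4) contain the word “heuristic” in the “phrase” field.** The output above also lists the Jennifer Lawrence document, which is only inserted later in this notebook (In [17]); the outputs appear to have been captured after the notebook had been run more than once, which also explains the duplicate rows in several of the later results.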

# What happens if we want to search for more than one word?
Using the same query as before, let’s search for “heuristic roots help”.

In [11]:
query = {
  "query": {
    "match": {
      "phrase": {
        "query" :"heuristic roots help"
      }
    }
  }
}
records = fetch_elastic_data(index_name,query)

df_frame = show_result(records)
df_frame.head(10) # Limit result

| | id | name | email | gender | ip_address | date_of_birth | company | position | experience | country | phrase | salary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | Othilia Cathel | ocathel1@senate.gov | female | 3.164.153.228 | 22/07/1987 | Edgepulse | Structural Engineer | 11 | China | Grass-roots heuristic help-desk | 193530 |
| 1 | 4 | Alan Thomas | athomas2@example.com | male | 200.47.210.95 | 11/12/1985 | Yamaha | Resources Manager | 12 | China | Emulation of roots heuristic coherent systems | 300000 |
| 2 | 4 | Jennifer Lawrence | jlaw@example.com | female | 100.37.110.59 | 17/05/1995 | Monsnto | Resources Manager | 10 | Germany | Emulation of roots heuristic complete systems | 300000 |


**This returns the same documents as before because, by default, Elasticsearch combines the words of a match query with the OR operator. In our case, the query matches any document that contains “heuristic” OR “roots” OR “help”.**

# Note: Changing the Operator Parameter
The default OR behavior for multi-word searches can be changed with the “operator” parameter of the “match” query, which accepts the values “OR” and “AND”.
Let’s see what happens when we set the operator to “AND” in the same query we performed earlier:

In [12]:
query = {
  "query": {
    "match": {
      "phrase": {
        "query" :"heuristic roots help",
          "operator" : "AND"
      }
    }
  }
}
records = fetch_elastic_data(index_name,query)

df_frame = show_result(records)
df_frame.head(10) # Limit result

| | id | name | email | gender | ip_address | date_of_birth | company | position | experience | country | phrase | salary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | Othilia Cathel | ocathel1@senate.gov | female | 3.164.153.228 | 22/07/1987 | Edgepulse | Structural Engineer | 11 | China | Grass-roots heuristic help-desk | 193530 |
| 1 | 2 | Othilia Cathel | ocathel1@senate.gov | female | 3.164.153.228 | 22/07/1987 | Edgepulse | Structural Engineer | 11 | China | Grass-roots heuristic help-desk | 193530 |


Now the query returns only one document (id=2), since that is the only document containing all three search keywords in the “phrase” field. It appears twice in the output because it was indexed twice.

# minimum_should_match
Taking things a bit further, we can set a threshold for the minimum number of query words a document must contain. For example, if we set the “minimum_should_match” parameter to 1, the query matches any document containing at least 1 of the words.
If we set “minimum_should_match” to 3, all three words must appear in the document for it to be considered a match.<br>
**In our case, the following query returns only 1 document (with id=2), as that is the only one matching our criteria.**



In [13]:
query = {
  "query": {
    "match": {
      "phrase": {
        "query" :"heuristic roots help",
          "minimum_should_match": 3
      }
    }
  }
}
records = fetch_elastic_data(index_name,query)

df_frame = show_result(records)
df_frame.head(10) # Limit result

| | id | name | email | gender | ip_address | date_of_birth | company | position | experience | country | phrase | salary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | Othilia Cathel | ocathel1@senate.gov | female | 3.164.153.228 | 22/07/1987 | Edgepulse | Structural Engineer | 11 | China | Grass-roots heuristic help-desk | 193530 |
| 1 | 2 | Othilia Cathel | ocathel1@senate.gov | female | 3.164.153.228 | 22/07/1987 | Edgepulse | Structural Engineer | 11 | China | Grass-roots heuristic help-desk | 193530 |

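The “operator” and “minimum_should_match” rules from the last three queries can be modelled in a few lines. This is a simplified sketch that ignores scoring and assumes a rough tokenizer (lowercase, split on non-alphanumeric characters) in place of the real standard analyzer; the helper names `tokens` and `would_match` are hypothetical.

```python
import re


def tokens(text):
    """Approximate standard-analyzer tokens as a set (split on non-alphanumerics, lowercase)."""
    return {t.lower() for t in re.findall(r"[0-9A-Za-z]+", text)}


def would_match(doc_text, query, operator="or", minimum_should_match=None):
    """Decide whether a match query would select doc_text (selection only, no scoring)."""
    query_terms = tokens(query)
    matched = len(query_terms & tokens(doc_text))
    if operator.lower() == "and":
        return matched == len(query_terms)  # every query term is required
    # With "or", one term is enough unless minimum_should_match asks for more
    required = 1 if minimum_should_match is None else minimum_should_match
    return matched >= required


phrases = {
    1: "Multi-channelled coherent leverage",
    2: "Grass-roots heuristic help-desk",
    3: "Versatile object-oriented emulation",
    4: "Emulation of roots heuristic coherent systems",
}
query = "heuristic roots help"
print([i for i, p in phrases.items() if would_match(p, query)])                          # [2, 4]
print([i for i, p in phrases.items() if would_match(p, query, operator="AND")])          # [2]
print([i for i, p in phrases.items() if would_match(p, query, minimum_should_match=2)])  # [2, 4]
print([i for i, p in phrases.items() if would_match(p, query, minimum_should_match=3)])  # [2]
```

In Elasticsearch, “minimum_should_match” also accepts percentages such as "75%"; this sketch only handles integer counts.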

# Multi-Match Query
So far we’ve been dealing with matches on a single field: that is, we searched for the keywords inside a single field named “phrase”.
But what if we need to search for keywords across multiple fields in a document? This is where the multi-match query comes into play.
Let’s try an example search for the keywords “research help” in the “position” and “phrase” fields.


In [14]:
query = {
  "query": {
    "multi_match": {
        "query" : "research help"
        , "fields": ["position","phrase"]
    }
  }
}
records = fetch_elastic_data(index_name,query)

df_frame = show_result(records)
df_frame.head(10) # Limit result

| | id | name | email | gender | ip_address | date_of_birth | company | position | experience | country | phrase | salary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Huntlee Dargavel | hdargavel0@japanpost.jp | male | 58.11.89.193 | 11/09/1990 | Talane | Research Associate | 7 | China | Multi-channelled coherent leverage | 180025 |
| 1 | 1 | Huntlee Dargavel | hdargavel0@japanpost.jp | male | 58.11.89.193 | 11/09/1990 | Talane | Research Associate | 7 | China | Multi-channelled coherent leverage | 180025 |
| 2 | 2 | Othilia Cathel | ocathel1@senate.gov | female | 3.164.153.228 | 22/07/1987 | Edgepulse | Structural Engineer | 11 | China | Grass-roots heuristic help-desk | 193530 |
| 3 | 2 | Othilia Cathel | ocathel1@senate.gov | female | 3.164.153.228 | 22/07/1987 | Edgepulse | Structural Engineer | 11 | China | Grass-roots heuristic help-desk | 193530 |

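With its default settings (type “best_fields”, operator OR), a multi_match query selects a document when any query term appears in any of the listed fields, and then scores it by its best-matching field. The sketch below models only the selection step, assuming a rough tokenizer (lowercase, split on non-alphanumeric characters); scoring is not modelled and `multi_match_ids` is a hypothetical helper.

```python
import re


def tokens(text):
    """Approximate analyzed tokens: split on non-alphanumerics, lowercase."""
    return {t.lower() for t in re.findall(r"[0-9A-Za-z]+", text)}


def multi_match_ids(docs, query, fields):
    """Ids of docs where at least one query term occurs in at least one of the fields."""
    query_terms = tokens(query)
    return [doc_id for doc_id, doc in docs.items()
            if any(query_terms & tokens(doc.get(field, "")) for field in fields)]


docs = {
    1: {"position": "Research Associate", "phrase": "Multi-channelled coherent leverage"},
    2: {"position": "Structural Engineer", "phrase": "Grass-roots heuristic help-desk"},
    3: {"position": "Human Resources Manager", "phrase": "Versatile object-oriented emulation"},
    4: {"position": "Resources Manager", "phrase": "Emulation of roots heuristic coherent systems"},
}
print(multi_match_ids(docs, "research help", ["position", "phrase"]))  # [1, 2]
```

Document 1 is selected through “position” (“research”) and document 2 through “phrase” (“help”), which matches the two distinct documents in the output above.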

# Match Phrase
Match_phrase is another commonly used query which, as its name indicates, matches phrases in a field.
If we need to search for the phrase “roots heuristic coherent” in the “phrase” field of the employee index, we can use the “match_phrase” query:


In [15]:
query = {
  "query": {
    "match_phrase": {
      "phrase": {
        "query": "roots heuristic coherent"
      }
    }
  }
}
records = fetch_elastic_data(index_name,query)

df_frame = show_result(records)
df_frame.head(10) # Limit result

| | id | name | email | gender | ip_address | date_of_birth | company | position | experience | country | phrase | salary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4 | Alan Thomas | athomas2@example.com | male | 200.47.210.95 | 11/12/1985 | Yamaha | Resources Manager | 12 | China | Emulation of roots heuristic coherent systems | 300000 |
| 1 | 4 | Alan Thomas | athomas2@example.com | male | 200.47.210.95 | 11/12/1985 | Yamaha | Resources Manager | 12 | China | Emulation of roots heuristic coherent systems | 300000 |


**This returns only the documents containing the exact phrase “roots heuristic coherent”, with the words in that order. In our case, only one document (Alan Thomas, id=4) matches, as shown in the response above.**

# Slop Parameter
A useful feature we can make use of in the match_phrase query is the “slop” parameter which allows us to create more flexible searches.
Suppose we searched for “roots coherent” with the match_phrase query. We wouldn’t receive any documents returned from the employee index. This is because for match_phrase to match, the terms need to be in the exact order.
Now, let’s add the slop parameter and see what happens:

In [16]:
query = {
  "query": {
    "match_phrase": {
      "phrase": {
        "query": "roots coherent",
        "slop": 1
      }
    }
  }
}

records = fetch_elastic_data(index_name,query)
df_frame = show_result(records)
df_frame.head(10) # Limit result

| | id | name | email | gender | ip_address | date_of_birth | company | position | experience | country | phrase | salary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4 | Alan Thomas | athomas2@example.com | male | 200.47.210.95 | 11/12/1985 | Yamaha | Resources Manager | 12 | China | Emulation of roots heuristic coherent systems | 300000 |
| 1 | 4 | Alan Thomas | athomas2@example.com | male | 200.47.210.95 | 11/12/1985 | Yamaha | Resources Manager | 12 | China | Emulation of roots heuristic coherent systems | 300000 |


**With slop=1, the query tolerates the terms being one position away from where the phrase expects them. As the response above shows, “roots coherent” therefore matches the “roots heuristic coherent” document: the slop allows one extra term (“heuristic”) between “roots” and “coherent”.**
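One way to picture slop: for each query term, subtract its position in the query from its position in the document; the phrase matches when the spread between the largest and smallest of these offsets is at most the slop. The sketch below is a simplified model of Lucene's sloppy phrase matching under that assumption (distinct query terms, a rough lowercase/split-on-non-alphanumerics tokenizer, no scoring). `min_slop` is a hypothetical helper that returns the smallest slop that would let the phrase match.

```python
import itertools
import re


def analyze(text):
    """Rough tokenizer: split on non-alphanumerics, lowercase."""
    return [t.lower() for t in re.findall(r"[0-9A-Za-z]+", text)]


def min_slop(doc_text, phrase):
    """Smallest slop at which `phrase` matches `doc_text`, or None if a term is missing."""
    positions = {}
    for pos, token in enumerate(analyze(doc_text)):
        positions.setdefault(token, []).append(pos)
    terms = analyze(phrase)
    if any(term not in positions for term in terms):
        return None
    best = None
    # Try every combination of document positions for the query terms
    for combo in itertools.product(*(positions[t] for t in terms)):
        offsets = [doc_pos - query_pos for query_pos, doc_pos in enumerate(combo)]
        spread = max(offsets) - min(offsets)
        best = spread if best is None else min(best, spread)
    return best


doc = "Emulation of roots heuristic coherent systems"
print(min_slop(doc, "roots heuristic coherent"))  # 0 -> exact phrase
print(min_slop(doc, "roots coherent"))            # 1 -> needs slop >= 1
print(min_slop(doc, "coherent roots"))            # 3 -> reversed order costs more
```

Under this model a match_phrase query with a given slop matches whenever `min_slop(...) <= slop`; note how reversing the order of the terms costs more than leaving a gap between them.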

# Match Phrase Prefix
The match_phrase_prefix query is similar to the match_phrase query, but here the last term of the search string is treated as a prefix and matches any term starting with it.
First, let’s insert a document into our index to better understand the match_phrase_prefix query:

In [17]:
document = {
  "id": 4,
  "name": "Jennifer Lawrence",
  "email": "jlaw@example.com",
  "gender": "female",
  "ip_address": "100.37.110.59",
  "date_of_birth": "17/05/1995",
  "company": "Monsnto",
  "position": "Resources Manager",
  "experience": 10,
  "country": "Germany",
  "phrase": "Emulation of roots heuristic complete systems",
  "salary": 300000
}
# Index (insert) this single document; Elasticsearch generates its _id automatically
response = elastic_obj.index(index=index_name, body=document)
response

{'_index': 'employee101',
 '_type': '_doc',
 '_id': 'q0KjL3gBPzEYHeBxBL4M',
 '_version': 1,
 'result': 'created',
 '_shards': {'total': 2, 'successful': 1, 'failed': 0},
 '_seq_no': 11,
 '_primary_term': 8}

In [18]:
query = {
"_source": [ "phrase" ],
  "query": {
    "match_phrase_prefix": {
      "phrase": {
        "query": "roots heuristic co"
      }
    }
  }
}

records = fetch_elastic_data(index_name,query)
df_frame = show_result(records)
df_frame.head(10) # Limit result

| | phrase |
|---|---|
| 0 | Emulation of roots heuristic coherent systems |
| 1 | Emulation of roots heuristic complete systems |
| 2 | Emulation of roots heuristic coherent systems |


**In the results above, we can see that the documents containing “coherent” and “complete” both matched the query. The slop parameter can also be used with the “match_phrase_prefix” query.**
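A simplified sketch of the idea, assuming a rough tokenizer (lowercase, split on non-alphanumeric characters) and ignoring scoring: every query term except the last must appear as a consecutive phrase, and the last term only has to be a prefix of the next document token. In Elasticsearch the prefix is expanded into at most “max_expansions” terms (50 by default), which this sketch does not model. The helper name `phrase_prefix_match` is hypothetical.

```python
import re


def analyze(text):
    """Rough tokenizer: split on non-alphanumerics, lowercase."""
    return [t.lower() for t in re.findall(r"[0-9A-Za-z]+", text)]


def phrase_prefix_match(doc_text, query):
    """True if the query's leading terms occur as a phrase followed by a token starting with its last term."""
    doc, terms = analyze(doc_text), analyze(query)
    *head, prefix = terms  # assumes a non-empty query
    n = len(terms)
    for start in range(len(doc) - n + 1):
        window = doc[start:start + n]
        if window[:-1] == head and window[-1].startswith(prefix):
            return True
    return False


for phrase in ["Emulation of roots heuristic coherent systems",
               "Emulation of roots heuristic complete systems",
               "Grass-roots heuristic help-desk"]:
    print(phrase, "->", phrase_prefix_match(phrase, "roots heuristic co"))
```

The third phrase does contain “roots heuristic”, but the following token “help” does not start with “co”, so it is not selected.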

# Term Level Queries
**Term-level queries are used to query structured data, usually by matching exact values.**
# 2.1 Term Query/Terms Query
This is the simplest of the term-level queries. It searches for an exact match of the search keyword in the given field of the documents.
For example, if we search for the word “Male” using the term query against the field “gender”, it searches for the word exactly as given, including its casing.


In [19]:
query = {
  "query": {
    "term":{
        "gender" :"female"
    }
  }
}
records = fetch_elastic_data(index_name,query)

df_frame = show_result(records)
df_frame.tail(10) # Limit result

| | id | name | email | gender | ip_address | date_of_birth | company | position | experience | country | phrase | salary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | Othilia Cathel | ocathel1@senate.gov | female | 3.164.153.228 | 22/07/1987 | Edgepulse | Structural Engineer | 11 | China | Grass-roots heuristic help-desk | 193530 |
| 1 | 4 | Jennifer Lawrence | jlaw@example.com | female | 100.37.110.59 | 17/05/1995 | Monsnto | Resources Manager | 10 | Germany | Emulation of roots heuristic complete systems | 300000 |
| 2 | 2 | Othilia Cathel | ocathel1@senate.gov | female | 3.164.153.228 | 22/07/1987 | Edgepulse | Structural Engineer | 11 | China | Grass-roots heuristic help-desk | 193530 |


In [20]:
query = {
  "query": {
    "term":{
        "gender" :"Female"
    }
  }
}
records = fetch_elastic_data(index_name,query)

df_frame = show_result(records)
df_frame.tail(10) # Limit result

**The only difference between the two queries above is the casing of the search keyword. The first query used all lowercase, which matched because that is how the value is stored in the “gender” field. The second query returned no results, because the field contains no token with a capitalized “F”.**
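If case-insensitive exact matching is what you want, Elasticsearch 7.10 and later accept a `case_insensitive` flag on the term query. The sketch below only builds such a query body; it assumes a 7.10+ cluster (older versions reject the parameter), and `term_query` is a hypothetical helper.

```python
def term_query(field, value, case_insensitive=False):
    """Build a term query body; case_insensitive=True needs Elasticsearch 7.10+."""
    clause = {"value": value}
    if case_insensitive:
        clause["case_insensitive"] = True
    return {"query": {"term": {field: clause}}}


print(term_query("gender", "Female", case_insensitive=True))
# {'query': {'term': {'gender': {'value': 'Female', 'case_insensitive': True}}}}
```

Passed as `fetch_elastic_data(index_name, term_query("gender", "Female", case_insensitive=True))`, this should return the same female documents as the lowercase query, provided the cluster supports the flag.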

We can also pass multiple terms to be searched on the same field, by using the terms query. Let us search for “female” and “male” in the gender field. For that, we can use the terms query as below:

In [21]:
query = {
  "query": {
    "terms": {
      "gender": [
        "female",
        "male"
      ]
    }
  }
}
records = fetch_elastic_data(index_name,query)

df_frame = show_result(records)
df_frame.tail(10) # Limit result

| | id | name | email | gender | ip_address | date_of_birth | company | position | experience | country | phrase | salary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Huntlee Dargavel | hdargavel0@japanpost.jp | male | 58.11.89.193 | 11/09/1990 | Talane | Research Associate | 7 | China | Multi-channelled coherent leverage | 180025 |
| 1 | 2 | Othilia Cathel | ocathel1@senate.gov | female | 3.164.153.228 | 22/07/1987 | Edgepulse | Structural Engineer | 11 | China | Grass-roots heuristic help-desk | 193530 |
| 2 | 3 | Winston Waren | wwaren2@4shared.com | male | 202.37.210.94 | 10/11/1985 | Yozio | Human Resources Manager | 12 | China | Versatile object-oriented emulation | 50616 |
| 3 | 4 | Alan Thomas | athomas2@example.com | male | 200.47.210.95 | 11/12/1985 | Yamaha | Resources Manager | 12 | China | Emulation of roots heuristic coherent systems | 300000 |
| 4 | 4 | Jennifer Lawrence | jlaw@example.com | female | 100.37.110.59 | 17/05/1995 | Monsnto | Resources Manager | 10 | Germany | Emulation of roots heuristic complete systems | 300000 |
| 5 | 1 | Huntlee Dargavel | hdargavel0@japanpost.jp | male | 58.11.89.193 | 11/09/1990 | Talane | Research Associate | 7 | China | Multi-channelled coherent leverage | 180025 |
| 6 | 2 | Othilia Cathel | ocathel1@senate.gov | female | 3.164.153.228 | 22/07/1987 | Edgepulse | Structural Engineer | 11 | China | Grass-roots heuristic help-desk | 193530 |
| 7 | 3 | Winston Waren | wwaren2@4shared.com | male | 202.37.210.94 | 10/11/1985 | Yozio | Human Resources Manager | 12 | China | Versatile object-oriented emulation | 50616 |
| 8 | 4 | Alan Thomas | athomas2@example.com | male | 200.47.210.95 | 11/12/1985 | Yamaha | Resources Manager | 12 | China | Emulation of roots heuristic coherent systems | 300000 |


#  2.2 Exists Queries
Sometimes a document has no indexed value for a field, or the field does not exist in the document at all. The exists query helps identify such documents so that we can analyze the impact.
For example, let us index the document below into our “employee101” index:

In [22]:
document = {
  "id": 5,
  "name": "Michael Bordon",
  "email": "mbordon@example.com",
  "gender": "male",
  "ip_address": "10.47.210.65",
  "date_of_birth": "12/12/1995",
  "position": "Resources Manager",
  "experience": 12,
  "country": None,
  "phrase": "Emulation of roots heuristic coherent systems",
  "salary": 300000
}
# Index (insert) this document; it has no company field and a null country
response = elastic_obj.index(index=index_name, body=document)
response

{'_index': 'employee101',
 '_type': '_doc',
 '_id': 'sEKjL3gBPzEYHeBxBr5J',
 '_version': 1,
 'result': 'created',
 '_shards': {'total': 2, 'successful': 1, 'failed': 0},
 '_seq_no': 12,
 '_primary_term': 8}

**This document has no field named “company”, and the value of its “country” field is null.**
Now if we want to find the documents that have the field “company”, we can use the exists query as below:

In [23]:
# List all documents that have an indexed value for the “company” field
query = {
    "query": {
        "exists": {
            "field": "company"
        }
    }
}
records = fetch_elastic_data(index_name,query)

df_frame = show_result(records)
df_frame.tail(10) # Limit result

| | id | name | email | gender | ip_address | date_of_birth | company | position | experience | country | phrase | salary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Huntlee Dargavel | hdargavel0@japanpost.jp | male | 58.11.89.193 | 11/09/1990 | Talane | Research Associate | 7 | China | Multi-channelled coherent leverage | 180025 |
| 1 | 2 | Othilia Cathel | ocathel1@senate.gov | female | 3.164.153.228 | 22/07/1987 | Edgepulse | Structural Engineer | 11 | China | Grass-roots heuristic help-desk | 193530 |
| 2 | 3 | Winston Waren | wwaren2@4shared.com | male | 202.37.210.94 | 10/11/1985 | Yozio | Human Resources Manager | 12 | China | Versatile object-oriented emulation | 50616 |
| 3 | 4 | Alan Thomas | athomas2@example.com | male | 200.47.210.95 | 11/12/1985 | Yamaha | Resources Manager | 12 | China | Emulation of roots heuristic coherent systems | 300000 |
| 4 | 4 | Jennifer Lawrence | jlaw@example.com | female | 100.37.110.59 | 17/05/1995 | Monsnto | Resources Manager | 10 | Germany | Emulation of roots heuristic complete systems | 300000 |
| 5 | 1 | Huntlee Dargavel | hdargavel0@japanpost.jp | male | 58.11.89.193 | 11/09/1990 | Talane | Research Associate | 7 | China | Multi-channelled coherent leverage | 180025 |
| 6 | 2 | Othilia Cathel | ocathel1@senate.gov | female | 3.164.153.228 | 22/07/1987 | Edgepulse | Structural Engineer | 11 | China | Grass-roots heuristic help-desk | 193530 |
| 7 | 3 | Winston Waren | wwaren2@4shared.com | male | 202.37.210.94 | 10/11/1985 | Yozio | Human Resources Manager | 12 | China | Versatile object-oriented emulation | 50616 |
| 8 | 4 | Alan Thomas | athomas2@example.com | male | 200.47.210.95 | 11/12/1985 | Yamaha | Resources Manager | 12 | China | Emulation of roots heuristic coherent systems | 300000 |


**Perhaps more useful is listing all the documents without the “company” field. This can be done by wrapping the exists query in the “must_not” clause of a “bool” query, as below:**

In [24]:
# List all documents that do NOT have the “company” field (exists inside bool.must_not)
query = {
  "query": {
    "bool": {
      "must_not": [
        {
          "exists": {
            "field": "company"
          }
        }
      ]
    }
  }
}
records = fetch_elastic_data(index_name,query)

df_frame = show_result(records)
df_frame.tail(10) # Limit result
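The exists query treats a field as missing when it is absent from the document, when its value is `null`, or when it is an empty array (or an array containing only `null`s); an empty string still counts as a value. A plain-Python sketch of that rule, applied to the Michael Bordon document above and assuming the document is held as a Python dict. `field_exists` is a hypothetical helper, and it ignores mapping-level reasons for a missing value such as `"index": false`.

```python
def field_exists(doc, field):
    """Approximate the exists query for one document stored as a dict."""
    value = doc.get(field)
    if value is None:
        return False  # field absent or explicitly null
    if isinstance(value, list):
        return any(item is not None for item in value)  # [] or [None, ...] count as missing
    return True  # any other value, even "", counts as existing


michael = {"id": 5, "name": "Michael Bordon", "country": None}
print(field_exists(michael, "name"), field_exists(michael, "company"), field_exists(michael, "country"))
# True False False
```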

In [25]:
# Delete the document we just inserted (id=5) so the index stays uniform for the following examples
query = {
    "query": {
        "match": {"id":5}
    }
}
resp = elastic_obj.delete_by_query(index=index_name, body=query )
resp

{'took': 144,
 'timed_out': False,
 'total': 1,
 'deleted': 1,
 'batches': 1,
 'version_conflicts': 0,
 'noops': 0,
 'retries': {'bulk': 0, 'search': 0},
 'throttled_millis': 0,
 'requests_per_second': -1.0,
 'throttled_until_millis': 0,
 'failures': []}

#  2.3 Range Queries
Another commonly used query in the Elasticsearch world is the range query. It returns the documents whose field values fall within the specified range. The range query is a term-level query (meaning it is used to query structured data) and can be used on numeric fields, date fields, etc.<br>
**Range query on numeric fields**<br>
In the index we have created, if we need to find the people whose experience is between 5 and 10 years (inclusive), we can apply the following range query:

In [26]:
# Experience between 5 and 10 years; gte/lte make both bounds inclusive
query = {
    "query": {
        "range" : {
            "experience" : {
                "gte" : 5,
                "lte" : 10
            }
        }
    }
}
records = fetch_elastic_data(index_name,query)

df_frame = show_result(records)
df_frame.tail(10) # Limit result

| | id | name | email | gender | ip_address | date_of_birth | company | position | experience | country | phrase | salary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Huntlee Dargavel | hdargavel0@japanpost.jp | male | 58.11.89.193 | 11/09/1990 | Talane | Research Associate | 7 | China | Multi-channelled coherent leverage | 180025 |
| 1 | 4 | Jennifer Lawrence | jlaw@example.com | female | 100.37.110.59 | 17/05/1995 | Monsnto | Resources Manager | 10 | Germany | Emulation of roots heuristic complete systems | 300000 |
| 2 | 1 | Huntlee Dargavel | hdargavel0@japanpost.jp | male | 58.11.89.193 | 11/09/1990 | Talane | Research Associate | 7 | China | Multi-channelled coherent leverage | 180025 |
| 3 | 4 | Jennifer Lawrence | jlaw@example.com | female | 100.37.110.59 | 17/05/1995 | Monsnto | Resources Manager | 10 | Germany | Emulation of roots heuristic complete systems | 300000 |


# Range query on date fields
Similarly, range queries can be applied to date fields as well. If we need to find the people born on or after 1 January 1986, we can run a query like the one below:


In [27]:
# Born on or after 01/01/1986; the bound is parsed with the mapping's dd/MM/yyyy format
query = {
    "query": {
        "range" : {
            "date_of_birth" : {
                "gte" : "01/01/1986"
            }
        }
    }
}
records = fetch_elastic_data(index_name,query)

df_frame = show_result(records)
df_frame.tail(10) # Limit result

| | id | name | email | gender | ip_address | date_of_birth | company | position | experience | country | phrase | salary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Huntlee Dargavel | hdargavel0@japanpost.jp | male | 58.11.89.193 | 11/09/1990 | Talane | Research Associate | 7 | China | Multi-channelled coherent leverage | 180025 |
| 1 | 2 | Othilia Cathel | ocathel1@senate.gov | female | 3.164.153.228 | 22/07/1987 | Edgepulse | Structural Engineer | 11 | China | Grass-roots heuristic help-desk | 193530 |
| 2 | 4 | Jennifer Lawrence | jlaw@example.com | female | 100.37.110.59 | 17/05/1995 | Monsnto | Resources Manager | 10 | Germany | Emulation of roots heuristic complete systems | 300000 |
| 3 | 1 | Huntlee Dargavel | hdargavel0@japanpost.jp | male | 58.11.89.193 | 11/09/1990 | Talane | Research Associate | 7 | China | Multi-channelled coherent leverage | 180025 |
| 4 | 2 | Othilia Cathel | ocathel1@senate.gov | female | 3.164.153.228 | 22/07/1987 | Edgepulse | Structural Engineer | 11 | China | Grass-roots heuristic help-desk | 193530 |
| 5 | 4 | Jennifer Lawrence | jlaw@example.com | female | 100.37.110.59 | 17/05/1995 | Monsnto | Resources Manager | 10 | Germany | Emulation of roots heuristic complete systems | 300000 |

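The date range above works on real dates because the mapping declares “date_of_birth” as a `date` with format `dd/MM/yyyy`, so both the stored values and the "01/01/1986" bound are parsed as dates. The sketch below reproduces that comparison in Python with the equivalent `strptime` pattern and contrasts it with a naive string comparison, which is roughly what a range over a field mapped as `keyword` would do. The birth dates are copied from the documents above.

```python
from datetime import datetime

FMT = "%d/%m/%Y"  # Python equivalent of the mapping's "dd/MM/yyyy"

births = {
    "Huntlee Dargavel": "11/09/1990",
    "Othilia Cathel": "22/07/1987",
    "Winston Waren": "10/11/1985",
    "Alan Thomas": "11/12/1985",
    "Jennifer Lawrence": "17/05/1995",
}
cutoff = datetime.strptime("01/01/1986", FMT)

# Date-aware comparison, as the range query does on a date field
as_dates = sorted(n for n, d in births.items() if datetime.strptime(d, FMT) >= cutoff)
# Lexicographic comparison of the raw strings
as_strings = sorted(n for n, d in births.items() if d >= "01/01/1986")

print(as_dates)         # ['Huntlee Dargavel', 'Jennifer Lawrence', 'Othilia Cathel']
print(len(as_strings))  # 5: every "dd/..." string here sorts after "01/..."
```

If a query needs to use a different input format than the mapping, the range query also accepts its own `format` parameter.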

# 2.4 Prefix Queries
The prefix query is used to fetch documents that contain the given search string as the prefix in the specified field.
Suppose we need to fetch all documents which contain “al” as the prefix in the field “name”, then we can use the prefix query as below:

In [28]:

query = {
  "query": {
    "prefix": {
      "name": "al"
    }
  }
}
records = fetch_elastic_data(index_name,query)

df_frame = show_result(records)
df_frame.tail(10) # Limit result

| | id | name | email | gender | ip_address | date_of_birth | company | position | experience | country | phrase | salary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4 | Alan Thomas | athomas2@example.com | male | 200.47.210.95 | 11/12/1985 | Yamaha | Resources Manager | 12 | China | Emulation of roots heuristic coherent systems | 300000 |
| 1 | 4 | Alan Thomas | athomas2@example.com | male | 200.47.210.95 | 11/12/1985 | Yamaha | Resources Manager | 12 | China | Emulation of roots heuristic coherent systems | 300000 |


Since the prefix query is a term-level query, the search string is not analyzed, so searching for “al” and “Al” gives different results. If we search for “Al” in the example above, we get 0 results, because the standard analyzer lowercases the “name” field and the inverted index contains no token starting with “Al”. On a keyword version of the field, such as the “name.keyword” sub-field that dynamic mapping creates for strings, it is the other way round: “Al” returns the result above and “al” returns zero hits. Note that our explicit mapping defines “name” as text only, so “name.keyword” does not exist in this index.
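To make the difference concrete, here is a plain-Python sketch of the two behaviours, assuming a rough tokenizer (lowercase, split on non-alphanumeric characters). The text branch mirrors this notebook's mapping; the keyword branch is hypothetical, since it models a “name.keyword” sub-field that this index does not have.

```python
import re

names = ["Huntlee Dargavel", "Othilia Cathel", "Winston Waren", "Alan Thomas", "Jennifer Lawrence"]


def prefix_on_text(prefix):
    """Prefix against analyzed (lowercased) tokens; the prefix itself is not analyzed."""
    return [n for n in names
            if any(tok.lower().startswith(prefix) for tok in re.findall(r"[0-9A-Za-z]+", n))]


def prefix_on_keyword(prefix):
    """Prefix against the whole, unmodified value (hypothetical keyword sub-field)."""
    return [n for n in names if n.startswith(prefix)]


print(prefix_on_text("al"), prefix_on_text("Al"))        # ['Alan Thomas'] []
print(prefix_on_keyword("Al"), prefix_on_keyword("al"))  # ['Alan Thomas'] []
```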

# 2.6 Wildcard Queries
The wildcard query fetches the documents that have terms matching the given wildcard pattern.
For example, let us search for “c*a” using the wildcard query on the field “country” as below:

In [29]:

query = {
    "query": {
        "wildcard": {
            "country": {
                "value": "c*a"
            }
        }
    }
}
records = fetch_elastic_data(index_name,query)

df_frame = show_result(records)
df_frame.tail(10) # Limit result

| | id | name | email | gender | ip_address | date_of_birth | company | position | experience | country | phrase | salary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Huntlee Dargavel | hdargavel0@japanpost.jp | male | 58.11.89.193 | 11/09/1990 | Talane | Research Associate | 7 | China | Multi-channelled coherent leverage | 180025 |
| 1 | 2 | Othilia Cathel | ocathel1@senate.gov | female | 3.164.153.228 | 22/07/1987 | Edgepulse | Structural Engineer | 11 | China | Grass-roots heuristic help-desk | 193530 |
| 2 | 3 | Winston Waren | wwaren2@4shared.com | male | 202.37.210.94 | 10/11/1985 | Yozio | Human Resources Manager | 12 | China | Versatile object-oriented emulation | 50616 |
| 3 | 4 | Alan Thomas | athomas2@example.com | male | 200.47.210.95 | 11/12/1985 | Yamaha | Resources Manager | 12 | China | Emulation of roots heuristic coherent systems | 300000 |
| 4 | 1 | Huntlee Dargavel | hdargavel0@japanpost.jp | male | 58.11.89.193 | 11/09/1990 | Talane | Research Associate | 7 | China | Multi-channelled coherent leverage | 180025 |
| 5 | 2 | Othilia Cathel | ocathel1@senate.gov | female | 3.164.153.228 | 22/07/1987 | Edgepulse | Structural Engineer | 11 | China | Grass-roots heuristic help-desk | 193530 |
| 6 | 3 | Winston Waren | wwaren2@4shared.com | male | 202.37.210.94 | 10/11/1985 | Yozio | Human Resources Manager | 12 | China | Versatile object-oriented emulation | 50616 |
| 7 | 4 | Alan Thomas | athomas2@example.com | male | 200.47.210.95 | 11/12/1985 | Yamaha | Resources Manager | 12 | China | Emulation of roots heuristic coherent systems | 300000 |


The above query fetches all the documents with a “country” term starting with “c” and ending with “a” (e.g., China, Canada, Cambodia). Because “country” is a text field, the pattern is matched against the lowercased tokens, such as “china”.

**Here the * operator can match zero or more characters.**
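Python's `fnmatch.fnmatchcase` uses the same `*` (zero or more characters) and `?` (exactly one character) wildcards, so it can serve as a rough local model, assuming the pattern contains no `[...]` sets (which `fnmatch` supports but the Elasticsearch wildcard query does not). Apart from “china” and “germany”, the country tokens below are illustrative only and are not in the index; `wildcard_terms` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# Lowercased tokens as the "country" text field would index them, plus illustrative extras
country_tokens = ["china", "germany", "canada", "cambodia", "colombia", "cuba", "chad"]


def wildcard_terms(pattern, terms):
    """Return the terms matching a wildcard pattern (* = any run of characters, ? = one character)."""
    return [t for t in terms if fnmatchcase(t, pattern)]


print(wildcard_terms("c*a", country_tokens))    # ['china', 'canada', 'cambodia', 'colombia', 'cuba']
print(wildcard_terms("c?ina", country_tokens))  # ['china']
```

Patterns that start with `*` or `?` force Elasticsearch to scan many terms, so a leading wildcard is best avoided on large indices.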

# 2.7 Regexp
This is similar to the “wildcard” query we saw above, but it accepts a regular expression as input and fetches the documents with terms matching it.

In [30]:

query = {
  "query": {
    "regexp": {
      "position": "res[a-z]*"
    }
  }
}
records = fetch_elastic_data(index_name,query)

df_frame = show_result(records)
df_frame.tail(10) # Limit result

| | id | name | email | gender | ip_address | date_of_birth | company | position | experience | country | phrase | salary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Huntlee Dargavel | hdargavel0@japanpost.jp | male | 58.11.89.193 | 11/09/1990 | Talane | Research Associate | 7 | China | Multi-channelled coherent leverage | 180025 |
| 1 | 3 | Winston Waren | wwaren2@4shared.com | male | 202.37.210.94 | 10/11/1985 | Yozio | Human Resources Manager | 12 | China | Versatile object-oriented emulation | 50616 |
| 2 | 4 | Alan Thomas | athomas2@example.com | male | 200.47.210.95 | 11/12/1985 | Yamaha | Resources Manager | 12 | China | Emulation of roots heuristic coherent systems | 300000 |
| 3 | 4 | Jennifer Lawrence | jlaw@example.com | female | 100.37.110.59 | 17/05/1995 | Monsnto | Resources Manager | 10 | Germany | Emulation of roots heuristic complete systems | 300000 |
| 4 | 1 | Huntlee Dargavel | hdargavel0@japanpost.jp | male | 58.11.89.193 | 11/09/1990 | Talane | Research Associate | 7 | China | Multi-channelled coherent leverage | 180025 |
| 5 | 3 | Winston Waren | wwaren2@4shared.com | male | 202.37.210.94 | 10/11/1985 | Yozio | Human Resources Manager | 12 | China | Versatile object-oriented emulation | 50616 |
| 6 | 4 | Alan Thomas | athomas2@example.com | male | 200.47.210.95 | 11/12/1985 | Yamaha | Resources Manager | 12 | China | Emulation of roots heuristic coherent systems | 300000 |
| 7 | 4 | Jennifer Lawrence | jlaw@example.com | female | 100.37.110.59 | 17/05/1995 | Monsnto | Resources Manager | 10 | Germany | Emulation of roots heuristic complete systems | 300000 |


**The above query returns the documents containing a term that matches the regular expression `res[a-z]*` (here “research” and “resources”).**
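Elasticsearch regexp patterns (Lucene regular expression syntax) are always anchored: the pattern has to match an entire term, not just part of it, which is why `res[a-z]*` needs the trailing `[a-z]*`. Python's `re.fullmatch` gives the same anchoring, so a rough local check might look like this, assuming simple patterns whose meaning is the same in both syntaxes (Lucene's dialect differs from Python's in other respects). The tokens are the lowercased words of the “position” values above, and `regexp_terms` is a hypothetical helper.

```python
import re

position_tokens = ["research", "associate", "structural", "engineer",
                   "human", "resources", "manager"]


def regexp_terms(pattern, terms):
    """Terms that the whole pattern matches (anchored, like a Lucene regexp)."""
    return [t for t in terms if re.fullmatch(pattern, t)]


print(regexp_terms("res[a-z]*", position_tokens))  # ['research', 'resources']
print(regexp_terms("res", position_tokens))        # [] because "res" alone is not a whole term
```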

# 2.8 Fuzzy
The fuzzy query returns documents containing terms similar to the search term, which is especially useful when dealing with spelling mistakes.
We can get results even if we search for “Chnia” instead of “China”, using the fuzzy query.

In [31]:

query = {
  "query": {
    "fuzzy": {
      "country": {
        "value": "Chnia",
        "fuzziness": "2"
      }
    }
  }
}
records = fetch_elastic_data(index_name,query)

df_frame = show_result(records)
df_frame.tail(10) # Limit result

| | id | name | email | gender | ip_address | date_of_birth | company | position | experience | country | phrase | salary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Huntlee Dargavel | hdargavel0@japanpost.jp | male | 58.11.89.193 | 11/09/1990 | Talane | Research Associate | 7 | China | Multi-channelled coherent leverage | 180025 |
| 1 | 2 | Othilia Cathel | ocathel1@senate.gov | female | 3.164.153.228 | 22/07/1987 | Edgepulse | Structural Engineer | 11 | China | Grass-roots heuristic help-desk | 193530 |
| 2 | 3 | Winston Waren | wwaren2@4shared.com | male | 202.37.210.94 | 10/11/1985 | Yozio | Human Resources Manager | 12 | China | Versatile object-oriented emulation | 50616 |
| 3 | 4 | Alan Thomas | athomas2@example.com | male | 200.47.210.95 | 11/12/1985 | Yamaha | Resources Manager | 12 | China | Emulation of roots heuristic coherent systems | 300000 |
| 4 | 1 | Huntlee Dargavel | hdargavel0@japanpost.jp | male | 58.11.89.193 | 11/09/1990 | Talane | Research Associate | 7 | China | Multi-channelled coherent leverage | 180025 |
| 5 | 2 | Othilia Cathel | ocathel1@senate.gov | female | 3.164.153.228 | 22/07/1987 | Edgepulse | Structural Engineer | 11 | China | Grass-roots heuristic help-desk | 193530 |
| 6 | 3 | Winston Waren | wwaren2@4shared.com | male | 202.37.210.94 | 10/11/1985 | Yozio | Human Resources Manager | 12 | China | Versatile object-oriented emulation | 50616 |
| 7 | 4 | Alan Thomas | athomas2@example.com | male | 200.47.210.95 | 11/12/1985 | Yamaha | Resources Manager | 12 | China | Emulation of roots heuristic coherent systems | 300000 |


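Here, “fuzziness” is the maximum edit distance allowed for a match. Because fuzzy is a term-level query, “Chnia” is not analyzed, while “China” was indexed as the lowercase token “china”. Matching therefore takes two edits (C to c, plus swapping n and i, which counts as one edit since transpositions are allowed by default), which is why the fuzziness is set to 2. Other parameters, such as “max_expansions”, which we saw earlier, can also be used.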
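
To make “edit distance” concrete, here is a small sketch of the optimal string alignment distance: Levenshtein edits (insert, delete, substitute) plus swaps of adjacent characters, which the fuzzy query counts as a single edit while its “transpositions” option is on (the default). It assumes the fuzzy query does not analyze “Chnia” while the text field “country” holds the lowercase token “china”. `osa_distance` and `auto_fuzziness` are illustrative helpers, not Elasticsearch APIs; the latter encodes the documented “AUTO” rule (terms of length 0-2 must match exactly, 3-5 allow one edit, longer terms allow two).

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, substitutions and
    adjacent transpositions (optimal string alignment variant)."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i  # delete all of a[:i]
    for j in range(len(b) + 1):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]


def auto_fuzziness(term: str, low: int = 3, high: int = 6) -> int:
    """Edits allowed by fuzziness "AUTO:low,high" for a term of this length."""
    if len(term) < low:
        return 0
    if len(term) < high:
        return 1
    return 2


# "Chnia" is searched as-is, while the indexed token is "china"
print(osa_distance("Chnia", "china"))  # 2: C to c, plus the n/i transposition
print(auto_fuzziness("Chnia"))         # 1: "AUTO" would not be enough here
```

The same helper shows why the multi_match example works: “heursitic” is one transposition away from “heuristic”, and “reserch” is one insertion away from “research”.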
**Fuzziness can also be used with the “match” family of queries. The following example uses fuzziness in a multi_match query.**

In [32]:

query = {
    "query": {
        "multi_match" : {
            "query" : "heursitic reserch",
            "fields": ["phrase","position"],
            "fuzziness": 2
        }
    },
    "size": 10
}
records = fetch_elastic_data(index_name,query)

df_frame = show_result(records)
df_frame.tail(10) # Limit result

Unnamed: 0,id,name,email,gender,ip_address,date_of_birth,company,position,experience,country,phrase,salary
0,1,Huntlee Dargavel,hdargavel0@japanpost.jp,male,58.11.89.193,11/09/1990,Talane,Research Associate,7,China,Multi-channelled coherent leverage,180025
1,1,Huntlee Dargavel,hdargavel0@japanpost.jp,male,58.11.89.193,11/09/1990,Talane,Research Associate,7,China,Multi-channelled coherent leverage,180025
2,2,Othilia Cathel,ocathel1@senate.gov,female,3.164.153.228,22/07/1987,Edgepulse,Structural Engineer,11,China,Grass-roots heuristic help-desk,193530
3,2,Othilia Cathel,ocathel1@senate.gov,female,3.164.153.228,22/07/1987,Edgepulse,Structural Engineer,11,China,Grass-roots heuristic help-desk,193530
4,4,Alan Thomas,athomas2@example.com,male,200.47.210.95,11/12/1985,Yamaha,Resources Manager,12,China,Emulation of roots heuristic coherent systems,300000
5,4,Jennifer Lawrence,jlaw@example.com,female,100.37.110.59,17/05/1995,Monsnto,Resources Manager,10,Germany,Emulation of roots heuristic complete systems,300000
6,4,Alan Thomas,athomas2@example.com,male,200.47.210.95,11/12/1985,Yamaha,Resources Manager,12,China,Emulation of roots heuristic coherent systems,300000
7,4,Jennifer Lawrence,jlaw@example.com,female,100.37.110.59,17/05/1995,Monsnto,Resources Manager,10,Germany,Emulation of roots heuristic complete systems,300000


**The above query will return the documents matching either “heuristic” or “research” despite the spelling mistakes in the query.**

# Boosting
When querying, it is often helpful to get the most relevant results first. The simplest way to do this in Elasticsearch is boosting, which comes in especially handy when we query multiple fields: appending “^3” to a field name, as in “position^3” below, multiplies that field's score by 3.

In [33]:

query = {
    "query": {
        "multi_match" : {
            "query" : "versatile Engineer",
            "fields": ["position^3", "phrase"]
        }
    }
}
records = fetch_elastic_data(index_name,query)

df_frame = show_result(records)
df_frame.tail(10) # Limit result

Unnamed: 0,id,name,email,gender,ip_address,date_of_birth,company,position,experience,country,phrase,salary
0,2,Othilia Cathel,ocathel1@senate.gov,female,3.164.153.228,22/07/1987,Edgepulse,Structural Engineer,11,China,Grass-roots heuristic help-desk,193530
1,2,Othilia Cathel,ocathel1@senate.gov,female,3.164.153.228,22/07/1987,Edgepulse,Structural Engineer,11,China,Grass-roots heuristic help-desk,193530
2,3,Winston Waren,wwaren2@4shared.com,male,202.37.210.94,10/11/1985,Yozio,Human Resources Manager,12,China,Versatile object-oriented emulation,50616
3,3,Winston Waren,wwaren2@4shared.com,male,202.37.210.94,10/11/1985,Yozio,Human Resources Manager,12,China,Versatile object-oriented emulation,50616


**Because “position” is boosted by a factor of 3, documents matching on “position” (here “Engineer”) rank above those matching only on “phrase” (here “versatile”).**
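
multi_match defaults to the “best_fields” type, which scores a document by its best-matching field, plus “tie_breaker” (default 0) times the scores of the other matching fields. A “^3” boost multiplies that field's score before the comparison. The toy sketch below uses a hypothetical helper, `best_fields_score`, with made-up per-field scores; the numbers are illustrative, not real BM25 output.

```python
from typing import Dict


def best_fields_score(field_scores: Dict[str, float],
                      boosts: Dict[str, float],
                      tie_breaker: float = 0.0) -> float:
    """Toy model of multi_match best_fields: boosted best field plus tie_breaker * the rest."""
    boosted = [score * boosts.get(field, 1.0) for field, score in field_scores.items()]
    if not boosted:
        return 0.0  # no field matched
    best = max(boosted)
    return best + tie_breaker * (sum(boosted) - best)


boosts = {"position": 3.0}        # "position^3"; "phrase" keeps the implicit boost of 1
engineer_doc = {"position": 1.0}  # hypothetical raw score for "Engineer" in position
versatile_doc = {"phrase": 1.5}   # hypothetical raw score for "versatile" in phrase

print(best_fields_score(engineer_doc, boosts))   # 3.0
print(best_fields_score(versatile_doc, boosts))  # 1.5
```

Even though the hypothetical raw score of the “phrase” match is higher, the boost lifts the “position” match above it.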

# Default Sorting:
When the search request specifies no sort parameter, Elasticsearch returns documents in descending order of the “_score” field. The “_score” measures how well each document matches the query and is computed with Elasticsearch's default relevance scoring (BM25). All the examples above show this behavior.
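
The sketch below makes “_score” less of a black box. It implements the BM25 formula behind the default similarity, with the default parameters k1 = 1.2 and b = 0.75 and the extra (k1 + 1) numerator factor kept by Elasticsearch 7.x, which this notebook appears to use given the “_type” field in the hits. It also assumes a single shard holding the 10 documents seen in the sort examples, 8 of them with country “China”, where every “country” value is one token. Under those assumptions it reproduces the “_score” of 0.2578291 shown by the next cell.

```python
import math


def bm25_term_score(tf: float, doc_len: float, avg_doc_len: float,
                    doc_count: int, doc_freq: int,
                    k1: float = 1.2, b: float = 0.75) -> float:
    """BM25 score of one query term in one field, with the (k1 + 1) numerator factor."""
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    length_norm = 1 - b + b * doc_len / avg_doc_len
    return idf * tf * (k1 + 1) / (tf + k1 * length_norm)


# Assumed statistics: 10 docs in one shard, 8 contain "china", each country value is one token
score = bm25_term_score(tf=1, doc_len=1, avg_doc_len=1, doc_count=10, doc_freq=8)
print(round(score, 7))  # 0.2578291
```

Because “china” occurs in most documents, its IDF, and hence the score, is small; a rarer term would score higher.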

In [34]:
query = {
    "size":120,
  "query": {
    "match": {"country":"China"}
  }
}
records = fetch_elastic_data(index_name,query)

# df_frame = show_result(records)
# df_frame.tail(10) # Limit result
records

[{'_index': 'employee101',
  '_type': '_doc',
  '_id': 'XJHSEngBSW6JAdey4eh-',
  '_score': 0.2578291,
  '_source': {'id': 1,
   'name': 'Huntlee Dargavel',
   'email': 'hdargavel0@japanpost.jp',
   'gender': 'male',
   'ip_address': '58.11.89.193',
   'date_of_birth': '11/09/1990',
   'company': 'Talane',
   'position': 'Research Associate',
   'experience': 7,
   'country': 'China',
   'phrase': 'Multi-channelled coherent leverage',
   'salary': 180025}},
 {'_index': 'employee101',
  '_type': '_doc',
  '_id': 'XZHSEngBSW6JAdey4eh-',
  '_score': 0.2578291,
  '_source': {'id': 2,
   'name': 'Othilia Cathel',
   'email': 'ocathel1@senate.gov',
   'gender': 'female',
   'ip_address': '3.164.153.228',
   'date_of_birth': '22/07/1987',
   'company': 'Edgepulse',
   'position': 'Structural Engineer',
   'experience': 11,
   'country': 'China',
   'phrase': 'Grass-roots heuristic help-desk',
   'salary': 193530}},
 {'_index': 'employee101',
  '_type': '_doc',
  '_id': 'XpHSEngBSW6JAdey4eh-',


# Sort by a Field
Elasticsearch also lets us sort on a field. For example, suppose we need to sort the employees in descending order of experience.

In [35]:
query = {
    "_source": ["name", "experience", "salary"],
    "sort": [
        {
            "experience": {
                "order": "desc"
            }
        }
    ]
}
records = fetch_elastic_data(index_name,query)

df_frame = show_result(records)
df_frame.tail(10) # Limit result


Unnamed: 0,name,experience,salary
0,Winston Waren,12,50616
1,Alan Thomas,12,300000
2,Winston Waren,12,50616
3,Alan Thomas,12,300000
4,Othilia Cathel,11,193530
5,Othilia Cathel,11,193530
6,Jennifer Lawrence,10,300000
7,Jennifer Lawrence,10,300000
8,Huntlee Dargavel,7,180025
9,Huntlee Dargavel,7,180025


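# Sort by Multiple Fields

The sort array can hold several criteria, applied in order: each later criterion only breaks ties left by the earlier ones. Below, employees are sorted by salary in descending order and, among equal salaries, by experience in descending order.
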
In [36]:
query = {
  "_source": [
    "name",
    "experience",
    "salary"
  ],
  "sort": [
    {
      "salary": {
        "order": "desc"
      }
    },
    {
      "experience": {
        "order": "desc"
      }
    }
  ]
}
records = fetch_elastic_data(index_name,query)

df_frame = show_result(records)
df_frame.tail(10) # Limit result


Unnamed: 0,name,experience,salary
0,Alan Thomas,12,300000
1,Alan Thomas,12,300000
2,Jennifer Lawrence,10,300000
3,Jennifer Lawrence,10,300000
4,Othilia Cathel,11,193530
5,Othilia Cathel,11,193530
6,Huntlee Dargavel,7,180025
7,Huntlee Dargavel,7,180025
8,Winston Waren,12,50616
9,Winston Waren,12,50616
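
Here is a local sketch of the same ordering in plain Python, using the distinct employees from the output above (the index holds each of them twice). A tuple key sorts by its first element and only consults the second when the first is tied, which mirrors how the entries of the “sort” array are applied in order.

```python
# Distinct employees from the output above
employees = [
    {"name": "Winston Waren", "experience": 12, "salary": 50616},
    {"name": "Alan Thomas", "experience": 12, "salary": 300000},
    {"name": "Othilia Cathel", "experience": 11, "salary": 193530},
    {"name": "Jennifer Lawrence", "experience": 10, "salary": 300000},
    {"name": "Huntlee Dargavel", "experience": 7, "salary": 180025},
]

# Salary descending, then experience descending: the second key only breaks salary ties
ordered = sorted(employees, key=lambda e: (-e["salary"], -e["experience"]))
print([e["name"] for e in ordered])
# ['Alan Thomas', 'Jennifer Lawrence', 'Othilia Cathel', 'Huntlee Dargavel', 'Winston Waren']
```

Alan Thomas and Jennifer Lawrence share the top salary, so experience decides their order, exactly as in the Elasticsearch result.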


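# Pagination

The “from” and “size” parameters page through the results: “size” sets how many hits to return and “from” how many hits to skip. The query below repeats the fuzzy multi_match search, skips the first two hits and returns the next two.
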
In [40]:

query = {
    "query": {
        "multi_match" : {
            "query" : "heursitic reserch",
            "fields": ["phrase","position"],
            "fuzziness": 2
        }
    },
    "size": 2,
    "from":2
}
records = fetch_elastic_data(index_name,query)

df_frame = show_result(records)
df_frame.head(10) # Limit result

Unnamed: 0,id,name,email,gender,ip_address,date_of_birth,company,position,experience,country,phrase,salary
0,2,Othilia Cathel,ocathel1@senate.gov,female,3.164.153.228,22/07/1987,Edgepulse,Structural Engineer,11,China,Grass-roots heuristic help-desk,193530
1,2,Othilia Cathel,ocathel1@senate.gov,female,3.164.153.228,22/07/1987,Edgepulse,Structural Engineer,11,China,Grass-roots heuristic help-desk,193530
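
The sketch below uses a hypothetical helper, `page_to_from_size`, to map a 1-based page number onto “from” and “size”. It also enforces the `index.max_result_window` limit (10,000 by default), beyond which Elasticsearch rejects from/size requests; deeper paging needs `search_after` or the scroll API instead.

```python
def page_to_from_size(page: int, per_page: int, max_result_window: int = 10_000) -> dict:
    """Translate a 1-based page number into Elasticsearch "from"/"size" parameters."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    start = (page - 1) * per_page
    if start + per_page > max_result_window:
        # from + size beyond index.max_result_window is rejected by default
        raise ValueError("from + size exceeds index.max_result_window")
    return {"from": start, "size": per_page}


params = page_to_from_size(page=2, per_page=2)
print(params)  # {'from': 2, 'size': 2}, the values used in cell In [40]

# from/size behave like slicing the full ranked hit list
ranked = ["hit0", "hit1", "hit2", "hit3", "hit4"]
print(ranked[params["from"]:params["from"] + params["size"]])  # ['hit2', 'hit3']
```

That is why cell In [40] returns the third and fourth hits of the earlier fuzzy multi_match output, both Othilia Cathel.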


# Compound Queries:
So far, we have fired single queries, such as finding a text match or finding age ranges. In the real world, however, we more often need to check multiple conditions and return documents based on them. We may also need to modify the relevance score of the queries or change the behavior of individual queries. Compound queries help us handle these scenarios. In this section, let us look at a few of the most helpful compound queries.

# The Bool Query:
The bool query combines multiple queries with boolean logic. For example, to retrieve all documents with the keyword “researcher” in the “position” field and more than 12 years of experience, we need to combine a match query with a range query, and the bool query is how we formulate that combination. The bool query defines four main types of occurrence:<br>
**must**: The queries in this clause must match for a document to be returned, and they contribute to the score. <br>
**Example**: if we put query A and query B in the must clause, each document in the result satisfies both queries, i.e. query A AND query B. <br>
**should**: The queries in this clause should match; each matching should clause increases the score. If there is no must or filter clause, at least one should clause must match. <br>
**Example**: Result = query A OR query B (when there is no must or filter clause) <br>
**filter**: Same as the must clause, but the clause does not contribute to the score. <br>
**must_not**: The queries in this clause must not match the documents. These clauses run in filter context, so scoring is ignored for them. <br>
**Structure:**<br>
{
  "query": {
    "bool" : {
      "must" : [],
      "filter": [],
      "must_not" : [],
      "should" : []
    }
  }
}
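
The clause semantics above can be modelled locally. The sketch below first writes the text's example as a query body, not executed here. Putting the range in “filter” is one reasonable choice, since it does not need to affect the score. It then uses a hypothetical helper, `bool_matches`, that decides whether a document would be returned. Each clause is a plain Python predicate and scoring is ignored entirely. The employee rows come from the sample data.

```python
from typing import Any, Callable, Dict, Optional, Sequence

Doc = Dict[str, Any]
Predicate = Callable[[Doc], bool]

# The example from the text: "researcher" in position AND more than 12 years of experience
example_query = {
    "query": {
        "bool": {
            "must": [{"match": {"position": "researcher"}}],
            "filter": [{"range": {"experience": {"gt": 12}}}],
        }
    }
}


def bool_matches(doc: Doc,
                 must: Sequence[Predicate] = (),
                 filters: Sequence[Predicate] = (),
                 should: Sequence[Predicate] = (),
                 must_not: Sequence[Predicate] = (),
                 minimum_should_match: Optional[int] = None) -> bool:
    """Simplified model of which documents a bool query returns (scoring is ignored)."""
    if minimum_should_match is None:
        # Default: 1 when there are should clauses but no must/filter clauses, else 0
        minimum_should_match = 1 if should and not (must or filters) else 0
    if not all(q(doc) for q in must):
        return False
    if not all(q(doc) for q in filters):
        return False
    if any(q(doc) for q in must_not):
        return False
    return sum(1 for q in should if q(doc)) >= minimum_should_match


employees = [
    {"name": "Huntlee Dargavel", "position": "Research Associate", "experience": 7, "country": "China", "salary": 180025},
    {"name": "Othilia Cathel", "position": "Structural Engineer", "experience": 11, "country": "China", "salary": 193530},
    {"name": "Winston Waren", "position": "Human Resources Manager", "experience": 12, "country": "China", "salary": 50616},
    {"name": "Alan Thomas", "position": "Resources Manager", "experience": 12, "country": "China", "salary": 300000},
    {"name": "Jennifer Lawrence", "position": "Resources Manager", "experience": 10, "country": "Germany", "salary": 300000},
]

# must AND filter AND NOT must_not
managers = [e["name"] for e in employees
            if bool_matches(e,
                            must=[lambda d: "manager" in d["position"].lower().split()],
                            filters=[lambda d: d["experience"] > 10],
                            must_not=[lambda d: d["country"] == "Germany"])]
print(managers)  # ['Winston Waren', 'Alan Thomas']

# should only: at least one clause must match (query A OR query B)
either = [e["name"] for e in employees
          if bool_matches(e, should=[lambda d: d["country"] == "Germany",
                                     lambda d: d["salary"] > 190000])]
print(either)  # ['Othilia Cathel', 'Alan Thomas', 'Jennifer Lawrence']
```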
