# Elasticsearch query and filter
By default, Elasticsearch sorts matching search results by relevance score,
which measures how well each document matches a query.
The relevance score is a positive floating point number, returned in the _score metadata field of the search API. The higher the _score, the more relevant the document. While each query type can calculate relevance scores differently, 
score calculation also depends on whether the query clause is run in a query or filter context.

**Note**
Here we are going to use the same index, **'market_data'**, that we ingested in the previous session.

In [2]:
from elasticsearch import Elasticsearch,ElasticsearchException
host = 'http://localhost:9200/'
elastic_obj = Elasticsearch([host])  # Elasticsearch client object
index_name = 'market_data'
if not elastic_obj.ping():
    print("Elasticsearch server is not running")
else:
    print("Elastic search engine is running........")


Elastic search engine is running........


In [3]:
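def fetch_elastic_data(query):
    """Run a search against index_name and return the list of hits."""
    try:
        data = elastic_obj.search(index=index_name, body=query)
    except ElasticsearchException as e:
        # Report the error and return no hits; without this early return,
        # `data` would be undefined below and raise a NameError.
        print(str(e))
        return []
    return data['hits']['hits']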

In [6]:
import pandas as pd
def show_result(elastic_result):
    """Build a DataFrame from the _source document of each search hit."""
    list_dict = []
    for row in elastic_result:
        list_dict.append(row['_source'])

    dataframe = pd.DataFrame(list_dict)
    return dataframe

In [7]:
query = {
  "query": {
    "match": {
      "Item_Type": "Soft Drinks"
    }
  }
}
records = fetch_elastic_data(query)
df_frame = show_result(records)
df_frame.head(10) # Limit result

Unnamed: 0,Item_Identifier,Item_Weight,Item_Fat_Content,Item_Visibility,Item_Type,Item_MRP,Outlet_Identifier,Outlet_Establishment_Year,Outlet_Size,Outlet_Location_Type,Outlet_Type,Item_Outlet_Sales
0,DRE60,6.635,low fat,0.278974,Soft Drinks,225.372,OUT019,1985,Small,Tier 1,Grocery Store,679.116
1,DRJ13,12.65,Low Fat,0.062838,Soft Drinks,161.5578,OUT013,1987,High,Tier 3,Supermarket Type1,2406.867
2,DRH01,17.5,Low Fat,0.097904,Soft Drinks,174.8738,OUT046,1997,Small,Tier 1,Supermarket Type1,2085.2856
3,DRZ11,8.85,Regular,0.113124,Soft Drinks,122.5388,OUT018,2009,Medium,Tier 3,Supermarket Type2,1609.9044
4,DRF49,7.27,Low Fat,0.071078,Soft Drinks,114.2518,OUT046,1997,Small,Tier 1,Supermarket Type1,2618.5914
5,DRK01,7.63,Low Fat,0.061053,Soft Drinks,95.4436,OUT035,2004,Small,Tier 2,Supermarket Type1,1418.154
6,DRH37,17.6,Low Fat,0.041701,Soft Drinks,164.8526,OUT045,2002,Small,Tier 2,Supermarket Type1,2302.3364
7,DRI25,19.6,Low Fat,0.03397,Soft Drinks,55.1614,OUT045,2002,Medium,Tier 2,Supermarket Type1,1381.535
8,DRC01,5.92,Regular,0.019278,Soft Drinks,48.2692,OUT018,2009,Medium,Tier 3,Supermarket Type2,443.4228
9,DRF36,16.1,LF,0.023625,Soft Drinks,189.3846,OUT045,2002,Medium,Tier 2,Supermarket Type1,3630.6074


In [9]:
# The setup is the same as in the previous session. Let's go ahead.

# Query Context:
In the query context, a query clause answers the question “How well does this document match this query clause?” Besides deciding whether or not the document matches, the query clause also calculates a relevance score, which is returned in the _score metadata field.
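The `show_result` helper used in this notebook keeps only `_source`, so the relevance score is dropped. As a minimal sketch, the hypothetical helper `rows_with_score` below keeps `_score` next to the document fields. The sample hits are made up for illustration and only mimic the shape of the entries in `response['hits']['hits']`; the scores are not real results.

```python
def rows_with_score(hits):
    """Flatten search hits into dicts that keep the relevance score."""
    rows = []
    for hit in hits:
        row = dict(hit.get("_source", {}))  # copy so the original hit is not mutated
        row["_score"] = hit.get("_score")   # None if the hit carries no score
        rows.append(row)
    return rows


# Hypothetical hits (made-up scores), shaped like response['hits']['hits'].
sample_hits = [
    {"_score": 1.7, "_source": {"Item_Identifier": "DRE60", "Item_Type": "Soft Drinks"}},
    {"_score": 0.9, "_source": {"Item_Identifier": "DRJ13", "Item_Type": "Soft Drinks"}},
]

rows = rows_with_score(sample_hits)
for row in rows:
    print(row["Item_Identifier"], row["_score"])
```

The resulting list of dicts can be passed to `pd.DataFrame(rows)` in the same way `show_result` does, which adds a `_score` column to the table.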
# Filter Context:
In a filter context, a query clause answers the question “Does this document match this query clause?” The answer is a simple yes or no; no scores are calculated. Filter context is mostly used for filtering structured data, e.g.

Does this **timestamp** fall into the range 2015 to 2016?
Is the **status** field set to "Active"?
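A yes/no question like these can be expressed as a `bool` query that contains only a `filter` clause. The sketch below builds such a request body for a numeric range; the helper name `range_filter_query` is an assumption made for illustration, and the field comes from the 'market_data' index used in this notebook.

```python
def range_filter_query(field, gte=None, lte=None):
    """Build a search body that applies a range condition in filter context only."""
    bounds = {}
    if gte is not None:
        bounds["gte"] = gte
    if lte is not None:
        bounds["lte"] = lte
    # Only a "filter" clause is present, so no relevance scoring takes place.
    return {"query": {"bool": {"filter": [{"range": {field: bounds}}]}}}


# Outlets established from 2002 through 2007 (inclusive bounds).
query = range_filter_query("Outlet_Establishment_Year", gte=2002, lte=2007)
print(query)
```

The returned dictionary can be passed to `fetch_elastic_data(query)`. Because nothing runs in query context, every hit should come back with a score of 0.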

**Example:**
Let's fetch records where 'Outlet_Size' contains 'Small' and 'Item_Fat_Content' contains 'Low fat', filtered to outlets established in 2007 or later.


In [12]:
query = {
  "query": { 
    "bool": { 
      "must": [
        { "match": { "Outlet_Size":"Small"  }},
        { "match": { "Item_Fat_Content": "Low fat" }}
      ],
      "filter": [ 
        { "range": { "Outlet_Establishment_Year": { "gte": "2007" }}}
      ]
    }
  }
}
records = fetch_elastic_data(query)
df_frame = show_result(records)
df_frame.head(10) # Limit result

Unnamed: 0,Item_Identifier,Item_Weight,Item_Fat_Content,Item_Visibility,Item_Type,Item_MRP,Outlet_Identifier,Outlet_Establishment_Year,Outlet_Size,Outlet_Location_Type,Outlet_Type,Item_Outlet_Sales
0,FDS52,8.89,low fat,0.005505,Frozen Foods,102.4016,OUT017,2007,Small,Tier 2,Supermarket Type1,2732.4432
1,FDT25,7.5,Low Fat,0.051038,Canned,121.7072,OUT017,2007,Small,Tier 2,Supermarket Type1,3552.7088
2,FDW13,8.5,Low Fat,0.098438,Canned,51.1324,OUT017,2007,Small,Tier 2,Supermarket Type1,259.662
3,FDW11,12.6,Low Fat,0.049058,Breads,62.7194,OUT017,2007,Small,Tier 2,Supermarket Type1,866.8716
4,NCD06,13.0,Low Fat,0.099887,Household,45.906,OUT017,2007,Small,Tier 2,Supermarket Type1,838.908
5,DRJ25,14.6,Low Fat,0.151419,Soft Drinks,50.3692,OUT017,2007,Small,Tier 2,Supermarket Type1,1034.6532
6,FDP25,15.2,Low Fat,0.021327,Canned,216.8824,OUT017,2007,Small,Tier 2,Supermarket Type1,2838.9712
7,FDK44,16.6,Low Fat,0.122919,Fruits and Vegetables,173.0738,OUT017,2007,Small,Tier 2,Supermarket Type1,3823.0236
8,DRA12,11.6,Low Fat,0.041178,Soft Drinks,140.3154,OUT017,2007,Small,Tier 2,Supermarket Type1,2552.6772
9,FDV38,19.25,Low Fat,0.10235,Dairy,52.7956,OUT017,2007,Small,Tier 2,Supermarket Type1,928.1252


**Note:**
The **query** parameter indicates query context.
The **bool** and two **match** clauses are used in query context, which means that they are used to score how well each document matches.
The **filter** parameter indicates filter context. Its **range** clause is used in filter context: it filters out documents that do not match, but it does not affect the score of matching documents.
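To make the difference concrete, the sketch below takes the example query above and moves its `must` clauses into `filter`. The same conditions still have to hold, but they no longer contribute to the score. The helper `move_must_to_filter` is a hypothetical name introduced here, not part of the Elasticsearch client.

```python
import copy


def move_must_to_filter(query):
    """Return a copy of a bool query whose must clauses run in filter context."""
    new_query = copy.deepcopy(query)  # leave the caller's query untouched
    bool_body = new_query["query"]["bool"]
    must = bool_body.pop("must", [])
    existing = bool_body.get("filter", [])
    # Elasticsearch accepts a single clause or a list; normalise both to lists.
    if isinstance(must, dict):
        must = [must]
    if isinstance(existing, dict):
        existing = [existing]
    bool_body["filter"] = list(existing) + list(must)
    return new_query


scored_query = {
    "query": {
        "bool": {
            "must": [
                {"match": {"Outlet_Size": "Small"}},
                {"match": {"Item_Fat_Content": "Low fat"}},
            ],
            "filter": [
                {"range": {"Outlet_Establishment_Year": {"gte": "2007"}}}
            ],
        }
    }
}

unscored_query = move_must_to_filter(scored_query)
print(len(unscored_query["query"]["bool"]["filter"]))
```

Running both versions through `fetch_elastic_data` should return documents that satisfy the same conditions, but the order may differ because the filter-only version has no relevance score to sort by.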

# Boolean Queries:
A query that matches documents matching boolean combinations of other queries. The bool query maps to Lucene BooleanQuery. It is built using one or more boolean clauses, each clause with a typed occurrence.
The occurrence types are:<br>
**must** : The clause (query) must appear in matching documents and will contribute to the score.<br>
**filter** : The clause (query) must appear in matching documents. However, unlike must, the score of the query will be ignored. Filter clauses are executed in filter context, meaning that scoring is ignored and clauses are considered for caching.<br>
**should** : The clause (query) should appear in the matching document.<br>
**must_not** : The clause (query) must not appear in the matching documents. Clauses are executed in filter context, meaning that scoring is ignored and clauses are considered for caching. Because scoring is ignored, a score of 0 for all documents is returned.
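
The four occurrence types can be combined in a single request. The following is a minimal sketch, assuming the field names of the 'market_data' index used earlier; `build_bool_query` is a hypothetical helper, and the chosen values are illustrative only.

```python
def build_bool_query(must=None, filters=None, should=None, must_not=None):
    """Assemble a bool query, including only the occurrence types that were given."""
    clauses = {
        "must": must,          # required, contributes to the score
        "filter": filters,     # required, ignored for scoring
        "should": should,      # optional, contributes to the score when it matches
        "must_not": must_not,  # excluded, ignored for scoring
    }
    bool_body = {name: list(items) for name, items in clauses.items() if items}
    return {"query": {"bool": bool_body}}


query = build_bool_query(
    must=[{"match": {"Item_Type": "Soft Drinks"}}],
    filters=[{"range": {"Outlet_Establishment_Year": {"gte": 2000}}}],
    should=[{"match": {"Outlet_Size": "Medium"}}],
    must_not=[{"match": {"Outlet_Type": "Grocery Store"}}],
)
print(sorted(query["query"]["bool"]))
```

The result has the same shape as the hand-written queries in this notebook, so it can be passed to `fetch_elastic_data(query)` and displayed with `show_result`.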
