# The heart of the Elastic Stack {background-color="white" background-image="images/heart.jpeg" background-size="50%" background-opacity="1" background-position="top"}

## Elasticsearch


:::: {.columns}

::: {.fragment .column width="50%"}
![](https://static-www.elastic.co/v3/assets/bltefdd0b53724fa2ce/blt36f2da8d650732a0/5d0823c3d8ff351753cbc99f/logo-elasticsearch-32-color.svg)

Elasticsearch is

- an open source distributed
- RESTful search
- analytics engine
- scalable data store
- vector database
- capable of addressing a growing number of use cases. 
::: 

::: {.fragment .column width="50%"}
As the heart of the Elastic Stack, it centrally stores your data for 

- lightning-fast search
- fine‑tuned relevancy
- powerful analytics
- that scale with ease.
:::
::::



## Elastic {background-color="#153385" background-image="https://static-www.elastic.co/v3/assets/bltefdd0b53724fa2ce/bltb0c1a82d7358d31c/664385d38a9705f7b814ebe6/illustration-hero-platform-search-security-observability-2x.png" background-size="100%" background-opacity="1"}

<https://elastic.co>

## Origin

:::: {.columns}

::: {.fragment .column width="35%"}
It started with a recipe app...

![](https://i.imgflip.com/41pcjq.jpg)
[NicsMeme](https://imgflip.com/i/41pcjq)
[Tears and Sweat](https://dictionary.cambridge.org/dictionary/english/blood-sweat-and-tears)
::: 

::: {.fragment .column width="65%"}
- In a London apartment, Shay Banon was looking for a job while his wife attended cooking school at Le Cordon Bleu. In his spare time, he started building a search engine for her growing list of recipes.
- His first iteration was called Compass. The second was Elasticsearch (with [Apache Lucene](https://lucene.apache.org/) under the hood). He open sourced Elasticsearch, created the #elasticsearch IRC channel, and waited for users to appear.
- The response was impressive. Users took to it naturally and easily. Adoption went through the roof, a community started to form, and people noticed — namely Steven Schuurman, Uri Boness, and Simon Willnauer. Together, they founded a search company.
:::
::::


## Still number 1 among search engines

:::: {.columns}


  
::: {.fragment .column width="50%"}

![](images/ES-Rank2024.png) 

<https://db-engines.com/en/ranking/search+engine>

and 7th in the global list <https://db-engines.com/en/ranking>

::: 

::: {.fragment .column width="50%"}
![](https://static.fanpage.it/wp-content/uploads/sites/34/2021/12/viva-ledilizia-1200x900.jpg)
:::
::::


# Characteristics

## Query & Analyze
Ask your data questions of all kinds

:::: {.columns}

::: {.fragment .column width="50%"}
### Search your way
![](https://images.contentstack.io/v3/assets/bltefdd0b53724fa2ce/blt5cfd03e50d91fe0f/5e6157c725d22d7db56a574d/icon-search-ui-32-color.svg)

Elasticsearch lets you perform and combine many types of searches:

- structured
- unstructured
- geo
- metric
- from a piped query language.
::: 

::: {.fragment .column width="50%"}
### Analyze at scale
![](https://images.contentstack.io/v3/assets/bltefdd0b53724fa2ce/blta28bdc6896c47075/5d0ca0cd77f34fd55839aeae/icon-scale-32-color.svg)
It's one thing to find the 10 best documents to match your query. 

But how do you make sense of, say, a billion log lines? 

Elasticsearch aggregations let you zoom out to explore trends and patterns in your data.
:::
::::

## Speed
Elasticsearch is fast. Really, really fast.

:::: {.columns}

::: {.fragment .column width="33%"}
### Rapid results
When you get answers instantly, your relationship with your data changes. 

You can afford to iterate and cover more ground.
::: 

::: {.fragment .column width="33%"}
### Powerful design
Being this fast isn't easy. 

We've implemented:

- [inverted indices](https://en.wikipedia.org/wiki/Inverted_index) with [finite state transducers](https://en.wikipedia.org/wiki/Finite-state_transducer) for full-text querying
- [BKD trees](https://en.wikipedia.org/wiki/K-D-B-tree) for storing numeric and geo data, 
- a [column store](https://www.elastic.co/blog/elasticsearch-as-a-column-store) for analytics.
:::

::: {.fragment .column width="33%"}
### All-inclusive
And since everything is indexed, you're never left with index envy. 

You can leverage and access all of your data at ludicrously awesome speeds.
::: 

::::

## Scalability

:::: {.columns}

::: {.fragment .column width="50%"}
**Run it on your laptop. Or hundreds of servers with petabytes of data.** 

Go from prototype to production seamlessly; you talk to Elasticsearch running on a single node the same way you would in a 300-node cluster.

It scales horizontally to handle kajillions of events per second, while automatically managing how indices and queries are distributed across the cluster for oh-so-smooth operations.
::: 

::: {.fragment .column width="50%"}
![](https://images.contentstack.io/v3/assets/bltefdd0b53724fa2ce/blt8c328002d82e303e/5d0d573477f34fd55839b61f/illustration-elasticsearch-scalability-555.png)
:::
::::


## Flexibility

:::: {.columns}

::: {.fragment .column width="50%"}

Store and explore data to fit your needs

Data is constantly evolving, and it can become expensive to store and search all of it. With Elasticsearch you can balance performance and cost. Store data locally for fast queries or [remotely on low-cost S3](https://www.elastic.co/elasticsearch/elasticsearch-searchable-snapshots) for unlimited data. With [runtime fields](https://www.elastic.co/elasticsearch/elasticsearch-runtime-fields), you can also quickly onboard your data and adapt to changes.
::: 

::: {.fragment .column width="50%"}
![](https://static-www.elastic.co/v3/assets/bltefdd0b53724fa2ce/blt50671ed5bbad50f3/608fbbdc2d1d221032193ff1/illustration-balance-cost.png)
:::
::::


## Resiliency
:::: {.columns}

::: {.fragment .column width="50%"}
**We cover the bases while you swing for the fences.** 

Hardware rebels. Networks partition. 

Elasticsearch detects failures to keep your cluster (and your data) safe and available. 

With cross-cluster replication, a secondary cluster can spring into action as a hot backup. 

Elasticsearch operates in a distributed environment designed from the ground up for perpetual peace of mind.
::: 

::: {.fragment .column width="50%"}
![](https://images.contentstack.io/v3/assets/bltefdd0b53724fa2ce/blt61799e12d10f4581/5e6158f8dc0f1706df255d1c/illustration-elasticsearch-resiliency-555.png)
:::
::::

## Use Cases
Numbers, text, geo, structured, unstructured. All data types are welcome. 

Full-text search just scratches the surface of how companies around the world are relying on Elasticsearch to solve a variety of challenges. 
![](images/ES-apps.png)

## Features
![](images/ES-features.png)
<https://www.elastic.co/elasticsearch/features>

## Client Libraries
- Interact with Elasticsearch in the programming language you choose.
- Elasticsearch uses standard RESTful APIs and JSON.
- We also build and maintain clients in many languages such as Java, Python, .NET, SQL, and PHP. Plus, our community has contributed many more. They're easy to work with, feel natural to use, and, just like Elasticsearch, don't limit what you might want to do with them. 

## Let's start using it
<https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html#docker>


::: {.fragment}
Some highlights:

- Use bind volumes for data, config, logs
- Manually set the heap size 
::: 

::: {.fragment}
```bash
docker run --name elasticsearch -p 9200:9200 --rm -it -m 1GB \
  -e "discovery.type=single-node" -e "xpack.security.enabled=false" \
  docker.elastic.co/elasticsearch/elasticsearch:8.17.0
```

Open
<http://192.168.45.65:9200/>
:::



## Access to Elastic using REST 

Using the Requests library:
<https://requests.readthedocs.io/en/master/>

In [1]:
# Install
!pip install requests 

Defaulting to user installation because normal site-packages is not writeable


In [2]:
# Import
import requests

## Add records

In [4]:
# Single record
doc = """{
  "name": "Salvo Nicotra"
}"""

url="http://localhost:9200/tap/_doc/1"
headers = {'Content-type': 'application/json', 'Accept': 'text/plain'}

r = requests.post(url,data=doc, headers=headers)
r.status_code

201

In [5]:
r.text

'{"_index":"tap","_id":"1","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}'

In [6]:
# And get it back
url="http://localhost:9200/tap/_doc/1"
r = requests.get(url)
r.json()


{'_index': 'tap',
 '_id': '1',
 '_version': 1,
 '_seq_no': 0,
 '_primary_term': 1,
 'found': True,
 '_source': {'name': 'Salvo Nicotra'}}

## Add multiple data 

The Bulk API uploads multiple records in a single request; batches of about 1,000 documents are usually a good starting point. Each record in the sample file looks like this:

```json
{
  "account_number": 1,
  "balance": 39225,
  "firstname": "Amber",
  "lastname": "Duke",
  "age": 32,
  "gender": "M",
  "address": "880 Holmes Lane",
  "employer": "Pyrami",
  "email": "amberduke@pyrami.com",
  "city": "Brogan",
  "state": "IL"
}
```
[Accounts.json](https://github.com/linuxacademy/content-elasticsearch-deep-dive/blob/master/sample_data/accounts.json)
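
The Bulk API expects newline-delimited JSON in which every document is preceded by an action line, and the body must end with a newline. The following is a minimal sketch of building such a body from Python dictionaries and splitting the input into batches; `batches` and `to_bulk_body` are hypothetical helper names, and the two sample documents are shortened versions of the records in the file.

```python
import json
from itertools import islice


def batches(docs, size=1000):
    """Yield successive lists of at most `size` documents."""
    it = iter(docs)
    while chunk := list(islice(it, size)):
        yield chunk


def to_bulk_body(docs, id_field=None):
    """Serialize documents as NDJSON: one action line, then one source line."""
    lines = []
    for doc in docs:
        action = {"index": {}}
        if id_field is not None:
            # Reuse a document field as the Elasticsearch _id.
            action["index"]["_id"] = str(doc[id_field])
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The Bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


accounts = [
    {"account_number": 1, "firstname": "Amber", "lastname": "Duke"},
    {"account_number": 6, "firstname": "Hattie", "lastname": "Bond"},
]
body = to_bulk_body(accounts, id_field="account_number")
print(body)
```

Each batch produced by `batches` could be serialized this way and posted to `/bank/_bulk` with the `Content-Type: application/json` header. Because the index name is in the URL, the action lines do not need to repeat it.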

## Download data

In [None]:
%%bash
wget https://raw.githubusercontent.com/linuxacademy/content-elasticsearch-deep-dive/master/sample_data/accounts.json
curl -H "Content-Type: application/json" -XPOST "localhost:9200/bank/_bulk?pretty&refresh" --data-binary "@accounts.json"

--2024-12-19 21:11:53--  https://raw.githubusercontent.com/linuxacademy/content-elasticsearch-deep-dive/master/sample_data/accounts.json
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.111.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 244848 (239K) [text/plain]
Saving to: ‘accounts.json’

     0K .......... .......... .......... .......... .......... 20% 2.33M 0s
    50K .......... .......... .......... .......... .......... 41% 2.98M 0s
   100K .......... .......... .......... .......... .......... 62% 9.65M 0s
   150K .......... .......... .......... .......... .......... 83% 12.8M 0s
   200K .......... .......... .......... .........            100% 4.03M=0.06s

2024-12-19 21:11:53 (4.19 MB/s) - ‘accounts.json’ saved [244848/244848]


{
  "errors" : false,
  "took" : 600,
  "items" : [
    {
      "index" : {
        "_index" : "bank",
        "_id" : "1",
        "_version" : 1,
        "result" : "created",
        "forced_refresh" : true,
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 0,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "index" : {
        "_index" : "bank",
        "_id" : "6",
        "_version" : 1,
        "result" : "created",
        "forced_refresh" : true,
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 1,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "index" : {
        "_index" : "bank",
        "_id" : "13",
        "_version" : 1,
        "result" : "created",
        "forced_refresh" : true,
        "_shards" : {
          "total" : 2,
          "successful" : 1,
   



        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 253,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "index" : {
        "_index" : "bank",
        "_id" : "273",
        "_version" : 1,
        "result" : "created",
        "forced_refresh" : true,
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 254,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "index" : {
        "_index" : "bank",
        "_id" : "278",
        "_version" : 1,
        "result" : "created",
        "forced_refresh" : true,
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 255,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "index" : {
        "_index" : "bank",
        "_id" : "280",
       



        "_version" : 1,
        "result" : "created",
        "forced_refresh" : true,
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 498,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "index" : {
        "_index" : "bank",
        "_id" : "496",
        "_version" : 1,
        "result" : "created",
        "forced_refresh" : true,
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 499,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "index" : {
        "_index" : "bank",
        "_id" : "504",
        "_version" : 1,
        "result" : "created",
        "forced_refresh" : true,
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 500,
        "_primary_term" : 1,
        "status" : 201
      }
 

In [11]:
# Index Status
r = requests.get('http://localhost:9200/_cat/indices?v')
r.text

'health status index uuid                   pri rep docs.count docs.deleted store.size pri.store.size dataset.size\nyellow open   bank  gAT9gVxfQGGxMhGPPAjttw   1   1       1000            0    372.8kb        372.8kb      372.8kb\nyellow open   tap   HBdOyHeDQ8WbB94A6kgdGQ   1   1          1            0      4.7kb          4.7kb        4.7kb\n'
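
The `_cat` output is a whitespace-aligned text table, which is awkward to read as a raw string. A small sketch of turning it into dictionaries follows; `parse_cat_table` is a hypothetical helper, the sample is a trimmed copy of the output above, and the parsing assumes that no cell is empty or contains spaces.

```python
def parse_cat_table(text):
    """Turn a _cat table (requested with ?v) into a list of dicts keyed by column name."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


# Trimmed copy of the response shown above (first seven columns only).
sample = (
    "health status index uuid                   pri rep docs.count\n"
    "yellow open   bank  gAT9gVxfQGGxMhGPPAjttw   1   1       1000\n"
    "yellow open   tap   HBdOyHeDQ8WbB94A6kgdGQ   1   1          1\n"
)
rows = parse_cat_table(sample)
for row in rows:
    print(row["index"], row["health"], row["docs.count"])
```

An alternative that avoids parsing altogether is to add `format=json` to the `_cat` request and call `r.json()`.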

## Search 
Once you have ingested some data into an Elasticsearch index, you can search it by sending requests to the _search endpoint. To access the full suite of search capabilities, you use the Elasticsearch Query DSL to specify the search criteria in the request body. You specify the name of the index you want to search in the request URI.

For example, the following request retrieves all documents in the bank index, sorted by balance in descending order:

In [8]:
data = """{
  "query": { "match_all": {} },
  "sort": [
    { "balance": "desc" }
  ]
}"""

url="http://localhost:9200/bank/_search?pretty"

headers = {'Content-type': 'application/json', 'Accept': 'text/plain'}

r = requests.post(url,data=data, headers=headers)

r.json()

{'took': 138,
 'timed_out': False,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 1000, 'relation': 'eq'},
  'max_score': None,
  'hits': [{'_index': 'bank',
    '_id': '248',
    '_score': None,
    '_source': {'account_number': 248,
     'balance': 49989,
     'firstname': 'West',
     'lastname': 'England',
     'age': 36,
     'gender': 'M',
     'address': '717 Hendrickson Place',
     'employer': 'Obliq',
     'email': 'westengland@obliq.com',
     'city': 'Maury',
     'state': 'WA'},
    'sort': [49989]},
   {'_index': 'bank',
    '_id': '854',
    '_score': None,
    '_source': {'account_number': 854,
     'balance': 49795,
     'firstname': 'Jimenez',
     'lastname': 'Barry',
     'age': 25,
     'gender': 'F',
     'address': '603 Cooper Street',
     'employer': 'Verton',
     'email': 'jimenezbarry@verton.com',
     'city': 'Moscow',
     'state': 'AL'},
    'sort': [49795]},
   {'_index': 'bank',
    '_id': '240',
    '
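
The response nests the documents under `hits.hits`, each with its original fields in `_source`, while `hits.total.value` reports how many documents matched overall. A minimal sketch of pulling out just the interesting fields is shown here; `summarize_hits` is a hypothetical helper and the sample dictionary is a trimmed copy of the response above.

```python
def summarize_hits(response, fields):
    """Return (total, rows), where each row keeps only the requested _source fields."""
    hits = response["hits"]
    total = hits["total"]["value"]
    rows = [{f: hit["_source"].get(f) for f in fields} for hit in hits["hits"]]
    return total, rows


# Trimmed copy of the response shown above (two hits, fewer fields).
response = {
    "hits": {
        "total": {"value": 1000, "relation": "eq"},
        "hits": [
            {"_id": "248", "_source": {"account_number": 248, "balance": 49989}, "sort": [49989]},
            {"_id": "854", "_source": {"account_number": 854, "balance": 49795}, "sort": [49795]},
        ],
    }
}
total, rows = summarize_hits(response, ["account_number", "balance"])
print(total)
for row in rows:
    print(row)
```

Note that the total counts every match, while the `hits` list holds only one page of results (10 documents unless the request sets `size`).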

## Search String

To search for specific terms within a field, you can use a match query. For example, the following request searches the address field to find customers whose addresses contain mill or lane:

In [14]:
data = """{
  "query": { "match": { "address": "Mill Lane" } }
}"""

url="http://localhost:9200/bank/_search?pretty"

headers = {'Content-type': 'application/json', 'Accept': 'text/plain'}

r = requests.post(url,data=data, headers=headers)
r.json()

{'took': 29,
 'timed_out': False,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 19, 'relation': 'eq'},
  'max_score': 9.507477,
  'hits': [{'_index': 'bank',
    '_id': '136',
    '_score': 9.507477,
    '_source': {'account_number': 136,
     'balance': 45801,
     'firstname': 'Winnie',
     'lastname': 'Holland',
     'age': 38,
     'gender': 'M',
     'address': '198 Mill Lane',
     'employer': 'Neteria',
     'email': 'winnieholland@neteria.com',
     'city': 'Urie',
     'state': 'IL'}},
   {'_index': 'bank',
    '_id': '970',
    '_score': 5.4032025,
    '_source': {'account_number': 970,
     'balance': 19648,
     'firstname': 'Forbes',
     'lastname': 'Wallace',
     'age': 28,
     'gender': 'M',
     'address': '990 Mill Road',
     'employer': 'Pheast',
     'email': 'forbeswallace@pheast.com',
     'city': 'Lopezo',
     'state': 'AK'}},
   {'_index': 'bank',
    '_id': '345',
    '_score': 5.4032025,
    '_source': 
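
By default a match query returns documents that contain any of the analyzed terms, which is why "990 Mill Road" appears in the results. The match query also accepts an `operator` option to require all terms instead. The sketch below builds both variants; `match_query` is a hypothetical helper name.

```python
import json


def match_query(field, text, operator="or"):
    """Build a match query; operator="and" requires every term to be present."""
    return {"query": {"match": {field: {"query": text, "operator": operator}}}}


any_term = match_query("address", "Mill Lane")           # mill OR lane
all_terms = match_query("address", "Mill Lane", "and")   # mill AND lane
print(json.dumps(all_terms, indent=2))
```

Either dictionary can be serialized with `json.dumps` and posted to `/bank/_search` in the same way as the request above.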

## Search Exact

To perform a phrase search rather than matching individual terms, you use match_phrase instead of match. For example, the following request only matches addresses that contain the phrase mill lane:

In [8]:
data = """{
  "query": { "match_phrase": { "address": "mill Lane" } }
}"""

url="http://localhost:9200/bank/_search?pretty"

headers = {'Content-type': 'application/json', 'Accept': 'text/plain'}

r = requests.post(url,data=data, headers=headers)
r.json()

{'took': 97,
 'timed_out': False,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 1, 'relation': 'eq'},
  'max_score': 9.507477,
  'hits': [{'_index': 'bank',
    '_id': '136',
    '_score': 9.507477,
    '_source': {'account_number': 136,
     'balance': 45801,
     'firstname': 'Winnie',
     'lastname': 'Holland',
     'age': 38,
     'gender': 'M',
     'address': '198 Mill Lane',
     'employer': 'Neteria',
     'email': 'winnieholland@neteria.com',
     'city': 'Urie',
     'state': 'IL'}}]}}
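
The difference between the two query types can be illustrated locally, without a running cluster. This is only a rough sketch: `tokens` lowercases and splits on whitespace as a stand-in for the real analyzer, and the three addresses are taken from the outputs above. It is not how Lucene evaluates these queries internally.

```python
def tokens(text):
    """Very rough stand-in for the default analyzer: lowercase, split on whitespace."""
    return text.lower().split()


def matches_any_term(field_value, query):
    """Like match with the default OR operator: at least one term in common."""
    return bool(set(tokens(field_value)) & set(tokens(query)))


def matches_phrase(field_value, query):
    """Like match_phrase: the query terms appear consecutively and in order."""
    doc, q = tokens(field_value), tokens(query)
    return any(doc[i:i + len(q)] == q for i in range(len(doc) - len(q) + 1))


for address in ["198 Mill Lane", "990 Mill Road", "880 Holmes Lane"]:
    print(address,
          matches_any_term(address, "mill Lane"),
          matches_phrase(address, "mill Lane"))
```

All three addresses share at least one term with the query, but only "198 Mill Lane" contains the two terms next to each other, which mirrors the single hit returned by `match_phrase`.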

## Combine Multiple Queries

To construct more complex queries, you can use a bool query to combine multiple query criteria. You can designate criteria as required (must match), desirable (should match), or undesirable (must not match).

For example, the following request searches the bank index for accounts that belong to female customers who are 40 years old, but excludes anyone who lives in Idaho (ID):

In [15]:
data = """{

  "query": {
    "bool": {
      "must": [
        { "match": { "age": "40" } },
        { "match": { "gender": "F" } }
      ],
      "must_not": [
        { "match": { "state": "ID" } }
      ]
    }
  }
}"""

url="http://localhost:9200/bank/_search?pretty"

headers = {'Content-type': 'application/json', 'Accept': 'text/plain'}

r = requests.post(url,data=data, headers=headers)
r.json()

{'took': 17,
 'timed_out': False,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 20, 'relation': 'eq'},
  'max_score': 1.707232,
  'hits': [{'_index': 'bank',
    '_id': '474',
    '_score': 1.707232,
    '_source': {'account_number': 474,
     'balance': 35896,
     'firstname': 'Obrien',
     'lastname': 'Walton',
     'age': 40,
     'gender': 'F',
     'address': '192 Ide Court',
     'employer': 'Suremax',
     'email': 'obrienwalton@suremax.com',
     'city': 'Crucible',
     'state': 'UT'}},
   {'_index': 'bank',
    '_id': '878',
    '_score': 1.707232,
    '_source': {'account_number': 878,
     'balance': 49159,
     'firstname': 'Battle',
     'lastname': 'Blackburn',
     'age': 40,
     'gender': 'F',
     'address': '234 Hendrix Street',
     'employer': 'Zilphur',
     'email': 'battleblackburn@zilphur.com',
     'city': 'Wanamie',
     'state': 'PA'}},
   {'_index': 'bank',
    '_id': '885',
    '_score': 1.707232,
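
Writing nested bool queries by hand as JSON strings gets error-prone as the number of clauses grows. One possible approach, sketched here with the hypothetical helpers `match` and `bool_query`, is to assemble the body from Python dictionaries and serialize it at the end.

```python
import json


def match(field, value):
    """Leaf clause: a match query on a single field."""
    return {"match": {field: value}}


def bool_query(must=(), should=(), must_not=(), filters=()):
    """Assemble a bool query, omitting clause types that were not supplied."""
    clauses = {"must": must, "should": should, "must_not": must_not, "filter": filters}
    return {"query": {"bool": {name: list(items) for name, items in clauses.items() if items}}}


body = bool_query(
    must=[match("age", "40"), match("gender", "F")],
    must_not=[match("state", "ID")],
)
print(json.dumps(body, indent=2))
```

The printed JSON has the same shape as the request above, so it can be posted to `/bank/_search` unchanged.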
   

## Filters

Each must, should, and must_not element in a Boolean query is referred to as a query clause. 

How well a document meets the criteria in each must or should clause contributes to the document’s relevance score. The higher the score, the better the document matches your search criteria. 

By default, Elasticsearch returns documents ranked by these relevance scores.

The criteria in a must_not clause are treated as filters: they affect whether or not a document is included in the results, but do not contribute to how documents are scored. You can also explicitly specify arbitrary filters to include or exclude documents based on structured data.

For example, the following request uses a range filter to limit the results to accounts with a balance between 20,000 and 30,000 (inclusive).

In [16]:
data = """{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": {
        "range": {
          "balance": {
            "gte": 20000,
            "lte": 30000
          }
        }
      }
    }
  }
}"""

url="http://localhost:9200/bank/_search?pretty"

headers = {'Content-type': 'application/json', 'Accept': 'text/plain'}

r = requests.post(url,data=data, headers=headers)
r.json()

{'took': 16,
 'timed_out': False,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 217, 'relation': 'eq'},
  'max_score': 1.0,
  'hits': [{'_index': 'bank',
    '_id': '49',
    '_score': 1.0,
    '_source': {'account_number': 49,
     'balance': 29104,
     'firstname': 'Fulton',
     'lastname': 'Holt',
     'age': 23,
     'gender': 'F',
     'address': '451 Humboldt Street',
     'employer': 'Anocha',
     'email': 'fultonholt@anocha.com',
     'city': 'Sunriver',
     'state': 'RI'}},
   {'_index': 'bank',
    '_id': '102',
    '_score': 1.0,
    '_source': {'account_number': 102,
     'balance': 29712,
     'firstname': 'Dena',
     'lastname': 'Olson',
     'age': 27,
     'gender': 'F',
     'address': '759 Newkirk Avenue',
     'employer': 'Hinway',
     'email': 'denaolson@hinway.com',
     'city': 'Choctaw',
     'state': 'NJ'}},
   {'_index': 'bank',
    '_id': '133',
    '_score': 1.0,
    '_source': {'account_number': 133,
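
A range clause accepts any combination of the bounds `gte`, `lte`, `gt`, and `lt`. The sketch below builds one and places it in filter context, where it selects documents without influencing their scores; `range_clause` and `filter_only` are hypothetical helper names.

```python
import json


def range_clause(field, gte=None, lte=None, gt=None, lt=None):
    """Build a range clause, keeping only the bounds that were given."""
    bounds = {"gte": gte, "lte": lte, "gt": gt, "lt": lt}
    return {"range": {field: {k: v for k, v in bounds.items() if v is not None}}}


def filter_only(*clauses):
    """Wrap clauses in filter context: they select documents but do not affect scores."""
    return {"query": {"bool": {"filter": list(clauses)}}}


body = filter_only(range_clause("balance", gte=20000, lte=30000))
print(json.dumps(body, indent=2))
```

Unlike the request above, this variant has no `must` clause, so the scores of the returned documents carry no ranking information.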

## Analyze results with aggregations
Elasticsearch aggregations enable you to get meta-information about your search results and answer questions like, "How many account holders are in Texas?" or "What’s the average balance of accounts in Tennessee?" You can search documents, filter hits, and use aggregations to analyze the results all in one request.

For example, the following request uses a terms aggregation to group all of the accounts in the bank index by state, and returns the ten states with the most accounts in descending order:

In [9]:
data = """{
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      }
    }
  }
}"""

url="http://localhost:9200/bank/_search?pretty"

headers = {'Content-type': 'application/json', 'Accept': 'text/plain'}

r = requests.post(url,data=data, headers=headers)
r.json()

{'took': 68,
 'timed_out': False,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 1000, 'relation': 'eq'},
  'max_score': 1.0,
  'hits': [{'_index': 'bank',
    '_id': '1',
    '_score': 1.0,
    '_source': {'account_number': 1,
     'balance': 39225,
     'firstname': 'Amber',
     'lastname': 'Duke',
     'age': 32,
     'gender': 'M',
     'address': '880 Holmes Lane',
     'employer': 'Pyrami',
     'email': 'amberduke@pyrami.com',
     'city': 'Brogan',
     'state': 'IL'}},
   {'_index': 'bank',
    '_id': '6',
    '_score': 1.0,
    '_source': {'account_number': 6,
     'balance': 5686,
     'firstname': 'Hattie',
     'lastname': 'Bond',
     'age': 36,
     'gender': 'M',
     'address': '671 Bristol Street',
     'employer': 'Netagy',
     'email': 'hattiebond@netagy.com',
     'city': 'Dante',
     'state': 'TN'}},
   {'_index': 'bank',
    '_id': '13',
    '_score': 1.0,
    '_source': {'account_number': 13,
     'balance':

You can combine aggregations to build more complex summaries of your data. For example, the following request nests an avg aggregation within the previous group_by_state aggregation to calculate the average account balances for each state.

In [18]:
data = """{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}"""

url="http://localhost:9200/bank/_search?pretty"

headers = {'Content-type': 'application/json', 'Accept': 'text/plain'}

r = requests.post(url,data=data, headers=headers)
r.json()

{'took': 38,
 'timed_out': False,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 1000, 'relation': 'eq'},
  'max_score': None,
  'hits': []},
 'aggregations': {'group_by_state': {'doc_count_error_upper_bound': 0,
   'sum_other_doc_count': 743,
   'buckets': [{'key': 'TX',
     'doc_count': 30,
     'average_balance': {'value': 26073.3}},
    {'key': 'MD',
     'doc_count': 28,
     'average_balance': {'value': 26161.535714285714}},
    {'key': 'ID',
     'doc_count': 27,
     'average_balance': {'value': 24368.777777777777}},
    {'key': 'AL', 'doc_count': 25, 'average_balance': {'value': 25739.56}},
    {'key': 'ME', 'doc_count': 25, 'average_balance': {'value': 21663.0}},
    {'key': 'TN', 'doc_count': 25, 'average_balance': {'value': 28365.4}},
    {'key': 'WY', 'doc_count': 25, 'average_balance': {'value': 21731.52}},
    {'key': 'DC',
     'doc_count': 24,
     'average_balance': {'value': 23180.583333333332}},
    {'key': 'MA',
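
Aggregation results come back as a list of buckets, each holding the group key, its document count, and any nested metrics. A minimal sketch of flattening them into rows is given below; `bucket_rows` is a hypothetical helper and the sample contains the first three buckets from the output above.

```python
def bucket_rows(response, agg_name, metric_name):
    """Flatten terms buckets into (key, doc_count, metric value) tuples."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [(b["key"], b["doc_count"], b[metric_name]["value"]) for b in buckets]


# First three buckets copied from the output above.
response = {
    "aggregations": {
        "group_by_state": {
            "buckets": [
                {"key": "TX", "doc_count": 30, "average_balance": {"value": 26073.3}},
                {"key": "MD", "doc_count": 28, "average_balance": {"value": 26161.535714285714}},
                {"key": "ID", "doc_count": 27, "average_balance": {"value": 24368.777777777777}},
            ]
        }
    }
}
rows = bucket_rows(response, "group_by_state", "average_balance")
for state, count, avg in rows:
    print(f"{state}: {count} accounts, average balance {avg:.2f}")
```

Rows in this shape are easy to hand to a table or plotting library.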


Instead of sorting the results by count, you could sort using the result of the nested aggregation by specifying the order within the terms aggregation:

In [19]:
data = """{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword",
        "order": {
          "average_balance": "desc"
        }
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}"""

url="http://localhost:9200/bank/_search?pretty"

headers = {'Content-type': 'application/json', 'Accept': 'text/plain'}

r = requests.post(url,data=data, headers=headers)
r.json()

{'took': 17,
 'timed_out': False,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 1000, 'relation': 'eq'},
  'max_score': None,
  'hits': []},
 'aggregations': {'group_by_state': {'doc_count_error_upper_bound': -1,
   'sum_other_doc_count': 827,
   'buckets': [{'key': 'CO',
     'doc_count': 14,
     'average_balance': {'value': 32460.35714285714}},
    {'key': 'NE', 'doc_count': 16, 'average_balance': {'value': 32041.5625}},
    {'key': 'AZ',
     'doc_count': 14,
     'average_balance': {'value': 31634.785714285714}},
    {'key': 'MT',
     'doc_count': 17,
     'average_balance': {'value': 31147.41176470588}},
    {'key': 'VA', 'doc_count': 16, 'average_balance': {'value': 30600.0625}},
    {'key': 'GA', 'doc_count': 19, 'average_balance': {'value': 30089.0}},
    {'key': 'MA',
     'doc_count': 24,
     'average_balance': {'value': 29600.333333333332}},
    {'key': 'IL',
     'doc_count': 22,
     'average_balance': {'value': 29489
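
The ordering and the number of buckets are both options of the terms aggregation itself. The following sketch parameterizes the request above; `terms_with_avg` and its argument names are hypothetical, and `size` inside `terms` limits how many buckets are returned (10 when omitted).

```python
import json


def terms_with_avg(group_field, value_field, name="group_by_state",
                   metric="average_balance", order="desc", size=10):
    """Terms aggregation with a nested avg, with buckets ordered by that average."""
    return {
        "size": 0,  # return no hits, only the aggregation results
        "aggs": {
            name: {
                "terms": {"field": group_field, "size": size, "order": {metric: order}},
                "aggs": {metric: {"avg": {"field": value_field}}},
            }
        },
    }


body = terms_with_avg("state.keyword", "balance", size=5)
print(json.dumps(body, indent=2))
```

The key used in `order` must match the name of the nested aggregation, which is why both are derived from the same `metric` argument.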

## Client Library
<https://elasticsearch-py.readthedocs.io/en/v8.2.0/>

### Install

In [20]:
!pip install elasticsearch

Defaulting to user installation because normal site-packages is not writeable
Collecting elasticsearch
  Downloading elasticsearch-8.13.2-py3-none-any.whl (478 kB)
Collecting elastic-transport<9,>=8.13
  Downloading elastic_transport-8.13.0-py3-none-any.whl (64 kB)
Installing collected packages: elastic-transport, elasticsearch
Successfully installed elastic-transport-8.13.0 elasticsearch-8.13.2


### Import

In [20]:
from elasticsearch import Elasticsearch

In [21]:
# Connect to the local node, passing its URL explicitly
es = Elasticsearch("http://localhost:9200")
es

<Elasticsearch(['http://localhost:9200'])>

In [22]:
es.get(index="bank", id=1)

ObjectApiResponse({'_index': 'bank', '_id': '1', '_version': 1, '_seq_no': 0, '_primary_term': 1, 'found': True, '_source': {'account_number': 1, 'balance': 39225, 'firstname': 'Amber', 'lastname': 'Duke', 'age': 32, 'gender': 'M', 'address': '880 Holmes Lane', 'employer': 'Pyrami', 'email': 'amberduke@pyrami.com', 'city': 'Brogan', 'state': 'IL'}})

In [23]:
# The 8.x client lets you pass the query as a keyword argument instead of a request body
es.search(index="bank", query={"match": {'firstname':'Aurelia'}})

ObjectApiResponse({'took': 6, 'timed_out': False, '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 1, 'relation': 'eq'}, 'max_score': 6.5032897, 'hits': [{'_index': 'bank', '_id': '44', '_score': 6.5032897, '_source': {'account_number': 44, 'balance': 34487, 'firstname': 'Aurelia', 'lastname': 'Harding', 'age': 37, 'gender': 'M', 'address': '502 Baycliff Terrace', 'employer': 'Orbalix', 'email': 'aureliaharding@orbalix.com', 'city': 'Yardville', 'state': 'DE'}}]}})

## Some more data
<https://ikeptwalking.com/elasticsearch-sample-data/>


In [None]:
%%bash

curl -H "Content-Type: application/json" -XPOST "localhost:9200/companydatabase/_bulk?pretty&refresh" --data-binary "@/Users/nics/Downloads/Employees100K.json"

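Do a little trick before loading: mapping types were removed in Elasticsearch 8, so the `_type` entry in the bulk action lines has to go.
Replace the string `,"_type":"employees"` with an empty string throughout the file.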
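
The replacement can be done in any editor, or with a few lines of Python. This is a sketch under assumptions: `strip_type` and `clean_file` are hypothetical helpers, and the sample action line is illustrative rather than copied from the file.

```python
import json

TYPE_FRAGMENT = ',"_type":"employees"'


def strip_type(line):
    """Remove the legacy _type entry from a bulk action line."""
    return line.replace(TYPE_FRAGMENT, "")


def clean_file(src, dst):
    """Copy src to dst line by line, dropping the _type entry wherever it occurs."""
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(strip_type(line))


# Hypothetical action line in the shape the replacement targets.
old = '{"index":{"_index":"companydatabase","_type":"employees"}}'
new = strip_type(old)
print(new)
```

Calling `clean_file` with the path of the downloaded file and a new output path produces a copy that can then be sent to `_bulk` with the curl command above.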

In [24]:
# http://localhost:9200/companydatabase/_count?pretty=true
#Interests':'Writing','writing.Songs
es.search(index="companydatabase", query={"match": {'Interests':'writing'}})

ObjectApiResponse({'took': 14, 'timed_out': False, '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 3835, 'relation': 'eq'}, 'max_score': 5.056841, 'hits': [{'_index': 'companydatabase', '_id': 'TWeev48BizKIjV8lRjxa', '_score': 5.056841, '_source': {'FirstName': 'WYATT', 'LastName': 'FETHEROLF', 'Designation': 'Senior Software Engineer', 'Salary': '69000', 'DateOfJoining': '2016-01-07', 'Address': '437 Edgewood Lane Garden City, NY 11530', 'Gender': 'Male', 'Age': 34, 'MaritalStatus': 'Unmarried', 'Interests': 'Writing Music,Writing Songs'}}, {'_index': 'companydatabase', '_id': '5Giev48BizKIjV8lRjTt', '_score': 5.056841, '_source': {'FirstName': 'MECHELLE', 'LastName': 'FRAUTSCHI', 'Designation': 'Trainee', 'Salary': '35000', 'DateOfJoining': '2016-10-24', 'Address': '814 Beaver Ridge St. Central Islip, NY 11722', 'Gender': 'Female', 'Age': 24, 'MaritalStatus': 'Unmarried', 'Interests': 'Taxidermy,Writing,Writing Songs'}}, {'_index': 'co