## Documentation

To read more about the search API, see the Elasticsearch guide on [searching your data](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-your-data.html) and the [search API reference](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-search.html).

![search_api_docs](../images/search_api_docs.png)

## Connect to Elasticsearch

In [1]:
from pprint import pprint
from elasticsearch import Elasticsearch

es = Elasticsearch('http://localhost:9200')
client_info = es.info()
print('Connected to Elasticsearch!')
pprint(client_info.body)

Connected to Elasticsearch!
{'cluster_name': 'docker-cluster',
 'cluster_uuid': 'iNEgsrfzSs-A5IWMvnKk8w',
 'name': '5af1aab6c380',
 'tagline': 'You Know, for Search',
 'version': {'build_date': '2024-08-05T10:05:34.233336849Z',
             'build_flavor': 'default',
             'build_hash': '1a77947f34deddb41af25e6f0ddb8e830159c179',
             'build_snapshot': False,
             'build_type': 'docker',
             'lucene_version': '9.11.1',
             'minimum_index_compatibility_version': '7.0.0',
             'minimum_wire_compatibility_version': '7.17.0',
             'number': '8.15.0'}}


## Inserting documents

In [2]:
es.indices.delete(index='index_1', ignore_unavailable=True)
es.indices.create(index='index_1')

es.indices.delete(index='index_2', ignore_unavailable=True)
es.indices.create(index='index_2')

ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'index_2'})

Let's index the documents sequentially in both indices.

In [3]:
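import json
from tqdm import tqdm # type: ignore


# Load the sample documents; the context manager closes the file afterwards.
with open("../data/dummy_data.json") as f:
    dummy_data = json.load(f)

# Index each document with its own request, first into index_1, then index_2.
# `document=` is the 8.x client's named parameter for the document payload.
for document in tqdm(dummy_data, total=len(dummy_data)):
    response = es.index(index='index_1', document=document)

for document in tqdm(dummy_data, total=len(dummy_data)):
    response = es.index(index='index_2', document=document)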

100%|██████████| 3/3 [00:00<00:00, 93.06it/s]
100%|██████████| 3/3 [00:00<00:00, 139.96it/s]
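
Indexing one document per request works for three documents, but every call is a separate round trip to the cluster. For larger datasets, a minimal sketch using the client's bulk helper might look like the following. The function names `build_actions` and `bulk_index` are hypothetical, and the usage line assumes the `es` client and `dummy_data` list from the cells above.

```python
def build_actions(index_name, documents):
    """Yield one bulk action per document, targeting `index_name`."""
    for document in documents:
        # With no "_op_type" key, the bulk helper defaults to an "index" operation.
        yield {"_index": index_name, "_source": document}


def bulk_index(es, index_name, documents):
    """Send all documents in batched bulk requests instead of one call each."""
    # Imported here so build_actions can be used without the client installed.
    from elasticsearch import helpers

    # helpers.bulk returns a (success_count, errors) tuple.
    return helpers.bulk(es, build_actions(index_name, documents))


# Hypothetical usage, assuming `es` and `dummy_data` from the cells above:
# n_indexed, errors = bulk_index(es, "index_1", dummy_data)
```

Keeping the action generator separate from the network call makes it easy to inspect the actions before anything is sent.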


## Searching

We can search a single index by passing its name as the `index` argument.

In [4]:
response = es.search(
    index='index_1',
    body={
        "query": {"match_all": {}}
    }
)

n_hits = response['hits']['total']['value']
print(f"Found {n_hits} documents in index_1")

Found 3 documents in index_1


In [5]:
response['hits']

{'total': {'value': 3, 'relation': 'eq'},
 'max_score': 1.0,
 'hits': [{'_index': 'index_1',
   '_id': 'qEbGL5UB6odEtf1M_EnP',
   '_score': 1.0,
   '_source': {'title': 'Sample Title 1',
    'text': 'This is the first sample document text.',
    'created_on': '2024-09-22'}},
  {'_index': 'index_1',
   '_id': 'qUbGL5UB6odEtf1M_Enn',
   '_score': 1.0,
   '_source': {'title': 'Sample Title 2',
    'text': 'Here is another example of a document.',
    'created_on': '2024-09-24'}},
  {'_index': 'index_1',
   '_id': 'qkbGL5UB6odEtf1M_Enq',
   '_score': 1.0,
   '_source': {'title': 'Sample Title 3',
    'text': 'The content of the third document goes here.',
    'created_on': '2024-09-24'}}]}

In [6]:
response = es.search(
    index='index_2',
    body={
        "query": {"match_all": {}}
    }
)

n_hits = response['hits']['total']['value']
print(f"Found {n_hits} documents in index_2")

Found 3 documents in index_2


Or we can search multiple indices at once by passing a comma-separated list of names as the `index` argument.

In [7]:
response = es.search(
    index='index_1,index_2',
    body={
        "query": {"match_all": {}}
    }
)

n_hits = response['hits']['total']['value']
print(f"Found {n_hits} documents in index_1 and index_2")

Found 6 documents in index_1 and index_2
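
When a search spans several indices, each hit carries an `_index` field naming the index it came from. As a rough sketch, the returned hits can be grouped by that field. The helper name `hits_per_index` and the `sample_response` dictionary are hypothetical; the dictionary only mimics the shape of the responses shown above.

```python
from collections import Counter


def hits_per_index(response):
    """Count the returned hits by the index each one came from."""
    return Counter(hit["_index"] for hit in response["hits"]["hits"])


# Hypothetical response fragment shaped like the search output shown earlier.
sample_response = {
    "hits": {
        "total": {"value": 3, "relation": "eq"},
        "hits": [
            {"_index": "index_1", "_id": "a", "_source": {}},
            {"_index": "index_2", "_id": "b", "_source": {}},
            {"_index": "index_2", "_id": "c", "_source": {}},
        ],
    }
}

print(hits_per_index(sample_response))
```

This counts only the hits actually returned in the response (10 by default), so it can be lower than `response['hits']['total']['value']` for larger result sets. The Python client also accepts a list of names for `index`, such as `['index_1', 'index_2']`, in place of the comma-separated string.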


We can also use the `*` wildcard to match multiple indices without listing them individually, such as `"index*"`.

In [8]:
response = es.search(
    index='index*',
    body={
        "query": {"match_all": {}}
    }
)

n_hits = response['hits']['total']['value']
print(f"Found {n_hits} documents in all indexes with name starting with 'index'")

Found 6 documents in all indexes with name starting with 'index'


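Or, to search every index in the cluster, we use `_all`. The total can exceed the 6 documents in `index_1` and `index_2` when the cluster also holds other indices, as it does here.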

In [9]:
response = es.search(
    index='_all',
    body={
        "query": {"match_all": {}}
    }
)

n_hits = response['hits']['total']['value']
print(f"Found {n_hits} documents in all indexes")

Found 20 documents in all indexes
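
The `_all` total is larger than the 6 documents stored in `index_1` and `index_2`. One way to see where the difference comes from is to compare document counts per index expression with the count API. This is a minimal sketch: the helper name `count_documents` is hypothetical, and the usage line assumes the `es` client from the cells above.

```python
def count_documents(es, index_names):
    """Return {index_expression: document_count} using the count API."""
    # es.count responds with a body containing a "count" field.
    return {name: es.count(index=name)["count"] for name in index_names}


# Hypothetical usage, assuming the `es` client from the cells above:
# count_documents(es, ["index_1", "index_2", "index*", "_all"])
```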
