# Introduction to Elasticsearch

First, let's initialize a connection to Elasticsearch. We'll call it `es`. 

In [4]:
from elasticsearch import Elasticsearch

# by default we connect to localhost:9200
es = Elasticsearch()

Next, let's create an index called `test-index`. We can use this for storing some sample documents, which we will later visualize in Kibana.

In [5]:
es.indices.create(index='test-index')

{'acknowledged': True, 'shards_acknowledged': True, 'index': 'test-index'}

Now that we have an index, we can put some documents into it:

In [6]:
from datetime import datetime, timezone
import random
for i in range(0,100):
    doc = {
        "description": "random price data", 
        "timestamp": datetime.now(tz=timezone.utc), 
        'price': random.randint(1,100)
    }
    es.index(index="test-index", doc_type="test-type", id=i, body=doc)
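The loop above sends one request per document. As a rough sketch of an alternative, the same documents could be described as a list of bulk actions and sent in a single call. The helper name `price_actions` is made up for this example, and the `_type` key assumes a cluster that still uses mapping types, as the rest of this notebook does. Nothing here talks to a cluster; the final comment only indicates where `elasticsearch.helpers.bulk` would come in.

```python
import random
from datetime import datetime, timezone


def price_actions(index, start, stop, doc_type="test-type"):
    """Yield one bulk action per document id in range(start, stop).

    Hypothetical helper for illustration; it only builds dictionaries.
    """
    for i in range(start, stop):
        yield {
            "_index": index,
            "_type": doc_type,  # assumption: the cluster still uses mapping types
            "_id": i,
            "_source": {
                "description": "random price data",
                "timestamp": datetime.now(tz=timezone.utc),
                "price": random.randint(1, 100),
            },
        }


actions = list(price_actions("test-index", 0, 100))
print(len(actions), actions[0]["_id"], actions[-1]["_id"])

# With a live connection `es`, the actions could be sent in one call (not run here):
#   from elasticsearch.helpers import bulk
#   bulk(es, actions)
```

For 100 documents the difference is small, but one bulk request avoids 100 separate round trips.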

We just generated 100 random price objects. We stored each with a randomized `price` between 1 and 100 and captured the time at which we indexed the object. Note that we also defined the `doc_type` on the fly as we indexed the objects; we called the type `test-type`. One of the helpful things about ES is that it can infer a schema from the objects you provide. In ES terminology this is called a _mapping_. If the schema of your objects changes down the road, ES can accommodate that as well. Let’s try it to prove the point:

In [7]:
from datetime import datetime, timezone
import random
coin_types = ['dogecoin','bitcoin','ethereum','litecoin','dash']
for i in range(100,200):
    doc = {
        "description": "random price data", 
        "timestamp": datetime.now(tz=timezone.utc), 
        'price': random.randint(1,100), 
        'coin_type': random.choice(coin_types)}
    es.index(index="test-index", doc_type="test-type", id=i, body=doc)
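To get a feel for what an inferred mapping looks like as new fields show up, here is a toy sketch in plain Python. It is not how Elasticsearch actually infers types; `guess_field_type` and `merge_mapping` are made-up helpers, and the rules are deliberately simplified. It only shows the general idea: fields seen before keep their type, and unseen fields are added.

```python
from datetime import datetime, timezone


def guess_field_type(value):
    """Very rough stand-in for dynamic mapping; not Elasticsearch's real rules."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, datetime):
        return "date"
    return "text"


def merge_mapping(mapping, doc):
    """Add any field not seen before; existing fields keep their first type."""
    for field, value in doc.items():
        mapping.setdefault(field, guess_field_type(value))
    return mapping


now = datetime.now(tz=timezone.utc)
mapping = {}

# First document shape: no coin_type field yet.
merge_mapping(mapping, {"description": "random price data", "timestamp": now, "price": 56})
print(mapping)

# Second document shape: coin_type is new, so the mapping grows by one field.
merge_mapping(mapping, {"description": "random price data", "timestamp": now,
                        "price": 6, "coin_type": "dogecoin"})
print(mapping)
```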

We just created objects that have a new field (`coin_type`) and indexed them to the same type. Be careful! ES handles this gracefully, but it can cause problems down the road if you expect every object to have certain fields and some do not.

The type itself is less flexible. Indexing to a second type in the same index (here `test-type3`) is rejected:

In [13]:
from datetime import datetime, timezone
import random
coin_types = ['dogecoin','bitcoin','ethereum','litecoin','dash']
for i in range(200,300):
    doc = {
        "description": "random price data", 
        "timestamp": datetime.now(tz=timezone.utc), 
        'price': random.randint(1,100), 
        'coin_type': random.choice(coin_types)}
    # A second doc_type in the same index triggers the RequestError below
    es.index(index="test-index", doc_type="test-type3", id=i, body=doc)

RequestError: RequestError(400, 'illegal_argument_exception', 'Rejecting mapping update to [test-index] as the final mapping would have more than 1 type: [test-type3, test-type]')

We can compare two of the objects we indexed to see how they look. Let's grab `id` 42 and 142.

In [8]:
es.get(index="test-index", doc_type="test-type", id=42)['_source']

{'description': 'random price data',
 'timestamp': '2019-05-21T13:25:55.304941+00:00',
 'price': 56}

In [9]:
es.get(index="test-index", doc_type="test-type", id=142)['_source']

{'description': 'random price data',
 'timestamp': '2019-05-21T13:28:26.029781+00:00',
 'price': 6,
 'coin_type': 'dogecoin'}
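Since documents in the same index can have different fields, code that reads them has to cope with missing keys. The sketch below copies the two `_source` dictionaries shown above into plain Python and compares their fields. The `"unknown"` default is an arbitrary choice for this example.

```python
# The two _source dictionaries returned by es.get above, copied as literals.
doc_42 = {
    "description": "random price data",
    "timestamp": "2019-05-21T13:25:55.304941+00:00",
    "price": 56,
}
doc_142 = {
    "description": "random price data",
    "timestamp": "2019-05-21T13:28:26.029781+00:00",
    "price": 6,
    "coin_type": "dogecoin",
}

# Fields present in one document but not the other.
only_in_142 = sorted(set(doc_142) - set(doc_42))
only_in_42 = sorted(set(doc_42) - set(doc_142))
print(only_in_142, only_in_42)

# .get() avoids a KeyError when a field is absent.
for doc in (doc_42, doc_142):
    print(doc["price"], doc.get("coin_type", "unknown"))
```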

As expected, 142 contains `coin_type` and 42 does not. We can run searches against this data using the ES query syntax. Let's try to find prices greater than 90.

In [10]:
query = {
    "query": {
        "range" : {
            "price" : {
                "gt" : 90
            }
        }
    }
}
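The query body is an ordinary nested dictionary, so it can be built by a small function. The sketch below assumes two made-up helpers: `range_query` builds the same structure as the cell above, and `matches_locally` applies the comparison to plain dictionaries to show what `gt` means. The operator names `gt`, `gte`, `lt`, and `lte` are the ones the range query accepts; the local check is only for intuition and is not how ES evaluates the query.

```python
import operator

# Range operators and their Python equivalents.
OPS = {"gt": operator.gt, "gte": operator.ge, "lt": operator.lt, "lte": operator.le}


def range_query(field, **bounds):
    """Build a range query body, e.g. range_query("price", gt=90)."""
    unknown = set(bounds) - set(OPS)
    if unknown:
        raise ValueError(f"unsupported range operators: {sorted(unknown)}")
    return {"query": {"range": {field: dict(bounds)}}}


def matches_locally(doc, field, **bounds):
    """Local stand-in for the comparison; documents lacking the field never match."""
    if field not in doc:
        return False
    return all(OPS[name](doc[field], limit) for name, limit in bounds.items())


print(range_query("price", gt=90))

sample = [{"price": 95}, {"price": 90}, {"price": 12}]
print([d["price"] for d in sample if matches_locally(d, "price", gt=90)])
```

Note that `gt` is strict, so a price of exactly 90 does not match; `gte` would include it.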

To actually run the search, pass the query to `es.search`:

In [12]:
es.search(body=query, index='test-index')

{'took': 95,
 'timed_out': False,
 '_shards': {'total': 5, 'successful': 5, 'skipped': 0, 'failed': 0},
 'hits': {'total': 14,
  'max_score': 1.0,
  'hits': [{'_index': 'test-index',
    '_type': 'test-type',
    '_id': '52',
    '_score': 1.0,
    '_source': {'description': 'random price data',
     'timestamp': '2019-05-21T13:25:55.373825+00:00',
     'price': 95}},
   {'_index': 'test-index',
    '_type': 'test-type',
    '_id': '110',
    '_score': 1.0,
    '_source': {'description': 'random price data',
     'timestamp': '2019-05-21T13:28:25.836151+00:00',
     'price': 99,
     'coin_type': 'dogecoin'}},
   {'_index': 'test-index',
    '_type': 'test-type',
    '_id': '120',
    '_score': 1.0,
    '_source': {'description': 'random price data',
     'timestamp': '2019-05-21T13:28:25.904006+00:00',
     'price': 97,
     'coin_type': 'ethereum'}},
   {'_index': 'test-index',
    '_type': 'test-type',
    '_id': '30',
    '_score': 1.0,
    '_source': {'description': 'random price 