# Let's tinker with Elasticsearch

<img src="https://static-www.elastic.co/assets/bltfdc1abb6ea9e2157/icon-elasticsearch.svg" style="width: 100px; float: right; drop-shadow: black 10px;"/>

I have [Elasticsearch running locally](http://localhost:9200), so let's learn a little and work with it from Python. For Python, the `elasticsearch` module is the low-level client, and [`elasticsearch_dsl`](http://elasticsearch-dsl.readthedocs.io) is a higher-level library built on top of it.

DSL here is the Elasticsearch *domain-specific language*, i.e. the query language. The Elasticsearch API also contains all kinds of things beyond plain search: asking the engine to analyze text (to understand how it interprets input), explaining how queries are interpreted and responded to, mappings, NLP at indexing time, etc.
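
To make those auxiliary endpoints a bit more concrete, here is a minimal sketch that only builds the HTTP requests for `_analyze` and `_mapping` with the standard library, without sending them. The helper name `build_request` and the sample text are made up for the example, and `megacorp` is the index used later in this notebook; the base URL is the local instance linked above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # the local instance mentioned above


def build_request(path, body=None):
    """Build (but do not send) a JSON request against the local node.

    `build_request` is a hypothetical helper for this sketch,
    not part of any Elasticsearch client.
    """
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        BASE_URL + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST" if body is not None else "GET",
    )


# Ask the engine how the standard analyzer would tokenize a piece of text.
analyze_req = build_request(
    "/_analyze",
    {"analyzer": "standard", "text": "I love to go rock climbing"},
)

# Ask for the field mappings of an index.
mapping_req = build_request("/megacorp/_mapping")

print(analyze_req.get_method(), analyze_req.full_url)
print(mapping_req.get_method(), mapping_req.full_url)

# Sending needs a running node, so it is left commented out:
# with urllib.request.urlopen(analyze_req) as r:
#     print(json.load(r))
```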

In [109]:
import elasticsearch
import elasticsearch_dsl
import json
import pandas as pd

In [12]:
client = elasticsearch.Elasticsearch()
client.info()

{'cluster_name': 'elasticsearch',
 'name': 'Malice',
 'tagline': 'You Know, for Search',
 'version': {'build_date': '2016-06-27T16:23:46.861Z',
  'build_hash': '3f5b994',
  'build_snapshot': False,
  'lucene_version': '6.1.0',
  'number': '5.0.0-alpha4'}}

In [142]:
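s = elasticsearch_dsl.Search(using=client)
# Caution: query() returns a modified copy and leaves `s` untouched. The result is
# discarded here, so the search below runs unfiltered (note "Fir" in the output).
# To actually filter, reassign: s = s.query("query_string", query="last_name:Smith")
s.query(query="last_name:Smith")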
resp = s.execute()
for h in resp.hits.hits:
    print(h['_source']['last_name'])

Smith
Smith
Fir


An early [sketch for loading an Elasticsearch response into a pandas DataFrame](http://stackoverflow.com/questions/25186148/creating-dataframe-from-elasticsearch-results):

In [138]:
#pd.DataFrame.from_dict(resp.hits.hits)
pd.concat(map(pd.DataFrame.from_dict, resp.hits.hits), axis=1)['_source'].T.reset_index(drop=True)

| | about | age | first_name | interests | last_name |
|---|---|---|---|---|---|
| 0 | I like to collect rock albums | 32 | Jane | [music] | Smith |
| 1 | I love to go rock climbing | 25 | John | [sports, music] | Smith |
| 2 | I like to build cabinets | 35 | Douglas | [forestry] | Fir |
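
The `concat`/transpose trick works, but since each raw hit keeps its document under `_source`, a list of those dicts is usually enough to build a frame from. A minimal sketch, assuming hits shaped like plain dicts; `hits_to_records` is a name made up here, and the sample hits are copied from the table above rather than fetched from the server:

```python
def hits_to_records(hits):
    """Return one plain dict per hit, taken from the hit's `_source`."""
    return [dict(hit["_source"]) for hit in hits]


# Sample data copied from the output above (only `_source` is filled in).
sample_hits = [
    {"_source": {"about": "I like to collect rock albums", "age": 32,
                 "first_name": "Jane", "interests": ["music"],
                 "last_name": "Smith"}},
    {"_source": {"about": "I love to go rock climbing", "age": 25,
                 "first_name": "John", "interests": ["sports", "music"],
                 "last_name": "Smith"}},
    {"_source": {"about": "I like to build cabinets", "age": 35,
                 "first_name": "Douglas", "interests": ["forestry"],
                 "last_name": "Fir"}},
]

records = hits_to_records(sample_hits)
for record in records:
    print(record["first_name"], record["last_name"], record["age"])
```

With the low-level client, `client.search(...)["hits"]["hits"]` is already a list of such dicts, so `pd.DataFrame(hits_to_records(...))` should give an equivalent frame. The DSL response wraps hits in its own attribute-access objects, which may need converting with `to_dict()` first.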


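Next, a `match` query against the `megacorp` index, with an average-age aggregation. Note that we ask for explanations here too (`explain=True`).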

In [217]:
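s = elasticsearch_dsl.Search(using=client, index="megacorp") \
    .query("match", last_name='Smith')
# Aggregations are added in place on s.aggs, so this is kept as its own statement
# rather than assigning the return value of metric() back to s.
s.aggs.metric('avg_age', 'avg', field='age')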

resp = s.extra(explain=True).execute()

for hit in resp:
    print(hit.first_name, hit.age)

print(resp.aggregations.avg_age.value)

Jane 32
John 25
28.5


Let's look at the raw response, which should now include the explanation too.

In [224]:
resp.to_dict()

{'_shards': {'failed': 0, 'successful': 5, 'total': 5},
 'aggregations': {'avg_age': {'value': 28.5}},
 'hits': {'hits': [{'_explanation': {'description': 'weight(last_name:smith in 0) [PerFieldSimilarity], result of:',
     'details': [{'description': 'score(doc=0,freq=1.0 = termFreq=1.0\n), product of:',
       'details': [{'description': 'idf(docFreq=1, docCount=1)',
         'details': [],
         'value': 0.2876821},
        {'description': 'tfNorm, computed from:',
         'details': [{'description': 'termFreq=1.0',
           'details': [],
           'value': 1.0},
          {'description': 'parameter k1', 'details': [], 'value': 1.2},
          {'description': 'parameter b', 'details': [], 'value': 0.75},
          {'description': 'avgFieldLength', 'details': [], 'value': 1.0},
          {'description': 'fieldLength', 'details': [], 'value': 1.0}],
         'value': 1.0}],
       'value': 0.2876821}],
     'value': 0.2876821},
    '_id': '2',
    '_index': 'megacorp',
    '_
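
The numbers in the explanation can be reproduced by hand. The sketch below recomputes the `idf` and `tfNorm` factors from the values shown in the response, assuming the BM25 formulas as I understand Lucene to implement them in this version (the formulas are not part of the response, so treat them as an assumption):

```python
import math


def bm25_idf(doc_freq, doc_count):
    """Inverse document frequency, assumed BM25 form."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_tf_norm(freq, k1, b, field_length, avg_field_length):
    """Term-frequency normalization, assumed BM25 form."""
    length_ratio = field_length / avg_field_length
    return (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * length_ratio))


# Inputs copied from the explanation tree above.
idf = bm25_idf(doc_freq=1, doc_count=1)
tf_norm = bm25_tf_norm(freq=1.0, k1=1.2, b=0.75,
                       field_length=1.0, avg_field_length=1.0)

# The reported score is the product of the two factors.
score = idf * tf_norm
print(idf, tf_norm, score)
```

Both factors agree with the `0.2876821` and `1.0` in the response. Note that `docCount=1` here, presumably because term statistics are computed per shard and the response reports 5 shards.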

The `Search` object is also iterable: iterating over it runs the query and yields the hits. Weird.

In [226]:
for t in s:
    print(t.first_name)

Jane
John


The `Search` object can be serialized to the request body like so:

In [227]:
s.to_dict()

{'aggs': {'avg_age': {'avg': {'field': 'age'}}},
 'query': {'match': {'last_name': 'Smith'}}}
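
That dict is plain JSON-compatible data, so it can be dumped and used as the body of a `_search` request. A small sketch using only the standard library; the dict is copied from the output above, and placing `explain` at the top level of the body is my assumption of what `.extra(explain=True)` contributes:

```python
import json

# Copied from the s.to_dict() output above.
body = {
    "aggs": {"avg_age": {"avg": {"field": "age"}}},
    "query": {"match": {"last_name": "Smith"}},
}

# Assumed effect of .extra(explain=True): one extra top-level key.
# dict(...) builds a new dict, so `body` itself is left unchanged.
body_with_explain = dict(body, explain=True)

payload = json.dumps(body_with_explain, sort_keys=True)
print(payload)
```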