# Elasticsearch

* Literature: <https://www.elastic.co/guide/en/elasticsearch/guide/current/index.html>
* This notebook follows the steps in that guide.
* Java 8 is strongly recommended, so you may need to upgrade your Java installation
    (on my Mac, Elasticsearch did not work with Java 1.6).

In [8]:
! java -version

java version "1.8.0_60"
Java(TM) SE Runtime Environment (build 1.8.0_60-b27)
Java HotSpot(TM) 64-Bit Server VM (build 25.60-b23, mixed mode)


In [None]:
# Download Elasticsearch
# https://www.elastic.co/guide/en/elasticsearch/guide/current/_installing_elasticsearch.html

% cd /Applications/
! curl -L -O https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-1.7.1.zip
! unzip    elasticsearch-1.7.1.zip


In [2]:
! rm elasticsearch-1.7.1.zip
% cd elasticsearch-1.7.1
!ls -l

/Applications/elasticsearch-1.7.1
total 56
-rw-rw-r--   1 admin  admin  11358 Mar 23 14:00 LICENSE.txt
-rw-rw-r--   1 admin  admin    150 Jun  9 12:31 NOTICE.txt
-rw-rw-r--   1 admin  admin   8700 Jun  9 12:31 README.textile
drwxr-xr-x  12 admin  admin    408 Jul 29 09:56 bin
drwxr-xr-x   4 admin  admin    136 Jul 29 09:56 config
drwxr-xr-x  26 admin  admin    884 Jul 29 09:56 lib


In [9]:
# Install the Marvel plugin, then disable its agent in the config file
! ./bin/plugin -i elasticsearch/marvel/latest
! echo 'marvel.agent.enabled: false' >> ./config/elasticsearch.yml

-> Installing elasticsearch/marvel/latest...
Trying http://download.elasticsearch.org/elasticsearch/marvel/marvel-latest.zip...
Downloading .....................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................DONE
Installed elasticsearch/marvel/latest into /Applications/elasticsearch-1.7.1/plugins/marvel


In [12]:
# Running Elasticsearch
# https://www.elastic.co/guide/en/elasticsearch/guide/current/running-elasticsearch.html
# In a notebook you should add the -d option (run as a daemon); otherwise the cell blocks and you cannot run other cells

!./bin/elasticsearch -d

In [43]:
!curl 'http://localhost:9200/?pretty'

{
  "status" : 200,
  "name" : "Winky Man",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "1.7.1",
    "build_hash" : "b88f43fc40b0bcd7f173a1f9ee2e97816de80b19",
    "build_timestamp" : "2015-07-29T09:54:16Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.4"
  },
  "tagline" : "You Know, for Search"
}
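Because `-d` returns immediately, the node may not be listening yet when the next cell runs. The following is a minimal sketch of a wait loop, assuming the default host and port shown above (`localhost:9200`). The helper names `port_is_open` and `wait_for_port` are hypothetical and not part of Elasticsearch or its Python client. An open port only shows that something is listening; the `curl` call above remains the real check.

```python
import socket
import time


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except socket.error:
        # Connection refused, timed out, or host not reachable.
        return False
    sock.close()
    return True


def wait_for_port(host="localhost", port=9200, attempts=30, delay=1.0):
    """Poll host:port until it accepts connections or attempts run out."""
    for _ in range(attempts):
        if port_is_open(host, port):
            return True
        time.sleep(delay)
    return False


# Single, fast check; prints True or False depending on whether a node is running.
print(port_is_open("localhost", 9200))
```

In the notebook you could call `wait_for_port()` right after `!./bin/elasticsearch -d` and only continue when it returns `True`.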


## Shutting down
* <https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-nodes-shutdown.html>

In [76]:
!  curl -XPOST 'http://localhost:9200/_shutdown'

{"cluster_name":"elasticsearch","nodes":{"8VID4RMQSzO2ipAFfnzTLQ":{"name":"Winky Man"}}}

In [77]:
# Check whether it worked
!curl 'http://localhost:9200/?pretty'

curl: (7) Failed to connect to localhost port 9200: Connection refused


# Starting Elasticsearch again
* Now just follow the guide and learn.
* Instead of using the Sense plugin or curl, you can talk to Elasticsearch through the Python API.

In [20]:
!./bin/elasticsearch -d

# Using the Python Elasticsearch API

* Documentation: <https://elasticsearch-py.readthedocs.org/en/master/>

In [26]:
import sys
import json
from elasticsearch import Elasticsearch

HOST = 'http://localhost:9200/'
es = Elasticsearch(hosts=[HOST])


query={
  "query": {
    "match_all": {}
  }
}

es.search(body=query)

{u'_shards': {u'failed': 0, u'successful': 5, u'total': 5},
 u'hits': {u'hits': [{u'_id': u'marvelOpts',
    u'_index': u'.marvel-kibana',
    u'_score': 1.0,
    u'_source': {u'report': True,
     u'status': u'trial',
     u'version': u'1.3.1-b725888'},
    u'_type': u'appdata'}],
  u'max_score': 1.0,
  u'total': 1},
 u'timed_out': False,
 u'took': 1}

In [44]:
# The example from https://www.elastic.co/guide/en/elasticsearch/guide/current/_talking_to_elasticsearch.html
es.count(body=query)

{u'_shards': {u'failed': 0, u'successful': 10, u'total': 10}, u'count': 2}

# Putting information in the DB

* We follow <https://www.elastic.co/guide/en/elasticsearch/guide/current/_indexing_employee_documents.html>

* Notice that the path `/megacorp/employee/1` contains three pieces of information:
    * `megacorp`: the index name
    * `employee`: the type name
    * `1`: the ID of this particular employee

* We use the `es.index` method to store the document.
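The mapping from the REST path to the keyword arguments of `es.index` can be sketched as follows. This is only an illustration: `split_document_path` is a hypothetical helper, not part of the Python client, and it assumes the three-part `/index/type/id` layout described above.

```python
def split_document_path(path):
    """Split '/index/type/id' into keyword arguments for es.index / es.get."""
    parts = path.strip("/").split("/")
    if len(parts) != 3:
        raise ValueError("expected /index/type/id, got %r" % path)
    index, doc_type, doc_id = parts
    # The ID stays a string here; Elasticsearch also reports it as a string.
    return {"index": index, "doc_type": doc_type, "id": doc_id}


kwargs = split_document_path("/megacorp/employee/1")
print(kwargs)
```

With the client version used in this notebook, the result could be passed on as `es.index(body=employee1, **kwargs)`.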

In [37]:
employee1= {
    "first_name" : "John",
    "last_name" :  "Smith",
    "age" :        25,
    "about" :      "I love to go rock climbing",
    "interests": [ "sports", "music" ]
}

es.index(index='megacorp', doc_type='employee', id=1, body=employee1)

{u'_id': u'1',
 u'_index': u'megacorp',
 u'_type': u'employee',
 u'_version': 2,
 u'created': False}

In [39]:
res = es.get(index='megacorp', doc_type='employee', id=1)
print(res['_source'])

{u'interests': [u'sports', u'music'], u'age': 25, u'about': u'I love to go rock climbing', u'last_name': u'Smith', u'first_name': u'John'}


In [42]:
es.indices.refresh(index="megacorp")

res = es.search(index="megacorp", body={"query": {"match_all": {}}})
print("Got %d Hits:" % res['hits']['total'])
for hit in res['hits']['hits']:
    print("%(first_name)s %(last_name)s is  %(age)d years old" % hit["_source"])

Got 1 Hits:
John Smith is  25 years old


In [50]:
# Example from https://www.elastic.co/guide/en/elasticsearch/guide/current/_search_lite.html
# GET /megacorp/employee/_search?q=last_name:Smith
# View the query in Sense to see how it is written as a JSON request body

q= {
  "query": {
    "match": {
      "last_name": "smith"
    }
  }
}
res = es.search(index="megacorp", body=q)
res

{u'_shards': {u'failed': 0, u'successful': 5, u'total': 5},
 u'hits': {u'hits': [{u'_id': u'1',
    u'_index': u'megacorp',
    u'_score': 0.30685282,
    u'_source': {u'about': u'I love to go rock climbing',
     u'age': 25,
     u'first_name': u'John',
     u'interests': [u'sports', u'music'],
     u'last_name': u'Smith'},
    u'_type': u'employee'}],
  u'max_score': 0.30685282,
  u'total': 1},
 u'timed_out': False,
 u'took': 2}

In [53]:
# res is a dict
res['hits']['hits']

[{u'_id': u'1',
  u'_index': u'megacorp',
  u'_score': 0.30685282,
  u'_source': {u'about': u'I love to go rock climbing',
   u'age': 25,
   u'first_name': u'John',
   u'interests': [u'sports', u'music'],
   u'last_name': u'Smith'},
  u'_type': u'employee'}]

In [54]:
# score of first hit 
res['hits']['hits'][0]['_score']

0.30685282
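Since the response is a plain nested dict, pulling out the interesting fields is ordinary dictionary access. A small sketch, using a trimmed copy of the response shown above so that it runs without a server; `summarize_hits` is a hypothetical helper name.

```python
def summarize_hits(response):
    """Return one (id, score, source) tuple per hit in a search response."""
    return [(hit["_id"], hit["_score"], hit["_source"])
            for hit in response["hits"]["hits"]]


# Trimmed copy of the search response shown earlier in this notebook.
sample = {
    "hits": {
        "total": 1,
        "max_score": 0.30685282,
        "hits": [{
            "_id": "1",
            "_index": "megacorp",
            "_type": "employee",
            "_score": 0.30685282,
            "_source": {"first_name": "John", "last_name": "Smith", "age": 25},
        }],
    }
}

for doc_id, score, source in summarize_hits(sample):
    print("%s  %.3f  %s %s" % (doc_id, score,
                               source["first_name"], source["last_name"]))
```

In the notebook, `summarize_hits(res)` would do the same for the live response.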

# Bulk indexing

If you index a lot of documents, you should use the bulk indexing methods rather than one request per document.

See:
* <https://www.elastic.co/guide/en/elasticsearch/guide/current/bulk.html> for the explanation in the guide
* <http://unroutable.blogspot.nl/2015/03/quick-example-elasticsearch-bulk-index.html> for the Python way
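The guide describes the bulk request body as newline-delimited JSON: one action line followed by one source line per document, with a trailing newline at the end. The sketch below builds such a body by hand to make the format visible. `bulk_body` is a hypothetical helper, and the index and type names are only examples; in practice `elasticsearch.helpers.bulk` takes care of the serialization for you.

```python
import json


def bulk_body(index, doc_type, docs):
    """Build a newline-delimited bulk body with one 'index' action per doc."""
    lines = []
    for doc in docs:
        # Action line: what to do and where.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # The bulk format requires every line, including the last, to end in a newline.
    return "\n".join(lines) + "\n"


body = bulk_body("test2", "foo", [{"letters": "AB"}, {"letters": "AC"}])
print(body)
```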

In [65]:
import itertools
import string
from elasticsearch import helpers

# k is a generator expression that produces
# a series of dictionaries containing test data.
# The test data are just letter permutations
# created with itertools.permutations.
#
# k is then the iterator that is consumed
# by the elasticsearch.helpers.bulk method.
# Note: string.letters exists only in Python 2; on Python 3 use string.ascii_letters.
k = ({'_type': 'foo', '_index': 'test2', 'letters': ''.join(letters)}
     for letters in itertools.permutations(string.letters, 2))

# Calling next(k) shows an example
# (while consuming it from the generator, of course).
# Each dict contains a doc type, an index, and data (at minimum).
next(k)

{'_index': 'test2', '_type': 'foo', 'letters': 'AB'}

In [73]:
# What does itertools.permutations produce? Here with length 4 (k above uses length 2).

perms = list(itertools.permutations(string.letters, 4))

len(perms), perms[:5]

(6497400,
 [('A', 'B', 'C', 'D'),
  ('A', 'B', 'C', 'E'),
  ('A', 'B', 'C', 'F'),
  ('A', 'B', 'C', 'G'),
  ('A', 'B', 'C', 'H')])

In [66]:
next(k)

{'_index': 'test2', '_type': 'foo', 'letters': 'AC'}

In [64]:
# Create our test index
es.indices.create(index='test2')

{u'acknowledged': True}

In [67]:

helpers.bulk(es, k)

(2650, [])
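The numbers in this section can be checked with a little arithmetic: the number of length-`r` permutations of `n` items is `n * (n-1) * ... * (n-r+1)`. With 52 letters that gives 2652 pairs, and since two items were already consumed from `k` before the bulk call, 2650 documents remain. A minimal sketch (`count_permutations` is a hypothetical helper; `string.ascii_letters` is used so that it also runs on Python 3):

```python
import itertools
import string


def count_permutations(n, r):
    """Number of ordered selections of r items out of n, without repetition."""
    total = 1
    for i in range(n, n - r, -1):
        total *= i
    return total


n = len(string.ascii_letters)    # 52 letters: a-z and A-Z
print(count_permutations(n, 2))  # 2652
print(count_permutations(n, 4))  # 6497400

# Cross-check against itertools without building a list in memory.
assert sum(1 for _ in itertools.permutations(string.ascii_letters, 2)) == 2652
```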

In [68]:
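# Check to make sure we got what we expected...
# Note: the bulk call above wrote to 'test2', while this cell counts the index 'test'
# (presumably left over from an earlier run). Use index='test2' to count the new documents.
es.count(index='test')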

{u'_shards': {u'failed': 0, u'successful': 5, u'total': 5}, u'count': 2651}

# Your turn
* Generate many more documents by changing the 2 in the definition of k to 3 or 4.
* Index them again, query them, and note the performance.
* Find out how you can delete an index ;-)
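
To compare performance between runs, a simple timer is enough. The sketch below is one possible approach, not part of the guide: `timed` is a hypothetical helper, and it times a permutation count here only so that the example runs without a server.

```python
import itertools
import string
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results=None):
    """Print how long the enclosed block took; optionally store it in results."""
    start = time.time()
    try:
        yield
    finally:
        elapsed = time.time() - start
        if results is not None:
            results[label] = elapsed
        print("%s took %.3f s" % (label, elapsed))


timings = {}
with timed("count 3-letter permutations", timings):
    n_docs = sum(1 for _ in itertools.permutations(string.ascii_letters, 3))

print(n_docs)
```

In the notebook, the same wrapper could go around the bulk call, for example `with timed("bulk index"): helpers.bulk(es, k)`.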