# Elastic System Setup
##### This note shows how to set up the following:

### 1. ElasticSearch Container (Docker)
### 2. Kibana Container (Docker)
### 3. Random Data CSV
### 4. Index CSV - ElasticSearch
### 5. Query Test  

<img src='elastic.png' width='550'/>

# 1. ElasticSearch Microservice w/ Docker

In [165]:
elastic_version    = '6.6.2'
h_elastic_rest_port  = '9200'
c_elastic_rest_port  = '9200'
h_elastic_node_comms = '9300'
c_elastic_node_comms = '9300'

!docker run -d                                                \
            -p {h_elastic_rest_port}:{c_elastic_rest_port}    \
            -p {h_elastic_node_comms}:{c_elastic_node_comms}  \
            -e "discovery.type=single-node"                   \
            -it                                               \
            -h elasticsearch                                  \
            --name elasticsearch                              \
            elasticsearch:{elastic_version}

docker: Error response from daemon: Conflict. The container name "/elasticsearch" is already in use by container "5e64c764824b94111d7e0de1cb4f099eac26ab15e80cdcf82592922626fc692d". You have to remove (or rename) that container to be able to reuse that name.
See 'docker run --help'.
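The conflict above means a container named `elasticsearch` already exists from an earlier run. One way to make the cell re-runnable is to remove the old container first and then start a new one. The sketch below is an assumption about how that could be scripted from Python; the helper names `docker_run_argv` and `recreate_container` are hypothetical, and only the standard `docker rm -f` and `docker run` commands are used.

```python
import subprocess

def docker_run_argv(name, image, ports, env=None):
    """Build the argv list for `docker run -d` (nothing is executed here)."""
    argv = ["docker", "run", "-d", "--name", name, "-h", name]
    for host_port, container_port in ports:
        argv += ["-p", f"{host_port}:{container_port}"]
    for key, value in (env or {}).items():
        argv += ["-e", f"{key}={value}"]
    argv.append(image)
    return argv

def recreate_container(name, image, ports, env=None):
    """Remove any container with this name, then start a fresh one."""
    # `docker rm -f` stops and removes the container; a non-zero exit
    # (for example, no such container) is deliberately ignored.
    subprocess.run(["docker", "rm", "-f", name], check=False)
    subprocess.run(docker_run_argv(name, image, ports, env), check=True)

# Same settings as the cell above; printing only, nothing is started.
elastic_argv = docker_run_argv(
    "elasticsearch",
    "elasticsearch:6.6.2",
    [("9200", "9200"), ("9300", "9300")],
    {"discovery.type": "single-node"},
)
print(" ".join(elastic_argv))
```

Calling `recreate_container(...)` with the same arguments would replace the running container. Note that removing a container also discards any indices stored inside it.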


In [76]:
!docker ps

CONTAINER ID        IMAGE                 COMMAND                  CREATED             STATUS              PORTS                                            NAMES
5e64c764824b        elasticsearch:6.6.2   "/usr/local/bin/dock…"   5 minutes ago       Up 5 minutes        0.0.0.0:9200->9200/tcp, 0.0.0.0:9300->9300/tcp   elasticsearch


### Verify ElasticSearch

http://localhost:9200/?pretty=true

http://localhost:9200/_aliases?pretty=true
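The two URLs above can also be checked from the notebook instead of a browser. This is a minimal sketch using only the standard library; `es_url` and `fetch_json` are hypothetical helper names, and the host and port are assumed to match the container started earlier.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

def es_url(path="/", host="localhost", port=9200, **params):
    """Build an ElasticSearch REST URL with optional query parameters."""
    query = urllib.parse.urlencode(params)
    return f"http://{host}:{port}{path}" + (f"?{query}" if query else "")

def fetch_json(url, timeout=5):
    """Return the parsed JSON body, or None if the service is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        return None

root_url = es_url("/", pretty="true")
aliases_url = es_url("/_aliases", pretty="true")
print(root_url)
print(aliases_url)
```

With the container running, `fetch_json(root_url)` should return the cluster info document and `fetch_json(aliases_url)` the current index aliases; both return `None` when nothing is listening on the port.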

<img src='kibana.png' width='550'/>

# 2. Kibana Microservice w/ Docker

In [77]:
h_kibana_port        = '5601'
c_kibana_port        = '5601'
!docker run -d                                  \
            -p {h_kibana_port}:{c_kibana_port}  \
            -h kibana                           \
            --name kibana                       \
            --link elasticsearch:elasticsearch  \
            kibana:{elastic_version}

Unable to find image 'kibana:6.6.2' locally
6.6.2: Pulling from library/kibana

4930cb5d: Already exists
e1d65b99: Pulling fs layer
b0dccaae: Pulling fs layer
ee7bbfdf: Pulling fs layer
69d7f611: Pulling fs layer
9d3a3f3d: Pulling fs layer
0f99056a: Pulling fs layer
4b1fd541: Pulling fs layer
Digest: sha256:80e6f3b9ad20ce9d7a48c6c72828bc5b00369d77fa8208ed4bae1b9c8dc6e1ef

In [78]:
!docker ps

CONTAINER ID        IMAGE                 COMMAND                  CREATED             STATUS              PORTS                                            NAMES
0c598d4ce7ac        kibana:6.6.2          "/usr/local/bin/kiba…"   21 seconds ago      Up 21 seconds       0.0.0.0:5601->5601/tcp                           kibana
5e64c764824b        elasticsearch:6.6.2   "/usr/local/bin/dock…"   7 minutes ago       Up 7 minutes        0.0.0.0:9200->9200/tcp, 0.0.0.0:9300->9300/tcp   elasticsearch


### Verify Kibana 
http://localhost:5601
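Kibana can take a little while to start after `docker ps` reports the container as up, so the URL above may not respond immediately. A small polling helper is one way to wait for it. This is a sketch under the assumption that an open TCP port is a good enough signal; `wait_for_port` is a hypothetical name, and an open port does not guarantee that Kibana has finished initializing.

```python
import socket
import time

def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Poll until a TCP connection succeeds; return False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("localhost", 5601)` blocks for up to a minute and returns `True` as soon as the Kibana port accepts connections.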

<img src='travel.png' />

### Random Data >> CSV

In [None]:
!rm -rf people.csv

In [93]:
# inspired by: https://stackoverflow.com/questions/553303/generate-a-random-date-between-two-other-dates

import random
import time

def random_date(start, end, prop, fmt='%m/%d/%Y'):
    """Return a random day between start and end as epoch seconds (local midnight)."""
    stime = time.mktime(time.strptime(start, fmt))
    etime = time.mktime(time.strptime(end, fmt))
    # prop in [0, 1) picks a point proportionally between the two dates
    ptime = stime + prop * (etime - stime)

    # round-trip through the date-only format to truncate to midnight
    rando_time = time.strftime(fmt, time.localtime(ptime))
    return int(time.mktime(time.strptime(rando_time, fmt)))

print(random_date("4/1/2019", "4/30/2019", random.random()))

In [151]:
def gen_row(i):
    
    land = random_date("4/1/2019", "4/30/2019", random.random())
    leave = random_date("4/1/2019", "4/30/2019", random.random())
    while land > leave:
        land = random_date("4/1/2019", "4/30/2019", random.random())
        leave = random_date("4/1/2019", "4/30/2019", random.random())
        
    return dict([
        ('id', i),
        ('name', random.choice(['Bob', 'Bill', 'Bubba', 'Brody', 'Blair', 'Beth'])),
        ('age', str(random.randint(22,66))),
        ('airport', random.choice(['LAX', 'MSY', 'JFK', 'LGA', 'SFO', 'RDU', 'DCA', 'IAW'])),
        ('land', land),
        ('leave', leave)]
    )
print(gen_row(17))

{'id': 17, 'name': 'Brody', 'age': '33', 'airport': 'LAX', 'land': 1554523200, 'leave': 1554523200}


In [153]:
# inspired by ...
# https://gist.github.com/AlanHohn/293c98f9dadfc67443b8078d843d4401

import csv
import random
import time

N = 20
print("Making %d records\n" % N)

fieldnames = ['id', 'name', 'age', 'airport', 'land', 'leave']

# the with-block closes (and flushes) the file before the shell commands below read it
with open("people.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    for i in range(N):
        writer.writerow(gen_row(i))
    
!ls -l   people.csv
!wc -l   people.csv
!head -3 people.csv

Making 20 records

-rw-r--r--  1 wihill  staff  790 Jan 11 16:22 people.csv
      21 people.csv
id,name,age,airport,land,leave
0,Brody,49,JFK,1554177600,1556251200
1,Brody,66,LGA,1554177600,1555473600


### Test File Importer
http://localhost:5601/app/ml#/filedatavisualizer
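The File Data Visualizer above is a point-and-click importer. As an alternative, the CSV could be indexed through the ElasticSearch bulk API, which expects newline-delimited JSON with one action line followed by one document line per record. The sketch below only builds that payload; `csv_to_bulk` is a hypothetical helper, the index name `itens` and type `_doc` are taken from the search output in this note, and the sample rows mirror the `head -3` output above.

```python
import csv
import io
import json

def csv_to_bulk(csv_text, index="itens", doc_type="_doc"):
    """Convert CSV text into an NDJSON body for the _bulk endpoint."""
    action = json.dumps({"index": {"_index": index, "_type": doc_type}})
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lines.append(action)           # what to do with the next line
        lines.append(json.dumps(row))  # the document itself
    # every line, including the last, must end with a newline
    return "".join(line + "\n" for line in lines)

sample_csv = (
    "id,name,age,airport,land,leave\n"
    "0,Brody,49,JFK,1554177600,1556251200\n"
    "1,Brody,66,LGA,1554177600,1555473600\n"
)
bulk_body = csv_to_bulk(sample_csv)
print(bulk_body)
```

The resulting body would be sent with an HTTP `POST` to `http://localhost:9200/_bulk` using the header `Content-Type: application/x-ndjson`. Unlike the importer, this sketch leaves every field as a string and does not add an `@timestamp` field.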

<img src='travel_search.png' />

# Verify Index w/ Test Search

In [166]:
!wget http://localhost:9200/itens/_search\?pretty\=true\&q\=airport:MSY -q -O -

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 5,
    "max_score" : 1.3397744,
    "hits" : [
      {
        "_index" : "itens",
        "_type" : "_doc",
        "_id" : "COx9lm8BWea_MqX5V5_4",
        "_score" : 1.3397744,
        "_source" : {
          "@timestamp" : "2019-04-02T04:00:00.000Z",
          "leave" : "1554696000",
          "name" : "Blair",
          "land" : "1554177600",
          "id" : "4",
          "age" : "22",
          "airport" : "MSY"
        }
      },
      {
        "_index" : "itens",
        "_type" : "_doc",
        "_id" : "D-x9lm8BWea_MqX5V5_4",
        "_score" : 1.3397744,
        "_source" : {
          "@timestamp" : "2019-04-14T04:00:00.000Z",
          "leave" : "1555214400",
          "name" : "Blair",
          "land" : "1555214400",
          "id" : "11",
          "age" : "47",
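The response above stores `land` and `leave` as epoch-second strings, which are hard to read at a glance. A small post-processing step can pull out the `_source` of each hit and convert those fields to UTC datetimes. This is a sketch; `summarize_hits` is a hypothetical helper, and the sample response below is a trimmed copy of the first hit shown above.

```python
from datetime import datetime, timezone

def summarize_hits(response):
    """Extract name, airport, and UTC datetimes from a search response dict."""
    rows = []
    for hit in response["hits"]["hits"]:
        src = hit["_source"]
        rows.append({
            "name": src["name"],
            "airport": src["airport"],
            # the importer stored epoch seconds as strings, so cast first
            "land": datetime.fromtimestamp(int(src["land"]), tz=timezone.utc),
            "leave": datetime.fromtimestamp(int(src["leave"]), tz=timezone.utc),
        })
    return rows

sample_response = {
    "hits": {
        "total": 1,
        "hits": [
            {"_index": "itens", "_source": {
                "leave": "1554696000", "name": "Blair", "land": "1554177600",
                "id": "4", "age": "22", "airport": "MSY"}},
        ],
    }
}
for row in summarize_hits(sample_response):
    print(row["name"], row["airport"], row["land"].isoformat(), row["leave"].isoformat())
```

For the first hit, the converted `land` value agrees with the `@timestamp` field in the response (`2019-04-02T04:00:00.000Z`).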
       

1. Simulate 2 parameters: coordinates & date range
2. Generate a GUID
3. Encapsulate in JSON
4. Convert to XML

10. Convert results to XML
11. ETL to PostgreSQL
12. Query & sort correlations

# 1. coordinates and date range

In [162]:
import datetime

In [163]:
# date range - April 13-20, 2019
date_start = datetime.date(2019, 4, 13)
date_end = datetime.date(2019, 4, 20)
date_range = (str(date_start), str(date_end))
date_range

('2019-04-13', '2019-04-20')

In [164]:
# MSY -- Louis Armstrong New Orleans International Airport
lat = 29.9911
long = -90.2592
coords = (lat, long)
coords

(29.9911, -90.2592)

# 2. Generate GUID

In [34]:
import uuid 
GUID = str(uuid.uuid1())
GUID

'9b5e0734-3478-11ea-b7dc-acde48001122'

# 3. JSON wrap

In [35]:
params = {
    'coords' : coords,
    'date_range' : date_range,
    'guid' : GUID
}
params

{'coords': (29.9911, -90.2592),
 'date_range': ('2019-04-13', '2019-04-20'),
 'guid': '9b5e0734-3478-11ea-b7dc-acde48001122'}
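The `params` object above is still a Python dict. To get an actual JSON document, it can be serialized with the standard `json` module. The sketch below repeats the values shown above so that it runs on its own; note that JSON has no tuple type, so the tuples come back as lists after a round trip.

```python
import json

params = {
    "coords": (29.9911, -90.2592),
    "date_range": ("2019-04-13", "2019-04-20"),
    "guid": "9b5e0734-3478-11ea-b7dc-acde48001122",
}

# serialize to a JSON string, then parse it back to check the round trip
params_json = json.dumps(params, indent=2)
restored = json.loads(params_json)

print(params_json)
print(type(restored["coords"]).__name__)
```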

# 4. XML Conversion

In [69]:
from dicttoxml import dicttoxml

# dicttoxml returns bytes, so decode before writing in text mode
params_xml = dicttoxml(params)
with open(GUID + '.xml', 'w') as params_xml_file:
    params_xml_file.write(params_xml.decode())

In [72]:
from xml.dom.minidom import parseString

dom = parseString(params_xml)
print(dom.toprettyxml())

<?xml version="1.0" ?>
<root>
	<coords type="list">
		<item type="float">29.9911</item>
		<item type="float">-90.2592</item>
	</coords>
	<date_range type="list">
		<item type="str">2019-04-13</item>
		<item type="str">2019-04-20</item>
	</date_range>
	<guid type="str">9b5e0734-3478-11ea-b7dc-acde48001122</guid>
</root>
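To confirm that nothing was lost in the conversion, the XML can be parsed back into a dict using the `type` attributes shown above. This is a minimal sketch with the standard library's `xml.etree.ElementTree`; `xml_to_params` and `CASTS` are hypothetical names, only the `list`, `float`, `int`, and `str` types are handled, and the sample string is a copy of the pretty-printed output above.

```python
import xml.etree.ElementTree as ET

# map the type attribute written into the XML back to a Python constructor
CASTS = {"float": float, "int": int, "str": str}

def xml_to_params(xml_text):
    """Rebuild a flat dict (with list values) from the XML shown above."""
    root = ET.fromstring(xml_text)
    result = {}
    for child in root:
        if child.get("type") == "list":
            result[child.tag] = [
                CASTS.get(item.get("type"), str)(item.text) for item in child
            ]
        else:
            result[child.tag] = CASTS.get(child.get("type"), str)(child.text)
    return result

sample_xml = """<?xml version="1.0" ?>
<root>
    <coords type="list">
        <item type="float">29.9911</item>
        <item type="float">-90.2592</item>
    </coords>
    <date_range type="list">
        <item type="str">2019-04-13</item>
        <item type="str">2019-04-20</item>
    </date_range>
    <guid type="str">9b5e0734-3478-11ea-b7dc-acde48001122</guid>
</root>"""

parsed = xml_to_params(sample_xml)
print(parsed)
```

As with the JSON round trip, the original tuples come back as lists.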

