## Inserting and extracting NoSQL database data in Python

#### Inserting data into Elasticsearch

##### Before you can query Elasticsearch, you will need to load some data into an index. In the previous section, you used a library, psycopg2, to access PostgreSQL. To access Elasticsearch, you will use the elasticsearch library.
##### To load data, you need to create the connection, then you can issue commands to Elasticsearch. Follow the given steps to add a record to Elasticsearch:

In [1]:
import warnings
warnings.filterwarnings('ignore')

In [2]:
#Import the libraries. You can also create the Faker object to generate random data:
from elasticsearch import Elasticsearch
from faker import Faker
fake=Faker()

In [3]:
#Create a connection to Elasticsearch, replacing the password with your own
es=Elasticsearch('https://localhost:9200', verify_certs=False, basic_auth=('elastic', '5_AQGZ0kKSiqSI_fhaPF'))
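
##### The connection above hard-codes the password in the notebook. As a minimal sketch of one alternative, the settings can be read from environment variables instead. The variable names ES_URL, ES_USER, and ES_PASSWORD and the helper connection_settings are assumptions made for this illustration; they are not part of the elasticsearch library.

```python
import os


def connection_settings(env=os.environ):
    """Build keyword arguments for Elasticsearch() from environment variables.

    ES_URL, ES_USER and ES_PASSWORD are hypothetical names chosen for this sketch.
    """
    password = env.get("ES_PASSWORD")
    if not password:
        raise RuntimeError("Set ES_PASSWORD before connecting")
    return {
        "hosts": env.get("ES_URL", "https://localhost:9200"),
        "basic_auth": (env.get("ES_USER", "elastic"), password),
        # Mirrors the local setup used in this section; verify certificates elsewhere.
        "verify_certs": False,
    }


# A plain dict stands in for os.environ so the sketch runs without any setup.
settings = connection_settings({"ES_PASSWORD": "example-password"})
print(settings["hosts"])
```

##### With the library installed, the settings could then be passed as es=Elasticsearch(**settings).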

##### Now, you can issue commands to your Elasticsearch instance. The index method will allow you to add data. The method takes an index name, an optional document ID, and the document itself.

In [4]:
#The following code creates a JSON object to add to the database, then uses index to send it to the users index (which will be created automatically during the index operation). The result is created the first time and updated if a document with id 1 already exists:
doc={"name": fake.name(),"street": fake.street_address(),"city": fake.city(),"zip":fake.zipcode()}
res=es.index(index="users",id=1,document=doc)
print(res['result'])

updated


### Inserting data using helpers

##### Using the bulk method, you can insert many documents at a time. The process is similar to inserting a single record, except that you will generate all the data, then insert it. The steps are as follows:

In [5]:
#You need to import the helpers library to access the bulk method
from elasticsearch import helpers

In [6]:
#Create a connection to Elasticsearch, replacing the password with your own
es=Elasticsearch('https://localhost:9200', verify_certs=False, basic_auth=('elastic', '5_AQGZ0kKSiqSI_fhaPF'))

In [7]:
#Build one action per document: _index names the target index and _source holds the document
actions = [
    {
        "_index": "users",
        "_source": {
            "name": fake.name(),
            "street": fake.street_address(),
            "city": fake.city(),
            "zip": fake.zipcode(),
        },
    }
    for x in range(998)  # or for i,r in df.iterrows()
]

In [8]:
#Now, you can call the bulk method and pass it the elasticsearch instance and the array of data.
#You can print the results to check that it worked
res = helpers.bulk(es, actions)
print(res[0])

998
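
##### The list above holds every action in memory before sending it. Because helpers.bulk accepts any iterable of actions, a generator can produce them one at a time instead. The following is a minimal sketch; the function name generate_actions and the sample rows are made up for illustration.

```python
def generate_actions(rows, index="users"):
    """Yield one bulk action per row instead of building the whole list first."""
    for row in rows:
        yield {"_index": index, "_source": dict(row)}


# Placeholder rows standing in for Faker output or DataFrame records.
rows = [
    {"name": "Example Person", "street": "1 Example Street", "city": "Exampleville", "zip": "00000"},
    {"name": "Another Person", "street": "2 Example Street", "city": "Exampleville", "zip": "00001"},
]

actions = list(generate_actions(rows))
print(len(actions))
print(actions[0]["_index"], actions[0]["_source"]["name"])
```

##### With a live connection, the generator can be passed straight to the bulk method: helpers.bulk(es, generate_actions(rows)).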


### Querying Elasticsearch

##### Querying Elasticsearch follows the exact same steps as inserting data. The only difference is you use a different method – search – to send a different body object.

In [9]:
#Import the library and create your elasticsearch instance:
from elasticsearch import Elasticsearch

In [10]:
es=Elasticsearch('https://localhost:9200', verify_certs=False, basic_auth=('elastic', '5_AQGZ0kKSiqSI_fhaPF'))

In [11]:
#Create the JSON object to send to Elasticsearch. The object is a query, using the match_all search
doc={"query":{"match_all":{}}}

In [12]:
#Pass the object to Elasticsearch using the search method. Pass the index and the return size
res=es.search(index="users",body=doc,size=10)

In [13]:
#Lastly, you can print the documents:
print(res['hits']['hits'])

[{'_index': 'users', '_id': 'rAR5iIIBg9cq9YlZgMtZ', '_score': 1.0, '_source': {'name': 'Brittany Martinez', 'street': '9943 Anthony Orchard', 'city': 'North Fernando', 'zip': '83082'}}, {'_index': 'users', '_id': 'rQR5iIIBg9cq9YlZgMtZ', '_score': 1.0, '_source': {'name': 'Alejandra Martinez', 'street': '491 Michael Mountains Suite 350', 'city': 'North Elijahtown', 'zip': '23170'}}, {'_index': 'users', '_id': 'rgR5iIIBg9cq9YlZgMtZ', '_score': 1.0, '_source': {'name': 'Brian Young', 'street': '59274 Gonzales Coves', 'city': 'North Jennifer', 'zip': '57597'}}, {'_index': 'users', '_id': 'rwR5iIIBg9cq9YlZgMtZ', '_score': 1.0, '_source': {'name': 'Julie Miller', 'street': '78739 Johnson Garden Apt. 604', 'city': 'Williamstown', 'zip': '52301'}}, {'_index': 'users', '_id': 'sAR5iIIBg9cq9YlZgMtZ', '_score': 1.0, '_source': {'name': 'Rachel Nguyen', 'street': '28414 Spencer Neck Suite 052', 'city': 'Moralesfort', 'zip': '65555'}}, {'_index': 'users', '_id': 'sQR5iIIBg9cq9YlZgMtZ', '_score': 1.

In [14]:
#Or you can iterate through grabbing _source only
for doc in res['hits']['hits']:
    print(doc['_source'])

{'name': 'Brittany Martinez', 'street': '9943 Anthony Orchard', 'city': 'North Fernando', 'zip': '83082'}
{'name': 'Alejandra Martinez', 'street': '491 Michael Mountains Suite 350', 'city': 'North Elijahtown', 'zip': '23170'}
{'name': 'Brian Young', 'street': '59274 Gonzales Coves', 'city': 'North Jennifer', 'zip': '57597'}
{'name': 'Julie Miller', 'street': '78739 Johnson Garden Apt. 604', 'city': 'Williamstown', 'zip': '52301'}
{'name': 'Rachel Nguyen', 'street': '28414 Spencer Neck Suite 052', 'city': 'Moralesfort', 'zip': '65555'}
{'name': 'Christine Wilson', 'street': '0063 Underwood Forges Suite 565', 'city': 'New Tami', 'zip': '20938'}
{'name': 'Christopher Coleman', 'street': '269 Beltran Garden Apt. 204', 'city': 'New Maria', 'zip': '84287'}
{'name': 'Kimberly Martin', 'street': '6355 Deanna Locks Apt. 945', 'city': 'Calvinland', 'zip': '76652'}
{'name': 'Emily Elliott', 'street': '6566 Erika Trail', 'city': 'Weberstad', 'zip': '26831'}
{'name': 'Eugene Morgan', 'street': '498

In [15]:
#Load the query results into a pandas DataFrame
#Use pandas.json_normalize on the JSON results, as shown in the following code (the older pandas.io.json import path is deprecated)
import pandas as pd
df=pd.json_normalize(res['hits']['hits'])
df

| | _index | _id | _score | _source.name | _source.street | _source.city | _source.zip |
|---|---|---|---|---|---|---|---|
| 0 | users | rAR5iIIBg9cq9YlZgMtZ | 1.0 | Brittany Martinez | 9943 Anthony Orchard | North Fernando | 83082 |
| 1 | users | rQR5iIIBg9cq9YlZgMtZ | 1.0 | Alejandra Martinez | 491 Michael Mountains Suite 350 | North Elijahtown | 23170 |
| 2 | users | rgR5iIIBg9cq9YlZgMtZ | 1.0 | Brian Young | 59274 Gonzales Coves | North Jennifer | 57597 |
| 3 | users | rwR5iIIBg9cq9YlZgMtZ | 1.0 | Julie Miller | 78739 Johnson Garden Apt. 604 | Williamstown | 52301 |
| 4 | users | sAR5iIIBg9cq9YlZgMtZ | 1.0 | Rachel Nguyen | 28414 Spencer Neck Suite 052 | Moralesfort | 65555 |
| 5 | users | sQR5iIIBg9cq9YlZgMtZ | 1.0 | Christine Wilson | 0063 Underwood Forges Suite 565 | New Tami | 20938 |
| 6 | users | sgR5iIIBg9cq9YlZgMtZ | 1.0 | Christopher Coleman | 269 Beltran Garden Apt. 204 | New Maria | 84287 |
| 7 | users | swR5iIIBg9cq9YlZgMtZ | 1.0 | Kimberly Martin | 6355 Deanna Locks Apt. 945 | Calvinland | 76652 |
| 8 | users | tAR5iIIBg9cq9YlZgMtZ | 1.0 | Emily Elliott | 6566 Erika Trail | Weberstad | 26831 |
| 9 | users | tQR5iIIBg9cq9YlZgMtZ | 1.0 | Eugene Morgan | 49820 Moss Estate Suite 548 | Caitlinstad | 40015 |
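
##### The column names such as _source.name come from flattening the nested _source object into dotted keys. The following pure-Python sketch illustrates that idea on a single hit; the flatten function is written for this example and is not how pandas implements it internally.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dictionaries into a single level with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects such as _source.
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


hit = {
    "_index": "users",
    "_id": "rAR5iIIBg9cq9YlZgMtZ",
    "_score": 1.0,
    "_source": {
        "name": "Brittany Martinez",
        "street": "9943 Anthony Orchard",
        "city": "North Fernando",
        "zip": "83082",
    },
}

flat_hit = flatten(hit)
print(list(flat_hit))
# ['_index', '_id', '_score', '_source.name', '_source.street', '_source.city', '_source.zip']
```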


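##### Using the match_all query, I know I have a document with the name Ronald Goodman. You can query on a single field using the match query. The following example searches the name field for Daniel and prints the second hit (index 1):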

In [16]:
doc={"query":{"match":{"name":"Daniel"}}}
res=es.search(index="users",body=doc, size=10)
print(res['hits']['hits'][1]['_source'])

{'name': 'Daniel Richardson', 'street': '3439 Jesse Union Suite 976', 'city': 'North Kellyville', 'zip': '27389'}


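##### You can also use Lucene syntax for queries. In Lucene, you can specify field:value.
##### When performing this kind of search, you do not need a document to send. You can pass the q parameter to the search method. Note that in name:Ronald Goodman only Ronald is tied to the name field; Goodman is searched across the default fields, which is why the top hit can match on a street such as Goodman Junction: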

In [17]:
res=es.search(index="users",q="name:Ronald Goodman",size=10)
print(res['hits']['hits'][0]['_source'])

{'name': 'Dennis Dennis', 'street': '81532 Goodman Junction', 'city': 'West Briana', 'zip': '31963'}
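
##### To match both words against one field, Lucene syntax lets you quote the value as a phrase. The following sketch builds such a query string; the helper name field_phrase is hypothetical, and the escaping covers only backslashes and double quotes.

```python
def field_phrase(field, phrase):
    """Return a Lucene field:"phrase" expression with quotes and backslashes escaped."""
    escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


q = field_phrase("name", "Ronald Goodman")
print(q)
# name:"Ronald Goodman"
```

##### The resulting string can be passed as the q parameter, for example es.search(index="users", q=q, size=10).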


##### Using the city field, you can search for East. It will return many records, because Elasticsearch tokenizes strings with spaces in them, splitting them into multiple terms to search:

In [18]:
doc={"query":{"match":{"city":"East"}}}
res=es.search(index="users",body=doc,size=10)
print(res['hits']['hits'])

[{'_index': 'users', '_id': 'tgR5iIIBg9cq9YlZgMtZ', '_score': 2.319874, '_source': {'name': 'Sarah Gentry', 'street': '70776 Perez Parks', 'city': 'East Josephstad', 'zip': '04994'}}, {'_index': 'users', '_id': 'wwR5iIIBg9cq9YlZgMtZ', '_score': 2.319874, '_source': {'name': 'Jenny Richardson', 'street': '7632 Colon Plains', 'city': 'East Jill', 'zip': '39777'}}, {'_index': 'users', '_id': 'xAR5iIIBg9cq9YlZgMtZ', '_score': 2.319874, '_source': {'name': 'Tina Butler', 'street': '301 Hanson Fort', 'city': 'East Joseph', 'zip': '90848'}}, {'_index': 'users', '_id': 'xgR5iIIBg9cq9YlZgMtZ', '_score': 2.319874, '_source': {'name': 'Todd Williams', 'street': '25497 Leonard Plain Suite 207', 'city': 'East Austin', 'zip': '17888'}}, {'_index': 'users', '_id': '2wR5iIIBg9cq9YlZgMtZ', '_score': 2.319874, '_source': {'name': 'Jessica Dalton', 'street': '262 Jennifer Roads Suite 829', 'city': 'East Gregoryshire', 'zip': '37649'}}, {'_index': 'users', '_id': '5wR5iIIBg9cq9YlZgMta', '_score': 2.319874
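
##### To see why East matches cities such as East Josephstad, it can help to imitate the tokenizing step. The sketch below is only a rough approximation of the default analyzer for simple space-separated words (lowercase, then split on non-word characters); the real analyzer follows more detailed rules. Both function names are made up for this illustration.

```python
import re


def rough_tokens(text):
    """Approximate tokenizing: lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def would_match(query, field_value):
    """A match query succeeds by default when any query token equals a field token."""
    field_tokens = set(rough_tokens(field_value))
    return any(token in field_tokens for token in rough_tokens(query))


print(rough_tokens("East Josephstad"))          # ['east', 'josephstad']
print(would_match("East", "East Josephstad"))   # True
print(would_match("East", "Eastport"))          # False: 'eastport' is a single token
```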

##### You can use Boolean queries to specify multiple search criteria. For example, you can use must, must_not, and should before your queries. A must match on East as the city returns many records, as shown above; adding a filter on the ZIP makes sure only the East city with the ZIP 89229 is returned. You could also use a must_not query to exclude a ZIP:

In [19]:
doc={"query":{"bool":{"must":{"match":{"city":"East"}},"filter":{"term":{"zip":"89229"}}}}}
res=es.search(index="users",body=doc,size=10)
print(res['hits']['hits'])

[{'_index': 'users', '_id': 'tpfTY4IBfUA28nYC3IBT', '_score': 2.319874, '_source': {'name': 'Carmen Newton', 'street': '2586 Rebecca Plaza Suite 200', 'city': 'East Debrafort', 'zip': '89229'}}]
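
##### The must_not variant mentioned above can be sketched by assembling the query body from named clauses. The helper bool_query is an assumption made for this example, not part of the elasticsearch library; it only builds the dictionary that the search method receives.

```python
def bool_query(**clauses):
    """Assemble a bool query body from must, filter, must_not and should clauses."""
    allowed = {"must", "filter", "must_not", "should"}
    unknown = set(clauses) - allowed
    if unknown:
        raise ValueError(f"Unsupported bool clauses: {sorted(unknown)}")
    # Drop empty clauses so the body contains only what was asked for.
    return {"query": {"bool": {name: clause for name, clause in clauses.items() if clause}}}


# Cities matching East, excluding the one document with ZIP 89229.
doc = bool_query(
    must={"match": {"city": "East"}},
    must_not={"term": {"zip": "89229"}},
)
print(doc)
```

##### The body can be sent in the same way as the earlier examples: es.search(index="users",body=doc,size=10).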


### Using scroll to handle larger results

##### In the first example, you used a size of 10 for your search. You could have grabbed all 1,000 records, but what do you do when you have more than 10,000 and you need all of them? Elasticsearch has a scroll method that will allow you to iterate over the results until you get them all

In [1]:
import warnings
warnings.filterwarnings('ignore')

In [2]:
#Import the library and create your Elasticsearch instance:
from elasticsearch import Elasticsearch
es=Elasticsearch('https://localhost:9200', verify_certs=False, basic_auth=('elastic', '5_AQGZ0kKSiqSI_fhaPF'))

##### You will pass a new parameter to the search method – scroll. This parameter specifies how long you want to make the results available for. I am using 20 minutes (20m). Adjust this number to make sure you have enough time to get the data – it will depend on the document size and network speed

In [3]:
res = es.search(index = 'users', scroll = '20m', size = 500, body = {"query":{"match_all":{}}})
res

ObjectApiResponse({'_scroll_id': 'FGluY2x1ZGVfY29udGV4dF91dWlkDXF1ZXJ5QW5kRmV0Y2gBFjJRVWgyck1NUWwyS2lucTZBd2I3MUEAAAAAAABPgxZkLUdqTy1jVVF6T09sd2hyQjdWUXdR', 'took': 1, 'timed_out': False, '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 10979, 'relation': 'eq'}, 'max_score': 1.0, 'hits': [{'_index': 'users', '_id': 'rAR5iIIBg9cq9YlZgMtZ', '_score': 1.0, '_source': {'name': 'Brittany Martinez', 'street': '9943 Anthony Orchard', 'city': 'North Fernando', 'zip': '83082'}}, {'_index': 'users', '_id': 'rQR5iIIBg9cq9YlZgMtZ', '_score': 1.0, '_source': {'name': 'Alejandra Martinez', 'street': '491 Michael Mountains Suite 350', 'city': 'North Elijahtown', 'zip': '23170'}}, {'_index': 'users', '_id': 'rgR5iIIBg9cq9YlZgMtZ', '_score': 1.0, '_source': {'name': 'Brian Young', 'street': '59274 Gonzales Coves', 'city': 'North Jennifer', 'zip': '57597'}}, {'_index': 'users', '_id': 'rwR5iIIBg9cq9YlZgMtZ', '_score': 1.0, '_source': {'name': 'Julie Miller

In [4]:
#The results will include _scroll_id, which you will need to pass to the scroll method later
#Save the scroll ID and the size of the result set
sid = res['_scroll_id']
size = res['hits']['total']['value']
sid

'FGluY2x1ZGVfY29udGV4dF91dWlkDXF1ZXJ5QW5kRmV0Y2gBFjJRVWgyck1NUWwyS2lucTZBd2I3MUEAAAAAAABPgxZkLUdqTy1jVVF6T09sd2hyQjdWUXdR'

In [5]:
size

10979

##### To start scrolling, use a while loop to get records until the size is 0, meaning there is no more data. Inside the loop, you will call the scroll method and pass _scroll_id and how long to scroll. This will grab more of the results from the original query

In [None]:
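#Each pass fetches the next page, then refreshes the scroll ID and the page size so that the loop can end
while (size > 0):
    res = es.scroll(scroll_id = sid, scroll = '20m')
    sid = res['_scroll_id']
    size = len(res['hits']['hits'])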

In [6]:
#get the new scroll ID and the size so that you can loop through again if the data still exists
sid = res['_scroll_id']
size = len(res['hits']['hits'])
sid

'FGluY2x1ZGVfY29udGV4dF91dWlkDXF1ZXJ5QW5kRmV0Y2gBFjJRVWgyck1NUWwyS2lucTZBd2I3MUEAAAAAAABPgxZkLUdqTy1jVVF6T09sd2hyQjdWUXdR'

In [7]:
size

500

In [8]:
#Lastly, you can do something with the results of the scrolls. In the following code,
#you will print the source for every record
for doc in res['hits']['hits']:
    print(doc['_source'])

{'name': 'Brittany Martinez', 'street': '9943 Anthony Orchard', 'city': 'North Fernando', 'zip': '83082'}
{'name': 'Alejandra Martinez', 'street': '491 Michael Mountains Suite 350', 'city': 'North Elijahtown', 'zip': '23170'}
{'name': 'Brian Young', 'street': '59274 Gonzales Coves', 'city': 'North Jennifer', 'zip': '57597'}
{'name': 'Julie Miller', 'street': '78739 Johnson Garden Apt. 604', 'city': 'Williamstown', 'zip': '52301'}
{'name': 'Rachel Nguyen', 'street': '28414 Spencer Neck Suite 052', 'city': 'Moralesfort', 'zip': '65555'}
{'name': 'Christine Wilson', 'street': '0063 Underwood Forges Suite 565', 'city': 'New Tami', 'zip': '20938'}
{'name': 'Christopher Coleman', 'street': '269 Beltran Garden Apt. 204', 'city': 'New Maria', 'zip': '84287'}
{'name': 'Kimberly Martin', 'street': '6355 Deanna Locks Apt. 945', 'city': 'Calvinland', 'zip': '76652'}
{'name': 'Emily Elliott', 'street': '6566 Erika Trail', 'city': 'Weberstad', 'zip': '26831'}
{'name': 'Eugene Morgan', 'street': '498
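
##### Putting the scroll steps together, the following sketch wraps them in a generator that yields every hit, including the first page returned by search, and clears the scroll when finished. It assumes the 8.x client, which accepts query as a keyword argument. The FakeClient class is a stand-in written for this example so that the code runs without a server; with a real connection you would pass es instead.

```python
def scroll_all(client, index, query, page_size=500, keep_alive="20m"):
    """Yield every hit for a query, page by page, using search and scroll."""
    res = client.search(index=index, scroll=keep_alive, size=page_size, query=query)
    sid = res["_scroll_id"]
    hits = res["hits"]["hits"]
    try:
        while hits:  # an empty page means there is no more data
            yield from hits
            res = client.scroll(scroll_id=sid, scroll=keep_alive)
            sid = res["_scroll_id"]
            hits = res["hits"]["hits"]
    finally:
        # Release the server-side scroll context once iteration ends.
        client.clear_scroll(scroll_id=sid)


class FakeClient:
    """Stand-in for an Elasticsearch client that serves documents in pages."""

    def __init__(self, docs):
        self._docs = docs
        self._pos = 0
        self._size = 0
        self.cleared = False

    def _page(self):
        page = self._docs[self._pos:self._pos + self._size]
        self._pos += len(page)
        return {"_scroll_id": "fake-scroll-id", "hits": {"hits": page}}

    def search(self, index, scroll, size, query):
        self._pos, self._size = 0, size
        return self._page()

    def scroll(self, scroll_id, scroll):
        return self._page()

    def clear_scroll(self, scroll_id):
        self.cleared = True


docs = [{"_source": {"n": i}} for i in range(1200)]
client = FakeClient(docs)
collected = [hit["_source"]["n"] for hit in scroll_all(client, "users", {"match_all": {}})]
print(len(collected))
```

##### The elasticsearch library also ships a helpers.scan function that wraps the same scroll pattern, which may be preferable to a hand-written loop.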