# FLIP(01): Advanced Data Science
**(Module 01: A Touch of Data Science - DataBase)**

---
- Materials in this module include resources collected from various open-source online repositories.
- You are free to use this package, but NOT allowed to change or distribute it.
- If you find any issue or bug in this document, please submit an issue at [tulip-lab/mds](https://github.com/tulip-lab/mds/issues).

Prepared by and for 
**Student Members** |
2006-2019 [TULIP Lab](http://www.tulip.org.au)

---


# Session L - Creating an IBM Cloudant Query

This notebook demonstrates how to create a database, populate it with documents, create an index, and use the index to query the database.

In [None]:
!pip install cloudant==2.3.1

In [None]:
{
  "apikey": "<your-api-key>",
  "host": "<your-host>",
  "iam_apikey_description": "Auto generated apikey during resource-key operation for Instance - <your-instance-crn>",
  "iam_apikey_name": "<your-api-key-name>",
  "iam_role_crn": "crn:v1:bluemix:public:iam::::serviceRole:Manager",
  "iam_serviceid_crn": "<your-service-id-crn>",
  "password": "<your-password>",
  "port": 443,
  "url": "https://<your-username>:<your-password>@<your-host>",
  "username": "<your-username>"
}

In [None]:
# Replace the placeholders with the values from your own service credentials.
serviceUsername = "<your-username>"
servicePassword = "<your-password>"
serviceURL = "https://<your-username>:<your-password>@<your-host>"

In [None]:
from cloudant.client import Cloudant
from cloudant.error import CloudantException
from cloudant.result import Result, ResultByKey

In [None]:
client = Cloudant(serviceUsername, servicePassword, url=serviceURL)
client.connect()

To begin, you create the query-demo database and some documents that contain the data for these exercises.

## Assumptions

Before you begin, follow these steps to prepare for the notebook:

1. Create an IBM Cloud account
2. Log in to the IBM Cloud Dashboard
3. Create an IBM Cloudant instance on IBM Cloud

## Open IBM Cloud Dashboard

1. Open the IBM Cloudant service instance that you created.
2. On the IBM Cloudant service page, click **Launch**. The Databases tab opens.

<img src='https://github.com/tulip-lab/mds/raw/master/Jupyter/image/cloudant/database-dashboard.png' width = '300' height = '300' align = center />

3. Click **Create Database**.
4. Enter `query-demo` and click **Create**.

   The `query-demo` database automatically opens.

## Creating documents in the database

1. Click **+**.
2. Then select **New Document**. The 'New Document' window opens.
3. To create a document, copy the following sample text and replace the existing text in the new document.

First sample document:

In [None]:
{
"firstname": "Sally",
"lastname": "Brown",
"age": 16,
"location": "New York City, NY",
"_id": "doc1"
}

4. Repeat steps 1 to 3 to add the remaining documents to the database.

Second sample document:

In [None]:
{
"firstname": "John",
"lastname": "Brown",
"age": 21,
"location": "New York City, NY",
"_id": "doc2"
}

Third sample document:

In [None]:
{
"firstname": "Greg",
"lastname": "Greene",
"age": 35,
"location": "San Diego, CA",
"_id": "doc3"
}

Fourth sample document:

In [None]:
{
"firstname": "Anna",
"lastname": "Greene",
"age": 44,
"location": "Baton Rouge, LA",
"_id": "doc4"
}

Fifth sample document:

In [None]:
{
"firstname": "Lois",
"lastname": "Brown",
"age": 33,
"location": "New York City, NY",
"_id": "doc5"
}

The `query-demo` database was populated with five records. You can see the records from the Table view in the following screen capture:

<img src='https://github.com/tulip-lab/mds/raw/master/Jupyter/image/cloudant/table.png' width = '1000' height = '1000' align = center />
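
The same five documents can also be loaded from the notebook instead of the dashboard. The following is a minimal sketch, assuming `db` behaves like a python-cloudant database object that exposes `create_document(data)`. The names `SAMPLE_DOCS` and `populate` are introduced here for illustration only.

```python
# The five sample documents from this section, as Python dictionaries.
SAMPLE_DOCS = [
    {"firstname": "Sally", "lastname": "Brown", "age": 16,
     "location": "New York City, NY", "_id": "doc1"},
    {"firstname": "John", "lastname": "Brown", "age": 21,
     "location": "New York City, NY", "_id": "doc2"},
    {"firstname": "Greg", "lastname": "Greene", "age": 35,
     "location": "San Diego, CA", "_id": "doc3"},
    {"firstname": "Anna", "lastname": "Greene", "age": 44,
     "location": "Baton Rouge, LA", "_id": "doc4"},
    {"firstname": "Lois", "lastname": "Brown", "age": 33,
     "location": "New York City, NY", "_id": "doc5"},
]


def populate(db, docs):
    """Create each document in `db` and return the IDs that were submitted.

    `db` is assumed to expose create_document(data), as a python-cloudant
    database object does.
    """
    created = []
    for doc in docs:
        db.create_document(dict(doc))  # pass a copy so SAMPLE_DOCS stays unchanged
        created.append(doc["_id"])
    return created


# Usage with the `client` connected earlier (needs network access and
# valid credentials, so it is left commented out):
# db = client.create_database("query-demo")
# populate(db, SAMPLE_DOCS)
```

Depending on the library version, `create_database` may raise an error if `query-demo` already exists, so check for the database first when you rerun the cell.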

## Creating an index

IBM Cloudant provides views and indexes to query the database. A view runs a query that is saved to the database, and the result is called the result set. When you submit a query to the view, your query searches the result set. An index is a way to structure data that improves retrieval time.

This tutorial uses IBM Cloudant Query, which uses Mongo-style query syntax to search for documents by using logical operators. IBM Cloudant Query is a combination of a view and a search index.

When you use IBM Cloudant Query, the query planner looks at the selector (your query) to determine which index to use. If it does not find a suitable index, it uses the `_all_docs` special index, which looks up documents by ID. In the worst case, it returns all the documents by ID (a full table scan), and the documents are then filtered in memory against the selector. This is why you can still query on various fields even without an index. Full table scans are expensive, so we recommend that you create an index. The following list describes the different types of indexes:

* Primary index – look up a document or list of documents by ID.
* View - search for information in the database that matches the search criteria that you specify, such as counts, sums, averages, and other mathematical functions. The criteria you can search is specified in the view's definition. Views use the MapReduce paradigm.
* Search index – search one or more fields, large amounts of text, or use wildcards, fuzzy search, or facets with [Lucene Query Parser Syntax](http://lucene.apache.org/core/4_3_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Overview).

**Tip:** If there is no available defined index that matches the specified query, then IBM Cloudant uses the `_all_docs` index.
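
To make the fallback concrete, the sketch below imitates a full scan in plain Python: every document is visited and tested against the selector in memory. It is a simplified illustration, not IBM Cloudant's implementation. It handles only implicit equality and the `$eq`, `$gt`, `$gte`, `$lt`, and `$lte` operators, and the names `matches` and `full_scan` are hypothetical.

```python
import operator

# Comparison operators supported by this simplified matcher.
OPERATORS = {
    "$eq": operator.eq,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}


def matches(doc, selector):
    """Return True if `doc` satisfies every condition in `selector`."""
    for field, condition in selector.items():
        if field not in doc:
            return False
        value = doc[field]
        if isinstance(condition, dict):
            # Operator form, for example {"$gt": 30}.
            for op, operand in condition.items():
                if not OPERATORS[op](value, operand):
                    return False
        elif value != condition:
            # Shorthand form, for example {"lastname": "Greene"}.
            return False
    return True


def full_scan(docs, selector):
    """Visit every document and keep the ones that match the selector."""
    return [doc for doc in docs if matches(doc, selector)]


docs = [
    {"_id": "doc1", "lastname": "Brown", "age": 16},
    {"_id": "doc2", "lastname": "Brown", "age": 21},
    {"_id": "doc3", "lastname": "Greene", "age": 35},
    {"_id": "doc4", "lastname": "Greene", "age": 44},
    {"_id": "doc5", "lastname": "Brown", "age": 33},
]

hits = full_scan(docs, {"lastname": "Greene", "age": {"$gt": 30}})
print([doc["_id"] for doc in hits])  # ['doc3', 'doc4']
```

The cost of this approach grows with the total number of documents, because each one is inspected regardless of how few match. An index lets the database skip most of that work.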

1. Click **+ > Query Indexes** on either the **All Documents** or **Design Documents** tab.
2. Paste the following sample JSON data into the **Index** field:

In [None]:
{
  "index": {
    "fields": [
      "age",
      "lastname"
    ],
    "partial_filter_selector": {
      "age": {
        "$gte": 30
      },
      "lastname": {
        "$eq": "Greene"
      }
    }
  },
  "ddoc": "partial-index",
  "type": "json"
}

The index was created. You can see the index in the following screen capture:

<img src='https://github.com/tulip-lab/mds/raw/master/Jupyter/image/cloudant/query-index-result.png' width = '460' height = '460' align = center />
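
The same index definition can be submitted over HTTP by posting it to the database's `_index` endpoint. The sketch below only builds the request with the standard library and does not send it. `BASE_URL` is a hypothetical placeholder, the helper name `build_index_request` is introduced here for illustration, and authentication is deliberately left out.

```python
import json
import urllib.request

# Hypothetical placeholder; substitute the URL of your own Cloudant instance.
BASE_URL = "https://ACCOUNT.cloudantnosqldb.appdomain.cloud"

# The index definition from this section as a Python dictionary.
INDEX_DEFINITION = {
    "index": {
        "fields": ["age", "lastname"],
        "partial_filter_selector": {
            "age": {"$gte": 30},
            "lastname": {"$eq": "Greene"},
        },
    },
    "ddoc": "partial-index",
    "type": "json",
}


def build_index_request(base_url, database, definition):
    """Return an unsent POST request for the database's _index endpoint."""
    return urllib.request.Request(
        url=f"{base_url}/{database}/_index",
        data=json.dumps(definition).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_index_request(BASE_URL, "query-demo", INDEX_DEFINITION)
print(request.get_method(), request.full_url)

# Sending the request needs valid credentials, so it is left commented out:
# urllib.request.urlopen(request)
```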

## Creating a query

Queries allow you to extract your data from IBM Cloudant. A well-written query can narrow your search and its results to include only the data you want.

This exercise shows you how to write and run a simple query, query with two fields, and query with two operators. You query with an operator by specifying at least one field and its corresponding value. The query then uses this value to search the database for matches.

For anything but the simplest query, add the JSON to a data file and run it from the command line.
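
If you prefer to stay in the notebook, a query file can be handled from Python as well. The following sketch saves a query to a JSON file, loads it back, and hands it to a database object. It assumes `db` behaves like a python-cloudant database whose `get_query_result(selector, **kwargs)` accepts options such as `fields` and `sort`. The helper names and the file name `query.json` are hypothetical.

```python
import json
import os
import tempfile

# The simple query used in the next exercise.
QUERY = {
    "selector": {"lastname": "Greene", "firstname": "Anna"},
}


def save_query(query, path):
    """Write the query to `path` as formatted JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(query, handle, indent=2)


def load_query(path):
    """Read a query file and check that it contains a selector."""
    with open(path, encoding="utf-8") as handle:
        query = json.load(handle)
    if "selector" not in query:
        raise ValueError("a Cloudant Query document needs a 'selector'")
    return query


def run_query(db, query):
    """Run `query` against `db`, assumed to be a python-cloudant database."""
    options = {key: value for key, value in query.items() if key != "selector"}
    return list(db.get_query_result(query["selector"], **options))


path = os.path.join(tempfile.mkdtemp(), "query.json")
save_query(QUERY, path)
loaded = load_query(path)
print(loaded == QUERY)  # True

# With a connected client (needs network access and valid credentials):
# results = run_query(client["query-demo"], loaded)
```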

### Running a simple query

1. Click the **Query** tab.
2. Copy and paste the following sample JSON into the IBM Cloudant Query window:

In [None]:
{
  "selector": {
    "lastname": "Greene",
    "firstname": "Anna"
  }
}

3. Click **Run Query**.

The query results display. You can see them from the Table view in the following screen capture:

<img src='https://github.com/tulip-lab/mds/raw/master/Jupyter/image/cloudant/query-index-result1.png' width = '1000' height = '1000' align = center />

### Running a query with two fields

This example uses two fields to find everyone with the last name `Brown` who lives in `New York City, NY`.

We describe the search by using a `selector` expression that looks like the following example:

In [None]:
{
  "selector": {
    "lastname": "Brown",
    "location": "New York City, NY"
  }
}

We can tailor the results to meet our needs by adding more details to the query alongside the selector expression. The `fields` parameter specifies the fields to include with the results. In our example, the results include the first name, last name, and location. The extra details look like the following example:

In [None]:
{
  "fields" : [
    "firstname",
    "lastname",
    "location"
  ]
}

1. Click the **Query** tab.
2. Copy and paste the following sample JSON into the IBM Cloudant Query window:

In [None]:
{
  "selector": {
    "lastname": "Brown",
    "location": "New York City, NY"
  },
  "fields": [
    "firstname",
    "lastname",
    "location"
  ]
}

3. Click **Run Query**.

The query results display. You can see them from the Table view in the following screen capture:

<img src='https://github.com/tulip-lab/mds/raw/master/Jupyter/image/cloudant/query-index-result2.png' width = '1000' height = '1000' align = center />
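
The effect of the `fields` parameter can be reproduced locally on the sample data. This is an illustrative sketch in plain Python rather than a call to IBM Cloudant, and the helper names `select` and `project` are hypothetical.

```python
docs = [
    {"_id": "doc1", "firstname": "Sally", "lastname": "Brown", "age": 16,
     "location": "New York City, NY"},
    {"_id": "doc2", "firstname": "John", "lastname": "Brown", "age": 21,
     "location": "New York City, NY"},
    {"_id": "doc3", "firstname": "Greg", "lastname": "Greene", "age": 35,
     "location": "San Diego, CA"},
    {"_id": "doc4", "firstname": "Anna", "lastname": "Greene", "age": 44,
     "location": "Baton Rouge, LA"},
    {"_id": "doc5", "firstname": "Lois", "lastname": "Brown", "age": 33,
     "location": "New York City, NY"},
]

selector = {"lastname": "Brown", "location": "New York City, NY"}
fields = ["firstname", "lastname", "location"]


def select(docs, selector):
    """Keep documents whose fields equal every value in the selector."""
    return [
        doc for doc in docs
        if all(doc.get(name) == value for name, value in selector.items())
    ]


def project(doc, fields):
    """Return a copy of `doc` that contains only the requested fields."""
    return {name: doc[name] for name in fields if name in doc}


results = [project(doc, fields) for doc in select(docs, selector)]
for row in results:
    print(row)
```

Three documents match, and each result row carries only the first name, last name, and location. The `_id` and `age` values are left out because they are not listed in `fields`.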

### Running a query with operators

In this example, the `$eq` (equal) and `$gt` (greater than) operators are used to search for documents that contain the last name `Greene` and an age that is greater than `30`.

We use a selector expression like the following example:

In [None]:
{
  "selector": {
    "age": {
      "$gt": 30
    },
    "lastname": {
      "$eq": "Greene"
    }
  }
}

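In the full query, the `fields` parameter limits the results to the age and first name, and the results are sorted by age in ascending order based on the values specified in the `sort` parameter. The `use_index` parameter directs the query to the `partial-index` design document that was created earlier.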

1. Click the **Query** tab.
2. Copy and paste the following sample JSON into the IBM Cloudant Query window:

In [None]:
{
  "selector": {
    "age": {
      "$gt": 30
    },
    "lastname": {
      "$eq": "Greene"
    }
  },
  "fields": [
    "age",
    "firstname"
  ],
  "sort": [
    {
      "age": "asc"
    }
  ],
  "use_index": "_design/partial-index"
}


3. Click **Run Query**.

The query results display. You can see them from the Table view in the following screen capture:

<img src='https://github.com/tulip-lab/mds/raw/master/Jupyter/image/cloudant/query-index-result3.png' width = '800' height = '800' align = center />
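
As a rough local illustration of what the `sort` and `fields` parameters do to the matching documents, the sketch below orders two already-selected documents and then trims them to the requested fields. It runs in plain Python and does not reflect how IBM Cloudant evaluates the query internally. The helper name `apply_sort` is hypothetical.

```python
# Documents that satisfy the selector (last name Greene, age over 30),
# listed out of order on purpose.
matched = [
    {"_id": "doc4", "firstname": "Anna", "lastname": "Greene", "age": 44},
    {"_id": "doc3", "firstname": "Greg", "lastname": "Greene", "age": 35},
]

sort_spec = [{"age": "asc"}]
fields = ["age", "firstname"]


def apply_sort(docs, sort_spec):
    """Order `docs` according to a list of {field: "asc" | "desc"} entries."""
    ordered = list(docs)
    # Python's sort is stable, so sorting by the last key first lets the
    # earlier keys take precedence.
    for entry in reversed(sort_spec):
        (field, direction), = entry.items()
        ordered.sort(key=lambda doc: doc[field], reverse=(direction == "desc"))
    return ordered


result = [
    {name: doc[name] for name in fields}
    for doc in apply_sort(matched, sort_spec)
]
print(result)
# [{'age': 35, 'firstname': 'Greg'}, {'age': 44, 'firstname': 'Anna'}]
```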

For more information about IBM Cloudant, see the [IBM Cloudant Documentation](https://console.bluemix.net/docs/services/Cloudant/cloudant.html#overview).