### Outline
This is the first stage of the project. We need to connect to an API to collect data. A cron job on the EC2 instance will call Google's QPX Express API every 30 minutes and store the JSON response data in AWS S3.

* 1) Get API credentials
* 2) Write a Python script to get data from API and dump into S3
* 3) Create a Bucket in S3
* 4) Create an EC2 instance
* 5) Create Kinesis Firehose
* 6) Set up a cron job

# Step 1: API Credentials

**API credentials are stored in a .yml file in .ssh:**

    qpx_express_cred.yml
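A minimal sketch of how the key from that file could be turned into the request URL. The `api_key` field name is an assumption (the layout of `qpx_express_cred.yml` is not shown here), and `config` stands for the mapping returned by a YAML parser.

```python
import os

# Search endpoint used throughout this notebook.
SEARCH_URL = "https://www.googleapis.com/qpxExpress/v1/trips/search"


def credentials_path(filename="qpx_express_cred.yml"):
    """Return the full path of the credentials file inside ~/.ssh."""
    return os.path.join(os.path.expanduser("~"), ".ssh", filename)


def build_search_url(config):
    """Build the search URL from a parsed credentials mapping.

    `config` is the dict produced by parsing the .yml file; the
    "api_key" field name is hypothetical.
    """
    api_key = config.get("api_key")
    if not api_key:
        raise KeyError("credentials file has no 'api_key' entry")
    return SEARCH_URL + "?key=" + api_key


# Placeholder key, only to show the shape of the resulting URL.
print(build_search_url({"api_key": "YOUR_API_KEY"}))
```

With PyYAML installed, `config` could come from `yaml.safe_load(open(credentials_path()))`, which keeps the key out of the script itself.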


### API INFO

**Credentials**:
<img src="images/qpx_express_cred.png">

There is a free quota of 50 queries per day.
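A quick check of how a fixed schedule compares with that quota. The `requests_per_run` parameter is an illustrative assumption for runs that send more than one query.

```python
MINUTES_PER_DAY = 24 * 60
FREE_QUOTA = 50  # free queries per day, as stated above


def calls_per_day(interval_minutes, requests_per_run=1):
    """Number of API queries per day for a job run every `interval_minutes`."""
    runs = MINUTES_PER_DAY // interval_minutes
    return runs * requests_per_run


one_query = calls_per_day(30)
print(one_query, one_query <= FREE_QUOTA)        # 48 True

many_queries = calls_per_day(30, requests_per_run=31)
print(many_queries, many_queries <= FREE_QUOTA)  # 1488 False
```

One query every 30 minutes stays just under the free quota; sending several queries per run does not.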


#### POST Request

standard: 

     GET/POST https://www.googleapis.com/apiName/apiVersion/resourcePath?parameters

QPX Express: 

     POST https://www.googleapis.com/qpxExpress/v1/trips/search


#### Resource Path

* Trips

         POST https://www.googleapis.com/qpxExpress/v1/trips
         

#### Request Parameters

* Overview

         POST https://www.googleapis.com/qpxExpress/v1/trips/overview


* Search 

         POST https://www.googleapis.com/qpxExpress/v1/trips/search
         
Returns a list of flights.         
         
**Search Requests**


* slice
    * "kind": "qpxexpress#sliceInput",
    * "origin": string,
    * "destination": string,
    * "date": string,
    * "maxStops": integer,
    * "maxConnectionDuration": integer,
    * "preferredCabin": string,
    * "permittedDepartureTime":
        * "kind": "qpxexpress#timeOfDayRange",
        * "earliestTime": string,
        * "latestTime": string
    * "alliance": string,
    * "prohibitedCarrier": string

* passengers
     * "kind": "qpxexpress#passengerCounts",
     * "adultCount": integer,
     * "childCount": integer,
     * "infantInLapCount": integer,
     * "infantInSeatCount": integer,
     * "seniorCount": integer
* maxPrice: string,
* saleCountry: string,
* ticketingCountry: string,
* refundable: boolean,
* solutions: integer

More info at: [API Response: JSON schema](https://developers.google.com/qpx-express/v1/trips/search#response)
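The fields listed above can be assembled into a request body with a small helper. This is a sketch: the function name and its defaults are illustrative, and only a subset of the optional fields is filled in.

```python
import json


def build_search_request(origin, destination, depart_date, return_date,
                         adults=1, solutions=10, refundable=False):
    """Return a round-trip request body for the trips/search endpoint."""
    return {
        "request": {
            # One slice per direction; each slice takes a single date string.
            "slice": [
                {"origin": origin, "destination": destination, "date": depart_date},
                {"origin": destination, "destination": origin, "date": return_date},
            ],
            "passengers": {"adultCount": adults},
            "solutions": solutions,
            "refundable": refundable,
        }
    }


body = build_search_request("SFO", "GRU", "2017-07-22", "2017-08-13")
print(json.dumps(body, indent=2))
```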

#### JSON Request Exported from [QPX Express API](https://qpx-express-demo.itasoftware.com)

The JSON request sent to the QPX Express API is stored in the file:

    qpx_json_request.json


#### Pretty Print JSON 

[Pretty Print](http://jsonprettyprint.com/)

Returns the JSON response in a human-readable format.

# Step 2: Python Script
The script below makes POST requests to the QPX Express API and stores the data in S3 using Kinesis Firehose.

     api_checks.py

### Python Script - 1st Test

In [1]:
#def connect_qpx_express(self, config_filepath='~/.ssh/qpx_express_cred.yml'):  # Stored in .ssh
#    ''' connect to QPX Express, and return a connection
#        INPUT: yaml config filepath
#        OUTPUT: QPX Express object
#    '''

In [2]:
import json
import requests
import yaml
import pprint
from boto.s3.connection import S3Connection
from boto.s3.key import Key

api_key = "AIzaSyBxO2z1DjQ99WERQCHGApEoi-ccxLpy4eg"

url = "https://www.googleapis.com/qpxExpress/v1/trips/search?key=" + api_key

headers = {'content-type': 'application/json'}

params = {
  "request": {
    "slice": [
      {
        "origin": "SFO",
        "destination": "GRU",
        "date": "2017-07-22"
      },
      {
        "origin": "GRU",
        "destination": "SFO",
        "date": "2017-08-13"
      } 
    ],
    "passengers": {
        "adultCount": 1,
        "infantInLapCount": 0,
        "infantInSeatCount": 0,
        "childCount": 0,
        "seniorCount": 0
    },
    "solutions": 10,
    "refundable": False
  }
}

# Make a POST request to the QPX Express API, passing the endpoint URL (with the API key), the JSON body, and the headers.
response = requests.post(url, data=json.dumps(params), headers=headers)

In [6]:
data = response.json()
data

{'kind': 'qpxExpress#tripsSearch',
 'trips': {'data': {'aircraft': [{'code': '319',
     'kind': 'qpxexpress#aircraftData',
     'name': 'Airbus A319'},
    {'code': '32B',
     'kind': 'qpxexpress#aircraftData',
     'name': 'Airbus A321 (Sharklets)'},
    {'code': '738', 'kind': 'qpxexpress#aircraftData', 'name': 'Boeing 737'},
    {'code': '773', 'kind': 'qpxexpress#aircraftData', 'name': 'Boeing 777'},
    {'code': '789', 'kind': 'qpxexpress#aircraftData', 'name': 'Boeing 787'}],
   'airport': [{'city': 'DFW',
     'code': 'DFW',
     'kind': 'qpxexpress#airportData',
     'name': 'Dallas/Fort Worth International'},
    {'city': 'SAO',
     'code': 'GRU',
     'kind': 'qpxexpress#airportData',
     'name': 'Sao Paulo Guarulhos International'},
    {'city': 'MIA',
     'code': 'MIA',
     'kind': 'qpxexpress#airportData',
     'name': 'Miami International'},
    {'city': 'PTY',
     'code': 'PTY',
     'kind': 'qpxexpress#airportData',
     'name': "Panama City Tocumen Int'l"},
    

In [150]:
# The HTTP response headers behave like a dictionary
response.headers

{'Date': 'Thu, 13 Apr 2017 08:48:31 GMT', 'ETag': '"1syu42EjUdK2xe3y0r4xzvGJtSA/8NxRmUmRignTGx3J8VIUdObEAmA"', 'Pragma': 'no-cache', 'X-Content-Type-Options': 'nosniff', 'Alt-Svc': 'quic=":443"; ma=2592000; v="37,36,35"', 'Expires': 'Mon, 01 Jan 1990 00:00:00 GMT', 'X-Frame-Options': 'SAMEORIGIN', 'Server': 'GSE', 'X-XSS-Protection': '1; mode=block', 'Transfer-Encoding': 'chunked', 'Content-Type': 'application/json; charset=UTF-8', 'Vary': 'Origin, X-Origin', 'Content-Encoding': 'gzip', 'Cache-Control': 'no-cache, no-store, max-age=0, must-revalidate'}

In [151]:
# Get the content type from the headers (key lookups are case-insensitive).
response.headers["content-type"]

'application/json; charset=UTF-8'

In [110]:
import pandas as pd
df = pd.DataFrame(data)
df

Unnamed: 0,kind,trips
data,qpxExpress#tripsSearch,"{'airport': [{'city': 'DFW', 'code': 'DFW', 'k..."
kind,qpxExpress#tripsSearch,qpxexpress#tripOptions
requestId,qpxExpress#tripsSearch,JA9ztPVeoTqnR6Xsi0QLk1
tripOption,qpxExpress#tripsSearch,"[{'slice': [{'duration': 985, 'kind': 'qpxexpr..."
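Passing the whole response to `pd.DataFrame` leaves the trip options nested inside a single cell. A sketch of pulling one row per trip option out of the JSON first, using only the `trips` → `tripOption` → `slice` → `duration` nesting visible in the output above. The sample values are stand-ins (985 appears in the output; 1010 is made up for illustration).

```python
def slice_durations(response_json):
    """Return one row per trip option with its per-slice durations."""
    options = response_json.get("trips", {}).get("tripOption", [])
    rows = []
    for index, option in enumerate(options):
        durations = [s["duration"] for s in option.get("slice", []) if "duration" in s]
        rows.append({
            "option": index,
            "slice_durations": durations,
            "total_duration": sum(durations),
        })
    return rows


# Tiny stand-in for the real response; only the nesting matters here.
sample = {
    "kind": "qpxExpress#tripsSearch",
    "trips": {
        "tripOption": [
            {"slice": [{"duration": 985}, {"duration": 1010}]},
        ]
    },
}
rows = slice_durations(sample)
print(rows)
```

The resulting list of dicts can be handed to `pd.DataFrame(rows)` to get one row per trip option.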


# Step 3: Create Bucket in S3 to load API data

   The bucket was created using the AWS console. It is called "qpxexpress".
   
   <img src="images/S3_bucket.png">

# Step 4: Create an EC2 instance

**SSH into EC2 instance**

    ssh -i ~/.ssh/MyKeyPair.pem ec2-user@ec2-54-209-30-187.compute-1.amazonaws.com



#### Transfer Python Script File from local computer to EC2
     scp -i ~/.ssh/file.pem api_checks.py ec2-user@ec2-54-209-30-187.compute-1.amazonaws.com:~/


# Step 5: Create Firehose Stream
    

Use boto3 to send the API responses to a Kinesis Firehose delivery stream, which then dumps the data into S3.
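A minimal sketch of the sending side, assuming a delivery stream that already has the S3 bucket as its destination. The `client` argument is expected to behave like `boto3.client("firehose")`, and the stream name is whatever was chosen when the stream was created (it is not given here).

```python
import json


def send_to_firehose(client, stream_name, payload):
    """Send one API response to a Firehose delivery stream.

    `client` is expected to behave like boto3.client("firehose").
    """
    # Firehose concatenates records as-is, so a trailing newline keeps
    # each JSON document on its own line in the delivered S3 object.
    data = (json.dumps(payload) + "\n").encode("utf-8")
    return client.put_record(
        DeliveryStreamName=stream_name,
        Record={"Data": data},
    )
```

Passing the client in, rather than creating it inside the function, makes the function easy to exercise with a stub before pointing it at AWS.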

# Step 6: Create a cron job

Set up a cron job to run every 30 minutes.

   * SSH into the EC2 instance
   * Edit the crontab and add the entry below
        * crontab -e 

In [None]:
*/30 * * * * python3 api_checks.py

### Get notified if the cron job fails
<img src="images/cronjob_email_notification.png">
<img src="images/cron_mail.png">


[Helpful Cron tips](https://www.liquidweb.com/kb/how-to-display-list-all-jobs-in-cron-crontab/)
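Cron mails whatever a job writes to stdout or stderr, so a script that stays silent on success and prints a traceback on failure only generates mail when something goes wrong. A sketch of that pattern, with `collect_data` as a hypothetical placeholder for the work done in `api_checks.py`:

```python
import sys
import traceback


def run_job(job):
    """Run `job` and return a process exit code.

    Nothing is written on success, so cron has nothing to mail. On
    failure the traceback goes to stderr, which cron includes in its
    notification e-mail.
    """
    try:
        job()
    except Exception:
        traceback.print_exc(file=sys.stderr)
        return 1
    return 0


def collect_data():
    """Placeholder for the API call and upload done by the real script."""
    raise RuntimeError("simulated API failure")


exit_code = run_job(collect_data)
print("exit code:", exit_code)
```

In the real script the last two lines would become `sys.exit(run_job(collect_data))`, so the exit status is non-zero on failure and nothing is printed on success.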

In [3]:
import json
import requests
import yaml
import pprint
from boto.s3.connection import S3Connection
from boto.s3.key import Key

api_key = "AIzaSyBxO2z1DjQ99WERQCHGApEoi-ccxLpy4eg"

url = "https://www.googleapis.com/qpxExpress/v1/trips/search?key=" + api_key

headers = {'content-type': 'application/json'}

date_of_travel = ["2017-07-01","2017-07-02","2017-07-03","2017-07-04","2017-07-05","2017-07-06",
"2017-07-07","2017-07-08","2017-07-09","2017-07-10","2017-07-11","2017-07-12","2017-07-13",
"2017-07-14","2017-07-15","2017-07-16","2017-07-17", "2017-07-18","2017-07-19",
"2017-07-20","2017-07-21","2017-07-22","2017-07-23","2017-07-24","2017-07-25","2017-07-26",
"2017-07-27", "2017-07-28","2017-07-29","2017-07-30","2017-07-31"]

params = {
  "request": {
    "slice": [
      {
        "origin": "SFO",
        "destination": "GRU",
        "date": date_of_travel
      },
      {
        "origin": "GRU",
        "destination": "SFO",
        "date": date_of_travel
      } 
    ],
    "passengers": {
        "adultCount": 1,
        "infantInLapCount": 0,
        "infantInSeatCount": 0,
        "childCount": 0,
        "seniorCount": 0
    },
    "solutions": 10,
    "refundable": False
  }
}

# Make a POST request to the QPX Express API, passing the endpoint URL (with the API key), the JSON body, and the headers.
response = requests.post(url, data=json.dumps(params), headers=headers)

In [4]:
data = response.json()
data

{'kind': 'qpxExpress#tripsSearch',
 'trips': {'data': {'aircraft': [{'code': '738',
     'kind': 'qpxexpress#aircraftData',
     'name': 'Boeing 737'},
    {'code': '753', 'kind': 'qpxexpress#aircraftData', 'name': 'Boeing 757'},
    {'code': '777', 'kind': 'qpxexpress#aircraftData', 'name': 'Boeing 777'}],
   'airport': [{'city': 'SAO',
     'code': 'GRU',
     'kind': 'qpxexpress#airportData',
     'name': 'Sao Paulo Guarulhos International'},
    {'city': 'MEX',
     'code': 'MEX',
     'kind': 'qpxexpress#airportData',
     'name': 'Mexico City Benito Juarez International'},
    {'city': 'CHI',
     'code': 'ORD',
     'kind': 'qpxexpress#airportData',
     'name': "Chicago O'Hare"},
    {'city': 'PTY',
     'code': 'PTY',
     'kind': 'qpxexpress#airportData',
     'name': "Panama City Tocumen Int'l"},
    {'city': 'SFO',
     'code': 'SFO',
     'kind': 'qpxexpress#airportData',
     'name': 'San Francisco International'}],
   'carrier': [{'code': 'AD',
     'kind': 'qpxexpress#c
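The request schema lists `date` as a single string per slice, so covering a range of travel dates likely needs one request per departure date rather than a list in the `date` field. A sketch under stated assumptions: the helper name is illustrative, and the fixed 22-day trip length is taken from the first test (2017-07-22 to 2017-08-13), not from any stated requirement.

```python
from datetime import date, timedelta


def build_requests(origin, destination, first_departure, days, trip_length_days):
    """Build one round-trip request body per departure date."""
    bodies = []
    for offset in range(days):
        depart = first_departure + timedelta(days=offset)
        back = depart + timedelta(days=trip_length_days)
        bodies.append({
            "request": {
                "slice": [
                    {"origin": origin, "destination": destination,
                     "date": depart.isoformat()},
                    {"origin": destination, "destination": origin,
                     "date": back.isoformat()},
                ],
                "passengers": {"adultCount": 1},
                "solutions": 10,
                "refundable": False,
            }
        })
    return bodies


bodies = build_requests("SFO", "GRU", date(2017, 7, 1), 31, 22)
first = bodies[0]["request"]["slice"][0]["date"]
last = bodies[-1]["request"]["slice"][0]["date"]
print(len(bodies), first, last)  # 31 2017-07-01 2017-07-31
```

Each body would be posted separately, and each post counts against the daily query quota.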

### Check the number of files and their total size in my S3 bucket

In [None]:
aws s3 ls --summarize --human-readable --recursive s3://qpxexpress/2017

<img src="images/bucket_size.png">
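The same summary can be computed from Python. A sketch, assuming `client` behaves like `boto3.client("s3")`; the function name is illustrative.

```python
def summarize_bucket(client, bucket, prefix=""):
    """Return (object_count, total_bytes) for keys under `prefix`.

    `client` is expected to behave like boto3.client("s3").
    """
    paginator = client.get_paginator("list_objects_v2")
    count = 0
    total_bytes = 0
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page with no matching objects has no "Contents" key.
        for obj in page.get("Contents", []):
            count += 1
            total_bytes += obj["Size"]
    return count, total_bytes
```

For the bucket above this would be called as `summarize_bucket(boto3.client("s3"), "qpxexpress", "2017")`.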