# Obtaining data from an API
We will be using the Python `requests` module, a simplified HTTP library that lets us quickly pull data from an endpoint and parse the JSON response into Python objects.

In [0]:
import requests

Let's now try to pull from data.gov.sg's Weekly Infectious Disease Bulletin. 

The dataset page is at
https://data.gov.sg/dataset/weekly-infectious-disease-bulletin-cases

In [0]:
api_url = 'https://data.gov.sg/api/action/datastore_search?resource_id=ef7e44f1-9b14-4680-a60a-37d2c9dda390'
results = requests.get(api_url)
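The call above assumes the request succeeds. Below is a minimal sketch of a more defensive fetch, assuming a hypothetical helper named `fetch_json` (it is not part of `requests` or of the original notebook). It sets a timeout so a stalled connection does not hang the notebook, and it uses `raise_for_status()` to turn HTTP error codes into exceptions.

```python
import requests


def fetch_json(url, session=None, timeout=10):
    """Fetch `url` and return the decoded JSON body.

    `fetch_json` is a hypothetical helper for illustration. `session` may be
    a `requests.Session` (or any object with a compatible `get` method);
    when omitted, the module-level `requests.get` is used.
    """
    http = session if session is not None else requests
    response = http.get(url, timeout=timeout)
    # Raises requests.HTTPError for 4xx and 5xx responses
    response.raise_for_status()
    return response.json()
```

In the notebook this could be called as `data = fetch_json(api_url)`.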

Let us now look at the JSON data. Most API endpoints return their data as a JSON data structure. JSON stands for JavaScript Object Notation.

In [4]:
results.json()

{'help': 'https://data.gov.sg/api/3/action/help_show?name=datastore_search',
 'result': {'_links': {'next': '/api/action/datastore_search?offset=100&resource_id=ef7e44f1-9b14-4680-a60a-37d2c9dda390',
   'start': '/api/action/datastore_search?resource_id=ef7e44f1-9b14-4680-a60a-37d2c9dda390'},
  'fields': [{'id': '_id', 'type': 'int4'},
   {'id': 'epi_week', 'type': 'text'},
   {'id': 'disease', 'type': 'text'},
   {'id': 'no._of_cases', 'type': 'numeric'}],
  'records': [{'_id': 1,
    'disease': 'Acute Viral hepatitis B',
    'epi_week': '2012-W01',
    'no._of_cases': '0'},
   {'_id': 2,
    'disease': 'Acute Viral hepatitis C',
    'epi_week': '2012-W01',
    'no._of_cases': '0'},
   {'_id': 3,
    'disease': 'Avian Influenza',
    'epi_week': '2012-W01',
    'no._of_cases': '0'},
   {'_id': 4,
    'disease': 'Chikungunya Fever',
    'epi_week': '2012-W01',
    'no._of_cases': '0'},
   {'_id': 5,
    'disease': 'Cholera',
    'epi_week': '2012-W01',
    'no._of_cases': '0'},
   {'_i

Let's look at the number of records returned. 

In [8]:
data = results.json()
len(data['result']['records'])

100

And now look at the total records available. 

In [9]:
data['result']['total']

14052
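Comparing these two numbers shows how many calls would be needed to fetch everything. Here is a small sketch of that arithmetic, using a hand-written payload that mirrors the structure shown earlier. The sample dictionary is illustrative rather than real API output, and `pages_needed` is a hypothetical helper.

```python
import math


def pages_needed(total, page_size):
    """Number of requests needed to fetch `total` records, `page_size` at a time."""
    return math.ceil(total / page_size)


# Illustrative payload shaped like the API response (records shortened to ids)
sample = {'result': {'records': [{'_id': i} for i in range(1, 101)],
                     'total': 14052}}

page_size = len(sample['result']['records'])  # 100 records per call
total = sample['result']['total']             # 14052 records overall
print(pages_needed(total, page_size))         # 141
```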

# Pagination in the REST API
Only 100 records were returned, but there are 14052 records in total.

Most API endpoints limit the number of records returned in a single call to avoid overwhelming the system.

To get the next page of results, use the `next` link (with its `offset`) returned by the API call.

In [14]:
data['result']['_links']

{'next': '/api/action/datastore_search?offset=100&resource_id=ef7e44f1-9b14-4680-a60a-37d2c9dda390',
 'start': '/api/action/datastore_search?resource_id=ef7e44f1-9b14-4680-a60a-37d2c9dda390'}

In [0]:
results = requests.get('https://data.gov.sg' + '/api/action/datastore_search?offset=100&resource_id=ef7e44f1-9b14-4680-a60a-37d2c9dda390')

In [18]:
results.json()['result']['records']

[{'_id': 101,
  'disease': 'Diphtheria',
  'epi_week': '2012-W07',
  'no._of_cases': '0'},
 {'_id': 102,
  'disease': 'Encephalitis',
  'epi_week': '2012-W07',
  'no._of_cases': '0'},
 {'_id': 103,
  'disease': 'Haemophilus influenzae type b',
  'epi_week': '2012-W07',
  'no._of_cases': '0'},
 {'_id': 104,
  'disease': 'Legionellosis',
  'epi_week': '2012-W07',
  'no._of_cases': '0'},
 {'_id': 105,
  'disease': 'Melioidosis',
  'epi_week': '2012-W07',
  'no._of_cases': '0'},
 {'_id': 106,
  'disease': 'Meningococcal Infection',
  'epi_week': '2012-W07',
  'no._of_cases': '0'},
 {'_id': 107,
  'disease': 'Nipah virus infection',
  'epi_week': '2012-W07',
  'no._of_cases': '0'},
 {'_id': 108,
  'disease': 'Pertussis',
  'epi_week': '2012-W07',
  'no._of_cases': '0'},
 {'_id': 109,
  'disease': 'Plague',
  'epi_week': '2012-W07',
  'no._of_cases': '0'},
 {'_id': 110,
  'disease': 'Poliomyelitis',
  'epi_week': '2012-W07',
  'no._of_cases': '0'},
 {'_id': 111, 'disease': 'SARS', 'epi_week'
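Instead of copying the `next` link by hand, the query string can be built from its parts. The sketch below assumes the endpoint accepts the standard CKAN `datastore_search` parameters `resource_id`, `offset` and `limit`. The `offset` parameter appears in the links returned by the API, while `limit` is an assumption that has not been verified against data.gov.sg. The helper name `page_url` is hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = 'https://data.gov.sg/api/action/datastore_search'
RESOURCE_ID = 'ef7e44f1-9b14-4680-a60a-37d2c9dda390'


def page_url(offset=0, limit=100):
    """Build the URL of one page of results (hypothetical helper)."""
    query = urlencode({'resource_id': RESOURCE_ID,
                       'offset': offset,
                       'limit': limit})  # `limit` support is assumed
    return BASE_URL + '?' + query


print(page_url(offset=100))
```

With `requests`, passing the same dictionary as `params=` to `requests.get(BASE_URL, params=...)` should produce an equivalent query string.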

Now, let's write a single loop to paginate through the API to obtain all the data. 

In [28]:
data_gov_domain = 'https://data.gov.sg'
url = data_gov_domain + '/api/action/datastore_search?resource_id=ef7e44f1-9b14-4680-a60a-37d2c9dda390'
records = []

# Start with a non-zero value so that the loop runs at least once
record_len = 1

# An empty page means we have gone past the last record
while record_len > 0:
  print(url)
  response = requests.get(url)

  if response.status_code != 200:
    print('Oops! Something went wrong!')
    break

  data = response.json()
  record_len = len(data['result']['records'])

  records.extend(data['result']['records'])
  # The API returns the relative URL of the next page
  url = data_gov_domain + data['result']['_links']['next']

https://data.gov.sg/api/action/datastore_search?resource_id=ef7e44f1-9b14-4680-a60a-37d2c9dda390
https://data.gov.sg/api/action/datastore_search?offset=100&resource_id=ef7e44f1-9b14-4680-a60a-37d2c9dda390
https://data.gov.sg/api/action/datastore_search?offset=200&resource_id=ef7e44f1-9b14-4680-a60a-37d2c9dda390
https://data.gov.sg/api/action/datastore_search?offset=300&resource_id=ef7e44f1-9b14-4680-a60a-37d2c9dda390
https://data.gov.sg/api/action/datastore_search?offset=400&resource_id=ef7e44f1-9b14-4680-a60a-37d2c9dda390
https://data.gov.sg/api/action/datastore_search?offset=500&resource_id=ef7e44f1-9b14-4680-a60a-37d2c9dda390
https://data.gov.sg/api/action/datastore_search?offset=600&resource_id=ef7e44f1-9b14-4680-a60a-37d2c9dda390
https://data.gov.sg/api/action/datastore_search?offset=700&resource_id=ef7e44f1-9b14-4680-a60a-37d2c9dda390
https://data.gov.sg/api/action/datastore_search?offset=800&resource_id=ef7e44f1-9b14-4680-a60a-37d2c9dda390
https://data.gov.sg/api/action/datastor

In [29]:
len(records)

14052
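The loop above can also be packaged so that it is reusable and can be tried without network access. This is a sketch rather than the notebook's method: `iter_records` and `get_json` are hypothetical names, and `get_json` is any callable that takes a URL and returns the decoded JSON, for example `lambda u: requests.get(u, timeout=10).json()`. Like the loop above, it stops when a page comes back empty.

```python
def iter_records(first_url, get_json, domain='https://data.gov.sg'):
    """Yield every record, following the `next` link until a page is empty."""
    url = first_url
    while True:
        result = get_json(url)['result']
        page = result['records']
        if not page:
            return
        yield from page
        # `next` is a relative URL, so prefix it with the domain
        url = domain + result['_links']['next']


# Offline demonstration: two fake pages followed by an empty one
fake_pages = {
    'https://data.gov.sg/p0': {'result': {'records': [{'_id': 1}, {'_id': 2}],
                                          '_links': {'next': '/p1'}}},
    'https://data.gov.sg/p1': {'result': {'records': [{'_id': 3}],
                                          '_links': {'next': '/p2'}}},
    'https://data.gov.sg/p2': {'result': {'records': [],
                                          '_links': {'next': '/p3'}}},
}
demo_records = list(iter_records('https://data.gov.sg/p0', fake_pages.__getitem__))
print(len(demo_records))  # 3
```

Passing the fetch function in as an argument keeps the pagination logic separate from the HTTP call, which makes it easy to check with canned pages as shown.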

List the first 5 records.

In [38]:
records[:5]

[{'_id': 1,
  'disease': 'Acute Viral hepatitis B',
  'epi_week': '2012-W01',
  'no._of_cases': '0'},
 {'_id': 2,
  'disease': 'Acute Viral hepatitis C',
  'epi_week': '2012-W01',
  'no._of_cases': '0'},
 {'_id': 3,
  'disease': 'Avian Influenza',
  'epi_week': '2012-W01',
  'no._of_cases': '0'},
 {'_id': 4,
  'disease': 'Chikungunya Fever',
  'epi_week': '2012-W01',
  'no._of_cases': '0'},
 {'_id': 5, 'disease': 'Cholera', 'epi_week': '2012-W01', 'no._of_cases': '0'}]

# Let us now convert to a Pandas DataFrame

In [0]:
import pandas as pd

In [0]:
# records is a list of dicts, so the DataFrame constructor accepts it directly
df = pd.DataFrame(records)

In [35]:
df.head(10)

| | epi_week | _id | disease | no._of_cases |
|---|---|---|---|---|
| 0 | 2012-W01 | 1 | Acute Viral hepatitis B | 0 |
| 1 | 2012-W01 | 2 | Acute Viral hepatitis C | 0 |
| 2 | 2012-W01 | 3 | Avian Influenza | 0 |
| 3 | 2012-W01 | 4 | Chikungunya Fever | 0 |
| 4 | 2012-W01 | 5 | Cholera | 0 |
| 5 | 2012-W01 | 6 | Dengue Haemorrhagic Fever | 0 |
| 6 | 2012-W01 | 7 | Diphtheria | 0 |
| 7 | 2012-W01 | 8 | Encephalitis | 0 |
| 8 | 2012-W01 | 9 | Haemophilus influenzae type b | 0 |
| 9 | 2012-W01 | 10 | Legionellosis | 0 |


Let's change the column names to something more presentable.

In [0]:
df.columns = ['EPI Week', 'ID', 'Disease', 'No of Cases']

In [45]:
df.head(10)

| | EPI Week | ID | Disease | No of Cases |
|---|---|---|---|---|
| 0 | 2012-W01 | 1 | Acute Viral hepatitis B | 0 |
| 1 | 2012-W01 | 2 | Acute Viral hepatitis C | 0 |
| 2 | 2012-W01 | 3 | Avian Influenza | 0 |
| 3 | 2012-W01 | 4 | Chikungunya Fever | 0 |
| 4 | 2012-W01 | 5 | Cholera | 0 |
| 5 | 2012-W01 | 6 | Dengue Haemorrhagic Fever | 0 |
| 6 | 2012-W01 | 7 | Diphtheria | 0 |
| 7 | 2012-W01 | 8 | Encephalitis | 0 |
| 8 | 2012-W01 | 9 | Haemophilus influenzae type b | 0 |
| 9 | 2012-W01 | 10 | Legionellosis | 0 |


In [49]:
df.dtypes

EPI Week       object
ID              int64
Disease        object
No of Cases    object
dtype: object

In [0]:
df['No of Cases'] = pd.to_numeric(df['No of Cases'])

In [52]:
df.dtypes

EPI Week       object
ID              int64
Disease        object
No of Cases     int64
dtype: object
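By default, `pd.to_numeric` raises an error if a value cannot be parsed. If a column might contain placeholders, `errors='coerce'` turns them into `NaN` instead. The values below are made up for illustration; the notebook's own data converted cleanly to `int64`.

```python
import pandas as pd

raw = pd.Series(['0', '12', 'n/a', '7'])  # 'n/a' is a hypothetical bad value

# Unparseable entries become NaN, which forces a floating-point dtype
cases = pd.to_numeric(raw, errors='coerce')

print(cases.isna().sum())  # 1
print(cases.dtype)         # float64
```

Counting the `NaN` values afterwards is a quick way to see how many entries failed to convert.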

In [60]:
df[df['No of Cases'].between(5,10)]

| | EPI Week | ID | Disease | No of Cases |
|---|---|---|---|---|
| 11793 | 2016-W20 | 12018 | Measles | 6 |
| 11824 | 2012-W03 | 11814 | Pertussis | 5 |
| 11825 | 2012-W03 | 11815 | Typhoid | 5 |
| 11826 | 2012-W05 | 11816 | Pneumococcal Disease (invasive) | 5 |
| 11827 | 2012-W05 | 11817 | Typhoid | 5 |
| ... | ... | ... | ... | ... |
| 12486 | 2019-W31 | 12477 | Measles | 10 |
| 12487 | 2019-W45 | 12478 | Mumps | 10 |
| 12488 | 2019-W48 | 12479 | Mumps | 10 |
| 12489 | 2019-W48 | 12480 | Campylobacter enteritis | 10 |
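`Series.between(5, 10)` includes both endpoints by default, so the filter above should be equivalent to combining two comparisons. A small illustration on made-up numbers (the `demo` frame and its disease labels are hypothetical):

```python
import pandas as pd

demo = pd.DataFrame({'Disease': ['A', 'B', 'C', 'D'],
                     'No of Cases': [4, 5, 10, 11]})

# Both endpoints (5 and 10) are kept by between()
with_between = demo[demo['No of Cases'].between(5, 10)]

# The same filter written as an explicit boolean mask
with_mask = demo[(demo['No of Cases'] >= 5) & (demo['No of Cases'] <= 10)]

print(with_between['Disease'].tolist())  # ['B', 'C']
print(with_between.equals(with_mask))    # True
```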
