## Using the Requests module to get a file

Documentation for Requests is available at https://requests.readthedocs.io/en/latest/

This demonstration simply requests a file from the HHS Open Data portal: https://healthdata.gov/dataset/electronic-health-record-ehr-incentive-program-payments-eligible-hospitals

In this example, we get the file from HHS, inspect some interesting information about it, and then write the data to a local file.

In [None]:
import requests

In [None]:
%%time
r = requests.get('https://healthdata.gov/api/views/g62h-syeh/rows.csv?accessType=DOWNLOAD')

CPU times: user 292 ms, sys: 39.9 ms, total: 332 ms
Wall time: 13.4 s


In [None]:
type(r)

requests.models.Response

In [None]:
r.status_code

200
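
The request above has no timeout, and the status code is only checked by eye. A minimal sketch of a more defensive fetch follows. The helper name `fetch_text` and the 30-second default are illustrative assumptions, not part of the original notebook.

```python
import requests


def fetch_text(url, timeout=30):
    """Return the body of `url` as text (hypothetical helper for illustration)."""
    # Without a timeout, requests can wait indefinitely on a stalled server.
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of
    # silently returning an error page.
    response.raise_for_status()
    return response.text
```

Calling `fetch_text(...)` with the URL used above would return the same string as `r.text`, or raise if the server answered with an error status.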

In [None]:
len(r.text)

23944257

In [None]:
print(r.text[0:1000])

state,date,critical_staffing_shortage_today_yes,critical_staffing_shortage_today_no,critical_staffing_shortage_today_not_reported,critical_staffing_shortage_anticipated_within_week_yes,critical_staffing_shortage_anticipated_within_week_no,critical_staffing_shortage_anticipated_within_week_not_reported,hospital_onset_covid,hospital_onset_covid_coverage,inpatient_beds,inpatient_beds_coverage,inpatient_beds_used,inpatient_beds_used_coverage,inpatient_beds_used_covid,inpatient_beds_used_covid_coverage,previous_day_admission_adult_covid_confirmed,previous_day_admission_adult_covid_confirmed_coverage,previous_day_admission_adult_covid_suspected,previous_day_admission_adult_covid_suspected_coverage,previous_day_admission_pediatric_covid_confirmed,previous_day_admission_pediatric_covid_confirmed_coverage,previous_day_admission_pediatric_covid_suspected,previous_day_admission_pediatric_covid_suspected_coverage,staffed_adult_icu_bed_occupancy,staffed_adult_icu_bed_occupancy_coverage,staffed_icu_

In [None]:
# Write the decoded response body to a local CSV file.
with open('nadac.csv', 'w') as f:
    f.write(r.text)
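
Holding the whole body in `r.text` works for a file of this size, but larger downloads may be better streamed to disk. The sketch below uses `stream=True` with `Response.iter_content`. The function name `download_to_file`, the 1 MiB chunk size, and the timeout are assumptions chosen for illustration.

```python
import requests


def download_to_file(url, path, chunk_size=1024 * 1024, timeout=30):
    """Stream `url` to `path` in chunks and return the number of bytes written."""
    written = 0
    # stream=True defers reading the body until iter_content is consumed.
    with requests.get(url, stream=True, timeout=timeout) as response:
        response.raise_for_status()
        # Binary mode: iter_content yields bytes, not str.
        with open(path, 'wb') as f:
            for chunk in response.iter_content(chunk_size=chunk_size):
                written += f.write(chunk)
    return written
```

Only one chunk is held in memory at a time, so memory use stays roughly constant regardless of file size.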

In [None]:
# Count the rows by splitting the body on newlines.
lines = 0
for row in r.text.split('\n'):
    lines += 1

In [None]:
lines

51258
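
One caveat with counting via `split('\n')`: if the text ends with a newline, the split produces a final empty string that is counted as a row. Whether the downloaded file ends that way is not shown above, so the sketch below uses a made-up three-line sample to show the difference between `split('\n')` and `splitlines()`.

```python
# Made-up sample text, not the real file.
text = "state,date\nCA,2022-10-10\nNY,2022-10-10\n"

# split('\n') keeps the empty string that follows the final newline.
split_count = len(text.split('\n'))

# splitlines() returns one element per actual line.
splitlines_count = len(text.splitlines())

print(split_count, splitlines_count)
```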

In [None]:
r.headers

{'Server': 'nginx', 'Date': 'Wed, 12 Oct 2022 01:09:49 GMT', 'Content-Type': 'text/csv; charset=utf-8', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Access-Control-Allow-Origin': '*', 'Content-disposition': 'attachment; filename=COVID-19_Reported_Patient_Impact_and_Hospital_Capacity_by_State_Timeseries.csv', 'Cache-Control': 'public, must-revalidate, max-age=21600', 'ETag': '"Y2hhcmxpZS4xMzEzMTVfOV8yMzUzb3JtbWIzZHhVb2hqUnBGRjVqYWhPbnFwYXFB---gzip8EhIa9u82HZdXde7sgGpWYKUHjc--gzip--gzip"', 'X-SODA2-Data-Out-Of-Date': 'false', 'X-SODA2-Truth-Last-Modified': 'Tue, 11 Oct 2022 15:58:10 GMT', 'X-SODA2-Secondary-Last-Modified': 'Tue, 11 Oct 2022 15:58:10 GMT', 'Last-Modified': 'Tue, 11 Oct 2022 15:58:10 GMT', 'Vary': 'Accept-Encoding', 'Content-Encoding': 'gzip', 'Age': '0', 'X-Socrata-Region': 'aws-us-east-1-fedramp-prod', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains', 'X-Socrata-RequestId': 'b3e0186ed9f4b9c47ff5629d1a77f4b1'}

In [None]:
import json
print(json.dumps(dict(r.headers), indent=4))
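
The headers can also be used programmatically, for example to recover the server's suggested filename from `Content-disposition`. The sketch below parses the header value copied from the output above with the standard library's `email.message.Message`. Using that class for HTTP headers is a convenience chosen here, not something Requests does for you.

```python
from email.message import Message

# Header value copied from the r.headers output shown above.
content_disposition = (
    'attachment; filename=COVID-19_Reported_Patient_Impact_and_'
    'Hospital_Capacity_by_State_Timeseries.csv'
)

# Message knows how to split "type; key=value" style headers.
msg = Message()
msg['Content-Disposition'] = content_disposition
filename = msg.get_filename()
print(filename)
```

With a live response, the same value is available as `r.headers['Content-disposition']`; header lookups on `r.headers` are case-insensitive.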

### Total payments in this file?

In [None]:
import csv

In [None]:
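total = 0.0

# newline='' lets the csv module handle line endings itself.
with open('nadac.csv', newline='') as f:
    reader = csv.reader(f)

    header = next(reader)
    print(header)
    # header.index raises ValueError if the file has no 'Total_payments' column.
    payments_idx = header.index('Total_payments')

    for record in reader:
        # float, not int, so amounts with cents parse; blank cells are skipped.
        if record[payments_idx]:
            total += float(record[payments_idx])
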
In [None]:
print("CA hospitals have received ${:,.2f} in payments.".format(total))
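
As an alternative to looking up the column index by hand, `csv.DictReader` lets each row be addressed by column name. The sketch below runs on made-up in-memory rows; only the `Total_payments` column name comes from the notebook. `Decimal` is used here as one way to avoid floating-point rounding when summing currency.

```python
import csv
import io
from decimal import Decimal

# Made-up rows for illustration; not the real data.
sample = io.StringIO(
    "Provider,Total_payments\n"
    "Hospital A,1500.50\n"
    "Hospital B,\n"
    "Hospital C,2499.50\n"
)

total = Decimal('0')
for row in csv.DictReader(sample):
    value = row['Total_payments'].strip()
    if value:  # skip blank cells rather than failing on an empty string
        total += Decimal(value)

print("${:,.2f}".format(total))
```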

## Reading internet files with Pandas

When you pass an HTTP or HTTPS URL to `pd.read_csv`, pandas fetches the file itself and parses the response, so no separate download step is needed.

https://pandas.pydata.org/pandas-docs/version/0.23.4/io.html


In [None]:
import pandas as pd

In [None]:
df = pd.read_csv('http://dhcs-chhsagency.opendata.arcgis.com/datasets/8e4f3a0c75b9424d888d11c1f949cc32_0.csv')

In [None]:
df.head()

In [None]:
df.shape

In [None]:
print("CA hospitals have received ${:,.2f} in payments.".format(df['Total_payments'].sum()))
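
If the CSV text has already been downloaded with Requests, pandas can parse it from memory instead of fetching it again. The sketch below wraps a made-up string in `io.StringIO`; in the notebook, `r.text` would take its place. The `usecols` argument is optional and shown only to illustrate loading a single column.

```python
import io

import pandas as pd

# Made-up CSV text standing in for r.text; only the Total_payments
# column name comes from the notebook.
csv_text = "Provider,Total_payments\nHospital A,1500.5\nHospital B,2499.5\n"

# read_csv accepts any file-like object, not just paths and URLs.
df_sample = pd.read_csv(io.StringIO(csv_text), usecols=['Total_payments'])
total_payments = df_sample['Total_payments'].sum()
print(df_sample.shape, total_payments)
```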