# DATA ACQUISITION
## Import the necessary libraries

Pandas is a fast, powerful, flexible and easy-to-use open-source data analysis and manipulation tool, built on top of the Python programming language.
https://pandas.pydata.org/

pyodbc is an open-source Python module that makes accessing ODBC databases simple.
Open Database Connectivity (ODBC) is a standard application programming interface (API) for accessing database management systems (DBMS).
https://pypi.org/project/pyodbc/
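
A minimal sketch of how the result of a SQL query can be loaded into a DataFrame, assuming `pd.read_sql` with an open DB-API connection. So that it runs without an ODBC driver or data source, it uses Python's built-in `sqlite3` module as a stand-in for an ODBC connection; the `accidents` table and the `query_to_dataframe` helper are hypothetical.

```python
import sqlite3

import pandas as pd


def query_to_dataframe(connection, sql):
    """Hypothetical helper: run `sql` on an open DB-API connection, return a DataFrame."""
    return pd.read_sql(sql, connection)


# Stand-in for an ODBC source: an in-memory SQLite database
# holding a hypothetical "accidents" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accidents (miles_from_home TEXT, pct INTEGER)")
conn.executemany(
    "INSERT INTO accidents VALUES (?, ?)",
    [("less than 1", 23), ("1 to 5", 29), ("6 to 10", 17)],
)

result = query_to_dataframe(conn, "SELECT * FROM accidents")
conn.close()  # release the connection once the data is in memory
print(result)
```

With a real ODBC source, the connection would come from `pyodbc.connect(connection_string)` instead, where `connection_string` is a driver-specific string you would have to supply. Recent pandas versions emit a UserWarning when given a DB-API connection other than sqlite3 (such as pyodbc), but they still run the query.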

To start working with pandas, import the package. The community-agreed alias for pandas is pd, so all of the pandas documentation assumes the package has been loaded with import pandas as pd.

A table of data is stored as a pandas DataFrame.

Each column in a DataFrame is a Series.

You work with the data by applying methods to a DataFrame or to a Series.
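
A short sketch of these three ideas, using the values of the accidents table loaded later in this notebook, typed in by hand so the example runs without the CSV file. Selecting one column returns a Series, and methods such as `sum` and `sort_values` can be called on either object.

```python
import pandas as pd

# The accidents table from this notebook, rebuilt by hand.
accidents = pd.DataFrame(
    {
        "Miles from Home": ["less than 1", "1 to 5", "6 to 10",
                            "11 to 15", "16 to 20", "over 20"],
        "% of Accidents": [23, 29, 17, 8, 6, 17],
    }
)

# Selecting a single column returns a Series.
pct = accidents["% of Accidents"]
print(type(pct))  # <class 'pandas.core.series.Series'>

# A method applied to a Series...
print(pct.sum())  # 100

# ...and a method applied to the whole DataFrame.
print(accidents.sort_values("% of Accidents", ascending=False).head(2))
```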

In [1]:
import pandas as pd
import pyodbc

Getting data into pandas from many different file formats or data sources is supported by the read_* functions.

Exporting data out of pandas is provided by the various to_* methods.
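
A small round trip showing the read_* / to_* pairing: `to_csv` exports a DataFrame and `read_csv` imports it again. The sketch writes to an in-memory `io.StringIO` buffer rather than a file so that it leaves nothing on disk; with a real file you would pass a path instead.

```python
import io

import pandas as pd

original = pd.DataFrame({"Miles from Home": ["less than 1", "1 to 5"],
                         "% of Accidents": [23, 29]})

# Export: to_csv writes CSV text into the buffer.
# index=False leaves the row labels out of the output.
buffer = io.StringIO()
original.to_csv(buffer, index=False)

# Import: rewind the buffer and parse the same text back.
buffer.seek(0)
restored = pd.read_csv(buffer)

print(restored.equals(original))  # True
```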

The head/tail/info methods and the dtypes attribute are convenient for a first check.
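
A sketch of such a first check on the accidents data (rebuilt by hand here). `head` and `tail` return the first and last rows (5 by default), `info()` prints a summary of columns, non-null counts and dtypes, and `dtypes` is a Series holding each column's dtype.

```python
import pandas as pd

df = pd.DataFrame({"Miles from Home": ["less than 1", "1 to 5", "6 to 10",
                                       "11 to 15", "16 to 20", "over 20"],
                   "% of Accidents": [23, 29, 17, 8, 6, 17]})

print(df.head(3))  # first 3 rows
print(df.tail(2))  # last 2 rows
df.info()          # prints a summary; returns None
print(df.dtypes)   # dtype of each column
```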

In [2]:
accidents_data = pd.read_csv("/data/accidents.csv")
accidents_data

|   | Miles from Home | % of Accidents |
|---|---|---|
| 0 | less than 1 | 23 |
| 1 | 1 to 5 | 29 |
| 2 | 6 to 10 | 17 |
| 3 | 11 to 15 | 8 |
| 4 | 16 to 20 | 6 |
| 5 | over 20 | 17 |


In [3]:
print(accidents_data)

  Miles from Home  % of Accidents
0     less than 1              23
1          1 to 5              29
2         6 to 10              17
3        11 to 15               8
4        16 to 20               6
5         over 20              17


In [4]:
accidents_data.columns

Index(['Miles from Home', '% of Accidents'], dtype='object')

In [5]:
accidents_data.index

RangeIndex(start=0, stop=6, step=1)

In [6]:
accidents_data.shape

(6, 2)
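
The three outputs above are related: `shape` is (number of rows, number of columns), which is the length of the row index and the length of the column index. A quick sketch checking this on the same data, rebuilt by hand:

```python
import pandas as pd

df = pd.DataFrame({"Miles from Home": ["less than 1", "1 to 5", "6 to 10",
                                       "11 to 15", "16 to 20", "over 20"],
                   "% of Accidents": [23, 29, 17, 8, 6, 17]})

rows, cols = df.shape
print(rows == len(df.index))    # True: one row per index label
print(cols == len(df.columns))  # True: one column per column label
print(df.shape)                 # (6, 2)
print(df.index)                 # RangeIndex(start=0, stop=6, step=1)
```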

In [7]:
green_trip_data = pd.read_excel(r"D:\CSV\green_trip\green_tripdata_2015-09.xls")
green_trip_data

|   | VendorID | lpep_pickup_datetime | Lpep_dropoff_datetime | Store_and_fwd_flag | RateCodeID | Pickup_longitude | Pickup_latitude | Dropoff_longitude | Dropoff_latitude | Passenger_count | ... | Fare_amount | Extra | MTA_tax | Tip_amount | Tolls_amount | Ehail_fee | improvement_surcharge | Total_amount | Payment_type | Trip_type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 2015-09-01 00:02:34 | 2015-09-01 00:02:38 | N | 5 | -73.979485 | 40.684956 | -73.979431 | 40.685020 | 1 | ... | 7.8 | 0.0 | 0.0 | 1.95 | 0.0 |  | 0.0 | 9.75 | 1 | 2 |
| 1 | 2 | 2015-09-01 00:04:20 | 2015-09-01 00:04:24 | N | 5 | -74.010796 | 40.912216 | -74.010780 | 40.912212 | 1 | ... | 45.0 | 0.0 | 0.0 | 0.00 | 0.0 |  | 0.0 | 45.00 | 1 | 2 |
| 2 | 2 | 2015-09-01 00:01:50 | 2015-09-01 00:04:24 | N | 1 | -73.921410 | 40.766708 | -73.914413 | 40.764687 | 1 | ... | 4.0 | 0.5 | 0.5 | 0.50 | 0.0 |  | 0.3 | 5.80 | 1 | 1 |
| 3 | 2 | 2015-09-01 00:02:36 | 2015-09-01 00:06:42 | N | 1 | -73.921387 | 40.766678 | -73.931427 | 40.771584 | 1 | ... | 5.0 | 0.5 | 0.5 | 0.00 | 0.0 |  | 0.3 | 6.30 | 2 | 1 |
| 4 | 2 | 2015-09-01 00:00:14 | 2015-09-01 00:04:20 | N | 1 | -73.955482 | 40.714046 | -73.944412 | 40.714729 | 1 | ... | 5.0 | 0.5 | 0.5 | 0.00 | 0.0 |  | 0.3 | 6.30 | 2 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 65530 | 2 | 2015-09-02 16:51:59 | 2015-09-02 17:04:00 | N | 1 | -73.829605 | 40.759716 | -73.832214 | 40.751514 | 1 | ... | 9.0 | 1.0 | 0.5 | 0.00 | 0.0 |  | 0.3 | 10.80 | 2 | 1 |
| 65531 | 2 | 2015-09-02 16:53:51 | 2015-09-02 17:04:32 | N | 1 | -73.962112 | 40.805710 | -73.984970 | 40.769550 | 1 | ... | 10.5 | 1.0 | 0.5 | 2.46 | 0.0 |  | 0.3 | 14.76 | 1 | 1 |
| 65532 | 2 | 2015-09-02 16:57:21 | 2015-09-02 17:05:03 | N | 1 | -73.829941 | 40.713718 | -73.831917 | 40.702145 | 1 | ... | 7.0 | 1.0 | 0.5 | 2.20 | 0.0 |  | 0.3 | 11.00 | 1 | 1 |
| 65533 | 2 | 2015-09-02 16:51:42 | 2015-09-02 17:05:28 | N | 1 | -73.860748 | 40.832661 | -73.845169 | 40.845306 | 1 | ... | 10.5 | 1.0 | 0.5 | 2.46 | 0.0 |  | 0.3 | 14.76 | 1 | 1 |
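
Once trip data is loaded, the pickup and dropoff timestamps can be turned into a trip duration. A minimal sketch, using the first three rows shown above and only the two timestamp columns. Depending on how the workbook stores them, `read_excel` may already return these columns as datetimes; `pd.to_datetime` leaves such columns unchanged. The `duration_s` column name is hypothetical.

```python
import pandas as pd

# First three rows from the green taxi output above, timestamps only.
trips = pd.DataFrame({
    "lpep_pickup_datetime": ["2015-09-01 00:02:34", "2015-09-01 00:04:20",
                             "2015-09-01 00:01:50"],
    "Lpep_dropoff_datetime": ["2015-09-01 00:02:38", "2015-09-01 00:04:24",
                              "2015-09-01 00:04:24"],
})

# Convert text to datetime64 so the columns can be subtracted.
for col in ["lpep_pickup_datetime", "Lpep_dropoff_datetime"]:
    trips[col] = pd.to_datetime(trips[col])

# Dropoff minus pickup gives a timedelta Series; .dt.total_seconds()
# converts each timedelta to a float number of seconds.
trips["duration_s"] = (
    trips["Lpep_dropoff_datetime"] - trips["lpep_pickup_datetime"]
).dt.total_seconds()

print(trips["duration_s"].tolist())  # [4.0, 4.0, 154.0]
```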


In [8]:
green_trip_data.columns

Index(['VendorID', 'lpep_pickup_datetime', 'Lpep_dropoff_datetime',
       'Store_and_fwd_flag', 'RateCodeID', 'Pickup_longitude',
       'Pickup_latitude', 'Dropoff_longitude', 'Dropoff_latitude',
       'Passenger_count', 'Trip_distance', 'Fare_amount', 'Extra', 'MTA_tax',
       'Tip_amount', 'Tolls_amount', 'Ehail_fee', 'improvement_surcharge',
       'Total_amount', 'Payment_type', 'Trip_type '],
      dtype='object')
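
Note that the last column label is 'Trip_type ' with a trailing space, so `green_trip_data["Trip_type"]` would raise a KeyError. A sketch of one way to normalize the labels, shown on a small hypothetical frame with the same quirk: `Index.str.strip()` removes leading and trailing whitespace from every label.

```python
import pandas as pd

# Hypothetical frame reproducing the trailing-space column name.
df = pd.DataFrame({"Payment_type": [1, 2], "Trip_type ": [2, 1]})

print("Trip_type" in df.columns)  # False: the label is 'Trip_type '

# Strip whitespace from all column labels at once.
df.columns = df.columns.str.strip()

print("Trip_type" in df.columns)  # True
print(df["Trip_type"].tolist())   # [2, 1]
```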

In [10]:
pd.read_json("https://openlibrary.org/api/books?bibkeys=ISBN:9780345354907,ISBN:0881847690,LCCN:2005041555,ISBN:0060957905&format=json", orient="index")

|   | bib_key | preview | thumbnail_url | preview_url | info_url |
|---|---|---|---|---|---|
| ISBN:9780345354907 | ISBN:9780345354907 | borrow | https://covers.openlibrary.org/b/id/207586-S.jpg | https://archive.org/details/caseofcharlesdex00... | https://openlibrary.org/books/OL9831606M/The_C... |
| ISBN:0881847690 | ISBN:0881847690 | borrow | https://covers.openlibrary.org/b/id/9871313-S.jpg | https://archive.org/details/watchersoutoftim00... | https://openlibrary.org/books/OL22232644M/Watc... |
| ISBN:0060957905 | ISBN:0060957905 | noview | https://covers.openlibrary.org/b/id/676505-S.jpg | https://openlibrary.org/books/OL6784868M/Tales... | https://openlibrary.org/books/OL6784868M/Tales... |
| LCCN:2005041555 | LCCN:2005041555 | borrow | https://covers.openlibrary.org/b/id/8259841-S.jpg | https://archive.org/details/atmountainsofmad00... | https://openlibrary.org/books/OL3421202M/At_th... |
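
Here `orient="index"` tells pandas that the top-level JSON keys (the bibkeys) are row labels and each inner object is one row. A minimal offline sketch of the same idea, using a trimmed, hypothetical payload in the same shape as the Open Library response. The literal is wrapped in `io.StringIO` because pandas 2.1 and later deprecate passing raw JSON strings to `read_json`.

```python
import io

import pandas as pd

# Trimmed, hypothetical payload: bibkey -> record.
payload = """{
  "ISBN:0881847690": {"bib_key": "ISBN:0881847690", "preview": "borrow"},
  "LCCN:2005041555": {"bib_key": "LCCN:2005041555", "preview": "borrow"}
}"""

# Each top-level key becomes a row label; each inner key becomes a column.
books = pd.read_json(io.StringIO(payload), orient="index")
print(books)
```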
