# Learning IPython
## Chapter 2. Interactive Data Analysis with pandas

This chapter covers the following topics:
- Exploring a dataset
- Manipulating data
- Complex operations

### Dataset
The dataset contains taxi trips made in New York City in 2013. The original dataset is 50 GB.

This example uses a 0.5% sample of all trips (850,000 rides):
https://raw.githubusercontent.com/ipython-books/minibook-2nd-data/master/nyc_taxi.zip


In [3]:
# I. Import library

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

In [5]:
# !wget https://raw.githubusercontent.com/ipython-books/minibook-2nd-data/master/nyc_taxi.zip

In [6]:
# II. Load data

data_filename = 'data/nyc_data.csv'
fare_filename = 'data/nyc_fare.csv'

data = pd.read_csv(data_filename, parse_dates=['pickup_datetime','dropoff_datetime'])
fare = pd.read_csv(fare_filename, parse_dates=['pickup_datetime'])

data.head(3)

Unnamed: 0,medallion,hack_license,vendor_id,rate_code,store_and_fwd_flag,pickup_datetime,dropoff_datetime,passenger_count,trip_time_in_secs,trip_distance,pickup_longitude,pickup_latitude,dropoff_longitude,dropoff_latitude
0,76942C3205E17D7E7FE5A9F709D16434,25BA06A87905667AA1FE5990E33F0E2E,VTS,1,,2013-01-01 00:00:00,2013-01-01 00:05:00,3,300,0.61,-73.955925,40.781887,-73.963181,40.777832
1,517C6B330DBB3F055D007B07512628B3,2C19FBEE1A6E05612EFE4C958C14BC7F,VTS,1,,2013-01-01 00:05:00,2013-01-01 00:21:00,1,960,3.28,-74.005501,40.745735,-73.964943,40.755722
2,ED15611F168E41B33619C83D900FE266,754AEBD7C80DA17BA1D81D89FB6F4D1D,CMT,1,N,2013-01-01 00:05:52,2013-01-01 00:12:18,1,386,1.5,-73.969955,40.79977,-73.954567,40.787392
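
To illustrate what `parse_dates` does in the loading step above, here is a minimal sketch that reads a few columns of the three sample rows from an in-memory string instead of the real file. The variable names `csv_text`, `sample`, and `duration` are made up for this illustration.

```python
import io
import pandas as pd

# Four columns of the three rows shown above, as CSV text
csv_text = """pickup_datetime,dropoff_datetime,trip_time_in_secs,trip_distance
2013-01-01 00:00:00,2013-01-01 00:05:00,300,0.61
2013-01-01 00:05:00,2013-01-01 00:21:00,960,3.28
2013-01-01 00:05:52,2013-01-01 00:12:18,386,1.5
"""

# parse_dates converts the listed columns to datetime64 instead of plain strings
sample = pd.read_csv(io.StringIO(csv_text),
                     parse_dates=['pickup_datetime', 'dropoff_datetime'])
print(sample.dtypes)

# With real datetimes, date arithmetic works directly
duration = (sample.dropoff_datetime - sample.pickup_datetime).dt.total_seconds()
print((duration == sample.trip_time_in_secs).all())
```

For these three rows, the computed duration matches the `trip_time_in_secs` column.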


In [None]:
data.describe()

In [None]:
p_lng = data.pickup_longitude
p_lat = data.pickup_latitude
d_lng = data.dropoff_longitude
d_lat = data.dropoff_latitude

In [None]:
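# Convert geographic coordinates to pixel coordinates (Mercator projection)
def lat_lng_to_pixels(lat, lng):
    # Latitude in radians
    lat_rad = lat * np.pi / 180.0
    # Mercator transform of the latitude
    lat_rad = np.log(np.tan((lat_rad + np.pi / 2.0) / 2.0))
    # Scale longitude and projected latitude by a factor of 100
    x = 100 * (lng + 180.0) / 360.0
    y = 100 * (lat_rad - np.pi) / (2.0 * np.pi)
    return (x, y)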
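
As a quick sanity check of the conversion, the sketch below applies the same formula to the pickup coordinates of the first two rides shown in the `data.head(3)` output. The function is repeated so the cell runs on its own; the names `lats`, `lngs`, `xs`, and `ys` are only for illustration.

```python
import numpy as np

def lat_lng_to_pixels(lat, lng):
    # Same formula as the function above, repeated so this cell is self-contained
    lat_rad = lat * np.pi / 180.0
    lat_rad = np.log(np.tan((lat_rad + np.pi / 2.0) / 2.0))
    x = 100 * (lng + 180.0) / 360.0
    y = 100 * (lat_rad - np.pi) / (2.0 * np.pi)
    return (x, y)

# Pickup location of the first ride in the sample output
x, y = lat_lng_to_pixels(40.781887, -73.955925)
print(x, y)

# The function uses only NumPy ufuncs, so arrays (or pandas Series) work as well
lats = np.array([40.781887, 40.745735])
lngs = np.array([-73.955925, -74.005501])
xs, ys = lat_lng_to_pixels(lats, lngs)
print(xs, ys)
```

The first point should land inside the `xlim` and `ylim` window used in the customized plot, which suggests those limits were chosen to frame Manhattan.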

In [None]:
px, py = lat_lng_to_pixels(p_lat, p_lng)

In [None]:
# III. Basic Plot
# scatter plot
plt.scatter(px,py)

In [None]:
# Customize plot
plt.figure(figsize=(8,6))
plt.scatter(px,py,s=.1,alpha=.03)
plt.axis('equal')
plt.xlim(29.40,29.55)
plt.ylim(-37.63,-37.54)
plt.axis('off')

In [None]:
# IV. Descriptive statistics with pandas and seaborn
import seaborn as sns
data.trip_distance.hist(bins=np.linspace(0.,10., 100))
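
To make the binning in the histogram call above concrete, here is a small sketch that counts values with the same bin edges using `np.histogram`. The `distances` Series is toy data: the first three values come from the sample rows, the rest are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy trip distances in miles (partly made up, not the real dataset)
distances = pd.Series([0.61, 3.28, 1.5, 0.4, 12.0, 2.2, 9.95])

# 100 evenly spaced edges between 0 and 10 define 99 bins
bins = np.linspace(0., 10., 100)
counts, edges = np.histogram(distances, bins=bins)

print(len(edges), len(counts))   # number of edges and number of bins
print(counts.sum())              # values outside [0, 10] are not counted
```

Note that `np.linspace(0., 10., 100)` yields 100 edges and therefore 99 bins, and that trips longer than 10 miles are left out of the histogram.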

In [None]:
# V. Manipulating data
