<h1>**Pandas**</h1>

<h3>**What is Pandas?**</h3>
Pandas is a Python library for data analysis. It provides data structures and operations for manipulating numerical tables and time series. The name comes from "panel data", an econometrics term for data sets that include observations over multiple time periods for the same individuals.

<h3>**Benefits**</h3>

* Efficient storage and processing of data.
* Includes many built-in functions for data transformation, aggregation, and plotting.
* Great for exploratory work.
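
As a rough illustration of the built-in transformation and aggregation functions mentioned above, here is a minimal sketch. The `sales` table, its column names, and its values are made up for this example and do not come from any real data set.

```python
import pandas as pd

# Hypothetical data, invented for illustration only.
sales = pd.DataFrame({
    "region": ["north", "south", "north", "south"],
    "units": [10, 4, 6, 8],
    "price": [2.5, 3.0, 2.5, 3.0],
})

# Transformation: derive a new column from two existing columns.
sales["revenue"] = sales["units"] * sales["price"]

# Aggregation: total revenue per region.
revenue_by_region = sales.groupby("region")["revenue"].sum()
print(revenue_by_region)
```

Both steps operate on whole columns at once, which is part of what makes Pandas convenient for exploratory work.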


<h3>**Pandas is built on NumPy**</h3>

NumPy is one of the fundamental packages for scientific computing in Python. Its core data structure is the array.

* NumPy arrays are like Python lists, but they allow faster computation:
    * An array is stored as one contiguous block of memory, rather than being spread out across multiple locations like a list.
    * Each item in a NumPy array has the same data type (e.g. all integers or all floats), whereas a list can mix any number of data types. We call this idea homogeneity, as opposed to the possible heterogeneity of Python lists.
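
The points above can be sketched in a few lines. This is only an illustration with made-up values; it shows element-wise arithmetic without a Python loop, the single shared data type, and the contiguous memory layout.

```python
import numpy as np

values = [1, 2, 3, 4]      # plain Python list
arr = np.array(values)     # NumPy array with one integer dtype

# Element-wise arithmetic runs over the whole array without a Python loop.
doubled_arr = arr * 2
doubled_list = [v * 2 for v in values]   # the list needs an explicit loop

# Mixing integers and a float upcasts every element to float64.
mixed = np.array([1, 2.5, 3])

print(doubled_arr.tolist())
print(arr.dtype.kind)               # 'i' means integer; the exact width depends on the platform
print(mixed.dtype)
print(arr.flags["C_CONTIGUOUS"])    # whether the array occupies one contiguous block
```

Note that `values * 2` on the plain list would repeat the list rather than double each element, which is one reason arrays are more natural for numerical work.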

In [8]:
import pandas as pd
import numpy as np
from numpy.random import randn

In [11]:
# Draw a 3x4 array of samples from the standard normal distribution
np.random.randn(3, 4)

array([[-1.66499615, -0.62802534,  0.67027933,  0.59446066],
       [ 0.84255299,  1.85357958, -1.1917617 , -0.6183688 ],
       [-0.56007774,  0.05985073, -0.28644748,  0.0301136 ]])
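
Because the samples are random, the numbers above will differ on every run. If reproducible output is wanted, one option is a seeded generator. The sketch below uses NumPy's `default_rng`; the seed value 42 is an arbitrary choice for illustration.

```python
import numpy as np

# Two generators created with the same (arbitrary) seed produce the same samples.
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)

sample_a = rng_a.standard_normal((3, 4))
sample_b = rng_b.standard_normal((3, 4))

print(sample_a.shape)
print(np.array_equal(sample_a, sample_b))
```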

<h3>**Pandas Series**</h3>

In [17]:
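# Ten month-end dates, 2015-01-31 through 2015-10-31.
# 'M' is the month-end alias (spelled 'ME' in pandas 2.2 and later).
dt_index = pd.date_range('2015-1-1',
                         '2015-11-1',
                         freq='M')
# One random value per date, labelled by the date index
dt_series = pd.Series(randn(10),
                      index=dt_index)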
dt_series

2015-01-31   -1.813841
2015-02-28   -0.519178
2015-03-31   -1.069356
2015-04-30    1.224095
2015-05-31    0.745135
2015-06-30   -0.195817
2015-07-31    0.415823
2015-08-31   -1.567158
2015-09-30    0.080546
2015-10-31   -0.641566
Freq: M, dtype: float64

In [18]:
dt_series.mean()

-0.3341317602667212
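
A Series pairs each value with an index label, so values can be looked up by date as well as aggregated. The following is a minimal sketch using a small Series with fixed, made-up values (rather than random ones) so the results are predictable.

```python
import pandas as pd

# Hypothetical values on four month-end dates.
dates = pd.to_datetime(["2015-01-31", "2015-02-28", "2015-03-31", "2015-04-30"])
s = pd.Series([1.0, 2.0, 3.0, 4.0], index=dates)

# Look up a single value by its label.
feb = s.loc[pd.Timestamp("2015-02-28")]

# Slice by partial date strings; both endpoints are included.
feb_to_mar = s.loc["2015-02":"2015-03"]

# Filter with a boolean condition on the values.
large = s[s > 2.5]

print(feb)
print(feb_to_mar)
print(large)
print(s.mean())
```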

<h3>**Pandas DataFrames**</h3>
A DataFrame is a two-dimensional table whose columns are Pandas Series that share the same index.

In [19]:
df = pd.DataFrame(randn(10, 5), index=dt_index, columns=list('abcde'))
df

                    a          b          c          d          e
2015-01-31   0.094848   1.015013   1.021551   0.245632   0.938821
2015-02-28   2.591438  -0.185937  -1.327210   1.273205   1.150543
2015-03-31  -0.704332   0.921512  -0.528921  -0.395930   0.311761
2015-04-30   1.422352  -0.398548  -0.499881  -0.426022  -0.496697
2015-05-31   0.544095  -1.160600   0.731082   0.251877  -0.380728
2015-06-30  -0.059261   0.956640  -1.776007  -0.889266   0.424770
2015-07-31  -1.498531   1.443224   1.077946   0.503206   1.734434
2015-08-31  -0.086334  -0.613005  -0.233307   0.133118  -0.701191
2015-09-30   0.688128  -1.129493  -1.323366  -0.427901   0.362137
2015-10-31   0.318171   1.334676   1.118304  -1.861925   1.272098
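
To make the relationship between Series and DataFrames concrete, here is a minimal sketch that builds a DataFrame from two Series with a common index. The names `a`, `b`, and `frame` and all values are illustrative only.

```python
import pandas as pd

dates = pd.to_datetime(["2015-01-31", "2015-02-28", "2015-03-31"])
a = pd.Series([1.0, 2.0, 3.0], index=dates)
b = pd.Series([10.0, 20.0, 30.0], index=dates)

# Each dictionary key becomes a column; the shared dates become the row index.
frame = pd.DataFrame({"a": a, "b": b})

# Selecting one column gives back a Series carrying the frame's index.
col = frame["a"]
print(type(col).__name__)
print(col.index.equals(frame.index))

# Selecting one row gives a Series indexed by the column names.
row = frame.loc[pd.Timestamp("2015-02-28")]
print(row)

# Aggregations such as mean() run column by column by default.
col_means = frame.mean()
print(col_means)
```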


<h4>References</h4>

* https://en.wikipedia.org/wiki/Pandas_(software)
* http://pandas.pydata.org/pandas-docs/stable/index.html
* http://pandas.pydata.org/pandas-docs/stable/comparison_with_sql.html
