# Contents 

## Pandas 
- Basic definition
- Pandas data structures
    - Series
    - DataFrames

# Pandas 
https://pandas.pydata.org/pandas-docs/stable/getting_started/overview.html

### What is Pandas?
- It's a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive.
- The key words here:
    data structures designed to make working with “relational” or “labeled” data
    

### What Data Structures are in Pandas?

#### The two primary data structures of pandas are

##### Series (1-dimensional): 
    Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index.
    s = pd.Series(data, index=index)
    
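##### DataFrame (2-dimensional)
    DataFrame is a two-dimensional labeled data structure with columns of potentially different types.

- Together, Series and DataFrame handle the vast majority of typical use cases in finance, statistics, social science, etc.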

### Why two Data Structures? 

#### Data structures are containers of scalars (single values such as numbers)
- A scalar is a single value
- A Series is a container of scalars
- A DataFrame is a container of Series

We would like to be able to insert and remove objects from these containers in a dictionary-like fashion.
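
As a rough illustration of the dictionary-like behaviour described above, the sketch below adds and removes columns of a small DataFrame. The column names (`price`, `qty`, `total`) and the values are made up for this example.

```python
import pandas as pd

# Hypothetical data: each column of the DataFrame is a Series.
df = pd.DataFrame(
    {"price": [10.0, 12.5, 9.75], "qty": [3, 1, 4]},
    index=["x", "y", "z"],
)

# Insert a new column the way you would add a key to a dict.
df["total"] = df["price"] * df["qty"]

# Remove columns with del, or with pop (which also returns the column).
del df["qty"]
price = df.pop("price")

print(type(price).__name__)   # Series
print(list(df.columns))       # ['total']
```

Each column that comes out of the DataFrame is itself a Series, which matches the "container of Series" picture.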

### Table

| Dimensions | Name      | Description |
|------------|-----------|-------------|
| 1          | Series    | 1D labeled array holding values of a single type |
| 2          | DataFrame | 2D labeled tabular structure whose columns can have different types |

## Intro to Data Structures
https://pandas.pydata.org/pandas-docs/stable/getting_started/dsintro.html#dsintro

Deep dive into two data structures

In [7]:
import numpy as np
import pandas as pd

### SERIES 
- s = pd.Series(data, index=index)
- s = pd.Series(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)

#### data can be many different things:
- a Python dict
- an ndarray
- a scalar value (like 5)

#### Index is a list of axis labels. How it is handled depends on the type of data

- If data is an ndarray, index must be the same length as data. If no index is passed, one will be created having values [0, ..., len(data) - 1].
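
A small sketch of the rule above, using made-up values: with no index the labels default to 0, 1, 2, and an index of the wrong length is rejected with a `ValueError`. The names `values`, `default_idx` and `mismatch_rejected` exist only for this example.

```python
import numpy as np
import pandas as pd

values = np.array([10, 20, 30])

# No index passed: pandas creates the labels 0 .. len(data) - 1.
default_idx = pd.Series(values)
print(list(default_idx.index))   # [0, 1, 2]

# An index whose length differs from the data is rejected.
mismatch_rejected = False
try:
    pd.Series(values, index=["a", "b"])
except ValueError as err:
    mismatch_rejected = True
    print("ValueError:", err)
```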

### Series using ndarray

In [16]:
s = pd.Series(np.random.randn(5), index = ['a','b','c','d','e'])

In [22]:
s.index

Index(['a', 'b', 'c', 'd', 'e'], dtype='object')

In [53]:
s

a    1.630411
b    0.882977
c    0.081189
d    0.019680
e   -0.177060
dtype: float64

In [54]:
# If you need the actual array backing a Series, use Series.array.
# Accessing the array can be useful when you need to do some operation without the index
s.array

<PandasArray>
[  1.6304111923506772,   0.8829769253300764,  0.08118900996746493,
 0.019679873743148495, -0.17705995158086585]
Length: 5, dtype: float64
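
When a plain NumPy `ndarray` is wanted rather than the pandas array wrapper, `Series.to_numpy()` is the usual companion to `Series.array`. The sketch below uses a throwaway Series named `s_demo` (an illustrative name) so it does not overwrite `s`.

```python
import numpy as np
import pandas as pd

s_demo = pd.Series([1.5, 2.5, 3.5], index=["a", "b", "c"])

# to_numpy() returns the values as an ndarray; the index is left behind.
arr = s_demo.to_numpy()
print(type(arr).__name__)   # ndarray
print(arr.sum())            # 7.5
```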

### Accessing elements in pandas data structures is covered in a separate lesson
(Here: http://localhost:8888/notebooks/Pandas/Lesson_2_Indexing.ipynb)

A short sample of element access:

In [43]:
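# Select the first element by position; .iloc makes the positional lookup explicit
# (plain s[0] on a label-indexed Series is deprecated in recent pandas versions)
s.iloc[0]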

1.6304111923506772

In [50]:
# Slicing a Series by position also slices the index
s[:3]

a    1.630411
b    0.882977
c    0.081189
dtype: float64

In [51]:
s[3:]

d    0.01968
e   -0.17706
dtype: float64
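
The samples above select by position. As a hedged sketch of the two explicit accessors, `.iloc` works on positions and `.loc` works on labels; note that a label slice includes its end label while a positional slice does not. The Series `s_demo` and its values are invented for this example.

```python
import pandas as pd

s_demo = pd.Series([1.0, 2.0, 3.0, 4.0], index=["a", "b", "c", "d"])

first = s_demo.iloc[0]              # by position
by_label = s_demo.loc["b"]          # by label
pos_slice = s_demo.iloc[:2]         # positions 0 and 1 (end excluded)
label_slice = s_demo.loc["a":"c"]   # labels a, b and c (end included)

print(first, by_label)              # 1.0 2.0
print(list(pos_slice.index))        # ['a', 'b']
print(list(label_slice.index))      # ['a', 'b', 'c']
```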

### Series using Dictionary

In [58]:
d = {'aa': 22, 'bb': 44, 'cc': 66}
dd = pd.Series(d)

In [59]:
dd

aa    22
bb    44
cc    66
dtype: int64

In [61]:
# Naming the SERIES 

dd = pd.Series(d, name='doubles')

In [62]:
dd

aa    22
bb    44
cc    66
Name: doubles, dtype: int64

In [63]:
dd.index

Index(['aa', 'bb', 'cc'], dtype='object')

In [57]:
#Accessing the array can be useful when you need to do some operation without the index
dd.array

<PandasArray>
[22, 44, 66]
Length: 3, dtype: int64

In [55]:
# If an index is passed, the values in data corresponding to the labels in the index will be pulled out.
# Labels that are not keys of the dict get NaN, which is why the dtype becomes float64.
pd.Series(d, index=['aa','b','cc','d','e'])

aa    22.0
b      NaN
cc    66.0
d      NaN
e      NaN
dtype: float64
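
One way to work with the missing labels produced above is sketched here: `isna()` flags them and `dropna()` removes them. The name `partial` and the shortened index are assumptions made for the example.

```python
import pandas as pd

d = {"aa": 22, "bb": 44, "cc": 66}

# "b" is not a key of d, so its value is NaN and the dtype is float64.
partial = pd.Series(d, index=["aa", "b", "cc"])
print(partial.isna().tolist())    # [False, True, False]
print(partial.dtype)              # float64

# Drop the missing entries and convert back to integers.
clean = partial.dropna().astype("int64")
print(clean.tolist())             # [22, 66]
```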

### From scalar value

If data is a scalar value, an index must be provided. The value will be repeated to match the length of index.

In [81]:
sc = pd.Series(5, index=['a', 'b', 'c', 'd', 'e'])

In [82]:
sc

a    5
b    5
c    5
d    5
e    5
dtype: int64

### Vectorized operations and label alignment with Series

A Series can also be passed into most NumPy functions that expect an ndarray. Arithmetic with a scalar is applied element-wise, and the labels and name are preserved:

In [86]:
dd + 2

aa    24
bb    46
cc    68
Name: doubles, dtype: int64

In [87]:
dd * 2

aa     44
bb     88
cc    132
Name: doubles, dtype: int64
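
The cells above cover the vectorized part of this section. For the label alignment part, here is a minimal sketch with two made-up Series (`left` and `right`) whose indexes only partly overlap: values are matched by label, and labels present in only one operand give NaN. The last lines show a NumPy function applied directly to a Series.

```python
import numpy as np
import pandas as pd

left = pd.Series([1, 2, 3], index=["a", "b", "c"])
right = pd.Series([10, 20, 30], index=["b", "c", "d"])

# Addition aligns on labels, not on position.
total = left + right
print(total.isna().tolist())      # [True, False, False, True]
print(total["b"], total["c"])     # 12.0 23.0

# NumPy functions accept a Series and return a Series with the same index.
roots = np.sqrt(pd.Series([4.0, 9.0], index=["x", "y"]))
print(roots["y"])                 # 3.0
```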

### NOTE: Since Series is an object, it has many attributes and methods associated with it.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html#pandas.Series
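
A few of those attributes and methods, as a quick sketch on the `doubles` Series from earlier (recreated here so the cell runs on its own):

```python
import pandas as pd

dd = pd.Series({"aa": 22, "bb": 44, "cc": 66}, name="doubles")

# Attributes describe the Series.
print(dd.shape)     # (3,)
print(dd.name)      # doubles

# Methods compute something from it.
print(dd.sum())     # 132
print(dd.mean())    # 44.0
print(dd.idxmax())  # cc
```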