# Fundamentals Introduction

### Contents
- Object Creation
- Viewing Data
- Selection
    - Getting 
    - Selection by Label 
    - Selection by Position 
- Boolean Indexing
- Setting 
- Missing Data
- Operations 
    - Stats 
    - Apply
    - Histogramming 
    - String Methods
- Merge
    - Concat
    - Join
    - Append 
- Grouping 
    - Splitting 
    - Applying 
    - Combining
- Reshaping 
    - Stack 
    - Unstack 
- Pivot Tables
- Timeseries 
- Categorical
- Plotting
- Getting Data in/out 


```python
import numpy as np
import pandas as pd
```

## Object Creation
- Question: How do you create a Series and a DataFrame?
- Answer: Both a Series and a DataFrame can be created from:
    - NumPy arrays (`ndarray`)
    - Python lists
    - Dictionaries
    - Scalars
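A minimal sketch of the array, dictionary, and scalar inputs listed above, building one Series from each. The variable names and values are illustrative only.

```python
import numpy as np
import pandas as pd

# From a NumPy ndarray: the index defaults to 0..n-1
from_array = pd.Series(np.array([10, 20, 30]))

# From a dictionary: the keys become the index labels
from_dict = pd.Series({"a": 1, "b": 2, "c": 3})

# From a scalar: the value is repeated once per label in the index you pass
from_scalar = pd.Series(5.0, index=["x", "y", "z"])

print(from_array)
print(from_dict)
print(from_scalar)
```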


### Creating a Series by passing a list of values
- A Series is a one-dimensional labeled array. It can hold any data type (integers, floats, strings, Python objects, and so on).
- The constructor is `s = pd.Series(data, index=index)`.
- `index` is optional; if you omit it, pandas creates a default integer index (0, 1, 2, ...).



```python
s = pd.Series([1, 3, 4, np.nan, 6, 7])
s
```

```text
0    1.0
1    3.0
2    4.0
3    NaN
4    6.0
5    7.0
dtype: float64
```
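The output above is `float64` even though most of the values are integers. A short sketch of why: `np.nan` is a float, so pandas upcasts the whole Series. It also shows passing an explicit `index`; the labels `a` to `f` are illustrative.

```python
import numpy as np
import pandas as pd

# Integers alone keep an integer dtype (int64 on most platforms)
ints = pd.Series([1, 3, 4, 6, 7])
print(ints.dtype)

# Adding np.nan (a float) forces the whole Series to float64
with_nan = pd.Series([1, 3, 4, np.nan, 6, 7])
print(with_nan.dtype)  # float64

# An explicit index replaces the default 0..n-1 labels
labeled = pd.Series([1, 3, 4, np.nan, 6, 7], index=list("abcdef"))
print(labeled["d"])
```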

### Creating a DataFrame from a NumPy array, with labeled columns and a datetime index

```python
dates = pd.date_range('20190101', periods=6)
dates
```

```text
DatetimeIndex(['2019-01-01', '2019-01-02', '2019-01-03', '2019-01-04',
               '2019-01-05', '2019-01-06'],
              dtype='datetime64[ns]', freq='D')
```
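`pd.date_range` accepts either a start date plus `periods` or a start and an end date. A small sketch showing that both forms produce the same six days, plus a custom `freq` string ("2D", every second day) chosen for illustration:

```python
import pandas as pd

# start + periods: six consecutive days (freq defaults to daily)
by_periods = pd.date_range("20190101", periods=6)

# start + end: the same six days, with the end date included
by_end = pd.date_range("2019-01-01", "2019-01-06")

print(by_periods.equals(by_end))  # True

# A custom frequency string: every second day
every_other = pd.date_range("2019-01-01", periods=3, freq="2D")
print(every_other)
```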

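```python
# 6 rows x 4 columns of standard-normal random numbers; your values will differ from the table below
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
df
```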

| | A | B | C | D |
|---|---|---|---|---|
| 2019-01-01 | 0.839231 | -1.061113 | -0.626078 | -1.252577 |
| 2019-01-02 | 0.384877 | -1.561748 | -0.936011 | -0.974969 |
| 2019-01-03 | 1.009431 | 0.141091 | 1.44742 | 0.125754 |
| 2019-01-04 | 1.413354 | -1.092345 | -1.468236 | -0.194967 |
| 2019-01-05 | 0.028326 | -1.223371 | -1.342722 | -0.538534 |
| 2019-01-06 | -1.217655 | -1.347822 | -1.420383 | -0.126034 |
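Because `np.random.randn` draws new numbers on every run, the table above cannot be reproduced exactly. A minimal sketch of seeding the generator instead, assuming NumPy 1.17+ for `np.random.default_rng`; the helper `make_frame` and the seed value `0` are hypothetical choices for illustration.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20190101", periods=6)

def make_frame(seed):
    # Hypothetical helper: a seeded Generator yields the same numbers for the same seed
    rng = np.random.default_rng(seed)
    return pd.DataFrame(rng.standard_normal((6, 4)), index=dates, columns=list("ABCD"))

df1 = make_frame(0)
df2 = make_frame(0)
print(df1.equals(df2))  # True
```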


```python
df.dtypes
```

```text
A    float64
B    float64
C    float64
D    float64
dtype: object
```

### Creating a DataFrame by passing a dictionary, with labeled columns and a default index

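```python
# Each dictionary key becomes a column label; values can be lists, Series, or Categoricals.
# Naming the variable `data` avoids shadowing Python's built-in `dict`.
data = {
    "Fruits": ["Apple", "Banana", "Cantaloupe", "Dewberries"],
    "Kilos": pd.Series([3, 44, 6, 11]),
    "sample": pd.Categorical(["test", "train", "test", "train"]),
}

dict_df = pd.DataFrame(data)
dict_df
```

| | Fruits | Kilos | sample |
|---|---|---|---|
| 0 | Apple | 3 | test |
| 1 | Banana | 44 | train |
| 2 | Cantaloupe | 6 | test |
| 3 | Dewberries | 11 | train |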


```python
dict_df.dtypes
```

```text
Fruits      object
Kilos        int64
sample    category
dtype: object
```
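Scalars, the last input type in the Object Creation list, also work as dictionary values: pandas repeats the scalar for every row implied by the other columns. A small sketch; the column name "In stock" is illustrative.

```python
import pandas as pd

# A scalar value is broadcast to every row defined by the list columns
frame = pd.DataFrame({
    "Fruits": ["Apple", "Banana", "Cantaloupe", "Dewberries"],
    "Kilos": [3, 44, 6, 11],
    "In stock": True,  # scalar, repeated for all four rows
})
print(frame)

# With only scalars there is no length to infer, so an index must be passed
only_scalars = pd.DataFrame({"x": 1, "y": 2}, index=[0])
print(only_scalars)
```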

## Viewing Data

```python
df.head()
```

| | A | B | C | D |
|---|---|---|---|---|
| 2019-01-01 | 0.839231 | -1.061113 | -0.626078 | -1.252577 |
| 2019-01-02 | 0.384877 | -1.561748 | -0.936011 | -0.974969 |
| 2019-01-03 | 1.009431 | 0.141091 | 1.44742 | 0.125754 |
| 2019-01-04 | 1.413354 | -1.092345 | -1.468236 | -0.194967 |
| 2019-01-05 | 0.028326 | -1.223371 | -1.342722 | -0.538534 |


```python
df.tail()
```

| | A | B | C | D |
|---|---|---|---|---|
| 2019-01-02 | 0.384877 | -1.561748 | -0.936011 | -0.974969 |
| 2019-01-03 | 1.009431 | 0.141091 | 1.44742 | 0.125754 |
| 2019-01-04 | 1.413354 | -1.092345 | -1.468236 | -0.194967 |
| 2019-01-05 | 0.028326 | -1.223371 | -1.342722 | -0.538534 |
| 2019-01-06 | -1.217655 | -1.347822 | -1.420383 | -0.126034 |
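`head()` and `tail()` return five rows by default, which is why both tables above have five rows. Both accept a row count `n`; a short sketch on a small illustrative frame:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({"value": np.arange(10)})

first_two = frame.head(2)   # rows 0 and 1
last_three = frame.tail(3)  # rows 7, 8 and 9
print(first_two)
print(last_three)
```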


```python
df.index
```

```text
DatetimeIndex(['2019-01-01', '2019-01-02', '2019-01-03', '2019-01-04',
               '2019-01-05', '2019-01-06'],
              dtype='datetime64[ns]', freq='D')
```
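Alongside `df.index`, the column labels and the raw values can be inspected too. A small sketch using `df.columns` and `DataFrame.to_numpy()` (available since pandas 0.24), rebuilding the same kind of random frame so the block runs on its own:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20190101", periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))

print(df.columns)    # the column labels A, B, C, D
arr = df.to_numpy()  # the underlying values as a NumPy array, without labels
print(arr.shape)     # (6, 4)
```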

```python
df.describe()
```

| | A | B | C | D |
|---|---|---|---|---|
| count | 6.0 | 6.0 | 6.0 | 6.0 |
| mean | 0.409594 | -1.024218 | -0.724335 | -0.493554 |
| std | 0.932292 | 0.599544 | 1.112767 | 0.532479 |
| min | -1.217655 | -1.561748 | -1.468236 | -1.252577 |
| 25% | 0.117463 | -1.316709 | -1.400968 | -0.86586 |
| 50% | 0.612054 | -1.157858 | -1.139366 | -0.36675 |
| 75% | 0.966881 | -1.068921 | -0.703561 | -0.143267 |
| max | 1.413354 | 0.141091 | 1.44742 | 0.125754 |
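`describe()` summarizes only numeric columns by default, so on a mixed frame such as the fruit example most columns are skipped. A sketch rebuilding that frame and passing `include="all"` to add count/unique/top/freq rows for the non-numeric columns:

```python
import pandas as pd

mixed = pd.DataFrame({
    "Fruits": ["Apple", "Banana", "Cantaloupe", "Dewberries"],
    "Kilos": [3, 44, 6, 11],
    "sample": pd.Categorical(["test", "train", "test", "train"]),
})

# Default: only the numeric "Kilos" column is summarized
print(mixed.describe())

# include="all": every column, with statistics suited to each dtype
print(mixed.describe(include="all"))
```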


```python
# Transpose: rows become columns and columns become rows
df.T
```

| | 2019-01-01 00:00:00 | 2019-01-02 00:00:00 | 2019-01-03 00:00:00 | 2019-01-04 00:00:00 | 2019-01-05 00:00:00 | 2019-01-06 00:00:00 |
|---|---|---|---|---|---|---|
| A | 0.839231 | 0.384877 | 1.009431 | 1.413354 | 0.028326 | -1.217655 |
| B | -1.061113 | -1.561748 | 0.141091 | -1.092345 | -1.223371 | -1.347822 |
| C | -0.626078 | -0.936011 | 1.44742 | -1.468236 | -1.342722 | -1.420383 |
| D | -1.252577 | -0.974969 | 0.125754 | -0.194967 | -0.538534 | -0.126034 |
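Transposing `df` is harmless because every column is `float64`. With mixed column types, each new column combines values of different types, so pandas falls back to the `object` dtype. A small illustrative sketch:

```python
import pandas as pd

mixed = pd.DataFrame({"name": ["Apple", "Banana", "Cantaloupe"], "kilos": [3, 44, 6]})

flipped = mixed.T
print(flipped.shape)   # (2, 3): 3 rows x 2 columns became 2 rows x 3 columns
print(flipped.dtypes)  # each new column mixes a string and an int, so it is object
```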
