# Getting Started with pandas
pandas is designed for working with tabular or heterogeneous data.
NumPy, by contrast, is best suited for working with homogeneously typed numerical array data.
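
The contrast can be sketched with a small example. The column names and values below are made up for illustration: a NumPy array coerces mixed inputs to a single dtype, while a DataFrame keeps one dtype per column.

```python
import numpy as np
import pandas as pd

# A NumPy array has a single dtype, so mixed numeric inputs are coerced to one type
mixed_array = np.array([1, 2.5, 3])
print(mixed_array.dtype)

# A DataFrame keeps a separate dtype for each column (illustrative data)
mixed_frame = pd.DataFrame(
    {"city": ["Oslo", "Lima"], "temp_c": [4.5, 19.0], "visits": [3, 7]}
)
print(mixed_frame.dtypes)
```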

## 5.1 Introduction to pandas Data Structures

### Series 
A Series is a one-dimensional array-like object containing a sequence of values of the same type and an associated array of data labels, called its index.


### DataFrame
A DataFrame represents a rectangular table of data and contains an ordered, named collection of columns.


In [1]:
import pandas as pd
import numpy as np

In [2]:
# Create a series with an index identifying each data point with a label

obj2 = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"]) 

# Select several values by passing a list of index labels
obj2[["c", "a", "d"]]

# Check whether a label exists in the Series index (this tests labels, not values)
"b" in obj2

# Create a series from dictionary
sdata = {"a": 1, "b": 2}
obj3 = pd.Series(sdata)

# Convert series back to dictionary
obj3.to_dict() 

# Passing an index sets the order of the data and drops dictionary keys that are not in the index
# Index labels with no matching dictionary key are assigned NaN
data_index = ["c", "a"]
pd.Series(sdata, index=data_index)

# Detect missing data
pd.isna(obj2) 
obj2.isna()
pd.notna(obj2)
obj2.notna()

# Assign a name to the Series and to its index
obj3.name = "Population"
obj3.index.name = "States"

# Alter a Series's index in place by assignment; the new labels must have the same length as the Series
obj3.index = ['x', 'y']
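
As a minimal sketch of the dictionary-plus-index behaviour described in the cell above, the following reuses `sdata` and shows which labels end up missing. The variable name `partial` is introduced here only for illustration.

```python
import pandas as pd

sdata = {"a": 1, "b": 2}

# "c" is not a key in sdata, so it gets NaN; "b" is not requested, so it is dropped
partial = pd.Series(sdata, index=["c", "a"])
print(partial)

# isna() flags the missing entry; notna() is its complement
print(partial.isna())

# Introducing NaN converts the integer values to float64
print(partial.dtype)
```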

In [3]:
# Construct a DataFrame through a dictionary of equal-length lists or NumPy arrays
fdata = {
    "state": ["Ohio", "Ohio", "Nevada"],
    "year": [2000, 2001, 2001],
    "pop": [1.5, 1.7, 3.6],
}

frame = pd.DataFrame(fdata)

# First and last 5 rows of the DataFrame
frame.head()
frame.tail()

# Specify the column order with the columns argument (a column name that isn't
# a key in the dictionary appears as a column of missing values)
frame = pd.DataFrame(fdata, columns=["state", "year", "pop"])

# Select a column from dataframe
frame['state']

# Select a row from the DataFrame: loc selects by index label, iloc by integer position
frame.loc[1]
frame.iloc[1]

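# Modify an entire column by assigning a scalar or an array of matching length
frame['state'] = "Anything"
frame['state'] = np.arange(3)

# Assigning to a column that doesn't exist creates a new column
frame['Year 2000'] = frame["year"] == 2000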

# Delete a column from frame
del frame['pop']

# Transpose a frame
frame.T

# Return the data contained in the frame as a two-dimensional ndarray
frame.to_numpy()


array([[0, 2000, True],
       [1, 2001, False],
       [2, 2001, False]], dtype=object)
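
The dtype of the array returned by `to_numpy()` depends on the column types. The sketch below uses two small illustrative frames (`numeric` and `mixed` are names introduced here, not part of the notes above) to show the difference.

```python
import pandas as pd

numeric = pd.DataFrame({"year": [2000, 2001], "pop": [1.5, 1.7]})
mixed = pd.DataFrame({"state": ["Ohio", "Nevada"], "year": [2000, 2001]})

# int64 and float64 columns share a common numeric type: float64
print(numeric.to_numpy().dtype)

# A string column next to a numeric one forces the generic object dtype
print(mixed.to_numpy().dtype)

# to_numpy is a method: without parentheses you get the bound method, not the data
print(callable(numeric.to_numpy))
```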

### Index objects
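Index objects hold the axis labels (including a DataFrame's column names) and other axis metadata such as the axis name.

Index objects are immutable, so their labels cannot be modified in place.

A pandas Index can contain duplicate labels. Selecting a duplicated label returns all occurrences of that label.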
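
Immutability can be checked directly. This is a minimal sketch with made-up labels: item assignment on an Index raises `TypeError`, and the usual way to change labels is to assign a whole new Index.

```python
import pandas as pd

labels = pd.Index(["a", "b", "c"])

# Item assignment on an Index raises TypeError
assignment_failed = False
try:
    labels[1] = "z"
except TypeError as exc:
    assignment_failed = True
    print("TypeError:", exc)

# To change labels, build a new Index and assign it as a whole
series = pd.Series([10, 20, 30], index=labels)
series.index = pd.Index(["a", "z", "c"])
print(series)
```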

In [4]:
fframe = pd.DataFrame(['foooo', 'foooo', 'foooo', 'barrr'], index=['foo', 'foo', 'foo', 'bar'])

# Selecting a duplicated label returns all occurrences of that label
fframe.loc["foo"]

Unnamed: 0,0
foo,foooo
foo,foooo
foo,foooo


### Index methods and properties

| Method or property | Description |
|-|-|
| append() | Concatenate with additional Index objects, producing a new Index |
| difference() | Compute the set difference as an Index |
| intersection() | Compute the set intersection |
| union() | Compute the set union |
| isin() | Compute a Boolean array indicating whether each value is contained in the passed collection |
| delete() | Compute a new Index with the element at position i deleted |
| insert() | Compute a new Index by inserting an element at position i |
| is_monotonic_increasing | Return True if each element is greater than or equal to the previous element (replaces `is_monotonic`, which was removed in pandas 2.0) |
| is_unique | Return True if the Index has no duplicate values |
| unique() | Compute the array of unique values in the Index |
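
A few of the entries in the table can be tried on small Index objects. The labels below are illustrative only.

```python
import pandas as pd

left = pd.Index(["a", "b", "c"])
right = pd.Index(["b", "c", "d"])

print(left.union(right))         # labels in either index
print(left.intersection(right))  # labels in both
print(left.difference(right))    # labels only in left
print(left.isin(["a", "c"]))     # boolean mask, one entry per label

# insert() and delete() return new Index objects; left itself is unchanged
print(left.insert(1, "x"))
print(left.delete(0))

dupes = pd.Index(["foo", "foo", "bar"])
print(dupes.is_unique)
print(dupes.unique())
```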

## 5.2 Essential Functionality

### Reindexing 
Create a new object with the values rearranged to align with the new index.

In [5]:
obj = pd.Series([4.5, 3.2, -5.3, 3.6], index = [0, 2, 4, 6])
obj

0    4.5
2    3.2
4   -5.3
6    3.6
dtype: float64

In [6]:
# Rearrange the data according to the new index, introducing missing values for any index values that were not already present
# method="ffill" forward-fills the missing values
obj2 = obj.reindex(np.arange(10), method="ffill")
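
To see what forward-filling changes, the sketch below reindexes the same Series with and without `method="ffill"` and prints the results side by side. The names `plain` and `filled` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

obj = pd.Series([4.5, 3.2, -5.3, 3.6], index=[0, 2, 4, 6])

# Without a fill method, labels missing from obj become NaN
plain = obj.reindex(np.arange(10))

# With method="ffill", each new label takes the most recent earlier value
filled = obj.reindex(np.arange(10), method="ffill")

print(pd.DataFrame({"plain": plain, "ffill": filled}))
```

Labels 7, 8, and 9 lie beyond the last original label, so they all receive the value stored at label 6.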

In [14]:
frame = pd.DataFrame(
    np.arange(9).reshape(3, 3),
    index=["a", "c", "d"],
    columns=["Ohio", "Texas", "California"],
)


In [15]:
# Reindex rows
frame2 = frame.reindex(index=["a", "b", "c", "d"])

In [19]:
# Reindex columns
states = ["Texas", "Utah", "California"]
frame3 = frame.reindex(columns=states)

In [23]:
frame

Unnamed: 0,Ohio,Texas,California
a,0,1,2
c,3,4,5
d,6,7,8


In [22]:
# Reindex with the loc operator
# This works only if all of the new index labels already exist in the DataFrame,
# whereas reindex inserts missing data for new labels
# df.loc[rows, columns]
frame.loc[["a", 'd', 'c'], ["California", "Texas"]]

Unnamed: 0,California,Texas
a,2,1
d,8,7
c,5,4
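
The difference between `reindex` and `loc` shows up as soon as a requested label is missing. This is a minimal sketch that rebuilds the same frame and asks for the row label `"b"`, which does not exist in it.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(9).reshape(3, 3),
    index=["a", "c", "d"],
    columns=["Ohio", "Texas", "California"],
)

# reindex adds the unknown row label "b" and fills it with NaN
reindexed = frame.reindex(index=["a", "b", "c"])
print(reindexed)

# loc refuses a list selection that contains a label missing from the axis
loc_failed = False
try:
    frame.loc[["a", "b", "c"]]
except KeyError as exc:
    loc_failed = True
    print("KeyError:", exc)
```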


In [None]:
# Dropping Entries from an axis
