# Getting Started with pandas
pandas is designed for working with tabular or heterogeneous data.
NumPy, by contrast, is best suited for working with homogeneously typed numerical array data.

## 5.1 Introduction to pandas Data Structures

### Series 
A Series is a one-dimensional array-like object containing a sequence of values of the same type and an associated array of data labels, called its index.
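
To make the two parts of a Series concrete, the sketch below pulls the values and the labels out separately. The sample numbers and labels are made up for illustration only.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration
s = pd.Series([4, 7, -5, 3], index=["d", "b", "a", "c"])

values = s.to_numpy()    # the values, as a NumPy array
labels = list(s.index)   # the data labels, in the same order as the values
by_label = s["a"]        # look up a single value by its label
```

Each value stays paired with its label, so `s["a"]` finds `-5` regardless of where that entry sits in the sequence.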


### DataFrame
A DataFrame represents a rectangular table of data and contains an ordered, named collection of columns, each of which can have a different value type.
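
A small sketch of what "ordered, named collection of columns" means in practice, using made-up sample values. Each column keeps its own dtype, and the column order follows the order of the dictionary keys.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration
df = pd.DataFrame(
    {
        "state": ["Ohio", "Nevada"],
        "year": [2000, 2001],
        "pop": [1.5, 2.4],
    }
)

column_names = list(df.columns)      # columns keep the dictionary's key order
shape = df.shape                     # (number of rows, number of columns)
year_kind = df["year"].dtype.kind    # "i" for an integer column
pop_kind = df["pop"].dtype.kind      # "f" for a floating-point column
```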


In [None]:
import pandas as pd
import numpy as np

In [None]:
# Create a Series with an index that identifies each data point with a label

obj2 = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])

# Select several values by passing a list of index labels
obj2[["c", "a", "d"]]

# Check whether a label exists in the Series index
"b" in obj2

# Create a Series from a dictionary
sdata = {"a": 1, "b": 2}
obj3 = pd.Series(sdata)

# Convert the Series back to a dictionary
obj3.to_dict()

# Pass an index to define the order of the data; keys that are not in the index are dropped
# Index labels that have no matching key in the dictionary get NaN
data_index = ["c", "a"]
pd.Series(sdata, index=data_index)

# Detect missing data (the function and method forms are equivalent)
pd.isna(obj2)
obj2.isna()
pd.notna(obj2)
obj2.notna()

# Assign a name to the Series and to its index
obj3.name = "Population"
obj3.index.name = "States"

# Replace a Series's index by assignment; the new array must be the same length as the Series
obj3.index = ['x', 'y']

In [None]:
# Construct a DataFrame through a dictionary of equal-length lists or NumPy arrays
fdata = {
    "state": ["Ohio", "Ohio", "Nevada"],
    "year": [2000, 2001, 2001],
    "pop": [1.5, 1.7, 3.6],
}

frame = pd.DataFrame(fdata)

# First and last 5 rows of the DataFrame
frame.head()
frame.tail()

# Specify a sequence of columns to set their order (a column name that is not
# a key in the dictionary appears in the result filled with missing values)
frame = pd.DataFrame(fdata, columns=["state", "year", "pop"])

# Select a column from dataframe
frame['state']

# Select a row from the DataFrame by label (loc) or by integer position (iloc)
frame.loc[1]
frame.iloc[1]

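# Modify an entire column by assigning a scalar (broadcast to every row)
# or an array whose length matches the DataFrame
frame['state'] = "Anything"
frame['state'] = np.arange(3)

# Assigning to a column that does not exist creates a new column
frame['Year 2000'] = frame["year"] == 2000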

# Delete a column from frame
del frame['pop']

# Transpose a frame
frame.T

# Return the data contained in the DataFrame as a two-dimensional ndarray
frame.to_numpy()


### Index objects
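Index objects are responsible for holding the axis labels: a Series's index, or a DataFrame's row and column labels.

Index objects are immutable, so they cannot be modified in place.

A pandas Index can contain duplicate labels. Selecting a duplicated label returns all occurrences of that label.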
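
The two properties above can be checked directly. The following is a minimal sketch with made-up labels: item assignment on an Index raises a `TypeError`, and a duplicated label selects every matching entry while a label that occurs once yields a scalar.

```python
import pandas as pd

# Hypothetical labels, chosen only for illustration
idx = pd.Index(["foo", "foo", "bar"])

# Index objects are immutable: item assignment raises TypeError
try:
    idx[0] = "baz"
    mutated = True
except TypeError:
    mutated = False

s = pd.Series([1, 2, 3], index=idx)
dupes = s.loc["foo"]    # duplicated label: a Series holding both matches
single = s.loc["bar"]   # label that occurs once: a scalar
```

Because the result type depends on whether the label repeats, code that selects from an index with duplicates may need to handle both a scalar and a Series.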

In [None]:
fframe = pd.DataFrame(['foooo', 'foooo', 'foooo', 'barrr'], index=['foo', 'foo', 'foo', 'bar'])

# Selecting a duplicated label returns every row that has that label
fframe.loc["foo"]

### Index methods and properties

| Method or property | Description |
|-|-|
| append() | Concatenate with additional Index objects, producing a new Index |
| difference() | Compute the set difference as an Index |
| intersection() | Compute the set intersection |
| union() | Compute the set union |
| isin() | Compute a Boolean array indicating whether each value is contained in the passed collection |
| delete() | Compute a new Index with the element at position i deleted |
| insert() | Compute a new Index by inserting an element at position i |
| is_monotonic_increasing | Return True if each element is greater than or equal to the previous element (named is_monotonic in older pandas versions) |
| is_unique | Return True if the Index has no duplicate values |
| unique() | Compute the array of unique values in the Index |
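
A short sketch exercising several of these methods on two small, made-up indexes. Every call returns a new object and leaves `a` and `b` unchanged.

```python
import pandas as pd

# Hypothetical labels, chosen only for illustration
a = pd.Index(["a", "b", "c"])
b = pd.Index(["b", "c", "d"])

union = a.union(b)                  # labels found in either index
intersection = a.intersection(b)    # labels found in both
difference = a.difference(b)        # labels in a that are not in b
membership = a.isin(["a", "z"])     # Boolean array, one entry per label in a
inserted = a.insert(1, "x")         # new Index with "x" placed at position 1
deleted = a.delete(0)               # new Index without the element at position 0
unique_flag = a.is_unique           # True: a has no duplicate labels
```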

## 5.2 Essential Functionality

### Reindexing 
Reindexing creates a new object with the values rearranged to align with the new index.
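
A minimal sketch of that behavior, using made-up values: labels that already exist keep their values, a label that is new gets a missing value (or `fill_value` if one is given), and the original object is left untouched.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration
s = pd.Series([4.5, 7.2, -5.3], index=["d", "b", "a"])

# "c" is not in the original index, so it is introduced as NaN
reindexed = s.reindex(["a", "b", "c", "d"])

# fill_value supplies a substitute for labels that are newly introduced
filled = s.reindex(["a", "b", "c", "d"], fill_value=0.0)
```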

In [None]:
obj = pd.Series([4.5, 3.2, -5.3, 3.6], index = [0, 2, 4, 6])
obj

In [None]:
# Rearrange the data according to the new index, introducing missing values for any index values that were not already present
# method="ffill" forward-fills each gap with the last preceding value
obj2 = obj.reindex(np.arange(10), method="ffill")

In [None]:
frame = pd.DataFrame(
    np.arange(9).reshape(3, 3),
    index=["a", "c", "d"],
    columns=["Ohio", "Texas", "California"],
)


In [None]:
# Reindex rows
frame2 = frame.reindex(index=["a", "b", "c", "d"])

In [None]:
# Reindex columns
states = ["Texas", "Utah", "California"]
frame3 = frame.reindex(columns=states)

In [None]:
frame

In [None]:
# Reindex with the loc operator: df.loc[rows, columns]
# This works only if all of the new index labels already exist in the DataFrame,
# whereas reindex inserts missing data for new labels
frame.loc[["a", 'd', 'c'], ["California", "Texas"]]

In [5]:
# Dropping entries from an axis
# reindex or loc-based indexing also works if you already have an index array
# without those entries; drop returns a new object with the given labels removed

obj = pd.Series(np.arange(5), index=["a", "b", "c", "d", "e"])

new_obj = obj.drop('c')
new_obj = obj.drop(['a', 'e'])

In [9]:
data = pd.DataFrame(
    np.arange(16).reshape((4, 4)),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

# Drop values from both rows and columns
data.drop(index=['Ohio', 'Colorado'], columns=["two"])
# Drop columns by passing axis=1 or axis="columns"
data.drop('two', axis=1)


          one  three  four
Ohio        0      2     3
Colorado    4      6     7
Utah        8     10    11
New York   12     14    15
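
One detail worth checking when dropping entries is that `drop` returns a new object rather than modifying the original. The sketch below rebuilds the same `data` frame and confirms this. The label `"missing"` is a hypothetical name used only to show the `errors="ignore"` option.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(16).reshape((4, 4)),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

# drop returns a new DataFrame; data itself still has column "two"
dropped = data.drop(columns=["two"])

# By default a label that does not exist raises KeyError;
# errors="ignore" skips it instead ("missing" is a hypothetical label)
unchanged = data.drop(columns=["missing"], errors="ignore")
```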


In [28]:
# Indexing, selection and Filtering
obj = pd.Series(np.arange(4), index=["a", "b", "c", "d"])

obj[['a', 'b']]
obj[obj<=2]

# Prefer loc for label-based selection: with [], integers are treated as labels
# when the index contains integers, so the behavior depends on the index type
obj1 = pd.Series(np.arange(5), index=[4, 2, 1, 199, 0])

obj1[199]  # The value returned is based on the label 199, not on position
# Use loc to select items by label
obj1.loc[[0,1,199]]

# Use iloc to select items by integer position, whatever the index type
# obj has string labels, so positional selection has to go through iloc
obj.iloc[[0,1,2]]

a    0
b    1
c    2
dtype: int64
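
To make the label-versus-position distinction explicit, here is a small sketch on the same integer-labelled Series as `obj1`. The same integer gives different results through `loc` and `iloc`.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(5), index=[4, 2, 1, 199, 0])

by_label = s.loc[0]       # the entry whose label is 0 (the last element)
by_position = s.iloc[0]   # the entry at position 0 (the first element)
```

Spelling out `loc` or `iloc` removes any doubt about which interpretation is intended.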

In [31]:
# Indexing into a DataFrame with [] retrieves one or more columns;
# rows are selected by label with loc, and label slices include both endpoints
data = pd.DataFrame(
    np.arange(16).reshape((4, 4)),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

data.loc["Ohio":"Colorado"]

          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7


In [35]:
# Selecting rows with a Boolean array
data[data['three'] > 5]

          one  two  three  four
Colorado    4    5      6     7
Utah        8    9     10    11
New York   12   13     14    15
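
Boolean selection can also be combined with column selection through `loc`. The sketch below rebuilds the same `data` frame. The second condition (`one < 12`) is an arbitrary example added for illustration, not something taken from the notes above.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(16).reshape((4, 4)),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

# Combine conditions with & (and) or | (or); each condition needs its own parentheses
mask = (data["three"] > 5) & (data["one"] < 12)

# loc accepts a Boolean mask for the rows and a list of labels for the columns
subset = data.loc[mask, ["one", "four"]]
```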
