# PANDAS OBJECTS
1. Series
2. DataFrame
3. Index

## 1. Series 
### 1.1. Define 
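``` python
import pandas as pd

# 1. From a list: default integer index 0, 1, 2, 3
data = pd.Series([0.25, 0.5, 0.75, 1.0])

# 2. From a list with an explicit index
data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])

# 3. From a dict: the keys become the index
population_dict = {'California': 38332521, 'Texas': 26448193, 'New York': 19651127, 'Florida': 19552860, 'Illinois': 12882135}
population = pd.Series(population_dict)
```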

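- A Series built from a dict with mixed-type (heterogeneous) values has dtype `object`.
- When all values share one type, the Series uses the matching dtype, e.g. `int64` for integers.
- Strings are also stored with dtype `object` (the default up to pandas 2.x).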
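A minimal check of these dtype rules. The names `mixed`, `ints` and `strs` are illustrative, and the `object` result for strings assumes pandas 2.x; newer versions may infer a dedicated string dtype instead.

```python
import pandas as pd

mixed = pd.Series({'a': 1, 'b': 'two', 'c': 3.0})  # heterogeneous values
ints = pd.Series({'a': 1, 'b': 2, 'c': 3})         # all values are Python ints
strs = pd.Series(['x', 'y', 'z'])                  # all values are strings

print(mixed.dtype)  # object
print(ints.dtype)   # int64
print(strs.dtype)   # object in pandas 2.x; newer versions may use a string dtype
```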
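``` python
# 4. From a scalar: the value is repeated for every index label
pd.Series(5, index=[1, 2, 3, 4, 9])

# 5. From a dict with an explicit index: only the listed labels are kept
pd.Series({2: 'a', 1: 'b', 5: 'k'}, index=[1, 5, 8])
```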

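- With an explicit index, the index decides which entries the Series holds: dict keys that are not in the index (here `2`) are dropped, and labels with no matching key (here `8`) get `NaN`.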
### 1.2. Access
1. Using attributes

- `.values`: the data as a NumPy array
- `.index`: the index labels

2. Using index labels and label ranges
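A short sketch of both access styles on the labelled Series from 1.1; the variable names are only for illustration.

```python
import pandas as pd

data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])

vals = data.values     # the values as a NumPy array
labels = data.index    # the index labels: 'a', 'b', 'c', 'd'
one = data['b']        # a single label -> 0.5
rng = data['b':'d']    # a label range: b, c, d (the end label is included)
```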

### 1.3. Series as NumPy array
- Elements can be accessed by position (the implicit index), just like a NumPy array, and additionally by label (the explicit index).
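A sketch contrasting positional and label access, plus NumPy-style arithmetic. It uses `.iloc` and `.loc` to make the choice explicit; these indexers are described in the indexing section.

```python
import pandas as pd

data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])

by_position = data.iloc[1]  # implicit index, as in a NumPy array -> 0.5
by_label = data.loc['b']    # explicit index (label) -> 0.5
first_two = data.iloc[0:2]  # NumPy-style slice: positions 0 and 1
doubled = data * 2          # element-wise arithmetic; the index labels are kept
```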

### 1.4. Series as a Specialized Dict
- A Python dict can hold keys and values of mixed types.
- A Series stores its values with a single dtype, which lets pandas use fast, vectorized operations; this makes it more efficient than a dict.
- Unlike a dict, a Series can also be sliced by a range of keys.
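A sketch of the dict-like behaviour using the `population` data from 1.1; note that the key-range slice follows the index order and includes the end key.

```python
import pandas as pd

population = pd.Series({'California': 38332521, 'Texas': 26448193, 'New York': 19651127,
                        'Florida': 19552860, 'Illinois': 12882135})

texas = population['Texas']                        # dict-style lookup -> 26448193
has_florida = 'Florida' in population              # membership is tested on the keys -> True
first_three = population['California':'New York']  # key range: California, Texas, New York
```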


## 2. DataFrame
- A DataFrame is a set of columns (each a Series) that share one aligned row index.
- Both rows and columns have flexible indices, i.e. custom labels.
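A small sketch of the index alignment described above: rows are matched by label, not by position. The `area` figures are example values.

```python
import pandas as pd

pop = pd.Series({'California': 38332521, 'Texas': 26448193})
area = pd.Series({'Texas': 695662, 'California': 423967, 'Florida': 170312})  # example values

df = pd.DataFrame({'pop': pop, 'area': area})
# Texas and California line up by label; Florida has no 'pop' entry, so it gets NaN
print(df)
```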

### 2.1 Create DF
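``` python
import numpy as np
import pandas as pd

# From a dict of Series (assumes population and area are Series sharing a state-name index)
pd.DataFrame({'population': population, 'area': area})

# From a 2D list with explicit row and column labels
pd.DataFrame([[1, 2], [3, 4]], index=['first', 'second'], columns=['first_col', 'second_col'])

# From a list of dicts: each dict becomes one row
pd.DataFrame([{'a': i, 'b': 2 * i, 'c': 3 * i} for i in range(1, 5)])

# From a 2D NumPy array
pd.DataFrame(np.random.rand(4, 2), columns=['foo', 'bar'], index=['a', 'b', 'c', 'd'])
```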
### 2.2 DF as Specialized Dict
- Maps each column name (the key) to the Series holding that column's data (the value).

**ATTENTION**
- In a 2D NumPy array, `data[0]` indexes along rows, so it returns the first row.
- In a DataFrame, `data['column_name']` indexes along columns, so it returns the column with that name.
``` python
states['population']['Texas'] 
```
  - Here `'population'` selects the column and `'Texas'` the row; `states.loc['Texas', 'population']` gets the same value in one step.
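A sketch of the column-first behaviour, built on a hypothetical two-state `states` frame (the area figures are example values), with NumPy-style row access for comparison.

```python
import pandas as pd

# hypothetical 'states' frame: each column is a Series, states are the row labels
states = pd.DataFrame({
    'population': pd.Series({'California': 38332521, 'Texas': 26448193}),
    'area': pd.Series({'California': 423967, 'Texas': 695662}),  # example values
})

col = states['population']                  # dict-style: a whole column (a Series)
chained = states['population']['Texas']    # column first, then row -> 26448193
direct = states.loc['Texas', 'population']  # same value via .loc, row first
first_row = states.values[0]                # NumPy-style [0] on the array: first row
```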

## 3. Index object
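### 3.1 Immutable
- An `Index` cannot be modified in place; assigning to one of its elements raises a `TypeError`. This makes it safe to share one index between several Series and DataFrames.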
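A minimal demonstration of the immutability rule; `ind` is just an example index.

```python
import pandas as pd

ind = pd.Index([2, 3, 5, 7, 11])
try:
    ind[1] = 0  # attempt to change one element in place
except TypeError as err:
    print('TypeError:', err)
```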
### 3.2 Ordered set
- An Index behaves like an ordered set, but it may contain repeated values (a multiset).
  
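``` python
import pandas as pd
# not a pandas MultiIndex: just an Index in which the label 2 appears twice
dup_index = pd.Index([1, 2, 3, 4, 2, 7, 11])
```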
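A sketch of what a repeated label means in practice: the index reports that it is not unique, and selecting the repeated label returns every matching row.

```python
import pandas as pd

dup_index = pd.Index([1, 2, 3, 4, 2, 7, 11])
print(dup_index.is_unique)  # False: the label 2 appears twice

s = pd.Series(range(7), index=dup_index)
print(s.loc[2])  # a repeated label returns all matching rows (values 1 and 4)
```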

- Set operations are available as methods:
```python
import pandas as pd

indA = pd.Index([1, 3, 5, 7, 9])
indB = pd.Index([2, 3, 5, 7, 11])

print(indA.intersection(indB))          # Intersection: 3, 5, 7
print(indA.union(indB))                 # Union: 1, 2, 3, 5, 7, 9, 11
print(indA.symmetric_difference(indB))  # Symmetric difference: 1, 2, 9, 11
```
  

# DATA INDEXING AND SELECTION

## 1. Data selection in Series
1. As a dict {key: value}
2. As a 1D array: indexing and slicing as with a NumPy array

### 1.1 **As Dict**

``` python
import pandas as pd

data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])

data['c']         # 0.75
'c' in data       # True: membership is tested on the keys (index)
data.keys()       # the index labels
data.values       # the values as a NumPy array
data.items()      # iterator over (index, value) tuples
data['e'] = 1.25  # assigning to a new key extends the Series
```

### 1.2 **As 1D array**
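``` python
data['a':'c']                      # slicing with the explicit index: a, b, c (end included)
data[0:2]                          # slicing with the implicit index: positions 0, 1 (data.iloc[0:2] is explicit)
data[(data > 0.3) & (data < 0.8)]  # masking: keep values between 0.3 and 0.8
data[['a', 'e']]                   # fancy indexing with a list of labels ('e' was added above)
```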

#### Indexers
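- `loc`: indexing and slicing with the explicit index (labels)
- `iloc`: indexing and slicing with the implicit index (integer positions)

They matter most when the index consists of integers, as in the example below: there, plain `data[3]` refers to the label 3, not the position.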
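```python
import pandas as pd

data = pd.Series([23, 24, 25, 27, 20], index=[2, 3, 4, 5, 6])

# reset_index() returns a new DataFrame: the old labels move into an 'index' column
# and a default index 0..4 is used; reset_index(drop=True) returns a Series instead
data.reset_index()

data.loc[3]   # explicit index: label 3 -> 24

data.iloc[3]  # implicit index: position 3 -> 27
```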


**ATTENTION** When slicing with the implicit index, the end position is excluded; when slicing with the explicit index, the end label is included.
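A quick check of this rule on the integer-indexed Series from the indexer example; the same bounds `3:5` give different rows depending on the indexer.

```python
import pandas as pd

data = pd.Series([23, 24, 25, 27, 20], index=[2, 3, 4, 5, 6])

by_label = data.loc[3:5]      # labels 3, 4, 5: end label included -> 24, 25, 27
by_position = data.iloc[3:5]  # positions 3, 4: end position excluded -> 27, 20
```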

## 2. Data selection in DF
1. As a dict {key: value}
2. As a 2D array

### 2.1 DF as Dict
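``` python
import pandas as pd

# area and pop are Series sharing a state-name index (example values)
area = pd.Series({'California': 423967, 'Texas': 695662, 'New York': 141297,
                  'Florida': 170312, 'Illinois': 149995})
pop = pd.Series({'California': 38332521, 'Texas': 26448193, 'New York': 19651127,
                 'Florida': 19552860, 'Illinois': 12882135})
data = pd.DataFrame({'area': area, 'pop': pop})

data['area']  # key = column name
data.area     # same, attribute style: only for column names that are valid identifiers

assert data.pop is not data['pop']  # 'pop' clashes with the DataFrame.pop method, so data.pop returns the method

data['density'] = data['pop'] / data['area']  # create a new column
```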
### 2.2 DF as 2D array
``` python
data.values     # all rows as one 2D NumPy array

data.values[0]  # first row

data['area']    # a column, accessed by column name

data.T          # transpose: rows become columns
```

#### Indexers
``` python
data.iloc[0:2, :]                    # positions 0 and 1: the end position is excluded

data.loc['California':'Florida', :]  # California through Florida: the end label is included
```

#### Masking and fancy indexing
``` python
# Masking: a boolean expression selects the rows, a list of labels selects the columns
data.loc[data.density > 100, ['pop', 'density']]

# Fancy indexing: lists of positions select rows 1, 2 and columns 1, 2 (implicit indexing)
data.iloc[[1, 2], [1, 2]]
```
**ATTENTION**
- With plain `[]`, a DataFrame takes either a column selection (a label or list of labels) or a row selection (a slice or boolean mask), not both.
- To select rows and columns together as `[rows, cols]`, use `.loc` (labels) or `.iloc` (positions).

``` python
data.iloc[0, 2] = 0.0  # modify a single value: row 0, column 2
```
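A sketch of the bracket rules from the ATTENTION note on a small example frame (the area figures are example values): plain `[]` picks either columns or rows, and a `[row, column]` pair needs an indexer.

```python
import pandas as pd

data = pd.DataFrame({'area': [423967, 695662, 141297],  # example values
                     'pop': [38332521, 26448193, 19651127]},
                    index=['California', 'Texas', 'New York'])

column = data['area']             # [] with a label selects a column
rows = data[0:2]                  # [] with a slice selects rows
cell = data.loc['Texas', 'area']  # row and column together need .loc ...
same_cell = data.iloc[1, 0]       # ... or .iloc (positions)

try:
    data['Texas', 'area']         # plain [] treats ('Texas', 'area') as one column key
except KeyError:
    print("KeyError: use .loc or .iloc for [row, column]")
```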

#### Additional Indexing Conventions

1. Indexing with a label refers to columns, while slicing refers to rows.
``` python
data['area']                   # indexing refers to a column

data['California':'New York']  # slicing with labels refers to rows

data[0:3]                      # slicing with implicit positions, also rows
```
2. A boolean mask passed directly to `[]` filters rows.

``` python 
data[data['density'] > 100]  # filter rows
```
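A self-contained sketch of these conventions on a four-state example frame (the area figures are example values). With these numbers only New York and Florida have a density above 100.

```python
import pandas as pd

data = pd.DataFrame({'area': [423967, 695662, 141297, 170312],  # example values
                     'pop': [38332521, 26448193, 19651127, 19552860]},
                    index=['California', 'Texas', 'New York', 'Florida'])
data['density'] = data['pop'] / data['area']

col = data['area']                          # indexing -> a column
label_rows = data['California':'New York']  # label slice -> rows, end label included
pos_rows = data[0:3]                        # positional slice -> rows 0, 1, 2
dense = data[data['density'] > 100]         # direct mask -> rows with density above 100
```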