- pandas is built on NumPy, so every NumPy ufunc that works on arrays also works on pandas Series and DataFrame objects.
- Applying a ufunc to a Series or DataFrame preserves its row and column labels.

# 1. Ufuncs: Index Preservation

- Unary operations preserve the index.
- Binary operations align the indices of both operands.
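
The two rules above can be sketched with a small example. The Series and their labels here are hypothetical and chosen only so the effect is easy to see; string labels make it clear that matching happens by label, not by position.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a Series with string labels instead of the default 0..n-1.
values = pd.Series([0.0, 1.0, 2.0], index=["a", "b", "c"])

# Unary ufunc: the result keeps the labels of the input.
exp_values = np.exp(values)
print(exp_values.index.tolist())

# Binary operation: values are matched by label, not by position.
left = pd.Series([1, 2], index=["a", "b"])
right = pd.Series([10, 20], index=["b", "a"])
total = left + right
print(total["a"], total["b"])
```

Here `total["a"]` combines 1 with 20, and `total["b"]` combines 2 with 10, even though the two Series store their labels in opposite order.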

In [2]:
# Define Series

import pandas as pd
import numpy as np

rng = np.random.RandomState(42)

ser = pd.Series(rng.randint(0,10,4))
ser

0    6
1    3
2    7
3    4
dtype: int64

In [8]:
# Define a dataframe
df = pd.DataFrame(rng.randint(0,10, (3,4)), columns=['A', 'B', 'C', 'D'])
df

|   | A | B | C | D |
|---|---|---|---|---|
| 0 | 8 | 1 | 9 | 8 |
| 1 | 9 | 4 | 1 | 3 |
| 2 | 6 | 7 | 2 | 0 |


In [6]:
# 1. Unary Operation on a Series

np.exp(ser)

0     403.428793
1      20.085537
2    1096.633158
3      54.598150
dtype: float64

In [7]:
# 2. Unary operation on a DataFrame
np.sin(df * np.pi / 4)

|   | A | B | C | D |
|---|---|---|---|---|
| 0 | -1.0 | 0.707107 | -2.449294e-16 | 1.0 |
| 1 | 1.224647e-16 | 1.0 | -1.0 | 1.224647e-16 |
| 2 | -2.449294e-16 | -1.0 | 0.7071068 | 0.7071068 |


# 2. Ufuncs: Index Alignment
- When we perform a binary operation, pandas aligns the indices of both Series or DataFrames.
- For Series, the row index is aligned.
- For DataFrames, both the row index and the column labels are aligned.
- If a label is missing from either operand, pandas assigns NaN to that position in the result.
- We can explicitly specify a fill value to use in place of the missing entries.
- Index alignment ensures that pandas always keeps values attached to their labels.
- This is especially helpful when working with misaligned or incomplete data.

## 2.1 Index alignment in Series

In [11]:
# Define two Series: area and population of some states

area = pd.Series({'Alaska': 1723337, 'Texas': 695662, 'California': 423967}, name='area')
population = pd.Series({'California': 38332521, 'Texas': 26448193, 'New York': 19651127}, name='population')

# Compute density = population / area

density = population / area

density


Alaska              NaN
California    90.413926
New York            NaN
Texas         38.018740
dtype: float64
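
The index of the result is the union of the two input indices, and any label present in only one input gets NaN. A minimal sketch of that, reusing the same area and population figures; the variable names `all_states` and `missing` are illustrative only:

```python
import pandas as pd

area = pd.Series({'Alaska': 1723337, 'Texas': 695662, 'California': 423967}, name='area')
population = pd.Series({'California': 38332521, 'Texas': 26448193, 'New York': 19651127}, name='population')

# Union of the labels from both Series.
all_states = area.index.union(population.index)
print(all_states.tolist())

# Division aligns on labels; states found in only one Series become NaN.
density = population / area
missing = density[density.isna()].index.tolist()
print(missing)
```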

- To avoid NaN, call the equivalent method instead of the operator and pass fill_value, which stands in for any entry that is missing from one of the operands.

In [13]:
# Define two series

A = pd.Series([2,4,6], index=[0,1,2])
B = pd.Series([1,3,5], index=[1,2,3])

A+B  # Using operator



0    NaN
1    5.0
2    9.0
3    NaN
dtype: float64

In [14]:
# Using method 

A.add(B, fill_value=0)

0    2.0
1    5.0
2    9.0
3    5.0
dtype: float64

## 2.2 Index alignment in DataFrame
- When a binary operation is applied to two DataFrames, both the row index and the column labels are aligned.

In [23]:
# Define DFs
np.random.seed(0)

A_df = pd.DataFrame(np.random.randint(0,10, (2,5)))
A_df

|   | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | 5 | 0 | 3 | 3 | 7 |
| 1 | 9 | 3 | 5 | 2 | 4 |


In [18]:
B_df = pd.DataFrame(np.random.randint(0,10, (4,3)))
B_df

|   | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 8 | 9 | 4 |
| 1 | 3 | 0 | 3 |
| 2 | 5 | 0 | 2 |
| 3 | 3 | 8 | 1 |


In [19]:
# Binary operation on DFs

A_df + B_df

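|   | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | 13.0 | 9.0 | 7.0 | NaN | NaN |
| 1 | 12.0 | 3.0 | 8.0 | NaN | NaN |
| 2 | NaN | NaN | NaN | NaN | NaN |
| 3 | NaN | NaN | NaN | NaN | NaN |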


In [25]:
# Use fill_value

A_df.add(B_df, fill_value= 0)

|   | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | 13.0 | 9.0 | 7.0 | 3.0 | 7.0 |
| 1 | 12.0 | 3.0 | 8.0 | 2.0 | 4.0 |
| 2 | 5.0 | 0.0 | 2.0 | NaN | NaN |
| 3 | 3.0 | 8.0 | 1.0 | NaN | NaN |


- Pandas is aligning indices and columns from both A_df and B_df. When an index or column is present in one DataFrame but not in the other, the missing values are replaced with the specified fill_value (which is 0 in this case).

- However, **if an index and column combination is missing in both DataFrames, the result will be NaN**. This happens because there is no data from either DataFrame for that particular index-column pair.
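
A minimal sketch of that distinction, with the two DataFrames written out literally using the values displayed above so it does not depend on the random seed. The final `fillna(0)` step is an assumption about what you might want next, not something the original example does.

```python
import pandas as pd

A_df = pd.DataFrame([[5, 0, 3, 3, 7], [9, 3, 5, 2, 4]])
B_df = pd.DataFrame([[8, 9, 4], [3, 0, 3], [5, 0, 2], [3, 8, 1]])

summed = A_df.add(B_df, fill_value=0)

# Cells (2, 3), (2, 4), (3, 3) and (3, 4) exist in neither input,
# so fill_value cannot help and they remain NaN.
still_missing = int(summed.isna().sum().sum())
print(still_missing)

# One possible follow-up: replace the remaining NaN values after the operation.
filled = summed.fillna(0)
print(filled)
```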



| Python Operator | Pandas Method(s)            |
|-----------------|-----------------------------|
| `+`             | `add()`                     |
| `-`             | `sub()`, `subtract()`       |
| `*`             | `mul()`, `multiply()`       |
| `/`             | `truediv()`, `div()`, `divide()` |
| `//`            | `floordiv()`                |
| `%`             | `mod()`                     |
| `**`            | `pow()`                     |
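
As a quick check of the table, the sketch below compares each operator with its method form on two small, hypothetical Series. The `equivalents` dictionary is just a convenience for this example.

```python
import operator
import pandas as pd

s = pd.Series([7, 8, 9])
t = pd.Series([2, 3, 4])

# Map each Python operator to the method names listed in the table.
equivalents = {
    operator.add: ["add"],
    operator.sub: ["sub", "subtract"],
    operator.mul: ["mul", "multiply"],
    operator.truediv: ["truediv", "div", "divide"],
    operator.floordiv: ["floordiv"],
    operator.mod: ["mod"],
    operator.pow: ["pow"],
}

# True only if every method gives the same result as its operator.
all_match = all(
    op(s, t).equals(getattr(s, name)(t))
    for op, names in equivalents.items()
    for name in names
)
print(all_match)
```

The method forms are useful mainly because they accept extra arguments such as fill_value and axis, which the operators cannot take.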


# 3. Ufuncs: Operations Between DataFrame and Series

- Operations between a DataFrame and a Series, or between objects of different shapes, follow NumPy-style broadcasting.
- pandas aligns the indices automatically before applying the operation.
- An operation between a Series and a DataFrame is performed row-wise by default; pass axis=0 to operate column-wise.
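
For comparison, here is a sketch of the same row-wise behaviour in plain NumPy next to the pandas version. The array is written out literally, so the example does not rely on a random seed.

```python
import numpy as np
import pandas as pd

A = np.array([[5, 0, 3, 3],
              [7, 9, 3, 5],
              [2, 4, 7, 6]])

# NumPy broadcasting: the first row is subtracted from every row.
row_diff = A - A[0]

# pandas does the same by default, after matching the Series labels
# (A, B, C, D) against the DataFrame's column labels.
df = pd.DataFrame(A, columns=list("ABCD"))
df_diff = df - df.iloc[0]

print((df_diff.to_numpy() == row_diff).all())
```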

In [31]:
# Define a DF
np.random.seed(0)
A = np.random.randint(0,10, size=(3,4))
df = pd.DataFrame(A, columns=list('ABCD'))
df

|   | A | B | C | D |
|---|---|---|---|---|
| 0 | 5 | 0 | 3 | 3 |
| 1 | 7 | 9 | 3 | 5 |
| 2 | 2 | 4 | 7 | 6 |


In [34]:
df.iloc[0]

A    5
B    0
C    3
D    3
Name: 0, dtype: int64

In [35]:
df - df.iloc[0]  # By default, the Series is subtracted from every row

|   | A | B | C | D |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 |
| 1 | 2 | 9 | 0 | 2 |
| 2 | -3 | 4 | 4 | 3 |


In [36]:
# To subtract a column from every column, use the method form with axis=0
df.subtract(df['A'], axis=0)

|   | A | B | C | D |
|---|---|---|---|---|
| 0 | 0 | -5 | -2 | -2 |
| 1 | 0 | 2 | -4 | -2 |
| 2 | 0 | 2 | 5 | 4 |


In [37]:
df.sub(df['A'])   

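|   | A | B | C | D | 0 | 1 | 2 |
|---|---|---|---|---|---|---|---|
| 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |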


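- In the call above, the default axis is 1 (columns), so the index of the Series is matched against the column labels of the DataFrame.
- The Series df['A'] is indexed by the row labels 0, 1, 2, while the columns are labelled A, B, C, D. No labels match, so every value in the result is NaN.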
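
The mismatch can be checked directly. This is a small sketch with the DataFrame written out literally; the names `shared`, `default_result`, and `by_rows` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame([[5, 0, 3, 3], [7, 9, 3, 5], [2, 4, 7, 6]],
                  columns=list("ABCD"))
col = df["A"]

# The Series is labelled 0, 1, 2; the columns are labelled A, B, C, D.
shared = set(col.index) & set(df.columns)
print(shared)

# Default axis: Series labels are matched against column labels, nothing matches.
default_result = df.sub(col)
print(default_result.isna().all().all())

# axis=0: Series labels are matched against the row index instead.
by_rows = df.sub(col, axis=0)
print(by_rows)
```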

In [38]:
halfrow = df.iloc[0, ::2]
halfrow


A    5
C    3
Name: 0, dtype: int64

In [39]:
df - halfrow  # Columns B and D are missing from halfrow, so they become NaN

|   | A | B | C | D |
|---|---|---|---|---|
| 0 | 0.0 | NaN | 0.0 | NaN |
| 1 | 2.0 | NaN | 0.0 | NaN |
| 2 | -3.0 | NaN | 4.0 | NaN |
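
If the NaN columns are not wanted, one option is to expand the partial row to all columns before subtracting. This is a sketch of an alternative using `Series.reindex`, not something the example above does, and treating the missing entries as 0 is an assumption that may not suit every dataset.

```python
import pandas as pd

df = pd.DataFrame([[5, 0, 3, 3], [7, 9, 3, 5], [2, 4, 7, 6]],
                  columns=list("ABCD"))
halfrow = df.iloc[0, ::2]  # only columns A and C

# Give halfrow an entry for every column, using 0 where it has none.
full_row = halfrow.reindex(df.columns, fill_value=0)

# Columns B and D are now left unchanged instead of becoming NaN.
result = df - full_row
print(result)
```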
