<div class="licence">
<span>Licence CC BY-NC-ND</span>
<span>Valérie Roy</span>
<span><img src="media/ensmp-25-alpha.png" /></span>
</div>

# **alignment** of **labels** (rows, columns)

   - *pandas* automatically **aligns labels**  
     to **perform** binary operations
   - operations are performed on values with the **same row** and **same column** label

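   - to be sure of **label alignment**, use the **pandas** operators and methods (*+*, *add*, ...)
   - historically, *numpy* Ufuncs **operated** on the underlying **ndarray** independently of the **labels**
   - recent *pandas* versions also align *pandas.Series* passed to a *numpy* Ufunc, so convert with *to_numpy()* when you want a purely positional operation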

In [None]:
import numpy as np
import pandas as pd

## **alignment** on *pandas.Series* (on **rows** labels)

In [None]:
s1 = pd.Series([1, 2, 3, 4],     index=['a', 'b', 'c', 'a'])
s2 = pd.Series([10, 20, 30, 40], index=['a', 'e', 'f', 'c'])

In [None]:
s1 + s2  # first 'a':  s1['a'] + s2['a'] = 1 + 10
         # second 'a': s1['a'] + s2['a'] = 4 + 10
         # s1['b'] + np.nan
         # s1['c'] + s2['c'] = 3 + 40
         # np.nan + s2['e']
         # np.nan + s2['f']

   - **missing** values are replaced by *numpy.nan*
   - note that a *numpy.nan* **"contaminates"** an expression:
      - *numpy.nan + 20 = numpy.nan*
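The contamination described above can be checked directly. This is a minimal sketch; the Series `s` is an illustrative value, not part of the original example.

```python
import numpy as np
import pandas as pd

# NaN propagates through arithmetic: the result is still NaN
result = np.nan + 20
print(np.isnan(result))  # True

# NaN never compares equal to itself, so test with isna() / np.isnan()
print(np.nan == np.nan)  # False

# illustrative Series with one missing value
s = pd.Series([1.0, np.nan, 3.0])
print((s + 20).isna().tolist())  # [False, True, False]
```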

In [None]:
s1.add(s2) # the same as s1 + s2

   - you can **fill** missing values
   - (here missing values are replaced by $0$)

In [None]:
s1.add(s2, fill_value=0)
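A small sketch of what *fill_value* does and does not do. The Series `u1`, `u2`, `t1` and `t2` are illustrative, with unique labels to keep the output easy to read. The fill value replaces a value that is missing on **one** side only; when the value is missing on both sides the result stays missing.

```python
import numpy as np
import pandas as pd

u1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
u2 = pd.Series([10, 20, 30], index=['a', 'c', 'e'])

filled = u1.add(u2, fill_value=0)
# 'a': 1 + 10, 'b': 2 + 0 (absent from u2), 'c': 3 + 20, 'e': 0 + 30 (absent from u1)
print(filled)

# a value that is NaN on both sides is not filled
t1 = pd.Series([1, np.nan], index=['a', 'b'])
t2 = pd.Series([10, np.nan], index=['a', 'b'])
both = t1.add(t2, fill_value=0)
print(both)  # 'a' is 11.0, 'b' is still NaN
```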

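   - **but** *numpy* arrays have **no labels** to align
   - recent *pandas* versions align *pandas.Series* even inside *numpy* Ufuncs, so convert with *to_numpy()* to add by position

In [None]:
np.add(s1.to_numpy(), s2.to_numpy()) # it adds the two numpy.ndarrays position by position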
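Whether a *numpy* Ufunc applied to two *pandas.Series* aligns the labels depends on the *pandas* version. A version-independent way to add strictly by position, sketched here with the same `s1` and `s2`, is to drop the labels first with *to_numpy()*.

```python
import numpy as np
import pandas as pd

s1 = pd.Series([1, 2, 3, 4],     index=['a', 'b', 'c', 'a'])
s2 = pd.Series([10, 20, 30, 40], index=['a', 'e', 'f', 'c'])

# to_numpy() drops the labels, so the addition is purely positional
positional = s1.to_numpy() + s2.to_numpy()
print(positional)  # [11 22 33 44]

# label-aligned result for comparison:
# 'b', 'e' and 'f' exist on one side only and give NaN
aligned = s1 + s2
print(aligned.isna().sum())  # 3
```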

## **alignment** on *pandas.DataFrame* (on **rows** and **columns** labels)

example
   - number of **kilometers** traveled by **bicycle**, **car** and **bus**
   - by **Garance**, **Nathalie** and **Baptiste**

In [None]:
names = ['Garance', 'Nathalie', 'Baptiste']

# one Series per means of transport, indexed by person
bicycle = pd.Series([280, 340, 150], index=names)
car = pd.Series([1500, 450, 670], index=names)
bus = pd.Series([30, 11, 36], index=names)

trips_in_january = pd.DataFrame({'bicycle': bicycle, 'car': car, 'bus': bus})


In [None]:
trips_in_january

In [None]:
bicycle = pd.Series([130, 80], index=['Garance', 'Baptiste']) # missing Nathalie's values
car = pd.Series([270, 890], index=['Nathalie', 'Baptiste'])   # missing Garance's values
bus = pd.Series([27, 130], index=['Garance', 'Nathalie'])     # missing Baptiste's values

trips_in_february = pd.DataFrame({'bicycle': bicycle, 'car': car, 'bus': bus})

In [None]:
trips_in_february # missing values are np.nan

In [None]:
trips_with_NaN = trips_in_january + trips_in_february # alignment is done on rows and columns

In [None]:
trips_with_NaN

In [None]:
trips = trips_in_january.add(trips_in_february, fill_value=0)  # alignment is done on rows and columns

In [None]:
trips
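To see what the alignment does before the addition, one option is to call *DataFrame.align* explicitly. The sketch below uses two small illustrative frames (`jan` and `feb` are hypothetical names and values, not the full tables above) whose rows and columns only partly overlap.

```python
import pandas as pd

# illustrative data: 'feb' lacks the 'car' column and the 'Nathalie' row
jan = pd.DataFrame({'bicycle': [280, 340], 'car': [1500, 450]},
                   index=['Garance', 'Nathalie'])
feb = pd.DataFrame({'bicycle': [130], 'bus': [27]},
                   index=['Garance'])

# align() reindexes both frames on the union of the row and column labels
jan_aligned, feb_aligned = jan.align(feb, join='outer')
print(sorted(jan_aligned.columns))  # ['bicycle', 'bus', 'car']
print(sorted(feb_aligned.index))    # ['Garance', 'Nathalie']

# fill_value=0 fills cells missing on one side only;
# ('Nathalie', 'bus') is missing on both sides and stays NaN
total = jan.add(feb, fill_value=0)
print(total)
```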

## **alignment** on *pandas.Series*  and *pandas.DataFrame*

In [None]:
df = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30], 'c': [100, 200, 300]}, 
                  index=['x', 'y', 'z'])

In [None]:
df

In [None]:
# we will add a **row**
s_row = pd.Series([0.10, 0.20, 0.30], 
                  index=['a', 'b', 'c'])
s_row

   - the *pandas.Series* is considered as a **row**
   - the row is **broadcast** over the three rows of the data frame
   - the **alignment** is done between the **Series labels** and the **column labels**

In [None]:
df + s_row
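As a quick check of the row behaviour, the sketch below compares the operator with the method form using *axis='columns'*, and shows that the matching uses labels, not positions. The name `shuffled` is introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30], 'c': [100, 200, 300]},
                  index=['x', 'y', 'z'])
s_row = pd.Series([0.10, 0.20, 0.30], index=['a', 'b', 'c'])

# the operator and the method with axis='columns' give the same result
by_operator = df + s_row
by_method = df.add(s_row, axis='columns')
print(by_operator.equals(by_method))  # True

# the Series labels are matched with the column labels,
# so reordering the Series does not change the values
shuffled = s_row[['c', 'a', 'b']]
by_shuffled = df + shuffled
print(by_shuffled.loc['x', 'c'])  # 100.3
```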

In [None]:
# now we try the same trick with a **column**:
# the series is indexed by the row labels instead

s_col = pd.Series([1000, 2000, 3000], index=['x', 'y', 'z'])

In [None]:
df + s_col # this is not what we want!
           # for pandas, the series is a **row**
           # 'x', 'y' and 'z' are considered as new **columns**
           # (the default axis is 1), so every value is NaN

you must specify the **axis**
   - *axis=0* **means** that the **Series labels** are matched against the **row index**
   - the Series is then **broadcast** across the **columns**

In [None]:
df.add(s_col, axis=0)
# s_col is 'x' [1000]
#          'y' [2000]
#          'z' [3000]

# s_col broadcast is 'x' [1000][1000][1000]
#                    'y' [2000][2000][2000]
#                    'z' [3000][3000][3000]
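The two behaviours can be compared side by side. This is a minimal sketch with the same `df` and `s_col`; the names `wrong` and `right` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30], 'c': [100, 200, 300]},
                  index=['x', 'y', 'z'])
s_col = pd.Series([1000, 2000, 3000], index=['x', 'y', 'z'])

# default (axis=1): the Series labels are matched with the columns;
# none match, so the result has six columns filled with NaN
wrong = df + s_col
print(sorted(wrong.columns))     # ['a', 'b', 'c', 'x', 'y', 'z']
print(wrong.isna().all().all())  # True

# axis=0 (or axis='index'): the Series labels are matched with the row index
right = df.add(s_col, axis=0)
print(right.loc['y', 'b'])  # 2020
```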