In [1]:
import pandas as pd

Data used for this tutorial:

<b> Air quality data</b>

<!-- For this tutorial, air quality data about NO2 is used, made available by openaq and using the py-openaq package. The air_quality_no2.csv data set provides NO2 values for the measurement stations FR04014, BETR801 and London Westminster in respectively Paris, Antwerp and London.

To raw data : https://github.com/pandas-dev/pandas/blob/master/doc/data/air_quality_no2.csv

--->

In [2]:
air_quality = pd.read_csv("data/air_quality_no2.csv", index_col=0, parse_dates=True)

In [3]:
air_quality.head()

datetime,station_antwerp,station_paris,station_london
2019-05-07 02:00:00,,,23.0
2019-05-07 03:00:00,50.5,25.0,19.0
2019-05-07 04:00:00,45.0,27.7,19.0
2019-05-07 05:00:00,,50.4,16.0
2019-05-07 06:00:00,,61.9,


# How to create new columns derived from existing columns?
![05_newcolumn_1.svg](attachment:05_newcolumn_1.svg)
https://pandas.pydata.org/docs/_images/05_newcolumn_1.svg

I want to express the NO2 concentration of the station in London in mg/m3.
(If we assume a temperature of 25 degrees Celsius and a pressure of 1013 hPa, the conversion factor is 1.882.)

In [4]:
air_quality["london_mg_per_cubic"] = air_quality["station_london"] * 1.882

In [5]:
air_quality.head()

datetime,station_antwerp,station_paris,station_london,london_mg_per_cubic
2019-05-07 02:00:00,,,23.0,43.286
2019-05-07 03:00:00,50.5,25.0,19.0,35.758
2019-05-07 04:00:00,45.0,27.7,19.0,35.758
2019-05-07 05:00:00,,50.4,16.0,30.112
2019-05-07 06:00:00,,61.9,,


To create a new column, use the [] brackets with the new column name on the left side of the assignment.

## **Note**

The calculation of the values is done <b>element-wise</b>. This means all values in the given column are multiplied by 1.882 at once, so no loop over the rows is needed.
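
As a minimal sketch of what element-wise means here, the example below compares the single vectorized expression with an explicit Python loop. The Series named london is a hypothetical stand-in built from the first London values shown above, not the full data set.

```python
import pandas as pd

# Hypothetical stand-in for the station_london column (first rows of the table).
london = pd.Series([23.0, 19.0, 19.0, 16.0, None], name="station_london")

# Vectorized: one expression scales every value in the column.
vectorized = london * 1.882

# The same calculation as an explicit loop, shown only for comparison.
looped = pd.Series([value * 1.882 for value in london], name="station_london")

# Both approaches produce the same values; the missing value stays missing.
print(vectorized.equals(looped))
print(vectorized)
```

The vectorized form is shorter and is generally the preferred way to derive a column in pandas.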

In [7]:
air_quality["ratio_paris_antwerp"] = (
    air_quality["station_paris"] / air_quality["station_antwerp"]
)

In [8]:
air_quality.head()

datetime,station_antwerp,station_paris,station_london,london_mg_per_cubic,ratio_paris_antwerp
2019-05-07 02:00:00,,,23.0,43.286,
2019-05-07 03:00:00,50.5,25.0,19.0,35.758,0.49505
2019-05-07 04:00:00,45.0,27.7,19.0,35.758,0.615556
2019-05-07 05:00:00,,50.4,16.0,30.112,
2019-05-07 06:00:00,,61.9,,,


The calculation is again element-wise, so the / is applied for the values in each row.

Other mathematical operators (+, -, \*, /) and logical operators (<, >, ==, ...) also work element-wise. The latter were already used in the subset data tutorial to filter rows of a table using a conditional expression.
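
The following sketch shows arithmetic and comparison operators applied to whole columns. The DataFrame named sample is a hypothetical miniature of the tutorial table, and the threshold of 30 is an arbitrary value chosen for illustration.

```python
import pandas as pd

# Hypothetical sample reusing two column names from the tutorial table.
sample = pd.DataFrame(
    {
        "station_paris": [25.0, 27.7, 50.4, 61.9],
        "station_london": [19.0, 19.0, 16.0, None],
    }
)

# Arithmetic between two columns is applied row by row.
difference = sample["station_paris"] - sample["station_london"]

# A comparison returns a boolean Series with one entry per row.
above_30 = sample["station_paris"] > 30

# Equality is tested with ==, not with the assignment operator =.
equals_19 = sample["station_london"] == 19.0

# The boolean Series can be used directly to filter rows.
high_paris = sample[above_30]
print(high_paris)
```

Rows where either operand is missing give a missing result for arithmetic and False for the comparison.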

If you need more advanced logic, you can use arbitrary Python code via apply().
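
A minimal sketch of apply() follows, assuming a hypothetical helper label_level and an arbitrary threshold of 40 that are not part of the tutorial. It shows the function being called once per value of a column, and once per row when axis=1 is passed.

```python
import pandas as pd

# Hypothetical sample reusing two column names from the tutorial table.
sample = pd.DataFrame(
    {
        "station_paris": [25.0, 27.7, 50.4, 61.9],
        "station_antwerp": [50.5, 45.0, None, None],
    }
)


def label_level(value, threshold=40.0):
    """Hypothetical helper: classify a single NO2 value."""
    if pd.isna(value):
        return "missing"
    return "high" if value > threshold else "low"


# Series.apply calls the function once for every value in the column.
sample["paris_level"] = sample["station_paris"].apply(label_level)

# DataFrame.apply with axis=1 passes each row to the function as a Series.
sample["max_station"] = sample[["station_paris", "station_antwerp"]].apply(
    lambda row: row.idxmax(), axis=1
)
print(sample)
```

Because apply() runs Python code for each value or row, it is usually slower than the built-in element-wise operators and is best kept for logic those operators cannot express.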

I want to rename the data columns to the corresponding station identifiers used by openAQ.

In [9]:
air_quality_renamed = air_quality.rename(
    columns={
        "station_antwerp": "BETR801",
        "station_paris": "FR04014",
        "station_london": "London Westminster",
    }
)

In [10]:
air_quality_renamed.head()

datetime,BETR801,FR04014,London Westminster,london_mg_per_cubic,ratio_paris_antwerp
2019-05-07 02:00:00,,,23.0,43.286,
2019-05-07 03:00:00,50.5,25.0,19.0,35.758,0.49505
2019-05-07 04:00:00,45.0,27.7,19.0,35.758,0.615556
2019-05-07 05:00:00,,50.4,16.0,30.112,
2019-05-07 06:00:00,,61.9,,,


The rename() function can be used for both row labels and column labels. Provide a dictionary with the current names as keys and the new names as values to update the corresponding names.

The mapping is not restricted to fixed names; it can also be a mapping function. For example, converting the column names to lowercase letters can be done with a function:

In [11]:
air_quality_renamed = air_quality_renamed.rename(columns=str.lower)

In [12]:
air_quality_renamed.head()

datetime,betr801,fr04014,london westminster,london_mg_per_cubic,ratio_paris_antwerp
2019-05-07 02:00:00,,,23.0,43.286,
2019-05-07 03:00:00,50.5,25.0,19.0,35.758,0.49505
2019-05-07 04:00:00,45.0,27.7,19.0,35.758,0.615556
2019-05-07 05:00:00,,50.4,16.0,30.112,
2019-05-07 06:00:00,,61.9,,,
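
One detail worth knowing about rename(): by default a key that matches no existing label is skipped without any error, so a misspelled column name goes unnoticed. The sketch below uses a hypothetical two-row DataFrame to show this default, the errors="raise" option that reports such keys, and the index argument for row labels.

```python
import pandas as pd

# Hypothetical sample with simple row labels.
sample = pd.DataFrame(
    {"station_paris": [25.0, 27.7], "station_london": [19.0, 19.0]},
    index=["a", "b"],
)

# A key that matches no column is skipped silently by default.
unchanged = sample.rename(columns={"stationlondon": "London Westminster"})
print(list(unchanged.columns))

# errors="raise" turns the same misspelled key into a KeyError.
try:
    sample.rename(columns={"stationlondon": "London Westminster"}, errors="raise")
    typo_detected = False
except KeyError:
    typo_detected = True
print(typo_detected)

# index renames row labels; a function can be passed for either axis.
renamed = sample.rename(index={"a": "first"}, columns=str.upper)
print(renamed)
```

Passing errors="raise" is a cheap safeguard whenever the mapping is typed by hand.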
