## Pandas `DataFrame` columns
A `DataFrame` is organized around its columns: each column has a label, and selecting one column gives you a `Series` of values.

In [1]:
import pandas as pd

Let's recreate the two `DataFrame`s we examined before: one on baseball teams and one on air quality.

In [2]:
baseball_df = pd.DataFrame({
    'City': ['Pittsburgh', 'Cincinnati', 'Chicago', 'St. Louis', 'Milwaukee'], 
    'Team': ['Pirates', 'Reds', 'Cubs', 'Cardinals', 'Brewers'], 
    'Division': 5 * ['Central'], 
    'League': 5 * ['NL'], 
})
baseball_df

Unnamed: 0,City,Team,Division,League
0,Pittsburgh,Pirates,Central,NL
1,Cincinnati,Reds,Central,NL
2,Chicago,Cubs,Central,NL
3,St. Louis,Cardinals,Central,NL
4,Milwaukee,Brewers,Central,NL


In [3]:
column_headers = ["PM2.5", "PM10", "CO (ppm)", "NO2 (ppb)", "Temperature (C)", "Humidity (%)"]
air_quality_df = pd.DataFrame([
    [12.5, 25.0, 0.4, 15, 22.5, 45],
    [35.2, 50.1, 0.8, 30, 24.0, 50],
    [55.1, 80.3, 1.2, 45, 26.5, 55],
    [22.3, 40.5, 0.6, 20, 21.0, 48]
], columns=column_headers)
air_quality_df

Unnamed: 0,PM2.5,PM10,CO (ppm),NO2 (ppb),Temperature (C),Humidity (%)
0,12.5,25.0,0.4,15,22.5,45
1,35.2,50.1,0.8,30,24.0,50
2,55.1,80.3,1.2,45,26.5,55
3,22.3,40.5,0.6,20,21.0,48


## Selecting columns
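To select a column of a pandas `DataFrame`, you can use:
* bracket notation: `df['column']`
* dot notation: `df.column` (only works when the column name is a valid Python identifier and does not clash with an existing `DataFrame` attribute or method)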

In [4]:
baseball_df["City"]

0    Pittsburgh
1    Cincinnati
2       Chicago
3     St. Louis
4     Milwaukee
Name: City, dtype: object

In [5]:
baseball_df.City

0    Pittsburgh
1    Cincinnati
2       Chicago
3     St. Louis
4     Milwaukee
Name: City, dtype: object
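
Dot notation has limits that bracket notation does not. The sketch below uses a small made-up frame (the `count` column is a hypothetical example, not part of the data above) to show two cases where `df.column` cannot be used to reach a column.

```python
import pandas as pd

# Hypothetical frame for illustration only.
air = pd.DataFrame({"PM2.5": [12.5, 35.2], "count": [1, 2]})

# Bracket notation works for any column label.
pm25 = air["PM2.5"]

# air.PM2.5 would be a SyntaxError: "PM2.5" is not a valid Python identifier.

# "count" is also the name of a DataFrame method, so dot access returns
# the method rather than the column.
count_attr = air.count
count_col = air["count"]

print(callable(count_attr))
print(count_col.tolist())
```

When in doubt, bracket notation is the safer choice because it accepts any column label.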

To select multiple columns, pass a list of column names inside the brackets. The result is a `DataFrame`, not a `Series`.

In [6]:
air_quality_df.columns

Index(['PM2.5', 'PM10', 'CO (ppm)', 'NO2 (ppb)', 'Temperature (C)',
       'Humidity (%)'],
      dtype='object')

In [7]:
pm_air_quality = air_quality_df[['PM2.5', 'PM10']]
pm_air_quality

Unnamed: 0,PM2.5,PM10
0,12.5,25.0
1,35.2,50.1
2,55.1,80.3
3,22.3,40.5


In [8]:
type(pm_air_quality)

pandas.core.frame.DataFrame
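
The type of the result depends on what goes inside the brackets, not on how many columns come back. As a minimal sketch on a two-column frame built only for this example, a single label returns a `Series`, while a one-element list returns a one-column `DataFrame`.

```python
import pandas as pd

# Small illustrative frame with the same column names as above.
df = pd.DataFrame({"PM2.5": [12.5, 35.2], "PM10": [25.0, 50.1]})

as_series = df["PM2.5"]    # single label -> Series
as_frame = df[["PM2.5"]]   # list of labels -> DataFrame, even with one label

print(type(as_series).__name__)
print(type(as_frame).__name__)
print(as_frame.shape)
```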

## Adding columns

Use bracket notation with a new column name to create a column. When you assign a list, it must contain one value per row.

In [9]:
baseball_df['Wins'] = [87, 34, 54, 23, 23]
baseball_df

Unnamed: 0,City,Team,Division,League,Wins
0,Pittsburgh,Pirates,Central,NL,87
1,Cincinnati,Reds,Central,NL,34
2,Chicago,Cubs,Central,NL,54
3,St. Louis,Cardinals,Central,NL,23
4,Milwaukee,Brewers,Central,NL,23
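
A list assigned to a new column has to match the number of rows. The sketch below, on a hypothetical three-row frame, shows what happens when it does not: pandas raises a `ValueError` and the column is not created.

```python
import pandas as pd

# Hypothetical three-row frame for illustration.
teams = pd.DataFrame({"Team": ["Pirates", "Reds", "Cubs"]})

error_message = None
try:
    teams["Wins"] = [87, 34]  # two values for three rows
except ValueError as err:
    error_message = str(err)
    print("Could not add column:", error_message)

# The failed assignment did not add the column.
print("Wins" in teams.columns)

# A list with one value per row works.
teams["Wins"] = [87, 34, 54]
print(teams)
```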


To fill a new column with a constant value, assign a single value to it; pandas repeats that value in every row.

In [10]:
baseball_df['Season'] = 23
baseball_df

Unnamed: 0,City,Team,Division,League,Wins,Season
0,Pittsburgh,Pirates,Central,NL,87,23
1,Cincinnati,Reds,Central,NL,34,23
2,Chicago,Cubs,Central,NL,54,23
3,St. Louis,Cardinals,Central,NL,23,23
4,Milwaukee,Brewers,Central,NL,23,23
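
Bracket assignment changes the `DataFrame` it is applied to. If you would rather keep the original untouched, `DataFrame.assign` is an alternative (not used elsewhere in this notebook) that returns a new frame with the extra column. A minimal sketch on a hypothetical two-row frame:

```python
import pandas as pd

# Hypothetical frame for illustration.
teams = pd.DataFrame({"Team": ["Pirates", "Reds"]})

# assign() returns a new DataFrame; the scalar is repeated in every row.
with_season = teams.assign(Season=23)

print(list(teams.columns))        # original frame is unchanged
print(list(with_season.columns))
print(with_season["Season"].tolist())
```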


## Deleting columns
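Use `df.drop(columns=[<columns>])` to remove columns. By default, `drop` returns a new `DataFrame` and leaves the original unchanged, so either assign the result to a variable or pass `inplace=True`.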

In [11]:
# drop() returns a new DataFrame; without assignment, baseball_df is unchanged.
baseball_df.drop(columns=["Season"])
baseball_df

Unnamed: 0,City,Team,Division,League,Wins,Season
0,Pittsburgh,Pirates,Central,NL,87,23
1,Cincinnati,Reds,Central,NL,34,23
2,Chicago,Cubs,Central,NL,54,23
3,St. Louis,Cardinals,Central,NL,23,23
4,Milwaukee,Brewers,Central,NL,23,23


In [12]:
# Assign the result to keep it. drop() already returns a new object, so .copy() is not needed.
baseball_dropped = baseball_df.drop(columns=["Season"])
baseball_dropped

Unnamed: 0,City,Team,Division,League,Wins
0,Pittsburgh,Pirates,Central,NL,87
1,Cincinnati,Reds,Central,NL,34
2,Chicago,Cubs,Central,NL,54
3,St. Louis,Cardinals,Central,NL,23
4,Milwaukee,Brewers,Central,NL,23


In [13]:
# inplace=True modifies baseball_df directly and returns None.
baseball_df.drop(columns=["Wins"], inplace=True)
baseball_df

Unnamed: 0,City,Team,Division,League,Season
0,Pittsburgh,Pirates,Central,NL,23
1,Cincinnati,Reds,Central,NL,23
2,Chicago,Cubs,Central,NL,23
3,St. Louis,Cardinals,Central,NL,23
4,Milwaukee,Brewers,Central,NL,23
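
The three cells above correspond to three ways of calling `drop`. The sketch below repeats them side by side on a hypothetical two-row frame and also shows a common pitfall: with `inplace=True` the call returns `None`, so its result should not be assigned back to the variable.

```python
import pandas as pd

# Hypothetical frame for illustration.
teams = pd.DataFrame({
    "Team": ["Pirates", "Reds"],
    "Wins": [87, 34],
    "Season": [23, 23],
})

# 1. Result discarded: teams keeps all of its columns.
teams.drop(columns=["Season"])
cols_after_discard = list(teams.columns)

# 2. Result assigned: the new frame lacks "Season", teams is still unchanged.
without_season = teams.drop(columns=["Season"])

# 3. inplace=True: teams itself is modified and the call returns None.
result = teams.drop(columns=["Wins"], inplace=True)

print(cols_after_discard)
print(list(without_season.columns))
print(result, list(teams.columns))
```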
