In [1]:
import pandas as pd

# DataFrame Manipulations

## Create Weather datasets

In [2]:
eu_weather_df = pd.DataFrame({
  "town": ["Atina", "Oslo", "London"],
  "temp":[35,21,25],
  "rain": [False, False, True ]
})
eu_weather_df

Unnamed: 0,town,temp,rain
0,Atina,35,False
1,Oslo,21,False
2,London,25,True


In [3]:
bg_weather_df = pd.DataFrame({
  "town": ["Sofia", "Sandanski", "Pleven"],
  "temp":[25,32,21],
  "rain": [False, False, True ]
})
bg_weather_df

Unnamed: 0,town,temp,rain
0,Sofia,25,False
1,Sandanski,32,False
2,Pleven,21,True


## Summary statistics and info

In [4]:
# Summary statistics
eu_weather_df.describe()

Unnamed: 0,temp
count,3.0
mean,27.0
std,7.211103
min,21.0
25%,23.0
50%,25.0
75%,30.0
max,35.0
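
By default `describe()` summarizes only the numeric columns, which is why only `temp` appears above. As a small sketch (the variable name `full_summary` is just for illustration), passing `include="all"` asks pandas to summarize the `town` and `rain` columns as well:

```python
import pandas as pd

eu_weather_df = pd.DataFrame({
    "town": ["Atina", "Oslo", "London"],
    "temp": [35, 21, 25],
    "rain": [False, False, True],
})

# include="all" summarizes every column, not only the numeric ones;
# statistics that do not apply to a column are shown as NaN
full_summary = eu_weather_df.describe(include="all")
print(full_summary)
```

Non-numeric columns get rows such as `unique`, `top` and `freq` instead of `mean` and `std`.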


In [5]:
# Information about the DataFrame
eu_weather_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   town    3 non-null      object
 1   temp    3 non-null      int64 
 2   rain    3 non-null      bool  
dtypes: bool(1), int64(1), object(1)
memory usage: 183.0+ bytes


## Rearrange columns

In [6]:
# indexing with a list of column names selects those columns, in the given order, into a new DataFrame
eu_weather_df = eu_weather_df[["temp", "rain","town"]]
eu_weather_df


Unnamed: 0,temp,rain,town
0,35,False,Atina
1,21,False,Oslo
2,25,True,London


In [7]:
# now let's restore the original order:
eu_weather_df = eu_weather_df[['town', 'temp', 'rain']]
eu_weather_df


Unnamed: 0,town,temp,rain
0,Atina,35,False
1,Oslo,21,False
2,London,25,True
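
When a DataFrame has many columns, typing the full list by hand gets tedious. One possible approach, sketched here with illustrative names (`first`, `new_order`), is to build the list from `df.columns` and move a single column to the front:

```python
import pandas as pd

df = pd.DataFrame({
    "town": ["Atina", "Oslo", "London"],
    "temp": [35, 21, 25],
    "rain": [False, False, True],
})

# put "rain" first and keep the remaining columns in their current order
first = "rain"
new_order = [first] + [col for col in df.columns if col != first]
reordered = df[new_order]

# reindex(columns=...) produces the same column order
same = df.reindex(columns=new_order)

print(list(reordered.columns))
```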


## Insert Columns

### Add new column at the end

`df[new_col_name] = column_data`

Note that if the column already exists, its data will be overwritten.

In [8]:
eu_weather_df['wind'] = [9.5, 7.5, 4]
eu_weather_df

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  eu_weather_df['wind'] = [9.5, 7.5, 4]


Unnamed: 0,town,temp,rain,wind
0,Atina,35,False,9.5
1,Oslo,21,False,7.5
2,London,25,True,4.0
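
The message above is pandas' `SettingWithCopyWarning`. It is raised here because `eu_weather_df` was earlier reassigned to a column selection of itself, so pandas cannot tell whether the assignment is meant to reach the original data. A minimal sketch of two ways to avoid it (the names `reordered_df` and `with_wind_df` are illustrative): take an explicit `.copy()` after selecting, or use `assign()`, which returns a new DataFrame.

```python
import pandas as pd

eu_weather_df = pd.DataFrame({
    "town": ["Atina", "Oslo", "London"],
    "temp": [35, 21, 25],
    "rain": [False, False, True],
})

# .copy() makes the selection an independent DataFrame,
# so adding a column to it is unambiguous
reordered_df = eu_weather_df[["town", "temp", "rain"]].copy()
reordered_df["wind"] = [9.5, 7.5, 4]

# assign() returns a new DataFrame and leaves its input untouched
with_wind_df = eu_weather_df.assign(wind=[9.5, 7.5, 4])

print(reordered_df)
```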


### Insert column into DF at specified location: df.insert()

Reference: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.insert.html

In [9]:
# insert() modifies the DataFrame in place and fails if the column already exists,
# so it is good practice to check for the column before inserting it
if "wind2" not in eu_weather_df.columns:
    eu_weather_df.insert(1, "wind2", [1.5, 7.5, 4])

eu_weather_df

Unnamed: 0,town,wind2,temp,rain,wind
0,Atina,1.5,35,False,9.5
1,Oslo,7.5,21,False,7.5
2,London,4.0,25,True,4.0
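
To illustrate why the existence check matters, here is a small sketch on a separate, made-up DataFrame `df`: with the default `allow_duplicates=False`, a second `insert()` of the same column name raises a `ValueError`.

```python
import pandas as pd

df = pd.DataFrame({
    "town": ["Atina", "Oslo", "London"],
    "temp": [35, 21, 25],
})

# first insert succeeds and places "wind2" at position 1
df.insert(1, "wind2", [1.5, 7.5, 4])

# a second insert of the same name raises ValueError
try:
    df.insert(1, "wind2", [0, 0, 0])
    raised = False
except ValueError as err:
    raised = True
    print("insert failed:", err)
```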


## Deleting columns

### In place: with del or pop()

In [10]:
if 'wind' in eu_weather_df:
    del eu_weather_df["wind"]
    # alternative: eu_weather_df.pop("wind") removes the column and returns it

eu_weather_df

Unnamed: 0,town,wind2,temp,rain
0,Atina,1.5,35,False
1,Oslo,7.5,21,False
2,London,4.0,25,True
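
The difference between `del` and `pop()` is the return value: `pop()` hands back the removed column as a Series. A short sketch on a separate illustrative DataFrame `df`:

```python
import pandas as pd

df = pd.DataFrame({
    "town": ["Atina", "Oslo", "London"],
    "wind": [9.5, 7.5, 4.0],
})

# pop() removes the column in place and returns it as a Series
wind = df.pop("wind")

print(wind)
print(list(df.columns))
```

This is handy when the column should be removed from one DataFrame but kept for later use.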


### With the DataFrame.drop() method

Reference: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.drop.html

Note that the `df.drop()` method can be used to remove rows or columns.

Like most DataFrame methods, `df.drop()` does not modify the original object; it returns a new, modified object. If you want the operation to be performed in place, pass the `inplace=True` argument.

In [11]:
# we can use the columns argument to drop column(s)
eu_weather_df.drop(columns=["temp","rain"])

Unnamed: 0,town,wind2
0,Atina,1.5
1,Oslo,7.5
2,London,4.0


In [12]:
# or we can pass a list of column names as the first argument and specify axis=1 to drop columns:
eu_weather_df.drop(["temp","rain"], axis=1)

Unnamed: 0,town,wind2
0,Atina,1.5
1,Oslo,7.5
2,London,4.0


In [13]:
# note that the original DataFrame is not modified, as we did not set inplace=True
eu_weather_df

Unnamed: 0,town,wind2,temp,rain
0,Atina,1.5,35,False
1,Oslo,7.5,21,False
2,London,4.0,25,True


In [14]:
# so, let's remove columns from eu_weather_df inplace:
eu_weather_df.drop(columns=["wind2"], inplace=True)
eu_weather_df

Unnamed: 0,town,temp,rain
0,Atina,35,False
1,Oslo,21,False
2,London,25,True
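
The cells above only drop columns. As a sketch of the other uses mentioned earlier (on a separate illustrative DataFrame `df`, with `humidity` as a deliberately missing column name), `drop()` can also remove rows by index label, and `errors="ignore"` skips labels that do not exist instead of raising a `KeyError`:

```python
import pandas as pd

df = pd.DataFrame({
    "town": ["Atina", "Oslo", "London"],
    "temp": [35, 21, 25],
    "rain": [False, False, True],
})

# drop the row with index label 1 (Oslo); the other labels are kept as they are
without_oslo = df.drop(index=[1])

# "humidity" does not exist; errors="ignore" returns the data unchanged
unchanged = df.drop(columns=["humidity"], errors="ignore")

print(without_oslo)
```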


## Concatenate DataFrames along rows or columns (a particular axis)

The `pd.concat()` function joins DataFrames along a particular axis: rows (`axis=0`, the default) or columns (`axis=1`). It is useful for combining data from different sources or for appending data programmatically.

Reference: https://pandas.pydata.org/docs/reference/api/pandas.concat.html

### Concatenate Rows (Vertical concatenation)

Using `pd.concat()`, we can stack the rows of several DataFrames into a new one.

In [15]:
# keep original indexes (default):
world_weather_df = pd.concat([eu_weather_df, bg_weather_df])
world_weather_df

Unnamed: 0,town,temp,rain
0,Atina,35,False
1,Oslo,21,False
2,London,25,True
0,Sofia,25,False
1,Sandanski,32,False
2,Pleven,21,True


In [16]:
# auto indexing:
world_weather_df = pd.concat([eu_weather_df, bg_weather_df], ignore_index=True)
world_weather_df

Unnamed: 0,town,temp,rain
0,Atina,35,False
1,Oslo,21,False
2,London,25,True
3,Sofia,25,False
4,Sandanski,32,False
5,Pleven,21,True
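
The choice between the two variants above matters for label-based lookups. A small sketch (variable names such as `stacked` and `renumbered` are illustrative): when the original indexes are kept, the labels repeat, so `.loc[0]` matches two rows; with `ignore_index=True` every label is unique.

```python
import pandas as pd

eu_weather_df = pd.DataFrame({
    "town": ["Atina", "Oslo", "London"],
    "temp": [35, 21, 25],
    "rain": [False, False, True],
})
bg_weather_df = pd.DataFrame({
    "town": ["Sofia", "Sandanski", "Pleven"],
    "temp": [25, 32, 21],
    "rain": [False, False, True],
})

# original indexes kept: label 0 appears twice
stacked = pd.concat([eu_weather_df, bg_weather_df])
rows_labelled_0 = stacked.loc[0]   # a DataFrame with two rows

# fresh 0..5 index: label 0 appears once
renumbered = pd.concat([eu_weather_df, bg_weather_df], ignore_index=True)
first_row = renumbered.loc[0]      # a single row, returned as a Series

print(stacked.index.is_unique, renumbered.index.is_unique)
```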


### Adding hierarchical index (multi-index)

Use the `keys` parameter to create a hierarchical index, where each key labels the rows that came from the corresponding DataFrame.

In [17]:
# add keys for each DF, which will result in multiindex:
world_weather_df = pd.concat(
    [eu_weather_df, bg_weather_df],
    keys=["EU", "BG"]
)

world_weather_df.index

MultiIndex([('EU', 0),
            ('EU', 1),
            ('EU', 2),
            ('BG', 0),
            ('BG', 1),
            ('BG', 2)],
           )

In [18]:
# we can retrieve data by index:
world_weather_df.loc["BG"]

Unnamed: 0,town,temp,rain
0,Sofia,25,False
1,Sandanski,32,False
2,Pleven,21,True


In [19]:
# we can get a single row from a multi-index by passing a tuple:
world_weather_df.loc[("BG",1)]

town    Sandanski
temp           32
rain        False
Name: (BG, 1), dtype: object
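
The index levels created by `keys` are unnamed by default. As a sketch, the `names` parameter of `pd.concat()` can label them (the level names `region` and `row` are chosen here only for illustration), and `reset_index(level=...)` then turns the key level into an ordinary column:

```python
import pandas as pd

eu_weather_df = pd.DataFrame({
    "town": ["Atina", "Oslo", "London"],
    "temp": [35, 21, 25],
    "rain": [False, False, True],
})
bg_weather_df = pd.DataFrame({
    "town": ["Sofia", "Sandanski", "Pleven"],
    "temp": [25, 32, 21],
    "rain": [False, False, True],
})

# names= labels the two index levels produced by keys=
world_weather_df = pd.concat(
    [eu_weather_df, bg_weather_df],
    keys=["EU", "BG"],
    names=["region", "row"],
)

# move the "region" level out of the index into a regular column
flat_df = world_weather_df.reset_index(level="region")

print(flat_df)
```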

### Concatenate Columns (Horizontal concatenation)

Using `pd.concat()`, we can also place the columns of several DataFrames side by side.
Note that we must pass `axis=1` to concatenate along columns rather than rows.

In [20]:
# define wind_df
wind_df = pd.DataFrame([3.4, 2, 6.5], columns=["wind"])
wind_df

Unnamed: 0,wind
0,3.4
1,2.0
2,6.5


In [21]:
# to add the columns from wind_df to bg_weather_df, we have to specify axis=1
new_bg_weather_df = pd.concat([bg_weather_df, wind_df], axis=1)
new_bg_weather_df

Unnamed: 0,town,temp,rain,wind
0,Sofia,25,False,3.4
1,Sandanski,32,False,2.0
2,Pleven,21,True,6.5
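
The result above lines up neatly because both DataFrames share the index 0, 1, 2. Horizontal concatenation aligns rows by index label, not by position. A small sketch with a hypothetical `partial_wind_df` that has no value for label 1 shows the effect: the missing entry becomes `NaN`.

```python
import pandas as pd

bg_weather_df = pd.DataFrame({
    "town": ["Sofia", "Sandanski", "Pleven"],
    "temp": [25, 32, 21],
    "rain": [False, False, True],
})

# wind measurements exist only for index labels 0 and 2
partial_wind_df = pd.DataFrame({"wind": [3.4, 2.0]}, index=[0, 2])

# rows are matched by index label; label 1 has no wind value, so it gets NaN
combined_df = pd.concat([bg_weather_df, partial_wind_df], axis=1)

print(combined_df)
```

If the indexes of the inputs differ, it is worth aligning them before concatenating, for example with `reset_index(drop=True)`.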
