## Pandas Tutorial 8: Concatenating Dataframes

In the previous tutorial, we explored how to group and analyze data using the `groupby()` method. Now, we'll dive into another essential feature for working with data: **concatenation**. This tutorial shows how to use Pandas' **`concat()`** function to join or append DataFrames, making it easier to combine data from multiple sources or segments.

#### Topics covered:
* **What is `concat()`?**
* **Concatenating Two DataFrames Using `concat()`**
* **Using the `ignore_index` Argument in `concat()`**
* **List of Arguments for the `concat()` Function**
* **What is "keys"? Passing "keys" to the `concat()` Function**
* **Using the `axis` Argument in `concat()`**
* **Joining a DataFrame with a Series Using the `concat()` Function**

This tutorial will build on the data manipulation techniques discussed earlier, helping you efficiently merge and combine DataFrames for more complex data analysis tasks.

In [1]:
import pandas as pd

In [3]:
# Creating a DataFrame for Indian Cities' Weather Data
india_weather = pd.DataFrame({
    "city": ["mumbai","delhi","bangalore"],
    "temperature": [32,45,30],
    "humidity": [80,60,70]
})
india_weather

Unnamed: 0,city,temperature,humidity
0,mumbai,32,80
1,delhi,45,60
2,bangalore,30,70


In [4]:
# Creating a DataFrame for US Cities' Weather Data
us_weather = pd.DataFrame({
    "city": ["new york","chicago","orlando"],
    "temperature": [21,14,35],
    "humidity": [68,65,75]
})
us_weather

Unnamed: 0,city,temperature,humidity
0,new york,21,68
1,chicago,14,65
2,orlando,35,75


### Concatenating DataFrames Using `concat()`

The `concat()` function in Pandas is used to concatenate multiple DataFrames. In this example, the `india_weather` and `us_weather` DataFrames are concatenated along the rows (default behavior), resulting in a single DataFrame containing data from both Indian and US cities.

**Key features:**
- `pd.concat([df1, df2])`: Combines DataFrames by stacking them vertically (row-wise) by default.
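- Each DataFrame keeps its original index labels, so labels can repeat in the result (here 0, 1, 2, 0, 1, 2) unless you pass `ignore_index=True`, which renumbers the rows from 0.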

In [5]:
# Concatenates the india_weather and us_weather DataFrames along rows (default axis=0)
df = pd.concat([india_weather, us_weather])
df

Unnamed: 0,city,temperature,humidity
0,mumbai,32,80
1,delhi,45,60
2,bangalore,30,70
0,new york,21,68
1,chicago,14,65
2,orlando,35,75


In [6]:
# Same concatenation, but ignore_index=True discards the original labels and renumbers the rows 0 through 5
df = pd.concat([india_weather, us_weather], ignore_index=True)
df

Unnamed: 0,city,temperature,humidity
0,mumbai,32,80
1,delhi,45,60
2,bangalore,30,70
3,new york,21,68
4,chicago,14,65
5,orlando,35,75
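
To make the effect of repeated index labels concrete, here is a minimal sketch that rebuilds the two DataFrames from above. The variable names (`stacked`, `renumbered`, and so on) are illustrative only. It also uses `verify_integrity=True`, a `concat()` argument not otherwise covered in this tutorial, which raises an error when the combined index would contain duplicates.

```python
import pandas as pd

india_weather = pd.DataFrame({
    "city": ["mumbai", "delhi", "bangalore"],
    "temperature": [32, 45, 30],
    "humidity": [80, 60, 70],
})
us_weather = pd.DataFrame({
    "city": ["new york", "chicago", "orlando"],
    "temperature": [21, 14, 35],
    "humidity": [68, 65, 75],
})

# Without ignore_index, the labels 0, 1, 2 each appear twice in the result.
stacked = pd.concat([india_weather, us_weather])
rows_labelled_zero = stacked.loc[0]  # two rows share the label 0
duplicate_cities = rows_labelled_zero["city"].tolist()

# With ignore_index=True, every row gets its own label.
renumbered = pd.concat([india_weather, us_weather], ignore_index=True)
has_unique_index = renumbered.index.is_unique

# verify_integrity=True refuses to build an index with duplicate labels.
try:
    pd.concat([india_weather, us_weather], verify_integrity=True)
    raised = False
except ValueError:
    raised = True
```

Selecting `stacked.loc[0]` returns one row from each source DataFrame, which is usually not what you want when the label is meant to identify a single row.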


### Concatenating DataFrames with `keys` Using `concat()`

The `keys` argument in the `concat()` function allows you to create a hierarchical index, which helps differentiate between the concatenated DataFrames. In this example, the `india_weather` and `us_weather` DataFrames are concatenated, and the keys `'india'` and `'us'` are assigned, creating a multi-level index.

**Key features:**
- `keys=["india", "us"]`: Adds a hierarchical key for each DataFrame, which can be useful for distinguishing the sources of data.
- The resulting DataFrame has a multi-index, where the top-level index indicates whether the data belongs to India or the US.

In [7]:
# Concatenates the india_weather and us_weather DataFrames, adding hierarchical keys ('india' and 'us') to differentiate between the two datasets
df = pd.concat([india_weather, us_weather], keys=["india", "us"])
df

Unnamed: 0,Unnamed: 1,city,temperature,humidity
india,0,mumbai,32,80
india,1,delhi,45,60
india,2,bangalore,30,70
us,0,new york,21,68
us,1,chicago,14,65
us,2,orlando,35,75
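
The two index levels in the output above have no names. As a hedged sketch, the `names` argument of `concat()` can label them, and `reset_index()` can then turn the key level into an ordinary column. The level names `"country"` and `"row"` are assumptions chosen for this example, not part of the original notebook.

```python
import pandas as pd

india_weather = pd.DataFrame({
    "city": ["mumbai", "delhi", "bangalore"],
    "temperature": [32, 45, 30],
    "humidity": [80, 60, 70],
})
us_weather = pd.DataFrame({
    "city": ["new york", "chicago", "orlando"],
    "temperature": [21, 14, 35],
    "humidity": [68, 65, 75],
})

# names= labels the levels of the hierarchical index created by keys=.
labelled = pd.concat(
    [india_weather, us_weather],
    keys=["india", "us"],
    names=["country", "row"],  # hypothetical level names for illustration
)
level_names = list(labelled.index.names)

# Move the key level out of the index and into a regular column.
flat = labelled.reset_index(level="country")
```

Having the key as a regular column can be convenient if you later want to filter or group by the data source.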


### Accessing Data Using `loc[]` with Multi-level Index

When using `concat()` with `keys`, the resulting DataFrame has a multi-level index. You can use the `loc[]` indexer to access data corresponding to a specific key. In this example, `df.loc["india"]` retrieves all rows associated with the `'india'` key, which contains weather data for Indian cities.

**Key features:**
- `loc["key"]`: Allows you to access all data associated with a specific key in a multi-indexed DataFrame.
- Useful for selecting data from different groups after concatenation.

In [8]:
# Retrieves the rows corresponding to the `india` key from the multi-index DataFrame df
df.loc["india"]

Unnamed: 0,city,temperature,humidity
0,mumbai,32,80
1,delhi,45,60
2,bangalore,30,70


In [9]:
# Retrieves the rows corresponding to the `us` key from the multi-index DataFrame df
df.loc["us"]

Unnamed: 0,city,temperature,humidity
0,new york,21,68
1,chicago,14,65
2,orlando,35,75
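
Beyond selecting a whole group, the multi-level index also supports selecting a single row with a tuple, or slicing across groups on the inner level. The following is a small sketch under the same data as above; the variable names are illustrative.

```python
import pandas as pd

india_weather = pd.DataFrame({
    "city": ["mumbai", "delhi", "bangalore"],
    "temperature": [32, 45, 30],
    "humidity": [80, 60, 70],
})
us_weather = pd.DataFrame({
    "city": ["new york", "chicago", "orlando"],
    "temperature": [21, 14, 35],
    "humidity": [68, 65, 75],
})
df = pd.concat([india_weather, us_weather], keys=["india", "us"])

# A tuple (outer key, inner label) selects exactly one row as a Series.
chicago_row = df.loc[("us", 1)]

# Adding a column label narrows the selection to a single value.
chicago_city = df.loc[("us", 1), "city"]

# xs() takes a cross-section: the row labeled 0 from every group.
first_rows = df.xs(0, level=1)
```

Here `first_rows` is indexed by the remaining outer level, so it has one row for `india` and one for `us`.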


In [10]:
# Creating a DataFrame with temperature data, with an explicit index 0, 1, 2
temperature_df = pd.DataFrame({
    "city": ["mumbai","delhi","bangalore"],
    "temperature": [32, 45, 30]
}, index=[0,1,2])
temperature_df

Unnamed: 0,city,temperature
0,mumbai,32
1,delhi,45
2,bangalore,30


In [11]:
# Creating a DataFrame with windspeed data for the same three cities (default index 0, 1, 2)
windspeed_df = pd.DataFrame({
    "city": ["mumbai","delhi","bangalore"],
    "windspeed": [7,12,9],
})
windspeed_df

Unnamed: 0,city,windspeed
0,mumbai,7
1,delhi,12
2,bangalore,9


In [12]:
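# Redefining windspeed_df with only two cities and a different index order,
# to show how concat() aligns rows on index labels
windspeed_df = pd.DataFrame({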
    "city": ["delhi","mumbai"],
    "windspeed": [7,12],
}, index=[1,0])
windspeed_df

Unnamed: 0,city,windspeed
1,delhi,7
0,mumbai,12


### Concatenating DataFrames Along Columns Using `concat()`

The `concat()` function can be used to concatenate DataFrames along different axes. In this case, `axis=1` is specified to concatenate `temperature_df` and `windspeed_df` side-by-side, column-wise. This results in a DataFrame where the columns from both DataFrames are combined.

**Key features:**
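- `axis=1`: Concatenates the DataFrames along columns (side-by-side).
- Rows are matched on index labels, not on position. Here `windspeed_df` has no row labeled `2`, so the `city` and `windspeed` values it contributes for that row are `NaN`, and `windspeed` becomes a float column.
- Most useful for combining related data that shares the same index.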

In [13]:
# Concatenates temperature_df and windspeed_df along the columns (axis=1)
df = pd.concat([temperature_df, windspeed_df], axis=1)
df

Unnamed: 0,city,temperature,city,windspeed
0,mumbai,32,mumbai,12.0
1,delhi,45,delhi,7.0
2,bangalore,30,,
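
The output above keeps every index label from either DataFrame and fills the gaps with `NaN`. As a hedged sketch, the `join` argument of `concat()` can restrict the result to labels present in both inputs, and setting `city` as the index lets the rows align on city name instead of on integer labels. The variable names here are illustrative, and indexing by `city` is one possible approach rather than the notebook's own method.

```python
import pandas as pd

temperature_df = pd.DataFrame({
    "city": ["mumbai", "delhi", "bangalore"],
    "temperature": [32, 45, 30],
}, index=[0, 1, 2])
windspeed_df = pd.DataFrame({
    "city": ["delhi", "mumbai"],
    "windspeed": [7, 12],
}, index=[1, 0])

# Default join="outer": keep all index labels, fill missing values with NaN.
outer = pd.concat([temperature_df, windspeed_df], axis=1)

# join="inner": keep only the labels found in both DataFrames (0 and 1).
inner = pd.concat([temperature_df, windspeed_df], axis=1, join="inner")

# Aligning on city name avoids the duplicated "city" column.
by_city = pd.concat(
    [temperature_df.set_index("city"), windspeed_df.set_index("city")],
    axis=1,
)
```

With `by_city`, each row is labeled by city, so the alignment no longer depends on the integer index happening to match across the two DataFrames.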


In [14]:
temperature_df

Unnamed: 0,city,temperature
0,mumbai,32
1,delhi,45
2,bangalore,30


In [15]:
# Creates a Pandas Series with weather events and names it 'event' (the name becomes the column label after concatenation)
s = pd.Series(["Humid","Dry","Rain"], name="event")
s

0    Humid
1      Dry
2     Rain
Name: event, dtype: object

### Concatenating a DataFrame and a Series Along Columns Using `concat()`

The `concat()` function can also be used to concatenate a DataFrame and a Series. In this case, `axis=1` is specified to concatenate `temperature_df` and the Series `s` side-by-side, resulting in a DataFrame that contains the temperature data and the corresponding weather events.

**Key features:**
- `axis=1`: Concatenates the DataFrame and Series along columns, adding the Series as a new column to the DataFrame.
- This is useful when you want to append a single column of data (Series) to an existing DataFrame.

In [17]:
# Concatenates temperature_df and the Series s along the columns (axis=1)
df = pd.concat([temperature_df, s], axis=1)
df

Unnamed: 0,city,temperature,event
0,mumbai,32,Humid
1,delhi,45,Dry
2,bangalore,30,Rain
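
As with two DataFrames, a Series is aligned to the DataFrame by index label rather than by position. The sketch below illustrates this with a hypothetical `shuffled` Series whose labels are out of order; it is an assumption added for illustration and does not appear in the original notebook.

```python
import pandas as pd

temperature_df = pd.DataFrame({
    "city": ["mumbai", "delhi", "bangalore"],
    "temperature": [32, 45, 30],
}, index=[0, 1, 2])

# Same values as the Series s above, but stored in a different order.
# The index labels still say which row each value belongs to.
shuffled = pd.Series(["Rain", "Humid", "Dry"], index=[2, 0, 1], name="event")

# concat() matches on labels, so each event lands next to the right city.
aligned = pd.concat([temperature_df, shuffled], axis=1)
mumbai_event = aligned.loc[0, "event"]
bangalore_event = aligned.loc[2, "event"]
```

The Series name (`event`) is used as the new column's label, which is why naming the Series before concatenating is worthwhile.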
