# Concatenating, Merging and Joining DataFrames

## Making DataFrames

### Loading Dataset

In [26]:
# importing pandas
import pandas as pd

# csv file location
url = 'https://dq-content.s3.amazonaws.com/291/f500.csv'

# making data frame from csv file
data = pd.read_csv(url)

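# display the data with rows containing NaN dropped
# (dropna() returns a new DataFrame; `data` itself is unchanged)
data.dropna()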

Unnamed: 0,company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
0,Walmart,1,485873,0.8,13643.0,198825,-7.2,C. Douglas McMillon,General Merchandisers,Retailing,1,USA,"Bentonville, AR",http://www.walmart.com,23,2300000,77798
1,State Grid,2,315199,-4.4,9571.3,489838,-6.2,Kou Wei,Utilities,Energy,2,China,"Beijing, China",http://www.sgcc.com.cn,17,926067,209456
2,Sinopec Group,3,267518,-9.1,1257.9,310726,-65.0,Wang Yupu,Petroleum Refining,Energy,4,China,"Beijing, China",http://www.sinopec.com,19,713288,106523
3,China National Petroleum,4,262573,-12.3,1867.5,585619,-73.7,Zhang Jianhua,Petroleum Refining,Energy,3,China,"Beijing, China",http://www.cnpc.com.cn,17,1512048,301893
4,Toyota Motor,5,254694,7.7,16899.3,437575,-12.3,Akio Toyoda,Motor Vehicles and Parts,Motor Vehicles & Parts,8,Japan,"Toyota, Japan",http://www.toyota-global.com,23,364445,157210
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
495,Teva Pharmaceutical Industries,496,21903,11.5,329.0,92890,-79.3,Yitzhak Peterburg,Pharmaceuticals,Health Care,0,Israel,"Petach Tikva, Israel",http://www.tevapharm.com,1,56960,33337
496,New China Life Insurance,497,21796,-13.3,743.9,100609,-45.6,Wan Feng,"Insurance: Life, Health (stock)",Financials,427,China,"Beijing, China",http://www.newchinalife.com,2,54378,8507
497,Wm. Morrison Supermarkets,498,21741,-11.3,406.4,11630,20.4,David T. Potts,Food and Drug Stores,Food & Drug Stores,437,Britain,"Bradford, Britain",http://www.morrisons.com,13,77210,5111
498,TUI,499,21655,-5.5,1151.7,16247,195.5,Friedrich Joussen,Travel Services,Business Services,467,Germany,"Hanover, Germany",http://www.tuigroup.com,23,66779,3006


Make DataFrame: revenues of top USA companies:

In [89]:
# select columns
data_revenues = data[['company', 'revenues', 'country']]

# select top USA companies
data_usa_revenues = data_revenues[data_revenues['country'] == 'USA'].head(3)

data_usa_revenues = data_usa_revenues[['company', 'revenues']]

data_usa_revenues

Unnamed: 0,company,revenues
0,Walmart,485873
7,Berkshire Hathaway,223604
8,Apple,215639


Make DataFrame: profits of top USA companies:

In [91]:
# select columns
data_profits = data[['company', 'profits', 'country']]

# select top USA companies
data_usa_profits = data_profits[data_profits['country'] == 'USA'].head(3)

data_usa_profits = data_usa_profits[['company', 'profits']]

data_usa_profits

Unnamed: 0,company,profits
0,Walmart,13643.0
7,Berkshire Hathaway,24074.0
8,Apple,45687.0


Make DataFrame: revenues of top China companies:

In [92]:
# select columns
data_revenues = data[['company', 'revenues', 'country']]

# select top China companies
data_china_revenues = data_revenues[data_revenues['country'] == 'China'].head(3)

data_china_revenues = data_china_revenues[['company', 'revenues']]

data_china_revenues

Unnamed: 0,company,revenues
1,State Grid,315199
2,Sinopec Group,267518
3,China National Petroleum,262573


Make DataFrame: profits of top China companies:

In [93]:
# select columns
data_profits = data[['company', 'profits', 'country']]

# select top China companies
data_china_profits = data_profits[data_profits['country'] == 'China'].head(3)

data_china_profits = data_china_profits[['company', 'profits']]

data_china_profits

Unnamed: 0,company,profits
1,State Grid,9571.3
2,Sinopec Group,1257.9
3,China National Petroleum,1867.5


Make DataFrame: revenues of top Japan companies:

In [94]:
# select columns
data_revenues = data[['company', 'revenues', 'country']]

# select top Japan companies
data_japan_revenues = data_revenues[data_revenues['country'] == 'Japan'].head(3)

data_japan_revenues = data_japan_revenues[['company', 'revenues']]

data_japan_revenues

Unnamed: 0,company,revenues
4,Toyota Motor,254694
28,Honda Motor,129198
32,Japan Post Holdings,122990


Make DataFrame: profits of top Japan companies:

In [95]:
# select columns
data_profits = data[['company', 'profits', 'country']]

# select top Japan companies
data_japan_profits = data_profits[data_profits['country'] == 'Japan'].head(3)

data_japan_profits = data_japan_profits[['company', 'profits']]

data_japan_profits

Unnamed: 0,company,profits
4,Toyota Motor,16899.3
28,Honda Motor,5690.3
32,Japan Post Holdings,-267.4


## Concatenating DataFrames

Concatenating DataFrames means combining them either vertically (row-wise) or horizontally (column-wise).

We use the `pd.concat()` function to concatenate DataFrames.

*Syntax:*

`pd.concat(objs, axis=0, join='outer', ignore_index=False, keys=None)`

*Parameters:*

* `objs`: A list of Series or DataFrame objects to concatenate.
* `axis`: Determines the concatenation direction (0 for rows, 1 for columns).
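* `join`: How to handle labels on the other axis (columns when `axis=0`, rows when `axis=1`) that are not shared by all objects: `'outer'` keeps their union, `'inner'` keeps their intersection.
* `ignore_index`: If `True`, the original index labels are discarded and the result is renumbered `0, 1, ..., n-1`.
* `keys`: Optional labels, one per object, that become the outermost level of a hierarchical index on the result.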
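
The effect of `join` and `keys` is easiest to see on inputs whose columns differ. The following is a minimal sketch on hypothetical toy DataFrames (`a` and `b` are made up for illustration and are not part of the f500 data):

```python
import pandas as pd

# Hypothetical toy frames: they share 'company' but differ in the other column
a = pd.DataFrame({'company': ['A', 'B'], 'revenues': [10, 20]})
b = pd.DataFrame({'company': ['C'], 'profits': [1.5]})

# join='outer' (default): union of columns, missing cells become NaN
outer = pd.concat([a, b], ignore_index=True)

# join='inner': only the columns present in every input are kept
inner = pd.concat([a, b], join='inner', ignore_index=True)

# keys: one label per input; the labels form the outer level of a MultiIndex
labeled = pd.concat([a, b], keys=['first', 'second'])

print(outer.columns.tolist())
print(inner.columns.tolist())
print(labeled.index.tolist())
```

With `keys`, the rows that came from one input can be selected again with `labeled.loc['first']`.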

### Concatenate DataFrames Vertically

**Example:** concatenate DataFrames vertically (row-wise: `axis=0` default):

In [96]:
# concatenate dataframes vertically
data_revenues = pd.concat([data_usa_revenues, data_china_revenues, data_japan_revenues],
                          ignore_index=True)

data_revenues

Unnamed: 0,company,revenues
0,Walmart,485873
1,Berkshire Hathaway,223604
2,Apple,215639
3,State Grid,315199
4,Sinopec Group,267518
5,China National Petroleum,262573
6,Toyota Motor,254694
7,Honda Motor,129198
8,Japan Post Holdings,122990


**Example:** concatenate the profits DataFrames vertically:

In [97]:
# concatenate dataframes vertically
data_profits = pd.concat([data_usa_profits, data_china_profits, data_japan_profits],
                         ignore_index=True)

data_profits

Unnamed: 0,company,profits
0,Walmart,13643.0
1,Berkshire Hathaway,24074.0
2,Apple,45687.0
3,State Grid,9571.3
4,Sinopec Group,1257.9
5,China National Petroleum,1867.5
6,Toyota Motor,16899.3
7,Honda Motor,5690.3
8,Japan Post Holdings,-267.4
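
Both examples above pass `ignore_index=True`. As a small sketch of what that changes, the snippet below rebuilds two tiny DataFrames by hand (values and index labels copied from the outputs above; the variable names `usa` and `china` are only for illustration) and concatenates them with and without the flag:

```python
import pandas as pd

# Rows keep the index labels they had in the full dataset
usa = pd.DataFrame({'company': ['Walmart', 'Apple'],
                    'revenues': [485873, 215639]}, index=[0, 8])
china = pd.DataFrame({'company': ['State Grid'],
                      'revenues': [315199]}, index=[1])

# Default: original index labels are preserved
kept = pd.concat([usa, china])

# ignore_index=True: labels are replaced by 0, 1, 2, ...
reset = pd.concat([usa, china], ignore_index=True)

print(kept.index.tolist())
print(reset.index.tolist())
```

Keeping the original labels can be useful for tracing rows back to the source data, while a fresh index avoids duplicate or out-of-order labels.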


### Concatenate DataFrames Horizontally

**Example:** concatenate DataFrames horizontally (column-wise: `axis=1`):

In [98]:
# concatenate dataframes horizontally
data_revenues_profits = pd.concat([data_revenues, data_profits], axis=1)

data_revenues_profits

Unnamed: 0,company,revenues,company,profits
0,Walmart,485873,Walmart,13643.0
1,Berkshire Hathaway,223604,Berkshire Hathaway,24074.0
2,Apple,215639,Apple,45687.0
3,State Grid,315199,State Grid,9571.3
4,Sinopec Group,267518,Sinopec Group,1257.9
5,China National Petroleum,262573,China National Petroleum,1867.5
6,Toyota Motor,254694,Toyota Motor,16899.3
7,Honda Motor,129198,Honda Motor,5690.3
8,Japan Post Holdings,122990,Japan Post Holdings,-267.4
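
Note that horizontal concatenation pairs rows by index label, not by company name, and it keeps both `company` columns. The sketch below uses two hand-made DataFrames (values taken from the outputs above, with the right one deliberately in a different row order) to show the pitfall and one possible way around it:

```python
import pandas as pd

revenues = pd.DataFrame({'company': ['Walmart', 'State Grid'],
                         'revenues': [485873, 315199]})
# Same companies, but listed in the opposite order
profits = pd.DataFrame({'company': ['State Grid', 'Walmart'],
                        'profits': [9571.3, 13643.0]})

# Rows are paired by index label (0 with 0, 1 with 1), so companies get mixed up
by_position = pd.concat([revenues, profits], axis=1)

# Indexing both frames by company pairs rows by name and drops the duplicate column
by_company = pd.concat(
    [revenues.set_index('company'), profits.set_index('company')], axis=1
)

print(by_position)
print(by_company)
```

In the document's example the rows happen to line up because both DataFrames were built in the same order with `ignore_index=True`.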


## Merging DataFrames

Merging two DataFrames means combining them based on common columns or index levels.

It's used when you want to bring together related information from different DataFrames.

The following figure illustrates the merging operation:

[Figure: course-pandas-merge.svg]

We use the `pd.merge()` function to merge DataFrames.

*Syntax:*

`pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False)`


*Parameters:*

* `left`, `right`: The DataFrames to merge.
* `how`: Specifies the type of join ('inner', 'left', 'right', 'outer').
* `on`: Column(s) to join on (if present in both DataFrames).
* `left_on`, `right_on`: Columns to join on in the left and right DataFrames respectively (if names differ).
* `left_index`, `right_index`: Use the index of the left or right DataFrame as the join key.
* Types of merges (`how` parameter):
  * Inner join: Returns only rows whose join key exists in both DataFrames.
  * Left join: Returns all rows from the left DataFrame; columns from the right are NaN where no key matches.
  * Right join: Returns all rows from the right DataFrame; columns from the left are NaN where no key matches.
  * Outer join: Returns all rows from both DataFrames, filling in NaN wherever a key has no match on the other side.
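
The examples in this section all use `on`. For completeness, here is a minimal sketch of `left_on`/`right_on` and `right_index`, assuming a hypothetical right DataFrame whose key column is called `name` instead of `company` (that column name is invented for illustration):

```python
import pandas as pd

left = pd.DataFrame({'company': ['Toyota Motor', 'Honda Motor'],
                     'revenues': [254694, 129198]})
# Hypothetical: the key column has a different name on the right side
right = pd.DataFrame({'name': ['Toyota Motor', 'Honda Motor'],
                      'profits': [16899.3, 5690.3]})

# Key columns with different names: both key columns appear in the result
by_columns = pd.merge(left, right, left_on='company', right_on='name')

# Key is a column on the left and the index on the right
by_index = pd.merge(left, right.set_index('name'),
                    left_on='company', right_index=True)

print(by_columns.columns.tolist())
print(by_index.columns.tolist())
```

After a `left_on`/`right_on` merge, the redundant key column can be removed with `drop(columns='name')` if it is not needed.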

### Merging DataFrames Using One Key

#### Making DataFrames

Make left DataFrame:

In [99]:
# make left dataframe
data_usjp_revenues = pd.concat([data_usa_revenues, data_japan_revenues], ignore_index=True)

data_usjp_revenues

Unnamed: 0,company,revenues
0,Walmart,485873
1,Berkshire Hathaway,223604
2,Apple,215639
3,Toyota Motor,254694
4,Honda Motor,129198
5,Japan Post Holdings,122990


Make right DataFrame:

In [100]:
# make right dataframe
data_cnjp_profits = pd.concat([data_china_profits, data_japan_profits], ignore_index=True)

data_cnjp_profits

Unnamed: 0,company,profits
0,State Grid,9571.3
1,Sinopec Group,1257.9
2,China National Petroleum,1867.5
3,Toyota Motor,16899.3
4,Honda Motor,5690.3
5,Japan Post Holdings,-267.4


#### Inner Join

**Example:** merge DataFrames using one key and inner join:

In [104]:
# merge dataframes using one key and inner join
data_merge_inner = pd.merge(data_usjp_revenues, data_cnjp_profits, on='company', how='inner')

data_merge_inner

Unnamed: 0,company,revenues,profits
0,Toyota Motor,254694,16899.3
1,Honda Motor,129198,5690.3
2,Japan Post Holdings,122990,-267.4


#### Left Join

**Example:** merge DataFrames using one key and left join:

In [101]:
# merge dataframes using one key and left join
data_merge_left = pd.merge(data_usjp_revenues, data_cnjp_profits, on='company', how='left')

data_merge_left

Unnamed: 0,company,revenues,profits
0,Walmart,485873,
1,Berkshire Hathaway,223604,
2,Apple,215639,
3,Toyota Motor,254694,16899.3
4,Honda Motor,129198,5690.3
5,Japan Post Holdings,122990,-267.4


#### Right Join

**Example:** merge DataFrames using one key and right join:

In [102]:
# merge dataframes using one key and right join
data_merge_right = pd.merge(data_usjp_revenues, data_cnjp_profits, on='company', how='right')

data_merge_right

Unnamed: 0,company,revenues,profits
0,State Grid,,9571.3
1,Sinopec Group,,1257.9
2,China National Petroleum,,1867.5
3,Toyota Motor,254694.0,16899.3
4,Honda Motor,129198.0,5690.3
5,Japan Post Holdings,122990.0,-267.4


#### Outer Join

**Example:** merge DataFrames using one key and outer join:

In [103]:
# merge dataframes using one key and outer join
data_merge_outer = pd.merge(data_usjp_revenues, data_cnjp_profits, on='company', how='outer')

data_merge_outer

Unnamed: 0,company,revenues,profits
0,Apple,215639.0,
1,Berkshire Hathaway,223604.0,
2,China National Petroleum,,1867.5
3,Honda Motor,129198.0,5690.3
4,Japan Post Holdings,122990.0,-267.4
5,Sinopec Group,,1257.9
6,State Grid,,9571.3
7,Toyota Motor,254694.0,16899.3
8,Walmart,485873.0,
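
When inspecting an outer join, it can help to see which side each row came from. `pd.merge()` accepts an `indicator=True` argument that adds a `_merge` column for this purpose. The sketch below uses two small hand-made DataFrames (a subset of the values above; this parameter is not used elsewhere in this notebook):

```python
import pandas as pd

revenues = pd.DataFrame({'company': ['Apple', 'Toyota Motor'],
                         'revenues': [215639, 254694]})
profits = pd.DataFrame({'company': ['State Grid', 'Toyota Motor'],
                        'profits': [9571.3, 16899.3]})

# indicator=True adds a '_merge' column: 'left_only', 'right_only' or 'both'
merged = pd.merge(revenues, profits, on='company', how='outer', indicator=True)

# Map each company to the side(s) it was found on
origin = merged.set_index('company')['_merge'].astype(str).to_dict()

print(merged)
print(origin)
```

Filtering on `_merge == 'both'` gives the same rows as an inner join.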


## Joining DataFrames

The `join()` method is used to combine columns of two DataFrames based on their indexes.

It's a simple way of merging two DataFrames when the relationship between them is primarily based on their row indexes.
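
By default `join()` performs a left join on the index, and the `how` argument selects another join type. A minimal sketch on hand-made DataFrames (values borrowed from the tables above, indexed by company name directly):

```python
import pandas as pd

left = pd.DataFrame({'revenues': [485873, 254694]},
                    index=['Walmart', 'Toyota Motor'])
right = pd.DataFrame({'profits': [16899.3, 9571.3]},
                     index=['Toyota Motor', 'State Grid'])

# Default how='left': every row of `left` is kept, unmatched profits are NaN
left_join = left.join(right)

# how='inner': only index labels present in both DataFrames
inner_join = left.join(right, how='inner')

print(left_join)
print(inner_join)
```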

**Example:**

Set index of the first DataFrame:

In [40]:
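# select the top 5 USA companies with their revenue columns
data_usa_revenues_selected = data.loc[data['country'] == 'USA',
                                      ['company', 'revenues', 'revenue_change', 'country']].head(5)

# set index to company
data_usa_revenues_indexed = data_usa_revenues_selected.set_index('company')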

data_usa_revenues_indexed

company,revenues,revenue_change,country
Walmart,485873,0.8,USA
Berkshire Hathaway,223604,6.1,USA
Apple,215639,-7.7,USA
Exxon Mobil,205004,-16.7,USA
McKesson,198533,3.1,USA


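Set the index of the second DataFrame. This DataFrame must not contain a `country` column, because that column already exists in the first DataFrame: `join()` raises a `ValueError` when both DataFrames share a column name, unless `lsuffix` or `rsuffix` is given.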

In [41]:
# select the top 5 USA companies with their profit columns (no 'country' column)
data_usa_profits_selected = data.loc[data['country'] == 'USA',
                                     ['company', 'profits', 'profit_change']].head(5)

# set index to company
data_usa_profits_indexed = data_usa_profits_selected.set_index('company')

data_usa_profits_indexed

company,profits,profit_change
Walmart,13643.0,-7.2
Berkshire Hathaway,24074.0,
Apple,45687.0,-14.4
Exxon Mobil,7840.0,-51.5
McKesson,5070.0,124.5


Join DataFrames:

In [42]:
# join dataframes
data_usa = data_usa_revenues_indexed.join(data_usa_profits_indexed)

data_usa

company,revenues,revenue_change,country,profits,profit_change
Walmart,485873,0.8,USA,13643.0,-7.2
Berkshire Hathaway,223604,6.1,USA,24074.0,
Apple,215639,-7.7,USA,45687.0,-14.4
Exxon Mobil,205004,-16.7,USA,7840.0,-51.5
McKesson,198533,3.1,USA,5070.0,124.5
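
If both DataFrames do share a column name, `join()` needs suffixes to tell the columns apart. The sketch below, on hand-made DataFrames that both carry a `country` column (the suffix names `_rev` and `_prof` are arbitrary choices for illustration), shows the error, the fix, and the equivalent `pd.merge()` call on the indexes:

```python
import pandas as pd

idx = pd.Index(['Walmart', 'Apple'], name='company')
revenues = pd.DataFrame({'revenues': [485873, 215639],
                         'country': ['USA', 'USA']}, index=idx)
profits = pd.DataFrame({'profits': [13643.0, 45687.0],
                        'country': ['USA', 'USA']}, index=idx)

# Overlapping 'country' column: join() raises ValueError without suffixes
try:
    revenues.join(profits)
    overlap_error = False
except ValueError:
    overlap_error = True

# Suffixes disambiguate the overlapping column
joined = revenues.join(profits, lsuffix='_rev', rsuffix='_prof')

# The same pairing expressed with merge() on both indexes
merged = pd.merge(revenues, profits, left_index=True, right_index=True,
                  how='left', suffixes=('_rev', '_prof'))

print(overlap_error)
print(joined.columns.tolist())
```

In practice, `join()` is a shorthand for this index-on-index case, while `merge()` is the more general tool when the keys are ordinary columns.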
