# Data Cleaning
---


**Objective:** obtain the following series for all countries, covering December 1959 to December 1990:


1. Industrial production (Index)

2. Exchange rates, National Currency per US dollar (Period Average)

3. Consumer prices (All items), index

4. International Reserves and Liquidity (Reserves, Official Reserve Assets, US Dollar)

5. Data for consumer prices and international reserves for the United States only over the same time period.
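
The date window in the objective can be enforced in code once the data are loaded. The snippet below is a minimal sketch, not the notebook's own method: it assumes the time column holds English month labels such as `Dec 1959` (as in the merged output), and the helper name `filter_period` and the sample values are hypothetical.

```python
import pandas as pd


def filter_period(df, time_col="Time (Year/Month)", start="1959-12", end="1990-12"):
    """Keep rows whose month lies between start and end, inclusive.

    Hypothetical helper; assumes labels like "Dec 1959" and an English locale.
    """
    # Parse the month labels and reduce them to monthly periods.
    months = pd.to_datetime(df[time_col], format="%b %Y").dt.to_period("M")
    mask = (months >= pd.Period(start, freq="M")) & (months <= pd.Period(end, freq="M"))
    return df.loc[mask].reset_index(drop=True)


# Made-up sample rows, two of which fall outside the target window.
sample = pd.DataFrame({
    "Time (Year/Month)": ["Nov 1959", "Dec 1959", "Jan 1960", "Dec 1990", "Jan 1991"],
    "value": [1.0, 2.0, 3.0, 4.0, 5.0],
})

filtered = filter_period(sample)
print(filtered["Time (Year/Month)"].tolist())
# ['Dec 1959', 'Jan 1960', 'Dec 1990']
```

Comparing monthly periods rather than raw strings avoids relying on alphabetical ordering of labels such as `Apr 1960` and `Dec 1959`.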


---

# 1. Downloading the data

We collected the data from the [IMF data portal](https://data.imf.org/?sk=4c514d48-b6ba-49ed-8ab9-52b0c1a0179b&sid=1390030341854), using its query function to select the desired series.

The data for Germany and the USA are stored in two separate Excel files in the `data` folder of the repository, named `Germany.xlsx` and `USA.xlsx` respectively.



---
# 2. Cleaning the data

#### Importing and merging the 2 datasets

import pandas as pd

# Desired final columns in correct order
final_columns = [
    "Time (Year/Month)",
    "Economic Activity, Industrial Production, Index",
    "Exchange Rates, National Currency Per U.S. Dollar, Period Average, Rate",
    "International Reserves and Liquidity, Reserves, Official Reserve Assets, US Dollar",
    "Prices, Consumer Price Index, All items, Index"
]

# --- GERMANY ---
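# Assumes the Germany file holds all five series, in the same order as final_columns
# skiprows=2 skips the first two rows of the sheet; adjust if the file layout differs
germany_df = pd.read_excel("../data/Germany.xlsx", header=0, skiprows=2)
# Overwrite the column names with the standardized names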
germany_df.columns = final_columns
germany_df["Country"] = "Germany"

# --- USA ---
# Assumes the US file has only 3 columns, in this order:
#   1) Time (Year/Month)
#   2) International Reserves
#   3) Prices
usa_df = pd.read_excel("../data/USA.xlsx", header=0, skiprows=2)
# rename them to the columns the US file actually has:
usa_df.columns = [
    "Time (Year/Month)",
    "International Reserves and Liquidity, Reserves, Official Reserve Assets, US Dollar",
    "Prices, Consumer Price Index, All items, Index"
]

# Insert the two blank columns at the correct spots
# Insert “Economic Activity” as column index 1
usa_df.insert(
    loc=1, 
    column="Economic Activity, Industrial Production, Index", 
    value=pd.NA
)
# Insert “Exchange Rates” as column index 2
usa_df.insert(
    loc=2,
    column="Exchange Rates, National Currency Per U.S. Dollar, Period Average, Rate",
    value=pd.NA
)

# Now we have 5 columns in the correct order
usa_df["Country"] = "USA"

# --- MERGE ---
merged_df = pd.concat([germany_df, usa_df], ignore_index=True)

merged_df



| | Time (Year/Month) | Economic Activity, Industrial Production, Index | Exchange Rates, National Currency Per U.S. Dollar, Period Average, Rate | International Reserves and Liquidity, Reserves, Official Reserve Assets, US Dollar | Prices, Consumer Price Index, All items, Index | Country |
|---|---|---|---|---|---|---|
| 0 | Dec 1959 | 32.500305 | 4.2 | 4811.474341 | 24.616929 | Germany |
| 1 | Jan 1960 | 31.193881 | 4.2 | 4724.155785 | 24.616929 | Germany |
| 2 | Feb 1960 | 31.041599 | 4.2 | 4806.362830 | 24.477068 | Germany |
| 3 | Mar 1960 | 32.203755 | 4.2 | 4966.456016 | 24.477068 | Germany |
| 4 | Apr 1960 | 34.287622 | 4.2 | 5236.120624 | 24.616929 | Germany |
| ... | ... | ... | ... | ... | ... | ... |
| 741 | Aug 1990 | | | 78908.838357 | 60.351608 | USA |
| 742 | Sep 1990 | | | 80024.166133 | 60.856066 | USA |
| 743 | Oct 1990 | | | 82852.196532 | 61.222946 | USA |
| 744 | Nov 1990 | | | 83059.402774 | 61.360525 | USA |
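
As an alternative to inserting the two blank columns one at a time, `DataFrame.reindex` can align the US frame to the full column list in a single step. The sketch below is an illustration on made-up toy values rather than the real files; the names `usa_small`, `germany_small`, `usa_aligned`, `merged_small`, and `missing_by_country` are hypothetical. It also counts missing values per country as a quick check on the merge.

```python
import pandas as pd

final_columns = [
    "Time (Year/Month)",
    "Economic Activity, Industrial Production, Index",
    "Exchange Rates, National Currency Per U.S. Dollar, Period Average, Rate",
    "International Reserves and Liquidity, Reserves, Official Reserve Assets, US Dollar",
    "Prices, Consumer Price Index, All items, Index",
]

# Toy stand-in for the Germany file: all five series (values are made up).
germany_small = pd.DataFrame(
    [["Dec 1959", 1.0, 4.2, 200.0, 20.0], ["Jan 1960", 1.1, 4.2, 201.0, 20.1]],
    columns=final_columns,
)
germany_small["Country"] = "Germany"

# Toy stand-in for the US file: time, reserves, and prices only.
usa_small = pd.DataFrame({
    final_columns[0]: ["Dec 1959", "Jan 1960"],
    final_columns[3]: [100.0, 101.0],
    final_columns[4]: [10.0, 10.1],
})

# reindex adds the absent columns (filled with missing values) and fixes the order.
usa_aligned = usa_small.reindex(columns=final_columns)
usa_aligned["Country"] = "USA"

merged_small = pd.concat([germany_small, usa_aligned], ignore_index=True)

# Count missing values per country in each data column.
missing_by_country = (
    merged_small[final_columns[1:]].isna().groupby(merged_small["Country"]).sum()
)
print(missing_by_country)
```

For the real data, the same check would be expected to show missing values only in the industrial production and exchange rate columns of the USA rows.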
