# 1. Introduction to Inflation Data
- Inflation data refers to the collection of economic metrics that measure the rate at which the general price level of goods and services in an economy increases over a specific period. Inflation reflects a decline in purchasing power, as consumers can buy fewer goods with the same amount of money over time.
- To compare the economic features of movies from different years on a common basis, we use the inflation rate to adjust monetary features such as:
    - Budget
    - WorldWide/Domestic Gross
    

# 2. Libraries

In [6]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

# 3. Data Collection
- We collect the yearly inflation rates by scraping the Investopedia page https://www.investopedia.com/inflation-rate-by-year-7253832#toc-historical-us-inflation-rates-from-1929-to-2024
- Since the site does not accept requests sent with the default client headers, we send a custom `User-Agent` header. The data is used for non-commercial purposes only, so we consider this acceptable.


In [7]:
url = 'https://www.investopedia.com/inflation-rate-by-year-7253832#toc-historical-us-inflation-rates-from-1929-to-2024'

headers = {
    'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.6613.119 Mobile Safari/537.36 (compatible; Googlebot/2.1;  http://www.google.com/bot.html)'
}

response = requests.get(url=url, headers=headers, timeout=30)
response.raise_for_status()  # fail early if the request was rejected

soup = BeautifulSoup(response.text, 'html.parser')

- Information collected:
    - Year: Ranging from 1929 to 2024.
    - Rate: Inflation rate of that year compared to the year before it.
    - Unit: Price-level factor relative to 2024. The value for 2024 is 1; the values for earlier years are obtained by compounding the inflation rates backward from 2024.
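
As a small illustration of how the Year and Rate fields can be read from a row of the table, here is a minimal sketch that works on plain strings instead of the live page. The helper name `parse_row` and the sample rows are hypothetical, and the sketch assumes that the first two newline-separated cells of a row are the year and a rate ending in `%`.

```python
def parse_row(row_text):
    """Split one table row's text into (year, rate in percent)."""
    # Drop empty cells so stray blank lines inside a row do not shift the columns
    cells = [cell.strip() for cell in row_text.strip().split("\n") if cell.strip()]
    year = int(cells[0])
    rate = float(cells[1].replace("%", ""))  # negative rates (deflation) also parse
    return year, rate


# Hypothetical rows, not values taken from the real table
sample_rows = ["2001\n5.0%\nextra cell", "2002\n-1.5%\nextra cell"]
parsed = [parse_row(text) for text in sample_rows]
print(parsed)
```

Unlike the notebook cell, this sketch filters out empty cells before indexing, which is only a precaution in case a row contains blank lines.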

In [8]:
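# Locate the inflation table by its CSS class
table = soup.find('table', class_='mntl-sc-block-table__table')

# Skip the header row, then split each row's text into its cells
data = table.find_all('tr')[1:]
data = [info.text.strip().split('\n') for info in data]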

years, rates, units = [], [], []

new_data = {
    'Year': years,
    'Rate': rates,
    'Unit': units
}

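# Cell 0 is the year, cell 1 is the rate (a string ending in '%')
for item in data:
    years.append(int(item[0]))
    rates.append(float(item[1].replace('%', '')))

# Add the 2024 entry manually
years.append(2024)
rates.append(2.4)

# Reverse both lists so that 2024 comes first
years.reverse()
rates.reverse()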

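- The unit of each earlier year is calculated from the unit of the year after it, starting from $U_{2024} = 1$:
    - $U_{y-1} = U_y \times \left(1 + \frac{r_y}{100}\right)$, where $r_y$ is the inflation rate (in %) of year $y$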
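
The compounding step can be tried on a short list to see how the factors grow when moving back in time: each earlier year's factor is the following year's factor multiplied by (1 + rate / 100). The sketch below is illustrative only. The function name `compound_units` is hypothetical, and apart from the 2.4 used for 2024 in this notebook, the rates are made-up numbers.

```python
def compound_units(rates_newest_first, base=1.0):
    """Return one factor per year, newest first, with the newest year fixed at `base`."""
    units = [base]
    # The oldest year's own rate is never needed, hence the [:-1]
    for rate in rates_newest_first[:-1]:
        units.append(units[-1] * (1 + rate / 100))
    return units


years = [2024, 2023, 2022]
rates = [2.4, 10.0, 5.0]  # 2.4 is the notebook's 2024 value; 10.0 and 5.0 are hypothetical
units = compound_units(rates)

for year, unit in zip(years, units):
    print(year, round(unit, 4))
```

This sketch keeps full precision and rounds only when printing, whereas the notebook rounds to three decimals at every step, so the two can differ slightly in the last digit over long ranges.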

In [9]:
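# 2024 is the reference year
units.append(1.0)

# Each earlier year's unit compounds the rate of the year after it;
# the last rate in the list is not needed
for rate in rates[:-1]:
    units.append(round(units[-1] * (1 + rate / 100), 3))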

- After finishing the collection, put the data into a DataFrame and save it to a CSV file

In [10]:
df = pd.DataFrame.from_dict(new_data)

In [11]:
df.to_csv('../../Data/inflation_rate.csv', index=False)
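
A possible way to use the saved table later is sketched below, assuming the `Unit` column is meant as a multiplier that converts an amount from a given year into 2024 dollars (this follows from how the factors are compounded above). The helper names `load_units` and `to_2024_dollars`, the inline sample CSV, and the budget figure are all hypothetical. The sketch reads from a string with the standard `csv` module rather than from `../../Data/inflation_rate.csv`, so it runs on its own.

```python
import csv
import io

# Hypothetical sample in the same Year,Rate,Unit layout as the saved file
sample_csv = """Year,Rate,Unit
2024,2.4,1.0
2023,10.0,1.024
2022,5.0,1.126
"""


def load_units(csv_text):
    """Map each year to its Unit factor."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {int(row["Year"]): float(row["Unit"]) for row in reader}


def to_2024_dollars(amount, year, units):
    """Scale a nominal amount from `year` into 2024 dollars."""
    return amount * units[year]


units_by_year = load_units(sample_csv)
adjusted = to_2024_dollars(100_000_000, 2022, units_by_year)  # made-up budget
print(round(adjusted))
```

With pandas, the same lookup could be done by merging the movie table with the inflation table on the release year and multiplying the monetary columns by `Unit`.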