<div style="background-color: #FF7F50; color: #FFFFFF; padding: 10px; font-size: 24px; font-weight: bold; text-align: center; border-radius: 5px;">Project Title: Meta Stock Price Analysis/Prediction</div>

# About Dataset
__Meta Platforms Stock Prices (Oct 28, 2021 - May 7, 2024)__

This dataset contains daily stock price data for Meta Platforms (formerly Facebook) from October 28, 2021, to May 7, 2024. The data was collected from Yahoo Finance.

__Columns:__

* _Date:_ Trading date (DD/MM/YYYY)
* _Open:_ Opening price of the stock on that day
* _High:_ Highest price of the stock on that day
* _Low:_ Lowest price of the stock on that day
* _Close:_ Closing price of the stock on that day
* _Adj Close:_ Adjusted closing price of the stock on that day (adjusted for stock splits and dividends)
* _Volume:_ Number of shares traded on that day
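The column definitions imply simple consistency rules: a day's High is at least its Open, Close and Low, its Low is at most its Open and Close, and Volume is never negative. Below is a minimal sketch for checking these rules. It assumes the data is in a pandas DataFrame with the column names listed above. The function name `check_ohlc` and the sample rows are hypothetical, for illustration only.

```python
import pandas as pd

def check_ohlc(df: pd.DataFrame) -> pd.Series:
    """Return True for each row whose OHLCV values are internally consistent."""
    # The day's high must be at least as large as the open, close and low
    high_ok = df["High"] >= df[["Open", "Close", "Low"]].max(axis=1)
    # The day's low must be no larger than the open and close
    low_ok = df["Low"] <= df[["Open", "Close"]].min(axis=1)
    # Share volume cannot be negative
    volume_ok = df["Volume"] >= 0
    return high_ok & low_ok & volume_ok

# Hypothetical rows: the second one has a High below its Close
sample = pd.DataFrame({
    "Open": [100.0, 101.0],
    "High": [105.0, 102.0],
    "Low": [99.0, 100.0],
    "Close": [104.0, 103.0],
    "Volume": [1_000_000, 900_000],
})
valid = check_ohlc(sample)
print(valid.tolist())  # [True, False]
```

On the real data, `df[~check_ohlc(df)]` would list any inconsistent rows.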

__Dataset is taken from [Kaggle](https://www.kaggle.com/datasets/saadatkhalid/meta-platforms-stock-price-data/data)__

![META](https://www.artapixel.com/images/artapixel/38-artapixel-meta-logo-icon-3d-social-media-pv.jpg)

# Importing Libraries

In [None]:
# libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objs as go
from plotly.subplots import make_subplots

# Data Loading

In [None]:
df = pd.read_csv("/kaggle/input/meta-platforms-stock-price-data/META.csv")

* Viewing First Five Rows of Dataset

In [None]:
df.head()

* Shape of Dataset

In [None]:
print(f"There are {df.shape[0]} rows and {df.shape[1]} columns")

* Dataset Info

In [None]:
df.info()

In [None]:
df.dtypes

* Checking for Missing Values

In [None]:
df.isnull().sum()
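Besides missing values, a daily price table should not contain the same date twice. The sketch below reports both checks in one place. It assumes the date column is still called `Date`, as it is before the conversion step later in this notebook. The helper `data_quality_report` and the sample rows are hypothetical.

```python
import pandas as pd

def data_quality_report(df: pd.DataFrame, date_col: str = "Date") -> dict:
    """Summarise missing values and repeated dates in a daily price table."""
    return {
        # Count of missing entries in each column (same as df.isnull().sum())
        "missing_per_column": df.isnull().sum().to_dict(),
        # Rows whose date already appeared earlier; a daily series should have none
        "duplicate_dates": int(df[date_col].duplicated().sum()),
    }

# Hypothetical rows: one missing Close and one repeated date
sample = pd.DataFrame({
    "Date": ["2024-01-02", "2024-01-03", "2024-01-03"],
    "Close": [100.0, None, 101.0],
})
report = data_quality_report(sample)
print(report)
```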

## Meta Stock Price Over Time

In [None]:
fig = make_subplots(rows=1, cols=1)

# Add trace for Close Price
fig.add_trace(
    go.Scatter(x=df['Date'], y=df['Close'], mode='lines', name='Close Price', line=dict(color='blue')),
)

# Update layout
fig.update_layout(
    title='Meta Stock Price Over Time',
    xaxis_title='Date',
    yaxis_title='Close Price',
    showlegend=True,
)

# Show the plot
fig.show()


## Meta Trading Volume Over Time

In [None]:
fig = make_subplots(rows=1, cols=1)

# Add trace for Volume
fig.add_trace(
    go.Scatter(x=df['Date'], y=df['Volume'], mode='lines', name='Volume', line=dict(color='green')),
)

# Update layout
fig.update_layout(
    title='Meta Trading Volume Over Time',
    xaxis_title='Date',
    yaxis_title='Volume',
    showlegend=True,
)

# Show the plot
fig.show()

## Meta Stock Price Analysis (Candlestick Chart)

In [None]:
figure = go.Figure(data=[go.Candlestick(x=df["Date"],
                                        open=df["Open"], 
                                        high=df["High"],
                                        low=df["Low"], 
                                        close=df["Close"])])
figure.update_layout(title = "Meta Stock Price Analysis", 
                     xaxis_rangeslider_visible=False)
figure.show()
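In a candlestick chart, each candle's body spans the day's Open and Close. Its thin wicks extend up to the High and down to the Low, and up days (Close above Open) are colored differently from down days. The sketch below computes those pieces directly from the OHLC columns of this dataset. The function `candle_parts` and the sample rows are hypothetical.

```python
import pandas as pd

def candle_parts(df: pd.DataFrame) -> pd.DataFrame:
    """Decompose each OHLC row into the pieces a candlestick chart draws."""
    body_top = df[["Open", "Close"]].max(axis=1)
    body_bottom = df[["Open", "Close"]].min(axis=1)
    return pd.DataFrame({
        # True when the stock closed above where it opened (an "up" candle)
        "increasing": df["Close"] > df["Open"],
        # Height of the candle body
        "body": body_top - body_bottom,
        # Wicks above and below the body
        "upper_wick": df["High"] - body_top,
        "lower_wick": body_bottom - df["Low"],
    })

# Hypothetical rows: an up day followed by a down day
sample = pd.DataFrame({
    "Open": [100.0, 104.0],
    "High": [106.0, 105.0],
    "Low": [98.0, 97.0],
    "Close": [104.0, 99.0],
})
parts = candle_parts(sample)
print(parts)
```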

### Converting the Date Column to Datetime

In [None]:

# Convert date strings to datetime objects
df['Date'] = pd.to_datetime(df['Date'])

# Extract numerical features from the date column
df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month
df['Day'] = df['Date'].dt.day

# Drop the original date column
df.drop(columns=['Date'], inplace=True)

In [None]:
df.head()
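The dataset description lists Date as DD/MM/YYYY. Without a format, `pd.to_datetime` has to infer whether a string such as 03/04/2022 means 3 April or 4 March. The sketch below passes the format explicitly, assuming the file really uses DD/MM/YYYY. If it turns out to use ISO dates such as 2021-10-28, `"%Y-%m-%d"` would be the format to pass instead. Unlike the cell above, it keeps the parsed Date column, so the rows can still be sorted chronologically. The helper name `add_date_parts` is hypothetical.

```python
import pandas as pd

def add_date_parts(df: pd.DataFrame, fmt: str = "%d/%m/%Y") -> pd.DataFrame:
    """Parse Date with an explicit format and add Year/Month/Day columns."""
    out = df.copy()
    # An explicit format means "03/04/2022" is read as 3 April, never as 4 March
    out["Date"] = pd.to_datetime(out["Date"], format=fmt)
    out["Year"] = out["Date"].dt.year
    out["Month"] = out["Date"].dt.month
    out["Day"] = out["Date"].dt.day
    return out

# Hypothetical date strings in DD/MM/YYYY form
sample = pd.DataFrame({"Date": ["03/04/2022", "28/10/2021"]})
parsed = add_date_parts(sample)
print(parsed[["Year", "Month", "Day"]].values.tolist())  # [[2022, 4, 3], [2021, 10, 28]]
```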

### Meta Stock Price Comparison by Year

In [None]:
# Group the data by year and calculate the mean open and close prices for each year
yearly_prices = df.groupby('Year')[['Open', 'Close']].mean().reset_index()

# Plot the bar chart
fig = px.bar(yearly_prices, x='Year', y=['Open', 'Close'], barmode='group', title='Meta Stock Price Comparison by Year')
fig.show()


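* The chart shows that Meta Platforms' average open and close prices were lower in 2022 and 2023 than in 2021. Keep in mind that 2021 (from October 28) and 2024 (through May 7) are partial years, so their averages cover only a few months of trading.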
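The data runs from October 28, 2021 to May 7, 2024, so the 2021 and 2024 averages rest on far fewer trading days than those for 2022 and 2023. The sketch below is a small variation of the grouping above that also reports how many rows feed each yearly mean. It assumes the `Year` column created earlier. The function name `yearly_summary` and the sample values are hypothetical.

```python
import pandas as pd

def yearly_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Mean Open/Close per year plus the number of trading days behind each mean."""
    return (
        df.groupby("Year")
        .agg(
            Open=("Open", "mean"),
            Close=("Close", "mean"),
            # Number of non-missing Close values, i.e. trading days in that year
            TradingDays=("Close", "count"),
        )
        .reset_index()
    )

# Hypothetical values: one row in 2021, three rows in 2022
sample = pd.DataFrame({
    "Year": [2021, 2022, 2022, 2022],
    "Open": [10.0, 20.0, 30.0, 40.0],
    "Close": [12.0, 22.0, 32.0, 42.0],
})
summary = yearly_summary(sample)
print(summary)
```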

# Prediction with ML Models (REGRESSION)

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

In [None]:
# Split the data into features (X) and target variable (y)
X = df.drop('Close', axis=1)
y = df['Close']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
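`train_test_split` shuffles rows by default, so the test set is a random sample of days scattered between training days. For time series, a common alternative is to hold out the most recent rows instead. scikit-learn's `train_test_split` accepts `shuffle=False` for this. The pandas-only sketch below makes the idea explicit and assumes the rows are already in chronological order (oldest first). The function name `chronological_split` is hypothetical.

```python
import pandas as pd

def chronological_split(X: pd.DataFrame, y: pd.Series, test_size: float = 0.2):
    """Use the last test_size fraction of rows as the test set, without shuffling."""
    # Assumes X and y are aligned and sorted from oldest to newest
    n_test = max(1, int(len(X) * test_size))
    split = len(X) - n_test
    return X.iloc[:split], X.iloc[split:], y.iloc[:split], y.iloc[split:]

# Hypothetical example: 10 rows already in time order
X_demo = pd.DataFrame({"feature": range(10)})
y_demo = pd.Series(range(0, 20, 2))
X_tr, X_te, y_tr, y_te = chronological_split(X_demo, y_demo)
print(X_te["feature"].tolist())  # [8, 9]
```

With this split, every test day comes after every training day, which is how a model would be used in practice.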

In [None]:
# Model training and evaluation
models = {
    'Linear Regression': LinearRegression(),
    'Random Forest': RandomForestRegressor(random_state=42)
}

In [None]:
for name, model in models.items():
    # Train the model
    model.fit(X_train, y_train)
    
    # Make predictions
    y_pred = model.predict(X_test)
    
    # Evaluate the model
    mse = mean_squared_error(y_test, y_pred)
    rmse = np.sqrt(mse)
    mae = mean_absolute_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)
    
    print(f'{name}:')
    print(f'Mean Squared Error (MSE): {mse}')
    print(f'Root Mean Squared Error (RMSE): {rmse}')
    print(f'Mean Absolute Error (MAE): {mae}')
    print(f'R-squared (R2): {r2}')
    print('------------------------------------')
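The four metrics printed above can be computed directly from their definitions, which makes their units clear. MSE is in squared price units, while RMSE and MAE are in the same units as the price. The sketch below uses only NumPy and assumes `y_true` is not constant (otherwise R² is undefined). For the same inputs it should agree with scikit-learn's `mean_squared_error`, `mean_absolute_error` and `r2_score`. The function name `regression_metrics` is hypothetical.

```python
import numpy as np

def regression_metrics(y_true, y_pred) -> dict:
    """Compute MSE, RMSE, MAE and R^2 from their definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mse = np.mean(residuals ** 2)          # average squared error (price units squared)
    rmse = np.sqrt(mse)                    # square root of MSE, back in price units
    mae = np.mean(np.abs(residuals))       # average absolute error (price units)
    ss_res = np.sum(residuals ** 2)        # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot             # share of variance explained
    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "R2": r2}

m = regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
print(m)
```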

__Linear Regression:__

* Mean Squared Error (MSE): 0.0119
  * The average squared difference between predicted and actual closing prices is very small. MSE is in squared price units (USD²).
* Root Mean Squared Error (RMSE): 0.1092
  * RMSE is the square root of the MSE, so it is in the same units as the price: a typical prediction error of about $0.11.
* Mean Absolute Error (MAE): 0.0766
  * On average, predictions are off by about $0.08. MAE is lower than RMSE because RMSE gives more weight to large errors.
* R-squared (R2): 0.999999
  * The model explains almost all of the variance in the closing price on the test rows.

__Random Forest:__

* Mean Squared Error (MSE): 2.3599
  * This is roughly 200 times the Linear Regression MSE, meaning the Random Forest's prediction errors are considerably larger.
* Root Mean Squared Error (RMSE): 1.5362
  * A typical prediction error of about $1.54, compared with about $0.11 for Linear Regression.
* Mean Absolute Error (MAE): 0.9242
  * On average, predictions are off by about $0.92, compared with about $0.08 for Linear Regression.
* R-squared (R2): 0.999800
  * The R-squared value is still very high, though slightly below that of Linear Regression.

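Overall, Linear Regression scores better than Random Forest on all four metrics. This is likely because the target is an almost exactly linear function of the features: Random Forest predictions are averages of training targets, so they cannot reproduce such a relationship as closely. However, these near-perfect scores should not be read as an ability to forecast Meta's stock price. The features include Adj Close, which differs from Close only by split and dividend adjustments. They also include the same day's Open, High and Low, and High and Low are only known once that day's trading has ended. In addition, `train_test_split` shuffles the rows, so test days sit between training days. The models are therefore reconstructing each day's closing price from that same day's data rather than predicting future prices.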
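A leakage-free starting point is to predict the next day's close using only information available at today's close, and to hold out the most recent days. Any model built this way should then be compared against a naive "persistence" forecast, which predicts that tomorrow's close equals today's close. The sketch below scores that baseline and assumes the closing prices are sorted oldest first. The function name `persistence_baseline` and the sample prices are hypothetical.

```python
import numpy as np
import pandas as pd

def persistence_baseline(close: pd.Series, test_size: float = 0.2) -> dict:
    """Score the naive forecast 'next day's close = today's close' on the most recent rows."""
    # Target is the following day's close; the last row has no next day and is dropped
    frame = pd.DataFrame({"today": close, "tomorrow": close.shift(-1)}).dropna()
    # Chronological hold-out: the last test_size fraction of rows
    n_test = max(1, int(len(frame) * test_size))
    test = frame.iloc[-n_test:]
    errors = test["tomorrow"] - test["today"]
    return {
        "MAE": float(np.mean(np.abs(errors))),
        "RMSE": float(np.sqrt(np.mean(errors ** 2))),
    }

# Hypothetical closing prices, oldest first
prices = pd.Series([100.0, 102.0, 101.0, 105.0, 104.0, 108.0])
baseline = persistence_baseline(prices, test_size=0.4)
print(baseline)
```

On the notebook's data this would be `persistence_baseline(df["Close"])`. A regression model using lagged features (for example, built with `shift(1)`) is only adding value if its test error is clearly below this baseline.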

<div style="background-color: #FF7F50; color: #FFFFFF; padding: 10px; font-size: 24px; font-weight: bold; text-align: center; border-radius: 5px;">THE END - THANKS FOR YOUR ATTENTION - Upvote if you liked it</div>