<a href="https://colab.research.google.com/github/tafartech/glo-finance/blob/main/Glo_Finance.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 1. Data Collection

*Since we are interested in stock price data, we can use a library like yfinance to fetch historical stock data.*

In [None]:
!pip install yfinance


**Once the installation succeeds, run the next cell to fetch the stock data. As an example, we fetch historical data for a well-known company, Apple Inc. (AAPL).**

In [None]:
import yfinance as yf

# Specify the stock symbol and date range
stock_symbol = 'AAPL'
start_date = '2020-01-01'
end_date = '2023-01-01'

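# Fetch the historical data
# auto_adjust=False keeps the separate 'Adj Close' column used later in this notebook
# (some recent yfinance releases adjust prices by default and omit that column)
stock_data = yf.download(stock_symbol, start=start_date, end=end_date, auto_adjust=False)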

# Display the first few rows of the dataset
stock_data.head()


# 2. Data Exploration and Preprocessing

*In this step, we'll explore the dataset to understand its structure and perform any necessary preprocessing.*

In [None]:
# Import necessary libraries
import pandas as pd

# Display basic information about the dataset
stock_data.info()


### Data Exploration and Preprocessing

The dataset contains historical stock data for Apple Inc. (AAPL) from January 2, 2020, to December 30, 2022. It consists of 756 entries with a datetime index.

#### Data Columns

1. **Open**: Opening stock price for the day.
2. **High**: Highest stock price during the day.
3. **Low**: Lowest stock price during the day.
4. **Close**: Closing stock price for the day.
5. **Adj Close**: Adjusted closing stock price, accounting for dividends and stock splits.
6. **Volume**: Number of shares traded.

#### Data Types and Non-Null Counts

- **Open, High, Low, Close, Adj Close**: Floating-point numbers.
- **Volume**: Integer.

No missing values are observed in the dataset. We can now proceed with visualizing these stock prices and exploring trends over time.
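
The missing-value check described above can also be done explicitly rather than read off the `info()` output. The following is a minimal sketch on a small hypothetical DataFrame (`sample` stands in for `stock_data`, and its values are made up); forward-filling is one common way to handle a gap in a price series, shown here as an assumption rather than as a step this notebook needs.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for stock_data: five business days with one missing close
dates = pd.bdate_range("2020-01-02", periods=5)
sample = pd.DataFrame(
    {
        "Close": [75.1, 74.4, np.nan, 74.6, 75.8],
        "Volume": [135, 146, 118, 108, 132],
    },
    index=dates,
)

# Count missing values per column; all zeros would mean nothing to clean
missing_per_column = sample.isna().sum()
print(missing_per_column)

# Forward-fill carries the last known value into the gap (one possible choice)
cleaned = sample.ffill()
print(cleaned)
```

On the real data, `stock_data.isna().sum()` should report zero for every column, matching the observation above.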


In [None]:
# Import necessary libraries for visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Set the style for the plots
sns.set_style("whitegrid")

# Plotting the closing prices over time
plt.figure(figsize=(12, 6))
sns.lineplot(x=stock_data.index, y=stock_data['Close'], label='Closing Price')
plt.title('AAPL Stock Closing Prices Over Time')
plt.xlabel('Date')
plt.ylabel('Closing Price (USD)')
plt.legend()
plt.show()


# 3. Visualizing Financial Metrics



*   Create visualizations for additional financial metrics like moving averages, volatility, or trading volume.
*   Consider plotting these metrics on separate subplots or in a multi-panel chart for better comparison.
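
Volatility, mentioned in the list above, could be estimated as a rolling standard deviation of daily returns. The sketch below uses synthetic prices so it runs on its own; the 20-day window and the 252-trading-day annualization factor are conventional assumptions, not choices made elsewhere in this notebook.

```python
import numpy as np
import pandas as pd

# Synthetic closing prices (hypothetical data, not real AAPL quotes)
rng = np.random.default_rng(0)
dates = pd.bdate_range("2022-01-03", periods=120)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.02, 120))), index=dates)

# Daily simple returns; the first value is NaN because there is no previous day
returns = close.pct_change()

# 20-day rolling standard deviation of returns, annualized with sqrt(252)
window = 20
rolling_vol = returns.rolling(window=window).std() * np.sqrt(252)

print(rolling_vol.dropna().head())
```

With the real data, `close` would be `stock_data['Close']`, and `rolling_vol` could be plotted on its own subplot next to the price.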



**Moving Averages**

Moving averages can help smooth out fluctuations in stock prices and reveal trends over time. Let's start by calculating and visualizing simple moving averages (SMA) for the closing prices.
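
To make the calculation concrete, here is a small sketch of what `rolling(window=...).mean()` computes, using a 3-day window on a made-up five-value series (the values and the name `sma_3` are illustrative assumptions). The first `window - 1` entries are NaN because a full window is not yet available, which is why an SMA line starts later than the price line on a chart.

```python
import pandas as pd

# Hypothetical closing prices for five consecutive days
prices = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])

# 3-day simple moving average: mean of the current and two previous values
sma_3 = prices.rolling(window=3).mean()

print(sma_3)
```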

In [None]:
# Calculate 20-day and 50-day simple moving averages
stock_data['SMA_20'] = stock_data['Close'].rolling(window=20).mean()
stock_data['SMA_50'] = stock_data['Close'].rolling(window=50).mean()

# Plotting
plt.figure(figsize=(14, 8))
plt.plot(stock_data['Close'], label='Closing Price', linewidth=2)
plt.plot(stock_data['SMA_20'], label='20-Day SMA', linestyle='--', linewidth=2)
plt.plot(stock_data['SMA_50'], label='50-Day SMA', linestyle='--', linewidth=2)

plt.title('AAPL Stock Price with Moving Averages')
plt.xlabel('Date')
plt.ylabel('Price (USD)')
plt.legend()
plt.show()


**Trading Volume**

Visualizing trading volume can provide insights into the intensity of market activity. Let's plot the trading volume on a separate subplot.

In [None]:
# Plotting trading volume
plt.figure(figsize=(14, 8))

plt.subplot(2, 1, 1)
plt.plot(stock_data['Close'], label='Closing Price', linewidth=2)
plt.plot(stock_data['SMA_20'], label='20-Day SMA', linestyle='--', linewidth=2)
plt.plot(stock_data['SMA_50'], label='50-Day SMA', linestyle='--', linewidth=2)
plt.title('AAPL Stock Price with Moving Averages')
plt.xlabel('Date')
plt.ylabel('Price (USD)')
plt.legend()

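plt.subplot(2, 1, 2)
# Scale volume to millions of shares so the values match the axis label
plt.bar(stock_data.index, stock_data['Volume'] / 1e6, color='yellow', alpha=0.7)
plt.title('AAPL Trading Volume')
plt.xlabel('Date')
plt.ylabel('Volume (Millions)')
plt.tight_layout()  # Keep the two subplots from overlapping
plt.show()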


# 4. Statistical Analysis



1.   Incorporate statistical analysis or time-series forecasting to provide insights into future stock prices.
2.   Calculate and visualize metrics such as rolling averages, standard deviations, or correlations.



**Correlation Matrix**

Explore the correlation between different financial metrics. This can provide insights into how variables move in relation to each other.
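
One caveat when reading such a matrix: price levels that trend together tend to show correlations close to 1 regardless of how their day-to-day changes relate. A possible complement is to correlate daily returns instead. The sketch below illustrates the difference on synthetic data; the series, the column names, and the way the hypothetical `Open` is built from the previous close are all assumptions for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 250

# Synthetic random-walk close with a small upward drift (hypothetical data)
close = pd.Series(100 + np.cumsum(rng.normal(0.1, 1.0, n)))

# Hypothetical open: the previous close plus a little noise
open_ = close.shift(1).fillna(close.iloc[0]) + rng.normal(0, 0.3, n)

prices = pd.DataFrame({"Open": open_, "Close": close})

# Correlation of price levels versus correlation of daily percentage changes
level_corr = prices.corr().loc["Open", "Close"]
return_corr = prices.pct_change().dropna().corr().loc["Open", "Close"]

print(f"Level correlation:  {level_corr:.3f}")
print(f"Return correlation: {return_corr:.3f}")
```

On the notebook's data, the same comparison would be `stock_data[cols].corr()` against `stock_data[cols].pct_change().corr()`.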

In [None]:
# Calculate correlation matrix
correlation_matrix = stock_data[['Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume']].corr()

# Plotting heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', linewidths=.5)
plt.title('Correlation Matrix of Financial Metrics')
plt.show()


**Pie Chart to represent metrics**

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Ratio of each price column's mean to the mean closing price.
# Volume is excluded: it is measured in shares, not USD, and would dominate the chart.
contributions = stock_data[['Open', 'High', 'Low', 'Close', 'Adj Close']].mean() / stock_data['Close'].mean()

# Set a color palette for a more professional look
colors = sns.color_palette('pastel')

# Plot a donut-style pie chart; plt.pie normalizes the ratios so the wedges sum to 100%
plt.figure(figsize=(10, 8))
plt.pie(contributions, labels=contributions.index, autopct='%1.1f%%', startangle=140, colors=colors, wedgeprops=dict(width=0.3))
plt.title('Mean Price Metrics Relative to Mean Closing Price', fontsize=16, fontweight='bold')
plt.show()


# 5. Time-Series Forecasting



*   Let's explore time-series forecasting techniques to predict future stock prices.
*   Consider using models like ARIMA, SARIMA, or machine learning approaches for more advanced forecasting.

*One commonly used method for time-series forecasting is the ARIMA (AutoRegressive Integrated Moving Average) model. It combines autoregression, differencing, and moving averages to make predictions.*
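
The differencing part of ARIMA (the `d` in the order) can be illustrated on its own. The sketch below uses a made-up series with a linear trend: one round of differencing turns it into a constant series, and a second round turns it into zeros. The numbers are hypothetical and chosen only to make the effect easy to see.

```python
import numpy as np

# Hypothetical series with a linear trend
series = np.array([5.0, 7.0, 9.0, 11.0, 13.0])

# First difference (d=1): y[t] - y[t-1]
first_diff = np.diff(series, n=1)

# Second difference (d=2): the difference of the first difference
second_diff = np.diff(series, n=2)

print(first_diff)   # constant values: the linear trend is gone
print(second_diff)  # all zeros
```

Each round of differencing shortens the series by one observation, so larger `d` values leave slightly less data for fitting.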



In [None]:
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_squared_error
from math import sqrt

# Extracting the closing prices for modeling
closing_prices = stock_data['Close']

# Splitting the data into training and testing sets
train_size = int(len(closing_prices) * 0.8)
train, test = closing_prices[:train_size], closing_prices[train_size:]

# Fit ARIMA model
order = (5, 1, 1)  # Example order, you may need to fine-tune this
model = ARIMA(train, order=order)
model_fit = model.fit()

# Make predictions
predictions = model_fit.forecast(steps=len(test))

# Calculate RMSE (Root Mean Squared Error)
rmse = sqrt(mean_squared_error(test, predictions))
print(f'Root Mean Squared Error: {rmse}')

# Plotting
plt.figure(figsize=(14, 6))
plt.plot(train, label='Training Data')
plt.plot(test, label='Actual Prices')
plt.plot(test.index, predictions, label='Predicted Prices', linestyle='--')
plt.title('ARIMA Forecasting of Stock Prices')
plt.xlabel('Date')
plt.ylabel('Closing Price (USD)')
plt.legend()
plt.show()


**Note: In this example**

*   We split the data into training and testing sets.
*   We fit an ARIMA model on the training data.
*   We make predictions for the testing set.
*   We calculate and print the Root Mean Squared Error (RMSE) as a measure of model accuracy.
*   We visualize the training data, actual prices in the testing set, and predicted prices.
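
An RMSE value is easier to interpret next to a simple baseline. The sketch below computes RMSE by hand and applies it to a naive forecast that repeats the last training value; the helper name `rmse` and the numbers are illustrative assumptions, not part of the notebook's pipeline.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Hypothetical training and testing values
train_values = [100.0, 102.0, 101.0, 104.0]
test_values = [105.0, 103.0, 106.0]

# Naive baseline: repeat the last training value for every test step
naive_forecast = np.full(len(test_values), train_values[-1])
baseline_rmse = rmse(test_values, naive_forecast)

print(f"Naive baseline RMSE: {baseline_rmse:.4f}")
```

If the ARIMA RMSE on the real test set is not clearly below the naive baseline computed the same way, the model is adding little over "tomorrow equals today".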





# 6. Fine-Tune ARIMA Model
Experiment with different orders and parameters to fine-tune the ARIMA model for better forecasting accuracy.
We will use a grid search to find the best combination of parameters.

In [None]:
import itertools
import warnings
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_squared_error
from math import sqrt

# Suppress ARIMA warnings
warnings.filterwarnings("ignore")

# Extracting the closing prices for modeling
closing_prices = stock_data['Close']

# Splitting the data into training and testing sets
train_size = int(len(closing_prices) * 0.8)
train, test = closing_prices[:train_size], closing_prices[train_size:]

# Define the hyperparameters to search
p_values = range(0, 3)  # Example range, adjust as needed
d_values = range(0, 3)
q_values = range(0, 3)

# Generate all possible combinations of p, d, and q
orders = list(itertools.product(p_values, d_values, q_values))

best_rmse = float('inf')
best_order = None

# Grid search loop
for order in orders:
    try:
        model = ARIMA(train, order=order)
        model_fit = model.fit()

        predictions = model_fit.forecast(steps=len(test))
        rmse = sqrt(mean_squared_error(test, predictions))

        # Update the best model if the current one performs better
        if rmse < best_rmse:
            best_rmse = rmse
            best_order = order

        print(f'Order: {order}, RMSE: {rmse}')

    except Exception as e:
        continue

print(f'Best Order: {best_order}, Best RMSE: {best_rmse}')


*In this example, we iterate through different combinations of p, d, and q values, fit the ARIMA model for each combination, and evaluate its performance using the Root Mean Squared Error (RMSE). The combination with the lowest RMSE is considered the best.*

**Tip: Now that we have identified the best order, we can use it to build the final ARIMA model for forecasting.**

In [None]:
# Fit the ARIMA model with the best order
best_order = (0, 2, 2)  # Replace with the best order from your grid search
final_model = ARIMA(closing_prices, order=best_order)
final_model_fit = final_model.fit()

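# Forecast future prices
future_steps = 30  # Adjust as needed
forecast = final_model_fit.forecast(steps=future_steps)

# The price index has no fixed frequency, so statsmodels returns an integer index.
# Relabel it with the next business days (approximate: market holidays are not skipped).
forecast.index = pd.bdate_range(closing_prices.index[-1] + pd.offsets.BDay(1), periods=future_steps)

# Plotting
plt.figure(figsize=(14, 6))
plt.plot(closing_prices, label='Historical Prices')
plt.plot(forecast.index, forecast, label='Forecasted Prices', linestyle='--')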
plt.title('ARIMA Forecasting of Stock Prices with Best Order')
plt.xlabel('Date')
plt.ylabel('Closing Price (USD)')
plt.legend()
plt.show()


# 6.1 Evaluation and Validation:

Evaluate the accuracy of the forecasts using additional metrics like Mean Absolute Error (MAE) or Mean Absolute Percentage Error (MAPE).
Compare the forecasted values with the actual values to ensure the model is capturing the underlying patterns.

*Let's calculate both Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE) to assess the accuracy of the ARIMA model.*
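
As a quick worked example of the two metrics, the sketch below computes them on three made-up values (the arrays are hypothetical). MAE is the mean of the absolute errors in price units, while MAPE divides each absolute error by the actual value and reports the mean as a percentage.

```python
import numpy as np

# Hypothetical actual and forecasted prices
actual = np.array([100.0, 200.0, 400.0])
forecast = np.array([110.0, 190.0, 400.0])

# MAE: average absolute error, in the same units as the prices
mae = np.mean(np.abs(actual - forecast))

# MAPE: average absolute error relative to the actual value, as a percentage
mape = np.mean(np.abs((actual - forecast) / actual)) * 100

print(f"MAE:  {mae:.2f}")
print(f"MAPE: {mape:.2f}%")
```

Here the absolute errors are 10, 10, and 0, and the relative errors are 10%, 5%, and 0%, so the same absolute miss counts for less on a higher-priced observation.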

In [None]:
from sklearn.metrics import mean_absolute_error
import numpy as np

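# Refit on the training split so the forecast covers dates with known actual prices
eval_fit = ARIMA(train, order=best_order).fit()
eval_forecast = np.asarray(eval_fit.forecast(steps=len(forecast)))

# Actual prices for the same held-out steps
actual_values = closing_prices[train_size:train_size + len(forecast)]
actual = actual_values.to_numpy()

# Evaluate forecast accuracy on plain arrays (avoids index misalignment between Series)
mae = mean_absolute_error(actual, eval_forecast)

# MAPE; stock prices are strictly positive, so dividing by the actual values is safe
mape = np.mean(np.abs((actual - eval_forecast) / actual)) * 100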

print(f'Mean Absolute Error (MAE): {mae:.2f}')
print(f'Mean Absolute Percentage Error (MAPE): {mape:.2f}%')


# 6.2 Visualization Enhancement:

Enhance the visualizations to include the historical prices, forecasted prices, and any additional information that adds value to the representation.

In [None]:
import matplotlib.pyplot as plt

# Plotting
plt.figure(figsize=(14, 6))

# Plot historical prices
plt.plot(closing_prices, label='Historical Prices', color='blue')

# Plot forecasted prices
plt.plot(forecast.index, forecast, label='Forecasted Prices', linestyle='--', color='orange')

# Highlight the testing set period
plt.axvspan(actual_values.index[0], actual_values.index[-1], alpha=0.3, color='gray', label='Testing Set')

# Title and labels
plt.title('ARIMA Forecasting of Stock Prices with Best Order')
plt.xlabel('Date')
plt.ylabel('Closing Price (USD)')

# Legend
plt.legend()

# Show the plot
plt.show()


**This code uses matplotlib to create a plot that includes historical prices, forecasted prices, and highlights the testing set period. The axvspan function is used to shade the area corresponding to the testing set.**

# 6.3 Dashboard Integration (Optional):

Let's integrate the ARIMA forecasts into a dashboard to provide an interactive and dynamic view of the predictions.

In [None]:
!pip install dash dash-bootstrap-components plotly


In [None]:
# Import necessary libraries
import dash
from dash import dcc, html
from dash.dependencies import Input, Output
import pandas as pd
import numpy as np
import plotly.express as px

# Sample data (replace this with your actual data)
np.random.seed(42)
closing_prices = pd.Series(np.random.rand(365), index=pd.date_range(start='2022-01-01', periods=365))

# Initialize the application
app = dash.Dash(__name__)

# Update the layout of the dashboard
app.layout = html.Div([
    html.H1("Financial Analysis Dashboard"),

    # Dropdown for selecting stocks
    html.Label("Select Stock:"),
    dcc.Dropdown(
        id='stock-dropdown',
        options=[
            {'label': 'Your Stock 1', 'value': 'STOCK1'},
            {'label': 'Your Stock 2', 'value': 'STOCK2'},
            # Add more options for other stocks as needed
        ],
        value='STOCK1',  # Default selected stock
        multi=False
    ),

    # Slider for selecting time range
    html.Label("Select Time Range:"),
    dcc.RangeSlider(
        id='time-slider',
        min=0,
        max=len(closing_prices)-1,
        marks={i: str(closing_prices.index[i].date()) for i in range(0, len(closing_prices), len(closing_prices)//5)},
        step=1,
        value=[0, len(closing_prices)-1]  # Default: entire time range
    ),

    # Line chart for displaying closing prices
    dcc.Graph(
        id='stock-closing-prices',
        figure={
            'data': [
                {'x': closing_prices.index, 'y': closing_prices, 'type': 'line', 'name': 'Closing Price'},
            ],
            'layout': {
                'title': 'Selected Stock Closing Prices Over Time',
                'xaxis': {'title': 'Date'},
                'yaxis': {'title': 'Closing Price (USD)'},
            }
        }
    )
])

# Callback to update the chart based on dropdown and slider values
@app.callback(
    Output('stock-closing-prices', 'figure'),
    [Input('stock-dropdown', 'value'),
     Input('time-slider', 'value')]
)
def update_chart(selected_stock, selected_time_range):
    # Your logic for updating the chart based on selected_stock and selected_time_range
    # Make sure to replace this placeholder logic with your actual data and filtering logic

    # For demonstration purposes, closing_prices is the sample Series defined above
    # Slice by position; the +1 makes the slider's end point inclusive
    start_idx, end_idx = selected_time_range
    filtered_data = closing_prices.iloc[start_idx:end_idx + 1]

    # Update the figure
    updated_figure = {
        'data': [
            {'x': filtered_data.index, 'y': filtered_data, 'type': 'line', 'name': 'Closing Price'},
        ],
        'layout': {
            'title': f'{selected_stock} Closing Prices Over Time',
            'xaxis': {'title': 'Date'},
            'yaxis': {'title': 'Closing Price (USD)'},
        }
    }

    return updated_figure

# Run the application
if __name__ == '__main__':
    # app.run replaces app.run_server, which newer Dash releases no longer provide
    app.run(debug=True)


# Refinement of the Model:

*Analyze the diagnostic plots and model summary to identify any issues or areas for improvement. If necessary, refine the ARIMA model by adjusting hyperparameters, trying different orders, or incorporating additional features. Consider the following:*

In [None]:
# Assuming you have already fitted the ARIMA model (replace 'model' with your actual model object)
model_fit = model.fit()

# Display model summary
print(model_fit.summary())

# Analyze diagnostic plots
model_fit.plot_diagnostics()


# Feature Engineering

*Feature engineering can sometimes improve the predictive power of time series models. Some ideas for feature engineering in time series forecasting include:*



1.   Lagged Variables: Include lagged values of the target variable (e.g., closing prices) as additional features. This can capture trends and patterns in past observations.
2.   External Factors: If available, consider incorporating external factors that may influence the target variable. For financial analysis, this could include economic indicators, news sentiment, or other relevant data.



In [None]:
# Assuming you already have a DataFrame named 'stock_data' and 'Close' is the target variable
lags = 3  # Choose the number of lagged values to include

# Create lagged columns
for i in range(1, lags + 1):
    stock_data[f'Close_Lag_{i}'] = stock_data['Close'].shift(i)

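# Drop only the rows made incomplete by the shift
# (a bare dropna() would also discard rows where other columns such as SMA_50 are NaN)
stock_data.dropna(subset=[f'Close_Lag_{i}' for i in range(1, lags + 1)], inplace=True)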

# Display the updated DataFrame
print(stock_data.head())


**Now that we've created the lagged variables and added them to the DataFrame, we can proceed to train the ARIMA model on the updated dataset.**

In [None]:
from statsmodels.tsa.arima.model import ARIMA

# Assuming 'Close' is still your target variable
target_variable = 'Close'
lagged_features = [f'Close_Lag_{i}' for i in range(1, lags + 1)]

# Prepare data for the model
X = stock_data[lagged_features]
y = stock_data[target_variable]

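# Fit the ARIMA model on the target series.
# Note: X is prepared above but not used here; passing exog=X to ARIMA would
# include the lagged columns as exogenous regressors.
model = ARIMA(y, order=(2, 2, 2))  # Adjust order as needed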
model_fit = model.fit()

# Display the model summary
print(model_fit.summary())


# Model Residuals Analysis



*   After training, analyze the residuals of the ARIMA model.
*   Plot the residuals and check for any patterns or systematic errors.
*   Use statistical tests to ensure that the residuals are white noise.




In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
from statsmodels.tsa.stattools import acf, pacf

# Retrieve the residuals
residuals = model_fit.resid

# Plot the residuals over time
plt.figure(figsize=(12, 6))
plt.plot(residuals, color='blue')
plt.title('Residuals Over Time')
plt.xlabel('Time')
plt.ylabel('Residuals')
plt.show()

# Plot histogram and Q-Q plot for normality check
plt.figure(figsize=(14, 6))
plt.subplot(1, 2, 1)
sns.histplot(residuals, bins=30, kde=True, color='blue')
plt.title('Histogram of Residuals')

ax_qq = plt.subplot(1, 2, 2)
from statsmodels.graphics.gofplots import qqplot
# Pass the axes explicitly; otherwise qqplot draws on a new figure
qqplot(residuals, line='s', ax=ax_qq, color='blue', alpha=0.7)
plt.title('Q-Q Plot of Residuals')
plt.show()


**Let's proceed with the second task in Model Residuals Analysis: calculating and plotting the autocorrelation and partial autocorrelation functions of the residuals.**

In [None]:
# Calculate autocorrelation and partial autocorrelation functions
acf_resid = acf(residuals, fft=False)
pacf_resid = pacf(residuals)

# Plot autocorrelation function
plt.figure(figsize=(12, 6))
plt.subplot(2, 1, 1)
plt.stem(acf_resid, markerfmt=' ', basefmt="-b")
plt.title('Autocorrelation Function of Residuals')
plt.xlabel('Lag')
plt.ylabel('Autocorrelation')

# Plot partial autocorrelation function
plt.subplot(2, 1, 2)
plt.stem(pacf_resid, markerfmt=' ', basefmt="-b")
plt.title('Partial Autocorrelation Function of Residuals')
plt.xlabel('Lag')
plt.ylabel('Partial Autocorrelation')

plt.tight_layout()
plt.show()


**Now let's predict future stock prices with the ARIMA model. The steps are:**



*   Train the ARIMA model on the entire dataset.
*   Forecast future values based on the trained model.



In [None]:
from statsmodels.tsa.arima.model import ARIMA

# Re-extract the real closing prices (the dashboard cell replaced closing_prices with sample data)
closing_prices = stock_data['Close']

# Train the ARIMA model on the entire dataset
model = ARIMA(closing_prices, order=(2, 2, 2))  # Use the best order found during the grid search
fit_model = model.fit()

# Forecast future values
forecast_periods = 10  # Change this value based on how many future periods you want to predict
forecast = fit_model.get_forecast(steps=forecast_periods)

# Extract forecasted values and confidence intervals
forecast_values = forecast.predicted_mean
confidence_intervals = forecast.conf_int()

# Display the forecasted values and confidence intervals
print("Forecasted Values:")
print(forecast_values)
print("\nConfidence Intervals:")
print(confidence_intervals)
