# Exploratory Data Analysis

This notebook explores the investment decisions dataset before the PyMC model is built. It summarizes the numeric columns, plots the distribution of investment outcomes and of gains and losses, and checks pairwise correlations between variables.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset
data = pd.read_csv('../data/investment_decisions.csv')

# Display the first few rows of the dataset
data.head()

In [None]:
# Summary statistics
data.describe()
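`data.describe()` reports the count, mean, standard deviation, minimum, quartiles, and maximum of each numeric column. To make those rows concrete, the pure-Python sketch below recomputes them for a single column. It assumes pandas' defaults: missing values are dropped, the standard deviation uses the sample (n - 1) denominator, and quartiles use linear interpolation between sorted values. The helper names and sample values are hypothetical and serve only as illustration.

```python
import math


def quantile_linear(sorted_values, q):
    """Quantile by linear interpolation at position (n - 1) * q,
    assumed here to match pandas' default quantile method."""
    pos = (len(sorted_values) - 1) * q
    lower = math.floor(pos)
    upper = math.ceil(pos)
    frac = pos - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * frac


def describe_column(values):
    """Hypothetical helper: recompute the describe() rows for one column."""
    # Drop missing values first, as describe() does for its statistics.
    clean = sorted(v for v in values if v is not None and not math.isnan(v))
    n = len(clean)
    mean = sum(clean) / n
    # Sample standard deviation (denominator n - 1).
    std = math.sqrt(sum((v - mean) ** 2 for v in clean) / (n - 1))
    return {
        "count": n,
        "mean": mean,
        "std": std,
        "min": clean[0],
        "25%": quantile_linear(clean, 0.25),
        "50%": quantile_linear(clean, 0.50),
        "75%": quantile_linear(clean, 0.75),
        "max": clean[-1],
    }


# Hypothetical values standing in for one numeric column such as `gains`.
example = [120.0, 80.0, float("nan"), 95.0, 150.0, 60.0]
print(describe_column(example))
```

Comparing a column's output from this helper with the corresponding column of `data.describe()` is a quick way to confirm how missing values are being treated.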

In [None]:
# Visualizing investment outcomes
plt.figure(figsize=(10, 6))
sns.countplot(x='investment_outcome', data=data)
plt.title('Investment Outcomes Distribution')
plt.xlabel('Investment Outcome')
plt.ylabel('Count')
plt.show()
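A count plot draws one bar per category, with bar height equal to the number of rows in that category. The same tallies, together with each category's share of the total, are what `data['investment_outcome'].value_counts()` (or `value_counts(normalize=True)` for shares) returns. The sketch below computes them in plain Python, which makes any class imbalance explicit before modeling. The outcome labels are hypothetical; the real column's categories may differ.

```python
from collections import Counter


def outcome_counts(outcomes):
    """Count each category and its share of the total: the bar heights
    of a count plot, plus the proportion each bar represents."""
    counts = Counter(outcomes)
    total = sum(counts.values())
    # Sort by descending count, then by label, for a stable order.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], str(item[0])))
    return [(label, n, n / total) for label, n in ordered]


# Hypothetical outcome labels, for illustration only.
labels = ["profit", "loss", "profit", "break_even", "profit", "loss"]
for label, n, share in outcome_counts(labels):
    print(f"{label}: {n} ({share:.1%})")
```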

In [None]:
# Visualizing gains and losses
plt.figure(figsize=(10, 6))
# alpha < 1 keeps the overlapping histograms readable
sns.histplot(data['gains'], bins=30, kde=True, color='green', alpha=0.5, label='Gains')
sns.histplot(data['losses'], bins=30, kde=True, color='red', alpha=0.5, label='Losses')
plt.title('Distribution of Gains and Losses')
plt.xlabel('Value')
plt.ylabel('Count')  # histplot's default stat is 'count'
plt.legend()
plt.show()
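With `bins=30`, each histogram divides the range of its column into 30 equal-width bins, and `kde=True` overlays a Gaussian kernel density estimate. The plain-Python sketch below shows both computations under stated assumptions: the bins span exactly [min, max] with the last bin closed, and the KDE bandwidth follows Scott's rule, assumed here to match seaborn's default. Seaborn additionally rescales the KDE curve to the histogram's count axis, which this sketch does not do. The helper names and sample values are hypothetical.

```python
import math


def histogram_counts(values, bins=30):
    """Equal-width bins spanning [min, max]; each bin is half-open
    except the last, which also includes the maximum."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        idx = int((v - lo) / width) if width > 0 else 0
        counts[min(idx, bins - 1)] += 1  # clamp the maximum into the last bin
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts


def gaussian_kde(values, grid):
    """Gaussian KDE evaluated at each grid point, with Scott's-rule
    bandwidth h = s * n ** (-1/5), s being the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    h = s * n ** (-1 / 5)
    norm = 1.0 / (n * h * math.sqrt(2 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((x - v) / h) ** 2) for v in values) for x in grid]


# Hypothetical values; in the notebook they would come from data['gains'].dropna().
gains = [1.0, 2.0, 2.5, 3.0, 7.0]
edges, counts = histogram_counts(gains, bins=3)
print(edges, counts)
```

The KDE returned here is a density that integrates to about 1. Multiplying it by the number of observations and the bin width puts it on roughly the same scale as the count histogram.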

In [None]:
# Correlation heatmap
plt.figure(figsize=(12, 8))
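# numeric_only=True skips non-numeric columns (e.g. a categorical investment_outcome);
# since pandas 2.0, corr() raises an error on such columns without it
correlation_matrix = data.corr(numeric_only=True)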
sns.heatmap(correlation_matrix, annot=True, fmt='.2f', cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()
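Each heatmap cell is a Pearson correlation coefficient: the covariance of two columns divided by the product of their standard deviations, ranging from -1 to 1. The sketch below computes that matrix in plain Python over numeric columns only, mirroring what `numeric_only=True` keeps. It assumes there are no missing values (pandas instead excludes missing values pairwise) and returns NaN for a constant column. The column values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson r: covariance of x and y divided by the product of their
    standard deviations. Returns NaN when either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Pairwise Pearson correlations over numeric columns only.
    Assumes no missing values."""
    numeric = {
        name: values
        for name, values in columns.items()
        if all(isinstance(v, (int, float)) for v in values)
    }
    names = list(numeric)
    matrix = [[pearson(numeric[a], numeric[b]) for b in names] for a in names]
    return names, matrix


# Hypothetical columns; 'investment_outcome' holds text, so it is skipped.
columns = {
    "gains": [10.0, 20.0, 30.0, 40.0],
    "losses": [8.0, 6.0, 4.0, 2.0],
    "investment_outcome": ["profit", "profit", "loss", "loss"],
}
names, matrix = correlation_matrix(columns)
for name, row in zip(names, matrix):
    print(name, [round(r, 2) for r in row])
```

Pearson correlation only captures linear association. A near-zero cell does not rule out a nonlinear relationship, so the scatter of any pair that matters for the model is still worth plotting.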

## Conclusion

This exploratory analysis summarizes the investment decisions dataset: the distribution of investment outcomes, the distributions of gains and losses, and the pairwise correlations between numeric variables. These findings will inform the subsequent modeling steps in PyMC.