# KAIM Week 1 Challenge Task 2

## Business Objective

**Nova Financial Solutions** aims to enhance its predictive analytics capabilities to significantly boost its financial forecasting accuracy and operational efficiency through advanced data analysis. As a Data Analyst at Nova Financial Solutions, your primary task is to conduct a rigorous analysis of the financial news dataset. The focus of your analysis should be twofold:

* **Sentiment Analysis:** Perform sentiment analysis on the `headline` text to quantify the tone and sentiment expressed in financial news. This involves using natural language processing (NLP) techniques to derive sentiment scores, which can be associated with the respective stock symbol (the `stock` column) to understand the emotional context surrounding stock-related news.
* **Correlation Analysis:** Establish statistical correlations between the sentiment derived from news articles and the corresponding stock price movements. This involves tracking stock price changes around the date the article was published and analyzing the impact of news sentiment on stock performance. The analysis should consider the publication date and, if it is available or can be inferred, the time the article was published.
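
As a rough illustration of the correlation step, the sketch below averages sentiment per day, aligns it with daily returns by date, and computes a Pearson correlation. The scores and prices are toy values invented for the example, and the column names `sentiment`, `close`, and `daily_return` are assumptions, not part of the dataset.

```python
import pandas as pd

# Toy inputs (made-up values): one sentiment score per headline, one close per trading day
news = pd.DataFrame({
    "date": pd.to_datetime(
        ["2024-01-02", "2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"]
    ),
    "sentiment": [0.4, 0.2, -0.5, 0.1, 0.6],
})
prices = pd.DataFrame({
    "date": pd.to_datetime(
        ["2023-12-29", "2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"]
    ),
    "close": [100.0, 102.0, 99.0, 100.0, 103.0],
})

# Several headlines can share a date, so average them into one score per day
daily_sentiment = news.groupby("date", as_index=False)["sentiment"].mean()

# Daily percentage change in the closing price
prices["daily_return"] = prices["close"].pct_change()

# Keep only the dates present in both tables, then drop rows without a return
merged = daily_sentiment.merge(
    prices[["date", "daily_return"]], on="date", how="inner"
).dropna()

corr = merged["sentiment"].corr(merged["daily_return"])  # Pearson by default
print(f"Pearson correlation: {corr:.3f}")
```

With only a handful of rows the coefficient is not meaningful; the point is the alignment by date before correlating.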

Your recommendations should leverage insights from this sentiment analysis to suggest investment strategies. These strategies should utilize the relationship between news sentiment and stock price fluctuations to predict future movements. The final report should provide clear, actionable insights based on your analysis, offering innovative strategies to use news sentiment as a predictive tool for stock market trends.


## Dataset Overview

### Financial News and Stock Price Integration Dataset

**FNSPID (Financial News and Stock Price Integration Dataset)** is a comprehensive financial dataset designed to enhance stock market predictions by combining quantitative and qualitative data.

- The structure of the [data](https://drive.google.com/file/d/1tLHusoOQOm1cU_7DtLNbykgFgJ_piIpd/view?usp=drive_link) is as follows:
    - `headline`: Article release headline, the title of the news article, which often includes key financial actions like stocks hitting highs, price target changes, or company earnings.
    - `url`: The direct link to the full news article.
    - `publisher`: Author or creator of the article.
    - `date`: The publication date and time, including timezone information (UTC-4).
    - `stock`: Stock ticker symbol, a unique series of letters assigned to a publicly traded company (for example, AAPL for Apple).
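
Because `date` carries a UTC-4 offset, the calendar day of an article can differ depending on whether it is read in UTC or in local time. The sketch below shows one way to parse the column and derive a local calendar date. The rows are made-up examples, and the derived column names `date_utc`, `date_local`, and `calendar_date` are assumptions.

```python
from datetime import timedelta, timezone

import pandas as pd

# Toy rows shaped like the columns described above; all values are invented
news = pd.DataFrame({
    "headline": ["Example headline one", "Example headline two"],
    "publisher": ["Publisher A", "Publisher B"],
    "date": ["2020-06-05 10:30:54-04:00", "2020-06-03 23:45:20-04:00"],
    "stock": ["AAPL", "AAPL"],
})

# Parse to timezone-aware UTC timestamps
news["date_utc"] = pd.to_datetime(news["date"], utc=True)

# Convert back to a fixed UTC-4 offset before taking the calendar date,
# so late-evening articles are not pushed onto the following day
utc_minus_4 = timezone(timedelta(hours=-4))
news["date_local"] = news["date_utc"].dt.tz_convert(utc_minus_4)
news["calendar_date"] = news["date_local"].dt.date

print(news[["stock", "date_utc", "calendar_date"]])
```

The second row is published at 23:45 local time, which is already the next day in UTC; this matters when joining news to daily price rows.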


### Deliverables and Tasks to be Done

#### Quantitative analysis using PyNance and TA-Lib

**Tasks:**
- Use additional finance data
- Load and prepare the data.
- Load your stock price data into a pandas DataFrame. Ensure your data includes columns like Open, High, Low, Close, and Volume.
- Apply Analysis Indicators with TA-Lib
    - You can use TA-Lib to calculate various technical indicators such as moving averages, RSI (Relative Strength Index), and MACD (Moving Average Convergence Divergence)
- Use PyNance for Financial Metrics
- Visualize the Data
    - Create visualizations to better understand the data and the impact of different indicators on the stock price.

**KPIs**
- Proactivity to self-learn - sharing references.
- Accuracy of indicators
- Completeness of Data Analysis

### Minimum Essential To Do:

- Merge the necessary branches from task-1 into the main branch using a Pull Request (PR)
- Create at least one new branch called "task-2" for the ongoing development of the dashboard.
- Commit your work with a descriptive commit message.
- Prepare Your Data
- Calculate Basic Technical Indicators
- Visualize Data


```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
```

```python
data = pd.read_csv('/kaggle/input/kaim-w1/yfinance_data/yfinance_data/AAPL_historical_data.csv')
data.head()
```

Task 2 calls for a quantitative analysis of stock price data, using `TA-Lib` for technical indicators and `PyNance` for financial metrics. The steps below give a structured approach to the tasks:

### Steps to Complete the Task

#### 1. **Set Up GitHub Repository**

1. **Create a New Branch:**
   - Switch to the main branch:
     ```bash
     git checkout main
     ```
   - Pull any changes from the remote repository to ensure your local branch is up to date:
     ```bash
     git pull origin main
     ```
   - Create and switch to the new branch for Task 2:
     ```bash
     git checkout -b task-2
     ```

2. **Commit Your Work:**
   - Ensure you commit changes with descriptive messages regularly.

#### 2. **Prepare Your Data**

**Load and Prepare Stock Price Data:**

Assuming you have a CSV file with columns like Open, High, Low, Close, and Volume, you can use pandas to load and prepare the data.

```python
import pandas as pd

# Load stock price data
file_path = 'path_to_your_stock_data.csv'  # Replace with your dataset path
df = pd.read_csv(file_path)

# Convert date column to datetime if necessary
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)

# Check the first few rows of the dataframe
print(df.head())
```
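
Before computing indicators it can help to confirm that the expected columns are present and that rows are in date order. The helper below is a minimal sketch; `validate_ohlcv` and `REQUIRED_COLUMNS` are hypothetical names introduced here, and the toy frame holds invented values.

```python
import pandas as pd

# Column names taken from the task description
REQUIRED_COLUMNS = ["Open", "High", "Low", "Close", "Volume"]


def validate_ohlcv(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a date-sorted, de-duplicated copy, or raise if columns are missing."""
    missing = [col for col in REQUIRED_COLUMNS if col not in frame.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    cleaned = frame.sort_index()
    # Keep the first row when the same date appears more than once
    cleaned = cleaned[~cleaned.index.duplicated(keep="first")]
    return cleaned


# Toy frame with rows deliberately out of date order
toy = pd.DataFrame(
    {
        "Open": [10.0, 11.0],
        "High": [11.0, 12.0],
        "Low": [9.5, 10.5],
        "Close": [10.5, 11.5],
        "Volume": [1000, 1200],
    },
    index=pd.to_datetime(["2024-01-03", "2024-01-02"]),
)
checked = validate_ohlcv(toy)
print(checked.index.is_monotonic_increasing)
```

Rolling indicators such as moving averages assume chronological order, which is why the helper sorts before returning.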

#### 3. **Calculate Basic Technical Indicators**

**Install TA-Lib:**

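Install `TA-Lib` if you haven’t already. The Python package is a wrapper around the TA-Lib C library, so depending on your platform and package version the C library may need to be installed first: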
```bash
pip install TA-Lib
```

**Calculate Indicators:**

```python
import talib

# Calculate Moving Averages
df['SMA_50'] = talib.SMA(df['Close'], timeperiod=50)
df['SMA_200'] = talib.SMA(df['Close'], timeperiod=200)

# Calculate RSI (Relative Strength Index)
df['RSI'] = talib.RSI(df['Close'], timeperiod=14)

# Calculate MACD (Moving Average Convergence Divergence)
df['MACD'], df['MACD_signal'], df['MACD_hist'] = talib.MACD(df['Close'], fastperiod=12, slowperiod=26, signalperiod=9)

# Check the last few rows; the first rows of each indicator are NaN during its warm-up period
print(df.tail())
```
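
To sanity-check the indicator values, or to work where TA-Lib cannot be installed, the same indicators can be approximated with pandas alone. The sketch below is not TA-Lib's implementation: the function names are hypothetical, the price series is synthetic, and early RSI and MACD values may differ from TA-Lib's because the smoothing is initialized differently.

```python
import numpy as np
import pandas as pd


def sma(close: pd.Series, period: int) -> pd.Series:
    """Simple moving average; the first period-1 values are NaN."""
    return close.rolling(window=period).mean()


def rsi(close: pd.Series, period: int = 14) -> pd.Series:
    """RSI using Wilder-style exponential smoothing (alpha = 1 / period)."""
    delta = close.diff()
    gain = delta.clip(lower=0)
    loss = -delta.clip(upper=0)
    avg_gain = gain.ewm(alpha=1 / period, adjust=False, min_periods=period).mean()
    avg_loss = loss.ewm(alpha=1 / period, adjust=False, min_periods=period).mean()
    rs = avg_gain / avg_loss
    return 100 - 100 / (1 + rs)


def macd(close: pd.Series, fast: int = 12, slow: int = 26, signal: int = 9):
    """Return the MACD line, the signal line, and the histogram."""
    fast_ema = close.ewm(span=fast, adjust=False).mean()
    slow_ema = close.ewm(span=slow, adjust=False).mean()
    macd_line = fast_ema - slow_ema
    signal_line = macd_line.ewm(span=signal, adjust=False).mean()
    return macd_line, signal_line, macd_line - signal_line


# Synthetic random-walk prices, for illustration only
rng = np.random.default_rng(0)
close = pd.Series(100 + rng.normal(0, 1, 300).cumsum())

sma_50 = sma(close, 50)
rsi_14 = rsi(close)
macd_line, signal_line, hist = macd(close)
print(pd.DataFrame({"SMA_50": sma_50, "RSI": rsi_14, "MACD": macd_line}).tail())
```

Comparing these columns against the TA-Lib output on the later rows of a real dataset is one way to address the "accuracy of indicators" KPI.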

#### 4. **Use PyNance for Financial Metrics**

**Install PyNance:**

Install `PyNance` if you haven’t already:
```bash
pip install pynance
```

**Calculate Financial Metrics:**

```python
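# NOTE: `financials` and `get_financials` below are unverified placeholders, not a
# confirmed part of PyNance's API. Check the PyNance documentation for the functions
# your installed version actually provides before uncommenting anything here.
# from pynance import financials

# Example: calculate metrics such as the Price-to-Earnings (P/E) ratio,
# assuming you have financial metrics data available
# df_metrics = financials.get_financials(ticker='AAPL')
# p_e_ratio = df_metrics['PE Ratio']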

# This is an example; actual usage depends on the available methods and your dataset
```
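
If the PyNance calls turn out not to match your installed version, a few basic metrics can be derived from the closing prices with pandas alone. This is a sketch that stands in for PyNance, not an example of its API: `basic_metrics` is a hypothetical helper, the prices are invented, and 252 trading days per year is a common convention assumed here.

```python
import numpy as np
import pandas as pd

TRADING_DAYS_PER_YEAR = 252  # assumed convention for annualizing daily figures


def basic_metrics(close: pd.Series) -> dict:
    """Compute simple return and risk metrics from a series of closing prices."""
    returns = close.pct_change().dropna()
    return {
        # Total gain or loss over the whole period
        "cumulative_return": close.iloc[-1] / close.iloc[0] - 1,
        # Standard deviation of daily returns, scaled to a yearly figure
        "annualized_volatility": returns.std() * np.sqrt(TRADING_DAYS_PER_YEAR),
        # Largest peak-to-trough decline, as a negative fraction
        "max_drawdown": (close / close.cummax() - 1).min(),
    }


# Toy closing prices (made-up values)
toy_close = pd.Series([100.0, 102.0, 101.0, 105.0, 103.0])
metrics = basic_metrics(toy_close)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```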

#### 5. **Visualize Data**

**Create Visualizations:**

Using `matplotlib` and `seaborn` for visualizations:

```python
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_theme(style='whitegrid')  # apply seaborn styling to the matplotlib plots below

# Plot Closing Price and Moving Averages
plt.figure(figsize=(14, 7))
plt.plot(df['Close'], label='Close Price', color='blue')
plt.plot(df['SMA_50'], label='50-Day SMA', color='orange')
plt.plot(df['SMA_200'], label='200-Day SMA', color='green')
plt.title('Stock Price and Moving Averages')
plt.xlabel('Date')
plt.ylabel('Price')
plt.legend()
plt.show()

# Plot RSI
plt.figure(figsize=(14, 7))
plt.plot(df['RSI'], label='RSI', color='purple')
plt.axhline(70, color='red', linestyle='--', label='Overbought')
plt.axhline(30, color='green', linestyle='--', label='Oversold')
plt.title('Relative Strength Index (RSI)')
plt.xlabel('Date')
plt.ylabel('RSI')
plt.legend()
plt.show()

# Plot MACD
plt.figure(figsize=(14, 7))
plt.plot(df['MACD'], label='MACD', color='blue')
plt.plot(df['MACD_signal'], label='MACD Signal', color='orange')
plt.bar(df.index, df['MACD_hist'], label='MACD Histogram', color='grey', alpha=0.5)
plt.title('MACD and MACD Signal')
plt.xlabel('Date')
plt.ylabel('MACD')
plt.legend()
plt.show()
```

### Example Commit Messages

1. “Set up repository for Task 2 and created branch.”
2. “Loaded stock price data and prepared for analysis.”
3. “Calculated technical indicators: SMA, RSI, and MACD.”
4. “Implemented visualizations for stock prices, moving averages, RSI, and MACD.”

### References and Resources

- [TA-Lib Documentation](https://mrjbq7.github.io/ta-lib/)
- [PyNance Documentation](https://pynance.readthedocs.io/en/latest/)
- [Matplotlib Documentation](https://matplotlib.org/stable/contents.html)
- [Seaborn Documentation](https://seaborn.pydata.org/)

This approach will help you conduct a comprehensive quantitative analysis of stock price data using technical indicators and visualize the results to gain insights.

The walkthrough below repeats the same steps in more detail, adding push commands, data-quality checks, and an interpretation of each indicator.

### 1. **Set Up GitHub Repository**

1. **Create a New Branch:**
   ```bash
   git checkout main
   git pull origin main
   git checkout -b task-2
   ```

2. **Commit Your Work:**
   ```bash
   git add .
   git commit -m "Set up repository for Task 2 and created branch."
   git push origin task-2
   ```

### 2. **Prepare Your Data**

**Load and Prepare Stock Price Data:**

Assuming you have a CSV file with columns like Open, High, Low, Close, and Volume, the following code loads and prepares this data.

```python
import pandas as pd

# Load the dataset
file_path = 'path_to_your_stock_data.csv'  # Replace with your dataset path
df = pd.read_csv(file_path)

# Convert the Date column to datetime
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)

# Check data types, non-null counts, and missing values
df.info()  # prints directly; wrapping it in print() would also print "None"
print(df.isna().sum())
print(df.describe())
print(df.head())
```

### 3. **Calculate Basic Technical Indicators**

**Install `TA-Lib`:**

```bash
pip install TA-Lib
```

**Calculate Technical Indicators:**

```python
import talib

# Calculate Moving Averages
df['SMA_50'] = talib.SMA(df['Close'], timeperiod=50)
df['SMA_200'] = talib.SMA(df['Close'], timeperiod=200)

# Calculate RSI (Relative Strength Index)
df['RSI'] = talib.RSI(df['Close'], timeperiod=14)

# Calculate MACD (Moving Average Convergence Divergence)
df['MACD'], df['MACD_signal'], df['MACD_hist'] = talib.MACD(df['Close'], fastperiod=12, slowperiod=26, signalperiod=9)

# Check the new columns; use tail() because the first rows are NaN during each indicator's warm-up period
print(df[['Close', 'SMA_50', 'SMA_200', 'RSI', 'MACD', 'MACD_signal', 'MACD_hist']].tail())
```

### 4. **Use PyNance for Financial Metrics**

**Install `PyNance`:**

```bash
pip install pynance
```

**Calculate Financial Metrics:**

```python
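# NOTE: `financials` and `get_financials` below are unverified placeholders, not a
# confirmed part of PyNance's API; keep them commented out until checked.
# from pynance import financials

# Example: fetching financial metrics for a stock
# Actual usage depends on PyNance's API and the available methods
# ticker = 'AAPL'
# metrics = financials.get_financials(ticker=ticker)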

# Example metric extraction
# p_e_ratio = metrics['PE Ratio']
# print(f"P/E Ratio for {ticker}: {p_e_ratio}")
```

**Note:** The calls above are placeholders and have not been verified against PyNance's actual API. Adapt this step to the functions your installed version provides, or compute the metrics directly from your dataset.

### 5. **Visualize Data**

**Create Visualizations:**

Using `matplotlib` and `seaborn` to visualize the stock price data and technical indicators.

```python
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_theme(style='whitegrid')  # apply seaborn styling to the matplotlib plots below

# Plot Closing Price and Moving Averages
plt.figure(figsize=(14, 7))
plt.plot(df['Close'], label='Close Price', color='blue')
plt.plot(df['SMA_50'], label='50-Day SMA', color='orange')
plt.plot(df['SMA_200'], label='200-Day SMA', color='green')
plt.title('Stock Price and Moving Averages')
plt.xlabel('Date')
plt.ylabel('Price')
plt.legend()
plt.show()

# Plot RSI
plt.figure(figsize=(14, 7))
plt.plot(df['RSI'], label='RSI', color='purple')
plt.axhline(70, color='red', linestyle='--', label='Overbought')
plt.axhline(30, color='green', linestyle='--', label='Oversold')
plt.title('Relative Strength Index (RSI)')
plt.xlabel('Date')
plt.ylabel('RSI')
plt.legend()
plt.show()

# Plot MACD
plt.figure(figsize=(14, 7))
plt.plot(df['MACD'], label='MACD', color='blue')
plt.plot(df['MACD_signal'], label='MACD Signal', color='orange')
plt.bar(df.index, df['MACD_hist'], label='MACD Histogram', color='grey', alpha=0.5)
plt.title('MACD and MACD Signal')
plt.xlabel('Date')
plt.ylabel('MACD')
plt.legend()
plt.show()
```

### 6. **Detailed Analysis**

**Moving Averages (SMA):**
- **50-Day SMA:** Short- to medium-term trend indicator. It reacts faster to recent price changes than the 200-day average.
- **200-Day SMA:** Long-term trend indicator. It smooths out short-term fluctuations to show the long-term trend.

**Relative Strength Index (RSI):**
- **Overbought (>70):** Stock might be due for a correction.
- **Oversold (<30):** Stock might be undervalued or due for a rebound.
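
The thresholds above can be turned into a label per row. The sketch below assumes an RSI series is already available; `rsi_zone` is a hypothetical helper and the input values are invented.

```python
import pandas as pd


def rsi_zone(rsi: pd.Series, overbought: float = 70, oversold: float = 30) -> pd.Series:
    """Label each RSI value as overbought, oversold, neutral, or n/a (warm-up)."""
    zone = pd.Series("neutral", index=rsi.index)
    zone[rsi > overbought] = "overbought"
    zone[rsi < oversold] = "oversold"
    zone[rsi.isna()] = "n/a"  # RSI is undefined until enough rows have accumulated
    return zone


# Toy RSI values, including a NaN like those in the warm-up period
toy_rsi = pd.Series([float("nan"), 25.0, 50.0, 75.0])
zones = rsi_zone(toy_rsi)
print(zones.tolist())
```

These labels describe a condition, not a trade instruction: a stock can stay above 70 or below 30 for an extended period.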

**MACD:**
- **MACD Line vs. Signal Line:** The crossing points can indicate buy or sell signals.
- **MACD Histogram:** Shows the difference between MACD and the Signal Line, indicating momentum.
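
Crossing points can be located by looking for a sign change in the difference between the two lines. The sketch below is one possible approach under that assumption; `macd_crossovers` is a hypothetical helper and the inputs are invented values.

```python
import pandas as pd


def macd_crossovers(macd_line: pd.Series, signal_line: pd.Series) -> pd.Series:
    """Return 1 where MACD crosses above the signal line, -1 where it crosses below, else 0."""
    diff = macd_line - signal_line  # this difference is also the MACD histogram
    prev = diff.shift(1)
    crosses = pd.Series(0, index=diff.index)
    # Comparisons against NaN are False, so warm-up rows and the first row stay 0
    crosses[(diff > 0) & (prev <= 0)] = 1
    crosses[(diff < 0) & (prev >= 0)] = -1
    return crosses


# Toy lines: MACD starts below the signal, rises above it, then drops below again
toy_macd = pd.Series([-1.0, -0.5, 0.5, 1.0, -0.2])
toy_signal = pd.Series([0.0, 0.0, 0.0, 0.0, 0.0])
crossovers = macd_crossovers(toy_macd, toy_signal)
print(crossovers.tolist())
```

A value of 1 corresponds to the conventional "buy" crossover and -1 to the "sell" crossover described above.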

**Visualizations:**
- **Moving Averages Plot:** Helps in visualizing how the stock price relates to short-term and long-term trends.
- **RSI Plot:** Shows overbought or oversold conditions and helps in timing buy or sell decisions.
- **MACD Plot:** Provides insights into momentum and potential buy/sell signals.
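
The three plots can also be stacked on a shared date axis so that price moves line up visually with RSI and MACD readings. The function below is a sketch that assumes the DataFrame already holds the columns created earlier (`Close`, `SMA_50`, `SMA_200`, `RSI`, `MACD`, `MACD_signal`, `MACD_hist`); `plot_indicator_panels` is a hypothetical name.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_indicator_panels(df: pd.DataFrame):
    """Draw price with SMAs, RSI, and MACD as three stacked panels sharing one date axis."""
    fig, (ax_price, ax_rsi, ax_macd) = plt.subplots(
        3, 1, figsize=(14, 10), sharex=True
    )

    # Panel 1: closing price with both moving averages
    ax_price.plot(df.index, df["Close"], label="Close Price")
    ax_price.plot(df.index, df["SMA_50"], label="50-Day SMA")
    ax_price.plot(df.index, df["SMA_200"], label="200-Day SMA")
    ax_price.set_ylabel("Price")
    ax_price.legend()

    # Panel 2: RSI with the conventional 70/30 thresholds
    ax_rsi.plot(df.index, df["RSI"], color="purple", label="RSI")
    ax_rsi.axhline(70, color="red", linestyle="--", label="Overbought")
    ax_rsi.axhline(30, color="green", linestyle="--", label="Oversold")
    ax_rsi.set_ylabel("RSI")
    ax_rsi.legend()

    # Panel 3: MACD line, signal line, and histogram
    ax_macd.plot(df.index, df["MACD"], label="MACD")
    ax_macd.plot(df.index, df["MACD_signal"], label="MACD Signal")
    ax_macd.bar(df.index, df["MACD_hist"], color="grey", alpha=0.5, label="MACD Histogram")
    ax_macd.set_ylabel("MACD")
    ax_macd.set_xlabel("Date")
    ax_macd.legend()

    fig.tight_layout()
    return fig
```

Call `plot_indicator_panels(df)` followed by `plt.show()` once the indicator columns exist.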

### Example Commit Messages

1. “Set up the repository and created branch for Task 2.”
2. “Loaded and prepared stock price data.”
3. “Calculated technical indicators using TA-Lib: SMA, RSI, and MACD.”
4. “Visualized stock price data with technical indicators.”

### Final Notes

- Ensure that the data you use for financial metrics and technical analysis is accurate and up-to-date.
- Review and validate your visualizations and indicators for correctness.
- Adapt the code based on the specific features of your dataset and the methods available in `PyNance`.
