# KAIM Week 1 Challenges Task 3

## Business Objective

**Nova Financial Solutions** aims to enhance its predictive analytics capabilities to significantly boost its financial forecasting accuracy and operational efficiency through advanced data analysis. As a Data Analyst at Nova Financial Solutions, your primary task is to conduct a rigorous analysis of the financial news dataset. The focus of your analysis should be two-fold:

* **Sentiment Analysis:** Perform sentiment analysis on the `headline` text to quantify the tone and sentiment expressed in financial news. This will involve using natural language processing (NLP) techniques to derive sentiment scores, which can be associated with the respective stock symbol (`stock`) to understand the emotional context surrounding stock-related news.
* **Correlation Analysis:** Establish statistical correlations between the sentiment derived from news articles and the corresponding stock price movements. This involves tracking stock price changes around the date the article was published and analyzing the impact of news sentiment on stock performance. This analysis should consider the publication date and, if it is available or can be inferred, the time the article was published.

Your recommendations should leverage insights from this sentiment analysis to suggest investment strategies. These strategies should utilize the relationship between news sentiment and stock price fluctuations to predict future movements. The final report should provide clear, actionable insights based on your analysis, offering innovative strategies to use news sentiment as a predictive tool for stock market trends.


## Dataset Overview

### Financial News and Stock Price Integration Dataset

**FNSPID (Financial News and Stock Price Integration Dataset)** is a comprehensive financial dataset designed to enhance stock market predictions by combining quantitative and qualitative data.

- The structure of the [data](https://drive.google.com/file/d/1tLHusoOQOm1cU_7DtLNbykgFgJ_piIpd/view?usp=drive_link) is as follows:
    - `headline`: Article release headline, the title of the news article, which often includes key financial actions like stocks hitting highs, price target changes, or company earnings.
    - `url`: The direct link to the full news article.
    - `publisher`: Author or creator of the article.
    - `date`: The publication date and time, including timezone information (UTC-4).
    - `stock`: Stock ticker symbol (a unique series of letters assigned to a publicly traded company), for example AAPL for Apple.
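
Because `date` carries a UTC-4 offset, a headline published late in the evening can land on a different calendar day depending on the timezone used. The sketch below shows one possible way to turn the timestamps into plain calendar dates in UTC-4. The sample rows, the timestamp format, and the `pub_date` column name are assumptions made for illustration; they are not taken from the dataset.

```python
from datetime import timedelta, timezone

import pandas as pd

# Hypothetical sample rows mimicking the column layout described above.
news = pd.DataFrame({
    "headline": ["Example headline one", "Example headline two"],
    "date": ["2020-06-05 10:30:54-04:00", "2020-06-04 23:45:00-04:00"],
    "stock": ["AAPL", "AAPL"],
})

# Fixed UTC-4 offset, matching the timezone stated for the `date` column.
UTC_MINUS_4 = timezone(timedelta(hours=-4))

# Parse into timezone-aware UTC timestamps first, so rows with different
# offsets (if any) end up on one common timeline.
timestamps_utc = pd.to_datetime(news["date"], utc=True)

# View the timestamps in UTC-4, drop the time of day, then drop the timezone
# so the result can be joined against plain (naive) price dates.
news["pub_date"] = (
    timestamps_utc.dt.tz_convert(UTC_MINUS_4).dt.normalize().dt.tz_localize(None)
)

print(news[["date", "pub_date"]])
```

The second sample row is the interesting one: at 23:45 UTC-4 it is already the next day in UTC, so taking the date directly from the UTC timestamp would assign it to the wrong calendar day.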


### Correlation between news and stock movement

**Tasks:**
- Date Alignment: Ensure that both datasets (news and stock prices) are aligned by dates. This might involve normalizing timestamps.
- Sentiment Analysis: Conduct sentiment analysis on news headlines to quantify the tone of each article (positive, negative, neutral). Tools: Use Python libraries such as `nltk` or `TextBlob`.
- Analysis:
    - Calculate Daily Stock Returns: Compute the percentage change in daily closing prices to represent stock movements.
    - Correlation Analysis: Use statistical methods to test the correlation between daily news sentiment scores and stock returns.
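
The sentiment step above can be sketched as a small function that applies any scoring callable to the `headline` column. To keep the example runnable without extra packages, it uses a toy word-list scorer; that scorer, its word lists, the `0.05` threshold, and the `sentiment` / `sentiment_label` column names are all hypothetical stand-ins, not the method of `nltk` or `TextBlob`.

```python
import pandas as pd

# Toy word lists: a hypothetical stand-in for a real sentiment library.
POSITIVE = {"beats", "raises", "upgrade", "highs", "surges"}
NEGATIVE = {"misses", "cuts", "downgrade", "lows", "plunges"}


def toy_polarity(text: str) -> float:
    """Return a score in [-1, 1]: (positive hits - negative hits) / total hits."""
    words = text.lower().split()
    pos = sum(word in POSITIVE for word in words)
    neg = sum(word in NEGATIVE for word in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def to_label(score: float, threshold: float = 0.05) -> str:
    """Map a numeric score to positive / negative / neutral."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


def add_sentiment(df: pd.DataFrame, scorer=toy_polarity) -> pd.DataFrame:
    """Return a copy of df with a numeric score and a label per headline."""
    out = df.copy()
    out["sentiment"] = out["headline"].astype(str).map(scorer)
    out["sentiment_label"] = out["sentiment"].map(to_label)
    return out


# Hypothetical headlines, written for this example only.
headlines = pd.DataFrame({"headline": [
    "Analyst raises price target after company beats estimates",
    "Retailer misses forecast and cuts guidance",
    "Company schedules annual shareholder meeting",
]})

scored = add_sentiment(headlines)
print(scored)
```

If TextBlob is installed, the same function should work with `scorer=lambda text: TextBlob(text).sentiment.polarity`, which also returns a value between -1 and 1.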

**KPIs:**
- Proactivity in self-learning, demonstrated by sharing references.
- Sentiment Analysis
- Correlation Strength


### Minimum Essential To Do:
- Merge the necessary branches from task-2 into the main branch using a Pull Request (PR).
- Create at least one new branch called "task-3" for the ongoing development of the dashboard.
- Commit your work with a descriptive commit message.
- Data preparation
    - Normalize Dates: Align dates in both news and stock datasets to ensure each news item matches the corresponding stock trading day.
    - Perform Sentiment Analysis: Use a simple and effective sentiment analysis tool to assign sentiment scores to headlines.
- Calculate Stock Movements
    - Compute Daily Returns: Calculate daily percentage changes in stock prices to represent movements.
- Correlation Analysis
    - Aggregate Sentiments: Compute average daily sentiment scores if multiple articles appear on the same day.
    - Calculate Correlation: Determine the Pearson correlation coefficient between average daily sentiment scores and stock daily returns.
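
The last three steps (daily returns, daily average sentiment, Pearson correlation) can be sketched end to end as follows. The numbers are made up for illustration and are not real market data; the `Date` and `Close` column names are assumed for the price file, and `pub_date` and `sentiment` are hypothetical names for the outputs of the earlier preparation steps.

```python
import pandas as pd

# Hypothetical daily closing prices for one ticker (not real market data).
prices = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2020-06-01", "2020-06-02", "2020-06-03", "2020-06-04", "2020-06-05"]
    ),
    "Close": [100.0, 102.0, 101.0, 103.0, 104.0],
})

# Hypothetical per-headline sentiment scores, two of them on the same day.
news = pd.DataFrame({
    "pub_date": pd.to_datetime(
        ["2020-06-02", "2020-06-02", "2020-06-03", "2020-06-04", "2020-06-05"]
    ),
    "sentiment": [0.6, 0.2, -0.3, 0.5, 0.1],
})

# Daily return: percentage change in the closing price from the previous row.
prices = prices.sort_values("Date")
prices["daily_return"] = prices["Close"].pct_change() * 100

# Average the scores when several headlines share a publication date.
daily_sentiment = (
    news.groupby("pub_date", as_index=False)["sentiment"]
    .mean()
    .rename(columns={"pub_date": "Date"})
)

# Keep only days that have both a return and at least one headline.
merged = (
    prices.merge(daily_sentiment, on="Date", how="inner")
    .dropna(subset=["daily_return"])
)

# Pearson correlation between average daily sentiment and daily return.
pearson_r = merged["sentiment"].corr(merged["daily_return"], method="pearson")
print(f"Pearson r = {pearson_r:.3f} over {len(merged)} days")
```

The inner join silently drops headlines published on non-trading days; whether to discard them or roll them forward to the next trading day is a design choice this sketch leaves open. With so few rows the coefficient is not meaningful on its own, and a significance test such as `scipy.stats.pearsonr` would be a natural addition on the full dataset.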

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
```

```python
data = pd.read_csv('/kaggle/input/kaim-w1/yfinance_data/yfinance_data/AAPL_historical_data.csv')
data.head()
```