# Task 3: Correlation between News Sentiment and Stock Movement

This notebook aligns news headlines with daily stock prices by date, scores each headline's sentiment with TextBlob, computes daily stock returns, and measures the Pearson correlation between daily average sentiment and same-day returns, using AAPL as the example.

**Steps:**
1. Install required libraries
2. Import libraries
3. Load and align news and stock data
4. Perform sentiment analysis on headlines
5. Calculate daily stock returns
6. Aggregate daily sentiment and merge it with returns
7. Correlation analysis

In [None]:
# 1. Install required libraries
!pip install textblob --quiet
!pip install pandas_ta --quiet

In [None]:
# 2. Import libraries
import pandas as pd
import numpy as np
from textblob import TextBlob
import matplotlib.pyplot as plt
import pandas_ta as ta  # not used in the steps below

In [None]:
# 3. Load and align news and stock data (AAPL example)
# Load news data
news = pd.read_csv('../data/raw_analyst_ratings.csv', usecols=['headline','date','stock'])
news = news[news['stock'] == 'AAPL'].copy()  # .copy() avoids SettingWithCopyWarning when columns are added later
news['date'] = pd.to_datetime(news['date']).dt.date

# Load stock data
stock = pd.read_csv('../data/yfinance_data/AAPL_historical_data.csv')
stock = stock.rename(columns=lambda x: x.strip().capitalize())  # normalize headers, e.g. ' close' -> 'Close'
stock['Date'] = pd.to_datetime(stock['Date']).dt.date
stock = stock[['Date', 'Close']]
stock = stock.sort_values('Date')
stock.head()
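As written, the notebook matches headlines to prices on the exact calendar date, so headlines published on weekends or market holidays have no trading day to land on and are dropped when the data are merged in step 6. If you would rather keep them, one option is `pd.merge_asof` with `direction='forward'`, which assigns each headline to the first trading day on or after its publication date. The sketch below assumes that rule and uses made-up timestamps; it is an alternative alignment, not the one this notebook uses.

```python
import pandas as pd

# Hypothetical toy data: three headlines, one published on Saturday 2020-06-06.
news = pd.DataFrame({
    "published": pd.to_datetime(
        ["2020-06-05 10:30:00", "2020-06-06 09:00:00", "2020-06-08 16:45:00"]
    ),
    "sentiment": [0.2, -0.4, 0.1],
})
# Hypothetical trading days: Friday, Monday, Tuesday.
trading_days = pd.DataFrame(
    {"Date": pd.to_datetime(["2020-06-05", "2020-06-08", "2020-06-09"])}
)

# Drop the time of day so each headline is keyed by its calendar date.
news["day"] = news["published"].dt.normalize()

# merge_asof needs both keys sorted; direction="forward" picks the first
# trading day on or after each headline's calendar date (assumed rule).
aligned = pd.merge_asof(
    news.sort_values("day"),
    trading_days.sort_values("Date"),
    left_on="day",
    right_on="Date",
    direction="forward",
)
print(aligned[["day", "Date", "sentiment"]])
```

Here the Saturday headline is credited to Monday 2020-06-08 instead of being discarded.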

In [None]:
# 4. Perform sentiment analysis on headlines
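def get_sentiment(text):
    # TextBlob polarity ranges from -1.0 (negative) to 1.0 (positive).
    # str() turns missing (NaN) headlines into the text 'nan' so TextBlob does not fail.
    blob = TextBlob(str(text))
    return blob.sentiment.polarity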

news['sentiment'] = news['headline'].apply(get_sentiment)
news.head()
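TextBlob's `polarity` is a float in [-1.0, 1.0], and a headline containing no words from its lexicon scores exactly 0.0. If you want discrete labels for counting or plotting, one option is to bucket the scores. The helper name `label_polarity` and the ±0.05 dead zone below are hypothetical choices for illustration, not TextBlob conventions.

```python
import numpy as np
import pandas as pd

def label_polarity(polarity: pd.Series, threshold: float = 0.05) -> pd.Series:
    """Map polarity scores to 'positive' / 'neutral' / 'negative' labels.

    `threshold` is an assumed dead zone around zero, not a TextBlob default.
    """
    labels = np.select(
        [polarity > threshold, polarity < -threshold],
        ["positive", "negative"],
        default="neutral",
    )
    return pd.Series(labels, index=polarity.index, name="label")

scores = pd.Series([0.5, 0.0, -0.3, 0.05])
print(label_polarity(scores).tolist())
# ['positive', 'neutral', 'negative', 'neutral']
```

In the notebook this would be applied as `news['label'] = label_polarity(news['sentiment'])`, followed by `news['label'].value_counts()` to see the balance of labels.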

In [None]:
# 5. Calculate daily stock returns
stock['return'] = stock['Close'].pct_change()  # simple return Close_t / Close_{t-1} - 1; NaN on the first row
stock.head()
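`pct_change()` computes the simple return $r_t = C_t / C_{t-1} - 1$, so the first row is always NaN. The sketch below checks that identity on three made-up closing prices, adds log returns as an alternative measure, and builds a next-day return column that would let you test whether a day's headlines line up with the following day's move rather than the same day's. The `log_return` and `next_return` columns are our additions, not part of the original analysis.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices for three consecutive trading days.
stock = pd.DataFrame({
    "Date": pd.to_datetime(["2020-06-01", "2020-06-02", "2020-06-03"]),
    "Close": [100.0, 102.0, 99.96],
})

# Simple return: C_t / C_{t-1} - 1 (what the notebook uses).
stock["return"] = stock["Close"].pct_change()
manual = stock["Close"] / stock["Close"].shift(1) - 1
assert np.allclose(stock["return"].iloc[1:], manual.iloc[1:])

# Log return: ln(C_t / C_{t-1}); equals log1p of the simple return.
stock["log_return"] = np.log(stock["Close"]).diff()

# Next-day return: the return realised on the following trading day,
# placed on today's row (NaN on the last row).
stock["next_return"] = stock["return"].shift(-1)
print(stock)
```

Build `next_return` on the full `stock` frame before merging; shifting `merged` instead would skip over trading days that have no headlines.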

In [None]:
# 6. Aggregate daily sentiment
daily_sentiment = news.groupby('date')['sentiment'].mean().reset_index()
daily_sentiment.columns = ['Date', 'avg_sentiment']
# Merge with stock returns; dropping NaNs keeps only trading days that have both a return and at least one headline
merged = pd.merge(stock, daily_sentiment, on='Date', how='left')
merged = merged.dropna(subset=['avg_sentiment', 'return'])
merged.head()
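The daily mean hides how many headlines stand behind each score, so in the correlation a day with one headline counts as much as a day with twenty. A possible extension, sketched below on hypothetical toy frames shaped like `news` and `stock`, records the headline count with pandas named aggregation. It also shows that the notebook's left merge followed by `dropna` keeps the same trading days as an inner merge that then drops the NaN first-day return.

```python
import datetime as dt

import numpy as np
import pandas as pd

# Hypothetical inputs shaped like the notebook's `news` and `stock` frames.
news = pd.DataFrame({
    "date": [dt.date(2020, 6, 1), dt.date(2020, 6, 1), dt.date(2020, 6, 2),
             dt.date(2020, 6, 3), dt.date(2020, 6, 3), dt.date(2020, 6, 3),
             dt.date(2020, 6, 6)],  # 2020-06-06 is a Saturday
    "sentiment": [0.2, 0.4, -0.1, 0.0, 0.3, -0.3, 0.5],
})
stock = pd.DataFrame({
    "Date": [dt.date(2020, 6, 1), dt.date(2020, 6, 2),
             dt.date(2020, 6, 3), dt.date(2020, 6, 4)],
    "return": [np.nan, 0.01, -0.02, 0.005],
})

# Mean sentiment plus the number of headlines behind it, per calendar date.
daily = (
    news.groupby("date")["sentiment"]
    .agg(avg_sentiment="mean", n_headlines="count")
    .reset_index()
    .rename(columns={"date": "Date"})
)

# The notebook's approach: left merge, then drop rows missing either value.
left = pd.merge(stock, daily, on="Date", how="left").dropna(
    subset=["avg_sentiment", "return"]
)
# Same trading days: inner merge, then drop the NaN return on the first day.
inner = pd.merge(stock, daily, on="Date", how="inner").dropna(subset=["return"])

print(inner)
```

The `n_headlines` column could later serve as a filter (for example, a minimum number of headlines per day) or as weights; any such threshold would be a further assumption.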

In [None]:
# 7. Correlation analysis
from scipy.stats import pearsonr
corr, pval = pearsonr(merged['avg_sentiment'], merged['return'])
print(f'n = {len(merged)} days | Pearson correlation between daily average sentiment and same-day return: {corr:.3f} (p-value: {pval:.3g})')

plt.figure(figsize=(8,5))
plt.scatter(merged['avg_sentiment'], merged['return'], alpha=0.6)
plt.title('Daily Avg Sentiment vs. Stock Return (AAPL)')
plt.xlabel('Average Daily Sentiment')
plt.ylabel('Daily Stock Return')
plt.grid(True)
plt.show()
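Pearson's $r$ is the covariance of the two series divided by the product of their standard deviations, $r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \sum_i (y_i - \bar{y})^2}}$. The sketch below computes it from that definition with NumPy, checks it against `np.corrcoef`, and derives the statistic $t = r\sqrt{(n-2)/(1-r^2)}$; the two-sided p-value that `pearsonr` reports is equivalent to a t-test on this statistic with $n - 2$ degrees of freedom. Because a few extreme return days can dominate Pearson's $r$, it also computes Spearman's rank correlation as Pearson's $r$ on ranks, a robustness check we are adding rather than one from the original analysis. The helper `pearson_r` and all input values are hypothetical.

```python
import numpy as np
import pandas as pd

def pearson_r(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson correlation from its definition (centred cross-products)."""
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))

# Hypothetical daily average sentiment and same-day returns.
merged = pd.DataFrame({
    "avg_sentiment": [0.10, -0.05, 0.30, 0.00, 0.20, -0.15],
    "return": [0.004, -0.010, 0.012, 0.001, -0.002, -0.008],
})
x = merged["avg_sentiment"].to_numpy()
y = merged["return"].to_numpy()

r = pearson_r(x, y)
assert np.isclose(r, np.corrcoef(x, y)[0, 1])

# t-statistic for H0: rho = 0, with n - 2 degrees of freedom.
n = len(x)
t_stat = r * np.sqrt((n - 2) / (1 - r ** 2))

# Spearman's rho: Pearson's r on the ranks (ties get their average rank).
rho = merged["avg_sentiment"].rank().corr(merged["return"].rank())

print(f"n={n}  pearson r={r:.3f}  t={t_stat:.2f}  spearman rho={rho:.3f}")
```

With only a handful of days the t-statistic has few degrees of freedom, which is one reason to report `n` alongside the correlation.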

### References for Self-Learning
- [TextBlob Documentation](https://textblob.readthedocs.io/en/dev/)
- [pandas_ta Documentation](https://github.com/twopirllc/pandas-ta)
- [Investopedia: Correlation](https://www.investopedia.com/terms/c/correlation.asp)