
In [None]:
''' File creation Date: 14.09.2023 '''

# 1. Introduction:
- Problem statement
- Potential contribution: why is this contribution crucial?

In this project, we aim to build a stock recommendation system for the 3rd quarter of 2023. It uses historical stock prices, modalities, and news (if possible) from the 1st and 2nd quarters of 2023 to recommend stocks relevant to a user's interests.



# 2. Dataset:

## 2.1. Import library:

In [None]:
!pip install "pyspark[pandas_on_spark]" plotly alpha_vantage

Collecting pyspark[pandas_on_spark]
  Downloading pyspark-3.4.1.tar.gz (310.8 MB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: pyspark
  Building wheel for pyspark (setup.py) ... done
  Created wheel for pyspark: filename=pyspark-3.4.1-py2.py3-none-any.whl size=311285387 sha256=f20179cff07580e5b20fcec2d0c8b095b79b24fc0b6fb3b87c28c760d8536066
  Stored in directory: /root/.cache/pip/wheels/0d/77/a3/ff2f74cc9ab41f8f594dabf0579c2a7c6de920d584206e0834
Successfully built pyspark
Installing collected packages: pyspark
Successfully installed pyspark-3.4.1


In [1]:
import os, sys
import requests
import glob
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from pyspark.sql import SparkSession
from datetime import datetime, date

# alpha_vantage is a separate package (pip install alpha_vantage). The cells
# below call the REST API through requests, so fall back if it is missing.
try:
    from alpha_vantage.timeseries import TimeSeries
except ModuleNotFoundError:
    TimeSeries = None
# from alpha_vantage.techindicators import TechIndicators
# from alpha_vantage.sectorperformance import SectorPerformances
# from alpha_vantage.cryptocurrencies import CryptoCurrencies
# from alpha_vantage.foreignexchange import ForeignExchange

## 2.2. Data Acquisition:

* Historical Stock Price Data: This dataset contains historical stock price information, including open, close, high, and low prices and trading volumes. We retrieve real-time and historical prices and technical indicators from financial data providers such as Yahoo Finance and the Alpha Vantage API.

* Fundamental Data: Fundamental data includes financial metrics such as earnings per share (EPS), price-to-earnings ratio (P/E), market capitalization, debt ratios, and more. You can often find this data in financial reports, financial news sources, or specialized financial data providers.

* Macroeconomic Indicators: Economic indicators like GDP growth, inflation rates, interest rates, and unemployment rates can impact the overall market and individual stocks. You can obtain this data from government sources or economic data providers.

* Sector and Industry Information: Stocks within the same sector or industry often move together. Data on sectors and industries can be obtained from financial news sources or sector-specific databases.

* Market Index Data: Data on major market indices like the S&P 500, NASDAQ, or Dow Jones can be useful for benchmarking and analyzing stock performance relative to broader market trends.

* Earnings Call Transcripts: Analyzing transcripts of earnings calls can provide insights into a company's management outlook and future plans.

* News and Social Media Data (if time permits): News articles, social media sentiment, and market news can influence stock prices. Accessing APIs or scraping news articles and social media platforms can provide valuable sentiment data.

* Machine Learning Features: You may generate additional features through feature engineering or sentiment analysis on news and social media data to feed into your recommendation algorithm.
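The feature-engineering step in the last bullet can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: the function names `daily_returns` and `moving_average` and the sample closing prices are hypothetical, chosen only to show how derived features can come from a price series.

```python
from typing import List, Optional


def daily_returns(closes: List[float]) -> List[Optional[float]]:
    """Simple day-over-day return; the first day has no previous close."""
    if not closes:
        return []
    returns: List[Optional[float]] = [None]
    for prev, curr in zip(closes, closes[1:]):
        returns.append((curr - prev) / prev)
    return returns


def moving_average(closes: List[float], window: int) -> List[Optional[float]]:
    """Trailing mean over `window` days; None until enough history exists."""
    averages: List[Optional[float]] = []
    for i in range(len(closes)):
        if i + 1 < window:
            averages.append(None)
        else:
            averages.append(sum(closes[i + 1 - window:i + 1]) / window)
    return averages


# Made-up closing prices, for illustration only
closes = [100.0, 102.0, 101.0, 105.0]
for close, ret, avg in zip(closes, daily_returns(closes), moving_average(closes, window=2)):
    print(close, ret, avg)
```

Both helpers return one value per input day, so the results can be attached to the price rows as extra columns.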

### 2.2.1. Core Stocks and Financial Indicators Retrieval:

In [None]:
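# Request hourly intraday prices for META from the Alpha Vantage API
import requests  # imported here so this cell runs on its own

url = 'https://www.alphavantage.co/query?function=TIME_SERIES_INTRADAY&symbol=META&interval=60min&apikey=MNYH22VAN1CPKDLG'
r = requests.get(url)
r.raise_for_status()  # stop early on an HTTP error
data = r.json()

print(data)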
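A request like the one above embeds the API key in the URL string. A minimal sketch of an alternative, assuming the key is stored in an environment variable, builds the query string from a parameter dictionary instead. The variable name `ALPHAVANTAGE_API_KEY`, the placeholder value, and the helper `build_query_url` are hypothetical and not part of the Alpha Vantage API.

```python
import os
from urllib.parse import urlencode

ENDPOINT = "https://www.alphavantage.co/query"


def build_query_url(function: str, symbol: str, api_key: str, **extra: str) -> str:
    """Assemble a query URL from named parameters (extra ones such as interval)."""
    params = {"function": function, "symbol": symbol, **extra, "apikey": api_key}
    return f"{ENDPOINT}?{urlencode(params)}"


# Hypothetical variable name; the placeholder keeps the sketch runnable anywhere
api_key = os.environ.get("ALPHAVANTAGE_API_KEY", "YOUR_API_KEY")
url = build_query_url("TIME_SERIES_INTRADAY", "META", api_key, interval="60min")
print(url)
```

The resulting string can be passed to `requests.get` exactly as in the cell above, and the key no longer appears in the notebook source.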

In [None]:
# Define the API key
api_key = "MNYH22VAN1CPKDLG"

# Read ticker symbols from a file into a Python list, skipping blank lines
symbols = []
with open('/content/nasdaq_ticker_symbols.csv') as f:
    for line in f:
        if line.strip():
            symbols.append(line.strip())
# No explicit close is needed: the with block closes the file


def get_alpha_vantage_data(api_key, symbols, function):
    # Define the API endpoint
    endpoint = "https://www.alphavantage.co/query"

    # Create a list to hold the data frames for each symbol
    dfs = []

    for symbol in symbols:
        # Construct the API request URL
        url = f"{endpoint}?function={function}&symbol={symbol}&apikey={api_key}"

        # Send the API request
        r = requests.get(url)

        # Parse the JSON response
        data = r.json()

        # Check if the API request was successful
        if "Error Message" in data:
            print(f"Error retrieving data for {symbol}: {data['Error Message']}")
            continue

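        # Extract the data based on the function
        if function.startswith("TIME_SERIES"):
            # Time series data for stocks. The payload key varies by function
            # (e.g. "Time Series (Daily)", "Weekly Time Series"), so look it up
            # instead of deriving it from the function name.
            time_series_key = next((k for k in data if "Time Series" in k), None)
            if time_series_key is None:
                print(f"No time series returned for {symbol}: {data}")
                continue
            time_series = data[time_series_key]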

            # Create a list of dictionaries to hold the data points
            rows = []

            # Iterate over the time series data and extract specific data points
            # (the loop variable is named timestamp to avoid shadowing datetime.date)
            for timestamp, values in time_series.items():
                open_price = float(values["1. open"])
                high_price = float(values["2. high"])
                low_price = float(values["3. low"])
                close_price = float(values["4. close"])

                # Create a dictionary for each row of data
                row = {
                    "Symbol": symbol,
                    "Date": timestamp,
                    "Open": open_price,
                    "High": high_price,
                    "Low": low_price,
                    "Close": close_price
                }

                # Append the row dictionary to the list
                rows.append(row)

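        elif function == "GLOBAL_QUOTE":
            # Market indicators; skip symbols whose quote is missing or empty
            global_quote = data.get("Global Quote")
            if not global_quote:
                print(f"No quote returned for {symbol}")
                continue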

            # Create a list of dictionaries to hold the data points
            rows = [{
                "Symbol": symbol,
                "Open": float(global_quote["02. open"]),
                "High": float(global_quote["03. high"]),
                "Low": float(global_quote["04. low"]),
                "Close": float(global_quote["05. price"])
            }]

        elif function in ("ECONOMIC_DATA", "OVERVIEW"):
            # Economic indicators (ECONOMIC_DATA) and fundamental data (OVERVIEW)
            # are handled identically: copy every field except "Symbol"
            rows = [{
                "Symbol": symbol,
                "Function": function
            }]
            for key, value in data.items():
                if key != "Symbol":
                    rows[0][key] = value

        else:
            print(f"Unsupported function: {function}")
            continue

        # Create a SparkSession
        spark = SparkSession.builder.getOrCreate()

        # Create a DataFrame from the list of dictionaries
        df = spark.createDataFrame(rows)

        # Add the DataFrame to the list
        dfs.append(df)

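    # Nothing to combine if every request failed or was skipped
    if not dfs:
        return None

    # Union all the DataFrames into a single DataFrame, matching columns by name
    combined_df = dfs[0]
    for df in dfs[1:]:
        combined_df = combined_df.unionByName(df, allowMissingColumns=True)

    # Return the combined DataFrame
    return combined_df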
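
To make the time-series branch of the function above concrete, the sketch below flattens a daily payload into row dictionaries without calling the API or Spark. The helper name `time_series_to_rows` is hypothetical and the sample payload contains made-up prices; only the field names ("1. open" through "4. close") are taken from the function above.

```python
from typing import Any, Dict, List


def time_series_to_rows(symbol: str, payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten one time-series payload into a list of row dictionaries."""
    # Locate the series by name rather than assuming a fixed key
    series_key = next((k for k in payload if "Time Series" in k), None)
    if series_key is None:
        return []  # e.g. an error or notice payload with no price data
    rows = []
    for timestamp, values in payload[series_key].items():
        rows.append({
            "Symbol": symbol,
            "Date": timestamp,
            "Open": float(values["1. open"]),
            "High": float(values["2. high"]),
            "Low": float(values["3. low"]),
            "Close": float(values["4. close"]),
        })
    return rows


# Made-up payload in the shape the function above expects, for illustration only
sample_payload = {
    "Meta Data": {"2. Symbol": "META"},
    "Time Series (Daily)": {
        "2023-06-30": {"1. open": "10.0", "2. high": "12.0", "3. low": "9.5", "4. close": "11.0"},
        "2023-06-29": {"1. open": "9.0", "2. high": "10.5", "3. low": "8.5", "4. close": "10.0"},
    },
}
rows = time_series_to_rows("META", sample_payload)
print(rows[0])
```

The resulting list has the same shape as the `rows` passed to `spark.createDataFrame` in the function above, which makes the parsing step easy to check separately from the network call.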

### 2.2.2. Market News and Sentiment:

# 3. Exploratory Data Analysis: