# **Assessment Problems**

## Data from yfinance

Using the yfinance Python package, write a function called get_data() that downloads all hourly data for the previous five days for the five FAANG stocks: Facebook (META), Apple (AAPL), Amazon (AMZN), Netflix (NFLX) and Google (GOOG). The function should save the data into a folder called data in the root of your repository using a filename with the format YYYYMMDD-HHmmss.csv, where YYYYMMDD is the four-digit year (e.g. 2025), followed by the two-digit month (e.g. 09 for September), followed by the two-digit day, and HHmmss is the hour, minutes and seconds.
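The filename pattern maps directly onto `strftime` codes. The sketch below uses a fixed, made-up timestamp so the output is predictable. Note that minutes are `%M` and lowercase `%m` is the month. That pair is easy to mix up given the task's `mm` notation.

```python
import datetime as dt

# A fixed, made-up timestamp: 6 September 2025 at 14:05:03
stamp = dt.datetime(2025, 9, 6, 14, 5, 3)

# %Y year, %m month, %d day, %H hour (24h), %M minute, %S second
filename = stamp.strftime("%Y%m%d-%H%M%S") + ".csv"
print(filename)  # 20250906-140503.csv
```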

Imports

In [1]:
import yfinance as yf              # Yahoo Finance data
import pandas as pd                # DataFrames and CSV handling
import os                          # Folders and file paths
import datetime as dt              # Current time for filenames
import matplotlib.pyplot as plt    # Plotting

Get data function

In [2]:
# Fetches data from yfinance and saves it to a CSV

# List tickers 
tickers = ["META", "AAPL", "AMZN", "NFLX", "GOOG"]

# Defining function
def get_data(): 

    # Dictionary to store stock data
    stocks_data = {}
        
    # Looping through tickers
    for ticker in tickers:

        # Fetching hourly data (interval="1h") for the previous five days (period="5d")
        df = yf.download(ticker, period="5d", interval="1h", auto_adjust=False)

        # print(df)  # checking how the data is printed (debugging)

        # Converting the datetime index to a column, for easier visualization and analysis
        df.reset_index(inplace=True)

        # print(df)  # checking the data after moving datetime to a column (debugging)

        # Storing each stock's DataFrame in a dictionary using the ticker name
        stocks_data[ticker] = df

    # Fetching current time
    now = dt.datetime.now()

    # Building the filename from the current time in YYYYMMDD-HHmmss.csv format
    filename = now.strftime("%Y%m%d-%H%M%S") + ".csv"

    # Creating the data folder if it doesn't exist
    os.makedirs("data", exist_ok=True)

    # Concatenating all stocks' data side by side in one table
    # stocks_data.values() returns every DataFrame saved in the dictionary
    # axis=1 joins the DataFrames column-wise (left to right)
    all_data = pd.concat(stocks_data.values(), axis=1)

    # Creating the CSV file in the data folder with the required name
    all_data.to_csv(os.path.join("data", filename), index=False)
    
# Run the function
get_data()


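`reset_index()` runs before `pd.concat(..., axis=1)`, so the five tables are joined by row position rather than by timestamp. The five stocks trade on the same US exchange hours, so their rows will usually line up. If one ticker were missing a bar, though, every later row for that ticker would shift.

The sketch below contrasts the two approaches. It uses two hypothetical tickers (`AAA`, `BBB`) with made-up prices. Keeping the `DatetimeIndex` and passing `keys` is one possible alternative, not the code used above.

```python
import pandas as pd

# Hypothetical hourly bars; "BBB" is missing the first (09:30) bar
idx = pd.date_range("2025-09-01 09:30", periods=3, freq="60min", name="Datetime")
aaa = pd.DataFrame({"Close": [1.0, 2.0, 3.0]}, index=idx)
bbb = pd.DataFrame({"Close": [20.0, 30.0]}, index=idx[1:])

# Positional join (as in get_data): reset_index() first, so rows are matched by row number
positional = pd.concat([aaa.reset_index(), bbb.reset_index()], axis=1, keys=["AAA", "BBB"])

# Timestamp join: keep the DatetimeIndex, so rows are matched by time (outer join)
aligned = pd.concat([aaa, bbb], axis=1, keys=["AAA", "BBB"])

print(positional)  # row 0 pairs AAA's 09:30 bar with BBB's 10:30 bar
print(aligned)     # BBB's 09:30 Close is NaN; later rows line up
```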

##### References:
YFinance documentation: https://github.com/ranaroussi/yfinance  
Fetching multiple tickers: https://www.geeksforgeeks.org/python/getting-stock-symbols-with-yfinance-in-python/  
Downloading periodic data:  https://medium.com/@kasperjuunge/yfinance-10-ways-to-get-stock-data-with-python-6677f49e8282  
Custom intervals yfinance: https://www.geeksforgeeks.org/python/getting-stock-symbols-with-yfinance-in-python/  
Creating file with datetime format: https://www.geeksforgeeks.org/python/how-to-create-filename-containing-date-or-time-in-python/   
Adding an explicit auto_adjust=False argument to silence the warning "FutureWarning: YF.download() has changed argument auto_adjust default to True": https://github.com/ranaroussi/yfinance/issues/2308  
Creating a folder if it doesn't exist: https://www.geeksforgeeks.org/python/how-to-create-directory-if-it-does-not-exist-using-python/  
Inspiration codes:  
First: https://huggingface.co/Adilbai/stock-trading-rl-agent/blob/27f177526bebcc8cc49daa4cd66566d360feae0d/dataprocessor.py  
Second:  https://medium.com/@shouke.wei/mastering-stock-data-analysis-with-yfinance-in-python-63e91a6c41c2  
The first code stored each ticker's data diagonally rather than side by side. For example, if the first block ended at column L, row 50, the next block started at M51. I used and adapted parts of both examples to write my final code.
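That diagonal layout is what `pd.concat` produces with its default `axis=0` when each table has different columns. Rows are appended, and every cell outside a table's own block is NaN. The sketch below uses two hypothetical single-column tables to show the difference from `axis=1`.

```python
import pandas as pd

# Two hypothetical tables with yfinance-style (Price, Ticker) column labels
aaa = pd.DataFrame({("Close", "AAA"): [1.0, 2.0]})
bbb = pd.DataFrame({("Close", "BBB"): [10.0, 20.0]})

# axis=0 appends rows: each table keeps its own columns, so the blocks sit diagonally
stacked = pd.concat([aaa, bbb], axis=0)

# axis=1 appends columns: one row per time step, tickers side by side
side_by_side = pd.concat([aaa, bbb], axis=1)

print(stacked.shape, side_by_side.shape)  # (4, 2) (2, 2)
```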

## Plotting Data

Write a function called plot_data() that opens the latest data file in the data folder and, on one plot, plots the Close prices for each of the five stocks. The plot should include axis labels, a legend, and the date as a title. The function should save the plot into a plots folder in the root of your repository using a filename in the format YYYYMMDD-HHmmss.png. Create the plots folder if you don't already have one.

Plot data function

In [3]:
# Plots data from latest CSV file

# Defining function
def plot_data():

    # Storing the data folder path in a variable so it is typed once
    # and only needs updating in one place if the folder changes
    path = "data"

    # Creating the plots folder if it doesn't exist
    os.makedirs("plots", exist_ok=True)

    # Finding all CSV files in the data folder
    csv_files = [x for x in os.listdir(path) if x.endswith(".csv")]

    # Sorting CSV files by filename, newest first (YYYYMMDD-HHmmss names sort chronologically), inspiration from lecture
    csv_files.sort(reverse=True)

    # Latest file, inspiration from lecture
    recent_csv = csv_files[0]

    # Joining folder and file name to create the full file path
    latest_path = os.path.join(path, recent_csv)

    # Reading the CSV into a DataFrame; the header has two rows (price field, ticker),
    # giving multi-level column labels such as ('Close', 'META')
    df = pd.read_csv(latest_path, header=[0, 1])

    # Converting the first column (dates) to datetime
    datetime_column = df.columns[0]
    df[datetime_column] = pd.to_datetime(df[datetime_column])

    # Making the plot wider for better visualization
    plt.figure(figsize=(14, 6))

    # Plotting Close prices, looping through each ticker
    for ticker in tickers:
        plt.plot(df[datetime_column], df[('Close', ticker)], label=ticker)

    # Defining labels, legend and title; the title is the date taken from the CSV filename
    file_date = dt.datetime.strptime(recent_csv[:15], "%Y%m%d-%H%M%S")
    plt.xlabel("Datetime")
    plt.ylabel("Close Price (USD)")
    plt.title(file_date.strftime("%Y-%m-%d"))
    plt.legend()

    # Saving plot to 'plots' folder, ensuring CSV and PNG files have the same timestamp
    plot_filename = recent_csv.replace('.csv', '.png')
    save_path = os.path.join("plots", plot_filename)
    plt.savefig(save_path, dpi=300)
    plt.close()
    
# Calling function
plot_data()
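This sketch illustrates why `plot_data()` reads the CSV with `header=[0, 1]`. It builds a small, made-up table with the same kind of two-level column labels. It then writes the table to an in-memory buffer with `to_csv(index=False)` and reads it back. The first header row holds the price field (`Datetime`, `Close`) and the second holds the ticker, so `('Close', 'AAA')` selects one stock's closing prices. The `AAA`/`BBB` tickers and values are placeholders.

```python
import io

import pandas as pd

# Made-up data with two-level column labels, like one of the tables get_data() combines
idx = pd.date_range("2025-09-01 09:30", periods=2, freq="60min", name="Datetime")
cols = pd.MultiIndex.from_tuples([("Close", "AAA"), ("Close", "BBB")])
original = pd.DataFrame([[1.0, 10.0], [2.0, 20.0]], index=idx, columns=cols)

# Write it the way get_data() does: datetime as a column, no index, two header rows
buf = io.StringIO()
original.reset_index().to_csv(buf, index=False)
buf.seek(0)

# Read it back the way plot_data() does
df = pd.read_csv(buf, header=[0, 1])
datetime_column = df.columns[0]          # a tuple whose first level is "Datetime"
df[datetime_column] = pd.to_datetime(df[datetime_column])

print(buf.getvalue())
print(df[("Close", "AAA")].tolist())     # [1.0, 2.0]
```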

##### References:
Finding latest CSV file: https://stackoverflow.com/questions/58881381/using-python-to-identify-and-load-last-csv-file-in-directory-by-updated-time#:~:text=Open%20the%20directory%20and%20filter,file%20in%20the%20target%20directory  
Path.join method: https://www.geeksforgeeks.org/python/python-os-path-join-method/  
Plot documentation: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.plot.html  
Convert date to datetime: https://www.geeksforgeeks.org/pandas/convert-the-column-type-from-string-to-datetime-format-in-pandas-dataframe/  
Multi-level indexing: https://towardsdatascience.com/working-with-multi-index-pandas-dataframes-f64d2e2c3e02/#:~:text=A%20multi%2Dindex%20
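Sorting by filename finds the newest file only because the `YYYYMMDD-HHmmss` names are fixed-width, zero-padded and ordered from year down to second. For these names, alphabetical order equals chronological order.

The sketch below turns that selection into a standalone helper. `latest_csv` and `parse_stamp` are hypothetical names, not part of the notebook. The helper also raises a clear error when the folder holds no CSV files, instead of the `IndexError` that `csv_files[0]` would give.

```python
import os
from datetime import datetime


def latest_csv(filenames):
    """Return the newest YYYYMMDD-HHmmss.csv name from a list of file names (hypothetical helper)."""
    csv_files = [name for name in filenames if name.endswith(".csv")]
    if not csv_files:
        raise FileNotFoundError("No CSV files found in the data folder; run get_data() first.")
    # Fixed-width, zero-padded timestamps: the largest string is the latest time
    return max(csv_files)


def parse_stamp(filename):
    """Turn '20250906-090512.csv' back into a datetime."""
    return datetime.strptime(os.path.splitext(filename)[0], "%Y%m%d-%H%M%S")


# Made-up folder listing
names = ["20250830-090102.csv", "20250906-090512.csv", "20250906-083000.csv", "notes.txt"]
print(latest_csv(names))  # 20250906-090512.csv

# Alphabetical and chronological order agree for this naming format
csvs = [n for n in names if n.endswith(".csv")]
print(sorted(csvs) == sorted(csvs, key=parse_stamp))  # True
```

In `plot_data()` this would be called as `latest_csv(os.listdir(path))`.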

## Script

Create a Python script called faang.py in the root of your repository. Copy the above functions into it and set it up so that whenever someone at the terminal types ./faang.py, the script runs, downloading the data and creating the plot. Note that this will require a shebang line and the script to be marked executable. Explain the steps you took in your notebook.

Steps:

1. Created a Python script called faang.py in the root directory of my repository.

2. Added the shebang `#!/usr/bin/env python3` at the top of the script. This line tells the system to run the script with Python 3 (located through `env` on the PATH) when it is executed directly from the terminal.

3. Imported the libraries the code needs: yfinance, pandas, os, datetime and matplotlib.

4. Added get_data(), which downloads the data from yfinance and saves it to a CSV file.

5. Added plot_data(), which reads the latest CSV file and generates a plot.

6. Added a main() function that calls both functions in order.

7. Added the conditional block below, which ensures main() runs only when the file is executed as a script, not when it is imported.

    ```python
    if __name__ == "__main__":
        main()
    ```

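8. Changed the file permissions with `chmod 777 faang.py` so the script can be executed. 777 gives all users full permissions (read, write and execute); `chmod +x faang.py` would be enough to mark the script as executable.

9. Opened a new terminal.

10. Ran the script using `python faang.py`. With the shebang and execute permission in place, `./faang.py` runs it directly as well.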
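Steps 2, 7 and 8 can also be checked from Python without a terminal. The sketch below does three things:

- It writes a hypothetical, cut-down stand-in for faang.py whose main() only prints, so it runs offline.
- It adds the execute bits the way `chmod +x` would.
- It uses `runpy.run_path` to execute the file twice: once with `run_name="__main__"` (as happens for `./faang.py`) and once under another name (as happens on import).

It assumes a Unix-like system, where the execute bit is meaningful.

```python
import contextlib
import io
import os
import runpy
import stat
import tempfile

# Hypothetical stand-in for faang.py: the real get_data()/plot_data() calls are
# replaced by a print so the sketch runs without network access.
SCRIPT = '''#!/usr/bin/env python3
def main():
    print("get_data() and plot_data() would run here")

if __name__ == "__main__":
    main()
'''


def run_and_capture(path, run_name):
    """Execute the file with the given __name__ and return what it printed."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        runpy.run_path(path, run_name=run_name)
    return buf.getvalue()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "faang.py")
    with open(path, "w") as f:
        f.write(SCRIPT)

    # Like `chmod +x faang.py`: add execute bits for user, group and others
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    executable = os.access(path, os.X_OK)

    as_script = run_and_capture(path, "__main__")  # __name__ == "__main__": main() runs
    as_import = run_and_capture(path, "faang")     # imported: the guard skips main()

print(executable, repr(as_script), repr(as_import))
```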

## Automation

Create a GitHub Actions workflow to run your script every Saturday morning. The script should be called faang.yml in a .github/workflows/ folder in the root of your repository. In your notebook, explain each of the individual lines in your workflow.

name: Run script -> Name of the workflow, as it appears in the GitHub Actions UI.

on: -> When the workflow should run  
workflow_dispatch: -> Allows the workflow to be run manually from the Actions tab  
schedule:  -> Schedules workflow runs with cron  
cron: "0 9 * * 6"  -> Cron expression: the workflow runs every Saturday at 09:00 UTC (minute 0, hour 9, day of week 6 = Saturday)  

permissions:  -> Defines the permissions granted to the workflow's GITHUB_TOKEN  
contents: write  -> Allows the workflow to commit and push changes to the repository  

jobs: -> Part of the GitHub Actions structure; contains all the jobs, and their steps, in the workflow  
run-download:  -> Name (ID) of the job  
runs-on: ubuntu-latest  -> Environment the job runs in, the latest Ubuntu Linux runner hosted by GitHub (Linux chosen as it is a free platform)  

steps:  -> Actions executed by the job, the individual tasks of the job  
name: Checkout repository  -> Name of one of the steps in this workflow  
uses: actions/checkout@v4  -> Checks out (clones) the repository onto the runner  
with:  -> Configuration options for the action  
token: ${{ secrets.GITHUB_TOKEN }}  -> The authentication token GitHub automatically creates for each workflow run  
fetch-depth: 0  -> Fetches the full git history (the default, 1, fetches only the latest commit)  

name: Set up Python  -> Name of one of the steps in this workflow  
uses: actions/setup-python@v5  -> Installs Python on the runner  
with:  -> Configuration options for the action  
python-version: "3.x"  -> Python version to install: the latest Python 3 release  

name: Install dependencies -> Name of one of the steps in this workflow  
run: |  -> Tells GitHub to run a multi-line command  
pip install -r requirements.txt || true  -> Installs the packages listed in requirements.txt; `|| true` makes the step succeed even if pip fails, for example when requirements.txt does not exist  

name: Run faang.py -> Name of one of the steps in this workflow  
run: |  -> Tells GitHub to run a multi-line command  
python faang.py  -> Runs the script, which downloads the data and creates the plot  

name: Commit changes  -> Name of one of the steps in this workflow  
run: |  -> Tells GitHub to run a multi-line command  
git config --global user.name "github-actions"  -> Name of the commit author, in this case GitHub Actions (Git requires a name to commit)  
git config --global user.email "github-actions@github.com"  -> Email address of the commit author (Git requires an email to commit)  
git add -A  -> Stages new, deleted and modified files in the repository  
git diff --quiet --cached || git commit -m "Automated update from stocks.py"  -> `git diff --quiet --cached` exits successfully when nothing is staged, so the commit only runs if there are staged changes  

name: Push changes  -> Name of one of the steps in this workflow  
run: |  -> Tells GitHub to run a multi-line command  
git push origin HEAD  -> Pushes the new commit to the branch that was checked out (the default branch for scheduled runs)  
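GitHub Actions evaluates `schedule` cron expressions in UTC, so `0 9 * * 6` means 09:00 UTC on Saturdays. GitHub may also start scheduled runs a little late during periods of high load.

The sketch below is a hypothetical helper, not part of the workflow. It computes the next matching run time to show how the five fields are read. Python's `weekday()` numbers Monday as 0, whereas cron numbers Sunday as 0, so cron's 6 corresponds to Python's 5.

```python
from datetime import datetime, timedelta, timezone


def next_saturday_9am_utc(now: datetime) -> datetime:
    """Next time strictly after `now` matching cron "0 9 * * 6" in UTC.

    `now` should be timezone-aware. Fields: minute 0, hour 9, any day of
    month, any month, day of week 6 (Saturday in cron, where Sunday = 0).
    """
    now = now.astimezone(timezone.utc)
    # Python: Monday = 0 ... Saturday = 5, Sunday = 6
    days_ahead = (5 - now.weekday()) % 7
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=9, minute=0, second=0, microsecond=0
    )
    if candidate <= now:  # already past 09:00 this Saturday
        candidate += timedelta(days=7)
    return candidate


# Wednesday 3 September 2025, 12:00 UTC
print(next_saturday_9am_utc(datetime(2025, 9, 3, 12, 0, tzinfo=timezone.utc)))
# 2025-09-06 09:00:00+00:00
```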

##### References:
Name, on, jobs, steps: https://dev.to/harshm03/github-actions-full-guide-5cm6  
Permissions: https://www.graphite.com/guides/github-actions-permissions  
Checkout repository step: https://github.com/actions/checkout  
run: |: https://docs.github.com/en/actions/reference/workflows-and-actions/workflow-syntax  
Git config: https://gitscripts.com/git-config-email-and-name

## End