# Problem 1: Data from yfinance

https://github.com/ranaroussi/yfinance

In [5]:
import yfinance as yf  # Yahoo Finance data
import pandas as pd   # 📚 Reference: https://pandas.pydata.org/
import os             # 📚 Reference: https://docs.python.org/3/library/os.html
from datetime import datetime  # 📚 Reference: https://docs.python.org/3/library/datetime.html

def get_data():
    # 📚 Reference: https://docs.python.org/3/tutorial/controlflow.html#defining-functions

    # The list of FAANG stock symbols
    faang = ['META', 'AAPL', 'AMZN', 'NFLX', 'GOOG']      

    # Downloading the data from Yahoo Finance
    # Using yf.download() to get data for multiple tickers
    # 'period="5d"' means last 5 days
    # 'interval="1h"' gives hourly price data
    data = yf.download(tickers=faang, period='5d', interval='1h', group_by='ticker')
    # 📚 Reference: https://aroussi.com/post/python-yahoo-finance,
    # https://medium.com/@kasperjuunge/yfinance-10-ways-to-get-stock-data-with-python-6677f49e8282,
    # https://www.youtube.com/watch?v=j0sBKAB75oc  

    # Checking if data was downloaded successfully
    print("Downloaded data sample:\n")
    print(data.head())  # Printing first few rows for confirmation
    # 📚 Reference: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.head.html

    # Checking and printing the original timezone
    print("Original timezone information:")
    print(data.index.tz)
    # 📚 Reference: https://pandas.pydata.org/docs/reference/api/pandas.Index.tz.html
    # This checks if the datetime index contains timezone information.

    # Localizing and converting to Irish time 
    if data.index.tz is None:
        # Localizing to New York time (exchange timezone for FAANG)
        data = data.tz_localize('America/New_York')
        # 📚 Reference: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.tz_localize.html
        # https://pandas.pydata.org/docs/user_guide/timeseries.html#localizing-time-zones
        
    data = data.tz_convert('Europe/Dublin')
    # 📚 Reference: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.tz_convert.html
    # https://pandas.pydata.org/docs/user_guide/timeseries.html#time-zone-handling
    # Converts timestamps to Irish local time.

    # Verify the conversion
    print("Converted to timezone:", data.index.tz)
    # 📚 Reference: https://pandas.pydata.org/docs/reference/api/pandas.Index.tz.html

    # Confirming that data folder exists
    folder_name = 'data'
    if not os.path.isdir(folder_name):
        print(f"Folder '{folder_name}' not found! Please create it manually.")
        return  # Exits early if folder missing
    else:
        print(f"Folder '{folder_name}' found. Proceeding to save the data.") 
    # 📚 Reference: https://www.w3schools.com/python/python_conditions.asp,
    # https://docs.python.org/3/library/os.path.html#os.path.isdir

    # Creating a timestamp for the filename in the format YYYYMMDD_HHMMSS
    timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
    # 📚 Reference: https://www.geeksforgeeks.org/python/python-strftime-function/
    # https://stackoverflow.com/questions/32490629/getting-todays-date-in-yyyy-mm-dd-in-python
    # https://www.geeksforgeeks.org/python/convert-datetime-string-to-yyyy-mm-dd-hhmmss-format-in-python/

    # Creating the filename using the timestamp
    filename = str(timestamp) + ".csv"
    # 📚 Reference: https://docs.python.org/3/library/functions.html#func-str
    # https://www.geeksforgeeks.org/python/how-to-create-filename-containing-date-or-time-in-python/
    
    # Saving the DataFrame to a CSV file in the specified folder
    filepath = os.path.join(folder_name, filename)
    # 📚 Reference: https://docs.python.org/3/library/os.path.html#os

    # Saving the data to CSV
    data.to_csv(filepath)
    # 📚 Reference: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html

get_data()


  data = yf.download(tickers=faang, period='5d', interval='1h', group_by='ticker')
[*********************100%***********************]  5 of 5 completed

Downloaded data sample:

Ticker                          NFLX                                   \
Price                           Open       High        Low      Close   
Datetime                                                                
2025-12-09 14:30:00+00:00  96.790001  97.110001  96.230003  96.644997   
2025-12-09 15:30:00+00:00  96.644997  96.714996  95.879997  96.065002   
2025-12-09 16:30:00+00:00  96.065002  96.629997  95.760002  95.925003   
2025-12-09 17:30:00+00:00  95.925003  96.070000  95.449997  95.783798   
2025-12-09 18:30:00+00:00  95.783798  97.190002  95.750000  96.589897   

Ticker                                    AAPL                          \
Price                       Volume        Open        High         Low   
Datetime                                                                 
2025-12-09 14:30:00+00:00  7862597  277.890015  280.029999  277.350006   
2025-12-09 15:30:00+00:00  4915351  277.700012  278.000000  277.029999   
2025-12-09 16:30:00+




## 1st part - importing

`import yfinance as yf`<br>
Loads the yfinance package, which connects to Yahoo Finance and allows me to download stock market data directly into Python.

At first, I got the error `ModuleNotFoundError: No module named 'yfinance'`. Although Python was installed, the yfinance package was not, and VS Code sometimes uses a different Python interpreter from the one the terminal uses. I had to run `python -m pip install yfinance` in the terminal, making sure it installed into the same environment VS Code was using. These issues occurred after transferring my work from GitHub Codespaces to VS Code due to Codespaces downtime and problems with syncing VS Code and GitHub.
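
A quick way to check for this kind of mismatch is to ask Python which interpreter is running and whether it can locate the package. The sketch below uses only the standard library; the helper name `is_installed` is made up for illustration and is not part of the notebook.

```python
import importlib.util
import sys


def is_installed(package_name: str) -> bool:
    """Return True if the running interpreter can locate the package."""
    return importlib.util.find_spec(package_name) is not None


# The interpreter that is actually executing this code.
print("Interpreter:", sys.executable)

# 'os' ships with Python, so it should always be found.
print("os found:", is_installed("os"))

# If this prints False, install with: python -m pip install yfinance
print("yfinance found:", is_installed("yfinance"))
```

If the interpreter path printed here differs from the one selected in VS Code, the package was probably installed into a different environment.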

`import pandas as pd`<br>
Imports pandas, which is used to manage and save tabular data.

`import os`<br>
Provides tools for interacting with the operating system, like checking if folders exist or creating new ones.

`from datetime import datetime`<br>
Provides tools for working with dates and times; here it creates the timestamp used in filenames.

**📚 References:**<br>
- https://packaging.python.org/en/latest/tutorials/installing-packages/
- https://code.visualstudio.com/docs/python/environments
- https://code.visualstudio.com/docs/python/environments#_select-and-activate-an-environment
- https://pip.pypa.io/en/stable/cli/pip_install/
- https://pandas.pydata.org/
- https://docs.python.org/3/library/os.html
- https://docs.python.org/3/library/datetime.html
- https://stackoverflow.com/questions/15707532/import-datetime-v-s-from-datetime-import-datetime
- https://www.geeksforgeeks.org/python/python-datetime-module/


## 2nd part - downloading and defining data

`def get_data():`<br>
Defines a new function named `get_data`. The indented lines beneath it form the function body and run only when the function is called.

`faang = ['META', 'AAPL', 'AMZN', 'NFLX', 'GOOG']`<br>
This line creates a list of ticker symbols for the five FAANG companies, which tells `yfinance` which stocks to download. At first, I repeatedly hit the error `NameError: name 'faang' is not defined`, until I realised the issue was indentation.
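
One way an indentation slip can produce this error is sketched below. This is a hypothetical reconstruction, not the exact code that failed: a name created inside a function body exists only while that function runs, so a line that is accidentally de-indented to the top level cannot see it. The function name `get_symbols` is invented for the example.

```python
def get_symbols():
    # 'faang' is local to this function because the line is indented.
    faang = ['META', 'AAPL', 'AMZN', 'NFLX', 'GOOG']
    return faang


# This line is at the top level (not indented), so 'faang' is unknown here.
try:
    print(faang)
    error_message = None
except NameError as err:
    error_message = str(err)
    print("NameError:", error_message)

# Calling the function and using its return value works.
symbols = get_symbols()
print(symbols)
```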

`data = yf.download(tickers=faang, period='5d', interval='1h', group_by='ticker')`<br>
The `yf.download()` function retrieves the data.<br>
`tickers=faang` means I am downloading multiple stocks at once.<br>
`period='5d'` requests the last five days.<br>
`interval='1h'` retrieves hourly price data.<br>
`group_by='ticker'` organises the columns by company, so the ticker is the top header row and the price field (Open, High, Low, Close, Volume) is the second.<br>

I did not use `msft = yf.Ticker("MSFT")` once I realised I needed a solution that covers multiple tickers.

**📚 References:**<br>
- https://docs.python.org/3/tutorial/controlflow.html#defining-functions
- https://aroussi.com/post/python-yahoo-finance
- https://medium.com/@kasperjuunge/yfinance-10-ways-to-get-stock-data-with-python-6677f49e8282
- https://www.youtube.com/watch?v=j0sBKAB75oc


## 3rd part - timezone conversion

I realised that FAANG companies trade on NASDAQ and NYSE, which use the U.S. Eastern Time Zone.<br>
To make the dataset consistent with my local time in Ireland I made the following changes:
- Checked the timezone of the data with `print(data.index.tz)`.
- Localised timestamps to America/New_York (the stock exchange timezone) if none was found, with `data = data.tz_localize('America/New_York')`. In the output above the timestamps already carry a `+00:00` offset, so this step was skipped for that run.
- Converted the timezone-aware timestamps to Irish time (Europe/Dublin) with `data = data.tz_convert('Europe/Dublin')`.
- Verified the conversion with `print(data.index.tz)`.
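
The localise-then-convert steps can be tried on a tiny hand-made table without downloading anything. This is a minimal sketch that assumes pandas is installed; the prices and timestamps are made up for illustration.

```python
import pandas as pd

# Two hourly timestamps with no timezone attached (naive).
index = pd.date_range("2025-12-09 09:30", periods=2, freq="60min")
prices = pd.DataFrame({"Close": [96.64, 96.07]}, index=index)

# Attach the exchange timezone only if the index is naive.
if prices.index.tz is None:
    prices = prices.tz_localize("America/New_York")

# Convert the same instants to Irish local time.
prices_dublin = prices.tz_convert("Europe/Dublin")

print(prices_dublin.index.tz)
print(prices_dublin.index[0])
```

`tz_localize` only labels naive timestamps with a zone, while `tz_convert` shifts already-labelled timestamps to another zone. In December, 09:30 in New York corresponds to 14:30 in Dublin.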

**📚 References:**<br>
- https://www.ig.com/sg/trading-strategies/nasdaq-opening-and-closing-times--when-can-you-trade--230527
- https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.tz_localize.html
- https://pandas.pydata.org/docs/user_guide/timeseries.html#localizing-time-zones
- https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.tz_convert.html
- https://pandas.pydata.org/docs/reference/api/pandas.DatetimeIndex.tz.html
- https://stackoverflow.com/questions/16628819/convert-pandas-timezone-aware-datetimeindex-to-naive-timestamp-but-in-certain-t
- https://www.geeksforgeeks.org/pandas/pandas-series-dt-tz_localize/
- https://pandas.pydata.org/docs/user_guide/timeseries.html#time-zone-handling


## 4th part - checking the data

`print("Downloaded data sample:\n")` prints a heading, and `\n` adds a blank line for readability.<br>
`print(data.head())` displays the first five rows of the DataFrame to confirm that the download worked.

`folder_name = 'data'`<br>
    `if not os.path.isdir(folder_name):`<br>
        `print(f"Folder '{folder_name}' not found! Please create it manually.")`<br>
        `return`<br>
    `else:`<br>
        `print(f"Folder '{folder_name}' found. Proceeding to save the data.")`<br>
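This part checks whether a folder named `data` exists. I created the folder manually beforehand, so the check confirms that the program can find it; if it cannot, the function prints a message and returns without saving anything.<br>
`os.path.isdir(folder_name)` returns `True` when the folder is found, in which case the message `Folder 'data' found. Proceeding to save the data.` is printed.<br>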
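
The behaviour of `os.path.isdir` can be seen in isolation with a temporary directory. The sketch below also shows `os.makedirs(..., exist_ok=True)` as an alternative to creating the folder by hand; that alternative is a suggestion and is not what `get_data()` does.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    folder_name = os.path.join(base, "data")

    # Before the folder exists, isdir() returns False.
    exists_before = os.path.isdir(folder_name)

    # Alternative to creating it by hand: create it if it is missing.
    os.makedirs(folder_name, exist_ok=True)
    exists_after = os.path.isdir(folder_name)

print(exists_before, exists_after)
```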

**📚 References:**<br>
- https://www.geeksforgeeks.org/python/difference-between-newline-and-carriage-return-in-python/
- https://docs.python.org/3/library/functions.html#print
- https://docs.python.org/3/library/os.path.html#os.path.isdir
- https://www.w3schools.com/python/python_conditions.asp
- https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.head.html
- https://www.geeksforgeeks.org/python/python-os-path-isdir-method/


## 5th part - timestamp for the filename

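`timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')` gets the current date and time and formats it as a string such as `20251215_203431`.<br>
Because the timestamp is precise to the second, each run normally produces a distinct filename.<br>
`filename = str(timestamp) + ".csv"` appends the `.csv` extension to build the filename. `strftime()` already returns a string, so the `str()` call is a harmless extra.<br>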
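
To see what the format codes produce, the same call can be run on a fixed moment instead of the current time. This is a small sketch; the date used is arbitrary.

```python
from datetime import datetime

# A fixed moment is used here so the result is repeatable;
# the notebook uses datetime.now() instead.
moment = datetime(2025, 12, 15, 20, 34, 31)

# %Y = 4-digit year, %m = month, %d = day, %H%M%S = 24-hour time.
timestamp = moment.strftime('%Y%m%d_%H%M%S')
filename = timestamp + ".csv"

print(filename)
```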

**📚 References:**<br>
- https://www.geeksforgeeks.org/python/python-strftime-function/
- https://stackoverflow.com/questions/32490629/getting-todays-date-in-yyyy-mm-dd-in-python
- https://www.geeksforgeeks.org/python/convert-datetime-string-to-yyyy-mm-dd-hhmmss-format-in-python/
- https://docs.python.org/3/library/functions.html#func-str
- https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes
- https://docs.python.org/3/library/datetime.html#datetime.datetime.now
- https://docs.python.org/3/library/datetime.html#datetime.date.strftime


## 6th part - saving the data

`filepath = os.path.join(folder_name, filename)` joins folder name and filename into one path.<br>
`data.to_csv(filepath)` saves the entire DataFrame to a CSV file in the data folder. Each time the function runs, a new file is created with a timestamped name.<br>
`get_data()` calls the function defined earlier with `def get_data():`. At this point the program downloads the FAANG stock data, checks the folder and saves the CSV file.

**📚 References:**<br>
- https://docs.python.org/3/tutorial/controlflow.html#defining-functions
- https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html
- https://docs.python.org/3/library/os.path.html#os.path.join
- https://docs.python.org/3/library/os.path.html#os

# Problem 2: Plotting Data

In [6]:
import matplotlib.pyplot as plt 
import numpy as np 

def plot_data():
    # Folder paths
    data_folder = "data"
    plot_folder = "plots" 

    # Listing all files in the data folder
    files = os.listdir(data_folder)  
    print("All files found:", files)
    # https://docs.python.org/3/library/os.html#os.listdir
    # https://docs.python.org/3/library/functions.html#print

    # Filtering CSV files
    csv_files = [f for f in files if ".csv" in f]  
    print("CSV files found:", csv_files)
    # https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions

    csv_paths = [os.path.join(data_folder, f) for f in csv_files] # Full paths to CSV files
    
    # Picking the latest file
    latest_file = max(csv_paths, key=os.path.getmtime)  # Selecting most recently modified file
    
    # Extracting filename from full path
    csv_filename = os.path.basename(latest_file)
    # https://docs.python.org/3/library/os.path.html#os.path.basename

    # Removing .csv extension and replace with .png (saving png with the same name as csv)
    plot_filename = os.path.splitext(csv_filename)[0] + ".png"
    # https://docs.python.org/3/library/os.path.html#os.path.splitext

    print("Latest file picked:", latest_file)
    # https://docs.python.org/3/tutorial/introduction.html#lists
    # https://docs.python.org/3/library/os.path.html#os.path.getmtime
    # https://docs.python.org/3/library/time.html#module-time
    # https://stackoverflow.com/questions/39327032/how-to-get-the-latest-file-in-a-folder

    # Loading the CSV into pandas
    file_path = latest_file
    df = pd.read_csv(file_path, header=[0, 1], index_col=0)
    print(df.head()) 
    # https://www.geeksforgeeks.org/pandas/python-read-csv-using-pandas-read_csv/
    # https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html

    # MultiIndex DataFrame so all tickers are selected
    arrays = [
        # First level = stock tickers
        ["META", "META", "AAPL", "AAPL", "AMZN", "AMZN", "NFLX", "NFLX", "GOOG", "GOOG"],

        # Second level = price data fields
        ["Open", "Close", "Open", "Close", "Open", "Close", "Open", "Close", "Open", "Close"]
    ]

    # Tuples pair ticker and field together
    tuples = list(zip(*arrays))
    index = pd.MultiIndex.from_tuples(tuples, names=["first", "second"])

    # https://www.geeksforgeeks.org/python/pandas-multi-index-and-groupby/
    # https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.MultiIndex.html
    # https://www.datacamp.com/tutorial/pandas-multi-index
    # https://docs.python.org/3/library/functions.html#zip
    # https://www.geeksforgeeks.org/python/zip-in-python/
    # https://pandas.pydata.org/docs/reference/api/pandas.MultiIndex.from_tuples.html
    # https://www.geeksforgeeks.org/python/python-pandas-multiindex-from_tuples/

    # Selecting only the 'Close' prices for all tickers
    close_data = df.loc[:, (slice(None), 'Close')].copy()
    # https://pandas.pydata.org/docs/user_guide/advanced.html#multiindex-advanced-indexing
    # https://www.geeksforgeeks.org/python/python-pandas-dataframe-loc/

    # Flattening the MultiIndex columns to single level
    close_data.columns = close_data.columns.get_level_values(0)
    # https://pandas.pydata.org/docs/reference/api/pandas.Index.get_level_values.html
    # https://pandas.pydata.org/docs/user_guide/advanced.html#multiindex-advanced-indexing
    # https://stackoverflow.com/questions/39080555/pandas-get-level-values-for-multiple-columns
 
    
    # Plotting all FAANG close prices
    plt.figure(figsize=(12, 6))
    for ticker in close_data.columns:
        plt.plot(close_data.index, close_data[ticker], label=ticker)
        # Plotting each ticker's close price over time
        # https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.plot.html
        # https://matplotlib.org/stable/tutorials/introductory/pyplot.html
        # https://blog.quantinsti.com/python-matplotlib-tutorial/

    # Adding titles and labels
    plt.title("FAANG Stock Close Prices", fontsize=14)
    plt.xlabel("Date and Time (Irish Local Time)", fontsize=12)
    plt.ylabel("Stock Closing Price", fontsize=12)
    plt.legend(title="Ticker", loc="upper left")
    plt.xticks(rotation=45) # Rotating x-axis labels for better readability
    plt.grid(True)
    # https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html
    # https://www.geeksforgeeks.org/python/matplotlib-pyplot-legend-in-python/
    # https://stackoverflow.com/questions/19125722/adding-a-matplotlib-legend
    # https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.xticks.html
    

    # Saving the plot
    if not os.path.isdir(plot_folder):
        os.makedirs(plot_folder) # Creating the folder
        print(f"Created folder: {plot_folder}")
        # https://docs.python.org/3/library/os.html#os.makedirs
        # https://www.geeksforgeeks.org/python-os-makedirs-method/
        # https://stackoverflow.com/questions/273192/how-can-i-create-a-directory-in-python
        
    # Adjusting layout
    plt.tight_layout()
    # https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.tight_layout.html
 
    # Saving the plot into plots folder
    plt.savefig(
        os.path.join(plot_folder, plot_filename),
        dpi=300,
        bbox_inches='tight'
    )
    # https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.savefig.html
    # https://docs.python.org/3/library/os.path.html#os.path.join
    
    plt.close()
    print(f"Plot saved successfully in '{plot_folder}' folder as '{plot_filename}'.")
    # https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.savefig.html
    # https://www.geeksforgeeks.org/python/saving-a-plot-as-an-image-in-python/
    # https://docs.python.org/3/library/os.path.html#os.path.join   
    
plot_data()
   
            

All files found: ['20251018_105950.csv', '20251018_110232 .csv', '20251022_221515.csv', '20251022_221612.csv', '20251022_221845.csv', '20251022_223230.csv', '20251026_141153.csv', '20251026_141202.csv', '20251027_191000.csv', '20251027_191225.csv', '20251027_191236.csv', '20251101_140747.csv', '20251101_140758.csv', '20251101_141627.csv', '20251101_142723.csv', '20251101_143026.csv', '20251109_180831.csv', '20251109_183335.csv', '20251113_174926.csv', '20251113_175104.csv', '20251113_180028.csv', '20251113_180929.csv', '20251113_181718.csv', '20251113_181743.csv', '20251115_174229.csv', '20251115_174627.csv', '20251115_182327.csv', '20251115_184004.csv', '20251115_184014.csv', '20251115_184142.csv', '20251123_105440.csv', '20251123_105455.csv', '20251123_145713.csv', '20251123_195532.csv', '20251123_195725.csv', '20251123_195826.csv', '20251123_200439.csv', '20251123_200837.csv', '20251123_201824.csv', '20251123_202001.csv', '20251123_202407.csv', '20251123_202926.csv', '20251123_20293

## 1st part - Importing

`matplotlib.pyplot` is imported to create plots later in the function.

`numpy` is imported to work with arrays, although `plot_data()` does not call it directly.

**📚 References:**<br>
- https://numpy.org/
- https://www.w3schools.com/python/numpy/numpy_intro.asp
- https://www.w3schools.com/python/matplotlib_pyplot.asp
- https://matplotlib.org/3.5.3/api/_as_gen/matplotlib.pyplot.html


## 2nd part - Listing and selecting the latest CSV file

`files = os.listdir(data_folder)`<br>
`print("All files found:", files)`<br>
This lists every file inside the data folder.

`csv_files = [f for f in files if ".csv" in f]`<br>
`print("CSV files found:", csv_files)`<br>
This uses a list comprehension to keep only the filenames that contain `.csv`.

`csv_paths = [os.path.join(data_folder, f) for f in csv_files]`<br>
This converts each CSV filename into a full file path so it can be accessed correctly by the system.

`latest_file = max(csv_paths, key=os.path.getmtime)`<br>
`print("Latest file picked:", latest_file)`<br>
`max(..., key=os.path.getmtime)` returns the path with the most recent modification time, which is the newest dataset from Problem 1.

After identifying the latest CSV file, the filename itself is reused to create a matching PNG filename.<br>
`csv_filename = os.path.basename(latest_file)`<br>
This extracts just the filename (for example 20251215_203431.csv) from the full file path.<br>
`plot_filename = os.path.splitext(csv_filename)[0] + ".png"`<br>
This part removes the csv extension and replaces it with png. This guarantees that the plot and CSV share the exact same timestamp.
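
The selection logic can be tested without real data by creating a few empty files and setting their modification times by hand. In this sketch the file names and times are invented, and `f.endswith(".csv")` is used as a slightly stricter filter than `".csv" in f`; that substitution is a suggestion, not what the notebook does.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as data_folder:
    # Create three files and give each an explicit modification time.
    for name, mtime in [("old.csv", 1_000), ("new.csv", 3_000), ("notes.txt", 5_000)]:
        path = os.path.join(data_folder, name)
        with open(path, "w") as handle:
            handle.write("placeholder")
        os.utime(path, (mtime, mtime))

    # Keep only names that end in .csv.
    csv_files = [f for f in os.listdir(data_folder) if f.endswith(".csv")]
    csv_paths = [os.path.join(data_folder, f) for f in csv_files]

    # The path with the largest modification time is the newest file.
    latest_file = max(csv_paths, key=os.path.getmtime)

    # Reuse the CSV's base name for the plot.
    csv_filename = os.path.basename(latest_file)
    plot_filename = os.path.splitext(csv_filename)[0] + ".png"

print(csv_filename, plot_filename)
```

Note that `notes.txt` has the newest modification time but is ignored because it is filtered out before `max()` runs.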

Because the CSV file has a two-row header showing the ticker and the price field, the following was used to rebuild the MultiIndex columns:<br>

`file_path = latest_file`<br>
`df = pd.read_csv(file_path, header=[0, 1], index_col=0)`<br>
`print(df.head())`<br>
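`header=[0, 1]` tells pandas to read the first two rows as a two-level column header, and `index_col=0` makes the first column (timestamps) the index. Because `parse_dates` is not set, the index values stay as text rather than datetimes.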
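
A round trip through an in-memory CSV shows the effect of `header=[0, 1]`. This is a sketch assuming pandas is installed; the small table is made up, but it uses the same `Ticker` / `Price` / `Datetime` labels that appear in the download output.

```python
import io

import pandas as pd

# Build a tiny frame shaped like the downloaded data:
# top column level = ticker, second level = price field.
columns = pd.MultiIndex.from_product(
    [["AAPL", "META"], ["Close", "Open"]], names=["Ticker", "Price"]
)
index = pd.Index(["2025-12-09 14:30", "2025-12-09 15:30"], name="Datetime")
original = pd.DataFrame(
    [[277.7, 277.9, 650.1, 649.8], [277.0, 277.7, 651.2, 650.1]],
    index=index,
    columns=columns,
)

# Write to an in-memory CSV, then read it back the same way plot_data() does.
buffer = io.StringIO()
original.to_csv(buffer)
buffer.seek(0)
df = pd.read_csv(buffer, header=[0, 1], index_col=0)

print(df.columns.nlevels)
print(df[("AAPL", "Close")].tolist())
```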

**📚 References:**<br>
- https://docs.python.org/3/library/os.html#os.listdir
- https://docs.python.org/3/library/functions.html#print
- https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions
- https://docs.python.org/3/tutorial/introduction.html#lists
- https://docs.python.org/3/library/os.path.html#os.path.getmtime
- https://docs.python.org/3/library/time.html#module-time
- https://stackoverflow.com/questions/39327032/how-to-get-the-latest-file-in-a-folder
- https://www.geeksforgeeks.org/pandas/python-read-csv-using-pandas-read_csv/
- https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html
- https://docs.python.org/3/library/os.path.html#os.path.basename
- https://docs.python.org/3/library/os.path.html#os.path.splitext


## 3rd part - MultiIndex

In this section of the function, I prepared a structure that helps describe the layout of the FAANG stock data. The CSV file saved from the `yfinance` download uses two header rows (ticker + price field), and to work with this type of structure pandas uses a MultiIndex, which means each column has more than one label.

Even though `pd.read_csv()` already reads the CSV into a MultiIndex, this part of the code builds a similar MultiIndex by hand (covering only the Open and Close fields) to show how the structure is put together. The resulting `index` variable is not used later in the function.

`arrays = [`<br>
    `["META", "META", "AAPL", "AAPL", "AMZN", "AMZN", "NFLX", "NFLX", "GOOG", "GOOG"],`<br>
    `["Open", "Close", "Open", "Close", "Open", "Close", "Open", "Close", "Open", "Close"]`<br>
`]`<br>
The first list holds each stock ticker twice (because each ticker has both an "Open" and a "Close" price value).<br>
The second list holds the price field for each corresponding ticker. This structure is needed because a pandas MultiIndex uses pairs of values to describe each column.

`tuples = list(zip(*arrays))`<br>
`zip(*arrays)` takes the two lists and pairs them by position: ("META", "Open"), ("META", "Close") etc.<br>
`list(...)` turns the zip object into a list of tuples.

Then...<br>
`index = pd.MultiIndex.from_tuples(tuples, names=["first", "second"])`<br>
This creates a MultiIndex where:<br>
- `"first"` = ticker (META, AAPL, AMZN, NFLX, GOOG)
- `"second"` = field (Open or Close)
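
The pairing done by `zip(*arrays)` is easier to see on a shortened version of the lists. The sketch below assumes pandas is installed and uses only two tickers to keep the output short.

```python
import pandas as pd

arrays = [
    ["META", "META", "AAPL", "AAPL"],      # first level: tickers
    ["Open", "Close", "Open", "Close"],    # second level: price fields
]

# zip(*arrays) pairs the items by position.
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=["first", "second"])

print(tuples)
print(index.get_level_values("first").unique().tolist())
```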

**📚 References:**<br>
- https://www.geeksforgeeks.org/python/pandas-multi-index-and-groupby/
- https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.MultiIndex.html
- https://www.datacamp.com/tutorial/pandas-multi-index
- https://docs.python.org/3/library/functions.html#zip
- https://www.geeksforgeeks.org/python/zip-in-python/
- https://pandas.pydata.org/docs/reference/api/pandas.MultiIndex.from_tuples.html
- https://www.geeksforgeeks.org/python/python-pandas-multiindex-from_tuples/


## 4th part - Selecting 'Close' prices and flattening MultiIndex

`close_data = df.loc[:, (slice(None), 'Close')].copy()`<br>
`df.loc[...]` is used to select data by label.<br>
The `:` before the comma selects all rows; the part after the comma selects the columns.<br>
`(slice(None), 'Close')` means:<br>
- `slice(None)` → all tickers in the first level of the MultiIndex.
- `'Close'` → only the 'Close' field in the second level.

So `close_data` now contains the Close prices for all tickers, and `.copy()` makes a separate copy of the data so that later changes do not affect `df` or trigger pandas warnings about modifying a slice.

Next is this part:<br>
`close_data.columns = close_data.columns.get_level_values(0)`<br>
`close_data.columns` is currently a MultiIndex with two levels: (Ticker, "Close").<br>
After this, the columns become a simple Index: META, AAPL, AMZN, NFLX, GOOG. This makes the column names easier to use when looping for plotting.
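
The selection and flattening steps can be checked on a small two-ticker table. This is a sketch assuming pandas is installed; the numbers are made up.

```python
import pandas as pd

columns = pd.MultiIndex.from_product([["AAPL", "META"], ["Close", "Open"]])
df = pd.DataFrame(
    [[277.7, 277.9, 650.1, 649.8], [277.0, 277.7, 651.2, 650.1]],
    columns=columns,
)

# All rows, every ticker, but only the 'Close' field.
close_data = df.loc[:, (slice(None), "Close")].copy()

# Keep only the ticker level as the column names.
close_data.columns = close_data.columns.get_level_values(0)

print(list(close_data.columns))
print(close_data["META"].tolist())
```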

**📚 References:**<br>
- https://pandas.pydata.org/docs/user_guide/advanced.html#multiindex-advanced-indexing
- https://www.geeksforgeeks.org/python/python-pandas-dataframe-loc/
- https://pandas.pydata.org/docs/reference/api/pandas.Index.get_level_values.html
- https://stackoverflow.com/questions/39080555/pandas-get-level-values-for-multiple-columns


## 5th part - Plotting all FAANG Close prices

This part creates a new figure (plot window) with a width of 12 inches and height of 6 inches:<br>
`plt.figure(figsize=(12, 6))`

Next:<br>
    `for ticker in close_data.columns:`<br>
        `plt.plot(close_data.index, close_data[ticker], label=ticker)`<br>
This part loops over each ticker in `close_data.columns` and draws one line per ticker.<br>
`close_data.index` holds the timestamps and is plotted on the x-axis.<br>
`close_data[ticker]` contains the Close price values and is plotted on the y-axis.<br>
`label=ticker` makes each line appear with the correct ticker name in the legend.

**📚 References:**<br>
- https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.plot.html
- https://matplotlib.org/stable/tutorials/introductory/pyplot.html
- https://blog.quantinsti.com/python-matplotlib-tutorial/


## 6th part - Chart design

`plt.title("FAANG Stock Close Prices", fontsize=14)`<br>
`plt.xlabel("Date and Time (Irish Local Time)", fontsize=12)`<br>
`plt.ylabel("Stock Closing Price", fontsize=12)`<br>
`plt.legend(title="Ticker", loc="upper left")`<br>
`plt.grid(True)`<br>

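- `plt.title(...)` sets the chart title.
- `plt.xlabel(...)` and `plt.ylabel(...)` label the axes.
- `plt.legend(...)` adds a legend using the `label=` values from `plt.plot`.
- `plt.grid(True)` turns on the grid lines.
- In the code cell, `plt.xticks(rotation=45)` also rotates the x-axis labels so they are easier to read.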

**📚 References:**<br>
- https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html
- https://www.geeksforgeeks.org/python/matplotlib-pyplot-legend-in-python/
- https://stackoverflow.com/questions/19125722/adding-a-matplotlib-legend


## 7th part - Saving the plot into the `plots` folder and closing it
  
`if not os.path.isdir(plot_folder):`<br>
`os.makedirs(plot_folder) # Creating the folder`<br>
`print(f"Created folder: {plot_folder}")`

- `os.path.isdir(plot_folder)` checks if `plots` folder already exists.
- If it doesn't exist, `os.makedirs(plot_folder)` creates it.
- The `print` part confirms that the folder was created.

*Next:*<br>
`  plt.savefig(`<br>
        `os.path.join(plot_folder, plot_filename),`<br>
        `dpi=300,`<br>
        `bbox_inches='tight'`<br>
    `)`<br>
    `plt.close()`<br>
    `print(f"Plot saved successfully in '{plot_folder}' folder as '{plot_filename}'.")`

- `os.path.join(plot_folder, plot_filename)` builds the full file path for the PNG image inside the `plots` folder.
- `plt.savefig(..., dpi=300, bbox_inches='tight')` saves the figure as a PNG file. `dpi=300` gives a high-resolution image. `bbox_inches='tight'` trims extra whitespace and avoids cutting off labels.
- `plt.close()` closes the current figure and frees the memory it used.
- `print()` confirms that the file was saved successfully.
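
The plot, save and close sequence can be exercised on made-up numbers. This sketch assumes matplotlib is installed and selects the non-interactive `Agg` backend so that no window is needed; the prices, the `hours` list and the file name `example.png` are all invented for illustration.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # draw without opening a window
import matplotlib.pyplot as plt

# Made-up closing prices, for illustration only.
close_data = {"AAPL": [277.7, 277.0, 278.1], "META": [650.1, 651.2, 649.5]}
hours = [0, 1, 2]

plt.figure(figsize=(12, 6))
for ticker, prices in close_data.items():
    plt.plot(hours, prices, label=ticker)  # one line per ticker
plt.legend(title="Ticker", loc="upper left")
plt.tight_layout()

with tempfile.TemporaryDirectory() as plot_folder:
    plot_path = os.path.join(plot_folder, "example.png")
    plt.savefig(plot_path, dpi=300, bbox_inches="tight")
    saved = os.path.isfile(plot_path)  # check before the folder is removed

plt.close()  # release the current figure
print(saved)
```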

`plot_data()` calls the function, which runs all of the steps above.

**📚 References:**<br>
- https://docs.python.org/3/library/os.html#os.makedirs
- https://www.geeksforgeeks.org/python-os-makedirs-method/
- https://stackoverflow.com/questions/273192/how-can-i-create-a-directory-in-python
- https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.savefig.html
- https://www.geeksforgeeks.org/python/saving-a-plot-as-an-image-in-python/
- https://docs.python.org/3/library/os.path.html#os.path.join

# Problem 3: Script

➡️ **Please refer to this file:** https://github.com/tihana-gray/computer-infrastructure-assessment/blob/main/faang.py

## 1st part - Importing required libraries

These imports are the same libraries used in Problems 1 and 2.

**📚 References:**<br>
- https://pandas.pydata.org/
- https://docs.python.org/3/library/os.html
- https://docs.python.org/3/library/datetime.html
- https://matplotlib.org/stable/tutorials/introductory/pyplot.html


## 2nd part - Reusing functions from Problems 1 and 2

The `get_data()` function was copied directly from Problem 1 and placed into `faang.py` without changing its logic.<br>
It downloads the FAANG stock data, converts the timestamps to Irish time, checks for the data folder, and saves the data as a timestamped CSV file.

The `plot_data()` function was copied from Problem 2 and placed at the top level of the script (not inside another function).<br>
This function does the following:
- Finds all CSV files in the data folder.
- Selects the most recently modified file.
- Loads the data into pandas.
- Extracts only the Close prices.
- Plots the closing prices for all five FAANG stocks.
- Saves the plot as a PNG file in a plots folder.

**📚 References:**<br>
- https://docs.python.org/3/tutorial/controlflow.html#function-definitions
- https://pandas.pydata.org/docs/user_guide/advanced.html#multiindex-advanced-indexing
- https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.plot.html


## 3rd part - Adding Shebang line

`#!/usr/bin/env python3` - This line is known as a shebang. On Unix-like systems it tells the operating system which interpreter to use when the script is executed directly (for example with `./faang.py`). Here, `/usr/bin/env python3` looks up `python3` on the `PATH`, so the script runs with Python 3.

**📚 References:**<br>
- https://realpython.com/python-shebang/
- https://www.geeksforgeeks.org/python/why-we-write-usr-bin-env-python-on-the-first-line-of-a-python-script/
- https://stackoverflow.com/questions/7670303/purpose-of-usr-bin-python3-shebang
- https://medium.com/cloud-for-everybody/i-kept-seeing-usr-bin-env-python3-and-finally-decided-to-figure-it-out-a3376a21316b


## 4th part - Entry point

Added:<br>
`if __name__ == "__main__":`<br>
    `get_data()`<br>
    `plot_data()`<br>
    `print("All saved successfully.")`<br>

This block ensures that the script only runs the functions when the file is executed directly from the terminal, and not when it is imported into another Python file.
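
The guard works because Python sets `__name__` to `"__main__"` only for the file that is run directly; an imported file gets its own module name instead. The sketch below uses stand-in functions so it runs without any downloads, and the `main()` wrapper is a hypothetical addition that `faang.py` does not necessarily have.

```python
def get_data():
    # Stand-in for the real download step.
    return "data saved"


def plot_data():
    # Stand-in for the real plotting step.
    return "plot saved"


def main():
    # Run the two steps in order and report completion.
    steps = [get_data(), plot_data()]
    print("All saved successfully.")
    return steps


# True only when this file is run directly, e.g. `python faang.py`;
# False when it is imported, where __name__ is the module's own name.
if __name__ == "__main__":
    main()
```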

**📚 References:**<br>
- https://www.geeksforgeeks.org/python/what-does-the-if-__name__-__main__-do/
- https://realpython.com/python-main-function/
- https://docs.python.org/3/library/__main__.html
- https://stackoverflow.com/questions/419163/what-does-if-name-main-do


# Problem 4: Automation

➡️ **Please refer to this file:** https://github.com/tihana-gray/computer-infrastructure-assessment/blob/main/.github/workflows/faang.yml

GitHub Actions was used to automate the execution of the `faang.py` script.<br>
The workflow is scheduled to run automatically every Saturday morning at 09:00. It downloads FAANG stock data, generates a timestamped CSV file, creates a matching PNG plot, and commits the results back to the repository without manual work. GitHub Actions evaluates cron schedules in UTC, so 09:00 UTC matches Irish time in winter and corresponds to 10:00 Irish time during summer time.

This automation is implemented using a GitHub Actions workflow defined in a `YAML` file named `faang.yml`, located in the `.github/workflows/` directory of the repository.


## 1st part - Workflow location and name

The workflow is saved at:<br>
`.github/workflows/faang.yml`

Workflow name:<br>
`name: Run faang.py and push changes`<br>
↓<br>
The `name` field defines how the workflow appears in the Actions tab on GitHub.
This name helps identify what the workflow does when viewing execution history.


**📚 References:**<br>
- https://docs.github.com/en/actions
- https://docs.github.com/en/actions/how-tos/write-workflows
- https://docs.github.com/en/actions/reference/workflows-and-actions/workflow-syntax
- https://docs.github.com/en/actions/get-started/quickstart


## 2nd part - Workflow triggers

This section defines when the workflow runs:<br>
`on:`<br>
  `workflow_dispatch:`<br>
  `schedule:`<br>
    `- cron: "0 9 * * 6"`<br>

`workflow_dispatch` allows the workflow to be triggered manually from the Actions tab on GitHub.<br>
`schedule` uses a cron expression to run the workflow automatically. GitHub Actions evaluates cron schedules in UTC.<br>
This cron expression means:<br>
- Minute: 0
- Hour: 9
- Day of the month: * (any)
- Month: * (any)
- Day of the week: 6 (Saturday)
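
The five cron fields can be checked with a short Python sketch. The `parse_cron` and `matches` helpers below are hypothetical names written for this example, and they only understand `*` and plain integers, which is enough for `0 9 * * 6` but not for the full cron syntax (ranges, lists, steps).

```python
from datetime import datetime, timezone

CRON = "0 9 * * 6"
FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def parse_cron(expr):
    """Split a five-field cron expression into named fields."""
    parts = expr.split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    return dict(zip(FIELDS, parts))


def matches(expr, moment):
    """Return True if `moment` satisfies the expression.

    Simplified: each field must be '*' or a single integer.
    """
    fields = parse_cron(expr)
    actual = {
        "minute": moment.minute,
        "hour": moment.hour,
        "day_of_month": moment.day,
        "month": moment.month,
        # cron counts Sunday as 0 and Saturday as 6;
        # isoweekday() counts Monday as 1 and Sunday as 7.
        "day_of_week": moment.isoweekday() % 7,
    }
    return all(
        value == "*" or int(value) == actual[name]
        for name, value in fields.items()
    )


if __name__ == "__main__":
    saturday_9am = datetime(2025, 1, 4, 9, 0, tzinfo=timezone.utc)
    print(parse_cron(CRON))
    print(matches(CRON, saturday_9am))
```

The `datetime` in the usage example is built in UTC on purpose, since that is the timezone in which the schedule is evaluated.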

**📚 References:**<br>
- https://crontab.guru/#0_11_*_*_6
- https://crontab.guru/examples.html
- https://docs.github.com/en/actions/reference/workflows-and-actions/events-that-trigger-workflows#schedule
- https://docs.github.com/en/actions/reference/workflows-and-actions/events-that-trigger-workflows#workflow_dispatch


## 3rd part - Permissions and Jobs

This part gives the workflow permission to commit and push changes back to the repository:<br>
`permissions:`<br>
  `contents: write`

`jobs:`<br>
  `run-faang:` → defines the job.<br>
    `runs-on: ubuntu-latest` → runs the job on a Linux virtual machine provided by GitHub.


**📚 References:**<br>
- https://docs.github.com/en/actions/using-jobs/assigning-permissions-to-jobs
- https://docs.github.com/en/actions/security-guides/automatic-token-authentication
- https://docs.github.com/en/actions/using-jobs/choosing-the-runner-for-a-job
- https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idruns-on


## 4th part - Environment and Dependencies

`- name: Checkout repository`<br>
  `uses: actions/checkout@v4` → the official GitHub action for checking out (cloning) the repository onto the runner.<br>
  `with:`<br>
    `token: ${{ secrets.GITHUB_TOKEN }}` → authenticates Git operations with the token GitHub creates for each workflow run.<br>
    `fetch-depth: 0` → fetches the full commit history instead of only the latest commit.

`- name: Set up Python` → labels the step.<br>
  `uses: actions/setup-python@v5` → installs Python on the runner.<br>
  `with:`<br>
    `python-version: "3.x"` → selects the latest available Python 3 release.

`- name: Install dependencies` → labels the step.<br>
  `run: |`<br>
    `python -m pip install --upgrade pip` → upgrades pip first to avoid problems caused by an outdated installer.<br>
    `pip install yfinance pandas matplotlib numpy` → installs the third-party packages the script needs.
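
A quick way to confirm that an environment actually has these packages is sketched below. The `missing_packages` helper is a hypothetical addition for illustration and is not part of the workflow or of `faang.py`; it only checks whether each package can be found, without importing it.

```python
import importlib.util

# The same packages the workflow installs with pip.
REQUIRED = ["yfinance", "pandas", "matplotlib", "numpy"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are available.")
```

Running a check like this locally and on the runner is one way to spot a mismatched environment before the script fails with an `ImportError`.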

**📚 References:**<br>
- https://docs.github.com/en/actions/tutorials/authenticate-with-github_token
- https://github.com/actions/setup-python
- https://realpython.com/github-actions-python/
- https://docs.github.com/en/actions/tutorials/build-and-test-code/python
- https://pip.pypa.io/en/stable/cli/pip_install
- https://stackoverflow.com/questions/9956741/how-to-install-multiple-python-packages-at-once-using-pip?utm_source=chatgpt.com
- https://docs.python.org/3/installing/index.html
- https://www.geeksforgeeks.org/python/libraries-in-python/


## 5th part - Run-Commit-Push

This part executes `faang.py`:<br>
`- name: Run faang.py` → labels the step.<br>
  `run: |` → runs the command below.<br>
    `python faang.py`

Committing all changes:<br>
`- name: Commit changes` → labels the step.<br>
  `run: |` → runs the commands underneath.<br>
    `git config --global user.name "github-actions"` → sets the author name used for the commit.<br>
    `git config --global user.email "github-actions@github.com"` → sets the author email used for the commit.<br>
    `git add data plots` → stages new and changed files in the `data` and `plots` folders.<br>
    `git diff --quiet --cached || git commit -m "Automated FAANG data update"` → `git diff --quiet --cached` checks whether anything is staged and exits with a non-zero status if there are staged changes. Because of `||`, `git commit` runs only in that case, so the step does not fail when there is nothing new to commit.
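
The "commit only if something is staged" logic can be expressed in Python as a rough equivalent of the shell one-liner. The `commit_if_staged` function and its `run` parameter are assumptions introduced for this sketch (the workflow itself uses the shell command); `run` defaults to `subprocess.run` and exists only so the logic can be tried without a real repository.

```python
import subprocess


def commit_if_staged(message, run=subprocess.run):
    """Mirror `git diff --quiet --cached || git commit -m <message>`.

    Returns True if a commit was attempted, False if nothing was staged.
    """
    # Exit status 0 means the index matches HEAD (nothing staged).
    diff = run(["git", "diff", "--quiet", "--cached"])
    if diff.returncode == 0:
        return False
    # Any non-zero status falls through to the commit, just like `||`.
    run(["git", "commit", "-m", message], check=True)
    return True


if __name__ == "__main__":
    # Demonstration with a stand-in runner that pretends nothing is staged.
    class FakeResult:
        returncode = 0

    print(commit_if_staged("Automated FAANG data update",
                           run=lambda cmd, **kwargs: FakeResult()))
```

Skipping the commit when nothing changed matters because `git commit` exits with an error when there is nothing to commit, which would otherwise mark the step as failed.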

Pushing changes:<br>
`- name: Push changes` → labels the step.<br>
  `run: |` → runs the command below.<br>
    `git push origin HEAD` → pushes the new commit to the current branch on GitHub.<br>

...and success! 🎉<br>
The 'Run `faang.py`' workflow was triggered and completed as scheduled. ✨

**📚 References:**<br>
- https://docs.github.com/en/actions/reference/workflows-and-actions/workflow-syntax#jobsjob_idstepsrun
- https://stackoverflow.com/questions/69839851/github-actions-copy-git-user-name-and-user-email-from-last-commit
- https://github.com/actions/checkout/issues/13
- https://joht.github.io/johtizen/build/2022/01/20/github-actions-push-into-repository.html
- https://git-scm.com/docs/git-diff
- https://stackoverflow.com/questions/23241052/what-does-git-push-origin-head-mean


## Learning path:

This was my first time creating and running a GitHub Actions workflow, so I was initially unsure whether the automation would execute successfully once committed to the repository. Throughout the semester, I experienced repeated environment and synchronisation issues when moving between GitHub Codespaces (which did not work in the browser at all, so I had to run them from VS Code), VS Code, and local Python environments. These problems often resulted in missing packages or mismatched Python environments, even when the code worked correctly locally.<br>
For this reason, I opted to include a dedicated dependency installation step that was not in the course tutorial, using:<br>
`python -m pip install --upgrade pip`<br>
`pip install yfinance pandas matplotlib numpy`<br>
This was recommended to me by ChatGPT as a safer option given the issues described above. It is also covered in the Real Python article (https://realpython.com/github-actions-python/) and a few other sources, so I decided to give it a go. It worked! 🙌🏻<br>

## End