# DSA4265 Assignment 2: RAG Generation

With the sheer volume of news published by different agencies and the large number of stocks to choose from, it is increasingly difficult for investors to read every news article and report about each company and its stock performance before deciding which stocks to buy to maximise their returns. Given the conflicting opinions on how stocks will perform, investors tend to rely on analyst reports that score established metrics such as Earnings and Sentiment.

Therefore, the goal of this assignment is to build a Retrieval-Augmented Generation (RAG) model that extracts key information about the overall performance of each stock over recent time windows and provides investment advice to prospective investors. This makes it easier for investors to make informed decisions about the stocks included in the RAG model.

## Part 1: Data Extraction and Preparation

This section describes the data extraction process and the generation of the labelled DataFrame. All data was sourced from Refinitiv Workspace.

The tickers used in this assignment are summarised in the table below:

| Stock Name | Ticker Symbol |
| ---------  | ------------- |
| Apple Inc | AAPL |
| Amazon.com Inc | AMZN |
| Boeing Co | BA |
| Berkshire Hathaway Inc Class A | BRKA |
| Alphabet Inc Class A (Google) | GOOGL |
| Goldman Sachs Group Inc | GS |
| Johnson & Johnson | JNJ |
| JPMorgan Chase & Co | JPM |
| Coca-Cola Co | KO |
| McDonald's Corp | MCD |
| Meta Platforms Inc | META |
| Morgan Stanley | MS |
| Microsoft Corp | MSFT |
| NextEra Energy Inc | NEE |
| NVIDIA Corp | NVDA |
| Pfizer Inc | PFE |
| Procter & Gamble Co | PG |
| Tesla Inc | TSLA |
| Visa Inc | V |
| Exxon Mobil Corp | XOM |

### Feature 1: Summarisation of Analytic Reports

As the documents in the dataset are relatively long, an LLM was used to summarise each chunk. These summaries are then embedded and used to answer queries.

### Feature 2: Chunking of Documents

To split the documents into distinct chunks, CharacterTextSplitter was used through its from_tiktoken_encoder constructor, with a chunk size of 1,000 tokens and an overlap of 200 tokens between consecutive chunks. The overlap preserves context across chunk boundaries, so each chunk can be better understood on its own.
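The effect of chunk size and overlap can be seen in a toy example. Roughly speaking, CharacterTextSplitter.from_tiktoken_encoder splits the text on a separator and then merges the pieces into chunks measured in tiktoken tokens, carrying some overlap into the next chunk. The sketch below is a simplified fixed-window version of that idea, not the library's algorithm; the function name sliding_window_chunks is hypothetical and whitespace-separated words stand in for tokens.

```python
def sliding_window_chunks(tokens: list[str], chunk_size: int, overlap: int) -> list[list[str]]:
    """Split `tokens` into windows of `chunk_size`, each sharing `overlap` tokens with the previous one."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the text
    return chunks


# Whitespace "tokens" stand in for tiktoken tokens in this toy example.
words = "t0 t1 t2 t3 t4 t5 t6 t7 t8 t9".split()
for chunk in sliding_window_chunks(words, chunk_size=4, overlap=2):
    print(chunk)
```

With chunk_size=4 and overlap=2 this yields four windows, and the last two tokens of each window reappear at the start of the next one; in the notebook the same relationship holds at 1,000 and 200 tokens.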

In [5]:
stocks_used = ["aapl", "amzn", "ba", "brka", "googl", "gs", "jnj", "jpm", "ko", "mcd", 
               "meta", "ms", "msft", "nee", "nvda", "pfe", "pg", "tsla", "v", "xom"]

In [1]:
from langchain.prompts import PromptTemplate
from langchain.storage import InMemoryStore
from langchain_core.messages import AIMessage, HumanMessage
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from langchain_google_vertexai import (
    ChatVertexAI,
    VectorSearchVectorStore,
    VertexAI,
    VertexAIEmbeddings,
)
from langchain_text_splitters import CharacterTextSplitter
from google.cloud import aiplatform
import fitz  # pymupdf

In [None]:
# Parameters for VertexAI

PROJECT_ID = "PROJECT-ID"
LOCATION = "LOCATION"

# For Vector Search Staging
GCS_BUCKET = "BUCKET-ID"
GCS_BUCKET_URI = f"gs://{GCS_BUCKET}"
aiplatform.init(project=PROJECT_ID, location=LOCATION, staging_bucket=GCS_BUCKET_URI)

MODEL_NAME = "gemini-1.5-flash"
GEMINI_OUTPUT_TOKEN_LIMIT = 8192

EMBEDDING_MODEL_NAME = "text-embedding-004"
EMBEDDING_TOKEN_LIMIT = 2048

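# Cap output length at the smaller limit so that generated summaries can be embedded without truncation
TOKEN_LIMIT = min(GEMINI_OUTPUT_TOKEN_LIMIT, EMBEDDING_TOKEN_LIMIT)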

In [None]:
# Function to summarise text chunks using Vertex AI

import time


def generate_text_summaries(
    texts: list[str], summarize_texts: bool = False
) -> list[str]:
    """Return one summary per chunk (or the raw chunks if summarize_texts is False)."""

    # Prompt
    prompt_text = """You are an assistant tasked with summarizing tables and text for retrieval. \
    These summaries will be embedded and used to retrieve the raw text or table elements. \
    Summarise the issues stemming from the report provided. The report is as shown: {element} """
    prompt = PromptTemplate.from_template(prompt_text)
    # Fallback used if the model call fails, so one bad chunk does not stop the loop
    empty_response = RunnableLambda(
        lambda x: AIMessage(content="Error processing document")
    )
    # Text summary chain: the raw chunk string is mapped onto the {element} placeholder
    model = VertexAI(
        temperature=0, model_name=MODEL_NAME, max_output_tokens=TOKEN_LIMIT
    ).with_fallbacks([empty_response])
    summarize_chain = {"element": lambda x: x} | prompt | model | StrOutputParser()

    text_summaries = []
    for i, text in enumerate(texts):
        if summarize_texts:
            # Pass the chunk string directly; the dict at the head of the chain wraps it
            text_summaries.append(summarize_chain.invoke(text))
        else:
            text_summaries.append(text)
        print(f"Chunk {i + 1} processed, {len(texts) - i - 1} remaining for this stock")
        # Pause for 1 minute after every 4 chunks to avoid hitting the API rate limit
        if (i + 1) % 4 == 0 and i != len(texts) - 1:
            print("Waiting for 1 minute before processing the next 4 chunks...")
            time.sleep(60)
    print("Summarised!")
    return text_summaries
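The "pause after every N calls" pattern appears in several cells of this notebook (here, in the grid search and in the LLM evaluator). A minimal sketch of a reusable helper is shown below; the name map_with_pauses and its parameters are hypothetical, and the sleep function is injectable so the pacing logic can be checked without actually waiting.

```python
import time
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def map_with_pauses(
    fn: Callable[[T], R],
    items: Iterable[T],
    calls_per_window: int = 4,
    pause_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list[R]:
    """Apply `fn` to each item, pausing after every `calls_per_window` calls (but not after the last one)."""
    items = list(items)
    results = []
    for i, item in enumerate(items, start=1):
        results.append(fn(item))
        if i % calls_per_window == 0 and i < len(items):
            sleep(pause_seconds)  # wait before starting the next window of calls
    return results


# Record the pauses instead of sleeping, to see where they would happen.
pauses: list[float] = []
doubled = map_with_pauses(lambda x: x * 2, range(9), calls_per_window=4, sleep=pauses.append)
print(doubled, pauses)
```

Inside generate_text_summaries, the loop body could then become a call such as map_with_pauses(summarize_chain.invoke, texts); the appropriate window size and pause depend on the project's Vertex AI quota, which is not stated here.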

In [None]:
stocks_used_dict = dict()

# Token-based splitter: chunks of up to 1,000 tokens with 200 tokens of overlap
text_splitter = CharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=1000, chunk_overlap=200
)

for stock in stocks_used:
    # Extract the text of every page in the stock's analyst report
    with fitz.open(f"{stock}_report.pdf") as doc:
        full_text = "\n\n".join(page.get_text("text") for page in doc)

    # Split the report into overlapping chunks
    chunks = text_splitter.split_text(full_text)

    # Summarise each chunk and store the summaries under the ticker
    stocks_used_dict[stock] = generate_text_summaries(chunks, summarize_texts=True)

## Part 2: Building of RAG Model

For each ticker's analyst report, the summarised chunks are stored in a list. These lists are collated into a dictionary keyed by stock ticker for ease of identification, with the summarisation function applied in a loop across all the ticker reports. The dictionary is then saved to CSV, and the RAG model is built in the stages described in greater depth below:

In [None]:
import pandas as pd

# Convert stocks_used_dict to a pandas DataFrame, where each stock symbol becomes a row, and its associated summaries become a column
stocks_df = pd.DataFrame(list(stocks_used_dict.items()), columns=['Stock', 'Summaries'])

# Saving
stocks_df.to_csv('stocks_used_summaries.csv', index=False)

# Check the resulting DataFrame
print(stocks_df.head())

   Stock                                          Summaries
0   aapl  [## Summary of Issues from Apple Inc. (0R2V-LN...
1   amzn  [## Summary of Issues for AMZN:\n\nThe report ...
2     ba  [## Summary of Issues for Boeing Co (BA)\n\nTh...
3   brka  [## Summary of Issues from the Berkshire Hatha...
4  googl  [## Summary of Issues from Alphabet Inc. (GOOG...



This section describes the methodology used to build the RAG model, which answers queries about the outlook of various stocks. Vertex AI was used to build the RAG model.

## Stage 1: Retrieval of Data

In this stage, the following steps are applied:

1. The summaries for each stock are checked for duplicates, so that retrieval results are not biased towards particular chunks.
2. A Vertex AI Vector Search index and endpoint are deployed to store and serve the embedding vectors.
3. A retriever is created with VectorSearchVectorStore, using the Vector Search index ID, the endpoint ID and the text-embedding-004 embedding model. This allows the vector index to be queried for documents that are semantically similar to a query.
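Conceptually, the retriever in step 3 embeds the query and returns the stored vectors closest to it. The sketch below shows exact (brute-force) nearest-neighbour search with cosine similarity on toy 3-dimensional vectors; the Tree-AH index created below approximates this kind of search so it scales to many vectors. The document IDs and vectors are made up for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int) -> list[tuple[str, float]]:
    """Exact nearest-neighbour search: score every document and keep the k most similar."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy "embeddings"; real text-embedding-004 vectors have 768 dimensions.
docs = {
    "tsla_dividend": [0.9, 0.1, 0.0],
    "pfe_earnings": [0.1, 0.9, 0.2],
    "amzn_valuation": [0.2, 0.3, 0.9],
}
for doc_id, score in top_k([1.0, 0.2, 0.0], docs, k=2):
    print(doc_id, round(score, 3))
```

For unit-normalised vectors, cosine similarity and dot product give the same ranking, which is why either distance measure is commonly used for text embeddings.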

In [3]:
from langchain.retrievers.multi_vector import MultiVectorRetriever
from langchain_core.documents import Document
import re
import uuid
from langchain.prompts import PromptTemplate
from langchain.storage import InMemoryStore
from langchain_core.messages import AIMessage, HumanMessage
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from langchain_google_vertexai import (
    ChatVertexAI,
    VectorSearchVectorStore,
    VertexAI,
    VertexAIEmbeddings,
)
from langchain_text_splitters import CharacterTextSplitter
from google.cloud import aiplatform
import fitz  # pymupdf

In [4]:
import pandas as pd
import ast

summaries_df = pd.read_csv('stocks_used_summaries.csv')

# The CSV stores each list as its string representation; convert it back to an actual list
summaries_df['Summaries'] = summaries_df['Summaries'].apply(ast.literal_eval)
stocks_used_dict = summaries_df.set_index('Stock')['Summaries'].to_dict()
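Because CSV stores each list of summaries as a string, it has to be parsed back with ast.literal_eval. As an alternative, JSON stores lists natively, so the dictionary round-trips without any string parsing. The helper names and the .json file name below are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def save_summaries(summaries: dict[str, list[str]], path: Path) -> None:
    """Write {ticker: [summary, ...]} to a UTF-8 JSON file."""
    path.write_text(json.dumps(summaries, ensure_ascii=False, indent=2), encoding="utf-8")


def load_summaries(path: Path) -> dict[str, list[str]]:
    """Read the dictionary back; lists come back as lists, so no literal_eval is needed."""
    return json.loads(path.read_text(encoding="utf-8"))


# Round-trip check in a temporary directory.
sample = {"aapl": ["first summary", "second summary"], "ko": ["another summary"]}
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "stocks_used_summaries.json"
    save_summaries(sample, path)
    restored = load_summaries(path)
print(restored == sample)  # True
```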

In [5]:
# Check for duplicate summaries by comparing their lengths.
# KO has two summaries of equal length, but manual inspection showed they are not duplicates.
from collections import Counter

for stock, summaries in stocks_used_dict.items():
    lengths = [len(s) for s in summaries]  # List of lengths of each summary
    length_counts = Counter(lengths)  # Count occurrences of each length
    duplicates = {length for length, count in length_counts.items() if count > 1}  # Set of duplicate lengths
    if duplicates:
        print(f"{stock} stock has summaries with duplicate lengths: {duplicates}")
    else:
        print(f"{stock} stock has no duplicate lengths.")

aapl stock has no duplicate lengths.
amzn stock has no duplicate lengths.
ba stock has no duplicate lengths.
brka stock has no duplicate lengths.
googl stock has no duplicate lengths.
gs stock has no duplicate lengths.
jnj stock has no duplicate lengths.
jpm stock has no duplicate lengths.
ko stock has summaries with duplicate lengths: {1641}
mcd stock has no duplicate lengths.
meta stock has no duplicate lengths.
ms stock has no duplicate lengths.
msft stock has no duplicate lengths.
nee stock has no duplicate lengths.
nvda stock has no duplicate lengths.
pfe stock has no duplicate lengths.
pg stock has no duplicate lengths.
tsla stock has no duplicate lengths.
v stock has no duplicate lengths.
xom stock has no duplicate lengths.
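Equal length only flags candidate duplicates, which is why KO needed a manual check. Comparing the summary text itself settles the question directly. A minimal sketch, assuming that summaries differing only in whitespace should count as duplicates; the helper names are hypothetical.

```python
from collections import Counter


def exact_duplicates(summaries: list[str]) -> dict[str, int]:
    """Summaries that occur more than once after collapsing whitespace, with their counts."""
    normalised = [" ".join(s.split()) for s in summaries]
    return {text: n for text, n in Counter(normalised).items() if n > 1}


def report_duplicates(stocks: dict[str, list[str]]) -> dict[str, dict[str, int]]:
    """Map each ticker to its duplicated summaries, omitting tickers with none."""
    report = {}
    for stock, summaries in stocks.items():
        dups = exact_duplicates(summaries)
        if dups:
            report[stock] = dups
    return report


sample = {
    "ko": ["Revenue grew.", "Margins fell."],  # equal length (13 chars) but different text
    "aapl": ["Same text.", "Same  text.\n", "Other."],  # differ only in whitespace
}
print(report_duplicates(sample))  # {'aapl': {'Same text.': 2}}
```

Running report_duplicates(stocks_used_dict) would return an empty dictionary if, as the manual check concluded, no ticker has repeated summaries.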


In [7]:
# Creation of endpoints
DIMENSIONS = 768  # Output dimension of text-embedding-004 (same as textembedding-gecko)

index = aiplatform.MatchingEngineIndex.create_tree_ah_index(
    display_name="rag_index",
    dimensions=DIMENSIONS,
    approximate_neighbors_count=150,
    leaf_node_embedding_count=500,
    leaf_nodes_to_search_percent=7,
    description="RAG LangChain Index",
    index_update_method="STREAM_UPDATE",
)

DEPLOYED_INDEX_ID = "rag_index_endpoint"

index_endpoint = aiplatform.MatchingEngineIndexEndpoint.create(
    display_name=DEPLOYED_INDEX_ID,
    description="RAG Index Endpoint",
    public_endpoint_enabled=True,
)

index_endpoint = index_endpoint.deploy_index(
    index=index, deployed_index_id="rag_deployed_index1"
)
# index_endpoint.deployed_indexes

Creating MatchingEngineIndex
Create MatchingEngineIndex backing LRO: projects/954241416931/locations/us-central1/indexes/2117250376771043328/operations/1220752679026819072
MatchingEngineIndex created. Resource name: projects/954241416931/locations/us-central1/indexes/2117250376771043328
To use this MatchingEngineIndex in another session:
index = aiplatform.MatchingEngineIndex('projects/954241416931/locations/us-central1/indexes/2117250376771043328')
Creating MatchingEngineIndexEndpoint
Create MatchingEngineIndexEndpoint backing LRO: projects/954241416931/locations/us-central1/indexEndpoints/5350764540478881792/operations/3368547488817479680
MatchingEngineIndexEndpoint created. Resource name: projects/954241416931/locations/us-central1/indexEndpoints/5350764540478881792
To use this MatchingEngineIndexEndpoint in another session:
index_endpoint = aiplatform.MatchingEngineIndexEndpoint('projects/954241416931/locations/us-central1/indexEndpoints/5350764540478881792')
Deploying index Matchi

In [None]:
import uuid
from langchain_core.documents import Document

# Vector store backed by the deployed Vertex AI Vector Search index and endpoint
vectorstore = VectorSearchVectorStore.from_components(
    project_id=PROJECT_ID,
    region=LOCATION,
    gcs_bucket_name=GCS_BUCKET,
    index_id=index.name,
    endpoint_id=index_endpoint.name,
    embedding=VertexAIEmbeddings(model_name=EMBEDDING_MODEL_NAME),
    stream_update=True,
)

# In-memory docstore (not populated in this notebook; it could hold raw chunks keyed by doc_id)
docstore = InMemoryStore()

# Metadata key for the unique document IDs
id_key = "doc_id"

# Embed each stock's summaries and add them to the Vector Search index
for stock, summaries in stocks_used_dict.items():
    # One unique ID per summary
    doc_ids = [str(uuid.uuid4()) for _ in summaries]

    # Wrap each summary in a Document, recording its unique ID as metadata
    summary_docs = [
        Document(page_content=s, metadata={id_key: doc_id})
        for s, doc_id in zip(summaries, doc_ids)
    ]

    # Add the summaries to the Vertex AI Vector Search vectorstore
    vectorstore.add_documents(summary_docs)
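The cell above defines a docstore and an id_key but only indexes the summaries. The idea those names point to is multi-vector retrieval: search over the short summaries, then use the shared doc_id to return the full source chunk to the LLM. LangChain's MultiVectorRetriever, which the notebook imports, is built around a vectorstore, a docstore and an id_key for this purpose. The pure-Python sketch below only illustrates the idea; all helper names are hypothetical, word overlap stands in for embedding similarity, and it assumes the raw chunks are kept alongside the summaries (Part 1 currently stores only the summaries).

```python
import re
import uuid


def index_with_ids(
    stock: str, summaries: list[str], raw_chunks: list[str]
) -> tuple[dict[str, dict[str, str]], dict[str, str]]:
    """Give each summary and its source chunk a shared doc_id."""
    if len(summaries) != len(raw_chunks):
        raise ValueError("need exactly one raw chunk per summary")
    summary_index: dict[str, dict[str, str]] = {}
    docstore: dict[str, str] = {}
    for summary, raw in zip(summaries, raw_chunks):
        doc_id = str(uuid.uuid4())
        summary_index[doc_id] = {"text": summary, "stock": stock}  # what gets searched
        docstore[doc_id] = raw  # what is returned after a match
    return summary_index, docstore


def _words(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def word_overlap(a: str, b: str) -> int:
    """Count shared lowercase words; a crude stand-in for embedding similarity."""
    return len(_words(a) & _words(b))


def retrieve_raw(query: str, summary_index, docstore, k: int = 1) -> list[str]:
    """Rank summaries against the query, then return the raw chunks behind the top-k summaries."""
    ranked = sorted(
        summary_index,
        key=lambda doc_id: word_overlap(query, summary_index[doc_id]["text"]),
        reverse=True,
    )
    return [docstore[doc_id] for doc_id in ranked[:k]]


summary_idx, raw_store = index_with_ids(
    "tsla",
    ["Tesla pays no dividend.", "Accruals ratio is the highest in its industry group."],
    ["<full text of chunk 1>", "<full text of chunk 2>"],
)
print(retrieve_raw("What is Tesla's dividend yield?", summary_idx, raw_store))
```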

## Stage 2: Introduction of Query, and Similarity Search

In this stage, the similarity_search method embeds the query and retrieves the k chunks that are semantically closest to it. These chunks then serve as the context that is fed to the ChatVertexAI model, which constructs a response to the question.

In [80]:
# Create RAG chain with text-only logic
def chain_rag(query, num_k, temp=0):
    # Retrieve the num_k summaries closest to the query and join them as context
    docs = vectorstore.similarity_search(query, k=num_k)
    formatted_texts = "\n".join(doc.page_content for doc in docs)

    model = ChatVertexAI(
        temperature=temp,
        model_name=MODEL_NAME,
        max_output_tokens=TOKEN_LIMIT,
    )
    # Prepare the message to send to the model
    message = (
        "You are a financial analyst tasked with providing investment advice.\n"
        "You will be given text-based data.\n"
        "Use this information to provide investment advice related to the user's question.\n"
        f"User-provided question: {query}\n\n"
        "Text:\n"
        f"{formatted_texts}\n\n"
        "Your response should include:\n"
        "- A summary of relevant information from the provided text.\n"
        "- An analysis of key financial indicators or trends.\n"
        "- A conclusion based on the evidence, explicitly stating how the data supports your recommendation.\n"
        "- Citations or references to specific data points where applicable.\n"
        "You do not need to repeat the answer twice."
    )

    # Send the message to the model and get the response
    response = model.invoke([HumanMessage(content=message)])
    # Strip Markdown asterisks (bold, italics and '*' bullets) from the output
    final_ans = response.content.replace('*', '')
    return final_ans
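The prompt asks for citations, but the retrieved chunks are joined without labels, so citation markers in the output cannot be traced back to a chunk. For instance, the Tesla answer in Part 3 cites [2], [3] and [5] even though num_k=1 retrieves a single chunk. One possible refinement, sketched below under that assumption, is to number each chunk in the prompt and build the prompt in a separate function so it can be checked without calling the model. The name build_rag_prompt and the exact wording are hypothetical.

```python
def build_rag_prompt(query: str, contexts: list[str]) -> str:
    """Assemble the analyst prompt, numbering each retrieved chunk so citations like [2] can be traced."""
    numbered = "\n".join(f"[{i}] {text}" for i, text in enumerate(contexts, start=1))
    return (
        "You are a financial analyst tasked with providing investment advice.\n"
        "Use only the numbered text below to answer the user's question.\n"
        f"User-provided question: {query}\n\n"
        f"Text:\n{numbered}\n\n"
        "Cite supporting chunks by their number, e.g. [1]."
    )


prompt = build_rag_prompt(
    "What is Tesla's (TSLA) dividend yield?",
    ["Tesla does not pay a dividend.", "Tesla has the highest accruals ratio in its industry group."],
)
print(prompt)
```

Inside chain_rag, the contexts would be [doc.page_content for doc in docs], and any citation number larger than num_k in the answer would then be an immediate sign of an unsupported reference.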

## Part 3: Evaluation of RAG Model

To evaluate the model, two methods were used: a groundedness check to verify that the answers are supported by the analyst reports, and scoring of the answers by an LLM.

### Evaluation Method 1: Groundedness Evaluation

After building the RAG chain, I checked that the model works by means of a groundedness check, which verifies that the answers are supported by the retrieved reports rather than randomly generated. This step is critical in financial applications, as it guards against hallucinations and increases trust in the generated advice. It also confirms that the model's outputs align with factual data rather than speculation.
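Part of this check can be automated for figures. A crude heuristic, sketched below and not the method used in this notebook, is to extract every number in an answer and flag those that never appear in the retrieved context. The helper name and the sample answer are hypothetical; the context sentence is adapted from the Pfizer report text quoted in Type 2.

```python
import re

# Integers or decimals, e.g. "12", "30.60", "16.6"
NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")


def numeric_grounding(answer: str, context: str) -> tuple[float, list[str]]:
    """Share of numbers in `answer` that also appear verbatim in `context`, plus the unsupported numbers."""
    answer_nums = NUMBER_RE.findall(answer)
    if not answer_nums:
        return 1.0, []  # nothing numeric to check
    context_nums = set(NUMBER_RE.findall(context))
    unsupported = [n for n in answer_nums if n not in context_nums]
    return 1 - len(unsupported) / len(answer_nums), unsupported


context = "Analysts predict a 12-month price target of $30.60, a 16.6% increase from $26.24."
answer = "The $30.60 target implies 16.6% upside, and the dividend yield is 2.5%."
score, unsupported = numeric_grounding(answer, context)
print(round(score, 2), unsupported)  # 0.67 ['2.5']
```

The check is deliberately strict: a correctly derived figure that is not stated verbatim in the context (such as a 0% yield inferred from "does not pay a dividend") is also flagged, so flagged numbers call for manual review rather than automatic rejection.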

In this project, I tested the model with the query types listed below:

#### Type 1: Hallucinations
This type tests for incorrect or fabricated answers: TSLA does not pay a dividend, so a grounded answer should state that no dividend is paid rather than invent a yield figure.

In this example, no hallucinations are observed: the model correctly states that Tesla does not pay a dividend.

In [81]:
answer_1 = chain_rag("What is Tesla’s (TSLA) dividend yield?", num_k=1)
print(answer_1)

Based on the provided text, Tesla (TSLA) does not currently pay a dividend. [2] This means its dividend yield is 0%. [2] The report highlights that Tesla is one of seven companies in its industry group that does not pay a dividend. [2] Additionally, the report lacks information on dividend metrics such as payout, coverage, and yield, making it difficult to assess the company's dividend policy and potential future dividend payments. [5] 

While the report does not explicitly state why Tesla does not pay a dividend, it does mention that the company has a high accruals ratio, which is the highest within its industry group. [3] This could indicate aggressive accounting practices or a reliance on non-cash earnings, which may be a factor in the company's decision to not pay dividends. 

Conclusion: Based on the available information, Tesla does not currently pay a dividend and therefore has a dividend yield of 0%. [2] The report lacks information on dividend metrics and does not provide any 

#### Type 2: Comparative Analysis
This type tests the reasoning applied to the retrieved documents, checking whether the RAG model can synthesise and process the material thoroughly enough to make a decision.

In this case, a decision was made with reasonable evidence from the analyst reports.

In [85]:
answer_2 = chain_rag("It is noted that Pfizer and BRK.A have among the highest positive outlooks. But which stock are investors more optimistic about?", num_k=5)
print(answer_2)

Based on the provided information, it's difficult to definitively say which stock investors are more optimistic about, as the data presents a mixed picture for both Pfizer (PFE) and Berkshire Hathaway (BRK.A). 

Pfizer (PFE):

 Positive Outlook: PFE has a strong earnings rating of 9, exceeding the Pharmaceuticals industry average (5.3). [Text: "Pfizer has a strong earnings rating of 9, significantly higher than the Pharmaceuticals industry average of 5.3."]
 Earnings Surprises: PFE has consistently exceeded earnings expectations over the past four quarters. [Text: "Over the past four quarters, Pfizer has consistently exceeded earnings expectations, reporting four positive surprises."]
 Price Target: Analysts predict a 12-month price target of $30.60, representing a 16.6% increase from the current price. [Text: "Analysts predict a 12-month price target of $30.60, representing a 16.6% increase from the current price of $26.24."]

However, there are also concerns:

 Declining Average Scor

#### Type 3: Giving Logical Advice
This type evaluates the model's ability to assess the risks associated with a stock and make a recommendation accordingly.

In this case, the evaluation of Amazon is well balanced, assessing the stock's risk profile against benchmarks such as the S&P 500 index, which indicates a decent answer.

In [83]:
answer_3 = chain_rag("What are the advantages and disadvantages of investing in Amazon?", num_k=5)
print(answer_3)

## Investing in Amazon: A Risky Yet Potentially Rewarding Proposition

Amazon (AMZN) is a company with a high market capitalization and a strong track record of growth. However, the provided report highlights several factors that make it a risky investment. 

Key Risks:

 High Correlation with S&P 500: AMZN's high correlation with the S&P 500 (cited in the report) means it offers limited diversification benefits. This implies that during market downturns, AMZN is likely to experience significant losses alongside the broader market.
 Volatility and Beta: AMZN has a higher beta than the S&P 500, indicating greater volatility. This means its stock price is more susceptible to fluctuations, potentially leading to larger losses during market corrections.
 High Valuation: AMZN's trailing and forward PE ratios are significantly higher than the market average (36.3 and 32.1 respectively). This suggests the stock is currently priced at a premium, making it vulnerable to a correction if the comp

### Evaluation Method 2: LLM Checking

Apart from the groundedness check, I also used a Vertex AI LLM to score each answer out of 100. The answers were generated in a loop over k = [4, 6, 8] and temperature = [0, 0.2, 0.4]. From the scores, a table was constructed for each question to compare answer quality across the parameters and identify the best-performing settings, in the hope of finding general patterns in the parameters that yield better responses.

In [None]:
import time
num_k = [4,6,8]
temp_lst = [0,0.2,0.4]
answer1_lst = []
answer2_lst = []
answer3_lst = []
i = 0
for k_trial in num_k:
    for temp_val in temp_lst:
        ans1 = chain_rag("Should I invest in JPM now?", num_k=k_trial, temp=temp_val)
        answer1_lst.append(ans1)
        ans2 = chain_rag("Is TSLA or NEE better positioned for the renewable energy sector?", num_k=k_trial, temp=temp_val)
        answer2_lst.append(ans2)
        ans3 = chain_rag("Is NVDA a good long-term investment based on recent reports?", num_k=k_trial, temp=temp_val)
        answer3_lst.append(ans3)
        # Each grid point makes 3 model calls, so pause for 1 minute after every 2 grid points (6 calls)
        if (i + 1) % 2 == 0:
            time.sleep(60)
        i += 1

In [64]:
def llm_evaluator(query, ans_list):
    """Ask the LLM to score each answer out of 100; returns the raw text replies."""
    score_list = []
    model = ChatVertexAI(
        temperature=0,
        model_name=MODEL_NAME,
        max_output_tokens=TOKEN_LIMIT,
    )
    for i, ans in enumerate(ans_list):
        # Prepare the message to send to the model
        message = (
            "You are a financial analyst tasked with evaluating investment advice.\n"
            "Use this information to evaluate the answer to the user's question.\n"
            f"User-provided question: {query}\n\n"
            "Text:\n"
            f"{ans}\n\n"
            "Please provide a score out of 100 that reflects the overall quality of the answer.\n"
            "The score should consider clarity, relevance, depth, and accuracy.\n"
            "You do not need an explanation, and you do not need to repeat the answer twice."
        )

        # Send the message to the model and store its (text) reply
        response = model.invoke([HumanMessage(content=message)])
        score_list.append(response.content)
        # Pause for 1 minute after every 3 calls to avoid hitting the API rate limit
        if (i + 1) % 3 == 0:
            print("Waiting for 1 minute before processing the next few answers...")
            time.sleep(60)
    return score_list

In [None]:
score_lst1 = llm_evaluator("Should I invest in JPM now?", answer1_lst)
score_lst2 = llm_evaluator("Is TSLA or NEE better positioned for the renewable energy sector?", answer2_lst)
score_lst3 = llm_evaluator("Is NVDA a good long-term investment based on recent reports?", answer3_lst)
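llm_evaluator returns the model's raw text replies, and the answers were generated with k as the outer loop and temperature as the inner loop. The sketch below, with hypothetical helper names, parses each reply into an integer and reshapes the flat list into a k by temperature grid, then averages the scores per temperature. The sample input is the Question 2 scores from the results tables, listed row by row.

```python
import re
from statistics import mean
from typing import Optional

K_VALUES = [4, 6, 8]
TEMPERATURES = [0, 0.2, 0.4]


def parse_score(raw: str) -> Optional[int]:
    """Return the first integer between 0 and 100 in a reply such as '85' or 'Score: 90/100'."""
    for match in re.findall(r"\d+", raw):
        if 0 <= int(match) <= 100:
            return int(match)
    return None


def to_grid(raw_scores: list[str]) -> dict[int, dict[float, Optional[int]]]:
    """Reshape the flat list (k outer loop, temperature inner loop) into grid[k][temperature]."""
    if len(raw_scores) != len(K_VALUES) * len(TEMPERATURES):
        raise ValueError("expected one score per (k, temperature) pair")
    replies = iter(raw_scores)
    return {k: {t: parse_score(next(replies)) for t in TEMPERATURES} for k in K_VALUES}


def temperature_means(grid: dict[int, dict[float, Optional[int]]]) -> dict[float, float]:
    """Average score per temperature across k, skipping replies that could not be parsed."""
    return {
        t: mean(grid[k][t] for k in K_VALUES if grid[k][t] is not None)
        for t in TEMPERATURES
    }


# Question 2 scores from the results tables, in loop order (row by row)
raw_q2 = ["85", "85", "85", "90", "90", "85", "90", "85", "90"]
grid_q2 = to_grid(raw_q2)
print(temperature_means(grid_q2))
```

Usage in the notebook would be to_grid(score_lst1), and so on for each question; statistics.mean raises an error if every reply in a column fails to parse, which is worth surfacing rather than hiding.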

#### Results from Evaluation Method 2
The tables below show the scores (out of 100) assigned by the Vertex AI LLM to the answers produced by the RAG model. Rows correspond to the number of retrieved chunks k, and columns to the temperature.

##### Question 1: Should I invest in JPM now?
| k / temperature | 0 | 0.2 | 0.4 |
| :-: | :-: | :-: | :-: |
| 4 | 85 | 85 | 85 |
| 6 | 85 | 85 | 85 |
| 8 | 85 | 85 | 85 |

##### Question 2: Is TSLA or NEE better positioned for the renewable energy sector?
| k / temperature | 0 | 0.2 | 0.4 |
| :-: | :-: | :-: | :-: |
| 4 | 85 | 85 | 85 |
| 6 | 90 | 90 | 85 |
| 8 | 90 | 85 | 90 |

##### Question 3: Is NVDA a good long-term investment based on recent reports?
| k / temperature | 0 | 0.2 | 0.4 |
| :-: | :-: | :-: | :-: |
| 4 | 90 | 85 | 85 |
| 6 | 85 | 85 | 90 |
| 8 | 85 | 90 | 85 |

Based on the results, no clear pattern emerges as temperature and k vary for each question: all scores lie between 85 and 90, and Question 1 received 85 for every combination. Therefore, for this RAG model there do not appear to be optimal parameters across questions; rather, the best setting (among those tested) varies with each prompt posed to the model. Further work could test a wider range of parameters to determine whether optimal parameters exist for this model.

## Part 4: Overall Evaluation of Model and Conclusion

Across the two evaluation methods, the quality of the responses produced by the RAG model was relatively similar across the parameters tested. The answers were shown to provide logical advice, conduct comparative analyses between stocks and companies, and appear free of hallucinations.

#### Strengths of Model
Firstly, Vertex AI supports the entire RAG process, from retrieving relevant documents with Vector Search to generating answers with LLMs through ChatVertexAI. This is key for RAG, as it enables both efficient document retrieval and relevant, context-rich responses within a single platform.

Next, it integrates well with Cloud Storage, which serves as the staging bucket for Vector Search in this notebook, enabling smooth data pipeline management.

#### Weaknesses of Model
However, Vertex AI also has weaknesses. Because RAG requires frequent retrieval operations, RAG models built on Vertex AI can incur high compute costs, particularly with large document collections and frequent queries. The costs of vector search during retrieval and of text generation can accumulate very quickly. Additionally, since subsequent work is expected to involve large volumes of queries, the cost of using this service could become very high.

### Potential Future Work
Firstly, as only text was considered in the RAG model built in this assignment, multi-modal RAG models could be explored, since they can process the images and graphs in the reports to give a fuller view of stock performance. This could improve the model's answers.

Next, beyond providing answers or advice to investors, this RAG model could be adapted to forecast risk factors in the financial sector by incorporating more diverse financial indicators, which could lead to better predictions.

Finally, the evaluation of the RAG model could be made more rigorous. Beyond generating answers and showing supporting evidence, techniques that measure the importance of particular phrases could give greater insight into why the RAG model produced a given answer.

### Conclusion
All in all, this model shows great potential for advising prospective investors about different stocks, both by summarising the analyst reports in point form and by comparing stocks against each other. It also did not produce hallucinations that could mislead investors in the tests conducted. However, more can be done to enhance the model in the long term, such as using multi-modal RAG models and adapting the model to provide other outputs investors want, such as performance forecasts for the various stocks based on recent analyst reports.