# Link Insertion Investigation

It has been observed that when a citation's `text_in_answer` does not exactly match the `text` from the LLM response, the sources still appear but the citation numbers are not shown in the response.

**This is the section of code we're trying to replace:**
```python
# Add footnotes to messages
for message in messages:
    footnote_counter = 1
    for display, href, cit_id, text_in_answer in message.unique_citation_uris():  # noqa: B007
        if text_in_answer:
            message.text = message.text.replace(
                text_in_answer,
                f'{text_in_answer}<a class="rb-footnote-link" href="/citations/{message.id}/#{cit_id}">{footnote_counter}</a>',  # noqa: E501
            )
            footnote_counter = footnote_counter + 1
```
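To illustrate the failure mode, here is a minimal sketch using made-up values (the message text, citation text, and IDs are hypothetical, not taken from a real response). It assumes the mismatch comes from Markdown formatting in the response that is absent from `text_in_answer`:

```python
# Hypothetical values for illustration only.
message_text = "Japan has the **4th largest** economy in the world."
text_in_answer = "Japan has the 4th largest economy in the world."  # plain text, no bold markers

link = '<a class="rb-footnote-link" href="/citations/msg_1/#cit_1">1</a>'

# str.replace returns the input unchanged when the substring is absent,
# so the footnote link is silently dropped and no error is raised.
updated = message_text.replace(text_in_answer, f"{text_in_answer}{link}")

print(updated == message_text)  # True: no link was inserted
```

Because `str.replace` does not signal a miss, the only visible symptom is the missing footnote number.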

The current method is encapsulated in `simple_insert_link` below. The proposed solution is as follows:

- Convert both `message.text` and `text_in_answer` from markdown text to plain text.

- Search for the plain `text_in_answer` in the plain `message.text` and track the position of the match.

- If a match is found, insert the link after the matched text in `message.text`.

This allows the Markdown formatting to be maintained.

This is detailed in the `insert_link` function below.

In [None]:
import markdown
from bs4 import BeautifulSoup
import re

In [None]:
def simple_insert_link(md_text: str, sub_string: str, message_id: str, cit_id: str, footnote_counter: int):
    md_text=md_text.replace(sub_string,
                    f'{sub_string}<a class="rb-footnote-link" href="/citations/{message_id}/#{cit_id}">{footnote_counter}</a>')
    return md_text

In [None]:
def get_plain_text(md_text: str)-> str:
    """
    Converts a Markdown-formatted string to plain text by rendering it as HTML and stripping all HTML tags.
    Escaped newline sequences (a backslash followed by 'n') are first converted to real newlines.

    Args:
        md_text (str): The Markdown content to be stripped.

    Returns:
        The plain text representation of the Markdown input.
    """
    md_text = md_text.replace('\\n','\n')
    html = markdown.markdown(md_text)
    soup = BeautifulSoup(html, 'html.parser')
    return soup.get_text()

In [None]:
def insert_link(md_text: str, sub_string: str, message_id: str, cit_id: str, footnote_counter: int)-> str:
    """
    Searches for a plain-text substring within a Markdown-formatted string,
    and if found, appends a Markdown link immediately after the matched content,
    preserving original Markdown formatting.

    Args:
        md_text (str): The original Markdown text.
        sub_string (str): The string to search for (text_in_answer).
        message_id (str): The message ID
        cit_id(str): The citation ID
        footnote_counter(int): The footnote counter.

    Returns:
        The updated Markdown string with the link inserted if a match was found. If no match is found, returns the original Markdown unchanged.
    """
    rendered_text=get_plain_text(md_text)
    rendered_string=get_plain_text(sub_string)

    match = re.search(re.escape(rendered_string), rendered_text)
    if not match:
        return md_text

    for i in range(len(md_text)):
        snippet = get_plain_text(md_text[i:])
        if snippet.startswith(rendered_string):
            end_idx=i
            count = 0
            while count<len(rendered_string) and end_idx<len(md_text):
                temp = get_plain_text(md_text[i:end_idx])
                count=len(temp)
                end_idx +=1
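            # Build the link as an f-string so the IDs and counter are interpolated.
            link = f'<a class="rb-footnote-link" href="/citations/{message_id}/#{cit_id}">{footnote_counter}</a>'
            # Insert at end_idx-1 (just past the match) without dropping the next character.
            main_text = md_text[:end_idx-1] + link + md_text[end_idx-1:]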
            return main_text
    return md_text

A sample LLM Response in markdown format is shown below (`text_body`) alongside a sample citation text to search for (`search_text`).

We will test both methods and time them to see what the time impact is.

In [None]:
# Sample LLM Response
text_body = """
**UK Market Share in Japan**\n\nAccording to the Japan Trade and Investment Factsheet:\n\n- The UK's total market share in Japan was 1.8% in 2023, an increase of 0.3 percentage points from 2022.\n- The UK market share in Japan for goods was 1.1% in 2023, an increase of 0.1 percentage points from 2022. \n- The UK market share in Japan for services was 4.3% in 2023, an increase of 0.4 percentage points from 2022.\n\nThe factsheet provides a table showing the UK's market share in Japan from 2014 to 2023 for total trade, goods, and services.\n\n**General Information on Japan's Economy**\n\nFrom Wikipedia:\n\n- Japan has the 4th largest economy in the world by nominal GDP and the 5th largest by purchasing power parity (PPP).\n- It is a highly developed mixed economy and founding member of the G7.  \n- Japan has a highly service-dominated economy, contributing around 70% of GDP.\n- The industrial sector is led by the automobile industry, which is the second largest in the world. Major companies include Toyota, Honda, Sony, Hitachi.\n- Japan underwent rapid economic growth and industrialization after World War II, becoming the world's 2nd largest economy by 1968 until surpassed by China in 2010.\n- However, economic stagnation and deflation marked the 'Lost Decades' from the 1990s to 2010s after the collapse of an asset price bubble.\n\n**Recent UK-Japan Trade Negotiations**\n\nBased on information from GOV.UK:\n\n- In March 2025, the UK and Japan held the second UK-Japan Strategic Economic Policy and Trade Dialogue to strengthen economic ties and cooperation in areas like supply chain resilience, critical technologies, clean energy, and advanced manufacturing.\n\n- In March 2023, the first Japan-UK Economic 2+2 Ministers' Meeting was held to discuss economic security, free trade, energy security, and engagement with the Global South.\n\n- In December 2022, the UK joined the Comprehensive and Progressive Trans-Pacific Partnership (CPTPP), which Japan is a member of.\n\n- 
In June 2020, the UK and Japan started negotiating the UK-Japan Comprehensive Economic Partnership Agreement (CEPA), which entered into force by the end of 2020 after the UK's exit from the EU.
"""

In [None]:
# Sample text_in_answer
search_text="""
From Wikipedia:\n\n- Japan has the 4th largest economy in the world by nominal GDP and the 5th largest by purchasing power parity (PPP).
"""

**Testing the Proposed Method**

In [None]:
from pprint import pprint

pprint(insert_link(text_body, search_text, 'id_123', 'id_456', 5))

**Testing the Current Method**

In [None]:
pprint(simple_insert_link(text_body, search_text, 'id_123', 'id_456', 5))
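Inspecting the printed output by eye is error-prone for long responses. One possible automated check is a round trip: strip the footnote anchor back out of the result and compare it with the original Markdown. The sketch below is an assumption about how such a check could look; `strip_footnote_links` and `insertion_is_lossless` are hypothetical helpers, and the sample strings are invented.

```python
import re

# Matches the footnote anchor format used by the functions in this notebook.
FOOTNOTE_LINK = re.compile(r'<a class="rb-footnote-link" href="[^"]*">\d+</a>')

def strip_footnote_links(text: str) -> str:
    """Remove every footnote anchor (hypothetical helper)."""
    return FOOTNOTE_LINK.sub("", text)

def insertion_is_lossless(original: str, updated: str) -> bool:
    """True if `updated` differs from `original` only by added footnote anchors."""
    return updated != original and strip_footnote_links(updated) == original

# Invented sample data: `bad` has lost the newline that followed the match.
original = "Intro\n\n- **Bold** claim.\n- Next item."
good = 'Intro\n\n- **Bold** claim.<a class="rb-footnote-link" href="/citations/m/#c">1</a>\n- Next item.'
bad = 'Intro\n\n- **Bold** claim.<a class="rb-footnote-link" href="/citations/m/#c">1</a>- Next item.'

print(insertion_is_lossless(original, good))  # True
print(insertion_is_lossless(original, bad))   # False
```

The same check can be applied to the output of either method, for example `insertion_is_lossless(text_body, insert_link(text_body, search_text, 'id_123', 'id_456', 5))`.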

**Calculating the Time Difference**

In [None]:
%%timeit -o
test1=insert_link(text_body, search_text, 'id_123', 'id_456', 5)

In [None]:
time1=_.average

In [None]:
%%timeit -o
test2=simple_insert_link(text_body, search_text, 'id_123', 'id_456', 5)

In [None]:
time2=_.average

In [None]:
difference = time1/time2

In [None]:
print(f'The proposed solution takes {difference:,.2f} times as long as the current solution')
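The `%%timeit` magic is only available in IPython. Outside a notebook, a similar comparison can be sketched with the standard-library `timeit` module. The two functions below are simplified, hypothetical stand-ins so the block runs on its own; in practice they would be replaced with `simple_insert_link` and `insert_link`. Note that this sketch takes the minimum over repeats, whereas the cells above use the average.

```python
import timeit

# Hypothetical stand-ins; swap in simple_insert_link and insert_link from this notebook.
def current_method(text: str, needle: str) -> str:
    return text.replace(needle, needle + "[1]")

def proposed_method(text: str, needle: str) -> str:
    idx = text.find(needle)
    if idx == -1:
        return text
    end = idx + len(needle)
    return text[:end] + "[1]" + text[end:]

# Invented input: one occurrence of the needle inside filler text.
text = "lorem ipsum " * 200 + "target sentence." + " dolor sit amet" * 200
needle = "target sentence."

# Take the best of several repeats to reduce noise from other processes.
t_current = min(timeit.repeat(lambda: current_method(text, needle), number=1000, repeat=5))
t_proposed = min(timeit.repeat(lambda: proposed_method(text, needle), number=1000, repeat=5))

print(f"proposed / current = {t_proposed / t_current:,.2f}")
```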