## Get markdown formated details of a paper from arxive url/id
- From a list of arxive urls I want to get a formated previous so I can past it in Notion.
- Although Notion will recognize some md comments when you paste them such as "#" for (sub)sections
-  It will not recognize other markdown things such as a expandable header. 
    - Therefore, past the output in a .md file and copy it form there 

In [1]:
import arxiv
import tqdm

In [2]:
def fetch_arxiv_info(arxiv_id):
    # print("arxiv id is ", arxiv_id)
    
    # Search for papers by arXiv ID
    search = arxiv.Search(id_list=[arxiv_id])
    paper = next(search.results(), None)  # Fetch the first result
    
    if paper:
        # Extracting the necessary information
        title = paper.title
        abstract = paper.summary
        authors = ", ".join(author.name for author in paper.authors)
        year = paper.published.year
        url = paper.entry_id  # Fetch the URL
        
        return {
            "title": title,
            "abstract": abstract,
            "authors": authors,
            "year": year,
            "url": url  # Return the URL
        }
    else:
        return "No paper found with that arXiv ID."

In [3]:
def format_paper_info(info):
    # info is a dict with fields: title, abstract, authors, year, url
    authors = info['authors'].split(",")[0].split(" ")  # Only the first author (for brevity)
    if len(authors) > 1:
        # use the initials for first names and full last name
        authors = " ".join([f"{name[0]}." if i < len(authors) - 1 else name for i, name in enumerate(authors)])
    outp_str = f"###  [{info['title']} - {authors} ({info['year']})]({info['url']})\n\n"
    outp_str += "<details open>\n<summary>Abstract</summary>\n\n"
    outp_str += f"> {info['abstract']}\n\n"
    outp_str += "</details> \n\n"
    return outp_str

def process_arxivurl_list(arxiv_url_str):
    arxiv_urls_og = arxiv_url_str.split("\n")
    arxiv_urls_og = [url.strip() for url in arxiv_urls_og if len(url.strip()) > 0]
    arxiv_urls = [item for item in arxiv_urls_og if "arxiv.org" in item]
    arxiv_ids = [url.split("/")[-1].replace(".pdf", "").strip() for url in arxiv_urls]
    non_arxive_urls = [url for url in arxiv_urls_og if url not in arxiv_urls]
    formatted_info = ""

    if len(non_arxive_urls) > 0:
        formatted_info += "\n ### Non arXiv URLs: \n\n - "
        formatted_info += "\n - ".join(non_arxive_urls)
    
    formatted_info += "\n ## ArXiv items: \n\n"
    for arxiv_id in tqdm.tqdm(arxiv_ids):
        info = fetch_arxiv_info(arxiv_id)
        formatted_info += format_paper_info(info)

    # create a temp.md file and append the formatted_info to it
    with open("temp.md", "w") as f:
        f.write(formatted_info)
        f.write("\n\n---\n\n")

    return formatted_info



arxiv_url_str = """

https://arxiv.org/abs/2401.06102
https://gist.github.com/pierrejoubert73/902cc94d79424356a8d20be2b382e1ab

https://arxiv.org/pdf/2403.15419.pdf
"""

formatted_info = process_arxivurl_list(arxiv_url_str)

  paper = next(search.results(), None)  # Fetch the first result
100%|██████████| 2/2 [00:01<00:00,  1.61it/s]


In [4]:
from IPython.display import display, Markdown, Latex
display(Markdown(formatted_info))


 ### Non arXiv URLs: 

 - https://gist.github.com/pierrejoubert73/902cc94d79424356a8d20be2b382e1ab
 ## ArXiv items: 

###  [Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models - A. Ghandeharioun (2024)](http://arxiv.org/abs/2401.06102v2)

<details open>
<summary>Abstract</summary>

> Inspecting the information encoded in hidden representations of large
language models (LLMs) can explain models' behavior and verify their alignment
with human values. Given the capabilities of LLMs in generating
human-understandable text, we propose leveraging the model itself to explain
its internal representations in natural language. We introduce a framework
called Patchscopes and show how it can be used to answer a wide range of
questions about an LLM's computation. We show that prior interpretability
methods based on projecting representations into the vocabulary space and
intervening on the LLM computation can be viewed as instances of this
framework. Moreover, several of their shortcomings such as failure in
inspecting early layers or lack of expressivity can be mitigated by
Patchscopes. Beyond unifying prior inspection techniques, Patchscopes also
opens up new possibilities such as using a more capable model to explain the
representations of a smaller model, and unlocks new applications such as
self-correction in multi-hop reasoning.

</details> 

###  [Attention is all you need for boosting graph convolutional neural network - Y. Wu (2024)](http://arxiv.org/abs/2403.15419v1)

<details open>
<summary>Abstract</summary>

> Graph Convolutional Neural Networks (GCNs) possess strong capabilities for
processing graph data in non-grid domains. They can capture the topological
logical structure and node features in graphs and integrate them into nodes'
final representations. GCNs have been extensively studied in various fields,
such as recommendation systems, social networks, and protein molecular
structures. With the increasing application of graph neural networks, research
has focused on improving their performance while compressing their size. In
this work, a plug-in module named Graph Knowledge Enhancement and Distillation
Module (GKEDM) is proposed. GKEDM can enhance node representations and improve
the performance of GCNs by extracting and aggregating graph information via
multi-head attention mechanism. Furthermore, GKEDM can serve as an auxiliary
transferor for knowledge distillation. With a specially designed attention
distillation method, GKEDM can distill the knowledge of large teacher models
into high-performance and compact student models. Experiments on multiple
datasets demonstrate that GKEDM can significantly improve the performance of
various GCNs with minimal overhead. Furthermore, it can efficiently transfer
distilled knowledge from large teacher networks to small student networks via
attention distillation.

</details> 

