# In-Context Learning


In-context learning is a generalisation of few-shot learning in which the LLM is given a context as part of the prompt and asked to respond using the information in that context.

* Example: *"Summarize this research article into one paragraph highlighting its strengths and weaknesses: [insert article text]"*
* Example: *"Extract all the quotes from this text and organize them in alphabetical order: [insert text]"*
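In code, an in-context prompt is simply the instruction with the context attached. The sketch below assumes plain string prompts; `build_prompt` is a hypothetical helper and the `<context>` tags are an illustrative convention, not something Gemini requires:

```python
def build_prompt(instruction: str, context: str) -> str:
    """Combine a task instruction with the context the model should use.

    Hypothetical helper: the <context> tags only mark where the supplied
    text starts and ends, so the model does not mistake it for the instruction.
    """
    return f"{instruction}\n\n<context>\n{context.strip()}\n</context>"


prompt = build_prompt(
    "Extract all the quotes from this text and organize them in alphabetical order:",
    '  "Zebras are fast," she said. "Apples are red," he replied.  ',
)
print(prompt)
```

A string built this way can be passed to `model.generate_content`, as the notebook does with the extracted paper text.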

Retrieval-Augmented Generation (RAG), a popular technique covered in week 5, is a form of in-context learning in which:
* a search engine is used to retrieve some relevant information
* that information is then provided to the LLM as context
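The two steps can be illustrated without any external service. This is a toy under stated assumptions: a small in-memory list of documents stands in for the search engine, and documents are ranked by how many words they share with the query instead of by a real retrieval model. `retrieve` and `rag_prompt` are hypothetical helper names:

```python
import re


def tokenize(text: str) -> set[str]:
    """Lower-cased set of words; a deliberately crude stand-in for a search index."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Step 1: return the k documents sharing the most words with the query."""
    query_words = tokenize(query)
    ranked = sorted(documents, key=lambda doc: len(query_words & tokenize(doc)), reverse=True)
    return ranked[:k]


def rag_prompt(query: str, documents: list[str], k: int = 2) -> str:
    """Step 2: hand the retrieved passages to the LLM as context."""
    passages = retrieve(query, documents, k)
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer the question using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"


docs = [
    "Gemini is a family of multimodal models from Google.",
    "The Eiffel Tower is located in Paris.",
    "pypdf is a Python library for reading PDF files.",
]
print(rag_prompt("Which library reads PDF files?", docs, k=1))
```

Real RAG systems typically replace the word-overlap score with a search engine or embedding similarity, but the shape of the pipeline (retrieve, then prompt with the results) stays the same.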


In this example, we download recent research papers featured on Hugging Face (hosted on arXiv), plus one additional journal article, extract the text from the PDF files, and ask Gemini to summarize each paper and list its main strengths and weaknesses. Finally, we write the summaries to a local HTML file and display them in the notebook as Markdown.

In [ ]:
!pip install -q -U google-generativeai

In [3]:
!pip install -q -U pypdf


In [1]:
import os
import requests
from bs4 import BeautifulSoup
import google.generativeai as genai
from urllib.request import urlopen, urlretrieve
from IPython.display import Markdown, display
from pypdf import PdfReader
from datetime import date
from tqdm import tqdm

In [57]:
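API_KEY = os.environ.get("GEMINI_API_KEY")
# Never echo or print the key itself: notebook outputs are easy to share by accident.
if API_KEY is None:
    raise RuntimeError("Set the GEMINI_API_KEY environment variable before running this notebook.")
genai.configure(api_key=API_KEY)
print("GEMINI_API_KEY" in os.environ)

True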


We select the papers featured on the Hugging Face papers page.

In [53]:
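BASE_URL = "https://huggingface.co/papers"
page = requests.get(BASE_URL, timeout=30)
page.raise_for_status()  # stop early if the page could not be fetched
soup = BeautifulSoup(page.content, "html.parser")
h3s = soup.find_all("h3")  # paper titles appear as <h3> headings wrapping a link

papers = []

for h3 in h3s:
    a = h3.find("a")
    if a is None:  # skip headings that are not paper links
        continue
    title = a.text.strip()
    link = a["href"].replace('/papers', '')  # '/papers/<arXiv id>' -> '/<arXiv id>'

    papers.append({"title": title, "url": f"https://arxiv.org/pdf{link}"})

# Add one non-arXiv journal article by hand as an extra test case.
papers.append({"title": "aboutness", "url": "https://journals.uio.no/dhnbpub/article/download/11510/9543/41817"})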


In [33]:
papers[-1]

{'title': 'aboutness',
 'url': 'https://journals.uio.no/dhnbpub/article/download/11510/9543/41817'}
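The scraping cell assumes that Hugging Face paper links have the form `/papers/<arXiv id>`. A slightly more defensive way to turn such a link into an arXiv PDF URL is sketched below; it makes the same assumption (the last path segment is the arXiv identifier), and `hf_href_to_arxiv_pdf` is a hypothetical helper name:

```python
from urllib.parse import urlparse


def hf_href_to_arxiv_pdf(href: str) -> str:
    """Map a Hugging Face paper link to the matching arXiv PDF URL.

    Assumes the final path segment is the arXiv ID, e.g. '/papers/2411.04989'.
    Query strings and #fragments are dropped, and absolute URLs work too.
    """
    path = urlparse(href).path.rstrip("/")
    arxiv_id = path.split("/")[-1]
    return f"https://arxiv.org/pdf/{arxiv_id}"


print(hf_href_to_arxiv_pdf("/papers/2411.04989"))
print(hf_href_to_arxiv_pdf("https://huggingface.co/papers/2411.04075#community"))
```

Unlike `str.replace('/papers', '')`, this does not remove other occurrences of the substring and ignores trailing fragments or slashes.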

Helper functions to extract text from HTML pages (`extract_paper`) and from PDF files (`extract_pdf`).

In [34]:
def extract_paper(url):
    html = urlopen(url).read()
    soup = BeautifulSoup(html, features="html.parser")

    # kill all script and style elements
    for script in soup(["script", "style"]):
        script.extract()    # rip it out

    # get text
    text = soup.get_text()

    # break into lines and remove leading and trailing space on each
    lines = (line.strip() for line in text.splitlines())
    # break multi-headlines into a line each
    chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
    # drop blank lines
    text = '\n'.join(chunk for chunk in chunks if chunk)

    return text


def extract_pdf(url):
    urlretrieve(url, "pdf_file.pdf")  # download to a local file (overwritten on every call)
    reader = PdfReader("pdf_file.pdf")
    text = ""
    for page in reader.pages:
        text += page.extract_text() + "\n"
    return text
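The line-cleaning steps at the end of `extract_paper` are easier to follow on a small string. The sketch below pulls them into a standalone function; `clean_text` is a hypothetical name, but the logic is the same as in the cell above:

```python
def clean_text(text: str) -> str:
    """Normalize whitespace the same way extract_paper does."""
    # Strip leading/trailing whitespace from every line.
    lines = (line.strip() for line in text.splitlines())
    # Treat runs of two spaces as separators between headlines.
    chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
    # Drop empty chunks and put each remaining one on its own line.
    return "\n".join(chunk for chunk in chunks if chunk)


sample = "  Title  \n\n  First part  Second part \n"
print(repr(clean_text(sample)))  # 'Title\nFirst part\nSecond part'
```

Because of the double-space split, "First part" and "Second part" end up on separate lines even though they came from the same input line.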

In [31]:
test = extract_pdf("https://journals.uio.no/dhnbpub/article/download/11510/9543/41817")
print(test)


Augmenting BERT to Model Remediation Processes in
Finnish Countermedia: Feature Comparisons for
Supervised Text Classification
Ümit Bedretdin1,∗, Pihla Toivanen1 and Eetu Mäkelä1
1University of Helsinki
Abstract
This paper showcases a supervised machine learning classifier to bridge the gap between qualitative
and quantitative research in media studies, leveraging recent advancements in data-driven approaches.
Current machine learning methods make it possible to gain insights from large datasets that would be
impractical to analyze with more traditional methods. Supervised document classification presents a
good platform for combining specific domain knowledge and close reading with broader quantitative
analysis. The study focuses on a dataset of 37 185 articles from the Finnish countermedia publication MV-
lehti, from which a randomly sampled 997 articles were annotated into three categories based on frame
analysis. Contextual sequence representations from the finBERT language model, 

In [ ]:
def printmd(string):
    display(Markdown(string))

In [14]:
LLM = "gemini-1.5-flash"
model = genai.GenerativeModel(LLM)

We use Gemini to summarize the papers. To keep the run short, only the last three entries of `papers` are processed.

In [54]:
PROMPT = ("Summarize this research article into one paragraph without formatting, highlighting its strengths "
          "and weaknesses. Format the strengths and weaknesses into a Markdown table.\n\n")

for paper in tqdm(papers[-3:]):
    try:
        paper["summary"] = model.generate_content(PROMPT + extract_pdf(paper["url"])).text
    except Exception as e:  # download, PDF parsing, or API errors
        print(f"Generation failed for {paper['title']}: {e}")
        paper["summary"] = "Paper not available"


100%|██████████| 3/3 [00:31<00:00, 10.45s/it]
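The loop gives up on a paper after the first failed call. API calls can fail transiently (for example because of rate limits or network errors), so one option is a small retry wrapper with exponential backoff. This is a sketch, not part of the Gemini SDK: `generate_with_retry` is a hypothetical helper, and the offline check uses a fake model so it runs without an API key:

```python
import time


def generate_with_retry(generate, prompt, attempts=3, delay=2.0):
    """Call generate(prompt), retrying on exceptions with exponential backoff.

    Hypothetical helper: `generate` can be any callable, e.g. model.generate_content.
    Waits delay, 2*delay, 4*delay, ... between attempts and re-raises the last error.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return generate(prompt)
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(delay * 2 ** attempt)


# Offline check: a fake model that fails twice before succeeding.
calls = []


def flaky(prompt):
    calls.append(prompt)
    if len(calls) < 3:
        raise ConnectionError("temporary failure")
    return f"summary of {prompt!r}"


result = generate_with_retry(flaky, "paper text", delay=0.01)
print(result)  # summary of 'paper text'
```

Inside the loop, this would be used as `generate_with_retry(model.generate_content, prompt).text`.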


In [43]:
papers[-1]

{'title': 'aboutness',
 'url': 'https://journals.uio.no/dhnbpub/article/download/11510/9543/41817',
 'summary': 'This research investigates the effectiveness of a supervised machine learning classifier in automatically identifying remediation tactics in Finnish countermedia, specifically within the publication MV-lehti. The study uses BERT-based contextual embeddings, topic distributions from a trained topic model, and structural features from the articles as input for the classifier.  \n\n**Strengths:**\n\n* **Innovative approach:**  Applies machine learning to a media studies problem, bridging the gap between qualitative and quantitative research.\n* **Large dataset:**  Utilizes a dataset of 37,185 articles, offering a comprehensive analysis of the countermedia phenomenon.\n* **Multi-feature analysis:** Explores the effectiveness of combining various features, including BERT embeddings, topic models, and structural information.\n\n**Weaknesses:**\n\n* **Limited performance improvemen

We write the results to an HTML file.

In [55]:
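import html

# Build the page in memory and write it once. html.escape keeps characters such as
# "<" or "&" in titles and summaries from breaking the markup.
parts = [
    "<html><head><meta charset='utf-8'><title>Daily Dose of AI Research</title></head><body>",
    f"<h1>Daily Dose of AI Research</h1><h4>{date.today()}</h4>",
    f"<p><i>Summaries generated with: {LLM}</i></p>",
]
for paper in papers[-3:]:
    # Summaries are Markdown text; pre-wrap preserves their line breaks in the browser.
    parts.append(
        f'<h2><a href="{html.escape(paper["url"])}">{html.escape(paper["title"])}</a></h2>'
        f'<p style="white-space: pre-wrap">{html.escape(paper["summary"])}</p>'
    )
parts.append("</body></html>")
with open("papers.html", "w", encoding="utf-8") as f:
    f.write("\n".join(parts))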

We can also print the results to this notebook as markdown.

In [56]:
for paper in papers[-3:]:
    printmd("**[{}]({})**<br>{}<br><br>".format(paper["title"],
                                                paper["url"],
                                                paper["summary"]))

**[SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation](https://arxiv.org/pdf/2411.04989)**<br>The research article introduces SG-I2V, a novel framework for controllable image-to-video generation that leverages the knowledge present in a pre-trained image-to-video diffusion model without fine-tuning or external knowledge. SG-I2V achieves zero-shot trajectory control by aligning feature maps extracted from the self-attention layers of the diffusion model and optimizing the latent code to enforce similarity between features within bounding box trajectories. The authors also introduce a frequency-based post-processing step to enhance output quality by preserving the high-frequency noise expected by the diffusion model.

## Strengths and Weaknesses:

| Strength | Description |
|---|---|
| **Zero-shot control** |  SG-I2V does not require fine-tuning, making it computationally efficient and requiring no additional training data. |
| **Unified control** | The framework offers unified control over object and camera motion by specifying bounding box trajectories. |
| **Versatile control** | SG-I2V can control both rigid and non-rigid motions of various objects and also enables camera motion control. |
| **Quantitative evaluation** | The authors comprehensively evaluate SG-I2V against supervised and adapted zero-shot baselines, demonstrating its competitive performance in visual quality and motion fidelity. |
| **Ablation studies** | The authors perform extensive ablation studies to investigate the effect of various design choices and provide valuable insights into the framework's inner workings. |


| Weakness | Description | Analysis |
|---|---|---|
| **Quality limitations** | The quality of generated videos is limited by the base video diffusion model, especially for subjects with complex motion or physical interactions. | The base model's capabilities inherently limit the output quality, emphasizing the need for improved diffusion models. |
| **Potential artifacts** | The optimization process can lead to artifacts, although mitigated by frequency-based post-processing. |  The optimization process itself might introduce out-of-distribution latents, requiring further research on how to alter the denoising process while maintaining in-distribution latents. |
| **Limited to image-to-video** | While the authors claim the framework could be extended to newly released models, its current scope is limited to image-to-video generation. |  The framework's applicability to different types of video generation, like text-to-video, remains to be explored. |
| **Ethical considerations** | The authors acknowledge the potential for misuse of high-quality video generation, highlighting the need for responsible and safe use. |  The increasing realism of synthesized videos raises ethical concerns regarding potential manipulation and misinformation, emphasizing the need for robust ethical guidelines and safeguards. |
<br><br>

**[M3SciQA: A Multi-Modal Multi-Document Scientific QA Benchmark for Evaluating Foundation Models](https://arxiv.org/pdf/2411.04075)**<br>This research article introduces M3SciQA, a multi-modal, multi-document scientific question answering benchmark for evaluating foundation models' ability to handle complex research workflows. M3SciQA surpasses existing benchmarks by incorporating both visual and textual data, requiring models to reason across multiple documents. The benchmark contains 1,452 expert-annotated questions spanning 70 natural language processing paper clusters, each comprised of an anchor paper and its cited documents. The authors evaluated 18 foundation models, highlighting significant performance gaps between current models and human experts, particularly in scientific image understanding and long-range information retrieval.

## Strengths and Weaknesses of the Research:

| Strength | Weakness | Analysis |
|---|---|---|
| **Introduction of a novel benchmark with multi-modality and multi-document reasoning.** This addresses a crucial gap in existing scientific QA benchmarks. | **Limited context window in open-source LMMs restricts their ability to handle full paper clusters.** This leads to "unfair" comparisons and impacts ranking accuracy.  | The authors acknowledge this limitation and propose future work to standardize or extend context windows in LMMs. |
| **Comprehensive evaluation of various open-source and proprietary LLMs and LMMs.** This provides valuable insights into the current state of foundation models in scientific QA. | **Prompting LMMs with a set of possible reference papers can be suboptimal.** It presents challenges for models in ranking a large number of papers. | The authors suggest using individual paper embeddings and comparing them with the textual embedding of the question and image, as an alternative approach for future research. |
| **Detailed analysis of model limitations in visual reasoning, paper ranking, and long-range retrieval.** This offers valuable insights for improving foundation models in these areas. | **Use of GPT-4o's textual descriptions of images for BM25 and Contriever might not accurately capture image nuances.**  | This highlights the need for specialized LMMs trained on scientific images to enhance scientific applications. |
| **Clear explanation of the benchmark construction process and annotation guidelines.** This ensures transparency and reproducibility. | **LLM-based evaluation might not accurately reflect human evaluation due to models' confidence levels.** Models might provide tangentially relevant answers instead of "I don't know," leading to inflated scores. | This limitation requires further investigation to better align LLM-based evaluation with human assessments. | 
<br><br>

**[aboutness](https://journals.uio.no/dhnbpub/article/download/11510/9543/41817)**<br>This research paper investigates the use of a supervised machine learning classifier, leveraging BERT-based contextual embeddings, topic models, and structural features, to automatically detect remediation tactics in Finnish countermedia publications. The study focuses on a dataset of 997 articles from MV-lehti, a popular Finnish countermedia publication, categorized into three classes: media criticism, copies from mainstream media, and narrative. While the combination of BERT embeddings and structural features achieved the best overall performance, the addition of topic information resulted in only marginal improvements, particularly for minority classes. 

| Strength | Weakness | Analysis |
|---|---|---|
| Utilizes a large dataset of Finnish countermedia articles, providing a unique and valuable case study | The dataset is unbalanced, with a majority class and two minority classes, impacting the model's ability to learn features for the minority classes | Unbalanced datasets can lead to biased models that favor the majority class, leading to poor performance on minority classes. |
| Employs BERT-based embeddings, a powerful language representation model | BERT embeddings do not contain information about HTML code, limiting the model's ability to utilize structural features effectively | While BERT captures semantic and syntactic information, structural features require specific processing to be integrated effectively. |
| Investigates the effectiveness of combining different feature types, including topic models and structural features | The chosen topic modeling approach, using lemmatized nouns, may not be sufficiently nuanced to capture complex framing tactics | Complex framing often involves intricate linguistic constructions that go beyond simple lemmatized nouns, requiring more sophisticated topic modeling techniques. |
| Evaluates the performance of the classifier on a test set | The evaluation set is relatively small, limiting the generalizability of the results | A small evaluation set can lead to inaccurate assessments of model performance, particularly for minority classes. |

The study highlights the potential of automated frame analysis for studying remediation tactics in the misinformation field, but also emphasizes the limitations of current approaches. Future research could focus on developing custom feature sets, exploring more sophisticated topic modeling methods, and utilizing transformer models that can parse HTML code to improve classification performance, particularly for minority classes. 
<br><br>