In [1]:
import numpy as np
import pandas as pd
from openai import OpenAI

In [2]:
client = OpenAI()

In [3]:
def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
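The `cosine_similarity` helper compares one pair of vectors at a time, so scoring every paper means one Python call per row. Here is a minimal sketch of a vectorized alternative. It assumes the embeddings have already been stacked into a single 2-D NumPy array, and the names `cosine_similarities` and `top_k` are hypothetical, not part of the notebook. One matrix-vector product scores all rows at once:

```python
import numpy as np


def cosine_similarities(matrix: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Cosine similarity between every row of `matrix` and the vector `query`."""
    # Euclidean norm of each row (shape: [n_rows]) and of the query (a scalar)
    row_norms = np.linalg.norm(matrix, axis=1)
    query_norm = np.linalg.norm(query)
    # One matrix-vector product replaces a Python-level loop over rows
    return (matrix @ query) / (row_norms * query_norm)


def top_k(similarities: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k largest similarities, most similar first."""
    return np.argsort(similarities)[::-1][:k]


# Toy 3-dimensional vectors standing in for real embeddings
emb = np.array([[1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0],
                [1.0, 1.0, 0.0]])
q = np.array([1.0, 0.0, 0.0])

sims = cosine_similarities(emb, q)
print(sims)            # roughly 1.0, 0.0, 0.707
print(top_k(sims, 2))  # [0 2]
```

In the notebook, the matrix could be built with something like `np.vstack(df['embedding'].tolist())` once the column holds arrays. OpenAI's documentation states that its embeddings are normalized to length 1, so a plain dot product would give the same ranking. The explicit normalization keeps the sketch correct for arbitrary vectors.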

In [4]:
df = pd.read_csv('papers_embedded.csv', index_col=0)

In [5]:
df.head()

Unnamed: 0,text,n_tokens,embedding
0,573 \n\nBIT - SERIAL NEURAL NETWORKS \n\nAlan...,7959.0,"[0.007057549431920052, 0.022557897493243217, -..."
1,1 \n\nCONNECTIVITY VERSUS ENTROPY \n\nYaser S...,5220.0,"[0.017669761553406715, -0.02821267582476139, 0..."
2,278 \n\nTHE HOPFIELD MODEL WITH MUL TI-LEVEL N...,4445.0,"[0.019363459199666977, -0.004505184479057789, ..."
3,442 \n\nAlan Lapedes \nRobert Farber \n\nThe...,7942.0,"[0.006703744176775217, -0.002929536160081625, ..."
4,740 \n\nSPATIAL ORGANIZATION OF NEURAL NEn...,7980.0,"[-0.019718386232852936, 0.021891098469495773, ..."


In [6]:
df.dropna(inplace=True)

In [7]:
df.isna().sum()

text         0
n_tokens     0
embedding    0
dtype: int64

In [8]:
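import ast

# The embeddings were saved to CSV as the string form of Python lists; parse them with
# ast.literal_eval (unlike eval, it cannot execute arbitrary code) and convert to NumPy arrays
df['embedding'] = df['embedding'].apply(ast.literal_eval).apply(np.array)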
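The `embedding` column is stored in the CSV as text. Here is a minimal sketch of an alternative parse that also stacks the vectors into a single matrix, which is the shape a vectorized similarity search needs. It assumes each cell is a list of floats written in Python's list syntax, which is also valid JSON. The helper name `parse_embeddings` is hypothetical:

```python
import json

import numpy as np
import pandas as pd


def parse_embeddings(column: pd.Series) -> np.ndarray:
    """Parse a column of list-like strings into a 2-D float array."""
    # json.loads only accepts data literals, so it cannot run code from the file
    vectors = column.apply(json.loads)
    # np.vstack raises ValueError if the vectors have different lengths,
    # which surfaces inconsistent embedding dimensions early
    return np.vstack([np.asarray(v, dtype=float) for v in vectors])


# Two toy rows in the same string format as the CSV's 'embedding' column
s = pd.Series(["[0.1, 0.2, 0.3]", "[-0.4, 0.5, 0.6]"])
m = parse_embeddings(s)
print(m.shape)  # (2, 3)
```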

In [9]:
df.head()

Unnamed: 0,text,n_tokens,embedding
0,573 \n\nBIT - SERIAL NEURAL NETWORKS \n\nAlan...,7959.0,"[0.007057549431920052, 0.022557897493243217, -..."
1,1 \n\nCONNECTIVITY VERSUS ENTROPY \n\nYaser S...,5220.0,"[0.017669761553406715, -0.02821267582476139, 0..."
2,278 \n\nTHE HOPFIELD MODEL WITH MUL TI-LEVEL N...,4445.0,"[0.019363459199666977, -0.004505184479057789, ..."
3,442 \n\nAlan Lapedes \nRobert Farber \n\nThe...,7942.0,"[0.006703744176775217, -0.002929536160081625, ..."
4,740 \n\nSPATIAL ORGANIZATION OF NEURAL NEn...,7980.0,"[-0.019718386232852936, 0.021891098469495773, ..."


In [10]:
df.dtypes

text          object
n_tokens     float64
embedding     object
dtype: object

In [11]:
def create_context(question, max_len=8000):
    """
    Build a context string for a question by concatenating the dataframe texts
    most similar to it, stopping before the total token count exceeds max_len
    """

    # Embed the question (the model must match the one that produced the stored embeddings)
    q_embeddings = client.embeddings.create(input=question, model='text-embedding-3-small').data[0].embedding

    # Cosine similarity between the question and every text
    # (higher means more similar, despite the column name 'distances')
    df['distances'] = df['embedding'].apply(lambda x: cosine_similarity(x, q_embeddings))

    returns = []
    cur_len = 0

    # Walk the texts from most to least similar, adding each one until the token budget is exceeded
    for _, row in df.sort_values('distances', ascending=False).iterrows():

        # Add the text's token count plus a small per-text allowance (presumably for the separator)
        cur_len += row['n_tokens'] + 4

        # Stop at the first text that would exceed the budget
        if cur_len > max_len:
            break

        # Otherwise add it to the context
        returns.append(row["text"])

    # Debug output: number of texts included in the context
    print(len(returns))

    # Join the selected texts with a separator
    return "\n\n###\n\n".join(returns)
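The loop inside `create_context` is a greedy token-budget packer. It walks the chunks in order of decreasing similarity, charges each one `n_tokens + 4`, and stops at the first chunk that would overflow `max_len`. Here is a minimal sketch of the same logic with the API call and the global `df` removed, so it can be tested on toy numbers. The name `pack_context` and its parameters are hypothetical. The example uses the token counts and similarities of the top three results from the ranking printed later in this notebook:

```python
def pack_context(texts, n_tokens, similarities, max_len=8000,
                 sep_tokens=4, separator="\n\n###\n\n"):
    """Greedily join the most similar texts until the token budget is exceeded."""
    # Chunk indices ordered from most to least similar
    order = list(range(len(texts)))
    order.sort(key=lambda i: similarities[i], reverse=True)

    chosen, cur_len = [], 0
    for i in order:
        # Charge the chunk's own tokens plus a small per-chunk allowance
        cur_len += n_tokens[i] + sep_tokens
        # Stop at the first chunk that would overflow, as create_context does
        if cur_len > max_len:
            break
        chosen.append(texts[i])
    return separator.join(chosen)


texts = ["CRoSS chunk", "Hiding Images in Plain Sight chunk", "Generating steganographic images chunk"]
print(pack_context(texts, [7962, 7929, 7929], [0.597551, 0.580511, 0.561688]))
# prints: CRoSS chunk
```

The best-ranked chunk alone uses 7,966 of the 8,000-token budget, so no second chunk fits. This matches the `1` printed by `create_context`. Because the loop breaks rather than skipping, a later, shorter chunk (such as the 2,115-token result ranked fifth) is never considered even if it would fit. Replacing `break` with `continue` would change that, but the notebook does not do this.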

In [12]:
context = create_context("Can you summarize the main findings of 'CRoSS: Diffusion Model Makes Controllable, Robust and Secure Image Steganography'?")

1


In [20]:
print(context[:1000])

CRoSS: Diffusion Model Makes
Controllable, Robust and Secure Image Steganography
Jiwen Yu1
Xuanyu Zhang1
Youmin Xu1,2
Jian Zhang1†
1 Peking University Shenzhen Graduate School
2 Peng Cheng Laboratory
Abstract
Current image steganography techniques are mainly focused on cover-based meth-
ods, which commonly have the risk of leaking secret images and poor robustness
against degraded container images. Inspired by recent developments in diffu-
sion models, we discovered that two properties of diffusion models, the ability to
achieve translation between two images without training, and robustness to noisy
data, can be used to improve security and natural robustness in image steganogra-
phy tasks. For the choice of diffusion model, we selected Stable Diffusion, a type
of conditional diffusion model, and fully utilized the latest tools from open-source
communities, such as LoRAs and ControlNets, to improve the controllability and
diversity of container images. In summary, we propose a novel i

In [21]:
# Reproduce the retrieval step of create_context to inspect the full ranking
question = "Can you summarize the main findings of 'CRoSS: Diffusion Model Makes Controllable, Robust and Secure Image Steganography'?"
q_embeddings = client.embeddings.create(input=question, model='text-embedding-3-small').data[0].embedding

In [22]:
df['distances'] = df['embedding'].apply(lambda x: cosine_similarity(x, q_embeddings))

In [23]:
# Named sorted_df to avoid shadowing Python's built-in sorted()
sorted_df = df.sort_values('distances', ascending=False)

In [24]:
sorted_df.head()

Unnamed: 0,text,n_tokens,embedding,distances
22994,"CRoSS: Diffusion Model Makes\nControllable, Ro...",7962.0,"[0.020469989627599716, 0.0027696345932781696, ...",0.597551
6946,Hiding Images in Plain Sight:\n\nDeep Steganog...,7929.0,"[0.0006676532211713493, 0.007436826359480619, ...",0.580511
7279,Generating steganographic images via adversari...,7929.0,"[0.04593009501695633, 0.0037568772677332163, 0...",0.561688
17290,Hiding Images in Deep Probabilistic Models\nHa...,7925.0,"[-0.0010277237743139267, -0.02994450181722641,...",0.50104
22467,Collaborative Score Distillation\nfor Consiste...,2115.0,"[0.024803483858704567, 0.03232729062438011, 0....",0.47572


In [25]:
def answer_question(
    model="gpt-4o-mini",
    question="Can you summarize the main findings of 'CRoSS: Diffusion Model Makes Controllable, Robust and Secure Image Steganography'?",
    max_len=8000,
    debug=False,
    max_tokens=150,
    stop_sequence=None
):
    """
    Answer a question based on the most similar context from the dataframe texts
    """
    context = create_context(
        question,
        max_len=max_len
    )
    # If debug, print the retrieved context
    if debug:
        print("Context:\n" + context)
        print("\n\n")

    try:
        # Create a chat completion using the question and context
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": "Answer the question based on the context below, and if the question can't be answered based on the context, say \"I don't know\"\n\n"},
                {"role": "user", "content": f"Question: {question}\n\n---\n\nContext: {context}"}
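            ],
            # Optional generation settings, currently disabled
            # (so the max_tokens and stop_sequence arguments are not used):
            # temperature=0,
            # max_tokens=max_tokens,
            # top_p=1,
            # frequency_penalty=0,
            # presence_penalty=0,
            # stop=stop_sequence,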
        )
        return response.choices[0].message.content
    except Exception as e:
        print(e)
        return ""
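`answer_question` builds its prompt inline, so the only way to see exactly what reaches the model is to call the API. Here is a minimal sketch that moves the prompt assembly into a hypothetical `build_messages` helper, using the same system prompt and user template as above. This lets you inspect the prompt offline before spending tokens:

```python
# Same system prompt text as in answer_question
SYSTEM_PROMPT = (
    "Answer the question based on the context below, and if the question "
    "can't be answered based on the context, say \"I don't know\"\n\n"
)


def build_messages(question: str, context: str) -> list[dict]:
    """Assemble the chat messages that answer_question sends to the model."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        # The question comes first, then a separator, then the retrieved context
        {"role": "user", "content": f"Question: {question}\n\n---\n\nContext: {context}"},
    ]


msgs = build_messages("What is CRoSS?", "some retrieved text")
print(msgs[1]["content"])
```

Inside `answer_question`, the call would then become `messages=build_messages(question, context)`, with the rest of the function unchanged.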

In [26]:
answer = answer_question()

1


In [27]:
print(answer)

The main findings of the paper "CRoSS: Diffusion Model Makes Controllable, Robust and Secure Image Steganography" highlight the development of a novel image steganography framework that leverages diffusion models, specifically the Stable Diffusion model, to enhance controllability, robustness, and security compared to traditional cover-based methods. 

Key points include:

1. **Improved Security**: Traditional cover-based steganography often leaks information about the secret image due to artifacts left in the container image. The CRoSS framework utilizes the properties of diffusion models to create a coverless steganography approach that makes it difficult for unauthorized receivers to infer any hidden data.

2. **Enhanced Controllability**: By incorporating techniques such as LoRAs and ControlNets from the Stable Diffusion community, the framework allows users to control the content of the container images more effectively while maintaining high visual quality.

3. **Robustness to De