# RNA Embedding Model Tutorial
This tutorial shows how to use the RNA embedding model through the `OmniGenomeModelForEmbedding` class. It covers initializing the model, encoding RNA sequences, saving and loading embeddings, and computing similarities.

## Step 1: Install Required Dependencies
Before starting, install the required libraries with the following command:

In [1]:
!pip install OmniGenome torch transformers autocuda





## Step 2: Setting Up the Embedding Model
First, let's initialize the `OmniGenomeModelForEmbedding` class with a pre-trained model.

In [2]:
from omnigenome import OmniGenomeModelForEmbedding
import torch

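# Initialize the model from a pre-trained checkpoint (replace with an RNA-specific model if available)
model_name = "yangheng/OmniGenome-52M"  # Example checkpoint; replace with your own
# Use the first GPU with half precision when available; otherwise fall back to CPU with float32
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
dtype = torch.float16 if device.type == "cuda" else torch.float32
embedding_model = OmniGenomeModelForEmbedding(model_name, trust_remote_code=True).to(device).to(dtype)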




Some weights of OmniGenomeModel were not initialized from the model checkpoint at yangheng/OmniGenome-52M and are newly initialized: ['OmniGenome.pooler.dense.bias', 'OmniGenome.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


## Step 3: Encoding RNA Sequences into Embeddings
We'll now encode a batch of RNA sequences into embeddings.

In [3]:
# Example RNA sequences (replace these with your own RNA sequences)
rna_sequences = [
    "AUGGCUACG",
    "CGGAUACGGC",
    "UGGCCAAGUC",
    "AUGCUGCUAUGCUA"
]
# Encode the RNA sequences into embeddings
rna_embeddings = embedding_model.batch_encode(rna_sequences, agg='mean')

# Display the generated embeddings
print("RNA Embeddings:")
print(rna_embeddings)

[2025-06-16 22:58:40] [OmniGenome 0.2.6alpha0]  Generated embeddings for 4 sequences.
RNA Embeddings:
tensor([[-0.4038, -1.0078, -0.0919,  ..., -0.6841, -0.9468, -0.2502],
        [-0.2445, -0.7437, -0.2668,  ..., -0.2125, -0.9575, -0.1359],
        [-0.4094, -0.8535, -0.0769,  ..., -0.5132, -0.5581, -0.3665],
        [-0.3696, -0.7798, -0.0314,  ..., -0.6567, -1.0420, -0.0429]],
       dtype=torch.float16)


## Step 4: Saving and Loading Embeddings
You can save the generated embeddings to a file and load them later when needed.

In [4]:
# Save embeddings to a file
embedding_model.save_embeddings(rna_embeddings, "rna_embeddings.pt")

# Load the embeddings from the file
loaded_embeddings = embedding_model.load_embeddings("rna_embeddings.pt")

# Display the loaded embeddings to verify
print("Loaded RNA Embeddings:")
print(loaded_embeddings)

[2025-06-16 22:58:40] [OmniGenome 0.2.6alpha0]  Embeddings saved to rna_embeddings.pt
[2025-06-16 22:58:40] [OmniGenome 0.2.6alpha0]  Loaded embeddings from rna_embeddings.pt
Loaded RNA Embeddings:
tensor([[-0.4038, -1.0078, -0.0919,  ..., -0.6841, -0.9468, -0.2502],
        [-0.2445, -0.7437, -0.2668,  ..., -0.2125, -0.9575, -0.1359],
        [-0.4094, -0.8535, -0.0769,  ..., -0.5132, -0.5581, -0.3665],
        [-0.3696, -0.7798, -0.0314,  ..., -0.6567, -1.0420, -0.0429]],
       dtype=torch.float16)


## Step 5: Computing Similarity Between RNA Sequences
Let's compute the similarity between two RNA sequence embeddings using cosine similarity.

In [5]:
# Compute the similarity between the first two RNA sequence embeddings
similarity = embedding_model.compute_similarity(loaded_embeddings[0], loaded_embeddings[1])

# Display the similarity score
print(f"Similarity between the first two RNA sequences: {similarity:.4f}")

Similarity between the first two RNA sequences: 0.9395
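
To make the score concrete, here is a minimal pure-Python sketch of cosine similarity, the metric this step names. It only illustrates the formula cos(a, b) = a·b / (‖a‖‖b‖). The function name `cosine_similarity` and the toy vectors are made up for this example, and the internals of `compute_similarity` are not shown in this tutorial and may differ (for instance, by operating directly on tensors).

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (illustrative helper)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy vectors, not real model outputs
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])
print(same_direction, orthogonal, opposite)
```

Scores lie between -1 and 1. A value close to 1, like the one printed above for the first two sequences, means the two embedding vectors point in nearly the same direction.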


## Step 6: Encoding a Single RNA Sequence
You can also encode a single RNA sequence with `encode`, choosing how the embedding is aggregated through the `agg` argument.

In [6]:
# Example single RNA sequence
single_rna_sequence = "AUGGCUACG"

# Encode the same sequence with each aggregation strategy
head_rna_embedding = embedding_model.encode(single_rna_sequence, agg='head', keep_dim=True)
mean_rna_embedding = embedding_model.encode(single_rna_sequence, agg='mean')
tail_rna_embedding = embedding_model.encode(single_rna_sequence, agg='tail')

# Display the embedding for the single RNA sequence
print("Single RNA Sequence Embedding:")
print(head_rna_embedding)
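
The `agg` argument selects how per-token vectors are collapsed into one vector per sequence. The sketch below shows one plausible reading of the three options on a toy matrix: `head` takes the first token's vector, `tail` takes the last, and `mean` averages across tokens. This reading is an assumption based on the option names alone. The tutorial does not show how `OmniGenomeModelForEmbedding` implements them, and `aggregate` is a hypothetical helper, not part of the library.

```python
def aggregate(token_vectors, agg="mean"):
    """Collapse a list of per-token vectors into a single vector (hypothetical helper)."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    if agg == "head":
        # Assumed meaning: vector of the first token
        return list(token_vectors[0])
    if agg == "tail":
        # Assumed meaning: vector of the last token
        return list(token_vectors[-1])
    if agg == "mean":
        # Average each dimension across all tokens
        n = len(token_vectors)
        return [sum(column) / n for column in zip(*token_vectors)]
    raise ValueError(f"unsupported aggregation: {agg!r}")

# Toy "hidden states": 3 tokens, 2 dimensions each (not real model outputs)
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
head_vec = aggregate(tokens, "head")
mean_vec = aggregate(tokens, "mean")
tail_vec = aggregate(tokens, "tail")
print(head_vec, mean_vec, tail_vec)
```

Under this reading, `mean` reflects every position in the sequence, while `head` and `tail` depend on a single position each.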


## Full Example
Here's a complete example that walks through all the steps we covered in the tutorial.

In [None]:
from omnigenome import OmniGenomeModelForEmbedding

# Step 2: Initialize the model
model_name = "yangheng/OmniGenome-52M"  # Replace with your RNA-specific model
embedding_model = OmniGenomeModelForEmbedding(model_name, trust_remote_code=True)

# Step 3: Encode a batch of RNA sequences
rna_sequences = ["AUGGCUACG", "CGGAUACGGC"]
rna_embeddings = embedding_model.batch_encode(rna_sequences, agg='mean')
print("RNA Embeddings:", rna_embeddings)

# Step 4: Save the embeddings to a file, then load them back
embedding_model.save_embeddings(rna_embeddings, "rna_embeddings.pt")
loaded_embeddings = embedding_model.load_embeddings("rna_embeddings.pt")

# Step 5: Compute similarity between the first two RNA sequence embeddings
similarity = embedding_model.compute_similarity(loaded_embeddings[0], loaded_embeddings[1])
print(f"Similarity between RNA sequences: {similarity:.4f}")

# Step 6: Encode a single RNA sequence
single_rna_sequence = "AUGGCUACG"
single_rna_embedding = embedding_model.encode(single_rna_sequence, agg='mean')
print("Single RNA Sequence Embedding:", single_rna_embedding)