# A class is the template for an object (instance)

In Python we constantly use **objects** (like strings!) and their methods (like `astring.upper()`). A method is like a function, but it belongs to an object created from a class. Methods can use and change the object's data.
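
As a small illustration of the difference between a method and a function, the sketch below calls a string method and the built-in `len()` function on the same string. The variable names (`astring`, `shouted`, `length`) are placeholders chosen for this example.

```python
astring = "gattaca"

# A method belongs to the object and is called with dot notation
shouted = astring.upper()

# A function stands on its own and receives the object as an argument
length = len(astring)

print(shouted)  # GATTACA
print(length)   # 7
```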

This short notebook gives an idea of how to create a custom class.

We will create a class for storing sequences.

In [3]:
# What is a sequence?
sequence = "GATTACAGATTACGTACTCGTACTTACGTGAGGTGT"
seq_name = "my_sequence"
print(f">{seq_name}\n{sequence}")


>my_sequence
GATTACAGATTACGTACTCGTACTTACGTGAGGTGT


Now let's try to formalise a class for storing the sequence. **An explanation is given at the end of this notebook.**

In [4]:
class DNASequence:
    def __init__(self, sequence, name, description=""):
        self.sequence = sequence.upper()
        self.name = name
        self.description = description

    def __len__(self):
        return len(self.sequence)

    def __str__(self):
        desc_str = f" {self.description}" if self.description else ""
        return f">{self.name}{desc_str}\n{self.sequence}"

    def rc(self):
        complement = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}
        rev_comp = ''.join(complement[base] for base in reversed(self.sequence))
        return DNASequence(rev_comp, self.name + "_rc", self.description)


Now we can create as many instances of this class as we need. Let's try:

In [6]:
dna1 = DNASequence("ATGCGTA", "SampleSeq", "Example description")
dna2 = DNASequence("atgcgta", "SampleSeq")
print(dna1)               # Prints in FASTA format
print(dna2)


>SampleSeq Example description
ATGCGTA
>SampleSeq
ATGCGTA


As you can see, when we print an object, Python uses the special `__str__` method (if we define it) to render the content as we wish.

We can now try some more methods:

In [8]:

print("Length dna1:", len(dna1))  # Prints the length of the sequence
print("Length dna2:", len(dna2))  # Prints the length of the sequence

print("----\nreverse complement of dna1:")
print(dna1.rc())          # Prints the reverse complement in FASTA format



Length dna1: 7
Length dna2: 7
----
reverse complement of dna1:
>SampleSeq_rc Example description
TACGCAT


What happens if we compare the two sequences?

In [9]:
if dna1 == dna2:
    print("Sequences are the same")
else:
    print("Sequences are different")

Sequences are different


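The two objects hold the same sequence, but by default Python only considers two instances equal if they are the very same object.

HINT: we can also control the equality with `__eq__`...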
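
A minimal sketch of this default behaviour, using a hypothetical `PlainSequence` class that does not define `__eq__`: two instances with identical data still compare as different, while an instance compared with itself is equal.

```python
class PlainSequence:
    # Hypothetical class for illustration: no __eq__ is defined here
    def __init__(self, sequence):
        self.sequence = sequence

a = PlainSequence("ATGCGTA")
b = PlainSequence("ATGCGTA")

print(a == b)                    # False: two distinct objects
print(a == a)                    # True: the very same object
print(a.sequence == b.sequence)  # True: the stored strings are equal
```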

## Some explanations

The functions inside the class are called methods, and some are special (their names are surrounded by double underscores). We can create as many methods as we want, like our `rc()` method.

### Constructor (`__init__` method)

A constructor is a special method in a class that is called automatically when you create a new instance (object) of that class. Think of it as a way to set up your object with some initial values or settings.

Key Points about a Constructor:

*    Automatic Call: When you create a new object, the constructor runs by itself.
*    Initialization: It sets up the object's initial state by assigning values to its attributes.
*    Special Name: In Python, the constructor method is always named `__init__`.

In our case it takes a **sequence**, a **name**, and an optional **description**, and converts the sequence to uppercase to ensure consistency.
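
The sketch below isolates just the constructor in a hypothetical `InitOnlySequence` class (the name is an assumption, chosen so that it does not overwrite `DNASequence`). It shows that creating an instance runs `__init__`, which stores the uppercased sequence and falls back to an empty description.

```python
class InitOnlySequence:
    # Hypothetical class containing only the constructor
    def __init__(self, sequence, name, description=""):
        self.sequence = sequence.upper()
        self.name = name
        self.description = description

# Creating the instance calls __init__ automatically
seq = InitOnlySequence("atgcgta", "SampleSeq")

print(seq.sequence)           # ATGCGTA
print(seq.name)               # SampleSeq
print(repr(seq.description))  # ''
```

Leaving out a required argument, for example `InitOnlySequence("atgcgta")`, raises a `TypeError` because `name` has no default value.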

### `__len__` method

The `__len__` method in a class allows you to use the built-in `len()` function to get the length of an object. It's like giving your custom objects the ability to tell how big they are.

In our case it makes sense to simply return the length of the sequence (ignoring name and description).
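
A side effect worth knowing: when a class defines `__len__` but not `__bool__`, Python also uses the length to decide whether an object counts as true or false. The sketch below uses a hypothetical `LenSequence` class to show this.

```python
class LenSequence:
    # Hypothetical class with only a constructor and __len__
    def __init__(self, sequence):
        self.sequence = sequence

    def __len__(self):
        return len(self.sequence)

full = LenSequence("GATTACA")
empty = LenSequence("")

print(len(full))    # 7
print(bool(full))   # True
print(bool(empty))  # False: an object of length 0 is falsy
```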

### `__str__` method

Formats the output as a FASTA-formatted string when the object is printed. When Python is asked to print an object, it uses the `__str__` method.

If that is missing... try yourself and see!
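
Besides `print()`, the same method is used by `str()` and by f-strings. A minimal sketch, using a hypothetical `StrSequence` class with the same FASTA-style `__str__` (without the description):

```python
class StrSequence:
    # Hypothetical class for illustration
    def __init__(self, sequence, name):
        self.sequence = sequence
        self.name = name

    def __str__(self):
        return f">{self.name}\n{self.sequence}"

seq = StrSequence("ATGCGTA", "SampleSeq")

as_text = str(seq)     # explicit conversion calls __str__
in_fstring = f"{seq}"  # f-string formatting ends up calling __str__ too

print(as_text == in_fstring)  # True
print(as_text.splitlines())   # ['>SampleSeq', 'ATGCGTA']
```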

### `rc()` custom method

Computes the reverse complement of the DNA sequence, then creates and returns a new `DNASequence` object that holds it. The original object is left unchanged.
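
Note that `rc()` looks up every base in a dictionary containing only A, C, G and T, so any other character (for example `N` for an unknown base) raises a `KeyError`. One possible alternative, sketched here as a standalone function rather than the notebook's method, uses `str.maketrans` and `str.translate`. The names `COMPLEMENT` and `reverse_complement` are hypothetical, and keeping `N` as `N` is an assumption about how unknown bases should be handled.

```python
# Translation table: each base maps to its complement; N (unknown) maps to itself
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(sequence):
    """Return the reverse complement of an uppercase DNA string."""
    # Complement every base, then reverse the result with a slice
    return sequence.translate(COMPLEMENT)[::-1]

print(reverse_complement("ATGCGTA"))  # TACGCAT
print(reverse_complement("ATGN"))     # NCAT
```

Characters missing from the table are passed through unchanged by `str.translate`, so this version never raises on unexpected input. Whether silent pass-through or an error is preferable depends on the data.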

In [None]:
class DNASequence:
    # Standard genetic code dictionary
    codon_table = {
        'ATA':'I', 'ATC':'I', 'ATT':'I', 'ATG':'M',
        'ACA':'T', 'ACC':'T', 'ACG':'T', 'ACT':'T',
        'AAC':'N', 'AAT':'N', 'AAA':'K', 'AAG':'K',
        'AGC':'S', 'AGT':'S', 'AGA':'R', 'AGG':'R',
        'CTA':'L', 'CTC':'L', 'CTG':'L', 'CTT':'L',
        'CCA':'P', 'CCC':'P', 'CCG':'P', 'CCT':'P',
        'CAC':'H', 'CAT':'H', 'CAA':'Q', 'CAG':'Q',
        'CGA':'R', 'CGC':'R', 'CGG':'R', 'CGT':'R',
        'GTA':'V', 'GTC':'V', 'GTG':'V', 'GTT':'V',
        'GCA':'A', 'GCC':'A', 'GCG':'A', 'GCT':'A',
        'GAC':'D', 'GAT':'D', 'GAA':'E', 'GAG':'E',
        'GGA':'G', 'GGC':'G', 'GGG':'G', 'GGT':'G',
        'TCA':'S', 'TCC':'S', 'TCG':'S', 'TCT':'S',
        'TTC':'F', 'TTT':'F', 'TTA':'L', 'TTG':'L',
        'TAC':'Y', 'TAT':'Y', 'TAA':'*', 'TAG':'*',
        'TGC':'C', 'TGT':'C', 'TGA':'*', 'TGG':'W',
    }

    def __init__(self, sequence, name, description=""):
        self.sequence = sequence.upper()
        self.name = name
        self.description = description

    def __len__(self):
        return len(self.sequence)

    def __str__(self):
        desc_str = f" {self.description}" if self.description else ""
        return f">{self.name}{desc_str}\n{self.sequence}"

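    def __eq__(self, other):
        # Two sequences are equal when their bases match (name and description are ignored)
        if not isinstance(other, DNASequence):
            return NotImplemented  # let Python handle comparisons with other types
        return self.sequence == other.sequence
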
    def rc(self):
        complement = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}
        rev_comp = ''.join(complement[base] for base in reversed(self.sequence))
        return DNASequence(rev_comp, self.name + "_rc", self.description)
    
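    def translate(self):
        # Translate starting from the first base; stop codons become '*'
        # and a trailing incomplete codon is ignored
        protein = []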
        # Process the DNA sequence in chunks of three (codons)
        for i in range(0, len(self.sequence), 3):
            codon = self.sequence[i:i+3]
            if len(codon) == 3:
                protein.append(self.codon_table.get(codon, 'X'))  # 'X' for unknown codons
        return ''.join(protein)
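
The extended class above adds `__eq__` and `translate()`. The sketch below shows how they behave, using a condensed stand-in class (`SequenceSketch`, a hypothetical name) so that the example runs on its own. As an assumption for brevity, the codon table is built from the compact string form of the standard genetic code (bases ordered T, C, A, G) instead of being written out as a dictionary. It is intended to hold the same 64 entries.

```python
from itertools import product

# Compact form of the standard genetic code: codons in TCAG order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

class SequenceSketch:
    # Condensed stand-in for the extended class, for illustration only
    def __init__(self, sequence, name):
        self.sequence = sequence.upper()
        self.name = name

    def __eq__(self, other):
        if not isinstance(other, SequenceSketch):
            return NotImplemented
        return self.sequence == other.sequence

    def translate(self):
        # Walk over complete codons only; unknown codons become 'X'
        starts = range(0, len(self.sequence) - 2, 3)
        return "".join(
            CODON_TABLE.get(self.sequence[i:i + 3], "X") for i in starts
        )

dna1 = SequenceSketch("ATGCGTA", "SampleSeq")
dna2 = SequenceSketch("atgcgta", "OtherName")

print(dna1 == dna2)      # True: only the sequence is compared
print(dna1.translate())  # MR (the trailing 'A' is ignored)
print(SequenceSketch("ATGAAATAG", "orf").translate())  # MK*
```

One design consequence: a class that defines `__eq__` without also defining `__hash__` becomes unhashable, so its instances cannot be used as dictionary keys or set members unless `__hash__` is added as well.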