# Session 11: Practical (Computational Biology)

## Introduction

In this practical, you will apply your knowledge of Python concepts including dictionaries, lists, arrays, conditionals, loops, functions, and variables to perform a computational analysis of protein data. This exercise will involve reading protein data, processing it, and extracting meaningful insights.

### Objectives

- Use dictionaries to store protein data
- Use lists and arrays to manage and process data
- Apply loops and conditionals to analyze the data
- Write functions to modularize the tasks
- Use variables to store intermediate results and perform calculations

**Estimated Time:** 1.5 hours

## Part 1: Reading and Storing Protein Data (30 minutes)

### Task 1.1: Create a Protein Data Dictionary

You will start by creating a dictionary to store protein data. Each protein will have an identifier, sequence, and length.

**Instructions:**

Create a dictionary where the keys are protein identifiers and the values are dictionaries containing the sequence and length of each protein.

Use the following sample data to populate your dictionary:

Protein Data:
- ID: P1, Sequence: MVLTIYPDELVQIVSDKK, Length: 18
- ID: P2, Sequence: QLVQIVSDLKLLV, Length: 13
- ID: P3, Sequence: MVLTIY, Length: 6
- ID: P4, Sequence: KLLVSDKKQIV, Length: 11

**Hints:**

- Use nested dictionaries to store sequence and length.
- Use the `len()` function to calculate the length of each sequence.

**Solution:**

```python
# Sample sequences keyed by protein identifier
sequences = {
    "P1": "MVLTIYPDELVQIVSDKK",
    "P2": "QLVQIVSDLKLLV",
    "P3": "MVLTIY",
    "P4": "KLLVSDKKQIV",
}

# Build the nested protein data dictionary, computing each length with len()
# so that the stored length always matches the sequence
proteins = {}
for protein_id, sequence in sequences.items():
    proteins[protein_id] = {"sequence": sequence, "length": len(sequence)}

# Printing the protein data dictionary
print(proteins)
```

```
{'P1': {'sequence': 'MVLTIYPDELVQIVSDKK', 'length': 18}, 'P2': {'sequence': 'QLVQIVSDLKLLV', 'length': 13}, 'P3': {'sequence': 'MVLTIY', 'length': 6}, 'P4': {'sequence': 'KLLVSDKKQIV', 'length': 11}}
```
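Lengths typed by hand can easily disagree with their sequences, so a quick consistency check over the nested dictionary is worth running. The following is a minimal sketch; the helper `find_length_mismatches` and the record `"X"` are hypothetical additions for illustration, not part of the task data.

```python
# Protein records in the nested format from Task 1.1.
# "X" is a hypothetical, deliberately inconsistent record used only for demonstration.
protein_records = {
    "P1": {"sequence": "MVLTIYPDELVQIVSDKK", "length": 18},
    "P3": {"sequence": "MVLTIY", "length": 6},
    "X": {"sequence": "MVL", "length": 4},
}


def find_length_mismatches(protein_data):
    """Return the IDs whose stored length differs from len(sequence)."""
    mismatches = []
    for protein_id, info in protein_data.items():
        # Nested lookup: info is the inner dictionary for one protein
        if info["length"] != len(info["sequence"]):
            mismatches.append(protein_id)
    return mismatches


print(find_length_mismatches(protein_records))  # → ['X']
```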


## Part 2: Analyzing Protein Sequences (30 minutes)

### Task 2.1: Counting Amino Acid Occurrences

Write a function `count_amino_acids` that takes a protein sequence as input and returns a dictionary with the count of each amino acid.

**Instructions:**

1. Define the function `count_amino_acids`.
2. Use a loop to iterate through the sequence and count each amino acid.
3. Store the counts in a dictionary where keys are amino acids and values are their counts.

**Hints:**

- Initialize an empty dictionary to store counts.
- Use if statements to check if an amino acid is already in the dictionary and update the count accordingly.

**Solution:**

```python
def count_amino_acids(sequence):
    counts = {}
    for amino_acid in sequence:
        if amino_acid in counts:
            counts[amino_acid] += 1
        else:
            counts[amino_acid] = 1
    return counts

# Test the function
sequence = "MVLTIYPDELVQIVSDKK"
print(count_amino_acids(sequence))
```

```
{'M': 1, 'V': 3, 'L': 2, 'T': 1, 'I': 2, 'Y': 1, 'P': 1, 'D': 2, 'E': 1, 'Q': 1, 'S': 1, 'K': 2}
```
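Python's standard library also provides `collections.Counter`, which performs the same tally as the loop above. This sketch repeats the loop-based function so the block runs on its own, then checks that both approaches agree; `most_common()` is a `Counter` method that returns the most frequent items.

```python
from collections import Counter


def count_amino_acids(sequence):
    # Same loop-and-if approach as in Task 2.1
    counts = {}
    for amino_acid in sequence:
        if amino_acid in counts:
            counts[amino_acid] += 1
        else:
            counts[amino_acid] = 1
    return counts


sequence = "MVLTIYPDELVQIVSDKK"
loop_counts = count_amino_acids(sequence)
counter_counts = Counter(sequence)  # Counter tallies each character for us

# Counter is a dict subclass, so it compares directly with a plain dict
print(loop_counts == counter_counts)  # → True
print(counter_counts.most_common(1))  # → [('V', 3)]
```

Writing the loop by hand is the point of this exercise; `Counter` is the shortcut you would typically reach for afterwards.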


### Task 2.2: Calculate Amino Acid Frequencies

Write a function `calculate_frequencies` that takes the amino acid counts and the length of the sequence and returns a dictionary with the frequency of each amino acid.

**Instructions:**

1. Define the function `calculate_frequencies`.
2. Use a loop to iterate through the counts dictionary.
3. Calculate the frequency by dividing the count by the total length.
4. Store the frequencies in a new dictionary.

**Hints:**

- Use a dictionary comprehension to create the frequencies dictionary.

**Solution:**

```python
def calculate_frequencies(counts, length):
    frequencies = {amino_acid: count / length for amino_acid, count in counts.items()}
    return frequencies

# Test the function
counts = count_amino_acids(sequence)
length = len(sequence)
print(calculate_frequencies(counts, length))
```

```
{'M': 0.05555555555555555, 'V': 0.16666666666666666, 'L': 0.1111111111111111, 'T': 0.05555555555555555, 'I': 0.1111111111111111, 'Y': 0.05555555555555555, 'P': 0.05555555555555555, 'D': 0.1111111111111111, 'E': 0.05555555555555555, 'Q': 0.05555555555555555, 'S': 0.05555555555555555, 'K': 0.1111111111111111}
```
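Two practical details are worth checking: the frequencies of one sequence should sum to 1, and an empty sequence (length 0) would make the original function raise `ZeroDivisionError`. This sketch uses a hypothetical variant, `safe_calculate_frequencies`, which assumes that returning an empty dictionary is acceptable for an empty sequence, and prints the results as percentages.

```python
import math
from collections import Counter


def safe_calculate_frequencies(counts, length):
    # Hypothetical variant: return {} instead of dividing by zero
    if length == 0:
        return {}
    return {amino_acid: count / length for amino_acid, count in counts.items()}


sequence = "MVLTIYPDELVQIVSDKK"
frequencies = safe_calculate_frequencies(Counter(sequence), len(sequence))

# Floating-point sums are rarely exact, so compare with math.isclose
print(math.isclose(sum(frequencies.values()), 1.0))  # → True

# Format each frequency as a percentage with one decimal place
for amino_acid, frequency in sorted(frequencies.items()):
    print(f"{amino_acid}: {frequency:.1%}")
```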


## Part 3: Comparing Protein Sequences (30 minutes)

### Task 3.1: Find Common Amino Acids

Write a function `find_common_amino_acids` that takes two protein sequences and returns a set of common amino acids.

**Instructions:**

1. Define the function `find_common_amino_acids`.
2. Convert each sequence to a set of amino acids.
3. Find the intersection of the two sets to get the common amino acids.

**Hints:**

- Use the `set()` function to convert sequences to sets.
- Use the `&` operator to find the intersection of two sets.

**Solution:**

```python
def find_common_amino_acids(seq1, seq2):
    set1 = set(seq1)
    set2 = set(seq2)
    common_amino_acids = set1 & set2
    return common_amino_acids

# Test the function
seq1 = "MVLTIYPDELVQIVSDKK"
seq2 = "QLVQIVSDLKLLV"
print(find_common_amino_acids(seq1, seq2))
```

```
{'L', 'S', 'D', 'K', 'Q', 'I', 'V'}
```
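The intersection operator `&` is one of several set operators. This sketch applies the others to the same two sequences; wrapping each result in `sorted()` gives a list with a reproducible order, since the print order of a set can vary between runs.

```python
seq1 = "MVLTIYPDELVQIVSDKK"
seq2 = "QLVQIVSDLKLLV"
set1, set2 = set(seq1), set(seq2)

common = set1 & set2        # in both sequences (as in Task 3.1)
only_in_seq1 = set1 - set2  # in seq1 but not in seq2
either = set1 | set2        # in at least one sequence
exactly_one = set1 ^ set2   # in exactly one of the two sequences

print(sorted(common))        # → ['D', 'I', 'K', 'L', 'Q', 'S', 'V']
print(sorted(only_in_seq1))  # → ['E', 'M', 'P', 'T', 'Y']
print(sorted(either))
print(sorted(exactly_one))   # → ['E', 'M', 'P', 'T', 'Y']
```

Here `exactly_one` equals `only_in_seq1` because every amino acid in `seq2` also occurs in `seq1`.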


### Task 3.2: Compare Protein Lengths

Write a function `compare_lengths` that takes a list of protein lengths and returns the longest and shortest lengths.

**Instructions:**

1. Define the function `compare_lengths`.
2. Use the `max()` and `min()` functions to find the longest and shortest lengths.
3. Return the longest and shortest lengths.

**Hints:**

- You can use the `max()` and `min()` functions directly on a list of numbers.

**Solution:**

```python
def compare_lengths(lengths):
    longest = max(lengths)
    shortest = min(lengths)
    return longest, shortest

# Test the function with the lengths of P1 to P4
lengths = [18, 13, 6, 11]
print(compare_lengths(lengths))
```

```
(18, 6)
```
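`compare_lengths` reports the lengths but not which proteins they belong to. Passing a `key` function to `max()` and `min()` compares items by length while returning the protein ID itself. A minimal sketch, assuming the sequences are stored in a plain ID-to-sequence dictionary:

```python
# Sequences from Part 1, keyed by protein ID
protein_sequences = {
    "P1": "MVLTIYPDELVQIVSDKK",
    "P2": "QLVQIVSDLKLLV",
    "P3": "MVLTIY",
    "P4": "KLLVSDKKQIV",
}

# Iterating a dict yields its keys; key= tells max/min to compare by length
longest_id = max(protein_sequences, key=lambda pid: len(protein_sequences[pid]))
shortest_id = min(protein_sequences, key=lambda pid: len(protein_sequences[pid]))

print(longest_id, len(protein_sequences[longest_id]))    # → P1 18
print(shortest_id, len(protein_sequences[shortest_id]))  # → P3 6
```

If two proteins tie, `max()` and `min()` return the first one encountered.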


## Conclusion

In this practical, you have applied various Python concepts to perform a computational analysis of protein data. You have learned to:

- Create and manipulate dictionaries to store protein data
- Count and calculate frequencies of amino acids in protein sequences
- Find common amino acids between sequences
- Compare protein lengths

### Additional Practice Problems

**Identify Unique Amino Acids:**

Write a function `identify_unique_amino_acids` that takes a list of protein sequences and returns a set of amino acids that appear in only one sequence.

**Solution:**

```python
def identify_unique_amino_acids(sequences):
    seen = set()      # amino acids found in at least one sequence so far
    repeated = set()  # amino acids found in two or more sequences
    
    for sequence in sequences:
        sequence_set = set(sequence)
        # An amino acid that was already seen is no longer unique
        repeated |= sequence_set & seen
        seen |= sequence_set
    
    # Unique amino acids occur in exactly one sequence
    return seen - repeated

# Example usage (sorted so the printed order is reproducible):
protein_sequences = ["MVLTIYPDELVQIVSDKK", "QLVQIVSDLKLLV", "MVLTIY", "KLLVSDKKQIV"]
print(sorted(identify_unique_amino_acids(protein_sequences)))
```

```
['E', 'P']
```
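An alternative approach counts, for each amino acid, how many sequences contain it, and keeps those with a count of exactly 1. This sketch uses `collections.Counter`; the function name `identify_unique_amino_acids_counter` is hypothetical, chosen to avoid clashing with the solution above.

```python
from collections import Counter


def identify_unique_amino_acids_counter(sequences):
    occurrences = Counter()
    for sequence in sequences:
        # set() drops repeats, so each sequence adds at most 1 per amino acid
        occurrences.update(set(sequence))
    return {amino_acid for amino_acid, n in occurrences.items() if n == 1}


protein_sequences = ["MVLTIYPDELVQIVSDKK", "QLVQIVSDLKLLV", "MVLTIY", "KLLVSDKKQIV"]
print(sorted(identify_unique_amino_acids_counter(protein_sequences)))  # → ['E', 'P']
```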


**Calculate Average Protein Length:**

Write a function `average_length` that takes a list of protein lengths and returns the average length.

**Solution:**

```python
def average_length(lengths):
    if not lengths:
        return 0
    return sum(lengths) / len(lengths)

# Example usage:
protein_lengths = [18, 13, 6, 11]
print(average_length(protein_lengths))
```

```
12.0
```
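The standard-library `statistics` module offers `mean()` and `median()` for the same kind of summary. This sketch compares them with `average_length`. One difference is that `statistics.mean()` raises `StatisticsError` on an empty list, whereas `average_length` returns 0.

```python
import statistics


def average_length(lengths):
    # Same function as above: returns 0 for an empty list
    if not lengths:
        return 0
    return sum(lengths) / len(lengths)


protein_lengths = [18, 13, 6, 11]
print(average_length(protein_lengths) == statistics.mean(protein_lengths))  # → True
print(statistics.median(protein_lengths))  # → 12.0

try:
    statistics.mean([])
except statistics.StatisticsError:
    print("statistics.mean raises StatisticsError on an empty list")
```

Returning 0 hides the fact that there was no data, while raising an error forces the caller to handle that case; which is preferable depends on how the result is used.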


**Find Proteins with Specific Amino Acid:**

Write a function `find_proteins_with_amino_acid` that takes a dictionary mapping protein IDs to sequences and an amino acid, and returns a list of the IDs of proteins that contain the specified amino acid.

**Solution:**

```python
def find_proteins_with_amino_acid(sequences, amino_acid):
    proteins_with_amino_acid = []
    
    for protein_id, sequence in sequences.items():
        if amino_acid in sequence:
            proteins_with_amino_acid.append(protein_id)
    
    return proteins_with_amino_acid

# Example usage:
protein_sequences = {
    "P1": "MVLTIYPDELVQIVSDKK",
    "P2": "QLVQIVSDLKLLV",
    "P3": "MVLTIY",
    "P4": "KLLVSDKKQIV"
}
amino_acid = "K"
print(find_proteins_with_amino_acid(protein_sequences, amino_acid))
```

```
['P1', 'P2', 'P4']
```
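The loop-and-append pattern above can also be written as a list comprehension, which builds the list in a single expression with the same result. A minimal sketch using the same sample dictionary:

```python
def find_proteins_with_amino_acid(sequences, amino_acid):
    # Keep each protein ID whose sequence contains the amino acid
    return [protein_id for protein_id, sequence in sequences.items()
            if amino_acid in sequence]


protein_sequences = {
    "P1": "MVLTIYPDELVQIVSDKK",
    "P2": "QLVQIVSDLKLLV",
    "P3": "MVLTIY",
    "P4": "KLLVSDKKQIV",
}
print(find_proteins_with_amino_acid(protein_sequences, "K"))  # → ['P1', 'P2', 'P4']
print(find_proteins_with_amino_acid(protein_sequences, "E"))  # → ['P1']
```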
