# Practical Exercise: DNA Sequence Analysis

### Scenario
You are a bioinformatician analyzing DNA sequences. In this exercise you write a Python program that computes basic sequence properties (GC content, AT content, and length statistics) and evaluates a simple decay model, using variables, functions, and NumPy.

### Objectives
- Store DNA sequences in variables
- Write functions to compute the GC and AT content of a DNA sequence
- Use NumPy to perform statistical analysis on sequence lengths
- Evaluate an exponential decay model of DNA concentration over time

---

## Step 1: Define Variables

**Task**: Store multiple DNA sequences and their names using variables.


In [2]:
# Step 1: Define Variables

# DNA sequences
sequence_1 = "ATGCGTAACGTA"
sequence_2 = "CGTACGTAGCTA"
sequence_3 = "TACGATCGTACG"
sequence_4 = "GCTAGCTAGCTA"
sequence_5 = "ATCGTACGATCG"

# Sequence names
name_1 = "Seq1"
name_2 = "Seq2"
name_3 = "Seq3"
name_4 = "Seq4"
name_5 = "Seq5"

# Print the sequences and their names to verify
print(f"{name_1}: {sequence_1}")
print(f"{name_2}: {sequence_2}")
print(f"{name_3}: {sequence_3}")
print(f"{name_4}: {sequence_4}")
print(f"{name_5}: {sequence_5}")


Seq1: ATGCGTAACGTA
Seq2: CGTACGTAGCTA
Seq3: TACGATCGTACG
Seq4: GCTAGCTAGCTA
Seq5: ATCGTACGATCG
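
Numbered variables work for five sequences but become tedious as the collection grows. A minimal sketch of one alternative, assuming a dictionary that maps each name to its sequence (the `sequences` name is hypothetical and not part of the exercise):

```python
# Hypothetical alternative to sequence_1..sequence_5 and name_1..name_5:
# a single dictionary mapping each name to its sequence.
sequences = {
    "Seq1": "ATGCGTAACGTA",
    "Seq2": "CGTACGTAGCTA",
    "Seq3": "TACGATCGTACG",
    "Seq4": "GCTAGCTAGCTA",
    "Seq5": "ATCGTACGATCG",
}

# Iterate over name/sequence pairs instead of repeating print statements.
for name, seq in sequences.items():
    print(f"{name}: {seq}")
```

The loop prints the same five lines as the cell above, and adding a sixth sequence only requires one more dictionary entry.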


---

## Step 2: Write Functions

### Task 1: Calculate GC Content

Write a function to calculate the GC content of a DNA sequence. The GC content is the percentage of bases that are either G or C.


In [3]:
# Step 2: Write Functions

def gc_content(sequence):
    """
    Calculate the GC content of a DNA sequence.

    Parameters:
    sequence (str): The DNA sequence.

    Returns:
    float: The GC content as a percentage.
    """
    # Count the occurrences of G and C
    g_count = sequence.count('G')
    c_count = sequence.count('C')
    gc_count = g_count + c_count
    
    # Calculate the percentage
    gc_percentage = (gc_count / len(sequence)) * 100
    return gc_percentage

# Test the gc_content function on each sequence individually
gc_content_1 = gc_content(sequence_1)
gc_content_2 = gc_content(sequence_2)
gc_content_3 = gc_content(sequence_3)
gc_content_4 = gc_content(sequence_4)
gc_content_5 = gc_content(sequence_5)

print(f"GC Content of {name_1}: {gc_content_1:.2f}%")
print(f"GC Content of {name_2}: {gc_content_2:.2f}%")
print(f"GC Content of {name_3}: {gc_content_3:.2f}%")
print(f"GC Content of {name_4}: {gc_content_4:.2f}%")
print(f"GC Content of {name_5}: {gc_content_5:.2f}%")


GC Content of Seq1: 41.67%
GC Content of Seq2: 50.00%
GC Content of Seq3: 50.00%
GC Content of Seq4: 50.00%
GC Content of Seq5: 50.00%
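
The function above assumes an uppercase, non-empty sequence: lowercase bases are not counted, and an empty string raises `ZeroDivisionError`. A possible variant that handles both cases is sketched below; the name `gc_content_safe` and the choice to raise `ValueError` are assumptions made for illustration.

```python
def gc_content_safe(sequence):
    """Return GC content as a percentage, tolerating lowercase input.

    Hypothetical variant of gc_content: raises ValueError on an empty
    sequence instead of dividing by zero.
    """
    sequence = sequence.upper()  # count 'g' and 'c' as well as 'G' and 'C'
    if not sequence:
        raise ValueError("sequence must not be empty")
    gc_count = sequence.count("G") + sequence.count("C")
    return gc_count / len(sequence) * 100


# Same bases as Seq1, written in lowercase
print(f"{gc_content_safe('atgcgtaacgta'):.2f}%")
```

For the lowercase copy of Seq1 this gives the same value as the original function gives for the uppercase sequence.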


### Task 2: Calculate AT Content

Write a function to calculate the AT content of a DNA sequence. The AT content is the percentage of bases that are either A or T.


In [4]:
def at_content(sequence):
    """
    Calculate the AT content of a DNA sequence.

    Parameters:
    sequence (str): The DNA sequence.

    Returns:
    float: The AT content as a percentage.
    """
    # Count the occurrences of A and T
    a_count = sequence.count('A')
    t_count = sequence.count('T')
    at_count = a_count + t_count
    
    # Calculate the percentage
    at_percentage = (at_count / len(sequence)) * 100
    return at_percentage

# Test the at_content function on each sequence individually
at_content_1 = at_content(sequence_1)
at_content_2 = at_content(sequence_2)
at_content_3 = at_content(sequence_3)
at_content_4 = at_content(sequence_4)
at_content_5 = at_content(sequence_5)

print(f"AT Content of {name_1}: {at_content_1:.2f}%")
print(f"AT Content of {name_2}: {at_content_2:.2f}%")
print(f"AT Content of {name_3}: {at_content_3:.2f}%")
print(f"AT Content of {name_4}: {at_content_4:.2f}%")
print(f"AT Content of {name_5}: {at_content_5:.2f}%")


AT Content of Seq1: 58.33%
AT Content of Seq2: 50.00%
AT Content of Seq3: 50.00%
AT Content of Seq4: 50.00%
AT Content of Seq5: 50.00%
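
`gc_content` and `at_content` differ only in which bases they count. As a sketch, both can be expressed through one helper; `base_content` is a hypothetical name, not part of the exercise. For a sequence made up only of A, C, G, and T, the two percentages sum to 100, which gives a quick sanity check. That check would not hold if the sequence contained other symbols such as `N`.

```python
def base_content(sequence, bases):
    """Percentage of positions in `sequence` whose base is in `bases`.

    Hypothetical generalization of gc_content and at_content.
    """
    matches = sum(1 for base in sequence if base in bases)
    return matches / len(sequence) * 100


seq = "ATGCGTAACGTA"  # Seq1 from Step 1
gc = base_content(seq, "GC")
at = base_content(seq, "AT")
print(f"GC: {gc:.2f}%  AT: {at:.2f}%  Sum: {gc + at:.2f}%")
```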


---

## Step 3: Statistical Analysis with NumPy

**Task**: Calculate Mean and Standard Deviation

Use NumPy to calculate the mean and standard deviation of the lengths of the DNA sequences. Note that `np.std` computes the population standard deviation (`ddof=0`) by default.


In [5]:
# Step 3: Statistical Analysis with NumPy

import numpy as np

# Calculate lengths of sequences
length_1 = len(sequence_1)
length_2 = len(sequence_2)
length_3 = len(sequence_3)
length_4 = len(sequence_4)
length_5 = len(sequence_5)

# Convert lengths to NumPy array
lengths_array = np.array([length_1, length_2, length_3, length_4, length_5])

# Calculate mean and standard deviation
mean_length = np.mean(lengths_array)
std_dev_length = np.std(lengths_array)

print(f"Mean Length: {mean_length}")
print(f"Standard Deviation of Lengths: {std_dev_length}")


Mean Length: 12.0
Standard Deviation of Lengths: 0.0
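
All five sequences are 12 bases long, so the standard deviation above is 0. The sketch below uses hypothetical lengths, chosen only to produce a nonzero spread, and contrasts the population standard deviation with the sample standard deviation (`ddof=1`).

```python
import numpy as np

# Hypothetical lengths, chosen only to give a nonzero spread.
lengths = np.array([10, 12, 14, 16, 18])

mean_length = np.mean(lengths)
population_std = np.std(lengths)      # ddof=0 (NumPy default): divide by N
sample_std = np.std(lengths, ddof=1)  # ddof=1: divide by N - 1

print(f"Mean: {mean_length:.2f}")
print(f"Population std: {population_std:.2f}")
print(f"Sample std: {sample_std:.2f}")
```

The sample version is the usual choice when the sequences are a sample drawn from a larger collection rather than the whole collection.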


---

## Step 4: Solve a Biological Equation

**Task**: Calculate DNA Concentration Over Time

In a DNA sample, the concentration of a DNA fragment that degrades over time can be modeled as exponential decay:
`C(t) = C_0 * e^(-kt)`
where `C(t)` is the concentration at time `t`, `C_0` is the initial concentration, and `k` is the decay constant, expressed in inverse time units so that `kt` is dimensionless.


In [6]:
# Step 4: Solve a Biological Equation

import numpy as np

def concentration(C0, k, t):
    """
    Calculate the concentration of a DNA fragment at time t.

    Parameters:
    C0 (float): Initial concentration.
    k (float): Decay constant.
    t (float or np.array): Time.

    Returns:
    float or np.array: Concentration at time t.
    """
    return C0 * np.exp(-k * t)

# Define parameters
C0 = 100  # Initial concentration
k = 0.1   # Decay constant
time_points = np.array([0, 1, 2, 3, 4, 5])  # Time points

# Calculate concentrations at each time point
concentrations = concentration(C0, k, time_points)

# Print results
for t, c in zip(time_points, concentrations):
    print(f"At time {t}: Concentration = {c:.2f}")


At time 0: Concentration = 100.00
At time 1: Concentration = 90.48
At time 2: Concentration = 81.87
At time 3: Concentration = 74.08
At time 4: Concentration = 67.03
At time 5: Concentration = 60.65
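
The same model can be inverted to ask when a given concentration is reached. Setting `C(t) = C_0 / 2` and solving for `t` gives the half-life `ln(2) / k`; more generally, `t = ln(C_0 / C) / k`. A minimal sketch, where `half_life` and `time_to_reach` are hypothetical helper names:

```python
import numpy as np


def half_life(k):
    """Time for C(t) = C0 * exp(-k * t) to fall to C0 / 2."""
    return np.log(2) / k


def time_to_reach(C0, k, target):
    """Time at which C(t) equals `target` (requires 0 < target <= C0)."""
    return np.log(C0 / target) / k


k = 0.1  # decay constant used in the cell above
print(f"Half-life: {half_life(k):.2f}")
print(f"Time to reach 25 from 100: {time_to_reach(100, k, 25):.2f}")
```

Falling from 100 to 25 means halving twice, so the second value is twice the half-life.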


---

## Conclusion
By completing this exercise, you have:
- Stored and manipulated DNA sequences using variables
- Written functions to calculate GC content and AT content of DNA sequences
- Used NumPy to compute length statistics and to evaluate an exponential decay model
