## QTM 350: Data Science Computing

### Assignment 06 - AI-Assisted Programming

### Due 05 March 2025

### Instructions

In this assignment, you will use AI tools to help you generate, refactor, and explain code. Ideally, use GitHub Copilot, but if you cannot install the software, feel free to use other available tools (free or otherwise). Please name the tools you use in your assignment.

The main idea is to use natural language as much as possible, whilst remaining attentive to any mistakes the AI tool may produce. Your code should run without issues and provide the correct results. Please feel free to include tests and screenshots.

As always, should you have any questions, please let me know.

### Tasks

1. Use the `/explain` command in Copilot to get an explanation of the following code snippet:

```python
def fibonacci(n):
    if n <= 0:
        return []
    elif n == 1:
        return [0]
    elif n == 2:
        return [0, 1]
    else:
        fib = [0, 1]
        for i in range(2, n):
            fib.append(fib[i-1] + fib[i-2])
        return fib
```

2. Create a new Git repository using GitHub CLI and Copilot suggestions. The suggestions should include code to initialise the repository, add a `README.md` and a `.gitignore` file for Python projects, then add, commit, and push the changes to the repository. Please include the link to your repository below.

3. Use Copilot to refactor the following code to improve its efficiency and readability:

```python
def is_prime(num):
    if num <= 1:
        return False
    for i in range(2, num):
        if num % i == 0:
            return False
    return True

primes = []
for i in range(1, 101):
    if is_prime(i):
        primes.append(i)

print(primes)
```

4. Use the `@terminal` command to generate CLI commands that create a new directory named `data_analysis`, navigate into the directory, create a new Python file called `analysis.py`, and add a shebang line (`#!/usr/bin/env python`) at the top of the file.

5. Write a Python function that calculates the factorial of a number. Deliberately introduce an error in the function, then use Copilot's `/fix` command to identify and correct the issue.

6. Use Copilot to generate documentation comments for the following `R` factorial function:

```r
calculate_factorial <- function(n) {
  if (n == 0) {
    return(1)
  } else {
    return(n * calculate_factorial(n - 1))
  }
}
```

7. Use the `/explain` command to get an explanation for the following Python code snippet:

```python
def normalize_data(data):
    min_val = min(data)
    max_val = max(data)
    return [(x - min_val) / (max_val - min_val) for x in data]
```

8. Ask Copilot to translate the following `R` code to `Python`:

```r
library(tidyverse)

# Create a sample dataframe with missing values
df <- tibble(
  A = c(1, 2, NA, 4, 5),
  B = c(NA, 2, 3, 4, 5),
  C = c(1, 2, 3, NA, 5),
  D = c(1, 2, 3, 4, 5)
)

# Define the function
count_missing_values <- function(df) {
  sapply(df, function(col) sum(is.na(col)))
}

# Test the function
result <- count_missing_values(df)
print(result)
```

9. Write a function in both `R` and `Python` that takes a string and returns the number of vowels in the string.

10. Ask Copilot to translate the following binary code into text:

```verbatim
01011001 01101111 01110101 00100111 01110110 01100101 00100000 01100011 01101111 01101101 01110000 01101100 01100101 01110100 01100101 01100100 00100000 01110100 01101000 01100101 00100000 01100001 01110011 01110011 01101001 01100111 01101110 01101101 01100101 01101110 01110100 00101110 00100000 01000011 01101111 01101110 01100111 01110010 01100001 01110100 01110101 01101100 01100001 01110100 01101001 01101111 01101110 01110011 00100001
```



### Assignment 06 was completed using GitHub Copilot.

1. Use the `/explain` command in Copilot to get an explanation of the following code snippet:

```python
def fibonacci(n):
    if n <= 0:
        return []
    elif n == 1:
        return [0]
    elif n == 2:
        return [0, 1]
    else:
        fib = [0, 1]
        for i in range(2, n):
            fib.append(fib[i-1] + fib[i-2])
        return fib
```

yejinannachoi: /explain 

GitHub Copilot: The provided code defines a function named `fibonacci` that generates a list containing the first `n` numbers in the Fibonacci sequence. The Fibonacci sequence is a series of numbers where each number is the sum of the two preceding ones, typically starting with 0 and 1.

The function begins by checking if the input `n` is less than or equal to 0. If so, it returns an empty list, as there are no Fibonacci numbers to generate. If `n` is exactly 1, the function returns a list containing only the first Fibonacci number, which is `[0]`. If `n` is 2, it returns the first two Fibonacci numbers, `[0, 1]`.

For values of `n` greater than 2, the function initializes a list `fib` with the first two Fibonacci numbers, `[0, 1]`. It then uses a `for` loop to iterate from 2 up to `n-1`. In each iteration, it calculates the next Fibonacci number by summing the last two numbers in the list (`fib[i-1]` and `fib[i-2]`) and appends this new number to the list `fib`. After the loop completes, the function returns the list `fib`, which now contains the first `n` Fibonacci numbers.

This implementation efficiently generates the Fibonacci sequence using an iterative approach, ensuring that the function handles various edge cases (such as `n` being 0, 1, or 2) correctly.
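
As a quick check of the explanation above, the following sketch re-declares the function unchanged and exercises the edge cases Copilot describes. The test inputs are illustrative choices and are not part of the original assignment.

```python
def fibonacci(n):
    """Return the first n Fibonacci numbers as a list (same logic as the snippet above)."""
    if n <= 0:
        return []
    elif n == 1:
        return [0]
    elif n == 2:
        return [0, 1]
    else:
        fib = [0, 1]
        for i in range(2, n):
            fib.append(fib[i-1] + fib[i-2])
        return fib

# Edge cases named in the explanation
print(fibonacci(0))   # []
print(fibonacci(1))   # [0]
print(fibonacci(2))   # [0, 1]

# General case: the loop runs for i = 2 .. n-1
print(fibonacci(10))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```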

2. Create a new Git repository using GitHub CLI and Copilot suggestions. The suggestions should include code to initialise the repository, add a `README.md` and a `.gitignore` file for Python projects, then add, commit, and push the changes to the repository. Please include the link to your repository below.

Repository: https://github.com/yejinannachoi/06-assignment

3. Use Copilot to refactor the following code to improve its efficiency and readability:

```python
def is_prime(num):
    if num <= 1:
        return False
    for i in range(2, num):
        if num % i == 0:
            return False
    return True

primes = []
for i in range(1, 101):
    if is_prime(i):
        primes.append(i)

print(primes)
```

The `is_prime` function, refactored using Copilot:

In [None]:
def is_prime(num):
    if num <= 1:
        return False
    if num == 2:
        return True
    if num % 2 == 0:
        return False
    for i in range(3, int(num**0.5) + 1, 2):
        if num % i == 0:
            return False
    return True

primes = [i for i in range(1, 101) if is_prime(i)]

print(primes)

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
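
One way to gain confidence that a refactor like this preserves behaviour is to compare it against the original on a range of inputs. The sketch below does that; the names `is_prime_naive` and `is_prime_fast` are hypothetical labels introduced here, and `math.isqrt` (Python 3.8+) is swapped in for `int(num**0.5)` as an assumption, since it avoids floating-point rounding on large inputs.

```python
import math

def is_prime_naive(num):
    """Original version: trial division by every integer in [2, num)."""
    if num <= 1:
        return False
    for i in range(2, num):
        if num % i == 0:
            return False
    return True

def is_prime_fast(num):
    """Refactored version: skip even numbers and stop at the square root."""
    if num <= 1:
        return False
    if num == 2:
        return True
    if num % 2 == 0:
        return False
    # Only odd divisors up to and including the integer square root are needed
    for i in range(3, math.isqrt(num) + 1, 2):
        if num % i == 0:
            return False
    return True

# Both versions should agree on every input in the tested range
mismatches = [n for n in range(-5, 1000) if is_prime_naive(n) != is_prime_fast(n)]
print(mismatches)  # []
```

The refactored loop performs roughly `sqrt(num) / 2` divisions instead of `num - 2`, which is where the efficiency gain comes from.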


4. Use the `@terminal` command to generate CLI commands that create a new directory named `data_analysis`, navigate into the directory, create a new Python file called `analysis.py`, and add a shebang line (`#!/usr/bin/env python`) at the top of the file.

In [5]:
%%bash
cd ~/Desktop/"qtm 350"
mkdir data_analysis && cd data_analysis && touch analysis.py && echo '#!/usr/bin/env python3' > analysis.py

5. Write a Python function that calculates the factorial of a number. Deliberately introduce an error in the function, then use Copilot's `/fix` command to identify and correct the issue.

Factorial function with a deliberate error:

In [None]:
def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n - 1)

# Deliberate error: incorrect base case
def factorial_with_error(n):
    if n == 1:  # This should be `if n == 0`
        return 1
    else:
        return n * factorial_with_error(n - 1)

Factorial function with the error fixed:

In [None]:
def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n - 1)

# Corrected function: fixed base case
def factorial_with_error(n):
    if n == 0:  # Corrected base case
        return 1
    else:
        return n * factorial_with_error(n - 1)
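
To see why the `n == 1` base case counts as a bug, it helps to call both versions with `0`. The sketch below is illustrative only; `factorial_buggy` and `factorial_fixed` are hypothetical names used here to keep the two versions apart.

```python
def factorial_buggy(n):
    """Base case n == 1 is never reached when starting from n == 0."""
    if n == 1:
        return 1
    return n * factorial_buggy(n - 1)

def factorial_fixed(n):
    """Base case n == 0 covers every non-negative integer."""
    if n == 0:
        return 1
    return n * factorial_fixed(n - 1)

print(factorial_buggy(5))  # 120, so the bug is invisible for n >= 1
print(factorial_fixed(5))  # 120
print(factorial_fixed(0))  # 1

try:
    factorial_buggy(0)
except RecursionError:
    # 0, -1, -2, ... never equals 1, so the recursion runs until Python's limit
    buggy_zero_failed = True
else:
    buggy_zero_failed = False
print(buggy_zero_failed)  # True
```

Note that the fixed version still recurses without end for negative inputs; guarding against those would be a separate change.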

6. Use Copilot to generate documentation comments for the following `R` factorial function:

```r
calculate_factorial <- function(n) {
  if (n == 0) {
    return(1)
  } else {
    return(n * calculate_factorial(n - 1))
  }
}
```

In [None]:
#' Calculate the factorial of a number
#'
#' This function calculates the factorial of a given non-negative integer `n`.
#' The factorial of `n` (denoted as `n!`) is the product of all positive integers
#' less than or equal to `n`.
#'
#' @param n A non-negative integer whose factorial is to be calculated.
#' @return The factorial of the input number `n`.
#' @examples
#' calculate_factorial(5) # returns 120
#' calculate_factorial(0) # returns 1
calculate_factorial <- function(n) {
  if (n == 0) {
    return(1)
  } else {
    return(n * calculate_factorial(n - 1))
  }
}

7. Use the `/explain` command to get an explanation for the following Python code snippet:

```python
def normalize_data(data):
    min_val = min(data)
    max_val = max(data)
    return [(x - min_val) / (max_val - min_val) for x in data]
```

yejinannachoi: /explain 

GitHub Copilot: The provided code defines a function named `normalize_data` in Python. This function takes a single argument, `data`, which is expected to be a list of numerical values. The purpose of this function is to normalize the data, which means adjusting the values in the list so that they fall within a specific range, typically between 0 and 1.

The function begins by determining the minimum and maximum values in the `data` list using the built-in `min()` and `max()` functions, respectively. These values are stored in the variables `min_val` and `max_val`. The normalization process involves transforming each value `x` in the list such that it is scaled according to the formula `(x - min_val) / (max_val - min_val)`. This formula shifts the data so that the minimum value becomes 0 and the maximum value becomes 1, with all other values proportionally adjusted between these two extremes.

The function then uses a list comprehension to apply this normalization formula to each element in the `data` list. The list comprehension iterates over each element `x` in `data`, applies the normalization formula, and constructs a new list with the normalized values. Finally, the function returns this new list of normalized values.

This normalization technique is useful in various data processing and machine learning tasks where it is important to have data within a consistent range, as it can improve the performance and convergence of certain algorithms.
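
A small worked example may make the formula concrete. The sketch below re-declares the function unchanged and applies it to an illustrative list chosen here. It also shows one edge case the explanation does not mention: when every value is the same, `max_val - min_val` is 0 and the division fails.

```python
def normalize_data(data):
    min_val = min(data)
    max_val = max(data)
    return [(x - min_val) / (max_val - min_val) for x in data]

# min is 10 and max is 50, so each value becomes (x - 10) / 40
print(normalize_data([10, 20, 30, 40, 50]))  # [0.0, 0.25, 0.5, 0.75, 1.0]

# Constant input: the denominator is 0, so Python raises ZeroDivisionError
try:
    normalize_data([7, 7, 7])
except ZeroDivisionError:
    constant_input_failed = True
    print("constant input: max_val - min_val is 0")
```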

8. Ask Copilot to translate the following `R` code to `Python`:

```r
library(tidyverse)

# Create a sample dataframe with missing values
df <- tibble(
  A = c(1, 2, NA, 4, 5),
  B = c(NA, 2, 3, 4, 5),
  C = c(1, 2, 3, NA, 5),
  D = c(1, 2, 3, 4, 5)
)

# Define the function
count_missing_values <- function(df) {
  sapply(df, function(col) sum(is.na(col)))
}

# Test the function
result <- count_missing_values(df)
print(result)
```

In [8]:
import pandas as pd
import numpy as np

# Create a sample dataframe with missing values
df = pd.DataFrame({
    'A': [1, 2, np.nan, 4, 5],
    'B': [np.nan, 2, 3, 4, 5],
    'C': [1, 2, 3, np.nan, 5],
    'D': [1, 2, 3, 4, 5]
})

# Define the function
def count_missing_values(df):
    return df.isna().sum()

# Test the function
result = count_missing_values(df)
print(result)

A    1
B    1
C    1
D    0
dtype: int64


9. Write a function in both `R` and `Python` that takes a string and returns the number of vowels in the string.

R

In [None]:
count_vowels <- function(input_string) {
    # Define the vowels
    vowels <- c('a', 'e', 'i', 'o', 'u', 'A', 'E', 'I', 'O', 'U')
    
    # Split the string into individual characters
    chars <- strsplit(input_string, NULL)[[1]]
    
    # Count the number of vowels
    num_vowels <- sum(chars %in% vowels)
    
    return(num_vowels)
}

# Test the function
test_string <- "Hello World"
print(count_vowels(test_string))  # Output should be 3

Python

In [None]:
def count_vowels(input_string):
    # Define the vowels
    vowels = 'aeiouAEIOU'
    
    # Count the number of vowels
    num_vowels = sum(1 for char in input_string if char in vowels)
    
    return num_vowels

# Test the function
test_string = "Hello World"
print(count_vowels(test_string))  # Output should be 3

3


10. Ask Copilot to translate the following binary code into text:

```verbatim
01011001 01101111 01110101 00100111 01110110 01100101 00100000 01100011 01101111 01101101 01110000 01101100 01100101 01110100 01100101 01100100 00100000 01110100 01101000 01100101 00100000 01100001 01110011 01110011 01101001 01100111 01101110 01101101 01100101 01101110 01110100 00101110 00100000 01000011 01101111 01101110 01100111 01110010 01100001 01110100 01110101 01101100 01100001 01110100 01101001 01101111 01101110 01110011 00100001
```

In [10]:
binary_code = "01011001 01101111 01110101 00100111 01110110 01100101 00100000 01100011 01101111 01101101 01110000 01101100 01100101 01110100 01100101 01100100 00100000 01110100 01101000 01100101 00100000 01100001 01110011 01110011 01101001 01100111 01101110 01101101 01100101 01101110 01110100 00101110 00100000 01000011 01101111 01101110 01100111 01110010 01100001 01110100 01110101 01101100 01100001 01110100 01101001 01101111 01101110 01110011 00100001"

# Convert binary to text
text = ''.join([chr(int(b, 2)) for b in binary_code.split()])
print(text)

You've completed the assignment. Congratulations!
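
The decoding step can be sanity-checked by going in the opposite direction. The sketch below is a minimal round trip; `text_to_binary` and `binary_to_text` are hypothetical helper names introduced here, and the encoder assumes every character has a code point below 256 so that it fits in 8 bits.

```python
def text_to_binary(text):
    """Encode each character as an 8-bit binary string (assumes code points < 256)."""
    return ' '.join(format(ord(ch), '08b') for ch in text)

def binary_to_text(binary_code):
    """Decode space-separated binary strings back into characters."""
    return ''.join(chr(int(b, 2)) for b in binary_code.split())

message = "You've completed"
encoded = text_to_binary(message)

print(encoded.split()[0])                  # 01011001, the byte for 'Y'
print(binary_to_text(encoded) == message)  # True
```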
