## QTM 350: Data Science Computing

### Assignment 06 - AI-Assisted Programming

### Due 15 October 2025

### Instructions

In this assignment, you will use AI tools to help you generate, refactor, and explain code. Ideally, use GitHub Copilot, but if you cannot install the software, feel free to use other available tools (free or otherwise). Please name the tools you use in your assignment.

The main idea is to use natural language as much as possible, whilst remaining attentive to any mistakes the AI tool may produce. Your code should run without issues and provide the correct results. Please feel free to include tests and screenshots.

As always, should you have any questions, please let me know.

### Tasks

1. Use the `/explain` command in Copilot to get an explanation of the following code snippet:

```python
def fibonacci(n):
    if n <= 0:
        return []
    elif n == 1:
        return [0]
    elif n == 2:
        return [0, 1]
    else:
        fib = [0, 1]
        for i in range(2, n):
            fib.append(fib[i-1] + fib[i-2])
        return fib
```

2. Create a new Git repository using GitHub CLI and Copilot suggestions. The suggestions should include code to initialise the repository, add a `README.md` and a `.gitignore` file for Python projects, then add, commit, and push the changes to the repository. Please include the link to your repository below.

3. Use Copilot to refactor the following code to improve its efficiency and readability:

```python
def is_prime(num):
    if num <= 1:
        return False
    for i in range(2, num):
        if num % i == 0:
            return False
    return True

primes = []
for i in range(1, 101):
    if is_prime(i):
        primes.append(i)

print(primes)
```

4. Use the `@terminal` command to generate CLI commands that create a new directory named `data_analysis`, navigate into the directory, create a new Python file called `analysis.py`, and add a shebang line (`#!/usr/bin/env python`) at the top of the file.

5. Write a Python function that calculates the factorial of a number. Deliberately introduce an error in the function, then use Copilot's `/fix` command to identify and correct the issue.

6. Use Copilot to generate documentation comments for the following `R` factorial function:

```r
calculate_factorial <- function(n) {
  if (n == 0) {
    return(1)
  } else {
    return(n * calculate_factorial(n - 1))
  }
}
```

7. Use the `/explain` command to get an explanation for the following Python code snippet:

```python
def normalize_data(data):
    min_val = min(data)
    max_val = max(data)
    return [(x - min_val) / (max_val - min_val) for x in data]
```

8. Ask Copilot to translate the following `R` code to `Python`:

```r
library(tidyverse)

# Create a sample dataframe with missing values
df <- tibble(
  A = c(1, 2, NA, 4, 5),
  B = c(NA, 2, 3, 4, 5),
  C = c(1, 2, 3, NA, 5),
  D = c(1, 2, 3, 4, 5)
)

# Define the function
count_missing_values <- function(df) {
  sapply(df, function(col) sum(is.na(col)))
}

# Test the function
result <- count_missing_values(df)
print(result)
```

9. Write a function in both `R` and `Python` that takes a string and returns the number of vowels in the string.

10. Ask Copilot to translate the following binary code into text:

```verbatim
01011001 01101111 01110101 00100111 01110110 01100101 00100000 01100011 01101111 01101101 01110000 01101100 01100101 01110100 01100101 01100100 00100000 01110100 01101000 01100101 00100000 01100001 01110011 01110011 01101001 01100111 01101110 01101101 01100101 01101110 01110100 00101110 00100000 01000011 01101111 01101110 01100111 01110010 01100001 01110100 01110101 01101100 01100001 01110100 01101001 01101111 01101110 01110011 00100001
```



### Answers

1. Using `/explain`, GitHub Copilot returned the following explanation:

"The provided code defines a Python function named fibonacci that generates a list containing the first n numbers in the Fibonacci sequence. The Fibonacci sequence is a series of numbers where each number is the sum of the two preceding ones, starting with 0 and 1.

The function begins by handling edge cases. If the input n is less than or equal to 0, the function returns an empty list, as a non-positive number of terms in the sequence is invalid. If n equals 1, the function returns a list containing only the first Fibonacci number, [0]. Similarly, if n equals 2, it returns the first two Fibonacci numbers, [0, 1].

For cases where n is greater than 2, the function initializes a list fib with the first two Fibonacci numbers, [0, 1]. It then uses a for loop to calculate the remaining numbers in the sequence. The loop starts at index 2 and iterates up to n - 1. During each iteration, the next Fibonacci number is calculated as the sum of the two preceding numbers in the list (fib[i-1] + fib[i-2]) and appended to the fib list.

Finally, the function returns the complete list of Fibonacci numbers. This implementation is efficient for generating the sequence up to a given number of terms, and its structure ensures clarity and correctness."
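The edge cases described in the explanation can be checked directly. The sketch below runs the original function on a few inputs and compares it with `fibonacci_compact`, a hypothetical variant that is not part of the assignment. The variant drops the special cases for 1 and 2 by building from `[0, 1]` and slicing to `n` terms; this works because the general loop already returns `[0, 1]` when `n == 2`.

```python
def fibonacci(n):
    """Original implementation from the task."""
    if n <= 0:
        return []
    elif n == 1:
        return [0]
    elif n == 2:
        return [0, 1]
    else:
        fib = [0, 1]
        for i in range(2, n):
            fib.append(fib[i-1] + fib[i-2])
        return fib


def fibonacci_compact(n):
    """Hypothetical variant: build from [0, 1], then slice to n terms (empty for n <= 0)."""
    fib = [0, 1]
    for _ in range(2, n):
        fib.append(fib[-1] + fib[-2])
    return fib[:max(n, 0)]


# A few inputs covering each branch of the original function.
for n in (0, 1, 2, 7):
    print(n, fibonacci(n))

# The two versions should agree everywhere, including non-positive n.
print(all(fibonacci(n) == fibonacci_compact(n) for n in range(-3, 30)))
```

The slicing version trades the explicit branches for a single expression; the original is arguably easier for a beginner to read, which is a reasonable reason to keep it.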

2. The following repository was created with help from Copilot: https://github.com/v-m-wang/assignment-6

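See the suggested code below. The output shown comes from running the cell again after the repository already existed: Git reinitialized the existing local repository, and `gh repo create` stopped because the name `assignment-6` was already taken on the account.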

```python
# Initialize a new Git repository
!git init

# Create a README.md file
!echo "# New Repository" > README.md

# Create a .gitignore file for Python projects
!echo "__pycache__/" > .gitignore
!echo "*.pyc" >> .gitignore

# Add all files to the staging area
!git add .

# Commit the changes
!git commit -m "Initial commit"

# Create the GitHub repository from the current directory and push the commit
# (replace 'assignment-6' with your desired repository name)
!gh repo create assignment-6 --public --source=. --push
```

Output:

```verbatim
Reinitialized existing Git repository in /Users/Vicki/Library/CloudStorage/OneDrive-Emory/25-26/Fall 2025/QTM 350/assignment-6/.git/
[main e85cb34] Initial commit
 1 file changed, 5 insertions(+), 6 deletions(-)
GraphQL: Name already exists on this account (createRepository)
```


3. Copilot suggested the following refactored code:

```python
def is_prime(num):
    if num <= 1:
        return False
    if num == 2:
        return True
    if num % 2 == 0:
        return False
    for i in range(3, int(num**0.5) + 1, 2):
        if num % i == 0:
            return False
    return True

primes = [i for i in range(2, 101) if is_prime(i)]

print(primes)
```

At first I did not follow how the `for` loop in the refactored function works, so I asked Copilot to explain it. Its explanation made clear how the loop speeds up the search for prime numbers:

"This code snippet is part of a function that likely checks whether a number (num) is a prime number. It uses a for loop to iterate over potential divisors of num, starting from 3 and incrementing by 2 (i.e., only odd numbers are considered). The loop continues until it reaches the square root of num, rounded down to the nearest integer, plus 1. This range is determined by range(3, int(num**0.5) + 1, 2).

The reason for starting at 3 and skipping even numbers is that even numbers (other than 2) cannot be prime, so they are excluded to optimize the process. The square root of num is used as the upper limit because if num has a divisor larger than its square root, the corresponding smaller divisor would already have been found earlier in the loop.

Inside the loop, the condition if num % i == 0 checks whether num is divisible by the current value of i. If it is, the function immediately returns False, indicating that num is not a prime number. This early exit further optimizes the function by avoiding unnecessary checks once a divisor is found."
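Because the task asks for correct results, a quick equivalence check between the original and refactored `is_prime` is a cheap safeguard. In the sketch below the original is renamed `is_prime_original`, a hypothetical name chosen only so both versions can coexist, and the two are compared on every integer from -5 to 999.

```python
def is_prime_original(num):
    """Original trial-division check from the task (tests every divisor below num)."""
    if num <= 1:
        return False
    for i in range(2, num):
        if num % i == 0:
            return False
    return True


def is_prime(num):
    """Refactored check: handle 2 and even numbers, then test odd divisors up to sqrt(num)."""
    if num <= 1:
        return False
    if num == 2:
        return True
    if num % 2 == 0:
        return False
    for i in range(3, int(num**0.5) + 1, 2):
        if num % i == 0:
            return False
    return True


# Any input where the two versions disagree would show up here.
mismatches = [n for n in range(-5, 1000) if is_prime(n) != is_prime_original(n)]
primes = [i for i in range(2, 101) if is_prime(i)]
print(mismatches)
print(len(primes), primes[-1])
```

Starting the list comprehension at 2 instead of 1 does not change the result, since `is_prime(1)` is `False`. For very large integers, `math.isqrt(num)` (Python 3.8+) would avoid the floating-point rounding that `int(num**0.5)` can introduce.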

4. The suggested commands were:

```bash
mkdir data_analysis && cd data_analysis && echo '#!/usr/bin/env python' > analysis.py
```
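For readers without a POSIX shell, roughly the same layout can be produced from Python with `pathlib`. This is a sketch, not something Copilot suggested: `create_analysis_project` is a hypothetical helper name, it creates the folder under a directory passed in rather than changing into it, and it tolerates an existing `data_analysis` folder, whereas `mkdir` would report an error and stop the `&&` chain.

```python
import tempfile
from pathlib import Path


def create_analysis_project(base):
    """Create base/data_analysis/analysis.py whose first line is a Python shebang."""
    project_dir = Path(base) / "data_analysis"
    # Like `mkdir data_analysis`, but no error if the directory already exists.
    project_dir.mkdir(exist_ok=True)
    script = project_dir / "analysis.py"
    # Like `echo '...' > analysis.py`: creates the file or overwrites it.
    script.write_text("#!/usr/bin/env python\n")
    return script


# Usage: build the layout inside a throwaway directory, then inspect the file.
with tempfile.TemporaryDirectory() as tmp:
    created = create_analysis_project(tmp)
    first_line = created.read_text().splitlines()[0]
    print(created.name, first_line)
```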

5. I wrote a factorial function with a deliberate error and then ran Copilot's `/fix` command on it. Both versions are below, with Copilot's explanation kept as comments:

```python
# Function to find the factorial of a number (with error)
def factorial(n):
    for i in range(1, n):
        n *= i
    return n

# Fixed function with explanation

# The issue with your code is that the variable n is being modified within the loop, which leads to incorrect results.
# To fix this, you should use a separate variable to calculate the factorial. Here's the corrected code:

def factorial(n):
    result = 1
    for i in range(1, n + 1):
        result *= i
    return result
```

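Copilot's correction for question 5 accumulates the product in a separate variable, `result`, and extends the range to `n + 1` because the end value of `range` is exclusive. Its explanation of the bug is not quite right, though: `range(1, n)` is evaluated once before the loop starts, so reassigning `n` inside the loop does not change the iterations, and multiplying `n` by 1 through `n - 1` still gives `n!` for every `n >= 1`. The deliberate error only shows up for `n = 0`, where the loop never runs and the function returns 0 instead of 1. Neither version handles negative values, for which the factorial is undefined.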
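A short comparison makes the behaviour of both versions concrete. In this sketch the deliberately broken function is renamed `factorial_broken` (a hypothetical name used only so both can coexist), and the fixed version gains a guard that raises `ValueError` for negative input, mirroring what Python's own `math.factorial` does.

```python
def factorial_broken(n):
    """The deliberately broken version from question 5, renamed for comparison."""
    for i in range(1, n):
        n *= i
    return n


def factorial(n):
    """Return n! for a non-negative integer n; reject negative input."""
    if n < 0:
        raise ValueError("factorial is undefined for negative numbers")
    result = 1
    for i in range(1, n + 1):
        result *= i
    return result


# Columns: n, broken result, fixed result.
for n in (0, 1, 5):
    print(n, factorial_broken(n), factorial(n))
```

Raising an exception, rather than returning a sentinel such as `None`, makes misuse fail loudly at the call site.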

6. Copilot suggested the following documentation:

```r
#' Calculate the factorial of a number
#'
#' This function computes the factorial of a given non-negative integer `n`.
#' The factorial of `n` is defined as the product of all positive integers
#' less than or equal to `n`. By definition, the factorial of 0 is 1.
#'
#' @param n A non-negative integer for which the factorial is to be calculated.
#' @return The factorial of the input number `n`.
#' @examples
#' calculate_factorial(5) # Returns 120
#' calculate_factorial(0) # Returns 1
calculate_factorial <- function(n) {
    if (n == 0) {
        return(1)
    } else {
        return(n * calculate_factorial(n - 1))
    }
}
```

7. Copilot gave the following explanation:

The provided Python function, normalize_data, is designed to normalize a list of numerical data. Normalization is a common preprocessing step in data analysis and machine learning, where the goal is to scale data values to a specific range, typically between 0 and 1. This ensures that all data points are on the same scale, which can improve the performance of algorithms that are sensitive to the magnitude of the input values.

The function takes a single argument, data, which is expected to be a list of numerical values. It begins by calculating the minimum value (min_val) and the maximum value (max_val) of the input list using Python's built-in min() and max() functions. These values are essential for determining the range of the data, which is the difference between the maximum and minimum values.

Next, the function uses a list comprehension to iterate over each value x in the input list. For each value, it applies the normalization formula:

$$
\mathrm{normalized\_value} = \frac{x - \mathrm{min\_val}}{\mathrm{max\_val} - \mathrm{min\_val}}
$$
This formula shifts the data by subtracting the minimum value, ensuring that the smallest value in the dataset becomes 0. It then scales the result by dividing by the range of the data, so the largest value becomes 1. The result is a new list where all values are scaled to the range [0, 1].

One potential issue with this function is that it does not handle cases where all values in the input list are the same. In such cases, the range (max_val - min_val) would be 0, leading to a division by zero error. To make the function more robust, you could add a check to handle this edge case.

In summary, this function is a simple and efficient way to normalize a dataset, but it assumes that the input list contains at least two distinct values.
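The edge case flagged in the explanation is easy to guard against. A minimal sketch, assuming that a constant input should map to all zeros and that an empty input should return an empty list; both are choices rather than requirements of the assignment (the original raises `ZeroDivisionError` and `ValueError` respectively):

```python
def normalize_data(data):
    """Min-max scale values to [0, 1], handling empty and constant inputs."""
    values = list(data)
    if not values:
        # Assumption: nothing to scale, so return an empty list instead of raising.
        return []
    min_val = min(values)
    max_val = max(values)
    value_range = max_val - min_val
    if value_range == 0:
        # Assumption: every value equals the minimum, so map each one to 0.0.
        return [0.0] * len(values)
    return [(x - min_val) / value_range for x in values]


print(normalize_data([1, 2, 3]))
print(normalize_data([4, 4, 4]))
```

Raising a clearer error for constant input would be an equally defensible design; the right choice depends on how the normalized values are used downstream.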

8. Copilot translated the R code into Python as follows:

```python
# Translated code in Python

import pandas as pd
import numpy as np

# Create a sample dataframe with missing values
data = {
    "A": [1, 2, np.nan, 4, 5],
    "B": [np.nan, 2, 3, 4, 5],
    "C": [1, 2, 3, np.nan, 5],
    "D": [1, 2, 3, 4, 5]
}
df = pd.DataFrame(data)

# Define the function
def count_missing_values(df):
    return df.isna().sum()

# Test the function
result = count_missing_values(df)
print(result)
```

Running the code prints:

```verbatim
A    1
B    1
C    1
D    0
dtype: int64
```
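To see what `isna().sum()` computes column by column (the counterpart of `sapply(df, function(col) sum(is.na(col)))` in the R version), here is a pandas-free sketch. It assumes a hypothetical layout in which the data are a plain dict of lists, and it treats both `None` and `float("nan")` as missing; `count_missing_values_plain` is an illustrative name, not part of Copilot's translation.

```python
import math


def count_missing_values_plain(columns):
    """Count missing entries per column in a dict mapping column names to lists."""
    def is_missing(value):
        # None or a float NaN counts as missing (np.nan is also a Python float).
        return value is None or (isinstance(value, float) and math.isnan(value))

    return {name: sum(is_missing(v) for v in col) for name, col in columns.items()}


sample = {
    "A": [1, 2, float("nan"), 4, 5],
    "B": [float("nan"), 2, 3, 4, 5],
    "C": [1, 2, 3, float("nan"), 5],
    "D": [1, 2, 3, 4, 5],
}
print(count_missing_values_plain(sample))
```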


9. Copilot suggested the following code:

```python
# Python function to count vowels in a string
def count_vowels(s):
    vowels = "aeiouAEIOU"
    return sum(1 for char in s if char in vowels)
```

```r
# R function to count vowels in a string
#' Count the number of vowels in a string
#'
#' This function takes a string as input and returns the number of vowels (a, e, i, o, u) in the string.
#'
#' @param s A character string.
#' @return An integer representing the number of vowels in the string.
#' @examples
#' count_vowels("hello") # Returns 2
count_vowels <- function(s) {
  vowels <- c("a", "e", "i", "o", "u", "A", "E", "I", "O", "U")
  sum(strsplit(s, NULL)[[1]] %in% vowels)
}
```
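The roxygen example in the R version documents `count_vowels("hello")` as returning 2, and the same cases can be checked against the Python function. The sketch below keeps Copilot's Python logic but stores the vowels in a `frozenset`, a minor variation rather than Copilot's suggestion. Like the R version, it counts only the unaccented letters a, e, i, o, u in either case, so `'y'` and accented vowels such as `'é'` are not counted.

```python
VOWELS = frozenset("aeiouAEIOU")


def count_vowels(s):
    """Count the vowels a, e, i, o, u (either case) in s; 'y' is not counted."""
    return sum(1 for char in s if char in VOWELS)


print(count_vowels("hello"))
```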

10. In plain text, the binary code translates to:

You've completed the assignment. Congratulations!
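The decoding can also be reproduced without an AI tool: each space-separated group is an 8-bit binary number, which `int(group, 2)` converts to a code point and `chr` turns into a character. A minimal sketch, assuming every group encodes one ASCII character (true for this message); `binary_to_text` is an illustrative name.

```python
def binary_to_text(bits):
    """Decode space-separated 8-bit binary groups into text (assumes ASCII)."""
    return "".join(chr(int(group, 2)) for group in bits.split())


# The message from task 10, split across lines (one word plus its trailing space per line).
message_bits = (
    "01011001 01101111 01110101 00100111 01110110 01100101 00100000 "
    "01100011 01101111 01101101 01110000 01101100 01100101 01110100 01100101 01100100 00100000 "
    "01110100 01101000 01100101 00100000 "
    "01100001 01110011 01110011 01101001 01100111 01101110 01101101 01100101 01101110 01110100 00101110 00100000 "
    "01000011 01101111 01101110 01100111 01110010 01100001 01110100 01110101 01101100 01100001 01110100 01101001 01101111 01101110 01110011 00100001"
)
print(binary_to_text(message_bits))
```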