# Review by M2
I am Team Member 2. I was responsible for the Download and Collation Module.
My job was to write two functions:

* One function to download the answer files from the cloud and rename them.
* One function to combine all the answer files into one final output file.

Both functions are defined in `data_preparation_M2.py`.

# Explanation of `gdown` Installation Block

In [None]:
import os
import subprocess
import sys

# Install gdown with pip, using the interpreter that runs this notebook
subprocess.check_call([sys.executable, "-m", "pip", "install", "gdown"])

This code ensures that the required package `gdown` is installed before proceeding.

* It imports the necessary system libraries: `os`, `subprocess`, and `sys`.
* It uses `subprocess.check_call()` to run the command-line instruction for installing `gdown` with `pip`.
* The expression `sys.executable` ensures that the correct Python interpreter is used, even inside a virtual environment.
* This step allows the script to download files from Google Drive using the `gdown` command later in the workflow.

This installation step makes the code more portable, ensuring it can run even if `gdown` is not already installed on the machine.
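The block above runs `pip` every time the cell is executed. A minimal sketch of one possible refinement, assuming it is acceptable to skip the install when the package can already be imported, is shown below. The helper name `ensure_installed` is hypothetical and is not part of `data_preparation_M2.py`.

```python
import importlib.util
import subprocess
import sys


def ensure_installed(package_name, import_name=None):
    """Install package_name with pip only if it cannot be imported.

    Hypothetical helper. Returns True if pip was invoked and
    False if the package was already available.
    """
    module_name = import_name or package_name
    # find_spec returns None when no top-level module of that name is found
    if importlib.util.find_spec(module_name) is not None:
        return False
    subprocess.check_call([sys.executable, "-m", "pip", "install", package_name])
    return True
```

Under these assumptions, calling `ensure_installed("gdown")` would only reach `pip` on a machine where `gdown` is missing.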

# Explanation of `download_answer_files()` Function

In [None]:
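def download_answer_files(cloud_url, path_to_data_folder, respondent_index):
    """Download a shared Google Drive folder, then rename the a{i}.txt files.

    respondent_index is the highest respondent number to rename (inclusive).
    """
    # Create the target folder if it does not exist yet
    os.makedirs(path_to_data_folder, exist_ok=True)

    # Equivalent shell command: gdown --folder URL -O path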
    command = [
        "gdown",
        "--folder",
        cloud_url,
        "-O",
        path_to_data_folder
    ]

    try:
        subprocess.run(command, check=True)
        print("Download complete.")
    except Exception as e:
        # Reached on a non-zero exit code or when gdown is not on the PATH
        print(f"Download failed: {e}")
        return

    # Rename a1.txt, a2.txt, ... to answers_respondent_1.txt, answers_respondent_2.txt, ...
    for i in range(1, respondent_index + 1):
        original = os.path.join(path_to_data_folder, f"a{i}.txt")
        renamed = os.path.join(path_to_data_folder, f"answers_respondent_{i}.txt")
        if os.path.exists(original):
            os.rename(original, renamed)
            print(f"Renamed {original} → {renamed}")
        else:
            print(f"File missing: {original}")


This function is used to download answer files from a shared Google Drive folder and rename them into a standard format.

* The first line creates the target folder (if it doesn't exist) using `os.makedirs()`.
* It then constructs a `gdown` command to download the folder from the given URL.
* The `subprocess.run()` call executes the download with `check=True`; if it fails, an error message is printed and the function returns without renaming anything.
* After a successful download, the function loops through all expected files, from `a1.txt` up to `a{respondent_index}.txt`.
* Each file is checked using `os.path.exists()`. If it exists, it is renamed to the format `answers_respondent_X.txt`.
* Each success or missing file is reported using `print()`.

This function helps automate the preparation of raw input files for analysis.
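The renaming step described above can be tried without any download. The following is a minimal sketch, assuming the files sit directly in the target folder. The helper `rename_answer_files` is hypothetical: it returns the renamed and missing respondent numbers instead of printing them, and it uses `os.replace()` rather than `os.rename()` so that an existing target is overwritten on every platform.

```python
import os
import tempfile


def rename_answer_files(folder, respondent_count):
    """Rename a{i}.txt to answers_respondent_{i}.txt; return (renamed, missing)."""
    renamed, missing = [], []
    for i in range(1, respondent_count + 1):
        original = os.path.join(folder, f"a{i}.txt")
        target = os.path.join(folder, f"answers_respondent_{i}.txt")
        if os.path.exists(original):
            os.replace(original, target)  # overwrites an existing target file
            renamed.append(i)
        else:
            missing.append(i)
    return renamed, missing


# Demonstration on a throwaway folder in which a2.txt is deliberately absent
with tempfile.TemporaryDirectory() as demo_folder:
    for i in (1, 3):
        with open(os.path.join(demo_folder, f"a{i}.txt"), "w", encoding="utf-8") as f:
            f.write(f"answers of respondent {i}\n")
    demo_renamed, demo_missing = rename_answer_files(demo_folder, 3)
    demo_listing = sorted(os.listdir(demo_folder))
```

Returning the two lists lets the caller decide how to react to missing files, for example by stopping before the collation step.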


# Explanation of `collate_answer_files()` Function

In [None]:
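def collate_answer_files(data_folder_path):
    """Merge all answers_list_respondent_*.txt files into output/collated_answers.txt."""
    # Create the output folder if it does not exist yet
    os.makedirs("output", exist_ok=True)
    output_path = os.path.join("output", "collated_answers.txt")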

    with open(output_path, 'w', encoding='utf-8') as outfile:
        for filename in sorted(os.listdir(data_folder_path)):
            if filename.startswith("answers_list_respondent_") and filename.endswith(".txt"):
                file_path = os.path.join(data_folder_path, filename)
                with open(file_path, 'r', encoding='utf-8') as infile:
                    outfile.write(infile.read())
                    outfile.write("*\n")

    print(f"Collation complete: {output_path}")
    print("Collate function has run.")


# Import the function from the module and collate the files in data/
from data_preparation_M2 import collate_answer_files
collate_answer_files("data")


This function merges all respondent answer files into one combined output file.

* It first ensures the `output/` folder exists using `os.makedirs()`.
* It creates a new file called `collated_answers.txt` for writing the merged content.
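* It loops through the files in the data folder in lexicographic order (`sorted(os.listdir(...))`) and keeps only those matching the naming pattern `answers_list_respondent_X.txt`. Files still named `answers_respondent_X.txt` are skipped by this filter, and lexicographic order places respondent 10 before respondent 2.
* For each matching file, it opens and reads the content, then writes it into the output file.
* After each respondent’s answers, it writes a `*` followed by a newline as a separator.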
* At the end, it prints a confirmation message.

This function ensures that all cleaned data is stored in a single file for easy access and further analysis.
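Because the files are sorted as text, the collated order is 1, 10, 11, ..., 2, 20, ... rather than 1, 2, 3, .... If numeric order matters downstream, one option is to sort by the respondent number in the file name. The following is a minimal sketch under that assumption; the helpers `respondent_number` and `list_answer_files_in_order` are hypothetical and not part of `data_preparation_M2.py`.

```python
import os
import re
import tempfile


def respondent_number(filename):
    """Return the respondent number in a file name, or None if it does not match."""
    match = re.fullmatch(r"answers_list_respondent_(\d+)\.txt", filename)
    return int(match.group(1)) if match else None


def list_answer_files_in_order(data_folder_path):
    """List matching answer files sorted by respondent number, not as text."""
    matching = [name for name in os.listdir(data_folder_path)
                if respondent_number(name) is not None]
    return sorted(matching, key=respondent_number)


# Demonstration on a throwaway folder with respondents 1, 2 and 10
with tempfile.TemporaryDirectory() as demo_folder:
    for i in (10, 2, 1):
        path = os.path.join(demo_folder, f"answers_list_respondent_{i}.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write(f"answer from respondent {i}\n")
    lexicographic_order = sorted(os.listdir(demo_folder))
    numeric_order = list_answer_files_in_order(demo_folder)
```

In this demonstration `lexicographic_order` lists respondent 10 before respondent 2, while `numeric_order` lists them as 1, 2, 10. The loop in `collate_answer_files()` could iterate over such a list in place of `sorted(os.listdir(data_folder_path))`.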


# Explanation of Manual Renaming Script

In [None]:
# Move the raw files into data/ and rename them to the standard format
import os

folder = "data/quiz_answers_named_a1_to_a25"

for i in range(1, 26):
    old_name = f"a{i}.txt"
    new_name = f"answers_respondent_{i}.txt"
    old_path = os.path.join(folder, old_name)
    new_path = os.path.join("data", new_name)

    if os.path.exists(old_path):
        os.rename(old_path, new_path)
        print(f"Renamed {old_name} → {new_name}")
    else:
        print(f"{old_name} not found")

This script moves raw files extracted from a zip archive into the `data/` folder and renames them into the standardized format.

* It loops through the numbers 1 to 25 and builds the original file names `a1.txt`, `a2.txt`, etc.
* For each file, it constructs the new name `answers_respondent_X.txt`.
* It uses `os.path.exists()` to check whether the original file exists in `data/quiz_answers_named_a1_to_a25`.
* If the file is found, it uses `os.rename()` to move it into `data/` under the new name.
* Each action is logged using `print()`, reporting either the rename or the missing file.

This script is useful when files are provided locally without using the download function.
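The script above hard-codes both the folder name and the count of 25 files. A minimal sketch of a more general variant follows, assuming every raw file is named `a<N>.txt`. The helper `move_and_rename` is hypothetical; it discovers the files by pattern and uses `shutil.move()`, which also works when source and target are on different file systems.

```python
import os
import re
import shutil
import tempfile


def move_and_rename(source_folder, target_folder):
    """Move every a<N>.txt to target_folder as answers_respondent_<N>.txt.

    Returns the respondent numbers that were moved, in ascending order.
    """
    os.makedirs(target_folder, exist_ok=True)
    moved = []
    for name in os.listdir(source_folder):
        match = re.fullmatch(r"a(\d+)\.txt", name)
        if match is None:
            continue  # ignore files that do not follow the a<N>.txt pattern
        number = int(match.group(1))
        shutil.move(
            os.path.join(source_folder, name),
            os.path.join(target_folder, f"answers_respondent_{number}.txt"),
        )
        moved.append(number)
    return sorted(moved)


# Demonstration on a throwaway folder tree with one unrelated file
with tempfile.TemporaryDirectory() as root:
    raw = os.path.join(root, "raw")
    os.makedirs(raw)
    for name in ("a1.txt", "a12.txt", "notes.txt"):
        with open(os.path.join(raw, name), "w", encoding="utf-8") as f:
            f.write("demo\n")
    demo_moved = move_and_rename(raw, os.path.join(root, "data"))
    demo_target_listing = sorted(os.listdir(os.path.join(root, "data")))
    demo_left_behind = sorted(os.listdir(raw))
```

With the folder names from the script, the call would be `move_and_rename("data/quiz_answers_named_a1_to_a25", "data")`.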