## M1 Review (Self Review)

In [2]:
def extract_answers_sequence(string_file_path):

    answers = []   # creates an empty list to store the answer values 
    current_answer = 0 
    i = 0 

    with open(string_file_path, 'r', encoding = 'utf-8') as file:
        survey = file.readlines()      # opens the file to read only 


    while i < len(survey):
        text = survey[i].strip()

        if text.startswith("Question"):
            current_answer = 0   # if a line starts with "Question", current answer remains 0 
            question_block = survey[i+1:i+5]

            k = 0

            for answer_line in question_block:
                answer_line = answer_line.strip()
                if '[x]' in answer_line:
                    current_answer = 1 + k 
                    break
                k +=1
            answers.append(current_answer)
            i += 5

        else:
            i += 1

    return answers


#string_file_path = "data/answers_respondent_2.txt"
#list_answers = extract_answers_sequence(string_file_path)
#print(list_answers)

#list_answers = extract_answers_sequence("data/raw_answers/answers_respondent_1.txt")
def write_answers_sequence(list_answers, int_n):
    new_text_file = f"data/answers_list_respondent_{int_n}.txt"
    
    with open(new_text_file, 'w') as file:
        file.writelines(f"{answer}\n" for answer in list_answers)    # sets new name to the text file containing answers list
    
    print(f"Answers saved to text file!")



M1's responsibility was to extract all answers from the raw data and collect them in individual text files for each respondent. The code also names these individual answer files answers_list_respondent_{i}.txt. 

The code is well structured, with each function clearly separated. Comments throughout explain the purpose of the different blocks. 

It contains print statements for clarity when running and uses sample data to test the functions (in commented-out lines).

The print statements could be clearer by specifying which respondent's answers are being saved. 

To improve, add try/except statements for error handling, as the code assumes the file can be opened and that each "Question" line is followed by four answer lines. A question with no "[x]" line is recorded as 0. 
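The two suggestions above could look roughly like the sketch below. It is not M1's code: `read_survey_lines` and the `output_folder` parameter are hypothetical names introduced for illustration, and returning an empty list or `None` on failure is only one possible design choice.

```python
from pathlib import Path


def read_survey_lines(string_file_path):
    """Return the lines of a survey file, or an empty list if it cannot be read."""
    try:
        with open(string_file_path, "r", encoding="utf-8") as file:
            return file.readlines()
    except OSError as error:
        # FileNotFoundError and PermissionError are both subclasses of OSError.
        print(f"Could not read {string_file_path}: {error}")
        return []


def write_answers_sequence(list_answers, int_n, output_folder="data"):
    """Write one answer per line and report which respondent was saved."""
    # output_folder is a hypothetical parameter; M1's version hard-codes "data".
    target = Path(output_folder) / f"answers_list_respondent_{int_n}.txt"
    try:
        target.parent.mkdir(parents=True, exist_ok=True)
        with open(target, "w", encoding="utf-8") as file:
            file.writelines(f"{answer}\n" for answer in list_answers)
    except OSError as error:
        print(f"Could not save answers for respondent {int_n}: {error}")
        return None
    # The message now names the respondent and the file that was written.
    print(f"Answers for respondent {int_n} saved to {target}")
    return target
```

With this version, a missing input file produces a readable message instead of a traceback, and each saved file is reported with its respondent number.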

## M2 Review

In [4]:
import os
import subprocess

import sys

# install gdown
subprocess.check_call([sys.executable, "-m", "pip", "install", "gdown"])
def download_answer_files(cloud_url, path_to_data_folder, respondent_index):

    os.makedirs(path_to_data_folder, exist_ok=True)

    

    # gdown --folder URL -O path
    command = [
        "gdown",
        "--folder",
        cloud_url,
        "-O",
        path_to_data_folder
    ]

    try:
        subprocess.run(command, check=True)
        print(" Download complete.")
    except Exception as e:
        print(f" Download failed: {e}")
        return

    # a1.txt rename to answers_respondent_1.txt 
    for i in range(1, respondent_index + 1):
        original = os.path.join(path_to_data_folder, f"a{i}.txt")
        renamed = os.path.join(path_to_data_folder, f"answers_respondent_{i}.txt")
        if os.path.exists(original):
            os.rename(original, renamed)
            print(f" Renamed {original} → {renamed}")
        else:
            print(f" File missing: {original}")


def collate_answer_files(data_folder_path):
    
    os.makedirs("output", exist_ok=True)
    output_path = os.path.join("output", "collated_answers.txt")

    with open(output_path, 'w', encoding='utf-8') as outfile:
        for filename in sorted(os.listdir(data_folder_path)):
            if filename.startswith("answers_list_respondent_") and filename.endswith(".txt"):
                file_path = os.path.join(data_folder_path, filename)
                with open(file_path, 'r', encoding='utf-8') as infile:
                    outfile.write(infile.read())
                    outfile.write("*\n")

    print(f"Collation complete: {output_path}")
    print(" Collate function has run.")








M2's code consists of two functions: one downloads and renames the raw data files containing questions and answers, and the other collects all respondents' answers into a single file. 

M2's code is well structured and easy to read. They have added comments throughout the code, explaining what each block does. This makes it easier to follow. 

They have added print statements throughout, which is useful for debugging.

The code follows the assignment's function guidelines.

Running time is 0.2 seconds. 

M2 could add more comments explaining individual lines of code (though not all of them). 

M2 could also use try/except statements for error handling in case gdown isn't installed properly. 
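One way to guard the gdown installation is sketched below. `run_step` is a hypothetical helper, not part of M2's code; it assumes that a failed step should print a message and return `False` rather than stop the notebook.

```python
import subprocess
import sys


def run_step(command, description):
    """Run an external command; return True on success, False on failure."""
    try:
        subprocess.run(command, check=True)
    except FileNotFoundError:
        # Raised when the executable itself cannot be found.
        print(f"{description} failed: command not found: {command[0]}")
        return False
    except subprocess.CalledProcessError as error:
        # Raised when the command runs but exits with a non-zero status.
        print(f"{description} failed with exit code {error.returncode}")
        return False
    print(f"{description} succeeded.")
    return True
```

The installation at the top of M2's cell could then be written as `run_step([sys.executable, "-m", "pip", "install", "gdown"], "gdown installation")`, and the download function could be skipped when it returns `False`.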

## M3 Review

In [5]:
import matplotlib.pyplot as plt

def load_sequences(collated_answers_path):
    sequences = []
    
    # open, read and close file. named file as "f". 
    with open(collated_answers_path, 'r',encoding='utf-8') as f:
        # read the text, strip leading/trailing newlines and separate the answers of the 4 respondents (split on *).
        blocks = f.read().strip().split("*\n")  
        for block in blocks:
            # turn text to string
            lines = block.strip().splitlines()
            # turn string to integers
            sequence = [int(x.strip()) for x in lines if x.strip().isdigit()]
            if len(sequence)==100:
        
                sequences.append(sequence)
            else:
                print("incomplete sequence")
    return sequences
    
def generate_means_sequence(collated_answers_path):
# if seq[i]!= 0, add seq[i] in values.
    sequences = load_sequences(collated_answers_path)
    means=[]
    for i in range(100):
        values = [seq[i] for seq in sequences if seq[i] != 0]
        mean = sum(values) / len(values) if values else 0
        means.append(mean)
    return means

def visualize_data(collated_answers_path, n):
    sequences = load_sequences(collated_answers_path)
    # draw the average value of all the answer (exclude 0)
    if n == 1:
        means = generate_means_sequence(collated_answers_path)
        plt.scatter(range(1, 101), means)
        plt.title("Mean Answer Value per Question")
        plt.xlabel("Question Number")
        plt.ylabel("Mean Answer (1–4)")
    # draw every member's (4 members) answer line.
    elif n == 2:
        for seq in sequences:
            plt.plot(range(1, 101), seq)
        plt.title("All Respondents’ Answer Sequences")
        plt.xlabel("Question Number")
        plt.ylabel("Answer (1–4 or 0)")
    else:
        print("Error: Invalid plot option. n must be 1 or 2.")
        return
    plt.grid(True)
    plt.show()


M3 was in charge of computing statistics and providing visual insight into potential patterns in the data. The code generates a scatter plot at the end, showing clear patterns. 

The code is well structured with comments throughout, making it easier to follow and understand what's happening. 

M3 has organised their code and separated their functions, so it is neat and readable. 

The code's running time is 0.2 seconds, so it runs efficiently. 

Makes use of appropriate libraries and functions to produce visualisations.

The plot is well presented with appropriate titles and labels. 

The code should consist of two functions instead of three. 

M3 could specify which sequence is incomplete when printing the "incomplete sequence" message.
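A possible way to report the affected sequence is sketched below. `split_sequences` is a hypothetical variant of `load_sequences` that takes the collated text directly. Note that the reported position is the block's order in the collated file, which may differ from the respondent number because the files are collated in sorted filename order.

```python
def split_sequences(collated_text, expected_length=100):
    """Parse collated answers and report which blocks are incomplete."""
    sequences = []
    incomplete = []  # 1-based positions of blocks with the wrong length
    blocks = collated_text.strip().split("*\n")
    for position, block in enumerate(blocks, start=1):
        # Keep only lines that are plain digits; this also drops a trailing "*".
        sequence = [int(x.strip()) for x in block.splitlines() if x.strip().isdigit()]
        if len(sequence) == expected_length:
            sequences.append(sequence)
        else:
            incomplete.append(position)
            print(f"Incomplete sequence in block {position}: "
                  f"{len(sequence)} of {expected_length} answers")
    return sequences, incomplete
```

Returning the list of incomplete positions alongside the sequences also lets the caller decide whether to continue with partial data.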

## M4 Review 

In [1]:
#import os
#from data_preparation_M2 import download_answer_files, collate_answer_files
#from data_extraction_M1 import extract_answers_sequence, write_answers_sequence
#from data_analysis_M3 import visualize_data

def run_full_analysis():

    cloud_url = "https://drive.google.com/drive/folders/1wq4I1RFFIZ7fz0tQ9ojcBSOHFe1AX95y?usp=sharing"
    data_folder = "data/raw_answers"
    structured_folder = "data"
    collated_file_path = "output/collated_answers.txt"
    respondent_index = 25
    plot_mode = 1,2  

    print("="*50)
    download_answer_files(cloud_url, data_folder, respondent_index)

    for i in range(1, respondent_index + 1):
        input_file = os.path.join(data_folder, f"answers_respondent_{i}.txt")
        answers = extract_answers_sequence(input_file)
        write_answers_sequence(answers, i)

    collate_answer_files(structured_folder)


    visualize_data(collated_file_path, plot_mode)



if __name__ == "__main__":
    run_full_analysis()



NameError: name 'download_answer_files' is not defined

M4 was in charge of leading the group and making sure the project ran smoothly by ensuring all members could access the folder and contribute effectively. Their code integrates all modules, so it requires M1, M2, and M3 to have completed their scripts. 

M4's code consists of a single function, as required.

M4 uses data_preparation_M2.py to download and collate the answer files, data_extraction_M1.py to extract the answer sequence from each respondent's file, and data_analysis_M3.py to compute statistics and plot the patterns. 

The code is well structured, readable and organised. They add comments to show which team member's work is being carried out at which stage. 

M4 could add print statements, either throughout the code or at the end once the process is complete, for clarity. 
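The progress messages suggested above could be added with a small wrapper like the sketch below. `run_pipeline` and its `(name, function)` step format are hypothetical and not part of M4's script; the same effect could be achieved with plain print calls between the existing steps.

```python
def run_pipeline(steps):
    """Run (name, function) pairs in order, printing progress for each one."""
    total = len(steps)
    for number, (name, step) in enumerate(steps, start=1):
        print(f"[{number}/{total}] {name}...")
        step()
    # Final message so the user knows the whole process has finished.
    print("Analysis complete.")
    return total
```

In `run_full_analysis`, each stage (download, extraction, collation, visualisation) could be passed as one step, for example `("Collating answers", lambda: collate_answer_files(structured_folder))`.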