# Prolific ID encryption
Author: Tingying He, date: August 2023

This script encrypts Prolific IDs. Running it replaces every Prolific ID in the raw data with a corresponding unique participant ID.


### `encrypt_prolific_ids(file_name, column_name, output_path)`

This function reads a CSV file that contains a column of participant IDs, referred to as Prolific IDs. It encrypts these IDs by replacing each one with a new random ID generated by `uuid.uuid4()`. The mapping between the original Prolific IDs and the new IDs is stored in a global dictionary, so within one Python session the same Prolific ID is always mapped to the same new ID, even when it appears in different files. The mapping is held in memory only: it is not written to disk, so a new session (or re-running the cell that resets `id_mapping`) produces different new IDs.
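The mapping behaviour described above can be sketched in isolation. This is a minimal illustration, not the notebook's function: the helper name `pseudonymize` and the example IDs are made up for the sketch.

```python
import uuid

# Session-wide mapping from original ID to new ID (lives only in memory)
id_mapping = {}

def pseudonymize(original_id):
    """Hypothetical helper: return the new ID for original_id, creating it on first sight."""
    if original_id not in id_mapping:
        id_mapping[original_id] = str(uuid.uuid4())
    return id_mapping[original_id]

first = pseudonymize("example-id-A")   # made-up ID
second = pseudonymize("example-id-A")  # same ID seen again, e.g. in another file
other = pseudonymize("example-id-B")   # a different participant

print(first == second)  # True: the stored mapping is reused
print(first != other)   # True: different participants get different new IDs
```

Because `uuid.uuid4()` is random, the new IDs cannot be recomputed from the original ones. The dictionary is the only link between the two.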

#### Parameters:

- `file_name` (str): The name of the input CSV file containing the Prolific IDs. Only the base name is used: the function reads the file from the directory given by the global variable `folder_path`, so any directory part of `file_name` is ignored.

- `column_name` (str): The name of the column in the CSV file that contains the prolific IDs to be encrypted. This column should exist in the specified CSV file.

- `output_path` (str): The directory where the new file with the encrypted IDs will be saved. The new file will have the same name as the original file and will be saved in this directory.

#### Output:

- A new CSV file is created in the specified `output_path` with the same structure as the original file, except that the prolific IDs in the specified column are replaced with unique new IDs.
- A message is printed to the console with the path of the newly created file.


#### Notes:

- The function keeps a global mapping of the original prolific IDs to the unique new IDs. This ensures that the same mapping is used across different files, and the same prolific ID is always mapped to the same unique new ID.
- The specified output directory should exist before calling the function.
- The specified column name should exist in the CSV file. If it does not, the function prints a message and returns without writing an output file.
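The two preconditions in the notes can be checked before calling the function. The following is a sketch under assumptions: the folders are temporary stand-ins for the real ones, and the file `example.csv` and its contents are made up.

```python
import csv
import os
import tempfile

# Hypothetical stand-ins for the real folders, created in a temporary location
base_dir = tempfile.mkdtemp()
input_dir = os.path.join(base_dir, "final_data_original")
output_dir = os.path.join(base_dir, "final_data")
os.makedirs(input_dir)

# A tiny made-up input file with a 'PID' column
input_file = os.path.join(input_dir, "example.csv")
with open(input_file, "w", newline="") as f:
    csv.writer(f).writerows([["PID", "answer"], ["example-id-A", "1"]])

# Create the output directory if it does not exist yet
os.makedirs(output_dir, exist_ok=True)

# Read only the header row to confirm the ID column is present
with open(input_file, newline="") as f:
    header = next(csv.reader(f))
column_found = "PID" in header
print(column_found)  # True
```

`os.makedirs(..., exist_ok=True)` is safe to run repeatedly, so it can sit at the top of the notebook without an explicit existence check.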

---

In [12]:
import csv
import uuid
import os
import shutil
import pandas as pd

In [13]:
# Path to the folder containing the CSV files
folder_path = 'final_data_original'

# Path to the folder saving the output files
output_path = 'final_data'

In [17]:
# Global mapping of prolific IDs to unique new IDs
id_mapping = {}

def encrypt_prolific_ids(file_name, column_name, output_path):
    global id_mapping

    # Read the file using pandas
    df = pd.read_csv(os.path.join(folder_path, os.path.basename(file_name)))

    # Check if the column_name exists in the DataFrame
    if column_name not in df.columns:
        print(f"Column '{column_name}' not found in the file.")
        return

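    # For each prolific ID, generate a unique new ID if not already in id_mapping.
    # Missing values are skipped so that empty cells stay empty instead of
    # all receiving the same generated ID.
    for prolific_id in df[column_name].dropna().unique():
        if prolific_id not in id_mapping:
            id_mapping[prolific_id] = str(uuid.uuid4())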

    # Replace the prolific IDs in the specified column with the corresponding unique new IDs
    df[column_name] = df[column_name].map(id_mapping)

    # Prepare the output file name
    output_file_name = os.path.join(output_path, os.path.basename(file_name))

    # Write the DataFrame to the new file
    df.to_csv(output_file_name, index=False)
    print(f"File with encrypted IDs has been saved to {output_file_name}")

In [18]:
encrypt_prolific_ids('prolific_export_640bd0f630a99d7f77ef9361-20230313.csv', 
                     'Participant id', output_path)

File with encrypted IDs has been saved to final_data_encrypted_prolific_id/prolific_export_640bd0f630a99d7f77ef9361-20230313.csv


In [19]:
encrypt_prolific_ids('results-survey893947-20230313.csv', 
                     'PID', output_path)

File with encrypted IDs has been saved to final_data_encrypted_prolific_id/results-survey893947-20230313.csv
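After both files are processed, it can be worth checking that no original ID remains and that the two files can still be joined on the new IDs. The sketch below illustrates such a check on made-up in-memory data rather than the real files. The column names follow the two calls above, while the IDs and the variable names are assumptions.

```python
import uuid
import pandas as pd

# Made-up stand-ins for the two files; column names follow the calls above
prolific = pd.DataFrame({"Participant id": ["example-id-A", "example-id-B"]})
survey = pd.DataFrame({"PID": ["example-id-B", "example-id-A", "example-id-B"]})
original_ids = set(prolific["Participant id"]) | set(survey["PID"])

# One shared mapping applied to both tables, as the global id_mapping does
mapping = {pid: str(uuid.uuid4()) for pid in sorted(original_ids)}
prolific["Participant id"] = prolific["Participant id"].map(mapping)
survey["PID"] = survey["PID"].map(mapping)

# Check 1: no original ID survives in either table
leaked = original_ids & (set(prolific["Participant id"]) | set(survey["PID"]))

# Check 2: every survey row still matches a participant in the Prolific export
unmatched = set(survey["PID"]) - set(prolific["Participant id"])

print(len(leaked), len(unmatched))  # 0 0
```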
