# Clean-up survey data

This notebook cleans up the survey questionnaire data. It takes a .csv file exported from Qualtrics and outputs a 'clean' .csv file for further analysis.

* _Input:_ Exported .csv from Qualtrics
* _Output:_ Saves a 'clean' .csv to disk 

**Tasks:** 
* Removes non-consenting users
* Removes non-finished responses
* Removes irrelevant device and date metadata
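
The tasks listed above can be sketched as one chained pipeline. This is a minimal illustration on made-up data, not the notebook's actual implementation: the toy rows, the helper names (`drop_metadata`, `keep_consenting`, `keep_finished`) and the single `IPAddress` metadata column are assumptions chosen for brevity, while the `Finished` and `Text / Graphic` column names follow the ones used further down.

```python
import pandas as pd

# Consent answer text as used in the consent filter of this notebook.
CONSENT_TEXT = "Yes, I understand, and I agree to participate in the survey"

# Hypothetical toy export: three responses, only the first is consenting AND finished.
raw = pd.DataFrame({
    "IPAddress": ["10.0.0.1", "10.0.0.2", "10.0.0.3"],
    "Finished": [True, False, True],
    "Text / Graphic": [CONSENT_TEXT, CONSENT_TEXT, "No, I do not agree"],
    "Activity": ["1 day a week", "4 days a week", "2 days a week"],
})

def drop_metadata(df):
    # Drop metadata columns; errors='ignore' skips names that are absent.
    return df.drop(columns=["IPAddress"], errors="ignore")

def keep_consenting(df):
    # Literal (non-regex) match; missing answers count as non-consenting.
    mask = df["Text / Graphic"].str.contains(CONSENT_TEXT, regex=False, na=False)
    return df[mask]

def keep_finished(df):
    # 'Finished' is assumed to be a boolean column.
    return df[df["Finished"]]

clean = raw.pipe(drop_metadata).pipe(keep_consenting).pipe(keep_finished)
print(clean)
```

Chaining with `DataFrame.pipe` keeps each step a small, separately testable function, which mirrors the one-function-per-cell layout used in the rest of the notebook.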

## Metadata

* **Master**: Master Information Studies: Information Systems (track)
* **University**: University of Amsterdam (UvA)
* **Institute**: Informatics Institute
* **Faculty**: Faculty of Science (FNWI)
* **Research Group**: Digital Interactions Lab (DIL)
* **Student**: BSc Danny de Vries (14495643)
* **Supervisor**: Dr. H. (Hamed) Seiied Alavi PhD

[Viszlab](https://www.viszlab.github.io) © 2024 by [Danny de Vries](https://wwww.github.com/dandevri) is licensed under [CC BY-NC-SA 4.0](http://creativecommons.org/licenses/by-nc-sa/4.0/?ref=chooser-v1).

## Prerequisites

This notebook needs a sufficiently recent Python version (>=3.8) and requires some packages and libraries for analysis and visualization. The following code checks whether your installed Python version is compatible, installs the necessary packages, and imports them into the notebook.

### Check Python installation

In [50]:
from packaging import version
import platform
import sys

min_version = '3.8'

def check_version(min_version):
    current_version = sys.version.split()[0]
    return version.parse(current_version) >= version.parse(min_version)

# Example usage:
if __name__ == "__main__":
    if check_version(min_version):
        print("Running a sufficiently new version of Python.")
        print("Current version: " + platform.python_version())
        print("Minimum required version: " + min_version)
    else:
        print("Python version is too old. Upgrade to a newer version.")

Running a sufficiently new version of Python.
Current version: 3.9.12
Minimum required version: 3.8


### Install the required packages

In [51]:
!pip install pandas
!pip install seaborn
!pip install matplotlib
!pip install numpy



### Import the packages into the project

In [52]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

## Clean-up

### Load the full CSV

In [None]:
def import_csv(file):
    df = pd.read_csv(file)
    return df

file = 'survey-data/survey_data_raw.csv'

full_data = import_csv(file)
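
Before cleaning, it can help to confirm that the loaded export contains the columns the later steps rely on. The following is a small sketch of such a check. The helper `missing_columns` and the `example` DataFrame are hypothetical and stand in for `full_data`; the required names are the two columns filtered on further down.

```python
import pandas as pd

# Columns that the consent and completion filters depend on.
REQUIRED_COLUMNS = ["Finished", "Text / Graphic"]

def missing_columns(df, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from df."""
    return [col for col in required if col not in df.columns]

# Hypothetical stand-in for full_data, deliberately lacking the consent column.
example = pd.DataFrame({
    "Finished": [True],
    "Location": ["On the ground floor (the atrium)"],
})

missing = missing_columns(example)
print("Missing columns:", missing)
```

An empty list means all required columns are present. A non-empty list usually points to a renamed question or a different export setting.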

### Remove irrelevant columns

Remove the metadata columns exported by Qualtrics that are unnecessary for the research, such as the IP address and import ID.

In [54]:
data = pd.read_csv(file)

def remove_metadata_columns(data):
    # List of columns to remove
    metadata_columns = [
        'StartDate', 'EndDate', 'Status', 'IPAddress', 'Progress', 
        'Duration (in seconds)', 'RecordedDate', 'ResponseId', 
        'RecipientLastName', 'RecipientFirstName', 'RecipientEmail', 
        'ExternalReference', 'LocationLatitude', 'LocationLongitude', 
        'DistributionChannel', 'UserLanguage'
    ]
    
    # Drop the metadata columns
    data_cleaned = data.drop(columns=metadata_columns, errors='ignore')
    
    # Print the cleaned DataFrame
    print("Cleaned DataFrame:")
    print(data_cleaned)
    
    return data_cleaned

# Example usage (the function already prints the cleaned DataFrame)
cleaned_data = remove_metadata_columns(data)

Cleaned DataFrame:
   Finished                                     Text / Graphic  \
0      True  Yes, I understand, and I agree to participate ...   
1      True  Yes, I understand, and I agree to participate ...   
2      True  Yes, I understand, and I agree to participate ...   
3      True  Yes, I understand, and I agree to participate ...   
4      True  Yes, I understand, and I agree to participate ...   
5      True  Yes, I understand, and I agree to participate ...   
6      True  Yes, I understand, and I agree to participate ...   
7      True  Yes, I understand, and I agree to participate ...   
8      True  Yes, I understand, and I agree to participate ...   

                                            Location       Activity  \
0  On the first floor (1st floor - in a working s...   1 day a week   
1  On the first floor (1st floor - in a working s...  4 days a week   
2  On the first floor (1st floor - in a working s...  4 days a week   
3                   On the ground fl

### Remove non-consenting users

In [55]:
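def remove_non_consenting_users(data):
    # Keep only respondents who gave consent; match the text literally and
    # treat missing answers as non-consenting.
    consent_text = 'Yes, I understand, and I agree to participate in the survey'
    consented = data['Text / Graphic'].str.contains(consent_text, regex=False, na=False)
    return data[consented]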

data_cleaned = remove_non_consenting_users(cleaned_data)
print(data_cleaned)

   Finished                                     Text / Graphic  \
0      True  Yes, I understand, and I agree to participate ...   
1      True  Yes, I understand, and I agree to participate ...   
2      True  Yes, I understand, and I agree to participate ...   
3      True  Yes, I understand, and I agree to participate ...   
4      True  Yes, I understand, and I agree to participate ...   
5      True  Yes, I understand, and I agree to participate ...   
6      True  Yes, I understand, and I agree to participate ...   
7      True  Yes, I understand, and I agree to participate ...   
8      True  Yes, I understand, and I agree to participate ...   

                                            Location       Activity  \
0  On the first floor (1st floor - in a working s...   1 day a week   
1  On the first floor (1st floor - in a working s...  4 days a week   
2  On the first floor (1st floor - in a working s...  4 days a week   
3                   On the ground floor (the atrium)  2

### Remove non-finished surveys

In [56]:
def remove_unfinished_surveys(data):
    data = data[data['Finished'] == True]    
    return data

data_final = remove_unfinished_surveys(data_cleaned)
print(data_final)

   Finished                                     Text / Graphic  \
0      True  Yes, I understand, and I agree to participate ...   
1      True  Yes, I understand, and I agree to participate ...   
2      True  Yes, I understand, and I agree to participate ...   
3      True  Yes, I understand, and I agree to participate ...   
4      True  Yes, I understand, and I agree to participate ...   
5      True  Yes, I understand, and I agree to participate ...   
6      True  Yes, I understand, and I agree to participate ...   
7      True  Yes, I understand, and I agree to participate ...   
8      True  Yes, I understand, and I agree to participate ...   

                                            Location       Activity  \
0  On the first floor (1st floor - in a working s...   1 day a week   
1  On the first floor (1st floor - in a working s...  4 days a week   
2  On the first floor (1st floor - in a working s...  4 days a week   
3                   On the ground floor (the atrium)  2

### Remove consent and finished columns

In [57]:
def remove_columns(data):
    data_cleaned = data.drop(columns=['Finished', 'Text / Graphic'])
    return data_cleaned

data_final_cleaned = remove_columns(data_final)
print(data_final_cleaned)

                                            Location       Activity  \
0  On the first floor (1st floor - in a working s...   1 day a week   
1  On the first floor (1st floor - in a working s...  4 days a week   
2  On the first floor (1st floor - in a working s...  4 days a week   
3                   On the ground floor (the atrium)  2 days a week   
4  On the first floor (1st floor - in a working s...   1 day a week   
5                   On the ground floor (the atrium)  3 days a week   
6  On the second floor (2th floor - in a working ...  5 days a week   
7  On the second floor (2th floor - in a working ...  4 days a week   
8  On the first floor (1st floor - in a working s...   1 day a week   

         Occupancy Indoor Air Quality_1 Perceived_1  \
0      Not crowded              Neutral        Good   
1          Crowded           Very aware        Poor   
2  Not too crowded                Aware        Good   
3  Not too crowded              Unaware        Good   
4      Not cro

### Export the clean CSV to disk


In [58]:
def export_to_csv(data, filename):
    data.to_csv(filename, index=False)

export_to_csv(data_final_cleaned, 'survey-data/survey_data_clean.csv')
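
After exporting, a quick round-trip check can confirm that the file on disk matches the DataFrame in memory. The sketch below is illustrative only: it writes a hypothetical stand-in for `data_final_cleaned` to a temporary directory instead of `survey-data/`, then compares shape and column names after reading it back.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for data_final_cleaned.
df = pd.DataFrame({
    "Location": ["On the ground floor (the atrium)"],
    "Activity": ["2 days a week"],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "survey_data_clean.csv")
    # index=False matches the export call used in this notebook.
    df.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

same_shape = reloaded.shape == df.shape
same_columns = list(reloaded.columns) == list(df.columns)
print("Shape preserved:", same_shape)
print("Columns preserved:", same_columns)
```

Writing with `index=False` matters here: without it, the row index would be saved as an extra unnamed column and the reloaded shape would no longer match.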