<h1>NOTE!</h1>
This notebook shows how to read the <b>MERGED</b> dataset, in which all the annotations are stored in a single sheet. If you are working with the IAA or individual datasets separately, use the other notebook.

<h1>Working with the dataset</h1>
Use the following code to load the dataset, which is stored in Excel format. It contains a sheet with all struggles, demographics, annotations, and annotator information.

<b>Note:</b> every list-valued field in the resulting dataframe is stored as a single string whose items are joined by a custom "###" separator. The usual separators (. , / ; : etc.) could not be used because the crowdworkers used them when writing the struggles.

In [1]:
import pandas as pd
import itertools

In [2]:
dataset_path = '/path/to/dataset.xlsx'

In [3]:
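# reading the 'DATASET' sheet of the Excel file;
# keep_default_na=False stops pandas from turning empty cells or strings such as 'NA' into NaN
dataset = pd.read_excel(dataset_path, sheet_name='DATASET', keep_default_na=False)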

In [4]:
# splitting list-valued columns on the " ### " separator
for col in dataset.columns:
    # only string columns can contain the separator
    if isinstance(dataset[col].iloc[0], str):
        if dataset[col].str.contains(" ### ").any():
            dataset[col] = dataset[col].str.split(" ### ")
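
To illustrate what the splitting step does, here is a minimal sketch on a toy dataframe. The column name and the cell contents are made up for illustration and are not taken from the dataset. It shows that ordinary punctuation inside an item survives the split, and that a cell holding only one item becomes a one-element list.

```python
import pandas as pd

# Hypothetical toy data: list-valued cells serialized with " ### "
toy = pd.DataFrame({
    "candidates": [
        "Do you mean A, or B? ### So, are you saying C; D?",
        "Only one item",
    ]
})

# Same call as in the loop above: split every cell on the custom separator
toy["candidates"] = toy["candidates"].str.split(" ### ")

first_cell = toy["candidates"].iloc[0]
second_cell = toy["candidates"].iloc[1]

print(first_cell)   # commas, semicolons and question marks are left untouched
print(second_cell)  # a single item is wrapped in a list as well
```

Because every cell of a split column ends up as a list, code that iterates over such a column can treat all rows the same way.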

<h3>Playing a bit with the dataset</h3>

In [5]:
#accessing all struggles
dataset['struggle']

0       When dieting I often find it hard to track my ...
1       Saying no to alcohol in social settings. I usu...
2       Healthy food is expensive and earning a middle...
3       Working out is hard for me because I'm used to...
4       When I see pizza I always want to buy and I en...
                              ...                        
2415    I love eating chocolate in almost every form. ...
2416    When I am about to have my cycle I struggle to...
2417    When I'm around strangers I am getting very se...
2418    I can’t seem to stick with a routine and have ...
2419    I hate cooking and never have the patience to ...
Name: struggle, Length: 2420, dtype: object

In [6]:
# accessing the FIRST struggle the FIRST annotator worked with
ann1 = dataset[dataset['annotator']=='1']
# .iloc[0] selects by position, so it works whatever index labels the filtered rows carry
ann1['struggle'].iloc[0]

'When dieting I often find it hard to track my calories. This is because of the fact that not many food packages include this information and trying to track every calorie is a tedious task. It also does not help when trying to maintain a calorie surplus as all calories need to be tracked.'
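
A filtered dataframe keeps the index labels of the original rows, so looking up label `0` only works when the row labelled 0 belongs to the selected annotator. The sketch below uses a made-up toy frame (the values are hypothetical) to show how positional access with `.iloc` behaves for a subset that does not contain label 0.

```python
import pandas as pd

# Hypothetical toy frame; the real dataset has many more rows and columns
toy = pd.DataFrame({
    "annotator": ["1", "2", "2", "1"],
    "struggle": ["s0", "s1", "s2", "s3"],
})

# Rows of annotator '2' keep their original labels 1 and 2
ann2 = toy[toy["annotator"] == "2"]

# Positional access returns the first row of the subset
first_by_position = ann2["struggle"].iloc[0]
print(first_by_position)

# Label 0 is not part of this subset, so ann2["struggle"][0] would raise a KeyError
print(0 in ann2.index)
```

For the first annotator the two forms return the same struggle here because row 0 is part of that annotator's subset; for other annotators the positional form is the safer choice.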

In [7]:
# accessing the 10 reflections that ChatGPT generated for the first struggle
for reflection in ann1['reflection_candidates'].iloc[0]:
    print(f'- {reflection}\n')

- So, do you mean that tracking your calorie intake is difficult because food packaging does not always have the information?

- So, are you saying that it can be tedious to track every calorie consumed?

- So, do you mean that you find it challenging to maintain a calorie surplus while dieting?

- So, do you mean that tracking calories is important to you while dieting?

- So, do you mean that the lack of calorie information on food packaging makes it hard for you to track your intake?

- Do you mean that trying to track every calorie consumed is a tedious task?

- Are you saying that it is difficult to maintain a calorie surplus while dieting because of the effort required to track calories?

- Do you mean that you find it difficult to track your calorie intake due to the lack of information on food packaging?

- Do you mean that the effort required to track every calorie consumed can make dieting more difficult?

-  So, do you mean that the lack of calorie information on food packag

In [8]:
# printing the first struggle, then the reflections and how the annotator marked them
struggle = ann1['struggle'].iloc[0]
reflections = ann1['reflection_candidates'].iloc[0]
refl_annotations = ann1['reflection_annotation'].iloc[0]

print(f'{struggle}\n\n')

# an annotation of "Y" is shown as ✅, anything else as ❌
for index, (refl, ann) in enumerate(zip(reflections, refl_annotations), start=1):
    print(f'{index}) {refl} \n   ➡️ {"✅" if ann == "Y" else "❌"}\n')

When dieting I often find it hard to track my calories. This is because of the fact that not many food packages include this information and trying to track every calorie is a tedious task. It also does not help when trying to maintain a calorie surplus as all calories need to be tracked.


1) So, do you mean that tracking your calorie intake is difficult because food packaging does not always have the information? 
   ➡️ ✅

2) So, are you saying that it can be tedious to track every calorie consumed? 
   ➡️ ✅

3) So, do you mean that you find it challenging to maintain a calorie surplus while dieting? 
   ➡️ ✅

4) So, do you mean that tracking calories is important to you while dieting? 
   ➡️ ✅

5) So, do you mean that the lack of calorie information on food packaging makes it hard for you to track your intake? 
   ➡️ ✅

6) Do you mean that trying to track every calorie consumed is a tedious task? 
   ➡️ ✅

7) Are you saying that it is difficult to maintain a calorie surplus while di