# Topics of Sociability and Solidarity in Recollections of those who worked and who did not work in Auschwitz-Birkenau



### Research Question:

Did solidarity and sociability thrive more among victims who were forced to work than among those who did not work? Do victims who worked speak more about solidarity and sociability?

### Further point to consider:

 - The majority of victims did not work

In [248]:
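import json
import os
import constants
# Locate the project root (the parent of the current working directory)
path = os.getcwd()
parent = os.path.abspath(os.path.join(path, os.pardir))
# Load the interviewee partitions (keys used below: 'complete', 'work', 'notwork')
with open(parent+'/'+constants.output_data_segment_keyword_matrix + "metadata_partitions.json") as read_file:
    metadata_partitions = json.load(read_file)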
        
total_number_of_persons = len(metadata_partitions['complete'])

Percentage of those who did not work

In [249]:
(len(metadata_partitions['notwork']) / total_number_of_persons)*100

76.10138805069403

Percentage of those who worked

In [250]:
(len(metadata_partitions['work']) / total_number_of_persons)*100

23.898611949305977

## Load the relevant data

Load the libraries needed to work with the data

In [251]:
import constants
import pandas as pd
import os
from IPython.display import display

Set up the paths to data

Two different datasets were created. In Dataset 1, the topic 'social bonds' includes 'friendship', and the topic 'aid giving' includes 'food sharing'. In Dataset 2, 'friendship' and 'food sharing' are treated as separate topics.

First, load Dataset 1

In [252]:
input_directory = constants.output_data_markov_modelling

path = os.getcwd()
parent = os.path.abspath(os.path.join(path, os.pardir))
input_directory = parent +'/'+ constants.output_data_markov_modelling

In [253]:
p_work_dataset_1 = pd.read_csv(input_directory+'work'+'/'+'stationary_probs.csv')

p_not_work_dataset_1 = pd.read_csv(input_directory+'notwork'+'/'+'stationary_probs.csv')

input_directory = parent +'/'+ constants.output_data_report_statistical_analysis
input_file = 'strength_of_association_odds_ratio_work_notwork.csv'
df_fisher_dataset_1 = pd.read_csv(input_directory+input_file)

Second, load Dataset 2

In [254]:
input_directory = 'data/output_aid_giving_sociability_expanded/output/markov_modelling/'

path = os.getcwd()
parent = os.path.abspath(os.path.join(path, os.pardir))
input_directory = parent +'/'+ input_directory 

In [255]:
p_work_dataset_2 = pd.read_csv(input_directory+'work'+'/'+'stationary_probs.csv')

p_not_work_dataset_2 = pd.read_csv(input_directory+'notwork'+'/'+'stationary_probs.csv')

input_directory = "data/output_aid_giving_sociability_expanded/output/reports_statistical_analysis/"
input_file = 'strength_of_association_men_women_odds_ratio.csv'
df_fisher_data_2 = pd.read_csv(parent +'/'+input_directory+input_file)


## Observation 1

## Qualitative description

There is no significant difference in sociability between the working and the non-working populations.

## Quantitative proof

In [257]:
social_bonds_working = p_work_dataset_1[p_work_dataset_1.topic_name=='social bonds']['stationary_prob'].values[0]
social_bonds_not_working = p_not_work_dataset_1[p_not_work_dataset_1.topic_name=='social bonds']['stationary_prob'].values[0]

In [258]:
social_bonds_working / social_bonds_not_working

0.8986194393949368

todo: add plot

In [259]:
friends_working = p_work_dataset_2[p_work_dataset_2.topic_name=='friends']['stationary_prob'].values[0]
friends_not_working = p_not_work_dataset_2[p_not_work_dataset_2.topic_name=='friends']['stationary_prob'].values[0]

In [260]:
friends_working / friends_not_working

0.8948539317992287
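The lookup-and-divide pattern used in the cells above recurs for every topic. A minimal sketch of a helper that wraps it is given here; the name `stationary_ratio` is hypothetical, and the toy frames contain made-up numbers that only mimic the `topic_name` and `stationary_prob` columns of the real files.

```python
import pandas as pd


def stationary_ratio(p_a, p_b, topic):
    """Ratio of a topic's stationary probability in p_a to that in p_b."""
    prob_a = p_a.loc[p_a.topic_name == topic, 'stationary_prob'].values[0]
    prob_b = p_b.loc[p_b.topic_name == topic, 'stationary_prob'].values[0]
    return prob_a / prob_b


# ASSUMPTION: toy frames with invented values, used only to show the call.
toy_work = pd.DataFrame({'topic_name': ['friends', 'Appell'],
                         'stationary_prob': [0.02, 0.04]})
toy_not_work = pd.DataFrame({'topic_name': ['friends', 'Appell'],
                             'stationary_prob': [0.04, 0.04]})

toy_ratio = stationary_ratio(toy_work, toy_not_work, 'friends')
print(toy_ratio)
```

With the frames loaded in this notebook, the call would presumably be `stationary_ratio(p_work_dataset_2, p_not_work_dataset_2, 'friends')`. A ratio below 1 means the topic carries less stationary weight among those who worked.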

todo: add plot

### Comparison with results of Fisher test

In [261]:
display(df_fisher_dataset_1[df_fisher_dataset_1.topic_word=="social bonds"])

| Unnamed: 0.1 | Unnamed: 0 | topic_word | p_value | work | notwork | count_work | count_notwork | significance_Bonferroni_corrected | significance |
|---|---|---|---|---|---|---|---|---|---|
| 26 | 97 | social bonds | 2.433245e-27 | 2.143883 | 0.466443 | 434 | 755 | True | True |
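The `work` and `notwork` values in this row appear to be odds ratios computed from the two counts. The sketch below reproduces them in plain Python as a sanity check. The group sizes 1584 and 5044 are not stated in the notebook; they are an assumption inferred from the counts and the shares printed in the cells that follow. The function name `odds_ratio` is likewise illustrative.

```python
def odds_ratio(count_a, total_a, count_b, total_b):
    """Odds of mentioning a topic in group A divided by the odds in group B."""
    odds_a = count_a / (total_a - count_a)
    odds_b = count_b / (total_b - count_b)
    return odds_a / odds_b


# Counts for 'social bonds', taken from the Fisher test row.
count_work, count_notwork = 434, 755
# ASSUMPTION: group sizes inferred from the printed shares, not given directly.
n_work, n_notwork = 1584, 5044

or_work = odds_ratio(count_work, n_work, count_notwork, n_notwork)
share_work = count_work / n_work
share_notwork = count_notwork / n_notwork

print(or_work, 1 / or_work)       # compare with the 'work' and 'notwork' columns
print(share_work, share_notwork)  # shares of each group mentioning the topic
```

The odds ratio measures the relative difference between the two groups, whereas the shares show how common the topic is within each group in absolute terms; the two can point in different directions when both shares are small.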


The Fisher test signals a significant difference between those who worked and those who did not work, which is inconsistent with the results derived from the Markov framework. The following figures explain the discrepancy:

In [262]:
print (df_fisher_dataset_1[df_fisher_dataset_1.topic_word=="social bonds"].count_work.values[0]/len(metadata_partitions['work']))

0.273989898989899


In [263]:
print (df_fisher_dataset_1[df_fisher_dataset_1.topic_word=="social bonds"].count_notwork.values[0]/len(metadata_partitions['notwork']))

0.14968279143536875


The share of those who worked and discuss social bonds (about 27%) is almost double the share of those who did not work and discuss social bonds (about 15%). The Fisher test derives significance by comparing these two shares, so it flags the difference; in absolute terms, however, 27% is still a modest share. This becomes even more evident from the following calculations, which compare 'social bonds' with the topic 'Appell' within each group:

In [264]:
social_bonds_working / p_work_dataset_1[p_work_dataset_1.topic_name=='Appell']['stationary_prob'].values[0]

1.0892829568624707

In [265]:
social_bonds_not_working / p_not_work_dataset_1[p_not_work_dataset_1.topic_name=='Appell']['stationary_prob'].values[0]

0.8723999676838027

## Observation 2

## Qualitative description

Those who worked are more likely to discuss acts of solidarity.

## Quantitative proof

In [266]:
aid_giving_working =  p_work_dataset_1[p_work_dataset_1.topic_name=='aid giving']['stationary_prob'].values[0]
aid_giving_not_working = p_not_work_dataset_1[p_not_work_dataset_1.topic_name=='aid giving']['stationary_prob'].values[0]

In [267]:
aid_giving_working / aid_giving_not_working

1.4556193822049726

todo: add plot

In [268]:
food_sharing_working =  p_work_dataset_2[p_work_dataset_2.topic_name=='food sharing']['stationary_prob'].values[0]
food_sharing_not_working = p_not_work_dataset_2[p_not_work_dataset_2.topic_name=='food sharing']['stationary_prob'].values[0]

In [269]:
food_sharing_working / food_sharing_not_working

1.2883508253530562

todo: add plot

### Comparison with results of Fisher test

In [270]:
display(df_fisher_dataset_1[df_fisher_dataset_1.topic_word=="aid giving"])

| Unnamed: 0.1 | Unnamed: 0 | topic_word | p_value | work | notwork | count_work | count_notwork | significance_Bonferroni_corrected | significance |
|---|---|---|---|---|---|---|---|---|---|
| 5 | 8 | aid giving | 5.124102e-68 | 2.895268 | 0.345391 | 731 | 1152 | True | True |


In [271]:
print (df_fisher_dataset_1[df_fisher_dataset_1.topic_word=="aid giving"].count_work.values[0]/len(metadata_partitions['work']))

0.461489898989899


In [272]:
print (df_fisher_dataset_1[df_fisher_dataset_1.topic_word=="aid giving"].count_notwork.values[0]/len(metadata_partitions['notwork']))

0.22839016653449642


Almost half of those who worked (about 46%) discuss aid giving, compared with about 23% of those who did not work. This explains why the Markov framework also signalled a difference for this topic.