# Topics of Sociability and Solidarity in Recollections of Men and Women Who Worked in Birkenau



### Research Question:

Did solidarity and sociability thrive more among women who were forced to work than among men who were forced to work? Can we observe the same trend in the working population as in the entire population (i.e. women discuss solidarity and sociability more than men do)?

### Further points to consider:

 - The majority of women and men did not work
 - More or less the same percentage of women and men worked

In [1]:
import json
import os
import constants
path = os.getcwd()
parent = os.path.abspath(os.path.join(path, os.pardir))
with open(parent+'/'+constants.output_data_segment_keyword_matrix + "metadata_partitions.json") as read_file:
        metadata_partitions = json.load(read_file)
        
total_number_of_persons = len(metadata_partitions['complete'])

Percentage of those women who worked

In [2]:
(len(metadata_partitions['work_w']) / len(metadata_partitions['complete_w'])*100)

23.025027203482047

Percentage of those men who worked

In [3]:
(len(metadata_partitions['work_m']) / len(metadata_partitions['complete_m'])*100)

25.87309394982784

- In Auschwitz-Birkenau, people were assigned to different types of work. It is believed that women tended to do easier jobs; can this explain why they showed more solidarity and sociability towards each other? For instance, someone who worked as a translator had more energy to be social than someone who worked on road construction, and someone who worked in the kitchen was more likely to share food. Nevertheless, broadly similar percentages of women and men did each type of work in our data set. The types of forced labour that victims discuss were divided into three categories (easy, medium, hard); the comparison below is normalized to the total population of men and of women in the data. (The percentages do not add up to the overall share of those who worked, since one person could report several types of forced labour.)

Percentage of those men who did hard work:

In [4]:
(len(metadata_partitions['hard_m']) / len(metadata_partitions['complete_m'])*100)

14.067879980324644

Percentage of those women who did hard work

In [5]:
(len(metadata_partitions['hard_w']) / len(metadata_partitions['complete_w'])*100)

8.857453754080522

Percentage of those men who did medium-hard work:

In [6]:
(len(metadata_partitions['medium_m']) / len(metadata_partitions['complete_m'])*100)

14.559763895720609

Percentage of those women who did medium-hard work

In [7]:
(len(metadata_partitions['medium_w']) / len(metadata_partitions['complete_w'])*100)

16.517954298150165

Percentage of women who did easy work

In [8]:
(len(metadata_partitions['easy_w']) / len(metadata_partitions['complete_w'])*100)

1.6974972796517953

Percentage of men who did easy work

In [9]:
(len(metadata_partitions['easy_m']) / len(metadata_partitions['complete_m'])*100)

2.6069847515986226
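
The per-category percentages above all follow the same pattern: the size of a partition divided by the size of the complete population of the same sex. A minimal sketch of that pattern as one helper is given below. The function name `share_of_population` and the toy dictionary are assumptions made for illustration; the toy IDs are invented and are not taken from `metadata_partitions.json`.

```python
def share_of_population(partitions, group, sex):
    """Percentage of people in '<group>_<sex>' relative to 'complete_<sex>'."""
    subset = partitions[f"{group}_{sex}"]
    population = partitions[f"complete_{sex}"]
    return len(subset) / len(population) * 100


# Toy stand-in for metadata_partitions (hypothetical IDs, illustration only).
toy_partitions = {
    "complete_w": ["w1", "w2", "w3", "w4"],
    "complete_m": ["m1", "m2", "m3", "m4", "m5"],
    "work_w": ["w1"],
    "work_m": ["m1", "m2"],
    "hard_w": ["w1"],
    "hard_m": ["m1"],
}

# Print one line per partition instead of one notebook cell per partition.
for group in ("work", "hard"):
    for sex in ("w", "m"):
        share = share_of_population(toy_partitions, group, sex)
        print(f"{group}_{sex}: {share:.2f}%")
```

With the real `metadata_partitions` dictionary, the same loop could cover `work`, `easy`, `medium` and `hard`, assuming the key naming seen in the cells above.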

## Load the relevant data

Load the libraries needed to work with the data

In [10]:
import constants
import pandas as pd
import os
from IPython.display import display

Set up the paths to data

Two different datasets were created. In Dataset 1, the topic 'social bonds' includes 'friendship'; similarly, the topic 'aid giving' includes 'food sharing'. In Dataset 2, 'friendship' and 'food sharing' are separated out as topics of their own.

First, load Dataset 1

In [11]:
input_directory = constants.output_data_markov_modelling

path = os.getcwd()
parent = os.path.abspath(os.path.join(path, os.pardir))
input_directory = parent +'/'+ constants.output_data_markov_modelling

In [12]:
p_work_m_dataset_1 = pd.read_csv(input_directory+'work_m'+'/'+'stationary_probs.csv')

p_work_w_dataset_1 = pd.read_csv(input_directory+'work_w'+'/'+'stationary_probs.csv')

input_directory = parent +'/'+ constants.output_data_report_statistical_analysis
input_file = 'strength_of_association_odds_ratio_work_m_work_w.csv'
df_fisher_dataset_1 = pd.read_csv(input_directory+input_file)

Second, load Dataset 2

In [13]:
input_directory = 'data/output_aid_giving_sociability_expanded/markov_modelling/'

path = os.getcwd()
parent = os.path.abspath(os.path.join(path, os.pardir))
input_directory = parent +'/'+ input_directory 

In [14]:
p_work_m_dataset_2 = pd.read_csv(input_directory+'work_m'+'/'+'stationary_probs.csv')

p_work_w_dataset_2 = pd.read_csv(input_directory+'work_w'+'/'+'stationary_probs.csv')

input_directory = "data/output_aid_giving_sociability_expanded/reports_statistical_analysis/"
input_file = 'strength_of_association_odds_ratio_work_m_work_w.csv'
df_fisher_data_2 = pd.read_csv(parent +'/'+input_directory+input_file)

In [15]:
input_directory

'data/output_aid_giving_sociability_expanded/reports_statistical_analysis/'

### Use menstruation as a checkpoint

In [37]:
mens_m = p_work_m_dataset_1[p_work_m_dataset_1.topic_name=='menstruation']['stationary_prob']
print (mens_m)

Series([], Name: stationary_prob, dtype: float64)


In [38]:
# .values[0] extracts the scalar probability from the one-row selection
mens_w = p_work_w_dataset_1[p_work_w_dataset_1.topic_name=='menstruation']['stationary_prob'].values[0]
print (mens_w)

0.003090445261359345
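
The checkpoint above relies on the topic being absent from the men's table, in which case the selection is an empty Series and `.values[0]` would raise an error. A minimal sketch of a lookup that handles the missing case explicitly is given below. The helper names `stationary_prob` and `relative_likelihood`, as well as the toy tables and their values, are assumptions for illustration; only the column names `topic_name` and `stationary_prob` come from the notebook.

```python
import pandas as pd


def stationary_prob(df, topic):
    """Return the stationary probability of `topic`, or None if it is absent."""
    rows = df.loc[df["topic_name"] == topic, "stationary_prob"]
    return None if rows.empty else float(rows.iloc[0])


def relative_likelihood(df_a, df_b, topic):
    """Ratio of the stationary probabilities of `topic` in df_a and df_b."""
    p_a = stationary_prob(df_a, topic)
    p_b = stationary_prob(df_b, topic)
    if p_a is None or p_b is None or p_b == 0:
        return None  # the ratio is undefined when the topic is missing
    return p_a / p_b


# Toy tables with invented probabilities (not the values from the data set).
toy_w = pd.DataFrame({"topic_name": ["menstruation", "social bonds"],
                      "stationary_prob": [0.003, 0.012]})
toy_m = pd.DataFrame({"topic_name": ["social bonds"],
                      "stationary_prob": [0.010]})

print(stationary_prob(toy_m, "menstruation"))  # topic missing for men
ratio = relative_likelihood(toy_w, toy_m, "social bonds")
print(f"ratio: {ratio:.2f}, i.e. {(ratio - 1) * 100:.0f}% more likely")
```

The same ratio underlies the "more likely" statements in the observations: a ratio of 1.21, for example, reads as roughly 21% more likely.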


## Observation 1

## Qualitative description

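Women who worked are more likely to discuss social bonds and friendship than men who worked (by about 21% and 91%, respectively). The Fisher test supports the difference for friendship (p ≈ 0.014, though not after Bonferroni correction), but not for social bonds, where the difference is not significant (p ≈ 0.28).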

## Quantitative proof

In [18]:
social_bonds_working_w = p_work_w_dataset_1[p_work_w_dataset_1.topic_name=='social bonds']['stationary_prob'].values[0]
social_bonds_working_m = p_work_m_dataset_1[p_work_m_dataset_1.topic_name=='social bonds']['stationary_prob'].values[0]

In [19]:
social_bonds_working_w / social_bonds_working_m

1.2096491819207695

![title](output/markov_modelling/bootstrap/work_w_work_m/social%20bonds.png)

In [20]:
friends_w = p_work_w_dataset_2[p_work_w_dataset_2.topic_name=='friends']['stationary_prob'].values[0]
friends_m = p_work_m_dataset_2[p_work_m_dataset_2.topic_name=='friends']['stationary_prob'].values[0]

In [21]:
friends_w / friends_m

1.9104007386126982

![title](output_aid_giving_sociability_expanded/markov_modelling/bootstrap/work_w_work_m/friends.png) 

### Comparison with results of Fisher test

In [22]:
display(df_fisher_data_2[df_fisher_data_2.topic_word=="friends"])

| Unnamed: 0.1 | Unnamed: 0 | topic_word | p_value | work_m | work_w | count_work_m | count_work_w | significance_Bonferroni_corrected | significance |
|---|---|---|---|---|---|---|---|---|---|
| 19 | 53 | friends | 0.014452 | 0.574974 | 1.739209 | 27 | 91 | False | True |


In [23]:
display(df_fisher_dataset_1[df_fisher_dataset_1.topic_word=="social bonds"])

| Unnamed: 0.1 | Unnamed: 0 | topic_word | p_value | work_m | work_w | count_work_m | count_work_w | significance_Bonferroni_corrected | significance |
|---|---|---|---|---|---|---|---|---|---|
| 56 | 97 | social bonds | 0.282273 | 0.876451 | 1.140965 | 135 | 299 | False | False |


The Fisher test does not signal a significant difference in terms of social bonds (p ≈ 0.28).
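
For readers unfamiliar with the test, the following is a minimal sketch of a two-sided Fisher exact test and the odds ratio on a 2x2 table (topic mentioned or not, by working women versus working men). It is a plain-Python illustration, not the code that produced the CSV files loaded above; the function name and the toy counts are assumptions. In practice a library routine such as `scipy.stats.fisher_exact` would normally be used; whether this project used it is not stated here.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Returns (odds_ratio, p_value). The p-value sums the probabilities of all
    tables with the same margins that are at most as likely as the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)  # small tolerance for float ties
    )
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(p_value, 1.0)


# Hypothetical counts: rows = women / men, columns = topic mentioned / not.
odds, p = fisher_exact_two_sided(12, 38, 5, 45)
print(f"odds ratio = {odds:.3f}, p-value = {p:.4f}")
```

An odds ratio above 1 in this layout means the topic is over-represented in the first row (here, women); the p-value indicates whether that imbalance could plausibly arise by chance given the margins.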

## Comment:

### Tim:
- That's due to the p-value > 0.05, right? That's a bit difficult to marry with the Bayesian statistics that I'm doing with the MSMs... we can discuss it in the paper maybe.
- It is quite possible that the error estimate from the MSM is not perfect (I've no idea if the same is true for the Fisher test).

### Gabor:

- that is correct, because of p-value > 0.05

## Observation 2

## Qualitative description

Women who worked are less likely to discuss acts of solidarity (aid giving) than men who worked: working men are about 13% more likely to discuss them. The Fisher test does not support this result; it signals the contrary (an odds ratio of about 1.24 in favour of working women, p ≈ 0.048, not significant after Bonferroni correction). Nevertheless, working women are about 62% more likely to talk about food sharing. This is again not supported by the Fisher test, which shows no statistical significance (p ≈ 0.12).

## Quantitative proof

In [24]:
aid_giving_w = p_work_w_dataset_1[p_work_w_dataset_1.topic_name=='aid giving']['stationary_prob'].values[0]
aid_giving_m = p_work_m_dataset_1[p_work_m_dataset_1.topic_name=='aid giving']['stationary_prob'].values[0]

In [25]:
aid_giving_m / aid_giving_w

1.1314227034370872

![title](output/markov_modelling/bootstrap/work_w_work_m/aid%20giving.png)

In [26]:
food_sharing_w = p_work_w_dataset_2[p_work_w_dataset_2.topic_name=='food sharing']['stationary_prob'].values[0]
food_sharing_m = p_work_m_dataset_2[p_work_m_dataset_2.topic_name=='food sharing']['stationary_prob'].values[0]

In [27]:
food_sharing_w / food_sharing_m

1.618948872767577

![title](output_aid_giving_sociability_expanded/markov_modelling/bootstrap/work_w_work_m/food%20sharing.png) 

### Comparison with results of Fisher test

In [28]:
display(df_fisher_dataset_1[df_fisher_dataset_1.topic_word=="aid giving"])

| Unnamed: 0.1 | Unnamed: 0 | topic_word | p_value | work_m | work_w | count_work_m | count_work_w | significance_Bonferroni_corrected | significance |
|---|---|---|---|---|---|---|---|---|---|
| 26 | 8 | aid giving | 0.047747 | 0.806092 | 1.240553 | 224 | 507 | False | True |


In [29]:
display(df_fisher_data_2[df_fisher_data_2.topic_word=="food sharing"])

| Unnamed: 0.1 | Unnamed: 0 | topic_word | p_value | work_m | work_w | count_work_m | count_work_w | significance_Bonferroni_corrected | significance |
|---|---|---|---|---|---|---|---|---|---|
| 38 | 49 | food sharing | 0.121363 | 0.736007 | 1.358683 | 41 | 109 | False | False |
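
The Fisher tables report two flags, `significance` and `significance_Bonferroni_corrected`. A minimal sketch of how such flags are commonly derived is given below: the uncorrected flag compares the p-value with the threshold alpha, and the Bonferroni flag compares it with alpha divided by the number of tests. The p-values are the ones reported in the tables of this notebook; the number of tests `N_TESTS` is a hypothetical value chosen for illustration, since the actual number of topics tested is not given here, and the function name is likewise an assumption.

```python
def significance_flags(p_value, n_tests, alpha=0.05):
    """Return (uncorrected, Bonferroni-corrected) significance flags."""
    return p_value < alpha, p_value < alpha / n_tests


N_TESTS = 100  # hypothetical number of topics tested, illustration only

# p-values as reported in the Fisher test tables of this notebook
p_values = {
    "friends": 0.014452,
    "social bonds": 0.282273,
    "aid giving": 0.047747,
    "food sharing": 0.121363,
}

for topic, p in p_values.items():
    plain, corrected = significance_flags(p, N_TESTS)
    print(f"{topic}: significant={plain}, Bonferroni={corrected}")
```

Because the corrected threshold shrinks as the number of tests grows, a topic such as 'friends' can be significant at the 0.05 level and still fail the corrected criterion.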


## Comment:

### Tim:
- I'm just observing that maybe there's a connection with the order of magnitude of the probability: there seem to be large p-values in the Fisher test for topics that are very rare or have stationary probabilities on the order of 0.001, which makes some sense I guess.

## Interpretation

Working women are more likely than working men to discuss social bonds, friendship, and food sharing. In this sense, working women and working men follow the general trend (women are more likely to discuss social activity and food sharing). Nevertheless, working women are less likely to address aid giving, which deviates from the general trend (in the entire population, women are slightly more likely to address aid giving).
