# Sentiment analysis

## Analyses
1. ~~Simple keyword analysis~~
    * ~~tag cloud / bar chart visualisation~~
2. ~~Keyword analysis using word embeddings to conflate synonyms~~
    * Visualization of 2.?
3. ~~Sentiment analysis, e.g. pos/neg/neutral etc.~~
4. Compare different cohorts, e.g. based on career stage, geography etc. (if we have this data).
   * ~~Need to prioritise stratifications~~
5. Clustering of keywords, possibly using Carrot2 or similar?
6. Correlation analysis to see how responses are related
   * e.g. does working longer hours correlate with greater negative sentiment?


## TODO

* ~~Load dataframe~~
* ~~Apply basic sentiment analysis to column 1~~
  * ~~do manual check~~
* ~~Apply SA to all columns~~
* ~~calculate average sentiment for each column~~
    * Experiment with variations on the VADER output, e.g. use other sentiment values (e.g. pos/neg)
* Programmatically iterate over all columns
* Test different SA algos
  * e.g. https://realpython.com/python-nltk-sentiment-analysis/
  * Can't really use ML without training data


## Prioritised stratifications
1. Geography
   * low income
   * low/middle
   * high
2. Gender
   * male
   * female
3. Role
   * student/intern/trainee
   * technician/assistant
   * embryologist/technologist/clinical scientist/andrologist/freelance
   * chief/lead/senior embryologist
   * directors/managers
   * others

## SA hypotheses:
* Type of unit: Public sector more positive than private?
* Gender: no difference expected
* Geography: higher income more negative?
* Role: more senior is more negative?

## Import data, clean & normalise

In [2]:
import pandas as pd

In [57]:
# read the CSV; na_filter=False keeps empty cells as empty strings instead of NaN
df = pd.read_csv('Open Ended Questions-Table 1.csv', na_filter=False)

In [58]:
df

Unnamed: 0,What are the TWO major challenges faced by embryologists in the workplace?,Unnamed: 1,"What are the TWO major challenges faced by the embryology profession, in general?",Unnamed: 3,Provide TWO suggestions to improve embryologists' working conditions,Unnamed: 5,What is your career goal?
0,1,2,1,2,1,2,Open-Ended Response
1,Equality with clinical members,Poor pay,Shortage of trained staff,Equality,Improve number,Improve pay,Continue
2,Bullying by colleagues and managers,"Poorly designed protocols, technical ignorance...",Stress of delivering high quality work without...,Presence of narcissistic individuals destroyin...,Screening for sociopathic personality traits n...,Clinics must be staffed so that there is a goo...,To survive until retirement without suffering ...
3,Working hours and too many responsibilities,Salary,Access to training,Certification,Add reading and projects into basic workload n...,Understand for how many hours a brain can be f...,Get my ESHRE certification Publish paper thro...
4,burnout/stress,poor management,not enough highly trained staff,politics,better pay,better CPD opportunities,FRCPath
...,...,...,...,...,...,...,...
1252,low salaries,weekend and holiday work,low salaries,weekend and holiday work,increase salaries,close over christmas,Dont make a big mistake
1253,Low salaries,Weekend and holiday work,Low salary,Weekend and holiday work,Higher salaries,Close over Christmas holidays,Make babies and make people happy.
1254,Respect from other departments for the complex...,Financial compensation for overtime etc.,Financial participation: Owning shares,lack of respect for the work we do by doctors,Pay them!,Invite financial incentives to the embryologis...,I reached the glass ceiling. All I can do is h...
1255,limited staff,doctors,Salaries,Limit work opportunities,Better salaries,Flexible working hours,Laboratory director


In [59]:
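# create a dict mapping the short column names to the long (question text) names;
# the "b" columns are unnamed in the CSV, so they reuse the question text of the "a" column
long_col_names = {'Q1a': df.columns[0], 'Q1b': df.columns[0], 'Q2a': df.columns[2], 'Q2b': df.columns[2],
                  'Q3a': df.columns[4], 'Q3b': df.columns[4], 'Q4': df.columns[6]}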

In [60]:
long_col_names

{'Q1a': 'What are the TWO major challenges faced by embryologists in the workplace?',
 'Q1b': 'What are the TWO major challenges faced by embryologists in the workplace?',
 'Q2a': 'What are the TWO major challenges faced by the embryology profession, in general? ',
 'Q2b': 'What are the TWO major challenges faced by the embryology profession, in general? ',
 'Q3a': "Provide TWO suggestions to improve embryologists' working conditions",
 'Q3b': "Provide TWO suggestions to improve embryologists' working conditions",
 'Q4': 'What is your career goal?'}

In [61]:
long_col_names['Q1a']

'What are the TWO major challenges faced by embryologists in the workplace?'

In [62]:
# rename the columns to their short names
df = df.rename(columns={df.columns[0]: 'Q1a', df.columns[1]: 'Q1b', df.columns[2]: 'Q2a',
                        df.columns[3]: 'Q2b', df.columns[4]: 'Q3a', df.columns[5]: 'Q3b',
                        df.columns[6]: 'Q4'})

In [63]:
df.columns[0]

'Q1a'

In [64]:
# remove the first row: it holds the sub-header ('1', '2', 'Open-Ended Response'), not a response
df = df.iloc[1:]
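
The load, rename and drop-sub-header steps above are repeated for every cohort file later on, so they could be wrapped in one helper. The following is a minimal sketch, assuming every export shares the same seven-column layout. `load_responses` and `SHORT_NAMES` are hypothetical names, not part of the original notebook, and the in-memory CSV is a made-up stand-in for a real export.

```python
import io
import pandas as pd

# Hypothetical constant: the short names used throughout the notebook
SHORT_NAMES = ['Q1a', 'Q1b', 'Q2a', 'Q2b', 'Q3a', 'Q3b', 'Q4']

def load_responses(path_or_buffer):
    """Read one export; return (cleaned dataframe, short -> long column-name map)."""
    raw = pd.read_csv(path_or_buffer, na_filter=False)
    cols = list(raw.columns[:len(SHORT_NAMES)])
    # 'b' columns (odd positions) have no header of their own, so they reuse
    # the question text of the preceding 'a' column
    long_names = {short: cols[i - (i % 2)] for i, short in enumerate(SHORT_NAMES)}
    tidy = raw.rename(columns=dict(zip(cols, SHORT_NAMES)))
    # the first data row is the '1, 2, ..., Open-Ended Response' sub-header
    return tidy.iloc[1:], long_names

# Tiny in-memory stand-in for a CSV export (layout assumed to match the real file)
demo_csv = io.StringIO(
    "Challenges?,,Profession?,,Suggestions?,,Goal?\n"
    "1,2,1,2,1,2,Open-Ended Response\n"
    "burnout/stress,poor management,politics,Salaries,better pay,More staff,PhD\n"
)
demo_df, demo_long = load_responses(demo_csv)
print(demo_df)
print(demo_long['Q1b'])
```

With the real data this would be called as `df, long_col_names = load_responses('Open Ended Questions-Table 1.csv')`.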

In [39]:
df

Unnamed: 0,Q1a,Q1b,Q2a,Q2b,Q3a,Q3b,Q4
1,Equality with clinical members,Poor pay,Shortage of trained staff,Equality,Improve number,Improve pay,Continue
2,Bullying by colleagues and managers,"Poorly designed protocols, technical ignorance...",Stress of delivering high quality work without...,Presence of narcissistic individuals destroyin...,Screening for sociopathic personality traits n...,Clinics must be staffed so that there is a goo...,To survive until retirement without suffering ...
3,Working hours and too many responsibilities,Salary,Access to training,Certification,Add reading and projects into basic workload n...,Understand for how many hours a brain can be f...,Get my ESHRE certification Publish paper thro...
4,burnout/stress,poor management,not enough highly trained staff,politics,better pay,better CPD opportunities,FRCPath
5,Recognizion,Trust,Handson training,Trust,Good laboratory training,Troubleshooting,Be confident in the work i do Academically st...
...,...,...,...,...,...,...,...
1252,low salaries,weekend and holiday work,low salaries,weekend and holiday work,increase salaries,close over christmas,Dont make a big mistake
1253,Low salaries,Weekend and holiday work,Low salary,Weekend and holiday work,Higher salaries,Close over Christmas holidays,Make babies and make people happy.
1254,Respect from other departments for the complex...,Financial compensation for overtime etc.,Financial participation: Owning shares,lack of respect for the work we do by doctors,Pay them!,Invite financial incentives to the embryologis...,I reached the glass ceiling. All I can do is h...
1255,limited staff,doctors,Salaries,Limit work opportunities,Better salaries,Flexible working hours,Laboratory director


## Aggregate all the responses for visualization

In [19]:
# create a dictionary to store each of the columns, indexed on the column name
answers = {}
# create a dict of strings to store the column content, indexed on the column name
responses = {}
# iterate over the dataframe columns
for series_name, series in df.items():
    print(series_name)
    print(series)
    answers[series_name] = pd.DataFrame([' '.join(df[series_name].to_list())], columns=['content'])
    responses[series_name] = answers[series_name].values[0,0]

Q1a
1                         Equality with clinical members
2            Working hours and too many responsibilities
3                                         burnout/stress
4                                            Recognizion
5                             General workplace politics
                             ...                        
853                                         Low Salaries
854                                         low salaries
855                                         Low salaries
856    Respect from other departments for the complex...
857                                        limited staff
Name: Q1a, Length: 857, dtype: object
Q1b
1                                      Poor pay
2                                        Salary
3                               poor management
4                                         Trust
5                             Internal Pressure
                         ...                   
853                            Working 

In [20]:
responses['Q1a']

'Equality with clinical members Working hours and too many responsibilities burnout/stress Recognizion General workplace politics Burnout due to working continously through weekends without breaks fertilization outcomes are laid on the embryologists Recognition of our contribution in the ART field Time managment administration work Overworked Oocyte quality hard cases; bad sperm, fragile oocytes Poor pay; lack of transparency of salary in the private sector time management with patients/doctors Pression Daily work hours and weekends Employees shortage space space Never ever make a mistake Workload BURN OUT Repeated manual processes emotional stress A good salary Little black of communication with clinical team Automaticity Financial Restrictions Inconsistent workload TRUST Work overload Staffing levels Staff retention / salary MULTITASKING Understaffing Little or no day off alloted Each and every case of ours is different it itself is a big challenge. pregnancy status- if negative, we 

In [35]:
# case-fold (aggressive lower-casing) each aggregated response string
for col_name in responses:
    responses[col_name] = responses[col_name].casefold()
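
The aggregated, case-folded strings are the input for the simple keyword analysis (tag cloud / bar chart) listed under Analyses. The following is a rough sketch of counting keywords with the standard library only. `keyword_counts` and the tiny `STOP_WORDS` set are illustrative assumptions; NLTK's English stop-word list would be a fuller choice.

```python
import re
from collections import Counter

# Hypothetical mini stop-word list, for illustration only
STOP_WORDS = {'and', 'the', 'of', 'to', 'in', 'with', 'too', 'many'}

def keyword_counts(text, top_n=5):
    """Return the top_n most frequent non-stop-word tokens in one aggregated string."""
    # keep runs of letters/apostrophes; '/' and other punctuation act as separators
    tokens = re.findall(r"[a-z']+", text.casefold())
    counts = Counter(token for token in tokens if token not in STOP_WORDS)
    return counts.most_common(top_n)

# Phrases taken from the Q1a responses shown above
sample = 'low salaries Low salaries burnout/stress limited staff Staff retention / salary'
top = keyword_counts(sample)
print(top)
```

The resulting `(keyword, count)` pairs can be passed to a bar-chart or word-cloud library, e.g. via `keyword_counts(responses['Q1a'], top_n=30)`.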

## Sentiment Analysis using NLTK

See https://www.datacamp.com/tutorial/text-analytics-beginners-nltk

In [25]:
# import libraries
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer

In [65]:
df['Q1a']

1                          Equality with clinical members
2                     Bullying by colleagues and managers
3             Working hours and too many responsibilities
4                                          burnout/stress
5                                             Recognizion
                              ...                        
1252                                         low salaries
1253                                         Low salaries
1254    Respect from other departments for the complex...
1255                                        limited staff
1256                                             Training
Name: Q1a, Length: 1256, dtype: object

In [27]:
# create preprocess_text function
# (the NLTK tokenizer, 'stopwords' and 'wordnet' data must be downloaded first)
def preprocess_text(text):
    # Tokenize the lower-cased text
    tokens = word_tokenize(text.lower())

    # Remove stop words; build the set once instead of reloading the list for every token
    stop_words = set(stopwords.words('english'))
    filtered_tokens = [token for token in tokens if token not in stop_words]

    # Lemmatize the tokens
    lemmatizer = WordNetLemmatizer()
    lemmatized_tokens = [lemmatizer.lemmatize(token) for token in filtered_tokens]

    # Join the tokens back into a string
    processed_text = ' '.join(lemmatized_tokens)
    return processed_text

In [28]:
# Pre-processing made things worse on test data (see SA test.ipynb)

# df['Q1a'] = df['Q1a'].apply(preprocess_text)
# df

In [29]:
# initialize NLTK sentiment analyzer (needs the lexicon: nltk.download('vader_lexicon'))
analyzer = SentimentIntensityAnalyzer()

In [30]:
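# BINARY get_sentiment function: 1 if VADER reports any positive component, else 0
# (a mixed response with both positive and negative words is therefore scored 1)
def get_binary_sentiment(text):
    scores = analyzer.polarity_scores(text)
    sentiment = 1 if scores['pos'] > 0 else 0
    return sentiment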

In [31]:
# compound get_sentiment function: returns VADER's normalised compound score in [-1, 1]
def get_compound_sentiment(text):
    scores = analyzer.polarity_scores(text)
    return scores['compound']

In [32]:
# vanilla get_sentiment function: returns the full VADER score dict (neg, neu, pos, compound)
def get_sentiment(text):
    scores = analyzer.polarity_scores(text)
    return scores
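
The analysis list mentions pos/neg/neutral labels, and the TODO suggests experimenting with variations on the VADER output. One common variation is to bucket the compound score into three classes. The sketch below assumes the ±0.05 cut-offs suggested in the VADER project's documentation. `label_sentiment` is a hypothetical helper, and the example scores are copied from the notebook outputs further down.

```python
def label_sentiment(compound, threshold=0.05):
    """Map a VADER compound score to 'pos', 'neg' or 'neu'."""
    if compound >= threshold:
        return 'pos'
    if compound <= -threshold:
        return 'neg'
    return 'neu'

# Compound scores as reported by the notebook for three responses
examples = {
    'Bullying by colleagues and managers': -0.5994,
    'Trust': 0.5106,
    'Salary': 0.0,
}
labels = {text: label_sentiment(score) for text, score in examples.items()}
print(labels)
```

Applied to a column, this would look like `df[col].apply(get_compound_sentiment).apply(label_sentiment).value_counts()`.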

In [66]:
col='Q1a'
print(long_col_names[col])
df['sentiment'] = df[col].apply(get_compound_sentiment)
with pd.option_context('display.max_colwidth', None):
    display(df[[col, 'sentiment']])
print("Average sentiment: {0:2.3f}".format(df.loc[:, 'sentiment'].mean()))
print("Std dev: {0:16.3f}".format(df.loc[:, 'sentiment'].std()))

What are the TWO major challenges faced by embryologists in the workplace?


Unnamed: 0,Q1a,sentiment
1,Equality with clinical members,0.0000
2,Bullying by colleagues and managers,-0.5994
3,Working hours and too many responsibilities,0.0000
4,burnout/stress,0.0000
5,Recognizion,0.0000
...,...,...
1252,low salaries,-0.2732
1253,Low salaries,-0.2732
1254,Respect from other departments for the complex work we do,0.4767
1255,limited staff,-0.2263


Average sentiment: -0.045
Std dev:            0.240


In [67]:
col='Q1b'
print(long_col_names[col])
df['sentiment'] = df[col].apply(get_compound_sentiment)
with pd.option_context('display.max_colwidth', None):
    display(df[[col, 'sentiment']])
print("Average sentiment: {0:2.3f}".format(df.loc[:, 'sentiment'].mean()))
print("Std dev: {0:16.3f}".format(df.loc[:, 'sentiment'].std()))

What are the TWO major challenges faced by embryologists in the workplace?


Unnamed: 0,Q1b,sentiment
1,Poor pay,-0.5423
2,"Poorly designed protocols, technical ignorance by colleagues and managers.",-0.3612
3,Salary,0.0000
4,poor management,-0.4767
5,Trust,0.5106
...,...,...
1252,weekend and holiday work,0.4019
1253,Weekend and holiday work,0.4019
1254,Financial compensation for overtime etc.,0.0000
1255,doctors,0.0000


Average sentiment: -0.051
Std dev:            0.245


In [68]:
col='Q2a'
print(long_col_names[col])
df['sentiment'] = df[col].apply(get_compound_sentiment)
with pd.option_context('display.max_colwidth', None):
    display(df[[col, 'sentiment']])
print("Average sentiment: {0:2.3f}".format(df.loc[:, 'sentiment'].mean()))
print("Std dev: {0:16.3f}".format(df.loc[:, 'sentiment'].std()))

What are the TWO major challenges faced by the embryology profession, in general? 


Unnamed: 0,Q2a,sentiment
1,Shortage of trained staff,-0.2500
2,Stress of delivering high quality work without sufficient staff.,-0.4215
3,Access to training,0.0000
4,not enough highly trained staff,0.0000
5,Handson training,0.0000
...,...,...
1252,low salaries,-0.2732
1253,Low salary,-0.2732
1254,Financial participation: Owning shares,0.2960
1255,Salaries,0.0000


Average sentiment: -0.035
Std dev:            0.231


In [69]:
col='Q2b'
print(long_col_names[col])
df['sentiment'] = df[col].apply(get_compound_sentiment)
with pd.option_context('display.max_colwidth', None):
    display(df[[col, 'sentiment']])
print("Average sentiment: {0:2.3f}".format(df.loc[:, 'sentiment'].mean()))
print("Std dev: {0:16.3f}".format(df.loc[:, 'sentiment'].std()))

What are the TWO major challenges faced by the embryology profession, in general? 


Unnamed: 0,Q2b,sentiment
1,Equality,0.0000
2,Presence of narcissistic individuals destroying mental health of colleagues.,-0.5574
3,Certification,0.0000
4,politics,0.0000
5,Trust,0.5106
...,...,...
1252,weekend and holiday work,0.4019
1253,Weekend and holiday work,0.4019
1254,lack of respect for the work we do by doctors,0.2023
1255,Limit work opportunities,0.3818


Average sentiment: -0.038
Std dev:            0.239


In [175]:
col='Q3a'
print(long_col_names[col])
df['sentiment'] = df[col].apply(get_compound_sentiment)
with pd.option_context('display.max_colwidth', None):
    display(df[[col, 'sentiment']])
print("Average sentiment: {0:2.3f}".format(df.loc[:, 'sentiment'].mean()))
print("Std dev: {0:15.3f}".format(df.loc[:, 'sentiment'].std()))

Provide TWO suggestions to improve embryologists' working conditions


Unnamed: 0,Q3a,sentiment
1,Improve number,0.4939
2,Screening for sociopathic personality traits needs to be introduced. A 45 minute interview is no way to select for entry into the profession.,-0.2960
3,Add reading and projects into basic workload not in leisure time,0.0000
4,better pay,0.3612
5,Good laboratory training,0.4404
...,...,...
1252,increase salaries,0.3182
1253,Higher salaries,0.0000
1254,Pay them!,-0.1759
1255,Better salaries,0.4404


Average sentiment: 0.126
Std dev:           0.223


In [173]:
col='Q3b'
print(long_col_names[col])
df['sentiment'] = df[col].apply(get_compound_sentiment)
with pd.option_context('display.max_colwidth', None):
    display(df[[col, 'sentiment']])
print("Average sentiment: {0:2.3f}".format(df.loc[:, 'sentiment'].mean()))
print("Std dev: {0:15.3f}".format(df.loc[:, 'sentiment'].std()))

Provide TWO suggestions to improve embryologists' working conditions


Unnamed: 0,Q3b,sentiment
1,Improve pay,0.3612
2,Clinics must be staffed so that there is a good margin of available workers so that staff are not pressured to work at 110% and then burn out.,0.5523
3,Understand for how many hours a brain can be functional performing lab procedures per day,0.0000
4,better CPD opportunities,0.6705
5,Troubleshooting,0.1779
...,...,...
1252,close over christmas,0.0000
1253,Close over Christmas holidays,0.3818
1254,Invite financial incentives to the embryologists as well. e.g. Shares!,0.7574
1255,Flexible working hours,0.2263


Average sentiment: 0.148
Std dev:           0.240


In [174]:
col='Q4'
print(long_col_names[col])
df['sentiment'] = df[col].apply(get_compound_sentiment)
with pd.option_context('display.max_colwidth', None):
    display(df[[col, 'sentiment']])
print("Average sentiment: {0:2.3f}".format(df.loc[:, 'sentiment'].mean()))
print("Std dev: {0:15.3f}".format(df.loc[:, 'sentiment'].std()))

What is your career goal?


Unnamed: 0,Q4,sentiment
1,Continue,0.0000
2,To survive until retirement without suffering mental illness or making a catestrophic lab error.,0.2760
3,Get my ESHRE certification Publish paper through research Presentations and leading workshops Acquire lab directing skills,0.0000
4,FRCPath,0.0000
5,Be confident in the work i do Academically strong,0.7579
...,...,...
1252,Dont make a big mistake,0.2584
1253,Make babies and make people happy.,0.5719
1254,I reached the glass ceiling. All I can do is hope to study further and further and further.....,0.5106
1255,Laboratory director,0.0000


Average sentiment: 0.198
Std dev:           0.306


## Iterate over all columns

In [117]:
for col_name, col in df.items():
    print(col_name)
    # print(series)
    col.apply(get_sentiment)

Q1a
Q1b
Q2a
Q2b
Q3a
Q3b
Q4
sentiment


AttributeError: 'float' object has no attribute 'encode'

In [111]:
col = list(df)
for i in col:
    # print the first element of each column by label;
    # label 0 was dropped with the sub-header row, hence the KeyError
    print(df[i][0])

KeyError: 0
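
The `KeyError: 0` arises because `df[i][0]` looks up the index *label* 0, and that label disappeared when the sub-header row was removed with `df.iloc[1:]`. Positional access with `.iloc` does not depend on the labels. A small illustration on a toy frame (the name `demo` is just for this example):

```python
import pandas as pd

# Toy frame mimicking the notebook: row 0 is the sub-header, then two responses
demo = pd.DataFrame({'Q1a': ['1', 'Equality with clinical members', 'burnout/stress']})
demo = demo.iloc[1:]                      # index labels are now 1 and 2

has_label_zero = 0 in demo.index          # label 0 no longer exists
first_by_position = demo['Q1a'].iloc[0]   # positional access still works
first_by_label = demo['Q1a'].loc[1]       # label access needs the surviving label

print(has_label_zero, first_by_position)
```

Calling `df.reset_index(drop=True)` after dropping the row would be an alternative if zero-based labels are preferred.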

In [113]:
col=list(df )
for i in col:
    print(df[i][1])
    df[i].apply(get_sentiment)

Equality with clinical members
Poor pay
Shortage of trained staff
Equality
Improve number
Improve pay
Continue
0.0


AttributeError: 'float' object has no attribute 'encode'
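
Both loops fail at the same point: after the seven question columns they reach the numeric `sentiment` column added by the earlier cells, and VADER expects strings, not floats. A possible way to iterate programmatically is to restrict the loop to the question columns and collect the mean and standard deviation in one table. This is a sketch only. `summarise_sentiment` and `QUESTION_COLS` are hypothetical names, and `toy_scorer` is a deliberately crude stand-in so the example runs without NLTK; in the notebook, `get_compound_sentiment` would be passed instead.

```python
import pandas as pd

QUESTION_COLS = ('Q1a', 'Q1b', 'Q2a', 'Q2b', 'Q3a', 'Q3b', 'Q4')

def summarise_sentiment(df, scorer, columns=QUESTION_COLS):
    """Score each question column with `scorer`; return mean and std per column."""
    rows = []
    for col in columns:
        if col not in df.columns:
            continue                      # tolerate frames with fewer questions
        scores = df[col].astype(str).apply(scorer)
        rows.append({'column': col, 'mean': scores.mean(), 'std': scores.std()})
    return pd.DataFrame(rows).set_index('column')

def toy_scorer(text):
    # Stand-in for get_compound_sentiment, NOT a real sentiment model
    return -1.0 if 'low' in text.casefold() else 0.0

demo = pd.DataFrame({'Q1a': ['low salaries', 'limited staff'],
                     'Q1b': ['Salary', 'doctors'],
                     'sentiment': [-0.2732, -0.2263]})   # numeric column is skipped
summary = summarise_sentiment(demo, toy_scorer)
print(summary)
```

Because the scores are never written back to `df`, no stray `sentiment` column is left behind to break a later loop.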

## Compare cohorts
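
The cohort sections that follow repeat the same cells once per file. To see the cohorts side by side, one option is to score each cohort's dataframe, tag the scores with a cohort label, and group. This is a sketch under assumptions: `compare_cohorts` is a hypothetical helper, and `toy_scorer` is a crude stand-in for `get_compound_sentiment` so that the example runs without NLTK. The example phrases are taken from the responses shown in this notebook.

```python
import pandas as pd

def compare_cohorts(cohorts, scorer, col):
    """cohorts: dict of label -> dataframe; returns count/mean/std of scores per label."""
    parts = []
    for label, frame in cohorts.items():
        scores = frame[col].astype(str).apply(scorer)
        parts.append(pd.DataFrame({'cohort': label, 'sentiment': scores}))
    combined = pd.concat(parts, ignore_index=True)
    return combined.groupby('cohort')['sentiment'].agg(['count', 'mean', 'std'])

def toy_scorer(text):
    # Stand-in for get_compound_sentiment, NOT a real sentiment model
    return -1.0 if text.casefold().startswith(('poor', 'no')) else 0.0

cohorts = {
    'female': pd.DataFrame({'Q1b': ['Poor pay', 'Salary']}),
    'male': pd.DataFrame({'Q1b': ['No respect', 'No/less support from clinicians']}),
}
result = compare_cohorts(cohorts, toy_scorer, 'Q1b')
print(result)
```

Differences between cohort means can be small relative to the standard deviations, so a significance test would be needed before reading much into them.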

### 1. Female

In [10]:
# read the CSV; na_filter=False keeps empty cells as empty strings instead of NaN
df = pd.read_csv('Data/female.csv', na_filter=False)

In [11]:
df

Unnamed: 0,What are the TWO major challenges faced by embryologists in the workplace?,Unnamed: 1,"What are the TWO major challenges faced by the embryology profession, in general?",Unnamed: 3,Provide TWO suggestions to improve embryologists' working conditions,Unnamed: 5,What is your career goal?
0,1,2,1,2,1,2,Open-Ended Response
1,Equality with clinical members,Poor pay,Shortage of trained staff,Equality,Improve number,Improve pay,Continue
2,Working hours and too many responsibilities,Salary,Access to training,Certification,Add reading and projects into basic workload n...,Understand for how many hours a brain can be f...,Get my ESHRE certification Publish paper thro...
3,burnout/stress,poor management,not enough highly trained staff,politics,better pay,better CPD opportunities,FRCPath
4,Recognizion,Trust,Handson training,Trust,Good laboratory training,Troubleshooting,Be confident in the work i do Academically st...
...,...,...,...,...,...,...,...
853,Low Salaries,Working weekends,Not closing over December,Low Salaries,Higher salaries,Closing over December,To complete my masters and become the best pra...
854,low salaries,weekend and holiday work,low salaries,weekend and holiday work,increase salaries,close over christmas,Dont make a big mistake
855,Low salaries,Weekend and holiday work,Low salary,Weekend and holiday work,Higher salaries,Close over Christmas holidays,Make babies and make people happy.
856,Respect from other departments for the complex...,Financial compensation for overtime etc.,Financial participation: Owning shares,lack of respect for the work we do by doctors,Pay them!,Invite financial incentives to the embryologis...,I reached the glass ceiling. All I can do is h...


In [12]:
# create a dict mapping the short column names to the long (question text) names
long_col_names = {'Q1a': df.columns[0], 'Q1b': df.columns[0], 'Q2a': df.columns[2], 'Q2b': df.columns[2],
                  'Q3a': df.columns[4], 'Q3b': df.columns[4], 'Q4': df.columns[6]}

In [13]:
long_col_names

{'Q1a': 'What are the TWO major challenges faced by embryologists in the workplace?',
 'Q1b': 'What are the TWO major challenges faced by embryologists in the workplace?',
 'Q2a': 'What are the TWO major challenges faced by the embryology profession, in general? ',
 'Q2b': 'What are the TWO major challenges faced by the embryology profession, in general? ',
 'Q3a': "Provide TWO suggestions to improve embryologists' working conditions",
 'Q3b': "Provide TWO suggestions to improve embryologists' working conditions",
 'Q4': 'What is your career goal?'}

In [15]:
# rename the columns to their short names
df = df.rename(columns={df.columns[0]: 'Q1a', df.columns[1]: 'Q1b', df.columns[2]: 'Q2a',
                        df.columns[3]: 'Q2b', df.columns[4]: 'Q3a', df.columns[5]: 'Q3b',
                        df.columns[6]: 'Q4'})

In [17]:
# remove the first row: it holds the sub-header ('1', '2', 'Open-Ended Response'), not a response
df = df.iloc[1:]

In [18]:
df

Unnamed: 0,Q1a,Q1b,Q2a,Q2b,Q3a,Q3b,Q4
1,Equality with clinical members,Poor pay,Shortage of trained staff,Equality,Improve number,Improve pay,Continue
2,Working hours and too many responsibilities,Salary,Access to training,Certification,Add reading and projects into basic workload n...,Understand for how many hours a brain can be f...,Get my ESHRE certification Publish paper thro...
3,burnout/stress,poor management,not enough highly trained staff,politics,better pay,better CPD opportunities,FRCPath
4,Recognizion,Trust,Handson training,Trust,Good laboratory training,Troubleshooting,Be confident in the work i do Academically st...
5,General workplace politics,Internal Pressure,Undervalued,Underpaid,Flexible hours to reduce stress,Shorter rotations in the Lab,I would like to work as an embryologist intern...
...,...,...,...,...,...,...,...
853,Low Salaries,Working weekends,Not closing over December,Low Salaries,Higher salaries,Closing over December,To complete my masters and become the best pra...
854,low salaries,weekend and holiday work,low salaries,weekend and holiday work,increase salaries,close over christmas,Dont make a big mistake
855,Low salaries,Weekend and holiday work,Low salary,Weekend and holiday work,Higher salaries,Close over Christmas holidays,Make babies and make people happy.
856,Respect from other departments for the complex...,Financial compensation for overtime etc.,Financial participation: Owning shares,lack of respect for the work we do by doctors,Pay them!,Invite financial incentives to the embryologis...,I reached the glass ceiling. All I can do is h...


In [33]:
col='Q1a'
print(long_col_names[col])
df['sentiment'] = df[col].apply(get_compound_sentiment)
with pd.option_context('display.max_colwidth', None):
    display(df[[col, 'sentiment']])
print("Average sentiment: {0:2.3f}".format(df.loc[:, 'sentiment'].mean()))
print("Std dev: {0:16.3f}".format(df.loc[:, 'sentiment'].std()))

What are the TWO major challenges faced by embryologists in the workplace?


Unnamed: 0,Q1a,sentiment
1,Equality with clinical members,0.0000
2,Working hours and too many responsibilities,0.0000
3,burnout/stress,0.0000
4,Recognizion,0.0000
5,General workplace politics,0.0000
...,...,...
853,Low Salaries,-0.2732
854,low salaries,-0.2732
855,Low salaries,-0.2732
856,Respect from other departments for the complex work we do,0.4767


Average sentiment: -0.047
Std dev:            0.240


In [34]:
col='Q1b'
print(long_col_names[col])
df['sentiment'] = df[col].apply(get_compound_sentiment)
with pd.option_context('display.max_colwidth', None):
    display(df[[col, 'sentiment']])
print("Average sentiment: {0:2.3f}".format(df.loc[:, 'sentiment'].mean()))
print("Std dev: {0:16.3f}".format(df.loc[:, 'sentiment'].std()))

What are the TWO major challenges faced by embryologists in the workplace?


Unnamed: 0,Q1b,sentiment
1,Poor pay,-0.5423
2,Salary,0.0000
3,poor management,-0.4767
4,Trust,0.5106
5,Internal Pressure,-0.2960
...,...,...
853,Working weekends,0.0000
854,weekend and holiday work,0.4019
855,Weekend and holiday work,0.4019
856,Financial compensation for overtime etc.,0.0000


Average sentiment: -0.048
Std dev:            0.240


In [35]:
col='Q2a'
print(long_col_names[col])
df['sentiment'] = df[col].apply(get_compound_sentiment)
with pd.option_context('display.max_colwidth', None):
    display(df[[col, 'sentiment']])
print("Average sentiment: {0:2.3f}".format(df.loc[:, 'sentiment'].mean()))
print("Std dev: {0:16.3f}".format(df.loc[:, 'sentiment'].std()))

What are the TWO major challenges faced by the embryology profession, in general? 


Unnamed: 0,Q2a,sentiment
1,Shortage of trained staff,-0.2500
2,Access to training,0.0000
3,not enough highly trained staff,0.0000
4,Handson training,0.0000
5,Undervalued,0.0000
...,...,...
853,Not closing over December,0.0000
854,low salaries,-0.2732
855,Low salary,-0.2732
856,Financial participation: Owning shares,0.2960


Average sentiment: -0.031
Std dev:            0.226


In [36]:
col='Q2b'
print(long_col_names[col])
df['sentiment'] = df[col].apply(get_compound_sentiment)
with pd.option_context('display.max_colwidth', None):
    display(df[[col, 'sentiment']])
print("Average sentiment: {0:2.3f}".format(df.loc[:, 'sentiment'].mean()))
print("Std dev: {0:16.3f}".format(df.loc[:, 'sentiment'].std()))

What are the TWO major challenges faced by the embryology profession, in general? 


Unnamed: 0,Q2b,sentiment
1,Equality,0.0000
2,Certification,0.0000
3,politics,0.0000
4,Trust,0.5106
5,Underpaid,0.0000
...,...,...
853,Low Salaries,-0.2732
854,weekend and holiday work,0.4019
855,Weekend and holiday work,0.4019
856,lack of respect for the work we do by doctors,0.2023


Average sentiment: -0.042
Std dev:            0.241


In [37]:
col='Q3a'
print(long_col_names[col])
df['sentiment'] = df[col].apply(get_compound_sentiment)
with pd.option_context('display.max_colwidth', None):
    display(df[[col, 'sentiment']])
print("Average sentiment: {0:2.3f}".format(df.loc[:, 'sentiment'].mean()))
print("Std dev: {0:15.3f}".format(df.loc[:, 'sentiment'].std()))

Provide TWO suggestions to improve embryologists' working conditions


Unnamed: 0,Q3a,sentiment
1,Improve number,0.4939
2,Add reading and projects into basic workload not in leisure time,0.0000
3,better pay,0.3612
4,Good laboratory training,0.4404
5,Flexible hours to reduce stress,-0.2263
...,...,...
853,Higher salaries,0.0000
854,increase salaries,0.3182
855,Higher salaries,0.0000
856,Pay them!,-0.1759


Average sentiment: 0.125
Std dev:           0.223


In [38]:
col='Q3b'
print(long_col_names[col])
df['sentiment'] = df[col].apply(get_compound_sentiment)
with pd.option_context('display.max_colwidth', None):
    display(df[[col, 'sentiment']])
print("Average sentiment: {0:2.3f}".format(df.loc[:, 'sentiment'].mean()))
print("Std dev: {0:15.3f}".format(df.loc[:, 'sentiment'].std()))

Provide TWO suggestions to improve embryologists' working conditions


Unnamed: 0,Q3b,sentiment
1,Improve pay,0.3612
2,Understand for how many hours a brain can be functional performing lab procedures per day,0.0000
3,better CPD opportunities,0.6705
4,Troubleshooting,0.1779
5,Shorter rotations in the Lab,0.0000
...,...,...
853,Closing over December,0.0000
854,close over christmas,0.0000
855,Close over Christmas holidays,0.3818
856,Invite financial incentives to the embryologists as well. e.g. Shares!,0.7574


Average sentiment: 0.150
Std dev:           0.242


In [39]:
col='Q4'
print(long_col_names[col])
df['sentiment'] = df[col].apply(get_compound_sentiment)
with pd.option_context('display.max_colwidth', None):
    display(df[[col, 'sentiment']])
print("Average sentiment: {0:2.3f}".format(df.loc[:, 'sentiment'].mean()))
print("Std dev: {0:15.3f}".format(df.loc[:, 'sentiment'].std()))

What is your career goal?


Unnamed: 0,Q4,sentiment
1,Continue,0.0000
2,Get my ESHRE certification Publish paper through research Presentations and leading workshops Acquire lab directing skills,0.0000
3,FRCPath,0.0000
4,Be confident in the work i do Academically strong,0.7579
5,"I would like to work as an embryologist internationally, hopefully one day obtain my PhD. I would like to influence up and coming embryologists and have an active role in research within the field of ART",0.8555
...,...,...
853,To complete my masters and become the best practical embryologist to my abilities.,0.7351
854,Dont make a big mistake,0.2584
855,Make babies and make people happy.,0.5719
856,I reached the glass ceiling. All I can do is hope to study further and further and further.....,0.5106


Average sentiment: 0.190
Std dev:           0.306


### 2. Male

In [41]:
# read the CSV; na_filter=False keeps empty cells as empty strings instead of NaN
df = pd.read_csv('Data/male.csv', na_filter=False)

In [42]:
df

Unnamed: 0,What are the TWO major challenges faced by embryologists in the workplace?,Unnamed: 1,"What are the TWO major challenges faced by the embryology profession, in general?",Unnamed: 3,Provide TWO suggestions to improve embryologists' working conditions,Unnamed: 5,What is your career goal?
0,1,2,1,2,1,2,Open-Ended Response
1,Bullying by colleagues and managers,"Poorly designed protocols, technical ignorance...",Stress of delivering high quality work without...,Presence of narcissistic individuals destroyin...,Screening for sociopathic personality traits n...,Clinics must be staffed so that there is a goo...,To survive until retirement without suffering ...
2,Stress created by no embryos/ less embryos/ po...,No respect,Gossips of others which disturbs other Embryol...,No respect,Reporting Manager should understand the situation,Stress free atmosphere,To be Good Embryologist by providing maximum ...
3,lack of leadership,burnout,managment,AI,train managers to lead and not manage,recognition,have a happy team
4,work overload,No/less support from clinicians,Poor salary,Expected to work hard and give good results,Constant support form clinicians,Increase salaries or pay better and relative t...,To achieve high position and personal developm...
...,...,...,...,...,...,...,...
381,Access to knowledge about QC / QA and the tool...,Learning embryo biopsy,Earn a salary according to the profession,Professional recognition,Improve access to training,Improve wages,Have the best reproductive results in my country
382,"Motivation (stress, burnout)",Getting the work done,Safety in the lab,Deliver quality,even distribution of work,provide sufficient staff,PhD
383,"not getting enough working materials, in ethiopia",to get Professional maintenance is difficult i...,maintaining the quality control in the IVF lab,"training of good embryology staffs,",having good embryology training center,support technicians financially to get trainin...,be ART researcher.
384,Too many cases per embryologist,Not enough off-time,Continuous skill development,Not enough staff,More staff to allow more personal time,More opportunities to better our knowledge,To be the best embryologist I can be to the pa...


In [43]:
# create a dict mapping the short column names to the long (question text) names
long_col_names = {'Q1a': df.columns[0], 'Q1b': df.columns[0], 'Q2a': df.columns[2], 'Q2b': df.columns[2],
                  'Q3a': df.columns[4], 'Q3b': df.columns[4], 'Q4': df.columns[6]}

In [44]:
long_col_names

{'Q1a': 'What are the TWO major challenges faced by embryologists in the workplace?',
 'Q1b': 'What are the TWO major challenges faced by embryologists in the workplace?',
 'Q2a': 'What are the TWO major challenges faced by the embryology profession, in general? ',
 'Q2b': 'What are the TWO major challenges faced by the embryology profession, in general? ',
 'Q3a': "Provide TWO suggestions to improve embryologists' working conditions",
 'Q3b': "Provide TWO suggestions to improve embryologists' working conditions",
 'Q4': 'What is your career goal?'}

In [45]:
# rename the columns to their short names
df = df.rename(columns={df.columns[0]: 'Q1a', df.columns[1]: 'Q1b', df.columns[2]: 'Q2a',
                        df.columns[3]: 'Q2b', df.columns[4]: 'Q3a', df.columns[5]: 'Q3b',
                        df.columns[6]: 'Q4'})
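
The long-name dict and the rename mapping can also be derived from a single list of short names, which avoids hard-coding the column positions twice. A minimal sketch, assuming the second answer to each question always lands in a pandas-generated `Unnamed: N` column directly after the question column; `build_col_maps` and the toy frame are hypothetical names used only for illustration:

```python
import pandas as pd

def build_col_maps(columns, short_names):
    """Return (rename_map, long_names) for a list of raw column labels.

    Assumption: a column labelled 'Unnamed: N' holds the second answer to
    the question in the nearest named column to its left.
    """
    if len(columns) != len(short_names):
        raise ValueError("need exactly one short name per column")
    rename_map, long_names = {}, {}
    last_question = None
    for raw, short in zip(columns, short_names):
        if not str(raw).startswith("Unnamed:"):
            last_question = raw          # a real question header
        rename_map[raw] = short          # raw label -> short name
        long_names[short] = last_question  # short name -> question text
    return rename_map, long_names

# toy frame standing in for the survey export (hypothetical headers)
toy = pd.DataFrame(columns=["Challenge?", "Unnamed: 1", "Goal?"])
rename_map, long_names = build_col_maps(list(toy.columns), ["Q1a", "Q1b", "Q4"])
toy = toy.rename(columns=rename_map)
print(list(toy.columns))
print(long_names)
```

With the real dataframe the call would take `list(df.columns)` and the seven short names, and the two returned dicts would play the roles of the rename mapping and `long_col_names`.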

In [46]:
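# remove first row; take a copy so that adding the 'sentiment' column later
# does not trigger a SettingWithCopyWarning
df = df.iloc[1:].copy()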

In [47]:
df

Unnamed: 0,Q1a,Q1b,Q2a,Q2b,Q3a,Q3b,Q4
1,Bullying by colleagues and managers,"Poorly designed protocols, technical ignorance...",Stress of delivering high quality work without...,Presence of narcissistic individuals destroyin...,Screening for sociopathic personality traits n...,Clinics must be staffed so that there is a goo...,To survive until retirement without suffering ...
2,Stress created by no embryos/ less embryos/ po...,No respect,Gossips of others which disturbs other Embryol...,No respect,Reporting Manager should understand the situation,Stress free atmosphere,To be Good Embryologist by providing maximum ...
3,lack of leadership,burnout,managment,AI,train managers to lead and not manage,recognition,have a happy team
4,work overload,No/less support from clinicians,Poor salary,Expected to work hard and give good results,Constant support form clinicians,Increase salaries or pay better and relative t...,To achieve high position and personal developm...
5,Time spent at work,Money for new quipment,Low salary,Stress,More staff,more off time,Best success rate to promote our clinic
...,...,...,...,...,...,...,...
381,Access to knowledge about QC / QA and the tool...,Learning embryo biopsy,Earn a salary according to the profession,Professional recognition,Improve access to training,Improve wages,Have the best reproductive results in my country
382,"Motivation (stress, burnout)",Getting the work done,Safety in the lab,Deliver quality,even distribution of work,provide sufficient staff,PhD
383,"not getting enough working materials, in ethiopia",to get Professional maintenance is difficult i...,maintaining the quality control in the IVF lab,"training of good embryology staffs,",having good embryology training center,support technicians financially to get trainin...,be ART researcher.
384,Too many cases per embryologist,Not enough off-time,Continuous skill development,Not enough staff,More staff to allow more personal time,More opportunities to better our knowledge,To be the best embryologist I can be to the pa...


In [48]:
col='Q1a'
print(long_col_names[col])
df['sentiment'] = df[col].apply(get_compound_sentiment)
with pd.option_context('display.max_colwidth', None):
    display(df[[col, 'sentiment']])
print("Average sentiment: {0:2.3f}".format(df.loc[:, 'sentiment'].mean()))
print("Std dev: {0:16.3f}".format(df.loc[:, 'sentiment'].std()))

What are the TWO major challenges faced by embryologists in the workplace?


Unnamed: 0,Q1a,sentiment
1,Bullying by colleagues and managers,-0.5994
2,Stress created by no embryos/ less embryos/ poor quality embryos,-0.7024
3,lack of leadership,-0.3182
4,work overload,-0.3612
5,Time spent at work,0.0000
...,...,...
381,Access to knowledge about QC / QA and the tools to apply them,0.0000
382,"Motivation (stress, burnout)",0.3400
383,"not getting enough working materials, in ethiopia",0.0000
384,Too many cases per embryologist,0.0000


Average sentiment: -0.037
Std dev:            0.237


In [49]:
col='Q1b'
print(long_col_names[col])
df['sentiment'] = df[col].apply(get_compound_sentiment)
with pd.option_context('display.max_colwidth', None):
    display(df[[col, 'sentiment']])
print("Average sentiment: {0:2.3f}".format(df.loc[:, 'sentiment'].mean()))
print("Std dev: {0:16.3f}".format(df.loc[:, 'sentiment'].std()))

What are the TWO major challenges faced by embryologists in the workplace?


Unnamed: 0,Q1b,sentiment
1,"Poorly designed protocols, technical ignorance by colleagues and managers.",-0.3612
2,No respect,0.2263
3,burnout,0.0000
4,No/less support from clinicians,0.4019
5,Money for new quipment,0.0000
...,...,...
381,Learning embryo biopsy,0.0000
382,Getting the work done,0.0000
383,to get Professional maintenance is difficult in my area,-0.3612
384,Not enough off-time,0.0000


Average sentiment: -0.054
Std dev:            0.251
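
Some of the scores above look wrong on manual inspection: "No respect" and "No/less support from clinicians" are complaints, yet both receive a positive score. One way to make the manual check systematic is to list every response to a "challenges" question whose score exceeds a threshold and review only those. A sketch; `flag_suspicious` and the 0.2 threshold are hypothetical choices, and the toy rows reuse values shown in the output above:

```python
import pandas as pd

def flag_suspicious(frame, text_col, score_col="sentiment", threshold=0.2):
    """Return the rows whose score is above `threshold`, highest score first."""
    mask = frame[score_col] > threshold
    flagged = frame.loc[mask, [text_col, score_col]]
    return flagged.sort_values(score_col, ascending=False)

# toy rows copied from the Q1b output; not the full survey
toy = pd.DataFrame({
    "Q1b": ["No respect", "burnout",
            "No/less support from clinicians", "Not enough off-time"],
    "sentiment": [0.2263, 0.0, 0.4019, 0.0],
})
suspicious = flag_suspicious(toy, "Q1b")
print(suspicious)
```

The threshold only controls how many rows need to be read by hand; it does not correct the scores themselves.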


In [50]:
col='Q2a'
print(long_col_names[col])
df['sentiment'] = df[col].apply(get_compound_sentiment)
with pd.option_context('display.max_colwidth', None):
    display(df[[col, 'sentiment']])
print("Average sentiment: {0:2.3f}".format(df.loc[:, 'sentiment'].mean()))
print("Std dev: {0:16.3f}".format(df.loc[:, 'sentiment'].std()))

What are the TWO major challenges faced by the embryology profession, in general? 


Unnamed: 0,Q2a,sentiment
1,Stress of delivering high quality work without sufficient staff.,-0.4215
2,Gossips of others which disturbs other Embryologist work,-0.6369
3,managment,0.0000
4,Poor salary,-0.4767
5,Low salary,-0.2732
...,...,...
381,Earn a salary according to the profession,0.0000
382,Safety in the lab,0.4215
383,maintaining the quality control in the IVF lab,0.0000
384,Continuous skill development,0.0000


Average sentiment: -0.040
Std dev:            0.241


In [51]:
col='Q2b'
print(long_col_names[col])
df['sentiment'] = df[col].apply(get_compound_sentiment)
with pd.option_context('display.max_colwidth', None):
    display(df[[col, 'sentiment']])
print("Average sentiment: {0:2.3f}".format(df.loc[:, 'sentiment'].mean()))
print("Std dev: {0:16.3f}".format(df.loc[:, 'sentiment'].std()))

What are the TWO major challenges faced by the embryology profession, in general? 


Unnamed: 0,Q2b,sentiment
1,Presence of narcissistic individuals destroying mental health of colleagues.,-0.5574
2,No respect,0.2263
3,AI,0.0000
4,Expected to work hard and give good results,0.3612
5,Stress,-0.4215
...,...,...
381,Professional recognition,0.0000
382,Deliver quality,0.0000
383,"training of good embryology staffs,",0.4404
384,Not enough staff,0.0000


Average sentiment: -0.024
Std dev:            0.231


In [52]:
col='Q3a'
print(long_col_names[col])
df['sentiment'] = df[col].apply(get_compound_sentiment)
with pd.option_context('display.max_colwidth', None):
    display(df[[col, 'sentiment']])
print("Average sentiment: {0:2.3f}".format(df.loc[:, 'sentiment'].mean()))
print("Std dev: {0:15.3f}".format(df.loc[:, 'sentiment'].std()))

Provide TWO suggestions to improve embryologists' working conditions


Unnamed: 0,Q3a,sentiment
1,Screening for sociopathic personality traits needs to be introduced. A 45 minute interview is no way to select for entry into the profession.,-0.2960
2,Reporting Manager should understand the situation,0.0000
3,train managers to lead and not manage,0.0000
4,Constant support form clinicians,0.4019
5,More staff,0.0000
...,...,...
381,Improve access to training,0.4404
382,even distribution of work,0.0000
383,having good embryology training center,0.4404
384,More staff to allow more personal time,0.2878


Average sentiment: 0.128
Std dev:           0.222


In [53]:
col='Q3b'
print(long_col_names[col])
df['sentiment'] = df[col].apply(get_compound_sentiment)
with pd.option_context('display.max_colwidth', None):
    display(df[[col, 'sentiment']])
print("Average sentiment: {0:2.3f}".format(df.loc[:, 'sentiment'].mean()))
print("Std dev: {0:15.3f}".format(df.loc[:, 'sentiment'].std()))

Provide TWO suggestions to improve embryologists' working conditions


Unnamed: 0,Q3b,sentiment
1,Clinics must be staffed so that there is a good margin of available workers so that staff are not pressured to work at 110% and then burn out.,0.5523
2,Stress free atmosphere,0.1280
3,recognition,0.0000
4,Increase salaries or pay better and relative to the workload,0.5859
5,more off time,0.0000
...,...,...
381,Improve wages,0.4404
382,provide sufficient staff,0.0000
383,support technicians financially to get training abroad,0.4019
384,More opportunities to better our knowledge,0.7233


Average sentiment: 0.145
Std dev:           0.236


In [54]:
col='Q4'
print(long_col_names[col])
df['sentiment'] = df[col].apply(get_compound_sentiment)
with pd.option_context('display.max_colwidth', None):
    display(df[[col, 'sentiment']])
print("Average sentiment: {0:2.3f}".format(df.loc[:, 'sentiment'].mean()))
print("Std dev: {0:15.3f}".format(df.loc[:, 'sentiment'].std()))

What is your career goal?


Unnamed: 0,Q4,sentiment
1,To survive until retirement without suffering mental illness or making a catestrophic lab error.,0.2760
2,To be Good Embryologist by providing maximum effort and good professor by teaching others,0.7003
3,have a happy team,0.5719
4,To achieve high position and personal development (lab director or chief technologist),0.0000
5,Best success rate to promote our clinic,0.8885
...,...,...
381,Have the best reproductive results in my country,0.6369
382,PhD,0.0000
383,be ART researcher.,0.0000
384,To be the best embryologist I can be to the patients.,0.6369


Average sentiment: 0.213
Std dev:           0.307
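
The seven cells above differ only in the value of `col`, so the per-column summary can be produced in a single loop (the "Programmatically iterate over all columns" item in the TODO list). A minimal sketch, assuming the scorer is passed in as a function; in this notebook that would be `get_compound_sentiment`. The helper `summarise_sentiment` and the toy scorer are hypothetical stand-ins so that the example runs without the survey data, and the share of exactly-zero scores is an extra column added here because many responses above score 0.0000:

```python
import pandas as pd

def summarise_sentiment(frame, cols, scorer):
    """Score each column in `cols`; return mean, std dev and share of zero scores."""
    rows = {}
    for col in cols:
        scores = frame[col].apply(scorer)
        rows[col] = {
            "mean": scores.mean(),
            "std": scores.std(),                  # sample std dev (ddof=1), as Series.std()
            "share_zero": (scores == 0).mean(),   # fraction of responses scored exactly 0
        }
    return pd.DataFrame.from_dict(rows, orient="index")

def toy_scorer(text):
    """Placeholder scorer for the example only; this is NOT VADER."""
    text = text.lower()
    if "stress" in text:
        return -0.5
    if "good" in text:
        return 0.5
    return 0.0

toy = pd.DataFrame({
    "Q1a": ["Stress", "work overload"],
    "Q4": ["have a good team", "PhD"],
})
summary = summarise_sentiment(toy, ["Q1a", "Q4"], toy_scorer)
print(summary.round(3))
```

Returning one dataframe instead of printing per column also makes it easier to compare questions side by side or to export the summary.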


## Comparison

| Question | Female |   Male | Average |
| :------- | -----: | -----: | ------: |
| Q1a      | -0.047 | -0.037 |  -0.045 |
| Q1b      | -0.048 | -0.054 |  -0.051 |
| Q2a      | -0.031 | -0.040 |  -0.035 |
| Q2b      | -0.042 | -0.024 |  -0.038 |
| Q3a      |  0.125 |  0.128 |   0.126 |
| Q3b      |  0.150 |  0.145 |   0.148 |
| Q4       |  0.190 |  0.213 |   0.198 |
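
A table like this can be generated rather than typed by hand. A sketch under the assumption that the cohort label is stored in a column of the same dataframe as the per-question scores; the column name `gender`, the helper `cohort_means` and the toy scores are hypothetical and not taken from the survey:

```python
import pandas as pd

def cohort_means(scores, cohort_col, question_cols):
    """Mean score per question (rows) and cohort (columns), plus the overall mean."""
    table = scores.groupby(cohort_col)[question_cols].mean().T
    # overall mean across all respondents, not the mean of the cohort means
    table["Average"] = scores[question_cols].mean()
    return table

# toy scores for illustration only
toy = pd.DataFrame({
    "gender": ["Female", "Female", "Male", "Male"],
    "Q1a": [-0.5, 0.0, -0.25, 0.25],
    "Q4": [0.5, 0.5, 0.0, 1.0],
})
table = cohort_means(toy, "gender", ["Q1a", "Q4"])
print(table.round(3))
```

The same helper would work for the other prioritised stratifications (geography, role) by changing `cohort_col`, provided those labels are available per respondent.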