This project was built mainly to explore diversity in a specific country using the 2018, 2019 and 2020 Stack Overflow surveys.
The variables can be adjusted to run a similar study for other countries.
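As a rough illustration of what changing the country might look like, here is a minimal sketch using pandas (included in Anaconda). It assumes the survey data has a `Country` column and that the US is labelled `"United States"`; both should be checked against each year's file. The helper `filter_by_country` and the sample data are hypothetical and are not taken from the notebook.

```python
import pandas as pd

# Assumption: the survey file exposes a "Country" column; verify per year.
COUNTRY_COLUMN = "Country"


def filter_by_country(survey: pd.DataFrame, country: str) -> pd.DataFrame:
    """Return a copy of the rows whose country equals `country`."""
    mask = survey[COUNTRY_COLUMN] == country
    return survey.loc[mask].copy()


# Tiny made-up frame standing in for one year's survey results.
sample = pd.DataFrame(
    {
        "Country": ["United States", "Brazil", "United States", "India"],
        "Respondent": [1, 2, 3, 4],
    }
)

us_only = filter_by_country(sample, "United States")
brazil_only = filter_by_country(sample, "Brazil")
```

Keeping the country name in a single argument means the rest of an analysis could be rerun for another country by changing one value.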
No libraries beyond the Anaconda distribution of Python should be needed to run the code here. The code should run without issues on any Python 3.* version. The survey data must be unzipped before running the project.
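If you prefer to unzip the data from Python rather than by hand, a minimal sketch with the standard library could look like the following. The archive paths and the `extract_survey` helper are hypothetical; adjust them to wherever the zipped survey files actually live in your copy of the project.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_survey(archive: Path, target_dir: Path) -> List[str]:
    """Extract `archive` into `target_dir` and return the archived file names."""
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target_dir)
        return zf.namelist()


if __name__ == "__main__":
    for year in (2018, 2019, 2020):
        # Hypothetical location and file name of each year's archive.
        archive = Path("data") / f"survey_{year}.zip"
        if archive.exists():
            print(year, extract_survey(archive, Path("data") / str(year)))
        else:
            print(year, "archive not found at", archive)
```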
Algorithms are becoming more deeply embedded in our routines, dictating what content we see on social media, what priority we receive in a hospital queue and whether our CVs are analysed for a specific job.
Netflix explores themes like these in more depth in documentaries such as “Coded Bias” and “The Social Dilemma”, which inspired me to search for answers about diversity in the world of technology.
For this project, I was interested in using Stack Overflow data from 2018, 2019 and 2020 to better understand:
- How diverse is the tech workforce in the US?
- How satisfied are minorities among their peers?
- Have minorities been compensated equally in the US tech industry?
- Has the US tech industry been absorbing minority labor?
There is one notebook available here to showcase work related to the above questions. The notebook explores the data relevant to each of those questions. Markdown cells walk through the thought process behind individual steps.
The main findings of the code can be found at the post available here.
Credit goes to Stack Overflow for the data. You can find the licensing for the data and other descriptive information at the Kaggle links available below:
Otherwise, feel free to use the code here as you would like!