A 15-minute talk about R data science unicorns from rstudio:conf 2020

README.md

UnicoRns are real

Travis Gerke & Donna Evans

This is a 15-minute talk most recently given at rstudio:conf 2020.

📺 View slides

✍️ Abstract:

Learning objective: This talk argues that “data science unicorns” are common among the R user base, and gives suggestions for next generation job descriptions that improve matchmaking between R job seekers and hiring organizations.

Common advice from experienced data scientists to job-seekers is to avoid job postings that describe a "data science unicorn": someone who has experience performing an unrealistically large array of technical and business-related job duties. Seeking a unicorn is viewed as a potential indicator that the company fails to understand its data science needs, and that new hires will not be poised for success because they will lack support and resources [Robinson & Nolis, 2019].

The R language, particularly when used with RStudio products, has evolved to enable production-level activities in the areas of data wrangling, reporting/dashboarding, database/software engineering, machine learning, and web application development. It is increasingly plausible that a data scientist will be able to efficiently perform a wide variety of job functions with experience in only a single language (R). Indeed, even entry-level R users may tread into "unicorn" territory. Current standards for data scientist job descriptions and salaries do not accommodate this nuance, leaving both job-seekers and hiring managers unable to distinguish job requirements that should be read as warning signs from listings that are ideal matches for the modern R unicorn.

In this talk, we present data aggregated from several large compensation analytics companies which summarize current benchmarks for data science job descriptions and corresponding salary ranges. We then suggest job description language to target modern R users, considering both job duty compatibility and job post findability. These descriptions are presented with likely salary range pairings. Attention is given to deviations from traditional degree requirements, years of experience, and demands for multiple programming language literacy which may lack relevance for the R unicorn. Our overarching goal is to provide job description templates which encourage optimal matchmaking between R job seekers and organizations in need of their talents.

🧮 Methods for the unicoRn map:

The aim of the unicoRn map is to present Data Scientist I-V salary estimates across a variety of cities and metropolitan regions in the US. We have salary estimates for the Tampa region only, so to span the country we scale them by region-specific multipliers intended to reflect cost of living and local demand for data science.

Salary data across a broad selection of occupations are available from the Bureau of Labor Statistics, and were downloaded on 2020-01-07 from the "All data" source here. Data Scientist is not an occupational title in these data. Furthermore, for our purposes, this federal resource's data collection procedures may be less precise than those of compensation survey companies. We therefore filtered occupations to the titles assumed to match Data Scientist most closely: Computer and Mathematical Occupations, Computer and Information Research Scientists, and Computer and Information Analysts. Region-specific multipliers were generated by dividing each region's average across these three occupations by the Tampa-specific average.
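
The multiplier calculation described above can be sketched in a few lines of base R. This is a minimal illustration, not the code in this repository: the data frame `wages`, its column names, the placeholder salary `tampa_ds1_salary`, and every wage figure are hypothetical, and the sketch assumes the "average of these three occupations" is a simple unweighted mean.

```r
# Hypothetical BLS-style extract: one row per area and occupation.
# All wage figures are made-up placeholders, not real BLS values.
wages <- data.frame(
  area = rep(c("Tampa", "Seattle", "Omaha"), each = 3),
  occupation = rep(c(
    "Computer and Mathematical Occupations",
    "Computer and Information Research Scientists",
    "Computer and Information Analysts"
  ), times = 3),
  mean_wage = c(80000, 100000, 90000,
                110000, 140000, 113000,
                75000, 95000, 82000),
  stringsAsFactors = FALSE
)

# Average the three proxy occupations within each area
# (assumption: a simple unweighted mean).
area_avg <- aggregate(mean_wage ~ area, data = wages, FUN = mean)

# Divide each area's average by the Tampa average to get its multiplier.
tampa_avg <- area_avg$mean_wage[area_avg$area == "Tampa"]
area_avg$multiplier <- area_avg$mean_wage / tampa_avg

# Scale a hypothetical Tampa salary estimate to every area.
tampa_ds1_salary <- 70000  # placeholder, not a figure from the talk
area_avg$ds1_estimate <- tampa_ds1_salary * area_avg$multiplier

print(area_avg)
```

By construction, Tampa's multiplier is 1, so the Tampa estimates pass through unchanged and every other region is expressed relative to them.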

The geography lookup table for geographic areas (which solves the problem of CBSA area codes that do not directly map to city names) was downloaded 2020-01-09 here (credit to Steven.Rosenberg@fcc.gov).
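
Attaching readable area names to the multipliers is then a join on the area code. The sketch below is a hedged illustration only: `multipliers`, `cbsa_lookup`, the column names `cbsa_code` and `area_name`, and the code values are all hypothetical placeholders, since the actual layout of the lookup table is not described here.

```r
# Hypothetical multipliers keyed by area code.
# Codes are stored as character so leading zeros are preserved.
multipliers <- data.frame(
  cbsa_code = c("00001", "00002", "00003"),  # placeholder codes
  multiplier = c(1.00, 1.34, 0.93),          # placeholder values
  stringsAsFactors = FALSE
)

# Hypothetical lookup table mapping area codes to readable names.
cbsa_lookup <- data.frame(
  cbsa_code = c("00001", "00002"),
  area_name = c("Tampa area", "Seattle area"),
  stringsAsFactors = FALSE
)

# Left join: keep every multiplier row, even when no name is found.
labelled <- merge(multipliers, cbsa_lookup, by = "cbsa_code", all.x = TRUE)

# Surface codes that failed to match so they can be checked by hand.
unmatched <- labelled$cbsa_code[is.na(labelled$area_name)]

print(labelled)
print(unmatched)
```

A left join (`all.x = TRUE`) is used here so that unmatched codes show up as `NA` names rather than silently dropping regions from the map.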
