<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/0/05/San_Francisco_Chronicle_logo.svg/1200px-San_Francisco_Chronicle_logo.svg.png" width="600" height="2000" style="vertical-align:top">

# <center><font size=5> S.F. Jobs With The</font> <br /><font size=6>Most Competitive Salary Growth?</font></center>
<center><font size=3><i>By Marisol Hernandez</i></font></center>  
<center><font size=3><i>09/12/2020</i></font></center>

---

## Table of Contents

[I. Objective](#objective)  
[II. Libraries](#libraries)  
[III. Data Exploration](#data-exploration)  
[IV. Subsets of Data](#subsets-of-data)  
[V. Single Vector](#single-vector)  
[VI. 5 different Lists](#5-different-lists)  
[VII. Conditionals & Loops](#conditionals-&-loops)  
[VIII. Summary](#summary)  
[IX. Recommendations](#recommendations)

## Objective <a id='objective'></a>
---

It is important to analyze employee compensation data such as the changes in income over a period because it allows us to compare an occupation's compensation tendencies to other occupations within the same market. From a statistical standpoint, income data can indicate salary trends in certain occupations and/or occupational sectors. 

One of the goals of [The San Francisco Controller's Office](https://sfcontroller.org/about-controller%E2%80%99s-office) is to promote financial and economic security, which means that they try to be proactive in ensuring that employees receive the appropriate amount of pay. They study and maintain a database of the salary and benefits paid to City employees in order to identify any significant trends. We can use the trends we see in income to compare salary growth amongst occupations, as well as identify the occupations that exhibit the highest or lowest salary growth.

The objective of this investigation is to provide The San Francisco Chronicle with a report that depicts the main insights regarding the income in the city (based on the data from The Controller's Office) that would be particularly useful for any student or jobseeker looking for a secure occupation with a competitive salary growth rate. In doing so, I conducted my research to answer the following questions:

**Questions of Investigation:**  
**1. Which jobs exhibited positive salary growth from 2015 to 2019?**  
**2. Which jobs exhibited negative salary growth from 2015 to 2019?**  
**3. What are the jobs with the highest salary growth rate from 2015 to 2019?**  
**4. What are the jobs with the lowest salary growth rate from 2015 to 2019?**   
**5. Which jobs are in the bottom 25th percentile based on salary growth rate? Upper 25th percentile? In between?**  

## Libraries  <a id='libraries'></a>
---
The first step to every exploratory data analysis is to import all necessary libraries.

In [1]:
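# install.packages("plyr")
# install.packages("formattable")
# install.packages("dplyr")
# install.packages("rlang")
# install.packages("rlist")
library(plyr)         # count()
library(formattable)  # percent()
library(rlist)        # list.clean()
# Attach only the dplyr functions used below so that plyr::count() is not masked
# (the include.only argument requires R >= 3.6.0)
library(dplyr, include.only = c("%>%", "group_by", "summarise"))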

## Data Exploration  <a id='data-exploration'></a>
---


### Read in the Data
Because GitHub limits file size, I read the dataset from a zipped file using `unz()` and then viewed the first 6 rows with `head()`.

Because the dimensions are so large, `head()` could not display every column. I will retrieve the column names in the next few steps. Another thing I immediately notice is that there are some missing values that I would like to address in my data cleaning subsection.

In [2]:
# Open a connection to the zip file
unzip_conn <- unz("employee-compensation.csv.zip", "employee-compensation.csv")

# Read the CSV data from the zip archive
employee <- read.csv(unzip_conn)

# No explicit close() is needed: read.csv() opens the connection itself
# and closes it again when it finishes reading

head(employee)

|  | Year.Type | Year | Organization.Group.Code | Organization.Group | Department.Code | Department | Union.Code | Union | Job.Family.Code | Job.Family | ⋯ | Employee.Identifier | Salaries | Overtime | Other.Salaries | Total.Salary | Retirement | Health.and.Dental | Other.Benefits | Total.Benefits | Total.Compensation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|  | `<chr>` | `<int>` | `<int>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | ⋯ | `<int>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 1 | Calendar | 2028 | 7 | General City Responsibilities | 229259 |  | 792 | Utd Pub EmpL790 SEIU-Crt Clrks | 0000 | Untitled | ⋯ | 8540990 | 674.28 | 0.0 | 5.76 | 680.04 | 130.91 | 0.0 | 53.86 | 184.77 | 864.81 |
| 2 | Calendar | 2028 | 1 | Public Protection | CRT |  | 792 | Utd Pub EmpL790 SEIU-Crt Clrks | 0000 | Untitled | ⋯ | 8540990 | 674.28 | 0.0 | 5.76 | 680.04 | 130.91 | 0.0 | 53.86 | 184.77 | 864.81 |
| 3 | Fiscal | 2028 | 1 | Public Protection | CRT |  | 792 | Utd Pub EmpL790 SEIU-Crt Clrks | 0000 | Untitled | ⋯ | 8540990 | 674.28 | 0.0 | 5.76 | 680.04 | 130.91 | 0.0 | 53.86 | 184.77 | 864.81 |
| 4 | Fiscal | 2020 | 3 | Human Welfare & Neighborhood Development | HSA |  | 535 | SEIU, Local 1021, Misc | 2900 | Human Services | ⋯ | 8526655 | 47822.0 | 0.0 | 264.0 | 48086.0 | 9986.92 | 6747.52 | 3898.79 | 20633.23 | 68719.23 |
| 5 | Fiscal | 2020 | 1 | Public Protection | POL |  | 911 | POA | Q000 | Police Services | ⋯ | 8576844 | 61089.35 | 39316.78 | 6966.37 | 107372.5 | 12385.79 | 6606.96 | 1816.03 | 20808.78 | 128181.28 |
| 6 | Fiscal | 2020 | 1 | Public Protection | CRT |  | 792 | Utd Pub EmpL790 SEIU-Crt Clrks | 0000 | Untitled | ⋯ | 8501208 | 32406.02 | 0.0 | 264.0 | 32670.02 | 6783.77 | 6754.93 | 2699.55 | 16238.25 | 48908.27 |


Because `head()` does not report the full dimensions of the table, I used `nrow()` and `ncol()` to check them. I now know that I am working with a ***886,102 by 22*** data frame.

In [3]:
cat("There are", nrow(employee), "rows and", ncol(employee), "columns in this table.")

There are 886102 rows and 22 columns in this table.

Additionally, I used `colnames()` to retrieve the column names. For the purpose of my investigation, I am only interested in the variables `Year`, `Job`, and `Salaries`, which I will subset in a later section.

In [4]:
colnames(employee)

### Data Cleaning

The next step in my analysis is to check for missing (`NA`) values and *empty strings* within the data frame. To do so, I used the `count()` function from `plyr` to tally how many entries in a particular column are missing or empty. The logical expression passed to the function checks whether each entry is an empty string or `NA`.

In [5]:
count(employee["Job"] == "" | is.na(employee["Job"]))

| Job | freq |
|---|---|
| `<lgl>` | `<int>` |
| FALSE | 886098 |
| TRUE | 4 |


Because I wanted to check for missing values and *empty strings* in every column, I built a `for` loop to do just that. The conditional expression within the loop prints the column name if the column contains missing values or empty strings.

In [6]:
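for (col in colnames(employee)) {
    # Flag entries that are either missing (NA) or empty strings
    is_blank <- is.na(employee[[col]]) | employee[[col]] == ""

    # Print the column name if at least one entry is flagged;
    # this avoids hard-coding the number of rows
    if (any(is_blank)) {
        print(col)
    }
}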

[1] "Department.Code"
[1] "Department"
[1] "Union.Code"
[1] "Union"
[1] "Job"

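The same check can be written without an explicit loop. The following is a minimal sketch on a small made-up data frame (`toy` and its values are hypothetical, not taken from the dataset), assuming that "blank" means either `NA` or an empty string:

```r
# Hypothetical toy data standing in for `employee`
toy <- data.frame(
    Job      = c("Clerk", "", "Nurse"),
    Union    = c("A", "B", NA),
    Salaries = c(100, 200, 300),
    stringsAsFactors = FALSE
)

# Count blank entries (NA or "") in every column
blank_counts <- sapply(toy, function(column) sum(is.na(column) | column == ""))

# Keep only the names of the columns with at least one blank entry
cols_with_blanks <- names(blank_counts)[blank_counts > 0]
print(cols_with_blanks)
```

A side benefit of this form is that `blank_counts` also reports how many entries are affected in each column.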

R has a useful function called `na.omit()` that removes every row containing an `NA` value from a data frame. Running it, however, I learned that my data frame does not contain any `NA` values because the dimensions remained the same. The entries flagged above are therefore all empty strings.

In [7]:
clean_employee <- na.omit(employee) 
dim(clean_employee)

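To illustrate why `na.omit()` leaves empty strings in place, here is a small sketch on made-up data (`toy` is hypothetical). It shows that `NA` rows are dropped, while rows holding `""` need an explicit filter:

```r
# Hypothetical toy data: one empty string and one NA in `Job`
toy <- data.frame(
    Job      = c("Clerk", "", "Nurse", NA),
    Salaries = c(100, 200, 300, 400),
    stringsAsFactors = FALSE
)

# na.omit() drops the row with NA but keeps the row with ""
no_na <- na.omit(toy)
print(nrow(no_na))

# Empty strings have to be filtered out explicitly
no_blank <- no_na[no_na$Job != "", ]
print(nrow(no_blank))
```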
Because `Job` was the only one of my three variables of interest that contained empty strings, I decided to filter only that one column to remove those rows. I decided not to filter the remaining four columns that contained empty strings because they could potentially contain data that could be of use to my investigation. Filtering the dataset resulted in the removal of just 4 rows.

In [8]:
# Build the filter from the same data frame that is being subsetted
filtered_employee <- clean_employee[!(clean_employee$Job==""),]
nrow(clean_employee) - nrow(filtered_employee)
dim(filtered_employee)

##  Subsets of Data  <a id='subsets-of-data'></a>
---


### Subset 1: Dataframe of Yearly Median Salary & Percent Change

Because I am observing occupational salary growth, the only relevant columns I need are `Year`, `Job`, and `Salaries`. I subsetted my data accordingly and printed a summary to verify. In the summary, I noticed that `Salaries` contains negative values that could be data-entry errors.

In [9]:
# subset 1
subset <- filtered_employee[, c("Year", "Job", "Salaries")]
summary(subset)

      Year          Job               Salaries     
 Min.   :2013   Length:886098      Min.   :-33808  
 1st Qu.:2015   Class :character   1st Qu.:  4461  
 Median :2017   Mode  :character   Median : 49168  
 Mean   :2017                      Mean   : 52463  
 3rd Qu.:2018                      3rd Qu.: 83424  
 Max.   :2028                      Max.   :651937  

I am only interested in observing occupational salary growth within full-time positions, so I subsetted my data to only contain rows with salaries above $30,000, which I use as a rough proxy for full-time work.

In [10]:
fulltime <- subset[subset$Salaries > 30000,]
summary(fulltime)

      Year          Job               Salaries     
 Min.   :2013   Length:531782      Min.   : 30000  
 1st Qu.:2015   Class :character   1st Qu.: 56531  
 Median :2017   Mode  :character   Median : 75349  
 Mean   :2017                      Mean   : 83237  
 3rd Qu.:2018                      3rd Qu.:105023  
 Max.   :2020                      Max.   :651937  

I decided to look at the occupational salary growth between 2015 and 2019. Using a `for` loop, I subsetted the `fulltime` data frame based on the year specified within the loop. The second command within the loop then groups that subset by `Job`.

The third command within the loop then creates another subset that contains a column for `Job` and another column for the median salary. I decided to use the median salary as the measure for comparison because it is a measure of central tendency that is less sensitive to extreme values than the mean.

The fourth command within the loop renames the median salary column to specify the year. The fifth command adds the subset to a list called `dataframes` that will be used to merge all the data frames.

In [18]:
dataframes <- list()
x <- 1

for (i in 2015:2019) {
    salary <- fulltime[fulltime$Year == i,]
    
    by_job <- salary %>%
              group_by(Job) %>%
              summarise(Salary = median(Salaries, na.rm = TRUE))
    colnames(by_job)[2] <- paste("Salary", i, sep=".")
    
    dataframes[[x]] <- by_job
    x <- x + 1
}

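The per-year loop can also be expressed without `dplyr`. As a minimal sketch on made-up data (`toy` is hypothetical), base R's `aggregate()` computes the median salary for every job and year in one call. Note that it produces one row per job/year pair rather than one data frame per year:

```r
# Hypothetical toy data standing in for `fulltime`
toy <- data.frame(
    Year     = c(2015, 2015, 2015, 2016, 2016),
    Job      = c("Clerk", "Clerk", "Nurse", "Clerk", "Nurse"),
    Salaries = c(40000, 50000, 70000, 52000, 72000),
    stringsAsFactors = FALSE
)

# Median salary for every Job/Year combination in a single call
medians <- aggregate(Salaries ~ Job + Year, data = toy, FUN = median)
print(medians)
```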
I assigned the first data frame in `dataframes` to the variable `full`. Using a `for` loop, I merged the remaining data frames into the first one by `Job`. Because `merge()` performs an inner join by default, only jobs that have a median salary in all five years are kept. This creates a general data frame that contains the median salary for every year between 2015 and 2019. Additionally, I created a column called `Percent.Change` that contains the salary growth from 2015 to 2019, computed as (2019 median - 2015 median) / 2015 median.

In [19]:
full <- dataframes[[1]]

for (i in 2:length(dataframes)) {
    full <- merge(full, dataframes[[i]], by="Job")
}

full$Percent.Change <- percent((full$Salary.2019 - full$Salary.2015) / full$Salary.2015)
head(full)

|  | Job | Salary.2015 | Salary.2016 | Salary.2017 | Salary.2018 | Salary.2019 | Percent.Change |
|---|---|---|---|---|---|---|---|
|  | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<formttbl>` |
| 1 | Account Clerk | 52855.44 | 59160.3 | 46314.8 | 57500.5 | 62070.02 | 17.43% |
| 2 | Accountant II | 70223.11 | 73691.14 | 47499.81 | 79119.75 | 79390.82 | 13.06% |
| 3 | Accountant III | 85757.5 | 92054.04 | 55972.06 | 94491.04 | 101643.06 | 18.52% |
| 4 | Accountant Intern | 44160.0 | 42104.18 | 47360.0 | 46341.02 | 50240.0 | 13.77% |
| 5 | Accountant IV | 112427.32 | 116558.33 | 69664.0 | 119150.02 | 124975.18 | 11.16% |
| 6 | Acupuncturist | 77029.8 | 80653.9 | 44170.0 | 81250.01 | 86950.0 | 12.88% |

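The merge loop and the growth calculation can be sketched compactly with `Reduce()`. The example below uses made-up per-year tables (`yearly`, `wide`, and `Change` are hypothetical names) and shows how the default inner join drops a job that is missing from any one year:

```r
# Hypothetical per-year tables standing in for the elements of `dataframes`
yearly <- list(
    data.frame(Job = c("Clerk", "Nurse"), Salary.2015 = c(50000, 80000)),
    data.frame(Job = c("Clerk", "Nurse"), Salary.2016 = c(52000, 78000)),
    data.frame(Job = c("Clerk"),          Salary.2017 = c(55000))
)

# Fold merge() over the list; the default inner join keeps only the jobs
# that appear in every table, so "Nurse" is dropped here
wide <- Reduce(function(left, right) merge(left, right, by = "Job"), yearly)

# Relative change between the first and last year, as a plain fraction
wide$Change <- (wide$Salary.2017 - wide$Salary.2015) / wide$Salary.2015
print(wide)
```

Passing `all = TRUE` to `merge()` would keep such jobs with `NA` in the missing years instead, at the cost of an undefined growth rate for them.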

### Subset 2: Positive Salary Growth

I am interested in occupations that exhibited positive salary growth. To make this observation, I created another subset called `positive` that contains the occupations with positive salary growth, keeping only the first 5 rows for display.

In [278]:
positive <- full[full$Percent.Change > 0,]
positive <- head(positive, n=5)
positive

| Job | Salary.2015 | Salary.2016 | Salary.2017 | Salary.2018 | Salary.2019 | Percent.Change |
|---|---|---|---|---|---|---|
| Account Clerk | 52855.44 | 59160.3 | 46314.8 | 57500.5 | 62070.02 | 17.43% |
| Accountant II | 70223.11 | 73691.14 | 47499.81 | 79119.75 | 79390.82 | 13.06% |
| Accountant III | 85757.5 | 92054.04 | 55972.06 | 94491.04 | 101643.06 | 18.52% |
| Accountant Intern | 44160.0 | 42104.18 | 47360.0 | 46341.02 | 50240.0 | 13.77% |
| Accountant IV | 112427.32 | 116558.33 | 69664.0 | 119150.02 | 124975.18 | 11.16% |


### Subset 3: Negative Salary Growth

I am interested in occupations that exhibited negative salary growth. To make this observation, I created another subset called `negative` that contains the occupations with negative salary growth, keeping only the first 5 rows for display.

In [279]:
negative <- full[full$Percent.Change < 0,]
negative <- head(negative, n=5)
negative

|  | Job | Salary.2015 | Salary.2016 | Salary.2017 | Salary.2018 | Salary.2019 | Percent.Change |
|---|---|---|---|---|---|---|---|
| 7 | Admin Analyst 3 | 100631.57 | 105146.99 | 82722.72 | 84617.6 | 84045.77 | -16.48% |
| 11 | Administrative Services Mgr | 101990.4 | 101989.75 | 58758.0 | 104291.11 | 92328.01 | -9.47% |
| 19 | Airport Noise Abatement Spec | 80349.07 | 68318.34 | 40348.04 | 59367.0 | 79310.15 | -1.29% |
| 24 | Animal Care Asst Supv | 61543.3 | 61325.41 | 38970.8 | 55568.91 | 61407.62 | -0.22% |
| 25 | Animal Control Supervisor | 65687.29 | 71051.76 | 56142.97 | 42323.38 | 62236.42 | -5.25% |


### Subset 4: Jobs with the Highest Salary Growth Rate

I am interested in the jobs with the highest salary growth from 2015 to 2019. To find them, I sorted the `full` data frame in descending order of `Percent.Change`. I then created another subset called `fast_growth` that contains data for the top 5 jobs with the highest salary growth.

In [20]:
by_percent_change <- full[order(-full$Percent.Change),]
fast_growth <- head(by_percent_change, n=5)
fast_growth

|  | Job | Salary.2015 | Salary.2016 | Salary.2017 | Salary.2018 | Salary.2019 | Percent.Change |
|---|---|---|---|---|---|---|---|
|  | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<formttbl>` |
| 713 | Telecommunications Tech Supv | 49038.0 | 89278.9 | 66542.01 | 122400.0 | 130971.03 | 167.08% |
| 670 | Sr Airport Noise Abatement Spe | 38060.01 | 92887.31 | 50862.01 | 47516.03 | 92968.7 | 144.27% |
| 9 | Administrative Analyst II | 37857.6 | 72572.1 | 59676.65 | 73539.2 | 91210.74 | 140.93% |
| 691 | Statistician | 35878.06 | 77877.38 | 65881.78 | 32567.15 | 85233.82 | 137.57% |
| 271 | Executive Contract Employee | 128453.14 | 297141.83 | 164079.91 | 335775.03 | 294037.07 | 128.91% |


### Subset 5: Jobs with the Lowest Salary Growth Rate
I am interested in the jobs with the lowest salary growth from 2015 to 2019. I created another subset called `slow_growth` that contains data for the 5 jobs with the lowest salary growth. Because `by_percent_change` is sorted in descending order, these are its last 5 rows.

In [281]:
slow_growth <- tail(by_percent_change, n=5)
slow_growth

|  | Job | Salary.2015 | Salary.2016 | Salary.2017 | Salary.2018 | Salary.2019 | Percent.Change |
|---|---|---|---|---|---|---|---|
| 533 | Pr Investigator, Tax Collector | 98742.7 | 103976.7 | 56938.0 | 104725.06 | 43150.03 | -56.30% |
| 119 | Chief Surveyor | 106220.5 | 121373.9 | 72114.0 | 109303.61 | 42993.9 | -59.52% |
| 662 | Sprv Adult Prob Ofc (SFERS) | 129162.9 | 114800.3 | 111590.57 | 99438.51 | 48030.02 | -62.81% |
| 768 | Wire Rope Cable Maint Sprv | 164920.4 | 106342.3 | 56331.72 | 104250.05 | 47108.11 | -71.44% |
| 224 | Div Director, Adult Probation | 123678.6 | 133208.2 | 71312.2 | 134124.99 | 30978.0 | -74.95% |

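The sort-then-slice pattern behind `fast_growth` and `slow_growth` can be seen on a small made-up table (`toy`, `ranked`, `top_two`, and `bottom_two` are hypothetical names). This sketch uses plain numeric growth rates and the `decreasing` argument of `order()` instead of negating the column:

```r
# Hypothetical toy data standing in for `full`
toy <- data.frame(
    Job = c("A", "B", "C", "D"),
    Percent.Change = c(0.10, -0.20, 0.35, 0.02),
    stringsAsFactors = FALSE
)

# Sort once, from highest to lowest growth
ranked <- toy[order(toy$Percent.Change, decreasing = TRUE), ]

top_two    <- head(ranked, n = 2)  # highest growth, best first
bottom_two <- tail(ranked, n = 2)  # lowest growth, worst last
print(top_two$Job)
print(bottom_two$Job)
```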

##  Single Vector   <a id='single-vector'></a>
---

### Top 5 Jobs with the Highest Salary Growth Rate
I am interested in the top 5 occupations that have the highest salary growth rate. To retrieve this information, I stored the `Job` column from the data frame `fast_growth` into a vector using `as.vector()`. This information is particularly helpful for any jobseeker or student looking for a secure job whose salary has recently grown the fastest.

In [284]:
fast_jobs <- as.vector(fast_growth[,"Job"])
fast_jobs

### Top 5 Jobs with the Lowest Salary Growth Rate
I am interested in the 5 occupations that have the lowest salary growth rate. To retrieve this information, I stored the `Job` column from the data frame `slow_growth` into a vector using `as.vector()`. This information is particularly helpful because these are the jobs that any jobseeker or student may want to steer away from, since their salary growth rate is very low.

In [285]:
slow_jobs <- as.vector(slow_growth[,"Job"])
slow_jobs

##  5 different Lists   <a id='5-different-lists'></a>
---

I am interested in dividing the ***Jobs*** from the `full` data frame into 5 different lists based on their salary growth rate from 2015 to 2019 using the following criteria:

**Criteria Used for Listing:**  
**- [Lower 25th Percentile](#lower-25th-percentile)**  
**- [Between 25th and 50th Percentile](#between-25th-and-50th-percentile)**  
**- [Between 50th and 75th Percentile](#between-50th-and-75th-percentile)**  
**- [Upper 25th Percentile](#upper-25th-percentile)**  
**- [Positive Salary Growth](#positive-salary-growth)**

To get a statistical summary—minimum, first quartile, median, mean, third quartile, and maximum—of `Percent.Change`, I ran the `summary()` function over the column.

In [242]:
summary(full$Percent.Change)

    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-0.74953  0.03561  0.12208  0.11045  0.14230  1.67081 

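The quartile cutoffs in this summary can also be computed directly instead of being copied by hand. The sketch below uses made-up growth rates (`growth_rates` and `cutoffs` are hypothetical names). With the real data, the input would be `full$Percent.Change`, possibly wrapped in `as.numeric()` if the `formattable` class gets in the way, which is an assumption I have not verified here:

```r
# Hypothetical growth rates standing in for full$Percent.Change
growth_rates <- c(-0.10, 0.00, 0.05, 0.10, 0.20)

# 25th, 50th and 75th percentiles, computed rather than typed in
cutoffs <- quantile(growth_rates, probs = c(0.25, 0.50, 0.75), names = FALSE)
print(cutoffs)
```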
To begin my organization, I first made four different lists for the *lower*, *middle.1* (between 25th and 50th percentile), *middle.2* (between 50th and 75th percentile), and *upper* quartiles. I then made a `for` loop that organizes ***Jobs*** from the `full` data frame into these four lists based on their salary growth rate from 2015 to 2019. Additionally, I created a function `clean` to first filter the lists of any possible *NULL* elements and then print the first 5 elements.

In [243]:
lower_quartile <- list()
middle.1 <- list()
middle.2 <- list()
upper_quartile <- list()

# Cutoffs are the quartiles reported by summary(full$Percent.Change)
for (i in 1:nrow(full)){
    change <- full[i,"Percent.Change"]
    # Index by the row number i so the loop does not rely on a counter
    # left over from an earlier cell; unassigned positions stay NULL
    if (change < 0.03561){
        lower_quartile[[i]] <- full[i,"Job"]
    } else if (change < 0.12208) {
        middle.1[[i]] <- full[i,"Job"]
    } else if (change < 0.14230){
        middle.2[[i]] <- full[i,"Job"]
    } else  {
        upper_quartile[[i]] <- full[i,"Job"]
    }
}


clean <- function(list.name) {
    # Drop the NULL gaps, then show at most the first 5 elements
    filtered_list <- rlist::list.clean(list.name, fun = is.null, recursive = TRUE)
    head(filtered_list, 5)
}

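As an alternative to filling lists row by row, the same grouping can be sketched with `cut()` and `split()`. The data below are made up (`toy`, `breaks`, `labels`, and `groups` are hypothetical names), while the cutoffs are the quartiles from the `summary()` output above:

```r
# Hypothetical toy data standing in for `full`
toy <- data.frame(
    Job = c("A", "B", "C", "D", "E", "F", "G", "H"),
    Percent.Change = c(-0.30, 0.01, 0.05, 0.10, 0.125, 0.13, 0.15, 0.90),
    stringsAsFactors = FALSE
)

# Quartile cutoffs copied from the summary() output
breaks <- c(-Inf, 0.03561, 0.12208, 0.14230, Inf)
labels <- c("lower_quartile", "middle.1", "middle.2", "upper_quartile")

# right = FALSE makes each interval [low, high), matching the >= / < tests
toy$Group <- cut(toy$Percent.Change, breaks = breaks, labels = labels, right = FALSE)

# One character vector of job titles per group, with no NULL gaps to clean
groups <- split(toy$Job, toy$Group)
print(groups)
```

Because every job lands in exactly one named group, this form needs no helper to strip `NULL` entries afterwards.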
### Lower 25th Percentile <a id='lower-25th-percentile'></a>
I am interested in the jobs that are in the lower 25th percentile based on their salary growth rate from 2015 to 2019. I used the `clean()` function to filter the list and print the first 5 elements. This information is particularly useful because we can see the jobs that have the lowest salary growth rate.

In [244]:
clean(lower_quartile)

### Between 25th and 50th Percentile <a id='between-25th-and-50th-percentile'></a>
I am interested in the jobs that are between the 25th and 50th percentiles based on their salary growth rate from 2015 to 2019. I used the `clean()` function to filter the list and print the first 5 elements.

In [245]:
clean(middle.1)

### Between 50th and 75th Percentile <a id='between-50th-and-75th-percentile'></a>
I am interested in the jobs that are between the 50th and 75th percentiles based on their salary growth rate from 2015 to 2019. I used the `clean()` function to filter the list and print the first 5 elements.

In [246]:
clean(middle.2)

### Upper 25th Percentile <a id='upper-25th-percentile'></a>
I am interested in the jobs that are in the upper 25th percentile based on their salary growth rate from 2015 to 2019. I used the `clean()` function to filter the list and print the first 5 elements. This information is particularly useful because we can see the jobs that have the highest salary growth rate.

In [247]:
clean(upper_quartile)

### Positive Salary Growth <a id='positive-salary-growth'></a>
I am interested in the jobs that had a positive salary growth rate from 2015 to 2019. To find this information, I made another list called `positive_growth`. I then made a `for` loop that adds ***Jobs*** from the `full` data frame to `positive_growth` if they have a positive salary growth rate. I used the `clean()` function to filter the list and print the first 5 elements. This information is particularly useful for any student or jobseeker looking for an occupation that has a positive salary growth rate.

In [248]:
positive_growth <- list()

for (i in 1:nrow(full)){
    if (full[i,"Percent.Change"] > 0){
        # Index by row number; skipped rows leave NULL gaps that clean() removes
        positive_growth[[i]] <- full[i,"Job"]
    }
}

clean(positive_growth)

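For comparison, the same selection can be written as a single logical subset. This is a small sketch on made-up data (`toy` and `positive_jobs` are hypothetical names), not a replacement for the list built in this section:

```r
# Hypothetical toy data standing in for `full`
toy <- data.frame(
    Job = c("A", "B", "C"),
    Percent.Change = c(0.10, -0.20, 0.00),
    stringsAsFactors = FALSE
)

# Keep the job titles whose growth rate is strictly positive
positive_jobs <- toy$Job[toy$Percent.Change > 0]
print(positive_jobs)
```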
##  Conditionals & Loops   <a id='conditionals-&-loops'></a>
---

I wanted to verify the direction (positive or negative) of the salary growth rate for the jobs listed in the `positive` and `negative` data frames. To find this information, I made a function `growth` that takes a data frame as a parameter. Within the function, a `for` loop checks the sign of `Percent.Change` and reports it in a printed statement for every row.

In [249]:
growth <- function(df){

    for (i in 1:nrow(df)) {
        if (df[i,"Percent.Change"] > 0){
            cat(as.character(df[i,"Job"]), "has positive salary growth from 2015 to 2019.\n")
        }else if (df[i,"Percent.Change"] < 0){
            cat(as.character(df[i,"Job"]), "has negative salary growth from 2015 to 2019.\n")
        }
    }
}

Additionally, I wanted to know how the salary growth rate for the jobs listed in the `positive` and `negative` data frames compared to the median salary growth rate (0.12208, or about 12.2%). To find this information, I made a function `percent_average` that takes a data frame as a parameter. Within the function, a `for` loop checks whether `Percent.Change` is above or below the median salary growth rate and reports it in a printed statement for every row. The printed statements use "average" to mean this median.

In [250]:
percent_average <- function(df){
    for (i in 1:nrow(df)) {
        # 0.12208 is the median of Percent.Change reported by summary()
        if (df[i,"Percent.Change"] > 0.12208){
            cat(as.character(df[i,"Job"]), "salary growth rate from 2015 to 2019 is above average.\n")
        }else {
            cat(as.character(df[i,"Job"]), "salary growth rate from 2015 to 2019 is below average.\n")
        }
    }
}

Lastly, I wanted to know which percentile range the salary growth rate of the jobs listed in the `positive` and `negative` data frames lies in. To find this information, I made a function `percentile` that takes a data frame as a parameter. Within the function, a `for` loop checks which percentile range `Percent.Change` lies in and reports it in a printed statement for every row.

In [251]:
percentile <- function(df){
    for (i in 1:nrow(df)){
        # Cutoffs are the quartiles of Percent.Change reported by summary()
        if (df[i,"Percent.Change"] < 0.03561){
            cat(as.character(df[i,"Job"]), "salary growth rate is in the bottom 25th percentile.\n")
        } else if (df[i,"Percent.Change"] < 0.12208) {
            cat(as.character(df[i,"Job"]), "salary growth rate is in between the 25th and 50th percentile.\n")
        } else if (df[i,"Percent.Change"] < 0.14230){
            cat(as.character(df[i,"Job"]), "salary growth rate is in between the 50th and 75th percentile.\n")
        } else  {
            cat(as.character(df[i,"Job"]), "salary growth rate is in the upper 25th percentile.\n")
        }
    }
}

###  Positive Salary Growth

Running the `growth` function on the `positive` data frame, we can verify that each of the 5 jobs has positive salary growth from 2015 to 2019.

In [252]:
growth(positive)

Account Clerk has positive salary growth from 2015 to 2019.
Accountant II has positive salary growth from 2015 to 2019.
Accountant III has positive salary growth from 2015 to 2019.
Accountant Intern has positive salary growth from 2015 to 2019.
Accountant IV has positive salary growth from 2015 to 2019.


Running the `percent_average` function on the `positive` data frame we learn that 4 of the 5 jobs exhibited a salary growth rate that was above average, while only 1 exhibited a salary growth rate that was below average.

In [258]:
percent_average(positive)

Account Clerk salary growth rate from 2015 to 2019 is above average.
Accountant II salary growth rate from 2015 to 2019 is above average.
Accountant III salary growth rate from 2015 to 2019 is above average.
Accountant Intern salary growth rate from 2015 to 2019 is above average.
Accountant IV salary growth rate from 2015 to 2019 is below average.


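Running the `percentile` function on the `positive` data frame, we learn that four of the 5 jobs are in the upper half of the distribution: *Account Clerk* and *Accountant III* are in the upper 25th percentile, and *Accountant II* and *Accountant Intern* are between the 50th and 75th percentiles. *Accountant IV* falls between the 25th and 50th percentiles.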

In [254]:
percentile(positive)

Account Clerk salary growth rate is in the upper 25th percentile.
Accountant II salary growth rate is in between the 50th and 75th percentile.
Accountant III salary growth rate is in the upper 25th percentile.
Accountant Intern salary growth rate is in between the 50th and 75th percentile.
Accountant IV salary growth rate is in between the 25th and 50th percentile.


###  Negative Salary Growth

Running the `growth` function on the `negative` data frame, we can verify that each of the 5 jobs has negative salary growth from 2015 to 2019.

In [255]:
growth(negative)

Admin Analyst 3 has negative salary growth from 2015 to 2019.
Administrative Services Mgr has negative salary growth from 2015 to 2019.
Airport Noise Abatement Spec has negative salary growth from 2015 to 2019.
Animal Care Asst Supv has negative salary growth from 2015 to 2019.
Animal Control Supervisor has negative salary growth from 2015 to 2019.


Running the `percent_average` function on the `negative` data frame we learn that all 5 jobs exhibited a salary growth rate that was below average.

In [256]:
percent_average(negative)

Admin Analyst 3 salary growth rate from 2015 to 2019 is below average.
Administrative Services Mgr salary growth rate from 2015 to 2019 is below average.
Airport Noise Abatement Spec salary growth rate from 2015 to 2019 is below average.
Animal Care Asst Supv salary growth rate from 2015 to 2019 is below average.
Animal Control Supervisor salary growth rate from 2015 to 2019 is below average.


Running the `percentile` function on the `negative` data frame, we learn that the salary growth rate of each of the 5 jobs is in the bottom 25th percentile.

In [257]:
percentile(negative)

Admin Analyst 3 salary growth rate is in the bottom 25th percentile.
Administrative Services Mgr salary growth rate is in the bottom 25th percentile.
Airport Noise Abatement Spec salary growth rate is in the bottom 25th percentile.
Animal Care Asst Supv salary growth rate is in the bottom 25th percentile.
Animal Control Supervisor salary growth rate is in the bottom 25th percentile.

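The row-by-row checks in this section can also be sketched in vectorized form with `ifelse()` and `sprintf()`. The data and names below (`toy`, `median_growth`, `direction`, `versus_median`, `messages`) are hypothetical, and the wording of the messages is my own rather than the output of the functions above:

```r
# Hypothetical toy data standing in for `positive` or `negative`
toy <- data.frame(
    Job = c("A", "B", "C"),
    Percent.Change = c(0.17, -0.05, 0.11),
    stringsAsFactors = FALSE
)

# Median growth rate taken from the summary() output earlier in the report
median_growth <- 0.12208

# Note: a change of exactly 0 is labeled "negative" by this two-way split
direction <- ifelse(toy$Percent.Change > 0, "positive", "negative")
versus_median <- ifelse(toy$Percent.Change > median_growth, "above", "at or below")

# Build one message per row without an explicit loop
messages <- sprintf("%s: %s growth, %s the median", toy$Job, direction, versus_median)
cat(messages, sep = "\n")
```

Keeping the labels in vectors such as `direction` also makes it easy to store them as new columns instead of only printing them.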

##  Summary   <a id='summary'></a>
---
To compare an occupation's compensation tendencies to those of other occupations within the same market, I had to aggregate my data in a way that would allow me to observe the changes in median salary over the course of a few years. I narrowed my study to the <ins>salary growth rate</ins> for full-time employees from 2015 to 2019. In doing so, I created several subsets that contained the yearly median salary per occupation, as well as the salary growth rate. These subsets were built to be of particular use for any student or jobseeker looking for a secure occupation with a competitive salary growth rate. 

In the `positive` subset, for example, these students or jobseekers can learn that jobs like *Account Clerk*, *Accountant II*, *Accountant III*, etc., all exhibited a positive growth rate. A student or jobseeker may want to know which jobs exhibited negative salary growth so that they can steer clear of these occupations. In this case, they can see from the `negative` subset that jobs like *Admin Analyst 3*, *Administrative Services Mgr*, *Airport Noise Abatement Spec*, etc., all exhibited a negative growth rate.

A student or jobseeker may also be interested in the occupations with the highest or lowest salary growth rate. To find the occupations that exhibited the highest salary growth rate, they can see from the `fast_jobs` vector that these include *Telecommunications Tech Supv*, *Sr Airport Noise Abatement Spe*, *Statistician*, etc. To find the occupations that exhibited the lowest salary growth rate, they can see from the `slow_jobs` vector that these include *Pr Investigator, Tax Collector*, *Chief Surveyor*, *Sprv Adult Prob Ofc (SFERS)*, etc.

In some cases, a student or jobseeker may be interested in learning which occupations are in the upper 25th percentile based on salary growth rate, which are in the bottom 25th percentile (so they know to steer clear of these), or which fall in between. To learn this information, students or jobseekers can use the lists created under the [5 different Lists](#5-different-lists) section. They can also find this information by passing a data frame to the `percentile()` function under the [Conditionals & Loops](#conditionals-&-loops) section. Overall, students or jobseekers should find this article particularly useful in finding the best opportunity out there.

##  Recommendations   <a id='recommendations'></a>
---
When observing changes in income over a period of time, we can see how an occupation's compensation tendencies compare to other occupations within the same market. This is particularly helpful for any student or jobseeker who is looking for a secure occupation with a competitive salary growth rate. I would recommend that these students or jobseekers look at the jobs with the highest salary growth rate, which can be found in the `fast_jobs` vector:

In [259]:
cat("Jobs with the highest salary growth rate:")
fast_jobs

Jobs with the highest salary growth rate:

Additionally, I would recommend that they look at the jobs with salary growth rates in the upper 50th percentile. These can be found in the `middle.2` and `upper_quartile` lists. I would also recommend that they look at the jobs that exhibited a positive salary growth rate. These can be found in the `positive_growth` list:

In [260]:
cat("Jobs in between the 50th and 75th percentiles:")
clean(middle.2)

cat("Jobs in the upper 25th percentile:")
clean(upper_quartile)

cat("Jobs with a positive salary growth rate:")
clean(positive_growth)

Jobs in between the 50th and 75th percentiles:

Jobs in the upper 25th percentile:

Jobs with a positive salary growth rate:

Lastly, I would recommend that they steer away from jobs with the lowest salary growth rate or with a growth rate in the lower 50th percentile. These can be found in the `slow_jobs` vector as well as the `middle.1` and `lower_quartile` lists:

In [261]:
cat("Jobs with the lowest salary growth rate:")
slow_jobs

cat("Jobs in between the 25th and 50th percentiles:")
clean(middle.1)

cat("Jobs in the lower 25th percentile:")
clean(lower_quartile)

Jobs with the lowest salary growth rate:

Jobs in between the 25th and 50th percentiles:

Jobs in the lower 25th percentile: