# Data Munging Part II - Filtering and Joining Datasets
This lab was adapted from Glassdoor Jobs Data-Analysis:
https://github.com/Atharva-Phatak/Glassdoor-Jobs_Data-Analysis

In Data Munging Part I we learned how to explore our data and clean it up so that missing values are removed.

In this Data Munging Part II lab, we are going to learn how to:
1. Filter our Data
2. Sort Data 
3. Merge/Concatenate Data Sources

Recall that the point of data munging is to `wrangle` multiple data sources so that you can begin to perform data analysis on the data that you were given or scraped from the web. 

In most cases you are given a dataset and you must supplement it with sources from the web.

In this lab we will perform an analysis of Glassdoor data.

## About Glassdoor

![glass](https://upload.wikimedia.org/wikipedia/commons/e/e1/Glassdoor_logo.svg)

"Glassdoor is one of the world’s largest job and recruiting sites.

Built on the foundation of increasing workplace transparency, Glassdoor offers millions of the latest job listings, combined with a growing database of company reviews, CEO approval ratings, salary reports, interview reviews and questions, benefits reviews, office photos and more. Unlike other job sites, all of this information is shared by those who know a company best — the employees. In turn, job seekers on Glassdoor are well-researched and more informed about the jobs and companies they apply to and consider joining. This is why thousands of employers across all industries and sizes turn to Glassdoor to help them recruit and hire quality candidates at scale who stay longer. Glassdoor is available anywhere via its mobile apps."

## Q1. Write the code to import the pandas, numpy, and matplotlib.pyplot libraries

In [3]:
import pandas as pd
import numpy as np 
import matplotlib
%matplotlib inline
import matplotlib.pyplot as plt

# More Data Cleaning

In the next blocks we are importing the libraries `plotly.express`, `gc`, `re`, and `yellowbrick`.

## The gc python library
The `gc` module provides an interface to Python's optional garbage collector.
- It is useful when you are working with large datasets and you pull useful information out of them and store it in a separate dataframe.
- It is also useful if you have limited space. Some cloud servers only allow you to use a certain amount of space for free services (e.g. Colab, Jupyter notebooks, etc.).


## The re library
This is the regular expression library. You should have already been introduced to this in a previous lab.
Go here: https://docs.python.org/3/library/re.html for more information.

## The yellowbrick library
This library provides visual analysis and diagnostic tools.
You may need to install it to get it to work:

`pip install yellowbrick`

In [4]:
!pip3 install plotly
!pip install seaborn
!pip install nltk
!pip install gensim
!pip install yellowbrick

Collecting gensim
  Downloading gensim-3.8.3-cp38-cp38-manylinux1_x86_64.whl (24.2 MB)
Collecting smart-open>=1.8.1
  Downloading smart_open-4.0.1.tar.gz (117 kB)
Building wheels for collected packages: smart-open
  Building wheel for smart-open (setup.py) ... done
  Created wheel for smart-open: filename=smart_open-4.0.1-py3-none-any.whl size=108249 sha256=7ac23dc5a83be0f3c1c2243830f0401ac0b0c730c859133a587c1d4f15dfa426
  Stored in directory: /home/jovyan/.cache/pip/wheels/8c/f9/f4/4ddd9ddee3488f48be20e9bf3108961f03ae23da29b7ed26d1
Successfully built smart-open
Installing collected packages: smart-open, gensim
Successfully installed gensim-3.8.3 smart-open-4.0.1
Collecting yellowbrick
  Downloading yellowbrick-1.2-py3-none-any.whl (269 kB)

In [5]:
import seaborn as sns
import nltk 
import plotly.express as px
import gc
import string
import re
import yellowbrick

#!pip install pandas plotnine
!pip install datascience

#from plotnine import *
from datascience import *

pd.set_option('display.max_colwidth', 0)
pd.options.display.max_columns = 0



## Q2. Write the code to use pandas to load the csv files Data_Job_NY.csv, Data_Job_SF.csv, Data_Job_TX.csv, and Data_Job_WA.csv into dataframes.
Name the dataframes `ny_df`, `sf_df`, `tx_df`, and `wa_df`

Remember that your csv files should be located in the data directory

In [20]:
ny_df = pd.read_csv("Data_Job_NY.csv")
sf_df = pd.read_csv("Data_Job_SF.csv")
tx_df = pd.read_csv("Data_Job_TX.csv")
wa_df = pd.read_csv("Data_Job_WA.csv")

## Q3. Write the code to print out the count, mean, std, min, and max of all of the datasets loaded. 
Note: You'll have to run the code in a separate cell for each of the datasets

In [21]:
ny_df.describe()

Unnamed: 0,Min_Salary,Max_Salary,Rating
count,900.0,900.0,660.0
mean,33789.711111,49847.461111,3.922727
std,40201.559469,59552.391775,0.65132
min,-1.0,-1.0,2.5
25%,-1.0,-1.0,3.5
50%,20000.0,35000.0,4.0
75%,64829.0,87057.0,4.3
max,125410.0,212901.0,5.0


In [22]:
sf_df.describe()

Unnamed: 0,Min_Salary,Max_Salary,Rating
count,889.0,889.0,808.0
mean,75989.293588,105111.84027,3.915223
std,56101.457881,75131.99965,0.666049
min,-1.0,-1.0,1.3
25%,-1.0,-1.0,3.6
50%,88309.0,125886.0,3.9
75%,117464.0,160387.0,4.4
max,205735.0,315439.0,5.0


In [23]:
tx_df.describe()

Unnamed: 0,Min_Salary,Max_Salary,Rating
count,643.0,643.0,587.0
mean,49856.833593,75973.337481,3.742589
std,37174.830891,55070.548762,0.593329
min,-1.0,-1.0,1.0
25%,-1.0,-1.0,3.4
50%,51465.0,86476.0,3.8
75%,77272.0,114060.0,4.1
max,195818.0,383416.0,5.0


In [24]:
wa_df.describe()

Unnamed: 0,Min_Salary,Max_Salary,Rating
count,892.0,892.0,794.0
mean,60523.627803,92022.095291,3.758564
std,41024.359069,59570.260961,0.56714
min,-1.0,-1.0,1.0
25%,27842.0,56870.0,3.4
50%,67662.0,106081.5,3.7
75%,90930.25,128731.25,4.1
max,179685.0,294949.0,5.0


## Q4. Write the code to print the first 2 rows of the NY dataset

In [25]:
ny_df.head(2)

Unnamed: 0,Job_title,Company,State,City,Min_Salary,Max_Salary,Job_Desc,Industry,Rating,Date_Posted,Valid_until,Job_Type
0,Chief Marketing Officer (CMO),National Debt Relief,NY,New York,-1,-1,"Who We're Looking For:\n\nThe Chief Marketing Officer (CMO) is an exempt, executive position, responsible for all marketing operations of the company including lead acquisition, sales enablement, communications, retention, and brand development. This executive leads a team of enthusiastic, analytical, and passionate marketing professionals to develop, execute, and optimize the marketing strategy. We are looking for someone with a history of brand development and proven ability to accelerate company growth leveraging the latest marketing strategies and technologies. This role goes beyond traditional marketing tactics to generate awareness, educate the consumer on the viability of our service, and in turn drive the consumer to take action and engage the brand.\n\nPrincipal Duties and Responsibilities:\n\nLead the full marketing strategy and have accountability over development, execution, and optimization across all channels including paid and organic search, display, email, social, TV, radio, direct mail, and affiliate marketing.\nCommunicate with the leadership team and key stakeholders to execute lead generation, sales enablement, and retention-based marketing campaigns that align with and deliver against business goals.\nDevelop and execute social media, content, and communication strategies to further our public relations and community engagement.\nIdentify, forge, and grow strategic marketing partnerships.\nBuild a highly efficient and capable team of marketing professionals.\nDefine the competitive marketplace and evolve our brand awareness through strategy development and brand building tactics.\nLead research and development into new marketing tactics and strategies while improving current systems.\nEstablish key metrics and manage goals while leading the improvement of our pipeline for sales.\nEstablish framework for all marketing activity, tracking results and reporting progress with 
management.\nDevelop segmentation, competitive analysis, market intelligence, salesforce effectiveness, strategic planning and revenue retention and growth.\n\nQualifications:\n\nA completed BS degree in Business, Marketing, Advertising or other related discipline.\nMinimum experience required 10+ years of professional experience in a leadership marketing role.\nExperience building and executing brand awareness and public relations campaigns.\nExperience in a fast-growing company with a track record of delivering big results.\nHighly proficient and effective communication skills\nAbility to utilize data analytics to deliver insight and identify opportunities for growth.\nA strong record of developing successful, innovative and cost-effective marketing campaigns.\nPunctual and ready to report to work on a consistent basis.\nTravel up to 25 percent of the time.\nExcel in a fast-paced environment.\n\nWhat We Offer:\n\nA team-first, work hard play hard culture, full of rewards and recognition for our employees. We are dedicated to our employees' success and growth.\n\nOur extensive benefits package includes:\n\n\nGenerous Medical, Dental, and Vision Benefits\n401(k) with Company Match\nPaid Holidays, Volunteer Time Off, Sick Days, and Vacation\n10 weeks Paid Parental Leave\nPre-tax Transit Benefits\nDiscounted Gym Membership\nCiti Bike Annual Membership Discounts\nNo-Cost Life Insurance Benefits\nVoluntary Benefits Options\nASPCA Pet Health Insurance Discount\n\nAbout National Debt Relief:\n\nNational Debt Relief is one of the country's largest and most reputable debt settlement companies. We are made up of energetic, smart, and compassionate individuals who are passionate about helping thousands of Americans with debt relief. Most importantly, we're all about helping our customers through a tough financial time in their lives with education and individual customer service.\n\nWe are dedicated to helping individuals and families rid their lives of burdensome debt. 
We specialize in debt settlement and have negotiated settlements for thousands of creditor and collections accounts. We provide our clients with both our expertise and our proven results. This means helping consumers in their time of hardship to get out of debt with the least possible cost. It can also mean conducting financial consultations, educating the consumer, and recommending the appropriate solution. Our core services offer debt settlement as an alternative to bankruptcy, credit counseling, and debt consolidation. We become our clients' number one advocate to help them reestablish financial stability as quickly as possible.\n\nNational Debt Relief is a certified Great Place to Work®!\n\nNational Debt Relief is an equal opportunity employer and makes employment decisions without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, protected veteran status, disability status, or any other status protected by law\n\n#ZR",Finance,4.0,2020-05-08,2020-06-07,FULL_TIME
1,Registered Nurse,Queens Boulevard Endoscopy Center,NY,Rego Park,-1,-1,"Queens Boulevard Endoscopy Center, an endoscopy ASC located in Rego Park, has an exciting opportunity for Full-Time Registered Nurse! Successful candidates will provide quality nursing care in all areas of the Center including pre-assessment, pre-op and pacu Qualified candidates must possess the following:\n\nCurrent NY state RN license\nBLS Certification, ACLS preferred\nMust be a team-player with excellent multi-tasking and interpersonal skills\nCompassion for patient needs and a high degree of professionalism\nChinese Speaking and Spanish Preferred\n\nQueens Boulevard Endoscopy Center offers a pleasant professional work environment and no evening or holiday work hours. Drug-free work environment and EOE.",,3.0,2020-04-25,2020-06-07,FULL_TIME


## Q5. Write the code to print the name of the columns for only one of the dataframes

Note: the data was scraped from Glassdoor, so each loaded dataframe has the same columns.

In [26]:
ny_df.columns

Index(['Job_title', 'Company', 'State', 'City', 'Min_Salary', 'Max_Salary',
       'Job_Desc', 'Industry', 'Rating', 'Date_Posted', 'Valid_until',
       'Job_Type'],
      dtype='object')

## ***Information About the columns present in the Data***

1. The 12 columns in the datasets:
    * ***Job_title*** : The title of the job being advertised.
    * ***Company*** : Company name.
    * ***State/City*** : State and city in which the company's job posting is listed (two separate columns).
    * ***Min_Salary*** : Minimum yearly salary in USD.
    * ***Max_Salary*** : Maximum yearly salary in USD.
    * ***Job_Desc*** : The job description, which includes skills, requirements, etc.
    * ***Industry*** : The industry in which the company works.
    * ***Rating*** : The rating of the company on Glassdoor.
    * ***Date_Posted*** : The date on which the job was posted on Glassdoor.
    * ***Valid_until*** : The last date for applying to the job.
    * ***Job_Type*** : Type of job: full-time, part-time, etc.


### Sorting column names

You can sort the names of the columns alphabetically using the `sorted` function:
`sorted(df)` where df is the name of the dataframe. Iterating over a dataframe yields its column names, so `sorted` returns them as a sorted list.

## Q6. Write the code to sort the column names alphabetically

In [27]:
sorted(ny_df)

['City',
 'Company',
 'Date_Posted',
 'Industry',
 'Job_Desc',
 'Job_Type',
 'Job_title',
 'Max_Salary',
 'Min_Salary',
 'Rating',
 'State',
 'Valid_until']

# Joining OR Concatenating Dataframes
To join dataframes together use the pandas function `concat`.
`pd.concat([df1, df2, df3, ..., dfn])` where pd is the pandas library name and df1 is dataframe1, df2 is dataframe2, and df3 is dataframe3. Note that the dataframes are passed together as a single list.

In [28]:
all_df = pd.concat([ny_df , sf_df , tx_df, wa_df] , axis = 0 , ignore_index = True)
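
The effect of `ignore_index = True` is easier to see on a small example. The sketch below uses two tiny made-up dataframes (the names `a` and `b` and their values are illustrative, not taken from the Glassdoor files) and compares the row labels with and without that option.

```python
import pandas as pd

# Two small, made-up dataframes with the same columns (illustrative only)
a = pd.DataFrame({"City": ["New York", "Rego Park"], "Min_Salary": [50000, 40000]})
b = pd.DataFrame({"City": ["Austin"], "Min_Salary": [60000]})

# Without ignore_index, each frame keeps its own row labels, so labels repeat
stacked_keep = pd.concat([a, b], axis=0)

# With ignore_index=True, the combined rows are relabelled 0..n-1
stacked_new = pd.concat([a, b], axis=0, ignore_index=True)

print(stacked_keep.index.tolist())
print(stacked_new.index.tolist())
```

Repeated row labels make later label-based operations such as `drop` ambiguous, which is one reason to relabel the rows when stacking files.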

# Garbage Collection
In some cases you should perform garbage collection to clear up your workspace.
This is especially true when working on cloud-based systems like Colab or Jupyter notebooks.

Use the `gc.collect()` function to clean up any dataframes that you don't need anymore.
To do this you'll need to delete them first with `del`, then call `gc.collect()`.

In [29]:
del ny_df , sf_df , tx_df ,wa_df
gc.collect()

198

## Q7. Write the output from the collect function below

198

## Q8. What do you think it means?

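`gc.collect()` returns the number of unreachable objects that the garbage collector found, so 198 is a count of Python objects that were cleaned up, not a count of rows or values in the deleted dataframes.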
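
To see where the number returned by `gc.collect()` comes from, here is a minimal sketch that is independent of the lab data. The `Node` class is a hypothetical helper written only for this illustration: each instance refers to itself, so it cannot be freed by simply deleting the variable.

```python
import gc

class Node:
    """A tiny object that refers to itself, forming a reference cycle."""
    def __init__(self):
        self.me = self

gc.collect()          # start from a clean slate
gc.disable()          # pause automatic collection so the count below is ours

nodes = [Node() for _ in range(10)]
del nodes             # the list is gone, but each Node still points at itself

found = gc.collect()  # number of unreachable objects found in this pass
gc.enable()

print(found >= 10)
```

In CPython, `del` usually frees an object as soon as its last reference disappears; `gc.collect()` additionally cleans up objects that are stuck in reference cycles like the ones above. The exact count varies between Python versions and sessions, so your number will probably differ from 198.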

# Beginning Exploratory Data Analysis

## Q9. How many rows and columns does your all_df have? Write the code below.

In [30]:
all_df.shape

(3324, 12)

In [31]:
all_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3324 entries, 0 to 3323
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Job_title    3324 non-null   object 
 1   Company      3324 non-null   object 
 2   State        3322 non-null   object 
 3   City         3318 non-null   object 
 4   Min_Salary   3324 non-null   int64  
 5   Max_Salary   3324 non-null   int64  
 6   Job_Desc     3324 non-null   object 
 7   Industry     2700 non-null   object 
 8   Rating       2849 non-null   float64
 9   Date_Posted  3324 non-null   object 
 10  Valid_until  3324 non-null   object 
 11  Job_Type     3324 non-null   object 
dtypes: float64(1), int64(2), object(9)
memory usage: 311.8+ KB


# Working with Data to Sort and Filter it

Sometimes the data you are given or that you have scraped will need to be converted to another format.

In all_df, we'll mainly be working with `Min_Salary` and `Max_Salary`.

To work with these values we'll need to make sure they are stored as integers.

In [32]:
all_df['Min_Salary'] = all_df['Min_Salary'].astype(int)  # cast the whole column at once
all_df['Max_Salary'] = all_df['Max_Salary'].astype(int)
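
In this dataset the `info()` output already lists both salary columns as int64, so the conversion is a safeguard. A conversion matters more when a column was read in as text. The sketch below assumes a hypothetical series `raw` containing one unparseable entry and shows one way to handle it with `pd.to_numeric`.

```python
import pandas as pd

# Hypothetical salary column that was read in as text, with one bad entry
raw = pd.Series(["44587", "125410", "N/A", "-1"])

# errors="coerce" turns anything unparseable into NaN instead of raising
as_number = pd.to_numeric(raw, errors="coerce")

# Fill the gaps with the dataset's own "missing" marker (-1), then cast to int
clean = as_number.fillna(-1).astype(int)

print(clean.tolist())
```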

# Working with Dates in Datasets
Many datasets have dates within them.
To work with dates, and to sort and filter them properly, you may need to work with only the month, only the year, or only the day.

Use the `calendar` library as shown below.

In [33]:
import calendar
all_df['Month'] = all_df['Date_Posted'].apply(lambda x : calendar.month_abbr[int(str(x).split('-')[1])]) 

## Q10. Write the code to extract the month from the Valid_until column.
The date is in the format y-m-d.
Name it `all_df['Valid_Month']`

In [36]:
all_df['Valid_Month'] = all_df['Valid_until'].apply(lambda x : calendar.month_abbr[int(str(x).split('-')[1])])

## Converting Dates to Day
Sometimes you will need to convert a date into the day of the week it falls on.
To do this, you can use the function created below called
`Convert_to_Day`


In [37]:
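def Convert_to_Day(x):
    # x is a date string in y-m-d format, e.g. '2020-05-08'
    sl = x.split('-')
    # calendar.weekday returns 0 (Monday) through 6 (Sunday)
    return calendar.day_abbr[calendar.weekday(int(sl[0]), int(sl[1]), int(sl[2]))]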

## Q11. Use the `Convert_to_Day` function to convert the Date_Posted and Valid_until values to days
Print the result for row 105 in the dataset

In [41]:
Convert_to_Day(all_df['Date_Posted'][105]), Convert_to_Day(all_df['Valid_until'][105])

('Tue', 'Sun')
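
As an alternative to splitting the date strings by hand, pandas can parse them into real dates. The following is a sketch of that approach, not the method used in this lab; the two sample dates are in the same y-m-d text format as `Date_Posted` and `Valid_until`.

```python
import pandas as pd

# Two dates in the same y-m-d text format used by Date_Posted and Valid_until
dates = pd.Series(["2020-05-08", "2020-06-07"])

# Parse the text into datetime values
parsed = pd.to_datetime(dates, format="%Y-%m-%d")

# The .dt accessor exposes the parts of each date; keep the first three letters
months = parsed.dt.month_name().str[:3]
days = parsed.dt.day_name().str[:3]

print(months.tolist(), days.tolist())
```

Once a column holds datetime values it also sorts chronologically, which plain text only does when the format happens to be year first.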

# Revisiting Working with Missing Data
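In Data Munging Part I, we removed missing data.

Sometimes you'll want to save that data for later so you can do some analysis on the erroneously provided or missing data.
In this dataset a salary of -1 marks a missing value, so the rows where `Min_Salary` is -1 are saved to a separate dataframe, as shown below.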

In [46]:
index_missing = all_df[(all_df['Min_Salary'] == -1)].index
# .loc selects by index label, which is what .index returns
test_df = all_df.loc[index_missing, :].reset_index(drop = True)
index_missing

Int64Index([   0,    1,    2,    9,   12,   16,   17,   19,   20,   21,
            ...
            3289, 3292, 3293, 3295, 3296, 3300, 3301, 3302, 3303, 3304],
           dtype='int64', length=1092)

## Q11. Now that you have saved this missing data, you can drop it from the dataframe. Write the code to do this below.
**Hint:** You should use the function `drop`, which follows this format:
`df.drop(missing_data_index, axis=0, inplace=True)` where `df` is the dataframe
and `missing_data_index` is a list of rows to drop from the dataframe.

In [48]:
all_df.drop(index_missing, axis=0, inplace=True)
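
The same split can be written with a boolean mask, which is the most common way to filter rows in pandas. The sketch below uses a small made-up dataframe `jobs` (illustrative values only) and keeps both halves without modifying the original.

```python
import pandas as pd

# Made-up postings; -1 marks a missing salary, as in the Glassdoor data
jobs = pd.DataFrame({
    "Job_title": ["Nurse", "Developer", "Analyst", "Director"],
    "Min_Salary": [-1, 44587, -1, 125410],
})

# A boolean Series with one True/False value per row
is_missing = jobs["Min_Salary"] == -1

# Indexing with the mask keeps the True rows; ~ flips True and False
missing_jobs = jobs[is_missing].reset_index(drop=True)
valid_jobs = jobs[~is_missing].reset_index(drop=True)

print(len(missing_jobs), len(valid_jobs))
```

Conditions can be combined with `&` (and) and `|` (or), with each condition wrapped in parentheses.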

# Working with Duplicates
Because your dataset is scraped from the web, it may contain duplicates.
You'll need to check for these duplicates because they will impact your data analysis.

In [49]:
cols = [col for col in all_df.columns if col not in ['Day' , 'Month']]
 
train_series = all_df.duplicated(cols , keep = 'first')
data_df      = all_df[~train_series].reset_index(drop = True)
test_series  = test_df.duplicated(cols , keep = 'first')
test_df      = test_df[~test_series].reset_index(drop = True)
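
pandas also offers `drop_duplicates`, which combines the `duplicated` check and the filtering into one call. A minimal sketch on a made-up dataframe `postings` (company names and values are hypothetical):

```python
import pandas as pd

# Made-up postings: the first two rows differ only in the Month column
postings = pd.DataFrame({
    "Job_title": ["Nurse", "Nurse", "Developer"],
    "Company": ["Acme Health", "Acme Health", "Acme Tech"],
    "Month": ["Apr", "May", "May"],
})

# Only these columns are compared when deciding whether two rows are duplicates
key_cols = ["Job_title", "Company"]

# keep="first" retains the first occurrence of each duplicated row
deduped = postings.drop_duplicates(subset=key_cols, keep="first").reset_index(drop=True)

print(deduped)
```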

# Looking for Unique Values in your Dataframe
Sometimes you'll need to look for unique values in your dataframe.
Use the `unique` function to do this.
It follows this format: `df['COL_NAME'].unique()` where df is the dataframe and COL_NAME is the column name in the dataframe.

In [50]:
print(all_df['State'].unique())

['NY' 'NJ' 'CA' 'KY' 'TX' 'TN' 'VA' 'MD' 'DC' 'NC']


## Q12. Write the code to count the number of unique States from the previous operation. Name the variable num_states and print it

In [54]:
num_states = len(all_df['State'].unique())
num_states

10
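
One detail to watch when counting unique values: `unique()` reports a missing value as its own entry, while `nunique()` skips missing values by default. The sketch below shows the difference on a small made-up series.

```python
import numpy as np
import pandas as pd

# Made-up state column with one missing value
states = pd.Series(["NY", "CA", "NY", np.nan, "TX"])

with_nan = len(states.unique())    # NaN is counted as a distinct value
without_nan = states.nunique()     # NaN is skipped by default
per_state = states.value_counts()  # occurrences of each non-missing value

print(with_nan, without_nan)
print(per_state)
```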

In [55]:
for state in all_df['State'].unique():
    print(f"State of {state}")
    print(all_df[all_df['State'] == state]['City'].value_counts()[:5])

State of NY
New York          240
Williston Park    30 
Rego Park         30 
Maspeth           30 
Staten Island     30 
Name: City, dtype: int64
State of NJ
Paramus        30
Jersey City    30
Name: City, dtype: int64
State of CA
San Francisco          302
South San Francisco    122
Menlo Park             29 
San Mateo              27 
Redwood City           20 
Name: City, dtype: int64
State of KY
Florence    1
Name: City, dtype: int64
State of TX
Austin         132
Dallas         79 
Houston        67 
San Antonio    41 
Irving         40 
Name: City, dtype: int64
State of TN
Chennai    1
Name: City, dtype: int64
State of VA
Arlington      77
McLean         50
Reston         35
Springfield    34
Alexandria     29
Name: City, dtype: int64
State of MD
Gaithersburg     41
Rockville        36
Silver Spring    25
College Park     23
Bethesda         20
Name: City, dtype: int64
State of DC
Washington    155
Name: City, dtype: int64
State of NC
Raleigh    1
Name: City, dtype: int64


## Q13. What city has the most job openings? Write your answer below

San Francisco has the most with 302.

## Q14. Which cities have the fewest job openings? What states do they occur in?

Florence (KY), Chennai (TN), and Raleigh (NC) each have only one opening.

# Identifying and Removing Outliers
In some cases you'll have outliers in your data.
An `outlier` is an observation that lies an abnormal distance from other values in a random sample from a population.
Sometimes negative numbers, zero, or really large numbers can be outliers in your sample population.
Here, the states with only a single posting (NC, TN, and KY) are treated as outliers and dropped.

See your textbook Sampling from a Population https://www.inferentialthinking.com/chapters/10/2/Sampling_from_a_Population.html

In [61]:
index_outlier = all_df[(all_df['State'] =='NC') | (all_df['State'] =='TN') | (all_df['State'] =='KY')].index
all_df.drop(index_outlier , inplace = True)
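
Instead of naming each state by hand, the rare states can be found from their counts. The sketch below uses a made-up `jobs` dataframe and a hypothetical cut-off `min_postings`; both are assumptions for illustration, and the right threshold depends on your analysis.

```python
import pandas as pd

# Made-up postings: KY appears only once
jobs = pd.DataFrame({"State": ["CA", "CA", "CA", "TX", "TX", "KY"]})

# Count postings per state
counts = jobs["State"].value_counts()

# Hypothetical cut-off: states with fewer postings than this are dropped
min_postings = 2
rare_states = counts[counts < min_postings].index

# isin builds a boolean mask; ~ keeps the rows NOT in the rare states
filtered = jobs[~jobs["State"].isin(rare_states)].reset_index(drop=True)

print(filtered["State"].unique().tolist())
```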

# Visualizing the Data with pie charts
The code below shows how to make a pie chart of the top five cities in CA.

In [74]:
# Loop over state codes (not row indices) so the comparison below matches rows
max_state = ['CA']
for state in max_state:
    # Top five cities in the state by number of postings
    top_cities = all_df[all_df['State'] == state]['City'].value_counts()[:5]
    cities = top_cities.index.to_list()
    counts = top_cities.to_list()

my_colors  = ['lightgray','lightblue','crimson', 'beige', 'yellow']

# Let matplotlib normalize the counts into percentages (the default)
plt.pie(counts, labels=cities, autopct='%1.1f%%', startangle=15, shadow=True, colors=my_colors)
plt.title('California GlassDoor Cities')
plt.axis('equal')
plt.show()

## Q15. Write the code to create a pie chart for TX. 

Add a title to your pie chart

In [60]:
max_state = ['TX' ]
for i,state in enumerate(max_state,1):
    cities = all_df[all_df['State'] == state]['City'].value_counts()[:5].index.to_list()
    counts = all_df[all_df["State"] == state]['City'].value_counts()[:5].to_list()

my_colors  = ['lightblue','lightsteelblue','silver', 'lightgrey', 'crimson']
my_explode = (0, 0.1, 0)

plt.pie(counts,labels=cities,autopct='%1.1f%%',startangle=15, shadow = True, colors=my_colors)
plt.title('Texas GlassDoor Cities')
plt.axis('equal')
plt.show() 

# Using the Groupby functionality
A groupby operation involves some combination of splitting the object, 
applying a function, and combining the results. 

This can be used to group large amounts of data and compute operations on these groups.

This is shown in the example below

In [75]:
states = all_df['State'].unique().tolist()

min_sal =  all_df.groupby('State')['Min_Salary']
max_sal =  all_df.groupby('State')['Max_Salary']

min_sal.min()


State
CA    29611
DC    21096
MD    20268
NJ    38471
NY    20000
TX    19857
VA    29516
Name: Min_Salary, dtype: int64
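
A grouped column can also be summarized with several functions at once using `agg`. A minimal sketch on a made-up `jobs` dataframe (values are illustrative, not from the Glassdoor data):

```python
import pandas as pd

# Made-up postings with a state and a minimum salary
jobs = pd.DataFrame({
    "State": ["CA", "CA", "TX", "TX", "NY"],
    "Min_Salary": [90000, 30000, 20000, 50000, 45000],
})

# Split by State, apply min/mean/max to Min_Salary, combine into one table
summary = jobs.groupby("State")["Min_Salary"].agg(["min", "mean", "max"])

print(summary)
```

The result has one row per state and one column per function, with the states sorted alphabetically by default.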

## Q16. Use the groupby function to find the minimum salary for each company
Print this information

In [80]:
min_sals =  all_df.groupby('Company')['Min_Salary']
print(min_sals.min())

Company
159 Solutions, Inc.          110591
1901 Group                   79171 
22nd Century Technologies    85715 
23andMe                      78913 
911 Datamaster Inc           45694 
                             ...   
price.com                    122998
steampunk                    108661
sydata                       109626
tekwissen                    24457 
vidIQ                        137812
Name: Min_Salary, Length: 959, dtype: int64
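
The result above is ordered by company name. To rank companies by salary instead, the values can be sorted with `sort_values`. The sketch below uses a made-up `jobs` dataframe with hypothetical company names.

```python
import pandas as pd

# Made-up postings (company names and salaries are illustrative)
jobs = pd.DataFrame({
    "Company": ["Acme Tech", "Acme Tech", "Beta Labs", "Gamma Co"],
    "Min_Salary": [85000, 60000, 110000, 45000],
})

# Lowest Min_Salary per company, as in Q16
lowest = jobs.groupby("Company")["Min_Salary"].min()

# Sort the grouped result by value, smallest first or largest first
ascending = lowest.sort_values()
descending = lowest.sort_values(ascending=False)

# Whole rows can be sorted by several columns, each with its own direction
by_rows = jobs.sort_values(["Company", "Min_Salary"], ascending=[True, False])

print(descending.head(2))
```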


## Extracting Features out of Job Description 

In [95]:
x = all_df.Job_Desc.str.replace('\n\n', '\n')
x = x.str.split('\n')

## Q17. What are some observations that you noticed about the job description column? What is the format or structure of the job description?

The description is one long string in which sections and individual responsibilities are separated by newline characters ('\n').

# Cleaning up HTML Artifacts
Sometimes you will need to clean up text data, such as the newline characters left in the job descriptions.
Use the `replace` function with `regex = True`, which treats the search string as a regular expression.

In [97]:
all_df['Job_Desc'] = all_df['Job_Desc'].replace('\n\n' , " " , regex = True)
all_df['Job_Desc'] = all_df['Job_Desc'].replace('\n' , " " , regex = True)

test_df['Job_Desc'] = test_df['Job_Desc'].replace('\n\n' , " " , regex = True)
test_df['Job_Desc'] = test_df['Job_Desc'].replace('\n' , " " , regex = True)

from gensim.parsing.preprocessing import remove_stopwords
def Remove_puncutations_stopwords(s):

    s = ''.join([i for i in s if i not in string.punctuation])
    s = remove_stopwords(s)
    return s

data_df['Job_Desc'] = data_df['Job_Desc'].apply(lambda x : Remove_puncutations_stopwords(x))

data_df['Job_Desc'][2]

'Emergency VeterinarianThe family joining VEG rapidly growing group emergency practices multiple locations single mission Helping People Their Pets When They Need Most We changing face emergency veterinary medicine “client first” mentality We’re group passionate thought leaders believe power open mind servant leadership If you’re ideal candidate you’ll Have earned DVM equivalent degree Be fulfilled helping Thrive teamoriented environments think hospital retreats team dinners happy hours Have ‘glass half full’ attitude sense humor Live breathe emergency medicine Be passionate emergency surgery soft tissue kind Benefits Why choose Because emergency best We offer Industryleading compensation signing bonus monthly bonuses Health Insurance 401K w company match Unlimited CE Flexible work schedules true worklife balance 3 shifts week fulltime Growth potential Fresh groceries sent weekly monthly quarterly contests quarterly hospital outings annual companywide retreat'
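
The newline and punctuation steps can also be done with the standard library alone. The sketch below is an illustration under that assumption, not the lab's method: `clean_text` is a hypothetical helper, and it does not remove stopwords the way gensim's `remove_stopwords` does.

```python
import re
import string

def clean_text(text):
    """Collapse whitespace and strip ASCII punctuation from a string."""
    # \s+ matches any run of spaces, tabs, or newlines
    text = re.sub(r"\s+", " ", text)
    # str.translate deletes every character listed in string.punctuation
    text = text.translate(str.maketrans("", "", string.punctuation))
    return text.strip()

sample = "Have earned a DVM,\n\nor equivalent degree!\nBe fulfilled by helping others."
cleaned = clean_text(sample)
print(cleaned)
```

`string.punctuation` only covers ASCII characters, which is why curly quotes survive in the output above. Removing hyphens also joins words together, as in "teamoriented".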

# Saving Data
After you have worked with some data, you'll sometimes need to save it so you can work with it later.

In [99]:
all_df.to_csv("all_data.csv" , index = False)
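
The `index = False` argument keeps the row labels out of the file. The sketch below shows what happens without it, writing to in-memory buffers (`io.StringIO`) instead of real files; the small `jobs` dataframe is made up for illustration.

```python
import io
import pandas as pd

# Made-up dataframe to save and reload
jobs = pd.DataFrame({"City": ["Austin", "Dallas"], "Min_Salary": [60000, 55000]})

# Write once with the default (index=True) and once with index=False
with_index = io.StringIO()
jobs.to_csv(with_index)
without_index = io.StringIO()
jobs.to_csv(without_index, index=False)

# Rewind the buffers and read them back as if they were files
with_index.seek(0)
without_index.seek(0)
cols_with = pd.read_csv(with_index).columns.tolist()
cols_without = pd.read_csv(without_index).columns.tolist()

print(cols_with)
print(cols_without)
```

When the index is written, it comes back as an extra column named `Unnamed: 0` the next time the file is loaded.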

In [100]:
all_df

Unnamed: 0,Job_title,Company,State,City,Min_Salary,Max_Salary,Job_Desc,Industry,Rating,Date_Posted,Valid_until,Job_Type,Month,Valid_Month
3,Senior Salesforce Developer,National Debt Relief,NY,New York,44587,82162,"Principle Duties & Responsibilities: Analyze complex systems and troubleshoot and isolate system issues; Understand requirements for business users and translate into design specifications, utilizing thorough understanding of the Salesforce platform, Salesforce products and licensing models; Utilize thorough understanding of application development, project lifecycle, and methodologies and ability to work under tight deadlines and handle multiple detail-oriented tasks; Apply knowledge of Salesforce developmentand customizations, with APEX, Visual Force, API, Force.com and Workflows, taking into account com best practices, support mechanisms, procedures, and limitations, as well as NDR's unique needs; Responsible for Salesforce administration, release management and deployment as well as management of Salesforce.com sandboxes, including their integrations; Design and execute Salesforce.com configuration changes, leveraging the Salesforce interface to sync with internal tracking systems; Design, develop, and maintain integration and synchronization programs; Design the data model, user interface, business logic, and security for custom applications; and Design, develop, and customize software solutions for end users by using analysis and mathematical models to effectively predict and measure the results of the design using Chatter, Communities and other Salesforce applications. 
Requirements: Bachelor of Science degree or foreign equivalent in Information Systems, Computer Science, Computer Engineering, Software Engineering or a related field 3 years of experience with the Salesforce platform, specifically: development with Apex, VisualForce, and Force.com; Design and execute Salesforce.com configuration changes, leveraging the Salesforce interface to sync with internal tracking systems; Salesforce administration, release management, and deployment Salesforce products and licensing models Management of Salesforce.com sandboxes, including their integrations; Chatter, Communities, and other Salesforce apps com best practices, support mechanisms, procedures, and limitations. What We Offer: We believe in a team-first culture, full of rewards and recognition for our employees. We are dedicated to our employees' success and growth within the company, through our employee mentorship and leadership programs. Our extensive benefits package includes: Medical, Dental, and Vision Benefits 401(k) Match Paid Holidays, Volunteer Time Off, Sick Days, and Vacation 10 Weeks Paid Parental Leave Pre-tax Transit Benefits Discounted Gym Membership No-cost Life Insurance Benefits About National Debt Relief: National Debt Relief is one of the country's largest and most reputable debt settlement companies. We are made up of energetic, smart, and compassionate individuals who are passionate about helping thousands of Americans with debt relief. Most importantly, we're all about helping our customers through a tough financial time in their lives with education and individual customer service. We are dedicated to helping individuals and families rid their lives of burdensome debt. We specialize in debt settlement and have negotiated settlements for thousands of creditor and collections accounts. We provide our clients with both our expertise and our proven results. This means helping consumers in their time of hardship to get out of debt with the least possible cost. 
It can also mean conducting financial consultations, educating the consumer, and recommending the appropriate solution. Our core services offer debt settlement as an alternative to bankruptcy, credit counseling, and debt consolidation. We become our clients' number one advocate to help them reestablish financial stability as quickly as possible. #ZR",Finance,4.0,2020-05-08,2020-06-07,FULL_TIME,May,Jun
4,"DEPUTY EXECUTIVE DIRECTOR, PROGRAM AND LEGAL ADVOCACY",National Advocates for Pregnant Women,NY,New York,125410,212901,"For FULL Job Announcement, visit our website: www.AdvocatesForPregnantWomen.org Reporting to and working collaboratively with the Executive Director (ED), the Deputy Executive Director, Program & Legal Advocacy (DED) is a member of the Senior Management Team (SMT) providing leadership for and supervision of NAPW’s legal team and taking responsibility for the day-to-day program operations of the organization. The DED as an experienced senior level attorney with executive management experience and serves as a strategic thought partner and advisor to the Executive Director and the SMT. In absence of the Executive Director, the DED (in consultation with the COO), is designated as the highest authority to respond to internal and external inquiries, make programmatic/advocacy decisions, and represent NAPW in any and all responsibilities assigned to the ED. Responsibilities include (but are not limited to): Partnering with the ED to create and implement NAPW’s mission-work and strategic planning;Working collaboratively with the SMT (collectively responsible for the critical business functions of Program, Finance/Operations, Human Resources, Communications, and Development/Grant Administration), to develop and implement administrative policies and procedures for guiding operations, strengthening internal systems, ensuring high levels of staff engagement, managing performance, encouraging continuous learning, and promoting administrative and programmatic alignment;Helping to create NAPW’s reproductive justice public policy/public advocacy initiatives and determining when NAPW supports and/or joins related allied efforts by other organizations;Directly supervising the day-to-day work of the Senior Staff Attorneys, Staff Attorneys, post-graduate Fellows, legal & programmatic interns, legal contractors, loaned associates, and Research and Program 
Associates. Supervision includes coaching and training, performance review, assigning and reviewing work, mentoring, analysis and editing of written work and providing the ED with sufficient time to review; Minimum qualifications include: JD degree from an accredited law school is required; Membership in at least 1 (one) state AND federal bar is required;Master’s Degree in Non-profit Management, Public Policy, Social Work, or a related field is highly-desirable;8-10 years: of senior-level management experience in a non-profit legal advocacy/public interest/social justice environment, with demonstrable success in change implementation; complex litigation and advocacy experience as an attorney providing direct client representation, with a particular emphasis in public interest law and reproductive justice and drug policy litigation in state and federal courts; experience in the supervision of attorneys and managing programs (and staff);Demonstrated capacity to serve as a member of a Senior Management Team and advisor to the Executive Director on all matters pertaining to NAPW's legal advocacy;Knowledge of and experience in reproductive health, rights, and justice; civil rights with knowledge of drug policy reform, women’s rights, family law, child welfare reform, and human rights is highly-desirable. NOTE: YOUR SUBMISSION WILL BE REJECTED IF YOU HAVE NOT PROVIDED ALL MATERIALS AND INFORMATION AS INSTRUCTED BELOW. REQUIRED SUBMISSIONS (MUST INCLUDE ALL ITEMS LISTED BELOW): 1. Cover Letter which must include all of the following elements: a) Your personal & professional motivation for seeking this position. b) A discussion of what makes you the ideal/best candidate for this position. c) Explain how your skill sets and experience best demonstrate your strategic approach. d) Salary Requirement. e) Indicate where you found this Job Announcement. 2. Resumé. 3. 
Two (2) Writing Samples solely reflecting applicant’s own work (MUST submit BOTH A and B): a) One Non- legal advocacy writing sample such as an article, commentary or blog. b) One Legal writing sample (i.e., a legal brief, argument or analysis) consisting of NO MORE THAN ten pages of text. 4. Complete contact information for three (3) professional references. INSTRUCTIONS: NO PHONE CALLS OR FAXES PLEASE. All submissions must be sent VIA EMAIL ONLY SUBJECT: ATTN: Human Resources – NAPW Deputy Executive Director, Program & Legal Advocacy (JAN. 2020) Job Type: Full-time Experience: Reproductive Justice/Reproductive Rights legal advocacy: 5 years (Preferred)Non-profit Executive/Senior Management: 8 years (Required)Supervising Attorney: 5 years (Required)Public Interest Law and litigation: 6 years (Required) Education: Doctorate (Required) Work Location: One location Benefits: Health insuranceDental insuranceVision insuranceRetirement planPaid time offParental leaveProfessional development assistanceTuition reimbursement Schedule: Monday to Friday",,,2020-04-28,2020-06-07,FULL_TIME,Apr,Jun
5,Emergency Veterinarian - NYC,Veterinary Emergency Group,NY,New York,94715,103279,"Emergency VeterinarianThe family you will be joining: VEG is a rapidly growing group of emergency practices with multiple locations and a single mission: Helping People and Their Pets When They Need it Most. We are changing the face of emergency veterinary medicine with a “client first” mentality. We’re a group of passionate, thought leaders that believe in the power of an open mind and servant leadership. If you’re the ideal candidate, you’ll: Have earned a DVM or equivalent degree Be fulfilled by helping others Thrive in team-oriented environments (think hospital retreats, team dinners, happy hours and more!) Have a ‘glass half full’ attitude and a sense of humor! Live and breathe emergency medicine Be passionate about emergency surgery (the soft tissue kind!) Benefits Why you should choose us: Because emergency is all we do, so we do it best! We also offer: Industry-leading compensation + signing bonus + monthly bonuses Health Insurance 401K w/ company match Unlimited CE Flexible work schedules for a true work-life balance (3 shifts a week is full-time for us!) Growth potential Fresh groceries sent weekly, monthly and quarterly contests, quarterly hospital outings, annual company-wide retreat, etc!",Health Care,4.9,2020-05-05,2020-06-07,FULL_TIME,May,Jun
6,ABA Therapist,Kids Learning Loft Applied Behavior Analysis Services,NY,Williston Park,20000,35000,"Here at Kids Learning Loft Applied Behavior Analysis Services, PLLC, we are the rapidly expanding company in our industry in Williston Park, NY. We are hiring experienced part-time ABA Therapists to help us keep growing. If you're dedicated and ambitious, Kids Learning Loft is an excellent place to grow your career. We offer hands on training and a rigurous supervision program. Don't hesitate to apply.Responsibilities Study patient behavior and apply ABA principles Respond appropriately to different situations common among Autism patients and others with behavioral and developmental challenges Utilize key communication skills to provide effective feedback to patients Effectively communicate positive feedback to patients Be able to recognize and respond to critical improvements in patient behaviors. Become familiar with and use behavioral redirection techniques Know how to respond to negative behaviors appropriately Provide written documentation on each patients. Qualifications Preferred Master's degree in ABA, psychology, education, or related field of study Preferred Registered Behavior Technician certificate from the Behavior Analyst Certification Board. 0-5 years of experience required for entry-level positions Strong communication skills required Ability to work under high-stress situations Exhibits significant reliable habits, including timeliness and organizational skills Proven experience working with pre-school and elementary school-aged children Additional evidence of successful work with patients suffering from Autism and development issues Other experience, certificates, or qualifications as required by state",,,2020-05-07,2020-06-07,PART_TIME,May,Jun
7,Construction Project Manager,The LiRo Group,NY,Brooklyn,54991,143860,"Overview Ranked among the nation's top 10 Construction Managers by Engineering News-Record, The LiRo Group provides integrated construction, design, and technology solutions for a broad range of public and private sector clients. Our continued growth has created an immediate need for an experienced Construction Project Manager with strong electrical expertise from both a technical and strategic implementation perspective for a critical hospital electrical upgrade in Brooklyn, NY. Responsibilities Overall construction management team leadership, including effective coordination with hospital patient departments and facility groups Direct communication with client, facility, stakeholders and user groups Oversee and ensure quality and consistency of construction and all aspects of electrical improvements Provide technical evaluations, advice and guidance Coordination with adjacent projects impacted by the proposed work Management of project administrative efforts, including progress reports, submittals, requisitions and change orders Qualifications Bachelor's Degree in Electrical Engineering or related discipline Emergency Power Expereince a must 15+ years' experience in Project Management Strong electrical knowledge, particularly for Type 1 EES Systems Experience in an occupied hospital facility a necessity Strong communication skills at multiple project levels ranging from tradespeople to facility executives Ability to work under tight deadlines and handle multiple tasks Please visit our website for all of our career opportunities at: https://careers-liro.icims.com We offer a competitive salary commensurate with experience, a comprehensive benefits package and a positive work environment. Equal Opportunity Employer PI120130472","Construction, Repair & Maintenance",3.8,2020-05-08,2020-06-07,FULL_TIME,May,Jun
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3319,Data Engineer/Architect with Security Clearance,Booz Allen Hamilton,VA,McLean,74916,128610,"Job Number: R0082817 Data Engineer/Architect Key Role: Support the collection, ingestion, storage, processing, and analysis of complex datasets to disseminate mission-critical insights to our clients. Design, architect, implement, monitor, and maintain solutions to enable increasingly complex data analytics. Integrate solutions with broader technology architecture used across the organization while influencing enterprise architecture to meet the needs of the future. Maintain the perspective of the entire client organization, mapping the systems and interfaces used to manage data, setting standards for data management, analyzing the current state and conceiving desired future state, and articulating projects needed to close the gap between the current state and future goals. Basic Qualifications: * 8+ years of experience in data modeling and database design, from conceptualization to database optimization 2+ years of experience with NoSQL databases, including HBase or Cassandra * Experience with Hadoop cluster and all included services, including Hadoop v2, HDFS * Knowledge of Big Data querying tools, including Pig, Hive, or Impala * Knowledge of the system development life cycle; software project management approaches; and requirements, design, and test techniques Knowledge of established and emerging data technologies; conversant in emerging tools like columnar and NoSQL databases, predictive analytics, data visualization, and unstructured data Ability to obtain a security clearance BA or BS degree Additional Qualifications: * Ability to explain advanced concepts to team members, users, and clients Ability to work independently and address ad-hoc challenges Possession of excellent communications skills Possession of excellent problem-solving skills Clearance: Applicants selected will be subject to a security investigation and may need to meet eligibility requirements 
for access to classified information. We're an EOE that empowers our people-no matter their race, color, religion, sex, gender identity, sexual orientation, national origin, disability, veteran status, or other protected characteristic-to fearlessly drive change.",Business Services,3.7,2020-04-24,2020-06-06,FULL_TIME,Apr,Jun
3320,Data Engineer with Security Clearance,Booz Allen Hamilton,VA,Arlington,58824,112227,"Job Number: R0083152 Data Engineer The Challenge: Are you excited at the prospect of unlocking the secrets held by a data set? Are you fascinated by the possibilities presented by the IoT, machine learning, and artificial intelligence advances? In an increasingly connected world, massive amounts of structured and unstructured data open up new opportunities. As a data scientist, you can turn these complex data sets into useful information to solve global challenges. Across private and public sectors - from fraud detection, to cancer research, to national intelligence - you know the answers are in the data. We have an opportunity for you to use your analytical skills to improve strategic innovation for the federal government. You'll work closely with your customer to understand their questions and needs, and then dig into their data-rich environment to find the pieces of their information puzzle. You'll mentor teammates, develop algorithms, write scripts, build predictive analytics, apply machine learning, and use the right combination of tools and frameworks to turn that set of disparate data points into objective answers to help federal health organizations make informed decisions. You'll provide your customer with a deep understanding of their data, what it all means, and how they can use it. Join us as we use data science for good in the federal government. Empower change with us. 
You Have: -3+ years of experience within data science and engineering -2+ years of experience in working with machine learning models and algorithms, including natural language processing (NLP) -2+ years of experience with object-oriented programming, including Java, Scala, or Python -Experience with Big Data technologies, including HDFS, Hadoop, and Spark -Experience with manipulating data and extract, transform, and load (ETL) in parallel processing and distributed compute environments -Experience with using Cloud services, including AWS and Azure -Ability to learn technical concepts quickly and communicate with multiple functional groups -Secret clearance -BA or BS degree Nice If You Have: -2+ years of experience with designing novel data analytic methods and workflows, including full data pipelines from raw data through analysis results -Ability to manage and manipulate large data sets, develop data science approaches, and manage data science tasks -Ability to leverage a wide variety of data science capabilities and languages -Ability to exhibit flexibility, initiative, and innovation when dealing with ambiguous and fast-paced situations -MA or MS degree in Engineering, Statistics, Mathematics, or Data Science Clearance: Applicants selected will be subject to a security investigation and may need to meet eligibility requirements for access to classified information; Secret clearance is required Build Your Career: At Booz Allen, we know the power of analytics and we're dedicated to helping you grow as a data analysis professional. 
When you join Booz Allen, you can expect: * access to online and onsite training in data analysis and presentation methodologies, and tools like Hortonworks, Docker, Tableau, and Splunk a chance to change the world with the Data Science Bowl-the world's premier data science for social good competition participation in partnerships with data science leaders, like our partnership with NVIDIA to deliver Deep Learning Institute (DLI) training to the federal government You'll have access to a wealth of training resources through our Analytics University, an online learning portal specifically geared towards data science and analytics skills, where you can access more than 5000 functional and technical courses, certifications, and books. Build your technical skills through hands-on training on the latest tools and state-of-the-art tech from our in-house experts. Pursuing certifications? Take advantage of our tuition assistance, onsite boot camps, certification training, academic programs, vendor relationships, and a network of professionals who can give you helpful tips. We'll help you develop the career you want, as you chart your own course for success. We're an EOE that empowers our people-no matter their race, color, religion, sex, gender identity, sexual orientation, national origin, disability, veteran status, or other protected characteristic-to fearlessly drive change. #LI-AH1, CJ1",Business Services,3.7,2020-05-02,2020-06-06,FULL_TIME,May,Jun
3321,"Data Engineer, Mid with Security Clearance",Booz Allen Hamilton,VA,Herndon,58824,112227,"Job Number: R0083073 Data Engineer, Mid Key Role: Leverage expertise in structured and unstructured data to perform data engineering activities on cutting-edge projects in the ind us try working with Big Data tools. Architect data systems and stand up data platforms, build out ETL pipelines, write c us tom code, interface with data stores, perform data ingestion, and build data models. Assess, design, build, and maintain scalable data platforms that us e the latest and best in Big Data tools. Perform analytical exploration and examination of data from multiple sources of data. Work in Scrum-based Agile environment with multi-disciplinary team of analysts, data engineers, data scientists, developers, and data consumers in an agile fast-paced environment that is p us hing the envelope of leading-edge Big Data implementations. Basic Qualifications: -2+ years of experience with developing ETL pipelines and developing data manipulation scripts 2+ years of experience in us ing SQL, working with modern relational databases, including MySQL or PostgreSQL -2+ years of experience with Big Data systems, including Hadoop, HDFS, Hive, or Cloudera -Experience with us ing Lucene based search engines, including elasticsearch or solr Active Secret clearance -BS degree in CS or Information Systems required Additional Qualifications: -Experience with Agile sof tware development -Experience with Big Data ETL tools like StreamSets and NiFi Experience with AWS cloud te chn ologies -Experience in working with enterprise and production systems -Ability to have a positive, can-do attitude to solve the challenges of tomorrow -Ability to learn te chn ical concepts and communicate with multiple functional groups -Possession of excellent oral and written communication skills -Hortonworks, Cloudera, or Big data Certifications Clearance: Applicants selected will be subject to a security investigation 
and may need to meet eligibility requirements for access to classified information; Secret clearance is required. We're an EOE that empowers our people-no matter their race, color, religion, sex, gender identity, sexual orientation, national origin, disability, veteran status, or other protected characteristic-to fearlessly drive change.",Business Services,3.7,2020-05-07,2020-06-06,FULL_TIME,May,Jun
3322,"Data Modeler, Senior with Security Clearance",Booz Allen Hamilton,VA,Springfield,90454,151998,"Job Number: R0082912 Data Modeler, Senior The Challenge: Are you excited at the prospect of unlocking the secrets held by a data set? Are you fascinated by the possibilities presented by the IoT, machine learning, and artificial intelligence advances? In an increasingly connected world, massive amounts of structured and unstructured data open up new opportunities. As a data scientist, you can turn these complex data sets into useful information to solve global challenges. Across private and public sectors - from fraud detection, to cancer research, to national intelligence - you know the answers are in the data. We have an opportunity for you to use your leadership and analytical skills to improve a department of defense client. You'll work closely with your customer to understand their questions and needs, and then dig into their data-rich environment to find the pieces of their information puzzle. You'll mentor teammates, develop algorithms, write scripts to develop workflows, build predictive analytics, use automation, apply machine learning, and use the right combination of tools and frameworks to turn that set of disparate data points into objective answers to help senior leadership make informed decisions. You'll provide your customer with a deep understanding of their data, what it all means, and how they can use it. Join us as we use data science for good in national security. Empower change with us. 
You Have: * Experience with using data science tools * Experience with data modeling, building workflows, and tasking * Experience with Python to perform data analysis, mining, and data visualization * Knowledge of JEMA * Ability to create mathematical and statistical models * TS/SCI clearance with a polygraph * BA or BS degree or 10 years of experience with analytics Nice If You Have: * Experience with machine learning * Knowledge of GEOINT TCPED * Knowledge of GEOINT tools * MA or MS degree Clearance: Applicants selected will be subject to a security investigation and may need to meet eligibility requirements for access to classified information; TS/SCI clearance with polygraph is required. Build Your Career: At Booz Allen, we know the power of analytics and we're dedicated to helping you grow as a data analysis professional. When you join Booz Allen, you'll have the chance to: * access online and onsite training in data analysis and presentation methodologies, and tools like Hortonworks, Docker, Tableau, and Splunk change the world with the Data Science Bowl-the world's premier data science for social good competition participate in partnerships with data science leaders, like our partnership with NVIDIA to deliver Deep Learning Institute (DLI) training to the federal government You'll have access to a wealth of training resources through our Analytics University, an online learning portal specifically geared towards data science and analytics skills, where you can access more than 5000 functional and technical courses, certifications, and books. Build your technical skills through hands-on training on the latest tools and state-of-the-art tech from our in-house experts. Pursuing certifications that directly impact your role? You may be able to take advantage of our tuition assistance, on-site bootcamps, certification training, academic programs, vendor relationships, and a network of professionals who can give you helpful tips. 
We'll help you develop the career you want as you chart your own course for success. We're an EOE that empowers our people-no matter their race, color, religion, sex, gender identity, sexual orientation, national origin, disability, veteran status, or other protected characteristic-to fearlessly drive change.",Business Services,3.7,2020-04-28,2020-06-06,FULL_TIME,Apr,Jun
