# JOB SEEKER'S DATA

### 1. Data Examination

In [1]:
# Load the necessary libraries.
import pandas as pd
import re
import os

In [2]:
# Load the raw job seeker data into a new pandas DataFrame.
df_jobseeker = pd.read_csv('raw_data_jobseeker.csv', index_col=None)

df_jobseeker.head()

,participant,data_collection,date,location,preferred_position,education,skill,experience
0,user_1,Voice Call,12/17/2023 15:30,"Dublin, Ireland",Registered nurse,Bachelor's degree: Critical care nursing,patient care\nwound care\nmedical procedures\n...,Registered Nurse: 3 years
1,user_2,Voice Call,12/27/2023 11:50,"Dublin, Ireland",Electrician,•\tHigh school diploma\n•\tVocational electric...,•\tCircuit testing\n•\tBlueprint reading\n•\tF...,Residential Electrician's Helper: 1 year
2,user_3,Google Form,12/31/2023 13:39,"Dublin, Ireland",Data analyst,Degree:\n1. Master of Science in Data Analytic...,1. Python\n2. Data Mining and Extraction\n3. D...,Entry Level Data Analyst: 1 year\nData Coordin...


In [3]:
# Examine the short summary of the DataFrame.
df_jobseeker.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 8 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   participant         3 non-null      object
 1   data_collection     3 non-null      object
 2   date                3 non-null      object
 3   location            3 non-null      object
 4   preferred_position  3 non-null      object
 5   education           3 non-null      object
 6   skill               3 non-null      object
 7   experience          3 non-null      object
dtypes: object(8)
memory usage: 324.0+ bytes


### 2. Data Manipulation

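As the output of the cells above shows, the raw job seeker data needs cleaning before it can be used. All eight columns are stored as generic `object` (text) values, including 'date'. The multi-line 'education', 'skill' and 'experience' values contain line breaks, some of them also carry bullet characters, tabs or list numbering, and capitalization is inconsistent across rows.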
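To make these issues concrete, the sketch below flags which text values still contain bullet characters, tabs, line breaks or list numbering such as `1.`. The helper name `has_list_formatting` and the regular expression are illustrative assumptions, not part of the original notebook; the sample strings are shortened copies of the raw 'education' values printed above.

```python
import re

import pandas as pd

# Hypothetical check: a bullet, a tab/newline, or list numbering such as "1." or "10.".
LIST_FORMATTING = re.compile(r'•|[\t\n]|\b\d+\.')


def has_list_formatting(text: str) -> bool:
    """Return True if the text still contains bullet, tab/newline or numbering markers."""
    return bool(LIST_FORMATTING.search(text))


# Shortened raw values modelled on the 'education' column shown above.
sample = pd.Series([
    "Bachelor's degree: Critical care nursing",
    "•\tHigh school diploma\n•\tVocational electrician certification",
    "Degree:\n1. Master of Science in Data Analytics",
])

print(sample.apply(has_list_formatting).tolist())
# → [False, True, True]
```

Running the same check on a column after cleaning is a quick way to confirm that no markers were missed.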

In [4]:
# Convert the 'date' column from text to pandas datetime values.
df_jobseeker['date'] = pd.to_datetime(df_jobseeker['date'])

print(df_jobseeker['date'].dtype)

datetime64[ns]
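`pd.to_datetime` is called above without a `format` argument, so pandas infers the date layout. The raw values follow a month/day/year hour:minute pattern (for example `12/17/2023 15:30`), so passing an explicit `format` string is a stricter alternative: a value that deviates from the pattern raises an error instead of being parsed by guesswork. This sketch assumes every value in the column follows exactly that pattern.

```python
import pandas as pd

# Raw timestamps as they appear in the 'date' column before conversion.
raw_dates = pd.Series(['12/17/2023 15:30', '12/27/2023 11:50', '12/31/2023 13:39'])

# %m/%d/%Y %H:%M = month/day/four-digit year, then 24-hour hour:minute.
parsed = pd.to_datetime(raw_dates, format='%m/%d/%Y %H:%M')

print(parsed.dt.day.tolist())
# → [17, 27, 31]
```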


In [5]:
# Check the values of the column 'education'.
for value in df_jobseeker['education']:
    print(value, '\n')

Bachelor's degree: Critical care nursing 

•	High school diploma
•	Vocational electrician certification
•	Construction safety certification  

Degree:
1. Master of Science in Data Analytics
2. Bachelor of Science in Business Administration
Certifications:
1. Microsoft Certified - Azure Data Scientist Associate
2. Google Data Analytics Certificate 



In [6]:
# Clean the 'education' column: drop bullets and list numbering, flatten tabs and
# line breaks, join items with ", ", separate the certifications group with "; ",
# then lowercase. Applying by column name avoids hard-coding the row count and position.
def clean_education(text):
    text = text.replace('•', '')
    text = re.sub(r'[^\S ]', ' ', text)   # each tab/newline -> a space
    text = re.sub(r'\d+\.', '', text)     # drop list numbering such as "1."
    text = re.sub(r' {2,}', ', ', text)   # runs of spaces left by former breaks -> ", "
    text = text.replace(' Certifications:,', '; Certifications:')
    text = text.replace('Degree:,', 'Degree:')
    return text.strip().lower()

df_jobseeker['education'] = df_jobseeker['education'].apply(clean_education)

for value in df_jobseeker['education']:
    print(value, '\n')

bachelor's degree: critical care nursing 

high school diploma, vocational electrician certification, construction safety certification 

degree: master of science in data analytics, bachelor of science in business administration; certifications: microsoft certified - azure data scientist associate, google data analytics certificate 
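The cleaning step settles on a delimiter convention: items within a group are joined with `, `, and the degree and certification groups are separated by `; `. A minimal sketch of reading that convention back into a structure, assuming each group label is followed by `: `; the helper name `split_education` and the fallback key `'items'` for values without a label are hypothetical.

```python
def split_education(text: str) -> dict:
    """Split 'label: item, item; label: item' into {label: [items]} (hypothetical helper)."""
    groups = {}
    for group in text.split('; '):
        label, sep, items = group.partition(': ')
        if not sep:  # no "label:" prefix, e.g. a plain list of qualifications
            label, items = 'items', group
        groups[label] = items.split(', ')
    return groups


# Cleaned value for user_3, copied from the output above.
cleaned = ("degree: master of science in data analytics, bachelor of science in business administration; "
           "certifications: microsoft certified - azure data scientist associate, google data analytics certificate")

for label, items in split_education(cleaned).items():
    print(label, '->', items)
```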



In [7]:
# Check the values of the column 'skill'.
for value in df_jobseeker['skill']:
    print(value, '\n')

patient care
wound care
medical procedures
adult nursing
infection control
diagnostic 
time management
communication skills
attention to detail 

•	Circuit testing
•	Blueprint reading
•	Fault finding
•	Electrical wiring
•	Troubleshooting
•	Equipment inspection 
•	Installation
•	Organization 
•	Maintenance 
•	Diagnostic
•	Independent worker
•	Safety knowledge 

1. Python
2. Data Mining and Extraction
3. Data Analytics and Visualization
4. ETL Pipeline
5. Data Reporting
6. Database Management Systems
7. SQL and NoSQL
8. Machine Learning
9. A/B Testing
10. Data Governance 



In [8]:
# Clean the 'skill' column: drop bullets and list numbering, turn each tab or line
# break into a ", " separator, collapse doubled separators, then lowercase.
def clean_skill(text):
    text = text.replace('•', '').strip()
    text = re.sub(r'\d+\. ', '', text)       # drop list numbering such as "10. "
    text = re.sub(r'[^\S ]', ', ', text)     # each tab/newline -> ", "
    text = text.replace(', ,', ',').replace(' ,', ',')  # tidy doubled separators
    return text.lower()

df_jobseeker['skill'] = df_jobseeker['skill'].apply(clean_skill)

for value in df_jobseeker['skill']:
    print(value, '\n')

patient care, wound care, medical procedures, adult nursing, infection control, diagnostic, time management, communication skills, attention to detail 

circuit testing, blueprint reading, fault finding, electrical wiring, troubleshooting, equipment inspection, installation, organization, maintenance, diagnostic, independent worker, safety knowledge 

python, data mining and extraction, data analytics and visualization, etl pipeline, data reporting, database management systems, sql and nosql, machine learning, a/b testing, data governance 
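The chained replacements above can also be written line by line, which some readers may find easier to follow. This sketch swaps in that different technique: split on line breaks, strip each line's bullet, surrounding whitespace and leading numbering, skip empty lines, then join with `, `. One behavioral difference is that it only removes numbering at the start of a line, while the regex above removes `N. ` anywhere. The function name `clean_skills` is hypothetical.

```python
import re


def clean_skills(text: str) -> str:
    """Line-by-line alternative to the chained replacements (a sketch)."""
    items = []
    for line in text.splitlines():
        line = line.replace('•', '').strip()   # drop the bullet and surrounding tabs/spaces
        line = re.sub(r'^\d+\.\s*', '', line)   # drop leading numbering such as "10. "
        if line:                                # skip blank lines
            items.append(line)
    return ', '.join(items).lower()


bulleted = "•\tCircuit testing\n•\tBlueprint reading\n•\tEquipment inspection \n•\tInstallation"
print(clean_skills(bulleted))
# → circuit testing, blueprint reading, equipment inspection, installation
```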



In [9]:
# Check the values of the column 'experience'.
for value in df_jobseeker['experience']:
    print(value, '\n')

Registered Nurse: 3 years 

Residential Electrician's Helper: 1 year 

Entry Level Data Analyst: 1 year
Data Coordinator: 2 years 



In [10]:
# Lowercase the 'data_collection', 'location', 'preferred_position' and 'experience'
# columns, then join multi-line 'experience' entries with "; ".
for column in ['data_collection', 'location', 'preferred_position', 'experience']:
    df_jobseeker[column] = df_jobseeker[column].str.lower()
df_jobseeker['experience'] = df_jobseeker['experience'].str.replace(r'[^\S ]', '; ', regex=True)

df_jobseeker.head()

,participant,data_collection,date,location,preferred_position,education,skill,experience
0,user_1,voice call,2023-12-17 15:30:00,"dublin, ireland",registered nurse,bachelor's degree: critical care nursing,"patient care, wound care, medical procedures, ...",registered nurse: 3 years
1,user_2,voice call,2023-12-27 11:50:00,"dublin, ireland",electrician,"high school diploma, vocational electrician ce...","circuit testing, blueprint reading, fault find...",residential electrician's helper: 1 year
2,user_3,google form,2023-12-31 13:39:00,"dublin, ireland",data analyst,"degree: master of science in data analytics, b...","python, data mining and extraction, data analy...",entry level data analyst: 1 year; data coordin...
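After cleaning, each 'experience' value has the shape `role: N year(s)`, with multiple entries separated by `; `. A minimal sketch of turning that text into (role, years) pairs, for example to total a candidate's years of experience. The pattern assumes whole-number years and roles that contain no `:` or `;`; the names `ENTRY` and `parse_experience` are hypothetical.

```python
import re

# One "role: N year(s)" entry; assumes whole-number years as in the sample data.
ENTRY = re.compile(r'(?P<role>[^;:]+): (?P<years>\d+) years?')


def parse_experience(text: str) -> list[tuple[str, int]]:
    """Turn 'role: N years; role: N year' into [(role, N), ...] (hypothetical helper)."""
    return [(m['role'].strip(), int(m['years'])) for m in ENTRY.finditer(text)]


entries = parse_experience("entry level data analyst: 1 year; data coordinator: 2 years")
print(entries)
# → [('entry level data analyst', 1), ('data coordinator', 2)]
print(sum(years for _, years in entries))
# → 3
```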


In [11]:
# Save the cleaned job seeker DataFrame as a CSV file.
directory = r'C:\Users\temulenbd\OneDrive\Desktop\learn\github_repo\cct\capstone_project'
filename = 'data_jobseeker.csv'
file_path = os.path.join(directory, filename)
df_jobseeker.to_csv(file_path, index=False)

print(f"Raw data was manipulated and exported successfully as {file_path}")

Raw data was manipulated and exported successfully as C:\Users\temulenbd\OneDrive\Desktop\learn\github_repo\cct\capstone_project\data_jobseeker.csv
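A CSV file stores plain text, so the datetime type of the 'date' column is not preserved on disk. A minimal sketch, assuming the exported file is later reloaded with pandas: passing `parse_dates=['date']` to `pd.read_csv` restores the datetime column. The two-row DataFrame and the temporary directory are illustrative stand-ins for the real data and output path.

```python
import os
import tempfile

import pandas as pd

# Small stand-in for the cleaned DataFrame (illustrative rows only).
df = pd.DataFrame({
    'participant': ['user_1', 'user_2'],
    'date': pd.to_datetime(['2023-12-17 15:30', '2023-12-27 11:50']),
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'data_jobseeker.csv')
    df.to_csv(path, index=False)

    plain = pd.read_csv(path)                         # 'date' comes back as text
    typed = pd.read_csv(path, parse_dates=['date'])   # 'date' is parsed back to datetime

print(pd.api.types.is_datetime64_any_dtype(plain['date']))
# → False
print(pd.api.types.is_datetime64_any_dtype(typed['date']))
# → True
```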
