# **Data Journalism** - Navigating Future Opportunities: An Insightful Exploration of Canadian Employment Wages and Trends

As industries change rapidly, understanding occupational trends and wages is essential for students and aspiring professionals who want to position themselves well in the job market. "Navigating Future Opportunities" examines Canada's employment sectors through an exploratory data analysis (EDA) of recent wage statistics, looking at how wages differ across occupations and industries.

> 1) **Exploratory Data Analysis** <br>
> 2) **Generative AI**<br>

This project has two main parts: an in-depth EDA segment, which uses data visualization to highlight key trends, disparities, and insights in Canadian employment wages across sectors, and a generative AI segment, which interprets these findings through the lens of data journalism. The primary objective is to offer a granular look at which industries and occupations promise prosperity, growth, and stability. By identifying the sectors that lead in wage trends and those that lag, we aim to give individuals a roadmap for making informed decisions about their careers and educational paths.

## **Pre-requisite Actions**

In [1]:
# Import necessary packages
!pip install openai --quiet
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from openai import OpenAI
import os

## **Download the Dataset:**

 >* **Folder Access**: [Click to download the Employee Wages Data](https://drive.google.com/drive/folders/19z4KBZxFu6g4Hgmw9jkYU-WBBHzxFrnh?usp=sharing) <br>
 Sourced from: [Statistics Canada](https://www150.statcan.gc.ca/t1/tbl1/en/tv.action?pid=1410041701&pickMembers%5B0%5D=1.1&pickMembers%5B1%5D=2.1&pickMembers%5B2%5D=3.1&pickMembers%5B3%5D=5.1&pickMembers%5B4%5D=6.1&cubeTimeFrame.startYear=2018&cubeTimeFrame.endYear=2023&referencePeriods=20180101%2C20230101)

In [None]:
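# Load one of the downloaded files with pandas.
# The file name and tab separator below match the example used in the EDA section;
# swap in the file and delimiter that fit your task.
df = pd.read_csv('Average_Hourly_Wages_Overall_Canadian.csv', sep='\t')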

## **Get to know the Data**




In [None]:
# Check data types, row/column headers
# Check the CSV delimiter type (\t, ; or ,) so you can pass the correct separator when reading the data
# Pick the correct csv file for each task where relevant
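
The checks listed above can be sketched roughly as follows. This is a minimal sketch, not the notebook's required solution: the helper `detect_delimiter` is hypothetical, the sample text uses made-up values, and the column names `REF_DATE` and `VALUE` are assumptions about the Statistics Canada export that should be verified against the real file.

```python
import csv
import io

import pandas as pd


def detect_delimiter(sample_text, candidates="\t;,"):
    """Guess the field delimiter from a sample of the file's text (hypothetical helper)."""
    return csv.Sniffer().sniff(sample_text, delimiters=candidates).delimiter


# Tiny stand-in for the first lines of a downloaded file (made-up values,
# assumed column names).
sample = (
    "REF_DATE\tType of work\tVALUE\n"
    "2018\tFull-time employees\t28.5\n"
    "2019\tPart-time employees\t19.25\n"
)

sep = detect_delimiter(sample)
df_preview = pd.read_csv(io.StringIO(sample), sep=sep)

df_preview.info()          # column names, dtypes, non-null counts
print(df_preview.head())   # first rows, to check the headers visually
```

For a real file, the sample could be read with something like `open(path).read(4096)` before calling `detect_delimiter`, and the detected separator then passed to `pd.read_csv`.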

# **Exploratory Data Analysis**


### **The Recent Evolution of Wages: A Historical Perspective**

In [None]:
# Task: Create a line plot that shows the overall hourly wage trend over time

# Pick the correct dataset from the folder
# Example:
from google.colab import files
uploaded = files.upload()
df = pd.read_csv('Average_Hourly_Wages_Overall_Canadian.csv', sep='\t')
df.head()

# Compare the overall hourly wage trend (total employees) against health and engineering professionals
# Filter the DataFrame for the selected occupations
occupations_of_interest = [
    'Professional occupations in engineering [213]',
    'Professional occupations in health [31]',
    'Total employees, all occupations [00-95]',
]
selected_occupations = df[df['National Occupational Classification (NOC)'].isin(occupations_of_interest)]

# Visualize your findings and make inferences from the outcomes observed
# (For the text analysis in the Generative AI section)
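
One possible way to turn the filtered rows into a line plot is sketched below. It assumes the data is in long format with a year column named `REF_DATE` and a wage column named `VALUE`; both names are assumptions, as are the made-up numbers in the `demo` DataFrame, which only stands in for the real file.

```python
import pandas as pd
import matplotlib.pyplot as plt

NOC_COL = 'National Occupational Classification (NOC)'

# Made-up values in an assumed long format: one row per year and occupation.
demo = pd.DataFrame({
    'REF_DATE': [2018, 2019, 2020] * 2,
    NOC_COL: ['Total employees, all occupations [00-95]'] * 3
             + ['Professional occupations in health [31]'] * 3,
    'VALUE': [27.0, 28.0, 30.0, 38.0, 39.0, 41.0],
})

# One column per occupation, indexed by year; duplicate rows are averaged.
trend = demo.pivot_table(index='REF_DATE', columns=NOC_COL,
                         values='VALUE', aggfunc='mean')

# Draw one line per occupation on a shared axis.
fig, ax = plt.subplots(figsize=(8, 4))
trend.plot(ax=ax, marker='o')
ax.set_xlabel('Year')
ax.set_ylabel('Average hourly wage ($)')
ax.set_title('Hourly wage trend by occupation')
```

With the real data, `demo` would be replaced by the `selected_occupations` DataFrame once its actual column names have been confirmed.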

### **Job and Financial Security: Choosing Between Casual and Permanent Employment**

In [None]:
# Compare the average weekly wage between full-time and part-time employees. Use a bar chart for visualization
# Hint: You may need to filter the dataframe based on the 'Type of work' column and then use groupby and mean to calculate the average wages.
# Calculation method: For each type of work, take the average across all the years available in the dataset

# Visualize your findings and identify if it is better to be a casual or permanent employee
# (For the text analysis in the Generative AI section)
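
The filter, groupby, and mean steps from the hint could look something like the sketch below. The `'Type of work'` column comes from the hint; the category labels, the `REF_DATE` and `VALUE` column names, and all numbers are assumptions used only to make the example self-contained.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up weekly wages; category labels and column names are assumed.
demo = pd.DataFrame({
    'REF_DATE': [2018, 2019, 2018, 2019, 2018, 2019],
    'Type of work': ['Full-time employees', 'Full-time employees',
                     'Part-time employees', 'Part-time employees',
                     'Both full- and part-time employees',
                     'Both full- and part-time employees'],
    'VALUE': [1100.0, 1200.0, 400.0, 500.0, 900.0, 1000.0],
})

# Keep only the two work types being compared.
work_types = ['Full-time employees', 'Part-time employees']
subset = demo[demo['Type of work'].isin(work_types)]

# Average across all available years for each work type.
avg_weekly = subset.groupby('Type of work')['VALUE'].mean()

fig, ax = plt.subplots()
avg_weekly.plot(kind='bar', ax=ax)
ax.set_ylabel('Average weekly wage ($)')
ax.set_title('Average weekly wage by type of work')
```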

### **Wage Disparity: A Comparison by Occupation and Gender**

In [None]:
# Identify the top 5 occupations with the highest average wage
# Plot a heatmap of the average wages for those top 5 occupations, by occupation and age group, to visualize the distribution across these dimensions.
# Hint: Group the data by 'National Occupational Classification (NOC)' and calculate the mean wage. Then sort the results and use head() to get the top 5 occupations.

# Compare the bottom 3 and top 3 occupations for the average hourly wage between sexes. Provide 2 boxplots, separating the top and bottom occupations
# Hint: The boxplot should indicate the distributions over time, and make inferences about the

# Visualize your findings and make inferences from the outcomes observed
# (For the text analysis in the Generative AI section)
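
A rough sketch of the top-occupations heatmap follows. It assumes an `'Age group'` column and a `VALUE` wage column, neither of which is confirmed by the dataset description, and it uses placeholder occupation names and made-up numbers. `TOP_N` is set to 2 only because the demo data is tiny; the task itself asks for 5.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

NOC_COL = 'National Occupational Classification (NOC)'

# Made-up data: three placeholder occupations across two assumed age groups.
demo = pd.DataFrame({
    NOC_COL: ['Occ A', 'Occ A', 'Occ B', 'Occ B', 'Occ C', 'Occ C'],
    'Age group': ['15 to 24 years', '25 to 54 years'] * 3,
    'VALUE': [20.0, 40.0, 15.0, 25.0, 30.0, 50.0],
})

TOP_N = 2  # use 5 for the actual task

# Rank occupations by mean wage and keep the highest ones.
top_names = (demo.groupby(NOC_COL)['VALUE'].mean()
                 .sort_values(ascending=False)
                 .head(TOP_N)
                 .index)

# Occupation x age-group table of mean wages, rows ordered by rank.
top_rows = demo[demo[NOC_COL].isin(top_names)]
heat = top_rows.pivot_table(index=NOC_COL, columns='Age group',
                            values='VALUE', aggfunc='mean')
heat = heat.loc[top_names]

fig, ax = plt.subplots()
sns.heatmap(heat, annot=True, fmt='.1f', cmap='viridis', ax=ax)
ax.set_title('Average wage by occupation and age group')
```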

# **Generative AI**

To load the **OPENAI API KEY**: <br>
1.   Ask the admins on Discord for an API key<br>
2.   Save the key in a .txt file (the code below expects `openai.txt`)<br>
3.   Load the .txt file


In [None]:
from google.colab import files
uploaded = files.upload()
api_key_path = 'openai.txt'

In [None]:
# Read the API key from the uploaded file (path defined in the previous cell)
with open(api_key_path, 'r') as file:
    api_key = file.readline().strip()

# Set the API key in the environment (optional if you pass the key directly to the client)
os.environ['OPENAI_API_KEY'] = api_key

# Initialize the OpenAI client
client = OpenAI(api_key=api_key)

# Specify the model
model = "gpt-3.5-turbo"

# The text for analysis. Include both the data and its description for context
text = """
Analyze and discuss the data on Canadian wages with a focus on three key aspects.

1. First, provide insight into overall wage trends over time, considering economic factors and policy impacts that have influenced these trends.
(Insert inferences and observations from EDA)

2. Second, detail the gender wage difference, highlighting the ongoing issue of pay equity between men and women across different sectors.
(Insert inferences and observations from EDA)

3. Lastly, explore the employment types to uncover disparities in earnings, especially focusing on how financial stability affects the hourly rates.
(Insert inferences and observations from EDA)

Each of these points should form a separate paragraph, together building a coherent narrative for a data journalism piece.
"""

# Preparing messages for the model
messages = [
    {"role": "system", "content": "You are a data journalism assistant"},
    {"role": "user", "content": f"Write a small paragraph here to analyze this data:\n{text}. The goal is to interpret it in a way that's interesting for a Data journalism piece."}
]

# Sending the request to the model
response = client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=0
)

# Extracting and printing the response
response_message = response.choices[0].message.content
print(response_message)

In analyzing the data on Canadian wages, it is evident that there have been fluctuations in overall wage trends over time. Economic factors such as inflation, unemployment rates, and government policies have played a significant role in shaping these trends. For instance, during periods of economic growth, wages tend to increase as demand for labor rises, while during economic downturns, wages may stagnate or even decrease. Policy impacts, such as minimum wage adjustments and labor market regulations, also influence wage levels. Understanding these factors is crucial in predicting future wage trends and ensuring fair compensation for workers.

When examining the gender wage difference in Canada, it is clear that pay equity remains a pressing issue. Despite efforts to promote gender equality in the workforce, women continue to earn less than men across various sectors. Factors such as occupational segregation, discrimination, and lack of representation in higher-paying roles contribute 

## **Important**

In [None]:
# Combine the generative AI response with the visuals developed in the EDA section into a single article. Example article: https://www.yourmove.ai/post/data
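
One simple way to pair the generated text with the EDA figures is sketched below. The helper `build_article`, the figure file names, and the placeholder paragraphs are all hypothetical. The sketch assumes each figure was saved to disk (for example with `fig.savefig(...)`) and that the model's response separates its paragraphs with blank lines.

```python
def build_article(title, response_text, figure_paths):
    """Interleave generated paragraphs with figures as a Markdown string."""
    # Split the response on blank lines and drop empty fragments.
    paragraphs = [p.strip() for p in response_text.split("\n\n") if p.strip()]
    parts = [f"# {title}"]
    for i, paragraph in enumerate(paragraphs):
        parts.append(paragraph)
        # Place the i-th figure after the i-th paragraph, if one exists.
        if i < len(figure_paths):
            parts.append(f"![Figure {i + 1}]({figure_paths[i]})")
    return "\n\n".join(parts)


# Placeholder text standing in for `response_message`.
demo_response = "First generated paragraph.\n\nSecond generated paragraph."
article = build_article(
    "Canadian Wage Trends",
    demo_response,
    ["wage_trend.png", "work_type.png"],  # hypothetical file names
)
print(article)
```

In the notebook, `demo_response` would be replaced by `response_message`, and the resulting string could be rendered in a markdown cell or written to a `.md` file.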