# 🤖 AI Impact on Jobs 2010–2025
### Data Analysis & Visualization
**Author:** Abdul Wasay | [GitHub: theabdulwasay](https://github.com/theabdulwasay) | [LinkedIn: Abdul Wasay](https://www.linkedin.com/in/theabdulwasay) | abdulwasaymalik757@gmail.com

---
**Dataset:** 5,000 global job postings (2010–2025), each described by 22 features  
**Goal:** Understand how AI is reshaping job markets — salaries, displacement risk, skills demand, and adoption trends.


## 1. Setup & Imports

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import seaborn as sns
import numpy as np
from collections import Counter
import warnings
warnings.filterwarnings('ignore')

# ── Dark theme ──────────────────────────────────────────────────
BG, SUR, BDR = '#050a0f', '#0d1620', '#1e3248'
ACCENT, ACCENT2, ACCENT3 = '#00d4ff', '#7c3aed', '#10b981'
WARN, RED, TEXT, MUTED = '#f59e0b', '#ef4444', '#e2eaf4', '#6b8aaa'
PALETTE6 = [ACCENT, ACCENT2, ACCENT3, WARN, RED, '#ec4899']

plt.rcParams.update({
    'figure.facecolor': BG, 'axes.facecolor': SUR,
    'axes.edgecolor': BDR, 'axes.labelcolor': TEXT,
    'xtick.color': MUTED, 'ytick.color': MUTED,
    'text.color': TEXT, 'grid.color': BDR,
    'grid.linestyle': '--', 'grid.alpha': 0.5,
    'font.family': 'monospace',
    'axes.spines.top': False, 'axes.spines.right': False,
})
print("✅ Setup complete")

## 2. Load & Explore Dataset

In [None]:
df = pd.read_csv('ai_impact_jobs_2010_2025.csv')
print(f"Shape: {df.shape}")
df.head()

In [None]:
print("📊 Basic Stats")
print(f"Years covered : {df['posting_year'].min()} – {df['posting_year'].max()}")
print(f"Industries    : {df['industry'].nunique()}")
print(f"Countries     : {df['country'].nunique()}")
print(f"AI-mentioned  : {df['ai_mentioned'].sum()} ({df['ai_mentioned'].mean()*100:.1f}%)")
print(f"Avg Salary    : ${df['salary_usd'].mean():,.0f}")
print(f"Avg Auto Risk : {df['automation_risk_score'].mean():.2f}")
df.describe()
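
Several later cells compare `ai_mentioned == True` or map `True`/`False` to labels, which only behaves as intended if pandas parsed that column as boolean. The sketch below shows one way to normalise such a flag column before analysis. The helper name `to_bool` and the accepted spellings are assumptions for illustration, and the demo values are made up rather than taken from the dataset.

```python
import pandas as pd

def to_bool(series: pd.Series) -> pd.Series:
    """Coerce a flag column to nullable boolean; unrecognised values become <NA>."""
    if pd.api.types.is_bool_dtype(series):
        return series.astype('boolean')
    # Hypothetical set of accepted spellings; extend it to match the real file
    mapping = {'true': True, 'false': False, '1': True, '0': False,
               'yes': True, 'no': False}
    # dict.get returns None for anything not in the mapping (including missing values)
    coerced = series.map(lambda v: mapping.get(str(v).strip().lower()))
    return coerced.astype('boolean')

# Made-up values for illustration only
demo = pd.DataFrame({'ai_mentioned': ['True', 'false', ' 1', 'no', None]})
demo['ai_mentioned'] = to_bool(demo['ai_mentioned'])
print(demo['ai_mentioned'])
```

Applied to the real frame, this would be `df['ai_mentioned'] = to_bool(df['ai_mentioned'])`, followed by `df.isna().sum()` to see how many values could not be interpreted.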

## 3. Visualization 1 — AI Adoption Trend (2010–2025)
**Insight:** What % of job postings mentioned AI each year?

In [None]:
# Share of postings per year that mention AI (the mean of a boolean column is the proportion of True)
yearly = (df.groupby('posting_year')['ai_mentioned']
            .mean().mul(100).reset_index())
yearly.columns = ['year', 'pct']

fig, ax = plt.subplots(figsize=(12, 6))
ax.fill_between(yearly['year'], yearly['pct'], alpha=0.15, color=ACCENT)
ax.plot(yearly['year'], yearly['pct'], color=ACCENT, lw=2.5,
        marker='o', markersize=7, markerfacecolor=BG, markeredgewidth=2)
for _, r in yearly.iterrows():
    ax.annotate(f"{r['pct']:.0f}%", (r['year'], r['pct']+1.2),
                ha='center', fontsize=8, color=TEXT)
ax.set_xlabel('Year', labelpad=10)
ax.set_ylabel('% of Job Postings Mentioning AI', labelpad=10)
ax.set_title('AI Adoption in Job Postings  2010 – 2025',
             fontsize=15, fontweight='bold', color=TEXT, pad=18)
ax.set_xticks(yearly['year'])
ax.tick_params(axis='x', rotation=45)
ax.grid(True, axis='y')
plt.tight_layout()
plt.show()
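
To judge where adoption accelerates, it can help to tabulate the year-over-year change next to the yearly share, along with the number of postings behind each point. This is a minimal sketch on made-up flags. Only the column names `posting_year` and `ai_mentioned` come from the notebook; `trend`, `change_pp`, and `n_postings` are illustrative names.

```python
import pandas as pd

# Made-up flags for illustration only; the real values come from df
demo = pd.DataFrame({
    'posting_year': [2017, 2017, 2018, 2018, 2019, 2019, 2019, 2019],
    'ai_mentioned': [False, False, True, False, True, True, True, False],
})

by_year = demo.groupby('posting_year')['ai_mentioned']
share = by_year.mean().mul(100)  # % of postings mentioning AI per year

trend = pd.DataFrame({
    'pct': share,
    'change_pp': share.diff(),       # year-over-year change in percentage points
    'n_postings': by_year.size(),    # sample size behind each yearly share
})
print(trend)
```

A year with few postings can produce a large swing in `change_pp`, so reading the two columns together gives a fairer picture than the line chart alone.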

## 4. Visualization 2 — Automation Risk by Industry
**Insight:** Which industries face the highest automation threat?

In [None]:
ind_risk = df.groupby('industry')['automation_risk_score'].mean().sort_values(ascending=True)
colors = [RED if v > 0.6 else WARN if v > 0.45 else ACCENT3 for v in ind_risk.values]

fig, ax = plt.subplots(figsize=(11, 7))
bars = ax.barh(ind_risk.index, ind_risk.values, color=colors, height=0.6, edgecolor='none')
for bar, val in zip(bars, ind_risk.values):
    ax.text(val + 0.005, bar.get_y() + bar.get_height()/2,
            f'{val:.2f}', va='center', fontsize=8.5, color=TEXT)
ax.set_xlabel('Average Automation Risk Score (0–1)', labelpad=10)
ax.set_title('Automation Risk Score by Industry',
             fontsize=15, fontweight='bold', color=TEXT, pad=18)
patches = [mpatches.Patch(color=RED, label='High Risk (>0.6)'),
           mpatches.Patch(color=WARN, label='Medium (0.45–0.6)'),
           mpatches.Patch(color=ACCENT3, label='Low Risk (<0.45)')]
ax.legend(handles=patches, loc='lower right', framealpha=0.2,
          facecolor=SUR, edgecolor=BDR, labelcolor=TEXT)
ax.set_xlim(0, 0.85)
ax.grid(True, axis='x')
plt.tight_layout()
plt.show()
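
The bar chart shows only the mean score per industry. A rough sketch of how spread and sample size could be reported alongside it follows, using made-up scores. The industry names and values are illustrative, and `summary` and `sem` are hypothetical names.

```python
import pandas as pd

# Made-up scores for illustration only
demo = pd.DataFrame({
    'industry': ['Retail', 'Retail', 'Retail', 'Healthcare', 'Healthcare'],
    'automation_risk_score': [0.70, 0.60, 0.80, 0.30, 0.40],
})

summary = (demo.groupby('industry')['automation_risk_score']
               .agg(mean='mean', std='std', n='count')
               .sort_values('mean', ascending=False))

# Standard error of the mean: a rough guide to how far each bar could move
summary['sem'] = summary['std'] / summary['n'] ** 0.5
print(summary.round(3))
```

Industries whose means differ by less than a couple of standard errors are hard to rank with confidence, which matters near the 0.45 and 0.6 colour thresholds.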

## 5. Visualization 3 — Salary Gap: AI vs Non-AI Jobs
**Insight:** Do AI-related jobs pay more?

In [None]:
df['ai_label'] = df['ai_mentioned'].map({True: 'AI-Mentioned', False: 'Non-AI'})
sal = df.groupby(['posting_year', 'ai_label'])['salary_usd'].mean().reset_index()
pivot = sal.pivot(index='posting_year', columns='ai_label', values='salary_usd')

fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(pivot.index, pivot['AI-Mentioned'], color=ACCENT, lw=2.5,
        marker='o', markersize=7, markerfacecolor=BG, markeredgewidth=2,
        label='AI-Mentioned Jobs')
ax.plot(pivot.index, pivot['Non-AI'], color=MUTED, lw=2,
        marker='s', markersize=6, markerfacecolor=BG, markeredgewidth=2,
        linestyle='--', label='Non-AI Jobs')
ax.fill_between(pivot.index, pivot['AI-Mentioned'], pivot['Non-AI'],
                alpha=0.08, color=ACCENT)
ax.set_xlabel('Year', labelpad=10)
ax.set_ylabel('Average Salary (USD)', labelpad=10)
ax.set_title('Salary Gap: AI vs Non-AI Job Postings (2010–2025)',
             fontsize=15, fontweight='bold', color=TEXT, pad=18)
ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f'${x/1000:.0f}K'))
ax.legend(framealpha=0.2, facecolor=SUR, edgecolor=BDR, labelcolor=TEXT)
ax.set_xticks(pivot.index)
ax.tick_params(axis='x', rotation=45)
ax.grid(True, axis='y')
plt.tight_layout()
plt.show()
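
The shaded area between the two lines can be turned into numbers by computing the gap per year in dollars and as a percentage of the non-AI average. The sketch below does this on made-up salaries shaped like the `pivot` table above. The `gap`, `gap_usd`, and `premium_pct` names are illustrative.

```python
import pandas as pd

# Made-up yearly averages for illustration only
demo = pd.DataFrame({
    'posting_year': [2020, 2020, 2021, 2021],
    'ai_label': ['AI-Mentioned', 'Non-AI', 'AI-Mentioned', 'Non-AI'],
    'salary_usd': [110_000, 100_000, 126_000, 105_000],
})
pivot = demo.pivot(index='posting_year', columns='ai_label', values='salary_usd')

gap = pd.DataFrame({
    'gap_usd': pivot['AI-Mentioned'] - pivot['Non-AI'],
    # Premium relative to the non-AI average, in percent
    'premium_pct': (pivot['AI-Mentioned'] / pivot['Non-AI'] - 1) * 100,
})
print(gap.round(1))
```

This compares raw means, so differences in industry mix or seniority between the two groups are not separated out. Swapping `.mean()` for `.median()` in the groupby is one way to reduce the influence of a few very high salaries.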

## 6. Visualization 4 — Top AI Keywords
**Insight:** What AI technologies are most in demand?

In [None]:
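# Split each comma-separated keyword string and tally the individual keywords
ai_kw = df.loc[df['ai_mentioned'] == True, 'ai_keywords'].dropna()
all_kw = Counter()
for row in ai_kw:
    for k in (x.strip() for x in str(row).split(',')):
        if k:  # skip empty fragments left by stray commas
            all_kw[k] += 1
top_kw = pd.Series(dict(all_kw.most_common(12)))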

fig, ax = plt.subplots(figsize=(11, 6))
bars = ax.bar(top_kw.index, top_kw.values,
              color=[PALETTE6[i % 6] for i in range(len(top_kw))],
              width=0.65, edgecolor='none')
label_pad = top_kw.max() * 0.01  # offset scales with the data rather than a fixed 15 counts
for bar, val in zip(bars, top_kw.values):
    ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + label_pad,
            str(val), ha='center', fontsize=9, color=TEXT)
ax.set_xlabel('AI Keyword', labelpad=10)
ax.set_ylabel('Frequency', labelpad=10)
ax.set_title('Top AI Keywords in Job Postings',
             fontsize=15, fontweight='bold', color=TEXT, pad=18)
ax.tick_params(axis='x', rotation=40)
ax.grid(True, axis='y')
plt.tight_layout()
plt.show()
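
The keyword tally can also be written without an explicit loop by using pandas string methods. This is a sketch on made-up keyword strings, assuming the `ai_keywords` field is comma-separated as the cell above treats it. The keyword values here are illustrative.

```python
import pandas as pd

# Made-up keyword strings for illustration only
demo = pd.Series(['NLP, Deep Learning', 'Deep Learning', None,
                  'NLP,  ,Computer Vision'])

counts = (demo.dropna()
              .str.split(',')           # one list of keywords per posting
              .explode()                # one row per keyword
              .str.strip()
              .loc[lambda s: s != '']   # drop empty fragments from stray commas
              .value_counts())
print(counts.head(12))
```

On the real data the starting series would be the `ai_keywords` column filtered to AI-mentioned postings, and `counts.head(12)` plays the role of `top_kw`.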

## 7. Visualization 5 — Job Displacement Risk by Adoption Stage
**Insight:** At what adoption stage does displacement risk peak?

In [None]:
risk_counts = df.groupby(['industry_ai_adoption_stage', 'ai_job_displacement_risk']).size().unstack(fill_value=0)
risk_counts = risk_counts.reindex(columns=['Low', 'Medium', 'High'], fill_value=0)

fig, ax = plt.subplots(figsize=(10, 6))
bottom = np.zeros(len(risk_counts))
for col, color in zip(risk_counts.columns, [ACCENT3, WARN, RED]):
    ax.bar(risk_counts.index, risk_counts[col], bottom=bottom,
           color=color, label=col, width=0.55, edgecolor='none')
    bottom += risk_counts[col].values
ax.set_xlabel('AI Adoption Stage', labelpad=10)
ax.set_ylabel('Number of Job Postings', labelpad=10)
ax.set_title('Job Displacement Risk by AI Adoption Stage',
             fontsize=15, fontweight='bold', color=TEXT, pad=18)
ax.legend(title='Displacement Risk', framealpha=0.2,
          facecolor=SUR, edgecolor=BDR, labelcolor=TEXT)
ax.grid(True, axis='y')
plt.tight_layout()
plt.show()
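
The stacked bars above show raw counts, so a stage with more postings looks riskier even if its mix of risk levels is the same. To compare stages on equal footing, the counts can be converted to within-stage shares. This sketch uses made-up rows; the stage and risk labels are illustrative values for the two columns used above.

```python
import pandas as pd

# Made-up rows for illustration only
demo = pd.DataFrame({
    'industry_ai_adoption_stage': ['Growing', 'Growing', 'Growing', 'Growing',
                                   'Mature', 'Mature'],
    'ai_job_displacement_risk':   ['Low', 'Medium', 'High', 'High',
                                   'Low', 'High'],
})

share = (pd.crosstab(demo['industry_ai_adoption_stage'],
                     demo['ai_job_displacement_risk'],
                     normalize='index')        # each row sums to 1
           .reindex(columns=['Low', 'Medium', 'High'], fill_value=0)
           .mul(100))                          # express as percentages
print(share.round(1))
```

Passing `share` instead of `risk_counts` to the same stacked-bar loop would give 100% bars, where the height of the "High" segment directly answers which stage has the largest share of high-risk postings.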

## 8. Visualization 6 — Correlation Heatmap
**Insight:** How do key numeric features relate to each other?

In [None]:
num_cols = ['ai_intensity_score', 'salary_usd',
            'automation_risk_score', 'salary_change_vs_prev_year_percent']
corr = df[num_cols].corr()
labels = ['AI Intensity', 'Salary (USD)', 'Automation Risk', 'Salary Change %']

fig, ax = plt.subplots(figsize=(8, 6))
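# A dark-centred diverging palette keeps the light annotation text legible near zero
cmap = sns.diverging_palette(220, 10, center='dark', as_cmap=True)
sns.heatmap(corr, annot=True, fmt='.2f', cmap=cmap, ax=ax,
            vmin=-1, vmax=1,  # pin the colour scale to the full correlation range
            linewidths=0.5, linecolor=BDR,
            xticklabels=labels, yticklabels=labels,
            annot_kws={'size': 11, 'color': TEXT},
            cbar_kws={'shrink': 0.8})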
ax.set_title('Correlation Heatmap – Key Numeric Features',
             fontsize=14, fontweight='bold', color=TEXT, pad=18)
ax.tick_params(axis='x', rotation=30)
ax.tick_params(axis='y', rotation=0)
plt.tight_layout()
plt.show()
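
A heatmap shows every pair twice and includes the trivial diagonal. For written findings it can be easier to list each pair once, ordered by strength. The sketch below does this on synthetic columns `a`, `b`, and `c`, which are placeholders and not features from the dataset.

```python
import numpy as np
import pandas as pd

# Synthetic data for illustration only
rng = np.random.default_rng(0)
x = rng.normal(size=200)
demo = pd.DataFrame({
    'a': x,
    'b': -x + rng.normal(scale=0.1, size=200),  # strongly negative with a
    'c': rng.normal(size=200),                  # independent noise
})

corr = demo.corr()
# Keep each pair once: blank out the diagonal and the lower triangle
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
pairs = upper.stack().dropna().sort_values(key=abs, ascending=False)
print(pairs.round(2))
```

With the notebook's data, replacing `demo` with `df[num_cols]` would produce the six feature pairs ranked by absolute correlation.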

## 9. Key Findings & Conclusions

| # | Finding |
|---|---------|
| 1 | **AI mentions in job postings rose sharply after 2018**, a period that coincides with the deep learning boom |
| 2 | **Manufacturing and Retail** have the highest average automation risk scores; **Healthcare and Government** score lower |
| 3 | **AI-mentioned jobs carry a higher average salary** than non-AI postings |
| 4 | **Deep Learning and NLP** are the most frequent AI keywords when postings from all years are pooled |
| 5 | **"Growing"-stage industries** show the most balanced mix of displacement risk levels; "Mature" industries carry lower displacement risk |
| 6 | **AI Intensity Score** has a weak negative correlation with automation risk, which suggests that specialized AI roles are less exposed |

