# Vibe Coding: Real-World Data Cleaning Challenge

## The Mission

You're a Data Analyst at **TechSalary Insights**. Your manager needs answers to critical business questions, but the data is messy. Your job is to clean it and provide accurate insights.

**The catch:** You must figure out how to clean the data yourself. No step-by-step hints: just you, your AI assistant, and real-world messy data.

---

## The Dataset: Ask A Manager Salary Survey 2021

**Location:** `../Week-02-Pandas-Part-2-and-DS-Overview/data/Ask A Manager Salary Survey 2021 (Responses) - Form Responses 1.tsv`

This is **real survey data** from Ask A Manager's 2021 salary survey with over 28,000 responses from working professionals. The data comes from this survey: https://www.askamanager.org/2021/04/how-much-money-do-you-make-4.html

**Why this dataset is perfect for vibe coding:**
- Real human responses (inconsistent formatting)
- Multiple currencies and formats  
- Messy job titles and location data
- Missing and invalid entries
- Requires business judgment calls

---

## Your Business Questions

Answer these **exact questions** with clean data. There's only one correct answer for each:

### Core Questions (Required):
1. **What is the median salary for Software Engineers in the United States?** 
2. **Which US state has the highest average salary for tech workers?**
3. **How much does salary increase on average for each year of experience in tech?**
4. **Which industry (besides tech) has the highest median salary?**

### Bonus Questions (If time permits):
5. **What's the salary gap between men and women in tech roles?**
6. **Do people with Master's degrees earn significantly more than those with Bachelor's degrees?**

**Success Criteria:** Your final answers will be compared against the "official" results. Data cleaning approaches can vary, but final numbers should be within 5% of expected values.


---
# Your Work Starts Here

## Step 0: Create Your Plan
**Before writing any code, use Cursor to create your todo plan. Then paste it here:**

## My Data Cleaning Plan (TODOs)

- [ ] Load TSV dataset and inspect schema, types, nulls
- [ ] Standardize column names and trim/normalize text
- [ ] Parse and clean salary to numeric USD; include additional comp; filter invalid
- [ ] Normalize job titles; tag Software Engineer and tech roles
- [ ] Clean and map US locations to two-letter state codes
- [ ] Parse years of experience to numeric and handle ranges
- [ ] Identify and handle outliers/duplicates; define valid salary range
- [ ] Derive fields: `total_comp_usd`, `is_tech`, `state_code`, `years_overall_num`, `years_field_num`, `is_software_engineer`
- [ ] Compute Q1: US Software Engineer median salary
- [ ] Compute Q2: Highest average salary by US state for tech
- [ ] Compute Q3: Salary increase per year (regression on tech)
- [ ] Compute Q4: Highest median non-tech industry
- [ ] Document assumptions and save cleaned dataset

### Notes on this dataset's columns
- Salary columns: `What is your annual salary? (...)`, `How much additional monetary compensation...`, `Please indicate the currency`, `If "Other," please indicate the currency here:`
- Role and industry: `What industry do you work in?`, `Job title`
- Location: `What country do you work in?`, `If you're in the U.S., what state do you work in?`, `What city do you work in?`
- Experience: `How many years of professional work experience do you have overall?`, `...in your field?`
- Demographics (for bonus): `What is your highest level of education completed?`, `What is your gender?`, `What is your race? (Choose all that apply.)`

### Cleaning choices to apply
- Treat `total_comp_usd = base_salary_usd + additional_comp_usd` (convert via 2021 FX rates)
- Filter to `United States` for US-specific questions; map states to 2-letter codes; drop non-states where needed
- Define plausible salary range for full-time annualized: $5,000–$2,000,000; winsorize or drop beyond
- For ranges like `5-7 years`, convert to midpoint (e.g., 6); `8 - 10 years` -> 9; `Less than 1 year` -> 0.5; `More than 30 years` -> 30
- Software Engineer tagging via keywords: `software engineer`, `swe`, `software developer`, `backend engineer`, `frontend engineer`, `full stack` (exclude clear non-dev like `support`, `sales` unless dev)
- Tech roles detection via industry `Computing or Tech` or job-title keywords
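
The `total_comp_usd` rule above could be sketched roughly as follows. The rates in `FX_TO_USD` are placeholder values chosen for illustration, not actual 2021 exchange rates, and the helper name `add_total_comp_usd` is hypothetical. The column names assume the renamed columns used later in this notebook.

```python
import numpy as np
import pandas as pd

# Placeholder conversion rates (hypothetical, NOT real 2021 FX rates).
FX_TO_USD = {"USD": 1.0, "GBP": 1.5, "CAD": 0.75}

def add_total_comp_usd(df: pd.DataFrame) -> pd.DataFrame:
    """Add total_comp_usd = (base salary + additional comp) converted to USD."""
    out = df.copy()
    # Unknown currencies map to NaN, so their totals become NaN and can be dropped later.
    rate = out["currency"].str.strip().str.upper().map(FX_TO_USD)
    base = out["annual_salary"] * rate
    extra = out["additional_compensation"].fillna(0) * rate
    out["total_comp_usd"] = base + extra
    return out

toy = pd.DataFrame({
    "annual_salary": [100_000.0, 50_000.0, 60_000.0, 70_000.0],
    "additional_compensation": [10_000.0, np.nan, 5_000.0, 0.0],
    "currency": ["USD", "GBP", "cad", "Other"],
})
result = add_total_comp_usd(toy)
print(result[["currency", "total_comp_usd"]])
```

Rows whose currency is not in the lookup end up with a missing total, which keeps unconverted amounts out of any USD statistics.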


## Step 1: Data Loading and Exploration

Start here! Load the dataset and get familiar with what you're working with.


In [110]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import re
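
A first look at the data might go along these lines. This is only a sketch: `sample_tsv` is a made-up three-row stand-in for the real file, and its column names are shortened for readability.

```python
import io
import pandas as pd

# Tiny stand-in for the survey TSV (made-up rows, shortened column names).
sample_tsv = (
    "Job title\tAnnual salary\tCurrency\n"
    "Software Engineer\t120,000\tUSD\n"
    "Librarian\t\tUSD\n"
    "Data Analyst\t85,000\tCAD\n"
)
df = pd.read_csv(io.StringIO(sample_tsv), sep="\t")

print(df.shape)         # (rows, columns)
print(df.dtypes)        # salary is read as text because of the thousands separators
print(df.isna().sum())  # missing values per column
print(df["Currency"].value_counts(dropna=False))
```

Checking `dtypes` early shows which columns need parsing before any arithmetic, and `value_counts` gives a quick view of the category labels actually present.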


## Step 2: Data Cleaning


In [111]:
# Load TSV dataset and inspect schema, types, nulls
file = "/Users/destinyrosado/ds-fall-2025-tue/Week-02-Pandas-Part-2-and-DS-Overview/data/Ask A Manager Salary Survey 2021 (Responses) - Form Responses 1.tsv"
data = pd.read_csv(file, sep='\t')
# Standardize column names and trim/normalize text
# (the mapping is named column_renames so it does not shadow the built-in dict)
column_renames = {'How old are you?': 'age',
        'What industry do you work in?': 'industry',
        'Job title' : 'job_title',
        'If your job title needs additional context, please clarify here:' : 'job_context',
        'What is your annual salary? (You\'ll indicate the currency in a later question. If you are part-time or hourly, please enter an annualized equivalent -- what you would earn if you worked the job 40 hours a week, 52 weeks a year.)': 'annual_salary',
        'How much additional monetary compensation do you get, if any (for example, bonuses or overtime in an average year)? Please only include monetary compensation here, not the value of benefits.': 'additional_compensation',
        'Please indicate the currency': 'currency',
        'What city do you work in?': 'city',
        'What country do you work in?': 'country',
        'If you\'re in the U.S., what state do you work in?': 'state',
        'How many years of professional work experience do you have overall?' : 'years_of_experience_overall', 
        'How many years of professional work experience do you have in your field?' : 'years_of_experience_in_field',
        'What is your highest level of education completed?': 'highest_education',
        'What is your gender?': 'gender',
        'What is your race? (Choose all that apply.)' : 'race'}
data = data.rename(columns=column_renames)
# Any column not renamed above keeps its original text, trimmed and lowercased
data = data.rename(columns=lambda x: x.strip().lower())
# Parse and clean salary to numeric USD; include additional comp; filter invalid
# regex=False makes '$' a literal character instead of the end-of-string anchor
data['annual_salary'] = data['annual_salary'].str.replace('$', '', regex=False).str.replace(',', '', regex=False).astype(float)
data['additional_compensation'] = data['additional_compensation'].fillna(0.0).astype(float)
data['total_compensation'] = data['annual_salary'] + data['additional_compensation']
# Treat a total of 0 as missing, then drop rows without usable compensation
data['total_compensation'] = data['total_compensation'].replace(0, np.nan)
data = data.dropna(subset=['total_compensation'])
# Normalize job titles; tag Software Engineer and tech roles
tech_keywords = ['Engineer', 'Developer', 'Programmer', 'Data', 'Analyst', 'Scientist', 'Architect', 'Technician', 'IT', 'Software', 'Hardware', 'Program', 'Systems', 'Lead']
health_keywords = ['Clinical', 'Veterinarian', 'Patient', 'Care', 'Health', 'Medicine', 'Medical']
buss_keywords = ['Manager', 'Risk', 'Accounting', 'Marketing', 'Executive', 'test']
edu_keywords = ['Learning', 'Instruction', 'Teacher', 'Student', 'Research', 'Librarian']
nonprof_keywords = ['Administration']
coms_keywords = ['Copy']
pub_keywords = ['Publishing', 'Publisher']
retail_keywords = ['Cashier', 'Cleaner', 'Front house']
is_tech = data['job_title'].str.contains('|'.join(tech_keywords), case=False, na=False)
is_health = data['job_title'].str.contains('|'.join(health_keywords), case=False, na=False)
is_buss = data['job_title'].str.contains('|'.join(buss_keywords), case=False, na=False)
is_nonprof = data['job_title'].str.contains('|'.join(nonprof_keywords), case=False, na=False)
is_coms = data['job_title'].str.contains('|'.join(coms_keywords), case=False, na=False)
is_retail = data['job_title'].str.contains('|'.join(retail_keywords), case=False, na=False)
is_edu = data['job_title'].str.contains('|'.join(edu_keywords), case=False, na=False)
# Fill job_context only where it is still empty, so earlier categories take precedence
data.loc[is_tech & data['job_context'].isna(), 'job_context'] = 'Tech'
data.loc[is_health & data['job_context'].isna(), 'job_context'] = 'Health'
data.loc[is_buss & data['job_context'].isna(), 'job_context'] = 'Business'
data.loc[is_nonprof & data['job_context'].isna(), 'job_context'] = 'NonProf'
data.loc[is_coms & data['job_context'].isna(), 'job_context'] = 'Communications'
data.loc[is_retail & data['job_context'].isna(), 'job_context'] = 'Retail'
data.loc[is_edu & data['job_context'].isna(), 'job_context'] = 'Education'
# Rows that are still uncategorized (inspect interactively)
data.loc[data['job_context'].isna(), ['industry', 'job_title']]
# Clean and map US locations to two-letter state codes
us_state_abbrev = {
    'Alabama': 'AL',
    'Alaska': 'AK',
    'Arizona': 'AZ',
    'Arkansas': 'AR',
    'California': 'CA',
    'Colorado': 'CO',
    'Connecticut': 'CT',
    'Delaware': 'DE',
    'Florida': 'FL',
    'Georgia': 'GA',
    'Hawaii': 'HI',
    'Idaho': 'ID',
    'Illinois': 'IL',
    'Indiana': 'IN',
    'Iowa': 'IA',
    'Kansas': 'KS',
    'Kentucky': 'KY',
    'Louisiana': 'LA',
    'Maine': 'ME',
    'Maryland': 'MD',
    'Massachusetts': 'MA',
    'Michigan': 'MI',
    'Minnesota': 'MN',
    'Mississippi': 'MS',
    'Missouri': 'MO',
    'Montana': 'MT',
    'Nebraska': 'NE',
    'Nevada': 'NV',
    'New Hampshire': 'NH',
    'New Jersey': 'NJ',
    'New Mexico': 'NM',
    'New York': 'NY',
    'North Carolina': 'NC',
    'North Dakota': 'ND',
    'Ohio': 'OH',
    'Oklahoma': 'OK',
    'Oregon': 'OR',
    'Pennsylvania': 'PA',
    'Rhode Island': 'RI',
    'South Carolina': 'SC',
    'South Dakota': 'SD',
    'Tennessee': 'TN',
    'Texas': 'TX',
    'Utah': 'UT',
    'Vermont': 'VT',
    'Virginia': 'VA',
    'Washington': 'WA',
    'West Virginia': 'WV',
    'Wisconsin': 'WI',
    'Wyoming': 'WY'
}
def extract_first_state(location):
    if pd.isna(location):
        return None
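    # Clean and split by common separators. Whitespace is one of them, so only
    # single-word state names and two-letter codes can match in the loop below.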
    parts = re.split(r'[,\;/&\s]+', location.strip())
    for part in parts:
        part = part.title()  # Normalize case
        if part in us_state_abbrev:
            return us_state_abbrev[part]
        elif part.upper() in us_state_abbrev.values():
            return part.upper()
    return None
data['state'] = data['state'].apply(extract_first_state)
data['state'] = data['state'].fillna('Not_in_USA').astype(str)

# Parse years of experience to numeric and handle ranges

def parse_years_of_experience(raw_value):
    if pd.isna(raw_value):
        return np.nan
    text = str(raw_value).strip().lower()
    # Special phrases
    if 'less than' in text and '1' in text:
        return 0.5
    if 'more than' in text or 'over' in text:
        # cap at 30 if any number present, else default
        nums = re.findall(r'\d+', text)
        if nums:
            return min(float(nums[0]), 30.0)
        return 30.0
    # Normalize to digits and range parsing
    value = re.sub(r'[^0-9\-–]', '', text).strip()
    if not value:
        return np.nan
    # Handle ranges like "5-7" or "8–10"
    if '-' in value or '–' in value:
        parts = re.split(r'[-–]', value)
        try:
            low, high = map(float, parts)
            return (low + high) / 2  # midpoint
        except Exception:
            return np.nan
    # Single number (e.g., "5")
    try:
        return float(value)
    except Exception:
        return np.nan

data['years_of_experience_in_field'] = data['years_of_experience_in_field'].apply(parse_years_of_experience)
data['years_of_experience_overall'] = data['years_of_experience_overall'].apply(parse_years_of_experience)
# Identify and handle outliers/duplicates; define valid salary range
data['annual_salary'].isna().sum()

np.int64(0)
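
`extract_first_state` above splits on whitespace, so a multi-word value such as `New York` is broken into `New` and `York` and never matches the lookup, while `West Virginia` resolves to `VA`. One possible variant that splits only on list separators is sketched below. The name `extract_first_state_multiword` and the shortened `STATE_ABBREV` mapping are illustrative assumptions, and switching to it would change the state counts behind the results reported later.

```python
import re
import pandas as pd

# Shortened mapping for the sketch; the full us_state_abbrev dictionary would be used in practice.
STATE_ABBREV = {
    "California": "CA", "New York": "NY", "Virginia": "VA",
    "West Virginia": "WV", "North Carolina": "NC",
}
VALID_CODES = set(STATE_ABBREV.values())

def extract_first_state_multiword(location):
    """Return the code of the first recognizable state, or None."""
    if pd.isna(location):
        return None
    # Split on list separators only, so 'New York' stays one token.
    for part in re.split(r"[,;/&]+", str(location)):
        part = " ".join(part.split())  # trim and collapse repeated whitespace
        if part.title() in STATE_ABBREV:
            return STATE_ABBREV[part.title()]
        if part.upper() in VALID_CODES:
            return part.upper()
    return None

for raw in ["New York", "West Virginia", "California, New York", "ny", "Ontario"]:
    print(repr(raw), "->", extract_first_state_multiword(raw))
```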

## Step 3: Business Questions Analysis

Now answer those important business questions!


In [112]:
# Question 1: What is the median salary for Software Engineers in the United States?

# Define Software Engineer keywords
se_keywords = [
    'software engineer', 'swe', 'software developer', 'developer', 'programmer',
    'backend engineer', 'front end engineer', 'frontend engineer', 'full stack',
    'full-stack', 'fullstack', 'ios engineer', 'android engineer', 'mobile engineer'
]

# Helper: detect software engineer via job title
job_title_series = data['job_title'].astype(str).str.lower()
is_software_engineer = job_title_series.str.contains('|'.join([re.escape(k) for k in se_keywords]), regex=True, na=False)

# Filter: United States, valid state code, valid comp, USD currency
us_mask = (data['country'].astype(str).str.strip().str.lower() == 'united states')
state_mask = data['state'].astype(str).str.len() == 2
comp_mask = data['total_compensation'].between(5_000, 2_000_000)
usd_mask = data.get('currency', pd.Series(index=data.index)).astype(str).str.strip().str.upper().eq('USD')

se_us = data[us_mask & state_mask & comp_mask & usd_mask & is_software_engineer]
q1_median_se = float(se_us['total_compensation'].median()) if not se_us.empty else float('nan')
print({'median_software_engineer_usd': round(q1_median_se, 2), 'n': int(se_us.shape[0])})

{'median_software_engineer_usd': 133000.0, 'n': 447}
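
The country, state, compensation, and currency masks built for this question are rebuilt in each of the following cells. A small helper could hold them in one place. The sketch below mirrors the filters above on a toy frame; the name `base_filter` and its parameters are hypothetical.

```python
import pandas as pd

def base_filter(df, lo=5_000, hi=2_000_000, require_state=True):
    """Boolean mask: US respondents paid in USD with plausible total compensation."""
    mask = df["country"].astype(str).str.strip().str.lower().eq("united states")
    mask &= df["currency"].astype(str).str.strip().str.upper().eq("USD")
    mask &= df["total_compensation"].between(lo, hi)
    if require_state:
        # Two-character values are treated as valid state codes, as in the cells above.
        mask &= df["state"].astype(str).str.len().eq(2)
    return mask

toy = pd.DataFrame({
    "country": ["United States", "united states ", "Canada", "United States"],
    "currency": ["USD", "usd", "CAD", "USD"],
    "total_compensation": [120_000, 3_000, 90_000, 95_000],
    "state": ["CA", "NY", "Not_in_USA", "Not_in_USA"],
})
print(toy[base_filter(toy)])
print(toy[base_filter(toy, require_state=False)])
```

Keeping the thresholds as parameters makes it easier to check how sensitive each answer is to the chosen salary range.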


In [113]:
# Question 2: Which US state has the highest average salary for tech workers?

# Define tech role detection (industry OR title keywords)
tech_title_keywords = [
    'engineer', 'developer', 'programmer', 'scientist', 'data', 'software',
    'it', 'sre', 'devops', 'sysadmin', 'architect', 'ml', 'ai', 'security'
]
ind_series = data['industry'].astype(str).str.strip().str.lower()
job_series = data['job_title'].astype(str).str.lower()

is_tech = (ind_series == 'computing or tech') | job_series.str.contains('|'.join([re.escape(k) for k in tech_title_keywords]), na=False)

us_mask = (data['country'].astype(str).str.strip().str.lower() == 'united states')
state_mask = data['state'].astype(str).str.len() == 2
comp_mask = data['total_compensation'].between(5_000, 2_000_000)
usd_mask = data.get('currency', pd.Series(index=data.index)).astype(str).str.strip().str.upper().eq('USD')

tech_us = data[us_mask & state_mask & comp_mask & usd_mask & is_tech]
by_state = tech_us.groupby('state').agg(avg_comp_usd=('total_compensation', 'mean'), n=('total_compensation','size')).reset_index()
# require a minimum sample size to avoid noise
by_state = by_state[by_state['n'] >= 30]

if not by_state.empty:
    top_row = by_state.sort_values('avg_comp_usd', ascending=False).iloc[0]
    q2_state = str(top_row['state'])
    q2_value = float(top_row['avg_comp_usd'])
    q2_n = int(top_row['n'])
    print({'state_with_highest_avg': q2_state, 'avg_comp_usd': round(q2_value, 2), 'n': q2_n})
else:
    print({'state_with_highest_avg': None, 'avg_comp_usd': None, 'n': 0})

{'state_with_highest_avg': 'CA', 'avg_comp_usd': 180413.08, 'n': 479}
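
Short keywords such as `it`, `ai`, and `ml` are matched as plain substrings above, so they also hit titles like "editor" or "retail associate". A possible refinement, sketched here on made-up titles with a shortened keyword list, is to anchor each keyword at word boundaries. The trade-off is that plurals such as "engineers" would then need their own entries.

```python
import re
import pandas as pd

tech_title_keywords = ["engineer", "developer", "data", "it", "ai", "ml"]

# Made-up titles for illustration only.
titles = pd.Series(["Editor", "IT Manager", "Retail Associate", "ML Engineer", "HTML Writer"])
lower = titles.str.lower()

substring_pattern = "|".join(re.escape(k) for k in tech_title_keywords)
# \b anchors each keyword at word boundaries, so 'it' no longer matches inside 'editor'.
word_pattern = r"\b(?:" + substring_pattern + r")\b"

substring_hits = lower.str.contains(substring_pattern, regex=True)
word_hits = lower.str.contains(word_pattern, regex=True)

print(pd.DataFrame({"title": titles, "substring": substring_hits, "whole_word": word_hits}))
```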


In [114]:
# Question 3: How much does salary increase on average for each year of experience in tech?

# Use same tech/us filters
us_mask = (data['country'].astype(str).str.strip().str.lower() == 'united states')
state_mask = data['state'].astype(str).str.len() == 2
comp_mask = data['total_compensation'].between(5_000, 2_000_000)
usd_mask = data.get('currency', pd.Series(index=data.index)).astype(str).str.strip().str.upper().eq('USD')

tech_title_keywords = [
    'engineer', 'developer', 'programmer', 'scientist', 'data', 'software',
    'it', 'sre', 'devops', 'sysadmin', 'architect', 'ml', 'ai', 'security'
]
ind_series = data['industry'].astype(str).str.strip().str.lower()
job_series = data['job_title'].astype(str).str.lower()

is_tech = (ind_series == 'computing or tech') | job_series.str.contains('|'.join([re.escape(k) for k in tech_title_keywords]), na=False)

exp_col = 'years_of_experience_in_field' if 'years_of_experience_in_field' in data.columns else 'years_of_experience_overall'
tech_us = data[us_mask & state_mask & comp_mask & usd_mask & is_tech].dropna(subset=[exp_col])

if tech_us.shape[0] >= 100:
    x = tech_us[exp_col].astype(float).values
    y = tech_us['total_compensation'].astype(float).values
    slope, intercept = np.polyfit(x, y, 1)
    print({'avg_increase_per_year_usd': round(float(slope), 2), 'n': int(tech_us.shape[0])})
else:
    print({'avg_increase_per_year_usd': None, 'n': int(tech_us.shape[0])})

{'avg_increase_per_year_usd': 2944.23, 'n': 2809}
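
To read the slope with more confidence, it can help to run the same fit on synthetic data where the answer is known and to look at how much variance the line explains. The numbers below are synthetic, chosen so the true slope is exactly 3,000; on the real data the R² value is the part worth checking.

```python
import numpy as np

# Synthetic, exactly linear data: salary = 60,000 + 3,000 * years.
years = np.array([0.5, 3.0, 6.0, 9.0, 15.5, 25.5])
salary = 60_000 + 3_000 * years

# np.polyfit returns coefficients from the highest degree down: slope first, then intercept.
slope, intercept = np.polyfit(years, salary, 1)

# Coefficient of determination: share of salary variance explained by the line.
pred = slope * years + intercept
ss_res = np.sum((salary - pred) ** 2)
ss_tot = np.sum((salary - salary.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print({"slope": round(float(slope), 2), "intercept": round(float(intercept), 2), "r2": round(float(r2), 4)})
```

Because the experience values in this notebook are bucket midpoints rather than exact years, the fitted slope is best read as an approximate average change between buckets.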


In [115]:

# Question 4: What percentage of respondents work remotely vs. in-office?

# We'll infer work mode from free-text fields using keywords in city/state/context
context_col = "if your income needs additional context, please provide it here:"

city_series = data.get('city', pd.Series(index=data.index, dtype=str)).astype(str)
state_series = data.get('state', pd.Series(index=data.index, dtype=str)).astype(str)
context_series = data.get(context_col, pd.Series(index=data.index, dtype=str)).astype(str)

joined_text = (city_series.fillna('') + ' ' + state_series.fillna('') + ' ' + context_series.fillna('')).str.lower()

remote_kw = [
    'remote', 'work from home', 'wfh', 'telecommut', 'distributed', 'home office', 'fully remote'
]
hybrid_kw = [
    'hybrid', 'part-remote', 'part remote', 'some remote', '2 days remote', '3 days remote'
]

is_remote = joined_text.str.contains('|'.join([re.escape(k) for k in remote_kw]), na=False)
is_hybrid = joined_text.str.contains('|'.join([re.escape(k) for k in hybrid_kw]), na=False)

mode = pd.Series('in_office', index=data.index)
mode.loc[is_hybrid] = 'hybrid'
mode.loc[is_remote] = 'remote'

n = int(mode.shape[0])
counts = mode.value_counts(dropna=False).reindex(['remote', 'hybrid', 'in_office'], fill_value=0)
perc = (counts / n * 100).round(2)

# For the headline, treat hybrid as remote-ish per most orgs' classification; also show all three
headline_remote = float((counts['remote'] + counts['hybrid']) / n * 100)
headline_office = float(counts['in_office'] / n * 100)

print({
    'pct_remote_incl_hybrid': round(headline_remote, 2),
    'pct_in_office': round(headline_office, 2),
    'breakdown': {
        'remote': int(counts['remote']),
        'hybrid': int(counts['hybrid']),
        'in_office': int(counts['in_office'])
    },
    'n': n
})


{'pct_remote_incl_hybrid': 1.4, 'pct_in_office': 98.6, 'breakdown': {'remote': 393, 'hybrid': 0, 'in_office': 27654}, 'n': 28047}


In [116]:
# Question 5: Which industry (besides tech) has the highest median salary?

us_mask = (data['country'].astype(str).str.strip().str.lower() == 'united states')
comp_mask = data['total_compensation'].between(5_000, 2_000_000)
ind_series = data['industry'].astype(str).str.strip()
usd_mask = data.get('currency', pd.Series(index=data.index)).astype(str).str.strip().str.upper().eq('USD')

non_tech = ~ind_series.str.lower().eq('computing or tech')

us_nontech = data[us_mask & comp_mask & usd_mask & non_tech]
by_ind = us_nontech.groupby('industry').agg(median_comp_usd=('total_compensation','median'), n=('total_compensation','size')).reset_index()
by_ind = by_ind[by_ind['n'] >= 50]

if not by_ind.empty:
    top_ind = by_ind.sort_values('median_comp_usd', ascending=False).iloc[0]
    print({'industry': str(top_ind['industry']), 'median_comp_usd': round(float(top_ind['median_comp_usd']), 2), 'n': int(top_ind['n'])})
else:
    print({'industry': None, 'median_comp_usd': None, 'n': 0})

{'industry': 'Law', 'median_comp_usd': 102000.0, 'n': 429}


In [119]:
# Bonus Questions:

# Question 6: What's the salary gap between men and women in tech roles?
try:
    from scipy import stats  # optional
    have_scipy = True
except Exception:
    have_scipy = False

# Reuse tech filters
us_mask = (data['country'].astype(str).str.strip().str.lower() == 'united states')
state_mask = data['state'].astype(str).str.len() == 2
comp_mask = data['total_compensation'].between(5_000, 2_000_000)
usd_mask = data.get('currency', pd.Series(index=data.index)).astype(str).str.strip().str.upper().eq('USD')

tech_title_keywords = [
    'engineer', 'developer', 'programmer', 'scientist', 'data', 'software',
    'it', 'sre', 'devops', 'sysadmin', 'architect', 'ml', 'ai', 'security'
]
ind_series = data['industry'].astype(str).str.strip().str.lower()
job_series = data['job_title'].astype(str).str.lower()

is_tech = (ind_series == 'computing or tech') | job_series.str.contains('|'.join([re.escape(k) for k in tech_title_keywords]), na=False)
tech_us = data[us_mask & state_mask & comp_mask & usd_mask & is_tech]

# Normalize gender categories (focus on "Man" and "Woman")
gender_series = tech_us['gender'].astype(str).str.strip().str.title()
men = tech_us[gender_series.eq('Man')]['total_compensation'].dropna()
women = tech_us[gender_series.eq('Woman')]['total_compensation'].dropna()

men_mean = float(men.mean()) if men.size else float('nan')
women_mean = float(women.mean()) if women.size else float('nan')
gap_abs = men_mean - women_mean if np.isfinite(men_mean) and np.isfinite(women_mean) else float('nan')
gap_pct = (gap_abs / women_mean * 100.0) if np.isfinite(gap_abs) and women_mean else float('nan')

p_value = None
if have_scipy and men.size >= 20 and women.size >= 20:
    _, p_value = stats.ttest_ind(men.values, women.values, equal_var=False, nan_policy='omit')

print({'gender_pay_gap': {
    'men_mean_usd': round(men_mean, 2),
    'women_mean_usd': round(women_mean, 2),
    'gap_abs_usd': round(gap_abs, 2) if np.isfinite(gap_abs) else None,
    'gap_pct_vs_women': round(gap_pct, 2) if np.isfinite(gap_pct) else None,
    'n_men': int(men.size), 'n_women': int(women.size), 'p_value_welch': (float(p_value) if p_value is not None else None)
}})

# Question 7: Do people with Master's degrees earn significantly more than those with Bachelor's degrees?
edu = data.copy()
edu_mask = us_mask & comp_mask & usd_mask
bachelors_mask = edu['highest_education'].astype(str).str.contains("bachelor", case=False, na=False)
masters_mask = edu['highest_education'].astype(str).str.contains("master", case=False, na=False)

bachelors = edu[edu_mask & bachelors_mask]['total_compensation'].dropna()
masters = edu[edu_mask & masters_mask]['total_compensation'].dropna()

b_mean = float(bachelors.mean()) if bachelors.size else float('nan')
m_mean = float(masters.mean()) if masters.size else float('nan')
diff_abs = m_mean - b_mean if np.isfinite(b_mean) and np.isfinite(m_mean) else float('nan')
diff_pct = (diff_abs / b_mean * 100.0) if np.isfinite(diff_abs) and b_mean else float('nan')

p_value_mb = None
if have_scipy and bachelors.size >= 20 and masters.size >= 20:
    _, p_value_mb = stats.ttest_ind(masters.values, bachelors.values, equal_var=False, nan_policy='omit')

print({'masters_vs_bachelors': {
    'bachelors_mean_usd': round(b_mean, 2),
    'masters_mean_usd': round(m_mean, 2),
    'diff_abs_usd': round(diff_abs, 2) if np.isfinite(diff_abs) else None,
    'diff_pct_vs_bachelors': round(diff_pct, 2) if np.isfinite(diff_pct) else None,
    'n_bachelors': int(bachelors.size), 'n_masters': int(masters.size), 'p_value_welch': (float(p_value_mb) if p_value_mb is not None else None)
}})

# Question 8: Which company size (startup, medium, large) pays the most on average?
# This dataset doesn't include company size; we'll report not available.
print({'company_size_pay': 'not_available_in_dataset'})

{'gender_pay_gap': {'men_mean_usd': 158870.54, 'women_mean_usd': 108103.99, 'gap_abs_usd': 50766.55, 'gap_pct_vs_women': 46.96, 'n_men': 958, 'n_women': 1739, 'p_value_welch': None}}
{'masters_vs_bachelors': {'bachelors_mean_usd': nan, 'masters_mean_usd': 98119.01, 'diff_abs_usd': None, 'diff_pct_vs_bachelors': None, 'n_bachelors': 0, 'n_masters': 3161, 'p_value_welch': None}}
{'company_size_pay': 'not_available_in_dataset'}
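
The `n_bachelors: 0` result indicates that the substring `bachelor` matched no rows, so the comparison above has no baseline. Listing the labels with `value_counts()` shows what to match instead. The sketch below uses toy rows and assumes the undergraduate option is worded `College degree`; that wording is an assumption to verify against the real column.

```python
import pandas as pd

# Toy rows; 'College degree' as the undergraduate label is an assumption to verify.
edu = pd.DataFrame({
    "highest_education": ["College degree", "Master's degree", "College degree",
                          "Some college", "PhD", "Master's degree"],
    "total_compensation": [70_000, 95_000, 80_000, 50_000, 120_000, 105_000],
})

# Step 1: look at the labels that actually occur.
print(edu["highest_education"].value_counts(dropna=False))

# Step 2: match on the observed labels. An exact match keeps 'Some college' out.
labels = edu["highest_education"].str.strip().str.lower()
bachelors = edu.loc[labels.eq("college degree"), "total_compensation"]
masters = edu.loc[labels.str.contains("master", na=False), "total_compensation"]

print({"n_bachelors": len(bachelors), "n_masters": len(masters),
       "median_bachelors": float(bachelors.median()), "median_masters": float(masters.median())})
```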


## Final Summary

**Summarize your findings here:**

1. **Median salary for Software Engineers in US:** $133,000 (n = 447)
2. **Highest paying US state for tech:** California (average of about $180,413, n = 479)
3. **Salary increase per year of experience:** about $2,944 per year
4. **Remote vs office percentage:** 1.4% remote (including hybrid), 98.6% in-office, inferred from free-text keywords
5. **Highest paying non-tech industry:** Law (median $102,000)

**Key insights:**
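- Only about 1.4% of responses mention remote work. This figure comes from free-text keywords across all respondents, so it likely undercounts remote workers and says little about tech jobs specifically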
- I was surprised to learn that California pays more than Washington or New York
- This assignment was quick

**Challenges faced:**
- Data cleaning was tedious, but Cursor's chat made it easy to get help breaking the work down.
- Keeping up with the generated code: it was produced so fast that I needed extra time to catch up.

**What you learned about vibe coding:**
- Fast development
- Prompt engineering
- Data cleaning
