# Silver: expeditions_exped
Cleans and transforms `himalaya.bronze.expeditions_exped` into `himalaya.silver.expeditions_exped`

## Transformations
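- Drop columns not relevant to analysis
- Consolidate `route1-4`, `success1-4`, and `mdeaths` + `hdeaths` into single columns
- Cast date columns, rename abbreviated columns, and reorder the result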

## Table Overview

In [0]:
import pyspark.sql.functions as F
from datetime import date

In [0]:
df = spark.table("himalaya.bronze.expeditions_exped")

In [0]:
df.printSchema()

In [0]:
print(df.columns)

# Simplify Data

  > ## Drop irrelevant columns

In [0]:
drop_cols = [
    "leaders", "sponsor", "claimed", "disputed", "countries", "approach",
    "smttime", "smtdays", "totdays", "traverse", "ski", "parapente",
    "rope", "nohired", "o2none", "o2climb", "o2descent", "o2sleep",
    "o2medical", "o2taken", "o2unkwn", "othersmts", "campsites",
    "comrte", "stdrte", "primrte", "primmem", "primref", "primid", "chksum",
    "ascent1", "ascent2", "ascent3", "ascent4"
]
df = df.drop(*drop_cols)

In [0]:
print(df.columns)
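
`DataFrame.drop` ignores names that are not in the schema, so a misspelled entry in `drop_cols` would pass without an error. A minimal sketch of a pre-check follows. The helper `missing_columns` and the sample column lists are hypothetical; in this notebook it would be called with `drop_cols` and the bronze table's `df.columns` before the drop.

```python
def missing_columns(requested, available):
    """Return the requested names that are absent from `available`, in order."""
    available_set = set(available)
    return [name for name in requested if name not in available_set]


# Illustrative values only; not the real bronze schema.
sample_columns = ["expid", "peakid", "leaders", "sponsor"]
sample_drop = ["leaders", "sponser"]  # deliberate typo

print(missing_columns(sample_drop, sample_columns))  # ['sponser']
```

An empty result means every name in the list matched a real column.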

  > ## Consolidate routes

In [0]:
# Consolidate routes
route_cols = ["route1", "route2", "route3", "route4"]
# concat_ws skips NULL inputs, so the result is "" when every route is NULL
routes_joined = F.concat_ws(", ", *[F.col(c) for c in route_cols])
df = df.withColumn("routes",
    # Fall back to the literal string "None" when no route text remains
    F.when(routes_joined.isin("", " "), "None").otherwise(routes_joined)
).drop(*route_cols)

In [0]:
display(df.select("routes").distinct())
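
To make the edge cases of the `routes` expression easier to reason about, here is a pure-Python model of it. The function `consolidate_routes` is a hypothetical stand-in written for illustration, not part of the pipeline; it mirrors how `concat_ws` skips NULL inputs but keeps empty strings.

```python
def consolidate_routes(routes, sep=", "):
    """Pure-Python model of the `routes` expression.

    Mirrors concat_ws: None entries are skipped, empty strings are kept.
    """
    joined = sep.join(r for r in routes if r is not None)
    return "None" if joined in ("", " ") else joined


print(consolidate_routes(["NE Ridge", None, "S Col", None]))  # NE Ridge, S Col
print(consolidate_routes([None, None, None, None]))           # None
print(repr(consolidate_routes(["", "", "", ""])))             # ', , , '
```

The last case suggests that if bronze stores missing routes as empty strings rather than NULL, the `"None"` fallback would not trigger. Whether such rows exist is not known from this notebook; the `display` of distinct `routes` values above is the place to look for them.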

  > ## Consolidate successes

In [0]:
# Consolidate successes
df = df.withColumn("successes",
    F.col("success1") | F.col("success2") | F.col("success3") | F.col("success4")
).drop("success1", "success2", "success3", "success4")

In [0]:
display(df.select("successes").limit(5))
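
One thing to watch with the `|` chain: Spark evaluates OR with three-valued logic, so a row with no `True` and at least one NULL yields NULL rather than `False`. The sketch below models that rule in plain Python. Both `sql_or` and `sql_or_null_as_false` are hypothetical helpers for illustration, and it is an assumption that the bronze `success1-4` columns can contain NULLs at all.

```python
def sql_or(*values):
    """Three-valued OR: True if any input is True, else None if any is None, else False."""
    if any(v is True for v in values):
        return True
    if any(v is None for v in values):
        return None
    return False


def sql_or_null_as_false(*values):
    """Same OR, but each None input is first replaced by False."""
    return sql_or(*(False if v is None else v for v in values))


print(sql_or(False, None, False, False))                # None
print(sql_or(True, None, False, False))                 # True
print(sql_or_null_as_false(False, None, False, False))  # False
```

If NULL inputs should count as "no success", the PySpark counterpart of the second helper would be to wrap each column as `F.coalesce(F.col("success1"), F.lit(False))` before combining them.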

  > ## Consolidate deaths

In [0]:
# Consolidate deaths: member deaths plus hired-staff deaths
# Note: the sum is NULL if either input is NULL
df = df.withColumn("deaths",
    F.col("mdeaths") + F.col("hdeaths")
).drop("mdeaths", "hdeaths")

In [0]:
display(df.select("deaths").limit(5))

  > ## Cast Date

In [0]:
df = df.withColumn("bcdate", F.to_date(F.col("bcdate"))) \
       .withColumn("smtdate", F.to_date(F.col("smtdate"))) \
       .withColumn("termdate", F.to_date(F.col("termdate")))

  > ## Rename Columns

In [0]:
df = df.withColumnsRenamed({
    "bcdate": "base_camp_date",
    "smtdate": "summit_date",
    "termdate": "termination_date",
    "totmembers": "total_members",
    "smtmembers": "summit_members",
    "tothired": "total_hired",
    "smthired": "summit_hired",
    "o2used": "oxygen_used",
    "achievment": "achievement"
})

  > ## Reorder Data

In [0]:
df = df.select(
    "expid", "peakid", "year", "season", "nation", "host",
    "routes", "successes", "termreason", "termnote", "highpoint",
    "deaths", "total_members", "summit_members", "total_hired", "summit_hired",
    "base_camp_date", "summit_date", "termination_date",
    "camps", "oxygen_used", "accidents", "achievement", "agency", "ingested_at"
)

In [0]:
display(df.limit(5))

> ## Silver Transfer

In [0]:
df.write.format("delta").mode("overwrite").saveAsTable("himalaya.silver.expeditions_exped")
print("✅ Written to himalaya.silver.expeditions_exped")

In [0]:
display(spark.table("himalaya.silver.expeditions_exped").limit(5))

# Transformations Applied

| Column | Transformation |
|---|---|
| `route1-4` | Consolidated into `routes`, originals dropped |
| `success1-4` | Consolidated into `successes` (True if any success), originals dropped |
| `ascent1-4` | Dropped — historical ascent numbers, not relevant to analysis |
| `mdeaths` + `hdeaths` | Consolidated into `deaths`, originals dropped |
| `bcdate` | Cast from String to DateType, renamed to `base_camp_date` |
| `smtdate` | Cast from String to DateType, renamed to `summit_date` |
| `termdate` | Cast from String to DateType, renamed to `termination_date` |
| `totmembers` | Renamed to `total_members` |
| `smtmembers` | Renamed to `summit_members` |
| `tothired` | Renamed to `total_hired` |
| `smthired` | Renamed to `summit_hired` |
| `o2used` | Renamed to `oxygen_used` |
| `achievment` | Renamed to `achievement` (typo fixed) |
| 30 other columns | Dropped as not relevant to analysis (sponsors, O2 details, route flags, etc.) |
| Column order | Reordered logically by category |