# GCP Setup Instructions

Before running this notebook, you need to set up Google Cloud Storage (GCS) access. Follow these steps:

1. **Create a GCP Project**:
   - Go to the [Google Cloud Console](https://console.cloud.google.com/).
   - Create a new project or select an existing one.

2. **Enable Cloud Storage API**:
   - In the Cloud Console, go to "APIs & Services" > "Library".
   - Search for "Cloud Storage" and enable the "Cloud Storage JSON API".

3. **Create a GCS Bucket**:
   - In Cloud Console, go to "Cloud Storage" > "Buckets".
   - Click "Create Bucket".
   - Name it (e.g., "housing-regression-data") and choose a region.
   - Update the `bucket_name` variable in the code below with your bucket name.

4. **Authentication (Secure Alternative to Service Account Keys)**:
   - **If your organization blocks Service Account key creation**, use Application Default Credentials (ADC) instead:
     - Install the [Google Cloud CLI](https://cloud.google.com/sdk/docs/install).
     - Run: `gcloud auth application-default login` (this opens a browser login and stores local credentials that the client libraries pick up automatically).
     - Set your project: `gcloud config set project YOUR_PROJECT_ID`.
   - **For production/deployment**, use Workload Identity (attach a Service Account to your compute resource, so no key file is needed).
   - If you must use a Service Account key, ask your Organization Policy Administrator to lift the `iam.disableServiceAccountKeyCreation` constraint for your project.

5. **Install Google Cloud Library** (if not already installed):
   - Run: `pip install google-cloud-storage`
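
Before running the upload cell, it can help to confirm that steps 4 and 5 took effect. The following is a minimal sketch, not part of the original notebook: the function name `check_gcs_prerequisites` and the shape of the returned dictionary are illustrative assumptions. It relies on `google.auth.default()`, which returns a `(credentials, project_id)` pair and raises `DefaultCredentialsError` when no ADC can be found.

```python
def check_gcs_prerequisites() -> dict:
    """Report whether the GCS client library and ADC are available.

    Hypothetical helper for illustration; it performs no uploads and
    does not touch any bucket.
    """
    status = {"library_installed": False, "adc_found": False, "project": None}

    # Step 5: are google-auth and google-cloud-storage importable?
    try:
        import google.auth
        from google.auth.exceptions import DefaultCredentialsError
        from google.cloud import storage  # noqa: F401  (import check only)
    except ImportError:
        return status
    status["library_installed"] = True

    # Step 4: can Application Default Credentials be resolved?
    try:
        _credentials, project = google.auth.default()
    except DefaultCredentialsError:
        return status
    status["adc_found"] = True
    status["project"] = project  # may be None if no default project is set
    return status


print(check_gcs_prerequisites())
```

If `library_installed` is `False`, rerun the `pip install` command from step 5. If `adc_found` is `False`, rerun `gcloud auth application-default login` from step 4.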

In [1]:
from google.cloud import storage
from pathlib import Path

# ---- Config ----
bucket_name = "housing-price-ml-e2e-xcluo"   # replace with your GCS bucket name

# Project root is the parent of the current working directory
# (assumes the notebook is run from the notebooks folder)
PROJECT_ROOT = Path.cwd().parent
local_data_dir = PROJECT_ROOT / "data" / "processed"
local_model_dir = PROJECT_ROOT / "models"

client = storage.Client()
bucket = client.bucket(bucket_name)

# ---- Helper function ----
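def upload_file(local_path: Path, gcs_key: str) -> None:
    """Upload one local file to gs://{bucket_name}/{gcs_key}; skip it if the file is missing."""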
    if not local_path.exists():
        print(f"File not found: {local_path}")
        return
    print(f"⬆️ Uploading {local_path} → gs://{bucket_name}/{gcs_key}")
    blob = bucket.blob(gcs_key)
    blob.upload_from_filename(str(local_path))

# ---- Upload required datasets ----
upload_file(local_data_dir / "feature_engineered_holdout.csv", "processed/feature_engineered_holdout.csv")
upload_file(local_data_dir / "cleaning_holdout.csv", "processed/cleaning_holdout.csv")
upload_file(local_data_dir / "feature_engineered_train.csv", "processed/feature_engineered_train.csv")

# ---- Upload model ----
upload_file(local_model_dir / "xgb_best_model.pkl", "models/xgb_best_model.pkl")

⬆️ Uploading /Users/champagnepuppy/de-projects/Housing_Price_ML_E2E/data/processed/feature_engineered_holdout.csv → gs://housing-price-ml-e2e-xcluo/processed/feature_engineered_holdout.csv
⬆️ Uploading /Users/champagnepuppy/de-projects/Housing_Price_ML_E2E/data/processed/cleaning_holdout.csv → gs://housing-price-ml-e2e-xcluo/processed/cleaning_holdout.csv
⬆️ Uploading /Users/champagnepuppy/de-projects/Housing_Price_ML_E2E/data/processed/feature_engineered_train.csv → gs://housing-price-ml-e2e-xcluo/processed/feature_engineered_train.csv
⬆️ Uploading /Users/champagnepuppy/de-projects/Housing_Price_ML_E2E/models/xgb_best_model.pkl → gs://housing-price-ml-e2e-xcluo/models/xgb_best_model.pkl
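
The log lines above show that each upload was started, but they do not confirm that the objects are now in the bucket. One way to check, sketched here under assumptions, is to ask GCS whether each expected key exists. The helper name `find_missing_objects` and the `EXPECTED_KEYS` list are illustrative and not part of the original notebook. The only library call used is `Blob.exists()` from `google-cloud-storage`.

```python
from typing import Iterable, List


def find_missing_objects(bucket, gcs_keys: Iterable[str]) -> List[str]:
    """Return the keys from gcs_keys that do not exist in the bucket.

    `bucket` is expected to behave like google.cloud.storage.Bucket:
    bucket.blob(key) returns an object with an exists() method.
    """
    return [key for key in gcs_keys if not bucket.blob(key).exists()]


# The four object keys uploaded by the cell above.
EXPECTED_KEYS = [
    "processed/feature_engineered_holdout.csv",
    "processed/cleaning_holdout.csv",
    "processed/feature_engineered_train.csv",
    "models/xgb_best_model.pkl",
]
```

In the notebook, calling `find_missing_objects(bucket, EXPECTED_KEYS)` with the `bucket` object from the upload cell returns an empty list when every object is present. Each `exists()` call makes one request to GCS, so this is best suited to a short list of keys.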
