In [3]:
import pandas as pd
import geopandas as gpd
import rasterio
from shapely.geometry import Point
from pyproj import Transformer

transformer = Transformer.from_crs("EPSG:4326", "EPSG:3577", always_xy=True)

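def extract_geotiff_features(df, lat_col, lon_col, tif_dict):
    """
    Sample each GeoTIFF in tif_dict at the lon/lat of every row in df.

    Input coordinates are lon/lat (EPSG:4326); they are reprojected to
    EPSG:3577, which the rasters are assumed to use. Returns a copy of df
    with one new column per raster (None where no valid value exists).
    """
    gdf = gpd.GeoDataFrame(
        df.copy(),
        geometry=[Point(xy) for xy in zip(df[lon_col], df[lat_col])],
        crs="EPSG:4326"  # geometry is built from lon/lat, not projected coordinates
    )

    for feature_name, tif_path in tif_dict.items():
        print(f"Extracting: {feature_name} from {tif_path}")
        with rasterio.open(tif_path) as src:
            band = src.read(1)
            nodata = src.nodata

            def get_value(row):
                try:
                    # 1. Convert lon/lat to EPSG:3577
                    x_3577, y_3577 = transformer.transform(row[lon_col], row[lat_col])

                    # 2. Use transformed coordinates to find pixel index
                    row_idx, col_idx = src.index(x_3577, y_3577)

                    # 3. Reject points outside the raster; negative indices
                    #    would otherwise wrap around silently in NumPy
                    if not (0 <= row_idx < band.shape[0] and 0 <= col_idx < band.shape[1]):
                        return None

                    # 4. Read value from band and mask NoData
                    val = band[row_idx, col_idx]
                    return None if (val == nodata or val < -99900) else val
                except Exception:
                    # Any lookup failure (e.g. invalid coordinates) yields a missing value
                    return None

            gdf[feature_name] = gdf.apply(get_value, axis=1)

    return gdf.drop(columns=["geometry"])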



## 1. Gravity Data Extraction

We extracted key gravity features from three high-resolution GeoTIFF grids provided by Geoscience Australia, covering different aspects of the Earth's crustal density variations. These datasets are georeferenced, gridded products representing gravity anomalies across Australia and were sampled at each training point’s latitude and longitude.

Each GeoTIFF contains one raster band (float32) with a known NoData value, which we masked during processing.

| Feature Name          | Source File                          | Description |
|-----------------------|---------------------------------------|-------------|
| `gravity_iso_residual` | `Gravmap2016-grid-grv_ir.tif`         | Isostatic residual anomaly — reflects shallow density anomalies after isostatic correction. Highlights features like intrusions or potential ore bodies. |
| `gravity_cscba`        | `Gravmap2019-grid-grv_cscba.tif`      | Complete Spherical Cap Bouguer Anomaly (CSCBA) — represents full Bouguer anomaly adjusted with a spherical cap correction, providing insight into regional deep crustal structures. |
| `gravity_cscba_1vd`    | `Gravmap2019-grid-grv_cscba_1vd.tif`  | First vertical derivative of CSCBA — emphasizes structural boundaries such as faults and intrusive contacts. Useful for enhancing linear geological features. |



**Processing Notes**

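- All values were extracted with `rasterio`: each sample's longitude and latitude are reprojected from EPSG:4326 to EPSG:3577, and the value of the pixel containing that point is read.

- Values equal to the raster's NoData value, or below -99900 (which catches the -99999.0 sentinel), were treated as missing.

- Besides the three source grids listed above, the extraction cell also samples two derived rasters, `gravity_cscba_stddev3x3` and `gravity_iso_residual_stddev3x3`.

- The final gravity features were appended to the training dataset as new columns.

- No normalization or transformation was applied at this stage, so the features stay in their raw units for later preprocessing (e.g., standardization).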
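
The lookup described in these notes can be illustrated without any raster file. The following is a minimal sketch, assuming a north-up grid with no rotation; `sample_grid`, its parameters, and the toy 3x3 array are hypothetical stand-ins for what `src.index` and the NoData check do in the extraction function, not the notebook's actual code.

```python
import math
import numpy as np

def sample_grid(band, x_origin, y_origin, pixel_w, pixel_h, x, y,
                nodata=-99999.0, floor_value=-99900.0):
    """Return the pixel value under (x, y), or None if outside the grid or NoData.

    Assumes a north-up raster: (x_origin, y_origin) is the top-left corner,
    and pixel_w, pixel_h are positive pixel sizes in CRS units.
    """
    col = math.floor((x - x_origin) / pixel_w)
    row = math.floor((y_origin - y) / pixel_h)
    # Reject out-of-bounds points explicitly: negative indices would
    # otherwise wrap around to the opposite edge of the NumPy array.
    if not (0 <= row < band.shape[0] and 0 <= col < band.shape[1]):
        return None
    val = float(band[row, col])
    if val == nodata or val < floor_value:
        return None
    return val

# Toy 3x3 grid with 100 m pixels and its top-left corner at (1000, 2000)
band = np.array([[1.0, 2.0, 3.0],
                 [4.0, -99999.0, 6.0],
                 [7.0, 8.0, 9.0]], dtype="float32")

inside = sample_grid(band, 1000.0, 2000.0, 100.0, 100.0, x=1250.0, y=1950.0)
masked = sample_grid(band, 1000.0, 2000.0, 100.0, 100.0, x=1150.0, y=1850.0)
outside = sample_grid(band, 1000.0, 2000.0, 100.0, 100.0, x=900.0, y=1950.0)
```

Here `inside` falls in the top-right pixel, `masked` lands on the NoData cell, and `outside` lies west of the grid, so the last two come back as `None`.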

In [4]:
df = pd.read_csv("../../data/processed/final_samples_unbalanced_pos_vs_blank.csv")

tif_paths = {
    "gravity_iso_residual": "../../data/aligned/gravity_iso_residual_aligned.tif",
    "gravity_cscba": "../../data/aligned/gravity_cscba_aligned.tif",
    "gravity_cscba_1vd": "../../data/aligned/gravity_cscba_1vd_aligned.tif",
    "gravity_cscba_stddev3x3": "../../data/derived/gravity_cscba_stddev3x3.tif",
    "gravity_iso_residual_stddev3x3": "../../data/derived/gravity_iso_residual_stddev3x3.tif"
}

df_with_gravity = extract_geotiff_features(
    df=df,
    lat_col="LATITUDE",
    lon_col="LONGITUDE",
    tif_dict=tif_paths
)

Extracting: gravity_iso_residual from ../../data/aligned/gravity_iso_residual_aligned.tif
Extracting: gravity_cscba from ../../data/aligned/gravity_cscba_aligned.tif
Extracting: gravity_cscba_1vd from ../../data/aligned/gravity_cscba_1vd_aligned.tif
Extracting: gravity_cscba_stddev3x3 from ../../data/derived/gravity_cscba_stddev3x3.tif
Extracting: gravity_iso_residual_stddev3x3 from ../../data/derived/gravity_iso_residual_stddev3x3.tif


In [5]:
df_with_gravity.head()

Unnamed: 0,source_deposit,LABEL,LONGITUDE,LATITUDE,PROVINCEID,gravity_iso_residual,gravity_cscba,gravity_cscba_1vd,gravity_cscba_stddev3x3,gravity_iso_residual_stddev3x3
0,,0,145.277695,-34.107134,GA.GeologicProvince.559207,52.223591,349.392853,-40.590061,37.011612,0.752113
1,,0,125.714108,-23.943929,GA.GeologicProvince.468269,-195.990967,-534.08844,-556.972656,47.704494,2.474874
2,Waitara,1,148.821727,-21.846854,,376.498871,-531.526184,198.019547,46.138039,0.918559
3,,0,134.851273,-19.145144,GA.GeologicProvince.529556,10.045643,-709.562378,-201.869217,60.546471,1.362985
4,,0,142.587968,-28.984738,GA.GeologicProvince.529302,-34.105846,114.804726,-124.661057,87.284927,2.21875


## 2. Magnetic Features Extraction

We extracted magnetic anomaly features from the AWAGS_MAG_2019 dataset provided by Geoscience Australia. These GeoTIFF grids represent upward-continued Total Magnetic Intensity (TMI) data that have been reduced to the pole (RTP) and smoothed at varying vertical levels. Each upward continuation depth range highlights magnetic responses at a different crustal level, which helps identify both shallow and deep-seated geological structures related to porphyry copper systems.

Each raster file contains:

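- A single float32 band of TMI values (approximately in nanotesla, nT)

- A geographic coordinate system (GDA94) in the source grids; the `*_aligned.tif` copies sampled below are assumed to be in EPSG:3577, the CRS the extraction function reprojects to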

- A high spatial resolution grid (~0.00085° per pixel)

- A consistent NoData value of -99999.0 (which we masked out during processing)

The following magnetic features were extracted by sampling each raster at the sample point coordinates (LATITUDE, LONGITUDE) and appended to the dataset:

| Feature Name      | Upward Continuation Depth | Interpretation                          |
|-------------------|----------------------------|------------------------------------------|
| `mag_uc_1_2km`    | 1–2 km                     | Emphasizes shallow magnetic sources      |
| `mag_uc_2_4km`    | 2–4 km                     | Intermediate-scale structures            |
| `mag_uc_4_8km`    | 4–8 km                     | Balanced view of deeper features         |
| `mag_uc_8_12km`   | 8–12 km                    | Highlights deep crustal trends           |
| `mag_uc_12_16km`  | 12–16 km                   | Emphasizes regional tectonic patterns    |


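These features provide multi-scale magnetic context, allowing the machine learning model to learn from both near-surface and deeper magnetic patterns. The extraction cell below also samples three rasters derived from the 2–4 km grid (`mag_uc_2_4km_1vd`, `mag_uc_2_4km_thd`, `mag_uc_2_4km_stddev3x3`), which are not described in the table above.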

We recommend:

- Applying Z-score standardization to each magnetic feature;

- Masking or imputing NoData values prior to modeling.
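
As an illustration of the first recommendation, here is a minimal sketch of Z-score standardization that leaves missing values in place. The helper `zscore_columns` and the small `demo` frame are hypothetical; in a real pipeline the mean and standard deviation would normally be computed on the training split only.

```python
import numpy as np
import pandas as pd

def zscore_columns(df, columns):
    """Return a copy of df with the given columns scaled to zero mean, unit variance.

    Missing values (NaN) are ignored when computing the statistics and stay NaN.
    """
    out = df.copy()
    for col in columns:
        values = out[col].astype(float)
        mean = values.mean()        # pandas skips NaN by default
        std = values.std(ddof=0)    # population standard deviation
        # A constant column has std == 0; centre it instead of dividing by zero
        out[col] = (values - mean) / std if std > 0 else values - mean
    return out

# Hypothetical values, not taken from the dataset
demo = pd.DataFrame({"mag_uc_1_2km": [1.0, 2.0, 3.0, np.nan]})
demo_z = zscore_columns(demo, ["mag_uc_1_2km"])
```

The input frame is not modified, so the raw values remain available for other transformations.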


In [6]:
tif_paths_mag = {
    "mag_uc_1_2km": "../../data/aligned/mag_uc_1_2km_aligned.tif",
    "mag_uc_2_4km": "../../data/aligned/mag_uc_2_4km_aligned.tif",
    "mag_uc_4_8km": "../../data/aligned/mag_uc_4_8km_aligned.tif",
    "mag_uc_8_12km": "../../data/aligned/mag_uc_8_12km_aligned.tif",
    "mag_uc_12_16km": "../../data/aligned/mag_uc_12_16km_aligned.tif",
    "mag_uc_2_4km_1vd": "../../data/derived/mag_uc_2_4km_1vd.tif",
    "mag_uc_2_4km_thd": "../../data/derived/mag_uc_2_4km_thd.tif",
    "mag_uc_2_4km_stddev3x3": "../../data/derived/mag_uc_2_4km_stddev3x3.tif",
}

In [7]:
df_with_magnetics = extract_geotiff_features(
    df=df_with_gravity,
    lat_col="LATITUDE",
    lon_col="LONGITUDE",
    tif_dict=tif_paths_mag
)

Extracting: mag_uc_1_2km from ../../data/aligned/mag_uc_1_2km_aligned.tif
Extracting: mag_uc_2_4km from ../../data/aligned/mag_uc_2_4km_aligned.tif
Extracting: mag_uc_4_8km from ../../data/aligned/mag_uc_4_8km_aligned.tif
Extracting: mag_uc_8_12km from ../../data/aligned/mag_uc_8_12km_aligned.tif
Extracting: mag_uc_12_16km from ../../data/aligned/mag_uc_12_16km_aligned.tif
Extracting: mag_uc_2_4km_1vd from ../../data/derived/mag_uc_2_4km_1vd.tif
Extracting: mag_uc_2_4km_thd from ../../data/derived/mag_uc_2_4km_thd.tif
Extracting: mag_uc_2_4km_stddev3x3 from ../../data/derived/mag_uc_2_4km_stddev3x3.tif


In [8]:
df_with_magnetics.head()

Unnamed: 0,source_deposit,LABEL,LONGITUDE,LATITUDE,PROVINCEID,gravity_iso_residual,gravity_cscba,gravity_cscba_1vd,gravity_cscba_stddev3x3,gravity_iso_residual_stddev3x3,mag_uc_1_2km,mag_uc_2_4km,mag_uc_4_8km,mag_uc_8_12km,mag_uc_12_16km,mag_uc_2_4km_1vd,mag_uc_2_4km_thd,mag_uc_2_4km_stddev3x3
0,,0,145.277695,-34.107134,GA.GeologicProvince.559207,52.223591,349.392853,-40.590061,37.011612,0.752113,0.584576,0.99297,1.516337,1.271041,1.204403,-0.00078,0.001489,0.038044
1,,0,125.714108,-23.943929,GA.GeologicProvince.468269,-195.990967,-534.08844,-556.972656,47.704494,2.474874,3.931087,5.403521,5.343763,1.713466,-0.07108,-0.013591,0.021547,0.549094
2,Waitara,1,148.821727,-21.846854,,376.498871,-531.526184,198.019547,46.138039,0.918559,-15.808146,0.000513,9.551896,1.417245,-2.160016,-0.091868,0.096375,2.462595
3,,0,134.851273,-19.145144,GA.GeologicProvince.529556,10.045643,-709.562378,-201.869217,60.546471,1.362985,-12.336065,-17.813541,-20.743647,-12.951694,-10.113371,0.02599,0.028927,0.738085
4,,0,142.587968,-28.984738,GA.GeologicProvince.529302,-34.105846,114.804726,-124.661057,87.284927,2.21875,-2.577452,-4.013026,-5.520884,-3.453854,-2.26458,-0.006811,0.014678,0.374207


## 3. Radiometric Features Extraction

We extracted radiometric features from filtered and ratio-based GeoTIFF datasets published by Geoscience Australia. These grids represent the concentrations of naturally occurring radioactive elements (Potassium, Thorium, and Uranium), and their ratios, which are useful indicators of:

- Rock types and weathering profiles,

- Hydrothermal alteration zones,

- Potential ore-related geochemical anomalies.

Each GeoTIFF raster is georeferenced and contains a single float32 band, with a NoData value of -99999.0. All values were sampled at each training point's geographic coordinates and added as new columns.

| Feature Name         | Source File                                             | Description |
|----------------------|----------------------------------------------------------|-------------|
| `radio_K_pct`        | `Radmap2019-grid-k_conc-Filtered-AWAGS_RAD_2019.tif`     | Filtered potassium concentration (%). Often elevated in felsic rocks or potassic alteration zones. |
| `radio_Th_ppm`       | `Radmap2019-grid-th_conc-Filtered-AWAGS_RAD_2019.tif`    | Filtered thorium concentration (ppm). Typically enriched in residual soils and weathered felsic rocks. |
| `radio_U_ppm`        | `Radmap2019-grid-u_conc-Filtered-AWAGS_RAD_2019.tif`     | Filtered uranium concentration (ppm). Sensitive to groundwater leaching, may indicate hydrothermal systems. |
| `radio_Th_K_ratio`   | `Radmap2019-grid-thk_ratio-AWAGS_RAD_2019.tif`           | Thorium to potassium ratio. Useful for identifying weathered versus fresh bedrock. |
| `radio_U_K_ratio`    | `Radmap2019-grid-uk_ratio-AWAGS_RAD_2019.tif`            | Uranium to potassium ratio. May highlight uranium-enriched zones relative to host lithology. |
| `radio_U_Th_ratio`   | `Radmap2019-grid-uth_ratio-AWAGS_RAD_2019.tif`           | Uranium to thorium ratio. Often elevated in alteration zones or areas of uranium mobility. |

**Processing Notes**
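- Point values are read from the pixel that contains each sample location; the extraction function itself does not interpolate. Bilinear or nearest-neighbor resampling, where used, would have been applied when the `*_aligned.tif` rasters were prepared, depending on resolution alignment.

- NoData values (-99999.0) were returned as missing and left for model preprocessing to handle.

- Features are kept in their raw units so that later steps can log-transform or standardize them as needed.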
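
To make the difference between the two resampling modes concrete, the sketch below compares nearest-neighbor and bilinear sampling on a toy 2x2 grid. Both helper functions are hypothetical and assume fractional pixel indices that lie inside the grid, with pixel centres at integer positions; they are not part of the notebook's pipeline.

```python
import math
import numpy as np

def nearest_sample(grid, r, c):
    """Value of the pixel whose centre is closest to fractional index (r, c)."""
    return float(grid[math.floor(r + 0.5), math.floor(c + 0.5)])

def bilinear_sample(grid, r, c):
    """Bilinear interpolation at fractional index (r, c) between the four neighbours."""
    r0, c0 = math.floor(r), math.floor(c)
    # Clamp the second neighbour so points on the last row/column stay in bounds
    r1 = min(r0 + 1, grid.shape[0] - 1)
    c1 = min(c0 + 1, grid.shape[1] - 1)
    fr, fc = r - r0, c - c0
    top = (1 - fc) * grid[r0, c0] + fc * grid[r0, c1]
    bottom = (1 - fc) * grid[r1, c0] + fc * grid[r1, c1]
    return float((1 - fr) * top + fr * bottom)

grid = np.array([[0.0, 10.0],
                 [20.0, 30.0]])

nearest_val = nearest_sample(grid, 0.25, 0.75)    # snaps to pixel (0, 1)
bilinear_val = bilinear_sample(grid, 0.25, 0.75)  # weighted mean of all four pixels
```

Nearest-neighbor preserves the original grid values, which matters for ratio grids and NoData sentinels, whereas bilinear smooths between pixels and can blend a sentinel into valid data unless it is masked first.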

In [9]:
tif_paths_radio = {
    "radio_K_pct": "../../data/aligned/radio_k_pct_aligned.tif",
    "radio_Th_ppm": "../../data/aligned/radio_th_ppm_aligned.tif",
    "radio_U_ppm": "../../data/aligned/radio_u_ppm_aligned.tif",
    "radio_U_K_ratio": "../../data/aligned/radio_u_k_ratio_aligned.tif",
    "radio_U_Th_ratio": "../../data/aligned/radio_u_th_ratio_aligned.tif",
    "radio_Th_K_ratio": "../../data/aligned/radio_th_k_ratio_aligned.tif"
}

In [10]:
df_with_radiometric = extract_geotiff_features(
    df=df_with_magnetics,
    lat_col="LATITUDE",
    lon_col="LONGITUDE",
    tif_dict=tif_paths_radio
)

Extracting: radio_K_pct from ../../data/aligned/radio_k_pct_aligned.tif
Extracting: radio_Th_ppm from ../../data/aligned/radio_th_ppm_aligned.tif
Extracting: radio_U_ppm from ../../data/aligned/radio_u_ppm_aligned.tif
Extracting: radio_U_K_ratio from ../../data/aligned/radio_u_k_ratio_aligned.tif
Extracting: radio_U_Th_ratio from ../../data/aligned/radio_u_th_ratio_aligned.tif
Extracting: radio_Th_K_ratio from ../../data/aligned/radio_th_k_ratio_aligned.tif


In [12]:
df_with_radiometric.head()

Unnamed: 0,source_deposit,LABEL,LONGITUDE,LATITUDE,PROVINCEID,gravity_iso_residual,gravity_cscba,gravity_cscba_1vd,gravity_cscba_stddev3x3,gravity_iso_residual_stddev3x3,...,mag_uc_12_16km,mag_uc_2_4km_1vd,mag_uc_2_4km_thd,mag_uc_2_4km_stddev3x3,radio_K_pct,radio_Th_ppm,radio_U_ppm,radio_U_K_ratio,radio_U_Th_ratio,radio_Th_K_ratio
0,,0,145.277695,-34.107134,GA.GeologicProvince.559207,52.223591,349.392853,-40.590061,37.011612,0.752113,...,1.204403,-0.00078,0.001489,0.038044,1.104598,10.030174,1.367533,1.238717,0.136452,9.082856
1,,0,125.714108,-23.943929,GA.GeologicProvince.468269,-195.990967,-534.08844,-556.972656,47.704494,2.474874,...,-0.07108,-0.013591,0.021547,0.549094,0.158148,15.495625,1.302552,6.512759,0.084177,77.478119
2,Waitara,1,148.821727,-21.846854,,376.498871,-531.526184,198.019547,46.138039,0.918559,...,-2.160016,-0.091868,0.096375,2.462595,0.327474,3.212502,0.507891,1.592497,0.159617,9.933004
3,,0,134.851273,-19.145144,GA.GeologicProvince.529556,10.045643,-709.562378,-201.869217,60.546471,1.362985,...,-10.113371,0.02599,0.028927,0.738085,0.281605,7.375026,0.806524,2.877724,0.111031,26.118765
4,,0,142.587968,-28.984738,GA.GeologicProvince.529302,-34.105846,114.804726,-124.661057,87.284927,2.21875,...,-2.26458,-0.006811,0.014678,0.374207,0.442369,3.28747,0.641557,1.465061,0.198236,7.432631


## 4. Model Input Samples

We constructed the final dataset for model training by combining spatial coordinates, the binary label, and a suite of geophysical features derived from national-scale raster datasets provided by Geoscience Australia. The `source_deposit` and `PROVINCEID` columns are not carried into the saved file. The input features were selected based on their geological significance and relevance to porphyry copper mineralization.

| Feature Group        | Fields Included                                                                                  |
|----------------------|--------------------------------------------------------------------------------------------------|
| **Spatial Coordinates** | `LONGITUDE`, `LATITUDE` — used for mapping, visualization, or spatial regularization              |
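| **Gravity Features**     | `gravity_iso_residual`, `gravity_cscba`, `gravity_cscba_1vd`, `gravity_iso_residual_stddev3x3`, `gravity_cscba_stddev3x3` |
| **Magnetic Features**    | `mag_uc_1_2km`, `mag_uc_2_4km`, `mag_uc_4_8km`, `mag_uc_8_12km`, `mag_uc_12_16km`, `mag_uc_2_4km_1vd`, `mag_uc_2_4km_thd`, `mag_uc_2_4km_stddev3x3` |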
| **Radiometric Features** | `radio_K_pct`, `radio_Th_ppm`, `radio_U_ppm`, `radio_Th_K_ratio`, `radio_U_K_ratio`, `radio_U_Th_ratio` |
| **Label (Target)**       | `LABEL` — binary class indicating presence or absence of porphyry copper mineralization           |


**Notes**
- All geophysical features were extracted from GeoTIFF raster grids by reading the pixel at each sample's reprojected coordinates.

- NoData values (-99999.0) were replaced with missing values during extraction.

- No transformations were applied at this stage; downstream models may include normalization, imputation, or feature engineering.

- The resulting dataset was saved as `model_input_dataset.csv`.
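
Because masked NoData values end up as missing entries, it can help to check how many there are before training. The sketch below is one possible approach, assuming the feature columns are numeric; `missing_report`, `impute_median`, and the `demo` frame are hypothetical, and median imputation is only one of several reasonable choices.

```python
import numpy as np
import pandas as pd

def missing_report(df, feature_cols):
    """Fraction of missing values per feature column, largest first."""
    return df[feature_cols].isna().mean().sort_values(ascending=False)

def impute_median(df, feature_cols):
    """Return a copy of df with NaN in each feature replaced by that column's median."""
    out = df.copy()
    medians = out[feature_cols].median()   # NaN skipped by default
    out[feature_cols] = out[feature_cols].fillna(medians)
    return out

# Hypothetical values, not taken from the dataset
demo = pd.DataFrame({
    "gravity_cscba": [1.0, np.nan, 3.0, 5.0],
    "radio_K_pct": [0.2, 0.4, 0.6, 0.8],
    "LABEL": [0, 1, 0, 0],
})
cols = ["gravity_cscba", "radio_K_pct"]
report = missing_report(demo, cols)
filled = impute_median(demo, cols)
```

As with standardization, the medians would normally be estimated on the training split and reused for validation and test data.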




In [15]:
features = [
    "LONGITUDE", "LATITUDE",  "LABEL",
    "gravity_iso_residual", "gravity_cscba", "gravity_cscba_1vd", "gravity_iso_residual_stddev3x3", "gravity_cscba_stddev3x3",
    "mag_uc_1_2km", "mag_uc_2_4km", "mag_uc_4_8km", "mag_uc_8_12km", "mag_uc_12_16km", "mag_uc_2_4km_1vd", "mag_uc_2_4km_thd", "mag_uc_2_4km_stddev3x3",
    "radio_K_pct", "radio_Th_ppm", "radio_U_ppm", "radio_Th_K_ratio", "radio_U_K_ratio", "radio_U_Th_ratio"
]

df_model = df_with_radiometric[features]
df_model.to_csv("../../data/processed/model_input_dataset.csv", index=False)
