# Extract Core Positive Samples (Porphyry Copper in Australia)

Based on the dataset description and the analysis objectives in [01_data_exploration_porphyry_datasheet](./01_data_exploration_porphyry_datasheet.ipynb), we retain the following fields to describe our positive (known-deposit) samples.

- DEPOSIT: The deposit name, which serves as the identifier for visualization and traceability.
- LATITUDE, LONGITUDE: Coordinates required to locate each deposit and to integrate it with GeoTIFF or shapefile data. These fields are indispensable for any geospatial analysis.
- CMMI_DEPOSIT_ENVIRONMENT: The deposit-environment classification (for example, "Magmatic hydrothermal"). This is an optional categorical feature that can be one-hot encoded as a model input.
- ASSIGNED_AGE_MA: The age assigned to the deposit, in millions of years (Ma).
- ORE_TONNAGE_MT: Ore tonnage in million tonnes (Mt), indicating the size of the ore body. It may correlate with anomaly intensity and can serve as an explanatory feature.
- CU_PERCENT, MO_PERCENT, AU_GT, AG_GT: Reported grades of copper (Cu) and molybdenum (Mo) in percent, and of gold (Au) and silver (Ag) in grams per tonne (g/t). These fields are critical for labeling positive samples or serving as explanatory variables in modeling.
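The one-hot encoding suggested for CMMI_DEPOSIT_ENVIRONMENT can be sketched with pandas' `get_dummies`. The frame below is a toy example: apart from "Magmatic hydrothermal", which appears in the printed output of this notebook, the deposit names and the "Other" label are hypothetical, as is the `ENV` prefix.

```python
import pandas as pd

# Toy frame for illustration only; "A", "B", "C" and the "Other" label are hypothetical.
toy = pd.DataFrame({
    "DEPOSIT": ["A", "B", "C"],
    "CMMI_DEPOSIT_ENVIRONMENT": ["Magmatic hydrothermal", "Magmatic hydrothermal", "Other"],
})

# Replace the categorical column with one 0/1 indicator column per distinct label.
encoded = pd.get_dummies(
    toy, columns=["CMMI_DEPOSIT_ENVIRONMENT"], prefix="ENV", dtype=int
)
print(encoded)
```

Passing `columns=` keeps DEPOSIT untouched and encodes only the environment field, so the identifier stays available for traceability.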


```python
import pandas as pd

# Read the USGS porphyry copper datasheet (the file is encoded as ISO-8859-1 / Latin-1)
df = pd.read_csv(
    '../../data/raw/Dataset/USGS/Porphyry_Copper_Deposit/Porphyry_datasheet.csv',
    encoding='ISO-8859-1',
)

# Keep only the porphyry copper deposits located in Australia
australia_porphyry_copper_data = df[df['COUNTRY'] == 'Australia']

# Define the list of fields to retain
selected_fields = [
    'DEPOSIT',
    'LATITUDE', 'LONGITUDE',
    'CMMI_DEPOSIT_ENVIRONMENT',
    'ASSIGNED_AGE_MA',
    'ORE_TONNAGE_MT',
    'CU_PERCENT', 'MO_PERCENT', 'AU_GT', 'AG_GT',
]

# Subset to the selected fields; .copy() gives an independent frame for later edits
australia_positive_data = australia_porphyry_copper_data[selected_fields].copy()

# Display the first few rows of the filtered data
print(australia_positive_data.head())

# Save the filtered data to the processed data directory
australia_positive_data.to_csv('../data/processed/positive_core_clean.csv', index=False)
```


```text
            DEPOSIT  LATITUDE  LONGITUDE CMMI_DEPOSIT_ENVIRONMENT
47        Allendale -32.54985  148.17342    Magmatic hydrothermal  \
65     Anabama Hill -32.71890  140.20550    Magmatic hydrothermal
171            Bank -20.13699  146.75008    Magmatic hydrothermal
179  Barrabas Creek -20.10957  146.78779    Magmatic hydrothermal
203  Beaks Mountain -19.98549  147.62138    Magmatic hydrothermal

    ASSIGNED_AGE_MA ORE_TONNAGE_MT CU_PERCENT MO_PERCENT AU_GT AG_GT
47            436.5
65              502              4        0.6
171             395
179             395
203             265
```
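In the preview, ASSIGNED_AGE_MA shows values such as `502` rather than `502.0`, and the tonnage and grade columns show blanks rather than `NaN`. That suggests these columns were read as text. Assuming the blanks are empty or whitespace-only strings, a minimal sketch for converting them to numbers before modeling uses `pd.to_numeric` with `errors="coerce"`. The two rows below are copied from the preview; the exact contents of the blank cells are an assumption.

```python
import pandas as pd

# Rows copied from the preview above. The blank cells are ASSUMED to be
# empty or whitespace-only strings, so every column loads with object dtype.
sample = pd.DataFrame({
    "DEPOSIT": ["Allendale", "Anabama Hill"],
    "ASSIGNED_AGE_MA": ["436.5", "502"],
    "ORE_TONNAGE_MT": [" ", "4"],
    "CU_PERCENT": ["", "0.6"],
})

numeric_cols = ["ASSIGNED_AGE_MA", "ORE_TONNAGE_MT", "CU_PERCENT"]

# errors="coerce" turns any value that cannot be parsed as a number (including blanks) into NaN.
sample[numeric_cols] = sample[numeric_cols].apply(pd.to_numeric, errors="coerce")

print(sample.dtypes)
print(sample[numeric_cols].isna().sum())  # number of missing values per column
```

On the notebook's own data, the same `apply` call could be run on `australia_positive_data` with all six numeric fields (ASSIGNED_AGE_MA, ORE_TONNAGE_MT and the four grade columns). The resulting `NaN` counts show how sparse the grade information is before deciding whether to use it for labeling or as model features.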
