<a href="https://colab.research.google.com/github/wbnselvi/EDA-Assay/blob/main/EDA-Assay.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Exploratory Data Analysis (EDA) of Nickel Laterite Assay Data

After data cleaning, we run Exploratory Data Analysis (EDA) on the assay data.
EDA shows the tendencies in the data and lets us check whether the geological layer interpretation made by the geologist at the Evaluation stage agrees with the general model of a laterite profile.
Examples of things to check:
* Fe domaining: the limonite layer must not contain more than one domain, which is indicated by a positively skewed histogram.
* MgO clusters high in bedrock and low in limonite, with few outliers. (The MgO, Fe, and Ni categories are usually set by the Principal Geologist for a given area; for this data, MgO > 25 counts as high and characterizes bedrock.)
* Fe clusters highest in limonite and lowest in bedrock; use a scatter plot of Fe6 vs MgO6 for limonite.
* If outliers exceed 5% of the data, return the assay data to the Evaluation Geologist for reinterpretation and a cross-check against core photos, or for re-assay.
* The Ni distribution in limonite should be approximately normal.
* Use a scatter plot of Fe vs Cr in saprolite to see their positive correlation.
* And so on, as needed.
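
The 5% outlier rule and the skewness check in the list above could be sketched as follows. The source does not say how an outlier is defined, so the 1.5 × IQR fence used here is an assumption, as are the helper name `outlier_fraction` and the toy values. Only the 5% threshold comes from the text.

```python
import pandas as pd

def outlier_fraction(values: pd.Series, k: float = 1.5) -> float:
    """Fraction of non-null values outside the k*IQR fences (assumed outlier rule)."""
    values = values.dropna()
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    outside = (values < q1 - k * iqr) | (values > q3 + k * iqr)
    return float(outside.mean())

# Toy stand-in for the assay table (hypothetical values, not real data)
toy = pd.DataFrame({
    "Major": ["LIM"] * 10 + ["BRK"] * 10,
    "MgO": [1.0, 1.2, 0.8, 1.1, 0.9, 1.3, 1.0, 1.1, 0.9, 30.0,
            40.0, 41.0, 42.0, 43.0, 41.5, 42.5, 40.5, 43.5, 41.0, 42.0],
})

# Per-layer outlier fraction and sample skewness of MgO
summary = toy.groupby("Major")["MgO"].agg(
    outlier_frac=outlier_fraction,
    skewness="skew",
)
# 5% threshold from the text: layers above it go back to the Evaluation Geologist
summary["send_back"] = summary["outlier_frac"] > 0.05
print(summary)
```

In the toy data the single MgO value of 30 in limonite pushes that layer above the 5% threshold, while the bedrock layer has no value outside its fences.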
 
## Selvi Yuminti
Version 1.0
August 2022

### Import Modules

In [49]:
import pandas as pd # module used to process and analyze the data

In [55]:
import matplotlib.pyplot as plt # data visualization
import seaborn as sns # data visualization

# render plot output inline in the notebook
%matplotlib inline

#### Load dataset

In [50]:
from google.colab import drive  # only needed to mount Google Drive; not used below
assay_df = pd.read_excel('/content/sample_data/assay.xlsx')  # load the Excel file as a DataFrame
assay_df.head()  # show the first 5 rows


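| | Sub pit | DH ID | Deposit | DPO ID | DepthFrom | DepthTo | LENGTH | Sample Rec_Pct | Major | Minor | ... | Lithology | Ore | Ore_limit | Remark | Batas_Interp | XA | YA | ZA | Depth (Log) | Contractor |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | BN | BN0513 | BN | BN0064 | 0.0 | 1.0 | 1.0 | 100.0 | LIM | | ... | | | | | | 393500.0 | 81600.0 | | 13.0 | BPMS |
| 1 | BN | BN0513 | BN | BN0064 | 1.0 | 2.0 | 1.3 | 100.0 | LIM | | ... | | | | | | 393500.0 | 81600.0 | | 13.0 | BPMS |
| 2 | BN | BN0513 | BN | BN0064 | 2.0 | 3.0 | 0.7 | 100.0 | FSAP | | ... | | | | | | 393500.0 | 81600.0 | | 13.0 | BPMS |
| 3 | BN | BN0513 | BN | BN0064 | 3.0 | 4.0 | 1.0 | 90.0 | RSAP | CST | ... | | | | | | 393500.0 | 81600.0 | | 13.0 | BPMS |
| 4 | BN | BN0513 | BN | BN0064 | 4.0 | 5.0 | 1.0 | 80.0 | RSAP | CST | ... | | | | | | 393500.0 | 81600.0 | | 13.0 | BPMS |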


#### Drop unneeded columns

In [51]:
# Columns not needed for the EDA
cols_to_drop = [
    "Sub pit", "Deposit", "Minor", "Minor_Pct", "Rig ID", "Geologist",
    "StartDrilling", "FinishDrilling", "Dry_dens", "K2O", "Na2O", "TiO2",
    "Mg_inform", "Ni_Co", "Lithology", "Ore", "Ore_limit", "Remark",
    "Batas_Interp", "Depth (Log)", "XA", "YA", "ZA", "Contractor",
]
assay_df.drop(cols_to_drop, axis=1, inplace=True)

#### Identify the shape of the dataset

In [76]:
assay_df.shape # shape of the dataset (rows, columns)

(4807, 27)

#### Get the list of columns

In [53]:
assay_df.columns # list of column names

Index(['DH ID', 'DPO ID', 'DepthFrom', 'DepthTo', 'LENGTH', 'Sample Rec_Pct',
       'Major', 'Rock %', 'Sample_id', 'QC', 'Ni', 'Co', 'Al2O3', 'CaO',
       'Cr2O3', 'Fe2O3', 'TFe', 'MgO', 'BCMgO', 'MnO', 'P2O5', 'SiO2',
       'SiO2/MgO', 'LOI', 'MC', 'MC_S', 'Alt_index'],
      dtype='object')

#### Describe the dataset

In [54]:
assay_df.describe() # summary statistics of the numeric columns

| | DepthFrom | DepthTo | LENGTH | Sample Rec_Pct | Rock % | Ni | Co | Al2O3 | CaO | Cr2O3 | ... | MgO | BCMgO | MnO | P2O5 | SiO2 | SiO2/MgO | LOI | MC | MC_S | Alt_index |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 4725.0 | 4725.0 | 4740.0 | 1581.0 | 1971.0 | 4800.0 | 4800.0 | 4800.0 | 4800.0 | 4800.0 | ... | 4800.0 | 4797.0 | 4800.0 | 4800.0 | 4800.0 | 4797.0 | 4800.0 | 4800.0 | 4800.0 | 4800.0 |
| mean | 12.49367 | 13.489807 | 1.055479 | 95.207691 | 71.497463 | 0.807644 | 0.047059 | 3.347232 | 0.307105 | 1.368966 | ... | 21.383248 | 22.259425 | 0.366468 | 0.014522 | 38.924622 | 6.563669 | 11.564348 | 26.133825 | 0.261338 | 2.134417 |
| std | 9.35294 | 9.356663 | 2.307985 | 11.851609 | 32.646619 | 0.544707 | 0.075558 | 4.951628 | 1.071067 | 1.506964 | ... | 14.06289 | 61.582144 | 0.575353 | 0.044688 | 13.523412 | 12.938905 | 2.185957 | 13.07782 | 0.130778 | 1.87482 |
| min | 0.0 | 0.5 | 0.0 | 15.0 | 2.0 | 0.0 | 0.00089 | 0.10065 | 0.0085 | 0.0 | ... | 0.2475 | 0.2475 | 0.0102 | 0.00068 | 2.701177 | 0.00975 | 2.17 | 0.097971 | 0.00098 | 0.003191 |
| 25% | 5.0 | 6.0 | 1.0 | 100.0 | 40.0 | 0.34 | 0.0098 | 0.4356 | 0.0196 | 0.43425 | ... | 4.57 | 4.5881 | 0.1078 | 0.003201 | 37.08975 | 1.204215 | 10.660533 | 15.090089 | 0.150901 | 0.117074 |
| 50% | 11.0 | 12.0 | 1.0 | 100.0 | 85.0 | 0.699 | 0.0102 | 0.825552 | 0.049 | 0.69 | ... | 26.4627 | 26.51025 | 0.1594 | 0.004737 | 40.9733 | 1.704796 | 11.9 | 26.946994 | 0.26947 | 2.004579 |
| 75% | 18.0 | 19.0 | 1.0 | 100.0 | 100.0 | 1.13 | 0.0588 | 4.218234 | 0.13 | 1.83365 | ... | 34.182837 | 34.1847 | 0.3395 | 0.008703 | 45.6288 | 5.032548 | 12.918485 | 36.866085 | 0.368661 | 3.727116 |
| max | 56.0 | 57.0 | 85.0 | 100.0 | 354.0 | 3.308 | 1.2257 | 30.596001 | 10.7814 | 13.46 | ... | 43.9782 | 4174.0 | 11.4742 | 0.702474 | 81.4176 | 179.8125 | 39.601386 | 63.289185 | 0.632892 | 9.347089 |


#### Scatter Plot

In [59]:
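# Split the data by logged layer ('Major') and by QC type
df1 = assay_df[assay_df['Major'] == 'LIM']    # limonite
df2 = assay_df[assay_df['Major'] == 'RSAP']   # rocky saprolite
df3 = assay_df[assay_df['Major'] == 'FSAP']   # fine saprolite
df4 = assay_df[assay_df['Major'] == 'BRK']    # bedrock
df5 = assay_df[assay_df['QC'] == 'BLANK']     # QC blank samples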


In [None]:
df1.describe()


#### Scatter Plots for Limonite
##### Normal ranges of Ni and the other assay variables for Limonite, Rocky Saprolite, Fine Saprolite, Bedrock, and Blank
|Layer     |      AI    |    Ni       |   Al2O3   |   Fe2O3   |   MgO   |   SiO2   |
|----------|------------|-------------|-----------|-----------|---------|----------|
|LIM       | 0.01 - 1   | 0.3 - 1.6   | 2 - 17    | 50 - 75   | 0.5 - 5 | 2 - 20   |
|RSAP      | 1.5 - 4.5  | 1 - 2.5     | 0.9 - 3   | 8 - 19    | 30 - 40 | 35 - 75  |
|FSAP      | 0.01 - 1.4 | 0.5 - 2.5   | 1 - 3     | 20 - 49   | 6 - 29  | 21 - 50  |
|BRK       | 2 - 6      | 0.1 - 1     | 0.2 - 0.9 | 3 - 12    | 41 - 45 | 36 - 82  |
|BLANK     | 0.1 - 0.3  | 0.01 - 0.15 | 12 - 18   | 8 - 13    | 3 - 7   | 44 - 68  |
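
One way to apply the table is to encode each range and flag samples that fall outside the range for their logged layer. This is only a minimal sketch: `NORMAL_RANGES`, `out_of_range`, and the toy rows are hypothetical names and values, and only two layers and three columns are encoded for brevity.

```python
import pandas as pd

# Ranges copied from the table above (LIM and BRK only, three columns only)
NORMAL_RANGES = {
    "LIM": {"Ni": (0.3, 1.6), "Fe2O3": (50, 75), "MgO": (0.5, 5)},
    "BRK": {"Ni": (0.1, 1.0), "Fe2O3": (3, 12), "MgO": (41, 45)},
}

def out_of_range(df: pd.DataFrame, layer_col: str = "Major") -> pd.DataFrame:
    """Return a boolean frame: True where a value is outside its layer's range.

    Missing values are also flagged, because Series.between returns False for NaN.
    """
    flags = pd.DataFrame(False, index=df.index, columns=["Ni", "Fe2O3", "MgO"])
    for layer, ranges in NORMAL_RANGES.items():
        in_layer = df[layer_col] == layer
        for col, (low, high) in ranges.items():
            flags.loc[in_layer, col] = ~df.loc[in_layer, col].between(low, high)
    return flags

# Toy rows (hypothetical values, not real assay data)
toy = pd.DataFrame({
    "Major": ["LIM", "LIM", "BRK"],
    "Ni": [1.0, 1.9, 0.3],
    "Fe2O3": [60.0, 20.0, 8.0],
    "MgO": [2.0, 3.0, 20.0],
})
flags = out_of_range(toy)
print(toy[flags.any(axis=1)])  # rows with at least one out-of-range value
```

Keeping the ranges in a single dictionary means the Principal Geologist's thresholds for another area can be swapped in without touching the flagging logic.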

In [None]:
sns.scatterplot(x='Alt_index', y='Ni', color='blue', data=df1, label = 'LIM')

In [None]:
sns.scatterplot(x='Alt_index', y='Al2O3', color='blue', data=df1, label = 'LIM')

In [None]:
sns.scatterplot(x='Alt_index', y='Fe2O3', color='blue', data=df1, label = 'LIM')

In [None]:
sns.scatterplot(x='Alt_index', y='MgO', color='blue', data=df1, label = 'LIM')

In [None]:
sns.scatterplot(x='Alt_index', y='SiO2', color='blue', data=df1, label = 'LIM')

Recheck samples with Ni > 1.5 and Fe < 25; inspect the core photos and the from-to depth interval.
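
The recheck described above can be expressed as a boolean filter on the limonite subset. The note says only "Fe", so using the `Fe2O3` column here (rather than `TFe`) is an assumption; the toy rows, including the hole ID `BN0514`, are hypothetical.

```python
import pandas as pd

# Hypothetical limonite rows with the column names used in this notebook
lim = pd.DataFrame({
    "DH ID": ["BN0513", "BN0513", "BN0514"],
    "DepthFrom": [0.0, 1.0, 0.0],
    "DepthTo": [1.0, 2.0, 1.0],
    "Ni": [0.9, 1.8, 1.7],
    "Fe2O3": [62.0, 18.0, 55.0],
})

# Ni > 1.5 and Fe < 25: candidates for a core-photo and from-to check
suspect = lim[(lim["Ni"] > 1.5) & (lim["Fe2O3"] < 25)]
print(suspect[["DH ID", "DepthFrom", "DepthTo", "Ni", "Fe2O3"]])
```

On the real data the same mask would be applied to `df1`, and the printed hole ID and depth interval identify which core photos to pull.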

In [None]:
sns.scatterplot(x='Alt_index', y='Ni', color = 'orange', data=df2,label = 'RSAP')

In [None]:
sns.scatterplot(x='Alt_index', y='Al2O3', color = 'orange', data=df2,label = 'RSAP')

In [None]:
sns.scatterplot(x='Alt_index', y='Fe2O3', color = 'orange', data=df2,label = 'RSAP')

In [None]:
sns.scatterplot(x='Alt_index', y='MgO', color = 'orange', data=df2,label = 'RSAP')

In [None]:
sns.scatterplot(x='Alt_index', y='SiO2', color = 'orange', data=df2,label = 'RSAP')

Recheck samples with Ni < 0.5 and Fe < 10 (possibly bedrock; check their MgO and Co).
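
A possible way to run this recheck is to list the low-Ni, low-Fe rocky saprolite samples together with their MgO and Co. In this sketch `Fe2O3` stands in for "Fe" (an assumption), the `maybe_BRK` column is a hypothetical helper that reuses the MgO > 25 threshold quoted in the introduction, and the rows are toy values. No Co threshold is given in the text, so Co is only displayed.

```python
import pandas as pd

# Hypothetical RSAP rows; column names follow this notebook
rsap = pd.DataFrame({
    "Sample_id": ["S1", "S2", "S3"],
    "Ni": [1.4, 0.3, 0.4],
    "Fe2O3": [12.0, 7.0, 9.0],
    "MgO": [33.0, 42.0, 18.0],
    "Co": [0.03, 0.01, 0.02],
})

# Ni < 0.5 and Fe < 10: samples that may have been logged as RSAP but are bedrock
low_grade = rsap[(rsap["Ni"] < 0.5) & (rsap["Fe2O3"] < 10)].copy()
# MgO > 25 is the "high" threshold quoted in the introduction for bedrock
low_grade["maybe_BRK"] = low_grade["MgO"] > 25
print(low_grade[["Sample_id", "Ni", "Fe2O3", "MgO", "Co", "maybe_BRK"]])
```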

In [None]:
sns.scatterplot(x='Alt_index', y='Ni', color = 'green', data=df3,label ='FSAP')

In [None]:
sns.scatterplot(x='Alt_index', y='Al2O3', color = 'green', data=df3,label ='FSAP')

In [None]:
sns.scatterplot(x='Alt_index', y='Fe2O3', color = 'green', data=df3,label ='FSAP')

In [None]:
sns.scatterplot(x='Alt_index', y='MgO', color = 'green', data=df3,label ='FSAP')

In [None]:
sns.scatterplot(x='Alt_index', y='SiO2', color = 'green', data=df3,label ='FSAP')

In [None]:
sns.scatterplot(x='Alt_index', y='Ni', color = 'violet', data=df4,label ='BRK')

In [None]:
sns.scatterplot(x='Alt_index', y='Al2O3', color = 'violet', data=df4,label ='BRK')

In [None]:
sns.scatterplot(x='Alt_index', y='Fe2O3', color = 'violet', data=df4,label ='BRK')

In [None]:
sns.scatterplot(x='Alt_index', y='MgO', color = 'violet', data=df4,label ='BRK')

In [None]:
sns.scatterplot(x='Alt_index', y='SiO2', color = 'violet', data=df4,label ='BRK')

In [None]:
sns.scatterplot(x='Alt_index', y='Ni', color = 'yellow', data=df5,label ='BLANK')

In [None]:
sns.scatterplot(x='Alt_index', y='Al2O3', color = 'yellow', data=df5,label ='BLANK')

In [None]:
sns.scatterplot(x='Alt_index', y='Fe2O3', color = 'yellow', data=df5,label ='BLANK')

In [None]:
sns.scatterplot(x='Alt_index', y='MgO', color = 'yellow', data=df5,label ='BLANK')

In [None]:
sns.scatterplot(x='Alt_index', y='SiO2', color = 'yellow', data=df5,label ='BLANK')

Check outliers

In [None]:
sns.scatterplot(x='Ni', y='MgO', color = 'orange', data=df2,label = 'RSAP')  # column names are 'Ni' and 'MgO'

Check samples with low Ni and high MgO.

In [None]:
sns.scatterplot(x='Ni', y='MgO', color = 'violet', data=df4,label ='BRK')  # df4 is the BRK subset, matching the label

Check samples with Ni > 1 and MgO < 25.

In [None]:
sns.scatterplot(x='Ni', y='MgO', data=assay_df, hue='Major')  # 'Major' holds the logged layer

#### Conclusion
The interpretation of the geological profile is broadly acceptable, provided that a few outliers are rechecked. The EDA is now complete; the next step is geological modelling with the geostatistical kriging method (with Nearest Neighbour as a comparison). What follows below is an estimate made without spatial information (collar/coordinates) using K-Nearest Neighbour, intended only as a learning exercise and machine learning experiment on how the nickel laterite assay dataset behaves.