# KLIMATA ILOILO DATA EXTRACTION AND PREPROCESSING STAGE

### For the next step, we extract and process the NDVI data. The NDVI is an index that measures how dense the vegetation in the area is. Basically, how green and healthy the land is. This jupyter notebook contains code that preprocesses the vegetation data.

### Importing essential libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

### Importing CSV file for NDVI data

In [10]:
land_df = pd.read_csv(r'C:/Users/Value Lines/Documents/climate_land.csv')

### Checking of NDVI dataframe rows

In [11]:
land_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6421095 entries, 0 to 6421094
Data columns (total 5 columns):
 #   Column      Dtype  
---  ------      -----  
 0   uuid        object 
 1   adm4_pcode  object 
 2   date        object 
 3   freq        object 
 4   ndvi        float64
dtypes: float64(1), object(4)
memory usage: 244.9+ MB


### Checking of NDVI dataframe rows and columns

In [12]:
land_df.shape

(6421095, 5)

### Checking columns'  null value counts

In [13]:
print(land_df.isnull().sum())

uuid              0
adm4_pcode        0
date              0
freq              0
ndvi          11249
dtype: int64


### Cleans up all column names for consistency

In [14]:
land_df.columns = land_df.columns.str.strip().str.lower().str.replace(' ', '_')

### Converting column 'date' into proper datetime format

In [15]:
if 'date' in land_df.columns:
    df['date'] = pd.to_datetime(land_df['date'], errors='coerce')

### Replacing of missing values with the median of the column

In [16]:
num_cols = land_df.select_dtypes(include=['float64', 'int64']).columns
land_df[num_cols] = land_df[num_cols].fillna(land_df[num_cols].median())


### Cleans up all text (categorical) columns in the NDVI dataframe for consistency

In [17]:
cat_cols = land_df.select_dtypes(include=['object']).columns
for col in cat_cols:
    df[col] = land_df[col].str.strip().str.lower()

### Drops irrelevant column

In [18]:
land_df.drop(columns="freq", inplace=True)

### Viewing the preprocessed NDVI dataframe

In [19]:
land_df.head(10)

Unnamed: 0,uuid,adm4_pcode,date,ndvi
0,CLAND000000,PH015518001,2003-01-01,0.49
1,CLAND000001,PH015518001,2003-01-02,0.47
2,CLAND000002,PH015518001,2003-01-03,0.44
3,CLAND000003,PH015518001,2003-01-04,0.44
4,CLAND000004,PH015518001,2003-01-05,0.45
5,CLAND000005,PH015518001,2003-01-06,0.43
6,CLAND000006,PH015518001,2003-01-07,0.43
7,CLAND000007,PH015518001,2003-01-08,0.43
8,CLAND000008,PH015518001,2003-01-09,0.44
9,CLAND000009,PH015518001,2003-01-10,0.45


### Importing of NDVI dataframe as a csv file

In [None]:
land_df.to_csv("NDVI.csv", index=False)