## Fire Classification

This notebook contains a function that classifies the geographic setting of each fire in an area. <br>
Using Land Use Land Cover (LULC) data, it determines whether a fire occurred in a cropped area, near trees, in flooded vegetation, etc. <br>
The data of interest here are the fires in agricultural areas. <br>

In [1]:
import warnings
warnings.filterwarnings('ignore')

If you run into module-related import errors, install the necessary libraries with: 
>!pip install geopandas pandas rasterio rasterstats matplotlib

In [10]:
import geopandas as gpd
import pandas as pd
import rasterio
from rasterstats import zonal_stats
import matplotlib.pyplot as plt
from rasterio.plot import show
import os

In [11]:
os.listdir()

['.ipynb_checkpoints',
 'Classification_Fires.ipynb',
 'fires_data_classified.csv',
 'fires_data_classified.geojson',
 'fire_classification_function.ipynb',
 'telanganafire.geojson']

### Importing data

This code imports a CSV file called _'telangana_fires.csv'_, which contains information on the fires in Telangana between 2015 and 2018, and stores it in the variable _'fire_data'_. 

It also imports three geojson files: _'TL_state_shapefile_for_clip.geojson'_, _'TS_district_boundary.json'_, and _'telanganafire.geojson'_, 
and stores them in the variables _'telangana_shape'_, _'district_boundaries'_, and _'fires_2021'_, respectively. 
We use Geopandas to read the geojson files. 

Here, _'fire_data'_ contains fields such as acq_date (date of recording), acq_time (time of recording), satellite (name/type of satellite), instrument (instrument used), confidence (certainty that the detection is a fire), bright_t31 (channel 31 brightness temperature), frp (fire radiative power), and daynight (day/night). 
The Telangana shapefiles contain the geographic boundaries of Telangana and its districts, and _'fires_2021'_ contains the fire records.

In [7]:
fire_data = pd.read_csv('../../../geospatial_internship/datasets/telangana_fires.csv')
telangana_shape = gpd.read_file('../../../../../src/data_preprocessing/base_geojson/TL_state_shapefile_for_clip.geojson')

district_boundaries = gpd.read_file('../../../../../src/data_preprocessing/base_geojson/TS_district_boundary.json')

fires_2021 = gpd.read_file('telanganafire.geojson')

In [8]:
fire_data['acq_date'] = pd.to_datetime(fire_data['acq_date'])
fires_2021['acq_date'] = pd.to_datetime(fires_2021['acq_date'])

### Processing and classification

We read the fire data and classify each fire into a geographical setting (farm, forest, urban, etc.) using LULC data. We then check whether there are trees around the fire location, which helps us better understand the nature of the surrounding setting. The obtained data is then assigned to the closest mandal. The resulting data is saved to a new file for further use/reference.  
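
The classification rule described above can be illustrated without any raster at all. The following is a minimal sketch, not the notebook's actual implementation: `classify_fire` is a hypothetical helper, and the pixel lists are made-up values standing in for the LULC codes that the zonal statistics would return for the fire point and for the 1 km square around it.

```python
from collections import Counter

# ESRI LULC class codes as used in this notebook
CLASS_NAMES = {1: 'Water', 2: 'Trees', 4: 'Flooded Vegetation', 5: 'Crops',
               7: 'Built Area', 8: 'Bare Ground', 9: 'Snow/Ice',
               10: 'Clouds', 11: 'Rangeland'}


def classify_fire(point_pixels, buffer_pixels):
    """Toy classification of a single fire (hypothetical helper).

    point_pixels:  LULC codes of the pixels touched by the fire point
    buffer_pixels: LULC codes of the pixels inside the square buffer
    """
    if point_pixels:
        # Majority class: the most frequent LULC code under the point
        land_use = Counter(point_pixels).most_common(1)[0][0]
    else:
        # No pixels found, e.g. the fire lies outside the raster
        land_use = 'unknown'

    trees_around = int(2 in buffer_pixels)         # code 2 = Trees
    agricultural = int(land_use in (4, 5))         # Flooded Vegetation or Crops
    agricultural_strict = int(agricultural == 1 and trees_around == 0)

    return {'Land use': land_use,
            'Trees Around': trees_around,
            'agricultural': agricultural,
            'agricultural strict': agricultural_strict}


# A crop fire with trees nearby: agricultural, but not in the strict sense
print(classify_fire([5], [5, 5, 2, 7]))
# A crop fire without trees nearby: agricultural in both senses
print(classify_fire([5], [5, 5, 7]))
```

The strict flag is the more conservative of the two: it drops fires where tree pixels occur in the surrounding square, since those may be forest-edge fires rather than crop fires.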

In [9]:
# Create a GeoDataFrame from the fire records (longitude/latitude in WGS 84)
geo_fire_data = gpd.GeoDataFrame(fire_data, geometry=gpd.points_from_xy(fire_data.longitude, fire_data.latitude), crs='EPSG:4326')

# Make sure the fires and the district boundaries use the same CRS (EPSG:4326)
geo_fire_data = geo_fire_data.to_crs(epsg=4326)
district_boundaries = district_boundaries.to_crs(epsg=4326)

# Combine fires from 2021 with the rest (DataFrame.append was removed in pandas 2.0)
geo_fire_data = pd.concat([geo_fire_data, fires_2021], ignore_index=True)
geo_fire_data['fireID'] = range(len(geo_fire_data))

#-------create a 1 km square buffer around each fire using the buffer function from shapely-----#
# 0.004504505 degrees is roughly 500 m; cap_style=3 gives a square, so each side is about 1 km
geo_fire_data['geometry buffered'] = [point.buffer(0.004504505, cap_style=3) for point in geo_fire_data.geometry]

geo_fire_data['acq_date'] = pd.to_datetime(geo_fire_data['acq_date'])
geo_fire_data['year'] = geo_fire_data['acq_date'].dt.year

# Keep only fires recorded before 2022
geo_fire_data = geo_fire_data[geo_fire_data['year'] < 2022]

LULC_location = '../../../../../src/data_preprocessing/LULC/'
df = pd.DataFrame(columns=geo_fire_data.columns)

# ESRI LULC class codes and their labels
cmap = {1: 'Water', 2: 'Trees', 4: 'Flooded Vegetation', 5: 'Crops', 7: 'Built Area', 8: 'Bare Ground', 9: 'Snow/Ice', 10: 'Clouds', 11: 'Rangeland'}

for year in geo_fire_data['year'].unique():
    # Work on a copy so that assigning new columns does not touch geo_fire_data
    year_data = geo_fire_data[geo_fire_data['year'] == year].copy()
    # Fires from 2015 and 2016 are classified with the 2017 raster
    if year in (2015, 2016):
        tiff = LULC_location + '01-01-2017.tif'
    else:
        tiff = LULC_location + '01-01-' + str(year) + '.tif'

    with rasterio.open(tiff, mode='r') as lulc:
        lulc_array = lulc.read(1)  # land-use class of every raster pixel
        # affine: 1: width of each pixel, 2: row rotation, 3: x-coordinate of the upper left pixel,
        #         4: column rotation, 5: height of each pixel, 6: y-coordinate of the upper left pixel
        affine = lulc.transform
        nodata = lulc.nodata

    land_use = []
    trees_around = []
    for i in range(len(year_data)):
        # Majority land-use class at the fire point itself
        test = zonal_stats(year_data['geometry'].iloc[i], lulc_array, affine=affine, geojson_out=True, stats='majority', nodata=nodata, categorical=True, category_map=cmap, all_touched=True)
        try:
            land_use.append(test[0]['properties']['majority'])
        except (IndexError, KeyError, TypeError):
            land_use.append('unknown')

        # Check whether any tree pixels fall inside the 1 km square around the fire
        test = zonal_stats(year_data['geometry buffered'].iloc[i], lulc_array, affine=affine, geojson_out=True, stats='majority', nodata=nodata, categorical=True, category_map=cmap, all_touched=True)
        trees_around.append(1 if 'Trees' in test[0]['properties'] else 0)

    year_data['Land use'] = land_use
    year_data['Trees Around'] = trees_around

    df = pd.concat([df, year_data], axis=0)

    

In [11]:
# LULC class 5 = Crops, class 4 = Flooded Vegetation
# 'unknown' becomes NaN here and is therefore treated as non-agricultural
land_use_code = pd.to_numeric(df['Land use'], errors='coerce')
is_agricultural = land_use_code.isin([4, 5])

df['agricultural'] = is_agricultural.astype(int)
# Strict variant: agricultural land use and no trees within the 1 km buffer
df['agricultural strict'] = (is_agricultural & (df['Trees Around'] == 0)).astype(int)

In [12]:
df[['fireID', 'latitude', 'longitude', 'brightness', 'scan', 'track',
       'acq_date', 'acq_time', 'satellite', 'instrument', 'confidence',
       'version', 'bright_t31', 'frp', 'daynight', 'type', 'geometry','agricultural', 'agricultural strict']].to_csv('fires_data_classified.csv')

In [13]:
df = gpd.GeoDataFrame(df, geometry=df.geometry, crs='EPSG:4326')

In [14]:
df[['fireID', 'latitude', 'longitude', 'brightness', 'scan', 'track',
       'acq_date', 'acq_time', 'satellite', 'instrument', 'confidence',
       'version', 'bright_t31', 'frp', 'daynight', 'type', 'geometry','agricultural', 'agricultural strict']].to_file('fires_data_classified.geojson', driver="GeoJSON") 

### Classification (using Shivang's approach)

Reference : https://github.com/luckyw0w/Data4Policy/tree/main/Geospatial%20Data%20Science%20Internship/ShivangPandey

The buffer function of geopandas (based on the shapely library) creates a buffer polygon around a point. The first parameter is the buffer distance, expressed in degrees here because the data uses EPSG:4326, and 'cap_style' sets the shape of the buffer. Here 'cap_style = 3' produces a square polygon around the point. 
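
To make the buffer distance concrete, here is a small pure-Python sketch of the square that a buffer with 'cap_style = 3' produces around a point. It is an illustration only: `degrees_for` and `square_corners` are hypothetical helpers, the coordinates are made up, and the conversion assumes roughly 111 km per degree, which is the approximation that yields the 0.004504505 used in this notebook.

```python
METRES_PER_DEGREE = 111_000  # assumption: rough length of one degree on the ground


def degrees_for(metres):
    """Convert a ground distance in metres to degrees (rough approximation)."""
    return metres / METRES_PER_DEGREE


def square_corners(lon, lat, half_side_deg):
    """Corners of the axis-aligned square centred on (lon, lat)."""
    h = half_side_deg
    return [(lon - h, lat - h), (lon + h, lat - h),
            (lon + h, lat + h), (lon - h, lat + h)]


half_side = degrees_for(500)                      # buffer distance for 500 m
corners = square_corners(78.5, 17.4, half_side)   # made-up fire location
side_deg = corners[1][0] - corners[0][0]          # full side length in degrees

print(round(half_side, 9))                    # 0.004504505
print(round(side_deg * METRES_PER_DEGREE))    # 1000
```

The buffer distance is half the side length, so a 500 m buffer gives the 1 km square that matches one MODIS pixel.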

We need a polygon because the MODIS Thermal Anomalies & Fire Daily data has a resolution of 1 km, while the given Esri LULC data has a resolution of 10 m. One pixel of MODIS fire data therefore covers an area as large as 10,000 (100 × 100) pixels of the LULC map.
We account for this by creating a polygon the size of 10,000 LULC pixels around each fire point.


Reference : https://www.arcgis.com/home/item.html?id=d6642f8a4f6d4685a24ae2dc0c73d4ac
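
The resolution arithmetic can be checked in a few lines. The sketch below also estimates the ground size of the square buffer, under two assumptions that are not part of the original notebook: a degree is taken as roughly 111 km, and Telangana is taken to lie at about 18° N, where a degree of longitude is shorter than a degree of latitude.

```python
import math

MODIS_PIXEL_M = 1000   # MODIS fire product resolution
LULC_PIXEL_M = 10      # Esri LULC resolution

pixels_per_side = MODIS_PIXEL_M // LULC_PIXEL_M
pixels_per_fire = pixels_per_side ** 2
print(pixels_per_side, pixels_per_fire)   # 100 10000

# Ground size of the square built with the 0.004504505 degree buffer
BUFFER_DEG = 0.004504505
METRES_PER_DEGREE = 111_000   # assumption: rough length of one degree of latitude
LATITUDE_DEG = 18.0           # assumption: approximate latitude of Telangana

height_m = 2 * BUFFER_DEG * METRES_PER_DEGREE
width_m = height_m * math.cos(math.radians(LATITUDE_DEG))
approx_pixels = round(width_m / LULC_PIXEL_M) * round(height_m / LULC_PIXEL_M)

print(round(height_m), round(width_m))    # 1000 951
print(approx_pixels)                      # 9500
```

Under these assumptions the square is about 5% narrower east to west than the nominal 1 km, so it covers slightly fewer than 10,000 LULC pixels. For a presence check such as "are there trees nearby" this difference is small.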

In [15]:
tiff = '../../../../../src/data_preprocessing/LULC/01-01-2017.tif'

data = geo_fire_data

#------------------Checking probability in mosaic tiff file----------------------#
import time                                      # to measure how long the classification takes
start_time = time.time()                         # start time of the program

id_list = []                                  # fireID of every fire that falls inside the raster
flag_list = []                                # class of the corresponding fire: 1 = agricultural, 0 = other

for j in range(len(data)):                    # iterate over all rows of the dataframe
    row = data.iloc[j]
    # categorical=True adds one key per LULC class value found under the geometry.
    # Note: row.geometry is the fire point; the 1 km square is stored in row['geometry buffered'].
    stats = zonal_stats(row.geometry, tiff, stats="*", categorical=True)
    i = stats[0]                              # statistics dictionary for this fire
    if i['count'] != 0:                       # skip fires that fall outside the tiff file
        id_list.append(row['fireID'])
        # class 5 = Crops, class 4 = Flooded Vegetation
        flag_list.append(1 if (5 in i) or (4 in i) else 0)

print("--- %s seconds ---" % (time.time() - start_time))   # print total time taken to run the code

# DataFrame with one row per classified fire: its fireID and class (1, 0)
class_df = pd.DataFrame({'fireID': id_list, 'class (1,0)': flag_list})


--- 135.38010573387146 seconds ---


In [16]:
# Join on fireID; assigning the column directly would align on the row index instead
geo_fire_data['agricultural'] = geo_fire_data['fireID'].map(class_df.set_index('fireID')['class (1,0)'])

In [17]:
geo_fire_data[['fireID', 'latitude', 'longitude', 'brightness', 'scan', 'track',
       'acq_date', 'acq_time', 'satellite', 'instrument', 'confidence',
       'version', 'bright_t31', 'frp', 'daynight', 'type', 'geometry','agricultural']].to_csv('fires_data_classified.csv')