<a href="https://colab.research.google.com/github/the-Soke/Sentinel-model/blob/main/sentinelNote.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!git clone https://github.com/the-Soke/Sentinel-model.git

Cloning into 'Sentinel-model'...
remote: Enumerating objects: 3, done.
remote: Counting objects: 100% (3/3), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 3 (delta 0), reused 0 (delta 0), pack-reused 0 (from 0)
Receiving objects: 100% (3/3), done.


In [2]:
import os
os.chdir("/content/Sentinel-model")
!pwd

/content/Sentinel-model


In [3]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [4]:
!cp -r "/content/drive/MyDrive/nigeria_insecurity_dataset.csv" "/content/Sentinel-model/"
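
Before the cleaning cell runs, it can help to confirm that the CSV contains every column the pipeline reads. This is a minimal sketch that assumes the column names used later in this notebook. `check_required_columns` is a hypothetical helper, and the demo runs on a tiny in-memory frame with placeholder values rather than on the real file:

```python
import pandas as pd

# Columns read by the cleaning, feature, and target steps of this notebook
REQUIRED_COLUMNS = [
    "Date", "TimeOfDay", "State", "Location", "WeaponsUsed",
    "Casualties", "Kidnapped", "PastIncidentsInArea", "RiskLevel",
]

def check_required_columns(frame: pd.DataFrame, required=REQUIRED_COLUMNS) -> list:
    """Return the required column names missing from frame (hypothetical helper)."""
    return [col for col in required if col not in frame.columns]

# Demo on a small in-memory frame with placeholder values, not the real CSV
demo = pd.DataFrame({"Date": ["2024-01-15"], "State": ["StateA"], "RiskLevel": ["High"]})
missing = check_required_columns(demo)
print(missing)
```

On the real data, `check_required_columns(pd.read_csv("nigeria_insecurity_dataset.csv"))` should return an empty list; anything else points at a renamed or missing column.
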


In [14]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestClassifier

# Load dataset
df = pd.read_csv("nigeria_insecurity_dataset.csv")

# -----------------------------------------------
# 1. CLEANING SECTION
# -----------------------------------------------

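# Convert Date → datetime; unparseable values become NaT
df["Date"] = pd.to_datetime(df["Date"], errors="coerce")

# Drop rows whose date could not be parsed: NaT would otherwise
# make the integer cast of the ISO week below fail
df = df.dropna(subset=["Date"]).copy()

# Extract date features
df["DayOfYear"] = df["Date"].dt.dayofyear
df["Month"] = df["Date"].dt.month
df["Week"] = df["Date"].dt.isocalendar().week.astype(int)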

# Clean TimeOfDay into an integer hour (0-23)
# Expected formats: "14:30", "2pm", "7AM", "2:30pm", etc.
def extract_hour(t):
    if pd.isna(t):
        return np.nan
    t = str(t).strip().lower()

    # Case: "2pm", "11am", "2:30pm"
    # (checked first so "2:30pm" maps to 14 instead of 2)
    if t.endswith("am") or t.endswith("pm"):
        hour_part = t[:-2].strip().split(":")[0]
        try:
            h = int(hour_part)
        except ValueError:
            return np.nan
        if t.endswith("pm") and h != 12:
            h += 12
        if t.endswith("am") and h == 12:
            h = 0
        return h

    # Case: "14:30" or "07:00"
    if ":" in t:
        try:
            return int(t.split(":")[0])
        except ValueError:
            return np.nan

    # Last fallback: a bare integer such as "14"
    try:
        return int(t)
    except ValueError:
        return np.nan

df["Hour"] = df["TimeOfDay"].apply(extract_hour)
df["Hour"] = df["Hour"].fillna(df["Hour"].median()).astype(int)

# -----------------------------------------------
# 2. ENCODING CATEGORICAL VARIABLES
# -----------------------------------------------

label_cols = ["State", "Location", "WeaponsUsed"]

encoders = {}
for col in label_cols:
    enc = LabelEncoder()
    df[col] = df[col].astype(str)
    df[col] = enc.fit_transform(df[col])
    encoders[col] = enc

# -----------------------------------------------
# 3. ENSURE NUMERIC COLUMNS
# -----------------------------------------------

numeric_cols = ["Casualties", "Kidnapped", "PastIncidentsInArea"]
df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric, errors="coerce")
df[numeric_cols] = df[numeric_cols].fillna(0)

# -----------------------------------------------
# 4. FINAL DATASET PREP
# -----------------------------------------------

feature_cols = [
    "State",
    "Location",
    "WeaponsUsed",
    "Casualties",
    "Kidnapped",
    "PastIncidentsInArea",
    "DayOfYear",
    "Month",
    "Week",
    "Hour"
]

X = df[feature_cols]
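
The `encoders` dictionary keeps each fitted `LabelEncoder`, which matters at prediction time: new rows must be encoded with the same mapping, and `LabelEncoder.transform` raises a `ValueError` for labels it did not see during fitting. The sketch below assumes unseen labels should map to a sentinel code of `-1`; that is a choice made for this example, not something the notebook does. `encode_with_fallback` and the weapon categories are hypothetical:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

def encode_with_fallback(encoder: LabelEncoder, values, unknown_code: int = -1) -> np.ndarray:
    """Encode values with a fitted LabelEncoder, mapping unseen labels to unknown_code."""
    values = np.asarray(values, dtype=str)
    known = np.isin(values, encoder.classes_)
    codes = np.full(len(values), unknown_code, dtype=int)
    if known.any():
        codes[known] = encoder.transform(values[known])
    return codes

# Fit on training-time labels (hypothetical weapon categories)
demo_enc = LabelEncoder().fit(["Firearms", "Knives", "Explosives"])
print(demo_enc.classes_)  # ['Explosives' 'Firearms' 'Knives']

# "Machetes" was not seen during fitting, so it gets -1
print(encode_with_fallback(demo_enc, ["Knives", "Machetes", "Firearms"]).tolist())  # [2, -1, 1]
```

Whether `-1` is a sensible input depends on the model: a random forest never saw it during training and will simply route it down the branches for "smaller than every known code".
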

# Task
Encode the 'RiskLevel' column using `LabelEncoder` to convert its string values ('High', 'Medium', 'Low') into numerical representations, and then assign the encoded column to the target variable `y`.

## Encode RiskLevel column

### Subtask:
Apply `LabelEncoder` to the 'RiskLevel' column to convert its categorical string values (e.g., 'High', 'Medium', 'Low') into numerical representations. This will be done before assigning it to the target variable `y`.


**Reasoning**:
The subtask requires encoding the 'RiskLevel' column from categorical strings to numerical representations using `LabelEncoder` before assigning it to the target variable `y`.



In [10]:
from sklearn.preprocessing import LabelEncoder

# Create an instance of LabelEncoder
risk_level_encoder = LabelEncoder()

# Fit and transform the 'RiskLevel' column
df['RiskLevel_encoded'] = risk_level_encoder.fit_transform(df['RiskLevel'])
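
`LabelEncoder` assigns codes in sorted (alphabetical) order of the class strings. Assuming `RiskLevel` contains exactly 'High', 'Low', and 'Medium', the codes are High=0, Low=1, Medium=2, which do not follow severity. That is harmless for a classifier, but the mapping should be read from `classes_` rather than assumed. The sketch below uses a hypothetical sample series and a separate `demo_encoder` so the notebook's fitted `risk_level_encoder` is left untouched; the explicit ordinal mapping is shown only as an alternative, not as what the notebook uses:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample of RiskLevel values
risk_demo = pd.Series(["High", "Medium", "Low", "Medium", "High"])

# LabelEncoder sorts the classes alphabetically before assigning codes
demo_encoder = LabelEncoder()
codes = demo_encoder.fit_transform(risk_demo)
mapping = dict(zip([str(c) for c in demo_encoder.classes_], range(len(demo_encoder.classes_))))
print(mapping)         # {'High': 0, 'Low': 1, 'Medium': 2}
print(codes.tolist())  # [0, 2, 1, 2, 0]

# Alternative: an explicit ordinal mapping that follows severity
ORDINAL = {"Low": 0, "Medium": 1, "High": 2}
print(risk_demo.map(ORDINAL).tolist())  # [2, 1, 0, 1, 2]

# Model outputs are decoded back to strings with inverse_transform
print(demo_encoder.inverse_transform([0, 2]).tolist())  # ['High', 'Medium']
```

In the notebook, `risk_level_encoder.inverse_transform(model.predict(X_test))` turns predictions back into risk-level strings.
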

**Reasoning**:
The encoded column is now assigned to the target variable `y`, which completes the subtask.



In [11]:
y = df["RiskLevel_encoded"]

## Build and Evaluate Model

### Subtask:
Proceed with building and evaluating the machine learning model now that the target variable `y` has been correctly encoded and assigned.


**Reasoning**:
The subtask requires splitting the data, initializing and fitting a RandomForestClassifier model, and then evaluating its accuracy, all of which can be done in a single Python code block using the existing variables and imported libraries.
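
`stratify=y` makes `train_test_split` keep the class proportions of `y` in both the training and test sets, which matters when one risk level is rare. A small check on synthetic labels (the 60/30/10 class counts are made up for illustration):

```python
from collections import Counter

import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced labels: 60 of class 0, 30 of class 1, 10 of class 2
y_demo = np.array([0] * 60 + [1] * 30 + [2] * 10)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)

_, _, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# With stratification the 20-row test set mirrors the 60/30/10 split
print(sorted(Counter(y_te.tolist()).items()))  # [(0, 12), (1, 6), (2, 2)]
print(sorted(Counter(y_tr.tolist()).items()))  # [(0, 48), (1, 24), (2, 8)]
```

Stratification needs at least two samples per class; `train_test_split` raises a `ValueError` if any class has only one member.
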



In [12]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

# 1. TRAIN/TEST SPLIT
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# 2. BUILD MODEL
model = RandomForestClassifier(
    n_estimators=300,
    max_depth=20,
    class_weight="balanced",
    random_state=42
)

# 3. FIT MODEL
model.fit(X_train, y_train)

# 4. PRINT TRAINING COMPLETE MESSAGE
print("Training complete.")

# 5. CALCULATE AND PRINT ACCURACY
print("Accuracy:", model.score(X_test, y_test))

Training complete.
Accuracy: 0.9666666666666667
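
A single 80/20 split yields one accuracy figure, and on a small dataset that figure can shift noticeably with `random_state`. Stratified k-fold cross-validation averages over several splits instead. This is a minimal sketch on synthetic data from `make_classification` (a stand-in for `X` and `y`, not the notebook's dataset). It reuses the notebook's forest settings, with fewer trees to keep it fast:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for X and y: 3 classes, 10 features
X_demo, y_demo = make_classification(
    n_samples=300, n_features=10, n_informative=5, n_classes=3, random_state=0
)

model_demo = RandomForestClassifier(
    n_estimators=100, max_depth=20, class_weight="balanced", random_state=42
)

# 5 stratified folds: each fold keeps the class proportions of y_demo
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model_demo, X_demo, y_demo, cv=cv, scoring="accuracy")

print(f"Accuracy per fold: {scores.round(3)}")
print(f"Mean {scores.mean():.3f} +/- {scores.std():.3f}")
```

On the notebook's own data, the equivalent call would be `cross_val_score(model, X, y, cv=cv)`. The spread across folds gives a sense of how much the single 96.67% figure could move.
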


## Summary:

### Data Analysis Key Findings
*   The `RiskLevel` column, containing categorical string values such as 'High', 'Medium', and 'Low', was successfully encoded into numerical representations using `LabelEncoder`. These encoded values were stored in a new column, `RiskLevel_encoded`, and subsequently assigned to the target variable `y`.
*   A `RandomForestClassifier` model was built and trained using a stratified train/test split (80% training, 20% testing) with `n_estimators=300`, `max_depth=20`, and `class_weight="balanced"`.
*   The trained model achieved an accuracy of approximately 96.67% on the test set.

### Insights or Next Steps
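*   The 96.67% test accuracy is encouraging, but it comes from a single 80/20 split of a small dataset (the CSV is 23 KB), so it is a noisy estimate. Features such as `Casualties` and `Kidnapped` may also be closely tied to how `RiskLevel` was assigned, which would make the task easier than predicting risk before an incident occurs.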
*   To further validate the model's robustness and understand its performance across different risk levels, it would be beneficial to evaluate additional metrics such as precision, recall, F1-score, and a confusion matrix, especially given the use of `class_weight="balanced"`.
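
The per-class metrics suggested above can be produced with `classification_report` and `confusion_matrix` from `sklearn.metrics`. The sketch runs on synthetic data as a stand-in for the notebook's dataset, and the class names are hypothetical. In the notebook itself, the equivalent inputs would be `y_test` and `model.predict(X_test)`, with `target_names=risk_level_encoder.classes_` to label the rows:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic 3-class stand-in for the risk-level data
X_demo, y_demo = make_classification(
    n_samples=300, n_features=10, n_informative=5, n_classes=3, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

clf = RandomForestClassifier(n_estimators=100, class_weight="balanced", random_state=42)
clf.fit(X_tr, y_tr)
y_pred = clf.predict(X_te)

# Hypothetical class names, in the order of the integer codes 0, 1, 2
names = ["High", "Low", "Medium"]
print(classification_report(y_te, y_pred, target_names=names, zero_division=0))

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_te, y_pred)
print(cm)
```

Off-diagonal cells show which risk levels get confused with each other, which a single accuracy number hides.
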


In [17]:
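# onnx and skl2onnx come from the `!pip install onnx skl2onnx` cell (In [16]); run it before this cell
import onnx
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

# -----------------------------------------------
# EXPORT AS ONNX
# -----------------------------------------------

# The graph takes a float32 matrix with one column per entry in feature_cols
initial_type = [("input", FloatTensorType([None, X.shape[1]]))]
onnx_model = convert_sklearn(model, initial_types=initial_type)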

with open("sentinel_model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

print("ONNX model exported as sentinel_model.onnx")

ONNX model exported as sentinel_model.onnx
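
The exported graph declares a single float32 input named `input` with `X.shape[1]` (here 10) columns. Callers must therefore encode categorical fields with the saved `encoders` and pass the features in exactly the `feature_cols` order. The sketch below builds that input array from raw records. `build_onnx_input`, `demo_encoders`, and the sample record are hypothetical, and the encoders are fit on placeholder labels so the block runs on its own. Actually running the array through the model would also need an ONNX runtime such as `onnxruntime`, which this notebook does not install:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Same column order the model was trained on (feature_cols in the notebook)
FEATURE_COLS = [
    "State", "Location", "WeaponsUsed", "Casualties", "Kidnapped",
    "PastIncidentsInArea", "DayOfYear", "Month", "Week", "Hour",
]

# Hypothetical encoders; in the notebook these come from the `encoders` dict
demo_encoders = {
    "State": LabelEncoder().fit(["StateA", "StateB"]),
    "Location": LabelEncoder().fit(["Loc1", "Loc2"]),
    "WeaponsUsed": LabelEncoder().fit(["Firearms", "Knives"]),
}

def build_onnx_input(records: list, encoders_map: dict) -> np.ndarray:
    """Encode records and stack them into a float32 matrix in FEATURE_COLS order."""
    rows = []
    for rec in records:
        row = []
        for col in FEATURE_COLS:
            if col in encoders_map:
                # transform raises ValueError for labels unseen during fitting
                row.append(encoders_map[col].transform([str(rec[col])])[0])
            else:
                row.append(rec[col])
        rows.append(row)
    return np.asarray(rows, dtype=np.float32)

sample = {
    "State": "StateB", "Location": "Loc1", "WeaponsUsed": "Knives",
    "Casualties": 3, "Kidnapped": 0, "PastIncidentsInArea": 5,
    "DayOfYear": 15, "Month": 1, "Week": 3, "Hour": 14,
}
batch = build_onnx_input([sample], demo_encoders)
print(batch.dtype, batch.shape)  # float32 (1, 10)
print(batch.tolist())
```

Passing a dict keyed by column name avoids silently swapping two numeric features, which the model would accept without error.
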


In [16]:
!pip install onnx skl2onnx

Collecting onnx
  Downloading onnx-1.20.0-cp312-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (8.4 kB)
Collecting skl2onnx
  Downloading skl2onnx-1.19.1-py3-none-any.whl.metadata (3.8 kB)
Downloading onnx-1.20.0-cp312-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (18.1 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18.1/18.1 MB 101.4 MB/s eta 0:00:00
Downloading skl2onnx-1.19.1-py3-none-any.whl (315 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 315.5/315.5 kB 23.7 MB/s eta 0:00:00
Installing collected packages: onnx, skl2onnx
Successfully installed onnx-1.20.0 skl2onnx-1.19.1


In [18]:
!ls -lh


total 1.2M
-rw------- 1 root root  23K Dec 12 20:58 nigeria_insecurity_dataset.csv
-rw-r--r-- 1 root root   54 Dec 12 20:57 README.md
-rw-r--r-- 1 root root 1.2M Dec 12 21:30 sentinel_model.onnx


In [19]:
import shutil
shutil.move("/content/sentinelNote.ipynb", "/content/Sentinel-model/your_notebook.ipynb")


FileNotFoundError: [Errno 2] No such file or directory: '/content/sentinelNote.ipynb'
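
The move fails because Colab does not keep the running notebook under `/content`; the `ls -F /content/` cell shows only `drive/`, `sample_data/`, and `Sentinel-model/`. If a copy was saved to Drive, Colab typically places it under `MyDrive/Colab Notebooks/`, but that location is an assumption here. The sketch searches a directory tree for the file before copying it. `find_notebook` is a hypothetical helper, and the demo uses a temporary directory in place of the Drive mount:

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional

def find_notebook(root: str, name: str) -> Optional[Path]:
    """Return the first file called `name` under `root`, or None if absent."""
    for path in sorted(Path(root).rglob(name)):
        if path.is_file():
            return path
    return None

# Demo in a temporary directory standing in for /content/drive/MyDrive
with tempfile.TemporaryDirectory() as tmp:
    nested = Path(tmp) / "Colab Notebooks"
    nested.mkdir()
    (nested / "sentinelNote.ipynb").write_text("{}")
    repo = Path(tmp) / "Sentinel-model"
    repo.mkdir()

    found = find_notebook(tmp, "sentinelNote.ipynb")
    if found is not None:
        # copy (rather than move) so the Drive original stays in place
        shutil.copy2(found, repo / found.name)
    copied = (repo / "sentinelNote.ipynb").exists()
    print(found is not None, copied)  # True True
```

A recursive search over the whole Drive mount can be slow; narrowing `root` to the expected folder keeps it quick.
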

In [21]:
!git config --global user.name "the-Soke"
!git config --global user.email "kesibosoke@gmail.com"


In [22]:
!git add .


In [23]:
!git commit -m "Add notebook, dataset, and scripts"

[main 3105f57] Add notebook, dataset, and scripts
 2 files changed, 301 insertions(+)
 create mode 100644 nigeria_insecurity_dataset.csv
 create mode 100644 sentinel_model.onnx


In [25]:
#push here

Enumerating objects: 5, done.
Counting objects: 100% (5/5), done.
Delta compression using up to 2 threads
Compressing objects: 100% (4/4), done.
Writing objects: 100% (4/4), 85.18 KiB | 2.13 MiB/s, done.
Total 4 (delta 0), reused 0 (delta 0), pack-reused 0
To https://github.com/the-Soke/Sentinel-model.git
   4b783af..3105f57  main -> main


In [20]:
!ls -F /content/

drive/	sample_data/  Sentinel-model/
