# 🌿 NDVI Land Cover Classification - Summer Analytics 2025
This notebook uses logistic regression to classify land cover types from NDVI time-series satellite data. It imputes missing values, scales the features, and writes a submission file for Kaggle.



## Import required Python libraries
We need pandas for data handling, NumPy for arrays, and scikit-learn tools for preprocessing and modeling.


In [19]:
import pandas as pd
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression


## Load the training and test datasets
We load the datasets: `hacktrain.csv` contains the NDVI time-series data and target class; `hacktest.csv` contains the test samples to predict.


In [26]:

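train = pd.read_csv("hacktrain.csv")
test = pd.read_csv("hacktest.csv")


print(train.head())

# "Unnamed: 0" appears to be a leftover row index from the CSV export, not an
# NDVI reading; drop it (if present) so it is not used as a feature.
train = train.drop(columns=["Unnamed: 0"], errors="ignore")
test = test.drop(columns=["Unnamed: 0"], errors="ignore")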


   Unnamed: 0  ID  class  20150720_N  20150602_N  20150517_N  20150501_N  \
0           0   1  water    637.5950     658.668   -1882.030    -1924.36   
1           1   2  water    634.2400     593.705   -1625.790    -1672.32   
2           3   4  water     58.0174   -1599.160         NaN    -1052.63   
3           4   5  water     72.5180         NaN     380.436    -1256.93   
4           7   8  water   1136.4400         NaN         NaN     1647.83   

   20150415_N  20150330_N  20150314_N  ...  20140610_N  20140525_N  \
0     997.904   -1739.990     630.087  ...         NaN   -1043.160   
1     914.198    -692.386     707.626  ...         NaN    -933.934   
2         NaN   -1564.630         NaN  ...    -1025.88     368.622   
3     515.805   -1413.180    -802.942  ...    -1813.95     155.624   
4    1935.800         NaN    2158.980  ...     1535.00    1959.430   

   20140509_N  20140423_N  20140407_N  20140322_N  20140218_N  20140202_N  \
0   -1942.490     267.138         NaN        

## Prepare features and target labels
We separate out:
- `X`: features (drop ID and class)
- `y`: target labels (class column)


In [21]:

X = train.drop(columns=["ID", "class"])
y = train["class"]
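Another way to build the feature matrix is to select the NDVI columns by name instead of dropping the non-feature ones. The sketch below assumes every NDVI column ends in `_N`, as in the preview above; the toy frame and the names `toy`, `ndvi_cols`, `X_toy`, and `y_toy` are illustrative only and not part of the notebook.

```python
import pandas as pd

# Toy frame mimicking the layout of hacktrain.csv (values copied from the preview).
toy = pd.DataFrame({
    "Unnamed: 0": [0, 1],
    "ID": [1, 2],
    "class": ["water", "water"],
    "20150720_N": [637.595, 634.24],
    "20150602_N": [658.668, 593.705],
})

# Assumption: NDVI feature columns are exactly those whose name ends with "_N".
ndvi_cols = [c for c in toy.columns if c.endswith("_N")]

X_toy = toy[ndvi_cols]   # features only: no ID, no class, no leftover index
y_toy = toy["class"]     # target labels

print(ndvi_cols)
```

Selecting by suffix keeps bookkeeping columns such as `Unnamed: 0` out of the features even if more of them appear in the file.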



## Handle missing NDVI values
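Missing values (caused by cloud cover) are filled with the **mean of each column**. The imputer is fit on the training features only, and the same training means are then applied to the test set.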


In [None]:
imputer = SimpleImputer(strategy='mean')
X = imputer.fit_transform(X)
X_test = imputer.transform(test.drop(columns=["ID"]))
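To make the behaviour of mean imputation concrete, here is a minimal sketch on a toy array. The arrays and the `toy_` names are made up for illustration; only `SimpleImputer(strategy="mean")` comes from the notebook.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy stand-in for two NDVI columns; np.nan marks a missing reading.
toy_train = np.array([[1.0, 10.0],
                      [3.0, np.nan],
                      [np.nan, 30.0]])
toy_test = np.array([[np.nan, np.nan]])

toy_imputer = SimpleImputer(strategy="mean")
toy_train_filled = toy_imputer.fit_transform(toy_train)  # learns per-column means from train
toy_test_filled = toy_imputer.transform(toy_test)        # reuses those training means

print(toy_imputer.statistics_)  # the per-column means that were learned
print(toy_test_filled)
```

Each missing test value is replaced by the corresponding training-column mean, so no statistic is computed from the test set.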


## Scale the features
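NDVI values here span a wide range (roughly -1900 to 2200 in the preview above), so we standardize each column to zero mean and unit variance with `StandardScaler`. This helps the solver converge and lets the regularization penalty treat all features comparably.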


In [22]:

scaler = StandardScaler()
X = scaler.fit_transform(X)
X_test = scaler.transform(X_test)
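As a small illustration of what standardization does, the sketch below scales a toy array. The numbers and the `toy_` names are invented for the example; the calls mirror the `fit_transform` / `transform` pattern used above.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two toy columns on very different scales.
toy_train = np.array([[0.0, 100.0],
                      [2.0, 300.0],
                      [4.0, 500.0]])
toy_test = np.array([[2.0, 700.0]])

toy_scaler = StandardScaler()
toy_train_scaled = toy_scaler.fit_transform(toy_train)  # learn mean and std on train
toy_test_scaled = toy_scaler.transform(toy_test)        # apply the same mean and std

print(toy_train_scaled.mean(axis=0))  # per-column mean after scaling
print(toy_train_scaled.std(axis=0))   # per-column standard deviation after scaling
print(toy_test_scaled)
```

After scaling, both training columns have mean 0 and standard deviation 1. The test row is expressed in the same units, so it need not have mean 0 itself.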


## Train Logistic Regression model
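We use `multi_class='multinomial'` to fit a single softmax model over all classes rather than one binary model per class. With the `lbfgs` solver this is already the default for multiclass targets, and scikit-learn 1.5 and later deprecate the `multi_class` parameter, so on newer versions it can be omitted.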


In [23]:

model = LogisticRegression(multi_class='multinomial', solver='lbfgs', max_iter=1000)
model.fit(X, y)
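The notebook fits on the full training set and does not estimate accuracy before submitting. One possible check, sketched here under stated assumptions, chains the same three steps in a scikit-learn pipeline and cross-validates it. The data is synthetic (random numbers with made-up class labels), so the score says nothing about the real NDVI task. The names `X_demo`, `y_demo`, and `demo_pipeline` are hypothetical.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the NDVI table: 3 classes, 10 features, about 10% missing.
n_per_class, n_features = 60, 10
X_demo = np.vstack([
    rng.normal(loc=2.0 * k, scale=1.0, size=(n_per_class, n_features))
    for k in range(3)
])
y_demo = np.repeat(["class_a", "class_b", "class_c"], n_per_class)  # placeholder labels
X_demo[rng.random(X_demo.shape) < 0.1] = np.nan

# Same steps as the notebook: impute, scale, then multiclass logistic regression.
demo_pipeline = make_pipeline(
    SimpleImputer(strategy="mean"),
    StandardScaler(),
    LogisticRegression(max_iter=1000),
)

# 5-fold cross-validation; the imputer and scaler are refit inside each fold.
scores = cross_val_score(demo_pipeline, X_demo, y_demo, cv=5)
print(scores.mean())
```

Because the preprocessing lives inside the pipeline, each validation fold is imputed and scaled with statistics learned only from the corresponding training folds.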




## Make predictions and save submission file
We predict land cover classes for the test data and save the results in the required submission format.


In [27]:

predictions = model.predict(X_test)
submission = pd.DataFrame({'ID': test['ID'], 'class': predictions})


submission.to_csv("vaibhavisubmission.csv", index=False)
print("Submission file saved as 'vaibhavisubmission.csv'")


Submission file saved as 'vaibhavisubmission.csv'
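Before uploading, it can help to sanity-check the submission frame. The helper below is a hypothetical sketch (`check_submission` is not part of the notebook or of any library). It assumes the required format is exactly the two columns `ID` and `class` built above, with one row per test sample.

```python
import pandas as pd


def check_submission(submission: pd.DataFrame, expected_ids: pd.Series) -> None:
    """Raise AssertionError if the submission frame looks malformed."""
    assert list(submission.columns) == ["ID", "class"], "unexpected columns"
    assert len(submission) == len(expected_ids), "row count differs from test set"
    assert submission["ID"].is_unique, "duplicate IDs"
    assert submission["class"].notna().all(), "missing predictions"


# Toy usage with made-up IDs and predictions.
toy_ids = pd.Series([1, 2, 3], name="ID")
toy_submission = pd.DataFrame({"ID": toy_ids, "class": ["water", "water", "water"]})
check_submission(toy_submission, toy_ids)  # passes silently
```

In the notebook itself the equivalent call would be `check_submission(submission, test["ID"])`.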


In [28]:

# Colab-only helper: triggers a browser download of the saved file.
from google.colab import files
files.download("vaibhavisubmission.csv")


