# Laboratory exercise 4

## Warm-Up Mode (2 points)

**Task Description**  
Using the given dataset, develop and implement **3** different neural networks to predict the **air quality level**. Each network should differ in the following ways:  

- **layer configurations** - use different numbers and types of layers;
- **activation functions** - try different activation functions;
- **neurons per layer** - experiment with different numbers of neurons in each layer; and
- **number of layers** - build networks with varying depths.

After developing the models, evaluate and compare the performance of all **3** approaches.

**About the Dataset**  
This dataset focuses on air quality assessment across various regions. The dataset contains 5,000 samples and captures critical environmental and demographic factors that influence pollution levels.

**Features**:  
- **Temperature (°C)**: Average temperature of the region.  
- **Humidity (%)**: Relative humidity recorded in the region.  
- **PM2.5 Concentration (µg/m³)**: Levels of fine particulate matter.  
- **PM10 Concentration (µg/m³)**: Levels of coarse particulate matter.  
- **NO2 Concentration (ppb)**: Nitrogen dioxide levels.  
- **SO2 Concentration (ppb)**: Sulfur dioxide levels.  
- **CO Concentration (ppm)**: Carbon monoxide levels.  
- **Proximity to Industrial Areas (km)**: Distance to the nearest industrial zone.  
- **Population Density (people/km²)**: Number of people per square kilometer in the region.  

**Target Variable**: **Air Quality**  
- **Good**: Clean air with low pollution levels.  
- **Moderate**: Acceptable air quality but with some pollutants present.  
- **Poor**: Noticeable pollution that may cause health issues for sensitive groups.  
- **Hazardous**: Highly polluted air posing serious health risks to the population.  

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
from sklearn.metrics import classification_report, confusion_matrix
from keras.models import Sequential
from keras.layers import Dense, Input, Dropout, LeakyReLU
from keras.optimizers import Adam

In [2]:
!pip install keras
!pip install patchify
!pip install segmentation_models
# No spaces around "==", otherwise pip treats "==" as a separate requirement
!pip install tensorflow==2.15.0


In [3]:
data = pd.read_csv('pollution_dataset.csv')
data

Unnamed: 0,Temperature,Humidity,PM2.5,PM10,NO2,SO2,CO,Proximity_to_Industrial_Areas,Population_Density,Air Quality
0,29.8,59.1,5.2,17.9,18.9,9.2,1.72,6.3,319,Moderate
1,28.3,75.6,2.3,12.2,30.8,9.7,1.64,6.0,611,Moderate
2,23.1,74.7,26.7,33.8,24.4,12.6,1.63,5.2,619,Moderate
3,27.1,39.1,6.1,6.3,13.5,5.3,1.15,11.1,551,Good
4,26.5,70.7,6.9,16.0,21.9,5.6,1.01,12.7,303,Good
...,...,...,...,...,...,...,...,...,...,...
4995,40.6,74.1,116.0,126.7,45.5,25.7,2.11,2.8,765,Hazardous
4996,28.1,96.9,6.9,25.0,25.3,10.8,1.54,5.7,709,Moderate
4997,25.9,78.2,14.2,22.1,34.8,7.8,1.63,9.6,379,Moderate
4998,25.3,44.4,21.4,29.0,23.7,5.7,0.89,11.6,241,Good


In [4]:
le = LabelEncoder()
data['Air Quality'] = le.fit_transform(data['Air Quality'])

In [5]:
data

Unnamed: 0,Temperature,Humidity,PM2.5,PM10,NO2,SO2,CO,Proximity_to_Industrial_Areas,Population_Density,Air Quality
0,29.8,59.1,5.2,17.9,18.9,9.2,1.72,6.3,319,2
1,28.3,75.6,2.3,12.2,30.8,9.7,1.64,6.0,611,2
2,23.1,74.7,26.7,33.8,24.4,12.6,1.63,5.2,619,2
3,27.1,39.1,6.1,6.3,13.5,5.3,1.15,11.1,551,0
4,26.5,70.7,6.9,16.0,21.9,5.6,1.01,12.7,303,0
...,...,...,...,...,...,...,...,...,...,...
4995,40.6,74.1,116.0,126.7,45.5,25.7,2.11,2.8,765,1
4996,28.1,96.9,6.9,25.0,25.3,10.8,1.54,5.7,709,2
4997,25.9,78.2,14.2,22.1,34.8,7.8,1.63,9.6,379,2
4998,25.3,44.4,21.4,29.0,23.7,5.7,0.89,11.6,241,0
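
The integer codes in the `Air Quality` column above follow from how `LabelEncoder` works: it sorts the distinct labels and numbers them from 0, so the order is alphabetical rather than by severity. A minimal dependency-free sketch of that mapping (the `class_names` list is typed in by hand from the task description, not read from the CSV):

```python
# Class names as listed in the task description (typed in by hand here).
class_names = ["Good", "Moderate", "Poor", "Hazardous"]

# LabelEncoder sorts the distinct labels and numbers them from 0.
label_to_code = {name: code for code, name in enumerate(sorted(class_names))}
code_to_label = {code: name for name, code in label_to_code.items()}

print(label_to_code)
# {'Good': 0, 'Hazardous': 1, 'Moderate': 2, 'Poor': 3}
```

This agrees with the table: Moderate rows become 2, Good rows become 0, and the Hazardous row becomes 1. In the notebook itself, `le.classes_` and `le.inverse_transform` give the same information.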


In [6]:
target_column = 'Air Quality'
features = data.drop(target_column, axis=1)  # all input columns
target = data[target_column]                 # integer-encoded class labels

In [7]:
scaler = MinMaxScaler()
scaled_features = scaler.fit_transform(features)

In [8]:
scaled_features

array([[0.36283186, 0.25081433, 0.01762712, ..., 0.3485342 , 0.16309013,
        0.17035111],
       [0.32964602, 0.42996743, 0.00779661, ..., 0.32247557, 0.15021459,
        0.55006502],
       [0.21460177, 0.42019544, 0.09050847, ..., 0.31921824, 0.11587983,
        0.56046814],
       ...,
       [0.27654867, 0.45819761, 0.04813559, ..., 0.31921824, 0.30472103,
        0.24837451],
       [0.26327434, 0.09120521, 0.07254237, ..., 0.0781759 , 0.39055794,
        0.06892068],
       [0.23672566, 0.45494028, 0.27694915, ..., 0.23778502, 0.24892704,
        0.3550065 ]])

In [9]:
X_train, X_test, Y_train, Y_test = train_test_split(scaled_features, target, test_size=0.2, random_state=42)
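
Note that the scaler above is fitted on all rows before the split, so the test rows influence the scaling range. A common alternative is to split first and compute the minimum and maximum on the training rows only. The sketch below shows the idea with NumPy on a small made-up array; `fit_min_max` and `apply_min_max` are hypothetical helper names, and the numbers are illustrative, not taken from the dataset.

```python
import numpy as np

def fit_min_max(train):
    """Return per-column minimum and range computed on the training rows only."""
    col_min = train.min(axis=0)
    col_range = train.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # avoid division by zero for constant columns
    return col_min, col_range

def apply_min_max(x, col_min, col_range):
    """Scale rows with statistics that were fitted elsewhere."""
    return (x - col_min) / col_range

# Made-up rows: temperature (°C) and humidity (%).
train = np.array([[20.0, 40.0], [30.0, 60.0], [25.0, 80.0]])
test = np.array([[35.0, 50.0]])

col_min, col_range = fit_min_max(train)
train_scaled = apply_min_max(train, col_min, col_range)
test_scaled = apply_min_max(test, col_min, col_range)
print(test_scaled)
```

Test values can fall outside [0, 1] (here the temperature maps to 1.5), which is expected. With scikit-learn the same pattern is `scaler.fit_transform(X_train)` followed by `scaler.transform(X_test)`.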

In [10]:
input_shape = scaled_features.shape[1]

In [11]:
num_classes = len(le.classes_)
num_classes

4

In [12]:
model_1 = Sequential([
    Input(shape=(input_shape,)),
    Dense(64, activation='relu', kernel_initializer='uniform'),
    Dense(16, activation='relu', kernel_initializer='uniform'),
    Dense(num_classes, activation='softmax', kernel_initializer='uniform')
])
model_1.compile(optimizer=Adam(learning_rate=0.001), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
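
The `sparse_categorical_crossentropy` loss takes integer class labels (as produced by `LabelEncoder`) and, for each sample, penalises the negative log of the probability that the softmax output assigns to the true class. A minimal NumPy sketch of that definition with made-up probabilities; the function below is a hand-written illustration, not Keras's implementation:

```python
import numpy as np

def sparse_categorical_crossentropy(y_true, probs, eps=1e-7):
    """Mean of -log(p[true class]) over samples; y_true holds integer labels."""
    probs = np.clip(probs, eps, 1.0)  # guard against log(0)
    picked = probs[np.arange(len(y_true)), y_true]
    return float(-np.log(picked).mean())

# Made-up softmax outputs for 2 samples and 4 classes.
probs = np.array([[0.7, 0.1, 0.1, 0.1],
                  [0.25, 0.25, 0.25, 0.25]])
y_true = np.array([0, 3])
loss = sparse_categorical_crossentropy(y_true, probs)
print(round(loss, 4))
```

A uniform guess over 4 classes costs ln 4 ≈ 1.386 per sample, which is a useful reference point when reading the first-epoch loss of a freshly initialised network.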

In [13]:
model_1.summary()

In [14]:
model_2 = Sequential([
    Input(shape=(input_shape,)),
    Dense(64, activation='tanh'),
    Dense(32, activation='tanh'),
    Dropout(0.3),
    Dense(16, activation='tanh'),
    Dense(num_classes, activation='softmax')
])
model_2.compile(optimizer=Adam(learning_rate=0.001), loss='sparse_categorical_crossentropy', metrics=['accuracy'])

In [15]:
model_2.summary()

In [16]:
model_3 = Sequential([
    Input(shape=(input_shape,)),
    Dense(128),
    LeakyReLU(alpha=0.1),
    Dense(64),
    LeakyReLU(alpha=0.1),
    Dense(32),
    LeakyReLU(alpha=0.1),
    Dense(num_classes, activation='softmax')
])
model_3.compile(optimizer=Adam(learning_rate=0.001), loss='sparse_categorical_crossentropy', metrics=['accuracy'])



In [17]:
model_3.summary()
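
The `summary()` outputs are not reproduced here, but the parameter counts can be worked out by hand: a `Dense` layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases, while `Dropout` and `LeakyReLU` add no trainable parameters. The sketch assumes 9 input features (the nine listed in the dataset description); `dense_params` is a hypothetical helper.

```python
def dense_params(n_inputs, layer_units):
    """Total weights + biases for a stack of fully connected layers."""
    total = 0
    n_in = n_inputs
    for n_out in layer_units:
        total += n_in * n_out + n_out  # weight matrix + bias vector
        n_in = n_out
    return total

n_features = 9  # assumption: the nine features from the dataset description
models = {
    "model_1": [64, 16, 4],
    "model_2": [64, 32, 16, 4],
    "model_3": [128, 64, 32, 4],
}
for name, units in models.items():
    print(name, dense_params(n_features, units))
# model_1 1748
# model_2 3316
# model_3 11748
```

Under that assumption, model_3 has roughly 6.7 times as many trainable parameters as model_1.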

In [19]:
history_1 = model_1.fit(X_train, Y_train, validation_split=0.1, epochs=64, batch_size=32)

Epoch 1/64
113/113 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.5772 - loss: 1.3525 - val_accuracy: 0.6625 - val_loss: 1.0076
Epoch 2/64
113/113 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.7813 - loss: 0.8070 - val_accuracy: 0.8225 - val_loss: 0.4924
Epoch 3/64
113/113 ━━━━━━━━━━━━━━━━━━━━ 0s 995us/step - accuracy: 0.8662 - loss: 0.4243 - val_accuracy: 0.9000 - val_loss: 0.3383
Epoch 4/64
113/113 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.9160 - loss: 0.3068 - val_accuracy: 0.9050 - val_loss: 0.2749
Epoch 5/64
113/113 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.9283 - loss: 0.2597 - val_accuracy: 0.9300 - val_loss: 0.2491
Epoch 6/64
113/113 ━━━━━━━━━━━━━━━━━━━━ 0s 999us/step - accuracy: 0.9233 - loss: 0.2299 - val_accuracy: 0.9150 - val_loss: 0.2409
Epoch 7/64
113/113 ...

In [20]:
history_2 = model_2.fit(X_train, Y_train, validation_split=0.1, epochs=64, batch_size=32)

Epoch 1/64
113/113 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.5562 - loss: 1.0792 - val_accuracy: 0.8225 - val_loss: 0.5048
Epoch 2/64
113/113 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.8412 - loss: 0.4512 - val_accuracy: 0.8750 - val_loss: 0.3397
Epoch 3/64
113/113 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.8571 - loss: 0.3702 - val_accuracy: 0.9125 - val_loss: 0.2789
Epoch 4/64
113/113 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.8986 - loss: 0.2968 - val_accuracy: 0.9250 - val_loss: 0.2445
Epoch 5/64
113/113 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.9147 - loss: 0.2660 - val_accuracy: 0.9225 - val_loss: 0.2233
Epoch 6/64
113/113 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.9095 - loss: 0.2426 - val_accuracy: 0.9350 - val_loss: 0.2044
Epoch 7/64
113/113 ...

In [21]:
history_3 = model_3.fit(X_train, Y_train, validation_split=0.1, epochs=64, batch_size=32)

Epoch 1/64
113/113 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.6694 - loss: 1.0531 - val_accuracy: 0.8425 - val_loss: 0.4058
Epoch 2/64
113/113 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.8742 - loss: 0.3334 - val_accuracy: 0.9000 - val_loss: 0.2750
Epoch 3/64
113/113 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.9223 - loss: 0.2447 - val_accuracy: 0.9075 - val_loss: 0.2321
Epoch 4/64
113/113 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.9279 - loss: 0.2120 - val_accuracy: 0.9250 - val_loss: 0.2082
Epoch 5/64
113/113 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.9222 - loss: 0.1988 - val_accuracy: 0.9175 - val_loss: 0.2278
Epoch 6/64
113/113 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.9271 - loss: 0.1850 - val_accuracy: 0.9375 - val_loss: 0.1742
Epoch 7/64
113/113 ...

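The `113/113` in each epoch line is the number of batches per epoch. It follows from the split sizes: 20% of the samples are held out for testing, and a further 10% of the training rows are held out by `validation_split=0.1`. A short arithmetic check, assuming the CSV has exactly the 5,000 rows stated in the dataset description:

```python
import math

n_samples = 5000                    # assumption: row count from the dataset description
n_test = n_samples // 5             # test_size=0.2        -> 1000
n_train_full = n_samples - n_test   #                      -> 4000
n_val = n_train_full // 10          # validation_split=0.1 -> 400
n_train = n_train_full - n_val      #                      -> 3600

batch_size = 32
steps_per_epoch = math.ceil(n_train / batch_size)  # batches per training epoch
predict_steps = math.ceil(n_test / batch_size)     # batches when predicting on X_test
print(steps_per_epoch, predict_steps)
# 113 32
```

So each model is trained on 3,600 rows, validated on 400, and evaluated on the 1,000 test rows.
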
In [25]:
y_pred_1 = model_1.predict(X_test).argmax(axis=1)
y_pred_2 = model_2.predict(X_test).argmax(axis=1)
y_pred_3 = model_3.predict(X_test).argmax(axis=1)

32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 796us/step
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 791us/step
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 763us/step

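`predict` returns one row of four softmax probabilities per sample, and `argmax(axis=1)` picks the index of the largest one as the predicted class. A small NumPy sketch with made-up probabilities; the `class_names` order assumes the alphabetical encoding that `LabelEncoder` produces:

```python
import numpy as np

# Alphabetical order, as assigned by LabelEncoder (assumed here, not read from `le`).
class_names = np.array(["Good", "Hazardous", "Moderate", "Poor"])

# Made-up softmax outputs for three samples (each row sums to 1).
probs = np.array([
    [0.90, 0.01, 0.08, 0.01],
    [0.02, 0.55, 0.03, 0.40],
    [0.10, 0.05, 0.60, 0.25],
])

pred_codes = probs.argmax(axis=1)      # index of the most probable class per row
pred_labels = class_names[pred_codes]  # map codes back to readable names
print(pred_codes, pred_labels)
```

In the notebook, `le.inverse_transform(y_pred_1)` performs the same mapping from codes back to label strings.
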

In [26]:
print(classification_report(Y_test, y_pred_1))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00       409
           1       0.95      0.77      0.85       111
           2       0.94      0.96      0.95       294
           3       0.83      0.88      0.85       186

    accuracy                           0.94      1000
   macro avg       0.93      0.90      0.91      1000
weighted avg       0.94      0.94      0.94      1000
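
The `macro avg` row is the plain mean of the per-class scores, while `weighted avg` weights each class by its support. The check below recomputes both recall averages for model 1 from the rounded values printed in the report above, so the results only match to two decimals:

```python
# Per-class recall and support for model 1, copied from its report above.
recall = [1.00, 0.77, 0.96, 0.88]
support = [409, 111, 294, 186]

macro_recall = sum(recall) / len(recall)
weighted_recall = sum(r * s for r, s in zip(recall, support)) / sum(support)

print(round(macro_recall, 2), round(weighted_recall, 2))
# 0.9 0.94
```

Support-weighted recall equals overall accuracy, which is why both read 0.94. Because class 1 has the smallest support (111 of 1,000), the macro average is more sensitive to how well that class is classified.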



In [27]:
print(classification_report(Y_test, y_pred_2))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00       409
           1       0.88      0.82      0.85       111
           2       0.96      0.95      0.96       294
           3       0.83      0.87      0.85       186

    accuracy                           0.94      1000
   macro avg       0.92      0.91      0.91      1000
weighted avg       0.94      0.94      0.94      1000



In [28]:
print(classification_report(Y_test, y_pred_3))

              precision    recall  f1-score   support

           0       0.99      1.00      0.99       409
           1       0.91      0.87      0.89       111
           2       0.95      0.96      0.95       294
           3       0.88      0.87      0.88       186

    accuracy                           0.95      1000
   macro avg       0.93      0.92      0.93      1000
weighted avg       0.95      0.95      0.95      1000
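
Each row of these reports is derived from a confusion matrix: precision is the share of predictions for a class that were correct, and recall is the share of true members of a class that were found. A dependency-free sketch on made-up labels (not the notebook's predictions); the helper names are hypothetical:

```python
def confusion_matrix_counts(y_true, y_pred, n_classes):
    """matrix[t][p] = number of samples with true class t predicted as p."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

def precision_recall(matrix, cls):
    """Precision and recall for one class, read off the confusion matrix."""
    true_pos = matrix[cls][cls]
    predicted = sum(row[cls] for row in matrix)  # column sum
    actual = sum(matrix[cls])                    # row sum
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall

# Made-up labels for illustration only.
y_true = [0, 0, 1, 1, 1, 3, 3, 2]
y_pred = [0, 0, 1, 3, 3, 3, 3, 2]

cm = confusion_matrix_counts(y_true, y_pred, n_classes=4)
print(precision_recall(cm, 1))  # class 1: precise but with missed samples
print(precision_recall(cm, 3))  # class 3: all found, but with false alarms
```

In the notebook, `confusion_matrix(Y_test, y_pred_1)` (already imported) would show which classes the real misclassifications fall into, for example where the missed class 1 (Hazardous) samples end up.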



- Model 1: shallow network with ReLU activation
- Model 2: deeper network with Tanh activation
- Model 3: wider network with LeakyReLU activation

Model 3 gives the best performance: it reaches 95% accuracy (versus 94% for Models 1 and 2) and handles all 4 air quality categories in the most balanced way, with a macro-averaged F1-score of 0.93 versus 0.91 for the other two models.