# Laboratory exercise 4

## Warm-Up Mode (2 points)

**Task Description**  
Using the given dataset, develop and implement **3** different neural networks to predict the **air quality level**. Each network should differ in the following ways:  

- **layer configurations** - use different numbers and types of layers;
- **activation functions** - try different activation functions;
- **neurons per layer** - experiment with different numbers of neurons in each layer; and
- **number of layers** - build networks with varying depths.

After developing the models, evaluate and compare the performance of all **3** approaches.

**About the Dataset**  
This dataset focuses on air quality assessment across various regions. The dataset contains 5,000 samples and captures critical environmental and demographic factors that influence pollution levels.

**Features**:  
- **Temperature (°C)**: Average temperature of the region.  
- **Humidity (%)**: Relative humidity recorded in the region.  
- **PM2.5 Concentration (µg/m³)**: Levels of fine particulate matter.  
- **PM10 Concentration (µg/m³)**: Levels of coarse particulate matter.  
- **NO2 Concentration (ppb)**: Nitrogen dioxide levels.  
- **SO2 Concentration (ppb)**: Sulfur dioxide levels.  
- **CO Concentration (ppm)**: Carbon monoxide levels.  
- **Proximity to Industrial Areas (km)**: Distance to the nearest industrial zone.  
- **Population Density (people/km²)**: Number of people per square kilometer in the region.  

**Target Variable**: **Air Quality**  
- **Good**: Clean air with low pollution levels.  
- **Moderate**: Acceptable air quality but with some pollutants present.  
- **Poor**: Noticeable pollution that may cause health issues for sensitive groups.  
- **Hazardous**: Highly polluted air posing serious health risks to the population.  

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# For machine learning preprocessing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

# For building neural networks
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

print("Libraries imported successfully.")


Libraries imported successfully.


In [2]:
data = pd.read_csv('pollution_dataset.csv')
print("Dataset loaded. Here are the first 5 rows:")
data.head()


Dataset loaded. Here are the first 5 rows:


| | Temperature | Humidity | PM2.5 | PM10 | NO2 | SO2 | CO | Proximity_to_Industrial_Areas | Population_Density | Air Quality |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 29.8 | 59.1 | 5.2 | 17.9 | 18.9 | 9.2 | 1.72 | 6.3 | 319 | Moderate |
| 1 | 28.3 | 75.6 | 2.3 | 12.2 | 30.8 | 9.7 | 1.64 | 6.0 | 611 | Moderate |
| 2 | 23.1 | 74.7 | 26.7 | 33.8 | 24.4 | 12.6 | 1.63 | 5.2 | 619 | Moderate |
| 3 | 27.1 | 39.1 | 6.1 | 6.3 | 13.5 | 5.3 | 1.15 | 11.1 | 551 | Good |
| 4 | 26.5 | 70.7 | 6.9 | 16.0 | 21.9 | 5.6 | 1.01 | 12.7 | 303 | Good |


In [3]:
print("Dataset Info:")
data.info()  # info() prints its summary directly and returns None, so it is not wrapped in print()

print("\nData Description:")
display(data.describe())

print("\nChecking class distribution in 'Air Quality':")
print(data['Air Quality'].value_counts())


Dataset Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 10 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   Temperature                    5000 non-null   float64
 1   Humidity                       5000 non-null   float64
 2   PM2.5                          5000 non-null   float64
 3   PM10                           5000 non-null   float64
 4   NO2                            5000 non-null   float64
 5   SO2                            5000 non-null   float64
 6   CO                             5000 non-null   float64
 7   Proximity_to_Industrial_Areas  5000 non-null   float64
 8   Population_Density             5000 non-null   int64  
 9   Air Quality                    5000 non-null   object 
dtypes: float64(8), int64(1), object(1)
memory usage: 390.8+ KB

Data Description:


| | Temperature | Humidity | PM2.5 | PM10 | NO2 | SO2 | CO | Proximity_to_Industrial_Areas | Population_Density |
|---|---|---|---|---|---|---|---|---|---|
| count | 5000.0 | 5000.0 | 5000.0 | 5000.0 | 5000.0 | 5000.0 | 5000.0 | 5000.0 | 5000.0 |
| mean | 30.02902 | 70.05612 | 20.14214 | 30.21836 | 26.4121 | 10.01482 | 1.500354 | 8.4254 | 497.4238 |
| std | 6.720661 | 15.863577 | 24.554546 | 27.349199 | 8.895356 | 6.750303 | 0.546027 | 3.610944 | 152.754084 |
| min | 13.4 | 36.0 | 0.0 | -0.2 | 7.4 | -6.2 | 0.65 | 2.5 | 188.0 |
| 25% | 25.1 | 58.3 | 4.6 | 12.3 | 20.1 | 5.1 | 1.03 | 5.4 | 381.0 |
| 50% | 29.0 | 69.8 | 12.0 | 21.7 | 25.3 | 8.0 | 1.41 | 7.9 | 494.0 |
| 75% | 34.0 | 80.3 | 26.1 | 38.1 | 31.9 | 13.725 | 1.84 | 11.1 | 600.0 |
| max | 58.6 | 128.1 | 295.0 | 315.8 | 64.9 | 44.9 | 3.72 | 25.8 | 957.0 |



Checking class distribution in 'Air Quality':
Air Quality
Good         2000
Moderate     1500
Poor         1000
Hazardous     500
Name: count, dtype: int64
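
The summary statistics above include a few values that look physically implausible: the minimum PM10 and SO2 readings are negative (-0.2 and -6.2), and the maximum humidity is 128.1%. A missing-value check alone would not flag these. Below is a minimal sketch of how such rows could be counted. It uses a small hypothetical frame rather than the real file; the column names match the dataset, but the values, the thresholds, and the `count_out_of_range` helper are illustrative assumptions.

```python
import pandas as pd

# Hypothetical sample rows; the real data would come from pollution_dataset.csv.
sample = pd.DataFrame({
    "Humidity": [59.1, 128.1, 70.7, 101.5],
    "PM10": [17.9, -0.2, 16.0, 33.8],
    "SO2": [9.2, 5.3, -6.2, 12.6],
})

def count_out_of_range(df):
    """Count values outside assumed plausible ranges, per check."""
    return {
        "Humidity > 100": int((df["Humidity"] > 100).sum()),
        "PM10 < 0": int((df["PM10"] < 0).sum()),
        "SO2 < 0": int((df["SO2"] < 0).sum()),
    }

report = count_out_of_range(sample)
print(report)
```

Whether to clip, drop, or keep such rows is a judgment call. Small negative readings may simply reflect sensor noise around zero.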


In [4]:
# 1) Handle Missing Values
data.dropna(inplace=True)

# 2) Encode the Target Variable
label_encoder = LabelEncoder()
data['Air Quality Encoded'] = label_encoder.fit_transform(data['Air Quality'])

# 3) Split into Features (X) and Target (y)
X = data.drop(columns=['Air Quality', 'Air Quality Encoded'])  # drop original and encoded target
y = data['Air Quality Encoded']

# 4) Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, 
                                                    test_size=0.2, 
                                                    random_state=42)

# 5) Scale the Features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print("Data preprocessing completed.")
print("X_train_scaled shape:", X_train_scaled.shape)
print("y_train shape:", y_train.shape)


Data preprocessing completed.
X_train_scaled shape: (4000, 9)
y_train shape: (4000,)
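
`LabelEncoder` assigns integer codes in sorted (alphabetical) order of the class names, so the codes do not follow the severity order Good, Moderate, Poor, Hazardous. This is harmless for `sparse_categorical_crossentropy`, which treats the codes as category indices, but it matters when reading predictions or a confusion matrix. A small self-contained sketch of the mapping (the variable names here are illustrative and separate from the notebook's `label_encoder`):

```python
from sklearn.preprocessing import LabelEncoder

classes = ["Good", "Moderate", "Poor", "Hazardous"]

encoder = LabelEncoder()
encoder.fit(classes)

# classes_ holds the labels in the order of their integer codes.
mapping = {str(label): code for code, label in enumerate(encoder.classes_)}
print(mapping)

# inverse_transform turns integer codes back into readable labels.
decoded = [str(label) for label in encoder.inverse_transform([0, 1, 2, 3])]
print(decoded)
```

In the notebook itself, `label_encoder.classes_` gives the same ordering for the fitted encoder.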


In [5]:
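# Model 1: shallow baseline with two small ReLU hidden layers
model1 = keras.Sequential([
    keras.Input(shape=(X_train_scaled.shape[1],)),  # explicit input layer instead of input_shape on Dense
    layers.Dense(16, activation='relu'),
    layers.Dense(8, activation='relu'),
    layers.Dense(4, activation='softmax')  # one output unit per air quality class
])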

model1.compile(optimizer='adam',
               loss='sparse_categorical_crossentropy',
               metrics=['accuracy'])

history1 = model1.fit(X_train_scaled, y_train, 
                      epochs=20, 
                      batch_size=32,
                      validation_split=0.2, 
                      verbose=1)

print("Model 1 training completed.")


Epoch 1/20
100/100 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.4428 - loss: 1.1791 - val_accuracy: 0.6875 - val_loss: 0.8157
Epoch 2/20
100/100 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.7136 - loss: 0.7532 - val_accuracy: 0.8250 - val_loss: 0.5489
Epoch 3/20
100/100 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.8406 - loss: 0.4982 - val_accuracy: 0.8775 - val_loss: 0.4076
Epoch 4/20
100/100 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.8725 - loss: 0.3775 - val_accuracy: 0.8875 - val_loss: 0.3357
Epoch 5/20
100/100 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.8829 - loss: 0.3287 - val_accuracy: 0.8925 - val_loss: 0.2905
Epoch 6/20
100/100 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.9029 - loss: 0.2857 - val_accuracy: 0.9062 - val_loss: 0.2583
Epoch 7/20
100/100 ━━━━━━━

In [6]:
loss1, accuracy1 = model1.evaluate(X_test_scaled, y_test, verbose=0)
print("Model 1 - Test Loss:", loss1)
print("Model 1 - Test Accuracy:", accuracy1)


Model 1 - Test Loss: 0.16157551109790802
Model 1 - Test Accuracy: 0.9430000185966492
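
Because the classes are imbalanced (2,000 Good versus 500 Hazardous samples), overall accuracy can hide weak performance on the rarer classes. A confusion matrix and per-class recall make this visible. The sketch below uses only numpy and made-up label arrays. With the notebook's objects, `y_pred` could be obtained as `model1.predict(X_test_scaled).argmax(axis=1)`; the `confusion_matrix` helper here is a hand-written illustration, not the scikit-learn function of the same name.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)  # accumulate one count per (true, predicted) pair
    return cm

# Made-up labels for illustration (integer codes 0..3, as produced by a label encoder).
y_true = np.array([0, 0, 0, 0, 1, 1, 2, 2, 2, 3])
y_pred = np.array([0, 0, 0, 2, 1, 3, 2, 2, 0, 3])

cm = confusion_matrix(y_true, y_pred, n_classes=4)
recall = cm.diagonal() / cm.sum(axis=1)  # fraction of each true class predicted correctly

print(cm)
print(recall)
```

A low recall on a single row points to the class the model struggles with, even when the overall accuracy looks high.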


In [7]:
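# Model 2: one more hidden layer than Model 1, wider, with a tanh layer before the output
model2 = keras.Sequential([
    keras.Input(shape=(X_train_scaled.shape[1],)),  # explicit input layer instead of input_shape on Dense
    layers.Dense(32, activation='relu'),
    layers.Dense(16, activation='relu'),
    layers.Dense(8, activation='tanh'),
    layers.Dense(4, activation='softmax')  # one output unit per air quality class
])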

model2.compile(optimizer='adam',
               loss='sparse_categorical_crossentropy',
               metrics=['accuracy'])

history2 = model2.fit(X_train_scaled, y_train, 
                      epochs=20, 
                      batch_size=32,
                      validation_split=0.2, 
                      verbose=1)

print("Model 2 training completed.")


Epoch 1/20
100/100 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.4578 - loss: 1.2283 - val_accuracy: 0.7237 - val_loss: 0.7091
Epoch 2/20
100/100 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.7861 - loss: 0.6308 - val_accuracy: 0.8500 - val_loss: 0.4572
Epoch 3/20
100/100 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.8634 - loss: 0.4223 - val_accuracy: 0.9075 - val_loss: 0.3543
Epoch 4/20
100/100 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.9035 - loss: 0.3341 - val_accuracy: 0.9137 - val_loss: 0.2981
Epoch 5/20
100/100 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.9172 - loss: 0.2825 - val_accuracy: 0.9225 - val_loss: 0.2648
Epoch 6/20
100/100 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.9263 - loss: 0.2559 - val_accuracy: 0.9200 - val_loss: 0.2460
Epoch 7/20
100/100

In [8]:
loss2, accuracy2 = model2.evaluate(X_test_scaled, y_test, verbose=0)
print("Model 2 - Test Loss:", loss2)
print("Model 2 - Test Accuracy:", accuracy2)


Model 2 - Test Loss: 0.14880283176898956
Model 2 - Test Accuracy: 0.9490000009536743
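
The networks in this exercise mix three hidden-layer activations: ReLU, tanh, and ELU. A numpy sketch of what each computes may make the differences concrete. ELU is shown with `alpha = 1.0`, which is the Keras default; the function names are illustrative.

```python
import numpy as np

def relu(x):
    # Passes positive values through and zeroes out negatives.
    return np.maximum(0.0, x)

def elu(x, alpha=1.0):
    # Identity for positive inputs, alpha * (exp(x) - 1) otherwise.
    # expm1 is applied to the clipped input to avoid overflow for large positive x.
    return np.where(x > 0, x, alpha * np.expm1(np.minimum(x, 0.0)))

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print("relu:", relu(x))
print("tanh:", np.tanh(x))
print("elu: ", elu(x))
```

ReLU discards negative inputs entirely, tanh squashes everything into the interval (-1, 1), and ELU keeps small negative outputs that saturate at `-alpha`.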


In [9]:
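# Model 3: deepest and widest network, combining ReLU and ELU hidden layers
model3 = keras.Sequential([
    keras.Input(shape=(X_train_scaled.shape[1],)),  # explicit input layer instead of input_shape on Dense
    layers.Dense(64, activation='relu'),
    layers.Dense(32, activation='relu'),
    layers.Dense(16, activation='elu'),
    layers.Dense(8, activation='elu'),
    layers.Dense(4, activation='softmax')  # one output unit per air quality class
])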

model3.compile(optimizer='adam',
               loss='sparse_categorical_crossentropy',
               metrics=['accuracy'])

history3 = model3.fit(X_train_scaled, y_train, 
                      epochs=20, 
                      batch_size=32,
                      validation_split=0.2, 
                      verbose=1)

print("Model 3 training completed.")


Epoch 1/20
100/100 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.5945 - loss: 0.9856 - val_accuracy: 0.8925 - val_loss: 0.3586
Epoch 2/20
100/100 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.8916 - loss: 0.3121 - val_accuracy: 0.9137 - val_loss: 0.2230
Epoch 3/20
100/100 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.9270 - loss: 0.1921 - val_accuracy: 0.9100 - val_loss: 0.2127
Epoch 4/20
100/100 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.9347 - loss: 0.1699 - val_accuracy: 0.9250 - val_loss: 0.1827
Epoch 5/20
100/100 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.9315 - loss: 0.1701 - val_accuracy: 0.9262 - val_loss: 0.1871
Epoch 6/20
100/100 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.9466 - loss: 0.1534 - val_accuracy: 0.9325 - val_loss: 0.1688
Epoch 7/20
100/100

In [10]:
loss3, accuracy3 = model3.evaluate(X_test_scaled, y_test, verbose=0)
print("Model 3 - Test Loss:", loss3)
print("Model 3 - Test Accuracy:", accuracy3)


Model 3 - Test Loss: 0.1589588224887848
Model 3 - Test Accuracy: 0.9390000104904175
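
The three networks differ mainly in depth and width. Counting trainable parameters gives a rough sense of their relative capacity: a `Dense` layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. The helper below is a hypothetical convenience for this calculation, not part of Keras; calling `summary()` on each model should report the same totals.

```python
def dense_param_count(n_inputs, layer_sizes):
    """Total weights and biases for a stack of fully connected layers."""
    total = 0
    for n_out in layer_sizes:
        total += n_inputs * n_out + n_out  # weight matrix plus bias vector
        n_inputs = n_out                   # this layer's output feeds the next layer
    return total

# Layer widths as defined in the notebook; 9 is the number of input features.
architectures = {
    "Model 1": [16, 8, 4],
    "Model 2": [32, 16, 8, 4],
    "Model 3": [64, 32, 16, 8, 4],
}

counts = {name: dense_param_count(9, sizes) for name, sizes in architectures.items()}
for name, n_params in counts.items():
    print(f"{name}: {n_params} trainable parameters")
```

This gives 332, 1,020, and 3,420 parameters respectively, so Model 3 is roughly ten times larger than Model 1 while reaching a similar test accuracy.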


In [11]:
models_accuracies = {
    "Model 1": accuracy1,
    "Model 2": accuracy2,
    "Model 3": accuracy3
}

print("Comparison of Test Accuracies:")
for m, acc in models_accuracies.items():
    print(f"{m}: {acc:.4f}")


Comparison of Test Accuracies:
Model 1: 0.9430
Model 2: 0.9490
Model 3: 0.9390
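
The three test accuracies lie within about one percentage point of each other. With 1,000 test samples (20% of 5,000), sampling noise alone is of similar size, so the ranking should be read with caution. Below is a rough sketch using the binomial standard error sqrt(p(1 - p) / n). It assumes independent test predictions and ignores run-to-run variation from random weight initialization; the helper name is illustrative.

```python
import math

def accuracy_standard_error(acc, n):
    """Binomial standard error of an accuracy estimated from n samples."""
    return math.sqrt(acc * (1.0 - acc) / n)

n_test = 1000  # 20% of the 5,000 rows
accuracies = {"Model 1": 0.9430, "Model 2": 0.9490, "Model 3": 0.9390}

half_widths = {}
for name, acc in accuracies.items():
    se = accuracy_standard_error(acc, n_test)
    half_widths[name] = 1.96 * se  # approximate 95% interval half-width
    print(f"{name}: {acc:.3f} +/- {half_widths[name]:.3f}")
```

Each interval is about 1.4 percentage points wide on either side, so the intervals overlap substantially. Repeating training with several random seeds would give a firmer basis for preferring one architecture.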
