### Air Quality Prediction

The goal of this project is to predict the type of activity from indoor gas concentration data.

1. The dataset contains indoor gas concentration levels collected by 6 low-cost sensors.
2. The first 6 columns hold the readings from the MQ sensors.
3. The last column indicates the activity that generated the sensor readings (the code names this column `CO2`).

1 - Normal situation - Activity: clean air, a person sleeping or studying or resting - Samples: 595.

2 - Preparing meals - Activities: cooking meat or pasta, fried vegetables. One or two people in the room, forced air circulation - Samples: 515.

3 - Presence of smoke - Activity: burning paper and wood for a short period of time in a room with closed windows and doors - Samples: 195.

4 - Cleaning - Activity: use of spray and liquid detergents with ammonia and / or alcohol. Forced air circulation can be activated or deactivated - Samples: 540.
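
The four numeric labels can be mapped to readable names for plots and reports. The sketch below is only an illustration: `ACTIVITY_LABELS` and `SAMPLE_COUNTS` are hypothetical names, the label strings paraphrase the descriptions above, and the counts are copied from the list.

```python
# Hypothetical lookup tables; label strings paraphrase the class descriptions.
ACTIVITY_LABELS = {
    1: "Normal situation",
    2: "Preparing meals",
    3: "Presence of smoke",
    4: "Cleaning",
}

# Sample counts per class, as listed in the dataset description.
SAMPLE_COUNTS = {1: 595, 2: 515, 3: 195, 4: 540}

total = sum(SAMPLE_COUNTS.values())
for label, count in SAMPLE_COUNTS.items():
    share = count / total  # fraction of the whole dataset in this class
    print(f"{label} - {ACTIVITY_LABELS[label]}: {count} samples ({share:.1%})")
print("Total:", total)
```

The counts add up to the 1845 rows that `data.info()` reports, and the smoke class makes up roughly a tenth of the samples, so the classes are not balanced.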

## Import Dependencies

In [4]:
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split

from sklearn.ensemble import RandomForestClassifier

import shap

In [5]:
data = pd.read_csv('dataset.csv', names=['MQ1', 'MQ2', 'MQ3', 'MQ4', 'MQ5', 'MQ6', 'CO2'])

In [6]:
data.head()

   MQ1  MQ2   MQ3   MQ4   MQ5   MQ6  CO2
0  670  696  1252  1720  1321  2431    4
1  641  674  1156  1652  1410  2433    4
2  642  646  1159  1643  1455  2361    4
3  640  590  1105  1608  1459  2427    4
4  616  627  1192  1637  1466  2447    4


In [7]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1845 entries, 0 to 1844
Data columns (total 7 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   MQ1     1845 non-null   int64
 1   MQ2     1845 non-null   int64
 2   MQ3     1845 non-null   int64
 3   MQ4     1845 non-null   int64
 4   MQ5     1845 non-null   int64
 5   MQ6     1845 non-null   int64
 6   CO2     1845 non-null   int64
dtypes: int64(7)
memory usage: 101.0 KB


## Preprocessing

In [8]:
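def preprocess_inputs(df):
    """Split the frame into features/target and into train/test subsets."""
    df = df.copy()

    # Target: the activity label (stored in the column named 'CO2')
    y = df['CO2']
    # Features: the six MQ sensor readings
    X = df.drop('CO2', axis=1)

    # 70/30 train-test split, shuffled with a fixed seed for reproducibility
    X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, shuffle=True, random_state=1)

    return X_train, X_test, y_train, y_test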

In [9]:
X_train, X_test, y_train, y_test = preprocess_inputs(data)

In [10]:
len(X_train)

1291
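
The training-set size follows from `train_size=0.7`. As far as scikit-learn's behavior goes, a fractional `train_size` is rounded down to a whole number of samples and the remainder goes to the test set. The arithmetic can be checked directly (variable names here are illustrative):

```python
import math

n_samples = 1845   # rows reported by data.info()
train_size = 0.7   # fraction passed to train_test_split

# A fractional training size is rounded down to a whole number of samples.
n_train = math.floor(train_size * n_samples)
# With no explicit test_size, the remaining samples form the test set.
n_test = n_samples - n_train

print(n_train, n_test)
```

The split here is not stratified, so class proportions in the two subsets may differ slightly from the full dataset. Passing `stratify=y` to `train_test_split` would preserve them.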

## Training

In [11]:
model = RandomForestClassifier(random_state=1)

In [12]:
model.fit(X_train, y_train)

In [13]:
acc = model.score(X_test, y_test)
print("Accuracy: {:.2f}%".format(acc * 100))

Accuracy: 94.40%
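
Overall accuracy can hide weak performance on the smallest class (smoke, 195 samples), so a per-class breakdown is a useful complement. The sketch below uses only the standard library. `per_class_recall` is a hypothetical helper, and the labels are toy values for illustration, not output from the model in this notebook.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Return {label: fraction of that label's samples predicted correctly}."""
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / totals[label] for label in sorted(totals)}


# Toy labels for illustration only; these are NOT results from the model above.
y_true = [1, 1, 1, 1, 2, 2, 2, 3, 3, 4]
y_pred = [1, 1, 1, 1, 2, 2, 2, 4, 3, 4]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall = per_class_recall(y_true, y_pred)

print(f"Accuracy: {accuracy:.2%}")
print(recall)
```

In the toy data, accuracy is high even though half of the class 3 samples are misclassified. With the real data, `y_true` would be `y_test` and `y_pred` would be `model.predict(X_test)`. `sklearn.metrics.classification_report` and `sklearn.metrics.confusion_matrix` provide the same kind of breakdown.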
