# Report for 2D Project Physical World and Digital World

Cohort: 07

Team No.: 07

Members:
* Ang Jing Yuen Andre (1003308)
* Tim Yap Ming En (1003495)
* Cao Bingquan (1003881)
* Dionetta Young (1003735)
* Lan Xiaojin (1003773)


# Introduction

The objective is to avoid the waiting time required for a thermometer to reach thermal equilibrium with a water bath before its temperature can be read. Our solution is a program that reads data from a DS18B20 temperature sensor and uses machine learning and statistical analysis to predict the actual temperature of a water bath, within the range of 10℃ to 60℃, accurately and in the shortest time possible.

Based on experimental data from the Physical World analysis, we found that the initial gradient of the sensor's temperature-against-time graph is linearly related to the temperature of the water bath. The initial gradient over the first 20 seconds can be computed in Python from the sensor readings. A linear regression model can therefore predict the temperature of the water bath from this initial gradient.
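Written as equations, using notation introduced here for illustration only: $(t_0, T_0)$ is the first sensor reading of a dataset, $(t_i, T_i)$ are the $N$ later readings used, $\bar{g}$ is the average initial gradient (as described under Data Preparation), and $m$ and $c$ are the coefficient and intercept fitted by linear regression.

$$
\bar{g} = \frac{1}{N}\sum_{i} \frac{T_i - T_0}{t_i - t_0}, \qquad \hat{T}_w = m\,\bar{g} + c
$$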

# Description of Data from Experiment

## Data Collection

For each target temperature, a water bath is prepared in an insulated 1 L container. A laboratory thermometer records the temperature of the water bath, and the temperature sensor is placed in the same bath so that its readings can be tracked over time.

We created a class called StopWatch to record the elapsed time.

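The function read_temp() returns the sensor's temperature reading in degrees Celsius. For the first 20 seconds, the function collect_data() repeatedly takes a reading through read_temp(), records it together with the elapsed time as a tuple, and pauses 0.1 seconds before the next reading. Because each sensor read also takes time, consecutive readings in the example below are roughly 1 second apart. Once 20 seconds have elapsed, collect_data() asks for the thermometer reading of the water bath, appends it to the list, and returns the list.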

The resulting list, made up of (elapsed time, sensor temperature) tuples of floats followed by the thermometer reading stored as a string, is then appended to a text file.

The following is an example of the data. 
e.g. [(0.901, 25.812), (1.941, 29.562), (2.901, 34.937), (3.861, 38.125), (4.901, 40.312), (5.861, 41.875), (6.821, 43.0), (7.781, 44.0), (8.741, 45.0), (9.701, 45.937), (10.661, 46.812), (11.621, 47.625), (12.581, 48.312), (13.541, 48.875), (14.521, 49.375), (15.541, 49.875), (16.501, 50.375), (17.461, 50.812), (18.421, 51.187), (19.381, 51.5), (20.341, 51.75), '55.5']

A total of 1332 data points of elapsed time and sensor temperature were collected at 37 different water-bath temperatures.
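read_temp() in the code below depends on the text format of the w1_slave file exposed by the Linux w1-therm driver: the first line ends in YES when the CRC check passes, and the second line ends with t= followed by the temperature in thousandths of a degree Celsius. The sketch below pulls that parsing into a pure function, parse_w1_slave (a hypothetical name), so it can be checked without a Raspberry Pi. The two sample lines are illustrative, not recorded data.

```python
def parse_w1_slave(lines):
    """Return the temperature in Celsius from the two lines of a w1_slave file,
    or None if the CRC check failed or no 't=' field is present."""
    if lines[0].strip()[-3:] != 'YES':
        return None  # CRC check failed; read_temp() would wait and re-read
    equals_pos = lines[1].find('t=')
    if equals_pos == -1:
        return None
    # the driver reports millidegrees Celsius
    return float(lines[1][equals_pos + 2:]) / 1000.0


# illustrative w1_slave contents (raw value 0x0172 = 370, i.e. 370 / 16 = 23.125 degrees C)
sample = [
    '72 01 4b 46 7f ff 0e 10 57 : crc=57 YES\n',
    '72 01 4b 46 7f ff 0e 10 57 t=23125\n',
]
print(parse_w1_slave(sample))  # 23.125
```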

```python
# The following code was run on the RPi to collect data at different temperatures of the water bath

# Modules imported for tracking time and to initialise the temperature sensor
import os
import glob
import time

# Load the 1-Wire kernel modules so the sensor on the GPIO pin appears under /sys
os.system('modprobe w1-gpio')
os.system('modprobe w1-therm')

base_dir = '/sys/bus/w1/devices/'
device_folder = glob.glob(base_dir + '28*')[0]  # DS18B20 device folders start with '28'
device_file = device_folder + '/w1_slave'  # reading w1_slave returns the sensor's raw output


# StopWatch class that creates a stopwatch to track time
class StopWatch:
    # initialise start_time and end_time; time.time() is called inside __init__
    # so that it is evaluated when each object is created
    def __init__(self, start_time=None, end_time=-1):
        self.start_time = time.time() if start_time is None else start_time
        self.end_time = end_time

    # starts (or restarts) the stopwatch at the beginning of a reading
    def start(self):
        self.start_time = time.time()

    # returns the time elapsed since start(), in seconds, to 3 dp
    def elapsed_time(self):
        return round(time.time() - self.start_time, 3)


# create stopwatch object with class StopWatch
sw = StopWatch()


# read_temp_raw returns the raw lines of the sensor's w1_slave file
def read_temp_raw():
    with open(device_file, 'r') as f:
        return f.readlines()


# read_temp parses the raw data from read_temp_raw and returns the temperature in Celsius
def read_temp():
    lines = read_temp_raw()
    # the first line ends in 'YES' once the CRC check passes; re-read until it does
    while lines[0].strip()[-3:] != 'YES':
        time.sleep(0.2)
        lines = read_temp_raw()
    equals_pos = lines[1].find('t=')
    if equals_pos != -1:
        temp_string = lines[1][equals_pos + 2:]
        return float(temp_string) / 1000.0  # the sensor reports millidegrees Celsius
    return None


# write_file appends the data obtained to a file on the RPi to be extracted when done
def write_file(temp):
    with open('temp data.txt', 'a') as f:
        f.write(str(temp) + ',')


# collects [(elapsed_time, temperature reading), ..., thermometer reading] for one
# water-bath temperature and appends it to the .txt file
def collect_data():
    output = []
    sw.start()  # starts the stopwatch

    # take a reading, pause 0.1 s, and stop once 20 s have elapsed
    while True:
        temp_c = read_temp()
        dataset = (sw.elapsed_time(), temp_c)
        time.sleep(0.1)
        output.append(dataset)
        if sw.elapsed_time() > 20:
            # the user enters the laboratory thermometer reading of the water bath
            Tw = input('What is the thermometer reading? ')
            output.append(Tw)
            write_file(output)
            return output


collect_data()
```


## Data Preparation

The temperature readings recorded earlier are read back from the text file. For each dataset, the function extract_data() finds the average initial gradient and the water-bath temperature. It returns two lists: the first is a 2D array of the average gradients and the second is a 2D array of the water-bath temperatures.
The temperature of the water bath is plotted against the average initial gradient.
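The code of extract_data() is not included in the report. As a minimal sketch of the reading-back step only, assuming the file was produced by write_file() above (each run appended as str(list) followed by a comma), the file contents can be wrapped in brackets and parsed as one Python literal. The function name parse_datasets and the shortened sample text are hypothetical.

```python
import ast


def parse_datasets(text):
    """Split the data file contents into (readings, thermometer_temp) pairs.

    write_file() appends str(output) + ',' for every run, so wrapping the
    contents in brackets gives a single Python literal: a list of runs.
    """
    runs = ast.literal_eval('[' + text.strip().rstrip(',') + ']')
    datasets = []
    for run in runs:
        *readings, thermometer = run  # last entry is the thermometer reading (a string)
        datasets.append((readings, float(thermometer)))
    return datasets


# illustrative file contents: two shortened runs
text = "[(0.901, 25.812), (1.941, 29.562), '55.5'],[(0.9, 20.0), (1.9, 19.5), '16.3'],"
for readings, tw in parse_datasets(text):
    print(len(readings), tw)
```

On the RPi this would be called as `parse_datasets(open('temp data.txt').read())`; ast.literal_eval only evaluates literals, so it is safer than eval for reading the file back.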

After obtaining the 2 parameters, the function linear_regression(avg_grad_1, Tw_data_1, size, seed) is used to generate the linear model.

First, the dataset is split into a training set and a test set with the train_test_split function of scikit-learn: a fraction size of the dataset becomes the test set and the remaining 1 - size becomes the training set, with random_state set to seed.

We used the following parameters (a test fraction of 0.4 and a random seed of 2752) to split our dataset:
linear_regression(avg_grad_1, Tw_data_1, 0.4, 2752)

After splitting, we fit a linear regression model to the training set and return its coefficient and intercept, which define the model that predicts the temperature of the water.
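The report gives the signature of linear_regression() but not its body. A minimal sketch using scikit-learn's train_test_split and LinearRegression, assuming avg_grad_1 and Tw_data_1 are 2D arrays of shape (n, 1); returning the test-set R-squared as a third value is our addition, not part of the original function.

```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split


def linear_regression(avg_grad_1, Tw_data_1, size, seed):
    """Fit Tw = coef * avg_grad + intercept on a random training split.

    size: fraction of the data held out as the test set.
    seed: random_state passed to train_test_split.
    Returns (coefficient, intercept, R-squared on the test set).
    """
    x_train, x_test, y_train, y_test = train_test_split(
        avg_grad_1, Tw_data_1, test_size=size, random_state=seed)
    model = LinearRegression().fit(x_train, y_train)
    # with a 2D target of shape (n, 1), coef_ has shape (1, 1) and intercept_ shape (1,)
    return model.coef_[0][0], model.intercept_[0], model.score(x_test, y_test)
```

Usage, with the parameters reported above: `coef, intercept, r2 = linear_regression(avg_grad_1, Tw_data_1, 0.4, 2752)`.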

On the test dataset, the fit of water-bath temperature against average initial gradient has an R-squared value of 0.955.



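The function avg_grad() takes the list of data points for one dataset (such as the example in the Data Collection section) and calculates the gradient between the first data point and each later data point, appending the gradients to a new list. In the example below, the list starts with the gradient to the third data point.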

The following is an example of the gradient list. 
e.g. [4.562499999999999, 4.159797297297297, 3.624999999999999, 3.238508064516129, 2.9033783783783784, 2.6436046511627906, 2.4474489795918366, 2.2869318181818175, 2.151639344262295, 2.0347947761194027, 1.9263698630136983, 1.8246044303797466, 1.7300293685756238, 1.6436475409836064, 1.574551282051282, 1.509661835748792, 1.4483447488584473, 1.39004329004329, 1.334259259259259]
The average gradient is then determined. 
e.g. 2.33869025939072
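The code of avg_grad() is not shown in the report. A minimal sketch, assuming each dataset has the list format from Data Collection (tuples followed by the thermometer reading as a string). The helper name gradients and the parameter first_index are hypothetical; first_index=2 is chosen only because it reproduces the example gradient list, whose first entry 4.5625 equals (34.937 - 25.812) / (2.901 - 0.901).

```python
def gradients(data, first_index=2):
    """Gradients between the first reading and each later reading from first_index on."""
    points = [p for p in data if isinstance(p, tuple)]  # drop the thermometer string
    t0, temp0 = points[0]
    return [(temp - temp0) / (t - t0) for t, temp in points[first_index:]]


def avg_grad(data, first_index=2):
    """Average of the gradients returned by gradients()."""
    grads = gradients(data, first_index)
    return sum(grads) / len(grads)


# example dataset from the Data Collection section
example = [(0.901, 25.812), (1.941, 29.562), (2.901, 34.937), (3.861, 38.125),
           (4.901, 40.312), (5.861, 41.875), (6.821, 43.0), (7.781, 44.0),
           (8.741, 45.0), (9.701, 45.937), (10.661, 46.812), (11.621, 47.625),
           (12.581, 48.312), (13.541, 48.875), (14.521, 49.375), (15.541, 49.875),
           (16.501, 50.375), (17.461, 50.812), (18.421, 51.187), (19.381, 51.5),
           (20.341, 51.75), '55.5']
print(avg_grad(example))
```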

The average initial gradients and water-bath temperatures were then split with train_test_split and used to fit a linear regression between the two variables.

A total of 35 datasets of water-bath temperature and estimated initial gradient were collected.



## Data Format

The feature of the linear regression model, labelled x, is a NumPy 2D array of the estimated average initial gradient of each dataset, computed from the start time to 20 seconds.

The label, y, is a NumPy 2D array of the corresponding water-bath temperatures.
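scikit-learn expects features as a 2D array of shape (n_samples, n_features), which is why both x and y are single-column arrays. A minimal sketch of that conversion, using the first three entries of the x_train and y_train example in this section as sample values; the variable names avg_grads and tw_readings are hypothetical.

```python
import numpy as np

# sample per-dataset values: first three x_train entries and their labels
avg_grads = [1.72745749, 1.96714688, -0.66755193]
tw_readings = [47.6, 50.3, 16.3]

# reshape(-1, 1) turns each flat list into a single column
x = np.array(avg_grads).reshape(-1, 1)
y = np.array(tw_readings).reshape(-1, 1)
print(x.shape, y.shape)  # (3, 1) (3, 1)
```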

e.g.
```
x_train = array([[ 1.72745749],
       [ 1.96714688],
       [-0.66755193],
       [ 1.43305237],
       [ 1.19910881],
       [ 1.88458582],
       [ 1.78132902],
       [ 1.41904691],
       [-0.58819454],
       [ 2.15229109],
       [-0.71432488],
       [ 2.17870241],
       [ 1.27897004],
       [ 1.56357202],
       [ 1.31130386],
       [ 1.41093417],
       [ 1.55509185],
       [ 1.7445255 ],
       [ 1.60115305],
       [ 2.20024851],
       [ 1.32921523]])

y_train = array([[47.6],
       [50.3],
       [16.3],
       [43.3],
       [39. ],
       [48.5],
       [47.2],
       [42.9],
       [16.3],
       [51.1],
       [16. ],
       [53. ],
       [41.6],
       [45. ],
       [41.1],
       [44.7],
       [40.1],
       [46.8],
       [45.6],
       [54.1],
       [42.1]])
```

# Training Model

The 37 datasets are stored in a text file, "Test data.txt". The values are first extracted from the file and separated into two lists, which are then randomly split into the training set and the test set.

# Verification and Accuracy

Describe how you check the accuracy of your model and its result. State any analysis you have and the steps you have taken to improve its accuracy.
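The report quotes an R-squared of 0.955 on the test set. Since the goal is an accurate temperature in ℃, errors in degrees are also informative. A minimal sketch, assuming the test labels and predictions are available as arrays; accuracy_report is a hypothetical helper, and adding RMSE and maximum absolute error is our suggestion rather than the team's stated method.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score


def accuracy_report(y_true, y_pred):
    """Return R-squared, root-mean-square error and maximum absolute error (deg C)."""
    y_true = np.ravel(y_true)
    y_pred = np.ravel(y_pred)
    rmse = float(np.sqrt(mean_squared_error(y_true, y_pred)))
    max_err = float(np.max(np.abs(y_true - y_pred)))
    return r2_score(y_true, y_pred), rmse, max_err
```

Usage with a fitted model: `r2, rmse, max_err = accuracy_report(y_test, model.predict(x_test))`.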

# Example Scripts

Instruction:

* Read an Excel file with the following format:
```
time (s)	reading
0.00	    25.812
0.90	    28.562
1.79	    31.875
2.68	    35.062
3.55	    37.937
4.43	    40.687
5.30	    43.25
```
where the first column indicates the time in seconds and the second column indicates the sensor reading in Celsius. 
* Write code to prepare the data and extract the features.
* Write code to split the data.
* Write code to train the model.
* Write code to predict the final temperature.
* Write code to check accuracy.

**The script below is just for your example. You don't have to use it and you can write your own script.**



```python
# write code to read an Excel file
import pandas as pd
import numpy as np
from sklearn import linear_model
from sklearn.metrics import mean_squared_error, r2_score

# specify the base of your filename, e.g. temp_1.xlsx, temp_2.xlsx
filename = 'temp_'

# if you have more than one file,
# you can use some key to differentiate them, e.g. '1', '2'
filekeys = []

# this is to store the data for different files,
# the keys are in filekeys
dataframe = {}
for key in filekeys:
    dataframe[key] = pd.read_excel(filename + key + '.xlsx')
```

```python
# write code to prepare the data for predicting
def preprocess(df):
    # use this function to extract the features from the data frame
    pass

data_processed = {}
for key in filekeys:
    data_processed[key] = preprocess(dataframe[key])
```

```python
# write code to split the data into train and test sets
def prepare_train_test(data):
    # placeholder: fill both dictionaries, keyed like data
    data_train, data_test = {}, {}
    return data_train, data_test

data_train, data_test = prepare_train_test(data_processed)
```

```python
# write code to train the model
# the function should return the trained model
def train_model(data):
    pass

model = train_model(data_train)
```

```python
# write code to predict the final temperature
# store the predicted temperature in a variable called "predicted"
# predicted is a dictionary where the keys are listed in filekeys

predicted = {}
for key in filekeys:
    predicted[key] = model.predict(data_test[key])
```

```python
# write code to check your accuracy
```

