## Analyzing data using Object-oriented Programming (OOP)
A group of Chinese researchers are curious to find out what the ideal length of chopsticks for a selected group of 30 people. There are 3 columns in this dataset:
- "Food pinching efficiency" - measures on effective of the chopsticks
- "Individual" - identification of each person who participated the study
- "Chopstick length" - measures the length of the chopsticks in millimeter
J

In [37]:
import numpy as np
import pandas as pd
import csv
f = open("data/chopsticks.csv", 'r')
data = list(csv.reader(f))
data = data[1:]
data[0:5]

[['19.55', '1', '180'],
 ['27.24', '2', '180'],
 ['28.76', '3', '180'],
 ['31.19', '4', '180'],
 ['21.91', '5', '180']]

Since we are comparing the efficiency of chopsticks of different lengths, we would want to group the same-length chopsticks into one category and compare it to others. In order to achieve that, we'll create a class (named Trial) that stores information about each row of the data.

## Declaring the attribute class
To create the blueprint of how to navigate the information in the dataset.

In [44]:
class Trial(object):
    def __init__(self, row):
        try:
            self.efficiency = float(row[0])
            self.individual = int(row[1])
            self.length = int(row[2])
        except ValueError: #force the entry if there is an error
            self.efficiency = -1.0
            self.individual = -1
            self.chopstick_length = -1
        
#Initiating the first row of data
first_trial = Trial(data[1])

## Declaring the manager class 
To create the blueprint to manipulate the grouping and to control the behaviors of the data.
We are grouping all the chopstick information by lengths.

Refining the Chopstick class to store all the trials of a certain length, so when an amount is passed in the class, all the data pertaining the chopsticks with that length will be stored in one list.

In [39]:
class Chopstick(object):
    def __init__(self, length):
        self.length = length
        # Start our trial list empty
        self.trials = []
        # Now, fill our list with relevant trials
        for row in data:
            if int(row[2]) == self.length:
                trial = Trial(row)
                if trial.efficiency != -1 and trial.individual != -1 and trial.chopstick_length != 1:
                    self.trials.append(Trial(row))

#Initiating the class with a length of 240
medium_chopstick = Chopstick(240)


## Adding the feature to calculate efficiency in the Chopstick class

In [47]:
import math
class Chopstick(object):
    def __init__(self, length):
        self.length = length
        self.trials = []
        
        for row in data:
            if int(row[2]) == self.length:
                trial = Trial(row)
                if trial.efficiency!=-1 and trial.individual!=-1 and trial.length!=1:
                    self.trials.append(Trial(row))
                
    #writing a method that counts the number of chopsticks with that length
    def num_trials(self):
        return len(self.trials)
    
    #writing a methdo that calculates the average efficiency of chopsticks with that length
    def avg_efficiency(self):
        try:
            return math.fsum([row.efficiency for row in self.trials]) / self.num_trials()
        except:
            return -1
#Initiating the class to calculate the average efficiency of chopsticks 210mm long in avg_eff_210
avg_eff_210 = Chopstick(210).avg_efficiency()
print(avg_eff_210)

25.4838709677


Finding out each unique value of the length of the chopsticks in the dataset and initiate each into the class Chopstick.

In [53]:
chopstick_lengths = list(set([int(row[2]) for row in data]))
print(chopstick_lengths)

[330, 300, 270, 240, 210, 180]


In [55]:
chopstick_list = [Chopstick(c) for c in chopstick_lengths]

## Overloading the comparison operators for the Chopstick class so we can take advantage of built-in Python functions

In [66]:
import math
class Chopstick(object):
    def __init__(self, length):
        self.length = length
        self.trials = []
        
        for row in data:
            if int(row[2]) == self.length:
                trial = Trial(row)
                if trial.efficiency!=-1 and trial.individual!=-1 and trial.length!=1:
                    self.trials.append(Trial(row))
    
    def __str__(self):
        return self.length
                
    def num_trials(self):
        return len(self.trials)
    
    def avg_efficiency(self):
        try:
            return math.fsum([row.efficiency for row in self.trials]) / self.num_trials()
        except:
            return -1
    
        
    #Adding the comparison operators overloads
    def __lt__(self, other):
        return self.avg_efficiency < other.avg_efficiency
    def __gt__(self, other):
        return self.avg_efficiency > other.avg_efficiency
    def __le__(self, other):
        return self.avg_efficiency <= other.avg_efficiency
    def __ge__(self, other):
        return self.avg_efficiency >= other.avg_efficiency
    def __eq__(self, other):
        return self.avg_efficiency == other.avg_efficiency
    def __ne__(self, other):
        return self.avg_efficiency != other.avg_efficiency


In [68]:
#Find out the most efficient length of chopstick
most_efficient = max(chopstick_list)
most_efficient.length

240

In [69]:
#Find out the least efficient length of chopstick
least_efficient = min(chopstick_list)
least_efficient.length

270