# Verisk - computer purchase problem

*by Jolanta Śliwa*

## Problem Statement
Suppose you're trying to help a company determine which computers to purchase.
### Data - utilization data by employee:
The company has been able to pull utilization data by employee that classifies users into 3 bins, depending on how much they use their computer in their work:
* Low usage - spends a lot of time in meetings, checking email, doing people management
* Average usage (labelled `medium` in the data) - requires some compute power, with a balanced mix of heads-down technical work and a good amount of meetings and email writing
* High usage - power user, relies heavily on computer performance


In [37]:
import pandas as pd


utilization = pd.read_csv(
    "https://raw.githubusercontent.com/shubhamkalra27/dsep-2020/main/datasets/util_b_emp.csv"
)

In [38]:
utilization.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 146 entries, 0 to 145
Data columns (total 2 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   employee_id      146 non-null    int64 
 1   utilization_bin  146 non-null    object
dtypes: int64(1), object(1)
memory usage: 2.4+ KB


In [39]:
utilization.head()

   employee_id utilization_bin
0         1743            high
1         1752            high
2         1758            high
3         1825            high
4         1842            high




In [40]:
utilization["utilization_bin"].unique()

array(['high', 'medium', 'low'], dtype=object)


### Data - survey
Additionally, the company has surveyed employees to collect the relative importance of the following variables describing a computer's performance:
* Memory
* Processing
* Storage
* Price inverse - this metric was given to you by the company, as you can see in the dataset, with the directive that price inverse be fixed at a 25% weight in the purchase decision
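
Because price inverse is fixed at 0.25, the three performance weights share the remaining 0.75. A minimal sketch of that relationship, using a few rows copied from the survey data as inline samples (the `shares` rescaling is an illustrative assumption, not part of the original analysis):

```python
import pandas as pd

# Sample survey rows (employees 1743, 1752, 1758); the full file has 146 rows.
weights = pd.DataFrame(
    {
        "memory": [0.375, 0.45, 0.375],
        "processing": [0.225, 0.225, 0.3],
        "storage": [0.15, 0.075, 0.075],
        "inverse_price": [0.25, 0.25, 0.25],
    },
    index=[1743, 1752, 1758],
)

performance = weights[["memory", "processing", "storage"]]

# With price fixed at 0.25, the performance weights should total 0.75.
assert ((performance.sum(axis=1) - 0.75).abs() < 1e-9).all()

# Rescale so each employee's three performance weights sum to 1.
shares = performance.div(performance.sum(axis=1), axis=0)
print(shares.round(3))
```

The rescaled shares make employees easier to compare on performance preferences alone, since the price weight is identical for everyone.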

In [41]:
survey = pd.read_csv(
    "https://raw.githubusercontent.com/shubhamkalra27/dsep-2020/main/datasets/survey_emp.csv"
)

In [42]:
survey.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 146 entries, 0 to 145
Data columns (total 5 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   employee_id    146 non-null    int64  
 1   memory         146 non-null    float64
 2   processing     146 non-null    float64
 3   storage        146 non-null    float64
 4   inverse_price  146 non-null    float64
dtypes: float64(4), int64(1)
memory usage: 5.8 KB


In [43]:
survey.head()

   employee_id  memory  processing  storage  inverse_price
0         1743   0.375       0.225    0.150           0.25
1         1752   0.450       0.225    0.075           0.25
2         1758   0.375       0.300    0.075           0.25
3         1825   0.300       0.300    0.150           0.25
4         1842   0.300       0.300    0.150           0.25




In [44]:
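# Each employee's weights should sum to 1; compare with a tolerance because they are floats.
weight_sums = survey.drop(columns="employee_id").sum(axis=1)
print("ok" if ((weight_sums - 1).abs() < 1e-9).all() else "problem")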

ok


### Data - computers
Lastly, the company is looking to purchase a maximum of 3 different computer models, and has compiled the following list scoring their memory, processing, storage, and relative price. Each dimension is scored from 0 to 10, with 10 being the best.
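
One natural way to combine the survey weights with these scores is a weighted sum per employee and computer. This is an assumption about how the two tables fit together; the problem statement does not prescribe it. A minimal sketch with one employee and one vendor option copied from the data:

```python
import pandas as pd

FEATURES = ["memory", "processing", "storage", "inverse_price"]

# Survey weights of employee 1743 and scores of computer 13, copied from the data.
weights = pd.Series([0.375, 0.225, 0.15, 0.25], index=FEATURES)
computer = pd.Series([5, 7, 10, 2.7], index=FEATURES)

# Weighted sum: how well this computer matches this employee's priorities.
score = float((weights * computer).sum())
print(round(score, 3))  # 5.625
```

A higher score means the computer is a better fit for that employee under this weighting.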

In [45]:
computers = pd.read_csv(
    "https://raw.githubusercontent.com/shubhamkalra27/dsep-2020/main/datasets/vendor_options.csv"
)

In [46]:
computers.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11 entries, 0 to 10
Data columns (total 5 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   computer_id    11 non-null     int64  
 1   memory         11 non-null     int64  
 2   processing     11 non-null     int64  
 3   storage        11 non-null     int64  
 4   inverse_price  11 non-null     float64
dtypes: float64(1), int64(4)
memory usage: 568.0 bytes


In [47]:
computers.head()

   computer_id  memory  processing  storage  inverse_price
0           13       5           7       10            2.7
1           16       9           8        9            1.3
2            4       8           9       10            1.0
3            1       8           8        9            1.7
4            3       5           4        4            5.7




In [48]:
computers

   computer_id  memory  processing  storage  inverse_price
0           13       5           7       10            2.7
1           16       9           8        9            1.3
2            4       8           9       10            1.0
3            1       8           8        9            1.7
4            3       5           4        4            5.7
5            2       6           7        7            3.3
6           20       7          10        7            2.0
7            8       9           6        9            2.0
8            9       9           8        7            2.0
9            7       7           7        9            2.3




In [49]:
print(computers.max())

computer_id      20.0
memory            9.0
processing       10.0
storage          10.0
inverse_price     5.7
dtype: float64


In [50]:
print(computers.min())

computer_id      1.0
memory           5.0
processing       4.0
storage          4.0
inverse_price    1.0
dtype: float64


## Task
**Given this information, provide the company with a recommendation on which computers to purchase.**
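
With so few candidate models, every set of three can simply be enumerated. The sketch below assumes each employee receives whichever of the chosen models scores best for them under a weighted sum, and that the goal is to maximize the total; both are assumptions, and the names `scores`, `total_value` and `best` are hypothetical. It uses rows copied from the tables above rather than the full files.

```python
from itertools import combinations

import numpy as np

# Rows copied from the vendor table: memory, processing, storage, inverse_price.
computers = np.array([
    [5, 7, 10, 2.7], [9, 8, 9, 1.3], [8, 9, 10, 1.0], [8, 8, 9, 1.7],
    [5, 4, 4, 5.7], [6, 7, 7, 3.3], [7, 10, 7, 2.0], [9, 6, 9, 2.0],
    [9, 8, 7, 2.0], [7, 7, 9, 2.3],
])
# Survey weights for a few employees, same column order.
weights = np.array([
    [0.375, 0.225, 0.15, 0.25],
    [0.45, 0.225, 0.075, 0.25],
    [0.375, 0.3, 0.075, 0.25],
    [0.3, 0.3, 0.15, 0.25],
])

# scores[i, j] = weighted score of computer j for employee i.
scores = weights @ computers.T


def total_value(chosen):
    """Sum over employees of their best score among the chosen models."""
    return scores[:, list(chosen)].max(axis=1).sum()


# Try every set of 3 row positions and keep the one with the highest total.
best = max(combinations(range(len(computers)), 3), key=total_value)
print(best, round(total_value(best), 3))
```

Under this objective, adding a model never lowers the total, so searching only sets of exactly three also covers the "maximum of 3" case.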

List of parameters:

In [51]:
parameters = computers.columns[1:]
print(parameters)

Index(['memory', 'processing', 'storage', 'inverse_price'], dtype='object')


In [52]:
employees = utilization.merge(survey, on="employee_id")

In [53]:
employees.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 146 entries, 0 to 145
Data columns (total 6 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   employee_id      146 non-null    int64  
 1   utilization_bin  146 non-null    object 
 2   memory           146 non-null    float64
 3   processing       146 non-null    float64
 4   storage          146 non-null    float64
 5   inverse_price    146 non-null    float64
dtypes: float64(4), int64(1), object(1)
memory usage: 8.0+ KB

In [54]:
employees.head()

   employee_id utilization_bin  memory  processing  storage  inverse_price
0         1743            high   0.375       0.225    0.150           0.25
1         1752            high   0.450       0.225    0.075           0.25
2         1758            high   0.375       0.300    0.075           0.25
3         1825            high   0.300       0.300    0.150           0.25
4         1842            high   0.300       0.300    0.150           0.25




In [55]:
from problem import ProblemMax, ProblemScale

In [56]:
prob_max = ProblemMax(computers, employees)
prob_scale = ProblemScale(computers, employees)

In [57]:
from simulated_annealing import SimulatedAnnealing, SimulatedAnnealingConfig

In [58]:
config = SimulatedAnnealingConfig()

In [59]:
annealing_max = SimulatedAnnealing(config, prob_max)

In [60]:
annealing_max.solve()

SOLUTION:
 Best: [8, 5, 4]


[8, 5, 4]

SOLUTION:
 Best: [2, 5, 4]


[2, 5, 4]
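
The `SimulatedAnnealing` class is not shown in this notebook. As a rough illustration of the technique only, and not the implementation used above, the sketch below anneals over sets of three columns of a hypothetical random score matrix. The objective, cooling schedule and all parameter names are assumptions. Because the search is randomized, different seeds or runs can return different triples.

```python
import math
import random

import numpy as np

# Hypothetical score matrix: rows are employees, columns are candidate computers.
rng = np.random.default_rng(0)
scores = rng.uniform(4, 8, size=(20, 10))


def value(state):
    """Objective to maximize: each employee takes the best chosen model."""
    return scores[:, state].max(axis=1).sum()


def anneal(n_items, k=3, steps=2000, t_start=1.0, t_end=0.01, seed=0):
    rnd = random.Random(seed)
    state = rnd.sample(range(n_items), k)
    cur_val = value(state)
    best, best_val = list(state), cur_val
    for step in range(steps):
        # Geometric cooling from t_start down towards t_end.
        temp = t_start * (t_end / t_start) ** (step / steps)
        # Neighbor: swap one chosen model for one that is not chosen.
        candidate = list(state)
        outside = [j for j in range(n_items) if j not in state]
        candidate[rnd.randrange(k)] = rnd.choice(outside)
        cand_val = value(candidate)
        # Accept improvements always; accept worse moves with Boltzmann probability.
        if cand_val >= cur_val or rnd.random() < math.exp((cand_val - cur_val) / temp):
            state, cur_val = candidate, cand_val
            if cur_val > best_val:
                best, best_val = list(state), cur_val
    return best, best_val


best, best_val = anneal(scores.shape[1])
print(sorted(best), round(best_val, 3))
```

Tracking `best` separately from the current state means the returned triple is the best one visited, even if the walk later moves to a worse state.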

In [61]:
from stupid_solution import Stupid

stupid_max = Stupid(prob_max)

In [62]:
stupid_max.solve()

SOLUTION:
 Best: [1, 4, 5]


[1, 4, 5]


In [63]:
annealing_scale = SimulatedAnnealing(SimulatedAnnealingConfig(), prob_scale)

In [64]:
annealing_scale.solve()

SOLUTION:
 Best: [5, 4, 1]


[5, 4, 1]

SOLUTION:
 Best: [5, 4, 8]


[5, 4, 8]

In [65]:
stupid_scale = Stupid(prob_scale)

In [66]:
stupid_scale.solve()

SOLUTION:
 Best: [1, 4, 5]


[1, 4, 5]


In [67]:
prob_scale.calculate_state_cost([4, 5, 8])

1149.1999999999987


In [68]:
prob_scale.calculate_state_cost([1, 4, 5])

1151.0499999999988
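
The two costs reported for `[4, 5, 8]` and `[1, 4, 5]` are close. Assuming `calculate_state_cost` returns a value to be maximized (the notebook does not say so explicitly), the difference can be put in relative terms:

```python
cost_458 = 1149.1999999999987  # reported for state [4, 5, 8]
cost_145 = 1151.0499999999988  # reported for state [1, 4, 5]

gap = cost_145 - cost_458
summary = f"absolute gap: {gap:.2f}, relative gap: {gap / cost_145:.2%}"
print(summary)  # absolute gap: 1.85, relative gap: 0.16%
```

Under that assumption, `[1, 4, 5]` is the better state, though only by a small margin.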


In [69]:
from problem import ProblemNothing

prob_nothing = ProblemNothing(computers, employees)
annealing_nothing = SimulatedAnnealing(SimulatedAnnealingConfig(), prob_nothing)

In [70]:
annealing_nothing.solve()

SOLUTION:
 Best: [8, 2, 1]


[8, 2, 1]

SOLUTION:
 Best: [8, 9, 10]


[8, 9, 10]

In [71]:
stupid_nothing = Stupid(prob_nothing)
stupid_nothing.solve()

SOLUTION:
 Best: [1, 2, 8]


[1, 2, 8]


In [72]:
prob_nothing.calculate_state_cost([1, 2, 8])

1024.650000000001
