# Verisk - computer purchase problem

*by Jolanta Śliwa*

## Problem Statement
Suppose you're trying to help a company determine which computers to purchase.
### Data - utilization data by employee
The company has been able to pull utilization data by employee that classifies users into 3 bins, depending on how much they use their computer in their work:
* Low usage - spends a lot of time in meetings, checking email, doing people management
* Average usage - requires some compute power, with a balanced mix of heads-down technical work and a good amount of meetings and email writing
* High usage - power user, relies heavily on computer performance


In [1]:
import pandas as pd


utilization = pd.read_csv(
    "https://raw.githubusercontent.com/shubhamkalra27/dsep-2020/main/datasets/util_b_emp.csv"
)

In [2]:
utilization.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 146 entries, 0 to 145
Data columns (total 2 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   employee_id      146 non-null    int64 
 1   utilization_bin  146 non-null    object
dtypes: int64(1), object(1)
memory usage: 2.4+ KB


In [3]:
utilization.head()

|   | employee_id | utilization_bin |
|---|---|---|
| 0 | 1743 | high |
| 1 | 1752 | high |
| 2 | 1758 | high |
| 3 | 1825 | high |
| 4 | 1842 | high |


In [4]:
utilization["utilization_bin"].unique()

array(['high', 'medium', 'low'], dtype=object)
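
Note that the data labels the middle bin `medium`, which presumably corresponds to the "Average usage" description above. To see how employees are spread across the bins, `value_counts` is a convenient tool. The sketch below runs on a small made-up frame (`sample` and its values are illustrative stand-ins, not the real data):

```python
import pandas as pd

# Illustrative stand-in for `utilization`; these IDs and bins are made up.
sample = pd.DataFrame(
    {
        "employee_id": [1, 2, 3, 4, 5],
        "utilization_bin": ["high", "medium", "low", "medium", "medium"],
    }
)

# Number of employees in each utilization bin, largest bin first.
bin_counts = sample["utilization_bin"].value_counts()
print(bin_counts)
```

Running the same call on `utilization["utilization_bin"]` would give the real bin sizes, which matter if the models are chosen per bin.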

### Data - survey
Additionally, the company has surveyed employees to collect the relative importance of the following variables describing a computer's performance:
* Memory
* Processing
* Storage
* Price inverse - this metric was supplied by the company (it appears in the dataset), with the directive that price inverse be fixed at a 25% weight in the purchase decision

In [5]:
survey = pd.read_csv(
    "https://raw.githubusercontent.com/shubhamkalra27/dsep-2020/main/datasets/survey_emp.csv"
)

In [6]:
survey.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 146 entries, 0 to 145
Data columns (total 5 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   employee_id    146 non-null    int64  
 1   memory         146 non-null    float64
 2   processing     146 non-null    float64
 3   storage        146 non-null    float64
 4   inverse_price  146 non-null    float64
dtypes: float64(4), int64(1)
memory usage: 5.8 KB


In [7]:
survey.head()

|   | employee_id | memory | processing | storage | inverse_price |
|---|---|---|---|---|---|
| 0 | 1743 | 0.375 | 0.225 | 0.15 | 0.25 |
| 1 | 1752 | 0.45 | 0.225 | 0.075 | 0.25 |
| 2 | 1758 | 0.375 | 0.3 | 0.075 | 0.25 |
| 3 | 1825 | 0.3 | 0.3 | 0.15 | 0.25 |
| 4 | 1842 | 0.3 | 0.3 | 0.15 | 0.25 |


In [8]:
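# Verify that each employee's four weights sum to 1, allowing for float rounding.
weight_sums = survey.drop(columns="employee_id").sum(axis=1)
if ((weight_sums - 1).abs() < 1e-9).all():
    print("ok")
else:
    print("problem")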

ok
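
The directive that price inverse carries a fixed 25% weight can be checked in a similar way. The following is a minimal sketch that uses only the five rows shown by `survey.head()` above. The names `survey_sample`, `price_fixed`, and `others_ok` are introduced here for illustration:

```python
import numpy as np
import pandas as pd

# The first five survey rows, copied from the survey.head() output above.
survey_sample = pd.DataFrame(
    {
        "employee_id": [1743, 1752, 1758, 1825, 1842],
        "memory": [0.375, 0.45, 0.375, 0.3, 0.3],
        "processing": [0.225, 0.225, 0.3, 0.3, 0.3],
        "storage": [0.15, 0.075, 0.075, 0.15, 0.15],
        "inverse_price": [0.25, 0.25, 0.25, 0.25, 0.25],
    }
)

# True when every respondent's price weight equals 0.25 (within float tolerance).
price_fixed = bool(np.isclose(survey_sample["inverse_price"], 0.25).all())

# The remaining three weights should then share the other 75%.
other_weights = survey_sample[["memory", "processing", "storage"]].sum(axis=1)
others_ok = bool(np.isclose(other_weights, 0.75).all())

print(price_fixed, others_ok)
```

Applying the same two checks to the full `survey` frame would confirm whether the directive holds for all 146 respondents.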


### Data - computers
Lastly, the company is looking to purchase a maximum of 3 different computer models and has compiled the following list scoring each model's memory, processing, storage, and relative price. Each dimension is scored from 0 to 10, with 10 being the best.

In [9]:
computers = pd.read_csv(
    "https://raw.githubusercontent.com/shubhamkalra27/dsep-2020/main/datasets/vendor_options.csv"
)

In [10]:
computers.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11 entries, 0 to 10
Data columns (total 5 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   computer_id    11 non-null     int64  
 1   memory         11 non-null     int64  
 2   processing     11 non-null     int64  
 3   storage        11 non-null     int64  
 4   inverse_price  11 non-null     float64
dtypes: float64(1), int64(4)
memory usage: 568.0 bytes


In [11]:
computers.head()

|   | computer_id | memory | processing | storage | inverse_price |
|---|---|---|---|---|---|
| 0 | 13 | 5 | 7 | 10 | 2.7 |
| 1 | 16 | 9 | 8 | 9 | 1.3 |
| 2 | 4 | 8 | 9 | 10 | 1.0 |
| 3 | 1 | 8 | 8 | 9 | 1.7 |
| 4 | 3 | 5 | 4 | 4 | 5.7 |


In [12]:
computers

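|   | computer_id | memory | processing | storage | inverse_price |
|---|---|---|---|---|---|
| 0 | 13 | 5 | 7 | 10 | 2.7 |
| 1 | 16 | 9 | 8 | 9 | 1.3 |
| 2 | 4 | 8 | 9 | 10 | 1.0 |
| 3 | 1 | 8 | 8 | 9 | 1.7 |
| 4 | 3 | 5 | 4 | 4 | 5.7 |
| 5 | 2 | 6 | 7 | 7 | 3.3 |
| 6 | 20 | 7 | 10 | 7 | 2.0 |
| 7 | 8 | 9 | 6 | 9 | 2.0 |
| 8 | 9 | 9 | 8 | 7 | 2.0 |
| 9 | 7 | 7 | 7 | 9 | 2.3 |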


## Task
**Given this information, provide the company with a recommendation on which computers to purchase.**
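
One natural way to combine the survey weights with the vendor scores is a weighted sum: for each employee and each computer, multiply the employee's weight for a dimension by the computer's score on that dimension and add the four products. This is an assumed approach for illustration, not necessarily the method this notebook goes on to use. The sketch below uses the first two survey rows and the five computers shown by the `head()` outputs above. The names `weights`, `options`, `scores`, and `best` are hypothetical:

```python
import pandas as pd

features = ["memory", "processing", "storage", "inverse_price"]

# Two employees' survey weights, copied from survey.head().
weights = pd.DataFrame(
    {
        "employee_id": [1743, 1752],
        "memory": [0.375, 0.45],
        "processing": [0.225, 0.225],
        "storage": [0.15, 0.075],
        "inverse_price": [0.25, 0.25],
    }
).set_index("employee_id")

# Five vendor options, copied from computers.head().
options = pd.DataFrame(
    {
        "computer_id": [13, 16, 4, 1, 3],
        "memory": [5, 9, 8, 8, 5],
        "processing": [7, 8, 9, 8, 4],
        "storage": [10, 9, 10, 9, 4],
        "inverse_price": [2.7, 1.3, 1.0, 1.7, 5.7],
    }
).set_index("computer_id")

# scores.loc[e, c] = sum over features f of weights.loc[e, f] * options.loc[c, f]
scores = weights[features] @ options[features].T

# Highest-scoring computer for each employee under this weighted sum.
best = scores.idxmax(axis=1)
print(scores.round(3))
print(best)
```

Because at most 3 models may be purchased, a per-employee best pick is only a starting point. The final choice would still need to select a small set of models that serves all employees well.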

In [13]:
# Scored dimensions (every column except computer_id).
parameters = computers.columns[1:]

In [14]:
# print(parameters[:-1])

In [15]:
employees = utilization.merge(survey, on="employee_id")

In [16]:
employees.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 146 entries, 0 to 145
Data columns (total 6 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   employee_id      146 non-null    int64  
 1   utilization_bin  146 non-null    object 
 2   memory           146 non-null    float64
 3   processing       146 non-null    float64
 4   storage          146 non-null    float64
 5   inverse_price    146 non-null    float64
dtypes: float64(4), int64(1), object(1)
memory usage: 8.0+ KB
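
The merged frame has 146 rows, matching both inputs, which suggests every employee appears exactly once in each table. A stricter check is possible with the `validate` and `indicator` parameters of `DataFrame.merge`. The sketch below uses small stand-in frames (`left` and `right` are illustrative, built from the first three employee IDs shown earlier):

```python
import pandas as pd

# Stand-ins for `utilization` and `survey`, limited to three employees.
left = pd.DataFrame(
    {"employee_id": [1743, 1752, 1758], "utilization_bin": ["high", "high", "high"]}
)
right = pd.DataFrame(
    {"employee_id": [1743, 1752, 1758], "memory": [0.375, 0.45, 0.375]}
)

# validate="one_to_one" raises MergeError if employee_id repeats on either side;
# how="outer" with indicator=True keeps and flags rows present in only one table.
merged = left.merge(
    right, on="employee_id", how="outer", validate="one_to_one", indicator=True
)

# Rows that did not find a partner in the other table.
unmatched = merged[merged["_merge"] != "both"]
print(len(unmatched))
```

An empty `unmatched` frame means the default inner merge used above drops no employees.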


In [16]:
from problem import ProblemMax, ProblemScale