# Verisk - computer purchase problem

*by Jolanta Śliwa*

## Problem Statement
Suppose you're trying to help a company determine which computers to purchase.
## Data - utilization data by employee
The company has pulled utilization data by employee that classifies users into 3 bins, depending on how heavily they use their computer in their work:
* Low usage - spends a lot of time in meetings, checking email, doing people management
* Average usage (labelled `medium` in the data) - requires some compute power, with a balanced mix of heads-down technical work and a good amount of meetings and email writing
* High usage - power user, relies heavily on computer performance


In [4]:
import pandas as pd


utilization = pd.read_csv("https://raw.githubusercontent.com/shubhamkalra27/dsep-2020/main/datasets/util_b_emp.csv")

In [5]:
utilization.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 146 entries, 0 to 145
Data columns (total 2 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   employee_id      146 non-null    int64 
 1   utilization_bin  146 non-null    object
dtypes: int64(1), object(1)
memory usage: 2.4+ KB


In [6]:
utilization.head()

Unnamed: 0,employee_id,utilization_bin
0,1743,high
1,1752,high
2,1758,high
3,1825,high
4,1842,high


In [7]:
utilization["utilization_bin"].unique()

array(['high', 'medium', 'low'], dtype=object)

## Data - survey
Additionally, the company has surveyed employees to collect the relative importance of the following variables describing a computer's performance:
* Memory
* Processing
* Storage
* Price inverse - this metric was supplied by the company, as you can see in the dataset, with the directive that price inverse be fixed at a 25% weight in the purchase decision

In [8]:
survey = pd.read_csv("https://raw.githubusercontent.com/shubhamkalra27/dsep-2020/main/datasets/survey_emp.csv")

In [9]:
survey.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 146 entries, 0 to 145
Data columns (total 5 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   employee_id    146 non-null    int64  
 1   memory         146 non-null    float64
 2   processing     146 non-null    float64
 3   storage        146 non-null    float64
 4   inverse_price  146 non-null    float64
dtypes: float64(4), int64(1)
memory usage: 5.8 KB


In [10]:
survey.head()

Unnamed: 0,employee_id,memory,processing,storage,inverse_price
0,1743,0.375,0.225,0.15,0.25
1,1752,0.45,0.225,0.075,0.25
2,1758,0.375,0.3,0.075,0.25
3,1825,0.3,0.3,0.15,0.25
4,1842,0.3,0.3,0.15,0.25


In [11]:
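import math

# Every employee's four weights should sum to 1 (compared with a float tolerance).
for _, row in survey.iterrows():
    if not math.isclose(sum(row.iloc[1:]), 1.0):
        print("problem")
        break
else:
    print("ok")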

ok


## Data - computers
Lastly, the company is looking to purchase a maximum of 3 different computer models, and has compiled the following list scoring their memory, processing, storage, and relative price. Each dimension is scored from 0-10, with 10 being the best.

In [12]:
computers = pd.read_csv("https://raw.githubusercontent.com/shubhamkalra27/dsep-2020/main/datasets/vendor_options.csv")

In [13]:
computers.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11 entries, 0 to 10
Data columns (total 5 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   computer_id    11 non-null     int64  
 1   memory         11 non-null     int64  
 2   processing     11 non-null     int64  
 3   storage        11 non-null     int64  
 4   inverse_price  11 non-null     float64
dtypes: float64(1), int64(4)
memory usage: 568.0 bytes


In [14]:
computers.head()

Unnamed: 0,computer_id,memory,processing,storage,inverse_price
0,13,5,7,10,2.7
1,16,9,8,9,1.3
2,4,8,9,10,1.0
3,1,8,8,9,1.7
4,3,5,4,4,5.7


In [15]:
computers

Unnamed: 0,computer_id,memory,processing,storage,inverse_price
0,13,5,7,10,2.7
1,16,9,8,9,1.3
2,4,8,9,10,1.0
3,1,8,8,9,1.7
4,3,5,4,4,5.7
5,2,6,7,7,3.3
6,20,7,10,7,2.0
7,8,9,6,9,2.0
8,9,9,8,7,2.0
9,7,7,7,9,2.3


## Task
**Given this information, provide the company with a recommendation on which computers to purchase.**

In [16]:
parameters = computers.columns[1:]

In [17]:
# print(parameters[:-1])

In [18]:
employees = utilization.merge(survey, on="employee_id")

In [19]:
employees.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 146 entries, 0 to 145
Data columns (total 6 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   employee_id      146 non-null    int64  
 1   utilization_bin  146 non-null    object 
 2   memory           146 non-null    float64
 3   processing       146 non-null    float64
 4   storage          146 non-null    float64
 5   inverse_price    146 non-null    float64
dtypes: float64(4), int64(1), object(1)
memory usage: 8.0+ KB


In [21]:
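# Alias used in the type hints below; a TypeVar is not needed for a concrete type.
series_type = pd.Series
# Heuristic cap: the highest score (0-10) an employee in each usage bin can make use of.
heuristic_value = {"high": 10, "medium": 7, "low": 3}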

In [None]:
# Simulated annealing parameters
initial_temperature: float = 5.0
temperature: float = initial_temperature
local_optimum_escapes: int = 0
min_temperature: float = 1e-10
cooling_step: float = 0.999
cooling_time: int = 0
escape_reheat_ratio: float = 0.1
steps_from_last_state_update: int = 0

In [22]:
def get_value_by_needs(value: float, employee_type: str) -> float:
    # Cap the score at what the employee's usage bin can benefit from.
    return min(value, heuristic_value[employee_type])

In [23]:
# for key in heuristic_value.keys():
#     """
#         testing get_value_by_needs
#     """
#     test = employees[employees["utilization_bin"] == key].iloc[0]
#     print(key + ":", heuristic_value.get(key))
#     print(get_value_by_needs(9, test["utilization_bin"]))

In [27]:
def calculate_computer_value_for_employee(computer: series_type, employee: series_type) -> float:
    result = sum([employee[col] * get_value_by_needs(computer[col], employee["utilization_bin"]) for col in parameters[:-1]])
    result += employee[parameters[-1]] * computer[parameters[-1]]
    return result
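
As a worked example of the scoring above, the sketch below recomputes the value by hand with plain dictionaries instead of DataFrames. The weights are those of employee 1743 and the scores those of computer 13, both taken from the `head()` outputs; the names `CAPS`, `capped_score`, `weights` and `scores` are illustrative and not part of the notebook.

```python
# Usage caps per utilization bin, mirroring `heuristic_value` in the notebook.
CAPS = {"high": 10, "medium": 7, "low": 3}


def capped_score(weights: dict, scores: dict, usage_bin: str) -> float:
    """Weighted sum of scores; every dimension except price is capped by usage."""
    cap = CAPS[usage_bin]
    total = 0.0
    for name, weight in weights.items():
        score = scores[name]
        if name != "inverse_price":
            score = min(score, cap)
        total += weight * score
    return total


# Employee 1743 (survey weights) and computer 13 (vendor scores).
weights = {"memory": 0.375, "processing": 0.225, "storage": 0.15, "inverse_price": 0.25}
scores = {"memory": 5, "processing": 7, "storage": 10, "inverse_price": 2.7}

print(capped_score(weights, scores, "high"))
print(capped_score(weights, scores, "low"))
```

For the high-usage bin the cap of 10 changes nothing and the value is about 5.625. If the same weights belonged to a low-usage employee, the cap of 3 would reduce it to about 2.925, so the price term would carry relatively more of the total.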

In [25]:
# testing calculate_computer_value_for_employee
# print(calculate_computer_value_for_employee(computers.iloc[0], employees.iloc[0]))

In [24]:
def get_best_from_three_for_employee(computers_indexes: list[int], employee: series_type) -> float:
    return max([calculate_computer_value_for_employee(computers.iloc[c], employee) for c in computers_indexes])


In [29]:
# for i in [j for j in range(3)]:
#     print(calculate_computer_value_for_employee(computers.iloc[i], employees.iloc[0]))
#
# print("\n", get_best_from_three_for_employee([i for i in range(3)], employees.iloc[0]), sep="")

In [30]:
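def calculate_state_cost(state: list[int]) -> float:
    # Despite the name, this is a total value to maximise: each employee
    # contributes the value of the best computer in the state.
    cost = 0
    for _, e in employees.iterrows():
        cost += get_best_from_three_for_employee(state, e)
    return cost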

In [31]:
sum_all = 0
for _, e in employees.iterrows():
    sum_all += get_best_from_three_for_employee(list(range(3)), e)

print(calculate_state_cost(list(range(3))) == sum_all)

True
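
Since a state is just a choice of 3 computers, the search space is small: 11 models give only 165 possible triples. The sketch below is therefore an exhaustive search, not the simulated annealing approach this notebook pursues, and is meant only as a possible cross-check. It is self-contained and uses only the five survey rows and ten vendor rows displayed above (so 120 triples); `WEIGHTS`, `COMPUTERS`, `value` and `state_value` are illustrative names.

```python
from itertools import combinations

# First five survey rows and the ten vendor rows displayed above, as
# (memory, processing, storage, inverse_price). All five employees are "high" users.
WEIGHTS = [
    (0.375, 0.225, 0.15, 0.25),
    (0.45, 0.225, 0.075, 0.25),
    (0.375, 0.3, 0.075, 0.25),
    (0.3, 0.3, 0.15, 0.25),
    (0.3, 0.3, 0.15, 0.25),
]
COMPUTERS = [
    (5, 7, 10, 2.7), (9, 8, 9, 1.3), (8, 9, 10, 1.0), (8, 8, 9, 1.7), (5, 4, 4, 5.7),
    (6, 7, 7, 3.3), (7, 10, 7, 2.0), (9, 6, 9, 2.0), (9, 8, 7, 2.0), (7, 7, 9, 2.3),
]
CAP = 10  # cap for "high" users; scores never exceed 10, so it has no effect here


def value(weights, computer):
    """Weighted score of one computer for one employee (price is not capped)."""
    capped = [min(s, CAP) for s in computer[:3]] + [computer[3]]
    return sum(w * s for w, s in zip(weights, capped))


def state_value(state):
    """Each employee is assigned the best computer among those in the state."""
    return sum(max(value(w, COMPUTERS[c]) for c in state) for w in WEIGHTS)


# Enumerate every triple of row positions and keep the one with the highest value.
best = max(combinations(range(len(COMPUTERS)), 3), key=state_value)
print(best, round(state_value(best), 3))
```

On the full data the same idea would loop over `employees` and `computers` instead of the hard-coded rows; the result could then be compared with whatever state the annealing run settles on.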


In [67]:
def improvement(new_state: list[int], old_state: list[int]) -> float:
    return calculate_state_cost(new_state) - calculate_state_cost(old_state)

In [72]:
from random import shuffle, random, choices
from typing import Iterator


def get_random_neighbour(state: list[int]) -> Iterator[list[int]]:
    # All single-slot swaps: replace one chosen computer with one not in the state.
    unused = [x for x in range(computers.shape[0]) if x not in state]
    neighbour_states = [(i, j) for i in range(len(state)) for j in unused]

    shuffle(neighbour_states)

    for i, j in neighbour_states:
        new_state = state.copy()
        new_state[i] = j
        yield new_state
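
To get a feel for the size of this neighbourhood, here is a self-contained variant of the generator that takes the number of available models as a parameter instead of reading the `computers` DataFrame. The name `random_neighbours` and the parameter `n_options` are illustrative assumptions.

```python
from random import shuffle
from typing import Iterator


def random_neighbours(state: list[int], n_options: int) -> Iterator[list[int]]:
    """Yield, in random order, every state that differs from `state` in one slot."""
    unused = [x for x in range(n_options) if x not in state]
    moves = [(slot, candidate) for slot in range(len(state)) for candidate in unused]
    shuffle(moves)
    for slot, candidate in moves:
        neighbour = state.copy()
        neighbour[slot] = candidate
        yield neighbour


neighbours = list(random_neighbours([1, 4, 6], n_options=11))
print(len(neighbours))  # 3 slots * 8 unused options
```

With 11 models and a state of 3, every state has 3 * 8 = 24 neighbours, so a single `next(...)` call samples one of 24 possible moves uniformly.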

In [66]:
# generator = get_random_neighbour([1, 4, 6])
# print(next(generator))

[8, 4, 6]


In [70]:
import math


def calculate_transition_probability(old_state: list[int], new_state: list[int]) -> float:
    # Metropolis function: p = exp(delta / temperature), where delta is the
    # improvement of the objective function (see `improvement`).
    # `math.exp` from the standard library is used here in place of `mpmath.exp`.
    delta = improvement(new_state, old_state)
    if delta >= 0:
        # Not worse than the current state: always accept (also avoids overflow).
        return 1.0
    return math.exp(delta / temperature)
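
The effect of the temperature on the Metropolis function is easiest to see with numbers. The sketch below is a standalone illustration for a maximisation problem, where `delta` is negative for a worse neighbour; the function name `acceptance_probability` and the sample values are assumptions made for this example.

```python
import math


def acceptance_probability(delta: float, temperature: float) -> float:
    """Metropolis rule for maximisation: always accept improvements,
    accept a worse state with probability exp(delta / temperature)."""
    if delta >= 0:
        return 1.0
    return math.exp(delta / temperature)


# The same loss of 1.0 in total value becomes ever less acceptable as it cools.
for t in (5.0, 0.5, 0.05):
    print(t, acceptance_probability(-1.0, t))
```

At the initial temperature of 5 a loss of 1.0 is accepted most of the time (exp(-0.2) is roughly 0.82), while at 0.05 it is practically never accepted, which is what lets the search explore early and settle later.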

In [None]:
def update_temperature() -> None:
    # One possible implementation of exponential cooling, T_k = T * a^k,
    # applied one step at a time:
    #   - `a` is `cooling_step`
    #   - `k` is tracked in `cooling_time`
    # The temperature never drops below `min_temperature`.
    global temperature, cooling_time
    cooling_time += 1
    temperature = max(temperature * cooling_step, min_temperature)
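
The exponential schedule `T_k = T * a^k` also determines how long a run can cool before it hits the floor. The sketch below solves `T * a^k <= T_min` for `k` using the parameter values defined earlier in this notebook; the helper name `steps_until_floor` is an assumption.

```python
import math


def steps_until_floor(initial: float, cooling_step: float, floor: float) -> int:
    """Smallest k with initial * cooling_step**k <= floor."""
    return math.ceil(math.log(floor / initial) / math.log(cooling_step))


k = steps_until_floor(initial=5, cooling_step=0.999, floor=1e-10)
print(k)
```

With an initial temperature of 5, a cooling step of 0.999 and a minimum of 1e-10, this comes out at roughly 24.6 thousand updates, which gives a rough upper bound on the number of useful iterations between reheats.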

In [None]:
def reheat(from_state: list[int]) -> list[int]:
    # Without `global`, the assignments below would only create local variables.
    global temperature, cooling_time, steps_from_last_state_update
    # Restore a fraction (`escape_reheat_ratio`) of the initial temperature.
    temperature = initial_temperature * escape_reheat_ratio
    # Reset the cooling schedule.
    cooling_time = 0
    # Reset the counter used to detect local optima.
    steps_from_last_state_update = 0
    return from_state

In [None]:
def find_next_state(state: list[int]) -> list[int]:
    # Draw a single random neighbour from the generator.
    neighbour = next(get_random_neighbour(state))
    next_state = state
    if improvement(neighbour, state) > 0:
        # A better neighbour is always accepted.
        next_state = neighbour
    elif random() < calculate_transition_probability(state, neighbour):
        # A worse neighbour is accepted with the Metropolis probability.
        next_state = neighbour
    # Cool down after every step, then return the (possibly unchanged) state.
    update_temperature()
    return next_state


In [14]:
escape_strategies = {

}

def escape_local_optimum(state: list[int], best_state: list[int]) -> list[int]:
    # TODO: implement the escape strategies and the process for choosing between them
    # strategy = choices(escape_strategies, weights=)
    raise NotImplementedError()