# Efficiency Analysis

## Intro

Here we use Data Envelopment Analysis (DEA), a mathematical model, to measure the efficiency of Decision Making Units (DMUs) such as the departments of an organization, the stores of a grocery chain, or the branches of a bank.

Efficiency is a general concept, but here we define it mathematically as:

$\text{efficiency} = \frac{\text{outputs}}{\text{inputs}}$

This means that, for the same amount of inputs (like the number of employees a department has or the budget it uses), DMUs that create more output (like revenue or customer satisfaction) have higher efficiency scores.

The output term can be, and usually is, composed of several factors, such as revenue and customer satisfaction score. An interesting property of DEA is that these outputs can be measured in different units, such as money and survey scores. Similarly, the input term is usually composed of several factors, such as the number of employees in a department and its budget.

We solve this problem with Pyomo, an open-source optimization framework.

## Background

Data Envelopment Analysis (DEA) is a nonparametric method in operations research and economics for estimating production frontiers. It is used to empirically measure the productive efficiency of decision making units (DMUs). There are a number of linear programming formulations of the DEA problem. Fuller coverage of the subject can be found in Farrell (1957), Charnes et al. (1978) and Thanassoulis et al. (1987). The formulation given by H.P. Williams is described in Land (1991). That formulation is the dual of a commonly used model that relies on finding weighted ratios of outputs to inputs. We use the commonly used formulation, which can be found in Cooper et al. (2007).

Data Envelopment Analysis has been used to evaluate the performance of many kinds of entities engaged in many different activities, in many contexts and countries. Examples include the maintenance activities of U.S. Air Force bases in different geographic locations, police forces in England and Wales, branch banks in Cyprus and Canada, and the efficiency of universities in performing their education and research functions in the U.S., England and France.

The DEA approach is concerned with evaluations of *efficiency*. The most common measure of efficiency takes the form of a ratio like the following one:

$$
\text{efficiency} = \frac{\text{output}}{\text{input}}
$$

## Model Formulation

Assume there is a set of DMUs. Some common input and output items for each of these DMUs are selected as follows:
1. Numerical data are available for each input and output, with the data assumed to be positive, for all DMUs.
2. The items (inputs, outputs and choice of DMUs) should reflect an analyst's or a manager's interest in the components that will enter into the relative efficiency evaluations of the DMUs.
3. In principle, smaller input amounts are preferable and larger output amounts are preferable so the efficiency scores should reflect these principles.
4. The measurement units of the different inputs and outputs do not need to be congruent. Some may involve a number of persons, or areas of floor space, money expended, etc.
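Point 4 can be illustrated with a small sketch. The DMU, its values and the weights below are hypothetical and chosen only for the arithmetic: when an input is re-expressed in a different unit, its weight can be rescaled by the inverse factor, and the weighted ratio is unchanged.

```python
from fractions import Fraction


def ratio(outputs, inputs, u, v):
    """Weighted outputs divided by weighted inputs for a single DMU."""
    numerator = sum(u[r] * outputs[r] for r in outputs)
    denominator = sum(v[i] * inputs[i] for i in inputs)
    return numerator / denominator


# Hypothetical DMU: budget and revenue in dollars, headcount in persons.
outputs = {"revenue": 500_000}
inputs_usd = {"budget": 200_000, "employees": 10}
u = {"revenue": Fraction(1, 1_000_000)}
v_usd = {"budget": Fraction(1, 400_000), "employees": Fraction(1, 20)}

# Same DMU with the budget expressed in thousands of dollars.
# The budget weight is multiplied by 1000 to compensate.
inputs_kusd = {"budget": 200, "employees": 10}
v_kusd = {"budget": v_usd["budget"] * 1000, "employees": v_usd["employees"]}

before = ratio(outputs, inputs_usd, u, v_usd)
after = ratio(outputs, inputs_kusd, u, v_kusd)
print(before, after)  # 1/2 1/2
```

Because DEA chooses the weights itself, a change of units is absorbed by the weights and does not change the optimal efficiency score.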

### Fractional problem formulation
The proposed measure of the efficiency of a target DMU $k$ is obtained as the maximum of a ratio of weighted outputs to weighted inputs subject to the condition that the similar ratios for every DMU be less than or equal to one.

### Sets and indices

$ j,k \in \text{DMUS} $: Indices and set of DMUs, where $k$ represents the target DMU.

$i \in \text{Inputs}$: Index and set of inputs.

$r \in \text{Outputs}$: Index and set of outputs.

### Parameters

$\text{invalue}_{i,j} > 0$: Value of input $i$ for DMU $j$.

$\text{outvalue}_{r,j} > 0$: Value of output $r$ for DMU $j$.

### Decision Variables

$u_{r} \geq 0$: Weight of output $r$.

$v_{i} \geq 0$: Weight of input  $i$.

### Objective function

**Target DMU Efficiency**: Maximize efficiency at the target DMU $k$.

$$
\text{Maximize} \quad E_k = 
\frac{\sum_{r \in \text{Outputs}} \text{outvalue}_{r,k}*u_{r}}{\sum_{i \in \text{Inputs}} \text{invalue}_{i,k}*v_{i}}
\tag{FP0}
$$


### Constraints

**Efficiency ratios**: The efficiency of every DMU is a number in the interval $[0,1]$.

\begin{equation}
\frac{\sum_{r \in \text{Outputs}} \text{outvalue}_{r,j}*u_{r}}{\sum_{i \in \text{Inputs}} \text{invalue}_{i,j}*v_{i}}
 \leq 1 \quad \forall j \in \text{DMUS}
 \tag{FP1}
\end{equation}
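To make the fractional problem concrete, the following sketch evaluates the ratio in $FP0$ and the constraints $FP1$ for two hand-picked weight vectors, using the three-DMU data that appears in the modeling cells of this notebook. The weights and the helper names (`efficiency_ratio`, `satisfies_fp1`) are illustrative assumptions and are not part of the `dea` module.

```python
from fractions import Fraction

# Data of the three DMUs used in the modeling section (DMU 3 is the target).
invalue = {
    "avg_salary": {1: 1, 2: 1, 3: 3},
    "employee_engagement": {1: 1, 2: 1, 3: 1},
}
outvalue = {
    "basket_size": {1: 10, 2: 5, 3: 5},
    "dollars_per_basket": {1: 5, 2: 10, 3: 5},
}
dmus = [1, 2, 3]


def efficiency_ratio(j, u, v):
    """Weighted outputs over weighted inputs of DMU j (the ratio in FP0/FP1)."""
    numerator = sum(u[r] * outvalue[r][j] for r in outvalue)
    denominator = sum(v[i] * invalue[i][j] for i in invalue)
    return Fraction(numerator, denominator)


def satisfies_fp1(u, v):
    """True if no DMU has a ratio above 1 under the weights (u, v)."""
    return all(efficiency_ratio(j, u, v) <= 1 for j in dmus)


u = {"basket_size": 1, "dollars_per_basket": 1}
v_a = {"avg_salary": 5, "employee_engagement": 10}
v_b = {"avg_salary": 0, "employee_engagement": 15}

for v in (v_a, v_b):
    print(satisfies_fp1(u, v), efficiency_ratio(3, u, v))
# True 2/5
# True 2/3
```

Both weight vectors satisfy $FP1$, but they rate the target DMU differently. This is why the model maximizes over the weights: each DMU is evaluated with the weights that are most favorable to it.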

### Linear programming problem formulation

This linear programming formulation can be found in the book by Cooper et al. (2007).

### Objective function

**Target DMU Efficiency**: Maximize efficiency at the target DMU $k$.

$$
\text{Maximize} \quad E_k = \sum_{r \in \text{Outputs}} \text{outvalue}_{r,k}*u_{r}
\tag{LP0}
$$


### Constraints

**Efficiency ratio**: The efficiency of every DMU is a number in the interval $[0,1]$.

\begin{equation}
\sum_{r \in \text{Outputs}} \text{outvalue}_{r,j}*u_{r} -
\sum_{i \in \text{Inputs}} \text{invalue}_{i,j}*v_{i}
 \leq 0  \quad \forall j \in \text{DMUS}
\tag{LP1}
\end{equation}

**Normalization**: This constraint ensures that the denominator of the objective function of the fractional problem is equal to one.

\begin{equation}
\sum_{i \in \text{Inputs}} \text{invalue}_{i,k}*v_{i} = 1 
\tag{LP2}
\end{equation}

It is easy to verify that the fractional problem and the linear programming problem are equivalent. Assume that the denominator of the efficiency ratio constraints of the fractional problem is positive for all DMUs. Then we obtain the constraints $LP1$ by multiplying both sides of the constraints $FP1$ by the denominator. Next, because the ratio in $FP0$ does not change when all weights are multiplied by the same positive factor, we can set its denominator equal to 1, which defines constraint $LP2$, and maximize the numerator, which gives the objective function $LP0$.
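The normalization step can be checked numerically. The sketch below takes one set of weights for the target DMU of this notebook (DMU 3), scales them so that the weighted input equals 1, and confirms that the value of $LP0$ equals the original ratio $FP0$. The starting weights and the helper names are assumptions made for illustration.

```python
from fractions import Fraction

# Input and output values of the target DMU (DMU 3 in this notebook's data).
in_k = {"avg_salary": 3, "employee_engagement": 1}
out_k = {"basket_size": 5, "dollars_per_basket": 5}


def weighted(values, weights):
    """Weighted sum of the values of the target DMU."""
    return sum(Fraction(weights[name]) * values[name] for name in values)


def normalize(u, v):
    """Scale (u, v) by a common factor so that the weighted input is 1 (LP2)."""
    t = 1 / weighted(in_k, v)
    u_scaled = {r: t * Fraction(w) for r, w in u.items()}
    v_scaled = {i: t * Fraction(w) for i, w in v.items()}
    return u_scaled, v_scaled


u = {"basket_size": 1, "dollars_per_basket": 1}
v = {"avg_salary": 0, "employee_engagement": 15}

ratio_before = weighted(out_k, u) / weighted(in_k, v)  # value of FP0
u_n, v_n = normalize(u, v)
lp_objective = weighted(out_k, u_n)                    # value of LP0

print(ratio_before, weighted(in_k, v_n), lp_objective)  # 2/3 1 2/3
```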

### Definition of efficiency

1. $DMU_k$ is efficient if the optimal objective function value $E_{k}^{*} = 1$.
2. Otherwise, $DMU_k$ is inefficient.
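In practice the solver returns floating-point values, so a score can come back as, for example, `1.0000000000000002` instead of exactly 1. A minimal sketch of the classification with a tolerance follows; the function name and the tolerance `1e-6` are assumptions, not part of the model.

```python
import math


def is_efficient(score, tol=1e-6):
    """Classify a DMU as efficient if its score equals 1 up to `tol`."""
    return math.isclose(score, 1.0, rel_tol=0.0, abs_tol=tol)


print(is_efficient(1.0000000000000002))  # True
print(is_efficient(0.666666666666667))   # False
```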

## Problem Description

Inputs

* engagement survey score
* turnover rate

Outputs

* dollars per basket
* basket size


## References

H. Paul Williams, Model Building in Mathematical Programming, fifth edition.

Cooper, W. W, L. M. Seiford, K. Tone. (2007) Data Envelopment Analysis: A Comprehensive Text with Models, Applications, References and DEA-Solver Software. Second edition. Springer-Verlag US.

Land, A. (1991) Data envelopment analysis, Chapter 5, in Operations Research in Management (eds S.C. Littlechild and M.F. Shutler), Prentice Hall, London.

Farrell, M.J. (1957) The measurement of productive efficiency. Journal of the Royal Statistical Society, Series A, 120, 253–290.

Charnes, A., Cooper, W.W. and Rhodes, E. (1978) Measuring the efficiency of decision making units. European Journal of Operational Research, 2, 429–444.

Thanassoulis, E., Dyson, R.G. and Foster, M.J. (1987) Relative efficiency assessments using data envelopment analysis: an application to data on rates departments. Journal of the Operational Research Society, 5, 397–411.

Copyright © 2020 Gurobi Optimization, LLC

# Modeling

In [1]:
%load_ext autoreload
%autoreload 2

import pandas as pd
import pyomo.environ as pyo
from pyomo.environ import (
    AbstractModel, Set, Param, Var, Objective, Constraint,
    PositiveReals, NonNegativeReals, Binary, maximize, minimize, inequality,
    SolverFactory, DataPortal, SolverStatus, TerminationCondition,
)
from dea import DEAProgram

## Modeling with Pyomo

### Model Definition

In [2]:
program = DEAProgram("input_oriented_ccr_model")

### Loading data into an instance and solving the instance
After defining the model abstractly we need to add our data to the model, i.e., define values for sets and parameters.
We can either define the data with Python builtins, or read the data from `dea.dat` and solve the model with the `pyomo solve` command in a terminal.

#### Using Python builtins

In [3]:
data = {None: {
  "Inputs": {None: ["avg_salary", "employee_engagement"]},
  "Outputs": {None: ["basket_size", "dollars_per_basket"]},
  "Units": {None: [1, 2, 3]},
  "target": {1:0, 2:0, 3:1},
  "invalues": {("avg_salary", 1): 1, ("avg_salary", 2): 1, ("avg_salary", 3): 3, 
                ("employee_engagement", 1): 1, ("employee_engagement", 2): 1, ("employee_engagement", 3): 1},
  "outvalues": {("basket_size", 1): 10, ("basket_size", 2): 5, ("basket_size", 3): 5, 
                ("dollars_per_basket", 1): 5, ("dollars_per_basket", 2): 10, ("dollars_per_basket", 3): 5}
  }}
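The nested dictionary above follows the layout that `create_instance` accepts: a top-level `None` namespace, sets given as `{None: [...]}`, and indexed parameters keyed by their index. Writing it by hand is error-prone for larger data sets, so one option is a small builder such as the sketch below. The function `build_dea_data` and its arguments are hypothetical; the component names are the ones used in the cell above.

```python
def build_dea_data(invalues, outvalues, target_unit):
    """Build the nested data dict from per-item {unit: value} tables."""
    units = sorted({unit for table in invalues.values() for unit in table})
    return {None: {
        "Inputs": {None: list(invalues)},
        "Outputs": {None: list(outvalues)},
        "Units": {None: units},
        # 1 marks the target DMU, 0 marks every other unit.
        "target": {unit: int(unit == target_unit) for unit in units},
        "invalues": {(i, unit): value
                     for i, table in invalues.items()
                     for unit, value in table.items()},
        "outvalues": {(r, unit): value
                      for r, table in outvalues.items()
                      for unit, value in table.items()},
    }}


data = build_dea_data(
    invalues={"avg_salary": {1: 1, 2: 1, 3: 3},
              "employee_engagement": {1: 1, 2: 1, 3: 1}},
    outvalues={"basket_size": {1: 10, 2: 5, 3: 5},
               "dollars_per_basket": {1: 5, 2: 10, 3: 5}},
    target_unit=3,
)
```

The call reproduces the same dictionary as the hand-written cell, and changing `target_unit` is enough to evaluate a different DMU.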

In [4]:
model = program.model
instance = model.create_instance(data)
# instance.pprint()

In [5]:
# solve
solver = SolverFactory('glpk')
results = solver.solve(instance)
print(f"Solver status (`ok` means the problem was solved successfully): \n {results.solver.status}")

Solver status (`ok` means the problem was solved successfully): 
 ok


In [6]:
results

{'Problem': [{'Name': 'unknown', 'Lower bound': 0.666666666666667, 'Upper bound': 0.666666666666667, 'Number of objectives': 1, 'Number of constraints': 4, 'Number of variables': 4, 'Number of nonzeros': 14, 'Sense': 'maximize'}], 'Solver': [{'Status': 'ok', 'Termination condition': 'optimal', 'Statistics': {'Branch and bound': {'Number of bounded subproblems': 0, 'Number of created subproblems': 0}}, 'Error rc': 0, 'Time': 0.012012004852294922}], 'Solution': [OrderedDict([('number of solutions', 0), ('number of solutions displayed', 0)])]}

In [7]:
print(f"The efficiency of the target DMU is: {instance.efficiency()}")

The efficiency of the target DMU is: 0.666666666666667


#### Using `dea.dat` and `dea.py`
The model defined in `dea.py` is the same model we have defined above.
Data in `dea.dat` is in the AMPL data format. This format is very readable too.

In [8]:
!pyomo solve --solver=glpk input_oriented_ccr_model.py data/dea.dat

[    0.00] Setting up Pyomo environment
[    0.00] Applying Pyomo preprocessing actions
[    0.00] Creating model
[    0.02] Applying solver
[    0.04] Processing results
    Number of solutions: 1
    Solution Information
      Gap: 0.0
      Status: feasible
      Function Value: 0.666666666666667
    Solver results file: results.yml
[    0.04] Applying Pyomo postprocessing actions
[    0.04] Pyomo Finished


#### Using TAB files

In [9]:
data = DataPortal()
data.load(filename='data/units.tab', set=model.Units, format="set")
print(data["Units"])

[1, 2, 3]


In [10]:
data.load(filename='data/inputs.tab', set=model.Inputs, format="set")
print(data["Inputs"])

['avg_salary', 'employee_engagement']


In [11]:
data.load(filename='data/outputs.tab', set=model.Outputs, format="set")
print(data["Outputs"])

['basket_size', 'dollars_per_basket']


In [12]:
data.load(filename='data/target.tab', param=model.target)
print(data["target"])

{1: 0, 2: 0, 3: 1}


In [13]:
data.load(filename='data/invalues.tab', param=model.invalues, format='transposed_array')
print(data["invalues"])

{('avg_salary', 1): 1, ('avg_salary', 2): 1, ('avg_salary', 3): 3, ('employee_engagement', 1): 1, ('employee_engagement', 2): 1, ('employee_engagement', 3): 1}


In [14]:
data.load(filename='data/outvalues.tab', param=model.outvalues, format='transposed_array')
print(data["outvalues"])

{('basket_size', 1): 10, ('basket_size', 2): 5, ('basket_size', 3): 5, ('dollars_per_basket', 1): 5, ('dollars_per_basket', 2): 10, ('dollars_per_basket', 3): 5}


In [15]:
instance = model.create_instance(data)

In [16]:
# solve
solver = SolverFactory('glpk')
results = solver.solve(instance)
print(f"Solver status (`ok` means the problem was solved successfully): \n {results.solver.status}")

Solver status (`ok` means the problem was solved successfully): 
 ok


In [17]:
results

{'Problem': [{'Name': 'unknown', 'Lower bound': 0.666666666666667, 'Upper bound': 0.666666666666667, 'Number of objectives': 1, 'Number of constraints': 4, 'Number of variables': 4, 'Number of nonzeros': 14, 'Sense': 'maximize'}], 'Solver': [{'Status': 'ok', 'Termination condition': 'optimal', 'Statistics': {'Branch and bound': {'Number of bounded subproblems': 0, 'Number of created subproblems': 0}}, 'Error rc': 0, 'Time': 0.01136159896850586}], 'Solution': [OrderedDict([('number of solutions', 0), ('number of solutions displayed', 0)])]}

In [18]:
print(f"The efficiency of the target DMU is: {instance.efficiency()}")

The efficiency of the target DMU is: 0.666666666666667


## Input oriented CCR model

In [19]:
data = {None: {
  "Inputs": {None: ["avg_salary", "employee_engagement"]},
  "Outputs": {None: ["basket_size", "dollars_per_basket"]},
  "Units": {None: [1, 2, 3]},
  "target": {1:0, 2:0, 3:1},
  "invalues": {("avg_salary", 1): 1, ("avg_salary", 2): 1, ("avg_salary", 3): 3, 
                ("employee_engagement", 1): 1, ("employee_engagement", 2): 1, ("employee_engagement", 3): 1},
  "outvalues": {("basket_size", 1): 10, ("basket_size", 2): 5, ("basket_size", 3): 5, 
                ("dollars_per_basket", 1): 5, ("dollars_per_basket", 2): 10, ("dollars_per_basket", 3): 5}
  }}

In [20]:
program = DEAProgram("input_oriented_bcc_model")
program.model_type

'input_oriented_bcc_model'

In [21]:
model = program.model
instance = model.create_instance(data)
# instance.pprint()

In [22]:
# solve
solver = SolverFactory('glpk')
results = solver.solve(instance)
print(f"Solver status (`ok` means the problem was solved successfully): \n {results.solver.status}")

Solver status (`ok` means the problem was solved successfully): 
 ok


In [23]:
results

{'Problem': [{'Name': 'unknown', 'Lower bound': 0.666666666666667, 'Upper bound': 0.666666666666667, 'Number of objectives': 1, 'Number of constraints': 4, 'Number of variables': 4, 'Number of nonzeros': 14, 'Sense': 'minimize'}], 'Solver': [{'Status': 'ok', 'Termination condition': 'optimal', 'Statistics': {'Branch and bound': {'Number of bounded subproblems': 0, 'Number of created subproblems': 0}}, 'Error rc': 0, 'Time': 0.011891841888427734}], 'Solution': [OrderedDict([('number of solutions', 0), ('number of solutions displayed', 0)])]}

In [24]:
print(f"The efficiency of the target DMU is: {instance.efficiency():.2f}")

The efficiency of the target DMU is: 0.67


In [25]:
for x in instance.Units: print(instance.lmbda[x].value)

0.333333333333333
0.333333333333333
0.0


In [26]:
print(program.get_constraint_status(instance, data))

                     slack
avg_salary           -1.33
employee_engagement  -0.00
basket_size          -0.00
dollars_per_basket   -0.00
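The slack values can be reproduced by hand from the efficiency score and the `lmbda` values printed above. The sketch below builds the composite reference unit (the `lmbda`-weighted combination of all DMUs) and compares it with the target DMU. The sign convention, left-hand side minus right-hand side of each constraint, is an assumption about what `get_constraint_status` reports; it is chosen because it matches the printed table.

```python
theta = 2 / 3                            # efficiency of the target DMU
lmbda = {1: 1 / 3, 2: 1 / 3, 3: 0.0}     # values printed for instance.lmbda
target = 3

invalues = {"avg_salary": {1: 1, 2: 1, 3: 3},
            "employee_engagement": {1: 1, 2: 1, 3: 1}}
outvalues = {"basket_size": {1: 10, 2: 5, 3: 5},
             "dollars_per_basket": {1: 5, 2: 10, 3: 5}}


def composite(values):
    """lmbda-weighted combination of all DMUs for each item."""
    return {name: sum(lmbda[j] * by_unit[j] for j in lmbda)
            for name, by_unit in values.items()}


composite_in = composite(invalues)
composite_out = composite(outvalues)

# Inputs: the composite unit may use at most theta times the target's input.
input_gap = {i: composite_in[i] - theta * invalues[i][target] for i in invalues}
# Outputs: the composite unit must produce at least the target's output.
output_gap = {r: outvalues[r][target] - composite_out[r] for r in outvalues}

for name, gap in {**input_gap, **output_gap}.items():
    print(f"{name:<20}{gap:6.2f}")
```

The only nonzero gap is for `avg_salary`: the composite unit uses about 0.67 units of it, while the target DMU would still use 2 units after shrinking its inputs by the factor `theta`.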


In [27]:
# Check that the target DMU does in fact reach an efficiency of 1 if it decreases its inputs by the efficiency factor found above.
data[None]["invalues"][("avg_salary", 3)] *= instance.efficiency()
data[None]["invalues"][("employee_engagement", 3)] *= instance.efficiency()

In [28]:
instance = model.create_instance(data)

In [29]:
results = solver.solve(instance)
print(f"Solver status (`ok` means the problem was solved successfully): \n {results.solver.status}")

Solver status (`ok` means the problem was solved successfully): 
 ok


In [30]:
print(f"The efficiency of the target DMU after decreasing its inputs : {instance.efficiency()}")

The efficiency of the target DMU after decreasing its inputs : 1.0


## Output oriented CCR model

In [31]:
data = {None: {
  "Inputs": {None: ["avg_salary", "employee_engagement"]},
  "Outputs": {None: ["basket_size", "dollars_per_basket"]},
  "Units": {None: [1, 2, 3]},
  "target": {1:0, 2:0, 3:1},
  "invalues": {("avg_salary", 1): 1, ("avg_salary", 2): 1, ("avg_salary", 3): 3, 
                ("employee_engagement", 1): 1, ("employee_engagement", 2): 1, ("employee_engagement", 3): 1},
  "outvalues": {("basket_size", 1): 10, ("basket_size", 2): 5, ("basket_size", 3): 5, 
                ("dollars_per_basket", 1): 5, ("dollars_per_basket", 2): 10, ("dollars_per_basket", 3): 5}
  }}

In [32]:
program = DEAProgram("output_oriented_ccr_model")
program.model_type

'output_oriented_ccr_model'

In [33]:
model = program.model
instance = model.create_instance(data)
# instance.pprint()

In [34]:
# solve
solver = SolverFactory('glpk')
results = solver.solve(instance)
print(f"Solver status (`ok` means the problem was solved successfully): \n {results.solver.status}")

Solver status (`ok` means the problem was solved successfully): 
 ok


In [35]:
results

{'Problem': [{'Name': 'unknown', 'Lower bound': 1.5, 'Upper bound': 1.5, 'Number of objectives': 1, 'Number of constraints': 4, 'Number of variables': 4, 'Number of nonzeros': 14, 'Sense': 'minimize'}], 'Solver': [{'Status': 'ok', 'Termination condition': 'optimal', 'Statistics': {'Branch and bound': {'Number of bounded subproblems': 0, 'Number of created subproblems': 0}}, 'Error rc': 0, 'Time': 0.012154102325439453}], 'Solution': [OrderedDict([('number of solutions', 0), ('number of solutions displayed', 0)])]}

In [36]:
# NOTE: for output efficiency the lower the better. Output efficiency of 1 is optimal.
# If efficiency is `x`, then a DMU that has an efficiency of `x` can get to the optimal efficiency of 1 
# if it increases all of its outputs uniformly by a factor of `x`, where `x` for inefficient DMUs is greater than 1.
print(f"The output efficiency of the target DMU is: {instance.efficiency()}.")

The output efficiency of the target DMU is: 1.5.
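For the CCR models, which assume constant returns to scale, the optimal output-oriented score is the reciprocal of the optimal input-oriented score. The sketch below checks this with the two values printed in this notebook and computes the output levels the target DMU would need in order to become efficient. The variable names are illustrative.

```python
import math

theta = 0.666666666666667   # input-oriented score printed earlier
phi = 1.5                   # output-oriented score printed above

# Under constant returns to scale the two scores multiply to 1.
print(math.isclose(theta * phi, 1.0, abs_tol=1e-9))  # True

# Output levels of the target DMU after a uniform increase by the factor phi.
outputs = {"basket_size": 5, "dollars_per_basket": 5}
projected = {r: phi * y for r, y in outputs.items()}
print(projected)  # {'basket_size': 7.5, 'dollars_per_basket': 7.5}
```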


In [37]:
# Check that the target DMU does in fact reach an efficiency of 1 if it increases its outputs by the efficiency factor found above.
data[None]["outvalues"][("basket_size", 3)] *= instance.efficiency()
data[None]["outvalues"][("dollars_per_basket", 3)] *= instance.efficiency()

In [38]:
instance = model.create_instance(data)

In [39]:
results = solver.solve(instance)
print(f"Solver status (`ok` means the problem was solved successfully): \n {results.solver.status}")

Solver status (`ok` means the problem was solved successfully): 
 ok


In [40]:
print(f"The efficiency of the target DMU after increasing its outputs : {instance.efficiency()}")

The efficiency of the target DMU after increasing its outputs : 1.0000000000000002


## Input oriented slack model
This model enables us to use the slack variables to see how DMUs could improve their efficiency by decreasing their inputs.

In [41]:
# TODO