# Does Testing Save Time?

We can model the total time spent coding as

$T = r_{\rm code} l + r_{\rm test} l + t_{\rm DB}$,
where
* $T \equiv$ total time spent coding
* $l \equiv$ number of lines of code
* $r_{\rm code} \equiv$ time spent writing the code itself per line of code
* $r_{\rm test} \equiv$ time spent writing the testing code per line of code
* $t_{\rm DB} \equiv$ time spent debugging

## Time Spent Debugging

Let's break down that last term a little more.

$t_{\rm DB} = n_l t_l + n_m t_m$, where
* $n_l \equiv$ number of bugs that are easily locatable
* $t_l \equiv$ time spent fixing an easily locatable bug
* $n_m \equiv$ number of bugs that are in a mysterious, hard-to-find location
* $t_m \equiv$ time spent fixing a hard-to-find bug

With a bit of re-ordering this becomes

$t_{\rm DB} = n_{\rm caught} t_l ( 1 + f_m \frac{( t_m - t_l )}{t_l} )$, where
* $n_{\rm caught} \equiv$ number of bugs caught
* $f_m \equiv$ fraction of bugs that are in a mystery location.
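
To make the re-ordering explicit, here is a short derivation. It assumes that only caught bugs are debugged, so that $n_{\rm caught} = n_l + n_m$. By the definition of $f_m$ we then have $n_m = f_m n_{\rm caught}$ and $n_l = (1 - f_m) n_{\rm caught}$, which gives

$$
\begin{aligned}
t_{\rm DB} &= (1 - f_m) n_{\rm caught} t_l + f_m n_{\rm caught} t_m \\
&= n_{\rm caught} \left[ t_l + f_m ( t_m - t_l ) \right] \\
&= n_{\rm caught} t_l \left( 1 + f_m \frac{( t_m - t_l )}{t_l} \right) .
\end{aligned}
$$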

Finally we can break down $n_{\rm caught}$ into

$n_{\rm caught} \equiv f_{c} n = f_{c} e l$, where
* $f_{c} \equiv$ the fraction of bugs that are caught
* $n \equiv$ total number of bugs in the code
* $e \equiv$ number of bugs per line of code 

Therefore

$t_{\rm DB} = l f_c e t_l ( 1 + f_m \frac{( t_m - t_l )}{t_l} )$

## Final parametrized form to explore

Putting everything together, we get

$T/l = r_{\rm code} + r_{\rm test} + f_c e t_l ( 1 + f_m \frac{( t_m - t_l )}{t_l} )$

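Comparing coding with and without tests, we expect only $r_{\rm test}$, $f_c$, and $f_m$ to change, with $r_{\rm test} = 0$ when no tests are written.
Using subscripts $t$ and $n$ for the testing and no-testing values, we can therefore explore the ratio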

$T_{\rm testing} / T_{\rm no~testing} = (r_{\rm code} + r_{\rm test} + f_{c,t} e t_l ( 1 + f_{m,t} \frac{( t_m - t_l )}{t_l} )) / (r_{\rm code} + f_{c,n} e t_l ( 1 + f_{m,n} \frac{( t_m - t_l )}{t_l} ))$
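
One way to read this ratio: since $r_{\rm code}$ appears on both sides, testing saves time ($T_{\rm testing} < T_{\rm no~testing}$) exactly when $r_{\rm test} < e [ f_{c,n} ( t_l + f_{m,n} ( t_m - t_l ) ) - f_{c,t} ( t_l + f_{m,t} ( t_m - t_l ) ) ]$. The sketch below evaluates that threshold and the ratio for a single parameter set. The function names `debug_time_per_line` and `break_even_r_test` are hypothetical, and the numbers are illustrative values chosen to match the first row of the old survey estimates, with $r_{\rm test}$ assumed equal to $r_{\rm code}$.

```python
def debug_time_per_line(e, t_l, t_m, f_c, f_m):
    """Debugging time per line: f_c * e * t_l * (1 + f_m * (t_m - t_l) / t_l)."""
    return f_c * e * t_l * (1.0 + f_m * (t_m - t_l) / t_l)


def break_even_r_test(e, t_l, t_m, f_c_test, f_m_test, f_c_notest, f_m_notest):
    """Largest r_test (time per line) for which testing still saves time."""
    return (
        debug_time_per_line(e, t_l, t_m, f_c_notest, f_m_notest)
        - debug_time_per_line(e, t_l, t_m, f_c_test, f_m_test)
    )


# Illustrative values only (times in minutes)
params = dict(e=0.6, t_l=2.0, t_m=20.0,
              f_c_test=0.99, f_m_test=0.1,
              f_c_notest=0.9, f_m_notest=0.5)
r_code = 2.0
r_test = 2.0  # assumption: writing tests takes as long as writing the code

threshold = break_even_r_test(**params)
ratio = (
    (r_code + r_test + debug_time_per_line(0.6, 2.0, 20.0, 0.99, 0.1))
    / (r_code + debug_time_per_line(0.6, 2.0, 20.0, 0.9, 0.5))
)

print(f"break-even r_test = {threshold:.2f} min per line")
print(f"T_testing / T_no_testing = {ratio:.2f}")
```

Because $f_{c,t} \geq f_{c,n}$, catching more bugs adds debugging time in this model, so the threshold is positive only when the drop in $f_m$ outweighs that extra cost.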

## Import Estimates

### Old Estimates

In [35]:
import numpy as np
import pandas as pd

In [36]:
df = pd.read_csv('~/Downloads/Coding Habits Survey (Responses Cleaned) - Form Responses 1.csv', header=0,)

In [37]:
df = df.drop( 0 )

In [38]:
df

Unnamed: 0.1,Unnamed: 0,r_code,e,t_l,t_m,1 - f_m_notest,1 - f_c_notest,1 - f_m_test,1 - f_c_test
1,8/27/2020 10:36:54,2.0,0.6,2.0,20.0,0.5,0.1,0.9,0.01
2,9/9/2020 14:39:30,1.0,0.01,2.0,10.0,0.9,0.01,0.9,0.01
3,9/9/2020 14:45:00,1.0,1.0,0.8,3.5,0.95,0.01,0.9,0.01
4,9/9/2020 14:48:11,1.6,1.0,5.0,400.0,0.9,0.04,0.9,0.01
5,9/9/2020 14:48:36,4.0,1.0,3.0,30.0,0.75,0.04,0.9,0.01
6,9/9/2020 17:40:21,0.08,0.05,1.0,10.0,0.9,0.05,0.9,0.01


### Recent Estimates

In [19]:
df = pd.read_csv('/Users/zhafen/Downloads/Attitude Thruster Pre-Session Survey (Responses) - Form Responses 1.csv', header=0,)

In [20]:
df

Unnamed: 0,Timestamp,What is your current title?,How many minutes does it take you to write a single line of code (assuming it interacts with other lines of code)?,How many bugs do you normally introduce to a script that you write?,How many minutes does it take you to fix an easy-to-locate bug?,How many minutes does it take you to fix a hard-to-locate bug?,"What fraction of the bugs that exist in your code are easy-to-find? If you use automated testing, don't include bugs that would be easy-to-find only with automated testing. If you don't know what automated testing is ignore the previous sentence.","What fraction of the bugs that exist in your code are undetected? If you use automated testing, don't include bugs found with automated testing. If you don't know what automated testing is ignore the previous sentence.","If you use automated testing, what fraction of the bugs that exist in your code are easy-to-find overall (both due to automated testing and by default)? If you don't know what automated testing is don't answer.","If you use automated testing, what fraction of bugs in your code are undetected (both due to automated testing and by default)? If you don't know what automated testing is don't answer."
0,11/9/2020 14:01:37,Grad: Yr 4+,1-2 min,5-10,2-3,5+,>50%,<50%,,
1,11/9/2020 15:06:18,Grad: Yr 2,1-2 min,0-5,2-3,5+,<50%,>50%,,
2,11/11/2020 9:10:16,Grad: Yr 2,2-5 min,5-10,2-3,5+,>50%,<50%,,
3,11/11/2020 14:08:46,Grad: Yr 4+,1-2 min,5-10,2-3,5+,>50%,<50%,,
4,11/19/2020 11:42:19,Grad: Yr 4+,1-2 min,0-5,1-2,5+,>50%,<50%,,
5,11/19/2020 15:32:37,Grad: Yr 4+,1-2 min,0-5,1-2,5+,>50%,<50%,,
6,11/19/2020 15:33:06,Grad: Yr 4+,1-2 min,0-5,5+,5+,>50%,<50%,,
7,11/19/2020 15:33:20,Grad: Yr 4+,1-2 min,0-5,1-2,5+,>50%,<50%,,
8,11/19/2020 15:33:38,Grad: Yr 1,2-5 min,more than 10,3-4,5+,<50%,<50%,>50%,
9,11/19/2020 15:33:39,Grad: Yr 2,1-2 min,5-10,2-3,3-4,>50%,<50%,<50%,<50%


In [21]:
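def interpret_response( resp_str ):
    '''Convert a binned survey answer (e.g. '1-2 min', '5+', '>50%') to a number.
    Non-string input (e.g. NaN from a blank cell) is returned unchanged.'''
    
    try:    
        # Open-ended bins are set to their lower bound
        if resp_str[:2] == '5+':
            return 5.
        elif resp_str == 'more than 10':
            return 10.
        # Half-range fractions are set to the midpoint of their half
        elif resp_str == '>50%':
            return 0.75
        elif resp_str == '<50%':
            return 0.25
        elif resp_str == 'NaN':
            return np.nan
        else:
            try:
                # Ranges such as '1-2 min' are set to their midpoint
                low, high = resp_str.split( ' ' )[0].split( '-' )
                return 0.5 * ( float( low ) + float( high ) )
            except ValueError:
                # Not a recognized range; return the answer unchanged
                return resp_str
    except TypeError:
        # Slicing a non-string (e.g. a float NaN) raises TypeError
        return resp_str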
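
The outer `try`/`except TypeError` is what lets blank survey cells pass through: pandas reads an empty CSV field in a text column as a float NaN, and slicing a float raises `TypeError`. A minimal check of that behavior, using a plain Python NaN as a stand-in (the variable names are just for illustration):

```python
blank = float('nan')  # stand-in for the NaN pandas stores for an empty cell

try:
    blank[:2]  # same slicing that interpret_response applies first
    raised = False
except TypeError:
    raised = True

print(raised)
```

The `resp_str == 'NaN'` branch, by contrast, only matches the literal string `'NaN'`.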
        

In [24]:
column_names = [ None, None, 'r_code', 'e', 't_l', 't_m', '1 - f_m_notest', '1 - f_c_notest', '1 - f_m_test', '1 - f_c_test' ]

In [25]:
assumed_values = {
    '1 - f_m_test': 0.9,
    '1 - f_c_test': 0.01
}

In [26]:
# Clean the data
cleaned_data = {}
for i, name in enumerate( df.columns ):
    
    if column_names[i] is not None:
        used_name = column_names[i]
    else:
        used_name = name
    
    # Keep the first two columns (timestamp and title) as they are
    if i < 2:
        cleaned_data[used_name] = df[name]
    else:
        cleaned_data[used_name] = np.array( [ interpret_response( _ ) for _ in df[name] ] )
        
    # Fix NaNs in last columns
    if used_name == '1 - f_m_test' or used_name == '1 - f_c_test':
        is_nan = np.isnan( cleaned_data[used_name] )
        cleaned_data[used_name][is_nan] = assumed_values[used_name]
    
df = pd.DataFrame( cleaned_data )

In [27]:
df

Unnamed: 0,Timestamp,What is your current title?,r_code,e,t_l,t_m,1 - f_m_notest,1 - f_c_notest,1 - f_m_test,1 - f_c_test
0,11/9/2020 14:01:37,Grad: Yr 4+,1.5,7.5,2.5,5.0,0.75,0.25,0.9,0.01
1,11/9/2020 15:06:18,Grad: Yr 2,1.5,2.5,2.5,5.0,0.25,0.75,0.9,0.01
2,11/11/2020 9:10:16,Grad: Yr 2,3.5,7.5,2.5,5.0,0.75,0.25,0.9,0.01
3,11/11/2020 14:08:46,Grad: Yr 4+,1.5,7.5,2.5,5.0,0.75,0.25,0.9,0.01
4,11/19/2020 11:42:19,Grad: Yr 4+,1.5,2.5,1.5,5.0,0.75,0.25,0.9,0.01
5,11/19/2020 15:32:37,Grad: Yr 4+,1.5,2.5,1.5,5.0,0.75,0.25,0.9,0.01
6,11/19/2020 15:33:06,Grad: Yr 4+,1.5,2.5,5.0,5.0,0.75,0.25,0.9,0.01
7,11/19/2020 15:33:20,Grad: Yr 4+,1.5,2.5,1.5,5.0,0.75,0.25,0.9,0.01
8,11/19/2020 15:33:38,Grad: Yr 1,3.5,10.0,3.5,5.0,0.25,0.25,0.75,0.01
9,11/19/2020 15:33:39,Grad: Yr 2,1.5,7.5,2.5,3.5,0.75,0.25,0.25,0.25


## Let's actually explore the function

In [39]:
def time_coding_per_line( r_code, r_test, e, t_l, t_m, f_c, f_m ):
    '''Total time per line of code, T/l, from the parametrized form above.'''
        
    return r_code + r_test + f_c * e * t_l * ( 1. + f_m * ( t_m - t_l ) / t_l )

In [40]:
def test_vs_no_test( r_code, r_test, e, t_l, t_m, f_c_test, f_m_test, f_c_notest, f_m_notest ):
    
    results = []
    for i, (f_c, f_m) in enumerate( zip( [ f_c_test, f_c_notest ], [ f_m_test, f_m_notest ] ) ):
        
        # In the no-test case no time is spent writing tests
        if i == 1:
            r_test = 0.
                
        results.append( time_coding_per_line( r_code, r_test, e, t_l, t_m, f_c, f_m,  ) )
        
    return results

In [41]:
estimated_parameters = {
    'r_code': df['r_code'].values.astype( float ),
    'r_test': df['r_code'].values.astype( float ), # We'll assume it takes just as long to test as to code, a conservative assumption
    'e': df['e'].values.astype( float ),
    't_l': df['t_l'].values.astype( float ),
    't_m': df['t_m'].values.astype( float ),
    'f_c_notest': 1. - df['1 - f_c_notest'].values.astype( float ),
    'f_m_notest': 1. - df['1 - f_m_notest'].values.astype( float ),
    'f_c_test': 1. - df['1 - f_c_test'].values.astype( float ),
    'f_m_test': 1. - df['1 - f_m_test'].values.astype( float ),
}

In [42]:
estimated_parameters

{'r_code': array([2.  , 1.  , 1.  , 1.6 , 4.  , 0.08]),
 'r_test': array([2.  , 1.  , 1.  , 1.6 , 4.  , 0.08]),
 'e': array([0.6 , 0.01, 1.  , 1.  , 1.  , 0.05]),
 't_l': array([2. , 2. , 0.8, 5. , 3. , 1. ]),
 't_m': array([ 20. ,  10. ,   3.5, 400. ,  30. ,  10. ]),
 'f_c_notest': array([0.9 , 0.99, 0.99, 0.96, 0.96, 0.95]),
 'f_m_notest': array([0.5 , 0.1 , 0.05, 0.1 , 0.25, 0.1 ]),
 'f_c_test': array([0.99, 0.99, 0.99, 0.99, 0.99, 0.99]),
 'f_m_test': array([0.1, 0.1, 0.1, 0.1, 0.1, 0.1])}

In [43]:
# Require f_c_test to be strictly greater than f_c_notest
invalid = estimated_parameters['f_c_test'] <= estimated_parameters['f_c_notest']
# Assume we catch half the remaining bugs
estimated_parameters['f_c_test'][invalid] = (
    estimated_parameters['f_c_notest'][invalid] +
    ( 1. - estimated_parameters['f_c_notest'][invalid] ) * 0.5
)
print( 'Fixed {} invalid f_c_test values'.format( invalid.sum() ) )

# Require f_m_test to be strictly less than f_m_notest
invalid = estimated_parameters['f_m_test'] >= estimated_parameters['f_m_notest']
# Assume half the remaining bugs are now easy to find
estimated_parameters['f_m_test'][invalid] = estimated_parameters['f_m_notest'][invalid] * 0.5
print( 'Fixed {} invalid f_m_test values'.format( invalid.sum() ) )


Fixed 2 invalid f_c_test values
Fixed 4 invalid f_m_test values


In [44]:
# Reformat
import verdict
responses = {}
for key, item in estimated_parameters.items():
    responses[key] = {}
    for i, v in enumerate( item ):
        responses[key][i] = v
responses = verdict.Dict( responses ).transpose()

In [45]:
# Get results
results = []
for i, parameters in responses.items():
    result = test_vs_no_test( **parameters )
    results.append( result )
    print( '{:>6.2g} min per line w/ testing, {:>6.2g} min per line w/o, {:>6.2g} ratio, {:>6.2g} min writing tests, {:>7.2g} df_m, {:>7.2g} df_c'.format(
            result[0], 
            result[1], 
            result[0]/result[1],
            parameters['r_test'],
            parameters['f_m_test'] - parameters['f_m_notest'],
            parameters['f_c_test'] - parameters['f_c_notest'],
        )
    )
    responses[i]['T/l test'], responses[i]['T/l notest'] = result

   6.3 min per line w/ testing,    7.9 min per line w/o,   0.79 ratio,      2 min writing tests,    -0.4 df_m,    0.09 df_c
     2 min per line w/ testing,      1 min per line w/o,      2 ratio,      1 min writing tests,   -0.05 df_m,   0.005 df_c
   2.9 min per line w/ testing,    1.9 min per line w/o,    1.5 ratio,      1 min writing tests,  -0.025 df_m,   0.005 df_c
    28 min per line w/ testing,     44 min per line w/o,   0.63 ratio,    1.6 min writing tests,   -0.05 df_m,    0.03 df_c
    14 min per line w/ testing,     13 min per line w/o,      1 ratio,      4 min writing tests,   -0.15 df_m,    0.03 df_c
  0.23 min per line w/ testing,   0.17 min per line w/o,    1.4 ratio,   0.08 min writing tests,   -0.05 df_m,    0.04 df_c


Yes, I know tables are not great, but there is not enough data to warrant a figure.