# Recap

In order of priority/time taken

1. BasalAreaIncrementNonSpatialAw
    - this is slow because of the number of times the BAfromZeroToDataAw function is called, as shown above
    - relaxing the tolerance may help
    - indeed, the tolerance is 0.01 * some value, while the other factor finder functions use a tolerance of 0.1, I think
    - can also use Cython for the increment functions
2. vectorize merch and gross volume functions
    - they pull a lot of scalars off the data frame, which is quite slow; it is faster to get an array
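
To illustrate the tolerance point in item 1, here is a minimal sketch. It is not the gypsy factor finder: `toy_model` and `count_calls_to_converge` are hypothetical stand-ins, and plain bisection is assumed as the search strategy. It only shows that a looser tolerance stops the search after fewer calls to the expensive function.

```python
def count_calls_to_converge(target, tolerance, lower=0.0, upper=10.0, max_iter=200):
    """Bisect for a factor whose toy_model output is within tolerance of target.

    Returns the factor found and the number of toy_model calls used.
    """
    calls = 0

    def toy_model(factor):
        # Hypothetical stand-in for an expensive increment/simulation function.
        nonlocal calls
        calls += 1
        return factor ** 2

    for _ in range(max_iter):
        mid = (lower + upper) / 2.0
        estimate = toy_model(mid)
        if abs(estimate - target) < tolerance:
            return mid, calls
        # toy_model is increasing, so move the bracket towards the target
        if estimate < target:
            lower = mid
        else:
            upper = mid
    raise RuntimeError('did not converge within max_iter iterations')


factor_tight, calls_tight = count_calls_to_converge(target=30.0, tolerance=0.01)
factor_loose, calls_loose = count_calls_to_converge(target=30.0, tolerance=0.1)
print('tolerance 0.01: %d calls' % calls_tight)
print('tolerance 0.1:  %d calls' % calls_loose)
```

Whether the looser result is still accurate enough for the simulation is a separate question that the tests would need to answer.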

Do a profiling run that includes I/O (reading the input data and writing the plot curves to files) in the next run.


# Decide on the action

- speed up increment functions
    - use Cython for increment functions
    - it turns out this may not help that much: the function itself is pretty fast, but it's called almost 500,000 times on the sample of 300 plots
    - reduce the number of times it's called, maybe by using gradient descent for the optimization?
    - relax the tolerance
    - gives a context to refactor them as well (into their own module), which would be a welcome change
    - the increment functions use numpy functions but operate on scalars; there is no benefit to using numpy functions there
- performance-wise, it is not clear that this will pay off much; vectorizing the volume functions is probably wiser
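
The point about numpy functions on scalars can be checked with a quick measurement. This is a sketch with an arbitrary input value, not code from gypsy; for a single Python float, `math.exp` typically avoids the overhead that `np.exp` pays for array handling, but the actual ratio depends on the machine and library versions.

```python
import math
import timeit

import numpy as np

x = 12.5  # arbitrary scalar input, chosen only for illustration

# Time the same scalar computation through numpy and through the math module
numpy_seconds = timeit.timeit(lambda: np.exp(x), number=100000)
math_seconds = timeit.timeit(lambda: math.exp(x), number=100000)

print('np.exp on a scalar:   %.4f s' % numpy_seconds)
print('math.exp on a scalar: %.4f s' % math_seconds)
```

Both calls return the same value, so swapping one for the other inside a scalar-only function should not change results beyond floating point noise.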

# Characterize what is happening

In [1]:
import pandas as pd
import numpy as np

The original gross volume function checks that top height is greater than 0


``` python
def GrossTotalVolume_Pl(BA_Pl, topHeight_Pl):
    Tvol_Pl = 0

    if topHeight_Pl > 0:
        a1 = 0.194086
        a2 = 0.988276
        a3 = 0.949346
        a4 = -3.39036
        Tvol_Pl = a1* (BA_Pl**a2) * (topHeight_Pl **a3) * numpy.exp(1+(a4/((topHeight_Pl**2)+1)))

    return Tvol_Pl
```

This makes it fail if trying to use it on an array:

In [22]:
from gypsy.GYPSYNonSpatial import GrossTotalVolume_Pl


GrossTotalVolume_Pl(np.random.random(10) * 100, np.random.random(10) * 100)

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

# MWEs

If we can rewrite it to handle 0s properly, i.e. to return 0 where an input is 0, then it is trivial to vectorize

In [24]:
def GrossTotalVolume_Pl_arr(BA_Pl, topHeight_Pl):
    a1 = 0.194086
    a2 = 0.988276
    a3 = 0.949346
    a4 = -3.39036
    Tvol_Pl = a1* (BA_Pl**a2) * (topHeight_Pl **a3) * np.exp(1+(a4/((topHeight_Pl**2)+1)))

    return Tvol_Pl

print(GrossTotalVolume_Pl_arr(10, 10))
print(GrossTotalVolume_Pl_arr(0, 10))
print(GrossTotalVolume_Pl_arr(10, 0))
print(GrossTotalVolume_Pl_arr(np.random.random(10) * 100, np.random.random(10) * 100))
print(GrossTotalVolume_Pl_arr(np.zeros(10) * 100, np.random.random(10) * 100))

44.1908473047
0.0
0.0
[  415.44083176  3769.55471466  1288.45549553    49.77278285   298.91564963
  1367.74822794   729.29619243   906.68358934  2393.18506461   385.18108024]
[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]


### Timings

In [29]:
ba = np.random.random(1000) * 100
top_height = np.random.random(1000) * 100
d = pd.DataFrame({'ba': ba, 'th': top_height})

In [30]:
%%timeit
d.apply(
    lambda x: GrossTotalVolume_Pl(
        x.at['ba'],
        x.at['th']
    ),
    axis=1
)

10 loops, best of 3: 31.9 ms per loop


In [32]:
%%timeit
GrossTotalVolume_Pl_arr(ba, top_height)

10000 loops, best of 3: 165 µs per loop


The array method is roughly 190x faster (31.9 ms vs 165 µs per loop). This is worth implementing. We should also add tests to help be explicit about the behaviour of these volume functions.
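
A sketch of what such tests could look like follows. The function and test names are hypothetical, and both versions are re-declared here so the example is self-contained; the real tests would import the functions from gypsy instead.

```python
import numpy as np

A1, A2, A3, A4 = 0.194086, 0.988276, 0.949346, -3.39036


def gross_total_volume_scalar(basal_area, top_height):
    # Mirrors the original scalar logic: 0 unless top height is positive
    if top_height > 0:
        return (A1 * basal_area**A2 * top_height**A3
                * np.exp(1 + A4 / (top_height**2 + 1)))
    return 0


def gross_total_volume_array(basal_area, top_height):
    # Vectorized version without the branch
    return (A1 * basal_area**A2 * top_height**A3
            * np.exp(1 + A4 / (top_height**2 + 1)))


def test_array_matches_scalar():
    rng = np.random.RandomState(0)
    basal_area = rng.random_sample(50) * 100
    top_height = rng.random_sample(50) * 100
    expected = np.array([gross_total_volume_scalar(b, t)
                         for b, t in zip(basal_area, top_height)])
    np.testing.assert_allclose(
        gross_total_volume_array(basal_area, top_height), expected)


def test_zero_inputs_yield_zero():
    assert gross_total_volume_array(0.0, 10.0) == 0.0
    assert gross_total_volume_array(10.0, 0.0) == 0.0
    assert np.all(gross_total_volume_array(np.zeros(5), np.ones(5) * 10) == 0)


test_array_matches_scalar()
test_zero_inputs_yield_zero()
```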

# Revise the code

Go on. Do it.

# Tests

Yes, though the test data was updated because the new implementations yield NaN where an input is NaN, instead of yielding 0.
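
The NaN difference can be reproduced in a few lines. In this sketch `gross_total_volume_plain` re-declares the vectorized formula, and `gross_total_volume_masked` is a hypothetical variant (not what was committed) that uses `np.where` to keep the old behaviour of returning 0 wherever top height is not greater than 0, which includes NaN top heights.

```python
import numpy as np

A1, A2, A3, A4 = 0.194086, 0.988276, 0.949346, -3.39036


def gross_total_volume_plain(basal_area, top_height):
    # Vectorized formula with no special handling: NaN in, NaN out
    return (A1 * basal_area**A2 * top_height**A3
            * np.exp(1 + A4 / (top_height**2 + 1)))


def gross_total_volume_masked(basal_area, top_height):
    # Hypothetical variant: 0 wherever top height is not > 0 (NaN > 0 is False)
    basal_area = np.asarray(basal_area, dtype=float)
    top_height = np.asarray(top_height, dtype=float)
    valid = top_height > 0
    # Substitute a harmless height for invalid entries before computing
    safe_height = np.where(valid, top_height, 1.0)
    volume = gross_total_volume_plain(basal_area, safe_height)
    return np.where(valid, volume, 0.0)


basal_area = np.array([10.0, 10.0, 10.0])
top_height = np.array([10.0, 0.0, np.nan])

print(gross_total_volume_plain(basal_area, top_height))
print(gross_total_volume_masked(basal_area, top_height))
```

A NaN basal area still propagates through the masked variant when top height is valid, which matches what the original scalar function would have returned.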

# Review code changes

In [33]:
%%bash
git log --since "2016-11-14 19:30" --oneline # 19:30 GMT/UTC

3694134 Update test data
ab526af Vectorise the merch vol functions
4aa48dc Fixup merch vol constants
bd1239b Refactor merch volume functions
4f6757d Remove species checks in the merch vol functions
23d2674 Remove species suffix where not needed
36bc44a Add reminder for testing gross total volume
a26b09c Update test data-vectorized volume functs yield na
7b70fcf vectorize the gross total volume functions
5c42031 Add notebook for 3rd iteration - vectorize volumes


In [34]:
! git diff "HEAD~$(git log --since "2016-11-14 19:30" --oneline | wc -l)" ../gypsy

diff --git a/gypsy/GYPSYNonSpatial.py b/gypsy/GYPSYNonSpatial.py
index 7b93c0f..ec1d1fc 100644
--- a/gypsy/GYPSYNonSpatial.py
+++ b/gypsy/GYPSYNonSpatial.py
@@ -1381,82 +1381,88 @@ def BAfromZeroToDataPl(startTage, startTagePl, y2bh_Pl, SC_Pl, SI_bh_Pl,
     return basal_area_arr
 
 
-def GrossTotalVolume_Aw(BA_Aw, topHeight_Aw):
-    '''Gross total volume is estimated only using species specific Basal Area and
-    Top height
+def GrossTotalVolume_Aw(basal_area, top_height):
+    ''' White Aspen Gross Total Volume
 
-    :param float BA_Aw: basal area of Aw
-    :param float topHeight_Aw: top height of Aw
+    Note that inputs may be scalars, or numpy arrays.
+
+    :param float basal_area: basal area
+    :param float top_height: top height
 
     '''
-    Tvol_Aw = 0


# Run timings

From last time:

```
real	5m36.407s
user	5m25.740s
sys 	0m2.140s
```

After vectorizing the volume functions:

In [36]:
%%bash
# git checkout 36941343aca2df763f93192abef461093918fff4 -b vectorize-volume-functions
# time gypsy simulate ../private-data/prepped_random_sample_300.csv --output-dir tmp
# rm -rfd tmp

# real	4m51.287s
# user	4m41.770s
# sys	0m1.070s

In [38]:
45/336.

0.13392857142857142

It yielded a 13% reduction in the time.
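
The same figure can be derived from the full `real` times rather than rounded seconds. This is a small sketch; `parse_time_output` is a hypothetical helper that only handles the `XmY.YYYs` format printed by `time`.

```python
import re


def parse_time_output(value):
    """Convert a string such as '5m36.407s' into seconds."""
    match = re.fullmatch(r'(\d+)m([\d.]+)s', value)
    if match is None:
        raise ValueError('unexpected time format: %r' % value)
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


before = parse_time_output('5m36.407s')  # real time from the previous iteration
after = parse_time_output('4m51.287s')   # real time after this iteration
reduction = (before - after) / before

print('before: %.3f s, after: %.3f s' % (before, after))
print('reduction: %.1f%%' % (reduction * 100))
```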

# Run profiling

In [2]:
from gypsy.forward_simulation import simulate_forwards_df

In [3]:
data = pd.read_csv('../private-data/prepped_random_sample_300.csv', index_col=0, nrows=10)

In [4]:
%%prun -D forward-sim-3.prof -T forward-sim-3.txt -q
result = simulate_forwards_df(data)

 
*** Profile stats marshalled to file u'forward-sim-3.prof'. 

*** Profile printout saved to text file u'forward-sim-3.txt'. 


In [5]:
!head forward-sim-3.txt

         1076718 function calls (1074662 primitive calls) in 13.756 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   492069    6.234    0.000    6.234    0.000 GYPSYNonSpatial.py:428(BasalAreaIncrementNonSpatialAw)
     7191    2.179    0.000    8.563    0.001 GYPSYNonSpatial.py:956(BAfromZeroToDataAw)
    31044    0.734    0.000    0.734    0.000 GYPSYNonSpatial.py:585(BasalAreaIncrementNonSpatialSw)
    16426    0.338    0.000    0.338    0.000 GYPSYNonSpatial.py:683(BasalAreaIncrementNonSpatialPl)
    79970    0.279    0.000    0.348    0.000 {isinstance}
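
The same kind of table can be produced outside the notebook with the standard library. The sketch below profiles a toy workload (`toy_increment` and `toy_driver` are hypothetical stand-ins, not gypsy functions) and prints the top entries ordered by internal time, as in the output above. For a saved file such as `forward-sim-3.prof`, the file name can be passed to `pstats.Stats` in place of the profiler object.

```python
import cProfile
import io
import pstats


def toy_increment(value):
    # Cheap function that is called very often, like the increment functions
    return value * 1.0001 + 0.5


def toy_driver(n_calls):
    total = 0.0
    for _ in range(n_calls):
        total = toy_increment(total)
    return total


profiler = cProfile.Profile()
profiler.enable()
toy_driver(10000)
profiler.disable()

# Collect the report in a buffer so it can be printed or saved
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats('tottime').print_stats(5)
report = buffer.getvalue()
print(report)
```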


# Compare performance visualizations

Now use either of these commands to visualize the profiling

```
pyprof2calltree -k -i forward-sim-3.prof

# or

dc run --service-ports snakeviz notebooks/forward-sim-3.prof
```

### Old

![2nd iteration performance](forward-sim-2a-performance.png)

### New

![3rd iteration performance](forward-sim-3-performance.png)

## Summary of performance improvements

The calculation of gross and merchantable volume is drastically faster now; under profiling it decreased from 22 seconds to 1 second.

Much of that gain seems to be profiler overhead, since the `gypsy simulate` CLI run only got 13% faster; however, I expect I/O is obscuring the outcome there.


# Profile with I/O


In [None]:
! rm -rfd gypsy-output

In [None]:
output_dir = 'gypsy-output'

In [20]:
%%prun -D forward-sim-2.prof -T forward-sim-2.txt -q
# restart the kernel first
import os
data = pd.read_csv('../private-data/prepped_random_sample_300.csv', index_col=0, nrows=10)
result = simulate_forwards_df(data)
os.makedirs(output_dir)
for plot_id, df in result.items():
    filename = '%s.csv' % plot_id
    output_path = os.path.join(output_dir, filename)
    df.to_csv(output_path)


 
*** Profile stats marshalled to file u'forward-sim-1.prof'. 

*** Profile printout saved to text file u'forward-sim-1.txt'. 


# Identify new areas to optimize



- from last time:
    - parallel (3 cores) gets us to 2 - 6 days - save for last
    - AWS with 36 cores gets us to 4 - 12 hours ($6.70 - $20.10 USD on a c4.8xlarge instance in US West Region)
    - AWS Lambda and split up the data
- now:
    - Cython for increment functions, especially BA
    - 
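
For the parallel and "split up the data" options, the first step is dividing the plots into roughly equal chunks. A minimal sketch, assuming the plots are rows of a data frame; `split_plots` and the toy columns are hypothetical, and how each chunk is dispatched (local cores, a large instance, or Lambda) is left open.

```python
import numpy as np
import pandas as pd


def split_plots(plots, n_chunks):
    """Split a data frame of plots into n_chunks frames of near-equal size."""
    positions = np.array_split(np.arange(len(plots)), n_chunks)
    return [plots.iloc[idx] for idx in positions]


# Toy stand-in for the prepped plot data
plots = pd.DataFrame(
    {'ba': np.arange(10.0), 'th': np.arange(10.0) + 1},
    index=['plot_%d' % i for i in range(10)],
)

chunks = split_plots(plots, 3)
for chunk in chunks:
    print(list(chunk.index))
```

Each chunk keeps its original index, so the per-plot outputs can be written or concatenated without losing track of plot ids.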