## Our Testing Framework is a Runaway Success.

We have lots of open source clients, a booming Patreon, and weeks packed with conversations with potential sponsors.

Now our clients are asking us for more tools. This time, they're hoping that we'll make something to improve their workflow for data analysis. In particular, one client does lots of number crunching on Chicago open data, and they'd like something better than the `csv` library for exploring and analyzing the datasets.

So we start working on a library called `phoenixcell`.

We start with a `Series` object that extends a `list` and has an `apply` function.

In [None]:
import csv

class Series(list):
    def apply(self, func):
        # Apply func to every element and collect the results
        return Series(func(x) for x in self)

### Challenge: What does the apply function do?

1. What argument(s) does it take in?
2. What does it do with the argument(s)?
3. What does it modify?
4. What does it return?

See it in action here:

In [None]:
s = Series()
s.append(1)
s.append(2)
s.append(3)

result = s.apply(float)
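
To check your answers, the sketch below repeats a `Series` definition (so that it runs on its own) and prints both the original and the result. The printed comparison is an addition for illustration, not part of the original cell.

```python
class Series(list):
    def apply(self, func):
        # Apply func to every element and collect the results
        return Series(func(x) for x in self)

s = Series([1, 2, 3])
result = s.apply(float)

print(result)                 # [1.0, 2.0, 3.0]
print(s)                      # [1, 2, 3]
print(type(result).__name__)  # Series
```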

We also make an object called `GroupBy` that extends `dict` and has a function called `aggregate()`.

In [None]:
class GroupBy(dict):
    def sum(self, column=None):
        return self.aggregate(column=column, using_func=sum)

    def average(self, column=None):
        def func(listo):
            return sum(listo) / len(listo)
        return self.aggregate(column=column, using_func=func)
    avg = average

    def count(self, column=None):
        return self.aggregate(column=column, using_func=len)

    def aggregate(self, column=None, using_func=None):
        # Both arguments are required; fail early with a helpful message.
        if column is None:
            raise Exception("What column do you want aggregated?")
        if using_func is None:
            raise Exception(f"How do you want '{column}' aggregated?")
        aggregator = {}
        for key in self.keys():
            # Collect this group's values for the column, then reduce them.
            addends = [item[column] for item in self[key]]
            aggregator[key] = using_func(addends)
        return aggregator

### Challenge: What does the aggregate function do?

1. What argument(s) does it take in?
2. What does it do with the argument(s)?
3. What does it modify?
4. What does it return?

Here's an example of it running: 

In [None]:
birds = GroupBy(
 oriole = [
  {'species': 'oriole', 'specimen_id': '7dr4h32ss24g6t7f2', 'weight': 4.23},
  {'species': 'oriole', 'specimen_id': 'g6t7f2dr4h327ss24', 'weight': 4.17},
  {'species': 'oriole', 'specimen_id': 't77ss24g6f2dr4h32', 'weight': 5.21},
  ],
 bluejay = [
  {'species': 'bluejay', 'specimen_id': '88Jnnb323es29bs2f', 'weight': 5.0},
  {'species': 'bluejay', 'specimen_id': 'g6t3f2dr4h322ss24', 'weight': 6.32},
  {'species': 'bluejay', 'specimen_id': 'f2dr4t76ss24g6h32', 'weight': 5.21},
  {'species': 'bluejay', 'specimen_id': 't7f2312ss24g6dr4h', 'weight': 4.85},
  {'species': 'bluejay', 'specimen_id': '9f237ss24g6t8dr4h', 'weight': 5.69}
 ],
 titmouse = [
  {'species': 'titmouse', 'specimen_id': '1sn32ufks82d92b39', 'weight': 5.22},
  {'species': 'titmouse', 'specimen_id': '8sh2bdn4s24g6t7f2', 'weight': 2.13},
  {'species': 'titmouse', 'specimen_id': 'h38snsdr4h327ss24', 'weight': 3.1},
  {'species': 'titmouse', 'specimen_id': '32bf72f9m27f2dr4h', 'weight': 2.22},
  {'species': 'titmouse', 'specimen_id': '2b47f29fn34h47dn3', 'weight': 3.0},
  {'species': 'titmouse', 'specimen_id': 't77ss24g6f27s41md', 'weight': 2.98}
 ]
)

In [None]:
# This is kinda counterintuitive. What does this do?
# Run it and see if you can figure out what it's telling you.

birds.aggregate(column="specimen_id", using_func=len)

In [None]:
from statistics import median

birds.aggregate(column="weight", using_func=median)
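
`using_func` does not have to be a built-in or a library function: any callable that takes a list of values and returns a single result should work. A minimal sketch, assuming a trimmed-down `GroupBy` that has only `aggregate`, plus a small made-up `pets` dataset (both are stand-ins so the block runs on its own); `longest_name` is a hypothetical helper:

```python
class GroupBy(dict):
    def aggregate(self, column=None, using_func=None):
        if column is None:
            raise Exception("What column do you want aggregated?")
        if using_func is None:
            raise Exception(f"How do you want '{column}' aggregated?")
        # Reduce each group's values for the column to a single result
        return {
            key: using_func([item[column] for item in rows])
            for key, rows in self.items()
        }

# Made-up data for illustration only
pets = GroupBy(
    cat=[{'name': 'Miso', 'weight': 9.5}, {'name': 'Olive', 'weight': 11.0}],
    dog=[{'name': 'Rex', 'weight': 40.0}],
)

def longest_name(names):
    # Hypothetical helper: pick the longest string in the list
    return max(names, key=len)

print(pets.aggregate(column="name", using_func=longest_name))
# {'cat': 'Olive', 'dog': 'Rex'}
```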

### Challenge: 

`GroupBy` also has several **convenience functions** that _call_ `aggregate` with functions that are commonly used for aggregating.

Observe:

In [None]:
birds.average(column="weight")

In [None]:
birds.count(column="specimen_id")

In [None]:
# How many total ounces of bird we have in each species category
# Not sure how this would be useful but there it is
birds.sum(column="weight")

### Challenge

Write some new convenience functions for `GroupBy`:

- `min`: aggregate by finding the smallest value
- `max`: aggregate by finding the largest value
- `spread`: aggregate by finding the difference between the largest and the smallest value  

In [None]:
import sys
!{sys.executable} -m pip install colorama 

sys.path.insert(0, '..')
from test_framework_exercise.phoenix_test.matchers import FailedAssertion, Assertion, assert_that
from test_framework_exercise.phoenix_test.test import Test
sys.path.remove('..')

class GroupByTest(Test):
        
    def test_min_function(self):
        assert_that(birds.min(on="weight")).equals({'oriole': 4.17, 'bluejay': 4.85, 'titmouse': 2.13})
    
    def test_max_function(self):
        assert_that(birds.max(on="weight")).equals({'oriole': 5.21, 'bluejay': 6.32, 'titmouse': 5.22})
        
    def test_spread_function(self):
        assert_that(birds.spread(on="weight")).equals({'oriole': 1.04, 'bluejay': 1.4700000000000006, 'titmouse': 3.09})
        
GroupByTest().run()
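
The expected value `1.4700000000000006` in `test_spread_function` is an artifact of binary floating-point subtraction, so exact equality on floats can be brittle. As a side note, here is a minimal sketch of a tolerance-based comparison using the standard library's `math.isclose`. The `dicts_close` helper is hypothetical and is not part of `phoenix_test`.

```python
import math

def dicts_close(actual, expected, abs_tol=1e-9):
    # Hypothetical helper: same keys, and each pair of values within abs_tol
    return actual.keys() == expected.keys() and all(
        math.isclose(actual[key], expected[key], abs_tol=abs_tol)
        for key in expected
    )

print(0.1 + 0.2 == 0.3)                           # False
print(dicts_close({'a': 0.1 + 0.2}, {'a': 0.3}))  # True
```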