<!-- no collapse -->
# Code Golfing the BP D&A Python Coding Challenge

As I was testing our questions from the coding challenge, I started challenging myself by "code golfing" the solutions.  This refers to a game wherein you try to solve a problem with the least amount of code.  I decided to aim to write a solution to each problem in a single _logical_ line.  I wasn't trying to minimize characters, and in one case opted for a solution which was slightly longer by character count, as I found it more elegant.

> By logical line, I mean a single Python statement.  That means that
> ```python
> x = (1 +
>      2)
> ```
> counts as a single logical line, while
> ```python
> x = 1; y = 2
> ```
> counts as two.  I'm also not counting import lines because, hey, I get to make the rules.
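
One rough way to check this count, assuming only top-level statements matter: Python's `ast` module parses source code into a list of statements, and the length of that list matches the definition above. The helper name `count_logical_lines` is made up for this sketch.

```python
import ast

def count_logical_lines(source):
    """Count the top-level statements in a piece of Python source."""
    return len(ast.parse(source).body)

wrapped = "x = (1 +\n     2)"   # one statement spread over two physical lines
joined = "x = 1; y = 2"         # two statements on one physical line

print(count_logical_lines(wrapped))  # 1
print(count_logical_lines(joined))   # 2
```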

To be clear, you should **NEVER write code in this style** for any production process.  Python is a great language in part because of its focus on readability.  Many of these solutions throw away readability for terseness, and that is a poor trade-off!

However, this process required me to make use of a number of interesting Python features, and I thought Python students might like to learn some of them. These solutions demonstrate several advanced techniques, like:
- list, dictionary, and generator comprehensions
- `map`
- key functions for sorting
- `lambda` functions
- Pandas pipelining
- Pandas query method
- double comprehensions

So treat this as a visit to a cabinet of curiosities.  You may be surprised, shocked, or disgusted, but hopefully you come away with a renewed appreciation for the everyday.

## Question 0

*I'll summarize each question in italics as a reminder.  This one is just an import.*

In [None]:
from grade import grade, get_grade_code, parse_grade_code

And since imports don't count as lines, I'll count this as zero!

## Question 1

*Append the value `33` to the variable `my_list`.*

There's nothing too interesting here.  The most straightforward approach is the shortest.

In [None]:
%%grade append

my_list.append(33)

## Question 2

*Append `66` and `99` to `my_list`.*

The question is trying to lead you to call `my_list.append` twice.  But Python lists also have an `extend` method that lets you add multiple items to a list.

In [None]:
%%grade appendmult

my_list.extend([66, 99])
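
A small sketch of why `extend` is the right tool here and `append` with a list is not. The starting value of `my_list` is an assumption made for illustration.

```python
my_list = [33]            # hypothetical starting value

nested = list(my_list)
nested.append([66, 99])   # appends ONE item, which is itself a list

flat = list(my_list)
flat.extend([66, 99])     # appends each item of the iterable in turn

print(nested)  # [33, [66, 99]]
print(flat)    # [33, 66, 99]
```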

## Question 3

*Assemble a list of squared values.*

The question text was leading you towards writing a for loop, but a list comprehension is shorter.

In [None]:
%%grade appendfor

squared = [i**2 for i in some_list]

In this case, the code-golfed answer is actually the more Pythonic solution.  Comprehensions are preferred over for loops in most cases, both for performance and readability.  Any time you find yourself starting with an empty list and appending a value to it for each element in another list, you should consider replacing this code with a comprehension!
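
For comparison, here is a sketch of the loop pattern next to the comprehension that replaces it. The contents of `some_list` are made up.

```python
some_list = [1, 2, 3, 4]  # hypothetical input

# Loop version: start with an empty list, append once per element
squared_loop = []
for i in some_list:
    squared_loop.append(i**2)

# Comprehension version: the same result in one statement
squared = [i**2 for i in some_list]

print(squared)                  # [1, 4, 9, 16]
print(squared == squared_loop)  # True
```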

## Question 4

*Filter a list to just those divisible by 5 or 7.*

Comprehensions can include an `if` clause to include an element only when a condition is true.

In [None]:
%%grade modulo

new_list = [i for i in numbers_list if (i % 5) * (i % 7) == 0]

The more Pythonic solution would be to write `i % 5 == 0 or i % 7 == 0`.  But the product of the two remainders is zero exactly when at least one of them is zero, so multiplying them saves a few characters.

I think there is a solution that would check if `i % 35` was one of several values, but I never worked out the math.
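
For the curious, the math can be worked out as follows. Because 35 is a multiple of both 5 and 7, `i % 35` determines both `i % 5` and `i % 7`, so it is enough to list the remainders below 35 that are multiples of 5 or 7. The sketch below builds that set and checks all three conditions against each other on a made-up input range.

```python
# Remainders modulo 35 that are themselves multiples of 5 or of 7
hits = {r for r in range(35) if r % 5 == 0 or r % 7 == 0}

numbers_list = list(range(1, 200))  # hypothetical input

by_or = [i for i in numbers_list if i % 5 == 0 or i % 7 == 0]
by_product = [i for i in numbers_list if (i % 5) * (i % 7) == 0]
by_mod35 = [i for i in numbers_list if i % 35 in hits]

print(sorted(hits))  # [0, 5, 7, 10, 14, 15, 20, 21, 25, 28, 30]
print(by_or == by_product == by_mod35)  # True
```

Writing out all eleven values takes more characters than the product trick, so this version would not win at golf.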

## Question 5

*Construct a dictionary mapping odd numbers to their cubes.*

Python also supports dictionary comprehensions.  This is both shorter and more Pythonic than assembling the dictionary in a `for` loop.  Filtering the even numbers out would work, but here we use the fact that the `range` function takes an optional third argument, the step size.

In [None]:
%%grade cubes

cubes = {i: i**3 for i in range(3, 20, 2)}
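
A quick sketch comparing the step-size version with the filtering version mentioned above; both should build the same dictionary.

```python
by_step = {i: i**3 for i in range(3, 20, 2)}
by_filter = {i: i**3 for i in range(3, 20) if i % 2 == 1}

print(list(range(3, 20, 2)))  # [3, 5, 7, 9, 11, 13, 15, 17, 19]
print(by_step == by_filter)   # True
```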

## Question 6

*Write a function to sum the digits in a number.*

When we iterate over a string, we get the characters of that string, which for the string form of a number are its digits.  The `map` function iterates over its second argument and applies its first argument to each element, so `map(int, digits)` will produce integers corresponding to the digits in the string `digits`.  Then `sum` can add them all up.

Lambda functions are generally used for anonymous functions, but there's no reason you can't assign them to a name.  While you can actually define a standard function on a single line, the return is not implied, so the lambda is shorter.

In [None]:
%%grade digits

sum_digits = lambda n: sum(map(int, str(n)))
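
To see the intermediate steps, here is the same calculation unpacked, followed by the one-line `def` for comparison. The input value and the name `sum_digits_def` are made up, and the sketch assumes non-negative integers (a minus sign would not convert with `int`).

```python
n = 4096                             # hypothetical input

as_text = str(n)                     # '4096'
as_digits = list(map(int, as_text))  # [4, 0, 9, 6]
print(sum(as_digits))                # 19

# A named function also fits on one line, but needs an explicit return
def sum_digits_def(n): return sum(map(int, str(n)))

sum_digits = lambda n: sum(map(int, str(n)))
print(sum_digits(4096) == sum_digits_def(4096))  # True
```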

## Question 7

*Find the ID of the employee with the maximum salary.*

When you iterate over a dictionary, you get the keys.  This happens in any iteration, so `max(salaries_dict)` would return the maximum key.  Normally, this would be maximum in lexicographic order, but the `max` function takes a `key` argument.  This argument should be a function that transforms the input into a value that will be used for the comparisons.  In this case, we provide a function that looks up the salary for each ID, causing `max` to return the ID with the maximum salary.

In [None]:
%%grade salaries

max_salary_id = max(salaries_dict, key=lambda k: salaries_dict[k]['salary'])
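
A sketch of the difference the `key` argument makes. The dictionary below is invented data in the shape the solution expects (IDs mapping to records with a `'salary'` entry).

```python
# Hypothetical data: employee ID -> record
salaries_dict = {
    'a17': {'name': 'Ann', 'salary': 52000},
    'b04': {'name': 'Bob', 'salary': 61000},
    'c23': {'name': 'Cy', 'salary': 48000},
}

# Without a key function, the IDs themselves are compared
print(max(salaries_dict))  # c23

# With a key function, each ID is ranked by the salary it maps to
print(max(salaries_dict, key=lambda k: salaries_dict[k]['salary']))  # b04
```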

## Question 8

*Sum all integers passed as arguments to a function.*

Lambda functions can take `*args` and `**kw`, just like named functions.  By omitting the square brackets around the comprehension, we have produced a *generator comprehension*.  This is iterable, but unlike a list comprehension, it doesn't calculate the values in advance.  In many cases, generator comprehensions need to be surrounded by a set of parentheses, but in this case, the parentheses from the `sum` function make the syntax unambiguous.

In [None]:
%%grade var_args

sum_numbers = lambda *args: sum(a for a in args if isinstance(a, int))
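
A short sketch of how a generator comprehension differs from a list comprehension. The argument values are made up.

```python
args = (1, 'two', 3.0, 4)  # hypothetical arguments

as_list = [a for a in args if isinstance(a, int)]  # built immediately
as_gen = (a for a in args if isinstance(a, int))   # produces values on demand

print(as_list)                # [1, 4]
print(type(as_gen).__name__)  # generator
print(sum(as_gen))            # 5
print(sum(as_gen))            # 0, because the generator is already exhausted

# As the only argument to a call, no extra parentheses are needed
total = sum(a for a in args if isinstance(a, int))
print(total)                  # 5
```

One caveat worth knowing: `bool` is a subclass of `int`, so `True` and `False` would also pass this `isinstance` check. Whether that matters depends on the inputs.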

## Question 9

*Implement a Student class.*

Alack and alas!  I could not get this down to a single line.  But that doesn't mean there isn't room for shenanigans.

The shortest straightforward solution would use inheritance to avoid redefining what's already in the `Person` class.

In [None]:
%%grade classes

class Student(Person):
    def __init__(self, first, last, topics):
        super().__init__(first, last)
        self.topics = topics
    def num_topics(self):
        return len(self.topics)

However, Python objects can have arbitrary attributes added to them.  So instead of making a new class, we can save a line with a function that makes an instance of `Person` and then adds appropriate attributes.  A lambda function attached as an attribute ends up acting much like an instance method, although it is using a closure to maintain a reference to the object.

In [None]:
%%grade classes

def Student(first, last, topics):
    p = Person(first, last)
    p.topics = topics
    p.num_topics = lambda: len(p.topics)
    return p

This works in part because the grader doesn't check whether `Student` is a class.  It just calls it and expects an object to be returned.
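
The sketch below shows where the attribute ends up. The `Person` class here is a hypothetical stand-in, since the challenge's own definition is not shown; it only assumes a constructor taking a first and last name.

```python
class Person:
    """Hypothetical stand-in for the challenge's Person class."""
    def __init__(self, first, last):
        self.first = first
        self.last = last

def Student(first, last, topics):
    p = Person(first, last)
    p.topics = topics
    p.num_topics = lambda: len(p.topics)  # closes over p, so no self parameter
    return p

s = Student('Ada', 'Lovelace', ['math', 'engines'])
print(s.num_topics())                 # 2
print(isinstance(s, Person))          # True
print('num_topics' in vars(s))        # True: stored on the instance itself
print(hasattr(Person, 'num_topics'))  # False: the class is untouched
```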

## Question 10

*Calculate the number of times the most common complaint type occurs.*

Pandas' method chaining makes it easy to write quite a long, complex calculation on a single line, so this almost feels like cheating.

Note the use of grouping parentheses, which are usually preferred over line continuation characters.

In [None]:
import pandas as pd

In [None]:
%%grade most_common

num_most_common = (pd.read_csv('data/311complaints_2009_001.csv')['Complaint Type']
                   .value_counts().max())
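
In that chain, `value_counts` tallies how often each complaint type appears and `max` picks the largest tally. For readers without the data file, the same calculation can be sketched with the standard library's `Counter`; the complaint strings below are made up.

```python
from collections import Counter

# Hypothetical stand-in for the 'Complaint Type' column
complaint_types = ['Noise', 'Heating', 'Noise', 'Street Light', 'Noise', 'Heating']

counts = Counter(complaint_types)        # tally per complaint type
num_most_common = max(counts.values())   # largest tally

print(num_most_common)  # 3
```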

## Question 11

*Which agency had the fewest complaints assigned during March?*

The usual way I would find the entries from March would be to load the data to a DataFrame `df` and then filter:
```python
df[df['Created Date'].dt.month == 3]
```
But this would require at least two lines of code, which is one too many!  Instead, we make use of the fact that, if a DataFrame has a datetime index, slices of `.loc` will get all rows in a particular time interval.

The `.idxmin` method, like `np.argmin`, finds the index of the minimum value.

In [None]:
%%grade fewest

agency_fewest = (pd.read_csv('data/311complaints_2009_001.csv', parse_dates=['Created Date'])
                 .set_index('Created Date')
                 .loc['2009-03-01':'2009-03-31', 'Agency']
                 .value_counts()
                 .idxmin())

## Question 12

*Find the agency assigned the most noise complaints.*

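The pandas `concat` function takes any iterable of DataFrames, so we can pass it a generator comprehension.  The `query` method on DataFrames is quite capable: it filters rows using an expression written as a string, and backticks let that expression refer to a column whose name contains a space.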

In [None]:
import glob

In [None]:
%%grade noise

agency_noise = (pd.concat(pd.read_csv(f) 
                          for f in glob.glob('data/311complaints_2009_*.csv'))
                .query('`Complaint Type`.str.contains("Noise")')['Agency']
                .value_counts()
                .idxmax())
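
The same pattern can be sketched without the data files by reading small CSV strings from memory. The rows are invented, and `engine='python'` is passed to `query` as a cautious assumption, since string methods inside a query expression may not be supported by every evaluation engine.

```python
import io
import pandas as pd

# Hypothetical stand-ins for the monthly CSV files
csv_parts = [
    "Complaint Type,Agency\nNoise - Street,NYPD\nHeating,HPD\n",
    "Complaint Type,Agency\nNoise,DEP\nNoise - Vehicle,NYPD\n",
]

# concat consumes the generator, reading one DataFrame per CSV string
combined = pd.concat(pd.read_csv(io.StringIO(text)) for text in csv_parts)

# Backticks quote the column name because it contains a space
noisy = combined.query('`Complaint Type`.str.contains("Noise")', engine='python')
top_agency = noisy['Agency'].value_counts().idxmax()

print(len(combined), len(noisy))  # 4 3
print(top_agency)                 # NYPD
```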

## Question 13

In [None]:
import requests
import bs4

*Find the average book price by scraping a website.*

We need to find every book on every page.  A straightforward solution with nested comprehensions would produce a nested structure as a result.  Using a double comprehension gives us a flattened sequence of prices, which can be summed easily.

This takes advantage of the fact that we already know the count of books (the 1000 in the denominator).  Calculating that at the same time as the sum would be possible, but it'd be much more complicated.

In [None]:
%%grade books

avg_book_price = sum(float(el.text[2:]) for i in range(1, 51) 
                     for el in bs4.BeautifulSoup(
                         requests.get(f'http://books.toscrape.com/catalogue/page-{i}.html').text
                     ).select('.price_color')) / 1000
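
To see the difference between nested and double comprehensions without hitting the network, here is a sketch on an invented list of per-page prices.

```python
# Hypothetical stand-in for the scraped pages: one list of prices per page
pages = [[51.77, 53.74], [50.10], [47.82, 54.23]]

# Nested comprehensions keep the page structure: a list of lists
nested = [[price for price in page] for page in pages]

# A double comprehension flattens it; the for clauses read in the
# same order as the equivalent nested for loops (outer loop first)
flat = [price for page in pages for price in page]

print(len(nested))  # 3
print(len(flat))    # 5
average = sum(flat) / len(flat)
```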

## Question 14

*Create a decorator to ensure a function returns a non-negative value.*

Because a lambda function definition is an expression, it can occur inside of another lambda function.

I am unreasonably proud of this line.  It's the first Python I've ever written that actually looks like the [Lambda calculus](https://en.wikipedia.org/wiki/Lambda_calculus).

In [None]:
%%grade decorator

ensure_nonneg = lambda f: lambda *args: max(f(*args), 0)
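
For contrast, here is a sketch of how the same decorator might look written out with `def`. The names `ensure_nonneg_def`, `subtract`, and `subtract_def` are made up; the longer version also forwards keyword arguments and preserves the wrapped function's name, which the lambda version does not.

```python
import functools

ensure_nonneg = lambda f: lambda *args: max(f(*args), 0)

def ensure_nonneg_def(f):
    @functools.wraps(f)            # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):  # also forwards keyword arguments
        return max(f(*args, **kwargs), 0)
    return wrapper

@ensure_nonneg
def subtract(a, b):
    return a - b

@ensure_nonneg_def
def subtract_def(a, b):
    return a - b

print(subtract(2, 5), subtract(5, 2))  # 0 3
print(subtract_def(a=2, b=5))          # 0
print(subtract_def.__name__)           # subtract_def
```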

## Question 15

*Train a churn model.*

In [None]:
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.linear_model import LogisticRegression

In [None]:
features_to_use = ['total intl minutes', 'total eve minutes', 'total day minutes',
                   'total intl calls', 'total eve calls', 'total day calls',
                   'state', 'number vmail messages', 'international plan', 'voice mail plan', 
                   'customer service calls', 'account length']

Although it looks pretty long, the following straightforward solution is actually only two logical lines.  It takes advantage of the fact that scikit-learn estimators are set up to allow method chaining on the `fit` method.

In [None]:
%%grade churn

df = pd.read_csv('data/Customer_telecom.csv')
churn_model = Pipeline([
    ('cols', ColumnTransformer([
        ('ohe', OneHotEncoder(), ['state', 'international plan', 'voice mail plan'])
    ], remainder='passthrough')),
    ('clf', LogisticRegression(solver='newton-cg'))
]).fit(df[features_to_use], df['churn'])

I could reduce it down to a single logical line by loading the CSV file twice, once for the feature matrix and once for the label vector.

In [None]:
%%grade churn

churn_model = Pipeline([
    ('cols', ColumnTransformer([
        ('ohe', OneHotEncoder(), ['state', 'international plan', 'voice mail plan'])
    ], remainder='passthrough')),
    ('clf', LogisticRegression(solver='newton-cg'))
]).fit(
    pd.read_csv('data/Customer_telecom.csv')[features_to_use],
    pd.read_csv('data/Customer_telecom.csv')['churn']
)

But this feels wasteful.  Yes, it got me to a single logical line, but loading the data twice is inelegant!

Instead, I'll create an anonymous function that takes a full DataFrame as an argument, and then immediately call this with the DataFrame loaded from the file.  So much more elegant, and it even saves a few characters.

In [None]:
%%grade churn

churn_model = (lambda df: Pipeline([
    ('cols', ColumnTransformer([
        ('ohe', OneHotEncoder(), ['state', 'international plan', 'voice mail plan'])
    ], remainder='passthrough')),
    ('clf', LogisticRegression(solver='newton-cg'))
]).fit(df[features_to_use], df['churn']))(
    pd.read_csv('data/Customer_telecom.csv')
)
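
The immediately called lambda is easier to see on a toy problem. In this sketch, with made-up data, the lambda's parameter plays the role of a temporary variable, so a value that is needed twice is only computed once.

```python
numbers = [3, 1, 4, 1, 5]  # hypothetical data

# Two statements: bind a name, then use it twice
data = sorted(numbers)
spread_two = data[-1] - data[0]

# One statement: the lambda parameter stands in for the temporary name
spread_one = (lambda data: data[-1] - data[0])(sorted(numbers))

print(spread_one)                # 4
print(spread_one == spread_two)  # True
```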

## Conclusion

There we go: 14/15 on getting solutions onto a single line.

To reiterate the message from the beginning, **don't write code like this** for anything other than fun.  As the Zen of Python says, "Readability counts."  That said, all of the techniques used here are valid Python idioms, and they have their place.  Use them appropriately, to make your code clearer, and don't try to stuff everything into a single statement.

*Copyright &copy; 2021 Pragmatic Institute. Redistribution or publication of this material is strictly prohibited.*