# Advanced Data Manipulation

Thus far in this course we've covered a lot of the basic uses of Python. However, the rabbit hole always gets deeper! In this section we'll cover several concepts that belong to many data manipulation frameworks, as well as some Python-specific ones.

What these really open the door to is an alternative to the object-oriented style of programming that we covered before. They all belong to the functional style, my personal favorite.

## So what is functional programming?

Wikipedia defines:

> In computer science, functional programming is a programming paradigm—a style of building the structure and elements of computer programs—that treats computation as the evaluation of mathematical functions and avoids changing-state and mutable data. It is a declarative programming paradigm, which means programming is done with expressions. In functional code, the output value of a function depends only on the arguments that are input to the function, so calling a function f twice with the same value for an argument x will produce the same result f(x) each time. 

Which is a perfect definition for our exploration. Now it may seem obvious, but some of the code that we wrote for this course violates this very principle. Let me show you an example.

Say that I've got an instance of a class.

In [1]:
class Drone:
    power_system = "battery"
    def fly(self):
        return "The %s-powered drone is flying" % (self.power_system)
d = Drone()

In [2]:
def drone_flyer(drone):
    print(drone.fly())

In [3]:
drone_flyer(d)

The battery-powered drone is flying


Now that I've got that instance, let's change out its power system.

In [4]:
d.power_system = "'Subway - Eat Fresh'"

In [5]:
drone_flyer(d)

The 'Subway - Eat Fresh'-powered drone is flying


We've created a perfect example. We are calling the exact same function on the exact same object and would expect the same output. However, because the object's *state* was *mutable*, we were able to fundamentally modify it. You may think to yourself "well, I just won't change it", but unfortunately that's not a safe bet. You may not change it intentionally, but a user of your system might, or you may write the code now and change it later.

Once you've started working in an object-oriented style, you've got to think about and manage state. We can start controlling this through *properties* like we did previously, but the functional programmer would say that these are all just band-aids over the core problem. A program should have no hidden state: given the same input, a function should always return the same output.

More simply, **once something is created - you shouldn't be able to change its value.**
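
One way to get closer to that ideal in Python is a frozen dataclass. This is only a sketch, not something we built earlier in the course: `FrozenDrone` is a hypothetical stand-in for our `Drone` class, and `dataclasses.replace` is used to build a new object instead of modifying the old one.

```python
from dataclasses import dataclass, replace, FrozenInstanceError

# FrozenDrone is a hypothetical, immutable variant of the Drone class.
@dataclass(frozen=True)
class FrozenDrone:
    power_system: str = "battery"

    def fly(self):
        return "The %s-powered drone is flying" % (self.power_system)

d1 = FrozenDrone()

# Trying to change a frozen instance raises an error instead of silently working.
try:
    d1.power_system = "solar"
except FrozenInstanceError:
    print("Can't change a frozen drone")

# To get a drone with a different power system, we create a new one.
d2 = replace(d1, power_system="solar")

print(d1.fly())  # The battery-powered drone is flying
print(d2.fly())  # The solar-powered drone is flying
```

The original drone is untouched, so any code still holding `d1` keeps seeing the same value.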

Let's walk through another example.

In [6]:
d = Drone()

def drone_changer():
    d.power_system = "I've made a huge mistake"

print(d.fly())
drone_changer()
print(d.fly())

The battery-powered drone is flying
The I've made a huge mistake-powered drone is flying


While this seems innocuous, it is honestly one of the WORST things you can do when coding because this function has a *side-effect*. That means that it is modifying something outside of its own scope.

A good way of checking whether or not a function has a side-effect is to ask: does moving the function to another file make it useless? In this case it does, unless that file also has a Drone instance named d. Now why is this so bad? Because it makes it difficult to reason about the program. Imagine if you had 50 functions that all had the potential to change the Drone instance. Anyone going to read your code would have no idea what *state* the drone was in.

Now these are problems you may not have encountered yet, but I promise you they exist and are very real. Let's go over some functional programming concepts available in Python to help handle this.
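
For contrast, here is a rough sketch of a version without the side-effect. The helper name `with_power_system` is made up for illustration. It takes the drone as an argument and returns a changed copy, so it would work the same from any file and never touches the original.

```python
import copy

class Drone:
    power_system = "battery"
    def fly(self):
        return "The %s-powered drone is flying" % (self.power_system)

# Hypothetical helper: everything it needs comes in as arguments,
# and the only result is the value it returns.
def with_power_system(drone, new_power_system):
    new_drone = copy.copy(drone)            # work on a copy, not the caller's object
    new_drone.power_system = new_power_system
    return new_drone

original = Drone()
changed = with_power_system(original, "solar")

print(original.fly())  # The battery-powered drone is flying
print(changed.fly())   # The solar-powered drone is flying
```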

## Mapping

Mapping means turning each value into another one by applying a function, a bit like looking each value up in a dictionary. This is a functional programming concept that will certainly come up in your data analysis career. It is a fundamental part of the MapReduce style of programming popular in big data.

Let's explore how it works.

In [7]:
x = range(0,10)

In [8]:
x

range(0, 10)

Now that we have a range of integers, we're going to want to apply a transformation to each value in that range. For example, let's cube every value. So we write our cube function, which operates on one individual datum.

In [9]:
def cube(num):
    return num ** 3

Now you may think a for loop is the way to do this, but it's really not, because we have no easy way of capturing the output. We can put it in a new list, but that means building up a mutable list. Here's that example.

In [11]:
new_list = []
for item in x:
    new_list.append(cube(item))

In [12]:
print(new_list)

[0, 1, 8, 27, 64, 125, 216, 343, 512, 729]


Why don't we just do this? Because we had to create a mutable variable, new_list, and append to it step by step, which is completely unnecessary.

Now our solution is easy: we just create a map. Remember what mapping does: it maps a value to another value via a functional transformation. It does this with no mutability and no side-effects. It's also much more concise and easy to read.

In [13]:
map_list = map(cube, x)

In [14]:
print(list(map_list))

[0, 1, 8, 27, 64, 125, 216, 343, 512, 729]


We have to convert it to a list because map gives us back a lazy iterator (a map object) rather than a list. That means that functional transformations are *lazily-evaluated*. What does that mean? Well, it means that Python won't execute this code until the very last second when it's needed. There is very little waste: if we don't end up needing a transformation, we never perform it.
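
To make the laziness visible, here's a small sketch. The `noisy_cube` function and the `calls` list are made up for this demonstration; the function deliberately records each call (a side-effect we'd normally avoid) so we can see exactly when Python runs it.

```python
calls = []

def noisy_cube(num):
    calls.append(num)   # record the call, for demonstration only
    return num ** 3

lazy = map(noisy_cube, range(0, 10))
print(calls)            # [] - nothing has been computed yet

first = next(lazy)      # ask for just one value
print(first, calls)     # 0 [0] - only one call has happened

rest = list(lazy)       # now pull everything that's left
print(rest)             # [1, 8, 27, 64, 125, 216, 343, 512, 729]

print(list(lazy))       # [] - the map object can only be consumed once
```

That last line is worth remembering: once you've pulled the values out of a map object, it's empty.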

## Filters

Now filters are more self-explanatory than maps. They're just what they sound like: they let you keep only the values in a list that meet a criterion and filter out the rest.

In [15]:
x

range(0, 10)

We've got our good ol' range again. Let's create a function that checks whether or not a number is divisible by two.

In [16]:
def divis_by_2(num):
    return num % 2 == 0

In [17]:
divis_by_2(2)

True

In [18]:
divis_by_2(3)

False

Now filters can be thought of as a special kind of map. For example, using divis_by_2, we're mapping a value to True or False depending on whether or not it's divisible by 2.

In [19]:
list(map(divis_by_2, x))

[True, False, True, False, True, False, True, False, True, False]

Now what filter does is keep only the values that mapped to True.

In [20]:
list(filter(divis_by_2, x))

[0, 2, 4, 6, 8]
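
To see that connection spelled out, here's a sketch that rebuilds the filter result by hand from the map result. The names `flags` and `kept` are just for illustration.

```python
def divis_by_2(num):
    return num % 2 == 0

x = range(0, 10)

# Step 1: map every value to True or False.
flags = list(map(divis_by_2, x))

# Step 2: pair each value with its flag and keep the ones flagged True.
kept = [value for value, keep in zip(x, flags) if keep]

print(kept)                                  # [0, 2, 4, 6, 8]
print(kept == list(filter(divis_by_2, x)))   # True
```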

This is really valuable because we can start chaining a lot of these operations together and we'll always get the same output given the same input. Let's try it with a different example.

In [21]:
test_list = [
"hello",
"x,y,z. i like this.",
"this is two xx",
"this, x, here x is, here it is againx"
]

We'll check whether or not a string has at least 2 'x' characters.

In [22]:
def has_x(my_string):
    return my_string.count("x") >= 2

In [23]:
has_x("hello")

False

In [24]:
has_x("xxhello")

True

In [25]:
list(filter(has_x, test_list))

['this is two xx', 'this, x, here x is, here it is againx']

We've filtered out the values that don't match this criterion.

And remember, this is just like a map that then removes values.

In [26]:
list(map(has_x, test_list))

[False, False, True, True]
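
The chaining mentioned earlier in this section might look like the sketch below. Upper-casing with `str.upper` is just an arbitrary transformation picked for illustration. The filter's output feeds straight into the map, and running the chain twice gives the same answer both times.

```python
test_list = [
    "hello",
    "x,y,z. i like this.",
    "this is two xx",
    "this, x, here x is, here it is againx"
]

def has_x(my_string):
    return my_string.count("x") >= 2

# Keep the strings with at least two x's, then upper-case each one.
first_run = list(map(str.upper, filter(has_x, test_list)))
second_run = list(map(str.upper, filter(has_x, test_list)))

print(first_run)
# ['THIS IS TWO XX', 'THIS, X, HERE X IS, HERE IT IS AGAINX']
print(first_run == second_run)   # True
print(len(test_list))            # 4 - the original list is unchanged
```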

## List Comprehensions

So far we've covered the basics. However, Python has its own special way of doing this stuff as well, and that's called a list comprehension. Now these aren't always used (and you won't always use them), but they're very 'pythonic' - that is to say, they're a very Python-programmer way of solving a problem. You'll certainly come across these as you read through code bases, so they're a good thing to be comfortable with.

Now what are they? They're sort of like maps and filters combined into one. We've got our x range like usual.

In [27]:
x

range(0, 10)

Now what we'll do is cube every value in this range like we did above.

In [28]:
[item**3 for item in x]

[0, 1, 8, 27, 64, 125, 216, 343, 512, 729]

Now these are always strange for novice programmers because it kind of looks like a for loop, and in a way it is, just a bit more compressed. Rather than printing or appending to a list, we just wrap the loop in brackets to tell Python that we want a list of the results.

Let's do the same with multiplying each number by 2.

In [29]:
[item * 2 for item in x]

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

Let's add a little bit of filtering to it now. What if we keep only the values that are divisible by three and multiply each of those by 3?

In [30]:
[item * 3 for item in x if item % 3 == 0]

[0, 9, 18, 27]

I know it's strange that we can just tack on an if clause like that, but we can; that's just the way the syntax works. We can of course do other things with list comprehensions too.

We can convert types, for example.

In [31]:
[float(item) for item in x]

[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]

Or we can apply a more complicated function, or one that we create outside of the comprehension.

In [32]:
def cube_if(num):
    # Despite the name, this triples multiples of 3 and doubles everything else.
    if num % 3 == 0:
        return num * 3
    else:
        return num * 2

In [33]:
[cube_if(z) for z in x]

[0, 2, 4, 9, 8, 10, 18, 14, 16, 27]

You'll notice that we don't have to call the thing we're iterating through 'item'; that's just a convention, and we can really call it whatever we want. Now list comprehensions don't fit every use case, but they are useful and worth knowing about! Just keep in mind that they're really no different from maps and filters.
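
As a quick check of that last claim, here's a sketch that writes the divisible-by-three example both ways. The helper names `divis_by_3` and `triple` are made up for the comparison.

```python
x = range(0, 10)

# Hypothetical helpers so that map and filter have something to call.
def divis_by_3(num):
    return num % 3 == 0

def triple(num):
    return num * 3

from_comprehension = [item * 3 for item in x if item % 3 == 0]
from_map_filter = list(map(triple, filter(divis_by_3, x)))

print(from_comprehension)                     # [0, 9, 18, 27]
print(from_comprehension == from_map_filter)  # True
```

The `if` part of the comprehension plays the role of the filter, and the expression at the front plays the role of the map.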

## Lambda Functions

The last part of this section concerns lambda functions. These are functions that are anonymous and don't need to be created as "official" functions prior to use. This means that you don't have to create functions like "square_a_num" when you want to square a number, you can just create a function to do so on the fly.

Let's go through a couple of ways to do this then introduce lambda functions.

We can create a list comprehension.

In [34]:
[item**2 for item in x]

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

We can create a function to do it.

In [35]:
def square(num):
    return num**2

In [36]:
[square(z) for z in x]

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

We can do a map with this operation.

In [37]:
list(map(square, x))

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

And we can create a lambda function!

In [38]:
list(map(lambda z: z**2, x))

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

Interesting, huh!? Now they're going to take some getting used to, but they make things super simple when you're performing small operations on lists and don't want to define a named function somewhere.

The requirement for lambda functions is that they are a single expression of Python code - they're for things that are simple.
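
A single expression can still contain a choice, as long as you use Python's conditional expression (`a if condition else b`) rather than an if statement. As a sketch, here is the cube_if logic from the list comprehension section squeezed into a lambda:

```python
x = range(0, 10)

# Triple the multiples of 3, double everything else, all in one expression.
result = list(map(lambda num: num * 3 if num % 3 == 0 else num * 2, x))

print(result)  # [0, 2, 4, 9, 8, 10, 18, 14, 16, 27]
```

Once the logic needs several steps, a regular def is usually the more readable choice.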

We can also save that lambda function as an object though!

In [39]:
square_lambda = lambda z: z**2

In [40]:
list(map(square_lambda,x))

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [41]:
print(type(square_lambda))

<class 'function'>


In Python, functions can be passed around just like any other object. We see that with maps and filters too: we're essentially telling Python to take this function and, for every value in this list, apply it and give me back the result. This is a feature in a lot of other programming languages too and is one of my favorite features.

Let's look at what this allows us to do on a bigger scale... Let's say we want to square all values that are divisible by 2 from 1 through 19 (remember that range(1,20) stops before 20). We write it from the outside in: the map wraps the filter, which wraps the range.
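
Here's a small sketch of what "just like any other object" means in practice. The names `apply_twice` and `operations` are invented for illustration. We hand a function to another function as an argument, and we store functions in a dictionary.

```python
def square(num):
    return num ** 2

# A function that takes another function as an argument.
def apply_twice(func, value):
    return func(func(value))

print(apply_twice(square, 3))  # 81, because (3 ** 2) ** 2 == 81

# Functions can also live inside data structures.
operations = {"square": square, "double": lambda z: z * 2}
results = [operations[name](5) for name in ("square", "double")]
print(results)  # [25, 10]
```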

In [42]:
map(lambda z: z**2, filter(lambda z: z % 2 == 0, range(1,20)))

<map at 0x104063898>

Now you'll notice that we've got a map object here. That's because of the lazy evaluation - we haven't asked for the result back so it hasn't given it to us. Let's take that and convert it to a list and we'll get the results back.

In [43]:
list(map(lambda z: z**2, filter(lambda z: z % 2 == 0, range(1,20))))

[4, 16, 36, 64, 100, 144, 196, 256, 324]

That's one line of code to do something pretty cool. Let's just reflect on how we would have written that before.

In [44]:
my_range = range(1,20)
output = []
for z in my_range:
    if z % 2 == 0:
        output.append(z ** 2)

The latter may seem more familiar, but it's also a lot more prone to errors as we make code modifications later on. It also takes more lines of code and isn't nearly as extensible. This may seem trivial, but I promise you that it's not.

Now there are a lot of other reasons to love functional programming, but these are some of the biggest ones. This way of thinking is certainly a departure from how you might be accustomed to thinking, but it's valid and extremely useful in future contexts.

We're going to see a lot of this style of programming in the coming chapters, so be sure to get used to it!
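
For completeness, the same computation also fits in a list comprehension. The sketch below puts all three versions side by side and checks that they agree (the variable names are just for the comparison).

```python
# Functional style: filter, then map.
functional = list(map(lambda z: z**2, filter(lambda z: z % 2 == 0, range(1, 20))))

# List comprehension: the same filter and map in one expression.
comprehension = [z**2 for z in range(1, 20) if z % 2 == 0]

# Loop style: build up a mutable list step by step.
output = []
for z in range(1, 20):
    if z % 2 == 0:
        output.append(z ** 2)

print(functional)                             # [4, 16, 36, 64, 100, 144, 196, 256, 324]
print(functional == comprehension == output)  # True
```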