# Transforming the data

In [None]:
# Import pandas
import pandas as pd

# 1. `.map()` method

- the `.map()` method on a pandas Series accepts a function or a dict-like object containing a mapping
- it is a convenient way to perform element-wise transformations and other data-cleaning operations
- Python also has a built-in `map()` function: it takes a function and one or more iterables (list, tuple, dictionary, set, or Series) and applies the function to each element of the iterable
- the built-in `map()` returns an iterator, so the resulting values can be passed to `list()` or `set()` to create a list or a set
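
The dict-like form of the Series `.map()` method can be sketched as follows. The Series `grades` and the dictionary `grade_labels` are hypothetical names chosen only for illustration; note that values without a matching key become `NaN`.

```python
import pandas as pd

# Hypothetical data, for illustration only
grades = pd.Series(['A', 'B', 'C', 'A'])
grade_labels = {'A': 'excellent', 'B': 'good'}

# Series.map() looks up each element in the dictionary;
# 'C' has no matching key, so it becomes NaN
mapped = grades.map(grade_labels)
print(mapped)
```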

General syntax of the built-in `map()` function:

`map(function, iterable)`

In [None]:
# Run this code
our_list = ['This', 'is', 'the', 'first', 'example']

In [None]:
# Use the built-in map() function to get the length of each word in our_list
# Pass the result to list() to create a list of the resulting values
result = list(map(len, our_list))
print(result)

In the above example, the built-in `map()` function iterates over `our_list` and applies `len` to each element; `list()` then collects the resulting string lengths into a new list.

Next, we create a function `triple` and a pandas Series `numbers`, which we will transform with the Series `.map()` method.

In [None]:
# Run this code
def triple(x):
    return x * 3

In [None]:
# Run this code
numbers = pd.Series([15, 4, 8, 45, 36, 7])

In [None]:
# TASK 1 >>>> Apply the .map() method with the function triple to the pandas Series 'numbers' and store the result in the variable result_2
#             Print result_2 (the result should be the numbers multiplied by 3)

# 2. `.apply()` method

- on a DataFrame, this method applies a function along an axis (to each column or to each row) $^{1}$
- on a Series, it works element-wise like `.map()`, but it is suited to more complex functions and operations
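
As a rough sketch of what "along an axis" means, the example below applies one function first to each column and then to each row. The DataFrame `points` and the function `value_range` are hypothetical and not part of the lesson's data.

```python
import pandas as pd

# Hypothetical DataFrame, for illustration only
points = pd.DataFrame({'test_1': [10, 20, 30], 'test_2': [1, 2, 3]})

def value_range(values):
    # 'values' is a whole column or a whole row, passed in as a Series
    return values.max() - values.min()

# axis=0 (the default): the function receives each column
column_range = points.apply(value_range)

# axis=1: the function receives each row
row_range = points.apply(value_range, axis=1)

print(column_range)
print(row_range)
```

With the default `axis=0` the result has one value per column; with `axis=1` it has one value per row.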

You can find a nice comparison of the `.map()` and `.apply()` methods, and when to use each, in [this question on Stack Overflow](https://stackoverflow.com/questions/19798153/difference-between-map-applymap-and-apply-methods-in-pandas).

In [None]:
# Run this code
students = [(1, 'Robert', 30, 'Slovakia', 26),
           (2, 'Jana', 29, 'Sweden' , 27),
           (3, 'Martin', 31, 'Sweden', 26),
           (4, 'Kristina', 26,'Germany' , 30),
           (5, 'Peter', 33, 'Austria' , 22),
           (6, 'Nikola', 25, 'USA', 23),
           (7, 'Andrej', 25, 'USA', 26)]

students_1 = pd.DataFrame(students, columns= ['student_id', 'first_name', 'age', 'city', 'score'])
print(students_1)

The `.apply()` method also accepts a user-defined function that performs a transformation or aggregation on a DataFrame (or Series).

In [None]:
# Run this code to create a regular function
def score_func(x): 
    if x < 25: 
        return "Retake exam" 
    else: 
        return "Pass"

In [None]:
# Use .apply() along with score_func to label each score and store the labels in a new column 'result'
students_1['result'] = students_1.score.apply(score_func)
print(students_1)

As we already know, regular functions are created using the `def` keyword. These types of functions can have any number of arguments and expressions.

In [None]:
# Example of regular function
def multi_add(x):
    return x * 2 + 5

In [None]:
result_1 = multi_add(5)
print(result_1)

Let's compare the regular function `multi_add` with another type of function, called a `lambda` function.

**Lambda Function**:

- it is an anonymous function (it is defined without a name)
- a lambda can have any number of parameters, but its body can **contain only one expression** (you cannot write multiple statements in the body of a lambda function), so it is used for *one-line expressions*
- it returns a function object, which can be assigned to a variable

General syntax: `lambda x: x`
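
To illustrate the point that a lambda can take more than one parameter, here is a minimal sketch. The name `add_scaled` is hypothetical and chosen only for this example.

```python
# Hypothetical two-parameter lambda, for illustration only
add_scaled = lambda x, y: x * 2 + y

# Called like any other function, with one argument per parameter
print(add_scaled(5, 5))
```

The body is still a single expression; only the number of parameters differs from the one-parameter form.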

In [None]:
# Example of a lambda function
our_lambda = lambda x: x * 2 + 5
print(our_lambda(5))

This simple lambda function takes an input x (in our case the number 5), multiplies it by 2, and adds 5.
Lambda functions are commonly used with the `.apply()` method and can be really useful.

For now, imagine that the students' scores were recorded incorrectly and we need to multiply them by 10. We use a lambda function with `.apply()` and assign the result back to the 'score' column of the dataset.

In [None]:
students_1.score = students_1.score.apply(lambda x: x * 10)
print(students_1)

In [None]:
# TASK 2 >>>> Use the .apply() method on the column 'city' along with a lambda to make the words uppercase
#             Do not forget to assign the result back to this column

# References

$^{1}$ pandas. pandas.DataFrame.apply. [ONLINE] Available at: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html#pandas-dataframe-apply. [Accessed 14 September 2020].

Stack Overflow. Difference between map, applymap and apply methods in Pandas. [ONLINE] Available at: https://stackoverflow.com/questions/19798153/difference-between-map-applymap-and-apply-methods-in-pandas. [Accessed 14 September 2020].