<a href="https://github.com/theonaunheim">
    <img style="border-radius: 100%; float: right;" src="static/strawberry_thief_square.png" width=10% alt="Theo Naunheim's Github">
</a>

<br style="clear: both">
<hr>
<br>

<h1 align='center'>Advanced Methods</h1>

<br>

<div style="display: table; width: 100%">
    <div style="display: table-row; width: 100%;">
        <div style="display: table-cell; width: 50%; vertical-align: middle;">
            <img src="static/red_panda.jpg" width="80%">
        </div>
        <div style="display: table-cell; width: 10%">
        </div>
        <div style="display: table-cell; width: 40%; vertical-align: top;">
            <blockquote>
                <p style="font-style: italic;">"I have yet to see any problem, however complicated, which when you looked at it in the right way, did not become more complicated."</p>
                <br>
                <p>-Poul Anderson</p>
            </blockquote>
        </div>
    </div>
</div>


<br>




<br>

<div align='left'>
    Image courtesy of <a href='https://commons.wikimedia.org/wiki/File:Red_Panda_-_Nashville_Zoo.jpg'>Pmeenen</a> under the <a href='https://creativecommons.org/licenses/by/2.5/deed.en'>CC BY 2.5</a>
</div>

<hr>

```python
# Import the libraries used in this notebook.
import numpy as np
import pandas as pd

# Use %matplotlib inline to display plots directly in the notebook.
import matplotlib
%matplotlib inline
```

## Advanced Methods

In addition to the methods we discussed previously, there is a set of special methods that warrant separate examination. They are:

* **[map](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.map.html)**: allows you to conveniently transform each value in the Series with a function or key-value mapping.
* **[apply](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.apply.html#pandas.Series.apply)**: call a function on each value of a Series (or on each column or row of a DataFrame).
* **[groupby](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.groupby.html)**: create a special object for analyzing groups collectively.
* **[rolling](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.rolling.html)**: create a special object for performing window functions.
* **[str namespace methods](https://pandas.pydata.org/pandas-docs/stable/api.html#string-handling)**: a namespace/collection of vectorized Python string functions.
* **[dt namespace methods](https://pandas.pydata.org/pandas-docs/stable/api.html#datetimelike-properties)**: a namespace/collection of methods for dealing with datetimes.
* **[plot](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.plot.html)**: this function creates different types of visualizations.

# Series.map()

The map method creates a new Series in which every item of the original Series has been transformed the same way. Map accepts a few kinds of input, but most commonly either 1) a function or 2) a dictionary or Series.

This is useful when you need an arbitrary function rather than one of the built-in vectorized operations attached to the Series. Note: because the function is called once per element in Python, you generally won't get the speed of a true vectorized operation, but it does let you stay within Pandas syntax.
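A small sketch of that trade-off, using a made-up Series. Both results label each number as even or odd. `map` calls the Python lambda once per element. The `np.where` version expresses the same logic as whole-array (vectorized) operations, which is usually faster on large data.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 4])

# map: an arbitrary Python function, called once per element.
labels_map = s.map(lambda x: 'even' if x % 2 == 0 else 'odd')

# Vectorized alternative: the modulo and np.where operate on the whole array at once.
labels_vec = pd.Series(np.where(s % 2 == 0, 'even', 'odd'), index=s.index)

print(labels_map.tolist())  # ['odd', 'even', 'odd', 'even']
```

Reach for `map` when the logic is hard to express with vectorized operations, and prefer the vectorized form when it exists.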

### Functions Passed to Map()

If you pass a function to map, it will:

1. Take the Series.
2. For each value in the Series, call the function you passed with that value as its only argument: `new_value = function(old_value)`.
3. Put that new value (which can be a different type) into a new Series.
4. Return the new Series, which has the same length and index as the old Series.
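For intuition, the four steps above can be sketched as an explicit loop. `manual_map` is a hypothetical helper written for illustration only. It is not a pandas API and not how pandas implements `map` internally, and it skips details such as `na_action` handling.

```python
import pandas as pd


def manual_map(series, function):
    """Hypothetical helper that mimics series.map(function) with a plain loop."""
    # Steps 1-2: take the Series and call the function on each value.
    new_values = [function(old_value) for old_value in series]
    # Steps 3-4: put the results in a new Series that keeps the original index.
    return pd.Series(new_values, index=series.index)


s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
looped = manual_map(s, lambda x: x * 10)
mapped = s.map(lambda x: x * 10)
print(looped.equals(mapped))  # True
```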

This makes more sense when you see it in action.

```python
# We define an arbitrary mapping function.
def is_2(input_item):
    '''This function:

       Returns True if input is 2.
       Returns False if input is greater than 2.
       Does not provide a return value otherwise (implicitly returns None).

    '''
    # Print a progress message, starting on a new line.
    print(f'\nRunning is_2() for item {input_item}', end=' ... ')

    # Check if equal to 2.
    if input_item == 2:
        print(f'Input item {input_item} is equal to 2! Returning True.')
        return True

    # Check if greater than 2.
    if input_item > 2:
        print(f'Input item {input_item} is greater than 2! Returning False.')
        return False

    print(f'Input item {input_item} not equal or greater than 2! Implicitly returning None.')

    # Fall off the end of the function, which implicitly returns None.


# Call the function directly on a few individual values.
false_return = is_2(5)
true_return  = is_2(2)
none_return  = is_2(-5)
```

```
Running is_2() for item 5 ... Input item 5 is greater than 2! Returning False.

Running is_2() for item 2 ... Input item 2 is equal to 2! Returning True.

Running is_2() for item -5 ... Input item -5 not equal or greater than 2! Implicitly returning None.
```


```python
# Create a Series.
s1 = pd.Series([1, 2, 3])

# Pass the function object itself to map; don't call it.
# Yes: series.map(function)
# No!: series.map(function())
# map returns a new, transformed Series.
s2 = s1.map(is_2)

# s2 is a regular Series. After filling its None values with False,
# we can use it as a boolean indexer.
print()
print(s1.loc[s2.fillna(False)])

# Show the transformed Series.
s2
```

```
Running is_2() for item 1 ... Input item 1 not equal or greater than 2! Implicitly returning None.

Running is_2() for item 2 ... Input item 2 is equal to 2! Returning True.

Running is_2() for item 3 ... Input item 3 is greater than 2! Returning False.

1    2
dtype: int64
```

```
0     None
1     True
2    False
dtype: object
```

```python
# Again, any function that takes a single argument works, e.g. the natural log.
s1.map(np.log)
```

```
0    0.000000
1    0.693147
2    1.098612
dtype: float64
```
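One related detail of function mappers: `Series.map` also accepts `na_action='ignore'`. With that setting, missing values are passed through unchanged and the function is never called on them. The sketch below uses made-up data and a formatting lambda chosen for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

# Default: the function is also called on NaN, producing the string 'nan'.
formatted_all = s.map(lambda x: f'{x:.1f}')
print(formatted_all.tolist())  # ['1.0', 'nan', '3.0']

# na_action='ignore': NaN is propagated without calling the function.
formatted_skip = s.map(lambda x: f'{x:.1f}', na_action='ignore')
print(formatted_skip.isna().tolist())  # [False, True, False]
```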

### Dictionaries or Series passed to map()

This is much simpler: the dict or Series acts as a lookup table, much like a LEFT JOIN against a lookup table in SQL. This will:

1. Take the Series.
2. For each value in the Series:

    a. if mapping with a dict and the value is one of its keys, replace it with the corresponding dict value.
    
    b. if mapping with a Series and the value is in its index, replace it with the corresponding data value.
    
    c. if the value is not found, replace it with `NaN` (which may change your dtype).

Again, this makes more sense when you see it in action.

```python
# Data Series.
s3 = pd.Series(['one', 'two', 'three', 'four', 'five'])

# Mapping Series: index values are the lookup keys, data values are the replacements.
s4 = pd.Series(
    index=['one', 'two', 'four'],
    data=['uno', 'dos', 'cuatro']
)

# Equivalent mapping dict.
map_dict = {'one': 'uno', 'two': 'dos', 'four': 'cuatro'}

# Original Series.
print('\nShowing original:\n')
print(s3)

# Mapping Series.
print('\nShowing mapping Series:\n')
print(s4)

# Map with the Series, then confirm the equivalent dict gives an identical result.
print('\nShowing the new mapped Series:\n')
s5 = s3.map(s4)
assert s5.equals(s3.map(map_dict))
print(s5)
```

```
Showing original:

0      one
1      two
2    three
3     four
4     five
dtype: object

Showing mapping Series:

one        uno
two        dos
four    cuatro
dtype: object

Showing the new mapped Series:

0       uno
1       dos
2       NaN
3    cuatro
4       NaN
dtype: object
```
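Step c of the dict/Series mapping rules says an unmatched value can change the dtype. A short sketch with a made-up lookup dict shows the common case. When every value is found, integer results keep an integer dtype. A single miss forces a `NaN` into the result, which turns it into floats.

```python
import pandas as pd

codes = pd.Series(['a', 'b', 'z'])
lookup = {'a': 1, 'b': 2}  # made-up lookup table; 'z' is deliberately missing

# Every key is found, so the result keeps an integer dtype.
complete = codes.iloc[:2].map(lookup)
print(complete.dtype)

# 'z' is not in the dict, becomes NaN, and forces a float dtype.
partial = codes.map(lookup)
print(partial.dtype)
print(partial.tolist())
```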


# Series.apply()
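A minimal sketch of `apply` with made-up data. On a Series, `apply` calls the function once per value, much like `map` with a function. On a DataFrame, it passes whole columns (`axis=0`, the default) or whole rows (`axis=1`) to the function.

```python
import pandas as pd

s = pd.Series([1, 2, 3])

# On a Series, the function receives one value at a time.
squared = s.apply(lambda x: x ** 2)
print(squared.tolist())  # [1, 4, 9]

df = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

# axis=0 (default): the function receives each column as a Series.
col_ranges = df.apply(lambda col: col.max() - col.min())

# axis=1: the function receives each row as a Series, indexed by column name.
row_diffs = df.apply(lambda row: row['b'] - row['a'], axis=1)

print(col_ranges)
print(row_diffs)
```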

# Series.groupby()
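A minimal split-apply-combine sketch using hypothetical `regions` and `amounts` Series. `groupby` splits the amounts by their matching region label. An aggregation such as `sum` or `mean` then combines each group into a single value.

```python
import pandas as pd

# Hypothetical data: one region label per sale amount (aligned on the index).
regions = pd.Series(['east', 'west', 'east', 'west'])
amounts = pd.Series([10, 20, 30, 40])

# Create the groupby object once, then aggregate it in different ways.
grouped = amounts.groupby(regions)

totals = grouped.sum()    # east: 10 + 30, west: 20 + 40
means = grouped.mean()
print(totals)
print(means)
```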

# Series.rolling()
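A minimal sketch with made-up numbers. `rolling(window=3)` creates the window object, and an aggregation such as `mean` or `sum` is then computed over each run of 3 consecutive values. Positions that don't yet have a full window return `NaN`, unless `min_periods` is lowered.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

# Full 3-value windows only: the first two positions are NaN.
rolling_mean = s.rolling(window=3).mean()
print(rolling_mean)

# min_periods=1 allows partial windows at the start of the Series.
partial_sums = s.rolling(window=3, min_periods=1).sum()
print(partial_sums)
```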

# String Namespace (`Series.str`)
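A minimal sketch with made-up names. Methods under `.str` mirror familiar Python string methods, run on every value of the Series, and can be chained. A missing value stays missing instead of raising an error.

```python
import pandas as pd

names = pd.Series(['  Alice ', 'BOB', None, 'carol'])

# Strip surrounding whitespace, then title-case every value.
cleaned = names.str.strip().str.title()
print(cleaned)
```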

# Datetime Namespace (`Series.dt`)
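A minimal sketch with made-up dates. The `.dt` accessor only works on a datetime-like Series, so the strings are first converted with `pd.to_datetime`. Each attribute or method then applies to every value at once.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(['2021-01-15', '2021-02-20', '2021-03-25']))

# Components are returned as new Series aligned with the original.
years = dates.dt.year
months = dates.dt.month
weekdays = dates.dt.day_name()

print(years.tolist())     # [2021, 2021, 2021]
print(months.tolist())    # [1, 2, 3]
print(weekdays.tolist())
```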

# Series.plot()
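A minimal sketch with made-up quarterly values. The `kind` argument selects the chart type. Passing `ax` draws onto a specific matplotlib Axes, so two chart types can sit side by side. This assumes matplotlib is installed, which the import cell at the top of this notebook already relies on.

```python
import matplotlib.pyplot as plt
import pandas as pd

s = pd.Series([1, 3, 2, 5], index=['q1', 'q2', 'q3', 'q4'])

# One figure with two Axes: a line chart on the left, a bar chart on the right.
fig, (left, right) = plt.subplots(1, 2, figsize=(8, 3))
s.plot(kind='line', ax=left, title='kind="line"')
s.plot(kind='bar', ax=right, title='kind="bar"')
```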

# Additional Learning Resources

* ### [Pandas Split-Apply-Combine](https://pandas.pydata.org/pandas-docs/stable/groupby.html)
* ### [Pandas Computational Tools](http://pandas.pydata.org/pandas-docs/stable/computation.html)
* ### [Pandas Datetime Methods](https://pandas.pydata.org/pandas-docs/stable/api.html#datetimelike-properties)
* ### [Pandas String Methods](https://pandas.pydata.org/pandas-docs/stable/api.html#string-handling)
* ### [Pandas Visualization](https://pandas.pydata.org/pandas-docs/stable/visualization.html)

---

# Next Up: [Preprocessing](3_preprocessing.ipynb)

<br>

<img style="margin-left: 0;" src="static/log_transform.svg" width="20%">

<br>

<div align='left'>
    Image courtesy of <a href='https://commons.wikimedia.org/wiki/File:Population_vs_area.svg'>Skbkekas</a> under the <a href='https://creativecommons.org/licenses/by-sa/3.0/deed.en'>CC BY-SA 3.0</a>
</div>

---