# More Series Methods

## Overview

In the previous chapter, we covered the most essential and common attributes along with the statistical methods for pandas Series objects. In this chapter, we cover several other useful and common methods from the [Series API](http://pandas.pydata.org/pandas-docs/stable/api.html#series).

### Objectives

* Understand how to use the following methods to handle missing data: `isna`, `notna`, `fillna`, `dropna`
* Find the fraction of missing values by chaining the `mean` method to the `isna` method, and convert it to a percentage
* Sort values and the index with `sort_values` and `sort_index`

```python
import pandas as pd
movie = pd.read_csv('../data/movie.csv', index_col='title')
movie.head()
```

```python
duration = movie['duration']
duration.head()
```

## Methods for handling missing values
pandas provides the following methods to handle missing values:

* `isna` - Returns a Series of booleans indicating whether each value is missing
* `notna` - Returns the exact opposite of `isna`
* `fillna` - Fills missing values in a variety of ways
* `dropna` - Returns a new Series with the missing values removed
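
As a quick illustration, here is a minimal sketch on a small, hypothetical Series (made up for this example, not taken from the movie data) showing that `notna` is the element-wise negation of `isna`:

```python
import numpy as np
import pandas as pd

# A small hypothetical Series with two missing values
s = pd.Series([10.0, np.nan, 30.0, np.nan])

missing = s.isna()   # True where a value is missing
present = s.notna()  # True where a value is present

print(missing.tolist())          # [False, True, False, True]
# notna is the element-wise negation (~) of isna
print(present.equals(~missing))  # True
```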

### Counting the number of missing values
pandas doesn't have a single method that counts the number of missing values, but you can count them in two ways:

* Use the `count` method to find the number of non-missing values and subtract this from the total number of values
* Use the `isna` method to return a Series of booleans and chain the `sum` method

```python
len(duration) - duration.count()
```

```python
duration.isna().sum()
```
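
To see that both approaches agree, here is a minimal sketch on a hypothetical five-value Series with two missing values. The Series `s` is made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical data: five values, two of them missing
s = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0])

# Approach 1: total number of values minus the number of non-missing values
by_count = len(s) - s.count()

# Approach 2: sum the boolean Series returned by isna (each True counts as 1)
by_isna = s.isna().sum()

print(by_count, by_isna)  # 2 2
```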

### Finding the percentage of missing values
To find the fraction of missing values in a Series, chain the `mean` method to the `isna` method. The result is a proportion between 0 and 1; multiply it by 100 to express it as a percentage.

```python
duration.isna().mean()
```

### Alternate calculation
The previous calculation might be confusing. A more explicit approach divides the number of missing values by the total number of values in the Series:

```python
total = len(duration)
num_missing = total - duration.count()
num_missing / total
```
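
The sketch below, on a hypothetical four-value Series with one missing value, checks that both calculations give the same fraction and shows the scaling step. It assumes you want the result on a 0 to 100 scale, which `mean` alone does not give you:

```python
import numpy as np
import pandas as pd

# Hypothetical data: 1 of 4 values is missing
s = pd.Series([2.0, np.nan, 6.0, 8.0])

fraction = s.isna().mean()                # proportion between 0 and 1
explicit = (len(s) - s.count()) / len(s)  # same value, computed explicitly
percent = fraction * 100                  # scaled to a percentage

print(fraction, explicit, percent)  # 0.25 0.25 25.0
```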

### Why does taking the mean of the boolean Series work?
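The mean is defined as the sum of the values divided by the number of values. When pandas sums a boolean Series, each `True` counts as 1 and each `False` as 0, so the sum is exactly the number of missing values. Dividing that count by the length of the Series gives the same fraction as the explicit calculation above.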
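
A minimal illustration with a hypothetical boolean Series: because `True` acts as 1 and `False` as 0, `sum` counts the `True` values and `mean` gives their fraction.

```python
import pandas as pd

# Hypothetical boolean Series: three True values out of four
flags = pd.Series([True, False, True, True])

print(flags.sum())   # 3    -> number of True values
print(flags.mean())  # 0.75 -> 3 True values / 4 values
```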

### Filling missing values
Occasionally, you will need to fill missing values, and pandas provides the `fillna` method to do so. There are many strategies for replacing missing values; here we cover only filling them with a constant. A popular choice for that constant is the median or mean of the Series.

```python
duration.head()
```

First, let's find the median of the Series.

```python
median = duration.median()
```

Now, we can fill in all of the missing values with the median.

```python
duration.fillna(median).head()
```

You can use any constant number directly as well:

```python
duration.fillna(-99).head()
```
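
A minimal sketch on hypothetical data (not the movie dataset) comparing the two popular constants. Both `median` and `mean` skip missing values by default; the median is less sensitive to extreme values, which is one common reason to prefer it. The sketch also shows that `fillna` returns a new Series rather than modifying the original:

```python
import numpy as np
import pandas as pd

# Hypothetical durations with one missing value and one large outlier
s = pd.Series([90.0, 100.0, np.nan, 110.0, 400.0])

by_median = s.fillna(s.median())  # median of 90, 100, 110, 400 is 105.0
by_mean = s.fillna(s.mean())      # (90 + 100 + 110 + 400) / 4 is 175.0

print(by_median[2], by_mean[2])  # 105.0 175.0

# fillna returned new Series; the original still has its missing value
print(s.isna().sum())  # 1
```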

### Dropping missing values
The `dropna` method returns a new Series with the missing values removed; the original `duration` Series is left unchanged. Notice that the size of the result is smaller than that of the original Series.

```python
duration.dropna().size
```
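
A minimal sketch with a hypothetical Series, showing that `dropna` leaves the original untouched and keeps the index labels of the values that remain:

```python
import numpy as np
import pandas as pd

# Hypothetical Series with two missing values
s = pd.Series([1.0, np.nan, 3.0, np.nan])

dropped = s.dropna()  # new Series without the missing values

print(s.size, dropped.size)    # 4 2
print(dropped.index.tolist())  # [0, 2] -> the original labels are kept
```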

## Sorting
The `sort_values` method sorts the Series from least to greatest by default. It places missing values at the end.

```python
duration.sort_values().head()
```
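
By default `sort_values` uses `na_position='last'`; passing `na_position='first'` moves missing values to the front instead. A minimal sketch on a hypothetical Series:

```python
import numpy as np
import pandas as pd

# Hypothetical Series with one missing value
s = pd.Series([3.0, np.nan, 1.0, 2.0])

last = s.sort_values()                      # default: missing values at the end
first = s.sort_values(na_position='first')  # missing values at the front

print(last.tolist())   # [1.0, 2.0, 3.0, nan]
print(first.tolist())  # [nan, 1.0, 2.0, 3.0]
```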

The `ascending` parameter can be set to `False` to sort from greatest to least:

```python
duration.sort_values(ascending=False).head()
```
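
Reversing the sort order does not move the missing values: with the default `na_position='last'`, they stay at the end. A minimal sketch on a hypothetical Series:

```python
import numpy as np
import pandas as pd

# Hypothetical Series with one missing value
s = pd.Series([3.0, np.nan, 1.0, 2.0])

desc = s.sort_values(ascending=False)

print(desc.tolist())  # [3.0, 2.0, 1.0, nan] -> missing values still go last
```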

### Sorting the index
Since Series also have an index, pandas allows you to sort by it as well with the `sort_index` method.

```python
duration.sort_index().head()
```

```python
duration.sort_index(ascending=False).head()
```
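
With a label index such as movie titles, `sort_index` orders the labels rather than the values. The sketch below uses a small hypothetical Series indexed by three titles; the durations are made up for illustration:

```python
import pandas as pd

# Hypothetical Series indexed by title labels; the values are made up
s = pd.Series([120, 95, 150], index=['Zodiac', 'Avatar', 'Memento'])

asc = s.sort_index()                  # labels in alphabetical order
desc = s.sort_index(ascending=False)  # labels in reverse alphabetical order

print(asc.index.tolist())   # ['Avatar', 'Memento', 'Zodiac']
print(desc.index.tolist())  # ['Zodiac', 'Memento', 'Avatar']
```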

## Exercises

### Exercise 1
<span  style="color:green; font-size:16px">What percentage of actor 1 Facebook likes are missing?</span>

### Exercise 2
<span  style="color:green; font-size:16px">Use the `notna` method to find the number of non-missing values in the actor 1 Facebook likes column. Verify that this number matches the result of the `count` method.</span>

### Exercise 3
<span  style="color:green; font-size:16px">Use one line of code to fill the missing values of `actor1_fb` with the maximum of `actor2_fb`. Save this result to the variable `actor1_fb_full`.</span>

### Exercise 4
<span  style="color:green; font-size:16px">Verify the results of Exercise 3 by selecting just the values of `actor1_fb_full` that were filled by `actor2_fb`.</span>