<a href="https://github.com/theonaunheim">
    <img style="border-radius: 100%; float: right;" src="static/strawberry_thief_square.png" width=10% alt="Theo Naunheim's Github">
</a>

<br style="clear: both">
<hr>
<br>

<h1 align='center'>Basic Methods</h1>

<br>

<div style="display: table; width: 100%">
    <div style="display: table-row; width: 100%;">
        <div style="display: table-cell; width: 50%; vertical-align: middle;">
            <img src="static/xiao_liwu.jpg" width="400">
        </div>
        <div style="display: table-cell; width: 10%">
        </div>
        <div style="display: table-cell; width: 40%; vertical-align: top;">
            <blockquote>
                <p style="font-style: italic;">"Object-oriented programming is an exceptionally bad idea which could only have originated in California."</p>
                <br>
                <p>-Edsger Dijkstra</p>
            </blockquote>
        </div>
    </div>
</div>


<br>

<div align='left'>
    Image courtesy of <a href='https://commons.wikimedia.org/wiki/File:Xiao_Liwu_im_San_Diego_Zoo_-_Foto_2.jpeg'>jballeis</a> under the <a href='https://creativecommons.org/licenses/by-sa/3.0/deed.en'>CC BY-SA 3.0</a>
</div>

<hr>

In [None]:
# Import stuff so we can use libraries.
import numpy as np
import pandas as pd

## Series Attributes and Methods, Generally

### You won't remember all of this. This is just to give you an idea what is out there.

As we've mentioned before, Series in Pandas are more than just a data container. Series also have many attributes and methods that you can use. If you're not familiar with object-oriented programming, you can think of attributes as objects that are attached to our Series objects, and methods as functions or behaviors attached to our object that are accessed via 'dot' ('.') notation.

Note: we are only scratching the surface of what you can do with methods. Each of these methods has additional options you can use to tweak their usage. Again, we're shooting for a broad but shallow base of knowledge.

Note: certain methods may not be available for all types (e.g. Pandas can't calculate the standard deviation of a string).

In [None]:
# Create a series with the name "Example series!"
s1 = pd.Series([1,2,3,4,5,6], name='Example series!')

In [None]:
# And now it has this data attached to it, accessible via dot notation.
s1.name

In [None]:
# And we can trigger activities based on the data. Methods can be passed arguments.
s1.median(skipna=True)

## Attributes

First, let's examine the more useful attributes (there aren't a lot).

* **dtype**: the Numpy datatype of the Series, which dictates what you can do with your data.
* **hasnans**: a convenience attribute for whether you have NaNs in your Series.
* **empty**: a convenience attribute for len(series) == 0.
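* **is_monotonic_increasing** / **is_monotonic_decreasing**: do the values only go up ([1,2,3]) or only go down ([3,2,1])? [1,3,2] is neither. The older `is_monotonic` was an alias for `is_monotonic_increasing` and was removed in pandas 2.0.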
* **is_unique**: is there only one of each type of value?
* **name**: the series identifier, which is important for dataframe organization.
* **index**: the attached index for the Series (which has its own attributes/methods).
* **values**: the raw values of the Series as a Numpy array.

In [None]:
s1_attributes = {
    's1.dtype'                  : s1.dtype,
    's1.hasnans'                : s1.hasnans,
    's1.empty'                  : s1.empty,
    # is_monotonic was removed in pandas 2.0; it was an alias for this attribute.
    's1.is_monotonic_increasing': s1.is_monotonic_increasing,
    's1.is_unique'              : s1.is_unique,
    's1.name'                   : s1.name,
    's1.index'                  : s1.index,
    's1.values'                 : s1.values,
}

pd.Series(s1_attributes)

## Methods: Selectors

The first group of methods we'll discuss are basic view functions. A lot of times you'll only want to take a peek at a tiny portion of your data. 

* [**head**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.head.html): select the first X items (default 5).
* [**tail**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.tail.html): select the last X items (default 5).

In [None]:
s1.head(3)

In [None]:
s1.tail()

## Methods: Logical

These are simply vectorized versions of Python's all() and any() functions.

* [**all**](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.all.html): are all of the values equal to True?
* [**any**](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.any.html): are any of the values equal to True?

In [None]:
# Do all of the values == 20?
s3 = pd.Series([20, 20, 30])
s4 = s3 == 20  
# s4 == pd.Series([True, True, False])
s4.all()

In [None]:
# Are any of the values True?
s2 = pd.Series([False, True, False])
s2.any()

## Methods: Simple Mathematical

Pandas has an array of statistics functions for you to use (but you'll want Scipy for the fancy stats stuff). As we mentioned before, your garden-variety mathematics can be done with simple operators.

    series_2 = series_1 + 20
    series_3 = series_1 * series_2 

* [**corr**](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.corr.html): the statistical correlation between two series.
* [**describe**](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.describe.html): a utility function that provides a group of statistical measures.
* [**max**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.max.html): the maximum value.
* [**mean**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.mean.html): the average (mean) value.
* [**median**](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.median.html): the average (median) value.
* [**min**](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.min.html): the minimum value.
* **[quantile](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.quantile.html)**: get particular value for given quantiles.
* **[round](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.round.html)**: round to N decimal places.
* **[sample](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.sample.html)**: get a random sample of values from the Series with or without replacement.
* [**std**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.std.html): the sample standard deviation.
* [**sum**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.sum.html): the sum of all series elements.
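
As a small runnable sketch of the operator arithmetic and `round` mentioned above (the `prices` Series and its values are made up for illustration, not part of the notebook's data):

```python
import pandas as pd

# `prices` is a hypothetical Series used only for this illustration.
prices = pd.Series([1.005, 2.449, 3.14159])

# Operators work element by element and return a new Series.
plus_twenty = prices + 20
product = prices * plus_twenty

# round() trims the result to the requested number of decimal places.
print(product.round(2))
```

Each line produces a new Series with the same index as `prices`, so the results line up element for element.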

In [None]:
# Medley of statistical functions
s5 = pd.Series([-10, 3, 4, 6, 200, 1, 4])
s6 = pd.Series([100, 5, -100, 0, 150, 0, 2])

# Results as a series
pd.Series({
    'max': s5.max(),
    'min': s5.min(),
    'mean': s5.mean(),
    '90th quantile': s5.quantile(.90),
    '10th quantile': s5.quantile(.10),
    'median': s5.median(),
    'sum': s5.sum(),
    'std': s5.std(),
    'corr': s5.corr(s6)
})

In [None]:
# The basic describe function (the result is itself a series)
s5.describe()

In [None]:
# Sample gives you a random sample (by n or by fraction)
s5.sample(3)

### Quick note: we're running these methods on the last line of the cell to display it simply. We're not saving the result anywhere. In other words, even after we run s5.sample(3), s5 is the exact same as it was before. If you wanted to change s5 in place, you would have to run:

    s5 = s5.sample(3)
    
### Which would then replace s5 with the 3 element sample. Most things do not change in place unless you use an assignment of some sort. E.g.:

    s5.loc[0] = 10
    s5.loc[1] += 1
    s5        = s5.loc[3:4]
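
The difference between calling a method and reassigning its result can be sketched as follows. The `demo` Series is a stand-in invented for this illustration, and `random_state` is passed only so the sample is repeatable:

```python
import pandas as pd

# `demo` is a hypothetical Series used only for this illustration.
demo = pd.Series([-10, 3, 4, 6, 200, 1, 4])

# Calling a method returns a new object; `demo` itself is untouched.
sampled = demo.sample(3, random_state=0)
print(len(demo))     # still 7 items
print(len(sampled))  # 3 items

# Rebinding the name replaces the Series that `demo` refers to.
demo = demo.sample(3, random_state=0)
print(len(demo))     # now 3 items
```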

## Methods: Identity

Often you will want information about your Series that isn't readily available elsewhere. Series have a specialized set of methods to help with that.

* **[isin](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.isin.html)**: returns a same-indexed Series with True for each item that also appears in a second list (or Series) and False otherwise.
* **[isnull](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.isnull.html)**: returns a same-indexed Series with True for Nones/NaNs and False for valid values.
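* **[nonzero](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.nonzero.html)**: returns an array of the integer positions of all non-zero items. Note: `Series.nonzero()` was removed in pandas 1.0; use `Series.to_numpy().nonzero()` instead.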
* **[notnull](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.notnull.html)**: returns a same-indexed Series with False for Nones/NaNs and True for valid values.

Note: isna() is an alias for isnull(). notna() is an alias for notnull().
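
Because `isin` answers item by item, its result is most often used as a filter. A minimal sketch, using a made-up `crew` Series rather than the notebook's data:

```python
import pandas as pd

# `crew` is a hypothetical Series used only for this illustration.
crew = pd.Series(['MacReady', 'Windows', 'Childs'])

# One True/False per item: is this item in the list?
mask = crew.isin(['Childs', 'MacReady'])
print(mask.tolist())            # [True, False, True]

# The boolean Series can be handed straight to loc[].
print(crew.loc[mask].tolist())  # ['MacReady', 'Childs']
```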

In [None]:
# Create two series
s7 = pd.Series([0, np.nan, 5, 3])
s8 = pd.Series([np.nan, None, 0])

In [None]:
# For each item in s7, is it also in s8?
s7.isin(s8)

In [None]:
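# What are the nonzero positions in s7? (NaN counts as nonzero.)
# Series.nonzero() was removed in pandas 1.0, so go through the Numpy array.
s7.to_numpy().nonzero()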

In [None]:
# What are the null values in s8?
s8.isnull()

In [None]:
# What are the not null values in s8?
s8.notnull()

In [None]:
# Note, this isn't all that useful in itself, but with loc[] ...
bix = s7.notnull()
s7.loc[bix]

## Methods: Transformative

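There are also a bunch of methods for transforming your data. These do everything from changing your Series type to dropping duplicates to clipping values. As with the methods above, they return a new object and leave the original Series unchanged unless you assign the result.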

* **[astype](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.astype.html)**: convert the Series from one datatype to another.
* **[clip](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.clip.html)**: put a ceiling or floor in values for a particular Series.
* **[cumsum](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.cumsum.html)**: create a running cumulative sum for the column.
* **[diff](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.diff.html)**: the difference between an item and the one immediately before it in the series.
* **[drop](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.drop.html)**: drop particular items from a Series by index label.
* **[drop_duplicates](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.drop_duplicates.html)**: drop all duplicate values.
* **[dropna](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.dropna.html)**: drop all NaN values.
* **[fillna](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.fillna.html)**: fill all NaN values with an arbitrary input.
* **[interpolate](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.interpolate.html)**: fill in the gaps between the Series items.
* **[pct_change](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.pct_change.html)**: the percent change between one value and the value immediately previous.
* **[rank](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.rank.html)**: assign a numeric rank to each value (by default, tied values share the average of their ranks).
* **[unique](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.unique.html)**: return all the unique values found in the series.
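
Since each of these methods hands back a new Series, they can be chained one after another. A minimal sketch, with a `raw` Series invented for illustration:

```python
import numpy as np
import pandas as pd

# `raw` is a hypothetical Series used only for this illustration.
raw = pd.Series([3.0, np.nan, 3.0, 7.0])

# Each call returns a new Series, so the calls can be strung together.
cleaned = raw.dropna().drop_duplicates().astype('int64')
print(cleaned.tolist())  # [3, 7]

# The original still has all four items, including the NaN.
print(len(raw))          # 4
```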

### Drop and fill

In [None]:
# Create a series for the drop and fill functions.
s9 = pd.Series([np.nan, 'One', 'Two', 'Two'])
s9

In [None]:
# fillna() fills the null value with a substitute value.
s9.fillna('Unknown')

In [None]:
# Drop NAs simply drops all indexes that have NA values.
s9.dropna()

In [None]:
# Drop duplicates drops all duplicate values so we only have one of each.
s9.drop_duplicates()

In [None]:
# Methods can be chained: drop the duplicates, then drop the NA values.
s9.drop_duplicates().dropna()

In [None]:
# Drop, unsurprisingly, drops stuff from your index.
s9.drop(1)

In [None]:
# Unique is like deduped, but returns a numpy array
s9.unique()

### Type Conversions

Note: [pd.to_numeric()](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.to_numeric.html) is generally better than astype() for converting to a numeric type, and [pd.to_datetime()](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.to_datetime.html) is better for converting to datetimes.

Note: do not fear the float. `float64` is a workhorse. While the '.0' may not appeal to your aesthetic sense, converting to `int64` for the sake of converting to int is more trouble than it's worth. The benefits of floats far outweigh the drawbacks. If you want to make values pretty for export, use strings.
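
One reason `pd.to_numeric()` is often handier than `astype()` is its `errors='coerce'` option, which turns unparseable values into NaN instead of raising. A minimal sketch, with a `messy` Series invented for illustration:

```python
import pandas as pd

# `messy` is a hypothetical Series used only for this illustration.
messy = pd.Series(['1', '2.5', 'not a number'])

# astype() raises as soon as it meets a value it cannot parse.
try:
    messy.astype(float)
except ValueError as error:
    print('astype failed:', error)

# to_numeric() with errors='coerce' converts what it can and uses NaN for the rest.
converted = pd.to_numeric(messy, errors='coerce')
print(converted.isna().tolist())  # [False, False, True]
```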

In [None]:
# As type converts to a different type.
s10 = pd.Series([3, 0, 1])
s10

In [None]:
# s10 as a float
s10.astype(np.float64)

In [None]:
# s10 as a string
s10.astype(str)

In [None]:
# s10 as booleans
s10.astype(bool)

### Miscellaneous Order-Based Transformations

There are a few methods that depend on the relative positions of the items.

In [None]:
s11 = pd.Series([10,9,11])
s11

In [None]:
# cumsum() is for cumulative sum (see also cumprod, cummax, cummin)
s11.cumsum()

In [None]:
# diff() gives the change from the previous item to the current one
s11.diff()

In [None]:
# pct_change() is like diff but on a percentage basis (e.g. stock pricing)
s11.pct_change()

### Miscellaneous Transformations

Items that don't fit elsewhere.

In [None]:
s12 = pd.Series([88, 115, -10, 95, 65])
s12

In [None]:
# Clip sets a ceiling or floor on possible values (here only between 0 and 100)
s12.clip(lower=0, upper=100)

In [None]:
# Rank assigns a numeric rank to each item (ascending or descending)
s12.rank()

In [None]:
# Interpolate fills in any NA gaps with intervening values
pd.Series([0, np.nan, 10, np.nan, 15]).interpolate()

## Methods: Utility

These are useful methods that aren't easily categorized.

* **[copy](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.copy.html)**: create a complete new Series instead of a view of a previous series.
* **[items](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.items.html)**: iterate through each item in the Series as (index, value) tuples. It was formerly also available as `iteritems`, which was removed in pandas 2.0. **Sometimes it's necessary, but odds are if you're using this method, you are probably doing the wrong thing.**
* **[replace](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.replace.html)**: replace Series items with replacement values.
* **[value_counts](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.value_counts.html)**: get the number of each value in the Series as a new Series.

In [None]:
s13 = pd.Series(['MacReady', 'Windows', 'MacReady', 'Childs', 'Childs'])
s13

In [None]:
# Copy gives you an exact copy. Changing the copy won't change the original
s14 = s13.copy()

# Change the original (sub Nauls for Windows)
s13.loc[1] = 'Nauls'

# Doesn't affect the new one
s14

In [None]:
# items() gives you each item as an (index, value) tuple (again, there are usually better ways)
# Note: iteritems() was removed in pandas 2.0; items() is the replacement.
for index, item in s14.items():
    print(item + ' is at index ' + str(index))
    print('Next!')
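
To illustrate why looping is usually the wrong tool, here is a minimal sketch comparing an explicit loop with the equivalent vectorized operation. The `numbers` Series is invented for illustration:

```python
import pandas as pd

# `numbers` is a hypothetical Series used only for this illustration.
numbers = pd.Series([1, 2, 3, 4])

# The loop version: build the result one item at a time.
looped = {}
for index, value in numbers.items():
    looped[index] = value ** 2
looped = pd.Series(looped)

# The vectorized version: one expression, no explicit loop.
vectorized = numbers ** 2

print(looped.tolist())      # [1, 4, 9, 16]
print(vectorized.tolist())  # [1, 4, 9, 16]
```

Both give the same values, but the vectorized form is shorter and lets pandas do the work in bulk.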

In [None]:
# replace will replace values using a scalar
s14.replace('Childs', 'Blair')

# Or a dict
s14.replace({
    'MacReady': 'Russell',
    'Childs'  : 'David'
})

In [None]:
# value_counts() is a convenience method to get the counts of all elements. Returns a Series.
s14.value_counts()

## Methods: Index

These methods work by tweaking the index.

* **[idxmax](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.idxmax.html)**: get the index of the greatest item in the Series (gets first max if multiple).
* **[idxmin](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.idxmin.html#pandas.Series.idxmin)**: get the index of the smallest item in the Series (gets first min if multiple).
* **[reindex](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.reindex.html)**: coerce the Series to use a new index.
* **[reset_index](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.reset_index.html)**: remove index and put in a plain 0 to n range index. Returns a DataFrame.
* **[shift](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.shift.html)**: move the Series data down N items so that the index is now associated with the data at index + N or - N.
* **[sort_index](http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.Series.sort_index.html)**: sort the index based on index values or an arbitrary function.
* **[sort_values](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.sort_values.html)**: sort the Series by its values, ascending or descending.
* **[unstack](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.unstack.html)**: for a multi index, take a level and make it into separate dimensions. Returns a DataFrame.

In [None]:
s15 = pd.Series(
    data=[20, -4, 99, 0], 
    index=['A', 'C', 'B', 'D']
)

s15

In [None]:
# sort_values can sort values ascending or descending
s15.sort_values()

In [None]:
# sort_index can sort index values ascending or descending
s15.sort_index()

In [None]:
# idxmin gives you the index for the smallest item.
minimum_index = s15.idxmin()
minimum_value = s15.loc[minimum_index] # or s15.min()

# idxmax gives you the index for the largest.
maximum_index = s15.idxmax()
maximum_value = s15.loc[maximum_index] # or s15.max()

print(f'Smallest item is {minimum_value} at index \'{minimum_index}\'.')
print(f'Largest item is {maximum_value} at index \'{maximum_index}\'.')

In [None]:
# Reindex makes your Series use a new index
# Items previously in the previous index will be repositioned.
# Items not in the previous index will be np.NaN
# Note: integer columns can't have NaNs, ergo float64.
s15.reindex(['A', 'B', 'F'])

In [None]:
# To write over the index values without changing anything else, use the index attribute.
# Note: this is an assignment and will alter the Series in place.
s15.index = ['F', 'G', 'H', 'I']
s15

In [None]:
# reset_index is a hard reset.
# Your index is replaced with the default. The old index becomes a column, so this returns a DataFrame.
s15.reset_index()

In [None]:
# Shift moves each item up or down X positions, which is useful for time series analysis.
s15.shift(-1)
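
One way to see why `shift` matters for time series: subtracting a shifted copy from the original reproduces `diff()`. A minimal sketch, with a `readings` Series invented for illustration:

```python
import pandas as pd

# `readings` is a hypothetical Series used only for this illustration.
readings = pd.Series([10, 9, 11])

# shift(1) moves every value down one position; the first slot becomes NaN.
previous = readings.shift(1)

# Each value minus the one before it is exactly what diff() computes.
manual_diff = readings - previous
print(manual_diff.equals(readings.diff()))  # True
```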

In [None]:
# Multiindexed series can be unstacked as a dataframe (useful for groupbys later on)
m_ix = pd.MultiIndex.from_product([('Chevy', 'Ford', 'Dodge'), ('Compact', 'Full Size', 'Truck')])
s16 = pd.Series(
    data=list(range(len(m_ix))),
    index=m_ix
)

print(s16)

s16.unstack()

## Methods: Output

These methods output the Series in various formats. Fairly self-explanatory.

* **[to_clipboard](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.to_clipboard.html)**: exports data to the clipboard.
* **[to_csv](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.to_csv.html)**: exports CSVs in various encodings and formats.
* **[to_dict](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.to_dict.html)**: exports index/data as a dict with key/value pairs.
* **[to_excel](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.to_excel.html)**: exports to Microsoft Excel.
* **[to_json](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.to_json.html)**: exports to JavaScript Object Notation.
* **[to_pickle](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.to_pickle.html)**: exports to Python's native serialization format (usually unnecessary).
* **[to_string](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.to_string.html)**: exports to a string that looks like a fixed width table.
* **[tolist](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.tolist.html)**: exports the data only to a native Python list.

### to_x() methods returning data

In [None]:
# Take a Series.
s17 = pd.Series([1, 2.0, True, np.nan, 'last'])

In [None]:
# List of values
s17.tolist()

In [None]:
# Dict with index
s17.to_dict()

In [None]:
# As a printable string.
s17.to_string()

In [None]:
# As json
s17.to_json()

### to_x() methods generating outputs

In [None]:
# CSV is the preferred method.
# You can omit the index if you use the index=False option.
# You can optionally change the encoding.
s17.to_csv('data/csv_output.csv', encoding='cp1252', index=False)
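
If you would rather inspect the CSV text than write a file, `to_csv()` called without a path returns the text as a string. A minimal sketch of a round trip through memory, with a `scores` Series invented for illustration:

```python
import io

import pandas as pd

# `scores` is a hypothetical Series used only for this illustration.
scores = pd.Series([20, -4, 99], index=['A', 'C', 'B'], name='score')

# With no path given, to_csv() returns the CSV text instead of writing a file.
csv_text = scores.to_csv(header=True)
print(csv_text)

# Read the text back in, using the first column as the index.
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)['score']
print(restored.tolist())  # [20, -4, 99]
```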

In [None]:
# Excel (writing .xlsx requires an Excel engine such as openpyxl, which is included in Anaconda)
s17.to_excel('data/excel_output.xlsx')

In [None]:
# Pickle is Python's binary serialization format. Probably unnecessary for Series.
s17.to_pickle('data/pickle_output.p')

In [None]:
# Clipboard if you just want to copy paste.
s17.to_clipboard()

# If you were to Control V:
#
# 0       1
# 1       2
# 2    True
# 3     NaN
# 4    last
# dtype: object

# Additional Learning Resources

* ### [Pandas Series API Reference](https://pandas.pydata.org/pandas-docs/stable/api.html#series)
* ### [Pandas Intro to Data Structures](https://pandas.pydata.org/pandas-docs/stable/dsintro.html)

---

# Next Up: [Advanced Methods](3_advanced_methods.ipynb)

<br>

<img style="margin-left: 0;" src="static/red_panda.jpg" width="20%">

<br>

<div align='left'>
    Image courtesy of <a href='https://commons.wikimedia.org/wiki/File:Red_Panda_-_Nashville_Zoo.jpg'>Pmeenen</a> under the <a href='https://creativecommons.org/licenses/by/2.5/deed.en'>CC BY 2.5</a>
</div>

---