<a href="https://github.com/theonaunheim">
    <img style="border-radius: 100%; float: right;" src="static/strawberry_thief_square.png" width=10% alt="Theo Naunheim's Github">
</a>

<br style="clear: both">
<hr>
<br>

<h1 align='center'>Basic Methods</h1>

<br>

<div style="display: table; width: 100%">
    <div style="display: table-row; width: 100%;">
        <div style="display: table-cell; width: 50%; vertical-align: middle;">
            <img src="static/xiao_liwu.jpg" width="400">
        </div>
        <div style="display: table-cell; width: 10%">
        </div>
        <div style="display: table-cell; width: 40%; vertical-align: top;">
            <blockquote>
                <p style="font-style: italic;">"Object-oriented programming is an exceptionally bad idea which could only have originated in California."</p>
                <br>
                <p>-Edsger Dijkstra</p>
            </blockquote>
        </div>
    </div>
</div>


<br>

<div align='left'>
    Image courtesy of <a href='https://commons.wikimedia.org/wiki/File:Xiao_Liwu_im_San_Diego_Zoo_-_Foto_2.jpeg'>jballeis</a> under the <a href='https://creativecommons.org/licenses/by-sa/3.0/deed.en'>CC BY-SA 3.0</a>
</div>

<hr>

In [None]:
# Import stuff so we can use libraries.
import numpy as np
import pandas as pd

import matplotlib
%matplotlib inline
matplotlib.style.use('fivethirtyeight')

## Series Attributes and Methods, Generally

As we've mentioned before, a Pandas Series is more than just a data container. Series also have many attributes and methods that you can use. If you're not familiar with object-oriented programming, you can think of attributes as data attached to a Series object, and methods as functions attached to that object that act on its data. Both are accessed via 'dot' ('.') notation.

Note: we are only scratching the surface of what you can do with methods. We're shooting for a broad but shallow base of knowledge.

Note: certain methods may not be available for all types (e.g. Pandas can't give you the standard deviation of a string).
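
To make the note above concrete, here is a minimal sketch of a method that works on numbers but not on text. The variable names are made up for illustration, and the exact exception type may vary between Pandas versions, so the example catches both likely candidates.

```python
import pandas as pd

# Two hypothetical Series: one numeric, one made of strings.
numbers = pd.Series([1, 2, 3, 4])
words = pd.Series(['a', 'b', 'c', 'd'])

# Numeric data supports statistical methods.
numeric_std = numbers.std()
print('std of numbers:', numeric_std)

# Text data does not; Pandas raises an error instead of returning a value.
try:
    words.std()
    string_std_failed = False
except (TypeError, ValueError) as error:
    string_std_failed = True
    print('std() failed on strings:', type(error).__name__)
```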

In [None]:
# Create a series with the name "Example series!"
s1 = pd.Series([1,2,3,4,5,6], name='Example series!')

# And now it has this data attached to it, accessible via dot notation.
print('The series has string \"' + s1.name + '" attached as the attribute "name".' )

# And we can trigger activities based on the data. Methods can be passed arguments.
median = s1.median(skipna=True)
print('The median() method returns the value: ' + str(median))

## Attributes

First, let's examine the more useful attributes (there aren't a lot).

* **dtype**: the NumPy datatype of the Series, which dictates what you can do with your data.
* **name**: the series identifier, which is important for dataframe organization.
* **index**: the attached index for the Series (which has its own attributes/methods).
* **values**: the raw values of the Series as a NumPy array.

In [None]:
for attribute in [s1.dtype, s1.name, s1.index, s1.values]:
    print(attribute)
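
As noted in the list above, the index is an object in its own right. The following sketch uses a hypothetical Series with string labels to show a couple of the index's own attributes and methods, and to confirm that `values` hands back a plain NumPy array.

```python
import numpy as np
import pandas as pd

# A hypothetical Series with explicit string labels.
demo = pd.Series([10, 20, 30], index=['a', 'b', 'c'], name='demo')

# The index has its own methods and attributes.
labels = demo.index.tolist()
n_labels = demo.index.size

# The raw values come back as a NumPy array.
raw = demo.values

print(labels, n_labels, type(raw).__name__, demo.dtype)
```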

## Methods: Selectors

The first group of methods we'll discuss are basic view functions. A lot of times you'll only want to take a peek at a tiny portion of your data. 

* [**head**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.head.html): select the first X items (default 5).
* [**tail**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.tail.html#pandas.DataFrame.tail): select the last X items (default 5).

In [None]:
s1.head(3)

In [None]:
s1.tail()
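
Both selectors also accept a negative count, which means "everything except". The sketch below uses a throwaway Series of ten integers (not part of the examples above) to show the difference.

```python
import pandas as pd

# A throwaway Series holding the integers 0 through 9.
digits = pd.Series(range(10))

# A positive count selects from the start of the Series.
first_two = digits.head(2)

# A negative count excludes items instead of selecting them.
all_but_last_two = digits.head(-2)
all_but_first_two = digits.tail(-2)

print(first_two.tolist())
print(all_but_last_two.tolist())
print(all_but_first_two.tolist())
```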

## Methods: Logical

These are simply vectorized versions of Python's all() and any() functions.

* [**all**](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.all.html): are all of the values equal to True?
* [**any**](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.any.html): are any of the values equal to True?

In [None]:
# Do all of the values == 20?
s3 = pd.Series([20, 20, 30])
s4 = s3 == 20 # [True, True, False]
s4.all()

In [None]:
# Are any of the values True?
s2 = pd.Series([False, True, False])
s2.any()
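
As a rough check of the comparison with Python's built-ins, the sketch below runs both on the same data. It also covers two cases that tend to surprise people: non-boolean values are judged by truthiness, and an empty Series behaves like an empty list. All variable names here are illustrative.

```python
import pandas as pd

flags = pd.Series([True, True, False])

# The Series methods agree with the built-ins applied to the same values.
same_all = bool(flags.all()) == all(flags.tolist())
same_any = bool(flags.any()) == any(flags.tolist())

# Non-boolean values are judged by truthiness: zero counts as False.
counts = pd.Series([1, 0, 2])
counts_all = bool(counts.all())
counts_any = bool(counts.any())

# An empty Series mirrors all([]) and any([]).
empty = pd.Series([], dtype=bool)
empty_all = bool(empty.all())
empty_any = bool(empty.any())

print(same_all, same_any, counts_all, counts_any, empty_all, empty_any)
```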

## Methods: Statistics

Pandas has an array of statistics functions for you to use (but you'll want SciPy for the fancy stats stuff).

* [**corr**](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.corr.html): the statistical correlation between two series.
* [**describe**](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.describe.html): a utility function that provides a group of statistical measures.
* [**max**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.max.html): the maximum value.
* [**mean**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.mean.html): the average (mean) value.
* [**median**](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.median.html): the median (middle) value.
* [**min**](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.min.html): the minimum value.
* [**std**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.std.html): the sample standard deviation.
* [**sum**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.sum.html): the sum of all series elements.

In [36]:
# Medley of statistical functions
s5 = pd.Series([-10, 3, 4, 6, 200, 1, 4])
s6 = pd.Series([100, 5, -100, 0, 150, 0, 2])

print({
    'max': s5.max(),
    'min': s5.min(),
    'mean': s5.mean(),
    'median': s5.median(),
    'sum': s5.sum(),
    'std': s5.std(),
    'corr': s5.corr(s6)
})

{'max': 200, 'min': -10, 'mean': 29.714285714285715, 'median': 4.0, 'sum': 208, 'std': 75.27441859780848, 'corr': 0.6578845456040049}
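
The `std` value above is the sample standard deviation, which divides by n - 1. NumPy's `np.std` divides by n by default, so the two libraries can disagree on the same data. A minimal sketch of the difference, reusing the `s5` values from the cell above:

```python
import numpy as np
import pandas as pd

s5 = pd.Series([-10, 3, 4, 6, 200, 1, 4])

# Pandas defaults to the sample standard deviation (ddof=1).
sample_std = s5.std()

# Passing ddof=0 gives the population standard deviation instead.
population_std = s5.std(ddof=0)

# NumPy's default (ddof=0) matches the population version.
numpy_std = np.std(s5.to_numpy())

print(sample_std, population_std, numpy_std)
```

If you compare results across the two libraries, set `ddof` explicitly on one side so both use the same divisor.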


In [34]:
# The basic describe function (the result is itself a series)
s5.describe()

count      7.000000
mean      29.714286
std       75.274419
min      -10.000000
25%        2.000000
50%        4.000000
75%        5.000000
max      200.000000
dtype: float64
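
Because the result of `describe` is itself a Series, you can pull individual statistics out of it by label. The sketch below also passes the `percentiles` argument to request different cut points; the particular percentiles chosen are just for illustration.

```python
import pandas as pd

s5 = pd.Series([-10, 3, 4, 6, 200, 1, 4])

# Ask for the 10th, 50th, and 90th percentiles instead of the default quartiles.
summary = s5.describe(percentiles=[0.1, 0.5, 0.9])

# Individual statistics are looked up by their index label.
count_from_summary = summary['count']
median_from_summary = summary['50%']

print(summary.index.tolist())
print(count_from_summary, median_from_summary)
```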

In [None]:




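# Transformative (each call returns a new object and leaves s1 unchanged)
s1.astype(float)           # cast the values to another dtype
s1.drop(0)                 # drop the item with index label 0
s1.drop_duplicates()       # keep only the first occurrence of each value
s1.dropna()                # drop missing values
s1.fillna(0)               # replace missing values with 0
s1.interpolate()           # fill missing values by interpolation
s1.cumsum()                # cumulative sum
s1.clip(lower=2, upper=5)  # limit values to the range [2, 5]
s1.unique()                # distinct values as an array
s1.diff()                  # difference from the previous item
s1.rank()                  # rank of each value
s1.pct_change()            # fractional change from the previous item
s1.quantile(0.5)           # value at the given quantile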





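# Identity
s1.hasnans                  # property: does the Series contain missing values?
s1.is_unique                # property: are all values distinct?
s1.is_monotonic_increasing  # property; is_monotonic was removed in Pandas 2.0
s1.empty                    # property: does the Series have no items?
s1.to_numpy().nonzero()     # Series.nonzero was removed in Pandas 1.0
s1.notna()                  # element-wise: is the value present?
s1.notnull()                # alias of notna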

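# Utility
s1.items()          # iterate over (label, value) pairs; iteritems was removed in Pandas 2.0
s1.value_counts()   # how often each value occurs
s1.sample(3)        # three randomly chosen items
s1.replace(1, 100)  # swap one value for another
s1.round()          # round to a number of decimals (default 0)
s1.copy()           # independent copy of the Series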

# Sorting
s1.sort_index()   # sort by index label
s1.sort_values()  # sort by value

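# Index
s1.reindex([0, 1, 2])  # conform the Series to a new set of labels
s1.reset_index()       # Series has no set_index (that is a DataFrame method); assign to s1.index instead
s1.shift(1)            # shift the values by one position along the index
# stack is a DataFrame method; unstack needs a Series with a MultiIndex
s1.idxmax()            # index label of the maximum
s1.idxmin()            # index label of the minimum; argmax/argmin return integer positions instead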

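# To items (export methods)
# to_clipboard, to_csv, to_dict, to_excel, to_frame, to_hdf, to_json, to_latex, to_period,
# to_pickle, to_sql, to_string, to_timestamp, to_xarray, tolist
# (to_dense, to_sparse and to_msgpack only exist in older Pandas releases)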




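# Second
# s1.dt    # datetime accessor; raises AttributeError unless the values are datetime-like
s1.map
s1.apply
s1.groupby
s1.rolling
# s1.str   # string accessor; raises AttributeError unless the values are strings
s1.plot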
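
The outline above raises the question of how `idxmax` relates to `argmax`. A minimal sketch, assuming a recent Pandas release (1.0 or later) and a hypothetical Series with string labels: `idxmax` returns the index label of the largest value, while `argmax` returns its integer position.

```python
import pandas as pd

# A hypothetical Series whose labels differ from its integer positions.
scores = pd.Series([3, 9, 4], index=['x', 'y', 'z'])

# idxmax returns the label of the largest value.
label_of_max = scores.idxmax()

# argmax returns the integer position of the largest value.
position_of_max = int(scores.argmax())

# Both point to the same item: use loc for labels and iloc for positions.
value_by_label = scores.loc[label_of_max]
value_by_position = scores.iloc[position_of_max]

print(label_of_max, position_of_max, value_by_label, value_by_position)
```

With the default integer index (as in `s1`), the two happen to return the same number, which is why the distinction is easy to miss.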

# Additional Learning Resources

* ### [Scikit-Learn Quick Start](http://scikit-learn.org/stable/tutorial/basic/tutorial.html)
* ### [Scikit-Learn Tutorials](http://scikit-learn.org/stable/tutorial/index.html)
* ### [Python Data Science Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/05.00-machine-learning.html). Seriously. Read this.
* ### [Getting Started with Scikit Learn (Part 1)](https://www.youtube.com/watch?v=L7R4HUQ-eQ0)
* ### [Getting Started with Scikit Learn (Part 2)](https://www.youtube.com/watch?v=oGqGxvqA9-k)

---

# Next Up: [Preprocessing](3_preprocessing.ipynb)

<br>

<img style="margin-left: 0;" src="static/log_transform.svg" width="20%">

<br>

<div align='left'>
    Image courtesy of <a href='https://commons.wikimedia.org/wiki/File:Population_vs_area.svg'>Skbkekas</a> under the <a href='https://creativecommons.org/licenses/by-sa/3.0/deed.en'>CC BY-SA 3.0</a>
</div>

---