# NumPy

NumPy is the fundamental package for scientific computing with Python. Among other things, it contains:

- a powerful N-dimensional array object
- sophisticated (broadcasting) functions
- tools for integrating C/C++ and Fortran code
- useful linear algebra, Fourier transform, and random number capabilities

Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.

NumPy is licensed under the BSD license, enabling reuse with few restrictions.

In [None]:
import numpy as np

In [None]:
np.array?

# Array Creation
There are several ways to create arrays. For example, you can create an array from a regular Python list or tuple using the array function. The type of the resulting array is deduced from the type of the elements in the sequences.


In [None]:
p = [1,2,3,4,5,6,7,8]
p = np.array(p)
p
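As a quick check of how the dtype is deduced from the elements, the sketch below builds arrays from an all-integer list, a tuple mixing integers with one float, and a list with an explicit dtype, then inspects the dtype attribute. The exact integer width (int32 or int64) depends on the platform, so the comments only state the kind of type.

```python
import numpy as np

ints = np.array([1, 2, 3])                # all Python ints -> an integer dtype
mixed = np.array((1, 2.5, 3))             # one float upcasts the whole array to float64
forced = np.array([1, 2, 3], dtype=float) # dtype can be given explicitly

print(ints.dtype, mixed.dtype, forced.dtype)
```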

To create sequences of numbers, NumPy provides a function called arange, analogous to Python's range, that returns arrays instead of lists.

In [None]:
a = np.arange(15)
a

In [None]:
c = np.arange(10, 30, 5)
c
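Like range, arange takes start, stop and step arguments, and the stop value is excluded. A minimal check against Python's built-in range, plus a float step, which range does not accept:

```python
import numpy as np

c = np.arange(10, 30, 5)                  # start=10, stop=30 (excluded), step=5
print(c)                                  # [10 15 20 25]
print(list(c) == list(range(10, 30, 5)))  # True: same values as range

f = np.arange(0, 1, 0.25)                 # float steps are allowed, unlike range
print(f)
```

With non-integer steps, floating-point rounding can make the number of elements hard to predict; the NumPy documentation suggests linspace for that case.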

In [None]:
a = np.arange(15).reshape(3, 5)
a

In [None]:
a.shape
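The shape attribute is one of several attributes that describe an array's layout. The sketch below prints shape, ndim (number of axes) and size (total number of elements) for the same 3 x 5 array, and passes -1 to reshape so NumPy infers that dimension from the total size. The name grid is illustrative only.

```python
import numpy as np

grid = np.arange(15).reshape(3, 5)   # 15 values laid out as 3 rows x 5 columns
print(grid.shape)                    # (3, 5)
print(grid.ndim)                     # 2
print(grid.size)                     # 15

# -1 asks reshape to work out this dimension: 15 elements / 5 rows = 3 columns
tall = grid.reshape(5, -1)
print(tall.shape)                    # (5, 3)
```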

The array function transforms sequences of sequences into two-dimensional arrays, sequences of sequences of sequences into three-dimensional arrays, and so on.

In [None]:
b = np.array([(1.5,2,3), (4,5,6)])
b
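Going one level deeper, a list of two 2 x 3 nested lists gives a three-dimensional array. The sketch also confirms that mixing 1.5 with integers, as in the previous cell, makes the whole array float64. The name cube is illustrative only.

```python
import numpy as np

# Two "pages", each with 2 rows of 3 values -> shape (2, 2, 3)
cube = np.array([[[1, 2, 3], [4, 5, 6]],
                 [[7, 8, 9], [10, 11, 12]]])
print(cube.ndim)    # 3
print(cube.shape)   # (2, 2, 3)

b = np.array([(1.5, 2, 3), (4, 5, 6)])
print(b.dtype)      # float64
```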

Often, the elements of an array are originally unknown, but its size is known. Hence, NumPy offers several functions to create arrays with initial placeholder content. These minimize the necessity of growing arrays, an expensive operation.

The function zeros creates an array full of zeros, the function ones creates an array full of ones, and the function empty creates an array without initializing it, so its initial contents are arbitrary and depend on the state of memory. By default, the dtype of the created array is float64.

In [None]:
c = np.zeros((3, 4))
c


In [None]:
d = np.ones((2, 3, 4), dtype=np.int16)  # dtype can also be specified
d


In [None]:
e = np.empty((2, 3))  # uninitialized, output may vary
e
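To illustrate why placeholder arrays avoid the cost of growing an array: np.append returns a new, larger copy on every call, whereas filling a zeros array writes into memory that was allocated once. The sketch builds the same squares both ways; the function names grow and preallocate are illustrative only.

```python
import numpy as np

n = 1000

def grow(n):
    out = np.array([])               # start empty (float64)
    for i in range(n):
        out = np.append(out, i * i)  # allocates and copies a new array on every call
    return out

def preallocate(n):
    out = np.zeros(n)                # allocate once; float64 by default
    for i in range(n):
        out[i] = i * i               # write in place, no reallocation
    return out

print(np.array_equal(grow(n), preallocate(n)))               # True
print(np.zeros(3).dtype, np.ones(3, dtype=np.int16).dtype)   # float64 int16
```

Timing the two functions with timeit should show the preallocated version pulling further ahead as n grows, since grow copies more data on each iteration.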

# Array Performance

In [None]:
# Create an array of 10,000 random integers in the range [0, 1000)
s = np.random.randint(0, 1000, 10000)
s

In [None]:
len(s)

In [None]:
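import timeit  # The timeit module provides a simple way to time small bits of Python code.

def pure_sum():
    # Built-in sum() walks the array one element at a time in Python
    return sum(s)

def numpy_sum():
    # np.sum() performs the whole reduction in compiled code
    return np.sum(s)

n = 10000  # number of times each function is run

t1 = timeit.timeit(pure_sum, number=n)
print('Pure Python sum: {:.4f} s'.format(t1))
t2 = timeit.timeit(numpy_sum, number=n)
print('NumPy sum: {:.4f} s'.format(t2))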

In [None]:
%%timeit -n 10000

summary = 0
for item in s:
    summary+=item

In [None]:
%%timeit -n 10000

summary = np.sum(s)
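The same pattern applies beyond sum: element-wise arithmetic written as a whole-array expression runs in NumPy's compiled loops, while a Python for loop handles one element per iteration. A minimal sketch on a fresh random array of the same kind; the names data, loop_scale and vector_scale are illustrative only.

```python
import timeit
import numpy as np

data = np.random.randint(0, 1000, 10000)   # same kind of data as s above

def loop_scale():
    out = np.empty(len(data))
    for i in range(len(data)):
        out[i] = data[i] * 2.5 + 1          # one element per Python iteration
    return out

def vector_scale():
    return data * 2.5 + 1                   # whole-array (vectorized) expression

print(np.allclose(loop_scale(), vector_scale()))   # True: same result
print('Loop:       {:.4f} s'.format(timeit.timeit(loop_scale, number=10)))
print('Vectorized: {:.4f} s'.format(timeit.timeit(vector_scale, number=10)))
```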

# Array Indexing and Slicing 
One-dimensional arrays can be indexed, sliced and iterated over, much like lists and other Python sequences.

In [None]:
x = np.arange(10)**3

x

In [None]:
x[2]

In [None]:
x[2:5]

In [None]:
y = np.arange(0,1000,10)
z = y.reshape(10,10)
z

In [None]:
alpha = z[0:4]
alpha

In [None]:
beta = z[0:4,0:3]
beta

In [None]:
gamma = z[0:5, :]
gamma
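One important difference from Python lists: a basic NumPy slice is a view onto the same data, not a copy, so writing to it changes the original array. Calling .copy() gives an independent array. Negative indices and steps work as they do for lists. The names cubes, view and safe are illustrative only.

```python
import numpy as np

cubes = np.arange(10) ** 3      # 0, 1, 8, 27, 64, 125, 216, 343, 512, 729
print(cubes[-1])                # 729: a negative index counts from the end
print(cubes[::3])               # every third element: 0, 27, 216, 729

view = cubes[2:5]               # a view onto cubes, not a copy
view[0] = -1                    # this also modifies cubes
print(cubes[2])                 # -1

safe = cubes[2:5].copy()        # an independent copy
safe[0] = 100
print(cubes[2])                 # still -1
```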

# ArcPy 
FeatureClassToNumPyArray

http://desktop.arcgis.com/en/arcmap/10.3/analyze/arcpy-data-access/featureclasstonumpyarray.htm

arcpy.da.FeatureClassToNumPyArray (in_table, field_names, {where_clause}, {spatial_reference}, {explode_to_points}, {skip_nulls}, {null_value})

TableToNumPyArray

http://desktop.arcgis.com/en/arcmap/10.3/analyze/arcpy-data-access/tabletonumpyarray.htm

arcpy.da.TableToNumPyArray (in_table, field_names, {where_clause}, {skip_nulls}, {null_value})
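Both functions return a NumPy structured array: a one-dimensional array whose elements are records with named fields, one per entry in field_names. Since arcpy is only available with an ArcGIS installation, the sketch below builds a small structured array by hand as a hypothetical stand-in; the field names (OBJECTID, NAME, POP) and values are made up for illustration.

```python
import numpy as np

# Hypothetical stand-in for what arcpy.da.TableToNumPyArray might return.
# -99999 marks a null, as if null_value=-99999 had been passed.
records = np.array(
    [(1, 'Parcel A', 30720), (2, 'Parcel B', 18650), (3, 'Parcel C', -99999)],
    dtype=[('OBJECTID', 'i4'), ('NAME', 'U20'), ('POP', 'i4')],
)

print(records.dtype.names)   # ('OBJECTID', 'NAME', 'POP')
print(records['NAME'])       # one field across all records (a column)
print(records[0])            # one record (a row)

# Filter on a field, similar in spirit to a where_clause
valid = records[records['POP'] != -99999]
print(len(valid))            # 2
```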

# Pandas

Pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python.

The two primary data structures of pandas, Series (1-dimensional) and DataFrame (2-dimensional), handle the vast majority of typical use cases in finance, statistics, social science, and many areas of engineering. For R users, DataFrame provides everything that R’s data.frame provides and much more. pandas is built on top of NumPy and is intended to integrate well within a scientific computing environment with many other 3rd party libraries.

Here are just a few of the things that pandas does well:

- Easy handling of missing data (represented as NaN) in floating point as well as non-floating point data
- Size mutability: columns can be inserted and deleted from DataFrame and higher dimensional objects
- Automatic and explicit data alignment: objects can be explicitly aligned to a set of labels, or the user can simply ignore the labels and let Series, DataFrame, etc. automatically align the data for you in computations
- Powerful, flexible group by functionality to perform split-apply-combine operations on data sets, for both aggregating and transforming data
- Easy conversion of ragged, differently-indexed data in other Python and NumPy data structures into DataFrame objects
- Intelligent label-based slicing, fancy indexing, and subsetting of large data sets
- Intuitive merging and joining of data sets
- Flexible reshaping and pivoting of data sets
- Hierarchical labeling of axes (possible to have multiple labels per tick)
- Robust IO tools for loading data from flat files (CSV and delimited), Excel files, databases, and saving / loading data from the ultrafast HDF5 format
- Time series-specific functionality: date range generation and frequency conversion, moving window statistics, moving window linear regressions, date shifting and lagging, etc.

Many of these principles are here to address the shortcomings frequently experienced using other languages / scientific research environments. For data scientists, working with data is typically divided into multiple stages: munging and cleaning data, analyzing / modeling it, then organizing the results of the analysis into a form suitable for plotting or tabular display. pandas is the ideal tool for all of these tasks.

# The Series Data Structure

Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index. 

In [None]:
import pandas as pd


In [None]:
pd.Series?

In [None]:
import pandas as pd

animals = ['Tiger', 'Bear', 'Moose']

pd.Series(animals)


In [None]:
rando = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])

rando

In [None]:
animals = ['Tiger', 'Bear', None]
pd.Series(animals)
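How pandas stores a missing value depends on the other values. In a numeric Series, None is converted to NaN and the dtype becomes float64; in the string Series above, pandas versions of this era keep None in an object-dtype column. In either case isnull() flags the missing entry. The names labels and numbers are illustrative only.

```python
import pandas as pd

labels = pd.Series(['Tiger', 'Bear', None])
numbers = pd.Series([1, 2, None])

print(numbers.dtype)               # float64: None became NaN
print(labels.isnull().tolist())    # [False, False, True]
print(numbers.isnull().tolist())   # [False, False, True]
```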

In [None]:
sports = {'Archery': 'Bhutan',
          'Golf': 'Scotland',
          'Sumo': 'Japan',
          'Taekwondo': 'South Korea'}
s = pd.Series(sports)
s

In [None]:
s.index

In [None]:
s.iloc[3]

In [None]:
s.loc['Golf']

In [None]:
s['Golf']
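iloc selects by integer position and loc by index label. The distinction matters once the index itself contains integers: in the sketch below, which uses hypothetical integer codes as labels, label 99 and position 0 refer to the same entry, while label 0 does not exist.

```python
import pandas as pd

codes = pd.Series({99: 'Bhutan', 100: 'Scotland', 101: 'Japan', 102: 'South Korea'})

print(codes.iloc[0])     # Bhutan: first position
print(codes.loc[99])     # Bhutan: label 99
print(codes.loc[100])    # Scotland: label 100
# codes.loc[0] would raise a KeyError because there is no label 0
```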

# The DataFrame Data Structure

The DataFrame is pandas' primary data structure. It is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Arithmetic operations align on both row and column labels. It can be thought of as a dict-like container for Series objects.

In [None]:
pd.DataFrame?

In [None]:
purchase_1 = pd.Series({'Name': 'Chris',
                        'Item Purchased': 'Dog Food',
                        'Cost': 22.50})
purchase_2 = pd.Series({'Name': 'Kevin',
                        'Item Purchased': 'Kitty Litter',
                        'Cost': 2.50})
purchase_3 = pd.Series({'Name': 'Betty',
                        'Item Purchased': 'Bird Seed',
                        'Cost': 5.00})
purchase_4 = pd.Series({'Name': 'Sally',
                        'Item Purchased': 'Snails',
                        'Cost': 12.00})
df = pd.DataFrame([purchase_1, purchase_2, purchase_3, purchase_4], index=['Store 1', 'Store 1', 'Store 2','Store 3'])
df.head()

In [None]:
df.loc['Store 2']

In [None]:
type(df.loc['Store 2'])

In [None]:
df.loc['Store 1']

In [None]:
df.loc['Store 1', 'Cost']
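Because the label 'Store 1' appears twice in the index, df.loc['Store 1'] returns a DataFrame holding both rows, whereas the unique label 'Store 2' returns a Series. A self-contained check on a smaller copy of the purchase table; the name purchases is illustrative only.

```python
import pandas as pd

purchases = pd.DataFrame(
    [{'Name': 'Chris', 'Item Purchased': 'Dog Food', 'Cost': 22.50},
     {'Name': 'Kevin', 'Item Purchased': 'Kitty Litter', 'Cost': 2.50},
     {'Name': 'Betty', 'Item Purchased': 'Bird Seed', 'Cost': 5.00}],
    index=['Store 1', 'Store 1', 'Store 2'],
)

print(type(purchases.loc['Store 1']).__name__)     # DataFrame (duplicate label)
print(type(purchases.loc['Store 2']).__name__)     # Series (unique label)
print(purchases.loc['Store 1', 'Cost'].tolist())   # [22.5, 2.5]
print(purchases.loc[:, 'Cost'].sum())              # 30.0
```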

In [None]:
df.iloc[2]

Let's transpose our dataframe. 

In [None]:
df.T

In [None]:
df.index

In [None]:
ccc = df.T
ccc

In [None]:
ccc.index
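Transposing swaps the two axes: the column labels of the original frame become the row labels of its transpose, and the row labels become column labels. Because each new column now mixes numbers and strings, it ends up with object dtype. A minimal check on a small hypothetical frame:

```python
import pandas as pd

small = pd.DataFrame({'Cost': [22.5, 2.5], 'Name': ['Chris', 'Kevin']},
                     index=['Store 1', 'Store 2'])
flipped = small.T

print(list(flipped.index))        # ['Cost', 'Name']
print(list(flipped.columns))      # ['Store 1', 'Store 2']
print(flipped['Store 1'].dtype)   # object: mixes 22.5 and 'Chris'
```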

# Querying a DataFrame

In [None]:
df['Cost'] > 6

In [None]:
expensive = df.where(df['Cost'] > 6)
expensive

In [None]:
expensive['Cost'].count()

In [None]:
only_expensive = expensive.dropna()
only_expensive
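where() keeps the frame's shape and fills the non-matching rows with NaN, which is why dropna() is needed afterwards. Passing the Boolean Series directly to the indexing operator selects the matching rows in one step. A sketch with the same purchase data; the names purchases, masked and direct are illustrative only.

```python
import pandas as pd

purchases = pd.DataFrame({'Name': ['Chris', 'Kevin', 'Betty', 'Sally'],
                          'Item Purchased': ['Dog Food', 'Kitty Litter', 'Bird Seed', 'Snails'],
                          'Cost': [22.50, 2.50, 5.00, 12.00]},
                         index=['Store 1', 'Store 1', 'Store 2', 'Store 3'])

masked = purchases.where(purchases['Cost'] > 6)   # same shape; other rows become NaN
print(len(masked))                                # 4
only_expensive = masked.dropna()                  # drop the all-NaN rows

direct = purchases[purchases['Cost'] > 6]         # Boolean indexing in one step
print(direct['Name'].tolist())                    # ['Chris', 'Sally']
print(only_expensive['Name'].tolist() == direct['Name'].tolist())   # True
```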

# ArcPy

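You can convert an ArcGIS feature class to a pandas DataFrame with the following function, which passes the structured array returned by arcpy.da.FeatureClassToNumPyArray to the DataFrame constructor.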

In [None]:
import arcpy  # requires an ArcGIS installation
from pandas import DataFrame

def feature_class_to_pandas_data_frame(feature_class, field_list):
    """
    Load data into a Pandas Data Frame for subsequent analysis.
    :param feature_class: Input ArcGIS Feature Class.
    :param field_list: Fields for input.
    :return: Pandas DataFrame object.
    """
    return DataFrame(
        arcpy.da.FeatureClassToNumPyArray(
            in_table=feature_class,
            field_names=field_list,
            skip_nulls=False,
            null_value=-99999
        )
    )
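Because the function passes null_value=-99999, nulls arrive in the DataFrame as -99999 rather than NaN, and would distort statistics such as mean(). One option is to convert the sentinel back to NaN after loading. arcpy is not needed to illustrate this, so the sketch uses a hand-built structured array as a hypothetical stand-in for the FeatureClassToNumPyArray output; the field names OBJECTID and DEPTH are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for arcpy.da.FeatureClassToNumPyArray output
arr = np.array([(1, 30.0), (2, -99999.0), (3, 50.0)],
               dtype=[('OBJECTID', 'i4'), ('DEPTH', 'f8')])

loaded = pd.DataFrame(arr)                 # one column per structured-array field
loaded = loaded.replace(-99999, np.nan)    # sentinel -> NaN
print(loaded['DEPTH'].mean())              # 40.0: NaN is skipped by mean()
```

This assumes -99999 never occurs as a genuine value in the data; if it can, a different null_value would be needed.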

In [None]:
df.describe()

In [None]:
df_ds = df.describe()  # describe() already returns a DataFrame
df_ds

In [None]:
stats = df_ds.iloc[0:2, :].values  # first two rows (count and mean) as a NumPy array
stats

In [None]:
stats[0][0]
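Positional indexing such as [0][0] depends on the row order of describe(). Selecting by label reads more clearly, since describe() labels its rows count, mean, std, min, 25%, 50%, 75% and max. A sketch using the four purchase costs; the name cost_stats is illustrative only.

```python
import pandas as pd

costs = pd.DataFrame({'Cost': [22.50, 2.50, 5.00, 12.00]})
cost_stats = costs.describe()

print(list(cost_stats.index))            # ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']
print(cost_stats.loc['count', 'Cost'])   # 4.0
print(cost_stats.loc['mean', 'Cost'])    # 10.5
print(cost_stats.loc[['count', 'mean']].values)   # the same rows as iloc[0:2, :]
```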