# Tutorial 2.1: NumPy Introduction
Python for Data Analytics | Module 2  
Professor James Ng

In [None]:
# SETUP CODE - DON'T MODIFY THIS
# JUST EXECUTE IT
import pandas as pd
pd.set_option('display.max_colwidth', None)  # None means "no limit"; -1 is rejected by newer pandas versions

Today we move beyond learning the basics of Python as a programming language and start to learn how we can utilize some incredible open-source packages which are written in Python to perform data science.

## NumPy in 30 Seconds
NumPy, short for *Numerical Python*, is a package written by the Python community which you can **import** into your programs. It provides a highly efficient way to store and operate on data sets through its custom data type, the **NumPy Array** (*`ndarray`*).

In many ways, *`NumPy`* arrays are like Python's built-in `list` type, which you learned about in Module 1. The key differences are that *`NumPy`* arrays provide (1) lightning-fast operations when dealing with a data set of (2) homogeneous data (i.e. all integers, all floats, etc.).

Equivalent operations (e.g. getting the sum of 1000 numbers) will often be 10-100 times faster when using `ndarray` objects vs. `list` objects, although the exact speed-up depends on the operation and the size of the data.
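
To get a feel for the difference on your own machine, here is a minimal timing sketch. The array size (100,000) and the repetition count (50) are arbitrary choices for illustration, and the timings you see will vary from machine to machine, so no expected output is shown.

```python
import timeit

import numpy as np

# The same 100,000 integers stored two ways
values_list = list(range(100_000))
values_array = np.array(values_list, dtype=np.int64)

# Both approaches produce the same total
list_total = sum(values_list)
array_total = values_array.sum()

# Time 50 repetitions of each approach
list_seconds = timeit.timeit(lambda: sum(values_list), number=50)
array_seconds = timeit.timeit(lambda: values_array.sum(), number=50)

print("Totals match:", list_total == array_total)
print("list sum took: ", list_seconds, "seconds")
print("array sum took:", array_seconds, "seconds")
```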

**Note:** Python's standard library also provides an `array` type (in the `array` module) that efficiently stores elements of a single data type. But NumPy arrays still provide significant benefits in comparison to the native Python `array` because of the methods (operations) that the `NumPy` authors built into them.

## Heterogeneous Lists & Performance Problems
In the previous tutorial, we introduced the *`list`* data type. In our examples so far, all of the elements in a given list have been of the same type.

But this is not a requirement in Python.  You can place any number of different data types in a single list. For example: 

In [None]:
mixed_type_list = ['Hi', 3, True, None, 124.55, 'Surprise']
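
This flexibility is where the performance problem comes from: because each element can be a different type, Python has to keep track of the type of every element individually. The short sketch below simply lists the type of each element in the list above (`element_types` is a name chosen for this illustration).

```python
mixed_type_list = ['Hi', 3, True, None, 124.55, 'Surprise']

# Collect the name of each element's type
element_types = [type(item).__name__ for item in mixed_type_list]

print(element_types)
# ['str', 'int', 'bool', 'NoneType', 'float', 'str']
```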

## Installing NumPy
Thus far, we've only interacted with objects and types that are part of Python's **Standard Library**. These are the objects that are part of the core Python language and don't require any additional installation.

*`NumPy`* is a third-party package or library that is not a part of the standard library, so normally it has to be installed into our environment. That said, all the popular Python distributions have it preinstalled these days. So does Vocareum.

## Importing NumPy
Before you can use any of `NumPy's` functionality, you'll need to import it into your programs.

In [None]:
# This should always go at the top of your program
import numpy

By convention, you'll find that most people in the Python community will import *`NumPy`* using *`np`* as an alias:

In [None]:
import numpy as np

Once you've imported the numpy package as `np` you can interact with it as you would any other object. You can use `help()`, `?` and tab completion to start to learn what is available on it or any of its nested objects, functions, etc.

## The Basics of NumPy Arrays

### Arrays can be Created from Lists
The simplest way to create an `ndarray` is to pass a normal `list` to the `np.array()` constructor function like so:

In [None]:
list_of_ints = [0, 1, 2, 3, 4, 5]
int_array = np.array(list_of_ints)
int_array

Now, you may have noticed that when we created an `ndarray` object from `list_of_ints` above, it didn't print out `ndarray([0, 1, 2, 3, 4, 5])` as the object representation, but `array([0, 1, 2, 3, 4, 5])`.

Make sure not to let this cause you problems when checking for the type of an object with `isinstance()` or `type()`. The real class of the object is `ndarray`. It just says "array" as a convenience... but it can be confusing.

In [None]:
# Notice what the type of the object actually is
type(int_array)

In [None]:
# If you try isinstance() with the wrong class name,
# you'll get a NameError because `array` is not a defined name here
isinstance(int_array, array)
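
For comparison, here is a minimal sketch of the check that does work, assuming NumPy has been imported as `np` as shown earlier. The class to test against is `np.ndarray`.

```python
import numpy as np

int_array = np.array([0, 1, 2, 3, 4, 5])

# The class name is ndarray, even though the printed form says "array"
print(type(int_array).__name__)           # ndarray

# Use np.ndarray when checking the type
print(isinstance(int_array, np.ndarray))  # True
```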

### Passing Different Datatypes to a NumPy Array
If you try to pass in a list with different data types, *`NumPy`* will attempt to convert them to a single type (this is called 'upcasting'). For example, let's say we had a list of student grades for our course that looked like this:

```python
student_grades = [87.5, 91, 86, 100.5] 
# Wow, somebody is so smart they got a 100.5
```

The first and fourth elements in this list are `float` objects while the second and third elements are `int` objects. Let's see what happens when we convert this list to a NumPy array:

In [None]:
np.array([87.5, 91, 86, 100.5])

You can see that the `int` objects were converted to `float` objects.
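
One way to confirm the conversion is to look at the array's `dtype` attribute and at one of the elements that started out as an `int`. This is a small sketch; `grades` is a name chosen for the illustration.

```python
import numpy as np

grades = np.array([87.5, 91, 86, 100.5])

# Every element now shares one floating-point type
print(grades.dtype)  # float64

# The second element was the int 91 in the original list
print(grades[1])     # 91.0
```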

#### Only One Data Type Allowed
Remember that `ndarray` objects can only hold elements of a single data type. That is why NumPy converted the `int` objects to `float` objects in this example.

Be careful to avoid having `str` objects in the lists you pass to the `np.array()` constructor function. If you do, NumPy will upcast all the elements to strings, which will make performing mathematical calculations a real mess.

In [None]:
bad_list_data = [12, 523, 23424, 'YUCKY']
nonideal_array = np.array(bad_list_data)

# Notice when this is printed that all the elements
# of the array will be strings (i.e. have quotes around them).
# Almost certainly not what you want...
nonideal_array
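
To see why this is "a real mess", the sketch below checks the kind of data the array holds and then attempts a simple addition. The `math_failed` flag is a hypothetical name used only to record what happened.

```python
import numpy as np

nonideal_array = np.array([12, 523, 23424, 'YUCKY'])

# A dtype kind of 'U' means the elements are Unicode strings
print(nonideal_array.dtype.kind)  # U

# Even the numbers are strings now
first_element = nonideal_array[0]
print(repr(str(first_element)))   # '12'

# Arithmetic on a string array fails
math_failed = False
try:
    nonideal_array + 1
except TypeError as error:
    math_failed = True
    print("Addition failed:", error)
```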

### Forcing an Array's Data Type
You also have the option of forcing all `ndarray` elements to have a specific data type.  Here I'll take a `list` of `float` objects and coerce them into `int` objects by specifying the **`dtype`** parameter on the `np.array` function.

In [None]:
# Notice that this doesn't round anything!
# Just drops the fractional values
list_of_floats = [11.2, 57.6, 95.1]
np.array(list_of_floats, dtype=int)
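
If you want rounding rather than truncation, one option is to round first and convert afterwards. The sketch below compares the two approaches using `np.round` and the `astype` method; the variable names are chosen for illustration.

```python
import numpy as np

list_of_floats = [11.2, 57.6, 95.1]

# dtype=int drops the fractional part
truncated = np.array(list_of_floats, dtype=int)
print(truncated)  # [11 57 95]

# Rounding first, then converting, gives the nearest integers
rounded = np.round(list_of_floats).astype(int)
print(rounded)    # [11 58 95]
```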

Sometimes you can't coerce a data type, for instance when there is a string in the source list that can't be converted to a numeric type.

In [None]:
# This will work because the strings
# can be interpreted as numbers.
np.array([1, '2.5', 14, '-19'], dtype=float)

In [None]:
# But this will fail...
np.array([1, '2.5 tacos', 14, '-19 burgers'], dtype=float)
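
The failure shows up as a `ValueError`. If you would rather handle it than let the program stop, a minimal sketch looks like this (`conversion_failed` is a hypothetical flag used only for the illustration):

```python
import numpy as np

raw_values = [1, '2.5 tacos', 14, '-19 burgers']

try:
    np.array(raw_values, dtype=float)
    conversion_failed = False
except ValueError as error:
    # NumPy reports the first string it could not convert
    conversion_failed = True
    print("Could not build the array:", error)
```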

The `dtype` parameter can take many different values in addition to these standard Python types: `int`, `float`, `bool`, `complex`.
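
As a brief illustration, the sketch below passes a NumPy-specific type (`np.int8`), a type given by name as a string (`'float32'`), and the standard `bool` type. These are only a few of the accepted values.

```python
import numpy as np

# A NumPy-specific 8-bit integer type
small_ints = np.array([1, 2, 3], dtype=np.int8)

# A type can also be given by name as a string
single_floats = np.array([1, 2, 3], dtype='float32')

# With bool, zero becomes False and non-zero numbers become True
flags = np.array([0, 1, 2], dtype=bool)

print(small_ints.dtype, single_floats.dtype, flags.dtype)  # int8 float32 bool
print(flags.tolist())  # [False, True, True]
```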

### Arrays can be Multi-Dimensional
Ok, this might make your head hurt a little bit. 

NumPy arrays can be multi-dimensional. While in theory an array can have any number of dimensions, as a beginner you will most commonly work with 2-dimensional data. So let's look at a two-dimensional array, which can be thought of as a grid with x and y coordinates.

In [None]:
# Don't focus on how this is being created. Just focus on the output.
two_dimensional_array = np.array([range(i, i + 3) for i in [2, 4, 6]])
two_dimensional_array
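
If the `range` expression above is distracting, the same grid can be written out by hand as a list of lists, where each inner list becomes one row. The sketch below builds it both ways and uses `np.array_equal` to confirm they match (`grid_from_lists` is a name chosen for the illustration).

```python
import numpy as np

# The version from the cell above
two_dimensional_array = np.array([range(i, i + 3) for i in [2, 4, 6]])

# The same data written out explicitly, one inner list per row
grid_from_lists = np.array([[2, 3, 4],
                            [4, 5, 6],
                            [6, 7, 8]])

print(np.array_equal(two_dimensional_array, grid_from_lists))  # True
```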

And now, just so that you know what they look like when displayed, here is an example of a 3-dimensional array that has 4 elements in each dimension, which you could think of as a three-dimensional cube.

In [None]:
three_dimensional_array = np.ones(shape=(4, 4, 4))
three_dimensional_array

## NumPy Array Data Attributes
Like all objects, *`NumPy`* *`ndarray`* objects have a set of data attributes. Let's take a moment to discuss some that you may find helpful:

1. `ndim`
1. `shape`
1. `size`
1. `dtype`

To get started, let's create 3 arrays.

In [None]:
x1 = np.random.randint(10, size=6)  # One-dimensional array
x2 = np.random.randint(10, size=(3, 4))  # Two-dimensional array
x3 = np.random.randint(10, size=(3, 4, 5))  # Three-dimensional array

print("One Dimensional Array", x1, '\n', sep='\n')
print("Two Dimensional Array", x2, '\n', sep='\n')
print("Three Dimensional Array", x3, '\n', sep='\n')

### `ndim` Attribute
Returns the number of dimensions a given array has.

In [None]:
print(x1.ndim, x2.ndim, x3.ndim)

### `shape` Attribute
Returns the number of elements in each dimension of an array. For our `x2` array we'd get this output:

In [None]:
x2.shape

In [None]:
# Now, reprint the x2 array
x2


You can see that the `(3, 4)` returned by `x2.shape` is referring to the fact that our array had three rows and four columns in each row.
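
Because `shape` is an ordinary tuple, you can unpack it into separate variables. The sketch below uses a small array of ones rather than the random `x2` so that the result is predictable; `table`, `rows` and `columns` are names chosen for the illustration.

```python
import numpy as np

table = np.ones(shape=(3, 4))

# Unpack the shape tuple: rows first, then columns
rows, columns = table.shape
print(rows, columns)  # 3 4

# The length of the shape tuple is the number of dimensions
print(len(table.shape) == table.ndim)  # True
```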

In [None]:
### Get the shape of x3 and explain what it means.
x3.shape


### `size` Attribute
Returns the total number of elements across all dimensions of an array.

For our one-dimensional array, `size` will be 6.  
Our two-dimensional array has dimensions of 3 and 4 elements, so `size` will be 12.  
Our three-dimensional array has dimensions of 3, 4, and 5. Therefore it will have a total size of 60 elements.

In [None]:
print("x1 size:", x1.size)
print("x2 size:", x2.size)
print("x3 size:", x3.size)
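
Put differently, `size` is the product of the numbers in `shape`. The sketch below checks this for the three shapes used above, building arrays of ones instead of random integers so the example is self-contained (`sizes` is a name chosen for the illustration).

```python
import math

import numpy as np

sizes = []
for shape in [(6,), (3, 4), (3, 4, 5)]:
    array = np.ones(shape=shape)
    # size should equal the product of the dimensions
    print(shape, array.size, math.prod(shape))
    sizes.append(array.size)

# Printed lines:
# (6,) 6 6
# (3, 4) 12 12
# (3, 4, 5) 60 60
```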

### `dtype` Attribute
Previously, we passed a `dtype` argument to the `np.array` constructor function to coerce the data type of the array's elements.

It comes up again here, but this time as an attribute of all NumPy arrays. Unsurprisingly, it tells you the data type of the elements held inside the array.

In [None]:
float_array = np.array([1, 2, 3.1])
float_array.dtype
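
For comparison, here is a short sketch showing the `dtype` of a few arrays built from different kinds of data. The exact integer type NumPy picks can differ between platforms, so the sketch prints the more general `dtype.kind` code for the integer and string arrays.

```python
import numpy as np

float_array = np.array([1, 2, 3.1])
int_array = np.array([1, 2, 3])
bool_array = np.array([True, False])
text_array = np.array(['a', 'b'])

print(float_array.dtype)      # float64
print(bool_array.dtype)       # bool

# kind is a one-letter code: 'i' for signed integers, 'U' for Unicode strings
print(int_array.dtype.kind)   # i
print(text_array.dtype.kind)  # U
```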