Hi, welcome to our tutorial for NumPy. First, let's discuss what NumPy is. NumPy is a Python library that provides a multidimensional array object, derived objects such as masked arrays, and hundreds of mathematical functions that operate on them. It is one of the core packages for scientific computing in Python, and its versatility and speed make it a common tool for scientists across many disciplines.

What makes NumPy so special is that it's optimized for math, both syntactically for the user and operationally for the computer. Syntactically, you can multiply two arrays element by element with a single expression such as `a * b`, which reads much like mathematical notation. With Python lists, by contrast, you must iterate through both lists and multiply each pair of elements yourself.

Under the hood, NumPy arrays are fixed-type: every element has the same data type and occupies the same number of bytes. A Python list, in contrast, stores references to separate Python objects, and each of those objects carries its own type information, reference count, and value. Storing raw values of a single type lets NumPy use much less memory, and because every element is known to have the same type, NumPy does not need to check each element's type during an operation the way Python must when it loops over a list. Another optimization is that the values of an array are stored contiguously in memory, instead of being scattered across the memory locations that a list's references point to. Contiguous storage makes effective use of the CPU cache and lets NumPy take advantage of SIMD (single instruction, multiple data) vector instructions that process several values at once.
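To make the syntax comparison concrete, here is a minimal sketch that multiplies two sequences element by element, first with plain Python lists and then with NumPy arrays. The values are arbitrary examples.

```python
import numpy as np

a_list = [1, 2, 3, 4]
b_list = [10, 20, 30, 40]

# Plain Python lists: walk through both lists and multiply matching elements
product_list = [x * y for x, y in zip(a_list, b_list)]

# NumPy arrays: * multiplies element by element, no explicit loop needed
a = np.array(a_list)
b = np.array(b_list)
product_arr = a * b

print(product_list)  # [10, 40, 90, 160]
print(product_arr)   # [ 10  40  90 160]

# Careful: * on a list does something else entirely, it repeats the list
print(a_list * 2)    # [1, 2, 3, 4, 1, 2, 3, 4]
```

The last line is a common trap: the same operator means "repeat" for lists but "multiply elementwise" for arrays.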

Let's dig a little deeper into what NumPy offers compared to typical Python lists. First, the common operations: both NumPy arrays and Python lists support inserting and deleting elements at a specific index, appending to the end, and concatenating two sequences. NumPy's capabilities extend well beyond this, however. It makes array and matrix multiplication easy, it works hand in hand with plotting libraries such as Matplotlib, and it can even serve as the numerical backend of simple applications. Lastly, NumPy's computational capabilities are crucial in machine learning: many machine learning libraries, including scikit-learn, which we use in this tutorial, accept and return NumPy arrays.
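The list-style operations mentioned above have NumPy counterparts. A minimal sketch with arbitrary example values; note that, unlike `list.insert` and `list.append`, these functions return a new array instead of modifying the original, because a NumPy array has a fixed size.

```python
import numpy as np

arr = np.array([10, 20, 30])

# Insert the value 15 before index 1 -> [10, 15, 20, 30]
inserted = np.insert(arr, 1, 15)

# Delete the element at index 0 -> [20, 30]
deleted = np.delete(arr, 0)

# Append values at the end -> [10, 20, 30, 40, 50]
appended = np.append(arr, [40, 50])

# Concatenate two arrays end to end -> [10, 20, 30, 1, 2]
joined = np.concatenate([arr, np.array([1, 2])])

# The original array is unchanged by all of the above
print(arr)

# Matrix multiplication uses @ (while * is elementwise)
m = np.array([[1, 2], [3, 4]])
print(m @ m)  # -> [[7, 10], [15, 22]]
```

Because each call copies the data, building an array with repeated `np.append` in a loop is slow; it is usually better to collect values in a list and convert once with `np.array`.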

With all these distinctions in mind and a clear understanding of **why** NumPy is important, let's get set up. The code below imports NumPy and loads the Iris dataset that ships with scikit-learn.

In [1]:
import numpy as np
from sklearn import datasets
iris = datasets.load_iris()

Great! Now we have imported NumPy, along with a dataset to use in our examples. (If either import fails, install the packages first with `pip install numpy scikit-learn`.) Let's go ahead and learn some basic syntax for NumPy arrays and their properties.

In [44]:
# create a 1-D array of 32-bit integers
arr = np.array([0,1,2,3,4], dtype='int32')
print(arr)

[0 1 2 3 4]


Our first array! The beauty of NumPy is that we don't have to stick to 1-D arrays: by nesting lists, we can create 2-D and 3-D arrays too.

In [45]:
arr1 = np.array([[0, 1, 2, 3], [1, 2, 3, 4]])
print(arr1)

[[0 1 2 3]
 [1 2 3 4]]


In [46]:
arr2 = np.array([[[2, 17], [45, 78]], [[88, 92], [60, 76]], [[76, 33], [20, 18]]])
print(arr2)

[[[ 2 17]
  [45 78]]

 [[88 92]
  [60 76]]

 [[76 33]
  [20 18]]]


From these, we can obtain information about different properties of our instantiated arrays.

In [47]:
# ndim: number of dimensions (axes)

print(arr1.ndim)

# shape: length along each axis (rows, columns, ...)

print(arr2.shape)

# dtype: data type of the elements

print(arr1.dtype)

# size: total number of elements

print(arr2.size)

# nbytes: total bytes consumed by the elements

print(arr1.nbytes)

2
(3, 2, 2)
int64
12
64
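The `nbytes` value follows from the other properties: it is the number of elements times the size of one element in bytes (`itemsize`). Here is a small sketch that recreates `arr1` so it runs on its own. The default integer type is platform dependent (`int64` on most Linux and macOS builds, and historically `int32` on Windows before NumPy 2.0), so the byte counts printed on the first line may differ on your machine.

```python
import numpy as np

arr1 = np.array([[0, 1, 2, 3], [1, 2, 3, 4]])

# nbytes = number of elements * bytes per element
print(arr1.size, arr1.itemsize, arr1.nbytes)

# Converting to a smaller dtype shrinks the memory footprint
small = arr1.astype(np.int8)
print(small.dtype, small.nbytes)  # int8 8

# An array holds a single type: mixing ints and floats upcasts everything to float
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64
```

The upcasting on the last line is the "one type per array" rule from the introduction in action.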


We can also create arrays filled with special values. Let's build an array of zeros, an array of ones, an array filled with a constant value, and an identity matrix.

In [48]:
np.zeros((2,2))


array([[0., 0.],
       [0., 0.]])

In [49]:
np.ones((2,3,2))

array([[[1., 1.],
        [1., 1.],
        [1., 1.]],

       [[1., 1.],
        [1., 1.],
        [1., 1.]]])

In [50]:
np.full((2,3), 2)

array([[2, 2, 2],
       [2, 2, 2]])

In [51]:
np.identity(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
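NumPy can also fill an array with random values. A minimal sketch using the `Generator` returned by `np.random.default_rng`; the seed 42 is arbitrary and only there so that repeated runs give the same numbers.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# 2x2 array of floats drawn uniformly from [0, 1)
uniform = rng.random((2, 2))

# 2x3 array of integers from 0 up to (but not including) 10
ints = rng.integers(0, 10, size=(2, 3))

print(uniform)
print(ints)
```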

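Now that we know how to create arrays and display their properties, let's take a preliminary look at our Iris dataset. The measurements live in `iris.data`, which is a NumPy array, and `iris.DESCR` holds a text description of the dataset.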

In [52]:
print(iris.data.ndim)
print(iris.data.shape)
print(iris.data.size)
print(iris.DESCR)


2
(150, 4)
600
.. _iris_dataset:

Iris plants dataset
--------------------

**Data Set Characteristics:**

    :Number of Instances: 150 (50 in each of three classes)
    :Number of Attributes: 4 numeric, predictive attributes and the class
    :Attribute Information:
        - sepal length in cm
        - sepal width in cm
        - petal length in cm
        - petal width in cm
        - class:
                - Iris-Setosa
                - Iris-Versicolour
                - Iris-Virginica
                
    :Summary Statistics:

                    Min  Max   Mean    SD   Class Correlation
    sepal length:   4.3  7.9   5.84   0.83    0.7826
    sepal width:    2.0  4.4   3.05   0.43   -0.4194
    petal length:   1.0  6.9   3.76   1.76    0.9490  (high!)
    petal width:    0.1  2.5   1.20   0.76    0.9565  (high!)

    :Missing Attribute Values: None
    :Class Distribution: 33.3% for each of 3 classes.
    :Creator: R.A. Fisher
    :Donor: Michael Marshall (MARSHALL%PLU@io.arc.
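The summary table in the description can be reproduced with NumPy's aggregation methods, where `axis=0` means "collapse the rows and compute one value per column". To keep the sketch self-contained, it hard-codes the first five rows of the Iris data; calling the same methods on `iris.data` covers all 150 rows and should match the Min, Max, and Mean columns above up to rounding.

```python
import numpy as np

# First five rows of the Iris data: sepal length, sepal width,
# petal length, petal width (all in cm)
sample = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [4.7, 3.2, 1.3, 0.2],
    [4.6, 3.1, 1.5, 0.2],
    [5.0, 3.6, 1.4, 0.2],
])

# axis=0 aggregates down each column, giving one value per feature
print(sample.min(axis=0))   # [4.6 3.  1.3 0.2]
print(sample.max(axis=0))   # [5.1 3.6 1.5 0.2]
print(sample.mean(axis=0))
print(sample.std(axis=0))

# Without an axis, the aggregate covers every element
print(sample.min())         # 0.2
```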

Great! Now that we know how to look at basic properties, let's look at how to access the data within an array. After all, that's what we want to do, right? NumPy indexes a 2-D array as `[row, column]`, and indices start at 0.

In [53]:
dset = iris.data
# element at row index 2, column index 3 (third row, fourth column)
print(dset[2,3])
# the entire first row (row index 0)
print(dset[0,:])
# the entire fourth column (column index 3, petal width)
print(dset[:,3])
# first 2 rows and first 3 columns (slice ends are exclusive)
print(dset[0:2,0:3])


0.2
[5.1 3.5 1.4 0.2]
[0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 0.2 0.2 0.1 0.1 0.2 0.4 0.4 0.3
 0.3 0.3 0.2 0.4 0.2 0.5 0.2 0.2 0.4 0.2 0.2 0.2 0.2 0.4 0.1 0.2 0.2 0.2
 0.2 0.1 0.2 0.2 0.3 0.3 0.2 0.6 0.4 0.3 0.2 0.2 0.2 0.2 1.4 1.5 1.5 1.3
 1.5 1.3 1.6 1.  1.3 1.4 1.  1.5 1.  1.4 1.3 1.4 1.5 1.  1.5 1.1 1.8 1.3
 1.5 1.2 1.3 1.4 1.4 1.7 1.5 1.  1.1 1.  1.2 1.6 1.5 1.6 1.5 1.3 1.3 1.3
 1.2 1.4 1.2 1.  1.3 1.2 1.3 1.3 1.1 1.3 2.5 1.9 2.1 1.8 2.2 2.1 1.7 1.8
 1.8 2.5 2.  1.9 2.1 2.  2.4 2.3 1.8 2.2 2.3 1.5 2.3 2.  2.  1.8 2.1 1.8
 1.8 1.8 2.1 1.6 1.9 2.  2.2 1.5 1.4 2.3 2.4 1.8 1.8 2.1 2.4 2.3 1.9 2.3
 2.5 2.3 1.9 2.  2.3 1.8]
[[5.1 3.5 1.4]
 [4.9 3.  1.4]]
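A few more indexing patterns follow the same `[rows, columns]` form: negative indices count from the end, a third number in a slice sets the step, and a basic slice is a view that shares memory with the original array, so writing to it changes the original unless you call `.copy()`. The sketch hard-codes the first five Iris rows so it runs on its own.

```python
import numpy as np

sample = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [4.7, 3.2, 1.3, 0.2],
    [4.6, 3.1, 1.5, 0.2],
    [5.0, 3.6, 1.4, 0.2],
])

# Last row (negative indices count from the end)
print(sample[-1])        # [5.  3.6 1.4 0.2]

# Every other row (0, 2, 4), first column only
print(sample[::2, 0])    # [5.1 4.7 5. ]

# A slice is a view: modifying it modifies `sample` too
view = sample[0:2, 0:3]
view[0, 0] = 99.0
print(sample[0, 0])      # 99.0

# .copy() produces an independent array
independent = sample[0:2, 0:3].copy()
independent[0, 0] = -1.0
print(sample[0, 0])      # still 99.0
```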
