# Introduction to NumPy
Learning NumPy
## Differences between lists and NumPy Arrays
- An array's size is fixed once it is created. You cannot append, insert or remove elements in place, like you can with a list
- All of an array's elements must be of the same data type
- A NumPy array behaves in a Pythonic fashion: you can call `len(my_array)` just as you would expect
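The differences above can be seen in a few lines. This is a minimal sketch; the variable names (`mixed`, `arr`, `longer`) are made up for illustration.

```python
import numpy as np

mixed = [3, 4, 3.483]            # a list can hold ints and floats side by side
arr = np.array(mixed)            # the array converts everything to one dtype

print(arr.dtype)                 # float64
print(len(arr))                  # 3

# An ndarray has no in-place append method.
print(hasattr(arr, "append"))    # False

# np.append builds and returns a NEW array; the original keeps its size.
longer = np.append(arr, 5)
print(len(arr), len(longer))     # 3 4
```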

In [1]:
import numpy as np
np.__version__

'1.16.4'

In [2]:
gpas_as_list = [3,4,3.483]
gpas_as_list.append(5)
gpas_as_list.insert(1,'whateves')
gpas_as_list.pop(1)

'whateves'

In [3]:
gpas = np.array(gpas_as_list)

In [4]:
?gpas

In [5]:
gpas.dtype

dtype('float64')

In [6]:
gpas.itemsize

8

## About data types

* By choosing the proper [data type](https://docs.scipy.org/doc/numpy-1.14.0/user/basics.types.html) you can greatly reduce the size required to store objects
* Data types are maintained by wrapping values in a [scalar representation](https://docs.scipy.org/doc/numpy-1.14.0/reference/arrays.scalars.html)
* `np.zeros` is a handy way to create a new array pre-filled with zeros. The data type defaults to `float64` unless you pass another one.
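As a rough illustration of how much the data type affects storage, the sketch below compares the same 100-element array under two dtypes. The names `minutes_default` and `minutes_small` are illustrative only.

```python
import numpy as np

minutes_default = np.zeros(100)             # dtype defaults to float64 (8 bytes each)
minutes_small = np.zeros(100, np.uint16)    # 2 bytes per element

print(minutes_default.dtype, minutes_default.nbytes)   # float64 800
print(minutes_small.dtype, minutes_small.nbytes)       # uint16 200

# The smaller type has a smaller range, so check that your values fit.
print(np.iinfo(np.uint16).max)              # 65535

# A value read back out is wrapped in a NumPy scalar, not a plain int.
print(type(minutes_small[0]))               # <class 'numpy.uint16'>
```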

In [7]:
study_minutes = np.zeros(100,np.uint16)
study_minutes
#np.zeros defaults to float64, which prints as `0.` with a trailing dot; with np.uint16 the output shows plain integers

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=uint16)

In [8]:
%whos

Variable        Type       Data/Info
------------------------------------
gpas            ndarray    4: 4 elems, type `float64`, 32 bytes
gpas_as_list    list       n=4
np              module     <module 'numpy' from '/Us<...>kages/numpy/__init__.py'>
study_minutes   ndarray    100: 100 elems, type `uint16`, 200 bytes


In [9]:
study_minutes[0]=150

In [10]:
first_day_minutes = study_minutes[0]

In [11]:
first_day_minutes

150

In [12]:
type(first_day_minutes)

numpy.uint16

In [13]:
study_minutes[1] = 60

In [14]:
study_minutes[2:6] = [80,60,30,90]
study_minutes

array([150,  60,  80,  60,  30,  90,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0], dtype=uint16)

In [15]:
student_gpas = np.array([
    [4.0,3.286,3.5,4.0],
    [3.2,3.8,4.0,4.0],
    [3.96,3.92,4.0,4.0]
],np.float16)
student_gpas

array([[4.   , 3.285, 3.5  , 4.   ],
       [3.2  , 3.8  , 4.   , 4.   ],
       [3.96 , 3.92 , 4.   , 4.   ]], dtype=float16)

In [16]:
student_gpas.ndim

2

In [17]:
student_gpas.shape

(3, 4)

In [18]:
student_gpas.size

12

In [19]:
len(student_gpas)

3

In [20]:
student_gpas.itemsize

2

In [21]:
student_gpas.itemsize*student_gpas.size

24

In [22]:
np.info(student_gpas)

class:  ndarray
shape:  (3, 4)
strides:  (8, 2)
itemsize:  2
aligned:  True
contiguous:  True
fortran:  False
data pointer: 0x7fad3b765ec0
byteorder:  little
byteswap:  False
type: float16


In [23]:
student_gpas[2]

array([3.96, 3.92, 4.  , 4.  ], dtype=float16)

In [24]:
student_gpas[2][3]

4.0

## Multidimensional Arrays

* The data structure is actually called `ndarray`, representing any **n**umber of **d**imensions
* Arrays can have multiple dimensions, which you declare on creation
* Dimensions help define what each element in the array represents. A two-dimensional array is just an array of arrays
* **Rank** defines how many dimensions an array contains
* **Shape** defines the length of each of the array's dimensions
* Each dimension is also referred to as an **axis**, and they are zero-indexed. Multiples are called **axes**.
* A 2d array is also known as a **matrix**.
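The terms above map directly onto array attributes. A small sketch, using made-up arrays `grid` and `cube`:

```python
import numpy as np

grid = np.ones((3, 4))          # shape declared on creation: 3 rows, 4 columns
print(grid.ndim)                # 2  (the rank)
print(grid.shape)               # (3, 4)
print(len(grid))                # 3  (length of axis 0 only)

cube = np.zeros((2, 3, 4))      # three dimensions
print(cube.ndim)                # 3

# Axes are zero-indexed: axis 0 runs down the rows, axis 1 across the columns.
print(grid.sum(axis=0).shape)   # (4,)  one total per column
print(grid.sum(axis=1).shape)   # (3,)  one total per row
```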

In [25]:
#re-assign days 3 through 6 (indices 2 to 5) of study_minutes with a single slice assignment
study_minutes[2:6] = [80,60,30,90]

In [26]:
study_minutes = np.array([
    study_minutes,
    np.zeros(100, np.uint16)
])

In [27]:
study_minutes.shape

(2, 100)

In [28]:
#set round 2 day 1 to 60
study_minutes[1][0]= 60

In [29]:
study_minutes[1,0]

60

In [30]:
#Python treats comma-separated values as a tuple, even without parentheses
1,0

(1, 0)

In [31]:
rand = np.random.RandomState(42)
fake_log = rand.randint(30,180,size = 100, dtype=np.uint16)
fake_log

array([132, 122, 128,  44, 136, 129, 101,  95,  50, 132, 151,  64, 104,
       175, 117, 146, 139, 129, 133, 176,  98, 160, 179,  99,  82, 142,
        31, 106, 117,  56,  98,  67, 121, 159,  81, 170,  31,  50,  49,
        87, 179,  51, 116, 177, 118,  78, 171, 117,  88, 123, 102,  44,
        79,  31, 108,  80,  59, 137,  84,  93, 155, 160,  67,  80, 166,
       164,  70,  50, 102, 113,  47, 131, 161, 118,  82,  89,  81,  43,
        81,  38, 119,  52,  82,  31, 159,  57, 113,  71, 121, 140,  91,
        70,  37, 106,  64, 127, 110,  58,  93,  79], dtype=uint16)

In [32]:
[fake_log[3],fake_log[8]]

[44, 50]

In [78]:
fake_log[[3,8]]

array([44, 50], dtype=uint16)

In [34]:
index=np.array([
    [3,8],
    [0,1]
])
fake_log[index]

array([[ 44,  50],
       [132, 122]], dtype=uint16)

In [35]:
#axis tells np.append which axis to append the data along
#with no axis given, both inputs are flattened first
study_minutes = np.append(study_minutes, [fake_log], axis=0)
study_minutes

array([[150,  60,  80,  60,  30,  90,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0],
       [ 60,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   

In [36]:
study_minutes[1,1] = 360

## Creation 
* You can create a random but bounded set of values using the `np.random` package.
  * `RandomState` lets you seed your randomness in a way that is repeatable.
* You can append a row in a couple of ways
   * You can use the `np.append` function with `axis=0`. Make sure the new row has the same length as the existing rows; `np.append` returns a new array rather than modifying the original.
   * You can create/reassign a new array by including the existing array as part of the iterable in creation.
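A compact sketch of both ideas, seeding and row appending. The array names (`log_a`, `log_b`, `table`) are hypothetical, and no specific random values are assumed.

```python
import numpy as np

# Two generators with the same seed produce the same bounded values.
rand_a = np.random.RandomState(42)
rand_b = np.random.RandomState(42)
log_a = rand_a.randint(30, 180, size=5)    # integers in [30, 180)
log_b = rand_b.randint(30, 180, size=5)
print(np.array_equal(log_a, log_b))        # True

table = np.zeros((2, 5), dtype=log_a.dtype)

# Option 1: np.append along axis 0; the new row is wrapped in a list
# so that it has shape (1, 5) and lines up with the existing rows.
appended = np.append(table, [log_a], axis=0)

# Option 2: rebuild the array from the existing rows plus the new one.
rebuilt = np.array([table[0], table[1], log_a])

print(appended.shape, rebuilt.shape)       # (3, 5) (3, 5)

# Without axis, np.append flattens everything into one dimension.
print(np.append(table, [log_a]).shape)     # (15,)
```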


## Indexing
* You can use an indexing shortcut by separating dimensions with a comma.  
* You can index using a `list` or `np.array` of positions. The values at those positions are pulled out into a new array. This is known as fancy indexing.
  * The resulting array's shape matches the shape of the index array. Be careful to distinguish between the tuple shortcut (`arr[1, 0]`) and fancy indexing (`arr[[1, 0]]`).
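The difference between the tuple shortcut and fancy indexing is easy to miss, so here is a small sketch on toy arrays (`grid`, `log` and `idx` are illustrative names).

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)   # rows: [0..3], [4..7], [8..11]

# Tuple shortcut: one index per dimension, returns a single element.
print(grid[1, 0])                    # 4, same as grid[1][0]

# Fancy indexing: a list of positions along axis 0, returns whole rows.
print(grid[[1, 0]].tolist())         # [[4, 5, 6, 7], [0, 1, 2, 3]]

# The result takes the shape of the index array.
log = np.array([10, 20, 30, 40, 50])
idx = np.array([[3, 0],
                [1, 4]])
print(log[idx].tolist())             # [[40, 10], [20, 50]]
print(log[idx].shape)                # (2, 2)
```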

In [37]:
fake_log[fake_log < 60]

array([44, 50, 31, 56, 31, 50, 49, 51, 44, 31, 59, 50, 47, 43, 38, 52, 31,
       57, 37, 58], dtype=uint16)

In [38]:
study_minutes[study_minutes<60]

array([30,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0, 44, 50, 31, 56, 31, 50, 49, 51, 44, 31, 59,
       50, 47, 43, 38, 52, 31, 57, 37, 58], dtype=uint16)

In [41]:
#combine two boolean arrays element by element; the result can be used as a boolean indexer
np.array([False,True,True]) & np.array([True,False,True])

array([False, False,  True])

In [42]:
study_minutes[(study_minutes < 60) & (study_minutes > 0)]

array([30, 44, 50, 31, 56, 31, 50, 49, 51, 44, 31, 59, 50, 47, 43, 38, 52,
       31, 57, 37, 58], dtype=uint16)

In [43]:
study_minutes[study_minutes<60] = 0

In [44]:
study_minutes

array([[150,  60,  80,  60,   0,  90,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0],
       [ 60, 360,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   

## Boolean Array Indexing
* You can create a boolean array by using comparison operators on an array.
  * You can use boolean arrays for fancy indexing.
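  * Boolean arrays can be combined using the bitwise operators (`&`, `|`)
      * Do not use the `and` or `or` keywords; they do not work element by element.
      * Wrap each comparison in parentheses when combining, because `&` and `|` bind more tightly than comparison operators such as `<`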
* Even though boolean indexing returns a new array, you can update an existing array using a boolean index.
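The points above can be sketched on a small array; `minutes` and `short` are illustrative names, not part of the notebook's data.

```python
import numpy as np

minutes = np.array([150, 60, 30, 0, 90, 45], dtype=np.uint16)

# Each comparison yields a boolean array; parentheses are required because
# & binds more tightly than < and >.
short = (minutes < 60) & (minutes > 0)
print(short.tolist())            # [False, False, True, False, False, True]

# Reading with a boolean index returns a new array...
print(minutes[short].tolist())   # [30, 45]

# ...but assigning through a boolean index updates the original in place.
minutes[short] = 0
print(minutes.tolist())          # [150, 60, 0, 0, 90, 0]
```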

In [45]:
fruit = ['apple', 'banana', 'cherry', 'durian']

In [47]:
fruit[1:3]

['banana', 'cherry']

In [48]:
fruit[:3]

['apple', 'banana', 'cherry']

In [49]:
fruit[3:]

['durian']

In [50]:
fruit[:]

['apple', 'banana', 'cherry', 'durian']

In [51]:
copied = fruit[:]

In [52]:
copied[3] = 'cheese'
#slicing a list returns a copy

In [53]:
fruit,copied

(['apple', 'banana', 'cherry', 'durian'],
 ['apple', 'banana', 'cherry', 'cheese'])

In [55]:
#step to get every other element in the list
fruit[::2]

['apple', 'cherry']

In [56]:
#a negative step walks backwards, returning every element in reverse order
fruit[::-1]

['durian', 'cherry', 'banana', 'apple']

In [57]:
np.arange(20)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19])

In [59]:
practice = np.arange(42)
practice.shape = (7,6)
practice

array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35],
       [36, 37, 38, 39, 40, 41]])

In [60]:
practice[2]

array([12, 13, 14, 15, 16, 17])

In [64]:
practice[2:5,3::2]

array([[15, 17],
       [21, 23],
       [27, 29]])

In [71]:
# basic slicing of an ndarray returns a view and not a copy!
not_copied = practice[:]
practice[0,0] = 384
practice,not_copied

(array([[384,   1,   2,   3,   4,   5],
        [  6,   7,   8,   9,  10,  11],
        [ 12,  13,  14,  15,  16,  17],
        [ 18,  19,  20,  21,  22,  23],
        [ 24,  25,  26,  27,  28,  29],
        [ 30,  31,  32,  33,  34,  35],
        [ 36,  37,  38,  39,  40,  41]]),
 array([[384,   1,   2,   3,   4,   5],
        [  6,   7,   8,   9,  10,  11],
        [ 12,  13,  14,  15,  16,  17],
        [ 18,  19,  20,  21,  22,  23],
        [ 24,  25,  26,  27,  28,  29],
        [ 30,  31,  32,  33,  34,  35],
        [ 36,  37,  38,  39,  40,  41]]))

In [66]:
#check whether an array owns its data (base is None) or is just a view of something else
practice.base is None

True

In [67]:
not_copied.base is None

False

In [68]:
not_copied.base is practice

True

In [69]:
practice.flags['OWNDATA'], not_copied.flags['OWNDATA']

(True, False)

## Slicing
* Works a lot like normal list slicing.
* You can use commas to separate each dimension slice.
* Basic slicing always returns a view of the data, not a copy
* You can access the base object using the `ndarray.base` property
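A short sketch of view behaviour, using hypothetical names (`owner`, `window`, `independent`). It assumes basic slicing only, meaning slices and integers rather than index lists.

```python
import numpy as np

owner = np.arange(12)
owner.shape = (3, 4)                 # reshape in place, so owner still owns its data

# One slice per dimension, separated by a comma:
# rows 1 onward, every other column.
window = owner[1:, ::2]
print(window.tolist())               # [[4, 6], [8, 10]]
print(window.base is owner)          # True: window is a view of owner

# Writing through the view changes the original.
window[0, 0] = 99
print(owner[1, 0])                   # 99

# Call .copy() when you need independent data.
independent = owner[1:, ::2].copy()
independent[0, 0] = -1
print(owner[1, 0])                   # still 99
print(independent.base is None)      # True
```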

In [72]:
practice_view = practice.reshape(3,14)
practice, practice_view, practice_view.base is practice

(array([[384,   1,   2,   3,   4,   5],
        [  6,   7,   8,   9,  10,  11],
        [ 12,  13,  14,  15,  16,  17],
        [ 18,  19,  20,  21,  22,  23],
        [ 24,  25,  26,  27,  28,  29],
        [ 30,  31,  32,  33,  34,  35],
        [ 36,  37,  38,  39,  40,  41]]),
 array([[384,   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,
          13],
        [ 14,  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,  26,
          27],
        [ 28,  29,  30,  31,  32,  33,  34,  35,  36,  37,  38,  39,  40,
          41]]),
 True)

In [74]:
#reshape can infer what you want if you input a -1
practice.reshape(-1,2).shape

(21, 2)

In [75]:
#ravel flattens to a single dimension; it returns a view when possible rather than a copy
#use flatten() to always get a copy
practice.ravel()

array([384,   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,
        13,  14,  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,
        26,  27,  28,  29,  30,  31,  32,  33,  34,  35,  36,  37,  38,
        39,  40,  41])
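One way to check whether `ravel` and `flatten` share memory with the original is `np.shares_memory`. The sketch below uses a small stand-in array called `grid`.

```python
import numpy as np

grid = np.arange(6)
grid.shape = (2, 3)

raveled = grid.ravel()        # a view here, because grid is contiguous
flattened = grid.flatten()    # always a copy

print(np.shares_memory(grid, raveled))     # True
print(np.shares_memory(grid, flattened))   # False

# ravel falls back to a copy when a view is not possible,
# for example on a transposed (non-contiguous) array.
print(np.shares_memory(grid, grid.T.ravel()))   # False
```

If you plan to modify the flattened result, `flatten()` is the safer choice because its behaviour does not depend on the memory layout.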

In [80]:
#search NumPy's docstrings for a keyword
np.lookfor("flat")

Search results for 'flat'
-------------------------
numpy.diagflat
    Create a two-dimensional array with the flattened input as a diagonal.
numpy.flatiter
    Flat iterator object to iterate over arrays.
numpy.put
    Replaces specified elements of an array with given values.
numpy.flatnonzero
    Return indices that are non-zero in the flattened version of a.
numpy.ravel
    Return a contiguous flattened array.
numpy.ma.diagflat
    Create a two-dimensional array with the flattened input as a diagonal.
numpy.unravel_index
    Converts a flat index or array of flat indices into a tuple
numpy.matrix.flatten
    Return a flattened copy of the matrix.
numpy.ma.flatten_mask
    Returns a completely flattened version of the mask, where nested fields
numpy.chararray.flatten
    Return a copy of the array collapsed into one dimension.
numpy.chararray.put
    Set ``a.flat[n] = values[n]`` for all `n` in indices.
numpy.ravel_multi_index
    Converts a tuple of index arrays into an array of fl

In [81]:
#get information on the method
np.ravel?

In [84]:
#solving system of equations
#left side
orders = np.array([
    [2, 0, 0, 0],
    [4, 1, 2, 2],
    [0, 1, 0, 1],
    [6, 0, 1, 2]
])
#right side
totals = np.array([3, 20.50, 10, 14.25])
#solve
prices = np.linalg.solve(orders,totals)
prices

array([1.5 , 8.  , 1.25, 2.  ])

In [85]:
#matrix product of orders and prices using the @ operator
orders @ prices

array([ 3.  , 20.5 , 10.  , 14.25])

In [86]:
#the dot method returns the same totals as the @ operator above
orders.dot(prices)

array([ 3.  , 20.5 , 10.  , 14.25])
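The cells above treat each row of `orders` as one equation (item quantities times unknown prices equals the order total). A hedged sketch of how the solution could be checked, reusing the same numbers:

```python
import numpy as np

# Left side: quantity of each of the four items in each order.
orders = np.array([
    [2, 0, 0, 0],
    [4, 1, 2, 2],
    [0, 1, 0, 1],
    [6, 0, 1, 2],
])
# Right side: total paid for each order.
totals = np.array([3, 20.50, 10, 14.25])

prices = np.linalg.solve(orders, totals)

# Multiplying back should reproduce the totals (up to floating point error),
# so compare with np.allclose rather than ==.
print(np.allclose(orders @ prices, totals))                # True
print(np.allclose(orders @ prices, orders.dot(prices)))    # True
print(np.allclose(prices, [1.5, 8.0, 1.25, 2.0]))          # True
```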

In [88]:
a,b = np.split(np.arange(1,11),2)
a,b

(array([1, 2, 3, 4, 5]), array([ 6,  7,  8,  9, 10]))

In [89]:
#arithmetic operators work element by element on arrays
#they are overloaded to call universal functions (ufuncs) such as np.add
a+b

array([ 7,  9, 11, 13, 15])

In [90]:
a-b

array([-5, -5, -5, -5, -5])

In [91]:
b-a

array([5, 5, 5, 5, 5])

In [92]:
a*b

array([ 6, 14, 24, 36, 50])

In [93]:
#the value 2 is broadcast across the array
a+2

array([3, 4, 5, 6, 7])

In [95]:
a + np.repeat(2,5)

array([3, 4, 5, 6, 7])

In [96]:
#you can add any two broadcastable arrays; this example broadcasts the smaller array across each row
x1 = np.arange(9.0).reshape((3,3))
x2 = np.arange(3.0)
x1,x2

(array([[0., 1., 2.],
        [3., 4., 5.],
        [6., 7., 8.]]), array([0., 1., 2.]))

In [97]:
np.add(x1,x2)
#other ufuncs, such as the trig functions, also run over every value in the array at once
#see the NumPy ufunc documentation for more information

array([[ 0.,  2.,  4.],
       [ 3.,  5.,  7.],
       [ 6.,  8., 10.]])

## Universal Functions
* [ufuncs](https://docs.scipy.org/doc/numpy/reference/ufuncs.html) are commonly needed vectorized functions
  * Vectorized functions allow you to operate element by element without writing an explicit loop
* The standard math and comparison operators have all been overloaded so that they can make use of vectorization
* Values can be [broadcast](https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html), or stretched to a compatible shape, before the ufunc is applied.
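To tie these points together, here is a brief sketch comparing an explicit loop with the vectorized form, followed by two broadcasting cases. The names `looped`, `vectorized` and `col` are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])

# An explicit loop and the vectorized operator give the same result.
looped = np.array([x + 2 for x in a])
vectorized = a + 2                          # the + operator calls np.add
print(np.array_equal(looped, vectorized))   # True
print(isinstance(np.add, np.ufunc))         # True

# Broadcasting a 1-D array across each row of a 2-D array.
x1 = np.arange(9.0).reshape(3, 3)
x2 = np.arange(3.0)
print((x1 + x2)[0].tolist())                # [0.0, 2.0, 4.0]

# Reshaping to a column broadcasts down each column instead.
col = x2.reshape(3, 1)
print((x1 + col)[:, 0].tolist())            # [0.0, 4.0, 8.0]

# Other ufuncs apply to every element at once.
print(np.sqrt(np.array([1.0, 4.0, 9.0])).tolist())   # [1.0, 2.0, 3.0]
```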