<b><font size = 3>NumPy</font></b><br>
NumPy, short for Numerical Python, is one of the most important foundational packages for <b>numerical computing</b> in Python. It is designed for <b>efficiency on large arrays of data</b>:<br>
<ul>
    <li>stores data internally in a contiguous block of memory</li>
    <li>uses much less memory than built-in Python sequences</li>
    <li>performs computations on entire arrays at once (vectorized), without explicit Python for loops</li>
    <li>implements its core routines in C, a fast compiled language</li>
</ul>
        
       
We will study:
<ol>
<!--    <li>ndarray object internals</li> -->
    <li>creating ndarrays</li>
    <li>vectorization (arithmetic with NumPy arrays)</li>
    <li>indexing and slicing</li>
    <li>pseudorandom number generation</li>
    <li>universal functions</li>
    <li>array-oriented programming</li>
    <li>linear algebra</li>
    <li>concatenating and splitting</li>
    <li>broadcasting</li>
    <li>reading and writing csv files</li>
</ol>




#1 creating an array<br>
#---one-dimensional arrays (1-D arrays)<br>
#---multidimensional arrays: an ndarray can be seen as a 1-D array of arrays<br>
#---shape and data type of an array: shape, dtype, reshape<br>
#---NumPy-based algorithms are generally much faster than their pure Python counterparts, often 10-100 times<br>
#---special arrays: numpy.zeros(), numpy.ones(), numpy.empty(), numpy.eye()
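
The attributes and special-array functions listed above can be tried out with a short sketch. The variable names are illustrative only, and the integer dtype that NumPy infers may differ between platforms.

```python
import numpy as np

# A 1-D array with 6 elements.
a = np.arange(6)
print(a.shape)   # (6,)

# reshape arranges the same 6 values as 2 rows x 3 columns.
m = a.reshape(2, 3)
print(m.shape)   # (2, 3)

# Special arrays (note the name is eye, not eyes).
z = np.zeros((2, 3))        # filled with 0.0, float64 by default
o = np.ones(4, dtype=int)   # the dtype can be requested explicitly
i3 = np.eye(3)              # 3x3 identity matrix

# np.empty only allocates memory; its contents are arbitrary until assigned.
e = np.empty((2, 2))
print(z.dtype, i3.shape, e.shape)
```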

In [72]:
#---we need to import the package numpy
#---we will use 3 functions (array, asarray, arange) to create an array:
#---a = np.array(a_sequence) => creates a NEW array containing the data passed in a_sequence
#---a = np.asarray(a_sequence) => creates a NEW array only if NEEDED (an input that is already an ndarray is returned without copying)
#---a = np.arange([start], stop, [step]) => a NumPy version of the range() function. It creates a NEW array


import numpy as np

#---creating 1darrays

a1d = np.array([1, 2, 3, 4, 5]) 
print(a1d)

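a1d1 = np.asarray([11, 22, 33])  # the input is a list, so a new array is created

print(a1d1)
a1d[0] = 100  # a1d and a1d1 are separate arrays, so a1d1 is unaffected
print(a1d1)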
#---creating ndarrays
a = np.array([[1, 2, 3],[4, 5, 6], [7, 8, 9]])
print(a)



[1 2 3 4 5]
[11 22 33]
[11 22 33]
[[1 2 3]
 [4 5 6]
 [7 8 9]]
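
The difference between array() and asarray() only shows up when the input is already an ndarray. The following is a minimal sketch of that case, plus one arange() call; the names src, copy and alias are made up for illustration.

```python
import numpy as np

src = np.array([1, 2, 3])

copy = np.array(src)      # array() copies by default
alias = np.asarray(src)   # src is already an ndarray, so no copy is made

src[0] = 100
print(copy)          # [1 2 3]
print(alias)         # [100   2   3]
print(alias is src)  # True

# arange(start, stop, step): stop is excluded, like range().
r = np.arange(2, 10, 3)
print(r)             # [2 5 8]
```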


In [73]:
#2 vectorization
#--- element-wise operations on equal-size arrays: +, -, *, /, >, >=, <, <=, ==, !=, **, >>, <<, ...
#--- operations with scalars propagate the scalar argument to each element in the array


a1 = np.array([1, 2, 3])
a2 = np.array([2, 2, 2])
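b = a1 ** a2   # element-wise power: [1 4 9]
b = a1 << a2   # element-wise left shift (overwrites b): [ 4  8 12]
print(b)

b = a1 * 100   # the scalar 100 is applied to each element: [100 200 300]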

[ 4  8 12]
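
As a further sketch of vectorization, the example below combines an element-wise product, a scalar operation and a comparison, and checks the result against an explicit Python loop. The prices and qty arrays are hypothetical sample data.

```python
import numpy as np

prices = np.array([10.0, 20.0, 30.0])   # hypothetical sample data
qty = np.array([1, 2, 3])

total = prices * qty         # element-wise product of two equal-size arrays
print(total)                 # [10. 40. 90.]

discounted = prices * 0.5    # the scalar is applied to every element
print(discounted)            # [ 5. 10. 15.]

expensive = prices >= 20     # comparisons are element-wise and give booleans
print(expensive)             # [False  True  True]

# The same product written as an explicit Python loop, for comparison.
loop_total = [p * q for p, q in zip(prices, qty)]
```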


In [75]:
#3 indexing and slicing
#--- indexing and slicing are how you select element(s) from an array.
#--- select just one element: a[i]; a[i][j]; a[i,j]
#--- select multiple elements
#------in a contiguous range of indices: slicing (returns a view on the original data)
#------not in a contiguous range of indices (creates a new array): (1) boolean indexing and (2) integer array indexing.

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

#1 get me the first five elements (indices 0 to 4; the stop index 5 is excluded)
b = a[0:5]
b[0] = 100  # b is a view, so this also changes a[0]

#2 get me numbers >= 5 (a[0] is now 100, so it is selected too)
mask = a >= 5
b = a[mask]
print(b)

#3 get me elements in the positions 1, 3, 5

indices = [1, 3, 5]
b = a[indices]

#4 get me all rows which have the value 5 (your turn)



[100   5   6   7   8   9  10]
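
The view-versus-copy distinction above can be checked directly with np.shares_memory(). This sketch also shows the two equivalent ways of indexing a 2-D array; the array names are illustrative.

```python
import numpy as np

a = np.arange(1, 11)      # [ 1  2 ... 10]

s = a[2:5]                # slicing: a view on a
f = a[[2, 3, 4]]          # integer array indexing: a new array
m = a[a > 8]              # boolean indexing: a new array

print(np.shares_memory(a, s))   # True
print(np.shares_memory(a, f))   # False

s[0] = -1                 # writes through to a
f[0] = 999                # does not touch a
print(a[2])               # -1

grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(grid[1, 2])         # 6
print(grid[1][2])         # 6, same element
print(grid[:2, 1:])       # first two rows, last two columns
```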


In [64]:
#4 Pseudorandom number generation
#---generate an array of sample values from a distribution
#---note: because of randomness you may get different sample values in
#---different runs. The seed makes sure you get the same
#---sample values every time you run the code.
rng = np.random.default_rng(seed=12345)
a = rng.standard_normal(10)
#print(a)

b = np.array([1, 2, 3, 4, 5])
#---note: np.random.permutation uses NumPy's legacy global generator, which the
#---seed above does not control; rng.permutation(b) is the seeded alternative
permu1 = np.random.permutation(b)
print(permu1)


[1 4 3 5 2]
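
To see what the seed buys you, the sketch below creates two generators with the same seed and confirms that they produce identical samples and identical permutations. The names rng1 and rng2 are illustrative.

```python
import numpy as np

rng1 = np.random.default_rng(seed=12345)
rng2 = np.random.default_rng(seed=12345)

# Same seed, same sequence of samples.
x1 = rng1.standard_normal(3)
x2 = rng2.standard_normal(3)
print(np.array_equal(x1, x2))   # True

# Generator.permutation returns a shuffled copy and is controlled by the seed.
b = np.array([1, 2, 3, 4, 5])
p1 = rng1.permutation(b)
p2 = rng2.permutation(b)
print(np.array_equal(p1, p2))   # True
print(b)                        # [1 2 3 4 5], the input is left unchanged
```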


In [77]:
#5 universal functions
#---They perform element-wise operations on ndarrays => very fast.
#---There are 2 kinds of universal functions: unary and binary
#---unary: takes just one ndarray as an input parameter.
#---binary: takes 2 ndarrays of the same size (or of broadcast-compatible shapes) as inputs
#---Note: many universal functions have equivalent operators. For example,
#---you can use the add() function or the + operator to do the same thing:
#---adding 2 ndarrays

#---example: unary function "square"
a = np.array([1, 3, 5, 7, 9])
b = np.square(a)
print(b)

#---example: binary function "add"
c = np.add(a,b)
print(c)

[ 1  9 25 49 81]
[ 2 12 30 56 90]
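
A few more universal functions, as a short sketch: it checks that add() and + agree, and shows one binary function without an operator form (maximum) and one unary function (sqrt). The second array b here is made up for the example.

```python
import numpy as np

a = np.array([1, 3, 5, 7, 9])
b = np.array([9, 7, 5, 3, 1])

# The function form and the operator form give the same result.
print(np.array_equal(np.add(a, b), a + b))   # True

# Binary ufunc: element-wise maximum of two arrays.
print(np.maximum(a, b))                      # [9 7 5 7 9]

# Unary ufunc: element-wise square root.
print(np.sqrt(np.array([1.0, 4.0, 9.0])))    # [1. 2. 3.]

# A binary ufunc also accepts a scalar as one of its inputs.
print(np.add(a, 10))                         # [11 13 15 17 19]
```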


In [86]:
#6 array-oriented programming
#--- like function-oriented programming, where we divide a program into
#--- subprograms, called functions, and combine them to solve a problem,
#--- array-oriented programming expresses a computation as operations on whole
#--- arrays to take advantage of NumPy's fast numerical routines.
#--- the key for this is vectorization, which you have learned,
#--- and now we will see some functions in this direction.
#--- the takeaway here is to avoid for loops as much as you can.

#--- conditional logic: numpy.where()
#--- mathematical and statistical functions: sum, mean, std, max, min,
#--- cumsum (cumulative sum), cumprod (cumulative product), argmin, argmax
#--- sorting: the sort() function
#--- see the section 4.4 for more functions

#--- z = np.where(condition, x, y) => each element of z is taken from x where condition is true, and from y otherwise
#--- z and condition are numpy arrays with the same size
#--- x and y can be arrays or scalars. If they are arrays, they must have the same size
#--- as condition

#--- x, y, z and mask have the same size
x,y = np.array([1, 2, 3, 4, 5]), np.array([11, 12, 13, 14, 15])
mask = np.array([True, False, True, False, True])
z = np.where(mask, x, y)
print (z)

#--- y is just a scalar: 0
z = np.where(mask, x, 0)
print(z)
#z= ?

#--- mean, min, max ...
meanz = np.mean(z)
print(meanz)
meanz = z.mean()
print(meanz)


[ 1 12  3 14  5]
[1 0 3 0 5]
1.8
1.8
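
The other functions named in this cell follow the same pattern. The sketch below applies several of them to the array z = [1 0 3 0 5] from the example above, and replaces a counting loop with a vectorized expression.

```python
import numpy as np

z = np.array([1, 0, 3, 0, 5])

print(z.sum())               # 9
print(z.max(), z.argmax())   # 5 4  (largest value and its index)
print(z.cumsum())            # [1 1 4 4 9]
print(np.sort(z))            # [0 0 1 3 5], a sorted copy; z itself is unchanged

# Counting without a for loop: True counts as 1 when summed.
count = (z > 0).sum()
print(count)                 # 3

# where() with two scalars: label each element by a condition.
labels = np.where(z > 0, 1, -1)
print(labels)                # [ 1 -1  1 -1  1]
```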


In [None]:
#7 linear algebra
# your turn: read and practice the linear algebra section at home
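
As a starting point for that practice, here is a minimal sketch of a few common linear algebra operations: the @ operator for matrix multiplication, and solve() and inv() from numpy.linalg. The matrix A and vector b are arbitrary values chosen for illustration.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

# Matrix product (not element-wise): use @ or np.dot.
AA = A @ A
print(AA)

# Transpose.
print(A.T)

# Solve the linear system A x = b.
x = np.linalg.solve(A, b)
print(x)

# Inverse: A times its inverse is (numerically) the identity matrix.
A_inv = np.linalg.inv(A)
print(np.allclose(A @ A_inv, np.eye(2)))   # True
```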

In [109]:
#8 concatenating and splitting

#--- numpy.concatenate() takes a sequence (tuple, list, ...) of ndarrays
#--- and joins them in order along the given axis
#--- convenience functions: vstack, hstack

#--- numpy.split() takes a numpy array and split locations, then splits the
#--- array into subarrays, returned as views

#--- example: concatenate 2 ndarrays along rows

rng = np.random.default_rng(seed = 42)
a = rng.standard_normal(size = (5, 5))
b = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
c = np.concatenate((a, b), axis = 0)

#print(c)
#--- example: concatenate 2 ndarrays along cols

b = np.array([[1],
              [2],
              [5],
              [4],
              [5]])
d = np.concatenate((a, b), axis = 1)  # a is (5, 5), b is (5, 1) => d is (5, 6)

#--- example: split

rng = np.random.default_rng(seed = 42)
a = rng.standard_normal(size = (5, 5))
print(a)
x, y, z = np.split(a, [1,2])  # split along rows into a[:1], a[1:2], a[2:]

x[0,0] = 0  # x is a view, so a[0,0] changes as well
print(a)




[[ 0.30471708 -1.03998411  0.7504512   0.94056472 -1.95103519]
 [-1.30217951  0.1278404  -0.31624259 -0.01680116 -0.85304393]
 [ 0.87939797  0.77779194  0.0660307   1.12724121  0.46750934]
 [-0.85929246  0.36875078 -0.9588826   0.8784503  -0.04992591]
 [-0.18486236 -0.68092954  1.22254134 -0.15452948 -0.42832782]]
[[ 0.         -1.03998411  0.7504512   0.94056472 -1.95103519]
 [-1.30217951  0.1278404  -0.31624259 -0.01680116 -0.85304393]
 [ 0.87939797  0.77779194  0.0660307   1.12724121  0.46750934]
 [-0.85929246  0.36875078 -0.9588826   0.8784503  -0.04992591]
 [-0.18486236 -0.68092954  1.22254134 -0.15452948 -0.42832782]]
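
The convenience functions vstack and hstack mentioned above can be sketched on small integer arrays, where the shapes are easier to follow than with random data. The array names are illustrative.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6]])          # one extra row
col = np.array([[10],
                [20]])          # one extra column

v = np.vstack((a, b))           # same as np.concatenate((a, b), axis=0)
h = np.hstack((a, col))         # same as np.concatenate((a, col), axis=1)
print(v.shape, h.shape)         # (3, 2) (2, 3)

# split at row index 1: rows [0:1] and rows [1:]
top, rest = np.split(v, [1])
print(top.shape, rest.shape)    # (1, 2) (2, 2)

# The pieces are views on v, not copies.
print(np.shares_memory(v, top)) # True
```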


In [112]:
#9 broadcasting
#--- the idea of broadcasting is to stretch a smaller array to the shape of a larger one so that vectorization applies: this accelerates computing time
#--- motivating example:

a = np.array([[1, 2, 3],[4, 5, 6],[7, 8, 9]])
#a = a + [10, 100, 1000] # in rows
a = a + [[10],
     [100],
     [1000]]
print(a)

#--- => the column [[10], [100], [1000]] is broadcast across all columns of the
#--- array a. NumPy then adds the array a and the broadcast array element-wise
#--- (vectorization). Here the broadcast array is
#--- [[10, 10, 10],
#---  [100, 100, 100],
#---  [1000, 1000, 1000]]
#--- With the commented-out row version, [10, 100, 1000] would instead be
#--- repeated as every row.



[[  11   12   13]
 [ 104  105  106]
 [1007 1008 1009]]
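
Both variants of the example can be compared side by side. In this sketch, np.broadcast_to() is used only to display the stretched array that NumPy works with conceptually, and the last part shows that incompatible shapes are rejected.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
row = np.array([10, 100, 1000])    # shape (3,)
col = row.reshape(3, 1)            # shape (3, 1)

# What each operand looks like once stretched to the shape of a.
print(np.broadcast_to(row, a.shape))   # every row is [10 100 1000]
print(np.broadcast_to(col, a.shape))   # every column is [10 100 1000]

print(a + row)
# [[  11  102 1003]
#  [  14  105 1006]
#  [  17  108 1009]]

print(a + col)
# [[  11   12   13]
#  [ 104  105  106]
#  [1007 1008 1009]]

# Shapes that cannot be stretched to match raise an error.
try:
    a + np.array([1, 2])
    compatible = True
except ValueError:
    compatible = False
print(compatible)   # False
```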


In [114]:
#10 reading and writing csv files
#--- reading: use the numpy.genfromtxt() function
#--- writing: use the numpy.savetxt() function

#---example: reading numbers in the file world_geo.csv
#---skip the first column (because it is text), skip the first row (header)

fname = "./data/world_geo.csv"

usecols = (1, 2, 3)
data = np.genfromtxt(fname, delimiter=',', skip_header=1, usecols=usecols, dtype=int)
print(data.shape)
print(data[:20])

np.savetxt(fname + "_copy", data, delimiter = ",",  fmt='%i')





(263, 3)
[[    8660       -1       -1]
 [  652864       -1       -1]
 [     254       -1       -1]
 [    1580       -1       -1]
 [   28748       -1       -1]
 [ 2381741       -1      381]
 [     199      199        0]
 [     468      468        0]
 [ 1246700       -1      246]
 [      91       91        0]
 [14200000       -1      200]
 [     442      443        0]
 [ 2780400       -1      736]
 [   29743       -1       -1]
 [    3170       -1       -1]
 [     180      180        0]
 [       5        5        0]
 [ 7692024       -1      633]
 [   83871       -1       -1]
 [   86600       -1       -1]]
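
The same read and write steps can be tried without the data file. This sketch uses an in-memory text buffer in place of world_geo.csv; the CSV content is made up, and filling_values=-1 is passed explicitly so that the handling of the missing field is visible in the code.

```python
import io
import numpy as np

# Hypothetical CSV content: a header row, a text column, and one missing value.
csv_text = "name,area,x,y\nA,10,1,2\nB,20,,4\n"

data = np.genfromtxt(
    io.StringIO(csv_text),
    delimiter=",",
    skip_header=1,        # skip the header row
    usecols=(1, 2, 3),    # skip the text column
    dtype=int,
    filling_values=-1,    # value used where a field is missing
)
print(data.shape)         # (2, 3)
print(data)

# Write the array back out as comma-separated integers.
buf = io.StringIO()
np.savetxt(buf, data, delimiter=",", fmt="%i")
print(buf.getvalue())
```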


In [None]:
df[df.van > 5]
