# Class 4 - more on manipulating arrays; basic file I/O

As usual, we are very grateful to J.R. Johansson at [http://github.com/jrjohansson/scientific-python-lectures](http://github.com/jrjohansson/scientific-python-lectures).

And we also use material from

http://www.scipy-lectures.org/intro/numpy/operations.html

and

https://github.com/jakevdp/PythonDataScienceHandbook (02.08)


In [None]:
# what is this line all about?!? Answer in lecture 4
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt

## Manipulating arrays

### Index slicing

Index slicing is the technical name for the syntax `M[lower:upper:step]` to extract part of an array:

In [None]:
A = np.array([1,2,3,4,5])
A

In [None]:
A[1:3]

Array slices are *mutable*: if they are assigned a new value the original array from which the slice was extracted is modified:

In [None]:
A[1:3] = [-2,-3]

A
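
Assigning through a slice changes the original because a basic slice is a *view* onto the same data, not a copy. Here is a minimal sketch of that behavior; the names `arr` and `view` are illustrative and not part of the lecture:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
view = arr[1:3]            # basic slicing returns a view, not a copy
view[0] = 99               # writing through the view ...
print(arr)                 # ... changes the original: [ 1 99  3  4  5]
print(view.base is arr)    # True: the view borrows its data from arr
```

If an independent piece of the array is needed, `arr[1:3].copy()` should do the job.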

We can omit any of the three parameters in `M[lower:upper:step]`. Indexing works exactly as we saw for strings.

In [None]:
A[::] # lower, upper, step all take the default values

In [None]:
A[::2] # step is 2, lower and upper default to the beginning and end of the array

In [None]:
A[:3] # first three elements

In [None]:
A[3:] # elements from index 3

Negative indices count from the end of the array (positive indices from the beginning):

In [None]:
A = np.array([1,2,3,4,5])

In [None]:
A[-1] # the last element in the array

In [None]:
A[-3:] # the last three elements

Index slicing works exactly the same way for multidimensional arrays:

In [None]:
A = np.array([[n+m*10 for n in range(5)] for m in range(5)])

A

In [None]:
# a block from the original array
A[1:4, 1:4]

In [None]:
A

In [None]:
# strides
A[::2,::2]

### Fancy indexing

Fancy indexing is the name for when an array or list is used in place of an index:

In [None]:
A

In [None]:
row_indices = [1, 2, 3]
A[row_indices,:] # the trailing : is optional; A[row_indices] gives the same result

In [None]:
A[row_indices]

In [None]:
col_indices = [1, 2, -1] # remember, index -1 means the last element
A[row_indices, col_indices] 
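
When both a row list and a column list are given, NumPy pairs them up element by element instead of selecting a rectangular block. The sketch below rebuilds the same 5x5 array under the illustrative name `grid` and contrasts the paired result with `np.ix_`, which selects every row/column combination:

```python
import numpy as np

grid = np.array([[n + m * 10 for n in range(5)] for m in range(5)])
rows = [1, 2, 3]
cols = [1, 2, -1]

# Paired indexing: picks grid[1, 1], grid[2, 2] and grid[3, -1]
paired = grid[rows, cols]
print(paired)              # [11 22 34]

# np.ix_ builds an open mesh, so every row is combined with every column
block = grid[np.ix_(rows, cols)]
print(block.shape)         # (3, 3)
```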

### Masks ###
We can also use index masks: if the index mask is a Numpy array of data type `bool`, then an element is selected (True) or not (False) depending on the value of the index mask at the position of each element:

In [None]:
B = np.array(range(5))
B

In [None]:
row_mask = np.array([True, False, True, False, False])
B[row_mask]

In [None]:
# same thing
row_mask = np.array([1,0,1,0,0], dtype=bool)
B[row_mask]

This feature is very useful to conditionally select elements from an array, using for example comparison operators:

In [None]:
x = np.arange(0, 10, 0.5)
x

In [None]:
mask = (5 < x) * (x < 7.5)

mask

In [None]:
x[mask]

In [None]:
mask = np.logical_and((5 < x),(x < 7.5))

In [None]:
mask = (5 < x) * (x < 7.5) #equivalent to above

In [None]:
len(x[mask]) 
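
Multiplying boolean arrays works, but the element-wise operators `&` (and), `|` (or) and `~` (not) are arguably the more common way to combine masks. The following sketch checks that `&` matches the product form; the variable names are only for illustration:

```python
import numpy as np

x = np.arange(0, 10, 0.5)

# The parentheses are required: & binds more tightly than < and >
mask_and = (5 < x) & (x < 7.5)
mask_mul = (5 < x) * (x < 7.5)
print(np.array_equal(mask_and, mask_mul))   # True

mask_or = (x < 1) | (x > 9)                 # element-wise OR
mask_not = ~mask_and                        # element-wise NOT
print(x[mask_or])                           # [0.  0.5 9.5]
```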

### where

The index mask can be converted to position indices using the `where` function:

In [None]:
indices = np.where(mask)

indices

In [None]:
x[indices] # this indexing is equivalent to the fancy indexing x[mask]

In [None]:
np.where(x > 5)
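
Two details of `np.where` are easy to miss: with one argument it returns a tuple holding one index array per dimension, and with three arguments it chooses element by element between two alternatives. A short sketch, with an arbitrarily chosen threshold of 8:

```python
import numpy as np

x = np.arange(0, 10, 0.5)

idx = np.where(x > 8)
print(type(idx), len(idx))   # a tuple of length 1 for a 1-D array
print(idx[0])                # [17 18 19]

# Three-argument form: where the condition holds take 8.0, otherwise keep x
capped = np.where(x > 8, 8.0, x)
print(capped.max())          # 8.0
```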

### Reshaping arrays

In [None]:
A

In [None]:
n, m = A.shape

In [None]:
n,m

In [None]:
B = A.reshape((1,n*m))
B

In [None]:
C = np.arange(30)
C

In [None]:
C.reshape(6,5)
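
Two further points about `reshape` may be useful: one dimension can be given as `-1` so that NumPy infers it, and for a contiguous array like this one the result is a view rather than a copy. A small sketch (the names `D` and `flat` are illustrative):

```python
import numpy as np

C = np.arange(30)
D = C.reshape(6, -1)     # -1 asks NumPy to infer the missing dimension
print(D.shape)           # (6, 5)

D[0, 0] = 100            # here the reshaped array is a view of C ...
print(C[0])              # ... so C changes too: 100

flat = D.flatten()       # flatten always returns a copy
flat[0] = -1
print(C[0])              # still 100
```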

### hstack and vstack

In [None]:
a = np.array([[1,2],[3,4]])
b= np.array([[5,6]])

In [None]:
b

In [None]:
np.vstack((a,b))

In [None]:
np.hstack((a,b.T))
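
For 2-D arrays such as these, `vstack` and `hstack` behave like `np.concatenate` along axis 0 and axis 1 respectively. The sketch below checks this on the same two arrays:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])

v = np.concatenate((a, b), axis=0)     # stack rows, like np.vstack((a, b))
h = np.concatenate((a, b.T), axis=1)   # stack columns, like np.hstack((a, b.T))
print(v.shape, h.shape)                # (3, 2) (2, 3)
```

The shapes must agree along every axis except the one being joined, which is why `b` has to be transposed before the horizontal stack.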

## Copy and "deep copy"

To achieve high performance, assignments in Python usually do not copy the underlying objects. This is important, for example, when objects are passed between functions, to avoid an excessive amount of memory copying when it is not necessary (technical term: pass by reference).

In [None]:
A = np.array([[1, 2], [3, 4]])

A

In [None]:
# now B is referring to the same array data as A 
B = A 

In [None]:
# changing B affects A
B[0,0] = 10

B

In [None]:
A

If we want to avoid this behavior, so that we get a new, completely independent object `B` copied from `A`, we need to make a so-called "deep copy" using the function `copy`:

In [None]:
B = np.copy(A)

In [None]:
# now, if we modify B, A is not affected
B[0,0] = -5

B

In [None]:
A
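
It can help to distinguish three cases: a second name for the same object, a view that is a new array object sharing the same data, and a copy with its own data. The sketch below uses `np.shares_memory` to tell them apart; the names `alias`, `view` and `dup` are illustrative:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
alias = A            # same object, no new array is created
view = A[0]          # new array object that shares A's data
dup = A.copy()       # independent data, same effect as np.copy(A)

print(alias is A)                  # True
print(np.shares_memory(A, view))   # True
print(np.shares_memory(A, dup))    # False

dup[0, 0] = -5       # modifying the copy leaves A untouched
print(A[0, 0])       # 1
```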

## Vectorizing functions

As a basic rule of programming, to get good performance we should try to avoid looping over elements in our vectors and matrices, and instead use vectorized algorithms. The first step in converting a scalar algorithm to a vectorized algorithm is to make sure that the functions we write work with vector inputs.

In [None]:
def Theta(x):
    """
    Scalar implementation of the Heaviside step function.
    """
    if x >= 0:
        return 1
    else:
        return 0

In [None]:
Theta(-10)

In [None]:
Theta(np.array([-3,-2,-1,0,1,2,3]))

OK, that didn't work because we didn't write the `Theta` function so that it can handle a vector input... 

To get a vectorized version of Theta we can use the Numpy function `vectorize`. In many cases it can automatically vectorize a function:

In [None]:
Theta_vec = np.vectorize(Theta)

In [None]:
Theta_vec(np.array([-3,-2,-1,0,1,2,3]))

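We can also implement the function to accept a vector input from the beginning. This requires more effort but usually gives better performance, since `np.vectorize` is provided mainly for convenience and is essentially a loop under the hood: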

In [None]:
def Theta(x):
    """
    Vector-aware implementation of the Heaviside step function.
    """ 
    return 1 * (x >= 0)

In [None]:
Theta(np.array([-3,-2,-1,0,1,2,3]))
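
Two other vectorized ways to write the step function are sketched here: `np.where`, and NumPy's own `np.heaviside`, whose second argument sets the value returned at `x == 0`. The value 1.0 is chosen to match the convention of the `Theta` function above:

```python
import numpy as np

x = np.array([-3, -2, -1, 0, 1, 2, 3])

theta_where = np.where(x >= 0, 1, 0)    # 1 where x >= 0, else 0
theta_builtin = np.heaviside(x, 1.0)    # second argument: value at x == 0

print(theta_where)      # [0 0 0 1 1 1 1]
print(theta_builtin)    # same values, returned as floats
```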

## Using arrays in conditions

When using arrays in conditions, for example `if` statements and other boolean expressions, one needs to use `any` or `all`, which require that any or all elements in the array evaluate to `True`:

In [None]:
M = np.array([[1, 2], [3, 6]])

In [None]:
if (M > 5).any():
    print("at least one element in M is larger than 5")
else:
    print("no element in M is larger than 5")

In [None]:
M > 5

In [None]:
if (M > 5).all():
    print("all elements in M are larger than 5")
else:
    print("not all elements in M are larger than 5")
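
To see why `any` or `all` is needed, the sketch below tries to use the comparison directly in an `if` statement. NumPy refuses to reduce several booleans to a single truth value and raises a `ValueError`:

```python
import numpy as np

M = np.array([[1, 2], [3, 6]])

message = None
try:
    if M > 5:            # ambiguous: the comparison yields four booleans
        print("never reached")
except ValueError as err:
    message = str(err)

print(message)
print(np.any(M > 5), np.all(M > 0))   # True True
```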

## Type casting

Since Numpy arrays are *statically typed*, the type of an array does not change once created. But we can explicitly cast an array of some type to another using the `astype` method (see also the similar `asarray` function). This always creates a new array of the new type:

In [None]:
M.dtype

In [None]:
M2 = M.astype(float)

M2

In [None]:
M2.dtype

In [None]:
M3 = M.astype(bool)

M3
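
One thing to watch when casting floats to integers is that `astype(int)` truncates toward zero and does not round. The sketch below, using made-up values, also shows that assigning a float into an integer array silently converts it, which follows from the array's type being fixed:

```python
import numpy as np

f = np.array([-1.7, -0.2, 0.2, 1.7])
truncated = f.astype(int)            # drops the fractional part
rounded = np.round(f).astype(int)    # round first, then cast
print(truncated)                     # values: -1, 0, 0, 1
print(rounded)                       # values: -2, 0, 0, 2

ints = np.array([1, 2, 3])
ints[0] = 9.99                       # cast to the array's integer dtype
print(ints[0])                       # 9
```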

## Fast Sorting in NumPy: np.sort and np.argsort

Although Python has a built-in `sorted` function and a `sort` method for lists, NumPy's `np.sort` function turns out to be much more efficient and useful for arrays.

To return a sorted version of the array without modifying the input, you can use np.sort:

In [None]:
x = np.array([2, 1, 4, 3, 5])

print(np.sort(x)) # does not change x

If you prefer to sort the array in-place, you can instead use the sort method of arrays:

In [None]:
x.sort()

print(x)


A related function is argsort, which instead returns the indices of the sorted elements:

In [None]:
x = np.array([2, 1, 4, 3, 5])

i = np.argsort(x)

print(i)

The first element of this result gives the index of the smallest element, the second value gives the index of the second smallest, and so on. These indices can then be used (via fancy indexing) to construct the sorted array if desired:

In [None]:
x[i]
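
A common use of `argsort` is to reorder one array according to the values in another. The sketch below uses hypothetical `names` and `scores` arrays, and also shows the `axis` argument of `np.sort` for 2-D arrays:

```python
import numpy as np

names = np.array(["a", "b", "c", "d"])     # hypothetical labels
scores = np.array([3.2, 1.5, 4.8, 2.1])    # hypothetical values

order = np.argsort(scores)                 # indices that sort the scores
print(names[order])                        # ['b' 'd' 'a' 'c']

grid = np.array([[3, 1, 2], [9, 7, 8]])
by_row = np.sort(grid, axis=1)             # sorts each row independently
print(by_row)
```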

## Quick I/O from files (will do more in Pandas) 

The built-in Python function `open()` returns a file object, which has the methods `read()`, `write()` and `close()`.

They are best explained through examples, for example those found at

https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files

and also this nice lecture that I found while looking for examples

http://www.grapenthin.org/teaching/geop501/lectures/lecture_07_fileinput_output.pdf
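
As a minimal sketch of these commands, the example below writes two lines to a text file and reads them back. The file name `notes.txt` and the temporary directory are assumptions made for illustration. The `with` statement closes the file automatically, so an explicit `close()` is not needed:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "notes.txt")   # hypothetical file

# "w" opens the file for writing; leaving the with-block closes it
with open(path, "w") as f:
    f.write("first line\n")
    f.write("second line\n")

with open(path) as f:              # the default mode "r" reads text
    lines = f.read().splitlines()

print(lines)                       # ['first line', 'second line']
```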

For brevity's sake, we look at the more flexible options in numpy:
    
np.loadtxt()

np.genfromtxt()

np.savetxt()

np.genfromtxt() is more powerful than np.loadtxt() because it can gracefully 
handle missing data, so we'll just use that. 

In [None]:
np.genfromtxt?

Some useful parameters are: 

`skip_header=N` (skips the first N rows)

`usecols=...` to specify which columns to read

`delimiter=','` to define the delimiter (whitespace is the default)

In [None]:
randomdata = np.random.rand(10,5)

In [None]:
randomdata

In [None]:
np.savetxt('randomdata1.txt',randomdata,delimiter=',')






In [None]:
!head randomdata1.txt

In [None]:
np.savetxt('randomdata1.txt',randomdata,delimiter=',',fmt="%.3f")

In [None]:
!head randomdata1.txt #(for Windows use type)

In [None]:
data = np.genfromtxt('randomdata1.txt')

data

#### Aaagh! What happened?

In [None]:
data = np.genfromtxt('randomdata1.txt',delimiter=',')

data

In [None]:
#I can also do

data = np.genfromtxt('randomdata1.txt',skip_header=5,usecols= (2,3),delimiter=',')

data
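
The missing-data handling mentioned earlier can be sketched without a file on disk, since `np.genfromtxt` also accepts a file-like object. The data below is made up, with one empty field in the second row. By default the gap becomes `nan`, and the `filling_values` parameter substitutes another value:

```python
import io
import numpy as np

text = "1,2,3\n4,,6\n7,8,9\n"    # made-up data with one empty field

with_nan = np.genfromtxt(io.StringIO(text), delimiter=",")
print(with_nan[1])               # the missing entry is read as nan

filled = np.genfromtxt(io.StringIO(text), delimiter=",", filling_values=0)
print(filled[1])                 # the missing entry is replaced by 0
```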

#### Exercises (these won't be graded, but you will see some similar stuff in next week's notebook):

1. Generate a size 20 array populated by random numbers between 0 and 5; 
2. Create a mask that filters out the numbers smaller than 1; 
3. Return the new array; 
4. Sort the new array; 
5. Reverse-sort the new array; 
6. Write the new array to file, using one number per line; 
7. Read the file into a third array, skipping the first row; 
8. Print the final array to screen.


