## Getting Started

Let's start at the very beginning. You can use Python as a calculator, which is a good way to learn how arithmetic works in Python and what its arithmetic operators are.

![](https://github.com/univai-pyprep-c1/BasicPython/raw/9a66b164175e3032fb88db4112754c1cb07ac4d8//getting_started.slides.dir/8.png)

In [5]:
1 + 1

2

In [6]:
2**10

1024

In [7]:
5/2

2.5

In [8]:
5//2

2
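Two more operators round out the basics. `%` gives the remainder, and the built-in `divmod` returns the floor quotient and the remainder together. Note that `//` rounds *down* (toward negative infinity), not toward zero, which matters for negative numbers. A quick sketch:

```python
quotient = 7 // 3            # floor division
remainder = 7 % 3            # remainder (modulo)
print(quotient, remainder)   # 2 1
print(divmod(7, 3))          # (2, 1)
print(-7 // 2)               # -4, because -3.5 is rounded down
print(-7 % 2)                # 1, so that (-7 // 2) * 2 + (-7 % 2) == -7
```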

### Variables

Variables are labels for values. Variable names:

- Start with a letter or underscore
- Can contain only alphanumeric characters and underscores
- Are case-sensitive
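These rules can be checked directly: the string method `isidentifier()` reports whether a string is a syntactically valid name. Reserved keywords such as `if` pass that test but still cannot be used as variable names, so this sketch also consults the standard-library `keyword` module. The helper `is_usable_name` is made up for illustration.

```python
import keyword

def is_usable_name(name):
    """Hypothetical helper: True if `name` could be used as a variable name."""
    return name.isidentifier() and not keyword.iskeyword(name)

for name in ["Var", "_hidden", "9variable", "my-var", "if"]:
    print(repr(name), is_usable_name(name))
# 'Var' True, '_hidden' True, '9variable' False, 'my-var' False, 'if' False
```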

In [9]:
# CAN'T DO THIS: names cannot start with a digit
# 9variable = "hello"  # SyntaxError

Var = "hello"
# print(var)  # NameError: names are case-sensitive, and var was never defined
print(Var)

hello


What happened above? In the computer's memory, a location was found, and then filled with the word "hello". Then a variable Var was created and was used to label this memory location.

![](https://github.com/univai-pyprep-c1/BasicPython/raw/9a66b164175e3032fb88db4112754c1cb07ac4d8//images/labelmem.jpg)

A variable is literally a label. You can think of it as a post-it, or sticky note, or a pointer. Do not think of it as a box in which a value is stored.
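Because a variable is only a label, two names can be stuck onto the same value. A small sketch with a list (a type introduced later in this notebook) makes this visible: changing the value through one name is seen through the other, and the `is` operator confirms that both names label the very same object.

```python
a = [1, 2, 3]
b = a                   # b is a second label on the same list; nothing is copied
b.append(4)             # modify the list through b...
print(a)                # ...and a sees it: [1, 2, 3, 4]
print(a is b)           # True: both names label one object

c = list(a)             # an explicit copy is a new object with equal contents
print(c == a, c is a)   # True False
```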

### Value Types

Variables point to values that can be of multiple types. See below:

In [10]:
var = 7  # an integer
print(var, type(var))
var = 7.01  # a float, something with decimals
print(var, type(var))
var = "Hello World!"  # a string, a sequence of characters
print(var, type(var))
var = True  # a boolean, that is, a true or false value
print(var, type(var))

7 <class 'int'>
7.01 <class 'float'>
Hello World! <class 'str'>
True <class 'bool'>
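Values can be converted from one type to another by calling the type's name like a function. This comes up constantly when reading input, since text read from a user or a file always arrives as a string. A short sketch:

```python
n = int("42")                          # str -> int
x = float("3.5")                       # str -> float
s = str(7)                             # int -> str
print(n + 1, x * 2, s + "!")           # 43 7.0 7!
print(int(7.9))                        # 7: int() truncates toward zero
print(bool(0), bool(""), bool("hi"))   # False False True
```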


### Variable location and garbage collection

In the code above, we take the variable `var` and point it at different pieces of memory holding different values. This is clearer in a diagram.

![](https://github.com/univai-pyprep-c1/BasicPython/raw/9a66b164175e3032fb88db4112754c1cb07ac4d8//images/labelmemmany.jpg)

What happens to the old memory locations? They are now orphaned: no variable points to them. Python detects this and recycles that memory. This is called garbage collection.
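In CPython (the standard interpreter), this recycling is driven mainly by *reference counting*: each object tracks how many labels and containers refer to it and is freed as soon as that count drops to zero, while a separate cycle collector handles objects that refer to each other. The sketch below observes this with a weak reference, which watches an object without counting as a label. The class `Note` is a made-up placeholder, and `gc.collect()` is called only so the sketch also behaves on interpreters that do not free objects immediately.

```python
import gc
import weakref

class Note:
    """A throwaway object type; instances of plain classes support weak references."""
    pass

var = Note()
watcher = weakref.ref(var)   # observes the object without keeping it alive
print(watcher() is var)      # True: the object is still labelled by var

var = "Hello World!"         # move the label to a different value
gc.collect()                 # make sure any pending collection has happened
print(watcher())             # None: the old object has been reclaimed
```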

There are comparison operators, which can be used to make decisions:

In [11]:
a = "hgi"
b = "hello"
c = "hi"
d = "hello"
print(a == c)
print(b == d)
var1 = 5
var2 = 3
print(var1 < var2)

False
True
False


The first comparison checks whether the strings `a` and `c` have the same contents; they differ, so we get `False`. The second gives `True`, because `b` and `d` contain the same characters. The third is a numerical comparison. Because each comparison produces a boolean value, we can use comparisons to make decisions in conditional statements:

In [12]:
# Simple if conditions
var1 = 5
var2 = 10

if var1 == var2:
    print("The values are equal")
if var1 < var2:
    print("First variable is less than the second variable")
if var1 > var2:
    print("Second variable is less than the first variable")

First variable is less than the second variable


Notice how Python dispenses with the curly braces that many other languages use to group statements, replacing them with a colon and an indented block. The indentation tells Python that the indented lines run only when the condition holds. Python uses this colon-and-indentation pattern in many places: `for` and `while` loops, conditionals, function and class definitions, and so on.

Checking several mutually exclusive conditions is such a common idiom that Python has a better way to write it, using `elif` (short for "else if") and `else`:

In [13]:
# An alternative way to code the previous cell (if-elif-else)
var1 = 5
var2 = 10

if var1 == var2:
    print("The values are equal")
elif var1 < var2:
    print("First variable is less than the second variable")
else:
    print("Second variable is less than the first variable")

First variable is less than the second variable


### Functions

A function is a set of statements that takes inputs, does some specific computation, and produces output.

There are several kinds of functions:

- Functions can be built into Python, or imported from an existing Python library
- A function can be user-defined
- A function can be anonymous
- Functions can belong to objects (these are called methods). More on this later.

Properties of functions:

- A function can be called from other functions
- A function can return data

![](https://github.com/univai-pyprep-c1/BasicPython/raw/9a66b164175e3032fb88db4112754c1cb07ac4d8//functions.slides.dir/2.png)

In [14]:
# Built-in functions

var1 = -15
abs(var1)

15

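Here are two different ways of importing from a module, which is a library of Python functions. The first imports a single function by name; the second imports the whole module, whose functions are then reached with a dot: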

In [15]:
from math import sqrt
sqrt(4)

2.0

In [16]:
import os
os.cpu_count()

8
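The examples above only use functions someone else wrote. A user-defined function is written with `def`, and an anonymous function with `lambda`. In this sketch (all names are made up for illustration), one function also calls another and returns data:

```python
def square(x):
    """Return x multiplied by itself."""
    return x * x

def hypotenuse(a, b):
    """Length of the long side of a right triangle; calls square() twice."""
    return (square(a) + square(b)) ** 0.5

print(hypotenuse(3, 4))   # 5.0

# An anonymous function, passed directly to the built-in sorted()
words = ["banana", "fig", "apple"]
print(sorted(words, key=lambda w: len(w)))   # ['fig', 'apple', 'banana']
```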

## EXERCISE 1: A Simple Calculator

Here is a fun little calculator which exercises what you have learnt so far.

It has only four operations: addition, subtraction, multiplication, and division.

Notice the use of `pass`, a statement that does nothing. It lets us code incrementally: start with placeholders that do nothing, and fill in functionality over time.

In [24]:
print("Select operation\n")
print("1.Addition")
print("2.Subtraction")
print("3.Multiplication")
print("4.Division")

# Something NEW: take input from the user (input() always returns a string)
choice = input("\nEnter choice(1/2/3/4): ")

# We convert each input to a floating-point (or real) number.

num1 = float(input("\nEnter first number: "))
num2 = float(input("\nEnter second number: "))

if choice == '1':
    # your code here
    pass
elif choice == '2':
    # your code here
    pass
elif choice == '3':
    # your code here
    pass
elif choice == '4':  # could use else
    if num2 == 0:
        print("Invalid input: cannot divide by zero")
    else:
        print(num1, "/", num2, "=", num1 / num2)

Select operation

1.Addition
2.Subtraction
3.Multiplication
4.Division



Enter choice(1/2/3/4):  

Enter first number:  1

Enter second number:  2


## Listiness, or things that behave like lists

Python puts great stock in *protocols*: shared patterns of behavior that many different types implement, so that the same code works on all of them.

One of the most important ideas is that of things that behave like a list of items.

These include lists, strings, and files. Many other data structures in Python are made to behave like lists as well, so that their content can be iterated through, in addition to their own native behavior.
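Here is a small sketch of that shared behavior: the same `for` loop works unchanged on a list, a string, and a `range`, because each of them can hand out its items one at a time. The built-in functions `iter()` and `next()` expose that mechanism directly.

```python
collected = {}
for thing in [["x", "y", "z"], "xyz", range(3)]:
    items = []
    for item in thing:          # the identical loop works for every listy type
        items.append(item)
    collected[type(thing).__name__] = items
print(collected)   # {'list': ['x', 'y', 'z'], 'str': ['x', 'y', 'z'], 'range': [0, 1, 2]}

it = iter("hi")                 # ask the string for an iterator...
print(next(it), next(it))       # ...and pull items one at a time: h i
```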

![](https://github.com/univai-pyprep-c1/BasicPython/raw/9a66b164175e3032fb88db4112754c1cb07ac4d8//listiness.slides.dir/3.png)

In [25]:
# CREATING A LIST

# A list is made from zero or more elements, separated by commas, and surrounded by square brackets
empty_list = []
working_days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
print(working_days[2])

Wednesday


![](https://github.com/univai-pyprep-c1/BasicPython/raw/9a66b164175e3032fb88db4112754c1cb07ac4d8//listiness.slides.dir/4.png)

In [26]:
lst = list(range(1,5))
print(lst)
print(lst[-1])  # negative indices count from the end

[1, 2, 3, 4]
4


In [27]:
lst[0:3]  # from index 0 up to, but not including, index 3

[1, 2, 3]

In [28]:
lst[-3:-1]

[2, 3]
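Either end of a slice can be left out, and an optional third number sets the step. A quick sketch of the common forms, using the same four-element list:

```python
lst = [1, 2, 3, 4]
print(lst[:2])                   # [1, 2]: from the start up to (not including) index 2
print(lst[2:])                   # [3, 4]: from index 2 to the end
print(lst[::2])                  # [1, 3]: every second element
print(lst[::-1])                 # [4, 3, 2, 1]: a negative step walks backwards
print(lst[:2] + lst[2:] == lst)  # True: the two halves rebuild the list
```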

In [29]:
# A list can have another list, or anything else as its element
numbers = [1, 2, 3, 4, 5]
courses = ['PP', 'BDA', 'USP', 'WTA']
new_list = [numbers, courses, '6th sem']
print(new_list, len(new_list))

[[1, 2, 3, 4, 5], ['PP', 'BDA', 'USP', 'WTA'], '6th sem'] 3


In [30]:
# Membership (using 'in' operator)
'PP' in courses

True

In [31]:
# append(x) - Adds a new element 'x' at the end of the list
alist = ['a', 'b', 'c']
alist.append('d')
print(alist)

['a', 'b', 'c', 'd']
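Lists have several other in-place methods besides `append`. Here is a sketch of a few common ones. Note the difference between `append`, which adds its argument as a single element, and `extend`, which adds each element of its argument:

```python
alist = ['a', 'b', 'c', 'd']
alist.extend(['e', 'f'])    # ['a', 'b', 'c', 'd', 'e', 'f']
alist.insert(0, 'z')        # ['z', 'a', 'b', 'c', 'd', 'e', 'f']
last = alist.pop()          # removes and returns the last element, 'f'
alist.remove('z')           # removes the first 'z'
print(alist, last)          # ['a', 'b', 'c', 'd', 'e'] f

nested = ['a']
nested.append(['b', 'c'])   # one new element, which is itself a list
print(nested)               # ['a', ['b', 'c']]
```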


Adding lists with `+` produces a bigger list: the two are joined end to end. Using `+` for something other than numeric addition is an example of a programming technique called *operator overloading*.

In [32]:
first_list = [1, 2, 3]
second_list = ['a', 'b', 'c']
first_list + second_list

[1, 2, 3, 'a', 'b', 'c']
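Operator overloading means the same operator does different, type-appropriate things: `+` adds numbers but joins strings and lists. Your own types can opt in by defining special methods such as `__add__`. The `Vector2` class below is a made-up example, just enough to show the mechanism:

```python
print(1 + 2, "ab" + "cd", [1] + [2])   # 3 abcd [1, 2]

class Vector2:
    """A hypothetical 2-D vector that supports + through __add__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        # Python calls this method to evaluate `self + other`
        return Vector2(self.x + other.x, self.y + other.y)

    def __repr__(self):
        return f"Vector2({self.x}, {self.y})"

print(Vector2(1, 2) + Vector2(10, 20))   # Vector2(11, 22)
```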

## Iteration

`for` loops are the fundamental way to go over the items of a list, in Python as in most languages.



In [33]:
num = [4, 7, 2, 6, 3, 9]
for ele in num:
    print(ele)

4
7
2
6
3
9


In [34]:
for ele in num:
    if ele % 2 == 0:  # even numbers only
        print(ele)

4
2
6
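Two built-ins make loops like these more convenient. `enumerate` hands out a counter alongside each item (handy whenever you would otherwise keep a `counter` variable yourself), and `zip` walks over several lists in step. A sketch with made-up `names` and `scores`:

```python
num = [4, 7, 2, 6, 3, 9]
for position, ele in enumerate(num):
    if ele % 2 == 0:
        print(position, ele)     # 0 4, then 2 2, then 3 6

names = ["a", "b", "c"]
scores = [10, 20, 30]
pairs = []
for name, score in zip(names, scores):   # stops at the end of the shorter list
    pairs.append(f"{name}={score}")
print(pairs)                     # ['a=10', 'b=20', 'c=30']
```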


There is a shortcut iteration syntax called a *list comprehension*, often used to construct new lists:

In [35]:
list_with_same_as_num = [e for e in num]
list_with_same_as_num

[4, 7, 2, 6, 3, 9]

In [36]:
squared_list = [e*e for e in num]
squared_list

[16, 49, 4, 36, 9, 81]

In [None]:
list_with_evens = [e for e in num if e % 2 == 0]
list_with_evens

[4, 2, 6]
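A comprehension is shorthand for a loop that appends to a new list. A sketch of the equivalence for the even-numbers example:

```python
num = [4, 7, 2, 6, 3, 9]

evens_loop = []
for e in num:                 # the long way: loop, test, append
    if e % 2 == 0:
        evens_loop.append(e)

evens_comp = [e for e in num if e % 2 == 0]   # the same thing as a comprehension
print(evens_loop == evens_comp, evens_comp)   # True [4, 2, 6]
```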

### Tuples

Tuples are sequences that work much like lists: their elements are indexed starting at 0, and they can be sliced and iterated over. The crucial difference is that tuples can't be changed in place: they are *immutable*. Thus, unlike lists, they cannot be grown or shrunk, and their elements cannot be reassigned. This guarantee also lets Python create and store them somewhat more efficiently than lists.

In [114]:
z = 1,2,3,4 # or z = (1, 2, 3, 4)
type(z) 

tuple

In [115]:
z

(1, 2, 3, 4)

In [116]:
z[2] = 5

TypeError: 'tuple' object does not support item assignment
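Since a tuple cannot be modified, "changing" one means building a new tuple. Tuples also make it easy to pack and unpack several values at once, which is how Python swaps two variables without a temporary. A short sketch:

```python
z = (1, 2, 3, 4)
z2 = z[:2] + (5,) + z[3:]       # a new tuple; (5,) needs the comma to be a tuple
print(z, z2)                    # (1, 2, 3, 4) (1, 2, 5, 4)

first, second = 10, 20          # pack the right side, unpack into two names
first, second = second, first   # swap
print(first, second)            # 20 10
```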

## Files

The built-in `open()` function creates a Python file object, which serves as a link to a file residing on your machine. After calling `open()`, strings of data can be transferred to and from the associated external file by calling the returned file object's methods.

At this point, you can read data from the file as a whole with `read()`, or `n` characters at a time with `read(n)`. You can read a line at a time with `readline()`, and all the lines into a list of strings with `readlines()`. Similar methods (`write()`, `writelines()`) exist for writing.

You must close the file, with `close()`, after you finish using it.
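A sketch of the reading methods just described. So that it runs anywhere, it first writes a small throwaway file (its name and contents are made up). It opens files with `with`, which closes the file automatically at the end of the indented block, even if an error occurs; this is the usual way to make sure files get closed.

```python
import os
import tempfile

# A throwaway file with made-up contents, so the sketch does not depend on data/
path = os.path.join(tempfile.mkdtemp(), "example.txt")
with open(path, "w") as f:
    f.write("first line\nsecond line\nthird line\n")

with open(path) as f:
    whole = f.read()              # the entire file as one string

with open(path) as f:
    first5 = f.read(5)            # text mode: the next 5 characters, 'first'
    rest_of_line = f.readline()   # the remainder of that line, ' line\n'
    remaining = f.readlines()     # all remaining lines: ['second line\n', 'third line\n']

print(f.closed)                   # True: leaving the with block closed the file
```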

![](https://github.com/univai-pyprep-c1/BasicPython/raw/9a66b164175e3032fb88db4112754c1cb07ac4d8//listiness.slides.dir/10.png)

As you might have expected, you can treat a file just like a list of lines and loop over it directly, which is even more idiomatic:

In [37]:
fd = open("data/predictwise.csv")
counter = 0
for line in fd:
    if counter < 10:  # print only the first 10 lines; there are 51 data rows after the header
        print("<<", line, ">>")
    counter = counter + 1 # also writeable as counter += 1
fd.close()

<< Obama,Romney,States,Votes
 >>
<< 0.0,1.0,Alabama,9
 >>
<< 0.0,1.0,Alaska,3
 >>
<< 0.062,0.938,Arizona,11
 >>
<< 0.0,1.0,Arkansas,6
 >>
<< 1.0,0.0,California,55
 >>
<< 0.807,0.193,Colorado,9
 >>
<< 1.0,0.0,Connecticut,7
 >>
<< 1.0,0.0,Delaware,3
 >>
<< 1.0,0.0,District of Columbia,3
 >>


Notice that the newlines remain. You can use the string method `strip` to remove them. There are many such **methods**, which are functions that belong to strings, or in general, to "objects".

### Strings
This is a good time to introduce strings as objects. The key thing to note is that much of what you learned about lists applies to them: they are iterable, and they have a length. But unlike lists, they have one critical additional property: they are immutable, that is, they can't be changed in place.

In [38]:
var = "This is a string"
print(var, len(var))
print(var[3])  # the s of This
for char in var:
    print(char)

This is a string 16
s
T
h
i
s
 
i
s
 
a
 
s
t
r
i
n
g


In [39]:
var[3] = "t"  # this fails because strings are immutable

TypeError: 'str' object does not support item assignment

In [40]:
# String slicing
print(var)
print(var[6:])
print(var[1:3])
print(var[:-1])
print(var[2:10:2])  # the third number is the step size

This is a string
s a string
hi
This is a strin
i sa


![](https://github.com/univai-pyprep-c1/BasicPython/raw/9a66b164175e3032fb88db4112754c1cb07ac4d8//listiness.slides.dir/7.png)

Now go look up the documentation of the `strip` method. It removes leading and trailing whitespace, including the newline at the end of each line, and returns a new string. We can thus use it to get clean lines from our file:

In [41]:
fd = open("data/predictwise.csv")
counter = 0
for line in fd:
    if counter < 10: # print first 10 lines, there are lots!
        print("<<", line.strip(), ">>")
    counter = counter + 1 # also writeable as counter += 1
fd.close()

<< Obama,Romney,States,Votes >>
<< 0.0,1.0,Alabama,9 >>
<< 0.0,1.0,Alaska,3 >>
<< 0.062,0.938,Arizona,11 >>
<< 0.0,1.0,Arkansas,6 >>
<< 1.0,0.0,California,55 >>
<< 0.807,0.193,Colorado,9 >>
<< 1.0,0.0,Connecticut,7 >>
<< 1.0,0.0,Delaware,3 >>
<< 1.0,0.0,District of Columbia,3 >>
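A closer look at `strip` and its relatives, on a made-up line. Each returns a new string. By default they remove whitespace (spaces, tabs, newlines); an optional argument names the characters to remove instead. Combined with `split`, this turns a CSV line into its fields:

```python
line = "  0.0,1.0,Alabama,9\n"      # a made-up line with extra whitespace
print(repr(line.strip()))           # '0.0,1.0,Alabama,9'
print(repr(line.rstrip("\n")))      # '  0.0,1.0,Alabama,9' (only the newline removed)
print(repr(line.lstrip()))          # '0.0,1.0,Alabama,9\n'
print("...rome...".strip("."))      # rome
print(line.strip().split(","))      # ['0.0', '1.0', 'Alabama', '9']
```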


### What about writing?

Let's write the first ten lines out to a new file:

In [42]:
fd = open("data/predictwise.csv")
fd2 = open("data/myown.csv", "w")
counter = 0
for line in fd:
    if counter < 10:  # print and copy the first 10 lines
        print("<<", line.strip(), ">>")
        fd2.write(line)
    else:
        break # break out of for loop
    counter = counter + 1 # also writeable as counter += 1
fd.close()
fd2.close()
print(counter)

<< Obama,Romney,States,Votes >>
<< 0.0,1.0,Alabama,9 >>
<< 0.0,1.0,Alaska,3 >>
<< 0.062,0.938,Arizona,11 >>
<< 0.0,1.0,Arkansas,6 >>
<< 1.0,0.0,California,55 >>
<< 0.807,0.193,Colorado,9 >>
<< 1.0,0.0,Connecticut,7 >>
<< 1.0,0.0,Delaware,3 >>
<< 1.0,0.0,District of Columbia,3 >>
10
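The same job can be written with `with` blocks, which close both files automatically, and with `itertools.islice`, which stops after the first `n` lines so no manual counter or `break` is needed. The function name `copy_first_lines` and the throwaway files are made up for this sketch:

```python
import os
import tempfile
from itertools import islice

def copy_first_lines(src_path, dst_path, n):
    """Copy the first n lines of src_path to dst_path; return how many were copied."""
    copied = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in islice(src, n):   # islice stops after n lines
            dst.write(line)
            copied += 1
    return copied

# Demo on a throwaway 25-line file
folder = tempfile.mkdtemp()
src_path = os.path.join(folder, "source.csv")
dst_path = os.path.join(folder, "first10.csv")
with open(src_path, "w") as f:
    for i in range(25):
        f.write(f"row {i}\n")

print(copy_first_lines(src_path, dst_path, 10))   # 10
with open(dst_path) as f:
    print(f.readlines()[-1])                       # row 9
```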


## EXERCISE 2: Read a file and parse words from it

Read *Julius Caesar*. Get each line. Remove newline characters from each line. Split the line to get the words from the line. Lowercase them, and print the first 1000 words.

Is this really a good way to get words? Can you suggest a better way, at the cost of more memory?

In [43]:
# your code here
## Read a file, parse lines, and get all words

# make a list with all words in the document
# the words can occur more than once
wordlist = []
# the file is UTF-8 encoded; without encoding=, some systems mis-decode characters such as ’
fd = open("data/Julius Caesar.txt", encoding="utf-8")
lines = fd.readlines()
fd.close()
# strip newline characters and other whitespace off the edges
cleaned_lines = [line.strip() for line in lines]
# make a list of lists:
# each inner list is the list of words on that line
list_of_lines_words = [line.split() for line in cleaned_lines]
# take each list of words, and add its lower-cased words to wordlist
for lines_words in list_of_lines_words:
    wordlist.extend(word.lower() for word in lines_words)  # grow the list in place
print(wordlist[:1000])  # first 1000 words

['the', 'tragedy', 'of', 'julius', 'caesar', 'by', 'william', 'shakespeare', 'contents', 'act', 'i', 'scene', 'i.', 'rome.', 'a', 'street.', 'scene', 'ii.', 'the', 'same.', 'a', 'public', 'place.', 'scene', 'iii.', 'the', 'same.', 'a', 'street.', 'act', 'ii', 'scene', 'i.', 'rome.', 'brutus’', 'orchard.', 'scene', 'ii.', 'a', 'room', 'in', 'caesar’s', 'palace.', 'scene', 'iii.', 'a', 'street', 'near', 'the', 'capitol.', 'scene', 'iv.', 'another', 'part', 'of', 'the', 'same', 'street,', 'before', 'the', 'house', 'of', 'brutus.', 'act', 'iii', 'scene', 'i.', 'rome.', 'before', 'the', 'capitol;', 'the', 'senate', 'sitting.', 'scene', 'ii.', 'the', 'same.', 'the', 'forum.', 'scene', 'iii.', 'the', 'same.', 'a', 'street.', 'act', 'iv', 'scene', 'i.', 'a', 'room', 'in', 'antony’s', 'house.', 'scene', 'ii.', 'before', 'brutus’', 'tent,', 'in', 'the', 'camp', 'near', 'sardis.', 'scene', 'iii.', 'within', 'the', 'tent', 'of', 'brutus.', 'act', 'v', 'scene', 'i.', 'the', 'plains', 'of', 
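Splitting on whitespace leaves punctuation attached, which is why tokens such as `'rome.'` and `'street,'` appear above. One possible refinement, sketched here as an assumption rather than as the intended answer, strips punctuation from both ends of each token using `string.punctuation`. That constant covers only ASCII punctuation, so curly quotes like `’` would survive. The helper name `clean_words` is made up.

```python
import string

def clean_words(line):
    """Hypothetical helper: lower-case the words on a line and strip ASCII punctuation from their ends."""
    words = []
    for token in line.lower().split():
        word = token.strip(string.punctuation)
        if word:                  # skip tokens that were pure punctuation
            words.append(word)
    return words

print(clean_words("Scene I. Rome. A street."))   # ['scene', 'i', 'rome', 'a', 'street']
```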

## Data

Now that we know a bit about files, and about reading their contents in as strings, let us see how we might get data out of these files:

In [47]:
fd = open("data/predictwise.csv")
counter = 0
data = []
for line in fd:
    linedata = line.strip().split(',')
    counter += 1
    if counter > 1:  # don't keep the header line
        data.append(linedata)
fd.close()
data

[['0.0', '1.0', 'Alabama', '9'],
 ['0.0', '1.0', 'Alaska', '3'],
 ['0.062', '0.938', 'Arizona', '11'],
 ['0.0', '1.0', 'Arkansas', '6'],
 ['1.0', '0.0', 'California', '55'],
 ['0.807', '0.193', 'Colorado', '9'],
 ['1.0', '0.0', 'Connecticut', '7'],
 ['1.0', '0.0', 'Delaware', '3'],
 ['1.0', '0.0', 'District of Columbia', '3'],
 ['0.72', '0.28', 'Florida', '29'],
 ['0.004', '0.996', 'Georgia', '16'],
 ['1.0', '0.0', 'Hawaii', '4'],
 ['0.0', '1.0', 'Idaho', '4'],
 ['1.0', '0.0', 'Illinois', '20'],
 ['0.036000000000000004', '0.9640000000000001', 'Indiana', '11'],
 ['0.8370000000000001', '0.163', 'Iowa', '6'],
 ['0.0', '1.0', 'Kansas', '6'],
 ['0.0', '1.0', 'Kentucky', '8'],
 ['0.0', '1.0', 'Louisiana', '8'],
 ['1.0', '0.0', 'Maine', '4'],
 ['1.0', '0.0', 'Maryland', '10'],
 ['1.0', '0.0', 'Massachusetts', '11'],
 ['0.987', '0.013000000000000001', 'Michigan', '16'],
 ['0.982', '0.018000000000000002', 'Minnesota', '10'],
 ['0.0', '1.0', 'Mississippi', '6'],
 ['0.07400000000000001', '0.92599

It might be easier to arrange this data into several lists, one per column, converting each value to the appropriate type along the way:

In [50]:
fd = open("data/predictwise.csv")
counter = 0
obamaprobs = []
romneyprobs = []
states = []
electoral_votes = []
for line in fd:
    linedata = line.strip().split(',')
    counter += 1
    if counter > 1:  # don't keep the header line
        obamaprobs.append(float(linedata[0]))
        romneyprobs.append(float(linedata[1]))
        states.append(linedata[2])
        electoral_votes.append(int(linedata[3]))
fd.close()
electoral_votes

[9,
 3,
 11,
 6,
 55,
 9,
 7,
 3,
 3,
 29,
 16,
 4,
 4,
 20,
 11,
 6,
 6,
 8,
 8,
 4,
 10,
 11,
 16,
 10,
 6,
 10,
 3,
 5,
 6,
 4,
 14,
 5,
 29,
 15,
 3,
 18,
 7,
 7,
 20,
 4,
 9,
 3,
 11,
 38,
 6,
 3,
 13,
 12,
 5,
 10,
 3]
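Splitting on commas by hand breaks as soon as a field itself contains a comma. Even the standard library helps here: the `csv` module understands quoted fields. In this sketch the CSV text is made up, and its second row uses a quoted name containing a comma:

```python
import csv
import io

# Made-up CSV text in the same layout as predictwise.csv
text = 'Obama,Romney,States,Votes\n0.0,1.0,Alabama,9\n1.0,0.0,"Washington, D.C.",3\n'

naive = [line.split(',') for line in text.strip().split('\n')[1:]]
print(naive[1])    # ['1.0', '0.0', '"Washington', ' D.C."', '3']: five fields, wrong

reader = csv.reader(io.StringIO(text))
header = next(reader)   # consume the header row
rows = list(reader)
print(rows[1])     # ['1.0', '0.0', 'Washington, D.C.', '3']
```

With a real file, the `csv` documentation recommends opening it with `newline=""` and passing the file object to `csv.reader`.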

## Numpy

Doing all of this by hand is fraught with edge cases and errors: what if an electoral vote count were written out in words? What if a state's name contained a comma? So we prefer to use robust, well-tested libraries to do the work for us. Here we'll introduce `numpy`; another workshop will introduce Pandas.

In [91]:
import numpy as np
data2 = np.genfromtxt(fname='data/predictwise.csv', 
                   delimiter=',',
                   skip_header=1,
                   dtype=(float, float, '|S32', int)
        )

In [93]:
data2

array([(0.   , 1.   , b'Alabama',  9), (0.   , 1.   , b'Alaska',  3),
       (0.062, 0.938, b'Arizona', 11), (0.   , 1.   , b'Arkansas',  6),
       (1.   , 0.   , b'California', 55), (0.807, 0.193, b'Colorado',  9),
       (1.   , 0.   , b'Connecticut',  7),
       (1.   , 0.   , b'Delaware',  3),
       (1.   , 0.   , b'District of Columbia',  3),
       (0.72 , 0.28 , b'Florida', 29), (0.004, 0.996, b'Georgia', 16),
       (1.   , 0.   , b'Hawaii',  4), (0.   , 1.   , b'Idaho',  4),
       (1.   , 0.   , b'Illinois', 20), (0.036, 0.964, b'Indiana', 11),
       (0.837, 0.163, b'Iowa',  6), (0.   , 1.   , b'Kansas',  6),
       (0.   , 1.   , b'Kentucky',  8), (0.   , 1.   , b'Louisiana',  8),
       (1.   , 0.   , b'Maine',  4), (1.   , 0.   , b'Maryland', 10),
       (1.   , 0.   , b'Massachusetts', 11),
       (0.987, 0.013, b'Michigan', 16), (0.982, 0.018, b'Minnesota', 10),
       (0.   , 1.   , b'Mississippi',  6),
       (0.074, 0.926, b'Missouri', 10), (0.046, 0.954, b'Montana

In [94]:
type(data2)

numpy.ndarray

What's this? It's a new datatype, a new kind of list, called an n-dimensional array, or `ndarray`. How many dimensions does it have? What's so special about it?

In [96]:
data2.ndim

1

OK. Only one dimension. But we had four columns. So what is going on here?

In [97]:
data2.dtype

dtype([('f0', '<f8'), ('f1', '<f8'), ('f2', 'S32'), ('f3', '<i8')])

Asking for the datatype reveals several things. Each element is a record, much like a tuple, with fields `f0`, `f1`, `f2`, and `f3`. Their types are an 8-byte floating-point number, another 8-byte floating-point number, a 32-byte string of raw bytes (which is why the names print as `b'Alabama'`; each character takes one byte, so we allowed 32 characters to fit long names such as "District of Columbia"), and an 8-byte integer.

We saw none of this with regular Python integers, floats, and strings. Why?

The reason is that numpy was created to let us do high-performance array manipulations in Python by carrying them out in compiled C code. Part of what makes this fast is that the array's data sits in one contiguous block of memory. See this:

In [99]:
data2.strides

(56,)

56 is 8 + 8 + 32 + 8, which means that a new row starts every 56 bytes in memory. There is no per-element overhead from Python objects, just raw data. From https://numpy.org/doc/stable/reference/internals.html :

>NumPy arrays consist of two major components, the raw array data (from now on, referred to as the data buffer), and the information about the raw array data. The data buffer is typically what people think of as arrays in C or Fortran, a contiguous (and fixed) block of memory containing fixed sized data items. NumPy also contains a significant set of data that describes how to interpret the data in the data buffer. 

In [111]:
data2[0], data2[1]

((0., 1., b'Alabama', 9), (0., 1., b'Alaska', 3))

In [112]:
bytearray(data2.data)[48], bytearray(data2.data)[104]

(9, 3)

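Byte 48 is where the integer field of the first row begins (8 + 8 + 32), and byte 104 = 56 + 48 is the same field in the second row. Because the integers are stored little-endian (the `<` in `<i8`), their lowest byte comes first, so for small values like these we read back 9 and 3 directly. This fast, direct access is what makes fast computation possible. You have given up something: Python lists can hold heterogeneous objects of differing sizes. But you have gained something: the ability to poke into the machine's memory and get data out directly for use in computations.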
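These offsets can be read off the dtype itself. The sketch below rebuilds a two-row structured array with the same layout as `data2` (values copied from its first two rows). `dtype.fields` maps each field name to its type and byte offset, and indexing with a field name pulls out a whole column, so `data2['f3']` should likewise give every state's electoral votes.

```python
import numpy as np

# Same layout as data2: two 8-byte floats, a 32-byte string, an 8-byte int
dt = np.dtype([('f0', '<f8'), ('f1', '<f8'), ('f2', 'S32'), ('f3', '<i8')])
rows = np.array([(0.0, 1.0, b'Alabama', 9), (0.0, 1.0, b'Alaska', 3)], dtype=dt)

print(dt.itemsize)          # 56 bytes per record
print(rows.strides)         # (56,)
print(dt.fields['f3'][1])   # 48: byte offset of the vote count inside a record
print(rows['f3'])           # [9 3]: a whole column, accessed by field name
```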

![](numpy_arrays.slides.dir/2.png)

There are other ways to make numpy arrays:

In [4]:
my_array = np.array([1, 2, 3, 4])
my_array

array([1, 2, 3, 4])

In [7]:
np.ones(10) # generates 10 floating point ones

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

In [9]:
np.ones(10, dtype='int') # generates 10 integer ones

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

Numpy arrays are listy! Below we compute length, slice, and iterate. 

In [5]:
print(len(my_array))
print(my_array[2:4])
for ele in my_array:
    print(ele)

4
[3 4]
1
2
3
4


**However, in general you should manipulate numpy arrays using numpy functions** (`np.mean`, for example). This is for efficiency; see the VanderPlas book for details. Briefly:

1. numpy arrays are typed: you keep ints together or floats together. They are not meant to combine objects of different types the way Python lists do.
2. numpy stores array data as raw C values and operates on it in compiled C code. A Python float, by contrast, is a full object that carries bookkeeping such as a reference count (used by the garbage collector), so looping over an array in Python means creating a Python object for every element you touch.

You can calculate the mean of the array elements either by calling the method `.mean()` on a numpy array or by applying the function `np.mean` with the numpy array as an argument.

In [6]:
print(my_array.mean())
print(np.mean(my_array))

2.5
2.5


To see the difference in speed, consider two large numpy arrays. Multiplying them element by element and adding up the products gives their *dot product*. Compare an explicit Python loop with numpy's `@` operator:

In [148]:
one = np.random.randn(10000000)
two = np.random.randn(10000000)

In [149]:
%%timeit
accum = 0
for i in range(10000000):
    accum += one[i] * two[i]
accum

1.57 s ± 6.48 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [150]:
%%timeit
one @ two

4.86 ms ± 26.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
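The loop, the `@` operator, `np.dot`, and `np.sum(one * two)` all compute the same dot product. A quick check on smaller, seeded arrays; floating-point rounding can differ slightly between the methods, hence `np.allclose` rather than `==`:

```python
import numpy as np

rng = np.random.default_rng(0)        # seeded so the check is repeatable
one = rng.standard_normal(1000)
two = rng.standard_normal(1000)

accum = 0.0
for i in range(len(one)):             # the slow, explicit Python loop
    accum += one[i] * two[i]

results = [accum, one @ two, np.dot(one, two), np.sum(one * two)]
print(all(np.allclose(r, results[0]) for r in results))   # True
```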


### Numpy supports vector operations

What does this mean? It means that instead of looping over two arrays and adding them element by element, you can simply add the arrays. Note that this behavior is very different from that of Python lists.

![](numpy_arrays.slides.dir/3.png)

In [11]:
first = np.ones(5)
second = np.ones(5)
first + second

array([2., 2., 2., 2., 2.])

In [12]:
first_list = [1., 1., 1., 1., 1.]
second_list = [1., 1., 1., 1., 1.]
first_list + second_list  # concatenates the lists; not elementwise addition

[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]

Modern CPUs have vector (SIMD) instructions that can process several numbers at once, and numpy can use them for operations like this, so speedups can be large. But even without them, the gain in readability is important.

Numpy supports a concept known as *broadcasting*, which dictates how arrays of different shapes are combined. The full rules are beyond this introduction, but importantly, multiplying an array by a number multiplies each element by that number, and adding a number adds it to each element.

In [13]:
first + 1

array([2., 2., 2., 2., 2.])

In [14]:
first*5

array([5., 5., 5., 5., 5.])

## 2D arrays

Similarly, we can create two-dimensional arrays.

![](numpy_arrays.slides.dir/6.png)

In [15]:
my_array2d = np.array([ [1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12] ])
my_array2d

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [16]:
# 3 x 4 array of ones
ones_2d = np.ones([3, 4])
ones_2d

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

Like lists, numpy arrays are 0-indexed.  Thus we can access the $n$th row and the $m$th column of a two-dimensional array with the indices $[n - 1, m - 1]$.

In [17]:
print(my_array2d)
my_array2d[2, 3]

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]


12

2D arrays are listy as well: they have a length (the number of rows), can be sliced, and can be iterated over with a loop, which hands out one row at a time. Below is a schematic illustrating slicing of two-dimensional arrays.

 <img src="images/2dindex_v2.png" alt="Drawing" style="width: 500px;"/>
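A sketch of those listy operations on the `my_array2d` array from above: the length, a row, a column, a rectangular block, and a loop over rows.

```python
import numpy as np

my_array2d = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

print(len(my_array2d))        # 3: the length of a 2D array is its number of rows
print(my_array2d[1])          # row 1: [5 6 7 8]
print(my_array2d[:, 1])       # column 1: 2, 6 and 10
print(my_array2d[0:2, 1:3])   # rows 0-1, columns 1-2: [[2 3] [6 7]]

row_sums = []
for row in my_array2d:        # iteration hands out one row at a time
    row_sums.append(int(row.sum()))
print(row_sums)               # [10, 26, 42]
```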

In two dimensions, we need to provide the **shape** of the array, i.e., the number of rows and columns of the array.

In [18]:
onesarray = np.ones([3,4])
onesarray

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

In [19]:
onesarray.shape

(3, 4)

Numpy functions will by default work on the entire array:

In [20]:
np.sum(onesarray)

12.0

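Axis 0 is the one going downwards (the $y$-axis, so to speak), whereas axis 1 is the one going across (the $x$-axis). Summing along axis 0 therefore collapses the rows and gives one value per column; summing along axis 1 gives one value per row. You will often use functions such as `mean` and `sum` with an axis.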

In [21]:
np.sum(onesarray, axis=0)

array([3., 3., 3., 3.])

In [22]:
np.sum(my_array2d, axis=0)

array([15, 18, 21, 24])

In [23]:
np.sum(onesarray, axis=1)

array([4., 4., 4.])
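Axis-wise results combine naturally with broadcasting. A common pattern is to subtract each column's mean from every row. In this sketch the column means have shape `(4,)`, and numpy stretches them across all three rows of the `(3, 4)` array:

```python
import numpy as np

my_array2d = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
col_means = np.mean(my_array2d, axis=0)   # [5. 6. 7. 8.], shape (4,)
centered = my_array2d - col_means         # (3, 4) minus (4,) broadcasts row by row
print(centered)                           # rows of -4s, 0s and 4s
print(centered.mean(axis=0))              # [0. 0. 0. 0.]
```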

Notice that access is row by row: iterating over a 2D array gives one row at a time. This matches how `numpy` lays out memory by default, one row after another ("row-major" or C order).

![](images/2d-array-layout.png)

(from https://aaronbloomfield.github.io)

You have seen this before: the 56-byte stride of `data2` meant that each record read from our file was stored as one contiguous block, right after the previous one.
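The same layout shows up in the strides of any 2D array. For a `(3, 4)` array of 8-byte floats, stepping to the next row skips 4 * 8 = 32 bytes, while stepping to the next column skips 8. numpy can also store an array column by column ("Fortran order"), which swaps the pattern, as this sketch shows:

```python
import numpy as np

c_order = np.ones((3, 4))               # default layout: row-major ("C order")
f_order = np.asfortranarray(c_order)    # same values, column-major layout
print(c_order.strides)                  # (32, 8)
print(f_order.strides)                  # (8, 24)
print(c_order.flags['C_CONTIGUOUS'], f_order.flags['F_CONTIGUOUS'])   # True True
```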

### Idiomatic filling

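A commonly seen idiom allocates a two-dimensional array and then fills it in row by row with one-dimensional arrays computed by some function. Note that `np.empty` does not initialize its memory: the contents are whatever happened to be there (zeros in the output below, but you may see arbitrary values), so every entry must be filled in before it is used.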

In [25]:
empty_array = np.empty((2,3))
empty_array

array([[0., 0., 0.],
       [0., 0., 0.]])

In [28]:
for i in range(empty_array.shape[0]):
    print(empty_array[i])

[0. 0. 0.]
[0. 0. 0.]


In [29]:
for i in range(empty_array.shape[0]):
    empty_array[i] = np.random.rand(3)
empty_array

array([[0.68625855, 0.09451053, 0.9559449 ],
       [0.03386545, 0.08089574, 0.84962253]])

## EXERCISE 3: Read an array from a file and make computations on it

In [117]:
data3 = np.genfromtxt(fname='data/inflammation-01.csv', 
                   delimiter=',',
        )

In [118]:
data3

array([[0., 0., 1., ..., 3., 0., 0.],
       [0., 1., 2., ..., 1., 0., 1.],
       [0., 1., 1., ..., 2., 1., 1.],
       ...,
       [0., 1., 1., ..., 1., 1., 1.],
       [0., 0., 0., ..., 0., 2., 0.],
       [0., 0., 1., ..., 1., 1., 0.]])

In [120]:
data3.ndim, data3.shape, data3.strides

(2, (60, 40), (320, 8))

In [122]:
data3.dtype

dtype('float64')

In [123]:
data3[0].shape

(40,)

In [124]:
data3.mean()

6.14875

In [125]:
np.mean(data3, axis=0)

array([ 0.        ,  0.45      ,  1.11666667,  1.75      ,  2.43333333,
        3.15      ,  3.8       ,  3.88333333,  5.23333333,  5.51666667,
        5.95      ,  5.9       ,  8.35      ,  7.73333333,  8.36666667,
        9.5       ,  9.58333333, 10.63333333, 11.56666667, 12.35      ,
       13.25      , 11.96666667, 11.03333333, 10.16666667, 10.        ,
        8.66666667,  9.15      ,  7.25      ,  7.33333333,  6.58333333,
        6.06666667,  5.95      ,  5.11666667,  3.6       ,  3.3       ,
        3.56666667,  2.48333333,  1.5       ,  1.13333333,  0.56666667])

In [126]:
np.mean(data3, axis=1)

array([5.45 , 5.425, 6.1  , 5.9  , 5.55 , 6.225, 5.975, 6.65 , 6.625,
       6.525, 6.775, 5.8  , 6.225, 5.75 , 5.225, 6.3  , 6.55 , 5.7  ,
       5.85 , 6.55 , 5.775, 5.825, 6.175, 6.1  , 5.8  , 6.425, 6.05 ,
       6.025, 6.175, 6.55 , 6.175, 6.35 , 6.725, 6.125, 7.075, 5.725,
       5.925, 6.15 , 6.075, 5.75 , 5.975, 5.725, 6.3  , 5.9  , 6.75 ,
       5.925, 7.225, 6.15 , 5.95 , 6.275, 5.7  , 6.1  , 6.825, 5.975,
       6.725, 5.7  , 6.25 , 6.4  , 7.05 , 5.9  ])

In [127]:
data3.max(axis=1)

array([18., 18., 19., 17., 17., 18., 17., 20., 17., 18., 18., 18., 17.,
       16., 17., 18., 19., 19., 17., 19., 19., 16., 17., 15., 17., 17.,
       18., 17., 20., 17., 16., 19., 15., 15., 19., 17., 16., 17., 19.,
       16., 18., 19., 16., 19., 18., 16., 19., 15., 16., 18., 14., 20.,
       17., 15., 17., 16., 17., 19., 18., 18.])

In [128]:
data3.min(axis=0)

array([0., 0., 0., 0., 1., 1., 1., 1., 2., 2., 2., 2., 3., 3., 3., 3., 4.,
       5., 5., 5., 5., 4., 4., 4., 4., 3., 3., 3., 3., 2., 2., 2., 2., 1.,
       1., 1., 1., 0., 0., 0.])
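To go one step further with arrays like `data3`: `np.argmax` finds the position of the maximum, and a boolean condition selects rows. This sketch uses a small made-up array and assumes rows are patients and columns are days; that layout is an assumption about the data.

```python
import numpy as np

# Made-up readings: 3 patients (rows) by 4 days (columns)
readings = np.array([[0., 2., 5., 1.],
                     [1., 3., 4., 0.],
                     [0., 1., 6., 2.]])

daily_mean = readings.mean(axis=0)           # one value per day (column)
peak_day = int(np.argmax(daily_mean))        # index of the day with the highest mean
high = readings[readings.max(axis=1) > 5]    # only the patients whose maximum exceeds 5
print(daily_mean, peak_day, high.shape)      # peak_day is 2; high has shape (1, 4)
```

Applied to the column means of `data3` shown earlier, the largest value, 13.25, sits at index 20.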