# BIOSTAT 203C Lecture 3

## Topics from last weekend:
- Variables and basic data types
- Making decisions
- Repeating code in loops
- Lists
- Reading / Writing files
- Functions

### Variables and basic data types

```python
v1 = "Hi there!"
v2 = 300
v3 = 300.0
v4 = 10 < 14
```

### Making decisions
```python
snow_outside = True
if snow_outside:
    print("Stay home!")
else:
    print("Go to work!")
```


### Repeating code in loops

```python
i = 1
while i <= 10:
    print(i)
    i += 2
print("Done!")
```
    1
    3
    5
    7
    9
    Done!

```python
i = 1
while True:
    if i > 10:
        break
    print(i)
    i += 2
print("Done!")
```
    1
    3
    5
    7
    9
    Done!

```python
for i in range(1, 10, 2):
    print(i)
print("Done!")
```
    1
    3
    5
    7
    9
    Done!
    
```python
i = 1
for i in range(1, 10, 2):
    if i % 3:
        continue
    print(i)
print("Done!")
```
    3
    9
    Done!

### Lists


The list is Python's basic collection type. Lists are mutable: their contents can be changed in place.

```python
temp_Westwood_avg = [41, 44, 50, 58, 67, 76, 85, 84, 77, 64, 49, 40]
my_favorite_fruits = ["ananas", "bananas", "mango", "apples"]
l = temp_Westwood_avg + my_favorite_fruits # concatenation
l.append("203c") # in-place
l[0] = 34 # in-place
```

### Functions

```python
def say_hello(name):
    "say hi to `name`" # docstring!
    print("Hi {}!".format(name))
def say_hello_to_all(friends):
    """Say hi to everyone in the collection `friends`.
    # Input:
        - `friends`: Names of the friends. A collection of strings.
    # Returns:
        - None
    """ # more detailed docstring
    for f in friends:
        say_hello(f)
```

In [None]:
def say_hello(name):
    "say hi to `name`" # docstring!
    print("Hi {}!".format(name))

say_hello("Matt")

Hi Matt!


In [None]:
def say_hello_to_all(friends):
    """Say hi to everyone in the collection `friends`.
    # Input:
        - `friends`: Names of the friends. A collection of strings.
    # Returns:
        - None
    """ # more detailed docstring
    for f in friends:
        say_hello(f)

In [None]:
say_hello_to_all(["Alice", "Bob", "Chris"])

Hi Alice!
Hi Bob!
Hi Chris!


### Exercise.

Create a function that checks if two lists have at least one element in common.

In [None]:
def has_something_in_common(l1, l2):
    for i1 in l1:
        for i2 in l2:
            if i1 == i2:
                return True
    return False

In [None]:
l1 = [1, 2, 3, 4, 5]
l2 = [2, 7, 8, 9, 10]
has_something_in_common(l1, l2) # Boolean

True

## More collections: Tuples, Sets, and Dictionaries

### Tuples
A tuple is essentially an immutable version of a list.

In [None]:
t = (2, 3)

In [None]:
t = (0, 1, 2, [777], (2, 3))
print(t)
t = t * 2 # repetition
print(t)
t[0] = 99
print(t)

(0, 1, 2, [777], (2, 3))
(0, 1, 2, [777], (2, 3), 0, 1, 2, [777], (2, 3))


TypeError: 'tuple' object does not support item assignment

But we can still mutate mutable entries of tuples:

In [None]:
t[3].append(8)
print(t)

(0, 1, 2, [777, 8], (2, 3), 0, 1, 2, [777, 8], (2, 3))
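
Note that `8` shows up in both list entries of the output above. A minimal sketch of why, using illustrative names (`inner` is not part of the lecture code): repetition with `*` copies references to the same objects rather than the objects themselves.

```python
inner = [777]
t = (0, inner) * 2          # repetition copies references, not the list itself

# Both list slots refer to the very same object.
print(t[1] is t[3])         # True

t[1].append(8)              # mutate the list through one slot...
print(t)                    # ...and the change is visible through both slots
```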


In [None]:
# lists, strings, and tuples all support this operation

L = [1, 2, 3]
print(L + [4])

s = '123'
print(s + '4')

t = (1, 2, 3)
print(t + (4,))

# but the immutable data types don't have something equivalent to list append!

[1, 2, 3, 4]
1234
(1, 2, 3, 4)


In [None]:
(1, 1, 1)

(1, 1, 1)

### Sets

- A set stores multiple items in a single variable.
- A set is an unordered collection.
- All elements of a set must be unique and immutable (more precisely, hashable).
- Sets support mathematical operations such as union, intersection, and symmetric difference.
- Sets are fast at finding elements.

#### Set operations
- `|`: Union $A \cup B$
- `&`: Intersection $A \cap B$
- `-`: Set difference $A \setminus B = A \cap B^C$
- `^`: Symmetric difference $A \triangle B = (A \setminus B) \cup (B \setminus A)$

Set literals are defined by curly brackets:
```python
a = {1, 3, 5, 7, "A"}
b = {2, 4, 6, "A"}
```

In [None]:
a = {1, 3, 5, 7, "A"}
b = {2, 4, 6, "A"}

In [None]:
a | b

{1, 2, 3, 4, 5, 6, 7, 'A'}

In [None]:
a & b

{'A'}

In [None]:
a - b

{1, 3, 5, 7}

In [None]:
a ^ b

{1, 2, 3, 4, 5, 6, 7}

Sets are based on a hash table, which allows adding and removing elements very
efficiently.

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/7/7d/Hash_table_3_1_1_0_1_0_0_SP.svg/2560px-Hash_table_3_1_1_0_1_0_0_SP.svg.png" width="250"/>

#### Adding/removing an element
```python
a = {1, 3, 5, 7, 'A'}
a.add('B')
print(a)
a.add("B")
print(a)
```

In [None]:
a = {1, 3, 5, 7, 'A'}
a.add("B")
print(a)
a.add("B")
print(a)

{1, 3, 5, 7, 'B', 'A'}
{1, 3, 5, 7, 'B', 'A'}


```python
a = {1, 3, 5, 7, 'A'}
a.remove(1)
print(a)
```

In [None]:
a = {1, 3, 5, 7, "A"}
a.remove(1)
print(a)

{3, 5, 7, 'A'}


This is much slower with a list, because one has to search for the item starting from the beginning of the list.
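
A rough sketch of this difference, using the standard `timeit` module to time a membership test, which is the search step that removal also needs. The size `n` and the variable names are arbitrary choices for illustration, and the measured times depend on your machine.

```python
import timeit

n = 100_000
as_list = list(range(n))
as_set = set(as_list)
target = n - 1              # worst case for the list: the last element

# Both containers give the same answer...
print(target in as_list, target in as_set)

# ...but the list is scanned element by element, while the set uses a hash lookup.
t_list = timeit.timeit(lambda: target in as_list, number=100)
t_set = timeit.timeit(lambda: target in as_set, number=100)
print(f"list: {t_list:.5f} s, set: {t_set:.5f} s")
```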

In [None]:
a = {1, 3, 5, 7, 'A'}
a.add([1, 3])
print(a)

TypeError: unhashable type: 'list'

This is not allowed because a set can only contain immutable values: by default, built-in immutable types whose elements are themselves all immutable. A list is mutable, so it cannot be added to a set. We can use a tuple instead.

Technically, sets can contain "hashable" values: see
https://docs.python.org/3/glossary.html#term-hashable for its definition.
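
A small sketch of what "hashable" means in practice. The helper `is_hashable` is a hypothetical name introduced here for illustration; it simply tries the built-in `hash()`.

```python
def is_hashable(obj):
    """Return True if `obj` can be used as a set element or dict key."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

print(is_hashable([1, 3]))          # False: lists are mutable
print(is_hashable((1, 3)))          # True
print(is_hashable((1, [3])))        # False: a tuple is hashable only if all its items are
print(is_hashable(frozenset({1})))  # True: frozenset is the immutable set type
```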

In [None]:
a = {1, 3, 5, 7, 'A'}
a.add((1, 3))
print(a)

{1, 3, 5, 7, 'A', (1, 3)}


A list can be converted to a set. Duplicate entries are automatically removed.
```python
l = [1, 3, 5, 7, 'A', "A", 1]
s = set(l)
print(s)
```

In [None]:
l = [1, 3, 5, 7, 'A', "A", 1]
s = set(l)
print(s)

{1, 3, 5, 7, 'A'}


#### Inclusion.

```python
a = {1, 3, 5, 7, 'A'}
print('A' in a)
print('B' in a)
```

In [None]:
a = {1, 3, 5, 7, 'A'}
print("A" in a)
print('B' in a)

True
False


#### Usage in for loops.
```python
a = {1, 3, 5, 7, 'A'}
for e in a:
    print(e)
```
Since a set is unordered, you should only loop over it when the order of execution is not important.

In [None]:
a = {1, 3, 5, 7, 'A'}
for e in a:
    print(e)

1
3
5
7
A
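
If the order does matter, one option is to loop over `sorted(...)` of the set. A minimal sketch, assuming all elements can be compared with each other (a set mixing integers and strings, like the one above, cannot be sorted this way):

```python
a = {7, 3, 5, 1}
ordered = sorted(a)     # sorted() returns a new list in increasing order
for e in ordered:
    print(e)
```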


#### Exercise (5 min)

Create a function that checks if two lists have at least one element in common that
uses sets.

Hints:
- Inside the function, create two sets from the lists using `set()`
- Use the `&` operator to get the intersection of the two sets
- Check whether the intersection has any elements

Think:
- Using sets for this task is more computationally efficient than using lists directly. Why?

In [None]:
def has_something_in_common_set(l1, l2):
    s1 = set(l1)
    s2 = set(l2)
    intersection = s1 & s2
    n_elements = len(intersection)
    return n_elements > 0

In [None]:
l1 = [1, 2, 3, 4, 5]
l2 = [6, 7, 8, 9, 10]
has_something_in_common_set(l1, l2)

False
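
One possible variant, sketched under the assumption that all list elements are hashable: `set.isdisjoint()` answers the question directly without building the full intersection. The function name below is hypothetical. As for the "Think" question: the nested loops may compare up to `len(l1) * len(l2)` pairs, whereas each set lookup takes roughly constant time, so the total work is closer to `len(l1) + len(l2)`.

```python
def has_something_in_common_disjoint(l1, l2):
    """Hypothetical variant of the exercise solution above."""
    # isdisjoint() returns True when the two collections share no element.
    return not set(l1).isdisjoint(l2)

print(has_something_in_common_disjoint([1, 2, 3, 4, 5], [2, 7, 8, 9, 10]))  # True
print(has_something_in_common_disjoint([1, 2, 3, 4, 5], [6, 7, 8, 9, 10]))  # False
```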

### Task 1.  Attack of the CipherTexts

- Ruby is a code-breaker. She knows that the very bad people (Mr. X and Mr. Z) are sending secret messages about very bad things to each other.
- However, Ruby has managed to intercept a plaintext message and the corresponding ciphertext message.
- Your job is to automate Ruby's codebreaking and help save the world.

- __Input__: The input consists of 3 strings, with each string on a separate line. The first string is the plaintext message which Ruby knows about. The second string is the ciphertext message which corresponds to the plaintext message. The third string is another ciphertext message.
- __Output__: The plaintext string that corresponds to the second ciphertext input

- Sample input:
```
THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG
UIFARVJDLACSPXOAGPYAKVNQTAPWFSAUIFAMB ZAEPH
XFABSFAWFSZACBEAQFPQMFAEPJOHAWFSZACBEAUIJOHTAIBAIB
```
- Sample output:
```
WE ARE VERY BAD PEOPLE DOING VERY BAD THINGS HA HA
```

### Dictionaries

- Dictionaries store key-value pairs
- Like sets, dictionaries use hash tables for their keys, so they are very efficient at finding elements

```python
d = {"name": "Zara",
     "age": 34,
     "position": "software engineer",
     "degree": "BS"
    }
```

In [None]:
d = {"name": "Zara",
     "age": 34,
     "position": "software engineer",
     "degree": "BS"}

Dictionaries have a `.keys()` method that lists the keys in the dictionary:

In [None]:
d.keys()

dict_keys(['name', 'age', 'position', 'degree'])

Dictionaries also have a `.values()` method that lists the values in the dictionary.

In [None]:
d.values()

dict_values(['Zara', 34, 'software engineer', 'BS'])

There is also an `.items()` method that lists the key-value pairs.

In [None]:
d.items()

dict_items([('name', 'Zara'), ('age', 34), ('position', 'software engineer'), ('degree', 'BS')])

One may look up a value by its key using square brackets (`[]`):
```python
d["name"]
```

In [None]:
d["name"]

'Zara'

```python
d["degree"]
```

In [None]:
d["degree"]

'BS'

Dictionaries are mutable: you can change the value corresponding to a certain key.
```python
d["degree"] = "MS"
d["degree"]
```


In [None]:
d["degree"] = "MS"
d["degree"]

'MS'

In [None]:
d

{'name': 'Zara', 'age': 34, 'position': 'software engineer', 'degree': 'MS'}

One may iterate over a dictionary using a for loop:
```python
for k in d:
    print("{}: {}".format(k, d[k]))
```

In [None]:
for k in d:
    print(f"{k}: {d[k]}")

name: Zara
age: 34
position: software engineer
degree: MS


One may also use the `.items()` method:

```python
for (k, v) in d.items():
    print("{}: {}".format(k, v))
```

In [None]:
for (k, v) in d.items():
    print("{}: {}".format(k, v))

name: Zara
age: 34
position: software engineer
degree: MS


### Back to the Task.
- Ruby is a code-breaker. She knows that the very bad people (Mr. X and Mr. Z) are sending secret messages about very bad things to each other.
- However, Ruby has managed to intercept a plaintext message and the corresponding ciphertext message.
- Your job is to automate Ruby's codebreaking and help save the world.

- __Input__: The input consists of 3 strings, with each string on a separate line. The first string is the plaintext message which Ruby knows about. The second string is the ciphertext message which corresponds to the plaintext message. The third string is another ciphertext message.
- __Output__: The plaintext string that corresponds to the second ciphertext input

- Sample input:
```
THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG
UIFARVJDLACSPXOAGPYAKVNQTAPWFSAUIFAMB ZAEPH
XFABSFAWFSZACBEAQFPQMFAEPJOHAWFSZACBEAUIJOHTAIBAIB
```
- Sample output:
```
WE ARE VERY BAD PEOPLE DOING VERY BAD THINGS HA HA
```

In [None]:
plaintext_correct = input("Correct plain text: ")
cipher_text = input("Cipher text: ")
cipher_text_to_decode = input("Cipher text to decode: ")

Correct plain text: THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG
Cipher text: UIFARVJDLACSPXOAGPYAKVNQTAPWFSAUIFAMB ZAEPH
Cipher text to decode: XFABSFAWFSZACBEAQFPQMFAEPJOHAWFSZACBEAUIJOHTAIBAIB


In [None]:
d = dict()
for (i, c) in enumerate(cipher_text):
    d[c] = plaintext_correct[i]

In [None]:
result = ""
for c in cipher_text_to_decode:
    result += d[c]

print(result)

WE ARE VERY BAD PEOPLE DOING VERY BAD THINGS HA HA
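
The same idea can be written more compactly. This is only a sketch: `decode` is a hypothetical helper name, and it assumes every character of the new ciphertext already appears in the known ciphertext.

```python
def decode(plaintext_known, ciphertext_known, ciphertext_new):
    """Hypothetical helper: decode `ciphertext_new` with a substitution table."""
    # zip() pairs each cipher character with the plain character at the same position.
    table = dict(zip(ciphertext_known, plaintext_known))
    # Look up every character and join the results back into one string.
    return "".join(table[c] for c in ciphertext_new)

print(decode("THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG",
             "UIFARVJDLACSPXOAGPYAKVNQTAPWFSAUIFAMB ZAEPH",
             "XFABSFAWFSZACBEAQFPQMFAEPJOHAWFSZACBEAUIJOHTAIBAIB"))
```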


In [None]:
d["a"]

KeyError: 'a'
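
Indexing with a key that is not in the dictionary raises `KeyError`, as shown above. A short sketch of common ways to handle a possibly missing key; the small table `d` below is illustrative only.

```python
d = {"U": "T", "I": "H", "F": "E"}   # a small illustrative table

print("a" in d)             # False: check before indexing
print(d.get("a"))           # None: .get() returns None instead of raising
print(d.get("a", "?"))      # ?: or a default value of your choice
print(d.get("U", "?"))      # T

try:
    d["a"]
except KeyError as err:     # or catch the error explicitly
    print("missing key:", err)
```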

### When to use `tuple` vs. `list` vs. `set`/`dict`?

- A `list` is the most flexible: it is mutable and can hold any kind of object.

- Immutable objects, including `tuple`s, can be keys in a `dict` or elements of a `set`. (More precisely, built-in immutable objects support `hash()`, if you're familiar with the concept of hash tables or hash maps: https://docs.python.org/3/library/functions.html#hash) Tuples are also faster than lists.
- Searching for an item is fastest in a `set` or `dict`.

### Example.

You are given `N` numbers, `a1`, `a2`, `a3`, ..., `aN`. Output all the modes of this list on a single line. The modes of a list are the values that appear the most times in the list. It is guaranteed that at least one mode exists.

#### Sample Input (to be read through input() function)
The first line contains one number, $N$. The second line contains $N$ space-separated integers, $a_i$, the numbers in this list.

```
10
9 2 9 6 8 7 1 3 9 6
```

#### Sample Output (printed)

On one line, output the modes of the $N$ numbers in increasing order.

```
9
```

From: https://dmoj.ca/problem/dmopc19c3p1

In [None]:
d = dict()

Step 1: Write a function `mode()` that takes a list of numbers as input and returns the sorted list of modes.

In [None]:
def mode(l):
    # count occurrence of each number.
    # We build a dictionary with an element as a key,
    # and their number of occurrences as a value.
    d = dict()


    # If the number has already appeared in the list,
    # we add 1 to its count.
    # Otherwise, we create a key with value 1.

    for i in l:
        if i in d.keys():
            d[i] += 1
        else:
            d[i] = 1

    # Now, we find the maximum among the values.
    # We can use the built-in max() function.
    m = max(d.values())

    # Finally, build a list of modes. sort it. Return the list.
    l_modes = []
    for i in d.keys():
        if d[i] == m:
            l_modes.append(i)
    l_modes.sort()
    return l_modes


In [None]:
l = [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 5]
mode(l)

[1, 2]
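
The counting step can also be delegated to the standard library. A sketch using `collections.Counter`; the function name `mode_counter` is a hypothetical one chosen to avoid clashing with `mode()` above.

```python
from collections import Counter

def mode_counter(l):
    """Hypothetical variant of `mode()` built on collections.Counter."""
    counts = Counter(l)                 # maps each value to its number of occurrences
    m = max(counts.values())            # the highest count
    return sorted(k for k, v in counts.items() if v == m)

print(mode_counter([1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 5]))   # [1, 2]
print(mode_counter([9, 2, 9, 6, 8, 7, 1, 3, 9, 6]))         # [9]
```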

Step 2: Build input/output according to the problem's specification.

In [None]:
N = int(input())
numbers = input()


12
1 2 3 1 2 3 1 2 3 1 2 5


In [None]:
# list of numbers parsed
l = []
# split the string into a list of tokens, using a space as the delimiter
tokens = numbers.split(' ')
# convert each token to an integer
numbers = [int(i) for i in tokens]

ms = mode(numbers)
print(' '.join([str(x) for x in ms]))

1 2


`[str(x) for x in ms]` creates a list whose elements are `str(x)` for each element `x` in `ms`. It is a compact way of writing:
```python
l = []
for x in ms:
    l.append(str(x))
```
This is called a __list comprehension__.

In [None]:
l

[]
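
A list comprehension can also filter elements with an optional `if` clause. A small sketch; `parsed` and `evens` are illustrative names, and the input string is the one typed in above.

```python
numbers = "1 2 3 1 2 3 1 2 3 1 2 5"

# split() with no argument splits on any run of whitespace.
parsed = [int(tok) for tok in numbers.split()]

# The `if` clause keeps only the elements that pass the test.
evens = [x for x in parsed if x % 2 == 0]

print(parsed)
print(evens)        # [2, 2, 2, 2]
```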

### Exercise.

Source: https://dmoj.ca/problem/coci14c2p2

Numerous local and international recreational runners were eager to take part in this year's Zagreb Marathon! It is an already traditional race 42195 meters long. A curious statistical info is that this year every single contestant managed to complete the race, except one.

Since marathons are all about taking part, help the organizers figure out, based on the list of registered contestants and ranking list, the identity of the contestant that did not complete the race.

__Input__: The first line of input contains the integer $N$, the number of contestants. Each of the following $N$ lines contains the names of registered contestants.
The additional $N-1$ lines contain the names of contestants in the order which they completed the race.

The contestants' names will consist of at least one and at most twenty lowercase letters of the English alphabet.

The contestants' names won't necessarily be unique.

__Output__: print the name of the contestant who didn't finish the race.

##### Sample I/O
Input:

```
3
leo
kiki
eden
eden
kiki
```

Output:
```
leo
```


Input:
```
5
marina
josipa
nikola
vinko
filipa
josipa
filipa
marina
nikola
```

Output:
```
vinko
```
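
One possible approach, sketched here as a starting point rather than a reference solution: because names are not necessarily unique, a set is not enough, so we count names in a dictionary. The helper name `did_not_finish` is hypothetical, and reading the input with `input()` is left out.

```python
def did_not_finish(registered, finished):
    """Hypothetical helper: return the name registered once more than it finished."""
    counts = {}
    for name in registered:
        counts[name] = counts.get(name, 0) + 1   # one more registration
    for name in finished:
        counts[name] -= 1                        # one finisher accounted for
    for name, c in counts.items():
        if c > 0:                                # exactly one name is left over
            return name

print(did_not_finish(["leo", "kiki", "eden"], ["eden", "kiki"]))   # leo
```

The sketch assumes valid input, i.e. every finisher also appears in the registration list.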

## Libraries
- Libraries are collections of reusable code that developers can use in their own programs.
- Libraries contain complex data types and functions for advanced math, data analysis, machine learning, web parsing, file parsing, and other purposes.
- Useful libraries include `sys` (built-in), `math` (built-in), `numpy` (external), `pandas` (external), `matplotlib` (external), and others.

Libraries are an important characteristic of how Python works: each application relies on its own set of libraries.

| Package | Description | Logo |
|:---:|:---|:---:|
| sys | system utilities | |
| os | operating system interfaces | |
| math | extended math library | |
| Numpy | Numerical arrays | <img src="https://numpy.org/doc/stable/_static/numpylogo.svg" width="300"/> |
| Scipy | Scientific Python<br>User-friendly and efficient numerical routines:<br> numerical integration, interpolation, optimization, linear algebra, and statistics | <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/b/b2/SCIPY_2.svg/1024px-SCIPY_2.svg.png?20200904111722" width="150"/> |
| Matplotlib | Plotting | <img src="https://matplotlib.org/stable/_static/logo_dark.svg" width="300"/>|
| Pandas | Data analytics <br> R-like data frames | <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/e/ed/Pandas_logo.svg/1024px-Pandas_logo.svg.png" width="300"/> |
| Scikit-learn | Machine learning | <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/0/05/Scikit_learn_logo_small.svg/2880px-Scikit_learn_logo_small.svg.png" width="300"/> |

All the packages above are included in the Anaconda distribution.


#### A very simple way to create a module
```python
my_module_code = """
def say_hello(a):
    print("Hello: {}".format(a))
"""
with open("my_module.py", "w") as f:
    f.write(my_module_code)
```

In [None]:
my_module_code = """
def say_hello(a):
    print("Hello: {}".format(a))
"""
with open("my_module.py", "w") as f:
    f.write(my_module_code)

Then you can import it by:
```python
import my_module
```
and use the function with:
```python
my_module.say_hello("Bob")
```

In [None]:
import my_module

In [None]:
my_module.say_hello("Bob")

Hello: Bob


Or, you can give a nickname to the module:
```python
import my_module as m
m.say_hello("Bob")
```

In [None]:
import my_module as m
m.say_hello("Bob")

Hello: Bob


In [None]:
import numpy as np

### Library `sys`

In [None]:
import sys

Version of Python you are running:

In [None]:
sys.version_info

sys.version_info(major=3, minor=10, micro=12, releaselevel='final', serial=0)

Description of floating point numbers:

In [None]:
sys.float_info

sys.float_info(max=1.7976931348623157e+308, max_exp=1024, max_10_exp=308, min=2.2250738585072014e-308, min_exp=-1021, min_10_exp=-307, dig=15, mant_dig=53, epsilon=2.220446049250313e-16, radix=2, rounds=1)

It's standard 64-bit IEEE "double precision" representation.

Check platform you are running on:

In [None]:
sys.platform

'linux'

See what is available in the module `sys`:

In [None]:
dir(sys)

['__breakpointhook__',
 '__displayhook__',
 '__doc__',
 '__excepthook__',
 '__interactivehook__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 '__stderr__',
 '__stdin__',
 '__stdout__',
 '__unraisablehook__',
 '_base_executable',
 '_clear_type_cache',
 '_current_exceptions',
 '_current_frames',
 '_deactivate_opcache',
 '_debugmallocstats',
 '_framework',
 '_getframe',
 '_git',
 '_home',
 '_xoptions',
 'abiflags',
 'addaudithook',
 'api_version',
 'argv',
 'audit',
 'base_exec_prefix',
 'base_prefix',
 'breakpointhook',
 'builtin_module_names',
 'byteorder',
 'call_tracing',
 'copyright',
 'displayhook',
 'dont_write_bytecode',
 'exc_info',
 'excepthook',
 'exec_prefix',
 'executable',
 'exit',
 'flags',
 'float_info',
 'float_repr_style',
 'get_asyncgen_hooks',
 'get_coroutine_origin_tracking_depth',
 'get_int_max_str_digits',
 'getallocatedblocks',
 'getdefaultencoding',
 'getdlopenflags',
 'getfilesystemencodeerrors',
 'getfilesystemencoding',
 'getprofile',
 'getrecursi

## Running scripts

So far, we have learned how to write Python code in Jupyter Notebook. However, for large-scale programs, or if you want to run your program on a shared machine or cluster, Jupyter Notebook is not a good way to run Python code. Now we will learn how to run code directly in a terminal. Jupyter Notebook can also send commands to the shell: a line in a cell that starts with `!` is equivalent to typing the command in your terminal.

In [None]:
!python --version

Python 3.10.12


In [None]:
with open("my_first_script.py", "w") as f_out:
    f_out.write("""print("This is output from my first script!")""")

Let's see the content of the newly-created python script:

In [None]:
!cat my_first_script.py

print("This is output from my first script!")

Let's run this python script:

In [None]:
!python my_first_script.py

This is output from my first script!


### User arguments

```python
import sys
print("This is the name of the program: ", sys.argv[0])
print("\nArgument list: \n", "\n".join(sys.argv), end="")
```
Further reading: check the `argparse` library if you need to work with command-line arguments. It is much more convenient than using `sys.argv` directly. https://docs.python.org/3/library/argparse.html

In [None]:
import sys
print("This is the name of the program: ", sys.argv[0])
print("\nArgument list: \n", "\n".join(sys.argv), end="")

This is the name of the program:  /usr/local/lib/python3.10/dist-packages/colab_kernel_launcher.py

Argument list: 
 /usr/local/lib/python3.10/dist-packages/colab_kernel_launcher.py
-f
/root/.local/share/jupyter/runtime/kernel-f333037e-c1ab-4809-88d6-659fc51cbc8e.json

### Running program with user arguments

#### Example:

Write a program that accepts two integers `x` and `y` from command line user arguments and prints `x ** 2 + y` on the display.

The code would be:
```python
my_program = """
import sys
x = int(sys.argv[1])
y = int(sys.argv[2])

print("x squared plus y equals to {}".format(x ** 2 + y))
"""
```

In [None]:
my_program = """
import sys
x = int(sys.argv[1])
y = int(sys.argv[2])

print("x squared plus y equals to {}".format(x ** 2 + y))
"""

We would like to save it as a file with an extension `.py`. One may just copy-paste the code above to a new `.py` file, or:
```python
with open("x_sq_pl_y.py", "w") as f_out:
    f_out.write(my_program)
```

In [None]:
with open("x_sq_pl_y.py", "w") as f_out:
    f_out.write(my_program)

Then we can test our script with:
```bash
!python x_sq_pl_y.py 5 4
```

In [None]:
!python x_sq_pl_y.py 5 4

x squared plus y equals to 29
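
For comparison, here is a sketch of the same computation with the standard `argparse` library, which converts the arguments to integers and generates a `--help` message for us. The description and help strings are illustrative. In a real script you would call `parser.parse_args()` with no argument so that it reads `sys.argv`; an explicit list is passed here so that the sketch also runs inside a notebook.

```python
import argparse

parser = argparse.ArgumentParser(description="Print x squared plus y.")
parser.add_argument("x", type=int, help="first integer")
parser.add_argument("y", type=int, help="second integer")

# Explicit argument list for illustration; in a script, use parser.parse_args().
args = parser.parse_args(["5", "4"])
result = args.x ** 2 + args.y
print("x squared plus y equals to {}".format(result))
```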


## NumPy

In standard Python, the basic data collection is the `list`.

| Advantages | Disadvantages (for scientists) |
|:---|:---|
|- Can contain different types of objects<br>- Easy insertion<br>- Easy concatenation |- Sum of two lists is concatenation, not vector addition<br>- Slow for large lists<br>- No built-in statistical functions (mean, variance, etc.)|

Scientists would like an object closer to the notion of vector or matrix. NumPy implements that.
Thus, nearly all other scientific libraries in Python are based on NumPy (dubbed the "NumPy ecosystem").


NumPy is a library for large multidimensional arrays and matrices, and math operations with them.

In [None]:
import numpy

To create a NumPy array, use `variable = numpy.array(Array_Like)`.

In [None]:
a = [1,2,3,4,5]
print( type(a) )   # displays the type of the variable a

b = numpy.array( [1,2,3,4,5] )
print( type(b) )   # displays the type of the variable b

<class 'list'>
<class 'numpy.ndarray'>


We are going to use `numpy` a lot, and five letters seem to be too much to type each time... It is extremely common to import it as `np` (just a nickname).

In [None]:
import numpy as np  # allows user to use np instead of numpy

c = np.array( [5,4,3,2,1] )
print( type(c) )    # displays the type of the variable c

<class 'numpy.ndarray'>


Using mathematical operations on lists vs. numpy arrays:

In [None]:
print("Sum of lists: ", a + a)


Sum of lists:  [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]


In [None]:
print("Sum of NumPy arrays: ", b + c )


Sum of NumPy arrays:  [6 6 6 6 6]


In [None]:
print("Original array: ", b)


Original array:  [1 2 3 4 5]


Items in a list occupy different memory locations, so operations that take place directly on lists typically require allocating memory many times.

On the other hand, arrays as defined via `numpy` occupy a single, contiguous chunk of memory. This allows `numpy` arrays to occupy significantly less memory, and makes operations involving `numpy` arrays correspondingly faster. This performance improvement is usually only realized for arrays that contain a single data type. For more details on this, see the accompanying [reading](https://jakevdp.github.io/PythonDataScienceHandbook/02.01-understanding-data-types.html) for this lecture.  
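
A small sketch of what "a single data type" means in practice, using the array attributes `dtype`, `itemsize`, and `nbytes`. The variable names are illustrative.

```python
import numpy as np

ints = np.array([1, 2, 3, 4, 5], dtype=np.int64)
# Every element has the same type and size, stored back to back.
print(ints.dtype, ints.itemsize, ints.nbytes)   # int64 8 40

# Mixing integers and floats promotes everything to one common type.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)                              # float64

# Arbitrary Python objects are possible, but the array then only stores
# references, much like a list, and loses most of the speed advantage.
objs = np.array([1, "two", 3.0], dtype=object)
print(objs.dtype)                               # object
```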

Arrays are ideally suited for operating on large chunks of numbers or text. Let's look at a simple example to see how `numpy` arrays can dramatically improve the performance of our code. `%timeit` is a "magic" command that runs the supplied line of code many times and prints out some measures of how quickly the code executed.

In [None]:
def add_lists(L1, L2):
    return [L1[i] + L2[i] for i in range(len(L1))]

In [None]:
L1 = list(range(100))
L2 = list(range(100))

%timeit add_lists(L1, L2)

In [None]:
import numpy as np

a1 = np.array(L1) # this is how to create an array in NumPy.
a2 = np.array(L2)
%timeit a1 + a2

### Vectorization

You might have noticed that, in the code above, we were able to do `a1 + a2` and get the expected result. As you may have seen already, this doesn't exactly work for lists: `L1 + L2` concatenates the lists, rather than computing the entrywise sum.

A *vectorized* function is one that operates on all elements of an array in entrywise fashion. So, we saw above that the `+` function, when applied to arrays, is vectorized. The `numpy` module includes a large number of vectorized functions. These should almost always be used when working with arrays:

In [None]:
print("Adding a constant: ", b + 7.1 )


In [None]:
print("Multiplying a constant: ", b*2.5 )


In [None]:
print("Exponentiation: ", b ** 3 )


In [None]:
print("Element-wise product: ", b * c )


In [None]:
print("Element-wise division: ", b / c )


In [None]:
print("Element-wise modulo: ", b % c )


In [None]:

print("Elementwise exponential: ", np.exp(b) )
print("Elementwise sine: ", np.sin(b) )
print("Elementwise cosine: ", np.cos(b) )

### "Should I use `numpy` arrays?"

If you will operate on one or more large sets of numbers and are considering writing `for`-loops, stop. Ask yourself whether you can achieve your task with `numpy` arrays instead. 90% of the time, `numpy`-based code is faster to write and faster to execute.



### "Should I use `for`-loops?"

When working with lists and other basic data structures, yes, absolutely! When working with `numpy` arrays, however, writing `for`-loops is almost always the wrong thing to do. Find a way using vectorized `numpy` code.
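
As an illustration, here is a sketch that computes squared deviations from the mean twice, once with a `for`-loop and once with vectorized code. The data and variable names are made up for this example.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
mean = x.mean()

# Loop version: works, but handles one element at a time in Python.
sq_dev_loop = np.empty(len(x))
for i in range(len(x)):
    sq_dev_loop[i] = (x[i] - mean) ** 2

# Vectorized version: one expression operating on the whole array.
sq_dev_vec = (x - mean) ** 2

print(np.allclose(sq_dev_loop, sq_dev_vec))     # True
print(sq_dev_vec.mean())                        # 2.0, the (population) variance
```

Both versions give the same numbers; the vectorized one is shorter and, for large arrays, typically much faster.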

In [None]:
import random
random.seed(7834)
[random.random() for i in range(10)]

[0.736446806220831,
 0.7627534873373139,
 0.21114349582963865,
 0.6434164418988436,
 0.3825461438662505,
 0.07093566634873538,
 0.40881168051646444,
 0.08332513040834189,
 0.004550147818281891,
 0.6684131546292377]