# Introduction to Python - Lecture 07 (29 Oct 2018)

### Agenda for today:
+ Working with files and filesystem:
    - basics of file handling in python
+ Introduction to Numpy
+ Introduction to Matplotlib
+ Introduction to Seaborn (Next Lecture)
+ Introduction to Pandas (Next Lecture)

# Data Persistence

+ Files
    + <font color='blue'>**\*.txt**</font>, \*.xml, \*.json
    + \*.csv, \*.tab, \*.xlsx (covered later, with pandas)
+ Databases (covered later in the course), for when you want to capture relationships between data / entities

# Built-in *<font color='blue'>file</font>* object

+ Basic format:
```python
fh = open('<filename>', '<mode>')   # Creates a file object fh
```
+ *filename* can be _**absolute**_ or _**relative**_
+ *mode*: {'r', 'w', 'a'}; Default='r'
+ 'r': open file for reading if exists, else **<font color='blue'>FileNotFoundError</font>**
+ 'w': open new file for writing; overwrite if exists; use 'a' to avoid overwriting
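A minimal sketch of the three modes side by side. It assumes a scratch file in a temporary directory (the `tempfile` usage and the name `demo.txt` are illustrative choices, not part of the lecture's `data/` folder), so nothing real gets overwritten:

```python
import os
import tempfile

# Work in a throwaway directory so nothing in data/ is touched
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'demo.txt')   # hypothetical file name

fh = open(path, 'w')        # 'w' creates the file (or truncates an existing one)
fh.write('first\n')
fh.close()

fh = open(path, 'w')        # opening with 'w' again wipes the previous content
fh.write('second\n')
fh.close()

fh = open(path, 'a')        # 'a' keeps existing content and writes at the end
fh.write('third\n')
fh.close()

fh = open(path)             # mode defaults to 'r'
content = fh.read()
fh.close()
print(repr(content))        # 'second\nthird\n'

missing_raised = False
try:
    open(os.path.join(tmp_dir, 'missing.txt'))   # 'r' on a file that does not exist
except FileNotFoundError as err:
    missing_raised = True
    print('Could not open:', err.filename)
```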

```python
fh = open('data/data.txt')
type(fh)
dir(fh)
```

+ If the file does not exist, `open` raises a **FileNotFoundError** with a traceback

**Notes**:
+ *fh* is not the file itself, but a handle/reference to it. Use it to do desired operations (read/write).
<br />  
![alt text](filehandle.svg)
<br />
+ <font color='blue'>Some additional mode options: 'rb' and 'wb' for reading and writing binary files; adding '+' (e.g. 'r+') opens the file for both reading and writing.</font>
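A short sketch of binary mode and `'r+'`, again assuming a temporary file (`demo.bin` is a made-up name). In binary mode `read()` returns `bytes` instead of `str`, and no newline translation happens; with `'r+'` you can read and then overwrite in place, moving the pointer with `seek` in between:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo.bin')   # hypothetical file name

fh = open(path, 'wb')          # 'wb': write raw bytes
fh.write(b'abc\n')             # a bytes literal, not a str
fh.close()

fh = open(path, 'rb')          # 'rb': read raw bytes, no newline translation
raw = fh.read()
fh.close()
print(type(raw), raw)          # <class 'bytes'> b'abc\n'

fh = open(path, 'r+')          # 'r+': read AND write; the file must already exist
first = fh.read(1)             # read one character: 'a'
fh.seek(0)                     # move back to the start before writing
fh.write('X')                  # overwrite the first character in place
fh.close()

fh = open(path)
final = fh.read()
fh.close()
print(repr(final))             # 'Xbc\n'
```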

# Reading from Files in "text mode"
+ File content is always read in as strings  
<br />
+ Here are the **most common approaches**:

    - **Read all data at once as a string**

    ```python
    fh = open('data/data.txt', 'r')
    data = fh.read()              # read in all data as one big string
    print(type(data), '\n\n')
    print(data)
    print('\n', data.split('\n'))   # split the big string on the newline character (\n);
                                    # text mode already converts Windows '\r\n' endings to '\n'
    ```

+ **file pointers and *<font color='blue'>seek</font>* operation**

```python
data_read_again = fh.read()        # returns '': the read pointer is already at the end of the file
print("length of data_read_again is: ", len(data_read_again))
fh.seek(0, 0)          # reset the pointer to the beginning of the file: fh.seek(offset, whence)
                       # https://docs.python.org/3/tutorial/inputoutput.html
data_now = fh.read()
print(data_now)
```
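The companion of `seek` is `tell()`, which reports where the pointer currently is. A small sketch that uses an in-memory `io.StringIO` as a stand-in for a real file (an assumption made so the example runs without `data/data.txt`; it supports the same `read`/`seek`/`tell` calls as a text-mode file handle):

```python
import io

# io.StringIO behaves like a text-mode file handle held in memory
fh = io.StringIO('line one\nline two\n')

print(fh.tell())            # 0: pointer at the start
first = fh.read()           # consumes everything
print(fh.tell())            # 18: pointer now at the end
again = fh.read()           # '': nothing left to read
fh.seek(0)                  # rewind to the beginning
after_seek = fh.read()
print(repr(again), after_seek == first)   # '' True
```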

+ **Read individual lines as strings**
```python
fh.seek(0, 0)
data = fh.readlines()       # returns a list of strings, one per line
print(type(data), "\n\n")   # note the newline character '\n' kept at the end of each line
print(data)                 # (text mode translates Windows '\r\n' line endings to '\n')
```

+ **Iterate over large files**
```python
fh.seek(0, 0)
for line in fh:           # 'fh' is iterable; use in iteration context for efficient
    print(len(line), line)  # reading of large files
fh.close()                # close file; good practice (esp. when writing files)
```

+ **Context manager**
```python
with open('data/data.txt', 'r') as fh:  # Context-manager; automatically closes the file
    for line in fh:
        print(line)
print('\nFile closed? : ', fh.closed)
```
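Roughly speaking, the `with` block above behaves like the `try`/`finally` pattern below: the file is closed even if an exception is raised inside the block. This is a simplified sketch (the real protocol goes through the file object's `__enter__`/`__exit__` methods), and it writes its own scratch file (`ctx.txt`, a made-up name) so it runs anywhere:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'ctx.txt')   # hypothetical scratch file
with open(path, 'w') as out:
    out.write('a\nb\n')

# Manual equivalent of:  with open(path) as fh: ...
fh = open(path)
try:
    lines = [line.rstrip('\n') for line in fh]   # strip the trailing newline of each line
finally:
    fh.close()              # always runs, even if the loop raised an error

print(lines, fh.closed)     # ['a', 'b'] True
```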

# Writing to Files in "text mode"
+ Like reading, writing is also done as strings
<br />
+ Here are the **most common approaches**:

+ **Write a single string**

```python
fh = open('data/fresh.txt', 'w')      # Open a new file in write mode (an existing file is overwritten)
fh.write('This is the 1st line\n')    # Write a line; returns the number of characters written
                                      # Note that newline chars must be explicitly added
fh.close()
```



+ **Flushing buffers**

```python
fh = open('data/fresh.txt', 'a')           # Open the earlier file in 'append' mode
                                      #    to avoid overwriting
fh.write('This is the 2nd line\n')    # Write a line
                                      # Note that newline chars must be explicitly added
fh.flush()                            # Push buffered data out to the file without closing it
```
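One way to see what `flush()` does: after flushing, a second, independent handle on the same file can already read the new line, even though the writing handle is still open. A sketch assuming a temporary file (`flush_demo.txt` is a made-up name):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'flush_demo.txt')   # hypothetical name

writer = open(path, 'w')
writer.write('buffered line\n')   # may still sit in Python's internal buffer
writer.flush()                    # push the buffered data out to the operating system

reader = open(path)               # a separate handle on the same file
seen = reader.read()
reader.close()
writer.close()

print(repr(seen))                 # 'buffered line\n'
```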

+ **Write multiple lines at once**

```python
fh.writelines(['This is the 3rd line\n', 'This is the 4th line\n'])   # Note the newline
fh.flush()
```

+ **Write iteratively**

```python
more_lines = ['5th line', '6th line', '7th line']

for line in more_lines:       # iteration context
    fh.write(line + '\n')     # no flush() after every write: that is inefficient; let Python buffer
fh.close()                    # close() flushes whatever is still buffered
```



### There are specialized modules to work with specific file formats:
1. csv (comma-separated values; part of the standard library)
2. xlrd (Excel documents): **pip install** xlrd OR **conda install** xlrd
3. json (hierarchical text-based format similar to Python dictionaries; part of the standard library)
4. yaml (another hierarchical text-based format like json): **pip install** pyyaml OR **conda install** pyyaml
5. pandas (works with csv, tsv, xls and many more formats): **pip install pandas** OR **conda install pandas**
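Both `csv` and `json` ship with Python's standard library, so no installation is needed. A minimal round-trip sketch; the records and file names are made up for illustration:

```python
import csv
import json
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
rows = [['name', 'score'], ['ann', '91'], ['bob', '78']]   # made-up data

# csv: the csv docs recommend newline='' when opening files for the csv module
csv_path = os.path.join(tmp_dir, 'scores.csv')
with open(csv_path, 'w', newline='') as fh:
    csv.writer(fh).writerows(rows)
with open(csv_path, newline='') as fh:
    rows_back = list(csv.reader(fh))     # every field comes back as a string

# json: maps naturally onto dicts, lists, strings, numbers, booleans and None
record = {'course': 'itp', 'lecture': 7, 'topics': ['files', 'numpy']}
json_path = os.path.join(tmp_dir, 'record.json')
with open(json_path, 'w') as fh:
    json.dump(record, fh, indent=2)
with open(json_path) as fh:
    record_back = json.load(fh)

print(rows_back == rows, record_back == record)   # True True
```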

## os.path module
```python
import os
dir(os.path)  # Note that path is another module that os module imports. 
              # When we import os, path becomes available as a module variable within os namespace.
              # --> module / namespace hierarchy
```
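+ **path parsing:**
    - os.path.split(<path_str>): splits a path into (head, tail), where tail is the last component
    - os.path.splitext(<path_str>): splits a path into (root, extension)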
```python
# Ex.
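print(os.path.split(os.path.expanduser('~/Documents/my_data.txt')))
# e.g. ('/home/mark/Documents', 'my_data.txt'); the head depends on your home directory
print(os.path.splitext('my_data.txt'))   # ('my_data', '.txt')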
```
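A few more parsing cases, written with forward-slash paths (POSIX style; `os.path` on Windows also accepts `/` as a separator). The file names are made up, and `os.path.basename`/`os.path.dirname` are shown as shortcuts for the two halves of `split`:

```python
import os

head, tail = os.path.split('/home/mark/Documents/my_data.txt')
print(head, '|', tail)          # /home/mark/Documents | my_data.txt

root, ext = os.path.splitext('archive.tar.gz')
print(root, '|', ext)           # archive.tar | .gz   (only the last suffix is split off)

base = os.path.basename('data/data.txt')   # 'data.txt': same as os.path.split(...)[1]
parent = os.path.dirname('data/data.txt')  # 'data': same as os.path.split(...)[0]
print(base, parent)
```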
+ **path building:**
    - os.path.join(<path_components>)
```python
# Ex.
print(os.path.join('~', 'Desktop', 'itp'))   # '~/Desktop/itp' on Linux/macOS; '~' is NOT expanded
```
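Because `os.path.join` only glues components together with the platform's separator, `~` has to be expanded separately. A sketch combining it with `os.path.expanduser` and `os.path.abspath` (the `itp` folder is the one from the example above and need not exist, since none of these calls touch the filesystem contents):

```python
import os

raw = os.path.join('~', 'Desktop', 'itp')   # '~' kept literally, e.g. '~/Desktop/itp'
expanded = os.path.expanduser(raw)          # '~' replaced by the user's home directory
absolute = os.path.abspath(os.path.join('data', 'data.txt'))  # resolved against the current working directory

print(raw)
print(expanded)
print(absolute)
```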
+ **common tests:**
    - os.path.<test>, where test = {isdir(), isfile(), exists(), ...}
```python
# Ex.
print(os.path.isdir('data/data.txt'))
print(os.path.isfile('data/data.txt'))
print(os.path.exists('data'))
print(os.path.exists('data.txt'))
print(os.path.exists('/Users/mark'))
```
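The results above depend on where the notebook is run. A self-contained sketch of the same tests against a temporary directory, so the outcome is predictable (`notes.txt` and `nope.txt` are made-up names):

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
file_path = os.path.join(tmp_dir, 'notes.txt')
open(file_path, 'w').close()           # create an empty file

print(os.path.isdir(tmp_dir), os.path.isfile(tmp_dir))      # True False
print(os.path.isdir(file_path), os.path.isfile(file_path))  # False True
print(os.path.exists(os.path.join(tmp_dir, 'nope.txt')))    # False
```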
+ **listing contents of a dir:**
    - os.listdir
```python
# Ex.
import pprint
pprint.pprint(os.listdir(os.path.expanduser('~/Desktop')))
pprint.pprint(os.listdir('.'))
pprint.pprint(os.listdir('..'))
```


## Numpy

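+ A package for scientific computing
+ Provides the n-dimensional array: like a list, but holding a single element type and supporting multiple dimensions
+ Array operations are implemented in compiled code and run much faster than equivalent Python loops
+ Great for linear algebra, statistical analysis

Numpy is not part of Python's standard library and needs to be installed.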

This can be done using the conda command if you are using Anaconda:

```bash
conda install numpy
```

Alternatively, this can be done using pip, the Python package manager that ships with Python:

```bash
pip install numpy
```

Once numpy is installed it then needs to be imported to use its functionality

```python
import numpy as np
```


### Lists recap

Lists are created using '[]'

```python
l1 = [1, 2, 3, 4]
```

Lists can contain mixed types

```python
l2 = [1, 'a', {'abs': abs}, (1, 2)]
```


Lists can be joined using the + operator which creates a new list

```python
l3 = l1 + l2
```

Lists can be extended using the .extend() method, which modifies the list in place

```python
l1.extend(l2)
```

The range() function can be used to initialize numeric lists

```python
base = list(range(0, 101, 2))
```

To perform a calculation on every element of a list, an explicit loop (or a list comprehension) is required

```python
base_squared = []
for x in base:
    base_squared.append(x**2)
```
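For comparison, a sketch of the same calculation as a list comprehension, and then as a NumPy expression that squares every element at once without an explicit Python loop (a preview of what the following sections cover):

```python
import numpy as np

base = list(range(0, 101, 2))

# list comprehension: still a loop, just more compact
base_squared = [x**2 for x in base]

# NumPy: the operation is applied to the whole array in one expression
base_np = np.array(base)
base_squared_np = base_np ** 2

print(base_squared[:5])            # [0, 4, 16, 36, 64]
print(base_squared_np[:5])         # [ 0  4 16 36 64]
```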

### Creating arrays with Numpy

There are a number of ways of creating numpy arrays.

##### Converting a regular python list into a numpy array:

```python
var = np.array(< list >)
```

This is useful when the original list needs to be constructed step by step, e.g. from a file. Numpy arrays have a fixed size: `np.append` exists, but it returns a new copy of the array on every call, which is slow inside a loop. For this reason it is often easier to build a normal Python list first and convert it to a numpy array at the end.

```python
characters = []
for i in range(32, 100):
    characters.append(chr(i))
print(characters)
np_char = np.array(characters)
print(np_char)
```
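A small sketch of the two points above: `np.append` leaves the original array untouched and hands back a new one, and, unlike a Python list, an array holds a single element type (mixed inputs are converted to a common type, here strings):

```python
import numpy as np

arr = np.array([1, 2, 3])
bigger = np.append(arr, 4)      # returns a NEW array; arr is unchanged
print(arr, bigger)              # [1 2 3] [1 2 3 4]

# Preferred pattern when the final size is not known up front
values = []
for i in range(5):
    values.append(i * 10)
np_values = np.array(values)
print(np_values)                # [ 0 10 20 30 40]

mixed = np.array([1, 'a'])      # one dtype for all elements: the 1 becomes the string '1'
print(mixed)
```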

##### Initializing arrays using numpy

1. Creating an array with *n* zeros

```python
lst = np.zeros(25)
lst_2d = np.zeros((5, 5))
```

2. Creating an array with *n* ones

```python
lst = np.ones(25)
lst_2d = np.ones((5, 5))
```

3. Using a range: arange(start, end, increment); the end value is excluded

```python
lst = np.arange(10, 20, 2)
```

4. Linspace is similar to range: linspace(start, end, number_of_elements); the end value is included

```python
lst = np.linspace(10, 20, 2)
```

5. Filling an array with random numbers in [0, 1)

```python
lst = np.random.rand(5)
lst_2d = np.random.rand(5, 5)   # dimensions are passed as separate arguments, not a tuple
```
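A quick sketch checking the shapes and endpoints these initializers produce; in particular, `np.arange` stops before the end value while `np.linspace` includes it:

```python
import numpy as np

zeros_1d = np.zeros(3)                 # [0. 0. 0.]  (floats by default)
ones_2d = np.ones((2, 3))              # shape (2, 3): 2 rows, 3 columns
stepped = np.arange(10, 20, 2)         # [10 12 14 16 18]: 20 itself is excluded
spaced = np.linspace(10, 20, 5)        # 5 evenly spaced values from 10 to 20, both ends included
random_2d = np.random.rand(2, 3)       # shape (2, 3), values drawn from [0, 1)

print(zeros_1d, ones_2d.shape, stepped, spaced, random_2d.shape)
```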

### Numpy mathematical operations

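Numpy simplifies numerical work such as linear algebra in Python. Note, however, that the arithmetic operators (`+`, `-`, `*`, `/`) act element-wise on arrays, which is not always the same as the corresponding operation in matrix algebra (the matrix product, for example, is written `@` or `np.dot`).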

1. Addition
    1. Adding a scalar to a matrix does not follow the usual rules of matrix algebra: the scalar is added to every element of the matrix

        $$
        \begin{bmatrix}
        a_{0,0} & a_{0,1} & \cdots & a_{0,n} \\
        a_{1,0} & a_{1,1} & \cdots & a_{1,n} \\
        \vdots & \vdots & \ddots & \vdots \\
        a_{m,0} & a_{m,1} & \cdots & a_{m,n}
        \end{bmatrix} + C =
        \begin{bmatrix}
        a_{0,0} + C & a_{0,1} + C & \cdots & a_{0,n} + C \\
        a_{1,0} + C & a_{1,1} + C & \cdots & a_{1,n} + C \\
        \vdots & \vdots & \ddots & \vdots \\
        a_{m,0} + C & a_{m,1} + C & \cdots & a_{m,n} + C
        \end{bmatrix}
        $$

        ```python
        C = 5                           # any scalar value
        lst_2d = np.ones((3, 4)) + C    # every element becomes 6.0
        ```

    2. Adding two numpy arrays of the same shape adds their corresponding elements (arrays of different shapes must be compatible under broadcasting, as in the row-vector multiplication example below)

        $$
        \begin{bmatrix}
        a_{0,0} & a_{0,1} & \cdots & a_{0,n} \\
        a_{1,0} & a_{1,1} & \cdots & a_{1,n} \\
        \vdots & \vdots & \ddots & \vdots \\
        a_{m,0} & a_{m,1} & \cdots & a_{m,n}
        \end{bmatrix} +
        \begin{bmatrix}
        b_{0,0} & b_{0,1} & \cdots & b_{0,n} \\
        b_{1,0} & b_{1,1} & \cdots & b_{1,n} \\
        \vdots & \vdots & \ddots & \vdots \\
        b_{m,0} & b_{m,1} & \cdots & b_{m,n}
        \end{bmatrix} =
        \begin{bmatrix}
        a_{0,0} + b_{0,0} & a_{0,1} + b_{0,1} & \cdots & a_{0,n} + b_{0,n} \\
        a_{1,0} + b_{1,0} & a_{1,1} + b_{1,1} & \cdots & a_{1,n} + b_{1,n} \\
        \vdots & \vdots & \ddots & \vdots \\
        a_{m,0} + b_{m,0} & a_{m,1} + b_{m,1} & \cdots & a_{m,n} + b_{m,n}
        \end{bmatrix}
        $$

        ```python
        lst_2d = np.ones((3, 4)) + np.ones((3, 4))    # every element becomes 2.0
        ```
    

2. Multiplication
    1. By a scalar: each element is multiplied by the scalar

        $$
        \begin{bmatrix}
        a_{0,0} & a_{0,1} & \cdots & a_{0,n} \\
        a_{1,0} & a_{1,1} & \cdots & a_{1,n} \\
        \vdots & \vdots & \ddots & \vdots \\
        a_{m,0} & a_{m,1} & \cdots & a_{m,n}
        \end{bmatrix} * C =
        \begin{bmatrix}
        a_{0,0} * C & a_{0,1} * C & \cdots & a_{0,n} * C \\
        a_{1,0} * C & a_{1,1} * C & \cdots & a_{1,n} * C \\
        \vdots & \vdots & \ddots & \vdots \\
        a_{m,0} * C & a_{m,1} * C & \cdots & a_{m,n} * C
        \end{bmatrix}
        $$

        ```python
        C = 5                           # any scalar value
        lst_2d = np.ones((3, 4)) * C    # every element becomes 5.0
        ```

    2. Multiplying two equally sized matrices with `*` results in element-wise multiplication (this is not the matrix product of linear algebra)

        $$
        \begin{bmatrix}
        a_{0,0} & a_{0,1} & \cdots & a_{0,n} \\
        a_{1,0} & a_{1,1} & \cdots & a_{1,n} \\
        \vdots & \vdots & \ddots & \vdots \\
        a_{m,0} & a_{m,1} & \cdots & a_{m,n}
        \end{bmatrix} *
        \begin{bmatrix}
        b_{0,0} & b_{0,1} & \cdots & b_{0,n} \\
        b_{1,0} & b_{1,1} & \cdots & b_{1,n} \\
        \vdots & \vdots & \ddots & \vdots \\
        b_{m,0} & b_{m,1} & \cdots & b_{m,n}
        \end{bmatrix} =
        \begin{bmatrix}
        a_{0,0} * b_{0,0} & a_{0,1} * b_{0,1} & \cdots & a_{0,n} * b_{0,n} \\
        a_{1,0} * b_{1,0} & a_{1,1} * b_{1,1} & \cdots & a_{1,n} * b_{1,n} \\
        \vdots & \vdots & \ddots & \vdots \\
        a_{m,0} * b_{m,0} & a_{m,1} * b_{m,1} & \cdots & a_{m,n} * b_{m,n}
        \end{bmatrix}
        $$

        ```python
        lst_2d = np.ones((3, 4)) * np.ones((3, 4))    # element-wise product
        ```
    

    3. Multiplying a matrix by a vector whose length equals the number of columns multiplies each row element-wise by the vector (numpy "broadcasts" the vector across the rows)

        $$
        \begin{bmatrix}
        a_{0,0} & a_{0,1} & \cdots & a_{0,n} \\
        a_{1,0} & a_{1,1} & \cdots & a_{1,n} \\
        \vdots & \vdots & \ddots & \vdots \\
        a_{m,0} & a_{m,1} & \cdots & a_{m,n}
        \end{bmatrix} *
        \begin{bmatrix}
        b_{0} & b_{1} & \cdots & b_{n}
        \end{bmatrix} =
        \begin{bmatrix}
        a_{0,0} * b_{0} & a_{0,1} * b_{1} & \cdots & a_{0,n} * b_{n} \\
        a_{1,0} * b_{0} & a_{1,1} * b_{1} & \cdots & a_{1,n} * b_{n} \\
        \vdots & \vdots & \ddots & \vdots \\
        a_{m,0} * b_{0} & a_{m,1} * b_{1} & \cdots & a_{m,n} * b_{n}
        \end{bmatrix}
        $$

        ```python
        lst_2d = np.ones((6, 5)) * np.array([1, 2, 3, 2, 1])    # 5 columns, vector of length 5
        ```

3. Division follows the same rules as multiplication (element-wise, with the same broadcasting behaviour)

    ```python
    lst_2d = np.ones((6, 5)) / np.array([1, 2, 3, 2, 1])
    ```
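Because `*` is element-wise, NumPy uses a separate operator, `@` (equivalently `np.dot` for 2D arrays), for the matrix product of linear algebra. A small sketch with made-up 2 x 2 arrays, plus a broadcast that fails because the vector length does not match the number of columns:

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

elementwise = a * b                     # [[ 5 12] [21 32]]
matrix_prod = a @ b                     # [[19 22] [43 50]]: rows of a times columns of b
scaled_rows = a * np.array([10, 100])   # [[ 10 200] [ 30 400]]: vector broadcast over rows

shape_error = False
try:
    np.ones((6, 5)) * np.ones(4)        # 4 does not match the 5 columns
except ValueError as err:
    shape_error = True
    print('Shapes do not broadcast:', err)
```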


### Accessing elements/rows/columns


##### Constructing a 2d array for demonstration purposes

```python
lst_2d = np.vstack((np.arange(1, 100, 2), np.arange(100, 1, -2)))
```

This is a 2D array whose first row consists of the odd numbers from 1 to 99 and whose second row consists of the even numbers from 100 down to 2.

##### Accessing individual values

1. Using standard list indexing
```python
# lst_2d[row_index][column_index]
lst_2d[1][2]
```

2. Numpy has more advanced indexing which allows both values to be specified together
```python
# lst_2d[row_index, column_index]
lst_2d[1, 2]
```

##### Retrieving a row

```python
# lst_2d[row_index, :]
lst_2d[0, :] # return an array of all the elements in the first row
lst_2d[1, :] # return an array of all the elements in the second row
```

##### Retrieving a column

```python
# lst_2d[:, column_index]
lst_2d[:, 5] # return an array of all the elements in the column at index 5 (the sixth column)
lst_2d[:, 20] # return an array of all the elements in the column at index 20 (the 21st column)
```
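A sketch that rebuilds the demonstration array and prints a few values that can be checked by hand; remember that indices start at 0, and negative indices count from the end:

```python
import numpy as np

lst_2d = np.vstack((np.arange(1, 100, 2), np.arange(100, 1, -2)))
print(lst_2d.shape)                   # (2, 50): 2 rows, 50 columns

print(lst_2d[1][2], lst_2d[1, 2])     # 96 96
print(lst_2d[:, 5])                   # [11 90]: index 5 is the sixth column
print(lst_2d[0, :5])                  # [1 3 5 7 9]: first five values of the first row
print(lst_2d[:, -1])                  # the last column
```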


### A few numpy functions 

#### Note - There are often two ways of using Numpy functions

1. np.<< function >>()
2. np_list.<< function >>()

#### Example

1. Using the np function
    ```python
    lst = np.arange(10)
    m = np.mean(lst)
    ```
2. Using the array method
    ```python
    lst = np.arange(10)
    m = lst.mean()
    ```
    
Some functions are only available in one of these two forms, but when both are available they are generally interchangeable.

#### Min and Max

The names are self-explanatory: they return the minimum and maximum values of the array.

##### 1D Arrays

```python
lst = np.array([21, 19, 11, 17, 20])
print(lst.min())
print(lst.max())
```

##### 2D Arrays

1. Getting the min and max for the entire 2D array

```python
# How I generated the random 2D array
# lst = np.random.randint(5, 25, (5, 5))
lst = np.array(
      [[15, 21,  7, 21, 24],
       [13, 19, 21, 23,  7],
       [ 8, 17, 13, 21, 14],
       [11, 16, 20, 20, 12],
       [22,  5,  8, 23,  8]]
)

min_v = lst.min()
max_v = lst.max()

print('The min is {}'.format(min_v))
print('The max is {}'.format(max_v))
```

Getting the min and max for each column

```python
lst = np.array(
      [[15, 21,  7, 21, 24],
       [13, 19, 21, 23,  7],
       [ 8, 17, 13, 21, 14],
       [11, 16, 20, 20, 12],
       [22,  5,  8, 23,  8]]
)

min_v = lst.min(axis=0)
max_v = lst.max(axis=0)

print('The min for each column is {}'.format(min_v))
print('The max for each column is {}'.format(max_v))

```

Getting the min and max for each row

```python
lst = np.array(
      [[15, 21,  7, 21, 24],
       [13, 19, 21, 23,  7],
       [ 8, 17, 13, 21, 14],
       [11, 16, 20, 20, 12],
       [22,  5,  8, 23,  8]]
)

min_v = lst.min(axis=1)
max_v = lst.max(axis=1)

print('The min for each row is {}'.format(min_v))
print('The max for each row is {}'.format(max_v))
```

#### The axis argument
```
      [[15, 21,  7, 21, 24],  |
       [13, 19, 21, 23,  7],  |
       [ 8, 17, 13, 21, 14],  | axis = 0
       [11, 16, 20, 20, 12],  |
       [22,  5,  8, 23,  8]]  V
       
       ------------------->
               axis=1
```

- axis=0 applies the operation down the rows (vertically), giving one result per column
- axis=1 applies the operation across the columns (horizontally), giving one result per row

If the data has a third dimension then axis 2 will apply the function in that dimension and so on...
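To see the axis rule on something small enough to check by hand, here is a sketch with a made-up 2 x 3 array:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

col_sums = a.sum(axis=0)    # collapse the rows -> one value per column: [5 7 9]
row_sums = a.sum(axis=1)    # collapse the columns -> one value per row: [ 6 15]
total = a.sum()             # no axis -> the whole array: 21
print(col_sums, row_sums, total)
```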

##### Min Max normalization

Subtract the minimum value from each element and divide it by the difference between the maximum and the minimum.

$$ \frac{ X - X_{min}}{X_{max} - X_{min}} $$

Using numpy it is easy to perform these types of calculations

```python
lst = np.array([21, 19, 11, 17, 20])
lst_norm = (lst - lst.min()) / (lst.max() - lst.min())
lst_norm
```

The same method will also work even if the data is 2 dimensional

```python
lst = np.array(
      [[15, 21,  7, 21, 24],
       [13, 19, 21, 23,  7],
       [ 8, 17, 13, 21, 14],
       [11, 16, 20, 20, 12],
       [22,  5,  8, 23,  8]]
)
lst_norm = (lst - lst.min()) / (lst.max() - lst.min())
lst_norm
```
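The normalization above uses the minimum and maximum of the whole array. If each column should instead be scaled on its own (an assumption about what you want; it is common when columns hold different measurements), the `axis` argument combined with broadcasting gives a one-line variant. A sketch reusing the same 5 x 5 array:

```python
import numpy as np

lst = np.array(
      [[15, 21,  7, 21, 24],
       [13, 19, 21, 23,  7],
       [ 8, 17, 13, 21, 14],
       [11, 16, 20, 20, 12],
       [22,  5,  8, 23,  8]]
)

col_min = lst.min(axis=0)          # shape (5,): one minimum per column
col_max = lst.max(axis=0)          # shape (5,): one maximum per column
lst_norm_cols = (lst - col_min) / (col_max - col_min)   # vectors broadcast across the rows

print(lst_norm_cols.min(axis=0))   # [0. 0. 0. 0. 0.]
print(lst_norm_cols.max(axis=0))   # [1. 1. 1. 1. 1.]
```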

#### Some additional functions

1. mean([axis = 0, 1, ...])
   - Returns the mean per column (axis=0), per row (axis=1), or of the whole array (no axis)
2. std([axis = 0, 1, ...])
   - Returns the standard deviation, with the same axis options
3. sum([axis = 0, 1, ...])
   - Returns the sum, with the same axis options
4. np.concatenate((np_array, np_array, ...))
   - Combines numpy arrays into a single array (along axis 0 by default)

These are just a few of the functions which numpy offers. All of the available functions are listed in their documentation [https://docs.scipy.org/doc/numpy-1.15.1/reference/#]
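A quick sketch of these functions on small made-up arrays, with values chosen so the results can be checked by hand. Note that `np.std` computes the population standard deviation by default (`ddof=0`):

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])
b = np.array([[7, 8, 9]])

print(a.mean(), a.mean(axis=0))           # 3.5 [2.5 3.5 4.5]
print(a.sum(axis=1))                      # [ 6 15]
spread = np.std([2, 4, 4, 4, 5, 5, 7, 9])
print(spread)                             # 2.0

stacked = np.concatenate((a, b))          # joins along axis 0 by default -> shape (3, 3)
side = np.concatenate((a, a), axis=1)     # joins along the columns -> shape (2, 6)
print(stacked.shape, side.shape)
```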

### The basics of Seaborn / Matplotlib

Just like Numpy, Seaborn is an external library which needs to be installed.

This can be done using the conda command if you are using Anaconda:

```shell
conda install seaborn
```

Alternatively, this can be done using pip, the Python package manager that ships with Python:

```shell
pip install seaborn
```

Once Seaborn is installed, it needs to be imported to use its functionality.

Seaborn will also install Matplotlib so there is no need to install both of them.

```python
import seaborn as sns
import matplotlib.pyplot as plt

# If you are in jupyter then you need to tell matplotlib to display the plots in the notebook
%matplotlib inline
```

Matplotlib: 
+ library for generating plots in Python. 
+ It is extremely powerful 
  + every aspect of a plot can be customized 
+ Unfortunately many of the plots are not very aesthetically pleasing out of the box
  
Seaborn:
+ Uses Matplotlib in the background
+ Allows the same customizability that Matplotlib does
+ Has options for setting plot themes
  + These generally look better than the default Matplotlib plots

### Creating a plot using Matplotlib

#### Simple line plot

Matplotlib needs to be imported before we can use it.

By convention, `matplotlib.pyplot` is imported under the short alias `plt`:

```python
import matplotlib.pyplot as plt
%matplotlib inline
```

Matplotlib will work with both regular lists as well as numpy arrays

```python
# list of numbers to plot
lst = [21, 19, 11, 17, 20]

plt.plot(lst)
plt.show()
```

#### Adding more lines to the plot

```python
lst = [21, 19, 11, 17, 20]
lst_2 = np.random.randint(5, 25, 5)

plt.plot(lst)
plt.plot(lst_2)
plt.show()
```

#### Setting plot parameters

##### Figure size and labels

```python
lst = [21, 19, 11, 17, 20]
lst_2 = np.random.randint(5, 25, 5)

# Setting the size of the plot
fig = plt.figure(figsize=(7, 7))

# Setting the labels
plt.title('Plotting Lines')
plt.xlabel('X value')
plt.ylabel('Y value')

plt.plot(lst)
plt.plot(lst_2)

plt.show()
```

##### Setting the X axis values

+ An array of x values is used to do this
+ It must have the same number of elements as the data
+ The plot() function accepts data in different ways:
  + plot(y < list >) - plots the list data; the x values will be 0, 1, 2, ...
  + plot(x < list >, y < list >) - plots the data using the x & y coordinates
  + plot(mat < list < list > >) - plots each column of the 2D data as a separate line
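The three call forms can be checked without looking at a picture, because `plt.plot` returns a list of the `Line2D` objects it drew. A sketch with made-up data; the figure is closed at the end only so the example leaves nothing open:

```python
import numpy as np
import matplotlib.pyplot as plt

fig = plt.figure()
lines_y = plt.plot([21, 19, 11])               # y only: x defaults to 0, 1, 2
lines_xy = plt.plot([0, 50, 100], [5, 6, 7])   # explicit x and y
lines_mat = plt.plot(np.ones((5, 3)))          # 2D input: one line per column

print(list(lines_y[0].get_xdata()))            # x positions generated for y-only input
print(len(lines_mat))                          # 3
plt.close(fig)   # drop this line in a notebook if you want to see the plot
```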


```python 
lst = [21, 19, 11, 17, 20]
lst_2 = np.random.randint(5, 25, 5)

# Setting the size of the plot
fig = plt.figure(figsize=(7, 7))

# Setting the labels
plt.title('Plotting Lines')
plt.xlabel('X value')
plt.ylabel('Y value')

# Define the x axis values
x = np.linspace(0, 100, len(lst))

# define a 2d array of x, y pairs
x_mat = np.vstack((np.linspace(0, 100, 5), np.linspace(0, 100, 5))).T
mat = np.vstack((np.random.randint(5, 25, 5), np.random.randint(5, 25, 5))).T

plt.plot(lst)
plt.plot(x, lst_2)
plt.plot(x_mat, mat)

plt.show()
```

#### Customizing the lines

```python
lst = [21, 19, 11, 17, 20]
lst_2 = np.random.randint(5, 25, (5, 3))

# Define the x axis values
x = np.linspace(0, 100, len(lst))

# Setting the size of the plot
fig = plt.figure(figsize=(7, 7))

# Setting the labels
plt.title('Plotting Lines')
plt.xlabel('X value')
plt.ylabel('Y value')

# solid line
plt.plot(x, lst, '-')
# dashed line
plt.plot(x, lst_2[:,0], '--')
# dotted line
plt.plot(x, lst_2[:,1], ':')
# dash dot line
plt.plot(x, lst_2[:,2], '-.')

plt.show()
```
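The style string can also carry a color letter and a marker: `'r--'` means a red dashed line and `'bo'` means blue circle markers with no connecting line. A sketch (made-up data) that reads these properties back from the returned line objects:

```python
import matplotlib.colors as mcolors
import matplotlib.pyplot as plt

fig = plt.figure()
dashed_red, = plt.plot([1, 2, 3], 'r--')   # 'r' = red, '--' = dashed line
blue_dots, = plt.plot([3, 2, 1], 'bo')     # 'b' = blue, 'o' = circle markers, no line

print(dashed_red.get_linestyle(), mcolors.to_rgb(dashed_red.get_color()))  # -- (1.0, 0.0, 0.0)
print(blue_dots.get_marker(), blue_dots.get_linestyle())                   # o None
plt.close(fig)   # drop this line in a notebook if you want to see the plot
```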

##### Changing the line color

```python
lst = [21, 19, 11, 17, 20]
lst_2 = np.random.randint(5, 25, (5, 3))

# Define the x axis values
x = np.linspace(0, 100, len(lst))

# Setting the size of the plot
fig = plt.figure(figsize=(7, 7))

# Setting the labels
plt.title('Plotting Lines')
plt.xlabel('X value')
plt.ylabel('Y value')

# solid line
plt.plot(x, lst, '-', c='r')
# dashed line
plt.plot(x, lst_2[:,0], '--', c='r')
# blue solid line
plt.plot(x, lst_2[:,1], '-', c='b')
# blue dashed line
plt.plot(x, lst_2[:,2], '--', c='b')

plt.show()
```

##### Limiting the axis

```python
lst = [21, 19, 11, 17, 20]
lst_2 = np.random.randint(5, 25, (5, 3))

# Define the x axis values
x = np.linspace(0, 100, len(lst))

# Setting the size of the plot
fig = plt.figure(figsize=(7, 7))

# Setting the labels
plt.title('Plotting Lines')
plt.xlabel('X value')
plt.ylabel('Y value')

# solid line
plt.plot(x, lst, '-', c='r')
# dashed line
plt.plot(x, lst_2[:,0], '--', c='r')
# blue solid line
plt.plot(x, lst_2[:,1], '-', c='b')
# blue dashed line
plt.plot(x, lst_2[:,2], '--', c='b')

# Set a limit on the x axis
plt.xlim((50, 100))

# plt.ylim() does the same for the y axis

plt.show()
```

##### Figure Legends

```python
lst = [21, 19, 11, 17, 20]
lst_2 = np.random.randint(5, 25, (5, 3))

# Define the x axis values
x = np.linspace(0, 100, len(lst))

# Setting the size of the plot
fig = plt.figure(figsize=(7, 7))

# Setting the labels
plt.title('Plotting Lines')
plt.xlabel('X value')
plt.ylabel('Y value')

# solid line
plt.plot(x, lst, '-', c='r', label='red solid')
# dashed line
plt.plot(x, lst_2[:,0], '--', c='r', label='red dash')
# blue solid line
plt.plot(x, lst_2[:,1], '-', c='b', label='blue solid')
# blue dashed line
plt.plot(x, lst_2[:,2], '--', c='b', label='blue dash')

# Set a limit on the x axis
plt.xlim((50, 100))

# plt.ylim() does the same for the y axis

# Add the legend (it uses the label= given to each plot call)
plt.legend()

plt.show()
```

#### Histograms

We can start by generating some random numbers

```python
mu, sigma = 0, 0.5 # mean, standard deviation

lst = np.random.normal(mu, sigma, 1000)
```

Using the random numbers it is easy to generate a histogram.

```python
fig = plt.figure(figsize=(8, 5))

plt.hist(lst)

plt.show()
```
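`plt.hist` also returns the numbers behind the picture: the bar heights, the bin edges (10 bins by default, hence 11 edges) and the drawn patches. `np.histogram` computes the same counts without plotting. A sketch that regenerates the random data so it runs on its own:

```python
import numpy as np
import matplotlib.pyplot as plt

mu, sigma = 0, 0.5
lst = np.random.normal(mu, sigma, 1000)

fig = plt.figure(figsize=(8, 5))
counts, edges, patches = plt.hist(lst)
plt.close(fig)   # drop this line in a notebook if you want to see the plot

np_counts, np_edges = np.histogram(lst)   # same binning, no plot

print(len(counts), len(edges))   # 10 11
print(counts.sum())              # 1000.0: every value lands in some bin
```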

##### Customizing the histogram

Some additional arguments for the histogram function [https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.hist.html#matplotlib.axes.Axes.hist]



```python
help(plt.hist)
```

```python
fig = plt.figure(figsize=(8, 5))
# We can make changes here
plt.hist(lst, )

plt.show()
```

#### Scatter plots

Start by generating 100 random (x, y) pairs of integers in [0, 100)
```python
mat = np.random.randint(0, 100, (100, 2))
```

Make a scatter plot

```python
fig = plt.figure(figsize=(5, 5))

plt.scatter(mat[:,0], mat[:,1])

plt.show()
```

```python
help(plt.scatter)
```

```python
fig = plt.figure(figsize=(5, 5))

# Add some additional arguments
plt.scatter(mat[:,0], mat[:,1])

plt.show()
```

#### Displaying 2D continuous data

Generate some data to display

```python
mu, sigma = 0, 500

mat = np.random.normal(mu, sigma, (50000, 2))

H, xe, ye =  np.histogram2d(mat[:,0], mat[:,1], bins=(100, 100))

```
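Before plotting, it helps to check what `np.histogram2d` returned: a 2D array of counts plus the bin edges along each axis. A sketch repeating the data generation so it runs on its own:

```python
import numpy as np

mu, sigma = 0, 500
mat = np.random.normal(mu, sigma, (50000, 2))

H, xe, ye = np.histogram2d(mat[:, 0], mat[:, 1], bins=(100, 100))

print(H.shape)            # (100, 100): counts per (x bin, y bin)
print(len(xe), len(ye))   # 101 101: bin edges along x and y
print(H.sum())            # 50000.0: every point falls in some bin
```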

Plot the data


```python
fig = plt.figure(figsize=(8, 8))

plt.imshow(H)

plt.show()
```
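`H[i, j]` counts the points in x bin `i` and y bin `j`, while `imshow` draws the first index as rows from top to bottom, so the plot above shows x vertically. A common fix (also used in the `np.histogram2d` documentation) is to transpose `H`, set `origin='lower'`, and pass the bin edges as `extent` so the axes show data values. A sketch, regenerating the data so it runs on its own:

```python
import numpy as np
import matplotlib.pyplot as plt

mat = np.random.normal(0, 500, (50000, 2))
H, xe, ye = np.histogram2d(mat[:, 0], mat[:, 1], bins=(100, 100))

fig = plt.figure(figsize=(8, 8))
img = plt.imshow(H.T, origin='lower',                    # rows = y bins, drawn bottom-up
                 extent=[xe[0], xe[-1], ye[0], ye[-1]])  # label the axes with data values
plt.xlabel('x')
plt.ylabel('y')
plt.colorbar(img, label='count per bin')
plt.show()
```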