# 1. Data types

### Install numpy

If you want to use numpy on your own laptop, you first have to install it. Numpy is already installed on the university's Jupyter server.
For more information about installing numpy: http://docs.scipy.org/doc/numpy-1.10.1/user/install.html

If you want to use Python for other courses or for your thesis as well, you might consider installing the whole SciPy stack. This includes frequently used packages such as numpy, scipy, matplotlib, ipython and pandas. For more information: http://www.scipy.org/install.html

### General information

- Scipy Lecture Notes: http://www.scipy-lectures.org/
- A short numpy tutorial: https://docs.scipy.org/doc/numpy-dev/user/quickstart.html
- Basics of numpy:
https://www.safaribooksonline.com/library/view/python-for-data/9781449323592/ch04.html

And if you want to know everything about a particular function, access the reference: 
http://docs.scipy.org/doc/numpy/reference/index.html#reference or the user guide: http://docs.scipy.org/doc/numpy/user/index.html#user

When you want to use functions from the numpy package, you first have to import numpy.

```python
import numpy
```

### Data types in Python

Before discussing the Numpy data types, here is a short overview of the data types in Python:

- bool (boolean: True, False)
- int (integer)
- float
- complex
- str (string)
- bytes
- list [ ]
- tuple ( )
- set
- dict { } (dictionary)

The types can be divided into immutable and mutable types.
The content of an immutable object cannot be changed after it has been created.
- Immutable data types: bool, int, float, complex, str, bytes and tuple.
- Mutable data types: list, set and dict.
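
A quick way to see the difference: a list can be changed in place, while assigning to an element of a tuple raises a `TypeError`. A minimal sketch (the variable names are only illustrative):

```python
# A list is mutable: its elements can be changed after creation.
numbers_list = [1, 2, 3]
numbers_list[0] = 10
print(numbers_list)  # [10, 2, 3]

# A tuple is immutable: assigning to an element raises a TypeError.
numbers_tuple = (1, 2, 3)
try:
    numbers_tuple[0] = 10
except TypeError as error:
    print("TypeError:", error)

# "Changing" an immutable str actually creates a new object;
# the original string stays the same.
text = "abc"
new_text = text.upper()
print(text, new_text)  # abc ABC
```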

### Data types in Numpy

Numpy is based on **arrays**. You can think of an array as a list, or a table, where each cell of the table contains an item of the same **datatype**. 

Data types in Numpy are a bit different from those in basic Python.
One array can only have one data type.
The data type of an array `x` is stored in its `x.dtype` attribute.

The 5 basic data types of a numerical variable are:
- float (float16, float32, or float64)
- integer (int8, int16, int32, or int64)
- unsigned integer: this number cannot be negative (uint8, uint16, uint32, or uint64)
- boolean (bool)
- complex (complex64 or complex128)

The numbers 8, 16, 32, 64 and 128 in the names of the data types indicate how many bits are used to store one element.

One byte contains 8 bits, and a bit can have either the value 0 or the value 1. The data type `int8` uses only 1 byte (8 bits), so it can represent 2^8 = 256 different values: the integers from -128 to 127.

The data type `int16` uses 2 bytes and can therefore store a number in the range of -32768 to 32767.

An unsigned integer can never be negative, so the data type `uint8` can store a value in the range of 0 to 255, where 255 = 2^8 - 1.
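
Numpy can report these limits itself: `numpy.iinfo` gives the smallest and largest value of an integer data type, and the `itemsize` of a `numpy.dtype` gives the number of bytes per element. A small sketch:

```python
import numpy

# Ask Numpy for the size and the value range of some integer types.
for int_type in [numpy.int8, numpy.int16, numpy.uint8]:
    info = numpy.iinfo(int_type)           # smallest and largest value
    size = numpy.dtype(int_type).itemsize  # number of bytes per element
    print(numpy.dtype(int_type).name, size, "byte(s):", info.min, "to", info.max)

# The same limits follow from the number of bits n:
# signed: -2**(n-1) .. 2**(n-1) - 1, unsigned: 0 .. 2**n - 1
n = 8
print(-2**(n - 1), 2**(n - 1) - 1, 2**n - 1)  # -128 127 255
```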


Another important data type is:

- string, for example `<U3` or `<U64`: the `U` stands for Unicode, the number indicates the maximum number of characters per string, and `<` indicates the byte order (little-endian).
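
Because the maximum length is fixed when the array is created, a longer string that is assigned later is silently cut off. A minimal sketch:

```python
import numpy

words = numpy.array(["a", "b", "cde"])
print(words.dtype)   # <U3: Unicode strings of at most 3 characters

words[0] = "abcdef"  # too long for <U3
print(words)         # ['abc' 'b' 'cde']: the new string was truncated
```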


```python
# array with integers:
x = numpy.array([1, 3, 5, 7, 9])
print(x, x.dtype)

# array with floats:
y = numpy.array([2.2, 4.4, 6.6, 8.8])
print(y, y.dtype)

# array with booleans:
z = numpy.array([True, False, True])
print(z, z.dtype)

# array with strings:
x = numpy.array(["a", "b", "cde"])
print(x, x.dtype)
```

```
[1 3 5 7 9] int64
[ 2.2  4.4  6.6  8.8] float64
[ True False  True] bool
['a' 'b' 'cde'] <U3
```


If you mix different data types and do not explicitly specify the data type, all elements are converted to one common type. In the example below, the string `"a"` forces every element to become a string.

You can specify the data type of the array when you create it with the `array()` function, using the `dtype` keyword argument. Note that not every combination is possible. For example, when your array contains text, you cannot choose float as data type, because text cannot be converted to floats, unless the text exactly represents a floating-point number (such as `"3.14"`).

```python
# array with mixed data types:
x = numpy.array([1, 3.4, True, 2.3+4.5j, "a"])
print(x, x.dtype)

# explicitly specify the data type:
y = numpy.array([9, 8, 7, 6], dtype='float')
print(y, y.dtype)
```

```
['1' '3.4' 'True' '(2.3+4.5j)' 'a'] <U64
[ 9.  8.  7.  6.] float64
```


```python
# Try to convert strings to integers
strings = ["12", "3", "24"]
z = numpy.array(strings, dtype='int')
print(z, z.dtype)

numerals = ["one", "two", "three"]
a = numpy.array(numerals, dtype='int')
print(a, a.dtype)
```

```
[12  3 24] int64
ValueError: invalid literal for int() with base 10: 'one'
```
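
When only numbers are mixed, Numpy picks a numeric type that can hold all of them instead of converting everything to strings. A small sketch (the exact integer size, e.g. `int64`, can depend on the platform, so the comments only name the kind of type):

```python
import numpy

a = numpy.array([1, 2, 3])         # only integers -> an integer type
b = numpy.array([1, 2.5, 3])       # integers and a float -> float64
c = numpy.array([1, 2.5, 3 + 4j])  # a complex number -> complex128
d = numpy.array([True, 2])         # bool and int -> an integer type (True becomes 1)

for arr in [a, b, c, d]:
    print(arr, arr.dtype)
```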

### Convert to a different data type

The `astype` method is another way to convert between data types.
For example, `x.astype(float)` converts x to the data type float.
Converting data types is called casting.

```python
# convert an array with integers to the data type float:
x = numpy.array([1, 3, 5, 7, 9])
print(x, x.dtype)
g = x.astype('float')
print(g, g.dtype)
print()

# convert an array with strings to float:
y = numpy.array(["1.4", "3.4", "5.4"])
print(y, y.dtype)
h = y.astype('float')
print(h, h.dtype)
```

```
[1 3 5 7 9] int64
[ 1.  3.  5.  7.  9.] float64

['1.4' '3.4' '5.4'] <U3
[ 1.4  3.4  5.4] float64
```
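
Note that `astype` does not change the original array: it returns a new array with the requested data type (a copy by default), so you have to store the result. A short check:

```python
import numpy

x = numpy.array([1, 3, 5])
g = x.astype('float')

print(x.dtype, g.dtype)  # an integer type and float64
g[0] = 100.5             # changing the new array ...
print(x)                 # ... leaves the original unchanged: [1 3 5]
```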


Sometimes the data type is converted but the content changes slightly.
For example, when converting from float to integer, the decimal part is cut off (the numbers are truncated toward zero). For positive numbers this is the same as rounding down, but -2.8 becomes -2, not -3.
Likewise, when converting an array with only zeros and ones to the data type Boolean, 0 is converted to False and 1 is converted to True (in fact, every nonzero number becomes True).

```python
# from float to integer
x = numpy.array([2.2, 3.2, 2.8])
print(x, x.dtype)
a = x.astype('int')
print(a, a.dtype)
print()

# from 0-1 to boolean
x = numpy.array([0, 1, 1, 0])
print(x, x.dtype)
a = x.astype('bool')
print(a, a.dtype)
```

```
[ 2.2  3.2  2.8] float64
[2 3 2] int64

[0 1 1 0] int64
[False  True  True False] bool
```
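
Truncation and rounding down only differ for negative numbers. If you need a specific kind of rounding, apply `numpy.floor`, `numpy.ceil` or `numpy.round` before casting. A small sketch, which also shows that any nonzero value becomes True:

```python
import numpy

x = numpy.array([-2.8, -2.2, 2.2, 2.8])

truncated = x.astype('int')              # truncation toward zero
floored = numpy.floor(x).astype('int')   # round down first
rounded = numpy.round(x).astype('int')   # round to the nearest integer first
print(truncated)  # [-2 -2  2  2]
print(floored)    # [-3 -3  2  2]
print(rounded)    # [-3 -2  2  3]

# Every nonzero number becomes True, only 0 becomes False.
flags = numpy.array([0, 2, -1, 0.5]).astype('bool')
print(flags)      # [False  True  True  True]
```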


Not every data type can be converted to every other data type. Some examples:
- String to Boolean, for example when x = ["a", "b", "c"]
- String to integer, for example when x = ["a", "b", "c"]

### Some important things

In Numpy, dividing a nonzero number by zero results in inf (infinity), or -inf for a negative number, together with a RuntimeWarning; 0/0 results in nan (Not A Number).
In basic Python, division by zero raises a ZeroDivisionError.

```python
# division by zero in Numpy
x = numpy.array([4])
y = numpy.array([0])
print(x/y)
```

```
[ inf]
```
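
To compare both behaviours, and to see what happens for negative numbers and for 0/0, here is a small sketch. It uses `numpy.errstate` to silence the RuntimeWarnings inside the `with` block only:

```python
import numpy

# Basic Python: division by zero raises an exception.
try:
    result_python = 4 / 0
except ZeroDivisionError as error:
    print("ZeroDivisionError:", error)

# Numpy: no exception, but a special value as result.
numerators = numpy.array([4.0, -4.0, 0.0])
with numpy.errstate(divide='ignore', invalid='ignore'):  # hide the warnings here
    result = numerators / 0.0
print(result)  # [ inf -inf  nan]
```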




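When you want to use infinity as a value for your variable or inside your array, you can use the following expressions:
- numpy.inf for (positive) infinity
- -numpy.inf for negative infinity

Older versions of Numpy also provide the aliases numpy.PINF and numpy.NINF for positive and negative infinity; these were removed in NumPy 2.0.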

To check whether a variable has the value infinity, the following functions can be useful:
- numpy.isinf(x): returns True when x is either positive or negative infinity.
- numpy.isneginf(x): returns True when x is negative infinity.
- numpy.isposinf(x): returns True when x is positive infinity.
- numpy.isfinite(x): returns True when x is neither infinity nor NaN (see below), so it is not simply the opposite of isinf().
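
A small sketch of these constants and check functions applied to one array:

```python
import numpy

values = numpy.array([1.0, numpy.inf, -numpy.inf])

print(numpy.isinf(values))     # [False  True  True]
print(numpy.isposinf(values))  # [False  True False]
print(numpy.isneginf(values))  # [False False  True]
print(numpy.isfinite(values))  # [ True False False]
```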

Then there is a special value called Not A Number (NaN):
- numpy.nan: creates a NaN value
- numpy.isnan(x): returns True when x is NaN.
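
NaN needs its own check function because a NaN value is not equal to anything, not even to itself, so a comparison with `==` never finds it. A small sketch, which also shows that NaN is neither infinite nor finite:

```python
import numpy

samples = numpy.array([1.0, numpy.nan, numpy.inf])

print(samples == numpy.nan)     # [False False False]: == never finds NaN
print(numpy.isnan(samples))     # [False  True False]
print(numpy.isinf(samples))     # [False False  True]
print(numpy.isfinite(samples))  # [ True False False]
```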

### Exercises

#### Exercise 1.1

Some conversions between datatypes lose information, and are therefore not reversible. 
Decide which of the following conversions are lossy, i.e. not reversible. Write examples to check your guess.

1. float -> int
2. bool  -> int
3. int   -> '<U16'
4. int   -> float
5. float64 -> float32
6. float32 -> float64

#### Exercise 1.2

The following function `linetofloat` takes a string which contains float numbers separated by
spaces and returns an array of floats. Complete the definition of the function.
For example:

```
linetofloat("3.14 12.3 4.0") -> array([3.14, 12.3, 4.0])
```

```python
def linetofloat(text):
    #
    #
    #
    return
```

#### Exercise 1.3

Open and inspect the file [populations.txt](populations.txt). It contains some numerical data. 
Search the internet to find out which numpy function you can use to load this data into a numpy array. Load the data, and convert it to `float32`.