# Handling data with Datetime and numpy

## What you'll learn in this class

Two other essential Data Science libraries are *Datetime* and *Numpy*. We are going to use them throughout the program, so it is worth taking a little time to lay the foundations before moving on. The objectives of this course are to:

* Manipulate time data with Datetime
* Understand the usefulness of Numpy in tensor manipulation
* Create and manipulate tensors via Numpy
* Know how to iterate on tensors
* Know how to create Numpy masks
* Understand what the *shape* of a tensor is

## Manage time data with _Datetime_

You will quite often have to manage time data, including timestamps. This can be pretty frustrating at first, because this kind of data doesn't work the way you're used to.

### Import libraries

In [None]:
# Import the datetime library.
# This gives us access to all the classes of this library,
# with their attributes and methods
import datetime

### Methods to know
The ```datetime``` library contains a ```datetime``` class. The ```today()``` method of this class returns an object whose attributes are set to the current date and time.

You can then use the ```strftime()``` method to read the information held in those attributes and format it the way you want:

In [None]:
today = datetime.datetime.today()
print("Today is {} and, at the time I wrote this part of the course, it was {}."
      .format(today.strftime("%A %d %B"), today.strftime("%Hh%M")))

In [None]:
# For information on how to use a method, use the "?"
today.strftime?

Docstring: format -> strftime() style string.
Type:      builtin_function_or_method


Here, we imported the _datetime_ module and used its _datetime_ class, whose `today()` method returns the current local date and time (the moment you execute the code), precise to the microsecond.

We then formatted the result into a readable sentence using the `strftime()` method.

This is a simple example of what you can do with this library. Let's look at it in a little more detail.


**datetime ≠ datetime**

First of all, note that the _datetime_ module and the _datetime_ class are two different things. This can be very confusing, but writing `datetime.datetime` is NOT an error.

It simply refers to the datetime class, which has the same name as the datetime module that contains it.
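
The distinction between module and class can be checked directly. The following is a minimal sketch; the alias `dt` is a hypothetical name chosen for illustration, not something the rest of the course relies on.

```python
import datetime                      # the module
from datetime import datetime as dt  # the class, under a hypothetical alias

# Both names point at the same class
print(datetime.datetime is dt)

# Calling the class builds a datetime object: year, month, day, hour, minute
moment = dt(2020, 9, 3, 15, 44)
print(type(moment))
```

Importing the class under an alias avoids writing `datetime.datetime` everywhere, at the cost of hiding where the name comes from.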


### Useful attributes of _Datetime_

Let's look at some useful attributes. For example, you can read the day, month, year or hour of a date directly from the following attributes:

In [4]:
today = datetime.datetime.today()
print("Day : ", today.day)
print("Month : ", today.month)
print("Year : ", today.year)
print("Hour : ", today.hour)

Day :  3
Month :  9
Year :  2020
Hour :  15


### Most useful methods

There are a few useful methods to know for managing dates. The main ones are listed below.


#### _.date()_ and _.time()_

As you know, the _datetime_ class handles timestamps, that is, a date together with a time. Sometimes, though, it is useful to have only the date or only the time. That's what these two methods are for:

In [5]:
# Other useful methods of the datetime class:
print("Date : ", today.date())
print("Time : ", today.time())
print("Day of the week : ", today.weekday()) # Careful: Monday corresponds to number 0

Date :  2020-09-03
Time :  15:44:41.248522
Day of the week :  3


#### .replace()

We use this method to change one or more values in an existing date. It returns a new object and leaves the original unchanged.

In [6]:
# Replace the value of an attribute (here, set the month to May)
date_in_may = today.replace(month=5)
print(date_in_may.date())

2020-05-03
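
One thing to watch with `.replace()` is that the resulting date must exist. The sketch below uses a fixed, hypothetical date (31 January 2020) rather than `today`, so that it behaves the same whenever it is run.

```python
import datetime

end_of_january = datetime.datetime(2020, 1, 31, 15, 44)

# replace() returns a new object; the original is left untouched
first_of_month = end_of_january.replace(day=1)
print(first_of_month.date())
print(end_of_january.date())

# Asking for an impossible date (31 February) raises ValueError
try:
    end_of_january.replace(month=2)
    replace_failed = False
except ValueError:
    replace_failed = True
print(replace_failed)
```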


#### .weekday()

`.weekday()` returns the day of the week encoded as an integer:

In [7]:
week_day = today.weekday()
print(week_day)

3


Here, the program returned 3 because it was a Thursday. Indeed, Monday is encoded as 0, Tuesday as 1, Wednesday as 2, and so on.

#### .isoformat()

With `.isoformat()`, you get the date as a string in ISO 8601 format: YYYY-MM-DDTHH:MM:SS.ffffff (the microseconds are omitted when they are zero)

In [8]:
today.isoformat()

'2020-09-03T15:44:41.248522'

#### .strftime()

We used this method in the first program. It is very useful because it lets you express the date in a more readable way, using format codes inside a string. Here is a simple use:

In [9]:
today.strftime("%d/%m/%y")

'03/09/20'

In the above example, we returned the day, month and year in a more familiar format, DD/MM/YY, using the codes %d, %m and %y.

There are plenty of other codes for formatting a date the way you want. The table below lists the main ones available with the _datetime_ class.


<table>
  <tr>
   <td><strong>Code</strong>
   </td>
   <td><strong>Definition</strong>
   </td>
   <td><strong>Example</strong>
   </td>
  </tr>
  <tr>
   <td>%a
   </td>
   <td>Returns the abbreviated day of the week
   </td>
   <td>“Mon” / “Tue” / “Wed”
   </td>
  </tr>
  <tr>
   <td>%A
   </td>
   <td>Returns the full name of the day of the week
   </td>
   <td>“Sunday” / “Monday”
   </td>
  </tr>
  <tr>
   <td>%d
   </td>
   <td>Returns the number of the day in the month in digits
   </td>
   <td>01, 02, 03, 04...
   </td>
  </tr>
  <tr>
   <td>%b
   </td>
   <td>Returns the month of the year in abbreviated form 
   </td>
   <td>“Jan” / “Feb” ...
   </td>
  </tr>
  <tr>
   <td>%B
   </td>
   <td>Returns the full month of the year
   </td>
   <td>“January” / “February”
   </td>
  </tr>
  <tr>
   <td>%m
   </td>
   <td>Returns the month of the year in digits
   </td>
   <td>01, 02, 03, 04...
   </td>
  </tr>
  <tr>
   <td>%y
   </td>
   <td>Returns the year in two digits
   </td>
   <td>98, 99, 00, 01
   </td>
  </tr>
  <tr>
   <td>%Y
   </td>
   <td>Returns the year in four digits
   </td>
   <td>1998, 1999, 2000, 2001
   </td>
  </tr>
  <tr>
   <td>%c
   </td>
   <td>Returns the date and time
   </td>
   <td>Tue Aug 16 21:30:00 1988
   </td>
  </tr>
  <tr>
   <td>%x
   </td>
   <td>Returns only the date
   </td>
   <td>08/16/88
   </td>
  </tr>
  <tr>
   <td>%X
   </td>
   <td>Returns only the time
   </td>
   <td>21:30:00
   </td>
  </tr>
  <tr>
   <td>%H
   </td>
   <td>Returns the hour in 24-hour format
   </td>
   <td>“01” ”03” “22”
   </td>
  </tr>
  <tr>
   <td>%I
   </td>
   <td>Returns the hour in 12-hour format
   </td>
   <td>“01” “03” “12”
   </td>
  </tr>
  <tr>
   <td>%M
   </td>
   <td>Returns minutes
   </td>
   <td>“01” “58” “59” “00”
   </td>
  </tr>
  <tr>
   <td>%S
   </td>
   <td>Returns seconds
   </td>
   <td>“01” “58” “59” “00”
   </td>
  </tr>
</table>


NB: The output of some codes depends on the locale in use. For example, with a French locale, "%A" returns "lundi", "mardi", etc. instead of "Monday", "Tuesday".
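
To see a few of the codes from the table in action, here is a small sketch built on the date used in the table's examples (16 August 1988, 21:30). It is restricted to numeric codes, so the result should not depend on the locale.

```python
import datetime

# The date used in the table's examples: 16 August 1988, 21:30:00
moment = datetime.datetime(1988, 8, 16, 21, 30, 0)

print(moment.strftime("%d/%m/%Y"))   # day/month/four-digit year
print(moment.strftime("%d/%m/%y"))   # same with a two-digit year
print(moment.strftime("%H:%M:%S"))   # 24-hour clock
print(moment.strftime("%I:%M"))      # 12-hour clock
```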

#### .strptime()

To finish with the useful methods, you will often find yourself reading time data as a string from a file. The `.strptime()` method reads such strings and converts them into datetime objects:

In [10]:
# Read a string and store the information in a datetime object
stamp = "20/11/18 18:30"
date = datetime.datetime.strptime(stamp, "%d/%m/%y %H:%M")
print(date)
print(type(date))

2018-11-20 18:30:00
<class 'datetime.datetime'>
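
`.strptime()` and `.strftime()` are mirror images of each other: the same format string parses a timestamp and writes it back. The sketch below assumes a hypothetical timestamp format (`%Y-%m-%d %H:%M`) and also shows what happens when the string does not match it.

```python
import datetime

stamp = "2018-11-20 18:30"
fmt = "%Y-%m-%d %H:%M"

parsed = datetime.datetime.strptime(stamp, fmt)
# strftime with the same format gives back the original string
print(parsed.strftime(fmt) == stamp)

# A string that does not match the format raises ValueError
try:
    datetime.datetime.strptime("20/11/18", fmt)
    parse_failed = False
except ValueError:
    parse_failed = True
print(parse_failed)
```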


### Date and time classes

In [11]:
# Create a time object, create a date object, and combine them to give a datetime object
one_time = datetime.time(19,32,0)
one_date = datetime.date(2018,7,1)
combined = datetime.datetime.combine(one_date,one_time)
print(combined)
print(type(combined))

2018-07-01 19:32:00
<class 'datetime.datetime'>


### The timedelta class: doing operations on dates and times

There is one more thing to cover with dates: how to do arithmetic on them. This is done simply with the _timedelta_ class.

In [12]:
datetime.timedelta?

Init signature: datetime.timedelta(self, /, *args, **kwargs)
Docstring:     
Difference between two datetime values.

timedelta(days=0, seconds=0, microseconds=0, milliseconds=0, minutes=0, hours=0, weeks=0)

All arguments are optional and default to 0.
Arguments may be integers or floats, and may be positive or negative.
File:           ~/anaconda3/lib/python3.8/datetime.py
Type:           type
Subclasses:     


In [13]:
today = datetime.datetime.today()

In [14]:
# Subtract 10 days
new_date = today - datetime.timedelta(10)
print(new_date)

2020-08-24 15:44:41.471897


This is an example of an operation: 10 days have been subtracted from today's date.

Generally speaking, _timedelta_ is structured as follows:

```python
datetime.timedelta(days, seconds, microseconds, milliseconds, minutes, hours, weeks)
```

For example, if you want to add 2 hours to the date:

In [15]:
# Adding 2 hours
new_date2 = today + datetime.timedelta(hours=2)
print(new_date2)

2020-09-03 17:44:41.471897
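
A _timedelta_ is also what you get back when you subtract one datetime from another. The sketch below uses two fixed, hypothetical dates so that the result is reproducible.

```python
import datetime

start = datetime.datetime(2020, 1, 1)
end = datetime.datetime(2020, 9, 3, 12, 0)

elapsed = end - start           # a timedelta object
print(elapsed.days)             # whole days
print(elapsed.seconds)          # remaining seconds within the last day
print(elapsed.total_seconds())  # the whole duration in seconds
```

Note that `.seconds` only holds the part of the duration that does not fit in whole days; use `.total_seconds()` when you need the full duration.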


# Manipulate tensors with numpy

Another must-have library in Data Science is *Numpy*. We're going to use it throughout the program, so it is worth taking some time to lay the foundations before moving on. The objectives of this part are to:

* Understand the usefulness of Numpy in the manipulation of tensors
* Create and manipulate tensors via Numpy
* Know how to iterate on tensors
* Know how to create Numpy masks
* Understand what the *shape* of a tensor is

In [16]:
import numpy as np

## Create an array

### What is a tensor?

A tensor is a generalization of the notion of vector: an object with *n* dimensions containing numbers. For small numbers of dimensions (0, 1 and 2), these objects have specific names.

**Dimension 0 ==> Scalar**

```
# Here's a scalar number

2
```

**Dimension 1 ==> Vector**

```
# Here's a vector

[1,
 2,
 3]
```

**Dimension 2 ==> Matrix**
```
# Here's a matrix

[[1, 2, 3],
 [4, 5, 6],
 [7, 8, 9]]

```

Beyond two dimensions, we simply use the term *tensor*.
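
These definitions can be checked with Numpy, whose arrays expose their number of dimensions through the `ndim` attribute. This is a minimal sketch; the variable names are chosen for illustration only.

```python
import numpy as np

scalar = np.array(2)
vector = np.array([1, 2, 3])
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])
tensor = np.array([matrix, matrix])  # two matrices stacked together

for name, obj in [("scalar", scalar), ("vector", vector),
                  ("matrix", matrix), ("tensor", tensor)]:
    print(name, obj.ndim)
```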

### Create a numpy array

To create an array, use the ```np.array()``` function, to which you pass an object such as a list:

In [17]:
A = np.array([1,2,3,4,5], dtype=int)
A

array([1, 2, 3, 4, 5])

Here, we specified the `dtype` when creating the array, but we did not have to. If you do not specify the `dtype`, Numpy determines the most relevant type.

In [18]:
# If you don't specify the dtype, numpy will deduce it from the values present in the array
my_array = np.array([1,2,3,4])
my_array.dtype

dtype('int64')

Above, we created a vector, but we can use `np.array()` to create a matrix or a tensor in the same way:

In [19]:
# You can use np.array to create a vector, a matrix or a tensor of a higher order:
my_matrix = np.array([[1,2,3], 
                      [4,5,6]])
display(my_matrix)

my_tensor = np.array([[[1,2,3], 
                       [4,5,6]],
                      [[7,8,9], 
                       [10,11,12]]])
display(my_tensor)

array([[1, 2, 3],
       [4, 5, 6]])

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

### Basic operations

#### Classic operations

The classic arithmetic operators of Python work on Numpy arrays, where they are applied element by element:

In [20]:
# Create two matrices
A = np.array([[1,1],
              [0,1]])

B = np.array([[2,0],
              [3,4]])

# Addition
display(A+B)

# Subtraction
display(A-B)

# Multiplication element by element
display(A*B)

array([[3, 1],
       [3, 5]])

array([[-1,  1],
       [-3, -3]])

array([[2, 0],
       [0, 4]])

#### Operations on matrices

You also have operations that only apply to matrices. These include matrix multiplication, also known as the "matrix product".

**CAREFUL:** Do not confuse element-wise multiplication with matrix multiplication. The former multiplies the elements at the same position with each other, whereas matrix multiplication works as follows:

![](https://www.mathsisfun.com/algebra/images/matrix-multiply-a.svg)

then

![](https://www.mathsisfun.com/algebra/images/matrix-multiply-b.svg)

and so on.

In [21]:
print("Matrix A: ")
print(A)
print()

print("Matrix B: ")
print(B)
print()

# Matrix multiplication
print("Matrix multiplication of A by B:")
display(A@B)

# Another way to do matrix multiplication with numpy:
print("Matrix multiplication of A by B (dot method):")
display(A.dot(B))

# CAREFUL: A@B != B@A. The operation is said not to commute:
print("Matrix multiplication of B by A:")
display(B@A)

Matrix A: 
[[1 1]
 [0 1]]

Matrix B: 
[[2 0]
 [3 4]]

Matrix multiplication of A by B:


array([[5, 4],
       [3, 4]])

Matrix multiplication of A by B (dot method):


array([[5, 4],
       [3, 4]])

Matrix multiplication of B by A:


array([[2, 2],
       [3, 7]])
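
To make the rule behind the matrix product concrete, it can be written out by hand: entry (i, j) of the result is the sum over k of `A[i, k] * B[k, j]`. The loop below is only a sketch for understanding; in practice you would use `A @ B`, which is much faster.

```python
import numpy as np

A = np.array([[1, 1],
              [0, 1]])
B = np.array([[2, 0],
              [3, 4]])

# Entry (i, j) of the product is the sum over k of A[i, k] * B[k, j]
n_rows, n_inner = A.shape
n_cols = B.shape[1]
C = np.zeros((n_rows, n_cols), dtype=int)
for i in range(n_rows):
    for j in range(n_cols):
        for k in range(n_inner):
            C[i, j] += A[i, k] * B[k, j]

print(C)
print(np.array_equal(C, A @ B))
```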

### Functions to know in Numpy

#### Create matrices of 0s or 1s

It is useful to be able to create matrices composed only of 0 or 1. To do this, you can use: ```np.zeros()``` or ```np.ones()```.

In [22]:
# Initialization to 0 (matrix 3x4)
display(np.zeros((3,4)))

# Initialization to 1 (matrix 5x4)
display(np.ones((5,4)))

# Third-order tensor (equivalent to 2 stacked 3x4 matrices)
np.ones((2,3,4))

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

array([[[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]],

       [[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]]])

#### Create a list of regularly spaced values
We have seen that the `range()` function in Python generates sequences of regularly spaced integers. In the same way, the `np.arange()` function generates arrays of regularly spaced values (including decimal numbers):

In [23]:
np.arange?

Docstring:
arange([start,] stop[, step,], dtype=None)

Return evenly spaced values within a given interval.

Values are generated within the half-open interval ``[start, stop)``
(in other words, the interval including `start` but excluding `stop`).
For integer arguments the function is equivalent to the Python built-in
`range` function, but returns an ndarray rather than a list.

When using a non-integer step, such as 0.1, the results will often not
be consistent.  It is better to use `numpy.linspace` for these cases.

Parameters
----------
start : number, optional
    Start of interval.  The interval includes this value.  The default
    start value is 0.
stop : number
    End of interval.  The interval does not include this value, except
    in some cases where `step` is not an integer and floating point
    round-off affects the length of `out`.
step : number, optional
    Spacing between values.  For any output `out`, this is the distance
    between two adjacent values,

In [24]:
# Generates an array of numbers from 3.2 (included) to 4.8 (excluded) with a step of 0.2.
np.arange(3.2, 4.8, 0.2)

array([3.2, 3.4, 3.6, 3.8, 4. , 4.2, 4.4, 4.6])
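
The `np.arange` docstring recommends `np.linspace` when the step is not an integer. With `np.linspace` you give the number of values you want instead of the step, and both ends of the interval are included by default. A minimal sketch reproducing the values above:

```python
import numpy as np

# 8 evenly spaced values from 3.2 to 4.6, both ends included
values = np.linspace(3.2, 4.6, 8)
print(values)
print(len(values))
```

Because the number of points is fixed in advance, floating-point rounding cannot add or drop a value at the end of the interval.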

#### Create a list of random numbers

The `np.random.randn()` function generates random numbers drawn from the standard normal (Gaussian) distribution, so the values are centered around 0 with a standard deviation of 1:

In [25]:
np.random.randn?

Docstring:
randn(d0, d1, ..., dn)

Return a sample (or samples) from the "standard normal" distribution.

.. note::
    This is a convenience function for users porting code from Matlab,
    and wraps `standard_normal`. That function takes a
    tuple to specify the size of the output, which is consistent with
    other NumPy functions like `numpy.zeros` and `numpy.ones`.

.. note::
    New code should use the ``standard_normal`` method of a ``default_rng()``
    instance instead; see `random-quick-start`.

If positive int_like arguments are provided, `randn` generates an array
of shape ``(d0, d1, ..., dn)``, filled
with random floats sampled from a univariate "normal" (Gaussian)
distribution of mean 0 and variance 1. A single float randomly sampled
from the distribution is returned if no argument is provided.

Parameters
----------
d0, d1, ..., dn : int, optional
    The dimensions of the returned array, must be non-negative.
    If no argument is given a single Python floa

In [26]:
# Generates 10 normally distributed random numbers
np.random.randn(10)

array([ 1.05369169,  0.17110182, -0.58628475, -0.49372448,  0.66371934,
       -0.64813785, -0.43710923, -2.27709561,  0.47039191,  0.74229575])

The `np.random.randint()` function generates random integer values, uniformly distributed within a given range:

In [27]:
np.random.randint?

Docstring:
randint(low, high=None, size=None, dtype=int)

Return random integers from `low` (inclusive) to `high` (exclusive).

Return random integers from the "discrete uniform" distribution of
the specified dtype in the "half-open" interval [`low`, `high`). If
`high` is None (the default), then results are from [0, `low`).

.. note::
    New code should use the ``integers`` method of a ``default_rng()``
    instance instead; see `random-quick-start`.

Parameters
----------
low : int or array-like of ints
    Lowest (signed) integers to be drawn from the distribution (unless
    ``high=None``, in which case this parameter is one above the
    *highest* such integer).
high : int or array-like of ints, optional
    If provided, one above the largest (signed) integer to be drawn
    from the distribution (see above for behavior if ``high=None``).
    If array-like, must contain integer values
size : int or tuple of ints, optional
    Output shape.  If the given shape is, e.g.,

In [28]:
# Generates 4 integers from 10 (included) to 30 (excluded), drawn from a uniform distribution
np.random.randint(10, 30, 4)

array([10, 28, 20, 16])
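
Both docstrings above note that new code should go through a `default_rng()` instance. Here is a minimal sketch of the equivalent calls; the seed value 42 is an arbitrary choice, used only to make the draws reproducible.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # the seed value is arbitrary

normal_values = rng.standard_normal(10)    # counterpart of np.random.randn(10)
int_values = rng.integers(10, 30, size=4)  # counterpart of np.random.randint(10, 30, 4)

print(normal_values.shape)
print(int_values)
```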

#### Useful mathematical functions
These functions are also used quite often in statistics. You can compute them with numpy quite simply:

In [29]:
# Exponential
np.exp(30)

10686474581524.463

In [30]:
# Square root
np.sqrt(9)

3.0
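
These functions also accept whole arrays, in which case they are applied to every element at once. A short sketch with hypothetical values:

```python
import numpy as np

values = np.array([1, 4, 9, 16])

print(np.sqrt(values))       # square root of every element
print(np.exp(np.zeros(3)))   # exp(0) = 1 for every element
print(np.log(np.exp(2.0)))   # log is the inverse of exp
```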

### Accessing items in a multidimensional array

In [31]:
# Creation of a matrix
c = np.array([[  0,  1,  2],            
               [ 10, 12, 13],
               [100,101,102],
               [110,112,113]])

In [32]:
c[3, 2] # Accesses the value contained in the 4th row, 3rd column

113

In [33]:
# Creation of a third-order tensor (2 stacked matrices)
c = np.array( [[[  0,  1,  2],               
                 [ 10, 12, 13]],
                [[100,101,102],
                 [110,112,113]]])

In [34]:
c[0, 1, 2] # We access the value contained in the 2nd row, 3rd column of the first matrix

13
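
Giving fewer indices than there are dimensions returns a sub-array instead of a single value, and negative indices count from the end, as with Python lists. A minimal sketch on the same tensor:

```python
import numpy as np

c = np.array([[[  0,   1,   2],
               [ 10,  12,  13]],
              [[100, 101, 102],
               [110, 112, 113]]])

print(c[1])           # the whole second matrix
print(c[1, 0])        # first row of the second matrix
print(c[-1, -1, -1])  # last element: negative indices count from the end
```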

### Iterating over arrays

In [35]:
# Creation of a matrix
c = np.array([[  0,  1,  2],            
               [ 10, 12, 13],
               [100,101,102],
               [110,112,113]])

#### Iterate over all the elements

You can use the `.flat` attribute of an array to iterate across all of its elements:

In [36]:
# Iterate on the elements of the matrix

# Not recommended (may be slow):
for row in c:
    for e in row:
        print(e)

print()
print('----------')
print()
        
# Better:
for e in c.flat:
    print(e)

0
1
2
10
12
13
100
101
102
110
112
113

----------

0
1
2
10
12
13
100
101
102
110
112
113
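
If you also need the position of each element while iterating, `np.ndenumerate` yields the index together with the value. This is a sketch on a smaller, hypothetical matrix to keep the output short.

```python
import numpy as np

c = np.array([[ 0,  1,  2],
              [10, 12, 13]])

# ndenumerate yields ((row, column), value) pairs in row-by-row order
pairs = list(np.ndenumerate(c))
for (i, j), value in pairs:
    print(i, j, value)
```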


### Slices
As in pandas, you can use slice syntax to select a sub-part of an array.

In [37]:
# Reuse the matrix c created above
print(c)

[[  0   1   2]
 [ 10  12  13]
 [100 101 102]
 [110 112 113]]


In [38]:
# Slices
print(c[0:5, 1])  # all rows of the second column (a stop index past the end is clipped)
print(c[:, 1])  # all rows of the second column

print()
print('----------')
print()

print(c[1:3, : ]) # 2nd and 3rd rows, all columns

[  1  12 101 112]
[  1  12 101 112]

----------

[[ 10  12  13]
 [100 101 102]]
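
One point worth knowing about slices is that they return a *view* on the original data rather than a copy, so writing into a slice also changes the original array. The sketch below illustrates this, along with `.copy()` and a slice with a step.

```python
import numpy as np

c = np.array([[  0,   1,   2],
              [ 10,  12,  13],
              [100, 101, 102],
              [110, 112, 113]])

view = c[1:3, :]   # a slice is a view on the same data
view[0, 0] = -1
print(c[1, 0])     # the original array changed too

independent = c[1:3, :].copy()  # copy() gives an independent array
independent[0, 0] = 999
print(c[1, 0])     # unchanged this time

print(c[::2, 0])   # every other row of the first column
```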


### Masks
Similarly, you can also use masks to select elements from your arrays based on a condition:

In [39]:
# Create an array
a = np.arange(12).reshape(3,4)
display(a)
# Create a mask
b = a > 4
b

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

array([[False, False, False, False],
       [False,  True,  True,  True],
       [ True,  True,  True,  True]])

In [40]:
# Use the mask to select elements of the matrix
a[b]

array([ 5,  6,  7,  8,  9, 10, 11])

Only the values for which the mask was `True` are returned, as a one-dimensional array.
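
Masks can be combined and can also be used to modify an array in place. The following is a minimal sketch on the same array; the conditions themselves are arbitrary examples.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# Combine conditions with & (and) or | (or); the parentheses are required
even_and_large = a[(a > 4) & (a % 2 == 0)]
print(even_and_large)

# A mask can also be used to assign values
a[a > 9] = 0
print(a)
```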

## Manipulation of the shape of a numpy array

Finally, it is worth talking about the *shape* of an array. This is especially useful for manipulating images once you start working on deep learning.

### What is the shape of a numpy array?

The *shape* of an array tells us how many dimensions it has and how many elements there are along each dimension.

In [41]:
# This is the shape of a matrix

c = np.array([[  0,  1,  2],            
               [ 10, 12, 13],
               [100,101,102],
               [110,112,113]])

c.shape

(4, 3)

In [42]:
# A matrix is a 2-dimensional (second-order) array:
len(c.shape)

2

In [43]:
# A vector is a 1-dimensional array:
v = np.array([1,2,3,4])
len(v.shape)

1
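
Besides `shape`, arrays expose two related attributes that save you from computing these values yourself: `ndim` and `size`. A short sketch on the same matrix:

```python
import numpy as np

c = np.array([[  0,   1,   2],
              [ 10,  12,  13],
              [100, 101, 102],
              [110, 112, 113]])

print(c.shape)  # number of elements along each dimension
print(c.ndim)   # number of dimensions, same as len(c.shape)
print(c.size)   # total number of elements
```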

### Shape & Reshape

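Quite often you need to change the shape of a matrix. For example, you might want to rearrange its elements into a different number of rows and columns, or *flatten* it into a single dimension. You can do this with `.reshape()`. Note that reshaping keeps the elements in the same reading order: it does not swap rows and columns, which is a transposition.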

In [44]:
# Change array's shape with reshape()
c = np.array([[  0,  1,  2],            
               [ 10, 12, 13],
               [100,101,102],
               [110,112,113]])

c.reshape((2,6)) # Be careful: n_rows x n_columns must remain equal to the number of elements present in c

array([[  0,   1,   2,  10,  12,  13],
       [100, 101, 102, 110, 112, 113]])

You can also let numpy infer the size of one of the dimensions of your array by setting it to `-1`.

In [45]:
# You can use -1 to let numpy guess the number of rows or the number of columns:
c.reshape((-1,4))

array([[  0,   1,   2,  10],
       [ 12,  13, 100, 101],
       [102, 110, 112, 113]])

In [46]:
c.reshape((2,-1))

array([[  0,   1,   2,  10,  12,  13],
       [100, 101, 102, 110, 112, 113]])
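
Two related operations are easy to confuse: flattening or reshaping an array, which keeps the elements in reading order, and transposing it (the `.T` attribute), which turns rows into columns. The sketch below compares them on the same matrix.

```python
import numpy as np

c = np.array([[  0,   1,   2],
              [ 10,  12,  13],
              [100, 101, 102],
              [110, 112, 113]])

flat = c.reshape(-1)          # one dimension, 12 elements
reshaped = c.reshape((3, 4))  # same reading order, new row length
transposed = c.T              # rows become columns

print(flat)
print(reshaped[0])    # first four elements in reading order
print(transposed[0])  # first column of c
```

Both `reshaped` and `transposed` have shape (3, 4), but their contents differ.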

## List of functions grouped by theme

We have explained only the main features of Numpy in this course. If you wish, you can explore the full list of functions grouped by theme:

[Routines](https://docs.scipy.org/doc/numpy/reference/routines.html#routines)