<div style="color:#006666; padding:0px 10px; border-radius:5px; font-size:18px; text-align:center"><h1 style='margin:10px 5px'>Why Datatype Matters?</h1>
<hr>
<p style="color:#006666; text-align:right;font-size:10px">
Copyright by MachineLearningPlus. All Rights Reserved.
</p>

</div>

Numpy provides different datatypes to hold various forms of data. You can explicitly control which datatype is used to store your data.

In [1]:
import numpy as np
arr = np.array([1,2,3,4])
arr

array([1, 2, 3, 4])

In [2]:
arr.dtype

dtype('int32')

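Here, numpy assigned a default datatype of `int32`. The default integer type depends on the platform and the numpy version, so on many systems you will see `int64` instead. Each item of an `int32` array consumes 32 bits = 32/8 = 4 bytes of memory.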

In [3]:
arr.nbytes

16
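The link between the size of one item and the total memory can be checked directly. The sketch below sets the dtype to `int32` explicitly so that the numbers do not depend on the platform default; the variable names are chosen only for illustration.

```python
import numpy as np

# Fix the dtype explicitly so the result is the same on every platform
arr32 = np.array([1, 2, 3, 4], dtype=np.int32)

bytes_per_item = arr32.itemsize   # bytes used by one element
n_items = arr32.size              # number of elements

# Total size of the data buffer = bytes per item * number of items
print(bytes_per_item, n_items, arr32.nbytes)
```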

Since each item has a fixed number of bits, there is a certain maximum and minimum value it can hold.

In [4]:
np.iinfo('int32')

iinfo(min=-2147483648, max=2147483647, dtype=int32)
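To see why these limits matter, here is a small sketch of what happens when a result goes past the maximum. It uses `int8` to keep the numbers small; the array names are made up for this example.

```python
import numpy as np

# Largest value an int8 can hold (127)
top = np.array([np.iinfo(np.int8).max], dtype=np.int8)
one = np.array([1], dtype=np.int8)

# Both operands are int8, so the result stays int8 and 127 + 1 wraps around
wrapped = top + one
print(wrapped)   # [-128]
```

Numpy does not raise an error for this kind of array arithmetic, so an overflow can go unnoticed if the datatype is too small for the data.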

But you might not need `int32`. If this variable is supposed to represent the month of the year, the maximum value needed is just 12. In such a case, `int8` (which holds values up to 127) would be sufficient to handle this data, freeing up memory for other computations.

So, when creating the variable, explicitly mention the datatype. This will matter more when the data size gets larger.

In [5]:
arr = np.array([1,2,3,4], dtype=np.int8)
arr

array([1, 2, 3, 4], dtype=int8)
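The saving becomes visible with a larger array. The sketch below assumes a hypothetical column of one million month numbers (the size `n` is picked only for illustration) and compares the memory used by `int32` and `int8`.

```python
import numpy as np

n = 1_000_000  # hypothetical number of records, chosen for illustration

# Month numbers 1..12, repeated until the array has n entries
months32 = np.resize(np.arange(1, 13, dtype=np.int32), n)
months8 = months32.astype(np.int8)

print(months32.nbytes)   # 4000000 bytes
print(months8.nbytes)    # 1000000 bytes

# The values are unchanged, only the storage is smaller
print(np.array_equal(months32, months8))   # True
```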

<div class="alert alert-info" style="background-color:#006666; color:white; padding:0px 10px; border-radius:5px;"><h2 style='margin:7px 5px; font-size:16px'>Supported Data Types</h2>
</div>

The primary datatypes supported by numpy are as follows:

In [None]:
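# The old aliases np.int, np.float, np.object and np.str were removed in NumPy 1.24
np.int_    # integer (platform default size)
np.uint    # unsigned integer
np.float64 # float
np.bool_   # boolean
np.object_ # python object
np.str_    # string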

To find out the minimum and maximum values a given integer type can store, use the `np.iinfo` function.

In [6]:
# int
print("int8", np.iinfo(np.int8))
print("int16", np.iinfo(np.int16))
print("int32", np.iinfo(np.int32))
print("int64", np.iinfo(np.int64))

# unsigned int
print("uint8", np.iinfo(np.uint8))
print("uint16", np.iinfo(np.uint16))
print("uint32", np.iinfo(np.uint32))
print("uint64", np.iinfo(np.uint64))

int8 Machine parameters for int8
---------------------------------------------------------------
min = -128
max = 127
---------------------------------------------------------------

int16 Machine parameters for int16
---------------------------------------------------------------
min = -32768
max = 32767
---------------------------------------------------------------

int32 Machine parameters for int32
---------------------------------------------------------------
min = -2147483648
max = 2147483647
---------------------------------------------------------------

int64 Machine parameters for int64
---------------------------------------------------------------
min = -9223372036854775808
max = 9223372036854775807
---------------------------------------------------------------

uint8 Machine parameters for uint8
---------------------------------------------------------------
min = 0
max = 255
---------------------------------------------------------------

uint16 Machine parameters for 
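The same ranges can be listed more compactly by reading the `min` and `max` attributes of `np.iinfo`. This is a minimal sketch; the `ranges` dictionary and the loop variable names are illustrative only.

```python
import numpy as np

int_types = [np.int8, np.int16, np.int32, np.int64,
             np.uint8, np.uint16, np.uint32, np.uint64]

# Map each type name to its (min, max) pair
ranges = {np.dtype(t).name: (np.iinfo(t).min, np.iinfo(t).max) for t in int_types}

for name, (lo, hi) in ranges.items():
    print(f"{name:>6}: {lo} to {hi}")
```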

<div class="alert alert-info" style="background-color:#006666; color:white; padding:0px 10px; border-radius:5px;"><h2 style='margin:7px 5px; font-size:16px'>Creating an array that contains a mix of numbers, characters and even arbitrary python objects</h2>
</div>

In [7]:
arr = np.array(['a', 'b', 'c', 1], dtype='object')
arr.dtype

dtype('O')

You can add other python objects, such as `None` or a list, to the array as well.

In [8]:
arr = np.array(['a', 'b', 'c', 1, None, [21]], dtype='object')
arr.dtype

dtype('O')

In [9]:
arr

array(['a', 'b', 'c', 1, None, list([21])], dtype=object)
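An object array keeps each item as the original python object instead of converting everything to one numeric type. A quick way to confirm this is to look at the type of every element, as in the sketch below (the `kinds` list is just a helper name for this example).

```python
import numpy as np

arr = np.array(['a', 'b', 'c', 1, None, [21]], dtype='object')

# Each element is still the python object that was put in
kinds = [type(x).__name__ for x in arr]
print(kinds)   # ['str', 'str', 'str', 'int', 'NoneType', 'list']
```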

<div class="alert alert-info" style="background-color:#006666; color:white; padding:0px 10px; border-radius:5px;"><h2 style='margin:7px 5px; font-size:16px'>Code Challenge</h2>
</div>

1. Convert the following numpy array to the optimal datatype (the one that requires the least space).

```python
import numpy as np
arr = np.array([1,20,300,4000,50000])
arr
```

2. Create a numpy array from each of the following lists of tuples. What is the difference between the two?
```python
T1 = [(1, 10, 10), (2,20), (3,30)]
T2 = [(1, 10), (2,20), (3,30)]
```

__Code Url:__ https://git.io/Jcc8D

__Solution 1__

In [10]:
import numpy as np
arr = np.array([1,20,300,4000,50000])
arr

array([    1,    20,   300,  4000, 50000])

In [11]:
print(np.iinfo('int8'))
print(np.iinfo('int16'))
print(np.iinfo('int32'))

Machine parameters for int8
---------------------------------------------------------------
min = -128
max = 127
---------------------------------------------------------------

Machine parameters for int16
---------------------------------------------------------------
min = -32768
max = 32767
---------------------------------------------------------------

Machine parameters for int32
---------------------------------------------------------------
min = -2147483648
max = 2147483647
---------------------------------------------------------------



In [12]:
# Loss of data
arr.astype(np.int8)

array([  1,  20,  44, -96,  80], dtype=int8)

In [13]:
# Loss of data again
arr.astype(np.int16)

array([     1,     20,    300,   4000, -15536], dtype=int16)

In [14]:
# Ok: every value fits in int32 (uint16, max 65535, would also fit in half the space)
arr.astype(np.int32)

array([    1,    20,   300,  4000, 50000])
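Since every value in this array is non-negative, an unsigned type is also worth checking: `uint16` holds values from 0 to 65535, so it can store 50000 in 2 bytes per item. One possible way to find such a type is `np.min_scalar_type`, applied here to the largest value. Treat this as a sketch that assumes the array has no negative values, not as the only way to solve the challenge.

```python
import numpy as np

arr = np.array([1, 20, 300, 4000, 50000])

# Assumption of this sketch: no negative values, so only the maximum matters
lo, hi = int(arr.min()), int(arr.max())
assert lo >= 0

# Smallest dtype that can hold the largest value
best = np.min_scalar_type(hi)
print(best)   # uint16

small = arr.astype(best)
print(small.nbytes)   # 10 bytes for 5 items

# Confirm that no value changed during the conversion
print(np.array_equal(arr, small))   # True
```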

__Solution 2__

In [15]:
T1 = [(1, 10, 10), (2, 20), (3, 30)]
T2 = [(1, 10), (2, 20), (3, 30)]

In [16]:
a1 = np.array(T1, dtype='object')
a1

array([(1, 10, 10), (2, 20), (3, 30)], dtype=object)

In [17]:
# forms 1d array of tuples
a1.shape

(3,)

In [18]:
a2 = np.array(T2, dtype='object')
a2

array([[1, 10],
       [2, 20],
       [3, 30]], dtype=object)

In [19]:
# forms 2d array intuitively.
a2.shape

(3, 2)
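The difference comes from the tuple lengths. `T1` is ragged (its first tuple has three items), so numpy cannot form a grid and stores each tuple as one object. `T2` is regular, so numpy splits it into rows and columns. The sketch below checks what the elements look like in each case. Note that `dtype='object'` is only needed for `T1`; recent numpy versions refuse ragged input without it.

```python
import numpy as np

T1 = [(1, 10, 10), (2, 20), (3, 30)]   # ragged: tuple lengths differ
T2 = [(1, 10), (2, 20), (3, 30)]       # regular: every tuple has length 2

a1 = np.array(T1, dtype='object')
a2 = np.array(T2, dtype='object')

# a1 holds whole tuples, a2 holds the individual numbers
print(type(a1[0]).__name__, a1[0])        # tuple (1, 10, 10)
print(type(a2[0, 1]).__name__, a2[0, 1])  # int 10

# Without dtype='object', the regular list becomes an ordinary integer array
a2_int = np.array(T2)
print(a2_int.shape, a2_int.dtype.kind)    # (3, 2) i
```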