# Dealing with Data Using Python's NumPy Package

If you followed the advice outlined in the previous chapter and installed the Python Anaconda stack, you already have NumPy installed and are ready to go. If you are more of a do-it-yourself type, you can go to http://www.numpy.org and follow the installation instructions found there.

We mentioned earlier that it is okay if you are not a Python expert yet. Who knows,
perhaps you are just now switching from OpenCV's C++ API. This is all fine. We want to give
you a quick overview of how to get started with NumPy. If you are a more advanced
Python user, you may simply skip this subsection.

Once you are familiar with NumPy, you will find that most scientific computing tools in the
Python world are built around it. This includes OpenCV, so the time spent on learning
NumPy is guaranteed to benefit you in the end.

## Importing NumPy

Once you start a new IPython or Jupyter session, you can import the NumPy module and
verify its version as follows:

In [1]:
import numpy

> Recall that in the Jupyter Notebook you can hit Ctrl + Enter to execute a
cell once you have typed the command. Alternatively, Shift + Enter
executes the cell and automatically inserts or selects the cell below it.
Check out all the keyboard shortcuts by clicking on Help > Keyboard
Shortcuts, or take a quick tour by clicking on Help > User Interface Tour.

In [2]:
numpy.__version__

'2.0.0'

You can also install packages from within Jupyter itself. If the `numpy` package is missing, all you have to do is run:

In [None]:
!pip install numpy

And to upgrade `numpy`:

In [None]:
!pip install --upgrade numpy

For the pieces of the package discussed here, we recommend using NumPy version 1.8
or later. By convention, you'll find that most people in the scientific Python world
import NumPy using `np` as an alias:

In [5]:
import numpy as np

In [6]:
np.__version__

'2.0.0'
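
If you would rather check the recommended minimum version in code than by eye, a minimal sketch could look like the following. The name `MIN_VERSION` is only an illustration, and the parsing assumes the version string starts with numeric major and minor components, as in `'2.0.0'`:

```python
import numpy as np

# Hypothetical constant holding the minimum version recommended above.
MIN_VERSION = (1, 8)

# Keep only the major and minor components, e.g. '2.0.0' -> (2, 0).
major, minor = (int(part) for part in np.__version__.split(".")[:2])
version_ok = (major, minor) >= MIN_VERSION

print("NumPy", np.__version__, "meets the minimum:", version_ok)
```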

## Understanding NumPy arrays

Python is a **dynamically typed language**. This means that you do
not have to specify a data type when you create a new variable. For example, the following
will automatically be represented as an integer:

In [7]:
a = 5

You can double-check the variable type as follows:

In [8]:
type(a)

int
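
To see what this means in practice, here is a small sketch: the same name can be rebound to values of different types, yet Python still refuses to silently mix incompatible types. The variable name `value` is just for illustration:

```python
value = 5
type_before = type(value)     # int

# The same name can later refer to a value of a different type.
value = "five"
type_after = type(value)      # str

# Mixing incompatible types raises an error instead of converting silently.
try:
    result = "5" + 5
except TypeError as err:
    result = None
    print("TypeError:", err)
```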

Going a step further, we can create a list of integers as follows:

In [9]:
int_list = list(range(10))
int_list

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Similarly, we can create a list of strings:

In [10]:
str_list = [str(i) for i in int_list]
str_list

['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

However, lists are not very convenient for doing math. Let's say, for example, we wanted to
multiply every element in <tt>int_list</tt> by a factor of 2. A naive approach might be to do the
following, but see what happens to the output:

In [9]:
int_list * 2

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

That's not really what we wanted!

Instead, operating on every element in the list gets really easy with NumPy:

In [10]:
int_arr = np.array(int_list)
int_arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [11]:
print(int_arr * 2)

[ 0  2  4  6  8 10 12 14 16 18]
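
For comparison, plain Python can also double every element, but it needs an explicit loop or list comprehension. The sketch below puts both routes side by side; the names `doubled_list` and `doubled_arr` are only illustrative:

```python
import numpy as np

int_list = list(range(10))

# Pure-Python route: build a new list element by element.
doubled_list = [2 * x for x in int_list]

# NumPy route: the multiplication is applied to every element of the array.
doubled_arr = np.array(int_list) * 2

print(doubled_list)
print(doubled_arr.tolist())
```

Both give the same values; the NumPy version simply states the operation once for the whole array.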


In addition, every NumPy array comes with a list of attributes:

In [12]:
print("int_arr ndim: ", int_arr.ndim)
print("int_arr shape: ", int_arr.shape)
print("int_arr size: ", int_arr.size)
print("int_arr dtype: ", int_arr.dtype)

int_arr ndim:  1
int_arr shape:  (10,)
int_arr size:  10
int_arr dtype:  int64


You can display all attributes and methods of a NumPy array by typing the name of the array, adding a period, and then pressing <tt>&lt;TAB&gt;</tt>:

<tt>In [X]: int_arr.&lt;TAB&gt;</tt>

This will display a dropdown menu with a whole lot of attributes, such as the ones mentioned above.
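
If tab completion is not available (for example, in a plain script), Python's built-in `dir` function gives a rough programmatic equivalent. This is only a sketch; the exact list of names depends on your NumPy version:

```python
import numpy as np

int_arr = np.array(list(range(10)))

# dir() lists every attribute and method; skip names starting with an underscore.
public_names = [name for name in dir(int_arr) if not name.startswith("_")]

print(len(public_names), "public attributes and methods")
print(public_names[:5])
```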

## Accessing single array elements by indexing

If you know Python's standard list indexing, indexing in NumPy will feel quite
familiar. In a one-dimensional array, the i-th value (counting from zero) can be accessed by
specifying the desired index in square brackets, just as with Python lists:

In [13]:
int_arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [14]:
int_arr[0]

0

In [15]:
int_arr[3]

3

To index from the back of the array, you can use negative indices:

In [16]:
int_arr[-1]

9

In [17]:
int_arr[-2]

8

There are a few other cool tricks.

For example, return all elements from index 2 up to index 5, but don't include index 5:

In [18]:
int_arr[2:5]

array([2, 3, 4])

Return all elements from the beginning of the array up to, but not including, index 5:

In [19]:
int_arr[:5]

array([0, 1, 2, 3, 4])

Return all elements from index 5 up to the end of the array:

In [20]:
int_arr[5:]

array([5, 6, 7, 8, 9])

Return every other element, starting at index 0:

In [21]:
int_arr[::2]

array([0, 2, 4, 6, 8])

Return all elements of the array, but in reverse order:

In [22]:
int_arr[::-1]

array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
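
All of the slices above are special cases of the general form `arr[start:stop:step]`. The sketch below combines the three parts and also illustrates one point where NumPy differs from lists: a basic slice of an array is a view onto the same data, not a copy. The names `work_arr` and `view` are only illustrative:

```python
import numpy as np

int_arr = np.array(list(range(10)))

# General form: arr[start:stop:step]
print(int_arr[1:8:3])          # elements at indices 1, 4, 7

# Basic slices are views: writing through the slice changes the source array.
work_arr = int_arr.copy()      # work on a copy so int_arr stays untouched
view = work_arr[2:5]
view[0] = 99                   # also changes work_arr[2]
print(work_arr)
```

Call `.copy()` on a slice when you need an independent array.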

## Creating multi-dimensional arrays
Arrays can be N-dimensional.
In machine learning, we will often deal with at least 2-D arrays, where each column corresponds to a particular feature and each row contains the feature values of one data sample.
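
As a rough illustration of this row and column convention, the following sketch builds a tiny feature matrix by hand. The numbers and the name `X` are made up for the example:

```python
import numpy as np

# Hypothetical data: 4 samples (rows) described by 3 features (columns).
X = np.array([[5.1, 3.5, 1.4],
              [4.9, 3.0, 1.4],
              [6.2, 2.9, 4.3],
              [5.9, 3.0, 5.1]])

n_samples, n_features = X.shape
first_sample = X[0, :]       # all feature values of sample 0
second_feature = X[:, 1]     # feature 1 across all samples

print(n_samples, n_features)
print(first_sample)
print(second_feature)
```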

Let's say we want to create an array with 3 rows and 5 columns, with all elements initialized to zero. If we don't specify a data type, NumPy will default to using floats:

In [23]:
arr_2d = np.zeros((3, 5))
arr_2d

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

As you probably know from your OpenCV days, this could be interpreted as a 3x5 grayscale image with all pixels set to 0 (black). Analogously, if we wanted to create a tiny 2x4 pixel image with 3 color channels (R, G, B), but all pixels set to white, we would use NumPy to create a 3-D array with dimensions 3x2x4 (channels x rows x columns):

In [24]:
arr_float_3d = np.ones((3, 2, 4))
arr_float_3d

array([[[1., 1., 1., 1.],
        [1., 1., 1., 1.]],

       [[1., 1., 1., 1.],
        [1., 1., 1., 1.]],

       [[1., 1., 1., 1.],
        [1., 1., 1., 1.]]])

Here, the first dimension defines the color channel (R, G, B). Thus, if this were real image data, we could easily grab the red color information by slicing the array:

In [25]:
arr_float_3d[0, :, :]

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.]])
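
Note that the channels-first layout used here is only one possible convention. Images loaded through OpenCV's Python interface usually arrive with the channel axis last (rows, columns, channels), and OpenCV orders the channels B, G, R by default. As a sketch, assuming you hold a channels-first array like the one above, `np.transpose` can reorder the axes:

```python
import numpy as np

# Channels-first layout, as in the text: (channels, rows, columns).
channels_first = np.ones((3, 2, 4))

# Reorder the axes to (rows, columns, channels).
channels_last = np.transpose(channels_first, (1, 2, 0))
print(channels_last.shape)           # (2, 4, 3)

# In the channels-last layout, a single channel is selected on the last axis.
first_channel = channels_last[:, :, 0]
print(first_channel.shape)           # (2, 4)
```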

In OpenCV, images come either as 32-bit float arrays with values between 0 and 1, or as unsigned 8-bit integer arrays with values between 0 and 255. Hence, we can also create a 2x4 pixel, all-white RGB image using 8-bit integers by specifying the `dtype` argument when creating the NumPy array and multiplying all the 1's in the array by 255:

In [26]:
arr_uint_3d = np.ones((3, 2, 4), dtype=np.uint8) * 255
arr_uint_3d

array([[[255, 255, 255, 255],
        [255, 255, 255, 255]],

       [[255, 255, 255, 255],
        [255, 255, 255, 255]],

       [[255, 255, 255, 255],
        [255, 255, 255, 255]]], dtype=uint8)
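
Converting between the two value ranges is a common step. The following is a minimal sketch, not a prescribed recipe: it uses `np.full` as an alternative way to build the white image and `astype` to switch data types. The variable names are only illustrative:

```python
import numpy as np

# Alternative construction: fill directly with 255 instead of multiplying ones.
white_uint8 = np.full((3, 2, 4), 255, dtype=np.uint8)

# 8-bit integers in [0, 255] -> 32-bit floats in [0, 1].
white_float = white_uint8.astype(np.float32) / 255

# ...and back again.
back_to_uint8 = (white_float * 255).astype(np.uint8)

print(white_float.dtype, white_float.max())
print(back_to_uint8.dtype, back_to_uint8.max())
```

Keep in mind that `astype` truncates toward zero when converting floats to integers, so for arbitrary float images you may want to round first (for example with `np.round`).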

## Reminder about built-in documentation

As you read through this chapter, don't forget that IPython gives you the ability to quickly explore the contents of a package (by using the tab-completion feature), as well as the documentation of various functions (using the <tt>?</tt> character).

For example, to display all the contents of the numpy namespace, you can type this:

    $ ipython
    In [1]: import numpy as np
    In [2]: np.<TAB>

Try it out yourself in the empty cell below:

In [12]:
import numpy as np


And to display NumPy's built-in documentation, you can use this:

    In [3]: np?

Then hit <tt>Shift+Enter</tt> to execute the cell.

Try it out yourself in the empty cell below:

In [13]:
np?

Type:        module
String form: <module 'numpy' from '/Users/astar/lab/opencvs/myopencv/.venv/lib/python3.10/site-packages/numpy/__init__.py'>
File:        ~/lab/opencvs/myopencv/.venv/lib/python3.10/site-packages/numpy/__init__.py
Docstring:
NumPy
=====

Provides
  1. An array object of arbitrary homogeneous items
  2. Fast mathematical operations over arrays
  3. Linear Algebra, Fourier Transforms, Random Number Generation

How to use the documentation
----------------------------
Documentation is available in two forms: docstrings provided
with the code, and a loose standing reference guide, available from
`the NumPy homepage <https://numpy.org>`_.

We recommend exploring the docstrings using
`IPython <https://ipython.org>`_, an advanced Python shell with
TAB-completion and introspection capabilities.  See below for further
instructions.

The docstring examples assume that `numpy` has been imported as ``np``::

  >>> import numpy as np


Here's another neat trick. Let's say you are typing a command to create evenly spaced values in an interval using NumPy's <tt>arange</tt> function, but you forgot the exact syntax. Start typing the function's name, then press <tt>Shift+TAB</tt> to display the function signature:

    In [4]: np.arange(<Shift+TAB>

Try it out yourself in the empty cell below:

More detailed documentation, along with tutorials and other resources, can be found at http://www.numpy.org.
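
As a quick reference for the <tt>arange</tt> call mentioned above, here is a short sketch of its most common form, next to `np.linspace`, which is often the better fit when you know how many points you want rather than the step size:

```python
import numpy as np

# np.arange(start, stop, step): stop is excluded, like Python's range.
evens = np.arange(0, 10, 2)
print(evens)            # [0 2 4 6 8]

# np.linspace(start, stop, num): a fixed number of points, stop included.
points = np.linspace(0.0, 1.0, 5)
print(points)           # five values from 0 to 1 in steps of 0.25
```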