### Introduction

In one of my recent projects, I had to speed up a discrete choice dynamic programming model. First, I profiled the code with snakeviz, hoping to refactor only a single function, but in the end the whole program had to be rewritten. After profiling again, I was glad that none of the formerly costly functions appeared near the top of the list. Indeed, I only saw function calls to internals of other packages. But the first item caught my eye: ``~:0(<method 'copy' of 'numpy.ndarray' objects>)``. What is this? I was sure that I did not call ``np.copy()`` at all. So which operations are secretly returning copies?

### How to identify views and copies?

To check whether two objects refer to the same data buffer in memory, we use the following function.

In [17]:
import numpy as np

In [18]:
def aid(x):
    """Return the memory address of the first byte of the array's data."""
    return x.__array_interface__["data"][0]

Let us start simple. We construct an array and compare its memory address with the address of a slice of the same array that starts at the second element.

In [20]:
x = np.array([1, 2, 3])
aid(x), aid(x[1:])

(2095759930656, 2095759930660)

Indeed, the two addresses are almost the same, but the slice is offset by 4. Addresses count bytes, so the first element occupies 4 bytes, or 32 bits. Since we know that the array contains integers, the dtype must be int32. This is the default integer on my machine. On a platform where the default integer is int64, the offset would be 8.

In [16]:
x.dtype

dtype('int32')
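The link between the offset and the element size can be checked directly. The following is a small sketch that fixes the dtype explicitly, so it does not depend on the platform default. The names ``x32``, ``x64``, ``offset32`` and ``offset64`` are only for illustration.

```python
import numpy as np


def aid(x):
    """Return the memory address of the first byte of the array's data."""
    return x.__array_interface__["data"][0]


# Fix the dtype explicitly so the result is the same on every platform.
x32 = np.array([1, 2, 3], dtype=np.int32)
x64 = np.array([1, 2, 3], dtype=np.int64)

# Distance in bytes between the first and the second element.
offset32 = aid(x32[1:]) - aid(x32)
offset64 = aid(x64[1:]) - aid(x64)

print(offset32, x32.itemsize, x32.strides)  # 4 4 (4,)
print(offset64, x64.itemsize, x64.strides)  # 8 8 (8,)
```

For a one-dimensional contiguous array, the offset of ``x[1:]`` equals ``x.itemsize``, which is also the single entry of ``x.strides``.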

The addresses are identical only if both arrays start at the same first element.

In [22]:
aid(x), aid(x[:2])

(2095759930656, 2095759930656)

We are more interested in whether two objects come from the same memory block than in whether they start at the same offset, so we define two other functions.

In [24]:
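def get_data_base(x):
    """Follow the chain of ``.base`` attributes to the array that owns the data."""
    base = x
    # A view stores the array it was derived from in ``.base``.
    while isinstance(base.base, np.ndarray):
        base = base.base
    return base

def arrays_share_data(x, y):
    """Return True if both arrays are derived from the same owning array."""
    return get_data_base(x) is get_data_base(y)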

Just a quick test.

In [25]:
arrays_share_data(x, x.copy())

False

In [26]:
arrays_share_data(x, x[1:])

True
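NumPy also ships ``np.shares_memory``, which answers a slightly different question. The sketch below contrasts the two checks: ``arrays_share_data`` asks whether two arrays are derived from the same owning array, while ``np.shares_memory`` asks whether they overlap in at least one byte. The variable names are only for illustration.

```python
import numpy as np


def get_data_base(x):
    base = x
    while isinstance(base.base, np.ndarray):
        base = base.base
    return base


def arrays_share_data(x, y):
    return get_data_base(x) is get_data_base(y)


x = np.array([1, 2, 3])
head, tail = x[:1], x[2:]  # two views of x that do not overlap

same_base = arrays_share_data(head, tail)     # both are derived from x
overlap = np.shares_memory(head, tail)        # but they have no byte in common
overlap_view = np.shares_memory(x, x[1:])     # a view overlaps its parent
overlap_copy = np.shares_memory(x, x.copy())  # a copy has its own buffer

print(same_base, overlap, overlap_view, overlap_copy)  # True False True False
```

For spotting hidden copies, both checks give the same answer. They only differ for two views that are cut from different parts of the same array.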

### View vs. Copy

Let us start examining when a copy or a view is returned.

#### In-place operations

In [27]:
x = np.arange(16).reshape(4, 4)

In [31]:
x_base = get_data_base(x)

In [32]:
x *= 2

In [33]:
get_data_base(x) is x_base

True

In [34]:
a = x * 2

In [35]:
arrays_share_data(x, a)

False
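The same comparison can be made with raw addresses. The sketch below, with illustrative variable names, shows that an augmented assignment and the ``out=`` argument of a ufunc both reuse the existing buffer, whereas ``x = x * 2`` allocates a new array and rebinds the name.

```python
import numpy as np


def aid(x):
    """Return the memory address of the first byte of the array's data."""
    return x.__array_interface__["data"][0]


x = np.arange(16).reshape(4, 4)
address_before = aid(x)

# Augmented assignment writes the result into the existing buffer.
x *= 2
same_after_inplace = aid(x) == address_before

# The ``out`` argument of a ufunc does the same explicitly.
np.multiply(x, 2, out=x)
same_after_out = aid(x) == address_before

# A plain binary operation allocates a new array; ``x`` now names that array.
x = x * 2
same_after_rebind = aid(x) == address_before

print(same_after_inplace, same_after_out, same_after_rebind)  # True True False
```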

#### Indexing

In [42]:
arrays_share_data(x, x[0])

True

In [43]:
arrays_share_data(x, x[0, 0])

False

In [44]:
arrays_share_data(x, x[:1, :1])

True

In [45]:
arrays_share_data(x, x[0][0])

False

This is a little bit mind-boggling, right? The first case is no surprise, as you probably expected the two arrays to share the same base. The other three cases all index the same element of the matrix, but in two of them a copy is returned. Why is that?

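The reason is that there are two kinds of indexing. The first group, basic indexing, comprises integer indices, ``x[0]``, and slices, ``x[:2]``. Basic indexing returns a view whenever the result is still an array. If every axis is indexed with an integer, as in ``x[0, 0]`` or ``x[0][0]``, the result is a single NumPy scalar, which does not share the buffer of ``x``. Boolean masks, ``x[x > 0]``, do not belong to this group: they return a copy.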
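A short sketch of the integer and slice cases, with illustrative variable names: a view writes through to the original array, while indexing a single element yields a NumPy scalar that is detached from it.

```python
import numpy as np

x = np.arange(16).reshape(4, 4)

row = x[0]          # integer index on one axis: a view of the first row
corner = x[:1, :1]  # slices on both axes: a 1x1 view
element = x[0, 0]   # integers on every axis: a NumPy scalar, not an array

row_is_view = np.shares_memory(x, row)
corner_is_view = np.shares_memory(x, corner)
element_is_scalar = isinstance(element, np.generic)

# Writing through a view changes the original array ...
corner[0, 0] = -1
value_after_write = x[0, 0]

# ... but the scalar extracted earlier keeps its old value.
print(row_is_view, corner_is_view, element_is_scalar)  # True True True
print(value_after_write, element)                      # -1 0
```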

The second group is called ***fancy indexing*** and basically means that we use arrays of indices to access multiple values at once. The simplest form of fancy indexing uses lists of indices.

In [46]:
a = x[[2, 1]]

In [49]:
a

array([[32, 36, 40, 44],
       [16, 20, 24, 28]])

In [48]:
arrays_share_data(x, a)

False

We can also combine fancy indexing with other indexing schemes, but the return value is always a copy.

In [53]:
a = x[1, [1, 2]]

In [54]:
a

array([20, 24])

In [55]:
arrays_share_data(x, a)

False
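The practical consequence is that writing into the result of fancy indexing leaves the original untouched, while assigning through a fancy index still modifies the original in place. The sketch below uses illustrative variable names. It also includes a boolean mask, which NumPy treats as advanced indexing as well, so it returns a copy.

```python
import numpy as np

x = np.arange(16).reshape(4, 4)

rows = x[[2, 1]]    # list of indices: a copy of rows 2 and 1
masked = x[x > 10]  # boolean mask: a one-dimensional copy

rows_share = np.shares_memory(x, rows)
masked_share = np.shares_memory(x, masked)

# Modifying the copy does not touch x.
rows[0, 0] = -1
original_value = x[2, 0]

# Assigning through a fancy index writes into x itself.
x[[2, 1], 0] = 0
values_after_assignment = (x[2, 0], x[1, 0])

print(rows_share, masked_share)  # False False
print(original_value)            # 8
```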

So every form of fancy indexing returns a copy. What about the following case?

In [88]:
a = x[(1,)]

What do you expect?

In [93]:
arrays_share_data(x, a)

True

And, this one?

In [107]:
x = np.arange(16).reshape(4, 4, -1)

In [108]:
a = x[(1, 2)]

In [109]:
arrays_share_data(x, a)

True

I was a little bit puzzled by this one at first, as I thought it was the same as the following.

In [123]:
a

array([6])

In [126]:
b = x[[1, 2]]

In [127]:
b

array([[[ 4],
        [ 5],
        [ 6],
        [ 7]],

       [[ 8],
        [ 9],
        [10],
        [11]]])

In [128]:
arrays_share_data(x, b)

False

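The reason is that the parentheses are simply omitted: a tuple is unpacked into one index per axis, so ``x[(1, 2)]`` is the same as ``x[1, 2]``. If the resulting index is not fancy, the return value is a view. A list, in contrast, is interpreted as an array of indices along the first axis, which is fancy indexing and returns a copy.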
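The difference between a tuple and a list as index can be verified side by side. This is a small sketch with illustrative variable names, using the same three-dimensional array as above.

```python
import numpy as np

x = np.arange(16).reshape(4, 4, -1)  # shape (4, 4, 1)

via_tuple = x[(1, 2)]  # tuple: one index per axis
via_commas = x[1, 2]   # the usual spelling of the same index
via_list = x[[1, 2]]   # list: picks x[1] and x[2] along the first axis

same_result = np.array_equal(via_tuple, via_commas)
tuple_is_view = np.shares_memory(x, via_tuple)
list_is_view = np.shares_memory(x, via_list)

print(via_tuple.shape, via_list.shape)           # (1,) (2, 4, 1)
print(same_result, tuple_is_view, list_is_view)  # True True False
```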

### References

- https://ipython-books.github.io/45-understanding-the-internals-of-numpy-to-avoid-unnecessary-array-copying/
- http://www.jessicayung.com/numpy-views-vs-copies-avoiding-costly-mistakes/
- https://jakevdp.github.io/PythonDataScienceHandbook/02.07-fancy-indexing.html