
PyBCAdvanced

Katy Huff edited this page Jan 29, 2012 · 4 revisions

Advanced NumPy

Efficiency and C

NumPy arrays can only store a single type of data (though void and object arrays are possible). Given numpy.array([0, 1, 2]), NumPy will create a C array of integers and return a Python object holding that array. NumPy can also create Fortran-ordered arrays: pass order='F' to most array-creation functions.
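The points above can be sketched as follows. This is a minimal illustration that assumes NumPy is importable as numpy; the variable names are made up for the example.

```python
import numpy

# A homogeneous integer array: every element shares one dtype.
a = numpy.array([0, 1, 2])
print(a.dtype.kind)   # 'i' means a signed integer type

# Mixed Python objects can be held in an object array.
b = numpy.array([0, "one", 2.0], dtype=object)
print(b.dtype)

# C (row-major) order is the default; order='F' requests
# Fortran (column-major) order instead.
c = numpy.zeros((2, 3))
f = numpy.zeros((2, 3), order='F')
print(c.flags['C_CONTIGUOUS'], c.flags['F_CONTIGUOUS'])
print(f.flags['C_CONTIGUOUS'], f.flags['F_CONTIGUOUS'])
```

The flags attribute is a convenient way to check which memory layout an array actually has before handing it to C or Fortran code.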

Using C (or Fortran) arrays 1) increases efficiency of accessing elements, and 2) trivializes interoperability between C and Python. To pass a Python list (of integers) as an array to a C function, you'd have to write the following:

#!Lineno
#!c

PyObject *list; /* set to some list in Python */
size_t size = PyList_Size(list);
long *data = malloc(sizeof(long)*size);
int i;
for(i=0; i<size; i++) {
    PyObject *curr = PyList_GetItem(list, i);
    if(!PyInt_Check(curr)) {
        PyErr_SetString(PyExc_ValueError, "only ints allowed");
        free(data);
        return 0;
    }
    data[i] = PyInt_AsLong(curr);
}

my_awesome_c_function(data, size);

free(data);

Using NumPy arrays, you'd write this:

#!Lineno
#!c

PyObject *array; /* set to some numpy array in Python */
size_t size = PyArray_SIZE(array);
long *data = PyArray_DATA(array);

my_awesome_c_function(data, size);
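The same idea can be seen from the Python side without writing an extension module. The sketch below uses the standard-library ctypes module rather than the C API shown above, and it fixes the element type to a 64-bit integer so that the array dtype and the C type are guaranteed to match; both choices are assumptions made for illustration.

```python
import ctypes
import numpy

# Fix the element type explicitly so it matches the C type used below.
A = numpy.array([3, 1, 4, 1, 5], dtype=numpy.int64)

# View the array's existing buffer as a C pointer; no data is copied.
ptr = A.ctypes.data_as(ctypes.POINTER(ctypes.c_int64))

# Reading through the pointer sees the same memory as the array.
values = [ptr[i] for i in range(A.size)]
print(values)

# Writing through the pointer changes the array in place.
ptr[0] = 42
print(A[0])
```

Because the pointer refers to the array's own memory, there is no per-element conversion loop like the one needed for a Python list.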

Multidimensional Slicing

Suppose you want the elements of a 4D array where the index along the 3rd dimension is 0. We've already seen how to use "slices" of an entire dimension as placeholders (just type :), but you can also stand in for multiple slices at once with ..., which will fill in enough slices for every remaining dimension of the array. Let's see it in action:

#!CodeExample
#!python
>>> A = numpy.arange(12)
>>> A.shape = (2, 1, 3, 2)
>>> A
array([[[[ 0,  1],
         [ 2,  3],
         [ 4,  5]]],
       [[[ 6,  7],
         [ 8,  9],
         [10, 11]]]])
>>> A[:, :, 0]
array([[[0, 1]],
       [[6, 7]]])
>>> A[:, :, 0, :]
array([[[0, 1]],
       [[6, 7]]])
>>> A[..., 0, :]
array([[[0, 1]],
       [[6, 7]]])
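As a quick check of the session above, the following sketch confirms that the three index expressions select the same elements, and shows that the ellipsis may also appear at the start or end of the index. It rebuilds the same array with reshape instead of assigning to A.shape.

```python
import numpy

A = numpy.arange(12).reshape(2, 1, 3, 2)

# These three index expressions select the same elements.
x = A[:, :, 0]
y = A[:, :, 0, :]
z = A[..., 0, :]
print(x.shape, numpy.array_equal(x, y), numpy.array_equal(y, z))

# The ellipsis can also sit at the end or the start of the index.
first_axis = A[0, ...]   # index 0 along the first axis
last_axis = A[..., 0]    # index 0 along the last axis
print(first_axis.shape, last_axis.shape)
```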

Ufunc Methods

Binary ufuncs (those that take two input arrays and return an output array) have several methods that change how they are applied. The most useful of these are .reduce() and .accumulate(). People familiar with functional programming should recognize these immediately. Reduce applies the function to each element together with the result of the previous call, returning the value of the last call. Accumulate works much the same, but returns the value at each step.

In code, reduce is functionally equivalent to the following, where op is the ufunc being used, and A is the array passed to it:

#!CodeExample
#!python

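def reduce(op, A):
    # Start from the ufunc's identity (0 for add, 1 for multiply).
    r = op.identity
    for i in range(len(A)):
        # Combine the running result with the next element.
        r = op(r, A[i])
    return r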

Likewise, accumulate is functionally equivalent to the following:

#!CodeExample
#!python

def accumulate(op, A):
    r = numpy.empty(len(A))
    t = op.identity
    for i in range(len(A)):
        # Combine the running result with the next element and record it.
        t = op(t, A[i])
        r[i] = t
    return r
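To check the equivalence claimed above, here is a self-contained sketch that compares pure-Python loops against the real ufunc methods. The helper names py_reduce and py_accumulate are hypothetical and exist only for this example. Note that the loops rely on op.identity, so they only cover ufuncs that define one (such as add and multiply); numpy.minimum has an identity of None and is not handled by this sketch.

```python
import numpy

def py_reduce(op, A):
    # Fold op over A, starting from the ufunc's identity element.
    r = op.identity
    for i in range(len(A)):
        r = op(r, A[i])
    return r

def py_accumulate(op, A):
    # Same fold, but record every intermediate result.
    r = numpy.empty(len(A), dtype=A.dtype)
    t = op.identity
    for i in range(len(A)):
        t = op(t, A[i])
        r[i] = t
    return r

A = numpy.array([6, 4, 6, 1, 1, 4, 0, 1, 7, 7])

# Compare the loops with the built-in ufunc methods.
for op in (numpy.add, numpy.multiply):
    assert py_reduce(op, A) == op.reduce(A)
    assert numpy.array_equal(py_accumulate(op, A), op.accumulate(A))

print(py_reduce(numpy.add, A))
print(py_accumulate(numpy.add, A))
```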

Now let's see these methods in action. For instance, suppose we want to quickly find the minimum value in an array. The Python built-in function min works, but it steps through the array one element at a time in Python, which is slow for large arrays. However, there is a ufunc named minimum that performs an element-wise min operation on two arrays. We can take advantage of this using reduce:

#!CodeExample
#!python
>>> import random
>>> A = numpy.array([random.randint(0,100) for i in range(10)])
>>> A
array([72, 24, 33, 38, 55, 51, 85, 80, 44, 74])
>>> numpy.minimum.reduce(A)
24

Now suppose we wanted the cumulative sum of an array. Accumulate makes this trivially easy:

#!CodeExample
#!python
>>> import random
>>> A = numpy.array([random.randint(0, 10) for i in range(10)])
>>> A
array([6, 4, 6, 1, 1, 4, 0, 1, 7, 7])
>>> numpy.add.accumulate(A)
array([ 6, 10, 16, 17, 18, 22, 22, 23, 30, 37])
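Both methods also accept an axis argument, which matters once the array has more than one dimension. The following is a small sketch on a made-up 2D array; the array contents and variable names are illustrative only.

```python
import numpy

M = numpy.array([[1, 2, 3],
                 [4, 5, 6]])

# reduce collapses one axis (axis=0 unless told otherwise).
col_sums = numpy.add.reduce(M, axis=0)
row_mins = numpy.minimum.reduce(M, axis=1)

# accumulate keeps the shape and runs along the chosen axis.
running = numpy.add.accumulate(M, axis=1)

print(col_sums, row_mins)
print(running)
```

These calls give the same results as the array methods M.sum(axis=0), M.min(axis=1), and M.cumsum(axis=1).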