# NumPy

Effectively loading, storing, and manipulating in-memory data in Python.

No matter what form the data take, the first step in making them analyzable is to transform them into **arrays of numbers**.

NumPy (short for **Numerical Python**) provides an efficient interface to store and operate on dense data buffers. In some ways, NumPy arrays are like Python's built-in list type, but NumPy arrays provide much more efficient storage and data operations as the arrays grow larger in size. NumPy arrays form the core of nearly the entire ecosystem of data science tools in Python.

In [1]:
import numpy as np

### A Python Integer Is More Than Just an Integer

The standard Python implementation is written in C. This means that every Python object is simply a cleverly-disguised **C structure**, which contains not only its value, but other information as well.

For example, when we define an integer in Python, such as x = 10000, x is not just a "raw" integer. It's actually **a pointer to a compound C structure**, which contains several values.

Looking through the Python 3.4 source code, we find that the integer (long) type definition effectively looks like this:

`struct _longobject {
    long ob_refcnt;
    PyTypeObject *ob_type;
    size_t ob_size;
    long ob_digit[1];
};`

A single integer in Python 3.4 therefore contains four pieces:

- ob_refcnt, a reference count that helps Python silently handle memory allocation and deallocation
- ob_type, which encodes the type of the variable
- ob_size, which specifies the size of the following data members
- ob_digit, which contains the actual integer value that we expect the Python variable to represent

This means that there is some overhead in storing an integer in Python as compared to an integer in a compiled language like C. The fields that precede the value (the **reference count, type code, and other pieces**) form the object header, which the CPython source writes as the macro **PyObject_HEAD**.

Notice the difference here: a C integer is essentially a label for a position in memory whose bytes encode an integer value. A Python integer is a pointer to a position in memory containing all the Python object information, including the bytes that contain the integer value. This extra information in the Python integer structure is what allows Python to be coded so freely and dynamically.

<img src="https://jakevdp.github.io/PythonDataScienceHandbook/figures/cint_vs_pyint.png" width="200px">
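The overhead described above can be observed from Python itself. The following is a minimal sketch, assuming CPython (exact byte counts vary by version and platform); the variable names are illustrative only.

```python
import sys
import numpy as np

x = 10000

# Size in bytes of the full Python int object: header fields plus the digits.
py_int_bytes = sys.getsizeof(x)

# Size in bytes of one fixed-width 64-bit integer as stored inside a NumPy array.
raw_int_bytes = np.dtype(np.int64).itemsize

print(py_int_bytes, raw_int_bytes)
```

The first number is larger than the second because the Python object carries its header along with the value.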

### A Python List Is More Than Just a List

Because Python is dynamically typed, a single list can hold items of different types:

`L3 = [True, "2", 3.0, 4]`
`[type(item) for item in L3]`
`[bool, str, float, int]`

To allow these flexible types, each item in the list must contain its own type info, reference count, and other information; that is, each item is a complete Python object. In the special case that all variables are of the same type, much of this information is redundant: it can be much more efficient to store data in a **fixed-type array**.

At the implementation level, the array essentially contains a single pointer to one contiguous block of data. The Python list, on the other hand, contains a pointer to a block of pointers, each of which in turn points to a full Python object like the Python integer we saw earlier. Again, the advantage of the list is flexibility: because each list element is a full structure containing both data and type information, the list can be filled with data of any desired type. Fixed-type NumPy-style arrays lack this flexibility, but are much more efficient for storing and manipulating data.

<img src="https://jakevdp.github.io/PythonDataScienceHandbook/figures/array_vs_list.png" width="300px">
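The storage difference can be estimated with a short script. This is a rough sketch, assuming CPython; it counts the list's pointer block plus every integer object it points to, and compares that with the array's single data buffer. The names `py_list`, `np_arr`, `list_bytes`, and `array_bytes` are illustrative.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# The list: one block of pointers plus a full Python object per element.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)

# The array: one contiguous buffer of n fixed-width integers.
array_bytes = np_arr.nbytes

print(list_bytes, array_bytes)
```

The estimate ignores the array object's own small header, so it is a comparison of the data storage only.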


### Fixed-Type Arrays in Python

In [2]:
import array
L = list(range(10))
A = array.array('i', L) # i is a type code indicating the contents are integers.
A

array('i', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
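To see what "fixed-type" means in practice, the sketch below inspects the array's type code and per-item size, then tries to append a value of a different type. It uses only the standard-library `array` module; the flag name `rejected` is illustrative.

```python
import array

A = array.array('i', range(10))

# Every element occupies the same number of bytes (commonly 4 for 'i').
print(A.typecode, A.itemsize)

# A value of another type is rejected rather than stored as a separate object.
try:
    A.append(1.5)
    rejected = False
except TypeError:
    rejected = True

print(rejected)
```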

In [3]:
# Unlike a Python list, a NumPy array holds elements of a single type.
# If the types do not match, NumPy upcasts where possible (here, integers become floats).

np.array([3.14, 4, 2, 3])

array([3.14, 4.  , 2.  , 3.  ])
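The upcasting rule is easiest to check by looking at the resulting `dtype`. A small sketch with illustrative variable names:

```python
import numpy as np

ints = np.array([1, 2, 3])                 # all integers: integer dtype
mixed = np.array([3.14, 4, 2, 3])          # one float: everything becomes float64
with_complex = np.array([1, 2.5, 3 + 0j])  # one complex: everything becomes complex128

print(ints.dtype, mixed.dtype, with_complex.dtype)
```

The default integer width depends on the platform, so only the float and complex results are the same everywhere.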

In [4]:
np.array([1, 2, 3, 4], dtype='float32')

array([1., 2., 3., 4.], dtype=float32)

In [5]:
np.array([range(i, i + 3) for i in [2, 4, 6]]) # multi-dimensional array

array([[2, 3, 4],
       [4, 5, 6],
       [6, 7, 8]])
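The nested lists become the rows of a two-dimensional array, which the array's attributes confirm. A brief sketch (the name `grid` is illustrative):

```python
import numpy as np

grid = np.array([range(i, i + 3) for i in [2, 4, 6]])

print(grid.ndim)   # number of dimensions
print(grid.shape)  # length along each dimension
print(grid.size)   # total number of elements
```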

In [6]:
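np.zeros(10, dtype=int)       # length-10 integer array filled with zeros
np.ones((2, 4), dtype=float)  # 2x4 floating-point array filled with ones
np.full((3, 6), 3.14)         # 3x6 array filled with 3.14
np.arange(0, 20, 2)           # starts at 0, stops before 20, steps by 2 (only this last result is displayed)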

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])
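A notebook cell displays only its last expression, so the first three arrays in the cell above are never shown. The sketch below assigns each result to an illustrative name and prints its shape and dtype instead:

```python
import numpy as np

zeros = np.zeros(10, dtype=int)
ones = np.ones((2, 4), dtype=float)
filled = np.full((3, 6), 3.14)
evens = np.arange(0, 20, 2)  # the stop value, 20, is excluded

for name, arr in [("zeros", zeros), ("ones", ones),
                  ("filled", filled), ("evens", evens)]:
    print(name, arr.shape, arr.dtype)
```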

In [7]:
np.linspace(0, 1, 5) # array of five values evenly spaced between 0 and 1, both endpoints included

array([0.  , 0.25, 0.5 , 0.75, 1.  ])
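Where `np.arange` takes a step, `np.linspace` takes a count and works out the step itself. The sketch below asks for that step with `retstep=True` and shows the effect of `endpoint=False`; the variable names are illustrative.

```python
import numpy as np

# retstep=True also returns the spacing that linspace computed.
points, step = np.linspace(0, 1, 5, retstep=True)

# endpoint=False leaves out the stop value, as arange does.
half_open = np.linspace(0, 1, 5, endpoint=False)

print(points, step)
print(half_open)
```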

In [8]:
np.random.random((3, 3)) # uniformly distributed random values in the half-open interval [0, 1)

array([[0.14981652, 0.50465865, 0.566204  ],
       [0.95979486, 0.27036794, 0.3413827 ],
       [0.67091205, 0.61075326, 0.88813083]])

In [9]:
np.random.normal(0, 1, (3, 3)) # normally distributed random values
# with mean 0 and standard deviation 1

array([[ 0.33353743, -0.32719704,  0.15195758],
       [-0.67470392,  0.69817957,  0.72085909],
       [-1.27008285,  1.08091716,  0.65519897]])

In [10]:
np.random.randint(0, 10, (3, 3)) # random integers in the half-open interval [0, 10)

array([[9, 3, 6],
       [6, 9, 8],
       [6, 2, 3]])
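The random outputs above differ on every run. One way to make them repeatable is to seed the generator first; the sketch below uses the legacy `np.random.seed` function to match the `np.random.*` calls in these notes (the seed value 0 is arbitrary). It also checks that the upper bound of `randint` is excluded.

```python
import numpy as np

np.random.seed(0)
first = np.random.randint(0, 10, (3, 3))

# Reseeding with the same value replays the same sequence of draws.
np.random.seed(0)
second = np.random.randint(0, 10, (3, 3))

print(np.array_equal(first, second))
print(first.min() >= 0 and first.max() < 10)
```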

In [11]:
np.eye(3) # 3x3 identity matrix

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [12]:
np.empty(3) # Uninitialized array of three floats (the default dtype is float64)
# The values will be whatever happens to already exist at that memory location

array([1., 1., 1.])
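Because the contents of an `np.empty` array are arbitrary, only its shape and dtype can be relied on until it has been filled. A minimal sketch, with `scratch` as an illustrative name:

```python
import numpy as np

scratch = np.empty(3)

# Shape and dtype are well defined even though the values are not.
print(scratch.shape, scratch.dtype)

# Assign every element before reading from the array.
scratch[:] = 0.0
print(scratch)
```

When the initial values matter, `np.zeros` or `np.full` is the safer choice; `np.empty` only saves the cost of initialization.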

### NumPy Data Types

- **bool_**			Boolean (True or False) stored as a byte
- **int_**			Default integer type (same as C long; normally either int64 or int32)
- **intc**			Identical to C int (normally int32 or int64)
- **intp**			Integer used for indexing (same as C ssize_t; normally either int32 or int64)
- **int8**			Byte (-128 to 127)
- **int16**			Integer (-32768 to 32767)
- **int32**			Integer (-2147483648 to 2147483647)
- **int64**			Integer (-9223372036854775808 to 9223372036854775807)
- **uint8**			Unsigned integer (0 to 255)
- **uint16**		Unsigned integer (0 to 65535)
- **uint32**		Unsigned integer (0 to 4294967295)
- **uint64**		Unsigned integer (0 to 18446744073709551615)
- **float_**		Shorthand for float64 (this alias was removed in NumPy 2.0; use float64)
- **float16**		Half precision float: sign bit, 5 bits exponent, 10 bits mantissa
- **float32**		Single precision float: sign bit, 8 bits exponent, 23 bits mantissa
- **float64**		Double precision float: sign bit, 11 bits exponent, 52 bits mantissa
- **complex_**		Shorthand for complex128 (this alias was removed in NumPy 2.0; use complex128)
- **complex64**		Complex number, represented by two 32-bit floats
- **complex128**	Complex number, represented by two 64-bit floats
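The ranges and bit layouts listed above do not have to be memorized, since NumPy can report them. The sketch below queries them with `np.iinfo` and `np.finfo`, and shows a dtype being passed to an array constructor; the variable names are illustrative.

```python
import numpy as np

# Minimum and maximum representable value of each integer type.
for int_type in [np.int8, np.int16, np.int32, np.int64, np.uint8, np.uint64]:
    info = np.iinfo(int_type)
    print(int_type.__name__, info.min, info.max)

# Number of exponent and mantissa bits of each floating-point type.
for float_type in [np.float16, np.float32, np.float64]:
    info = np.finfo(float_type)
    print(float_type.__name__, info.nexp, info.nmant)

# A dtype can be given as a string or as the NumPy type object.
small_ints = np.zeros(10, dtype='int16')
same_type = np.zeros(10, dtype=np.int16)
print(small_ints.dtype == same_type.dtype, small_ints.itemsize)
```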
