# Understanding Data Types in Python

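Python is a dynamically typed language, whereas C and Java are statically typed. In Python you do not need to explicitly declare a variable's type before using it; you can simply assign to it. For example: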
```python
# python code
result = 0
for i in range(100):
    result += 1
```

The same code written in C would look like this:
```c
int result = 0;
for (int i = 0; i < 100; ++i) {
    result += 1;
}
```
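
The practical consequence of dynamic typing is that a name is not tied to a type: it can be rebound to objects of different types at run time. A minimal sketch (the variable names here are purely illustrative):

```python
# A name in Python is just a reference; its type follows the object it points to.
x = 4
type_before = type(x)   # the int class

x = "four"              # rebinding the same name to a string is perfectly legal
type_after = type(x)    # the str class

print(type_before.__name__, "->", type_after.__name__)
```

The equivalent reassignment in C (`int x = 4; x = "four";`) would typically be rejected or at least flagged by the compiler, because the type belongs to the variable rather than to the value.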

## A Python Integer Is More Than Just an Integer

The standard Python implementation is written in C. This means that every Python object is simply a cleverly-disguised C structure, which contains not only its value, but other information as well. For example, when we define an integer in Python, such as x = 10000, x is not just a "raw" integer. It's actually a pointer to a compound C structure, which contains several values. Looking through the Python 3.4 source code, we find that the integer (long) type definition effectively looks like this (once the C macros are expanded):
```c
struct _longobject {
    long ob_refcnt;
    PyTypeObject *ob_type;
    size_t ob_size;
    long ob_digit[1];
};
```

A single integer in Python 3.4 actually contains four pieces:
* `ob_refcnt`, a reference count that helps Python silently handle memory allocation and deallocation
* `ob_type`, which encodes the type of the variable
* `ob_size`, which specifies the size of the following data members
* `ob_digit`, which contains the actual integer value that we expect the Python variable to represent.

This means that there is some overhead in storing an integer in Python as compared to an integer in a compiled language like C, as illustrated in the following figure:
![storage](https://nbviewer.jupyter.org/github/donnemartin/data-science-ipython-notebooks/blob/master/numpy/figures/cint_vs_pyint.png)

Here PyObject_HEAD is the part of the structure containing the reference count, type code, and other pieces mentioned before.

Notice the difference here: a C integer is essentially a label for a position in memory whose bytes encode an integer value. A Python integer is a pointer to a position in memory containing all the Python object information, including the bytes that contain the integer value. This extra information in the Python integer structure is what allows Python to be coded so freely and dynamically. All this additional information in Python types comes at a cost, however, which becomes especially apparent in structures that combine many of these objects.
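
The overhead can be observed directly with `sys.getsizeof`. The following is a rough sketch that assumes the standard CPython interpreter; the exact byte counts are implementation details and vary across versions and platforms, so none are hard-coded here:

```python
import sys

x = 10000

# Size in bytes of the whole integer object (header plus digits), not just the value.
object_size = sys.getsizeof(x)

# For comparison, a plain C int or long typically occupies 4 or 8 bytes.
print(f"Python int object: {object_size} bytes")

# Arbitrarily large values simply use more digits, so the object grows.
big_size = sys.getsizeof(10 ** 100)
print(f"10**100 object:    {big_size} bytes")
```

The second measurement shows what the `ob_size` field is for: the number of digits stored in the object is not fixed.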

## A Python List Is More Than Just a List

In [1]:
L = list(range(10))
L

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [2]:
type(L[0])

int

In [3]:
L2 = [str(c) for c in L]
L2

['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

In [4]:
type(L2[0])

str

Because Python is dynamically typed, we can even create lists whose elements have different types:

In [5]:
L3 = [True, "2", 3.0, 4]
[type(item) for item in L3]

[bool, str, float, int]

But this flexibility comes at a cost: to allow these flexible types, each item in the list must contain its own type info, reference count, and other information; that is, each item is a complete Python object. In the special case that all variables are of the same type, much of this information is redundant: it can be much more efficient to store data in a fixed-type array. The difference between a dynamic-type list and a fixed-type (NumPy-style) array is illustrated in the following figure:

![illustrate](https://nbviewer.jupyter.org/github/donnemartin/data-science-ipython-notebooks/blob/master/numpy/figures/array_vs_list.png)

At the implementation level, the array essentially contains a single pointer to one contiguous block of data. The Python list, on the other hand, contains a pointer to a block of pointers, each of which in turn points to a full Python object like the Python integer we saw earlier. Again, the advantage of the list is flexibility: because each list element is a full structure containing both data and type information, the list can be filled with data of any desired type. Fixed-type NumPy-style arrays lack this flexibility, but are much more efficient for storing and manipulating data.
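
The two levels of storage in a list can be measured separately. This is an illustrative sketch assuming CPython; `sys.getsizeof` reports the list's own pointer block but not the objects it points to, and small integers are cached and shared by the interpreter, so the totals are indicative rather than an exact accounting:

```python
import sys

L = list(range(1000))

# Memory used by the list itself: essentially a header plus one pointer per element.
pointer_block = sys.getsizeof(L)

# Memory used by the integer objects that those pointers refer to.
element_objects = sum(sys.getsizeof(item) for item in L)

print(f"pointer block:   {pointer_block} bytes")
print(f"element objects: {element_objects} bytes")
```

The element objects dominate, which is exactly the redundancy a fixed-type array avoids.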

## Fixed-Type Arrays in Python

Python offers several different options for storing data in efficient, fixed-type data buffers. The built-in `array` module can be used to create dense arrays of a uniform type:


In [6]:
import array
L = list(range(10))
A = array.array('i', L)
A

array('i', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
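
Here `'i'` is a type code indicating that the contents are signed C integers. The sketch below, which uses only the standard library, shows what the fixed type buys and what it costs: elements have a fixed width, and values of the wrong type are rejected. The comparison against the list counts only the list's pointer block, not the integer objects behind it:

```python
import array
import sys

L = list(range(1000))
A = array.array('i', L)   # 'i' is the type code for a signed C int

# Each element occupies a fixed number of bytes (commonly 4 for 'i').
print(A.itemsize, "bytes per element")

# The raw data is just itemsize * len(A) bytes, with no per-element objects.
buffer_bytes = A.itemsize * len(A)
print(buffer_bytes, "bytes of data vs", sys.getsizeof(L), "bytes for the list's pointers alone")

# The fixed type is enforced: a float cannot be stored in an integer array.
try:
    A.append(3.5)
    rejected = False
except TypeError:
    rejected = True
print("float rejected:", rejected)
```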

In [7]:
import numpy as np

## Create Arrays from Python Lists
First, we can use np.array to create arrays from Python lists:

In [8]:
# integer array
np.array([1, 4, 2, 5, 3])

array([1, 4, 2, 5, 3])

Remember that unlike Python lists, a NumPy array is constrained to contain elements of a single type. If the types do not match, NumPy will upcast where possible (here, integers are upcast to floating point):

In [9]:
np.array([3.14, 4, 2, 3])

array([3.14, 4.  , 2.  , 3.  ])
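
The effect of upcasting is visible in the `dtype` attribute of the result. A small sketch, with illustrative variable names, showing that the array takes the most general type needed to hold every element:

```python
import numpy as np

ints = np.array([1, 4, 2, 5, 3])        # all integers: an integer dtype
mixed = np.array([3.14, 4, 2, 3])       # integers and floats: floating point
with_complex = np.array([1, 2.5, 3j])   # adding a complex value: complex

print(ints.dtype, mixed.dtype, with_complex.dtype)
```

The width of the default integer type depends on the platform, which is why only the float and complex results can be stated with certainty (`float64` and `complex128`).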

If we want to explicitly set the data type of the resulting array, we can use the dtype keyword:  

In [10]:
np.array([1, 2, 3, 4], dtype="float32")

array([1., 2., 3., 4.], dtype=float32)
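
The choice of `dtype` directly determines how much memory the array uses. As a short illustration, the same four values stored as 32-bit and 64-bit floats:

```python
import numpy as np

values = [1, 2, 3, 4]
a32 = np.array(values, dtype="float32")
a64 = np.array(values, dtype="float64")

# itemsize is the number of bytes per element; nbytes is the total for the data buffer.
print(a32.itemsize, a32.nbytes)
print(a64.itemsize, a64.nbytes)
```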

Finally, unlike Python lists, NumPy arrays can explicitly be multi-dimensional; here's one way of initializing a multidimensional array using a list of lists:  

In [11]:
np.array([range(i, i + 3) for i in [2, 4, 6]])

array([[2, 3, 4],
       [4, 5, 6],
       [6, 7, 8]])

The inner lists are treated as rows of the resulting two-dimensional array.
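
The dimensionality can be checked through the `ndim` and `shape` attributes. The sketch below rebuilds the same array under the illustrative name `grid` and indexes one row and one column:

```python
import numpy as np

grid = np.array([range(i, i + 3) for i in [2, 4, 6]])

print(grid.ndim)    # number of dimensions
print(grid.shape)   # (rows, columns)
print(grid[0])      # first row, built from range(2, 5)
print(grid[:, 0])   # first column, one value from each inner sequence
```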
 

## Create Arrays from Scratch
Especially for larger arrays, it is more efficient to create arrays from scratch using routines built into NumPy. Here are several examples.

In [12]:
# create a length-10 integer array filled with zeros
np.zeros(10, dtype=int)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [13]:
# create a 3x5 floating-point array filled with ones
np.ones((3, 5), dtype=float)

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [14]:
# create a 3x5 array filled with 3.14
np.full((3, 5), 3.14)

array([[3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14]])
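
It is worth knowing what these routines do when `dtype` is omitted. A brief sketch with illustrative variable names: `np.zeros` and `np.ones` default to floating point, while `np.full` infers the type from the fill value:

```python
import numpy as np

default_zeros = np.zeros(10)          # no dtype given: floating point
int_zeros = np.zeros(10, dtype=int)   # explicit integer dtype
full_float = np.full((3, 5), 3.14)    # dtype inferred from the float fill value
full_int = np.full((3, 5), 7)         # dtype inferred from the integer fill value

print(default_zeros.dtype, int_zeros.dtype, full_float.dtype, full_int.dtype)
```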

In [15]:
# Create an array filled with a linear sequence
# Starting at 0, ending at 20, stepping by 2
# (this is similar to the built-in range() function)
np.arange(0, 20, 2)

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

In [16]:
# Create an array of five values evenly spaced between 0 and 1
np.linspace(0, 1, 5)

array([0.  , 0.25, 0.5 , 0.75, 1.  ])
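
`np.arange` and `np.linspace` differ in two ways that are easy to mix up: the first fixes the step and excludes the stop value, the second fixes the number of points and includes the stop value by default. A small sketch contrasting them:

```python
import numpy as np

stepped = np.arange(0, 20, 2)                      # step is fixed, stop (20) is excluded
spaced = np.linspace(0, 1, 5)                      # count is fixed, stop (1) is included
half_open = np.linspace(0, 1, 5, endpoint=False)   # same count, stop value excluded

print(stepped[-1], len(stepped))
print(spaced[-1], len(spaced))
print(half_open)
```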

In [17]:
# Create a 3x3 array of uniformly distributed 
# random values between 0 and 1
np.random.random((3, 3))

array([[0.47881126, 0.53115861, 0.00228086],
       [0.4005703 , 0.85785879, 0.21305901],
       [0.95714448, 0.71566001, 0.33389312]])

In [18]:
# Create a 3x3 array of normally distributed random values 
# with mean 0 and standard deviation 1
np.random.normal(0, 1, (3, 3))

array([[ 1.0928415 , -1.17234833,  0.25192034],
       [ 0.49990997,  1.02662301,  0.09798689],
       [-0.74340732, -0.55286862, -0.54775236]])

In [19]:
# Create a 3x3 array of random integers in the interval [0, 10)
np.random.randint(0, 10, (3, 3))

array([[3, 0, 5],
       [6, 9, 8],
       [3, 4, 2]])
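
The random values shown above will differ on every run. When reproducible output is needed, one option is NumPy's `Generator` interface, created with `np.random.default_rng` and an explicit seed. This is a sketch of an alternative to the `np.random.*` functions used above, not a replacement the original notes prescribe; the seed value is arbitrary:

```python
import numpy as np

seed = 42  # arbitrary value chosen for illustration

rng_a = np.random.default_rng(seed)
rng_b = np.random.default_rng(seed)

uniform_a = rng_a.random((3, 3))          # uniform on [0, 1)
normal_a = rng_a.normal(0, 1, (3, 3))     # mean 0, standard deviation 1
ints_a = rng_a.integers(0, 10, (3, 3))    # integers in [0, 10)

# A second generator with the same seed replays the same sequence.
uniform_b = rng_b.random((3, 3))
print(np.array_equal(uniform_a, uniform_b))
```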

In [20]:
# Create a 3x3 identity matrix
np.eye(3, 3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [21]:
# Create an uninitialized array of three floats (the default dtype)
# The values will be whatever happens to already exist at that memory location
np.empty(3)

array([1., 1., 1.])
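
Because the contents of an `np.empty` array are arbitrary, it should be treated as a buffer to be filled rather than as a source of values. A minimal sketch of that pattern, with illustrative names:

```python
import numpy as np

buf = np.empty(3)                  # contents are arbitrary until written
print(buf.dtype)                   # floating point unless a dtype is given

int_buf = np.empty(3, dtype=int)   # request integers explicitly

# Assign before reading; here every slot is overwritten.
buf[:] = [0.5, 1.5, 2.5]
int_buf[:] = 0
print(buf, int_buf)
```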

## NumPy Standard Data Types

NumPy arrays contain values of a single type, so it is important to have detailed knowledge of those types and their limitations. Because NumPy is built in C, the types will be familiar to users of C, Fortran, and other related languages.

The standard NumPy data types, together with their string and character-code aliases, are listed in the dictionary shown below. Note that when constructing an array, they can be specified using a string:

`np.zeros(10, dtype='int16')`

or using the associated NumPy object:

`np.zeros(10, dtype=np.int16)`
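
Both spellings produce the same data type, and the limits of a fixed-width type can be queried with `np.iinfo` (for integers) and `np.finfo` (for floats). A short sketch:

```python
import numpy as np

from_string = np.zeros(10, dtype='int16')
from_object = np.zeros(10, dtype=np.int16)
print(from_string.dtype == from_object.dtype)

# Fixed-width types have hard limits; np.iinfo and np.finfo report them.
limits = np.iinfo(np.int16)
print(limits.min, limits.max)
print(np.finfo(np.float32).eps)   # spacing between 1.0 and the next float32
```

Values outside these limits cannot be represented in an array of that type, which is one of the limitations the fixed-type design imposes.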

In [22]:
np.sctypeDict  # np.typeDict was a deprecated alias of this dict and is gone from recent NumPy releases

{'?': numpy.bool_,
 0: numpy.bool_,
 'byte': numpy.int8,
 'b': numpy.int8,
 1: numpy.int8,
 'ubyte': numpy.uint8,
 'B': numpy.uint8,
 2: numpy.uint8,
 'short': numpy.int16,
 'h': numpy.int16,
 3: numpy.int16,
 'ushort': numpy.uint16,
 'H': numpy.uint16,
 4: numpy.uint16,
 'i': numpy.intc,
 5: numpy.intc,
 'uint': numpy.uint32,
 'I': numpy.uintc,
 6: numpy.uintc,
 'intp': numpy.int64,
 'p': numpy.int64,
 9: numpy.int64,
 'uintp': numpy.uint64,
 'P': numpy.uint64,
 10: numpy.uint64,
 'long': numpy.int32,
 'l': numpy.int32,
 7: numpy.int32,
 'L': numpy.uint32,
 8: numpy.uint32,
 'longlong': numpy.int64,
 'q': numpy.int64,
 'ulonglong': numpy.uint64,
 'Q': numpy.uint64,
 'half': numpy.float16,
 'e': numpy.float16,
 23: numpy.float16,
 'f': numpy.float32,
 11: numpy.float32,
 'double': numpy.float64,
 'd': numpy.float64,
 12: numpy.float64,
 'longdouble': numpy.longdouble,
 'g': numpy.longdouble,
 13: numpy.longdouble,
 'cfloat': numpy.complex128,
 'F': numpy.complex64,
 14: numpy.complex64