NumPy is a Python library for scientific computing. It provides the ndarray, an N-dimensional array type that is much more efficient than built-in Python lists for storing and manipulating numerical data.

Data Types: NumPy supports a much greater variety of numerical types than Python does. Some types, such as int and intp, have differing bit sizes depending on the platform. Some of the available data types are:
1. bool
2. int
3. long
4. float
5. double
6. short
7. long double
8. complex

In [1]:
import numpy as np
from io import StringIO

In [2]:
x= np.float32(1.0)
x

1.0

In [3]:
#similarly we can write
x1= np.int64(5)
x1

5

In [4]:
y = np.int_([1,2,4])
z = np.int64([1,2,3,4,5,6])
print(y)
print(z)             

[1 2 4]
[1 2 3 4 5 6]


In [5]:
#specify the data type
z = np.arange(3, dtype=np.uint8)
z1 = np.arange(5, dtype=np.float32)
print(z)
print(z1)

[0 1 2]
[0. 1. 2. 3. 4.]
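
The platform dependence of some types can be checked directly. The following is a minimal sketch, with illustrative variable names, that inspects a few dtypes through np.dtype, np.iinfo and np.finfo. The size of np.intp depends on the platform, so no fixed value is claimed for it.

```python
import numpy as np

# Fixed-width types have the same size on every platform.
fixed = np.dtype(np.int16)
print(fixed.itemsize)            # bytes per element: 2

# np.intp is the integer type used for indexing; its size follows the
# platform (typically 4 bytes on 32-bit builds and 8 bytes on 64-bit builds).
pointer_sized = np.dtype(np.intp)
print(pointer_sized.itemsize)

# iinfo / finfo report the properties of integer / floating-point types.
uint8_limits = np.iinfo(np.uint8)
print(uint8_limits.min, uint8_limits.max)   # 0 255
float32_info = np.finfo(np.float32)
print(float32_info.bits)                    # 32

# astype converts an existing array to another dtype and returns a copy.
as_float = np.arange(3, dtype=np.uint8).astype(np.float64)
print(as_float.dtype)                       # float64
```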


Arrays
In NumPy, arrays are created using the np.array() function.
There are five general mechanisms for creating arrays:

1. Conversion from other Python structures like lists, tuples.
2. Intrinsic numpy array creation objects like arange, ones, zeros, etc.
3. Reading arrays from disk, either from standard or custom formats.
4. Creating arrays from raw bytes through the use of strings or buffers.
5. Use of library functions like random.


In [6]:
#creating numpy arrays
x = np.array([2,3,1,0])
x1 = np.array([[1,2.0,3],[5,0,0],[7,9,3.]])
x1              

array([[1., 2., 3.],
       [5., 0., 0.],
       [7., 9., 3.]])

In [7]:
#mixed array: int, float and complex values are all upcast to complex
x = np.array([[1,2.0],[0,0],(1+1j,3.)]) 
x

array([[1.+0.j, 2.+0.j],
       [0.+0.j, 0.+0.j],
       [1.+1.j, 3.+0.j]])

In [8]:
#zeros will create an array filled with 0 values with the specified shape and default dtype is float64
print(np.zeros(10))
#ones will create an array filled with 1 values. 
print(np.ones(10))

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]


In [9]:
#arange() creates arrays with regularly incrementing values. 
np.arange(10, 100, dtype=float)

array([10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22.,
       23., 24., 25., 26., 27., 28., 29., 30., 31., 32., 33., 34., 35.,
       36., 37., 38., 39., 40., 41., 42., 43., 44., 45., 46., 47., 48.,
       49., 50., 51., 52., 53., 54., 55., 56., 57., 58., 59., 60., 61.,
       62., 63., 64., 65., 66., 67., 68., 69., 70., 71., 72., 73., 74.,
       75., 76., 77., 78., 79., 80., 81., 82., 83., 84., 85., 86., 87.,
       88., 89., 90., 91., 92., 93., 94., 95., 96., 97., 98., 99.])

In [10]:
#indices() creates a set of arrays (stacked as a one-higher dimensioned array), one per dimension with each representing variation in that dimension.
print(np.indices((5,5)))

[[[0 0 0 0 0]
  [1 1 1 1 1]
  [2 2 2 2 2]
  [3 3 3 3 3]
  [4 4 4 4 4]]

 [[0 1 2 3 4]
  [0 1 2 3 4]
  [0 1 2 3 4]
  [0 1 2 3 4]
  [0 1 2 3 4]]]
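
The cells above cover conversion from Python structures and the intrinsic creation functions. The remaining mechanisms (reading from disk, raw bytes, and library functions such as random) might look like the sketch below. It is only an illustration: an in-memory BytesIO object stands in for a real file, and the variable names are made up for this example.

```python
import io
import numpy as np

# Reading from "disk": np.save / np.load use NumPy's own .npy format.
# A BytesIO object is used here in place of a real file on disk.
original = np.arange(6).reshape(2, 3)
fake_file = io.BytesIO()
np.save(fake_file, original)
fake_file.seek(0)
restored = np.load(fake_file)

# Raw bytes: np.frombuffer interprets a bytes object as array data.
raw = bytes([1, 2, 3, 4])
from_bytes = np.frombuffer(raw, dtype=np.uint8)

# Library functions: a seeded random generator gives reproducible values.
rng = np.random.default_rng(seed=0)
samples = rng.random(3)          # three floats in the interval [0, 1)

print(restored)
print(from_bytes)
print(samples.shape)
```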


I/O with numpy

Importing data with genfromtxt:
NumPy provides several functions to create arrays from tabular data. One of them is genfromtxt(). It runs two loops: the first converts each line of the file into a sequence of strings, and the second converts each string to the appropriate data type. The source of the data is the only mandatory argument of genfromtxt. It can be a string (taken to be a file name), a list of strings, a generator, or an open file-like object such as the StringIO objects used in the examples below.

Splitting the lines into columns

In [11]:
#The delimiter argument
#Once the file is defined and open for reading, genfromtxt splits each non-empty line into a sequence of strings. It skips the empty or commented lines. The delimiter keyword is used to define how the splitting should take place.
#An integer delimiter means fixed-width fields of that many characters.
data = u"  1  2  3\n  4  5 67\n890123  4"
np.genfromtxt(StringIO(data), delimiter=3)

array([[  1.,   2.,   3.],
       [  4.,   5.,  67.],
       [890., 123.,   4.]])

In [12]:
np.genfromtxt(StringIO(data), delimiter=(4, 3, 2))

array([[1.000e+00, 2.000e+00, 3.000e+00],
       [4.000e+00, 5.000e+00, 6.700e+01],
       [8.901e+03, 2.300e+01, 4.000e+00]])

Some other arguments are:

In [13]:
#1. autostrip: by default, when a line is decomposed into a series of strings, the individual entries are not stripped of leading or trailing white space. This behavior can be overridden by setting the optional argument autostrip
#to a value of True (the data above contains no commas, so each line is read here as a single field):
np.genfromtxt(StringIO(data), delimiter=",", dtype="|U5", autostrip=True)

array(['1  2 ', '4  5 ', '89012'], dtype='<U5')
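
The effect of autostrip is easier to see on data whose fields are surrounded by spaces. The sketch below uses a made-up sample string (not part of the original notebook) and reads it with and without autostrip.

```python
from io import StringIO
import numpy as np

# Hypothetical sample with spaces around the comma-separated fields.
text = "1, abc , 2\n 3, xxx, 4"

# Without autostrip the white space around each field is kept.
kept = np.genfromtxt(StringIO(text), delimiter=",", dtype="U5")

# With autostrip=True each field is stripped before it is stored.
stripped = np.genfromtxt(StringIO(text), delimiter=",", dtype="U5", autostrip=True)

print(kept)
print(stripped)
```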

In [14]:
#2. comments: defines a character string that marks the beginning of a comment
data = u"""#
  # Skip me !
  # Skip me too !
  1, 2
  3, 4
  5, 6 #This is the third line of the data
  7, 8"""
np.genfromtxt(StringIO(data), comments="#", delimiter=",")

array([[1., 2.],
       [3., 4.],
       [5., 6.],
       [7., 8.]])

Skipping lines and choosing columns

In [15]:
#Arguments:
#1. skip_header
#2. skip_footer
#3. usecols
data = u"\n".join(str(i) for i in range(10))
np.genfromtxt(StringIO(data),
               skip_header=3, skip_footer=5)

array([3., 4.])

In [16]:
#example of usecols: select the first and the last column (the same column here, since the data has only one)
np.genfromtxt(StringIO(data),
                  usecols =(0,-1))

array([[0., 0.],
       [1., 1.],
       [2., 2.],
       [3., 3.],
       [4., 4.],
       [5., 5.],
       [6., 6.],
       [7., 7.],
       [8., 8.],
       [9., 9.]])

Choosing the data type
The dtype argument can take values such as:
1. a single type, such as dtype=float (the output will be 2D with the given dtype).
2. a sequence of types, such as dtype=(int, float, float).
3. a comma-separated string, such as dtype="i4,f8,|U3".
4. a dictionary with two keys 'names' and 'formats'.
5. a sequence of tuples (name, type), such as dtype=[('A', int), ('B', float)].
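
As a rough illustration of the forms listed above, the sketch below reads the same made-up two-column sample with four different dtype specifications. The sample text and variable names are assumptions for this example only.

```python
from io import StringIO
import numpy as np

text = "1 2.5\n3 4.5"   # hypothetical two-column sample

# A single type gives a plain 2D array.
plain = np.genfromtxt(StringIO(text), dtype=float)

# A comma-separated string gives a structured array with default field names.
by_string = np.genfromtxt(StringIO(text), dtype="i4,f8")

# A sequence of (name, type) tuples names the fields explicitly.
by_tuples = np.genfromtxt(StringIO(text), dtype=[("A", int), ("B", float)])

# A dictionary with 'names' and 'formats' keys describes the same layout.
by_dict = np.genfromtxt(
    StringIO(text), dtype={"names": ["A", "B"], "formats": [int, float]}
)

print(plain.shape)            # (2, 2)
print(by_string.dtype.names)  # ('f0', 'f1')
print(by_tuples["B"])         # [2.5 4.5]
print(by_dict.dtype.names)    # ('A', 'B')
```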

In [17]:
#Setting the names using the names argument
data = StringIO("1 2 3\n 4 5 6")
np.genfromtxt(data, names="A, B, C")

array([(1., 2., 3.), (4., 5., 6.)],
      dtype=[('A', '<f8'), ('B', '<f8'), ('C', '<f8')])

In [18]:
#Without names, the fields get the default names f0, f1, ... (the defaultfmt argument customizes this pattern)
data = StringIO("1 2 3\n 4 5 6")
np.genfromtxt(data, dtype=(int, float, int))

array([(1, 2., 3), (4, 5., 6)],
      dtype=[('f0', '<i4'), ('f1', '<f8'), ('f2', '<i4')])

In [19]:
#The converters argument
convertfunc = lambda x: float(x.strip("%"))/100.
data = u"1, 2.3%, 45.\n6, 78.9%, 0"
names = ("i", "p", "n")

In [20]:
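#without the converters argument, the percentages in column p cannot be parsed as floats and become nan
np.genfromtxt(StringIO(data), delimiter=",", names=names)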

array([(1., nan, 45.), (6., nan,  0.)],
      dtype=[('i', '<f8'), ('p', '<f8'), ('n', '<f8')])
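
To have the percentage column parsed instead of turned into nan, a conversion function can be passed through the converters argument, keyed by column index. The sketch below is one possible way to do it. The helper name percent_to_fraction is hypothetical, and encoding="utf-8" is passed explicitly on the assumption that the installed NumPy version accepts the encoding keyword; the helper also accepts bytes in case a version hands it undecoded fields.

```python
from io import StringIO
import numpy as np

def percent_to_fraction(field):
    """Convert a field such as ' 2.3%' into the float 0.023."""
    # Depending on the NumPy version and the encoding argument, a converter
    # may receive bytes or str, so both are handled here.
    if isinstance(field, bytes):
        field = field.decode()
    return float(field.strip().rstrip("%")) / 100.0

text = "1, 2.3%, 45.\n6, 78.9%, 0"
parsed = np.genfromtxt(
    StringIO(text),
    delimiter=",",
    names=("i", "p", "n"),
    converters={1: percent_to_fraction},   # apply only to column 1
    encoding="utf-8",
)
print(parsed["p"])
```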

Using missing and filling values
1. missing_values: By default, any empty string is marked as missing. We can also consider more complex strings, such as "N/A" or "???", to represent missing or invalid data. The missing_values argument accepts three kinds of values:
a) a string or a comma-separated string
b) a sequence of strings
c) a dictionary
2. filling_values: We need to provide a value for the missing entries. Typical choices are False, -1, np.nan, etc.
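
A minimal sketch of the two arguments working together, using a made-up sample in which one field is empty and another contains "N/A":

```python
from io import StringIO
import numpy as np

text = "1,,3\n4,5,N/A"   # hypothetical sample: one empty field, one "N/A"

filled = np.genfromtxt(
    StringIO(text),
    delimiter=",",
    dtype=int,
    missing_values="N/A",   # treat "N/A" as missing (empty fields already are)
    filling_values=-1,      # value substituted for every missing entry
)
print(filled)
```

Both arguments also accept a dictionary keyed by column index or name when different columns need different markers or fill values.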


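Some shortcut functions (ndfromtxt and mafromtxt were deprecated and later removed in newer NumPy releases; genfromtxt covers the same use cases):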
1. ndfromtxt
2. mafromtxt
3. recfromtxt

Indexing: Array indexing is done using square brackets with array indices inside. There are many options for indexing, which give NumPy indexing great power.

Single element indexing: It works like indexing other standard Python sequences. It is 0-based and accepts negative indices for indexing from the end of the array.

In [21]:
arr = np.arange(10)
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [22]:
#Print the element at index 5 (the 6th element, since indexing is 0-based)
arr[5]

5

In [23]:
#numpy arrays support multidimensional indexing for multidimensional arrays.
arr.shape = (2,5)
arr

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

In [24]:
print(arr[1,4])
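#we can also write two separate index operations
print(arr[0][2])
#arr[0][2] gives the same result as arr[0, 2], but it is less efficient because it creates a temporary array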

9
2


In [25]:
x = np.arange(10)
print(x[2:5])
print(x[1:7:2])
print(x[:-7])

[2 3 4]
[1 3 5]
[0 1 2]


Creating index arrays: NumPy arrays may be indexed with other arrays (or any other sequence-like object that can be converted to an array, such as lists, with the exception of tuples).

In [26]:
#example
arr = np.arange(5,2,-1)
arr

array([5, 4, 3])

In [27]:
arr[np.array([1, 2, 1, 2])]

array([4, 3, 4, 3])

In [28]:
#Negative values are permitted 
arr[np.array([2,2,-1,1])]

array([3, 3, 3, 4])

Indexing Multi-dimensional arrays

In [29]:
#creating multidimensional array
y = np.arange(35).reshape(5,7)
y[np.array([0,2,4]), np.array([0,1,2])]

array([ 0, 15, 30])

In [30]:
 y[np.array([0,2,4]), 1]

array([ 1, 15, 29])
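
The exception for tuples noted under index arrays matters in practice: a list is treated as an index array, while a tuple is read as one index per dimension. A small sketch, with illustrative names:

```python
import numpy as np

grid = np.arange(35).reshape(5, 7)

# A list (or array) of integers is an index array: it selects rows 0 and 2.
rows = grid[[0, 2]]

# A tuple is treated as one index per dimension: the same as grid[0, 2].
element = grid[(0, 2)]

print(rows.shape)   # (2, 7)
print(element)      # 2
```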

Boolean index arrays:
Boolean arrays must be of the same shape as the initial dimensions of the array being indexed.
In the most straightforward case, the boolean array has the same shape as the array being indexed.

In [31]:
y = np.arange(35).reshape(5,7)
b= y>20
#selecting only values greater than 20
print(y[b])

[21 22 23 24 25 26 27 28 29 30 31 32 33 34]


In [32]:
#we get boolean results
b[:,5]

array([False, False, False,  True,  True])

In [33]:
y[b[:,5]] # we get the values

array([[21, 22, 23, 24, 25, 26, 27],
       [28, 29, 30, 31, 32, 33, 34]])

Combining index arrays with slices

In [34]:
#an index array for the rows combined with a slice for the columns
y[np.array([0,2,4]),1:3]

array([[ 1,  2],
       [15, 16],
       [29, 30]])

In [35]:
#a boolean index array for the rows combined with a slice for the columns
y[b[:,5],1:3]

array([[22, 23],
       [29, 30]])

In [36]:
#Assigning values to indexed arrays
x = np.arange(10)
#assigning value 1
x[2:7] = 1
#print the value
print(x[2:7])

[1 1 1 1 1]


In [37]:
#assigning a float to an integer array truncates the value (1.2 becomes 1)
x[1] = 1.2
print(x[1])

1


In [38]:
#when we have variable no. of indices
arr = np.arange(81).reshape(3,3,3,3)
indices = (1,1,1,1)
arr[indices]

40

In [39]:
indices = (1,1,1,slice(0,2))
arr[indices]

array([39, 40])

Broadcasting

Broadcasting provides a means of vectorizing array operations so that looping occurs in C instead of Python. It does this without making needless copies of data and usually leads to efficient algorithm implementations.
NumPy operations are usually done on pairs of arrays on an element-by-element basis. In the simplest case, the two arrays must have exactly the same shape.

In [40]:
a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 2.0, 2.0])
a * b

array([2., 4., 6.])

In [41]:
#an array and a scalar: the scalar is broadcast to the shape of the array
a = np.array([1.0, 2.0, 3.0])
b = 2.0
print(a*b)
print(a+b)

[2. 4. 6.]
[3. 4. 5.]


In [42]:
#arrays broadcast when, comparing shapes from the trailing dimension backwards, each pair of dimensions is equal or one of them is 1
#otherwise the operation raises a ValueError
arr = np.arange(5)
arr1 = arr.reshape(5,1)
arr2 = np.ones(10)
arr3 = np.ones((4,4))

In [43]:
arr+arr2  #Error: shapes (5,) and (10,) are incompatible

ValueError: operands could not be broadcast together with shapes (5,) (10,) 

In [44]:
arr1+arr2 #works: shapes (5,1) and (10,) broadcast to (5,10)

array([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
       [3., 3., 3., 3., 3., 3., 3., 3., 3., 3.],
       [4., 4., 4., 4., 4., 4., 4., 4., 4., 4.],
       [5., 5., 5., 5., 5., 5., 5., 5., 5., 5.]])
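
The compatibility rule behind these results can be written down in a few lines. The helper below, can_broadcast, is a hypothetical function written for this illustration and is not part of NumPy; np.broadcast is the library call that reports the resulting shape.

```python
import numpy as np

def can_broadcast(shape_a, shape_b):
    """Return True if two shapes are compatible under the broadcasting rule."""
    # Compare dimensions from the trailing (rightmost) one backwards;
    # a missing leading dimension is always compatible.
    for dim_a, dim_b in zip(reversed(shape_a), reversed(shape_b)):
        if dim_a != dim_b and dim_a != 1 and dim_b != 1:
            return False
    return True

print(can_broadcast((5,), (10,)))     # False
print(can_broadcast((5, 1), (10,)))   # True

# np.broadcast reports the resulting shape without doing any arithmetic.
result_shape = np.broadcast(np.ones((5, 1)), np.ones(10)).shape
print(result_shape)                   # (5, 10)
```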

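Byteswapping: It often happens that the memory that you want to view with an array is not of the same byte ordering as the computer on which you are running Python. For example, suppose 4 bytes loaded from a file written by a big-endian machine represent two 16-bit integers. On a big-endian machine, a two-byte integer is stored with the Most Significant Byte (MSB) first, followed by the Least Significant Byte (LSB).
Thus the bytes are, in memory order: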

1. MSB integer 1
2. LSB integer 1
3. MSB integer 2
4. LSB integer 2

In [49]:
#creating a bytes buffer (in Python 3, np.ndarray needs a bytes-like object, not a str)
big_end_buffer = bytes([0, 1, 3, 2])

In [50]:
import numpy as np
#'>' = big-endian
big_end_arr = np.ndarray(shape=(2,),dtype='>i2', buffer=big_end_buffer)
print(big_end_arr[0])
print(big_end_arr[1])

1
770

Changing byte ordering:
 1. Change the byte-ordering information in the dtype so that the same bytes are interpreted differently, using arr.newbyteorder() (in NumPy 2.0 and later, arr.view(arr.dtype.newbyteorder()))
 2. Change the byte-ordering of the underlying data using arr.byteswap()

In [52]:
#'<' = little-endian: this dtype does not match the big-endian data
w = np.ndarray(shape=(2,),dtype='<i2', buffer=big_end_buffer)
print(w[0])
#swap the bytes of the underlying data so that they match the dtype
res = w.byteswap()
res[0]

256

1
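
The two ways of changing byte ordering can be compared side by side. The following is a small self-contained sketch with illustrative variable names: reinterpreting through a view leaves the bytes alone and changes the dtype, while byteswap() changes the bytes and keeps the dtype.

```python
import numpy as np

raw = bytes([0, 1, 3, 2])   # two big-endian 16-bit integers: 1 and 770

# Reading with the wrong (little-endian) dtype misinterprets the values.
wrong = np.frombuffer(raw, dtype="<i2")

# Option 1: keep the bytes and reinterpret them with a big-endian dtype.
reinterpreted = wrong.view(">i2")

# Option 2: keep the dtype and swap the bytes (returns a new array).
swapped = wrong.byteswap()

print(wrong)           # [256 515]
print(reinterpreted)   # [  1 770]
print(swapped)         # [  1 770]
print(swapped.tobytes() == bytes([1, 0, 2, 3]))   # True
```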

Structured arrays: Structured arrays are ndarrays whose datatype is a composition of simpler datatypes organized as a sequence of named fields. The datatype of a field may be any NumPy datatype, including other structured datatypes, and it may also be a sub-array, which behaves like an ndarray of a specified shape.

In [53]:
#creating a 1D array
x = np.array([('Rex', 9, 81.0), ('Fido', 3, 27.0)],
              dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f4')])
x

array([('Rex', 9, 81.), ('Fido', 3, 27.)],
      dtype=[('name', '<U10'), ('age', '<i4'), ('weight', '<f4')])

In [54]:
#values are accessed through indexing
x['age']

array([9, 3])

Structured Datatypes: These are created using the function numpy.dtype. 

In [55]:
#using a list of tuples
np.dtype([('x', 'f4'), ('y', np.float32), ('z', 'f4', (2,2))])

dtype([('x', '<f4'), ('y', '<f4'), ('z', '<f4', (2, 2))])

In [56]:
#creating via a string of comma-separated dtype specifications
np.dtype('i8,f4,S3')

dtype([('f0', '<i8'), ('f1', '<f4'), ('f2', 'S3')])

In [57]:
np.dtype('3int8, float32, (2,3)float64')

dtype([('f0', 'i1', (3,)), ('f1', '<f4'), ('f2', '<f8', (2, 3))])

In [58]:
#using a dictionary
print(np.dtype({'names': ['col1', 'col2'], 'formats': ['i4','f4']}))
print(np.dtype({'names': ['col1', 'col2'],
           'formats': ['i4','f4'],
           'offsets': [0, 4],
           'itemsize': 12}))

[('col1', '<i4'), ('col2', '<f4')]
{'names':['col1','col2'], 'formats':['<i4','<f4'], 'offsets':[0,4], 'itemsize':12}


In [59]:
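#With a dictionary of field names (np.dtype must be called, not assigned to, otherwise np.dtype itself is overwritten)
np.dtype({'col1': ('i1',0), 'col2': ('f4',1)})

dtype([('col1', 'i1'), ('col2', '<f4')])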

Indexing and Assignment to Structured arrays

In [71]:
dict1 = np.array([(1,2,3),(4,5,6)], dtype='i8,f4,f8')
dict1[1] = (7,8,9) #assignment from tuples
print(dict1)

[(1, 2., 3.) (7, 8., 9.)]


In [73]:
x = np.zeros(2, dtype='i8,f4,?,S1')
x[:] = 3 #assignment from scalars
print(x)

[(3, 3.,  True, b'3') (3, 3.,  True, b'3')]


In [79]:
n1 = np.zeros(3, dtype=[('a', 'i8'), ('b', 'f4'), ('c', 'S3')])
n2 = np.ones(3, dtype=[('x', 'f4'), ('y', 'S3'), ('z', 'O')])
n2[:] = n1  #assignment from other structured arrays
print(n2)

[(0., b'0.0', b'') (0., b'0.0', b'') (0., b'0.0', b'')]


In [80]:
#Indexing Structured Arrays
x1 = np.array([(1,2),(3,4)], dtype=[('foo', 'i8'), ('bar', 'f4')])
print(x1['foo'])
x1['foo'] = 10
print(x1)

[1 3]
[(10, 2.) (10, 4.)]
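
Field names can also drive selection and sorting. The sketch below reuses the pet records from the structured array example under a new, illustrative variable name.

```python
import numpy as np

pets = np.array(
    [("Rex", 9, 81.0), ("Fido", 3, 27.0)],
    dtype=[("name", "U10"), ("age", "i4"), ("weight", "f4")],
)

# The dtype lists the field names in order.
print(pets.dtype.names)          # ('name', 'age', 'weight')

# A boolean mask built from one field selects whole records.
older = pets[pets["age"] > 5]
print(older["name"])             # ['Rex']

# np.sort can order the records by a named field.
by_age = np.sort(pets, order="age")
print(by_age["name"])            # ['Fido' 'Rex']
```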


Subclassing ndarray

New instances of ndarray subclasses can come about in three different ways:

1. Explicit constructor call: This is the usual route to Python instance creation.
2. View casting: casting an existing ndarray as a given subclass
3. New from template: creating a new instance from a template instance. Examples include returning slices from a subclassed array, creating return types from ufuncs, and copying arrays. 


View casting: a mechanism by which we can return a view of an array of any subclass as another subclass.

In [60]:
import numpy as np
# create a completely useless ndarray subclass
class C(np.ndarray): pass
# create a standard ndarray
x1 = np.zeros((3,))
# take a view of it, as our useless subclass
res = x1.view(C)
print(type(res))

<class '__main__.C'>


In [63]:
#we can also create new instances of an ndarray subclass by 
#taking slices of subclassed arrays.
arr = res[1:] #taking slice
print(type(arr))
print(arr is res)  #false

<class '__main__.C'>
False


__new__ and __init__:
__new__ is a standard Python method and, if present, is called before __init__ when we create a class instance.

In [84]:
class C(object):
    def __new__(cls, *args):
        print('Cls in __new__:', cls)
        print('Args in __new__:', args)
        #object.__new__ accepts only the class; the extra arguments are passed on to __init__
        return object.__new__(cls)

    def __init__(self, *args):
        print('type(self) in __init__:', type(self))
        print('Args in __init__:', args)

#creating an instance
obj = C('welcome!')

Cls in __new__: <class '__main__.C'>
Args in __new__: ('welcome!',)
type(self) in __init__: <class '__main__.C'>
Args in __init__: ('welcome!',)

We use __new__ rather than only the usual __init__ because in some cases, as for ndarray, we want to be able to return an object of some other class.

__array_finalize__  : It is the mechanism that numpy provides to allow subclasses to handle the various ways that new instances get created.
The subclass instances can come about in these three ways:

1. Explicit constructor call (obj = MySubClass(params)). This will call the usual sequence of MySubClass.__new__ then MySubClass.__init__.
2. View casting
3. Creating new from template


The arguments that __array_finalize__ receives differ for the three methods of instance creation: obj is None for an explicit constructor call, the viewed array for view casting, and the template array for new-from-template.

In [85]:
import numpy as np

class C(np.ndarray):
    def __new__(cls, *args, **kwargs):
        print('In __new__ with class %s' % cls)
        return super(C, cls).__new__(cls, *args, **kwargs)

    def __init__(self, *args, **kwargs):
        # in practice you probably will not need or want an __init__
        # method for your subclass
        print('In __init__ with class %s' % self.__class__)

    def __array_finalize__(self, obj):
        print('In array_finalize:')
        print('   self type is %s' % type(self))
        print('   obj type is %s' % type(obj))
        
#check
c = C((10,))

In __new__ with class <class '__main__.C'>
In array_finalize:
   self type is <class '__main__.C'>
   obj type is <class 'NoneType'>
In __init__ with class <class '__main__.C'>
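
A common use of __array_finalize__ is to carry an extra attribute through view casting and slicing. The sketch below is one possible arrangement, not the only one: the class name InfoArray and its info attribute are hypothetical names chosen for this example.

```python
import numpy as np

class InfoArray(np.ndarray):
    """Hypothetical ndarray subclass that carries an extra `info` attribute."""

    def __new__(cls, input_array, info=None):
        # View casting: reinterpret the input data as an InfoArray instance.
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        # Called for every new instance. obj is the array being viewed or
        # sliced (or None); copy its info if it has one, else use None.
        self.info = getattr(obj, "info", None)

a = InfoArray([1, 2, 3], info="metres")
sliced = a[1:]                          # new-from-template: info is copied
viewed = np.arange(3).view(InfoArray)   # view casting: no info to copy

print(type(sliced).__name__, sliced.info)
print(type(viewed).__name__, viewed.info)
```

Setting the default inside __array_finalize__ means that every InfoArray has an info attribute, whichever of the three creation routes produced it.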


In [86]:
#there are other methods such as __array_wrap__, __array_prepare__ (removed in NumPy 2.0), __del__, etc.