## Intro to Python for Data Science #3


### Packages

A package is a directory of Python scripts. Each script is called a module. Modules specify functions, methods, and new Python types. There are thousands of packages available on the Internet, some of them geared towards data science:
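You can see the difference between a package and a module in a running interpreter. The sketch below uses two standard-library examples, chosen only for illustration: `json` is a package (a directory of modules), and `math` is a single module. A package has a `__path__` attribute pointing at its directory, and a plain module does not.

```python
import json
import json.decoder  # a module that lives inside the json package
import math
import types

# Packages and modules are both imported as module objects
print(isinstance(json, types.ModuleType))
print(isinstance(math, types.ModuleType))

# Only a package has __path__, the directory its modules are loaded from
print(hasattr(json, "__path__"))
print(hasattr(math, "__path__"))

# A module inside a package is addressed with a dotted name
print(json.decoder.__name__)
```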

- NumPy: efficiently work with arrays
- Matplotlib: data visualization
- Scikit-learn: machine learning


To use a package, you first have to install it on your system and then import it in your script, which tells Python that you want to use it.
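Installing usually happens outside Python, for example with `pip install numpy` in a terminal (this assumes a pip-based setup). Whether a package can be imported can be checked from inside Python. Here is a minimal sketch using the standard library's `importlib.util.find_spec`, which returns `None` when a top-level package can't be found. `is_installed` is a hypothetical helper name.

```python
import importlib.util


def is_installed(package_name):
    """Hypothetical helper: True if `package_name` can be found for import."""
    return importlib.util.find_spec(package_name) is not None


for name in ["math", "numpy", "not_a_real_package"]:
    print(name, is_installed(name))
```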

You can also import just one specific part of a package.


In [40]:
# `import numpy` alone does not make `array` available,
# so calling it without the `numpy.` prefix fails

import numpy

array([1, 2, 3])

NameError: name 'array' is not defined

In [41]:
numpy.array([1, 2, 3])

array([1, 2, 3])

In [42]:
import numpy as np
np.array([1, 2, 3])

array([1, 2, 3])

The `from ... import` version, which imports specific parts of a package, can save some typing, but you also lose some context. If you import the `array` function from numpy at the top of a long script and don't use it until much later, a reader may not know where `array` came from. In this case, the standard `import numpy`, with calls written as `numpy.array()`, is preferred.

In [43]:
from numpy import array
array([1, 2, 3])

array([1, 2, 3])
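All of these import forms give you the same function object. Only the name you type changes. Here is a short sketch with the standard-library `math` module, used instead of numpy so that it runs without installing anything:

```python
import math                    # full name: math.sqrt
import math as m               # alias: m.sqrt
from math import sqrt          # bare name: sqrt
from math import sqrt as root  # bare name with an alias: root

# All four names point to one and the same function
print(math.sqrt is m.sqrt is sqrt is root)

# math.sqrt(16) makes the origin obvious; root(16) hides it
print(math.sqrt(16), root(16))
```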

In [44]:
import math

# Radius of the circle
r = 3

C = 2 * math.pi * r

A = math.pi * r**2

print('Circumference: ' + str(C))
print('Area: ' + str(A))

Circumference: 18.84955592153876
Area: 28.274333882308138
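The same calculation can be wrapped in a function so that it works for any radius. The sketch below uses `from math import pi`, one of the import styles above. The function name `circle_stats` is hypothetical.

```python
from math import pi


def circle_stats(r):
    """Hypothetical helper: return (circumference, area) for radius r."""
    circumference = 2 * pi * r
    area = pi * r ** 2
    return circumference, area


C, A = circle_stats(3)
print('Circumference: ' + str(C))
print('Area: ' + str(A))
```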


There are several ways to import packages and modules into Python, and depending on the import call, you'll have to use different Python code. For example, suppose you want to use the function `inv()`, which is in the `linalg` subpackage of the `scipy` package, under the name `my_inv`:

In [45]:
from scipy.linalg import inv as my_inv

my_inv([[1, 2], [3, 4]])

array([[-2. ,  1. ],
       [ 1.5, -0.5]])
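The `from ... import ... as ...` call is only one option. The sketch below shows three ways of reaching the same `inv()` function, each of which requires a different call in your code. It assumes SciPy and NumPy are installed.

```python
import numpy as np
import scipy.linalg                     # call as scipy.linalg.inv()
from scipy import linalg                # call as linalg.inv()
from scipy.linalg import inv as my_inv  # call as my_inv()

matrix = np.array([[1, 2], [3, 4]])

# All three names refer to the same function, so the results agree
result_a = scipy.linalg.inv(matrix)
result_b = linalg.inv(matrix)
result_c = my_inv(matrix)
print(np.allclose(result_a, result_b) and np.allclose(result_b, result_c))

# Sanity check: a matrix times its inverse gives the identity matrix
print(np.allclose(matrix @ result_c, np.eye(2)))
```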

### NumPy

NumPy provides an alternative to the Python list: the NumPy array. NumPy is great for doing vector arithmetic.

#### Arrays

Python can't do element-wise calculations on lists. For example, you can't divide one list of numbers by another. NumPy is useful because it lets you treat an entire array like a single value. An array is very similar to a list, but you can perform calculations over the entire array at once.
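Here is a quick way to see the difference. Dividing one list by another raises a `TypeError`, while the same operation on arrays works element by element. The sketch uses a few of the heights and weights from the family example in this section.

```python
import numpy as np

height = [1.73, 1.68, 1.71]
weight = [68.2, 70.8, 79.8]

# Dividing one list by another is not defined for Python lists
try:
    weight / height
except TypeError as err:
    print('TypeError:', err)

# With plain lists you need an explicit loop (here a list comprehension)
ratio_list = [w / h for w, h in zip(weight, height)]

# With NumPy arrays the division is applied element by element
ratio_array = np.array(weight) / np.array(height)

print(ratio_list)
print(ratio_array)
```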

In [46]:
fam = ['liz', 1.73, 68.2, 'emma', 1.68, 70.8,
       'mom', 1.71, 79.8, 'dad', 1.89, 90.6, 'me', 1.79, 77.0]

height = fam[1::3]
print(height)

weight = fam[2::3]
print(weight)

[1.73, 1.68, 1.71, 1.89, 1.79]
[68.2, 70.8, 79.8, 90.6, 77.0]
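These slices use the `start::step` form. `fam[1::3]` starts at index 1 and takes every third element. The same idea extracts the names, which start at index 0. The `zip` pairing at the end is added here for illustration.

```python
fam = ['liz', 1.73, 68.2, 'emma', 1.68, 70.8,
       'mom', 1.71, 79.8, 'dad', 1.89, 90.6, 'me', 1.79, 77.0]

names = fam[0::3]   # indices 0, 3, 6, 9, 12
height = fam[1::3]  # indices 1, 4, 7, 10, 13

print(names)

# Pair each name with the matching height
for name, h in zip(names, height):
    print(name, h)
```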


In [47]:
import numpy as np

np_height = np.array(height)
np_weight = np.array(weight)

bmi = np_weight / (np_height ** 2)

print(np_height)
print(np_weight)

print(bmi)

[ 1.73  1.68  1.71  1.89  1.79]
[ 68.2  70.8  79.8  90.6  77. ]
[ 22.78726319  25.08503401  27.29044834  25.36323171  24.03170937]



NumPy can do all this efficiently because it assumes that an array contains values of a single type: an array of floats, an array of booleans, and so on.
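Every array records that single element type in its `dtype` attribute. Here is a short sketch:

```python
import numpy as np

floats = np.array([1.73, 1.68, 1.71])
flags = np.array([True, False])

print(floats.dtype)  # float64
print(flags.dtype)   # bool
```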

If you try to build a NumPy array from elements of different types, some of the elements' types are changed so that the array ends up homogeneous. This is known as *type coercion*.

Since a NumPy array is its own Python type, it comes with its own methods. Different types, different behavior! The typical arithmetic operators such as `+`, `-`, `*`, and `/` have a different meaning for regular Python lists and NumPy arrays.


In [48]:
np.array([1.0, "hi", True])

array(['1.0', 'hi', 'True'],
      dtype='<U32')
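Coercion picks a type that can represent every element. The following sketch shows the usual direction, from booleans to integers to floats to strings. It checks `dtype.kind`, which is `'i'` for integers, `'f'` for floats, and `'U'` for strings.

```python
import numpy as np

bool_int = np.array([True, 2])     # booleans become integers
int_float = np.array([1, 2.5])     # integers become floats
float_str = np.array([1.5, 'hi'])  # numbers become strings

for arr in (bool_int, int_float, float_str):
    print(arr, arr.dtype.kind)
```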

In [49]:
python_list = [1, 2, 3]

numpy_array = np.array([1, 2, 3])

print(python_list + python_list)
print(numpy_array + numpy_array)

[1, 2, 3, 1, 2, 3]
[2 4 6]
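The same difference shows up with `*`. A list multiplied by an integer is repeated, while an array is multiplied element by element:

```python
import numpy as np

python_list = [1, 2, 3]
numpy_array = np.array([1, 2, 3])

print(python_list * 2)  # [1, 2, 3, 1, 2, 3]
print(numpy_array * 2)  # [2 4 6]
```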


#### NumPy Subsetting

You can use square brackets to get elements from your array.

You can also use an array of booleans to do subsetting.
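Here is a sketch of both forms. It reuses the family's BMI values, recomputed so that the snippet runs on its own.

```python
import numpy as np

np_height = np.array([1.73, 1.68, 1.71, 1.89, 1.79])
np_weight = np.array([68.2, 70.8, 79.8, 90.6, 77.0])
bmi = np_weight / np_height ** 2

# Square brackets with an index: the second element (index 1)
print(bmi[1])

# A comparison yields an array of booleans, one per element
overweight = bmi > 25
print(overweight)  # [False  True  True  True False]

# Using that boolean array inside square brackets keeps only the True positions
print(bmi[overweight])
```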
