### <font color="brown">NumPy - Continued</font>

---

In [1]:
import numpy as np

---

#### <font color="brown">Expressing Conditional Logic as Array Operations</font>

**Using a boolean array to conditionally select from one array or another**

In [61]:
xarr = np.array([1,2,3,4,5])
yarr = np.array([6,7,8,9,10])
cond = np.array([True, False, True, True, False])

In [62]:
# want value from xarr if cond is True, otherwise value from yarr
res = [x if c else y
           for x,y,c in zip(xarr, yarr, cond)]
res

[1, 7, 3, 4, 10]

**Using the np.where function to conditionally select items from arrays**

In [70]:
# above list comprehension is equivalent to this
np.where(cond, xarr, yarr)

array([ 1,  7,  3,  4, 10])

*Either or both of 2nd and 3rd arguments to np.where can be scalars*

In [71]:
arr = np.random.randint(-5,5,(4,4))
print(arr,'\n')

# replace all negative values with -1 and other values with 1
arrx = np.where(arr < 0, -1, 1)
print(arrx)

[[-2  2  3 -4]
 [-5  0  2  2]
 [ 4 -4 -2  4]
 [-1 -4 -4 -3]] 

[[-1  1  1 -1]
 [-1  1  1  1]
 [ 1 -1 -1  1]
 [-1 -1 -1 -1]]


In [72]:
# replace only negative values with -1
arrx = np.where(arr < 0, -1, arr)
print(arrx)

[[-1  2  3 -1]
 [-1  0  2  2]
 [ 4 -1 -1  4]
 [-1 -1 -1 -1]]
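Because the second and third arguments can themselves be arrays, one np.where call can be nested inside another to handle more than two cases. The following is a minimal sketch on a small fixed array (chosen here for reproducibility, not taken from the random array above) that maps negatives to -1, positives to 1, and leaves zeros as 0:

```python
import numpy as np

# small fixed array so the result is reproducible
arr = np.array([[-2, 2, 3, -4],
                [-5, 0, 2, 2]])

# outer where handles negatives; inner where separates positives from zeros
signs = np.where(arr < 0, -1, np.where(arr > 0, 1, 0))
print(signs)

# for integer input this matches NumPy's built-in sign function
print(np.array_equal(signs, np.sign(arr)))
```

Nesting more than two or three levels gets hard to read, so this pattern is best kept for a small number of cases.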


---

#### <font color="brown">Boolean arrays</font>
In arithmetic operations such as sum, Boolean values are coerced to 1 (True) and 0 (False)

In [73]:
arr = np.array([1,-5,2,3,-4,6])
arr > 0

array([ True, False,  True,  True, False,  True])

In [74]:
(arr > 0).sum()    # number of True values 

4

In [75]:
print((arr > 0).any())
print((arr > 0).all())

True
False


In [76]:
arr = np.array([0,1,-5,2,9,0,3,-4,6])
# any and all also work with non-boolean arrays
# where non-zero values evaluate to True
print(arr.any())   
print(arr.all())

True
False


In [77]:
arr = np.array([1]*5)
print(arr)
print(arr.all())

[1 1 1 1 1]
True
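The same reductions also accept an axis argument, which is handy for 2D boolean arrays. A short sketch, using a small made-up array, that counts True values overall and per row, and applies any/all along an axis:

```python
import numpy as np

grid = np.array([[1, -5, 2],
                 [3,  4, 6]])
positive = grid > 0                 # 2D boolean array

total_positive = positive.sum()     # True counts as 1, False as 0
per_row = positive.sum(axis=1)      # count of True values in each row
fraction = positive.mean()          # share of True values
row_all = positive.all(axis=1)      # is every value in the row positive?
col_any = positive.any(axis=0)      # is any value in the column positive?

print(total_positive, per_row, fraction)
print(row_all, col_any)
```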


---

#### <font color="brown">Linear Algebra</font>

In [78]:
narr = np.arange(0,32).reshape(4,8)
narr

array([[ 0,  1,  2,  3,  4,  5,  6,  7],
       [ 8,  9, 10, 11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29, 30, 31]])

**Matrix Transpose**

In [79]:
narr.T  # transpose, rows become columns and vice versa

array([[ 0,  8, 16, 24],
       [ 1,  9, 17, 25],
       [ 2, 10, 18, 26],
       [ 3, 11, 19, 27],
       [ 4, 12, 20, 28],
       [ 5, 13, 21, 29],
       [ 6, 14, 22, 30],
       [ 7, 15, 23, 31]])

In [80]:
narr.transpose()  # alternatively

array([[ 0,  8, 16, 24],
       [ 1,  9, 17, 25],
       [ 2, 10, 18, 26],
       [ 3, 11, 19, 27],
       [ 4, 12, 20, 28],
       [ 5, 13, 21, 29],
       [ 6, 14, 22, 30],
       [ 7, 15, 23, 31]])

In [81]:
narr  # does not change original array

array([[ 0,  1,  2,  3,  4,  5,  6,  7],
       [ 8,  9, 10, 11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29, 30, 31]])
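One caveat worth checking: although transposing leaves the shape of narr untouched, in current NumPy releases .T on an ndarray returns a view that shares memory with the original, so writing into the transposed array also writes into narr. A brief sketch (the names view and independent are introduced here for illustration):

```python
import numpy as np

narr = np.arange(0, 32).reshape(4, 8)

view = narr.T                  # transposed view, shape (8, 4)
independent = narr.T.copy()    # explicit copy with its own data

shares = np.shares_memory(narr, view)
print(shares)

view[1, 0] = -1                # writes through to narr[0, 1]
print(narr[0, 1])              # changed via the view
print(independent[1, 0])       # the copy is unaffected
```

Calling .copy() is the usual way to get a transposed array that can be modified safely.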

**Matrix Multiplication**

In [84]:
mat1 = np.arange(1,7).reshape(2,3)
mat1

array([[1, 2, 3],
       [4, 5, 6]])

In [85]:
mat2 = np.arange(1,7).reshape(3,2)
mat2

array([[1, 2],
       [3, 4],
       [5, 6]])

In [86]:
np.dot(mat1,mat2)  # matrix multiply mat1 with mat2

array([[22, 28],
       [49, 64]])

In [87]:
mat2, mat1

(array([[1, 2],
        [3, 4],
        [5, 6]]),
 array([[1, 2, 3],
        [4, 5, 6]]))

In [88]:
np.dot(mat2,mat1)  # matrix multiply mat2 with mat1

array([[ 9, 12, 15],
       [19, 26, 33],
       [29, 40, 51]])

In [89]:
mat3 = np.array([-1,1,3,2,2,4]).reshape(2,3)
mat3

array([[-1,  1,  3],
       [ 2,  2,  4]])

In [90]:
mat1.T

array([[1, 4],
       [2, 5],
       [3, 6]])

In [91]:
np.dot(mat1.T,mat3)  # transpose mat1, then matrix multiply with mat3

array([[ 7,  9, 19],
       [ 8, 12, 26],
       [ 9, 15, 33]])

In [92]:
np.dot(np.array([1,2,3]), np.array([1,2,3]))

14
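For 2D arrays, the @ operator (np.matmul) gives the same result as np.dot and is often easier to read. The sketch below repeats the mat1 and mat2 product with @ and shows that the inner dimensions must agree, otherwise NumPy raises a ValueError:

```python
import numpy as np

mat1 = np.arange(1, 7).reshape(2, 3)
mat2 = np.arange(1, 7).reshape(3, 2)

product = mat1 @ mat2          # (2, 3) @ (3, 2) -> (2, 2)
print(product)
print(np.array_equal(product, np.dot(mat1, mat2)))

# (2, 3) @ (2, 3) is not defined: inner dimensions 3 and 2 differ
try:
    mat1 @ mat1
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print("shape mismatch:", err)
```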

---

---

### <font color="brown">Pandas</font>

#### Pandas has two key data structures: Series and DataFrame

In [24]:
import pandas as pd
from pandas import Series

---

#### <font color="brown">Series is a 1D array like object containing an array of data (of any NumPy datatype),<br> and an associated array of data labels called *index*</font>

In [25]:
ser = Series([1, 5, -2, 16])
ser

0     1
1     5
2    -2
3    16
dtype: int64

In [26]:
print('values = ',ser.values)
print('index = ',ser.index)

values =  [ 1  5 -2 16]
index =  RangeIndex(start=0, stop=4, step=1)


**Both values and index have data types**

In [27]:
print(ser.values.dtype)
print(ser.index.dtype)

int64
int64


In [28]:
ser = Series([1, 5, -2, 16], index=range(0,4))
ser

0     1
1     5
2    -2
3    16
dtype: int64

In [29]:
print(ser.values.dtype)
print(ser.index.dtype)

int64
int64


**Another typical index is string labels**

In [30]:
ser = Series([1, 5, -2, 16], index=['a','b','x','d'])
ser

a     1
b     5
x    -2
d    16
dtype: int64

In [31]:
print(ser.index.dtype)   # note the index dtype: string labels are stored as object here

object


In [32]:
ser['x']  # can use index label subscripts to access and assign values, like a dictionary

-2

In [33]:
ser['a'] = 10
ser

a    10
b     5
x    -2
d    16
dtype: int64

In [34]:
ser[['x','a','b']]  # pass a list of labels to select several items, like NumPy fancy indexing

x    -2
a    10
b     5
dtype: int64
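Besides plain square brackets, pandas offers the explicit accessors .loc (by label) and .iloc (by integer position), which avoid ambiguity when the labels are themselves integers. A small sketch using the same values as ser above (pd.Series is used instead of the imported Series name, which is equivalent):

```python
import pandas as pd

ser = pd.Series([10, 5, -2, 16], index=['a', 'b', 'x', 'd'])

by_label = ser.loc['x']          # label-based lookup
by_position = ser.iloc[2]        # position-based lookup of the same element
label_slice = ser.loc['b':'x']   # label slices include both endpoints

print(by_label, by_position)
print(label_slice)
```

Note that, unlike positional slices, a label slice such as 'b':'x' includes the end label.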

**NumPy-like array operations work as before, and the index tags along**

In [35]:
import numpy as np

res = ser[ser > 0]
print(res, '\n')
res = ser * 2
print(res, '\n')
res = np.power(ser,2)
print(res, '\n')
ser = ser ** 2
print(ser, '\n')

a    10
b     5
d    16
dtype: int64 

a    20
b    10
x    -4
d    32
dtype: int64 

a    100
b     25
x      4
d    256
dtype: int64 

a    100
b     25
x      4
d    256
dtype: int64 
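Boolean conditions can also be combined before filtering, just as with NumPy arrays. The sketch below starts from the values ser had before squaring and keeps the entries that are positive and less than 12 (the threshold is an arbitrary choice for illustration). The surviving labels come along with their values:

```python
import pandas as pd

ser = pd.Series([10, 5, -2, 16], index=['a', 'b', 'x', 'd'])

# each comparison needs its own parentheses; & is element-wise "and"
mask = (ser > 0) & (ser < 12)
selected = ser[mask]
print(selected)
```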



---

#### Series is like an ordered dictionary

**Can test membership on the index (like key membership in a dictionary)**

In [36]:
'x' in ser

True

**Can create a Series out of a Python dictionary**

In [37]:
udict = {'Rutgers': 55000, 'Princeton': 15000, 'MIT': 20000, 'USC': 40000}
useries = Series(udict)
print(useries)
print(useries.index)

Rutgers      55000
Princeton    15000
MIT          20000
USC          40000
dtype: int64
Index(['Rutgers', 'Princeton', 'MIT', 'USC'], dtype='object')


In [38]:
# make a new Series from udict, keeping a subset of its keys plus one university (Purdue) that is not in it
univs = ['Purdue','Rutgers','MIT','USC']
useries2 = Series(udict, index=univs)
useries2

Purdue         NaN
Rutgers    55000.0
MIT        20000.0
USC        40000.0
dtype: float64

**In the above, index labels common to the argument index (univs) and the udict keys are kept with their udict values. For any label in univs that is not in udict, the value is NaN.<br>
Also note that the dtype has changed from int to float because of the NaN.**
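The same selection can be expressed as a two-step operation: build the Series from the whole dictionary, then call reindex with the desired labels. A minimal sketch comparing the two routes (the variable names from_constructor and from_reindex are introduced here for illustration):

```python
import pandas as pd

udict = {'Rutgers': 55000, 'Princeton': 15000, 'MIT': 20000, 'USC': 40000}
univs = ['Purdue', 'Rutgers', 'MIT', 'USC']

from_constructor = pd.Series(udict, index=univs)
from_reindex = pd.Series(udict).reindex(univs)   # missing labels become NaN

print(from_reindex)
same = from_constructor.equals(from_reindex)     # equals treats matching NaNs as equal
print(same)
```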

In [39]:
# what if dictionary has list values
adict = {"one": [1,2,3,4], "two": [4,5,6]}
aser = Series(adict)
aser

one    [1, 2, 3, 4]
two       [4, 5, 6]
dtype: object

---

##### **Checking for null/not null values**

In [40]:
useries2.isnull()  # NaN is equivalent to null

Purdue      True
Rutgers    False
MIT        False
USC        False
dtype: bool

In [41]:
user2nas = useries2.isnull()
print(user2nas.values)
print(user2nas.index)

[ True False False False]
Index(['Purdue', 'Rutgers', 'MIT', 'USC'], dtype='object')


In [42]:
useries2.notnull()

Purdue     False
Rutgers     True
MIT         True
USC         True
dtype: bool
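Since isnull returns a boolean Series, it combines naturally with the boolean-array tricks from the NumPy section. The sketch below rebuilds useries2 by hand, counts its missing values, and lists the labels where they occur. isna is an alias for isnull:

```python
import numpy as np
import pandas as pd

useries2 = pd.Series({'Purdue': np.nan, 'Rutgers': 55000.0,
                      'MIT': 20000.0, 'USC': 40000.0})

n_missing = useries2.isnull().sum()                          # True counts as 1
missing_labels = useries2[useries2.isnull()].index.tolist()  # labels with NaN
alias_matches = useries2.isna().equals(useries2.isnull())    # isna is an alias

print(n_missing, missing_labels, alias_matches)
```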

---

##### **Naming the Series, and the index**

In [43]:
useries

Rutgers      55000
Princeton    15000
MIT          20000
USC          40000
dtype: int64

In [44]:
useries.name = "student population"
useries.index.name = "university"
useries

university
Rutgers      55000
Princeton    15000
MIT          20000
USC          40000
Name: student population, dtype: int64

---

**Auto alignment of differently indexed data**

In [45]:
# if an index label appears in one Series and not the other, the result for that label is NaN
useries + useries2

MIT           40000.0
Princeton         NaN
Purdue            NaN
Rutgers      110000.0
USC           80000.0
dtype: float64
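When NaN is not the desired result for unmatched labels, the add method accepts a fill_value that stands in for a label missing from one side. The sketch below uses two small Series with made-up numbers (the Purdue figure is hypothetical) to contrast + with add:

```python
import pandas as pd

# illustrative data; the Purdue value is made up for this example
s1 = pd.Series({'Rutgers': 55000, 'Princeton': 15000, 'MIT': 20000})
s2 = pd.Series({'MIT': 20000, 'Purdue': 45000})

plain = s1 + s2                      # unmatched labels become NaN
filled = s1.add(s2, fill_value=0)    # unmatched labels are treated as 0

print(plain, '\n')
print(filled)
```

If a label holds NaN on both sides after alignment, the result for it is still NaN even with fill_value.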

---

**Changing the index**

In [46]:
# can change index at any time
print('Original index: ',useries.index)
useries.index = ['RU','Princeton U','MIT','USC']
print('\nUpdated index: ',useries.index)

Original index:  Index(['Rutgers', 'Princeton', 'MIT', 'USC'], dtype='object', name='university')

Updated index:  Index(['RU', 'Princeton U', 'MIT', 'USC'], dtype='object')


---

**Dropping NaNs**

In [47]:
dat = Series([1, np.nan, 2.6, np.nan, 6])
print(dat)

0    1.0
1    NaN
2    2.6
3    NaN
4    6.0
dtype: float64


Alternatively, you can import np.nan under a shorter alias (NA is a popular choice), as follows

In [48]:
from numpy import nan as NA

dat = Series([1, NA, 2.6, NA, 6])
print(dat)

0    1.0
1    NaN
2    2.6
3    NaN
4    6.0
dtype: float64


In [49]:
# drop NAs
dat.dropna()

0    1.0
2    2.6
4    6.0
dtype: float64

In [50]:
dat  # dropna returns a new Series, original is unchanged

0    1.0
1    NaN
2    2.6
3    NaN
4    6.0
dtype: float64

In [51]:
# alternatively, you can filter with notnull
dat[dat.notnull()]

0    1.0
2    2.6
4    6.0
dtype: float64

In [52]:
dat  # does not change the original, either

0    1.0
1    NaN
2    2.6
3    NaN
4    6.0
dtype: float64

In [53]:
dat1 = dat.copy()
dat1

0    1.0
1    NaN
2    2.6
3    NaN
4    6.0
dtype: float64

In [54]:
dat1.dropna(inplace=True)  # update in place
dat1

0    1.0
2    2.6
4    6.0
dtype: float64

In [55]:
# generally, you would simply assign back to original series
dat1 = dat.copy()
dat1

0    1.0
1    NaN
2    2.6
3    NaN
4    6.0
dtype: float64

In [56]:
dat1 = dat1[dat1.notnull()]
dat1

0    1.0
2    2.6
4    6.0
dtype: float64
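The two approaches above give the same result, and in both the surviving rows keep their original labels (0, 2, 4). A short sketch that checks the equivalence and, as an optional extra step, renumbers the index with reset_index(drop=True):

```python
import numpy as np
import pandas as pd

dat = pd.Series([1, np.nan, 2.6, np.nan, 6])

dropped = dat.dropna()
masked = dat[dat.notnull()]
same = dropped.equals(masked)                 # both routes agree
print(same, dropped.index.tolist())

# drop=True discards the old labels instead of turning them into a column
renumbered = dropped.reset_index(drop=True)
print(renumbered)
```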

---

**Resetting index**

In [57]:
useries

RU             55000
Princeton U    15000
MIT            20000
USC            40000
Name: student population, dtype: int64

In [58]:
# can reset index to numbers
useries = useries.reset_index()  # index becomes a column
useries

         index  student population
0           RU               55000
1  Princeton U               15000
2          MIT               20000
3          USC               40000


In [59]:
type(useries)    # changes into a DataFrame

pandas.core.frame.DataFrame

In [60]:
# change column names
useries.columns = ['Univ','Student Population']
useries

          Univ  Student Population
0           RU               55000
1  Princeton U               15000
2          MIT               20000
3          USC               40000
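The column names can also be set during the reset itself instead of afterwards. The following is one possible sketch, assuming the same data as useries before the reset: rename_axis names the index (which becomes the first column) and the name argument of reset_index names the value column:

```python
import pandas as pd

useries = pd.Series([55000, 15000, 20000, 40000],
                    index=['RU', 'Princeton U', 'MIT', 'USC'],
                    name='student population')

# name the index, then turn it into a column and name the values column
frame = useries.rename_axis('Univ').reset_index(name='Student Population')
print(frame)
print(type(frame))
```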
