# [Introduction to numpy](https://app.dataquest.io/m/506/introduction-to-numpy/1/introduction)

In [24]:
import numpy as np
import pandas as pd

We've learned that a computer can allocate memory in two ways:
- a single memory location, or
- a contiguous range of memory locations, which is called an array.

##  Array and list

- An ndarray behaves like a list with a fixed length, that is, a list without the list.append() method.
- ndarrays are printed slightly differently from lists: the values are not separated by commas.
- We can use the len() function to get the length of an ndarray.
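
A minimal sketch of the fixed-length point above. The variable names `has_append` and `longer` are illustrative only. `np.append` is NumPy's own function; it returns a new array rather than growing the original in place.

```python
import numpy as np

x = np.array([10, 20, 30])

# Unlike a list, an ndarray has no append() method.
has_append = hasattr(x, "append")

# np.append() builds and returns a new, longer array; x itself is unchanged.
longer = np.append(x, 40)

print(has_append)           # False
print(len(x), len(longer))  # 3 4
```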

In [3]:
x=np.array([10,20,30])
print(x)

[10 20 30]


In [4]:
print(len(x))

3


###  Accessing values
- scalar
- [start:end]
- [start:end:step]
- negative indexing

In [5]:
x = np.array([10, 20, 30])
x0=x[0]
x2=x[2]
x[1]=42

In [6]:
x = np.array([9, 1, 5, 6, 2, 0, 4, 3, 8, 7])
first_half=x[:5]
last_8=x[2:]
middle=x[1:-1]

In [7]:
x = np.array([9, 1, 5, 6, 2, 0, 4, 3, 8, 7])
even=x[::2]
odd=x[1::2]
mul_3=x[0::3]
every_2=x[3:9:2]
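
The cells above assign the slices without printing them. As a quick check, this sketch prints what each `[start:end:step]` slice of the same array selects:

```python
import numpy as np

x = np.array([9, 1, 5, 6, 2, 0, 4, 3, 8, 7])

print(x[:5])     # first five values:          [9 1 5 6 2]
print(x[2:])     # everything from index 2:    [5 6 2 0 4 3 8 7]
print(x[1:-1])   # all but first and last:     [1 5 6 2 0 4 3 8]
print(x[::2])    # every second value:         [9 5 2 4 8]
print(x[1::2])   # every second, from index 1: [1 6 0 3 7]
print(x[0::3])   # every third value:          [9 6 4 7]
print(x[3:9:2])  # indices 3, 5, 7:            [6 0 3]
```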

![](https://dq-content.s3.amazonaws.com/506/7.2-m506.svg)

In [10]:
x = np.array([9, 1, 5, 6, 2, 0, 4, 3, 8, 7])

second_to_last=x[-2]
reversed_x=x[::-1]
first_5_reversed=x[4::-1]
last_5_reversed=x[:4:-1]
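
A short sketch of what the negative indices and negative steps above select. With a negative step the slice walks backwards, so the start index is the larger one.

```python
import numpy as np

x = np.array([9, 1, 5, 6, 2, 0, 4, 3, 8, 7])

print(x[-2])     # second-to-last value:       8
print(x[::-1])   # whole array reversed:       [7 8 3 4 0 2 6 5 1 9]
print(x[4::-1])  # indices 4 down to 0:        [2 6 5 1 9]
print(x[:4:-1])  # indices 9 down to 5:        [7 8 3 4 0]
```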


###  Copying an ndarray

When we **slice an ndarray**, NumPy **does not copy** the data from the ndarray into a new one. 

Instead, it **creates a view** of that ndarray allowing the user to manipulate the elements from that slice.

In [9]:
x = np.array([1, 0, 0, 0, 1])
y=x[1:-1]
z=y.copy()
z[0]=9
print(x,y,z)
y[1]=7       ## both x and y will change
print(x,y,z) 

[1 0 0 0 1] [0 0 0] [9 0 0]
[1 0 7 0 1] [0 7 0] [9 0 0]
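
One way to confirm which arrays are views is to ask NumPy directly. This sketch uses `np.shares_memory` and the `.base` attribute on the same three arrays:

```python
import numpy as np

x = np.array([1, 0, 0, 0, 1])
y = x[1:-1]   # slice: a view onto x's data
z = y.copy()  # explicit copy: independent data

print(np.shares_memory(x, y))  # True  (y is a view of x)
print(np.shares_memory(x, z))  # False (z owns its own data)
print(y.base is x)             # True  (a view remembers its source array)
print(z.base is None)          # True  (a copy has no base)
```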


###  List of lists - ndarray
- access one element (**scalar**) --array2d[row_index, col_index]
- array2d[row_start:row_end:row_step, col_start:col_end:col_step]
![](https://dq-content.s3.amazonaws.com/506/9.2-m506.svg)
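- **indexing - 1_D or 2_D?**
  - **a single index** (a scalar) drops that axis -- the result is a 1-dimensional array
  - **a range or a list with a single index** keeps that axis -- the result is a 2-dimensional array with a single row or column.
  
  <font color='red'>To get a 2_D result, give the row or column index as a range or a list instead of a scalar, e.g. array2d[[0]] or array2d[:,[0]].</font>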

In [12]:
my_2d_array=np.array([[1,4,7],[2,5,8],[3,6,9]])
my_2d_array[1,1]=42
print(my_2d_array)

[[ 1  4  7]
 [ 2 42  8]
 [ 3  6  9]]


In [15]:
print(type(my_2d_array[1,1]))

<class 'numpy.int32'>


In [16]:
array2d = np.array([
    [1, 2, 3, 4, 5],
    [6, 7, 8, 9, 10],
    [11, 12, 13, 14, 15]
])
x=array2d[:,3:5]
y=array2d[::2,:]
z=array2d[::2,::2]

In [51]:
people_data = np.array([
    [27, 67, 1.65],
    [35, 81, 1.84],
    [29, 55, 1.60],
    [41, 73, 1.79]
])

anna_row=people_data[2]
bob_age_height=people_data[3,[0,2]]
ages_col=people_data[:,0]
weight_dexter_bob=people_data[[1,3],1]

In [53]:
print(anna_row)
print(bob_age_height)

[29.  55.   1.6]
[41.    1.79]


### 1_D or 2_D?

In [39]:
array2d[0]  ## same as array2d[0,:]

array([1, 2, 3, 4, 5])

In [54]:
print(array2d[[0]])

[[1 2 3 4 5]]


In [40]:
array2d[[0,1]]

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10]])

In [45]:
array2d[:,0]

array([ 1,  6, 11])

In [43]:
array2d[:,[0]]

array([[ 1],
       [ 6],
       [11]])
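
To summarize the cases above, this sketch prints the shape each kind of index produces. The last line is an addition to the examples above: when both the row and the column index are lists, NumPy pairs them element by element and returns a 1-dimensional array.

```python
import numpy as np

array2d = np.array([
    [1, 2, 3, 4, 5],
    [6, 7, 8, 9, 10],
    [11, 12, 13, 14, 15]
])

print(array2d[0].shape)       # scalar row index:      (5,)
print(array2d[[0]].shape)     # one-element row list:  (1, 5)
print(array2d[0:1].shape)     # one-element row range: (1, 5)
print(array2d[:, 0].shape)    # scalar column index:   (3,)
print(array2d[:, [0]].shape)  # one-element col list:  (3, 1)

# Two index lists select the elements at (0, 2) and (1, 3).
print(array2d[[0, 1], [2, 3]])  # [3 9]
```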

In [31]:
df=pd.DataFrame(array2d,columns=list('abcde'))

In [32]:
df

    a   b   c   d   e
0   1   2   3   4   5
1   6   7   8   9  10
2  11  12  13  14  15


In [34]:
df.iloc[0]

a    1
b    2
c    3
d    4
e    5
Name: 0, dtype: int32

In [35]:
df.iloc[[0]]

   a  b  c  d  e
0  1  2  3  4  5


In [36]:
type(df.iloc[[0]])

pandas.core.frame.DataFrame

In [41]:
df.iloc[[0,1]]

   a  b  c  d   e
0  1  2  3  4   5
1  6  7  8  9  10


In [46]:
df.iloc[:,0]

0     1
1     6
2    11
Name: a, dtype: int32

In [47]:
df.iloc[:,[0]]

    a
0   1
1   6
2  11


In [49]:
df['a']

0     1
1     6
2    11
Name: a, dtype: int32
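
The pandas cells mirror the ndarray behavior: a scalar position returns a 1-dimensional Series, while a one-element list returns a 2-dimensional DataFrame. A small sketch of that comparison follows; the array is rebuilt here with `np.arange` for brevity, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Same values as array2d above: 1..15 in three rows of five.
array2d = np.arange(1, 16).reshape((3, 5))
df = pd.DataFrame(array2d, columns=list('abcde'))

row_series = df.iloc[0]      # scalar position -> Series
row_frame = df.iloc[[0]]     # one-element list -> DataFrame
col_series = df.iloc[:, 0]   # scalar position -> Series
col_frame = df.iloc[:, [0]]  # one-element list -> DataFrame

print(type(row_series).__name__, row_series.shape)  # Series (5,)
print(type(row_frame).__name__, row_frame.shape)    # DataFrame (1, 5)
print(type(col_series).__name__, col_series.shape)  # Series (3,)
print(type(col_frame).__name__, col_frame.shape)    # DataFrame (3, 1)
```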

In [50]:
array2d = np.array([
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
])

array2d[:,0:2]=1
array2d[2:4,:]=2
array2d[0:3,1:4]=3
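
The three assignments overwrite each other in order, so later slices win where they overlap. This sketch repeats them and prints the result; it builds the zero array with `np.zeros(..., dtype=int)` instead of a literal list, which is only a shorthand.

```python
import numpy as np

array2d = np.zeros((4, 5), dtype=int)

array2d[:, 0:2] = 1     # first two columns of every row
array2d[2:4, :] = 2     # last two rows, all columns
array2d[0:3, 1:4] = 3   # rows 0-2, columns 1-3

print(array2d)
# [[1 3 3 3 0]
#  [1 3 3 3 0]
#  [2 3 3 3 2]
#  [2 2 2 2 2]]
```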

## Arithmetic with numpy arrays -- element-wise

### Element-wise

In [55]:
def add_list_values(list1,list2):
    sumlist=[i1+i2 for i1,i2 in zip(list1,list2)]
    return sumlist

In [56]:
import time
import random
random.seed(0)

# Generate test lists
list1 = [random.randint(0, 1000) for _ in range(100000)]
list2 = [random.randint(0, 1000) for _ in range(100000)]

# Measure the execution time of adding lists
start = time.time()
add_list_values(list1, list2)
end = time.time()
time_list = end - start

x1=np.array(list1)
x2=np.array(list2)

start=time.time()
x3=x1+x2
end=time.time()
time_array=end-start

ratio=time_list/time_array
print(ratio)

0.8327539283656807
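
A single `time.time()` measurement of such a short operation can be noisy, so the ratio may vary a lot between runs. As a hedged alternative, this sketch uses the standard-library `timeit` module to repeat each version and keep the best run. The `number` and `repeat` values are arbitrary choices, and no particular ratio is claimed here.

```python
import random
import timeit

import numpy as np

random.seed(0)
list1 = [random.randint(0, 1000) for _ in range(100000)]
list2 = [random.randint(0, 1000) for _ in range(100000)]
x1 = np.array(list1)
x2 = np.array(list2)


def add_list_values(list1, list2):
    """Add two lists element by element."""
    return [i1 + i2 for i1, i2 in zip(list1, list2)]


# Each measurement runs the statement 10 times; keep the fastest of 5 repeats.
time_list = min(timeit.repeat(lambda: add_list_values(list1, list2),
                              number=10, repeat=5))
time_array = min(timeit.repeat(lambda: x1 + x2, number=10, repeat=5))

# A ratio above 1 means the ndarray addition was faster on this machine.
print(time_list / time_array)
```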


![](https://dq-content.s3.amazonaws.com/507/6.1-m507.svg)

In [58]:
scores = np.array([
    [46, 74, 52, 81],
    [75, 45, 67, 53],
    [67, 80, 73, 63],
    [59, 94, 43, 78]
])
scores_day1=scores[:,:2]
scores_day2=scores[:,2:]
shape1=scores_day1.shape
shape2=scores_day2.shape
print(shape1,shape2)

total_scores=scores_day1+scores_day2

(4, 2) (4, 2)


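### Axis - Minimum and Maximum

- The built-in functions min() and max() work much slower on ndarrays than the corresponding methods provided by NumPy.
- array.min() is preferred over np.min(array).
- The .min(), .max(), and .sum() methods accept an axis argument: axis=0 aggregates down each column, axis=1 aggregates across each row.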
![](https://dq-content.s3.amazonaws.com/507/8.1-m507.svg)
![](https://dq-content.s3.amazonaws.com/507/9.1-m507.svg)

In [60]:
total_scores = np.array([
 [ 98, 155],
 [142,  98],
 [140, 143],
 [102, 172]
])

scores_game1=total_scores[:,0]
scores_game2=total_scores[:,1]
min_game1=scores_game1.min()
max_game1=scores_game1.max()

min_game2=scores_game2.min()
max_game2=scores_game2.max()

In [62]:
total_scores = np.array([
 [ 98, 155],
 [142,  98],
 [140, 143],
 [102, 172]
])

max_game_scores=total_scores.max(axis=0)
min_game_scores=total_scores.min(axis=0)

max_people_scores=total_scores.max(axis=1)
min_people_scores=total_scores.min(axis=1)

total_people_score=total_scores.sum(axis=1)
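
The cell above does not print its results. This sketch shows what each axis produces for the same scores: `axis=0` gives one value per column (game), and `axis=1` gives one value per row (person).

```python
import numpy as np

total_scores = np.array([
    [98, 155],
    [142, 98],
    [140, 143],
    [102, 172]
])

print(total_scores.max(axis=0))  # best score per game:    [142 172]
print(total_scores.min(axis=0))  # worst score per game:   [98 98]
print(total_scores.max(axis=1))  # best game per person:   [155 142 143 172]
print(total_scores.min(axis=1))  # worst game per person:  [ 98  98 140 102]
print(total_scores.sum(axis=1))  # total per person:       [253 240 283 274]
```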

##  Broadcasting NumPy Arrays

- ndarrays and values
- 1-dimensional arrays and 2-dimensional arrays
- change the shape of an ndarray

###  Same shape

In [63]:
print(np.ones((4,)))  # 1-D array with 4 ones
print(np.ones((2, 3)))# 2 by 3 ndarray of ones

[1. 1. 1. 1.]
[[1. 1. 1.]
 [1. 1. 1.]]


In [66]:
x = np.array([
 [7., 9., 2., 2.],
 [3., 2., 6., 4.],
 [5., 6., 5., 7.]
])
ones=np.ones((x.shape[0],x.shape[1]))

In [67]:
x=x-ones
print(x)

[[6. 8. 1. 1.]
 [2. 1. 5. 3.]
 [4. 5. 4. 6.]]


###  Broadcasting with a single value

- Broadcasting tries to match the shapes of the two ndarrays we are operating on,
- stretching them until their shapes match.
- ![](https://dq-content.s3.amazonaws.com/508/3.1-m508.svg)

In [69]:
x = np.array([3, 2, 4, 5])

In [71]:
r=1/x
r

array([0.33333333, 0.5       , 0.25      , 0.2       ])
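
To make the "stretching" visible, this sketch uses `np.broadcast_to` to expand the scalar to the shape of `x` by hand and checks that the result matches `1/x`. NumPy does this implicitly; the explicit step is for illustration only.

```python
import numpy as np

x = np.array([3, 2, 4, 5])

# Stretch the single value 1 to the shape of x: [1 1 1 1]
stretched = np.broadcast_to(1, x.shape)
print(stretched)  # [1 1 1 1]

# Dividing the stretched array gives the same result as broadcasting 1/x.
r_explicit = stretched / x
print(np.array_equal(r_explicit, 1 / x))  # True
```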

###  Broadcasting Horizontally
![](https://dq-content.s3.amazonaws.com/508/4.2-m508.svg)

In [72]:
x = np.array([
    [4, 2, 1, 5],
    [6, 7, 3, 8]
])
y = np.array([
    [1],
    [2]
])

In [73]:
z=x+y

In [75]:
print(z)

[[ 5  3  2  6]
 [ 8  9  5 10]]


###  Broadcasting Vertically
![](https://dq-content.s3.amazonaws.com/508/5.2-m508.svg)

In [76]:
x = np.array([
    [4, 2, 1, 5],
    [6, 7, 3, 8]
])
y=np.array([1,2,3,4])

In [77]:
z=x+y

In [78]:
z

array([[ 5,  4,  4,  9],
       [ 7,  9,  6, 12]])

### Broadcasting on both
![](https://dq-content.s3.amazonaws.com/508/6.2-m508.svg)
![](https://dq-content.s3.amazonaws.com/508/6.3-m508.svg)
- x is a 2-dimensional array with a single column
- y is a 1-dimensional array

In [80]:
x=np.array([1,2,3])
x=x[:,np.newaxis]
y=np.array([1,2,3])
z=x+y
print(z)

[[2 3 4]
 [3 4 5]
 [4 5 6]]


In [81]:
x=np.array([1,2,3])
x=x[:,np.newaxis]
y=np.array([[1,2,3]])
z=x+y
print(z)

[[2 3 4]
 [3 4 5]
 [4 5 6]]
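
A sketch of why both examples give a 3 by 3 result: the column is stretched across the columns and the row is stretched down the rows until both have the same shape. `np.broadcast(...).shape` reports the shape NumPy settles on, and `np.broadcast_to` performs the stretching explicitly. The names `col` and `row` are illustrative.

```python
import numpy as np

col = np.array([1, 2, 3])[:, np.newaxis]  # shape (3, 1): a single column
row = np.array([1, 2, 3])                 # shape (3,): treated as a single row

print(col.shape, row.shape)          # (3, 1) (3,)
print(np.broadcast(col, row).shape)  # (3, 3)

# Stretch both operands by hand and add them.
col_stretched = np.broadcast_to(col, (3, 3))
row_stretched = np.broadcast_to(row, (3, 3))
print(np.array_equal(col_stretched + row_stretched, col + row))  # True
```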


### Reshaping 
- array.reshape() takes the new shape as a **tuple or scalar**

![](https://dq-content.s3.amazonaws.com/508/8.1-m508.svg)

**add the values of y to the two columns of x**
![](https://dq-content.s3.amazonaws.com/508/8.2-m508.svg)

In [3]:
x = np.array([
    [1, 2],
    [3, 4],
    [5, 6]
])
y = np.array([1, 2, 3])

y_as_col = y.reshape((3, 1))
print(x + y_as_col)

[[2 3]
 [5 6]
 [8 9]]


In [4]:
print(x.reshape(6))

[1 2 3 4 5 6]


In [6]:
print(x.reshape(1,6))

[[1 2 3 4 5 6]]


In [8]:
dice1=np.arange(1,7)
dice2=dice1.reshape((6,1))
dice_sums=dice1+dice2
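
Broadcasting the row of die faces against the column of die faces yields every possible sum of two dice. As an illustrative follow-up (not part of the original exercise), this sketch counts how often each sum occurs with `np.unique`:

```python
import numpy as np

dice1 = np.arange(1, 7)          # shape (6,): faces of the first die
dice2 = dice1.reshape((6, 1))    # shape (6, 1): faces of the second die
dice_sums = dice1 + dice2        # shape (6, 6): all 36 combinations

print(dice_sums.shape)  # (6, 6)

# Count how many of the 36 combinations produce each sum.
values, counts = np.unique(dice_sums, return_counts=True)
most_common = values[counts.argmax()]
print(most_common, counts.max())  # 7 6
```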

What happens under the hood:  
**by default, values are copied row by row**
![](https://dq-content.s3.amazonaws.com/508/9.1-m508-min.gif)
![](https://dq-content.s3.amazonaws.com/508/9.2-m508-min.gif)

**Setting the order parameter to 'F'** fills the values column by column, as in the Fortran programming language
![](https://dq-content.s3.amazonaws.com/508/9.3-m508-min.gif)

In [10]:
cell_numbers=np.arange(1,37)
numbering_by_row=cell_numbers.reshape((6,6))            # default order: fill row by row
numbering_by_col=cell_numbers.reshape((6,6),order='F')  # order='F': fill column by column
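
The difference between the two orders is easier to see on a small array. This sketch reshapes six values into 2 by 3 both ways; the names `by_row` and `by_col` are illustrative.

```python
import numpy as np

cells = np.arange(1, 7)

by_row = cells.reshape((2, 3))             # default order: fill row by row
by_col = cells.reshape((2, 3), order='F')  # Fortran order: fill column by column

print(by_row)
# [[1 2 3]
#  [4 5 6]]
print(by_col)
# [[1 3 5]
#  [2 4 6]]
```

For this case, `order='F'` gives the same values as reshaping to (3, 2) row by row and then transposing.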

## Broadcasting DataFrame