<img src="banner.jpg">

# Intro to Data Analysis with Pandas & NumPy
---
by: John Muchovej \([@ionlights](github.com/ionlights/)\), on 09 Sep 2018

---

## What's NumPy?

> NumPy is the fundamental package for scientific computing with Python. It contains among other things:
> 
> - a powerful N-dimensional array object
> - sophisticated (broadcasting) functions
> - tools for integrating C/C++ and Fortran code
> - useful linear algebra, Fourier transform, and random number capabilities
>
> Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.

That's according to [numpy.org](http://numpy.org), the official project site of NumPy. 

### So what does that actually mean for us?
You'll see more tonight, but effectively, `numpy` is a library that lets us do linear algebra, be lazy, and perform array operations at a much larger (and more efficient) scale than Python's `list` allows! :D

#### Let's import `numpy`, so we can bask in its glory

In [1]:
import numpy as np
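
As a small taste of what "array operations" means, here's a minimal sketch comparing a plain `list` with a `numpy` array. The `prices` values are made up purely for illustration.

```python
import numpy as np

# Hypothetical values, just for illustration
prices = [1.5, 2.0, 3.25]

# With a plain list, we have to loop over every element ourselves
doubled_list = [p * 2 for p in prices]

# With a numpy array, the multiplication applies to every element at once
arr = np.array(prices)
doubled_arr = arr * 2

print(doubled_list)
print(doubled_arr)
```

Both approaches give the same numbers; the array version just lets us skip writing the loop.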

### Why do arrays matter?

As we'll see throughout the semester, arrays (of various dimensions) are crucial to almost everything we can accomplish in machine learning, whether in research or industry.

First, though, we'll go through some of the practical reasons to use `numpy` instead of Python's `list`.

In [7]:
rows = 20000
cols = 40000

In [9]:
%%timeit -n 100 -r 10
np.zeros((rows, cols))

21.1 µs ± 5.9 µs per loop (mean ± std. dev. of 10 runs, 100 loops each)


In [None]:
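%%timeit -n 100 -r 10
# Caution: this builds 800 million Python ints per loop, so it can take a very long time and a lot of memory; shrink `rows` and `cols` to try it.
[[0 for _ in range(cols)] for _ in range(rows)]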

In [None]:
from IPython.display import Markdown
Markdown(f"As you can see, generating a ({rows}, {cols}) matrix is significantly faster with `numpy`.")
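
If you'd like to try the comparison at a size that finishes quickly, here's a rough sketch using Python's standard `timeit` module instead of the `%%timeit` magic. The `small_rows` and `small_cols` sizes and the run count are assumptions chosen only to keep it fast, and the exact timings will vary by machine.

```python
import timeit

import numpy as np

# Hypothetical, much smaller sizes so the list version finishes quickly
small_rows, small_cols = 500, 500

# Time 20 runs of creating a zero-filled numpy array
np_time = timeit.timeit(lambda: np.zeros((small_rows, small_cols)), number=20)

# Time 20 runs of building the same matrix as a list of lists
list_time = timeit.timeit(
    lambda: [[0 for _ in range(small_cols)] for _ in range(small_rows)],
    number=20,
)

print(f"numpy: {np_time:.4f} s, list: {list_time:.4f} s (20 runs each)")
```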

We can also use functions like `np.ones` and `np.full` to generate matrices filled with `1` or a `<custom-value>`, which makes creating arrays both convenient and cheap.
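
Here's a minimal sketch of those two functions. The `(2, 3)` shape and the fill value `7` are arbitrary picks for illustration.

```python
import numpy as np

# A 2x3 array filled with ones (floats by default)
ones = np.ones((2, 3))

# A 2x3 array filled with a custom value, here 7
sevens = np.full((2, 3), 7)

print(ones)
print(sevens)
```

Both take the shape as a tuple, just like `np.zeros`; `np.full` takes the fill value as its second argument.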