### Introduction to Ndarrays

The core data structure in NumPy is the __ndarray__ or __n-dimensional array__. In programming, __array__ describes a collection of elements, similar to a list. The word __n-dimensional__ refers to the fact that ndarrays can have one or more dimensions. We'll start by working with one-dimensional (1D) ndarrays.

__Syntax:__

_import numpy as np_

We can directly convert a list to an ndarray using the _numpy.array()_ constructor. To create a 1D ndarray, we pass in a single list.

In [1]:
# Importing the library
import numpy as np

# Create a NumPy array
data_ndarray = np.array([10, 20, 30])
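
As a quick check, the sketch below rebuilds the same array and inspects it. `ndim` and `shape` are standard ndarray attributes; the values are the illustrative ones from the cell above.

```python
import numpy as np

# Same illustrative values as in the cell above
data_ndarray = np.array([10, 20, 30])

print(type(data_ndarray))   # <class 'numpy.ndarray'>
print(data_ndarray.ndim)    # 1 (one dimension)
print(data_ndarray.shape)   # (3,) (three elements along the only axis)
```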

### Understanding Vectorization
Example: Read 8 rows of data (two cols) and calculate the sum

Using regular Python code with a list of lists and for loops, our computer would take roughly eight processor cycles to process the eight rows of our data, one per row.

The NumPy library takes advantage of a processor feature called __Single Instruction Multiple Data (SIMD)__ to process data faster. SIMD allows a processor to perform the same operation on multiple data points in a single processor cycle.

As a result, if the processor handles four data points per instruction, the NumPy version of our code would only take two processor cycles: a four-times speed-up! This concept of replacing for loops with operations applied to multiple data points at once is called __vectorization__, and ndarrays make vectorization possible.
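
The example above can be sketched as follows. The eight rows are made-up values, and the cycle counts in the text are a simplified model: whether SIMD is actually used, and how large the speed-up is, depends on the hardware and the NumPy build.

```python
import numpy as np

# Eight made-up rows with two columns each (illustrative values only)
rows = [[6, 5], [1, 3], [5, 6], [1, 4], [3, 7], [5, 8], [3, 5], [8, 4]]

# Plain Python: one addition per loop iteration
sums_loop = []
for a, b in rows:
    sums_loop.append(a + b)

# NumPy: add the two columns with a single vectorized expression
data = np.array(rows)
sums_vectorized = data[:, 0] + data[:, 1]

print(sums_loop)
print(sums_vectorized)
```

Both versions produce the same eight sums; only the vectorized one avoids an explicit Python loop.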

### NYC Taxi-Airport Data

source: https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page

For this project, we'll only work with a subset of this data - approximately 90,000 yellow taxi trips to and from New York City airports between January and June 2016. Below is information about selected columns from the data set:

- _pickup_year_: The year of the trip.
- _pickup_month_: The month of the trip (January is 1, December is 12).
- _pickup_day_: The day of the month of the trip.
- _pickup_location_code_: The airport or borough where the trip started.
- _dropoff_location_code_: The airport or borough where the trip finished.
- _trip_distance_: The distance of the trip in miles.
- _trip_length_: The length of the trip in seconds.
- _fare_amount_: The base fare of the trip, in dollars.
- _total_amount_: The total amount charged to the passenger, including all fees, tolls and tips.

To convert the data set into a 2D ndarray, we'll first use Python's built-in __csv__ module to import our CSV as a "list of lists". Then, we'll convert the list of lists to an ndarray. We'll again use the _numpy.array()_ constructor, but to create a 2D ndarray, we'll pass in our list of lists instead of a single list.

In [2]:
# Library imports
import csv
import numpy as np

# Read the dataset into a list of lists; the with block closes the file for us
with open('data/nyc_taxis.csv', newline='') as input_file:
    read_file = csv.reader(input_file)
    taxi_list = list(read_file)

# remove the header row
taxi_list = taxi_list[1:]

# convert all values to float
converted_taxi_list = []
for row in taxi_list:
    converted_row = []
    for item in row:
        converted_row.append(float(item))
    converted_taxi_list.append(converted_row)
    
# Convert the list of lists with the NumPy array constructor
taxi = np.array(converted_taxi_list)
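
The same steps can be tried without the data file. The sketch below is a minimal, self-contained version that reads a hypothetical three-column CSV from an in-memory string; the column names and values are stand-ins, not rows from the real data set.

```python
import csv
import io

import numpy as np

# Hypothetical CSV text standing in for the real file (made-up values)
csv_text = (
    "pickup_year,trip_distance,fare_amount\n"
    "2016,21.0,52.0\n"
    "2016,16.29,45.0\n"
)

# Read the CSV into a list of lists
with io.StringIO(csv_text) as f:
    rows = list(csv.reader(f))

# Skip the header row and convert every value to float
numeric_rows = [[float(item) for item in row] for row in rows[1:]]

# A list of lists becomes a 2D ndarray
sample = np.array(numeric_rows)
print(sample.shape)  # (2, 3)
```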

### Array Shapes

It's often useful to know the number of rows and columns in an ndarray. When we can't easily print the entire ndarray, we can use the ndarray.shape attribute instead.

The data type returned is called a tuple. Tuples are very similar to Python lists, but can't be modified.

The output gives us two important pieces of information:
- The first number is the number of rows in the ndarray.
- The second number is the number of columns in the ndarray.

For the taxi array below, that is 89,560 rows and 15 columns.

In [3]:
# Assign the array shape to a new variable
taxi_shape = taxi.shape
taxi_shape

(89560, 15)
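
To see the shape tuple on something small enough to print, here is a minimal sketch with a made-up 2 x 3 array. It also shows that a tuple cannot be modified in place.

```python
import numpy as np

# Made-up array with 2 rows and 3 columns
small = np.array([[1, 2, 3],
                  [4, 5, 6]])

# The shape tuple can be unpacked into rows and columns
n_rows, n_cols = small.shape
print(n_rows, n_cols)  # 2 3

# Tuples don't support item assignment, so this raises TypeError
shape = small.shape
try:
    shape[0] = 10
except TypeError as err:
    print("TypeError:", err)
```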

### Selecting and Slicing Rows and Items from ndarrays

For any 2D array, the full syntax for selecting data is:

__ndarray[row_index, column_index]__

Or, to select all columns for a given set of rows:

__ndarray[row_index]__

Where row_index defines the location along the row axis and column_index defines the location along the column axis.

#### Selection:
- With a list of lists, we use two separate pairs of square brackets back-to-back.
- With a NumPy ndarray, we use a single pair of brackets with comma-separated row and column locations.
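
The difference between the two styles can be sketched on a small made-up grid. The variable names here are illustrative only.

```python
import numpy as np

# The same made-up data as a list of lists and as an ndarray
data_lol = [[1, 2, 3],
            [4, 5, 6],
            [7, 8, 9]]
data_np = np.array(data_lol)

# Item at row index 1, column index 2
item_lol = data_lol[1][2]   # two pairs of brackets back-to-back
item_np = data_np[1, 2]     # one pair of brackets, comma-separated

# A whole column needs a loop with a list of lists, but not with an ndarray
col_lol = [row[2] for row in data_lol]
col_np = data_np[:, 2]

print(item_lol, item_np)
print(col_lol, col_np)
```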

In [4]:
# Select the row at index 0
row_0 = taxi[0]

# Select every column for the rows at indexes 391 to 500 inclusive.
rows_391_to_500 = taxi[391:501]

# Select the item at row index 21 and column index 5
row_21_column_5 = taxi[21, 5]

### Selecting Columns and Custom Slicing ndarrays

In [5]:
# Select every row for the columns at indexes 1, 4, and 7
cols = [1, 4, 7]
columns_1_4_7 = taxi[:,cols]

# Select the columns at indexes 5 to 8 inclusive for the row at index 99
row_99_columns_5_to_8 = taxi[99, 5:9]

# Select the rows at indexes 100 to 200 inclusive for the column at index 14
rows_100_to_200_column_14 = taxi[100:201, 14]
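
The same selection patterns are easier to verify on a small array. The sketch below uses a made-up 4 x 5 grid built with `np.arange`; note that a slice's end index is exclusive, which is why the cell above uses `5:9` and `100:201` for inclusive ranges.

```python
import numpy as np

# Made-up grid: 4 rows, 5 columns, values 0 through 19
grid = np.arange(20).reshape(4, 5)

# Every row for the columns at indexes 1 and 3 (a list of column indexes)
cols_1_3 = grid[:, [1, 3]]

# Columns at indexes 1 to 3 inclusive for the row at index 2
row_2_cols_1_to_3 = grid[2, 1:4]

# Rows at indexes 0 to 2 inclusive for the column at index 4
rows_0_to_2_col_4 = grid[0:3, 4]

print(cols_1_3.shape)        # (4, 2)
print(row_2_cols_1_to_3)     # [11 12 13]
print(rows_0_to_2_col_4)     # [ 4  9 14]
```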

### Vector Math

NumPy ndarrays make selecting data much easier. Beyond this, working with the data we select is a lot faster with __vectorized operations__, because each operation is applied to multiple data points at once.

The result of adding two 1D ndarrays is a 1D ndarray of the same shape (or dimensions) as the originals.
- In this context, ndarrays can also be called __vectors__, a term taken from a branch of mathematics called linear algebra.
- Adding two vectors together, element by element, is called __vector addition__.

In [6]:
fare_amount = taxi[:, 9]
fees_amount = taxi[:, 10]

fare_and_fees = fare_amount + fees_amount
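
For a version that runs without the data file, here is a minimal sketch of vector addition on two short made-up vectors. The amounts are illustrative, not taken from the data set.

```python
import numpy as np

# Made-up fare and fee amounts for three trips
fares = np.array([52.0, 45.0, 36.5])
fees = np.array([0.8, 1.3, 1.3])

# Element-wise addition: total[i] = fares[i] + fees[i]
total = fares + fees

print(total.shape)  # (3,), the same shape as the inputs
print(total)
```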

Let's use the columns _trip_distance_ and _trip_length_ to calculate the average travel speed of each trip in miles per hour. Because _trip_length_ is recorded in seconds, we first convert it to hours by dividing by 3,600. The formula for calculating miles per hour is:

miles per hour (m.p.h) = distance in miles / length in hours
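
The formula can be sketched with vectorized division. The two arrays below are made-up stand-ins for the _trip_distance_ and _trip_length_ columns; with the real data they would be sliced from `taxi`, using column indexes that are not stated in this section.

```python
import numpy as np

# Made-up trips: distance in miles and length in seconds
trip_distance_miles = np.array([10.0, 30.0, 5.0])
trip_length_seconds = np.array([1800.0, 3600.0, 900.0])

# Convert seconds to hours, then divide element by element
trip_length_hours = trip_length_seconds / 3600
trip_mph = trip_distance_miles / trip_length_hours

print(trip_mph)          # speed of each trip
print(trip_mph.mean())   # average speed across the trips
```

Real data may contain trips with a length of zero seconds, which would produce division warnings; handling that case is left out of this sketch.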