# General tasks and directions

- Add your name, today's date, and the assignment title to the designated cell.
- Write your answers in the cells that contain the line `Add your answer here.`
- Write your code in the cells that contain the line `# Add your implementation here.`
- Use the autograder tests, which are provided for your convenience.
- Don't change or delete any provided code (including [cell magic](https://ipython.readthedocs.io/en/stable/interactive/magics.html) such as `%%capture output`).


## Add your name, today's date, and the assignment title

author: Ratanak Uddam Chea

date: 02-16-2023

assignment: project1


# Project 1

Working with `numpy`

This is an individual assignment; by submitting it, you agree that the work is your own.


In [1]:
import numpy as np

np.set_printoptions(precision=2, suppress=True, linewidth=120)

## Task 1

Read *housing.csv*. Ignore/delete the first row (headers) and the last column (*ocean_proximity*).
Store the remaining data as an array of floating-point numbers.
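
As a minimal sketch of the loading step, assuming a tiny inline table in place of *housing.csv* (the column names `col_a`, `col_b`, `label` and the values are made up for illustration), `np.genfromtxt` can skip the header row, keep only the leading numeric columns, and turn empty fields into `nan`:

```python
import io

import numpy as np

# Hypothetical stand-in for housing.csv: a header row, two numeric columns,
# and a trailing text column; one numeric field is left empty on purpose.
sample_csv = io.StringIO(
    "col_a,col_b,label\n"
    "1.5,2.0,NEAR BAY\n"
    "3.0,,INLAND\n"
    "4.5,6.0,NEAR BAY\n"
)

# skip_header=1 drops the header row; usecols=range(2) keeps only the first
# two columns, so the text column is never parsed.
sample = np.genfromtxt(sample_csv, delimiter=",", skip_header=1, usecols=range(2))

print(sample.shape)             # rows x kept columns
print(sample.dtype)             # genfromtxt defaults to floating point
print(np.isnan(sample[1, 1]))   # the empty field was read as nan
```

With the default floating-point `dtype`, no separate string-to-float conversion is needed in this sketch.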


In [2]:
# Add your implementation here.
# Load the data from the CSV file as a string array
data_str = np.genfromtxt("housing.csv", delimiter=',', skip_header=1, usecols=range(9), dtype=str)

# Convert the data to a float array, replacing missing values with NaN
data = np.empty_like(data_str, dtype="float64")
data[data_str == ''] = np.nan
data[data_str != ''] = data_str[data_str != ''].astype("float64")

In [3]:
expected = (20640, 9)
assert data.shape == expected, f"Result is {data.shape} instead of the expected {expected}"

In [4]:
assert data.dtype == "float64"

In [5]:
expected = 25646.428856575825
assert abs(np.std(data[320]) - expected) < 0.001, \
    f"Result is {np.std(data[320])} instead of the expected {expected}"

In [6]:
expected = 32378.774753352634
assert abs(np.std(data[1861]) - expected) < 0.001, \
    f"Result is {np.std(data[1861])} instead of the expected {expected}"

In [7]:
expected = 13749.02957764602
assert abs(np.std(data[2023]) - expected) < 0.001, \
    f"Result is {np.std(data[2023])} instead of the expected {expected}"

## Task 2

Delete rows with missing values.
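
One way to picture the row filter is on a small made-up array (`toy` is a hypothetical name, not part of the assignment data): build a boolean mask that is `True` for every row containing a `nan`, then keep the rows where the mask is `False`.

```python
import numpy as np

# Hypothetical 3x2 array; the middle row has a missing value.
toy = np.array([
    [1.0, 2.0],
    [np.nan, 3.0],
    [4.0, 5.0],
])

# np.isnan gives an element-wise mask; any(axis=1) collapses it to one flag per row.
has_nan = np.isnan(toy).any(axis=1)

# Inverting the mask keeps only the complete rows.
clean = toy[~has_nan]

print(has_nan)
print(clean)
```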


In [8]:
# Add your implementation here.
# Remove any rows that contain missing values
data = data[~np.isnan(data).any(axis=1)]

In [9]:
expected = (20433, 9)
assert data.shape == expected, f"Result is {data.shape} instead of the expected {expected}"

In [10]:
assert data.dtype == "float64"

In [11]:
expected = 26857.35024532572
assert abs(np.std(data[320]) - expected) < 0.001, \
    f"Result is {np.std(data[320])} instead of the expected {expected}"

In [12]:
expected = 62769.475045144216
assert abs(np.std(data[1861]) - expected) < 0.001, \
    f"Result is {np.std(data[1861])} instead of the expected {expected}"

In [13]:
expected = 15614.362642183494
assert abs(np.std(data[2023]) - expected) < 0.001, \
    f"Result is {np.std(data[2023])} instead of the expected {expected}"

## Task 3

Add a column with a ratio of *total_bedrooms/households*.
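
A minimal sketch of appending a ratio column, using a made-up two-column array in which index 0 plays the role of bedrooms and index 1 the role of households (these positions are assumptions for the toy data only; in the real file the column positions should be checked against the header row):

```python
import numpy as np

# Hypothetical data: column 0 = bedrooms, column 1 = households.
toy = np.array([
    [10.0, 5.0],
    [9.0, 3.0],
])

# Element-wise division of one column by another gives a 1-D array of ratios.
ratio = toy[:, 0] / toy[:, 1]

# column_stack appends the 1-D array as a new last column.
extended = np.column_stack((toy, ratio))

print(extended)
```

`np.c_[toy, ratio]` builds the same array; `np.column_stack` is used here only because it reads as an explicit function call.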


In [14]:
# Add your implementation here.
# Column 4 is total_bedrooms and column 6 is households (column 5 is population).
bedrooms_per_household = np.divide(data[:, 4], data[:, 6])

# Add the new column to the data using np.c_
data = np.c_[data, bedrooms_per_household]

In [15]:
expected = (20433, 10)
assert data.shape == expected, f"Result is {data.shape} instead of the expected {expected}"

In [17]:
assert data.dtype == "float64"

In [18]:
expected = 25649.75666055152
assert abs(np.std(data[320]) - expected) < 0.001, \
    f"Result is {np.std(data[320])} instead of the expected {expected}"

In [19]:
expected = 59928.284864493144
assert abs(np.std(data[1861]) - expected) < 0.001, \
    f"Result is {np.std(data[1861])} instead of the expected {expected}"

In [20]:
expected = 14916.751110074121
assert abs(np.std(data[2023]) - expected) < 0.001, \
    f"Result is {np.std(data[2023])} instead of the expected {expected}"

## Task 4

Calculate `min`, `max`, `mean`, `median`, and `stdev` for every column and store the results in a new array `housing_stats` as follows:
- the first row of `housing_stats` holds the **minimum** value of each column
- the second row of `housing_stats` holds the **maximum** value of each column
- the third row of `housing_stats` holds the **average** value of each column
- the fourth row of `housing_stats` holds the **median** value of each column
- the fifth row of `housing_stats` holds the **standard deviation** of each column


*housing_stats.npy* is provided as a reference, but you are not supposed to load its values in your solution.
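
The row layout described above can be sketched on a small made-up array (`toy` and `stats` are hypothetical names). Each statistic is computed with `axis=0`, so it yields one value per column, and the five results are stacked as rows in the required order. Note that `std` here is NumPy's default population standard deviation (`ddof=0`).

```python
import numpy as np

# Hypothetical 3x2 array standing in for the housing data.
toy = np.array([
    [1.0, 10.0],
    [2.0, 20.0],
    [6.0, 60.0],
])

# One row per statistic, in the order min, max, mean, median, std.
stats = np.vstack([
    toy.min(axis=0),
    toy.max(axis=0),
    toy.mean(axis=0),
    np.median(toy, axis=0),
    toy.std(axis=0),
])

print(stats.shape)  # 5 statistics x number of columns
print(stats)
```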

In [21]:
# Add your implementation here.
housing_stats = np.zeros((5, data.shape[1]))
housing_stats[0, :] = np.nanmin(data, axis=0)
housing_stats[1, :] = np.nanmax(data, axis=0)
housing_stats[2, :] = np.nanmean(data, axis=0)
housing_stats[3, :] = np.nanmedian(data, axis=0)
housing_stats[4, :] = np.nanstd(data, axis=0)

# housing_stats.npy is the provided reference file, so the result is not saved over it.

In [22]:
print("Testing minimum values")
expected = [-124.35, 32.54, 1. , 2. , 1. , 3. , 1. , 0.5 , 14999. , 0.33]
assert np.all(abs(housing_stats[0] - expected) < 0.01), \
    f"Result is {housing_stats[0]} instead of the expected {expected}"
print("Done testing minimum values")

Testing minimum values
Done testing minimum values


In [23]:
print("Testing maximum values")
expected = [-114.31, 41.95, 52. , 39320. , 6445. , 35682. , 6082. , 15. , 500001. , 34.07]
assert np.all(abs(housing_stats[1] - expected) < 0.01), \
    f"Result is {housing_stats[1]} instead of the expected {expected}"
print("Done testing maximum values")

Testing maximum values
Done testing maximum values


In [24]:
print("Testing average values")
expected = [-119.57, 35.63, 28.63, 2636.5 , 537.87, 1424.95, 499.43, 3.87, 206864.41, 1.1 ]
assert np.all(abs(housing_stats[2] - expected) < 0.01), \
    f"Result is {housing_stats[2]} instead of the expected {expected}"
print("Done testing average values")

Testing average values
Done testing average values


In [25]:
print("Testing median values")
expected = [-118.49, 34.26, 29. , 2127. , 435. , 1166. , 409. , 3.54, 179700. , 1.05]
assert np.all(abs(housing_stats[3] - expected) < 0.01), \
    f"Result is {housing_stats[3]} instead of the expected {expected}"
print("Done testing median values")

Testing median values
Done testing median values


In [26]:
print("Testing standard deviation values")
expected = [2. , 2.14, 12.59, 2185.22, 421.37, 1133.18, 382.29, 1.9 , 115432.84 , 0.48]
assert np.all(abs(housing_stats[4] - expected) < 0.01), \
    f"Result is {housing_stats[4]} instead of the expected {expected}"
print("Done testing standard deviation values")

Testing standard deviation values
Done testing standard deviation values


In [27]:
print("This is the final test to verify the whole solution")
assert np.array_equal(housing_stats, np.load("housing_stats.npy"))
print("Done!")

This is the final test to verify the whole solution
Done!


## Submission Checklist

- [ ] Your name, today's date, and the assignment title in the designated cell.
- [ ] Your answers in the designated cells (if required).
- [ ] Your code runs and produces the expected output.
- [ ] The validity of your code is verified by autograders (if provided).
- [ ] Restart the kernel and run all cells (in the menubar, select *Kernel*, then *Restart Kernel and Run All Cells*).
- [ ] Save the notebook.
- [ ] Submit the assignment.
