# Learning Python with NYC housing data

Let's see how this goes.

## What is this thing?

Right now you're using some software based on [Jupyter Notebook](https://jupyter.org/), an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and the narrative text you're reading right now.

It's pretty awesome, but it's not the only way to learn and use Python. If you want to hack on [nycdb](https://github.com/aepyornis/nyc-db), for instance, you'll want to learn how to use your computer's [command-line interface](https://tutorial.djangogirls.org/en/intro_to_command_line/), install [Python](https://tutorial.djangogirls.org/en/python_installation/) and a [code editor](https://tutorial.djangogirls.org/en/code_editor/), and learn [git](https://try.github.io/).  That's a whole lot to learn for a two-hour workshop, though, so we're going to short-circuit a bunch of it by using Jupyter. Just keep in mind that this isn't the _only_ way to program in Python!

## What is Python?

Python is a programming language that's used for almost everything, from building websites like Instagram, to scientific computing, machine learning, and [automating boring stuff](https://automatetheboringstuff.com/).

It's also known for its readability. For instance, even though you might not know how to write it yet, you might be able to guess what the following code does:

In [0]:
x = 5

x = x * 2

print(x)


What do you think the above code does?  You can try running it by clicking on the code and pressing `Ctrl` + `Enter` (or `Cmd`  + `Enter` if you're on a Mac).

In Jupyter Notebook, the above code is called a **code cell**.

### Things to try

Try doing some of the following with the code cell above:

1. Try deleting the first line. What do you think will happen if you re-run the cell?  Try it and find out!
2. Try clicking right after the opening parenthesis in `print(x)` and press `Tab`.
3. Try changing the word `print` to `lolprint`.  What do you think will happen if you re-run the cell?  Try it out!

### Things to remember

I'm going to talk about this during the workshop, but by the end of my blabbering you should hopefully understand that:

* In the above code cell, `x` is a **variable** and `print` is a **function**.
* The **state** of your program--the values of all its variables and some other things--is stored in Jupyter's **runtime**.  If your program ever gets into a weird state, you can always restart the runtime by going to the "Runtime" menu and choosing "Restart runtime...".
* It is perfectly normal to have your program crash. It's how you learn!
* Python is cool.
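
To make the idea of state a little more concrete, here's a small sketch of what happens when the doubling line runs more than once, as it would if you re-ran only that line in a cell of its own. The repeated line is an illustration and isn't part of the original cell.

```python
x = 5        # x is a variable: a name that refers to a value

x = x * 2    # first run of the doubling line: x becomes 10
x = x * 2    # running it again starts from the stored value: x becomes 20

print(x)     # print is a function: it displays the value it is given
```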

If you're using this notebook in Google's Jupyter Notebook-based environment, Google Colaboratory, you can learn more about its features at [Google's Colaboratory overview](https://colab.research.google.com/notebooks/basic_features_overview.ipynb).

## When things take too long

When anyone is coding--it doesn't matter if they're a beginner or an expert--they're eventually going to write something that takes *way* too long to run. It's important to understand how to tell the computer to stop running it.

Try running this, and see if you can figure out how to stop it:

In [0]:
x = 0

while True:
  x = x + 1

## Writing a function

You've seen Python's built-in `print` function. It's also possible to write your own functions. Let's try it out.

A **Borough, Block, and Lot (BBL)** number is a set of three numbers that a lot of NYC agencies use to track building information.  For example, the BBL for 150 Court Street in Brooklyn is `3-292-26`.

The first number identifies the borough a building is in.  Given the information from the [Wikipedia page for BBL](https://en.wikipedia.org/wiki/Borough,_Block_and_Lot), can you finish the following function that returns the name of a borough, given its borough number?

In [0]:
def get_borough_name(num):
  "Given a borough number, return its name."

  if num == 3:
    return 'Brooklyn'
  # Write your code here!
  raise ValueError(f"{num} does not correspond to a NYC borough")


# Let's try out the function here. 
print(get_borough_name(3))

### Things to try

* Try removing two spaces of indentation from the first `return` statement. What happens?
* There's more than one way to write the `get_borough_name` function. Another way to implement it might involve [Python dictionaries](https://docs.python.org/3/tutorial/datastructures.html#dictionaries), known as "dicts" for short. You could try implementing the same function using dicts.
* What happens when you pass a number to `get_borough_name` that doesn't correspond to a borough number?
* Try clicking right after the opening parenthesis in `get_borough_name(3)` and press `Tab`. Where does that information come from?
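
If you haven't met dicts yet, here's a rough sketch of how a dict lookup works. It uses weekday names rather than boroughs so it doesn't spoil the exercise; `WEEKDAY_NAMES` and `get_weekday_name` are hypothetical names made up for this example.

```python
# A dict maps keys (here, numbers) to values (here, names).
WEEKDAY_NAMES = {
  0: 'Monday',
  1: 'Tuesday',
  2: 'Wednesday',
}


def get_weekday_name(num):
  "Given a weekday number, return its name."

  if num in WEEKDAY_NAMES:
    return WEEKDAY_NAMES[num]
  raise ValueError(f"{num} does not correspond to a weekday in this sketch")


print(get_weekday_name(1))
```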

### Things to remember

* Indentation in Python has *semantic meaning*. That sets it apart from a lot of other programming languages, which frequently use syntax like curly braces to communicate the same kind of information to the computer.
* An [exception](https://docs.python.org/3/tutorial/errors.html) is how errors are propagated in Python code. It's considered a best practice to `raise` them with an informative message when your code encounters an error. It's also possible to "catch" exceptions in a [`try...except` clause](https://docs.python.org/3/tutorial/errors.html#handling-exceptions), thereby adding [fault tolerance](https://en.wikipedia.org/wiki/Fault_tolerance) to your program.
* The documentation embedded in the first part of a function's source code is called its docstring.
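
As a rough sketch of catching an exception, the example below leans on the built-in `int`, which raises a `ValueError` when it's handed text that isn't a number. The helper `parse_number` and its choice to return `None` are made up for illustration. The last line shows that a docstring is available while the program runs, through the function's `__doc__` attribute.

```python
def parse_number(text):
  "Convert text to an integer, or return None if it isn't a number."

  try:
    return int(text)
  except ValueError:
    # int() raised an exception; catch it here instead of crashing.
    return None


print(parse_number('292'))
print(parse_number('oops'))
print(parse_number.__doc__)
```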

## Challenge: create a padded BBL

BBLs can be represented in a variety of ways. Sometimes they're presented in a 10-digit padded format: one digit for the borough, followed by the block number left-padded with zeroes to five digits and the lot number left-padded with zeroes to four digits. For example, the padded BBL representation of 150 Court Street is `3002920026`.

While BBLs are frequently mentioned in NYC data sources, they're not always represented in the same way, so you might end up needing to translate between them. Can you write a function that, given the borough, block, and lot number as distinct arguments, uses Python's [`zfill`](https://docs.python.org/3/library/stdtypes.html#str.zfill) to return the BBL's padded representation? You may need to use [`str`](https://docs.python.org/3/library/stdtypes.html#str) to convert the arguments from integers into strings.
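
If `str` and `zfill` are new to you, here's a small sketch of how they behave on their own. It pads a five-digit ZIP code rather than a BBL so the challenge stays unspoiled; the ZIP code values are just illustrations.

```python
zip_code = 2134                 # an integer, so the leading zero is gone

as_text = str(zip_code)         # '2134'
padded = as_text.zfill(5)       # '02134': zeroes are added on the left

print(padded)
print('10027'.zfill(5))         # already five characters, so nothing changes
```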

In [0]:
def to_padded_bbl(borough, block, lot):
  "Convert the given NYC BBL to a padded BBL."

  # Write your code here, and change the return statement below.
  return '???'


# Let's try out the function here.
print(to_padded_bbl(3, 292, 26))

## Challenge: parse a padded BBL

Let's try going the other direction: write a function that, given a padded BBL, returns a [tuple](https://docs.python.org/3/tutorial/datastructures.html#tuples-and-sequences) consisting of its borough, block, and lot numbers.

You will probably want to use Python's _slicing_ functionality (search for the word "slicing" in [An informal introduction to Python](https://docs.python.org/3/tutorial/introduction.html)) to divide the string into parts and [`int`](https://docs.python.org/3/library/functions.html#int) to convert the parts into integers.
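
Here's a minimal sketch of slicing and `int` working together, using a made-up date string instead of a BBL. The format (four digits for the year, then two each for the month and day) is an assumption chosen for this example.

```python
date = '20190307'

year = int(date[0:4])    # characters 0 through 3: '2019'
month = int(date[4:6])   # characters 4 and 5: '03', which becomes 3
day = int(date[6:])      # everything from character 6 onward: '07'

print((year, month, day))
```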

In [0]:
def from_padded_bbl(bbl):
  """
  Parse the given padded NYC BBL and return a 3-tuple
  containing its borough, block, and lot numbers.
  """
  
  borough = 0
  block = 0
  lot = 0
  # Write your code here!
  return (borough, block, lot)


# Let's try out the function here.
print(from_padded_bbl('3002920026'))

## Making requests

Python has lots of **packages**, or reusable chunks of functionality that you can build upon. One of those packages is called [requests](http://docs.python-requests.org/en/master/) and it makes it really easy to make network requests to fetch open data.

Let's use the [NYC Planning Labs GeoSearch API](https://geosearch.planninglabs.nyc/docs/) to find information about an address!

In [0]:
import requests

address = '2010 seventh ave'

response = requests.get("https://geosearch.planninglabs.nyc/v1/search", params={'text': address})

print(f"API status code is {response.status_code}.")

data = response.json()
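
If you're curious what the `params` argument does, it gets encoded into the query string at the end of the URL. The sketch below approximates that step with Python's built-in `urllib.parse.urlencode`, without touching the network. The exact URL that `requests` builds may differ in small details.

```python
from urllib.parse import urlencode

address = '2010 seventh ave'

# Turn the dict into a query string; spaces are encoded as plus signs.
query = urlencode({'text': address})

url = "https://geosearch.planninglabs.nyc/v1/search?" + query
print(url)
```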

Now let's see what we got, using Python's built-in [`pprint`](https://docs.python.org/3/library/pprint.html) module to format it in a way that's easier to read:

In [0]:
import pprint

pprint.pprint(data)

That's a lot of data! Let's just get the human-readable addresses for all the geo features our API found:

In [0]:
for feature in data['features']:
  props = feature['properties']
  name = props['name']
  borough = props['borough']
  bbl = props['pad_bbl']
  print(f"{name} in {borough} has BBL {bbl}.")


Note that the variable `bbl` is left holding the padded BBL of the last geo feature we found. We can now use that to look up lots of information about it in NYCDB.

## Talking to NYCDB

For our workshop, we have a slightly outdated sandbox instance of NYCDB hosted at the following URL:

In [0]:
NYCDB_URL = "paste in the URL here!"

The Jupyter Notebook environment doesn't have the Python package we need to talk to NYCDB, `psycopg2`, but we can install it:

In [0]:
!pip install psycopg2-binary

Now we can connect to NYCDB:

In [0]:
import psycopg2

nycdb = psycopg2.connect(NYCDB_URL)


And we can use our connection to construct an SQL query that retrieves the number of HPD violations for the building:

In [0]:
with nycdb.cursor() as cur:
  cur.execute(f"SELECT COUNT(*) FROM hpd_violations WHERE bbl = '{bbl}'")
  print(cur.fetchone())


If you want to play around more with NYCDB, you can learn more about its schema by looking at its [`datasets.yml`](https://github.com/aepyornis/nyc-db/blob/master/src/nycdb/datasets.yml) file.

**Note 1:** The above code snippet featured a quick and readable way to create an SQL query with data supplied from another part of our program, but it's important not to use that technique with untrusted user data, as it leaves the query open to SQL injection. See [Little Bobby Tables](https://xkcd.com/327/) for how that can go.

**Note 2:** It's important to remember that there isn't actually a one-to-one mapping between BBLs and buildings.  For example, smaller buildings may share a lot with each other.
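
To see why Note 1 matters, here's a sketch that uses Python's built-in `sqlite3` module as a stand-in for NYCDB, with a made-up table containing three rows. It compares the f-string approach with a parameterized query, where the database driver keeps the value separate from the SQL. The table contents and the `user_input` value are assumptions for illustration only.

```python
import sqlite3

# A throwaway in-memory database standing in for NYCDB (rows are made up).
db = sqlite3.connect(':memory:')
db.execute("CREATE TABLE hpd_violations (bbl TEXT)")
db.executemany(
  "INSERT INTO hpd_violations VALUES (?)",
  [('3002920026',), ('3002920026',), ('1000010001',)],
)

# Imagine this string came from an untrusted user.
user_input = "x' OR '1'='1"

# Unsafe: the input is pasted into the SQL and changes what the query means.
unsafe = db.execute(
  f"SELECT COUNT(*) FROM hpd_violations WHERE bbl = '{user_input}'"
).fetchone()[0]

# Safer: the placeholder makes the driver treat the input purely as data.
safe = db.execute(
  "SELECT COUNT(*) FROM hpd_violations WHERE bbl = ?", (user_input,)
).fetchone()[0]

print(f"unsafe query counted {unsafe} rows, safe query counted {safe}")
```

Note that `sqlite3` uses `?` as its placeholder, while `psycopg2` uses `%s`, so the equivalent call on a `psycopg2` cursor would look like `cur.execute("SELECT COUNT(*) FROM hpd_violations WHERE bbl = %s", (bbl,))`.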

## Challenge: write a function that combines all the things

Write a function that, given an address, returns the number of HPD violations for the BBL it's in.

In [0]:
def get_hpd_violations(address):
  "Return the number of HPD violations for the BBL the given address is in."

  count = 0
  # Write code here!
  return count


# Let's try out the function here. 
print(get_hpd_violations("247 west 116th street"))

## Challenge: read some Python code

Python is famous for its readability, and you now know enough of it to understand some real code!

Consider taking a look at the [421a exemption scraper](https://github.com/toolness/nyc-421a-xls). Or, since you have some experience dealing with BBLs, take a look at [nycdb's `bbl.py`](https://github.com/aepyornis/nyc-db/blob/master/src/nycdb/bbl.py) or the [JustFix tenant app's `nyc.py`](https://github.com/JustFixNYC/tenants2/blob/master/project/util/nyc.py).