Urban Data Science & Smart Cities <br>
URSP688Y Spring 2026<br>
Instructor: Chester Harvey <br>
Urban Studies & Planning <br>
National Center for Smart Growth <br>
University of Maryland

# Demo 4 - Functions, Packages and Tables

- Functions
- Installing and importing packages
- Working with tables
    - Load data from a CSV
    - Practice exploring the data
    - Convert dates stored as strings to `datetime` data types
    - Drop duplicate rows
    - Count rows within groups
    - Concatenate tables
    - Join columns from another table
- Debugging

## Functions

Functions are named, reusable pieces of code that perform a task. Often, they take inputs (_arguments_) and produce outputs (_return values_).

<img src="https://miro.medium.com/v2/resize:fit:880/0*xMEO8AbXwdsgnHSH.png" alt="Diagram of a function with input and output" width="400"/>

- Some basic functions are built into Python (e.g., `print`)

- We can write our own custom functions.

- We can use custom functions other people have written.

In [None]:
# Write a function to test if an input sport is a winter sport

test_sport = 'tennis'
winter_sports = ['curling', 'skiing', 'luge', 'snowboarding', 'ice skating']

test = False

# Loop through winter sports
for winter_sport in winter_sports: 

    # Test if the sport is equal to the winter sport
    if test_sport == winter_sport:

        test = True

test
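
The same loop can be wrapped in a reusable function. The sketch below is one way to do it; the name `is_winter_sport` is our own choice, and its single argument, `sport`, replaces the hard-coded `test_sport` variable.

```python
# Wrap the loop in a function so it works for any input sport
def is_winter_sport(sport):
    winter_sports = ['curling', 'skiing', 'luge', 'snowboarding', 'ice skating']

    # Assume it's not a winter sport until we find a match
    test = False

    # Loop through winter sports
    for winter_sport in winter_sports:
        # Test if the input sport is equal to this winter sport
        if sport == winter_sport:
            test = True

    return test

print(is_winter_sport('tennis'))  # False
print(is_winter_sport('luge'))    # True
```

Python's `in` operator (`sport in winter_sports`) performs the same membership check in a single expression; the explicit loop is kept here to mirror the cell above.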

### Namespaces

Functions are a good way to understand a somewhat complicated (but, in the end, VERY useful) aspect of Python: namespaces.

Namespaces are the sections of code in which certain variables, _names_, exist and are accessible to other code. Having different namespaces makes it possible for the same variable name to store different values in different places. 

Namespaces minimize name clutter (because you don't need many versions of a variable name), maximize flexibility, and allow code to be written in ways that are generalizable to lots of applications.

The function we just wrote has one argument, `sport`, which is a named variable inside the function; in Python terminology, this variable is _local_ to the function.
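
A minimal sketch of local versus global names, using a hypothetical `describe_sport` function: the `sport` inside the function is a separate local variable, so reassigning it does not change the `sport` defined outside.

```python
sport = 'tennis'  # a global variable, defined outside any function

def describe_sport(sport):
    # Inside the function, `sport` is a local variable holding the argument;
    # reassigning it only changes the local name
    sport = sport.upper()
    return 'Local value: ' + sport

print(describe_sport('luge'))  # Local value: LUGE
print(sport)                   # tennis (the global variable is unchanged)
```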

## Importing packages

Now that we have basic data structures under our belts—integers, floats, booleans, strings, lists, and dictionaries—we can put them together into a more complex and capable data structure: a table.

We could write our own custom code to combine lists and dictionaries into a table, *or* we could use someone else's code (actually, many, many other people's code) to do this in a way that has become an industry standard.

The easiest way to use other people's code in a way that is well-tested and documented is through a **package**.

To use a package that's not already in our environment, we first have to install it.

***Note that you only need to install a package once in an environment, not every time you use it.***

In Anaconda prompt (Windows) or Terminal (Mac):
- `conda activate 688y`
- `conda install pandas`

Now we can import the package into our namespace.

Packages are often imported with aliases for brevity. I'll use the standard aliases, but they are technically arbitrary, just like variable names.

`pandas` is typically imported with the alias `pd`.

In [None]:
# Import packages
import pandas as pd
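
An alias is just another name bound to the same module object, which this small check illustrates (it imports `pandas` twice only to compare the two names). Printing `pd.__version__` is also a quick way to confirm that the installation worked.

```python
import pandas
import pandas as pd

# Both names are bound to the very same module object
print(pd is pandas)  # True

# So pd.read_csv and pandas.read_csv are the same function
print(pd.read_csv is pandas.read_csv)  # True

# Checking the version confirms the package is installed in this environment
print(pd.__version__)
```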

## Pandas

[_Pandas_](https://pandas.pydata.org/) (Python Data Analysis Library) is currently the most popular way to analyze tables in Python.

The tabular data structure at the heart of Pandas is the DataFrame.
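
As a sketch of how lists and dictionaries combine into a table, a DataFrame can be built from a dictionary whose keys become column names and whose equal-length lists become the column values. The sports data here is purely illustrative.

```python
import pandas as pd

# Each key becomes a column name; each list holds that column's values
data = {
    'sport': ['curling', 'skiing', 'tennis'],
    'winter': [True, True, False],
}

df = pd.DataFrame(data)
print(df)
print(df.dtypes)  # the data type pandas inferred for each column
```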

## Loading Data from a File

Let's get our hands on some real-world data by loading a table from a file.

Let's load data from the [Maryland Eviction Case Database](https://opendata.maryland.gov/Housing/District-Court-of-Maryland-Eviction-Case-Data/mvqb-b4hf/data).

A CSV file that is stored in the same directory as our notebook can be opened by entering just the file name as an argument to `pd.read_csv`.

In [None]:
# pd.read_csv() function
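
A sketch of `pd.read_csv`, assuming a small made-up CSV. The column names (`case_number`, `county`, `event_date`, `zip_code`) are hypothetical stand-ins, not the real eviction file's headers, and `io.StringIO` lets the text act like a file so the example runs without a download. With the real file in the notebook's folder you would pass its file name instead.

```python
import io
import pandas as pd

# Made-up stand-in for a CSV file; real column names and values will differ.
# With a real file you would write something like pd.read_csv('evictions.csv'),
# where 'evictions.csv' is a hypothetical file name.
csv_text = """case_number,county,event_date,zip_code
C-001,Prince George's,03/15/2023,20740
C-001,Prince George's,04/02/2023,20740
C-002,Montgomery,12/01/2019,20910
C-003,Montgomery,01/20/2021,20910
"""

evictions = pd.read_csv(io.StringIO(csv_text))
print(evictions)
```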

Let's practice navigating and doing some analysis with our DataFrame.

Preview the dataframe

In [None]:
# .head() method

How many rows does it have?

In [None]:
# len() function

What columns does it have?

In [None]:
# .columns attribute

Which counties are represented?

In [None]:
# .value_counts() method
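
The exploration steps above can be sketched on a tiny hypothetical table (the column names are made up). Note the differences in syntax: `.head()` and `.value_counts()` are methods, `len()` is a built-in function, and `.columns` is an attribute, so it takes no parentheses.

```python
import pandas as pd

# Small hypothetical table with made-up column names
evictions = pd.DataFrame({
    'case_number': ['C-001', 'C-001', 'C-002', 'C-003'],
    'county': ["Prince George's", "Prince George's", 'Montgomery', 'Montgomery'],
})

print(evictions.head())                    # first 5 rows (here, all 4)
print(len(evictions))                      # number of rows: 4
print(list(evictions.columns))             # ['case_number', 'county']
print(evictions['county'].value_counts())  # number of rows per county
```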

What is the earliest date?

Is this true? What's wrong?

In [None]:
# .sort_values() method
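
One likely answer to "what's wrong?", assuming the dates are stored as text in month/day/year form: strings sort character by character, not chronologically. These sample dates are made up.

```python
import pandas as pd

# Hypothetical dates stored as strings in month/day/year order
dates = pd.Series(['03/15/2023', '12/01/2019', '01/20/2021'])

# Strings compare character by character, so '01/...' sorts before '03/...'
# and '12/...' sorts last, regardless of the year
print(dates.sort_values().tolist())
# ['01/20/2021', '03/15/2023', '12/01/2019']
```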

Convert the event date column to a `datetime` data type

In [None]:
# pd.to_datetime() function
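
A sketch of the conversion, using made-up dates and assuming a month/day/year layout; the `format` string should be adjusted to match the real data. On a DataFrame you would assign the result back to the column, e.g. `evictions['event_date'] = pd.to_datetime(evictions['event_date'], format='%m/%d/%Y')`, where `evictions` and `event_date` are hypothetical names.

```python
import pandas as pd

dates = pd.Series(['03/15/2023', '12/01/2019', '01/20/2021'])

# Parse the strings as real dates; `format` states the assumed layout
# (%m = month, %d = day, %Y = four-digit year)
parsed = pd.to_datetime(dates, format='%m/%d/%Y')

print(parsed.dtype)                  # a datetime64 type, not text
print(parsed.min())                  # 2019-12-01 00:00:00
print(parsed.sort_values().tolist())
```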

How many unique cases are there?

In [None]:
# .drop_duplicates() method
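
A sketch, assuming each row is a court event and a hypothetical `case_number` column identifies the case, so one case can span several rows. Passing `subset` tells `.drop_duplicates()` which columns define a duplicate; without it, only rows identical in every column are dropped.

```python
import pandas as pd

# Hypothetical table where one case appears on several rows (one per event)
evictions = pd.DataFrame({
    'case_number': ['C-001', 'C-001', 'C-002', 'C-003'],
    'zip_code': [20740, 20740, 20910, 20910],
})

# Keep only the first row for each case number
unique_cases = evictions.drop_duplicates(subset='case_number')
print(len(unique_cases))  # 3
```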

How many unique cases per zip code?

In [None]:
# .groupby() method
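
Counting unique cases per zip code can be sketched with `.groupby()` followed by `.size()`, which counts the rows in each group. The table and column names are hypothetical.

```python
import pandas as pd

# Hypothetical table that already has one row per unique case
unique_cases = pd.DataFrame({
    'case_number': ['C-001', 'C-002', 'C-003'],
    'zip_code': [20740, 20910, 20910],
})

# Count rows within each zip code group, then turn the result into a table
cases_per_zip = (
    unique_cases
    .groupby('zip_code')
    .size()
    .reset_index(name='eviction_cases')
)
print(cases_per_zip)
```

If duplicates haven't been dropped first, `evictions.groupby('zip_code')['case_number'].nunique()` counts distinct case numbers per group directly.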

Which zip codes have the most unique cases *per person*?

Let's join data from [CensusReporter](https://censusreporter.org/).

### Combining/Merging/Joining Tables

Combining information from multiple tables into a single table is one of the most useful data wrangling operations.

There are lots of different ways to join tables, but two basic types are:

1. Joining columns with a shared key, which outputs a table that is wider than either input.
2. Concatenating rows with shared column names, which outputs a table that is longer than either input.

#### Joining columns based on a key

![joining columns with a shared key](https://rforhr.com/horizontal_join.png)


#### Concatenating rows with the same column names
![joining rows with shared column names](https://rforhr.com/vertical_join.png)

First, let's concatenate census tables for Montgomery and Prince George's counties to make a single table with populations for each zip code.

Then, we'll merge counts of eviction cases onto each zip code.

Finally, we'll calculate the number of eviction cases per capita.

In [None]:
# Load census reporter data, ignoring the row with data for the whole county (first row under the header)

# Can we write a function that loads a census reporter csv and skips the second row?
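
One possible answer, as a sketch: a small hypothetical helper, `load_census_reporter_csv`, that passes `skiprows=[1]` to `pd.read_csv`. Line numbers in `skiprows` are 0-indexed, so `[1]` drops the first row under the header. An integer such as `skiprows=1` would instead skip the first line of the file, which is the header itself. The CSV text below is made up, and the county-total row is assumed to sit directly under the header, as the comment above describes.

```python
import io
import pandas as pd

def load_census_reporter_csv(path):
    """Hypothetical helper: load a Census Reporter CSV (path or file-like
    object), skipping the county-total row directly under the header."""
    # skiprows=[1] skips file line 1 (0-indexed); the header on line 0 stays
    return pd.read_csv(path, skiprows=[1])

# Made-up stand-in for a downloaded file (numbers are not real)
csv_text = """name,B01003001,"B01003001, Error"
Montgomery County,1200,60
20910,800,50
20850,400,30
"""

montgomery = load_census_reporter_csv(io.StringIO(csv_text))
print(montgomery)
```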

In [None]:
# Combine into a single dataframe

# pd.concat() function
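
A sketch of `pd.concat` stacking two tables that share column names, using made-up numbers for two hypothetical county tables. `ignore_index=True` renumbers the combined rows from 0 instead of keeping each table's original index.

```python
import pandas as pd

# Hypothetical per-county tables with the same column names
montgomery = pd.DataFrame({'name': ['20910', '20850'], 'B01003001': [800, 400]})
prince_georges = pd.DataFrame({'name': ['20740'], 'B01003001': [250]})

# Stack the rows of both tables into one longer table
census = pd.concat([montgomery, prince_georges], ignore_index=True)
print(census)
```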

In [None]:
# Rename columns with readable names
column_map = {
    'name':'census_zip', 
    'B01003001':'population', 
    'B01003001, Error':'population_error'
}
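
The mapping above can be applied with `.rename(columns=...)`, where keys are old names and values are new ones. The `census` table here is a made-up stand-in.

```python
import pandas as pd

# Same mapping as the cell above
column_map = {
    'name': 'census_zip',
    'B01003001': 'population',
    'B01003001, Error': 'population_error',
}

census = pd.DataFrame({
    'name': ['20910', '20740'],
    'B01003001': [800, 250],
    'B01003001, Error': [50, 20],
})

# Returns a new DataFrame with the renamed columns
census = census.rename(columns=column_map)
print(list(census.columns))  # ['census_zip', 'population', 'population_error']
```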

In [None]:
# Make sure zip codes are stored as strings in both the eviction and census dataframes

# .astype() method
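
Why convert? The integer `20740` and the string `'20740'` are not equal, and recent pandas versions raise an error when asked to merge an integer key column with a string one. A sketch with hypothetical tables and column names; storing zip codes as text also keeps any leading zeros intact.

```python
import pandas as pd

# Hypothetical tables: zip codes read as integers in one, strings in the other
evictions_per_zip = pd.DataFrame({'zip_code': [20740, 20910], 'eviction_cases': [1, 2]})
census = pd.DataFrame({'census_zip': ['20910', '20740'], 'population': [800, 250]})

# Convert both key columns to strings so they can be matched
evictions_per_zip['zip_code'] = evictions_per_zip['zip_code'].astype(str)
census['census_zip'] = census['census_zip'].astype(str)

print(evictions_per_zip.dtypes)
```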

In [None]:
# Join/Merge the counts of evictions per zip code to the census zip codes

# .merge() method
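
A sketch of the merge, assuming hypothetical key columns named `census_zip` and `zip_code`. A left join (`how='left'`) keeps every census zip code and attaches eviction counts where keys match; zip codes with no matching cases get `NaN`.

```python
import pandas as pd

census = pd.DataFrame({'census_zip': ['20910', '20850', '20740'],
                       'population': [800, 400, 250]})
evictions_per_zip = pd.DataFrame({'zip_code': ['20740', '20910'],
                                  'eviction_cases': [1, 2]})

# Keep all rows from the left (census) table; match on differently named keys
merged = census.merge(
    evictions_per_zip,
    how='left',
    left_on='census_zip',
    right_on='zip_code',
)
print(merged)
```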

In [None]:
# Clean up the dataframe

# .fillna() method

# .drop() method
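
A sketch of the cleanup, assuming that a missing count after the left join means zero cases (a judgment call worth checking against the data) and that the right-hand `zip_code` key column is redundant once merged. The table is made up.

```python
import pandas as pd

merged = pd.DataFrame({
    'census_zip': ['20910', '20850', '20740'],
    'population': [800, 400, 250],
    'zip_code': ['20910', None, '20740'],
    'eviction_cases': [2, None, 1],  # None becomes NaN in a numeric column
})

# Treat missing counts as 0, then store the counts as whole numbers
merged['eviction_cases'] = merged['eviction_cases'].fillna(0).astype(int)

# The right-hand key column duplicates census_zip, so drop it
merged = merged.drop(columns=['zip_code'])
print(merged)
```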

In [None]:
# Calculate evictions per population
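
A sketch of the rate calculation on made-up numbers. Scaling to cases per 1,000 residents is a readability choice, not part of the original instructions.

```python
import pandas as pd

merged = pd.DataFrame({
    'census_zip': ['20910', '20850', '20740'],
    'population': [800, 400, 250],
    'eviction_cases': [2, 0, 1],
})

# Column arithmetic works row by row: cases per person, times 1,000
merged['cases_per_1000'] = merged['eviction_cases'] / merged['population'] * 1000

# Zip codes with the highest rates first
print(merged.sort_values('cases_per_1000', ascending=False))
```

A zip code with a population of 0 would produce `inf` (or `NaN` when it also has 0 cases), so such rows may need to be filtered out before ranking.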

## Errors and debugging

Errors are frustrating and inevitable. Even professional programmers spend much of their time debugging.

Luckily, there are good tools and techniques for making debugging a little easier.

Despite these, you will probably nearly tear your hair out with some frequency, especially as a beginner. It will get better with time.

There are two broad types of errors in programming: logic errors and syntax errors. Both keep your program from achieving its goal, but logic errors can be harder to detect because the code may still run.

### Logic errors
These are issues with how you have approached or executed your problem. If your code runs but produces nonsensical results, there is probably a logic error. However, your erroneous code might also produce logical but *wrong* results; you might never notice until the problem has rippled downstream. It's best to address this proactively by planning your code well so it's less likely to be illogical, and writing readable code that can be easily reviewed.
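
One proactive habit is to check code against answers you can work out by hand. A minimal sketch with a hypothetical `cases_per_1000` helper: each `assert` passes silently when the result matches, and stops the program with an `AssertionError` when it doesn't, turning a quiet logic error into a loud one.

```python
def cases_per_1000(cases, population):
    """Hypothetical helper: eviction cases per 1,000 residents."""
    return cases * 1000 / population

# Known answers worked out by hand; a logic error such as
# population / cases would make these fail immediately
assert cases_per_1000(1, 1000) == 1.0
assert cases_per_1000(5, 500) == 10.0
assert cases_per_1000(0, 800) == 0.0
print('All checks passed')
```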

Here's a logic error. Can you find it? (Hint: the mistake is in how one line is written, but it's still a logic error because the code runs without raising an error.)

In [None]:
def check_adult(age, cutoff=18):
    if age > cutoff:
        adult = False
    else:
        adult = True
    return adult

check_adult(20)

### Syntax errors
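These are more obvious because your code will fail and Python will report an error. (Strictly speaking, a true `SyntaxError`, such as a missing colon, stops the code before it runs, while errors like `TypeError` occur while the code is running; both stop your program with an error message.) There are lots of tools for figuring out where and why.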

Error messages are usually the starting place for debugging a syntax error.

In [None]:
def check_adult(age, cutoff=18):
    if age < cutoff:
        adult = False
    else:
        adult = True
    return adult

check_adult('20')

The error message tells us where the problem is located.

Sometimes, it can be helpful to turn on line numbers.
- In Colab: `Tools -> Settings -> Editor -> Show line numbers`
- In JupyterLab: `View -> Show Line Numbers`

The `TypeError` tells us that the issue is related to the type of a value on this line (here, comparing a string with an integer), but it's still pretty vague.

Time to start [Googling](https://www.google.com/).
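
Searching the error text usually leads to the cause: in Python 3, `<` cannot compare a `str` with an `int`. A minimal fix, assuming ages should always be whole numbers, is to convert the input with `int()` before comparing.

```python
def check_adult(age, cutoff=18):
    # Convert the input to an integer first, so '20' and 20 behave the same.
    # Note that int() raises a ValueError for text like 'twenty'.
    age = int(age)
    if age < cutoff:
        adult = False
    else:
        adult = True
    return adult

print(check_adult('20'))  # True
print(check_adult(10))    # False
```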


### Debugging
We can also use an "interactive debugger" to help diagnose our problem by stepping through the code one line at a time.

A debugger lets you set "breakpoints" where the code pauses temporarily, shows a table of variable values at that moment, and provides buttons to step through the code.

In [None]:
def check_adult(age, cutoff=18):
    if age < cutoff:
        adult = False
    else:
        adult = True
    return adult

check_adult(10)
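
When a graphical debugger isn't available, a lighter-weight technique (printing values, rather than stepping through code in a debugger) often finds the same problems. In this sketch, the `print` calls show what the function sees at each step. Python 3.7+ also has a built-in `breakpoint()` function that pauses execution and opens the `pdb` debugger at that line.

```python
def check_adult(age, cutoff=18):
    # Print-based tracing: show each input's value and type.
    # Calling breakpoint() here instead would pause and open pdb.
    print(f'age={age!r} ({type(age).__name__}), cutoff={cutoff!r}')
    if age < cutoff:
        adult = False
    else:
        adult = True
    print(f'adult={adult}')
    return adult

check_adult(10)
# age=10 (int), cutoff=18
# adult=False
```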