Urban Data Science & Smart Cities <br>
URSP688Y Spring 2026<br>
Instructor: Chester Harvey <br>
Urban Studies & Planning <br>
National Center for Smart Growth <br>
University of Maryland

# Demo 3 - Python Data Types, Programming Logic, Functions, and Intro to Pandas

- Data types
- Programming logic and flow
- Functions
- Installing packages
- Importing packages
- Pandas
  - DataFrames
  - Calculations with columns
  - Selection and filtering
  - Grouping

## Data Types

### Basic Data Types

In [24]:
# String (text)
name = 'Chester'
print(name)
print(type(name))

Chester
<class 'str'>


In [25]:
# Integer
year = 2026
print(year)
print(type(year))

2026
<class 'int'>


In [26]:
# Float (decimal)
miles = 6.37
print(miles)
print(type(miles))

6.37
<class 'float'>


In [27]:
# Boolean (true or false)
a_age = 10
b_age = 20
a_older = a_age > b_age
print(a_older)
print(type(a_older))

False
<class 'bool'>


### Composite Data Types

#### List

An ordered array of objects.

In [28]:
# Make a list of Winter Olympic sports
winter_sports = [
  'curling',
  'skiing',
  'luge',
  'snowboarding',
  'ice skating'
]

In [29]:
winter_sports[2] = 'skeleton'

In [33]:
winter_sports.remove('ice skating')

In [34]:
winter_sports

['curling', 'skiing', 'skeleton', 'snowboarding']

#### Dictionary

Labeled data stored as key-value pairs.

*Note*: Dictionaries used to be unordered, but they have preserved insertion order since Python 3.6, and the language has guaranteed this behavior since Python 3.7. Lists are still usually preferred when order matters. There's also something called an [ordered dictionary](https://realpython.com/python-ordereddict/), which makes it more explicit that you care about order and can make it easier to manage or change that order.

In [35]:
# Make a dictionary recording the number of people in class who have watched each sport at the Olympics so far

winter_sports_views = {
  'curling': 2,
  'skiing': 6,
  'luge': 2,
  'snowboarding': 3,
  'ice skating': 6,
}

In [36]:
winter_sports_views

{'curling': 2, 'skiing': 6, 'luge': 2, 'snowboarding': 3, 'ice skating': 6}

In [37]:
winter_sports_views['luge']

2

In [38]:
winter_sports_views['ski jumping'] = 2
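To make the note about ordering concrete, here is a minimal sketch that uses a shortened version of the viewing dictionary. It assumes Python 3.7 or later; the variable names `views` and `ordered_views` are just for illustration.

```python
from collections import OrderedDict

# Regular dictionaries keep keys in the order they were inserted
views = {'curling': 2, 'skiing': 6}
views['ski jumping'] = 2  # A new key is added at the end
print(list(views))

# An OrderedDict adds tools for rearranging keys explicitly
ordered_views = OrderedDict(views)
ordered_views.move_to_end('curling')  # Move 'curling' to the last position
print(list(ordered_views))
```

The first `print` lists the keys in insertion order, and the second shows `'curling'` moved to the end.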

## Programming Logic

Now that we've got basic building blocks, we can *do* things with them.

This requires programming logic: using logical statements to control the flow of our code in productive ways.

#### [Conditions](https://realpython.com/python-conditional-statements/)

In [39]:
# Conditional statement outputs a True or False
# Is 'rugby' equivalent to 'curling'?
'rugby' == 'curling'

False

In [40]:
# Write a condition testing if an input is a Winter Olympic sport
test_sport = 'rugby'
winter_sport = 'curling' 

if test_sport == winter_sport:
    print(f'{test_sport} is a winter sport')
else:
    print(f'{test_sport} is not a winter sport')    

rugby is not a winter sport


#### Loops

Loops can iterate through composite data, like lists and dictionaries.

`for` loops are the most common type used in data science.

In [41]:
winter_sports

['curling', 'skiing', 'skeleton', 'snowboarding']

In [42]:
# Is the current sport luge?
for sport in winter_sports:
    is_luge = 'luge' == sport
    if is_luge:
        print(sport)

In [43]:
winter_sports

['curling', 'skiing', 'skeleton', 'snowboarding']

In [44]:
# Test if each sport in a list is a Winter Olympic sport
test_sports = ['tennis', 'running', 'foosball', 'luge', 'Ski jumping']

tests = {}

# Loop through test sports
for test_sport in test_sports:
    
    test = False

    # Loop through winter sports
    for winter_sport in winter_sports: 

        # Test if current winter sport is equivalent to current test sport
        if winter_sport == test_sport:

            test = True

    tests[test_sport] = test

        
tests

# what we're aiming for as a list
# [False, False, False, True]

# What we're aiming for as a dictionary
# {'tennis': False, 'running': False, 'foosball': False, 'luge': True}

{'tennis': False,
 'running': False,
 'foosball': False,
 'luge': False,
 'Ski jumping': False}
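The nested loop above can also be written more compactly with Python's `in` operator, which tests membership in a list. This is only a sketch of one alternative; the call to `lower()` is an added assumption to make the comparison ignore capitalization.

```python
winter_sports = ['curling', 'skiing', 'skeleton', 'snowboarding']
test_sports = ['tennis', 'running', 'foosball', 'luge', 'Ski jumping']

tests = {}
for test_sport in test_sports:
    # `in` checks whether the value appears anywhere in the list;
    # lower() converts the text to lowercase before comparing
    tests[test_sport] = test_sport.lower() in winter_sports

print(tests)
```

Every test still comes out `False`: `'luge'` was replaced by `'skeleton'` in the list earlier, and `'ski jumping'` was only ever added to the dictionary, not to the list.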

In [45]:
# Test if each sport in a dictionary has been watched by more than 2 people

for sport in winter_sports_views:
    views = winter_sports_views[sport]
    if views > 2:
        print(f'{sport} has been viewed by {views} people!')
    else:
        print(f'not enough people are watching {sport}!')

not enough people are watching curling!
skiing has been viewed by 6 people!
not enough people are watching luge!
snowboarding has been viewed by 3 people!
ice skating has been viewed by 6 people!
not enough people are watching ski jumping!
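A common variation on the loop above uses the dictionary's `items` method, which hands back each key and its value together so we don't have to look the value up separately. A minimal sketch, collecting the results in a list (`popular_sports` is a name made up for this example):

```python
winter_sports_views = {
    'curling': 2,
    'skiing': 6,
    'luge': 2,
    'snowboarding': 3,
    'ice skating': 6,
    'ski jumping': 2,
}

popular_sports = []

# items() yields (key, value) pairs, unpacked here into two variables
for sport, views in winter_sports_views.items():
    if views > 2:
        popular_sports.append(sport)

print(popular_sports)
```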


----------------------------------------------------
# We got this far in class on Week 3
----------------------------------------------------

## Functions

Functions are pre-defined programming components that do things. Often, they take inputs and produce outputs.

<img src="https://miro.medium.com/v2/resize:fit:880/0*xMEO8AbXwdsgnHSH.png" alt="Diagram of a function with input and output" width="400"/>

- Some basic functions are built into Python (e.g., `print`).

- We can write our own custom functions.

- We can use custom functions other people have written.

In [None]:
# Write a function wrapper for one of the loops above
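The cell above was left blank in these notes. Here is one possible version, wrapping the winter-sport test loop in a function. The function name `check_winter_sports` and its argument names are assumptions for this sketch, not necessarily what was written in class.

```python
def check_winter_sports(test_sports, winter_sports):
    """Return a dictionary recording whether each test sport is a winter sport."""
    tests = {}
    for test_sport in test_sports:
        tests[test_sport] = test_sport in winter_sports
    return tests


winter_sports = ['curling', 'skiing', 'skeleton', 'snowboarding']
results = check_winter_sports(['tennis', 'curling'], winter_sports)
print(results)
```

Because the lists are passed in as arguments, the same function can be reused with any pair of lists.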

#### Namespaces

Functions are a good way to understand a somewhat complicated (but, in the end, VERY useful) aspect of Python: namespaces.

Namespaces are the sections of code in which certain variables, _names_, exist and are accessible to other code. Having different namespaces makes it possible for the same variable name to store different values in different places. 

Namespaces minimize name clutter (because you don't need many versions of a variable name), maximize flexibility, and allow code to be written in ways that are generalizable to lots of applications.

The function we just wrote has two arguments, `name` and `age`, which are variables inside the function. It also defines another variable, `label`, which is usable only inside the function. We say that these variables are _local_ to the function. We can see the variables local to a namespace by printing the output of the `locals` function (notice that it doesn't need any arguments).
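The function written in class is not reproduced in these notes, so here is a hypothetical stand-in with the same ingredients: two arguments, `name` and `age`, and a local variable, `label`. The function name `make_label` and the label text are assumptions for illustration.

```python
def make_label(name, age):
    label = f'{name} is {age} years old'
    print(locals())  # Shows the names that exist inside the function
    return label


name = 'Chester'  # A global variable that happens to share a name with an argument

result = make_label('Zoe', 30)
print(result)
print(name)  # The global `name` is unchanged by the function call

# `label` only exists inside the function, so in a fresh session
# asking for it out here raises a NameError
try:
    print(label)
except NameError as error:
    print(error)
```

Inside the function, `name` refers to `'Zoe'`; outside, it still refers to `'Chester'`. The two namespaces keep them separate.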

## Importing packages

Now that we have basic data structures under our belts—integers, floats, booleans, strings, lists, and dictionaries—we can put them together into a more complex and capable data structure: a table.

We could write our own custom code to combine lists and dictionaries into a table, *or* we could use someone else's code (actually, many, many other people's code) to do this in a way that has become an industry standard.

The easiest way to use other people's code in a way that is well-tested and documented is through a **package**.

To use a package that's not already in our environment, we first have to install it.

In [None]:
# ! conda install pandas

Next, we import it into the current namespace.

Packages are often imported with aliases for brevity. I'll use the standard aliases, but they are technically arbitrary, just like variable names.
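As a quick illustration of aliasing, here is a sketch using `statistics`, a module that ships with Python and needs no installation. The alias `stats` is an arbitrary choice for this example.

```python
import statistics as stats

grades = [80, 90, 100]

# The alias stands in for the full module name
average = stats.mean(grades)
print(average)
```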

## Pandas

[_Pandas_](https://pandas.pydata.org/) (Python Data Analysis Library) is currently the most popular way to analyze tables in Python.

The tabular data structure at the heart of Pandas is the DataFrame.

Let's import `pandas` with the alias `pd` for short.

In [None]:
import pandas as pd

## DataFrames

Now we can use Pandas to make a DataFrame.

Notice that we're just entering dictionaries, strings, and ints? Under the hood, Pandas stores each column as an array with its own data type (for example, integers or text), which builds on the same basic ideas. But it will give us a lot of tools to do sophisticated things with them.

In [None]:
columnwise_data = {
    'english': {'Daniela': 83, 'Zoe': 97, 'Rowen': 77, 'Jude': 95, 'Austin': 87, 'Jasper': 92, 'Liora': 88, 'Kieran': 72},
    'math': {'Daniela': 95, 'Zoe': 83, 'Rowen': 73, 'Jude': 80, 'Austin': 100, 'Jasper': 94, 'Liora': 89, 'Kieran': 96},
    'science': {'Daniela': 90, 'Zoe': 87, 'Rowen': 95, 'Jude': 73, 'Austin': 80, 'Jasper': 99, 'Liora': 87, 'Kieran': 90},
    'school':{'Daniela': 'Fairview', 'Zoe': 'New Vista', 'Rowen': 'Fairview', 'Jude': 'New Vista', 'Austin': 'New Vista', 'Jasper': 'Fairview', 'Liora': 'New Vista', 'Kieran': 'Fairview'},
}

df = pd.DataFrame(columnwise_data)
df
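The dictionary above is organized column by column. Data often arrives row by row instead, and Pandas can handle that too. A minimal sketch using two of the students from the table above (`rowwise_data` and `df_rows` are names made up for this example):

```python
import pandas as pd

# Each dictionary is one row; the keys become column names
rowwise_data = [
    {'name': 'Daniela', 'english': 83, 'math': 95},
    {'name': 'Zoe', 'english': 97, 'math': 83},
]

# set_index turns the 'name' column into the row labels
df_rows = pd.DataFrame(rowwise_data).set_index('name')
print(df_rows)

# dtypes shows the data type Pandas chose for each column
print(df_rows.dtypes)
```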

## Previewing DataFrames

DataFrames can get big fast. It can be helpful just to see the first few rows, or just to see the column names.

The `head` method shows the first five rows by default, or you can pass the number of rows you want to see as an argument.

In [None]:
df.head()

The `columns` attribute is very handy for listing all the columns. I tend to add the `tolist` method so Jupyter prints them out nicely without extra clutter.

In [None]:
df.columns.tolist()

The `value_counts` method is very handy for previewing unique values in a column.

In [None]:
df['school'].value_counts()

## Slicing

Just like lists, we can select parts of tables based on indexes. This is called 'slicing.'

Columns are identified by the bold headers along the top, and rows by the bold index labels down the left side. You can index data based on these headers.

In [None]:
df['english'] # One column 

In [None]:
df[['english','math']] # Multiple columns; note that the input is a list

In [None]:
df.loc['Daniela'] # One row based on index value

In [None]:
df.iloc[0] # One row based on index order (starting with 0)
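`loc` and `iloc` can also select rows and columns at the same time, or a range of rows. A short sketch with a three-student table, named `grades` here so it doesn't overwrite `df`:

```python
import pandas as pd

grades = pd.DataFrame(
    {'english': [83, 97, 77], 'math': [95, 83, 73]},
    index=['Daniela', 'Zoe', 'Rowen'],
)

# One value, by row label and column label
zoe_math = grades.loc['Zoe', 'math']

# The first two rows, by position (the end position is excluded)
first_two = grades.iloc[0:2]

# A range of rows by label (unlike positions, the end label is included)
label_slice = grades.loc['Daniela':'Zoe', ['english']]

print(zoe_math)
print(first_two)
print(label_slice)
```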

## Filtering

You can also retrieve a subset of a DataFrame based on a condition. This requires making a 'boolean mask', then selecting by that mask. Pandas will only return the rows or columns that are `True` in the mask.

In [None]:
df['school'] == 'Fairview'

In [None]:
df[df['school'] == 'Fairview']
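Masks can be combined to filter on more than one condition. In Pandas this uses `&` (and) and `|` (or) rather than the words `and` and `or`, with each condition wrapped in parentheses. A minimal sketch with a four-student table named `grades`:

```python
import pandas as pd

grades = pd.DataFrame(
    {
        'english': [83, 97, 77, 95],
        'math': [95, 83, 73, 80],
        'school': ['Fairview', 'New Vista', 'Fairview', 'New Vista'],
    },
    index=['Daniela', 'Zoe', 'Rowen', 'Jude'],
)

# Rows are kept only where both conditions are True
mask = (grades['school'] == 'Fairview') & (grades['math'] > 90)
fairview_high_math = grades[mask]
print(fairview_high_math)
```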

## Grouping

A very powerful thing to do with tables is to group rows, then make calculations within groups. This is like a PivotTable in Excel.

Let's calculate the average grade in English by school.

In [None]:
df.groupby('school')['english'].mean()
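The same pattern works for several columns at once by selecting a list of columns after grouping. A sketch with a four-student table named `grades`:

```python
import pandas as pd

grades = pd.DataFrame(
    {
        'english': [83, 97, 77, 95],
        'math': [95, 83, 73, 80],
        'school': ['Fairview', 'New Vista', 'Fairview', 'New Vista'],
    },
    index=['Daniela', 'Zoe', 'Rowen', 'Jude'],
)

# One row per school, one column per subject
school_means = grades.groupby('school')[['english', 'math']].mean()
print(school_means)
```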

## Wide vs Long Tables

Data scientists often talk about tables being organized in two ways: wide and long.

- Wide: Multiple attributes for the same object stored in each row
- Long: Only one attribute per row (potential for multiple rows per object)

Let's restructure our table so it's long and see what the differences are.

In [None]:
# The first step is to convert the row index into a column with the header 'name'
df = df.reset_index().rename(columns={'index':'name'})

In [None]:
# Then we can use the melt function to convert from wide to long
df_long = pd.melt(
    df, 
    id_vars=['name','school'], 
    value_vars=['english','math','science'], 
    var_name='subject', 
    value_name='grade',
)
df_long

With the data in this format, we can more easily calculate grade averages across all subjects.

In [None]:
df_long['grade'].mean()

We can still easily break down by subject using groups.

In [None]:
df_long.groupby('subject')['grade'].mean()
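Going the other direction, from long back to wide, is possible with the `pivot` method. A minimal sketch using a small long table (`small_long` and `small_wide` are names made up for this example):

```python
import pandas as pd

small_long = pd.DataFrame({
    'name': ['Daniela', 'Zoe', 'Daniela', 'Zoe'],
    'subject': ['english', 'english', 'math', 'math'],
    'grade': [83, 97, 95, 83],
})

# Each unique subject becomes a column; each unique name becomes a row
small_wide = small_long.pivot(index='name', columns='subject', values='grade')
print(small_wide)
```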

## Loading Data from a File

Enough with these toy data! Let's get our hands on some real-world data by loading a table from a file.

Let's load data from the [Maryland Eviction Case Database](https://opendata.maryland.gov/Housing/District-Court-of-Maryland-Eviction-Case-Data/mvqb-b4hf/data).

In [None]:
df = pd.read_csv('District_Court_of_Maryland_Eviction_Case_Data_2024Q4.csv')

In [None]:
df

## Errors and debugging

Errors are frustrating and inevitable. Even professional programmers probably spend most of their time debugging.

Luckily, there are good tools and techniques for making debugging a little easier.

Despite these, you will probably nearly tear your hair out with some frequency, especially as a beginner. It will get better with time.

There are two types of errors in programming: logic and syntax. They both result in your program not achieving its goal, but the first may not be as easily detectable because the code may still run.

### Logic errors
These are issues with how you have approached or executed your problem. If your code runs but produces nonsensical results, there is probably a logic error. However, your erroneous code might also produce logical but *wrong* results; you might never notice until the problem has rippled downstream. It's best to address this proactively by planning your code well so it's less likely to be illogical, and writing readable code that can be easily reviewed.

Here's a logic error. Can you find it? (Hint: the issue is syntactical, but it's still a logic error because the code works without throwing an error.)

In [None]:
def check_adult(age):
    if age > 18:
        adult = False
    else:
        adult = True
    return adult

check_adult(20)
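One way to catch logic errors early is to try a function on a few inputs where you already know the right answer. The sketch below does that for the function above, with the logic error left in. The `known_cases` dictionary and `failures` list are assumptions added for illustration.

```python
def check_adult(age):
    # Same function as above, logic error included
    if age > 18:
        adult = False
    else:
        adult = True
    return adult


# Ages paired with the answers we expect
known_cases = {20: True, 10: False}

failures = []
for age, expected in known_cases.items():
    actual = check_adult(age)
    if actual != expected:
        failures.append(age)
        print(f'check_adult({age}) returned {actual}, expected {expected}')
```

The checks tell us that something is wrong, but not where; finding the cause still takes a careful read of the code.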

### Syntax errors
These are more obvious because your code will simply fail. There are lots of tools for figuring out where and why.

Error messages are usually the starting place for debugging a syntax error.

In [None]:
def check_adult(age):
    if age < 18:
        adult = False
    else:
        adult = True
    return adult

check_adult('20')

The error message tells us where the problem is located.

Sometimes, it can be helpful to turn on line numbers.
- In Colab: `Tools -> Settings -> Editor -> Show line numbers`
- In JupyterLab: `View -> Show Line Numbers`

The `TypeError` tells us that the issue is related to the type of a variable on this line (here, the text `'20'` is being compared with the number `18`), but the message can still feel pretty vague.

Time to start [Googling](https://www.google.com/).
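For this particular error, the cause is that `'20'` is text rather than a number, and Python 3 raises a `TypeError` when asked to compare the two. A sketch of one way to see the message without stopping the program, and one possible fix (converting the text with `int`):

```python
def check_adult(age):
    if age < 18:
        adult = False
    else:
        adult = True
    return adult


# try/except lets us catch the error and print its message
try:
    check_adult('20')
except TypeError as error:
    print(f'TypeError: {error}')

# Converting the text to an integer first avoids the error
is_adult = check_adult(int('20'))
print(is_adult)
```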


## Style guidelines for Python
- At the very least, do things consistently
- One statement per line
- Try to limit line length to 79 characters (72 for comments and docstrings), following PEP 8
- Use four spaces to indent
- Put spaces around operators (e.g., `1 + 1` or `day = 'Monday'`) (except in keyword function arguments)
- Use blank lines intentionally and consistently
- Use meaningful names
- Name variables and functions with `lowercase_underscores`
- Constants are often named in `ALL_CAPS_WITH_UNDERSCORES` (e.g., `SPEED_OF_LIGHT = 2.99792458e+8`)
- Name custom classes with `CapWords`
- In general, avoid spaces in folder and filenames used for programming
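
Here is a short sketch that follows several of the guidelines above at once. All of the names in it are made up for illustration.

```python
SPEED_OF_LIGHT = 2.99792458e+8  # Constant, in meters per second


class LightSignal:
    """Custom class named with CapWords."""

    def __init__(self, distance_meters):
        self.distance_meters = distance_meters

    def travel_time(self, speed=SPEED_OF_LIGHT):
        # Spaces around operators, but not around `=` in keyword arguments
        return self.distance_meters / speed


signal = LightSignal(distance_meters=SPEED_OF_LIGHT)
travel_seconds = signal.travel_time()
print(travel_seconds)
```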

See [Code Readability](https://github.com/ncsg/ursp688y_sp2024/blob/main/README.md#code-readability) on the syllabus. [CS61A](https://cs61a.org/articles/composition/) has an excellent composition guide. [PEP 8](https://peps.python.org/pep-0008/) is a standard Python style guide. [Google](https://google.github.io/styleguide/pyguide.html) publishes their internal Python style guide.