# **Immersive Python Workshop (January 9-10, 2025)**


# **Intro to Python (1:10pm - 2:00pm)**; Instructor: Bryan Gee

### **Using Google Colab and Jupyter Notebook development environments**

Let's start by exploring the Google Colab environment, which is based on Jupyter Notebook. Here we will practice creating new text blocks, creating new code blocks, and using Python print statements to view output.

In [None]:
#this is a comment
# print("hello")

### **Import Python Packages**

In [None]:
#import os, csv, requests, and json packages
import os       #interacts with OS
import csv      #reads and writes in CSV format
import requests #send HTTP requests
import json     #reads and writes in JSON format

In [None]:
#import package and assign alias
import pandas as pd
pd.read_csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vQYDVK6VKqLLRAHcHkRhVDQTYL5QVXhojUAL80yrKaZJj3KhqsTtqQYRni_bKrQ2IF20A4fR--h-kNJ/pub?output=csv")

#import a specific class from a module
from datetime import date
date.today()

#import a specific module of a package with an alias
from matplotlib import pyplot as plt

### **Use PIP To Install A New Package That Is Not In Colab By Default**

In [None]:
import rasterio #accessing geospatial raster files; this import raises ModuleNotFoundError if the package is not installed yet

In [None]:
!pip install rasterio

In [None]:
#new packages that are installed with pip still need to be imported afterward in order to be used further down in the notebook

import rasterio
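
One way to check whether a package is available before importing it is `importlib.util.find_spec` from the standard library. The sketch below is an illustration and not part of the workshop notebook; the helper name `is_installed` is made up for this example.

```python
import importlib.util

def is_installed(package_name):
    """Return True if Python can locate the named top-level package."""
    return importlib.util.find_spec(package_name) is not None

# "json" ships with Python, so this check succeeds
print(is_installed("json"))

# this reports True only once rasterio has been installed (for example with !pip install rasterio)
print(is_installed("rasterio"))
```

A check like this can be useful at the top of a notebook to decide whether a `!pip install` cell needs to be run.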

In [None]:
!pip list

### **Explore Default Python Object Types**

In [None]:
#objects with single elements (see https://docs.python.org/3/library/stdtypes.html for more)

boolean_example = False

integer_example = 1

float_example = 1.1

string_example = "University of Texas at Austin"

#escaping quotation marks inside a string with backslashes
new_example_string = "this is the \"best\" workshop ever"
print(new_example_string)

#alternating single and double quotation marks
new_example_string2 = 'this is the "best" workshop ever'
print(new_example_string2)

#see https://stackoverflow.com/questions/56011/single-quotes-vs-double-quotes-in-python for discussion of single vs. double quotation marks
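
The quotation mark options can be compared side by side. This is a small sketch with variable names chosen only for illustration; it shows that backslash escapes, single quotes, and triple quotes can all describe the same text.

```python
# three ways to put double quotation marks inside a string
escaped = "this is the \"best\" workshop ever"          # backslash escapes
single_quoted = 'this is the "best" workshop ever'      # single quotes on the outside
triple_quoted = """this is the "best" workshop ever"""  # triple quotes on the outside

# all three variables hold exactly the same string
print(escaped == single_quoted == triple_quoted)
```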

In [None]:
#objects with multiple elements (see https://docs.python.org/3/tutorial/datastructures.html for more)

tuple_example = ("Travis", "Williamson", "Hays")

list_example = ['Texas', 'Oklahoma', 'Louisiana']

List_Example = ['British Columbia', 'Alberta', 'Ontario'] #notice that variable names are case sensitive!

list_example_two = [1,2,3,4,5]

set_example = {1,2,3}

dictionaryexample = {
                      "name":"Texas",
                      "population":29000000,
                      "capital":"Austin",
                      "majorcities":["Houston","Dallas","San Antonio","Austin"]
                    }

DictionaryExample = {"name":"Swift, Taylor", "DOB":"1989-12-13", "age":"35", "birthplace":"West Reading, PA"}

In [None]:
type(tuple_example)

In [None]:
#indexing structures
print(tuple_example[0])
print(list_example[2])
print(dictionaryexample['majorcities'][0])
# print(set_example[2])
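
The last line above is commented out because sets do not support indexing. A minimal sketch of what happens, and of two common alternatives, follows; `set_demo` is a stand-in name used only here.

```python
set_demo = {1, 2, 3}  # stand-in for set_example

# sets are unordered, so position-based indexing raises a TypeError
try:
    set_demo[2]
except TypeError as e:
    print("indexing failed: " + str(e))

# membership tests are what sets are designed for
print(2 in set_demo)

# convert to a sorted list when a position is needed
sorted_values = sorted(set_demo)
print(sorted_values[2])
```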

### **Test Out Python Operators**

In [None]:
#operators
a = 1 + 1
b = 2 - 1
c = 2 * 2
d = 4 / 2
e = 13%5
print(a)

z = 2
y = "4"
print(z+y) #raises a TypeError because an integer and a string cannot be added together

In [None]:
#convert numeric value to string
z_string = str(z)
print(z_string + y)

#convert number string to numeric value
y_int = int(y)
print(z + y_int)

a = 1 + 1
print("The variable 'a' is equal to: " + str(a))
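
As an alternative to joining strings with `+` and `str()`, Python's f-strings place values directly inside text. The sketch below reuses the `z` and `y` values from this section; the `total` and `message` names are illustrative.

```python
z = 2
y = "4"

# f-strings insert values into text without calling str() on each one
print(f"z is {z} and y is {y}")

# conversions still matter for arithmetic: int(y) turns "4" into 4
total = z + int(y)
message = f"The total is equal to: {total}"
print(message)
```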

In [None]:
total_list = list_example + List_Example
print(total_list)

In [None]:
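#a list can only be concatenated with another list; list_example + 'Greenland' raises a TypeError
total_list = list_example + ['Greenland']
# total_list.append('Greenland') #alternative: add a single item to an existing list in place
print(total_list)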
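
The difference between `+`, `append`, and `extend` is easy to mix up, so here is a short sketch. The `states` and `combined` names and the added values are chosen only for illustration.

```python
states = ['Texas', 'Oklahoma', 'Louisiana']

# + builds a new list and leaves the original unchanged
combined = states + ['Greenland']

# append adds one item to the existing list in place
states.append('New Mexico')

# extend adds every item from another list in place
states.extend(['Arkansas', 'Kansas'])

print(combined)
print(states)
```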

### **Common Python Object Methods**

In [None]:
data = pd.read_csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vQYDVK6VKqLLRAHcHkRhVDQTYL5QVXhojUAL80yrKaZJj3KhqsTtqQYRni_bKrQ2IF20A4fR--h-kNJ/pub?output=csv")
print(data.head())
data.head()

songs = data['song_title']
print(songs)

#modifying string case
print(songs.str.upper())

#replacing part of string
print(songs.str.replace("e","3"))

#length of object
data['song_title_letter_count'] = data['song_title'].str.len()
print(data[['song_title', 'song_title_letter_count']])

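#using the datetime module
#import a specific class from a module
from datetime import date
#get today's date as a date object (it prints in YYYY-MM-DD format)
today = date.today()
#song_release_date is read in as text, so subtracting before converting it raises a TypeError
# data['days_since'] = today - data['song_release_date']

#convert the text to date objects first, then subtract to get the time elapsed since release
data['song_release_date'] = pd.to_datetime(data['song_release_date']).dt.date
data['days_since'] = today - data['song_release_date']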

data[['song_title','song_release_date','days_since']]
data.to_csv("sample_data/ts_discography_released_edited.csv")

#split string into columns
data_subset = data.head(3).copy() #copy the rows so new columns can be added without a SettingWithCopyWarning
data_subset
data_subset[['song_writer1', 'song_writer2']] = data_subset['song_writers'].str.split(',', expand=True)
data_subset

#split string into list
data_subset['song_writers_list'] = data_subset['song_writers'].str.split(',')
#data_subset.loc[:, 'song_writers_list'] = data_subset['song_writers'].str.split(',')
data_subset
print(data_subset['song_writers_list'].apply(type))
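
The date arithmetic used above can also be tried on its own with the standard library. In this sketch the release date is a made-up value and the reference day is the first day of the workshop; subtracting two `date` objects gives a `timedelta`, and `strftime` controls how a date is displayed.

```python
from datetime import date

release_date = date(2024, 1, 9)    # hypothetical release date, for illustration only
reference_day = date(2025, 1, 9)   # first day of the workshop

# subtracting two date objects returns a timedelta
elapsed = reference_day - release_date
print(elapsed.days)

# strftime formats a date as text, here as MM-DD-YYYY
formatted = reference_day.strftime("%m-%d-%Y")
print(formatted)
```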

### **Try Using Conditional Statements**

Examples of conditional statements are provided below. Notice how a condition ends with a `:` and the lines of code to be executed if the condition is met are indented. Indentation is a key part of Python's syntax and must be managed carefully for code to run properly. It also makes code easier to read than the `{}` blocks used in other languages like JavaScript.

In [None]:
numberofdatapointsfound = 20

if numberofdatapointsfound > 30:
  print("let's run the analysis")

elif numberofdatapointsfound == 30:
  print("just enough data")

else:
  print("we need to find more data")

print("this is the end of the code block.")

In [None]:
numberofdatapointsdesired = 20

if numberofdatapointsdesired > 30:
  print("let's run the analysis. current data below:")
  data_select = data.head(100)
  print(data_select)

elif numberofdatapointsdesired == 30:
  print("just enough data. current data below:")
  data_select = data.head(30)
  print(data_select)

else:
  print("we will test with a small dataset. current data below:")
  data_select = data.head(5)
  print(data_select)

print("this is the end of the code block.")
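
The same branching logic can be wrapped in a function so that several values can be tested quickly (functions are covered later in the workshop). This is a sketch; the function name `describe_sample_size` and the `threshold` parameter are made up for this example.

```python
def describe_sample_size(count, threshold=30):
    """Return a message based on how count compares with threshold."""
    if count > threshold:
        return "let's run the analysis"
    elif count == threshold:
        return "just enough data"
    else:
        return "we need to find more data"

# try a value below, at, and above the threshold
for count in [20, 30, 45]:
    print(count, describe_sample_size(count))
```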

### **Practice Using If and Else with a Boolean Object**

In [None]:
# this is an example of if-elif-else with boolean objects
ThursdayRain = True
FridayRain = False

if ThursdayRain and FridayRain:
  print("Moving workshop to virtual both days")
elif ThursdayRain and not FridayRain:
  print("Moving workshop to virtual on Thursday, consider in-person on Friday")
elif FridayRain and not ThursdayRain:
  print("Consider in-person on Thursday, moving workshop to virtual on Friday")
else:
  print("In-person workshop as planned")

# **Breaktime (2:00pm - 2:10pm)**



# **Python Essentials (2:10pm - 3:00pm)**; Instructor: Michael Shensky

In this section we will cover loops, error handling, functions, and retrieving data using the `requests` package.

### **Looping Through Items in a List**

In [None]:
#first loop example
listexample = [1,2,3,4,5,6,7,8]

#the variable name after "for" in the expression below can be changed to anything you like, but it needs to be consistent throughout the for loop
for currentnum in listexample:
  currentnum +=  1000
  print(currentnum)

print("the for loop is complete")

1001
1002
1003
1004
1005
1006
1007
1008
the for loop is complete
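
One detail worth noting: adding to the loop variable changes only that temporary variable, not the list being looped over. A short sketch with illustrative names shows this, along with a list comprehension as one way to keep the updated values.

```python
numbers = [1, 2, 3]

# reassigning the loop variable does not alter the list itself
for currentnum in numbers:
    currentnum += 1000
print(numbers)

# a list comprehension builds a new list holding the updated values
bigger_numbers = [n + 1000 for n in numbers]
print(bigger_numbers)
```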


In [None]:
#second loop example
workshopattendees = ['person 1', 'person 2', 'person 3']

for example in workshopattendees:
  #produce custom email message
  text = "Hi " + example + ", thank you for attending our workshop today."
  print(text)

Hi person 1, thank you for attending our workshop today.
Hi person 2, thank you for attending our workshop today.
Hi person 3, thank you for attending our workshop today.







### **Using a While Loop**

In [None]:
i = 1

while i < 20:
  print(str(i))
  #increment value of i by 1
  i += 1

🔶🔶 **CHALLENGE**: What might be risky about using while loops? What happens in the example above if you forget to include line 6 to increment the value of i each time through the loop?
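
One defensive pattern for loops whose exit condition might never be met is a counter that stops the loop after a fixed number of passes. The sketch below is one possible approach; `iterations` and `max_iterations` are names invented for this example, and the limit of 100 is arbitrary.

```python
i = 1
iterations = 0
max_iterations = 100  # arbitrary safety limit chosen for this example

while i < 20:
    iterations += 1
    if iterations > max_iterations:
        # break exits the loop even though the while condition is still true
        print("stopping: safety limit reached")
        break
    i += 1

print(i)
```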






### **Iterate Through Key Value Pairs in a Dictionary**

In [None]:
ut = {"location":"Austin, TX", "color":"burnt orange", "founded":1883, 'mascot':'bevo'}

#in the expression below the variable names "k" and "v" stand for key and value, but can be adjusted to alternative variable names
for k, v in ut.items():
  if k == "location":
    print(k + ": " + str(v))

location: Austin, TX
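
When only one key is of interest, the value can also be read directly without looping. This sketch reuses the `ut` dictionary; the `enrollment` key is a deliberately missing key used to show how `get()` behaves.

```python
ut = {"location": "Austin, TX", "color": "burnt orange", "founded": 1883, "mascot": "bevo"}

# direct lookup by key; this raises a KeyError if the key is missing
print("location: " + ut["location"])

# get() returns a fallback value instead of raising an error
enrollment = ut.get("enrollment", "not recorded")
print("enrollment: " + enrollment)

# the in operator checks whether a key exists
print("mascot" in ut)
```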







### **Use Try and Except Statements to Gracefully Handle Errors**

In [None]:
try:
  a = 1
  b = "two"
  c = a + b
  print(c)

except Exception as e:
  print(str(e))
  print("error")


print("this is an essential line of code that needs to be run every time")

unsupported operand type(s) for +: 'int' and 'str'
error
this is an essential line of code that needs to be run every time
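
Catching `Exception` handles almost any error. As a possible refinement, a specific exception type can be named, and `else` and `finally` clauses can be added. This is a sketch; the function name `add_values` is made up for illustration.

```python
def add_values(a, b):
    """Try to add two values and report what happened."""
    try:
        result = a + b
    except TypeError as e:
        # runs only when the two types cannot be added together
        return "could not add: " + str(e)
    else:
        # runs only when no exception was raised
        return "result: " + str(result)
    finally:
        # runs in every case, even though the branches above return
        print("finished attempt")

print(add_values(1, 2))
print(add_values(1, "two"))
```

Naming the exception type means unexpected errors, such as a misspelled variable name, still surface instead of being silently handled.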







### **Use the Requests Module to Retrieve Source of a Webpage**

In [None]:
texasgeodataportalsource = requests.get('https://geodata.lib.utexas.edu/')
print(texasgeodataportalsource.text)


<!DOCTYPE html>
<html class="no-js" lang="en">
  <head>
    <meta charset="utf-8">
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
    <meta name="geoblacklight-version" content="4.4.0">

    <!-- Google tag (gtag.js) -->
    <script async src="https://www.googletagmanager.com/gtag/js?id=G-S1CVCQRTMM"></script>
    <script>
      window.dataLayer = window.dataLayer || [];
      function gtag(){dataLayer.push(arguments);}
      gtag('js', new Date());

      gtag('config', 'G-S1CVCQRTMM');
    </script>

    <!-- Internet Explorer use the highest version available -->
    <meta http-equiv="X-UA-Compatible" content="IE=edge">

    <title>University of Texas Libraries GeoData</title>
    <script>
      document.querySelector('html').classList.remove('no-js');
    </script>
    <link href="https://geodata.lib.utexas.edu/catalog/opensearch.xml" title="University of Texas Libra






### **Retrieve Data in JSON Format from the City of Austin Data Portal**

In [None]:
austinfoundpetresponse = requests.get("https://data.austintexas.gov/resource/kz4x-q9k5.json")
austinfoundpetdata = austinfoundpetresponse.text
austinfoundpetjson = json.loads(austinfoundpetdata)

In [None]:
#see information about one pet (index 1 is the second item because Python indexing starts at 0)
print(austinfoundpetjson[1])

{'animal_id': 'A913718', 'location': {'latitude': '30.355005018', 'longitude': '-97.686831969', 'human_address': '{"address": "9200 NORTH PLAZA", "city": "AUSTIN", "state": "", "zip": "78753"}'}, 'at_aac': 'Yes (come to the shelter)', 'intake_date': '2024-09-20T00:00:00.000', 'type': 'Cat', 'looks_like': 'Domestic Shorthair Mix', 'color': 'Blue Tabby/White', 'sex': 'Intact Female', 'age': '2 months', 'image': {'url': 'http://www.petharbor.com/pet.asp?uaid=ASTN.A913718'}, ':@computed_region_8spj_utxs': '4', ':@computed_region_q9nd_rr82': '9', ':@computed_region_e9j2_6w3z': '17', ':@computed_region_m2th_e4b7': '63', ':@computed_region_rxpj_nzrk': '78', ':@computed_region_qwte_z96m': '2133', ':@computed_region_a3it_2a2z': '3642', ':@computed_region_hnvk_vq9u': '4'}
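
Records like the one above contain nested dictionaries, and the `human_address` value is itself a JSON string stored inside the record. The sketch below works on an abbreviated copy of that record, so it runs without a network connection; the variable names are illustrative.

```python
import json

# abbreviated record modeled on the output above
pet = {
    "animal_id": "A913718",
    "type": "Cat",
    "location": {
        "latitude": "30.355005018",
        "human_address": '{"address": "9200 NORTH PLAZA", "city": "AUSTIN", "state": "", "zip": "78753"}',
    },
}

# nested dictionaries are reached by chaining keys; the latitude is stored as text
latitude = float(pet["location"]["latitude"])

# human_address is a JSON string, so it needs its own json.loads call
address = json.loads(pet["location"]["human_address"])
print(address["city"], address["zip"], latitude)
```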


In [None]:
#see information about all pets
for foundpet in austinfoundpetjson:
  for k, v in foundpet.items():
    if k == "looks_like":
      print(k + ": " + str(v))
  print()

🔶🔶 **CHALLENGE**: Try modifying the code above to print the looks_like value for the first item in austinfoundpetjson

In [None]:
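#index 2 selects the third item; the first item is at index 0, i.e. austinfoundpetjson[0]['looks_like']
print(austinfoundpetjson[2]['looks_like'])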


Domestic Shorthair Mix






### **Retrieve Data from the Texas Data Repository**

In [None]:
#Dive characteristics of Weddell seals 2014-2016 from https://dataverse.tdl.org/file.xhtml?fileId=62725&datasetVersionId=1894
sealdivedatarequest = requests.get("https://dataverse.tdl.org/api/access/datafile/62725?gbrecs=true")
sealdivedata = sealdivedatarequest.text
print(sealdivedata)
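
Text retrieved this way is one long string. One option for working with it row by row is the standard `csv` module together with `io.StringIO`. The sketch below uses a short made-up sample in place of the downloaded text; the column names are borrowed from the seal dataset, but the rows are invented for illustration.

```python
import csv
import io

# short made-up sample standing in for the text returned by requests
sample_text = "Divenum,Seal,Year\n1,50,2014\n2,50,2014\n3,50,2014\n"

# io.StringIO lets csv read from a string as if it were a file
reader = csv.DictReader(io.StringIO(sample_text))
rows = list(reader)

print(len(rows))
print(rows[0]["Year"])  # values are read as strings, not numbers
```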

In [None]:
#while the requests module works well for retrieving some types of data, pandas can be a better option for loading tabular data
import pandas as pd
sealdivedf = pd.read_csv("https://dataverse.tdl.org/api/access/datafile/62725?gbrecs=true")
sealdivedf.head()

Unnamed: 0,Unique,Divenum,Hole,Seal,DataFile,Year,Month,Day,Hour,Minute,...,Avg.WindDir,Avg.WindSp.kts,Avg.WindSp.mph,Avg.AirTemp.C,Max.Air.Temp.C,Min.Air.Temp.C,Avg.WindChill.C,Avg.Light,Avg.TideHeight,Unnamed: 64
0,1,1,1,50,101,2014,11,11,21,57,...,310.0,5.0,7.5,-8.0,-3,-19,-13.51,10761.1,0.107,
1,2,2,1,50,101,2014,11,11,22,1,...,310.0,5.0,7.5,-8.0,-3,-19,-13.51,3131.4,0.03,
2,3,3,1,50,101,2014,11,11,22,4,...,310.0,5.0,7.5,-8.0,-3,-19,-13.51,1598.8,0.03,
3,4,4,1,50,101,2014,11,11,22,12,...,310.0,5.0,7.5,-8.0,-3,-19,-13.51,10252.6,0.03,
4,5,5,1,50,101,2014,11,11,22,16,...,310.0,5.0,7.5,-8.0,-3,-19,-13.51,1618.9,0.03,







### **Define a Simple Function**

In [None]:
def multiplyvalues(val1, val2):
  newval = val1 * val2
  return newval

#range(1,21,3) starts at 1, stops before 21, and steps by 3
valuesinsteps = range(1,21,3)

print(list(valuesinsteps))

for value in valuesinsteps:
  print(multiplyvalues(value,10))


[1, 4, 7, 10, 13, 16, 19]
10
40
70
100
130
160
190
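
Functions can also give parameters default values and carry a docstring that describes what they do. The sketch below is a variation on `multiplyvalues`; the default of 10 is chosen only for this example.

```python
def multiplyvalues(val1, val2=10):
    """Multiply val1 by val2; val2 defaults to 10 when it is not provided."""
    return val1 * val2

# val2 is omitted, so the default of 10 is used for each value
results = [multiplyvalues(value) for value in [1, 4, 7]]
print(results)

# a keyword argument overrides the default
doubled = multiplyvalues(4, val2=2)
print(doubled)
```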







### **Define A More Advanced Function That Retrieves Data**

In [None]:
# This function uses the requests package to call the GitHub API and retrieve information about GitHub users in different cities
# note: in a URL, "&" starts a new parameter, so the qualifiers after the first "&" below are not part of the q search term
querylist = []
querylist.append("location:\"austin\"&followers:>=10&repos:>=10")
querylist.append("location:\"dallas\"&followers:>=10&repos:>=10")
querylist.append("location:\"houston\"&followers:>=10&repos:>=10")
# querylist.append("location:\"topeka\"&followers:>=5&repos:>=5")
# querylist.append("location:\"miami\"&followers:>=5&repos:>=5")


def api_query(query, resultsperpage):
  requesturl = "https://api.github.com/search/users?q=" + query + "&per_page=" + str(resultsperpage)
  response = requests.get(requesturl, headers={'Accept': 'application/vnd.github+json'})
  responsejson = response.json()

  print(query + "    " + str(responsejson))


for query in querylist:
  api_query(query, 10)



location:"austin"&followers:>=10&repos:>=10    {'total_count': 36470, 'incomplete_results': False, 'items': [{'login': 'getify', 'id': 150330, 'node_id': 'MDQ6VXNlcjE1MDMzMA==', 'avatar_url': 'https://avatars.githubusercontent.com/u/150330?v=4', 'gravatar_id': '', 'url': 'https://api.github.com/users/getify', 'html_url': 'https://github.com/getify', 'followers_url': 'https://api.github.com/users/getify/followers', 'following_url': 'https://api.github.com/users/getify/following{/other_user}', 'gists_url': 'https://api.github.com/users/getify/gists{/gist_id}', 'starred_url': 'https://api.github.com/users/getify/starred{/owner}{/repo}', 'subscriptions_url': 'https://api.github.com/users/getify/subscriptions', 'organizations_url': 'https://api.github.com/users/getify/orgs', 'repos_url': 'https://api.github.com/users/getify/repos', 'events_url': 'https://api.github.com/users/getify/events{/privacy}', 'received_events_url': 'https://api.github.com/users/getify/received_events', 'type': 'User

🔶🔶 **CHALLENGE**: Modify the code block above to help you determine which city in Texas out of Houston, Dallas, and Austin has the most GitHub users with at least 10 followers and at least 10 repositories
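
A hint that may help with this challenge: GitHub's search syntax expects the qualifiers inside `q` to be separated by spaces, and the `total_count` field in the response reports how many users matched. One way to build such a URL is sketched below using `urllib.parse.urlencode`, which escapes the spaces and symbols. The helper name `build_search_url` and its parameters are made up for this example, and no request is sent.

```python
from urllib.parse import urlencode

def build_search_url(city, min_followers=10, min_repos=10, per_page=10):
    """Build a user search URL whose qualifiers are separated by spaces."""
    query = f"location:{city} followers:>={min_followers} repos:>={min_repos}"
    # urlencode escapes spaces and symbols so the whole query travels as the single q parameter
    return "https://api.github.com/search/users?" + urlencode({"q": query, "per_page": per_page})

for city in ["austin", "dallas", "houston"]:
    print(build_search_url(city))
```

Each printed URL could then be passed to `requests.get`, and the `total_count` values compared across cities.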