# DataWISE × DataEng: Python Intro Workshop

Ready to dive into the world of Python?  

---

## 🚅 What We'll Cover
- Some basic building blocks used in Python
- Calling an API using a Python script

---

Let's get pythoning! 🐍🐍🐍







## 1. What is Python?? 
#### ... and how does it work?

- As we talked about, Python is a way for us to communicate with our computer and ask it to do things for us.
- Let's start with the simplest of Python commands - print(), which displays whatever we put inside its brackets

In [43]:
# In this cell enter your code below! ↓
print("Hello Lovely TILers!!")


Hello Lovely TILers!!


Wow - it's like magic - but on a computer!

---
## 2. 📦 Storing Useful Info
### a) Variables 
A variable stores a value so that you can use it later in your code.
For this example - and because we will be using it later - I want to set a variable to "Mansion House":
* 'name' is the variable
* "Mansion House" is the value
* = means "store this value in this variable"


In [44]:
# Enter your first variable ↓
name = "Mansion House"
print(name)


Mansion House
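Once a value is stored, the variable can be reused later, and giving it a new value replaces the old one. Here is a minimal sketch. It uses a separate variable called 'station' so that 'name' stays untouched, and "Monument" is just a hypothetical second value:

```python
station = "Mansion House"

# Reuse the variable inside a longer piece of text
greeting = "Next stop: " + station
print(greeting)  # Next stop: Mansion House

# Assigning again replaces the old value in 'station'...
station = "Monument"  # hypothetical second station, just for illustration
print(station)   # Monument

# ...but greeting was built earlier, so it still holds the old text
print(greeting)  # Next stop: Mansion House
```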


### b) Lists
A list stores a group of values in one variable.
We wrap the values in square brackets and separate them with commas.

In [45]:
# Add more names of some super cool TILers here ↓
really_cool_TILers = ["Trea","Zoe","Pave"]

We can then pick out an item by its position in the list and show it using print() from earlier.
We can even print extra text alongside it by adding it in quotes, like "TEXT", after a comma...

🧠Note - Python counts from 0 🧠

In [46]:
really_cool_TILers = ["Trea","Zoe","Pave"]
print(really_cool_TILers[0], "is the coolest TILer")

Trea is the coolest TILer
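Because counting starts at 0, index 1 gives the second name. Negative numbers count back from the end of the list. The short sketch below also uses .append() to add a name and len() to count the items. "Another TILer" is a placeholder, so swap in a real name:

```python
really_cool_TILers = ["Trea", "Zoe", "Pave"]

# Index 1 is the second item, because Python counts from 0
print(really_cool_TILers[1], "is second in the list")  # Zoe is second in the list

# Negative indexes count back from the end
print(really_cool_TILers[-1], "is last in the list")   # Pave is last in the list

# .append() adds a new item to the end; len() counts the items
really_cool_TILers.append("Another TILer")  # placeholder name
print(len(really_cool_TILers))  # 4
```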


### c) Dictionaries
A dictionary is a collection of key-value pairs.
The values can be different data types, such as numbers, text or even lists.

In [47]:
trea = {
  "cohort": 38,
  "specialist_subject": "Flight codes",
  "starsign": "Leo"
}

pave = {
  "cohort": 43,
  "specialist_subject": "Love Island",
  "starsign": "Capricorn"
}

print("Trea's cohort is", trea["cohort"])

print("Pave watches too much", pave["specialist_subject"])


Trea's cohort is 38
Pave watches too much Love Island


* 'trea' and 'pave' are the names we are assigning to the dictionaries
* Each dictionary contains 3 key value pairs
* The curly brackets {} and the "key": value pairs inside them are what make these dictionaries
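Below is a small sketch of the "lists inside dictionaries" idea and two handy dictionary tools. .get() returns a fallback instead of an error when a key is missing, and .items() loops over keys and values together. The 'favourite_lines' and 'hometown' keys are made up for illustration:

```python
trea = {
    "cohort": 38,
    "specialist_subject": "Flight codes",
    "starsign": "Leo",
}

# A value can itself be a list (hypothetical key, for illustration)
trea["favourite_lines"] = ["Circle", "District"]
print(trea["favourite_lines"][0])  # Circle

# .get() returns the fallback instead of raising a KeyError when the key is missing
print(trea.get("hometown", "not recorded"))  # not recorded

# .items() gives back each key and its value together
for key, value in trea.items():
    print(key, "->", value)
```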

---
## 3.  📚 Libraries 
#### Bringing in functions to do things for us
We need to import a library (a ready-made toolkit) that lets us connect to the internet. 
<br>
It's called the requests library, and it's one of the most useful libraries out there. 
<br>
In short, it lets us talk to the TfL API over the internet.


In [49]:
# Import the requests library using import + the name of the library ↓
import requests

---
## 4. 📞 Calling the API 
We need to give the requests library a URL to request from, and we can do this by setting a variable like we learned earlier. 
We can find the TfL URLs on their API documentation site - https://api.tfl.gov.uk/
<br>
Then we need to set up the rest of our code: 
- requests.get(url) sends the request
- response holds the reply from TfL
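In the search URL used in the next cell, the space in "Mansion House" is written as %20, because web addresses can't contain plain spaces. If you'd rather not type that by hand, the sketch below builds the same URL from a station name using Python's built-in urllib.parse.quote. The helper build_search_url and the variable search_url are hypothetical names chosen for this example:

```python
from urllib.parse import quote


def build_search_url(station_name: str) -> str:
    """Build a TfL StopPoint search URL, encoding spaces as %20."""
    return f"https://api.tfl.gov.uk/StopPoint/Search/{quote(station_name)}"


name = "Mansion House"
search_url = build_search_url(name)
print(search_url)  # https://api.tfl.gov.uk/StopPoint/Search/Mansion%20House
```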

In [51]:
## Importing the library 
import requests 

## Setting the url variable - here is a lovely one I prepared earlier 
url = "https://api.tfl.gov.uk/StopPoint/Search/Mansion%20House"

## Defining a new variable called response that uses the requests library to "get" from the TFL API url ↓
response = requests.get(url)

## Printing the response from the URL and seeing what it looks like ↓
print(response)

<Response [200]>
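&lt;Response [200]&gt; means the request worked, because 200 is the HTTP status code for "OK". In requests, the number itself is stored in response.status_code, and response.raise_for_status() raises an error if the reply was a failure (a 4xx or 5xx code). The sketch below uses Python's built-in http.HTTPStatus to turn a code into a readable label. describe_status is a hypothetical helper:

```python
from http import HTTPStatus


def describe_status(status_code: int) -> str:
    """Turn a numeric HTTP status code into a readable label."""
    return f"{status_code} {HTTPStatus(status_code).phrase}"


# With a real reply you would pass response.status_code instead
print(describe_status(200))  # 200 OK
print(describe_status(404))  # 404 Not Found
```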


## 5. ℹ️ Pulling Through Data From the API 
Many APIs, TfL's included, respond with JSON, which is like a big text-based dictionary. So we need to pull this JSON through and work out how to read it and what to do with it. 
<br> 
We can use one of the station IDs returned by the search earlier:
<br> 
"940GZZLUMSH"  - Mansion House Station ID 🏠

In [52]:
## Import requests like earlier  ↓
import requests

## Set Mansion House as the stop ID variable  ↓
stop_id = "940GZZLUMSH" 
url = f"https://api.tfl.gov.uk/StopPoint/{stop_id}/Arrivals"

## Set our response  ↓
response = requests.get(url)

## Set our data  ↓
data = response.json()

print(data)

[{'$type': 'Tfl.Api.Presentation.Entities.Prediction, Tfl.Api.Presentation.Entities', 'id': '406319045', 'operationType': 1, 'vehicleId': '201', 'naptanId': '940GZZLUMSH', 'stationName': 'Mansion House Underground Station', 'lineId': 'circle', 'lineName': 'Circle', 'platformName': 'Eastbound - Platform 3', 'direction': 'inbound', 'bearing': '', 'destinationNaptanId': '940GZZLUHSC', 'destinationName': 'Hammersmith (H&C Line) Underground Station', 'timestamp': '2025-06-26T10:23:15.0958606Z', 'timeToStation': 48, 'currentLocation': 'Between Blackfriars and Mansion House', 'towards': 'Hammersmith', 'expectedArrival': '2025-06-26T10:24:03Z', 'timeToLive': '2025-06-26T10:24:03Z', 'modeName': 'tube', 'timing': {'$type': 'Tfl.Api.Presentation.Entities.PredictionTiming, Tfl.Api.Presentation.Entities', 'countdownServerAdjustment': '00:00:00', 'source': '0001-01-01T00:00:00', 'insert': '0001-01-01T00:00:00', 'read': '2025-06-26T10:23:06.431Z', 'sent': '2025-06-26T10:23:15Z', 'received': '0001-01-
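To see why JSON is "like a text-based dictionary", here is a minimal sketch that parses a tiny JSON string with Python's built-in json module. The fields are a hand-trimmed copy of the first arrival printed above, and the real response has many more. response.json() does this same text-to-Python conversion for us. The variable is called sample_data so it doesn't overwrite the real data:

```python
import json

# A trimmed copy of one arrival from the output above, as raw JSON text
raw_text = """
[
  {"lineName": "Circle",
   "destinationName": "Hammersmith (H&C Line) Underground Station",
   "platformName": "Eastbound - Platform 3",
   "timeToStation": 48}
]
"""

sample_data = json.loads(raw_text)  # text -> Python list of dictionaries

print(type(sample_data))            # <class 'list'>
print(len(sample_data))             # 1
print(type(sample_data[0]))         # <class 'dict'>
print(sample_data[0]["lineName"])   # Circle
print(list(sample_data[0].keys()))  # the "column names" for each arrival
```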

---
## 6. 🚆 Looping Through the Arrivals 
We are going to use something called a for loop!
<br>
A for loop lets us repeat a block of code for every item in a list - just like looping through rows in a dataset, almost like an iterative macro!
In this example, our data is a list of dictionaries - like a table where each row has columns such as lineName, destinationName, etc. 
<br>
Each item represents one arriving train, so I'm going to call each item a train.
<br>
<br>
for train in data: says "loop through every row in the list"
<br>
Then, inside the loop, it: 
* Extracts the values we want from the current row
* Uses print() to show the train arrival info in a human-friendly format




In [55]:
## Here is one I prepared earlier 
for train in data:
    line = train["lineName"]
    destination = train["destinationName"]
    time = round(train["timeToStation"]/60)

    print(f"{line} line train to {destination} arriving in {time} minutes.")

Circle line train to Hammersmith (H&C Line) Underground Station arriving in 1 minutes.
Circle line train to Hammersmith (H&C Line) Underground Station arriving in 8 minutes.
Circle line train to Hammersmith (H&C Line) Underground Station arriving in 18 minutes.
Circle line train to Edgware Road (Circle Line) Underground Station arriving in 4 minutes.
Circle line train to Edgware Road (Circle Line) Underground Station arriving in 14 minutes.
Circle line train to Edgware Road (Circle Line) Underground Station arriving in 23 minutes.
District line train to Barking Underground Station arriving in 10 minutes.
District line train to Richmond Underground Station arriving in 3 minutes.
District line train to Richmond Underground Station arriving in 11 minutes.
District line train to Richmond Underground Station arriving in 21 minutes.
District line train to Upminster Underground Station arriving in 2 minutes.
District line train to Upminster Underground Station arriving in 12 minutes.
District

KeyError: 'destinationName'
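The loop stopped with KeyError: 'destinationName' because at least one arrival in the response has no destinationName key, and train["destinationName"] fails when a key is missing. A safer version might use .get() with a fallback. The sketch below does that and also prints "1 minute" instead of "1 minutes". The sample list is made up to mimic the response shape, with one entry deliberately missing its destination. describe_arrival is a hypothetical helper name:

```python
def describe_arrival(train: dict) -> str:
    """Turn one arrival dictionary into a friendly sentence."""
    line = train.get("lineName", "Unknown")
    # .get() returns None instead of raising KeyError, so `or` supplies a fallback
    destination = train.get("destinationName") or "an unknown destination"
    minutes = round(train.get("timeToStation", 0) / 60)  # seconds -> minutes
    unit = "minute" if minutes == 1 else "minutes"
    return f"{line} line train to {destination} arriving in {minutes} {unit}."


# Made-up sample shaped like the TfL arrivals data
sample_arrivals = [
    {"lineName": "Circle",
     "destinationName": "Hammersmith (H&C Line) Underground Station",
     "timeToStation": 48},
    {"lineName": "District", "timeToStation": 600},  # no destinationName key
]

for train in sample_arrivals:
    print(describe_arrival(train))
```

In the notebook, you would loop over data instead of sample_arrivals to use the live response.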

---
## ⭐ Extra Bit!! - 7. Getting Preppy With It!
Now we are going to attempt to get our data into something that resembles the datasets we are used to working with - with nice columns and rows!!
<br>
<br>
To do that, we need the help of another library called pandas, which lets us put things into DataFrames - a table-like structure. 

In [57]:
%pip install pandas
import pandas as pd

Defaulting to user installation because normal site-packages is not writeable

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.3.1[0m[39;49m -> [0m[32;49m25.1.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


In [58]:
df = pd.DataFrame(data)

## .head() lets us peek at the first five rows so we don't get mad loads of data coming back 
print(df.head())


                                               $type           id  \
0  Tfl.Api.Presentation.Entities.Prediction, Tfl....    406319045   
1  Tfl.Api.Presentation.Entities.Prediction, Tfl....  -1118870440   
2  Tfl.Api.Presentation.Entities.Prediction, Tfl....   -555446017   
3  Tfl.Api.Presentation.Entities.Prediction, Tfl....   -157322450   
4  Tfl.Api.Presentation.Entities.Prediction, Tfl....    406122437   

   operationType vehicleId     naptanId                        stationName  \
0              1       201  940GZZLUMSH  Mansion House Underground Station   
1              1       202  940GZZLUMSH  Mansion House Underground Station   
2              1       203  940GZZLUMSH  Mansion House Underground Station   
3              1       210  940GZZLUMSH  Mansion House Underground Station   
4              1       211  940GZZLUMSH  Mansion House Underground Station   

   lineId lineName            platformName direction  ... destinationNaptanId  \
0  circle   Circle  Eastbound - Pla
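Unlike the for loop, pd.DataFrame() doesn't crash when an arrival is missing destinationName. It builds a column for every key it finds and fills any gaps with NaN (pandas' marker for a missing value). The minimal sketch below demonstrates this on a made-up two-row sample. The "Unknown" placeholder is our own choice:

```python
import pandas as pd

# Made-up sample: the second arrival has no destinationName key
sample_arrivals = [
    {"lineName": "Circle",
     "destinationName": "Hammersmith (H&C Line) Underground Station",
     "timeToStation": 48},
    {"lineName": "District", "timeToStation": 600},
]

sample_df = pd.DataFrame(sample_arrivals)

# isna() marks missing values; sum() counts them per column
print(sample_df.isna().sum())

# fillna() swaps missing values for a placeholder of our choosing
sample_df["destinationName"] = sample_df["destinationName"].fillna("Unknown")
print(sample_df)
```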

### a) Cleaning Our DataFrame
* Now we're going to take just a few key columns, as we don't need everything


In [59]:

## .copy() gives us a standalone table, so we can safely add new columns to it later
df = df[["lineName", "destinationName", "platformName", "timeToStation"]].copy()
## Now print df below 
print(df)

              lineName                                 destinationName  \
0               Circle      Hammersmith (H&C Line) Underground Station   
1               Circle      Hammersmith (H&C Line) Underground Station   
2               Circle      Hammersmith (H&C Line) Underground Station   
3               Circle  Edgware Road (Circle Line) Underground Station   
4               Circle  Edgware Road (Circle Line) Underground Station   
5               Circle  Edgware Road (Circle Line) Underground Station   
6             District                     Barking Underground Station   
7             District                    Richmond Underground Station   
8             District                    Richmond Underground Station   
9             District                    Richmond Underground Station   
10            District                   Upminster Underground Station   
11            District                   Upminster Underground Station   
12            District             Eal

* timeToStation is in seconds, so let's turn it into a new minutesToArrival column

In [60]:
df["minutesToArrival"] = (df["timeToStation"] / 60).round()
## Now print df below again  ↓
print(df)

              lineName                                 destinationName  \
0               Circle      Hammersmith (H&C Line) Underground Station   
1               Circle      Hammersmith (H&C Line) Underground Station   
2               Circle      Hammersmith (H&C Line) Underground Station   
3               Circle  Edgware Road (Circle Line) Underground Station   
4               Circle  Edgware Road (Circle Line) Underground Station   
5               Circle  Edgware Road (Circle Line) Underground Station   
6             District                     Barking Underground Station   
7             District                    Richmond Underground Station   
8             District                    Richmond Underground Station   
9             District                    Richmond Underground Station   
10            District                   Upminster Underground Station   
11            District                   Upminster Underground Station   
12            District             Eal
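Dividing seconds by 60 and rounding is a quick conversion, but it has one quirk. Python's round() sends exact halves to the nearest even number, so 2.5 minutes becomes 2, not 3. pandas' .round() behaves the same way. If you'd rather never undercount the wait, math.ceil always rounds up. In the sketch below, 48 seconds comes from the first arrival shown earlier, and the other values are made up:

```python
import math

results = []
for seconds in [48, 90, 150, 600]:
    minutes = seconds / 60
    rounded = round(minutes)       # halves go to the nearest even number
    rounded_up = math.ceil(minutes)  # always rounds up
    results.append((seconds, rounded, rounded_up))
    print(f"{seconds}s = {minutes} min -> round: {rounded}, ceil: {rounded_up}")
```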

#### ⭐⭐ Woohoo!! Now things are looking familiar!
<br>

### b) 🤏 Filtering Our DataFrame
* I take the District line home, so that's the line I care about and want to filter for 

In [42]:
district_trains = df[df["lineName"] == "District"]
print(district_trains)

    lineName                      destinationName            platformName  \
4   District         Richmond Underground Station  Westbound - Platform 1   
5   District         Richmond Underground Station  Westbound - Platform 1   
6   District         Richmond Underground Station  Westbound - Platform 1   
7   District         Richmond Underground Station  Westbound - Platform 1   
8   District        Upminster Underground Station  Eastbound - Platform 3   
9   District        Upminster Underground Station  Eastbound - Platform 3   
10  District  Ealing Broadway Underground Station  Westbound - Platform 1   
11  District  Ealing Broadway Underground Station  Westbound - Platform 1   
12  District  Ealing Broadway Underground Station  Westbound - Platform 1   
13  District  Ealing Broadway Underground Station  Westbound - Platform 1   
14  District  Ealing Broadway Underground Station  Westbound - Platform 1   
15  District        Upminster Underground Station  Eastbound - Platform 3   
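Filters can also be combined. The sketch below keeps only District trains due within 10 minutes. Each condition goes in its own brackets, & means "and", and | would mean "or". The rows are a made-up sample shaped like our cleaned DataFrame, and the 10-minute cut-off is arbitrary:

```python
import pandas as pd

# Made-up rows shaped like our cleaned DataFrame
sample_board = pd.DataFrame({
    "lineName": ["Circle", "District", "District", "District"],
    "destinationName": [
        "Edgware Road (Circle Line) Underground Station",
        "Richmond Underground Station",
        "Upminster Underground Station",
        "Ealing Broadway Underground Station",
    ],
    "minutesToArrival": [4, 3, 12, 7],
})

# Each condition gives True/False per row; & keeps rows where both are True
soon_district = sample_board[
    (sample_board["lineName"] == "District") & (sample_board["minutesToArrival"] <= 10)
]
print(soon_district)
```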

### c) 📂 Saving Our DataFrame
Now I want to save my really useful train data somewhere. 
I can use .to_csv() to do that:
* I will need to give the file a name as well


In [61]:
district_trains.to_csv("district_trains.csv")
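By default, to_csv also writes the DataFrame's index (the row labels on the left) as an extra unnamed column. The sketch below saves without it using index=False, then reads the file back with pd.read_csv to check it. The temporary folder just keeps the example tidy. In the notebook you could simply write district_trains.to_csv("district_trains.csv", index=False). The data here is a made-up sample:

```python
import os
import tempfile

import pandas as pd

# Made-up sample standing in for district_trains
sample_trains = pd.DataFrame({
    "lineName": ["District", "District"],
    "destinationName": ["Richmond Underground Station", "Upminster Underground Station"],
    "minutesToArrival": [3, 12],
})

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "district_trains.csv")
    # index=False leaves the row labels out of the file
    sample_trains.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

print(reloaded)
```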

### Congratulations!! 🥳
You've now...
* Called an API 
* Pulled some data 
* Cleaned it
* Filtered it 
* And saved it!!
