# Joins

In [1]:
import polars as pl

## Introducing the Datasets
- Data in the real world can be spread out across multiple tables/datasets.
- A join combines two tables/`DataFrames` together based on a logical criterion.
- To **join** means to link or connect.
- In this section, we'll be exploring data from a fictional streaming service (à la Netflix).
- All CSV datasets are found within the `streaming_service` directory.

- The `movies.csv` dataset holds information on the films available on the service.

In [2]:
pl.read_csv("streaming_service/movies.csv").head(2)

id,title,budget,release_date,revenue,runtime,status
i64,str,i64,str,i64,i64,str
1,"""#Horror""",1500000,"""2015-11-20""",,90,"""Released"""
2,"""(500) Days of Summer""",7500000,"""2009-07-17""",60722734.0,95,"""Released"""


- The `plans.csv` file lists the streaming service's subscription plans.

In [3]:
pl.read_csv("streaming_service/plans.csv")

id,name,price,resolution
i64,str,f64,str
1,"""Free""",8.99,"""720p"""
2,"""Standard""",13.99,"""1080p"""
3,"""Premium""",19.99,"""4K"""
4,"""Deluxe""",999.99,"""4K with Caviar"""


- The `users.csv` file lists the subscribers (name, email, and the ID of their subscription plan).

In [4]:
pl.read_csv("streaming_service/users.csv").head(2)

id,name,email,subscription_plan_id
i64,str,str,i64
1,"""Alicia Palmer""","""alicia103@hotmail.com""",2
2,"""Marcus Carlson""","""marcus861@protonmail.com""",3


- The `watch_history.csv` file is a join table that connects a user ID and a movie ID.

In [5]:
pl.read_csv("streaming_service/watch_history.csv").head(2)

id,user_id,movie_id,watched_on
i64,i64,i64,str
1,913,410,"""2025-07-17"""
2,2007,3658,"""2024-10-12"""


- The `support.csv` file is a dataset of complaints that customer support received.
- Some tickets are connected to a specific user/subscriber; other tickets are not.

In [6]:
pl.read_csv("streaming_service/support.csv").head()

ticket_id,user_id,complaint
i64,i64,str
1,2557,"""billing"""
2,4534,"""video quality"""
3,2994,"""video quality"""
4,,"""movie selection"""
5,,"""movie selection"""


In [7]:
movies = pl.read_csv("streaming_service/movies.csv")
plans = pl.read_csv("streaming_service/plans.csv")
users = pl.read_csv("streaming_service/users.csv")
watch_history = pl.read_csv("streaming_service/watch_history.csv")
support = pl.read_csv("streaming_service/support.csv")

### Further Reading
- https://docs.pola.rs/user-guide/transformations/joins/#quick-reference-table
- https://docs.pola.rs/user-guide/transformations/joins/#equi-joins

## Inner Join

In [8]:
movies = pl.read_csv("streaming_service/movies.csv")
plans = pl.read_csv("streaming_service/plans.csv")
users = pl.read_csv("streaming_service/users.csv")
watch_history = pl.read_csv("streaming_service/watch_history.csv")
support = pl.read_csv("streaming_service/support.csv")

- An `inner` join merges rows that have a matching value in both `DataFrames`.
- Inner joins are ideal when you want to find the intersection or overlap between two datasets (the values in common).
- Polars will join rows based on the same value existing in a specified column in both tables.

<img src="images/Inner_Join.webp" alt="Inner Join Diagram" width="400" />

- Let's say we want to find the movies that each user watched.
- An inner join between `users` and `watch_history` identifies users whose ID appears in both `users.id` and `watch_history.user_id`.
- Polars excludes `null` (missing) values in matches by default.  A null is not considered equal to another null.

In [9]:
users.head(1)

id,name,email,subscription_plan_id
i64,str,str,i64
1,"""Alicia Palmer""","""alicia103@hotmail.com""",2


In [10]:
watch_history.head(1)

id,user_id,movie_id,watched_on
i64,i64,i64,str
1,913,410,"""2025-07-17"""


- Some users have watched multiple movies. Some users have watched no movies.
- An inner join will exclude users with an `id` in `users` but no `user_id` in `watch_history`. In other words, an inner join will exclude users with no watch history.
- An inner join can create multiple matches if the same user watched multiple movies. The same `user_id` may appear multiple times in the `watch_history` table and Polars will match it repeatedly with the row with the same user ID in `users`.

### The join Method
- The `join` method merges two `DataFrames` together.
- The `other` parameter sets the second `DataFrame`.
- The `how` parameter declares the join strategy.
- Polars appends a `_right` suffix to any duplicate column name from the right `DataFrame`.
- Polars excludes `watch_history.user_id` from the result. The value in that column is the same as `id`.

In [11]:
users.join(watch_history, how="inner", left_on="id", right_on="user_id")

id,name,email,subscription_plan_id,id_right,movie_id,watched_on
i64,str,str,i64,i64,i64,str
1,"""Alicia Palmer""","""alicia103@hotmail.com""",2,1778,8575,"""2024-10-06"""
2,"""Marcus Carlson""","""marcus861@protonmail.com""",3,1881,568,"""2025-08-05"""
3,"""Robert Williams""","""robert107@outlook.com""",2,1671,2580,"""2024-06-26"""
5,"""Richard Parsons""","""richard615@yahoo.com""",5,51,5311,"""2026-09-27"""
6,"""Alexis Thompson""","""alexis467@protonmail.com""",2,1043,7010,"""2025-02-18"""
…,…,…,…,…,…,…
4978,"""Jennifer Leach""","""jennifer883@gmail.com""",3,614,7180,"""2024-03-12"""
4985,"""Pamela Marshall""","""pamela197@aol.com""",3,1877,3944,"""2027-02-27"""
4988,"""Michelle Scott""","""michelle866@pandas.edu""",1,1095,902,"""2025-09-25"""
4989,"""Kevin Moss""","""kevin392@polars.edu""",1,1390,3273,"""2025-06-22"""


- Let's identify the users who watched multiple movies.
- The `is_duplicated` method returns `True` for every row whose value appears more than once in the column.

In [12]:
users.join(
    watch_history,
    how="inner",
    left_on="id",
    right_on="user_id",
    suffix="_from_watch_history",
).filter(pl.col("id").is_duplicated())

id,name,email,subscription_plan_id,id_from_watch_history,movie_id,watched_on
i64,str,str,i64,i64,i64,str
22,"""Jeffrey Munoz""","""jeffrey682@outlook.com""",3,280,6694,"""2026-07-11"""
22,"""Jeffrey Munoz""","""jeffrey682@outlook.com""",3,329,4635,"""2025-09-03"""
22,"""Jeffrey Munoz""","""jeffrey682@outlook.com""",3,1110,722,"""2024-01-26"""
73,"""Jeffrey Roberts""","""jeffrey63@aol.com""",3,1314,7768,"""2024-09-27"""
73,"""Jeffrey Roberts""","""jeffrey63@aol.com""",3,1512,1880,"""2027-10-01"""
…,…,…,…,…,…,…
4958,"""Haley Turner""","""haley535@polars.edu""",3,900,6500,"""2025-10-30"""
4970,"""Maria Allen""","""maria14@yahoo.com""",1,668,2354,"""2025-10-02"""
4970,"""Maria Allen""","""maria14@yahoo.com""",1,1571,1344,"""2025-07-28"""
4974,"""Daniel Mcbride""","""daniel169@yahoo.com""",3,373,5773,"""2025-03-24"""


- The `suffix` parameter attaches a custom suffix to the duplicate column names from the right `DataFrame`.
- The `id` column from `watch_history` becomes `id_from_watch_history`.
- The `id` column from `users` is unaffected and remains `id`.

In [13]:
users.join(
    watch_history,
    how="inner",
    left_on="id",
    right_on="user_id",
    suffix="_from_watch_history",
)

id,name,email,subscription_plan_id,id_from_watch_history,movie_id,watched_on
i64,str,str,i64,i64,i64,str
1,"""Alicia Palmer""","""alicia103@hotmail.com""",2,1778,8575,"""2024-10-06"""
2,"""Marcus Carlson""","""marcus861@protonmail.com""",3,1881,568,"""2025-08-05"""
3,"""Robert Williams""","""robert107@outlook.com""",2,1671,2580,"""2024-06-26"""
5,"""Richard Parsons""","""richard615@yahoo.com""",5,51,5311,"""2026-09-27"""
6,"""Alexis Thompson""","""alexis467@protonmail.com""",2,1043,7010,"""2025-02-18"""
…,…,…,…,…,…,…
4978,"""Jennifer Leach""","""jennifer883@gmail.com""",3,614,7180,"""2024-03-12"""
4985,"""Pamela Marshall""","""pamela197@aol.com""",3,1877,3944,"""2027-02-27"""
4988,"""Michelle Scott""","""michelle866@pandas.edu""",1,1095,902,"""2025-09-25"""
4989,"""Kevin Moss""","""kevin392@polars.edu""",1,1390,3273,"""2025-06-22"""


- For inner joins, Polars excludes the matching join column from the right table (`user_id` from the `watch_history`).
- The `watch_history.user_id` column's values would match the `users.id` value.
- Pass the `coalesce` parameter an argument of `False` to include the matching columns.

In [14]:
users.join(
    watch_history,
    how="inner",
    left_on="id",
    right_on="user_id",
    suffix="_from_watch_history",
    coalesce=False,
)

id,name,email,subscription_plan_id,id_from_watch_history,user_id,movie_id,watched_on
i64,str,str,i64,i64,i64,i64,str
1,"""Alicia Palmer""","""alicia103@hotmail.com""",2,1778,1,8575,"""2024-10-06"""
2,"""Marcus Carlson""","""marcus861@protonmail.com""",3,1881,2,568,"""2025-08-05"""
3,"""Robert Williams""","""robert107@outlook.com""",2,1671,3,2580,"""2024-06-26"""
5,"""Richard Parsons""","""richard615@yahoo.com""",5,51,5,5311,"""2026-09-27"""
6,"""Alexis Thompson""","""alexis467@protonmail.com""",2,1043,6,7010,"""2025-02-18"""
…,…,…,…,…,…,…,…
4978,"""Jennifer Leach""","""jennifer883@gmail.com""",3,614,4978,7180,"""2024-03-12"""
4985,"""Pamela Marshall""","""pamela197@aol.com""",3,1877,4985,3944,"""2027-02-27"""
4988,"""Michelle Scott""","""michelle866@pandas.edu""",1,1095,4988,902,"""2025-09-25"""
4989,"""Kevin Moss""","""kevin392@polars.edu""",1,1390,4989,3273,"""2025-06-22"""


### Further Reading
- https://docs.pola.rs/user-guide/transformations/joins/#inner-join
- https://docs.pola.rs/api/python/stable/reference/dataframe/api/polars.DataFrame.join.html

## The on Parameter
- The `on` parameter designates the join column _if_ the column name is identical across both tables.

In [15]:
movies = pl.read_csv("streaming_service/movies.csv")
plans = pl.read_csv("streaming_service/plans.csv")
users = pl.read_csv("streaming_service/users.csv").rename({"id": "user_id"})
watch_history = pl.read_csv("streaming_service/watch_history.csv")
support = pl.read_csv("streaming_service/support.csv")

In [16]:
users.head(1)

user_id,name,email,subscription_plan_id
i64,str,str,i64
1,"""Alicia Palmer""","""alicia103@hotmail.com""",2


In [17]:
watch_history.head(1)

id,user_id,movie_id,watched_on
i64,i64,i64,str
1,913,410,"""2025-07-17"""


- Let's execute the inner join from the previous lesson.
- The `left_on` and `right_on` parameters are still valid...

In [18]:
users.join(watch_history, how="inner", left_on="user_id", right_on="user_id")

user_id,name,email,subscription_plan_id,id,movie_id,watched_on
i64,str,str,i64,i64,i64,str
1,"""Alicia Palmer""","""alicia103@hotmail.com""",2,1778,8575,"""2024-10-06"""
2,"""Marcus Carlson""","""marcus861@protonmail.com""",3,1881,568,"""2025-08-05"""
3,"""Robert Williams""","""robert107@outlook.com""",2,1671,2580,"""2024-06-26"""
5,"""Richard Parsons""","""richard615@yahoo.com""",5,51,5311,"""2026-09-27"""
6,"""Alexis Thompson""","""alexis467@protonmail.com""",2,1043,7010,"""2025-02-18"""
…,…,…,…,…,…,…
4978,"""Jennifer Leach""","""jennifer883@gmail.com""",3,614,7180,"""2024-03-12"""
4985,"""Pamela Marshall""","""pamela197@aol.com""",3,1877,3944,"""2027-02-27"""
4988,"""Michelle Scott""","""michelle866@pandas.edu""",1,1095,902,"""2025-09-25"""
4989,"""Kevin Moss""","""kevin392@polars.edu""",1,1390,3273,"""2025-06-22"""


- ...but the `on` parameter is cleaner.

In [19]:
users.join(watch_history, how="inner", on="user_id")

user_id,name,email,subscription_plan_id,id,movie_id,watched_on
i64,str,str,i64,i64,i64,str
1,"""Alicia Palmer""","""alicia103@hotmail.com""",2,1778,8575,"""2024-10-06"""
2,"""Marcus Carlson""","""marcus861@protonmail.com""",3,1881,568,"""2025-08-05"""
3,"""Robert Williams""","""robert107@outlook.com""",2,1671,2580,"""2024-06-26"""
5,"""Richard Parsons""","""richard615@yahoo.com""",5,51,5311,"""2026-09-27"""
6,"""Alexis Thompson""","""alexis467@protonmail.com""",2,1043,7010,"""2025-02-18"""
…,…,…,…,…,…,…
4978,"""Jennifer Leach""","""jennifer883@gmail.com""",3,614,7180,"""2024-03-12"""
4985,"""Pamela Marshall""","""pamela197@aol.com""",3,1877,3944,"""2027-02-27"""
4988,"""Michelle Scott""","""michelle866@pandas.edu""",1,1095,902,"""2025-09-25"""
4989,"""Kevin Moss""","""kevin392@polars.edu""",1,1390,3273,"""2025-06-22"""


### Further Reading
- https://docs.pola.rs/api/python/stable/reference/dataframe/api/polars.DataFrame.join.html

## Full Joins
- A full join keeps all rows from both `DataFrames` and merges the rows that match.
- Like an inner join, a full join will match rows based on a shared value across specified columns.
- Unlike an inner join, a full join will keep a row if it does not have a match in the other `DataFrame`.
- Polars will populate the remaining row values with `null` if there isn't a match with the other `DataFrame`.

In [20]:
movies = pl.read_csv("streaming_service/movies.csv")
plans = pl.read_csv("streaming_service/plans.csv")
users = pl.read_csv("streaming_service/users.csv")
watch_history = pl.read_csv("streaming_service/watch_history.csv")
support = pl.read_csv("streaming_service/support.csv")

<img src="images/Full_Join.webp" alt="Full Join Diagram" width="400" />

In [21]:
users.head(2)

id,name,email,subscription_plan_id
i64,str,str,i64
1,"""Alicia Palmer""","""alicia103@hotmail.com""",2
2,"""Marcus Carlson""","""marcus861@protonmail.com""",3


In [22]:
plans

id,name,price,resolution
i64,str,f64,str
1,"""Free""",8.99,"""720p"""
2,"""Standard""",13.99,"""1080p"""
3,"""Premium""",19.99,"""4K"""
4,"""Deluxe""",999.99,"""4K with Caviar"""


- An orphan record is an entry that doesn't have a matching record in the other table.
- A full join is ideal for identifying orphan records (users whose plan ID is invalid, plans that have no users).
- Richard Parsons has a `subscription_plan_id` (5) that doesn't exist in the `plans` `DataFrame`.
- The plan columns in Richard Parsons's row are `null`. There are no values to pull in from a matching row in the `plans` `DataFrame`.
- The Deluxe plan (ID 4) has no subscribers.
- The Deluxe plan row has `null` values for the user `id`, `name`, `email`, and `subscription_plan_id`.

In [23]:
users.join(
    plans,
    how="full",
    left_on="subscription_plan_id",
    right_on="id",
    suffix="_from_plans",
)

id,name,email,subscription_plan_id,id_from_plans,name_from_plans,price,resolution
i64,str,str,i64,i64,str,f64,str
1,"""Alicia Palmer""","""alicia103@hotmail.com""",2,2,"""Standard""",13.99,"""1080p"""
2,"""Marcus Carlson""","""marcus861@protonmail.com""",3,3,"""Premium""",19.99,"""4K"""
3,"""Robert Williams""","""robert107@outlook.com""",2,2,"""Standard""",13.99,"""1080p"""
4,"""Molly Torres""","""molly701@polars.edu""",1,1,"""Free""",8.99,"""720p"""
5,"""Richard Parsons""","""richard615@yahoo.com""",5,,,,
…,…,…,…,…,…,…,…
4997,"""Joshua Harrington""","""joshua522@polars.edu""",3,3,"""Premium""",19.99,"""4K"""
4998,"""Kenneth Sanchez""","""kenneth212@outlook.com""",2,2,"""Standard""",13.99,"""1080p"""
4999,"""Jasmine Hall""","""jasmine8@hotmail.com""",3,3,"""Premium""",19.99,"""4K"""
5000,"""Jeffrey Brooks""","""jeffrey477@yahoo.com""",2,2,"""Standard""",13.99,"""1080p"""


- Let's identify the users who are subscribed to a non-existent subscription plan.

In [24]:
users.join(
    plans,
    how="full",
    left_on="subscription_plan_id",
    right_on="id",
    suffix="_from_plans",
).filter(pl.col("id_from_plans").is_null())

id,name,email,subscription_plan_id,id_from_plans,name_from_plans,price,resolution
i64,str,str,i64,i64,str,f64,str
5,"""Richard Parsons""","""richard615@yahoo.com""",5,,,,


- Let's identify the plans that have no subscribing users.

In [25]:
users.join(
    plans,
    how="full",
    left_on="subscription_plan_id",
    right_on="id",
    suffix="_from_plans",
).filter(pl.col("id").is_null())

id,name,email,subscription_plan_id,id_from_plans,name_from_plans,price,resolution
i64,str,str,i64,i64,str,f64,str
,,,,4,"""Deluxe""",999.99,"""4K with Caviar"""


In [26]:
plans

id,name,price,resolution
i64,str,f64,str
1,"""Free""",8.99,"""720p"""
2,"""Standard""",13.99,"""1080p"""
3,"""Premium""",19.99,"""4K"""
4,"""Deluxe""",999.99,"""4K with Caviar"""


### Further Reading
- https://docs.pola.rs/user-guide/transformations/joins/#full-join
- https://docs.pola.rs/api/python/stable/reference/dataframe/api/polars.DataFrame.join.html

## Left and Right Joins
- A left join keeps all records from the left table and merges matching rows (where possible) from the right table.
- All left table rows will be present. Where there is no match, Polars will populate `null` in the right table columns.

In [27]:
movies = pl.read_csv("streaming_service/movies.csv")
plans = pl.read_csv("streaming_service/plans.csv")
users = pl.read_csv("streaming_service/users.csv")
watch_history = pl.read_csv("streaming_service/watch_history.csv")
support = pl.read_csv("streaming_service/support.csv")

<img src="images/Left_Join.png" alt="Left Join Diagram" width="400" />

In [28]:
support.head(4)

ticket_id,user_id,complaint
i64,i64,str
1,2557,"""billing"""
2,4534,"""video quality"""
3,2994,"""video quality"""
4,,"""movie selection"""


In [29]:
users.head(2)

id,name,email,subscription_plan_id
i64,str,str,i64
1,"""Alicia Palmer""","""alicia103@hotmail.com""",2
2,"""Marcus Carlson""","""marcus861@protonmail.com""",3


- The first three support tickets have an associated user who we can match by ID in the `users` table.
- The fourth ticket has no `user_id`; the row's values for `name`, `email`, and `subscription_plan_id` will be null.
- A null `user_id` in the `support` table never matches a user, so those rows also receive `null` (missing) values in the `users` columns.
- Polars will coalesce (combine together into one) the duplicate columns in a left join.

In [30]:
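# By default, a left join coalesces the join keys, so `users.id` is dropped.
support.join(users, how="left", left_on="user_id", right_on="id")

# coalesce=False keeps both key columns (`user_id` and `id`), as in the output below.
support.join(users, how="left", left_on="user_id", right_on="id", coalesce=False)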

ticket_id,user_id,complaint,id,name,email,subscription_plan_id
i64,i64,str,i64,str,str,i64
1,2557,"""billing""",2557,"""Louis Rodriguez""","""louis10@hotmail.com""",2
2,4534,"""video quality""",4534,"""James Wilson""","""james788@hotmail.com""",2
3,2994,"""video quality""",2994,"""John Cole""","""john450@pandas.edu""",1
4,,"""movie selection""",,,,
5,,"""movie selection""",,,,
…,…,…,…,…,…,…
496,,"""audio quality""",,,,
497,,"""billing""",,,,
498,2800,,2800,"""Angelica Weiss""","""angelica155@gmail.com""",1
499,,,,,,


- A right join on the `users` `DataFrame` will accomplish the same result.
- The right table (`support`) is still the anchor table/source of truth. All of its rows will be kept.
- Polars will bring in the rows where the `users.id` matches `support.user_id`.
- The order of the columns will be different.

In [31]:
users.join(support, how="right", left_on="id", right_on="user_id")

name,email,subscription_plan_id,ticket_id,user_id,complaint
str,str,i64,i64,i64,str
"""Louis Rodriguez""","""louis10@hotmail.com""",2,1,2557,"""billing"""
"""James Wilson""","""james788@hotmail.com""",2,2,4534,"""video quality"""
"""John Cole""","""john450@pandas.edu""",1,3,2994,"""video quality"""
,,,4,,"""movie selection"""
,,,5,,"""movie selection"""
…,…,…,…,…,…
,,,496,,"""audio quality"""
,,,497,,"""billing"""
"""Angelica Weiss""","""angelica155@gmail.com""",1,498,2800,
,,,499,,


### Further Reading
- https://docs.pola.rs/user-guide/transformations/joins/#left-join
- https://docs.pola.rs/user-guide/transformations/joins/#right-join
- https://docs.pola.rs/api/python/stable/reference/dataframe/api/polars.DataFrame.join.html

## Semi Join
- A semi join keeps only the left `DataFrame` rows that have a match in the right `DataFrame`.
- However, Polars does not concatenate the right `DataFrame`'s columns to the resulting `DataFrame`.
- A semi join is closer to a filter operation than a join.

In [32]:
movies = pl.read_csv("streaming_service/movies.csv")
plans = pl.read_csv("streaming_service/plans.csv")
users = pl.read_csv("streaming_service/users.csv")
watch_history = pl.read_csv("streaming_service/watch_history.csv")
support = pl.read_csv("streaming_service/support.csv")

In [33]:
users.head(1)

id,name,email,subscription_plan_id
i64,str,str,i64
1,"""Alicia Palmer""","""alicia103@hotmail.com""",2


In [34]:
support.head(4)

ticket_id,user_id,complaint
i64,i64,str
1,2557,"""billing"""
2,4534,"""video quality"""
3,2994,"""video quality"""
4,,"""movie selection"""


- Business Case: Let's filter for the users who created at least one support ticket.
- Invoke `join` on `users` so the resulting `DataFrame` has the columns from the `users` table.

In [35]:
users.join(support, how="semi", left_on="id", right_on="user_id")

id,name,email,subscription_plan_id
i64,str,str,i64
32,"""Gregory Davis""","""gregory21@gmail.com""",3
48,"""Joseph Lawson""","""joseph390@pandas.edu""",3
61,"""Alexandra Luna""","""alexandra749@yahoo.com""",1
72,"""Jesse Elliott""","""jesse648@outlook.com""",3
198,"""James Becker MD""","""james920@hotmail.com""",3
…,…,…,…
4681,"""Andrew Anderson""","""andrew97@gmail.com""",3
4826,"""Craig Avila""","""craig667@hotmail.com""",3
4881,"""Kelly Mejia""","""kelly693@aol.com""",2
4933,"""Kristopher Rogers""","""kristopher699@pandas.edu""",2


### Further Reading
- https://docs.pola.rs/user-guide/getting-started/#joining-dataframes
- https://docs.pola.rs/user-guide/transformations/joins/#semi-join
- https://docs.pola.rs/api/python/stable/reference/dataframe/api/polars.DataFrame.join.html

## Anti Join
- The anti join is the opposite of the semi join.
- The anti join keeps the left `DataFrame` rows that do not have a match in the right `DataFrame`.

In [36]:
movies = pl.read_csv("streaming_service/movies.csv")
plans = pl.read_csv("streaming_service/plans.csv")
users = pl.read_csv("streaming_service/users.csv")
watch_history = pl.read_csv("streaming_service/watch_history.csv")
support = pl.read_csv("streaming_service/support.csv")

In [37]:
users.head(3)

id,name,email,subscription_plan_id
i64,str,str,i64
1,"""Alicia Palmer""","""alicia103@hotmail.com""",2
2,"""Marcus Carlson""","""marcus861@protonmail.com""",3
3,"""Robert Williams""","""robert107@outlook.com""",2


In [38]:
support.head(3)

ticket_id,user_id,complaint
i64,i64,str
1,2557,"""billing"""
2,4534,"""video quality"""
3,2994,"""video quality"""


- Business case: Identify the users who did not file a ticket/complaint.
- We want to find the rows with an `id` in `users` (left) that do not have a match in `support.user_id`.

In [39]:
users.join(support, how="anti", left_on="id", right_on="user_id")

id,name,email,subscription_plan_id
i64,str,str,i64
1,"""Alicia Palmer""","""alicia103@hotmail.com""",2
2,"""Marcus Carlson""","""marcus861@protonmail.com""",3
3,"""Robert Williams""","""robert107@outlook.com""",2
4,"""Molly Torres""","""molly701@polars.edu""",1
5,"""Richard Parsons""","""richard615@yahoo.com""",5
…,…,…,…
4995,"""Shawn Ayers""","""shawn651@hotmail.com""",3
4996,"""Sarah Bowers""","""sarah309@hotmail.com""",3
4998,"""Kenneth Sanchez""","""kenneth212@outlook.com""",2
4999,"""Jasmine Hall""","""jasmine8@hotmail.com""",3


### Further Reading
- https://docs.pola.rs/user-guide/transformations/joins/#anti-join
- https://docs.pola.rs/api/python/stable/reference/dataframe/api/polars.DataFrame.join.html

## Cross Joins
- A cross join matches every row from the left `DataFrame` with every row from the right `DataFrame`.
- The result is called a Cartesian product.
- The resulting `DataFrame`'s length will be equal to the product of the two `DataFrames`' lengths.
- A cross join does not take an `on` parameter because every left row is matched with every right row.

In [40]:
foods = pl.DataFrame(
    {"food": ["Hamburger", "Hotdog", "Chicken Fingers"], "calories": [354, 151, 185]}
)

foods

food,calories
str,i64
"""Hamburger""",354
"""Hotdog""",151
"""Chicken Fingers""",185


In [41]:
condiments = pl.DataFrame({"condiment": ["Ketchup", "Mustard"], "calories": [19, 3]})

condiments

condiment,calories
str,i64
"""Ketchup""",19
"""Mustard""",3


- A cross join will include all columns from both `DataFrames`.
- By default, Polars appends `_right` to the right `DataFrame`'s duplicate column names; the `suffix` parameter overrides that default.

In [42]:
foods.join(condiments, how="cross", suffix="_from_condiments")

food,calories,condiment,calories_from_condiments
str,i64,str,i64
"""Hamburger""",354,"""Ketchup""",19
"""Hamburger""",354,"""Mustard""",3
"""Hotdog""",151,"""Ketchup""",19
"""Hotdog""",151,"""Mustard""",3
"""Chicken Fingers""",185,"""Ketchup""",19
"""Chicken Fingers""",185,"""Mustard""",3


### Further Reading
- https://docs.pola.rs/user-guide/transformations/joins/#cartesian-product
- https://docs.pola.rs/api/python/stable/reference/dataframe/api/polars.DataFrame.join.html

## Joining on Multiple Columns
- Polars can join `DataFrames` based on matching values across multiple columns.
- Let's imagine we are running a chain of stores.
- We manage the stores' inventory in one system and the products in another.
- The inventory system stores the remaining units of each product per store.

In [43]:
inventory = pl.read_csv("store/inventory.csv")
inventory

store_id,product_id,quantity
i64,i64,i64
1,101,5
1,102,2
1,103,6
2,103,7
2,104,3
3,201,1


- The price system stores the price of each product per store.

In [44]:
prices = pl.read_csv("store/prices.csv")
prices

store_id,product_id,price
i64,i64,i64
1,102,10
1,101,10
1,103,5
2,105,12
2,201,15
3,201,8


- Review: An **inner join** merges rows where values exist and match in _both_ `DataFrames`.
- An inner join across multiple columns will merge rows based on matching values in both `store_id` and `product_id`.
- If the `store_id` matches but the `product_id` does not match, Polars does not join the rows.
- If the `product_id` matches but the `store_id` does not match, Polars does not join the rows.
- Pass the `on`/`left_on`/`right_on` parameters a list of column names.

In [45]:
inventory.join(prices, how="inner", on=["store_id", "product_id"])

store_id,product_id,quantity,price
i64,i64,i64,i64
1,102,2,10
1,101,5,10
1,103,6,5
3,201,1,8


- Review: A **full join** pulls in all rows from both `DataFrames`.
- Polars will join rows where `store_id` and `product_id` match.
- Polars will keep rows with no match in the other `DataFrame`...
- ...but those rows will have `null` values in the other columns.

In [46]:
inventory.join(prices, how="full", on=["store_id", "product_id"], suffix="_from_prices")

store_id,product_id,quantity,store_id_from_prices,product_id_from_prices,price
i64,i64,i64,i64,i64,i64
1,102,2,1,102,10
1,101,5,1,101,10
1,103,6,1,103,5
,,,2,105,12
,,,2,201,15
3,201,1,3,201,8
2,104,3,,,
2,103,7,,,


- Review: A **left join** keeps all rows from `inventory` and pulls in matching rows from `prices`.
- Polars will populate `null` for the `price` column if the combination of `store_id` and `product_id` does not exist in `prices`.
- Polars will exclude rows from `prices` with no matching combination of `store_id` + `product_id`.

In [47]:
inventory.join(prices, how="left", on=["store_id", "product_id"])

store_id,product_id,quantity,price
i64,i64,i64,i64
1,101,5,10
1,102,2,10
1,103,6,5
2,103,7,
2,104,3,
3,201,1,8


- Review: A **semi join** selects/filters the `inventory` rows where the `store_id` and `product_id` exist in `prices`.
- A semi join does not pull in the corresponding data from `prices`.
- Semi joins isolate the `store_id` and `product_id` combinations that exist in both tables.
- For example, a row with `store_id=1` and `product_id=101` exists in both tables.

In [48]:
inventory.join(prices, how="semi", on=["store_id", "product_id"])

store_id,product_id,quantity
i64,i64,i64
1,101,5
1,102,2
1,103,6
3,201,1


- Review: An **anti join** selects/filters the `inventory` rows where the combination of `store_id` and `product_id` does not exist in `prices`.
- An anti join identifies which combinations of `inventory`'s `store_id` and `product_id` do not exist in `prices`.
- For example, a row with `store_id=2` and `product_id=103` exists in `inventory` but does not exist in the `prices` table.

In [49]:
inventory.join(prices, how="anti", on=["store_id", "product_id"])

store_id,product_id,quantity
i64,i64,i64
2,103,7
2,104,3


### Further Reading
- https://docs.pola.rs/api/python/stable/reference/dataframe/api/polars.DataFrame.join.html

## The validate Parameter
- The `validate` parameter of the `join` method asserts the uniqueness of the join keys.
- Think of `validate` as a safety check to confirm the `join` method is doing what we expect.
- The default validation is `m:m` (join keys can occur multiple times in both left and right `DataFrame`).
- `m:m` turns off the validation check entirely. It doesn't matter if a join key occurs 0 times, 1 time, or multiple times.

In [50]:
students = pl.read_csv("college/students.csv")
students.head(2)

student_id,name
i64,str
101,"""Emma"""
102,"""Noah"""


In [51]:
enrollments = pl.read_csv("college/enrollments.csv")
enrollments.head(2)

student_id,class_name
i64,str
101,"""Yoga"""
101,"""Boxing"""


In [52]:
# left : right
# 1 - join key will occur only once
# m - join key can occur multiple times ("many")
# m:m (many-to-many)
students.join(enrollments, how="inner", on="student_id", validate="m:m")

student_id,name,class_name
i64,str,str
101,"""Emma""","""Yoga"""
101,"""Emma""","""Boxing"""
102,"""Noah""","""Yoga"""
103,"""Olivia""","""Spin"""
103,"""Olivia""","""Boxing"""
103,"""Olivia""","""CrossFit"""


- The `1:m` (1 to many) validation checks that the join key(s) are unique in the left dataset.
- Each `student_id` must only exist once in the `students` `DataFrame` (that's the `1` part).
- There _can_ be multiple occurrences of the same `student_id` in `enrollments` (that's the `m` part).

In [53]:
# students (1) -> join key of student_id cannot repeat
# enrollments (m) -> join key of student_id can repeat
students.join(enrollments, how="inner", on="student_id", validate="1:m")

student_id,name,class_name
i64,str,str
101,"""Emma""","""Yoga"""
101,"""Emma""","""Boxing"""
102,"""Noah""","""Yoga"""
103,"""Olivia""","""Spin"""
103,"""Olivia""","""Boxing"""
103,"""Olivia""","""CrossFit"""


- A `validate` parameter set to `1:1` asserts that join keys are unique in both left and right `DataFrames`.
- Each value in the left `DataFrame` can match at most once with a value in the right `DataFrame`.
- A `1:1` validation will fail here because the same `student_id` repeats in the `enrollments` `DataFrame`.

In [54]:
# students (1) - join key of student_id must be unique
# enrollments (1) - join key of student_id must also be unique
# Left commented out because the 1:1 check fails here and the call raises an error.
# students.join(enrollments, how="inner", on="student_id", validate="1:1")

- The inverted `m:1` (many to 1) validation checks that the join key(s) are unique in the right dataset.
- There _can_ be multiple occurrences of the same `student_id` in `enrollments` (that's the `m` part).
- Each `student_id` must only exist once in the `students` `DataFrame` (that's the `1` part).
- The example is the same as the previous one but the left and right `DataFrames` are flipped.

In [55]:
# enrollments (m) - join key of student_id can repeat
# students (1) - join key of student_id must be unique
enrollments.join(students, how="inner", on="student_id", validate="m:1")

student_id,class_name,name
i64,str,str
101,"""Yoga""","""Emma"""
101,"""Boxing""","""Emma"""
102,"""Yoga""","""Noah"""
103,"""Spin""","""Olivia"""
103,"""Boxing""","""Olivia"""
103,"""CrossFit""","""Olivia"""


### Further Reading
- https://docs.pola.rs/api/python/stable/reference/dataframe/api/polars.DataFrame.join.html

## The join_asof Method I
- The `join_asof` method joins rows on the _nearest_ key rather than an exact match.
- This technique is particularly helpful when working with time series data.
- A row's datetime value may not perfectly match a datetime entry in another table.
- "As of" means "up to or before a given time", e.g., "The system was running as of 9am this morning."
- The `outages.csv` file shows the timestamps of issues on a website.

In [56]:
outages = pl.read_csv("outages/outages.csv", try_parse_dates=True).sort("timestamp")
outages

timestamp,reported_issue
datetime[μs],str
2026-01-01 00:05:00,"""Slow load times"""
2026-01-01 00:25:00,"""Website down"""
2026-01-01 00:47:00,"""SSL handshake failed"""
2026-01-01 01:13:00,"""Unexpected redirect"""
2026-01-01 01:50:00,"""500 errors"""


- The `uptime_checks.csv` shows the timestamps of uptime checks.
- The website performs an uptime check every 10 minutes.
- A value of "OK" indicates no issue at that time.

In [57]:
uptime_checks = pl.read_csv("outages/uptime_checks.csv", try_parse_dates=True).sort(
    "check_timestamp"
)
uptime_checks.head()

check_timestamp,status
datetime[μs],str
2026-01-01 00:00:00,"""OK"""
2026-01-01 00:10:00,"""Timeout"""
2026-01-01 00:20:00,"""Timeout"""
2026-01-01 00:30:00,"""OK"""
2026-01-01 00:40:00,"""OK"""


- To use `join_asof`, sort both `DataFrames` by the join column (the `on` column, or `left_on`/`right_on` when the names differ).
- A backward search selects the last row in the right `DataFrame` whose key is less than or equal to the left's key.
- The default strategy is `backward`.

In [58]:
# The default strategy is "backward", so both calls return the same result.
outages.join_asof(uptime_checks, left_on="timestamp", right_on="check_timestamp")
outages.join_asof(
    uptime_checks, left_on="timestamp", right_on="check_timestamp", strategy="backward"
)

timestamp,reported_issue,check_timestamp,status
datetime[μs],str,datetime[μs],str
2026-01-01 00:05:00,"""Slow load times""",2026-01-01 00:00:00,"""OK"""
2026-01-01 00:25:00,"""Website down""",2026-01-01 00:20:00,"""Timeout"""
2026-01-01 00:47:00,"""SSL handshake failed""",2026-01-01 00:40:00,"""OK"""
2026-01-01 01:13:00,"""Unexpected redirect""",2026-01-01 01:10:00,"""OK"""
2026-01-01 01:50:00,"""500 errors""",2026-01-01 01:50:00,"""OK"""

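- The backward rule can be sketched in plain Python with the standard library's `bisect` module. This is only an illustration of the matching logic, not how Polars implements it. Timestamps are simplified to minutes after midnight, and the full list of check times is an assumption based on the 10-minute interval.

```python
from bisect import bisect_right


def backward_match(sorted_right_keys, left_key):
    """Return the last right key <= left_key, or None if there is none."""
    position = bisect_right(sorted_right_keys, left_key)
    return sorted_right_keys[position - 1] if position > 0 else None


# Minutes after midnight: one uptime check every 10 minutes (assumed range).
check_minutes = list(range(0, 120, 10))
# The five outage timestamps from the table above, as minutes after midnight.
outage_minutes = [5, 25, 47, 73, 110]

matches = [backward_match(check_minutes, minute) for minute in outage_minutes]
print(matches)  # [0, 20, 40, 70, 110]
```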

- A forward search selects the first row in the right `DataFrame` whose key is greater than or equal to the left's key.

In [59]:
outages.join_asof(
    uptime_checks, left_on="timestamp", right_on="check_timestamp", strategy="forward"
)

timestamp,reported_issue,check_timestamp,status
datetime[μs],str,datetime[μs],str
2026-01-01 00:05:00,"""Slow load times""",2026-01-01 00:10:00,"""Timeout"""
2026-01-01 00:25:00,"""Website down""",2026-01-01 00:30:00,"""OK"""
2026-01-01 00:47:00,"""SSL handshake failed""",2026-01-01 00:50:00,"""Timeout"""
2026-01-01 01:13:00,"""Unexpected redirect""",2026-01-01 01:20:00,"""OK"""
2026-01-01 01:50:00,"""500 errors""",2026-01-01 01:50:00,"""OK"""


- A `strategy` of `nearest` finds the closest match (least distance to travel).
- If a timestamp falls exactly midway between two candidates, Polars picks the later one in this dataset (`00:05:00` matches `00:10:00`), much like 0.5 is rounded up to 1.
- Notice that Polars matches `2026-01-01 00:47:00` with the `2026-01-01 00:50:00` timestamp (forward).
- Conversely, Polars matches `2026-01-01 01:13:00` with the `2026-01-01 01:10:00` timestamp (backward).

In [60]:
outages.join_asof(
    uptime_checks, left_on="timestamp", right_on="check_timestamp", strategy="nearest"
)

timestamp,reported_issue,check_timestamp,status
datetime[μs],str,datetime[μs],str
2026-01-01 00:05:00,"""Slow load times""",2026-01-01 00:10:00,"""Timeout"""
2026-01-01 00:25:00,"""Website down""",2026-01-01 00:30:00,"""OK"""
2026-01-01 00:47:00,"""SSL handshake failed""",2026-01-01 00:50:00,"""Timeout"""
2026-01-01 01:13:00,"""Unexpected redirect""",2026-01-01 01:10:00,"""OK"""
2026-01-01 01:50:00,"""500 errors""",2026-01-01 01:50:00,"""OK"""


### Further Reading
- https://docs.pola.rs/user-guide/transformations/joins/#asof-join
- https://docs.pola.rs/api/python/stable/reference/dataframe/api/polars.DataFrame.join_asof.html

## The join_asof Method II: Tolerance
- The `tolerance` parameter sets the maximum distance allowed between the left key and its match in the given search direction.
- Each unit of time has a corresponding symbol:
    - `ns` for nanosecond
    - `us` for microsecond
    - `ms` for millisecond
    - `s` for second
    - `m` for minute
    - `h` for hour
    - `d` for day
    - `w` for week
    - `mo` for month
    - `q` for quarter
    - `y` for year

In [61]:
outages = pl.read_csv("outages/outages.csv", try_parse_dates=True).sort("timestamp")
uptime_checks = pl.read_csv("outages/uptime_checks.csv", try_parse_dates=True).sort(
    "check_timestamp"
)

- For example, `4m` (4 minutes) and a `backward` strategy instruct Polars to look for a matching timestamp in the previous 4 minutes.
- The outage at `2026-01-01 00:05:00` no longer matches `2026-01-01 00:00:00` because the check is 5 minutes earlier, which is outside the 4-minute window.
- Polars supplies `null` where it cannot join a complementary row.

In [62]:
outages.join_asof(
    uptime_checks,
    left_on="timestamp",
    right_on="check_timestamp",
    strategy="backward",
    tolerance="4m",
)

timestamp,reported_issue,check_timestamp,status
datetime[μs],str,datetime[μs],str
2026-01-01 00:05:00,"""Slow load times""",,
2026-01-01 00:25:00,"""Website down""",,
2026-01-01 00:47:00,"""SSL handshake failed""",,
2026-01-01 01:13:00,"""Unexpected redirect""",2026-01-01 01:10:00,"""OK"""
2026-01-01 01:50:00,"""500 errors""",2026-01-01 01:50:00,"""OK"""


- A duration string can combine symbols: `6d7h5m` declares a tolerance range of "6 days, 7 hours, 5 minutes".
- There is no tolerance by default, so Polars keeps searching in the strategy's direction until it finds a match (if one exists).

In [63]:
outages.join_asof(
    uptime_checks,
    left_on="timestamp",
    right_on="check_timestamp",
    strategy="forward",
    tolerance="4m",
)

timestamp,reported_issue,check_timestamp,status
datetime[μs],str,datetime[μs],str
2026-01-01 00:05:00,"""Slow load times""",,
2026-01-01 00:25:00,"""Website down""",,
2026-01-01 00:47:00,"""SSL handshake failed""",2026-01-01 00:50:00,"""Timeout"""
2026-01-01 01:13:00,"""Unexpected redirect""",,
2026-01-01 01:50:00,"""500 errors""",2026-01-01 01:50:00,"""OK"""


### Further Reading
- https://docs.pola.rs/user-guide/transformations/joins/#asof-join

## The join_asof Method III: The by Parameter
- Some datasets will require a join by exact keys before performing an approximate match.
- The `by` parameter sets the column(s) to match on _exactly_ between the two `DataFrames`.
- The `on` parameter sets the column to match on nearness between the two `DataFrames`.
- Sort the `by` columns first, then sort the `on` key for each value within them.

In [64]:
transactions = pl.read_csv(
    "exchange_rates/transactions.csv", try_parse_dates=True
).sort("timestamp")
transactions.head(3)

timestamp,amount,currency
datetime[μs],f64,str
2026-01-02 00:10:43,401.28,"""JPY"""
2026-01-02 00:13:25,270.73,"""EUR"""
2026-01-02 00:48:26,13.56,"""GBP"""


In [65]:
exchange_rates = pl.read_csv("exchange_rates/rates.csv", try_parse_dates=True).sort(
    "timestamp"
)
exchange_rates.head(3)

timestamp,currency,usd_rate
datetime[μs],str,f64
2026-01-02 00:00:00,"""EUR""",1.0921
2026-01-02 00:00:00,"""GBP""",1.2744
2026-01-02 00:00:00,"""JPY""",0.0081


- Let's say we join based on the timestamps alone.
- Here's an example where `join_asof` does not work as expected.
- Polars correctly locates the closest earlier timestamp...
- ...but it completely ignores the `currency` columns, whose values need to match.
- We need to join on a matching currency, _then_ find the closest timestamp.

In [66]:
transactions.join_asof(
    exchange_rates,
    on="timestamp",
    strategy="backward",
    coalesce=False,
    suffix="_from_exchange_rates",
)

timestamp,amount,currency,timestamp_from_exchange_rates,currency_from_exchange_rates,usd_rate
datetime[μs],f64,str,datetime[μs],str,f64
2026-01-02 00:10:43,401.28,"""JPY""",2026-01-02 00:00:00,"""JPY""",0.0081
2026-01-02 00:13:25,270.73,"""EUR""",2026-01-02 00:00:00,"""JPY""",0.0081
2026-01-02 00:48:26,13.56,"""GBP""",2026-01-02 00:00:00,"""JPY""",0.0081
2026-01-02 01:17:51,57.84,"""EUR""",2026-01-02 01:00:00,"""JPY""",0.0077
2026-01-02 01:27:13,461.33,"""EUR""",2026-01-02 01:00:00,"""JPY""",0.0077
…,…,…,…,…,…
2026-01-02 19:40:25,321.19,"""EUR""",2026-01-02 19:00:00,"""JPY""",0.0073
2026-01-02 19:42:12,238.8,"""GBP""",2026-01-02 19:00:00,"""JPY""",0.0073
2026-01-02 19:48:55,131.45,"""GBP""",2026-01-02 19:00:00,"""JPY""",0.0073
2026-01-02 22:45:32,16.17,"""GBP""",2026-01-02 22:00:00,"""JPY""",0.0076


- Let's re-import the data. We want to sort on the join column (`currency`) first, then the near-join column (`timestamp`) after.

In [67]:
transactions = pl.read_csv(
    "exchange_rates/transactions.csv", try_parse_dates=True
).sort("currency", "timestamp")
transactions.head(3)

timestamp,amount,currency
datetime[μs],f64,str
2026-01-02 00:13:25,270.73,"""EUR"""
2026-01-02 01:17:51,57.84,"""EUR"""
2026-01-02 01:27:13,461.33,"""EUR"""


In [68]:
exchange_rates = pl.read_csv("exchange_rates/rates.csv", try_parse_dates=True).sort(
    "currency", "timestamp"
)
exchange_rates.head(3)

timestamp,currency,usd_rate
datetime[μs],str,f64
2026-01-02 00:00:00,"""EUR""",1.0921
2026-01-02 01:00:00,"""EUR""",1.1466
2026-01-02 02:00:00,"""EUR""",1.0775


- Polars will warn that it cannot verify that the columns are sorted when we use `by`.
- The warning does not mean the result is incorrect.
- The first transaction (`2026-01-02 00:13:25` for EUR) matches `2026-01-02 00:00:00` for EUR and brings over its 1.0921 rate.

In [69]:
transactions.join_asof(
    exchange_rates,
    on="timestamp",
    by="currency",
    strategy="backward",
    coalesce=False,
    suffix="_from_exchange_rates",
)


timestamp,amount,currency,timestamp_from_exchange_rates,usd_rate
datetime[μs],f64,str,datetime[μs],f64
2026-01-02 00:13:25,270.73,"""EUR""",2026-01-02 00:00:00,1.0921
2026-01-02 01:17:51,57.84,"""EUR""",2026-01-02 01:00:00,1.1466
2026-01-02 01:27:13,461.33,"""EUR""",2026-01-02 01:00:00,1.1466
2026-01-02 03:27:57,51.97,"""EUR""",2026-01-02 03:00:00,1.1277
2026-01-02 06:08:07,220.8,"""EUR""",2026-01-02 06:00:00,1.1129
…,…,…,…,…
2026-01-02 13:04:49,267.09,"""JPY""",2026-01-02 13:00:00,0.008
2026-01-02 15:07:13,329.1,"""JPY""",2026-01-02 15:00:00,0.0082
2026-01-02 17:03:03,345.53,"""JPY""",2026-01-02 17:00:00,0.0076
2026-01-02 17:11:41,465.82,"""JPY""",2026-01-02 17:00:00,0.0076


In [70]:
exchange_rates.filter(pl.col("currency") == "EUR")

timestamp,currency,usd_rate
datetime[μs],str,f64
2026-01-02 00:00:00,"""EUR""",1.0921
2026-01-02 01:00:00,"""EUR""",1.1466
2026-01-02 02:00:00,"""EUR""",1.0775
2026-01-02 03:00:00,"""EUR""",1.1277
2026-01-02 04:00:00,"""EUR""",1.1141
…,…,…
2026-01-02 19:00:00,"""EUR""",1.1448
2026-01-02 20:00:00,"""EUR""",1.1495
2026-01-02 21:00:00,"""EUR""",1.0716
2026-01-02 22:00:00,"""EUR""",1.0733


- Finally, let's calculate the amount paid in USD by multiplying the amount by the conversion rate.

In [71]:
transactions.join_asof(
    exchange_rates,
    on="timestamp",
    by="currency",
    strategy="backward",
    coalesce=False,
    suffix="_from_exchange_rates",
).with_columns((pl.col("amount") * pl.col("usd_rate")).alias("total_paid_in_usd"))


timestamp,amount,currency,timestamp_from_exchange_rates,usd_rate,total_paid_in_usd
datetime[μs],f64,str,datetime[μs],f64,f64
2026-01-02 00:13:25,270.73,"""EUR""",2026-01-02 00:00:00,1.0921,295.664233
2026-01-02 01:17:51,57.84,"""EUR""",2026-01-02 01:00:00,1.1466,66.319344
2026-01-02 01:27:13,461.33,"""EUR""",2026-01-02 01:00:00,1.1466,528.960978
2026-01-02 03:27:57,51.97,"""EUR""",2026-01-02 03:00:00,1.1277,58.606569
2026-01-02 06:08:07,220.8,"""EUR""",2026-01-02 06:00:00,1.1129,245.72832
…,…,…,…,…,…
2026-01-02 13:04:49,267.09,"""JPY""",2026-01-02 13:00:00,0.008,2.13672
2026-01-02 15:07:13,329.1,"""JPY""",2026-01-02 15:00:00,0.0082,2.69862
2026-01-02 17:03:03,345.53,"""JPY""",2026-01-02 17:00:00,0.0076,2.626028
2026-01-02 17:11:41,465.82,"""JPY""",2026-01-02 17:00:00,0.0076,3.540232


### Further Reading
- https://docs.pola.rs/user-guide/transformations/joins/#asof-join