Scrapes and post-processes Peter Larsson's website Alltime Athletics.

thomascamminady/alltime_athletics_python

alltime_athletics_python

Scrapes Peter Larsson's website Alltime Athletics. Check out my blog to see how this data can be visualized.

Just give me the data

The latest data frame is available as a CSV file and in Parquet format. To load the CSV directly with pandas, run

import pandas as pd

df = pd.read_csv(
    "https://media.githubusercontent.com/media/thomascamminady/alltime_athletics_python/main/dataframes/latest_version_alltime_athletics.csv"
)

As an example, here are the women's world record performances (as of 2023-06-12), sorted by the date of the world record.

    event                   name                       result    date of event
 0  800 metres              Jarmila Kratochvílová      1:53.28   1983-07-26
 1  400 metres              Marita Koch                47.60     1985-10-06
 2  100 metres              Florence Griffith-Joyner   10.49     1988-07-16
 3  200 metres              Florence Griffith-Joyner   21.34     1988-09-29
 4  60 metres               Irina Privalova            6.92      1993-02-11
 5  3000 metres             Wang Junxia                8:06.11   1993-09-13
 6  60 metres               Irina Privalova            6.92      1995-02-09
 7  10 km race walk         Yelena Nikolayeva          41:04     1996-04-20
 8  1000 metres             Svetlana Masterkova        2:28.98   1996-08-23
 9  5000 metres track walk  Gillian O'Sullivan         20:02.60  2002-07-14
10  2 Miles                 Meseret Defar              8:58.58   2007-09-14
11  20 km race walk         Yelena Lashmanova          1:23:39   2018-06-09
12  3000m steeplechase      Beatrice Chepkoech         8:44.32   2018-07-20
13  300 metres              Shaunae Miller-Uibo        34.41     2019-06-20
14  1 Mile                  Sifan Hassan               4:12.33   2019-07-12
15  2000m steeplechase      Gesa Felicitas Krause      5:52.80   2019-09-01
16  marathon                Brigid Kosgei              2:14:04   2019-10-13
17  15km road               Letesenbet Gidey           44:20     2019-11-17
18  50 km race walk         Yelena Lashmanova          3:50:42   2020-09-05
19  10000 metres            Letesenbet Gidey           29:01.03  2021-06-08
20  2000 metres             Francine Niyonsaba         5:21.56   2021-09-14
21  half-marathon           Letesenbet Gidey           62:52     2021-10-24
22  20km road               Letesenbet Gidey           59:46+    2021-10-24
23  10km road               Yalemzerf Yehualaw         29:14     2022-02-27
24  400m hurdles            Sydney McLaughlin-Levrone  50.68     2022-07-22
25  100m/110m hurdles       Oluwatobiloba Amusan       12.12     2022-07-24
26  30km road               Ruth Chepngetich           1:34:01+  2022-10-09
27  1500 metres             Faith Kipyegon             3:49.11   2023-06-02
28  5000 metres             Faith Kipyegon             14:05.20  2023-06-09

You would get this table by using polars and running

import polars as pl

df = pl.read_csv(
    "https://media.githubusercontent.com/media/thomascamminady/alltime_athletics_python/main/dataframes/latest_version_alltime_athletics.csv"
)

(
    df.filter(pl.col("rank") == 1)
    .filter(pl.col("sex") == "female")
    .select("event", "name", "result", "date of event")
    .sort("date of event")
)
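
If you prefer pandas, the same selection can be sketched roughly as follows. This is an illustrative equivalent, not code shipped with the package: the helper name womens_world_records is made up for this example, and it assumes the column names used in the polars query above ("rank", "sex", "event", "name", "result", "date of event") with "rank" stored as an integer.

```python
import pandas as pd


def womens_world_records(df: pd.DataFrame) -> pd.DataFrame:
    """Return the top-ranked women's performance rows, sorted by date.

    Hypothetical helper for illustration; assumes the column names
    used in the polars example.
    """
    # Keep only rank-1 rows for women.
    mask = (df["rank"] == 1) & (df["sex"] == "female")
    columns = ["event", "name", "result", "date of event"]
    # ISO-formatted date strings (YYYY-MM-DD) sort chronologically as text.
    return (
        df.loc[mask, columns]
        .sort_values("date of event")
        .reset_index(drop=True)
    )
```

Calling womens_world_records(df) on the data frame loaded with pd.read_csv should give the same rows as the polars query, provided the assumptions above hold.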

Download

If you have cloned the source code, you can run

poetry run python alltime_athletics_python/app.py

If you installed this package from PyPI, run

from alltime_athletics_python.io import download_data
download_data()

Note that download_data() fetches the data from Alltime Athletics as is, without any cleaning. You will need to postprocess it before analysis.

Postprocessing

To read the processed data, run

from alltime_athletics_python.io import import_running_only_events
df = import_running_only_events("./data")
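
One typical postprocessing step for running events is turning result strings such as 1:53.28, 2:14:04, or 59:46+ into seconds so they can be compared numerically. The function below is a minimal sketch of that idea, not the package's own implementation: the name result_to_seconds is hypothetical, and it assumes results are colon-separated time fields with an optional trailing "+" marker, as in the table above.

```python
def result_to_seconds(result: str) -> float:
    """Convert a time string like "1:53.28" or "2:14:04" to seconds.

    Hypothetical helper for illustration; assumes colon-separated
    fields (hours:minutes:seconds, minutes:seconds, or seconds only).
    """
    # Drop surrounding whitespace and a trailing "+" marker, e.g. "59:46+".
    cleaned = result.strip().rstrip("+")
    seconds = 0.0
    # Each colon shifts the running total up by a factor of 60.
    for part in cleaned.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds
```

For example, result_to_seconds("2:14:04") treats the fields as hours, minutes, and seconds, while result_to_seconds("62:52") treats them as minutes and seconds. Results in other notations would need their own handling.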

Development

To set up the project, simply run

make init

Credits

This tool takes no credit for the amazing effort of Peter Larsson, who compiles Alltime Athletics, a collection of track and field results that clearly took an enormous amount of work. Thank you, Peter Larsson.

All this tool does is make the data from Alltime Athletics easier to read.

This package was created with Cookiecutter and thomascamminady/cookiecutter-pypackage, a fork of the audreyr/cookiecutter-pypackage project template.
