This repository has been archived by the owner on Nov 20, 2023. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 0
wikimedia/research-datasets
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Folders and files
| Name | Name | Last commit message | Last commit date | |
|---|---|---|---|---|
Repository files navigation
# archived - see https://gitlab.wikimedia.org/repos/research # research repository This repository serves multiple purposes - library for sharing and reusing code for research at wmf. E.g. pipeline transformations for - graphs revision graph, diffs - redirect resolution, link graph between articles - topic modelling - ... - datasets. Pipelines that create datasets as output - tooling for running orchestrated machine learning pipelines in the data engineering infrastructure - running airflow dags - gpu training on yarn cluster - custom workloads using skein Instead of splitting different functionality into different repos prematurely, this will be a mono repo for now. ## Library To use this package, add it as a requirement to your setup.py or pip install it into a given python evironment. `pip install "wmf_research @ git+https://github.com/wikimedia/research-datasets.git@master"` ### Use interactively todo. describe interactive ipython notebook ## Datasets ### covid19 datasets about the covid19 pandemic that the wmf released publicly ### commons images download images uploaded to wikipedia commons from swift and make the available on hdfs for machine learning use cases
About
No description, website, or topics provided.