Developing a tool for fair and reproducible use of paid crowdsourcing in the digital humanities

This repository contains the code for the paper "Developing a tool for fair and reproducible use of paid crowdsourcing in the digital humanities", to be presented at the 6th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2022).

The code demonstrates how the tool described in the paper may be used to create crowdsourcing pipelines on the Toloka platform.

Prerequisites

To run the example code contained in this repository, you first need to install the tool described in the paper.

The tool is available on PyPI, and may be installed by running the command pip install abulafia.

We recommend installing the tool and its dependencies into a virtual environment.

Codebase

The directory config contains YAML configuration files for each crowdsourcing task and action in the crowdsourcing pipeline.

The directory data contains the input data as a TSV file.
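Before running the pipeline, it can be useful to check what the input data looks like. The following is a minimal sketch that reads a tab-separated file using only the Python standard library. The function name read_tsv is hypothetical, and the sketch assumes that the first row of the file holds the column names, which this README does not confirm.

```python
import csv


def read_tsv(path):
    """Return the column names and rows of a tab-separated file.

    Assumption: the first row of the file contains the column names.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        rows = list(reader)  # each row is a dict keyed by column name
        return reader.fieldnames, rows
```

Calling read_tsv on the TSV file in the data directory returns the header and a list of rows, which can be printed to verify that the file was parsed as expected.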

The directory instructions contains instructions for the crowdsourcing tasks as an HTML file.

Running the pipeline

To run the crowdsourcing pipeline, execute the file run_pipeline.py from the command line using the following command:

python3 run_pipeline.py -c path_to_credentials.json

As instructed in the tool repository, you need to store your Toloka credentials in a JSON file and pass the path to this file to run_pipeline.py using the -c flag.

We recommend executing the file in the Toloka sandbox by setting the value of the variable mode to SANDBOX.
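As a rough illustration, the credentials file could be created with a short Python script. This is a sketch under assumptions: the key names "token" and "mode", the helper name write_credentials, and the idea that mode is set inside the credentials file are not confirmed by this README, so check the tool repository for the exact format it expects.

```python
import json
from pathlib import Path

# Assumption: these values mirror the two Toloka environments.
ALLOWED_MODES = ("SANDBOX", "PRODUCTION")


def write_credentials(path, token, mode="SANDBOX"):
    """Write a credentials JSON file and return its contents as a dict.

    Assumption: the tool reads the keys "token" and "mode"; verify this
    against the tool repository before relying on it.
    """
    if mode not in ALLOWED_MODES:
        raise ValueError(f"mode must be one of {ALLOWED_MODES}, got {mode!r}")
    credentials = {"token": token, "mode": mode}
    Path(path).write_text(json.dumps(credentials, indent=4), encoding="utf-8")
    return credentials
```

For example, write_credentials("path_to_credentials.json", "YOUR_OAUTH_TOKEN") writes a file that defaults to the sandbox, so an accidental run does not spend money on the production platform. Avoid committing the resulting file to version control, since it contains your token.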

You can then register on the Toloka sandbox as a performer to view and complete the tasks in the pipeline.

Contact

If you have questions about the pipeline, feel free to open an issue in this repository, or contact us via e-mail. Our e-mail addresses can be found in the conference publication.
