thiippal/latech-clfl-2022


Developing a tool for fair and reproducible use of paid crowdsourcing in the digital humanities

This repository contains the code for the paper Developing a tool for fair and reproducible use of paid crowdsourcing in the digital humanities, to be presented at the 6th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2022).

The code demonstrates how the tool described in the paper may be used to create crowdsourcing pipelines on the Toloka platform.

Prerequisites

To run the example code contained in this repository, you first need to install the tool described in the paper.

The tool, abulafia, is available on PyPI and can be installed with the command pip install abulafia.

We recommend installing the tool and its dependencies into a virtual environment.

Codebase

The directory config contains YAML configuration files for each crowdsourcing task and action in the crowdsourcing pipeline.

The directory data contains the input data as a TSV file.
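If you want to inspect the input data before running the pipeline, a minimal sketch such as the following may help. It assumes the TSV file has a header row, and the function name read_tsv and the file name in the usage note are hypothetical; check the data directory for the actual file.

```python
import csv


def read_tsv(path):
    """Return the rows of a tab-separated file as a list of dictionaries.

    Assumption: the first line of the file is a header row with column names.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return list(reader)
```

For example, rows = read_tsv("data/input.tsv") followed by print(rows[0]) would show the first record, where data/input.tsv is a placeholder path.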

The directory instructions contains instructions for the crowdsourcing tasks as an HTML file.

Running the pipeline

To run the crowdsourcing pipeline, execute the file run_pipeline.py from the command line using the following command:

python3 run_pipeline.py -c path_to_credentials.json

As instructed in the tool repository, store your Toloka credentials in a JSON file and pass the path to this file to run_pipeline.py with the -c option.

We recommend executing the file in the Toloka sandbox by setting the value of the variable mode to SANDBOX.

You can then register on the Toloka sandbox as a performer to view and complete the tasks in the pipeline.
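The sketch below illustrates one way a script could read the credentials file given with -c and check the mode before anything is sent to Toloka. It is not the code in run_pipeline.py. The key names "token" and "mode", the value PRODUCTION, and the helper names load_credentials and parse_args are assumptions made for illustration; consult the tool repository for the exact credentials format.

```python
import argparse
import json
from pathlib import Path

# Assumption: the only valid modes are the sandbox and the production platform.
ALLOWED_MODES = {"SANDBOX", "PRODUCTION"}


def load_credentials(path):
    """Read a credentials JSON file and validate the fields assumed here."""
    creds = json.loads(Path(path).read_text(encoding="utf-8"))
    # Assumption: the file holds a "token" and a "mode" entry.
    missing = {"token", "mode"} - creds.keys()
    if missing:
        raise ValueError(f"Missing keys in credentials file: {sorted(missing)}")
    if creds["mode"] not in ALLOWED_MODES:
        raise ValueError(f"Unknown mode: {creds['mode']!r}")
    return creds


def parse_args(argv=None):
    """Parse the -c option that points to the credentials file."""
    parser = argparse.ArgumentParser(description="Credentials check (sketch)")
    parser.add_argument("-c", dest="creds", required=True,
                        help="path to the credentials JSON file")
    return parser.parse_args(argv)
```

Calling load_credentials(parse_args().creds) at the start of a script and stopping unless the returned mode is SANDBOX is a simple guard against spending money on the production platform by accident.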

Contact

If you have questions about the pipeline, feel free to open an issue in this repository, or contact us via e-mail. Our e-mail addresses can be found in the conference publication.

About

A repository for the conference paper at the LaTeCH-CLfL 2022 workshop.
