WNSub

This repository contains code for the WNSub lexical substitution dataset. The dataset was generated automatically from SemCor 3.0 and WordNet by retrieving synonyms corresponding to the gold senses of the target words.
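The retrieval step can be sketched as follows. This sketch makes several assumptions: each SemCor target carries a gold sense identifier, and its substitutes are the other lemma names in that sense's synset, with the target lemma itself dropped. The inventory below is a hypothetical toy stand-in for WordNet, and `substitutes_for` is an illustrative name, not a function from this repository. With NLTK's WordNet reader, a real lookup from a sense key would instead go through `wordnet.lemma_from_key(key).synset().lemma_names()`.

```python
# Hypothetical toy sense inventory standing in for WordNet: maps a sense
# identifier to the lemma names of its synset (WordNet style, with
# underscores joining multiword lemmas). Not read from the real WordNet.
TOY_INVENTORY: dict[str, list[str]] = {
    "bank.n.01": ["bank"],
    "bank.n.02": ["depository_financial_institution", "bank",
                  "banking_concern", "banking_company"],
}


def substitutes_for(gold_sense: str, lemma: str,
                    inventory: dict[str, list[str]]) -> list[str]:
    """Return synonyms of the gold sense, excluding the target lemma itself.

    Assumption: the target lemma is removed from its own substitute list
    and underscores are turned into spaces.
    """
    synonyms = inventory.get(gold_sense, [])
    return [s.replace("_", " ") for s in synonyms
            if s.lower() != lemma.lower()]


print(substitutes_for("bank.n.02", "bank", TOY_INVENTORY))
```

Filtering by the gold sense is what keeps substitutes sense-appropriate: the riverbank sense above yields no synonyms at all, while the financial sense yields three.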

Data

The dataset is available at data/wnsub/wnsub.jsonl. Each line in the file contains a JSON object with the following fields:

id: a unique identifier for the instance
context: the context sentence containing the target word
target_idx: the index of the word to be substituted
lemma: lemma of the target word
pos: part-of-speech (PoS) tag of the target word
substitutes: a list of substitutes retrieved from WordNet
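A minimal sketch for reading the file and plugging each substitute into its context follows. It assumes that `context` is either a list of tokens or a whitespace-separated string, and that `target_idx` is a 0-based token index. The README does not pin down either detail, so check them against the actual data. The sample record and the helper names `iter_instances` and `substituted_contexts` are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_instances(lines: Iterable[str]) -> Iterator[dict]:
    """Parse JSON Lines input (one object per line), skipping blank lines."""
    for line in lines:
        if line.strip():
            yield json.loads(line)


def substituted_contexts(instance: dict) -> list[str]:
    """Return one copy of the context per substitute, with the target replaced.

    Assumes `context` is a token list or a whitespace-separated string and
    `target_idx` is a 0-based token index.
    """
    context = instance["context"]
    tokens = list(context) if isinstance(context, list) else context.split()
    idx = instance["target_idx"]
    return [" ".join(tokens[:idx] + [sub] + tokens[idx + 1:])
            for sub in instance["substitutes"]]


# Hypothetical record for illustration only; it is not taken from wnsub.jsonl.
sample = ('{"id": "example.0", "context": "He sat on the bank of the river", '
          '"target_idx": 4, "lemma": "bank", "pos": "NOUN", '
          '"substitutes": ["shore", "edge"]}')

for inst in iter_instances([sample]):
    for sentence in substituted_contexts(inst):
        print(sentence)

# On the real file, pass the open file handle instead:
# with open("data/wnsub/wnsub.jsonl", encoding="utf-8") as f:
#     for inst in iter_instances(f):
#         ...
```

Streaming line by line keeps memory flat regardless of file size, which is the main reason to use JSON Lines rather than a single JSON array.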

Building

To build the dataset from scratch, follow these steps:

Install the necessary dependencies in a virtual environment by running the following command:

bash scripts/init.sh

Build the dataset by running the following command:

bash scripts/build_wnsub.sh
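After building, a quick sanity check of the output can catch malformed records. The sketch below assumes that the build writes to the same `data/wnsub/wnsub.jsonl` path given under Data. It checks that every line is valid JSON, that it carries the six documented fields, that `id` values are unique, and that `substitutes` is a list. The script and the name `validate_jsonl` are hypothetical, not part of the repository.

```python
import json
from pathlib import Path

# The six fields documented in the Data section.
REQUIRED_FIELDS = {"id", "context", "target_idx", "lemma", "pos", "substitutes"}


def validate_jsonl(path: Path) -> list[str]:
    """Return a list of problems found; an empty list means every record passed."""
    problems: list[str] = []
    seen_ids: set = set()
    with path.open(encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as err:
                problems.append(f"line {lineno}: invalid JSON ({err.msg})")
                continue
            if not isinstance(record, dict):
                problems.append(f"line {lineno}: not a JSON object")
                continue
            missing = REQUIRED_FIELDS - record.keys()
            if missing:
                problems.append(f"line {lineno}: missing {sorted(missing)}")
                continue
            # The README states that `id` is unique per instance.
            if record["id"] in seen_ids:
                problems.append(f"line {lineno}: duplicate id {record['id']!r}")
            seen_ids.add(record["id"])
            if not isinstance(record["substitutes"], list):
                problems.append(f"line {lineno}: substitutes is not a list")
    return problems


if __name__ == "__main__":
    path = Path("data/wnsub/wnsub.jsonl")
    if not path.exists():
        print(f"{path} not found; run scripts/build_wnsub.sh first")
    else:
        issues = validate_jsonl(path)
        print("\n".join(issues) if issues else "all records passed")
```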

Acknowledgement

The SemCor 3.0 dataset was accessed through the WSD Evaluation Framework (Raganato et al., 2017).

Alessandro Raganato, José Camacho-Collados and Roberto Navigli. Word Sense Disambiguation: A Unified Evaluation Framework and Empirical Comparison. In Proceedings of the European Chapter of the Association for Computational Linguistics (EACL), Valencia, Spain, April 3-7, 2017.
