
Evaluation of automatic collocation extraction methods for language learning

This repository contains the code for the paper "Evaluation of automatic collocation extraction methods for language learning", accepted at the 14th Workshop on Innovative Use of NLP for Building Educational Applications (BEA) at the 57th Annual Meeting of the Association for Computational Linguistics (ACL) 2019. It also includes the poster presented at the workshop.

Installation

  • Python 3.6+ (installing via Anaconda is recommended and tested)
  • spaCy
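
As a quick sanity check of the setup, this minimal sketch verifies the Python version and loads a spaCy pipeline (the en_core_web_sm model name is an assumption here; the notebooks may rely on a different pipeline):

```python
import sys

import spacy

# Require Python 3.6+, as stated above.
assert sys.version_info >= (3, 6), "Python 3.6+ is required"

# Load a small English pipeline; install it first with:
#   python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
print("spaCy", spacy.__version__, "pipeline:", nlp.pipe_names)
```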

Data

Download the data and evaluation files for all three test sets: Sketch Engine, FLAX, and Elia.

Note: Elia is not available online yet; however, one can access the code and data used to generate Elia's database of collocations.

Collocation Extraction

Extract collocates for each of the three test sets: Sketch Engine, FLAX, and Elia.

  1. For Sketch Engine (SE) and FLAX (FL), the collocations are web-scraped as follows (see the scraping sketch after this list):

    • Open the IPython notebook (SketchEngine_WebScraping.ipynb for Sketch Engine (SE) and FLAX_WebScraping.ipynb for FLAX (FL)).
    • Change the path to point to the reference set.
    • Change the location where the collocation data .csv files for evaluation will be stored.
    • Run all cells in the notebook.
  2. For Elia (EL), the collocations are filtered from Elia's database (see the filtering sketch after this list):

    • Open the IPython notebook CollocationsExtraction_Elia.ipynb.
    • Change the paths to the .csv files for the reference set and the collocates database from Elia.
    • Change the location where the filtered collocate .csv files from Elia will be stored for evaluation.
    • Run all cells in the notebook.
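
For reference, here is a minimal sketch of the scraping pattern used for Sketch Engine and FLAX. The URL and the .collocation CSS selector are hypothetical placeholders; the real targets and selectors live in the notebooks above:

```python
import csv

import requests
from bs4 import BeautifulSoup

def scrape_collocations(url, out_csv):
    """Fetch a results page and write the collocations it lists to a CSV."""
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    # Hypothetical selector: the real notebooks target the markup of the
    # Sketch Engine / FLAX result pages.
    collocations = [el.get_text(strip=True) for el in soup.select(".collocation")]
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["collocation"])
        writer.writerows([c] for c in collocations)

scrape_collocations("https://example.org/collocations?word=make", "SE_make.csv")
```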
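
And a minimal sketch of the Elia filtering step, assuming hypothetical file and column names (e.g. headword); the notebook's actual schema may differ:

```python
import pandas as pd

# Keep only the Elia database entries whose headword appears in the
# reference set (file and column names here are assumptions).
reference = pd.read_csv("reference_set.csv")       # gold-standard reference set
elia_db = pd.read_csv("Elia_collocates_db.csv")    # Elia's full collocates database

filtered = elia_db[elia_db["headword"].isin(reference["headword"])]
filtered.to_csv("Elia_collocations_filtered.csv", index=False)
print(f"Kept {len(filtered)} of {len(elia_db)} collocations")
```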

Evaluation

  1. For evaluation of all three test sets (Sketch Engine, FLAX, and Elia):

    • Open the IPython notebook Evaluation_Collocations.ipynb.
    • In the corresponding cell for each test set, change the paths to the reference set, the collocation data .csv files, and the location where the evaluation files will be generated.
    • Run all cells in the notebook.
  2. Open the generated evaluation files to compute the precision and recall metrics, per collocation type and for each test set (a minimal metric sketch follows this list).
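
As a rough illustration of step 2, a set-based precision/recall computation might look like the sketch below. The file and column names are assumptions, and the actual evaluation may additionally match per collocation type:

```python
import pandas as pd

def precision_recall(extracted_csv, reference_csv):
    """Set-based precision/recall of extracted collocations vs. the gold standard."""
    extracted = set(pd.read_csv(extracted_csv)["collocation"].str.lower())
    gold = set(pd.read_csv(reference_csv)["collocation"].str.lower())
    true_positives = len(extracted & gold)   # extracted collocations found in gold
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

p, r = precision_recall("SE_collocations.csv", "reference_set.csv")
print(f"Precision: {p:.3f}  Recall: {r:.3f}")
```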

Results Plot

  1. Many graph variants were used to plot the results, and the final one was selected. To generate all of them (see the plotting sketch after this list):
    • Open the IPython notebook Results_Plot.ipynb.
    • Change the path to the folder containing the Elia_CollocationMeasures_Comparison.csv file (provided with the code). By default it is in the current directory, so the variable 'plotDataFolder' need not be changed.
    • Run all cells in the notebook.
    • View the plots inline in the notebook, or click one to open it in a new tab and save it locally.
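
For orientation, a minimal plotting sketch over the provided CSV might look as follows. The column layout assumed here (a first column naming the association measure, numeric score columns after it) is a guess; the notebook's final graphs are more elaborate:

```python
import pandas as pd
import matplotlib.pyplot as plt

plotDataFolder = "."  # the CSV ships with the code, so the default works
df = pd.read_csv(f"{plotDataFolder}/Elia_CollocationMeasures_Comparison.csv")

# One bar group per association measure, one bar per score column (assumed layout).
df.plot(x=df.columns[0], kind="bar", figsize=(8, 4))
plt.ylabel("Score")
plt.title("Comparison of association measures (Elia)")
plt.tight_layout()
plt.show()
```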

Citation

If you use this work in your research, we would appreciate it if you cite the paper:

Vishal Bhalla and Klara Klimcikova. 2019. Evaluation of automatic collocation extraction methods for language learning. In Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications (BEA), pages 264–274, Florence, Italy. Association for Computational Linguistics.

or, in bibtex format:

@inproceedings{bhalla-klimcikova-2019-autocoleval,
    title = "Evaluation of automatic collocation extraction methods for language learning",
    author = "Bhalla, Vishal and Klimcikova, Klara",
    booktitle = "Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications (BEA) at The 57th Annual Meeting of The Association for Computational Linguistics (ACL)",
    month = aug,
    year = "2019",
    address = "Florence, Italy",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/W19-4428",
    pages = "264--274",
    abstract = "A number of methods have been proposed to automatically extract collocations, i.e., conventionalized lexical combinations, from text corpora. However, the attempts to evaluate and compare them with a specific application in mind lag behind. This paper compares three end-to-end resources for collocation learning, all of which used the same corpus but different methods. Adopting a gold-standard evaluation method, the results show that the method of dependency parsing outperforms regex-over-pos in collocation identification. The lexical association measures (AMs) used for collocation ranking perform about the same overall but differently for individual collocation types. Further analysis has also revealed that there are considerable differences between other commonly used AMs.",
}
