Few Shot Learning and Syntactic Invariance for Neural LMs

All the neural LMs are trained on the Wall Street Journal portion of the Penn Treebank.

invar_item_generation

A python notebook for generating the test items used to test the neural LMs. There are two ways to generate the argument structure tests generate_argstruct_tests generates tests for both past-tense and infinitive verbs, whereas generate_argstruct_tests_vbd generates tests for only past-tense verbs.

Files are saved in the /test_items_downsample directory.

/test_items_downsample/

Directory where the test items are stored after generation (these are csv files that end in tests).

Also stored in this directory are the outputs of the varios models on the tests. Models were run seperatly. The results for the number tests have an extra set of results pp, which include a prepositional phrase distractor and were generated after the base tests.

data_wrangling.rmd

This file reads in all the model outputs stored in the test_items_downsample directory and combines them into two files: argstruct_data.csv and number_data.csv. These are the main results files.

This file harmonizes each verb with a number of their usage statistics: The number of times it occurs as each POS tag, as well as the number of times it takes an object, as well as a passive subject.

invar_analysis.Rmd

This file is the main analysis script of the project. It reads in teh argstruct_data.csv and number_data.csv, analyzes them and generates images in the images folder.

/data/

• wsj_deps.conllu contains the PTB portion of the penn treebank turned into dependency parses, using the tools found here: https://nlp.stanford.edu/software/stanford-dependencies.shtml

•v_counts.csv: every token with a VBD, VBN or VB tag, its lemma POS and the number of times it occurs in the corpus. This can be useufl for choosing open-class items in creating the sentences.

•wsj_proportions.csv: for each verb, this gives the counts of the number of times it occurs with a passive subject and object. This information is used in data_wrangling.rmd to track what proportion of the time it takes an object.

• extract_verb_proportions.py generates wsj_proportions.csv

•filter.csv a list of verbs that are the same in passive participle and past tense, which are the only verbs we want to use for the argstruct learning experiment.

/vbd_only/

This is a seperate folder where we do analysis for the transformation tests for the argstruct learning. This folder is much like the test_items_downsample folder, except it also contains an analysis script vbd_only_analysis.Rmd that does a seperate analysis for these tests.

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
.ipynb_checkpoints		.ipynb_checkpoints
data		data
images		images
test_items_downsample		test_items_downsample
vbd_only		vbd_only
.DS_Store		.DS_Store
.Rhistory		.Rhistory
README.md		README.md
arg_transf.csv		arg_transf.csv
argstruct_data.csv		argstruct_data.csv
data_wrangling.Rmd		data_wrangling.Rmd
invar_analysis.Rmd		invar_analysis.Rmd
invar_analysis.pdf		invar_analysis.pdf
invar_item_generation.ipynb		invar_item_generation.ipynb
number_data.csv		number_data.csv

wilcoxeg/fsl_invar

Folders and files

Latest commit

History

Repository files navigation

Few Shot Learning and Syntactic Invariance for Neural LMs

invar_item_generation

/test_items_downsample/

data_wrangling.rmd

invar_analysis.Rmd

/data/

/vbd_only/

About

Resources

Stars

Watchers

Forks

Languages