
Code for Bilingual Sparse Embeddings from the NAACL 2016 paper


yogarshi/bisparse


Sparse Bilingual Word Representations

Code for Sparse Bilingual Embeddings, as described in the paper "Sparse Bilingual Word Representations for Cross-lingual Lexical Entailment" (Vyas and Carpuat, NAACL 2016).

Prerequisites

  • MATLAB

Getting Embeddings

Run 'sh fasta_biling.sh' with the following parameters (in order):

  • En vocab file: one word per line (|e| lines)
  • Fr vocab file: one word per line (|f| lines)
  • Dense En embeddings: one vector per line, each vector a space-separated list of floats, in the same order as the En vocab file (|e| lines)
  • Dense Fr embeddings: one vector per line, each vector a space-separated list of floats, in the same order as the Fr vocab file (|f| lines)
  • Alignment matrix: a .mat file containing the cross-lingual statistics matrix S (of size |e| x |f|)

Example files are available here.
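Before launching the MATLAB script, it can help to check that each vocab file and its dense embedding file line up. The Python sketch below is not part of the repository; it assumes only the formats listed above (one word per vocab line, one space-separated float vector per embedding line, line i of the embedding file belonging to word i of the vocab file). It does not open the .mat file: checking that S is |e| x |f| would need a MAT-file reader such as scipy.io.loadmat, and the variable name stored inside the file is not specified here. The script name check_inputs.py is hypothetical.

```python
import sys


def check_pair(vocab_lines, vector_lines):
    """Validate one language's vocab file against its dense embedding file.

    Assumes the README's formats: one word per vocab line, one
    space-separated list of floats per embedding line, same line order.
    Returns (vocab_size, vector_dimension); raises ValueError on a mismatch.
    """
    for lineno, word in enumerate(vocab_lines, start=1):
        if len(word.split()) != 1:
            raise ValueError(f"vocab line {lineno}: expected exactly one word")
    if len(vocab_lines) != len(vector_lines):
        raise ValueError(
            f"{len(vocab_lines)} vocab lines but {len(vector_lines)} vector lines"
        )
    dims = set()
    for lineno, line in enumerate(vector_lines, start=1):
        fields = line.split()
        try:
            for field in fields:
                float(field)
        except ValueError:
            raise ValueError(f"vector line {lineno}: non-numeric value") from None
        dims.add(len(fields))
    # Every vector in one language must have the same dimension.
    if len(dims) != 1:
        raise ValueError(f"expected one vector dimension, found {sorted(dims)}")
    return len(vocab_lines), dims.pop()


def read_lines(path):
    """Read a UTF-8 text file into a list of lines without newline characters."""
    with open(path, encoding="utf-8") as handle:
        return handle.read().splitlines()


if __name__ == "__main__" and len(sys.argv) == 5:
    # Hypothetical usage: python check_inputs.py EN_VOCAB FR_VOCAB EN_VECS FR_VECS
    en_vocab, fr_vocab, en_vecs, fr_vecs = sys.argv[1:5]
    size_e, dim_e = check_pair(read_lines(en_vocab), read_lines(en_vecs))
    size_f, dim_f = check_pair(read_lines(fr_vocab), read_lines(fr_vecs))
    print(f"|e| = {size_e} (dim {dim_e}), |f| = {size_f} (dim {dim_f})")
    print(f"S should therefore be {size_e} x {size_f}")
```

The check is deliberately per language: nothing in the input description requires the English and French dense embeddings to share a dimension, so the sketch only reports both.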

The script outputs two vector files, one per language. These new vectors are sparse and interpretable, and their dimensions are aligned across languages!

NB: The script has other hyperparameters that you should consider adjusting.
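Because the output dimensions are aligned across languages, index i in an English vector and index i in a French vector can be compared directly. The Python sketch below illustrates this, assuming each vector has already been loaded as a list of floats; the helper names and the toy vectors are made up for illustration and are not taken from the released bisparse_{en,fr}.txt files.

```python
import math


def active_dims(vector, eps=0.0):
    """Indices of the non-zero (|value| > eps) entries of a sparse vector."""
    return {i for i, value in enumerate(vector) if abs(value) > eps}


def shared_dims(en_vector, fr_vector, eps=0.0):
    """Dimensions active in both an English and a French vector.

    This comparison is only meaningful because dimension i refers to the
    same latent dimension in both languages.
    """
    if len(en_vector) != len(fr_vector):
        raise ValueError("aligned vectors must have the same length")
    return sorted(active_dims(en_vector, eps) & active_dims(fr_vector, eps))


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy vectors, made up for illustration only.
cat_en = [0.0, 0.8, 0.0, 0.3]
chat_fr = [0.0, 0.7, 0.1, 0.0]
print(shared_dims(cat_en, chat_fr))  # [1]
print(round(cosine(cat_en, chat_fr), 3))
```

Listing shared active dimensions is often easier to interpret than a single cosine score, since each shared index can be inspected on its own.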

Data

The data folder contains:

  • final_dataset.tsv - The French-English cross-lingual lexical entailment dataset
  • bisparse_{en,fr}.txt - The French-English bilingual sparse vectors used to obtain the results in the paper
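One simple way to score cross-lingual entailment with sparse vectors is a distributional-inclusion measure such as WeedsPrec, which checks how much of a narrower word's feature mass is also active for a broader word. The Python sketch below illustrates that general idea only; it is not necessarily the scoring method used in the paper. It assumes non-negative vectors (only strictly positive entries count as active features), and the toy vectors are made up for illustration.

```python
def features(vector):
    """Active features: indices with a strictly positive value."""
    return {i for i, value in enumerate(vector) if value > 0}


def weeds_prec(narrow, broad):
    """WeedsPrec(narrow -> broad): share of narrow's feature mass also active in broad.

    Values near 1.0 mean narrow's active dimensions are included in broad's,
    which inclusion-based approaches read as evidence that narrow entails broad.
    """
    narrow_feats = features(narrow)
    total = sum(narrow[i] for i in narrow_feats)
    if total == 0:
        return 0.0
    shared = narrow_feats & features(broad)
    return sum(narrow[i] for i in shared) / total


# Toy cross-lingual pair, made up for illustration: English "dog", French "animal".
dog_en = [0.5, 0.5, 0.0, 0.0]
animal_fr = [0.4, 0.3, 0.3, 0.2]
print(weeds_prec(dog_en, animal_fr))  # 1.0: every active dimension of "dog" is active for "animal"
print(weeds_prec(animal_fr, dog_en))  # lower: "animal" has active dimensions that "dog" lacks
```

The score is asymmetric by design, which matches the directional nature of entailment; turning it into a yes/no decision would need a threshold tuned on held-out data.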

Utils

The utils folder contains other useful code:

  • top_dims.py - Interpret the dimensions given a (sparse) vector file
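The idea behind interpreting a dimension is to list the words that load most strongly on it. The Python sketch below is a hypothetical re-implementation of that idea, not the actual top_dims.py. It assumes each line of a vector file is a word followed by its space-separated values, which may not match the exact format of bisparse_{en,fr}.txt.

```python
import heapq


def parse_vectors(lines):
    """Parse lines assumed to look like 'word v1 v2 ... vk' into {word: [floats]}."""
    vectors = {}
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            vectors[fields[0]] = [float(x) for x in fields[1:]]
    return vectors


def top_words_per_dim(vectors, k=5):
    """For each dimension, the k words with the largest positive value on it.

    Ties on value are broken by comparing the word strings.
    """
    if not vectors:
        return []
    num_dims = len(next(iter(vectors.values())))
    tops = []
    for d in range(num_dims):
        # Only words that are active (positive) on dimension d are candidates.
        scored = [(vec[d], word) for word, vec in vectors.items() if vec[d] > 0]
        tops.append([word for _, word in heapq.nlargest(k, scored)])
    return tops
```

For example, passing an open file handle for a vector file to parse_vectors and then calling top_words_per_dim(vectors, k=10) gives the ten strongest words for each dimension. Since the dimensions are aligned, running it on both the En and Fr files and comparing the lists at the same index shows what each shared dimension captures in each language.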

If you use this code or the associated dataset, please cite the paper!

@InProceedings{VyasCarpuat2015,
    	Title = {Sparse Bilingual Word Representations for Cross-lingual Lexical Entailment},
    	Booktitle = {Proceedings of NAACL},
    	Author = {Vyas, Yogarshi and Carpuat, Marine},
    	Year = {2016},
    	Location = {San Diego, United States of America}
}
