Sparse Bilingual Word Representations

Code for the sparse bilingual embeddings described in the paper Sparse Bilingual Word Representations for Cross-lingual Lexical Entailment.

Prerequisites

MATLAB

Getting Embeddings

Run 'sh fasta_biling.sh' with the following parameters (in order):

En Vocab File : One word per line (|e| lines)
Fr Vocab File : One word per line (|f| lines)
Dense En embeddings : One vector per line, each vector a space-separated list of floats (|e| lines)
Dense Fr embeddings : One vector per line, each vector a space-separated list of floats (|f| lines)
Alignment matrix : .mat file containing the cross-lingual statistics matrix S (of size |e| x |f|)

Example files are available here.
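Before running the script, it can help to check that the inputs match the formats listed above. The following is a minimal, hypothetical sketch and is not part of the repository. It assumes that each vocab file and its embedding file list words and vectors in the same order. The function names (parse_vectors, check_language, check_alignment_shape, read_lines) are illustrative only.

```python
# Hypothetical sanity check for the dense inputs to fasta_biling.sh (not part
# of the repository). Assumes the formats listed above: one word per vocab
# line and one space-separated float vector per embedding line, same order.
from pathlib import Path
from typing import List, Sequence, Tuple


def parse_vectors(lines: Sequence[str]) -> List[List[float]]:
    """Parse one space-separated list of floats per line."""
    return [[float(tok) for tok in line.split()] for line in lines]


def check_language(vocab_lines: Sequence[str], vector_lines: Sequence[str]) -> int:
    """Check one language's vocab/embedding pair; return the embedding dimension."""
    vectors = parse_vectors(vector_lines)
    if len(vocab_lines) != len(vectors):
        raise ValueError(
            f"{len(vocab_lines)} vocab lines but {len(vectors)} vector lines"
        )
    dims = {len(v) for v in vectors}
    if len(dims) != 1:
        raise ValueError(f"expected one vector length, found {sorted(dims)}")
    return dims.pop()


def check_alignment_shape(shape: Tuple[int, int], n_en: int, n_fr: int) -> None:
    """Check that S is |e| x |f| (rows assumed to index En words, columns Fr words)."""
    if tuple(shape) != (n_en, n_fr):
        raise ValueError(f"S has shape {tuple(shape)}, expected {(n_en, n_fr)}")


def read_lines(path: str) -> List[str]:
    """Read a UTF-8 text file as a list of lines without newline characters."""
    return Path(path).read_text(encoding="utf-8").splitlines()
```

For example, check_language(read_lines("en.vocab"), read_lines("en.vec")) checks the English side; the file names here are placeholders. If SciPy is installed, scipy.io.loadmat returns a dict of the variables in the .mat file, and the matrix's shape can be passed to check_alignment_shape. The name of the variable inside the file is not specified here.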

The output of the above script will be two vector files, one for each language. These new vectors will be sparse and interpretable, with the dimensions aligned across languages!
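Because the dimensions are aligned across languages, the same dimension index can be compared directly between an English and a French vector. The minimal sketch below, which is not part of the repository, measures sparsity and lists the dimensions two vectors share. It assumes the output files use the same layout as the dense inputs (one space-separated vector per line, in vocab order), which this README does not state explicitly. All function names are illustrative.

```python
# Minimal sketch for inspecting the sparse output vectors (not part of the
# repository). ASSUMPTION: output files use the same layout as the inputs,
# one space-separated float vector per line, in vocab order.
from pathlib import Path
from typing import List, Sequence


def load_vectors(path: str) -> List[List[float]]:
    """Read one space-separated float vector per line."""
    text = Path(path).read_text(encoding="utf-8")
    return [[float(tok) for tok in line.split()] for line in text.splitlines()]


def active_dims(vec: Sequence[float], tol: float = 0.0) -> List[int]:
    """Indices of entries whose magnitude exceeds tol."""
    return [i for i, x in enumerate(vec) if abs(x) > tol]


def sparsity(vectors: Sequence[Sequence[float]]) -> float:
    """Fraction of zero entries over all vectors."""
    total = sum(len(v) for v in vectors)
    nonzero = sum(len(active_dims(v)) for v in vectors)
    return 1.0 - nonzero / total if total else 0.0


def shared_dims(en_vec: Sequence[float], fr_vec: Sequence[float]) -> List[int]:
    """Dimensions active in both an En and a Fr vector.

    Comparing indices across languages is meaningful only because the
    dimensions are aligned.
    """
    if len(en_vec) != len(fr_vec):
        raise ValueError("En and Fr vectors must have the same length")
    fr_active = set(active_dims(fr_vec))
    return [i for i in active_dims(en_vec) if i in fr_active]
```

A small set of shared active dimensions between a word and its translation is one informal way to see the cross-lingual alignment at work.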

NB: The script has other hyperparameters that you should consider adjusting.

Data

The data folder contains

final_dataset.tsv - The French-English cross-lingual lexical entailment dataset
bisparse_{en,fr}.txt - The French-English bilingual sparse vectors used to obtain the results in the paper

Utils

This folder contains some other useful code:

top_dims.py - Interpret the dimensions given a (sparse) vector file
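One common way to interpret a sparse dimension is to list the words with the largest values in it. The sketch below illustrates that idea. It is hypothetical and is not the repository's top_dims.py, whose actual behaviour and options may differ. It assumes one vector per word in vocab order, and that a larger value in a dimension means a stronger association with that dimension.

```python
# Hypothetical illustration of "interpreting dimensions"; this is NOT the
# repository's top_dims.py. ASSUMPTIONS: one vector per word in vocab order,
# and a larger value in a dimension means a stronger association with it.
import heapq
from typing import Dict, List, Sequence


def top_words_per_dim(
    words: Sequence[str], vectors: Sequence[Sequence[float]], k: int = 5
) -> Dict[int, List[str]]:
    """Map each dimension to the k words with the largest non-zero value in it."""
    if len(words) != len(vectors):
        raise ValueError("need exactly one vector per word")
    n_dims = len(vectors[0]) if vectors else 0
    top: Dict[int, List[str]] = {}
    for d in range(n_dims):
        # Skip words that are inactive (zero) in this dimension.
        scored = [(vec[d], w) for w, vec in zip(words, vectors) if vec[d] != 0.0]
        top[d] = [w for _, w in heapq.nlargest(k, scored)]
    return top
```

You could run this separately on the English and French vectors and compare the word lists for the same dimension index. This gives an informal check of whether a dimension captures the same concept in both languages.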

If you use this code or the associated dataset, please cite the paper!

@InProceedings{VyasCarpuat2015,
    	Title = {Sparse Bilingual Word Representations for Cross-lingual Lexical Entailment},
    	Booktitle = {Proceedings of NAACL},
    	Author = {Vyas, Yogarshi and Carpuat, Marine},
    	Year = {2016},
    	Location = {San Diego, United States of America}
}
