This repository was archived by the owner on Jul 27, 2023. It is now read-only.

language-detection

How-to

  1. Download and extract the Tatoeba sentences dataset,
wget http://downloads.tatoeba.org/exports/sentences.tar.bz2
bunzip2 sentences.tar.bz2
tar xvf sentences.tar
  2. Convert the tab-separated sentences.csv into shuffled lines of the form __label__<language> <sentence>, saved as all.txt,
awk -F"\t" '{print"__label__"$2" "$3}' < sentences.csv | shuf > all.txt
  3. Run any of the notebooks with Jupyter Notebook.
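
For machines without awk or shuf, the conversion step above can be approximated in Python. This is a minimal sketch, not part of the repository: the function names `to_labelled_lines` and `convert` and the `seed` parameter are hypothetical, and it assumes sentences.csv has three tab-separated columns (id, language code, sentence), which is what the awk command's use of $2 and $3 implies.

```python
import csv
import random


def to_labelled_lines(rows):
    """Turn (id, lang, text) rows into '__label__<lang> <text>' lines.

    Rows with fewer than three fields are skipped. This differs slightly
    from the awk command, which would still print a line for them.
    """
    return [f"__label__{row[1]} {row[2]}" for row in rows if len(row) >= 3]


def convert(src="sentences.csv", dst="all.txt", seed=None):
    """Read the tab-separated file, shuffle the labelled lines, write them out."""
    with open(src, encoding="utf-8", newline="") as f:
        # QUOTE_NONE keeps quote characters literal, as awk does.
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        lines = to_labelled_lines(reader)

    # In-memory shuffle, standing in for `shuf`; pass a seed for repeatability.
    random.Random(seed).shuffle(lines)

    with open(dst, "w", encoding="utf-8") as f:
        f.writelines(line + "\n" for line in lines)
    return len(lines)


# Example (assumes sentences.csv is in the current directory):
# convert("sentences.csv", "all.txt", seed=42)
```

The whole file is held in memory before shuffling, which keeps the sketch short; for a very large export the awk and shuf pipeline may be the more practical choice.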