vocab unittest
undertherain committed Apr 25, 2018
1 parent b710f75 commit d551909
Showing 11 changed files with 2,883 additions and 0 deletions.
903 changes: 903 additions & 0 deletions tests/data/corpora/annotated/sense_small.txt.annotated

Binary file added tests/data/corpora/gzipped/sense_small.txt.gz
Binary file not shown.
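The commit adds a gzipped corpus (`gzipped/sense_small.txt.gz`) next to the plain-text ones. Below is a minimal sketch of how a corpus reader could handle both forms through one code path using Python's standard `gzip` module. `iter_lines` is a hypothetical helper name and is not part of vsmlib's API:

```python
import gzip
from pathlib import Path
from typing import Iterator, Union


def iter_lines(path: Union[str, Path]) -> Iterator[str]:
    """Yield the lines of a corpus file, decompressing *.gz files on the fly.

    Hypothetical helper: it only illustrates reading plain and gzipped
    corpora (e.g. tests/data/corpora/gzipped) the same way.
    """
    path = Path(path)
    # gzip.open in "rt" mode yields str lines, just like the built-in open
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8") as f:
        for line in f:
            yield line.rstrip("\n")
```

Choosing the opener by file suffix keeps the rest of the pipeline unaware of compression.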
523 changes: 523 additions & 0 deletions tests/data/corpora/multiple_files/emma_small.txt

351 changes: 351 additions & 0 deletions tests/data/corpora/multiple_files/sense_small.txt

1 change: 1 addition & 0 deletions tests/data/corpora/multiple_small/one.txt
@@ -0,0 +1 @@
one two three four five six. seven eight nine ten
1 change: 1 addition & 0 deletions tests/data/corpora/multiple_small/two.txt
@@ -0,0 +1 @@
eleven twelve thirteen fourteen
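The `multiple_small` fixture splits a tiny corpus across two files, so a reader has to treat a directory as one token stream. Below is a minimal sketch of that idea. The regex tokenizer (lowercase, word runs, one token per punctuation mark) is an assumption and is not necessarily what vsmlib does. `tokenize` and `iter_tokens` are hypothetical names:

```python
import re
from pathlib import Path
from typing import Iterator, List

# Assumed tokenization: lowercase the text, keep runs of word characters,
# and emit every punctuation mark as its own token. vsmlib may differ.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(line: str) -> List[str]:
    return TOKEN_RE.findall(line.lower())


def iter_tokens(corpus_dir) -> Iterator[str]:
    """Yield tokens from every regular file in corpus_dir, in filename order."""
    for path in sorted(Path(corpus_dir).iterdir()):
        if not path.is_file():
            continue
        with open(path, encoding="utf-8") as f:
            for line in f:
                yield from tokenize(line)
```

Splitting punctuation into separate tokens matches `plain/vocab.tsv`, which counts `,`, `.`, `!` and `?` as words. With this tokenizer, `one.txt` yields 11 tokens (`six.` becomes `six`, `.`) and `two.txt` yields 4.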
938 changes: 938 additions & 0 deletions tests/data/corpora/plain/sense_small.txt

4 changes: 4 additions & 0 deletions tests/data/vocabs/numbers/metadata.json
@@ -0,0 +1,4 @@
{
"path_source": "./test/data/corpora/numbers",
"vsmlib_version": "0.1.6"
}
13 changes: 13 additions & 0 deletions tests/data/vocabs/numbers/vocab.tsv
@@ -0,0 +1,13 @@
#word frequency
one 1
two 406
three 345
four 330
five 324
six 271
seven 184
eight 177
nine 176
ten 10
eleven 146
twelve 170
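Each `vocab.tsv` begins with a `#word frequency` header, followed by one word and its count per row. The `numbers` fixture is not ordered by frequency (`one` has 1, `two` has 406), so a loader should not depend on row order. Below is a minimal parser sketch. It assumes whitespace-separated columns: the `.tsv` extension suggests tabs, but the rendering above shows spaces. `parse_vocab` and `load_vocab` are hypothetical names:

```python
from typing import Dict, Iterable


def parse_vocab(lines: Iterable[str]) -> Dict[str, int]:
    """Parse vocab.tsv lines of the form '<word> <frequency>' into a dict.

    The first line is treated as a header if it starts with '#'. Splitting on
    the last run of whitespace handles both tab- and space-separated rows.
    """
    vocab: Dict[str, int] = {}
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        if not line.strip():
            continue
        if i == 0 and line.startswith("#"):
            continue  # header row, e.g. "#word<TAB>frequency"
        word, freq = line.rsplit(maxsplit=1)
        vocab[word] = int(freq)
    return vocab


def load_vocab(path) -> Dict[str, int]:
    with open(path, encoding="utf-8") as f:
        return parse_vocab(f)
```

Only the first line is treated as a header, so a punctuation token such as `#` appearing later in the file would still be read as a word.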
6 changes: 6 additions & 0 deletions tests/data/vocabs/plain/metadata.json
@@ -0,0 +1,6 @@
{
"cnt_words": 142,
"min_frequency": 10,
"path_source": "./test/data/corpora/plain",
"vsmlib_version": "0.1.6"
}
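`plain/metadata.json` records `min_frequency: 10` and `cnt_words: 142`. This is consistent with a vocabulary built by counting tokens and dropping every word seen fewer than 10 times. Below is a minimal sketch of that procedure, assuming this is roughly how the fixture was produced; it is not vsmlib's actual code. `build_vocab` and `save_vocab` are hypothetical, and `vsmlib_version` is left out of the metadata. Ties are broken alphabetically here, whereas the committed `vocab.tsv` does not always do so (`at`, `them` and `no` all have frequency 60).

```python
import json
from collections import Counter
from pathlib import Path
from typing import Iterable, List, Tuple


def build_vocab(tokens: Iterable[str], min_frequency: int) -> List[Tuple[str, int]]:
    """Count tokens and keep those occurring at least min_frequency times."""
    counts = Counter(tokens)
    kept = [(word, cnt) for word, cnt in counts.items() if cnt >= min_frequency]
    # Most frequent first; ties broken alphabetically for deterministic output
    kept.sort(key=lambda wc: (-wc[1], wc[0]))
    return kept


def save_vocab(entries, out_dir, min_frequency: int, path_source: str) -> None:
    """Write vocab.tsv and a metadata.json shaped like the fixtures (minus vsmlib_version)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with open(out / "vocab.tsv", "w", encoding="utf-8") as f:
        f.write("#word\tfrequency\n")
        for word, cnt in entries:
            f.write(f"{word}\t{cnt}\n")
    metadata = {
        "cnt_words": len(entries),
        "min_frequency": min_frequency,
        "path_source": path_source,
    }
    with open(out / "metadata.json", "w", encoding="utf-8") as f:
        json.dump(metadata, f, indent=4, sort_keys=True)
```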
143 changes: 143 additions & 0 deletions tests/data/vocabs/plain/vocab.tsv
@@ -0,0 +1,143 @@
#word frequency
, 671
. 406
of 345
the 330
to 324
and 271
her 184
was 177
his 176
a 170
it 146
in 145
for 119
be 118
she 107
he 101
as 97
i 95
that 92
not 82
their 70
him 69
by 65
had 63
which 63
but 62
at 60
them 60
no 60
have 58
with 57
so 56
on 54
you 54
is 50
from 47
would 47
they 45
could 45
will 44
dashwood 42
! 42
my 39
were 38
more 38
than 37
very 36
mrs 35
all 34
any 34
mother 33
house 32
such 31
every 29
elinor 27
this 26
do 26
norland 25
own 25
what 25
if 25
who 24
an 24
been 23
one 23
much 23
or 23
john 21
your 21
might 20
pounds 19
when 19
think 19
said 19
himself 18
too 18
should 18
great 17
only 17
how 17
must 17
may 17
are 16
there 16
can 16
far 15
make 15
though 15
marianne 15
soon 14
father 14
thousand 14
well 14
did 14
some 14
we 14
man 13
sister 13
mr 13
present 13
first 13
other 13
time 13
give 13
now 13
herself 13
sure 13
shall 13
edward 13
many 12
opinion 12
into 12
fortune 12
half 12
really 12
sisters 12
thing 12
enough 12
day 12
me 12
say 12
taste 12
good 11
years 11
three 11
comfortable 11
handsome 11
little 11
love 11
? 11
am 11
barton 11
before 10
heart 10
gave 10
child 10
most 10
then 10
feel 10
ever 10
beyond 10
see 10
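In the `plain` fixture, `metadata.json` and `vocab.tsv` agree. There are 142 rows after the header, matching `cnt_words`. The rarest words have frequency 10, matching `min_frequency`. The rows are also listed in descending order of frequency. A vocab unit test could assert these invariants directly. Below is a minimal sketch with hypothetical helper names. The sorting check is optional because the `numbers` fixture is not ordered by frequency, and each metadata field is checked only when present.

```python
import json
from pathlib import Path
from typing import List, Tuple


def read_entries(path) -> List[Tuple[str, int]]:
    """Read (word, frequency) rows from a vocab.tsv, skipping the '#' header."""
    entries: List[Tuple[str, int]] = []
    with open(path, encoding="utf-8") as f:
        for i, line in enumerate(f):
            if not line.strip() or (i == 0 and line.startswith("#")):
                continue
            word, freq = line.rsplit(maxsplit=1)
            entries.append((word, int(freq)))
    return entries


def check_vocab_dir(vocab_dir, require_sorted: bool = True) -> List[str]:
    """Return a list of inconsistencies between vocab.tsv and metadata.json."""
    vocab_dir = Path(vocab_dir)
    with open(vocab_dir / "metadata.json", encoding="utf-8") as f:
        meta = json.load(f)
    entries = read_entries(vocab_dir / "vocab.tsv")
    freqs = [freq for _, freq in entries]
    problems: List[str] = []
    # numbers/metadata.json has neither field, so each check is conditional
    if "cnt_words" in meta and meta["cnt_words"] != len(entries):
        problems.append(f"cnt_words={meta['cnt_words']} but {len(entries)} rows")
    if "min_frequency" in meta and any(freq < meta["min_frequency"] for freq in freqs):
        problems.append("a word falls below min_frequency")
    if require_sorted and freqs != sorted(freqs, reverse=True):
        problems.append("rows are not in descending order of frequency")
    return problems
```

Returning a list of problems rather than raising on the first one makes a failing fixture easier to diagnose.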