Broken link on :h thesaurus #629
After manually following the link suggested in #3583, it doesn't look
like a comma separated thesaurus file is there.
Looks like it's one word per line, thus that won't work as a thesaurus.
The following link has the Moby Thesaurus (public-domain) and might be
more reliable going forward.
http://www.gutenberg.org/files/3202/files/mthesaur.txt
Hmm, does that actually work? I found this random entry:
table,Domesday Book,account,account book,address book,adjourn,
A table is a "Domesday Book"?
Also, it uses comma separated words, and includes spaces. Vim doesn't
appear to handle that.
Additional info:
https://en.wikipedia.org/wiki/Moby_Project#Thesaurus
http://www.gutenberg.org/catalog/world/results?title=moby+list
--
A computer programmer is a device for turning requirements into
undocumented features. It runs on cola, pizza and Dilbert cartoons.
Bram Moolenaar
/// Bram Moolenaar -- Bram@Moolenaar.net -- http://www.Moolenaar.net \\\
/// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\ an exciting new programming language -- http://www.Zimbu.org ///
\\\ help me help AIDS victims -- http://ICCF-Holland.org ///
Yeah, 'table,Domesday Book,account,account book,address book,adjourn,' is a very broad association. It looks like the Domesday Book is an accounting survey of land, and a table can be a table of accounts. That is too far a stretch in meaning to be useful, so this isn't a good thesaurus source.
After searching around quite a bit more, there doesn't seem to be a current, openly licensed thesaurus in the space-separated 'key-word alt1-word alt2-word...' form (at least in English). So it probably makes sense to just remove the text suggesting a thesaurus file to download.
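For anyone wondering what the conversion would involve, here is a minimal sketch of turning one comma-separated line (like the Moby entry quoted above) into the space-separated form. The function name `moby_line_to_vim` is made up for illustration, and dropping every multi-word term is an assumption about what is acceptable, not something Vim prescribes.

```python
def moby_line_to_vim(line: str) -> str:
    """Turn 'key,two words,syn,...' into 'key syn ...' (hypothetical helper)."""
    # Split on commas and discard empty fields, e.g. from a trailing comma.
    terms = [t.strip() for t in line.split(",") if t.strip()]
    # Keep only single-word terms, because whitespace separates
    # the words in the target format.
    single_words = [t for t in terms if " " not in t]
    return " ".join(single_words)


sample = "table,Domesday Book,account,account book,address book,adjourn,"
print(moby_line_to_vim(sample))  # prints: table account adjourn
```

This only fixes the format. It does nothing about the quality concern raised above, since loosely related words such as "adjourn" survive the conversion.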
Just some notes that might be of interest to anyone looking at this
ticket but not directly related to the solution.
WordNet seems to be the main source for an English thesaurus:
https://wordnet.princeton.edu
OpenOffice maintains a structured text version of the WordNet data
here:
https://www.openoffice.org/lingucomponent/thesaurus.html
The main download file is here:
https://www.openoffice.org/lingucomponent/MyThes-1.zip
If the data exists but is in the wrong format, perhaps someone can write
a script to turn it into the right format. We could then include the
script with Vim and/or make the output available on the ftp site.
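A rough sketch of what such a script might look like, in Python. It assumes the data file starts with an encoding line, followed by entries of the form `word|N` and then N meaning lines of the form `(pos)|syn1|syn2|...`. That layout is an assumption about the MyThes `.dat` format and should be checked against the real file. The function name `mythes_to_vim` and the sample data are invented for illustration.

```python
def mythes_to_vim(dat_text: str) -> list[str]:
    """Convert MyThes-style text into Vim thesaurus lines (assumed layout)."""
    lines = dat_text.splitlines()
    out = []
    i = 1  # skip the first line, assumed to name the encoding
    while i < len(lines):
        header = lines[i]
        i += 1
        word, sep, count_text = header.partition("|")
        if not sep or not count_text.strip().isdigit():
            continue  # not an entry header, ignore it
        count = int(count_text)
        synonyms = []
        for meaning in lines[i:i + count]:
            # The first field is the part-of-speech tag, e.g. "(noun)".
            for syn in meaning.split("|")[1:]:
                syn = syn.strip()
                # Drop multi-word synonyms, the headword itself, and repeats.
                if syn and " " not in syn and syn != word and syn not in synonyms:
                    synonyms.append(syn)
        i += count
        # Skip multi-word headwords and entries left without synonyms.
        if " " in word or not synonyms:
            continue
        out.append(" ".join([word] + synonyms))
    return out


sample_dat = """ISO8859-1
simple|2
(adj)|plain|easy|uncomplicated|not complex
(noun)|herb|plant
ice cream|1
(noun)|dessert
lonely|1
(adj)|all alone
"""

for line in mythes_to_vim(sample_dat):
    print(line)  # prints: simple plain easy uncomplicated herb plant
```

Multi-word entries are dropped rather than joined with some other character, which loses data but keeps every line usable with whitespace as the separator.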
--
`When any government, or any church for that matter, undertakes to say to
its subjects, "This you may not read, this you must not see, this you are
forbidden to know," the end result is tyranny and oppression no matter how
holy the motives' -- Robert A Heinlein, "If this goes on --"
Thesaurus Zip Attached:
The attached thesaurus_pkg.zip contains a thesaurus.txt in the Vim space-separated format. Can the thesaurus.txt be made available on the ftp site?
The thesaurus_pkg.zip contains these files:
Notes:
Patch File:
diff -ruN MyThes-1.0/Makefile MyThes-1.0-vim/Makefile
--- MyThes-1.0/Makefile 2003-12-08 14:42:33.000000000 -0700
+++ MyThes-1.0-vim/Makefile 2018-11-30 07:48:21.000000000 -0700
@@ -21,7 +21,7 @@
-@ ($(RANLIB) $@ || true) >/dev/null 2>&1
example: example.o $(LIBS)
- $(CXX) $(CXXFLAGS) -o $@ example.o $(LDFLAGS)
+ $(CXX) -o $@ example.o $(LDFLAGS)
%.o: %.cxx
$(CXX) $(CXXFLAGS) -c $<
diff -ruN MyThes-1.0/README-VIM-THESAURUS MyThes-1.0-vim/README-VIM-THESAURUS
--- MyThes-1.0/README-VIM-THESAURUS 1969-12-31 17:00:00.000000000 -0700
+++ MyThes-1.0-vim/README-VIM-THESAURUS 2018-11-30 10:04:10.000000000 -0700
@@ -0,0 +1,21 @@
+To create a thesaurus file formatted for vim's thesaurus run:
+bash ./mk-vim-thesaurus.sh
+
+The file 'thesaurus.txt' will be created.
+
+Here are the steps that mk-vim-thesaurus.sh takes:
+
+1. Extract term list from MyThes-1.0 th_en_US_new.dat file:
+# Note: This will remove complex words with spaces in them because space is the
+# default delimiter for vim's thesaurus format.
+grep -v "^(" th_en_US_new.dat | awk -F"|" '{print $1}' | grep -v ' ' | grep -v 'ISO8859-1' > words-without-spaces.lst
+
+2. make example
+make
+
+3. Run mk_vim_thesaurus_format:
+# Note: While extracting synonyms, multiple word synonyms with spaces are excluded.
+./example th_en_US_new.idx th_en_US_new.dat words-without-spaces.lst > raw-list
+
+4. Remove entries that don't have synonyms:
+grep -v "^\w\+$" raw-list > thesaurus.txt
diff -ruN MyThes-1.0/example.cxx MyThes-1.0-vim/example.cxx
--- MyThes-1.0/example.cxx 2003-12-08 14:37:13.000000000 -0700
+++ MyThes-1.0-vim/example.cxx 2018-11-30 09:00:44.000000000 -0700
@@ -70,16 +70,20 @@
// or count since needed for CleanUpAfterLookup routine
mentry* pm = pmean;
if (count) {
- fprintf(stdout,"%s has %d meanings\n",buf,count);
- for (int i=0; i < count; i++) {
- fprintf(stdout," meaning %d: %s\n",i,pm->defn);
+ // initial word
+ fprintf(stdout,"%s",buf);
+ for (int i=0; i < count; i++) {
for (int j=0; j < pm->count; j++) {
- fprintf(stdout," %s\n",pm->psyns[j]);
+ // only output the word if it doesn't have spaces
+ // because space is the standard delimiter in the
+ // vim thesaurus file format.
+ if (strchr(pm->psyns[j], ' ') == NULL) {
+ fprintf(stdout," %s",pm->psyns[j]);
+ }
}
- fprintf(stdout,"\n");
pm++;
- }
- fprintf(stdout,"\n\n");
+ }
+ fprintf(stdout,"\n");
// now clean up all allocated memory
pMT->CleanUpAfterLookup(&pmean,count);
} else {
diff -ruN MyThes-1.0/mk-vim-thesaurus.sh MyThes-1.0-vim/mk-vim-thesaurus.sh
--- MyThes-1.0/mk-vim-thesaurus.sh 1969-12-31 17:00:00.000000000 -0700
+++ MyThes-1.0-vim/mk-vim-thesaurus.sh 2018-11-30 08:51:15.000000000 -0700
@@ -0,0 +1,13 @@
+
+# Extract term list from MyThes-1.0 th_en_US_new.dat file:
+grep -v "^(" th_en_US_new.dat | awk -F"|" '{print $1}' | grep -v ' ' | grep -v 'ISO8859-1' > words-without-spaces.lst
+
+# make example
+make
+
+# Run mk_vim_thesaurus_format:
+./example th_en_US_new.idx th_en_US_new.dat words-without-spaces.lst > raw-list
+
+# Remove entries that don't have synonyms:
+grep -v '^\w\+$' raw-list > thesaurus.txt
+
diff -ruN MyThes-1.0/mythes.cxx MyThes-1.0-vim/mythes.cxx
--- MyThes-1.0/mythes.cxx 2003-12-08 14:40:27.000000000 -0700
+++ MyThes-1.0-vim/mythes.cxx 2018-11-28 13:12:29.000000000 -0700
@@ -25,7 +25,7 @@
// return index of char in string
int mystr_indexOfChar(const char * d, int c)
{
- char * p = strchr(d,c);
+ const char * p = strchr(d,c);
if (p) return (int)(p-d);
return -1;
}
I can at least mention this comment in the help. Unpacking the .zip file isn't too difficult.
Thanks. Is there anything else that needs to be done to complete this issue?
Well, this only provides one English thesaurus. I don't know if this is
even a good one. And there are many other languages...
--
Every time I lose weight, it finds me again!
Since German and English are mostly distinct and share only a few words, we're taking the easy approach and adding both thesaurus files at the same time. Vim will query both and display the combined matches. Taken from:
* vim/vim#629 (comment)
* https://github.com/Yamagi/vim-german-thesaurus
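To illustrate why querying two files at once is mostly harmless, here is a simplified model of a lookup across several thesaurus files. It is not Vim's actual implementation: the `lookup` function, the exact-word matching rule and the tiny word lists are all assumptions made for the example.

```python
def lookup(word: str, thesauri: list[list[str]]) -> list[str]:
    """Collect related words for `word` from every thesaurus (toy model)."""
    matches = []
    for lines in thesauri:
        for line in lines:
            words = line.split()
            if word in words:
                # Offer every other word on a matching line, without repeats.
                for candidate in words:
                    if candidate != word and candidate not in matches:
                        matches.append(candidate)
    return matches


# Invented sample data: each inner list stands in for one thesaurus file.
english = ["house home dwelling", "fast quick rapid"]
german = ["Haus Heim Wohnung", "fast beinahe nahezu"]

print(lookup("fast", [english, german]))
# prints: ['quick', 'rapid', 'beinahe', 'nahezu']
```

A word that exists in both languages, like "fast" here, yields a mixed list. For words that exist in only one language the second file adds nothing to the result.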
It seems that there is a broken link on the thesaurus documentation:
"To obtain a file to be used here, check out this ftp site: ftp://ftp.ox.ac.uk/pub/wordlists/ ..."