Newest Ichiran with newest data seems to be failing 31 tests #45
Hi, unfortunately, because JMdict data changes constantly, it's impossible for the segmentation tests to always pass unless the tests are modified and the code is manually recalibrated. For that reason only the latest release is guaranteed to actually pass all the tests. For example
This test failure is caused by the word ぴんと立つ being added to JMdict database on 2022-07-19. Since the latest release of Ichiran was in January 2022 the test doesn't use this word for segmentation. As for this,
check that you have downloaded the file kanjidic2.xml and specified the path to it in settings. Try manually running the following functions:
I understand; so the example answer is actually better than the expected one, given the current state of JMdict, and what's expected needs to be adjusted. As for kanjidic2.xml, I have it and the path is correct.
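To make the effect concrete, here is a toy sketch of how a single new dictionary entry can change a segmentation. It is not Ichiran's algorithm: the `segment` function, the fewest-tokens rule, and the tiny vocabularies of surface forms are all assumptions made up for illustration (Ichiran scores candidates and handles conjugations, which this ignores).

```python
def segment(text, vocab):
    """Split text into vocab words using the fewest tokens (toy rule, not Ichiran's scoring)."""
    # best[i] holds the fewest-token segmentation of text[:i], or None if unreachable.
    best = [None] * (len(text) + 1)
    best[0] = []
    for end in range(1, len(text) + 1):
        for start in range(end):
            word = text[start:end]
            if best[start] is not None and word in vocab:
                candidate = best[start] + [word]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[len(text)]


sentence = "猫はしっぽをぴんと立てて歩いた"

# Hypothetical vocabulary before the new entry existed (surface forms only).
old_vocab = {"猫", "は", "しっぽ", "を", "ぴんと", "立てて", "歩いた"}
# Same vocabulary after a longer expression becomes available.
new_vocab = old_vocab | {"ぴんと立てて"}

before = segment(sentence, old_vocab)
after = segment(sentence, new_vocab)
print(before)
print(after)
```

With the old vocabulary the sentence splits as the test expects; once the longer entry is present, the same sentence comes out with "ぴんと立てて" as one token, which is the shape of the first failure in the report.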
I did that just now, but it should have been executed earlier as well, as part of full-init, so I have to assume these were already loaded and calculated when I ran the tests previously. I can't run the tests again at the moment to confirm that the error is still there, though, because in the meantime I added some logging to better understand how it works, and the side effect seems to be that the tests lock up midway. I think it's possible that some other change to JMdict or KanjiDic is causing the earlier error, though.
I repeated the procedure on a fresh database, and the 'kanji' error didn't show up. So indeed, most likely the kanjidic2 database hadn't been loaded despite full-init having finished execution, and the kanjidic2 path being already provided to it before it started. A mystery, but apparently no longer reproducible. It's still failing the same 31 tests, but it's expected. Closing.
I think the first time it failed on add-errata because the word in question was deleted from JMdict (due to my comment in fact...), I'll try to make it work with the latest data in the coming weeks.
I reinitialized the entire database, and indeed, it turned out that there had been lingering side effects of that crash (notably, n-kanji and n-kana in many conjugations were left at 0, which wasn't causing crashes but was causing trouble with scoring). After the reinitialisation, it only fails on 13 tests:
dec23 branch contains code which should pass all tests on recent JMdict dumps (make sure to run |
I just got around to doing it, and full-init seems to be failing very early on:
Previous master worked correctly with the same JMdict file from around the middle of December, so I think some code change must have caused this... To be sure, I downloaded the newest JMdict_e (today's) and tried with it, but that didn't fix anything; same crash. Very strange: it's supposed to be dropping the tables at the beginning of full-init, and it seems impossible for the XML file to have a duplicated entry... Maybe I should have tried just add-errata first, but I wanted to be sure it's all reset. Now I also can't try add-errata anymore, since full-init deleted the tables.
Try initializing on a fresh database. I haven't run this on an existing one in a long time, so it's not guaranteed to work. You might also have existing connections to the database, which could prevent dropping the tables.
Sat, 6 Jan 2024, 16:22 vpltd-kgalaj ***@***.***>:
… I just got around to doing it, and full-init seems to be failing very
early on:
* (ichiran/maintenance:full-init)
Initializing ichiran/dict...
debugger invoked on a CL-POSTGRES-ERROR:UNIQUE-VIOLATION in thread
#<THREAD "main thread" RUNNING {10010C0093}>:
Database error 23505: duplicate key value violates unique constraint "entry_pkey"
DETAIL: Key (seq)=(1000280) already exists.
QUERY: INSERT INTO entry (primary_nokanji, n_kana, n_kanji, root_p, content, seq) VALUES (false, 0, 0, true, E'<?xml version="1.0" encoding="UTF-8"?>
<entry>
<ent_seq>1000280</ent_seq>
<k_ele>
<keb>論う</keb>
</k_ele>
<r_ele>
<reb>あげつらう</reb>
</r_ele>
<sense>
<pos>v5u</pos>
<pos>vt</pos>
<misc>uk</misc>
<gloss xml:lang="eng">to discuss</gloss>
</sense>
<sense>
<pos>v5u</pos>
<pos>vt</pos>
<gloss xml:lang="eng">to find fault with</gloss>
<gloss xml:lang="eng">to criticize</gloss>
<gloss xml:lang="eng">to criticise</gloss>
</sense>
</entry>', 1000280)
Type HELP for debugger help, or (SB-EXT:EXIT) to exit from SBCL.
restarts (invokable by number or by possibly-abbreviated name):
0: [ABORT] Exit debugger, returning to top level.
(CL-POSTGRES::GET-ERROR #<SB-SYS:FD-STREAM for "socket 127.0.0.1:55237, peer: 127.0.0.1:5432" {100EF410F3}>)
source: (ERROR (CL-POSTGRES-ERROR::GET-ERROR-TYPE CODE) :CODE CODE :MESSAGE
(GET-FIELD #\M) :DETAIL (GET-FIELD #\D) :HINT (GET-FIELD #\H)
:CONTEXT (GET-FIELD #\W) ...)
0] 0
*
Actually, never mind that. This is related to a change I made to EDIT: just pushed a fix to the branch
Your last fix seems to have fixed that one. full-init now gets as far as the "Loading custom data..." before crashing:
EDIT: I am going to assume the problem is that "eng" in the last two seqs in extra.xml is escaped, unlike "eng" in the older content there; I will edit that and restart full-init.
Yeah, the XML file was corrupted; I fixed it and added a test for it.
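A quick way to catch this kind of corruption before running full-init is to check that each entry parses as XML. The sketch below is only an illustration: it assumes "escaped" means the quotes around `eng` were backslash-escaped, which the thread does not spell out, and the helper name `is_well_formed` and both sample snippets are made up here rather than taken from extra.xml.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False on a parse error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Hypothetical snippets: the first uses plain quotes, the second has
# backslash-escaped quotes around the attribute value (assumed failure mode).
good = '<entry><sense><gloss xml:lang="eng">to discuss</gloss></sense></entry>'
bad = '<entry><sense><gloss xml:lang=\\"eng\\">to discuss</gloss></sense></entry>'

print(is_well_formed(good))
print(is_well_formed(bad))
```

The same check can be pointed at a whole file by reading it and passing its text to `is_well_formed`, or by calling `ET.parse` on the path directly.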
Edict, Kanjidic2, jmdict-data, quicklisp and ichiran pulled from the Net yesterday.
Did full-init.
Had to comment out 2209300 additions in the errata, because the entire entry was deleted in jmdict. Then applied errata again.
macOS 13.6.1 Intel, Postgres and SBCL installed through Brew.
Results:
Unit Test Summary
| 707 assertions total
| 676 passed
| 31 failed
| 2 execution errors
| 0 missing tests
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("猫" "は" "しっぽ" "を" "ぴんと" "立てて" "歩いた")
| but saw ("猫" "は" "しっぽ" "を" "ぴんと立てて" "歩いた")
|
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("わかりきった") but saw ("わ" "かりきった")
|
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("おとめ" "に" "ふさわしい" "振る舞い") but saw ("お" "とめ" "に" "ふさわしい" "振る舞い")
|
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("折りたたみ" "式" "ついたて") but saw ("折りたたみ" "式" "ついた" "て")
|
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("使い物" "に" "ならん" "だろ") but saw ("使い" "物にならん" "だろ")
|
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("雪" "が" "ない" "ため") but saw ("雪" "が" "な" "いため")
|
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("バラしちゃってる") but saw ("バラ" "しちゃってる")
|
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("何も" "口" "に" "せぬ") but saw ("何も" "口" "にせぬ")
|
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("工夫" "が" "される") but saw ("工夫" "がされる")
|
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("だめ" "だったら") but saw ("だ" "めだったら")
|
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("彼女" "は" "苦しげ" "に" "うめいて" "横たわった")
| but saw ("彼女" "は" "苦しげ" "に" "うめ" "いて" "横たわった")
|
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("共感" "性") but saw ("共感性")
|
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("それ" "ただ" "の" "怪しい" "人" "です" "し")
| but saw ("それた" "だの" "怪しい" "人" "です" "し")
|
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("出したい" "とき" "は") but saw ("出した" "いと" "き" "は")
|
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("旅行" "に" "いきたい") but saw ("旅行" "にい" "きたい")
|
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("しない" "かい") but saw ("し" "ないかい")
|
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("てか" "最近" "ファン" "層" "は" "円盤" "すら" "買わない" "から" "そいつら" "から" "金" "とる"
"ってのは" "無謀")
| but saw ("てか" "最近" "ファン層" "は" "円盤" "すら" "買わない" "から" "そいつら" "から" "金" "とる" "ってのは"
"無謀")
|
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("なんというか" "すみません") but saw ("なんという" "かすみません")
|
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("そう" "したい" "から" "した" "だけ" "だ") but saw ("そうした" "いからした" "だけ" "だ")
|
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("手にとって" "いただき" "やすくなる") but saw ("手にとっていた" "だ" "きやすくなる")
|
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("大事" "に" "なります") but saw ("大" "事になります")
|
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("奴" "が" "まとも" "に" "見られない") but saw ("奴" "が" "まともに" "見られない")
|
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("といった" "ところ" "でしょうか") but saw ("と" "いった" "ところ" "でしょうか")
|
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("言い方" "も" "します") but saw ("言い方" "もします")
|
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("届け" "したら") but saw ("届" "けしたら")
|
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("全く" "と" "いって" "いい") but saw ("全く" "と" "いっていい")
|
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("仲良し" "に" "なったら") but saw ("仲良し" "になったら")
|
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("体" "に" "悪い" "と" "知り" "ながら" "タバコをやめる" "こと" "は" "できない")
| but saw ("体に悪い" "と" "知り" "ながら" "タバコをやめる" "こと" "は" "できない")
|
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("雨" "が" "降りそう" "な" "気がします") but saw ("雨が降りそう" "な" "気がします")
|
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("そういう" "お" "隣" "どうし") but saw ("そういう" "お" "隣どうし")
|
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("みんな" "土足で" "おいで") but saw ("みんな" "土足で" "おい" "で")
|
SEGMENTATION-TEST: 451 assertions passed, 31 failed, and an execution error.
| Execution error:
| Database error 42P01: relation "kanji" does not exist
QUERY: (SELECT r.text, r.type FROM kanji AS k INNER JOIN reading AS r ON (r.kanji_id = k.id) WHERE ((k.text = E'取') and (not (r.type IN (E'ja_na')))))
|
MATCH-READINGS-TEST: 0 assertions passed, 0 failed, and an execution error.
| Execution error:
| Database error 42P01: relation "kanji" does not exist
QUERY: (SELECT r.text, r.type FROM kanji AS k INNER JOIN reading AS r ON (r.kanji_id = k.id) WHERE ((k.text = E'気') and (not (r.type IN (E'ja_na')))))
|
SEGMENTATION-TEST: 451 assertions passed, 31 failed, and an execution error.
#<TEST-RESULTS-DB Total(707) Passed(676) Failed(31) Errors(2)>
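For triaging a report like the one above, it can help to sort failures by whether the observed result has fewer tokens than expected (often a sign that a longer dictionary entry now exists), more tokens, or the same number with a boundary moved. The following is a rough sketch that assumes the log format shown here; the function name `classify_failures` and the three labels are invented for this example and are not part of Ichiran or its test suite.

```python
import re

# Matches one "Expected (...) but saw (...)" pair, including pairs wrapped onto two lines.
FAILURE = re.compile(r'Expected \((.*?)\)\s*\|?\s*but saw \((.*?)\)', re.DOTALL)
TOKEN = re.compile(r'"([^"]*)"')


def classify_failures(log_text):
    """Return (expected, saw, label) tuples for each failure found in log_text."""
    results = []
    for expected_raw, saw_raw in FAILURE.findall(log_text):
        expected = TOKEN.findall(expected_raw)
        saw = TOKEN.findall(saw_raw)
        if len(saw) < len(expected):
            label = "merged"    # fewer tokens than expected
        elif len(saw) > len(expected):
            label = "split"     # more tokens than expected
        else:
            label = "shifted"   # same count, different boundaries
        results.append((expected, saw, label))
    return results


# A few failures copied from the report above.
sample = '''
 | Expected ("共感" "性") but saw ("共感性")
 | Expected ("わかりきった") but saw ("わ" "かりきった")
 | Expected ("なんというか" "すみません") but saw ("なんという" "かすみません")
 | Expected ("猫" "は" "しっぽ" "を" "ぴんと" "立てて" "歩いた")
 | but saw ("猫" "は" "しっぽ" "を" "ぴんと立てて" "歩いた")
'''

labels = [label for _, _, label in classify_failures(sample)]
print(labels)
```

The labels are only a first-pass hint; whether a given "merged" result is actually better than the expected one still has to be judged by hand.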