"*Chinese-pyim*-122442 may not be a valid dictionary buffer, ignoring." #54

Closed
et2010 opened this issue Jan 23, 2016 · 9 comments

Comments

Contributor

et2010 commented Jan 23, 2016

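This time it showed up while I was testing on a different Windows 7 machine. Unlike the other case, candidates do appear, but only a handful of them. I suspect it still has to do with the dictionary I converted myself. The dictionary is here: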

pyim-sgcore.zip

Owner

tumashu commented Jan 24, 2016

Your dictionary was not built correctly. Open the dictionary file in Emacs and run pyim-update-file to sort the dictionary.
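To make "sorted" concrete, here is a minimal Python sketch of what sorting such a file could look like. It assumes each line is a pinyin code followed by space-separated words; that layout is an assumption and is not stated in this thread. `sort_dict_lines` is a hypothetical helper, not part of pyim, and it does not claim to reproduce what the Emacs command does.

```python
def sort_dict_lines(lines):
    """Sort dictionary lines by their first field and merge duplicate codes.

    Assumed (hypothetical) line layout: "<pinyin-code> <word1> <word2> ..."
    """
    merged = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        code, words = fields[0], fields[1:]
        bucket = merged.setdefault(code, [])
        for word in words:
            if word not in bucket:  # keep first-seen order, drop duplicates
                bucket.append(word)
    # Plain code-point ordering of the codes; Emacs may order differently.
    return [" ".join([code] + merged[code]) for code in sorted(merged)]
```

Merging duplicate codes matters when two separately generated files are concatenated, because the same code can then appear on two lines.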

Contributor Author

et2010 commented Jan 24, 2016

Do you mean pyim-update-dict-file? I tried it, and it still doesn't work. This is driving me crazy.

Contributor Author

et2010 commented Jan 24, 2016

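While tidying up the dictionary this time, I did the following:

  • Removed mixed Chinese-English words
  • Removed the non-Chinese characters in the original file (they are not ASCII either; I have no idea what they are)
  • Removed the Ext-A/B/C/D/E extension Chinese characters

Finally, I converted the dictionary with pyim's built-in feature (single characters and words were converted separately and then cat'ed together; that is the file I uploaded).

Doing it this way shouldn't break the dictionary, right? Or did I accidentally step on a landmine?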
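The three cleanup steps above can be sketched in Python as a single filter. This is only an illustration: it assumes "Chinese character" means the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), and `keep_word` and `clean_words` are hypothetical names, not the script actually used here.

```python
import re

# Basic CJK Unified Ideographs block only. The extension blocks
# (Ext-A at U+3400, Ext-B at U+20000, and so on) fall outside this range.
BASIC_HAN = re.compile(r"[\u4e00-\u9fff]+")

def keep_word(word):
    """True if every character of word is a basic-block Han character."""
    return BASIC_HAN.fullmatch(word) is not None

def clean_words(words):
    """Drop mixed Chinese-English words, stray symbols, and extension characters."""
    return [w for w in words if keep_word(w)]
```

Because the filter is a whitelist, mixed words such as "T恤", full-width punctuation, and extension-block characters are all rejected by the same check.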

Owner

tumashu commented Jan 24, 2016

Single characters and words can't be kept separate...

Owner

tumashu commented Jan 24, 2016

After you ran that command, was your dictionary sorted by pinyin?

Contributor Author

et2010 commented Jan 24, 2016

Yes, after running the command the dictionary is sorted by pinyin.

Owner

tumashu commented Jan 24, 2016

Add me on QQ: 329985753

Contributor Author

et2010 commented Jan 24, 2016

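I redid the whole thing from scratch, and this time it seems to work.

Lessons learned:

  • The second time, I did not add the 7000 common characters
  • This time I fixed the function that generates the dict from words, so it no longer deletes single characters
  • Don't tinker with things for no reason

I think the key problem was that the first time around, when merging with cat, I did not check whether the 7000-character file and the word dictionary file were both UTF-8 encoded. As a result the merged dictionary file was corrupted, and pyim cannot handle a corrupted dictionary.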
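A check along these lines could catch the encoding mismatch before merging. It is a minimal Python sketch, not part of pyim: `is_utf8` and `merge_utf8_files` are hypothetical helpers, and the file paths are whatever the caller supplies.

```python
from pathlib import Path

def is_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

def merge_utf8_files(paths, out_path):
    """Concatenate files like cat, but refuse any input that is not valid UTF-8."""
    chunks = []
    for p in paths:
        data = Path(p).read_bytes()
        if not is_utf8(data):
            raise ValueError(f"{p} is not valid UTF-8; convert it before merging")
        # Add a trailing newline so the last line of one file
        # does not fuse with the first line of the next.
        if data and not data.endswith(b"\n"):
            data += b"\n"
        chunks.append(data)
    Path(out_path).write_bytes(b"".join(chunks))
```

A GBK-encoded file will normally fail the decode, so the merge stops with an error instead of silently producing a mixed-encoding file.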

Contributor Author

et2010 commented Jan 24, 2016

With this issue resolved, #53 got resolved along the way.

et2010 closed this as completed Jan 24, 2016