Word segmentation keeps going wrong. What is happening? #12
Comments
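Web UI output:
Question: 苏格兰属于哪个洲 ("Which continent does Scotland belong to?")

Log output:
Question: 苏格兰属于哪个洲
Found 8 Evidence items
Processing question type pattern file: /questionTypePatterns/QuestionTypePatternsLevel1_true.txt
Processing question type pattern file: /questionTypePatterns/QuestionTypePatternsLevel2_true.txt
Processing question type pattern file: /questionTypePatterns/QuestionTypePatternsLevel3_true.txt
Question 苏格兰属于哪个洲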
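I figured it out: I had put the project on the Linux desktop (the 桌面 directory), and the Chinese characters in that path cause problems when resources are loaded.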
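For anyone hitting the same thing, here is a minimal sketch of one common way a non-ASCII directory name can break resource loading in Java. It is an assumption about the mechanism, not a confirmed diagnosis of this project: resource URLs usually carry percent-encoded paths, and `URL.getPath()` returns them undecoded, so passing that string to `new File(...)` points at a directory that does not exist. The class name `ResourcePathSketch` and the file name `dic.txt` are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

class ResourcePathSketch {
    // URL.getPath() keeps percent-escapes as they are, so a directory
    // named 桌面 shows up as %E6%A1%8C%E9%9D%A2 in the returned string.
    static String rawPath(URL url) {
        return url.getPath();
    }

    // Converting to a URI first and calling getPath() decodes the escapes,
    // giving a path that matches the real directory name on disk.
    static String decodedPath(URL url) throws URISyntaxException {
        return url.toURI().getPath();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical resource URL, shaped like what a class loader
        // might return for a project stored under ~/桌面.
        URL url = URI.create("file:/home/user/%E6%A1%8C%E9%9D%A2/project/dic.txt").toURL();
        System.out.println("raw:     " + rawPath(url));
        System.out.println("decoded: " + decodedPath(url));
    }
}
```

If the project opens resources through the raw string, moving it to an ASCII-only path (as I did) avoids the problem without any code change.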
Segmenting the question: 苏格兰属于哪个洲
Segmentation result: 苏 格 兰 属 于 哪 个 洲
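I checked out the source code and ran it in Eclipse several times; as far as I remember, one of those runs did answer the question correctly.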
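My initial diagnosis is a word segmentation error: the CommonCandidateAnswerSelect class ignores candidate answers whose length is less than 2, so 苏格兰 above, having been split into single characters, gets ignored.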
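To make the effect concrete, here is a rough sketch of how a minimum-length rule interacts with the two segmentations. It is not the actual code of CommonCandidateAnswerSelect; the class `CandidateLengthFilterSketch`, the method `dropShort`, and the "expected" segmentation 苏格兰 / 属于 / 哪个 / 洲 are my own assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class CandidateLengthFilterSketch {
    // Hypothetical stand-in for the rule described above:
    // keep only candidates with at least minLength characters.
    static List<String> dropShort(List<String> candidates, int minLength) {
        return candidates.stream()
                .filter(c -> c.length() >= minLength)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Segmentation seen in the log: every token is a single character.
        List<String> broken = Arrays.asList("苏", "格", "兰", "属", "于", "哪", "个", "洲");
        // Segmentation I would expect from a working dictionary (assumed).
        List<String> expected = Arrays.asList("苏格兰", "属于", "哪个", "洲");

        // With single-character tokens nothing survives the filter.
        System.out.println(dropShort(broken, 2));
        // With proper words, the multi-character tokens are kept.
        System.out.println(dropShort(expected, 2));
    }
}
```

With the broken segmentation every token has length 1, so a length-2 threshold leaves nothing to select from.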