[Fix] work with non-ascii filter word which can not be seperated by space #2536

Merged
merged 1 commit into from Apr 5, 2013

Conversation

Projects
None yet
4 participants
Contributor

ulion commented Apr 3, 2013

The file item filter, filter items by StringUtils::FindWords method.
but the StringUtils::FindWords method assume file labels is words separated by space, that's not true for non-ascii labels.
This pr will do a strstr like search for non-ascii filter word, and keep old behavior for ascii filter word.

Member

garbear commented Apr 3, 2013

TestStringUtils.cpp tests this code directly, can it be updated to include this fix?

Member

jmarshallnz commented Apr 3, 2013

It can be true for non-ascii labels, but doesn't necessarily always have to be true. As you have it ATM, only if the filter starts with a non-ascii letter are the spaces ignored. Could their be cases where the filter word starts with an ascii letter but no space is present? Also, there is certainly cases where the filter word can start with a non-ascii letter yet a space should also be present.

If there's no simple way to determine the beginning of a word, perhaps we should consider dropping the support for word searches completely?

Contributor

ulion commented Apr 3, 2013

current pr implement:
for ascii starting filter, match the filter to the space separated word beginning, until filter is fully consumed, matched.
for non-ascii starting filter, doing a strstr search to the target label.

yes, there's no simple way to determine the beginning of a word, but keep the word match for ascii is still necessary, imo.

Member

jmarshallnz commented Apr 3, 2013

So what if the filter word is "Äpfel"? You do a strstr() in that case. However for "Apple" you don't.

IMO if there's no way to determine those words that should be checked for regardless of space, we should remove the checking for all.

Contributor

ulion commented Apr 3, 2013

test case added.

but I didn't figure out how to enable gtest on osx, so not tested.

Contributor

ulion commented Apr 3, 2013

that's another question to whether remove word match for ascii word, and I just to fix it for non-ascii filter match.

if you really care, I can limit the strstr match only for utf8 char with at least 3 bytes length (latin is 2 bytes utf8 chars) , then your case has unified behavior like before.

Member

jmarshallnz commented Apr 3, 2013

Your fix requires you to ignore spaces as some languages don't have spaces to separate words. I don't think there's an easy way to determine whether a given word should be preceded by a space or not.

Another idea that is slightly different to yours but I think might be slightly more generic is to start the matching after either a space or a non-latin character.

Contributor

ulion commented Apr 3, 2013

maybe we can limit the first char match point to:

  1. first char of an ascii or latin alpha word
  2. first char of a number
  3. any other characters

that means, the match point is not:

  1. an ascii or latin alpha char after an ascii or latin alpha char
  2. [0-9] after any of [0-9]
Member

jmarshallnz commented Apr 3, 2013

That sounds similar to my proposal, yes. Basically you'd just skip to the next character following whitespace or non-ascii/latin character, whichever comes sooner.

Owner

Memphiz commented Apr 4, 2013

@ulion fyi - for enabling the testframework:

edit https://github.com/xbmc/xbmc/blob/master/tools/depends/target/xbmc/Makefile

add --enable-gtest to the configrue line

then make -C tools/depends/target/xbmc

then make test will build and run them... (have to add this to the readme)

Contributor

ulion commented Apr 5, 2013

I tried that it compile xbmc and full my disk. I'm using xcode for build xbmc, so I can not do the gtest work currently.

Contributor

ulion commented Apr 5, 2013

updated to skip current latin word or number.

Owner

Memphiz commented Apr 5, 2013

Well - nvm but it doesn't matter if you build via xcode or not ... but if i got you right your disk is full so nothing we can do about that in our buildsystem ;)

Member

jmarshallnz commented Apr 5, 2013

Is there any reason we don't have --enable-gtest on by default? Would save a reconfigure.

Contributor

ulion commented Apr 5, 2013

test cases added for word/number mixed label.

@jmarshallnz jmarshallnz added a commit that referenced this pull request Apr 5, 2013

@jmarshallnz jmarshallnz Merge pull request #2536 from ulion/non_ascii_filter_word_fix
[Fix] work with non-ascii filter word which can not be seperated by space
a072736

@jmarshallnz jmarshallnz merged commit a072736 into xbmc:master Apr 5, 2013

ulion deleted the ulion:non_ascii_filter_word_fix branch Apr 6, 2013

@ulion ulion added a commit that referenced this pull request Apr 15, 2013

@ulion ulion Merge pull request #2617 from ulion/fix_findword_deadloop
StringUtils::FindWords fix dead loop (introduced by PR #2536)
3c7f72c
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment