# BM Algorithm (Boyer-Moore)

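> The running time of the Boyer-Moore algorithm is likewise linear in the size of the string being searched, but it is usually only a fraction of that of other algorithms: it does not need to compare every character of the searched string one by one, and instead skips over parts of it. In general, the longer the search keyword, the faster the algorithm runs. Its efficiency comes from the fact that, for each failed match attempt, the algorithm uses that information to rule out as many positions as possible where the pattern cannot match.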

> [wiki](https://zh.wikipedia.org/wiki/Boyer-Moore%E5%AD%97%E7%AC%A6%E4%B8%B2%E6%90%9C%E7%B4%A2%E7%AE%97%E6%B3%95)
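
As a small illustration of how a failed comparison rules out positions, the sketch below computes the shift suggested by the bad character rule for a single mismatch. The helper name `bad_character_shift` and its parameters are assumptions made for this example; they are not taken from the quoted article.

```python
def bad_character_shift(pattern, mismatch_index, bad_char):
    """Shift suggested by the bad character rule for one mismatch.

    pattern[mismatch_index] was compared against bad_char (a character
    of the searched text) and the two differed.
    """
    # Rightmost occurrence of bad_char to the left of the mismatch, or -1
    rightmost = pattern.rfind(bad_char, 0, mismatch_index)
    # Line that occurrence up with bad_char; with -1 the whole pattern
    # moves past the bad character
    return mismatch_index - rightmost


print(bad_character_shift('abd', 2, 'c'))      # 3: 'c' is not in the pattern
print(bad_character_shift('abd', 2, 'a'))      # 2: align the 'a' at index 0
print(bad_character_shift('abbcabc', 4, 'c'))  # 1: align the 'c' at index 3
```

A shift of 3 for a 3-character pattern means that three candidate positions are discarded after a single character comparison, which is where the speed-up described above comes from.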

def boyer_moore(base, expect):
    print('-'*20, 'new case', '-'*20)
    index = 0

    base_len = len(base)
    expect_len = len(expect)
    expect_range = range(0, expect_len)
    while index+expect_len <= base_len:
        base_range = range(index, index+expect_len)

        print('base[%s:%s]: %s' % (base_range.start, base_range.stop, base[base_range.start:base_range.stop]))
        print('expect: %s' % expect)

        # Compare the window with the pattern from right to left
        for x, y in zip(base_range[::-1], expect_range[::-1]):
            # Mismatch
            if base[x] != expect[y]:
                break
        else:
            # Every character matched
            print('\n', 'match: base[%s:%s]\n' % (base_range.start, base_range.stop))
            return base_range

        # Bad character rule: align the bad character with its rightmost
        # occurrence in the pattern to the left of the mismatch. rfind()
        # returns -1 when there is none, which moves the pattern past it.
        bad_character = base[x]
        bad_character_move = y - expect.rfind(bad_character, 0, y)
        print('bad character: base[%s]: %s' % (x, bad_character))

        # Good suffix rule: the part of the window that already matched
        good_suffix = base[x+1:base_range.stop]
        print('good suffix: base[%s:%s]: %s' % (x+1, base_range.stop, good_suffix))
        # -2 marks "no good suffix", so the bad character rule decides
        good_suffix_move = -2
        if good_suffix:
            # Rightmost earlier occurrence of the good suffix in the pattern
            previous = expect.rfind(good_suffix, 0, expect_len-1)
            if previous != -1:
                good_suffix_move = y + 1 - previous
            else:
                # Otherwise align the longest tail of the good suffix
                # that is also a prefix of the pattern
                good_suffix_move = expect_len
                print('matching the good suffix against the head of the pattern:')
                for k in range(len(good_suffix)):
                    print('- %s' % good_suffix[k:])
                    if expect.startswith(good_suffix[k:]):
                        good_suffix_move -= len(good_suffix) - k
                        break

        print('bad_character_move: %s' % bad_character_move)
        print('good_suffix_move: %s' % good_suffix_move)
        index += max(good_suffix_move, bad_character_move)

        print('-'*20)
    print('not match \n')


boyer_moore('abcacabdc', 'abd')
boyer_moore('acabcbcbacabc', 'abbcabc')
boyer_moore('acabcbcbacabc', 'cbacabc')

-------------------- new case --------------------
base[0:3]: abc
expect: abd
bad character: base[2]: c
good suffix: base[3:3]: 
bad_character_move: 3
good_suffix_move: -2
--------------------
base[3:6]: aca
expect: abd
bad character: base[5]: a
good suffix: base[6:6]: 
bad_character_move: 2
good_suffix_move: -2
--------------------
base[5:8]: abd
expect: abd

 match: base[5:8]

-------------------- new case --------------------
base[0:7]: acabcbc
expect: abbcabc
bad character: base[4]: c
good suffix: base[5:7]: bc
bad_character_move: 1
good_suffix_move: 3
--------------------
base[3:10]: bcbcbac
expect: abbcabc
bad character: base[8]: a
good suffix: base[9:10]: c
bad_character_move: 1
good_suffix_move: 3
--------------------
base[6:13]: cbacabc
expect: abbcabc
bad character: base[8]: a
good suffix: base[9:13]: cabc
matching the good suffix against the head of the pattern:
- cabc
- abc
- bc
- c
bad_character_move: 2
good_suffix_move: 7
--------------------
not match 

-------------------- new case --------------------
base[0:7]: acabcbc
expect: cbacabc
bad character: base[4]: c
good suffix: base[5:7]: bc
matching the good suffix against the head of the pattern:
- bc
- c
bad_character_move: 1
good_suffix_move: 6
--------------------
base[6:13]: cbacabc
expect: cbacabc

 match: base[6:13]

range(6, 13)
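
For use outside a notebook, the bad character and good suffix rules can also be packaged without the trace output. The following is a minimal sketch under stated assumptions: the name `bm_find` is hypothetical, it uses the weak form of the good suffix rule, and it recomputes shifts with `str.rfind` on every mismatch instead of precomputing tables, so it illustrates the shifts rather than the algorithm's best-case speed. It returns the start index of the first match, or -1, like `str.find`, and is cross-checked against `str.find` on all short strings over a two-letter alphabet.

```python
import itertools


def bm_find(text, pattern):
    """Return the index of the first match of pattern in text, or -1."""
    m = len(pattern)
    index = 0
    while index + m <= len(text):
        # Compare right to left; y stops at the mismatch or reaches -1
        y = m - 1
        while y >= 0 and text[index + y] == pattern[y]:
            y -= 1
        if y < 0:
            return index

        # Bad character rule: rightmost occurrence left of the mismatch
        bad_move = y - pattern.rfind(text[index + y], 0, y)

        # Good suffix rule (weak form); 0 when nothing has matched yet
        suffix = pattern[y + 1:]
        good_move = 0
        if suffix:
            previous = pattern.rfind(suffix, 0, m - 1)
            if previous != -1:
                good_move = y + 1 - previous
            else:
                # Longest proper tail of the suffix that is a pattern prefix
                good_move = m
                for k in range(1, len(suffix)):
                    if pattern.startswith(suffix[k:]):
                        good_move = m - (len(suffix) - k)
                        break
        index += max(bad_move, good_move)
    return -1


# Cross-check against str.find on every short string over 'ab'
for n in range(0, 7):
    for text in map(''.join, itertools.product('ab', repeat=n)):
        for k in range(1, 4):
            for pattern in map(''.join, itertools.product('ab', repeat=k)):
                assert bm_find(text, pattern) == text.find(pattern)

print(bm_find('abcacabdc', 'abd'))          # 5
print(bm_find('acabcbcbacabc', 'abbcabc'))  # -1
print(bm_find('acabcbcbacabc', 'cbacabc'))  # 6
```

The bad character shift is always at least 1, so taking the maximum of the two shifts guarantees progress even when no good suffix exists.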