Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

莱文斯坦距离的 JavaScript 版本实现 #36

Open
zwhu opened this issue Apr 9, 2017 · 1 comment
Open

莱文斯坦距离的 JavaScript 版本实现 #36

zwhu opened this issue Apr 9, 2017 · 1 comment

Comments

@zwhu
Copy link
Owner

zwhu commented Apr 9, 2017

莱文斯坦距离,又称Levenshtein距离,是编辑距离的一种。指两个字串之间,由一个转成另一个所需的最少编辑操作次数。许可的编辑操作包括将一个字符替换成另一个字符,插入一个字符,删除一个字符。

莱文斯坦距离与 react 有千丝万缕的关系,所以花了几天的时间研究了 levenshtein 的算法,以及使用 js 实现了一遍,从 这里 (此链接给出了大多数语言版本的实现)copy 了测试用例。后面如果有时间我再另写一篇莱文斯坦距离与 React 的 key 的关系的文章。

'use strict'

const min = Math.min

// 构造二维数组
const initialArray = (lgt1, lgt2) => {
  let array = []
  for (let i = 0; i < lgt1; i++) {
    array[i] = Array(lgt2)
  }
  return array
}

/**
 * Levenshtein Distance
 * 从一个字符串转成另一个字符串所需要的最小操作步数
 */
const levenshtein = (sa, sb) => {
  let d = []
  , algt = sa.length
  , blgt = sb.length
  , i = 0
  , j = 0

  if(algt === 0) return blgt 

  if(blgt === 0) return algt
  
  // 初始化二维数组
  d = initialArray(algt + 1, blgt + 1)

  for (i = 0; i < algt + 1; i++) {
    d[i][0]  = i
  }

  for (j = 0; j < blgt + 1; j++) {
    d[0][j] = j
  }

  for(i = 1; i < algt + 1; i++) {
    for(j = 1; j < blgt + 1; j++) {
      if(sa[i - 1] === sb[j - 1]) {
        d[i][j] = d[i - 1][j - 1]
      } else {
        d[i][j] = min(
          d[i - 1][j] + 1,
          d[i][j - 1] + 1,
          d[i - 1][j - 1] + 1
        )  
      }
    }
  }

  return d[i - 1][j - 1]
}


// test case
// copy from https://rosettacode.org/wiki/Levenshtein_distance#JavaScript
;[ 
  ['', '', 0],
  ['yo', '', 2],
  ['', 'yo', 2],
  ['yo', 'yo', 0],
  ['tier', 'tor', 2],
  ['saturday', 'sunday', 3],
  ['mist', 'dist', 1],
  ['tier', 'tor', 2],
  ['kitten', 'sitting', 3],
  ['stop', 'tops', 2],
  ['rosettacode', 'raisethysword', 8],
  ['mississippi', 'swiss miss', 8]
].forEach(function(v) {
  var a = v[0], b = v[1], t = v[2], d = levenshtein(a, b);
  if (d !== t) {
    console.log('levenstein("' + a + '","' + b + '") was ' + d + ' should be ' + t)
  }
})
@zwhu
Copy link
Owner Author

zwhu commented Apr 9, 2017

上面初始化数组还有效率优化的空间,不过我为了让代码有良好的可读性,就没有再继续压缩循环的次数了。

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant