Add tolerance so Templater can be less "literal" #1

Closed
turicas opened this Issue Apr 20, 2012 · 1 comment

@turicas
Owner

The library that inspired this one (templatemaker) can learn with a "tolerance" measured in characters. Since an example is worth more than a lot of words, just see:

>>> t = Templater()
>>> t.learn('my favorite color is blue')
>>> t.learn('my favorite color is violet')
>>> t.learn('my favorite color is purple')
>>> print t._template
[None, 'my favorite color is ', None, 'l', None, 'e', None]

Note the 'l' and 'e' in the template definition. In some cases we don't want to be so literal, so we should be able to add a tolerance, like this:

>>> t = Templater(tolerance=1)
>>> t.learn('my favorite color is blue')
>>> t.learn('my favorite color is violet')
>>> t.learn('my favorite color is purple')
>>> print t._template
[None, 'my favorite color is ', None]

So, with tolerance=1, Templater will tolerate common substrings of size 1 (one character): they are absorbed into the surrounding "variable" instead of splitting it. A common substring becomes a fixed part of the template only if it appears in all learned cases and its length is greater than 1.

For this to work we need to modify the function _create_template so that it does not catch every longest common substring, but only the ones with size greater than tolerance.
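
A minimal sketch of that idea, not the actual _create_template code: the helper names below (longest_common_substring, create_template) are hypothetical, and the brute-force search is only meant to show where the tolerance check would go. It is written for Python 3, unlike the Python 2 session above.

```python
def longest_common_substring(strings):
    """Return the longest substring present in every string ('' if none)."""
    shortest = min(strings, key=len)
    # Try candidates taken from the shortest string, longest first.
    for size in range(len(shortest), 0, -1):
        for start in range(len(shortest) - size + 1):
            candidate = shortest[start:start + size]
            if all(candidate in s for s in strings):
                return candidate
    return ''


def create_template(strings, tolerance=0):
    """Build a template list in which None marks a variable slot.

    Hypothetical stand-in for _create_template: a common substring is
    kept as a fixed block only if it is longer than `tolerance`.
    """
    common = longest_common_substring(strings)
    if len(common) <= tolerance:
        # Too short (or nothing in common): treat the whole span as variable.
        return [None]
    lefts, rights = [], []
    for s in strings:
        index = s.find(common)
        lefts.append(s[:index])
        rights.append(s[index + len(common):])
    # Recurse on what is left on each side of the common block.
    return (create_template(lefts, tolerance)
            + [common]
            + create_template(rights, tolerance))


colors = ['my favorite color is blue',
          'my favorite color is violet',
          'my favorite color is purple']
print(create_template(colors))               # keeps 'l' and 'e' as fixed blocks
print(create_template(colors, tolerance=1))  # drops the one-character blocks
```

With tolerance=0 this sketch reproduces the first template shown above, and with tolerance=1 it reproduces the second one.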

@turicas turicas was assigned Apr 20, 2012
@turicas
Owner

The save and load methods also need to change, since we need to store self.tolerance (and maybe other info); right now we are storing only self._template.
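
A rough sketch of what storing more than the template could look like, assuming pickle is the storage format. The class below is a stripped-down, hypothetical stand-in for Templater (no learn method), and it pickles the instance's attribute dictionary, which is one possible variation on the idea and not necessarily how the library does it.

```python
import pickle


class Templater(object):
    """Hypothetical, stripped-down stand-in used only to show save/load."""

    def __init__(self, template=None, tolerance=0):
        self._template = template
        self._tolerance = tolerance

    def save(self, filename):
        # Store every attribute, so _tolerance travels with _template.
        with open(filename, 'wb') as fp:
            pickle.dump(self.__dict__, fp)

    @classmethod
    def load(cls, filename):
        with open(filename, 'rb') as fp:
            state = pickle.load(fp)
        instance = cls()
        instance.__dict__.update(state)
        return instance
```

Because the whole attribute dictionary is stored, any attribute added later is saved and restored without touching save or load again.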

@turicas turicas added a commit that closed this issue Apr 20, 2012
@turicas FIX: pickling whole Templater object [GH: #fix #1]
- Storing whole Templater object so _tolerance is also stored
- Little changes to Makefile to running tests
8f1ca9d
@turicas turicas closed this in 8f1ca9d Apr 20, 2012