parsing(?) bugs in captures #125

Closed
rntz opened this issue Sep 22, 2020 · 4 comments

rntz commented Sep 22, 2020

Two (possibly related) bugs. Let's call them the "duck duck goose" and "merry christmas" bugs. First one, "duck duck goose", involves some interacting captures:

duckduckgoose.py

from talon import Module, Context
mod=Module()
ctx=Context()

@mod.capture
def ducks(m) -> str: "duck+"
@ctx.capture(rule="duck+")
def ducks(m): return "ducks"

@mod.capture
def duckgoose(m) -> str: "duck+ goose"
@ctx.capture(rule="<self.ducks> goose")
def duckgoose(m): return "duckgoose"

duckduckgoose.talon

duck test <user.ducks>: "{ducks}"
goose test <user.duckgoose>: "{duckgoose}"
test <user.ducks> <user.duckgoose>: "{ducks} {duckgoose}"

Now, try saying "test duck duck goose".

Expected result: "ducks duckgoose"
Actual result:

2020-09-22 22:51:01    IO 
2020-09-22 22:51:05    IO EMIT ['test', 'duckdak', 'goose']
2020-09-22 22:51:05    IO COMPILING
2020-09-22 22:51:05    IO dfa rules built in 0.101s
2020-09-22 22:51:05    IO dfa rules linked 0.113s
2020-09-22 22:51:06    IO minimize + cfg in 0.174s
2020-09-22 22:51:06    IO DECODING
detecting in viterbi toks: #################_test###_duckdak###_gooseoseose####
791.166 #################_test###_duck#_duck##_gooseoseose####
  result: test duck duck goose

2020-09-22 22:51:06    IO DECODED ['test', 'duck', 'duck', 'goose']
2020-09-22 22:51:06 ERROR cb error topic="phrase" cb=<bound method SpeechSystem.engine_event of <talon.scripting.speech_system.SpeechSystem object at 0x7fb91d52ced0>>
   24:       lib/python3.7/threading.py:890| 
   23:       lib/python3.7/threading.py:926| 
   22:       lib/python3.7/threading.py:870| 
   21:                    talon/cron.py:112| 
   20: ------------------------------------# cron thread
   19:                    talon/cron.py:77 | 
   18:          talon/scripting/rctx.py:200| 
   17: ------------------------------------# 'cron' main:<lambda>()
   16:                     talon/vad.py:16 | 
   15:             talon/engines/w2l.py:745| 
   14:      talon/scripting/dispatch.py:98 | 
   13:      talon/scripting/dispatch.py:133| 
   12:      talon/scripting/dispatch.py:124| 
   11:          talon/scripting/rctx.py:200| 
   10: ------------------------------------# 'phrase' user.engines:_redispatch()
    9: talon/scripting/speech_system.py:42 | 
    8:      talon/scripting/dispatch.py:98 | 
    7:      talon/scripting/dispatch.py:133| 
    6:      talon/scripting/dispatch.py:124| 
    5:          talon/scripting/rctx.py:202| 
    4: ------------------------------------# 'phrase' user.engines:engine_event()
    3: ------------------------------------# stack splice
    2:          talon/scripting/rctx.py:200| 
    1: talon/scripting/speech_system.py:300| 
talon.engines.EngineError: failed to parse phrase: ['test', 'duck', 'duck', 'goose']
2020-09-22 22:51:06    IO [audio]=2430.000ms  [emit]=510.499ms (0.21x)  [decode]=2.124ms (0.00x)  [total]=512.623ms (0.21x)

Ok, now for the merry christmas bug.

merry.py

from talon import Module, Context
mod=Module()
ctx=Context()

mod.list("merry", desc="merry")
ctx.lists["self.merry"] = { "merry": "merry" }

@mod.capture
def merries(m) -> str: "merry+"
@ctx.capture(rule="{self.merry}+")
def merries(m): return "-".join(m.merry_list)

merry.talon

# This fails with an AttributeError
<user.merries> merry* christmas: "MERRY CHRISTMAS"

# This succeeds.
#merry* <user.merries> christmas: "MERRY CHRISTMAS"

Now try saying "merry christmas".

Expected result: "MERRY CHRISTMAS"
Actual result:

2020-09-22 22:58:25    IO EMIT ['merry', 'christmas']
2020-09-22 22:58:25    IO DECODING
detecting in viterbi toks: ###########_merryry_christmas###########
611.77 ###########_merryry_christmas###########
  result: merry christmas

2020-09-22 22:58:25    IO DECODED ['merry', 'christmas']
2020-09-22 22:58:25 ERROR     2: talon/grammar/vm.py:87| 
    1: talon/grammar/vm.py:82| 
KeyError: 'merry_list'

[The below error was raised while handling the above exception(s)]
2020-09-22 22:58:25 ERROR cb error topic="phrase" cb=<bound method SpeechSystem.engine_event of <talon.scripting.speech_system.SpeechSystem object at 0x7fb91d52ced0>>
   33:       lib/python3.7/threading.py:890| 
   32:       lib/python3.7/threading.py:926| 
   31:       lib/python3.7/threading.py:870| 
   30:                    talon/cron.py:112| 
   29: ------------------------------------# cron thread
   28:                    talon/cron.py:77 | 
   27:          talon/scripting/rctx.py:200| 
   26: ------------------------------------# 'cron' main:<lambda>()
   25:                     talon/vad.py:16 | 
   24:             talon/engines/w2l.py:745| 
   23:      talon/scripting/dispatch.py:98 | 
   22:      talon/scripting/dispatch.py:133| 
   21:      talon/scripting/dispatch.py:124| 
   20:          talon/scripting/rctx.py:200| 
   19: ------------------------------------# 'phrase' user.engines:_redispatch()
   18: talon/scripting/speech_system.py:42 | 
   17:      talon/scripting/dispatch.py:98 | 
   16:      talon/scripting/dispatch.py:133| 
   15:      talon/scripting/dispatch.py:124| 
   14:          talon/scripting/rctx.py:202| 
   13: ------------------------------------# 'phrase' user.engines:engine_event()
   12: ------------------------------------# stack splice
   11:          talon/scripting/rctx.py:200| 
   10: talon/scripting/speech_system.py:301| 
    9:              talon/grammar/vm.py:174| 
    8:              talon/grammar/vm.py:137| 
    7: talon/scripting/speech_system.py:318| 
    6:              talon/grammar/vm.py:174| 
    5:              talon/grammar/vm.py:137| 
    4: talon/scripting/speech_system.py:322| 
    3:         talon/scripting/types.py:327| 
    2:    user/mine/regression/merry.py:11 | def merries(m): return "-".join(m.merr..
    1:              talon/grammar/vm.py:89 | 
AttributeError: merry_list
2020-09-22 22:58:25    IO [audio]=1710.000ms  [emit]=157.272ms (0.09x)  [decode]=3.478ms (0.00x)  [total]=160.750ms (0.09x)

While these bugs are a bit arcane, they are not contrived. I ran into the merry christmas bug while writing actual talon code to do with modifier keys. The code I was writing was incorrect, but I didn't realize this because it triggered the merry christmas bug. I discovered the duck duck goose bug while trying to minimize the merry christmas bug.

lunixbochs commented Sep 23, 2020

Re: duck duck goose.

I think the optimizer is rejecting the duck duck goose case on purpose, and for good reason: to prevent exponential parsing time.

<duck>: duck+ compiles to a loop around the word duck

  user.ducks.0:
  0 WORD 'duck'
  1 FORK (0, -2)
  2 RETURN

  user.duckgoose.0:
  0 CALL <user.ducks>
  1 WORD 'goose'
  2 RETURN

To prevent exponential parsing cases, when a loop jumps backwards and forwards at the same time, the forward path is not allowed to visit the backwards jump target without advancing a word.

(<duck> <duck>) is two basic loops around the word duck. The second duck will never contain any words: it is a descendant of the first duck's dual forward/backward jump, so it is prevented from jumping to the word duck. I believe this is correct. The only solution I can imagine is to allow backtracking one word in the second loop once the first loop terminates unsuccessfully, but that's kind of complicated.
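
One rough way to picture the failure is to compare a matcher that never gives words back with one that backtracks. The sketch below is a simplified model and not Talon's VM: the `Rule` representation and both function names are made up for illustration, and the rule is flattened by hand from `test <user.ducks> <user.duckgoose>`.

```python
from typing import List, Tuple

# Hypothetical flattened rule: (word, repeat), where repeat=True means "word+".
Rule = List[Tuple[str, bool]]

# Hand-flattened form of: test <user.ducks> <user.duckgoose>
RULE: Rule = [("test", False), ("duck", True), ("duck", True), ("goose", False)]
PHRASE = ["test", "duck", "duck", "goose"]


def match_greedy(rule: Rule, words: List[str]) -> bool:
    """Each repetition takes every word it can and never gives one back."""
    pos = 0
    for word, repeat in rule:
        if pos >= len(words) or words[pos] != word:
            return False
        pos += 1
        while repeat and pos < len(words) and words[pos] == word:
            pos += 1
    return pos == len(words)


def match_backtracking(rule: Rule, words: List[str], ri: int = 0, pos: int = 0) -> bool:
    """Try every length for each repetition, shortest first."""
    if ri == len(rule):
        return pos == len(words)
    word, repeat = rule[ri]
    end = pos
    while end < len(words) and words[end] == word:
        end += 1
        if match_backtracking(rule, words, ri + 1, end):
            return True
        if not repeat:
            break
    return False


print(match_greedy(RULE, PHRASE))        # first duck+ swallows both ducks
print(match_backtracking(RULE, PHRASE))  # finds duck | duck goose
```

In this model the greedy matcher leaves nothing for the second `duck+`, while the backtracking matcher finds a parse by retrying shorter first repetitions. The cost of the second approach grows with the number of ways the words can be split between adjacent repetitions, which is the kind of blow-up the restriction above is meant to avoid.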

There's no "correct" distribution of words between the two ducks anyway. I think the easy answer is you need to design your rules to not put two of the same basic repetition captures in a row without any bridge words.
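
To make the ambiguity concrete, here is a small sketch that counts the ways a run of ducks can be divided between the two captures, assuming each capture must take at least one duck. The function name `duck_splits` is hypothetical and nothing here calls into Talon.

```python
from typing import List, Tuple


def duck_splits(n_ducks: int) -> List[Tuple[int, int]]:
    """All (ducks, duckgoose) duck counts for 'test' + n_ducks * 'duck' + 'goose'.

    Both captures are 'duck+' at their core, so each needs at least one duck.
    """
    return [(first, n_ducks - first) for first in range(1, n_ducks)]


for n in (2, 3, 4):
    print(n, duck_splits(n))
```

With two ducks, as in the phrase from the report, only one split satisfies both captures. From three ducks onward there are several equally valid splits and nothing in the grammar prefers one over another. A bridge word between the captures pins the boundary, so each phrase has a single split again.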

@lunixbochs

Fixed the merry christmas bug in v0.1.2 - when recently optimizing list parsing for wav2letter, I introduced a regression where in some cases a list could consume 0 words but not fail that parse path. That's fixed now.

rntz commented Sep 24, 2020

Thanks! I can confirm this fixes the merry christmas bug for me. I am less concerned with the duck duck goose case, since I didn't run into it while writing real code, and as you point out it involves putting two repetition captures in a row, which is not a very sensible thing to do.

My only (mild) concern is that if one did accidentally write some code that looked like duck duck goose without realizing it, it might be hard to debug. (This is what happened with merry christmas; there was some indirection through captures that made it harder to notice.) If the error message said something about adjacent repetitions of the same capture/list, that would make it much easier to figure out the problem with my code. Is it easy to tell if this case is being triggered and change the error message?
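
As an illustration of the kind of check being asked about, here is a minimal sketch of a user-side lint that flags adjacent repetitions of the same word. It assumes the rule has already been flattened by hand into `(word, quantifier)` pairs with captures and lists expanded; the representation and the name `adjacent_repetitions` are hypothetical and say nothing about how Talon reports errors.

```python
from typing import List, Tuple

# Hypothetical flattened rule element: (word, quantifier), quantifier in "", "+", "*".
Element = Tuple[str, str]

# test <user.ducks> <user.duckgoose>
DUCKS: List[Element] = [("test", ""), ("duck", "+"), ("duck", "+"), ("goose", "")]
# <user.merries> merry* christmas
MERRY: List[Element] = [("merry", "+"), ("merry", "*"), ("christmas", "")]
# Same as DUCKS but with a bridge word between the two repetitions.
BRIDGED: List[Element] = [
    ("test", ""), ("duck", "+"), ("then", ""), ("duck", "+"), ("goose", ""),
]

REPEATING = ("+", "*")


def adjacent_repetitions(rule: List[Element]) -> List[str]:
    """Return one warning per pair of neighbouring repetitions of the same word."""
    warnings = []
    for (word, quant), (next_word, next_quant) in zip(rule, rule[1:]):
        if word == next_word and quant in REPEATING and next_quant in REPEATING:
            warnings.append(
                f"adjacent repetitions of {word!r}: {word}{quant} {next_word}{next_quant}"
            )
    return warnings


for name, rule in (("ducks", DUCKS), ("merry", MERRY), ("bridged", BRIDGED)):
    print(name, adjacent_repetitions(rule))
```

Both rules from this issue are flagged, while the bridged variant passes. The hard part in practice is the flattening step, since the repetition can be hidden behind several layers of captures.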

No worries if not, and thanks for fixing this so quickly!

lunixbochs commented Sep 24, 2020 via email

@rntz rntz closed this as completed Sep 24, 2020