# SCAN 

## Task 

Each example in the SCAN dataset pairs a natural language command with the sequence of actions it should be converted into:

$$\text{InputCommand} \longrightarrow \text{OutputSequence}$$
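As a concrete illustration, here is a minimal sketch that splits one example into its two sides. It assumes the `IN: <command> OUT: <actions>` line format of the released SCAN text files, and the function name `parse_scan_line` is hypothetical. The released files use action tokens such as `I_JUMP` and `I_TURN_LEFT`, whereas this notebook abbreviates them to `JUMP` and `LTURN`.

```python
def parse_scan_line(line):
    """Split an 'IN: <command> OUT: <actions>' line into two token lists."""
    # Assumes exactly one ' OUT: ' separator and a leading 'IN: ' prefix.
    left, right = line.strip().split(" OUT: ")
    if not left.startswith("IN: "):
        raise ValueError(f"unexpected line: {line!r}")
    command = left[len("IN: "):].split()
    actions = right.split()
    return command, actions


command, actions = parse_scan_line(
    "IN: jump thrice and turn left OUT: I_JUMP I_JUMP I_JUMP I_TURN_LEFT"
)
print(command)  # ['jump', 'thrice', 'and', 'turn', 'left']
print(actions)  # ['I_JUMP', 'I_JUMP', 'I_JUMP', 'I_TURN_LEFT']
```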

## Phrase Structure Grammar 

The input commands can be generated by a simple phrase structure (PS) grammar whose start symbol is C and whose derivations bottom out in the verbs produced by U:

1. C $\longrightarrow$ S and S
2. C $\longrightarrow$ S after S
3. C $\longrightarrow$ S
4. S $\longrightarrow$ V twice
5. S $\longrightarrow$ V thrice
6. S $\longrightarrow$ V
7. V $\longrightarrow$ D[1] opposite D[2]
8. V $\longrightarrow$ D[1] around D[2]
9. V $\longrightarrow$ D
10. V $\longrightarrow$ U
11. D $\longrightarrow$ U left
12. D $\longrightarrow$ U right
13. D $\longrightarrow$ turn left
14. D $\longrightarrow$ turn right
15. U $\longrightarrow$ walk
16. U $\longrightarrow$ run
17. U $\longrightarrow$ jump
18. U $\longrightarrow$ look

where C = full command, S = sentence phrase, V = verb phrase, D = direction phrase, and U = verb.
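To make the grammar concrete, the sketch below enumerates every command it generates. It assumes rules 7 and 8 are realized the way SCAN commands actually read, i.e. a verb or *turn*, then *opposite*/*around*, then a single direction (e.g. *jump around left*); the names `verb_phrases` and `commands` are hypothetical helpers.

```python
from itertools import product

VERBS = ["walk", "run", "jump", "look"]  # rules 15-18 (U)
DIRS = ["left", "right"]


def verb_phrases():
    """All V strings (rules 7-14), reading opposite/around as SCAN realizes them."""
    heads = VERBS + ["turn"]
    phrases = list(VERBS)                                    # V -> U
    phrases += [f"{h} {d}" for h in heads for d in DIRS]     # V -> D (rules 11-14)
    for mod in ("opposite", "around"):                       # rules 7-8
        phrases += [f"{h} {mod} {d}" for h in heads for d in DIRS]
    return phrases


def commands():
    """All C strings generated by rules 1-6."""
    sentences = [f"{v}{rep}" for v in verb_phrases()
                 for rep in ("", " twice", " thrice")]       # S -> V [twice|thrice]
    result = list(sentences)                                 # C -> S
    for conj in ("and", "after"):                            # C -> S and/after S
        result += [f"{s1} {conj} {s2}"
                   for s1, s2 in product(sentences, repeat=2)]
    return result


all_commands = commands()
print(len(verb_phrases()))  # 34
print(len(all_commands))    # 20910
```

Under that reading there are 34 verb phrases, 102 sentence phrases and 102 + 2 × 102² = 20,910 full commands, the same total as the full SCAN command set.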

## Compositional Abstraction Modelling

Compositionality here refers to compositional generalization, i.e. the ability to recognize the abstract structure underlying the data, recover the rules of that abstraction, and productively apply those rules in new contexts.

In the context of SCAN, a compositional abstraction model (CAM) should recover the phrase structure abstraction of the given input, then use it to parse new inputs and convert them into action sequences. If the model recovers the phrase structure grammar, there are two possible orders in which it can apply it:

1. Top down: the highest nodes, such as C, are resolved and interpreted first.
2. Bottom up: the lowest nodes, such as U, are resolved and interpreted first.

If the model has genuinely learned the compositional abstraction of the data (here, the PSG), it should resolve nodes in an order that follows the grammar, and that can only be done in one of the two ways listed above.
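A reference for what either order should produce is useful when checking a CAM. Below is a minimal sketch of a recursive interpreter that follows the grammar rule by rule; the names `interpret`, `interpret_s`, `interpret_v`, `TURN` and `VERB` are hypothetical, and it uses this notebook's shorthand tokens (`JUMP`, `LTURN`, ...) rather than SCAN's official ones. The recursion splits the command from the top (C first), while the action lists are assembled from the leaves up.

```python
TURN = {"left": "LTURN", "right": "RTURN"}
VERB = {"walk": "WALK", "run": "RUN", "jump": "JUMP", "look": "LOOK"}


def interpret_v(words):
    """V rules 7-10 and D rules 11-14: a verb or 'turn', optionally with a direction."""
    act = [] if words[0] == "turn" else [VERB[words[0]]]  # 'turn' adds no action
    if len(words) == 1:                 # V -> U
        return act
    turn = TURN[words[-1]]
    if len(words) == 2:                 # D -> U left/right, D -> turn left/right
        return [turn] + act
    if words[1] == "opposite":          # two turns, then the action
        return [turn, turn] + act
    if words[1] == "around":            # (turn, action) four times
        return ([turn] + act) * 4
    raise ValueError(f"not a verb phrase: {words}")


def interpret_s(words):
    """S rules 4-6: twice/thrice repeats the whole verb phrase."""
    reps = {"twice": 2, "thrice": 3}
    if words[-1] in reps:
        return interpret_v(words[:-1]) * reps[words[-1]]
    return interpret_v(words)


def interpret(command):
    """C rules 1-3: 'and' keeps the order, 'S1 after S2' runs S2 first."""
    words = command.split()
    if "and" in words:
        i = words.index("and")
        return interpret_s(words[:i]) + interpret_s(words[i + 1:])
    if "after" in words:
        i = words.index("after")
        return interpret_s(words[i + 1:]) + interpret_s(words[:i])
    return interpret_s(words)


print(interpret("jump thrice and turn left"))  # ['JUMP', 'JUMP', 'JUMP', 'LTURN']
```

Its output could serve as ground truth when testing `causal_model_td` and `causal_model_bu` below, after flattening the top-down result into single tokens.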

## Top down CAM 

1. Resolve C: Split sentence based on and/after
2. Resolve S: Identify and interpret twice/thrice
3. Resolve V: Identify and interpret opposite/around
4. Resolve D: Identify and interpret left/right/turn left/turn right
5. Resolve U: Identify and interpret all verbs

Example: Jump thrice and turn left

l0=['jump','thrice','and','turn','left']

C/l1=[['jump','thrice'],['turn','left']]

S/l2=[['jump','xxx'],['turn','left']]

V/l3=[['jump','xxx'],['turn','left']]

D/l4=[['jump','x x x'],['LTURN']]

U/l5=[['JUMP JUMP JUMP'],['LTURN']]

```python
def causal_model_td(command):
    """Top-down CAM: resolve C, then S, V, D and finally U.

    Returns one list per sentence phrase, e.g. [['JUMP JUMP JUMP'], ['LTURN']].
    """
    # Open question: should the abstraction model need an explicit split at all?
    l0 = command.split()

    # resolve c: split on 'and'/'after'; for 'S1 after S2', S2 is executed first
    if "and" in l0:
        ind = l0.index("and")
        l1 = [l0[:ind], l0[ind + 1:]]
    elif "after" in l0:
        ind = l0.index("after")
        l1 = [l0[ind + 1:], l0[:ind]]  # reverse order of command
    else:
        l1 = [l0]

    # resolve s: replace twice/thrice by the repetition markers 'xx'/'xxx'
    s_markers = {"twice": "xx", "thrice": "xxx"}
    l2 = [[s_markers.get(w, w) for w in item] for item in l1]

    # resolve v: replace around/opposite by the markers 'yyyy'/'yy'
    v_markers = {"around": "yyyy", "opposite": "yy"}
    l3 = [[v_markers.get(w, w) for w in item] for item in l2]

    # resolve d: rewrite each direction phrase into turn tokens; the verb (if any)
    # stays unresolved and the repetition marker is kept at the end
    turns = {"left": "LTURN", "right": "RTURN"}
    l4 = []
    for item in l3:
        reps = [w for w in item if w in ("xx", "xxx")]
        body = [w for w in item if w not in ("xx", "xxx")]
        if body[-1] not in turns:  # no direction, e.g. ['jump']
            l4.append(body + reps)
            continue
        turn = turns[body[-1]]
        verb = [] if body[0] == "turn" else [body[0]]  # 'turn' adds no action
        if "yyyy" in body:  # around: (turn, verb) four times
            resolved = ([turn] + verb) * 4
        elif "yy" in body:  # opposite: two turns, then the verb
            resolved = [turn, turn] + verb
        else:  # plain direction: one turn, then the verb
            resolved = [turn] + verb
        l4.append(resolved + reps)

    # resolve u: interpret all verbs, then expand the repetition marker
    actions = {"walk": "WALK", "run": "RUN", "jump": "JUMP", "look": "LOOK"}
    l5 = []
    for item in l4:
        n = {"xx": 2, "xxx": 3}.get(item[-1], 1)
        if n > 1:
            item = item[:-1]
        l5.append([" ".join([actions.get(w, w) for w in item] * n)])

    action = l5
    return action
```

## Bottom up CAM

1. Resolve U: Split sentence into lexical items, identify and interpret all verbs.
2. Resolve D: Identify and interpret left/right/turn left/turn right
3. Resolve V: Identify and interpret opposite/around
4. Resolve S: Identify and interpret twice/thrice
5. Resolve C: Identify and interpret and/after.

Example: Jump thrice and turn left

l0=['jump','thrice','and','turn','left']

U/l1=['JUMP','thrice','and','turn','left']

D/l2=['JUMP','thrice','and','LTURN']

V/l3=['JUMP','thrice','and','LTURN']

S/l4=['JUMP','JUMP','JUMP','and','LTURN']

C/l5=['JUMP','JUMP','JUMP','LTURN']

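```python
def causal_model_bu(command):
    """Bottom-up CAM: resolve U, then D, V, S and finally C.

    Returns a flat list of actions, e.g. ['JUMP', 'JUMP', 'JUMP', 'LTURN'].
    """
    l0 = command.split()

    # resolve u: interpret every verb token (all occurrences, not just the first)
    actions = {"walk": "WALK", "run": "RUN", "jump": "JUMP", "look": "LOOK"}
    l1 = [actions.get(w, w) for w in l0]

    # resolve d: merge 'turn left/right' into one turn token and 'U left/right'
    # into 'LTURN U'; a direction after around/opposite becomes a bare turn token
    turns = {"left": "LTURN", "right": "RTURN"}
    verbs = set(actions.values())
    l2 = []
    i = 0
    while i < len(l1):
        tok = l1[i]
        nxt = l1[i + 1] if i + 1 < len(l1) else None
        if tok == "turn" and nxt in turns:
            l2.append(turns[nxt])
            i += 2
        elif tok in verbs and nxt in turns:
            l2.append(turns[nxt] + " " + tok)
            i += 2
        else:
            l2.append(turns.get(tok, tok))
            i += 1

    # resolve v: 'x around T' -> (T x) four times, 'x opposite T' -> T T x,
    # where x is a resolved verb or 'turn' (which adds no action)
    l3 = []
    i = 0
    while i < len(l2):
        if i + 2 < len(l2) and l2[i + 1] in ("around", "opposite"):
            verb = [] if l2[i] == "turn" else [l2[i]]
            turn = l2[i + 2]
            if l2[i + 1] == "around":
                phrase = ([turn] + verb) * 4
            else:
                phrase = [turn, turn] + verb
            l3.append(" ".join(phrase))
            i += 3
        else:
            l3.append(l2[i])
            i += 1

    # resolve s: twice/thrice repeats the whole preceding verb phrase
    reps = {"twice": 2, "thrice": 3}
    l4 = []
    for tok in l3:
        if tok in reps:
            l4[-1:] = [l4[-1]] * reps[tok]
        else:
            l4.append(tok)

    # resolve c: drop 'and'; for 'S1 after S2', S2 is executed first
    if "and" in l4:
        ind = l4.index("and")
        ordered = l4[:ind] + l4[ind + 1:]
    elif "after" in l4:
        ind = l4.index("after")
        ordered = l4[ind + 1:] + l4[:ind]
    else:
        ordered = l4
    # flatten multi-action phrases such as 'LTURN JUMP' into single tokens
    l5 = [a for phrase in ordered for a in phrase.split()]

    action = l5
    return action
```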