# Version 3

The basic idea here is to extract the CPC first. The CPC is the `core propositional content`: the central thing that the speaker is talking about, which could be an action or a concept. In either case, it is the part that the listener is not expected to have presupposed. E.g., in "The purple box is open", the listener is expected to know about the purple box, but not that it is open.

**Algorithm:**

Input: utterance U, ActionDB A, ConceptDB C, AvailableTypes T

1. speech_act <-- extract_speech_act(U) 
2. refs <-- extract_referents(U) 
3. ref_dict <-- extract_referent_types(U, refs, T)
4. cpc_name <-- extract_cpc_name(U) 
5. ling_cpc_signature <-- extract_cpc_sign(U, cpc_name, ref_dict)
6. * ling_parse <-- tether(U, ling_cpc_signature, A, C, T)
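
As a rough illustration of what steps 1 to 5 produce before tethering, here is a minimal sketch for the "purple box" example. The `LingParse` container and the `name(type, ...)` signature format are assumptions made for illustration only; the notebook does not define either.

```python
from dataclasses import dataclass, field


@dataclass
class LingParse:
    """Hypothetical container for the intermediate values of steps 1-5."""
    utterance: str
    speech_act: str
    ref_dict: dict  # referent -> type, as produced by step 3
    cpc_name: str
    cpc_signature: str = field(init=False)

    def __post_init__(self):
        # Assumed signature format: "<cpc_name>(<type1>,<type2>,...)"
        arg_types = ",".join(self.ref_dict.values())
        self.cpc_signature = f"{self.cpc_name}({arg_types})"


parse = LingParse(
    utterance="The purple box is open",
    speech_act="STATEMENT",
    ref_dict={"purple box": "physobj"},
    cpc_name="open",
)
print(parse.cpc_signature)
```

The listener already knows the referent ("purple box"); the new information is the CPC ("open"), which is what step 6 then tries to tether.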
    
    
For "INSTRUCT" speech acts, `tethering` involves:
- comparing the linguistically derived parse (i.e., ling_parse) with available action signatures, and generating an association chain between the cpc_name and one or more corresponding actions.
    - This yields a ranked list of name matches, which is then filtered down by argument matching.
    - Failure here means the agent cannot perform the action.
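
The two-stage filter described above (rank by name, then filter by arguments) could be sketched without an LLM roughly as follows. Note the substitution: plain string similarity from `difflib` stands in for semantic similarity here, and the `mountAt` and `pickup` entries, the threshold, and the function names are all hypothetical. Only the `mountScrew` entry follows the action format shown later in this notebook.

```python
from difflib import SequenceMatcher


def rank_by_name(cpc_name, actions, threshold=0.5):
    """Return actions whose name resembles cpc_name, best match first."""
    scored = []
    for action in actions:
        score = SequenceMatcher(None, cpc_name.lower(), action["name"].lower()).ratio()
        if score >= threshold:
            scored.append((score, action))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [action for _, action in scored]


def role_types(action):
    """Flatten [{'VAR0': 'agent'}, {'VAR1': 'physobj'}] into ['agent', 'physobj']."""
    return [t for role in action["roles"] for t in role.values()]


def filter_by_arguments(candidates, referent_types):
    """Keep candidates whose non-agent roles can each be filled by a distinct referent type."""
    kept = []
    for action in candidates:
        available = list(referent_types)
        fillable = True
        for t in role_types(action):
            if t == "agent":
                continue  # the agent role is bound to the robot itself
            if t in available:
                available.remove(t)
            else:
                fillable = False
                break
        if fillable:
            kept.append(action)
    return kept


actions = [
    {"name": "mountScrew", "roles": [{"VAR0": "agent"}, {"VAR1": "physobj"}]},
    {"name": "mountAt", "roles": [{"VAR0": "agent"}, {"VAR1": "location"}]},  # hypothetical
    {"name": "pickup", "roles": [{"VAR0": "agent"}, {"VAR1": "physobj"}]},  # hypothetical signature
]

ranked = rank_by_name("mount", actions)
tethered = filter_by_arguments(ranked, ["physobj"])
print([a["name"] for a in tethered])
```

An empty `tethered` list corresponds to the failure case: no available action matches both the name and the arguments.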

For "STATEMENT" speech acts, `tethering` involves:
- comparing the linguistically derived parse (i.e., ling_parse) with available concepts.
    - Failure here means the agent can learn a new fact, but cannot understand its meaning or recognize the concept in a different setting without further attempts at tethering.
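
The distinction between learning a fact and tethering its concept might look like the sketch below. The set-based `concepts` store, the `beliefs` list, and `tether_statement` are hypothetical stand-ins for the ConceptDB and belief store, which this notebook does not implement.

```python
def tether_statement(cpc_name, referents, concepts, beliefs):
    """Assert the fact into beliefs and report whether its concept is known."""
    fact = (cpc_name, tuple(referents))
    beliefs.append(fact)  # the fact is learned either way
    return {"fact": fact, "tethered": cpc_name in concepts}


concepts = {"open", "closed"}  # hypothetical ConceptDB contents
beliefs = []

known = tether_statement("open", ["purple box"], concepts, beliefs)
unknown = tether_statement("belongs", ["screw", "Evan"], concepts, beliefs)
print(known["tethered"], unknown["tethered"])
```

Both facts end up in `beliefs`, but only the first is tethered to a known concept; the second would need further tethering attempts before the agent could recognize it elsewhere.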






## Domain

Objects:
- Circuit breaker
- m3 screw
- NFSV
- m3 hole
- work area
- conveyor 
- conveyor belt 
- deep m3 holes

Actions:
- pickup
- putdown
- find / see / can you see / verify that you can see
- go to location
- go to pose 
- search 
- mount 
- align 
- assemble* 



In [52]:
# Data
import json

# actions
with open('../data/actions_short.json') as f:
    actions = json.load(f)
    
with open('../data/properties.json') as f:
    properties = json.load(f)
    
types = ["physobj", "agent", "location", "pose", "action", "number", "direction"]  

In [14]:
# Dev Dataset
with open('../data/dev.txt') as f:
    dev = [line.rstrip("\n") for line in f]


In [73]:
dev

['that screw belongs to Evan',
 'pose screw feeder',
 'first verify that you can see the hole',
 'then mount the screw',
 'then align with the hole',
 'Then run the screwdriver job of the screw',
 'first go to pose conveyor',
 'then verify that you can see the NFSV',
 'Then get the NFSV on the work area',
 'then search for 2 m3 holes',
 'screw a M3 screw into the left M3 hole',
 'then go to pose work area',
 'screw a M3 screw into the right M3 hole',
 'then get the NFSV on the conveyor',
 'then advance the conveyor belt',
 'assemble a NFSV',
 'replace search for 2 m3 holes with search for 2 deep m3 holes',
 'replace screw an M3 screw into the left M3 hole with screw an M3 screw into the bottom deep M3 hole',
 'replace screw an M3 screw into the right M3 hole with screw an M3 screw into the top deep M3 hole']

In [3]:
import os
# Check that the key is set without echoing the secret into the notebook output
print("OPENAI_API_KEY set:", "OPENAI_API_KEY" in os.environ)


In [4]:
# Imports

from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI,ChatAnthropic
from langchain.chains import LLMChain
import anthropic



In [24]:
# Initialize LLM 

llm = ChatOpenAI(model_name="gpt-4", temperature=0.0)
#llm = OpenAI(temperature=0.0)
#llm = ChatAnthropic(model="claude-instant-1.1-100k")

In [25]:
## (1) Speech Act Classification 

template_speech_act= """
Decide whether the utterance below from a speaker to a listener is one of "want", "wantBel", "itk"
A "want" is an imperative statement or a request by the speaker to have the listener do an action or stop doing an action.
An "itk" is a 'wh' query (what, why, when, where, who) or a 'yes/no' query, or a request from a speaker for more information from the listener about the listener's knowledge, beliefs or perceptions.
A "wantBel" is a statement of fact or opinion that the speaker conveys to a listener and expects the listener to come to believe.



utterance: \n{utterance}\n
act:
"""

prompt_speech_act = PromptTemplate(
    input_variables=["utterance"],
    template=template_speech_act
)

chain_speech_act = LLMChain(llm=llm, prompt=prompt_speech_act)
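
Model replies often come back with stray quotes, whitespace, or trailing punctuation, so it may help to normalize them before use. The helper below is a sketch, not part of the original pipeline; `normalize_speech_act` and `SPEECH_ACTS` are hypothetical names. Unlike a plain `.lower()`, it returns the label with its original casing ("wantBel"), which matches how the labels are written in the prompts.

```python
SPEECH_ACTS = ("want", "wantBel", "itk")


def normalize_speech_act(raw):
    """Map a raw model reply onto one of the three labels, or None if it matches none."""
    cleaned = raw.strip().strip("\"'.").lower()
    for label in SPEECH_ACTS:
        if cleaned == label.lower():
            return label
    return None


print(normalize_speech_act(' "wantBel"\n'))
print(normalize_speech_act("Want."))
print(normalize_speech_act("question"))
```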

In [26]:
## (2) Central Referents 

template_centralref = """
What is the central item (which could be a single thing or a collection of things) that is being referred to in the below sentence?

Remember, the central referent is a thing or object, not an action or descriptor. It is meant to capture the central real world item being referenced in the utterance. 


sentence: \n{utterance}\n 
referent:
"""

prompt_centralref = PromptTemplate(
    input_variables=["utterance"],
    template=template_centralref
)

chain_centralref = LLMChain(llm=llm, prompt=prompt_centralref)

In [27]:
## (3) Supporting Referents

template_suppref = """
What are some objects (which could be a single thing or a collection of things) that are being referred to in the sentence below, not including the central referent? Return as a python list.
If none, then return empty list []. Even if only one item, return as a list.  
Remember, the supporting referents are things or objects, not actions or descriptors. They are meant to capture the real world items being referenced in the utterance. 


sentence: \n{utterance}\n 
central referent: \n{centralref}\n
supporting referents (noun(s) from utterance):
"""

prompt_suppref = PromptTemplate(
    input_variables=["utterance", "centralref"],
    template=template_suppref
)

chain_suppref = LLMChain(llm=llm, prompt=prompt_suppref)

In [28]:
## (4) Getting the type of thing that the referents are 

template_typeof = """
Determine whether or not the referent item mentioned below in the context of the provided utterance is one of the types also provided below. To check if the referent is of a type, follow the procedure below:
1. Iterate through each item mentioned in the list of types. 
2. For each item X in the list of types, expand on the meaning of X, and then ask if the referent is of type X given that meaning. 
3. If the referent is of type X in the list, return X.

\n\n EXAMPLE \n
utterance: The lemon is on the table
referent: lemon
types: ['area', 'physobj', 'location', 'pose']
typeOf: Looking through the items in the list of types above. physobj is a physical object. lemon is a type of physical object. So, it is of type physobj

Remember, return specifically ONE of the items in the list, or if none apply then return NONE. 

utterance: \n{utterance}\n
referent: \n{ref}\n
types: \n{types}\n
typeOf:
"""

prompt_typeof = PromptTemplate(
    input_variables=["ref", "types", "utterance"],
    template=template_typeof
)

chain_typeof = LLMChain(llm=llm, prompt=prompt_typeof)
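
Because the prompt asks the model to reason before answering, the reply is a short explanation that ends in the type. One way to pull the type out is to scan the reply backwards for the last word that is an allowed type. This is a sketch under that assumption; `extract_type` and `available_types` are hypothetical names, and the sample reply is the one from the prompt's own example.

```python
import re


def extract_type(reply, allowed_types):
    """Return the last allowed type named in the reply, or "NONE" if there is none."""
    tokens = re.findall(r"[A-Za-z]+", reply)  # words only, punctuation dropped
    for token in reversed(tokens):
        if token in allowed_types:
            return token
    return "NONE"


available_types = ["physobj", "agent", "location", "pose", "action", "number", "direction"]

reply = (
    "Looking through the items in the list of types above. physobj is a physical "
    "object. lemon is a type of physical object. So, it is of type physobj."
)
print(extract_type(reply, available_types))
```

This tolerates trailing punctuation and quoting, and returns "NONE" rather than an arbitrary last word when no allowed type is mentioned.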


In [32]:
## (5) Extract CPC

template_cpc = """
Determine the core propositional content (cpc) of the utterance below in the context of its central referent and speech act type.
To do so, use the following procedure:

1. Determine the type of cpc ("action", "concept") associated with the utterance.
If the speech act is a "want" that means the utterance is an imperative and the cpc is an "action".
If the speech act is a "wantBel" (note the capital B) that means the utterance is a statement or assertion, and the cpc will be a "concept"
If the speech act is an "itk" that means the utterance contains a question about some concept, so the cpc is a "concept"

2. If the type of cpc is an "action", then the core propositional content (or cpc) is the action that is being performed on the central referent.
If the type of cpc is a "concept", then the core propositional content (or cpc) is a concept that is being associated with the central referent.

3. Convert the cpc into a single representative word that captures its meaning, without any reference to the referents.

4. return the converted cpc and its type in the following format "<CPC>:<TYPE>" 

utterance: \n{utterance}\n
speech act: \n{speechact}\n
central referent: \n{centralref}
core propositional content and type:
"""

prompt_cpc = PromptTemplate(
    input_variables=["centralref", "utterance","speechact"],
    template=template_cpc
)

chain_cpc = LLMChain(llm=llm, prompt=prompt_cpc)
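
The prompt asks for the answer as "<CPC>:<TYPE>", so the reply can be split and validated before the pipeline branches on it. The parser below is a sketch; `parse_cpc` is a hypothetical helper, and raising `ValueError` on malformed replies is a design choice made here, not something the notebook does.

```python
def parse_cpc(reply):
    """Split a "<CPC>:<TYPE>" reply into (name, type); raise ValueError if malformed."""
    name, sep, kind = reply.strip().rpartition(":")
    if not sep:
        raise ValueError(f"expected '<CPC>:<TYPE>', got {reply!r}")
    name = name.strip().strip('"').lower()
    kind = kind.strip().strip('"').lower()
    if kind not in ("action", "concept"):
        raise ValueError(f"unknown cpc type {kind!r}")
    return name, kind


print(parse_cpc("assemble:action"))
```

Checking the parsed type for equality is stricter than testing whether the substring ":action" occurs anywhere in the reply.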

In [35]:
## (6) Candidate Real Actions 
## "Real" == actions implemented in the robot system. 
"""
Approach: look to see if there exists an action that captures this.

Criteria
(1) Semantic similarity of Name 
(2) The arguments in the robot action exist in the linguistic parse. If not then we are either in the wrong action or we are missing an action
"""

template_candidate_realactions ="""
Select a list of candidate actions from the list of available actions that are most relevant to the core action performed on the central referent as understood in the context of the utterance. 

To decide the list of applicable candidate actions, use the following procedure to systematically filter the list of available actions:
1. Compare the name and description (if any) of each action in the available actions to the core action. Narrow the list of actions to include only those with a semantically similar name or description to the central action. 
2. Return the narrowed list of actions as a python list of string action names. 

\n\n LIST OF AVAILABLE ACTIONS \n:
{actions}

utterance: \n{utterance}\n
central referent: \n{centralref}\n
core action: \n{cpc}\n
candidate actions:
"""

prompt_candidate_realactions = PromptTemplate(
    input_variables=["centralref", "utterance","cpc", "actions"],
    template=template_candidate_realactions
)

chain_candidate_realactions = LLMChain(llm=llm, prompt=prompt_candidate_realactions)

In [57]:
## (7) Tether Real Action
# provided a list of realarguments, bind referents to them. 

template_bound_action = """
Try to bind each candidate action's arguments to the central and supplementary referents. Use the following procedure:
For each candidate action: 
1. Look at its arguments in order written as "VAR<NUM>:<TYPE>". If the first argument is of TYPE "agent", then bind that to "self".
2. For the second argument (if it exists), if the central referent is an object of the TYPE in the argument, then bind the central referent to that argument. If not, bind the argument to NONE. 
3. For any subsequent arguments, attempt to bind the supplementary referents in the same way. 
4. Return output as a list of dicts, each with the name of the action, and bindings. 

utterance: \n{utterance}\n
central referent: \n{centralref}\n
supplementary referents: \n{supprefs}\n
candidate actions: \n{candidaterealactions}\n
bound actions:
"""

prompt_bound_action = PromptTemplate(
    input_variables=["centralref", "supprefs", "utterance","candidaterealactions"],
    template=template_bound_action
)

chain_bound_action = LLMChain(llm=llm, prompt=prompt_bound_action)
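
The binding chain returns its list of dicts as a string, so a caller that wants structured data has to parse it. The sketch below uses `ast.literal_eval` and then checks whether every argument was bound. `parse_bound_actions` and `fully_bound` are hypothetical helpers, and treating "NONE" as the unbound marker follows the wording of the prompt above.

```python
import ast


def parse_bound_actions(reply):
    """Parse the reply into a list of binding dicts; return [] if it is not a valid literal."""
    try:
        parsed = ast.literal_eval(reply.strip())
    except (ValueError, SyntaxError):
        return []
    if not isinstance(parsed, list):
        return []
    return [b for b in parsed if isinstance(b, dict) and "name" in b and "bindings" in b]


def fully_bound(bound):
    """True if no argument of the bound action was left as NONE."""
    return all(value not in (None, "NONE") for value in bound["bindings"].values())


reply = "[{'name': 'assemble', 'bindings': {'VAR0': 'self', 'VAR1': 'caddy'}}]"
bound = parse_bound_actions(reply)
print(bound[0]["name"], fully_bound(bound[0]))
```

An action with any argument bound to NONE is a candidate for the "wrong action or missing action" case described in the criteria for step (6).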

# Pipeline

In [65]:
import ast
import string

def linguistic_parse(utterance):
    # 1. Speech Act Classification
    speech_act = chain_speech_act.run(utterance=utterance).lower()
    
    # 2. Central Referent Extraction
    centralref = chain_centralref.run(utterance=utterance).lower()
    
    centralreftype = chain_typeof.run(ref=centralref, types=types, utterance=utterance ).split(" ")[-1]
    centralreftype = centralreftype.translate(str.maketrans('', '', string.punctuation))
    
    # 3. Supporting Referents Extraction
    supprefs = chain_suppref.run(utterance=utterance, centralref=centralref).lower()
    supprefs = ast.literal_eval(supprefs)
    
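    supprefs_full = [] #with type info
    if supprefs:
        for suppref in supprefs:
            ref_type = chain_typeof.run(ref=suppref, types=types, utterance=utterance ).split(" ")[-1]
            # Strip trailing punctuation, as is done for the central referent type
            ref_type = ref_type.translate(str.maketrans('', '', string.punctuation))
            supprefs_full.append(f"{suppref}:{ref_type}")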
            

    # 5. CPC Extraction
    cpc = chain_cpc.run(utterance=utterance, speechact=speech_act, centralref=centralref)
    
    candidates = []
    bound_candidates = []
    if ":action" in cpc:
        candidates = chain_candidate_realactions.run(utterance=utterance, 
                                                       centralref=centralref,
                                                       cpc=cpc,
                                                       actions=actions) 
        candidates = ast.literal_eval(candidates)
        
        ## need to pass in for each candidate; for now only the first one is bound
        if candidates:  # guard against an empty candidate list
            bound_candidates = chain_bound_action.run(utterance=utterance,
                                                      centralref=centralref,
                                                      supprefs=supprefs_full,
                                                      candidaterealactions=find_dict_in_list(actions,
                                                                                             "name",
                                                                                             candidates[0]))
        
    else: 
        # this is concept to be asserted into belief
        pass
    
    output = {
        "utterance": utterance,
        "speech_act": speech_act,
        "centralref": f"{centralref}:{centralreftype}",
        "supprefs": supprefs_full,
        "cpc": cpc,
        "candidates": candidates,
        "bound_candidates": bound_candidates
    }
    
    return output


In [68]:
linguistic_parse("then assemble the caddy")

{'utterance': 'then assemble the caddy',
 'speech_act': 'want',
 'centralref': 'caddy:physobj',
 'supprefs': [],
 'cpc': 'assemble:action',
 'candidates': ['assemble', 'assemblenfsv', 'assemblenvfau', 'modifyAssemble'],
 'bound_candidates': "[{'name': 'assemble', 'bindings': {'VAR0': 'self', 'VAR1': 'caddy'}}]"}

In [None]:
for text in dev:
    print(text)
    out = linguistic_parse(text)
    print(json.dumps(out, indent=2))
    print()

In [49]:
# Helper Method
# Given a list of dictionaries, and a key, return the entry in the list that matches
def find_dict_in_list(lst, key, target):
    for item in lst:
        # Skip entries that lack the key instead of aborting the whole search
        if key in item and item[key] == target:
            return item
    #print("Nothing found")
    return None


In [53]:
find_dict_in_list(actions, "name", "mountScrew")

{'name': 'mountScrew', 'roles': [{'VAR0': 'agent'}, {'VAR1': 'physobj'}]}