# Version 3

For "INSTRUCT" speech acts, `tethering` involves:
- comparing the linguistically derived parse (i.e., `ling_parse`) with the available action signatures, and generating an association chain between the `cpc_name` and one or more corresponding actions.
    - First produce a ranked list of name matches, then filter this list down by argument matching.
    - Failure here means the agent cannot perform the action.

For "STATEMENT" speech acts, `tethering` involves:
- comparing the linguistically derived parse (i.e., `ling_parse`) with the available concepts.
    - Failure here means the agent can learn a new fact, but cannot understand its meaning or recognize the concept in a different setting without further attempts at tethering.
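
The two-stage idea for "INSTRUCT" (rank by name, then filter by arguments) can be sketched without an LLM. This is only an illustration under stated assumptions: the `ACTIONS` entries are hypothetical, `name_similarity` uses character overlap from `difflib` as a stand-in for semantic similarity, and the 0.5 threshold is arbitrary.

```python
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical action signatures; the real repertoire comes from the robot system.
ACTIONS = [
    {"name": "pickup", "roles": [{"VAR0": "agent"}, {"VAR1": "physobj"}]},
    {"name": "goto", "roles": [{"VAR0": "agent"}, {"VAR1": "location"}]},
    {"name": "putdown", "roles": [{"VAR0": "agent"}, {"VAR1": "physobj"}, {"VAR2": "location"}]},
]

def name_similarity(a, b):
    """Character-level similarity in [0, 1]; a crude stand-in for semantic similarity."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def roles_fillable(roles, referent_types):
    """True if every non-agent role type can be filled by a distinct referent type."""
    need = Counter(t for role in roles for t in role.values() if t != "agent")
    have = Counter(referent_types)
    return all(have[t] >= n for t, n in need.items())

def tether_instruct(cpc_name, referent_types, actions, threshold=0.5):
    """Rank actions by name match, then keep those whose arguments can be bound."""
    scored = [(name_similarity(cpc_name, a["name"]), a) for a in actions]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    # An empty result corresponds to "the agent cannot perform the action".
    return [(a["name"], round(s, 2)) for s, a in ranked
            if s >= threshold and roles_fillable(a["roles"], referent_types)]

print(tether_instruct("pickup", ["physobj"], ACTIONS))
```

In the notebook itself both stages are delegated to LLM prompts; the sketch only makes the control flow explicit.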






In [1]:
# Helper method
# Given a list of dictionaries, a key, and a target value, return the first
# dictionary whose value for that key equals the target (or None if none does).
def find_dict_in_list(lst, key, target):
    for item in lst:
        # Skip entries that lack the key instead of aborting the whole search
        if key not in item:
            continue
        if item[key] == target:
            return item
    return None


In [2]:
# Check that the key is set without echoing the secret into the notebook output
!test -n "$OPENAI_API_KEY" && echo "OPENAI_API_KEY is set"


In [3]:
# Imports

from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI,ChatAnthropic
from langchain.chains import LLMChain
#import anthropic

#from rich import print

In [4]:
# Initialize LLM 

llm = ChatOpenAI(model_name="gpt-4", temperature=0.0)
#llm = OpenAI(temperature=0.0)
#llm = Anthropic(model="claude-instant-1.1-100k")

In [5]:
## (1) Speech Act Classification 

template_speech_act= """
Decide whether the utterance below from a speaker to a listener is one of "want", "wantBel", "itk".
A "want" is an imperative statement or a request by the speaker to have the listener do an action or stop doing an action.
An "itk" is a 'wh' query (what, why, when, where, who), a 'yes/no' query, or a request from a speaker for more information from the listener about the listener's knowledge, beliefs or perceptions.
A "wantBel" is a statement of fact or opinion that the speaker conveys to a listener and expects the listener to come to believe.



utterance: \n{utterance}\n
act:
"""

prompt_speech_act = PromptTemplate(
    input_variables=["utterance"],
    template=template_speech_act
)

chain_speech_act = LLMChain(llm=llm, prompt=prompt_speech_act)

In [6]:
## (2) Central Referents 

template_centralref = """
What is the central item (which could be a single thing or a collection of things) that is being referred to in the below sentence?

Remember, the central referent is a thing or object, not an action or descriptor. It is meant to capture the central real world item being referenced in the utterance. 


sentence: \n{utterance}\n 
referent:
"""

prompt_centralref = PromptTemplate(
    input_variables=["utterance"],
    template=template_centralref
)

chain_centralref = LLMChain(llm=llm, prompt=prompt_centralref)

In [7]:
## (3) Supporting Referents

template_suppref = """
What are some objects (each of which could be a single thing or a collection of things) that are being referred to in the sentence below, not including the central referent? Return as a python list.
If none, then return empty list []. Even if only one item, return as a list.  
Remember, the supporting referents are things or objects, not actions or descriptors. They are meant to capture the real world items being referenced in the utterance. 


sentence: \n{utterance}\n 
central referent: \n{centralref}\n
supporting referents (noun(s) from utterance):
"""

prompt_suppref = PromptTemplate(
    input_variables=["utterance", "centralref"],
    template=template_suppref
)

chain_suppref = LLMChain(llm=llm, prompt=prompt_suppref)

In [8]:
## (4) Getting the type of thing that the referents are 

template_typeof = """
Determine whether or not the referent item mentioned below in the context of the provided utterance is one of the types also provided below. To check if the referent is of a type, follow the below procedure
1. Iterate through each item mentioned in the list of types. 
2. For each item X in the list of types expand on the meaning of each item, and then ask if the central referent is of type X given that meaning. 
3. If the central referent is of type X in the list, return X.

\n\n EXAMPLE \n
utterance: The lemon is on the table
referent: lemon
types: ['area', 'physobj', 'location', 'pose']
typeOf: Looking through the items in the list of types above. physobj is a physical object. lemon is a type of physical object. So, it is of type physobj

Remember, return specifically ONE of the items in the list, or if none apply then return NONE. 

utterance: \n{utterance}\n
referent: \n{ref}\n
types: \n{types}\n
typeOf:
"""

prompt_typeof = PromptTemplate(
    input_variables=["ref", "types", "utterance"],
    template=template_typeof
)

chain_typeof = LLMChain(llm=llm, prompt=prompt_typeof)


In [9]:
## (5) Extract CPC

template_cpc = """
Determine the core propositional content (cpc) of the utterance below in the context of its central referent and speech act type
To do so, use the following procedure

1. Determine the type of cpc ("action", "concept") associated with the utterance.
If the speech act is a "want" that means the utterance is an imperative and the cpc is an "action".
If the speech act is a "wantBel" (note the capital B) that means the utterance is a statement assertion, and the cpc will be a "concept"
If the speech act is an "itk" that means the utterance contains a question about some concept, so the cpc is a "concept"

2. If the type of cpc is an "action", then the core propositional content (or cpc) is the action that is being performed on the central referent.
If the type of cpc is a "concept", then the core propositional content (or cpc) is a concept that is being associated with the central referent.

3. Convert the cpc into a single representative word that captures its meaning, without any reference to the referents.

4. return the converted cpc and its type in the following format "<CPC>:<TYPE>" 

utterance: \n{utterance}\n
speech act: \n{speechact}\n
central referent: \n{centralref}
core propositional content and type:
"""

prompt_cpc = PromptTemplate(
    input_variables=["centralref", "utterance","speechact"],
    template=template_cpc
)

chain_cpc = LLMChain(llm=llm, prompt=prompt_cpc)

In [10]:
## (6) Candidate Real Actions 
## "Real" == actions implemented in the robot system. 
"""
Approach: look to see if there exists an action that captures this.

Criteria
(1) Semantic similarity of Name 
(2) The arguments in the robot action exist in the linguistic parse. If not then we are either in the wrong action or we are missing an action
"""

template_candidate_realactions ="""
Select a list of 5 candidate actions from the list of available actions that are most relevant to the core action performed on the central referent as understood in the context of the utterance. 

To decide the list of applicable candidate actions, use the following procedure to systematically filter the list of available actions:
1. Compare the name and description (if any) of each action in the available actions to the core action. Narrow the list of actions to include only those with a semantically similar name or description to the central action. 
2. Return the narrowed list of actions as a python list of string action names followed by a colon and then a numeric score between 0 and 1 signifying the semantic similarity between the name or description and the central action.
For example "move:0.5" where "move" is the action name and 0.5 is the similarity score. 

\n\n LIST OF AVAILABLE ACTIONS \n:
{actions}

utterance: \n{utterance}\n
central referent: \n{centralref}\n
core action: \n{cpc}\n
candidate actions:
"""

prompt_candidate_realactions = PromptTemplate(
    input_variables=["centralref", "utterance","cpc", "actions"],
    template=template_candidate_realactions
)

chain_candidate_realactions = LLMChain(llm=llm, prompt=prompt_candidate_realactions)

In [11]:
## (7) Tether Real Action
# provided a list of realarguments, bind referents to them. 

template_bound_action = """
Try to bind the candidate action's arguments to the central and supplementary referents. Use the following procedure:
1. Look at the candidate action's arguments in order, written as "VAR<NUM>:<TYPE>". If the first argument is of TYPE "agent", then bind it to "self".
2. For the second argument (if it exists), if the central referent is an object of the argument's TYPE, then bind the central referent to that argument. If not, bind it to NONE.
3. For any subsequent arguments, attempt to bind the supplementary referents in the same way.
4. Return the output as a python dictionary with the following format (do NOT include any special characters like newlines):
{{"name": "<NAME OF THE ACTION>", "bindings": [{{"<VARIABLE NAME (e.g., VAR0)>": "<REFERENT>"}}, ...]}}


utterance: \n{utterance}\n
central referent: \n{centralref}\n
supplementary referents: \n{supprefs}\n
candidate action: \n{candidate_full_info}\n
bound action:
"""

prompt_bound_action = PromptTemplate(
    input_variables=["centralref", "supprefs", "utterance","candidate_full_info"],
    template=template_bound_action
)

chain_bound_action = LLMChain(llm=llm, prompt=prompt_bound_action)

In [12]:
## (6b) Candidate Real Concepts
## "Real" == concepts understandable to a robotic system (some consultant exists for it)
"""
Approach: look to see if there exists a concept that captures this.

Criteria
(1) Semantic similarity of Name 
(2) The arguments in the concept exist in the linguistic parse. If not then we are either in the wrong concept or we are missing a concept
"""

template_candidate_realconcepts="""
Select a list of 5 candidate concepts from the list of available concepts that are most relevant to the core concept associated with the central referent as understood in the context of the utterance. 

To decide the list of applicable candidate concepts, use the following procedure to systematically filter the list of available concepts:
1. Compare the name and description (if any) of each concept in the available concepts to the core concept. Narrow the list of concepts to include only those with a semantically similar name or description to ONLY the core concept. 
2. Return the narrowed list of concepts as a python list of string concept names followed by a colon and then a numeric score between 0 and 1 signifying the semantic similarity between the name or description of the available concepts and the core concept.

\n\n LIST OF AVAILABLE CONCEPTS \n:
{concepts}

utterance: \n{utterance}\n
central referent: \n{centralref}\n
core concept: \n{cpc}\n
candidate concepts:
"""

prompt_candidate_realconcepts = PromptTemplate(
    input_variables=["centralref", "utterance","cpc", "concepts"],
    template=template_candidate_realconcepts
)

chain_candidate_realconcepts = LLMChain(llm=llm, prompt=prompt_candidate_realconcepts)

In [13]:
## (7b) Tether Real Concept, if available
# provided a list of realarguments, bind referents to them. 

template_bound_concept = """
Try to bind each candidate concept's arguments to the central and supplementary referents. Use the following procedure:
For each candidate concept: 
1. Look at its arguments in order, written as "VAR<NUM>:<TYPE>". If the first argument is of TYPE "agent", then bind it to "self".
2. For the second argument (if it exists), if the central referent is an object of the argument's TYPE, then bind the central referent to that argument. If not, bind it to NONE.
3. For any subsequent arguments, attempt to bind the supplementary referents in the same way.
4. Return the output as a python dictionary with the following format (do NOT include any special characters like newlines):
{{"name": "<NAME OF THE CONCEPT>", "bindings": [{{"<VARIABLE NAME (e.g., VAR0)>": "<REFERENT>"}}, ...]}}

utterance: \n{utterance}\n
central referent: \n{centralref}\n
supplementary referents: \n{supprefs}\n
candidate concepts: \n{candidate_full_info}\n
bound concept:
"""

prompt_bound_concept = PromptTemplate(
    input_variables=["centralref", "supprefs", "utterance","candidate_full_info"],
    template=template_bound_concept
)

chain_bound_concept = LLMChain(llm=llm, prompt=prompt_bound_concept)

In [14]:
# (8) Novel concept induction

template_novel_concept = """
Generate a concept template for the core concept within the context of the utterance. Use the following procedure:

1. Extract a concept name. The name can be from the core concept itself. 
2. Generate a list of arguments, where each argument states the type of argument that can be bound to the concept.
Here, we want to make sure that each argument type makes sense for the concept, and also can be bound to the central referent and zero or more of the supplemental references.

Return the output as a python dictionary with the following format (do NOT include any special characters like newlines):
{{"name": "<NAME OF THE CORE CONCEPT>", "roles": [{{"<VARIABLE NAME (e.g., VAR0)>": "<TYPE>"}}, ...]}}


utterance: \n{utterance}\n
core concept: \n{cpc}\n
types: \n{types}\n
central referent: \n{centralref}\n
supplementary referents: \n{supprefs}\n
novel concept: 
"""

prompt_novel_concept = PromptTemplate(
    input_variables=["centralref", "supprefs", "utterance", "cpc", "types"],
    template=template_novel_concept
)

chain_novel_concept = LLMChain(llm=llm, prompt=prompt_novel_concept)


In [15]:
# (9) SPC Property Candidate identification 
## getting the properties of interest
## For each of the referents, we want to find any individual descriptors, we also want to find and apply any given relations between referents

template_properties = """
Determine the properties of the referents mentioned in the bindings. Use the following procedure for each of the referents:
1. The names of each of the referents itself should be added as a property to the list.
2. From the utterance, extract all the adjectival descriptors used to describe the properties of the referents, and add to list.
3. Add to this list, any prepositional relations (mentioned in the utterance) between two or more of the referents, not already covered by the semantics of the core propositional content. 
4. Return this list as a list of python dictionaries, each with the following format:
{{"name": "<NAME OF PROPERTY/DESCRIPTOR/RELATION>", "arguments": <LIST OF VARIABLE NAMES>}}

where the variable names correspond to the variable names associated with each of the referents. Remember, the variable names have to be correct.


utterance: \n{utterance}\n
referents: \n{referent_info}\n
core propositional content: \n{bound_candidate}\n
properties, descriptors and relations:
"""


prompt_properties = PromptTemplate(
    input_variables=["referent_info", "utterance", "bound_candidate"],
    template=template_properties
)

chain_properties = LLMChain(llm=llm, prompt=prompt_properties)


In [16]:
# (10) Candidate real properties: find the properties (SPCs) among the consultant properties.
# These are things the robot's perception/cognition can understand.
## "Real" == concepts understandable to a robotic system (some consultant exists for it)

template_candidate_realprops="""
Select a list of 5 candidate concepts from the list of available concepts that are most semantically similar to the property associated with the referent, as understood in the context of the utterance. 

To decide the list of applicable candidate concepts, use the following procedure to systematically filter the list of available concepts:
1. Compare the name and description (if any) of each concept in the available concepts to the property. Narrow the list of concepts to include only those with a semantically similar name or description to ONLY the property. 
2. Return the narrowed list of concepts as a python list of string concept names followed by a colon and then a numeric score between 0 and 1 signifying the semantic similarity between the name or description of the available concepts and the property.

\n\n LIST OF AVAILABLE CONCEPTS \n:
{concepts}

utterance: \n{utterance}\n
property: \n{prop}\n
candidate concepts:
"""

prompt_candidate_realprops = PromptTemplate(
    input_variables=["utterance","prop", "concepts"],
    template=template_candidate_realprops
)

chain_candidate_realprops = LLMChain(llm=llm, prompt=prompt_candidate_realprops)


In [17]:
# (11) Cognitive Status

template_cognitive_status= """
Determine the cognitive status of each of the referents mentioned in the bindings. Use the following procedure for each of the referents:

1. Decide which ONE (and only one) of the following five cognitive statuses the referent falls into:
statuses: [INFOCUS, ACTIVATED, FAMILIAR, DEFINITE, INDEFINITE]

As shown in the table below, the Givenness Hierarchy (GH) is comprised of hierarchically nested tiers of cognitive status, where information with one cognitive status can be inferred to also have all lower statuses. Each level of the GH is "cued" by a set of linguistic forms, as seen in the table. For example, the second row of the table shows that the definite use of "this" can be used to infer that the speaker assumes the referent to be at least activated to their interlocutor.
\n\n
Cognitive Status | Mnemonic Status | Form |
-----------------|-----------------|------|
INFOCUS | in the focus of attention | it |
ACTIVATED | in short term memory | this,that,this N |
FAMILIAR | in long term memory| that N |
DEFINITE | in long term memory  or new | the N |
INDEFINITE | new or hypothetical | a N |
\n\n

When deciding the one cognitive status for each referent, use the table above and compare the form (pronoun, determiner, article) of the utterance to its status.

Return this list as a list of python dictionaries, each with the following format:
{{"name": "<COGNITIVE STATUS>", "arguments": <LIST OF VARIABLE NAMES>}}

where the variable names correspond to the variable names associated with each of the referents. Remember, the variable names have to be correct.


utterance: \n{utterance}\n
referents: \n{referent_info}\n
cognitive statuses:
"""

prompt_cognitive_status = PromptTemplate(
    input_variables=["referent_info", "utterance"],
    template=template_cognitive_status
)

chain_cognitive_status = LLMChain(llm=llm, prompt=prompt_cognitive_status)
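
The form-to-status table in the prompt above can also be approximated by a rule-based lookup, which may be useful as a cheap baseline or as a sanity check on the LLM output. This is a sketch, not part of the notebook's pipeline: `guess_cognitive_status` and its `window` parameter are hypothetical, and the determiner scan is deliberately crude (it ignores syntax and only looks a few words back).

```python
import re

# Determiner cues, following the "Form" column of the table in the prompt.
DETERMINER_STATUS = {
    "this": "ACTIVATED",
    "that": "FAMILIAR",
    "the": "DEFINITE",
    "a": "INDEFINITE",
    "an": "INDEFINITE",
}
# Bare pronoun cues (the referent is the pronoun itself).
PRONOUN_STATUS = {"it": "INFOCUS", "this": "ACTIVATED", "that": "ACTIVATED"}

def guess_cognitive_status(utterance, referent, window=3):
    """Return a status for `referent`, or None when no cue is found.

    Looks at up to `window` words before the referent's first mention, so that
    intervening adjectives ("the red box") do not hide the determiner.
    """
    words = re.findall(r"[a-z']+", utterance.lower())
    ref_words = re.findall(r"[a-z']+", referent.lower())
    if len(ref_words) == 1 and ref_words[0] in PRONOUN_STATUS:
        return PRONOUN_STATUS[ref_words[0]]
    n = len(ref_words)
    for i in range(len(words) - n + 1):
        if words[i:i + n] == ref_words:
            # Scan backwards from the word just before the referent.
            for j in range(i - 1, max(i - 1 - window, -1), -1):
                if words[j] in DETERMINER_STATUS:
                    return DETERMINER_STATUS[words[j]]
            return None
    return None

print(guess_cognitive_status("Pick up the red box", "box"))
```

A referent with no determiner in range yields `None`, which leaves the decision to the LLM chain.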

# Pipeline

## Describing the linguistic parse algorithm in words

The input to the algorithm is an `utterance`, a `list of available referent types`, the `available action repertoire`, and the `available sensory or conceptual repertoire`. We then perform the following steps:

1. Classify the speech act: this is the communicative intent of the speaker.
2. Extract the central referent: the central real-world item being discussed in the utterance.
3. Classify the "type" of the central referent, i.e., the type of object it is (e.g., physobj, location).
4. Extract the supplemental referents: other real-world objects or entities being discussed.
5. Classify the "type" of each supplemental referent.
6. Extract the core propositional content (`CPC`) of the utterance: the key `action` or `concept` being discussed in reference to the central referent and possibly the supplemental referents. This is the reason the speaker is communicating with the listener at all: to have them do some action X, or to inform them about some fact or concept Y. X and Y here are the core propositional content.
7. Classify the CPC as one of the grounded actions or concepts available in the robot.
    - For actions: 
        1. Generate: Select a set of candidate actions from the available actions that match the CPC. 
        2. Tether: Attempt to bind each candidate action in the list to the referents, ensuring that types and names match.
    - For concepts:
        1. Generate: Select a set of candidate concepts from the available concepts that match the CPC. If there are none, generate a new symbol. 
        2. Tether: Bind each candidate concept to the referents. For a novel concept, there is no need to bind. 
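
The inputs are not defined in this part of the notebook, so here is one plausible shape for them, inferred from how the code indexes them (a `name` key, and `roles` as a list of single-entry dictionaries). The specific actions, concepts, and descriptions below are hypothetical; the type list is the one used in the `typeOf` prompt example.

```python
# Referent types, as in the typeOf prompt example.
TYPES = ["area", "physobj", "location", "pose"]

# Hypothetical action repertoire: each role maps a variable name to a type.
ACTIONS = [
    {"name": "pickup", "description": "grasp and lift an object",
     "roles": [{"VAR0": "agent"}, {"VAR1": "physobj"}]},
    {"name": "goto", "description": "move to a location",
     "roles": [{"VAR0": "agent"}, {"VAR1": "location"}]},
]

# Hypothetical conceptual repertoire, in the same shape.
CONCEPTS = [
    {"name": "on", "description": "one object rests on top of another",
     "roles": [{"VAR0": "physobj"}, {"VAR1": "physobj"}]},
]

def signature(entry):
    """Render an entry as name(VAR0:type, ...), matching the VAR<NUM>:<TYPE> notation."""
    args = ", ".join(f"{var}:{typ}" for role in entry["roles"] for var, typ in role.items())
    return f"{entry['name']}({args})"

def unknown_role_types(entries, types):
    """List (entry name, type) pairs whose role type is neither 'agent' nor in `types`."""
    allowed = set(types) | {"agent"}
    return [(e["name"], typ) for e in entries
            for role in e["roles"] for typ in role.values() if typ not in allowed]

print([signature(a) for a in ACTIONS])
print(unknown_role_types(ACTIONS + CONCEPTS, TYPES))
```

Checking role types up front is cheap and catches repertoire entries that the type-classification step could never bind.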

## Speeding up

How can we speed up parsing? One option is to restrict how widely the candidate search runs:

1. (best | all): `best` selects only the single candidate with the best name match; `all` looks through all candidate actions.
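
A minimal sketch of that switch, assuming candidates arrive as `"name:score"` strings like those requested by the candidate prompts. The function name `select_candidates` and the `mode` parameter are hypothetical, not part of the notebook.

```python
def select_candidates(scored, mode="best", threshold=0.75):
    """Return (name, score) pairs above `threshold`, highest score first.

    mode="best" keeps only the top candidate (one binding attempt);
    mode="all" keeps every surviving candidate (one binding attempt each).
    """
    parsed = []
    for item in scored:
        # Split on the last colon so names containing ":" stay intact.
        name, _, score = item.rpartition(":")
        parsed.append((name, float(score)))
    kept = sorted((p for p in parsed if p[1] > threshold),
                  key=lambda p: p[1], reverse=True)
    if mode == "best":
        return kept[:1]
    if mode == "all":
        return kept
    raise ValueError(f"unknown mode: {mode!r}")

print(select_candidates(["move:0.9", "goto:0.8", "wave:0.2"], mode="all"))
```

Since each binding attempt is a separate LLM call in this notebook, `best` trades recall for fewer calls.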

In [18]:
# HELPER FUNCTIONS

def clean_candidates(candidates):
    """
    Cleans the list of candidates to extract a list of names and a list of scores of the candidates
    """
    names = []
    scores = []
    for candidate in candidates:
        # Split on the last colon so that names containing ":" survive intact
        name, score = candidate.rsplit(":", 1)
        names.append(name.strip())
        scores.append(float(score))
        
    return names, scores


def prune_candidates(names, scores, threshold=0.75):
    return [(x,y) for x,y in zip(names,scores) if y > threshold ]

def best_candidate(names, scores, threshold=0.75):
    """
    Selects best name and score above a threshold. 
    """
    pruned_names = []
    pruned_scores = []
    for n,s in zip(names,scores):
        if s>threshold:
            pruned_names.append(n)
            pruned_scores.append(s)
    
    if pruned_names:
        return pruned_names[pruned_scores.index(max(pruned_scores))]
    return "NONE"
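

The pipeline below passes raw LLM output straight to `ast.literal_eval`, which raises if the model wraps its answer in a code fence or omits the outer braces of a dictionary. A more forgiving parser might look like the following; `parse_llm_literal` is a hypothetical helper, not something the notebook defines, and the two repairs it attempts are assumptions about likely failure modes.

```python
import ast

FENCE = "`" * 3  # Markdown code-fence marker, built this way to keep the listing readable

def parse_llm_literal(text, default=None):
    """Parse a Python literal from LLM output, returning `default` on failure."""
    cleaned = text.strip()
    # Drop a surrounding Markdown code fence if the model added one.
    if cleaned.startswith(FENCE):
        lines = cleaned.splitlines()[1:]
        if lines and lines[-1].strip().startswith(FENCE):
            lines = lines[:-1]
        cleaned = "\n".join(lines).strip()
    try:
        return ast.literal_eval(cleaned)
    except (ValueError, SyntaxError, TypeError):
        pass
    # Retry with braces, for output such as: "name": "x", "bindings": [...]
    try:
        return ast.literal_eval("{" + cleaned + "}")
    except (ValueError, SyntaxError, TypeError):
        return default

print(parse_llm_literal('"name": "on", "bindings": [{"VAR0": "lemon"}]'))
```

Returning a default instead of raising lets the caller decide whether a malformed answer should trigger a retry or abort the parse.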


In [23]:
import ast
import string

SIMILARITY_THRESHOLD = 0.75

def find_and_bind(utterance, actions, concepts, types):
    print(f"\nProcessing utterance: {utterance}")
    print("[ ] Classifying speech act", end="\r")
    speech_act = chain_speech_act.run(utterance=utterance).lower()
    print("[X] Classifying speech act")
    
    # 2. Central Referent Extraction
    print("[ ] Extracting referents", end="\r")
    centralref = chain_centralref.run(utterance=utterance).lower()
    
    centralreftype = chain_typeof.run(ref=centralref, types=types, utterance=utterance ).split(" ")[-1]
    centralreftype = centralreftype.translate(str.maketrans('', '', string.punctuation))
    
    # 3. Supporting Referents Extraction
    supprefs = chain_suppref.run(utterance=utterance, centralref=centralref).lower()
    supprefs = ast.literal_eval(supprefs)
    
    supprefs_full = [] #with type info
    if supprefs:
        for suppref in supprefs:
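            ref_type = chain_typeof.run(ref=suppref, types=types, utterance=utterance ).split(" ")[-1]
            # Strip punctuation, as is done for the central referent's type
            ref_type = ref_type.translate(str.maketrans('', '', string.punctuation))
            supprefs_full.append(f"{suppref}:{ref_type}")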
    print("[X] Extracting referents")
    print(f"\tCentral ref: {centralref}:{centralreftype}")
    print(f"\tSuppl ref: {supprefs_full}")

    print("[ ] Extracting CPC", end="\r")
    cpc = chain_cpc.run(utterance=utterance, speechact=speech_act, centralref=centralref)
    print("[X] Extracting CPC")
    print(f"\tcpc: {cpc}")
    
    # 4. Find and Bind Candidate real actions and concepts to the CPC.
    candidates = []
    #bound_candidates = []
    new_concept = {}
    if ":action" in cpc:
        
        # Find Candidates in the robot's action repertoire
        print("[ ] Extracting Candidate actions", end="\r")
        candidates = chain_candidate_realactions.run(utterance=utterance, 
                                                       centralref=centralref,
                                                       cpc=cpc,
                                                       actions=actions) 
        candidates = ast.literal_eval(candidates)
        print("[X] Extracting Candidate actions")
        print(f"\tCandidates: {candidates}")

        
        # Bind Candidates
        print("[ ] Selecting best candidate", end="\r")
        
        # extract scores
        names, scores = clean_candidates(candidates)
        
        # Select best candidate
        best = best_candidate(names=names, scores=scores, threshold=SIMILARITY_THRESHOLD)
        
        print("[X] Selecting best candidate")
        print(f"\tBest: {best}")
        
        print("[ ] Binding best candidate", end="\r")
        if "NONE" in best:
            # No action cleared the threshold, so the agent cannot perform this
            # instruction; stop here rather than fail later on undefined names.
            raise ValueError(f"No available action matches cpc '{cpc}' above {SIMILARITY_THRESHOLD}")

        # Bind best candidate to a real action
        best_full = find_dict_in_list(actions, "name", best) #full info on the action 
        bound_candidate = chain_bound_action.run(utterance=utterance,
                                              centralref=centralref,
                                              supprefs=supprefs,
                                              candidate_full_info=best_full)
        # check if bound_candidate contains a "NONE"
        if "NONE" in bound_candidate:
            print(f"We have a problem. There is an unbound variable here:\n{bound_candidate}")

        # eval string
        bound_candidate = ast.literal_eval(bound_candidate)
        
        print("[X] Binding best candidate")
        print(f"\tBest full: {best_full}")
        print(f"\tBound candidate: {bound_candidate}")

        
    else:
        # Find candidates in the robot's perceptual and conceptual repertoire
        
        print("[ ] Extracting Candidate concepts", end="\r")
        candidates = chain_candidate_realconcepts.run(utterance=utterance, 
                                                       centralref=centralref,
                                                       cpc=cpc,
                                                       concepts=concepts) 
        candidates = ast.literal_eval(candidates)
        
        print("[X] Extracting Candidate concepts")
        print(f"\tCandidates: {candidates}")
        
        # Binding the best candidate above a certain threshold
        ## Approach: 
        ## 1. Prune candidates to only include those with thresholds higher than SIMILARITY_THRESHOLD
        ## 2. Bind the one with the highest score. If NONE in params then bind next one.
        ## 3. Stop 
        
        print("[ ] Selecting best candidate", end="\r")
        # extract scores
        names, scores = clean_candidates(candidates)
        
        # Select best candidate
        best = best_candidate(names=names, scores=scores, threshold=SIMILARITY_THRESHOLD)
        
        print("[X] Selecting best candidate")
        print(f"\tBest: {best}")
        
        if not "NONE" in best:
            # Bind best candidate to a real concept
            print("[ ] Binding best candidate", end="\r")
            # Look the candidate up in the concepts (not the actions) repertoire
            best_full = find_dict_in_list(concepts, "name", best)
            bound_candidate = chain_bound_concept.run(utterance=utterance,
                                                  centralref=centralref,
                                                  supprefs=supprefs,
                                                  candidate_full_info=best_full)
            # check if bound_candidate contains a "NONE"
            if "NONE" in bound_candidate:
                print(f"We have a problem. There is an unbound variable here:\n{bound_candidate}")

            # eval string
            bound_candidate = ast.literal_eval(bound_candidate)

            print("[X] Binding best candidate")
            
        else:
            # novel concept being described 
            # need to hypothesize a name and arguments. 
            print("[ ] Instantiating novel concept", end="\r")
            new_concept = chain_novel_concept.run(utterance=utterance,
                                                 types=types,
                                                 centralref=centralref,
                                                 supprefs=supprefs,
                                                 cpc=cpc)
            
            new_concept = ast.literal_eval(new_concept)
            print("[X] Instantiating novel concept")
            
            best_full = new_concept
            
            print("[ ] Binding novel concept", end="\r")
            bound_candidate = chain_bound_concept.run(utterance=utterance,
                                                  centralref=centralref,
                                                  supprefs=supprefs,
                                                  candidate_full_info=new_concept)
            
            print("[X] Binding novel concept")
            
            # check if bound_candidate contains a "NONE"
            if "NONE" in bound_candidate:
                print(f"We have a problem. There is an unbound variable here:\n{bound_candidate}")
            
            bound_candidate = ast.literal_eval(bound_candidate)
            
            
        print("[X] Selecting and Binding best concept candidate")
        print(f"\tBest full: {best_full}")
        print(f"\tBound candidate: {bound_candidate}")

    
    # ----- PROPERTY BINDING ---------------
    # Variable assignment (i.e., for each central and supp ref, its name needs to have a consultant)
    # and the variables need to match. 
    
    print("[ ] Extracting Properties", end="\r")
    # Remove "self" from the binding list 
    bindings = []
    for item in bound_candidate['bindings']:
        if not list(item.values())[0] == 'self':
            bindings.append(item)
    
    
    
    spc = chain_properties.run(utterance=utterance,
                                     referent_info=bindings,
                              bound_candidate=bound_candidate)
    
    spc = ast.literal_eval(spc)
    
    print("[X] Extracting Properties")
    print(f"\tspc: {spc}")
    print(f"\tBindings: {bindings}")
    
    
    # Finding consultant properties
    print("[ ] Finding Consultant properties similar to SPC", end="\r")
    props_all = []
    for prop in spc:
    
        candidate_consultant_properties = chain_candidate_realprops.run(utterance=utterance,
                                                                   concepts=concepts,
                                                                   prop=prop)
    
        candidate_consultant_properties = ast.literal_eval(candidate_consultant_properties)
        
        print(f"\tCand. Cons. Props. for {prop['name']}: {candidate_consultant_properties}")
        
        # extract scores
        names, scores = clean_candidates(candidate_consultant_properties)
        
        # Select best candidate
        best = best_candidate(names=names, scores=scores, threshold=SIMILARITY_THRESHOLD)
        
        print(f"\tBest: {best}")
        
        
        props_all.append({'spc':prop, 'best':best})
    
    
        
    print("[X] Finding Consultants properties similar to SPC")

    print(f"\tProps all: {props_all}")
    
    # Binding or Tethering variables in the best candidate property for each SPC
    print("[ ] Binding Consultant properties similar to SPC", end="\r")
    
    bound_properties = []
    for prop in props_all: 
        if not "none" in prop['best'].lower():
            bound_property = chain_bound_concept.run(utterance=utterance,
                                                  centralref=centralref,
                                                  supprefs=supprefs,
                                                  candidate_full_info=find_dict_in_list(concepts, 
                                                                                         "name", 
                                                                                         prop['best']))
            
            # check if bound_property contains a "NONE"
            if "NONE" in bound_property:
                print(f"We have a problem. There is an unbound variable here:\n{bound_property}")

            # eval string
            bound_property = ast.literal_eval(bound_property)
            bound_properties.append(bound_property)
    print("[X] Binding Consultant properties similar to SPC")
    print(f"\tBound Properties: {bound_properties}")
            
    
    # --------------------------------------
        
    
    # ----------- COGNITIVE STATUS ------------------
    # for each of central and supp referents.
    
    print("[ ] Classifying cognitive status", end="\r")
    cognitive_statuses = chain_cognitive_status.run(utterance=utterance,
                                                   referent_info=bindings)
    
    cognitive_statuses = ast.literal_eval(cognitive_statuses)
    print("[X] Classifying cognitive status")
    
    
    # ----------------------------------------------
    
    
    
    output = {
        "utterance": utterance,
        "speech_act": speech_act,
        "centralref": f"{centralref}:{centralreftype}",
        "supprefs": supprefs_full,
        "cpc": cpc,
        "candidates": candidates,
        "bound_candidate": bound_candidate,
        "best": best_full,
        "spc": spc,
        "cognitive_statuses": cognitive_statuses,
        'candidate_consultant_properties': props_all,
        "bound_properties": bound_properties
    }
    #print(json.dumps(output, indent=2))
    
    return output

def construct_parse_string(bindings, grammar=None):
    """
    Given a parse dict of bindings, construct a string (ideally, we provide a grammar too)
    Grammar check not implemented
    """
    speaker = "brad"
    
    # CPC
    cpc_template = "{cpc_name}({cpc_variables})"
    cpc = cpc_template.format(cpc_name=bindings['best']['name'],
                             cpc_variables=",".join([ list(x.keys())[0] for x in  bindings['best']['roles'] ] ))
    
    print(cpc)
    
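    # SPC
    spcs = []  # initialized up front so the join below also works when there are no SPCs
    if bindings['spc']:
        for s in bindings['spc']:
            spc_template = "{spc_name}({spc_variables})"
            spc = spc_template.format(spc_name=s['name'],
                                     spc_variables=",".join(s['arguments']))
            spcs.append(spc)
    
    spc_all = ",".join(spcs)
    
    print(spc_all)
    
    # Final assembly. NOTE: how cognitive statuses should be rendered is not settled;
    # inserting the raw value through str.format is an assumption.
    final_template = "{speech_act}({speaker},{cpc},{{{spc},{cognitive_status}}})"
    return final_template.format(speech_act=bindings['speech_act'],
                                 speaker=speaker,
                                 cpc=cpc,
                                 spc=spc_all,
                                 cognitive_status=bindings['cognitive_statuses'])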

    
    
    
    


In [None]:
construct_parse_string(out)

In [None]:
out
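
As a quick check of the CPC and SPC templates used in `construct_parse_string`, the sketch below renders them for the "that m3 screw belongs to Evan" run in this notebook. The input values are copied from that run's printed output; `render_predicate` is a hypothetical helper introduced only for this illustration, not part of the pipeline.

```python
# Values copied from the printed output of the "that m3 screw belongs to Evan" run.
best = {"name": "belonging", "roles": [{"VAR0": "physobj"}, {"VAR1": "agent"}]}
spc = [{"name": "m3 screw", "arguments": ["VAR0"]},
       {"name": "Evan", "arguments": ["VAR1"]}]


def render_predicate(name, variables):
    """Hypothetical helper: render name(var,var,...) as the templates do."""
    return f"{name}({','.join(variables)})"


# Each role is a single-entry dict such as {"VAR0": "physobj"}; the key is the variable.
cpc_str = render_predicate(best["name"], [next(iter(role)) for role in best["roles"]])
spc_str = ",".join(render_predicate(s["name"], s["arguments"]) for s in spc)

print(cpc_str)  # belonging(VAR0,VAR1)
print(spc_str)  # m3 screw(VAR0),Evan(VAR1)
```

Note that the SPC names still carry the surface text ("m3 screw", "Evan") rather than the matched consultant property, which is one of the issues listed under "Next".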

# Evals

In [None]:
# Data processing classes and functions
import json

def process_data_item(json_item):
    actions = json_item['promptInfo']['actions']
    concepts = json_item['promptInfo']['properties']
    return actions, concepts

class DIARCDataset:
    def __init__(self, annotations_file, types=None):
        with open(annotations_file, "r") as f:
            self.data = json.load(f)
        self.data = self.data['utterances']
        self.types = types
        if not self.types:
            # default types
            self.types = ["physobj", "agent", "location", "pose", "action", "number", "direction", "name", "string"]
    
    def print_stats(self):
        num_items = len(self.data)
        print(f"Number of utterances: {num_items}")
    
    def get_item(self, idx):
        actions = self.data[idx]['promptInfo']['actions']
        concepts = self.data[idx]['promptInfo']['properties']
        utterance = self.data[idx]['utteranceText']
        desired_semantics = self.data[idx]['desiredSemantics']
        
        item = {
            "utterance": utterance,
            "desired_semantics": desired_semantics,
            "actions": actions,
            "concepts": concepts,
            "types": self.types
        
        }
        return item
    
    def xy(self):
        """
        Returns all the utterances and their desired semantics
        """
        utterances = []
        desired_semantics = []
        for item in self.data:
            # Raw annotation entries use the same keys that get_item reads
            utterances.append(item['utteranceText'])
            desired_semantics.append(item['desiredSemantics'])
        return utterances, desired_semantics
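
The annotations file layout that `DIARCDataset` expects can be inferred from the keys it reads (`utterances`, `utteranceText`, `desiredSemantics`, `promptInfo.actions`, `promptInfo.properties`). The sketch below writes and reloads a minimal file with that shape. Everything beyond those key names is an assumption: the values are placeholders, not real annotations, and the real file may carry additional fields.

```python
import json
import os
import tempfile

# Minimal annotations structure using only the keys DIARCDataset reads.
# All values are placeholders, not real annotations.
annotations = {
    "utterances": [
        {
            "utteranceText": "then assemble the caddy",
            "desiredSemantics": "PLACEHOLDER",
            "promptInfo": {"actions": [], "properties": []},
        }
    ]
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "annotations.json")
    with open(path, "w") as f:
        json.dump(annotations, f)
    with open(path, "r") as f:
        loaded = json.load(f)["utterances"]

# Same pairing that xy() is meant to return, as (utterance, desired semantics) tuples.
pairs = [(entry["utteranceText"], entry["desiredSemantics"]) for entry in loaded]
print(pairs)
```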
            
    
    
        

In [21]:
# Construct Sample dev dataset 
import json 

with open("../data/actions_short.json", "r") as f:
    actions_dev = json.load(f)
with open("../data/properties.json", "r") as f:
    concepts_dev = json.load(f)
types = ["physobj", "agent", "location", "pose", "action", "number", "direction", "name", "string"]

utterances = ["then assemble the caddy",
             "that m3 screw belongs to Evan",
             "screw the m3 into that hole on the conveyor"]

dataset = []
for utt in utterances:
    item = {"utterance": utt,
           "actions": actions_dev,
           "concepts": concepts_dev,
           "types": types}
    dataset.append(item)

print(f"Available utterances: {utterances}")

Available utterances: ['then assemble the caddy', 'that m3 screw belongs to Evan', 'screw the m3 into that hole on the conveyor']


In [24]:
idx = 1
out = find_and_bind(utterance=dataset[idx]['utterance'],
                       actions=dataset[idx]['actions'],
                       concepts=dataset[idx]['concepts'],
                       types=types)
print(json.dumps(out, indent=2))


Processing utterance: that m3 screw belongs to Evan
[X] Classifying speech act
[X] Extracting referents
	Central ref: m3 screw:physobj
	Suppl ref: ['evan:agent.']
[X] Extracting CPC
	cpc: belonging:concept
[X] Extracting Candidate concepts
	Candidates: ['hole:0.1', 'prop:0.2', 'bottle:0.1', 'conveyor:0.1', 'work area:0.2']
[X] Selecting best candidate
	Best: NONE
[X] Instantiating novel concept
[X] Binding novel concept
[X] Selecting and Binding best concept candidate
	Best full: {'name': 'belonging', 'roles': [{'VAR0': 'physobj'}, {'VAR1': 'agent'}]}
	Bound candidate: {'name': 'belonging', 'bindings': [{'VAR0': 'm3 screw'}, {'VAR1': 'evan'}]}
[X] Extracting Properties
	spc: [{'name': 'm3 screw', 'arguments': ['VAR0']}, {'name': 'Evan', 'arguments': ['VAR1']}]
	Bindings: [{'VAR0': 'm3 screw'}, {'VAR1': 'evan'}]
	Cand. Cons. Props. for m3 screw: ['hole: 0.2', 'm3: 0.8', 'deepM3: 0.6', 'prop: 0.3', 'bottle: 0.1']
	Best: m3
	Cand. Cons. Props. for Evan: ['hole: 0.1', 'm3: 0.8', 'deepM3: 

In [None]:
for text in dev:
    print(text)
    out = linguistic_parse(text)
    print(json.dumps(out, indent=2))
    print()

In [None]:
find_dict_in_list(actions, "name", "mountScrew")

# Next

- [ ] The SPCs are a bit of a mess. Need to fix up that data structure.
- [ ] Variable names in concepts.txt are "X", "Y", etc. We need to make sure they are VAR0, VAR1, etc. instead.
    - Align variable names in spc predicates with the cpc. 
    - Maybe consider a separate data structure to store "referent" information 
        - referents, properties, cognitive statuses
        
        ```
        referents = [
            {"text": "m3 screw", 
            "type": "physobj",
            "variable_name": "VAR0",
            "cognitive status": "ACTIVATED"},
            
            {"text": "evan", 
            "type": "agent",
            "variable_name": "VAR1",
            "cognitive status": "FAMILIAR"}
            ]
            
        descriptors = [
            {"text": "m3 screw", 
            "name": "m3",
            "arguments": ["VAR0"] },
            
            {"text": "evan", 
            "name": "NONE",
            "arguments": [] }
            ]
        
        intention = {"speech_act": "wantBel",
                    "proposition":
                                {"text": "belonging",
                                "type": "concept",
                                "arguments": ["VAR0", "VAR1"]}
            }
        ```
     - Consider other data structures as well for the cpc, etc. 
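
One possible way to do the variable renaming from the list above is sketched here. It assumes concepts use the same `{'name': ..., 'roles': [{'X': 'physobj'}, ...]}` layout seen in the printed output, only with "X", "Y" as keys; `normalize_role_variables` and `rename_arguments` are hypothetical names, and the input concept is made up for illustration.

```python
def normalize_role_variables(concept):
    """Return (renamed copy of concept, old-to-new variable mapping).

    Role variables are renamed positionally to VAR0, VAR1, ...
    Assumes each role is a single-entry dict such as {"X": "physobj"}.
    """
    mapping = {}
    roles = []
    for index, role in enumerate(concept["roles"]):
        old_name, role_type = next(iter(role.items()))
        new_name = f"VAR{index}"
        mapping[old_name] = new_name
        roles.append({new_name: role_type})
    return {**concept, "roles": roles}, mapping


def rename_arguments(predicate, mapping):
    """Apply the same mapping to an spc predicate so it lines up with the cpc."""
    renamed = [mapping.get(arg, arg) for arg in predicate["arguments"]]
    return {**predicate, "arguments": renamed}


# Illustrative input only; not taken from concepts.txt.
concept = {"name": "belonging", "roles": [{"X": "physobj"}, {"Y": "agent"}]}
normalized, mapping = normalize_role_variables(concept)
aligned_spc = rename_arguments({"name": "m3", "arguments": ["X"]}, mapping)

print(normalized)
print(aligned_spc)
```

Returning the mapping alongside the renamed concept keeps the cpc and spc renaming in one place, so both sides use the same variable for the same referent.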