# v0.1

The goal of this version is to get a basic pipeline up and running.

Approach 1
- Infer the dialog act.
- Look only at action names at first (maybe descriptions). If that is ambiguous, look at action signatures; if still ambiguous, look at preconditions/effects; if still ambiguous, ask for clarification.
    - This needs to be more general. The imperative could specify a sequence of actions (not sure DIARC can handle this anyway), or an action that cannot occur unless another action is performed first, in which case constant interaction with the planner is needed. Sometimes action imperatives are just goal directives in disguise: "go to the door" has no goToDoor action, but at(door) is a possible state. How do we convert "go to" into "be at"? Similarly for "put two blue blocks on top of the red one".
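The tiered lookup in Approach 1 (names, then signatures, then preconditions/effects, then ask) can be sketched as a chain of filters. This is a minimal sketch under stated assumptions: the `pre`/`eff` fields and the `TIERED_ACTIONS` records are hypothetical, and each tier is a plain predicate standing in for whatever check (LLM, ontology, type system) would really be used.

```python
# Hypothetical action records: the 'pre' and 'eff' fields are assumptions;
# the notebook's action list below only has name, description, and parameters.
TIERED_ACTIONS = [
    {"action": "pickup", "description": "Picks up items",
     "parameters": ["object"], "pre": ["clear(?o)"], "eff": ["holding(?o)"]},
    {"action": "putdown", "description": "Puts down items",
     "parameters": ["object"], "pre": ["holding(?o)"], "eff": ["clear(?o)"]},
]


def disambiguate(candidates, tiers):
    """Apply each filter in turn (names, then signatures, then pre/eff, ...).

    Returns the single surviving action name, "NONE" if a tier rules out every
    candidate, or "CLARIFY" if several candidates remain after the last tier.
    """
    remaining = list(candidates)
    for tier in tiers:
        remaining = [a for a in remaining if tier(a)]
        if len(remaining) <= 1:
            break  # decided (or nothing left); later tiers are unnecessary
    if not remaining:
        return "NONE"
    return remaining[0]["action"] if len(remaining) == 1 else "CLARIFY"
```

For example, a name tier that checks whether `"pickup"` occurs in `"pick up the ball"` with spaces removed settles the choice at the first tier, while an arity-only tier leaves both actions and yields `"CLARIFY"` unless a pre/eff tier follows it.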


Idea
- Check whether the action name exists; if not, reword the utterance as a goal and see whether it can be handled that way.
- If exactly one name matches, check the action signature and argument types.
    - If that works, check whether the preconditions are met; if so, perform the action.
    - If not, set the preconditions as a goal, plan to generate an action sequence, and then perform the target action.
- If multiple names match, choose the "best" action signature.
    - Select best action signature by 
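The "set the preconditions as a goal, plan, then perform the target action" step could look roughly like the sketch below. It swaps in a plain breadth-first search over hypothetical ground STRIPS operators (`STRIPS_OPS`) as a stand-in for a real planner; the operator names and atoms are illustrative assumptions, not DIARC actions.

```python
from collections import deque

# Hypothetical ground STRIPS operators: name -> (preconditions, add list, delete list).
STRIPS_OPS = {
    "pickup(ball)": ({"clear(ball)", "handempty"},
                     {"holding(ball)"},
                     {"clear(ball)", "handempty"}),
    "unstack(box,ball)": ({"on(box,ball)", "clear(box)", "handempty"},
                          {"holding(box)", "clear(ball)"},
                          {"on(box,ball)", "clear(box)", "handempty"}),
    "putdown(box)": ({"holding(box)"},
                     {"clear(box)", "handempty"},
                     {"holding(box)"}),
}


def plan_for(state, goal):
    """Breadth-first search for a shortest operator sequence reaching `goal`."""
    start = frozenset(state)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        current, path = frontier.popleft()
        if goal <= current:
            return path
        for name, (pre, add, delete) in STRIPS_OPS.items():
            if pre <= current:
                nxt = frozenset((current - delete) | add)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, path + [name]))
    return None  # goal unreachable with these operators


def plan_then_act(state, target):
    """Run `target` directly if its preconditions hold; otherwise plan for them first."""
    prefix = plan_for(state, STRIPS_OPS[target][0])
    return None if prefix is None else prefix + [target]
```

With the box stacked on the ball, `plan_then_act({"on(box,ball)", "clear(box)", "handempty"}, "pickup(ball)")` returns `["unstack(box,ball)", "putdown(box)", "pickup(ball)"]`; if the preconditions already hold, it returns just `["pickup(ball)"]`.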

Other things:
- Involve a PDDL-based planner via external calls.
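One possible starting point for the external planner calls is to render the current state and goal as a PDDL problem string that could be written to a file and handed to an off-the-shelf planner. This is only a sketch: the helper name, the domain name, the object types, and the space-separated atom format (`"on box ball"`) are all assumptions, and nothing here invokes a planner.

```python
def to_pddl_problem(name, domain, objects, init, goal):
    """Render a PDDL problem; `objects` maps object name -> type, atoms are 'pred arg ...'."""
    objs = " ".join(f"{obj} - {typ}" for obj, typ in objects.items())
    init_s = " ".join(f"({atom})" for atom in init)
    goal_s = " ".join(f"({atom})" for atom in goal)
    return (f"(define (problem {name}) (:domain {domain})\n"
            f"  (:objects {objs})\n"
            f"  (:init {init_s})\n"
            f"  (:goal (and {goal_s})))")


print(to_pddl_problem("pass-plate", "kitchen",
                      {"plate1": "plate", "shafer": "agent"},
                      ["clear plate1"], ["holding shafer plate1"]))
```

Keeping atoms as plain strings keeps the helper independent of any particular PDDL library; predicate output such as `on(VAR0,VAR1)` would still need grounding and a small rewrite before it fits this format.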

Variations
- Human involvement could include providing a set of conditions that specify several states, or providing a group of actions (a miniplan) and referring to it by name.
- The human could provide action performance constraints: "do X, but don't touch Y."

- Challenges with reference resolution.
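The two variations above, named miniplans and "don't touch Y" constraints, could be prototyped with a small macro store and a plan filter. Everything here is hypothetical: `MINIPLANS`, `teach`, `expand`, and `violations` are illustrative names, and steps are assumed to be strings of the form `action(arg,...)`.

```python
# Hypothetical store of human-taught miniplans: name -> list of steps.
MINIPLANS = {}


def teach(name, steps):
    """Record a named group of actions provided by the human."""
    MINIPLANS[name] = list(steps)


def expand(steps, _depth=0):
    """Recursively replace miniplan names with their stored steps."""
    if _depth > 20:
        raise ValueError("miniplans appear to be cyclic")
    out = []
    for step in steps:
        if step in MINIPLANS:
            out.extend(expand(MINIPLANS[step], _depth + 1))
        else:
            out.append(step)
    return out


def violations(plan, forbidden):
    """Return steps whose arguments mention a forbidden object ("don't touch Y")."""
    bad = []
    for step in plan:
        if "(" not in step:
            continue
        args = step[step.index("(") + 1:step.rindex(")")].split(",")
        if forbidden & {arg.strip() for arg in args}:
            bad.append(step)
    return bad
```

For instance, after `teach("clear_table", ["pickup(cup)", "putdown(cup)"])` and `teach("tidy", ["clear_table", "pickup(vase)"])`, `expand(["tidy"])` gives `["pickup(cup)", "putdown(cup)", "pickup(vase)"]`, and checking that plan against `{"vase"}` flags `pickup(vase)`.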


Some examples

INSTRUCT(commX,shafer,pass(shafer,commX,VAR0),{plate(VAR0),DEFINITE(VAR0)})
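To check parses like the one above programmatically, the logical form can be split on top-level commas only (ignoring those nested in `()` or `{}`). A minimal sketch; the field names `semantics` and `supplemental` are my own labels, and it assumes exactly four arguments of the form `ACT(speaker,listener,semantics,{...})`.

```python
def split_top_level(s):
    """Split on commas that are not nested inside () or {}."""
    parts, depth, buf = [], 0, []
    for ch in s:
        if ch in "({":
            depth += 1
        elif ch in ")}":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
    parts.append("".join(buf))
    return parts


def parse_lf(lf):
    """Break ACT(speaker,listener,semantics,{props}) into its components."""
    head, body = lf.split("(", 1)
    # Raises ValueError if the form does not have exactly four top-level arguments
    speaker, listener, semantics, supplemental = split_top_level(body[:-1])
    return {"act": head, "speaker": speaker, "listener": listener,
            "semantics": semantics,
            "supplemental": split_top_level(supplemental.strip("{}"))}
```

Applied to the example above, this yields `act="INSTRUCT"`, `semantics="pass(shafer,commX,VAR0)"`, and `supplemental=["plate(VAR0)", "DEFINITE(VAR0)"]`.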




In [1]:
# Spot testing
data = [{"utterance": "Pick up the ball", "act": "INSTRUCT"},
        {"utterance": "Get the ball", "act": "INSTRUCT"},
        {"utterance": "Pick up a ball", "act": "INSTRUCT"},
        {"utterance": "Pick up this blue ball", "act": "INSTRUCT"},
        {"utterance": "Put the ball down on the table", "act": "INSTRUCT"},
        {"utterance": "Get the circuit breaker on the work area", "act": "INSTRUCT"},
        {"utterance": "First raise your arms", "act": "INSTRUCT"},
        {"utterance": "Dempster, who do you trust?", "act": "QUESTION"},
        {"utterance": "Dempster, do you see an object?", "act": "QUESTION"},
        {"utterance": "The area behind you is safe", "act": "STATEMENT"},
        {"utterance": "The object in front of you is a ball", "act": "STATEMENT"},
        {"utterance": "It uses a medical caddy", "act": "STATEMENT"}]

intents = ["INSTRUCT", "QUESTION", "STATEMENT"]
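Since the spot-testing data carries gold `act` labels, a tiny scorer can turn it into a repeatable check. In this sketch, `evaluate` accepts any callable mapping an utterance to a label (for example, a wrapper around the dialog act chain); `keyword_baseline` and `sample_data` are hypothetical stand-ins so the block runs without an LLM.

```python
def evaluate(data, classify):
    """Return (accuracy, mismatches) of `classify` against each item's gold 'act'."""
    errors = []
    for item in data:
        pred = classify(item["utterance"]).strip().upper()
        if pred != item["act"]:
            errors.append((item["utterance"], item["act"], pred))
    acc = 1 - len(errors) / len(data) if data else 0.0
    return acc, errors


def keyword_baseline(utterance):
    """Crude heuristic stand-in for the LLM classifier (illustration only)."""
    if utterance.rstrip().endswith("?"):
        return "QUESTION"
    first = utterance.split()[0].lower().strip(",")
    return "INSTRUCT" if first in {"pick", "get", "put", "first", "raise"} else "STATEMENT"


sample_data = [
    {"utterance": "Pick up the ball", "act": "INSTRUCT"},
    {"utterance": "Dempster, do you see an object?", "act": "QUESTION"},
    {"utterance": "The area behind you is safe", "act": "STATEMENT"},
]
```

Returning the mismatches alongside the score makes it easy to see which utterances (e.g., "It uses a medical caddy") a given classifier gets wrong.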


In [2]:
actions = [
    {'action': 'pickup',
    'description': 'Picks up items',
    'parameters': ['object'], 
    },
    {'action': 'putdown',
    'description': 'Puts down items',
    'parameters': ['object'], 
    }
]

consultants = [
    {'consultant': 'vision',
     'properties': [
         {'name': 'clear',
          'parameters': ['object']},
         {'name': 'holding',
          'parameters': ['object']},
         {'name': 'on',
          'parameters': ['object', 'object']}
     ]
    }
]


In [3]:
# Given a list of dictionaries, a key, and a target value, return the first
# dictionary whose value for that key equals the target (None if there is none)
def find_dict_in_list(lst, key, target):
    for item in lst:
        # Skip entries that lack the key instead of aborting the whole search
        if key in item and item[key] == target:
            return item
    return None

In [4]:
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI
from langchain.chains import LLMChain

In [5]:
llm = ChatOpenAI(model_name="gpt-4", temperature=0.0)
#llm = OpenAI(temperature=0.0)

In [9]:
# CHAINS

## (1) Dialog Act Classification 

template_dialog_act_classifier= """
Decide whether the utterance below from a speaker to a listener is an INSTRUCT, QUESTION or STATEMENT. 
An INSTRUCT is an imperative statement or a request by the speaker to have the listener do an action.
A QUESTION is a query or request from a speaker for more information from the listener about the listener's knowledge, beliefs, or perceptions.
A STATEMENT is a statement of fact or opinion that the speaker conveys to a listener. 

utterance: \n{utterance}\n
act:
"""

prompt_dialog_act_classifier = PromptTemplate(
    input_variables=["utterance"],
    template=template_dialog_act_classifier
)

chain_dialog_act_classifier = LLMChain(llm=llm, prompt=prompt_dialog_act_classifier)
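The classifier chain returns free text, so its answer may come back as `" Instruct\n"` or `"act: QUESTION."` rather than a bare label. A small normalizer, sketched below under the assumption that the first recognized label in the reply is the intended one, can map the output onto the three intents before it is compared or passed downstream. `INTENT_LABELS` mirrors the `intents` list from the first cell.

```python
import re

INTENT_LABELS = ["INSTRUCT", "QUESTION", "STATEMENT"]


def normalize_act(raw, labels=INTENT_LABELS, default="UNKNOWN"):
    """Return the first known label found in the LLM's reply, else `default`."""
    tokens = re.findall(r"[A-Za-z]+", raw.upper())
    found = [tok for tok in tokens if tok in labels]
    return found[0] if found else default
```

A possible use is `normalize_act(chain_dialog_act_classifier.run(utterance=utterance))`; the `UNKNOWN` fallback gives the pipeline a clear signal to ask for clarification instead of feeding an arbitrary string to the action selector.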


## (2) Find relevant action from Action DB

template_action_selector= """
Select an action from the list of available actions that is most relevant to the given utterance. 
To decide the applicable action, use the following procedure to systematically filter the most relevant action:
1. Check the dialog_type. If it is an INSTRUCT, then narrow the list of actions to only include those actions that are ontic actions or world-modifying actions. If the utterance is a QUESTION, then narrow the list of actions to only include those actions that are querying actions. If the utterance is a STATEMENT, then narrow the list of actions to only include those actions that are epistemic actions, or actions that change the belief state of the listener. 

2. Then, compare the name and description of the action to the utterance. Narrow the list of actions to include only those with a semantically similar name or description to the utterance. 

3. If the narrowed list contains one action, then return its name. If it contains no actions, then return NONE. If it contains more than one action, then return AMBIGUOUS.


\n\nLIST OF AVAILABLE ACTIONS:\n
{actions}

utterance: \n{utterance}\n
dialog act: \n{dialog_act}\n
action:
"""

prompt_action_selector = PromptTemplate(
    input_variables=["utterance", "actions", "dialog_act"],
    template=template_action_selector
)

chain_action_selector = LLMChain(llm=llm, prompt=prompt_action_selector)
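Step 1 of the selector prompt asks the LLM to narrow actions by category (ontic, querying, epistemic), but the action records carry no category. One option is to do that narrowing deterministically before the prompt, assuming a hypothetical `type` field on each action. The `ACT_TO_TYPE` mapping, the `type` values, and the `lookfor` action below are all illustrative assumptions.

```python
# Assumed mapping from dialog act to the action category named in step 1 of the prompt.
ACT_TO_TYPE = {"INSTRUCT": "ontic", "QUESTION": "query", "STATEMENT": "epistemic"}


def candidate_actions(actions, dialog_act):
    """Keep only actions whose (hypothetical) 'type' matches the dialog act."""
    wanted = ACT_TO_TYPE.get(dialog_act.strip().upper())
    return [a for a in actions if a.get("type") == wanted]


typed_actions = [
    {"action": "pickup", "description": "Picks up items",
     "parameters": ["object"], "type": "ontic"},
    {"action": "putdown", "description": "Puts down items",
     "parameters": ["object"], "type": "ontic"},
    {"action": "lookfor", "description": "Checks whether an object is visible",
     "parameters": ["object"], "type": "query"},
]
```

Passing `candidate_actions(typed_actions, dialog_act)` as the `actions` input would leave the LLM only the name/description comparison in step 2, and an empty list maps directly onto the prompt's NONE case.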


## (3) Extract parameters
###>>  Doing this programmatically from the actions list


## (4) Identify a referent object and its properties. 
### Given an utterance, an action, and its parameter types, return a set of properties 

template_referent_properties= """
Identify descriptors in the utterance that refer to an entity which is an argument or parameter of the action.
Use the following procedure:
1. For each of the parameter types, identify which descriptive terms in the utterance refer to that parameter type. 
That is, think of the entity being referred to by each parameter type, and find the descriptors that refer to this entity.
2. Build a predicate for each descriptive term. Each predicate has a functor name that is the descriptor and a variable name.
Hypothesize variable names (e.g., VAR0, VAR1, etc.). The predicates should be of the form `descriptive_term(variable names)`.
3. Compose the predicates into a list.
4. Return the list as properties. 

Remember, these are descriptive properties (adjectives, in a sense) and do not include cues such as articles, determiners, or pronouns.
Also remember that functors cannot contain spaces, so replace spaces with underscores.
\n\nEXAMPLE\n
utterance: Pick up the blue ball
action: pickup
parameter types: ["object"]
properties: ["blue(VAR0)", "ball(VAR0)"]
\n\n
utterance: \n{utterance}\n
action: \n{action}\n
parameter types: \n{parameters}\n
properties:
"""
prompt_referent_properties = PromptTemplate(
    input_variables=["utterance", "action", "parameters"],
    template=template_referent_properties
)

chain_referent_properties = LLMChain(llm=llm, prompt=prompt_referent_properties)
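The properties chain is expected to return a Python-style list of predicate strings, but nothing enforces that. A defensive parser, sketched below, evaluates the reply with `ast.literal_eval` and checks each item against the `functor(ARG, ...)` shape the prompt requests (no spaces in the functor). The regex and the reject-the-whole-answer policy are assumptions about what "valid" should mean here.

```python
import ast
import re

# functor(ARG, ARG, ...) where the functor has no spaces, per the prompt's rule.
PREDICATE = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)\(([A-Za-z0-9_]+(?:,\s*[A-Za-z0-9_]+)*)\)$")


def parse_properties(raw):
    """Turn the chain's string output into (functor, args) tuples, or None if malformed."""
    try:
        items = ast.literal_eval(raw.strip())
    except (ValueError, SyntaxError, TypeError):
        return None
    if not isinstance(items, list):
        return None
    parsed = []
    for item in items:
        match = PREDICATE.match(str(item).strip())
        if match is None:
            return None  # reject the whole answer; the caller can retry or ask
        args = [arg.strip() for arg in match.group(2).split(",")]
        parsed.append((match.group(1), args))
    return parsed
```

Returning `None` rather than raising lets `parse_utterance` decide whether to re-prompt or fall back to a clarification request.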



## (5) Ref Res: For each variable, determine its cognitive status 
### For each variable we are figuring out the one or more givenness hierarchy statuses

template_cognitive_status= """
For each variable in the properties below, decide which ONE of the following six cognitive statuses it falls into:
statuses: [INFOCUS, ACTIVATED, FAMILIAR, UNIQUELY_IDENTIFIABLE, REFERENTIAL, TYPE_IDENTIFIABLE]

As shown in the table below, the Givenness Hierarchy (GH) comprises six hierarchically nested tiers of cognitive status, 
where information with one cognitive status can be inferred to also have all lower statuses.
Each level of the GH is “cued” by a set of linguistic forms, as seen in the table. For example, the second
row of the table shows that the definite use of “this” can be used to infer that the speaker assumes
the referent to be at least activated for their interlocutor.
\n\n
Cognitive Status | Mnemonic Status | Form |
-----------------|-----------------|------|
INFOCUS | in the focus of attention | it |
ACTIVATED | in short term memory | this,that,this N |
FAMILIAR | in long term memory| that N |
UNIQUELY IDENTIFIABLE | in long term memory or new | the N |
REFERENTIAL | new or hypothetical|  indefinite this N |
TYPE IDENTIFIABLE | new or hypothetical | a N |
\n\n


When deciding the one cognitive status for each variable, use the table above and compare the form (pronoun, determiner, article) of the utterance to its status.
\n\nExample:\n
utterance: Pick up the blue ball
properties: ["blue(VAR0)", "ball(VAR0)"]
cognitive status: ["UNIQUELY IDENTIFIABLE(VAR0)"]
\n\n
utterance: \n{utterance}\n
properties: \n{properties}\n
cognitive status:
"""



prompt_cognitive_status = PromptTemplate(
    input_variables=["utterance", "properties"],
    template=template_cognitive_status
)

chain_cognitive_status = LLMChain(llm=llm, prompt=prompt_cognitive_status)
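The Givenness Hierarchy table in the prompt is regular enough to serve as a deterministic baseline for the LLM's answer. This sketch reads the leading determiner or pronoun of a noun phrase and returns the cued status, and `entailed` encodes the rule that a status implies every lower one. It uses underscore-joined labels so they could double as functors; it cannot separate indefinite "this N" (REFERENTIAL) from demonstrative "this N", and it ignores discourse context entirely.

```python
# Statuses from most to least restrictive, following the GH table in the prompt.
GH_ORDER = ["INFOCUS", "ACTIVATED", "FAMILIAR", "UNIQUELY_IDENTIFIABLE",
            "REFERENTIAL", "TYPE_IDENTIFIABLE"]


def entailed(status):
    """A referent with one status also has every lower status."""
    return GH_ORDER[GH_ORDER.index(status):]


def cue_status(noun_phrase):
    """Rough form-based cue lookup; returns None when no cue in the table applies."""
    words = noun_phrase.lower().split()
    if not words:
        return None
    if words == ["it"]:
        return "INFOCUS"
    if words[0] == "that" and len(words) > 1:
        return "FAMILIAR"               # "that N"
    if words[0] in ("this", "that"):
        return "ACTIVATED"              # "this", "that", "this N"
    if words[0] == "the":
        return "UNIQUELY_IDENTIFIABLE"  # "the N"
    if words[0] in ("a", "an"):
        return "TYPE_IDENTIFIABLE"      # "a N"
    return None
```

Comparing this baseline with the chain's output on the spot-testing utterances would flag cases such as "Pick up this blue ball" where the two disagree.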





## (6) Check if consultants can handle the properties 
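This step is still empty; one simple version is to check each predicate's functor and arity against the properties the consultants advertise. The sketch below assumes predicates arrive as strings like `"on(VAR0,VAR1)"` and reuses the shape of the `consultants` structure defined earlier (copied here as `example_consultants` so the block stands alone).

```python
example_consultants = [
    {"consultant": "vision",
     "properties": [{"name": "clear", "parameters": ["object"]},
                    {"name": "holding", "parameters": ["object"]},
                    {"name": "on", "parameters": ["object", "object"]}]},
]


def unsupported_properties(properties, consultants):
    """Return predicates that no consultant can evaluate (matching functor and arity)."""
    known = {(p["name"], len(p["parameters"]))
             for c in consultants for p in c["properties"]}
    missing = []
    for pred in properties:
        functor, args = pred.split("(", 1)
        arity = len(args.rstrip(")").split(","))
        if (functor, arity) not in known:
            missing.append(pred)
    return missing
```

With the current vision consultant, a property such as `ball(VAR0)` from the outputs below would be reported as unsupported, since only `clear`, `holding`, and `on` are listed.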




## (7) Construct the final parse 



In [10]:
import ast
import re

# Return the arguments of a predicate string, e.g. 'on(VAR0,VAR1)' -> ['VAR0', 'VAR1']
def get_args(predicate):
    match = re.search(r'\(([^)]+)\)', predicate)
    return [arg.strip() for arg in match.group(1).split(',')] if match else []

def parse_utterance(utterance, actions):
    # Strip whitespace/newlines the LLM may add around its answers
    dialog_act = chain_dialog_act_classifier.run(utterance=utterance).strip()
    action = chain_action_selector.run(utterance=utterance, actions=actions, dialog_act=dialog_act).strip()
    parameters = []
    properties = []
    cognitive_status = []
    variables = []
    # NONE, AMBIGUOUS, or any name missing from the action DB yields no match
    relevant_action = find_dict_in_list(actions, 'action', action)
    if relevant_action is not None:
        # Get params
        parameters = relevant_action['parameters']

        # Get properties, e.g. ["blue(VAR0)", "ball(VAR0)"]
        properties = chain_referent_properties.run(utterance=utterance, action=action, parameters=parameters)
        properties = ast.literal_eval(properties.strip())

        # Get distinct variable names (multi-argument predicates included), in order of first appearance
        variables = list(dict.fromkeys(var for prop in properties for var in get_args(prop)))

        # Get one cognitive status per variable
        cognitive_status = chain_cognitive_status.run(utterance=utterance, properties=properties)
        cognitive_status = ast.literal_eval(cognitive_status.strip())

    output = {'utterance': utterance,
              'dialog_act': dialog_act,
              'action': action,
              'parameters': parameters,
              'properties': properties,
              'variables': variables,
              'cognitive_status': cognitive_status}
    return output



def parse(speaker, listener, utterance, actions):
    output = parse_utterance(utterance, actions)
    variables = ",".join(output['variables'])
    properties = ",".join(output['properties'])
    cognitive_status = ",".join(output['cognitive_status'])
    action = output['action']
    dialog_act = output['dialog_act']
    
    template = "{dialog_act}({speaker},{listener},{action}({variables}),{{{properties},{cognitive_status}}})"
    parsed = template.format(dialog_act=dialog_act,
                            speaker=speaker,
                            listener=listener,
                            action=action,
                            variables=variables,
                            properties=properties,
                            cognitive_status=cognitive_status)
    
    return parsed, output


In [13]:
import json

for item in data:
    parsed, output = parse("brad", "self", item['utterance'], actions)
    print(json.dumps(output, indent=2))   
    print("PARSED: ", parsed,"\n")

{
  "utterance": "Pick up the ball",
  "dialog_act": "INSTRUCT",
  "action": "pickup",
  "parameters": [
    "object"
  ],
  "properties": [
    "ball(VAR0)"
  ],
  "variables": [
    "VAR0"
  ],
  "cognitive_status": [
    "UNIQUELY IDENTIFIABLE(VAR0)"
  ]
}
PARSED:  INSTRUCT(brad,self,pickup(VAR0),{ball(VAR0),UNIQUELY IDENTIFIABLE(VAR0)}) 

{
  "utterance": "Get the ball",
  "dialog_act": "INSTRUCT",
  "action": "pickup",
  "parameters": [
    "object"
  ],
  "properties": [
    "ball(VAR0)"
  ],
  "variables": [
    "VAR0"
  ],
  "cognitive_status": [
    "UNIQUELY IDENTIFIABLE(VAR0)"
  ]
}
PARSED:  INSTRUCT(brad,self,pickup(VAR0),{ball(VAR0),UNIQUELY IDENTIFIABLE(VAR0)}) 

{
  "utterance": "Pick up a ball",
  "dialog_act": "INSTRUCT",
  "action": "pickup",
  "parameters": [
    "object"
  ],
  "properties": [
    "ball(VAR0)"
  ],
  "variables": [
    "VAR0"
  ],
  "cognitive_status": [
    "TYPE IDENTIFIABLE(VAR0)"
  ]
}
PARSED:  INSTRUCT(brad,self,pickup(VAR0),{ball(VAR0),TYPE ID

In [12]:
import gradio as gr

def nlu(utterance):
    parsed, output = parse("brad", "self", utterance, actions)
    return parsed, output

demo = gr.Interface(fn=nlu, inputs="text", outputs=["text", "text"])

demo.launch() 

Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.


