<html dir="ltr" about="" property="dcterms:language" content="en" xmlns="http://www.w3.org/1999/xhtml" prefix="bibo: http://purl.org/ontology/bibo/" typeof="bibo:Document"><head>
<title>Architecture Requirements for Intelligent Personal Assistants</title>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8">
<link href="../cg-draft.css" rel="stylesheet" type="text/css" charset="utf-8">
</head>
<body contenteditable="false"><div class="head">
<p><a href="http://www.w3.org/">
<img width="72" height="48" src="http://www.w3.org/Icons/w3c_home" alt="W3C"></a></p>
<h1 property="dcterms:title" class="title" id="title">Architecture Requirements for Intelligent Personal Assistants</h1>
<dl>
<dt>Latest version</dt>
<dd>Last modified: June 22, 2023</dd>
<dd><a href="https://github.com/w3c/voiceinteraction/blob/master/voice%20interaction%20drafts/paRequirements/paRequirements.htm"> https://github.com/w3c/voiceinteraction/blob/master/voice%20interaction%20drafts/paRequirements.htm</a> </dd>
<dt>Editor</dt>
<dd>Deborah Dahl, Conversational Technologies</dd>
<dd>Dirk Schnelle-Walka, switch</dd>
</dl>
<p class="copyright">Copyright © 2023 the Contributors to the Voice Interaction Community Group,
published by the <a href="http://www.w3.org/community/voiceinteraction/">Voice Interaction Community Group</a>
under the <a href="https://www.w3.org/community/about/agreements/cla/">W3C Community Contributor License Agreement (CLA)</a>. A human-readable <a href="http://www.w3.org/community/about/agreements/cla-deed/">summary</a> is available.</p>
<hr></div>
<h2 id="abstract">Abstract</h2>
<p>This document was prepared by reviewing version 1.3 of the <a href="https://w3c.github.io/voiceinteraction/voice%20interaction%20drafts/paArchitecture-1-3.htm" >Intelligent Personal Assistant Architecture Report</a> and extracting requirements for the architecture that were implied by the report. This was done in order to have a standalone list of architecture requirements. The headings in this document correspond to the headings in the architecture report. Sections of the Architecture document which are not referenced here were not considered to contain any requirements. The terms "MUST", "MAY" and "SHOULD" are used in this document as defined in <a href="https://www.ietf.org/rfc/rfc2119.txt">IETF RFC 2119.</a></p>
<h2>Status of This Document</h2>
<p><em>This specification was published by the
<a href="http://www.w3.org/community/voiceinteraction/">Voice Interaction Community Group</a>.
It is not a W3C Standard nor is it on the W3C Standards Track.
Please note that under the
<a href="http://www.w3.org/community/about/agreements/cla/">W3C Community Contributor License Agreement (CLA)</a> there is a limited opt-out and other conditions apply. Learn more about <a href="http://www.w3.org/community/">W3C Community and Business Groups</a>.</em></p>
<!-- OddPage -->
<h2><span class="secno">1. </span>Introduction</h2>
<ol class="req">
<li>Intelligent Personal Assistants (IPA's) MUST be able to provide general purpose information</li>
<li>Specialized virtual assistants MUST be able to provide enterprise-specific information</li>
<li>Specialized virtual assistants MAY be able to provide non-enterprise-specific information</li>
<li>IPA's SHOULD be able to perform transactions</li>
<li>Specialized assistants MUST be able to interoperate with general IPA's</li>
<li>IPA's SHOULD be able to execute operations in a user's environment</li>
<li>IPA's MUST be able to interact with users through voice or text or both.</li>
</ol>
<h2><span class="secno">2. </span> Problem Statement</h2>
<ol class="req" start="8">
<li>
IPA's MAY be able to transfer a partially completed task to another IPA
</li>
</ol>
<h2><span class="secno">3. </span> Architecture</h2>
<ol class="req" start="9">
<li>IPA's MAY include a Client layer</li>
<li>IPA's MUST include a Dialog layer</li>
<li>IPA's MAY include an API/Data layer</li>
<li>Components MAY be shifted to other layers as needed</li>
<li>The architecture SHOULD support question answering and information retrieval applications</li>
<li>The architecture SHOULD support executing local services to accomplish tasks</li>
<li>The architecture SHOULD support executing remote services to accomplish tasks</li>
<li>The architecture MUST support dynamically adding local and remote services or knowledge sources.</li>
<li>It MAY be possible to forward requests from one IPA to another with the same architecture</li>
<li>It MAY be possible to forward requests or partial requests from one IPA to another with the same architecture, omitting the client layer</li>
<li>IPA extensions MAY be selected from a standardized marketplace </li>
</ol>
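The layering requirements above (9–11) can be illustrated with a minimal sketch. All class and method names here are hypothetical; the architecture report mandates the layers and their optionality, not any particular API.

```python
# Hypothetical sketch of the three architecture layers (req. 9-11).
# Only the Dialog layer is mandatory (MUST); Client and API/Data are MAY.

class ClientLayer:                       # optional (req. 9)
    def capture(self) -> str:
        return "what is the weather"

class DialogLayer:                       # mandatory (req. 10)
    def process(self, utterance: str) -> str:
        return f"processed: {utterance}"

class ApiDataLayer:                      # optional (req. 11)
    def query(self, request: str) -> str:
        return f"result for: {request}"

class IPA:
    """Wires the layers together; the Dialog layer is always present."""
    def __init__(self) -> None:
        self.client = ClientLayer()
        self.dialog = DialogLayer()
        self.data = ApiDataLayer()

    def handle(self) -> str:
        utterance = self.client.capture()
        processed = self.dialog.process(utterance)
        return self.data.query(processed)
```

Requirement 12 (components may shift between layers) is the reason such a sketch keeps the layers loosely coupled behind plain method calls.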
<h3><span class="secno">3.1 </span> Client Layer</h3>
<h3><span class="secno">3.1.2</span> User Input and System Output</h3>
<ol class="req" start="20">
<li> The Client layer MAY include a microphone</li>
<li> The Client layer MAY include a means for text input</li>
<li> The Client layer MAY include a speaker</li>
<li> The Client layer MAY include a display</li>
<li> Additional (non-speech) output modalities MAY be employed to render output or to capture input
</li>
</ol>
<h3><span class="secno">3.1.3</span> IPA Client</h3>
<ol class="req" start="25">
<li>The IPA Client MUST allow activation and deactivation by means of a Client Activation Strategy.</li>
<li>IPA Clients MAY also capture input via text and output text</li>
<li>IPA Clients MAY also capture input from various modality recognizers </li>
<li>IPA Clients MAY also capture contextual information, e.g., location, time, environmental sounds or other inputs that it obtains from Local Data Providers</li>
<li>An IPA Client MAY also receive commands to be executed locally in the Local Services.</li>
<li>An IPA Client MAY also receive multimodal output to be rendered by a respective modality synthesizer</li>
<li>IPA Clients MAY reference a session identifier.</li>
</ol>
<h4><span class="secno">3.2.2.1</span> Client Activation Strategy</h4>
<ol class="req" start="32">
<li>The IPA Client MUST be activated with a Client Activation Strategy</li>
<li> The Client Activation Strategy MAY be push-to-talk</li>
<li> The Client Activation Strategy MAY be hotword</li>
<li> The Client Activation Strategy MAY be triggered by an interpreted text string (either from audio or text)</li>
<li> The Client Activation Strategy MAY be a change in environment</li>
<li> The Client Activation Strategy MAY be triggered by a script or environmental condition</li>
<li> The Client Activation Strategy MAY be a different strategy not enumerated here</li>
</ol>
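The enumerated strategies above (requirements 33–38) lend themselves to a simple dispatch; this sketch is illustrative only, and the event keys are invented for the example. Requirement 38 is reflected by the fallthrough case.

```python
from enum import Enum, auto

# Hypothetical enumeration of the Client Activation Strategies
# listed above (req. 33-38); further strategies are permitted.

class ActivationStrategy(Enum):
    PUSH_TO_TALK = auto()
    HOTWORD = auto()
    INTERPRETED_TEXT = auto()
    ENVIRONMENT_CHANGE = auto()
    SCRIPTED = auto()
    OTHER = auto()

def is_activated(strategy: ActivationStrategy, event: dict) -> bool:
    """Illustrative check of whether an input event activates the client."""
    if strategy is ActivationStrategy.PUSH_TO_TALK:
        return event.get("button_pressed", False)
    if strategy is ActivationStrategy.HOTWORD:
        return event.get("hotword_detected", False)
    if strategy is ActivationStrategy.INTERPRETED_TEXT:
        return bool(event.get("matched_text"))
    if strategy is ActivationStrategy.ENVIRONMENT_CHANGE:
        return event.get("environment_changed", False)
    if strategy is ActivationStrategy.SCRIPTED:
        return event.get("script_triggered", False)
    return False
```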
<h4><span class="secno">3.2.2.2</span> Local Service Registry</h4>
<ol class="req" start="39">
<li>The IPA Client MUST include a Local Service Registry</li>
<li> The Local Service Registry MUST maintain a list of Local Services</li>
<li> The Local Service Registry MUST maintain a list of Local Data Providers</li>
</ol>
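A minimal sketch of the Local Service Registry implied by requirements 39–41, assuming services and data providers are identified by name; the report does not prescribe a registration API.

```python
# Hypothetical Local Service Registry (req. 39-41): it maintains a list
# of Local Services and a list of Local Data Providers.

class LocalServiceRegistry:
    def __init__(self) -> None:
        self.local_services: list[str] = []
        self.local_data_providers: list[str] = []

    def register_service(self, name: str) -> None:
        if name not in self.local_services:
            self.local_services.append(name)

    def register_data_provider(self, name: str) -> None:
        if name not in self.local_data_providers:
            self.local_data_providers.append(name)
```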
<h2><span class="secno">3.2 </span> Dialog Layer</h2>
<h3><span class="secno">3.2.1</span> IPA Service</h3>
<ol class="req" start="42">
<li>The IPA Service MUST forward audio data and metadata (if any) to the ASR</li>
<li> The IPA Service MUST forward text data and metadata (if any) to the NLU</li>
<li> The IPA Service MUST forward audio output from the TTS to the IPA Client if supported</li>
<li> The IPA Service MUST forward multimodal output from the Dialog Manager to the Client if supported</li>
<li>The IPA Service MUST forward text output from the NLG to the IPA Client if supported</li>
</ol>
<h3><span class="secno">3.2.2</span> ASR</h3>
<ol class="req" start="47">
<li>The ASR MUST generate one or more recognition hypotheses from voice input that it receives from the IPA Service</li>
<li> The ASR MAY associate recognition hypotheses with confidence scores</li>
<li> The ASR MUST forward the recognition hypotheses to the NLU</li>
<li> The ASR MAY update the History with the recognition hypotheses</li>
</ol>
<h3><span class="secno">3.2.3</span> NLU</h3>
<ol class="req" start="51">
<li>The NLU MUST extract textual interpretations from text strings (either from audio or text)</li>
<li> The NLU MAY extract multiple interpretations from input text strings (either from audio or text)</li>
<li> The NLU MUST be able to interpret input Core Intent Sets</li>
<li> The NLU SHOULD be able to interpret utterances that are a combination of activation strategies and commands.</li>
<li> The NLU MAY make use of the Data Provider to access local or external data</li>
<li> The NLU MAY make use of the Context to check for complementary information such as information in the history or knowledge </li>
<li> The NLU MUST forward the semantic interpretation of the input to the Dialog Manager </li>
<li> The NLU MAY associate statistical confidences with interpretations</li>
<li> The NLU MAY extract emotion or sentiment from text strings (either from audio or text)</li>
</ol>
<h3><span class="secno">3.2.4</span> Dialog Manager</h3>
<ol class="req" start="60">
<li>The Dialog Manager MUST recognize when the user goals are changed</li>
<li> The Dialog Manager SHOULD confirm when the user goals are changed</li>
<li> The Dialog Manager MAY consider ongoing workflows that must not be interrupted when the user switches goals.</li>
<li> The Dialog Manager SHOULD update the History with dialog moves</li>
<li> The Dialog Manager SHOULD determine the next dialog move
<ol>
<li>based on internal considerations</li>
<li>based on output from other components in the same dialog system</li>
<li>based on output from other agents (IPA services)</li>
<li>how the Dialog Manager determines the next move is outside the scope of these requirements</li>
<li> The Dialog Manager SHOULD make use of the TTS to generate audio data to be rendered on the IPA Client</li>
<li> The Dialog Manager MAY provide commands to be executed by the IPA Client or the External Services</li>
</ol>
</li>
</ol>
<h3><span class="secno">3.2.5</span> Context</h3>
<ol class="req" start="65">
<li>The Context MAY make use of the Local Service Registry to include external knowledge from Local Data Providers</li>
<li>The Context MAY make use of the External Service Registry to include external knowledge from Data Providers</li>
<li>The Context MAY provide external knowledge temporarily to the Knowledge Graph to be considered in reasoning</li>
</ol>
<h4><span class="secno">3.2.5.1</span> History</h4>
<ol class="req" start="68">
<li>The Dialog History MAY store the past dialog events per user</li>
</ol>
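Requirement 68 calls for storing past dialog events per user; a minimal sketch, assuming events are plain dictionaries keyed by a user identifier:

```python
from collections import defaultdict

# Hypothetical Dialog History store (req. 68): past dialog events are
# kept per user. The event format is illustrative only.

class DialogHistory:
    def __init__(self) -> None:
        self._events: dict[str, list[dict]] = defaultdict(list)

    def record(self, user_id: str, event: dict) -> None:
        self._events[user_id].append(event)

    def events_for(self, user_id: str) -> list[dict]:
        return list(self._events[user_id])
```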
<h2><span class="secno">3.3</span> API's/Data Layer</h2>
<ol class="req" start="69">
<li>The Provider Selection Service MAY receive input from the Dialog Manager to query data from Data Providers</li>
<li>The Provider Selection Service MAY receive input from the Dialog Manager to execute External Services</li>
<li> If the Provider Selection Service is called by the Dialog Manager with a preselected identifier of an IPA provider, it MUST use the preselected provider</li>
<li> If the Provider Selection Service is not called with a preselected identifier of an IPA provider, the Provider Selection Service MUST follow a Provider Selection Strategy to determine those IPA Providers that are best suited to answer the request</li>
</ol>
</body></html>