Action semantics #81

Closed
mlagally opened this issue Jun 17, 2021 · 19 comments

Comments

@mlagally
Contributor

mlagally commented Jun 17, 2021

There are two fundamentally different approaches to actions:

  1. synchronous actions
    These are the baseline and need to be supported in any case.
    We have to define the set of error conditions and a way to communicate / signal a timeout.
    This should not be too hard.

  2. asynchronous actions
    It is easy to create a can of worms with race conditions if we don't get it right and mess up the design.
    This can get arbitrarily complex, if we think of non-atomic transactions, rollbacks, conflicting actions etc.

For these I suggest the following approach:

An action can return a "status" object, which can be used to query (i.e. poll) whether the action has completed and to retrieve its result.
The caller has only "read-only" access on this object - there's no way to cancel an action using this object.
If an action should be cancellable, a separate "cancel_" can be defined by the TD,
which does the right thing.

@mlagally
Contributor Author

Discussion in arch call on 17.6.

Needs a hypermedia format for an action model. See Stevens – Unix network programming
WebThings – Hypermedia format, consider for adoption? https://iot.mozilla.org/wot/#actions-resource
Ege – Robot arms – delays …

Actions can return a JSON object with multiple fields, containing "id", "status" and "cancel" endpoints and a notification endpoint to subscribe to for status change notifications.

Cancel and notification could be optional?

TD needs a way to communicate action capabilities, e.g. cancellable, notification support etc.

The output data schema of an action describes its capabilities, i.e. if it doesn't define a cancel endpoint, the action is not cancellable.

Failure responses – protocol independent

Action status: Success, failed, ongoing, (not responding – on a gateway / proxy)

Link an action to a status object and an event endpoint? TD has no links.

@benfrancis
Member

An action can return a "status" object, which can be used to query (i.e. poll) whether the action has completed and to retrieve its result.
The caller has only "read-only" access on this object - there's no way to cancel an action using this object.
If an action should be cancellable, a separate "cancel_" can be defined by the TD,
which does the right thing.

As I understand it this would mean that all cancelable actions would require two separate interaction affordances in a Thing Description, e.g. fade and cancel_fade? What is the rationale for this?

WebThings – Hypermedia format, consider for adoption? https://iot.mozilla.org/wot/#actions-resource

To explain, the way that this works in the Web Thing REST API is that a POST on an Action resource to invoke an action responds with the URL of a dynamic ActionRequest resource. That ActionRequest resource can support a GET to query its status and a DELETE to cancel the action.

Invoke an action

POST https://mythingserver.com/things/lamp/actions/fade
Accept: application/json

{
  "fade": {
    "input": {
      "level": 50,
      "duration": 2000
    }
  }
}

201 Created

{
  "fade": {
    "input": {
      "level": 50,
      "duration": 2000
    },
    "href": "/things/lamp/actions/fade/123e4567-e89b-12d3-a456-426655",
    "status": "pending"
  }
}

Query the status of an action

GET /things/lamp/actions/fade/123e4567-e89b-12d3-a456-426655
Accept: application/json
200 OK
{
  "fade": {
    "input": {
      "level": 50,
      "duration": 2000
    },
    "href": "/things/lamp/actions/fade/123e4567-e89b-12d3-a456-426655",
    "timeRequested": "2017-01-25T15:01:35+00:00",
    "status": "pending"
  }
}

Cancel an action

DELETE /things/lamp/actions/fade/123e4567-e89b-12d3-a456-426655

204 No Content

A GET on the top level Action resource returns a list (queue) of all the pending ActionRequest resources corresponding to that action.

List action requests

GET /things/lamp/actions/fade
Accept: application/json
200 OK
[
  {
    "fade": {
      "input": {
        "level": 50,
        "duration": 2000
      },
      "href": "/things/lamp/actions/fade/123e4567-e89b-12d3-a456-426655",
      "timeRequested": "2017-01-25T15:01:35+00:00",
      "status": "pending"
    }
  },
  {
    "fade": {
      "input": {
        "level": 100,
        "duration": 2000
      },
      "href": "/things/lamp/actions/fade/123e4567-e89b-12d3-a456-426655",
      "timeRequested": "2017-01-24T11:02:45+00:00",
      "timeCompleted": "2017-01-24T11:02:46+00:00",
      "status": "completed"
    }
  }
]

This approach doesn't require two separate interaction affordances per action.

I would suggest something along these lines for async actions in the Core Profile. There is a proposal in w3c/wot-thing-description#302 (comment) regarding how to represent some of these types of operations (queryaction, updateaction and cancelaction) in a Thing Description, but for the Core Profile this could be simplified via defaults.

The payload of the responses could be simplified from the Web Thing API by removing the object wrapper with the name of the action, since this is not strictly needed. (The reason it's there in the Web Thing API is that there's also a top level Actions resource which provides a queue of actions of all types, which uses the same payload format so needs to distinguish between action names).
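
As a rough illustration of the simplification described above, the following Python sketch strips the action-name wrapper from a Web Thing API style payload. The function name `unwrap_action_payload` is hypothetical and not part of any existing API; the payload is the one from the example earlier in this comment.

```python
def unwrap_action_payload(payload: dict) -> tuple[str, dict]:
    """Split a wrapped payload such as {"fade": {...}} into (name, body)."""
    if len(payload) != 1:
        raise ValueError("expected exactly one action name as the top-level key")
    # Unpack the single (key, value) pair of the wrapper object.
    (name, body), = payload.items()
    return name, body


wrapped = {
    "fade": {
        "input": {"level": 50, "duration": 2000},
        "href": "/things/lamp/actions/fade/123e4567-e89b-12d3-a456-426655",
        "status": "pending",
    }
}
name, body = unwrap_action_payload(wrapped)
print(name, body["status"])
```

The unwrapped body is what a simplified Core Profile response could carry directly, since the action name is already known from the URL.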

For synchronous actions I assume that there would be no dynamically created ActionRequest resource, so the WoT producer could just respond to the invokeaction POST request with a success/failure status of some kind directly. But what happens if the action hasn't completed by the time the HTTP response comes back? The HTTP request may time out but the action still continues and eventually completes regardless, and the consumer would have no way to know what happened. One option would be to only define asynchronous actions and require that all implementations support that.

@mmccool
Contributor

mmccool commented Jun 17, 2021

So my thought here is that the affordances for cancel, etc. would not have to be in the TD. This is hard anyway for dynamic resources. The original idea of the hypermedia approach (first proposed something like three years ago, and note it is in our charter to better nail it down) was that an "Action Description" would be returned by an action invocation and it would have a set of links in it for (dynamic) interactions that could be done to follow up on an action invocation. At a minimum it would support checking status, requesting cancellation (optional, since cancellation may not be possible) and subscribing to a notification of a status change (also optional, in case the endpoint can't deal with events, although the alternative is polling the status, which is not efficient).

Anyway, the original proposal was to use a special case of a TD as an "Action Description" which would indeed allow a lot of flexibility, but would also be complicated. So my proposal is to keep things simple and just return a JSON object from an action invocation which would have a set of pre-defined entries. To make this concrete, when you invoke an action the "output" object (which, BTW, would be described in the TD's "output" data schema for the action) would look something like

{
    "id": <a per-action-invocation unique value>,
    "status": <a url to GET a status value, which would be one of a small number of states>,
    "cancel": <a url to POST to to cancel an action; optional; if omitted, the action would not be cancellable>,
    "notify": <a url to subscribe to notifications of status changes>
}

We would prescriptively define in the profile spec how each of these in turn would work (replacing a TD-like Action Description, basically, with normative specifications). For instance, for "status" we would indicate what values could be returned (one of a small set of strings, for instance) and how the protocol would work ("GET" on HTTP, for instance). Same for Notification. Note that you would be able to see from the TD whether or not an action is cancellable, etc. just by looking at the output data schema.
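
To make the idea of pre-defined entries a little more tangible, here is a minimal Python sketch of how a consumer might read the capabilities of one invocation from such an object. The function name, the example id and the example URL are assumptions made for illustration; only the entry names "id", "status", "cancel" and "notify" come from the object above.

```python
def action_capabilities(result: dict) -> dict:
    """Report which follow-up interactions an invocation result offers."""
    return {
        "query": "status" in result,
        "cancel": "cancel" in result,  # omitted means the action is not cancellable
        "notify": "notify" in result,  # omitted means the consumer has to poll
    }


# Hypothetical result of an invocation that supports neither cancel nor notify.
result = {
    "id": "123e4567",
    "status": "/things/lamp/actions/fade/123e4567/status",
}
caps = action_capabilities(result)
print(caps)
```

The same check could be done ahead of time against the output data schema in the TD instead of against a concrete result.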

We could write a Thing Model for Actions to define all this if we wanted to get fancy but would not require the Thing to return it.

HOWEVER, in the meeting we all agreed that we should definitely start with the low-hanging fruit here and start by at least defining synchronous actions. Then only once that is done should we look at how to deal with async actions (and that means we need some way to distinguish the two).

We also discussed a number of alternatives to the above, but cluttering the TD with a bunch of extra properties and events for each action does not really seem like a good idea. We also thought that maybe additional "ops" for actions like "notify" and "cancel" might go into the TD spec later, and wanted something consistent with that (possible) evolution of the TD. Taking that approach in the profile spec now though is not feasible.

@mmccool
Contributor

mmccool commented Jun 17, 2021

@benfrancis BTW, I admit to typing up the above before reading all the details of your posts (I only had 5m between meetings). Skimming what you posted it seems we might be close to being on the same page. I will read your posts more carefully and post a followup soon.

@egekorkan
Contributor

Please also check w3c/wot-thing-description#899

From my point of view:

  • Doing this in a generic way like the TD does for all Things -> Separate TF needed. This is a very big field where we have close to zero experience.
  • Doing this as a specific API/subprotocol -> Specification of this protocol in a spec like profile or binding templates.

@egekorkan
Contributor

Also regarding the very first comment: w3c/wot-thing-description#890

@mlagally
Contributor Author

We discussed a proposal during the vF2F, slides are here:
https://github.com/w3c/wot/blob/main/PRESENTATIONS/2021-06-online-f2f/2021-06-30-WoT-F2F-Action%20Semantics.pdf

@benfrancis
Member

benfrancis commented Jul 15, 2021

Below is a sketch of a proposal for how the action operations could work in the Protocol Binding section of the WoT Core Profile.

Note: I could personally live without the updateaction operation, since for many use cases simply sending a second follow-up action request could fulfil the same purpose.

This proposal includes support for both synchronous and asynchronous action status responses. My suggestion is that web things can choose which type of response to send. Consumers MUST accept both types of response to the initial invokeaction request, but support for the other operations could be made optional.

invokeaction

POST /things/lamp/actions/fade HTTP/1.1
Host: mythingserver.com
Content-Type: application/json
Accept: application/json
{
  "level": 100,
  "duration": 5
}

See #81 (comment) for a proposal of how this could work in the Core Profile.
A web thing can either respond to an action invocation request synchronously with a 200 OK response containing an ActionStatus object, or respond asynchronously by responding with a 201 Created response with the URL of an ActionStatus resource in the Location header.
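
Since consumers would have to accept both kinds of response, the branching might look roughly like the Python sketch below. It works on an already received status code, header map and parsed body rather than on a particular HTTP library; the function name and the shape of the returned dictionary are assumptions for illustration.

```python
from typing import Optional


def interpret_invoke_response(status_code: int, headers: dict,
                              body: Optional[dict]) -> dict:
    """Classify an invokeaction response as synchronous or asynchronous."""
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {key.lower(): value for key, value in headers.items()}
    if status_code == 200:
        if body is None:
            raise ValueError("200 OK response without an ActionStatus object")
        return {"mode": "sync", "action_status": body, "status_url": None}
    if status_code == 201:
        location = lowered.get("location")
        if location is None:
            raise ValueError("201 Created response without a Location header")
        return {"mode": "async", "action_status": None, "status_url": location}
    raise ValueError(f"unexpected status code {status_code}")


sync = interpret_invoke_response(
    200, {"Content-Type": "application/json"},
    {"input": {"level": 100, "duration": 5}, "status": "completed"})
pending = interpret_invoke_response(
    201, {"Location": "/things/lamp/actions/fade/123e4567-e89b-12d3-a456-426655"}, None)
print(sync["mode"], pending["status_url"])
```

Error responses (4xx and 5xx) are deliberately left to the caller in this sketch.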

ActionStatus object

An action status object contains:

  • input - confirming the input parameters provided in the action invocation request
  • output - providing the output data of a completed action, if applicable
  • status - An enum with a set of status strings
    • pending
    • running
    • completed
    • failed
  • error - Error information following the RFC7807 Problem Details format, if applicable
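
One possible in-memory representation of this object is sketched below in Python. The class and its `to_json` method are illustrative assumptions, not part of the proposal; the member names and the four status strings are taken from the list above.

```python
from dataclasses import dataclass
from typing import Any, Optional

ALLOWED_STATUSES = ("pending", "running", "completed", "failed")


@dataclass
class ActionStatus:
    status: str
    input: Optional[Any] = None
    output: Optional[Any] = None
    error: Optional[dict] = None  # RFC 7807 Problem Details object, if any

    def __post_init__(self) -> None:
        # Reject anything outside the enumerated set of status strings.
        if self.status not in ALLOWED_STATUSES:
            raise ValueError(f"unknown status: {self.status!r}")

    def to_json(self) -> dict:
        """Serialise, leaving out members that are not applicable."""
        data = {"status": self.status}
        for key in ("input", "output", "error"):
            value = getattr(self, key)
            if value is not None:
                data[key] = value
        return data


done = ActionStatus("completed", input={"level": 100, "duration": 5})
print(done.to_json())
```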

Synchronous response

HTTP/1.1 200 OK
Content-Type: application/json
{
  "input": {
      "level": 100,
      "duration": 5
   },
  "status": "completed"
}

If there's an error carrying out the action, the server MUST return an error response (e.g. 400 for invalid parameters or 500 for a failed actuation). E.g.

HTTP/1.1 400 Bad Request
Content-Type: application/json
{
  "input": {
      "level": 101,
      "duration": 5
   },
  "status": "failed",
  "error": {
    "type": "https://mythingserver.com/docs/errors/invalid-level",
    "title": "Invalid value for level provided",
    "invalid-params": [
      {
        "name": "level",
        "reason": "Must be a valid number between 0 and 100"
      }
    ]
  }
}

Asynchronous response

HTTP/1.1 201 Created
Location: /things/lamp/actions/fade/123e4567-e89b-12d3-a456-426655

queryaction

If a web thing responds with a link to an ActionStatus resource, a consumer can poll that resource to get the current state of the action.

GET /things/lamp/actions/fade/123e4567-e89b-12d3-a456-426655 HTTP/1.1
Host: mythingserver.com
Accept: application/json

The web thing responds with an ActionStatus object.

HTTP/1.1 200 OK
Content-Type: application/json
{
  "input": {
      "level": 100,
      "duration": 5
   },
  "status": "running"
}
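
A consumer-side polling loop for this operation could be sketched as follows in Python. The `fetch_status` callable stands in for the GET request shown above, so the sketch stays independent of any HTTP library; the function name, the interval and the attempt limit are assumptions, and treating "completed" and "failed" as the terminal states is an inference from the status enum rather than something the proposal spells out.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"completed", "failed"}


def poll_action(fetch_status: Callable[[], dict],
                interval: float = 1.0,
                max_attempts: int = 30,
                sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch_status until the action reaches a terminal status."""
    for attempt in range(max_attempts):
        action_status = fetch_status()
        if action_status["status"] in TERMINAL_STATUSES:
            return action_status
        if attempt < max_attempts - 1:
            sleep(interval)  # wait before asking again
    raise TimeoutError("action did not finish within the polling budget")


# Demo with canned responses instead of real HTTP requests.
canned = iter([{"status": "pending"}, {"status": "running"}, {"status": "completed"}])
final = poll_action(lambda: next(canned), sleep=lambda _seconds: None)
print(final["status"])
```

Polling like this is the fallback when no notification mechanism is available, and it is the less efficient of the two.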

updateaction

In order to update a pending or running action, a consumer can send a PUT request to its ActionStatus resource URL with new input data.

PUT /things/lamp/actions/fade/123e4567-e89b-12d3-a456-426655 HTTP/1.1
Host: mythingserver.com
Content-Type: application/json
Accept: application/json
{
  "level": 50,
  "duration": 5
}

If the action request is successfully updated, the web thing responds with an updated ActionStatus resource.

HTTP/1.1 200 OK
Content-Type: application/json
{
  "input": {
      "level": 50,
      "duration": 5
   },
  "status": "running"
}

Otherwise it may respond with an error code (e.g. if the action request can not be updated or has already completed).

cancelaction

In order to cancel an asynchronous action a consumer can send a DELETE request to its ActionStatus resource URL.

DELETE /things/lamp/actions/fade/123e4567-e89b-12d3-a456-426655 HTTP/1.1
Host: mythingserver.com

If the action is successfully cancelled then the web thing responds with a 204 response.

HTTP/1.1 204 No Content

Otherwise it may respond with an error (e.g. if the action can't be cancelled or has already completed).

Note that ActionStatus resources are not expected to persist forever so may be stored in volatile memory by a web thing and/or cleaned up on a regular interval.
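
For illustration, a web thing might keep ActionStatus resources in a small in-memory store and drop finished ones after a while. The Python sketch below assumes a time-to-live policy counted from the moment an action finishes; that policy, the class name and the method names are assumptions, since the proposal only says that resources need not persist forever.

```python
import time
import uuid


class ActionStatusStore:
    """Volatile store of ActionStatus objects with periodic cleanup."""

    def __init__(self, ttl_seconds: float = 300.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._items = {}  # id -> (finished_at or None, action_status dict)

    def create(self, action_input) -> str:
        action_id = str(uuid.uuid4())
        self._items[action_id] = (None, {"input": action_input, "status": "pending"})
        return action_id

    def finish(self, action_id: str, status: str, output=None) -> None:
        _, action_status = self._items[action_id]
        action_status["status"] = status
        if output is not None:
            action_status["output"] = output
        self._items[action_id] = (self._clock(), action_status)

    def get(self, action_id: str):
        entry = self._items.get(action_id)
        return entry[1] if entry else None

    def cleanup(self) -> int:
        """Remove finished entries older than the TTL; return how many."""
        now = self._clock()
        expired = [key for key, (finished_at, _) in self._items.items()
                   if finished_at is not None and now - finished_at >= self._ttl]
        for key in expired:
            del self._items[key]
        return len(expired)
```

A consumer that queries a cleaned-up resource would then presumably get a 404, which is one reason a consumer should not rely on old ActionStatus URLs.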


Discussions around how to describe these types of action operations canonically in a Thing Description are continuing in w3c/wot-thing-description#302. For the purposes of the Core Profile we don't necessarily have to wait for those features to be added to the Thing Description, we could just expect Thing Descriptions to provide a single URL for an action affordance and apply the above set of operations as defaults. If and when the Thing Description specification catches up, we can provide an informative example of a canonical Thing Description describing these operations.

Edit: One thing that's missing from this proposal which we could add (and is already supported in WebThings), is an additional operation to enumerate the list of action requests in an action queue using a GET request on the action URL.

@sebastiankb
Contributor

I like @benfrancis's proposal. There are two points which I would like to discuss:

  1. Shall we echo the input parameters in the response message? As @mmccool mentioned in today's call, what about the case of large input parameters? E.g., there is a convertPhoto action where you can submit JPEG files. Does it make sense to have the original JPEGs again in the response?

  2. Do we need a status element there for sync actions? Would HTTP status codes not be sufficient?

@benfrancis
Member

benfrancis commented Jul 15, 2021

@sebastiankb wrote:

  1. Shall we echo the input parameters in the response message? As @mmccool mentioned in today's call, what about the case of large input parameters? E.g., there is a convertPhoto action where you can submit JPEG files. Does it make sense to have the original JPEGs again in the response?

I agree this could be inefficient for large inputs, as with the writeproperty operation.

One argument for including the input data in the body of the dynamically created resource is that it can then neatly be updated with a PUT request in the updateaction operation. Actually that makes me realise a couple of things:

  • The payload of the updateaction request should probably be wrapped in an object containing an "input" map, since we don't want to replace the whole resource with just the new input data
  • The method of the updateaction request should probably be a PATCH rather than a PUT since we are only updating the input member, not output, status or error.

E.g.

PATCH /things/lamp/actions/fade/123e4567-e89b-12d3-a456-426655 HTTP/1.1
Host: mythingserver.com
Content-Type: application/json
Accept: application/json
{
  "input": {
    "level": 50,
    "duration": 5
  }
}
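
On the server side, the PATCH semantics described above (only the input member changes, and only while the action is pending or running) might be sketched like this in Python. The function name and the choice of `ValueError` for rejected updates are assumptions for illustration.

```python
def apply_update(action_status: dict, patch: dict) -> dict:
    """Return a copy of the ActionStatus with only 'input' replaced."""
    unexpected = set(patch) - {"input"}
    if unexpected:
        # output, status and error are owned by the web thing, not the consumer.
        raise ValueError(f"members cannot be updated: {sorted(unexpected)}")
    if action_status["status"] not in ("pending", "running"):
        raise ValueError("only pending or running actions can be updated")
    updated = dict(action_status)
    if "input" in patch:
        updated["input"] = patch["input"]
    return updated


current = {"input": {"level": 100, "duration": 5}, "status": "running"}
updated = apply_update(current, {"input": {"level": 50, "duration": 5}})
print(updated)
```

In an HTTP handler the two rejected cases would map to error responses rather than exceptions.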

If we decide we don't need the updateaction operation then that's less of an issue and we can just omit input altogether, but we may just be storing up problems for the future.

Is there some other way we can mitigate the issue of large inputs? @mmccool suggested just including a hash for example. Could we truncate large values? How do other hypermedia systems and APIs deal with that issue?

  2. Do we need a status element there for sync actions? Would HTTP status codes not be sufficient?

I wondered this too. I concluded that given there's no way to guarantee that all actions can be completed within an HTTP timeout period, it's still useful for the consumer to know if the invoked action is still pending or running when the HTTP response comes back, even if a dynamic resource is not created to track its status.

@sebastiankb
Contributor

Is there some other way we can mitigate the issue of large inputs?

How about introducing a sub-resource where the input parameters of the invoked action can be queried. E.g.,

GET /things/lamp/actions/fade/123e4567-e89b-12d3-a456-426655/input HTTP/1.1
Host: mythingserver.com
Accept: application/json

The response can look like:

HTTP/1.1 200 OK
Content-Type: application/json
 {
      "level": 50,
      "duration": 5
  }

The advantages are that the client can decide whether to check the input parameters, and the usual queryaction response will be more compact.

I wondered this too. I concluded that given there's no way to guarantee that all actions can be completed within an HTTP timeout period, it's still useful for the consumer to know if the invoked action is still pending or running when the HTTP response comes back, even if a dynamic resource is not created to track its status.

I had a quick look at XML-RPC. If everything is OK, only the return value is provided, without a status code in the response message. If something went wrong, the response message is different and carries a detailed error message. We could also do this by using the additionalResponse feature in the TD. What do you think?

@relu91
Member

relu91 commented Jul 16, 2021

Another alternative would be to make input optional. Consumers can even know ahead of time whether it will be present by checking the output DataSchema. Considering that updateaction is a less common use case, I think this makes sense. In the end, a consumer that wants to update the resource would just check either before invoking the action (thanks to the DataSchema) or after (checking output.input !== undefined).

Do you see any downsides?

@mlagally
Contributor Author

mlagally commented Jul 16, 2021

I suggest we implement the decision from the architecture call and create a PR with the sections of the current proposal that we agreed upon in the call, i.e. to include invoke, query and cancel into the draft.

@benfrancis - We can extend the branch/PR of #88 and evolve it, or do you prefer to create a separate PR?

This discussion about input parameters and whether it is optional in the response is very useful and should be continued in the next architecture/profile call. We can then incrementally refine and clarify these questions.

A JSON schema would be very helpful to have a proposal that we can agree on and can include into the spec.
@relu91 do you think you could help?

@relu91
Member

relu91 commented Jul 19, 2021

Yes, sure! So what I had in mind was something like this:

{
// A TD action description 
"newAction" :{
            "title": "newAction",
            "description": "",
            "input": {
                "type": "object",
                "properties": {
                    "foo": {
                        "type": "string"
                    }
                }
            },
            "output": {
               // according to what is described above an ActionStatus can be described with this schema
                "type": "object",
                "properties": {
                    "input" : {
                        "type": "object",
                        "properties": {
                            "foo": {"type":"string"}
                        }
                    },
                    "output": {
                        "type": "string" // The actual output of the action.  it can be anything
                    },
                    "status": {
                        "type": "string",
                        "enum": [
                            "pending",
                            "running",
                            "completed",
                            "failed"
                        ]
                    },
                    "error": {
                        "type": "object",
                        "description": "An error object according to RFC 7807",
                        "properties": {
                            "type": { "type": "string"},
                            "title": { "type": "string"},
                            "status": { "type": "integer"},
                            "detail": { "type": "string"},
                            "instance": { "type": "string"}
                        }
                    }
                },
                "required": [ "status", "input" ]  // here I know that the response will have the input field
            },
            "forms": []
        }
}

As you can see, using the required array I can state that the input will always be returned in the response for the invokeaction operation. We can also express mixed situations where the input field might be there, by removing it from the required array (e.g. "required": [ "status"]). Or we can definitely say that it won't be there, just by removing it from the properties object:

{
 "output": {
               // according to what is described above an ActionStatus can be described with this schema
                "type": "object",
                "properties": {
                   // no more input defined     
                    "output": {
                        "type": "string"  // The actual output of the action.  it can be anything
                    },
                    "status": {
                        "type": "string",
                        "enum": [
                            "pending",
                            "running",
                            "completed",
                            "failed"
                        ]
                    },
                    "error": {
                        "type": "object",
                        "description": "An error object according to RFC 7807",
                        "properties": {
                            "type": { "type": "string"},
                            "title": { "type": "string"},
                            "status": { "type": "integer"},
                            "detail": { "type": "string"},
                            "instance": { "type": "string"}
                        }
                    }
                },
                "required": [ "status" ] 
            }
}

Note: the JSON schema might not be accurate to the spec defined by @benfrancis; it is just meant to explain my previous comment. We can discuss this further during the call and maybe refine it inside a PR.
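
The three cases described above (input always echoed, possibly echoed, or not echoed) could be detected by a consumer with a check along these lines. This is only a Python sketch; the function name and the three labels are made up for illustration, and it follows the convention from this comment that leaving input out of the properties object means it will not be returned.

```python
def input_echo_guarantee(output_schema: dict) -> str:
    """Classify whether an ActionStatus schema promises the 'input' member."""
    properties = output_schema.get("properties", {})
    required = output_schema.get("required", [])
    if "input" not in properties:
        return "absent"
    return "always" if "input" in required else "maybe"


# Trimmed-down version of the output schema from the first example.
schema = {
    "type": "object",
    "properties": {
        "input": {"type": "object"},
        "status": {"type": "string"},
    },
    "required": ["status", "input"],
}
print(input_echo_guarantee(schema))
```

A consumer that wants to use updateaction could run this check before invoking the action.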

@benfrancis
Member

Please see #89 for a first draft of specification text to describe invokeaction, queryaction and cancelaction.

@benfrancis
Member

@relu91 wrote:

So what I had in mind was something like this
...

It's probably worth noting at this stage that I'd ideally like to get to a point where a Web Thing conformant with the Core Profile could provide a very simple Thing Description like...

{
  "@context": "https://www.w3.org/2019/wot/td/v1",
  "id": "urn:ex:thing",
  "title": "My lamp",
  "profile": "https://www.w3.org/2021/wot/profile/core",
  "security": { ... },
  "actions": {
    "fade": {
      "input": {
        "type": "number",
        "description": "duration in ms"
      },
      "forms": [ { "href": "/fade" } ]
    }
  }
}

...then a conformant Consumer would see that the Web Thing supports the Core Profile and, by applying all the defaults defined in the profile specification, would arrive at a much more comprehensive canonical Thing Description much like the one you have provided above, or the one in w3c/wot-thing-description#302 (comment) with the full set of operations defined. This would mean that Web Things which support the Core Profile don't have to worry about all the complexities of dealing with multiple forms and declarative protocol bindings for dynamic resources, and can just provide a single HTTP endpoint for an action affordance which is then expanded out into the full set of operations for free. I see this as an extension of the current set of defaults in the Thing Description specification.
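
As a very rough Python sketch of what "applying the defaults" could mean on the consumer side: starting from the single action URL in the TD, and from the ActionStatus URL once one has been returned in a Location header, the consumer derives the method and target for each operation. The function name and the returned structure are assumptions for illustration; the method choices (POST, GET, DELETE) follow the proposal earlier in this thread.

```python
from typing import Optional


def default_operations(action_url: str, status_url: Optional[str] = None) -> dict:
    """Derive the default operations for an action from its URLs."""
    operations = {"invokeaction": {"method": "POST", "href": action_url}}
    if status_url is not None:
        # Only known after an asynchronous invocation has returned a Location.
        operations["queryaction"] = {"method": "GET", "href": status_url}
        operations["cancelaction"] = {"method": "DELETE", "href": status_url}
    return operations


before = default_operations("https://mythingserver.com/fade")
after = default_operations(
    "https://mythingserver.com/fade",
    "https://mythingserver.com/fade/123e4567-e89b-12d3-a456-426655")
print(sorted(after))
```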

@mlagally
Contributor Author

I completely agree - simplicity is one of the primary goals of the profile spec.

@egekorkan
Contributor

Regarding the simplicity argument of @mlagally, this does not make an implementation simpler, only its non-canonical TD.

@benfrancis
Member

@egekorkan wrote:

this does not make an implementation simpler, only its non canonical TD

Currently the Thing Description specification puts no constraints on the protocols that Web Things may use or the complexity of their protocol bindings, which makes it effectively impossible to implement a Consumer that can support any Web Thing.

If we accepted that a Consumer which implements support for the Core Profile does not have to support Web Things which don't conform with the profile, then actually it could drastically simplify implementations. This is because although it may be possible to expand a simplified TD into a more complex canonical TD with declarative protocol bindings describing every little detail, Consumers would not necessarily need to support other declarative protocol bindings which don't conform with the profile.

e.g. a Consumer conformant with the Core Profile may support a queryaction operation which follows the protocol binding and data schema defined in the Core Profile, but not support a queryaction operation which uses some other approach using a declarative protocol binding in a Form.
