Problem: Define Miminum Metadata of Messages #42

Open
ylorph opened this issue Dec 16, 2019 · 10 comments

Comments

@ylorph
Owner

ylorph commented Dec 16, 2019

https://twitter.com/SKleanthous/status/1197272391304523776

https://github.com/cloudevents/spec/blob/master/spec.md

See also
#29
#33

@ylorph ylorph changed the title Problem: Define Miminum Matadata of Messages Problem: Define Miminum Metadata of Messages Dec 16, 2019
@edblackburn
Contributor

edblackburn commented Apr 22, 2020

A list of @skleanthous suggestions:

  1. Message name
  2. Name of service which owns the contract of the message
  3. Message version
  4. Message unique id
  5. Correlation id
  6. Causation id
  7. UTC Timestamp
  8. Authentication token
  9. Entity version. For commands, this is taken from a query (similarly to an ETag), and used in all messages for idempotency concurrency
  10. [Event-only] Resource which raised the event
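
As a rough illustration, the ten items above could be carried in a structure like the following. This is only a sketch: the class name `MessageMetadata`, the field names, the helper `new_metadata`, and the choice to reuse the root message's id as its correlation id are all assumptions made for the example, not something this thread prescribes.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class MessageMetadata:
    """Hypothetical container for the ten suggested metadata items."""
    name: str                      # 1. message name
    service: str                   # 2. service owning the message contract
    version: int                   # 3. message version
    message_id: str                # 4. message unique id
    correlation_id: str            # 5. shared by all messages from one action
    causation_id: Optional[str]    # 6. id of the message that caused this one
    timestamp: datetime            # 7. UTC timestamp
    auth_token: str                # 8. authentication token
    entity_version: Optional[int]  # 9. entity version (like an ETag)
    resource: Optional[str] = None # 10. event-only: resource that raised it


def new_metadata(name, service, version, auth_token,
                 entity_version=None, resource=None, caused_by=None):
    """Build metadata, inheriting the correlation id from the causing message.

    Assumption: a root message (no cause) uses its own id as correlation id.
    """
    message_id = str(uuid.uuid4())
    return MessageMetadata(
        name=name,
        service=service,
        version=version,
        message_id=message_id,
        correlation_id=caused_by.correlation_id if caused_by else message_id,
        causation_id=caused_by.message_id if caused_by else None,
        timestamp=datetime.now(timezone.utc),
        auth_token=auth_token,
        entity_version=entity_version,
        resource=resource,
    )


# Usage: a command and the event it causes share one correlation id.
command = new_metadata("OpenAccount", "Bank.Account", 1, "token")
event = new_metadata("AccountOpened", "Bank.Account", 1, "token",
                     entity_version=1, resource="account/42",
                     caused_by=command)
```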

The runtime requires the above for:

  • 1 & 2 identify the message that needs to be acted upon
  • 3 allows for parsing of different versions, in case of breaking changes
  • 5 allows for checking for errors in distributed processes
  • 8 for security and auth
  • 9 for idempotency concurrency

The above can also be used for debugging:

  • 5 can be used to get all messages that resulted from a single action. This is very useful as an index for causal queries.
  • 4 & 6 allow building a causal chain of each command in a system
  • 7-10 provide context (along with data)

The resource version is particularly useful when we can retrieve resource versions from our store (as in event sourcing): with the causal chain available, we can replay the actual command on the precise state of the resource we want to inspect.
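
The two debugging queries described above (everything from one action via the correlation id, and the causal chain via message id plus causation id) can be sketched as below. The dictionary keys and the function names `by_correlation` and `causal_chain` are hypothetical, chosen only for this example.

```python
from typing import Dict, List, Optional

Message = Dict[str, Optional[str]]


def by_correlation(messages: List[Message], correlation_id: str) -> List[Message]:
    """All messages that resulted from a single action (item 5)."""
    return [m for m in messages if m["correlation_id"] == correlation_id]


def causal_chain(messages: List[Message], message_id: str) -> List[Message]:
    """Follow causation ids (item 6) back from a message to its root.

    Returns the chain root-first. Stops if a cause is missing or a cycle
    is detected, so malformed logs cannot loop forever.
    """
    index = {m["message_id"]: m for m in messages}
    chain: List[Message] = []
    seen = set()
    current = index.get(message_id)
    while current is not None and current["message_id"] not in seen:
        seen.add(current["message_id"])
        chain.append(current)
        cause = current["causation_id"]
        current = index.get(cause) if cause else None
    chain.reverse()
    return chain


# Made-up sample log: m1 caused m2, which caused m3; x1 is unrelated.
SAMPLE_LOG: List[Message] = [
    {"message_id": "m1", "correlation_id": "c1", "causation_id": None},
    {"message_id": "m2", "correlation_id": "c1", "causation_id": "m1"},
    {"message_id": "m3", "correlation_id": "c1", "causation_id": "m2"},
    {"message_id": "x1", "correlation_id": "c2", "causation_id": None},
]
chain_ids = [m["message_id"] for m in causal_chain(SAMPLE_LOG, "m3")]
```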

@ylorph
Owner Author

ylorph commented May 1, 2020

Other things that might be of interest:

  • client IP address for client-initiated actions

@kijanawoodard

Per @skleanthous, 9 above should be about concurrency instead of idempotency.

  9. Entity version. For commands, this is taken from a query (similarly to an ETag), and used in all messages for CONCURRENCY.
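
A minimal sketch of what using the entity version for concurrency might look like, assuming an optimistic check on the handling side. `InMemoryStore`, `ConcurrencyError`, and the rule that versions start at 0 and increase by one are all assumptions for illustration.

```python
class ConcurrencyError(Exception):
    """Raised when a command was issued against a stale entity version."""


class InMemoryStore:
    """Hypothetical store that tracks one version number per entity."""

    def __init__(self):
        self._versions = {}

    def current_version(self, entity_id: str) -> int:
        # Assumption: an entity that was never written has version 0.
        return self._versions.get(entity_id, 0)

    def apply(self, entity_id: str, expected_version: int) -> int:
        """Accept a change only if the sender saw the latest version."""
        current = self.current_version(entity_id)
        if expected_version != current:
            raise ConcurrencyError(
                f"{entity_id}: expected version {expected_version}, "
                f"store is at {current}"
            )
        self._versions[entity_id] = current + 1
        return current + 1


# Usage: two commands built from the same query result (version 0).
store = InMemoryStore()
first_result = store.apply("account/42", expected_version=0)
try:
    store.apply("account/42", expected_version=0)
    second_rejected = False
except ConcurrencyError:
    second_rejected = True
```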

@MerrionComputing
Contributor

I would domain-qualify the message name, i.e. Bank.Account.InterestAccrued, in case different domains/entities use the same word to describe different events.

@skleanthous

skleanthous commented Jul 14, 2020

@MerrionComputing The idea is that you put the domain name in metadata 2. Note that this doesn't necessarily mean FQN = (metadata 2) + (metadata 1). If the consumer takes advantage of that equivalence, refactoring that changes namespaces risks introducing breaking changes, although this will very often be an acceptable tradeoff for productivity.

In some cases, I found it better to use just the domain, or domain and subdomain name, in metadata 2 and let the consumer explicitly map that to a message type for deserialization, although of course this is context-specific.

Regardless of what you put in metadata 2, I found splitting those two useful because it very often helps when debugging issues: it lets you easily filter messages by name or by domain to enhance visibility.
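
A sketch of the explicit mapping described above, where the consumer keys its deserializers on the pair (metadata 2, metadata 1) instead of on a class FQN. The contract classes, the `DESERIALIZERS` table, and the JSON payload shape are made up for the example.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class AccountInterestAccrued:
    account_id: str
    amount: str


@dataclass(frozen=True)
class LoanInterestAccrued:
    loan_id: str
    amount: str


# Keyed by (service name, message name). The consumer's class names are
# free to change during refactoring without touching the wire contract.
DESERIALIZERS = {
    ("Bank.Account", "InterestAccrued"): lambda d: AccountInterestAccrued(**d),
    ("Bank.Loan", "InterestAccrued"): lambda d: LoanInterestAccrued(**d),
}


def deserialize(service: str, name: str, body: str):
    """Pick the contract from the two metadata fields, then parse the body."""
    try:
        build = DESERIALIZERS[(service, name)]
    except KeyError:
        raise LookupError(f"no contract registered for {service}/{name}") from None
    return build(json.loads(body))


# Usage: the same message name resolves differently per owning service.
account_event = deserialize(
    "Bank.Account", "InterestAccrued",
    '{"account_id": "a-1", "amount": "1.25"}',
)
loan_event = deserialize(
    "Bank.Loan", "InterestAccrued",
    '{"loan_id": "l-9", "amount": "3.10"}',
)
```

Keeping the two fields separate also makes it cheap to filter a log by name alone or by service alone.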

@ylorph
Owner Author

ylorph commented Jul 15, 2020

Thinking about having a qualifier for the message as well.
I have a case where an event DeliveryAvailable (it's assessment-testing terminology) is raised when some Delivery is published in the online assessment platform, but we use several different online assessment platforms.
So our integration has a different DeliveryAvailable event type in each module.
Since some details of the message content differ, parsing them differs as well.
I do need different message types and type names to facilitate processing.
The alternative would be to go further in abstracting the assessment platform, but that is a step too far for us.
So DeliveryAvailable in the Platform1 module will have a name like Platform1.DeliveryAvailable, and in Platform2 something like Platform2.DeliveryAvailable.

@skleanthous

skleanthous commented Jul 15, 2020

@ylorph That's an interesting point. A couple of thoughts on that:

  1. Having the service name (metadata point 2) contain the platform may be a good idea. You would use both service name and message name to deserialize, even within a single platform (as you describe it), because two different BCs could raise two different events with the same event name, since they would have different definitions of the same UL term.
  2. If different platforms have the same domains and BCs, with some minor deviations that cause deviations in the message contracts, I would suggest having platform as a separate piece of metadata. The list I proposed above is a meaningful minimum rather than a complete set.
  3. Also, having it in the name isn't a bad idea either; it depends on your context and situation.

Personally I see it like this:

  • Message name -> the "what happened" part
  • Service name -> the context that defines the above, and is necessary to interpret it

As I said though, using something more in the message name isn't necessarily a bad idea, it comes with some tradeoffs. Splitting the service name is something that I recommend (as I said above) mostly to help with debugging, and helps me when I'm joining data to get context of a change.

@ylorph
Owner Author

ylorph commented Jan 17, 2021

  • UTC Timestamp of occurrence
  • UTC Timestamp of event entry in the log
    => distinct timestamp and might differ greatly when moving events from one storage to the other

@kijanawoodard

we've been leaning into

recorded_at
vs
occurred_at

The other time they may vary widely is when "the system was down when we did this in the real world" or some other connectivity issue like "we did this 1000 miles from civilization and recorded it when we returned".
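
To make the distinction concrete, here is a small sketch with both timestamps on an event. The `Event` class, the `recording_lag` property, and the `restamp_for_new_store` helper are hypothetical. The helper assumes that copying an event to another store keeps `occurred_at` and assigns a fresh `recorded_at`, which is one reading of the "moving events from one storage to the other" case mentioned earlier.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Event:
    name: str
    occurred_at: datetime  # when it happened in the real world (UTC)
    recorded_at: datetime  # when it was written to this log (UTC)

    @property
    def recording_lag(self) -> timedelta:
        """How long after the fact the event reached this log."""
        return self.recorded_at - self.occurred_at


def restamp_for_new_store(event: Event, now: datetime) -> Event:
    """Copy an event to another store: same occurrence, new log entry time."""
    return replace(event, recorded_at=now)


# Usage: something done in the field and entered three days later.
happened = datetime(2021, 1, 1, 12, 0, tzinfo=timezone.utc)
field_event = Event("SampleCollected", happened, happened + timedelta(days=3))
migrated = restamp_for_new_store(field_event, happened + timedelta(days=400))
```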

@gregoryyoung

gregoryyoung commented Mar 11, 2022

When doing this remember that you want two ways of reading things (possibly three, more on that below).

As-at and as-of

This can be done in a generic way and should apply to all event streams. Basically one is usually just read events forward (not occasionally connected might end up with two like the latter) ... the other is read events forward ... sort ... then return.

Once you have this you will be amazed how many problems fit pretty well into it.
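
One possible reading of the two generic reads, sketched under assumptions: the first returns events in the order they sit in the log, and the second reads forward, sorts by when things happened, then returns. The function names, the dictionary keys `occurred_at` and `recorded_at`, and the sample data are made up here, and which read corresponds to "as-at" versus "as-of" is deliberately left open since the comment above does not spell it out.

```python
from datetime import datetime, timezone


def utc(day: int) -> datetime:
    """Helper for the sample data: a UTC date in January 2022."""
    return datetime(2022, 1, day, tzinfo=timezone.utc)


def read_in_log_order(events, until):
    """Read events forward as recorded, up to a recording time."""
    return [e for e in events if e["recorded_at"] <= until]


def read_in_occurrence_order(events, until):
    """Read events forward, sort by occurrence time, then return.

    sorted() is stable, so events with equal occurred_at keep log order.
    """
    forward = [e for e in events if e["occurred_at"] <= until]
    return sorted(forward, key=lambda e: e["occurred_at"])


# Made-up log: B happened first but was recorded late (occasionally connected).
SAMPLE_LOG = [
    {"name": "A", "occurred_at": utc(3), "recorded_at": utc(3)},
    {"name": "B", "occurred_at": utc(1), "recorded_at": utc(4)},
    {"name": "C", "occurred_at": utc(5), "recorded_at": utc(5)},
]
log_order = [e["name"] for e in read_in_log_order(SAMPLE_LOG, utc(4))]
occurrence_order = [e["name"] for e in read_in_occurrence_order(SAMPLE_LOG, utc(4))]
```

With the sample data, the first read yields A then B, while the second yields B then A, because B happened before A but reached the log later.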
