Step hooks #111

tpluscode · 2023-04-26T11:56:32Z

As discussed with @cristianvasquez, @giacomociti and @mchlrch, we propose that, instead of implementing specialized monitoring steps, there should be a built-in feature for observing existing steps in a pipeline.

_Originally posted by @tpluscode in #121_

tpluscode · 2023-04-26T12:24:12Z

For starters, I propose allowing each step to be extended with at least the following extension points:

- `after` or `flush`
- `onChunk`
- `onError`

The first one already has an issue: #71.

The second could be defined in a similar fashion, by adding function(s) to a step. Those functions would be called for every chunk, similar to a `PassThrough` step but without access to the underlying stream.

Here's a step which produces the final set of triples. It is extended with a hypothetical hook which counts the passing quads (`onChunk`) and reports the total when the pipeline finishes:

```turtle
<#serialize>
  a :Step ;
  code:implementedBy [
    a code:EcmaScriptModule ;
    code:link <node:barnard59-formats/ntriples.js#serialize> ;
  ] ;
  :hook [
    code:implementedBy [
      a code:EcmaScriptModule ;
      code:link <file:monitoring.js#counter> ;
    ] ;
  ] ;
.
```

```js
// monitoring.js

// code:arguments can be declared in the pipeline definition
export function counter(...args) {
  // the pipeline context can be accessed as in steps
  const { logger } = this

  let total = 0

  // a map of hooks is returned so that they can share state;
  // more hooks might be added in the future, and all hooks are optional
  return {
    // called for every chunk passing through the step
    data(quad) {
      total++
    },
    // called once the step has finished
    flush() {
      logger.info(`Total quads processed: ${total}`)
    }
  }
}
```
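The hook semantics above can be sketched as follows. This is a minimal illustration under stated assumptions, not how barnard59 implements it: `attachHooks` is a hypothetical helper that pipes a step's output through an object-mode `Transform`, calls `data` for every chunk and `flush` at the end (the names returned by `counter`), and forwards errors to an assumed `error` callback standing in for `onError`. The pipeline context is simulated by a plain object with a `logger`.

```js
const { Readable, Transform } = require('node:stream')

// Hypothetical helper (not a barnard59 API): observe every chunk leaving `step`
// without modifying it. `hooks` has the shape returned by `counter`
// ({ data, flush }) plus an assumed `error` callback for onError; `context`
// stands in for the pipeline context that hooks see as `this`.
function attachHooks (step, hooks, context) {
  const observer = new Transform({
    objectMode: true,
    transform (chunk, encoding, callback) {
      try {
        // onChunk: the hook only sees the chunk
        if (hooks.data) hooks.data.call(context, chunk)
      } catch (err) {
        return callback(err)
      }
      // the chunk is passed on unchanged
      callback(null, chunk)
    },
    flush (callback) {
      // after/flush: runs once, after the step has emitted its last chunk
      if (hooks.flush) hooks.flush.call(context)
      callback()
    }
  })

  // onError: pipe() does not forward errors, so report and propagate them here
  step.on('error', err => {
    if (hooks.error) hooks.error.call(context, err)
    observer.destroy(err)
  })

  return step.pipe(observer)
}

// Usage: count three placeholder chunks and report the total on flush
let total = 0
attachHooks(Readable.from(['quad1', 'quad2', 'quad3']), {
  data () { total++ },
  flush () { this.logger.info(`Total quads processed: ${total}`) }
}, { logger: console }).resume()
```

Because the observer only reads each chunk and forwards it untouched, the hooks can watch a step but cannot alter what flows downstream.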

cristianvasquez · 2023-04-26T12:31:00Z

I think I understand the spirit of the hooks. But wouldn't it be better if they were implicit, i.e. not added to the pipeline definition?

Ideally, one could observe any node of an arbitrary running pipeline.

tpluscode · 2023-04-26T12:39:45Z

Not sure I understand, @cristianvasquez. It makes sense to attach observability to every single step, if that's what you are proposing. You'd make a choice to count at specific points. For example, in museum we process API objects and convert them to RDF, so we might count the number of input objects and then the final number of quads.

And how would you do it without code? Even monitoring alone could be implemented in multiple ways, such as with otel (OpenTelemetry).

ludovicm67 · 2023-04-26T12:56:56Z

I like the idea of having this kind of hook because it lets us do whatever we want, and it will help a lot with debugging: printing quads, generating metrics, and so on.

cristianvasquez · 2023-04-26T13:21:36Z

> And how do without code? Even monitoring could be implemented in multiple ways, such as otel

Yes, there can be many different implementations, with their design choices.

My point was about whether to add more triples to the pipeline steps at all, especially triples that are not part of the business logic.

Choice: if one has a UI where nodes can be clicked, a virtual hook is added to show information in a box.

Example UI idea in VSCode: https://hackmd.io/@KhLoxKJzSyWQgXHIpI2qdQ/BycIGrTv5 (outdated)

Choice: if one does not use a UI, one can imagine a Turtle file to which the hook triples are added by hand, and then running the pipeline from the command line each time.

Less is better :)

ludovicm67 · 2023-04-26T13:30:08Z

I like the idea of having something we can extend however we want, for whichever steps we want.

With the approach @tpluscode suggested, it's easy to implement any logic we want; we are not stuck with a limited feature set.

It's better to always be explicit, to avoid confusion and to make sure we know exactly what is happening.
Explicitly attaching hooks to specific steps is great for this.

Providing a default collection of hooks could be nice (printing quads that match a pattern, generating metrics, …).
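As an illustration of such a default hook, here is a minimal sketch that follows the same factory shape as `counter`: it is called with the pipeline context as `this` and returns a map of hooks. The name `printMatching` and its pattern format are hypothetical; quads are assumed to follow the RDF/JS data model, where every term exposes its lexical form as `.value`.

```js
// Hypothetical default hook: log every quad whose terms match a pattern.
// Call it with the pipeline context as `this`, like `counter`.
// Pattern entries are compared with the `.value` of RDF/JS terms;
// an omitted position acts as a wildcard.
function printMatching (pattern = {}) {
  const { logger } = this
  const positions = ['subject', 'predicate', 'object', 'graph']

  const matches = quad => positions.every(position =>
    pattern[position] === undefined || quad[position]?.value === pattern[position])

  return {
    data (quad) {
      if (matches(quad)) {
        logger.info(`${quad.subject.value} ${quad.predicate.value} ${quad.object.value}`)
      }
    }
  }
}
```

A metrics hook would have the same shape, accumulating values in `data` and exporting them in `flush`.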

cristianvasquez · 2023-04-26T13:31:20Z

Perhaps they could be run through the CLI:

```shell
barnard59 run -v --pipeline=urn:pipeline -log=urn:serialize
```

ludovicm67 · 2023-04-26T15:54:12Z

But how do you handle custom logic (debug logs, metrics export, …) for a specific step that way?

cristianvasquez · 2023-04-26T16:53:23Z

The Barnard runner reads the Turtle file and composes the pipeline in memory. At that moment, it could also add the hooks @tpluscode mentions, as 'dummy' (no-op) hooks to start with. At runtime, logic can then be injected into those hooks to do the counting, logging, etc.

I used to debug that way when I started using the Barnard pipelines...
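The idea of dummy hooks filled in at runtime can be sketched as a small registry. This is an assumption-laden illustration, not existing barnard59 code: `createHookRegistry` is hypothetical, step IRIs such as `urn:serialize` are placeholders, and the hook names reuse `data`, `flush` and `error` from the earlier proposal. The runner would create one set of no-op slots per step while composing the pipeline and always call through the registry; a UI or CLI could later inject or remove real logic.

```js
const HOOK_NAMES = ['data', 'flush', 'error']
const noop = () => {}

// Hypothetical registry: one set of no-op ("dummy") hooks per step IRI,
// created while the runner composes the pipeline in memory
function createHookRegistry (stepIris) {
  const slots = new Map()
  for (const iri of stepIris) {
    slots.set(iri, Object.fromEntries(HOOK_NAMES.map(name => [name, noop])))
  }

  return {
    // Called by the runner for every chunk, at the end, or on error
    call (iri, name, ...args) {
      const hooks = slots.get(iri)
      if (hooks) hooks[name](...args)
    },
    // Inject real logic at runtime, e.g. from a UI or a CLI flag
    inject (iri, hooks) {
      const current = slots.get(iri)
      if (!current) throw new Error(`Unknown step: ${iri}`)
      for (const name of HOOK_NAMES) {
        if (typeof hooks[name] === 'function') current[name] = hooks[name]
      }
    },
    // Go back to the dummy hooks
    reset (iri) {
      const current = slots.get(iri)
      if (current) for (const name of HOOK_NAMES) current[name] = noop
    }
  }
}
```

Since the slots always exist, the pipeline definition stays free of observability triples, which is the trade-off this approach is aiming for.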

tpluscode · 2023-04-27T06:15:02Z

> The Barnard runner reads the turtle file

So would there be one or multiple such pipeline files? For a moment I thought you had proposal #93 in mind, where you could have multiple sources and use the CLI to combine them:

```shell
barnard59 run --source main.ttl --source console-debug.ttl
barnard59 run --source main.ttl --source otel.ttl
```

In each case, the second source would extend the main graph with hooks at the desired steps.

ludovicm67 · 2023-04-27T06:33:24Z

That could be an option too!

So, as an example, it would look like this:

```turtle
# main.ttl
<#serialize>
  a :Step ;
  code:implementedBy [
    a code:EcmaScriptModule ;
    code:link <node:barnard59-formats/ntriples.js#serialize> ;
  ] ;
.

# otel.ttl
<#serialize>
  :hook [
    code:implementedBy [
      a code:EcmaScriptModule ;
      code:link <file:monitoring.js#counter> ;
    ] ;
  ] ;
.
```

and running them using:

```shell
barnard59 run --source main.ttl --source otel.ttl
```

?

If so, this is also OK for me.

Is the example valid? If not, can you correct me?
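One detail worth checking when combining sources this way: merging RDF graphs is a set union of triples, so the hook from `otel.ttl` only lands on the step if `<#serialize>` resolves to the same IRI in both files. Turtle resolves relative IRIs against a base, which defaults to the document's own location unless `@base` is set. The sketch below illustrates this with the standard `URL` class; `mergeSources` and `hooksOf` are hypothetical helpers, triples are simplified to `[subject, predicate, object]` strings, and only IRIs starting with `#` are treated as relative.

```js
// Hypothetical sketch of combining several --source files (proposal #93).
// Each source is { url, base?, triples } with triples as [s, p, o] strings.
// Simplification: only IRIs starting with '#' are treated as relative.
function resolve (iri, base) {
  return iri.startsWith('#') ? new URL(iri, base).href : iri
}

function mergeSources (sources) {
  const merged = new Map()
  for (const { url, base = url, triples } of sources) {
    for (const [s, p, o] of triples) {
      const triple = [resolve(s, base), p, resolve(o, base)]
      // An RDF merge is a set union, so identical triples collapse into one
      merged.set(JSON.stringify(triple), triple)
    }
  }
  return [...merged.values()]
}

// Objects of the :hook triples attached to a given step IRI
function hooksOf (graph, stepIri) {
  return graph.filter(([s, p]) => s === stepIri && p === ':hook').map(([, , o]) => o)
}

// Usage
const main = { url: 'file:///pipelines/main.ttl', triples: [['#serialize', 'rdf:type', ':Step']] }
const otel = { url: 'file:///pipelines/otel.ttl', triples: [['#serialize', ':hook', '_:counter']] }

// Each file resolves '#serialize' against its own URL, so the hook misses the step
hooksOf(mergeSources([main, otel]), 'file:///pipelines/main.ttl#serialize') // → []

// With a shared base (e.g. the same @base in both files) the hook lands on the step
const base = 'file:///pipelines/pipeline.ttl'
hooksOf(mergeSources([{ ...main, base }, { ...otel, base }]), base + '#serialize') // → ['_:counter']
```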

tpluscode · 2023-04-27T10:40:30Z

The `--source` option does not exist yet. Currently, we can only load from a single source.

ludovicm67 · 2023-04-27T11:01:20Z

But once it's added, would my example work?

tpluscode · 2023-04-27T11:13:25Z

Yes, that's how I'd see it

tpluscode · 2023-07-20T10:45:04Z

I added an `onError` hook, which could potentially be used for logging and retries (re #94).
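To illustrate how an `onError` hook might serve both logging and retries, here is a minimal sketch around a single async operation (for example, a request made inside a step). Everything here is an assumption for illustration: `withRetries`, the `maxAttempts` option, and the convention that the hook returns `true` to request another attempt are not part of barnard59.

```js
// Hypothetical sketch: an onError-style hook that sees every failure and
// decides whether the operation should be attempted again.
// Convention (assumed): the hook returns true to ask for a retry.
async function withRetries (operation, { onError, maxAttempts = 3 } = {}) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation(attempt)
    } catch (err) {
      // Let the hook log the error and vote on retrying
      const retry = onError ? await onError(err, attempt) : false
      if (!retry || attempt >= maxAttempts) throw err
    }
  }
}
```

A logging-only hook would simply return nothing, in which case the error is rethrown after being reported.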

tpluscode · 2023-11-20T15:10:16Z

To add to this subject, I just had the idea of custom hooks defined by step implementors. For example, the filter operation could offer an optional hook to execute when a chunk has been filtered out.

Here it is in the new syntax:

```turtle
[
  op:base\/filter ( "({ AnzahlRecords }) => Number(AnzahlRecords) > 0"^^code:EcmaScript ) ;
  p:onFiltered [
    code:implementedBy """
      function (chunk) {
        this.logger.info(`Skipping ${chunk.ExportFileName} because it is empty`)
      }
    """^^code:EcmaScript
  ]
]
```
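On the implementor's side, such a step-specific hook could look roughly like the sketch below. It is not the actual `barnard59-base` filter: the `onFiltered` option, and the assumption that the runner would pass the `p:onFiltered` function to the step this way, are hypothetical. The predicate and the hook are both called with the pipeline context as `this`, mirroring the Turtle example.

```js
const { Readable, Transform } = require('node:stream')

// Hypothetical filter step with an optional, step-specific hook.
// `onFiltered` is called with the pipeline context as `this` for every
// chunk the predicate rejects.
function filter (predicate, { onFiltered } = {}) {
  const context = this
  return new Transform({
    objectMode: true,
    transform (chunk, encoding, callback) {
      let keep
      try {
        keep = predicate.call(context, chunk)
        if (!keep && onFiltered) onFiltered.call(context, chunk)
      } catch (err) {
        return callback(err)
      }
      // Kept chunks are forwarded; rejected ones are dropped
      return keep ? callback(null, chunk) : callback()
    }
  })
}

// Usage mirroring the Turtle example above
const pipelineContext = { logger: console }
const step = filter.call(pipelineContext,
  ({ AnzahlRecords }) => Number(AnzahlRecords) > 0,
  { onFiltered (chunk) { this.logger.info(`Skipping ${chunk.ExportFileName} because it is empty`) } })

Readable.from([
  { ExportFileName: 'a.csv', AnzahlRecords: '10' },
  { ExportFileName: 'b.csv', AnzahlRecords: '0' }
]).pipe(step).resume()
```

Keeping the hook optional means existing pipelines that use the filter without `p:onFiltered` behave exactly as before.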

tpluscode transferred this issue from zazuko/barnard59-core on Jun 20, 2023

tpluscode mentioned this issue on Jun 21, 2023: Monitoring step zazuko/barnard59-base#28 (closed)

tpluscode added the 🎯 core label on Aug 30, 2023

tpluscode mentioned this issue on Aug 30, 2023: Debugging pipeline steps #25 (open)
