Step hooks #111
For starters, I propose to allow each step to be extended with at least two extension points:
The first one already has an issue: #71. The second could be defined in a similar fashion, by adding function(s) to a step. Those functions would be called for every chunk, similarly to a
Here's a step which produces the final set of triples. The
<#serialize>
  a :Step ;
  code:implementedBy [
    a code:EcmaScriptModule ;
    code:link <node:barnard59-formats/ntriples.js#serialize> ;
  ] ;
  :hook [
    code:implementedBy [
      a code:EcmaScriptModule ;
      code:link <file:monitoring.js#counter> ;
    ] ;
  ] ;
.
// monitoring.js
// code:arguments can be declared in the pipeline definition
export function counter(...args) {
  // the pipeline context can be accessed as in steps
  const { logger } = this
  let total = 0
  // a map of hooks is returned to allow them to access shared state
  // and we might add more hooks in the future
  // all hooks are optional
  return {
    data(quad) {
      total++
    },
    flush() {
      logger.info(`Total quads processed: ${total}`)
    }
  }
}
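To make the proposal concrete, here is a minimal sketch of how a runner might call such a hook factory and pass chunks through it. It is only an illustration under assumptions: `tapWithHook` is a hypothetical helper, not a barnard59 API, and chunks are modelled as a plain iterable rather than a Node stream. The pipeline context is bound to `this`, as for steps.

```javascript
// Hypothetical helper, not part of barnard59: calls a hook factory and
// lets every chunk of a step pass through its optional hooks.
function* tapWithHook(chunks, hookFactory, context, args = []) {
  // bind the pipeline context to `this`, the same way steps receive it
  const hooks = hookFactory.apply(context, args) || {}
  for (const chunk of chunks) {
    // all hooks are optional, so check before calling
    if (typeof hooks.data === 'function') hooks.data(chunk)
    yield chunk // chunks pass through unchanged
  }
  if (typeof hooks.flush === 'function') hooks.flush()
}

// Same shape as the `counter` hook from monitoring.js
function counter() {
  const { logger } = this
  let total = 0
  return {
    data() { total++ },
    flush() { logger.info(`Total quads processed: ${total}`) }
  }
}

// Stand-in logger that records messages instead of printing them
const messages = []
const context = { logger: { info: (msg) => messages.push(msg) } }

// Placeholder strings stand in for quads
const output = [...tapWithHook(['q1', 'q2', 'q3'], counter, context)]
```

Because the factory returns a map of closures, `data` and `flush` share the `total` counter without any state leaking into the step itself.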
I think I understand the spirit of the hooks. But wouldn't it be better if they were implicit (that is, not added to the code)? Ideally, one could observe any node of an arbitrary running pipeline.
Not sure I understand, @cristianvasquez. It makes sense to attach observability to every single step, if that's what you are proposing. You'd make a choice to count at specific points. For example, in museum we process API objects and convert them to RDF, so we might count the number of input objects and then the final number of quads. And how would you do that without code? Even monitoring could be implemented in multiple ways, such as otel.
I like the idea of having this kind of hook, because we can do what we want with it, and it will help a lot with debugging: printing quads, generating metrics, and so on.
Yes, there can be many different implementations, each with its own design choices. My point was about whether or not to add more triples to the pipeline steps, especially if those triples are not part of the business logic.
Choice: If one has a UI where one can click nodes, a virtual hook is added to show information in a box. Example UI idea in VSCode: https://hackmd.io/@KhLoxKJzSyWQgXHIpI2qdQ/BycIGrTv5 (outdated)
Choice: If one does not use a UI, one can imagine having a turtle file where one adds the triples for the hook by hand, and then runs the pipeline each time from the command line.
Less is better :)
I like the idea of having something that we can extend the way we want, for the steps we want. With what @tpluscode suggested, it's easy to implement any logic we want; we are not stuck with a limited feature set. Always be explicit to avoid confusion, and make sure that we directly know what is happening. Maybe providing a default collection of hooks would be nice (printing quads that match a pattern, generating metrics, …).
Perhaps they can be run through the CLI
But how do you handle custom logic (debug logs, metrics export, …) for a specific step that way?
The Barnard runner reads the turtle file and composes the pipeline in memory. At that moment it might also add the hooks @tpluscode mentions, as 'dummy' hooks at the start. At runtime, logic can be injected into such hooks to do the counting, logging, etc. I used to debug that way when I started using the barnard pipelines...
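A rough sketch of that idea, with every name hypothetical (none of this is barnard59 API): the runner gives each step a slot of no-op hooks when it composes the pipeline, and real logic replaces a no-op later. The step identifiers `#parse` and `#serialize` are placeholders.

```javascript
// Hypothetical sketch: none of these names exist in barnard59.
const noop = () => {}

// While composing the pipeline, give every step a slot of 'dummy' hooks
function createHookSlots(stepIds) {
  const slots = new Map()
  for (const id of stepIds) {
    slots.set(id, { data: noop, flush: noop })
  }
  return slots
}

// At runtime, replace the dummies of one step with real logic
function inject(slots, stepId, hooks) {
  if (!slots.has(stepId)) throw new Error(`Unknown step: ${stepId}`)
  Object.assign(slots.get(stepId), hooks)
}

const slots = createHookSlots(['#parse', '#serialize'])

let total = 0
inject(slots, '#serialize', { data: () => { total++ } })

// The runner always calls the slot; steps without injected logic stay no-ops
for (const chunk of ['q1', 'q2']) {
  slots.get('#parse').data(chunk)
  slots.get('#serialize').data(chunk)
}
```

The pipeline definition stays free of monitoring triples in this variant; the trade-off is that the choice of which steps to observe lives outside the turtle file.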
So is there one such pipeline file, or several? For a moment I thought that you had in mind proposal #93, where you could have multiple sources and use the CLI to combine them.
The second would extend the main graph with hooks at the desired steps.
That can also be an option! So, as an example, this would be:
# main.ttl
<#serialize>
  a :Step ;
  code:implementedBy [
    a code:EcmaScriptModule ;
    code:link <node:barnard59-formats/ntriples.js#serialize> ;
  ] ;
.
# otel.ttl
<#serialize>
  :hook [
    code:implementedBy [
      a code:EcmaScriptModule ;
      code:link <file:monitoring.js#counter> ;
    ] ;
  ] ;
.
and they would be run using:
barnard59 run --source main.ttl --source otel.ttl
Is that it? If so, this is also OK for me. Is the example valid? If not, can you correct me?
The |
But once it's added, would my example work?
Yes, that's how I'd see it.
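The reason the two-file example could work is that merging RDF graphs is a set union of their triples, so the `:hook` triple from otel.ttl lands on the same `<#serialize>` node as the step definition. The sketch below illustrates this with plain arrays instead of a real RDF library. It assumes both files are parsed against the same base IRI (otherwise `<#serialize>` would resolve to two different nodes), and the blank node labels are made up for the illustration.

```javascript
// Triples as [subject, predicate, object]; names abbreviated for the sketch.
// Assumption: both files share one base IRI, so <#serialize> is the same node.
const main = [
  ['#serialize', 'rdf:type', ':Step'],
  ['#serialize', 'code:implementedBy', '_:mainImpl']
]
const otel = [
  ['#serialize', ':hook', '_:hook'],
  ['_:hook', 'code:implementedBy', '_:hookImpl'],
  ['_:hookImpl', 'code:link', 'file:monitoring.js#counter']
]

// Merging graphs is a set union of their triples (duplicates collapse)
function mergeGraphs(...graphs) {
  const seen = new Map()
  for (const triple of graphs.flat()) {
    seen.set(JSON.stringify(triple), triple)
  }
  return [...seen.values()]
}

// Follow :hook -> code:implementedBy -> code:link for a given step
function hookLinks(graph, step) {
  const objects = (s, p) =>
    graph.filter(([gs, gp]) => gs === s && gp === p).map(([, , o]) => o)
  return objects(step, ':hook')
    .flatMap((hook) => objects(hook, 'code:implementedBy'))
    .flatMap((impl) => objects(impl, 'code:link'))
}

const merged = mergeGraphs(main, otel)
const links = hookLinks(merged, '#serialize')
```

Here `links` contains the hook implementation only after the merge; `main` alone has no hooks, which matches the idea of keeping monitoring out of the main pipeline file.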
I added an |
To add to this subject, I just had this idea of custom hooks to be defined by step implementors. For example, the filter operation could offer an optional hook to execute when a chunk has been filtered out.
Here in the new syntax:
[
  op:base\/filter ( "({ AnzahlRecords }) => Number(AnzahlRecords) > 0"^^code:EcmaScript ) ;
  p:onFiltered [
    code:implementedBy """
      function (chunk) {
        this.logger.info(`Skipping ${chunk.ExportFileName} because it is empty`)
      }
    """^^code:EcmaScript
  ]
]
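On the implementor's side, such a step-specific hook might look like the sketch below. This is not the barnard59-base filter: `filterWithHook` and its options are hypothetical, chunks are modelled as a plain iterable, and the sample file names are invented. The hook is called with the pipeline context as `this`, matching the snippet above.

```javascript
// Hypothetical filter with an optional onFiltered hook;
// not the actual barnard59-base implementation.
function* filterWithHook(chunks, predicate, { onFiltered, context } = {}) {
  for (const chunk of chunks) {
    if (predicate(chunk)) {
      yield chunk
    } else if (typeof onFiltered === 'function') {
      // the hook only sees chunks that the predicate rejected
      onFiltered.call(context, chunk)
    }
  }
}

// Stand-in logger that records messages instead of printing them
const logged = []
const filterContext = { logger: { info: (msg) => logged.push(msg) } }

// Invented sample rows for illustration
const rows = [
  { ExportFileName: 'a.csv', AnzahlRecords: '3' },
  { ExportFileName: 'b.csv', AnzahlRecords: '0' }
]

const kept = [...filterWithHook(
  rows,
  ({ AnzahlRecords }) => Number(AnzahlRecords) > 0,
  {
    context: filterContext,
    onFiltered: function (chunk) {
      this.logger.info(`Skipping ${chunk.ExportFileName} because it is empty`)
    }
  }
)]
```

Since the hook is optional, a pipeline that does not declare `p:onFiltered` would behave like a plain filter.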
Originally posted by @tpluscode in #121