set sh:minCount 1 for sh:nodeKind to prevent mixed nodeKinds #109

Closed
Rdataflow opened this issue Oct 19, 2023 · 8 comments

Comments

@Rdataflow
Contributor

See @l00mi's remark on preventing mixed nodeKinds:
https://zulip.zazuko.com/#narrow/stream/40-bafu-ext/topic/foag.3A.20filtering.20dates/near/370780

@giacomociti
Contributor

We should consider using the shapes for validating shapes graphs (shacl-shacl) mentioned in the SHACL spec, which include:

sh:property [
  sh:path sh:nodeKind ;
  sh:in ( sh:BlankNode sh:IRI sh:Literal sh:BlankNodeOrIRI sh:BlankNodeOrLiteral sh:IRIOrLiteral ) ;	# nodeKind-in
  sh:maxCount 1 ;                 # nodeKind-maxCount
] ;

Among others, there are shapes to validate lists (and in cube.link we have something very similar).

These shacl-shacl shapes are available here

@t0b3

t0b3 commented Oct 20, 2023

@giacomociti yes of course with shacl-shacl (as it's done already in the cube-constraint-constraint.ttl) 👍

based on your snippet we need to consider:

  • omit the mixed ones, as those are exactly what we need to prevent, thus sh:in ( sh:BlankNode sh:IRI sh:Literal ) ;
  • add sh:minCount 1 ; to enforce the presence of nodeKind, which in turn enforces the (non-mixed) nodeKind of the observations.
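
Put together, the two adjustments above could look roughly like the following. This is only a sketch: the `ex:` namespace and the shape name `ex:NodeKindShape` are hypothetical, and targeting every object of `sh:property` is an assumption made to keep the example self-contained, not necessarily how the cube.link validation shapes are wired up.

```turtle
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix ex: <http://example.org/> .   # hypothetical namespace

# Hypothetical shape: checks every property shape referenced via sh:property
ex:NodeKindShape
    a sh:NodeShape ;
    sh:targetObjectsOf sh:property ;   # assumption: validate all property shapes
    sh:property [
        sh:path sh:nodeKind ;
        sh:in ( sh:BlankNode sh:IRI sh:Literal ) ;   # mixed kinds omitted
        sh:minCount 1 ;   # nodeKind must be present
        sh:maxCount 1 ;   # and stated only once
    ] .
```

With this in place, a property shape declaring `sh:nodeKind sh:IRIOrLiteral`, or declaring no `sh:nodeKind` at all, would be reported as a violation.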

@giacomociti
Contributor

so I understand our requirement is a little stronger than the basic consistency constraint provided by shacl-shacl (maybe I commented on the wrong issue because the constraints asked for in #105 instead are directly covered by shacl-shacl).

Currently, we have a constraint on sh:nodeKind within a sh:or condition: we require either a node kind or a data type (or multiple data types within another sh:or).

Maybe we could be even more precise and require either a literal node kind with some data type or an IRI node kind:

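    sh:property [
        sh:message "sh:nodeKind needs to be either sh:IRI or sh:Literal with some sh:datatype" ;
        sh:or (
            [
                sh:path sh:nodeKind ;
                sh:hasValue sh:IRI ;
            ]
            [
                sh:and (
                    [
                        sh:path sh:nodeKind ;
                        sh:hasValue sh:Literal ;
                    ]
                    # list members must be shapes, so sh:node is wrapped in a blank node;
                    # <datatype> is a placeholder for the data type shape
                    [ sh:node <datatype> ]
                )
            ]
        ) ;
    ] ;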

where the <datatype> shape requires some data type (possibly more than one in sh:or)
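
One way the `<datatype>` shape could be written is sketched below. The name `ex:DatatypeShape` and the `ex:` namespace are hypothetical, and the second alternative assumes that multiple data types are listed as members of an `sh:or` list directly on the property shape; the actual cube.link shapes may express this differently.

```turtle
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ex:  <http://example.org/> .   # hypothetical namespace

# Hypothetical stand-in for the <datatype> placeholder
ex:DatatypeShape
    a sh:NodeShape ;
    sh:or (
        # either a single data type stated directly on the property shape
        [
            sh:path sh:datatype ;
            sh:minCount 1 ;
            sh:maxCount 1 ;
            sh:nodeKind sh:IRI ;
        ]
        # or at least one member of an sh:or list carries a data type
        [
            sh:path ( sh:or [ sh:zeroOrMorePath rdf:rest ] rdf:first sh:datatype ) ;
            sh:minCount 1 ;
        ]
    ) .
```

The sequence path in the second alternative walks the RDF list behind `sh:or` (any number of `rdf:rest` steps, then `rdf:first`) and reads `sh:datatype` from each member, similar to how shacl-shacl traverses list members.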

@Rdataflow
Contributor Author

@giacomociti yes this approach looks promising 👍
not sure on the details: would we need minCount etc.? you have the expertise in this field and you got the main point we need to ensure 😄

nb: I just looked at shsh. in addition to our specific needs (above) this could serve to ensure the whole set of generic shacl conformity. thus we may benefit by adding this to the list of validations to check our cube:Constraints right in here https://cube.link/#the-integrity-of-the-constraints

@tpluscode
Contributor

> in addition to our specific needs (above) this could serve to ensure the whole set of generic shacl conformity

We discussed this and I see two sides to it. Cube Creator would not necessarily need to use shacl-shacl, because the core of the cube shapes is generated by a reusable pipeline step. Thus, the step's code should be tested to ensure it always produces valid SHACL.

On the other hand, data producers who do not use Cube Creator would benefit from a profile which includes shsh, or an explicit validation provided by a CLI (re zazuko/barnard59#187) to check their shapes against shsh in addition to cube.link rules.

@Rdataflow
Contributor Author

Rdataflow commented Oct 20, 2023

@tpluscode sure if you already thoroughly tested cc code to fulfill shsh then there will be no need to test this twice 💯

@tpluscode
Contributor

More tests never hurt anyone 😎

@Rdataflow
Contributor Author

Closed by 7e710e3 which requires nodeKind ( sh:IRI sh:Literal )
