Deserializing a property whose datatype is localMedia fails, because the JSON encoding of a datatype is supposedly not allowed to contain capital letters: it has to match a validation regex in DatatypeIdImpl.
This regex does not make much sense to me because it fails even for some datatypes that Wikidata-Toolkit is already aware of, such as commonsMedia.
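To make the failure concrete, here is a minimal sketch. It assumes the validation pattern accepts only lowercase letters and hyphens; the pattern and the class name JsonDatatypeCheck are assumptions for illustration, not copied from DatatypeIdImpl.

```java
import java.util.regex.Pattern;

class JsonDatatypeCheck {
    // ASSUMPTION: a lowercase-and-hyphen pattern, standing in for the real
    // validation regex in WDTK, which may differ.
    static final Pattern ASSUMED_PATTERN = Pattern.compile("^[a-z\\-]+$");

    // Returns true if the whole JSON datatype string matches the assumed pattern.
    static boolean isAccepted(String jsonDatatype) {
        return ASSUMED_PATTERN.matcher(jsonDatatype).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"wikibase-item", "commonsMedia", "localMedia"};
        for (String datatype : samples) {
            System.out.println(datatype + " accepted: " + isAccepted(datatype));
        }
    }
}
```

Under that assumption, any camelCase datatype is rejected, whether it is a built-in one such as commonsMedia or a custom one such as localMedia.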
I see various ways forward:
Add the localMedia datatype to WDTK, just as PR #701 ("700: add edtf DatatypeIdValue") did for EDTF. That solves this exact issue but not the underlying problem: the code will fail again as soon as any other such datatype is introduced. Datatypes are extensible, so we cannot assume that we know all the datatypes in use in Wikibase instances.
Allow capital letters in the regex that validates JSON datatypes. That also works, but it breaks the current assumption that the two transformations between JSON datatypes and datatype URIs are inverses of each other: localMedia (JSON datatype) would be converted to http://wikiba.se/ontology#LocalMedia by DatatypeIdImpl.getDatatypeIriFromJsonDatatype, and that value would then be converted back to local-media (not localMedia) by DatatypeIdImpl.getJsonDatatypeFromDatatypeIri. Because WDTK stores datatypes internally as URIs and not as JSON strings, serializing the property above back to JSON would produce an invalid datatype. I would be tempted to have WDTK store the original JSON datatype as well, to make sure it is preserved.
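The round-trip problem in the second option can be reproduced with a small sketch. The two methods below are hypothetical stand-ins for getDatatypeIriFromJsonDatatype and getJsonDatatypeFromDatatypeIri. They assume a hyphen-to-CamelCase rule in one direction and the reverse rule in the other, which matches the behaviour described above but is not the actual WDTK code.

```java
class DatatypeRoundTrip {
    static final String PREFIX = "http://wikiba.se/ontology#";

    // Hypothetical JSON -> IRI rule: drop hyphens and capitalize the first
    // letter and every letter that follows a hyphen.
    static String toIri(String jsonDatatype) {
        StringBuilder sb = new StringBuilder(PREFIX);
        boolean upperNext = true;
        for (char c : jsonDatatype.toCharArray()) {
            if (c == '-') {
                upperNext = true;
                continue;
            }
            sb.append(upperNext ? Character.toUpperCase(c) : c);
            upperNext = false;
        }
        return sb.toString();
    }

    // Hypothetical IRI -> JSON rule: lowercase everything and put a hyphen
    // before every capital letter except the first.
    static String toJson(String iri) {
        String localName = iri.substring(PREFIX.length());
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < localName.length(); i++) {
            char c = localName.charAt(i);
            if (Character.isUpperCase(c) && i > 0) {
                sb.append('-');
            }
            sb.append(Character.toLowerCase(c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hyphenated names survive the round trip.
        System.out.println(toJson(toIri("wikibase-item"))); // wikibase-item
        // camelCase names do not.
        System.out.println(toIri("localMedia"));            // http://wikiba.se/ontology#LocalMedia
        System.out.println(toJson(toIri("localMedia")));    // local-media
    }
}
```

The IRI alone cannot tell whether the original JSON string was localMedia or local-media, which is why keeping the original string looks like the safer route.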
wetneb added a commit to wetneb/Wikidata-Toolkit that referenced this issue on Aug 9, 2022:
Because the translation between the IRI and JSON datatypes is unreliable, we need
to remember the JSON datatype as well as the IRI, so that we are able to reliably
deserialize and re-serialize properties with custom datatypes.
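As a rough illustration of the idea in this commit message, a datatype identifier could carry both representations side by side. The class DatatypeIdSketch and its accessors are hypothetical names chosen for this sketch and do not reflect the API in the referenced commit.

```java
import java.util.Objects;

// Hypothetical value class: keeps the IRI used internally together with the
// JSON string exactly as it was read, so nothing has to be re-derived.
final class DatatypeIdSketch {
    private final String iri;
    private final String jsonString;

    DatatypeIdSketch(String iri, String jsonString) {
        this.iri = Objects.requireNonNull(iri, "iri");
        this.jsonString = Objects.requireNonNull(jsonString, "jsonString");
    }

    // IRI form, for RDF-oriented consumers.
    String getIri() {
        return iri;
    }

    // Original JSON form, returned unchanged when serializing back to JSON.
    String getJsonString() {
        return jsonString;
    }

    public static void main(String[] args) {
        DatatypeIdSketch id = new DatatypeIdSketch(
                "http://wikiba.se/ontology#LocalMedia", "localMedia");
        System.out.println(id.getJsonString()); // localMedia
    }
}
```

Because the JSON string is stored rather than recomputed from the IRI, a custom datatype such as localMedia is written back exactly as it was read.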
The regex referenced above is defined in Wikidata-Toolkit/wdtk-datamodel/src/main/java/org/wikidata/wdtk/datamodel/implementation/DatatypeIdImpl.java, line 110 at commit e99f196.