Create Visualize validation profile #114

Closed
1 task done
tpluscode opened this issue Oct 31, 2023 · 13 comments · Fixed by #153

Comments

@tpluscode
Contributor

tpluscode commented Oct 31, 2023

@tpluscode
Contributor Author

@bprusinowski @sosiology

Could we have IXT's feedback on what is required by Visualize?

@bprusinowski

bprusinowski commented Jan 18, 2024

Hi @tpluscode, the first set of requirements comes from rdf-cube-view-query library (e.g. things like this).

For the things we keep directly inside our repo, I tried to summarize them below. By the way, dcterms:creator is not a hard requirement for Visualize; we just don't show the creator tag if it's missing. But of course it would be good to enforce this on the data side 👍

Cube

Required properties are listed below; if one's missing the application will throw an error.

  • localized schema:name
  • schema.creativeWorkStatus: ns.adminVocabulary("CreativeWorkStatus/Published") for published cube, otherwise cube is treated as a draft
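The two cube rules above can be sketched as a small check. This is only an illustration, not the Visualize implementation: the `CubeMeta` shape, the `checkCube` function and the expanded `PUBLISHED` IRI are assumptions made for the example.

```typescript
// Hypothetical, simplified view of cube metadata (not the Visualize data model).
interface CubeMeta {
  // schema:name values keyed by language tag, e.g. { en: "...", de: "..." }
  names: Record<string, string>;
  // IRI of schema:creativeWorkStatus, if present
  creativeWorkStatus?: string;
}

// Assumed expansion of ns.adminVocabulary("CreativeWorkStatus/Published");
// verify against the vocabulary you actually use.
const PUBLISHED =
  "https://ld.admin.ch/vocabulary/CreativeWorkStatus/Published";

function checkCube(
  cube: CubeMeta,
  locale: string
): { errors: string[]; isDraft: boolean } {
  const errors: string[] = [];
  // A localized schema:name is required.
  if (!cube.names[locale]) {
    errors.push(`missing schema:name for locale "${locale}"`);
  }
  // Anything other than the Published status is treated as a draft.
  const isDraft = cube.creativeWorkStatus !== PUBLISHED;
  return { errors, isDraft };
}
```

For instance, `checkCube({ names: { en: "My cube" } }, "de")` reports one missing-name error and flags the cube as a draft.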

Dimensions

  • Each dimension should have only one dataType
  • dataType can't be equal to ns.rdf.langString
  • For TemporalDimensions (dim.out(ns.cube("meta/dataKind")).out(ns.rdf.type).term === ns.time.GeneralDateTimeDescription) we need either a timeFormat, meaning dimension.datatype is one of the keys of this map:
const timeFormats = new Map<string, string>([
  [ns.xsd.gYear.value, "%Y"],
  [ns.xsd.gYearMonth.value, "%Y-%m"],
  [ns.xsd.date.value, "%Y-%m-%d"],
  [ns.xsd.dateTime.value, "%Y-%m-%dT%H:%M:%S"],
]);

or timeUnit (dim.out(ns.cube("meta/dataKind")).out(ns.time.unitType).term) defined.
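The temporal dimension rule can be expressed as a single predicate. A minimal sketch, assuming plain IRI strings instead of the `ns` namespace builders; `TemporalDimension` and `hasTimeFormatOrUnit` are hypothetical names for illustration.

```typescript
const XSD = "http://www.w3.org/2001/XMLSchema#";

// Same mapping as in the comment above, keyed by full datatype IRI.
const timeFormats = new Map<string, string>([
  [`${XSD}gYear`, "%Y"],
  [`${XSD}gYearMonth`, "%Y-%m"],
  [`${XSD}date`, "%Y-%m-%d"],
  [`${XSD}dateTime`, "%Y-%m-%dT%H:%M:%S"],
]);

// Hypothetical, simplified temporal dimension.
interface TemporalDimension {
  datatype?: string; // sh:datatype IRI
  timeUnit?: string; // time:unitType IRI, e.g. "http://www.w3.org/2006/time#unitYear"
}

// A temporal dimension is usable if its datatype maps to a known format
// or a time unit is defined.
function hasTimeFormatOrUnit(dim: TemporalDimension): boolean {
  const hasFormat =
    dim.datatype !== undefined && timeFormats.has(dim.datatype);
  return hasFormat || dim.timeUnit !== undefined;
}
```

A dimension typed as `xsd:string` without a `timeUnit` would fail this check, while the same dimension with `time:unitYear` would pass.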

I think these are all the "hard requirements" related to data fetching that would make the application crash (except for "Each dimension should have only one dataType": there we just take the first one, which might lead to some problems down the road).

I didn't include recommendations on what "should" be included in a cube / dimension to make Visualize happy. Should I also list these properties that can be missing, but are displayed somewhere if present (e.g. creators, contact points, landing page, etc)?

cc @adintegra @sosiology

@Rdataflow
Contributor

Rdataflow commented Jan 18, 2024

@bprusinowski can you add a few words on how you treat timezones? i.e.

  • 2024-01-18T15:24:07Z
  • 2024-01-18T14:24:07+01:00

At least in the cube's dateModified you have things like "2022-09-21T08:29:55.550Z"^^<http://www.w3.org/2001/XMLSchema#dateTime>

@bprusinowski

@Rdataflow when it comes to this particular property, we pass such date string to a JS Date constructor – it's handled in a way described here.

Other dates, coming from e.g. temporal dimension values, are formatted using https://d3js.org/d3-time-format#timeParse, taking the timezone into account.
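To make the `Date` constructor behaviour concrete, here is a small sketch using the exact strings from the question. It only shows how JavaScript parses ISO 8601 strings with `Z` or an explicit offset; it is not taken from the Visualize code base.

```typescript
// Both strings carry an explicit zone, so parsing does not depend on the
// machine's local timezone.
const utc = new Date("2024-01-18T15:24:07Z");
const withOffset = new Date("2024-01-18T14:24:07+01:00");

// 14:24:07+01:00 is 13:24:07 UTC, so the two instants are two hours apart.
const diffHours = (utc.getTime() - withOffset.getTime()) / 3600000;

// The dateModified literal round-trips through Date unchanged.
const modified = new Date("2022-09-21T08:29:55.550Z");
const roundTripped = modified.toISOString();

console.log(diffHours); // 2
console.log(roundTripped); // "2022-09-21T08:29:55.550Z"
```

Note that the two example timestamps in the question are therefore not the same instant written in two zones.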

@Rdataflow
Contributor

@bprusinowski thank you for sharing this amendment on time and timezones 👍

@Rdataflow
Contributor

Rdataflow commented Jan 18, 2024

@bprusinowski can you explain more about the implicit requirements of visualize: i.e.

and it would be interesting to also share recommendations (those could lead to messages of severity Warning): i.e.

  • if the list has fewer than 5000 values, they should be made available in <dimX> sh:in (...)
  • have <dim> sh:{minInclusive,maxInclusive} "min|max" for all numerical datatypes
  • Nominal Lists are recommended to have schema:identifier for sorting
  • ... and what more

see also #53

@bprusinowski

@Rdataflow I will try to expand on the mentioned points:

  • dimension's nodeKind is not required, but should be there (if ns.sh.nodeKind === ns.sh.Literal dimension is treated as literal dimension, otherwise as namedNode dimension). If nodeKind is absent, we treat dimension as Literal dimension,
  • dimension's scaleType is not required, but should be there so we can properly categorize dimension type (ns.qudt.NominalScale, ns.qudt.OrdinalScale, ns.qudt.RatioScale or ns.qudt.IntervalScale),
  • dimension's sh:in property should be specified to avoid sending additional queries to fetch available dimension values,
  • dimension should have:
    • localized ns.schema.name,
    • localized ns.schema.description,
    • ns.rdf.type === ns.cube.KeyDimension if it's a key dimension,
    • ns.rdf.type === ns.cube.MeasureDimension if it's a measure,
    • optionally have ns.qudt.hasUnit property,
    • ns.sh.order property to sort the dimensions e.g. in table preview,
  • dimension values should have:
    • ns.schema.identifier if dimension's ns.qudt.scaleType is ns.qudt.NominalScale or ns.qudt.OrdinalScale,
    • ns.schema.position if dimension's ns.qudt.scaleType === ns.qudt.OrdinalScale,
    • could have ns.schema.color if dimension's ns.qudt.scaleType is ns.qudt.NominalScale or ns.qudt.OrdinalScale,
    • ns.schema.alternateName if user wants to use abbreviations.
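Part of the list above can be turned into a rough sketch: the nodeKind fallback and the scale-dependent recommendations for dimension values. The names `DimensionMeta`, `isLiteralDimension` and `recommendedValueProperties` are hypothetical, and plain IRI strings stand in for the `ns` builders.

```typescript
const SH = "http://www.w3.org/ns/shacl#";
const QUDT = "http://qudt.org/schema/qudt/";

// Hypothetical, simplified dimension metadata.
interface DimensionMeta {
  nodeKind?: string; // sh:nodeKind IRI
  scaleType?: string; // qudt:scaleType IRI
}

// A missing nodeKind falls back to "literal dimension", as described above.
function isLiteralDimension(dim: DimensionMeta): boolean {
  return dim.nodeKind === undefined || dim.nodeKind === `${SH}Literal`;
}

// Properties that dimension values should carry for a given scale type.
// Optional ones (schema:color, schema:alternateName) are left out.
function recommendedValueProperties(scaleType?: string): string[] {
  const nominal = scaleType === `${QUDT}NominalScale`;
  const ordinal = scaleType === `${QUDT}OrdinalScale`;
  const props: string[] = [];
  if (nominal || ordinal) props.push("schema:identifier");
  if (ordinal) props.push("schema:position");
  return props;
}
```

A validation profile could map each missing recommended property to a Warning rather than a Violation, in line with the severity suggested earlier in the thread.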

I hope this is more helpful 🤞 Generally, our logic to parse things can be found below, maybe this will be easier to follow:

@Rdataflow
Contributor

@bprusinowski how many dcat:landingPage values can a cube have at most?

cc @dbaeder

@bprusinowski

@Rdataflow thanks for raising this point – we currently require either zero or one non-localized string, or more than one localized string (so we can filter them down to one entry per language) – see https://github.com/visualize-admin/visualization-tool/blob/2aa07b96db9a057281bef3cad2b7d08c49c5e1a7/app/rdf/query-cube-metadata.ts#L143-L146
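One way to read that rule is sketched below. This is an interpretation of the comment, not a copy of the linked code; `Literal` and `pickLandingPage` are hypothetical names, and an empty `language` stands for a non-localized string.

```typescript
// Hypothetical literal: language "" means non-localized.
interface Literal {
  value: string;
  language: string;
}

// Returns the landing page for the requested locale, or the single
// non-localized value, or undefined when neither applies.
function pickLandingPage(
  values: Literal[],
  locale: string
): string | undefined {
  // Localized strings: keep the entry matching the locale.
  const localized = values.find((v) => v.language === locale);
  if (localized !== undefined) return localized.value;

  // Otherwise accept exactly one non-localized string.
  const plain = values.filter((v) => v.language === "");
  const only = plain[0];
  return plain.length === 1 && only !== undefined ? only.value : undefined;
}
```

With this reading, two non-localized landing pages are ambiguous and yield `undefined`, which is the case a validation profile would want to flag.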

@kronmar
Contributor

kronmar commented Feb 5, 2024

Found an interesting edge-case: https://s.zazuko.com/3C25xc2

We have here a cube, with a dimension with 13 distinct hierarchies parallel to each other. Using cat cube-constraint.ttl | npx barnard59 cube check-metadata --profile https://cube.link/latest/shape/standalone-constraint-constraint returns no errors.

Nevertheless, Visualize can't handle this case:
[image]

The visualization issue can be fixed rather easily by giving each hierarchy a distinct name. Would it be possible to catch this issue in the standalone-constraint-constraint.ttl?
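The condition to catch could look roughly like the check below. It is a plain TypeScript sketch rather than a SHACL shape, and it assumes the problem is several hierarchies on one dimension sharing the same name; `duplicateNames` is a hypothetical helper.

```typescript
// Returns every name that occurs more than once, in order of first repeat.
function duplicateNames(names: string[]): string[] {
  const seen = new Set<string>();
  const dupes = new Set<string>();
  for (const name of names) {
    if (seen.has(name)) dupes.add(name);
    seen.add(name);
  }
  return Array.from(dupes);
}

// Hypothetical hierarchy names attached to a single dimension.
const clashes = duplicateNames(["Region", "Canton", "Region"]);
console.log(clashes); // ["Region"]
```

An empty result would mean every hierarchy on the dimension has a distinct name.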

cc @Rdataflow @tpluscode

@tpluscode
Contributor Author

@bprusinowski I started the profile for visualize in #141

I applied the rules for sh:datatype and temporal dimensions. The other points may be added as warnings, but we'd need better support for that first.

@tpluscode
Contributor Author

About time:unitType: the spec allows time:unitYear, time:unitMonth, time:unitWeek, time:unitDay, time:unitHour, time:unitMinute and time:unitSecond. Are all of those supported?

@bprusinowski


4 participants