Add metadata to specification schema. #448

Closed
jheer opened this issue Nov 24, 2015 · 13 comments
Labels
enhancement (For enhancement of existing features), question (For general questions and clarifications)

Comments

@jheer
Member

jheer commented Nov 24, 2015

(Moving an offline discussion among @jheer, @arvind and @nyurik here for public sharing and archiving.)

Questions:

  • Should Vega/Vega-Lite specifications include additional metadata fields? Such fields might include content-type (Vega or Vega-Lite), expected version numbers, or more.
  • Should such metadata be part of a Vega or Vega-Lite specification directly, or should it be included as part of a shared meta-schema (most likely using the vega-embed schema)? The answer will in part depend on envisioned use cases, but one advantage of the meta-schema approach is that it would also accommodate non-JSON (e.g., pure string) encoding formats, such as the vega-lite shorthand.

Thoughts?

@jheer added the enhancement and question labels Nov 24, 2015
@nyurik
Member

nyurik commented Nov 24, 2015

  • Version clearly belongs within the graph specification, e.g. { "version": 2, ... }, and I think it should be the very first value in all examples and documentation (unenforceable, but a good practice). As for the version number format, I would keep it a simple integer for now: it makes version checks much easier and forces us toward more linear, incremental changes.
  • ContentType has traditionally been stored outside of documents, e.g. in a file extension, MIME type, or Content-Type header. Document format designers usually added identifiers to confirm the format, e.g. the first few bytes of a GIF image or a ZIP file contain a magic value. With the introduction of JSON as a generic content storage format, we could move the content type inside the document, but at the cost of either giving up non-JSON storage or wrapping it. Vega-lite-shorthand (VLS) would then have to be wrapped, e.g. {"contenttype":"vegaliteshorthand", "data": "......"}. I don't know if we should store the VLS version inside the data or outside of it. So I am inclined to vote for the content type being outside.
  • In Wikipedia, I would implement an attribute identifying the graph type. This will allow other, non-Vega graph formats, such as a number of our legacy formats.
<graph type="vegalite"> { ...vegalite...graph...spec... } </graph>
  • For interactive graphs with parameters, we will need to figure out the bindings to other page elements. Not yet sure on this.
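
The wrapping idea from the ContentType bullet could look roughly like the sketch below. Only the `contenttype` and `data` keys come from the example in that bullet; the function names `wrap_spec` and `unwrap_spec` and the `"vega"` / `"vegalite"` type strings are assumptions made for illustration, not an agreed format.

```python
import json

# Content types this sketch recognizes. The exact strings are assumptions.
JSON_TYPES = {"vega", "vegalite"}
STRING_TYPES = {"vegaliteshorthand"}


def wrap_spec(content_type, payload):
    """Wrap a spec (JSON object or shorthand string) in a JSON envelope."""
    if content_type not in JSON_TYPES | STRING_TYPES:
        raise ValueError(f"unknown content type: {content_type}")
    if content_type in JSON_TYPES and not isinstance(payload, dict):
        raise TypeError(f"{content_type} payload must be a JSON object")
    if content_type in STRING_TYPES and not isinstance(payload, str):
        raise TypeError(f"{content_type} payload must be a string")
    return json.dumps({"contenttype": content_type, "data": payload})


def unwrap_spec(envelope):
    """Return (content_type, payload) from a JSON envelope string."""
    doc = json.loads(envelope)
    return doc["contenttype"], doc["data"]
```

The envelope lets a string-only format travel through JSON-only storage, which is the cost being weighed above: every consumer has to unwrap before it can parse.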

@kanitw
Member

kanitw commented Nov 24, 2015

> I would keep it a simple integer for now

We use semantic versioning format for version name, so we probably want to keep that.

Vega-Lite Shorthand

Note that while we plan to release Vega-Lite 1.0 soon, we do not plan to include Vega-Lite Shorthand in the official release. Vega-Lite shorthand is subject to further revision and will be released as a separate project. But once VLS is released, it makes sense to support it in vega-embed.

@jheer
Member Author

jheer commented Nov 24, 2015

Regarding the version number and semantic versioning, I think a "major.minor" format is probably needed. Following semver, the specification schema may add new features as part of a minor version. Non-backwards-compatible changes would require a major version change. I don't see a compelling reason to include patch numbers, which should not change the feature footprint of the specification schema.

So far, it sounds like the consensus is: (1) add "version" as a supported (and strongly encouraged) JSON property of both Vega and Vega-Lite JSON specs, (2) include support for "version" and "content-type" (or similar) within the vega-embed meta-specification. If anyone has counter-proposals, please post!
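
For illustration, the "major.minor" rule described in this comment could be checked along these lines. This is a sketch under semver assumptions (minor versions only add features, major versions break compatibility); the function names are hypothetical and not part of Vega or Vega-Lite.

```python
def parse_major_minor(text):
    """Split a "major.minor" string into two integers."""
    major, minor = text.split(".")  # raises ValueError if not exactly two parts
    return int(major), int(minor)


def can_parse(spec_version, library_version):
    """True if a library at library_version should understand the spec.

    Same major version is required, and the library's minor version must
    be at least the spec's, since later minors may add features.
    """
    spec_major, spec_minor = parse_major_minor(spec_version)
    lib_major, lib_minor = parse_major_minor(library_version)
    return spec_major == lib_major and spec_minor <= lib_minor
```

Under this rule a 2.3 library accepts a 2.1 spec but rejects a 2.4 spec (which may use features the library lacks) and any 1.x spec.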

@arvind
Member

arvind commented Nov 24, 2015

I like the idea of including version numbers as part of Vega/Vega-Lite specifications, and a content-type in vega-embed. However, the expected behavior of these keys is unclear to me. If Vega/Vega-Lite is asked to parse specifications that include a version number greater than the current library, should an error/warning be thrown? If version is included in the vega-embed specification, should vega-embed be responsible for fetching the appropriate version and use that to parse?

> I don't see a compelling reason to include patch numbers, which should not change the feature footprint of the specification schema.

My only concern with this is that for the past couple of releases, we have been including breaking changes with minor version bumps. So our take on semantic versioning, thus far, has been: "major release" = major new features (e.g., declarative interaction design + streaming data) + major breaking changes; "minor release" = minor new features (e.g., new transforms) + minor breaking changes (e.g., renaming bin to bin_start, and now adding the dimension signals); "patch release" = bug fixes.

Moving forward, here are some options I see:

  • We transition to proper semantic versioning. Thus, any breaking change requires bumping the major version. The disadvantage of this approach is that our major versions will likely inflate quite rapidly, which makes messaging major new features more difficult.
  • We maintain our current versioning structure. The version number included in specifications includes the patch version. Vendors can interpret that as the equivalent of the tilde (~) in package.json -- i.e., "version": "2.3.2" means that it will match all 2.3.x versions, but not 2.4.0 onwards.
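
The second option could be sketched as below. It follows the reading given in that bullet, where "2.3.2" matches every 2.3.x release; note that npm's own tilde range is slightly stricter and also requires the patch number to be at least the one written. The function names are hypothetical.

```python
def parse_version(text):
    """Split "major.minor.patch" into a tuple of three integers."""
    parts = text.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected major.minor.patch, got {text!r}")
    return tuple(int(p) for p in parts)


def tilde_matches(spec_version, library_version):
    """True if the library shares the spec's major and minor version."""
    spec = parse_version(spec_version)
    lib = parse_version(library_version)
    # Ignore the patch component: all 2.3.x releases are treated alike.
    return spec[:2] == lib[:2]
```

With this rule, a spec declaring "2.3.2" is accepted by a 2.3.0 or 2.3.9 library and rejected by 2.4.0.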

@nyurik
Member

nyurik commented Nov 24, 2015

I think we should not confuse code versioning with data format versioning. Semantic versioning of code is useful because code is a black box for the caller: as long as the API is the same and it does the same thing for the old way of calling it, the new version will continue to work. With data, there is no API: the parser must know every aspect of the data, or else it will make a mistake. There are very few situations where the major/minor distinction would make a difference for the parsing code. For example, if we add an extra field to the spec, a parser that does not understand it will produce an incorrect graph without throwing an error. The same holds if the meaning of a field changes, or if we allow multiple field types, e.g. "object or string" for a value instead of "string only". The only time a parser might still produce a correct graph without knowing of the change beforehand is if we remove (disallow) a specific field or value, because then the parser would simply not use its code for handling the removed field, and even then it might break. So in short, either the given parser version knows about the specific data version, or it doesn't and should throw an error. Hence, let's simplify and use a number. Besides, in JSON there is no difference between 1 and 1.0, so if we really want to introduce a minor version later on, we could.
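
A strict "known version or error" check of the kind argued for here might look like the following sketch. The set of supported versions and all names are made up for illustration. One practical wrinkle: Python's JSON parser returns 1 as an int and 1.0 as a float, so the sketch normalizes whole-number floats to honor the "no difference between 1 and 1.0" point.

```python
# Hypothetical: the data format versions this parser knows about.
SUPPORTED_VERSIONS = {1, 2}


class UnsupportedVersionError(ValueError):
    """Raised when a spec's version is missing, malformed, or unknown."""


def check_version(spec):
    """Return the spec's integer version, or raise if the parser does not know it."""
    version = spec.get("version")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(version, bool) or not isinstance(version, (int, float)):
        raise UnsupportedVersionError(f"missing or non-numeric version: {version!r}")
    if isinstance(version, float):
        if not version.is_integer():
            raise UnsupportedVersionError(f"version must be a whole number: {version!r}")
        version = int(version)
    if version not in SUPPORTED_VERSIONS:
        raise UnsupportedVersionError(f"unknown spec version: {version}")
    return version
```

There is deliberately no "close enough" branch: a version outside the known set is an error rather than a best-effort parse.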

@kanitw
Member

kanitw commented May 21, 2016

Another option would be to use the standard $schema property of JSON and point it to a schema published online. This would require us to publish schemas for all releases.

@domoritz
Member

domoritz commented Dec 6, 2016

I second @kanitw's suggestion. What are your thoughts on using $schema for vega 3 and vega-lite 2?

Vega

{
  "$schema": "https://vega.github.io/vega/v3.0/schema.json",
  "width": 400,
  "height": 200,
  "padding": 5,

  "data": [
    {
      "name": "table",
      "values": [
        {"u": 1,  "v": 28}, {"u": 2,  "v": 55},
        {"u": 3,  "v": 43}, {"u": 4,  "v": 91},
        {"u": 5,  "v": 81}, {"u": 6,  "v": 53},
        {"u": 7,  "v": 19}, {"u": 8,  "v": 87},
        {"u": 9,  "v": 52}, {"u": 10, "v": 48},
        {"u": 11, "v": 24}, {"u": 12, "v": 49},
        {"u": 13, "v": 87}, {"u": 14, "v": 66},
        {"u": 15, "v": 17}, {"u": 16, "v": 27},
        {"u": 17, "v": 68}, {"u": 18, "v": 16},
        {"u": 19, "v": 49}, {"u": 20, "v": 15}
      ]
    }
  ],

  "scales": [
    {
      "name": "xscale",
      "type": "band",
      "range": "width",
      "domain": {"data": "table", "field": "u"}
    },
    {
      "name": "yscale",
      "type": "linear",
      "range": "height",
      "domain": {"data": "table", "field": "v"},
      "zero": true,
      "nice": true
    }
  ],

  "axes": [
    {"orient": "bottom", "scale": "xscale"},
    {"orient": "left", "scale": "yscale"}
  ],

  "marks": [
    {
      "type": "rect",
      "from": {"data": "table"},
      "encode": {
        "enter": {
          "x": {"scale": "xscale", "field": "u", "offset": 1},
          "width": {"scale": "xscale", "band": 1, "offset": -1},
          "y": {"scale": "yscale", "field": "v"},
          "y2": {"scale": "yscale", "value": 0}
        },
        "update": {
          "fill": {"value": "steelblue"}
        },
        "hover": {
          "fill": {"value": "red"}
        }
      }
    }
  ]
}

Vega-Lite

{
  "$schema": "https://vega.github.io/vega-lite/v2.0/schema.json",
  "description": "A simple bar chart with embedded data.",
  "data": {
    "values": [
      {"a": "A","b": 28}, {"a": "B","b": 55}, {"a": "C","b": 43},
      {"a": "D","b": 91}, {"a": "E","b": 81}, {"a": "F","b": 53},
      {"a": "G","b": 19}, {"a": "H","b": 87}, {"a": "I","b": 52}
    ]
  },
  "mark": "bar",
  "encoding": {
    "x": {"field": "a", "type": "ordinal"},
    "y": {"field": "b", "type": "quantitative"}
  }
}

The exact URLs are not set in stone but I imagine that we could have a registry of versioned schemas that also supports semantic versioning (redirecting from v2.0 -> v2.0.9 or whatever the latest version is).
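
The redirect behaviour imagined for such a registry could be sketched as follows: given a partial version such as v2.0, pick the highest published version that starts with it. The function name and the list of published versions are hypothetical; no such registry API is described in this thread.

```python
def resolve_latest(requested, published):
    """Return the highest published version that starts with `requested`.

    Versions are strings such as "v2.0" or "v2.0.9"; components are
    compared numerically, so "v2.0.10" ranks above "v2.0.9".
    """
    prefix = tuple(int(p) for p in requested.lstrip("v").split("."))
    candidates = []
    for name in published:
        parts = tuple(int(p) for p in name.lstrip("v").split("."))
        if parts[:len(prefix)] == prefix:
            candidates.append((parts, name))
    if not candidates:
        raise LookupError(f"no published schema matches {requested}")
    # Tuples compare element by element, giving numeric version order.
    return max(candidates)[1]
```

For example, `resolve_latest("v2.0", ["v2.0.1", "v2.0.9", "v2.1.0"])` returns `"v2.0.9"`.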

@domoritz
Member

https://github.com/vega/schema has the schemas. We have to set up some infrastructure to automatically copy schema files but we are good to go to use $schema in vega 3 and vega-lite 2.

domoritz added a commit to vega/vega-parser that referenced this issue Jan 15, 2017
domoritz added a commit that referenced this issue Jan 15, 2017
@domoritz domoritz mentioned this issue Jan 15, 2017
domoritz added a commit to vega/vega-lite that referenced this issue Jan 15, 2017
kanitw pushed a commit to vega/vega-lite that referenced this issue Jan 15, 2017
@jheer jheer closed this as completed Jan 15, 2017
@domoritz
Member

I've released a little helper function to get the version and library from a schema at https://www.npmjs.com/package/vega-schema-url-parser.
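
To illustrate the idea (this is not the vega-schema-url-parser package or its API), a schema URL of the shape proposed earlier in this thread could be split into library and version as in the sketch below. The URL pattern is taken from the examples above, which were explicitly described as not final; the function name is hypothetical.

```python
import re

# Matches the tail of URLs shaped like .../<library>/v<version>/schema.json,
# the layout used in the examples earlier in this thread.
SCHEMA_URL = re.compile(
    r"/(?P<library>[a-z-]+)/v(?P<version>\d+(?:\.\d+)*)/schema\.json$"
)


def parse_schema_url(url):
    """Return (library, version) extracted from a $schema URL."""
    match = SCHEMA_URL.search(url)
    if match is None:
        raise ValueError(f"unrecognized schema URL: {url}")
    return match.group("library"), match.group("version")
```

For example, `parse_schema_url("https://vega.github.io/vega-lite/v2.0/schema.json")` returns `("vega-lite", "2.0")`.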

@ericsoco
Contributor

This conversation is long-closed, but...any thoughts on adding a user-defined stub for arbitrary metadata to the spec? E.g. a top-level metadata property. The Vega spec offers description, but this accepts only a string; it would be useful to have an object in the spec to support more metadata.

My use case as one example: I'm converting an existing, proprietary schema to Vega, and there is information in the old schema that the application using these charts needs regardless of how the charts are specified or rendered. Currently, I'm able to use the description field but I can imagine that becoming a limitation fairly soon.
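
As a rough sketch of how an application might use such a property, the helper below separates a top-level `metadata` object from the rest of a spec before rendering. The `metadata` key is the one proposed in this comment and is not confirmed as part of either schema; the function name is hypothetical.

```python
def split_metadata(spec):
    """Return (spec_without_metadata, metadata) without mutating the input."""
    metadata = spec.get("metadata", {})
    if not isinstance(metadata, dict):
        raise TypeError("metadata must be a JSON object")
    # Copy every other top-level key so the renderer sees only the chart spec.
    rest = {key: value for key, value in spec.items() if key != "metadata"}
    return rest, metadata
```

The application keeps the second value for its own bookkeeping and hands the first to the renderer, so the chart is unaffected by whatever the metadata contains.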

@domoritz
Member

@ericsoco This should be an easy addition to the schema. I support the idea but only if we do it in Vega and Vega-Lite. If everyone agrees, could you send us a PR?

@ericsoco
Contributor

Certainly. Let me know when I have the green light; I'll follow this conversation...

@domoritz
Member

@ericsoco #1061

6 participants