Limit foreign key cross references to the same batch of processed resources #16

6a6d74 · 2014-06-06T17:32:54Z

AndyS suggests that: "The cross reference between files should be limited to files from one publisher - else they are just web links with no guarantee of whether the target of the link exists which 'foreign key' might imply."

This seems like a sensible recommendation - but needs confirmation from the group.

iherman · 2014-06-06T18:24:56Z

On 06 Jun 2014, at 13:32, Jeremy Tandy wrote:

> R-ForeignKeyReferences
>
> AndyS suggests that: "The cross reference between files should be limited to files from one publisher - else they are just web links with no guarantee of whether the target of the link exists which 'foreign key' might imply."

I agree.

Ivan

> This seems like a sensible recommendation - but needs confirmation from the group.

yakovsh · 2014-06-11T12:21:13Z

I agree as well. Limiting this to one publisher would also help prevent security issues similar to cross-site scripting in HTML.

JeniT · 2014-10-31T22:04:11Z

We discussed this explicitly at the October 2014 F2F at TPAC and agreed to support "loose linking" between tabular data on the web, ie have references between CSV files without requiring that the link resolves.

My rationale would be that it's very useful from a governance perspective (eg UC-4 where central government dictates a format to local authorities) to enforce that the distributed publishers all point to the same central code list rather than copy it into their own publication space.

I also think that as CSV is a data format there isn't a security issue here, just as there isn't an issue with an HTML file linking to another HTML file. Or we could make reference to encapsulation of the processing, as there is with iframes.

iherman · 2014-11-01T00:26:17Z

To be honest, I do not understand the issue. E.g., what do you mean by "single publisher"?

Ivan

JeniT · 2014-11-30T16:31:30Z

@afs comment?

afs · 2014-12-01T10:07:06Z

Loose links (AKA generate a URL, no other contract) are essential.

I have been taking "ForeignKey" to mean a URL and a guarantee that the target exists at least within the conversion. It can only be meaningful when a set of files is converted together, e.g. "Generate link from cell X to cell Y if cell Y defined". This gives better robustness and conversion checking as well as better data. For example, having links generated that point to known missing items is just moving the pain on to the data consumer. It might be a conversion error and so good to trap early.

That does not require web resolution - it's only a local check. There is no security issue that I can see.

iherman · 2014-12-01T10:21:03Z

On 01 Dec 2014, at 11:07, Andy Seaborne wrote:

> Loose links (AKA generate a URL, no other contract) are essential.
>
> I have been taking "ForeignKey" to mean a URL and a guarantee that the target exists at least within the conversion. It can only be meaningful when a set of files is converted together, e.g. "Generate link from cell X to cell Y if cell Y defined". This gives better robustness and conversion checking as well as better data. For example, having links generated that point to known missing items is just moving the pain on to the data consumer. It might be a conversion error and so good to trap early.
>
> That does not require web resolution - it's only a local check. There is no security issue that I can see.

I am not sure I understand the remark, mainly in light of your last sentence. Do you mean that a converter should check the existence of the CSV files that are referred to through foreignKey before generating the JSON k/v pairs or the RDF triples? If so, why does that not require web resolution?

Ivan

afs · 2014-12-01T10:54:08Z

It does not require web resolution because it's all in the same batch of files converted together. In fact, it can't be a web resolution - the target may not have been published yet. It's more like keeping a local map and doing some checking.
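
A minimal sketch of the "local map" idea, assuming the batch is held in memory as a dictionary of file name to rows. The function name `find_dangling_references`, the file names and the column names are all hypothetical and are not taken from any specification or implementation.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]


def find_dangling_references(
    batch: Dict[str, List[Row]],
    source: str,
    source_column: str,
    target: str,
    target_column: str,
) -> List[Tuple[int, str]]:
    """Return (row_number, value) for each source cell with no matching target cell."""
    # The "local map": every key value the target table defines within this batch.
    known = {row[target_column] for row in batch[target]}
    return [
        (number, row[source_column])
        for number, row in enumerate(batch[source], start=1)
        if row[source_column] not in known
    ]


# Hypothetical batch: two files converted together, nothing fetched from the web.
batch = {
    "countries.csv": [{"code": "AD"}, {"code": "AE"}],
    "slice.csv": [{"country": "AD"}, {"country": "ZZ"}],
}
dangling = find_dangling_references(
    batch, "slice.csv", "country", "countries.csv", "code"
)
print(dangling)  # [(2, 'ZZ')]
```

The check only consults tables that are already part of the batch, so it works even when the target has not been published yet.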

iherman · 2014-12-01T11:09:40Z

I am sorry Andy, I still do not understand. The TPAC resolution (see Jeni's comment) is that we can use loose URLs and that is what foreign keys are used for. You seem to say that we have two different concepts: restricted foreign keys that have some sort of guarantee (although I am not sure how that can be enforced) and loose foreign keys. Is that what you mean? How would the processing of these two differ?

afs · 2014-12-01T11:31:51Z

Yes - strong, related conversion is a different use case to the one Jeni presents.

A loose link is not a foreign key as in R-ForeignKeyReferences. The term "foreign key" comes with a lot of baggage from the database world - it is not simply a link. I don't know what you mean by "restricted foreign key"; what is being restricted?

My suggestion was that foreign key, with its additional constraint of target existence, be restricted to files that are published together, hence the constraint can be checked/enforced.

iherman · 2014-12-01T11:43:19Z

On 01 Dec 2014, at 12:31, Andy Seaborne wrote:

> Yes - strong, related conversion is a different use case to the one Jeni presents.
>
> A loose link is not a foreign key as in R-ForeignKeyReferences. The term "foreign key" comes with a lot of baggage from the database world - it is not simply a link. I don't know what you mean by "restricted foreign key"; what is being restricted?

'Restricted' in the sense of what you proposed, i.e., that the tables should come from the same publisher.

> My suggestion was that foreign key, with its additional constraint of target existence, be restricted to files that are published together, hence the constraint can be checked/enforced.

Hm. O.k., that is clear, thanks. But I am also not sure what 'from the same publisher' means. Does it mean, operationally, that the CSV URIs are on the same domain? If not, what else?

Is this restriction required by our use cases?

I personally do not see the necessity for this restriction and differentiation, to be honest. Maybe the term 'foreignKey' should be avoided indeed, because of its connotation in the database world. However, the feature of being able to link to another table, simply through a URI and without any further checks, seems to be useful.

Ivan

JeniT · 2015-02-04T13:21:47Z

There might be some differentiation we can make here between valueUrl which provides a loose (unchecked) link, and foreign keys, which provide some guarantees through building maps from the resources about items identified in the data. This needs working through (suggest at the F2F).

gkellogg · 2015-02-04T16:09:26Z

In my implementation, I changed the roles example to use variations on aboutUrl, propertyUrl and valueUrl to create an actual semantic mapping between these tables:

JeniT · 2015-02-13T10:17:43Z

Discussed at Feb F2F. Foreign key references are for validation purposes (there's an error if the referenced value doesn't exist, and they can only be used to link resources in a single table group). URL generation (eg aboutUrl, propertyUrl, valueUrl) provides for unvalidated (weak) links.

Editor action: Make clear the above and that validators should provide the user option to do either strict or lax validation. Converters should provide the option not to do validation at all.
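
The three options could be sketched roughly as follows. This is an illustration only: it assumes that "strict" means an unresolved reference is a fatal error and "lax" means it is reported as a warning, which the resolution above does not spell out. `check_references`, `ForeignKeyError` and the mode names are hypothetical.

```python
import warnings
from typing import Iterable, List, Set


class ForeignKeyError(ValueError):
    """Hypothetical error for a reference whose target is not in the table group."""


def check_references(
    values: Iterable[str], known: Set[str], mode: str = "strict"
) -> List[str]:
    """Return the values with no target; how they are reported depends on mode."""
    if mode not in ("strict", "lax", "none"):
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "none":
        return []  # converter opted out: nothing is checked
    missing = [value for value in values if value not in known]
    if missing and mode == "strict":
        raise ForeignKeyError(f"unresolved references: {missing}")
    for value in missing:
        warnings.warn(f"unresolved reference: {value}")  # lax: report and carry on
    return missing


known_codes = {"AD", "AE"}
print(check_references(["AD", "ZZ"], known_codes, mode="lax"))   # warns, prints ['ZZ']
print(check_references(["AD", "ZZ"], known_codes, mode="none"))  # []
```

Calling the same function with `mode="strict"` on this input would raise `ForeignKeyError` instead of returning.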

danbri · 2015-02-13T10:18:15Z

See http://www.w3.org/2015/02/13-csvw-irc#T10-18-09

…added `_column`, `_sourceRow`, and `_sourceColumn` as other variables available when expanding a `URI template property`. This partially addresses issue #16.
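
To illustrate how such variables might be used when generating a weak link, here is a minimal sketch of simple `{name}` expansion. It covers only the most basic form of URI template expression, not the full template syntax, and the template, the example URL and the variable values are made up; only the variable names come from the commit message above.

```python
import re
from urllib.parse import quote


def expand(template: str, variables: dict) -> str:
    """Expand simple {name} expressions; undefined variables expand to ''."""

    def substitute(match: "re.Match[str]") -> str:
        value = variables.get(match.group(1))
        # Percent-encode everything outside the unreserved character set.
        return "" if value is None else quote(str(value), safe="")

    return re.sub(r"\{([A-Za-z0-9_]+)\}", substitute, template)


# Hypothetical values for one cell of a table.
variables = {"code": "AD", "_column": 2, "_sourceRow": 2, "_sourceColumn": 2}
url = expand(
    "http://example.org/countries.csv#row={_sourceRow}&col={_sourceColumn}",
    variables,
)
print(url)  # http://example.org/countries.csv#row=2&col=2
```

Nothing here checks that the generated URL resolves, which is what makes the link weak.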

6a6d74 added the Use Case Document label Jun 6, 2014

JeniT added the Metadata vocabulary document label Oct 31, 2014

gkellogg added a commit that referenced this issue Nov 28, 2014

Reference issue #16 from metadata document for cross-references between different publishers.

3e6934f

JeniT added the Requires telcon discussion/decision label Nov 30, 2014

JeniT changed the title ~~Limit foreign key x-refs to files from a single publisher~~ Limit foreign key cross references to the same batch of processed resources Feb 4, 2015

JeniT added Editorial Resolved and removed Use Case Document Requires telcon discussion/decision labels Feb 13, 2015

gkellogg mentioned this issue Feb 15, 2015

added source & source number properties #207

Merged

JeniT self-assigned this Feb 17, 2015

JeniT pushed a commit that referenced this issue Feb 17, 2015

text & examples for strong & weak linking

3178498

fixes #16

JeniT mentioned this issue Feb 17, 2015

text & examples for strong & weak linking #233

Merged

gkellogg closed this as completed in #233 Feb 17, 2015
