Limit foreign key cross references to the same batch of processed resources #16

Closed
6a6d74 opened this issue Jun 6, 2014 · 15 comments · Fixed by #233

Comments

@6a6d74
Contributor

6a6d74 commented Jun 6, 2014

R-ForeignKeyReferences

AndyS suggests that: "The cross reference between files should be limited to files from one publisher - else they are just web links with no guarantee of whether the target of the link exists which 'foreign key' might imply."

This seems like a sensible recommendation - but needs confirmation from the group.

@iherman
Member

iherman commented Jun 6, 2014

On 06 Jun 2014, at 13:32, Jeremy Tandy notifications@github.com wrote:

> R-ForeignKeyReferences
>
> AndyS suggests that: "The cross reference between files should be limited to files from one publisher - else they are just web links with no guarantee of whether the target of the link exists which 'foreign key' might imply."

I agree

Ivan

> This seems like a sensible recommendation - but needs confirmation from the group.



@yakovsh
Member

yakovsh commented Jun 11, 2014

I agree as well. Limiting this to one publisher would also help prevent security issues similar to cross-site scripting in HTML.

@JeniT

JeniT commented Oct 31, 2014

We discussed this explicitly at the October 2014 F2F at TPAC and agreed to support "loose linking" between tabular data on the web, i.e. to allow references between CSV files without requiring that the link resolves.

My rationale would be that it's very useful from a governance perspective (eg UC-4 where central government dictates a format to local authorities) to enforce that the distributed publishers all point to the same central code list rather than copy it into their own publication space.

I also think that as CSV is a data format there isn't a security issue here, just as there isn't an issue with an HTML file linking to another HTML file. Or we could make reference to encapsulation of the processing, as there is with iframes.

@iherman
Member

iherman commented Nov 1, 2014

To be honest, I do not understand the issue. E.g., what do you mean by "single publisher"?

Ivan


gkellogg added a commit that referenced this issue Nov 28, 2014
@JeniT

JeniT commented Nov 30, 2014

@afs comment?

@afs
Contributor

afs commented Dec 1, 2014

Loose links (AKA generate a URL, no other contract) are essential.

I have been taking "ForeignKey" to mean a URL plus a guarantee that the target exists, at least within the conversion. It can only be meaningful when a set of files is converted together, e.g. "Generate link from cell X to cell Y if cell Y defined". This gives better robustness and conversion checking as well as better data. For example, generating links that point to items known to be missing just moves the pain on to the data consumer. A missing target might be a conversion error, and so is good to trap early.

That does not require web resolution - it's only a local check. There is no security issue that I can see.

@iherman
Member

iherman commented Dec 1, 2014

On 01 Dec 2014, at 11:07, Andy Seaborne notifications@github.com wrote:

> Loose links (AKA generate a URL, no other contract) is essential.
>
> I have been taking "ForeignKey" to mean URL and a guarantee that the target exists at least within the conversion. It can only be meaningful when a set of files converted together. e.g. "Generate link from cell X to cell Y if cell Y defined". This gives better robustness and conversion checking as well as better data. For example, having links generated that point to known missing items is just moving the pain on to the data consumer. It might be a conversion error and so good to trap early.
>
> That does not require web resolution - it's only a local check. There is no security issue that I can see.

I am not sure I understand the remark, mainly in light of your last sentence. Do you mean that a converter should check the existence of the CSV files that are referred to through foreignKey before generating the JSON key/value pairs or the RDF triples? If so, why does that not require web resolution?

Ivan



@afs
Contributor

afs commented Dec 1, 2014

It does not require web resolution because it's all in the same batch of files converted together. In fact, it can't be a web resolution - the target may not have been published yet. It's more like keeping a local map and doing some checking.
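
The "local map" idea described above might be sketched roughly as follows. This is only an illustration under stated assumptions, not something specified in this thread: the batch is assumed to be already parsed into memory, and the file names, column names, and the helpers `build_key_map` and `find_dangling` are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

Row = Dict[str, str]
Batch = Dict[str, List[Row]]  # hypothetical: file name -> parsed rows


def build_key_map(batch: Batch, key_columns: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each listed file in the batch to the set of key values it defines."""
    return {
        name: {row[key_columns[name]] for row in batch[name]}
        for name in key_columns
    }


def find_dangling(
    batch: Batch,
    source: str,
    column: str,
    target: str,
    key_map: Dict[str, Set[str]],
) -> List[Tuple[int, str]]:
    """Return (row number, value) for references with no target in the batch."""
    known = key_map.get(target, set())
    return [
        (number, row[column])
        for number, row in enumerate(batch[source], start=1)
        if row[column] not in known
    ]


# Hypothetical batch of two files converted together; nothing is fetched from the web.
batch: Batch = {
    "countries.csv": [{"code": "FR"}, {"code": "GB"}],
    "cities.csv": [
        {"city": "Paris", "country": "FR"},
        {"city": "Atlantis", "country": "XX"},
    ],
}
key_map = build_key_map(batch, {"countries.csv": "code"})
dangling = find_dangling(batch, "cities.csv", "country", "countries.csv", key_map)
print(dangling)
```

Because the check only consults the in-memory map, it works even when the target file has not been published yet, which is the point made in the comment above.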

@iherman
Member

iherman commented Dec 1, 2014

I am sorry Andy, I still do not understand. The TPAC resolution (see Jeni's comment) is that we can use loose URLs, and that is what foreign keys are used for. You seem to say that we have two different concepts: restricted foreign keys that carry some sort of guarantee (although I am not sure how that can be enforced) and loose foreign keys. Is that what you mean? How would the processing of these two differ?

@afs
Contributor

afs commented Dec 1, 2014

Yes - strong, related conversion is a different use case to the one Jeni presents.

A loose link is not a foreign key as in R-ForeignKeyReferences. The term "foreign key" comes with a lot of baggage from the database world - it is not simply a link. I don't know what you mean by "restricted foreign key"; what is being restricted?

My suggestion was that foreign key, with its additional constraint of target existence, be restricted to files that are published together, hence the constraint can be checked/enforced.

@iherman
Member

iherman commented Dec 1, 2014

On 01 Dec 2014, at 12:31, Andy Seaborne notifications@github.com wrote:

> Yes - strong, related conversion is a different use case to the one Jeni presents.
>
> A loose link is not a foreign key as in R-ForeignKeyReferences. The term "foreign key" comes with a lot of baggage from the database world - it is not simply a link. I don't know what you mean by "restricted foreign key"; what is being restricted?

'Restricted' in the sense of what you proposed, i.e., that the tables should come from the same publisher.

> My suggestion was that foreign key, with its additional constraint of target existence, be restricted to files that are published together, hence the constraint can be checked/enforced.

Hm. OK, that is clear, thanks. But I am also not sure what 'from the same publisher' means. Does it mean, operationally, that the CSV URIs are on the same domain? If not, what else?

Is this restriction required by our use cases?

I personally do not see the necessity for this restriction and differentiation, to be honest. Maybe the term 'foreignKey' should indeed be avoided, because of its connotation in the database world. However, the ability to link to another table, simply through a URI and without any further checks, seems useful.

Ivan



@JeniT JeniT changed the title from "Limit foreign key x-refs to files from a single publisher" to "Limit foreign key cross references to the same batch of processed resources" Feb 4, 2015
@JeniT

JeniT commented Feb 4, 2015

There might be some differentiation we can make here between valueUrl which provides a loose (unchecked) link, and foreign keys, which provide some guarantees through building maps from the resources about items identified in the data. This needs working through (suggest at the F2F).
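
To illustrate the "loose (unchecked) link" side of this distinction, here is a minimal sketch of generating a URL from a cell value with a `valueUrl`-style template. It is an assumption-laden illustration, not the group's algorithm: it handles only the simplest `{name}` form of a URI template, and the template, the `example.org` URL, the column names, and the `expand` helper are hypothetical.

```python
import re
from typing import Mapping
from urllib.parse import quote

# Matches only simple {name} expressions; real URI templates allow much more.
_VARIABLE = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand(template: str, row: Mapping[str, str]) -> str:
    """Substitute percent-encoded cell values into a simple URL template."""
    return _VARIABLE.sub(lambda m: quote(row[m.group(1)], safe=""), template)


# Hypothetical row and template; no check is made that the target exists.
row = {"city": "Auckland", "country": "New Zealand"}
link = expand("http://example.org/country/{country}", row)
print(link)
```

Nothing here consults another table or the web, so the resulting link carries no guarantee about its target, which is what separates it from a checked foreign key.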

@gkellogg
Member

gkellogg commented Feb 4, 2015

In my implementation, I changed the roles example to use variations on aboutUrl, propertyUrl and valueUrl to create an actual semantic mapping between these tables:

@JeniT

JeniT commented Feb 13, 2015

Discussed at Feb F2F. Foreign key references are for validation purposes (there's an error if the referenced value doesn't exist, and it can only be used to link resources in a single table group). URL generation (eg aboutUrl, propertyUrl, valueUrl) provides for unvalidated (weak) links.

Editor action: Make the above clear, and state that validators should give the user the option of either strict or lax validation. Converters should provide the option not to do validation at all.
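
The user options described above could look something like the following sketch. The thread does not define what "strict" and "lax" mean in detail, so this assumes strict validation raises an error on a missing target and lax validation only reports a warning; the `Validation` enum, `ForeignKeyError`, and `check_references` are hypothetical names.

```python
from enum import Enum
from typing import Iterable, List, Set


class Validation(Enum):
    STRICT = "strict"  # assumption: a missing target is an error
    LAX = "lax"        # assumption: a missing target is only a warning
    NONE = "none"      # converter option: skip validation entirely


class ForeignKeyError(ValueError):
    """Raised in strict mode when a referenced value does not exist."""


def check_references(
    references: Iterable[str], defined: Set[str], mode: Validation
) -> List[str]:
    """Check references against the values defined within one table group."""
    if mode is Validation.NONE:
        return []
    missing = [ref for ref in references if ref not in defined]
    if missing and mode is Validation.STRICT:
        raise ForeignKeyError(f"missing foreign key targets: {missing}")
    return [f"unresolved foreign key reference: {ref}" for ref in missing]


defined = {"FR", "GB"}  # hypothetical key values from the referenced table
warnings = check_references(["FR", "XX"], defined, Validation.LAX)
print(warnings)
```

Keeping the mode as an explicit argument lets the same check serve both a validator (strict or lax) and a converter that opts out of validation.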

@danbri
Contributor

danbri commented Feb 13, 2015

gkellogg added a commit that referenced this issue Feb 15, 2015
…added `_column`, `_sourceRow`, and `_sourceColumn` as other variables available when expanding a `URI template property`.

This partially addresses issue #16.
@JeniT JeniT self-assigned this Feb 17, 2015
JeniT pushed a commit that referenced this issue Feb 17, 2015
6a6d74 added a commit to 6a6d74/csvw that referenced this issue Feb 19, 2015
Discussed at Feb F2F. Foreign key references are for validation
purposes (there's an error if the referenced value doesn't exist, and
it can only be used to link resources in a single table group). URL
generation (eg `aboutUrl`, `propertyUrl`, `valueUrl`) provides for
unvalidated (weak) links.