Identity of claims #1

Open
sandhawke opened this Issue Aug 22, 2018 · 7 comments

sandhawke commented Aug 22, 2018

How do we model the identity of claims?

For example: should we say that “The Earth is flat” and “The world is flat” are two expressions of the same claim, or should we say those are two different claims that happen to be very similar, even synonymous?

It seems likely there are several approaches which would work both philosophically and technically, so the decision may come down to practical issues related to implementation, deployment, and security.

Proposal 1: Define a Claim as an abstract entity which can be expressed by one or more natural language expressions (each in the form of a logical proposition) which all mean essentially the same thing. In this model, “the Earth is flat” and “the world is flat” are two strings containing natural language propositions which one could then decide express the same abstract Claim.

Proposal 2: Define a Claim as a string containing a logical proposition. In this model, “the Earth is flat” and “the world is flat” are two different claims; one could then decide they are related as synonymous claims.
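To make the contrast concrete, here is a minimal sketch of the two models as plain data structures. The class names (`AbstractClaim`, `StringClaim`) and the identifier URL are illustrative assumptions, not part of either proposal's wording.

```python
from dataclasses import dataclass, field

# Proposal 1 (sketch): a claim is an abstract entity with its own identifier;
# natural-language strings are attached to it as expressions.
@dataclass
class AbstractClaim:
    claim_id: str                          # hypothetical identifier
    expressions: set = field(default_factory=set)

# Proposal 2 (sketch): the claim *is* the string.
@dataclass(frozen=True)
class StringClaim:
    text: str

# Proposal 1: one claim, two expressions.
p1 = AbstractClaim(
    "https://example.org/claim-34554",
    {"the Earth is flat", "the world is flat"},
)

# Proposal 2: two distinct claims, related by a separate synonym statement.
a = StringClaim("the Earth is flat")
b = StringClaim("the world is flat")
synonyms = {frozenset({a, b})}
```

In the first model the decision "these mean the same thing" is baked into the claim record; in the second it is an ordinary statement that can be published, disputed, or ignored separately.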

sandhawke commented Aug 22, 2018

During f2f2 I expressed my inclination toward Proposal-2 but we didn't have time to get into why. I have not implemented either of these strategies beyond a weekend-hack project last year, so I certainly might be wrong.

In general, I think Proposal-2 makes the common cases simpler, while still allowing complexity when necessary. I think Proposal-1 feels a bit more ontologically pure, but we shouldn't let that feeling sway us without working through the consequences. I also suspect non-generalized RDF's poor handling of literals-as-subjects may drive people toward Proposal-1, but I suggest we treat that as a weakness of RDF with feasible workarounds and not impair the high-level design because of a syntactic weakness.

Most importantly, I think Proposal-1 unnecessarily introduces a trust and security issue, in exactly the spot where people are most worried about trust. For example, imagine we have hundreds of scientists weighing in on the flatness of the world. With Proposal-1 they would be publishing something like a machine-readable version of "I refute https://example.org/claim-34554" where everyone is supposed to know that URL means the world is flat. With Proposal-2, they would be publishing something like "I refute the claim 'the world is flat'".

The problem is that with the first approach, in the face of conspiracy theories, state actors, and ordinary web technology problems, I don't think the link between URL and claim will always stand up. Will everyone really know and accept, now and in the future, that https://example.org/claim-34554 means "the world is flat"? I'm a bit dubious. The URL-as-real-world-identifier story has always involved some hand waving, and I'd rather not bring that fuzziness in while trying to build persuasive and reliable chains of reasoning.
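A small sketch of the concern, under the assumption that the URL-to-text mapping lives in some lookup that a third party can change (modelled here as a plain dict called `registry`, which is hypothetical):

```python
# Hypothetical lookup from claim URL to claim text. In practice this would be
# whatever the URL dereferences to, which the statement's author may not control.
registry = {"https://example.org/claim-34554": "the world is flat"}

# Proposal 1 style: the statement points at the claim by URL.
p1_statement = ("scientist-1", "refutes", "https://example.org/claim-34554")

# Proposal 2 style: the statement carries the claim text itself.
p2_statement = ("scientist-1", "refutes", "the world is flat")

def meaning_p1(stmt, registry):
    """Resolve a Proposal 1 statement to what a reader would understand."""
    subject, verb, url = stmt
    return (subject, verb, registry.get(url))

before = meaning_p1(p1_statement, registry)

# Someone changes what the URL resolves to, after the statement was published.
registry["https://example.org/claim-34554"] = "the world is round"

after = meaning_p1(p1_statement, registry)
```

The published Proposal 1 statement is byte-for-byte unchanged, yet `before` and `after` differ. The Proposal 2 statement has no external lookup to tamper with.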

sandhawke commented Aug 22, 2018

In https://lists.w3.org/Archives/Public/public-credibility/2018Aug/0005.html @danbri wrote:

This design [Proposal-1] addresses scenarios including sites like Snopes that want some but not all of their data public, but joinable with info shared through other channels. Also cases such as translation of verbatim and paraphrased claim text without mixing up which of those it is.

It seems to me that's also quite doable with Proposal-2. It's just a question of using a literal as a key instead of a URI when doing the join. That is, the claim strings "the world is flat" and "the world is flat" are just as identical as the URI strings "https://example.org/claim-5023" and "https://example.org/claim-5023". I know that's not the primary way RDF wants to do merging, but using literals as keys is an old concept in RDF, and the bread and butter of regular computing (e.g. SQL joins).
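As an illustration of joining on the claim string, here is a sketch using an in-memory SQLite database. The table names, column names, and sample rows are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two datasets from different (hypothetical) sources, both keyed by claim text.
conn.execute("CREATE TABLE reviews (claim TEXT, reviewer TEXT, rating TEXT)")
conn.execute("CREATE TABLE appearances (claim TEXT, speaker TEXT)")

conn.executemany(
    "INSERT INTO reviews VALUES (?, ?, ?)",
    [("the world is flat", "reviewer-A", "false")],
)
conn.executemany(
    "INSERT INTO appearances VALUES (?, ?)",
    [("the world is flat", "Person-3"), ("the Earth is flat", "Person-5")],
)

# The join key is the literal claim string rather than a URI.
rows = conn.execute(
    """
    SELECT a.speaker, r.reviewer, r.rating
    FROM appearances AS a
    JOIN reviews AS r ON a.claim = r.claim
    ORDER BY a.speaker
    """
).fetchall()
```

Only the exactly matching string joins; "the Earth is flat" stays unmatched until someone supplies a synonym statement, which is the behaviour Proposal-2 describes.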

This does require accurately copying the claim string. I'm not sure if that's easier or harder than accurately copying a URI. In either case, hopefully tools make it pretty easy and help catch errors when they occur.

In terms of translations and paraphrasing, again, that seems quite simple with the strings in Proposal-2. We can say things like:

  • "the world is flat" is a synonym of "the Earth is flat"
  • Person-3 claimed "the world is flat"
  • Person-5 claimed "the Earth is flat"
  • "the world is flat" is in English
  • "le monde est plat" is a translation of "the world is flat"

Some of these might be a little contorted in JSON-LD or other non-generalized RDF syntaxes, but I find their simplicity at this level (or in generalized RDF) quite appealing, and I think they have tolerable JSON-LD versions.
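The statements listed above can be sketched as generalized-RDF-style triples, where a literal string is allowed in the subject position. The predicate names (`synonymOf`, `claimed`, `inLanguage`, `translationOf`) are placeholders chosen for this example, not terms from any published vocabulary.

```python
# Each triple is (subject, predicate, object); claim strings appear directly
# as subjects or objects, with no intermediate identifier.
triples = {
    ("the world is flat", "synonymOf", "the Earth is flat"),
    ("Person-3", "claimed", "the world is flat"),
    ("Person-5", "claimed", "the Earth is flat"),
    ("the world is flat", "inLanguage", "en"),
    ("le monde est plat", "translationOf", "the world is flat"),
}

def match(s=None, p=None, o=None):
    """Return triples matching the given pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )

who_claimed = match(p="claimed")
about_world = match(s="the world is flat")
```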

connieimdialog commented Sep 13, 2018

+1 for Proposal-2

sandhawke commented Sep 13, 2018

@danbri is having 2FA issues and asked me to post this:

I brought this up with @MevanBabakar and Will last week, visiting FullFact, and I believe they were more comfortable with IDs. Amongst other things, strings get fixed and tweaked, and typos, mistakes, and unclear wording get updated. On the other side, backend systems are likely to distinguish claim IDs from language-specific labels, and we may want multi-language labels to be attached to a single ID, along lines that are commonly seen with SKOS.
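A rough sketch of the ID-plus-labels layout described here, loosely in the spirit of SKOS preferred labels per language. This is not FullFact's actual system; the identifier, the label store, and the sample review are all assumptions for illustration.

```python
CLAIM_ID = "https://example.org/claim-5023"  # hypothetical identifier

# One stable ID, with one label per language.
labels = {
    (CLAIM_ID, "en"): "the wrold is flat",   # contains a typo, corrected below
    (CLAIM_ID, "fr"): "le monde est plat",
}

# Reviews reference the ID, not any particular wording.
reviews = [(CLAIM_ID, "reviewer-A", "false")]

# Fixing the typo changes a label but leaves the ID, and the reviews, untouched.
labels[(CLAIM_ID, "en")] = "the world is flat"

def labels_for(claim_id):
    """Collect all language labels attached to one claim ID."""
    return {lang: text for (cid, lang), text in labels.items() if cid == claim_id}

current_labels = labels_for(CLAIM_ID)
```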

sandhawke commented Sep 13, 2018

Oh, I absolutely understand being "more comfortable" with IDs. It lets you change things later, which is very reassuring. The problem is: I suspect it also lets other people change things later.

(And I think you can achieve the same ends with Proposal-2; it just needs a slightly different approach. More immutable-data-store style.)
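The comment does not spell out what an immutable-data-store approach would look like. One possible reading, sketched here purely as an assumption, is an append-only log in which a correction is published as a new claim string plus a `supersedes` record, with keys derived from the text itself. The function names and record shapes are hypothetical.

```python
import hashlib

def claim_key(text: str) -> str:
    """Derive a key from the claim text, so the key cannot drift from the wording."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

log = []  # append-only: entries are added, never edited or removed

def publish(text: str) -> str:
    key = claim_key(text)
    log.append(("claim", key, text))
    return key

def correct(old_text: str, new_text: str) -> str:
    """Record a corrected wording without altering the original entry."""
    old_key = claim_key(old_text)
    new_key = publish(new_text)
    log.append(("supersedes", new_key, old_key))
    return new_key

old = publish("the wrold is flat")
new = correct("the wrold is flat", "the world is flat")
```

Anyone who already referred to the old wording still refers to exactly what they saw, and the correction is itself a visible, attributable statement rather than a silent edit.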

retog commented Sep 25, 2018

1, but without IDs.

If we know from a reputable source that "the earth is flat" is likely true and from another that "the world is flat" means the same as "the earth is flat", we have a good reason to accept the statement "the world is flat" as likely true.

In a decentralised system we have no way of generating IDs in a way that is independent of the wording of the claim. Identity of claims may very well be disputed, but is nevertheless an important factor in establishing the truth value of a statement.

Statements that two expressions make the same claim should be separate from the claim-review, as asserting the identity of claims is independent of evaluating a claim. Yes, one could merge the identity statement and a claim-review into a new claim-review ("'the world is flat' is probably true because 'the world is flat' expresses the same as 'the earth is flat' and it has been shown by Flatfactss.com that 'the earth is flat' is probably true"), but this unnecessarily conflates two different topics. Flat-earthers and others might well agree on the equality of the claims; not requiring claim-reviews for synonymous expressions dramatically reduces the noise and contributes to a more efficient establishment of facts.
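A sketch of keeping identity statements and claim-reviews in separate stores and combining them only at query time. The data shapes, the `source-B` name, and the helper functions are assumptions for illustration; the reviewer name and rating are taken from the example in the comment.

```python
# Claim-reviews: claim text -> list of (reviewer, rating).
reviews = {
    "the earth is flat": [("Flatfactss.com", "probably true")],
}

# Identity statements, published separately: (text_a, text_b, asserted_by).
same_meaning = [
    ("the world is flat", "the earth is flat", "source-B"),
]

def equivalent(text):
    """All wordings reachable from `text` through identity statements."""
    seen = {text}
    frontier = [text]
    while frontier:
        current = frontier.pop()
        for a, b, _source in same_meaning:
            for x, y in ((a, b), (b, a)):   # identity is treated as symmetric
                if x == current and y not in seen:
                    seen.add(y)
                    frontier.append(y)
    return seen

def inherited_reviews(text):
    """Reviews of any equivalent wording, keeping track of which wording was reviewed."""
    return sorted(
        (wording, reviewer, rating)
        for wording in equivalent(text)
        for reviewer, rating in reviews.get(wording, [])
    )

result = inherited_reviews("the world is flat")
```

Because the two stores are independent, a reader who rejects `source-B`'s identity statement can drop it without touching any review, and vice versa.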

connieimdialog commented Sep 26, 2018

  1. i like the recognition that this may be conflating two issues - identity of claim versus evaluation of claim, but then ask: what body or process equates the earth=world in this case?

on the one hand: perhaps there is a small spectrum of accepted equations, say as defined in dictionaries. allowing a range may also help the case of translations where things may not be so exact in terms of cultural associations/interpretations. so to the extent that earth-world (en) monde-terre (fr) erde-welt (de) all can be accepted to be the same, then great, let's not reinvent the work of the dictionaries. (need to test this out in non-European languages)

on the other hand: if there are no IDs and the interpretation/equating process needs to happen anyway, isn't that an aspect of what proposal 2 allows?

  2. i wonder if working through another example could help:

'During the Civil War of the United States, slaves were emancipated by Abraham Lincoln in 1863.'

Contrast this with
'Following the Civil War of the United States, slaves were emancipated in 1865.'

Depending on the location and how you count it, both are true, and both could also be considered not completely true: Lincoln's original declaration had limited effect, and the latter statement represents the beginning of emancipation in Texas (https://en.wikipedia.org/wiki/Juneteenth). And to the extent that emancipation is considered a legal process/status, I believe some of this took even longer.

So how would this work in proposal 1?
