Documenting relationships to the RDB2RDF Direct Mapping #455

Closed
iherman opened this issue Apr 6, 2015 · 5 comments

@iherman
Member

iherman commented Apr 6, 2015

(This issue is for the LCCR, not for the upcoming publication)

Our charter also includes:

"The output of the mapping mechanism for RDF MUST be consistent with either the RDF Direct Mapping or R2RML so that if a table from a relational database is exported as CSV and then mapped it produces semantically identical data."

Our approach turned out to be a bit different from both R2RML and the Direct Mapping. I believe, therefore, that we should document how the examples in the Direct Mapping document can be reproduced with (hopefully simple) metadata. I have created the minimal metadata for the first example in the Direct Mapping:

{
    "@context": "http://www.w3.org/ns/csvw",
    "resources" : [{
      "url": "http://foo.example/DB/People.csv",
      "aboutUrl" : "http://foo.example/DB/People/ID={_row}",
      "propertyUrl" : "http://foo.example/DB/People#{_name}",
      "tableSchema": {
        "columns": [{
          "name": "ID",
          "datatype" : "integer"
        }, {
          "name": "fname"
        }, {
          "name": "ref-addr",
          "aboutUrl" : "http://foo.example/DB/Addresses/ID={ref-addr}"
        }, {
          "name": "type",
          "virtual": true,
          "propertyUrl": "rdf:type",
          "valueUrl" : "http://foo.example/DB/People"
        }]
      }
    }, {
      "url": "http://foo.example/DB/Addresses.csv",
      "aboutUrl" : "http://foo.example/DB/Addresses/ID={_row}",
      "propertyUrl" : "http://foo.example/DB/Addresses#{_name}",
      "tableSchema": {
        "columns": [{
          "name": "ID",
          "datatype" : "integer"
        }, {
          "name": "city"
        }, {
          "name": "state"
        }]
      }
    }]
}

Actually, it is worth noting that some additional features can be added that would reflect the original RDB schema, yielding:

{
    "@context": "http://www.w3.org/ns/csvw",
    "resources" : [{
      "url": "http://foo.example/DB/People.csv",
      "aboutUrl" : "http://foo.example/DB/People/ID={_row}",
      "propertyUrl" : "http://foo.example/DB/People#{_name}",
      "tableSchema": {
        "columns": [{
          "name": "ID",
          "required": true
        }, {
          "name": "fname"
        }, {
          "name": "ref-addr",
          "aboutUrl" : "http://foo.example/DB/Addresses/ID={ref-addr}"
        }, {
          "name": "type",
          "virtual": true,
          "propertyUrl": "rdf:type",
          "valueUrl" : "People"
        }],
        "primaryKey": "ID",
        "foreignKeys" : [{
            "columnReference" : "ref-addr",
            "reference" : {
                "resource" : "http://foo.example/DB/Addresses.csv",
                "columnReference" : "ID"
            }
        }]
      }
    }, {
      "url": "http://foo.example/DB/Addresses.csv",
      "aboutUrl" : "http://foo.example/DB/Addresses/ID={_row}",
      "propertyUrl" : "http://foo.example/DB/Addresses#{_name}",
      "tableSchema": {
        "columns": [{
          "name": "ID",
          "required": true
        }, {
          "name": "city"
        }, {
          "name": "state"
        }],
        "primaryKey": "ID"
      }
    }]
}

Although these additions are not strictly necessary to reproduce the mapping, they are a "plus" for validation.
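
To make the validation point concrete, here is a minimal Python sketch of the checks that "required", "primaryKey" and "foreignKeys" would allow. It is only an illustration, not the behaviour of any actual CSVW processor: the helper names (`read_rows`, `check_primary_key`, `check_foreign_key`) are hypothetical, and the CSV content is assumed sample data modelled on the Direct Mapping's People and Addresses tables, using the column names from the metadata above.

```python
import csv
import io

# Assumed sample data; the real People.csv and Addresses.csv are not shown in this issue.
PEOPLE_CSV = """ID,fname,ref-addr
7,Bob,18
8,Sue,
"""

ADDRESSES_CSV = """ID,city,state
18,Cambridge,MA
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def check_primary_key(rows, key):
    """Return error messages for empty or duplicated key values."""
    errors, seen = [], set()
    for number, row in enumerate(rows, start=1):
        value = row[key]
        if value == "":
            errors.append(f"row {number}: {key} is required")
        elif value in seen:
            errors.append(f"row {number}: duplicate {key}={value}")
        seen.add(value)
    return errors


def check_foreign_key(rows, column, target_rows, target_column):
    """Return error messages for non-empty references with no matching target row."""
    targets = {row[target_column] for row in target_rows}
    return [
        f"row {number}: {column}={row[column]} has no match"
        for number, row in enumerate(rows, start=1)
        if row[column] != "" and row[column] not in targets
    ]


people = read_rows(PEOPLE_CSV)
addresses = read_rows(ADDRESSES_CSV)
pk_errors = check_primary_key(people, "ID") + check_primary_key(addresses, "ID")
fk_errors = check_foreign_key(people, "ref-addr", addresses, "ID")
```

With the assumed data both error lists are empty. An empty "ref-addr" cell is treated as "no reference" rather than as an error, which is a design choice of this sketch.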

@gkellogg, maybe it is worth testing with your processor that these are correct and produce the right results. I also think that these should be documented somewhere (as a result of the charter); maybe this simple example, and another slightly more complex one could be added in the metadata document.

B.t.w., if this mapping is correct, note the need for the virtual column. This may be an argument in favour of a final resolution of #179...
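
As a rough illustration of why the virtual column is needed: no cell in People.csv holds the class, so the rdf:type triple has to come from a column that exists only in the metadata. The Python sketch below assumes that templates are expanded by plain string substitution (real URI template expansion is richer) and that `_row` is the 1-based row number; `expand` and `virtual_triple` are hypothetical helper names.

```python
import re

# Values copied from the minimal metadata above.
TABLE_ABOUT_URL = "http://foo.example/DB/People/ID={_row}"
TYPE_COLUMN = {
    "name": "type",
    "virtual": True,
    "propertyUrl": "rdf:type",
    "valueUrl": "http://foo.example/DB/People",
}


def expand(template, variables):
    """Replace each {name} with variables[name] (plain substitution only)."""
    return re.sub(r"\{([^}]+)\}", lambda m: variables[m.group(1)], template)


def virtual_triple(row_number, column):
    """Build the triple a virtual column contributes for one row."""
    variables = {"_row": str(row_number), "_name": column["name"]}
    return (
        expand(TABLE_ABOUT_URL, variables),
        column["propertyUrl"],
        expand(column["valueUrl"], variables),
    )


# One rdf:type triple per row, even though the CSV has no "type" cell.
triples = [virtual_triple(n, TYPE_COLUMN) for n in (1, 2)]
```

Note that under these assumptions the subject is built from the row number, not from the value of the ID column.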

@gkellogg
Member

I added examples to the examples directory dm-example-1 through dm-example-5.

Note that we can't reproduce triples that use the foreign-key relationship to provide a value, as that is not handled through our conversions as foreign-keys are used only for validation.

Also, there's a small bug in the second example: one of the triple results is <Department/ID=23> <Department#ref-manager> <People#ID=8>, but it should be <Department/ID=23> <Department#ref-manager> <People/ID=8> (note use of "/" vs "#").

@iherman
Member Author

iherman commented Apr 16, 2015

@gkellogg

Note that we can't reproduce triples that use the foreign-key relationship to provide a value, as that is not handled through our conversions as foreign-keys are used only for validation.

Can you give an example for what you mean here? If we have a missing functionality in our system, we may have to record that in a separate issue explicitly. (Even if we decide to close it without any followup in the spec.)

@iherman iherman closed this as completed Apr 16, 2015
@iherman iherman reopened this Apr 16, 2015
@iherman
Member Author

iherman commented Apr 16, 2015

(The merge has been done, but a pending question keeps this issue open for now...)

@gkellogg
Member

The main difference between the RDB2RDF examples and our own is that they can use foreign-key information to create data. For example, People references Addresses using a composite foreign key and expects a triple to be emitted containing a value from the Addresses table.

Example 2 (metadata, people.csv, addresses.csv, department.csv, results)

Here, note that there is a virtual column "ref" with the "valueUrl": "http://foo.example/DB/People/ID=XX". This needs some variable to be used in place of "XX", but there's nothing we define which can be used. The expected results from the Direct Mapping are the following:

@base <http://foo.example/DB/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<People/ID=7> rdf:type <People> .
<People/ID=7> <People#ID> 7 .
<People/ID=7> <People#fname> "Bob" .
<People/ID=7> <People#addr> 18 .
<People/ID=7> <People#ref-addr> <Addresses/ID=18> .
<People/ID=7> <People#deptName> "accounting" .
<People/ID=7> <People#deptCity> "Cambridge" .
<People/ID=7> <People#ref-deptName;deptCity> <Department/ID=23> .
<People/ID=8> rdf:type <People> .
<People/ID=8> <People#ID> 8 .
<People/ID=8> <People#fname> "Sue" .

<Addresses/ID=18> rdf:type <Addresses> .
<Addresses/ID=18> <Addresses#ID> 18 .
<Addresses/ID=18> <Addresses#city> "Cambridge" .
<Addresses/ID=18> <Addresses#state> "MA" .

<Department/ID=23> rdf:type <Department> .
<Department/ID=23> <Department#ID> 23 .
<Department/ID=23> <Department#name> "accounting" .
<Department/ID=23> <Department#city> "Cambridge" .
<Department/ID=23> <Department#manager> 8 .
<Department/ID=23> <Department#ref-manager> <People#ID=8> .

Note that they can reference <Department/ID=23>, as there is a foreign-key relationship on deptName and deptCity to name and city in department.csv. From this, they want to use the ID column from the department in the value. We don't have any way to reach through a foreign-key relationship to access information from a referenced row.

Other examples use variations on this, in some cases including the BNode identifier created for a row that has a composite primary key.

For us to produce the same output would require a way to go from the cell to its row, select one of the referenced rows based on a foreign-key specification given as part of the virtual column description, and then get one of the cell values of that referenced row. In the case where the reference is to a BNode (as in their section 2.5 and our example 5), it's even trickier, as the value to use is the BNode associated with the referenced row when it was converted. Doing this would require, at the very least, a fair extension for variables to reference through these relationships.
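
To show what such a "reach-through" would involve, here is a hypothetical Python sketch. Nothing in it is defined by our metadata or conversion documents: `index_by` and `reference_triples` are made-up names, the rows are assumed data reconstructed from the expected results above, and the BNode case is not covered.

```python
# Assumed rows, reconstructed from the expected results quoted above.
PEOPLE = [
    {"ID": "7", "fname": "Bob", "deptName": "accounting", "deptCity": "Cambridge"},
    {"ID": "8", "fname": "Sue", "deptName": "", "deptCity": ""},
]
DEPARTMENT = [
    {"ID": "23", "name": "accounting", "city": "Cambridge", "manager": "8"},
]


def index_by(rows, columns):
    """Map the tuple of values in `columns` to the row holding them."""
    return {tuple(row[c] for c in columns): row for row in rows}


def reference_triples(rows, source_columns, target_rows, target_columns,
                      subject_template, predicate, object_template):
    """Follow a (possibly composite) foreign key and emit one triple per match."""
    targets = index_by(target_rows, target_columns)
    triples = []
    for row in rows:
        key = tuple(row[c] for c in source_columns)
        if "" in key:
            continue  # an empty key column means there is no reference
        target = targets.get(key)
        if target is None:
            continue  # dangling reference; a validator would report this
        triples.append((
            subject_template.format(**row),    # built from the referencing row
            predicate,
            object_template.format(**target),  # built from the referenced row
        ))
    return triples


triples = reference_triples(
    PEOPLE, ["deptName", "deptCity"], DEPARTMENT, ["name", "city"],
    "People/ID={ID}", "People#ref-deptName;deptCity", "Department/ID={ID}",
)
```

The key step is the second `format` call: the object is expanded against the referenced Department row rather than the current People row, which is exactly the variable scope our templates do not offer.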

@iherman
Member Author

iherman commented Apr 17, 2015

@gkellogg thanks, I can see it now. I have created a separate wiki page covering, in general, the deviations from the charter, to be reviewed by everyone (issue #503); that page refers to different examples. I hope I got it right.

Closing this issue for the good order!
