How simple is simple mapping? #74

Closed
6a6d74 opened this issue Nov 28, 2014 · 6 comments
Comments

@6a6d74
Contributor

6a6d74 commented Nov 28, 2014

Mindful of our decision at the F2F / TPAC to 'punt' templated mapping out of scope, I am still trying to determine how simple the 'simple mapping' is ...

We agreed that the URI Templates were in scope - which allow a little cleverness in parsing the cell values to create, amongst other things, URLs for the row. But looking back on the minutes of our discussion of 17-Sep-2014 I see the following ...

"""
phila: following up jtandy, ... re use of string functions for URI generation
escaping?
I had experience of trying to do that, ... basic string function of removing white space, case normalization, ...
but that's as complex as it got
phila: was simple excel spreadsheet, using awk
so turning string name of a ministry into a URI
pretty basic stuff
case normalize, and get rid of whitespace
AndyS, you wanted to mention URI templates.
AndyS: similar to what Phil says
We use a lot of URI templates
multiple fields into one URI
sector, area, ID all go in.
certain amount of cleaning, string manipulation, whitespace, chars we don't want, ...
[...]
"""

From the overview of URI Templates (RFC 6570) that I produced for the F2F we can see that URI Templates support is mostly about 'expansion' of expressions; string manipulation support is limited to a substring value modifier (from position zero to an arbitrary position).
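To make the limitation concrete, here is a minimal sketch of the one string operation RFC 6570 offers, the prefix modifier `{var:N}`. It is a toy expander that handles only `{name}` and `{name:N}`, not a full RFC 6570 implementation; the function name `expand_simple`, the `example.org` URL and the sample row are assumptions made for illustration.

```python
import re
from urllib.parse import quote

# Matches only the simplest expression forms: {name} and {name:N}.
_EXPR = re.compile(r"\{([A-Za-z0-9_]+)(?::([0-9]+))?\}")


def expand_simple(template, variables):
    """Expand {name} and {name:N} only; a toy subset of RFC 6570."""
    def substitute(match):
        name, length = match.group(1), match.group(2)
        value = str(variables.get(name, ""))
        if length is not None:
            # Prefix modifier: the substring always starts at position 0.
            value = value[:int(length)]
        # Simple expansion percent-encodes everything outside the unreserved set.
        return quote(value, safe="")
    return _EXPR.sub(substitute, template)


# Hypothetical row, echoing the "name of a ministry" case from the minutes.
row = {"ministry": "Ministry of Justice"}
print(expand_simple("http://example.org/ministry/{ministry}", row))
# http://example.org/ministry/Ministry%20of%20Justice
print(expand_simple("http://example.org/ministry/{ministry:8}", row))
# http://example.org/ministry/Ministry
```

Nothing here can lower-case the value or drop the embedded spaces; the template can only truncate and percent-encode.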

It seems from the discussion on the 17-Sep that we need a little more capability when it comes to string manipulation ...

  • basic string manipulation functions [tbd]
  • case normalisation
  • removal of whitespace and other unwanted characters - probably via substring extraction (not only from position 0!) and reassembly

I'm not advocating a position here (yet!) - just indicating that this requirement is outstanding. Certainly the FPWD of the mapping documents (anticipated for Dec-2014) will not include these functions.

@iherman
Member

iherman commented Nov 28, 2014

On 28 Nov 2014, at 12:40 , Jeremy Tandy notifications@github.com wrote:

[...]

Yeah...

Whatever we decide to do, we should keep one more constraint in mind: we shouldn't mess with RFC 6570. What I mean is that we should not incorporate, say, case normalization into the template syntax. Not only would that create a mess specification-wise, it would also make any implementation difficult, because it could no longer rely on external RFC 6570 implementations.

In practice, this could mean that these operations are expressed as separate manipulation key/value pairs in the metadata, to be performed on the cell values before any template expansion. Something like

  { ...
    "template" : "your full RFC 6570 template",
    "filters"  : ["normalize", "strip" ]
  }

or get back to the regexp alternatives:

  { ...
    "template" : "your full RFC 6570 template",
    "filters"  : [{"from" : "regexp to select", "to" : "transformation result, referencing regexp groups from 'from'" }]
  }
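A rough sketch of how such filters might run on cell values before template expansion, leaving the template itself untouched. The semantics given to "normalize" and "strip", the use of Python regexp syntax for "from"/"to", and the names `apply_filters` and `expand` are all assumptions for illustration; nothing here was agreed by the group, and the expander only handles plain `{name}` expressions.

```python
import re
from urllib.parse import quote

# Hypothetical meanings for the filter names used in the metadata sketch.
NAMED_FILTERS = {
    "normalize": str.lower,                    # assumed: case normalisation
    "strip": lambda s: re.sub(r"\s+", "", s),  # assumed: remove all whitespace
}


def apply_filters(value, filters):
    """Apply each filter in order; dict filters are regexp from/to rewrites."""
    for f in filters:
        if isinstance(f, str):
            value = NAMED_FILTERS[f](value)
        else:
            value = re.sub(f["from"], f["to"], value)
    return value


def expand(template, row, filters):
    """Clean the cell values first, then do a plain {name} expansion."""
    cleaned = {k: apply_filters(v, filters) for k, v in row.items()}
    return re.sub(
        r"\{(\w+)\}",
        lambda m: quote(cleaned.get(m.group(1), ""), safe=""),
        template,
    )


row = {"ministry": "Ministry of Justice"}
print(expand("http://example.org/id/{ministry}", row, ["normalize", "strip"]))
# http://example.org/id/ministryofjustice
print(expand("http://example.org/id/{ministry}", row,
             [{"from": r"(\w+)\s+of\s+(\w+)", "to": r"\2-\1"}]))
# http://example.org/id/Justice-Ministry
```

Because the filters only touch the variable values, the expansion step could in principle be handed to any off-the-shelf RFC 6570 library.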

Sigh. Feature creep... Sigh

:-)

Ivan




Ivan Herman, W3C

@JeniT

JeniT commented Feb 4, 2015

Proposed resolution: keep it simple and do not allow any operations on CSV data other than those provided through normalisation, parsing of numeric and date/time formats, and use of URI templates.

@iherman
Member

iherman commented Feb 4, 2015

+1

@gkellogg
Member

gkellogg commented Feb 4, 2015

+1. This also includes the existing cell value language that extracts a semantic value based on format descriptions (e.g., boolean, date/time, numeric).

@6a6d74
Contributor Author

6a6d74 commented Feb 12, 2015

+1 ... this pushes string manipulation, case normalisation etc. to the more complicated (templated) mapping which we've not really discussed yet. I think we said this would be a community group thing.

@JeniT
Copy link

JeniT commented Feb 13, 2015

Discussed at Feb F2F. All agreed.
