Editorial comments on the metadata document #679

Closed
iherman opened this issue Jul 11, 2015 · 5 comments

@iherman
Member

iherman commented Jul 11, 2015

(This is a transfer of the comments sent by 'timeless' to public-csv-wg-comments@w3.org)

W3C Editor's Draft 10 July 2015 [1]

We are on Github

GitHub

Data Aggregators can use the indicated metadata, such as descriptions, titles, modification dates, and licences [sic], to enable more intelligent retrieval of relevant data on the web.

Implementations may fulfil [sic] one or more of these functions.

which are not defined in this specification and MUST generate a warning when they are encoutered [sic].

then that string is assumed to be in the default language (or have an undefined language, und, if there is no such property).

pattern:
Could you write "und" instead of und (unquoted)? For those of us
reading the document with a Screen Reader or similar tool (actually a
plain text serializer in my case), the monospace formatting is lost.

Foreign key definitions provide for strong linking between tables that guarantees (through validation) the existance [sic] of a referenced row.

For example, if an implementation recognises and displays the value of the dc:description property, it should also recognise and display the value of the http://purl.org/dc/terms/description property in the same way.

en-US:
please see point 2 of [2]

Note that this format is not the same as the existing text/csv and text/tab-delimited-values mediatypes [sic],

Google says it's two words.

The CSV on the Web Working Group was chartered to produce a

This is the first instance of the group, but it isn't linked here
(it's linked in the next paragraph). That feels like putting the cart
before the horse.

Also, if you're going to mention the charter, you could do everyone
the favor of linking to it.

Recommendation "Access methods for CSV Metadata" as well as
Recommendations for "Metadata vocabulary for CSV data" and "Mapping mechanism to transforming CSV into various Formats (e.g., RDF, JSON, or XML)"

I object to capitalizing Formats.

This document aims to primarily satisfy the second of those Recommendations.

I don't think that word should be capitalized in this context.

Depending on how you parse the previous lines, this either references
2/2 or 2/3.

I'm not quite sure of the value of writing this as you did. I'd suggest
"This document addresses the charter scope for {}. Additional charter
scope items include {} and {} which would be covered by other
documents."

As an example, say that the following CSV file were available at http://example.org/tree-ops.csv

missing :

For example, say that the following metadata file were available at http://example.org/tree-ops.csv-metadata.json:

[reference]
A document reference (normative or informative) is enclosed in square brackets and links to the references section.

Here, the brackets are included in the link; they aren't in the document.

compliant applications MUST use that default value and MUST generate a warning.

=> MUST generate a warning and use that default value.

which parallels this:

If no default value is provided for that property, compliant applications MUST generate a warning and behave as if the property had not been specified.

including properties (aside from common properties) which are not defined in this specification.
having invalid values for a given property.

these items are not sentences and aren't preceded by a :

on a [sic] annotated tabular data model.

"an"

A description object is a JSON object that describes a component of the annotated tabular data model (a group of tables, a table or a column) and has one or more properties are mapped into properties on that component.

are => that are | {empty}

For example, in the column description

add , or : ?

Offhand, figure 1 [3] was generated at a resolution which is too small
to read at 100%; text runs together... Also, the gray background is
unhelpful.

figure 1 [4] isn't accessible, it's just a bunch of lines / polylines.
Someone helpfully stripped all the words from the source code.

There are different types of properties on description objects:

  • section 5.1.1 Array Properties.
  • section 5.1.2 Link Properties.

these aren't sentences...

The variables that are set are:

... are ... are <- yuck

most of these don't end in a period:

_sourceRow is set to the source number of the row that is currently being processed; this usually varies from _row by skip rows and header rows

except this one:

_name is set to the URI decoded column name annotation, as defined in [tabular-data-model], for the column that is currently being processed. (Percent-decoding is necessary as name may have been encoded if taken from titles; this prevents double percent-encoding.)
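
As a side note on the parenthetical in the quoted text, here is a minimal Python sketch of why decoding first avoids double percent-encoding. The title `Tree ID` is a hypothetical example and not taken from the spec; `urllib.parse` stands in for whatever encoder an implementation uses.

```python
from urllib.parse import quote, unquote

# Hypothetical column name that was percent-encoded when derived from a title.
encoded_name = quote("Tree ID")

# Encoding again without decoding first escapes the "%" itself.
double_encoded = quote(encoded_name)

# Decoding first (what _name is described as doing) and encoding once is stable.
round_tripped = quote(unquote(encoded_name))

print(encoded_name, double_encoded, round_tripped)
```

The second value contains `%2520`, which is the double encoding the parenthetical warns about.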

The propertyUrl property might be defined as "{#_name}", meaning that it resolves as a fragment identifier relative to the URL of the source of the table. For example, accessing it from a column with the column name GID would look like:

I couldn't follow this example. Partially, I think you hid the input
as prose and only showed output in example 11, whereas example 10 was
the input and the prose held the output.
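
One way to make the input and output of this example explicit is a small Python sketch. It assumes the table lives at http://example.org/tree-ops.csv (the URL used elsewhere in the draft) and handles only the single-variable `{#var}` form; `expand_fragment` is a hypothetical helper, not a full RFC 6570 implementation.

```python
from urllib.parse import quote, urljoin

def expand_fragment(template: str, variables: dict) -> str:
    """Expand only the single-variable fragment form "{#var}"."""
    if not (template.startswith("{#") and template.endswith("}")):
        raise ValueError("only '{#var}' templates are handled in this sketch")
    value = variables[template[2:-1]]
    # Fragment expansion leaves reserved and unreserved characters untouched;
    # this simplified version does not special-case existing "%XX" triplets.
    return "#" + quote(value, safe=":/?#[]@!$&'()*+,;=-._~")

# Inputs: the table URL (assumed), the template, and the column name.
table_url = "http://example.org/tree-ops.csv"
property_url = "{#_name}"
column_name = "GID"

# Outputs: the expanded template, then the URL resolved against the table.
expanded = expand_fragment(property_url, {"_name": column_name})
resolved = urljoin(table_url, expanded)
print(expanded, resolved)
```

Listing the three inputs first and the two outputs afterwards is the ordering the comment above asks the prose and examples to follow.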

the processor will issue a warning and ignore the value.

pattern:
"will" isn't an RFC word

No Oxford Comma:

An atomic property that MUST have a single string value that is one of "rtl", "ltr" or "auto".

Oxford Comma:

Indicates whether the tables in the group should be displayed with the first column on the right, on the left, or based on the first character in the table that has a specific direction.

please pick a style (I'd recommend Oxford).

It is also possible to provide weak linking between tables that are not tested by validations but which may be useful when converting tabular data into other formats, using aboutUrl and valueUrl.

I think you want to drop the ,;

the sentence is too long.

This annotation MUST be percent-encoded as necessary to conform to the syntactic requirements defined in [RFC3986]

missing .

A boolean atomic property taking a single value

isn't "taking a single value" redundant for boolean?

a [sic] inherited property defined in its containing schema description

an

takes precedence of one defined in its containing table description

pattern:
of => over

This property is irrelevant if the separator is null or undefined, but this is not an error.

"but this is not an error" doesn't make sense / read well

An [sic] URI template property that MAY be used to create a URI for a property if

pattern:
A -- note that you use a URI elsewhere, including later in this fragment.

quoteChar
The default is ".

I'd suggest writing this as: """ or '"'

An [sic] boolean atomic property
An [sic] numeric atomic property

pattern:
A --

A boolean atomic property that, if true, sets the trim flag to "start".
If false, to false.

This second "sentence" needs more words.
It probably should be to "false" also...

No other values are valid.

Is that an "Error", or just "whatever"?

Applications MUST raise an error if length, maxLength, or minLength are specified and the base datatype is not string or one of its subtypes, or a binary type.

the order of ors doesn't work.

try ... neither string/string subtype nor a binary type.
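
The suggested rewording corresponds to a simple predicate, sketched below in Python. The type sets are hand-written, illustrative subsets (an assumption; the spec's datatype hierarchy is authoritative), and `check_datatype` is a hypothetical helper.

```python
# Illustrative subsets only; consult the spec for the full datatype hierarchy.
STRING_TYPES = {"string", "normalizedString", "token", "language",
                "Name", "NMTOKEN", "xml", "html", "json"}
BINARY_TYPES = {"binary", "base64Binary", "hexBinary"}
LENGTH_KEYS = ("length", "minLength", "maxLength")

def check_datatype(datatype: dict) -> None:
    """Raise if a length constraint is used on an unsuitable base datatype."""
    used = [key for key in LENGTH_KEYS if key in datatype]
    base = datatype["base"]
    # Error only when the base is NEITHER a string type NOR a binary type.
    if used and base not in STRING_TYPES and base not in BINARY_TYPES:
        raise ValueError(f"{', '.join(used)} not allowed for base datatype {base!r}")

check_datatype({"base": "string", "maxLength": 5})   # accepted
check_datatype({"base": "hexBinary", "length": 4})   # accepted
```

Written this way, the condition reads as "neither ... nor", which is the unambiguous reading of the sentence.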

Values MAY be a string, native JSON type (such as number, true, or false.), value object, node object or an array of zero or more of any of these.

stray . inside the parenthetical

[1] http://w3c.github.io/csvw/metadata/
[2] https://lists.w3.org/Archives/Public/public-csv-wg-comments/2015Jun/0002.html
[3] http://w3c.github.io/csvw/metadata/properties.png
[4] http://w3c.github.io/csvw/metadata/properties.svg

@iherman
Member Author

iherman commented Jul 11, 2015

Comment on one point:

figure 1 [4] isn't accessible, it's just a bunch of lines / polylines.
Someone helpfully stripped all the words from the source code.

The problem lies in how SVG is implemented. The original source of the file does, of course, contain the text, but if the generated SVG keeps the text then I cannot secure proper display across all browsers. The issue seems to be fonts, plus the fact that the tools (AI in this case) insist on positioning each character separately (which makes it equally inaccessible, I guess). SVG Fonts and WOFF lead to differences among implementations (or, in the case of SVG Fonts, do not work at all), and relying on system fonts is also unreliable.

Any good idea on how to handle this would be welcome, of course.

@iherman iherman added the NonWG label Jul 11, 2015
@gkellogg
Member

More comments from timeless:

http://www.w3.org/TR/2015/CR-csv2rdf-20150716/

In order to faciliate [sic] the provision of such information,

The resources described by each row are explcitly [sic] defined using the about URL annotation this case three resources per row

Tables in a relational databases bear a strong resemblence [sic] to tabular data as defined in [tabular-data-model];

else, the datatype's base annotation value MUST be mapped to the RDF datatype IRI using the table below.

I expected to see a table below, instead I ran into a NOTE. Move the
note? (or somehow fix reader expectation about immediacy of table)

the cell value has already been parsed from the contents the cell according to the format annotation.

contents the => contents of the

Where the contents of the cell cannot be parsed, or other validation errors occur, cell errors will be provided.

"will be provided" is odd; provided by whom?

It is an implementation decision to determine how conversion applications should proceed in the event that cell errors are encountered.

occur/encountered is an odd mix

string xsd:string or rdf:langString depending on whether or not the value has an associated language.

there's a remarks column, and most of this text probably belongs in it...

According to [rdf11-concepts] language tags cannot be combined with any other xsd datatypes.

drop any

If a cell has any other datatype than string,

any other datatype => a datatype other

specific serializations, like Turtle [turtle],

you don't usually write "like Foo [foo]"; the document is written to
be read as "like Foo foo", and thus you should probably just have "like [foo]", which reads as "like foo".

In order to faciliate the provision of such information,

the provision of => providing

through some application specific API-s.

drop the -

Establish a new node S from the value of @id, if it exists, and new blank node otherwise and emit the following triple:

and new => and a new

If value is true or false, create an RDF Literal lit using the strings "true" or "false", accordingly with datatype xsd:boolean

missing .

Annotations for the resulting table T, with 4 columns and 3 rows, are shown below:
Annotations for the columns, rows and cells in table T are shown in the tables below.

I kind of expected :

The datatype annotation is set on columns C5, C6, C8 and C9 ({ "name": "dbh"}, { "name": > "inventory_date" }, { "name": "protected" } and { "name": "kml" });
integer, date, boolean and xml respectively.

you're missing an as or to or similar after ;

B.B. King,2014-04-13T20:00,"Lynn Auditorium","Lynn, MA, 01901",http://frontgatetickets.com/venue.php?id=11766

why isn't B.B. King quoted? Or why is Lynn Auditorium quoted?
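
For what it's worth, the quotes around "Lynn Auditorium" are optional as far as parsing goes; only the address field needs them because it contains commas. A minimal Python check, using the standard `csv` module as a stand-in parser (an assumption, not the parsing algorithm from the tabular data model):

```python
import csv
import io

# The record as it appears in the draft.
line = ('B.B. King,2014-04-13T20:00,"Lynn Auditorium","Lynn, MA, 01901",'
        'http://frontgatetickets.com/venue.php?id=11766')

# The same record with quotes only where a field contains the delimiter.
minimal = ('B.B. King,2014-04-13T20:00,Lynn Auditorium,"Lynn, MA, 01901",'
           'http://frontgatetickets.com/venue.php?id=11766')

fields = next(csv.reader(io.StringIO(line)))
minimal_fields = next(csv.reader(io.StringIO(minimal)))
print(fields == minimal_fields, fields)
```

Both forms yield the same five fields, so the inconsistency is one of style rather than of meaning.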

e.g. for row R1 we state [] csvw:describes t1:event-1, t1:place-1, t1:offer-1 . .

this is hard to read without styling, please consider using ""s around
the stuff, or something, otherwise . . is a mess...

This example is concerned only with converting the information provided each government department or organization not the centrally published information listing organizations and professions.

provided => provided by
not => ; not or similar

Similarly, columns Cc7 and Cd8 (both with { "name": "organizationRef" }) use the about URL, property URL and value URL annotations to assert the relationship between the given post and the organization to which it belongs for the cells those columns.

cells [in] those columns ???

A processor exporting the table into a CSV file could also generate such a metadata automatically.

such a => such ???

Cells in both the city and state columns in the Addresses table contain strings

missing .

The corresponding relational schema specifies that

missing : ??

To generate the right object URI in the highlighted statement of the direct graph the processor has to find that unique key combination in the Departments table and, using that combination, has to establish the subject for that specific row that can is to be used as an object URI in the highlighted statement.

  1. this is too long to be a sentence
  2. "that can is to be" is wrong
  3. once you fix the first two, the section should be revisited...

@gkellogg gkellogg self-assigned this Jul 17, 2015
@gkellogg
Member

More comments from timeless:

http://www.w3.org/TR/2015/CR-csv2json-20150716/

it MAY be possible to organise [sic] the objects associated with those subjects such that some objects are nested within others

See earlier feedback [1] on en-gb/en-us

We are on Github [2]
through some application specific API-s [3].
[reference]

Hopefully feedback [1][2][3] I've sent previously can be applied to
each of the four specifications without me actively calling out the
same comments on each (I will obviously call out spelling errors
throughout, although those really shouldn't be present in CR
documents...).

the edge relating M and N MUST be modifed [sic] such that the

Only the base annotation value is used to determine the primitive type used wihtin [sic] the JSON output.

The key words MAY and MUST are to be interpreted as described in [RFC2119].
...
no further action should be taken for this instance of Vurl.

note that in this document, "should" isn't an RFC word...

An object is defined in JSON ([RFC7159]) as unordered collection of zero or more name-value pairs, where a name is a string and a value is a string, number, boolean, null, object, or array.

as => as an
a name => name
a value => value

including information about table schemas and columns specified therein, foreign keys, table direction, transformations etc.

missing , before etc.

If name N occurs more than once within object Si, the name-value pairs from each occurrence of name N MUST be compacted to form a single name-value pair with name N and whose value is an array containing all values from each of those name-value pairs.

If N occurs once with value [1,2] and once with value [3,4], is the
result {"N":[1,2,3,4]} or {"N":[[1,2],[3,4]]} ?
I assume the latter, but think the text could use clarification.
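
The two readings can be spelled out in a short Python sketch. Both functions are hypothetical illustrations of the ambiguity; neither is claimed to be what the spec intends.

```python
import json

# Values seen for the repeated name N, in order of occurrence.
values_for_n = [[1, 2], [3, 4]]

def compact_nested(values):
    """Reading 1: each occurrence's value becomes one element of the array."""
    return list(values)

def compact_flattened(values):
    """Reading 2: array values are spliced into a single flat array."""
    flat = []
    for value in values:
        flat.extend(value if isinstance(value, list) else [value])
    return flat

nested_result = {"N": compact_nested(values_for_n)}
flattened_result = {"N": compact_flattened(values_for_n)}
print(json.dumps(nested_result), json.dumps(flattened_result))
```

A sentence in the spec stating which of these two outputs is expected would settle the question.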

One of the main constituents of graphs; in such data structures edges are used to establish relationships among vertices.

missing , before edges -- alternatively you could reword using an edge is ...

A collection of disjoint trees. For the purpose of this algorithm, the order of the trees are important, i.e., forests can also be viewed as a sequence of roots.

the trees => trees

A dedicated vertex in a tree; a root is not the child of any vertex.

any => another

One of the main constituents of graphs; in such data structures a vertex usually holds further information or data.

insert , after structures -- note that this sentence structure
varies slightly from edges above.

This clause in the algorithm prevents circular loops being created.

insert from before being

Furthermore, because the URL-list contains value URLs that occur only once for the current row, object Si cannot be a descendant of an intermediate vertices in the tree.

drop an

Specifications for such annotation processes should specify how these annotations should be converted into RDF.

How is this relevant to a document about JSON?

In addition to compacting values of property URLs,
URLs which ware [sic] the value of @type used within the notes and non-core annotations are compacted according to the rules as defined in URL Compaction in [tabular-metadata].

were? are?

@iherman
Member Author

iherman commented Aug 2, 2015

To follow up on my previous note, I have added a longdesc file to describe the property diagram, taking care of one of the (sub-)issues.

6a6d74 added a commit to 6a6d74/csvw that referenced this issue Oct 13, 2015
6a6d74 added a commit to 6a6d74/csvw that referenced this issue Oct 13, 2015
6a6d74 added a commit to 6a6d74/csvw that referenced this issue Oct 13, 2015
This was referenced Oct 13, 2015
@JeniT

JeniT commented Oct 14, 2015

All done, thanks @gkellogg & @6a6d74 !
