determining cell text direction #620

Closed
r12a opened this issue Jun 15, 2015 · 5 comments · Fixed by #638
Comments

r12a commented Jun 15, 2015

the 'base direction' of a string is crucial to correct display when dealing with bidirectional text (see http://www.w3.org/International/articles/inline-bidi-markup/uba-basics for an explanation of why). The base direction can be determined by metadata (such as the dir attribute in HTML) or by testing the data (typically identifying the direction of the first strong character). The latter approach does not always produce the correct base direction, e.g. where Latin characters appear at the start of a bidirectional string whose base direction should be right-to-left.
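
As a rough illustration of the first-strong heuristic and the failure case just mentioned, here is a minimal Python sketch. It uses the standard `unicodedata` module to read each character's bidi class and returns the direction of the first strong character (classes L, R or AL). The function name `first_strong_direction` is made up for this example, and the sketch deliberately ignores the isolate-skipping detail of the full Unicode Bidirectional Algorithm.

```python
import unicodedata


def first_strong_direction(text):
    """Return "ltr" or "rtl" from the first strong character, or None.

    Simplified: characters inside directional isolates are not skipped,
    which the full algorithm would do.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":              # strong left-to-right
            return "ltr"
        if bidi_class in ("R", "AL"):      # strong right-to-left (Hebrew, Arabic)
            return "rtl"
    return None                            # no strong character, e.g. digits only


# A Hebrew word (escaped to keep the source unambiguous).
hebrew_word = "\u05e9\u05dc\u05d5\u05dd"

# A cell meant to be read right-to-left that happens to start with a Latin
# acronym: the heuristic picks "ltr" because "H" is the first strong character.
mixed_cell = "HTML " + hebrew_word

pure_hebrew_direction = first_strong_direction(hebrew_word)
mixed_cell_direction = first_strong_direction(mixed_cell)
digits_direction = first_strong_direction("123 -45")
```

The `mixed_cell` case is the one where metadata is needed, since nothing in the data itself says the cell should be right-to-left.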

as far as i can tell, the overall 'direction' annotation for the table sets the direction of columns on display, but is not used to set the default direction of text in cells.

6.4 Parsing Cells says that the cell annotation for text direction takes its value from the column annotation for text direction.

it seems, then, that it is possible to indicate that the direction of all cells in a particular column should be, say, rtl, by specifying direction for that column.

it's not clear to me whether the last paragraph in 6.5.1 is contradicting this by suggesting that the presence of strong characters in the cell will cause the UBA's first-strong algorithm to kick in, or simply that the contents of the cell will be treated as normal by the UBA, given the base direction set by the 'text direction' cell annotation. I'm particularly unclear, since the preceding sentence seems to be applying different rules for strings with no strong characters, and is followed by 'However...'.

here are some thoughts/issues/questions:

i would have thought that it would be easier to use the direction annotation of the table to establish the default base direction for cell values, rather than only being able to specify it per column. (Then it would be analogous to the dir attribute on the html element.)

i also think it may be useful sometimes to specify the direction for a given row.

if the direction of a table is not set, or the direction is set to auto/default, then i would expect that the first-strong algorithm would be applied to cells to determine the base direction.

on the other hand, if the direction of a cell is specified by an annotation, then the cell should probably take its direction from that.

am i correct in thinking that it is also possible to use metadata to set the direction for a particular cell? If so, then the algorithm should not overwrite that with the value in the column annotation.

JeniT commented Jun 17, 2015

Thanks for the comment @r12a.

The text direction annotation on a column (and hence on a cell) is determined by the textDirection property within the JSON metadata, which is an inherited property that can be specified at the level of a group of tables, table, schema or column. So it is possible to set, in the metadata, a default text direction for all the cells in the table.
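
To make the inheritance concrete, here is a small Python sketch of how a column's effective text direction might be resolved by walking from the most specific level to the least specific one. It is an illustration under assumptions, not the specification's algorithm: the helper `resolve_text_direction`, the plain-dict representation of each metadata level, the treatment of a missing property as `"inherit"`, and the `"auto"` fallback are all choices made for this example.

```python
def resolve_text_direction(levels, fallback="auto"):
    """Return the first explicit textDirection found in `levels`.

    `levels` is ordered from most specific (column) to least specific
    (group of tables). A missing property or the value "inherit" means
    "look at the next level up". The fallback value is an assumption.
    """
    for level in levels:
        value = level.get("textDirection", "inherit")
        if value != "inherit":
            return value
    return fallback


# Hypothetical metadata: the table sets a default, one column overrides it.
table_group = {}
table = {"textDirection": "rtl"}
schema = {}
name_column = {"name": "name"}
amount_column = {"name": "amount", "textDirection": "ltr"}

name_direction = resolve_text_direction([name_column, schema, table, table_group])
amount_direction = resolve_text_direction([amount_column, schema, table, table_group])
unset_direction = resolve_text_direction([{}, {}, {}, {}])
```

Here the `name` column picks up `rtl` from the table, while the `amount` column keeps its own `ltr` setting.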

We made a decision a while ago not to support setting annotations on individual rows as there were very few use cases where it was actually required. This might be something we revisit in vNext if we find that the global notes annotation is being used to set these kinds of annotations at individual row or cell level (this also answers your last point).

We have referenced the Unicode Bidirectional Algorithm as the mechanism for determining how the contents of a cell should be displayed. This is a fairly complex algorithm, dealing with nesting of different directional markers within the text. I believe that we need to view CSV as a Higher-Level Protocol while using that defined algorithm, and that we should be customising it according to HL1 by setting the "paragraph embedding level" based on the text direction annotation on the cell. But this only sets the initial level (ie whether you start off thinking the paragraph is LTR or RTL): as I understand it this will be overridden by the first strong character.

Plainly this isn't explained well enough in the current text (and possibly not above either). If you have a suggestion about how to rephrase this section to make it clearer, I'd welcome that.

matial commented Jun 17, 2015

Setting the text direction in a cell according to its first strong character actually defeats setting the direction with a text direction annotation.
Ideally, text direction should be specifiable at the cell level and when it is specified, the first strong algorithm must not override it.

From JeniT's comment, it seems to me that a text direction annotation will always need to be specified for an RTL table, and that it will then be inherited by columns or respecified for some or all columns.

If column text direction overrides the first strong algorithm, then this algorithm will only be used for tables where text direction is nowhere specified, which means LTR tables by default, which is a rather surprising way of supporting bidi.

If column text direction is overridden by the first strong algorithm, then there is no point setting it, except for cells with no strong character at all, such as pure numbers.
Note that a column of numbers will usually have a textual column header in its first row. In a bidi context, this header will often need RTL text direction, while the numbers below it are better accommodated with LTR text direction (think of placement of the minus sign for negative numbers). If the finest granularity for text direction annotation is column level, there is no way out of this conflict.
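
The precedence argued for above (an explicit annotation wins, and first-strong detection is only a fallback) can be sketched in Python as follows. This is an illustrative assumption about how an implementation could behave, not text from the specification; the function `cell_base_direction` and its parameters are invented for the example, and the isolate-skipping detail of the Unicode Bidirectional Algorithm is left out.

```python
import unicodedata


def cell_base_direction(text, annotation="auto", fallback="ltr"):
    """Pick a base direction for one cell.

    An explicit "ltr" or "rtl" annotation is used as-is. Any other value
    (here "auto") falls back to the first strong character, and then to
    `fallback` when the text has no strong character at all.
    """
    if annotation in ("ltr", "rtl"):
        return annotation                  # metadata is never overridden
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return fallback                        # e.g. a cell holding only digits


hebrew_header = "\u05e1\u05db\u05d5\u05dd"  # a Hebrew column header

# Column annotated rtl: both the header and the negative number get rtl,
# which is the conflict described above for numeric columns.
header_in_rtl_column = cell_base_direction(hebrew_header, "rtl")
number_in_rtl_column = cell_base_direction("-1234", "rtl")

# Column left on auto: the header is detected as rtl, the number falls
# back to ltr because it contains no strong character.
header_in_auto_column = cell_base_direction(hebrew_header, "auto")
number_in_auto_column = cell_base_direction("-1234", "auto")
```

With column-level granularity only, the header and the numbers below it cannot be given different explicit directions, which is the conflict raised in this comment.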

r12a commented Jun 19, 2015

> But this only sets the initial level (ie whether you start off thinking the paragraph is LTR or RTL): as I understand it this will be overridden by the first strong character.

As Mati said, this is incorrect.

Bidi is always a difficult area to discuss and understand. In order to help with answering this thread, i have begun (and so far only begun) an article that discusses bidi in plain text, and makes some observations about CSV. It's incomplete at the moment, but if you read it you may be able to help me finish it. See http://r12a.github.io/docs/bidi-plain-text/

JeniT commented Jun 30, 2015

Thank you @r12a & @matial for the explanations for why the text we had was faulty. I've created a pull request at #638 with some revised text. Please could you review it and let us know if it's OK?

r12a commented Jul 3, 2015

This discussion continues at #638. The text there is sufficiently long and complicated that i didn't try to bring the discussion back to this thread.
