<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Bengali Layout Requirements (Preliminary Editor's Draft)</title>
<script src="https://www.w3.org/Tools/respec/respec-w3c-common" async="" class="remove" type="text/javascript">
</script>
<script class="remove" type="text/javascript">
var respecConfig = {
// specification status (e.g. WD, LCWD, WG-NOTE, etc.). If in doubt use ED.
specStatus: "ED",
//publishDate: "2015-07-21",
//previousPublishDate: "2014-12-16",
//previousMaturity: "FPWD",
noRecTrack: true,
shortName: "benglreq",
copyrightStart: "2019",
edDraftURI: "http://w3c.github.io/iip/bengali/",
// if this is a LCWD, uncomment and set the end of its review period
// lcEnd: "2009-08-05",
// editors, add as many as you like
// only "name" is required
editors: [
{ name: "Richard Ishida", company: "W3C" }
],
wg: "Internationalization Working Group",
wgURI: "http://www.w3.org/International/core/",
wgPublicList: "public-i18n-iip",
bugTracker: { new: "https://github.com/w3c/iip/issues", open: "https://github.com/w3c/iip/issues" } ,
otherLinks: [
{
key: "Github",
data: [
{
value: "repository",
href: "https://github.com/w3c/iip"
}
]
}
],
// URI of the patent status for this WG, for Rec-track documents
// !!!! IMPORTANT !!!!
// This is important for Rec-track documents, do not copy a patent URI from a random
// document unless you know what you're doing. If in doubt ask your friendly neighbourhood
// Team Contact.
wgPatentURI: "http://www.w3.org/2004/01/pp-impl/32113/status",
// !!!! IMPORTANT !!!! MAKE THE ABOVE BLINK IN YOUR HEAD
};
</script>
<link rel="stylesheet" href="local.css" type="text/css">
</head>
<body>
<div id="abstract">
<p>This document describes requirements for the layout and presentation of text in languages that use the Bengali script when they are used by Web standards and technologies, such as HTML, CSS, Mobile Web, Digital Publications, and Unicode.</p>
</div>
<div id="sotd">
<p style="font-weight: bold;">This early draft has not yet been through any review process. Please do not rely on the contents.</p>
<p>This document describes the basic requirements for Bengali script layout and text support on the Web and in eBooks. These requirements provide information for Web technologies such as CSS, HTML and digital publications about how to support users of the Bengali script. Currently the document focuses on Bengali as used for the Bangla language. The information here is developed in conjunction with a <a href="../gap-analysis/beng-gap.html">document that summarises gaps</a> in support on the Web for Bengali.</p>
<p>The editor's draft of this document is being developed by the <a href="https://github.com/w3c/iip/">India International Program Task Force</a>, part of the W3C <a href="http://www.w3.org/International/ig/">Internationalization Interest Group</a>. It will be published by the <a href="http://www.w3.org/International/core/">Internationalization Working Group</a>. The end target for this document is a Working Group Note.</p>
<div class="note">
<p data-lang="en" style="font-weight: bold; font-size: 120%">Sending comments on this document</p>
<p data-lang="en">If you wish to make comments regarding this document, please raise them as <a href="https://github.com/w3c/iip/issues" style="font-size: 120%;">GitHub issues</a><!--against the <a href="http://www.w3.org/TR/2015/WD-ilreq-20150721/" style="font-size: 120%">latest dated version in /TR</a>-->. Only send comments by email if you are unable to raise issues on GitHub (see links below). All comments are welcome.</p>
<p data-lang="en">To make it easier to track comments, please raise separate issues or emails for each comment, and point to the section you are commenting on using a URL.</p>
</div>
</div>
<section id="h_introduction">
<h2>Introduction</h2>
<section id="h_about_this_document">
<h3>About this document</h3>
<p>Some text goes here.</p>
</section>
<section id="h_gap_analysis">
<h3>Gap analysis</h3>
<p>This document is pointed to by a separate document, <a href="../gap-analysis/beng-gap.html">Bengali Gap Analysis</a>, which describes gaps in support for Bengali on the Web, and prioritises and describes the impact of those gaps on the user.</p>
<p>Wherever an unsupported feature is identified through the gap analysis process, the requirements for that feature need to be documented. This document is where those requirements are described.</p>
<p>This document should contain no reference to a particular technology. For example, it should not say "CSS does/doesn't do such and such", and it should not describe how a technology, such as CSS, should implement the requirements. It is technology agnostic, so that it will be evergreen, and it simply describes how the script works. The gap analysis document is the appropriate place for all kinds of technology-specific information.</p>
</section>
<section id="h_info_requests">
<h3>Other related resources</h3>
<p>The document <a href="https://w3c.github.io/typography/">International text layout and typography index</a> (known informally as the text layout index) points to this document and others, and provides a central location for developers and implementers to find information related to various scripts.</p>
<p>The W3C also maintains a tracking system that has links to GitHub issues in W3C repositories. There are separate links for (a) requests from developers to the user community for information about how scripts/languages work, (b) issues raised against a spec, and (c) browser bugs. For example, you can find out <a href="http://w3c.github.io/i18n-activity/textlayout/?filter=type-info-request">what information developers are currently seeking</a>, and the resulting list can also be filtered by script.</p>
</section>
</section>
<section id="h_script_overview">
<h2>Bengali Script Overview</h2>
<p>Bengali is an abugida. Consonant letters have an inherent vowel sound. Combining vowel-signs are attached to the consonant to indicate that a different vowel follows the consonant.</p>
<p>The orthographic syllable is the unit for various aspects of the behaviour of the script. The alphabet is split into vowels and consonants. With one exception (ɔ-kar), each vowel is represented by both an independent version and a combining vowel sign. </p>
<p>Text runs horizontally, left to right, and lines typically break at the spaces between words. </p>
<p>The script has no upper-/lowercase distinction. </p>
<p>The basic unit for text segmentation is the syllable. Unicode grapheme clusters don't cover consonant clusters, so some additional processing is needed to identify text unit boundaries.</p>
<p>The <a href="https://r12a.github.io/scripts/bengali/">Bengali script summary</a> provides a high-level overview of the characters used for the script, and some basic features. Text from the latter part of that page was used for the initial version of this document.</p>
</section>
<section>
<h2 id="h_text_direction">Text direction</h2>
<p>Bengali is written horizontally, left to right.</p>
</section>
<section id="h_characters_and_phrases">
<h2>Structural boundaries & markers</h2>
<section id="h_graphemes">
<h3 class="reviewme">Grapheme boundaries</h3>
<p>The basic unit for working with Bengali text is the orthographic syllable, i.e. one consonant, or a sequence of consonants with a hasant between each pair, plus optional additional combining characters (such as vowel-signs).</p>
<p>In Bengali an orthographic syllable that forms a conjunct should be treated as an indivisible unit of text for most editing operations. <a href="#fig_grapheme_conjunct"></a> shows a Bengali word with a conjunct at the end, and the expected segmentation.</p>
<figure id="fig_grapheme_conjunct">
<p class="large"><span class="ex" lang="bn">ঝিল্লি → ঝি+ল্লি</span></p>
<figcaption>Expected minimal units (right) during segmentation of the word <span class="charExample" translate="no"><span class="ex" lang="bn">ঝিল্লি</span> <span class="trans">jhilli</span></span>.</figcaption>
</figure>
<p>If, however, a conjunct is not formed and the hasant is visible, the first consonant plus hasant would be treated as separate from the second consonant, and the vowel-sign would appear to the left of the second consonant (see <a href="#fig_grapheme_sequence"></a>).</p>
<figure id="fig_grapheme_sequence">
<p class="large"><span class="ex" lang="bn">ঝিল্লি → ঝি+ল্+লি</span></p>
<figcaption>Expected segmentation of the word <span class="charExample" translate="no"><span class="ex" lang="bn">ঝিল্লি</span> <span class="trans">jhilli</span></span> when there is no conjunct.</figcaption>
</figure>
<p>Note that in Bengali an orthographic syllable may be longer than a Unicode grapheme cluster, if it forms a conjunct. <a href="#fig_grapheme_cluster"></a> shows a Bengali word with a conjunct at the end, and the segmentation that would result from applying Unicode grapheme clusters only.</p>
<figure id="fig_grapheme_cluster">
<p class="large"><span class="ex" lang="bn">ঝিল্লি → ঝি+ল্+লি</span></p>
<figcaption>Segmentation of the word <span class="charExample" translate="no"><span class="ex" lang="bn">ঝিল্লি</span> <span class="trans">jhilli</span></span> with a conjunct when using Unicode grapheme clusters.</figcaption>
</figure>
<p>For Bengali, applications need to provide tailored extensions to correctly segment the text. Such tailoring needs to be able to distinguish between sequences that are displayed as conjuncts, and those where the hasant is visible.</p>
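<p>The following is a minimal, purely illustrative sketch of such tailoring, written in Python and not tied to any Web technology. It rests on two assumptions that are not stated in this document: that a visible hasant is signalled in the text by U+200C ZERO WIDTH NON-JOINER after the hasant, and that every other consonant+hasant+consonant sequence is displayed as a conjunct. In practice, whether a conjunct is formed can also depend on the font, which this sketch cannot detect. The function name <code>orthographic_syllables</code> is hypothetical.</p>
<pre class="example">
```python
import unicodedata

HASANT = "\u09CD"  # BENGALI SIGN VIRAMA (hasant)
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER, assumed here to mark a visible hasant


def orthographic_syllables(text):
    """Split text into approximate orthographic syllables (hypothetical helper)."""
    units = []
    for ch in text:
        category = unicodedata.category(ch)
        if not units:
            attach = False
        elif ch == ZWNJ or category.startswith("M"):
            # Combining marks and ZWNJ always stay with the preceding unit.
            attach = True
        else:
            # A letter joins the preceding unit only when it directly follows a hasant.
            attach = category == "Lo" and units[-1].endswith(HASANT)
        if attach:
            units[-1] += ch
        else:
            units.append(ch)
    return units
```
</pre>
<p>Under these assumptions, the word <span class="ex" lang="bn">ঝিল্লি</span> is split into the two units <span class="ex" lang="bn">ঝি</span> and <span class="ex" lang="bn">ল্লি</span>, whereas the same word with a ZERO WIDTH NON-JOINER after the hasant is split into three units.</p>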
</section>
<section id="h_words">
<h3>Word boundaries</h3>
<p>Words are separated by spaces.</p>
</section>
<section id="h_danda">
<h3 class="reviewme">Phrase boundaries: Danda & double danda</h3>
<p><span class="codepoint"><span lang="bn">।</span> <a href="/scripts/bengali/block#char0964">[<span class="uname">U+0964 DEVANAGARI DANDA</span>]</a></span> is used as sentence-final punctuation.</p>
<p>There are two alternative approaches to the use of spaces with danda:</p>
<ol>
<li>No space character appears between the end of the phrase and the danda glyph, but the advance width of the danda in a font should open a small gap before it. The danda is then typically followed by a single space.</li>
<li>A space is allowed before and after the danda in order to balance the space before and after it. In this case, the danda must still be kept from wrapping to a new line on its own; it should wrap with the previous word and space together.</li>
</ol>
<p>These same principles apply to <span class="codepoint"><span lang="bn">॥</span> <a href="/scripts/bengali/block#char0965">[<span class="uname">U+0965 DEVANAGARI DOUBLE DANDA</span>]</a></span>.</p>
<p>The double danda should be written using the dedicated Unicode character, and not by combining two single dandas.</p>
<p>The double danda is sometimes used to set apart section or verse numbers, with the number placed between two double dandas. To obtain the correct spacing, the character sequence is usually &lt;double danda, space, numeral(s), double danda&gt;.</p>
</section>
<section id="h_quotations">
<h3 class="reviewme">Quotations</h3>
<p>The default quote marks for Bengali should be <span class="codepoint" translate="no"><span lang="bn">“</span> [<span class="uname">U+201C LEFT DOUBLE QUOTATION MARK</span>]</span> at the start, and <span class="codepoint" translate="no"><span lang="bn">”</span> [<span class="uname">U+201D RIGHT DOUBLE QUOTATION MARK</span>]</span> at the end. </p>
<p>When an additional quote is embedded within the first, the quote marks should be <span class="ex" lang="bn"><span class="codepoint" translate="no"><span lang="bn">‘</span> [<span class="uname">U+2018 LEFT SINGLE QUOTATION MARK</span>]</span> and <span class="codepoint" translate="no"><span lang="bn">’</span> [<span class="uname">U+2019 RIGHT SINGLE QUOTATION MARK</span>]</span></span>. <span class="ednote">This is according to CLDR – need to check.</span></p>
</section>
<section id="h_segmentation">
<h3>Font styles</h3>
<p>Italics and bold are not traditional features of Bengali text.</p>
</section>
<section id="h_text_decoration">
<h3>Text decoration</h3>
<p>Underlining is not a traditional feature of Bengali text.</p>
</section>
</section>
<section id="h_lines_and_paragraphs">
<h2>Line & paragraph layout</h2>
<section id="h_line_breaking">
<h3 class="reviewme">Line breaking</h3>
<p>The primary break opportunities for line breaking are at inter-word spaces.</p>
<p>If a line is broken inside a word, any consonant clusters should be kept intact unless they are separated by visible hasant characters (see <a href="#h_graphemes"></a>).</p>
<p>Line breaking should not move a danda or double danda to the beginning of a new line, even if they are preceded by a space character. These punctuation characters should behave in the same way as a full stop does in English text.</p>
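<p>As a rough illustration of the danda rule only, the sketch below lists the positions at which a new line may start. It assumes that break opportunities occur only after U+0020 SPACE, and it ignores breaks inside words. The function name <code>break_opportunities</code> is hypothetical.</p>
<pre class="example">
```python
DANDAS = {"\u0964", "\u0965"}  # danda and double danda


def break_opportunities(text):
    """Return the indices at which a new line may start (hypothetical helper)."""
    points = []
    for i in range(1, len(text)):
        follows_space = text[i - 1] == " "
        # Never start a line with a danda, a double danda or another space.
        if follows_space and text[i] != " " and text[i] not in DANDAS:
            points.append(i)
    return points
```
</pre>
<p>For a string such as <span class="ex" lang="bn">আমি যাই । তুমি</span>, the position before the danda is not reported, so the danda wraps together with the preceding word and space.</p>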
</section>
<section id="h_counters">
<h3 class="reviewme">Counters</h3>
<p>Counters are used to number lists, chapter headings, etc. </p>
<p>Bengali uses a numeric counter style based on the decimal model, using the standard Bengali digits '০' '১' '২' '৩' '৪' '৫' '৬' '৭' '৮' '৯'.</p>
<div class="figwrap">
<figure id="counter-styles">
<p class="large">1 ⇨ <span class="ex" lang="bn">১</span> 2 ⇨ <span class="ex" lang="bn">২</span> 3 ⇨ <span class="ex" lang="bn">৩</span> 4 ⇨ <span class="ex" lang="bn">৪</span> <br>
11 ⇨ <span class="ex" lang="bn">১১</span> 22 ⇨ <span class="ex" lang="bn">২২</span> 33 ⇨ <span class="ex" lang="bn">৩৩</span> 44 ⇨ <span class="ex" lang="bn">৪৪</span> <br>
111 ⇨ <span class="ex" lang="bn">১১১</span> 2222 ⇨ <span class="ex" lang="bn">২২২২</span></p>
<figcaption>Examples of counter values using the Bengali numeric counter style.</figcaption>
</figure>
</div>
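<p>The decimal mapping described above can be sketched as a simple digit-for-digit substitution. This is an illustration only; the names <code>BENGALI_DIGITS</code> and <code>bengali_counter</code> are hypothetical, and the sketch assumes a non-negative integer counter value.</p>
<pre class="example">
```python
BENGALI_DIGITS = "০১২৩৪৫৬৭৮৯"  # U+09E6 to U+09EF, in order 0 to 9

# Translation table from ASCII digits to the corresponding Bengali digits.
_TO_BENGALI = str.maketrans("0123456789", BENGALI_DIGITS)


def bengali_counter(value):
    """Format a non-negative integer using Bengali digits (hypothetical helper)."""
    return str(value).translate(_TO_BENGALI)
```
</pre>
<p>With this sketch, the counter value 11 is rendered as <span class="ex" lang="bn">১১</span> and 2222 as <span class="ex" lang="bn">২২২২</span>, matching the figure above.</p>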
</section>
</section>
<!--section class="appendix" id="glossary">
<h2>Glossary</h2>
<table class="glossary">
<thead>
<tr>
<th>Term</th>
<th>XXXX</th>
<th>Transliteration</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr id="def_alignment">
<td>alignment</td>
<td class="rtlTermCell"> </td>
<td class="rtlTermCell"> </td>
<td><br></td>
</tr>
</tbody>
</table>
</section-->
<section class="appendix" id="acknowledgements">
<h2>Acknowledgements</h2>
<p>Special thanks to the following people who contributed to this document (contributors' names listed in alphabetical order).</p>
<p>Akshat Joshi, Hai Liang, John Hudson, Vivek Pani.</p>
<p data-lang="en">For the latest information about contributors, see the <a href="https://github.com/w3c/iip/graphs/contributors">GitHub contributors list</a>.</p>
</section>
</body>
</html>