Setting up a Gap Analysis Project

Fuqiao Xue edited this page Jan 16, 2024 · 26 revisions

This document explains how to set up and work on a gap analysis project at the W3C. Gap analysis projects look at how well a language or set of languages is supported on the Web in terms of text layout, and create proposals for improving support where needed. Such work allows us to understand which languages need attention, and what specifically needs to be addressed, so that users have a satisfactory experience when creating content for the Web in general and for digital publishing in particular.

For more detailed guidance, see also Stages in the development of layout information.

A gap analysis can be done at two levels.

  1. A preliminary review: typically done quickly by one or two people, in order to give a rough idea of what to expect from a proper gap-analysis. It doesn’t contain validated information and may be short on detail.

  2. A layout task force. This is the real gap analysis project, and it brings together experts for a given language or set of languages to discuss and work together. W3C already has a framework and set of procedures for establishing and running such task forces. Some task forces work in their native language, although the information needs to be communicated in English for consumption by the worldwide community.

Choosing the languages

The first step in creating a task force is for the participants to define its scope in terms of what languages it will cover. If the task force plans to work on more than one language, there should be some relationship between the languages or the communities represented that provides cohesion for the group. The set of languages may begin small, in order to be manageable, and grow later.

Setting up a task force

Each task force is set up under the W3C Internationalization Interest Group. It has one or more designated chairs, and a pool of experts who are committed to doing the actual work. Task force members are expected to actively contribute by creating content, doing reviews, providing regular advice, creating tests, etc.

Other people can follow the work and send comments by subscribing to the task force mailing list or GitHub repos.

A proposal for a new task force should begin by developing a charter. This charter doesn't need review by W3C management, and can be changed at any time. Its main purpose is to communicate the goals and intentions of the group so that there is a common understanding and agreement about how they will work together. Participants joining the group must read and agree to abide by the charter, which mentions things such as the scope of the work, and how the group will communicate, as well as IP commitments and expected behaviour of group members.

The W3C i18n lead performs the mechanics of setting up the task force structure, and adding participants to the group, but the group chairs and current group members are responsible for proposing and recruiting new participants. The W3C i18n lead will also set up repositories, with template documents and directory structures.

Typically, each task force will have a GitHub repository and a couple of email lists. The technical discussion should take place using GitHub issues. An archived email list is available for administrative posts (eg. meeting agenda), and another for other occasional use (esp. sending out meeting minutes). The W3C i18n lead will also set up a notifications system that will send daily (and weekly, if desired) digests to the mailing list. The daily digest summarises changes to issues and pull requests over the previous 24 hours.

The group should aim to hold a meeting, either face-to-face or via telecon, at least once a month, but preferably more often. Groups that do not hold regular meetings typically fall into inactivity quite quickly.

The work of the task force

The task force should aim to do the following.

The gap-analysis report

The group reviews the features listed in the language matrix, for the languages in scope, to determine which features are not supported and, for each of those, the impact of that gap on the user. The features reported on should include those commonly required on a regular basis for Web content, and those needed for rendering beyond the basic expectations, ie. at the level of usage in high-quality books (excluding "art books") or magazines in the culture.

The focus is primarily on support for modern usage. There may also be features of the writing system for a given language that exist but are not really needed on the Web: those do not need to be reported on.

A template for the gap analysis report can be obtained from the W3C staff contact. See an example for a preliminary review of Amharic/Tigriña. Using the template is simple: just add a class name to each section to indicate the impact, and then write the text using HTML markup within the section. The Amharic/Tigriña page is only a preliminary review, and a proper review may contain more details and links. The formal description of the requirements, however, will be done in a separate document described in the next section.

The gap analysis report should be quite specific about what is not supported by browsers or e-readers, so that bugs and actions can be raised. It should ideally include links to tests and/or screen snaps to illustrate the failures described. It should also mention specifically which applications or browsers, and which versions, fail to support the expected behaviour.

Gap analysis reports address one or more languages, rather than scripts. A task force can have more than one gap-analysis report (for instance, an Indic task force may have separate gap-analysis documents per script). Where a single report covers behaviour that is language-specific (eg. Hindi differs from Marathi), the report should clearly identify the language under discussion.

The keyword used to indicate priority should NOT describe how badly broken a feature is. Instead, it should describe the impact of the lack of that feature on the language user.
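The reporting requirements above (named applications and versions, links to tests or screen snaps, and a priority keyword based on user impact) can be pictured as a small record with a completeness check. The following Python sketch is purely illustrative: the class names, the impact keywords and the sample values are all hypothetical, and the actual report is an HTML document based on the template, which defines its own keywords.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Impact(Enum):
    """Hypothetical keywords. They describe the effect on the language
    user, not how badly broken the feature is."""
    BLOCKS_BASIC_USE = "blocks-basic-use"
    LIMITS_ADVANCED_USE = "limits-advanced-use"
    NO_GAP = "no-gap"


@dataclass
class Failure:
    application: str  # browser or e-reader that fails
    version: str      # version in which the failure was observed
    evidence: List[str] = field(default_factory=list)  # test or screen snap links


@dataclass
class GapEntry:
    feature: str
    language: str     # reports address languages rather than scripts
    impact: Impact
    failures: List[Failure] = field(default_factory=list)

    def problems(self) -> List[str]:
        """List reasons why this entry is not yet specific enough to act on."""
        issues = []
        if self.impact is not Impact.NO_GAP and not self.failures:
            issues.append("no failing application listed")
        for failure in self.failures:
            if not failure.version:
                issues.append(f"{failure.application}: version missing")
            if not failure.evidence:
                issues.append(f"{failure.application}: no test or screenshot link")
        return issues


# Placeholder data only; it does not describe a real gap or a real browser.
example = GapEntry(
    feature="(feature name)",
    language="(language under discussion)",
    impact=Impact.LIMITS_ADVANCED_USE,
    failures=[Failure("ExampleBrowser", "1.0",
                      ["https://example.org/tests/placeholder.html"])],
)
```

An entry for which `problems()` returns an empty list names at least one failing application, with a version and a piece of supporting evidence, which is the level of detail needed before bugs and actions can be raised.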

Information from the gap analysis will be summarised on the overall language matrix by the Internationalization Working Group.

Tests

Task force participants are encouraged to create tests to show how features are missing. These tests can then be re-used later to establish whether, or to demonstrate that, the feature is supported.

The language enablement framework provides for the creation of interactive exploratory tests, which are very simple to set up and record. This approach is likely to be most useful for the gap-analysis work.

Tests may also follow the pattern required for the Web Platform Tests initiative. There should be one file per test, and each test should have an assertion that indicates what is being tested, and a description of how to determine whether the test has passed.
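As a rough illustration of the "one assertion plus a pass condition per file" rule, the Python sketch below scans a single test file and reports what is missing. It assumes the assertion is carried in a `<meta name="assert">` element and that the pass condition sits in an element with the id `instructions`. Both markers, the names `AssertionScanner` and `check_test_file`, and the sample file are assumptions made for this example, so check Writing i18n tests for the conventions actually required.

```python
from html.parser import HTMLParser


class AssertionScanner(HTMLParser):
    """Collects the assertion and the pass-condition text from one test file."""

    def __init__(self):
        super().__init__()
        self.assertion = None
        self.instructions = ""
        self._instructions_tag = None  # tag name of the open instructions element

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and attributes.get("name") == "assert":
            self.assertion = attributes.get("content")
        elif attributes.get("id") == "instructions":
            self._instructions_tag = tag

    def handle_endtag(self, tag):
        # Simplification: stops at the first end tag with the same name.
        if tag == self._instructions_tag:
            self._instructions_tag = None

    def handle_data(self, data):
        if self._instructions_tag is not None:
            self.instructions += data


def check_test_file(html_text):
    """Return a list of problems; an empty list means both parts are present."""
    scanner = AssertionScanner()
    scanner.feed(html_text)
    scanner.close()
    problems = []
    if not (scanner.assertion or "").strip():
        problems.append("missing assertion")
    if not scanner.instructions.strip():
        problems.append("missing pass condition")
    return problems


# Placeholder test file, for illustration only.
SAMPLE = """<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Placeholder feature</title>
<meta name="assert" content="Placeholder statement of what is being tested.">
</head>
<body>
<p id="instructions">Test passes if (placeholder pass condition).</p>
</body>
</html>"""
```

Running `check_test_file` over each file in a test directory gives a quick check that no test has been committed without its assertion or its pass condition.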

For explanations of how to create both types of test, see Writing i18n tests.

Layout requirements document

Wherever it is found that a feature is unsupported, the requirements for that feature need to be documented. This documentation is added to the layout requirements (lreq) document.

The lreq document should contain no reference to a particular technology (for example, it should not say "CSS does/doesn't do such and such"). It should be technology agnostic, so that it will be evergreen.

The lreq document should contain plenty of illustrations and examples. It should also be quite specific about how the writing system works. For example, when dealing with initial letter styling (eg. drop caps in English), it should describe the normal size of the initial letter relative to the adjacent text and how ascenders/descenders and diacritics are handled, the anchor points for top and bottom of the letter relative to the adjacent text and whether it should connect (eg. for Arabic or Hindi), what to do if there is punctuation (such as an opening quote), how syllabic structures are addressed, whether initial letters are typically dropped, sunken or raised, whether boxes or background colouring are used around the character, how the adjacent text adapts itself (if relevant) to the shape of the highlighted initial, and so on. In other words, rather than simply stating that a feature is used and giving a picture of it, the document should explain clearly what the feature should look like when implementers or spec writers create support for it (without being technology specific!). It shouldn't, however, attempt to tell implementers or spec writers how to create the effect described.

A good model for an lreq document is the Requirements for Japanese Text Layout (jlreq) document.

The minimal requirement, when writing the lreq document, should be to describe the requirements for features needing support per the gap analysis. However, if there is sufficient bandwidth in the group, the document can also describe other features of the writing system. It is also useful to have an introductory section that gives a high level overview of how the writing system works for the languages in question.

When the lreq document reaches a sufficient level of stability, it should be converted to a First Public Working Draft (FPWD) on the W3C site. The W3C Internationalization Working Group will publish the document if they are satisfied after an initial review. Publication as a FPWD invites wide review of the information so far available. Additional wide reviews can be initiated as more information is added. The editor's version of the document is maintained in GitHub, but after FPWD, there should be a snapshot on the w3.org/TR/ location that is regularly synchronised with the editor's copy (using Echidna). The final goal for this document is to be published as a Working Group Note (by the W3C Internationalization Working Group).

Follow up

Having identified needed features, and documented the requirements, the group should begin raising requests for the features to be supported in specifications and in applications. The W3C has a tracking system that can be used to track such requests and their ensuing discussions.

Read a summary of how the work interacts with other aspects of the work of the Internationalization Working Group, and how the feature categorisation is used throughout a number of different areas.

Oversight

The group and its chairs will be supported by the chair of the W3C i18n IG, and the W3C i18n WG. The W3C i18n lead will, as mentioned earlier, set up the group, its repos and its mailing lists, and offer advice to the group on how to move forward. The W3C i18n WG will publish the lreq documents to W3C space.

Related links