This repository has been archived by the owner on Apr 26, 2022. It is now read-only.

characters/encodings best practices #42

Open
RachelComerford opened this issue May 9, 2018 · 1 comment
Labels
Best Practice use for Best Practice documentation questions and suggestions

Comments

@RachelComerford
Contributor

From BISG survey

@RachelComerford RachelComerford added the Best Practice use for Best Practice documentation questions and suggestions label May 9, 2018
@johnlourdusamy

johnlourdusamy commented Jul 10, 2018

Character encoding in HTML

Below are some best practices for character encoding in HTML files:

  1. Always declare the encoding of your document using a meta element with a @charset attribute.
  2. The element carrying the declaration must fit completely within the first 1024 bytes of the file, so it's best to put it immediately after the opening <head> tag.
  3. You should always use the UTF-8 character encoding.
```html
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" xml:lang="en" lang="en">
<head>
<meta charset="utf-8" />
...
```
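The first three points above can be checked mechanically. Below is a minimal Python sketch, assuming a rough check is enough: it looks for a complete meta element carrying a charset declaration inside the first 1024 bytes and compares the label to "utf-8" case-insensitively. The helper names find_declared_charset and declares_utf8 are hypothetical, and the regular expressions only approximate the prescan algorithm in the HTML standard (for example, a meta element inside a comment is still counted).

```python
import re
from typing import Optional

# Simplified patterns (an assumption, not the HTML standard's prescan algorithm).
# META_TAG only matches an element whose closing '>' is present, so an element
# that straddles the 1024-byte boundary is not found in the truncated prefix.
META_TAG = re.compile(rb"<meta\b[^>]*>", re.IGNORECASE)
# Catches both @charset="..." and the charset=... inside a pragma's @content.
CHARSET_ATTR = re.compile(rb"""\bcharset\s*=\s*["']?\s*([^"'\s/>;]+)""", re.IGNORECASE)


def find_declared_charset(data: bytes, limit: int = 1024) -> Optional[str]:
    """Return the first charset label declared by a <meta> element that fits
    completely within the first `limit` bytes, or None if there is none."""
    head = data[:limit]
    for tag in META_TAG.finditer(head):
        attr = CHARSET_ATTR.search(tag.group(0))
        if attr:
            return attr.group(1).decode("ascii", "replace")
    return None


def declares_utf8(data: bytes) -> bool:
    """True if the early declaration names UTF-8, ignoring case."""
    label = find_declared_charset(data)
    return label is not None and label.lower() == "utf-8"


if __name__ == "__main__":
    good = b'<!DOCTYPE html>\n<html>\n<head>\n<meta charset="UTF-8" />\n'
    late = b"<!DOCTYPE html>\n<head>\n<!--" + b"x" * 1100 + b'--><meta charset="utf-8" />'
    print(declares_utf8(good))  # True
    print(declares_utf8(late))  # False: the declaration starts after byte 1024
```

Working on bytes rather than decoded text is deliberate: the 1024-byte limit is measured before the encoding is known.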

  4. It doesn't matter whether you type UTF-8 or utf-8: encoding names are matched case-insensitively.
  5. Avoid the pragma directive, i.e. a meta element with @http-equiv="Content-Type" and @content="text/html; charset=utf-8". The HTML living standard still allows it as an alternative to @charset, but the @charset form is shorter and easier to get right, and a document must not contain both declarations.
  6. Avoid using an XML declaration such as <?xml version="1.0" encoding="utf-8"?> in XHTML5 documents. UTF-8 is already the default encoding for XML, so the declaration is redundant.
  7. Avoid saving files as UTF-8 with a byte order mark (BOM). Some user agents display the BOM as stray characters, and a BOM can break PHP includes, because the BOM bytes are sent as output.
  8. If possible, choose an editor, or configure the one you use, so that it does not write a BOM to UTF-8 files.
    1. For example, Notepad on Windows (at the time of writing) always adds a BOM when you save a file with the UTF-8 encoding.
    2. You can find out whether a document contains a BOM, either at the start or further down in the content, with the W3C Internationalization Checker: https://validator.w3.org/i18n-checker/
    3. To remove a BOM, use an editor such as Notepad++ on Windows or TextWrangler on the Mac, which let you select the encoding from a list in the Save As dialog. Choose the UTF-8 option without the BOM and save.
  9. HTML5 made the @charset attribute on the a, link, and script elements obsolete, so avoid it. For example, do not write:
    ```html
    See our <a href="/mysite/mydoc.html" charset="iso-8859-15">list of publications</a>.
    ```
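The BOM checks described above can also be scripted. The following is a minimal Python sketch, assuming files are read as raw bytes; the function names bom_offsets, strip_leading_bom and remove_bom_from_file are hypothetical. It reports every UTF-8 BOM (the bytes EF BB BF, which is U+FEFF encoded in UTF-8), whether at the start or further down in the content, but only removes a leading one: a stray U+FEFF in the middle of a file is left for manual review.

```python
import codecs
from pathlib import Path
from typing import List

# The UTF-8 encoding of U+FEFF: b"\xef\xbb\xbf"
BOM = codecs.BOM_UTF8


def bom_offsets(data: bytes) -> List[int]:
    """Return the byte offset of every UTF-8 BOM in data.

    Offset 0 is a leading BOM; any other offset is a stray U+FEFF further
    down in the content (for example, from concatenated files).
    """
    offsets = []
    pos = data.find(BOM)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(BOM, pos + len(BOM))
    return offsets


def strip_leading_bom(data: bytes) -> bytes:
    """Drop one BOM from the very start of data, if there is one."""
    return data[len(BOM):] if data.startswith(BOM) else data


def remove_bom_from_file(path: Path) -> bool:
    """Rewrite path without its leading BOM; return True if one was removed.

    Stray BOMs later in the file are left in place for manual review.
    """
    data = path.read_bytes()
    cleaned = strip_leading_bom(data)
    if cleaned == data:
        return False
    path.write_bytes(cleaned)
    return True


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        page = Path(tmp) / "page.xhtml"
        page.write_bytes(BOM + b"<html></html>")
        print(bom_offsets(page.read_bytes()))  # [0]
        print(remove_bom_from_file(page))      # True
        print(bom_offsets(page.read_bytes()))  # []
```

Reading and writing bytes, rather than text, avoids Python's own codecs adding or hiding a BOM during the round trip.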

Source links:
https://www.w3.org/blog/2008/03/html-charset/
https://www.w3.org/International/questions/qa-html-encoding-declarations
https://developer.mozilla.org/en-US/docs/Web/HTML/Element/meta
