1841dd9 @havocp Add a real README, license, and rewrite spec.
havocp authored
1 # HOCON (Human-Optimized Config Object Notation)
2
3 This is an informal spec, but hopefully it's clear.
4
5 ## Goals / Background
6
7 The primary goal is: keep the semantics (tree structure; set of
8 types; encoding/escaping) from JSON, but make it more convenient
9 as a human-editable config file format.
10
11 The following features are desirable, to support human usage:
12
13 - less noisy / less pedantic syntax
14 - ability to refer to another part of the configuration (set a value to
15 another value)
16 - import/include another configuration file into the current file
17 - a mapping to a flat properties list such as Java's System properties
18 - ability to get values from environment variables
19 - ability to write comments
20
21 Implementation-wise, the format should have these properties:
22
23 - a JSON superset, that is, all valid JSON should be valid and
24 should result in the same in-memory data that a JSON parser
25 would have produced.
26 - be deterministic; the format is flexible, but it is not
27 heuristic. It should be clear what's invalid and invalid files
28 should generate errors.
29 - require minimal look-ahead; should be able to tokenize the file
30 by looking at only the next three characters. (right now, the
31 only reason to look at three is to find "//" comments;
32 otherwise you can parse looking at two.)
33
34 HOCON is significantly harder to specify and to parse than
35 JSON. Think of it as moving the work from the person maintaining
36 the config file to the computer program.
37
38 ## Definitions
39
- a _key_ is a string JSON would have to the left of `:` and a _value_ is
anything JSON would have to the right of `:`, i.e. the two
halves of an object _field_.
43
44 - a _value_ is any "value" as defined in the JSON spec, plus
45 unquoted strings and substitutions as defined in this spec.
46
47 - a _simple value_ is any value excluding an object or array
48 value.
49
50 - a _field_ is a key, any separator such as ':', and a value.
51
52 - references to a _file_ ("the file being parsed") can be
53 understood to mean any byte stream being parsed, not just
54 literal files in a filesystem.
55
56 ## Syntax
57
58 Much of this is defined with reference to JSON; you can find the
59 JSON spec at http://json.org/ of course.
60
61 ### Unchanged from JSON
62
63 - files must be valid UTF-8
64 - quoted strings are in the same format as JSON strings
65 - values have possible types: string, number, object, array, boolean, null
- allowed number formats match JSON; as in JSON, some possible
floating-point values are not represented, such as `NaN`
68
69 ### Comments
70
71 Anything between `//` or `#` and the next newline is considered a comment
72 and ignored, unless the `//` or `#` is inside a quoted string.
73
74 ### Omit root braces
75
76 JSON documents must have an array or object at the root. Empty
77 files are invalid documents, as are files containing only a
78 non-array non-object value such as a string.
79
80 In HOCON, if the file does not begin with a square bracket or
81 curly brace, it is parsed as if it were enclosed with `{}` curly
82 braces.
83
84 A HOCON file is invalid if it omits the opening `{` but still has
85 a closing `}`; the curly braces must be balanced.
86
87 ### Key-value separator
88
89 The `=` character can be used anywhere JSON allows `:`, i.e. to
90 separate keys from values.
91
If a key is followed by `{`, the `:` or `=` may be omitted. So
`"foo" {}` means `"foo" : {}`.
94
95 ### Commas
96
97 Values in arrays, and fields in objects, need not have a comma
98 between them as long as they have at least one ASCII newline
99 (`\n`, decimal value 10) between them.
100
101 The last element in an array or last field in an object may be
102 followed by a single comma. This extra comma is ignored.
103
104 - `[1,2,3,]` and `[1,2,3]` are the same array.
105 - `[1\n2\n3]` and `[1,2,3]` are the same array.
106 - `[1,2,3,,]` is invalid because it has two trailing commas.
107 - `[,1,2,3]` is invalid because it has an initial comma.
108 - `[1,,2,3]` is invalid because it has two commas in a row.
109 - these same comma rules apply to fields in objects.
110
111 ### Whitespace
112
113 The JSON spec simply says "whitespace"; in HOCON whitespace is
114 defined as follows:
115
116 - any Unicode space separator (Zs category), line separator (Zl
117 category), or paragraph separator (Zp category), including
118 nonbreaking spaces (such as 0x00A0, 0x2007, and 0x202F).
- tab (`\t` 0x0009), newline (`\n` 0x000A), vertical tab (`\v`
0x000B), form feed (`\f` 0x000C), carriage return (`\r`
0x000D), file separator (0x001C), group separator (0x001D),
record separator (0x001E), unit separator (0x001F).
123
124 In Java, the `isWhitespace()` method covers these characters with
125 the exception of nonbreaking spaces.
126
127 While all Unicode separators should be treated as whitespace, in
128 this spec "newline" refers only and specifically to ASCII newline
129 0x000A.
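
The whitespace definition above can be sketched as a small predicate. This is only an illustration in Python, not part of the spec; the names `is_hocon_whitespace` and `is_hocon_newline` are made up for the example.

```python
import unicodedata

# ASCII control characters that the list above names explicitly.
_EXTRA_WHITESPACE = {
    "\t", "\n", "\v", "\f", "\r",
    "\x1c", "\x1d", "\x1e", "\x1f",
}

def is_hocon_whitespace(ch):
    """Return True if the single character `ch` counts as whitespace."""
    if ch in _EXTRA_WHITESPACE:
        return True
    # Zs = space separator, Zl = line separator, Zp = paragraph separator.
    # Nonbreaking spaces such as U+00A0 are in Zs, so they are covered here.
    return unicodedata.category(ch) in ("Zs", "Zl", "Zp")

def is_hocon_newline(ch):
    """Only ASCII newline is a "newline" for the purposes of this spec."""
    return ch == "\n"
```

Note that a character such as U+2028 (line separator) is whitespace under this predicate but is not a "newline", so it would not stand in for a comma between array elements.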
130
131 ### Duplicate keys
132
133 The JSON spec does not clarify how duplicate keys in the same
134 object should be handled. In HOCON, duplicate keys that appear
135 later override those that appear earlier, unless both values are
136 objects. If both values are objects, then the objects are merged.
137
138 Note: this would make HOCON a non-superset of JSON if you assume
139 that JSON requires duplicate keys to have a behavior. The
140 assumption here is that duplicate keys are invalid JSON.
141
142 To merge objects:
143
144 - add fields present in only one of the two objects to the merged
145 object.
146 - for non-object-valued fields present in both objects,
147 the field found in the second object must be used.
148 - for object-valued fields present in both objects, the
149 object values should be recursively merged according to
150 these same rules.
151
152 Object merge can be prevented by setting the key to another value
153 first. This is because merging is always done two values at a
154 time; if you set a key to an object, a non-object, then an object,
155 first the non-object falls back to the object (non-object always
156 wins), and then the object falls back to the non-object (no
157 merging, object is the new value). So the two objects never see
158 each other.
159
160 These two are equivalent:
161
162 {
163 "foo" : { "a" : 42 },
164 "foo" : { "b" : 43 }
165 }
166
167 {
168 "foo" : { "a" : 42, "b" : 43 }
169 }
170
171 And these two are equivalent:
172
173 {
174 "foo" : { "a" : 42 },
175 "foo" : null,
176 "foo" : { "b" : 43 }
177 }
178
179 {
180 "foo" : { "b" : 43 }
181 }
182
183 The intermediate setting of `"foo"` to `null` prevents the object merge.
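
The duplicate-key rules can be sketched as a pairwise merge. The following Python is a minimal illustration, assuming objects are represented as `dict` and that a parser hands over fields as `(key, value)` pairs in file order; `merge_value` and `build_object` are hypothetical names.

```python
def merge_value(earlier, later):
    """Combine two values set on the same key; `later` appears after `earlier`."""
    if isinstance(earlier, dict) and isinstance(later, dict):
        merged = dict(earlier)
        for key, value in later.items():
            if key in merged:
                # Present in both objects: apply the same rules recursively.
                merged[key] = merge_value(merged[key], value)
            else:
                # Present only in the later object: just add it.
                merged[key] = value
        return merged
    # At least one side is not an object, so the later value wins outright.
    return later

def build_object(fields):
    """Fold (key, value) pairs, in file order, into a single dict."""
    result = {}
    for key, value in fields:
        if key in result:
            result[key] = merge_value(result[key], value)
        else:
            result[key] = value
    return result
```

Because the fold only ever looks at two values at a time, an intermediate `null` replaces the first object before the second object arrives, which is why the two objects in the second example are never merged.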
184
185 ### Unquoted strings
186
187 A sequence of characters outside of a quoted string is a string
188 value if:
189
190 - it does not contain "forbidden characters" '$', '"', '{', '}',
191 '[', ']', ':', '=', ',', '+', '#', '`', '^', '?', '!', '@',
192 '*', '&', '\' (backslash), or whitespace.
193 - it does not contain the two-character string "//" (which
194 starts a comment)
195 - its initial characters do not parse as `true`, `false`, `null`,
196 or a number.
197
Unquoted strings are used literally; they do not support any kind
of escaping. Quoted strings may always be used as an alternative
when you need to write a character that is not permitted in an
unquoted string.
202
203 `truefoo` parses as the boolean token `true` followed by the
204 unquoted string `foo`. However, `footrue` parses as the unquoted
205 string `footrue`. Similarly, `10.0bar` is the number `10.0` then
206 the unquoted string `bar` but `bar10.0` is the unquoted string
207 `bar10.0`.
208
In general, once an unquoted string begins, it continues until a
forbidden character or the two-character string "//" is
encountered. Embedded (non-initial) booleans, nulls, and numbers
are not recognized as such; they are part of the string.
213
214 An unquoted string may not _begin_ with the digits 0-9 or with a
215 hyphen (`-`, 0x002D) because those are valid characters to begin a
216 JSON number. The initial number character, plus any valid-in-JSON
217 number characters that follow it, must be parsed as a number
218 value. Again, these characters are not special _inside_ an
219 unquoted string; they only trigger number parsing if they appear
220 initially.
221
222 Note that quoted JSON strings may not contain control characters
223 (control characters include some whitespace characters, such as
224 newline). This rule is from the JSON spec. However, unquoted
225 strings have no restriction on control characters, other than the
226 ones listed as "forbidden characters" above.
227
228 Some of the "forbidden characters" are forbidden because they
229 already have meaning in JSON or HOCON, others are essentially
230 reserved keywords to allow future extensions to this spec.
231
232 ### Value concatenation
233
234 The value of an object field or an array element may consist of
235 multiple values which are concatenated into one string.
236
237 Only simple values participate in value concatenation. Recall that
238 a simple value is any value other than arrays and objects.
239
240 As long as simple values are separated only by non-newline
241 whitespace, the _whitespace between them is preserved_ and the
242 values, along with the whitespace, are concatenated into a string.
243
244 Value concatenations never span a newline, or a character that is
245 not part of a simple value.
246
247 A value concatenation may appear in any place that a string may
248 appear, including object keys, object values, and array elements.
249
250 Whenever a value would appear in JSON, a HOCON parser instead
251 collects multiple values (including the whitespace between them)
252 and concatenates those values into a string.
253
254 Whitespace before the first and after the last simple value must
255 be discarded. Only whitespace _between_ simple values must be
256 preserved.
257
258 So for example ` foo bar baz ` parses as three unquoted strings,
259 and the three are value-concatenated into one string. The inner
260 whitespace is kept and the leading and trailing whitespace is
261 trimmed. The equivalent string, written in quoted form, would be
262 `"foo bar baz"`.
263
264 Value concatenation `foo bar` (two unquoted strings with
265 whitespace) and quoted string `"foo bar"` would result in the same
266 in-memory representation, seven characters.
267
268 For purposes of value concatenation, non-string values are
269 converted to strings as follows (strings shown as quoted strings):
270
271 - `true` and `false` become the strings `"true"` and `"false"`.
272 - `null` becomes the string `"null"`.
273 - quoted and unquoted strings are themselves.
274 - numbers should be kept as they were originally written in the
275 file. For example, if you parse `1e5` then you might render
276 it alternatively as `1E5` with capital `E`, or just `100000`.
277 For purposes of value concatenation, it should be rendered
278 as it was written in the file.
279 - a substitution is replaced with its value which is then
280 converted to a string as above, except that a substitution
281 which evaluates to `null` becomes the empty string `""`.
282 - it is invalid for arrays or objects to appear in a value
283 concatenation.
284
285 A single value is never converted to a string. That is, it would
286 be wrong to value concatenate `true` by itself; that should be
287 parsed as a boolean-typed value. Only `true foo` (`true` with
288 another simple value on the same line) should be parsed as a value
289 concatenation and converted to a string.
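
As a rough illustration of the rules in this section, the sketch below concatenates the parts found in one value position. It assumes a hypothetical tokenizer has already produced `(kind, text)` tuples, where `text` is the original text from the file, and it leaves substitutions out entirely.

```python
def concatenate(parts):
    """Turn the parts found in one value position into a single value.

    `parts` is a list of (kind, text) tuples in file order. `kind` is one of
    "string", "number", "true", "false", "null" or "whitespace"; `text` is the
    text as written in the file (for quoted strings, the unescaped content).
    """
    # Whitespace before the first and after the last value is discarded.
    start, end = 0, len(parts)
    while start < end and parts[start][0] == "whitespace":
        start += 1
    while end > start and parts[end - 1][0] == "whitespace":
        end -= 1
    parts = parts[start:end]
    if not parts:
        raise ValueError("no value present")

    if len(parts) == 1:
        # A single value keeps its type; it is never converted to a string.
        kind, text = parts[0]
        if kind == "true":
            return True
        if kind == "false":
            return False
        if kind == "null":
            return None
        if kind == "number":
            return float(text) if any(c in text for c in ".eE") else int(text)
        return text

    # Several values: join the original texts, inner whitespace included.
    return "".join(text for _kind, text in parts)
```

Keeping the original text of each number around is what lets `1e5 items` come out as `"1e5 items"` rather than `"100000.0 items"`.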
290
291 ### Path expressions
292
293 Path expressions are used to write out a path through the object
294 graph. They appear in two places; in substitutions, like
295 `${foo.bar}`, and as the keys in objects like `{ foo.bar : 42 }`.
296
297 Path expressions are syntactically identical to a value
298 concatenation, except that they may not contain
299 substitutions. This means that you can't nest substitutions inside
300 other substitutions, and you can't have substitutions in keys.
301
302 When concatenating the path expression, any `.` characters outside
303 quoted strings are understood as path separators, while inside
304 quoted strings `.` has no special meaning. So
305 `foo.bar."hello.world"` would be a path with three elements,
306 looking up key `foo`, key `bar`, then key `hello.world`.
307
308 The main tricky point is that `.` characters in numbers do count
309 as a path separator. When dealing with a number as part of a path
310 expression, it's essential to retain the _original_ string
311 representation of the number as it appeared in the file (rather
312 than converting it back to a string with a generic
313 number-to-string library function).
314
315 - `10.0foo` is a number then unquoted string `foo` and should
316 be the two-element path with `10` and `0foo` as the elements.
317 - `foo10.0` is an unquoted string with a `.` in it, so this would
318 be a two-element path with `foo10` and `0` as the elements.
319 - `foo"10.0"` is an unquoted then a quoted string which are
320 concatenated, so this is a single-element path.
321
322 Unlike value concatenations, path expressions are _always_
323 converted to a string, even if they are just a single value.
324
If you have an array element or object field value consisting of
the single value `true`, it's a value concatenation and retains its
character as a boolean value.
328
329 If you have a path expression (in a key or substitution) then it
330 must always be converted to a string, so `true` becomes the string
331 that would be quoted as `"true"`.
332
333 If a path element is an empty string, it must always be quoted.
334 That is, `a."".b` is a valid path with three elements, and the
335 middle element is an empty string. But `a..b` is invalid and
336 should generate an error. Following the same rule, a path that
337 starts or ends with a `.` is invalid and should generate an error.
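
The path rules above can be sketched as a splitter that works directly on the text of a path expression. This is a simplification for illustration: it assumes surrounding whitespace has already been trimmed, it does not check for forbidden characters or substitutions, and `split_path` is a hypothetical name. Working on the original text is what keeps `10.0foo` from being normalized as a number first.

```python
import json

def split_path(text):
    """Split the text of a path expression into a list of path elements."""
    elements = []
    current = []  # pieces of the element being built; "" counts as a piece
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == '"':
            # Find the closing quote, stepping over backslash escapes.
            j = i + 1
            while j < len(text) and text[j] != '"':
                j += 2 if text[j] == "\\" else 1
            if j >= len(text):
                raise ValueError("unterminated quoted string in path")
            # Quoted strings use JSON string syntax; '.' inside is not special.
            current.append(json.loads(text[i:j + 1]))
            i = j + 1
        elif ch == ".":
            if not current:
                raise ValueError("empty path element must be quoted")
            elements.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    if not current:
        raise ValueError("empty path element must be quoted")
    elements.append("".join(current))
    return elements
```

With this sketch, `split_path('a."".b')` yields three elements with an empty string in the middle, while `a..b`, `.a` and `a.` all raise an error.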
338
339 ### Paths as keys
340
341 If a key is a path expression with multiple elements, it is
342 expanded to create an object for each path element other than the
343 last. The last path element, combined with the value, becomes a
344 field in the most-nested object.
345
346 In other words:
347
348 foo.bar : 42
349
350 is equivalent to:
351
352 foo { bar : 42 }
353
354 and:
355
356 foo.bar.baz : 42
357
358 is equivalent to:
359
360 foo { bar { baz : 42 } }
361
362 and so on. These values are merged in the usual way; which implies
363 that:
364
365 a.x : 42, a.y : 43
366
367 is equivalent to:
368
369 a { x : 42, y : 43 }
370
371 Because path expressions work like value concatenations, you can
372 have whitespace in keys:
373
374 a b c : 42
375
376 is equivalent to:
377
378 "a b c" : 42
379
380 Because path expressions are always converted to strings, even
381 single values that would normally have another type become
382 strings.
383
384 - `true : 42` is `"true" : 42`
385 - `3.14 : 42` is `"3.14" : 42`
386
387 As a special rule, the unquoted string `include` may not begin a
388 path expression in a key, because it has a special interpretation
389 (see below).
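
The key expansion described in this section amounts to wrapping the value in one object per path element. A minimal Python sketch, assuming the key has already been split into its path elements and using the hypothetical name `expand_key`:

```python
def expand_key(path_elements, value):
    """Wrap `value` in one nested dict per path element.

    The last element ends up as the field in the most-nested object.
    """
    for element in reversed(path_elements):
        value = {element: value}
    return value
```

For example, `expand_key(["foo", "bar", "baz"], 42)` gives `{"foo": {"bar": {"baz": 42}}}`. The resulting single-field objects are then merged with any other fields of the enclosing object using the usual duplicate-key rules.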
390
391 ### Substitutions
392
393 Substitutions are a way of referring to other parts of the
394 configuration tree.
395
396 For substitutions which are not found in the configuration tree,
397 implementations may try to resolve them by looking at system
398 environment variables, Java system properties, or other external
399 sources of configuration.
400
401 The syntax is `${pathexpression}` where the `pathexpression` is a
402 path expression as described above. This path expression has the
403 same syntax that you could use for an object key.
404
405 Substitutions are not parsed inside quoted strings. To get a
406 string containing a substitution, you must use value concatenation
407 with the substitution in the unquoted portion:
408
409 key : ${animal.favorite} is my favorite animal
410
411 Or you could quote the non-substitution portion:
412
413 key : ${animal.favorite}" is my favorite animal"
414
415 Substitutions are resolved by looking up the path in the
416 configuration. The path begins with the root configuration object,
417 i.e. it is "absolute" rather than "relative."
418
419 Substitution processing is performed as the last parsing step, so
420 a substitution can look forward in the configuration. If a
421 configuration consists of multiple files, it may even end up
422 retrieving a value from another file. If a key has been specified
423 more than once, the substitution will always evaluate to its
424 latest-assigned value (the merged object or the last non-object
425 value that was set).
426
If a substitution does not match any value present in the
configuration, implementations may look up that substitution in
one or more external sources, such as a Java system property or an
environment variable. (More detail on this in a later section.)
431
432 If a configuration sets a value to `null` then it should not be
433 looked up in the external source. Unfortunately there is no way to
434 "undo" this in a later configuration file; if you have `{ "HOME" :
435 null }` in a root object, then `${HOME}` will never look at the
436 environment variable. There is no equivalent to JavaScript's
437 `delete` operation in other words.
438
439 If a substitution does not match any value present in the
440 configuration and is not resolved by an external source, it is
441 evaluated to `null`.
442
443 Substitutions are only allowed in object field values and array
444 elements (value concatenations), they are not allowed in keys or
445 nested inside other substitutions (path expressions).
446
447 A substitution is replaced with any value type (number, object,
448 string, array, true, false, null). If the substitution is the only
449 part of a value, then the type is preserved. Otherwise, it is
450 value-concatenated to form a string. There is one special rule:
451
452 - `null` is converted to an empty string, not the string `null`.
453
454 Because missing substitutions are evaluated to `null`, either
455 missing or explicitly-set-to-null substitutions become an empty
456 string when concatenated.
457
458 Circular substitutions are invalid and should generate an error.
459
460 Implementations must take care, however, to allow objects to refer
461 to paths within themselves. For example, this must work:
462
463 bar : { foo : 42,
464 baz : ${bar.foo}
465 }
466
467 Here, if an implementation resolved all substitutions in `bar` as
468 part of resolving the substitution `${bar.foo}`, there would be a
469 cycle. The implementation must only resolve the `foo` field in
470 `bar`, rather than recursing the entire `bar` object.
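
One way to get this behavior is to resolve lazily: walk the unresolved tree to the target path and resolve only the value found there. The Python below is a sketch under several assumptions that are not part of the spec: substitutions are represented by a made-up `Sub` marker class, external sources such as environment variables are ignored, and substitutions inside value concatenations are not handled.

```python
class Sub:
    """Hypothetical marker for an unresolved ${...} substitution."""
    def __init__(self, *path):
        self.path = path  # tuple of path elements, absolute from the root

def resolve(root):
    """Return a copy of `root` with every Sub replaced by its value."""
    in_progress = set()  # paths currently being looked up, for cycle detection

    def lookup(path):
        if path in in_progress:
            raise ValueError("circular substitution: " + ".".join(path))
        in_progress.add(path)
        try:
            node = root
            for key in path:
                if not isinstance(node, dict) or key not in node:
                    return None  # missing substitutions evaluate to null
                node = node[key]
                if isinstance(node, Sub):
                    node = lookup(node.path)
            # Resolve only the value at the target path, not its parents.
            return resolve_value(node)
        finally:
            in_progress.discard(path)

    def resolve_value(value):
        if isinstance(value, Sub):
            return lookup(value.path)
        if isinstance(value, dict):
            return {k: resolve_value(v) for k, v in value.items()}
        if isinstance(value, list):
            return [resolve_value(v) for v in value]
        return value

    return resolve_value(root)
```

With `bar : { foo : 42, baz : ${bar.foo} }`, looking up `bar.foo` walks through the unresolved `bar` object and resolves only `foo`, so no cycle is reported. A genuinely circular pair such as `a : ${b}, b : ${a}` raises an error.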
471
472 ### Includes
473
474 #### Include syntax
475
476 An _include statement_ consists of the unquoted string `include`
477 and a single quoted string immediately following it. An include
478 statement can appear in place of an object field.
479
480 If the unquoted string `include` appears at the start of a path
481 expression where an object key would be expected, then it is not
482 interpreted as a path expression or a key.
483
484 Instead, the next value must be a _quoted_ string. The quoted
485 string is interpreted as a filename or resource name to be
486 included.
487
488 Together, the unquoted `include` and the quoted string substitute
489 for an object field syntactically, and are separated from the
490 following object fields or includes by the usual comma (and as
491 usual the comma may be omitted if there's a newline).
492
493 If an unquoted `include` at the start of a key is followed by
494 anything other than a single quoted string, it is invalid and an
495 error should be generated.
496
497 There can be any amount of whitespace, including newlines, between
498 the unquoted `include` and the quoted string.
499
500 Value concatenation is NOT performed on the "argument" to
501 `include`. The argument must be a single quoted string. No
502 substitutions are allowed, and the argument may not be an unquoted
503 string or any other kind of value.
504
505 Unquoted `include` has no special meaning if it is not the start
506 of a key's path expression.
507
508 It may appear later in the key:
509
510 # this is valid
511 { foo include : 42 }
512 # equivalent to
513 { "foo include" : 42 }
514
515 It may appear as an object or array value:
516
517 { foo : include } # value is the string "include"
518 [ include ] # array of one string "include"
519
520 You can quote `"include"` if you want a key that starts with the
521 word `"include"`, only unquoted `include` is special:
522
523 { "include" : 42 }
524
525 #### Include semantics: merging
526
527 An _including file_ contains the include statement and an
528 _included file_ is the one specified in the include statement.
529 (They need not be regular files on a filesystem, but assume they
530 are for the moment.)
531
532 An included file must contain an object, not an array. This is
533 significant because both JSON and HOCON allow arrays as root
534 values in a document.
535
536 If an included file contains an array as the root value, it is
537 invalid and an error should be generated.
538
539 The included file should be parsed, producing a root object. The
540 keys from the root object are conceptually substituted for the
541 include statement in the including file.
542
543 - If a key in the included object occurred prior to the include
544 statement in the including object, the included key's value
545 overrides or merges with the earlier value, exactly as with
546 duplicate keys found in a single file.
547 - If the including file repeats a key from an earlier-included
548 object, the including file's value would override or merge
549 with the one from the included file.
550
551 #### Include semantics: substitution
552
553 Recall that substitution happens as a final step, _after_
554 parsing. It should be done for the entire app's configuration, not
555 for single files in isolation.
556
557 Therefore, if an included file contains substitutions, they must
558 be "fixed up" to be relative to the app's configuration root.
559
560 Say for example that the root configuration is this:
561
562 { a : { include "foo.conf" } }
563
564 And "foo.conf" might look like this:
565
566 { x : 10, y : ${x} }
567
568 If you parsed "foo.conf" in isolation, then `${x}` would evaluate
569 to 10, the value at the path `x`. If you include "foo.conf" in an
570 object at key `a`, however, then it must be fixed up to be
571 `${a.x}` rather than `${x}`.
572
573 Say that the root configuration redefines `a.x`, like this:
574
575 {
576 a : { include "foo.conf" }
577 a : { x : 42 }
578 }
579
580 Then the `${x}` in "foo.conf", which has been fixed up to
581 `${a.x}`, would evaluate to `42` rather than to `10`.
582 Substitution happens _after_ parsing the whole configuration.
583
584 #### Include semantics: missing files
585
586 If an included file does not exist, the include statement should
587 be silently ignored (as if the included file contained only an
588 empty object).
589
590 #### Include semantics: file formats and extensions
591
592 Implementations may support including files in other formats.
593 Those formats must be compatible with the JSON type system, or
594 have some documented mapping to JSON's type system.
595
596 If an implementation supports multiple formats, then the extension
597 may be omitted from the name of included files:
598
599 include "foo"
600
601 If a filename has no extension, the implementation should treat it
602 as a basename and try loading the file with all known extensions.
603
604 If the file exists with multiple extensions, they should _all_ be
605 loaded and merged together.
606
607 Files in HOCON format should be parsed last. Files in JSON format
608 should be parsed next-to-last.
609
610 In short, `include "foo"` might be equivalent to:
611
612 include "foo.properties"
613 include "foo.json"
614 include "foo.conf"
615
616 #### Include semantics: locating resources
617
618 Conceptually speaking, the quoted string in an include statement
619 identifies a file or other resource "adjacent to" the one being
620 parsed and of the same type as the one being parsed. The meaning
621 of "adjacent to", and the string itself, has to be specified
622 separately for each kind of resource.
623
624 Implementations may vary in the kinds of resources they support
625 including.
626
627 For plain files on the filesystem:
628
629 - if the included file is an absolute path then it should be kept
630 absolute and loaded as such.
631 - if the included file is a relative path, then it should be
632 located relative to the directory containing the including
633 file. The current working directory of the process parsing a
634 file must NOT be used when interpreting included paths.
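
The two filesystem rules can be sketched as follows. This illustration uses Python's `posixpath` so that the result does not depend on the host operating system; a real implementation would use its platform's path handling, and `resolve_included_file` is a hypothetical name.

```python
import posixpath

def resolve_included_file(including_file, included_name):
    """Return the path to load for an include found in `including_file`."""
    if posixpath.isabs(included_name):
        # Absolute paths are kept absolute and loaded as such.
        return included_name
    # Relative paths are located relative to the directory of the including
    # file; the current working directory is deliberately never consulted.
    base_dir = posixpath.dirname(including_file)
    return posixpath.join(base_dir, included_name)
```

For example, an `include "db.conf"` inside `/etc/app/main.conf` resolves to `/etc/app/db.conf` regardless of where the process was started.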
635
636 For resources located on the Java classpath:
637
638 - included resources are looked up by calling `getResource()` on
639 the same class or class loader used to look up the including
640 resource.
641 - if the included resource name is absolute (starts with '/')
642 then it should be passed to `getResource()` as-is.
- if the included resource name does not start with '/' then it
should have the "directory" of the including resource
prepended to it, before passing it to `getResource()`. If the
including resource is not absolute (no '/') and has no "parent
directory" (is just a single path element), then the included
relative resource name should be left as-is.
649 - it would be wrong to use `getResource()` to get a URL and then
650 locate the included name relative to that URL, because a class
651 loader is not required to have a one-to-one mapping between
652 paths in its URLs and the paths it handles in `getResource()`.
653 In other words, the "adjacent to" computation should be done
654 on the resource name not on the resource's URL.
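
The name computation for classpath resources can be sketched as plain string handling on resource names, never on URLs. The Python below only illustrates the naming rule; `resolve_included_resource` is a hypothetical name, and the result is what would then be passed to `getResource()` in a Java implementation.

```python
def resolve_included_resource(including_name, included_name):
    """Return the resource name to look up for an include.

    Both arguments are resource names as they would be passed to
    getResource(), not URLs.
    """
    if included_name.startswith("/"):
        # Absolute resource names are passed through as-is.
        return included_name
    slash = including_name.rfind("/")
    if slash < 0:
        # The including resource is a single path element with no
        # "parent directory", so the included name is left as-is.
        return included_name
    # Prepend the "directory" of the including resource.
    return including_name[:slash + 1] + included_name
```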
655
656 URLs:
657
658 - for both filesystem files and Java resources, if the
659 included name is a URL (begins with a protocol), it would
660 be reasonable behavior to try to load the URL rather than
661 treating the name as a filename or resource name.
662 - for files loaded from a URL, "adjacent to" should be based
663 on parsing the URL's path component, replacing the last
664 path element with the included name.
665
666 Implementations need not support files, Java resources, or URLs;
667 and they need not support particular URL protocols. However, if
668 they do support them they should do so as described above.
669
670 ## API Recommendations
671
672 Implementations of HOCON ideally follow certain conventions and
673 work in a predictable way.
674
675 ### Automatic type conversions
676
677 If an application asks for a value with a particular type, the
678 implementation should attempt to convert types as follows:
679
680 - number to string: convert the number into a string
681 representation that would be a valid number in JSON.
682 - boolean to string: should become the string "true" or "false"
683 - string to number: parse the number with the JSON rules
684 - string to boolean: the strings "true", "yes", "on", "false",
685 "no", "off" should be converted to boolean values. It's
686 tempting to support a long list of other ways to write a
687 boolean, but for interoperability and keeping it simple, it's
688 recommended to stick to these six.
689 - string to null: the string `"null"` should be converted to a
690 null value if the application specifically asks for a null
691 value, though there's probably no reason an app would do this.
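
As an illustration of the string-to-boolean conversion, here is a small Python sketch. It assumes the six strings are matched exactly, including case, which the text above does not state explicitly; `as_boolean` is a hypothetical accessor name.

```python
_TRUE_STRINGS = {"true", "yes", "on"}
_FALSE_STRINGS = {"false", "no", "off"}

def as_boolean(value):
    """Return `value` as a boolean, converting from a string if needed."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        if value in _TRUE_STRINGS:
            return True
        if value in _FALSE_STRINGS:
            return False
    # null, numbers, objects, arrays and other strings are not converted.
    raise TypeError("cannot convert %r to a boolean" % (value,))
```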
692
693 The following type conversions should NOT be performed:
694
695 - null to anything: If the application asks for a specific type
696 and finds null instead, that should usually result in an error.
697 - object to anything
698 - array to anything
699 - anything to object
700 - anything to array
701
702 Converting objects and arrays to and from strings is tempting, but
703 in practical situations raises thorny issues of quoting and
704 double-escaping.
705
706 ### Units format
707
708 Implementations may wish to support interpreting a value with some
709 family of units, such as time units or memory size units: `10ms`
710 or `512K`. HOCON does not have an extensible type system and there
711 is no way to add a "duration" type. However, for example, if an
712 application asks for milliseconds, the implementation can try to
713 interpret a value as a milliseconds value.
714
715 If an API supports this, for each family of units it should define
716 a default unit in the family. For example, the family of duration
717 units might default to milliseconds (see below for details on
718 durations). The implementation should then interpret values as
719 follows:
720
721 - if the value is a number, it is taken to be a number in
722 the default unit.
723 - if the value is a string, it is taken to be:
724
725 - optional whitespace
726 - a number
727 - optional whitespace
728 - an optional unit name consisting only of letters (letters
729 are the Unicode `L*` categories, Java `isLetter()`)
730 - optional whitespace
731
732 If a string value has no unit name, then it should be
733 interpreted with the default unit, as if it were a number. If a
734 string value has a unit name, that name of course specifies the
735 value's interpretation.
736
737 ### Duration format
738
Implementations may wish to support a `getMilliseconds()` method
(and similar methods for other time units).
741
742 This can use the general "units format" described above; bare
743 numbers are taken to be in milliseconds already, while strings are
744 parsed as a number plus an optional unit string.
745
746 The supported unit strings for duration are case sensitive and
747 must be lowercase. Exactly these strings are supported:
748
749 - `ns`, `nanosecond`, `nanoseconds`
750 - `us`, `microsecond`, `microseconds`
751 - `ms`, `millisecond`, `milliseconds`
752 - `s`, `second`, `seconds`
753 - `m`, `minute`, `minutes`
754 - `h`, `hour`, `hours`
755 - `d`, `day`, `days`
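
As a rough sketch of how the unit list above could be applied, the following Python converts a duration value to nanoseconds. The name `parse_duration_nanos` and the choice of nanoseconds as the return unit are assumptions made for the example; only the unit strings and the millisecond default come from this spec.

```python
from fractions import Fraction

# Nanoseconds per unit, keyed by exactly the strings listed in the spec.
_NANOS = {}
for names, nanos in [
    (("ns", "nanosecond", "nanoseconds"), 1),
    (("us", "microsecond", "microseconds"), 1_000),
    (("ms", "millisecond", "milliseconds"), 1_000_000),
    (("s", "second", "seconds"), 1_000_000_000),
    (("m", "minute", "minutes"), 60 * 1_000_000_000),
    (("h", "hour", "hours"), 3_600 * 1_000_000_000),
    (("d", "day", "days"), 86_400 * 1_000_000_000),
]:
    for name in names:
        _NANOS[name] = nanos

def parse_duration_nanos(value):
    """Convert a number or string config value to whole nanoseconds."""
    if isinstance(value, bool) or not isinstance(value, (int, float, str)):
        raise ValueError(f"not a duration: {value!r}")
    text = str(value).strip()
    # Split off the optional trailing unit name (letters only).
    end = len(text)
    while end > 0 and text[end - 1].isalpha():
        end -= 1
    number_text, unit = text[:end].strip(), text[end:] or "ms"
    if unit not in _NANOS:  # lookup is case sensitive, so "MS" is rejected
        raise ValueError(f"unknown duration unit: {unit!r}")
    # Fraction keeps "1.5" exact; it raises ValueError on a non-number.
    return int(Fraction(number_text) * _NANOS[unit])
```

With this sketch, `10`, `"10"`, and `"10ms"` all come out as ten million nanoseconds, and `"10MS"` is an error because the unit strings must be lowercase.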
756
757 ### Size in bytes format
758
759 Implementations may wish to support a `getBytes()` returning a
760 size in bytes.
761
762 This can use the general "units format" described above; bare
763 numbers are taken to be in bytes already, while strings are
764 parsed as a number plus an optional unit string.
765
766 The one-letter unit strings may be uppercase (note: duration units
767 are always lowercase, so this convention is specific to size
768 units).
769
770 There is an unfortunate nightmare with size-in-bytes units, that
771 they may be in powers of two or powers of ten. The approach
772 defined by standards bodies appears to differ from common usage,
773 such that following the standard leads to people being confused.
774 Worse, common usage varies based on whether people are talking
775 about RAM or disk sizes, and various existing operating systems
776 and apps do all kinds of different things. See
777 http://en.wikipedia.org/wiki/Binary_prefix#Deviation_between_powers_of_1024_and_powers_of_1000
778 for examples. It appears impossible to sort this out without
779 causing confusion for someone sometime.
780
781 For single bytes, exactly these strings are supported:
782
783 - `B`, `b`, `byte`, `bytes`
784
785 For powers of ten, exactly these strings are supported:
786
787 - `kB`, `kilobyte`, `kilobytes`
788 - `MB`, `megabyte`, `megabytes`
789 - `GB`, `gigabyte`, `gigabytes`
790 - `TB`, `terabyte`, `terabytes`
791 - `PB`, `petabyte`, `petabytes`
792 - `EB`, `exabyte`, `exabytes`
793 - `ZB`, `zettabyte`, `zettabytes`
794 - `YB`, `yottabyte`, `yottabytes`
795
796 For powers of two, exactly these strings are supported:
797
798 - `K`, `k`, `Ki`, `KiB`, `kibibyte`, `kibibytes`
799 - `M`, `m`, `Mi`, `MiB`, `mebibyte`, `mebibytes`
800 - `G`, `g`, `Gi`, `GiB`, `gibibyte`, `gibibytes`
801 - `T`, `t`, `Ti`, `TiB`, `tebibyte`, `tebibytes`
802 - `P`, `p`, `Pi`, `PiB`, `pebibyte`, `pebibytes`
803 - `E`, `e`, `Ei`, `EiB`, `exbibyte`, `exbibytes`
804 - `Z`, `z`, `Zi`, `ZiB`, `zebibyte`, `zebibytes`
805 - `Y`, `y`, `Yi`, `YiB`, `yobibyte`, `yobibytes`
806
807 It's very unclear which units the single-character abbreviations
808 ("128K") should go with; some precedents such as `java -Xmx 2G`
809 and the GNU tools such as `ls` map these to powers of two, so this
810 spec copies that. You can certainly find examples of mapping these
811 to powers of ten, though. If you don't like ambiguity, don't use
812 the single-letter abbreviations.
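
The three unit tables can be generated rather than typed out by hand. The Python below is a sketch under that assumption; `parse_bytes` and the helper `_build_table` are hypothetical names, and the only inputs taken from this spec are the unit strings themselves and the rule that bare numbers are already in bytes.

```python
from fractions import Fraction

_DECIMAL = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]
_BINARY = ["kibi", "mebi", "gibi", "tebi", "pebi", "exbi", "zebi", "yobi"]

def _build_table():
    table = {name: 1 for name in ("B", "b", "byte", "bytes")}
    for power, letter in enumerate("KMGTPEZY", start=1):
        dec, bin_ = _DECIMAL[power - 1], _BINARY[power - 1]
        # The spec lists the kilo abbreviation as "kB"; the rest are "MB", "GB", ...
        abbrev = "kB" if letter == "K" else letter + "B"
        for name in (abbrev, dec + "byte", dec + "bytes"):
            table[name] = 1000 ** power  # powers of ten
        for name in (letter, letter.lower(), letter + "i", letter + "iB",
                     bin_ + "byte", bin_ + "bytes"):
            table[name] = 1024 ** power  # powers of two
    return table

_BYTES_PER_UNIT = _build_table()

def parse_bytes(value):
    """Convert a number or string config value to a whole number of bytes."""
    if isinstance(value, bool) or not isinstance(value, (int, float, str)):
        raise ValueError(f"not a size in bytes: {value!r}")
    text = str(value).strip()
    # Split off the optional trailing unit name (letters only).
    end = len(text)
    while end > 0 and text[end - 1].isalpha():
        end -= 1
    number_text, unit = text[:end].strip(), text[end:] or "B"
    if unit not in _BYTES_PER_UNIT:
        raise ValueError(f"unknown size unit: {unit!r}")
    return int(Fraction(number_text) * _BYTES_PER_UNIT[unit])
```

Note how the single-letter forms land on powers of two: `"512K"` is 524288 bytes, while `"1kB"` is 1000 bytes. Because exactly the listed strings are supported, a spelling such as `"1KB"` is rejected in this sketch.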
813
814 ### Config object merging and file merging
815
816 It may be useful to offer a method to merge two objects. If such a
817 method is provided, it should work as if the two objects were
818 duplicate values for the same key in the same file. (See the
819 section earlier on duplicate key handling.)
820
821 As with duplicate keys, an intermediate non-object value "hides"
822 earlier object values. So say you merge three objects in this
823 order:
824
825 - `{ a : { x : 1 } }` (first priority)
826 - `{ a : 42 }` (fallback)
827 - `{ a : { y : 2 } }` (another fallback)
828
829 The result would be `{ a : { x : 1 } }`. The two objects are not
830 merged because they are not "adjacent"; the merging is done in
831 pairs, and when `42` is paired with `{ y : 2 }`, `42` simply wins
832 and loses all information about what it overrode.
833
834 But if you re-ordered like this:
835
836 - `{ a : { x : 1 } }` (first priority)
837 - `{ a : { y : 2 } }` (fallback)
838 - `{ a : 42 }` (another fallback)
839
840 Now the result would be `{ a : { x : 1, y : 2 } }` because the two
841 objects are adjacent.
842
843 This rule for merging objects loaded from different files is
844 _exactly_ the same behavior as for merging duplicate fields in the
845 same file. All merging works the same way.
846
847 Needless to say, normally it's well-defined whether a config
848 setting is supposed to be a number or an object. This kind of
849 weird pathology where the two are mixed should not be happening.
850
851 The one place where it matters, though, is that it allows you to
852 "clear" an object and start over by setting it to null and then
853 setting it back to a new object. So this behavior gives people a
854 way to get rid of default fallback values they don't want.
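
To make the pairing rule concrete, here is a small Python sketch in which dicts stand in for objects and `None` stands in for `null`. The names `merge` and `merge_all` are hypothetical. One consequence of using plain dicts is that the list has to be folded starting from the lowest-priority end; that is how a non-object value gets the chance to hide the objects behind it.

```python
def merge(primary, fallback):
    """Merge two values; `primary` wins wherever both define something."""
    if isinstance(primary, dict) and isinstance(fallback, dict):
        merged = dict(fallback)
        for key, value in primary.items():
            merged[key] = merge(value, fallback[key]) if key in fallback else value
        return merged
    # At least one side is not an object: the higher-priority value wins
    # outright and nothing about the fallback is remembered.
    return primary

def merge_all(values):
    """Merge a list ordered from highest to lowest priority."""
    result = values[-1]
    for value in reversed(values[:-1]):
        result = merge(value, result)
    return result

# The two orderings discussed above:
hidden = merge_all([{"a": {"x": 1}}, {"a": 42}, {"a": {"y": 2}}])
adjacent = merge_all([{"a": {"x": 1}}, {"a": {"y": 2}}, {"a": 42}])
# Setting a key to null in between "clears" the lower-priority object:
cleared = merge_all([{"a": {"z": 3}}, {"a": None}, {"a": {"y": 2}}])
```

Here `hidden` is `{"a": {"x": 1}}`, `adjacent` is `{"a": {"x": 1, "y": 2}}`, and `cleared` is `{"a": {"z": 3}}`.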
855
856 ### Java properties mapping
857
858 It may be useful to merge Java properties data with data loaded
859 from JSON or HOCON. See the Java properties spec here:
860 http://download.oracle.com/javase/7/docs/api/java/util/Properties.html#load%28java.io.Reader%29
861
862 Java properties parse as a one-level map from string keys to
863 string values.
864
865 To convert to HOCON, first split each key on the `.` character,
866 keeping any empty strings (including leading and trailing empty
867 strings). Note that this is _very different_ from parsing a path
868 expression.
869
870 The key split on `.` is a series of path elements. So the
871 properties key with just `.` is a path with two elements, both of
872 them an empty string. `a.` is a path with two elements, `a` and
873 empty string. (Java's `String.split()` does NOT do what you want
874 for this.)
875
876 It is impossible to represent a key with a `.` in it in a
877 properties file. If a JSON/HOCON key has a `.` in it, which is
878 possible if the key is quoted, then there is no way to refer to it
879 as a Java property. It is not recommended to name HOCON keys with
880 a `.` in them, since it would be confusing at best in any case.
881
882 Once you have a path for each value, construct a tree of
883 JSON-style objects with the string value of each property located
884 at that value's path.
885
886 Values from properties files are _always_ strings, even if they
887 could be parsed as some other type. Implementations should do type
888 conversion if an app asks for an integer, as described in an
889 earlier section.
890
891 When Java loads a properties file, unfortunately it does not
892 preserve the order of the file. As a result, there is an
893 intractable case where a single key needs to refer to both a
894 parent object and a string value. For example, say the Java
895 properties file has:
896
897 a=hello
898 a.b=world
899
900 In this case, `a` needs to be both an object and a string value.
901 The _object_ must always win in this case: the "object wins"
902 rule throws out at most one value (the string) while "string wins"
903 would throw out all values in the object. Unfortunately, when
904 properties files are mapped to the JSON structure, there is no way
905 to access these strings that conflict with objects.
906
907 The usual rule in HOCON would be that the later assignment in the
908 file wins, rather than "object wins"; but implementing that for
909 Java properties would require implementing a custom Java
910 properties parser, which is surely not worth it.
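
The conversion described in this section can be sketched as follows, using Python only for illustration (the function name `properties_to_tree` is hypothetical, and a plain dict of strings stands in for a loaded properties file). Python's `str.split(".")` happens to keep leading and trailing empty strings, which is the behavior the section asks for. The sketch applies "object wins" regardless of the order in which keys are visited.

```python
def properties_to_tree(props):
    """Build a tree of nested dicts from a flat {key: string} mapping."""
    root = {}
    for key, value in props.items():
        # "." -> ["", ""], "a." -> ["a", ""]: empty elements are kept.
        *parents, leaf = key.split(".")
        node = root
        for element in parents:
            child = node.get(element)
            if not isinstance(child, dict):
                # Object wins: a string already stored here is discarded.
                child = node[element] = {}
            node = child
        if not isinstance(node.get(leaf), dict):
            # Values stay strings; an existing object at this path wins.
            node[leaf] = value
    return root

tree = properties_to_tree({"a": "hello", "a.b": "world"})
```

In this example `tree` is `{"a": {"b": "world"}}`; the string `hello` is no longer reachable, as the text above describes.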
911
912 ### Root paths
913
914 By convention, a given application or library has a "root path."
915 Most commonly the root path has a single path element - "akka" for
916 example. But it could have multiple.
917
918 Conventional config file names and property names are derived from
919 the root path.
920
921 If an API looks like `load(rootPath)` then it would return an
922 object conceptually "at" the root path, not an object containing
923 the root path.
924
925 ### Conventional configuration file names for JVM apps
926
927 To get config file names, join the elements of the root path with
928 a hyphen, then add appropriate filename extensions.
929
930 If the root path is `foo.bar` (two elements, `foo` and `bar`),
931 then the configuration files should be searched for under the
932 following resource names on the classpath:
933
934 - /foo-bar.conf
935 - /foo-bar.json
936 - /foo-bar.properties
937 - /foo-bar-reference.conf
938 - /foo-bar-reference.json
939 - /foo-bar-reference.properties
940
941 The .json and .properties files are examples; different
942 implementations may support different file types. The "reference"
943 files are intended to contain defaults and be shipped with the
944 library or application being configured.
945
946 Note that the configuration files are absolute resource paths, not
947 relative to the package. So you would call
948 `klass.getResource("/foo-bar.conf")` not
949 `klass.getResource("foo-bar.conf")`.
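
As a small illustration of the naming convention, the following Python derives the resource names from a root path given as a list of elements. The function name `resource_names` and the default extension list are assumptions for the example; as noted above, an implementation may support a different set of file types.

```python
def resource_names(root_path_elements, extensions=("conf", "json", "properties")):
    """Return absolute classpath resource names for a root path."""
    base = "-".join(root_path_elements)
    names = [f"/{base}.{ext}" for ext in extensions]
    names += [f"/{base}-reference.{ext}" for ext in extensions]
    return names

names = resource_names(["foo", "bar"])
```

For the root path `foo.bar` this produces the six names listed above, each with a leading `/` so that it is looked up as an absolute resource path.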
950
951 ### Conventional override by system properties
952
953 For an application's config, Java System properties _override_
954 HOCON found in the configuration file. This supports specifying
955 config options on the command line.
956
957 Those system properties which begin with an application's root
958 path should override the configuration for that application.
959
960 For example, if your config is for root path "akka", then your
961 config key "foo" would be set with `-Dakka.foo=10`. When loading your
962 config, any system properties starting with `akka.` would be
963 merged into the config.
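
A sketch of selecting the overriding properties might look like this in Python, with a plain dict standing in for the Java system properties. The function name `system_property_overrides` is hypothetical, and stripping the root path prefix is an assumption based on the loaded object being "at" the root path.

```python
def system_property_overrides(root_path, system_props):
    """Pick the properties under `root_path` and drop the prefix."""
    prefix = root_path + "."
    return {
        key[len(prefix):]: value
        for key, value in system_props.items()
        if key.startswith(prefix)
    }

overrides = system_property_overrides(
    "akka",
    {"akka.foo": "10", "akka.actor.timeout": "5s", "user.home": "/home/me"},
)
```

Here `overrides` contains `foo` and `actor.timeout` but not `user.home`. The result is still a flat map of string values, so it would go through the Java properties mapping described earlier before being merged in at higher priority than the file.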
964
965 ### Substitution fallback to system properties
966
967 Recall that if a substitution is not present (not even set to
968 `null`) within a configuration tree, implementations may search
969 for it from external sources. One such source could be Java system
970 properties.
971
972 To find a value for substitution, Java applications should look at
973 system properties directly, without the root path namespace.
974 Remember that namespaced system properties were already used as
975 overrides.
976
977 `${user.home}` would first look for a `user.home` in the
978 configuration tree (which has a scoped system property like
979 `akka.user.home` merged in!).
980
981 If no value for `${user.home}` exists in the configuration, the
982 implementation would look at system property `user.home` without
983 the `akka.` prefix.
984
985 The unprefixed system properties are _not_ merged into the
986 configuration tree; if you iterate over your configuration, they
987 should not be in there. They are only used as a fallback when
988 evaluating substitutions.
989
990 The effect is to allow using generic system properties like
991 `user.home` and also to allow overriding those per-app.
992 So if someone wants to set their home directory for _all_ apps,
993 they set the `user.home` system property. If they then want to
994 force a particular home directory only for Akka, they could set
995 `akka.user.home` instead.
996
997 ### Substitution fallback to environment variables
998
999 Substitutions not found in the configuration may also fall back to
1000 environment variables. In Java, fallback should be to system
1001 properties first and environment variables second.
1002
1003 It's recommended that HOCON keys always use lowercase, because
1004 environment variables generally are capitalized. This avoids
1005 naming collisions between environment variables and configuration
1006 properties. (While on Windows getenv() is generally not
1007 case-sensitive, the lookup will be case sensitive all the way
1008 until the env variable fallback lookup is reached.)
1009
1010 An application can explicitly block looking up a substitution in
1011 the environment by setting a value in the configuration, with the
1012 same name as the environment variable. You could set `HOME : null`
1013 in your root object to avoid expanding `${HOME}` from the
1014 environment, for example.
1015
1016 Environment variables are interpreted as follows:
1017
1018 - present and set to empty string: treated as not present
1019 - System.getenv throws SecurityException: treated as not present
1020 - encoding is handled by Java (System.getenv already returns
1021 a Unicode string)
1022 - environment variables always become a string value, though
1023 if an app asks for another type automatic type conversion
1024 would kick in
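
Putting the lookup order together, a substitution could be resolved roughly as in the Python sketch below. Everything named here is hypothetical: `resolve_substitution` and `_lookup` are illustrative helpers, plain dicts stand in for the configuration tree, the system properties, and the environment, and splitting the path on `.` is a simplification that ignores quoted path elements.

```python
_MISSING = object()  # distinguishes "not present" from an explicit null (None)

def _lookup(tree, path):
    node = tree
    for element in path.split("."):
        if not isinstance(node, dict) or element not in node:
            return _MISSING
        node = node[element]
    return node

def resolve_substitution(path, config, system_props, env):
    """Resolve ${path}: config first, then system properties, then env."""
    value = _lookup(config, path)
    if value is not _MISSING:
        # Present in the config, even if set to null: no external fallback.
        return value
    if path in system_props:
        # Unprefixed system property, used only as a fallback.
        return system_props[path]
    env_value = env.get(path)
    if env_value:
        # An empty environment variable is treated as not present.
        return env_value
    raise KeyError(f"could not resolve substitution ${{{path}}}")
```

With `HOME : null` in the configuration, `resolve_substitution("HOME", {"HOME": None}, {}, {"HOME": "/root"})` returns `None` rather than the environment value, which is the blocking behavior described above.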