diff --git a/eeps/eep-0001.md b/eeps/eep-0001.md index f16e2c9..507c2e4 100644 --- a/eeps/eep-0001.md +++ b/eeps/eep-0001.md @@ -1,25 +1,26 @@ -EEP: 1 -Title: EEP Purpose and Guidelines -Version: $Revision$ -Last-Modified: $Date$ -Author: Per Gustafsson [pergu(at)it(dot)uu(dot)se] -Status: Draft -Type: Process -Content-Type: text/x-rst -Created: 29-Jan-2007 -Post-History: 29-Jan-2007 - - -What is a EEP? -============== - -EEP stands for Erlang Enhancement Proposal. It is a concept borrowed -from the Python language [1]_ to facilitate community involvement in -developing Erlang. This document is heavily based on PEP 1 [2]_. An EEP is -a design document providing information to the Erlang community, or -describing a new feature for Erlang or its processes or environment. -The EEP should provide a concise technical specification of the -feature and a rationale for the feature. + Author: Per Gustafsson , + Raimo Niskanen + Status: Draft + Type: Process + Created: 29-Jan-2007 + Post-History: 29-Jan-2007 +**** +EEP 1: EEP Purpose and Guidelines +---- + + + +What is an EEP? +=============== + +EEP stands for Erlang Extension Proposal, or Erlang Enhancement +Process. It is a concept borrowed from the [Python][] language to +facilitate community involvement in developing Erlang. This document +is heavily based on [PEP 1][]. An EEP is a design document providing +information to the Erlang community, or describing a new feature for +Erlang or its processes or environment. The EEP should provide a +concise technical specification of the feature and a rationale for the +feature. We intend EEPs to be the primary mechanisms for proposing new features, for collecting community input on an issue, and for @@ -28,8 +29,9 @@ author is responsible for building consensus within the community and documenting dissenting opinions. Because the EEPs are maintained as text files in a versioned -repository, their revision history is the historical record of the -feature proposal [3]_. 
+repository, their [revision history][VCS] is the historical record of +the feature proposal. + EEP Types @@ -37,18 +39,19 @@ EEP Types There are two kinds of EEPs: -1. A **Standards Track** EEP describes a new feature or implementation - for Erlang. +1. A **Standards Track** EEP describes a new feature or implementation + for Erlang. + +2. A **Process** EEP describes a process surrounding Erlang, or + proposes a change to (or an event in) a process. Process EEPs are + like Standards Track EEPs but apply to areas other than the Erlang + language itself. They may propose an implementation, but not to + Erlang's codebase; they often require community consensus; they are + more than recommendations, and users are typically not free to ignore + them. Examples include release schedules, procedures, guidelines, + changes to the decision-making process, and changes to the tools or + environment used in Erlang development. -2. A **Process** EEP describes a process surrounding Erlang, or - proposes a change to (or an event in) a process. Process EEPs are - like Standards Track EEPs but apply to areas other than the Erlang - language itself. They may propose an implementation, but not to - Erlang's codebase; they often require community consensus; they are - more than recommendations, and users are typically not free to ignore - them. Examples include release schedules, procedures, guidelines, - changes to the decision-making process, and changes to the tools or - environment used in Erlang development. EEP Work Flow @@ -59,8 +62,8 @@ send all EEP-related email to . The EEP process begins with a new idea for Erlang. It is highly recommended that a single EEP contain a single key proposal or new -idea. The more focused the EEP, the more successful it tends to -be. The EEP editor reserves the right to reject EEP proposals if they +idea. The more focused the EEP, the more successful it tends to +be. 
The EEP editor reserves the right to reject EEP proposals if they appear too unfocused or too broad. If in doubt, split your EEP into several well-focused ones. @@ -69,40 +72,40 @@ style and format described below, shepherds the discussions in the appropriate forums, and attempts to build community consensus around the idea. The EEP champion (a.k.a. Author) should first attempt to ascertain whether the idea is EEP-able. Posting to the -erlang-questions@erlang.org mailing list is recommended. Small + mailing list is recommended. Small enhancements or patches often don't need a EEP and can be injected into the Erlang development work flow by sending a patch to -erlang-patches@erlang.org. - -The EEP champion writes a rough but fleshed out draft of the EEP, -with a proposed title. This draft must be written in EEP style -as described below. Then, after subscribing to the email list -, the EEP champion sends the EEP to that list. -Note that the list has a size limit for posts, -at the time of writing 128 KByte, so EEPs with attachments -that are too large will bounce. Large attachments can be put -on a suitable web page and then be referred to from the EEP. -If that is not possible, ask on the list how to -submit the large EEP in question. - -If the EEP editor approves, he will assign the EEP a number, label it -as Standards Track or Process, give it status "Draft", -and create and check-in the initial draft of the EEP. The EEP editor -will not unreasonably deny a EEP. Reasons for denying EEP status -include duplication of effort, being technically unsound, not -providing proper motivation or addressing backwards compatibility, or -not in keeping with the Erlang philosophy. +. + +The EEP champion writes a rough but fleshed out draft of the EEP, with +a proposed title. This draft must be written in EEP style as described +below. Then, after subscribing to the email list , +the EEP champion sends the EEP to that list. 
Note that the list has a +size limit for posts, at the time of writing 128 KByte, so EEPs with +attachments that are too large will bounce. Large attachments can be +put on a suitable web page and then be referred to from the EEP. If +that is not possible, ask on the list how to submit the large EEP in +question. + +If the EEP editor approves, she/he will assign the EEP a number, label +it as Standards Track or Process, give it status "Draft", and create +and check-in the initial draft of the EEP. The EEP editor will not +unreasonably deny a EEP. Reasons for denying EEP status include +duplication of effort, being technically unsound, not providing proper +motivation or addressing backwards compatibility, or not in keeping +with the Erlang philosophy. If a pre-EEP is rejected, the author may elect to take the pre-EEP to -the erlang-questions@erlang.org mailing list to help flesh it out, +the mailing list to help flesh it out, gain feedback and consensus from the community at large, and improve the EEP for re-submission. The author of the EEP is then responsible for posting the EEP to the community forums, and marshaling community support for it. As updates are necessary, the EEP author can check in new versions if they have -SVN commit permissions, or can email new EEP versions to the EEP -editor for committing. +commit permissions, can email new EEP versions or diffs to the EEP +editor for committing, or submit changes in any other suitable +way for the version control system. Standards Track EEPs consist of two parts, a design document and a reference implementation. The EEP should be reviewed and accepted @@ -113,16 +116,15 @@ or a URL to same -- before it can be considered Final. EEP authors are responsible for collecting community feedback on a EEP before submitting it for review. A EEP that has not been discussed on -the erlang mailing list will not be -accepted. 
However, wherever possible, long open-ended discussions on -public mailing lists should be avoided. Strategies to keep the -discussions efficient include: setting up a separate SIG mailing list -for the topic, having the EEP author accept private comments in the -early design phases, setting up a wiki page, etc. EEP authors should -use their discretion here. +the erlang mailing list will not be accepted. However, wherever +possible, long open-ended discussions on public mailing lists should +be avoided. Strategies to keep the discussions efficient include: +setting up a separate SIG mailing list for the topic, having the EEP +author accept private comments in the early design phases, setting up +a wiki page, etc. EEP authors should use their discretion here. Once the authors have completed a EEP, they must inform the EEP editor -that it is ready for review. EEPs are reviewed by a committee of +that it is ready for review. EEPs are reviewed by a committee of people from the Erlang/OTP and the Erlang community who may accept or reject a EEP or send it back to the author(s) for revision. For a EEP that is pre-determined to be acceptable (e.g., it is an obvious win @@ -130,37 +132,39 @@ as-is and/or its implementation has already been checked in) the Erlang/OTP team may also initiate a EEP review, first notifying the EEP author(s) and giving them a chance to make revisions. -The members of the committee are listed in the EEP index. +The committee members are the internal Erlang/OTP Technical Board plus +for the specific case summoned experts. -For a EEP to be accepted it must meet certain minimum criteria. It +For a EEP to be accepted it must meet certain minimum criteria. It must be a clear and complete description of the proposed enhancement. -The enhancement must represent a net improvement. The proposed +The enhancement must represent a net improvement. The proposed implementation, if applicable, must be solid and must not complicate -the interpreter unduly. 
Finally, a proposed enhancement must be +the interpreter unduly. Finally, a proposed enhancement must be compatible with the Erlang philosophy in order to be accepted. Once a EEP has been accepted, the reference implementation must be -completed. When the reference implementation is complete and -accepted, the status will be changed to "Final". +completed. When the reference implementation is complete and accepted, +the status will be changed to "Final". -A EEP can also be assigned status "Deferred". The EEP author or -editor can assign the EEP this status when no progress is being made -on the EEP. Once a EEP is deferred, the EEP editor can re-assign it -to draft status. +A EEP can also be assigned status "Deferred". The EEP author or editor +can assign the EEP this status when no progress is being made on the +EEP. Once a EEP is deferred, the EEP editor can re-assign it to draft +status. -A EEP can also be "Rejected". Perhaps after all is said and done it -was not a good idea. It is still important to have a record of this +A EEP can also be "Rejected". Perhaps after all is said and done it +was not a good idea. It is still important to have a record of this fact. EEPs can also be replaced by a different EEP, rendering the original -obsolete. +obsolete. EEP work flow is as follows: -.. image:: eep-0001-1.png +![EEP Work Flow][] Some Process EEPs may also have a status of "Active" -if they are never meant to be completed. E.g. EEP 1 (this EEP). +if they are never meant to be completed. E.g. [EEP 1][] (this EEP). + What belongs in a successful EEP? @@ -168,102 +172,99 @@ What belongs in a successful EEP? Each EEP should have the following parts: -1. Preamble -- RFC 822 style headers containing meta-data about the - EEP, including the EEP number, a short descriptive title (limited - to a maximum of 44 characters), the names, and optionally the - contact info for each author, etc. +1. 
Preamble -- RFC 822 style headers containing meta-data about the + EEP, including the EEP number, a short descriptive title (limited + to a maximum of 44 characters), the names, and optionally the + contact info for each author, etc. -2. Abstract -- a short (~200 word) description of the technical issue - being addressed. +2. Abstract -- a short (~200 word) description of the technical issue + being addressed. -3. Copyright/public domain -- Each EEP must either be explicitly - labelled as placed in the public domain (see this EEP as an - example) or licensed under the `Open Publication License`_, - or the `Creative Commons Attribution 3.0 License`_. +3. Copyright/public domain -- Each EEP must either be explicitly + labelled as placed in the public domain (see this EEP as an + example) or licensed under the [Open Publication License][OPL], or + the [Creative Commons Attribution 3.0 License][CCA3.0]. -4. Specification -- The technical specification should describe the - syntax and semantics of any new language feature. The - specification should be detailed enough to allow competing, - interoperable implementations. +4. Specification -- The technical specification should describe the + syntax and semantics of any new language feature. The + specification should be detailed enough to allow competing, + interoperable implementations. -5. Motivation -- The motivation is critical for EEPs that want to - change the Erlang language. It should clearly explain why the - existing language specification is inadequate to address the - problem that the EEP solves. EEP submissions without sufficient - motivation may be rejected outright. +5. Motivation -- The motivation is critical for EEPs that want to + change the Erlang language. It should clearly explain why the + existing language specification is inadequate to address the + problem that the EEP solves. EEP submissions without sufficient + motivation may be rejected outright. -6. 
Rationale -- The rationale fleshes out the specification by - describing what motivated the design and why particular design - decisions were made. It should describe alternate designs that - were considered and related work, e.g. how the feature is supported - in other languages. +6. Rationale -- The rationale fleshes out the specification by + describing what motivated the design and why particular design + decisions were made. It should describe alternate designs that + were considered and related work, e.g. how the feature is + supported in other languages. - The rationale should provide evidence of consensus within the - community and discuss important objections or concerns raised - during discussion. + The rationale should provide evidence of consensus within the + community and discuss important objections or concerns raised + during discussion. -7. Backwards Compatibility -- All EEPs that introduce backwards - incompatibilities must include a section describing these - incompatibilities and their severity. The EEP must explain how the - author proposes to deal with these incompatibilities. EEP - submissions without a sufficient backwards compatibility treatise - may be rejected outright. +7. Backwards Compatibility -- All EEPs that introduce backwards + incompatibilities must include a section describing these + incompatibilities and their severity. The EEP must explain how + the author proposes to deal with these incompatibilities. EEP + submissions without a sufficient backwards compatibility treatise + may be rejected outright. -8. Reference Implementation -- The reference implementation must be - completed before any EEP is given status "Final", but it need not - be completed before the EEP is accepted. It is better to finish - the specification and rationale first and reach consensus on it - before writing code. +8. 
Reference Implementation -- The reference implementation must be + completed before any EEP is given status "Final", but it need not + be completed before the EEP is accepted. It is better to finish + the specification and rationale first and reach consensus on it + before writing code. - The final implementation must include test code and documentation - appropriate for either the Erlang language reference or the - standard library reference. + The final implementation must include test code and documentation + appropriate for either the Erlang language reference or the + standard library reference. -EEP Formats and Templates -========================= -There are two EEP formats available to authors: plaintext and -reStructuredText_. Both are UTF-8-encoded text files. +EEP Format and Template +======================= -Plaintext EEPs are written with minimal structural markup that adheres -to a rigid style. EEP 2 contains a instructions and a template [4]_ -you can use to get started writing your plaintext EEP. +An EEP is written as an UTF-8-encoded text file in [Markdown][] format. +[EEP 33][] is a template and contains an instruction of how to write +an EEP. -ReStructuredText_ EEPs allow for rich markup that is still quite easy -to read, but results in much better-looking and more functional HTML. -EEP 3 contains instructions and a template [5]_ for reStructuredText -EEPs. +In the [repository][VCS] there is also a version of the [Markdown][] +Perl program, a Makefile and a Perl script for building the [EEP index][EEP]. +Just give the command `make` in the toplevel directory. -There is a Python script that converts both styles of EEPs to HTML for -viewing on the web [6]_. Parsing and conversion of plaintext EEPs is -self-contained within the script. reStructuredText EEPs are parsed -and converted by Docutils_ code called from the script. EEP Header Preamble =================== -Each EEP must begin with an RFC 822 style header preamble. 
The headers -must appear in the following order. Headers marked with "*" are -optional and are described below. All other headers are required. :: - - EEP: - Title: - Version: - Last-Modified: - Author: - * Discussions-To: - Status: - Type: - * Content-Type: - * Requires: - Created: - * Erlang-Version: - Post-History: - * Replaces: - * Replaced-By: +Each EEP must begin with an RFC 822 style header preamble all indented +four spaces to make them [Markdown][] code style. The headers must +appear in the following order. Headers marked with "*" are optional +and are described below. All other headers are required: + + Author: + * Discussions-To: + Status: + Type: + * Content-Type: + * Requires: + Created: + * Erlang-Version: + Post-History: + * Replaces: + * Replaced-By: + +Then follows a Markdown horizontal rule, the EEP number and title +as a Markdown header 2, and a blank line, all required: + **** + EEP : + ---- + The Author header lists the names, and optionally the email addresses of all the authors/owners of the EEP. The format of the Author header @@ -279,50 +280,46 @@ if the address is not given. If there are multiple authors, each should be on a separate line following RFC 2822 continuation line conventions. Note that personal -email addresses in EEPs will be obscured as a defense against spam +email addresses should be obscured as a defense against spam harvesters. While a EEP is in private discussions (usually during the initial Draft phase), a Discussions-To header will indicate the mailing list or URL where the EEP is being discussed. No Discussions-To header is necessary if the EEP is being discussed privately with the author, or -on the erlang mailing list. Note that email -addresses in the Discussions-To header will not be obscured. +on the erlang mailing list. Remember to obscure email addresses here +to. The Type header specifies the type of EEP: Standards Track or Process. -The format of a EEP is specified with a Content-Type header. 
The -acceptable values are "text/plain" for plaintext EEPs (see EEP 2 [3]_) -and "text/x-rst" for reStructuredText EEPs (see EEP 3 [4]_). -Plaintext ("text/plain") is the default if no Content-Type header is -present. - The Created header records the date that the EEP was assigned a number, while Post-History is used to record the dates of when new -versions of the EEP are posted to erlang-questions. Both -headers should be in dd-mmm-yyyy format, e.g. 14-Aug-2006. +versions of the EEP are posted to erlang-questions. Both headers +should be in dd-mmm-yyyy format, e.g. 14-Aug-2009. Standards Track EEPs must have a Erlang-Version header which indicates -the version of Erlang that the feature will be released with. -Process EEPs do not need a Erlang-Version header. +the version of Erlang that the feature will be released with. Process +EEPs do not need a Erlang-Version header. EEPs may have a Requires header, indicating the EEP numbers that this -EEP depends on. +EEP depends on.. EEPs may also have a Replaced-By header indicating that a EEP has been -rendered obsolete by a later document; the value is the number of the -EEP that replaces the current document. The newer EEP must have a -Replaces header containing the number of the EEP that it rendered -obsolete. +rendered obsolete by later EEP(s); the value is the number(s) of the +EEP(s) that replaces the current document. The newer EEP(s) must have +a Replaces header containing the number(s) of the EEP(s) that it +rendered obsolete. + Auxiliary Files =============== EEPs may include auxiliary files such as diagrams. Such files must be -named ``eep-XXXX-Y.ext``, where "XXXX" is the EEP number, "Y" is a -serial number (starting at 1), and "ext" is replaced by the actual -file extension (e.g. "png"). +named `eep-XXXX-Y.ext`, where "XXXX" is the EEP number, "Y" is a +serial number (starting at 1), and ".ext" is replaced by the actual +file extension (e.g. ".png"). 
+ Reporting EEP Bugs, or Submitting EEP Updates @@ -330,10 +327,10 @@ Reporting EEP Bugs, or Submitting EEP Updates How you report a bug, or submit a EEP update depends on several factors, such as the maturity of the EEP, the preferences of the EEP -author, and the nature of your comments. For the early draft stages +author, and the nature of your comments. For the early draft stages of the EEP, it's probably best to send your comments and changes -directly to the EEP author. For more mature, or finished EEPs you may -want to submit corrections to the erlang-patches mailing list. +directly to the EEP author. For more mature, or finished EEPs you may +want to submit corrections to the mailing list. When in doubt about where to send your changes, please check first with the EEP author and/or EEP editor. @@ -341,6 +338,7 @@ with the EEP author and/or EEP editor. EEP authors can update EEPs by submitting new versions to the editors. + Transferring EEP Ownership ========================== @@ -362,34 +360,36 @@ email in a timely manner, the EEP editor will make a unilateral decision (it's not like such decisions can't be reversed :). -References and Footnotes -======================== -.. [1] We are very grateful to the Python community for devising such - a good process for language revisions and for placing their documents - in the public domain. +[Python]: http://www.python.org + "We are very grateful to the Python community for devising such a good process for language revisions and for placing their documents in the public domain" -.. [2] PEP 1, PEP Purpose and Guidelines, Goodger, Hylton, Warsaw - (http://www.python.org/dev/peps/pep-0001/) +[PEP 1]: http://www.python.org/dev/peps/pep-0001/ + "PEP 1, PEP Purpose and Guidelines, Goodger, Hylton, Warsaw" -.. [3] This svn server has not been setup yet but should be setup +[VCS]: http://www.github.com/erlang/eep/ + "EEP Sources at Github" -.. 
[4] EEP 2, Sample Plaintext EEP Template, Gustafsson - (http://www.erlang.org/eeps/eep-0002.html) +[EEP]: ./ + "EEP Index" -.. [5] EEP 3, Sample reStructuredText EEP Template, Gustafsson - (http://www.erlang.org/eeps/eep-0003.html) +[EEP 1]: eep-0001.md + "EEP 1, EEP Purpose and Guidelines, Gustafsson" -.. [6] The script referred to here is eep2html.py. +[EEP 33]: eep-0033.md + "EEP 33, Sample Markdown EEP Template, Niskanen" -.. _Open Publication License: http://www.opencontent.org/openpub/ +[Markdown]: http://daringfireball.net/projects/markdown/ + "Markdown Home Page" -.. _Creative Commons Attribution 3.0 License: - http://creativecommons.org/licenses/by/3.0/ +[OPL]: http://www.opencontent.org/openpub/ + "Open Publication License" -.. _reStructuredText: http://docutils.sourceforge.net/rst.html +[CCA3.0]: http://creativecommons.org/licenses/by/3.0/ + "Creative Commons Attribution 3.0 License" -.. _Docutils: http://docutils.sourceforge.net/ +[EEP Work Flow]: eep-0001-1.png + "EEP Work Flow" Copyright @@ -398,11 +398,11 @@ Copyright This document has been placed in the public domain. -.. - Local Variables: - mode: indented-text - indent-tabs-mode: nil - sentence-end-double-space: t - fill-column: 70 - coding: utf-8 - End: + +[EmacsVar]: <> "Local Variables:" +[EmacsVar]: <> "mode: indented-text" +[EmacsVar]: <> "indent-tabs-mode: nil" +[EmacsVar]: <> "sentence-end-double-space: t" +[EmacsVar]: <> "fill-column: 70" +[EmacsVar]: <> "coding: utf-8" +[EmacsVar]: <> "End:" diff --git a/eeps/eep-0002.md b/eeps/eep-0002.md index b70b57a..052a45d 100644 --- a/eeps/eep-0002.md +++ b/eeps/eep-0002.md @@ -1,226 +1,226 @@ -EEP: 2 -Title: Sample Plaintext PEP Template -Version: $Revision$ -Last-Modified: $Date$ -Author: Per Gustafsson [pergu(at)it(dot)uu(dot)se] -Status: Active -Type: Process -Content-Type: text/plain -Created: 14-Aug-2001 -Post-History: - - -Abstract - - This EEP provides a boilerplate or sample template for creating - your own plaintext EEPs. 
In conjunction - with the content guidelines in EEP 1 [1], this should make it easy - for you to conform your own EEPs to the format outlined below. - - Note: if you are reading this EEP via the web, you should first - grab the plaintext source of this EEP in order to complete the - steps below. DO NOT USE THE HTML FILE AS YOUR TEMPLATE! - - If you would prefer to use lightweight markup in your EEP, please - see EEP 3, "Sample reStructuredText EEP Template" [2]. - - This document is based on PEP 9 [3]. - - -Rationale - - EEP submissions come in a wide variety of forms, not all adhering - to the format guidelines set forth below. Use this template, in - conjunction with the content guidelines in EEP 1, to ensure that - your EEP submission won't get automatically rejected because of - form. - - -How to Use This Template - - To use this template you must first decide whether your EEP is - going to be an Informational or Standards Track EEP. Most EEPs - are Standards Track because they propose a new feature for the - Erlang language or standard library. When in doubt, read EEP 1 - for details or contact the EEP editors . - - Once you've decided which type of EEP yours is going to be, follow - the directions below. - - - Make a copy of this file (.txt file, not HTML!) and perform the - following edits. - - - Replace the "EEP: 2" header with "EEP: XXX" since you don't yet - have an EEP number assignment. - - - Change the Title header to the title of your EEP. - - - Leave the Version and Last-Modified headers alone; we'll take - care of those when we check your EEP into the Subversion - repository. These headers consist of keywords ("Revision" and - "Date" enclosed in "$"-signs) which are automatically expanded - by the repository. Please do not edit the expanded date or - revision text. - - - Change the Author header to include your name, and optionally - your email address. 
Be sure to follow the format carefully: - your name must appear first, and it must not be contained in - parentheses. Your email address may appear second (or it can be - omitted) and if it appears, it must appear in angle brackets. - It is okay to obfuscate your email address. - - - If there is a mailing list for discussion of your new feature, - add a Discussions-To header right after the Author header. You - should not add a Discussions-To header if the mailing list to be - used is erlang-questions@erlang.org, or if discussions should be - sent to you directly. Most Informational EEPs don't have a - Discussions-To header. - - - Change the Status header to "Draft". - - - For Standards Track EEPs, change the Type header to "Standards - Track". - - - For Informational EEPs, change the Type header to - "Informational". - - - For Standards Track EEPs, if your feature depends on the - acceptance of some other currently in-development EEP, add a - Requires header right after the Type header. The value should - be the EEP number of the EEP yours depends on. Don't add this - header if your dependent feature is described in a Final EEP. - - - Change the Created header to today's date. Be sure to follow - the format carefully: it must be in dd-mmm-yyyy format, where - the mmm is the 3 English letter month abbreviation, e.g. one of - Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec. - - - For Standards Track EEPs, after the Created header, add a - Erlang-Version header and set the value to the next planned - version of Erlang, i.e. the one your new feature will hopefully - make its first appearance in. Thus, if the last version of - Erlang/OTP was R11B-3 and you're hoping to get your new feature - into R11B-4 set the version header to: - - Erlang-Version: R11B-4 - - - Leave Post-History alone for now; you'll add dates to this - header each time you post your EEP to - erlang-questions@erlang.org. E.g. 
if you posted your EEP to the - list on August 14, 2006 and September 3, 2006, the Post-History - header would look like: - - Post-History: 14-Aug-2006, 03-Sept-2006 - - You must manually add new dates and check them in. If you don't - have check-in privileges, send your changes to the EEP editor. - - - Add a Replaces header if your EEP obsoletes an earlier EEP. The - value of this header is the number of the EEP that your new EEP - is replacing. Only add this header if the older EEP is in - "final" form, i.e. is either Accepted, Final, or Rejected. You - aren't replacing an older open EEP if you're submitting a - competing idea. - - - Now write your Abstract, Rationale, and other content for your - EEP, replacing all this gobbledygook with your own text. Be sure - to adhere to the format guidelines below, specifically on the - prohibition of tab characters and the indentation requirements. - - - Update your References and Copyright section. Usually you'll - place your EEP into the public domain, in which case just leave - the "Copyright" section alone. Alternatively, you can use the - Open Publication License[4], but public domain is still strongly - preferred. - - - Leave the little Emacs turd at the end of this file alone, - including the formfeed character ("^L", or \f). - - - Send your EEP submission to the EEP editors (eeps@erlang.org), - (Funny Joke removed :) - - -Plaintext EEP Formatting Requirements - - EEP headings must begin in column zero and the initial letter of - each word must be capitalized as in book titles. Acronyms should - be in all capitals. The body of each section must be indented 4 - spaces. Code samples inside body sections should be indented a - further 4 spaces, and other indentation can be used as required to - make the text readable. You must use two blank lines between the - last line of a section's body and the next section heading. - - You must adhere to the Emacs convention of adding two spaces at - the end of every sentence. 
You should fill your paragraphs to - column 70, but under no circumstances should your lines extend - past column 79. If your code samples spill over column 79, you - should rewrite them. - - Tab characters must never appear in the document at all. An EEP - should include the standard Emacs stanza included by example at - the bottom of this EEP. - - When referencing an external web page in the body of an EEP, you - should include the title of the page in the text, with a - footnote reference to the URL. Do not include the URL in the body - text of the EEP. E.g. - - Refer to the Erlang Language web site [1] for more details. - ... - [1] http://www.erlang.org - - When referring to another EEP, include the EEP number in the body - text, such as "EEP 1". The title may optionally appear. Add a - footnote reference, a number in square brackets. The footnote - body should include the EEP's title and author. It may optionally - include the explicit URL on a separate line, but only in the - References section. Note that the eep2html.py script will - calculate URLs automatically. For example: - - ... - Refer to EEP 1 [7] for more information about EEP style + EEP: 2 + Title: Sample Plaintext PEP Template + Version: $Revision$ + Last-Modified: $Date$ + Author: Per Gustafsson + Status: Active + Type: Process + Content-Type: text/plain + Created: 14-Aug-2001 + Post-History: + + + Abstract + + This EEP provides a boilerplate or sample template for creating + your own plaintext EEPs. In conjunction + with the content guidelines in EEP 1 [1], this should make it easy + for you to conform your own EEPs to the format outlined below. + + Note: if you are reading this EEP via the web, you should first + grab the plaintext source of this EEP in order to complete the + steps below. DO NOT USE THE HTML FILE AS YOUR TEMPLATE! + + If you would prefer to use lightweight markup in your EEP, please + see EEP 3, "Sample reStructuredText EEP Template" [2]. 
+ + This document is based on PEP 9 [3]. + + + Rationale + + EEP submissions come in a wide variety of forms, not all adhering + to the format guidelines set forth below. Use this template, in + conjunction with the content guidelines in EEP 1, to ensure that + your EEP submission won't get automatically rejected because of + form. + + + How to Use This Template + + To use this template you must first decide whether your EEP is + going to be an Informational or Standards Track EEP. Most EEPs + are Standards Track because they propose a new feature for the + Erlang language or standard library. When in doubt, read EEP 1 + for details or contact the EEP editors . + + Once you've decided which type of EEP yours is going to be, follow + the directions below. + + - Make a copy of this file (.txt file, not HTML!) and perform the + following edits. + + - Replace the "EEP: 2" header with "EEP: XXX" since you don't yet + have an EEP number assignment. + + - Change the Title header to the title of your EEP. + + - Leave the Version and Last-Modified headers alone; we'll take + care of those when we check your EEP into the Subversion + repository. These headers consist of keywords ("Revision" and + "Date" enclosed in "$"-signs) which are automatically expanded + by the repository. Please do not edit the expanded date or + revision text. + + - Change the Author header to include your name, and optionally + your email address. Be sure to follow the format carefully: + your name must appear first, and it must not be contained in + parentheses. Your email address may appear second (or it can be + omitted) and if it appears, it must appear in angle brackets. + It is okay to obfuscate your email address. + + - If there is a mailing list for discussion of your new feature, + add a Discussions-To header right after the Author header. 
You + should not add a Discussions-To header if the mailing list to be + used is erlang-questions@erlang.org, or if discussions should be + sent to you directly. Most Informational EEPs don't have a + Discussions-To header. + + - Change the Status header to "Draft". + + - For Standards Track EEPs, change the Type header to "Standards + Track". + + - For Informational EEPs, change the Type header to + "Informational". + + - For Standards Track EEPs, if your feature depends on the + acceptance of some other currently in-development EEP, add a + Requires header right after the Type header. The value should + be the EEP number of the EEP yours depends on. Don't add this + header if your dependent feature is described in a Final EEP. + + - Change the Created header to today's date. Be sure to follow + the format carefully: it must be in dd-mmm-yyyy format, where + the mmm is the 3 English letter month abbreviation, e.g. one of + Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec. + + - For Standards Track EEPs, after the Created header, add a + Erlang-Version header and set the value to the next planned + version of Erlang, i.e. the one your new feature will hopefully + make its first appearance in. Thus, if the last version of + Erlang/OTP was R11B-3 and you're hoping to get your new feature + into R11B-4 set the version header to: + + Erlang-Version: R11B-4 + + - Leave Post-History alone for now; you'll add dates to this + header each time you post your EEP to + erlang-questions@erlang.org. E.g. if you posted your EEP to the + list on August 14, 2006 and September 3, 2006, the Post-History + header would look like: + + Post-History: 14-Aug-2006, 03-Sept-2006 + + You must manually add new dates and check them in. If you don't + have check-in privileges, send your changes to the EEP editor. + + - Add a Replaces header if your EEP obsoletes an earlier EEP. The + value of this header is the number of the EEP that your new EEP + is replacing. 
Only add this header if the older EEP is in + "final" form, i.e. is either Accepted, Final, or Rejected. You + aren't replacing an older open EEP if you're submitting a + competing idea. + + - Now write your Abstract, Rationale, and other content for your + EEP, replacing all this gobbledygook with your own text. Be sure + to adhere to the format guidelines below, specifically on the + prohibition of tab characters and the indentation requirements. + + - Update your References and Copyright section. Usually you'll + place your EEP into the public domain, in which case just leave + the "Copyright" section alone. Alternatively, you can use the + Open Publication License[4], but public domain is still strongly + preferred. + + - Leave the little Emacs turd at the end of this file alone, + including the formfeed character ("^L", or \f). + + - Send your EEP submission to the EEP editors (eeps@erlang.org), + (Funny Joke removed :) + + + Plaintext EEP Formatting Requirements + + EEP headings must begin in column zero and the initial letter of + each word must be capitalized as in book titles. Acronyms should + be in all capitals. The body of each section must be indented 4 + spaces. Code samples inside body sections should be indented a + further 4 spaces, and other indentation can be used as required to + make the text readable. You must use two blank lines between the + last line of a section's body and the next section heading. + + You must adhere to the Emacs convention of adding two spaces at + the end of every sentence. You should fill your paragraphs to + column 70, but under no circumstances should your lines extend + past column 79. If your code samples spill over column 79, you + should rewrite them. + + Tab characters must never appear in the document at all. An EEP + should include the standard Emacs stanza included by example at + the bottom of this EEP. 
+ + When referencing an external web page in the body of an EEP, you + should include the title of the page in the text, with a + footnote reference to the URL. Do not include the URL in the body + text of the EEP. E.g. + + Refer to the Erlang Language web site [1] for more details. ... - - References - - [7] EEP 1, EEP Purpose and Guidelines, Gustafsson - http://www.erlang.org/eeps/eep-0001.html - - If you decide to provide an explicit URL for an EEP, please use - this as the URL template: - - http://www.erlang.org/eeps/eep-xxxx.html - - EEP numbers in URLs must be padded with zeros from the left, so as - to be exactly 4 characters wide, however EEP numbers in the text - are never padded. - - -References - - [1] EEP 1, EEP Purpose and Guidelines, Gustafsson - http://www.erlang.org/eeps/eep-0001.html - - [2] EEP 3, Sample reStructuredText EEP Template, Gustafsson - http://www.erlang.org/eeps/eep-0003.html - - [3] PEP 9, Sample Plaintext PEP Template, Warsaw - http://www.python.org/dev/peps/pep-0009/ - - [4] http://www.opencontent.org/openpub/ - - - -Copyright - - This document has been placed in the public domain. - - - -Local Variables: -mode: indented-text -indent-tabs-mode: nil -sentence-end-double-space: t -fill-column: 70 -coding: utf-8 -End: + [1] http://www.erlang.org + + When referring to another EEP, include the EEP number in the body + text, such as "EEP 1". The title may optionally appear. Add a + footnote reference, a number in square brackets. The footnote + body should include the EEP's title and author. It may optionally + include the explicit URL on a separate line, but only in the + References section. Note that the eep2html.py script will + calculate URLs automatically. For example: + + ... + Refer to EEP 1 [7] for more information about EEP style + ... 
+ + References + + [7] EEP 1, EEP Purpose and Guidelines, Gustafsson + http://www.erlang.org/eeps/eep-0001.html + + If you decide to provide an explicit URL for an EEP, please use + this as the URL template: + + http://www.erlang.org/eeps/eep-xxxx.html + + EEP numbers in URLs must be padded with zeros from the left, so as + to be exactly 4 characters wide, however EEP numbers in the text + are never padded. + + + References + + [1] EEP 1, EEP Purpose and Guidelines, Gustafsson + http://www.erlang.org/eeps/eep-0001.html + + [2] EEP 3, Sample reStructuredText EEP Template, Gustafsson + http://www.erlang.org/eeps/eep-0003.html + + [3] PEP 9, Sample Plaintext PEP Template, Warsaw + http://www.python.org/dev/peps/pep-0009/ + + [4] http://www.opencontent.org/openpub/ + + + + Copyright + + This document has been placed in the public domain. + + + + Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + coding: utf-8 + End: diff --git a/eeps/eep-0003.md b/eeps/eep-0003.md index 1924607..bdaf3eb 100644 --- a/eeps/eep-0003.md +++ b/eeps/eep-0003.md @@ -1,629 +1,629 @@ -EEP: 3 -Title: Sample reStructuredText EEP Template -Version: $Revision$ -Last-Modified: $Date$ -Author: Per Gustafsson [pergu(at)it(dot)uu(dot)se] -Status: Active -Type: Process -Content-Type: text/x-rst -Created: 05-Aug-2002 -Post-History: 30-Aug-2002 - - -Abstract -======== - -This EEP provides a boilerplate or sample template for creating your -own reStructuredText EEPs. In conjunction with the content guidelines -in EEP 1 [1]_, this should make it easy for you to conform your own -EEPs to the format outlined below. - -Note: if you are reading this EEP via the web, you should first grab -the text (reStructuredText) source of this EEP in order to complete -the steps below. **DO NOT USE THE HTML FILE AS YOUR TEMPLATE!** - -If you would prefer not to use markup in your EEP, please see EEP 2, -"Sample Plaintext EEP Template" [2]_. 
- -This EEP is a slightly revised version of PEP 12 [3]_. - - -Rationale -========= - -ReStructuredText is offered as an alternative to plaintext EEPs, to -allow EEP authors more functionality and expressivity, while -maintaining easy readability in the source text. The processed HTML -form makes the functionality accessible to readers: live hyperlinks, -styled text, tables, images, and automatic tables of contents, among -other advantages. - - -How to Use This Template -======================== - -To use this template you must first decide whether your EEP is going -to be an Informational or Standards Track EEP. Most EEPs are -Standards Track because they propose a new feature for the Erlang -language or standard library. When in doubt, read EEP 1 for details -or contact the EEP editors . - -Once you've decided which type of EEP yours is going to be, follow the -directions below. - -- Make a copy of this file (``.txt`` file, **not** HTML!) and perform - the following edits. - -- Replace the "EEP: 3" header with "EEP: XXX" since you don't yet have - a EEP number assignment. - -- Change the Title header to the title of your EEP. - -- Leave the Version and Last-Modified headers alone; we'll take care - of those when we check your EEP into the Subversion repository. - These headers consist of keywords ("Revision" and "Date" enclosed in - "$"-signs) which are automatically expanded by the repository. - Please do not edit the expanded date or revision text. - -- Change the Author header to include your name, and optionally your - email address. Be sure to follow the format carefully: your name - must appear first, and it must not be contained in parentheses. - Your email address may appear second (or it can be omitted) and if - it appears, it must appear in angle brackets. It is okay to - obfuscate your email address. - -- If there is a mailing list for discussion of your new feature, add a - Discussions-To header right after the Author header. 
You should not - add a Discussions-To header if the mailing list to be used is either - the erlang mailing list, or if discussions - should be sent to you directly. Most Informational EEPs don't have - a Discussions-To header. - -- Change the Status header to "Draft". - -- For Standards Track EEPs, change the Type header to "Standards - Track". - -- For Informational EEPs, change the Type header to "Informational". - -- For Standards Track EEPs, if your feature depends on the acceptance - of some other currently in-development EEP, add a Requires header - right after the Type header. The value should be the EEP number of - the EEP yours depends on. Don't add this header if your dependent - feature is described in a Final EEP. - -- Change the Created header to today's date. Be sure to follow the - format carefully: it must be in ``dd-mmm-yyyy`` format, where the - ``mmm`` is the 3 English letter month abbreviation, i.e. one of Jan, - Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec. - -- For Standards Track EEPs, after the Created header, add a - Erlang-Version header and set the value to the next planned - version of Erlang, i.e. the one your new feature will hopefully - make its first appearance in. Thus, if the last version of - Erlang/OTP was R11B-3 and you're hoping to get your new feature - into R11B-4 set the version header to: - - Erlang-Version: R11B-4 - -- Leave Post-History alone for now; you'll add dates to this header - each time you post your EEP to the Erlang Mailing list. If you posted - your EEP to the lists on August 14, 2001 and September 3, 2001, the - Post-History header would look like:: - - Post-History: 14-Aug-2001, 03-Sept-2001 - - You must manually add new dates and check them in. If you don't - have check-in privileges, send your changes to the EEP editors. - -- Add a Replaces header if your EEP obsoletes an earlier EEP. The - value of this header is the number of the EEP that your new EEP is - replacing. 
Only add this header if the older EEP is in "final" - form, i.e. is either Accepted, Final, or Rejected. You aren't - replacing an older open EEP if you're submitting a competing idea. - -- Now write your Abstract, Rationale, and other content for your EEP, - replacing all this gobbledygook with your own text. Be sure to - adhere to the format guidelines below, specifically on the - prohibition of tab characters and the indentation requirements. - -- Update your References and Copyright section. Usually you'll place - your EEP into the public domain, in which case just leave the - Copyright section alone. Alternatively, you can use the `Open - Publication License`__, but public domain is still strongly - preferred. - - __ http://www.opencontent.org/openpub/ - -- Leave the Emacs stanza at the end of this file alone, including the - formfeed character ("^L", or ``\f``). - -- Send your EEP submission to the EEP editors at eeps@erlang.org. - - -ReStructuredText EEP Formatting Requirements -============================================ - -The following is a EEP-specific summary of reStructuredText syntax. -For the sake of simplicity and brevity, much detail is omitted. For -more detail, see `Resources`_ below. `Literal blocks`_ (in which no -markup processing is done) are used for examples throughout, to -illustrate the plaintext markup. - - -General -------- - -You must adhere to the Emacs convention of adding two spaces at the -end of every sentence. You should fill your paragraphs to column 70, -but under no circumstances should your lines extend past column 79. -If your code samples spill over column 79, you should rewrite them. - -Tab characters must never appear in the document at all. A EEP should -include the standard Emacs stanza included by example at the bottom of -this EEP. - - -Section Headings ----------------- - -EEP headings must begin in column zero and the initial letter of each -word must be capitalized as in book titles. 
Acronyms should be in all -capitals. Section titles must be adorned with an underline, a single -repeated punctuation character, which begins in column zero and must -extend at least as far as the right edge of the title text (4 -characters minimum). First-level section titles are underlined with -"=" (equals signs), second-level section titles with "-" (hyphens), -and third-level section titles with "'" (single quotes or -apostrophes). For example:: - - First-Level Title - ================= - - Second-Level Title + EEP: 3 + Title: Sample reStructuredText EEP Template + Version: $Revision$ + Last-Modified: $Date$ + Author: Per Gustafsson + Status: Active + Type: Process + Content-Type: text/x-rst + Created: 05-Aug-2002 + Post-History: 30-Aug-2002 + + + Abstract + ======== + + This EEP provides a boilerplate or sample template for creating your + own reStructuredText EEPs. In conjunction with the content guidelines + in EEP 1 [1]_, this should make it easy for you to conform your own + EEPs to the format outlined below. + + Note: if you are reading this EEP via the web, you should first grab + the text (reStructuredText) source of this EEP in order to complete + the steps below. **DO NOT USE THE HTML FILE AS YOUR TEMPLATE!** + + If you would prefer not to use markup in your EEP, please see EEP 2, + "Sample Plaintext EEP Template" [2]_. + + This EEP is a slightly revised version of PEP 12 [3]_. + + + Rationale + ========= + + ReStructuredText is offered as an alternative to plaintext EEPs, to + allow EEP authors more functionality and expressivity, while + maintaining easy readability in the source text. The processed HTML + form makes the functionality accessible to readers: live hyperlinks, + styled text, tables, images, and automatic tables of contents, among + other advantages. + + + How to Use This Template + ======================== + + To use this template you must first decide whether your EEP is going + to be an Informational or Standards Track EEP. 
Most EEPs are + Standards Track because they propose a new feature for the Erlang + language or standard library. When in doubt, read EEP 1 for details + or contact the EEP editors . + + Once you've decided which type of EEP yours is going to be, follow the + directions below. + + - Make a copy of this file (``.txt`` file, **not** HTML!) and perform + the following edits. + + - Replace the "EEP: 3" header with "EEP: XXX" since you don't yet have + a EEP number assignment. + + - Change the Title header to the title of your EEP. + + - Leave the Version and Last-Modified headers alone; we'll take care + of those when we check your EEP into the Subversion repository. + These headers consist of keywords ("Revision" and "Date" enclosed in + "$"-signs) which are automatically expanded by the repository. + Please do not edit the expanded date or revision text. + + - Change the Author header to include your name, and optionally your + email address. Be sure to follow the format carefully: your name + must appear first, and it must not be contained in parentheses. + Your email address may appear second (or it can be omitted) and if + it appears, it must appear in angle brackets. It is okay to + obfuscate your email address. + + - If there is a mailing list for discussion of your new feature, add a + Discussions-To header right after the Author header. You should not + add a Discussions-To header if the mailing list to be used is either + the erlang mailing list, or if discussions + should be sent to you directly. Most Informational EEPs don't have + a Discussions-To header. + + - Change the Status header to "Draft". + + - For Standards Track EEPs, change the Type header to "Standards + Track". + + - For Informational EEPs, change the Type header to "Informational". + + - For Standards Track EEPs, if your feature depends on the acceptance + of some other currently in-development EEP, add a Requires header + right after the Type header. 
The value should be the EEP number of + the EEP yours depends on. Don't add this header if your dependent + feature is described in a Final EEP. + + - Change the Created header to today's date. Be sure to follow the + format carefully: it must be in ``dd-mmm-yyyy`` format, where the + ``mmm`` is the 3 English letter month abbreviation, i.e. one of Jan, + Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec. + + - For Standards Track EEPs, after the Created header, add a + Erlang-Version header and set the value to the next planned + version of Erlang, i.e. the one your new feature will hopefully + make its first appearance in. Thus, if the last version of + Erlang/OTP was R11B-3 and you're hoping to get your new feature + into R11B-4 set the version header to: + + Erlang-Version: R11B-4 + + - Leave Post-History alone for now; you'll add dates to this header + each time you post your EEP to the Erlang Mailing list. If you posted + your EEP to the lists on August 14, 2001 and September 3, 2001, the + Post-History header would look like:: + + Post-History: 14-Aug-2001, 03-Sept-2001 + + You must manually add new dates and check them in. If you don't + have check-in privileges, send your changes to the EEP editors. + + - Add a Replaces header if your EEP obsoletes an earlier EEP. The + value of this header is the number of the EEP that your new EEP is + replacing. Only add this header if the older EEP is in "final" + form, i.e. is either Accepted, Final, or Rejected. You aren't + replacing an older open EEP if you're submitting a competing idea. + + - Now write your Abstract, Rationale, and other content for your EEP, + replacing all this gobbledygook with your own text. Be sure to + adhere to the format guidelines below, specifically on the + prohibition of tab characters and the indentation requirements. + + - Update your References and Copyright section. Usually you'll place + your EEP into the public domain, in which case just leave the + Copyright section alone. 
Alternatively, you can use the `Open + Publication License`__, but public domain is still strongly + preferred. + + __ http://www.opencontent.org/openpub/ + + - Leave the Emacs stanza at the end of this file alone, including the + formfeed character ("^L", or ``\f``). + + - Send your EEP submission to the EEP editors at eeps@erlang.org. + + + ReStructuredText EEP Formatting Requirements + ============================================ + + The following is a EEP-specific summary of reStructuredText syntax. + For the sake of simplicity and brevity, much detail is omitted. For + more detail, see `Resources`_ below. `Literal blocks`_ (in which no + markup processing is done) are used for examples throughout, to + illustrate the plaintext markup. + + + General + ------- + + You must adhere to the Emacs convention of adding two spaces at the + end of every sentence. You should fill your paragraphs to column 70, + but under no circumstances should your lines extend past column 79. + If your code samples spill over column 79, you should rewrite them. + + Tab characters must never appear in the document at all. A EEP should + include the standard Emacs stanza included by example at the bottom of + this EEP. + + + Section Headings + ---------------- + + EEP headings must begin in column zero and the initial letter of each + word must be capitalized as in book titles. Acronyms should be in all + capitals. Section titles must be adorned with an underline, a single + repeated punctuation character, which begins in column zero and must + extend at least as far as the right edge of the title text (4 + characters minimum). First-level section titles are underlined with + "=" (equals signs), second-level section titles with "-" (hyphens), + and third-level section titles with "'" (single quotes or + apostrophes). 
For example:: + + First-Level Title + ================= + + Second-Level Title + ------------------ + + Third-Level Title + ''''''''''''''''' + + If there are more than three levels of sections in your EEP, you may + insert overline/underline-adorned titles for the first and second + levels as follows:: + + ============================ + First-Level Title (optional) + ============================ + + ----------------------------- + Second-Level Title (optional) + ----------------------------- + + Third-Level Title + ================= + + Fourth-Level Title + ------------------ + + Fifth-Level Title + ''''''''''''''''' + + You shouldn't have more than five levels of sections in your EEP. If + you do, you should consider rewriting it. + + You must use two blank lines between the last line of a section's body + and the next section heading. If a subsection heading immediately + follows a section heading, a single blank line in-between is + sufficient. + + The body of each section is not normally indented, although some + constructs do use indentation, as described below. Blank lines are + used to separate constructs. + + + Paragraphs + ---------- + + Paragraphs are left-aligned text blocks separated by blank lines. + Paragraphs are not indented unless they are part of an indented + construct (such as a block quote or a list item). + + + Inline Markup + ------------- + + Portions of text within paragraphs and other text blocks may be + styled. For example:: + + Text may be marked as *emphasized* (single asterisk markup, + typically shown in italics) or **strongly emphasized** (double + asterisks, typically boldface). ``Inline literals`` (using double + backquotes) are typically rendered in a monospaced typeface. No + further markup recognition is done within the double backquotes, + so they're safe for any kind of code snippets. + + + Block Quotes + ------------ + + Block quotes consist of indented body elements. For example:: + + This is a paragraph. 
+ + This is a block quote. + + A block quote may contain many paragraphs. + + Block quotes are used to quote extended passages from other sources. + Block quotes may be nested inside other body elements. Use 4 spaces + per indent level. + + + Literal Blocks + -------------- + + .. + In the text below, double backquotes are used to denote inline + literals. "``::``" is written so that the colons will appear in a + monospaced font; the backquotes (``) are markup, not part of the + text. See "Inline Markup" above. + + By the way, this is a comment, described in "Comments" below. + + Literal blocks are used for code samples or preformatted ASCII art. To + indicate a literal block, preface the indented text block with + "``::``" (two colons). The literal block continues until the end of + the indentation. Indent the text block by 4 spaces. For example:: + + This is a typical paragraph. A literal block follows. + + :: + + for a in [5,4,3,2,1]: # this is program code, shown as-is + print a + print "it's..." + # a literal block continues until the indentation ends + + The paragraph containing only "``::``" will be completely removed from + the output; no empty paragraph will remain. "``::``" is also + recognized at the end of any paragraph. If immediately preceded by + whitespace, both colons will be removed from the output. When text + immediately precedes the "``::``", *one* colon will be removed from + the output, leaving only one colon visible (i.e., "``::``" will be + replaced by "``:``"). For example, one colon will remain visible + here:: + + Paragraph:: + + Literal block + + + Lists + ----- + + Bullet list items begin with one of "-", "*", or "+" (hyphen, + asterisk, or plus sign), followed by whitespace and the list item + body. List item bodies must be left-aligned and indented relative to + the bullet; the text immediately after the bullet determines the + indentation. For example:: + + This paragraph is followed by a list. 
+ + * This is the first bullet list item. The blank line above the + first list item is required; blank lines between list items + (such as below this paragraph) are optional. + + * This is the first paragraph in the second item in the list. + + This is the second paragraph in the second item in the list. + The blank line above this paragraph is required. The left edge + of this paragraph lines up with the paragraph above, both + indented relative to the bullet. + + - This is a sublist. The bullet lines up with the left edge of + the text blocks above. A sublist is a new list so requires a + blank line above and below. + + * This is the third item of the main list. + + This paragraph is not part of the list. + + Enumerated (numbered) list items are similar, but use an enumerator + instead of a bullet. Enumerators are numbers (1, 2, 3, ...), letters + (A, B, C, ...; uppercase or lowercase), or Roman numerals (i, ii, iii, + iv, ...; uppercase or lowercase), formatted with a period suffix + ("1.", "2."), parentheses ("(1)", "(2)"), or a right-parenthesis + suffix ("1)", "2)"). For example:: + + 1. As with bullet list items, the left edge of paragraphs must + align. + + 2. Each list item may contain multiple paragraphs, sublists, etc. + + This is the second paragraph of the second list item. + + a) Enumerated lists may be nested. + b) Blank lines may be omitted between list items. + + Definition lists are written like this:: + + what + Definition lists associate a term with a definition. + + how + The term is a one-line phrase, and the definition is one + or more paragraphs or body elements, indented relative to + the term. + + + Tables + ------ + + Simple tables are easy and compact:: + + ===== ===== ======= + A B A and B + ===== ===== ======= + False False False + True False False + False True False + True True True + ===== ===== ======= + + There must be at least two columns in a table (to differentiate from + section titles). 
Column spans use underlines of hyphens ("Inputs" + spans the first two columns):: + + ===== ===== ====== + Inputs Output + ------------ ------ + A B A or B + ===== ===== ====== + False False False + True False True + False True True + True True True + ===== ===== ====== + + Text in a first-column cell starts a new row. No text in the first + column indicates a continuation line; the rest of the cells may + consist of multiple lines. For example:: + + ===== ========================= + col 1 col 2 + ===== ========================= + 1 Second column of row 1. + 2 Second column of row 2. + Second line of paragraph. + 3 - Second column of row 3. + + - Second item in bullet + list (row 3, column 2). + ===== ========================= + + + Hyperlinks + ---------- + + When referencing an external web page in the body of a EEP, you should + include the title of the page in the text, with either an inline + hyperlink reference to the URL or a footnote reference (see + `Footnotes`_ below). Do not include the URL in the body text of the + EEP. + + Hyperlink references use backquotes and a trailing underscore to mark + up the reference text; backquotes are optional if the reference text + is a single word. For example:: + + In this paragraph, we refer to the `Erlang web site`_. + + An explicit target provides the URL. Put targets in a References + section at the end of the EEP, or immediately after the reference. + Hyperlink targets begin with two periods and a space (the "explicit + markup start"), followed by a leading underscore, the reference text, + a colon, and the URL (absolute or relative):: + + .. _Erlang web site: http://www.erlang.org/ + + The reference text and the target text must match (although the match + is case-insensitive and ignores differences in whitespace). Note that + the underscore trails the reference text but precedes the target text. 
+ If you think of the underscore as a right-pointing arrow, it points + *away* from the reference and *toward* the target. + + The same mechanism can be used for internal references. Every unique + section title implicitly defines an internal hyperlink target. We can + make a link to the Abstract section like this:: + + Here is a hyperlink reference to the `Abstract`_ section. The + backquotes are optional since the reference text is a single word; + we can also just write: Abstract_. + + Footnotes containing the URLs from external targets will be generated + automatically at the end of the References section of the EEP, along + with footnote references linking the reference text to the footnotes. + + Text of the form "EEP x" or "RFC x" (where "x" is a number) will be + linked automatically to the appropriate URLs. + + + Footnotes + --------- + + Footnote references consist of a left square bracket, a number, a + right square bracket, and a trailing underscore:: + + This sentence ends with a footnote reference [1]_. + + Whitespace must precede the footnote reference. Leave a space between + the footnote reference and the preceding word. + + When referring to another EEP, include the EEP number in the body + text, such as "EEP 1". The title may optionally appear. Add a + footnote reference following the title. For example:: + + Refer to EEP 1 [2]_ for more information. + + Add a footnote that includes the EEP's title and author. It may + optionally include the explicit URL on a separate line, but only in + the References section. Footnotes begin with ".. " (the explicit + markup start), followed by the footnote marker (no underscores), + followed by the footnote body. For example:: + + References + ========== + + .. 
[2] EEP 1, "EEP Purpose and Guidelines", Gustafsson + (http://www.erlang.org/eeps/eep-0001) + + If you decide to provide an explicit URL for a EEP, please use this as + the URL template:: + + http://www.erlang.org/eeps/eep-xxxx + + EEP numbers in URLs must be padded with zeros from the left, so as to + be exactly 4 characters wide, however EEP numbers in the text are + never padded. + + During the course of developing your EEP, you may have to add, remove, + and rearrange footnote references, possibly resulting in mismatched + references, obsolete footnotes, and confusion. Auto-numbered + footnotes allow more freedom. Instead of a number, use a label of the + form "#word", where "word" is a mnemonic consisting of alphanumerics + plus internal hyphens, underscores, and periods (no whitespace or + other characters are allowed). For example:: + + Refer to EEP 1 [#EEP-1]_ for more information. + + References + ========== + + .. [#EEP-1] EEP 1, "EEP Purpose and Guidelines", Gustafsson + + http://www.erlang.org/eeps/eep-0001 + + Footnotes and footnote references will be numbered automatically, and + the numbers will always match. Once a EEP is finalized, auto-numbered + labels should be replaced by numbers for simplicity. + + + Images + ------ + + If your EEP contains a diagram, you may include it in the processed + output using the "image" directive:: + + .. image:: diagram.png + + Any browser-friendly graphics format is possible: .png, .jpeg, .gif, + .tiff, etc. + + Since this image will not be visible to readers of the EEP in source + text form, you should consider including a description or ASCII art + alternative, using a comment (below). + + + Comments + -------- + + A comment block is an indented block of arbitrary text immediately + following an explicit markup start: two periods and whitespace. Leave + the ".." on a line by itself to ensure that the comment is not + misinterpreted as another explicit markup construct. 
Comments are not + visible in the processed document. For the benefit of those reading + your EEP in source form, please consider including a descriptions of + or ASCII art alternatives to any images you include. For example:: + + .. image:: dataflow.png + + .. + Data flows from the input module, through the "black box" + module, and finally into (and through) the output module. + + The Emacs stanza at the bottom of this document is inside a comment. + + + Escaping Mechanism ------------------ - - Third-Level Title - ''''''''''''''''' - -If there are more than three levels of sections in your EEP, you may -insert overline/underline-adorned titles for the first and second -levels as follows:: - - ============================ - First-Level Title (optional) - ============================ - - ----------------------------- - Second-Level Title (optional) - ----------------------------- - - Third-Level Title - ================= - - Fourth-Level Title - ------------------ - - Fifth-Level Title - ''''''''''''''''' - -You shouldn't have more than five levels of sections in your EEP. If -you do, you should consider rewriting it. - -You must use two blank lines between the last line of a section's body -and the next section heading. If a subsection heading immediately -follows a section heading, a single blank line in-between is -sufficient. - -The body of each section is not normally indented, although some -constructs do use indentation, as described below. Blank lines are -used to separate constructs. - - -Paragraphs ----------- - -Paragraphs are left-aligned text blocks separated by blank lines. -Paragraphs are not indented unless they are part of an indented -construct (such as a block quote or a list item). - - -Inline Markup -------------- - -Portions of text within paragraphs and other text blocks may be -styled. 
For example:: - - Text may be marked as *emphasized* (single asterisk markup, - typically shown in italics) or **strongly emphasized** (double - asterisks, typically boldface). ``Inline literals`` (using double - backquotes) are typically rendered in a monospaced typeface. No - further markup recognition is done within the double backquotes, - so they're safe for any kind of code snippets. - - -Block Quotes ------------- - -Block quotes consist of indented body elements. For example:: - - This is a paragraph. - - This is a block quote. - - A block quote may contain many paragraphs. - -Block quotes are used to quote extended passages from other sources. -Block quotes may be nested inside other body elements. Use 4 spaces -per indent level. - - -Literal Blocks --------------- - -.. - In the text below, double backquotes are used to denote inline - literals. "``::``" is written so that the colons will appear in a - monospaced font; the backquotes (``) are markup, not part of the - text. See "Inline Markup" above. - - By the way, this is a comment, described in "Comments" below. - -Literal blocks are used for code samples or preformatted ASCII art. To -indicate a literal block, preface the indented text block with -"``::``" (two colons). The literal block continues until the end of -the indentation. Indent the text block by 4 spaces. For example:: - - This is a typical paragraph. A literal block follows. - - :: - - for a in [5,4,3,2,1]: # this is program code, shown as-is - print a - print "it's..." - # a literal block continues until the indentation ends - -The paragraph containing only "``::``" will be completely removed from -the output; no empty paragraph will remain. "``::``" is also -recognized at the end of any paragraph. If immediately preceded by -whitespace, both colons will be removed from the output. 
When text -immediately precedes the "``::``", *one* colon will be removed from -the output, leaving only one colon visible (i.e., "``::``" will be -replaced by "``:``"). For example, one colon will remain visible -here:: - - Paragraph:: - - Literal block - - -Lists ------ - -Bullet list items begin with one of "-", "*", or "+" (hyphen, -asterisk, or plus sign), followed by whitespace and the list item -body. List item bodies must be left-aligned and indented relative to -the bullet; the text immediately after the bullet determines the -indentation. For example:: - - This paragraph is followed by a list. - - * This is the first bullet list item. The blank line above the - first list item is required; blank lines between list items - (such as below this paragraph) are optional. - - * This is the first paragraph in the second item in the list. - - This is the second paragraph in the second item in the list. - The blank line above this paragraph is required. The left edge - of this paragraph lines up with the paragraph above, both - indented relative to the bullet. - - - This is a sublist. The bullet lines up with the left edge of - the text blocks above. A sublist is a new list so requires a - blank line above and below. - - * This is the third item of the main list. - - This paragraph is not part of the list. - -Enumerated (numbered) list items are similar, but use an enumerator -instead of a bullet. Enumerators are numbers (1, 2, 3, ...), letters -(A, B, C, ...; uppercase or lowercase), or Roman numerals (i, ii, iii, -iv, ...; uppercase or lowercase), formatted with a period suffix -("1.", "2."), parentheses ("(1)", "(2)"), or a right-parenthesis -suffix ("1)", "2)"). For example:: - - 1. As with bullet list items, the left edge of paragraphs must - align. - - 2. Each list item may contain multiple paragraphs, sublists, etc. - - This is the second paragraph of the second list item. - - a) Enumerated lists may be nested. 
- b) Blank lines may be omitted between list items. - -Definition lists are written like this:: - - what - Definition lists associate a term with a definition. - - how - The term is a one-line phrase, and the definition is one - or more paragraphs or body elements, indented relative to - the term. - - -Tables ------- - -Simple tables are easy and compact:: - - ===== ===== ======= - A B A and B - ===== ===== ======= - False False False - True False False - False True False - True True True - ===== ===== ======= - -There must be at least two columns in a table (to differentiate from -section titles). Column spans use underlines of hyphens ("Inputs" -spans the first two columns):: - - ===== ===== ====== - Inputs Output - ------------ ------ - A B A or B - ===== ===== ====== - False False False - True False True - False True True - True True True - ===== ===== ====== - -Text in a first-column cell starts a new row. No text in the first -column indicates a continuation line; the rest of the cells may -consist of multiple lines. For example:: - - ===== ========================= - col 1 col 2 - ===== ========================= - 1 Second column of row 1. - 2 Second column of row 2. - Second line of paragraph. - 3 - Second column of row 3. - - - Second item in bullet - list (row 3, column 2). - ===== ========================= - - -Hyperlinks ----------- - -When referencing an external web page in the body of a EEP, you should -include the title of the page in the text, with either an inline -hyperlink reference to the URL or a footnote reference (see -`Footnotes`_ below). Do not include the URL in the body text of the -EEP. - -Hyperlink references use backquotes and a trailing underscore to mark -up the reference text; backquotes are optional if the reference text -is a single word. For example:: - - In this paragraph, we refer to the `Erlang web site`_. - -An explicit target provides the URL. 
Put targets in a References -section at the end of the EEP, or immediately after the reference. -Hyperlink targets begin with two periods and a space (the "explicit -markup start"), followed by a leading underscore, the reference text, -a colon, and the URL (absolute or relative):: - - .. _Erlang web site: http://www.erlang.org/ - -The reference text and the target text must match (although the match -is case-insensitive and ignores differences in whitespace). Note that -the underscore trails the reference text but precedes the target text. -If you think of the underscore as a right-pointing arrow, it points -*away* from the reference and *toward* the target. - -The same mechanism can be used for internal references. Every unique -section title implicitly defines an internal hyperlink target. We can -make a link to the Abstract section like this:: - - Here is a hyperlink reference to the `Abstract`_ section. The - backquotes are optional since the reference text is a single word; - we can also just write: Abstract_. - -Footnotes containing the URLs from external targets will be generated -automatically at the end of the References section of the EEP, along -with footnote references linking the reference text to the footnotes. - -Text of the form "EEP x" or "RFC x" (where "x" is a number) will be -linked automatically to the appropriate URLs. - - -Footnotes ---------- - -Footnote references consist of a left square bracket, a number, a -right square bracket, and a trailing underscore:: - - This sentence ends with a footnote reference [1]_. - -Whitespace must precede the footnote reference. Leave a space between -the footnote reference and the preceding word. - -When referring to another EEP, include the EEP number in the body -text, such as "EEP 1". The title may optionally appear. Add a -footnote reference following the title. For example:: - - Refer to EEP 1 [2]_ for more information. - -Add a footnote that includes the EEP's title and author. 
It may -optionally include the explicit URL on a separate line, but only in -the References section. Footnotes begin with ".. " (the explicit -markup start), followed by the footnote marker (no underscores), -followed by the footnote body. For example:: - - References - ========== - - .. [2] EEP 1, "EEP Purpose and Guidelines", Gustafsson - (http://www.erlang.org/eeps/eep-0001) - -If you decide to provide an explicit URL for a EEP, please use this as -the URL template:: - - http://www.erlang.org/eeps/eep-xxxx - -EEP numbers in URLs must be padded with zeros from the left, so as to -be exactly 4 characters wide, however EEP numbers in the text are -never padded. - -During the course of developing your EEP, you may have to add, remove, -and rearrange footnote references, possibly resulting in mismatched -references, obsolete footnotes, and confusion. Auto-numbered -footnotes allow more freedom. Instead of a number, use a label of the -form "#word", where "word" is a mnemonic consisting of alphanumerics -plus internal hyphens, underscores, and periods (no whitespace or -other characters are allowed). For example:: - - Refer to EEP 1 [#EEP-1]_ for more information. - + + reStructuredText uses backslashes ("``\``") to override the special + meaning given to markup characters and get the literal characters + themselves. To get a literal backslash, use an escaped backslash + ("``\\``"). There are two contexts in which backslashes have no + special meaning: `literal blocks`_ and inline literals (see `Inline + Markup`_ above). In these contexts, no markup recognition is done, + and a single backslash represents a literal backslash, without having + to double up. + + If you find that you need to use a backslash in your text, consider + using inline literals or a literal block instead. 
+ + + Habits to Avoid + =============== + + Many programmers who are familiar with TeX often write quotation marks + like this:: + + `single-quoted' or ``double-quoted'' + + Backquotes are significant in reStructuredText, so this practice + should be avoided. For ordinary text, use ordinary 'single-quotes' or + "double-quotes". For inline literal text (see `Inline Markup`_ + above), use double-backquotes:: + + ``literal text: in here, anything goes!`` + + + Resources + ========= + + Many other constructs and variations are possible. For more details + about the reStructuredText markup, in increasing order of + thoroughness, please see: + + * `A ReStructuredText Primer`__, a gentle introduction. + + __ http://docutils.sourceforge.net/docs/rst/quickstart.html + + * `Quick reStructuredText`__, a users' quick reference. + + __ http://docutils.sourceforge.net/docs/rst/quickref.html + + * `reStructuredText Markup Specification`__, the final authority. + + __ http://docutils.sourceforge.net/spec/rst/reStructuredText.html + + The processing of reStructuredText EEPs is done using Docutils_. The + `Docutils project web site`_ has more information. + + .. _Docutils: + .. _Docutils project web site: http://docutils.sourceforge.net/ + + References ========== - - .. [#EEP-1] EEP 1, "EEP Purpose and Guidelines", Gustafsson - - http://www.erlang.org/eeps/eep-0001 - -Footnotes and footnote references will be numbered automatically, and -the numbers will always match. Once a EEP is finalized, auto-numbered -labels should be replaced by numbers for simplicity. - - -Images ------- - -If your EEP contains a diagram, you may include it in the processed -output using the "image" directive:: - - .. image:: diagram.png - -Any browser-friendly graphics format is possible: .png, .jpeg, .gif, -.tiff, etc. - -Since this image will not be visible to readers of the EEP in source -text form, you should consider including a description or ASCII art -alternative, using a comment (below). 
- - -Comments --------- - -A comment block is an indented block of arbitrary text immediately -following an explicit markup start: two periods and whitespace. Leave -the ".." on a line by itself to ensure that the comment is not -misinterpreted as another explicit markup construct. Comments are not -visible in the processed document. For the benefit of those reading -your EEP in source form, please consider including a descriptions of -or ASCII art alternatives to any images you include. For example:: - - .. image:: dataflow.png - - .. - Data flows from the input module, through the "black box" - module, and finally into (and through) the output module. - -The Emacs stanza at the bottom of this document is inside a comment. - - -Escaping Mechanism ------------------- - -reStructuredText uses backslashes ("``\``") to override the special -meaning given to markup characters and get the literal characters -themselves. To get a literal backslash, use an escaped backslash -("``\\``"). There are two contexts in which backslashes have no -special meaning: `literal blocks`_ and inline literals (see `Inline -Markup`_ above). In these contexts, no markup recognition is done, -and a single backslash represents a literal backslash, without having -to double up. - -If you find that you need to use a backslash in your text, consider -using inline literals or a literal block instead. - - -Habits to Avoid -=============== - -Many programmers who are familiar with TeX often write quotation marks -like this:: - - `single-quoted' or ``double-quoted'' - -Backquotes are significant in reStructuredText, so this practice -should be avoided. For ordinary text, use ordinary 'single-quotes' or -"double-quotes". For inline literal text (see `Inline Markup`_ -above), use double-backquotes:: - - ``literal text: in here, anything goes!`` - - -Resources -========= - -Many other constructs and variations are possible. 
For more details -about the reStructuredText markup, in increasing order of -thoroughness, please see: - -* `A ReStructuredText Primer`__, a gentle introduction. - - __ http://docutils.sourceforge.net/docs/rst/quickstart.html - -* `Quick reStructuredText`__, a users' quick reference. - - __ http://docutils.sourceforge.net/docs/rst/quickref.html - -* `reStructuredText Markup Specification`__, the final authority. - - __ http://docutils.sourceforge.net/spec/rst/reStructuredText.html - -The processing of reStructuredText EEPs is done using Docutils_. The -`Docutils project web site`_ has more information. - -.. _Docutils: -.. _Docutils project web site: http://docutils.sourceforge.net/ - - -References -========== - -.. [1] EEP 1, EEP Purpose and Guidelines, Gustafsson - (http://www.erlang.org/eeps/eep-0001.html) - -.. [2] EEP 2, Sample Plaintext EEP Template, Gustafsson - (http://www.erlang.org/eeps/eep-0002.html) - -.. [3] PEP 12, Sample reStructuredText PEP Template, Godger, Warsaw - (http://www.python.org/dev/peps/pep-0012/) - -Copyright -========= - -This document has been placed in the public domain. - - - -.. - Local Variables: - mode: indented-text - indent-tabs-mode: nil - sentence-end-double-space: t - fill-column: 70 - coding: utf-8 - End: + + .. [1] EEP 1, EEP Purpose and Guidelines, Gustafsson + (http://www.erlang.org/eeps/eep-0001.html) + + .. [2] EEP 2, Sample Plaintext EEP Template, Gustafsson + (http://www.erlang.org/eeps/eep-0002.html) + + .. [3] PEP 12, Sample reStructuredText PEP Template, Godger, Warsaw + (http://www.python.org/dev/peps/pep-0012/) + + Copyright + ========= + + This document has been placed in the public domain. + + + + .. 
+ Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + coding: utf-8 + End: diff --git a/eeps/eep-0004.md b/eeps/eep-0004.md index 646d809..a338df9 100644 --- a/eeps/eep-0004.md +++ b/eeps/eep-0004.md @@ -1,14 +1,14 @@ -EEP: 4 -Title: New BIFs for bit-level binaries (bit strings) -Version: $Revision$ -Last-Modified: $Date$ -Author: Per Gustafsson -Status: Draft -Type: Standards Track -Content-Type: text/x-rst -Created: 10-Aug-2007 -Erlang-Version: R12B-0 -Post-History: + Author: Per Gustafsson + Status: Final/R12B-0 Proposal is implemented in OTP release R12B-0 + Type: Standards Track + Created: 10-Aug-2007 + Erlang-Version: R12B-0 + Post-History: +**** +EEP 4: New BIFs for bit-level binaries (bit strings) +---- + + Abstract ======== @@ -125,3 +125,13 @@ bit-level binaries as we have for ordinary binaries without changing the semantics of the BIFs for binaries such as size/1, binary_to_list/1, list_to_binary/1 etc.. This means that all such BIFs will throw an exception if their arguments contains bit strings. 
+ + + +[EmacsVar]: <> "Local Variables:" +[EmacsVar]: <> "mode: indented-text" +[EmacsVar]: <> "indent-tabs-mode: nil" +[EmacsVar]: <> "sentence-end-double-space: t" +[EmacsVar]: <> "fill-column: 70" +[EmacsVar]: <> "coding: utf-8" +[EmacsVar]: <> "End:" diff --git a/eeps/eep-0005.md b/eeps/eep-0005.md index 02d2ac0..c4cb287 100644 --- a/eeps/eep-0005.md +++ b/eeps/eep-0005.md @@ -1,20 +1,19 @@ -EEP 5: More Versatile Encapsulation with export_to -==== - - Author: Per Gustafsson + Author: Per Gustafsson Status: Draft Type: Standards Track - Content-Type: text/x-markdown Created: 10-Aug-2007 Erlang-Version: R12B-0 Post-History: +**** +EEP 5: More Versatile Encapsulation with `export_to` +---- + -==== Abstract --------- +======== -This EEP describes a new directive called export_to which allows a +This EEP describes a new directive called `export_to` which allows a module to specify exactly which other modules that can call a function defined in the module. This provides a very fine grained primitive for encapsulation. Allowing the programmer to control more directly how @@ -22,12 +21,14 @@ his code should be used. This is an idea originally proposed by Richard O'Keefe. + + Specification -------------- +============= -This is the syntax for the export_to directive: +This is the syntax for the `export_to` directive: -``-export_to(m,[f/a])`` + -export_to(m,[f/a]) where `f` is the name of a function of arity `a` and `m` is a module. (Perhaps we should allow a list of modules) @@ -40,8 +41,10 @@ In addition these functions should act as exported functions to the rest of the world i.e. calls to these functions should always be to the latest version of the function. + + Motivation ----------- +========== The module in Erlang have several roles. It is the unit of compilation and code reloading. It is also the unit of encapsulation, because the @@ -61,18 +64,20 @@ the module, but sometimes we want to have more control than this e.g. 
this function should only be called from other modules in this application, or this function should only be called from the shell. -The export_to directive gives the programmer the possibility to +The `export_to` directive gives the programmer the possibility to express such restrictions that the runtime system then enforces. It -should be noted that the export_to directive is not meant to replace +should be noted that the `export_to` directive is not meant to replace the export directive, but to be an alternative in the case when the programmer knows all possible collaborators. + + Rationale ---------- +========= -There are some choices in designing the export_to syntax for example +There are some choices in designing the `export_to` syntax for example should m be allowed to be a list of modules or should we have an -export_to list where each entry is a module, function/arity pair. One +`export_to` list where each entry is a module, function/arity pair. One reason to use the suggested syntax is that it reads pretty easily as: export to module `m` this list of functions `[f/a]` @@ -84,30 +89,42 @@ make it possible to apply the function or to make it possible to update the code of the function. Discussions about such changes is outside the scope of this EEP, we -only note that the export_to directive makes a good building block for +only note that the `export_to` directive makes a good building block for creating such extensions without having to change the Erlang runtime. - .. Other languages that have something like this? + Backwards Compatibility ------------------------ +======================= -Adding an export_to directive should be totally backwards +Adding an `export_to` directive should be totally backwards compatible. Since writing such a directive now causes a syntax error since it is not a legal attribute. 
+ + Implementation --------------- +============== This feature has not been implemented yet, but here are some goals that we think the implementation should fulfill: -* Ordinary static calls to an export_to function should cost the same - as calls to other exported functions +* Ordinary static calls to an `export_to` function should cost the same + as calls to other exported functions -* The performance of other calls should not be affected by the - introduction of export_to calls +* The performance of other calls should not be affected by the + introduction of `export_to` calls This can be archived by putting most of the machinery to handle this feature in to the loader and only use dynamic checks for dynamic calls. + + + +[EmacsVar]: <> "Local Variables:" +[EmacsVar]: <> "mode: indented-text" +[EmacsVar]: <> "indent-tabs-mode: nil" +[EmacsVar]: <> "sentence-end-double-space: t" +[EmacsVar]: <> "fill-column: 70" +[EmacsVar]: <> "coding: utf-8" +[EmacsVar]: <> "End:" diff --git a/eeps/eep-0006.md b/eeps/eep-0006.md index d0abde7..f265102 100644 --- a/eeps/eep-0006.md +++ b/eeps/eep-0006.md @@ -1,69 +1,88 @@ -EEP 6: New BIFs for tuple and binary sizes -==== - - Author: Bjorn Gustavsson - Status: Draft + Author: Björn Gustavsson + Status: Final/R12B-0 Proposal is implemented in OTP release R12B-0 Type: Standards Track - Content-Type: text/x-markdown Created: 10-Aug-2007 Erlang-Version: R12B-0 Post-History: +**** +EEP 6: New BIFs for tuple and binary sizes +---- + -==== Abstract --------- +======== + +This EEP describes the two new guards BIFs `tuple_size/1` and `byte_size/1 ` +as a prefered alternative to the `size/1 ` BIF. + -This EEP describes the two new guards BIFs ``tuple_size/1`` and ``byte_size/1`` -as a prefered alternative to the ``size/1`` BIF. Specifications --------------- +============== -``byte_size/1::bitstring() -> integer()`` + byte_size/1::bitstring() -> integer() Returns the number of bytes needed to store the entire *bitstring* -(see ). 
This BIF will return -the same value as ``(bit_size(Bin)+7) div 8`` (that is, the number -of bytes will be rounded up if number of bits is not evenly divisible by 8). +(see [EEP 4][]). This BIF will return the same value as +`(bit_size(Bin)+7) div 8` (that is, the number of bytes will be +rounded up if number of bits is not evenly divisible by 8). This BIF is allowed in guards. -``tuple_size/1::tuple() -> integer()`` + tuple_size/1::tuple() -> integer() Returns the size of a tuple. This BIF will fail if passed anything that is not a tuple. This BIF is allowed in guards. + + Rationale ---------- +========= -The ``size/1`` BIF accepts either a binary or a tuple, and returns +The `size/1` BIF accepts either a binary or a tuple, and returns either the size of binary in bytes or the size of the tuple. -Because ``size/1`` accepts two different types, it is difficult to +Because `size/1` accepts two different types, it is difficult to optimize uses of it, both in the compiler and in the run-time system. Adding the two new BIF will faciliate optimization, and will also help Dialyzer. -It could be argued that ``byte_size/1`` should only work for +It could be argued that `byte_size/1` should only work for binaries (bitstrings whose size in bits is disivible by 8) to catch the bug that the code cannot handle general bitstrings and still does not -use an ``is_binary/1`` guard test. In my opinion, if the programmer -must round up the result from ``bit_size/1`` to a whole number of bytes, +use an `is_binary/1` guard test. In my opinion, if the programmer +must round up the result from `bit_size/1` to a whole number of bytes, he or she is more likely to get *that* wrong: The "obvious" expressions -``bit_size(B) / 8 + 1`` or ``bit_size(B) div 8 + 1`` are both wrong, -and the correct expression ``(bit_size(B)+7) div 8`` is not immediately +`bit_size(B) / 8 + 1` or `bit_size(B) div 8 + 1` are both wrong, +and the correct expression `(bit_size(B)+7) div 8` is not immediately obvious. 
+ + Implementation --------------- +============== The implementation is trivial. + + Backwards Compatibility ------------------------ +======================= + +Code containing local functions named `tuple_size/1` or `byte_size/1` +need to be changed. + +The compiler will issue a warning that `size/1` is deprecated +and will be removed in R14B for code that uses `size/1`. + + -Code containing local functions named ``tuple_size/1`` or ``byte_size/1`` need -to be changed. +[EEP 4]: "EEP 4" -The compiler will issue a warning that ``size/1`` is deprecated and will be removed -in R14B for code that uses ``size/1``. +[EmacsVar]: <> "Local Variables:" +[EmacsVar]: <> "mode: indented-text" +[EmacsVar]: <> "indent-tabs-mode: nil" +[EmacsVar]: <> "sentence-end-double-space: t" +[EmacsVar]: <> "fill-column: 70" +[EmacsVar]: <> "coding: utf-8" +[EmacsVar]: <> "End:" diff --git a/eeps/eep-0007.md b/eeps/eep-0007.md index 837b8fa..57778ca 100644 --- a/eeps/eep-0007.md +++ b/eeps/eep-0007.md @@ -1,36 +1,36 @@ -EEP 7: Foreign Function Interface (FFI) -==== - - Author: Alceste Scalas [alceste(at)crs4(dot)it] + Author: Alceste Scalas Status: Draft Type: Standards Track - Content-Type: text/x-markdown; charset=utf-8 Created: 3-Sep-2007 Erlang-Version: R12B Post-History: +**** +EEP 7: Foreign Function Interface (FFI) +---- + -==== Abstract --------- +======== This EEP describes a Foreign Function Interface (FFI) for Erlang/OTP, that allows to easily perform direct calls of external C functions. -It introduces three new BIFs (`ffi:raw_call/3`_, -`erl_ddll:load_library/3`_ and `ffi:raw_call/2`_) that accomplish the main +It introduces three new BIFs (`ffi:raw_call/3`, +`erl_ddll:load_library/3` and `ffi:raw_call/2`) that accomplish the main FFI tasks: loading generic C libraries, making external function calls and performing automatic Erlang-to-C and C-to-Erlang type conversions. 
It also introduces two auxiliary BIFs for converting C buffers/strings -into binaries (`ffi:raw_buffer_to_binary/2`_ and -`ffi:raw_cstring_to_binary/1`_), a new ``ffi`` Erlang module that +into binaries (`ffi:raw_buffer_to_binary/2` and +`ffi:raw_cstring_to_binary/1`), a new `ffi` Erlang module that provides a higher-level API with stricter type checking, and some -utility macros. Finally, it extends erl_ddll:info/2 with FFI +utility macros. Finally, it extends `erl_ddll:info/2` with FFI information. + Motivation ----------- +========== The current Erlang extension mechanisms can be divided in two main categories: @@ -46,8 +46,8 @@ the Erlang driver interface implies the development of relevant amounts of glue code, mostly because the communication between Erlang and C always requires data parsing and (de)serialization. Several tools have been created in order to autogenerate (at least part of) -that glue: from the (now unmaintained) [IG driver generation tool] [1] -to the newer [Erlang Driver Toolkit (EDTK)] [2] and [Dryverl] [3]. +that glue: from the (now unmaintained) [IG driver generation tool][1] +to the newer [Erlang Driver Toolkit (EDTK)][2] and [Dryverl][3]. But, even with the help of these tools, developing an Erlang driver is a difficult and time-consuming task (especially when interfacing @@ -67,8 +67,9 @@ An easier method for interfacing Erlang and C code could drastically extend the Erlang capabilities and open new usage scenarios. + Rationale ---------- +========= This EEP proposes a Foreign Function Interface (FFI) extension that would allow to easily perform direct C function calls. This concept @@ -76,10 +77,10 @@ is implemented in almost every language, with two main (non-exclusive) approaches: 1. automatic type conversions between the host and the foreign - language (examples: [Python] [7], [Haskell] [8]); + language (examples: [Python][7], [Haskell][8]); 2. 
documented C interface for handling host language types from the - foreign language (examples: [Java] [9], [Python] [10] [11][]). + foreign language (examples: [Java][9], [Python][10] [(API)][11]). This EEP follows the first approach, but (when possible) also reuses part of the existing C Driver API (and, thus, allows to manage @@ -98,8 +99,9 @@ from final users), and advanced programmers looking for an easy (and efficient) way to call C code from Erlang. + Overview --------- +======== In order to call a C function, the FFI needs a port opened towards the required C code. Thus, with the current driver loading mechanism, a @@ -111,47 +113,49 @@ developer would be required to: 2. compile it and possibly link it against the required C libraries, thus obtaining a void Erlang driver; -3. load the driver in the Erlang VM, by using erl_ddll:load/2. +3. load the driver in the Erlang VM, by using `erl_ddll:load/2`. In order to simplify this procedure, this EEP proposes the -`erl_ddll:load_library/3`_ function, that allows to load a generic +`erl_ddll:load_library/3` function, that allows to load a generic library in the Erlang VM --- even if it lacks the structure of an Erlang linked-in driver. -erl_ddll:load_library/3 also offers an option to preload a list of C +`erl_ddll:load_library/3` also offers an option to preload a list of C function symbols and signatures, thus precompiling the internal structures needed for performing dynamic function calls. Information -about preloaded data can be retrieved with `erl_ddll:info/2`_. +about preloaded data can be retrieved with `erl_ddll:info/2`. -Once a library or driver has been loaded, erlang:open_port/2 or -`erlang:open_port/1`_ could be used to get a port for the FFI +Once a library or driver has been loaded, `erlang:open_port/2` or +`erlang:open_port/1` could be used to get a port for the FFI functions, and perform calls either through the low-level or the high-level APIs. 
-### Low-level API ### +Low-level API +------------- -The low-level FFI methods are denoted by the ``raw_`` prefix. The -main function is the `ffi:raw_call/3`_ BIF, that performs a direct C +The low-level FFI methods are denoted by the `raw_` prefix. The +main function is the `ffi:raw_call/3` BIF, that performs a direct C function call through an open port. It converts C types to/from Erlang types. -When taken alone, ffi:raw_call/3 has got a major drawback: it introduces +When taken alone, `ffi:raw_call/3` has got a major drawback: it introduces great call overhead, due to the C symbol lookup and the dynamic construction of the function call. -In order to exploit preloading option of erl_ddll:load_library/3, the -`ffi:raw_call/2`_ BIF is introduced: it avoids symbol lookup and call +In order to exploit preloading option of `erl_ddll:load_library/3`, the +`ffi:raw_call/2` BIF is introduced: it avoids symbol lookup and call structure compilation, thus guaranteeing a lower call overhead than -ffi:raw_call/3. +`ffi:raw_call/3`. Furthermore, the low-level interface provides two BIFs for creating an Erlang binary from a C pointer (possibly returned by a FFI call). -These BIFs are `ffi:raw_buffer_to_binary/2`_ and -`ffi:raw_cstring_to_binary/1`_. +These BIFs are `ffi:raw_buffer_to_binary/2` and +`ffi:raw_cstring_to_binary/1`. -### High-level API ### +High-level API +-------------- The high-level interface is built upon the low-level one. It introduces the concept of type-tagged values: any value passed to or @@ -163,126 +167,128 @@ allows to: 2. make the C calls safer: the consistency of tagged values is checked before the values themselves are passed to the low-level API. Furthermore, the preload information given to - erl_ddll:load_library/3 is used (when available) to ensure that the + `erl_ddll:load_library/3` is used (when available) to ensure that the tagged values actually match the function signature; 3. 
simulate the static typing of C code, thus requiring proper and explicit "casts" when a tagged value needs to be converted to another type. -These checks are performed by `ffi:call/3`_, `ffi:buffer_to_binary/2`_ -and `ffi:cstring_to_binary/1`_ (the type-tagged equivalents of the +These checks are performed by `ffi:call/3`, `ffi:buffer_to_binary/2` +and `ffi:cstring_to_binary/1` (the type-tagged equivalents of the low-level BIFs). Type-tagged values can also be checked with -`ffi:check/1`_. Furthermore, the allowed minimum and maximum value of -each FFI type can be examined with `ffi:min/1`_ and `ffi:max/1`_. +`ffi:check/1`. Furthermore, the allowed minimum and maximum value of +each FFI type can be examined with `ffi:min/1` and `ffi:max/1`. -### Utility macros ### +Utility macros +-------------- -The FFI defines a series of utility macros in the `ffi_hardcodes.hrl`_ +The FFI defines a series of utility macros in the `ffi_hardcodes.hrl` header file, that could be used for binary matching of C buffers and structures. Specifications --------------- +============== -### Types ### +Types +----- +### `c_func_name()` -#### c_func_name() - - `c_func_name() = atom() | string()` + c_func_name() = atom() | string() Name of a C function. -#### type_tag() +### `type_tag()` - `type_tag() = atom()` + type_tag() = atom() Valid FFI type atom. For the list of allowed values, see the Appendix. -##### tagged_value() +### `tagged_value()` - `tagged_value() = tuple(type_tag(), term())` + tagged_value() = tuple(type_tag(), term()) Type-tagged value used for FFI calls. -#### tagged_func_name() +### `tagged_func_name()` - `tagged_func_name() = tuple(type_tag(), c_func_name())` + tagged_func_name() = tuple(type_tag(), c_func_name()) C function name with return type. -#### func_index() +### `func_index()` - `func_index() = integer()` + func_index() = integer() Function position on the list of preloads given to -`erl_ddll:load_library/3`_. +`erl_ddll:load_library/3`. 
-#### tagged_func_index() +### `tagged_func_index()` - `tagged_func_index() = tuple(type_tag(), func_index())` + tagged_func_index() = tuple(type_tag(), func_index()) C function index with return type. -#### signature() +### `signature()` - `signature() = tuple(type_tag(), ...)` + signature() = tuple(type_tag(), ...) Signature of a C function: return type followed by arguments types (if any). -### erl_ddll:load_library/3 ### - +`erl_ddll:load_library/3` +------------------------- erl_ddll:load_library(Path, Name, OptionsList) -> ok | {error, ErrorDesc} Types: -- Path = Name = string() | atom() +- `Path = Name = string() | atom()` -- OptionList = [Option] +- `OptionList = [Option]` -- Option = tuple(preload, [Preload]) +- `Option = tuple(preload, [Preload])` -- Preload = tuple(`c_func_name()`_, `signature()`_) +- `Preload = tuple(c_func_name(), signature())` Load a generic shared library. -If an ``ErlDrvEntry`` structure and a driver init function are found when -loading the library, this BIF will behave like erl_ddll:load/2. The -function parameters are also the same of erl_ddll:load/2, with the +If an `ErlDrvEntry` structure and a driver init function are found when +loading the library, this BIF will behave like `erl_ddll:load/2`. The +function parameters are also the same of `erl_ddll:load/2`, with the following addition: **OptionList** is a list of options for library/driver loading. The supported options are: - - **`{preload, PreloadList}`** - Preload the given list of functions, and prepare their - call structures. Each PreloadList element is a tuple - in the form: - tuple(`c_func_name()`_, `signature()`_) +- **`{preload, PreloadList}`** + Preload the given list of functions, and prepare their + call structures. Each PreloadList element is a tuple + in the form: + + tuple(c_func_name(), `signature()) - i.e. the function name followed by its return and - arguments types. + i.e. the function name followed by its return and + arguments types. 
-The function return values are the same of erl_ddll:load/2. +The function return values are the same of `erl_ddll:load/2`. Once a library has been loaded, it is possible to use `erlang:open_port/2` to get a port. That port could *always* be used -with `ffi:call/3`_, `ffi:raw_call/3`_ or `ffi:raw_call/2`_. However, -if the loaded library does *not* contain a proper ``ErlDrvEntry`` +with `ffi:call/3`, `ffi:raw_call/3` or `ffi:raw_call/2`. However, +if the loaded library does *not* contain a proper `ErlDrvEntry` structure and a driver init function, the port will **not** be usable with `erlang:port_command/2`, `erlang:port_control/3` etc. @@ -297,31 +303,34 @@ functions: :: {free, {void, nonnull}}]}]). -### erl_ddll:load_library/2 ### +`erl_ddll:load_library/2` +------------------------- erl_ddll:load_library(Path, Name) -Utility function that calls `erl_ddll:load_library/3`_ with an empty +Utility function that calls `erl_ddll:load_library/3` with an empty OptionsList. -### erlang:open_port/1 ### +`erlang:open_port/1` +-------------------- erlang:open_port(Library) Types: -- Library = string() | atom() +- `Library = string() | atom()` Open a port towards the specified shared library, possibly loaded with -`erl_ddll:load_library/3`_. Calling this function is equivalent to: +`erl_ddll:load_library/3`. Calling this function is equivalent to: erlang:open_port({spawn, Library}, [binary]) -### erl_ddll:info/2 ### +`erl_ddll:info/2` +----------------- -This EEP proposes a new parameter for the erl_ddll:info/2 BIF: the +This EEP proposes a new parameter for the `erl_ddll:info/2` BIF: the 'preloads' atom. It allows to retrieve information about FFI preloads for the given library. @@ -333,23 +342,24 @@ function. 
Each proplist, in turn, has the following format: { address, integer() }, % Function address { signature, signature() } ] % Function signature -This information would be made available also through erl_ddll:info/0 +This information would be made available also through `erl_ddll:info/0` and `erl_ddll:info/1`. -### ffi:raw_call/3 ### +`ffi:raw_call/3` +---------------- ffi:raw_call(Port, CallArgs, Signature) -> term() Types: -- Port = port() +- `Port = port()` -- CallArgs = tuple(`c_func_name()`_, Arg1, ...) +- `CallArgs = tuple(`c_func_name()`, Arg1, ...)` -- Arg1, ... = term() +- `Arg1, ... = term()` -- Signature = `signature()`_ +- `Signature = signature()` Call the specified C function. @@ -384,29 +394,30 @@ affect the Erlang VM, possibly making it crash. Use this BIF with extreme care. -### ffi:raw_call/2 ### +`ffi:raw_call/2` +---------------- ffi:raw_call(Port, OptimizedCall) -> term() Types: -- Port = port() +- `Port = port()` -- OptimizedCall = {FuncIndex, Arg1, ...} +- `OptimizedCall = {FuncIndex, Arg1, ...}` -- FuncIndex = func_index() +- `FuncIndex = func_index()` -- Arg1, ... = term() +- `Arg1, ... = term()` Call a function preloaded with the 'preload' option of -`erl_ddll:load_library/3`_. +`erl_ddll:load_library/3`. This BIF accepts the following parameters: - **Port** A port opened towards the required driver/library (that - **must** have been loaded with `erl_ddll:load_library/3`_). + **must** have been loaded with `erl_ddll:load_library/3`). - **OptimizedCall** A tuple with the function index (i.e. its position in @@ -429,15 +440,16 @@ affect the Erlang VM, possibly making it crash. Use this BIF with extreme care. 
-### ffi:raw_buffer_to_binary/2 ### +`ffi:raw_buffer_to_binary/2` +---------------------------- ffi:raw_buffer_to_binary(Pointer, Size) -> binary() Types: -- Pointer = integer() +- `Pointer = integer()` -- Size = integer() +- `Size = integer()` Return a binary with a copy of Size bytes read from the given C pointer (represented by an integer, possibly returned by a FFI call). @@ -446,13 +458,14 @@ pointer (represented by an integer, possibly returned by a FFI call). Erlang VM to crash. Use with extreme care. -### ffi:raw_cstring_to_binary/1 ### +`ffi:raw_cstring_to_binary/1` +----------------------------- ffi:raw_cstring_to_binary(CString) -> binary() Types: -- CString = integer() +- `CString = integer()` Return a binary with a copy of the given NULL-terminated C string (an integer representing a pointer, possibly returned by a FFI call). The @@ -462,36 +475,37 @@ binary will include the trailing 0. VM to crash. Use with extreme care. -### ffi:call/3 ### +`ffi:call/3` +------------ call(Port, CFunc, Args) -> RetVal Types: -- Port = port() +- `Port = port()` -- CFunc = `c_func_name()`_ | `func_index()`_ - | `tagged_func_name()`_ | `tagged_func_index()`_ +- `CFunc = c_func_name() | func_index() + | tagged_func_name()_ | `tagged_func_index()` -- Args = [`tagged_value()`_] +- `Args = [tagged_value()] -- RetVal = `tagged_value()`_ +- `RetVal = tagged_value()` -Call the C function ``CFunc`` with the given list of arguments, using -the port ``Port``. If the function was preloaded with -ffi:load_library/3, all the type tags will be matched against the +Call the C function `CFunc` with the given list of arguments, using +the port `Port`. If the function was preloaded with +`ffi:load_library/3`, all the type tags will be matched against the preloaded signature before performing the call. Return the return value of the C function, with the proper type tag. 
-**Note:** if ``CFunc`` is not of type `tagged_func_name()`_, the C +**Note:** if ``CFunc`` is not of type `tagged_func_name()`, the C function will be called if and only if it was preloaded with -`erl_ddll:load_library/3`_ (it is required in order to determine its +`erl_ddll:load_library/3` (it is required in order to determine its return type). -As an example, the following ``malloc()`` calls are all valid and +As an example, the following `malloc()` calls are all valid and equivalent when executed after the code sample shown in -`erl_ddll:load_library/3`_: +`erl_ddll:load_library/3`: %% Use function name, but require preloads for return type {nonnull, Ptr1} = ffi:call(Port, "malloc", [{size_t, 1024}]), @@ -510,30 +524,32 @@ affect the Erlang VM, possibly making it crash. Use this BIF with extreme care. -### ffi:buffer_to_binary/2 ### +`ffi:buffer_to_binary/2` +------------------------ ffi:buffer_to_binary(TaggedNonNull, Size) -> binary() Types: -- TaggedNonNull = tuple(nonnull, integer()) +- `TaggedNonNull = tuple(nonnull, integer())` -- Size: integer() +- `Size: integer()` -Return a binary with a copy of ``Size`` bytes read from the given C +Return a binary with a copy of `Size` bytes read from the given C pointer. **WARNING:** passing a wrong pointer to this function may cause the Erlang VM to crash. Use with extreme care. -### ffi:cstring_to_binary/1 ### +`ffi:cstring_to_binary/1` +------------------------- ffi:cstring_to_binary(TaggedCString) -> binary() Types: -- TaggedCString = tuple(cstring, integer()) +- `TaggedCString = tuple(cstring, integer())` Return a binary with a copy of the given NULL-terminated C string. @@ -541,62 +557,67 @@ Return a binary with a copy of the given NULL-terminated C string. Erlang VM to crash. Use with extreme care. 
-### ffi:sizeof/1 ### +`ffi:sizeof/1` +-------------- ffi:sizeof(TypeTag) -> integer() Types: -- TypeTag: `type_tag()`_ +- `TypeTag: type_tag()` Return the size (in bytes) of the given FFI type, on the current platform. -### ffi:check/1 ### +`ffi:check/1` +------------- ffi:check(TaggedValue) -> true | false Types: -- TaggedValue = `tagged_value()`_ +- `TaggedValue = tagged_value()` Returns 'true' if the given type-tagged value is well-formed and consistent (i.e. it falls in the allowed range for its type, on the current platform). Otherwise, returns 'false'. -### ffi:min/1 ### +`ffi:min/1` +----------- ffi:min(TypeTag) -> integer() Types: -- TypeTag = `type_tag()`_ +- `TypeTag = type_tag()` Return the minimum value allowed for the given FFI type, on the current platform. -### ffi:max/1 ### +`ffi:max/1` +----------- ffi:max(TypeTag) -> integer() Types: -- TypeTag = `type_tag()`_ +- `TypeTag = type_tag()` Return the maximum value allowed for the given FFI type, on the current platform. -### ffi_hardcodes.hrl ### +`ffi_hardcodes.hrl` +------------------- The `ffi_hardcodes.hrl` file is part of the Erlang ffi library. It defines a set of macros for handling FFI types sizes, and for easy binary matching on C buffers and structures: -- **FFI_HARDCODED_** +- **`FFI_HARDCODED_`** An Erlang bit-syntax snippet (Size/TypeSpecifier) that could be used to match the given FFI type inside a binary (possibly @@ -609,11 +630,11 @@ binary matching on C buffers and structures: <> = Binary -- **FFI_HARDCODED_SIZEOF_** +- **`FFI_HARDCODED_SIZEOF_`** The type size in *bytes* -- **FFI_HARDCODED__BITS** +- **`FFI_HARDCODED__BITS`** The type size in *bits* @@ -622,34 +643,36 @@ As implied by their name, the `ffi_hardcodes.hrl` contents are hard-coded in the resulting ``.beam`` files. Thus, these macros should be avoided if a developer expects his/her FFI-based code to be *portable without recompilation*. 
The recommended method for getting -FFI type sizes in a portable way is the `ffi:sizeof/1`_ function. +FFI type sizes in a portable way is the `ffi:sizeof/1` function. -Further notes -------------- +Further notes +============= -### Notes on FFI preloading ### +Notes on FFI preloading +----------------------- -When a library is loaded with `erl_ddll:load_library/3`_, it may be +When a library is loaded with `erl_ddll:load_library/3`, it may be reloaded or unloaded just like any Erlang linked-in driver. If the 'preload' option is used, then two additional behaviors arise: -- if `erl_ddll:load_library/3`_ is called two or more times with the - same library, then the associated preload list must be rebuilt - according to the last call. If no 'preload' option is used, then - the last preloads (if any) must be kept intact; +- if `erl_ddll:load_library/3` is called two or more times with the + same library, then the associated preload list must be rebuilt + according to the last call. If no 'preload' option is used, then + the last preloads (if any) must be kept intact; -- if an erl_ddll:reload/2 is issued, then the last preloads must be - refreshed by performing a new symbol lookup in the loaded library. - If one or more symbols could not be found anymore, then they must be - disabled (and an error must raised when trying to use them with - `ffi:raw_call/2`_). +- if an `erl_ddll:reload/2` is issued, then the last preloads must be + refreshed by performing a new symbol lookup in the loaded library. + If one or more symbols could not be found anymore, then they must be + disabled (and an error must raised when trying to use them with + `ffi:raw_call/2`). -### Notes on vararg functions ### +Notes on vararg functions +------------------------- -`ffi:call/3`_ and `ffi:raw_call/3`_ may be used to call vararg C +`ffi:call/3` and `ffi:raw_call/3` may be used to call vararg C functions, simply by providing the desired number of arguments. 
In order to exploit the preloading optimizations, however, it is @@ -666,9 +689,10 @@ like the following one: {printf, {sint, cstring, cstring}}]}]). -### Notes on C pointers and Erlang binaries ### +Notes on C pointers and Erlang binaries +--------------------------------------- -As reported in the Appendix_, an Erlang binary can be passed to a C +As reported in the Appendix, an Erlang binary can be passed to a C function as a 'pointer' value. In this case, the C function will receive a pointer to the first byte of binary data. @@ -678,9 +702,10 @@ the 'binary' FFI type (see next paragraph) or copy the data itself in a safe place. -### Notes on Erlang binaries and reference counting ### +Notes on Erlang binaries and reference counting +----------------------------------------------- -As reported in the Appendix_, when the 'binary' FFI type is used as +As reported in the Appendix, when the 'binary' FFI type is used as argument, the C function will also receive a binary (in the form of an ``ErlDrvBinary`` pointer). Correspondingly, a C function with 'binary' FFI return type must return an ``ErlDrvBinary`` pointer. Furthermore, @@ -704,13 +729,14 @@ passed to, or returned from, the C side through a FFI call. call ``driver_binary_dec_refc()`` before returning. -### Notes on type-tagged values ### +Notes on type-tagged values +--------------------------- As reported above, the high-level FFI API is based on type-tagged values. Type tags, however, may introduce yet another way to annotate/represent the types of Erlang function parameters --- and it -may become an annoying redundancy, expecially now that type contracts -are (probably) going to be introduced in Erlang [12][]. +may become an annoying redundancy, expecially now that type [contracts][12] +are (probably) going to be introduced in Erlang. 
Thus, the high-level FFI API should be considered highly experimental and subject to change, depending on how type contracts will allow to @@ -719,8 +745,9 @@ to be explored if/when contracts will be available in the standard Erlang/OTP distribution. + Backwards Compatibility ------------------------ +======================= This EEP, and the proposed FFI patches (see below), do not introduce incompatibilities with the standard OTP release. However, three @@ -730,7 +757,7 @@ incompatibilities with the standard OTP release. However, three reach the refcount of 0 without errors or warnings (even when debugging). This is necessary in order to allow a C function to create a binary, drop its references and return it to the Erlang VM - (see `Notes on Erlang binaries and reference counting`_); + (see 'Notes on Erlang binaries and reference counting'); 2. as a consequence of the previous point, `driver_binary_inc_refc()` must be allowed to reach a minimum @@ -741,18 +768,19 @@ incompatibilities with the standard OTP release. However, three to be exposed as a stand-alone function, to be used by the FFI. + Reference implementation ------------------------- +======================== -An implementation of this EEP is available on [4][] as a set of patches -against OTP R11B-5. +An implementation of this EEP is available on [muvara.org][4] +as a set of patches against OTP R11B-5. -The code is based on the GCC FFI library (libffi) [5][]. libffi is +The code is based on the GCC FFI library [(libffi)][5]. libffi is multi-platform, can be packaged and used separately from the GCC -source code, and is released under a very permissive license [6]_ +source code, and is released under a very permissive [license][6] (compatible with the Erlang Public License). It has been used to implement the FFI interface of several applications and languages, -including [Python] [7]. +including [Python][7]. 
The current EEP implementation looks for libffi on the build system, and links the Erlang emulator against it (preferring the libffi shared @@ -769,51 +797,12 @@ possibly adopt the same approach depending on the developers' feedback. -References ----------- - -[1]: http://www1.erlang.org/documentation/doc-4.8.2/lib/ig-1.8/doc/index.html - "IG: the Erlang Interface Generator, Törnquist and Lundell" - -[2]: http://www.erlang.se/workshop/2002/Fritchie.pdf - "The Evolution of Erlang Drivers and the Erlang Driver Toolkit, Fritchie" - -[3]: http://dryverl.objectweb.org/ - "The Dryverl Erlang/C binding compiler" - -[4]: http://muvara.org/crs4/erlang/ffi - "Foreign Function Interface (FFI) for Erlang/OTP" - -[5]: http://gcc.gnu.org/viewcvs/trunk/libffi/ - "libffi: the GCC Foreign Function Interface Library" - -[6]: http://gcc.gnu.org/viewcvs/checkout/trunk/libffi/LICENSE - "The libffi license" - -[7]: http://python.net/crew/theller/ctypes/ - "The CPython package" - -[8]: http://www.cse.unsw.edu.au/~chak/haskell/ffi/ - "The Haskell 98 Foreign Function Interface" - -[9]: http://java.sun.com/j2se/1.5.0/docs/guide/jni/ - "The Java Native Interface" - -[10]: http://docs.python.org/ext/ext.html - "Extending and Embedding the Python Interpreter" - -[11]: http://docs.python.org/api/api.html - "Python/C API Reference Manual" - -[12]: http://user.it.uu.se/~kostis/Papers/contracts.pdf - "A Language for Specifying Type Contracts in Erlang and its - Interaction with Success Typings, Jiménez Lindahl and Sagonas - (Presented at the 2007 SIGPLAN Erlang Workshop)." Appendix --------- +======== -### Erlang-to-C automatic type conversions ### +Erlang-to-C automatic type conversions +-------------------------------------- The following table reports the Erlang-to-C conversions, used for passing Erlang terms as C function call arguments. @@ -852,7 +841,8 @@ passing Erlang terms as C function call arguments. 
====================== =============================== -### C-to-Erlang automatic type conversions ### +C-to-Erlang automatic type conversions +-------------------------------------- The following table reports the C-to-Erlang conversions, used for converting C function return values into Erlang terms. @@ -890,24 +880,62 @@ converting C function return values into Erlang terms. ====================== =============================== + +[1]: http://www1.erlang.org/documentation/doc-4.8.2/lib/ig-1.8/doc/index.html + "IG: the Erlang Interface Generator, Törnquist and Lundell" + +[2]: http://www.erlang.se/workshop/2002/Fritchie.pdf + "The Evolution of Erlang Drivers and the Erlang Driver Toolkit, Fritchie" + +[3]: http://dryverl.objectweb.org/ + "The Dryverl Erlang/C binding compiler" + +[4]: http://muvara.org/crs4/erlang/ffi + "Foreign Function Interface (FFI) for Erlang/OTP" + +[5]: http://gcc.gnu.org/viewcvs/trunk/libffi/ + "libffi: the GCC Foreign Function Interface Library" + +[6]: http://gcc.gnu.org/viewcvs/checkout/trunk/libffi/LICENSE + "The libffi license" + +[7]: http://python.net/crew/theller/ctypes/ + "The CPython package" + +[8]: http://www.cse.unsw.edu.au/~chak/haskell/ffi/ + "The Haskell 98 Foreign Function Interface" + +[9]: http://java.sun.com/j2se/1.5.0/docs/guide/jni/ + "The Java Native Interface" + +[10]: http://docs.python.org/ext/ext.html + "Extending and Embedding the Python Interpreter" + +[11]: http://docs.python.org/api/api.html + "Python/C API Reference Manual" + +[12]: http://user.it.uu.se/~kostis/Papers/contracts.pdf + "A Language for Specifying Type Contracts in Erlang and its Interaction with Success Typings, Jiménez Lindahl and Sagonas (Presented at the 2007 SIGPLAN Erlang Workshop)." 
+ + + Copyright ---------- +========= Copyright (C) 2007 by CRS4 (Center for Advanced Studies, Research and -Development in Sardinia) - http://www.crs4.it/ +Development in Sardinia) - Author: Alceste Scalas This EEP is released under the terms of the Creative Commons -Attribution 3.0 License. See -http://creativecommons.org/licenses/by/3.0/ - - -.. - Local Variables: - mode: indented-text - indent-tabs-mode: nil - sentence-end-double-space: t - fill-column: 70 - coding: utf-8 - End: +Attribution 3.0 License. See + + + +[EmacsVar]: <> "Local Variables:" +[EmacsVar]: <> "mode: indented-text" +[EmacsVar]: <> "indent-tabs-mode: nil" +[EmacsVar]: <> "sentence-end-double-space: t" +[EmacsVar]: <> "fill-column: 70" +[EmacsVar]: <> "coding: utf-8" +[EmacsVar]: <> "End:" diff --git a/eeps/eep-0008.md b/eeps/eep-0008.md index e93333a..d3e5c96 100644 --- a/eeps/eep-0008.md +++ b/eeps/eep-0008.md @@ -1,14 +1,15 @@ -EEP 8: Types and function specifications -==== - - Author: Tobias Lindahl [tobias(dot)lindahl(at)it(dot)uu(dot)se], Kostis Sagonas [kostis(at)it(dot)uu(dot)se] + Author: Tobias Lindahl , + Kostis Sagonas Status: Draft Type: Standards Track - Content-Type: text/x-markdown Created: 2-Dec-2007 Erlang-Version: R12B + Post-History: +**** +EEP 8: Types and function specifications +---- + -==== Abstract ======== @@ -339,17 +340,21 @@ The main limitation is the inability to define recursive types. +[EEP-8]: "EEP 8 Source" + + + Copyright ========= This document has been placed in the public domain. -.. 
- Local Variables: - mode: indented-text - indent-tabs-mode: nil - sentence-end-double-space: t - fill-column: 70 - coding: utf-8 - End: + +[EmacsVar]: <> "Local Variables:" +[EmacsVar]: <> "mode: indented-text" +[EmacsVar]: <> "indent-tabs-mode: nil" +[EmacsVar]: <> "sentence-end-double-space: t" +[EmacsVar]: <> "fill-column: 70" +[EmacsVar]: <> "coding: utf-8" +[EmacsVar]: <> "End:" diff --git a/eeps/eep-0009.md b/eeps/eep-0009.md index 0fc2986..3206c75 100644 --- a/eeps/eep-0009.md +++ b/eeps/eep-0009.md @@ -1,15 +1,14 @@ -EEP 9: Library for working with binaries -==== - - Author: Fredrik Svahn [Fredrik(dot)Svahn(at)gmail] + Author: Fredrik Svahn Status: Draft Type: Standards Track - Content-Type: text/x-markdown Created: 28-Dec-2007 Erlang-Version: R12B-2 Post-History: +**** +EEP 9: Library for working with binaries +---- + -==== Abstract ======== @@ -386,8 +385,8 @@ the old regexp module): During a first round of feedback it has been suggested that the final implementation should be a built in function based on the -Perl Compatible Regular Expressions (PCRE) library [1]. It is -optimised, well supported, and is more or less considered a standard +[Perl Compatible Regular Expressions (PCRE) library][1]. It is +optimised, well supported, and is more or less considered a [standard][2] today. It is used in a number of prominent products and projects, e.g. Apples Safari, Apache, KDE, PHP, Postfix and Nmap. @@ -494,7 +493,7 @@ important using the reference implementation. Some examples: 1. Searching for a non-existing 1 and 3 byte binary in a ~1 Mb binary. Notice how binary:match/2 gets faster the longer the needle is thanks to - the O(n/m) algorithm. All times in microseconds. + the O(n/m) [algorithm][3]. All times in microseconds. Search for: 1 byte 3 bytes --------------------------------------- @@ -527,15 +526,18 @@ Reference implementation A reference implementation has been provided to the OTP team. 
-References -========== -[1] http://en.wikipedia.org/wiki/PCRE +[EEP-9]: "EEP 9 Source" + +[1]: http://en.wikipedia.org/wiki/PCRE + +[2]: http://www.pcre.org/pcre.txt + "Man page for pcrematching" + + +[3]: http://swtch.com/~rsc/regexp/regexp1.html -[2] see man page for pcrematching, also available here: - http://www.pcre.org/pcre.txt -[3] http://swtch.com/~rsc/regexp/regexp1.html Copyright ========= @@ -544,4 +546,10 @@ This document is licensed under the Creative Commons license. - +[EmacsVar]: <> "Local Variables:" +[EmacsVar]: <> "mode: indented-text" +[EmacsVar]: <> "indent-tabs-mode: nil" +[EmacsVar]: <> "sentence-end-double-space: t" +[EmacsVar]: <> "fill-column: 70" +[EmacsVar]: <> "coding: utf-8" +[EmacsVar]: <> "End:" diff --git a/eeps/eep-0010.md b/eeps/eep-0010.md index d46b5bc..a126182 100644 --- a/eeps/eep-0010.md +++ b/eeps/eep-0010.md @@ -1,20 +1,19 @@ -EEP 10: Representing Unicode characters in Erlang -==== - Author: Patrik Nyblom Status: Draft Type: Standards Track - Content-Type: text/x-markdown - Created: 07-05-2008 + Created: 07-may-2008 Erlang-Version: R12B-4 Post-History: 01-jan-1970 +**** +EEP 10: Representing Unicode characters in Erlang +---- + -==== Abstract ======== -This EEP suggest a standard representation of Unicode [2]_ characters in +This EEP suggest a standard representation of [Unicode][2] characters in Erlang, as well as the basic functionality to deal with them. Motivation @@ -93,7 +92,7 @@ string is encoded in the Unicode encoding standard UTF-32, one Unicode character per position. However, the currently most common representation of Unicode -characters is UTF-8 [1]_, in which the characters are stored in one to +characters is [UTF-8][1], in which the characters are stored in one to four 8-bit entities organized in such way that plain 7-bit US ASCII is untouched, while characters 128 and upwards are split over more than one byte. The advantage of this coding is that e.g. 
characters having @@ -755,23 +754,26 @@ The io-protocol need to be changed to always handle Unicode characters. Options given when opening a file will allow for implicit conversion of text files. -References -========== -.. [1] http://www.ietf.org/rfc/rfc3629.txt - The UTF-8 RFC. -.. [2] http://www.unicode.org/ - The Unicode homepage, containing downloadable versions of the standard(s). + +[EEP-10]: "EEP 10 Source" + +[1]: http://www.ietf.org/rfc/rfc3629.txt + "The UTF-8 RFC" +[2]: http://www.unicode.org/ + "The Unicode homepage, containing downloadable versions of the standard(s)" Copyright ========= This document has been placed in the public domain. -.. - Local Variables: - mode: indented-text - indent-tabs-mode: nil - sentence-end-double-space: t - fill-column: 70 - coding: utf-8 - End: + +[EmacsVar]: <> "Local Variables:" +[EmacsVar]: <> "mode: indented-text" +[EmacsVar]: <> "indent-tabs-mode: nil" +[EmacsVar]: <> "sentence-end-double-space: t" +[EmacsVar]: <> "fill-column: 70" +[EmacsVar]: <> "coding: utf-8" +[EmacsVar]: <> "End:" diff --git a/eeps/eep-0011.md b/eeps/eep-0011.md index c0e0cc6..c9d08dc 100644 --- a/eeps/eep-0011.md +++ b/eeps/eep-0011.md @@ -1,15 +1,15 @@ -EEP 11: Built in regular expressions in Erlang -==== - Author: Patrik Nyblom - Status: Draft + Status: Accepted/R12B-3u Proposal is implemented in OTP release R12B-3, + except for Unicode support according to EEP 10 Type: Standards Track - Content-Type: text/x-markdown - Created: 04-06-2008 + Created: 04-Jun-2008 Erlang-Version: R12B-5 Post-History: 01-Jan-1970 +**** +EEP 11: Built in regular expressions in Erlang +---- + -==== Abstract ======== @@ -161,7 +161,7 @@ situation fulfill the following wishes: - The library should provide Unicode support. No available regular expression library currently provides a perfect -match. The best available is the PCRE [1]_ library, which has compile time +match. 
The best available is the [PCRE][] library, which has compile time options for not using the C stack, Perl (and Python) compatible regular expressions and also is written in a well structured way, making it suitable for integration, porting and implementing @@ -185,7 +185,7 @@ conclusion that PCRE was the best choice for the following reasons: - Widely spread: Used in Apache, PHP, Apple Safari etc. - The regexp engine is pure C. - Unicode support (UTF-8) which fits nicely into the suggested - Unicode representation in Erlang (EEP 10). + Unicode representation in Erlang ([EEP 10][]). - Recursion on the C stack can be avoided. - The library has most of the infrastructure for an interruptable execution of the expressions present, although restarting of @@ -253,408 +253,690 @@ Here follows part of the suggested manual page: ### Excerpt from a suggested manual page ### -#### DATA TYPES ##### +#### DATA TYPES #### + + iodata() = iolist() | binary() + iolist() = [char() | binary() | iolist()] + % a binary is allowed as the tail of the list + + mp() = Opaque datatype containing a compiled regular expression. - - iodata() = iolist() | binary() - - iolist() = [char() | binary() | iolist()] - * a binary is allowed as the tail of the list - - mp() = Opaque datatype containing a compiled regular expression. 
#### EXPORTS #### -**compile(Regexp) -> {** ``ok`` **, MP} | {** ``error`` **, ErrSpec}** + + +##### compile(Regexp) -> {ok, MP} | {error, ErrSpec} Types: -- Regexp = iodata() + Regexp = iodata() The same as compile(Regexp,[]) -**compile(Regexp,Options) -> {** ``ok`` **, MP} | {** ``error`` **, ErrSpec}** -Types: -- Regexp = iodata() -- Options = [ Option ] -- Option = anchored | caseless | dollar_endonly | dotall | extended | firstline | multiline | no_auto_capture | dupnames | ungreedy | {newline, NLSpec} -- NLSpec = cr | crlf | lf | anycrlf -- MP = mp() -- ErrSpec = {ErrString, Position} -- ErrString = string() -- Position = int() +##### compile(Regexp,Options) -> {ok, MP} | {error, ErrSpec} -This function compiles a regular expression with the syntax described below into an internal format to be used later as a parameter to the run/2,3 functions. +Types: -Compiling the regular expression before matching is useful if the same expression is to be used in matching against multiple subjects during the program's lifetime. Compiling once and executing many times is far more efficient than compiling each time one wants to match. + Regexp = iodata() + Options = [ Option ] + Option = anchored | caseless | dollar_endonly | dotall | extended | + firstline | multiline | no_auto_capture | dupnames | + ungreedy | {newline, NLSpec} + NLSpec = cr | crlf | lf | anycrlf + MP = mp() + ErrSpec = {ErrString, Position} + ErrString = string() + Position = int() + +This function compiles a regular expression with the syntax described below +into an internal format to be used later as a parameter to the run/2,3 functions. + +Compiling the regular expression before matching is useful if the same +expression is to be used in matching against multiple subjects during the +program's lifetime. Compiling once and executing many times is far more +efficient than compiling each time one wants to match. 
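The compile-once, run-many usage described above could be sketched roughly as follows. This assumes the `re` module as it shipped in OTP, where `re:compile/1` returns `{ok, MP}` and `re:run/3` with `{capture, none}` returns the bare atom `match` on success. The module name `re_compile_once` and the function `count_matching/2` are hypothetical, introduced only for illustration.

```erlang
-module(re_compile_once).
-export([count_matching/2]).

%% Hypothetical helper (not part of the proposal): count how many
%% subjects match Pattern. The pattern is compiled a single time and
%% the resulting mp() is reused for every subject.
count_matching(Pattern, Subjects) ->
    {ok, MP} = re:compile(Pattern),
    %% {capture, none} asks only for match/nomatch, so no substrings
    %% are built for the caller.
    length([S || S <- Subjects,
                 re:run(S, MP, [{capture, none}]) =:= match]).
```

Passing the pattern as iodata() to `re:run/3` on every call would give the same answers, but the expression would then be compiled once per subject.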
The options have the following meanings: -``anchored`` - The pattern is forced to be "anchored", that is, it is constrained to match only at the first matching point in the string that is being searched (the "subject string"). This effect can also be achieved by appropriate constructs in the pattern itself. -``caseless`` - Letters in the pattern match both upper and lower case letters. It is equivalent to Perl's /i option, and it can be changed within a pattern by a (?i) option setting. Uppercase and lowercase letters are defined as in the ISO-8859-1 character set. -``dollar_endonly`` - A dollar metacharacter in the pattern matches only at the end of the subject string. Without this option, a dollar also matches immediately before a newline at the end of the string (but not before any other newlines). The dollar_endonly option is ignored if multiline is given. There is no equivalent option in Perl, and no way to set it within a pattern. -``dotall`` - A dot maturate in the pattern matches all characters, including those that indicate newline. Without it, a dot does not match when the current position is at a newline. This option is equivalent to Perl's /s option, and it can be changed within a pattern by a (?s) option setting. A negative class such as [^a] always matches newline characters, independent of the setting of this option. -``extended`` - Whitespace data characters in the pattern are ignored except when escaped or inside a character class. Whitespace does not include the VT character (ASCII 11). In addition, characters between an unescaped # outside a character class and the next newline, inclusive, are also ignored. This is equivalent to Perl's /x option, and it can be changed within a pattern by a (?x) option setting. This option makes it possible to include comments inside complicated patterns. Note, however, that this applies only to data characters. 
Whitespace characters may never appear within special character sequences in a pattern, for example within the sequence (?( which introduces a conditional subpattern. -``firstline`` - An unanchored pattern is required to match before or at the first newline in the subject string, though the matched text may continue over the newline. -``multiline`` - By default, PCRE treats the subject string as consisting of a single line of characters (even if it actually contains newlines). The "start of line" metacharacter (^) matches only at the start of the string, while the "end of line" metacharacter ($) matches only at the end of the string, or before a terminating newline (unless dollar_endonly is given). This is the same as Perl. - When multiline it is given, the "start of line" and "end of line" constructs match immediately following or immediately before internal newlines in the subject string, respectively, as well as at the very start and end. This is equivalent to Perl's /m option, and it can be changed within a pattern by a (?m) option setting. If there are no newlines in a subject string, or no occurrences of ^ or $ in a pattern, setting multiline has no effect. -``no_auto_capture`` - Disables the use of numbered capturing parentheses in the pattern. Any opening parenthesis that is not followed by ? behaves as if it were followed by ?: but named parentheses can still be used for capturing (and they acquire numbers in the usual way). There is no equivalent of this option in Perl. -``dupnames`` - Names used to identify capturing subpatterns need not be unique. This can be helpful for certain types of pattern when it is known that only one instance of the named subpattern can ever be matched. There are more details of named subpatterns below -``ungreedy`` - This option inverts the "greediness" of the quantifiers so that they are not greedy by default, but become greedy if followed by "?". It is not compatible with Perl. 
It can also be set by a (?U) option setting within the pattern. -{``newline`` , NLSpec} +* `anchored` + The pattern is forced to be "anchored", that is, it is constrained to match + only at the first matching point in the string that is being searched + (the "subject string"). This effect can also be achieved by appropriate + constructs in the pattern itself. + +* `caseless` + Letters in the pattern match both upper and lower case letters. + It is equivalent to Perl's `/i` option, and it can be changed within + a pattern by a `(?i)` option setting. Uppercase and lowercase letters + are defined as in the ISO-8859-1 character set. + +* `dollar_endonly` + A dollar metacharacter in the pattern matches only at the end of the subject + string. Without this option, a dollar also matches immediately before a newline + at the end of the string (but not before any other newlines). The dollar_endonly + option is ignored if multiline is given. There is no equivalent option in Perl, + and no way to set it within a pattern. + +* `dotall` + A dot metacharacter in the pattern matches all characters, including those that + indicate newline. Without it, a dot does not match when the current position + is at a newline. This option is equivalent to Perl's `/s` option, and it + can be changed within a pattern by a `(?s)` option setting. A negative class + such as `[^a]` always matches newline characters, + independent of the setting of this option. + +* `extended` + Whitespace data characters in the pattern are ignored except when escaped or + inside a character class. Whitespace does not include the VT character + (ASCII 11). In addition, characters between an unescaped `#` outside a + character class and the next newline, inclusive, are also ignored. This is + equivalent to Perl's `/x` option, and it can be changed within a pattern by + a `(?x)` option setting. This option makes it possible to include comments + inside complicated patterns. 
Note, however, that this applies only to data + characters. Whitespace characters may never appear within special character + sequences in a pattern, for example within the sequence `(?(` which introduces + a conditional subpattern. + +* `firstline` + An unanchored pattern is required to match before or at the first newline + in the subject string, though the matched text may continue over the newline. + +* `multiline` + By default, PCRE treats the subject string as consisting of a single line of + characters (even if it actually contains newlines). The "start of line" + metacharacter (`^`) matches only at the start of the string, while the + "end of line" metacharacter (`$`) matches only at the end of the string, + or before a terminating newline (unless dollar_endonly is given). This is + the same as Perl. + + When multiline is given, the "start of line" and "end of line" constructs + match immediately following or immediately before internal newlines in the + subject string, respectively, as well as at the very start and end. This is + equivalent to Perl's `/m` option, and it can be changed within a pattern by + a `(?m)` option setting. If there are no newlines in a subject string, + or no occurrences of `^` or `$` in a pattern, setting multiline has no effect. + +* `no_auto_capture` + Disables the use of numbered capturing parentheses in the pattern. + Any opening parenthesis that is not followed by `?` behaves as if it were + followed by `?:` but named parentheses can still be used for capturing + (and they acquire numbers in the usual way). There is no equivalent + of this option in Perl. + +* `dupnames` + Names used to identify capturing subpatterns need not be unique. + This can be helpful for certain types of pattern when it is known + that only one instance of the named subpattern can ever be matched. + There are more details of named subpatterns below. 
+ +* `ungreedy` + This option inverts the "greediness" of the quantifiers so that they + are not greedy by default, but become greedy if followed by `?`. + It is not compatible with Perl. It can also be set by a `(?U)` option + setting within the pattern. + +* `{newline, NLSpec}` Override the default definition of a newline in the subject string, which is LF (ASCII 10) in Erlang. - ``cr`` + - `cr` Newline is indicated by a single character CR (ASCII 13) - ``lf`` + - `lf` Newline is indicated by a single character LF (ASCII 10), the default - ``crlf`` + - `crlf` Newline is indicated by the two-character CRLF (ASCII 13 followed by ASCII 10) sequence. - ``anycrlf`` + - `anycrlf` Any of the three preceding sequences should be recognized. -**run(Subject,RE) -> {** ``match`` **, Captured} |** ``nomatch`` **| {** ``error`` **, ErrSpec}** -Types: -- Subject = iodata() -- RE = mp() | iodata() -- Captured = [ CaptureData ] -- CaptureData = {int(),int()} | string() | binary() -- ErrSpec = {ErrString, Position} -- ErrString = string() -- Position = int() - -The same as run(Subject,RE,[]). - -**run(Subject,RE) -> {** ``match`` **, Captured} |** ``match`` **|** ``nomatch`` **| {** ``error`` **, ErrSpec}** +##### run(Subject,RE) -> {match, Captured} | nomatch | {error, ErrSpec} Types: -- Subject = iodata() -- RE = mp() | iodata() -- Options = [ Option ] -- Option = anchored | global | notbol | noteol | notempty | {offset, int()} | {newline, NLSpec} | {capture, ValueSpec} | {capture, ValueSpec, Type} | CompileOpt -- Type = index | list | binary -- ValueSpec = all | all_but_first | first | ValueList -- ValueList = [ ValueID ] -- ValueID = int() | string() | atom() -- CompileOpt = see compile/2 above -- NLSpec = cr | crlf | lf | anycrlf -- Captured = [ CaptureData ] | [ [ CaptureData ] ... 
] -- CaptureData = {int(),int()} | string() | binary() -- ErrSpec = {ErrString, Position} -- ErrString = string() -- Position = int() + Subject = iodata() + RE = mp() | iodata() + Captured = [ CaptureData ] + CaptureData = {int(),int()} | string() | binary() + ErrSpec = {ErrString, Position} + ErrString = string() + Position = int() + +The same as run(Subject,RE,[]). -Executes a regexp matching, returning ``match`` /{ ``match`` , Captured} or ``nomatch`` . The regular expression can be given either as iodata() in which case it is automatically compiled (as by re:compile/2) and executed, or as a pre compiled mp() in which case it is executed against the subject directly. -When compilation is involved, the function may return compilation errors as when compiling separately ({ ``error`` , {string(),int()}}); when only matching, no errors are returned. -If the regular expression is previously compiled, the option list can only contain the options ``anchored``, ``global``, ``notbol``, ``noteol``, ``notempty``, { ``offset`` , int()}, { ``newline`` , NLSpec} and { ``capture`` , ValueSpec}/{ ``capture`` , ValueSpec, Type}. Otherwise all options valid for the re:compile/2 function are allowed as well. Options allowed both for compilation and execution of a match, namely ``anchored`` and { ``newline`` , NLSpec}, will affect both the compilation and execution if present together with a non pre-compiled regular expression. +##### run(Subject,RE,Options) -> {match, Captured} | match | nomatch | {error, ErrSpec} -The { ``capture`` , ValueSpec}/{ ``capture`` , ValueSpec, Type} defines what to return from the function upon successful matching. The capture tuple may contain both a value specification telling which of the captured substrings are to be returned, and a type specification, telling how captured substrings are to be returned (as index tuples, lists or binaries). The capture option makes the function quite flexible and powerful. 
The different options are described in detail below +Types: -If the capture options describe that no substring capturing at all is to be done ({ ``capture`` , ``none`` }), the function will return the single atom match upon successful matching, otherwise the tuple { ``match`` , ValueList} is returned. Disabling capturing can be done either by specifying none or an empty list as ValueSpec. + Subject = iodata() + RE = mp() | iodata() + Options = [ Option ] + Option = anchored | global | notbol | noteol | notempty | {offset, int()} | + {newline, NLSpec} | {capture, ValueSpec} | + {capture, ValueSpec, Type} | CompileOpt + Type = index | list | binary + ValueSpec = all | all_but_first | first | ValueList + ValueList = [ ValueID ] + ValueID = int() | string() | atom() + CompileOpt = see compile/2 above + NLSpec = cr | crlf | lf | anycrlf + Captured = [ CaptureData ] | [ [ CaptureData ] ... ] + CaptureData = {int(),int()} | string() | binary() + ErrSpec = {ErrString, Position} + ErrString = string() + Position = int() + +Executes a regexp matching, returning `match` / `{match, Captured}` or +`nomatch`. The regular expression can be given either as iodata() +in which case it is automatically compiled (as by re:compile/2) +and executed, or as a pre compiled mp() in which case it is executed +against the subject directly. + +When compilation is involved, the function may return compilation errors +as when compiling separately (`{error, {string(),int()}}`); when +only matching, no errors are returned. + +If the regular expression is previously compiled, the option list can +only contain the options `anchored`, `global`, `notbol`, `noteol`, `notempty`, +`{offset, int()}`, `{newline, NLSpec}` and `{capture, ValueSpec}` / +`{capture, ValueSpec, Type}`. Otherwise all options valid for the +`re:compile/2` function are allowed as well. 
Options allowed both +for compilation and execution of a match, namely ``anchored`` and +`{newline, NLSpec}`, will affect both the compilation and execution +if present together with a non pre-compiled regular expression. + +The `{capture, ValueSpec}` / `{capture, ValueSpec, Type}` defines +what to return from the function upon successful matching. The capture +tuple may contain both a value specification telling which of the captured +substrings are to be returned, and a type specification, telling how +captured substrings are to be returned (as index tuples, lists or binaries). +The capture option makes the function quite flexible and powerful. +The different options are described in detail below + +If the capture options describe that no substring capturing at all is to be +done (`{capture, none}`), the function will return the single atom match +upon successful matching, otherwise the tuple `{match, ValueList}` +is returned. Disabling capturing can be done either by specifying +none or an empty list as ValueSpec. A description of all the options relevant for execution follows: -``anchored`` - Limits re:run/3 to matching at the first matching position. If a pattern was compiled with anchored, or turned out to be anchored by virtue of its contents, it cannot be made unachored at matching time, hence there is no unanchored option. -``global`` - Implements global (repetitive) search as the g flag in i.e. Perl. Each match found is returned as a separate list() containing the specific match as well as any matching subexpressions (or as specified by the capture option). The Captured part of the return value will hence be a list() of list()'s when this option is given. - When the regular expression matches an empty string, the behaviour might seem non-intuitive, why the behaviour requites some clarifying. 
With the global option, re:run/3 handles empty matches in the same way as Perl, meaning that a match at any point giving an empty string (with length 0) will be retried with the options [anchored, notempty] as well. If that search gives a result of length > 0, the result is included. An example:: +* `anchored` + Limits `re:run/3` to matching at the first matching position. If a pattern + was compiled with anchored, or turned out to be anchored by virtue of its + contents, it cannot be made unanchored at matching time, hence there is no + unanchored option. + +* `global` + Implements global (repetitive) search as the `/g` flag in e.g. Perl. + Each match found is returned as a separate list() containing + the specific match as well as any matching subexpressions (or as + specified by the capture option). The Captured part of the return + value will hence be a list() of list()'s when this option is given. + + When the regular expression matches an empty string, the behaviour + might seem non-intuitive, so it requires some clarification. + With the global option, `re:run/3` handles empty matches in the same way + as Perl, meaning that a match at any point giving an empty string + (with length 0) will be retried with the options `[anchored, notempty]` + as well. If that search gives a result of length > 0, the result + is included. An example: re:run("cat","(|at)",[global]). The matching will be performed as following: - At offset 0 - The regexp ``(|at)`` will first match at the initial position of - the string cat, giving the result set [{0,0},{0,0}] (the - second {0,0} is due to the subexpression marked by the + - **At offset 0** + The regexp `(|at)` will first match at the initial position of + the string cat, giving the result set `[{0,0},{0,0}]` (the + second `{0,0}` is due to the subexpression marked by the parentheses). As the length of the match is 0, we don't advance to the next position yet. 
- At offset 0 with [ ``anchored`` , ``notempty`` ] - The search is retried with the options [anchored, notempty] at the same position, which does not give any interesting result of longer length, why the search position is now advanced to the next character (a). - At offset 1 - Now the search results in [{1,0},{1,0}] meaning this search will also be repeated with the extra options. - At offset 1 with [ ``anchored`` , ``notempty`` ] - Now the ab alternative is found and the result will be [{1,2},{1,2}]. The result is added to the list of results and the position in the search string is advanced two steps. - At offset 3 - The search now once again matches the empty string, giving [{3,0},{3,0}]. - At offset 1 with [ ``anchored`` , ``notempty`` ] - This will give no result of length > 0 and we are at the last position, so the global search is complete. - - The result of the call is:: - - {match,[[{0,0},{0,0}],[{1,0},{1,0}],[{1,2},{1,2}],[{3,0},{3,0}]]} -``notempty`` - An empty string is not considered to be a valid match if this option is given. If there are alternatives in the pattern, they are tried. If all the alternatives match the empty string, the entire match fails. For example, if the pattern:: + - **At offset 0 with `[anchored, notempty]`** + The search is retried with the options [anchored, notempty] at + the same position, which does not give any interesting result of + longer length, so the search position is now advanced to the next + character (`a`). + + - **At offset 1** + Now the search results in `[{1,0}, {1,0}]` meaning this search + will also be repeated with the extra options. + - **At offset 1 with `[anchored, notempty]`** + Now the `at` alternative is found and the result will be + `[{1,2}, {1,2}]`. The result is added to the list of results + and the position in the search string is advanced two steps. + - **At offset 3** + The search now once again matches the empty string, + giving `[{3,0}, {3,0}]`. 
+ - **At offset 3 with `[anchored, notempty]`** + This will give no result of length > 0 and we are at the last + position, so the global search is complete. + + The result of the call is: + + {match,[[{0,0},{0,0}],[{1,0},{1,0}],[{1,2},{1,2}],[{3,0},{3,0}]]} + +* `notempty` + An empty string is not considered to be a valid match if this option + is given. If there are alternatives in the pattern, they are tried. + If all the alternatives match the empty string, the entire match fails. + For example, if the pattern: a?b? - is applied to a string not beginning with "a" or "b", it matches the empty string at the start of the subject. With notempty given, this match is not valid, so re:run/3 searches further into the string for occurrences of "a" or "b". - Perl has no direct equivalent of notempty, but it does make a special case of a pattern match of the empty string within its split() function, and when using the /g modifier. It is possible to emulate Perl's behavior after matching a null string by first trying the match again at the same offset with notempty and anchored, and then if that fails by advancing the starting offset (see below) and trying an ordinary match again. -``notbol`` - This option specifies that the first character of the subject string is not the beginning of a line, so the circumflex metacharacter should not match before it. Setting this without multiline (at compile time) causes circumflex never to match. This option affects only the behavior of the circumflex metacharacter. It does not affect \A. -``noteol`` - This option specifies that the end of the subject string is not the end of a line, so the dollar metacharacter should not match it nor (except in multiline mode) a newline immediately before it. Setting this without multiline (at compile time) causes dollar never to match. This option affects only the behavior of the dollar metacharacter. It does not affect \Z or \z. 
-{ ``offset`` , int()} - Start matching at the offset (position) given in the subject string. The offset is zero-based, so that the default is {offset,0} (all of the subject string). -{ ``newline`` , NLSpec} - Override the default definition of a newline in the subject string, which is LF (ASCII 10) in Erlang. - - ``cr`` - Newline is indicated by a single character CR (ASCII 13) - ``lf`` - Newline is indicated by a single character LF (ASCII 10), the default - ``crlf`` - Newline is indicated by the two-character CRLF (ASCII 13 followed by ASCII 10) sequence. - ``anycrlf`` - Any of the three preceding sequences should be recognized. - -{ ``capture`` , ValueSpec}/{ ``capture`` , ValueSpec, Type} - Specifies which captured substrings are returned and in what format. By default, re:run/3 captures all of the matching part of the substring as well as all capturing subpatterns (all of the pattern is automatically captured). The default return type is (zero-based) indexes of the captured parts of the string, given as {Offset,Length} pairs (the index Type of capturing). - As an example of the default behavior, the following call:: + is applied to a string not beginning with "a" or "b", it matches the + empty string at the start of the subject. With notempty given, + this match is not valid, so `re:run/3` searches further into the string + for occurrences of "a" or "b". + + Perl has no direct equivalent of notempty, but it does make a special + case of a pattern match of the empty string within its `split()` function, + and when using the `/g` modifier. It is possible to emulate Perl's + behavior after matching a null string by first trying the match + again at the same offset with notempty and anchored, and then + if that fails by advancing the starting offset (see below) + and trying an ordinary match again. 
+ +* `notbol` + This option specifies that the first character of the subject string + is not the beginning of a line, so the circumflex metacharacter should + not match before it. Setting this without multiline (at compile time) + causes circumflex never to match. This option affects only the behavior + of the circumflex metacharacter. It does not affect `\A`. + +* `noteol` + This option specifies that the end of the subject string is not the end + of a line, so the dollar metacharacter should not match it nor + (except in multiline mode) a newline immediately before it. + Setting this without multiline (at compile time) causes dollar + never to match. This option affects only the behavior of the dollar + metacharacter. It does not affect `\Z` or `\z`. + +* `{offset, int()}` + Start matching at the offset (position) given in the subject string. + The offset is zero-based, so that the default is `{offset,0}` + (all of the subject string). + +* `{newline, NLSpec}` + Override the default definition of a newline in the subject string, + which is LF (ASCII 10) in Erlang. + + - `cr` + Newline is indicated by a single character CR (ASCII 13). + - `lf` + Newline is indicated by a single character LF (ASCII 10), + the default. + - `crlf` + Newline is indicated by the two-character CRLF + (ASCII 13 followed by ASCII 10) sequence. + - `anycrlf` + Any of the three preceding sequences should be recognized. + +* `{capture, ValueSpec}` / `{capture, ValueSpec, Type}` + Specifies which captured substrings are returned and in what format. + By default, `re:run/3` captures all of the matching part of the substring + as well as all capturing subpatterns (all of the pattern is + automatically captured). The default return type is (zero-based) + indexes of the captured parts of the string, given as `{Offset,Length}` + pairs (the index Type of capturing). + + As an example of the default behavior, the following call: re:run("ABCabcdABC","abcd",[]). 
- returns, as first and only captured string the matching part of the subject ("abcd" in the middle) as a index pair {3,4}, where character positions are zero based, just as in offsets. The return value of the call above would then be:: + returns, as first and only captured string the matching part of + the subject ("abcd" in the middle) as a index pair `{3,4}`, where + character positions are zero based, just as in offsets. The return + value of the call above would then be: {match,[{3,4}]} - Another (and quite common) case is where the regular expression matches all of the subject, as in:: + Another (and quite common) case is where the regular expression + matches all of the subject, as in: re:run("ABCabcdABC",".*abcd.*",[]). - where the return value correspondingly will point out all of the string, beginning at index 0 and being 10 characters long:: + where the return value correspondingly will point out all of the string, + beginning at index 0 and being 10 characters long: {match,[{0,10}]} - If the regular expression contains capturing subpatterns, like in the following case:: + If the regular expression contains capturing subpatterns, + like in the following case: re:run("ABCabcdABC",".*(abcd).*",[]). - all of the matched subject is captured, as well as the captured substrings:: + all of the matched subject is captured, as well as + the captured substrings: {match,[{0,10},{3,4}]} - the complete matching pattern always giving the first return value in the list and the rest of the subpatterns being added in the order they occurred in the regular expression. + the complete matching pattern always giving the first return value in + the list and the rest of the subpatterns being added in the order they + occurred in the regular expression. + The capture tuple is built up as follows: - ValueSpec - Specifies which captured (sub)patterns are to be returned. 
The ValueSpec can either be an atom describing a predefined set of return values, or a list containing either the indexes or the names of specific subpatterns to return. - The predefined sets of subpatterns are: + - `ValueSpec` + Specifies which captured (sub)patterns are to be returned. + The `ValueSpec` can either be an atom describing a predefined set + of return values, or a list containing either the indexes or the + names of specific subpatterns to return. - ``all`` - All captured subpatterns including the complete matching string. This is the default. - ``first`` - Only the first captured subpattern, which is always the complete matching part of the subject. All explicitly captured subpatterns are discarded. - ``all_but_first`` - All but the first matching subpattern, i.e. all explicitly captured subpatterns, but not the complete matching part of the subject string. This is useful if the regular expression as a whole matches a large part of the subject, but the part you're interested in is in an explicitly captured subpattern. If the return type is list or binary, not returning subpatterns you're not interested in is a good way to optimize. - ``none`` - Do not return matching subpatterns at all, yielding the single atom match as the return value of the function when matching successfully instead of the {match, list()} return. Specifying an empty list gives the same behavior. + The predefined sets of subpatterns are: - The value list is a list of indexes for the subpatterns to return, where index 0 is for all of the pattern, and 1 is for the first explicit capturing subpattern in the regular expression, and so forth. When using named captured subpatterns (see below) in the regular expression, one can use atom()'s or string()'s to specify the subpatterns to be returned. This deserves an example, consider the following regular expression:: + + `all` + All captured subpatterns including the complete matching string. + This is the default. 
+ + + `first` + Only the first captured subpattern, which is always the complete + matching part of the subject. All explicitly captured subpatterns + are discarded. + + + `all_but_first` + All but the first matching subpattern, i.e. all explicitly + captured subpatterns, but not the complete matching part of the + subject string. This is useful if the regular expression as + a whole matches a large part of the subject, but the part you're + interested in is in an explicitly captured subpattern. + If the return type is list or binary, not returning subpatterns + you're not interested in is a good way to optimize. + + + `none` + Do not return matching subpatterns at all, yielding the single + atom match as the return value of the function when matching + successfully instead of the {match, list()} return. Specifying + an empty list gives the same behavior. + + The value list is a list of indexes for the subpatterns to return, + where index 0 is for all of the pattern, and 1 is for the first + explicit capturing subpattern in the regular expression, + and so forth. When using named captured subpatterns (see below) + in the regular expression, one can use `atom()`'s or `string()`'s + to specify the subpatterns to be returned. This deserves an example, + consider the following regular expression:: ".*(abcd).*" - matched against the string ""ABCabcdABC", capturing only the "abcd" part (the first explicit subpattern):: + matched against the string `"ABCabcdABC"`, capturing only the + `"abcd"` part (the first explicit subpattern): re:run("ABCabcdABC",".*(abcd).*",[{capture,[1]}]). - The call will yield the following result:: + The call will yield the following result: {match,[{3,4}]} - as the first explicitly captured subpattern is "(abcd)", matching "abcd" in the subject, at (zero-based) position 3, of length 4. 
- Now consider the same regular expression, but with the subpattern explicitly named 'FOO':: + as the first explicitly captured subpattern is `"(abcd)"`, + matching `"abcd"` in the subject, at (zero-based) position 3, + of length 4. + + Now consider the same regular expression, but with the subpattern + explicitly named `'FOO'`: ".*(?abcd).*" - With this expression, we could still give the index of the subpattern with the following call:: + With this expression, we could still give the index of the subpattern + with the following call:: re:run("ABCabcdABC",".*(?abcd).*",[{capture,[1]}]). - giving the same result as before. But as the subpattern is named, we can also give its name in the value list:: + giving the same result as before. But as the subpattern is named, + we can also give its name in the value list:: re:run("ABCabcdABC",".*(?abcd).*",[{capture,['FOO']}]). - which would yield the same result as the earlier examples, namely:: + which would yield the same result as the earlier examples, namely: {match,[{3,4}]} - The values list might specify indexes or names not present in the regular expression, in which case the return values vary depending on the type. If the type is index, the tuple {-1,0} is returned for values having no corresponding subpattern in the regexp, but for the other types (binary and list), the values are the empty binary or list respectively. - Type - Optionally specifies how captured substrings are to be returned. If omitted, the default of index is used. The Type can be one of the following: + The values list might specify indexes or names not present in the + regular expression, in which case the return values vary depending + on the type. If the type is `index`, the tuple `{-1,0}` is returned + for values having no corresponding subpattern in the regexp, but for + the other types (binary and list), the values are the empty binary + or list respectively. 
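As a rough illustration of the paragraph above, the following sketch asks for subpattern 2 from a regular expression that only has one explicit subpattern. The module name `missing_capture_demo` is hypothetical, and the expected values are taken from the rules stated in the text, not from a separate specification.

```erlang
-module(missing_capture_demo).
-export([demo/0]).

%% Hypothetical demo module: index 2 does not exist in the regexp,
%% so each return type reports it in its own "empty" form.
demo() ->
    Subject = "ABCabcdABC",
    RE = ".*(abcd).*",
    %% index type: a missing subpattern is reported as {-1,0}
    {match, [{3,4}, {-1,0}]} =
        re:run(Subject, RE, [{capture, [1, 2], index}]),
    %% list type: a missing subpattern is reported as the empty list
    {match, ["abcd", []]} =
        re:run(Subject, RE, [{capture, [1, 2], list}]),
    %% binary type: a missing subpattern is reported as the empty binary
    {match, [<<"abcd">>, <<>>]} =
        re:run(Subject, RE, [{capture, [1, 2], binary}]),
    ok.
```

Only the `index` form lets the caller tell a missing subpattern apart from one that matched an empty string.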
- ``index`` - Return captured substrings as pairs of byte indexes into the subject string and length of the matching string in the subject (as if the subject string was flattened with iolist_to_binary prior to matching). This is the default. - ``list`` - Return matching substrings as lists of characters (Erlang string()'s). - ``binary`` - Return matching substrings as binaries. + - `Type` + Optionally specifies how captured substrings are to be returned. + If omitted, the default of index is used. The Type can be one of + the following: - In general, subpatterns that got assigned no value in the match are returned as the tuple {-1,0} when type is index. Unasigned subpatterns are returned as the empty binary or list respectively for other return types. Consider the regular expression:: + + `index` + Return captured substrings as pairs of byte indexes into + the subject string and length of the matching string in + the subject (as if the subject string was flattened with + iolist_to_binary prior to matching). This is the default. + + + `list` + Return matching substrings as lists of characters + (Erlang `string()`'s). + + + `binary` + Return matching substrings as binaries. + + In general, subpatterns that got assigned no value in the match + are returned as the tuple `{-1,0}` when type is `index`. + Unasigned subpatterns are returned as the empty binary or list + respectively for other return types. Consider the regular expression: ".*((?abdd)|a(..d)).*" - There are three explicitly capturing subpatterns, where the opening parenthesis position determines the order in the result, hence ((?abdd)|a(..d)) is subpattern index 1, (?abdd) is subpattern index 2 and (..d) is subpattern index 3. 
When matched against the following string:: + There are three explicitly capturing subpatterns, where the opening + parenthesis position determines the order in the result, + hence `"((?abdd)|a(..d))"` is subpattern index 1, + `"(?abdd)"` is subpattern index 2 and `"(..d)"` + is subpattern index 3. When matched against the following string: "ABCabcdABC" - the subpattern at index 2 won't match, as "abdd" is not present in the string, but the complete pattern matches (due to the alternative a(..d). The subpattern at index 2 is therefore unassigned and the default return value will be:: + the subpattern at index 2 won't match, as `"abdd"` is not present + in the string, but the complete pattern matches (due to the alternative + `"a(..d)"`. The subpattern at index 2 is therefore unassigned and + the default return value will be: {match,[{0,10},{3,4},{-1,0},{4,3}]} - Setting the capture Type to binary would give the following:: + Setting the capture Type to binary would give the following: {match,[<<"ABCabcdABC">>,<<"abcd">>,<<>>,<<"bcd">>]} - where the empty binary (<<>>) represents the unassigned subpattern. In the binary case, some information about the matching is therefore lost, the <<>> might just as well be an empty string captured. - If differentiation between empty matches and non existing subpatterns is necessary, use the type index and do the conversion to the final type in Erlang code. - When the option global is given, the capture specification affects each match separately, so that:: + where the empty binary (`<<>>`) represents the unassigned subpattern. + In the binary case, some information about the matching is therefore lost, + the `<<>>` might just as well be an empty string captured. - re:run("cacb","c(a|b)",[global,{capture,[1],list}]). + If differentiation between empty matches and non existing subpatterns + is necessary, use the type index and do the conversion to + the final type in Erlang code. 
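One possible way to do that conversion is sketched below. The module name `capture_demo`, the function `captures/2`, and the marker atom `unassigned` are assumptions made for this illustration and are not part of the proposed `re` API.

```erlang
-module(capture_demo).
-export([captures/2]).

%% Run RE against Subject with the index capture type, then convert
%% each {Start, Length} pair to a binary. A subpattern that took no
%% part in the match ({-1,0}) becomes the atom 'unassigned', so it
%% cannot be confused with a captured empty string (<<>>).
captures(Subject, RE) ->
    Bin = iolist_to_binary(Subject),
    case re:run(Bin, RE, [{capture, all, index}]) of
        {match, Indexes} ->
            {match, [to_value(Bin, I) || I <- Indexes]};
        nomatch ->
            nomatch
    end.

%% {-1,0} marks an unassigned subpattern.
to_value(_Bin, {-1, 0}) ->
    unassigned;
%% Otherwise cut Length bytes out of the subject, starting at Start.
to_value(Bin, {Start, Len}) ->
    <<_:Start/binary, Part:Len/binary, _/binary>> = Bin,
    Part.
```

With the subject `"ABCabcdABC"` and the pattern `".*((abdd)|a(..d)).*"` (the same grouping as the example above, with the name left out, which does not change the numbering), the third element of the result is `unassigned` instead of `<<>>`.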
- gives the result:: + When the option global is given, the capture specification affects each + match separately, so that: - {match,[["a"],["b"]]} + re:run("cacb","c(a|b)",[global,{capture,[1],list}]). -The options solely affecting the compilation step are described in the re:compile/2 function. + gives the result: -**replace(Subject,RE,Replacement) -> iodata() | {** ``error`` **, ErrSpec}** + `{match,[["a"],["b"]]}` -Types: +The options solely affecting the compilation step are described in +the `re:compile/2` function. -- Subject = iodata() -- RE = mp() | iodata() -- Replacement = iodata() -- ErrSpec = {ErrString, Position} -- ErrString = string() -- Position = int() -The same as replace(Subject,RE,Replacement,[]). -**replace(Subject, RE, Replacement, Options) -> iodata() | binary() | list() | {** ``error`` **, ErrSpec}** +##### replace(Subject, RE, Replacement) -> iodata() | {error, ErrSpec} Types: -- Subject = iodata() -- RE = mp() | iodata() -- Replacement = iodata() -- Options = [ Option ] -- Option = anchored | global | notbol | noteol | notempty | {offset, int()} | {newline, NLSpec} | {return, ReturnType} | CompileOpt -- ReturnType = iodata | list | binary -- CompileOpt = see compile/2 above -- NLSpec = cr | crlf | lf | anycrlf -- ErrSpec = {ErrString, Position} -- ErrString = string() -- Position = int() + Subject = iodata() + RE = mp() | iodata() + Replacement = iodata() + ErrSpec = {ErrString, Position} + ErrString = string() + Position = int() -Replaces the matched part of the Subject string with the content of Replacement. +The same as replace(Subject, RE, Replacement,[]). -Options are given as to the re:run/3 function except that the ``capture`` option of re:run/3 is not allowed. Instead a { ``return`` , ReturnType} is present. The default return type is ``iodata`` , constructed in a way to minimize copying. The iodata result can be used directly in many i/o-operations. 
If a flat list() is desired, specify { ``return`` , ``list`` } and if a binary is preferred, specify { ``return`` , ``binary`` }. -The replacement string can contain the special character ``&`` , which inserts the whole matching expression in the result, and the special sequence ``\N`` (where N is an integer > 0), resulting in the subexpression number N will be inserted in the result. If no subexpression with that number is generated by the regular expression, nothing is inserted. -To insert an ``&`` or ``\`` in the result, precede it with a ``\``. Note that Erlang already gives a special meaning to ``\`` in literal strings, why a single ``\`` has to be written as ``"\\"`` and therefore a double ``\`` as ``"\\\\"`` . Example:: +##### replace(Subject, RE, Replacement, Options) -> iodata() | binary() | list() | {error, ErrSpec} + +Types: + + Subject = iodata() + RE = mp() | iodata() + Replacement = iodata() + Options = [ Option ] + Option = anchored | global | notbol | noteol | notempty | + {offset, int()} | {newline, NLSpec} | + {return, ReturnType} | CompileOpt + ReturnType = iodata | list | binary + CompileOpt = see compile/2 above + NLSpec = cr | crlf | lf | anycrlf + ErrSpec = {ErrString, Position} + ErrString = string() + Position = int() + +Replaces the matched part of the Subject string with +the content of Replacement. + +Options are given as to the re:run/3 function except that the `capture` +option of re:run/3 is not allowed. Instead a `{return, ReturnType}` +is present. The default return type is ``iodata`` , constructed in a way +to minimize copying. The iodata result can be used directly in many +I/O-operations. If a flat list() is desired, specify `{return, list}` +and if a binary is preferred, specify `{return, binary}`. 
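The three return types can be compared with a small sketch like the one below. The module name `replace_demo` is hypothetical. Because the exact shape of the default `iodata` result is implementation-dependent, the sketch flattens it before comparing.

```erlang
-module(replace_demo).
-export([demo/0]).

%% Hypothetical demo module comparing the return types of re:replace.
demo() ->
    %% Default return type is iodata(); do not rely on its shape,
    %% flatten it with iolist_to_binary/1 when a concrete value is needed.
    IoData = re:replace("abcd", "c", "X"),
    <<"abXd">> = iolist_to_binary(IoData),
    %% {return, list} gives a flat Erlang string.
    "abXd" = re:replace("abcd", "c", "X", [{return, list}]),
    %% {return, binary} gives a single binary.
    <<"abXd">> = re:replace("abcd", "c", "X", [{return, binary}]),
    ok.
```

Keeping the default is usually enough when the result goes straight to an I/O function, since those accept `iodata()` as it is.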
+ +The replacement string can contain the special character `&`, +which inserts the whole matching expression in the result, +and the special sequence `\N` (where N is an integer > 0), +resulting in the subexpression number N will be inserted in the result. +If no subexpression with that number is generated by the regular expression, +nothing is inserted. + +To insert an `&` or `\` in the result, precede it with a `\`. +Note that Erlang already gives a special meaning to `\` in literal strings, +why a single `\` has to be written as `"\\"` and therefore +a double `\` as `"\\\\"`. Example: re:replace("abcd","c","[&]",[{return,list}]). -gives:: +gives: "ab[c]d" -while:: +while: re:replace("abcd","c","[\\&]",[{return,list}]). -gives:: +gives: "ab[&]d" -The { ``error`` , ErrSpec} return value can only arise from compilation, i.e. when a non precompiled malformed RE is given. - -**split(Subject,RE) -> SplitList | {** ``error`` **, ErrSpec}** - -Types: +The `{error, ErrSpec}` return value can only arise from compilation, +i.e. when a non precompiled malformed RE is given. -- Subject = iodata() -- RE = mp() | iodata() -- SplitList = [ iodata() ] -- ErrSpec = {ErrString, Position} -- ErrString = string() -- Position = int() -The same as split(Subject,RE,[]). 
-**split(Subject,RE,Options) -> SplitList | {** ``error`` **, ErrSpec}** +##### split(Subject,RE) -> SplitList | {error, ErrSpec} Types: -- Subject = iodata() -- RE = mp() | iodata() -- Options = [ Option ] -- Option = anchored | global | notbol | noteol | notempty | {offset, int()} | {newline, NLSpec} | {return, ReturnType} | {parts, NumParts} | group | CompileOpt -- NumParts = int() | infinity -- ReturnType = iodata | list | binary -- CompileOpt = see compile/2 above -- NLSpec = cr | crlf | lf | anycrlf -- SplitList = [ RetData ] | [ GroupedRetData ] -- GroupedRetData = [ RetData ] -- RetData = iodata() | binary() | list() -- ErrSpec = {ErrString, Position} -- ErrString = string() -- Position = int() + Subject = iodata() + RE = mp() | iodata() + SplitList = [ iodata() ] + ErrSpec = {ErrString, Position} + ErrString = string() + Position = int() -This function splits the input into parts by finding tokens according to the regular expression supplied. +The same as `split(Subject, RE, [])`. -The splitting is done basically by running a global regexp match and dividing the initial string wherever a match occurs. The matching part of the string is removed from the output. +##### split(Subject,RE,Options) -> SplitList | {error, ErrSpec} -The result is given as a list of "strings", the preferred datatype given in the return option (default ``iodata`` ). +Types: -If subexpressions are given in the regular expression, the matching subexpressions are returned in the resulting list as well. 
An example:: + Subject = iodata() + RE = mp() | iodata() + Options = [ Option ] + Option = anchored | global | notbol | noteol | notempty | + {offset, int()} | {newline, NLSpec} | {return, ReturnType} | + {parts, NumParts} | group | CompileOpt + NumParts = int() | infinity + ReturnType = iodata | list | binary + CompileOpt = see compile/2 above + NLSpec = cr | crlf | lf | anycrlf + SplitList = [ RetData ] | [ GroupedRetData ] + GroupedRetData = [ RetData ] + RetData = iodata() | binary() | list() + ErrSpec = {ErrString, Position} + ErrString = string() + Position = int() + +This function splits the input into parts by finding tokens according to +the regular expression supplied. + +The splitting is done basically by running a global regexp match and dividing +the initial string wherever a match occurs. The matching part of the string +is removed from the output. + +The result is given as a list of "strings", the preferred datatype given in +the return option (default `iodata`). + +If subexpressions are given in the regular expression, the matching +subexpressions are returned in the resulting list as well. An example: re:split("Erlang","[ln]",[{return,list}]). -will yield the result:: +will yield the result: ["Er","a","g"] -while:: +while: re:split("Erlang","([ln])",[{return,list}]). -will yield:: +will yield: ["Er","l","a","n","g"] -The text matching the subexpression (marked by the parantheses in the regexp) is inserted in the result list where it was found. In effect this means that concatenating the result of a split where the whole regexp is a single subexpression (as in the example above) will always result in the original string. +The text matching the subexpression (marked by the parantheses in the regexp) +is inserted in the result list where it was found. In effect this means that +concatenating the result of a split where the whole regexp is +a single subexpression (as in the example above) will always +result in the original string. 
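The concatenation property can be checked with a short sketch. The module name `split_concat_demo` is hypothetical, and the pattern is the one from the example above, where the whole regexp is a single subexpression.

```erlang
-module(split_concat_demo).
-export([demo/0]).

%% Hypothetical demo module: when the whole regexp is one
%% subexpression, the separators are kept in the result list,
%% so appending all parts rebuilds the subject string.
demo() ->
    Subject = "Erlang",
    Parts = re:split(Subject, "([ln])", [{return, list}]),
    ["Er", "l", "a", "n", "g"] = Parts,
    Subject = lists:append(Parts),
    ok.
```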
-As there is no matching subexpression for the last part in the example (the "g"), there is nothing inserted after that. To make the group of strings and the parts matching the subexpressions more obvious, one might use the group option, which groups together the part of the subject string with the parts matching the subexpressions when the string was split:: +As there is no matching subexpression for the last part in the example +(the `"g"`), there is nothing inserted after that. To make the group of +strings and the parts matching the subexpressions more obvious, one might use +the group option, which groups together the part of the subject string with +the parts matching the subexpressions when the string was split: re:split("Erlang","([ln])",[{return,list},group]). -gives:: +gives: [["Er","l"],["a","n"],["g"]] -Here the regular expression matched first the "l", causing "Er" to be the first part in the result. When the regular expression matched, the (only) subexpression was bound to the "l", why the "l" is inserted in the group together with "Er". The next match is of the "n", making "a" the next part to be returned. As the subexpression is bound to the substring "n" in this case, the "n" is inserted into this group. The last group consists of the rest of the string, as no more matches are found. - -All empty strings are per default removed from the end of the result list, the semantics beeing that we split the string in as many parts as possible until we reach the end of the string. In effect this means that all empty strings are stripped from the result list (or all empty groups if the group option is given). The ``parts`` option can be used to change this behaviour. Let's look at an example:: +Here the regular expression matched first the `"l"`, causing `"Er"` to be +the first part in the result. When the regular expression matched, +the (only) subexpression was bound to the `"l"`, why the `"l"` is inserted in +the group together with `"Er"`. 
The next match is of the `"n"`, making `"a"` +the next part to be returned. As the subexpression is bound to +the substring `"n"` in this case, the `"n"` is inserted into this group. +The last group consists of the rest of the string, as no more matches are found. + +All empty strings are per default removed from the end of the result list, +the semantics beeing that we split the string in as many parts as possible +until we reach the end of the string. In effect this means that all empty +strings are stripped from the result list (or all empty groups if the group +option is given). The `parts` option can be used to change this behaviour. +Let's look at an example: re:split("Erlang","[lg]",[{return,list}]). @@ -662,77 +944,107 @@ The result will be:: ["Er","an"] -as the matching of the "g" in the end effectively makes the matching reach the end of the string. If we however say we want more parts:: +as the matching of the "g" in the end effectively makes the matching reach +the end of the string. If we however say we want more parts: re:split("Erlang","[lg]",[{return,list},{parts,3}]). -We will get the last part as well, even though there is only an empty string after the last match (matching the "g"):: +We will get the last part as well, even though there is only an empty string +after the last match (matching the `"g"`): ["Er","an",[]] -More than three parts are not possible with this indata, why:: +More than three parts are not possible with this indata, why: re:split("Erlang","[lg]",[{return,list},{parts,4}]). -will give the same result. To specify that as many results as possible are to be returned, including any empty results at end, you can specify infinity as the number of parts to return. Specifying 0 as the number of parts gives the default behaviour of returning all parts except empty parts at the end. +will give the same result. 
To specify that as many results as possible +are to be returned, including any empty results at end, you can specify +infinity as the number of parts to return. Specifying 0 as the number of +parts gives the default behaviour of returning all parts except empty +parts at the end. + +If subexpressions are captured, empty subexpression matches at the end +are also stripped from the result if `{parts, N}` is not specified. +If you are familiar with Perl, the default behaviour corresponds exactly +to the Perl default, the `{parts, N}` where `N` is a positive integer +corresponds exactly to the Perl behaviour with a positive numerical +third parameter and the {parts, infinity} behaviour corresponds to that +when the Perl routine is given a negative integer as the third parameter. -If subexpressions are captured, empty subexpression matches at the end are also stripped from the result if { ``parts`` ,N} is not specified. If you are familiar with Perl, the default behaviour corresponds exactly to the Perl default, the { ``parts`` ,N} where N is a positive integer corresponds exactly to the Perl behaviour with a positive numerical third parameter and the {parts, infinity} behaviour corresponds to that when the Perl routine is given a negative integer as the third parameter. +Summary of options not previously described for the `re:run/3` function: -Summary of options not previously described for the re:run/3 function: +* `{return, ReturnType}` + Specifies how the parts of the original string are presented in + the result list. The possible types are: -{ ``return`` ,ReturnType} - Specifies how the parts of the original string are presented in the result list. The possible types are: + - `iodata` + The variant of iodata() that gives the least copying of data with + the current implementation (often a binary, but don't depend on it). 
- ``iodata`` - The variant of iodata() that gives the least copying of data with the current implementation (often a binary, but don't depend on it). - ``binary`` + - `binary` All parts returned as binaries. - ``list`` - All parts returned as lists of characters ("strings"). -``group`` - Groups together the part of the string with the parts of the string matching the subexpressions of the regexp. - The return value from the function will in this case be a list() of list()'s. Each sublist begins with the string picked out of the subject string, followed by the parts matching each of the subexpressions in order of occurence in the regular expression. -{ ``parts`` ,N} + - `list` + All parts returned as lists of characters ("strings"). + +* `group` + Groups together the part of the string with the parts of the string + matching the subexpressions of the regexp. + + The return value from the function will in this case be a `list()` + of `list()`'s. Each sublist begins with the string picked out of + the subject string, followed by the parts matching each of + the subexpressions in order of occurence in the regular expression. + +* `{parts, N}` Specifies the number of parts the subject string is to be split into. - The number of parts should be 0 for the default behaviour "as many as there are, skipping empty parts at the end", a positive integer for a specific maximum on the number of parts and infinity for the maximum number of parts possible, regardless of if the parts are empty strings at the end. + + The number of parts should be 0 for the default behaviour + "as many as there are, skipping empty parts at the end", a positive + integer for a specific maximum on the number of parts and infinity for + the maximum number of parts possible, regardless of if the parts are + empty strings at the end. 
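The effect of the `parts` option described above can be summarised in a small sketch. The module name `split_parts_demo` is hypothetical, and the expected results are the ones given in the text for the subject `"Erlang"` and the pattern `"[lg]"`.

```erlang
-module(split_parts_demo).
-export([demo/0]).

%% Hypothetical demo module for the {parts, N} option of re:split.
demo() ->
    Opts = [{return, list}],
    %% Default: trailing empty strings are stripped.
    ["Er", "an"] = re:split("Erlang", "[lg]", Opts),
    %% {parts, 0} is the same as the default.
    ["Er", "an"] = re:split("Erlang", "[lg]", [{parts, 0} | Opts]),
    %% A positive N keeps the empty string after the final "g".
    ["Er", "an", []] = re:split("Erlang", "[lg]", [{parts, 3} | Opts]),
    %% Asking for more parts than exist gives the same result.
    ["Er", "an", []] = re:split("Erlang", "[lg]", [{parts, 4} | Opts]),
    %% infinity returns every part, including empty ones at the end.
    ["Er", "an", []] = re:split("Erlang", "[lg]", [{parts, infinity} | Opts]),
    ok.
```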
-Supported string representations -:::::::::::::::::::::::::::::::: + +### Supported string representations ### As can be viewed in the manual excerpt, I suggest allowing both the regular expressions and the subject strings to be provided as -*iodata()*, which means either binaries, lists or a mix of binaries +`iodata()`, which means either binaries, lists or a mix of binaries and deep lists. When Unicode is not involved, this basically means a -implicit *iolist_to_binary()* when supplying data to the re module. +implicit `iolist_to_binary()` when supplying data to the re module. + + -Further extensions -:::::::::::::::::: +### Further extensions ### The following extensions are not yet implemented in the prototype, but should be included in a final release: -- Unicode support. Unicode strings should be represented as suggested - in EEP 10, which means either UTF-8 in binaries, lists of Unicode - characters as integers, or a mix thereof. If the regular expression - was compiled for Unicode or a ``unicode`` option is supplied when - compiling and running in one go, the data is expected to be in one - of the supported Unicode formats, otherwise a ``badarg`` exception - will be thrown. +* Unicode support. Unicode strings should be represented as suggested + in [EEP 10][], which means either UTF-8 in binaries, lists of Unicode + characters as integers, or a mix thereof. If the regular expression + was compiled for Unicode or a `unicode` option is supplied when + compiling and running in one go, the data is expected to be in one + of the supported Unicode formats, otherwise a `badarg` exception + will be thrown. -- Match predicates to make it easy to use regular expressions in - logical Erlang expressions. +* Match predicates to make it easy to use regular expressions in + logical Erlang expressions. Of these, Unicode support is the far most important, and also the one that can not be implemented efficiently purely in Erlang code. 
+ + Prototype implementation ------------------------ A prototype implementation using the PCRE library is present along with a reference manual page in the R12B-4 distribution. This -implementation does not yet fully support Unicode, as EEP 10 is not +implementation does not yet fully support Unicode, as [EEP 10][] is not accepted at the time of writing. The prototype implementation also lacks the "split" function, which was implemented after the R12B-4 release. @@ -755,23 +1067,27 @@ allow an integration into the Erlang emulator without using asynchronous threads is in an absolute worst scenario no more than 6% compared to a theoretical maximum. -References -========== -.. [1] http://www.pcre.org/ - The PCRE homepage. + +[PCRE]: http://www.pcre.org/ + "The PCRE homepage" + +[EEP 10]: eep-0010.md + "EEP 10" + + Copyright ========= This document has been placed in the public domain. -.. - Local Variables: - mode: indented-text - indent-tabs-mode: nil - sentence-end-double-space: t - fill-column: 70 - coding: utf-8 - End: +[EmacsVar]: <> "Local Variables:" +[EmacsVar]: <> "mode: indented-text" +[EmacsVar]: <> "indent-tabs-mode: nil" +[EmacsVar]: <> "sentence-end-double-space: t" +[EmacsVar]: <> "fill-column: 70" +[EmacsVar]: <> "coding: utf-8" +[EmacsVar]: <> "End:" diff --git a/eeps/eep-0012.md b/eeps/eep-0012.md index 4bce15e..10d7eb7 100644 --- a/eeps/eep-0012.md +++ b/eeps/eep-0012.md @@ -1,15 +1,14 @@ -EEP 12: Extensions to comprehensions -==== - - Author: Richard A. O'Keefe [ok(at)cs(dot)otago(dot)ac(dot)nz] + Author: Richard A. O'Keefe Status: Draft Type: Standards Track - Erlang-Version: R12B-4 - Content-Type: text/x-markdown Created: 10-Jul-2008 + Erlang-Version: R12B-4 Post-History: +**** +EEP 12: Extensions to comprehensions +---- + -==== Abstract ======== @@ -194,9 +193,9 @@ code will be affected. Reference Implementation ======================== -The auxiliary file [1] is a patch file to be applied to `erl_parse.yrl`. 
-The patched file has been checked by `yecc`, which is happy -with it. However, that's all the testing that has been done. +The auxiliary file [`eep-0012-1.diff`][1] is a patch file to be +applied to `erl_parse.yrl`. The patched file has been checked by `yecc`, +which is happy with it. However, that's all the testing that has been done. This implementation does the three source to source rewrites described in the previous section, entirely in the parser. @@ -204,10 +203,10 @@ The rest of the Erlang system needs no changes whatever. -References -========== - -[1] Patch file to be applied to erl_parse.yrl: eep-0012-1.diff +[EEP-12]: "EEP 12 Source" + +[1]: eep-0012-1.diff + "Patch file to be applied to erl_parse.yrl" @@ -216,3 +215,12 @@ Copyright This document has been placed in the public domain. + + +[EmacsVar]: <> "Local Variables:" +[EmacsVar]: <> "mode: indented-text" +[EmacsVar]: <> "indent-tabs-mode: nil" +[EmacsVar]: <> "sentence-end-double-space: t" +[EmacsVar]: <> "fill-column: 70" +[EmacsVar]: <> "coding: utf-8" +[EmacsVar]: <> "End:" diff --git a/eeps/eep-0013.md b/eeps/eep-0013.md index f3bfd21..24d6c65 100644 --- a/eeps/eep-0013.md +++ b/eeps/eep-0013.md @@ -1,207 +1,229 @@ -EEP: 13 -Title: -enum declarations -Version: $Revision$ -Last-Modified: $Date$ -Author: Richard A. O'Keefe [ok(at)cs(dot)otago(dot)ac(dot)nz] -Status: Draft -Type: Standards Track -Erlang-Version: R12B-4 -Content-Type: text/plain -Created: 09-Jul-2008 -Post-History: + Author: Richard A. O'Keefe + Status: Draft + Type: Standards Track + Erlang-Version: R12B-4 + Created: 09-Jul-2008 + Post-History: +**** +EEP 13: -enum declarations +---- Abstract +======== - Erlang programs often need to process data streams using data - formats devised without reference to Erlang. For this reason - OTP supports ASN.1 and CORBA, amongst other interface techniques. 
- Binary data streams often contain "symbolic" values that are - represented in the original description by some kind of - enumeration declaration, often literally a C "enum" declaration. +Erlang programs often need to process data streams using data +formats devised without reference to Erlang. For this reason +OTP supports ASN.1 and CORBA, amongst other interface techniques. +Binary data streams often contain "symbolic" values that are +represented in the original description by some kind of +enumeration declaration, often literally a C "enum" declaration. - This EEP proposes an "-enum" declaration for Erlang for - convenient mapping between atoms on one side of an interface and - integers on the other, especially in the bit syntax. - - This replaces some uses of the preprocessor with something that - permits the clearer expression of the programmer's intent. +This EEP proposes an "`-enum`" declaration for Erlang for +convenient mapping between atoms on one side of an interface and +integers on the other, especially in the bit syntax. + +This replaces some uses of the preprocessor with something that +permits the clearer expression of the programmer's intent. Specification +============= + +A new form of declaration is added, four new guard BIFs, and a +new type specifier for bit syntax. + +Declaration +----------- + + '-' 'enum' '(' identifier-and-size ',' '{' enum-binding + {',' enum-binding}* ')' '.' + +where identifier-and-size is + + identifier + +or + + identifier : size + +or - A new form of declaration is added, four new guard BIFs, and a - new type specifier for bit syntax. + identifier / type-specifier-list - Declaration: +or - '-' 'enum' '(' identifier-and-size ',' '{' enum-binding - {',' enum-binding}* ')' '.' 
+ identifier : size / type-specifier-list - where identifier-and-size is +and enum-binding is - identifier - or - identifier : size - or - identifier / type-specifier-list - or - identifier : size / type-specifier-list + identifier '=' constant-integer-expression - and enum-binding is +or - identifier '=' constant-integer-expression + identifier - or - identifier +size and type-specifier-list are as in the bit syntax, +except that the type-specifier-list may not include a Type. +If the size is missing, it will be the first of [8,16,32,64] +that is compatible with the integer values, as described later. +If the size is present, it must be an integer that is compatible +with the integer values. Signedness, if present, must agree +with the integer values. - size and type-specifier-list are as in the bit syntax, - except that the type-specifier-list may not include a Type. - If the size is missing, it will be the first of [8,16,32,64] - that is compatible with the integer values, as described later. - If the size is present, it must be an integer that is compatible - with the integer values. Signedness, if present, must agree - with the integer values. +Example +------- - Example: + -enum(colour, {red,orange,yellow,green,blue}). + -enum(fruit:32, {quandong,lime,banana,orange,apple}). - -enum(colour, {red,orange,yellow,green,blue}). - -enum(fruit:32, {quandong,lime,banana,orange,apple}). +The identifier following the left parenthesis is called the +"enumeration identifier" and the identifiers bound by the +bindings are called "enumerals". - The identifier following the left parenthesis is called the - "enumeration identifier" and the identifiers bound by the - bindings are called "enumerals". +After `-include` and `-if` processing, there should be at most one +enum declaration for any identifier. The identifier must not +be one of - After -include and -if processing, there should be at most one - enum declaration for any identifier. 
The identifier must not - be one of - integer | float | binary | bytes | bitstring | bits - Such a declaration only has significance within the constructs - defined in this EEP; the only existing notation which is affected - is the bit syntax. + integer | float | binary | bytes | bitstring | bits - Within a single enum declaration, an enumeral may not be bound in - two or more bindings. +Such a declaration only has significance within the constructs +defined in this EEP; the only existing notation which is affected +is the bit syntax. - If the first binding does not have an integer-constant-expression, - it is as if "= 0" appeared. If a later binding does not have an - integer-constant-expression, it is as if "= N" appeared, where N - is one more than the integer value of the previous binding. +Within a single enum declaration, an enumeral may not be bound in +two or more bindings. - Within a single enum declaration, an integer value may not be used - in two or more bindings, whether implicitly or explicitly. +If the first binding does not have an integer-constant-expression, +it is as if "= 0" appeared. If a later binding does not have an +integer-constant-expression, it is as if "= N" appeared, where N +is one more than the integer value of the previous binding. - Built-in functions: +Within a single enum declaration, an integer value may not be used +in two or more bindings, whether implicitly or explicitly. - is_enum_atom(Atom, Enumeration_Identifier) - true when Enumeration_Identifier is an atom that is declared - as an enumeration identifier and Atom is one of the enumerals - in that declaration, false otherwise. +Built-in functions +------------------ - May be used as a guard test provided - Enumeration_Identifier is a literal atom, - with a compile-time error if it has no enum declaration. 
+### `is_enum_atom(Atom, Enumeration_Identifier)` +* `true` when Enumeration_Identifier is an atom that is declared + as an enumeration identifier and Atom is one of the enumerals + in that declaration, +* `false` otherwise. - is_enum_integer(Integer, Enumeration_Identifier) - true when Enumeration_Identifier is an atom that is declared - as an enumeration identifier and Integer is an integer that - is used as the value in one of the bindings in that - declaration, false otherwise. +May be used as a guard test provided +Enumeration_Identifier is a literal atom, +with a compile-time error if it has no enum declaration. - May be used as a guard test provided - Enumeration_Identifier is a literal atom, - with a compile-time error if it has no enum declaration. +### `is_enum_integer(Integer, Enumeration_Identifier)` +* `true` when Enumeration_Identifier is an atom that is declared + as an enumeration identifier and Integer is an integer that + is used as the value in one of the bindings in that + declaration, +* `false` otherwise. - enum_to_atom(Integer, Enumeration_Identifier) - when is_enum_integer(Integer, Enumeration_Identifier) - -> the enumeral bound to Integer in the - declaration of Enumeration_Identifier - otherwise exits with 'badarg'. +May be used as a guard test provided +Enumeration_Identifier is a literal atom, +with a compile-time error if it has no enum declaration. - May be used in a guard expression provided - Enumeration_Identifier is a literal atom, - with a compile-time error if it has no enum declaration. +### `enum_to_atom(Integer, Enumeration_Identifier)` +* when `is_enum_integer(Integer, Enumeration_Identifier)` -> + the enumeral bound to Integer in the + declaration of Enumeration_Identifier +* otherwise exits with `badarg`. - enum_to_integer(Atom, Enumeration_Identifier) - when is_enum_atom(Atom, Enumeration_Identifier) - -> the integer value that Atom is bound to in the - declaration of Enumeration_Identifier - otherwise exits with 'badarg'. 
+May be used in a guard expression provided +Enumeration_Identifier is a literal atom, +with a compile-time error if it has no enum declaration. - May be used in a guard expression provided - Enumeration_Identifier is a literal atom, - with a compile-time error if it has no enum declaration. +### `enum_to_integer(Atom, Enumeration_Identifier)` +* when `is_enum_atom(Atom, Enumeration_Identifier)` -> + the integer value that Atom is bound to in the + declaration of Enumeration_Identifier +* otherwise exits with `badarg`. - All four of these functions are expected to take O(1) time - and to allocate no storage at run time. +May be used in a guard expression provided +Enumeration_Identifier is a literal atom, +with a compile-time error if it has no enum declaration. + +All four of these functions are expected to take O(1) time +and to allocate no storage at run time. + +Bit syntax extension +-------------------- + +The Type in a segment of the bit syntax may additionally be +an Enumeration_Identifier, and the corresponding Value will +then be an atom. The value in the bit string that is being +matched or constructed is or will be the integer bound to +the atom; as such the Size, Endianness, Signedness, and Unit +are interpreted as for the `integer` Type. + +In constructing a bit string, - Bit syntax extension: - - The Type in a segment of the bit syntax may additionally be - an Enumeration_Identifier, and the corresponding Value will - then be an atom. The value in the bit string that is being - matched or constructed is or will be the integer bound to - the atom; as such the Size, Endianness, Signedness, and Unit - are interpreted as for the 'integer' Type. - - In constructing a bit string, V / Enumeration_Identifier ... or V : Size / Enumeration_Identifier ... - acts as if + +acts as if + enum_to_integer(V, Enumeration_Identifier) / integer ... or enum_to_integer(V, Enumeration_Identifier) : Size / integer ... 
- had been written, with one exception, which is now described. - - If all the integer values in an enum declaration are non-negative, - let k be the smallest integer such that 2**k is greater than all - of them. If some are negative, let k be the smallest integer such - that 2**(k-1) is greater than all of them and -(2**(k-1)) is less - than or equal to all of them. The size of a segment for an - enumeration value must then be at least k bits, whatever the - actual value. A programmer who finds a need to bypass this can - do the enumeral<->integer conversion manually; what this limit - does is to prevent accidental mis-specification. The size given - in the enum declaration must be at least k. If no size is given - in the bit syntax, the size given (or defaulted) in the enum - declaration will be used. - - When such a segment is used in pattern matching, it is as if - - first an integer is extracted as if the Type had been 'integer', - - then the value is converted to an atom as if by 'enum_to_atom', - - and finally the atom is matched to whatever pattern appeared. - One expects that cases where the value V is an explicit atom - will be translated completely at compile time, therefore having - no overhead compared with using macros and /integer. +had been written, with one exception, which is now described. +If all the integer values in an enum declaration are non-negative, +let k be the smallest integer such that 2^k is greater than all +of them. If some are negative, let k be the smallest integer such +that 2^(k-1) is greater than all of them and -(2^(k-1)) is less +than or equal to all of them. The size of a segment for an +enumeration value must then be at least k bits, whatever the +actual value. A programmer who finds a need to bypass this can +do the enumeral<->integer conversion manually; what this limit +does is to prevent accidental mis-specification. The size given +in the enum declaration must be at least k. 
If no size is given +in the bit syntax, the size given (or defaulted) in the enum +declaration will be used. + +When such a segment is used in pattern matching, it is as if + +- first an integer is extracted as if the Type had been `integer`, +- then the value is converted to an atom as if by `enum_to_atom`, +- and finally the atom is matched to whatever pattern appeared. + +One expects that cases where the value V is an explicit atom +will be translated completely at compile time, therefore having +no overhead compared with using macros and `/integer`. -Motivation - This was inspired by thinking about PADS and other data - description languages. Imagine a C program doing something like - - enum seriousness { - not_serious = 'N', - hospitalised = 'H', - life_threatening = 'L', - congenital_abnormality = 'C', - persisting_disability = 'P', - intervention_required = 'I', - death = 'D' - }; - struct Message { - char tag; /* a seriousness */ - union { - int number_of_days; /* H */ - float extent_of_disability; /* C or P */ - char procedure_code[5]; /* I */ - } supplement; - }; - - (The Message structure has been considerably simplified.) - Now imagine matching it. + +Motivation +========== + +This was inspired by thinking about PADS and other data +description languages. Imagine a C program doing something like + + enum seriousness { + not_serious = 'N', + hospitalised = 'H', + life_threatening = 'L', + congenital_abnormality = 'C', + persisting_disability = 'P', + intervention_required = 'I', + death = 'D' + }; + struct Message { + char tag; /* a seriousness */ + union { + int number_of_days; /* H */ + float extent_of_disability; /* C or P */ + char procedure_code[5]; /* I */ + } supplement; + }; + +(The Message structure has been considerably simplified.) +Now imagine matching it. -define(NOT_SERIOUS, $N). -define(HOSPITALISED, $H). @@ -210,7 +232,7 @@ Motivation -define(PERSISTING_DISABILITY, $P). -define(INTERVENTION_REQUIRED, $I). -define(DEATH, $D). 
- + decode_message(B0) -> case B0 of <> -> @@ -229,23 +251,24 @@ Motivation {{death}, B1} end. - There are a number of problems with this. - - You have to use macros; functions are not allowed in patterns. - - There is nothing to link these macros together as a group. - - So there is no help checking that you are using the right ones. - - There is no word to relate them back to the original enum. - - If the size isn't 8, it must be repeated in each pattern. - - If the Endianness isn't 'big', it must be repeated in each - pattern. - - If the size is wrong, too bad. - - If a macro from the wrong list is used, too bad. - - You cannot use the same enumeral name for more than one - enumeration, unless it happens to have the same value in both. - - If you pass the macros around in a computation, they look - just like numbers to tracers and debuggers; they have no - run-time symbolic value. - - Now here's the version using -enum. +There are a number of problems with this. + +- You have to use macros; functions are not allowed in patterns. +- There is nothing to link these macros together as a group. +- So there is no help checking that you are using the right ones. +- There is no word to relate them back to the original enum. +- If the size isn't 8, it must be repeated in each pattern. +- If the Endianness isn't `big`, it must be repeated in each + pattern. +- If the size is wrong, too bad. +- If a macro from the wrong list is used, too bad. +- You cannot use the same enumeral name for more than one + enumeration, unless it happens to have the same value in both. +- If you pass the macros around in a computation, they look + just like numbers to tracers and debuggers; they have no + run-time symbolic value. + +Now here's the version using `-enum`. -enum(seriousness : 8, { not_serious = $N, @@ -256,7 +279,7 @@ Motivation intervention_required = $I, death = $D }). - + decode_message(B0) -> case B0 of <> - can be translated as - ( V1 = enum_to_integer(V, X), <<... 
V1 : S / integer X ...>>) - and the pattern - <<... V : S / T X ...>> - can be translated to - <<... V' : S / integer X ...>> - by adding - V =:= enum_to_atom(V', T) - to the guard if V occurs elsewhere in the pattern or will be - bound in the context, or - V = enum_to_atom(V', T) - if V would not otherwise become bound. +There is none. However, we can sketch one. +The four new BIFs are all simple table lookups of the kind that +the Erlang compiler already has to be able to generate for +indexed clause selection. As such, they are safe to call in +guards. Since the Type in the bit syntax may only be an +enumeration name when it is a literal atom known to the compiler +as an enumeration name, the constructor - Binding like this should be allowed in guards anyway, - but in this case it is perfectly safe because it is O(1) and - does not require any dynamic storage allocation (unlike, say, - arithmetic). + <<... V : S / T X ...>> -References - - None. +can be translated as + + ( V1 = enum_to_integer(V, X), <<... V1 : S / integer X ...>>) + +and the pattern + + <<... V : S / T X ...>> + +can be translated to + + <<... V' : S / integer X ...>> + +by adding + + V =:= enum_to_atom(V', T) + +to the guard if V occurs elsewhere in the pattern or will be +bound in the context, or + + V = enum_to_atom(V', T) + if V would not otherwise become bound. + +Binding like this should be allowed in guards anyway, +but in this case it is perfectly safe because it is O(1) and +does not require any dynamic storage allocation (unlike, say, +arithmetic). Copyright +========= - This document has been placed in the public domain. +This document has been placed in the public domain. 
-Local Variables: -mode: indented-text -indent-tabs-mode: nil -sentence-end-double-space: t -fill-column: 70 -coding: utf-8 -End: +[EmacsVar]: <> "Local Variables:" +[EmacsVar]: <> "mode: indented-text" +[EmacsVar]: <> "indent-tabs-mode: nil" +[EmacsVar]: <> "sentence-end-double-space: t" +[EmacsVar]: <> "fill-column: 70" +[EmacsVar]: <> "coding: utf-8" +[EmacsVar]: <> "End:" diff --git a/eeps/eep-0014.md b/eeps/eep-0014.md index ba2c140..94a179e 100644 --- a/eeps/eep-0014.md +++ b/eeps/eep-0014.md @@ -1,230 +1,234 @@ -EEP: 14 -Title: Guard clarification and extension -Version: $Revision$ -Last-Modified: $Date$ -Author: Richard A. O'Keefe [ok(at)cs(dot)otago(dot)ac(dot)nz] -Status: Draft -Type: Standards Track -Erlang-Version: R12B-4 -Content-Type: text/plain -Created: 10-Jul-2008 -Post-History: + Author: Richard A. O'Keefe + Status: Draft + Type: Standards Track + Erlang-Version: R12B-4 + Created: 10-Jul-2008 + Post-History: +**** +EEP 14: Guard clarification and extension +---- Abstract +======== + +Allow Pattern = Guard_Expression as a simple guard test. +Make obviously silly guards syntax errors. - Allow Pattern = Guard_Expression as a simple guard test. - Make obviously silly guards syntax errors. Specification +============= + +Replace the opening text of section 6.24 "Guard Sequences" +as follows. - Replace the opening text of section 6.24 "Guard Sequences" - as follows. + ::= + ::= {';' }* - ::= - ::= {';' }* +An `` is a sequence of `` separated by +semicolons. Here, as elsewhere in Erlang, semicolon means +sequential OR: an `` evaluates its `` +one at a time from left to right, until one succeeds or +until all have failed. - An is a sequence of separated by - semicolons. Here, as elsewhere in Erlang, semicolon means - sequential OR: an evaluates its - one at a time from left to right, until one succeeds or - until all have failed. + ::= {',' }* - ::= {',' }* +An `` is a sequence of `` separated by +semicolons. 
Here, as is often the case in Erlang, comma +means sequential AND: an `` evaluates its +`` one at a time from left to right, until all +have succeeded or one has failed. - An is a sequence of separated by - semicolons. Here, as is often the case in Erlang, comma - means sequential AND: an evaluates its - one at a time from left to right, until all - have succeeded or one has failed. + ::= + | - ::= - | + ::= '=' + | '=' - ::= '=' - | '=' +A `` is either a match or a Boolean expression. +In a guard, a match suceeds if and only if the `` +can be evaluated without exception, and the result can be +matched with the ``, possibly binding some variables. - A is either a match or a Boolean expression. - In a guard, a match suceeds if and only if the - can be evaluated without exception, and the result can be - matched with the , possibly binding some variables. +If a variable is bound in one ``, it may be used in +later ``s of the same ``. If a variable +is bound in all of the ``s of an `` it may +be used in the guarded code, so - If a variable is bound in one , it may be used in - later s of the same . If a variable - is bound in all of the s of an it may - be used in the guarded code, so + if X = 1, is_atom(element(X, Tup)) + ; X = 2, is_atom(element(X, Tup)) + -> ... uses X ... - if X = 1, is_atom(element(X, Tup)) - ; X = 2, is_atom(element(X, Tup)) - -> ... uses X ... +is OK. If a variable is bound in one of the ``s of +an `` but not all of them it may not be used in the +guarded code, so - is OK. If a variable is bound in one of the s of - an but not all of them it may not be used in the - guarded code, so + if X = a + ; Y = b + -> ... uses X ... - if X = a - ; Y = b - -> ... uses X ... +is not allowed. - is not allowed. 
+A `` in a guard consists of a number +of subexpressions + constant 'false' + constant 'true' + variable (must be bound to 'false' or 'true') + term comparison with `` operands + calls to type test BIFs with `` operands + ``s in parentheses +combined using the operators 'not', 'and', 'or', +'andalso', and 'orelse'. Thus - A in a guard consists of a number - of subexpressions - constant 'false' - constant 'true' - variable (must be bound to 'false' or 'true') - term comparison with operands - calls to type test BIFs with operands - s in parentheses - combined using the operators 'not', 'and', 'or', - 'andalso', and 'orelse'. Thus + X+1 == Y - X+1 == Y +is a `` that can be used as a `` +but + + X+1 - is a that can be used as a - but - - X+1 +is not. You are advised never to use the 'and' and 'or' operators +and to avoid 'andalso' and 'orelse' whenever ',' and ';' will do +what you need. - is not. You are advised never to use the 'and' and 'or' operators - and to avoid 'andalso' and 'orelse' whenever ',' and ';' will do - what you need. +The set of ``s is a subset of the set of valid Erlang +expressions. The reason for restricting the set of valid +expressions is that evaluation of a guard expression must be +guaranteed to be free of side effects and to terminate. - The set of s is a subset of the set of valid Erlang - expressions. The reason for restricting the set of valid - expressions is that evaluation of a guard expression must be - guaranteed to be free of side effects and to terminate. +A `` consists of a number of subexpressions - A consists of a number of subexpressions - constants - variables - calls to "other BIFs Allowed in Guard Expressions" (see - table) with arguments - record field selections - s in parentheses - combined using the built in arithmetic and bitwise operators. 
+* constants +* variables +* calls to "other BIFs Allowed in Guard Expressions" + (see table) with `` arguments +* record field selections +* ``s in parentheses + +combined using the built in arithmetic and bitwise operators. Motivation +========== + +There are two parts to this EEP. It was originally going to +be just about allowing matches in guards. Then it was going +to be two, because the current situation is just too messy, +but then it became one again for brevity. + +Consider this case. A function is given a tuple and an index. +If the element at that index is in the range 0..127, it +should be returned. Otherwise some other clause should apply. +Currently, we have to write + + f(Tuple, Index) + when is_integer(element(Tuple, Index)), + 0 =< element(Tuple, Index), + element(Tuple, Index) =< 127 + -> element(Tuple, Index); + ... + +or something else which is even clumsier. Why can't we write + + f(Tuple, Index) + when X = element(Tuple, Index), + is_integer(X), 0 =< X, X =< 127 + -> X; + ... + +In trying to explain how to add this to the language, I found +that the current description of guards in the Erlang reference +manual is remarkably fuzzy. Dismayingly, this is matched +by an equally fuzzy implementation. The description mixes +up things that can be used as arguments of guard BIFs +(guard expressions) with simple guards. + +Consider the example + + X = 1, + if X+1 -> true + ; X-1 -> false + end. + +This clearly makes no sense at all, and should be rejected +as bad syntax. According to the current reference manual, +it is legal; X+1 and X-1 are legal "guard expressions". + +In the shell, this exampel crashes, which indeed makes +a lot of sense. But 'erlc' says: - There are two parts to this EEP. It was originally going to - be just about allowing matches in guards. Then it was going - to be two, because the current situation is just too messy, - but then it became one again for brevity. - - Consider this case. A function is given a tuple and an index. 
- If the element at that index is in the range 0..127, it - should be returned. Otherwise some other clause should apply. - Currently, we have to write - - f(Tuple, Index) - when is_integer(element(Tuple, Index)), - 0 =< element(Tuple, Index), - element(Tuple, Index) =< 127 - -> element(Tuple, Index); - ... - - or something else which is even clumsier. Why can't we write - - f(Tuple, Index) - when X = element(Tuple, Index), - is_integer(X), 0 =< X, X =< 127 - -> X; - ... - - In trying to explain how to add this to the language, I found - that the current description of guards in the Erlang reference - manual is remarkably fuzzy. Dismayingly, this is matched - by an equally fuzzy implementation. The description mixes - up things that can be used as arguments of guard BIFs - (guard expressions) with simple guards. - - Consider the example - - X = 1, - if X+1 -> true - ; X-1 -> false - end. - - This clearly makes no sense at all, and should be rejected - as bad syntax. According to the current reference manual, - it is legal; X+1 and X-1 are legal "guard expressions". - - In the shell, this exampel crashes, which indeed makes - a lot of sense. But 'erlc' says {X+1} Warning: the guard for this clause evaluates to 'false' {X-1} Warning: the guard for this clause evaluates to 'false' - It is good that there is a warning, but bad that the text of - the warning is wrong. These things DON'T evaluate to 'false', - they evaluate to numbers. Then, despite having given a warning, - you get a run-time error. + +It is good that there is a warning, but bad that the text of +the warning is wrong. These things DON'T evaluate to 'false', +they evaluate to numbers. Then, despite having given a warning, +you get a run-time error. exited: {if_clause,[{a,f,0},{shell,exprs,6},{shell,eval_loop,3}]} - What happened in this example, of course, was that all of the - clauses of the 'if' were eliminated because all of them were - malformed. 
More realistic examples would simply quietly do the - wrong thing at run time. +What happened in this example, of course, was that all of the +clauses of the 'if' were eliminated because all of them were +malformed. More realistic examples would simply quietly do the +wrong thing at run time. Rationale +========= - The syntax for allowing matches in guards is obvious; - no other syntax would be tolerable. The only real question - is whether they can be embedded inside 'andalso' and 'orelse' - or not, and in order to avoid questions of backtracking, I - have said "no". This is really the simplest extension of - guards to allow matches that I can think of. +The syntax for allowing matches in guards is obvious; +no other syntax would be tolerable. The only real question +is whether they can be embedded inside 'andalso' and 'orelse' +or not, and in order to avoid questions of backtracking, I +have said "no". This is really the simplest extension of +guards to allow matches that I can think of. - The rest of the EEP is concerned with trying to rule out - obviously silly guard tests at compile time. Precisely how - this is done is debateable. That it should be done surely - isn't. What benefit do we currently obtain (other than - unwarranted simplicity in the compiler) from allowing "27" - and "X+5" as guards? +The rest of the EEP is concerned with trying to rule out +obviously silly guard tests at compile time. Precisely how +this is done is debateable. That it should be done surely +isn't. What benefit do we currently obtain (other than +unwarranted simplicity in the compiler) from allowing "27" +and "X+5" as guards? Backwards Compatibility +======================= - Matches are currently not allowed in guards, so no existing - application code can be broken by adding them. Obviously, - anything that works with Erlang parse trees will need to be - extended. +Matches are currently not allowed in guards, so no existing +application code can be broken by adding them. 
Obviously, +anything that works with Erlang parse trees will need to be +extended. - Cleaning up what's allowed in guards may affect existing code. - However, in most cases the compiler would already have warned - about this, and the compatibility issue amounts to turning a - warning message into an error message. +Cleaning up what's allowed in guards may affect existing code. +However, in most cases the compiler would already have warned +about this, and the compatibility issue amounts to turning a +warning message into an error message. Reference Implementation +======================== - None. - - - -References - - None. +None. Copyright +========= - This document has been placed in the public domain. +This document has been placed in the public domain. -Local Variables: -mode: indented-text -indent-tabs-mode: nil -sentence-end-double-space: t -fill-column: 70 -coding: utf-8 -End: +[EmacsVar]: <> "Local Variables:" +[EmacsVar]: <> "mode: indented-text" +[EmacsVar]: <> "indent-tabs-mode: nil" +[EmacsVar]: <> "sentence-end-double-space: t" +[EmacsVar]: <> "fill-column: 70" +[EmacsVar]: <> "coding: utf-8" +[EmacsVar]: <> "End:" diff --git a/eeps/eep-0015.md b/eeps/eep-0015.md index 9d12c62..240f06e 100644 --- a/eeps/eep-0015.md +++ b/eeps/eep-0015.md @@ -1,281 +1,284 @@ -EEP: 15 -Title: Portable funs -Version: $Revision$ -Last-Modified: $Date$ -Author: Richard A. O'Keefe -Status: Draft -Type: Standards Track -Erlang-Version: R12B-4 -Content-Type: text/plain -Created: 15-Jul-2008 -Post-History: + Author: Richard A. O'Keefe + Status: Draft + Type: Standards Track + Erlang-Version: R12B-4 + Created: 15-Jul-2008 + Post-History: +**** +EEP 15: Portable funs +---- Abstract +======== - Current Erlang has two kinds of funs. An "external" fun, - Module:Name/Arity, is just a name and can be used freely. - A "local" fun contains code that is bound to the module it - was defined in. 
This means that you cannot save internal - funs in data bases or send them to remote systems and expect them to - work. +Current Erlang has two kinds of funs. An "external" fun, +Module:Name/Arity, is just a name and can be used freely. +A "local" fun contains code that is bound to the module it +was defined in. This means that you cannot save internal +funs in data bases or send them to remote systems and expect them to +work. - I propose a "portable fun", which is a syntactically restricted - kind of fun. The restriction ensures that a programmer knows - (and the run time can discover) exactly what modules are/will be - required. These funs can be safely sent to remote nodes, and - can safely be stored in data bases, retrieved at a later time, - and executed. Nor need a process holding a reference to such a - fun be killed when the module it came from is unloaded. +I propose a "portable fun", which is a syntactically restricted +kind of fun. The restriction ensures that a programmer knows +(and the run time can discover) exactly what modules are/will be +required. These funs can be safely sent to remote nodes, and +can safely be stored in data bases, retrieved at a later time, +and executed. Nor need a process holding a reference to such a +fun be killed when the module it came from is unloaded. - A new way of implementing these funs is required for best speed, - so this is quite a large change. However, a prototype that - interpreted portable functions would be possible. +A new way of implementing these funs is required for best speed, +so this is quite a large change. However, a prototype that +interpreted portable functions would be possible. Specification - - Currently, Erlang has - - fun_expr -> 'fun' fun_clauses 'end' : ... - - We add - - fun_expr -> 'fun' '!' fun_clauses 'end' : ... - - and make the following restrictions: - - (a) A portable fun may not contain plain funs. - (b) A portable fun may not contain a call f(...) 
- without a module prefix unless f is a built-in - function. - (c) A portable fun may not contain any call of the - form M:f(...) or m:F(...) or M:F(...). - (d) A portable fun may not contain any call of the - form F(...) unless F is bound in its head. - (e) In a system where abstract patterns are available, - they are restricted the same way as function calls. - - The intent of these restrictions is to ensure that - every call is to a built in function, a known export - of a known module, or to some kind of fun received as - a parameter. - - The built-in function erlang:fun_info/1 is extended in - the following ways: - (1) In a {type,Type} item, Type may be 'portable'. - (2) In a {module,Module} item for a portable fun, the Module - will be present, but there will in fact be no other - connection between a portable fun and any module by that - name. - (3) In a {name,Name} item for a portable fun, - Name will always be []. - (4) None of the items specified for 'local' funs will be - returned for 'portable' funs. - (5) {calls,List} will be returned for a portable fun, - where List is a list of {Module,Imports} pairs, where - each Module that is used in a remote call in the fun is - listed once, and the Imports are a list of {Name,Arity} - pairs as reported in *:module_info/0. This permits the - receiver of a portable fun to determine which modules - need loading and which functions they are expected to - export. - (6) For consistency, - erlang:fun_info(fun M:F/A, calls) - => [{M,[{F,A}]}] - - The built-in function erlang:fun_info/2 is extended similarly. - An additional key 'source' is provided for this function. - fun_info(Fun, source) - - for a local fun, the result is 'undefined'. - - for an external fun, the result is the abstract syntax - tree the parser returns for fun M:F/A. - - for a portable fun, the result is the abstract syntax - tree the parser returned for the fun!..... end form - it came from. 
- - The built-in functions-and-guard-predicates - erlang:is_function(Term) and erlang:is_function(Term, Arity) - accept portable funs as well as external and local ones. - - Two new built-in functions-and-guard-predicates - erlang:is_portable_function(Term) - erlang:is_portable_function(Term, Arity) - are provided, which recognise 'portable' and 'external' functions. - (This proposal will definitely need to be revised to make the - names clearer.) +============= + +Currently, Erlang has + + fun_expr -> 'fun' fun_clauses 'end' : ... + +We add + + fun_expr -> 'fun' '!' fun_clauses 'end' : ... + +and make the following restrictions: + +1. A portable fun may not contain plain funs. +2. A portable fun may not contain a call f(...) + without a module prefix unless f is a built-in function. +3. A portable fun may not contain any call of the + form M:f(...) or m:F(...) or M:F(...). +4. A portable fun may not contain any call of the + form F(...) unless F is bound in its head. +5. In a system where abstract patterns are available, + they are restricted the same way as function calls. + +The intent of these restrictions is to ensure that +every call is to a built in function, a known export +of a known module, or to some kind of fun received as +a parameter. + +The built-in function erlang:fun_info/1 is extended in +the following ways: + +1. In a {type,Type} item, Type may be 'portable'. +2. In a {module,Module} item for a portable fun, the Module + will be present, but there will in fact be no other + connection between a portable fun and any module by that name. +3. In a {name,Name} item for a portable fun, + Name will always be []. +4. None of the items specified for 'local' funs will be + returned for 'portable' funs. +5. 
{calls,List} will be returned for a portable fun, + where List is a list of {Module,Imports} pairs, where + each Module that is used in a remote call in the fun is + listed once, and the Imports are a list of {Name,Arity} + pairs as reported in *:module_info/0. This permits the + receiver of a portable fun to determine which modules + need loading and which functions they are expected to export. +6. For consistency, + erlang:fun_info(fun M:F/A, calls) + => [{M,[{F,A}]}] + +The built-in function erlang:fun_info/2 is extended similarly. +An additional key 'source' is provided for this function. + +###fun_info(Fun, source)### +- for a local fun, the result is 'undefined'. +- for an external fun, the result is the abstract syntax + tree the parser returns for fun M:F/A. +- for a portable fun, the result is the abstract syntax + tree the parser returned for the fun!..... end form + it came from. + +The built-in functions-and-guard-predicates +erlang:is_function(Term) and erlang:is_function(Term, Arity) +accept portable funs as well as external and local ones. + +Two new built-in functions-and-guard-predicates +erlang:is_portable_function(Term) and +erlang:is_portable_function(Term, Arity) +are provided, which recognise 'portable' and 'external' functions. +(This proposal will definitely need to be revised to make the +names clearer.) Motivation - - Imagine that you have an Erlang node reporting events to clients - on other nodes. Clients wish to receive only a few of the events. - One approach is for the reporter to send all events to all clients - and let the clients do the filtering. A better approach lets the - clients tell the reporter which events they want, and for it to - send just the interesting events. But how do the clients tell the - reporter which events they are interested in? - - One approach is to simply have a fixed set of event classes. - That is too coarse. 
- - Another approach would be to define an event description language, - perhaps based in some way on match specifications. - That is better, but there is currently no way to compile match - specifications (that's another thing this is for!) so matching is - slow, and it is still limited; the reporter might want to provide - summary functions that the filters can use. - - Another approach would be to send a fun, which is really the - obvious way to do it. Unfortunately, this currently will not work, - and there are reasons why it shouldn't. (For example, the body of - a local function may have been subject to inline expansion of - functions whose definitions on the receiving node are different.) - - Another approach would be to send an entire module as a binary. - This gets a bit heavyweight. It also creates a problem of managing - possibly large numbers of modules in the reporter. It is also - insecure unless the reporter does a lot of work to verify the code - for safety. Long term, it will also create version skew problems - if the client and reporter are not using exactly the same BEAM - (or other VM). - - For another example, consider storing functions in a data base. - Since a local fun is tied to a specific version of a specific - module, if you save a function one month, upgrade your system, - and restore the module next month, you cannot expect it to work. - This means that, for example, you cannot store a binary together - with a function that knows how to decode it. - - For another example, consider something like a data base that - dynamically receive matchspecs (or something like matchspecs) - and wishes to apply such a thing to millions of records. It - is easy enough to transform a matchspec to Erlang code, and - even to compile the result, but now you have a module to manage, - not a simple thing that can be cleaned up by a garbage collector. 
- - Basically, the aim of this proposal is to move Erlang one step - further along the "functions are data" functional programming way. - - However, it is necessary to do this in such a way that a process - receiving a portable fun does not have to place total trust in - the source. The receiver must be able to inspect a portable fun - as well as just call it. +========== + +Imagine that you have an Erlang node reporting events to clients +on other nodes. Clients wish to receive only a few of the events. +One approach is for the reporter to send all events to all clients +and let the clients do the filtering. A better approach lets the +clients tell the reporter which events they want, and for it to +send just the interesting events. But how do the clients tell the +reporter which events they are interested in? + +One approach is to simply have a fixed set of event classes. +That is too coarse. + +Another approach would be to define an event description language, +perhaps based in some way on match specifications. +That is better, but there is currently no way to compile match +specifications (that's another thing this is for!) so matching is +slow, and it is still limited; the reporter might want to provide +summary functions that the filters can use. + +Another approach would be to send a fun, which is really the +obvious way to do it. Unfortunately, this currently will not work, +and there are reasons why it shouldn't. (For example, the body of +a local function may have been subject to inline expansion of +functions whose definitions on the receiving node are different.) + +Another approach would be to send an entire module as a binary. +This gets a bit heavyweight. It also creates a problem of managing +possibly large numbers of modules in the reporter. It is also +insecure unless the reporter does a lot of work to verify the code +for safety. 
Long term, it will also create version skew problems +if the client and reporter are not using exactly the same BEAM +(or other VM). + +For another example, consider storing functions in a data base. +Since a local fun is tied to a specific version of a specific +module, if you save a function one month, upgrade your system, +and restore the module next month, you cannot expect it to work. +This means that, for example, you cannot store a binary together +with a function that knows how to decode it. + +For another example, consider something like a data base that +dynamically receives matchspecs (or something like matchspecs) +and wishes to apply such a thing to millions of records. It +is easy enough to transform a matchspec to Erlang code, and +even to compile the result, but now you have a module to manage, +not a simple thing that can be cleaned up by a garbage collector. + +Basically, the aim of this proposal is to move Erlang one step +further along the "functions are data" functional programming way. + +However, it is necessary to do this in such a way that a process +receiving a portable fun does not have to place total trust in +the source. The receiver must be able to inspect a portable fun +as well as just call it. Rationale +========= - It would not be a good idea to just add the portability - restrictions on top of existing fun syntax. That would break - most programs that use funs. - - Perhaps the obvious thing would be to use #fun...end, as the - sharp seems to be Erlang's "oops, we didn't think of that in the - Good Old Days" marker, much as it is in Common Lisp.
As for where it is - placed, the bang is to be thought of as post-modifying the 'fun' - keyword, not as pre-modifying the argument list, so that - fun!({a,X}) -> ... - ;({b,Y}) -> ... - end - does not have a repeated bang. - - What do you send when you send a portable fun? - - the environment, of course - - some sort of header, of course - - but what does the CODE look like? - If it is native code, you can't send a fun from a SPARC to a Mac. - If it is BEAM code, you can't send a fun to another system unless - it has exactly the same version of BEAM. - In either case, you have made life extremely hard for a wary - receiver that wants to inspect the code. - If it is the source code, then - + it can be (lazily!) compiled to BEAM (or some other VM) - + it can be interpreted - + it can be debugged - + it can be inspected - + we don't have to worry about how the compiler deals with - comprehensions -- sadly, the current compiler generates - recursive auxiliary functions, which complicates things, - and better approaches are possible - - Accordingly, the binary format for a portable fun would include - the source tree, possibly compressed as in Kistler's Juice. - The native representation would include a pointer to a block of - BEAM code and optionally a pointer to a block of native code, - but these would be filled in on first call. - - The possibility of interpretation means that there is a cheap way - to implement a prototype of this EEP: always interpret. This too - argues against any change to existing funs; we don't want to slow - them down. +It would not be a good idea to just add the portability +restrictions on top of existing fun syntax. That would break +most programs that use funs. +Perhaps the obvious thing would be to use #fun...end, as the +sharp seems to be Erlang's "oops, we didn't think of that in the +Good Old Days" marker, much as it is in Common Lisp. 
However, we +want that notation for anonymous abstract patterns, and in any +case, there is nothing iconic about the sharp in this context. +The bang is used to suggest that this is a kind of fun that you +might want to send, which indeed it is. As for where it is +placed, the bang is to be thought of as post-modifying the 'fun' +keyword, not as pre-modifying the argument list, so that -Backwards Compatibility + fun!({a,X}) -> ... + ;({b,Y}) -> ... + end - "fun!" is currently a syntax error, - so no existing code can be affected by that. +does not have a repeated bang. - As I read the documentation for erlang:fun_info/[1,2], - programmers should always have treated these functions as - open-ended. Nothing promised by the existing manual is - removed or altered, only new values provided. +What do you send when you send a portable fun? - Any existing program that called - erlang:is_portable_function/[1,2] - didn't work anyway, there being no such functions. - If a module defined is_portable_function/1 or /2, - it would not have been allowed in a guard, but would have - been allowed elsewhere; such a module could be affected. - If the compiler discovers a definition of either function - in a module, it should print a warning message, and use only - the module's version. +- the environment, of course +- some sort of header, of course +- but what does the CODE look like? +If it is native code, you can't send a fun from a SPARC to a Mac. +If it is BEAM code, you can't send a fun to another system unless +it has exactly the same version of BEAM. +In either case, you have made life extremely hard for a wary +receiver that wants to inspect the code. +If it is the source code, then ++ it can be (lazily!) 
compiled to BEAM (or some other VM) ++ it can be interpreted ++ it can be debugged ++ it can be inspected ++ we don't have to worry about how the compiler deals with + comprehensions -- sadly, the current compiler generates + recursive auxiliary functions, which complicates things, + and better approaches are possible -Reference Implementation +Accordingly, the binary format for a portable fun would include +the source tree, possibly compressed as in Kistler's Juice. +The native representation would include a pointer to a block of +BEAM code and optionally a pointer to a block of native code, +but these would be filled in on first call. - None. +The possibility of interpretation means that there is a cheap way +to implement a prototype of this EEP: always interpret. This too +argues against any change to existing funs; we don't want to slow +them down. - Long term, this needs at least two things: - (A) a fun representation that holds instructions in a binary - that is not part of any module, not unlike the classic - Interlisp-D implementation, so that such funs can be - individually garbage collected. This is desirable anyway. - (B) a compilation strategy for comprehensions that, like the - classic Pop-2 system, generates in-line loops instead of - calls to out-of-line auxiliary functions. This is - desirable anyway; it should be noticeably faster. +Backwards Compatibility +======================= + +"fun!" is currently a syntax error, +so no existing code can be affected by that. + +As I read the documentation for erlang:fun_info/[1,2], +programmers should always have treated these functions as +open-ended. Nothing promised by the existing manual is +removed or altered, only new values provided. + +Any existing program that called +erlang:is_portable_function/[1,2] +didn't work anyway, there being no such functions. 
+If a module defined is_portable_function/1 or /2, +it would not have been allowed in a guard, but would have +been allowed elsewhere; such a module could be affected. +If the compiler discovers a definition of either function +in a module, it should print a warning message, and use only +the module's version. + + + +Reference Implementation +======================== + +None. +Long term, this needs at least two things: +1. a fun representation that holds instructions in a binary + that is not part of any module, not unlike the classic + Interlisp-D implementation, so that such funs can be + individually garbage collected. This is desirable anyway. -References - - None. +2. a compilation strategy for comprehensions that, like the + classic Pop-2 system, generates in-line loops instead of + calls to out-of-line auxiliary functions. This is + desirable anyway; it should be noticeably faster. Copyright +========= - This document has been placed in the public domain. +This document has been placed in the public domain. -Local Variables: -mode: indented-text -indent-tabs-mode: nil -sentence-end-double-space: t -fill-column: 70 -coding: utf-8 -End: +[EmacsVar]: <> "Local Variables:" +[EmacsVar]: <> "mode: indented-text" +[EmacsVar]: <> "indent-tabs-mode: nil" +[EmacsVar]: <> "sentence-end-double-space: t" +[EmacsVar]: <> "fill-column: 70" +[EmacsVar]: <> "coding: utf-8" +[EmacsVar]: <> "End:" diff --git a/eeps/eep-0016.md b/eeps/eep-0016.md index 81f0f39..94b7aea 100644 --- a/eeps/eep-0016.md +++ b/eeps/eep-0016.md @@ -1,214 +1,224 @@ -EEP: 16 -Title: is_between/3 -Version: $Revision$ -Last-Modified: $Date$ -Author: Richard A. O'Keefe -Status: Draft -Type: Standards Track -Erlang-Version: R12B-4 -Content-Type: text/plain -Created: 23-Jul-2008 -Post-History: + Author: Richard A. 
O'Keefe + Status: Draft + Type: Standards Track + Erlang-Version: R12B-4 + Created: 23-Jul-2008 + Post-History: +**** +EEP 16: is_between/3 +---- Abstract +======== - There should be a new built in function for guards, +There should be a new built in function for guards, is_between(Term, Lower_Bound, Upper_Bound) - which succeeds when Term, Lower_Bound, and Upper_Bound - are all integers, and Lower_Bound =< Term =< Upper_Bound. - +which succeeds when `Term`, `Lower_Bound`, and `Upper_Bound` +are all integers, and `Lower_Bound =< Term =< Upper_Bound`. Specification +============= - A new guard BIF is added. +A new guard BIF is added. is_between(Term, LB, UB) - In expression use, if LB or UB is not an integer, - a badarith exception is thrown, just like an attempt to - do remainder or bitwise operations on non-integer arguments. - In guard use, that exception becomes failure. +In expression use, if LB or UB is not an integer, +a badarith exception is thrown, just like an attempt to +do remainder or bitwise operations on non-integer arguments. +In guard use, that exception becomes failure. + +This is a type test which succeeds (or returns true) if +Term is an integer and lies between LB and UB inclusive, +and fails (or returns false) for other values of Term. + +As an expression, it has the same effect as + + ( X = Term, Y = LB, Z = UB, + Y bor Z, + ( is_integer(X), X >= Y, X =< Z ) + ) + +where X, Y, and Z are new variables that are not exported. + +In particular, + + is_between(tom, dick, harry) - This is a type test which succeeds (or returns true) if - Term is an integer and lies between LB and UB inclusive, - and fails (or returns false) for other values of Term. +should raise an exception, not return false, as `is_integer(Term)` +is only tested after LB and UB have been found to be integers.
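As an illustration only, the expression-context behaviour specified here can be sketched as an ordinary Erlang function. The module name `between_sketch` is an assumption made for this example. The proposal itself calls for a guard BIF, which a user-defined function cannot provide, so the sketch covers expression use only.

```erlang
-module(between_sketch).
-export([is_between/3]).

%% Sketch of the expression-context semantics only; not the proposed BIF.
%% The bounds are checked first, so non-integer bounds raise badarith
%% even when Term is not an integer either.
is_between(Term, LB, UB) ->
    _ = LB bor UB,   % raises badarith unless both bounds are integers
    is_integer(Term) andalso Term >= LB andalso Term =< UB.
```

With this sketch, `between_sketch:is_between(1.5, 0, 255)` returns `false`, while a call whose bounds are atoms raises `badarith` before the first argument is examined.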
- As an expression, it has the same effect as - ( X = Term, Y = LB, Z = UB, - Y bor Z, - ( is_integer(X), X >= Y, X =< Z ) - ) +As a guard test, it has the same effect as - where X, Y, and Z are new variables that are not exported. - In particular, - is_integer(tom, dick, harry) - should raise an exception, not return false, as is_integer(Term) - is only tested after LB and UB have been found to be integers. + ( X = Term, Y = LB, Z = UB, + is_integer(Y), is_integer(Z), is_integer(X), + X >= Y, X =< Z + ) - As a guard test, it has the same effect as - ( X = Term, Y = LB, Z = UB, - is_integer(Y), is_integer(Z), is_integer(X), - X >= Y, X =< Z - ) - would have, were that allowed. However, it admits a much - more efficient implementation. +would have, were that allowed. However, it admits a much +more efficient implementation. Motivation +========== - Currently some people test whether a variable is a byte thus: - - -define(is_byte(X), (X >= 0 andalso X =< 255)). +Currently some people test whether a variable is a byte thus: - This is actual current practice. However, it fails to check - that X is an integer, so ?is_byte(1.5) succeeds, it may - evaluate X twice, so ?is_byte((Pid ! 0)) will send two messages, - not the expected one, and the current Erlang compiler generates - noticeably worse code in guards for 'andalso' and 'orelse' than - it does for ',' and ';'. + -define(is_byte(X), (X >= 0 andalso X =< 255)). - It is also useful to test whether a subscript is in range, +This is actual current practice. However, it fails to check +that `X` is an integer, so `?is_byte(1.5)` succeeds, it may +evaluate `X` twice, so `?is_byte((Pid ! 0))` will send two messages, +not the expected one, and the current Erlang compiler generates +noticeably worse code in guards for 'andalso' and 'orelse' than +it does for ',' and ';'. - -define(in_range(X, T), (X >= 1 andalso X =< size(T))). +It is also useful to test whether a subscript is in range, - which has similar problems. 
+ -define(in_range(X, T), (X >= 1 andalso X =< size(T))). - Using is_between, we can replace these definitions with +which has similar problems. - -define(is_byte(X), is_between(X, 0, 255)). - -define(in_range(X, T), is_between(X, 1, size(T))). +Using `is_between`, we can replace these definitions with - which are free of those problems + -define(is_byte(X), is_between(X, 0, 255)). + -define(in_range(X, T), is_between(X, 1, size(T))). + +which are free of those problems. Rationale +========= + +One alternative to this design would be to follow the example +of Common Lisp (and the even earlier example of the systems +programming language on HP 3000s) and allow + + E1 =< E2 =< E3 % (<= E1 E2 E3) in Lisp - One alternative to this design would be to follow the example - of Common Lisp (and the even earlier example of the systems - programming language on HP 3000s) and allow - E1 =< E2 =< E3 % (<= E1 E2 E3) in Lisp - (and possibly also - E1 =< E2 < E3 - E1 < E2 =< E3 - E1 < E2 < E3) % (< E1 E2 E3) in Lisp - as guards and expressions, evaluating each expression exactl - once. I am very fond of this syntax and would be pleased to - see it. This would resolve the double evaluation of E2, the - possible non-evaluation of E3, and the inefficiency of 'andalso'. - However, it would not address the problem that a byte or an - index is not just a NUMBER in a certain range, but an INTEGER. - If Erlang had multiple comparison syntax, there would still be - a use for is_between/3. +(and possibly also + + E1 =< E2 < E3 + E1 < E2 =< E3 + E1 < E2 < E3) % (< E1 E2 E3) in Lisp + +as guards and expressions, evaluating each expression exactly +once. I am very fond of this syntax and would be pleased to +see it. This would resolve the double evaluation of `E2`, the +possible non-evaluation of `E3`, and the inefficiency of 'andalso'. +However, it would not address the problem that a byte or an +index is not just a NUMBER in a certain range, but an INTEGER.
+If Erlang had multiple comparison syntax, there would still be +a use for `is_between/3`. Backwards Compatibility +======================= - Code that defines a function named is_between/3 will be - affected. Since the Erlang compiler parses an entire - module before semantic analysis, it's easy to - - check for a definition of is_between/3 - - warn if one is present - - disable the new built-in in such a case. +Code that defines a function named `is_between/3` will be +affected. Since the Erlang compiler parses an entire +module before semantic analysis, it's easy to +- check for a definition of `is_between/3` +- warn if one is present +- disable the new built-in in such a case. Reference Implementation - - There is none. However, we can sketch one. - Two new BEAM instructions are required: - - {test,is_between,Lbl,[Src1,Src2,Src3]} - {bif,is_between,?,[Src1,Src2,Src3],Dst} - - The test does - if Src2 is not an integer, goto Lbl. - if Src3 is not an integer, goto Lbl. - if Src1 is not an integer, goto Lbl. - if Src1 < Src2, goto Lbl. - if Src3 < Src1, goto Lbl. - - The bif does - if Src2 is not an integer, except! - if Src3 is not an integer, except! - if Src1 is not an integer - or Src1 < Src2 - or Src3 < Src1 - then move 'false' to Dst - else move 'true' to Dst. - - Nothing here is fundamentally new, and only my unfamiliarity with - how to add instructions to the emulator prevents me doing it. And - my total ignorance of how to tell HiPE about them! - - There might be some point in having variants of these instructions - for use when Src2 and Src3 are integer literals; I would certainly - expect HiPE to elide redundant tests here. - - The compiler would simply recognise is_between/3 and emit the - appropriate BEAM rather like it recognises is_atom/1. - My ignorance of how to extend the emulator is exceeded by my - ignorance of how to extend the compiler. Certainly we'd need - ... - is_bif(erlang, is_between, 3) -> true; - ... 
- is_guard_bif(erlang, is_between, 3) -> true; - ... - is_pure(erlang, is_between, 3) -> true; - ... - (but NOT an is_safe rule) in erl_bifs.erl. Or would we? I've - not been able to figure out where is_guard_bif/3 is called. - There will need to be a new entry in genop.tab as well. - Ohhh, erl_internal.erl is in .../stdlib, not .../compiler. - OK, so a couple of functions in erl_internal.erl need to be patched - to recognise is_between/3; what needs changing to generate BEAM? - The annoying thing is that if I knew my way around the compiler, - it would be easier to add this than to write it up. - - Here's some text to go in the documentation: - ---------------------------------------------------------------- - is_integer(Term, LB, UB) -> bool() - - Types: - Term = term() - LB = integer() - UB = integer() - - Returns true if Term is an integer lying between LB - and UB inclusive (LB =< Term, Term =< UB); otherwise - returns false. In an expression, raises an exception - if LB or UB is not an integer. Having UB < LB is not - an error. - - Allowed in guard tests. - ---------------------------------------------------------------- - - - -References - - None. +======================== + +There is none. However, we can sketch one. +Two new BEAM instructions are required: + + {test,is_between,Lbl,[Src1,Src2,Src3]} + {bif,is_between,?,[Src1,Src2,Src3],Dst} + +The test does + + if Src2 is not an integer, goto Lbl. + if Src3 is not an integer, goto Lbl. + if Src1 is not an integer, goto Lbl. + if Src1 < Src2, goto Lbl. + if Src3 < Src1, goto Lbl. + +The bif does + + if Src2 is not an integer, except! + if Src3 is not an integer, except! + if Src1 is not an integer + or Src1 < Src2 + or Src3 < Src1 + then move 'false' to Dst + else move 'true' to Dst. + +Nothing here is fundamentally new, and only my unfamiliarity with +how to add instructions to the emulator prevents me doing it. And +my total ignorance of how to tell HiPE about them! 
+ +There might be some point in having variants of these instructions +for use when Src2 and Src3 are integer literals; I would certainly +expect HiPE to elide redundant tests here. + +The compiler would simply recognise `is_between/3` and emit the +appropriate BEAM rather like it recognises `is_atom/1`. +My ignorance of how to extend the emulator is exceeded by my +ignorance of how to extend the compiler. Certainly we'd need + + ... + is_bif(erlang, is_between, 3) -> true; + ... + is_guard_bif(erlang, is_between, 3) -> true; + ... + is_pure(erlang, is_between, 3) -> true; + ... + +(but NOT an `is_safe` rule) in `erl_bifs.erl`. Or would we? I've +not been able to figure out where `is_guard_bif/3` is called. +There will need to be a new entry in genop.tab as well. +Ohhh, `erl_internal.erl` is in `.../stdlib`, not `.../compiler`. +OK, so a couple of functions in `erl_internal.erl` need to be patched +to recognise `is_between/3`; what needs changing to generate BEAM? +The annoying thing is that if I knew my way around the compiler, +it would be easier to add this than to write it up. + +Here's some text to go in the documentation: +> is_between(Term, LB, UB) -> bool() +> +> Types: +> Term = term() +> LB = integer() +> UB = integer() +> +> Returns true if Term is an integer lying between LB +> and UB inclusive (LB =< Term, Term =< UB); otherwise +> returns false. In an expression, raises an exception +> if LB or UB is not an integer. Having UB < LB is not +> an error. +> +> Allowed in guard tests. Copyright +========= - This document has been placed in the public domain. +This document has been placed in the public domain.
-Local Variables: -mode: indented-text -indent-tabs-mode: nil -sentence-end-double-space: t -fill-column: 70 -coding: utf-8 -End: +[EmacsVar]: <> "Local Variables:" +[EmacsVar]: <> "mode: indented-text" +[EmacsVar]: <> "indent-tabs-mode: nil" +[EmacsVar]: <> "sentence-end-double-space: t" +[EmacsVar]: <> "fill-column: 70" +[EmacsVar]: <> "coding: utf-8" +[EmacsVar]: <> "End:" diff --git a/eeps/eep-0017.md b/eeps/eep-0017.md index 6bdb5c3..f57116d 100644 --- a/eeps/eep-0017.md +++ b/eeps/eep-0017.md @@ -1,173 +1,179 @@ -EEP: 17 -Title: Fix andalso and orelse -Version: $Revision$ -Last-Modified: $Date$ -Author: Richard A. O'Keefe -Status: Draft -Type: Standards Track -Erlang-Version: R12B-4 -Content-Type: text/plain -Created: 23-Jul-2008 -Post-History: + Author: Richard A. O'Keefe + Status: Draft + Type: Standards Track + Erlang-Version: R12B-4 + Created: 23-Jul-2008 + Post-History: +**** +EEP 17: Fix andalso and orelse +---- Abstract +======== - Erlang 5.1 added the ability to use 'andalso', 'orelse', - 'and', and 'or' in guards. However, the semantics for - 'andalso' and 'orelse' differs from that in other related - languages, causing confusion and inefficiency. +Erlang 5.1 added the ability to use 'andalso', 'orelse', +'and', and 'or' in guards. However, the semantics for +'andalso' and 'orelse' differs from that in other related +languages, causing confusion and inefficiency. - I propose making 'andalso' and 'orelse' work like Lisp - AND and OR respectively. +I propose making 'andalso' and 'orelse' work like Lisp +AND and OR respectively. 
Specification +============= - Currently, (E1 andalso E2) as an expression acts like +Currently, (E1 andalso E2) as an expression acts like - case E1 - of false -> false - ; true -> case E2 - of false -> false - ; true -> true - end - end + case E1 + of false -> false + ; true -> case E2 + of false -> false + ; true -> true + end + end - except that in my tests the former raises {badarg,NonBool} - exceptions and the latter raises {case_clause,NonBool} ones. +except that in my tests the former raises `{badarg,NonBool}` +exceptions and the latter raises `{case_clause,NonBool}` ones. - This should be changed to +This should be changed to - case E1 - of false -> false - ; true -> E2 - end. + case E1 + of false -> false + ; true -> E2 + end. - Currently, (E1 orelse E2) as an expression acts like +Currently, (E1 orelse E2) as an expression acts like - case E1 - of true -> true - ; false -> case E2 - of true -> true - ; false -> false - end - end + case E1 + of true -> true + ; false -> case E2 + of true -> true + ; false -> false + end + end - except that in my tests the former raises {badarg,NonBool} - exceptions and the latter raises {case_clause,NonBool} ones. +except that in my tests the former raises `{badarg,NonBool}` +exceptions and the latter raises `{case_clause,NonBool}` ones. - This should be changed to +This should be changed to - case E1 - of true -> true - ; false -> E2 - end + case E1 + of true -> true + ; false -> E2 + end - There is apparently a folklore belief that using 'andalso' (or - 'orelse') in a guard will somehow give you better code than using - ',' (or ';'). On the contrary, you will get rather worse code. - See "Motivation" for an example. This should change. +There is apparently a folklore belief that using 'andalso' (or +'orelse') in a guard will somehow give you better code than using +',' (or ';'). On the contrary, you will get rather worse code. +See "Motivation" for an example. This should change. 
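Purely as an illustration, the proposed expression semantics can be sketched as ordinary functions. The module and function names (`andalso_sketch`, `and_also/2`, `or_else/2`, `member/2`) are assumptions made for this example, and the right operand is passed as a zero-arity fun to model its delayed evaluation.

```erlang
-module(andalso_sketch).
-export([and_also/2, or_else/2, member/2]).

%% Proposed (E1 andalso E2): the left operand must be a boolean; the
%% value of the right operand is returned without being checked.
and_also(E1, E2) when is_function(E2, 0) ->
    case E1 of
        false -> false;
        true  -> E2()
    end.

%% Proposed (E1 orelse E2), following the same pattern.
or_else(E1, E2) when is_function(E2, 0) ->
    case E1 of
        true  -> true;
        false -> E2()
    end.

%% A membership test written with an explicit case expression, so the
%% recursive call sits in tail position, as it would under the
%% proposed 'orelse'. Exact equality (=:=) is an assumption here.
member(_, []) -> false;
member(X, [H | T]) ->
    case X =:= H of
        true  -> true;
        false -> member(X, T)
    end.
```

Under this sketch, `andalso_sketch:and_also(1 >= 0, fun() -> 1 end)` returns `1` instead of raising, which is the behaviour the proposal asks of `X >= 0 andalso X` when `X = 1`.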
guard ::= gconj {';' gconj}* gconj ::= gtest {',' gtest}* gtest ::= '(' guard ')' | ... - First, we allow ',' and ';' to nest, using parentheses. - Second, we rule that as outer operators in a guard, the - only difference between ',' and 'andalso' is precedence, - and the only difference between ';' and 'orelse' is also - precedence. In a guard test like - is_atom(X andalso Y) - the andalso cannot be replaced by ',', but whenever one - COULD be replaced by the other, they should have the same - effect. +First, we allow ',' and ';' to nest, using parentheses. +Second, we rule that as outer operators in a guard, the +only difference between ',' and 'andalso' is precedence, +and the only difference between ';' and 'orelse' is also +precedence. In a guard test like + + is_atom(X andalso Y) + +the 'andalso' cannot be replaced by ',', but whenever one +COULD be replaced by the other, they should have the same effect. Motivation +========== - Cultural consistency. +### Cultural consistency ### - Common Lisp: +* Common Lisp (defun member-p (X Xs) (and (consp Xs) (or (equal X (first Xs)) (member-p X (rest Xs))))) - Scheme: +* Scheme (define (member? X Xs) (and (pair? Xs) (or (equal? X (car Xs)) (member? X (cdr Xs))))) - Standard ML: +* Standard ML fun is_member(x, xs) = not (null xs) andalso ( x = hd xs orelse is_member(x, tl xs)) - Haskell: +* Haskell x `is_member_of` xs = not (null xs) && (x == head xs || x `is_member_of` tail xs) - Dylan: +* Dylan - I don't know Dylan syntax well enough to finish this - example, but I do know that '&' and '|' in Dylan are exactly - like AND and OR in Common Lisp except for syntax. (They are - documented as allowing the right operand to return anything, - including multiple values.) + I don't know Dylan syntax well enough to finish this + example, but I do know that '&' and '|' in Dylan are exactly + like AND and OR in Common Lisp except for syntax. 
(They are + documented as allowing the right operand to return anything, + including multiple values.) - Python: +* Python def is_member(x, xs): n = len(xs) return n > 0 and (x == xs[0] or is_member(x, xs[1:n])) - I'm not perfectly sure about this, but the reference manual - is very explicit that the second operand of 'and' or 'or' can - be anything. + I'm not perfectly sure about this, but the reference manual + is very explicit that the second operand of 'and' or 'or' can + be anything. + +* Smalltalk + + Doing this example this way in Smalltalk requires considerable + pain in going against the grain of Smalltalk, however the + 'and:' and 'or:' selectors in Smalltalk DO check that their + first argument is Boolean and DON'T check anything about (the + result of) their second argument. + +In all of these, the "and" and "or" operations work exactly the +same way, and in the languages whose implementations support tail +recursion (Common Lisp, Scheme, Standard ML, Haskell), the +function shown above is tail recursive. (I could have added more +languages to the list.) - Smalltalk: +Erlang stands out. The behaviour of 'andalso' is surprising, and +the fact that 'andalso' and 'orelse' block tail recursion is quite +astonishing. I am all in favour of giving programmers shocks that +teach them something useful about programming, but this one is not +a useful lesson. Testing both arguments of 'and' and 'or' makes +sense, because the code executed for those operators always GETS +the values of both operands. But 'andalso' and 'orelse' only test +their second operand SOME of the time. - Doing this example this way in Smalltalk requires considerable - pain in going against the grain of Smalltalk, however the - 'and:' and 'or:' selectors in Smalltalk DO check that their - first argument is Boolean and DON'T check anything about (the - result of) their second argument. 
+ X = 1, X >= 0 andalso X % checked error + X = 1, X < 0 andalso X % unchecked error - In all of these, the "and" and "or" operations work exactly the - same way, and in the languages whose implementations support tail - recursion (Common Lisp, Scheme, Standard ML, Haskell), the - function shown above is tail recursive. (I could have added more - languages to the list.) +There doesn't seem to be much point in checking SOME of the time, +especially when it does something as dramatic as blocking tail +recursion. - Erlang stands out. The behaviour of 'andalso' is surprising, and - the fact that 'andalso' and 'orelse' block tail recursion is quite - astonishing. I am all in favour of giving programmers shocks that - teach them something useful about programming, but this one is not - a useful lesson. Testing both arguments of 'and' and 'or' makes - sense, because the code executed for those operators always GETS - the values of both operands. But 'andalso' and 'orelse' only test - their second operand SOME of the time. - X = 1, X >= 0 andalso X % checked error - X = 1, X < 0 andalso X % unchecked error - There doesn't seem to be much point in checking SOME of the time, - especially when it does something as dramatic as blocking tail - recursion. +### Guards code ### - As for guards, here is a small example. +As for guards, here is a small example: - f(X) when X >= 0, X < 1 -> math:sqrt(X). + f(X) when X >= 0, X < 1 -> math:sqrt(X). - This compiles to the following rather obvious code: +This compiles to the following rather obvious code: {function, f, 1, 2}. {label,1}. @@ -177,13 +183,13 @@ Motivation {test,is_lt,{f,1},[{x,0},{integer,1}]}. {call_ext_only,1,{extfunc,math,sqrt,1}}. - Some people expect 'andalso' to do as well or better. - I expected it to do the same, and this EEP requires it to. - Here's the source code: +Some people expect 'andalso' to do as well or better. +I expected it to do the same, and this EEP requires it to.
+Here's the source code: - g(X) when X >= 0 andalso X < 1 -> math:sqrt(X). + g(X) when X >= 0 andalso X < 1 -> math:sqrt(X). - and here are the BEAM instructions: +and here are the BEAM instructions: {function, g, 1, 4}. {label,3}. @@ -205,89 +211,86 @@ Motivation {deallocate,1}. {jump,{f,3}}. - It not only does a lot more work, it even allocates a stack - frame that the traditional code does not. +It not only does a lot more work, it even allocates a stack +frame that the traditional code does not. Rationale +========= - There are several ways to deal with the surprising behaviour - of 'andalso' and 'orelse'. +There are several ways to deal with the surprising behaviour +of 'andalso' and 'orelse'. - 0. Leave things the way they are. +0. Leave things the way they are. - The manual should have lots of warnings added, - saying not to use these operators, because they block - tail recursion and are inefficient in guards. + The manual should have lots of warnings added, + saying not to use these operators, because they block + tail recursion and are inefficient in guards. - It is reasonable to address other issues first, but it just - will not do long term. You don't have to rush around - bandaging everyone you meet, but you shouldn't build pit - traps in front of them either. + It is reasonable to address other issues first, but it just + will not do long term. You don't have to rush around + bandaging everyone you meet, but you shouldn't build pit + traps in front of them either. - 1. Remove them from the language. +1. Remove them from the language. - I would prefer this. And that goes double for 'and' and 'or', - which seem to be completely pointless, as well as confusing. - I do not think this would be practical politics. + I would prefer this. And that goes double for 'and' and 'or', + which seem to be completely pointless, as well as confusing. + I do not think this would be practical politics. - 2. Add new operators with sensible semantics. +2. 
Add new operators with sensible semantics. - But what would we call them? 'and' and 'or' are taken, - and both '|' and '||' are used for something else. Above - all, 'andalso' and 'orelse' would still be there, and still - be surprising (in a bad way). We have too many ways to - spell "or" as it is. + But what would we call them? 'and' and 'or' are taken, + and both '|' and '||' are used for something else. Above + all, 'andalso' and 'orelse' would still be there, and still + be surprising (in a bad way). We have too many ways to + spell "or" as it is. - 3. Fix them. +3. Fix them. - As for the recommendation that ',' and ';' should nest, - I want Erlang to be simple to think about. If 'andalso' and 'orelse' - are to act like ',' and ';' in guards -- which I've argued - above -- then clearly ',' and ';' should act like 'andalso' - and 'orelse' in guards. +As for the recommendation that ',' and ';' should nest, +I want Erlang to be simple to think about. If 'andalso' and 'orelse' +are to act like ',' and ';' in guards -- which I've argued +above -- then clearly ',' and ';' should act like 'andalso' +and 'orelse' in guards. Backwards Compatibility +======================= - Any code that ran without raising exceptions will continue - to produce the same results, except for running faster. +Any code that ran without raising exceptions will continue +to produce the same results, except for running faster. - Code that did raise exceptions may raise different exceptions - elsewhere later, or may quietly complete in unexpected ways. - I believe it to be unlikely that anyone deliberately relied - on (E1 andalso 0) raising an exception. +Code that did raise exceptions may raise different exceptions +elsewhere later, or may quietly complete in unexpected ways. +I believe it to be unlikely that anyone deliberately relied +on (E1 andalso 0) raising an exception. - Code that was previously broken because these operators have - such surprising behaviour will now work in more cases.
+Code that was previously broken because these operators have +such surprising behaviour will now work in more cases. Reference Implementation +======================== - None. - - - - -References - - None. +None. Copyright +========= - This document has been placed in the public domain. +This document has been placed in the public domain. -Local Variables: -mode: indented-text -indent-tabs-mode: nil -sentence-end-double-space: t -fill-column: 70 -coding: utf-8 -End: +[EmacsVar]: <> "Local Variables:" +[EmacsVar]: <> "mode: indented-text" +[EmacsVar]: <> "indent-tabs-mode: nil" +[EmacsVar]: <> "sentence-end-double-space: t" +[EmacsVar]: <> "fill-column: 70" +[EmacsVar]: <> "coding: utf-8" +[EmacsVar]: <> "End:" diff --git a/eeps/eep-0018.md b/eeps/eep-0018.md index f446934..7d59232 100644 --- a/eeps/eep-0018.md +++ b/eeps/eep-0018.md @@ -1,35 +1,34 @@ -EEP: 18 -Title: JSON bifs -Version: $Revision$ -Last-Modified: $Date$ -Author: Richard A. O'Keefe -Status: Draft -Type: Standards Track -Erlang-Version: R12B-4 -Content-Type: text/plain -Created: 28-Jul-2008 -Post-History: + Author: Richard A. O'Keefe + Status: Draft + Type: Standards Track + Erlang-Version: R12B-4 + Created: 28-Jul-2008 + Post-History: +**** +EEP 18: JSON bifs +---- Abstract +======== - According to the JSON web site [1], - "JSON (JavaScript Object Notation) is a lightweight - data-interchange format. It is easy for humans to read and write. - It is easy for machines to parse and generate." +According to the [JSON web site][1], +"JSON (JavaScript Object Notation) is a lightweight +data-interchange format. It is easy for humans to read and write. +It is easy for machines to parse and generate." - JSON is specified by RFC 4627 [2], which defines a Media Type - application/json. +JSON is specified by [RFC 4627][2], which defines a Media Type +application/json. - There are JSON libraries for a wide range of languages, so it is a - useful format. 
CouchDB [6] uses JSON as its storage format and in - its RESTful interface; it offers an alternative to Mnesia for some - projects, and is accessible from many more languages. There are - already JSON bindings for Erlang, such as the rfc4627 [7] module - from LShift, but on the 24th of July 2008, Joe Armstrong suggested - that it would be worth having built in functions to convert Erlang - terms to and from the JSON format. +There are JSON libraries for a wide range of languages, so it is a +useful format. [CouchDB][6] [uses][6b] JSON as its storage format and in +its RESTful interface; it offers an alternative to Mnesia for some +projects, and is accessible from many more languages. There are +already JSON bindings for Erlang, such as the [rfc4627][7] module +from LShift, but on the 24th of July 2008, Joe Armstrong suggested +that it would be worth having built in functions to convert Erlang +terms to and from the JSON format. term_to_json -- convert a term to JSON form json_to_term -- convert a JSON form to Erlang @@ -37,924 +36,969 @@ Abstract Specification +============= - Three new types are added to the vocabulary of well known - types to be used in edoc. +Three new types are added to the vocabulary of well known +types to be used in edoc. + + @type json_label() = atom() + binary(). + @type json(L, N) = null + false + true + + N % some kind of number + + [{}] % empty "object" + + [{L, json(L,N)}] % non-empty "object" + + [json(L, N)]. % "array" + | [json(L, N)] | tuple({L, json(L, N)}). + @type json() = json(json_label(), number()). + + + +### New functions ### + +Four new functions are added to the erlang: module. - @type json_label() = atom() + binary(). - @type json(L, N) = null + false + true - + N % some kind of number - + [{}] % empty "object" - + [{L, json(L,N)}] % non-empty "object" - + [json(L, N)]. % "array" - | [json(L, N)] | tuple({L, json(L, N)}). - @type json() = json(json_label(), number()). - Four new functions are added to the erlang: module. 
erlang:json_to_term(IO_Data) -> json() erlang:json_to_term(IO_Data, Option_List) -> json() - Types: - IO_Data = iodata() - Option_List = [Option] - Option = {encoding,atom()} - | {float,bool()} - | {label,binary|existing_atom|atom} - - json_to_term(X) is equivalent to json_to_term(X, []). - - The IO_Data implies a sequence of bytes. - - The encoding option says what character encoding to use for - converting those bytes to characters. The default encoding - is UTF-8. All encodings supported elsewhere in Erlang should - be supported here. The JSON specification mentions - auto-detection of the encoding as a possibility; the ones - that can be detected include UTF-32-BE, UTF-32-LE, - UTF-16-BE, UTF-16-LE, UTF-8, and UTF-EBDIC. The encoding - 'auto' requests auto-detection. - - The {float,true} option says to convert all JSON numbers to - Erlang floats, even if they look like integers. - With this option, the result has type json(L, float()). - - The {float,false} option says to convert integers to integers; - it is the default. - With this option, the result has type json(L, number()). - - The {label,binary} option says to convert all JSON strings - to Erlang binaries, even if they are keys in key:value pairs. - With this option, the result has type json(binary(), N). - This is the default. - - The {label,atom} option says to convert keys to atoms if - possible, leaving other strings as binaries. - With this option, the result has type json(json_label(), N). - - The {label,existing_atom} option says to convert keys to - atoms if the atoms already exist, leaving other keys as - binaries. All other strings remain binaries too. - With this option, the result has type json(json_label(), N). - - Other options may be added in the future. - - The mapping from JSON to Erlang is described below in this - section. 
An argument that is not a well formed IO_Data, - or that cannot be decoded, or that when decoded does not - follow the rules of JSON syntax, results in a badarg - exception. [It would be nice if there were Erlang-wide - conventions for distinguishing these cases.] +Types: - erlang:term_to_json(JSON) -> binary() - erlang:term_to_json(JSON, Option_List) -> Binary() + IO_Data = iodata() + Option_List = [Option] + Option = {encoding,atom()} + | {float,bool()} + | {label,binary|existing_atom|atom} - Types: - JSON = json() - Option_List = [Option] - Option = {encoding,atom()} - | {space,int()} - | space - | {indent,int()} - | indent - - This is a function for producing portable JSON. - It is not intended as a means for encoding arbitrary Erlang - terms. Terms that do not fit into the mapping scheme - described below in this section result in a badarg exception. - The JSON RFC says that "The names within an object SHOULD be - unique." JSON terms that violate this should also result in - a badarg exception. - - term_to_json(X) is equivalent to term_to_json(X, []). - - Converting Erlang terms to JSON results in a (logical) - character sequence, which is encoded as a sequence of - bytes, which is returned as a binary. The default encoding - is UTF-8; this may be overridden by the encoding option. - Any encoding supported elsewhere in Erlang should be - supported here. - - There are two options for controlling white space. - By default, none is generated. - - {space,N}, where N is a non-negative integer, says to - add N spaces after each colon and comma. - 'space' is equivalent to {space,1}. - No other space is ever inserted. - - {indent,N}, where N is a non-negative integer, says - to add a line break and some indentation after each - comma. The indentation is N spaces for each enclosing - [] or {}. Note that this still does not result in any - other spaces being added; in particular ] and } will - not appear at the beginning of lines. - 'indent' is equivalent to {indent,1}. 
- - Other options may be added in the future. - - Converting JSON to Erlang. - - The keywords null, false, and true are converted to the - corresponding Erlang atoms. No other complete JSON forms - are converted to atoms. - - A number is converted to an Erlang float if - - it contains a decimal point, or - - it contains an exponent, or - - it is a negative zero, or - - the option {float,true} was passed. - A JSON number that looks like an integer other than -0 - will be converted to an Erlang integer unless {float,true} - was provided. - - When occurring as a label in an "object", a string may on - explicit request be converted to an Erlang atom, if possible. - Otherwise, a string is converted to a UTF-8-encoded binary, - whatever the encoding used by the data source. - An empty string is converted to an empty binary. - - A sequence is converted to an Erlang list. The elements have - the same order in the list as in the original sequence. - - A non-empty "object" is converted to a list of {Key,Value} - pairs suitable for processing with the 'proplists' module. - Note that proplists: does not require that keys be atoms. - An "object" with no key:value pairs is converted to - the list [{}], preserving the invariant that an object - is always represented by a non-empty list of tuples. - The proplists: module will correctly view [{}] as holding - no keys. - - Keys in the JSON form are always strings. A Key is converted - to an Erlang atom if and only if - - {label,atom} was specified or - {label,existing_atom} was specified and a suitable atom - already existed; and - - every character in the JSON string can be held in an atom. - Currently, only names made of Latin-1 characters can be turned - into atoms. Empty keys, "", are converted to empty atoms ''. - Keys are otherwise converted to binaries, using the UTF-8 - encoding, whatever the original encoding was. 
- - This means that if you read and convert a JSON term now, - and save the binary somewhere, then read and convert it in - a later fully-Unicode Erlang, you will find the - representations different. However, the order of the pairs - in a JSON "object" has no significance, and an implementation - of this specification is free to report them in any order it - likes (as given, reversed, sorted, sorted by some hash, you - name it). Within any particular Erlang version, this - conversion is a pure function, but different Erlang releases - may change the order of pairs, so you cannot expect exactly - the same term from release to release anyway. - - See the rationale for reasons why we do not convert to - a canonical form, for example by sorting. - - In the spirit of "be generous in what you accept, strict in - what you produce", it might be a good idea to accept unquoted - labels in the input. You can't accept just any old junk, - but allowing Javascript [8] IdentifierNames would make sense. - - IdentifierName = IdentifierStart IdentifierPart*. - IdentifierStart = UnicodeLetter | '$' | '_' | - '\u' HexDigit*4 - IdentifierPart = IdentifierStart | UnicodeCombiningMark | - UnicodeDigit | UnicodeConnectorPunctuation - - There are apparently JSON generators out there that do this, - so it would add value, but it is not _required_. - - Converting Erlang to JSON. - - The atoms null, false, and true are converted to the - corresponding JSON keywords. No other Erlang atoms are - allowed. - - An Erlang integer is converted to a JSON integer. - An Erlang float is converted to a JSON float, as precisely - as practical. An Erlang float which has an integral value - is written in such a way that it will read back as a float; - suitable methods include suffixing ".0" or "e0". - - An Erlang binary that is the UTF-8 representation of some - Unicode string is converted to a string. No other binaries - are allowed. 
- - An Erlang list all of whose elements are tuples is converted - to a JSON "object". If the list is [{}] it is converted to - "{}", otherwise all the tuples must have two elements and - the first must be an atom or binary; other tuples are not - allowed. For each {Key,Value} pair, the key must be an atom - or a binary that is the UTF-8 representation of some Unicode - string; the key is converted to a JSON string. The value must - be a JSON term. The order of the key:value pairs in the - output is the same as the order of the {Key,Value} pairs - in the list. A list with two equivalent keys is not allowed. - Two binaries, or two atoms, are equivalent iff they are equal. - An atom and a binary are equivalent if they would convert to - the same JSON string. - - Erlang tuples are not allowed except as elements of lists - that will be converted to JSON "objects". - No other tuples are allowed. - - An Erlang proper list whose elements are not tuples is - converted to a JSON sequence by converting its elements in - natural order. - - An improper list is not allowed. - - Other Erlang terms are not allowed. If you want to "tunnel" - other Erlang terms through JSON, fine, but it is entirely up - to you to do whatever conversion you want. +`json_to_term(X)` is equivalent to `json_to_term(X, [])`. +The `IO_Data` implies a sequence of bytes. +The encoding option says what character encoding to use for +converting those bytes to characters. The default encoding +is UTF-8. All encodings supported elsewhere in Erlang should +be supported here. The JSON specification mentions +auto-detection of the encoding as a possibility; the ones +that can be detected include UTF-32-BE, UTF-32-LE, +UTF-16-BE, UTF-16-LE, UTF-8, and UTF-EBDIC. The encoding +'auto' requests auto-detection. -Motivation +The `{float,true}` option says to convert all JSON numbers to +Erlang floats, even if they look like integers. +With this option, the result has type `json(L, float())`. 
- As Joe Armstrong put it in his message, - "JSON seems to be ubiquitous". - It should not only be supported, it should be supported - simply, efficiently, and reliably. +The `{float,false}` option says to convert integers to integers; +it is the default. With this option, the result has type +`json(L, number())`. - As noted above, http://www.ietf.org/rfc/rfc4627.txt - defines an application/json Media Type that Erlang - should be able to handle "out of the box". +The `{label,binary}` option says to convert all JSON strings +to Erlang binaries, even if they are keys in key:value pairs. +With this option, the result has type `json(binary(), N)`. +This is the default. +The `{label,atom}` option says to convert keys to atoms if +possible, leaving other strings as binaries. +With this option, the result has type `json(json_label(), N)`. +The `{label,existing_atom}` option says to convert keys to +atoms if the atoms already exist, leaving other keys as +binaries. All other strings remain binaries too. +With this option, the result has type `json(json_label(), N)`. -Rationale +Other options may be added in the future. - The very first question is whether the interface should be a - "value" interface (where a chunk of data is converted to an - Erlang term in one go) or an "event stream" interface, like - the classical ESIS interface offered by SGML parsers, for - some arcane reason known as SAX these days. - - There is room in the world for both kinds of interface. - This one is a "value" interface, which is best suited to - modest quantities of JSON data, less than a few megabytes say, - where the latency of waiting for the whole form before - processing any of it is not a problem. Someone else might - want to write an "event stream" EEP. - - Related to this issue, a JSON text must be an array or an object, - not, for example, a bare number. Or so says the JSON RFC. I do - not know whether all JSON libraries enforce this. 
Since a JSON - text must be [something] or {something}, JSON texts are self- - delimiting, and it makes sense to consume them one at a time from - a stream. Should that be part of this interface? Maybe, maybe - not. I note that you can separate parsing - - skip leading white space - - check for '[' or '{' - - keep on accumulating characters until you find a - matching ']' or '}', ignoring characters inside "". - from conversion. So I have separated them. This proposal only - addresses conversion. An extension should address parsing. It - might work better to have that as part of an event stream EEP. - - Let's consider conversion then. Round trip conversion fidelity - (X -> Y -> X should be an identity function) is always nice. Can - we have it? - - JSON has - - null - - false - - true - - number (integers, floats, and ratios are not distinguished) - - string - - sequence (called array) - - record (called object) - Erlang has - - atom - - number (integers and floats are distinguished) - - binary - - list - - tuple - - pid - - port - - reference - - fun - - More precisely, JSON syntax DOES make integers distinguishable - from floats; it is Javascript (when JSON is used with Javascript) - that fails to distinguish them. Since we would like to use JSON - to exchange data between Erlang, Common Lisp, Scheme, Smalltalk, - and above all Python, all of which have such a distinction, it is - fortunate that JSON syntax and the RFC allow the distinction. - - Clearly, Erlang->JSON->Erlang is going to be tricky. To take - just one minor point, neither www.json.org nor RFC 4627 makes - any promises whatever about the range of numbers that can be - passed through JSON. There isn't even any minimum range. It - seems as though a JSON implementation could reject all numbers - other than 0 as too large and still conform! This is stupid. - We can PROBABLY rely on IEEE doubles; we almost certainly cannot - expect to get large integers through JSON.
- - Converting pids, ports, and references to textual form using - pid_to_list/1, erlang:port_to_list/1, and erlang:ref_to_list/1 - is possible. A built in function can certainly convert back - from textual form if we want it to. The problem is telling these - strings from other strings: when is "<0.43.0>" a pid and when is - it a string? As for funs, let's not go there. - - Basically, converting Erlang terms to JSON so that they can be - reconstructed as the same (or very similar) Erlang terms would - involve something like this: - - atom -> string - number -> number - binary -> {"type":"binary", "data":[]} - list -> , if it's a proper list - list -> {"type":"dotted", "data":, "end":} - tuple -> {"type":"tuple", "data":} - pid -> {"type":"pid", "data":} - port -> {"type":"port", "data":} - ref -> {"type":"ref", "data":} - fun -> {"module":, "name":, "arity":} - fun -> we're pushing things a bit for anything else. - - This is not part of the specification because I am not proposing - JSON as a representation for arbitrary Erlang data. I am making - the point that we COULD represent (most) Erlang data in JSON if - we really wanted to, but it is not an easy or natural fit. For - that we have Erlang binary format and we have UBF. To repeat, - we have no reason to believe that a JSON->JSON copier that works - by decoding JSON to an internal form and recoding it for output - will preserve Erlang terms, even encoded like this. - - No, the point of JSON support in Erlang is to let Erlang programs - deal with the JSON data that other people are sending around the - net, and to send JSON data to other programs (like scripts in Web - browsers) that are expecting plain old JSON. The round trip - conversion we need to care about is JSON -> Erlang -> JSON. - - Here too we run into problems. The obvious way to represent - {"a":A, "b":B} in Erlang is [{'a',A},{'b',B}], and the obvious - way to represent a string is as a list of characters. 
But in - JSON, an empty list, an empty "object", and an empty string are - all clearly distinct, so must be translated to different Erlang - terms. Bearing this in mind, here's a first cut at mapping - JSON to Erlang: - - - null => the atom 'null' - - false => the atom 'false' - - true => the atom 'true' - - number => a float if there is a decimal point or exponent, - => the float -0.0 if it is a minus sign followed by - one or more zeros, with or without a decimal point - or exponent - => an integer otherwise - - string => a UTF-8-encoded binary - - sequence => a list - - object => a list of {Key,Value} pairs - => the empty tuple {} for an empty {} object - - Since Erlang does not currently allow the full range of - Unicode characters in an atom, a Key should be an atom if - each character of a label fits in Latin 1, or a binary if - it does not. - - Let's examine "objects" a little more closely. Erlang - programmers are used to working with lists of {Key,Value} - pairs. The standard library even includes orddict, which - works with just such lists (although they must be sorted). - However, there is something distasteful about having empty - objects convert to empty tuples, but non-empty objects to - non-empty lists, and there is also something distasteful about - lists converting to sequences or objects depending on what - is inside them. What is distasteful here has something to - do with TYPES. Erlang doesn't have static types, but that - does not mean that types are not useful as a design tool, - or that something resembling type consistency is not useful - to people. The fact that Erlang tuples happen to use curly - braces is just icing on the cake. The first draft of this - EEP used lists; that was entirely R.A.O'K's own work. It - was then brought to his attention that Joe Armstrong thought - converting "objects" to tuples was the right thing to do. - So the next draft did that. Then other alternatives were - brought up.
I'm currently aware of - - - Objects are tuples - A. {{K1,V1}, ..., {Kn,Vn}}. - This is the result of list_to_tuple/1 applied to a - proplist. There are no library functions to deal - with such things, but they are unambiguous and - relatively space-efficient. - B. {object,[{K1,V1}, ..., {Kn,Vn}]} - This is a proplist wrapped in a tuple purely to - distinguish it from other lists. This offers - simple type testing (objects are tuples) and simple - field processing (they contain proplists). - There seems to be no consensus for what the tag - should be: 'obj' (gratuitous abbreviation), 'json' - (but even the numbers, binaries, and lists are JSON); - 'object' seems to be least objectionable. - C. {[{K1,V1},...,{Kn,Vn}]} - Like B, but there isn't any need for a tag. - A and B are due to Joe Armstrong; I cannot recall who - thought of C. It has recently had supporters. - - - Objects are lists - D. Empty objects are {}. - This was my original proposal. Simple but non-uniform - and clumsy. - E. Empty objects are [{}]. - This came from the Erlang mailing list; I have forgotten - who proposed it. It's brilliant: objects are always - lists of tuples. - F. Empty objects are 'empty'. - Like D but a tiny fraction more space-efficient. - - We can demonstrate handling "objects" in each of these forms: - - json:is_object(X) -> is_tuple(X). % A - - json:is_object({object,X}) -> is_list(X). % B - - json:is_object({X}) -> is_list(X). % C - - json:is_object({}) -> true; % D - json:is_object([{_,_}|_]) -> true; - json:is_object(_) -> false. - - json:is_object([X|_]) -> is_tuple(X). % E - - json:is_object(empty) -> true; % F - json:is_object([{_,_}|_]) -> true; - json:is_object(_) -> false. - - Of these, A, B, C, and E can easily be used in clause heads, - and E is the only one that is easy to use with proplists. - After much scratching of the head and floundering around, - E does it.
- - We might consider adding an 'object' option: - - {object,tuple} representation A - {object,pair} representation B. - {object,wrap} representation C. - {object,list} representation E. - - For conversion from Erlang to JSON, - - {T1,...,Tn} 0 or more tuples - {object,L} size 2, 1st element atom, 2nd list - {L} size 1, only element a list - - are all recognisable, so term_to_json/[1,2] could accept - all of them without requiring an option. - - There is a long term reason why we want some such option. - Both lists and tuples are just WRONG. The right data structure to - represent JSON "objects" is the one that I call "frames" and Joe - Armstrong calls "proper structs". At some point in the future we - will definitely want to have {object,frame} as a possibility. - - Suppose you are receiving JSON data from a source that does - not distinguish between integers and floating point numbers? - Perl, for example, or even more obviously, Javascript itself. - In that case some floating point numbers may have been written - in integer style more or less accidentally. In such a case, you - may want all the numbers in a JSON form converted to Erlang - floats. {float,true} was provided for that purpose. - - The corresponding mapping from Erlang to JSON is - - - atom => itself if it is null, false, or true - => error otherwise - - number => itself; use full precision for floats, - and always include a decimal point or exponent - in a float - - binary => if the binary is a well formed UTF-8 encoding - of some string, that string - => error otherwise - - tuple => if all elements are {Key,Value} pairs with - non-equivalent keys, then a JSON "object", - => error otherwise - - list => if it is proper, itself as a sequence - => error otherwise - - otherwise, an error - - There is an issue here with keys. The RFC says that "The names - within an object SHOULD be unique." In the spirit of "be - generous in what you accept, strict in what you generate", we - really ought to check that. 
The only time term_to_json/[1,2] - terminates successfully should be when the output is absolutely - perfect JSON. I did toy with the idea of an option to allow - duplicate labels, but if I want to send such non-standard data, - who can I send it to? Another Erlang program? Then I would do - better to use external binary format. So the only options now - allowed are ones to affect white space. One might add an - option later to specify the order of key:value pairs somehow, - but options that do not affect the semantics are appropriate. - - On second thoughts, look at the JSON-RPC 1.1 draft. - It says - "Client implementations SHOULD strive to order the members of - the Procedure Call object such that the server is able to - employ a streaming strategy to process the contents. At the - very least, a client SHOULD ensure that the version member - appears first and the params member last." - Reference [4], section 6.2.4 "Member Sequence". - This means that for conformity with JSON-RPC, - term_to_json([{version,<<"1.1">>}, - {method, <<"sum">>}, - {params, [17,25]}]) - should not re-order the pairs. Hence the current specification - says the order is preserved and does not provide any means for - re-ordering. If you want a standard order, program it outside. - - How should the "duplicate label" error be reported? There are two - ways to report such errors in Erlang: raise 'badarg' exceptions, - or return either {ok,Result} or {error,Reason} answers. I'm - really not at all sure what to do here. I ended up with 'raise - badarg' because that's what things like binary_to_term/1 do. - - At the moment, I specify that the Erlang terms use UTF-8 and only - UTF-8. This is by far the simplest possibility. However, we - could certainly add - {internal,Encoding} - options to say what Encoding to use or assume for binaries. The - time to add that, I think, is when there is a demonstrated need.
- - There are five "round trip" issues left: - - - all information about white space is lost. - This is not a problem, because it has no significance. - - - decimal->binary->decimal conversion of floating point numbers - may introduce error unless techniques like those described in - the Scheme report are used to do these conversions with high - accuracy. This is a general problem for Erlang, and a general - problem for JSON. - - - there is another JSON library for Erlang that always converts - integers outside the 32-bit range to floating point. This seems - like a bad idea. There are languages (Scheme, Common Lisp, - SWI Prolog, Smalltalk) with JSON libraries that have bignums. - Why put an arbitrary restriction on our ability to communicate - with them? Any JSON implementation that is unable to cope with - large integers as integers is (or should be) perfectly able to - convert such numbers to floating-point for itself. It seems - especially silly to do this when you consider that the program on - the other end might itself be in Erlang. So we expect that if T - is of type json(binary(),integer()) then +The mapping from JSON to Erlang is described below in this +section. An argument that is not a well formed IO_Data, +or that cannot be decoded, or that when decoded does not +follow the rules of JSON syntax, results in a badarg +exception. [It would be nice if there were Erlang-wide +conventions for distinguishing these cases.] - json_to_term(term_to_json(T), [{label,binary}]) - should be identical to T, up to re-ordering of attribute pairs. - - - conversion of a string to a binary and then a binary to a - string will not always yield the same representation, but - what you get will represent the same string. For example, - "\u0041" will read as <<65>> which will display as "A". - - - Technically speaking the Unicode "surrogates" are not - characters.
The RFC allows characters outside the Basic - Multilingual Plane to be written as UTF-8 sequences, or - to be written as 12-character \uHIGH\uLOWW surrogate pair - escapes. Something with a bare \uHIGH or \uLOWW surrogate - code point is not, technically speaking, a legal Unicode - string, so a UTF-8 sequence for such a code point should - not appear. A \uHIGH or \uLOWW escape sequence on its own - should not appear either; it would be just as much of a - syntax error as a byte with value 255 in a UTF-8 sequence. - We actually have two problems: - - (a) Some languages may be sloppy and may allow singleton - surrogates inside strings. Should Erlang be equally - sloppy? Should this just be allowed? - - (b) Some languages (and yes, I do mean Java) don't really - do UTF-8, but instead first break a sequence of Unicode - characters into 16-bit chunks (UTF-16) and then encode - the chunks as UTF-8, producing what is quite definitely - illegal UTF-8. Since there is a lot of Java code in the - world, how do we deal with this? - - Be generous in what you accept: the 'utf8' decoder - should quietly accept "UTF-Java", converting - separately encoded surrogates to a single numeric - code, and converting singleton surrogates _as if_ they - were characters. - - Be strict in what you generate: never generate - UTF-Java when the requested encoding is 'utf8'; - have a separate 'java' encoding that can be requested - instead. - - Hynek Vychodil is vehement that the only acceptable way to handle - JSON labels is as binaries. His argument against {label,atom} is - sound: as noted above, that option is only usable within a trust - boundary. His argument against {label,existing_atom} is that if - you convert a JSON form at one time in one node, and then store - the Erlang term in a file or send it across a wire or in any - other way make it available at another node or another time, - then it won't match the same JSON form converted at that time in - that node. 
This is true, but there are plenty of other round - trip issues as well. Data converted using {float,true} will not - match data converted using {float,false}. The handling of - duplicate labels may vary. The order of {key,value} pairs is - particularly likely to vary. For all programming languages and - libraries, if you want to move JSON data around in time or - space, the _only_ reliable way to do that is to move it _as_ - (possibly compressed) JSON data, not as something else. You - can expect a JSON form read at one time/place to be equivalent - to the same form read at another time/place; you cannot expect - it to be identical. Any code that does is essentially buggy, - whether {label,existing_atom} is used or not. Here is an - example that shows that the problem is ineradicable. - - Suppose we have the JSON form - "[0.123456789123456789123456789123456]". - Two Erlang nodes on different machines read this and - convert it to an Erlang term. One of them sends its term to - the other, which compares them. To its astonishment, they - are not identical! Why? Well, it could be that they use - different floating-point precisions. On one of Erlang's main - platforms, 128-bit floats are supported. (The example needs - 128 bits.) On its other main platform, 80-bit floats are - supported. (In neither case am I saying that Erlang does, - only that the hardware does.) Indeed, modern versions of the - second platform usually work with 64-bit floats. Let us - suppose that they both stick with 64-bit floats instead. - What if one of the systems is an IBM/370 with its non-IEEE - doubles? So suppose they are both using IEEE 64-bit floats. - They will use different C libraries to do the initial - decimal-to-binary conversion, so the number may be rounded - differently. And if one is Windows and another is Linux or - Solaris, they WILL use different libraries. 
Should Erlang - use its own code (which might not be a bad idea), we would - still have trouble talking to machines with non-IEEE doubles, - which are still in use. Even Java, which originally wanted - to have bit-identical results everywhere, eventually retreated. - - There is one important issue for JSON generation, and that is - what white space should be generated. Since JSON is supposed to - be "human readable", it would be nice if it could be indented, - and if it could be kept to a reasonable line width. However, - appearances to the contrary, JSON has to be regard as a binary - format. There is no way to insert line breaks inside strings. - Javascript doesn't have any analogue of C's - continuation; it can always join the pieces with '+'. JSON has - inherited the lack (no line continuation) but not the remedy - (you may not use '+' in JSON). So a JSON form containing a - 1000-character string cannot be fitted into 80-column lines; - it just cannot be done. - - The main thing I have not accounted for is the {label,_}. - option of json_to_term/2. For normal Erlang purposes, it is - much nicer (and somewhat more efficient) to deal with - - [{name,<<"fred">>},{female,false},{age,65}] - - than with - - [{<<"name">>,<<"fred">>},{<<"female">>,false},{<<"age">>,65}] - - If you are communicating with a trusted source that deals with - a known small number of labels, fine. There are limits on the - number of atoms Erlang can deal with. A small test program - that looped creating atoms and putting them into a list ticked - over happily until shortly after its millionth atom, and then - hung there burning cycles apparently getting nowhere. Also, - the atom table is shared by all processes on an Erlang node, - so garbage collecting it is not as cheap as it might be. As - a system integrity measure, therefore, it is useful to have a - mode of operation in which json_to_term never creates atoms. 
- But Erlang offers a third possibility: there is a built-in - list_to_existing_atom/1 function that returns an atom only if - that atom already exists. Otherwise it raises an exception. - So there are three cases: - - {label,binary} - - Always convert labels to binaries. - This is always safe and always clumsy. - Since <<"xxx">> syntax exists in Erlang, - it isn't _that_ clumsy. It is uniform, - and stable, in that it does not depend - on whether Erlang atoms support Unicode or - not, or what other modules have been loaded. - - {label,atom} - - Always convert labels to atoms if all their - characters are allowed in atoms, leave them - as binaries otherwise. - - This is more convenient for Erlang programming. - However, it is only really usable with a partner - that you trust. Since much communication takes - place within trust boundaries, it definitely has - a place. If this were not so, term_to_binary/1 - would be of no use! - - {label,existing_atom} - - Convert labels that match the names of existing - atoms to those atoms, leave all others as binaries. - If a module mentions an atom, and goes looking for - that atom as a key, it will find it. This is safe - _and_ convenient. The only real issue with it is - that the same JSON term converted at different times - (in the same Erlang node) may be converted differently. - This usually won't matter. - - In previous drafts I selected 'existing_atom' as the default, - because that's the option I like best. It's the one that would - most simplify the code that I would like to write. However, one - must also consider conversion issues. Some well considered - existing JSON libraries for Erlang always use binaries. - - There is no {string,XXX} option. That's because I see the - strings in JSON as "payload", as unpredictable data that are - being transmitted, that one does not _expect_ to match against. 
- This is in marked contrast with labels, which are "structure" - rather than data, and which one expects to match against a lot. - I did briefly consider a {string,list|binary} option, but these - days Erlang is so good at matching binaries that there didn't - seem to be much point. - - This raises a general issue about binaries. One of the reasons - for liking atoms as labels is that atoms are stored uniquely, - and binaries are not. This extends to term_to_binary(), which - compresses repeated references to identical atoms, but not - repeated references to equal binaries. There is no reason that - a C implementation of json_to_term/[1,2] could not keep track - of which labels have been seen and share references to repeated - ones. For example, - [{"name":"root","command":"java","cpu":75.7}, - {"name":"ok","command":"iropt","cpu":1.5} - ] - -- extracted from a run of the 'top' command showing that my - C compilation was getting a tiny fraction of the machine, - while some Java program run by root was getting the lion's share -- - would convert to Erlang as the equivalent of - N = <<"name">>, - M = <<"command">>, - P = <<"cpu">>, - [[{N,<<"root">>},{M,<<"java">>}, {P,75.7}], - [{N,<<"ok">>}, {M,<<"iropt">>},{P, 1.5}] - ] - getting much of the space saving that atoms would use. There is - of course no way for a pure Erlang program to detect whether such - sharing is happening or not. It would be nice if - term_to_binary(json_to_term(JSON)) - preserved such sharing. - - Another issue that has been raised concerns encoding. Some people - have said that they would like (a) to allow input encodings other - than UTF-8, (b) to have strings reported in their original - encoding, rather than UTF-8, so that (c) strings can be slices of - the original binary. What does the JSON specification actually - say? Section 3, Encoding: - - "JSON text SHALL be encoded in Unicode. - The default encoding is UTF-8." - - This is not quite as clear as it might be. 
There is explicit - mention of UTF-32 and UTF-16 (both of them in big- and little- - endian forms). But is SCSU "Unicode"? Is BOCU? How about - UTF-EBCDIC [5]? That's right, there is a legal way to encode - something in "Unicode" in which the JSON special characters - []{},:\" do not have their ASCII values. There does not seem - to be any reason to suppose that this is forbidden, and on an - IBM mainframe I would expect it to be useful. Until the day - someone ports Erlang to a z/Series machine, this is mainly of - academic interest, but we don't want to paint ourselves into - any corners. - - Suppose we did represent strings in their native encoding. - What then? First, a string that contained an escape sequence - of any kind could not be held as a slice of the source anyway. - Nor could a string that spanned two or more chunks of the - IO_Data input. The really big problem is that there would be - no indication of what the encoding actually was, so that we - would end up regarding logically equal strings from different - sources as unequal and logically unequal strings as equal. - - I do not want to forbid strings in the result being slices of - an original binary. In the common case when the input is - UTF-8 and the string does not contain any escapes, so that it - _can_ be done, an implementation should definitely be free to - exploit that. As this EEP currently stands, it is. What we - cannot do is to _require_ such sharing, because it generally - won't work. - - It has been suggested to me that it might be better for the - result of term_to_json/[1,2] to be iodata() rather than a - binary(). Anything that would have accepted iodata() will be - happy with a binary(), so the question is whether it is better - for the implementation, whether perhaps there are chunks of stuff - that have to be copied using a binary() but can be shared using - iodata(). Thanks to the encoding issue, I don't really think so. 
- This might be a good time to point out why the encoding is done - here rather than somewhere else. If you know that you are - generating stuff that will be encoded into character set X, then - you can avoid generating characters that are not in that - character set. You can generate \u sequences instead. Of course - JSON itself requires UTF-8, but what if you are going to send it - through some other transport? With {encoding,ascii} you are out - of trouble all the way. So for now I am sticking with binary(). - - The final issue is whether these functions should go in the - erlang: module or in some other module (perhaps called json:). - - - If another module, then there is no barrier to adding other - functions. For example, we might offer functions to test - whether a term is a JSON term, or an IO_Data represents a JSON - term, or alternative functions that present results in some - canonical form. - - If another module, then someone looking for a JSON module might - find one. + erlang:term_to_json(JSON) -> binary() + erlang:term_to_json(JSON, Option_List) -> binary() + +Types: + + JSON = json() + Option_List = [Option] + Option = {encoding,atom()} + | {space,int()} + | space + | {indent,int()} + | indent + +This is a function for producing portable JSON. +It is not intended as a means for encoding arbitrary Erlang +terms. Terms that do not fit into the mapping scheme +described below in this section result in a badarg exception. +The JSON RFC says that "The names within an object SHOULD be +unique." JSON terms that violate this should also result in +a badarg exception. + +`term_to_json(X)` is equivalent to `term_to_json(X, [])`. + +Converting Erlang terms to JSON results in a (logical) +character sequence, which is encoded as a sequence of +bytes, which is returned as a binary. The default encoding +is UTF-8; this may be overridden by the encoding option. +Any encoding supported elsewhere in Erlang should be +supported here.
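
The unique-names rule above can be sketched as a stand-alone check. This is only an illustration, not part of the proposal: the module name `json_keys_sketch` and the function `check_unique_keys/1` are hypothetical. It assumes the key equivalence stated in this EEP (an atom and a binary are equivalent if they would convert to the same JSON string), and it does not verify that binary keys are well formed UTF-8.

```erlang
-module(json_keys_sketch).
-export([check_unique_keys/1]).

%% Hypothetical helper, not part of the proposal: raise badarg unless
%% all keys of an "object" list are pairwise non-equivalent.
%% Assumes a proper list; an improper list fails with a different error.
check_unique_keys([{}]) ->
    ok;                                   % the empty object has no keys
check_unique_keys(Pairs) when is_list(Pairs) ->
    Keys = lists:map(fun key_of/1, Pairs),
    %% usort/1 drops duplicates, so a shorter result means a repeated key
    case length(lists:usort(Keys)) =:= length(Keys) of
        true  -> ok;
        false -> erlang:error(badarg)
    end;
check_unique_keys(_) ->
    erlang:error(badarg).

%% Reduce a {Key,Value} pair to the binary its key would become as a
%% JSON string, so that the atom name and <<"name">> compare equal.
key_of({K, _V}) when is_atom(K)   -> atom_to_binary(K, utf8);
key_of({K, _V}) when is_binary(K) -> K;
key_of(_)                         -> erlang:error(badarg).
```

With this sketch, `[{name,1},{<<"name">>,2}]` is rejected because both keys would be written as the JSON string "name", while `[{version,<<"1.1">>},{method,<<"sum">>}]` passes.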
+ +There are two options for controlling white space. +By default, none is generated. + +`{space,N}`, where N is a non-negative integer, says to +add N spaces after each colon and comma. +'space' is equivalent to `{space,1}`. +No other space is ever inserted. + +`{indent,N}`, where N is a non-negative integer, says +to add a line break and some indentation after each +comma. The indentation is N spaces for each enclosing +[] or {}. Note that this still does not result in any +other spaces being added; in particular ] and } will +not appear at the beginning of lines. +'indent' is equivalent to `{indent,1}`. + +Other options may be added in the future. + + + +### Converting JSON to Erlang ### + +The keywords 'null', 'false', and 'true' are converted to the +corresponding Erlang atoms. No other complete JSON forms +are converted to atoms. + +A number is converted to an Erlang float if + +- it contains a decimal point, or +- it contains an exponent, or +- it is a negative zero, or +- the option {float,true} was passed. + +A JSON number that looks like an integer other than -0 +will be converted to an Erlang integer unless `{float,true}` +was provided. + +When occurring as a label in an "object", a string may on +explicit request be converted to an Erlang atom, if possible. +Otherwise, a string is converted to a UTF-8-encoded binary, +whatever the encoding used by the data source. +An empty string is converted to an empty binary. + +A sequence is converted to an Erlang list. The elements have +the same order in the list as in the original sequence. + +A non-empty "object" is converted to a list of {Key,Value} +pairs suitable for processing with the 'proplists' module. +Note that proplists: does not require that keys be atoms. +An "object" with no key:value pairs is converted to +the list `[{}]`, preserving the invariant that an object +is always represented by a non-empty list of tuples. +The proplists: module will correctly view `[{}]` as holding +no keys. 
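
A minimal sketch of how code might consume this representation, assuming only the standard `proplists` module. The module name `json_object_sketch` and its functions are hypothetical, chosen for illustration; nothing here is proposed as part of the interface.

```erlang
-module(json_object_sketch).
-export([is_object/1, keys/1, value/3]).

%% An object is always a non-empty list of tuples, and the empty
%% object is [{}], so looking at the first element is enough.
is_object([X | _]) -> is_tuple(X);
is_object(_)       -> false.

%% proplists skips the zero-size tuple, so [{}] reports no keys.
keys(Object) ->
    proplists:get_keys(Object).

%% Look up a key (atom or binary), returning Default when absent.
value(Key, Object, Default) ->
    proplists:get_value(Key, Object, Default).
```

Here `keys([{}])` yields `[]`, and `value(<<"age">>, [{<<"name">>,<<"fred">>},{<<"age">>,65}], undefined)` yields `65`, so the empty object needs no special case in lookup code.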
+ +Keys in the JSON form are always strings. A Key is converted +to an Erlang atom if and only if + +- `{label,atom}` was specified or + `{label,existing_atom}` was specified and a suitable atom + already existed; and +- every character in the JSON string can be held in an atom. + +Currently, only names made of Latin-1 characters can be turned +into atoms. Empty keys, "", are converted to empty atoms ''. +Keys are otherwise converted to binaries, using the UTF-8 +encoding, whatever the original encoding was. + +This means that if you read and convert a JSON term now, +and save the binary somewhere, then read and convert it in +a later fully-Unicode Erlang, you will find the +representations different. However, the order of the pairs +in a JSON "object" has no significance, and an implementation +of this specification is free to report them in any order it +likes (as given, reversed, sorted, sorted by some hash, you +name it). Within any particular Erlang version, this +conversion is a pure function, but different Erlang releases +may change the order of pairs, so you cannot expect exactly +the same term from release to release anyway. + +See the rationale for reasons why we do not convert to +a canonical form, for example by sorting. + +In the spirit of "be generous in what you accept, strict in +what you produce", it might be a good idea to accept unquoted +labels in the input. You can't accept just any old junk, +but allowing [Javascript][8] IdentifierNames would make sense. + + IdentifierName = IdentifierStart IdentifierPart*. + IdentifierStart = UnicodeLetter | '$' | '_' | + '\u' HexDigit*4 + IdentifierPart = IdentifierStart | UnicodeCombiningMark | + UnicodeDigit | UnicodeConnectorPunctuation + +There are apparently JSON generators out there that do this, +so it would add value, but it is not _required_. + + + +### Converting Erlang to JSON ### + +The atoms 'null', 'false', and 'true' are converted to the +corresponding JSON keywords. 
No other Erlang atoms are +allowed. + +An Erlang integer is converted to a JSON integer. +An Erlang float is converted to a JSON float, as precisely +as practical. An Erlang float which has an integral value +is written in such a way that it will read back as a float; +suitable methods include suffixing ".0" or "e0". + +An Erlang binary that is the UTF-8 representation of some +Unicode string is converted to a string. No other binaries +are allowed. + +An Erlang list all of whose elements are tuples is converted +to a JSON "object". If the list is `[{}]` it is converted to +"{}", otherwise all the tuples must have two elements and +the first must be an atom or binary; other tuples are not +allowed. For each `{Key,Value}` pair, the key must be an atom +or a binary that is the UTF-8 representation of some Unicode +string; the key is converted to a JSON string. The value must +be a JSON term. The order of the key:value pairs in the +output is the same as the order of the `{Key,Value}` pairs +in the list. A list with two equivalent keys is not allowed. +Two binaries, or two atoms, are equivalent iff they are equal. +An atom and a binary are equivalent if they would convert to +the same JSON string. + +Erlang tuples are not allowed except as elements of lists +that will be converted to JSON "objects". +No other tuples are allowed. + +An Erlang proper list whose elements are not tuples is +converted to a JSON sequence by converting its elements in +natural order. + +An improper list is not allowed. + +Other Erlang terms are not allowed. If you want to "tunnel" +other Erlang terms through JSON, fine, but it is entirely up +to you to do whatever conversion you want. + - - If another module, then this interface can easily be prototyped - without any modification to the core Erlang system. - - If another module, then someone who doesn't need this feature - need not load it. +Motivation +========== + +As Joe Armstrong put it in his message, +"JSON seems to be ubiquitous". 
+It should not only be supported, it should be supported +simply, efficiently, and reliably. + +As noted above, http://www.ietf.org/rfc/rfc4627.txt +defines an application/json Media Type that Erlang +should be able to handle "out of the box". - Conversely, - - If another module, then it is too easy to bloat the interface. - We don't _need_ such testing functions, as we can always catch - the badarg exception from the existing ones. We don't _need_ - extra canonicalising functions, because we can add options to - the existing ones. Something that subtly encourages us to - keep the number of functions down is a Good Thing. - - Every Erlang programmer ought to be familiar with the erlang: - module, and when looking for any feature, ought to start by - looking there. +Rationale +========= + +The very first question is whether the interface should be a +"value" interface (where a chunk of data is converted to an +Erlang term in one go) or an "event stream" interface, like +the classical ESIS interface offered by SGML parsers, for +some arcane reason known as SAX these days. + +There is room in the world for both kinds of interface. +This one is a "value" interface, which is best suited to +modest quantities of JSON data, less than a few megabytes say, +where the latency of waiting for the whole form before +processing any of it is not a problem. Someone else might +want to write an "event stream" EEP. + +Related to this issue, a JSON text must be an array or an object, +not, for example, a bare number. Or so says the JSON RFC. I do +not know whether all JSON libraries enforce this. Since a JSON +text must be [something] or {something}, JSON texts are self- +delimiting, and it makes sense to consume them one at a time from +a stream. Should that be part of this interface? Maybe, maybe +not. 
I note that you can separate parsing + +- skip leading white space +- check for '[' or '{' +- keep on accumulating characters until you find a + matching ']' or '}', ignoring characters inside "". + +from conversion. So I have separated them. This proposal only +addresses conversion. An extension should address parsing. It +might work better to have that as part of an event stream EEP. + +Let's consider conversion then. Round trip conversion fidelity +(X -> Y -> X should be an identity function) is always nice. Can +we have it? + +JSON has + +- null +- false +- true +- number (integers, floats, and ratios are not distinguished) +- string +- sequence (called array) +- record (called object) + +Erlang has + +- atom +- number (integers and floats are distinguished) +- binary +- list +- tuple +- pid +- port +- reference +- fun + +More precisely, JSON syntax DOES make integers distinguishable +from floats; it is Javascript (when JSON is used with Javascript) +that fails to distinguish them. Since we would like to use JSON +to exchange data between Erlang, Common Lisp, Scheme, Smalltalk, +and above all Python, all of which have such a distinction, it is +fortunate that JSON syntax and the RFC allow the distinction. + +Clearly, Erlang->JSON->Erlang is going to be tricky. To take +just one minor point, neither www.json.org nor RFC 4627 makes +any promises whatever about the range of numbers that can be +passed through JSON. There isn't even any minimum range. It +seems as though a JSON implementation could reject all numbers +other than 0 as too large and still conform! This is stupid. +We can PROBABLY rely on IEEE doubles; we almost certainly cannot +expect to get large integers through JSON. + +Converting pids, ports, and references to textual form using +`pid_to_list/1`, `erlang:port_to_list/1`, and `erlang:ref_to_list/1` +is possible. A built in function can certainly convert back +from textual form if we want it to.
The problem is telling these +strings from other strings: when is "<0.43.0>" a pid and when is +it a string? As for funs, let's not go there. + +Basically, converting Erlang terms to JSON so that they can be +reconstructed as the same (or very similar) Erlang terms would +involve something like this: + + atom -> string + number -> number + binary -> {"type":"binary", "data":[]} + list -> , if it's a proper list + list -> {"type":"dotted", "data":, "end":} + tuple -> {"type":"tuple", "data":} + pid -> {"type":"pid", "data":} + port -> {"type":"port", "data":} + ref -> {"type":"ref", "data":} + fun -> {"module":, "name":, "arity":} + fun -> we're pushing things a bit for anything else. + +This is not part of the specification because I am not proposing +JSON as a representation for arbitrary Erlang data. I am making +the point that we COULD represent (most) Erlang data in JSON if +we really wanted to, but it is not an easy or natural fit. For +that we have Erlang binary format and we have UBF. To repeat, +we have no reason to believe that a JSON->JSON copier that works +by decoding JSON to an internal form and recoding it for output +will preserve Erlang terms, even encoded like this. + +No, the point of JSON support in Erlang is to let Erlang programs +deal with the JSON data that other people are sending around the +net, and to send JSON data to other programs (like scripts in Web +browsers) that are expecting plain old JSON. The round trip +conversion we need to care about is JSON -> Erlang -> JSON. + +Here too we run into problems. The obvious way to represent +{"a":A, "b":B} in Erlang is `[{'a',A},{'b',B}]`, and the obvious +way to represent a string is as a list of characters. But in +JSON, an empty list, an empty "object", and an empty string are +all clearly distinct, so must be translated to different Erlang +terms. 
Bearing this in mind, here's a first cut at mapping +JSON to Erlang: + + - null => the atom 'null' + - false => the atom 'false' + - true => the atom 'true' + - number => a float if there is a decimal point or exponent, + => the float -0.0 if it is a minus sign followed by + one or more zeros, with or without a decimal point + or exponent + => an integer otherwise + - string => a UTF-8-encoded binary + - sequence => a list + - object => a list of {Key,Value} pairs + => the empty tuple {} for an empty {} object + +Since Erlang does not currently allow the full range of +Unicode characters in an atom, a Key should be an atom if +each character of a label fits in Latin 1, or a binary if +it does not. + +Let's examine "objects" a little more closely. Erlang +programmers are used to working with lists of {Key,Value} +pairs. The standard library even includes orddict, which +works with just such lists (although they must be sorted). +However, there is something distasteful about having empty +objects convert to empty tuples, but non-empty objects to +non-empty lists, and there is also something distasteful about +lists converting to sequences or objects depending on what +is inside them. What is distasteful here has something to +do with TYPES. Erlang doesn't have static types, but that +does not mean that types are not useful as a design tool, +or that something resembling type consistency is not useful +to people. The fact that Erlang tuples happen to use curly +braces is just icing on the cake. The first draft of this +EEP used lists; that was entirely R.A.O'K's own work. It +was then brought to his attention that Joe Armstrong thought +converting "objects" to tuples was the right thing to do. +So the next draft did that. Then other alternatives were +brought up. I'm currently aware of + +- Objects are tuples + * A. `{{K1,V1}, ..., {Kn,Vn}}`. + This is the result of `list_to_tuple/1` applied to a + proplist.
There are no library functions to deal + with such things, but they are unambiguous and + relatively space-efficient. + * B. `{object,[{K1,V1}, ..., {Kn,Vn}]}` + This is a proplist wrapped in a tuple purely to + distinguish it from other lists. This offers + simple type testing (objects are tuples) and simple + field processing (they contain proplists). + There seems to be no consensus for what the tag + should be: 'obj' (gratuitous abbreviation), 'json' + (but even the numbers, binaries, and lists are JSON); + 'object' seems to be least objectionable. + * C. `{[{K1,V1},...,{Kn,Vn}]}` + Like B, but there isn't any need for a tag. + + A and B are due to Joe Armstrong; I cannot recall who + thought of C. It has recently had supporters. + +- Objects are lists + * D. Empty objects are `{}`. + This was my original proposal. Simple but non-uniform + and clumsy. + * E. Empty objects are `[{}]`. + This came from the Erlang mailing list; I have forgotten + who proposed it. It's brilliant: objects are always + lists of tuples. + * F. Empty objects are 'empty'. + Like D but a tiny fraction more space-efficient. + +We can demonstrate handling "objects" in each of these forms: + + json:is_object(X) -> is_tuple(X). % A + + json:is_object({object,X}) -> is_list(X). % B + + json:is_object({X}) -> is_list(X). % C + + json:is_object({}) -> true; % D + json:is_object([{_,_}|_]) -> true; + json:is_object(_) -> false. + + json:is_object([X|_]) -> is_tuple(X). % E + + json:is_object(empty) -> true; % F + json:is_object([{_,_}|_]) -> true; + json:is_object(_) -> false. + +Of these, A, B, C, and E can easily be used in clause heads, +and E is the only one that is easy to use with proplists. +After much scratching of the head and floundering around, +E does it. + +We might consider adding an 'object' option: + + {object,tuple} representation A + {object,pair} representation B + {object,wrap} representation C + {object,list} representation E
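
To make the relationship between these representations concrete, here is a small sketch that rewrites one object from representation E into the form an `{object,Form}` option would select. The module name `json_object_forms` and the function `from_list_form/2` are hypothetical, and the sketch rewrites only the top level of a term, leaving nested objects untouched.

```erlang
-module(json_object_forms).
-export([from_list_form/2]).

%% Hypothetical converter: Obj is an object in representation E,
%% that is, [{}] or a non-empty list of {Key,Value} pairs.
from_list_form(list, Obj) ->
    Obj;                                   % E: already in list form
from_list_form(Form, [{}]) ->
    wrap(Form, []);                        % the empty object has no pairs
from_list_form(Form, Pairs) when is_list(Pairs) ->
    wrap(Form, Pairs).

wrap(tuple, Pairs) -> list_to_tuple(Pairs);   % A: {{K1,V1},...,{Kn,Vn}}
wrap(pair,  Pairs) -> {object, Pairs};        % B: {object,[...]}
wrap(wrap,  Pairs) -> {Pairs}.                % C: {[...]}
```

For instance, `from_list_form(tuple, [{a,1},{b,2}])` gives `{{a,1},{b,2}}` and `from_list_form(wrap, [{}])` gives `{[]}`. Going the other way is equally mechanical, which is why a single option could plausibly cover all four forms.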
+ +For conversion from Erlang to JSON, + + {T1,...,Tn} 0 or more tuples + {object,L} size 2, 1st element atom, 2nd list + {L} size 1, only element a list + +are all recognisable, so `term_to_json/[1,2]` could accept +all of them without requiring an option. + +There is a long term reason why we want some such option. +Both lists and tuples are just WRONG. The right data structure to +represent JSON "objects" is the one that I call "frames" and Joe +Armstrong calls "proper structs". At some point in the future we +will definitely want to have `{object,frame}` as a possibility. + +Suppose you are receiving JSON data from a source that does +not distinguish between integers and floating point numbers? +Perl, for example, or even more obviously, Javascript itself. +In that case some floating point numbers may have been written +in integer style more or less accidentally. In such a case, you +may want all the numbers in a JSON form converted to Erlang +floats. `{float,true}` was provided for that purpose. + +The corresponding mapping from Erlang to JSON is + + - atom => itself if it is null, false, or true + => error otherwise + - number => itself; use full precision for floats, + and always include a decimal point or exponent + in a float + - binary => if the binary is a well formed UTF-8 encoding + of some string, that string + => error otherwise + - tuple => if all elements are {Key,Value} pairs with + non-equivalent keys, then a JSON "object", + => error otherwise + - list => if it is proper, itself as a sequence + => error otherwise + - otherwise, an error + +There is an issue here with keys. The RFC says that "The names +within an object SHOULD be unique." In the spirit of "be +generous in what you accept, strict in what you generate", we +really ought to check that. The only time `term_to_json/[1,2]` +terminates successfully should be when the output is absolutely +perfect JSON.
I did toy with the idea of an option to allow +duplicate labels, but if I want to send such non-standard data, +who can I send it to? Another Erlang program? Then I would be +better to use external binary format. So the only options now +allowed are ones to affect white space. One might add an +option later to specify the order of key:value pairs somehow, +but options that do not affect the semantics are appropriate. + +On second thoughts, look at the [JSON-RPC 1.1 draft][4]. +It says, in section 6.2.4 "Member Sequence": + +> Client implementations SHOULD strive to order the members of +> the Procedure Call object such that the server is able to +> employ a streaming strategy to process the contents. At the +> very least, a client SHOULD ensure that the version member +> appears first and the params member last. + +This means that for conformity with JSON-RPC, + + term_to_json([{version,<<"1.1">>}, + {method, <<"sum">>}, + {params, [17,25]}]) + +should not re-order the pairs. Hence the current specification +says the order is preserved and does not provide any means for +re-ordering. If you want a standard order, program it outside. + +How should the "duplicate label" error be reported? There are two +ways to report such errors in Erlang: raise 'badarg' exceptions, +or return either `{ok,Result}` or `{error,Reason}` answers. I'm +really not at all sure what to do here. I ended up with 'raise +badarg' because that's what things like `binary_to_term/1` do. + +At the moment, I specify that the Erlang terms use UTF-8 and only +UTF-8. This is by far the simplest possibility. However, we +could certainly add + + {internal,Encoding} + +options to say what Encoding to use or assume for binaries. The +time to add that, I think, is when there is a demonstrated need. + +There are five "round trip" issues left: + +- all information about white space is lost. + This is not a problem, because it has no significance.
+ +- decimal->binary->decimal conversion of floating point numbers + may introduce error unless techniques like those described in + the Scheme report are used to do these conversions with high + accuracy. This is a general problem for Erlang, and a general + problem for JSON. + +- there is another JSON library for Erlang that always converts + integers outside the 32-bit range to floating point. This seems + like a bad idea. There are languages (Scheme, Common Lisp, + SWI Prolog, Smalltalk) with JSON libraries that have bignums. + Why put an arbitrary restriction on our ability to communicate + with them? Any JSON implementation that is unable to cope with + large integers as integers is (or should be) perfectly able to + convert such numbers to floating-point for itself. It seems + specially silly to do this when you consider that the program on + the other end might itself be in Erlang. So we expect that if T + is of type `json(binary(),integer())` then + + json_to_term(term_to_json(T), [{label,binary}]) + + should be identical to T, up to re-ordering of attribute pairs. + +- conversion of a string to a binary and then a binary to a + string will not always yield the same representation, but + what you get will represent the same string. Example, + "\u0041" will read as `<<65>>` which will display as "A". + +- Technically speaking the Unicode "surrogates" are not + characters. The RFC allows characters outside the Basic + Multilingual Plane to be written as UTF-8 sequences, or + to be written as 12-character \uHIGH\uLOWW surrogate pair + escapes. Something with a bare \uHIGH or \uLOWW surrogate + code point is not, technically speaking, a legal Unicode + string, so a UTF-8 sequence for such a code point should + not appear. A \uHIGH or \uLOWW escape sequence on its own + should not appear either; it would be just as much of a + syntax error as a byte with value 255 in a UTF-8 sequence.
+ We actually have two problems: + + * (a) Some languages may be sloppy and may allow singleton + surrogates inside strings. Should Erlang be equally + sloppy? Should this just be allowed? + + * (b) Some languages (and yes, I do mean Java) don't really + do UTF-8, but instead first break a sequence of Unicode + characters into 16-bit chunks (UTF-16) and then encode + the chunks as UTF-8, producing what is quite definitely + illegal UTF-8. Since there is a lot of Java code in the + world, how do we deal with this? + + Be generous in what you accept: the 'utf8' decoder + should quietly accept "UTF-Java", converting + separately encoded surrogates to a single numeric + code, and converting singleton surrogates _as if_ they + were characters. + + Be strict in what you generate: never generate + UTF-Java when the requested encoding is 'utf8'; + have a separate 'java' encoding that can be requested + instead. + +Hynek Vychodil is vehement that the only acceptable way to handle +JSON labels is as binaries. His argument against `{label,atom}` is +sound: as noted above, that option is only usable within a trust +boundary. His argument against `{label,existing_atom}` is that if +you convert a JSON form at one time in one node, and then store +the Erlang term in a file or send it across a wire or in any +other way make it available at another node or another time, +then it won't match the same JSON form converted at that time in +that node. This is true, but there are plenty of other round +trip issues as well. Data converted using `{float,true}` will not +match data converted using `{float,false}`. The handling of +duplicate labels may vary. The order of {key,value} pairs is +particularly likely to vary. For all programming languages and +libraries, if you want to move JSON data around in time or +space, the _only_ reliable way to do that is to move it _as_ +(possibly compressed) JSON data, not as something else. 
You +can expect a JSON form read at one time/place to be equivalent +to the same form read at another time/place; you cannot expect +it to be identical. Any code that does is essentially buggy, +whether `{label,existing_atom}` is used or not. Here is an +example that shows that the problem is ineradicable. + +Suppose we have the JSON form +"[0.123456789123456789123456789123456]". +Two Erlang nodes on different machines read this and +convert it to an Erlang term. One of them sends its term to +the other, which compares them. To its astonishment, they +are not identical! Why? Well, it could be that they use +different floating-point precisions. On one of Erlang's main +platforms, 128-bit floats are supported. (The example needs +128 bits.) On its other main platform, 80-bit floats are +supported. (In neither case am I saying that Erlang does, +only that the hardware does.) Indeed, modern versions of the +second platform usually work with 64-bit floats. Let us +suppose that they both stick with 64-bit floats instead. +What if one of the systems is an IBM/370 with its non-IEEE +doubles? So suppose they are both using IEEE 64-bit floats. +They will use different C libraries to do the initial +decimal-to-binary conversion, so the number may be rounded +differently. And if one is Windows and another is Linux or +Solaris, they WILL use different libraries. Should Erlang +use its own code (which might not be a bad idea), we would +still have trouble talking to machines with non-IEEE doubles, +which are still in use. Even Java, which originally wanted +to have bit-identical results everywhere, eventually retreated. + +There is one important issue for JSON generation, and that is +what white space should be generated. Since JSON is supposed to +be "human readable", it would be nice if it could be indented, +and if it could be kept to a reasonable line width. However, +appearances to the contrary, JSON has to be regarded as a binary +format.
There is no way to insert line breaks inside strings. +Javascript doesn't have any analogue of C's backslash-newline +continuation; it can always join the pieces with '+'. JSON has +inherited the lack (no line continuation) but not the remedy +(you may not use '+' in JSON). So a JSON form containing a +1000-character string cannot be fitted into 80-column lines; +it just cannot be done. + +The main thing I have not accounted for is the `{label,_}` +option of `json_to_term/2`. For normal Erlang purposes, it is +much nicer (and somewhat more efficient) to deal with + + [{name,<<"fred">>},{female,false},{age,65}] + +than with + + [{<<"name">>,<<"fred">>},{<<"female">>,false},{<<"age">>,65}] + +If you are communicating with a trusted source that deals with +a known small number of labels, fine. There are limits on the +number of atoms Erlang can deal with. A small test program +that looped creating atoms and putting them into a list ticked +over happily until shortly after its millionth atom, and then +hung there burning cycles apparently getting nowhere. Also, +the atom table is shared by all processes on an Erlang node, +so garbage collecting it is not as cheap as it might be. As +a system integrity measure, therefore, it is useful to have a +mode of operation in which json_to_term never creates atoms. +But Erlang offers a third possibility: there is a built-in +`list_to_existing_atom/1` function that returns an atom only if +that atom already exists. Otherwise it raises an exception. +So there are three cases: + +* `{label,binary}` + + Always convert labels to binaries. + This is always safe and always clumsy. + Since <<"xxx">> syntax exists in Erlang, + it isn't _that_ clumsy. It is uniform, + and stable, in that it does not depend + on whether Erlang atoms support Unicode or + not, or what other modules have been loaded. + +* `{label,atom}` + + Always convert labels to atoms if all their + characters are allowed in atoms, leave them + as binaries otherwise.
+ + This is more convenient for Erlang programming. + However, it is only really usable with a partner + that you trust. Since much communication takes + place within trust boundaries, it definitely has + a place. If this were not so, term_to_binary/1 + would be of no use! + +* `{label,existing_atom}` + + Convert labels that match the names of existing + atoms to those atoms, leave all others as binaries. + If a module mentions an atom, and goes looking for + that atom as a key, it will find it. This is safe + _and_ convenient. The only real issue with it is + that the same JSON term converted at different times + (in the same Erlang node) may be converted differently. + This usually won't matter. + +In previous drafts I selected `existing_atom` as the default, +because that's the option I like best. It's the one that would +most simplify the code that I would like to write. However, one +must also consider conversion issues. Some well considered +existing JSON libraries for Erlang always use binaries. + +There is no `{string,XXX}` option. That's because I see the +strings in JSON as "payload", as unpredictable data that are +being transmitted, that one does not _expect_ to match against. +This is in marked contrast with labels, which are "structure" +rather than data, and which one expects to match against a lot. +I did briefly consider a `{string,list|binary}` option, but these +days Erlang is so good at matching binaries that there didn't +seem to be much point. + +This raises a general issue about binaries. One of the reasons +for liking atoms as labels is that atoms are stored uniquely, +and binaries are not. This extends to `term_to_binary()`, which +compresses repeated references to identical atoms, but not +repeated references to equal binaries. There is no reason that +a C implementation of `json_to_term/[1,2]` could not keep track +of which labels have been seen and share references to repeated +ones. 
For example, + + [{"name":"root","command":"java","cpu":75.7}, + {"name":"ok","command":"iropt","cpu":1.5} + ] + +-- extracted from a run of the 'top' command showing that my +C compilation was getting a tiny fraction of the machine, +while some Java program run by root was getting the lion's share -- +would convert to Erlang as the equivalent of + + N = <<"name">>, + M = <<"command">>, + P = <<"cpu">>, + [[{N,<<"root">>},{M,<<"java">>}, {P,75.7}], + [{N,<<"ok">>}, {M,<<"iropt">>},{P, 1.5}] + ] + +getting much of the space saving that atoms would use. There is +of course no way for a pure Erlang program to detect whether such +sharing is happening or not. It would be nice if + + term_to_binary(json_to_term(JSON)) + +preserved such sharing. + +Another issue that has been raised concerns encoding. Some people +have said that they would like (a) to allow input encodings other +than UTF-8, (b) to have strings reported in their original +encoding, rather than UTF-8, so that (c) strings can be slices of +the original binary. What does the JSON specification actually +say? Section 3, Encoding: + +> JSON text SHALL be encoded in Unicode. +> The default encoding is UTF-8. + +This is not quite as clear as it might be. There is explicit +mention of UTF-32 and UTF-16 (both of them in big- and little- +endian forms). But is SCSU "Unicode"? Is BOCU? How about +[UTF-EBCDIC][5]? That's right, there is a legal way to encode +something in "Unicode" in which the JSON special characters +[]{},:\" do not have their ASCII values. There does not seem +to be any reason to suppose that this is forbidden, and on an +IBM mainframe I would expect it to be useful. Until the day +someone ports Erlang to a z/Series machine, this is mainly of +academic interest, but we don't want to paint ourselves into +any corners. + +Suppose we did represent strings in their native encoding. +What then? 
First, a string that contained an escape sequence +of any kind could not be held as a slice of the source anyway. +Nor could a string that spanned two or more chunks of the +IO_Data input. The really big problem is that there would be +no indication of what the encoding actually was, so that we +would end up regarding logically equal strings from different +sources as unequal and logically unequal strings as equal. + +I do not want to forbid strings in the result being slices of +an original binary. In the common case when the input is +UTF-8 and the string does not contain any escapes, so that it +_can_ be done, an implementation should definitely be free to +exploit that. As this EEP currently stands, it is. What we +cannot do is to _require_ such sharing, because it generally +won't work. + +It has been suggested to me that it might be better for the +result of `term_to_json/[1,2]` to be `iodata()` rather than a +`binary()`. Anything that would have accepted `iodata()` will be +happy with a `binary()`, so the question is whether it is better +for the implementation, whether perhaps there are chunks of stuff +that have to be copied using a `binary()` but can be shared using +`iodata()`. Thanks to the encoding issue, I don't really think so. +This might be a good time to point out why the encoding is done +here rather than somewhere else. If you know that you are +generating stuff that will be encoded into character set X, then +you can avoid generating characters that are not in that +character set. You can generate \u sequences instead. Of course +JSON itself requires UTF-8, but what if you are going to send it +through some other transport? With `{encoding,ascii}` you are out +of trouble all the way. So for now I am sticking with `binary()`. + +The final issue is whether these functions should go in the +erlang: module or in some other module (perhaps called json:). + +- If another module, then there is no barrier to adding other + functions. 
For example, we might offer functions to test + whether a term is a JSON term, or an IO_Data represents a JSON + term, or alternative functions that present results in some + canonical form. + +- If another module, then someone looking for a JSON module might + find one. + +- If another module, then this interface can easily be prototyped + without any modification to the core Erlang system. + +- If another module, then someone who doesn't need this feature + need not load it. - - There are JSON implementations in Erlang already; we know what - it is like to use such a thing, and we only need to settle the - fine details of the implementation. We know that it can be - implemented. Now we want something that is always there and - always the same and is as efficient as practical. +Conversely, - - In particular, we know that the feature is useful, and we know - that in applications where it is used, it will be used often, - so we want it to go about as fast as term_to_binary/1 and - binary_to_term/1. So we'd really like it to be implemented in - C, ideally inside the emulator. Erlang does not make dynamic - loading of foreign code modules easy. +- If another module, then it is too easy to bloat the interface. + We don't _need_ such testing functions, as we can always catch + the badarg exception from the existing ones. We don't _need_ + extra canonicalising functions, because we can add options to + the existing ones. Something that subtly encourages us to + keep the number of functions down is a Good Thing. + +- Every Erlang programmer ought to be familiar with the erlang: + module, and when looking for any feature, ought to start by + looking there. + +- There are JSON implementations in Erlang already; we know what + it is like to use such a thing, and we only need to settle the + fine details of the implementation. We know that it can be + implemented. Now we want something that is always there and + always the same and is as efficient as practical. 
+ +- In particular, we know that the feature is useful, and we know + that in applications where it is used, it will be used often, + so we want it to go about as fast as term_to_binary/1 and + `binary_to_term/1`. So we'd really like it to be implemented in + C, ideally inside the emulator. Erlang does not make dynamic + loading of foreign code modules easy. - It's a delicate balance. On the whole, I still think that putting - these functions in erlang: is a good idea, but more reasons on - both sides would be useful. +It's a delicate balance. On the whole, I still think that putting +these functions in erlang: is a good idea, but more reasons on +both sides would be useful. Backwards Compatibility +======================= - There are no term_to_json/N or json_to_term/N functions in - the erlang: module now, so adding them should not break - anything. These functions will NOT be automatically imported; - it will be necessary to use an explicit erlang: prefix. So - any existing code that uses these function names won't notice - any change. +There are no `term_to_json/N` or `json_to_term/N` functions in +the erlang: module now, so adding them should not break +anything. These functions will NOT be automatically imported; +it will be necessary to use an explicit erlang: prefix. So +any existing code that uses these function names won't notice +any change. Reference Implementation +======================== - None. +None. 
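Until a real implementation exists, the three `{label,_}` policies discussed in the Rationale can at least be sketched in ordinary Erlang. This is an illustration only, not part of the proposal: the module name `label_sketch` and the function `convert_label/2` are hypothetical, the sketch assumes that labels arrive as UTF-8 binaries, and it relies on the existing BIFs `binary_to_atom/2` and `binary_to_existing_atom/2`.

```erlang
-module(label_sketch).
-export([convert_label/2]).

%% Hypothetical helper, not part of this EEP's interface.

%% {label,binary}: never create atoms; always hand back the binary.
convert_label(Label, binary) when is_binary(Label) ->
    Label;
%% {label,atom}: make an atom whenever the BIF accepts the label,
%% otherwise fall back to the binary.
convert_label(Label, atom) when is_binary(Label) ->
    try binary_to_atom(Label, utf8)
    catch error:_ -> Label
    end;
%% {label,existing_atom}: reuse an atom only if it already exists,
%% so untrusted input can never grow the atom table.
convert_label(Label, existing_atom) when is_binary(Label) ->
    try binary_to_existing_atom(Label, utf8)
    catch error:_ -> Label
    end.
```

With this sketch, `label_sketch:convert_label(<<"name">>, binary)` gives back the binary unchanged, while the `existing_atom` clause gives an atom or a binary depending on what the node has already loaded, which is exactly the instability noted for that option.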
-References - - [1] The JSON web site, http://www.json.org/ - [2] The JSON RFC, http://www.ietf.org/rfc/rfc4627.txt - [3] The JSON RPC web site, http://www.json-rpc.org/ - [4] The JSON RPC 1.1 draft specification, - http://json-rpc.org/wd/JSON-RPC-1-1-WD-20060807.html - [5] Uniode technical report #16, UTF-EBCDIC, - http://unicode.org/reports/tr16/ - [6] CouchDB, http://incubator.apache.org/couchdb/ - and http://wiki.apache.org/couchdb/ - [7] rfc4627 module for Erlang from LShift, - www.lshift.net/blog/2007/02/17/json-and-json-rpc-for-erlang - [8] ECMA stanard 262, ECMAScript. +[1]: http://www.json.org/ + "The JSON web site" +[2]: http://www.ietf.org/rfc/rfc4627.txt + "The JSON RFC" +[3]: http://www.json-rpc.org/ + "The JSON RPC web site" +[4]: http://json-rpc.org/wd/JSON-RPC-1-1-WD-20060807.html + "The JSON RPC 1.1 draft specification" +[5]: http://unicode.org/reports/tr16/ + "Unicode technical report #16, UTF-EBCDIC" +[6]: http://incubator.apache.org/couchdb/ + "CouchDB" +[6b]: http://wiki.apache.org/couchdb/ + "CouchDB" +[7]: www.lshift.net/blog/2007/02/17/json-and-json-rpc-for-erlang + "rfc4627 module for Erlang from LShift" +[8]: http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-262.pdf + "ECMA standard 262, ECMAScript" + + Copyright +========= - This document has been placed in the public domain. +This document has been placed in the public domain.
-Local Variables: -mode: indented-text -indent-tabs-mode: nil -sentence-end-double-space: t -fill-column: 70 -coding: utf-8 -End: +[EmacsVar]: <> "Local Variables:" +[EmacsVar]: <> "mode: indented-text" +[EmacsVar]: <> "indent-tabs-mode: nil" +[EmacsVar]: <> "sentence-end-double-space: t" +[EmacsVar]: <> "fill-column: 70" +[EmacsVar]: <> "coding: utf-8" +[EmacsVar]: <> "End:" diff --git a/eeps/eep-0019.md b/eeps/eep-0019.md index ba07931..a114b2d 100644 --- a/eeps/eep-0019.md +++ b/eeps/eep-0019.md @@ -1,105 +1,109 @@ -EEP: 19 -Title: Comprehension multigenerators -Version: $Revision$ -Last-Modified: $Date$ -Author: Richard A. O'Keefe -Status: Draft -Type: Standards Track -Erlang-Version: R12B-4 -Content-Type: text/plain -Created: 14-Aug-2008 -Post-History: + Author: Richard A. O'Keefe + Status: Draft + Type: Standards Track + Erlang-Version: R12B-4 + Created: 14-Aug-2008 + Post-History: +**** +EEP 19: Comprehension multigenerators +---- Abstract +======== - Add Clean-inspired multi-sequence generators to comprehensions, - making code more intention-revealing and reducing the need to zip. +Add Clean-inspired multi-sequence generators to comprehensions, +making code more intention-revealing and reducing the need to zip. - This is related to EEP 12 [1], but is independent of it. +This is related to [EEP 12][], but is independent of it. Specification +============= - Currently, Erlang has +Currently, Erlang has - Pattern <- Expr + Pattern <- Expr - to enumerate over the elements of a single list and +to enumerate over the elements of a single list and - Pattern <= Expr + Pattern <= Expr - to enumerate over a binary. EEP 12 [1] adds +to enumerate over a binary. 
[EEP 12][] adds - Pattern [<-] List - Pattern {<-} Tuple - Pattern <<<->> Binary + Pattern [<-] List + Pattern {<-} Tuple + Pattern <<<->> Binary - This proposal changes that to +This proposal changes that to - generator: term_generator | binary_generator; - binary_generator: pattern '<=' expression; - term_generator: term_generator '&&' term_generator - | pattern '<-' expression; + generator: term_generator | binary_generator; + binary_generator: pattern '<=' expression; + term_generator: term_generator '&&' term_generator + | pattern '<-' expression; - if we otherwise stick with current Erlang, or +if we otherwise stick with current Erlang, or - generator: term_generator | binary_generator; - binary_generator: pattern '<=' expression - | pattern '<<' '<-' '>>' expression; - term_generator: term_generator '&&' term_generator - | pattern '<-' expression - | pattern '[' '<-' ']' expression - | pattern '{' '<-' '}' expression; + generator: term_generator | binary_generator; + binary_generator: pattern '<=' expression + | pattern '<<' '<-' '>>' expression; + term_generator: term_generator '&&' term_generator + | pattern '<-' expression + | pattern '[' '<-' ']' expression + | pattern '{' '<-' '}' expression; - if we go with EEP 12. +if we go with [EEP 12][]. - Roughly speaking, ignoring errors and side effects, - the effect of P1 <- E1 && ... Pn <- En - is the effect of {P1,...,Pn} <- zip(E1, ..., En) - where +Roughly speaking, ignoring errors and side effects, +the effect of `P1 <- E1 && ... Pn <- En` +is the effect of `{P1,...,Pn} <- zip(E1, ..., En)` +where - zip([X1|Xs1], ..., [Xn|Xsn]) -> - [{X1,...,Xn} | zip(Xs1, ..., Xsn)]; - zip([], ..., []) -> - []. + zip([X1|Xs1], ..., [Xn|Xsn]) -> + [{X1,...,Xn} | zip(Xs1, ..., Xsn)]; + zip([], ..., []) -> + []. - However, it is expected that there will NOT be any extra list - or tuples created by the implementation; this specifies the - effect but NOT how it is to be implemented. 
+However, it is expected that there will NOT be any extra list +or tuples created by the implementation; this specifies the +effect but NOT how it is to be implemented. - The effect of a term generator using the new notations of EEP 12 - is that which would be obtained by first replacing - P {<-} E with P <- tuple_to_list(E) - P [<-] E with P <- E - and then applying the translation above. +The effect of a term generator using the new notations of EEP 12 +is that which would be obtained by first replacing - In the presence of errors, the behaviour of && is not precisely - the same as using zip. We need to specify the actual behaviour - more precisely. For brevity, I ignore binary enumeration. Both - tuple enumeration and tuple comprehension are currently defined - by rewriting to plain list comprehension, so that's all we need - to worry about for now. + P {<-} E with P <- tuple_to_list(E) + P [<-] E with P <- E - A list comprehension has the form [E || C1, ..., Cn] - where each Ci is - - a generator Pattern <- List_Expression - - a binding Pattern = Any_Expression - - a "guard" Other_Expression that should give true or false. - This acts like +and then applying the translation above. - R = [], - <| E || [C1, ..., Cn] |>(R), - reverse(R) +In the presence of errors, the behaviour of && is not precisely +the same as using zip. We need to specify the actual behaviour +more precisely. For brevity, I ignore binary enumeration. Both +tuple enumeration and tuple comprehension are currently defined +by rewriting to plain list comprehension, so that's all we need +to worry about for now. - where +A list comprehension has the form `[E || C1, ..., Cn]` +where each Ci is - <| E || [] |>(R) - => R = [E | R] % reassign R +- a generator `Pattern <- List_Expression` +- a binding `Pattern = Any_Expression` +- a "guard" `Other_Expression` that should give true or false. 
+ +This acts like + + R = [], + <| E || [C1, ..., Cn] |>(R), + reverse(R) - <| E || [Pi <- Ei|Cs] |>(R) +where + + <| E || [] |>(R) + => R = [E | R] % reassign R + + <| E || [Pi <- Ei|Cs] |>(R) => Ti = Ei Label: case Ti of [Pi|X] -> Ti = X % reassign Ti @@ -109,38 +113,38 @@ Specification goto Label ; [] -> ok end - - <| E || [Pi = Ei|Cs] |>(R) + + <| E || [Pi = Ei|Cs] |>(R) => case Ei of Pi -> <| E || Cs |>(R) ; _ -> ok end - - <| E || [Ei|Cs] |>(R) + + <| E || [Ei|Cs] |>(R) => case Ei of true -> <| E || Cs |>(R) ; false -> ok end - - In these translations, pattern matching syntax is used, with the - intent that the variables which are unbound according to the - normal rules of Erlang, and thus get bound by the Pi <- or Pi = - matching, are treated *as if* unbound in the code to be generated, - ignoring whatever values they might previous have had. That also - applies when R or Ti appears on the left of a pattern match; the - fact that the variable really was bound is ignored and a simple - assignment is done. + +In these translations, pattern matching syntax is used, with the +intent that the variables which are unbound according to the +normal rules of Erlang, and thus get bound by the Pi <- or Pi = +matching, are treated *as if* unbound in the code to be generated, +ignoring whatever values they might previously have had. That also +applies when R or Ti appears on the left of a pattern match; the +fact that the variable really was bound is ignored and a simple +assignment is done.
+ +This does involve (re-)assignment to local variables in the code +to be generated, but it does NOT involve user-visible assignment +and it does NOT involve mutable data structures. It is no more +problematic for the language or the runtime system than reusing a +dead register is. + +Handling multi-list enumeration is a simple, albeit schematic, +change to the rule for enumeration. + + <| E || [Pi1 <- Ei1 && Pi2 <- Ei2 && ... && Pik <- Eik|Cs] |>(R) => Ti1 = Ei1 ... Tik = Eik @@ -158,180 +162,184 @@ Specification ; {[], ..., []} -> ok end - - Note that the use of tuple syntax in the case expression and the - case clauses does not imply the literal creation of a tuple in - the generated code, only that k values are to be matched against - k patterns in each case clause. + +Note that the use of tuple syntax in the case expression and the +case clauses does not imply the literal creation of a tuple in +the generated code, only that k values are to be matched against +k patterns in each case clause. Motivation +========== - "How do I iterate over several lists at once?" is a moderately - common question from Erlang and Haskell beginners. The stock - answer, "use zip", is almost tolerable for Haskell, where the - the zipping family goes up to 7 lists and the compiler works - hard to eliminate the intermediate data structures by using - deforestation. For Erlang, where even zip4 is missing, and - where the apparent cost of creating the unwanted list and - tuples is all too real, the fact that the use of zips makes - the code harder to read means that there is no good to - outweigh the bad. +"How do I iterate over several lists at once?" is a moderately +common question from Erlang and Haskell beginners. The stock +answer, "use zip", is almost tolerable for Haskell, where the +zipping family goes up to 7 lists and the compiler works +hard to eliminate the intermediate data structures by using +deforestation.
For Erlang, where even zip4 is missing, and +where the apparent cost of creating the unwanted list and +tuples is all too real, the fact that the use of zips makes +the code harder to read means that there is no good to +outweigh the bad. - With the new notation, +With the new notation, - zip4(As, Bs, Cs, Ds) -> - [{A,B,C,D} || A <- As && B <- Bs && C <- Cs && D <- Ds]. + zip4(As, Bs, Cs, Ds) -> + [{A,B,C,D} || A <- As && B <- Bs && C <- Cs && D <- Ds]. - zipwith4(F, As, Bs, Cs, Ds) -> - [F(A,B,C,D) || A <- As && B <- Bs && C <- Cs && D <- Ds]. + zipwith4(F, As, Bs, Cs, Ds) -> + [F(A,B,C,D) || A <- As && B <- Bs && C <- Cs && D <- Ds]. - dot(Xs, Ys) -> - sum([X*Y || X <- Xs && Y <- Ys]). + dot(Xs, Ys) -> + sum([X*Y || X <- Xs && Y <- Ys]). - ifelse(Tests, Xs, Ys) -> % Simulate R's ifelse(,,) - [ case T of true -> X ; false -> Y end - || T <- Tests && X <- Xs && Y <- Ys - ]. + ifelse(Tests, Xs, Ys) -> % Simulate R's ifelse(,,) + [ case T of true -> X ; false -> Y end + || T <- Tests && X <- Xs && Y <- Ys + ]. 
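For comparison, here is roughly how `dot/2` and `ifelse/3` have to be written without the new notation. This is a sketch under stated assumptions: the module name `multigen_today` is hypothetical, and only the existing library functions `lists:zipwith/3`, `lists:zipwith3/4` and `lists:sum/1` are used.

```erlang
-module(multigen_today).
-export([dot/2, ifelse/3]).

%% dot/2 without '&&': lists:zipwith/3 walks both lists in step
%% and raises an error if the lists differ in length.
dot(Xs, Ys) ->
    lists:sum(lists:zipwith(fun(X, Y) -> X * Y end, Xs, Ys)).

%% ifelse/3 without '&&': lists:zipwith3/4 does the same job
%% for three lists; the fun selects X or Y according to the test.
ifelse(Tests, Xs, Ys) ->
    lists:zipwith3(fun(true,  X, _) -> X;
                      (false, _, Y) -> Y
                   end, Tests, Xs, Ys).
```

Both versions work, but the element-wise expression is buried in a fun and the three-list case already needs a differently named library function, which is the readability cost the proposed notation is meant to remove.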
- This code from module dialyzer_dep +This code from module `dialyzer_dep` - merge_outs([#output{type=list, content=L1}|Left], - #output{type=list, content=L2}) -> - NewList = [merge_outs([X, Y]) || {X, Y} <- lists:zip(L1, L2)], - merge_outs(Left, output(NewList)); + merge_outs([#output{type=list, content=L1}|Left], + #output{type=list, content=L2}) -> + NewList = [merge_outs([X, Y]) || {X, Y} <- lists:zip(L1, L2)], + merge_outs(Left, output(NewList)); - would become +would become - merge_outs([#output{type=list, content=L1}|Left], - #output{type=list, content=L2]) -> - merge_outs(Left, output( - [merge_outs([X,Y]) || X <- L1 && Y <- L2])); + merge_outs([#output{type=list, content=L1}|Left], + #output{type=list, content=L2}) -> + merge_outs(Left, output( + [merge_outs([X,Y]) || X <- L1 && Y <- L2])); - This code from forward_args/3 in module dialyzer_dataflow +This code from `forward_args/3` in module `dialyzer_dataflow` - NewArgTypes = [t_sup(X, Y) || - {X, Y} <- lists:zip(ArgTypes, OldArgTypes)], + NewArgTypes = [t_sup(X, Y) || + {X, Y} <- lists:zip(ArgTypes, OldArgTypes)], - would become +would become - NewArgTypes = [t_sup(X, Y) || X <- ArgTypes && Y <- OldArgTypes], + NewArgTypes = [t_sup(X, Y) || X <- ArgTypes && Y <- OldArgTypes], Rationale +========= - This is a case where no invention is required, really. - Clean has +This is a case where no invention is required, really. +Clean has - Qualifier = Generators {|Guard} - Generators = {Generator}-list - | Generator {& Generator} - Generator = Selector <- ListExpr // lazy list - | Selector <|- ListExpr // overloaded list - | Selector <-: ArrayExpr // array + Qualifier = Generators {|Guard} + Generators = {Generator}-list + | Generator {& Generator} + Generator = Selector <- ListExpr // lazy list + | Selector <|- ListExpr // overloaded list + | Selector <-: ArrayExpr // array - All I have to do is bend this a little to fit it into Erlang - syntax.
Since we use "||" for list comprehensions, "&&" was - the obvious spelling for generators that step together. +All I have to do is bend this a little to fit it into Erlang +syntax. Since we use "||" for list comprehensions, "&&" was +the obvious spelling for generators that step together. - I do not yet understand in detail what the Erlang compiler - does, but it seems to involve generating an auxiliary function. - Let's take - [f(X) || X <- Xs, X > 0] - as an example. This seems to be compiled as +I do not yet understand in detail what the Erlang compiler +does, but it seems to involve generating an auxiliary function. +Let's take - foo(Xs) - where + [f(X) || X <- Xs, X > 0] - foo([X|Xs]) when X > 0 -> [f(X) | foo(Xs)]; - foo([_|Xs]) -> foo(Xs); - foo([]) -> []. +as an example. This seems to be compiled as - With a multi-sequence generator, the translation is similar. + foo(Xs) - [g(X, Y) || X <- Xs && Y <- Ys, X > Y] - - can be compiled as +where - bar(Xs, Ys) + foo([X|Xs]) when X > 0 -> [f(X) | foo(Xs)]; + foo([_|Xs]) -> foo(Xs); + foo([]) -> []. - where +With a multi-sequence generator, the translation is similar. - bar([X|Xs], [Y|Ys]) when X > Y -> - [g(X, Y) | bar(Xs, Ys)]; - bar([_|Xs], [_|Ys]) -> bar(Xs, Ys); - bar([], []) -> []. + [g(X, Y) || X <- Xs && Y <- Ys, X > Y] - The specification above gives the kind of translation I would like - to see; I do have an implementation in mind (based on Pop-2) that - doesn't need the reversal but don't know how it would fit in BEAM. +can be compiled as - One obvious question is whether we need this at all. Why not just - get people to write calls to lists:zip and get the compiler to - optimise them? One answer is that this notation is much clearer; - the programmer's *intent* is to advance along two or more lists - at the same time, not to create a list of pairs. When you want to - create a list of pairs, lists:zip/2 is the perfect way to do it. 
- A more important answer is that the proposed notation is NOT a - simple optimisation of equivalent code using lists:zip/2. + bar(Xs, Ys) - [E || {P,Q} <- lists:zip(A, B)] % "zip version" +where - fails at once if A and B are not proper lists of the same length. + bar([X|Xs], [Y|Ys]) when X > Y -> + [g(X, Y) | bar(Xs, Ys)]; + bar([_|Xs], [_|Ys]) -> bar(Xs, Ys); + bar([], []) -> []. - [E || P <- A && Q <- B] % "Clean version" +The specification above gives the kind of translation I would like +to see; I do have an implementation in mind (based on Pop-2) that +doesn't need the reversal but don't know how it would fit in BEAM. - eventually fails if A and B are not proper lists of the same - length, but may have evaluated E (which may have had side effects) - many times before that. So an Erlang compiler would not be - allowed to replace the "zip version" by the "Clean version" unless - it could prove both that A and B were lists (which may be within - the abilities of the Dialyzer) and that they were exactly the same - length (which as far as I know isn't). +One obvious question is whether we need this at all. Why not just +get people to write calls to `lists:zip` and get the compiler to +optimise them? One answer is that this notation is much clearer; +the programmer's *intent* is to advance along two or more lists +at the same time, not to create a list of pairs. When you want to +create a list of pairs, `lists:zip/2` is the perfect way to do it. +A more important answer is that the proposed notation is NOT a +simple optimisation of equivalent code using `lists:zip/2`. - However, a multi-sequence generator and a single-sequence one - using calls to lists:zip/2 are clearly *similar*, so they should - eventually react to lists of different length the same way. - In Haskell, zipping two lists of different length acts as if the - longer were truncated to the length of the shorter. 
Since - Haskell has lazy evaluation, lists may be infinite, so you can't - afford to wait until the end to start a comprehension. Since - Erlang is strict, and since mistakes are common, lists:zip/2 in - Erlang makes sense as it is. + [E || {P,Q} <- lists:zip(A, B)] % "zip version" +fails at once if A and B are not proper lists of the same length. + [E || P <- A && Q <- B] % "Clean version" -Backwards Compatibility +eventually fails if A and B are not proper lists of the same +length, but may have evaluated E (which may have had side effects) +many times before that. So an Erlang compiler would not be +allowed to replace the "zip version" by the "Clean version" unless +it could prove both that A and B were lists (which may be within +the abilities of the Dialyzer) and that they were exactly the same +length (which as far as I know isn't). - The "operator" '&&' is not legal syntax anywhere in Erlang - at the moment, so no existing code can be affected. +However, a multi-sequence generator and a single-sequence one +using calls to `lists:zip/2` are clearly *similar*, so they should +eventually react to lists of different length the same way. +In Haskell, zipping two lists of different length acts as if the +longer were truncated to the length of the shorter. Since +Haskell has lazy evaluation, lists may be infinite, so you can't +afford to wait until the end to start a comprehension. Since +Erlang is strict, and since mistakes are common, `lists:zip/2` in +Erlang makes sense as it is. -Reference Implementation +Backwards Compatibility +======================= - None yet, but I'd like to do it when I can figure out how. +The "operator" '&&' is not legal syntax anywhere in Erlang +at the moment, so no existing code can be affected. -References - - [1] EEP 12, http://www.erlang.org/eeps/eep-0012.html +Reference Implementation +======================== +None yet, but I'd like to do it when I can figure out how. 
-Copyright - This document has been placed in the public domain. +[EEP 12]: eep-0012.md + "EEP 12" -Local Variables: -mode: indented-text -indent-tabs-mode: nil -sentence-end-double-space: t -fill-column: 70 -coding: utf-8 -End: +Copyright +========= + +This document has been placed in the public domain. +[EmacsVar]: <> "Local Variables:" +[EmacsVar]: <> "mode: indented-text" +[EmacsVar]: <> "indent-tabs-mode: nil" +[EmacsVar]: <> "sentence-end-double-space: t" +[EmacsVar]: <> "fill-column: 70" +[EmacsVar]: <> "coding: utf-8" +[EmacsVar]: <> "End:" diff --git a/eeps/eep-0020.md b/eeps/eep-0020.md index 05aded1..53b9035 100644 --- a/eeps/eep-0020.md +++ b/eeps/eep-0020.md @@ -1,258 +1,260 @@ -EEP: 20 -Title: Split the atoms! -Version: $Revision$ -Last-Modified: $Date$ -Author: Richard A. O'Keefe -Status: Draft -Type: Standards Track -Erlang-Version: R12B-4 -Content-Type: text/plain -Created: 05-Aug-2008 - -Post-History: + Author: Richard A. O'Keefe + Status: Draft + Type: Standards Track + Erlang-Version: R12B-4 + Created: 05-Aug-2008 + Post-History: +**** +EEP 20: Split the atoms! +---- Abstract +======== - An idea from the Logix implementation of Flat Concurrent Prolog - can be adapted to Erlang: invisibly to users there can be two - implementations of 'atoms', fixing a major system integrity - issue and removing the need to warp one's data structure design - to code around it. +An idea from the Logix implementation of Flat Concurrent Prolog +can be adapted to Erlang: invisibly to users there can be two +implementations of 'atoms', fixing a major system integrity +issue and removing the need to warp one's data structure design +to code around it. Specification - - There are no user-visible changes to the Erlang language or - libraries. Interfaces between Erlang and other languages such - as C may need to be changed. 
- - We split atoms into two classes: "global" atoms are those atoms - which either appear in the post-preprocessing text of some loaded - module or are the registered name of any process; "local" atoms - are all others which a process creates. - - A local atom is represented by a data structure SUCH AS - - +----------+ - | size+tag | boxed object header; see below - +----------+ - | hashcode | a 32-bit hash code - +----------+ - | equivrep | points to Union/Find representative - +----------+ - | bytes of | - | name ... | - +----------+ - - As usual, the size+tag contains a 2 bit tag to say it is an - IMMED2 object, a 4-bit subtag to say what kind (I propose - 1011), and a 26-bit arity. However, the arity field is - split into two subfields: - - +--------------+------------+----+--+ - | byte count | char count |LATM|BX| - +--------------+------------+----+--+ - 14 12 4 2 size in bits - - The char count says how many Unicode characters there are in - the name. The byte count says how many bytes those characters - are stored in. For compactness and backwards compatibility, - an atom whose name consists only of Latin-1 characters has - byte count = char count and name represented as Latin-1; atoms - with names outside that range are held in some other form - _such as_ UTF-8, SCSU, BOCU, or what have you. This proposal - is not specifically about encoding schemes; all I have to say - here is that it should be the same for all atoms and it should - be at least as good as UTF-8. - - The hash code field is a 32-bit hash code. Again, I have - nothing to say about atom hashes as such except to say that - the method should be the same for all atoms in all processes - on a node and that it should be a good one. Advice about - good hashing functions is hard to find. hashpjw() can be - improved on. I heartily recommend Valloud's book [1]. - - The equivrep field is a pointer. It always points to an atom, - which may be a global atom or a local atom. 
Initially, it points - to the local atom itself. When a local atom is compared with - another local atom, - - first, check the header fields to see if they match - second, check the hash codes to see if they match - finally, check the bytes of the names. - - But this is also combined with Union/Find, very much like - binding variables in Prolog. So we "dereference" (chase the - equivrep fields) after the second step, and if we end up at - the same place, the two local atoms are equal. And if two - physically distinct local atoms do turn out equal, we make - the younger one (the one most recently created) point to the - older one. - - Global atoms should have a similar representation; I suggest that - the representation of a local atom should be embedded in the - representation of a global atom, so that local atoms can be - compared with global atoms as if they were both local. - - Atoms returned by list_to_existing_atom/1 are always global atoms. - Atoms returned by list_to_atom/1 or binary_to_term/1 are global - atoms if and only if they are already existing global atoms, - otherwise they are local atoms. - - Interfaces provided to other languages, such as C or Java, should - leave existing atom-creation operations returning global atoms, - and should add operations for creating local atoms. - - When a process is garbage collected, a pointer to a local atom is - replaced by that local atom's equivrep, so that processes that - have ever noticed they have duplicate local atoms don't keep them - forever. +============= + +There are no user-visible changes to the Erlang language or +libraries. Interfaces between Erlang and other languages such +as C may need to be changed. + +We split atoms into two classes: "global" atoms are those atoms +which either appear in the post-preprocessing text of some loaded +module or are the registered name of any process; "local" atoms +are all others which a process creates. 
+ +A local atom is represented by a data structure SUCH AS + + +----------+ + | size+tag | boxed object header; see below + +----------+ + | hashcode | a 32-bit hash code + +----------+ + | equivrep | points to Union/Find representative + +----------+ + | bytes of | + | name ... | + +----------+ + +As usual, the size+tag contains a 2 bit tag to say it is an +IMMED2 object, a 4-bit subtag to say what kind (I propose +1011), and a 26-bit arity. However, the arity field is +split into two subfields: + + +--------------+------------+----+--+ + | byte count | char count |LATM|BX| + +--------------+------------+----+--+ + 14 12 4 2 size in bits + +The char count says how many Unicode characters there are in +the name. The byte count says how many bytes those characters +are stored in. For compactness and backwards compatibility, +an atom whose name consists only of Latin-1 characters has +byte count = char count and name represented as Latin-1; atoms +with names outside that range are held in some other form +_such as_ UTF-8, SCSU, BOCU, or what have you. This proposal +is not specifically about encoding schemes; all I have to say +here is that it should be the same for all atoms and it should +be at least as good as UTF-8. + +The hash code field is a 32-bit hash code. Again, I have +nothing to say about atom hashes as such except to say that +the method should be the same for all atoms in all processes +on a node and that it should be a good one. Advice about +good hashing functions is hard to find. `hashpjw()` can be +improved on. I heartily recommend [Valloud's book][1]. + +The equivrep field is a pointer. It always points to an atom, +which may be a global atom or a local atom. Initially, it points +to the local atom itself. When a local atom is compared with +another local atom, + +* first, check the header fields to see if they match +* second, check the hash codes to see if they match +* finally, check the bytes of the names. 
+ +But this is also combined with Union/Find, very much like +binding variables in Prolog. So we "dereference" (chase the +equivrep fields) after the second step, and if we end up at +the same place, the two local atoms are equal. And if two +physically distinct local atoms do turn out equal, we make +the younger one (the one most recently created) point to the +older one. + +Global atoms should have a similar representation; I suggest that +the representation of a local atom should be embedded in the +representation of a global atom, so that local atoms can be +compared with global atoms as if they were both local. + +Atoms returned by `list_to_existing_atom/1` are always global atoms. +Atoms returned by `list_to_atom/1` or `binary_to_term/1` are global +atoms if and only if they are already existing global atoms, +otherwise they are local atoms. + +Interfaces provided to other languages, such as C or Java, should +leave existing atom-creation operations returning global atoms, +and should add operations for creating local atoms. + +When a process is garbage collected, a pointer to a local atom is +replaced by that local atom's equivrep, so that processes that +have ever noticed they have duplicate local atoms don't keep them +forever. Motivation +========== - There are a number of problems that limit the usefulness - of Erlang atoms. - - The first is that atom size is limited to 255 bytes, - which makes Erlang atoms of very little use for file names, - as C's FILENAME_MAX is typically 1024 these days. - - The second is that atoms are limited to Latin-1 characters. - We really do want full Unicode support for them, not so - much for programmers to write atoms in strange scripts in - their source code as to allow information to flow _through_ - an Erlang system as atoms. - - Those two are minor problems. - - The major problem is the atom table. - - It is a global resource, which means that on an SMP system - there has to be a lot of locking and unlocking. 
This proposal - doesn't include a new "always return a local atom" operation, - but it creates the possibilities for new operations like that - which require no locking. - - The atom table is limited, in atom.c, to ATOM_LIMIT=1024*1024 - entries. Even on a 32-bit system, this is smaller than a - machine could support; it is an arbitrary limit, and such limits - are always a problem. - - The atom table is not garbage collected. Once an atom has been - created, it says created. Historic Prolog systems, like Quintus - Prolog, did the same thing. Back in 1984 this was recognised as - a problem, especially for programs that wanted to access large - volumes of stored data. Modern Prolog systems, like SWI Prolog, - do collect atoms; SWI Prolog would not be nearly so useful for - manipulating large collections of RDF data if it were otherwise. - This proposal does not add garbage collection for the atom table; - what it does is to stop most of the atoms that would have been - collected ever entering that table in the first place. - - Filling up the atom table crashes or hangs the entire node. - - This means that it is far too easy to crash or hash Erlang - software by feeding it too many atoms. - - And _that_ means that Erlang programmers who would like to use - atoms in data structures (as keys in dictionaries, say) use - binaries instead: binaries are not limited in size or number, - can hold UTF-8 if you want them to, are garbage collected, and - are generally safer to use. - - While this proposal makes atoms more _convenient_ to use (they - may be longer, more numerous, and may contain Unicode), the - real point is to make atoms _safer_ to use. If you can - stream data from source through an Erlang process, mapping - external "strings" to binaries, you will be able to do the - same thing just as safely mapping them to atoms. +There are a number of problems that limit the usefulness +of Erlang atoms. 
+The first is that atom size is limited to 255 bytes, +which makes Erlang atoms of very little use for file names, +as C's `FILENAME_MAX` is typically 1024 these days. +The second is that atoms are limited to Latin-1 characters. +We really do want full Unicode support for them, not so +much for programmers to write atoms in strange scripts in +their source code as to allow information to flow _through_ +an Erlang system as atoms. -Rationale +Those two are minor problems. + +The major problem is the atom table. + +It is a global resource, which means that on an SMP system +there has to be a lot of locking and unlocking. This proposal +doesn't include a new "always return a local atom" operation, +but it creates the possibilities for new operations like that +which require no locking. + +The atom table is limited, in atom.c, to `ATOM_LIMIT=1024*1024` +entries. Even on a 32-bit system, this is smaller than a +machine could support; it is an arbitrary limit, and such limits +are always a problem. + +The atom table is not garbage collected. Once an atom has been +created, it stays created. Historic Prolog systems, like Quintus +Prolog, did the same thing. Back in 1984 this was recognised as +a problem, especially for programs that wanted to access large +volumes of stored data. Modern Prolog systems, like SWI Prolog, +do collect atoms; SWI Prolog would not be nearly so useful for +manipulating large collections of RDF data if it were otherwise. +This proposal does not add garbage collection for the atom table; +what it does is to stop most of the atoms that would have been +collected ever entering that table in the first place. + +Filling up the atom table crashes or hangs the entire node. - Erlang is not the first language to face these problems. - It isn't even the first concurrent language to face them. - Flat Concurrent Prolog was there first, and while I have - not seen the Logix source code, the idea was explained in - Logix documentation many years ago.
I know this *can* - work because it *did* work. - - Logix used this approach for all atoms; eventually, I - believe Erlang will need to as well in order to handle - thousands of processors without lots of locks. Right now, - it makes sense to keep on using the old representation for - fairly "static" atoms. In particular, we would like module - and function names (and frame keys when we have them) to be - just the way they are now. If an application is loaded after a - local atom has been created, we may find that it is a module - name or function name after all; this is one of the reasons - for the equivrep field. Once it's noticed, the duplication - won't survive another garbage collection. - - The current 'global atom' representation has a hack to make - term comparison faster. For simplicity I have not described - it above, because that's orthogonal to the issues this EEP is - concerned with. I note (a) that for the ord0 field to - continue in its present form, the encoding would best be - UTF-8 or BOCU, and (b) to keep the compactness of the Latin-1 - atoms, the ord0 field should be the first 31 bits that *would* - have been stored had the atom been stored in whichever of - UTF-8 or BOCU is chosen. I also note (c) that if you don't - allow "native" byte ordering to dictate the order in which the - bytes of an atom's name are stored, you don't *need* a special - ord0 field. - - I should confess that this proposal doesn't _entirely_ avoid the - crashes and hangs problem. If an Erlang system can be persuaded - to load modules from an untrustworthy source, it can still be - made to try to create enough atoms to get into trouble. This is - one of the reasons that I think Erlang will eventually have to - abandon the global atom table. However, anyone who loads modules - - from untrustworthy sources should KNOW they are doing that; it is - an obviously dangerous thing to do. 
list_to_atom/1 is NOT an - obviously dangerous function, and it should not be any more - dangerous than list_to_binary/1. +This means that it is far too easy to crash or hang Erlang +software by feeding it too many atoms. + +And _that_ means that Erlang programmers who would like to use +atoms in data structures (as keys in dictionaries, say) use +binaries instead: binaries are not limited in size or number, +can hold UTF-8 if you want them to, are garbage collected, and +are generally safer to use. + +While this proposal makes atoms more _convenient_ to use (they +may be longer, more numerous, and may contain Unicode), the +real point is to make atoms _safer_ to use. If you can +stream data from source through an Erlang process, mapping +external "strings" to binaries, you will be able to do the +same thing just as safely mapping them to atoms. + + + +Rationale +========= + +Erlang is not the first language to face these problems. +It isn't even the first concurrent language to face them. +Flat Concurrent Prolog was there first, and while I have +not seen the Logix source code, the idea was explained in +Logix documentation many years ago. I know this *can* +work because it *did* work. + +Logix used this approach for all atoms; eventually, I +believe Erlang will need to as well in order to handle +thousands of processors without lots of locks. Right now, +it makes sense to keep on using the old representation for +fairly "static" atoms. In particular, we would like module +and function names (and frame keys when we have them) to be +just the way they are now. If an application is loaded after a +local atom has been created, we may find that it is a module +name or function name after all; this is one of the reasons +for the equivrep field. Once it's noticed, the duplication +won't survive another garbage collection. + +The current 'global atom' representation has a hack to make +term comparison faster.
For simplicity I have not described +it above, because that's orthogonal to the issues this EEP is +concerned with. I note (a) that for the ord0 field to +continue in its present form, the encoding would best be +UTF-8 or BOCU, and (b) to keep the compactness of the Latin-1 +atoms, the ord0 field should be the first 31 bits that *would* +have been stored had the atom been stored in whichever of +UTF-8 or BOCU is chosen. I also note (c) that if you don't +allow "native" byte ordering to dictate the order in which the +bytes of an atom's name are stored, you don't *need* a special +ord0 field. + +I should confess that this proposal doesn't _entirely_ avoid the +crashes and hangs problem. If an Erlang system can be persuaded +to load modules from an untrustworthy source, it can still be +made to try to create enough atoms to get into trouble. This is +one of the reasons that I think Erlang will eventually have to +abandon the global atom table. However, anyone who loads modules + +from untrustworthy sources should KNOW they are doing that; it is +an obviously dangerous thing to do. `list_to_atom/1` is NOT an +obviously dangerous function, and it should not be any more +dangerous than `list_to_binary/1`. Backwards Compatibility +======================= - No existing code (outside the Erlang implementation) - should be affected in the slightest. +No existing code (outside the Erlang implementation) +should be affected in the slightest. Reference Implementation +======================== + +None. The change is simple in concept, but affects several +atoms in the core of the system. - None. The change is simple in concept, but affects several - atoms in the core of the system. +[1]: http://www.lulu.com/content/1455536 + "Hashing in Smalltalk: Theory and Practice, Andrés Valloud" -References - - [1] Hashing in Smalltalk: Theory and Practice - AndrŽs Valloud, - http://www.lulu.com/content/1455536 Copyright +========= - This document has been placed in the public domain. 
+This document has been placed in the public domain. -Local Variables: -mode: indented-text -indent-tabs-mode: nil -sentence-end-double-space: t -fill-column: 70 -coding: utf-8 -End: +[EmacsVar]: <> "Local Variables:" +[EmacsVar]: <> "mode: indented-text" +[EmacsVar]: <> "indent-tabs-mode: nil" +[EmacsVar]: <> "sentence-end-double-space: t" +[EmacsVar]: <> "fill-column: 70" +[EmacsVar]: <> "coding: utf-8" +[EmacsVar]: <> "End:" diff --git a/eeps/eep-0021.md b/eeps/eep-0021.md index c765f07..fdd15f3 100644 --- a/eeps/eep-0021.md +++ b/eeps/eep-0021.md @@ -1,172 +1,177 @@ -EEP: 21 -Title: Optional trailing commas for lists and tuples -Version: $Revision$ -Last-Modified: $Date$ -Author: Richard A. O'Keefe -Status: Draft -Type: Standards Track -Erlang-Version: R12B-4 -Content-Type: text/plain -Created: 08-Aug-2008 -Post-History: + Author: Richard A. O'Keefe + Status: Draft + Type: Standards Track + Erlang-Version: R12B-4 + Created: 08-Aug-2008 + Post-History: +**** +EEP 21: Optional trailing commas for lists and tuples +---- Abstract +======== - Allow an extra comma at the end of a list or tuple. - Darren New proposed this change; Richard O'Keefe, who doesn't - like it very much, wrote it up as an EEP. +Allow an extra comma at the end of a list or tuple. +Darren New proposed this change; Richard O'Keefe, who doesn't +like it very much, wrote it up as an EEP. Specification +============= - A list that would have ended with ",X]" for some term X - may instead end with ",X,]". - A tuple that would have ended with ",X}" for some term X - may instead end with ",X,}". - The rule for tuples also applies to records and -record - declarations. +A list that would have ended with ",X]" for some term X +may instead end with ",X,]". +A tuple that would have ended with ",X}" for some term X +may instead end with ",X,}". +The rule for tuples also applies to records and -record +declarations. 
Motivation +========== - About 5,000 lines of the Erlang/OTP sources begin with a right - square bracket or right curly brace. For example, -record - declarations are commonly laid out as +About 5,000 lines of the Erlang/OTP sources begin with a right +square bracket or right curly brace. For example, -record +declarations are commonly laid out as - -record(foo, { - field_1 = default_1, - ... - field_n = default_n - }). + -record(foo, { + field_1 = default_1, + ... + field_n = default_n + }). - e.g., +e.g., - -record(hostent, { - h_name, - h_aliases = [], - h_addrtype, - h_length, - h_addr_list = [] - }). + -record(hostent, { + h_name, + h_aliases = [], + h_addrtype, + h_length, + h_addr_list = [] + }). - and record creation expressions are often laid out similarly, e.g., +and record creation expressions are often laid out similarly, e.g., make_hostent(Name, Addrs, Aliases, ?S_A) -> - #hostent { - h_name = Name, - h_addrtype = inet, - h_length = 4, - h_addr_list = Addrs, - h_aliases = Aliases - }; + #hostent { + h_name = Name, + h_addrtype = inet, + h_length = 4, + h_addr_list = Addrs, + h_aliases = Aliases + }; - Adding entries to such lists (in the informal sense of "list"), - removing entries, and reordering entries would be simpler if they - were all punctuated the same way. Lists (in the Erlang sense of - "list") of options are also often laid out like this. +Adding entries to such lists (in the informal sense of "list"), +removing entries, and reordering entries would be simpler if they +were all punctuated the same way. Lists (in the Erlang sense of +"list") of options are also often laid out like this. - C, C++, Java, and Javascript allow a trailing comma in - initial value lists. Python allows trailing commas in lists and - dictionaries. Python in particular is evidence that a programming - language can support this feature without charges of "C envy" or - of extreme ugliness. +C, C++, Java, and Javascript allow a trailing comma in +initial value lists. 
Python allows trailing commas in lists and +dictionaries. Python in particular is evidence that a programming +language can support this feature without charges of "C envy" or +of extreme ugliness. Rationale - - I don't actually feel any need for this proposal; I believe that - the answer is better tool support. However, many people are - wedded to their tools, even more than their programming languages. - Darren New is not the only one to have asked for it, and with - about 1 SLOC in 110 of the Erlang/OTP sources reflecting a list or - tuple where this feature could have been used, it's very much a - low cost high public appreciation feature. - - I wrote that last sentence before working on the parser to make it - accept this "feature". There are 115 lines of plain diffs. I - could have made this change to the Prolog parser in 10 minutes, - but then the Prolog parser has the enormous advantage of NOT being - written using an LR parser generator like Yecc. Still, now that I - *have* hacked on the parser, the cost to everyone *else* is low. - - The specification was carefully worded. Commas are NOT allowed in - empty lists or tuples, nor in list or tuple comprehensions. They - are only allowed after a final element, so [1|L,] is also not - allowed. Nor are trailing commas allowed inside argument lists, - only in [] and {}. They are, however, allowed in tuple and - record types. - - This is very similar to the "optional semicolons" EEP (which was - FAR simpler to implement). The heart and soul of that EEP is the - desire to make semicolons and commas look DIFFERENT; for this - reason it is important NOT to allow optional trailing semicolons. - If semicolons may trail, commas must not. - If commas may trail, semicolons must not. - It is also important NOT to approach the "consistent punctuation - for list elements" problem by allowing optional leading commas. - If semicolons may lead, commas must not. - If commas may lead, semicolons must not. 
- Since trailing commas are established practice in C, C++, Java, - ECMAScript, Python, &c, commas trail, semicolons lead. - - I repeat that this is not my idea. I've just written up the EEP - and figured out how to implement it. With nearly 1% of the SLOC - in the Erlang/OTP system being cases where people might well have - had reason to add a trailing comma, had it been legal, it seemed - worth while finding out whether it would be practical. +========= + +I don't actually feel any need for this proposal; I believe that +the answer is better tool support. However, many people are +wedded to their tools, even more than their programming languages. +Darren New is not the only one to have asked for it, and with +about 1 SLOC in 110 of the Erlang/OTP sources reflecting a list or +tuple where this feature could have been used, it's very much a +low cost high public appreciation feature. + +I wrote that last sentence before working on the parser to make it +accept this "feature". There are 115 lines of plain diffs. I +could have made this change to the Prolog parser in 10 minutes, +but then the Prolog parser has the enormous advantage of NOT being +written using an LR parser generator like Yecc. Still, now that I +*have* hacked on the parser, the cost to everyone *else* is low. + +The specification was carefully worded. Commas are NOT allowed in +empty lists or tuples, nor in list or tuple comprehensions. They +are only allowed after a final element, so `[1|L,]` is also not +allowed. Nor are trailing commas allowed inside argument lists, +only in `[]` and `{}`. They are, however, allowed in tuple and +record types. + +This is very similar to the "optional semicolons" EEP (which was +FAR simpler to implement). The heart and soul of that EEP is the +desire to make semicolons and commas look DIFFERENT; for this +reason it is important NOT to allow optional trailing semicolons. +If semicolons may trail, commas must not. +If commas may trail, semicolons must not. 
+It is also important NOT to approach the "consistent punctuation +for list elements" problem by allowing optional leading commas. +If semicolons may lead, commas must not. +If commas may lead, semicolons must not. +Since trailing commas are established practice in C, C++, Java, +ECMAScript, Python, &c, commas trail, semicolons lead. + +I repeat that this is not my idea. I've just written up the EEP +and figured out how to implement it. With nearly 1% of the SLOC +in the Erlang/OTP system being cases where people might well have +had reason to add a trailing comma, had it been legal, it seemed +worth while finding out whether it would be practical. Backwards Compatibility +======================= - All existing Erlang code remains acceptable with unchanged - semantics. The commas are dealt with entirely in the parser; - other language manipulation tools never know that they were - there, so work perfectly with code using them. +All existing Erlang code remains acceptable with unchanged +semantics. The commas are dealt with entirely in the parser; +other language manipulation tools never know that they were +there, so work perfectly with code using them. Reference Implementation +======================== + +The auxiliary file [eep-0021-1.diff][] +is a patch file to be applied to `erl_parse.yrl`. + +You would think that all we'd need to do would be to change - The auxiliary file eep-0021-1.diff - is a patch file to be applied to erl_parse.yrl. + ... ']' ... '}' - You would think that all we'd need to do would be to change +to - ... ']' ... '}' - to - ... ',' ']' ... ',' '}' + ... ',' ']' ... ',' '}' - in several places. You would be wrong. With a different grammar, - maybe. With the current grammar, this was an uncommonly tricky - change requiring surgery in all sorts of places. The result gets - through Yecc with no complaints other than the two shift/reduce - complaints that are expected (and have nothing to do with this - change). +in several places. 
You would be wrong. With a different grammar, +maybe. With the current grammar, this was an uncommonly tricky +change requiring surgery in all sorts of places. The result gets +through Yecc with no complaints other than the two shift/reduce +complaints that are expected (and have nothing to do with this change). -References - - None. +[eep-0021-1.diff]: eep-0021-1.diff + "Patch for erl_parse.yrl" Copyright +========= - This document has been placed in the public domain. +This document has been placed in the public domain. -Local Variables: -mode: indented-text -indent-tabs-mode: nil -sentence-end-double-space: t -fill-column: 70 -coding: utf-8 -End: +[EmacsVar]: <> "Local Variables:" +[EmacsVar]: <> "mode: indented-text" +[EmacsVar]: <> "indent-tabs-mode: nil" +[EmacsVar]: <> "sentence-end-double-space: t" +[EmacsVar]: <> "fill-column: 70" +[EmacsVar]: <> "coding: utf-8" +[EmacsVar]: <> "End:" diff --git a/eeps/eep-0022.md b/eeps/eep-0022.md index 8187577..1927989 100644 --- a/eeps/eep-0022.md +++ b/eeps/eep-0022.md @@ -1,254 +1,257 @@ -EEP: 22 -Title: Range checking for binaries -Version: $Revision$ -Last-Modified: $Date$ -Author: Richard A. O'Keefe -Status: Draft -Type: Standards Track -Erlang-Version: R12B-4 -Content-Type: text/plain -Created: 27-Aug-2008 -Post-History: + Author: Richard A. O'Keefe + Status: Draft + Type: Standards Track + Erlang-Version: R12B-4 + Created: 27-Aug-2008 + Post-History: +**** +EEP 22: Range checking for binaries +---- Abstract +======== - A module may request that bit fields be range checked. +A module may request that bit fields be range checked. Specification +============= - A new directive is added. +A new directive is added. - -bit_range_check(Wanted). + -bit_range_check(Wanted). - where Wanted is 'false' or 'true'. - - Recall that a segment of a bit string (or binary) has the form +where Wanted is 'false' or 'true'. 
- Value [':' Size] ['/' Type_Specifier_List] +Recall that a segment of a bit string (or binary) has the form - where Type_Specifier_List includes such things as 'integer', - 'signed', and 'unsigned'. Currently the documentation states - that + Value [':' Size] ['/' Type_Specifier_List] - "Signedness ... Only matters for matching and when the type - is integer. The default is unsigned." +where `Type_Specifier_List` includes such things as 'integer', +'signed', and 'unsigned'. Currently the documentation states +that - Combining the Size with the Unit gives a Size_In_Bits. - The on-line Erlang manual does not state in section 6.16 that - in constructing a bit string the bottom Size_In_Bits bits of - an integer are used with the rest quietly ignored, but it is so. + "Signedness ... Only matters for matching and when the type + is integer. The default is unsigned." - The directive -bit_range_check(false) makes explicit the - programmer's intention that this C-like truncation should happen. +Combining the `Size` with the `Unit` gives a `Size_In_Bits`. +The on-line Erlang manual does not state in section 6.16 that +in constructing a bit string the bottom `Size_In_Bits` bits of +an integer are used with the rest quietly ignored, but it is so. - The directive -bit_range_check(true) says that it is a checked - run-time error in +The directive `-bit_range_check(false)` makes explicit the +programmer's intention that this C-like truncation should happen. 
- Value:Size/unsigned-integer-unit:1 +The directive `-bit_range_check(true)` says that it is a checked +run-time error in - or constructions otherwise equivalent to it if Value does not - lie in the range 0 <= Value < 2**Size, and it is a checked - run-time error in + Value:Size/unsigned-integer-unit:1 - Value:Size/signed-integer-unit:1 +or constructions otherwise equivalent to it if Value does not +lie in the range `0 <= Value < 2**Size`, and it is a checked +run-time error in - or constructions otherwise equivalent to it if Value does not - lie in the range -(2**(Size-1)) <= Value < 2**(Size-1). + Value:Size/signed-integer-unit:1 - The error that is raised is like the error that would be raised - for (1//0):Size/Type_Specifier_List except for using 'badrange' - instead of 'badarith'. +or constructions otherwise equivalent to it if Value does not +lie in the range `-(2**(Size-1)) <= Value < 2**(Size-1)`. - The behaviour of integer bit syntax segments in the absence of - a -bit_range_check directive is implementation defined and - subject to change. +The error that is raised is like the error that would be raised +for `(1//0):Size/Type_Specifier_List` except for using 'badrange' +instead of 'badarith'. - The BEAM system is extended with a new instruction or instructions - similar to the existing instruction or instructions for integer - segments but checking the range. The compiler is extended to - generate them for <<...>> expressions in the range of a - -bit_range_check(true) directive. +The behaviour of integer bit syntax segments in the absence of +a `-bit_range_check` directive is implementation defined and +subject to change. - A -bit_range_check directive may not appear after a bit syntax - pattern or expression or after another -bit_range_check directive. +The BEAM system is extended with a new instruction or instructions +similar to the existing instruction or instructions for integer +segments but checking the range. 
The compiler is extended to +generate them for `<<...>>` expressions in the range of a +`-bit_range_check(true)` directive. + +A `-bit_range_check` directive may not appear after a bit syntax +pattern or expression or after another `-bit_range_check` directive. Motivation +========== - It keeps on coming as an unpleasant surprise to Erlang programmers - that this truncation happens. Quiet destruction of information is - otherwise alien to Erlang: integer arithmetic is unbounded, not - wrapped as in some (but not all) C systems; element/2 doesn't take - indices modulo tuple size but raises an exception if the index is - out of range, and so on. +It keeps on coming as an unpleasant surprise to Erlang programmers +that this truncation happens. Quiet destruction of information is +otherwise alien to Erlang: integer arithmetic is unbounded, not +wrapped as in some (but not all) C systems; element/2 doesn't take +indices modulo tuple size but raises an exception if the index is +out of range, and so on. - In any case where the truncation is wanted, an Erlang programmer - can already write - (Value rem 256):unsigned-integer - and the Erlang compiler could notice this and optimise the 'rem' - operation away, so the truncation is not only unusual in Erlang, - it is also unexpected in this particular case. +In any case where the truncation is wanted, an Erlang programmer +can already write - It is not only unexpected, it removes a chance to find mistakes, - so it would seem to be undesirable. + (Value rem 256):unsigned-integer - Edwin Fine asked "How difficult could it be to add optional run- - time checking to detect this condition without a serious risk of - adverse effects on the correctness of Erlang run-time execution?" +and the Erlang compiler could notice this and optimise the 'rem' +operation away, so the truncation is not only unusual in Erlang, +it is also unexpected in this particular case. 
- Bjšrn Gustavssan replied "it would be better to add optional - support in the compiler to turn on checks (either for an entire - module, or for individual segments of a binary). If someone - writes an EEP, we will consider implementing it." +It is not only unexpected, it removes a chance to find mistakes, +so it would seem to be undesirable. - This is that EEP. +Edwin Fine asked "How difficult could it be to add optional run- +time checking to detect this condition without a serious risk of +adverse effects on the correctness of Erlang run-time execution?" +Björn Gustavsson replied "it would be better to add optional +support in the compiler to turn on checks (either for an entire +module, or for individual segments of a binary). If someone +writes an EEP, we will consider implementing it." +This is that EEP. -Rationale - The Erlang/OTP team regard the old behaviour as a feature, - and wish to retain it. In particular, they wish modules that - were written expecting the old behaviour to continue to work - (for now) without modification. - - One alternative would be to add new syntax, such as having a - new 'checked' specifier, so that - Value/checked-unsigned-integer - would require a value in the range 0..255. - But many Erlang programmers will want to use this as the normal - case, and will not like the safe version being so much more effort - to write than the unsafe version. - - It appears that "truncation wanted/not wanted" is not a matter - of this expression or that, but of this programmer or that, - and we can expect that each module will be written by someone - expecting only one behaviour or expecting only the other. - - Adding a - - -bit_range_check(true). - - directive to a module is more work than doing nothing at all, - but programmers who want this behaviour should be able to set up - their editing environment to have this line in their template for - creating new Erlang modules. 
- - There are several questions: - - Should this apply to bit strings as well as integers? - - What should the name of the directive be? - - What should the argument(s) of the directive be? - - Should multiple instances of the directive be allowed in - a module? - - Bit strings: Assume X = <<5:3/unsigned-integer-unit:1>>. - Currently, <> quietly truncates X. This drops bits - from the right of X, giving <<2:2>>. If this worked the same - as integers, you would expect <<1:2>>. This is certainly - very odd. Since we get truncation on the left and padding on - the left for integers, we naturally expect padding on the - right for bit strings to go with truncation on the right. - But <> isn't <<10:4>>, it's a runtime exception. - All very odd indeed. It would certainly be desirable to have - an easy way for the programmer to indicate whether they wanted - truncation on the left or the right and padding on the left or - the right. Perhaps a new built in function - - set_bit_size(Bit_String, Desired_Width, - Truncation, Padding, Fill) - - Bit_String : a bit string - Desired_Width : a non-negative integer, the width wanted - Truncation: 'left' | 'right' | 'error'; - if bit_size(Bit_String) > Desired_Width - truncate on the left/truncate on the right/ - report an error - Padding: 'left' | 'right' | 'error'; - if bit_size(Bit_String) < Desired_Width - pad on the left/pad on the right/report an error - Fill: 0 | 1 | 'copy'; - pad with 0/pad with 1/pad with a copy of the - last bit at the end where padding is done. - - However, that idea is only partly baked, and is not part of the - current proposal. As things currently stand, using the bit - syntax and relying on implicit truncation is the simplest way - to extract the leading bits of a bit string. - - As long as the name of the directive is intention-revealing, - it doesn't matter very much what it is. 
- I proposed 'bit_range_check' because it is all about checking, - ranges in bit syntax, but since in this draft it does NOT apply - to bit string segments, perhaps 'bit_integer_range_check' would - be better. - - The arguments false and true seem clear enough. - Alternatives would be something like - - -bit_integer_range(check). - -bit_integer_range(no_check). - - That would be fine too. - - Classical Pascal compilers let you do things like - - {$I-} (* disable index checks *) - (* code with no index checks *) - {$I+} (* re-enable index checks *) - - Allowing multiple -bit_range_check directives in a module could - let you use code written for the old approach inside a module - that otherwise uses the new approach. I don't believe that we - want to encourage that sort of thing: it is MUCH easier when - reading a module if all of it follows the same rule. - - It is also easier for an Erlang compiler that expects to be able - to process function definitions in any order. The compiler can - check for one of these directives anywhere in a module before it - handles any bit syntax forms anywhere. However, it is easier for - people reading a module if, when they first see a <<...>> - construction, they have already seen any directive that might - affect what it means. - - The restrictions on the number and placement of these directives - can always be relaxed later if necessary. + +Rationale +========= + +The Erlang/OTP team regard the old behaviour as a feature, +and wish to retain it. In particular, they wish modules that +were written expecting the old behaviour to continue to work +(for now) without modification. + +One alternative would be to add new syntax, such as having a +new 'checked' specifier, so that + + Value/checked-unsigned-integer + +would require a value in the range 0..255. +But many Erlang programmers will want to use this as the normal +case, and will not like the safe version being so much more effort +to write than the unsafe version. 
+ +It appears that "truncation wanted/not wanted" is not a matter +of this expression or that, but of this programmer or that, +and we can expect that each module will be written by someone +expecting only one behaviour or expecting only the other. + +Adding a + + -bit_range_check(true). + +directive to a module is more work than doing nothing at all, +but programmers who want this behaviour should be able to set up +their editing environment to have this line in their template for +creating new Erlang modules. + +There are several questions: + +- Should this apply to bit strings as well as integers? +- What should the name of the directive be? +- What should the argument(s) of the directive be? +- Should multiple instances of the directive be allowed in + a module? + +Bit strings: `Assume X = <<5:3/unsigned-integer-unit:1>>`. +Currently, `<>` quietly truncates `X`. This drops bits +from the right of `X`, giving `<<2:2>>`. If this worked the same +as integers, you would expect `<<1:2>>`. This is certainly +very odd. Since we get truncation on the left and padding on +the left for integers, we naturally expect padding on the +right for bit strings to go with truncation on the right. +But `<>` isn't `<<10:4>>`, it's a runtime exception. +All very odd indeed. It would certainly be desirable to have +an easy way for the programmer to indicate whether they wanted +truncation on the left or the right and padding on the left or +the right. 
Perhaps a new built in function + + set_bit_size(Bit_String, Desired_Width, + Truncation, Padding, Fill) + + Bit_String : a bit string + Desired_Width : a non-negative integer, the width wanted + Truncation: 'left' | 'right' | 'error'; + if bit_size(Bit_String) > Desired_Width + truncate on the left/truncate on the right/ + report an error + Padding: 'left' | 'right' | 'error'; + if bit_size(Bit_String) < Desired_Width + pad on the left/pad on the right/report an error + Fill: 0 | 1 | 'copy'; + pad with 0/pad with 1/pad with a copy of the + last bit at the end where padding is done. + +However, that idea is only partly baked, and is not part of the +current proposal. As things currently stand, using the bit +syntax and relying on implicit truncation is the simplest way +to extract the leading bits of a bit string. + +As long as the name of the directive is intention-revealing, +it doesn't matter very much what it is. +I proposed `bit_range_check` because it is all about checking, +ranges in bit syntax, but since in this draft it does NOT apply +to bit string segments, perhaps `bit_integer_range_check` would +be better. + +The arguments false and true seem clear enough. +Alternatives would be something like + + -bit_integer_range(check). + -bit_integer_range(no_check). + +That would be fine too. + +Classical Pascal compilers let you do things like + + {$I-} (* disable index checks *) + (* code with no index checks *) + {$I+} (* re-enable index checks *) + +Allowing multiple `-bit_range_check` directives in a module could +let you use code written for the old approach inside a module +that otherwise uses the new approach. I don't believe that we +want to encourage that sort of thing: it is MUCH easier when +reading a module if all of it follows the same rule. + +It is also easier for an Erlang compiler that expects to be able +to process function definitions in any order. 
The compiler can +check for one of these directives anywhere in a module before it +handles any bit syntax forms anywhere. However, it is easier for +people reading a module if, when they first see a `<<...>>` +construction, they have already seen any directive that might +affect what it means. + +The restrictions on the number and placement of these directives +can always be relaxed later if necessary. Backwards Compatibility +======================= - All existing Erlang code remains acceptable with unchanged - semantics. +All existing Erlang code remains acceptable with unchanged semantics. Reference Implementation +======================== - None, because I still can't find my way around the compiler. - - - -References - - None. +None, because I still can't find my way around the compiler. Copyright +========= - This document has been placed in the public domain. +This document has been placed in the public domain. -Local Variables: -mode: indented-text -indent-tabs-mode: nil -sentence-end-double-space: t -fill-column: 70 -coding: utf-8 -End: +[EmacsVar]: <> "Local Variables:" +[EmacsVar]: <> "mode: indented-text" +[EmacsVar]: <> "indent-tabs-mode: nil" +[EmacsVar]: <> "sentence-end-double-space: t" +[EmacsVar]: <> "fill-column: 70" +[EmacsVar]: <> "coding: utf-8" +[EmacsVar]: <> "End:" diff --git a/eeps/eep-0023.md b/eeps/eep-0023.md index d042ab7..54118a6 100644 --- a/eeps/eep-0023.md +++ b/eeps/eep-0023.md @@ -1,120 +1,126 @@ -EEP: 23 -Title: Allow variables in fun M:F/A -Version: $Revision$ -Last-Modified: $Date$ -Author: Richard A. O'Keefe -Status: Draft -Type: Standards Track -Erlang-Version: R12B-4 -Content-Type: text/plain -Created: 08-Aug-2008 -Post-History: + Author: Richard A. O'Keefe + Status: Draft + Type: Standards Track + Erlang-Version: R12B-4 + Created: 08-Aug-2008 + Post-History: +**** +EEP 23: Allow variables in `fun M:F/A` +---- Abstract +======== - fun M:F/A should allow M, F, and/or A to be a variable. 
+`fun M:F/A` should allow `M`, `F`, and/or `A` to be a variable. Specification +============= - The form fun M:F/A currently requires M to be an atom, - F to be an atom, and A to be a non-negative integer. - This is generalised to allow any or all of them to be - variables. +The form `fun M:F/A` currently requires `M` to be an atom, +`F` to be an atom, and `A` to be a non-negative integer. +This is generalised to allow any or all of them to be +variables. Motivation +========== - Representing functions by tuples {M,F,A} is now deprecated. - Yet there are times when some of this information is not - available until run time. For example, a behaviour's author - might wish to refer to the start/0 function in the Callback - module, but there might be any number of Callback modules at - run time. +Representing functions by tuples `{M,F,A}` is now deprecated. +Yet there are times when some of this information is not +available until run time. For example, a behaviour's author +might wish to refer to the `start/0` function in the Callback +module, but there might be any number of Callback modules at +run time. - It is absurd that the module name and function name in a call - of the form M:F(E1, ..., En) may be either atoms or variables, - but that they may not be variables in fun M:F/A. +It is absurd that the module name and function name in a call +of the form `M:F(E1, ..., En)` may be either atoms or variables, +but that they may not be variables in `fun M:F/A`. - It turns out that fun M:F/A is currently implemented as a call - to erlang:make_fun(M, F, A), so the ability to create such - funs given run-time data already exists. All that is missing - is to wrap some syntax around it. +It turns out that `fun M:F/A` is currently implemented as a call +to `erlang:make_fun(M, F, A)`, so the ability to create such +funs given run-time data already exists. All that is missing +is to wrap some syntax around it. 
Rationale - - The gap that's being filled here is one that has been felt in - practice. See a September 2008 thread in the Erlang mailing list. - The proposal generalises an existing form, but not more than the - existing function call syntax has already been generalised. - - It is perhaps the limits of this proposal that need explaining. - - First, the extension is from constants to constants or variables, - not to arbitrary expressions. This is mainly to avoid confusing - parsers and people. The effect of fun (E1):(E2)/(E3) can be had - by writing M = E1, F = E2, A = E3, fun M:F/A, so there is no loss - of expressiveness. Since Erlang's equivalent of a lambda form - begins with "fun (", fun (E1):... would be tricky to parse and - very confusing to people. - - Second, the extension is for fun M:F/A only, and not for fun F/A. - That's because there is no erlang:make_fun/2 to call; the - implementation of fun F/A is surprisingly tricky and involves - creating a special-purpose glue function. For many purposes, - fun ?MODULE:F/A will serve instead. +========= + +The gap that's being filled here is one that has been felt in +practice. See a September 2008 thread in the Erlang mailing list. +The proposal generalises an existing form, but not more than the +existing function call syntax has already been generalised. + +It is perhaps the limits of this proposal that need explaining. + +First, the extension is from constants to constants or variables, +not to arbitrary expressions. This is mainly to avoid confusing +parsers and people. The effect of `fun (E1):(E2)/(E3)` can be had +by writing `M = E1, F = E2, A = E3, fun M:F/A`, so there is no loss +of expressiveness. Since Erlang's equivalent of a lambda form +begins with "`fun (`", `fun (E1):...` would be tricky to parse and +very confusing to people. + +Second, the extension is for `fun M:F/A` only, and not for `fun F/A`. 
+That's because there is no `erlang:make_fun/2` to call; the +implementation of `fun F/A` is surprisingly tricky and involves +creating a special-purpose glue function. For many purposes, +`fun ?MODULE:F/A` will serve instead. Backwards Compatibility +======================= - All existing Erlang code remains acceptable with unchanged - semantics. No new functions or instructions are added, so - BEAM files produced with the new parser will work in older - releases. +All existing Erlang code remains acceptable with unchanged +semantics. No new functions or instructions are added, so +BEAM files produced with the new parser will work in older +releases. Reference Implementation +======================== - The auxiliary file eep-0023-1.diff - is a patch file to be applied to erl_parse.yrl. - The patched file has been checked by yecc, which is happy - with it, and the resulting .erl file compiles cleanly. - However, that's all the testing that has been done. +The auxiliary file [eep-0023-1.diff][] +is a patch file to be applied to erl_parse.yrl. +The patched file has been checked by yecc, which is happy +with it, and the resulting .erl file compiles cleanly. +However, that's all the testing that has been done. - All that the implementation does is to - (a) accept fun M:F/A where M, F, and A are constants or - variables, - (b) generate the same abstract syntax term that was - generated in the past when they are all constants, - (c) pretend that erlang:make_fun(M, F, A) had been written - when at least one is a variable. - Only the parser is involved. +All that the implementation does is to +1. accept `fun M:F/A` where `M`, `F`, and `A` are constants or + variables, +2. generate the same abstract syntax term that was + generated in the past when they are all constants, +3. pretend that `erlang:make_fun(M, F, A)` had been written + when at least one is a variable. +Only the parser is involved. -References - - None. 
+ + +[eep-0023-1.diff]: eep-0023-1.diff + "Diff to apply to erl_parse.yrl" Copyright +========= - This document has been placed in the public domain. +This document has been placed in the public domain. -Local Variables: -mode: indented-text -indent-tabs-mode: nil -sentence-end-double-space: t -fill-column: 70 -coding: utf-8 -End: +[EmacsVar]: <> "Local Variables:" +[EmacsVar]: <> "mode: indented-text" +[EmacsVar]: <> "indent-tabs-mode: nil" +[EmacsVar]: <> "sentence-end-double-space: t" +[EmacsVar]: <> "fill-column: 70" +[EmacsVar]: <> "coding: utf-8" +[EmacsVar]: <> "End:" diff --git a/eeps/eep-0024.md b/eeps/eep-0024.md index 8ffef52..71e342d 100644 --- a/eeps/eep-0024.md +++ b/eeps/eep-0024.md @@ -1,150 +1,154 @@ -EEP: 24 -Title: Functions may be named using F/N in all module attributes -Version: $Revision$ -Last-Modified: $Date$ -Author: Richard A. O'Keefe -Status: Draft -Type: Standards Track -Erlang-Version: R12B-4 -Content-Type: text/plain -Created: 22-Sep-2008 -Post-History: + Author: Richard A. O'Keefe + Status: Final/R12B-5 Proposal is implemented in OTP release R12B-5 + Type: Standards Track + Erlang-Version: R12B-4 + Created: 22-Sep-2008 + Post-History: +**** +EEP 24: Functions may be named using `F/N` in all module attributes +---- Abstract +======== - Programmers will be allowed to name functions using the - F/N form (currently restricted to) -export and -import - in any module attribute. The parser will convert this - to the existing {F,N} form so that downstream tools will - be unaffected. +Programmers will be allowed to name functions using the +`F/N` form (currently restricted to) `-export` and `-import` +in any module attribute. The parser will convert this +to the existing `{F,N}` form so that downstream tools will +be unaffected. 
Specification +============= - In any module attribute the form F/N (where F is an atom and N is - a non-negative integer) should be converted to {F,N}, provided - that it is not in an expression that would be evaluated. +In any module attribute the form `F/N` (where `F` is an atom and `N` is +a non-negative integer) should be converted to `{F,N}`, provided +that it is not in an expression that would be evaluated. - Other occurrences of X/Y are not addressed by this EEP. - In particular, occurrences of X/Y in -record or -enum - declarations would be evaluated, so are not affected. +Other occurrences of `X/Y` are not addressed by this EEP. +In particular, occurrences of `X/Y` in `-record` or `-enum` +declarations would be evaluated, so are not affected. Motivation - - Compare - - -compile({inline, - [{ukeymerge3_12,13}, {ukeymerge3_21,13}, - {rukeymerge3_12a,11}, {rukeymerge3_21a,13}, - {rukeymerge3_12b,12}, {rukeymerge3_21b,12}]}). - - -deprecated( - [{new_set,0},{set_to_list,1},{list_to_set,1},{subset,2}]). - - with - - -compile({inline, [ - rukeymerge3_12a/11, - rukeymerge3_12b/12, - rukeymerge3_21a/13, - rukeymerge3_21b/12, - ukeymerge3_12/13, - ukeymerge3_21/13]}). - - -deprecated([ - list_to_set/1, - new_set/0, - set_to_list/1, - subset/2]). - - The improvement in readability is noteworthy, especially if - authors switch to the Prolog practice of putting one F/N form - per line in alphabetic order in such lists. - - The improvement in consistency is worth having: it's no longer a - case of new_set/0 in an -export or -import module attribute but - {new_set,0} in a -deprecated module attribute, it's the same in - all module attributes, making it easier to find those that mention - a particular function. +========== + +Compare + + -compile({inline, + [{ukeymerge3_12,13}, {ukeymerge3_21,13}, + {rukeymerge3_12a,11}, {rukeymerge3_21a,13}, + {rukeymerge3_12b,12}, {rukeymerge3_21b,12}]}). + + -deprecated( + [{new_set,0},{set_to_list,1},{list_to_set,1},{subset,2}]). 
+ +with + + -compile({inline, [ + rukeymerge3_12a/11, + rukeymerge3_12b/12, + rukeymerge3_21a/13, + rukeymerge3_21b/12, + ukeymerge3_12/13, + ukeymerge3_21/13]}). + + -deprecated([ + list_to_set/1, + new_set/0, + set_to_list/1, + subset/2]). + +The improvement in readability is noteworthy, especially if +authors switch to the Prolog practice of putting one `F/N` form +per line in alphabetic order in such lists. + +The improvement in consistency is worth having: it's no longer a +case of `new_set/0` in an `-export` or `-import` module attribute but +`{new_set,0}` in a `-deprecated` module attribute, it's the same in +all module attributes, making it easier to find those that mention +a particular function. Rationale +========= - Module attributes that contain real expressions, such as -record - (and, if it is accepted, -enum) require a certain amount of care. - I did consider allowing the F/N notation everywhere; after all, - an atom cannot be divided by an integer. However, with the - 'fun F/N' form available, there are these days very few occasions - to refer to a function as {F,N} in an expression. +Module attributes that contain real expressions, such as `-record` +(and, if it is accepted, `-enum`) require a certain amount of care. +I did consider allowing the `F/N` notation everywhere; after all, +an atom cannot be divided by an integer. However, with the +`fun F/N` form available, there are these days very few occasions +to refer to a function as `{F,N}` in an expression. - Otherwise, F/N occurrences in -export and -import attributes are - currently converted to tuples (by farity_list), so this is just a - small matter of extending the notion elsewhere. I cannot imagine - why this wasn't done years ago. +Otherwise, `F/N` occurrences in `-export` and `-import` attributes are +currently converted to tuples (by farity_list), so this is just a +small matter of extending the notion elsewhere. I cannot imagine +why this wasn't done years ago. 
Backwards Compatibility +======================= - There are currently no attributes where F/N is accepted, - is not part of an expression to be evaluated, and does not - signify a function, and those where it does signify a function - already treat it as an {F,N} tuple. +There are currently no attributes where `F/N` is accepted, +is not part of an expression to be evaluated, and does not +signify a function, and those where it does signify a function +already treat it as an `{F,N}` tuple. - No existing source code can be affected. +No existing source code can be affected. - Progams using home-brew front ends instead of the Erlang - syntax tools, such as ones that want to preserve white - space, comments, and so on, will have to be extended by - their maintainers to recognise the new form. It is - already the case that {fred,3} may be written in two - different ways in Erlang source form: {fred,+3} is also - allowed. So such programs already have to cope with - multiple source forms with the same abstract form, and - this merely adds one more variant. +Progams using home-brew front ends instead of the Erlang +syntax tools, such as ones that want to preserve white +space, comments, and so on, will have to be extended by +their maintainers to recognise the new form. It is +already the case that `{fred,3}` may be written in two +different ways in Erlang source form: `{fred,+3}` is also +allowed. So such programs already have to cope with +multiple source forms with the same abstract form, and +this merely adds one more variant. - Programs generating Erlang source code should some day - be revised to generate the new form, but since the old form - is not being removed and not (in order to preserve the - value of recent books) even being deprecated, need not be. 
+Programs generating Erlang source code should some day +be revised to generate the new form, but since the old form +is not being removed and not (in order to preserve the +value of recent books) even being deprecated, need not be. Reference Implementation +======================== - A single clause needs to be added to the normalise/1 - function in the parse.yrl file: +A single clause needs to be added to the `normalise/1` +function in the parse.yrl file: %% Name/Arity case normalise({op,_,'/',{atom,_,F},{integer,_,I}}) when I >= 0 -> {F,I}; - just before the final clause, which raises an exception. - A context diff is provided (eep-0024-1.diff). +just before the final clause, which raises an exception. +A context diff is provided [eep-0024-1.diff][]. -References - - None. +[eep-0024-1.diff]: eep-0024-1.diff + "Diff to apply to parse.yrl" Copyright +========= - This document has been placed in the public domain. +This document has been placed in the public domain. -Local Variables: -mode: indented-text -indent-tabs-mode: nil -sentence-end-double-space: t -fill-column: 70 -coding: utf-8 -End: +[EmacsVar]: <> "Local Variables:" +[EmacsVar]: <> "mode: indented-text" +[EmacsVar]: <> "indent-tabs-mode: nil" +[EmacsVar]: <> "sentence-end-double-space: t" +[EmacsVar]: <> "fill-column: 70" +[EmacsVar]: <> "coding: utf-8" +[EmacsVar]: <> "End:" diff --git a/eeps/eep-0025.md b/eeps/eep-0025.md index e0bfccf..8f24e43 100644 --- a/eeps/eep-0025.md +++ b/eeps/eep-0025.md @@ -1,57 +1,62 @@ -EEP: 25 -Title: Unnesting cases -Version: $Revision$ -Last-Modified: $Date$ -Author: Richard A. O'Keefe -Status: Draft -Type: Standards Track -Erlang-Version: R12B-4 -Content-Type: text/plain -Created: 28-Nov-2008 -Post-History: + Author: Richard A. 
O'Keefe + Status: Draft + Type: Standards Track + Erlang-Version: R12B-4 + Created: 28-Nov-2008 + Post-History: +**** +EEP 25: Unnesting cases +---- Abstract +======== - Erlang 'case' expressions should adopt/adapt an idea from - Algol 68 that in Erlang would strictly generalise 'cond'. +Erlang 'case' expressions should adopt/adapt an idea from +Algol 68 that in Erlang would strictly generalise 'cond'. Specification +============= - Currently a 'case' expression has the form +Currently a 'case' expression has the form - 'case' Expression 'of' - Pattern ['when' Guard] '->' Expression - {';' Pattern ['when' Guard] '->' Expression}... - 'end' + 'case' Expression 'of' + Pattern ['when' Guard] '->' Expression + {';' Pattern ['when' Guard] '->' Expression}... + 'end' - It is well known that Algol 68 had - if .. then .. {elif .. then ..}... [else ..] fi - expressions. It is less well known that it had a similar - construction for case expression, - case .. in ... {ouse .. in ..}... [out ..] esac - where "ouse" (from "OUt caSE") let you iterate the case - matching process and only need one 'esac'. +It is well known that Algol 68 had - This proposal adopts the Algol 68 idea. - The revised form is + if .. then .. {elif .. then ..}... [else ..] fi - 'case' Expression 'of' - Pattern ['when' Guard] '->' Expression - {';' Pattern ['when' Guard] '->' Expression}... - {';' 'or' 'case' Expression 'of' - Pattern ['when' Guard] '->' Expression - {';' Pattern ['when' Guard] '->' Expression}...}... - 'end' +expressions. It is less well known that it had a similar +construction for case expression, + + case .. in ... {ouse .. in ..}... [out ..] esac + +where "ouse" (from "OUt caSE") let you iterate the case +matching process and only need one 'esac'. + +This proposal adopts the Algol 68 idea. +The revised form is + + 'case' Expression 'of' + Pattern ['when' Guard] '->' Expression + {';' Pattern ['when' Guard] '->' Expression}... 
+ {';' 'or' 'case' Expression 'of' + Pattern ['when' Guard] '->' Expression + {';' Pattern ['when' Guard] '->' Expression}...}... + 'end' Motivation +========== - Consider this example: +Consider this example: suffix(P, Suffix, List) when is_function(P, 2), is_list(Suffix) -> @@ -66,7 +71,7 @@ Motivation end end. - With this proposal we could write +With this proposal we could write suffix_loop(P, Suffix, List) -> case equal(P, Suffix, List) @@ -76,122 +81,122 @@ Motivation ; [] -> false end. - where all the alternatives to be selected have the same - indentation. - - The old proposal for a Lisp-like 'cond' is no longer really - needed. Instead of - - cond - C1 -> B1 - ; C2 -> B2 - ... - ; Cn -> Bn - end - - one writes - - case C1 of true -> B1 - ; or case C2 of true -> B2 - ... - ; or case Cn of true -> Bn - end - - What one loses here is the check that a result that is not - 'true' must be 'false', but that job can these days be done - by the Dialyzer. This is certainly clumsier than 'cond', - but it achieves the main aim, that of selecting from a bunch - of choices at the same logical (and therefore at the same - indentation) level by means of a series of Boolean-valued - expressions, but it is strictly more general. It allows you - to combine Boolean-valued expressions with guards (including - any future generalisations of guards), and it allows you to - make a choice based on any kind of pattern matching, not just - Boolean. - - This is clumsier than 'cond', but over-using Boolean when some - more intention-revealing enumeration should be used is an - anti-pattern that has been recognised for over 20 years. If - 'cond' existed, there would be a strong pressure for people - to write functions that return a Boolean result when something - else might be more useful, just so they could use 'cond'. 
- As an example, suppose that we want to continue if the voltage - is nominal, shut the device off if the voltage is low and there - is not an emergency, or set the speed slow if the voltage is - low and there is an emergency. - - With cond: - cond voltage_nominal() -> continue_operations() - ; in_emergency() -> set_speed_slow() - ; true -> shut_device_down() - end - - With case: - case voltage() of nominal -> continue_operations() - ; or case status() of emergency -> set_speed_slow() - ; normal -> shut_device_down() - end - - When expressed this way, I for one find it easier to realise - that "low" is not the opposite of "nominal"; a voltage that is - not nominal might be high. So we really should have - - case voltage() of nominal -> continue_operations() - ; high -> WHAT DO WE DO HERE? - ; or case status() of emergency -> set_speed_slow() - ; normal -> shut_device_down() - end - - So an approach that gives you the "flat" structure of 'cond' - while subtly encouraging the multiway thinking of 'case' has - merit. You could say that I am not so much for 'ouse' as - against 'cond' and over-use of Boolean. +where all the alternatives to be selected have the same +indentation. + +The old proposal for a Lisp-like 'cond' is no longer really +needed. Instead of + + cond + C1 -> B1 + ; C2 -> B2 + ... + ; Cn -> Bn + end + +one writes + + case C1 of true -> B1 + ; or case C2 of true -> B2 + ... + ; or case Cn of true -> Bn + end + +What one loses here is the check that a result that is not +'true' must be 'false', but that job can these days be done +by the Dialyzer. This is certainly clumsier than 'cond', +but it achieves the main aim, that of selecting from a bunch +of choices at the same logical (and therefore at the same +indentation) level by means of a series of Boolean-valued +expressions, but it is strictly more general. 
It allows you +to combine Boolean-valued expressions with guards (including +any future generalisations of guards), and it allows you to +make a choice based on any kind of pattern matching, not just +Boolean. + +This is clumsier than 'cond', but over-using Boolean when some +more intention-revealing enumeration should be used is an +anti-pattern that has been recognised for over 20 years. If +'cond' existed, there would be a strong pressure for people +to write functions that return a Boolean result when something +else might be more useful, just so they could use 'cond'. +As an example, suppose that we want to continue if the voltage +is nominal, shut the device off if the voltage is low and there +is not an emergency, or set the speed slow if the voltage is +low and there is an emergency. + +With cond: + + cond voltage_nominal() -> continue_operations() + ; in_emergency() -> set_speed_slow() + ; true -> shut_device_down() + end + +With case: + + case voltage() of nominal -> continue_operations() + ; or case status() of emergency -> set_speed_slow() + ; normal -> shut_device_down() + end + +When expressed this way, I for one find it easier to realise +that "low" is not the opposite of "nominal"; a voltage that is +not nominal might be high. So we really should have + + case voltage() of nominal -> continue_operations() + ; high -> WHAT DO WE DO HERE? + ; or case status() of emergency -> set_speed_slow() + ; normal -> shut_device_down() + end + +So an approach that gives you the "flat" structure of 'cond' +while subtly encouraging the multiway thinking of 'case' has +merit. You could say that I am not so much for 'ouse' as +against 'cond' and over-use of Boolean. Rationale +========= - I read one too many "why doesn't Erlang have an if" e-message, - and suddently remember "Algol 68 could do that with 'case'". +I read one too many "why doesn't Erlang have an if" e-message, +and suddenly remembered "Algol 68 could do that with 'case'".
- The main issue is how to spell 'ouse' in Erlang. My first - preference was for 'or case', but that can't work. I do not - love "; or case", and would be very happy to see something - better. Indeed, "; case" might do the job, I just felt that - that was a bit too error-prone. +The main issue is how to spell 'ouse' in Erlang. My first +preference was for 'or case', but that can't work. I do not +love "; or case", and would be very happy to see something +better. Indeed, "; case" might do the job, I just felt that +that was a bit too error-prone. Backwards Compatibility +======================= - All existing Erlang code remains acceptable with unchanged - semantics. The implementation will be entirely in the parser, - so even tools that examine ASTs will be unaffected. +All existing Erlang code remains acceptable with unchanged +semantics. The implementation will be entirely in the parser, +so even tools that examine ASTs will be unaffected. Reference Implementation +======================== - None yet. It will be entirely in the parser. - - - -References - - None. +None yet. It will be entirely in the parser. Copyright +========= - This document has been placed in the public domain. +This document has been placed in the public domain. 
-Local Variables: -mode: indented-text -indent-tabs-mode: nil -sentence-end-double-space: t -fill-column: 70 -coding: utf-8 -End: +[EmacsVar]: <> "Local Variables:" +[EmacsVar]: <> "mode: indented-text" +[EmacsVar]: <> "indent-tabs-mode: nil" +[EmacsVar]: <> "sentence-end-double-space: t" +[EmacsVar]: <> "fill-column: 70" +[EmacsVar]: <> "coding: utf-8" +[EmacsVar]: <> "End:" diff --git a/eeps/eep-0026.md b/eeps/eep-0026.md index a743fd1..d5f6643 100644 --- a/eeps/eep-0026.md +++ b/eeps/eep-0026.md @@ -1,292 +1,299 @@ -EEP: 26 -Title: Make andalso and orelse tail-recursive -Version: $Revision$ -Last-Modified: $Date$ -Author: Bjorn Gustavsson -Status: Draft -Type: Standards Track -Erlang-Version: R12B-5 -Content-Type: text/plain -Created: 28-Jan-2009 -Post-History: + Author: Björn Gustavsson + Status: Accepted/R13A Proposal is to be implemented in OTP release R13A + Type: Standards Track + Erlang-Version: R12B-5 + Created: 28-Jan-2009 + Post-History: +**** +EEP 26: Make andalso and orelse tail-recursive +---- Abstract +======== - Erlang 5.1 added the ability to use 'andalso', 'orelse', - 'and', and 'or' in guards. However, the semantics for - 'andalso' and 'orelse' differs from that in other related - languages, causing confusion and inefficiency. +Erlang 5.1 added the ability to use 'andalso', 'orelse', +'and', and 'or' in guards. However, the semantics for +'andalso' and 'orelse' differs from that in other related +languages, causing confusion and inefficiency. + +I propose making 'andalso' and 'orelse' tail-recursive. + +This EEP is partly based on Richard O'Keefe's [EEP 17][], +but has a narrower scope. - I propose making 'andalso' and 'orelse' tail-recursive. - This EEP is partly based on Richard O'Keefe's EEP 17, - but has a narrower scope. 
Specification +============= - Currently, (E1 andalso E2) as an expression acts like +Currently, `(E1 andalso E2)` as an expression acts like - case E1 of - false -> false; - true -> case E2 of - false -> false; - true -> true - end - end + case E1 of + false -> false; + true -> case E2 of + false -> false; + true -> true + end + end - except that the former raises {badarg,NonBool} exceptions and the - latter raises {case_clause,NonBool} ones. +except that the former raises `{badarg,NonBool}` exceptions and the +latter raises `{case_clause,NonBool}` ones. - This should be changed to +This should be changed to - case E1 of - false -> false; - true -> E2 - end. + case E1 of + false -> false; + true -> E2 + end. - Currently, (E1 orelse E2) as an expression acts like +Currently, `(E1 orelse E2)` as an expression acts like - case E1 of - true -> true - false -> case E2 of - true -> true - false -> false - end + case E1 of + true -> true; + false -> case E2 of + true -> true; + false -> false end + end - except that the former raises {badarg,NonBool} exceptions and the - latter raises {case_clause,NonBool} ones. +except that the former raises `{badarg,NonBool}` exceptions and the +latter raises `{case_clause,NonBool}` ones. - This should be changed to +This should be changed to - case E1 of - true -> true; - false -> E2 - end + case E1 of + true -> true; + false -> E2 + end Motivation +========== + +To unlock the full potential of 'andalso'/'orelse' in Erlang. + +Given the current implementation, you either have to +rewrite code that is naturally written using AND and OR +operators using 'case', or only use 'andalso'/'orelse' when +you know that your lists are relatively short. + +For instance, the function `all/2` that returns 'true' if +all elements of a list satisfy a predicate and 'false' +otherwise, can be written like this: + + all(Pred, [Hd|Tail]) -> + Pred(Hd) and all(Pred, Tail); + all(_, []) -> + true.
+ +In each recursion, we test that the current element Hd +satisfies the predicate AND that the rest of the list also +matches the predicate. The code reads almost like English. + +Of course, 'and' evaluates both of its operands, so the entire +list will be traversed even if the first element of the list +fails to satisfy the predicate. Furthermore, 'and' is not +tail-recursive, so the function will use stack space +proportional to the length of the list. + +To avoid traversing the rest of the list if one element +fails to satisfy the predicate, we can use 'andalso': + + all(Pred, [Hd|Tail]) -> + Pred(Hd) andalso all(Pred, Tail); + all(_, []) -> + true. + +As soon as `Pred(Hd)` returns false, the recursion will +stop and the rest of the list need not be traversed. +Since 'andalso' is not tail-recursive, however, the +function will need stack space proportional to the number +of list elements that are traversed. + +To see more clearly that 'andalso' is not tail-recursive, +here is `all/2` with 'andalso' expanded out to a nested +'case' expression (as it would be in R12B-5): + + all(Pred, [Hd|Tail]) -> + case Pred(Hd) of + false -> false; + true -> case all(Pred, Tail) of + false -> false; + true -> true + end + end; + all(_, []) -> + true. + +To make `all/2` tail-recursive in R12B-5, you would have +to write a 'case' expression yourself: + + all(Pred, [Hd|Tail]) -> + case Pred(Hd) of + false -> false; + true -> all(Pred, Tail) + end; + all(_, []) -> + true. + +If this EEP is accepted, in R13B we could write like +this + + all(Pred, [Hd|Tail]) -> + Pred(Hd) andalso all(Pred, Tail); + all(_, []) -> + true. + +and the `all/2` function would be tail-recursive. + +In my opinion, the latter is easier to read and write. +The 'case' expression is mostly boiler-plate code +where 'true' and 'false' must be correctly spelled +several times. (Misspellings like 'ture' and 'flase' +are quite common, but are in most cases found the +first time the program is tested.)
+ +It could be argued that because Erlang has clearly defined truth +values (unlike some other languages where 0 is false and +everything else true), all operators that operate on booleans +should make sure that their arguments are booleans. + +Testing both arguments of 'and' and 'or' makes +sense, because the code executed for those operators always GETS +the values of both operands. But 'andalso' and 'orelse' only test +their second operand SOME of the time. + + X = 1, X >= 0 andalso X % checked error + X = 1, X < 0 andalso X % unchecked error + +There doesn't seem to be much point in checking SOME of the time, +especially when it does something as dramatic as blocking tail +recursion. + +Richard O'Keefe's motivation in [EEP 17][] is "Cultural consistency" +with other languages. See [EEP 17][]. - To unlock the full potential of 'andalso'/'orelse' in Erlang. - Given the current implementation, you either have to make - rewrite code that is naturally written using AND and OR - operators using 'case', or only use 'andalso'/'orelse' when - you know that your lists are relatively short. - For instance, the function 'all/2' that returns 'true' if - all elements of a list satisfies a predicate and 'false' - otherwise, can be written like this: +Rationale +========= - all(Pred, [Hd|Tail]) -> - Pred(Hd) and all(Pred, Tail); - all(_, []) -> - true. +Surprisingly (for me), the subject of this EEP turned out to +be controversial. - In each recursion, we test that the current element Hd - satisfies the predicate AND that the rest of the list also - matches the predicate. The code reads almost like English. +I will start this rationale by listing some of the more serious +arguments against this proposal and my counter-arguments, and +finish with the arguments for this proposal. - Of course, 'and' evaluates both of its operand, so the entire - list will be traversed even if the first element of the list - fails to satisfy the predicate. 
Furthermore, 'and' is not - tail-recursive, so the function will use stack space - proportional to the length of the list. +One argument against is to be that the new construct +will be confusing for users. 'andalso'/'orelse' can no longer +be described as a "boolean operator", but is now a "control +structure". - To avoid the traversing the rest of the list if one element - fails to satisfy the predicate, we can use 'andalso': +Yes, 'andalso'/'orelse' is no longer a boolean operator in the +sense that it no longer GUARANTEES that it returns a boolean. +However, using 'andalso'/'orelse' as a 'case' expression - all(Pred, [Hd|Tail]) -> - Pred(Hd) andalso all(Pred, Tail); - all(_, []) -> - true. - - As soon as 'Pred(Hd)' returns false, the recursion will - stop and the rest of the list need not be traversed. - Since 'andalso' is not tail-recursive, however, the - function will need stack space proportional to the number - of list elements that are traversed. - - To see more clearly that 'andalso' is not tail-recursive, - here is 'all/1' with 'andalso' expanded out to a nested - 'case' expression (as it would be in R12B-5): - - all(Pred, [Hd|Tail]) -> - case Pred(Hd) of - false -> false; - true -> case all(Pred, Tail) of - false -> false; - true -> true - end - end; - all(_, []) -> - true. - - To make 'all/1' tail-recursive in R12B-5, you would have - to write a 'case' expression yourself: - - all(Pred, [Hd|Tail]) -> - case Pred(Hd) of - false -> false; - true -> all(Pred, Tail) - end; - all(_, []) -> - true. - - If this EEP is accepted, in R13B we could write like - this - - all(Pred, [Hd|Tail]) -> - Pred(Hd) andalso all(Pred, Tail); - all(_, []) -> - true. + case E1 orelse E2 of + true -> ....; + false -> ... + end - and the 'all/1' function would be tail-recursive. +works in the same way as before. Most users certainly will not +notice any difference. 
And if an operator is not allowed to not +evaluate both of its arguments, it certainly wasn't an operator +before either. - In my opinion, the latter is easier to read and write. - The 'case' expression is mostly boiler-plate code - where 'true' and 'false' must be correctly spelled - several times. (Misspellings like 'ture' and 'flase' - are quite common, but are in most cases found the - first time the program is tested.) +Another argument against is that 'andalso'/'orelse' can be +used in one-liners to write "ugly code", such as - It could be argued that because Erlang has clearly defined truth - values (unlike some other languages where 0 is false and - everything else true), all operators that operate on booleans - should make sure that their arguments are booleans. + Debug andalso io:format("...", [...]) - Testing both arguments of 'and' and 'or' makes - sense, because the code executed for those operators always GETS - the values of both operands. But 'andalso' and 'orelse' only test - their second operand SOME of the time. +instead of - X = 1, X >= 0 andalso X % checked error - X = 1, X < 0 andalso X % unchecked error + if + Debug -> io:format("...", [...]); + true -> ok + end + +The code might be "ugly" (according to someone's taste or +some definition of "ugly"), but the one-liner is not hard +to understand and I don't see how it could turn into a +code-maintenance problem. - There doesn't seem to be much point in checking SOME of the time, - especially when it does something as dramatic as blocking tail - recursion. +The main argument for making 'andalso'/'orelse' tail-recursive: +The current implementation is dangerous. You could very easily +write non-tail-recursive code, for instance - Richard O'Keefe's motivation in EEP 17 is "Cultural consistency" - with other languages. See [1]. + all(Pred, [Hd|Tail]) -> + Pred(Hd) andalso all(Pred, Tail); + all(_, []) -> + true. -Rationale +without realizing it and introduce serious performance +problems. 
(Which has happened in [practice][2]). - Surprisingly (for me), the subject of this EEP turned out to - be controversial. - - I will start this rationale by listing some of the more serious - arguments against this proposal and my counter-arguments, and - finish with the arguments for this proposal. - - One argument against is to be that the new construct - will be confusing for users. 'andalso'/'orelse' can no longer - be described as a "boolean operator", but is now a "control - structure". - - Yes, 'andalso'/'orelse' is no longer a boolean operator in the - sense that it no longer GUARANTEES that it returns a boolean. - However, using 'andalso'/'orelse' as a 'case' expression - - case E1 orelse E2 of - true -> ....; - false -> ... - end - - works in the same way as before. Most users certainly will not - notice any difference. And if an operator is not allowed to not - evaluate both of its arguments, it certainly wasn't an operator - before either. - - Another argument against is that 'andalso'/'orelse' can be - used in one-liners to write "ugly code", such as - - Debug andalso io:format("...", [...]) - - instead of - - if - Debug -> io:format("...", [...]); - true -> ok - end - - The code might be "ugly" (according to someone's taste or - some definition of "ugly"), but the one-liner is not hard - to understand and I don't see how it could turn into a - code-maintenance problem. - - The main argument for making 'andalso'/'orelse' tail-recursive: - The current implementation is dangerous. You could very easily - write non-tail-recursive code, for instance - - all(Pred, [Hd|Tail]) -> - Pred(Hd) andalso all(Pred, Tail); - all(_, []) -> - true. - - without realizing it and introduce serious performance - problems. (Which has happened in practice, see [2]). - - If you cannot use 'andalso'/'orelse' in this way, these - operators become pretty useless. (Some would say - "utterly useless" - see [2].) 
You have to rewrite - beautiful code (in my opinion) to uglier code (in - comparison, in my opinion) and more error-prone - code (misspelling of 'true'/'false' in the boiler-plate - code): - - all(Pred, [Hd|Tail]) -> - case Pred(Hd) of - false -> false; - true -> all(Pred, Tail) - end; - all(_, []) -> - true. - +If you cannot use 'andalso'/'orelse' in this way, these +operators become pretty useless. (Some would say +["utterly useless"][2].) You have to rewrite +beautiful code (in my opinion) to uglier code (in +comparison, in my opinion) and more error-prone +code (misspelling of 'true'/'false' in the boiler-plate +code): -Backwards Compatibility + all(Pred, [Hd|Tail]) -> + case Pred(Hd) of + false -> false; + true -> all(Pred, Tail) + end; + all(_, []) -> + true. - Any code that ran without raising exceptions will continue - to produce the same results, except for running faster. - Code that did raise exceptions may raise different exceptions - elsewhere later, or may quietly complete in unexpected ways. - I believe it to be unlikely that anyone deliberately relied - on (E1 andalso 0) raising an exception. - Code that was previously broken because these operators have - such surprising behavior will now work in more cases. +Backwards Compatibility +======================= +Any code that ran without raising exceptions will continue +to produce the same results, except for running faster. +Code that did raise exceptions may raise different exceptions +elsewhere later, or may quietly complete in unexpected ways. +I believe it to be unlikely that anyone deliberately relied +on `(E1 andalso 0)` raising an exception. -Reference Implementation +Code that was previously broken because these operators have +such surprising behavior will now work in more cases. - The proposed change has been implemented and run in our - daily builds without finding any code in Erlang/OTP that - needed to be updated. 
One test case in the compiler test - suite that that test 'andalso'/'orelse' needed to be updated. -References - - [1] Richard O'Keefe: EEP 17 - Fix andalso and orelse. - http://www.erlang.org/eeps/eep-0017.html +Reference Implementation +======================== - [2] Mikael Pettersson: e-mail to erlang-questions: - http://www.erlang.org/pipermail/erlang-questions/2008-November/039935.html +The proposed change has been implemented and run in our +daily builds without finding any code in Erlang/OTP that +needed to be updated. One test case in the compiler test +suite that tests 'andalso'/'orelse' needed to be updated. -Copyright + +[EEP 17]: eep-0017.md + "Richard O'Keefe: EEP 17 - Fix andalso and orelse" - This document has been placed in the public domain. +[2]: http://www.erlang.org/pipermail/erlang-questions/2008-November/039935.html + "Mikael Pettersson: e-mail to erlang-questions" -Local Variables: -mode: indented-text -indent-tabs-mode: nil -sentence-end-double-space: t -fill-column: 70 -coding: utf-8 -End: +Copyright +========= + +This document has been placed in the public domain.
+[EmacsVar]: <> "Local Variables:" +[EmacsVar]: <> "mode: indented-text" +[EmacsVar]: <> "indent-tabs-mode: nil" +[EmacsVar]: <> "sentence-end-double-space: t" +[EmacsVar]: <> "fill-column: 70" +[EmacsVar]: <> "coding: utf-8" +[EmacsVar]: <> "End:" diff --git a/eeps/eep-0027.md b/eeps/eep-0027.md index fe195e5..aa72ce9 100644 --- a/eeps/eep-0027.md +++ b/eeps/eep-0027.md @@ -1,104 +1,118 @@ -EEP: 27 -Title: Multi-Parameter Typechecking BIFs -Version: $Revision$ -Last-Modified: $Date$ -Author: James Hague [james (dot) hague (at) gmail (dot) com] -Status: Draft -Type: Standards Track -Content-Type: text/plain -Created: 18-Feb-2009 -Post-History: + Author: James Hague + Status: Draft + Type: Standards Track + Created: 18-Feb-2009 + Post-History: +**** +EEP 27: Multi-Parameter Typechecking BIFs +---- + Abstract +======== - Typechecking guards (e.g., is_float/1) are useful for a number of - reasons, but they're verbose. I propose allowing multiple - parameters to the "is_" famility of functions, which - significantly reduces source code bulk in common cases. +Typechecking guards (e.g., `is_float/1`) are useful for a number of +reasons, but they're verbose. I propose allowing multiple +parameters to the `is_` family of functions, which +significantly reduces source code bulk in common cases. + + Specification - - Where "is_type" represents any of the "is_" family of functions, - such as "is_float": +============= + +Where `is_type` represents any of the `is_` family of functions, +such as `is_float`: + +`is_type(A, B, C, ...)` is equivalent to `(is_type(A) andalso +is_type(B) andalso is_type(C)...)`. + +The is_type functions can now take from 1 to N parameters, where +N is the implementation defined limit on function arity. + +The old-style guards (e.g., `float/1`) would not change, as some of +those serve double duty as typecasts.
+ +Direct references to these functions in the erlang module are for +the single parameter versions only (such as `fun +erlang:is_float/1`). - is_type(A, B, C, ...) is equivalent to "(is_type(A) andalso - is_type(B) andalso is_type(C)...)". - - The is_type functions can now take from 1 to N parameters, where - N is the implementation defined limit on function arity. - The old-style guards (e.g., float/1) would not change, as some of - those serve double duty as typecasts. - - Direct references to these functions in the erlang module are for - the single parameter versions only (such as fun - erlang:is_float/1). Motivation +========== + +I find myself adding typechecking guards not only for safety, but +to improve code generation quality, especially when using floats. +Writing three or four element vector math functions in Erlang, +with `is_float` guards, is verbose. The `is_float` checks dwarf what +would otherwise be a single-line function by adding multiple lines +of guards. + - I find myself adding typechecking guards not only for safety, but - to improve code generation quality, especially when using floats. - Writing three or four element vector math functions in Erlang, - with is_float guards, is verbose. The is_float checks dwarf what - would otherwise be a single-line function by adding multiple lines - of guards. Rationale +========= - Here's an example from the Wings3D project: +Here's an example from the Wings3D project: cross({V10,V11,V12}, {V20,V21,V22}) when is_float(V10), is_float(V11), is_float(V12), is_float(V20), is_float(V21), is_float(V22) -> {V11*V22-V12*V21,V12*V20-V10*V22,V10*V21-V11*V20}. - The is_float checks significantly improve the quality of the - generated code, allowing floats to be kept in virtual machine - registers instead of allocated on the heap. 
If multiple - parameters to is_float were allowed, this code could be - rewritten as: +The `is_float` checks significantly improve the quality of the +generated code, allowing floats to be kept in virtual machine +registers instead of allocated on the heap. If multiple +parameters to `is_float` were allowed, this code could be +rewritten as: cross({V10,V11,V12}, {V20,V21,V22}) when is_float(V10,V11,V12,V20,V21,V22) -> {V11*V22-V12*V21,V12*V20-V10*V22,V10*V21-V11*V20}. - In the second version, the intent is clearer at a glance, and - the source-level weight of adding typechecking doesn't overwhelm - the function. +In the second version, the intent is clearer at a glance, and +the source-level weight of adding typechecking doesn't overwhelm +the function. + +Over the years the Erlang system has become more reliant on +typechecking. There are the dialyzer and typer tools. The +compiler can statically infer types and generate better code as +a result. Making typechecking guards be lighter-weight at the +source code level encourages their use and is more in-line with +the overall syntactic density of the language. + - Over the years the the Erlang system has become more reliant on - typechecking. There are the dialyzer and typer tools. The - compiler can statically infer types and generate better code as - a result. Making typechecking guards be lighter-weight at the - source code level encourages their use and is more in-line with - the overall syntactic density of the language. Backwards Compatibility +======================= + +All uses of the `is_type/1` functions will still work if this +proposal were implemented. Direct references to +`erlang:is_float`, `erlang:is_atom`, etc., as funs will still work +as originally intended. + - All uses of the is_type/1 functions will still work if this - proposal were implemented. Direct references to - erlang:is_float, erlang:is_atom, etc., as funs will still work - as originally intended.
Reference Implementation +======================== + +None. - None. -References - - None. Copyright +========= - This document has been placed in the public domain. +This document has been placed in the public domain. -Local Variables: -mode: indented-text -indent-tabs-mode: nil -sentence-end-double-space: t -fill-column: 70 -coding: utf-8 -End: +[EmacsVar]: <> "Local Variables:" +[EmacsVar]: <> "mode: indented-text" +[EmacsVar]: <> "indent-tabs-mode: nil" +[EmacsVar]: <> "sentence-end-double-space: t" +[EmacsVar]: <> "fill-column: 70" +[EmacsVar]: <> "coding: utf-8" +[EmacsVar]: <> "End:" diff --git a/eeps/eep-0028.md b/eeps/eep-0028.md index 3b8c187..23eb02c 100644 --- a/eeps/eep-0028.md +++ b/eeps/eep-0028.md @@ -1,255 +1,264 @@ -EEP: 28 -Title: Optional leading semicolons for choices -Version: $Revision$ -Last-Modified: $Date$ -Author: Richard A. O'Keefe -Status: Draft -Type: Standards Track -Erlang-Version: R12B-4 -Content-Type: text/plain -Created: 08-Aug-2008 -Post-History: + Author: Richard A. O'Keefe + Status: Draft + Type: Standards Track + Erlang-Version: R12B-4 + Created: 08-Aug-2008 + Post-History: +**** +EEP 28: Optional leading semicolons for choices +---- Abstract +======== - If, case, receive, and try clauses may begin with a semicolon. +'If', 'case', 'receive', and 'try' clauses may begin with a semicolon. Specification +============= - A semicolon is allowed after the keywords 'if', 'of', - 'receive' (provided the next word is not 'after'), - and 'catch' (in a 'try' expression). +A semicolon is allowed after the keywords 'if', 'of', +'receive' (provided the next word is not 'after'), +and 'catch' (in a 'try' expression). - The semicolon has no effect; it is merely there to allow - a layout style which makes it easier to see the semicolons, - easier to ensure that commas are commas and semicolons are - semicolons, and easier to change the order of choices. 
+The semicolon has no effect; it is merely there to allow +a layout style which makes it easier to see the semicolons, +easier to ensure that commas are commas and semicolons are +semicolons, and easier to change the order of choices. Motivation - - In his PhD thesis on compiling Prolog, Peter van Roy complained - that commas and semicolons were hard to distinguish. In response, - I developed a Prolog layout style where commas go at the end of - lines and semicolons go at the beginner, so that a human being - reading the text is never in doubt about which is intended. - - Commas and semicolons remain hard to distinguish in Erlang. - It turns out that a semicolons-at-the-front style works well - for Erlang too. - - do_load_driver(Path, Driver, DriverFlags) -> - case erl_ddll:try_load(Path, Driver, - [{monitor,pending_driver}]++DriverFlags) of - {error, inconsistent} -> - {error,bad_driver_name}; - {error, What} -> - {error,What}; - {ok, already_loaded} -> - ok; - {ok,loaded} -> - ok; - {ok, pending_driver, Ref} -> - receive - {'DOWN', Ref, driver, _, load_cancelled} -> - {error, load_cancelled}; - {'UP', Ref, driver, _, permanent} -> - {error, permanent}; - {'DOWN', Ref, driver, _, - {load_failure, Failure}} -> - {error, Failure}; - {'UP', Ref, driver, _, loaded} -> - ok - end - end. - - In this layout style, the visually most salient part is the - beginning of the line, and except for 'case', 'receive', and - 'end', _every_ line could be _any_ line. Indentation alone - is not a reliable guide, because some logical lines have to - be split across multiple physical lines. 
- - My current style is - - do_load_driver(Path, Driver, DriverFlags) -> - case erl_ddll:try_load(Path, Driver, - [{monitor,pending_driver}]++DriverFlags) - of {error, inconsistent} -> - {error,bad_driver_name} - ; {error, What} -> - {error,What} - ; {ok, already_loaded} -> - ok - ; {ok,loaded} -> - ok - ; {ok, pending_driver, Ref} -> - receive - {'DOWN', Ref, driver, _, load_cancelled} -> - {error, load_cancelled} - ; {'UP', Ref, driver, _, permanent} -> - {error, permanent} - ; {'DOWN', Ref, driver, _, - {load_failure, Failure}} -> - {error, Failure} - ; {'UP', Ref, driver, _, loaded} -> - ok - end - end. - - Here the leading semicolons make it *obvious* with even half - an eye where each choice begins, and the line of semicolons - (lining up with the 'd' of 'end') makes it easy to see the - structure without a ruler. There is only one snag: the - first choice has to be different. It would be more consistent - to write - - do_load_driver(Path, Driver, DriverFlags) -> - case erl_ddll:try_load(Path, Driver, - [{monitor,pending_driver}]++DriverFlags) of - ; {error, inconsistent} -> - {error,bad_driver_name} - ; {error, What} -> - {error,What} - ; {ok, already_loaded} -> - ok - ; {ok,loaded} -> - ok - ; {ok, pending_driver, Ref} -> - receive - ; {'DOWN', Ref, driver, _, load_cancelled} -> - {error, load_cancelled} - ; {'UP', Ref, driver, _, permanent} -> - {error, permanent} - ; {'DOWN', Ref, driver, _, - {load_failure, Failure}} -> - {error, Failure} - ; {'UP', Ref, driver, _, loaded} -> - ok - end - end. - - Now each choice has the same structure, and if we wished to - reorder the choices, we could easily do so without adding, - removing, or changing any punctuation. - - It is relevant to see what case statements look like in some other - programming languages, to see that this style is quite general. 
- - Fortran: - SELECT CASE (expression) - CASE (values and ranges) - statements - CASE (values and ranges) - statements - CASE DEFAULT - statements - END CASE - - Ada: - case Expression is - when Discrete_Choice_List => - Statements; - when Discrete_Choice_List => - Statements; - when others => - Statements; - end case; - - PL/I: - select (Expression); - when (Values) Statement; - when (Values) Statement; - otherwise Statement; - end; - - These all exhibit "comb style", the ability to rearrange choices - without adding, removing, or changing punctuation or keywords, - and a clear indication at the _beginning_ of each choice. +========== + +In his PhD thesis on compiling Prolog, Peter van Roy complained +that commas and semicolons were hard to distinguish. In response, +I developed a Prolog layout style where commas go at the end of +lines and semicolons go at the beginning, so that a human being +reading the text is never in doubt about which is intended. + +Commas and semicolons remain hard to distinguish in Erlang. +It turns out that a semicolons-at-the-front style works well +for Erlang too. + + do_load_driver(Path, Driver, DriverFlags) -> + case erl_ddll:try_load(Path, Driver, + [{monitor,pending_driver}]++DriverFlags) of + {error, inconsistent} -> + {error,bad_driver_name}; + {error, What} -> + {error,What}; + {ok, already_loaded} -> + ok; + {ok,loaded} -> + ok; + {ok, pending_driver, Ref} -> + receive + {'DOWN', Ref, driver, _, load_cancelled} -> + {error, load_cancelled}; + {'UP', Ref, driver, _, permanent} -> + {error, permanent}; + {'DOWN', Ref, driver, _, + {load_failure, Failure}} -> + {error, Failure}; + {'UP', Ref, driver, _, loaded} -> + ok + end + end. + +In this layout style, the visually most salient part is the +beginning of the line, and except for 'case', 'receive', and +'end', _every_ line could be _any_ line. Indentation alone +is not a reliable guide, because some logical lines have to +be split across multiple physical lines.
+ +My current style is + + do_load_driver(Path, Driver, DriverFlags) -> + case erl_ddll:try_load(Path, Driver, + [{monitor,pending_driver}]++DriverFlags) + of {error, inconsistent} -> + {error,bad_driver_name} + ; {error, What} -> + {error,What} + ; {ok, already_loaded} -> + ok + ; {ok,loaded} -> + ok + ; {ok, pending_driver, Ref} -> + receive + {'DOWN', Ref, driver, _, load_cancelled} -> + {error, load_cancelled} + ; {'UP', Ref, driver, _, permanent} -> + {error, permanent} + ; {'DOWN', Ref, driver, _, + {load_failure, Failure}} -> + {error, Failure} + ; {'UP', Ref, driver, _, loaded} -> + ok + end + end. + +Here the leading semicolons make it *obvious* with even half +an eye where each choice begins, and the line of semicolons +(lining up with the 'd' of 'end') makes it easy to see the +structure without a ruler. There is only one snag: the +first choice has to be different. It would be more consistent +to write + + do_load_driver(Path, Driver, DriverFlags) -> + case erl_ddll:try_load(Path, Driver, + [{monitor,pending_driver}]++DriverFlags) of + ; {error, inconsistent} -> + {error,bad_driver_name} + ; {error, What} -> + {error,What} + ; {ok, already_loaded} -> + ok + ; {ok,loaded} -> + ok + ; {ok, pending_driver, Ref} -> + receive + ; {'DOWN', Ref, driver, _, load_cancelled} -> + {error, load_cancelled} + ; {'UP', Ref, driver, _, permanent} -> + {error, permanent} + ; {'DOWN', Ref, driver, _, + {load_failure, Failure}} -> + {error, Failure} + ; {'UP', Ref, driver, _, loaded} -> + ok + end + end. + +Now each choice has the same structure, and if we wished to +reorder the choices, we could easily do so without adding, +removing, or changing any punctuation. + +It is relevant to see what case statements look like in some other +programming languages, to see that this style is quite general. 
+ +* Fortran: + + SELECT CASE (expression) + CASE (values and ranges) + statements + CASE (values and ranges) + statements + CASE DEFAULT + statements + END CASE + +* Ada: + + case Expression is + when Discrete_Choice_List => + Statements; + when Discrete_Choice_List => + Statements; + when others => + Statements; + end case; + +* PL/I: + + select (Expression); + when (Values) Statement; + when (Values) Statement; + otherwise Statement; + end; + +These all exhibit "comb style", the ability to rearrange choices +without adding, removing, or changing punctuation or keywords, +and a clear indication at the _beginning_ of each choice. Rationale +========= - People who like the usual Erlang style should not be forced to - change. This means that the leading semicolon must be optional, - not required. +People who like the usual Erlang style should not be forced to +change. This means that the leading semicolon must be optional, +not required. - Some of the benefits claimed above could be had by allowing - optional trailing semicolons instead of optional leading ones. - However, in Erlang as it stands, the semicolon is an operator, - not a terminator. There is nothing unusual about allowing an - operator to have a prefix version as well as an infix version. - There isn't even anything unusual about a prefix operator that - doesn't do much except clarify things: '+' is the obvious - example. So allowing a "do-nothing" prefix use of semicolons - in certain contexts is still within the spirit of Erlang. +Some of the benefits claimed above could be had by allowing +optional trailing semicolons instead of optional leading ones. +However, in Erlang as it stands, the semicolon is an operator, +not a terminator. There is nothing unusual about allowing an +operator to have a prefix version as well as an infix version. +There isn't even anything unusual about a prefix operator that +doesn't do much except clarify things: '+' is the obvious +example. 
So allowing a "do-nothing" prefix use of semicolons +in certain contexts is still within the spirit of Erlang. - That apart, the change is about as simple as it could be. - The only doubtful point is whether a semicolon should be - allowed before 'after'. But 'after' is already a keyword - explaining what comes next, and it can't be moved around - freely anyway. Since there seems to be nothing to gain, - let's not do it. +That apart, the change is about as simple as it could be. +The only doubtful point is whether a semicolon should be +allowed before 'after'. But 'after' is already a keyword +explaining what comes next, and it can't be moved around +freely anyway. Since there seems to be nothing to gain, +let's not do it. Backwards Compatibility +======================= - All existing Erlang code remains acceptable with unchanged - semantics. The leading semicolons are dealt with entirely in - the parser; other language manipulation tools never know that - the semicolons were ever there, so work perfectly with code - using the new style. +All existing Erlang code remains acceptable with unchanged +semantics. The leading semicolons are dealt with entirely in +the parser; other language manipulation tools never know that +the semicolons were ever there, so work perfectly with code +using the new style. Reference Implementation +======================== + +The auxiliary file [eep-0028-1.diff][] +is a patch file to be applied to `erl_parse.yrl`. +The patched file has been checked by yecc, which is happy +with it, and the resulting .erl file compiles cleanly. +However, that's all the testing that has been done. + +All that the implementation does is to change - The auxiliary file eep-0028-1.diff - is a patch file to be applied to erl_parse.yrl. - The patched file has been checked by yecc, which is happy - with it, and the resulting .erl file compiles cleanly. - However, that's all the testing that has been done. + .... 'thingy' ..... 
- All that the implementation does is to change - .... 'thingy' ..... - to - .... thingy_kw ..... +to - thingy_kw -> 'thingy'. - thingy_kw -> 'thingy' ';'. + .... thingy_kw ..... - in several places. This form of change, rather than + thingy_kw -> 'thingy'. + thingy_kw -> 'thingy' ';'. - .... 'thingy' optional_semicolon ...., +in several places. This form of change, rather than - was chosen so that the '$n' forms in the existing rules would - need no revision, so I am confident that no errors were - introduced by this change. + .... 'thingy' optional_semicolon ...., +was chosen so that the '$n' forms in the existing rules would +need no revision, so I am confident that no errors were +introduced by this change. -References - - None. +[eep-0028-1.diff]: eep-0028-1.diff + "Diff to apply to erl_parse.yrl" Copyright +========= - This document has been placed in the public domain. +This document has been placed in the public domain. -Local Variables: -mode: indented-text -indent-tabs-mode: nil -sentence-end-double-space: t -fill-column: 70 -coding: utf-8 -End: +[EmacsVar]: <> "Local Variables:" +[EmacsVar]: <> "mode: indented-text" +[EmacsVar]: <> "indent-tabs-mode: nil" +[EmacsVar]: <> "sentence-end-double-space: t" +[EmacsVar]: <> "fill-column: 70" +[EmacsVar]: <> "coding: utf-8" +[EmacsVar]: <> "End:" diff --git a/eeps/eep-0029.md b/eeps/eep-0029.md index 9848c3e..37533a0 100644 --- a/eeps/eep-0029.md +++ b/eeps/eep-0029.md @@ -1,768 +1,801 @@ -EEP: 29 -Title: Abstract Patterns, Stage 1 -Version: $Revision$ -Last-Modified: $Date$ -Author: Richard A. O'Keefe -Status: Draft -Type: Standards Track -Erlang-Version: R12B-5 -Content-Type: text/plain -Created: 25-Feb-2009 - -Post-History: + Author: Richard A. 
O'Keefe + Status: Draft + Type: Standards Track + Erlang-Version: R12B-5 + Created: 25-Feb-2009 + Post-History: +**** +EEP 29: Abstract Patterns, Stage 1 +---- Abstract +======== + +Abstract Patterns are named pattern/guard combinations +which can be used + +- in patterns, to support abstract data types +- as user-defined guards, guaranteed safe-for-guards +- as ordinary functions +- to replace many but not all uses of macros. - Abstract Patterns are named pattern/guard combinations - which can be used - - in patterns, to support abstract data types - - as user-defined guards, guaranteed safe-for-guards - - as ordinary functions - - to replace many but not all uses of macros. - The full proposal has six stages, of which this is stage 1. - This stage allows only simple abstract patterns which can be - handled by in-line substitution, so requiring no change to the - Erlang Virtual Machine. +The full proposal has six stages, of which this is stage 1. +This stage allows only simple abstract patterns which can be +handled by in-line substitution, so requiring no change to the +Erlang Virtual Machine. Specification +============= - We introduce abstract pattern declarations and calls. - The syntax is given as an adaptation of that in parse.yrl. +We introduce abstract pattern declarations and calls. +The syntax is given as an adaptation of that in parse.yrl. form -> abstract_pattern dot. abstract_pattern -> '#' atom clause_args clause_guard '->' expr. - - For future reference, we'll use the schematic rule - #A(H1, ..., Hn) when G -> B. - where an empty clause_guard is taken to mean that G is 'true'. - H1, ..., Hn and B must all be patterns. - Abstract patterns may not be directly or indirectly recursive. +For future reference, we'll use the schematic rule + + #A(H1, ..., Hn) when G -> B. + +where an empty clause_guard is taken to mean that `G` is 'true'. +`H1, ..., Hn` and `B` must all be patterns. + +Abstract patterns may not be directly or indirectly recursive. 
expr_700 -> pattern_call. pattern_call -> '#' atom argument_list - The expressions in the argument_list of a pattern_call must be - - patterns in a pattern - - guard expressions elsewhere in a guard - - any expression elsewhere in an ordinary expression. +The expressions in the argument_list of a pattern_call must be - There are two ways to understand the semantics of abstract - patterns: as function calls and as inline substitution. +- patterns in a pattern +- guard expressions elsewhere in a guard +- any expression elsewhere in an ordinary expression. - Considered as functions, stage 1 abstract patterns correspond - to two functions. Given our schematic rule, we get +There are two ways to understand the semantics of abstract +patterns: as function calls and as inline substitution. - '#A->'(H1, ..., Hn) when G -> B. +Considered as functions, stage 1 abstract patterns correspond +to two functions. Given our schematic rule, we get - That is, part of the meaning of an abstract pattern is a - function that works just the way it looks as if it works. - (The name '#A->' is for expository purposes and should not - be taken literally. In particular, it is NOT part of this - specification that such a function should be directly - accessible at all, still less that it should be accessible - by a name of that form.) So + '#A->'(H1, ..., Hn) when G -> B. - #permute([R,A,T]) when is_atom(A) -> [T,A,R]. +That is, part of the meaning of an abstract pattern is a +function that works just the way it looks as if it works. +(The name '#A->' is for expository purposes and should not +be taken literally. In particular, it is NOT part of this +specification that such a function should be directly +accessible at all, still less that it should be accessible +by a name of that form.) So - acts in one direction just like + #permute([R,A,T]) when is_atom(A) -> [T,A,R]. - '#permute->'([R,A,T]) when is_atom(A) -> [T,A,R]. +acts in one direction just like - would. 
Because abstract patterns are not allowed to be - recursive and cannot have any side effects, it is safe - to call them in guards. As a guard test, #A(E1,...,En) - is equivalent to (true = '#A->'(E1,...,En)). + '#permute->'([R,A,T]) when is_atom(A) -> [T,A,R]. - In the other direction, we get +would. Because abstract patterns are not allowed to be +recursive and cannot have any side effects, it is safe +to call them in guards. As a guard test, `#A(E1,...,En)` +is equivalent to `(true = '#A->'(E1,...,En))`. - '#A='(B) when G -> {H1, ..., Hn}. +In the other direction, we get - A pattern match + '#A='(B) when G -> {H1, ..., Hn}. - #A(P1, ..., Pn) = E +A pattern match - is equivalent to + #A(P1, ..., Pn) = E - {P1, ..., Pn} = '#A='(E) +is equivalent to - When some of the patterns Hi, B use '=', the definition is - a little trickier. Suppose, for example, we have - #foo([H|T] = X) -> {H,T}. - A naive translation would be - '#foo='({H,T}) -> [H|T] = X. - which would not work, because X would be undefined. The - basic problem here is that '=' in patterns is symmetric, - while '=' in expressions is not. The real translation - has to be that - #A(H11=H12=.., ..., Hn1=Hn2=..) when G -> B - is equivalent to - '#A='(B) - when G, X1=H11, X1=H12, ..., Xn=Hn1, Xn=Hn2, ... - -> {X1, ..., Xn} - where the bindings Xi=Hij are both sorted and re-ordered - (that is, switched from Xi=Hij to Hij=Xi) according to - data flow. In the case of the #foo/1 example, we'd get - '#foo='({H,T}) when X1 = [H|T], X = X1 -> {X1}. - The sorting and reordering process is easier than it sounds. - While there is an equation Xi=Hij such that either every - variable in Hij is known or Xi is known, add Xi=Hij if - Hij is all known, or Hij = Xi if Xi is known. + {P1, ..., Pn} = '#A='(E) - This sorting-and-reordering-by-dataflow is also recommended - in the forward direction when B contains '='. +When some of the patterns Hi, B use '=', the definition is +a little trickier. 
Suppose, for example, we have - Sometimes one or the other direction of an abstract pattern - cannot be constructed, even with sorting and reordering by - dataflow. This is typically because one side contains a - variable that doesn't occur on the other. For example, + #foo([H|T] = X) -> {H,T}. - #first(X) -> {X,_}. - #second(Y) -> {_,Y}. +A naive translation would be - are usable as patterns, but not as functions. The compiler - should issue a warning for such abstract patterns but allow - them. It should be a run-time error to call such a pattern - as a function as well. It should be possible to suppress - the warning, perhaps by + '#foo='({H,T}) -> [H|T] = X. - -compile({pattern_only,[{first,1,second,1}]}). +which would not work, because X would be undefined. The +basic problem here is that '=' in patterns is symmetric, +while '=' in expressions is not. The real translation +has to be that - (That's within the current syntax. Ideally that should be - #first/1 and #second/1.) + #A(H11=H12=.., ..., Hn1=Hn2=..) when G -> B - For another example, +is equivalent to - #is_date(#date(_,_,_)) -> true. + '#A='(B) + when G, X1=H11, X1=H12, ..., Xn=Hn1, Xn=Hn2, ... + -> {X1, ..., Xn} - is usable as a function, even/especially in a guard, but is - not usable as a pattern. The compiler should issue a - warning for such abstract patterns but allow them. It - should be a run-time error to call such a pattern as well. - It should be possible to suppress the warning, perhaps by +where the bindings `Xi=Hij` are both sorted and re-ordered +(that is, switched from `Xi=Hij` to `Hij=Xi`) according to +data flow. In the case of the `#foo/1` example, we'd get - -compile({function_only,[{is_date,1}]}). + '#foo='({H,T}) when X1 = [H|T], X = X1 -> {X1}. - Definition via in-line substituion is straightforward. - All of the following rewrites assume a standard renaming - of variables. +The sorting and reordering process is easier than it sounds. 
+While there is an equation `Xi=Hij` such that either every +variable in `Hij` is known or `Xi` is known, add `Xi=Hij` if +`Hij` is all known, or `Hij = Xi` if `Xi` is known. - f(... #A(P1,...,Pn) ...) when Gf -> Bf +This sorting-and-reordering-by-dataflow is also recommended +in the forward direction when B contains '='. - rewrites to +Sometimes one or the other direction of an abstract pattern +cannot be constructed, even with sorting and reordering by +dataflow. This is typically because one side contains a +variable that doesn't occur on the other. For example, - f(... B ...) - when G, Xi=Hij..., {P1,...,Pn} = {X1,...,Xn}, Gf -> Bf + #first(X) -> {X,_}. + #second(Y) -> {_,Y}. - case ... of ... #(P1,...,Pn) ... when Gc -> Bc - - rewrites to +are usable as patterns, but not as functions. The compiler +should issue a warning for such abstract patterns but allow +them. It should be a run-time error to call such a pattern +as a function as well. It should be possible to suppress +the warning, perhaps by - case ... of ... B ... - when G, Xi=Hij..., {P1,...,Pn} = {X1,...,Xn}, Gc -> Bc + -compile({pattern_only,[{first,1},{second,1}]}). - P = E +(That's within the current syntax. Ideally that should be +`#first/1` and `#second/1`.) - rewrites to +For another example, - case E of P -> ok end + #is_date(#date(_,_,_)) -> true. - In a guard expression, +is usable as a function, even/especially in a guard, but is +not usable as a pattern. The compiler should issue a +warning for such abstract patterns but allow them. It +should be a run-time error to call such a pattern as well. +It should be possible to suppress the warning, perhaps by - (... #A(E1, ..., En) ...) + -compile({function_only,[{is_date,1}]}). - rewrites to +Definition via in-line substitution is straightforward. +All of the following rewrites assume a standard renaming +of variables. - {H1,...,Hn} = {E1,...,En}, G, (... B ...) + f(... #A(P1,...,Pn) ...)
when Gf -> Bf - As a guard test, +rewrites to - #A(E1, ..., En) + f(... B ...) + when G, Xi=Hij..., {P1,...,Pn} = {X1,...,Xn}, Gf -> Bf - rewrites to + case ... of ... #(P1,...,Pn) ... when Gc -> Bc - {H1,...,Hn} = {E1,...,En}, G, true = B +rewrites to - As an ordinary expression, + case ... of ... B ... + when G, Xi=Hij..., {P1,...,Pn} = {X1,...,Xn}, Gc -> Bc - #A(E1, ..., En) + P = E - rewrites to +rewrites to - case {E1,...,En} of {H1,...,Hn} when G -> B end + case E of P -> ok end +In a guard expression, + (... #A(E1, ..., En) ...) -Motivation +rewrites to + + {H1,...,Hn} = {E1,...,En}, G, (... B ...) + +As a guard test, + + #A(E1, ..., En) + +rewrites to + + {H1,...,Hn} = {E1,...,En}, G, true = B + +As an ordinary expression, + + #A(E1, ..., En) + +rewrites to - Even in this restricted form, abstract patterns solve a lot - of problems that keep coming up on the Erlang mailing list. - They were invented to serve two main purposes: to greatly - reduce the need for the preprocessor, and to support the - use of abstract data types. It turns out that they can also - reduce the amount of keyboard work a programmer has to do, - and increase the amount of type information available to the - compiler. - - Macros are often used to provide named constants. - For example, - - -define(unknown, "UNKNOWN"). - f(?unknown, Actors) -> Actors; - f(N, Actors) -> lists:keydelete(N, #actor.name, Actors). - - A function is not used here because function calls may not - appear in patterns. Abstract patterns are functions that - are sufficiently restricted that they _may_ appear in patterns: - - #unknown() -> "UNKNOWN". - f(#unknown(), Actors) -> Actors; - f(N, Actors) -> lists:keydelete(n, #actor.name, Actors). - - Sometimes these constants must be computed. - For example, - - -define(START_TIMEOUT, 1000 * 30). - - Thanks to variable binding in guards, we can do that too: - - #start_timeout() when N = 1000*30 -> N. 
- - There are things that macros cannot do, because there needs - to be a guard test as well as a pattern. Macros can't bilocate. - - #date(D, M, Y) - when is_integer(Y), Y >= 1600, Y =< 2500, - is_integer(M), M >= 1, M =< 12, - is_integer(D), D >= 1, D =< 31 - -> {Y, M, D}. - - #vector3(X, Y, Z) - when is_float(X), is_float(Y), is_float(Z) - -> {X, Y, Z}. - - #mod_func(M, F) when is_atom(M), is_atom(F) -> {M, F}. - - #mod_func_arity(M, F, A) - when is_atom(M), is_atom(F), is_integer(A), A >= 0 - -> {M, F, A}. - - Some macros cannot be replaced by abstract patterns. - - -define(DBG(DbgLvl, Format, Data), - dbg(DbgLvl, Format, Data)). - - cannot be an abstract pattern because the right hand side - involves a call to an ordinary function. - - Some macros define guard tests. For example, - - -define(tab, 9). - -define(space, 32). - -define(is_tab(X), X == ?tab). - -define(is_space(X), X == ?space). - -define(is_underline(X), X == $_). - -define(is_number(X), X >= $0, X =< $9). - -define(is_upper(X), X >= $A, X =< $Z). - -define(is_lower(X), X >= $a, X =< $z). - - token([X|File], L, Result, Gen, BsNl) - when ?is_upper(X) -> - GenNew = case Gen of not_set -> var; _ -> Gen end, - {Rem, Var} = tok_var(File, [X]), - token(Rem, L, [{var,Var}|Result], GenNew, BsNl); - token([X|File], L, Result, Gen, BsNl) - when ?is_lower(X) -> - GenNew = case Gen of not_set -> var; _ -> Gen end, - {Rem, Var} = tok_var(File, [X]), - token(Rem, L, [{var,Var}|Result], GenNew, BsNl); - token([X|File], L, Result, Gen, BsNl) - when ?is_underline(X) -> - GenNew = case Gen of not_set -> var; _ -> Gen end, - {Rem, Var} = tok_var(File, [X]), - token(Rem, L, [{var,Var}|Result], GenNew, BsNl); - - These can be converted to abstract patterns that are usable - as guard tests, - - #tab() -> 9. - #space() -> 32. - #is_tab(#tab()) -> true. - #is_space(#space()) -> true. - #is_underline($_)) -> true. - #is_number(X) when X >= $0, X =< $9 -> true. - #is_upper(X) when X >= $A, X =< $Z -> true. 
- #is_lower(X) when X >= $a, X =< $z -> true. - - token([X|File], L, Result, Gen, BsNl) - when #is_upper(X) -> - GenNew = case Gen of not_set -> var; _ -> Gen end, - {Rem, Var} = tok_var(File, [X]), - token(Rem, L, [{var,Var}|Result], GenNew, BsNl); - token([X|File], L, Result, Gen, BsNl) - when #is_lower(X) -> - GenNew = case Gen of not_set -> var; _ -> Gen end, - {Rem, Var} = tok_var(File, [X]), - token(Rem, L, [{var,Var}|Result], GenNew, BsNl); - token([X|File], L, Result, Gen, BsNl) - when #is_underline(X) -> - GenNew = case Gen of not_set -> var; _ -> Gen end, - {Rem, Var} = tok_var(File, [X]), - token(Rem, L, [{var,Var}|Result], GenNew, BsNl); - - or to abstract patterns that can be used as patterns, - - #tab() -> 9. - #space() -> 32. - #underline(X) when X == $_ -> X. - #number(X) when X >= $0, X =< $9 -> X. - #upper(X) when X >= $A, X =< $Z -> X. - #lower(X) when X >= $a, X =< $z -> X. - - token([#upper(X)|File], L, Result, Gen, BsNl) -> - GenNew = case Gen of not_set -> var; _ -> Gen end, - {Rem, Var} = tok_var(File, [X]), - token(Rem, L, [{var,Var}|Result], GenNew, BsNl); - token([#lower(X)|File], L, Result, Gen, BsNl) -> - GenNew = case Gen of not_set -> var; _ -> Gen end, - {Rem, Var} = tok_var(File, [X]), - token(Rem, L, [{var,Var}|Result], GenNew, BsNl); - token([#underline(X)|File], L, Result, Gen, BsNl) -> - GenNew = case Gen of not_set -> var; _ -> Gen end, - {Rem, Var} = tok_var(File, [X]), - token(Rem, L, [{var,Var}|Result], GenNew, BsNl); - - Of course we can use disjunction in the guard of an - abstract pattern. - - #id_start(X) when X >= $A, X =< $Z - ; X >= $a, X =< $z - ; X == $_ -> X. - - token([#is_start(X)|File], L, Result, Gen, BsNl) -> - GenNew = case Gen of not_set -> var; _ -> Gen end, - {Rem, Var} = tok_var(File, [X]), - token(Rem, L, [{var,Var}|Result], GenNew, BsNl); - - Yes, the original macro-based version could have done the same. - It's from the OTP sources; don't blame me. 
- - Aside from replacing a pattern AND a guard, which macros cannot - do, the great advantages over patterns over macros are that - - they can be syntax-checked at the point of definition, - while macros can only be syntax-checked at the point of use; - - there is no problem, indeed no possibility, of variable name - capture; - - abstract patterns are value based, not token-list based, so - there are no problems with operators. - - Consider the following OTP macro: - - -define(IC_FLAG_TEST(_F1, _I1), ((_F1 band _I1) == _I1)). - - First, the author was evidently scared of accidental collisions - with other variable names. Second, the parentheses look as - though they are there in case of operator precedence bugs. - There's at least one other like it, - - -define(is_set(F, Bits), ((F) band (Bits)) == (F)). - - which (correctly) suggests that the first macro doesn't have enough - parentheses. The abstract pattern equivalent, - - #ic_flag_test(Flags, Mask) when Flags band Mask == Mask -> true. - - has neither problem. - - Once again, there are things that abstract patterns cannot do. - For example, - - -define(get_max(_X, _Y), if _X > _Y -> _X; true -> _Y end). - -define(get_min(_X, _Y), if _X > _Y -> _Y; true -> _X end). - - These cannot be abstract patterns because an abstract pattern - cannot contain an 'if' or a 'case' or any other control structure. - But they can, and should, be ordinary inline functions: - - -compile({inline,[{max,2},{min,2}]}). - max(X, Y) -> if X > Y -> X; true -> Y end. - min(X, Y) -> if X > Y -> Y; true -> X end. - - Abstract patterns don't need to do what ordinary functions can. - Here's another example from the OTP sources. - - -define(LOWER(Char), - if - Char >= $A, Char =< $Z -> - Char - ($A - $a); - true -> - Char - end). - tolower(Chars) -> - [?LOWER(Char) || Char <- Chars]. - - This could, and should, have been an ordinary inlined function. - Abstract patterns don't need to do what ordinary functions can. 
- Let's examine it a little closer. Suppose we had a pattern - Cl = #lower(Cx) - which when used as an ordinary function converted both $x and $X - to $x. Then when used as a pattern #lower(Cx) = $x, there would - be two correct answers for Cx. There are no other cases where - a pattern may match more than one way. The fact that abstract - patterns cannot do conditionals is one of the things that makes - them usable as patterns. - - Macros are sometimes used for module names. - - -define(SERVER,{rmod_random_impl, - list_to_atom("babbis@" ++ - hd(tl(string:tokens(atom_to_list(node()),"@"))))}). - - -define(CLIENTMOD,'rmod_random'). - - produce() -> ?CLIENTMOD:produce(?SERVER). - - Abstract patterns can be used for this too, but there is an - error waiting to happen. - - server() -> {rmod_random_impl, - list_to_atom("babbis@" ++ - hd(tl(string:tokens(atom_to_list(node()),"@"))))}. + case {E1,...,En} of {H1,...,Hn} when G -> B end - #client_mod() -> 'rmod_random'. - - produce -> #client_mod():produce(server()). - The risk is that of writing #client_mod:produce(server()), - which is the syntax we'll want in stage 2 for calling an - abstract pattern defined in another module. - There is one thing that macros are used for that abstract - patterns can be used for, but you'd probably rather not. + +Motivation +========== + +Even in this restricted form, abstract patterns solve a lot +of problems that keep coming up on the Erlang mailing list. +They were invented to serve two main purposes: to greatly +reduce the need for the preprocessor, and to support the +use of abstract data types. It turns out that they can also +reduce the amount of keyboard work a programmer has to do, +and increase the amount of type information available to the +compiler. + +Macros are often used to provide named constants. +For example, + + -define(unknown, "UNKNOWN"). + f(?unknown, Actors) -> Actors; + f(N, Actors) -> lists:keydelete(N, #actor.name, Actors). 
+ +A function is not used here because function calls may not +appear in patterns. Abstract patterns are functions that +are sufficiently restricted that they _may_ appear in patterns: + + #unknown() -> "UNKNOWN". + f(#unknown(), Actors) -> Actors; + f(N, Actors) -> lists:keydelete(N, #actor.name, Actors). + +Sometimes these constants must be computed. +For example, + + -define(START_TIMEOUT, 1000 * 30). + +Thanks to variable binding in guards, we can do that too: + + #start_timeout() when N = 1000*30 -> N. + +There are things that macros cannot do, because there needs +to be a guard test as well as a pattern. Macros can't bilocate. + + #date(D, M, Y) + when is_integer(Y), Y >= 1600, Y =< 2500, + is_integer(M), M >= 1, M =< 12, + is_integer(D), D >= 1, D =< 31 + -> {Y, M, D}. + + #vector3(X, Y, Z) + when is_float(X), is_float(Y), is_float(Z) + -> {X, Y, Z}. + + #mod_func(M, F) when is_atom(M), is_atom(F) -> {M, F}. + + #mod_func_arity(M, F, A) + when is_atom(M), is_atom(F), is_integer(A), A >= 0 + -> {M, F, A}. + +Some macros cannot be replaced by abstract patterns. + + -define(DBG(DbgLvl, Format, Data), + dbg(DbgLvl, Format, Data)). + +cannot be an abstract pattern because the right hand side +involves a call to an ordinary function. + +Some macros define guard tests. For example, + + -define(tab, 9). + -define(space, 32). + -define(is_tab(X), X == ?tab). + -define(is_space(X), X == ?space). + -define(is_underline(X), X == $_). + -define(is_number(X), X >= $0, X =< $9). + -define(is_upper(X), X >= $A, X =< $Z). + -define(is_lower(X), X >= $a, X =< $z).
+ + token([X|File], L, Result, Gen, BsNl) + when ?is_upper(X) -> + GenNew = case Gen of not_set -> var; _ -> Gen end, + {Rem, Var} = tok_var(File, [X]), + token(Rem, L, [{var,Var}|Result], GenNew, BsNl); + token([X|File], L, Result, Gen, BsNl) + when ?is_lower(X) -> + GenNew = case Gen of not_set -> var; _ -> Gen end, + {Rem, Var} = tok_var(File, [X]), + token(Rem, L, [{var,Var}|Result], GenNew, BsNl); + token([X|File], L, Result, Gen, BsNl) + when ?is_underline(X) -> + GenNew = case Gen of not_set -> var; _ -> Gen end, + {Rem, Var} = tok_var(File, [X]), + token(Rem, L, [{var,Var}|Result], GenNew, BsNl); + +These can be converted to abstract patterns that are usable +as guard tests, + + #tab() -> 9. + #space() -> 32. + #is_tab(#tab()) -> true. + #is_space(#space()) -> true. + #is_underline($_) -> true. + #is_number(X) when X >= $0, X =< $9 -> true. + #is_upper(X) when X >= $A, X =< $Z -> true. + #is_lower(X) when X >= $a, X =< $z -> true. + + token([X|File], L, Result, Gen, BsNl) + when #is_upper(X) -> + GenNew = case Gen of not_set -> var; _ -> Gen end, + {Rem, Var} = tok_var(File, [X]), + token(Rem, L, [{var,Var}|Result], GenNew, BsNl); + token([X|File], L, Result, Gen, BsNl) + when #is_lower(X) -> + GenNew = case Gen of not_set -> var; _ -> Gen end, + {Rem, Var} = tok_var(File, [X]), + token(Rem, L, [{var,Var}|Result], GenNew, BsNl); + token([X|File], L, Result, Gen, BsNl) + when #is_underline(X) -> + GenNew = case Gen of not_set -> var; _ -> Gen end, + {Rem, Var} = tok_var(File, [X]), + token(Rem, L, [{var,Var}|Result], GenNew, BsNl); + +or to abstract patterns that can be used as patterns, + + #tab() -> 9. + #space() -> 32. + #underline(X) when X == $_ -> X. + #number(X) when X >= $0, X =< $9 -> X. + #upper(X) when X >= $A, X =< $Z -> X. + #lower(X) when X >= $a, X =< $z -> X. - Abstract patterns were also invented with the aim of - replacing at least some uses of records.
Frames (or Joe - Armstrong's structs, which are essentially the same thing) - are a superior way to do that. Let's see a simple case. - - -record(mark_params, {cell_id, - virtual_col, - virtual_row - }). - ... - MarkP = mark_params(), - ... - NewMarkP = MarkP#mark_params{cell_id = undefined, - virtual_col = undefined, - virtual_row = VirtualRow - }, - - This becomes - - % General - #mark_params(Cell, Row, Col) -> {mark_params, Cell, Row, Col}. - % Initial value - #mark_params() -> #mark_params(undefined, undefined, undefined). - % Recogniser - #is_mark_params({mark_params,_,_,_}) -> true. - % Cell extractor - #mark_params__cell(#mark_params(Cell,_,_)) -> Cell. - % Cell updater - #mark_params__cell(Cell, #mark_params(_,R,C)) -> - #mark_params(Cell, R, C). - % Row extractor - #mark_params__row(#mark_params(_,Row,_)) -> Row. - % Row updater - #mark_params__row(Row, #mark_params(K,_,C)) -> - #mark_params(K, Row, C). - % Col extractor - #mark_params__col(#mark_params(_,_,Col)) -> Col. - % Col updater - #mark_params__col(Col, #mark_params(K,R,_)) -> - #mark_params(K, R, Col). - ... - MarkP = #mark_params(), - ... - NewMarkP = #mark_params__row(VirtualRow, - #mark_params__col(undefined, - #mark_params__cell(undefined, MarkP))) - - The extractor and updater patterns can be derived automatically, - which comes in stage 4. With frames/structs, we may never bother. - - There is a feature of Haskell that I have long loved. - That is so-called "n+k patterns", where a pattern may be N+K - for N a variable and K a positive integer. This matches V - if V is an integer greater than or equal to K, and binds N - to V - K. 
For example, - - fib 0 = 1 - fib 1 = 1 - fib (n+2) = fib n + fib (n+1) + token([#upper(X)|File], L, Result, Gen, BsNl) -> + GenNew = case Gen of not_set -> var; _ -> Gen end, + {Rem, Var} = tok_var(File, [X]), + token(Rem, L, [{var,Var}|Result], GenNew, BsNl); + token([#lower(X)|File], L, Result, Gen, BsNl) -> + GenNew = case Gen of not_set -> var; _ -> Gen end, + {Rem, Var} = tok_var(File, [X]), + token(Rem, L, [{var,Var}|Result], GenNew, BsNl); + token([#underline(X)|File], L, Result, Gen, BsNl) -> + GenNew = case Gen of not_set -> var; _ -> Gen end, + {Rem, Var} = tok_var(File, [X]), + token(Rem, L, [{var,Var}|Result], GenNew, BsNl); + +Of course we can use disjunction in the guard of an +abstract pattern. + + #id_start(X) when X >= $A, X =< $Z + ; X >= $a, X =< $z + ; X == $_ -> X. + + token([#id_start(X)|File], L, Result, Gen, BsNl) -> + GenNew = case Gen of not_set -> var; _ -> Gen end, + {Rem, Var} = tok_var(File, [X]), + token(Rem, L, [{var,Var}|Result], GenNew, BsNl); + +Yes, the original macro-based version could have done the same. +It's from the OTP sources; don't blame me. + +Aside from replacing a pattern AND a guard, which macros cannot +do, the great advantages of abstract patterns over macros are that + +- they can be syntax-checked at the point of definition, + while macros can only be syntax-checked at the point of use; +- there is no problem, indeed no possibility, of variable name + capture; +- abstract patterns are value based, not token-list based, so + there are no problems with operators. + +Consider the following OTP macro: + + -define(IC_FLAG_TEST(_F1, _I1), ((_F1 band _I1) == _I1)). + +First, the author was evidently scared of accidental collisions +with other variable names. Second, the parentheses look as +though they are there in case of operator precedence bugs. + +There's at least one other like it, + + -define(is_set(F, Bits), ((F) band (Bits)) == (F)). + +which (correctly) suggests that the first macro doesn't have enough +parentheses.
The abstract pattern equivalent, + + #ic_flag_test(Flags, Mask) when Flags band Mask == Mask -> true. + +has neither problem. + +Once again, there are things that abstract patterns cannot do. +For example, + + -define(get_max(_X, _Y), if _X > _Y -> _X; true -> _Y end). + -define(get_min(_X, _Y), if _X > _Y -> _Y; true -> _X end). + +These cannot be abstract patterns because an abstract pattern +cannot contain an 'if' or a 'case' or any other control structure. +But they can, and should, be ordinary inline functions: + + -compile({inline,[{max,2},{min,2}]}). + max(X, Y) -> if X > Y -> X; true -> Y end. + min(X, Y) -> if X > Y -> Y; true -> X end. + +Abstract patterns don't need to do what ordinary functions can. +Here's another example from the OTP sources. + + -define(LOWER(Char), + if + Char >= $A, Char =< $Z -> + Char - ($A - $a); + true -> + Char + end). + tolower(Chars) -> + [?LOWER(Char) || Char <- Chars]. + +This could, and should, have been an ordinary inlined function. +Abstract patterns don't need to do what ordinary functions can. +Let's examine it a little closer. Suppose we had a pattern + + Cl = #lower(Cx) + +which when used as an ordinary function converted both `$x` and `$X` +to `$x`. Then when used as a pattern `#lower(Cx) = $x`, there would +be two correct answers for `Cx`. There are no other cases where +a pattern may match more than one way. The fact that abstract +patterns cannot do conditionals is one of the things that makes +them usable as patterns. + +Macros are sometimes used for module names. + + -define(SERVER,{rmod_random_impl, + list_to_atom("babbis@" ++ + hd(tl(string:tokens(atom_to_list(node()),"@"))))}). + + -define(CLIENTMOD,'rmod_random'). + + produce() -> ?CLIENTMOD:produce(?SERVER). + +Abstract patterns can be used for this too, but there is an +error waiting to happen. + + server() -> {rmod_random_impl, + list_to_atom("babbis@" ++ + hd(tl(string:tokens(atom_to_list(node()),"@"))))}. + + #client_mod() -> 'rmod_random'. 
- Not that that's a good way to implement the Fibonacci function, - of course. (It takes O(phi^N) when O(log N) is attainable.) - There's no such thing in Erlang. But with abstract patterns, - we could program it: - - #succ(M) when is_integer(N), N >= 1, M = N - 1 -> N. - - fib(0) -> 1; - fib(1) -> 1; - fib(#succ(#succ(N)) -> fib(N) + fib(N+1). - - Sometimes we want a three-way split: - N = 1 - N = 2k+0 (k >= 1) - N = 2k+1 (k >= 1) - We can program that too: - #one() -> 1. - #even(K) - when is_integer(N), (N band 1) == 0, N >= 2, K = N div 2 - -> N. - #odd(K) - when is_integer(N), (N band 1) == 1, N >= 3, K = N div 2 - -> N. - - ruler(#one()) -> 0 ; - ruler(#even(K)) -> 1 + ruler(K); - ruler(#odd(K)) -> 1. - - Let's turn to abstract data types. - There are three obvious ways to implement association lists - as single data structures: - - [{K1,V1}, ..., {Kn,Vn}] % pairs - [K1,V1, ..., Kn,Vn] % alternating - {K1,V1, ..., {Kn,Vn,[]}} % triples - - Suppose you cannot make up your mind which is better. - - #empty_alist() -> []. - -ifdef(PAIRS). - #non_empty_alist(K,V,R) -> [{K,V}|R]. - -else. - -ifdef(TRIPLES). - #non_empty_alist(K,V,R) -> {K,V,R}. - -else. - #non_empty_alist(K,V,R) -> [K,V|R]. - -endif. - -endif. - - zip([K|Ks], [V|Vs]) -> - #non_empty_alist(K, V, zip(Ks, Vs)); - zip([], []) -> - #empty_alist(). - - lookup(K, #non_empty_alist(K,V,_), _) -> - V; - lookup(K, #non_empty_alist(_,_,R), D) -> - lookup(K, R, D); - lookup(K, #empty_alist(), D) -> - D. - - Now you can switch between the three implementations, for - testing and benchmarking, by flicking a single preprocessor - switch. - - Sometimes there is something that would have been an algebraic - data type in Haskell or Clean or SML or CAML, but in Erlang we - just have to use a variety of tuples. The parsed form of - Erlang source code is a good example. 
- - lform({attribute,Line,Name,Arg}, Hook) -> - lattribute({attribute,Line,Name,Arg}, Hook); - lform({function,Line,Name,Arity,Clauses}, Hook) -> - lfunction({function,Line,Name,Arity,Clauses}, Hook); - lform({rule,Line,Name,Arity,Clauses}, Hook) -> - lrule({rule,Line,Name,Arity,Clauses}, Hook); - %% These are specials to make it easier for the compiler. - lform({error,E}, _Hook) -> - leaf(format("~p\n", [{error,E}])); - lform({warning,W}, _Hook) -> - leaf(format("~p\n", [{warning,W}])); - lform({eof,_Line}, _Hook) -> - $\n. - - We can define abstract patterns for these. - - #attribute(L, N, A) -> {attribute, L, N, A}. - #function( L, N, A, C) -> {function, L, N, A, C}. - #rule( L, N, A, C) -> {rule, L, N, A, C}. - #eof( L) -> {eof, L}. - #error( E_ -> {error, E}. - #warning( W) -> {warning, W}. - - #attribute() -> #attribute(_,_,_). - #function() -> #function(_,_,_,_). - #rule() -> #rule(_,_,_,_). - - lform(Form, Hook) -> - case Form - of #attribute() -> lattribute(Form, Hook) - ; #function() -> lfunction( Form, Hook) - ; #rule() -> lrule( Form, Hook) - ; #error(E) -> leaf(format("~p\n", [{error,E}])) - ; #warning(W) -> leaf(format("~p\n", [{warning,W}])) - ; #eof(_) -> $\n - end. - - It would almost be worth defining these patterns even if these - were their only occurrences, simply for the clarity they permit. - But these patterns would be used over and over again. Using - the patterns not only makes the code shorter and clearer, it - gives us two kinds of protection against changes to the data - representation. For example, suppose we decided to hold - Name/Arity information in 'function' and 'rule' tuples as - pairs, not as separate fields. Then we could do - - -ifdef(OLD_DATA). - #function( L, N, A, C) -> {function, L, N, A, C}. - #rule( L, N, A, C) -> {rule, L, N, A, C}. - #function( L, {N,A}, C) -> {function, L, N, A, C}. - #rule( L, {N,A}, C) -> {rule, L, N, A, C}. - -else. - #function( L, N, A, C) -> {function, L, {N,A}, C}. 
- #rule( L, N, A, C) -> {rule, L, {N,A}, C}. - #function( L, NA, C) -> {function, L, NA, C}. - #rule( L, NA, C) -> {rule, L, NA, C}. - -endif. - - The rest of the code would remain unchanged. That's one kind of - protection. It doesn't help us when we need to add new cases. - That's when the second kind of protection comes up. Looking - for '#function' is a much safer guide to finding relevant places - than looking for 'function'. + produce() -> #client_mod():produce(server()). + +The risk is that of writing `#client_mod:produce(server())`, +which is the syntax we'll want in stage 2 for calling an +abstract pattern defined in another module. +There is one thing that macros are used for that abstract +patterns can be used for, but you'd probably rather not. + +Abstract patterns were also invented with the aim of +replacing at least some uses of records. Frames (or Joe +Armstrong's structs, which are essentially the same thing) +are a superior way to do that. Let's see a simple case. + + -record(mark_params, {cell_id, + virtual_col, + virtual_row + }). + ... + MarkP = mark_params(), + ... + NewMarkP = MarkP#mark_params{cell_id = undefined, + virtual_col = undefined, + virtual_row = VirtualRow + }, + +This becomes + + % General + #mark_params(Cell, Row, Col) -> {mark_params, Cell, Row, Col}. + % Initial value + #mark_params() -> #mark_params(undefined, undefined, undefined). + % Recogniser + #is_mark_params({mark_params,_,_,_}) -> true. + % Cell extractor + #mark_params__cell(#mark_params(Cell,_,_)) -> Cell. + % Cell updater + #mark_params__cell(Cell, #mark_params(_,R,C)) -> + #mark_params(Cell, R, C). + % Row extractor + #mark_params__row(#mark_params(_,Row,_)) -> Row. + % Row updater + #mark_params__row(Row, #mark_params(K,_,C)) -> + #mark_params(K, Row, C). + % Col extractor + #mark_params__col(#mark_params(_,_,Col)) -> Col. + % Col updater + #mark_params__col(Col, #mark_params(K,R,_)) -> + #mark_params(K, R, Col). + ... + MarkP = #mark_params(), + ...
+ NewMarkP = #mark_params__row(VirtualRow, + #mark_params__col(undefined, + #mark_params__cell(undefined, MarkP))) + +The extractor and updater patterns can be derived automatically, +which comes in stage 4. With frames/structs, we may never bother. + +There is a feature of Haskell that I have long loved. +That is so-called "n+k patterns", where a pattern may be N+K +for N a variable and K a positive integer. This matches V +if V is an integer greater than or equal to K, and binds N +to V - K. For example, + + fib 0 = 1 + fib 1 = 1 + fib (n+2) = fib n + fib (n+1) + +Not that that's a good way to implement the Fibonacci function, +of course. (It takes O(phi^N) when O(log N) is attainable.) +There's no such thing in Erlang. But with abstract patterns, +we could program it: + + #succ(M) when is_integer(N), N >= 1, M = N - 1 -> N. + + fib(0) -> 1; + fib(1) -> 1; + fib(#succ(#succ(N))) -> fib(N) + fib(N+1). + +Sometimes we want a three-way split: + + N = 1 + N = 2k+0 (k >= 1) + N = 2k+1 (k >= 1) + +We can program that too: + + #one() -> 1. + #even(K) + when is_integer(N), (N band 1) == 0, N >= 2, K = N div 2 + -> N. + #odd(K) + when is_integer(N), (N band 1) == 1, N >= 3, K = N div 2 + -> N. + + ruler(#one()) -> 0 ; + ruler(#even(K)) -> 1 + ruler(K); + ruler(#odd(K)) -> 1. + +Let's turn to abstract data types. +There are three obvious ways to implement association lists +as single data structures: + + [{K1,V1}, ..., {Kn,Vn}] % pairs + [K1,V1, ..., Kn,Vn] % alternating + {K1,V1, ..., {Kn,Vn,[]}} % triples + +Suppose you cannot make up your mind which is better. + + #empty_alist() -> []. + -ifdef(PAIRS). + #non_empty_alist(K,V,R) -> [{K,V}|R]. + -else. + -ifdef(TRIPLES). + #non_empty_alist(K,V,R) -> {K,V,R}. + -else. + #non_empty_alist(K,V,R) -> [K,V|R]. + -endif. + -endif. + + zip([K|Ks], [V|Vs]) -> + #non_empty_alist(K, V, zip(Ks, Vs)); + zip([], []) -> + #empty_alist().
+ + lookup(K, #non_empty_alist(K,V,_), _) -> + V; + lookup(K, #non_empty_alist(_,_,R), D) -> + lookup(K, R, D); + lookup(K, #empty_alist(), D) -> + D. + +Now you can switch between the three implementations, for +testing and benchmarking, by flicking a single preprocessor +switch. + +Sometimes there is something that would have been an algebraic +data type in Haskell or Clean or SML or CAML, but in Erlang we +just have to use a variety of tuples. The parsed form of +Erlang source code is a good example. + + lform({attribute,Line,Name,Arg}, Hook) -> + lattribute({attribute,Line,Name,Arg}, Hook); + lform({function,Line,Name,Arity,Clauses}, Hook) -> + lfunction({function,Line,Name,Arity,Clauses}, Hook); + lform({rule,Line,Name,Arity,Clauses}, Hook) -> + lrule({rule,Line,Name,Arity,Clauses}, Hook); + %% These are specials to make it easier for the compiler. + lform({error,E}, _Hook) -> + leaf(format("~p\n", [{error,E}])); + lform({warning,W}, _Hook) -> + leaf(format("~p\n", [{warning,W}])); + lform({eof,_Line}, _Hook) -> + $\n. + +We can define abstract patterns for these. + + #attribute(L, N, A) -> {attribute, L, N, A}. + #function( L, N, A, C) -> {function, L, N, A, C}. + #rule( L, N, A, C) -> {rule, L, N, A, C}. + #eof( L) -> {eof, L}. + #error( E) -> {error, E}. + #warning( W) -> {warning, W}. + + #attribute() -> #attribute(_,_,_). + #function() -> #function(_,_,_,_). + #rule() -> #rule(_,_,_,_). + + lform(Form, Hook) -> + case Form + of #attribute() -> lattribute(Form, Hook) + ; #function() -> lfunction( Form, Hook) + ; #rule() -> lrule( Form, Hook) + ; #error(E) -> leaf(format("~p\n", [{error,E}])) + ; #warning(W) -> leaf(format("~p\n", [{warning,W}])) + ; #eof(_) -> $\n + end. + +It would almost be worth defining these patterns even if these +were their only occurrences, simply for the clarity they permit. +But these patterns would be used over and over again.
Using +the patterns not only makes the code shorter and clearer, it +gives us two kinds of protection against changes to the data +representation. For example, suppose we decided to hold +Name/Arity information in 'function' and 'rule' tuples as +pairs, not as separate fields. Then we could do + + -ifdef(OLD_DATA). + #function( L, N, A, C) -> {function, L, N, A, C}. + #rule( L, N, A, C) -> {rule, L, N, A, C}. + #function( L, {N,A}, C) -> {function, L, N, A, C}. + #rule( L, {N,A}, C) -> {rule, L, N, A, C}. + -else. + #function( L, N, A, C) -> {function, L, {N,A}, C}. + #rule( L, N, A, C) -> {rule, L, {N,A}, C}. + #function( L, NA, C) -> {function, L, NA, C}. + #rule( L, NA, C) -> {rule, L, NA, C}. + -endif. + +The rest of the code would remain unchanged. That's one kind of +protection. It doesn't help us when we need to add new cases. +That's when the second kind of protection comes up. Looking +for `#function` is a much safer guide to finding relevant places +than looking for `function`. Rationale +========= + +There is more to the idea of abstract patterns than this +specification describes. Here's a "road map". + +* Stage 0: + Allow pattern matching in guards. + This is the subject of another EEP, as it is + desirable in itself. This MUST be implemented + first before implementing Stage 1, because that's + what we want inlinable pattern calls to expand to. + +* Stage 1: + Simple abstract patterns restricted so that they + can be implemented exclusively by inline expansion. + This requires no change to the VM other than the + changes required for Stage 0. + + Import/export of patterns can be faked using the + preprocessor to -include definitions; this is not + ideal, but it's an acceptable stopgap. + +* Stage 2: + Abstract patterns are (pairs of) real functions; + they may be -exported and -imported, may be called + with module prefixes, can be replaced by hot loading, + should be traceable, debuggable, profilable, and so + on, just like other functions.
In Stage 2, exported + abstract patterns would need inline declarations if + they are to be inlined; other patterns would continue + to be inlined except when compiled in debugging mode. + + This requires fairly substantial changes to the + run time system. The big payoff here is that + imported abstract patterns can be replaced by hot + loading, unlike macros. - There is more to the idea of abstract patterns than this - specification describes. Here's a "road map". - - Stage 0: Allow pattern matching in guards. - This is the subject of another EEP, as it is - desirable in itself. This MUST be implemented - first before implementing Stage 1, because that's - what we want inlinable pattern calls to expand to. - - Stage 1: Simple abstract patterns restricted so that they - can be implemented exclusively by inline expansion. - This requires no change to the VM other than the - changes required for Stage 0. - - Import/export of patterns can be faked using the - preprocessor to -include definitions; this is not - ideal, but it's an acceptable stopgap. - - Stage 2: Abstract functions are (pairs of) real functions, - they may be -exported and -imported, may be called - with module prefixes, can be replaced by hot loading, - should be traceable, debuggable, profilable, and so - on, just like other functions. In Stage 2, exported - abstract patterns would need inline declarations if - they are to be inlined; other patterns would continue - to be inlined except when compiled in debugging mode. - - This requires fairly substantial changes to the - run time system. The big payoff here is that - imported abstract patterns can be replaced by hot - loading, unlike macros. - - Stage 3: #fun [Module:]Name/Arity and - #fun (P1, ..., Pn) when G -> B end - forms are introduced, and a metacall - #Var(E1,...,En) is added. - - This requires extensions to the Erlang term - representation and the VM. 
The gain here is that - the FAQ "how do I pass a pattern as a parameter" - finally gets a safe answer. For example, - - collect_messages(P) -> - lists:reverse(collect_messages_loop(P, [])). - - collect_messages_loop(P, Ms) -> - receive M = #P() -> collect_messages_loop([M|Ms]) - after 0 -> Ms - end. - - gathers all the messages currently in the mailbox - that match a pattern passed as a parameter. - - Stage 4: # field update, - as described in the original proposal. - - Stage 5: Multi-clause abstract patterns, - as described in the original proposal. - Multi-clause abstract patterns CAN handle - examples like ?get_max and ?LOWER, which makes - them even more useful in guards, but more than - a little dubious as patterns. - - Stage 6: "Hybrid" abstract patterns, where in #A/M+N the - first M arguments are always inputs, and only - the last N are outputs. This one isn't actually - my idea. The example - #range(L, U, N) - when is_integer(N), L =< N, N =< U - -> N. - comes from the mailing list. I don't like this very - much, and note that for some purposes, - range(L, U) -> - #fun(N) when is_integer(N), L =< N, N =< U - -> N end. - can do the same job. - - What I've done for this proposal is to strip away everything - that isn't essential. We get data abstraction, user defined - guard tests and functions, and a replacement for many uses - of macros, without run time overheads and without changes to - anything except the compiler front end, assuming that Stage 0 - is done first. +* Stage 3: + #fun [Module:]Name/Arity and + #fun (P1, ..., Pn) when G -> B end + forms are introduced, and a metacall -Backwards Compatibility + #Var(E1,...,En) is added. + + This requires extensions to the Erlang term + representation and the VM. The gain here is that + the FAQ "how do I pass a pattern as a parameter" + finally gets a safe answer. For example, + + collect_messages(P) -> + lists:reverse(collect_messages_loop(P, [])). 
+ + collect_messages_loop(P, Ms) -> + receive M = #P() -> collect_messages_loop(P, [M|Ms]) + after 0 -> Ms + end. + + gathers all the messages currently in the mailbox + that match a pattern passed as a parameter. - Erlang currently uses the sharp sign for record syntax. - Since record syntax uses curly braces, and abstract patterns - use round parentheses, no existing code should be affected. +* Stage 4: + `#` field update, + as described in the original proposal. +* Stage 5: + Multi-clause abstract patterns, + as described in the original proposal. + Multi-clause abstract patterns CAN handle + examples like `?get_max` and `?LOWER`, which makes + them even more useful in guards, but more than + a little dubious as patterns. +* Stage 6: + "Hybrid" abstract patterns, where in `#A/M+N` the + first `M` arguments are always inputs, and only + the last `N` are outputs. This one isn't actually + my idea. The example -Reference Implementation + #range(L, U, N) + when is_integer(N), L =< N, N =< U + -> N. - Sketched above. Given stage 0, this stage 1 is within my - knowledge and abilities, but I don't understand the Erlang - VM well enough to do stage 0. + comes from the mailing list. I don't like this very + much, and note that for some purposes, + range(L, U) -> + #fun(N) when is_integer(N), L =< N, N =< U + -> N end. + can do the same job. -References +What I've done for this proposal is to strip away everything +that isn't essential. We get data abstraction, user defined +guard tests and functions, and a replacement for many uses +of macros, without run time overheads and without changes to +anything except the compiler front end, assuming that Stage 0 +is done first. -Copyright +Backwards Compatibility +======================= + +Erlang currently uses the sharp sign for record syntax. +Since record syntax uses curly braces, and abstract patterns +use round parentheses, no existing code should be affected. + + - This document has been placed in the public domain.
+Reference Implementation +======================== + +Sketched above. Given stage 0, this stage 1 is within my +knowledge and abilities, but I don't understand the Erlang +VM well enough to do stage 0. -Local Variables: -mode: indented-text -indent-tabs-mode: nil -sentence-end-double-space: t -fill-column: 70 -coding: utf-8 -End: +Copyright +========= + +This document has been placed in the public domain. +[EmacsVar]: <> "Local Variables:" +[EmacsVar]: <> "mode: indented-text" +[EmacsVar]: <> "indent-tabs-mode: nil" +[EmacsVar]: <> "sentence-end-double-space: t" +[EmacsVar]: <> "fill-column: 70" +[EmacsVar]: <> "coding: utf-8" +[EmacsVar]: <> "End:" diff --git a/eeps/eep-0030.md b/eeps/eep-0030.md index b495f13..ef3b38e 100644 --- a/eeps/eep-0030.md +++ b/eeps/eep-0030.md @@ -1,142 +1,147 @@ -EEP: 30 -Title: Maximum and Minimum -Version: $Revision$ -Last-Modified: $Date$ -Author: Richard A. O'Keefe -Status: Draft -Type: Standards Track -Erlang-Version: R12B-4 -Content-Type: text/plain -Created: 10-Jul-2008 -Post-History: + Author: Richard A. O'Keefe + Status: Draft + Type: Standards Track + Erlang-Version: R12B-4 + Created: 10-Jul-2008 + Post-History: +**** +EEP 30: Maximum and Minimum +---- Abstract +======== - Add maximum and minimum core functions. +Add maximum and minimum core functions. Specification +============= - Currently the Erlang language has no built-in support for - the maximum and minimum operations. So we add new functions +Currently the Erlang language has no built-in support for +the maximum and minimum operations. 
So we add new functions - erlang:min(E1, E2) with the same effects and value as - (T1 = E1, T2 = E2, if T1 > T2 -> T2 ; true -> T1 end) +`erlang:min(E1, E2)` with the same effects and value as - erlang:max(E1, E2) with the same effects and value as - (T1 = E1, T2 = E2, if T1 > T2 -> T1 ; true -> T2 end) + (T1 = E1, T2 = E2, if T1 > T2 -> T2 ; true -> T1 end) - except that we expect them to be implemented using single VM - instructions, and we expect HiPE to use conditional moves on - machines that have them. +`erlang:max(E1, E2)` with the same effects and value as - The erlang: module prefix on max/2 (respectively min/2) can - be omitted if and only if there is no locally defind max/2 - (respectively min/2). + (T1 = E1, T2 = E2, if T1 > T2 -> T1 ; true -> T2 end) +except that we expect them to be implemented using single VM +instructions, and we expect HiPE to use conditional moves on +machines that have them. + +The `erlang:` module prefix on `max/2` (respectively `min/2`) can +be omitted if and only if there is no locally defined `max/2` +(respectively `min/2`). -Motivation - Maximum and minimum are extremely useful operations. - The fact that there is no standard way to express them in Erlang - has had the predictable result: there are definitions of max/2 - in tool_utils, tv_pg_gridfcns, tv_pb, tv_comm_func, - ssh_connection_handler, bssh_connection_handler, ssh_cli, - hipe_arm, hipe_schedule, hipe_ultra_prio, hipe_ppc_frame, - ?HIPE_X86_FRAME (presumably one each for 32- and 64-bit PCs), - hipe_sparc_frame, erl_recomment, erl_syntax_lib, appmon_info, - oh, the list goes on and on. There are dozens of copies. - There are nearly as many copies of min/2. And that's leaving - aside possible copies with different names. - - Not only are the operations useful, they can be implemented - more efficiently by the compiler than by the programmer. - If X < Y can be a VM instruction, so can min and max.
- Here's a first draft implementation: - - OpCase(i_minimum): { - r(0) = CMP_GT(tmp_arg1, tmp_arg2)) ? tmp_arg1 : tmp_arg2; - Next(1); - } - OpCase(i_maximum): { - r(0) = CMP_GT(tmp_arg1, tmp_arg2)) ? tmp_arg2 : tmp_arg1; - Next(1); - } - - Beware: untested code! Amongst other things, I don't know all the - places that need to be updated, or how, when new instructions are - added. These instructions are intended to be preceded by an - i_fetch instruction the way < and its other friends are. - - This is much cheaper than an Erlang function call, and it's much - easier for HiPE to recognise when a maximum or minimum of two - floating point numbers is involved and can be turned into a - compare and a conditional move. - - The most important thing is the barrier to thought that is - removed. When I'm writing Fortran, I know that max and min have - been there for decades, and I use those operations freely. - When I'm writing C, I know that those operations are not there, - and that there are problems with the conventional macros, so - I avoid them. As an experiment, I added max() and min() functions - to the version of AWK that I maintain. It was easy, and the - result is that I now have a lot of AWK code that can't be run by - anything else, because the operations are so handy. Erlang has - no *documented* maximum or minimum functions other than those in - the 'lists' module, and writing lists:max([X,Y]) is sufficiently - painful to deter all but the most determined. +Motivation +========== + +Maximum and minimum are extremely useful operations. 
+The fact that there is no standard way to express them in Erlang +has had the predictable result: there are definitions of `max/2` +in `tool_utils`, `tv_pg_gridfcns`, `tv_pb`, `tv_comm_func`, +`ssh_connection_handler`, `bssh_connection_handler`, `ssh_cli`, +`hipe_arm`, `hipe_schedule`, `hipe_ultra_prio`, `hipe_ppc_frame`, +`?HIPE_X86_FRAME` (presumably one each for 32- and 64-bit PCs), +`hipe_sparc_frame`, `erl_recomment`, `erl_syntax_lib`, `appmon_info`, +oh, the list goes on and on. There are dozens of copies. +There are nearly as many copies of `min/2`. And that's leaving +aside possible copies with different names. + +Not only are the operations useful, they can be implemented +more efficiently by the compiler than by the programmer. +If `X < Y` can be a VM instruction, so can `min` and `max`. +Here's a first draft implementation: + + OpCase(i_minimum): { + r(0) = CMP_GT(tmp_arg1, tmp_arg2) ? tmp_arg2 : tmp_arg1; + Next(1); + } + OpCase(i_maximum): { + r(0) = CMP_GT(tmp_arg1, tmp_arg2) ? tmp_arg1 : tmp_arg2; + Next(1); + } + +Beware: untested code! Amongst other things, I don't know all the +places that need to be updated, or how, when new instructions are +added. These instructions are intended to be preceded by an +`i_fetch` instruction the way < and its other friends are. + +This is much cheaper than an Erlang function call, and it's much +easier for HiPE to recognise when a maximum or minimum of two +floating point numbers is involved and can be turned into a +compare and a conditional move. + +The most important thing is the barrier to thought that is +removed. When I'm writing Fortran, I know that max and min have +been there for decades, and I use those operations freely. +When I'm writing C, I know that those operations are not there, +and that there are problems with the conventional macros, so +I avoid them. As an experiment, I added max() and min() functions +to the version of AWK that I maintain.
It was easy, and the +result is that I now have a lot of AWK code that can't be run by +anything else, because the operations are so handy. Erlang has +no *documented* maximum or minimum functions other than those in +the `lists` module, and writing `lists:max([X,Y])` is sufficiently +painful to deter all but the most determined. Rationale +========= - Function or operator? +Function or operator? - I believe that there are excellent reasons to use the standard - /\ and \/ symbols from lattice theory. However, discussion in - the EEPs mailing list showed that the community was divided - into - - people who were familiar with the operators - - people who insisted that they were only Boolean operators - - people who didn't get them at all because they weren't C. +I believe that there are excellent reasons to use the standard +`/\` and `\/` symbols from [lattice][] theory. However, discussion in +the EEPs mailing list showed that the community was divided +into - The ready availability of the operations as a standard part of - the language is much more important than what they are called, - so the second draft of this EEP switched to built in functions - in order to increase acceptance. +- people who were familiar with the operators +- people who insisted that they were only Boolean operators +- people who didn't get them at all because they weren't C. - The argument which finally settled it for me was the - internationalisation one: Japanese programmers may be using - keyboards where \ means or screens where \ displays as Yen, - so /\ and \/ just won't work for them. +The ready availability of the operations as a standard part of +the language is much more important than what they are called, +so the second draft of this EEP switched to built in functions +in order to increase acceptance. 
- We cannot use max and min as operators because the compiler - will not let you use a symbol as both an operator and a function - name, and there are lots and lots of uses of max and min as - function names. That's precisely the problem we're trying to - address here. So they have to be function names. +The argument which finally settled it for me was the +internationalisation one: Japanese programmers may be using +keyboards where `\` means or screens where `\` displays as Yen, +so `/\` and `\/` just won't work for them. - There is no great difficulty in adding new functions to the - erlang: module. +We cannot use `max` and `min` as operators because the compiler +will not let you use a symbol as both an operator and a function +name, and there are lots and lots of uses of `max` and `min` as +function names. That's precisely the problem we're trying to +address here. So they have to be function names. - I don't want to write the erlang: prefix here. There is - nothing new in making the erlang: prefix for some functions - optional either. +There is no great difficulty in adding new functions to the +`erlang:` module. - What we want is for existing modules with their own definitions - of max/2 and/or min/2 to remain legal, and then to be upgraded - simply by removing the redundant definitions. +I don't want to write the `erlang:` prefix here. There is +nothing new in making the `erlang:` prefix for some functions +optional either. - Imagine that you want to find the bounding box for a set - of 2D points. (This is adapted from code in Wings3D.) +What we want is for existing modules with their own definitions +of `max/2` and/or `min/2` to remain legal, and then to be upgraded +simply by removing the redundant definitions. - bounding_box([{X0,Y0}|Pts]) -> - bounding_box(Pts, X0,X0, Y0,Y0). +Imagine that you want to find the bounding box for a set +of 2D points. (This is adapted from code in Wings3D.) + bounding_box([{X0,Y0}|Pts]) -> + bounding_box(Pts, X0,X0, Y0,Y0). 
+ bounding_box([{X,Y}|Pts], Xlo,Xhi, Ylo,Yhi) -> if X < Xlo -> Xlo1 = X, Xhi1 = Xhi ; X > Xhi -> Xlo1 = Xlo, Xhi1 = X @@ -148,51 +153,52 @@ Rationale end, bounding_box(Pts, Xlo1,Xhi1, Ylo1,Yhi1); bounding_box([], Xlo,Xhi, Ylo,Yhi) -> - {{Xlo,Ylo}, {Xhi,Yhi}}. + {{Xlo,Ylo}, {Xhi,Yhi}}. - With maximum and minimum operators, this becomes +With maximum and minimum operators, this becomes bounding_box([{X,Y}|Pts], Xlo,Xhi, Ylo,Yhi) -> bounding_box(Pts, min(X,Xlo), max(X,Xhi), - min(Y,Ylo), max(Y,Yhi)); + min(Y,Ylo), max(Y,Yhi)); bounding_box([], Xlo,Xhi, Ylo,Yhi) -> - {{Xlo,Ylo}, {Xhi,Yhi}}. + {{Xlo,Ylo}, {Xhi,Yhi}}. Backwards Compatibility +======================= - No issues. Where a module already has max/2 or min/2, - the erlang: prefix is required to get the new function. +No issues. Where a module already has `max/2` or `min/2`, +the `erlang:` prefix is required to get the new function. Reference Implementation +======================== - I don't understand BEAM or the compiler well enough to - provide one, but the instruction definitions above are - offered as evidence that it should not be hard for those - who do. If this EEP is accepted I will be happy to write - the documentation for these operators. +I don't understand BEAM or the compiler well enough to +provide one, but the instruction definitions above are +offered as evidence that it should not be hard for those +who do. If this EEP is accepted I will be happy to write +the documentation for these operators. -References - - http://mathworld.wolfram.com/Lattice.html - +[lattice]: http://mathworld.wolfram.com/Lattice.html + "Lattice Algebra" Copyright +========= - This document has been placed in the public domain. +This document has been placed in the public domain. 
-Local Variables: -mode: indented-text -indent-tabs-mode: nil -sentence-end-double-space: t -fill-column: 70 -coding: utf-8 -End: +[EmacsVar]: <> "Local Variables:" +[EmacsVar]: <> "mode: indented-text" +[EmacsVar]: <> "indent-tabs-mode: nil" +[EmacsVar]: <> "sentence-end-double-space: t" +[EmacsVar]: <> "fill-column: 70" +[EmacsVar]: <> "coding: utf-8" +[EmacsVar]: <> "End:" diff --git a/eeps/eep-0031.md b/eeps/eep-0031.md index 1fc31a6..e26d8e7 100644 --- a/eeps/eep-0031.md +++ b/eeps/eep-0031.md @@ -1,32 +1,33 @@ -EEP: 31 -Title: Binary manipulation and searching module -Version: 1.2 -Last-Modified: 18-Dec-2009 12:12 -Author: Patrik Nyblom, Fredrik Svahn -Status: Draft -Type: Standards Track -Content-Type: text/x-rst -Created: 28-Nov-2009 -Erlang-Version: R13B03 -Post-History: + Author: Patrik Nyblom, Fredrik Svahn + Status: Draft + Type: Standards Track + Created: 28-Nov-2009 + Erlang-Version: R13B03 + Post-History: +**** +EEP 31: Binary manipulation and searching module +---- + Abstract ======== This EEP contains developed suggestions regarding the module ``binary`` -first suggested in EEP 9. +first suggested in [EEP 9][]. -EEP 9 suggests several modules and is partially superseded by later -EEP's (i.e. EEP 11), while still containing valuable suggestions not -yet implemented. The remaining modules from EEP 9 will therefore +[EEP 9][] suggests several modules and is partially superseded by later +EEP's (i.e. [EEP 11][]), while still containing valuable suggestions not +yet implemented. The remaining modules from [EEP 9][] will therefore appear in separate EEP's. This construction is made in agreement with -the original author of EEP 9. +the original author of [EEP 9][]. The module ``binary`` is suggested to contain fast searching algorithms together with some common operations on binaries already present for lists (in the lists module). - + + + Motivation ========== @@ -49,7 +50,7 @@ programming style. Some operations for converting lists to binaries and v.v. 
are today in the erlang module. BIFs concerning binaries now present have varied view of zero vs. one-based positioning in binaries. I.e. -binay_to_list/3 uses one-based while split_binary/2 uses +``binary_to_list/3`` uses one-based while ``split_binary/2`` uses zero-based. As the convention is to use zero-based, new functions for convertion binaries to lists and v.v. are needed. @@ -66,6 +67,8 @@ that a function for extracting parts of binaries is added to the set of guard BIFs. This would be consistent with the function element/2 being allowed in guards. + + Rationale ========= @@ -83,41 +86,42 @@ functionality in a forthcoming Erlang release. The functionality suggested is the following: -- Functionality for searching, splitting and replacing in - binaries. The functionality in some ways will overlap that of the - regular expression library already present in Erlang, but will be - even more efficient and will have a simpler interface. - -- Common operations on binaries that have their counterparts for lists - already in the stdlib module ``lists``. While not all interfaces in - the ``lists`` module are applicable to binaries, many are. This module - also provides a good place for future operations on binaries, - operations that are not applicable to lists or that we still don't - know the need for. - -- Functions for converting lists to binaries and v.v. These functions - should have a consistent view of zero-based indexing in binaries. - -- Operations on binaries concerning their internal - representation. This functionality is sometimes necessary to avoid - extensive use of memory due to the shared nature of the binaries. As - operations on binaries do not involve copying when binaries are - taken apart, programs can unknowingly (or at least unintentionally) - keep references to large binaries by holding seemingly small amounts - of data in the process.
The O(1) nature of many operations on - binaries makes the data sharing necessary, but the effects can - sometimes be surprising. On the other hand, O(n) complexity and - instant memory explosions when splitting a binary would be even more - surprising, why the current behavior need to be retained. It is suggested - that functions for both inspecting the nature of sharing of a binary - and to clone a copy of a binary to avoid sharing effects is present - in this suggested module. +- Functionality for searching, splitting and replacing in + binaries. The functionality in some ways will overlap that of the + regular expression library already present in Erlang, but will be + even more efficient and will have a simpler interface. + +- Common operations on binaries that have their counterparts for lists + already in the stdlib module ``lists``. While not all interfaces in + the ``lists`` module are applicable to binaries, many are. This module + also provides a good place for future operations on binaries, + operations that are not applicable to lists or that we still don't + know the need for. + +- Functions for converting lists to binaries and v.v. These functions + should have a consistent view of zero-based indexing in binaries. + +- Operations on binaries concerning their internal + representation. This functionality is sometimes necessary to avoid + extensive use of memory due to the shared nature of the binaries. As + operations on binaries do not involve copying when binaries are + taken apart, programs can unknowingly (or at least unintentionally) + keep references to large binaries by holding seemingly small amounts + of data in the process. The O(1) nature of many operations on + binaries makes the data sharing necessary, but the effects can + sometimes be surprising. On the other hand, O(n) complexity and + instant memory explosions when splitting a binary would be even more + surprising, why the current behavior need to be retained. 
It is suggested + that functions for both inspecting the nature of sharing of a binary + and to clone a copy of a binary to avoid sharing effects is present + in this suggested module. All functionality is to be applied to byte oriented binaries, never bitstrings that do not have a bitlength that is a multiple of eight. All binaries supplied to and returned by these functions should -pass the is_binary/1 test, otherwise an error will be raised. +pass the ``is_binary/1`` test, otherwise an error will be raised. + Suggested module reference -------------------------- @@ -125,19 +129,18 @@ Suggested module reference I suggest the following functionality (presented as an excerpt of an Erlang manual pages). A discussion about the interface can be found below. -DATA TYPES -.......... +### DATA TYPES ### -**cp()** + cp() Opaque data-type representing a compiled search-pattern. guaranteed to be a tuple() to allow programs to distinguish it from non precompiled search patterns. -**part() = {Pos,Length}** - -- Start = int() -- Length = int() + part() = {Pos,Length} + + Start = int() + Length = int() A representaion of a part (or range) in a binary. ``Start`` is a zero-based offset into a binary() and Length is the length of that @@ -148,14 +151,13 @@ that the part of the binary begins at ``Start`` + ``Length`` and is of a binary as ``{size(Binary), -N}``. The functions in this module always return part()'s with positive ``Length``. -EXPORTS -....... +### EXPORTS ### -**compile_pattern(Pattern) -> cp()** +#### ``compile_pattern(Pattern) -> cp()`` Types: -- Pattern = binary() | [ binary() ] + Pattern = binary() | [ binary() ] Builds an internal structure representing a compilation of a search-pattern, later to be used in the find, split or replace @@ -171,28 +173,28 @@ only a single binary is given, the set has only one element. If pattern is not a binary or a flat proper list of binaries, a ``badarg`` exception will be raised. 
-**match(Subject, Pattern) -> Found |** ``no`` +#### ``match(Subject, Pattern) -> Found | no`` Types: -- Subject = binary() -- Pattern = binary() | [ binary() ] | cp() -- Found = part() + Subject = binary() + Pattern = binary() | [ binary() ] | cp() + Found = part() -The same as match(Subject, Pattern, []). +The same as ``match(Subject, Pattern, [])``. -**match(Subject,Pattern,Options) -> Found |** ``no`` +#### ``match(Subject,Pattern,Options) -> Found | no`` Types: -- Subject = binary() -- Pattern = binary() | [ binary() ] | cp() -- Found = part() -- Options = [ Option ] -- Option = {scope, part()} + Subject = binary() + Pattern = binary() | [ binary() ] | cp() + Found = part() + Options = [ Option ] + Option = {scope, part()} -Searches for the first occurrence of Pattern in Subject and returns -the position and lengt. +Searches for the first occurrence of ``Pattern`` in ``Subject`` and returns +the position and length. The function will return ``{Pos,Length}`` for the binary in ``Pattern`` starting at @@ -207,40 +209,40 @@ matches begins at the same position, the longest is returned. Summary of the options: -``{scope, {Start, Length}}`` - Only the given part is searched. Return values still have offsets - from the beginning of ``Subject``. A negative ``Length`` is - allowed as described in the **TYPES** section of this manual. +* ``{scope, {Start, Length}}`` + Only the given part is searched. Return values still have offsets + from the beginning of ``Subject``. A negative ``Length`` is + allowed as described in the **TYPES** section of this manual. -The found part() is returned, if none of the strings in ``Pattern`` is -found, the atom ``no`` is returned. + The found part() is returned, if none of the strings in ``Pattern`` is + found, the atom ``no`` is returned. -For a descrition of ``Pattern``, see ``compile_pattern/1``. + For a description of ``Pattern``, see ``compile_pattern/1``.
-If ``{scope, {Start,Length}}`` is given in the options such that -``Start`` is larger than the size of ``Subject``, ``Start`` + -``Length`` is less than zero or ``Start`` + ``Length`` is larger than -the size of ``Subject``, a ``badarg`` exception is raised. + If ``{scope, {Start,Length}}`` is given in the options such that + ``Start`` is larger than the size of ``Subject``, ``Start`` + + ``Length`` is less than zero or ``Start`` + ``Length`` is larger than + the size of ``Subject``, a ``badarg`` exception is raised. -**matches(Subject, Pattern) -> Found** +#### ``matches(Subject, Pattern) -> Found`` Types: -- Subject = binary() -- Pattern = binary() | [ binary() ] | cp() -- Found = [ part() ] | [] + Subject = binary() + Pattern = binary() | [ binary() ] | cp() + Found = [ part() ] | [] -The same as matches(Subject, Pattern, []). +The same as ``matches(Subject, Pattern, [])``. -**matches(Subject,Pattern,Options) -> Found** +#### ``matches(Subject,Pattern,Options) -> Found`` Types: -- Subject = binary() -- Pattern = binary() | [ binary() ] | cp() -- Found = [ part() ] | [] -- Options = [ Option ] -- Option = {scope, part()} + Subject = binary() + Pattern = binary() | [ binary() ] | cp() + Found = [ part() ] | [] + Options = [ Option ] + Option = {scope, part()} Works like match, but the ``Subject`` is search until exhausted and a list of all non-overlapping parts present in Pattern are returned (in order). @@ -269,25 +271,25 @@ If ``{scope, {Start,Length}}`` is given in the options such that ``Length`` is less than zero or ``Start`` + ``Length`` is larger than the size of ``Subject``, a ``badarg`` exception is raised. -**split(Subject,Pattern) -> Parts** +#### ``split(Subject,Pattern) -> Parts`` Types: -- Subject = binary() -- Pattern = binary() | [ binary() ] | cp() -- Parts = [ binary() ] + Subject = binary() + Pattern = binary() | [ binary() ] | cp() + Parts = [ binary() ] -The same as split(Subject, Pattern, []). +The same as ``split(Subject, Pattern, [])``. 
-**split(Subject,Pattern,Options) -> Parts** +#### ``split(Subject,Pattern,Options) -> Parts`` Types: -- Subject = binary() -- Pattern = binary() | [ binary() ] | cp() -- Parts = [ binary() ] -- Options = [ Option ] -- Option = {scope, part()} | trim | global + Subject = binary() + Pattern = binary() | [ binary() ] | cp() + Parts = [ binary() ] + Options = [ Option ] + Option = {scope, part()} | trim | global Splits Binary into a list of binaries based on ``Pattern``. If the option ``global``is not given, only the first occurrence of @@ -298,38 +300,39 @@ in the result. Example:: - 1> binary:split(<<1,255,4,0,0,0,2,3>>, [<<0,0,0>>,<<2>>],[]). - [<<1,255,4>>, <<2,3>>] - 2> binary:split(<<0,1,0,0,4,255,255,9>>, [<<0,0>>, <<255,255>>],[global]). - [<<0,1>>,<<4>>,<<9>>] + 1> binary:split(<<1,255,4,0,0,0,2,3>>, [<<0,0,0>>,<<2>>],[]). + [<<1,255,4>>, <<2,3>>] + 2> binary:split(<<0,1,0,0,4,255,255,9>>, [<<0,0>>, <<255,255>>],[global]). + [<<0,1>>,<<4>>,<<9>>] Summary of options: -``{scope, part()}`` - Works as in ``binary:match/3`` and ``binary:matches/3``. Note - that this only defines the scope of the search for matching - strings, it does not cut the binary before splitting. The - bytes before and after the scope will be kept in the result. See example below. +* ``{scope, part()}`` + Works as in ``binary:match/3`` and ``binary:matches/3``. Note + that this only defines the scope of the search for matching + strings, it does not cut the binary before splitting. The + bytes before and after the scope will be kept in the result. + See example below. -``trim`` - Removes trailing empty parts of the result (as does ``trim`` - in ``re:split/3``) +* ``trim`` + Removes trailing empty parts of the result (as does ``trim`` + in ``re:split/3``) -``global`` - Repeats the split until the ``Subject`` is - exhausted. 
Conceptually the ``global`` option makes ``split`` - work on the positions returned by ``binary:matches/3``, while - it normally works on the position returned by - ``binary:match/3``. +* ``global`` + Repeats the split until the ``Subject`` is + exhausted. Conceptually the ``global`` option makes ``split`` + work on the positions returned by ``binary:matches/3``, while + it normally works on the position returned by + ``binary:match/3``. Example of the difference between a ``scope`` and taking the binary apart before splitting:: - 1> binary:split(<<"banana">>,[<<"a">>],[{scope,{2,3}}]). - [<<"ban">>,<<"na">>] - 2> binary:split(binary:part(<<"banana">>,{2,3}),[<<"a">>],[]). - [<<"n">>,<<"n">>] + 1> binary:split(<<"banana">>,[<<"a">>],[{scope,{2,3}}]). + [<<"ban">>,<<"na">>] + 2> binary:split(binary:part(<<"banana">>,{2,3}),[<<"a">>],[]). + [<<"n">>,<<"n">>] The return type is always a list of binaries which are all referencing ``Subject``. This means that the data in ``Subject`` is not actually @@ -338,29 +341,29 @@ collected until the results of the split are no longer referenced. For a descrition of ``Pattern``, see ``compile_pattern/1``. -**replace(Subject,Pattern,Replacement) -> Result** +#### ``replace(Subject,Pattern,Replacement) -> Result`` Types: -- Subject = binary() -- Pattern = binary() | [ binary() ] | cp() -- Replacement = binary() -- Result = binary() + Subject = binary() + Pattern = binary() | [ binary() ] | cp() + Replacement = binary() + Result = binary() -The same as replace(Subject,Pattern,Replacement,[]). +The same as ``replace(Subject,Pattern,Replacement,[])``. 
-**replace(Subject,Pattern,Replacement,Options) -> Result** +#### ``replace(Subject,Pattern,Replacement,Options) -> Result`` Types: -- Subject = binary() -- Pattern = binary() | [ binary() ] | cp() -- Replacement = binary() -- Result = binary() -- Options = [ Option ] -- Option = global | {scope, part()} | {insert_replaced, InsPos} -- InsPos = OnePos | [ OnePos ] -- OnePos = int() =< byte_size(Replacement) + Subject = binary() + Pattern = binary() | [ binary() ] | cp() + Replacement = binary() + Result = binary() + Options = [ Option ] + Option = global | {scope, part()} | {insert_replaced, InsPos} + InsPos = OnePos | [ OnePos ] + OnePos = int() =< byte_size(Replacement) Constructs a new binary by replacing the parts in ``Subject`` matching ``Pattern`` with the content of ``Replacement``. @@ -371,173 +374,173 @@ replacement is to be inserted in the result, the option ``Replacement`` at the given position (or positions) before actually inserting ``Replacement`` into the Subject. Example:: - 1> binary:replace(<<"abcde">>,<<"b">>,<<"[]">>,[{insert_replaced,1}]). - <<"a[b]cde">> - 2> binary:replace(<<"abcde">>,[<<"b">>,<<"d">>],<<"[]">>, - [global,{insert_replaced,1}]). - <<"a[b]c[d]e">> - 3> binary:replace(<<"abcde">>,[<<"b">>,<<"d">>],<<"[]">>, - [global,{insert_replaced,[1,1]}]). - <<"a[bb]c[dd]e">> - 4> binary:replace(<<"abcde">>,[<<"b">>,<<"d">>],<<"[-]">>, - [global,{insert_replaced,[1,2]}]). - <<"a[b-b]c[d-d]e">> - -If any position given in InsPos is greater than the size of the replacement binary, a -``badarg`` exception is raised. + 1> binary:replace(<<"abcde">>,<<"b">>,<<"[]">>,[{insert_replaced,1}]). + <<"a[b]cde">> + 2> binary:replace(<<"abcde">>,[<<"b">>,<<"d">>],<<"[]">>, + [global,{insert_replaced,1}]). + <<"a[b]c[d]e">> + 3> binary:replace(<<"abcde">>,[<<"b">>,<<"d">>],<<"[]">>, + [global,{insert_replaced,[1,1]}]). + <<"a[bb]c[dd]e">> + 4> binary:replace(<<"abcde">>,[<<"b">>,<<"d">>],<<"[-]">>, + [global,{insert_replaced,[1,2]}]). 
+ <<"a[b-b]c[d-d]e">> + +If any position given in ``InsPos`` is greater than the size of +the replacement binary, a ``badarg`` exception is raised. The options ``global`` and ``{scope, part()}`` works as for ``binary:split/3``. The return type is always a binary. For a descrition of ``Pattern``, see ``compile_pattern/1``. -**longest_common_prefix(Binaries) -> int()** +#### ``longest_common_prefix(Binaries) -> int()`` Types: -- Binaries = [ binary() ] + Binaries = [ binary() ] Returns the length of the longest common prefix of the binaries in the list ``Binaries``. Example:: - 1> binary:longest_common_prefix([<<"erlang">>,<<"ergonomy">>]). - 2 - 2> binary:longest_common_prefix([<<"erlang">>,<<"perl">>]). - 0 + 1> binary:longest_common_prefix([<<"erlang">>,<<"ergonomy">>]). + 2 + 2> binary:longest_common_prefix([<<"erlang">>,<<"perl">>]). + 0 If ``Binaries`` is not a flat list of binaries, a ``badarg`` exception is raised. -**longest_common_suffix(Binaries) -> int()** +#### ``longest_common_suffix(Binaries) -> int()`` Types: -- Binaries = [ binary() ] + Binaries = [ binary() ] Returns the length of the longest common suffix of the binaries in the list ``Binaries``. Example:: - 1> binary:longest_common_suffix([<<"erlang">>,<<"fang">>]). - 3 - 2> binary:longest_common_suffix([<<"erlang">>,<<"perl">>]). - 0 + 1> binary:longest_common_suffix([<<"erlang">>,<<"fang">>]). + 3 + 2> binary:longest_common_suffix([<<"erlang">>,<<"perl">>]). + 0 If ``Binaries`` is not a flat list of binaries, a ``badarg`` exception is raised. -**first(Subject) -> int()** +#### ``first(Subject) -> int()`` Types: -- Subject = binary() + Subject = binary() Returns the first byte of the binary as an integer. If the binary length is zero, a ``badarg`` exception is raised. -**last(Subject) -> int()** +#### ``last(Subject) -> int()`` Types: -- Subject = binary() + Subject = binary() Returns the last byte of the binary as an integer. If the binary length is zero, a ``badarg`` exception is raised. 
-**at(Subject, Pos) -> int()** +#### ``at(Subject, Pos) -> int()`` Types: -- Subject = binary() -- Pos = int() >= 0 + Subject = binary() + Pos = int() >= 0 Returns the byte at position ``Pos`` (zero-based) in the binary -``Subject`` as an integer. If ``Pos`` >= byte_size(Subject), a +``Subject`` as an integer. If ``Pos`` >= ``byte_size(Subject)``, a ``badarg`` exception is raised. -**part(Subject, PosLen) -> binary()** +#### ``part(Subject, PosLen) -> binary()`` Types: -- Subject = binary() -- PosLen = part() + Subject = binary() + PosLen = part() Extracts the part of the binary described by ``PosLen``. Negative length can be used to extract bytes at the end of a binary:: - 1> Bin = <<1,2,3,4,5,6,7,8,9,10>>. - 2> binary:part(Bin,{byte_size(Bin), -5)). - <<6,7,8,9,10>> + 1> Bin = <<1,2,3,4,5,6,7,8,9,10>>. + 2> binary:part(Bin,{byte_size(Bin), -5}). + <<6,7,8,9,10>> If ``PosLen`` in any way references outside the binary, a ``badarg`` exception is raised. -**part(Subject, Pos, Len) -> binary()** +#### ``part(Subject, Pos, Len) -> binary()`` Types: -- Subject = binary() -- Pos = int() -- Len = int() + Subject = binary() + Pos = int() + Len = int() -The same as part(Subject, {Pos, Len}). +The same as ``part(Subject, {Pos, Len})``. -**bin_to_list(Subject) -> list()** +#### ``bin_to_list(Subject) -> list()`` Types: -- Subject = binary() + Subject = binary() -The same as bin_to_list(Subject,{0,byte_size(Subject)}). +The same as ``bin_to_list(Subject,{0,byte_size(Subject)})``. -**bin_to_list(Subject, PosLen) -> list()** +#### ``bin_to_list(Subject, PosLen) -> list()`` -- Subject = binary() -- PosLen = part() + Subject = binary() + PosLen = part() Converts ``Subject`` to a list of int(), each int representing the value of one byte. The ``part()`` denotes which part of the ``binary()`` to convert. Example:: - 1> binary:bin_to_list(<<"erlang">>,{1,3}).
+ "rla" + %% or [114,108,97] in list notation. If ``PosLen`` in any way references outside the binary, a ``badarg`` exception is raised. -**bin_to_list(Subject, Pos, Len) -> list()** +#### ``bin_to_list(Subject, Pos, Len) -> list()`` Types: -- Subject = binary() -- Pos = int() -- Len = int() + Subject = binary() + Pos = int() + Len = int() -The same as bin_to_list(Subject,{Pos,Len}). +The same as ``bin_to_list(Subject,{Pos,Len})``. -**list_to_bin(ByteList) -> binary()** +#### ``list_to_bin(ByteList) -> binary()`` Types: -- ByteList = iodata() (see module erlang) + ByteList = iodata() (see module erlang) -Works exactly like erlang:list_to_binary/1, added for completeness. +Works exactly like ``erlang:list_to_binary/1``, added for completeness. -**copy(Subject) -> binary()** +#### ``copy(Subject) -> binary()`` Types: -- Subject = binary() + Subject = binary() -The same as copy(Subject, 1). +The same as ``copy(Subject, 1)``. -**copy(Subject,N) -> binary()** +#### ``copy(Subject,N) -> binary()`` Types: -- Subject = binary() -- N = int() >= 0 + Subject = binary() + N = int() >= 0 Creates a binary with the content of ``Subject`` duplicated ``N`` times. @@ -546,16 +549,16 @@ This function will always create a new binary, even if ``N`` = 1. By using ``copy/1`` on a binary referencing a larger binary, one might free up the larger binary for garbage collection. -NOTE! By deliberately copying a single binary to avoid referencing a -larger binary, one might, instead of freeing up the larger binary for -later garbage collection, create much more binary data than -needed. Sharing binary data is usually good. Only in special cases, -when small parts reference large binaries and the large binaries are -no longer used *in any process*, deliberate copying might be a good idea. +> NOTE!
By deliberately copying a single binary to avoid referencing a +> larger binary, one might, instead of freeing up the larger binary for +> later garbage collection, create much more binary data than +> needed. Sharing binary data is usually good. Only in special cases, +> when small parts reference large binaries and the large binaries are +> no longer used *in any process*, deliberate copying might be a good idea. If ``N`` < 0, a ``badarg`` exception is raised. -**referenced_byte_size(binary()) -> int()** +#### ``referenced_byte_size(binary()) -> int()`` If a binary references a larger binary (often described as being a sub-binary), it can be useful to get the size of the actual referenced @@ -566,18 +569,18 @@ to. Example:: - store(Binary, GBSet) -> - NewBin = - case binary:referenced_byte_size(Binary) of - Large when Large > 2 * byte_size(Binary) -> - binary:copy(Binary); - _ -> - Binary - end, - gb_sets:insert(NewBin,GBSet). + store(Binary, GBSet) -> + NewBin = + case binary:referenced_byte_size(Binary) of + Large when Large > 2 * byte_size(Binary) -> + binary:copy(Binary); + _ -> + Binary + end, + gb_sets:insert(NewBin,GBSet). In this example, we chose to copy the binary content before inserting -it in the gb_set() if it references a binary more than twice the size +it in the ``gb_set()`` if it references a binary more than twice the size of the data we're going to keep. Of course different rules for when copying will apply to different programs. @@ -589,87 +592,89 @@ might be useful when optimizing for memory use. Example of binary sharing:: - 1> A = binary:copy(<<1>>,100). - <<1,1,1,1,1 ... - 2> byte_size(A). - 100 - 3> binary:referenced_byte_size(A) - 100 - 4> <<_:10/binary,B:10/binary,_/binary>> = A. - <<1,1,1,1,1 ... - 5> byte_size(B). - 10 - 6> binary:referenced_byte_size(B) - 100 - -NOTE! Binary data is shared among processes. 
If another process still -references the larger binary, copying the part this process uses only -consumes more memory and will not free up the larger binary for garbage -collection. Use this kind of intrusive functions with extreme care, -and only if a *real* problem is detected. + 1> A = binary:copy(<<1>>,100). + <<1,1,1,1,1 ... + 2> byte_size(A). + 100 + 3> binary:referenced_byte_size(A). + 100 + 4> <<_:10/binary,B:10/binary,_/binary>> = A. + <<1,1,1,1,1 ... + 5> byte_size(B). + 10 + 6> binary:referenced_byte_size(B). + 100 + +> NOTE! Binary data is shared among processes. If another process still +> references the larger binary, copying the part this process uses only +> consumes more memory and will not free up the larger binary for garbage +> collection. Use this kind of intrusive functions with extreme care, +> and only if a *real* problem is detected. -**encode_unsigned(Unsigned) -> binary()** +#### ``encode_unsigned(Unsigned) -> binary()`` Types: -- Unsigned = int() >= 0 + Unsigned = int() >= 0 -The same as encode_unsigned(Unsigned,big). +The same as ``encode_unsigned(Unsigned,big)``. -**encode_unsigned(Unsigned,Endianess) -> binary()** +#### ``encode_unsigned(Unsigned,Endianess) -> binary()`` Types: -- Unsigned = int() >= 0 -- Endianess = big | little + Unsigned = int() >= 0 + Endianess = big | little Converts a positive integer the smallest possible representation in in a binary digit representation, either big or little endian. -Example:: +Example: - 1> binary:encode_unsigned(11111111,big). - <<169,138,199>> + 1> binary:encode_unsigned(11111111,big). + <<169,138,199>> -**decode_unsigned(Subject) -> Unsigned** +#### ``decode_unsigned(Subject) -> Unsigned`` Types: -- Subject = binary() -- Unsigned = int() >= 0 + Subject = binary() + Unsigned = int() >= 0 -The same as encode_unsigned(Subject,big). +The same as ``decode_unsigned(Subject,big)``.
-**decode_unsigned(Subject, Endianess) -> Unsigned** +#### ``decode_unsigned(Subject, Endianess) -> Unsigned`` Types: -- Subject = binary() -- Endianess = big | little -- Unsigned = int() >= 0 + Subject = binary() + Endianess = big | little + Unsigned = int() >= 0 Converts the binary digit representation, in big or little endian, of -a positive integer in Subject to an Erlang int(). +a positive integer in ``Subject`` to an Erlang int(). Example:: - 1> binary:decode_unsigned(<<169,138,199>>,big). - 11111111 + 1> binary:decode_unsigned(<<169,138,199>>,big). + 11111111 + Guard BIF --------- -I suggest adding the functions binary:part/2 and binary:part/3 to the -set of BIFs allowed in guard tests. As guard BIFs are traditionally +I suggest adding the functions ``binary:part/2`` and ``binary:part/3`` +to the set of BIFs allowed in guard tests. As guard BIFs are traditionally put in the erlang module, the following names for the guard BIFs are suggested:: - erlang:binary_part/2 - erlang:binary_part/3 + erlang:binary_part/2 + erlang:binary_part/3 They should both work exactly as their counterparts in the binary module. + Interface design discussion --------------------------- @@ -677,60 +682,62 @@ As with all modules, there are a lot of arguments about the actual interface, sometimes more than about the functionality. In this case a number of parameters has to be considered. -- Effectiveness - The interface should be constructed so that fast - implementation is possible and so that code using the interface can - be written in an effective way. To not create unnecessary garbage is - one parameter, to allow for general code is another. - -- Parameter ordering - I've chosen to make the binary subject the - first parameter in all applicable calls. Putting the subject first - corresponds to the ``re`` interface. The ``lists`` module, however, - usually has the subject as last parameter. 
We could go for that - instead, but unfortunately the ``lists:sublist/{2,3}`` interface, - which corresponds to the ``part`` function, has the subject - first, why following the conventions of ``lists`` would not only - break conformance with ``re``, it would also give us a generally - non-stringent interface. The effect of not conforming to the - ``lists`` interface is that using function names from that module - would lead to confusion and therefore is avoided. - -- Function naming - We have two related modules to take into account - when naming functions here. The module ``re`` is related to the - searching function (``match``, ``replace`` etc), while the ``lists`` - module is related to the decomposition functions (``first``, - ``last`` etc). - - I've basically retained the names from ``re`` when I find the - functionality, both in concept and interface to be similar - enough. The nature of regular expressions as small executable - programs, which is to much to say for a collection of binaries as - the patterns are in this module, prohibits the use of the function - name ``run`` for actually doing the searching. We use ``match`` and - ``matches`` instead of ``run``. - - As this module is more general than ``re``, a function name like - ``compile`` is not really good. ``re:compile`` means "compile a - regular expression", but what would ``binary:compile`` mean? - Therefore the pre-processing function is instead called - ``compile_pattern``. - - When it comes to the ``lists`` module, the parameter ordering has - prevented me from reusing any function names but ``last``, which - only takes one parameter in ``lists`` and there is no real - alternative there. - -- Options or multiple functions - I believe a good rule of thumb is to - not have options that change the return type of the function, which - would have been the case if we i.e. had a ``global`` option to - ``match/3`` instead of a separate ``matches/3`` function. 
- - The fact that there are a manageable set of possible return types - for the searching and decomposition functiona allows us to follow - that rule of thumb. - - (Unfortunately that rule could not be easilly followed in ``re``, as the - rich assortment of options would have given rise to a non-manageable - amount of function names). +- Effectiveness - The interface should be constructed so that fast + implementation is possible and so that code using the interface can + be written in an effective way. To not create unnecessary garbage is + one parameter, to allow for general code is another. + +- Parameter ordering - I've chosen to make the binary subject the + first parameter in all applicable calls. Putting the subject first + corresponds to the ``re`` interface. The ``lists`` module, however, + usually has the subject as last parameter. We could go for that + instead, but unfortunately the ``lists:sublist/{2,3}`` interface, + which corresponds to the ``part`` function, has the subject + first, why following the conventions of ``lists`` would not only + break conformance with ``re``, it would also give us a generally + non-stringent interface. The effect of not conforming to the + ``lists`` interface is that using function names from that module + would lead to confusion and therefore is avoided. + +- Function naming - We have two related modules to take into account + when naming functions here. The module ``re`` is related to the + searching function (``match``, ``replace`` etc), while the ``lists`` + module is related to the decomposition functions (``first``, + ``last`` etc). + + I've basically retained the names from ``re`` when I find the + functionality, both in concept and interface to be similar + enough. The nature of regular expressions as small executable + programs, which is to much to say for a collection of binaries as + the patterns are in this module, prohibits the use of the function + name ``run`` for actually doing the searching. 
We use ``match`` and + ``matches`` instead of ``run``. + + As this module is more general than ``re``, a function name like + ``compile`` is not really good. ``re:compile`` means "compile a + regular expression", but what would ``binary:compile`` mean? + Therefore the pre-processing function is instead called + ``compile_pattern``. + + When it comes to the ``lists`` module, the parameter ordering has + prevented me from reusing any function names but ``last``, which + only takes one parameter in ``lists`` and there is no real + alternative there. + +- Options or multiple functions - I believe a good rule of thumb is to + not have options that change the return type of the function, which + would have been the case if we i.e. had a ``global`` option to + ``match/3`` instead of a separate ``matches/3`` function. + + The fact that there are a manageable set of possible return types + for the searching and decomposition functiona allows us to follow + that rule of thumb. + + (Unfortunately that rule could not be easilly followed in ``re``, as the + rich assortment of options would have given rise to a non-manageable + amount of function names). + + Performance =========== @@ -745,23 +752,38 @@ functionality in the ``re`` module. Implementation methods has to be chosen so that this modules search functions are faster, or possibly even significantly faster, than ``re``. + + Reference implementation ======================== A reference implementation will be included as beta functionality in R13B04. Prereleases may be available via github. -References -========== - [1] EEP 9, the original work from which this EEP is derived. + +[EEP 9]: eep-0009.md + "EEP 9, the original work from which this EEP is derived" + +[EEP 11]: eep-0011.md + "EEP 11, intresting extensions to EEP 9" + +[CCA3.0]: http://creativecommons.org/licenses/by/3.0/ + "Creative Commons Attribution 3.0 License" + Copyright ========= - This document is licensed under the Creative Commons license. 
- +This document is licensed under the [Creative Commons license][CCA3.0]. +[EmacsVar]: <> "Local Variables:" +[EmacsVar]: <> "mode: indented-text" +[EmacsVar]: <> "indent-tabs-mode: nil" +[EmacsVar]: <> "sentence-end-double-space: t" +[EmacsVar]: <> "fill-column: 70" +[EmacsVar]: <> "coding: utf-8" +[EmacsVar]: <> "End:" diff --git a/eeps/eep-0032.md b/eeps/eep-0032.md index 527273e..a20c9d0 100644 --- a/eeps/eep-0032.md +++ b/eeps/eep-0032.md @@ -1,233 +1,240 @@ -EEP: 32 -Title: Module-local process names -Version: $Revision$ -Last-Modified: $Date$ -Author: Richard A. O'Keefe -Status: Draft -Type: Standards Track -Erlang-Version: R13B-3 -Content-Type: text/plain -Created: 09-Feb-2010 + Author: Richard A. O'Keefe + Status: Draft + Type: Standards Track + Created: 09-Feb-2010 + Erlang-Version: R13B-3 + Post-History: +**** +EEP 32: Module-local process names +---- + + Abstract +======== - The process registry in Erlang is convenient, but counts as - a global shared mutable variable, with two major defects: - the possibility of data races (shared mutable variable) and - the impossibility of encapsulation (global). This EEP - resurrects the old (1997 or earlier) proposal of module- - local process-valued variables, providing a replacement for - node-local uses of the registry with encapsulation and without - races. +The process registry in Erlang is convenient, but counts as +a global shared mutable variable, with two major defects: +the possibility of data races (shared mutable variable) and +the impossibility of encapsulation (global). This EEP +resurrects the old (1997 or earlier) proposal of module- +local process-valued variables, providing a replacement for +node-local uses of the registry with encapsulation and without +races. Specification +============= + +A module (or an instance of a parameterized module) may have +one or more top level pid-valued variables, and if so, has a +lock associated with them. The directive has the form + + -pid_name(Atom). 
+ +where Atom is an atom. To avoid confusing programmers who +still have to deal with the registry, this Atom may not be +'undefined'. + +If there is at least one such directive in a module, the +compiler automatically generates a function called +`pid_name/1`. In the scope of directives + + -pid_name(pn_1). + ... + -pid_name(pn_k). + +the `pid_name/1` function is rather like + + pid_name(pn_1) -> + with_module_lock(read) -> X = *pn_1 end, X; + ... + pid_name(pn_k) -> + with_module_lock(read) -> X = *pn_k end, X. + +except that we expect there to be a VM instruction +`get_pid_safely(Address)`, and we expect the compiler to +inline calls to pid_name(Atom) when Atom is known. +On a machine like the `X86` or `X86_64`, this could be a +single locked load instruction. + +The value of a `-pid_name` is always a process id. +There is a special process id value which at all times represents +a dead process. So within a module, + + pid_name(X) ! Message + +is legal if and only if X is one of the pid-names declared in +the module, and whether or not the process it names has died. + +If there is a need to discover whether a `-pid_name` has within +the recent but unpredictable past been associated with a live +process, that can be found out by combining `pid_name/1` with +`process_info/2`. + +As with the registry, a process may have at most one `pid_name`. +For debugging purposes, I suppose that `process_info` could be +extended to return a `{pid_name,{Module,Name}}` tuple. + +When a process exits, it is automatically unregistered. +That is, if it was bound to a `-pid_name`, that `-pid_name` +now refers to the conventional dead process. This draft of +this EEP includes no other way for a process to be unregistered. + +The important thing about registering a process is that it +should be atomic. 
So there are two new functions + + pid_name_spawn(Name, Fun) + pid_name_spawn_link(Name, Fun) + +We can understand them as + + pid_name_spawn(Name, Fun) + when is_atom(Name), is_function(Fun, 0) -> + with_module_lock(write) -> + P = *Name, + if P is a live process -> + P + ; P is a dead process -> + Q = spawn(Fun), + *Name := Q, + Q + end + end. + + pid_name_spawn_link(Name, Fun) + when is_atom(Name), is_function(Fun, 0) -> + with_module_lock(write) -> + P = *Name, + if P is a live process -> + P + ; P is a dead process -> + Q = spawn(Fun), + *Name := Q, + Q + end + end. + +Here, as earlier, `with_module_lock` is pseudo-code, meant to +suggest some sort of reader-writer locking on a private lock, +existing only inside a module that has declared a `-pid_name`. + +These two functions are automatically declared inside the +module, like `pid_name/1`. The three functions are not functions +automatically inherited from the `erlang:` module but functions +that are logically inside the module, however they might be +actually implemented. There doesn't seem to be any good +reason for a module to export any of these functions, and the +compiler should at least warn if that is attempted. - A module (or an instance of a parameterized module) may have - one or more top level pid-valued variables, and if so, has a - lock associated with them. The directive has the form - - -pid_name(Atom). - - where Atom is an atom. To avoid confusing programmers who - still have to deal with the registry, this Atom may not be - 'undefined'. - - If there is at least one such directive in a module, the - compiler automatically generates a function called - pid_name/1. In the scope of directives - - -pid_name(pn_1). - ... - -pid_name(pn_k). - - the pid_name/1 function is rather like - - pid_name(pn_1) -> - with_module_lock(read) -> X = *pn_1 end, X; - ... - pid_name(pn_k) -> - with_module_lock(read) -> X = *pn_k end, X. 
- - except that we expect there to be a VM instruction - get_pid_safely(Address), and we expect the compiler to - inline calls to pid_name(Atom) when Atom is known. - On a machine like the X86 or X86_64, this could be a - single locked load instruction. - - The value of a -pid_name is always a process id. - There is a special process id value which at all times represents - a dead process. So within a module, - - pid_name(X) ! Message - - is legal if and only if X is one of the pid-names declared in - the module, and whether or not the process it names has died. - - If there is a need to discover whether a -pid_name has within - the recent but unpredictable past been associated with a live - process, that can be found out by combining pid_name/1 with - process_info/2. - - As with the registry, a process may have at most one pid_name. - For debugging purposes, I suppose that process_info could be - extended to return a {pid_name,{Module,Name}} tuple. - - When a process exits, it is automatically unregistered. - That is, if it was bound to a -pid_name, that -pid_name - now refers to the conventional dead process. This draft of - this EEP includes no other way for a process to be unregistered. - - The important thing about registering a process is that it - should be atomic. So there are two new functions - - pid_name_spawn(Name, Fun) - pid_name_spawn_link(Name, Fun) - - We can understand them as - - pid_name_spawn(Name, Fun) - when is_atom(Name), is_function(Fun, 0) -> - with_module_lock(write) -> - P = *Name, - if P is a live process -> - P - ; P is a dead process -> - Q = spawn(Fun), - *Name := Q, - Q - end - end. - - pid_name_spawn_link(Name, Fun) - when is_atom(Name), is_function(Fun, 0) -> - with_module_lock(write) -> - P = *Name, - if P is a live process -> - P - ; P is a dead process -> - Q = spawn(Fun), - *Name := Q, - Q - end - end. 
- - Here, as earlier, "with_module_lock" is pseudo-code, meant to - suggest some sort of reader-writer locking on a private lock, - existing only inside a module that has declared a -pid_name. - - These two functions are automatically declared inside the - module, like pid_name/1. The three functions are not functions - automatically inherited from the erlang: module but functions - that are logically inside the module, however they might be - actually implemented. There doesn't seem to be any good - reason for a module to export any of these functions, and the - compiler should at least warn if that is attempted. Motivation +========== - Encapsulation. +* Encapsulation. - The process registry is often used when clients of a module - need to communicate with one or more servers managed by the - module, but the interface code is inside the module. There - is no advantage, and much risk, in exposing the process. A - big reason for this process is to get the benefit of having - mutable process variables without the loss of encapsulation. + The process registry is often used when clients of a module + need to communicate with one or more servers managed by the + module, but the interface code is inside the module. There + is no advantage, and much risk, in exposing the process. A + big reason for this process is to get the benefit of having + mutable process variables without the loss of encapsulation. - Efficiency. +* Efficiency. - As a shared mutable data structure, the registry has to be - accessed within the scope of suitable locks. With this - approach, each module has its own lock, contention ought - to be pretty nearly zero, and the commonest use case of - the registry can, I believe, be a simple load instruction. + As a shared mutable data structure, the registry has to be + accessed within the scope of suitable locks. 
With this + approach, each module has its own lock, contention ought + to be pretty nearly zero, and the commonest use case of + the registry can, I believe, be a simple load instruction. - Safety. +* Safety. - It is actually surprisingly hard to register a process - safely, and the use of registered names is oddly inconsistent - with the use of direct process ids. This interface is meant - to be simpler to use safely. + It is actually surprisingly hard to register a process + safely, and the use of registered names is oddly inconsistent + with the use of direct process ids. This interface is meant + to be simpler to use safely. Rationale +========= + +The old Erlang book describes four functions for dealing with +registered process names. There are two more main interfaces. + + Name ! Message when is_atom(Name) -> + % Also available as erlang:send(Name, Message). + % A 'badarg' exception results if Pid is an atom that is + % not the registered name of a live local process or port. + whereis(Name) ! Message. + + register(Name, Pid) when is_atom(Name), is_pid(Pid) -> + % A 'badarg' exception results if Pid is not a live local + % process or port, if Name is not an atom or is already in + % use, if Pid already has a registered name, or if Name is + % 'undefined'. + "whereis(Name) := Pid". + + unregister(Name) when is_atom(Name) -> + % A 'badarg' exception results if Name is not an atom + % currently in use as the registered name of some process + % or port. 'undefined' is always an error. + "whereis(Name) := undefined". + + whereis(Name) when is_atom(Name) -> + % A 'badarg' exception results if Name is not a name. + % in effect, a global mutable hash table with + % atom keys and pid-or-'undefined' values. + + registered() -> + % yes, I know this is not executable Erlang. + [Name || is_atom(Name), is_pid(whereis(Name))]. + + process_info(Pid, registered_name) when is_pid(Pid) -> + % yes, I know this is not executable Erlang. 
+ case [Name || is_atom(Name), whereis(Name) =:= Pid] + of [N] -> {registered_name,N} + ; [] -> [] + end. + +When a process terminates, for whatever reason, it does the +equivalent of + + case process_info(self(), registered_name) + of {_,Name} -> unregister(Name) + ; [] -> ok + end. + +This has an astonishing consequence. - The old Erlang book describes four functions for dealing with - registered process names. There are two more main interfaces. - - Name ! Message when is_atom(Name) -> - % Also available as erlang:send(Name, Message). - % A 'badarg' exception results if Pid is an atom that is - % not the registered name of a live local process or port. - whereis(Name) ! Message. - - register(Name, Pid) when is_atom(Name), is_pid(Pid) -> - % A 'badarg' exception results if Pid is not a live local - % process or port, if Name is not an atom or is already in - % use, if Pid already has a registered name, or if Name is - % 'undefined'. - "whereis(Name) := Pid". - - unregister(Name) when is_atom(Name) -> - % A 'badarg' exception results if Name is not an atom - % currently in use as the registered name of some process - % or port. 'undefined' is always an error. - "whereis(Name) := undefined". - - whereis(Name) when is_atom(Name) -> - % A 'badarg' exception results if Name is not a name. - % in effect, a global mutable hash table with - % atom keys and pid-or-'undefined' values. - - registered() -> - % yes, I know this is not executable Erlang. - [Name || is_atom(Name), is_pid(whereis(Name))]. - - process_info(Pid, registered_name) when is_pid(Pid) -> - % yes, I know this is not executable Erlang. - case [Name || is_atom(Name), whereis(Name) =:= Pid] - of [N] -> {registered_name,N} - ; [] -> [] - end. - - When a process terminates, for whatever reason, it does the - equivalent of - case process_info(self(), registered_name) - of {_,Name} -> unregister(Name) - ; [] -> ok - end. - - This has an astonishing consequence. - - Suppose I do - - Pid = spawn(Fun), - ... 
- Pid ! Message - - and between the time the process was created and the time I send - the message to it, the process dies. In Erlang this is - perfectly ok, and the message just disappears. - - Now suppose I do - - register(Name, spawn(Fun)), - ... - Name ! Message - - and between the time the process was created and the time I send - the message to it, the process dies. Anyone would expect the - result to be exactly the same: because the Name pointed to a - process which has died, this amounts to sending a message to a - dead process, which is perfectly ok, and the message just - disappears. Most confusingly, that is not what happens, and - instead you get a 'badarg' exception. - - Now suppose I do +Suppose I do + + Pid = spawn(Fun), + ... + Pid ! Message + +and between the time the process was created and the time I send +the message to it, the process dies. In Erlang this is +perfectly ok, and the message just disappears. + +Now suppose I do + + register(Name, spawn(Fun)), + ... + Name ! Message + +and between the time the process was created and the time I send +the message to it, the process dies. Anyone would expect the +result to be exactly the same: because the `Name` pointed to a +process which has died, this amounts to sending a message to a +dead process, which is perfectly ok, and the message just +disappears. Most confusingly, that is not what happens, and +instead you get a 'badarg' exception. + +Now suppose I do send(Pid, Message) when is_pid(Pid) -> Pid ! Message; @@ -237,337 +244,338 @@ Rationale ; Pid when is_pid(Pid) -> Pid ! Message end. ... - register(Name, spawn(Fun)), - ... - send(Name, Message) - - This works the way we would expect, but why is it necessary? - - In Erlang as it stands, Name ! Message will raise an error if - Name would have referred to the right process but that process - has died. It might be argued that this is a useful debugging - aid, but nothing helps us if Name now refers to the WRONG - process. 
Right now, consider - - whereis(Name) ! Message - - This will raise an exception if the named process had died - before whereis/1 was called, but consider this timing: - live dies - whereis runs message sent - A slight change in timing can unpredictably change the - behaviour from silence-on-late-death to error-on-early-death - and vice versa. - - pid_name(Name) ! Message - - is *consistently* silent. - - The current process registry is also used for ports, which act in - many ways like processes. - - The old Erlang book is absolutely right that sometimes you - need a way to talk to a process you haven't been previously - introduced to. However, it is not true that this must be - done by means of a global hash table. You could always ask - a module for the information. - - Let's take program 5.5 from the book. - - -module(number_analyser). - -export([start/0,server/1]). - -export([add_number/2,analyse/1]). - - start() -> - register(number_analyser, - spawn(number_analyser, server, [nil])). - - %% The interface functions. - - add_number(Seq, Dest) -> - request({add_number,Seq,Dest}). - - analyse(Seq) -> - request({analyse,Seq}). - - request(Req) -> - number_analyser ! {self(), Req}, - receive - {number_analyser,Reply} -> - Reply - end. - - %% The server. - - server(Analyser_Table) -> - receive - {From, {analyse, Seq}} -> - Result = lookup(Seq, Analyser_Table), - From ! {number_analyser, Result}, - server(Analyser_Table) - ; {From, {add_number, Seq, Dest}} -> - From ! {number_analyser, ack}, - server(insert(Seq, Dest, Analyser_Table)) - end. - - The first thing we notice about this is that the registry is used - to allow a process that is a client of this module to communicate - with a process managed by this module through interface functions - in this module. There is no reason why the process should be - given a GLOBALLY visible name, and every reason why it should NOT. 
- We would like to ensure that all communication with the server - process goes through the interface functions, and as long as the - process is in a global registry, anything could happen. The - global process registry thus defeats its own purpose. - - Similarly, because the reply messages to the interface functions - are tagged, not with the server's identity, but with its public - name, they are easy to forge. Both of these problems also apply - to Program 5.6 in the old book. - - But there is worse. It is NEVER safe to call register/2 or - unregister/1. Recall that the precondition for register/2 - requires that the Name not be in use. But there is no way to - ever be sure of that. For example, you might try - - spawn_if_necessary(Name, Fun) -> - case whereis(Name) % T1 - of undefined -> - Pid = spawn(Fun), % T2 - register(Name, Pid) % T3 - ; Pid when is_pid(Pid) -> - ok - end, - Pid. - - Unfortunately, between time T1, when whereis/1 reports that the - Name is not in use, and time T3, when we try to assign it, some - other process might have been registered. Also, between time T2, - when the new process is created, and T3, when we use the Pid, the - process might have died. - - Because the registry is global, it is no use searching existing - code to see whether the Name is clobbered; the bug might be - introduced in future code. - - There appears to be no way to protect against the possibility of a - process dying between T2 and T3. The obvious hack, - - Pid = spawn(Fun), - erlang:suspend_process(Pid), - register(Name, Pid), - erlang:resume_process(Pid) - - won't work because erlang:suspend_process/1 is documented as - having the same 'badarg if Pid is not the pid of a live local - process' snafu as register/2. The only really safe way around the - issue would be for the new process to be born suspended, and - there's no way to do that. There is no 'suspended' option allowed - in the options list of spawn_opt/[2-5]. 
- - In practice, of course, the new process WON'T die, typically - because it goes into a loop waiting for a message. Even so, this - amount of fragility in a primitive is a bit worrying. - - Let's take a quick check to see how real all this is. - sounder.erl has + register(Name, spawn(Fun)), + ... + send(Name, Message) + +This works the way we would expect, but why is it necessary? + +In Erlang as it stands, `Name ! Message` will raise an error if +`Name` would have referred to the right process but that process +has died. It might be argued that this is a useful debugging +aid, but nothing helps us if `Name` now refers to the WRONG +process. Right now, consider + + whereis(Name) ! Message + +This will raise an exception if the named process had died +before whereis/1 was called, but consider this timing: + + live dies + whereis runs message sent + +A slight change in timing can unpredictably change the +behaviour from silence-on-late-death to error-on-early-death +and vice versa. + + pid_name(Name) ! Message + +is *consistently* silent. + +The current process registry is also used for ports, which act in +many ways like processes. + +The old Erlang book is absolutely right that sometimes you +need a way to talk to a process you haven't been previously +introduced to. However, it is not true that this must be +done by means of a global hash table. You could always ask +a module for the information. + +Let's take program 5.5 from the book. + + -module(number_analyser). + -export([start/0,server/1]). + -export([add_number/2,analyse/1]). + + start() -> + register(number_analyser, + spawn(number_analyser, server, [nil])). + + %% The interface functions. + + add_number(Seq, Dest) -> + request({add_number,Seq,Dest}). + + analyse(Seq) -> + request({analyse,Seq}). + + request(Req) -> + number_analyser ! {self(), Req}, + receive + {number_analyser,Reply} -> + Reply + end. + + %% The server. 
+ + server(Analyser_Table) -> + receive + {From, {analyse, Seq}} -> + Result = lookup(Seq, Analyser_Table), + From ! {number_analyser, Result}, + server(Analyser_Table) + ; {From, {add_number, Seq, Dest}} -> + From ! {number_analyser, ack}, + server(insert(Seq, Dest, Analyser_Table)) + end. + +The first thing we notice about this is that the registry is used +to allow a process that is a client of this module to communicate +with a process managed by this module through interface functions +in this module. There is no reason why the process should be +given a GLOBALLY visible name, and every reason why it should NOT. +We would like to ensure that all communication with the server +process goes through the interface functions, and as long as the +process is in a global registry, anything could happen. The +global process registry thus defeats its own purpose. + +Similarly, because the reply messages to the interface functions +are tagged, not with the server's identity, but with its public +name, they are easy to forge. Both of these problems also apply +to Program 5.6 in the old book. + +But there is worse. It is NEVER safe to call `register/2` or +`unregister/1`. Recall that the precondition for `register/2` +requires that the `Name` not be in use. But there is no way to +ever be sure of that. For example, you might try + + spawn_if_necessary(Name, Fun) -> + case whereis(Name) % T1 + of undefined -> + Pid = spawn(Fun), % T2 + register(Name, Pid) % T3 + ; Pid when is_pid(Pid) -> + ok + end, + Pid. + +Unfortunately, between time T1, when `whereis/1` reports that the +`Name` is not in use, and time T3, when we try to assign it, some +other process might have been registered. Also, between time T2, +when the new process is created, and T3, when we use the `Pid`, the +process might have died. + +Because the registry is global, it is no use searching existing +code to see whether the `Name` is clobbered; the bug might be +introduced in future code. 
+ +There appears to be no way to protect against the possibility of a +process dying between T2 and T3. The obvious hack, + + Pid = spawn(Fun), + erlang:suspend_process(Pid), + register(Name, Pid), + erlang:resume_process(Pid) + +won't work because `erlang:suspend_process/1` is documented as +having the same 'badarg if Pid is not the pid of a live local +process' snafu as `register/2`. The only really safe way around the +issue would be for the new process to be born suspended, and +there's no way to do that. There is no 'suspended' option allowed +in the options list of `spawn_opt/[2-5]`. + +In practice, of course, the new process WON'T die, typically +because it goes into a loop waiting for a message. Even so, this +amount of fragility in a primitive is a bit worrying. + +Let's take a quick check to see how real all this is. + +`sounder.erl` has start() -> - case whereis(sounder) of - undefined -> - case file:read_file_info('/dev/audio') of - {ok, FI} when FI#file_info.access==read_write -> - register(sounder, spawn(sounder,go,[])), - ok; - _Other -> - register(sounder, spawn(sounder,nosound,[])), - silent - end; - _Pid -> - ok - end. - - Here's a curious thing: the first time sounder:start/0 is - called, it will return different values (ok, silent) depending - on whether sound (is, is not) supported. Later calls always - return ok. This contradicts the documentation. Whoops! - Apart from that, it's a straightforward spawn_if_necessary. - - man.erl has + case whereis(sounder) of + undefined -> + case file:read_file_info('/dev/audio') of + {ok, FI} when FI#file_info.access==read_write -> + register(sounder, spawn(sounder,go,[])), + ok; + _Other -> + register(sounder, spawn(sounder,nosound,[])), + silent + end; + _Pid -> + ok + end. + +Here's a curious thing: the first time `sounder:start/0` is +called, it will return different values (ok, silent) depending +on whether sound (is, is not) supported. Later calls always +return ok. This contradicts the documentation. 
Whoops! +Apart from that, it's a straightforward `spawn_if_necessary`. + +`man.erl` has start() -> - case whereis(man) of - undefined -> - register(man,Pid=spawn(man,init,[])), - Pid; - Pid -> - Pid - end. + case whereis(man) of + undefined -> + register(man,Pid=spawn(man,init,[])), + Pid; + Pid -> + Pid + end. - This is precisely +This is precisely - start() -> spawn_if_necessary(fun () -> man:init() end). + start() -> spawn_if_necessary(fun () -> man:init() end). - tv_table_owner has +`tv_table_owner` has start() -> - case whereis(?REGISTERED_NAME) of - undefined -> - ServerPid = spawn(?MODULE, init, []), - case catch register(?REGISTERED_NAME, ServerPid) of - true -> - ok; - {'EXIT', _Reason} -> - exit(ServerPid, kill), - timer:sleep(500), - start() - end; - Pid when is_pid(Pid) -> - ok - end. - - Let's repackage that to see what's going on: - - spawn_if_necessary(Name, Fun) -> - case whereis(Name) - of undefined -> - Pid = spawn(Fun), - case catch register(Name, Pid) - of true -> - Pid - ; {'EXIT', _} -> - exit(Pid, kill), - timer:sleep(500), - spawn_if_necessary(Name, Fun) - end - ; Pid when is_pid(Pid) -> - ok - end. - - If there is a live local process registered under Name, return its - Pid. Of course, after the function returns to believe that there - is STILL a live local process registered under Name, but that's - just as true of whereis/1. - - If there is not, then create a new process, regardless of whether - that turns out to be useful. Try to register it. The Pid will be - the pid of a live local process that is not registered under any - other name, and Name must be an atom other than 'undefined', or - whereis/1 would have crashed. So it should be that the only thing - that can go wrong is that some other process has snuck in and - swiped the registry slot. In that case, kill the process, wait a - long time, and try again. - - In theory, it is possible for this to loop forever, with just the - right malevolent timing by an adversary. 
In practice, I'm sure it - works very well. - - The thing is, if the 'primitives' are this fragile, I would rather - not expose beginners to them. Or for that matter, most people: - there are plenty of uses of register/1 in the Erlang/OTP sources - that are not this well protected. - - The simplest fix to the 'registration race' problem would be to - verify that spawn_if_necessary/2 is sound, correct it if - necessary, and put it in a library. However, that does nothing to - fix the globality of the registry. - - There is no analogue of registered(). Inside a module, you can - see what names are available; outside the module, you have no - right to know. - - This EEP does not propose abolishing the old registry. There - is a lot of code, and a lot of training material, that still - uses or mentions it. Above all, the old registry can do one - thing that this EEP cannot do and isn't meant to, and that is - to provide names that can be used in other nodes, in {Node,Name} - form. The aim of this proposal is to provide something that can - replace MOST uses of the registry with something safer, and in - particular to allow gradual migration to per-module registration. + case whereis(?REGISTERED_NAME) of + undefined -> + ServerPid = spawn(?MODULE, init, []), + case catch register(?REGISTERED_NAME, ServerPid) of + true -> + ok; + {'EXIT', _Reason} -> + exit(ServerPid, kill), + timer:sleep(500), + start() + end; + Pid when is_pid(Pid) -> + ok + end. +Let's repackage that to see what's going on: + spawn_if_necessary(Name, Fun) -> + case whereis(Name) + of undefined -> + Pid = spawn(Fun), + case catch register(Name, Pid) + of true -> + Pid + ; {'EXIT', _} -> + exit(Pid, kill), + timer:sleep(500), + spawn_if_necessary(Name, Fun) + end + ; Pid when is_pid(Pid) -> + Pid + end. -Backwards Compatibility +If there is a live local process registered under `Name`, return its +`Pid`.
Of course, there is no guarantee after the function returns that there +is STILL a live local process registered under `Name`, but that's +just as true of `whereis/1`. + +If there is not, then create a new process, regardless of whether +that turns out to be useful. Try to register it. The `Pid` will be +the pid of a live local process that is not registered under any +other name, and `Name` must be an atom other than 'undefined', or +`whereis/1` would have crashed. So it should be that the only thing +that can go wrong is that some other process has snuck in and +swiped the registry slot. In that case, kill the process, wait a +long time, and try again. + +In theory, it is possible for this to loop forever, with just the +right malevolent timing by an adversary. In practice, I'm sure it +works very well. + +The thing is, if the 'primitives' are this fragile, I would rather +not expose beginners to them. Or for that matter, most people: +there are plenty of uses of `register/2` in the Erlang/OTP sources +that are not this well protected. + +The simplest fix to the 'registration race' problem would be to +verify that `spawn_if_necessary/2` is sound, correct it if +necessary, and put it in a library. However, that does nothing to +fix the globality of the registry. + +There is no analogue of registered(). Inside a module, you can +see what names are available; outside the module, you have no +right to know. + +This EEP does not propose abolishing the old registry. There +is a lot of code, and a lot of training material, that still +uses or mentions it. Above all, the old registry can do one +thing that this EEP cannot do and isn't meant to, and that is +to provide names that can be used in other nodes, in `{Node,Name}` +form. The aim of this proposal is to provide something that can +replace MOST uses of the registry with something safer, and in +particular to allow gradual migration to per-module registration.
- The only modules that are affected by the new feature are - those that visibly contain an explicit -pid_name directive. +Backwards Compatibility +======================= -Reference Implementation +The only modules that are affected by the new feature are +those that visibly contain an explicit `-pid_name` directive. - None. +Reference Implementation +======================== -References - - None. +None. Example +======= - Here is the old book's Program 5.5 again, brought up to date. +Here is the old book's Program 5.5 again, brought up to date. - -module(number_analyser). - -export([ - add_number/2, - analyse/1, - start/0, - stop/0 - ]). - -pid_name(server). + -module(number_analyser). + -export([ + add_number/2, + analyse/1, + start/0, + stop/0 + ]). + -pid_name(server). - start() -> - pid_name_spawn(server, fun () -> server(nil) end). + start() -> + pid_name_spawn(server, fun () -> server(nil) end). - stop() -> - pid_name(server) ! stop. + stop() -> + pid_name(server) ! stop. - add_number(Seq, Dest) -> - request({add_number,Seq,Dest}). + add_number(Seq, Dest) -> + request({add_number,Seq,Dest}). - analyse(Seq) -> - request({analyse,Seq}). + analyse(Seq) -> + request({analyse,Seq}). - request(Request) -> - P = pid_name(server), - P ! {self(), Request}, - receive {P,Reply} -> Reply end. + request(Request) -> + P = pid_name(server), + P ! {self(), Request}, + receive {P,Reply} -> Reply end. - server(Analyser_Table) -> - receive - {From, {analyse, Seq}} -> - From ! {self(), lookup(Seq, Analyser_Table)}, - server(Analyser_Table) - ; {From, {add_number, Seq, Dest}} -> - From ! {self(), ok}, - server(insert(Seq, Dest, Analyser_Table)) - end. + server(Analyser_Table) -> + receive + {From, {analyse, Seq}} -> + From ! {self(), lookup(Seq, Analyser_Table)}, + server(Analyser_Table) + ; {From, {add_number, Seq, Dest}} -> + From ! {self(), ok}, + server(insert(Seq, Dest, Analyser_Table)) + end. 
- It is now possible to use a programming convention where the - -pid_name of every server is 'server'. +* It is now possible to use a programming convention where the + `-pid_name` of every server is 'server'. - It is no longer possible for code outside the module to send +* It is no longer possible for code outside the module to send messages to the server process. - It is no longer possible (well, no longer embarrassingly easy) +* It is no longer possible (well, no longer embarrassingly easy) for an outsider to forge responses from the server. Copyright +========= - This document has been placed in the public domain. +This document has been placed in the public domain. -Local Variables: -mode: indented-text -indent-tabs-mode: nil -sentence-end-double-space: t -fill-column: 70 -coding: utf-8 -End: +[EmacsVar]: <> "Local Variables:" +[EmacsVar]: <> "mode: indented-text" +[EmacsVar]: <> "indent-tabs-mode: nil" +[EmacsVar]: <> "sentence-end-double-space: t" +[EmacsVar]: <> "fill-column: 70" +[EmacsVar]: <> "coding: utf-8" +[EmacsVar]: <> "End:"