Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Units of Measure #4292

Closed
pmwhite opened this issue Jan 26, 2020 · 13 comments
Closed

Proposal: Units of Measure #4292

pmwhite opened this issue Jan 26, 2020 · 13 comments
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Milestone

Comments

@pmwhite
Copy link

pmwhite commented Jan 26, 2020

Zig Language Support for Units of Measure

This is a proposal to add compile-time annotation of units to numerical values
and dimensional analysis of units in numerical operations while remaining
consistent with Zig's design philosophy.

The rationale for this feature is type safety and documentation. Units of
measure will certainly not eradicate all units-related bugs, but they can
increase the number of mistakes which must happen in order for a units-related
bug to occur.

Downsides to any additional type mechanism include:

  1. The potential for being confusing
  2. The inconvenience of conforming to extra compiler requirements
  3. The tediousness of using library code that uses the type feature copiously

This proposal compromises to avoid these downsides by:

  1. Sacrificing a more powerful unit system to maintain simplicity (units of
    measure are fairly easy to understand, but can be confusing to use when
    considering the nuances of different unit systems)
  2. Sacrificing extreme strictness for the sake of ease-of-use
  3. Sacrificing strictness to allow easy interaction between unit-less code and
    unit-heavy code.

Some objections

Units of measure are a domain-specific concern, and thus should not be added to a general purpose language.

Async-await is also domain specific. Must a feature be useful in all domains
for it to be included? While async-await and units of measure are both
domain-specific, their domains are very broad. Many programs include some form
of concurrency, and thus benefit highly from async-await. Likewise, many
applications involve mashing together numbers which represent different kinds
of things, and thus benefit from having units of measure. It must be admitted
that the benefits of async-await are for more drastic than units of measure
because they substantially improve program structure, whereas units only
improve the likelihood that the program is correct. However, this argument is
different than the objection above - the point is that Zig has precedent for
adding broad-domain-specific features.

The increase in type safety is small, even in the domain where units are useful.

True, but I also claim that the friction of using the units will be just as
small, which means the language will hardly become any larger or more
complicated from this feature.

Other Languages

For a comprehensive overview of other attempts at units of measure, read this
article
.
The article does a good job of explaining the concepts that are relevant to
units of measure.

  • I personally think that F# does a good job with units of measure. This
    details of this proposal are inspired by how F# implments the feature.
  • The C++ Boost library includes compile-time units of measure with dimensional
    analysis and tries to be as general as possible. This means supporting
    different unit systems, which likely makes using the library more complicated.

Concrete details

All primitive numerical types (including iN, uN, and fN and C numeric types) can now be
refined by following their type names with a backtick and a unit of measure
(which may be simple or complex). Numeric literals may also be refined in this
way.

Examples:

  • i32`m
  • f64`(m/s**2) where ** indicates exponentiation.
  • 5.32`m

A unit is just a combination of of base units combined with multiplication,
division and exponentiation by integers. A difficulty in the syntax of units of
measure is differentiating unit literals and variables that contain units. It
seems good to treat units like structs and enums by allowing variables to
contain them.

const mph = unit { mi/hr };

However, this is fairly heavyweight if we keep the same syntax for literals as
well, like 5.34`unit { mi/hr }. However, if we do not add some special
syntax for units, then we could have the syntax const mph = mi/hr, in which
case the mi unit is indistinguishable from an ordinary variable. Thus, I
argue that we keep the aforementioned syntax while also allowing atomic units
without curly braces.

// Complex units require curly braces
const mph = unit { mi/hr };

// Simple units do not require curly braces
const mi = unit mi;
const hr = unit hr;

// Note that `mi` and `hr` refer to the previously defined zig variables rather
// than unit literals.
const mph2 = mi/hr;

// If one finds this syntax too heavyweight...
const x = 5.34`unit { mi/hr };
// then they can bare with the small inconvenience of declaring units
// beforehand. (common ones could already be defined in std.units)
const y = 5.34`(mi/hr);

Symbols inside a union declaration are similar to fields of enums and structs
because they have their own namespace and do not conflict with ordinary
variables. We could avoid the whole unit keyword by using a dollar sign
before symbols to denote that they are units, but in my opinion this is ugly
and inconsistent with the rest of Zig's syntax.

Units have no metadata apart from the symbol which represents them. That is,
if two units have the same symbol or combination of symbols, then they are the
same unit. There is a single global unit namespace, but this may be worked
around because the variables containing units can be namespaced.

Because of the Zig principle that nothing should be a function call unless it
looks like a function call, there will be no automatic detection and conversion
of prefixed units. In concrete terms, 5`mm + 5`cm + 5`m will be a compile
error rather than resulting in 5.055`m. However, one could write conversion
functions.

fn mm_to_m(x: f64`mm) f64`m {
    return x / 1000`(mm/m);
}

This is a very concrete function, but one could imagine a much more general
distance_to_m function which receives a var argument and converts based on
the unit attached to the type. A general pattern is that this proposal is very
simple, but the comptime features of Zig allow for more powerful features to be
achieved in userland. Note the distinction between types and units. While
types "contain" values, units are merely annotations on other types. Units may
be viewed as reified refinements on types. With the features described so far,
Zig would become the language with both the most powerful and the easiest to
use unit system support.

Interaction between unit-less variables and unit-heavy variables

Implicit casts between numeric types is not allowed because it requires actual
CPU instructions, which is against the Zig philosophy. However, because units
have no runtime cost, they can be happily upcast from a bare value to a value
with a unit. Thus, it is allowed to supply an expression without a unit when an
expression with a unit is expected. This includes parameters as well.

If it is allowed to go the other way (provide an expression with a unit when no
unit is expected), then it is confusing what is expected in 5.34`m + 1.23.
In addition, although we want the expression 5.34`m + 1.23`s to be
disallowed, one could argue that it is allowable if no unit is expected because
both values will be coerced to remove their units. Thus, to make things is
simple as possible, I think that one should need to explicitly remove units
from variables by using a double backtick. Thus, 5.34`m + (1.23`s)``
would indeed be allowed.

Using the double-backtick, it becomes easy to switch the unit of a variable,
using a double-backtick followed by annotation as normal. Thus, triple-backtick
is meaningful as well (but I won't show an example, because markdown is making
using many backticks painful).

A troublesome thing as that some functions may transform values without
units merely because they do not care about the units. For example,

fn add(x: f64, y: f64) f64 {
    return x + y;
}
fn divide(x: f64, y: f64) f64 {
    return x / y;
}

I'm not totally sure, but I believe that for any function, the units of the
output type should be inferrable from the flow of the function and the units of
the input type. Calling this function without units is of course legal, but
calling it with units requires that the function body be valid for with the
units of the input parameters (note that add would require that the both
parameters have the same unit). This would probably be the most complicated
inference in the compiler. This "unit-agnostic" function idea is similar to how
any function may be called as an async function.

Is the complexity added to the compiler implementation worth it?

This is an important question. By adding this feature, which is fairly complex
in implementation even though it is intuitive for users, the bar for different
compilers other than the official one is raised. I believe that this is a
feature which could be omitted in the non-self-hosted compiler because it only
adds verification, and does not seem to useful for the compiler domain.

I am not too familiar with the implementation of the compiler, but I think that
the unit checking logic could be done separately from the rest of the type
system. In other words, even if it is complex to implement, it would not add to
the complexity of the architecture of the compiler - the complexity would be
isolated.

@andrewrk andrewrk added this to the 0.7.0 milestone Jan 26, 2020
@andrewrk andrewrk added the proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. label Jan 26, 2020
@ghost
Copy link

ghost commented Jan 26, 2020

Having the compiler preserve and transform measures through expressions and functions is something I also would like to see. I have been considering how units of measure could work in zig myself. You can have a look at #3142, and #3002

The problem comes with functions, like you say.

4`m^2` == pow2(2`m`);
4`s^2` == pow2(2`s`);
4`m^2 * s^2` == pow2(2`m*s`); 
// ....

Lots and lots of valid combinations. Do you define all the valid measure transformations yourself in a table for each function? That can become a lot of book keeping for the programmer. Even when you create functions to automatically populate the table for all possible combinations. You still have to create the table or table-generating function for every function you import or write into your project.

I think combining distinct types (#1595) with semantic analysis might be an alternative way to get "type safety" on measures, without having to change syntax too much. Take a look at the example I wrote here: #1595 (comment)

With such semantic analysis enabled, you could get a compiler error if inconsistent measure transformations were detected. In a sense the compiler would enforce a consensus, instead of programmers having to define all the transformations themselves.

@JesseRMeyer
Copy link

I love the idea, and have missed that sorely in simulation heavy code. Unfortunately I am not properly versed in how something like this should be implemented so I cannot comment to the details of this proposal. Hoping to be educated from those that actually are on this thread.

@pmwhite
Copy link
Author

pmwhite commented Jan 26, 2020

@user00e00

Lots and lots of valid combinations. Do you define all the valid measure transformations yourself in a table for each function? That can become a lot of book keeping for the programmer. Even when you create functions to automatically populate the table for all possible combinations. You still have to create the table or table-generating function for every function you import or write into your project.

If we have the implementation of pow2 available, then the compiler could do semantic analysis on the function to determine all combinations. One could think of the analysis as transforming the function to

fn pow2(U: unit, x`U) x`(U^2) {
    return x * x;
}

I don't think the user should have to explicitly pass this unit parameter. My point is that the compiler can infer all the unit information.

I think combining distinct types (#1595) with semantic analysis might be an alternative way to get "type safety" on measures, without having to change syntax too much. Take a look at the example I wrote here: #1595 (comment)

Distinct types would allow distinction between values of different measures, but they don't support dimensional analysis or math operations between different distinct types. That is, you cannot do 5`m / 1`s == 5`(m/s)

Also, note that this proposal would support distinct types I believe. Although the auto-casting rules may have to be switched - implicitly remove units, but explicitly add them.

@JesseRMeyer

I love the idea, and have missed that sorely in simulation heavy code. Unfortunately I am not properly versed in how something like this should be implemented so I cannot comment to the details of this proposal. Hoping to be educated from those that actually are on this thread.

I linked an article in the original post which has an extensive overview of how units have been attempted with other languages and libraries. You might find that educational. To be honest, I might not be all that qualified to speak on this issue myself.

@ghost
Copy link

ghost commented Jan 26, 2020

Distinct types would allow distinction between values of different measures, but they don't support dimensional analysis or math operations between different distinct types. That is, you cannot do 5m / 1s == 5`(m/s)

It's not directly dimensional analysis, but if the compiler can enforce a "consensus" on unit transformations you can achieve something very similar just by using distinct types. Consider this:

const Meter = @Distinct(f64);
const Second = @Distinct(f64);
const MeterPerSecond = @Distinct(f64);
const MilesPerHour = @Distinct(f64);

const dist = @as(Meter, 100.0);
const duration = @as(Second, 144.0);

const speed : MeterPerSecond = dist/duration; 
// Transformation table entry: 
// Meter / Second -> MetersPerSecond

const speed2 : MilesPerHour = dist/duration; // ERROR
// Transformation table entry: 
// Meter / Second -> MilesPerHour
// - > COMPILE ERROR! Consensus on distinct type transformation is broken 

This "transformation table" could include all transformations, whether binary operators or functions. This would be especially powerful if in complex expressions, every reduction would be included and checked in the transformation table.

const x : MeterPerSecond = x_meter/sqrt(pow2(t_second));

Let us assume that pow2 and sqrt is used elsewhere, so there is a consensus that @TypeOf(pow2(x : Second)) == Second2, and also that @TypeOf(sqrt(x : Second2)) == Second. Then the compiler can reduce the above to:

const tmp1 : Second2 = pow2(t_second);
const tmp2 : Second = sqrt(tmp1);
const x : MeterPerSecond = x_meter/tmp2;

And thus check every "sub transformation" in the expression.

Of course, you would need lots of existing code to create this "consensus", but it would require no book keeping for programmers to define any transformations themselves.

@pmwhite
Copy link
Author

pmwhite commented Jan 26, 2020

I don't think your suggestion should be dismissed solely because it would be a pain to specify all the valid transformations. All the transformations for common unit systems such as SI would already by in the standard library, so the transformations would be no different for the user than if they were directly supported by the compiler.

However, unless I've misunderstood the distinct types discussion (I actually haven't read through that thread yet), I would think that the motivation behind distinct types is to increase the friction of their usage by functions expecting the base type. In contrast, units of measure are most of the time appropriate to pass to functions expecting the base type; I might even say functions for which unitted types would be inappropriate ought to be using a distinct type (for example, a file descriptor type).

Thus, my thought is that distinct types and units of measure fulfill somewhat conflicting goals, so trying to unify them could be bad. That is not to say that only one should be included in the language. Both tools can be provided; I don't think their will be many if any cases where both of them are appropriate, so this would not be a case of "More than one way to do it".

@JesseRMeyer
Copy link

It seems like this feature would naturally be implemented in user-space as part of the build step. In fact, attempting to do so would be a great litmus test of Zig's build step extensibility.

@andrewrk
Copy link
Member

This proposal competes with #1595, and between the two I find #1595 to be the stronger proposal. That's not saying #1595 will be accepted; but it's enough to close this one.

@andrewrk andrewrk modified the milestones: 0.7.0, 0.6.0 Feb 10, 2020
@ghost
Copy link

ghost commented Apr 24, 2020

Making an attempt at applying typedef (#5132) to the comments I made here earlier:

Units of measure with typedef rely on two related typedef configs: .Measure and .MeasureGroup.

const Mg = typedef(f64, .MeasureGroup); // base type must be a plain zig numeric primitive
const M1 = typedef(Mg, .Measure); // Measures must wrap a MeasureGroup
const M2 = typedef(Mg, .Measure); // related to M1 in terms of sharing the same base 
const M3 = typedef(Mg, .Measure);
const M = typedef(Mg, .Measure); // will refer to this as the generic measure of group Mg below

Mg is only meant as the base for measures M1, M2, ( M.., ..), so that the measures can be related to each other. Mg is also a temporary placeholder in expressions. It's not possible to cast directly to Mg, but in expressions M1 or M2 coerce down to Mg.

Mg is also not valid as the type of an identifier, so Mg must be able to coerce "upwards" to M (any typedef measure part of the Mg group) in variable assignments. This causes every measure/unit transformation in the codebase to be specified.

const mg = @as(Mg, 0.5 ); // error. cannot cast to a measure group
const m1 = @as(M1, 10.0); // ok
const m2 = @as(M2, 11.1); // ok
const x1 = m1 * m2; // error. inferred type of x1 to Mg. Cannot use a measure group as identifier type
const x2 : M3 = m1*m2; // ok. coerce up from Mg to M3

Now, the unit transformation M3 = M1*M2 is established. The idea is to collect all the unit transformations in the whole codebase, and then check them for consistency afterwards. If transformations M1 = M1*M2 and M2 = M1*M2 are both found in the code base, that could indicate an error.

Whereas distinct types coerce down to the base in arithmetic expressions, typedef measures stay
within their group.

const x : M1 = @as(M1,50.1) + 43; // error. both operands must be measure typedefs of the same group
const x2 :M1 = @as(typedef(MgOther, .Measure), 50.1) + @as(M1, 43); // error. same as above.

typedef measures turn out to be a slight variation of typedef distincts, yet make some dimensional
analysis possible in zig, including complex expressions that involve functions like pow or sqrt.

// could be built-in or part of commonly used library
const Si = typedef(f64, .MeasureGroup){
    const s  = typedef(Si, .Measure); // [second]
    const m  = typedef(Si, .Measure);  // [meter]
    const kg = typedef(Si, .Measure); // [kilogram]
    const A = typedef(Si, .Measure); // [ampere]
    const K = typedef(Si, .Measure); // [kelvin]
    const mol = typedef(Si, .Measure); // [mole]
    const cd = typedef(Si, .Measure); // [candela]
}

// "new" units are created whenever an appropriate one is not already 
// declared, though a typedef measure unit is less useful in semantic 
// analysis terms if it's not referenced frequently.
const Si2 = struct{
    const m_per_s = typedef(Si, .Measure); // [m/s]
    const N = @import("3rdpartyphysics.zig").Newton; // Si typedef also used in 3rd party lib
}

test "si" {
    const m = @as(Si.m, 10.4);
    const s = @as(Si.s, 5);
    const speed : Si2.m_per_s = m/s;
    const kg = @as(Si.kg, 50.0);
    const newton : Si2.N = kg*m/(s*s); // consistent with transformations in 3rdparty lib
}

@ityonemo
Copy link
Sponsor Contributor

Thank you for taking the time to think about this rigourously. Whatever zig chooses, I'm glad the zig community is at the very least not considering to do the completely insane thing that Go did with units of time.

@Sobeston
Copy link
Sponsor Contributor

at the very least not considering to do the completely insane thing that Go did with units of time.

Slightly off topic, but what do you mean by this?

@ityonemo
Copy link
Sponsor Contributor

ityonemo commented Apr 24, 2020

sorry. I should have been more specific. It's absolutely not off-topic and worth logging so that mistakes don't get repeated.

in Golang, if you have a time value, you are permitted to multiply it by a "comptime", or rather, const, number. However, if you want to multiply it by a runtime value (let' s say, a user wants to select a delay and you want to multiply that value times 1000 to get seconds), you must explicitly typeconvert it to the "milliseconds" type and then do the multiplication.

I get that this in some vague way this is consistent with how C does types but, this basically breaks the brain of anyone who was a conscientious practicioner of science and engineering. IMO culturally speaking you don't want to discourage conscientious and careful people who are using your system, and thus if you're implementing units in the typesystem, it needs to be fully consistent with dimensional analysis, or, don't do it at all.

@ghost
Copy link

ghost commented Apr 25, 2020

Here's how converting from seconds to milliseconds would look with typedef measures (configs .MeasureGroup and .Measure):

// Assume all typedef measures have base type f64 here
const seconds : Si.s = @as(Si.s, 5.2);
// a[ms] = b[s] * 1000[ms/s]
const ms0 = seconds * @as(Si.milli, 1000); // error, measuregroup typedef `Si` not valid type for identifiers. specify measure within group
const ms1 : Si.ms = seconds * @as(Si.milli, 1000); // ok. Si.ms is (measure) typedef of Si
const ms2 = seconds.toMillis(); // if having convenience methods in Si.s typedef namespace

Again, it's not that zig would do any dimensional analysis itself during compilation, it would just force the user to specify all the unit transformations, which can be checked
for consistency afterwards with automated tools or through manual inspection (think spreadsheet representation with columns [ transformation_str, count], where errors
would usually correspond to transformations with few occurrences).

Just realized lazy compilation is a drawback in this context though. Transformations in scopes that are not analyzed by the compiler will not be accounted for.

@daurnimator
Copy link
Collaborator

Interesting article: https://www.hillelwayne.com/post/frink/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Projects
None yet
Development

No branches or pull requests

6 participants