Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Private Fields #9909

Closed
zzyxyzz opened this issue Oct 6, 2021 · 29 comments
Closed

Proposal: Private Fields #9909

zzyxyzz opened this issue Oct 6, 2021 · 29 comments
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Milestone

Comments

@zzyxyzz
Copy link
Contributor

zzyxyzz commented Oct 6, 2021

Introduction

Currently, function and variable declarations are private by default and are made externally visible with the pub modifier. However, data fields are always public and there is no way to restrict their visibility. Apparently, field access control was never really working in Zig (#569) and was officially removed in #2059. Which is somewhat surprising, since no other modern language I can think of (Java, C++, C#, D, Rust, Go, ...) fails to provide this feature.

The lack of private data makes it impossible to do proper encapsulation / implementation hiding, which is widely considered to be a basic software engineering technique. In particular:

  • Types can hide some of their methods and constants, but always expose their internal structure. This leaves them open to accidental (or bad-practical) corruption. The compiler will not complain if the .capacity of an ArrayList is overwritten.
  • Documentation value is lost. Whether or not a field is intended for direct access can only be communicated through code comments.
  • Auditing code becomes more difficult, because there are more opportunities to violate constraints and invariants.
  • Implementing opaque handles is nearly impossible without additional builtins: Introduce @OpaqueHandle() builtin #9859, #1595 (comment).

Possible objections

No official reason was given in #2059, but two possible objections to private fields come to mind:

  1. Visibility control is more useful for decls than for fields, because invisible decls become completely inaccessible, while fields can always be messed with through pointers.
  2. This will lead to over-encapsulation and getter-setter boilerplate (#2974 (comment)). Sometimes it's just better to reach in and do things directly.

Concerning 1, I'd say that protection does not have to be perfect to be useful. Preventing accidental and semi-deliberate messing is valuable by itself, not to mention the documentation value.

2 may be a real concern, or maybe not. The same argument can be made about private decls, but a proposal to override the visibility of methods at the call site (#8779) was recently met with a resounding rejection -- even though there are certainly cases where it is both reasonable and safe to call private methods. In addition, Zig is increasingly making legal but potentially buggy patterns into hard errors, somtimes controversially (e.g. unused variables). The lack of basic implementation hiding is not consistent with this safety-first attitude, IMHO.

Syntax

The simplest option is to adopt the same default as with decls: file-level private by default and externally visible with pub. If this clashes with data-oriented style, the opposite default can be chosen, but that wold require the introduction of a private keyword. Yet another possibility is to make field visibility struct-level, e.g. with opaque struct {...}.

I don't really have a strong opinion here.

Open questions

  • Is there a case for field-level access control in unions?

Update 1: As a side benefit, it may be possible to remove the opaque {} type from the language, since it is rarely used and can be simulated with struct { private ptr: usize }.

Update 2: I wouldn't mind adding an escape hatch like @privateField(object, "name") for cases where you need to use a particular library, but find the API too locked-down. Some discussion of this is here.

@vijfhoek
Copy link

vijfhoek commented Oct 7, 2021

I feel the argument you make in the objections ("In addition, Zig is increasingly making legal but potentially buggy patterns into hard errors, somtimes controversially (e.g. unused variables).") is a very good one. This is something that I feel Zig has evolved in since #2059.

Maybe worth moving that argument up to the introduction, to pay it more attention?

A possible objection is that this could also be achieved with a strict naming convention, e.g. starting variables with an underscore will mark them as if they're private, and perhaps be enforced with future linter rules or the like. Just a thought.

@batiati
Copy link
Sponsor

batiati commented Oct 7, 2021

To support private fields, from zig zen:

  • Communicate intent precisely.

@rohlem
Copy link
Contributor

rohlem commented Oct 7, 2021

As for documentation value, in status quo I would suggest:

pub const ByteList = struct {
 // public decls and members here
 data: []u8,

 private: ByteListPrivate,
};
/// non-public, private decls and members here
const ByteListPrivate = struct {
  allocator: std.mem.Allocator,
  capacity: usize,
};

Accessing fields in .private should not occur accidentally, and be relatively easy to spot in a code review.
If you use reflection to iterate over fields, maybe have a private-informed branch of std.meta to exclude/ warn about specific field names.

I'll probably have to reread the @OpaqueHandle discussion, but it should be possible to replace the field with an equally-sized-and-aligned byte array align(@alignOf(Private)) private: [@sizeOf(Private)]u8,.
Then @ptrCast/@bitCast back-and-forth with non-public helper functions.

The missing piece then is "making the compiler complain" when overwriting the .private field.
The one scenario that comes to my mind would be reassigning the entire containing struct (ByteList above).
Maybe declaring the .private field to be pinned, as in #7769, would already be enough of a solution - granted though, the actual source of the error would be the fact you can't copy from .private, instead of inability to copy to .private.

@ghost
Copy link

ghost commented Oct 7, 2021

This might be really useful, it doesn'make sense to me why access is a concern for decls, but not fields. It should be either both or neither.

Just an opinion:
Maybe its private fields that should be marked as such, not public?

As for setter/getter boilerplate, half of a problem can possibly be solved with something like 'readonly' keyword, which would effectively act as '*const' outside of a file scope.

@zzyxyzz
Copy link
Contributor Author

zzyxyzz commented Oct 7, 2021

@rohlem
OpaqueHandle would indeed provide equivalent functionality: You factor out your private state into a separate struct, wrap it in an @OpaqueHandle, and then use @toOpaqueHandle() and @fromOpaqueHandle() to read and write. My question is, why use such a roundabout way to hide state, when there is the well-established technique of visibility modifiers?

Also, and somewhat ironically, the OpaqueHandle trick only works because it leverages the existing ability to declare a private struct. Otherwise there would be very little point in the whole concept of OpaqueHandle, and you might just as well go with a naming convention like _private, as @vijfhoek suggested.

@Guigui220D
Copy link

I have been using naming conventions for my private fields, like _capacity: usize in order to discourage usage, but I feel like having pub for fields would be better.

@InKryption
Copy link
Contributor

Personally, I quite like that data in zig is so transparent; it makes no attempt to model anything other than what the computer is really doing.

That said, there is obviously a use case for data hiding, but I don't think it should be as easy as just marking a field as private; and as well, I think this shouldn't be as powerful as the privatization of declarations, and moreso act as stronger documentation than comments. Essentially, I think we should continue to encourage transparent data, by making private fields harder to declare in some way.

In that regard, I actually think something similar to what was modelled in #9859 would actually be better for this purpose - using a separate "opaque bag of bits" as a section for private data. This would not only slightly discourage private data, but also document even better the separation of private and public fields.

I think there is perhaps also an argument to be made that, if private fields, whether in the proposed style or in the #9859 style, maybe some form of controlled access into private fields should be enabled anyway, like a built-in (e.g. @privateField(expr, "private field name")). Because unlike what was argued against @call being able to access private functions, accessing what is just data wouldn't be as possibly destructive to any "library state" - and as well, as has been said in this proposal, even with private fields, we still have the ability to mess with the data, so maybe granting an assuredly safe mechanism to do so in lieu of "smartness" would be a good move (and as a bonus, it would be a clear mark of possible code-smell, versus aforementioned smartness).

Just some food for thought. I'll stop typing now.

@zzyxyzz
Copy link
Contributor Author

zzyxyzz commented Oct 7, 2021

Personally, I quite like that data in zig is so transparent.

So do I, and in private projects I don't usually bother with access restrictions, even in other languages. But is this sufficient reason to make data hiding inconvenient or impossible in places where it is needed (e.g. public-facing libraries or high-assurance projects)?

In that regard, I actually think something similar to what was modelled in #9859 would actually be better for this purpose - using a separate "opaque bag of bits" as a section for private data. This would not only slightly discourage private data, but also document even better the separation of private and public fields.

I'll have to disagree here. To me, pub or private is as clear documentation as it gets for the intended access level. Putting everything into a single private section may also interfere with data layout, although that is a secondary concern.

As for adding a @privateField override, I'm all for it. Like you say, this indicates code smell clearly enough, and should allay fears along the lines of "other programmers will abuse private fields and will only get in my way". Though I don't see why private decls shouldn't be accessible in the same fashion. While reading private data is not as bad as calling private methods, modifying it is probably worse. Also, both would be needed for "extending" library containers, for example.

@InKryption
Copy link
Contributor

@zzyxyzz As it pertains to private declarations, I think the most convincing rationale I've heard for it is that it forces change at an API level, as opposed to forcing it at the library-usage level; that is to say, if a library has marked a certain function as private, but the community at large has decided that it would be best marked public, this will either force the library implementers to push the change, or force a fork of the library that implements this change (with the former being the preferred outcome). This as opposed to forcing people wrapping said library to use functions that don't have the same guarantees as those explicitly marked public.

I can't quite recall where I read that here, or whether or not that's a good enough reason. But for the purposes of arguing against applying this logic it to private fields, at the very least, I would point to the fact that although a field may be marked private, it cannot be guaranteed to be inaccessible, where its memory is accessible in the public-facing API (as explained), and thus it makes sense to treat field access modifiers more like built-in documentation tools, as opposed to strict API boundary controls.

(Please tell me if any of this makes sense, my synthesizing skills are not pristine).

@zzyxyzz
Copy link
Contributor Author

zzyxyzz commented Oct 7, 2021

@InKryption

that is to say, if a library has marked a certain function as private, but the community at large has decided that it would be best marked public, this will either force the library implementers to push the change, or force a fork of the library that implements this change (with the former being the preferred outcome).

I've heard this argument too, but I find it unrealistically idealistic. Even assuming that your proposed change will be accepted into the public API (which will not happen most of the time, either because your use-case is too specific or the authors plain disagree with your way of doing things), the usual timeline for the change is measured in months or years. And outright forks rarely happen at all, unless the project is abandoned by the previous author. So, while this position does represent best practice for upstreaming changes in open-source projects, it has little to no relevance for everyday software development.

@InKryption
Copy link
Contributor

I am glad that we generally agree, then. I'm just bringing the argument to the forefront.

@sina-devel
Copy link

If the private keyword is added, it's not syntactically compatible (functions, etc are private by default and exported by the pub keyword)

@InKryption
Copy link
Contributor

@sina-devel I would argue that it makes sense for functions to private by default, as it incentivizes explicitly building a robust public-facing API, as opposed to simply allowing everything to spill into global scope. Whereas, here, I think it is more useful to overall make transparent data the default, and incentivize having to explicitly decide if data should really be marked as an "internal" component (which would likely lead to more boilerplate related to reading and writing to the data, which should certainly not be the default, in my opinion, coming from a C++ background).

@sina-devel
Copy link

sina-devel commented Oct 12, 2021

adding private or internal:

  • boilerplate code for geter (e.g. geter for capacity of ArrayList)
  • stdlib changes a lot
  • ...

right now:

  • all fields are public for get and set

by default private and exported by pub:

  • boilerplate code for geter (e.g. geter for capacity of ArrayList)
  • stdlib changes a lot
  • ...

adding keyword to readonly at outside and mutable at inside:

  • complexity increases
  • ...

@ghost
Copy link

ghost commented Oct 13, 2021

Get/set function are not exclusively a bad thing, sometimes they are needed to accept data from a user in a different way than it is stored internally, and sometimes you really need to keep different fields in sync.

For example, library can provide a chunk of data, that user can store in whatever way they need, where it's still makes sense to keep some non-const, non-user-defined cache:

const Texture = struct {

    renderer_id: u32,  // not appropriate for user to change
    width:  usize,
    height: usize,
    pixel_format: enum { R8_G8_B8_A8, Grayscale },
    filter: enum { Nearest, Bilinear },
    mipmap_levels: usize, // should be in sync with renderer

    pub fn load(path: []const u8) @This() { ... }
    pub fn unload() void { ... }
    pub fn genMipmaps(self: *@This(), mipmaps: usize) void { ... }

};

In this example, you don't ever want to modify some fields without syncing state with the renderer, but may need to pass container around freely, and allow operations on an instance of a struct in member functions.

Argument that fields shouldn't even be allowed to be hidden steers all libraries to be enclosed state machines, limited to hiding data by using top-level declarations.

// value_container.zig
var value: usize = 0;
const value_container = @import("value_container.zig");

test "acces from a different file" {
    value_container.value = 3;
}

Output of this test:

error: 'value' is private
    value_container.value = 3;

I guess what i am trying to say, is that acces control concerns are equally valid for functionality and data, and should, in my opinion, be consistent. Either it's "everything is transparent, all the time", or access control is a useful thing for both.

@ghost
Copy link

ghost commented Oct 13, 2021

Maybe a way to alleviate some of boilerplate, is to allow reading from private fields?

const Container = struct {
    value: usize, // field without any specifiers is private by default
};

var c: Container = .{ .value = 0 };
var read = c.value; // reading from private fields is allowed
c.value = 42;       // writing to a private field not allowed

const Container2 = struct {
    pub value: usize, // and here is a way to make it writable
};

@zzyxyzz
Copy link
Contributor Author

zzyxyzz commented Oct 13, 2021

@di-ant,
I could endorse read-only fields as a last-resort compromise, but not as my preferred option. They can successfully catch unwanted data-modification, which is admittedly better than nothing. But public fields (whether they are writable or read-only) are part of the public interface, and changing or removing them is a breaking change. Precisely expressing the intent that something is part of the internal implementation and should not be modified or relied on, can only be properly done by restricting visibility.

@tauoverpi
Copy link
Sponsor Contributor

How would this interact with comptime reflection?

@andrewrk andrewrk added the proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. label Oct 13, 2021
@andrewrk andrewrk added this to the 0.9.0 milestone Oct 13, 2021
@zzyxyzz
Copy link
Contributor Author

zzyxyzz commented Oct 13, 2021

@andrewrk,
If you find the time, could you provide a short explanation for why this was closed -- or a link if this was already discussed somewhere else?

Since this is something that Zig does differently from other languages, it may come up again in the future. So it might be good to have a findable "official position" on this issue.

@andrewrk
Copy link
Member

Here is my reasoning:

There are many properties that can be "leaked" into the public interface of an API. For example: size/alignment of a struct, performance characteristics of a given function implementation, whether a function can be executed at comptime, existence/non-existence of declarations, and more. It is up to the documentation of a given package to specify what exactly the major version is protecting - that is - what exactly counts as "breaking" and what does not. Whether a given field is part of the public API or not is one more item in this list. Of course most of these have general recommendations for defaults:

  • size/alignment of a type: no
  • performance characteristics of a function: no
  • whether a function can be executed at comptime: unclear
  • existence of a declaration and its type: yes
  • non-existence of declarations: no
  • existence of a field and its type: yes
  • non-existence of a field: no

The idea of private fields and getter/setter methods was popularized by Java, but it is an anti-pattern. Fields are there; they exist. They are the data that underpins any abstraction. My recommendation is to name fields carefully and leave them as part of the public API, carefully documenting what they do. A good example is a field protected by a mutex:

const Foo = struct {
    /// This field must be accessed only when `mutex` is held.
    unprotected_counter: usize,
    mutex: Mutex,
};

In this case the field name unprotected_counter properly communicates the danger of using the field without using any abstraction. However it being public also allows proper abstractions to be built, if the ones provided are not enough. For example, if it could only be accessed via getCounter() like this:

pub fn getCounter(foo: *Foo) usize {
    foo.mutex.lock();
    defer foo.mutex.unlock();
    return foo.unprotected_counter;
}

Then if the API user held the mutex for some other reason, getCounter() would actually deadlock (or rely on a silly hack such as recursive mutexes). The API user needs the ability to interface with the abstractions.

In my subjective experience, public fields generally lead to better abstractions by eliminating the temptation to attempt full encapsulation, when the more effective strategy is to provide composable abstraction layers.

The bottom line here is that private fields is an addition to the language, making it more complicated and requiring the language to answer a bunch of questions that otherwise would not exist. Not only does the feature does not provide enough value to merit the language complexity, I contend that it actively leads to worse source code.

@zzyxyzz
Copy link
Contributor Author

zzyxyzz commented Oct 13, 2021

Thank you for the detailed explanation!

I can't say I agree with it, but it's certainly good to know where things stand.

@ziglang ziglang deleted a comment from daveb68 Jan 15, 2022
@likern
Copy link

likern commented Feb 2, 2022

@andrewrk I understand your reasoning and agree with it. On the other hand, it's not clear is it safe (is this intended / allowed) to access struct fields directly, as they can be unprotected.

A would propose to have private/public fields, so they can't be accessed accidentally.
But not hiding private fields from accessing completely. If you really need you can.
This is how it'd don in TypeScript with object!.field syntax, which means "Hey, I know what I'm doing!".

const obj = struct {
  const Self = @This();
  // private by default or have # syntax as modern JavaScript proposal
  #length: usize,
  pub fn length(self: *Self) {
    return self.#length;
  }
}

// Know what I'm doing!
const a = obj {};
const len = a.!length;
// or like this
const len = a.#length;

@zzyxyzz
Copy link
Contributor Author

zzyxyzz commented Feb 2, 2022

@likern
I agree that adding an escape hatch is the proper answer to the concern that interfaces will become too locked-down and uncomposable. This solution was already suggested in this thread, though (in the form of @privateField(a, "length")). Using a built-in for this has multiple advantages, such as not requiring new syntax and being easier to search for. The additional verbosity may also be seen as a benefit, since overriding visibility is not to be used casually.

@andrewrk
Copy link
Member

andrewrk commented Feb 2, 2022

As a reminder this issue is closed

@zzyxyzz
Copy link
Contributor Author

zzyxyzz commented Oct 20, 2022

I know this issue is closed, but just in case it is ever reconsidered, I'd like to post this reply to Andrew's comments. This has been eating at me for a while, so here goes:

There are many properties that can be "leaked" into the public interface of an API. For example: size/alignment of a struct, performance characteristics of a given function implementation, whether a function can be executed at comptime, existence/non-existence of declarations, and more. It is up to the documentation of a given package to specify what exactly the major version is protecting - that is - what exactly counts as "breaking" and what does not. Whether a given field is part of the public API or not is one more item in this list.

Yes. But lets not throw the baby out with the bathwater. Just because there will always be things (some of them quite esoteric) that cannot be formally expressed in the interface, that should not necessarily prevent us from picking the low-hanging fruit. Field access control is a simple and well-established technique, not unlike the distinction between var and const. Implying that it is about as useful (and complex) as, e.g., a compiler-checked formal specification of the performance characteristics of a function is incorrect, IMO.

The idea of private fields and getter/setter methods was popularized by Java, but it is an anti-pattern.

I would not say that getter-setters are per se an anti-pattern. Their excessive use in some languages certainly is, but when there is a legitimate need for an encapsulated interface, they're just the thing. And it's not like Java is the only language with private fields. Almost all popular languages have them. I did not have the impression that Rust, for example, was suffering from a getter-setter epidemic.

My recommendation is to name fields carefully and leave them as part of the public API, carefully documenting what they do.

This will work well sometimes, and sometimes it won't. I'm pretty sure that many library authors would prefer to hide some parts of the implementation, rather than carefully arranging things such that there is nothing to hide.

In this case the field name unprotected_counter properly communicates the danger of using the field without using any abstraction. However it being public also allows proper abstractions to be built, if the ones provided are not enough.

I note that the building of additional abstractions is just as likely to be prevented by low-level methods being private. Does this mean we should remove visibility controls etirely and rely on careful naming and documentation for methods and other decls as well? Besides, anticipated disagreements between library authors and users about what should or shouldn't be hidden can also be resolved with an explicit override (e.g. @privateField()).

Then if the API user held the mutex for some other reason, getCounter() would actually deadlock (or rely on a silly hack such as recursive mutexes).

So the fields should be public in this particular case. It does not follow that struct fields in general should always be public. Plenty of valid examples can be presented on both sides of the argument.

The bottom line here is that private fields is an addition to the language, making it more complicated and requiring the language to answer a bunch of questions that otherwise would not exist.

Arguably, adding private fields would simplify the language, at least from the user perspective, since it makes the language more consistent and more in line with patterns familiar from other languages.


In my view Andrew makes a good case for why open interfaces should be preferred in general. He does not make a convincing case for why they should be forced to always be open. Projects differ. Requirements differ. Sometimes what you need is an open library, at other times it's a piece of high-assurance code that is audited forwards and backwards. Neither is right or wrong, they're just different, and I see little reason to pick one over the other when we can easily support both.

@egtann
Copy link

egtann commented Mar 28, 2023

I am a beginner at Zig, but I come from a background in many other languages. Perhaps my experience is wholly irrelevant to Zig, but I wanted to share my perspective as it seems a bit different from @andrewrk.

I’ve noticed private fields have 3 major benefits for library authors, consumers, and the ecosystem as a whole:

  1. Private fields enable library authors to communicate intent in APIs for how things should be used. As a library author I can create a beautifully simple tool, even if the implementation details are complex.
  2. Private fields add enormous clarity to readers/users of the API. In many cases in Go, I know exactly how something can and should be used, and I can’t misuse it. For instance, sync.Mutex had Lock and Unlock. The internals are hidden away. If I need a deeper access I can of course fork it, copy/paste/modify it, write my own or use a different library. But most times, if a library is well-designed for your use-case, its API provides the right abstraction. (I’ve needed to copy/paste/export 1 field exactly 1 time in 8 years, for what it’s worth).
  3. When combined with semver-based tooling and community buy-in, you get backwards compatibility guarantees that are not reasonably possible otherwise. I have been able to reliably upgrade and use Go libraries for years. I’ve encountered just 1 broken package version in 8 years (which go.mod allows me to blacklist) across a 300k LOC codebase with several dozen dependencies. This is unlike any experience I’ve had in any other language in my career (including Java, Objective C, JavaScript/Typescript, Ruby, Python). Go’s ecosystem stability is a wonderful feature of the language.

I do not believe we will have Go’s level of stability in the Zig ecosystem without giving library authors a standard way to say, “this is what I promise will remain stable.” Other things are needed as well, of course, but this seems to be an important piece.

Zig already has default private behavior with requiring authors to make things pub, so pub fields seem like such a natural inclusion from a beginner Zig-user’s POV.

@ityonemo
Copy link
Sponsor Contributor

I don't think zig is a language that will necessarily do something "just because other languages do it".

As someone who used to use languages which have private fields, and who now uses one - not zig - that does not.... It is absolutely amazing. Specifically, it makes debugging very lucid. I don't have to drop to a debugger for 99% of the things I need to do. Fields should absolutely NOT be hidden from access (because otherwise if you want to introspect them, you would need privileged code, i.e. std.debug.print "wouldn't just work"). However, I believe there is a use case for fields to not be assignable values outside of their own scope. It is a common thing in the language that I use that curious programmers dive into what is a "private api" and on a version change things get weird.

@svew
Copy link

svew commented Jul 8, 2024

I think it's worth saying, to rebutt the reasoning about docs as a suitable solution for private fields, that docs are always the WORST way of communicating code contracts to other developers, the best being language features which enforce certain behavior (types, mutability modifiers, access modifiers, etc.), the second best being widely recognized conventions.

Java has the "private" field language feature, Python has an underscore prefix code convention. Both of these are extremely valuable for communicating the intended use of a struct or library. It's valuable not just to the developer who may or may not (probably not) be fully reading the docs of the library they're using, but also to any code reviewers to show that a library is being used in an unintended way. It's not hard to imagine a new developer modifying the "capacity" field of a struct because intellisense suggested it first over "setCapacity" (which does additional, necessary work), and it getting missed during code reviews since there's no indication at all that this is a misuse of the class unless you comb over the library's documentation.

Not introducing ANY concept of "You probably don't want to call/write this function/field" is a huge mistake. If Zig wants to strive for being a maintainable language used by a team of more than a half dozen developers, then it MUST introduce either a language feature or code convention to communicate what is meant to be public API and what is not. It doesn't matter if it's Java-style private fields, or a Python-style naming convention, or something inbetween, any of these are adequate for communicating an unusual use of a library during development and code reviews. Docs, and expecting only developers of a certain rigor will use your language, are simply not an acceptable substitute for this kind of important & easily communicable information.

@Jarred-Sumner
Copy link
Contributor

Jarred-Sumner commented Jul 28, 2024

Private fields would've made this memory leak harder to happen (or maybe prevented it):

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Projects
None yet
Development

No branches or pull requests