thread local variables #924

Closed
andrewrk opened this issue Apr 15, 2018 · 25 comments
Labels
accepted This proposal is planned. proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Milestone

Comments

@andrewrk
Member

var x: i32 = 1; // global variable
threadlocal var y: i32 = 2; // thread local variable
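As a rough C11 analogue of the two declarations above (a sketch of the intended semantics, not Zig and not how the compiler implements it), `_Thread_local` plays the role of `threadlocal`. The helper names `worker` and `run_worker` are made up for illustration.

```c
#include <assert.h>
#include <pthread.h>

int x = 1;                /* global: one copy shared by every thread */
_Thread_local int y = 2;  /* thread local: each thread starts with its own copy set to 2 */

/* Runs on a second thread: records the y it starts with, then modifies both variables. */
static void *worker(void *arg) {
    int *seen_y = arg;
    *seen_y = y;  /* this thread's own copy, still the initial value */
    x += 10;      /* changes the single shared copy */
    y += 10;      /* changes only this thread's copy */
    return NULL;
}

/* Spawns the worker, waits for it, and returns the y value the worker started with. */
int run_worker(void) {
    pthread_t t;
    int seen_y = -1;
    int rc = pthread_create(&t, NULL, worker, &seen_y);
    assert(rc == 0);
    pthread_join(t, NULL);
    return seen_y;
}
```

After `run_worker()` returns, the calling thread sees the worker's change to `x` but its own `y` is untouched, whatever value it held before the call.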

Here's one use case (taken from std/debug/index.zig):

var panicking: u8 = 0; // TODO make this a bool

pub fn panicExtra(trace: ?&const builtin.StackTrace, first_trace_addr: ?usize,
    comptime format: []const u8, args: ...) noreturn
{
    @setCold(true);

    if (@atomicRmw(u8, &panicking, builtin.AtomicRmwOp.Xchg, 1, builtin.AtomicOrder.SeqCst) == 1) {
        // Panicked during a panic.

        // TODO detect if a different thread caused the panic, because in that case
        // we would want to return here instead of calling abort, so that the thread
        // which first called panic can finish printing a stack trace.
        os.abort();
    }
    const stderr = getStderrStream() catch os.abort();
    stderr.print(format ++ "\n", args) catch os.abort();
    if (trace) |t| {
        dumpStackTrace(t);
    }
    dumpCurrentStackTrace(first_trace_addr);

    os.abort();
}
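One way the TODO in the code above might be addressed with a thread-local flag, sketched in C11 under assumptions: the names `enter_panic`, `panic_state`, and `enter_panic_on_new_thread` are hypothetical and not part of Zig's standard library. A process-wide atomic flag records that some thread has panicked, while a per-thread flag tells a nested panic on the same thread apart from a panic arriving from a different thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef enum { PANIC_FIRST, PANIC_NESTED, PANIC_OTHER_THREAD } panic_state;

static atomic_bool panicking = false;                    /* has any thread panicked? */
static _Thread_local bool this_thread_panicking = false; /* has the current thread panicked? */

/* Classifies a panic: first in the process, nested on this thread, or late from another thread. */
panic_state enter_panic(void) {
    if (this_thread_panicking) return PANIC_NESTED;   /* panicked during our own panic: abort */
    if (atomic_exchange(&panicking, true)) return PANIC_OTHER_THREAD; /* let the first thread finish */
    this_thread_panicking = true;
    return PANIC_FIRST;                               /* we print the stack trace */
}

static void *call_enter_panic(void *out) {
    *(panic_state *)out = enter_panic();
    return NULL;
}

/* Helper for demonstration: calls enter_panic() from a freshly created thread. */
panic_state enter_panic_on_new_thread(void) {
    pthread_t t;
    panic_state result = PANIC_FIRST;
    int rc = pthread_create(&t, NULL, call_enter_panic, &result);
    assert(rc == 0);
    pthread_join(t, NULL);
    return result;
}
```

With this split, only `PANIC_NESTED` would need to call abort immediately; a thread that gets `PANIC_OTHER_THREAD` could return or block so the first thread can finish printing.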
@andrewrk andrewrk added the proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. label Apr 15, 2018
@andrewrk andrewrk added this to the 0.3.0 milestone Apr 15, 2018
@PavelVozenilek

  1. The compiler should emit a warning/error if a thread-local variable is used by only a single thread.
  2. Taking a pointer to such a variable and passing it to other threads could be forbidden by the compiler.
  3. When the value is initialized at comptime: is it calculated once, stored somewhere, and then set as the initial value for every thread? What if comptime evaluation returns a different value each time, say some unique number?

@phase
Contributor

phase commented Apr 16, 2018

Instead of making this a language-level feature, could this be implemented in the standard library?

@bnoordhuis
Contributor

Technically yes, but you'd miss out on out-of-the-box tool support. gdb and lldb know about .tbss and .tdata segments; they would need to be taught any Zig-specific scheme first.

  1. The compiler should emit a warning/error if a thread-local variable is used by only a single thread.
  2. Taking a pointer to such a variable and passing it to other threads could be forbidden by the compiler.

I don't know if that is a reasonable expectation, that kind of data flow analysis is a Hard Problem. I'm not even sure it's possible in general without imposing additional restrictions à la linear types.

@bheads

bheads commented Apr 16, 2018

For thread safety you may want to consider thread local by default and require the user to declare memory as global. Thread local by default and a thread-safe way to move memory between threads would be a huge win.

@andrewrk
Member Author

For thread safety you may want to consider thread local by default and require the user to declare memory as global.

I understand this is what D does, but I'm not convinced this is the best thing to do. Thread local data has a very specific use case but it is not a general solution to data races within a thread. For example, thread local buffers cannot be used in a function which is directly or indirectly recursive. It also comes at a cost. Less thread local data makes threads less expensive.
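To make the recursion point concrete, here is a small C11 sketch (the function names are invented for illustration). The scratch buffer is thread local, so no other thread can touch it, yet a recursive call on the same thread still overwrites it while the caller holds a pointer into it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One scratch buffer per thread: safe against other threads, not against re-entry. */
static _Thread_local char scratch[32];

/* Formats n into the thread-local buffer and returns a pointer to that buffer. */
const char *format_number(int n) {
    snprintf(scratch, sizeof scratch, "%d", n);
    return scratch;
}

/* Buggy on purpose: the recursive call reuses scratch while `label` still points into it.
 * Returns 1 if `label` still holds this call's number afterwards, 0 if it was clobbered. */
int first_label_survives(int depth) {
    const char *label = format_number(depth);
    if (depth > 0) {
        first_label_survives(depth - 1); /* overwrites scratch on the same thread */
    }
    char expected[32];
    snprintf(expected, sizeof expected, "%d", depth);
    return strcmp(label, expected) == 0;
}
```

Without recursion (`depth == 0`) the label survives; with any recursion the outer call finds its label replaced by the innermost call's text.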

@bheads

bheads commented May 4, 2018

@andrewrk Just spent an hour dealing with a thread local bug in D, I think you are right...

@nordlow

nordlow commented Jun 26, 2018

I'm curious...what alternatives to thread-local-by-default are you thinking of, @andrewrk?

Rust-style ownership and borrowing?

Pony-style builtin actor-model with thread-local deterministic GC?

@isaachier
Contributor

@nordlow who said the alternative is more complex than the simple approach above: threadlocal var y : i32 = 2;?

@nordlow

nordlow commented Jun 26, 2018

Doesn't that put Zig in the inter-thread-data-races-by-default language group?

Which is what languages such as Rust and D have "designed away", and which I thought no new systems language would ever have again?

@isaachier
Contributor

As I've said in another issue, that is exactly what Zig's specialty is: allowing precise control over dangerous scenarios as long as errors are returned explicitly and not thrown up the stack. That way the user is in control of the behavior, whether or not races are involved. I think Zig competes with Rust because it is much more flexible and does not burden the user with proofs of correctness upfront. That can always be added later using external tooling/annotations (see Frama-C for an example of this). Personally, I don't see the use of globals being thread-local by default even when threads aren't being used.

@nordlow

nordlow commented Jun 26, 2018

Are you saying that Zig will statically detect data-races and notify them to the developer as compilation errors?

@isaachier
Contributor

I doubt it. Does C do that? More likely a runtime tool will exist to check for races (similar to Clang's ThreadSanitizer). If you are looking for a language that makes it impossible to shoot yourself in the foot, I think Zig is not the answer. If you want a language that lets you shoot yourself in the foot, but provides a myriad of ways to avoid it and detect it, then Zig is the answer.

@andrewrk
Member Author

andrewrk commented Jun 26, 2018

I'm curious...what alternatives to thread-local-by-default are you thinking of, @andrewrk?

It sounds like you see thread-local-by-default as solving some problem, and I challenge that here: #924 (comment)

For concurrency (See #174), I'm experimenting with async/await (an event loop with coroutines multiplexed upon kernel threads) and atomics in the self-hosted compiler. If I can show that you can use higher level abstractions in this style relatively easily, then I think the problem is solved.

@nordlow

nordlow commented Jun 26, 2018

No, I of course realize that thread-local storage has its pitfalls as well. But race conditions are less likely in a multi-threaded context when top-scope variables are thread-local by default. That's at least my experience. My private language of choice is D. I am however very interested in the progression of other languages such as Rust and Zig and want to understand all the different ways in which we can make the best use of multi-core CPUs as safely as possible. D attacks this using strong or weak function purity (pure), default thread-local storage and immutable GC-backed allocation. Rust uses ownership and borrowing combined with atomics and refcounted allocation at the bottom of its stack. I'm very curious whether Zig has a similar or another strategy for tackling the problem of memory-safe concurrency (task-based parallelism).

Update: Ahh, sorry, I'm confusing thread-local storage with function purity and strong immutability (shareable by default). To safely send data either by immutable or isolated references we need some kind of built-in data qualifiers for expressing immutability and isolatedness. What is Zig's take on these issues?

@isaachier
Contributor

I strongly disagree. The simplest optimization for multithreading I know of is to create a thread pool. Work is put into a queue, then operated on. Now imagine task A is interested in queueing task B but assumes it shares the same variables. By making all variables thread local we actually are more likely to make an error if task B runs on another thread than we would have if the variables were truly global.

@andrewrk
Member Author

Note, given that some architectures/OSes do not support TLS, this makes #1764 especially important. With #1764 I feel comfortable accepting this issue, because we can make thread local variables be global variables when --single-threaded is selected. This protects the OSes/architectures that do not support TLS.

@daurnimator
Contributor

This protects the OSes/architectures that do not support TLS.

Isn't TLS always possible, but not always fast/efficient?
You can always degenerate into perthreadvariables[gettid()]

@andrewrk
Member Author

andrewrk commented Feb 1, 2019

You could do that if you were always in control of creation of threads. But if you are, for example, a library, and the thread is created externally and then calls your function, then you would have no perthreadvariables global. It has to be created when thread memory is allocated. The main point of TLS as a language feature is that it goes into object files and libraries, and the linker keeps track of the perthreadvariables.

@daurnimator
Contributor

a library, and the thread is created externally and then calls your function, then you would have no perthreadvariables global.

Why couldn't it be local to the library? (And in fact the space per thread only needs to consider the amount of thread local storage used by that library.)

@andrewrk
Member Author

andrewrk commented Feb 1, 2019

Can you elaborate in detail how it would work? Here are some example questions I have: Where is the per thread memory? If allocated statically, how do you know the total number of threads that will ever be created? How do you know that two calls to gettid() which return the same value, refer to the same thread, and not a recycled tid? If allocated dynamically, how do you deal with allocation failure, when a variable load and store cannot fail?

@daurnimator
Contributor

Where is the per thread memory?

Statically via loading/linking library

If allocated statically, how do you know the total number of threads that will ever be created?

You may not! perthreadvariables would need to be of length max_thread_id. This isn't even that bad if there is an MMU.

How do you know that two calls to gettid() which return the same value, refer to the same thread, and not a recycled tid?

Good question. This does seem to kill the concept when you have no thread-cleanup. I guess we can't count on posixy robust mutexes here?
When I saw this idea last applied it was at the kernel level where you could at least set up handlers for thread cleanup.
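The `perthreadvariables[gettid()]` idea and the recycled-tid problem can be sketched in C11 roughly as follows. This is an illustration under assumptions, not anyone's proposed implementation: `MAX_SLOTS`, `slot`, and `slot_for` are hypothetical names, and the thread id is passed in explicitly so the sketch does not depend on a platform-specific `gettid()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_SLOTS 64 /* hypothetical fixed capacity: the table is allocated statically */

typedef struct {
    atomic_long owner; /* thread id that claimed this slot; 0 means free */
    int value;         /* the emulated thread-local variable */
} slot;

static slot per_thread_variables[MAX_SLOTS];

/* Returns the variable for `tid` (must be nonzero), claiming a free slot on first use.
 * Returns NULL when every slot is taken, which a real variable access could not report. */
int *slot_for(long tid) {
    for (size_t i = 0; i < MAX_SLOTS; i++) {
        size_t idx = ((size_t)tid + i) % MAX_SLOTS; /* linear probing from tid's home slot */
        long expected = 0;
        if (atomic_load(&per_thread_variables[idx].owner) == tid ||
            atomic_compare_exchange_strong(&per_thread_variables[idx].owner, &expected, tid)) {
            return &per_thread_variables[idx].value;
        }
    }
    return NULL;
}
```

Because slots are never released, a new thread that happens to receive a recycled tid gets the previous thread's slot, stale value included, instead of a freshly initialized variable. The `NULL` return for a full table shows the other open question from the discussion: a plain load or store has no way to fail.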

@andrewrk
Member Author

andrewrk commented Feb 1, 2019

Now that we have #1764 done, this issue should be a breeze. The important thing to note here is that on some architecture/operating system combinations, thread local storage is not available. On these targets, --single-threaded is always on, and thus in Zig you can always use thread local storage, because it will become global variables in this case.
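The fallback described here can be imitated in C with a macro, as a loose sketch only. `SINGLE_THREADED` is a hypothetical build flag standing in for Zig's --single-threaded, and `THREADLOCAL`, `error_count`, and `record_error` are made-up names.

```c
#include <assert.h>
#include <limits.h>

/* SINGLE_THREADED is a hypothetical build flag, not a real compiler option. */
#ifdef SINGLE_THREADED
#define THREADLOCAL               /* no threads exist, so a plain global behaves the same */
#else
#define THREADLOCAL _Thread_local /* targets with TLS get a real per-thread variable */
#endif

static THREADLOCAL int error_count = 0;

/* Increments the counter of the calling thread and returns the new value. */
int record_error(void) {
    assert(error_count < INT_MAX);
    return ++error_count;
}
```

Code that uses `error_count` is identical in both configurations; only the storage class changes, which is why a target without TLS can still accept the keyword when threading is disabled.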

If someone knows of a target that supports threading and does not support thread local storage, I would love to know about that.

@bfloch

bfloch commented Feb 3, 2019

Some references I found:

Here is Ulrich Drepper's paper on the TLS implementation in ELF:
https://akkadia.org/drepper/tls.pdf

If someone knows of a target that supports threading and does not support thread local storage, I would love to know about that.

Motivation of D's TLS by default:
http://www.drdobbs.com/cpp/its-not-always-nice-to-share/217600495

Walter mentions that OSX does not have TLS, or more specifically it did not as of 2009 although C++11 was supposed to push this due to the standardised keyword.
Manual implementations back then (2010/2009):
http://www.drdobbs.com/architecture-and-design/implementing-thread-local-storage-on-os/228701185
https://lifecs.likai.org/2010/05/mac-os-x-thread-local-storage.html

If I am not mistaken, support for the __thread C keyword was added in OS X 10.7 (2011).

Clang in XCode 8 added support for the C++ keyword as seen on TV (2016):
https://developer.apple.com/videos/play/wwdc2016/405/?time=354

They also mention the differences/limitations of the C++ (all types, compatible but slower) vs. C keyword (basic types+POD only but faster).

The Mach-O TLS section was added around 2015 I believe (based on clang commits) so I assume that before one of the workarounds was used. I am by no means an expert here so correct me if I am wrong.

But, based on this example, it might be safe to assume that a platform with threads does not necessarily provide built-in TLS support, depending on its object format. Potentially libraries/compilers need to roll their own, which is what D did back in 2010.

@andrewrk
Member Author

andrewrk commented Feb 3, 2019

But, based on this example, it might be safe to assume that a platform with threads does not necessarily provide built-in TLS support, depending on its object format.

Thanks for doing this research. However, I'm not sure I agree with your conclusion.

The way I would go about this is starting with the LLVM documentation, which says:

Not all targets support thread-local variables.

Unfortunately it doesn't say more than this in the documentation, so it is necessary to dive into the source to find the actual list of targets and whether they support TLS.

Next, look at each target which does not support TLS one by one and try to come up with a program that uses threads. I suspect that for each of these, in Zig, we can make --single-threaded unconditionally enabled. Any use cases which are exceptions to this we should examine explicitly, and not in an abstract sense.

@andrewrk
Member Author

andrewrk commented Feb 6, 2019

I have this working for Linux x86_64 (need to polish it up a bit before committing). Next, the other supported targets. MacOS and FreeBSD are probably easy since they always link libc, and thus handle the thread local storage setup before calling main. The Windows one is a complete mystery; I have not looked up how that will work yet.

  • Linux x86_64
  • Windows x86_64
  • MacOS x86_64
  • FreeBSD x86_64

9 participants