self-hosted compiler #89

Open
andrewrk opened this Issue Jan 26, 2016 · 10 comments

@andrewrk
Member

andrewrk commented Jan 26, 2016

Here's the plan: See #89 (comment)

Every time zig releases a new major version, the compiler source code is updated to the version of zig just released.

No major versions have been released yet, so the compiler is written in C++. When 1.0.0 is released, the self-hosting process will begin. zig 1.0.0 will be able to build the 2.0.0 release, 2.0.0 will be able to build the 3.0.0 release, and so on.

Hopefully this will be friendly toward package managers and make the bootstrapping process simple to understand and execute.

@andrewrk andrewrk added this to the 1.0.0 milestone May 5, 2017

@andrewrk

Member

andrewrk commented May 5, 2017

When we release zig 1.0.0, we'll actually have the self-hosted compiler built and ready to go, passing all tests. Both projects will be maintained until 2.0.0.

@andrewrk

Member

andrewrk commented Sep 8, 2017

There are some compelling reasons to self-host:

  • Makes f128 work without needing GNU extensions
  • We can write the doc generator in userland
  • Makes builtin overflow stuff work without C compiler extensions
  • Ability to easily cross-compile the compiler

We still want the bootstrapping process to be simple though. So here's another proposal. We get a self-hosted compiler going right now. It's the official zig compiler. However, the C++ implementation must be able to build the official zig compiler. As long as that remains true, bootstrapping is a one-step process.

@andrewrk

Member

andrewrk commented Apr 17, 2018

Things I Want to Improve in the Self-Hosted Compiler

Performance and Caching

  • Max out performance of machine with thread pool and async I/O
  • Pipeline all the work. Split the job up into individual functions that each produce .o files. We'll have LLVM spitting out .o files before the last source file has been tokenized.
  • No mutexes. When a coroutine needs a resource that another thread is working on, it yields to another job, getting resumed (in userspace!) when the resource is available (this looks like async/await)
  • Multi-layer caching. Cache files, cache AST, cache individual functions
  • Establish a file system watch on source files, detecting changes, running through the pipeline (taking caching into account), and atomically updating output files in place. The compiler is a long-lived process and some of the caching happens in memory.
  • Handle temporary out-of-memory situations by emitting an event that says "waiting for more memory to be available" and prints how much was needed along with how much the system has available
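
To make the multi-layer caching bullet concrete, here is a minimal sketch in Python (not Zig, and not the actual compiler design). It assumes content hashes as cache keys and a toy "name: body" source format. `LayeredCache`, `digest`, and the counters are hypothetical names invented for this illustration.

```python
import hashlib


def digest(text: str) -> str:
    """Content hash used as the cache key at every layer (an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class LayeredCache:
    """Hypothetical two-level cache: file contents -> AST, function -> object."""

    def __init__(self):
        self.ast_by_file_hash = {}  # hash of file contents -> parsed "AST"
        self.obj_by_fn_hash = {}    # hash of one function -> "object code"
        self.parses = 0             # counters make cache hits visible
        self.codegens = 0

    def parse(self, source: str) -> dict:
        key = digest(source)
        if key not in self.ast_by_file_hash:
            self.parses += 1
            # Stand-in parser: every non-empty "name: body" line is a function.
            ast = {}
            for line in source.splitlines():
                if line.strip():
                    name, body = line.split(":", 1)
                    ast[name.strip()] = body.strip()
            self.ast_by_file_hash[key] = ast
        return self.ast_by_file_hash[key]

    def codegen(self, name: str, body: str) -> str:
        key = digest(name + "\0" + body)
        if key not in self.obj_by_fn_hash:
            self.codegens += 1
            self.obj_by_fn_hash[key] = f"obj({name})"  # stand-in for a .o file
        return self.obj_by_fn_hash[key]

    def build(self, source: str) -> list:
        ast = self.parse(source)
        return [self.codegen(name, body) for name, body in ast.items()]


cache = LayeredCache()
cache.build("foo: return 1\nbar: return 2\n")  # 1 parse, 2 codegens
cache.build("foo: return 1\nbar: return 2\n")  # unchanged file: no new work
cache.build("foo: return 1\nbar: return 3\n")  # reparse, but only bar is regenerated
```

Editing one function invalidates the file-level entry but leaves the per-function entry for `foo` intact, which is the point of caching at more than one layer.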

Representation of Types and Values

ConstExprValue right now has a lot of footguns built into it,
and it wastes memory. The new data layout should accomplish
these things:

  • Use a minimal amount of memory
  • Have at the very least runtime safety for wrong union field access
    and hopefully more compile errors when adding and removing fields.
  • In Stage 1, the Type tells how to interpret the Value. In self-hosted,
    we should divorce these concepts. This should make comptime casting
    more correctly represented.
  • Introduce lazy values. For example, if you do @sizeOf(error), this
    can create a Value that is backed by a LazyComptimeExpr. We could still
    find out @typeOf(@sizeOf(error)) without causing the lazy expr to
    evaluate. Once we get to the end of the compilation, we start evaluating
    all the lazy expressions. If a lazy expression depends on another lazy
    expression, it gets skipped, and we make a note to start over once done.
    If all lazy expressions must be skipped, then it's a compile error, and
    we show the dependency loop.
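
The lazy-value resolution described in the last bullet could look roughly like the following. This is a Python sketch under stated assumptions, not the compiler's data layout: `LazyValue`, `resolve_all`, `find_loop`, and `DependencyLoop` are hypothetical names, and every dependency is assumed to name another lazy value in the same list.

```python
class DependencyLoop(Exception):
    """Raised when a whole pass over the pending lazy values makes no progress."""


class LazyValue:
    def __init__(self, name, deps, type_name, compute):
        self.name = name
        self.deps = deps            # names of the lazy values this one reads
        self.type_name = type_name  # known up front, without evaluating
        self.compute = compute      # function(resolved: dict) -> value


def find_loop(stuck, resolved):
    """Follow unresolved dependencies until a name repeats; return that loop."""
    by_name = {lazy.name: lazy for lazy in stuck}
    path = []
    current = stuck[0].name
    while current not in path:
        path.append(current)
        current = next(d for d in by_name[current].deps if d not in resolved)
    return path[path.index(current):] + [current]


def resolve_all(lazies):
    """Evaluate lazy values in passes, skipping any whose deps are unresolved."""
    resolved = {}
    pending = list(lazies)
    while pending:
        skipped = []
        for lazy in pending:
            if all(dep in resolved for dep in lazy.deps):
                resolved[lazy.name] = lazy.compute(resolved)
            else:
                skipped.append(lazy)  # note it, and start over once done
        if len(skipped) == len(pending):
            # Everything had to be skipped: report the loop as an error.
            raise DependencyLoop(" -> ".join(find_loop(skipped, resolved)))
        pending = skipped
    return resolved


a = LazyValue("a", [], "int", lambda r: 8)
b = LazyValue("b", ["a"], "int", lambda r: r["a"] * 2)
print(b.type_name)          # the type is available before anything is evaluated
print(resolve_all([b, a]))  # b is skipped in the first pass, resolved in the second
```

Two values that depend on each other make no progress in any pass, so `resolve_all` raises `DependencyLoop` with the loop spelled out in the message.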

@phase

phase commented Apr 18, 2018

> We'll have LLVM spitting out .o files before the last source file has been tokenized.

How will this work with name resolution? You can't compile a file that depends on a file that hasn't been parsed yet.

@andrewrk

Member

andrewrk commented Apr 18, 2018

Here's an example:

  • we have 2 cores and therefore thread pool size 2
  • thread 1 loads, tokenizes, and parses main.zig, which calls foo(), bar(), baz()
  • thread 1 scans top level decls and creates jobs to analyze foo(), bar(), baz()
  • thread 1 analyzes foo()
  • thread 2 analyzes bar()
  • thread 1 generates llvm code for foo()
  • thread 2 generates llvm code for bar()
  • thread 1 emits foo.o
  • thread 2 emits bar.o
  • thread 1 analyzes baz(). baz() calls @import("quux.zig")
  • thread 1 loads, tokenizes, and parses quux.zig
  • etc
  • main thread calls LLD linker on all the .o files

You can see from this example we would get better parallelism if we prioritized analysis of functions since that creates jobs for the pipeline - it would make thread 2 have something to do while thread 1 analyzes baz(). But this should illustrate the idea.
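
The example above can be sketched as a job queue in which running one job may enqueue more. This is an illustration in Python, not the real scheduler. `SOURCES` and `compile_project` are hypothetical, analysis, codegen, and emit are collapsed into one job per function, and a plain lock is used for brevity even though the design earlier in the thread aims to avoid mutexes.

```python
import queue
import threading

# Hypothetical project: file -> {function name: files its analysis imports}
SOURCES = {
    "main.zig": {"foo": [], "bar": [], "baz": ["quux.zig"]},
    "quux.zig": {"quux": []},
}


def compile_project(root, sources, num_threads=2):
    jobs = queue.Queue()
    lock = threading.Lock()
    seen_files = set()
    objects = []

    def schedule_parse(path):
        with lock:
            if path in seen_files:  # each file is parsed at most once
                return
            seen_files.add(path)
        jobs.put((parse, (path,)))

    def parse(path):
        # Load, tokenize, parse; then create one job per top-level function.
        for fn, imports in sources[path].items():
            jobs.put((analyze_and_emit, (fn, imports)))

    def analyze_and_emit(fn, imports):
        # Analysis is where @import is discovered, so it feeds the queue.
        for path in imports:
            schedule_parse(path)
        # Stand-in for codegen plus writing the object file.
        with lock:
            objects.append(fn + ".o")

    def worker():
        while True:
            job = jobs.get()
            try:
                if job is None:  # sentinel: shut this worker down
                    return
                func, args = job
                func(*args)
            finally:
                jobs.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    schedule_parse(root)
    jobs.join()  # also waits for jobs that were created by other jobs
    for _ in threads:
        jobs.put(None)
    for t in threads:
        t.join()
    # The main thread "links" only after every object has been emitted.
    return sorted(objects)


print(compile_project("main.zig", SOURCES))
```

Because a job enqueues its follow-up work before it is marked done, `jobs.join()` cannot return while any file is still waiting to be parsed. That is what lets object files for `foo` and `bar` be emitted before `quux.zig` has even been loaded.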

@ghost

ghost commented Aug 21, 2018

0.3.0 ... seemed so close 😃

@andrewrk

Member

andrewrk commented Aug 21, 2018

Yeah. I couldn't make the deadline. 0.3.0 is two weeks away and I think those two weeks can best be spent on:

  • Stack traces for Windows and macOS
  • Documentation
  • Bug fixes

@ghost

ghost commented Aug 21, 2018

building master always anyway 👍

@winduptoy

winduptoy commented Nov 12, 2018

Something that I would like to see with a self-hosted compiler is the ability to import the compiler as a library within my application to compile and link new code while running. For example, https://github.com/anael-seghezzi/CToy embeds tcc and provides a creative coding environment that does not require a restart when modifying code.

Is such a thing possible already when importing std.build?

@andrewrk

Member

andrewrk commented Nov 12, 2018

> Something that I would like to see with a self-hosted compiler is the ability to import the compiler as a library within my application to compile and link new code while running.

Unfortunately that's not ever going to be possible, because of the LLVM and clang dependency. The Zig compiler ships with the LLVM and clang libraries built in, and Zig supports cross-compiling for many targets. To import the compiler as a library would require that LLVM and clang were available in source form (written in Zig) so that they could be cross-compiled for the target. To give up LLVM/clang and write our own code generator in Zig would be giving up state-of-the-art optimizations and a very active community of people working on it.

However, the parts of the compiler that do not depend on LLVM and clang are available in the standard library, for example the parser and formatter: std.zig.parse and std.zig.render.

As for a coding environment that does not require a restart when modifying code, see #68.
