Concatenative languages - Quark #8
I've briefly explored concatenative languages, yes. I find the concept very intriguing. I've briefly played with integrating some concatenative capabilities into Tao, although I believe that currying and function composition net a considerable number of the benefits that concatenative features bring. If you have ideas about this, I'd love to hear them!

As it happens, I'm currently working on a functional language backend for a new rework of Tao's compiler. I've read through the intro page on Quark and it sounds very interesting. Perhaps I'll play around with it tonight. I like the similarities it has with LISP.

I know that languages like this have been discussed a lot on the esolang Discord. I actually came up with a stack-driven esolang in a similar vein: https://esolangs.org/wiki/Hanoifuck. As an aside, you might be interested in my work in zesterer/atto#4.
Yes, currying (and function composition in general) is very powerful: it allows for quite clear semantics when reading the code and is a firm base for (automated) correctness checking.

This will be very subjective, but I fell in love with the concatenative concept (not so much with most concatenative languages I've tried so far 😢), for the sole reason that I find it truly "unconstrained" with regard to composition. After writing something bigger in a functional style (at best in a functional language), the more SLOC I wrote, the more tedious it felt to manage all the pure, surgically separated, utterly long "serialized branches" (code paths). Making a change requires mentally "synchronizing" these branches at deeply buried places, which in practice feels like jumping randomly around a homogeneous, infinite tape of symbols: it lacks enough intuitively and quickly recognizable points of interest (which leads to partial quasi-solutions like the minimap becoming widespread).

I've observed that after trying to write something in concatenative languages, I didn't want to go back to the verbose and tedious world of all other languages (kudos to Red/Spry/Rebol, and partially V, as the only exceptions in this regard, though they have their own difficulties...). For me, the functional style solves the "coding mess" by introducing an overly narrow playing field that always leads to correct but tedious and verbose code. Concatenative style with pattern matching (or a similar "cherry picking" mechanism), as in Quark, feels much more fluent (and, as a side effect, is less verbose).

So I'm not sure how Tao could be adjusted to target the verbosity and "surgical linearity" that come from being functional. But let's contemplate a bit. I myself have noticed there are three things that greatly affect the general usability of any language (did I already tell you I'm into languages because of my obsession with practicality and pragmatism?):
How does Tao approach these three dimensions? How does Tao support returning multiple values? At least an arbitrary order of them (or even some simple pattern matching) specified at the place of use (not the place of function definition, as that would serve only as the "default" order) would be useful in practice. Is Tao's typing concept easily extensible and arbitrarily fine-grained (including fully dynamic)?
Pretty cool! Stacks in general are a very powerful concept if used carefully. How did you get the idea?
Curious what you found out 😉. Any observations & thoughts?
Yes. I actually took a look at Atto before I started this thread and noticed a huge PR, zesterer/atto#4, which seems to be exactly what you linked above. I find Atto a nearly-concatenative language with a pretty typical structure and properties 😉. The first thing I noticed is that there are quite a few "helper symbols/keywords" which could be omitted to give it a more lightweight feel. The second thing is that it feels pretty strict, both in its prefix notation and in having no higher-level way to shuffle data around (yep, this "shuffling data around" implemented through pattern matching is what IMHO makes Quark so superior among concatenative languages, despite its pattern matching still being rather basic). Any plans for Atto?
Let me try to address these in turn:
Tao has static type-checking and Hindley-Milner type inference. In practical terms, this means that the type of any value must be known ahead of time and gets checked against other elements of the program. Type inference means that it's rarely necessary to manually annotate a value with a type hint because the compiler is able to figure it out based on context. For example:

```
# The type of `five` is inferred to be `Num`
let five = 5 in
# The type of `x` is inferred to be `Num` from the addition, and so `add_five` is inferred to be `Num -> Num`
let add_five = |x| x + five in
# The type of `add_ten` is inferred to be `Num -> Num` also, as a product of each call to `add_five` inside it
let add_ten = |x| x:add_five:add_five
```

(In the example above, `x:add_five` applies `add_five` to `x`.) As you can see, despite never needing to explicitly annotate types, the compiler is still able to infer the type of every expression based on the information it has available. Type-checking occurs at the same time as type inference and ensures that every Tao program is well-formed and that the types of all values are appropriately compatible with one another. In the near future I hope to expand the language with subtyping.
Tao functions represent strict mappings between two types, such as `Num -> Num`. Tao does have support for tuples and lists though, so multi-parameter, multi-return functions can be easily achieved. For example:

```
# `sum_and_product` is inferred to have type `Num -> Num -> (Num, Num)`
def sum_and_product = |x, y| (x + y, x * y)
```

I can't currently envisage a way in which this might be expanded to fully encompass the behaviours that concatenative languages permit, although I'm open to suggestions. Currently, it's necessary to deconstruct the results of multi-return functions using a destructuring `let` binding.
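For comparison, here is a minimal sketch of the same idea in Rust (the function and names are illustrative, not Tao's API): a tuple carries multiple results, and a pattern deconstructs them at the call site.

```rust
// Return both results at once as a tuple.
fn sum_and_product(x: i64, y: i64) -> (i64, i64) {
    (x + y, x * y)
}

fn main() {
    // The tuple is deconstructed with a pattern at the point of use.
    let (sum, product) = sum_and_product(3, 4);
    println!("sum = {sum}, product = {product}");
}
```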
Tao is designed to provide predictable, deterministic, safe semantics that are easy to reason about. For this reason it doesn't support the sort of low-level pointer-juggling operations you speak of (not least because the future optimising backend I'm working on for it would make reasoning about program structure fairly difficult). I'm not inherently against things like AST-level macros but I generally consider them a hack that makes it unnecessarily difficult to reason about program semantics. If Tao does support things like meta-programming, it will do so in a way that takes into account the type system and makes it trivial to understand the behaviour of a piece of code.
Extensible in the sense that Tao supports type parameters and custom data types (including recursive ones), with plans to expand this into a trait/typeclass system to permit interface-driven programming. I don't plan to add support for dynamic typing though (see this blog post for some conclusions that I also reach when using dynamic type systems). You can emulate 'dynamic' typing with lists/maps and enums though. Encoding something like arbitrary JSON data in Tao's type system would be trivial to do.
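To illustrate that last claim with a sketch in Rust rather than Tao (the `Json` type below is a hand-rolled example, not from any library): a single recursive enum is enough to statically encode arbitrary JSON-like data.

```rust
use std::collections::HashMap;

// One recursive enum covers every possible JSON document.
#[allow(dead_code)]
enum Json {
    Null,
    Bool(bool),
    Num(f64),
    Str(String),
    List(Vec<Json>),
    Object(HashMap<String, Json>),
}

fn main() {
    // Statically-typed encoding of {"answer": 42, "tags": ["a", "b"]}.
    let doc = Json::Object(HashMap::from([
        ("answer".to_string(), Json::Num(42.0)),
        (
            "tags".to_string(),
            Json::List(vec![Json::Str("a".into()), Json::Str("b".into())]),
        ),
    ]));
    let _ = doc;
}
```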
I enjoy stack-driven languages and I'm also the creator of an optimising Brainfuck compiler. It occurred to me that marrying both together should be perfectly possible.
I enjoyed the simplicity of the pattern-matching. It feels like a more rugged language than other concatenative languages like Forth, although unfortunately I'm not personally familiar enough with concatenative languages to properly appreciate the syntax.
Interestingly, Atto has almost no inherent keywords. Even basic operators like `+` are, as far as the language is concerned, just ordinary functions.
Atto is first and foremost an esoteric language and as such it was designed to be incredibly simple to implement (hence the self-hosting interpreter). As a result, I've not really made any attempt to make it more expressive than what is minimally required to reach that goal.
MOTIVATIONAL INTERMEZZO

What I mean by "practical" is only one thing: a programmer just wants to get things done, i.e. as fast as possible and with as little friction as possible. First and foremost, she needs to completely forget about formal correctness checking for a while and mimic exclusively the risk as it is in the real world (without taking any future into consideration, because that's the job of the programming language and technology: to make arbitrary future changes super easy, unlike what we were all told years ago at school about "thinking about the future"). The majority of risks in reality are accepted with no action, some are lowered (e.g. by paying insurance), and only a marginal amount are mitigated. E.g. a manager tells her: "JS APIs won't cause any big financial losses in our business if they don't work properly." If a reviewer (and later the manager) finds out that she wrote some boilerplate code on top of the needed minimum (which took her a considerable amount of time), and that she was the person who advocated for the technology (e.g. the programming language she used), she'll be considered a total greenhorn junior dev unable to cope with even the most trivial stuff from reality.

END OF RANT 😉

TL;DR: I'm looking for a language (maybe a concatenative one, maybe not) with very fast compilation times (seconds at most), a readable short syntax (imagine V, but not necessarily C-like), support for dependent-types-like checks instead of types (imagine Dependent Types and Certificates for parallel correctness, but with an unrestricted form of dependent types, less verbosity, and more modern higher-level concepts), with pattern matching syntax instead of variables (imagine the pattern matching from Quark, but more advanced, to accommodate full support for dependent types), and conforming to the three major "acceptance criteria" I outlined in my comment above.

Some "random" observations:
I'll react to the Tao-specific and Atto-specific points from your comment separately. But feel free not to wait for that and write what you think 😉.
Hmmm. I understand and empathise with a lot of your points, but I'm not sure I agree with several of them.
I don't really think this is true. Programmers often have a particular relationship with a programming language, and it's one in which they, the developer, are sentient and wish to progress forward. It is the job of the language to interpret their wishes in a formal and consistent manner while pointing out inconsistencies in their ideas. This is the almost universal story of the progression of type systems: becoming ever more capable of interpreting abstract concepts (user-defined data types, generics, typeclass constraints, higher-ranked types, etc.) in order to correctly ground human thought in sound, provable logic. They aren't there to push the user forwards: rather, they're there as a brake mechanism on the mistake-driven tendencies of the human mind.
This is a misnomer from the past. In the 1980s one might drop out of C and jump into assembly in order to squeeze a few extra cycles out of the hardware. That hasn't been the case for about 20 years, however. Modern compilers are far more capable of making sensible choices about optimisation and representation than even the most well-versed human assembly programmers, and are commonly capable of leaps of logic that are simply missed by programmers who are not working on a problem for many, many hours. As an example, take Rust's iterator API:

```rust
let answer: u64 = (0..1000u64)
    .filter(|x| x % 2 == 0)
    .map(|y| (0..y).map(move |x| x * y))
    .flatten()
    .sum();
```

This simple section of code represents some rather complicated logic, but represents it in a manner that is, in my view, as intuitive and simple as one could formally make it. You might argue that it has terrible performance characteristics: indeed, we're creating closures, passing variables between scopes, chaining method calls, and doing lots of bizarre and slow things. You'd be wrong. With optimisations enabled, the Rust compiler will flatten every aspect of this out into extremely efficient code, parallelised using SIMD, that is inevitably much more efficient than what would be written by even the most experienced assembly programmers.

You might then change your argument a little: perhaps this is simply a product of the codegen backend? Surely writing out the logic as something 'close to assembly, but not quite' would allow the compiler to do similar. This, too, would be wrong. Rust's iterator API statically allows the compiler to prove more invariants about the code by laying it out in a way that makes the data dependencies clearer, thereby enabling more optimisations. If you were to write equivalent code using some good old `for` loops, the compiler would have a much harder time proving those invariants.

There's another advantage here too: everything is type-checked. It's very difficult to write iterator code and get it wrong. It's much easier to get tangled loop logic wrong, however. This significantly reduces the mental overhead associated with writing code like this and frees up the programmer to concentrate on more useful elements: business logic and architecture.

All of this points towards a rather important thing that's more true today than it ever has been and is likely to become more true in the future: abstraction allows the compiler to generate faster code. If you can define your code without reference to side-effects and curious but irrelevant artifacts like exactly how one constructs loops, it frees up the compiler to make much better decisions about these details when turning your code into an executable. Languages that permit this sort of abstract, declarative style (i.e. functional languages) are starting to accelerate past traditional imperative languages in performance, and this trend is likely to continue as compilers get better at applying high-level optimisations to code.
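For contrast, here is a sketch of the same computation written with explicit loops; it produces the same answer, but the data dependencies are tangled into control flow that the compiler (and the reader) must untangle:

```rust
fn main() {
    let mut answer: u64 = 0;
    for y in 0..1000u64 {
        // The filter step becomes a guard clause.
        if y % 2 != 0 {
            continue;
        }
        // The nested map/flatten becomes a nested loop with mutation.
        for x in 0..y {
            answer += x * y;
        }
    }
    println!("{answer}");
}
```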
I don't inherently disagree with this. Being able to prove compile-time invariants is extremely useful. That said, dependently-typed languages have a long way to go before this can be made generally useful. In the meantime I think constraining data ranges Ada-style and taking a 'parse, don't validate' approach to program construction is much more likely to bear fruit in the space of maintaining program invariants. Rust, again, takes this to heart: its standard library encodes such invariants directly in types (the `NonZero` integer types being a simple example).
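A minimal sketch of the 'parse, don't validate' idea in Rust (the `Percentage` type here is an illustration, not a standard API): the check happens exactly once, at construction, and from then on the type itself is the proof of the invariant.

```rust
// A value that is guaranteed, by construction, to lie in 0..=100.
struct Percentage(u8);

impl Percentage {
    // The only way to obtain a `Percentage` is through this check.
    fn new(raw: u8) -> Option<Percentage> {
        if raw <= 100 {
            Some(Percentage(raw))
        } else {
            None
        }
    }
}

// Downstream code never re-validates: accepting `Percentage` instead
// of `u8` makes out-of-range inputs unrepresentable here.
fn render_bar(p: Percentage) -> String {
    "#".repeat(p.0 as usize / 10)
}

fn main() {
    if let Some(p) = Percentage::new(42) {
        println!("{}", render_bar(p));
    }
}
```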
This sounds a lot like you'd enjoy Zig. It does a lot of development checks that then get disabled in release builds. I don't personally sit well with this methodology, however: correctness is an important principle to me and I like languages that constrain their semantics, as much as possible, to sets of programs that are safe and correct.
I think, as mentioned earlier, that you might be getting the wrong end of the stick from the article you linked. It's not making the case for a language based upon portable assembly: instead, it's making the case for alternative CPU designs that take advantage of languages that don't fit so cleanly on top of C's programming model, and it argues that assuming 'all languages look like C' is a poor assumption.
I'm not convinced about this one. Code is generally read (and executed) far more times than it is written. It's important that a language constrains the side-effects of a piece of code in a manner that makes following its logic easy for an external observer. Writing code is expensive in the abstract, but getting it wrong or maintaining it over a long period of time is arguably much more expensive. If we want to build languages that can stand the test of time, they must have semantics that prioritise the long-term stability and legibility of a codebase, not the short-term off-the-cuff thoughts of a tired programmer. Being able to optimise for both at the same time is, obviously, the most desirable outcome though.
This is another way of saying 'validate, don't parse' and it turns out to be a very bad idea for the long-term health of a large codebase. It may suit the needs of a programmer cranking out some 300-line demonstration, but it simply does not scale well. Refinement types, as a point of interest, share similarities with the runtime assertions that you mention but deviate in one important (and incredibly useful) aspect that allows them to scale well across a codebase: once a refinement upon their representation is known, that refinement becomes associated with the value's type and hence no longer requires verifying later on in the program. This is crucially important. It means that if an earlier validation gets removed (or its scope widened) you get a type error at the point of use, since the refinement is no longer valid. This massively reduces the mental burden of using them.
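A small sketch of that last property, using a hypothetical `NonEmpty` wrapper in Rust (refinement types proper are more general, but the mechanics are similar): because the check produces a new type, deleting the check breaks compilation at the point of use rather than silently dropping an invariant.

```rust
struct NonEmpty<T>(Vec<T>);

impl<T> NonEmpty<T> {
    // 'Parse': turn an unrefined Vec into evidence of non-emptiness.
    fn parse(v: Vec<T>) -> Option<NonEmpty<T>> {
        if v.is_empty() {
            None
        } else {
            Some(NonEmpty(v))
        }
    }

    // Safe: emptiness was ruled out at construction.
    fn head(&self) -> &T {
        &self.0[0]
    }
}

fn main() {
    // If this `parse` step were removed, `nums` would be a plain
    // `Vec<i32>`, which has no `head` method, so the line below
    // would become a compile-time error at the point of use.
    let nums = NonEmpty::parse(vec![1, 2, 3]).expect("non-empty");
    println!("{}", nums.head());
}
```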
Very interesting observations! I'm glad you are raising them as I can then better convey my thoughts.
I agree. I meant before any technology (incl. language) gets even chosen for the task.
This is a huge misunderstanding and I'm sorry for not articulating what I mean clearly enough. I'm really just missing abstractions over purely HW-related stuff, not arbitrary abstractions in code. E.g. I don't know of any simple (HW-focused) abstraction over transactional memory, nor over all the CPU caches, even though the models the HW implements are really simple. But languages do not offer anything, to the best of my knowledge (and no, intrinsics, at least those I know of, don't seem to try to mimic the HW model; they're just syntactic sugar over pure ASM). Even the SIMD you mentioned is offered in too simplified a form, if at all (no, the compiler's automagical vectorization doesn't count here at all), and the HW SIMD model is only available in a horrible and totally unsafe form as assembly.

To prove I meant really just HW-related abstractions, feel free to read my more-than-a-year-old wrap-up of parallelism, including why I consider SIMD parallelization to be actually a solved issue (hint: exactly because there are nice abstractions, many inspired by functional programming as you wrote, e.g. the state-of-the-art libraries https://github.com/mratsim/weave and https://github.com/zero-functional/zero-functional and https://github.com/mratsim/Arraymancer for Nim, or https://github.com/rayon-rs/rayon for Rust). One could also say I'm looking for HW abstractions which are higher-level and much safer than assembly, but lower-level than e.g. what Linux/BSD/Windows APIs offer to user applications. Something like what exokernels or typical type-1 hypervisors with a paravirtualization API/ABI offer, but built into a language (or as part of a minimal standard library in case the language has free-form syntax).
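For concreteness, the kind of abstraction rayon offers looks like this (a standard usage sketch; it assumes the `rayon` crate as a dependency): one call swaps a sequential pipeline for a data-parallel one, without touching the rest of the code.

```rust
use rayon::prelude::*;

fn main() {
    let v: Vec<u64> = (0..1_000_000).collect();

    // `par_iter` splits the work across a thread pool; the rest of
    // the pipeline reads exactly as its sequential counterpart would.
    let sum_of_squares: u64 = v.par_iter().map(|&x| x * x).sum();

    println!("{sum_of_squares}");
}
```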
Here I can't entirely agree, because I have strong experience that the most severe bugs in terms of true losses and harm (not to be confused with frequent or common or expected bugs) are those inexpressible with Ada-style type constraint capabilities (though Ada is much closer to a truly dependently typed language than any other I know of). Of course, this doesn't imply that expressing only those invariants causing losses would automagically eliminate the need for expressing Ada-style invariants. Although in practice I'd bet that expressing the invariants causing losses would guide one to express the other invariants as well, thus covering both cases.
Actually, many languages do that, as it turns out to be very useful: the development environment is always different from the deployment environment, and thus e.g. the setup is different, optimization needs are different, development HW is different, mocking is different, inputs are different, etc. I'm not the biggest fan of Zig, especially due to its enormous unnecessary complexity and verbosity (it implements nearly every lower-level abstraction and idea from the whole era of IT computing), but that's subjective. I know my coworker likes it a lot and I'm supporting him wherever I can to allow widespread use of Zig, as I feel that's the language which should replace C (it won't, but anyway 😉).
That's interesting. Could you elaborate more why you "don't sit well with this methodology"?
Yep, that's how I'd envision the default behavior of a good language. On top of this, I require the language to offer means to change the default behavior (that's goal (3) from my previous comment above), ideally with some sort of controlled shouting from the compiler (controlled e.g. by shouting especially in the "development" mode as discussed above and less in "production" mode).
Sure, it's not about making a "portable assembly", but about using (and, I'd say, creating) languages which actually embrace what modern HW offers. Which, in the context of the article (i.e. the C ecosystem and its use cases), is another way of saying what I called "more bare" above.
I actually thought you might bring up this "read more times than written" observation (which on its own is totally correct and valid) to justify focusing solely on the resulting code in programming language design (which is false). There is a clear reason why this is not at all applicable to a practical programming workflow, as outlined in my previous post above. Unfortunately, language designers seem to completely dismiss this and fall into the "read more times than written" trap 😢.

It's because editing the code is the only inherently serial work which one person has to do to capture the algorithm. Everything else depends strictly on this "editing stage" being finished (i.e. the result is e.g. compilable without errors). Any reading (reviews, audits, learning, refreshing someone's mind, etc.) happens fully in parallel with other activities (like compilation, pre-production test runs, etc.), but still only after the "editing stage" has finished. Not to mention the fact that any reading takes an order of magnitude less time than editing. So any argument that a language shall prioritise the resulting code over the editing workflow is a plain fallacy, and IMHO one of the major reasons why the IT industry is known to have the highest risk among all existing industries.
Hmmmm, I actually think the approach I've described does exactly parse, don't validate (and not validate, don't parse). The semantics of combinators and pattern matching (even as implemented in Quark) is not to tap the tape of symbols (i.e. read some part of memory, leave the memory as is, act upon the read stuff, and then store the result somewhere else), but exactly the opposite: namely, to pop items out of memory first, then act upon them, and "return" the results in place of the recently popped items. Which is exactly the concept of "consume less structured, produce more structured" as described in the article. I called it ad-hoc looking because it feels so, but it's not. IMHO it even enforces this parse, don't validate principle, unlike Haskell or any other non-concatenative language I know. I'd go even further and say that validate, don't parse exists merely because lambda-calculus-based functional programming emerged without enforcing a linear type system everywhere (which seems to account for more than 99% of all functional languages) 😉.

Actually, Clean, as a functional language which does support enforcement of linear typing, seems to be surprisingly close to what concatenative languages offer in practice (though with more boilerplate). I'm mentioning this to support claims like "Joy is in fact closer to FFP than any of the languages mentioned so far" from Manfred von Thun's papers (see much more at http://www.kevinalbrecht.com/code/joy-mirror/index.html and e.g. https://github.com/Wodan58/joy1/tree/master/lib; it's very informative and somewhat groundbreaking, and it opened my eyes some time ago).
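Rust's move semantics, incidentally, give a rough feel for this "pop, act, put the result back" discipline (a sketch with illustrative types, not Quark or Joy code): each step consumes its input, so the less-structured value simply ceases to exist once the more-structured one is produced.

```rust
struct Raw(String);
struct Parsed(Vec<u64>);

// Consumes (pops) the raw value; only the parsed result remains.
fn parse(raw: Raw) -> Parsed {
    Parsed(
        raw.0
            .split(',')
            .filter_map(|s| s.trim().parse().ok())
            .collect(),
    )
}

fn main() {
    let raw = Raw("1, 2, 3".to_string());
    let parsed = parse(raw);
    // `raw` has been moved out of existence here: using it again would
    // be a compile error, much like re-reading a consumed stack item
    // under a linear type system.
    println!("{:?}", parsed.0);
}
```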
Hmmmm, I'm not sure I follow here. Let's see a simple example (assume a pure language with immutable values):

```
fn f( v: Array<string> | boolean | Event ){
    if( Array.isArray( v ) ){
        return v
    } else if( typeof v === "boolean" ){
        return v
    // v instanceof Event
    } else {
        return v
    }
}

x = true
y = f( x )
f( y )
```

So, based on what you wrote, shouldn't the call `f( y )` produce a type error? If it wouldn't produce any type error, then let's make the scope broader:

```
fn f( v: Array<string> | boolean | Event ){
    if( ( Array.isArray( v ) ) || ( typeof v === "boolean" ) ){
        return v
    // v instanceof Event
    } else {
        return v
    }
}

x = true
y = f( x )
f( y )
```

Then by the same logic we'd still not get any type error. So I still do not follow what you meant.

Thoughts? I actually didn't plan to lean on concatenative languages in this thread, but it somehow forces me to. That makes me wonder what else I'm putting too little emphasis on with regards to programming languages. Thanks again for your valuable inputs and discussion!
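For what it's worth, here is a rough Rust analogue of the example above, with the union modeled as an enum and `Event` stubbed out (both are illustrative). Because every branch of `f` returns the same enum type, the caller learns nothing new about the result, which is why `f( y )` type-checks without complaint in this style and why refinement or flow typing behaves differently:

```rust
struct Event; // Stub standing in for a DOM-style Event type.

enum Value {
    Strings(Vec<String>),
    Bool(bool),
    Event(Event),
}

// `f` inspects its argument, but every branch has the same return
// type, so no refinement survives the call.
fn f(v: Value) -> Value {
    match v {
        Value::Strings(s) => Value::Strings(s),
        Value::Bool(b) => Value::Bool(b),
        Value::Event(e) => Value::Event(e),
    }
}

fn main() {
    let x = Value::Bool(true);
    let y = f(x);
    let _ = f(y); // Compiles fine: `y` is just a `Value` again.
}
```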
@zesterer speaking of advanced types, you might be interested in the language Formality (see the broader scope in https://github.com/Soonad/Whitepaper).
Formality seems very nice. I believe it shares a lot of ideas with the language I'm writing. I'm not too keen on some aspects of the syntax, but that's simply a matter of personal preference.
I'm reading your personal web site, seeing all your endeavours with languages, and I like the attitude you expressed e.g. in "Why didn't you use an existing game engine?".

Have you also already explored concatenative languages in general (your atto seems also concatenative)? If not, I'd suggest taking a look at Quark (there is also an original Ruby implementation in a few tens of lines).

And sorry for being off topic - I didn't want to spam you privately 😉.