# Why xDSL?

Most programming languages are far removed from assembly code. As a result, translating a language directly into assembly is both difficult (it is easy to make mistakes) and inefficient (the generated assembly is slow). Most compilers therefore go through several intermediate representations (IRs), each capturing a different level of abstraction. This keeps the translation down to assembly efficient, and keeps each step feasible for compiler engineers to write.

Although this approach is effective, it has become apparent over the years that many compilers repeatedly define similar intermediate languages. Because each compiler has its own separate architecture, it has been quite difficult to reuse code between them.

Previous projects such as LLVM have provided a common abstraction layer for different compilers to reuse. However, this is only one layer, and compilers for different domains still have to repeat significant amounts of work to reach the LLVM level.

LLVM is also not necessarily the ideal layer on which to perform optimisations, so many compilers implement their own optimisations on custom IRs before they reach LLVM.

MLIR, and its Python counterpart xDSL, are frameworks that make it easy to:
1. Define custom abstraction layers for compilers
2. Reuse abstractions from other compilers
3. Encode domain-specific knowledge (e.g. optimisations) within these abstraction layers

<!-- Covered here: -->
<!-- - How compilers use IRs -->
<!-- - Why a framework for IRs is needed and is useful (ease of specifying optimisations) -->

# General Concepts

The IRs that xDSL builds are SSA-based. This restriction on the IR makes the code easier to analyse.

### Static Single Assignment (SSA) 
The languages that xDSL defines all have the _static single assignment_ (SSA) property: every variable is assigned a value exactly once, and cannot be modified afterwards. This restriction is common within compilers, as it makes many compiler optimisations easier to perform.
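
To make the property concrete, here is a minimal sketch that renames variables in straight-line code so that every name is assigned exactly once. The `(target, operands)` statement format and the `to_ssa` helper are assumptions made for illustration, not part of xDSL. The sketch only handles straight-line code; programs with branches and loops need extra machinery to merge values.

```python
def to_ssa(statements):
    """Rename variables so that each name is assigned exactly once.

    `statements` is a list of (target, operand_names) pairs, e.g.
    ("x", ["x", "b"]) stands for `x = f(x, b)`. This toy format is
    hypothetical and only models straight-line code.
    """
    version = {}  # variable name -> number of its latest version
    renamed = []
    for target, operands in statements:
        # A use refers to the latest version of the variable. Names that were
        # never assigned are treated as program inputs and left unchanged.
        new_operands = [
            f"{name}_{version[name]}" if name in version else name
            for name in operands
        ]
        # Every assignment creates a fresh version of the target.
        version[target] = version.get(target, -1) + 1
        renamed.append((f"{target}_{version[target]}", new_operands))
    return renamed


# x = f(a); x = g(x, b); y = h(x)
program = [("x", ["a"]), ("x", ["x", "b"]), ("y", ["x"])]
ssa = to_ssa(program)
# ssa == [("x_0", ["a"]), ("x_1", ["x_0", "b"]), ("y_0", ["x_1"])]
```

After renaming, each use points at exactly one definition, which is what makes analyses over SSA code simple to write.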

Further reading: https://en.wikipedia.org/wiki/Static_single-assignment_form

# xDSL/MLIR
xDSL and MLIR are distinct frameworks for specifying compiler IRs, but they are built upon similar basic concepts.

## What is an IR, anyway?

xDSL is a framework for modelling/specifying IRs. Before we go into the details of xDSL, it is perhaps useful to take a step back and consider what exactly an IR is!

Code can be represented in a human-readable textual format, but strings are difficult for compilers to handle. An IR is an *equivalent*, structured representation of the code that is convenient for machines to traverse.

Importantly, it is solely a representation of the code - and is not executable!
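
Python's standard-library `ast` module gives a feel for the difference between text and structure. An abstract syntax tree is not a compiler IR in the xDSL sense, so treat this purely as an analogy: the same code, first as a string and then as a tree of nodes that a program can walk.

```python
import ast

source = "1 + 2 * 3"

# Text -> structured representation of the same code.
tree = ast.parse(source, mode="eval")
root = tree.body  # the node for the outermost construct, here the `+`

# The structure is explicit: `2 * 3` is a child of the addition,
# so no operator-precedence reasoning on strings is needed.
print(ast.dump(tree))

# Walking the tree visits every node, which is awkward to do on raw text.
kinds = [type(node).__name__ for node in ast.walk(tree)]
```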

Wikipedia has a nice article as a starting point for further reading: https://en.wikipedia.org/wiki/Intermediate_representation

## Programming language syntax
All programming languages have a set of "language constructs": the _vocabulary_ available within the programming language.

These constructs can be composed together to form "expressions": sentences that _could_ be evaluated to a value.

Just like in English, though, vocabulary (words) cannot be composed together arbitrarily - they should follow the syntactical rules of the language. For programming languages this is known as the _syntax_ of the language, and is often specified in terms of a [grammar](https://en.wikipedia.org/wiki/Formal_grammar).

Here are some examples of building blocks within various languages:
* Arithmetic: a language that describes arithmetic might have constructs including:
    * `+, -, x, /, round, max`
    * The values of this language are real numbers (floats, say). 
    * One syntax rule in this language is that each construct should take in either values, or other valid expressions.
* Python: includes too many constructs to list here, but some examples include:
    * `for, if, while, assert, def, class, import`
    * Since Python is object-oriented, values in Python are objects.
    * A valid expression in Python would follow the [Python grammar](https://docs.python.org/3/reference/grammar.html), a rather large document specifying all syntax rules of Python. 
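
The arithmetic language above is small enough to sketch in full. The block below is an illustrative model, not anything xDSL provides: the `Expr` class, the `ARITY` table (including the choice that `max` takes two arguments) and `is_valid` are all assumptions. It encodes the one syntax rule mentioned above: every construct takes either values or other valid expressions.

```python
from dataclasses import dataclass

# Constructs of the toy arithmetic language and how many arguments each takes.
# The arities are an assumption made for this sketch.
ARITY = {"+": 2, "-": 2, "x": 2, "/": 2, "round": 1, "max": 2}


@dataclass(frozen=True)
class Expr:
    construct: str
    args: tuple  # each argument is a float value or another Expr


def is_valid(term) -> bool:
    """A term is valid if it is a value, or a known construct applied to
    the right number of valid terms."""
    if isinstance(term, float):
        return True
    if isinstance(term, Expr):
        return ARITY.get(term.construct) == len(term.args) and all(
            is_valid(arg) for arg in term.args
        )
    return False


good = Expr("max", (Expr("+", (1.0, 2.0)), Expr("round", (2.6,))))
bad = Expr("+", (1.0, "two"))  # "two" is neither a value nor an expression
```

A real grammar, such as Python's, plays the same role as `is_valid` here, only with far more rules.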

## Programming language semantics
<!-- In English, a sentence being grammatically correct is not enough for it to convey _meaning_. With the wrong combination of words, it would still be a meaningless sentence! -->

<!-- Similarly for programming languages, not all syntactically correct sentences are valid. For instance, the following is a valid C++ expression: -->
<!--     `50 + *nullptr` -->
<!-- yet when executed, it will not always yield the same value - that is, the expression does not have a well-defined value associated with it. And so, it is difficult to associate a sensible value to the expression. -->

<!-- Whilst the exact meaning of phrases is a hazy concept for English sentences, it is possible to define exactly what an expression in a programming language calculates. The semantics of a programming language is a map from programs written within it, to the values it evaluates to. -->

<!-- However, in industrial languages the semantics is often not specified concretely, rather programmers and compilers have an intuitive understanding of what values an expression calculates -->

## xDSL: A tool for designing languages for compilers


xDSL provides the tools for describing IRs. To do so, it needs to be general enough to fit the wide range of potential designs and uses of language constructs.

xDSL and MLIR achieve this by introducing two concepts, `operations` and `attributes`:
1. `operations`, which are an abstract form of "language constructs" - think reserved keywords and built-in operations within a programming language.
    * Examples of things that _can_ be operations: if/else/for/while statements, arithmetic operations (+, -, *, /), memory reference operators (e.g. &, * in C++)
2. `attributes`, which are generic constructs able to store extra information that is available at _compile time_.
    * Examples of things that _can_ be attributes: types (float32, tensor<10x5, float>, function), data present in the code that affects the semantics of operations (e.g. the stride to use in a convolution operation), stores for the results of [dataflow analysis](https://clang.llvm.org/docs/DataFlowAnalysisIntro.html) (liveness, range analysis)
    
As you can see, operations and attributes are both very general constructs, able to describe a vast range of objects. We will see how exactly this is achieved in the following sections.
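
To give a rough feel for how two such general concepts can cover so much ground, here is a simplified model written with plain dataclasses. The `Attribute`, `SSAValue` and `Operation` classes below are hypothetical stand-ins and are not xDSL's actual classes. The operation names are borrowed from MLIR's `arith` dialect, and the `value` attribute key is chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Attribute:
    """Compile-time information, e.g. a type or a constant parameter."""
    name: str
    parameters: tuple = ()


@dataclass(eq=False)
class SSAValue:
    """A value that is defined exactly once; its type is an attribute."""
    type: Attribute


@dataclass(eq=False)
class Operation:
    """A generic language construct: a name, the values it uses, the values
    it defines, and a dictionary of compile-time attributes."""
    name: str
    operands: list = field(default_factory=list)
    result_types: list = field(default_factory=list)
    attributes: dict = field(default_factory=dict)
    results: list = field(init=False, default_factory=list)

    def __post_init__(self):
        # Each result is a fresh SSA value defined by this operation.
        self.results = [SSAValue(t) for t in self.result_types]


f32 = Attribute("f32")  # a type is just an attribute

two = Operation("arith.constant", result_types=[f32],
                attributes={"value": Attribute("float", (2.0,))})
three = Operation("arith.constant", result_types=[f32],
                  attributes={"value": Attribute("float", (3.0,))})
# The addition uses the values defined by the two constants.
add = Operation("arith.addf",
                operands=[two.results[0], three.results[0]],
                result_types=[f32])
```

Nothing in `Operation` is specific to arithmetic: a loop, a function call or a convolution would use the same four ingredients with a different name and different attributes.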

This will also be followed by more concrete examples in action.

### Operations: Modelling general language constructs

Operations describe the general concept of a "language construct", or, the _vocabulary_ available within the programming language. A language has many of these constructs, which can be composed together to form _expressions_ - things which can be evaluated to some value.

Some examples of expressions:

In Python
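
For instance, `max(1 + 2, 3) * 4` is a Python expression composed of three constructs: an addition, a call to `max`, and a multiplication. The sketch below uses the standard-library `ast` module to flatten such an expression into a list of operations, each producing one named result. The `flatten` helper and the `%0`-style result names are assumptions made for illustration; this is not how xDSL itself ingests Python.

```python
import ast
import itertools


def flatten(source: str):
    """Turn one Python expression into a list of
    (result_name, construct, operands) triples."""
    ops = []
    fresh = (f"%{i}" for i in itertools.count())  # generator of unused names

    def visit(node):
        if isinstance(node, ast.Constant):
            return repr(node.value)  # constants are used directly
        if isinstance(node, ast.BinOp):
            construct = type(node.op).__name__  # e.g. "Add", "Mult"
            operands = [visit(node.left), visit(node.right)]
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            construct = node.func.id
            operands = [visit(arg) for arg in node.args]
        else:
            raise NotImplementedError(type(node).__name__)
        result = next(fresh)
        ops.append((result, construct, operands))
        return result

    visit(ast.parse(source, mode="eval").body)
    return ops


ops = flatten("max(1 + 2, 3) * 4")
# ops == [("%0", "Add", ["1", "2"]),
#         ("%1", "max", ["%0", "3"]),
#         ("%2", "Mult", ["%1", "4"])]
```

Each triple plays the role of one operation: it names a construct, lists the values it consumes, and defines exactly one new value.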



### Blocks

### Regions

### Attributes

### Dialects
Languages in xDSL are known as `dialects`: a dialect is a grouping of related operations and attributes. For instance:
- The `scf` (structured control flow) dialect groups together operations such as: for/while loops, reduce, yield, if/else statements, etc.
- The `arith` (arithmetic) dialect groups together arithmetic operations, such as: comparison, int/floating point addition/multiplication, or/xor/and logical bitwise operations, and so on.
- The `async` (asynchronous) dialect groups together operations commonly seen in async applications, as well as associated attributes (types).
    - Operations include: asynchronous calls, waits, async functions, etc.
    - Attributes include: coroutine reference types, future types, etc.

Dialects can be _combined together_, making it possible to define a union of multiple dialects that has the features of all of them.
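
As a rough mental model, a dialect can be pictured as a named bundle of operation and attribute names, and combining dialects as taking the union of those bundles. The `Dialect` class and `combine` helper below are hypothetical and much simpler than what xDSL or MLIR actually do. The operation names follow MLIR's `scf` and `arith` dialects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dialect:
    """A named grouping of related operations and attributes."""
    name: str
    operations: frozenset = frozenset()
    attributes: frozenset = frozenset()


def combine(name, *dialects):
    """Build a dialect containing everything from the given dialects."""
    return Dialect(
        name,
        frozenset().union(*(d.operations for d in dialects)),
        frozenset().union(*(d.attributes for d in dialects)),
    )


scf = Dialect("scf", frozenset({"scf.for", "scf.while", "scf.if", "scf.yield"}))
arith = Dialect("arith", frozenset({"arith.addi", "arith.addf", "arith.cmpi"}))

# A program written against `both` may mix loops with arithmetic.
both = combine("scf+arith", scf, arith)
```

Prefixing each operation with its dialect name, as in `scf.for`, keeps the union free of name clashes.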

# DRAFT SECTION

# Constructs inside your toolkit 

- concepts
    - operations
    - dialects
    - regions
    - attributes (compile time data and types)

# MLIR/xDSL examples
Here's where the previous concepts come together

- An SQL-like representation
- A subset of Python, just for demonstration purposes
- Toy dialect
- Existing dialects, e.g. scf and arith

Would also be nice to have some examples of dialects being used together, e.g. the standard arith + if statements stuff?

# Transforming the IR
- E.g. implementation of optimisations on the SQL-like representation
- Implementation of a lowering, say from Toy? Or say, from Python to C++