Skip to content

Commit

Permalink
doc: update the section Performance tuning (#19530)
Browse files Browse the repository at this point in the history
  • Loading branch information
ArtemkaKun committed Oct 9, 2023
1 parent 04eb86c commit 04c8d45
Showing 1 changed file with 197 additions and 38 deletions.
235 changes: 197 additions & 38 deletions doc/docs.md
Expand Up @@ -6042,63 +6042,222 @@ To improve safety and maintainability, operator overloading is limited.
## Performance tuning
The generated C code is usually fast enough, when you compile your code
with `-prod`. There are some situations though, where you may want to give
additional hints to the compiler, so that it can further optimize some
blocks of code.
When compiled with `-prod`, V's generated C code usually performs well. However, in specialized
scenarios, additional compiler flags and attributes can further optimize the executable for
performance, memory usage, or size.
> **Note**
> These are *rarely* needed, and should not be used, unless you
> [!NOTE]
> These are *rarely* needed, and should not be used unless you
> *profile your code*, and then see that there are significant benefits for them.
> To cite gcc's documentation: "programmers are notoriously bad at predicting
> To cite GCC's documentation: "Programmers are notoriously bad at predicting
> how their programs actually perform".
`[inline]` - you can tag functions with `[inline]`, so the C compiler will
try to inline them, which in some cases, may be beneficial for performance,
but may impact the size of your executable.
| Tuning Operation | Benefits | Drawbacks |
|--------------------------|---------------------------------|---------------------------------------------------|
| `[inline]` | Performance | Increased executable size |
| `[direct_array_access]` | Performance | Safety risks |
| `[packed]` | Memory usage | Potential performance loss |
| `[minify]` | Performance, Memory usage | May break binary serialization/reflection |
| `_likely_/_unlikely_` | Performance | Risk of negative performance impact |
| `-skip-unused` | Performance, Compile time, Size | Potential instability |
| `-fast-math` | Performance | Risk of incorrect mathematical operations results |
| `-d no_segfault_handler` | Compile time, Size | Loss of segfault trace |
| `-cflags -march=native` | Performance | Risk of reduced CPU compatibility |
| `PGO` | Performance, Size | Usage complexity |
### Tuning operations details
#### `[inline]`
You can tag functions with `[inline]`, so the C compiler will try to inline them, which in some
cases, may be beneficial for performance, but may impact the size of your executable.
**When to Use**
`[direct_array_access]` - in functions tagged with `[direct_array_access]`
the compiler will translate array operations directly into C array operations -
omitting bounds checking. This may save a lot of time in a function that iterates
over an array but at the cost of making the function unsafe - unless
the boundaries will be checked by the user.
- Functions that are called frequently in performance-critical loops.
`if _likely_(bool expression) {` this hints the C compiler, that the passed
boolean expression is very likely to be true, so it can generate assembly
code, with less chance of branch misprediction. In the JS backend,
that does nothing.
**When to Avoid**
`if _unlikely_(bool expression) {` similar to `_likely_(x)`, but it hints that
the boolean expression is highly improbable. In the JS backend, that does nothing.
- Large functions, as it might cause code bloat and actually decrease performance.
- Large functions in `if` expressions - may have negative impact on instructions cache.
<a id='Reflection via codegen'>
#### `[direct_array_access]`
### Memory usage optimization
In functions tagged with `[direct_array_access]` the compiler will translate array operations
directly into C array operations - omitting bounds checking. This may save a lot of time in a
function that iterates over an array but at the cost of making the function unsafe - unless the
boundaries will be checked by the user.
V offers these attributes related to memory usage
that can be applied to a structure type: `[packed]` and `[minify]`.
These attributes affect memory layout of a structure, potentially leading to reduced
cache/memory usage and improved performance.
**When to Use**
- In tight loops that access array elements, where bounds have been manually verified or you are
sure that the access index will be valid.
**When to Avoid**
- Everywhere else.
#### `[packed]`
The `[packed]` attribute can be added to a structure to create an unaligned memory layout,
which decreases the overall memory footprint of the structure.
which decreases the overall memory footprint of the structure. Using the `[packed]` attribute
may negatively impact performance or even be prohibited on certain CPU architectures.
> **Note**
> Using the [packed] attribute may negatively impact performance
> or even be prohibited on certain CPU architectures.
> Only use this attribute if minimizing memory usage is crucial for your program
> and you're willing to sacrifice performance.
**When to Use**
- When memory usage is more critical than performance, e.g., in embedded systems.
**When to Avoid**
- On CPU architectures that do not support unaligned memory access or when high-speed memory access
is needed.
#### `[minify]`
The `[minify]` attribute can be added to a struct, allowing the compiler to reorder the fields
in a way that minimizes internal gaps while maintaining alignment.
The `[minify]` attribute can be added to a struct, allowing the compiler to reorder the fields in
a way that minimizes internal gaps while maintaining alignment. Using the `[minify]` attribute may
cause issues with binary serialization or reflection. Be mindful of these potential side effects
when using this attribute.
> **Note**
> Using the `[minify]` attribute may cause issues with binary serialization or reflection.
> Be mindful of these potential side effects when using this attribute.
**When to Use**
- When you want to minimize memory usage and you're not using binary serialization or reflection.
**When to Avoid**
- When using binary serialization or reflection, as it may cause unexpected behavior.
#### `_likely_/_unlikely_`
`if _likely_(bool expression) {` - hints to the C compiler, that the passed boolean expression is
very likely to be true, so it can generate assembly code, with less chance of branch misprediction.
In the JS backend, that does nothing.
`if _unlikely_(bool expression) {` is similar to `_likely_(x)`, but it hints that the boolean
expression is highly improbable. In the JS backend, that does nothing.
**When to Use**
- In conditional statements where one branch is clearly more frequently executed than the other.
**When to Avoid**
- When the prediction can be wrong, as it might cause a performance penalty due to branch
misprediction.
#### `-skip-unused`
This flag tells the V compiler to omit code that is not needed in the final executable to run your
program correctly. This will remove unneeded `const` arrays allocations and unused functions
from the code in the generated executable.
This flag will be on by default in the future when its implementation will be stabilized and all
severe bugs will be found.
**When to Use**
- For production builds where you want to reduce the executable size and improve runtime
performance.
**When to Avoid**
- Where it doesn't work for you.
#### `-fast-math`
This flag enables optimizations that disregard strict compliance with the IEEE standard for
floating-point arithmetic. While this could lead to faster code, it may produce incorrect or
less accurate mathematical results.
The full specter of math operations that `-fast-math` affects can be found
[here](https://clang.llvm.org/docs/UsersManual.html#cmdoption-ffast-math).
**When to Use**
- In applications where performance is more critical than precision, like certain graphics
rendering tasks.
**When to Avoid**
- In applications requiring strict mathematical accuracy, such as scientific simulations or
financial calculations.
#### `-d no_segfault_handler`
Using this flag omits the segfault handler, reducing the executable size and potentially improving
compile time. However, in the case of a segmentation fault, the output will not contain stack trace
information, making debugging more challenging.
**When to Use**
- In small, well-tested utilities where a stack trace is not essential for debugging.
**When to Avoid**
- In large-scale, complex applications where robust debugging is required.
#### `-cflags -march=native`
This flag directs the C compiler to generate instructions optimized for the host CPU. This can
improve performance but will produce an executable incompatible with other/older CPUs.
**When to Use**
- When the software is intended to run only on the build machine or in a controlled environment
with identical hardware.
**When to Avoid**
- When distributing the software to users with potentially older CPUs.
#### PGO (Profile-Guided Optimization)
PGO allows the compiler to optimize code based on its behavior during sample runs. This can improve
performance and reduce the size of the output executable, but it adds complexity to the build
process.
**When to Use**
- For performance-critical applications where the added build complexity is justifiable.
**When to Avoid**
- For small, short-lived, or rapidly-changing projects where the added build complexity isn't
justified.
**PGO with Clang**
This is an example bash script you can use to optimize your CLI V program without user interactions.
In most cases, you will need to change this script to make it suitable for your particular program.
```bash
#!/bin/bash
# Get the full path to the current directory
CUR_DIR=$(pwd)
# Remove existing PGO data
rm -f *.profraw
rm -f default.profdata
# Initial build with PGO instrumentation
v -cc clang -skip-unused -prod -cflags -fprofile-generate -o pgo_gen .
# Run the instrumented executable 10 times
for i in {1..10}; do
./pgo_gen
done
# Merge the collected data
llvm-profdata merge -o default.profdata *.profraw
# Compile the optimized version using the PGO data
v -cc clang -skip-unused -prod -cflags "-fprofile-use=${CUR_DIR}/default.profdata" -o optimized_program .
# Remove PGO data and instrumented executable
rm *.profraw
rm pgo_gen
```
## Atomics
Expand Down

0 comments on commit 04c8d45

Please sign in to comment.