Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Redesign targets #2743

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

nurmukhametov
Copy link
Collaborator

@nurmukhametov nurmukhametov commented Jan 26, 2024

Design document

The ISPC compiler comes equipped with a standard library, named stdlib.ispc. By design, it has to be provided automatically for every compiled ISPC program unless the opposite said by --nostdlib command-line flag. This automatic inclusion ensures that essential functions—ranging from mathematical operations, print capabilities for output, memory management, to data conversion—are readily available to the developer.

The ISPC compiler also has an internal built-in library composed of three distinct sections:

  • Dispatcher: This component directs calls to the appropriate implemenation based on the actual ISA.
  • Target Independent: This includes functions that are universal across all target ISA, such as print.
  • Target Dependent: This part contains functions optimized for specific target ISA, like avx2 or avx512.

The standard and builtin libraries have to be distributed with ISPC compiler. It should automatically provide them taking into consideration target ISA and OS if needed. This design proposes to distribute them alongside the compiler as LLVM bitcode files. This approach requires the build system to prepare the libraries during the ISPC build phase as follows:

  • Dispatching and Target-Dependent Parts: These are written on templated LLVM IR and instantiated by m4 to produce target dependent code for every target ISA for any supported width (by current build configuration). Then llvm-as generates corresponding bitcode files.
  • Target-Independent Part: This is written in C/C++ and compiled by clang into bitcode format for every supported OS by current build configuration.
  • Standard Library: This is written in ISPC language and compiled into LLVM bitcode format by ISPC compiler, which must be pre-built.

ISPC compiles programs that can invoke functions from its standard library. A significant enhancement in this redesign is the introduction of the stdlib.isph header file, which outlines the definitions for the standard library's functionalities. This header can be included in ISPC programs either explicitly by the programmer or implicitly, except when the --nostdlib flag is provided. Furthermore, the stdlib.isph header introduces explicit definitions for essential symbols — such as programIndex, programCount, PI, _have.* and etc. —which were previously generated internally by the compiler using the LLVM API. This change also opens the door to implementing certain standard library functionalities directly within the header file using templates, e.g., short vector arithmetics.

All associated bitcode files from both the standard and built-in libraries, as well as the stdlib.isph header file, must be distributed with the ISPC compiler binary. There are two principal approaches:

  1. Slim Binary: Distribute them alongside the ISPC binary within a designated directory structure.
  2. Whole Binary: Embed them into the ISPC binary itself.

The first approach, unlike the second, introduces extra files and directories. These are essential for the ISPC compiler to accurately locate the bitcode files and the stdlib.isph header. Consequently, when distributing or relocating the ISPC binary, it's crucial to maintain the relative positions of these resources to ensure the compiler's functionality

The second approach is similar to the current method, where target-dependent bitcode files are embedded directly into the ISPC binary. This has the advantage of not affecting the current distribution practices, meaning ISPC users can continue downloading and installing ISPC with their existing workflows and scripts without any modifications.

It's worth noting that these approaches are not mutually exclusive. Some users, particularly those in the open-source community who maintain ISPC packages for different Linux OSes, may prefer the first approach.

The build process for ISPC should be adjusted to include the following steps:

  1. Generation of all built-in library bitcode files.
  2. Compilation of the slim ISPC binary.
  3. Generation of standard library bitcode files by the slim ISPC binary.
  4. Embedding all bitcode files and header files into the slim ISPC binary to produce the whole ISPC binary.

The last step is optional. The installation process should support both the slim and whole binary scenarios.

Compilation process

The compilation process of an ISPC program is structured into the following steps:

  1. Adding declarations for both target-dependent and target-independent built-in functions to the ISPC symbol table.
  2. Translating the ISPC source code into LLVM IR, which includes:
    a. Implicitly including the stdlib.isph header if not explicitly included when needed,
    b. Running the preprocessor on the ISPC source,
    c. Parsing the ISPC source to create an internal AST representation,
    d. Generating LLVM IR from the AST.
  3. Adding definitions of standard library functions.
  4. Adding definitions of target-independent builtin functions.
  5. Adding definitions of target-dependent builtin functions.
  6. Adding dispatcher definitions, if required.
  7. Executing the optimization pipeline.
  8. Utilizing the backend to generate the target code.

Steps 1 through 7 progressively populate the LLVM module with the necessary code. Step 2a ensures that symbols from the standard library utilized in the ISPC source file are resolved. The inclusion of symbols in Step 1 facilitates the resolution of built-in symbols within the standard library, while Steps 3 through 6 provide the actual function bodies for these symbols. In multitarget compilation scenarios, all built-in and standard library functions are appended with an ISA-specific suffix.

Prior to Step 8, it's essential to refine the LLVM module to exclude unused functions and symbols. This is accomplished by linking only the required symbols into the resulting module and performing global dead code elimination. An exception is made for a subset of functions, primarily related to memory access (e.g., gather/scatter operations), which must be retained throughout the optimization pipeline due to potential late-stage optimization uses (e.g., in ImproveMemoryOps.cpp). These are preserved through references in the llvm.compiler.used symbol and removed in the final optimization step by the RemovePersistentFuncs pass.

*Note about Step 1: Currently, this step involves uplifting LLVM IR declarations to ISPC symbol definitions, a process which can be seen as somewhat fragile and implicit. For example, the following declarations has no sense:

  • varying unsigned int32 __min_varying_int32(varying unsigned int32 , varying unsigned int32)
  • varying int32 __max_varying_uint32(varying int32 , varying int32).

Adopting an explicit declarative approach, possibly by defining all built-in functions in a builtin.isph header, would make Step 1 unnecessary. Moreover, this change could allow for the migration of various declarations (_pseudo*) and other target-independent functions from built-in LLVM files to the ISPC standard library code, enhancing clarity and simplification.

Hierarchical implementation of target-dependent builtin functions.

Ideally, built-in functions should have target independent implementation. It can be very useful for initial support of new targets, e.g. RISC-V or others. Implementing these target-independent built-in functions could be achieved through LLVM IR, avoiding target-specific constructs (like certain intrinsics), or possibly through ISPC itself, although its feasibility in the current state is uncertain. This common baseline has to be good enough as the starting point for implementing target specific and performance critical pieces in incremental manner. Taking the X86 ISA as an example, the development path of built-in functions could follow a progression from a common implementation to more specialized ones: common -> sse2 -> sse4 -> avx -> avx2 -> avx512.

The compilation process for ISPC programs, as outlined, necessitates adjustments to accommodate a hierarchical implementation of target built-in functions. This adjustment specifically impacts Step 5, which now requires additional effort. To integrate built-in code effectively, the process should initiate with the most specialized ISA compatible with the user's request. Should this ISA resolve all symbols, the process can advance to Step 6. If unresolved symbols remain, the process must ascend to a more general ISA, continuing to resolve symbols iteratively until all are addressed by a common implementation.
It's important to highlight that ISPC currently employs a similar strategy, utilizing manually written m4-based macro templates. However, this method necessitates that developers fully define all target-specific built-in libraries, a task that can be both time-consuming and prone to errors. This issue is likely pronounced for fragmented, feature-based ISAs like RISC-V.

@nurmukhametov nurmukhametov force-pushed the redesign-targets branch 4 times, most recently from 0f9d5b2 to fc536d4 Compare February 1, 2024 14:03
@dbabokin
Copy link
Collaborator

dbabokin commented Feb 3, 2024

I'll be looking at it over the weekend, but it wouldn't hurt to clean up the commit history - there are commits that just fix build or formatting.

But the bigger thing - would be good to have a design document for that. It's not just for making the review easier, but also to ensure that ideas behind the specific design are crisp and solid, and all trade offs and compromises are well documented.

@nurmukhametov nurmukhametov force-pushed the redesign-targets branch 4 times, most recently from 7aa0012 to 6a73918 Compare February 12, 2024 13:09
Copy link
Collaborator

@aneshlya aneshlya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general the PR looks good! Before merging I'd like to see:

  1. Design document
  2. all TODOs implemented.
  3. lit test checking the order and expected content after each Link* function - LinkDispatcher, LinkCommonBuiltins, LinkTargetBuiltins
  4. I believe that lSetAsInternal should disappear

std::string ISPCExecutableAbsPath = llvm::sys::fs::getMainExecutable(argv[0], (void *)(intptr_t)main);
llvm::SmallString<128> includeDir(ISPCExecutableAbsPath);
llvm::sys::path::remove_filename(includeDir);
llvm::sys::path::remove_filename(includeDir);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicated line?

llvm::SmallString<128> filePath(g->shareDirPath);
llvm::sys::path::append(filePath, m_filename);
llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>> bufferOrErr = llvm::MemoryBuffer::getFile(filePath.str());
// TODO! proper exit when load module fails
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This TODO should be resolved before merging.

@@ -211,6 +211,7 @@ static void lPrintVersion() {
printf(" [--[no-]discard-value-names]\tDo not discard/Discard value names when generating LLVM IR\n");
printf(" [--dump-file[=<path>]]\t\tDump module IR to file(s) in "
"current directory, or to <path> if specified\n");
printf(" [--gen-stdlib]\t\tEnable special compilation mode to generate LLVM IR for stdlib.ispc.");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it supposed to be used when generating builtins from stdlib.ispc only? If yes, I would add it to description. Ideally I would check that stdlib.ispc is passed as input file. Now it's easy to think that it's some helping debug mode to explore/dump stdlib before inserting it to the user's module and can be used like ispc --gen-stdlib -o test.ll --emit-llvm-text test.ispc

# Some builtins failed to build from the first time in parallel.
# It only happens with msbuild.
- cmd: |-
msbuild %ISPC_HOME%\build\builtins.vcxproj /p:Platform=x64 /p:Configuration=%configuration%
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line not needed if you build if you build with /m:1 below?

@@ -64,7 +64,12 @@ for:
mkdir build && cd build
cmake .. -Thost=x64 -G %generator% -DCMAKE_BUILD_TYPE=%configuration% -DCMAKE_INSTALL_PREFIX=%ISPC_HOME%\install -DISPC_PREPARE_PACKAGE=ON -DM4_EXECUTABLE=C:\cygwin64\bin\m4.exe -DISPC_CROSS=ON -DISPC_GNUWIN32_PATH=%CROSS_TOOLS%\gnuwin32 -DWASM_ENABLED=%WASM% -DISPC_INCLUDE_BENCHMARKS=ON
build_script:
- cmd: msbuild %ISPC_HOME%\build\PACKAGE.vcxproj /p:Platform=x64 /p:Configuration=%configuration%
# Some builtins failed to build from the first time in parallel.
# It only happens with msbuild.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it happen in Github Actions?

Comment on lines +277 to +283
string(REGEX MATCH "^(xe|gen9)" isXe "${ispc_target}")
if (isXe)
set(target_arch "xe64")
if ("${bit}" STREQUAL "32")
continue()
endif()
endif()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it's the same as line 269-274

@dbabokin
Copy link
Collaborator

Would you mind rebasing this branch?

Copy link
Collaborator

@dbabokin dbabokin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple observations while I'm still looking for general picture.

)

set(BITCODE_FOLDER ${CMAKE_RUNTIME_OUTPUT_DIRECTORY}/${CMAKE_CFG_INTDIR}/../share/ispc)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked up CMAKE_CFG_INTDIR and it's Deprecated since version 3.21

stdlib.isph Outdated

void memory_barrier();

#define DEFINE_ATOMIC_OP(TA, TB, OPA, OPB, MASKTYPE, TC) \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it's not covered by clang-format.

@nurmukhametov
Copy link
Collaborator Author

But the bigger thing - would be good to have a design document for that. It's not just for making the review easier, but also to ensure that ideas behind the specific design are crisp and solid, and all trade offs and compromises are well documented.

See the design document added in the PR description.

@aneshlya
Copy link
Collaborator

I was thinking more about introducing of bitcode files to ISPC distribution and I see the following drawbacks:

  • Multiple versions mess up: It increases potential issues when people have several ISPC version installed on the system.
  • Security Risk: Exposed bitcode files might be more easily tampered with, posing a security risk.
  • Clutter: Can make the directory structure appear cluttered, and users might accidentally modify or delete necessary files.

But all of these can be more or less solved by wrapping them into static library.

But then I decided to look how people download and extract ISPC.
I reviewed the top 10 GitHub search results for "https://github.com/ispc/ispc/releases/download" and observed that users, including the OpenMoonray project, sometimes just copy bin/ispc directory in their installation scripts.

So changing of ISPC distribution model and adding a bunch of LLVM bitcode files in a share folder will break their scripts.

I suggest to list all pros/cons of having LLVM bitcode files separately vs embedded into the ISPC binary and see if the pros overcomes all the issues listed above.

@nurmukhametov
Copy link
Collaborator Author

I was thinking more about introducing of bitcode files to ISPC distribution and I see the following drawbacks:

  • Multiple versions mess up: It increases potential issues when people have several ISPC version installed on the system.
  • Security Risk: Exposed bitcode files might be more easily tampered with, posing a security risk.
  • Clutter: Can make the directory structure appear cluttered, and users might accidentally modify or delete necessary files.

My opinion is that the only actual drawback is security risk.

But then I decided to look how people download and extract ISPC. I reviewed the top 10 GitHub search results for "https://github.com/ispc/ispc/releases/download" and observed that users, including the OpenMoonray project, sometimes just copy bin/ispc directory in their installation scripts.

These examples of installing ISPC just look wrong.

Notwithstanding the above, I have addressed that in the design document by the following change:

All associated bitcode files from both the standard and builtin libraries, as well as the stdlib.isph header file, must be distributed with the ISPC compiler binary within a designated directory structure. This directory structure is essential for the ISPC compiler to accurately locate these resources. Consequently, when distributing or relocating the ISPC binary, the bitcode files, and the stdlib.isph header file have to maintain their relative positions intact to ensure the compiler functionality.

to

All associated bitcode files from both the standard and built-in libraries, as well as the stdlib.isph header file, must be distributed with the ISPC compiler binary. There are two principal approaches:

  1. Slim Binary: Distribute them alongside the ISPC binary within a designated directory structure.
  2. Whole Binary: Embed them into the ISPC binary itself.

The first approach, unlike the second, introduces extra files and directories. These are essential for the ISPC compiler to accurately locate the bitcode files and the stdlib.isph header. Consequently, when distributing or relocating the ISPC binary, it's crucial to maintain the relative positions of these resources to ensure the compiler's functionality

The second approach is similar to the current method, where target-dependent bitcode files are embedded directly into the ISPC binary. This has the advantage of not affecting the current distribution practices, meaning ISPC users can continue downloading and installing ISPC with their existing workflows and scripts without any modifications.

It's worth noting that these approaches are not mutually exclusive. Some users, particularly those in the open-source community who maintain ISPC packages for different Linux OSes, may prefer the first approach.

The build process for ISPC should be adjusted to include the following steps:

  1. Generation of all built-in library bitcode files.
  2. Compilation of the slim ISPC binary.
  3. Generation of standard library bitcode files by the slim ISPC binary.
  4. Embedding all bitcode files and header files into the slim ISPC binary to produce the whole ISPC binary.

The last step is optional. The installation process should support both the slim and whole binary scenarios.

The PR message contains the updated design document.

@nurmukhametov nurmukhametov mentioned this pull request Apr 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants