Coding Style and Conventions
The Zeek codebase is quite old as code goes, with the original code being written in 1995. Some of the internal container types date from around 1987. This means that a large portion of it was written before a lot of modern C++ existed. As such, a lot of the early design decisions were made in that context. This guide strives to suggest modern techniques where possible, even when the existing code doesn't follow it to the letter. All new code should follow this guide. Old code should be updated to follow this guide when it is modified.
The base formatting for all Zeek C++ code, including C++ code inside BIF files, follows the Whitesmiths coding style.
Zeek supports formatting the C++ code via clang-format. It requires at least clang-format 12.0.1 due to some additions that were made in that version to better support the Whitesmiths style. Zeek also includes a pre-commit hook that will check formatting for any commit to the master branch. This hook can also be run manually by installing the pre-commit Python package (using pip), and then running the pre-commit install command. Once the hook is installed, C++ files will be formatted automatically on commit. One can also explicitly trigger running of clang-format for all staged files with pre-commit run clang-format, or on all files in the repository with pre-commit run -a clang-format. See pre-commit's documentation for a more in-depth introduction.
Use tabs for indentation and spaces for alignment. An example for alignment is below.
class MyClass {
> void FunctionWithALongName(int argument1,
> ...........................int argument2)
> > {
> > if ( argument1 == 1 ||
> > .....argument2 == 2 )
> > > {
> > > DoSomething();
> > > DoSomethingMore();
> > > }
> > }
};
Tabs (represented by >) are used to match indentation levels, and then spaces (represented by . in the cases where they're used for alignment) are used to match a particular column of the line above it.
Spaces should exist inside the outer parentheses for all control statements (conditional expressions), but not function calls. A space should also follow the keyword. For example:
if ( condition )
	{
	}
SomeFunction(arg1);
A space should exist after any not-operator. For example:
if ( ! condition )
	{
	}
Header files should always use the .h extension. Implementation files should always use the .cc extension.
All headers should start with a #pragma once line to guard against duplicate includes. Avoid using #ifndef/#define include guards.
For multi-line blocks, braces should start on the line after the construct. Single-line blocks can remove the braces. For single-line function definitions the braces can be on the same line as the function definition, with a tab between the close of the function definition and the opening brace.
if ( true )
	{
	DoSomething();
	DoSomethingElse();
	}
bool Foo()	{ return false; }
Note that for single-line functions there should be a tab between the closing ) and the opening {.
- Type names (classes, enums, structs, etc.) should always be CamelCase.
- Class/struct methods should be CamelCase, though some exceptions are made. For example, for Zeek classes that are similar enough to another class provided by the standard library, snake_case may be used so they feel more familiar. Non-member functions tend to use snake_case.
- Variable names, including member variables, should always be snake_case.
- Prefer using more descriptive variable names, except for counter variables.
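As a rough illustration of the naming rules above, the following sketch shows each case side by side. Every name in it (ConnectionTracker, AddPacket, packet_count, format_summary, and so on) is hypothetical and not taken from the Zeek code base.

```cpp
#include <cstddef>
#include <string>

// Type names use CamelCase. (Hypothetical class, for illustration only.)
class ConnectionTracker
	{
public:
	// Methods use CamelCase; parameters are snake_case.
	void AddPacket(std::size_t payload_len)
		{
		++packet_count;
		total_bytes += payload_len;
		}

	std::size_t PacketCount() const	{ return packet_count; }
	std::size_t TotalBytes() const	{ return total_bytes; }

private:
	// Member variables use descriptive snake_case names.
	std::size_t packet_count = 0;
	std::size_t total_bytes = 0;
	};

// Non-member functions tend to use snake_case.
inline std::string format_summary(const ConnectionTracker& tracker)
	{
	return std::to_string(tracker.PacketCount()) + " packets, " +
	       std::to_string(tracker.TotalBytes()) + " bytes";
	}
```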
Include files in both headers and implementation files should be ordered as follows:
- In source (.cc) files, the corresponding header for the source file
- C includes such as <unistd.h>
- C++ includes such as <string> and <vector>
- Includes from third-party or external sources, such as broker or src/3rdparty
- Local include headers from Zeek
- Generated files from the build, such as bif or pac headers
Further conventions include:
- Prefer to use the C++ version of headers rather than the C Standard version (when writing C++, of course). E.g. use <cstdio> over <stdio.h>.
- Use angle brackets around the file name for anything not coming directly from the Zeek code base, e.g. <string>. This includes any system headers, any external libraries, and anything that can refer to a file outside the code distribution, even if it typically refers to a file within the Zeek source tree because it's embedded for convenience (e.g. Broker/CAF).
- Use quotes around the file name for anything that always comes from the Zeek code base, and prefix it with the full path to the file starting with zeek. E.g. "zeek/Val.h".
- Use forward declarations instead of including whenever possible.
Functions inside of header files should include doxygen-style comments, including documentation for all parameters and return values. Implementations of those functions in .cc files do not need to repeat the comment. Example:
/**
 * Recursively searches all (direct or indirect) children of the
 * analyzer for an analyzer with a specific ID.
 *
 * @param id The analyzer id to search. This is the ID that GetID()
 * returns.
 *
 * @return The analyzer, or null if not found.
 */
virtual Analyzer* FindChild(ID id);
Non-obvious algorithms should include comments about what the code is doing to aid in later maintenance. Avoid writing comments for code where it is obvious what that code is doing.
Pointer and reference characters should associate with a type name rather than the variable identifier. For example, use int* var and not int *var.
- Use the ordering public -> protected -> private in class definitions for members.
- If the class includes friend methods, list those at the start of the class prior to the public block.
- Within each visibility block, use the following ordering for members:
  - Static member functions
  - Non-static member functions
  - Static member variables
  - Non-static member variables
- Attempt to order member variables to avoid the compiler adding padding between them and bloating the size of the objects.
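The class layout rules above might look as follows in practice. This is only a sketch: Session, same_id, LooseLayout and TightLayout are hypothetical names, and the exact amount of padding depends on the platform ABI (on a typical 64-bit platform the two structs come out at 24 and 16 bytes).

```cpp
#include <cstdint>

// Hypothetical class showing the suggested member ordering.
class Session
	{
	// Friend declarations come first, before the public block.
	friend bool same_id(const Session& a, const Session& b);

public:
	// Static member functions, then non-static member functions.
	static int Created()	{ return created; }

	explicit Session(int arg_id) : id(arg_id) { ++created; }
	int Id() const	{ return id; }

private:
	// Static member variables, then non-static member variables.
	static inline int created = 0;
	int id = 0;
	};

inline bool same_id(const Session& a, const Session& b)
	{
	return a.id == b.id;
	}

// The 8-byte member sits between two 1-byte members, so the compiler
// typically inserts padding both before and after it.
struct LooseLayout
	{
	bool is_active = false;
	std::uint64_t byte_count = 0;
	bool is_closed = false;
	};

// Same members, largest first: the two flags can share the tail space.
struct TightLayout
	{
	std::uint64_t byte_count = 0;
	bool is_active = false;
	bool is_closed = false;
	};

static_assert(sizeof(TightLayout) <= sizeof(LooseLayout),
              "reordering is not expected to grow the struct");
```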
Zeek is in the process of migrating to two primary namespaces: zeek and zeek::detail. The zeek namespace is intended for interfaces that are used by external plugins and other code that expects a stable API. Any changes to this code will strictly follow the Zeek team's deprecation process. On the other hand, code in the zeek::detail namespace will not necessarily maintain a stable API. It is still available for use by external code, but the APIs it contains should not be relied upon to stay the same from release to release.
All of Zeek's headers will eventually live inside these two namespaces. As we are moving towards that world, we will follow our standard deprecation process for everything moving into zeek, meaning that we will initially keep aliases available inside the current global namespace. We will generally not keep aliases for anything moving into zeek::detail, although we may make exceptions in cases of substantial impact on external code. During the transition, everything still living only in the global namespace continues to follow the current semantics: No stability guarantees, but a "best effort" approach of not needlessly breaking APIs between releases.
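A minimal sketch of how the two namespaces and a transitional alias could fit together is shown below. Greeter and greeting_text are hypothetical names invented for this example, and the deprecation message is illustrative only.

```cpp
#include <string>

namespace zeek::detail
	{

// Hypothetical internal helper: usable by external code, but with no
// promise that it stays the same between releases.
inline std::string greeting_text()
	{
	return "hello";
	}

	} // namespace zeek::detail

namespace zeek
	{

// Hypothetical public type: it lives in zeek, so changes to it would go
// through the deprecation process.
class Greeter
	{
public:
	std::string Greet() const	{ return detail::greeting_text(); }
	};

	} // namespace zeek

// Transitional alias kept in the global namespace while the old name is
// being deprecated. Using it produces a compiler warning.
using Greeter [[deprecated("Use zeek::Greeter instead.")]] = zeek::Greeter;
```

New code would refer to zeek::Greeter directly; the global alias exists only so that older external code keeps compiling during the transition.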
Zeek may use C++ features up to and including those supported by the C++17 standard.
Avoid using exceptions for error handling. The primary reason to avoid them is that they make error handling more difficult to reason about. Due to the nature of the reference counting in the Zeek code, exceptions will often cause the counting to be invalid unless handled very carefully.
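One common alternative to throwing, sketched here under the assumption that a missing value is enough to signal failure, is to report errors through the return type. The parse_port helper is hypothetical; std::optional and std::from_chars are standard C++17 facilities.

```cpp
#include <charconv>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: failure is part of the return type, so the caller
// deals with it at the call site and no stack unwinding is involved.
inline std::optional<std::uint16_t> parse_port(std::string_view text)
	{
	std::uint16_t port = 0;
	const char* first = text.data();
	const char* last = first + text.size();
	auto [ptr, ec] = std::from_chars(first, last, port);

	// Reject conversion errors (including out-of-range values) and any
	// trailing characters that were not consumed.
	if ( ec != std::errc() || ptr != last )
		return std::nullopt;

	return port;
	}
```

A caller would then write something like if ( auto port = parse_port(text) ) and handle the empty case explicitly.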
Use C++-style casting (static_cast, dynamic_cast, reinterpret_cast, const_cast) instead of bare C-style casts.
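The sketch below shows two of the named casts in use. Shape, Circle, low_byte and as_circle are hypothetical names chosen for the example.

```cpp
#include <cstdint>

// Hypothetical polymorphic types, for illustration only.
struct Shape
	{
	virtual ~Shape() = default;
	};

struct Circle : Shape
	{
	double radius = 1.0;
	};

// static_cast: an explicit, compile-time-checked value conversion.
inline std::uint8_t low_byte(std::uint32_t value)
	{
	return static_cast<std::uint8_t>(value & 0xFF);
	}

// dynamic_cast: a run-time-checked downcast that yields nullptr when the
// object is not actually a Circle.
inline const Circle* as_circle(const Shape* shape)
	{
	return dynamic_cast<const Circle*>(shape);
	}
```

Unlike a C-style cast, each named cast states which kind of conversion is intended, and the compiler rejects it if that conversion does not apply.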
One artifact of the long life of the Zeek code is that a large number of the strings created internally are plain char* values. For new code, prefer using std::string or std::string_view instead.
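As an illustration, a function that only reads its string arguments can take std::string_view, and a function that produces a string can return std::string. The helpers has_prefix and join_path are hypothetical and not part of Zeek.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: std::string_view parameters accept string
// literals, std::string and char* without copying the characters.
inline bool has_prefix(std::string_view text, std::string_view prefix)
	{
	return text.substr(0, prefix.size()) == prefix;
	}

// Returning std::string hands the caller an owned value, with no manual
// allocation or freeing of a char* buffer.
inline std::string join_path(std::string_view dir, std::string_view file)
	{
	std::string result(dir);
	result += '/';
	result += file;
	return result;
	}
```

Keep in mind that a std::string_view does not own its data, so it should not outlive the string it refers to.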
Single-argument constructors should be marked explicit to aid in type-checking.
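The effect of explicit can be seen in a small sketch. Timeout and doubled are hypothetical names.

```cpp
#include <type_traits>

// Hypothetical wrapper type, for illustration only.
class Timeout
	{
public:
	// explicit stops a bare double from silently converting to a Timeout.
	explicit Timeout(double arg_seconds) : seconds(arg_seconds) { }

	double Seconds() const	{ return seconds; }

private:
	double seconds = 0.0;
	};

inline double doubled(Timeout timeout)
	{
	return timeout.Seconds() * 2;
	}

// The implicit conversion is rejected at compile time, so a call such as
// doubled(1.5) does not compile; doubled(Timeout(1.5)) does.
static_assert(! std::is_convertible_v<double, Timeout>,
              "Timeout must be constructed explicitly");
```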
Source files (*.cc) may set up any namespace imports/aliases they find convenient at any scope, including file scope. For example, they may choose to do using namespace std.
Header files (*.h) should avoid, at file scope, anything that alters namespaces or the name lookup process since it's usually not desirable for the inclusion of a header to have those side effects. E.g. don't do things like using namespace std in a header file. However, it's acceptable to do this inside function scopes should the implementation be defined in the header file.
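The function-scope exception might look like this inside a header. The to_seconds function is a hypothetical example; the using-directive is confined to its body, so files that include the header see no change in name lookup.

```cpp
#include <chrono>

// As it might appear in a header: the using-directive only affects name
// lookup inside this function body.
inline double to_seconds(std::chrono::milliseconds elapsed)
	{
	using namespace std::chrono;
	return duration_cast<duration<double>>(elapsed).count();
	}
```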
Another artifact of the old Zeek code is that a large number of variables, functions, and constants are defined in the global namespace and then extern'd when needed in other places. Avoid adding any more to the global namespace when possible. Prefer using constructs like the Singleton pattern or static class members instead.
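A minimal sketch of the Singleton approach, assuming a single shared counter is all that is needed, is shown below. PacketStats is a hypothetical class, and placing it in zeek::detail is an assumption made for the example.

```cpp
#include <cstddef>

namespace zeek::detail
	{

// Hypothetical example: instead of a global counter that other files pull
// in with extern, the state lives behind a function-local static.
class PacketStats
	{
public:
	static PacketStats& Instance()
		{
		static PacketStats instance;
		return instance;
		}

	void Record(std::size_t len)	{ total_bytes += len; }
	std::size_t TotalBytes() const	{ return total_bytes; }

private:
	// Private constructor: the only instance is the one in Instance().
	PacketStats() = default;

	std::size_t total_bytes = 0;
	};

	} // namespace zeek::detail
```

Other files then call zeek::detail::PacketStats::Instance() rather than declaring an extern variable.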
Follow the typical C++ best practices for parameter passing. Avoid passing large objects by value, except in cases where the function can use move semantics and the caller can use std::move. For objects that will not be modified by the function, pass by const-reference. For objects that may be modified by the function, prefer making the argument a pointer instead of a reference.
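The three cases described above can be sketched as follows. All names here (total_length, append_suffix, Registry) are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Read-only access to a potentially large object: pass by const-reference.
inline std::size_t total_length(const std::vector<std::string>& names)
	{
	std::size_t total = 0;

	for ( const auto& name : names )
		total += name.size();

	return total;
	}

// The function modifies its argument: take a pointer, so that the call
// site (&names) makes the mutation visible.
inline void append_suffix(std::vector<std::string>* names, const std::string& suffix)
	{
	for ( auto& name : *names )
		name += suffix;
	}

// The function keeps the object: take it by value and move it into place,
// so that callers can hand it over with std::move.
class Registry
	{
public:
	void SetNames(std::vector<std::string> arg_names)	{ names = std::move(arg_names); }
	const std::vector<std::string>& Names() const	{ return names; }

private:
	std::vector<std::string> names;
	};
```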
In new code, prefer using default initialization to set the values of member variables when they are defined in the header. Override the values in constructors only when necessary. For older code, use constructor initialization for consistency.
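For instance, a new class might state its defaults next to the member declarations and let constructors override only what differs. Reporter and its members are hypothetical names.

```cpp
#include <string>

// Hypothetical class, for illustration only.
class Reporter
	{
public:
	Reporter() = default;

	// Only the member that differs from its default is set here.
	explicit Reporter(int arg_max_errors) : max_errors(arg_max_errors) { }

	int MaxErrors() const	{ return max_errors; }
	bool Verbose() const	{ return verbose; }
	const std::string& Prefix() const	{ return prefix; }

private:
	// Defaults are stated once, where the members are declared.
	int max_errors = 10;
	bool verbose = false;
	std::string prefix = "zeek";
	};
```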
Member initialization lists should be attached to the class name, with a space on either side of the :. For example:
Type : var(value), var2(value2)
Members can move down to a second line if space is necessary.
The C++ Core Guidelines recommend using statements over typedef. There are limited cases where typedef is allowed, such as in code that might be included by C files, but otherwise using should be preferred.
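A short sketch of alias declarations follows. NameList, Callback, NameMap and invoke_callback are hypothetical names.

```cpp
#include <map>
#include <string>
#include <vector>

// Alias declarations read left to right: new name, then the type.
using NameList = std::vector<std::string>;
using Callback = int (*)(int);

// Unlike typedef, using can also introduce an alias template.
template <typename T>
using NameMap = std::map<std::string, T>;

inline int invoke_callback(Callback callback, int value)
	{
	return callback(value);
	}
```

The equivalent typedef for Callback would be typedef int (*Callback)(int);, which buries the new name in the middle of the declaration.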