
Modules POC

This folder contains a proof-of-concept (POC) implementation of module metrics tracking and enforcement. The following commands run the scanner across the entire first-party codebase and merge the results. All commands are assumed to run at the root of the checkout, inside a correctly activated Python virtual env.

Showing assigned and unassigned files

Run modules_poc/mod_scanner.py --dump-modules to produce a modules.yaml file in the current directory. This file is a multi-level map from module name to team name to directory path to list of file names. For unassigned files it uses __NONE__ as the module name, and for unowned files it uses __NO_OWNER__ as the team, both of which conveniently sort first. For owned files it uses the part of the team name after @10gen/ with - replaced with _ to be friendlier to querying. In cases where multiple teams own a file, the file is duplicated in each team's list.
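
To make the shape of that map concrete, here is a minimal Python sketch that builds the same four-level nesting from a few made-up records. It is not the scanner's actual code: the helper names (team_key, build_map), the input tuple format, and the sample paths are all assumptions for illustration, and the replacement character is assumed to be an underscore, as the server_programmability key in the yq queries suggests.

```python
from collections import defaultdict

NO_MODULE = "__NONE__"      # module name used for unassigned files
NO_OWNER = "__NO_OWNER__"   # team name used for unowned files


def team_key(owner: str) -> str:
    """Turn an owner such as '@10gen/server-programmability' into a query-friendly key."""
    return owner.removeprefix("@10gen/").replace("-", "_")


def build_map(records):
    """Build module -> team -> directory -> [file names].

    `records` is a hypothetical input: (module or None, [owners], "dir/file") tuples.
    """
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for module, owners, path in records:
        directory, _, name = path.rpartition("/")
        teams = [team_key(o) for o in owners] or [NO_OWNER]
        for team in teams:  # a file with several owners is listed under each team
            tree[module or NO_MODULE][team][directory].append(name)
    return tree


if __name__ == "__main__":
    demo = build_map([
        ("core", ["@10gen/server-programmability"], "src/mongo/util/a.h"),
        (None, [], "src/mongo/b.cpp"),
    ])
    for module in sorted(demo):  # "__NONE__" sorts ahead of lowercase module names
        print(module, {team: dict(dirs) for team, dirs in demo[module].items()})
```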

This file can be viewed directly in VSCode. The YAML plugin's breadcrumbs and folding are very helpful. yq (jq for YAML) is also a powerful tool. Here are a few examples using it, some of which produce enough output to be worth opening in VSCode:

# list of teams
yq '[.[] | keys] | add | sort | unique[]' -r modules.yaml
# unassigned files owned by server-programmability
yq '.__NONE__.server_programmability' modules.yaml
# files owned by server-programmability across all modules (or lack thereof)
yq '.[] |= (.server_programmability | values)' modules.yaml
# assigned files owned by server-programmability outside of the core module
yq '.[] |= (.server_programmability | values) | del(.core) | del(.__NONE__)' modules.yaml
# assigned files owned by server-programmability in modules that don't start with core
yq '.[] |= (.server_programmability | values) |  with_entries(select(.key | startswith("core") | not)) | del(.__NONE__)' modules.yaml
# unowned files as a flat list
yq '.[].__NO_OWNER__ | values | to_entries | map("\(.key)/\(.value[])") | .[] ' modules.yaml -r | sort
# unowned files grouped by directory
yq '[.[].__NO_OWNER__ | to_entries? | .[]] | group_by(.key) | map({key: .[0].key, value: ([.[].value] | add | sort)}) | from_entries' modules.yaml
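
If you would rather post-process the file in Python, the same kind of question can be answered with plain dictionary traversal. The sketch below assumes the file has already been loaded into nested dicts (for example with PyYAML's yaml.safe_load, which is an assumption about your environment); the function names and the sample data are hypothetical. files_by_team only roughly corresponds to the "files owned by a team across all modules" query above.

```python
def unowned_files(modules):
    """Return sorted 'directory/file' paths listed under __NO_OWNER__ in any module."""
    paths = []
    for teams in modules.values():
        for directory, names in teams.get("__NO_OWNER__", {}).items():
            paths.extend(f"{directory}/{name}" for name in names)
    return sorted(paths)


def files_by_team(modules, team):
    """Return module -> {directory: [files]} for the modules where `team` owns files."""
    return {mod: teams[team] for mod, teams in modules.items() if team in teams}


if __name__ == "__main__":
    # Hypothetical data with the same shape as modules.yaml.
    sample = {
        "__NONE__": {"__NO_OWNER__": {"src/mongo": ["b.cpp", "a.cpp"]}},
        "core": {"server_programmability": {"src/mongo/util": ["x.h"]}},
    }
    print(unowned_files(sample))
    print(files_by_team(sample, "server_programmability"))
```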

Running the scanner

This will build the merged_decls.json file in the current directory:

buildscripts/poetry_sync.sh # make sure the python env has the right packages installed
find bazel-out/ -name '*.mod_scanner_decls.json*' -delete # get rid of old data files
bazel build --config=mod-scanner  "//src/mongo/..."
python modules_poc/merge_decls.py

merge_decls.py takes an optional flag --intra-module if you want to include intra-module accesses and declarations that are only used from within their module. Typically you don't, so it defaults to omitting them.

If you only wish to include the files linked into a given executable, replace the bazel build command with the following commands:

TARGET="//src/mongo/db:mongod"
bazel cquery --config=mod-scanner "filter(//src/mongo, kind(cc_*, deps($TARGET)))"  | awk '{print $1}' > targets.file
bazel build --config=mod-scanner --target_pattern_file=targets.file

Browsing

Once you have produced a merged_decls.json file, you can browse it by running modules_poc/browse.py. It will show the available keybindings on the right, which can be toggled by pressing ?. If you are running from a VSCode or neovim terminal, you can press g to go to any location in your editor. You can also press p to toggle an embedded preview of the location of the currently selected line (you probably want to hide the help when doing this). You can press Tab ↹ to switch between the tree and preview.

The browser is primarily intended to assist in labeling public APIs, so the files are sorted with the largest number of unlabeled declarations ("unknowns") first. Only declarations that are used outside of their module are counted and shown. You can search for a file by pressing f, or press m to filter the files by module.

As an advanced feature, you can pass a custom file to browse.py and it will use it rather than the default merged_decls.json. The file still needs to have the same shape as the original. This works best with jq for advanced filtering. For example, here is a command that will only show declarations where some TUs will only see a forward declaration from another module, and will assume that that module is the owner (we need to fix this):

./modules_poc/browse.py <(jq '[.[] | select(.other_mods)]' merged_decls.json)

In general, your jq query should be of the form [.[] | select( SOME QUERY )] to avoid breaking the format expectations. For more advanced analysis, using jq directly is a good idea.
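
The same "filter but keep the top-level list" rule applies if you prepare the custom file from Python instead of jq. The following is only a sketch: it assumes merged_decls.json is a JSON list of objects (which is what the jq form above implies), the function names are hypothetical, and other_mods is the only field name taken from this README.

```python
import json
import tempfile


def jq_truthy(value):
    """Mimic jq's select(): only null (None) and false count as falsy."""
    return value is not None and value is not False


def filter_decls(decls, predicate):
    """Return a list again, like [.[] | select(...)], so browse.py sees the same shape."""
    return [d for d in decls if predicate(d)]


def write_filtered(decls, predicate):
    """Write the filtered list to a temporary JSON file and return its path."""
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as out:
        json.dump(filter_decls(decls, predicate), out)
        return out.name


if __name__ == "__main__":
    with open("merged_decls.json") as f:
        decls = json.load(f)
    # Print the path of the filtered file, which can then be passed to browse.py.
    print(write_filtered(decls, lambda d: jq_truthy(d.get("other_mods"))))
```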

Uploading

Run the following command to upload the results:

python modules_poc/upload.py $MONGO_URI # fill this in

If the upload fails with a connection error and you need to update the IP whitelist for your virtual workstation, curl -4 wtfismyip.com/text is a good way to see your public IP address.

Note for implementers

You can also scan a single file, which is useful when iterating on the scanner. You can either pass it the same flags used to compile, or pass it just a cpp file and it will figure out the flags from your compile_commands.json. It will create a file called decls.yaml in the current directory when run this way.

python modules_poc/mod_scanner.py src/mongo/bson/bsonobj.cpp
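
For readers unfamiliar with compile_commands.json, the sketch below shows one way flags for a single file can be looked up in a JSON compilation database. It illustrates the standard file format (a list of entries with "directory", "file", and either "arguments" or "command") and is not taken from mod_scanner.py; the function name flags_for is hypothetical.

```python
import json
import os
import shlex


def flags_for(source, database="compile_commands.json"):
    """Return the compiler argument list recorded for `source` in the database."""
    with open(database) as f:
        entries = json.load(f)
    wanted = os.path.abspath(source)
    for entry in entries:
        # "file" may be relative to the entry's "directory".
        path = os.path.normpath(os.path.join(entry["directory"], entry["file"]))
        if path == wanted:
            # Entries carry either an "arguments" list or a single "command" string.
            return entry.get("arguments") or shlex.split(entry["command"])
    raise KeyError(f"{source} not found in {database}")


if __name__ == "__main__":
    print(flags_for("src/mongo/bson/bsonobj.cpp"))
```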
