Settings class with support for configuration file and more #167

jherland · 2023-02-20T01:27:24Z

This series adds a lot of stuff, and probably makes most sense to read one commit at a time.

The series starts in earnest with the 4th commit, and everything up to and incl. the 9th commit contains the core of the work. I consider the last three commits natural followup-work that may or may not be important for the public release.

tests/conftest.py: Remove redundant parentheses
test_real_project: Fix inaccuracy in docstring
test_cmdline: Minor move of an underscore
settings: Introduce Settings class for holding FawltyDeps settings
main: Begin using Settings in main() and Analysis
compare_imports_to_dependencies(): Take Settings arguemnt
Move command-line parser setup into Settings class
Implement proper cascade of settings
Add --config-file argument to allow user to say where FD config is found
Store Settings objects inside Analysis (and include in JSON output)
settings: Add OutputFormat enum
Add --detailed and --summary for controlling output verbosity
- This commit resolves/relates to Decouple verbosity level of the logs and FD printed output #111, although we might want to discuss this solution further, and possible even defer this part until later.

jherland · 2023-02-20T17:44:17Z

(feel free to skip reading this rant/braindump about the intricacies of pydantic)

As a side-note I can comment that I'm only about 80-90% happy with building our settings on top of the BaseSettings framework from pydantic. It is clearly intended for a slightly different use case (e.g. configuring an server-style application running inside a Docker container where you need to pass in secrets, etc.).

Don't get me wrong, there are a lot of pros here:

BaseSettings gives us a nicely configurable prioritized cascade (i.e. which sources override which other sources)
We get configuration via environment variables "for free". This required zero effort on my part, and Just Works(tm)
The automatic validation/coercion is nice, and means that I did not have to think about how to e.g. represent sets of enums inside pyproject.toml or worse: inside environment variable values 😧 -> 😌
It was pleasantly easy to add a new/custom configuration file source. The PyprojectTomlSettingsSource I ended up writing is very straightforward.
Did I mention how nice the automatic validation/coercion is here? By giving our enums string values, the user can simply use those values in their configuration, e.g. actions = ["list_imports", "list_deps"] in pyproject.toml, and it automatically ends up as .actions = {Action.LIST_IMPORTS, Action.LIST_DEPS}`` in our Settings` object! 🎉

However, here are some of the sharp corners I bumped into:

Customizing JSON encoding of object members that end up being fairly deeply nested in the generated JSON is not as easy as I'd hoped. Attaching configuration on the members and/or their immediate class does not work when the JSON serialization "starts" higher up on the Analysis object itself. (This is general to pydantic's BaseModel, not specifically it's BaseSettings.)
There is no automatic integration with command-line parsing, neither in the setup of available options, nor in the parsing and setup of the resulting Settings object. Currrently, I've put it all in the same class, just to vaguely co-locate things, but really I'd like to be able to attach configuration for the command-line parser directly on the Settings members themselves.
- A tricky detail here is to remember to suppress command-line options so command-line defaults don't come in and override the underlying configuration.
I'd like to pull the Settings class apart into two classes, one base class that contains general adaptations for writing a command-line application (general command-line parsing, parsing config from TOML file, etc.), and a concrete class for the specific case of FawltyDeps, with our settings, and CLI options. However, this is not easy/possible to do because of the way pydantic attaches configuration via nested Config classes. I'd like to place that config class inside the base class, but then it won't apply(?) to the concrete case for FD.
Related to the above point, I'm not happy with how much of pydantic's "magic" happens at/before instance instantiation time. For example, in order to make the config file configurable (i.e. the --config-file argument), I have to store it as a class var on Settings, and then reach in and mutate this before I can instantiate a Settings object (which will kick of the automagical reading of config sources and validation). There is a borderline uncomfortable amount of magic happening at this early stage of an object's lifetime. In short, ~100% of settings.py is concerned with what happens before we ever get to Settings.__init__(). After we get to Settings.__init__(), though, everything is butter smooth: we have an immutable dataclass-like container for our application settings.
A future problem is how to properly document the configuration directives, something from which we could generate a webpage like this.

mknorps · 2023-02-21T08:33:10Z

Thank you @jherland for this detailed description of pros and cons of using pydantic. Combined with your previous story about dataclass and json representation would make a very interesting read for a broader group of people.

The magic and the level of needed nesting of configuration do not sound reassuring :(

To rephrase the main message: BaseSetting works well for integration with pyproject.toml and env variables, but is bumpy when it comes to cli. Do I understand it correctly?

jherland · 2023-02-21T09:26:29Z

The magic and the level of needed nesting of configuration do not sound reassuring :(

Yeah, it means that the code does not look as nice as I would have hoped. That said, I feel reasonably confident that it all works. And the immutability of the resulting Settings object means that once it is initialized, the rest of its lifetime/behavior is very simple to reason about. Thus, I am reasonably confident that this complexity is isolated to the Settings class itself, and won't spill over into the rest of our code.

To rephrase the main message: BaseSetting works well for integration with pyproject.toml and env variables, but is bumpy when it comes to cli. Do I understand it correctly?

Yeah, that is a fairly good summary. To be clear, I am still happy to be using BaseSettings as it certainly makes some things a lot simpler. But I still thought it worth sharing that not everything is smooth sailing.

mknorps

This is exquisite work!
Congratulation @jherland 🎉

This PR is a big one and introduces a big refactor to the code. I did not know BaseSettings before. They look like a good fit here!
As you did thorough research before, and I think the solution is well designed, I focused on the following things in the review:

Will I know where to add a new settings option?
YES! There is only one place where I add a new option. Of course to be able to use it is good to have the option to be passed from the command line. The details of what this setting does are to be implemented in main.
Will I know how to add new parser argument?
YES. Depending if it is one of the mutually exclusive actions of other options it is passed in a Settings class method where CLI arguments are added to the arg parser. This is done clean and modular.
My only remark here is that setup_cmdline_parser could take the Settings as input. Then setup_cmdline_parser and the parser populators could be taken out of the class (not to bloat the class too much). Unless there is a strong case why they should be inside the Settings class.
Is Analysis simplified and all experiment settings are in one place?
YES
Do Settings have sensible defaults?
YES
Can we read from TOML configuration?
YES. This is set in customize_source and uses PyprojectTomlSettingsSource class which reads settings from a pyproject.toml.file.
Is the behavior CLI options -> TOML configuration -> defaults preserved
YES. As proved in tests and I also tested this behavior using CLI in the branch.
Does FD work as before when I run it?
YES. Tested CLI. No surprises there.
Is the logging level and output format separated?
YES. Tested CLI.

This PR is very good and is almost ready to launch. I added some comments and remarks, which you may include in the final version r just create issues out of them. The small amendment I would like to see is about report format CLI options - to make them mutually exclusive.

fawltydeps/check.py

fawltydeps/main.py

fawltydeps/settings.py

tests/test_cmdline.py

fawltydeps/settings.py

Nour-Mws

I had one pass over this and honestly, chapeau bas Johan and thank you for the great work!
I added a couple of comments and will have another - hopefully quicker - pass before approving (I feel like I couldn't grasp everything in one go).

fawltydeps/settings.py

tests/conftest.py

fawltydeps/check.py

tests/test_cmdline.py

mknorps

Great work 🚀

It answers to several UX problems raised by our testers :)

The new settings module provides a Settings class that builds on top of the settings management framework provided by pydantic[1]. The Settings class will hold the entirety of FawltyDeps' configuration, as parsed from: - command-line options - fawltydeps_* environment variables - directives in the [tool.fawltydeps] section in pyproject.toml - hardcoded defaults that are used when not overridden by the above By using pydantic's BaseSettings for our Settings, we get automatic validation/coercion of the members of our Settings object. For this to work properly, we also need to switch the order of SpecialPath and Path in our PathOrSpecial union: When pydantic validates/coerces a union member, it will use the first type in the union that succeeds. With Path first, Location("<stdin>") would be automatically coerced to Location(Path("<stdin>")), which is NOT what we want. Putting SpecialPath (= Literal["<stdin>"]) first in the union solves this. [1]: https://docs.pydantic.dev/usage/settings/

Build a Settings object from the command-line args, and use that instead of passing individual command-line args around. This refactoring merely changes _how_ we pass the settings around. There is no change to the settings themselves, since the Settings object is still fully determined by what's passed on the command-line. (Reading settings from a config file or the environment will follow later.)

We used to pass separate args for the --ignore-unused and --ignore-undeclared args from the command line, but with this commit we instead take a single Settings argument. This also makes it easier to pass more options into this part of the code later, if we need to.

The Settings class takes over the management of (most) command-line arguments. This is in preparation for the Settings module having to "own" the command-line parsing in order to ensure the proper layering/ cascading of command-line arguments on top of environment variables, configuration file directives, and the hardcoded defaults in the Settings class. We leave the possibility for the `main` module to add its own command-line arguments for things that do not end up in the Settings module. For now, this is limited to the -V/--version argument.

Until now, all members of the Settings object were determined by the command-line parser, which used hardcoded default. This means that we never included any settings whatsoever from the environment or the configuration file. Fix this by changing the command-line parsing so that only arguments that are actually _passed_ on the command line are passed on to the Settings constructor. This will cause settings from the environment or the configuration file to be applied, since they are no longer always overridden by the command line.

Before this commit, we have added functionality to allow passing setting via a config file, but the configuration file itself has always defaulted to 'None', i.e. meaning that no config file is actually read by our command line application. This is the first commit for which FawltyDeps will actually read settings from a configuration file. The file is controlled with the new --config-file argument, and defaults to pyproject.toml in the current directory. In all tests that run FD in a subprocess, set --config-file=/dev/null by default to prevent our own pyproject.toml from "infecting" the tests.

Until now, we have stored the set of requested actions inside the Analysis object, and hidden it from the JSON output. Now that we have a structured Settings object that encapsulates all of our configuration (including the requested actions), it makes sense to store that in the Analysis object instead. Also expose this object in the JSON output. This makes the JSON output considerably more verbose, but also makes it much easier to debug when there might be retroactive questions about what FawltyDeps was asked to do. In order to include the entire Settings object in the JSON output and successfully test this, we need the JSON serialization of our Settings object to be stable. Fortunately it's fairly easy to customize the pydantic JSON encoder to use sorted() instead of list() when serializing (frozen)set objects.

This prepares for refactoring how we determine output format

This changes -v/--verbose and -q/--quiet back to _only_ controlling the log level. In the command-line help message, add a clearly delineated group for choosing the output format. Add tests to verify that --detailed/--summary are independent of -v/-q: - give summary description with debug-level logging - give detailed description with quiet logging Improved-by: Maria Knorps <maria.knorps@tweag.io>

We want the Settings-related command-line options to be co-located with the Settings class, as the two likely need to be changed together, when we add/modify settings to FawltyDeps. Still, they don't necessarily need to be part of the same _class_. This makes the Settings class somewhat smaller. Suggested-by: Maria Knorps <maria.knorps@tweag.io>

jherland added this to the First public release milestone Feb 20, 2023

jherland force-pushed the jherland/settings-introduction branch from 15f38a6 to e168f1b Compare February 20, 2023 15:02

jherland marked this pull request as ready for review February 20, 2023 15:12

jherland requested review from Nour-Mws, mknorps and vreuter February 20, 2023 15:12

jherland self-assigned this Feb 20, 2023

jherland linked an issue Feb 20, 2023 that may be closed by this pull request

Configure FawltyDeps via pyproject.toml (in addition to command-line interface) #54

Closed

jherland mentioned this pull request Feb 20, 2023

Add FD version number to JSON output #172

Merged

jherland force-pushed the jherland/settings-introduction branch from e168f1b to ae0970c Compare February 20, 2023 18:14

jherland mentioned this pull request Feb 20, 2023

Add manual specification for dependency declaration parser #165

Merged

mknorps requested changes Feb 21, 2023

View reviewed changes

Nour-Mws reviewed Feb 21, 2023

View reviewed changes

fawltydeps/settings.py Show resolved Hide resolved

tests/conftest.py Show resolved Hide resolved

fawltydeps/check.py Show resolved Hide resolved

tests/test_cmdline.py Outdated Show resolved Hide resolved

tests/test_cmdline.py Outdated Show resolved Hide resolved

jherland force-pushed the jherland/settings-introduction branch from ae0970c to a871ea2 Compare February 22, 2023 11:07

jherland requested a review from mknorps February 22, 2023 17:35

mknorps approved these changes Feb 23, 2023

View reviewed changes

jherland added 3 commits February 23, 2023 12:42

tests/conftest.py: Remove redundant parentheses

01d1cff

test_real_project: Fix inaccuracy in docstring

e4fa17d

test_cmdline: Minor move of an underscore

1f9e91d

Nour-Mws approved these changes Feb 23, 2023

View reviewed changes

jherland added 7 commits February 23, 2023 14:43

jherland added 3 commits February 23, 2023 14:43

settings: Add OutputFormat enum

80e56de

This prepares for refactoring how we determine output format

jherland force-pushed the jherland/settings-introduction branch from 0dfe4fc to da5877b Compare February 23, 2023 13:43

jherland merged commit fc1815c into main Feb 23, 2023

jherland deleted the jherland/settings-introduction branch February 23, 2023 13:45

jherland mentioned this pull request Feb 23, 2023

Decouple verbosity level of the logs and FD printed output #111

Closed

This was referenced May 11, 2023

Support multiple --pyenv options #321

Merged

Reconsider how multi-valued options combine between the config file and the command-line? #328

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Settings class with support for configuration file and more #167

Settings class with support for configuration file and more #167

jherland commented Feb 20, 2023 •

edited

Loading

jherland commented Feb 20, 2023

mknorps commented Feb 21, 2023

jherland commented Feb 21, 2023

mknorps left a comment

Nour-Mws left a comment

mknorps left a comment

Settings class with support for configuration file and more #167

Settings class with support for configuration file and more #167

Conversation

jherland commented Feb 20, 2023 • edited Loading

jherland commented Feb 20, 2023

mknorps commented Feb 21, 2023

jherland commented Feb 21, 2023

mknorps left a comment

Choose a reason for hiding this comment

Nour-Mws left a comment

Choose a reason for hiding this comment

mknorps left a comment

Choose a reason for hiding this comment

jherland commented Feb 20, 2023 •

edited

Loading