Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Settings class with support for configuration file and more #167

Merged
merged 13 commits into from
Feb 23, 2023

Conversation

jherland
Copy link
Member

@jherland jherland commented Feb 20, 2023

This series adds a lot of stuff, and probably makes most sense to read one commit at a time.

The series starts in earnest with the 4th commit, and everything up to and incl. the 9th commit contains the core of the work. I consider the last three commits natural followup-work that may or may not be important for the public release.

  • tests/conftest.py: Remove redundant parentheses
  • test_real_project: Fix inaccuracy in docstring
  • test_cmdline: Minor move of an underscore
  • settings: Introduce Settings class for holding FawltyDeps settings
  • main: Begin using Settings in main() and Analysis
  • compare_imports_to_dependencies(): Take Settings arguemnt
  • Move command-line parser setup into Settings class
  • Implement proper cascade of settings
  • Add --config-file argument to allow user to say where FD config is found
  • Store Settings objects inside Analysis (and include in JSON output)
  • settings: Add OutputFormat enum
  • Add --detailed and --summary for controlling output verbosity

@jherland jherland added this to the First public release milestone Feb 20, 2023
@jherland jherland force-pushed the jherland/settings-introduction branch from 15f38a6 to e168f1b Compare February 20, 2023 15:02
@jherland jherland marked this pull request as ready for review February 20, 2023 15:12
@jherland jherland self-assigned this Feb 20, 2023
@jherland
Copy link
Member Author

(feel free to skip reading this rant/braindump about the intricacies of pydantic)

As a side-note I can comment that I'm only about 80-90% happy with building our settings on top of the BaseSettings framework from pydantic. It is clearly intended for a slightly different use case (e.g. configuring an server-style application running inside a Docker container where you need to pass in secrets, etc.).

Don't get me wrong, there are a lot of pros here:

  • BaseSettings gives us a nicely configurable prioritized cascade (i.e. which sources override which other sources)
  • We get configuration via environment variables "for free". This required zero effort on my part, and Just Works(tm)
  • The automatic validation/coercion is nice, and means that I did not have to think about how to e.g. represent sets of enums inside pyproject.toml or worse: inside environment variable values 😧 -> 😌
  • It was pleasantly easy to add a new/custom configuration file source. The PyprojectTomlSettingsSource I ended up writing is very straightforward.
  • Did I mention how nice the automatic validation/coercion is here? By giving our enums string values, the user can simply use those values in their configuration, e.g. actions = ["list_imports", "list_deps"] in pyproject.toml, and it automatically ends up as .actions = {Action.LIST_IMPORTS, Action.LIST_DEPS}`` in our Settings` object! 🎉

However, here are some of the sharp corners I bumped into:

  • Customizing JSON encoding of object members that end up being fairly deeply nested in the generated JSON is not as easy as I'd hoped. Attaching configuration on the members and/or their immediate class does not work when the JSON serialization "starts" higher up on the Analysis object itself. (This is general to pydantic's BaseModel, not specifically it's BaseSettings.)
  • There is no automatic integration with command-line parsing, neither in the setup of available options, nor in the parsing and setup of the resulting Settings object. Currrently, I've put it all in the same class, just to vaguely co-locate things, but really I'd like to be able to attach configuration for the command-line parser directly on the Settings members themselves.
    • A tricky detail here is to remember to suppress command-line options so command-line defaults don't come in and override the underlying configuration.
  • I'd like to pull the Settings class apart into two classes, one base class that contains general adaptations for writing a command-line application (general command-line parsing, parsing config from TOML file, etc.), and a concrete class for the specific case of FawltyDeps, with our settings, and CLI options. However, this is not easy/possible to do because of the way pydantic attaches configuration via nested Config classes. I'd like to place that config class inside the base class, but then it won't apply(?) to the concrete case for FD.
  • Related to the above point, I'm not happy with how much of pydantic's "magic" happens at/before instance instantiation time. For example, in order to make the config file configurable (i.e. the --config-file argument), I have to store it as a class var on Settings, and then reach in and mutate this before I can instantiate a Settings object (which will kick of the automagical reading of config sources and validation). There is a borderline uncomfortable amount of magic happening at this early stage of an object's lifetime. In short, ~100% of settings.py is concerned with what happens before we ever get to Settings.__init__(). After we get to Settings.__init__(), though, everything is butter smooth: we have an immutable dataclass-like container for our application settings.
  • A future problem is how to properly document the configuration directives, something from which we could generate a webpage like this.

@mknorps
Copy link
Collaborator

mknorps commented Feb 21, 2023

Thank you @jherland for this detailed description of pros and cons of using pydantic. Combined with your previous story about dataclass and json representation would make a very interesting read for a broader group of people.

The magic and the level of needed nesting of configuration do not sound reassuring :(

To rephrase the main message: BaseSetting works well for integration with pyproject.toml and env variables, but is bumpy when it comes to cli. Do I understand it correctly?

@jherland
Copy link
Member Author

The magic and the level of needed nesting of configuration do not sound reassuring :(

Yeah, it means that the code does not look as nice as I would have hoped. That said, I feel reasonably confident that it all works. And the immutability of the resulting Settings object means that once it is initialized, the rest of its lifetime/behavior is very simple to reason about. Thus, I am reasonably confident that this complexity is isolated to the Settings class itself, and won't spill over into the rest of our code.

To rephrase the main message: BaseSetting works well for integration with pyproject.toml and env variables, but is bumpy when it comes to cli. Do I understand it correctly?

Yeah, that is a fairly good summary. To be clear, I am still happy to be using BaseSettings as it certainly makes some things a lot simpler. But I still thought it worth sharing that not everything is smooth sailing.

Copy link
Collaborator

@mknorps mknorps left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is exquisite work!
Congratulation @jherland 🎉

This PR is a big one and introduces a big refactor to the code. I did not know BaseSettings before. They look like a good fit here!
As you did thorough research before, and I think the solution is well designed, I focused on the following things in the review:

  1. Will I know where to add a new settings option?
    YES! There is only one place where I add a new option. Of course to be able to use it is good to have the option to be passed from the command line. The details of what this setting does are to be implemented in main.

  2. Will I know how to add new parser argument?
    YES. Depending if it is one of the mutually exclusive actions of other options it is passed in a Settings class method where CLI arguments are added to the arg parser. This is done clean and modular.
    My only remark here is that setup_cmdline_parser could take the Settings as input. Then setup_cmdline_parser and the parser populators could be taken out of the class (not to bloat the class too much). Unless there is a strong case why they should be inside the Settings class.

  3. Is Analysis simplified and all experiment settings are in one place?
    YES

  4. Do Settings have sensible defaults?
    YES

  5. Can we read from TOML configuration?
    YES. This is set in customize_source and uses PyprojectTomlSettingsSource class which reads settings from a pyproject.toml.file.

  6. Is the behavior CLI options -> TOML configuration -> defaults preserved
    YES. As proved in tests and I also tested this behavior using CLI in the branch.

  7. Does FD work as before when I run it?
    YES. Tested CLI. No surprises there.

  8. Is the logging level and output format separated?
    YES. Tested CLI.

This PR is very good and is almost ready to launch. I added some comments and remarks, which you may include in the final version r just create issues out of them. The small amendment I would like to see is about report format CLI options - to make them mutually exclusive.

fawltydeps/check.py Show resolved Hide resolved
fawltydeps/main.py Show resolved Hide resolved
fawltydeps/settings.py Outdated Show resolved Hide resolved
fawltydeps/settings.py Show resolved Hide resolved
tests/test_cmdline.py Show resolved Hide resolved
fawltydeps/settings.py Show resolved Hide resolved
fawltydeps/settings.py Show resolved Hide resolved
Copy link
Collaborator

@Nour-Mws Nour-Mws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had one pass over this and honestly, chapeau bas Johan and thank you for the great work!
I added a couple of comments and will have another - hopefully quicker - pass before approving (I feel like I couldn't grasp everything in one go).

fawltydeps/settings.py Show resolved Hide resolved
tests/conftest.py Show resolved Hide resolved
fawltydeps/check.py Show resolved Hide resolved
tests/test_cmdline.py Outdated Show resolved Hide resolved
tests/test_cmdline.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@mknorps mknorps left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work 🚀

It answers to several UX problems raised by our testers :)

The new settings module provides a Settings class that builds on top of
the settings management framework provided by pydantic[1].

The Settings class will hold the entirety of FawltyDeps' configuration,
as parsed from:

- command-line options
- fawltydeps_* environment variables
- directives in the [tool.fawltydeps] section in pyproject.toml
- hardcoded defaults that are used when not overridden by the above

By using pydantic's BaseSettings for our Settings, we get automatic
validation/coercion of the members of our Settings object.

For this to work properly, we also need to switch the order of
SpecialPath and Path in our PathOrSpecial union: When pydantic
validates/coerces a union member, it will use the first type in the
union that succeeds. With Path first, Location("<stdin>") would be
automatically coerced to Location(Path("<stdin>")), which is NOT what
we want. Putting SpecialPath (= Literal["<stdin>"]) first in the union
solves this.

[1]: https://docs.pydantic.dev/usage/settings/
Build a Settings object from the command-line args, and use that instead
of passing individual command-line args around.

This refactoring merely changes _how_ we pass the settings around. There
is no change to the settings themselves, since the Settings object is
still fully determined by what's passed on the command-line. (Reading
settings from a config file or the environment will follow later.)
We used to pass separate args for the --ignore-unused and
--ignore-undeclared args from the command line, but with this commit we
instead take a single Settings argument.

This also makes it easier to pass more options into this part of the
code later, if we need to.
The Settings class takes over the management of (most) command-line
arguments. This is in preparation for the Settings module having to
"own" the command-line parsing in order to ensure the proper layering/
cascading of command-line arguments on top of environment variables,
configuration file directives, and the hardcoded defaults in the
Settings class.

We leave the possibility for the `main` module to add its own
command-line arguments for things that do not end up in the Settings
module. For now, this is limited to the -V/--version argument.
Until now, all members of the Settings object were determined by the
command-line parser, which used hardcoded default. This means that we
never included any settings whatsoever from the environment or the
configuration file.

Fix this by changing the command-line parsing so that only arguments
that are actually _passed_ on the command line are passed on to the
Settings constructor. This will cause settings from the environment or
the configuration file to be applied, since they are no longer always
overridden by the command line.
Before this commit, we have added functionality to allow passing setting
via a config file, but the configuration file itself has always
defaulted to 'None', i.e. meaning that no config file is actually read
by our command line application.

This is the first commit for which FawltyDeps will actually read
settings from a configuration file. The file is controlled with the new
--config-file argument, and defaults to pyproject.toml in the current
directory.

In all tests that run FD in a subprocess, set --config-file=/dev/null by
default to prevent our own pyproject.toml from "infecting" the tests.
Until now, we have stored the set of requested actions inside the
Analysis object, and hidden it from the JSON output.

Now that we have a structured Settings object that encapsulates all of
our configuration (including the requested actions), it makes sense to
store that in the Analysis object instead.

Also expose this object in the JSON output. This makes the JSON output
considerably more verbose, but also makes it much easier to debug when
there might be retroactive questions about what FawltyDeps was asked to
do.

In order to include the entire Settings object in the JSON output and
successfully test this, we need the JSON serialization of our Settings
object to be stable. Fortunately it's fairly easy to customize the
pydantic JSON encoder to use sorted() instead of list() when serializing
(frozen)set objects.
This prepares for refactoring how we determine output format
This changes -v/--verbose and -q/--quiet back to _only_ controlling the
log level.

In the command-line help message, add a clearly delineated group for
choosing the output format.

Add tests to verify that --detailed/--summary are independent of -v/-q:
- give summary description with debug-level logging
- give detailed description with quiet logging

Improved-by: Maria Knorps <maria.knorps@tweag.io>
We want the Settings-related command-line options to be co-located with
the Settings class, as the two likely need to be changed together, when
we add/modify settings to FawltyDeps. Still, they don't necessarily need
to be part of the same _class_. This makes the Settings class somewhat
smaller.

Suggested-by: Maria Knorps <maria.knorps@tweag.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Configure FawltyDeps via pyproject.toml (in addition to command-line interface)
3 participants