Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Separate probes for PV provisioning and mounting #101

Merged

Conversation

ushitora-anqou
Copy link
Contributor

@ushitora-anqou ushitora-anqou commented Jan 12, 2024

The current probe checks that both a new provisioning of a PV and its mounting succeed on every Node.
This guarantee is sufficient but not necessary; although mounting an already provisioned PV should succeed on every node, it is sufficient that a new provisioning succeeds on at least one Node.

To address the above issue, the new architecture has the following two types of probes:

  • provision-probe, which checks that a new provision succeeds; and
  • mount-probe, which checks that a PV (possibly already provisioned) can be successfully mounted on each Node.

In addition, PieProbe custom resource is introduced to group the probes that monitor a StorageClass. Each probe has an owner reference to a PieProbe and is GCed when the referenced PieProbe is deleted. See also issue #50.

This PR introduces the following new metrics:

  • pie_io_write_latency_on_mount_probe_seconds
    • Experimental metrics. IO latency of write, benchmarked on mount-probe Pods.
    • TYPE: gauge
  • pie_io_read_latency_on_mount_probe_seconds
    • Experimental metrics. IO latency of read, benchmarked on mount-probe Pods.
    • TYPE: gauge
  • pie_mount_probe_total
    • Experimental metrics. The number of attempts of the creation of the mount-probe Pod object and the creation of the container.
    • TYPE: counter
  • pie_performance_on_mount_probe_total
    • Experimental metrics. The number of attempts of performing the IO benchmarks on mount-probe Pods.
    • TYPE: counter
  • pie_provision_probe_total
    • Experimental metrics. The number of attempts of the creation of the provision-probe Pod object and the creation of the container.
    • TYPE: counter

Note that old metrics (e.g., pie_io_write_latency_seconds and pie_create_probe_total) should behave in the same way as before merging this patch.

@ushitora-anqou ushitora-anqou force-pushed the add-separate-metrics-for-pv-provision-and-mounting branch 5 times, most recently from 3ec117b to ac72651 Compare January 19, 2024 04:12
@ushitora-anqou ushitora-anqou changed the title [DNM][WIP] separate metrics for PV provision and mounting Separate probes for PV provisioning and mounting Jan 19, 2024
@ushitora-anqou ushitora-anqou marked this pull request as ready for review January 19, 2024 04:43
@ushitora-anqou ushitora-anqou requested a review from a team as a code owner January 19, 2024 04:43
@ushitora-anqou ushitora-anqou marked this pull request as draft January 19, 2024 06:02
@ushitora-anqou ushitora-anqou force-pushed the add-separate-metrics-for-pv-provision-and-mounting branch 4 times, most recently from 8993f3b to 71b87dd Compare January 26, 2024 02:50
@ushitora-anqou ushitora-anqou force-pushed the add-separate-metrics-for-pv-provision-and-mounting branch 8 times, most recently from 5dc0b30 to a9b5c36 Compare January 31, 2024 09:14
@ushitora-anqou ushitora-anqou marked this pull request as ready for review January 31, 2024 09:15
Signed-off-by: Ryotaro Banno <ryotaro.banno@gmail.com>
Signed-off-by: Ryotaro Banno <ryotaro.banno@gmail.com>
…Probe`

Signed-off-by: Ryotaro Banno <ryotaro.banno@gmail.com>
Signed-off-by: Ryotaro Banno <ryotaro.banno@gmail.com>
Signed-off-by: Ryotaro Banno <ryotaro.banno@gmail.com>
Signed-off-by: Ryotaro Banno <ryotaro.banno@gmail.com>
Signed-off-by: Ryotaro Banno <ryotaro.banno@gmail.com>
Signed-off-by: Ryotaro Banno <ryotaro.banno@gmail.com>
@ushitora-anqou ushitora-anqou force-pushed the add-separate-metrics-for-pv-provision-and-mounting branch from a9b5c36 to c019c38 Compare February 1, 2024 05:54
@ushitora-anqou
Copy link
Contributor Author

ushitora-anqou commented Feb 1, 2024

I updated my PR. The changes are: https://github.com/topolvm/pie/compare/a9b5c369798460d423528113786504830e4f809f..c019c385d3bcb19bc66f0eb728adafd5d1d8d86c

  • I rebased this branch onto the current main branch to resolve the conflicts.
  • I dropped the commit of adding the config/ directory to .gitignore.

Signed-off-by: Ryotaro Banno <ryotaro.banno@gmail.com>
…-line arguments instead

Signed-off-by: Ryotaro Banno <ryotaro.banno@gmail.com>
@ushitora-anqou ushitora-anqou force-pushed the add-separate-metrics-for-pv-provision-and-mounting branch from 1ffc610 to 23a2ee4 Compare February 20, 2024 07:23
@pluser
Copy link
Contributor

pluser commented Feb 27, 2024

LGTM!

@satoru-takeuchi satoru-takeuchi merged commit 24f8469 into main Feb 27, 2024
7 checks passed
@satoru-takeuchi satoru-takeuchi deleted the add-separate-metrics-for-pv-provision-and-mounting branch February 27, 2024 05:39
ushitora-anqou added a commit that referenced this pull request Feb 29, 2024
In pull request #101, we introduced a new custom resource called
PieProbe. However, the charts/ and test/e2e/ directories were not
synchronized with the auto-generated config/ directory.

This patch fixes the above probelm, by copying the latest PieProbe CRD
from config/ to charts/ and updating the manifests used in the e2e
tests.

Signed-off-by: Ryotaro Banno <ryotaro.banno@gmail.com>
ushitora-anqou added a commit that referenced this pull request Feb 29, 2024
In pull request #101, we introduced a new custom resource called
PieProbe. However, the charts/ and test/e2e/ directories were not
synchronized with the auto-generated config/ directory.

This patch fixes the above probelm, by copying the latest PieProbe CRD
from config/ to charts/ and updating the manifests used in the e2e
tests.

Signed-off-by: Ryotaro Banno <ryotaro.banno@gmail.com>
ushitora-anqou added a commit that referenced this pull request Mar 29, 2024
The new provision observer (provision_observer2.go) introduced in #101 tries to
remove the owned jobs that is failing for a while, but it doesn't work well
because it looks for them by a wrong prefix.

This PR fixes the above problem by using constants.MountProbeNamePrefix and
constants.ProvisionProbeNamePrefix instead of constants.ProbeNamePrefix when
the provision observer looks for the jobs.
ushitora-anqou added a commit that referenced this pull request Mar 29, 2024
The new provision observer (provision_observer2.go) introduced in #101 tries to
remove the owned jobs that is failing for a while, but it doesn't work well
because it looks for them by a wrong prefix.

This PR fixes the above problem by using constants.MountProbeNamePrefix and
constants.ProvisionProbeNamePrefix instead of constants.ProbeNamePrefix when
the provision observer looks for the jobs.

Signed-off-by: Ryotaro Banno <ryotaro.banno@gmail.com>
ushitora-anqou added a commit that referenced this pull request Apr 1, 2024
The new provision observer (provision_observer2.go) introduced in #101 tries to
remove the owned jobs that is failing for a while, but it doesn't work well
because it looks for them by a wrong prefix.

This PR fixes the above problem by using constants.MountProbeNamePrefix and
constants.ProvisionProbeNamePrefix instead of constants.ProbeNamePrefix when
the provision observer looks for the jobs.

Signed-off-by: Ryotaro Banno <ryotaro.banno@gmail.com>
ushitora-anqou added a commit that referenced this pull request Apr 2, 2024
The new provision observer (provision_observer2.go) introduced in #101 tries to
remove the owned jobs that is failing for a while, but it doesn't work well
because it looks for them by a wrong prefix.

This PR fixes the above problem by using constants.MountProbeNamePrefix and
constants.ProvisionProbeNamePrefix instead of constants.ProbeNamePrefix when
the provision observer looks for the jobs.

Signed-off-by: Ryotaro Banno <ryotaro.banno@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

None yet

3 participants