
feat: make hostNetwork configurable #107

Merged
merged 4 commits into from
Jan 12, 2024

Conversation

vadasambar
Contributor

@vadasambar vadasambar commented Jan 11, 2024

  • disable it by default (fixes the port conflict on the node when another pod tries to acquire the same port)

Why

We use hostNetwork: true by default, which causes problems in the following scenario:

  1. another-csi-driver pod is running on the same node A
  2. another-csi-driver uses port 8080
  3. warm-metal pod is scheduled on node A
  4. warm-metal pod tries to use port 8080 as well (for metrics endpoint) but fails because the port is already in use

This happens because we use hostNetwork: true. Based on my tests and analysis (along with integration tests), we don't really need hostNetwork because we are not accessing anything on the node nor is the node accessing anything on the warm-metal pod directly.

More info

hostNetwork has been a part of the initial commit: 515cec1

The liveness probe is a sidecar container that exposes an HTTP /healthz endpoint, which serves as kubelet's livenessProbe hook to monitor health of a CSI driver.

The liveness probe uses the Probe() call to check that the CSI driver is healthy. See the CSI spec for more information about the Probe API call.

https://github.com/kubernetes-csi/livenessprobe/tree/master?tab=readme-ov-file#liveness-probe

So kubelet executes the livenessProbe, which pings the /healthz endpoint to check if the CSI driver is alive, and the livenessprobe server forwards that request to the CSI driver's Probe() endpoint to check if the driver is healthy:
https://github.com/kubernetes-csi/livenessprobe/blob/ebd49b57031ab90f6f376f1472884948f381558c/cmd/livenessprobe/main.go#L169 -> https://github.com/kubernetes-csi/livenessprobe/blob/ebd49b57031ab90f6f376f1472884948f381558c/cmd/livenessprobe/main.go#L73
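
The wiring between kubelet, the livenessprobe sidecar, and the driver can be sketched as a pod spec fragment (the container names, port, and image tag below are illustrative, not the chart's actual values):

```yaml
containers:
- name: liveness-probe            # sidecar serving HTTP /healthz
  image: registry.k8s.io/sig-storage/livenessprobe:v2.12.0  # illustrative tag
  args:
  - --csi-address=/csi/csi.sock   # driver's gRPC socket, shared via socket-dir
  - --health-port=9809
- name: csi-plugin                # the CSI driver itself
  livenessProbe:                  # kubelet probes the sidecar over HTTP...
    httpGet:
      path: /healthz
      port: 9809
  # ...and the sidecar translates each /healthz hit into a gRPC Probe()
  # call on /csi/csi.sock
```

Note that this probe traffic stays inside the pod's own network namespace, so it does not depend on hostNetwork.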

I don't see a need for warm-metal to use hostNetwork since it doesn't need to access anything on the node.

warm-metal can access all the unix sockets via volumeMounts:

...
    volumeMounts:
    - mountPath: /csi
      name: socket-dir
...
    - mountPath: /run/containerd/containerd.sock
      name: runtime-socket
...
  volumes:
  - hostPath:
      path: /var/lib/kubelet/plugins/csi-image.warm-metal.tech
      type: DirectoryOrCreate
    name: socket-dir
...
  - hostPath:
      path: /run/containerd/containerd.sock
      type: Socket
    name: runtime-socket
...
  - name: kube-api-access-xst7n
...

ref: https://github.com/warm-metal/csi-driver-image/blob/master/charts/warm-metal-csi-driver/templates/nodeplugin.yaml
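
Making hostNetwork configurable in the chart could look roughly like the sketch below (the hostNetwork value key is an assumption; check the chart's values.yaml for the key this PR actually introduces):

```yaml
# values.yaml (hypothetical key)
hostNetwork: false   # disabled by default; opt in only if the driver
                     # really must share the node's network namespace

# templates/nodeplugin.yaml (template fragment)
spec:
  template:
    spec:
      hostNetwork: {{ .Values.hostNetwork }}
```

With the default false, the metrics port is bound inside the pod's network namespace and no longer collides with other pods on the node.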

@vadasambar vadasambar marked this pull request as ready for review January 11, 2024 12:26
@vadasambar vadasambar requested a review from a team as a code owner January 11, 2024 12:26
@@ -102,7 +102,8 @@ func (m *PullExecutor) StartPulling(o *PullOptions) error {
 	c, cancel := context.WithTimeout(context.Background(), pullCtxTimeout)
 	defer cancel()
 
-	if pullstatus.Get(o.NamedRef) == pullstatus.StillPulling {
+	if pullstatus.Get(o.NamedRef) == pullstatus.StillPulling ||
+		pullstatus.Get(o.NamedRef) == pullstatus.Pulled {
Contributor Author

This is a fix for what appears to be a race condition when I was testing #81
Basically, when two requests are sent back-to-back with very little time between them, one of them succeeds while the other enters the mutex-protected section and pulls the same image again (if pullAlways: true). I haven't seen this issue outside #81, but I thought it would be good to get the fix in before #81 is merged.

Collaborator
Good catch!

Collaborator

@mugdha-adhav mugdha-adhav left a comment
It would be good if we could also update other references for hostNetwork and set it to false for maintaining consistency.

@vadasambar
Contributor Author

It would be good if we could also update other references for hostNetwork and set it to false for maintaining consistency.

I have replaced the occurrences.

@vadasambar
Contributor Author

Will ping you here once the CI finishes successfully.

@vadasambar
Contributor Author

@mugdha-adhav can you review the PR again?

@vadasambar
Contributor Author

@mugdha-adhav can I ask for a review again 🙏

@mugdha-adhav mugdha-adhav merged commit 2da2ca1 into warm-metal:master Jan 12, 2024
6 checks passed