Add support fort kubelet stats summary #306

Merged
merged 3 commits into from Aug 20, 2018

@cpuguy83 cpuguy83 commented Aug 9, 2018

This allows heapster or metrics-server to scrape stats from VK for providers that support it.
I've also added an implementation for the Azure provider.

By default this enables a new HTTP listener on :10255 with a /stats/summary endpoint.
Hitting this endpoint gives a result like this:

{
  "node": {
    "nodeName": "vk-metrics",
    "startTime": "2018-08-08T23:52:26Z"
  },
  "pods": [
    {
      "podRef": {
        "name": "nginx",
        "namespace": "default",
        "uid": "4305386a-9b66-11e8-b3ec-000d3a5dae4d"
      },
      "startTime": "2018-08-01T23:53:34Z",
      "containers": [
        {
          "name": "nginx",
          "startTime": "2018-08-01T23:53:34Z",
          "cpu": {
            "time": "2018-08-09T00:14:00Z",
            "usageNanoCores": 0
          },
          "memory": {
            "time": "2018-08-09T00:14:00Z",
            "usageBytes": 23470080
          },
          "userDefinedMetrics": null
        }
      ],
      "cpu": {
        "time": "2018-08-09T00:14:00Z",
        "usageNanoCores": 0
      },
      "memory": {
        "time": "2018-08-09T00:14:00Z",
        "usageBytes": 23470080
      },
      "network": {
        "time": "2018-08-09T00:14:00Z",
        "name": "",
        "rxBytes": 254,
        "txBytes": 172
      }
    }
  ]
}
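
For anyone consuming this endpoint, here is a minimal sketch of decoding a response shaped like the one above. The struct types are a hypothetical, trimmed-down subset written for illustration (they are not the `stats/v1alpha1` types from Kubernetes), and the sample payload is a shortened copy of the output shown above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// usage, podStats and summary are hypothetical types that model only
// the fields this sketch reads from the summary document.
type usage struct {
	UsageNanoCores *uint64 `json:"usageNanoCores"`
	UsageBytes     *uint64 `json:"usageBytes"`
}

type podStats struct {
	PodRef struct {
		Name      string `json:"name"`
		Namespace string `json:"namespace"`
	} `json:"podRef"`
	CPU    *usage `json:"cpu"`
	Memory *usage `json:"memory"`
}

type summary struct {
	Node struct {
		NodeName string `json:"nodeName"`
	} `json:"node"`
	Pods []podStats `json:"pods"`
}

// sample is a shortened copy of the response shown in the description.
const sample = `{
  "node": {"nodeName": "vk-metrics"},
  "pods": [{
    "podRef": {"name": "nginx", "namespace": "default"},
    "cpu": {"usageNanoCores": 0},
    "memory": {"usageBytes": 23470080}
  }]
}`

// decodeSummary reads one summary document from r.
func decodeSummary(r io.Reader) (*summary, error) {
	var s summary
	if err := json.NewDecoder(r).Decode(&s); err != nil {
		return nil, err
	}
	return &s, nil
}

// podMemory maps "namespace/name" to usageBytes, skipping pods that
// do not report memory at all.
func podMemory(s *summary) map[string]uint64 {
	out := make(map[string]uint64)
	for _, p := range s.Pods {
		if p.Memory == nil || p.Memory.UsageBytes == nil {
			continue
		}
		out[p.PodRef.Namespace+"/"+p.PodRef.Name] = *p.Memory.UsageBytes
	}
	return out
}

// sampleMemory decodes the embedded sample and extracts memory usage.
func sampleMemory() (map[string]uint64, error) {
	s, err := decodeSummary(strings.NewReader(sample))
	if err != nil {
		return nil, err
	}
	return podMemory(s), nil
}

func main() {
	m, err := sampleMemory()
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	for pod, bytes := range m {
		fmt.Printf("%s uses %d bytes\n", pod, bytes)
	}
}
```

Pointer fields are used so that a missing stat can be told apart from a reported value of zero.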

Everything is great... except it's not actually working. metrics-server is scraping the stats but not storing the results.
In the above case, this leads to a message like `reststorage.go:93] No metrics for pod default/nginx` in the metrics-server logs when you try to fetch metrics from it, even though I know it (or something; that's something else for me to check) is grabbing metrics from VK.

Posting this here for review/discussion while I try to figure out why the stats are not actually being picked up. Also, please let me know if you have any ideas as to what might be causing this.

stat.Network.TxBytes = &bytes
}
stat.Network.Time = metav1.NewTime(data.Timestamp)
stat.Network.InterfaceStats.Name = "eth0"
Contributor Author

Cheating on the name here. Info is not available in the API AFAICT.
Also wanted to make sure that a missing name wasn't causing my issues with metrics-server (it's not...)

@@ -216,6 +217,7 @@ func NewACIProvider(config string, rm *manager.ResourceManager, nodeName, operat
p.nodeName = nodeName
p.internalIP = internalIP
p.daemonEndpointPort = daemonEndpointPort
p.startTime = time.Now()
Contributor Author

Cheating here a bit... really was just checking if this was an issue with metrics-server... doesn't appear to be.

@cpuguy83
Contributor Author

Ok, so after adding some extra logging, it seems metrics-server is not populating pod CPU stats (I still have to investigate why, since VK is providing them). When the metrics-server API is hit, it skips the pod because the pod CPU stats are missing.

All in all, I'm finding it hard to wrap my head around how to report the CPU stats. Right now I'm taking the provided average millicores, converting to nanocores, and multiplying by 60 (the stat interval from ACI, in seconds) to produce the number of nanoseconds of core time for the container... and I also have to reset the core time for each stat request.

The k8s stats summary does support nanocore usage, but metrics-server (and heapster) don't use it from what I can tell.

Anyway, please let me know if you have some suggestions here.
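
To make the arithmetic described above concrete, here is a small sketch of the conversion. The function and constant names are made up for illustration. It only assumes what the comment states: ACI reports an average in millicores over a 60 second interval.

```go
package main

import "fmt"

// aciIntervalSeconds is the ACI stat interval mentioned in the comment above.
const aciIntervalSeconds = 60

// millicoresToNanoCores converts an average expressed in millicores to
// nanocores: 1 millicore = 1e-3 cores = 1e6 nanocores.
func millicoresToNanoCores(avgMillicores float64) uint64 {
	return uint64(avgMillicores * 1e6)
}

// usageCoreNanoSeconds returns the core time consumed when running at
// nanoCores for the given number of seconds. N nanocores sustained for
// one second is N nanoseconds of core time, so the result is a product.
func usageCoreNanoSeconds(nanoCores, seconds uint64) uint64 {
	return nanoCores * seconds
}

func main() {
	// 250 millicores is a quarter of a core.
	nano := millicoresToNanoCores(250)
	total := usageCoreNanoSeconds(nano, aciIntervalSeconds)
	fmt.Println(nano, total)
}
```

As a sanity check, a quarter of a core for 60 seconds is 15 seconds of core time, which is 15,000,000,000 nanoseconds.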

stats "k8s.io/kubernetes/pkg/kubelet/apis/stats/v1alpha1"
)

const startTimeFormat = "2006-01-01 15:04:05 -0700 MST"
Collaborator

2006-01-02

Contributor Author

Ah nice catch, that would be why I'm getting negative uptime :)
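
For readers less familiar with Go's reference-time layouts, here is a short sketch of why the `01` versus `02` difference matters. In a layout, `01` stands for the month and `02` for the day, so a layout that repeats `01` parses the month twice and never reads the day. The input strings and the `parseDay` helper are made up for illustration.

```go
package main

import (
	"fmt"
	"time"
)

const (
	// buggyLayout repeats "01" (month), so the day field is never parsed.
	buggyLayout = "2006-01-01 15:04:05 -0700 MST"
	// fixedLayout uses "02" for the day of the month.
	fixedLayout = "2006-01-02 15:04:05 -0700 MST"
)

// parseDay parses value with layout and returns the day of the month.
func parseDay(layout, value string) (int, error) {
	t, err := time.Parse(layout, value)
	if err != nil {
		return 0, err
	}
	return t.Day(), nil
}

func main() {
	inputs := []string{
		"2018-08-08 23:52:26 +0000 UTC", // day 08 also happens to be a valid month
		"2018-08-14 22:58:00 +0000 UTC", // day 14 is not a valid month
	}
	for _, in := range inputs {
		for _, layout := range []string{buggyLayout, fixedLayout} {
			day, err := parseDay(layout, in)
			fmt.Printf("layout=%q input=%q day=%d err=%v\n", layout, in, day, err)
		}
	}
}
```

With the buggy layout, the first input parses without an error but loses its day, and the second fails outright because 14 is out of range for a month.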

@@ -151,6 +151,11 @@ func New(nodeName, operatingSystem, namespace, kubeConfig, taint, provider, prov
}

go ApiserverStart(p)
if metricsAddr != "" {
Collaborator

Since it has a default value of :10255, we don't need to expect it to be empty.

@@ -32,13 +33,50 @@ func ApiserverStart(provider Provider) {
r := mux.NewRouter()
r.HandleFunc("/containerLogs/{namespace}/{pod}/{container}", ApiServerHandler).Methods("GET")
r.HandleFunc("/exec/{namespace}/{pod}/{container}", ApiServerHandlerExec).Methods("POST")
r.HandleFunc("/stats/summary", MetricsSummaryHandler).Methods("GET")
Collaborator

Remove these, as they are added in the MetricsServerStart function.

@@ -32,13 +33,50 @@ func ApiserverStart(provider Provider) {
r := mux.NewRouter()
r.HandleFunc("/containerLogs/{namespace}/{pod}/{container}", ApiServerHandler).Methods("GET")
r.HandleFunc("/exec/{namespace}/{pod}/{container}", ApiServerHandlerExec).Methods("POST")
r.HandleFunc("/stats/summary", MetricsSummaryHandler).Methods("GET")
r.HandleFunc("/stats/summary/", MetricsSummaryHandler).Methods("GET")
r.NotFoundHandler = http.HandlerFunc(NotFound)
Collaborator

set the NotFoundHandler in the MetricsServer too.

@cpuguy83
Contributor Author

Ok, so the real main issue is that it wants node level stats...

https://github.com/kubernetes-incubator/metrics-server/blob/f90c6705d2381ea2db1a6343da6c400bd2ef4cb2/metrics/storage/util/util.go#L27-L34

I'm not sure if any value makes sense for these stats considering that metrics-server assumes all the pods are on the node.

@cpuguy83
Contributor Author

I should note that metrics-server and heapster were nearly identical codebases until a couple of days ago, so maybe metrics-server HEAD behaves differently. I will have to look into it.

@cpuguy83
Contributor Author

Ok, this is working.

Please pay extra attention to stat calculations. @robbiezhang I'm sure you have more insight into what that cpu stat is actually saying about the container group.

@cpuguy83 cpuguy83 force-pushed the metrics branch 3 times, most recently from c24012b to b331fbf Compare August 14, 2018 22:58
@cpuguy83 cpuguy83 changed the title from "[WIP] Add support fort kubelet stats summary" to "Add support fort kubelet stats summary" Aug 14, 2018
@cpuguy83
Contributor Author

Ok, I've removed WIP from this.

p.metricsSyncTime = time.Now()
}()

cgs, err := p.aciClient.ListContainerGroups(p.resourceGroup)
Collaborator

I think we should get the pods from resource manager. Ideally, it should be in-sync with the container groups in the resource group. Since it's in-memory, it's more lightweight, and won't be throttled by ARM.

Collaborator

You need to filter out the terminated and pending ones.

// MetricMetadataValue stores extra metadata about a metric
// In particular it is used to provide details about the breakdown of a metric dimension.
type MetricMetadataValue struct {
Name ValueDescriptor
Collaborator

`json:"name"`

p.metricsSync.Lock()
defer p.metricsSync.Unlock()

if time.Now().Sub(p.metricsSyncTime) < 30*time.Second {
Collaborator

1 minute? The ACI metrics interval is PT1M.
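
To illustrate the trade-off being discussed, here is a sketch of a mutex-guarded, time-based cache similar in spirit to the snippet under review. The `statsCache` type, its fields, and the injectable clock are assumptions made for this example and are not the provider's actual code.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// statsCache is a hypothetical helper that re-fetches a value only when
// the previous fetch is at least ttl old.
type statsCache struct {
	mu       sync.Mutex
	ttl      time.Duration
	lastSync time.Time
	value    string
	fetch    func() string    // expensive upstream call
	now      func() time.Time // injectable clock so the example is deterministic
}

// Get returns the cached value while it is fresh and refreshes it otherwise.
func (c *statsCache) Get() string {
	c.mu.Lock()
	defer c.mu.Unlock()
	if !c.lastSync.IsZero() && c.now().Sub(c.lastSync) < c.ttl {
		return c.value
	}
	c.value = c.fetch()
	c.lastSync = c.now()
	return c.value
}

// countFetches calls Get at each offset (in seconds) on a fake clock and
// reports how many of those calls reached the upstream fetch function.
func countFetches(ttlSeconds int, offsetsSeconds ...int) int {
	base := time.Date(2018, 8, 9, 0, 14, 0, 0, time.UTC)
	current := base
	fetches := 0
	c := &statsCache{
		ttl:   time.Duration(ttlSeconds) * time.Second,
		fetch: func() string { fetches++; return "stats" },
		now:   func() time.Time { return current },
	}
	for _, off := range offsetsSeconds {
		current = base.Add(time.Duration(off) * time.Second)
		c.Get()
	}
	return fetches
}

func main() {
	// Four scrapes at 0s, 30s, 59s and 61s.
	fmt.Println("ttl=30s fetches:", countFetches(30, 0, 30, 59, 61))
	fmt.Println("ttl=60s fetches:", countFetches(60, 0, 30, 59, 61))
}
```

If the upstream data only changes once per minute, a shorter TTL mostly produces extra upstream calls that return the same numbers, which appears to be the point of the suggestion.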

}

var errGroup errgroup.Group
chResult := make(chan stats.PodStats, len(cgs.Value))
Collaborator

by value? prefer pointer to avoid object copy.


// average is the average number of millicores over a 1 minute interval (which is the interval we are pulling the stats for)
nanoCores := uint64(data.Average * 1000000)
usuageNanoSeconds := nanoCores * 60
Collaborator

typo 'usage'

if metricsAddr != "" {
go MetricsServerStart(metricsAddr)
} else {
log.Println("skipping metrics server startup sicne no address was provided")
Collaborator

typo: ... startup 'since' no address ...

r.HandleFunc("/stats/summary", MetricsSummaryHandler).Methods("GET")
r.HandleFunc("/stats/summary/", MetricsSummaryHandler).Methods("GET")
r.NotFoundHandler = http.HandlerFunc(NotFound)
http.ListenAndServe(addr, r)
Collaborator

log the error
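
One possible shape for this suggestion is sketched below. `http.ListenAndServe` always returns a non-nil error when it stops, so the sketch logs it and also hands it back to the caller. The function names are hypothetical, and the standard library mux stands in for the gorilla/mux router used in the PR.

```go
package main

import (
	"io"
	"log"
	"net/http"
)

// newMetricsMux is a hypothetical stand-in for the router built in the PR.
func newMetricsMux() *http.ServeMux {
	m := http.NewServeMux()
	m.HandleFunc("/stats/summary", func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, "{}\n") // placeholder body for the sketch
	})
	return m
}

// serveMetrics blocks while serving. When ListenAndServe returns, its error
// is logged rather than dropped, and is also returned to the caller.
func serveMetrics(addr string) error {
	err := http.ListenAndServe(addr, newMetricsMux())
	log.Printf("metrics server on %q stopped: %v", addr, err)
	return err
}

func main() {
	// An address without a port fails at listen time, so this call returns
	// immediately instead of blocking.
	if err := serveMetrics("not-a-valid-address"); err != nil {
		log.Println("startup failed, as expected in this demo")
	}
}
```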

}

if _, err := w.Write(b); err != nil {
io.WriteString(w, err.Error())
Collaborator

What if it still fails?
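
One way to answer this question, offered only as a sketch and not as what the PR ended up doing: marshal before touching the response so that an encoding failure can still become a 500, and treat a failed `Write` as log-only, since a second write to the same connection is unlikely to succeed. The `writeJSON` and `record` helpers are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
)

// writeJSON marshals v first. A marshal error is reported to the client as
// a 500 because nothing has been written yet. A Write error is only logged,
// because the response is already in flight at that point.
func writeJSON(w http.ResponseWriter, v interface{}) {
	b, err := json.Marshal(v)
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	if _, err := w.Write(b); err != nil {
		log.Printf("writing stats summary response: %v", err)
	}
}

// record runs writeJSON against an in-memory recorder and returns the
// resulting status code and body.
func record(v interface{}) (int, string) {
	rec := httptest.NewRecorder()
	writeJSON(rec, v)
	return rec.Code, rec.Body.String()
}

func main() {
	code, body := record(map[string]string{"nodeName": "vk-metrics"})
	fmt.Println(code, body)

	// Channels cannot be marshaled to JSON, which forces the error path.
	code, body = record(make(chan int))
	fmt.Println(code, body)
}
```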

@@ -34,11 +37,58 @@ func ApiserverStart(provider Provider) {
r.HandleFunc("/exec/{namespace}/{pod}/{container}", ApiServerHandlerExec).Methods("POST")
r.NotFoundHandler = http.HandlerFunc(NotFound)

if metricsAddr != "" {
go MetricsServerStart(metricsAddr)
Collaborator

can we move the Metrics Server into a new file, and start it from the vkubelet.go?

Contributor Author

The reason I didn't do this for now is that ApiServerStart currently sets a global (for the provider), and the two servers would race with each other on it.
I'd rather refactor this in a separate PR.

Collaborator

@robbiezhang robbiezhang left a comment

some minor issues and typos

@cpuguy83 cpuguy83 force-pushed the metrics branch 2 times, most recently from 411c11f to 2960aa2 Compare August 17, 2018 20:51
@cpuguy83
Contributor Author

This is all fixed up.

Collaborator

@robbiezhang robbiezhang left a comment

LGTM. Do you want to merge #323 first and integrate with the new logging in this PR?

@cpuguy83
Contributor Author

Yeah, that might be easier 😃

@cpuguy83
Contributor Author

This is all integrated, merging.

@mcraken

mcraken commented Oct 18, 2018

Is the stats endpoint running now for the virtual kubelet? I can see that metrics-server is still unable to scrape metrics because it cannot find the stats endpoint. The virtual kubelet version is 1.11.2.

cpuguy83 added a commit to cpuguy83/virtual-kubelet that referenced this pull request Feb 27, 2019
Add support fort kubelet stats summary