New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support fort kubelet stats summary #306
Conversation
stat.Network.TxBytes = &bytes | ||
} | ||
stat.Network.Time = metav1.NewTime(data.Timestamp) | ||
stat.Network.InterfaceStats.Name = "eth0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cheating on the name here. Info is not available in the API AFAICT.
Also wanted to make sure that a missing name wasn't causing my issues with metrics-server (it's not...)
providers/azure/aci.go
Outdated
@@ -216,6 +217,7 @@ func NewACIProvider(config string, rm *manager.ResourceManager, nodeName, operat | |||
p.nodeName = nodeName | |||
p.internalIP = internalIP | |||
p.daemonEndpointPort = daemonEndpointPort | |||
p.startTime = time.Now() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cheating here a bit... really was just checking if this was an issue with metrics-server... doesn't appear to be.
b4c0314
to
b06f18b
Compare
Ok, so after adding in some extra logging, it seems metrics-server is not populating pod CPU stats (have to investigate why still as vk is providing it), when the metrics-server API is hit it skips the pod because the pod CPU stats are missing. All in all, the CPU stats are a bit rough for me to wrap my head around how to report. Right now taking the provided average millicores, converting to nano, and multiplying by 60 (stat interval from ACI) to produce the number of nanoseconds of core time for the container... and also having to reset the core time for each stat request. The k8s stats summary does support nanocore usage, but metrics-server (and hipster) don't use it from what I can tell. Anyway, please let me know if you have some suggestions here. |
providers/azure/metrics.go
Outdated
stats "k8s.io/kubernetes/pkg/kubelet/apis/stats/v1alpha1" | ||
) | ||
|
||
const startTimeFormat = "2006-01-01 15:04:05 -0700 MST" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2006-01-02
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah nice catch, that would be why I'm getting negative uptime :)
vkubelet/vkubelet.go
Outdated
@@ -151,6 +151,11 @@ func New(nodeName, operatingSystem, namespace, kubeConfig, taint, provider, prov | |||
} | |||
|
|||
go ApiserverStart(p) | |||
if metricsAddr != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since it has a default value :10255, we don't need expect it's empty
vkubelet/apiserver.go
Outdated
@@ -32,13 +33,50 @@ func ApiserverStart(provider Provider) { | |||
r := mux.NewRouter() | |||
r.HandleFunc("/containerLogs/{namespace}/{pod}/{container}", ApiServerHandler).Methods("GET") | |||
r.HandleFunc("/exec/{namespace}/{pod}/{container}", ApiServerHandlerExec).Methods("POST") | |||
r.HandleFunc("/stats/summary", MetricsSummaryHandler).Methods("GET") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove as they are added in MetricsServerStart function
vkubelet/apiserver.go
Outdated
@@ -32,13 +33,50 @@ func ApiserverStart(provider Provider) { | |||
r := mux.NewRouter() | |||
r.HandleFunc("/containerLogs/{namespace}/{pod}/{container}", ApiServerHandler).Methods("GET") | |||
r.HandleFunc("/exec/{namespace}/{pod}/{container}", ApiServerHandlerExec).Methods("POST") | |||
r.HandleFunc("/stats/summary", MetricsSummaryHandler).Methods("GET") | |||
r.HandleFunc("/stats/summary/", MetricsSummaryHandler).Methods("GET") | |||
r.NotFoundHandler = http.HandlerFunc(NotFound) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
set the NotFoundHandler in the MetricsServer too.
Ok, so the real main issue is that it wants node level stats... I'm not sure if any value makes sense for these stats considering that metrics-server assumes all the pods are on the node. |
I should note metrics-server and heapster were nearly identical codebases until a couple of days ago. So maybe metrics-server HEAD behaves differently, I will have to look into it. |
Ok, this is working. Please pay extra attention to stat calculations. @robbiezhang I'm sure you have more insight into what that cpu stat is actually saying about the container group. |
c24012b
to
b331fbf
Compare
Ok, I've removed WIP from this. |
providers/azure/metrics.go
Outdated
p.metricsSyncTime = time.Now() | ||
}() | ||
|
||
cgs, err := p.aciClient.ListContainerGroups(p.resourceGroup) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should get the pods from resource manager. Ideally, it should be in-sync with the container groups in the resource group. Since it's in-memory, it's more lightweight, and won't be throttled by ARM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to filter out the terminated and pending ones.
providers/azure/client/aci/types.go
Outdated
// MetricMetadataValue stores extra metadata about a metric | ||
// In particular it is used to provide details about the breakdown of a metric dimension. | ||
type MetricMetadataValue struct { | ||
Name ValueDescriptor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
`json:"name"`
providers/azure/metrics.go
Outdated
p.metricsSync.Lock() | ||
defer p.metricsSync.Unlock() | ||
|
||
if time.Now().Sub(p.metricsSyncTime) < 30*time.Second { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 minute? since ACI metrics is PT1M
providers/azure/metrics.go
Outdated
} | ||
|
||
var errGroup errgroup.Group | ||
chResult := make(chan stats.PodStats, len(cgs.Value)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by value? prefer pointer to avoid object copy.
providers/azure/metrics.go
Outdated
|
||
// average is the average number of millicores over a 1 minute interval (which is the interval we are pulling the stats for) | ||
nanoCores := uint64(data.Average * 1000000) | ||
usuageNanoSeconds := nanoCores * 60 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo 'usage'
vkubelet/apiserver.go
Outdated
if metricsAddr != "" { | ||
go MetricsServerStart(metricsAddr) | ||
} else { | ||
log.Println("skipping metrics server startup sicne no address was provided") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: ... startup 'since' no address ...
vkubelet/apiserver.go
Outdated
r.HandleFunc("/stats/summary", MetricsSummaryHandler).Methods("GET") | ||
r.HandleFunc("/stats/summary/", MetricsSummaryHandler).Methods("GET") | ||
r.NotFoundHandler = http.HandlerFunc(NotFound) | ||
http.ListenAndServe(addr, r) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
log the error
vkubelet/apiserver.go
Outdated
} | ||
|
||
if _, err := w.Write(b); err != nil { | ||
io.WriteString(w, err.Error()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if it still fail?
@@ -34,11 +37,58 @@ func ApiserverStart(provider Provider) { | |||
r.HandleFunc("/exec/{namespace}/{pod}/{container}", ApiServerHandlerExec).Methods("POST") | |||
r.NotFoundHandler = http.HandlerFunc(NotFound) | |||
|
|||
if metricsAddr != "" { | |||
go MetricsServerStart(metricsAddr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we move the Metrics Server into a new file, and start it from the vkubelet.go?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I didn't do this for now is right now ApiServerStart
is setting a global (for provider) and these will race with each other.
I'd rather refactor this in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some minor issues and typos
411c11f
to
2960aa2
Compare
This is all fixed up. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. do you want to merge #323 first, and integrate with the new logging in this PR?
Yeah, that might be easier 😃 |
This adds a new, optional, interface for providers that want to provide stats.
This is all integrated, merging. |
Is the stats endpoint running now for the virtual kubelet? I see from the metrics server that it is still not able to scrap metrics because it can not find the stats endpoint. The virtual kubelet version is 1.11.2. |
Add support fort kubelet stats summary
This allows hipster or metrics-server to scrape stats from VK for providers that support it.
Of course added an implementation for the Azure provider.
By default this enables a new HTTP listener on
:10255
with a/stats/summary
endpoint.hitting this endpoint will give a result like this:
Everything is great... except it's not actually working. metrics-server is scraping that stats but not storing the results.
This leads to, in the above case, a message in the metrics server logs when you try to fetch metrics from it like
reststorage.go:93] No metrics for pod default/nginx
, even though I know it (or something... something else for me to check) grabbing metrics from VK.Posting this here for review/discussion while I try to figuring out why the stats are not actually being picked up. Also please let me know if you have some ideas as to what might be causing this.