Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

automatically expose pprof+prometheus endpoint. #677

Merged
merged 8 commits into from Mar 18, 2020
Merged

Conversation

raulk
Copy link
Contributor

@raulk raulk commented Mar 12, 2020

With this change, test plans using runtime.Invoke() will automatically benefit from pprof and prometheus HTTP exports. Fixes #673.

Works in all runneres (local:exec, local:docker, cluster:k8s (via k8s port forwarding).

Fixes #698.

@raulk raulk changed the title automatically expose pprof+prometheus endpoint. [DO NOT MERGE] automatically expose pprof+prometheus endpoint. Mar 12, 2020
@raulk
Copy link
Contributor Author

raulk commented Mar 12, 2020

TODO:

  • publish ports on local:docker.
  • make sure sidecar doesn't annul the routes.

@raulk raulk added this to the Testground v0.3 milestone Mar 12, 2020
pkg/build/golang/Dockerfile.template Show resolved Hide resolved
manifests/placebo.toml Show resolved Hide resolved
pkg/docker/container.go Show resolved Hide resolved
pkg/runner/local_docker.go Show resolved Hide resolved
Comment on lines 177 to 194
// Allow traffic from the host machine (acting as the gateway).
var (
gwIPs = make(map[string]struct{})
gwRoutes []netlink.Route
)
for _, route := range controlRoutes {
if _, ok := gwIPs[route.Gw.String()]; ok {
continue
}
nlroutes, err := netlinkHandle.RouteGet(route.Gw)
if err != nil {
return nil, fmt.Errorf("failed to resolve route: %w", err)
}
gwIPs[route.Gw.String()] = struct{}{}
gwRoutes = append(gwRoutes, nlroutes...)
}

controlRoutes = append(controlRoutes, gwRoutes...)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the key. In kubernetes we don't need this, because kubernetes has native port-forwarding out of the box. In our case, we connect our test containers to the control and data networks. The former is tightly restricted, and the latter is internal and connects only the containers participating in the test. For containers to successfully expose ports to the host, besides publishing those ports, we need to ALLOW traffic from the gateway (host).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is actually added here?
I think this would be just a route to the single host, the route to the host, right?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by the way, I see this route table on plan containers:

docker exec tg-example-prometheus2-f3de125372b3-single-1 ip route
default via 192.18.0.1 dev eth0 
16.0.0.0/16 dev eth1 proto kernel scope link src 16.0.0.2 
192.18.0.0/16 dev eth0 proto kernel scope link src 192.18.0.6 

they can ping google

docker exec tg-example-prometheus2-f3de125372b3-single-1 ping google.com
PING google.com (172.217.1.142) 56(84) bytes of data.
64 bytes from dfw28s23-in-f14.1e100.net (172.217.1.142): icmp_seq=1 ttl=52 time=97.6 ms
64 bytes from dfw28s23-in-f14.1e100.net (172.217.1.142): icmp_seq=2 ttl=52 time=104 ms

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for comparison... Without this change, from the current master,
the route table looks like this:

docker exec tg-example-prometheus2-19979292b320-single-1 ip route
16.1.0.0/16 dev eth1 proto kernel scope link src 16.1.0.3 
192.18.0.3 dev eth0 src 192.18.0.7 
192.18.0.4 dev eth0 src 192.18.0.7 

and there is no default route, so no place to send packets outside of known networks.

docker exec tg-example-prometheus2-19979292b320-single-1 ping -c 2 google.com
ping: google.com: Temporary failure in name resolution

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for spotting this. Definitely unintended. I've fixed this logic in eaa06cd. Can you take it out for a spin again? Works for me.

  • The host can access ports exposed by the container.
  • The container cannot access public IPs.

@raulk raulk changed the title [DO NOT MERGE] automatically expose pprof+prometheus endpoint. automatically expose pprof+prometheus endpoint. Mar 17, 2020
@raulk raulk requested a review from coryschwartz March 17, 2020 13:53
Sometimes the flush could end up sending a nil bytes slice
to TgWriter, and this would make it as a nil progress payload
on the wire.
manifests/placebo.toml Show resolved Hide resolved
pkg/docker/container.go Show resolved Hide resolved
Comment on lines 177 to 194
// Allow traffic from the host machine (acting as the gateway).
var (
gwIPs = make(map[string]struct{})
gwRoutes []netlink.Route
)
for _, route := range controlRoutes {
if _, ok := gwIPs[route.Gw.String()]; ok {
continue
}
nlroutes, err := netlinkHandle.RouteGet(route.Gw)
if err != nil {
return nil, fmt.Errorf("failed to resolve route: %w", err)
}
gwIPs[route.Gw.String()] = struct{}{}
gwRoutes = append(gwRoutes, nlroutes...)
}

controlRoutes = append(controlRoutes, gwRoutes...)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is actually added here?
I think this would be just a route to the single host, the route to the host, right?

pkg/docker/container.go Show resolved Hide resolved
manifests/placebo.toml Show resolved Hide resolved
Comment on lines 177 to 194
// Allow traffic from the host machine (acting as the gateway).
var (
gwIPs = make(map[string]struct{})
gwRoutes []netlink.Route
)
for _, route := range controlRoutes {
if _, ok := gwIPs[route.Gw.String()]; ok {
continue
}
nlroutes, err := netlinkHandle.RouteGet(route.Gw)
if err != nil {
return nil, fmt.Errorf("failed to resolve route: %w", err)
}
gwIPs[route.Gw.String()] = struct{}{}
gwRoutes = append(gwRoutes, nlroutes...)
}

controlRoutes = append(controlRoutes, gwRoutes...)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by the way, I see this route table on plan containers:

docker exec tg-example-prometheus2-f3de125372b3-single-1 ip route
default via 192.18.0.1 dev eth0 
16.0.0.0/16 dev eth1 proto kernel scope link src 16.0.0.2 
192.18.0.0/16 dev eth0 proto kernel scope link src 192.18.0.6 

they can ping google

docker exec tg-example-prometheus2-f3de125372b3-single-1 ping google.com
PING google.com (172.217.1.142) 56(84) bytes of data.
64 bytes from dfw28s23-in-f14.1e100.net (172.217.1.142): icmp_seq=1 ttl=52 time=97.6 ms
64 bytes from dfw28s23-in-f14.1e100.net (172.217.1.142): icmp_seq=2 ttl=52 time=104 ms

Comment on lines 177 to 194
// Allow traffic from the host machine (acting as the gateway).
var (
gwIPs = make(map[string]struct{})
gwRoutes []netlink.Route
)
for _, route := range controlRoutes {
if _, ok := gwIPs[route.Gw.String()]; ok {
continue
}
nlroutes, err := netlinkHandle.RouteGet(route.Gw)
if err != nil {
return nil, fmt.Errorf("failed to resolve route: %w", err)
}
gwIPs[route.Gw.String()] = struct{}{}
gwRoutes = append(gwRoutes, nlroutes...)
}

controlRoutes = append(controlRoutes, gwRoutes...)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for comparison... Without this change, from the current master,
the route table looks like this:

docker exec tg-example-prometheus2-19979292b320-single-1 ip route
16.1.0.0/16 dev eth1 proto kernel scope link src 16.1.0.3 
192.18.0.3 dev eth0 src 192.18.0.7 
192.18.0.4 dev eth0 src 192.18.0.7 

and there is no default route, so no place to send packets outside of known networks.

docker exec tg-example-prometheus2-19979292b320-single-1 ping -c 2 google.com
ping: google.com: Temporary failure in name resolution

@raulk raulk requested a review from coryschwartz March 18, 2020 11:38
@raulk
Copy link
Contributor Author

raulk commented Mar 18, 2020

Re-requested review from @coryschwartz. I've rectified the logic that whitelists the host's IP addr.

Copy link

@coryschwartz coryschwartz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Appears to work, I can hit http://<plan_ip>:6060/metrics and it responds with output,

The plan can't ping outside the network. LGTM

switch {
case len(pub) == 0:
logging.S().Warnw("failed to discover gateway/host address; no routes to public IPs", "container_id", container.ID)
case pub[0].Gw == nil:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this change, I have the following routing table in the plans:

docker exec tg-example-prometheus2-a0bf3767448d-single-9 ip r
16.2.0.0/16 dev eth1 proto kernel scope link src 16.2.0.3 
192.18.0.1 dev eth0 src 192.18.0.6 
192.18.0.3 dev eth0 src 192.18.0.6 
192.18.0.4 dev eth0 src 192.18.0.6 

Very nice, no default gateway.

and the outside is un-pingable, also very nice.

docker exec tg-example-prometheus2-a0bf3767448d-single-9 ping 1.1.1.1
connect: Network is unreachable

// Sidecar doesn't whitelist traffic to public addresses, but it special-cases
// traffic between the container and the host, so that pprof, metrics and other
// ports can be exposed to the Docker host.
var PublicAddr = net.ParseIP("1.1.1.1")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the IP address of Cloudfliar DNS. I don't have a problem with that, just pointing that out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that's ok. This will not perform any lookup of any kind, it just matches against the kernel routing tables.

@@ -44,6 +44,10 @@ type Error struct {
}

func (tgw *TgWriter) Write(p []byte) (n int, err error) {
if p == nil {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this prevent errors from popping up?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It suppresses empty writes from making it onto the wire, which I found can happen when Flush is called. This was leading to the intermittent bug with the describe command.

@raulk raulk merged commit e16f9ca into master Mar 18, 2020
@raulk raulk deleted the feat/automatic-pprof branch March 18, 2020 18:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants