Istio and OpenCensus 101 - Lightning Demo

This is the code I use for my Istio 101 talk and Istio and OpenCensus talk. Please take a look! I assume some prior knowledge of Kubernetes, but it's not totally required.

Talk Video:

Want to get started with one click? Open this repo in Cloud Shell. It's free and has everything you need to get started.

TL;DR - I want to skip setup

Run this:

make create-cluster deploy-istio build push deploy-stuff

Run this in another terminal:

make start-monitoring-services

Cluster Setup

You need a Kubernetes 1.10 or newer cluster.

You will also need Docker and kubectl 1.9.x or newer installed on your machine, as well as the Google Cloud SDK. You can install the Google Cloud SDK (which will also install kubectl) here.

To create the cluster with Google Kubernetes Engine, run this command:

make create-cluster

This will create a cluster called "my-istio-cluster" with 4 nodes in the us-west1-b region. This will deploy into the current active project set in your Google Cloud SDK. You can change this by passing in a custom value for the Project ID and/or Zone.

make create-cluster PROJECT_ID=your-custom-id-here ZONE=your-custom-zone

Istio Setup

This project assumes you are running on x64 Linux. If you are running on another platform, I highly reccomend using Google Cloud Shell to get a free x64 Linux environment.

To deploy Istio into the cluster, run

make deploy-istio

This will deploy the Istio services and control plane into your Kubernetes Cluster. Istio will create its own Kubernetes Namespace and a bunch of services and deployments. In addition, this command will install helper services. Jaeger for tracing, Prometheus for monitoring, Servicegraph to visualize your microservices, and Grafana for viewing metrics.

Start Helper Services

Run this in another terminal:

make start-monitoring-services

This will create tunnels into your Kubernetes cluster for Jaeger, Servicegraph, and Grafana. This command will not exit as it keeps the connection open.

Create Docker Container

To build and push the code, run:

make build push

This will create the Docker containers and push it up to your Google Container Registry.

Again, you can pass in a custom project ID, but make sure it is the same as before:

make build push PROJECT_ID=your-custom-id-here

Note: This will build three containers, one for the vanilla Istio demo, and two for the OpenCensus demo.

Deploy Kubernetes Services

This will create the three Deployments and the three Services.

make deploy-stuff

Again, you can pass in a custom project ID, but make sure it is the same as before:

make deploy-stuff PROJECT_ID=your-custom-id-here

The Code

The above command deployed three microservices all running the same code, with a different configuration for each. The code is super simple, all it does it make a request to a downstream service, takes the result, and concatenates it with the its own name, some latency information, and the downstream URL.

This is a great demo app for Istio, because you can chain together an "infinite" number of these to create deep trees of services that simulate real microservice deployments.

Using Istio

Let's see the Kubernetes resources:

make get-stuff

You should see something like this:

kubectl get pods && kubectl get svc && kubectl get ingress
NAME                                 READY     STATUS    RESTARTS   AGE
backend-prod-1666293437-dcrnp        2/2       Running   0          21m
frontend-prod-3237543857-g8fpp       2/2       Running   0          22m
middleware-canary-2932750245-cj8l6   2/2       Running   0          21m
middleware-prod-1206955183-4rbpt     2/2       Running   0          21m
NAME         CLUSTER-IP    EXTERNAL-IP       PORT(S)        AGE
backend      10.3.252.16   <none>            80/TCP         22m
frontend     10.3.248.79   <none>            80/TCP         22m
kubernetes   10.3.240.1    <none>            443/TCP        23m
middleware   10.3.251.46   <none>            80/TCP         22m
NAME                   TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)                                                                                                     AGE
istio-ingressgateway   LoadBalancer   10.3.246.226   35.XXX.XXX.XXX  80:31380/TCP,443:31390/TCP,31400:31400/TCP,15011:31754/TCP,8060:32056/TCP,15030:31770/TCP,15031:32112/TCP   2m

Where is Istio?

Aside from the "istio-ingressgateway", you might notice there is no trace of Istio to be seen. This is because the Istio control plane is launched into its own Kubernetes Namespace. You can see the Istio resources with this command:

kubectl get pods --namespace=istio-system

We have launched Istio in "auto inject" mode. After the Makefile deployed Istio, the Makefile ran this command:

kubectl label namespace default istio-injection=enabled

This means that any Pods that Kubernetes creates in the default namespace will automatically get a Istio sidecar proxy attached to it. This proxy will enforce Istio policies without any action from the app! You can also run Istio in the normal mode, and add the proxy into the Kubernetes YAML manually. Again, there is no change to the app, but the Kubernetes Deployment is manually patched. This is useful if you want some services to bypass Istio.

Trying it out

You may notice that none of our services have External IPs. This is because we want Istio to manage all inboud traffic. To do this, Istio uses an Ingress Gateways.

Visit the External IP of the "istio-ingressgateway". At this point, you should get an error!

Don't worry! Everything is ok!

Right now, the gateway is not set-up, so it is dropping all traffic at the edge. Let's set it up with this command:

kubectl create -f ./configs/istio/ingress.yaml

This file contains two objects. The first object is a Gateway, which will allow us to bind to the "istio-ingressgateway" that exists in the cluster. The second object is a VirtualService, which let's us apply routing rules. Because we are using a wildcard (*) charater for the host and only one route rule, all traffic from this gateway to the frontend service.

Visit the IP address again, and you should see that it is working!

frontend-prod - 0.287secs
http://middleware/ -> middleware-canary - 0.241secs
http://backend/ -> backend-prod - 0.174secs
http://time.jsontest.com/ -> StatusCodeError: 404 - ""

You can see that the frontend service requests the middleware service, which requests the backend service, which finally requests time.jsontest.com

Fixing the 404

You might notice that time.jsontest.com is returning a 404. This is because by default, Istio blocks all Egress traffic out of the cluster. This is a great security practice, as it prevents malicious code from calling home or your code from talking to unverified 3rd party services.

Let's unblock time.jsontest.com by creating a ServiceEntry.

You can see the ServiceEntry that we are creating here, and notice that it allows both HTTP and HTTPS access to time.jsontest.com

To apply this rule, run:

kubectl apply -f ./configs/istio/egress.yaml

Now, you should see the services fully working!

frontend-prod - 0.172secs
http://middleware/ -> middleware-canary - 0.154secs
http://backend/ -> backend-prod - 0.142secs
http://time.jsontest.com/ -> {
   "time": "12:43:09 AM",
   "milliseconds_since_epoch": 1515026589163,
   "date": "01-04-2018"
}

Fixing the Flip Flop

If you refresh the page enough times, you might see a small change between loads. Sometimes, the middleware service is called middleware-canary and sometimes is it called middleware-prod

This is because there is a single Kubernetes service called middleware sending traffic to two deployments (called middleware-prod and middleware-canary). This is a really powerful feature that let's you do things like Blue-Green deployments and Canary testing. Using Servicegraph, we can actually see this visually.

Open Servicegraph: http://localhost:8088/dotviz

You should see something like this:

You can see that traffic goes from frontend (prod), and then is sent to either the middleware (prod) or middleware (canary), and then finally goes backend (prod)

With Istio, you can control where traffic goes using a VirtualService. For example, you can send 80% of the all traffic that is going to the middleware service to prod and 20% to canary. In this case, let's set 100% of the traffic to prod.

You can see the VirtualServices here.

There are a few interesting things about this file.

You might notice that the frontend VirtualService looks different than the others. This is because we are binding the frontend service to the gateway.

Next comes the way we are making all traffic go to the "prod" deployment. This is the key section for the middleware VirtualService is:

  http:
  - route:
    - destination:
        host: middleware
        subset: prod

You might be wondering, where is this "subset" defined? How does Istio know what "prod" is?

You can define these subsets in something called a DestinationRule.

Our DestinationRules are defined here. Let's look at the important part of the middleware rule:

spec:
  host: middleware
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL
  subsets:
  - name: prod
    labels:
      version: prod
  - name: canary
    labels:
      version: canary

First, we are setting a trafficPolicy that ensures that all traffic is encrypted with Mutual TLS. This means that we get encryption and we can actually restrict which services get access to other services inside the mesh. Even if someone breaks into your cluster, they can't impersonate another service if you are using mTLS, thus limiting the damange they can do.

Next, you can see how the subsets are defined. We give each subset a name, and then use Kubernetes lables to select the pods in each subset. We are creating a prod and canary subet in this case.

Let's create the DestinationRules and the VirtualServices:

kubectl apply -f ./configs/istio/destinationrules.yaml

kubectl apply -f ./configs/istio/routing-1.yaml

Now all traffic will be sent to the middleware-prod service.

Create Some Instability

In the real world, services fail all the time. In a microservices world, this means you can have thousands of downstream failures, and your app needs to be able to handle them. Istio's Service Mesh can be configured to automatically handle many of these failures so your app doesn't have to!

Note: Istio has a feature call Fault Injection that can simulate errors. This means you don't need to write bad code to test if your app can handle failure.

While the code we deployed is very stable, there is a hidden function that will cause the app to randomly return a 500!

The code is triggered by a HTTP Header called "fail" which is a number between 0 and 1, with 0 being a 0% chance of failure, and 1 being 100% chance.

I'll be using Postman to send the headers and see the result.

With a 30% failure percentage, you can see the app failing a lot, but sometimes it works. Bumping that to 50%, the app fails almost every time! This is because each service forwards headers to the downstream services, so this is a cascading error!

Fix it with Istio

Because the request will sometimes work and sometimes not work, this is called a flaky error. These are hard to debug because they are hard to find! If there is a very low percentage chance, then the error will only be triggered maybe one time in a million. However, that might be an important request that can potentially break everything. There is a simple fix for this type of flaky error, just retry the request! This only works up to a certain point, but can easily pave over flaky errors that have low probabilities of occurring.

Normally, you would need to write this retry logic in every single one of your microservices for every single network request. Thankfully, Istio provides this out of the box so you don't need to modify your code at all!

Let's modify the VirtualServices to add in some retry logic. Here is the updated rule. There is an additional section for each that looks like this:

   retries:
     attempts: 3
     perTryTimeout: 2s

This means that Istio will retry the request three times before giving up, and will wait 2 seconds per retry (in case the downstream service hangs). Your app just sees it as one request, all the retry complexity is abstracted away.

Apply the rule:

kubectl apply -f ./configs/istio/routing-2.yaml

And refresh the page:

And now, all the errors are gone!

What about that middleware canary?

If you remember, there are two middleware services. Prod and Canary. Right now, the rules send all traffic to Prod. This is good, because we don't want normal people accessing the Canary build.

However, we do want our dev team and trusted testers to access it. Thankfully, Istio makes doing this quite easy!

Routing Rules can have conditional routing based on things like headers, cookies, etc. We could check if a user is part of the trusted group and set a cookie that let's them access the canary service. For simplicity's sake, I'm going to use a header called "x-dev-user" and check if the value is "super-secret".

You can see the new rule in this file

  http:
  - match:
    - headers:
        x-dev-user:
          exact: super-secret
    route:
      - destination:
          host: middleware
          subset: canary
    retries:
      attempts: 3
      perTryTimeout: 2s
  - route:
    - destination:
        host: middleware
        subset: prod
    retries:
      attempts: 3
      perTryTimeout: 2s

You can see that we are checking if the "x-dev-user" header is set to "super-secret" and if it is then we send traffic to the canary subset. Otherwise, traffic gets sent to prod.

Apply this file:

kubectl apply -f ./configs/istio/routing-3.yaml

And now, when you send the proper header, Istio automatically routes you to the right service.

When working with Istio, it is important that all your services forward the headers needed by the downstream services. I recommend standardizing on some headers, and ensuring that you always forward them when talking to a downstream service. There are many libraries that can do this automatically.

Monitoring and Tracing

A awesome benefit of Istio is that it automatically adds tracing and monitoring support to your apps. While monitoring is added for free, tracing needs you to forward the trace headers that Istio's Ingress controller automatically injects so Istio can stitch together the requests. You need to forward the following headers in your code:

[
    'x-request-id',
    'x-b3-traceid',
    'x-b3-spanid',
    'x-b3-parentspanid',
    'x-b3-sampled',
    'x-b3-flags',
    'x-ot-span-context',
]

The sample microservice you deployed already does this.

Viewing Traces

Now, you can open Jaeger, select the "frontend" service, and click "Find Traces". Istio will sample your requests, so not every request will be logged.

Click a Trace, and you can see a waterfall of the requests. Because we set the "fail" header, we can also see Istio's auto-retry mechanism in action!

Viewing Metrics

To view Metrics, open Grafana

You can see a lot of cool metrics in the default Istio dashboard, or customize it as you see fit!

OpenCensus

While Istio is great for network level tracing and metrics, it is important to also collect information about what is going on inside you app. This is where OpenCensus comes into play. OpenCensus is a vendor-neutral tracing and metrics library thats built for mulitple popular langauges.

Let's add OpenCensus to our app and see how it can help!

Automatic Header Forwarding

If you remember from the tracing section of this README, we had to manually foward all those x-blah-blah headers for each incoming and outgoing reqest.

One really cool thing about OpenCensus is that it can automatically forward these headers for the majorty of popular server frameworks.

You can see the full code here, but the key part is at the top:

const tracing = require('@opencensus/nodejs');
const propagation = require('@opencensus/propagation-b3');
const b3 = new propagation.B3Format();

tracing.start({
	propagation: b3,
	samplingRate: 1.0
});

Just put this little snippit at the top of your code and every single request will get trace headers automatically added! You can change the samplingRate if you don't want every request to be traced (which can get overwhleming for large systems).

While our test app only has one route and one downstream call, you can see how this would be very useful for a real server which has many routes and downstream calls. Without this, you would have to add header forwarding to every single route and downstream call yourself!

Let's deploy this code:

make deploy-opencensus-code

You shouldn't notice anything different (expct for the text indicating its using "opencensus simple"), but the big difference is our code no longer has to manually forward these headers.

In-app Tracing

While tracing network reqests is great, OpenCensus can also trace function calls inside your app and tie them together with the network traces! This give you more visibility into what exactly is going on inside your microservices.

You can see the full code here.

In the setup, we need to tell OpenCensus how to connect to the Jaeger instance that is running in the Istio namepsace.

// Set up jaeger
const jaeger = require('@opencensus/exporter-jaeger')

const jaeger_host = process.env.JAEGER_HOST || 'localhost'
const jaeger_port = process.env.JAEGER_PORT || '6832'

const exporter = new jaeger.JaegerTraceExporter({
	host: jaeger_host,
    port: jaeger_port,
	serviceName: service_name,
});

tracing.start({
	propagation: b3,
	samplingRate: 1.0,
    exporter: exporter
});

The environment variables that the code uses are set in a Kubernetes ConfigMap. In the tracing.start function, we simply pass in the exporter that we created with the Jaeger details, and that's it! We are all set up!

Now you can create custom spans to trace anything you want. Because OpenCensus automatically creates "rootSpans" for all incoming web requests, you can start creating "childSpans" inside the routes very easily:

  const childSpan = tracing.tracer.startChildSpan('Child Span!')
  // Do Stuff
  childSpan.end()

If you want to create grandchildren, just make sure to set the parentSpanId to the child's id.

  const childSpan = tracing.tracer.startChildSpan('Child Span!')
    // Do Child Stuff
    const grandchildSpan = tracing.tracer.startChildSpan('Grandchild Span!')
    grandchildSpan.parentSpanId = childSpan.id
    // Do Granchild Stuff
    grandchildSpan.end()
  childSpan.end()

In our code, we calculate a random Fibonacci number and then sleep for a bit. It's a silly example, but it demonstrates tracing well!

You can see all the nested traces, and the really cool thing is that you can see the traces for all three microservices at the same time! This is because OpenCensus and Istio are using the same IDs throughout the stack.

Custom Metrics

While Istio can give you a lot of network level metrics, once again it is not able to give you app specific metrics that you might care about. Just like app-level tracing, OpenCensus can help you with app-level metrics as well.

COMING SOOON!

Shutting Down The Test Cluster

make delete-cluster

or

make delete-cluster PROJECT_ID=your-custom-id-here ZONE=your-custom-zone

Advanced

Customize the Istio Deployment

If you want to customize the components of Istio that are installed, you can do that with Helm! First, you need to download the 1.0 release of Istio which contains the Helm charts you need.

make download-istio

Then, you can generate the Istio yaml that I use for this demo with the following command:

helm template istio-1.0.0/install/kubernetes/helm/istio --name istio --namespace istio-system --set global.mtls.enabled=true --set tracing.enabled=true --set servicegraph.enabled=true --set grafana.enabled=true > istio.yaml

Feel free to modify this command to suite your needs, but note that this demo won't work without these things enabled!

NOTE: This is not an official Google product

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
code		code
configs		configs
.gitignore		.gitignore
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
grafana.png		grafana.png
istio.yaml		istio.yaml
jaeger-opencensus.png		jaeger-opencensus.png
jaeger.gif		jaeger.gif
no_retry.gif		no_retry.gif
retry.gif		retry.gif
servicegraph.png		servicegraph.png
servicegraph2.png		servicegraph2.png
working.gif		working.gif
x-dev-user.gif		x-dev-user.gif
zipkin.png		zipkin.png
zipkin_fail.png		zipkin_fail.png

License

thesandlord/Istio101

Folders and files

Latest commit

History

Repository files navigation