![Zipkin (doc/zipkin-logo-200x119.jpg)](https://github.com/twitter/zipkin/raw/master/doc/zipkin-logo-200x119.jpg)

Zipkin is a distributed tracing system that helps us gather timing data for all the disparate services at Twitter.
It manages both the collection and lookup of this data through a Collector and a Query service.
We closely modelled Zipkin after the <a href="http://research.google.com/pubs/pub36356.html">Google Dapper</a> paper. Follow <a href="https://twitter.com/zipkinproject">@zipkinproject</a> for updates. [![Build Status](https://secure.travis-ci.org/twitter/zipkin.png)](http://travis-ci.org/twitter/zipkin)

## Why distributed tracing?
Collecting traces helps developers gain deeper knowledge about how certain requests perform in a distributed system.
Let's say we're having problems with user requests timing out. We can look up traced requests that timed out and display
them in the web UI. We'll be able to quickly find the service responsible for adding the unexpected response time.
If the service has been annotated adequately we can also find out where in that service the issue is happening.

![Screenshot of the Zipkin web UI (doc/web-screenshot.png)](https://github.com/twitter/zipkin/raw/master/doc/web-screenshot.png)

## Architecture
These are the components that make up a fully fledged tracing system.

![Zipkin Architecture (doc/architecture-0.png)](https://github.com/twitter/zipkin/raw/master/doc/architecture-0.png)

### Instrumented libraries
Tracing information is collected on each host using the instrumented libraries and sent to Zipkin.
When the host makes a request to another service, it passes a few tracing identifiers along with the request so we can later tie the data together.

![Zipkin Instrumentation architecture (doc/architecture-1.png)](https://github.com/twitter/zipkin/raw/master/doc/architecture-1.png)

We have instrumented the libraries below to trace requests and to pass the required identifiers to the other services called in the request.

##### Finagle
> Finagle is an asynchronous network stack for the JVM that you can use to build asynchronous Remote Procedure Call (RPC) clients and servers in Java, Scala, or any JVM-hosted language.

<a href="https://github.com/twitter/finagle">Finagle</a> is used heavily inside of Twitter and it was a natural point to include tracing support. So far we have client/server support for Thrift and HTTP as well as client-only support for Memcache and Redis.

To set up a Finagle server in Scala, just do the following.
Adding tracing is as simple as adding <a href="https://github.com/twitter/finagle/tree/master/finagle-zipkin">finagle-zipkin</a> as a dependency and a `tracerFactory` to the ServerBuilder.

```scala
ServerBuilder()
  .codec(ThriftServerFramedCodec())
  .bindTo(serverAddr)
  .name("servicename")
  .tracerFactory(ZipkinTracer())
  .build(new SomeService.FinagledService(queryService, new TBinaryProtocol.Factory()))
```

The tracing setup for clients is similar. Once you have specified the Zipkin tracer as above, a small sample of your requests will be traced automatically. We record when each request started and ended, as well as the services and hosts involved.

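To picture what "a small sample" means, here is a minimal Ruby sketch of a sampling decision. It is not Finagle's sampler: the `sampled?` helper and the 0.1% `SAMPLE_RATE` are hypothetical and chosen only for illustration.

```ruby
# Hypothetical sampler: decide once, at the start of a trace, whether to record it.
# The decision then travels with the request so every service agrees on it.
SAMPLE_RATE = 0.001 # assumed rate (0.1% of requests); not a Zipkin default

# Returns true for roughly `sample_rate` of all calls.
def sampled?(sample_rate, random: Random.new)
  random.rand < sample_rate
end

# Rough check: over many requests the traced fraction approaches the rate.
rng = Random.new(42)
traced = (1..100_000).count { sampled?(SAMPLE_RATE, random: rng) }
puts "traced #{traced} of 100000 requests"
```
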
In case you want to record additional information you can add a custom annotation in your code.

```scala
Trace.record("starting that extremely expensive computation")
```

The line above will add an annotation with the string attached to the point in time when it happened. You can also add a key value annotation. It could look like this:

```scala
Trace.recordBinary("http.response.code", "500")
```

##### Ruby Thrift
There's a <a href="https://rubygems.org/gems/finagle-thrift">gem</a> we use to trace requests. In order to push the tracer and generate a trace id on a request you can use that gem in a RackHandler. See <a href="https://github.com/twitter/zipkin/blob/master/zipkin-web/config/application.rb">zipkin-web</a> for an example of where we trace the tracers.

For tracing client calls from Ruby we rely on the Twitter <a href="https://github.com/twitter/thrift_client">Ruby Thrift client</a>. See below for an example of how to wrap the client.

```ruby
client = ThriftClient.new(SomeService::Client, "127.0.0.1:1234")
client_id = FinagleThrift::ClientId.new(:name => "service_example.sample_environment")
FinagleThrift.enable_tracing!(client, client_id, "service_name")
```

##### Querulous
<a href="https://github.com/twitter/querulous">Querulous</a> is a Scala library for interfacing with SQL databases. The tracing includes the timings of the request and the SQL query performed.

##### Cassie
<a href="https://github.com/twitter/cassie">Cassie</a> is a Finagle based Cassandra client library. You set the tracer in Cassie pretty much like you would in Finagle, but in Cassie you set it on the KeyspaceBuilder.

```scala
cluster.keyspace(keyspace).tracerFactory(ZipkinTracer())
```

### Transport
We use Scribe to transport all the traces from the different services to Zipkin and Hadoop.
Scribe was developed by Facebook and it's made up of a daemon that can run on each server in your system.
It listens for log messages and routes them to the correct receiver depending on the category.

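The idea of routing by category can be sketched in a few lines of Ruby. This is a stand-in to show the concept, not Scribe's API: the `CategoryRouter` class and its method names are hypothetical.

```ruby
# Hypothetical stand-in for category-based routing; not Scribe's actual API.
class CategoryRouter
  def initialize
    @stores = Hash.new { |hash, category| hash[category] = [] }
  end

  # Register a receiver (anything responding to #call) for a category.
  def add_store(category, receiver)
    @stores[category] << receiver
  end

  # Deliver a message to every receiver registered for its category.
  # Returns the number of receivers that got the message.
  def log(category, message)
    return 0 unless @stores.key?(category)
    @stores[category].each { |receiver| receiver.call(message) }
    @stores[category].size
  end
end

zipkin_collector = []
hadoop = []
router = CategoryRouter.new
router.add_store("zipkin", ->(msg) { zipkin_collector << msg })
router.add_store("zipkin", ->(msg) { hadoop << msg })
router.log("zipkin", "encoded-span") # reaches both receivers
router.log("other", "ignored")       # no receiver for this category
```
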
### Zipkin collector daemon
Once the trace data arrives at the Zipkin collector daemon we check that it's valid, store it and then index it for lookups.

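As a rough illustration of the validate, store and index sequence, here is an in-memory Ruby sketch. The `SketchCollector` class, its validity rule and its service-name index are assumptions for this example; the real collector is written in Scala and writes to the storage backend.

```ruby
# Hypothetical in-memory collector illustrating validate -> store -> index.
class SketchCollector
  attr_reader :spans_by_trace, :traces_by_service

  def initialize
    @spans_by_trace = Hash.new { |h, k| h[k] = [] }    # storage
    @traces_by_service = Hash.new { |h, k| h[k] = [] } # lookup index
  end

  # Assumed rule: a span is valid when it carries both ids and a service name.
  def valid?(span)
    [:trace_id, :span_id, :service_name].all? { |key| span[key] }
  end

  # Validate, store, then index. Returns false for rejected spans.
  def accept(span)
    return false unless valid?(span)
    @spans_by_trace[span[:trace_id]] << span
    index = @traces_by_service[span[:service_name]]
    index << span[:trace_id] unless index.include?(span[:trace_id])
    true
  end
end

collector = SketchCollector.new
collector.accept({ trace_id: 1, span_id: 10, service_name: "web" })
collector.accept({ trace_id: 1, span_id: 11, service_name: "db" })
collector.accept({ span_id: 12, service_name: "web" }) # rejected: no trace id
```
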
### Storage
We settled on Cassandra for storage. It's scalable, has a flexible schema and is heavily used within Twitter. We did try to make this component pluggable though, so it should not be hard to put in something else here.

### Zipkin query daemon
Once the data is stored and indexed we need a way to extract it. This is where the query daemon comes in, providing the users with a simple Thrift API for finding and retrieving traces. See <a href="https://github.com/twitter/zipkin/blob/master/zipkin-thrift/src/main/thrift/zipkin.thrift">the Thrift file</a>.

### UI
Most of our users access the data via our UI. It's a Rails app that uses <a href="http://d3js.org/">D3</a> to visualize the trace data. Note that there is no built-in authentication in the UI.

## Modules
![Modules (doc/modules.png)](https://github.com/twitter/zipkin/raw/master/doc/modules.png)

## Installation

### Cassandra
Zipkin relies on Cassandra for storage. So you will need to bring up a Cassandra cluster.

1. See Cassandra's <a href="http://cassandra.apache.org/">site</a> for instructions on how to start a cluster.
2. Use the Zipkin Cassandra schema attached to this project. You can create the schema with the following command.
`bin/cassandra-cli -host localhost -port 9160 -f zipkin-server/src/schema/cassandra-schema.txt`

### ZooKeeper
Zipkin uses ZooKeeper for coordination. That's where we store the server side sample rate and register the servers.

1. See ZooKeeper's <a href="http://zookeeper.apache.org/">site</a> for instructions on how to install it.

### Scribe
<a href="https://github.com/facebook/scribe">Scribe</a> is the logging framework we use to transport the trace data.
You need to set up a network store that points to the Zipkin collector daemon. If you are just trying out Zipkin you can skip this step entirely and point the ZipkinTracer directly at the collector.

A Scribe store for Zipkin might look something like this.

    <store>
      category=zipkin
      type=network
      remote_host=123.123.123.123
      remote_port=9410
      use_conn_pool=yes
      default_max_msg_before_reconnect=50000
      allowable_delta_before_reconnect=12500
      must_succeed=no
    </store>

If you don't want to hardcode the IP address of your collector there are a few options.

You can use an internal DNS entry for the collectors. That way you only have one place to change the addresses when you add or remove collectors.

If you want to get all fancy you can use a modified version of <a href="https://github.com/traviscrawford/scribe">Scribe</a> that picks up the collectors via ZooKeeper. When each collector starts up it adds itself to ZooKeeper and when a collector shuts down it is automatically removed. The modified Scribe gets notified when the set of collectors changes. To use this mode you change `remote_host` in the configuration to `zk://zookeeper-hostname:2181/scribe/zipkin` or something similar.

We're hoping that others might add non-Scribe transports for the tracing data; there is no reason why Scribe has to be the only one.

### Zipkin servers
We've developed Zipkin with <a href="http://www.scala-lang.org/downloads">Scala 2.9.1</a>, <a href="http://www.scala-sbt.org/download.html">SBT 0.11.2</a>, and JDK7.

1. `git clone https://github.com/twitter/zipkin.git`
1. `cd zipkin`
1. `cp zipkin-scribe/config/collector-dev.scala zipkin-scribe/config/collector-prod.scala`
1. `cp zipkin-server/config/query-dev.scala zipkin-server/config/query-prod.scala`
1. Modify the configs above as needed. Pay particular attention to ZooKeeper and Cassandra server entries.
1. `bin/sbt update package-dist` (This downloads SBT 0.11.2 if it doesn't already exist)
1. `scp dist/zipkin*.zip [server]`
1. `ssh [server]`
1. `unzip zipkin*.zip`
1. `mkdir -p /var/log/zipkin`
1. `zipkin-scribe/scripts/collector.sh -f zipkin-scribe/config/collector-prod.scala`
1. `zipkin-server/scripts/query.sh -f zipkin-server/config/query-prod.scala`

You can also run the collector and query services through SBT.

To run the Scribe collector service: `bin/sbt 'project zipkin-scribe' 'run -f zipkin-scribe/config/collector-dev.scala'`

To run the query service: `bin/sbt 'project zipkin-server' 'run -f zipkin-server/config/query-dev.scala'`

### Zipkin UI
The UI is a standard Rails 3 app.

1. Update config with your ZooKeeper server. This is used to find the query daemons.
2. Deploy to a suitable Rails 3 app server. For testing you can simply do
```
bundle install &&
bundle exec rails server
```

#### zipkin-tracer gem
The `zipkin-tracer` gem adds tracing to a Rails application through the use of a Rack Handler.
In `config.ru`:

```ruby
use ZipkinTracer::RackHandler
run <YOUR_APPLICATION>
```

If the application's static assets are served through Rails, those requests will be traced.

## Running a Hadoop job
It's possible to set up Scribe to log into Hadoop. If you do this you can generate various reports from the data
that are not easy to produce on the fly in Zipkin itself.

We use a library called <a href="http://github.com/twitter/scalding">Scalding</a> to write Hadoop jobs in Scala.

1. To run a Hadoop job first make the fat jar.
`sbt 'project zipkin-hadoop' compile assembly`
2. Change scald.rb to point to the hostname you want to copy the jar to and run the job from.
3. Update the version of the jarfile in scald.rb if needed.
4. You can then run the job using our scald.rb script.
`./scald.rb --hdfs com.twitter.zipkin.hadoop.[classname] --date yyyy-mm-ddThh:mm yyyy-mm-ddThh:mm --output [dir]`

## How to instrument a library
We have instrumented a few libraries and protocols, but we hope to get some help instrumenting a few more.
Before we start we need to know a few things about how we structure the tracing data.

* Annotation - includes a value, timestamp, and host
* Span - a set of annotations that correspond to a particular RPC
* Trace - a set of spans that share a single root span

The above is used to send the tracing data to Zipkin. You can find these and more described <a href="https://github.com/twitter/zipkin/blob/master/zipkin-thrift/src/main/thrift/zipkinCore.thrift">here</a>.

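To make the three terms concrete, here is a small Ruby sketch. The structs are illustrative only; the field names are our own shorthand, and the authoritative definitions are the ones in the Thrift file linked above.

```ruby
# Illustrative structs only; field names are assumptions for this sketch.
Annotation = Struct.new(:value, :timestamp, :host)
Span = Struct.new(:trace_id, :id, :parent_id, :name, :annotations)

# A trace is all the spans that share a trace id.
def group_into_traces(spans)
  spans.group_by(&:trace_id)
end

# The root span of a trace is the one without a parent.
def root_span(spans)
  spans.find { |span| span.parent_id.nil? }
end

root = Span.new(1, 100, nil, "get", [Annotation.new("server received", 1_000, "web1")])
child = Span.new(1, 101, 100, "query", [Annotation.new("client send", 1_200, "web1")])
other = Span.new(2, 200, nil, "get", [])

traces = group_into_traces([root, child, other])
puts traces[1].map(&:name).inspect
```
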
Another important part of the tracing is the lightweight header we use to pass information between the traced services.
The tracing header consists of the following:

* Trace Id - identifies the whole trace
* Span Id - identifies an individual request
* Optional Parent Span Id - Added if this request was made as part of another request
* Sampled boolean - tells us if we should log the tracing data or not

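One way to picture the header is as four values written into, and read back from, the HTTP headers of a request. The Ruby sketch below assumes `X-B3-*` header names and hex-encoded ids; treat both as assumptions and check the instrumented library you are working with for the exact names and encoding. `TraceHeader`, `inject` and `extract` are hypothetical helpers.

```ruby
# Hypothetical tracing header; fields mirror the list above.
TraceHeader = Struct.new(:trace_id, :span_id, :parent_span_id, :sampled)

# Header names are assumptions made for this sketch.
TRACE_ID = "X-B3-TraceId"
SPAN_ID = "X-B3-SpanId"
PARENT_SPAN_ID = "X-B3-ParentSpanId"
SAMPLED = "X-B3-Sampled"

# Write the tracing header into an outgoing HTTP header hash (ids as hex).
def inject(header, http_headers)
  http_headers[TRACE_ID] = header.trace_id.to_s(16)
  http_headers[SPAN_ID] = header.span_id.to_s(16)
  http_headers[PARENT_SPAN_ID] = header.parent_span_id.to_s(16) if header.parent_span_id
  http_headers[SAMPLED] = header.sampled.to_s
  http_headers
end

# Read the tracing header back; returns nil when the request is untraced.
def extract(http_headers)
  return nil unless http_headers[TRACE_ID] && http_headers[SPAN_ID]
  parent = http_headers[PARENT_SPAN_ID]
  TraceHeader.new(
    http_headers[TRACE_ID].to_i(16),
    http_headers[SPAN_ID].to_i(16),
    parent && parent.to_i(16),
    http_headers[SAMPLED] == "true"
  )
end

outgoing = inject(TraceHeader.new(0xabc, 0xdef, nil, true), {})
incoming = extract(outgoing)
```
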
Now that we know a bit about the data types, let's take a step-by-step look at how the instrumentation works.
The example below describes how the HTTP tracing in Finagle works. Other libraries and protocols will of course be different, but the general principle should be the same.

### Server side
1. Check if there are any tracing headers in the incoming request. If there are, we adopt the ids associated with them for this request. If not, we generate a new trace id and span id and decide if we should sample or not. See <a href="https://github.com/twitter/finagle/blob/master/finagle-http/src/main/scala/com/twitter/finagle/http/Codec.scala">HttpServerTracingFilter</a> for an example of this.

1. If the current request is to be sampled we gather information such as service name, hostname, span name (http get/put for example) and the actual annotations. We create a "server received" annotation when we get the request and a "server send" one when we are done processing and are just about to send the result. Again, you can see this in <a href="https://github.com/twitter/finagle/blob/master/finagle-http/src/main/scala/com/twitter/finagle/http/Codec.scala">HttpServerTracingFilter</a>.

1. The tracing data created is passed to whatever tracer was set on the ServerBuilder. This could be ConsoleTracer for debugging for example, but in our case we'll assume it's <a href="https://github.com/twitter/finagle/tree/master/finagle-zipkin">ZipkinTracer</a>. When tracing data is received by the ZipkinTracer it aggregates it by span id.

1. Once the ZipkinTracer receives an "end of span" event, such as a "server send" annotation or a timeout, it will send the aggregated data as a Thrift struct to Scribe. If no such event happens it will eventually send the data anyway. We're open to adding other ways of transporting the data. For us Thrift and Scribe made sense, but perhaps JSON and HTTP will work better for some.

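The server side steps can be sketched in Ruby as follows. This is a simplified illustration under stated assumptions, not Finagle's implementation: `server_trace_context` and `SketchTracer` are hypothetical, the incoming headers are represented by a plain hash, the `sent` array stands in for the transport, and the set of annotations treated as "end of span" is an assumption of the sketch.

```ruby
require "securerandom"

# Step 1: adopt the incoming ids when present, otherwise start a new trace.
def server_trace_context(incoming, sample_rate)
  return incoming.dup if incoming && incoming[:trace_id]
  { trace_id: SecureRandom.random_number(2**63),
    span_id: SecureRandom.random_number(2**63),
    parent_span_id: nil,
    sampled: rand < sample_rate }
end

# Steps 3 and 4: collect annotations per span id, send the span when it ends.
class SketchTracer
  END_OF_SPAN = ["server send", "client receive"].freeze # assumed end markers

  attr_reader :sent

  def initialize
    @pending = Hash.new { |h, k| h[k] = [] }
    @sent = [] # stands in for the transport
  end

  def record(context, value, timestamp)
    return unless context[:sampled] # step 2: only sampled requests are recorded
    @pending[context[:span_id]] << { value: value, timestamp: timestamp }
    flush(context[:span_id]) if END_OF_SPAN.include?(value)
  end

  def flush(span_id)
    @sent << { span_id: span_id, annotations: @pending.delete(span_id) }
  end
end

tracer = SketchTracer.new
context = server_trace_context(nil, 1.0)
tracer.record(context, "server received", 1_000)
tracer.record(context, "server send", 1_250) # end of span: triggers the send
```

A real tracer would also flush spans that never see an end marker, for example on a timer; that part is left out here.
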
### Client side
1. Before making the request, figure out if we are part of a trace already. It could be that this client is used within a server for example. That server could be processing a request and therefore already has a trace id assigned. We reuse that trace id, but we generate a new span id for this new request. We also set the parent span id to the previous span id, if available. You can see some of this <a href="https://github.com/twitter/finagle/blob/master/finagle-core/src/main/scala/com/twitter/finagle/tracing/TracingFilter.scala">here</a> and <a href="https://github.com/twitter/finagle/blob/master/finagle-core/src/main/scala/com/twitter/finagle/tracing/Trace.scala">here</a>.

1. As on the server side, we have an <a href="https://github.com/twitter/finagle/blob/master/finagle-http/src/main/scala/com/twitter/finagle/http/Codec.scala">HttpClientTracingFilter</a> that adds the tracing headers to the outgoing HTTP request.

1. We also generate the appropriate annotations, such as "client send" before the request and "client receive" after we receive a reply from the server.

1. As on the server side, the data reaches the ZipkinTracer, which sends it off to Zipkin.

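The id handling and annotations on the client side can be sketched like this. Again this is a hedged Ruby illustration rather than Finagle's code: `client_trace_context`, `traced_call` and `now_micros` are hypothetical helpers, the default sample rate is made up, and carrying the earlier sampling decision forward is an assumption of the sketch.

```ruby
require "securerandom"

# Current wall-clock time in microseconds (unit chosen for this sketch).
def now_micros
  (Time.now.to_f * 1_000_000).to_i
end

# Derive the ids for an outgoing request from the current context, if any.
# `current` is nil when the client is not running inside a traced request.
def client_trace_context(current, sample_rate: 0.001)
  new_span_id = SecureRandom.random_number(2**63)
  if current
    { trace_id: current[:trace_id],      # same trace
      span_id: new_span_id,              # new span for this request
      parent_span_id: current[:span_id], # previous span becomes the parent
      sampled: current[:sampled] }       # assumed: keep the earlier decision
  else
    { trace_id: SecureRandom.random_number(2**63), span_id: new_span_id,
      parent_span_id: nil, sampled: rand < sample_rate }
  end
end

# Wrap a call with the "client send" and "client receive" annotations.
def traced_call(context, annotations)
  annotations << { span_id: context[:span_id], value: "client send", timestamp: now_micros }
  result = yield context
  annotations << { span_id: context[:span_id], value: "client receive", timestamp: now_micros }
  result
end

server_context = { trace_id: 1, span_id: 100, parent_span_id: nil, sampled: true }
annotations = []
reply = traced_call(client_trace_context(server_context), annotations) do |ctx|
  "reply for span #{ctx[:span_id]}" # the actual request would go here
end
```
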
## Mailing lists
There are two mailing lists you can use to get in touch with other users and developers.

Users: https://groups.google.com/group/zipkin-user

Developers: https://groups.google.com/group/zipkin-dev

## Issues
Noticed a bug? Please add an issue here. https://github.com/twitter/zipkin/issues

## Contributions
Contributions are very welcome! Please create a pull request on GitHub and we'll look at it as soon as possible.

Try to make the code in the pull request as focused and clean as possible, and stick as close to our code style as you can.

If the pull request is becoming too big we ask that you split it into smaller ones.

Areas where we'd love to see contributions include: adding tracing to more libraries and protocols, interesting reports generated with Hadoop from the trace data, extending the collector to support more transports and storage systems, and other ways of visualizing the data in the web UI.

## Versioning
We intend to use the <a href="http://semver.org/">semver</a> style versioning.

## Authors
Thanks to everyone below for making Zipkin happen!

Zipkin server
* Johan Oskarsson: <a href="https://twitter.com/skr">@skr</a>
* Franklin Hu: <a href="https://twitter.com/thisisfranklin">@thisisfranklin</a>
* Ian Ownbey: <a href="https://twitter.com/iano">@iano</a>

Zipkin UI
* Franklin Hu: <a href="https://twitter.com/thisisfranklin">@thisisfranklin</a>
* Bill Couch: <a href="https://twitter.com/couch">@couch</a>
* David McLaughlin: <a href="https://twitter.com/dmcg">@dmcg</a>
* Chad Rosen: <a href="https://twitter.com/zed">@zed</a>

Instrumentation
* Marius Eriksen: <a href="https://twitter.com/marius">@marius</a>
* Arya Asemanfar: <a href="https://twitter.com/a_a">@a_a</a>

## License
Copyright 2012 Twitter, Inc.

Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0