- What is OpenTelemetry?
    - A way to remotely measure the performance of **everything** in our app

- Observability has 2 main questions:
    - How will we generate the data?
    - What will we do with the data once we have it?

- **OpenTelemetry deals with the first question: how to generate the data**
    - It standardises how distributed systems describe what they are doing
    - OpenTelemetry is a suite of tools used to (1) instrument, (2) generate, (3) collect, and (4) export telemetry data
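
The four stages above can be sketched in plain Python. This is only an illustration of the idea, not the OpenTelemetry API: the names `instrument`, `collected`, `export`, and `handle_request` are hypothetical, and the "collector" is just an in-memory list.

```python
import time
from functools import wraps

# Hypothetical in-memory buffer standing in for a collector.
collected = []


def instrument(func):
    """1. Instrument: wrap a function so that every call is measured."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # 2. Generate: build a telemetry record for this call.
            record = {
                "name": func.__name__,
                "duration_s": time.perf_counter() - start,
            }
            # 3. Collect: buffer the record instead of sending it right away.
            collected.append(record)
    return wrapper


def export(records, sink=print):
    """4. Export: drain buffered records into a backend (here, any callable)."""
    exported = 0
    while records:
        sink(records.pop(0))
        exported += 1
    return exported


@instrument
def handle_request(user_id):
    # Hypothetical application code that we want to observe.
    return f"hello {user_id}"
```

Calling `handle_request(7)` and then `export(collected)` would print one record containing the function name and how long the call took. A real setup would replace `print` with something that ships the records to an observability backend.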

## Microservices

- Monolithic software used to be the norm
    - But monoliths scale poorly: codebases tend to grow so large that developers cannot make changes without impacting each other
    - This necessitated a move to microservices: smaller services, each with a dedicated team, that interact through fixed contracts

## Observability

- The move from monolithic software to microservices means that any software has many parts talking to each other

- Because of the sheer amount of traffic generated by such chatter, it is rarely possible to observe what exactly is happening when things go wrong

- This creates a need for OpenTelemetry

## M.E.L.T

- Metrics (M)
    - Measurements collected at regular intervals
    - Each has a timestamp, name, numeric value, and count of events represented (e.g. error rate, response time)

- Events (E)
    - Some action that happens at any moment in time

- Logs (L)
    - Data/context around an event
    - `logger.info` / `console.log`

- Traces (T)
    - Follow a request from its initial entry point to the eventual response
    - Record causal chains of events, showing end-to-end latency
    - But this can be extremely hard to do; every service has to be instrumented one by one, layer by layer

- `OpenTelemetry` was formed by the merger of `OpenTracing` and `OpenCensus`
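
As a rough sketch, the four M.E.L.T. signal types can be modelled as plain data records. The class names, field names, and the `summarize` helper below are assumptions made for illustration; they are not OpenTelemetry's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class Metric:
    """An aggregated measurement: timestamp, name, value, and event count."""
    name: str
    timestamp: float
    value: float
    count: int


@dataclass
class Event:
    """Something that happened at a single moment in time."""
    name: str
    timestamp: float


@dataclass
class Log:
    """Data/context recorded around an event."""
    timestamp: float
    message: str
    attributes: dict = field(default_factory=dict)


@dataclass
class Span:
    """One step of a trace; child spans form the causal chain."""
    name: str
    start: float
    end: float
    children: list = field(default_factory=list)

    @property
    def latency(self):
        # Time taken by this step, in the same unit as start/end.
        return self.end - self.start


def summarize(name, timestamp, samples):
    """Aggregate raw samples (e.g. response times) into one metric point."""
    return Metric(name=name, timestamp=timestamp,
                  value=sum(samples) / len(samples), count=len(samples))
```

For example, a root span covering a whole request can hold child spans for an auth check and a database call. The root span's `latency` is then the end-to-end latency of the request, while each child's `latency` shows where that time went.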



## Jargon

- When a server receives a request and sends a response, that unit of work is usually drawn as a bar in the trace view. This is called a **span**
- Each span can start at a different time and take a different amount of time. The time a span takes is its **latency**. The time between data being sent and a component receiving it is **network latency**
- Traces let us correlate events across service boundaries. For us to do this, components in the distributed system must collect, store, and transfer metadata. This is known as a **context**
    - There are 2 types of **contexts**. 
        - **Span context**: data required for moving trace information across boundaries (trace ID, span ID, trace flags, trace state)
        - **Correlation context** (called **Baggage** in current OpenTelemetry): user-defined properties (customer ID, host name, region...)
- A context has the information we need to identify the current span and trace, and **propagation** is the mechanism we use to bundle up our context and transfer it across services
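
A minimal sketch of propagation, assuming the span context travels in an HTTP header shaped like the W3C Trace Context `traceparent` value (`version-traceid-spanid-flags`). The `SpanContext` class and the `inject` / `extract` / `child_of` helpers are hypothetical names for illustration, not the OpenTelemetry SDK.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class SpanContext:
    trace_id: str         # 32 hex characters, shared by every span in a trace
    span_id: str          # 16 hex characters, unique per span
    trace_flags: int = 1  # 1 means "sampled"


def new_root_context():
    """Start a new trace with a fresh trace ID and span ID."""
    return SpanContext(secrets.token_hex(16), secrets.token_hex(8))


def child_of(parent):
    """Create a context for a new span that belongs to the parent's trace."""
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.trace_flags)


def inject(ctx, headers):
    """Bundle the context into outgoing request headers."""
    headers["traceparent"] = (
        f"00-{ctx.trace_id}-{ctx.span_id}-{ctx.trace_flags:02x}"
    )
    return headers


def extract(headers):
    """Rebuild the context from incoming headers; None if absent or malformed."""
    parts = headers.get("traceparent", "").split("-")
    if len(parts) != 4:
        return None
    _version, trace_id, span_id, flags = parts
    if len(trace_id) != 32 or len(span_id) != 16:
        return None
    try:
        return SpanContext(trace_id, span_id, int(flags, 16))
    except ValueError:
        return None
```

The calling service would run `inject(ctx, headers)` before sending a request, and the receiving service would run `extract(headers)` followed by `child_of(...)`. Both spans then share one trace ID, which is what lets a trace correlate events across service boundaries.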

## Issues that OpenTelemetry resolves

- Backend
    - Bad logic or bad user input
    - Poorly instrumented backend calls
    - Poorly performing API code

- Frontend
    - Bad logic or bad user input
    - Poorly instrumented JS
    - Geo-specific slowness

- Infrastructure
    - Noisy neighbours
    - Config changes
    - Version audits
    - Misconfigured DNS