# Finding Attack Paths in Kubernetes with KubeHound

This notebook demonstrates how to use KubeHound to discover security vulnerabilities in a Kubernetes cluster by analyzing *attack paths*. Attack paths are the chains of misconfigurations that allow an attacker to move from an initial breach to full cluster compromise.

**What you'll learn:**
- Why graph-based analysis reveals risks that vulnerability lists miss
- How to trace attack paths from external entry points to cluster compromise
- How to prioritize findings by focusing on realistic attack scenarios
- How to use KubeHound's DSL to answer real security questions

**The approach:** We'll start by looking at EVERYTHING (overwhelming!), then progressively filter down to find the attack paths that a real attacker could exploit.
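
To make the first learning point concrete, here is a minimal Python sketch on a made-up graph. Every vertex name below is hypothetical and nothing here is KubeHound output; the edge labels only imitate KubeHound-style attack names. A flat list would report these five findings one by one, while a breadth-first search shows that four of them chain together from an exposed endpoint to a node.

```python
from collections import deque

# Hypothetical findings, each stored as an edge: (source, attack, target).
FINDINGS = [
    ("endpoint:web", "ENDPOINT_EXPLOIT", "container:web"),
    ("container:web", "IDENTITY_ASSUME", "identity:web-sa"),
    ("identity:web-sa", "PERMISSION_DISCOVER", "permission:pod-create"),
    ("permission:pod-create", "POD_CREATE", "node:worker-1"),
    ("container:batch", "VOLUME_DISCOVER", "volume:logs"),
]


def shortest_chain(findings, start, goal):
    """Return the shortest list of findings linking start to goal, or None."""
    outgoing = {}
    for src, attack, dst in findings:
        outgoing.setdefault(src, []).append((attack, dst))

    queue = deque([(start, [])])
    seen = {start}
    while queue:
        vertex, chain = queue.popleft()
        if vertex == goal:
            return chain
        for attack, nxt in outgoing.get(vertex, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, chain + [(vertex, attack, nxt)]))
    return None  # the findings do not connect start to goal


chain = shortest_chain(FINDINGS, "endpoint:web", "node:worker-1")
for src, attack, dst in chain:
    print(f"{src} --{attack}--> {dst}")
```

The fifth finding (`container:batch`) never appears in the chain. A list would rank it alongside the others, but in this toy graph it leads nowhere critical.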

## Initial Setup

This cell configures the graph visualization settings to make attack paths easier to read:
- Smooth, curved edges (instead of straight lines)
- Arrows showing attack direction

You can customize these settings further - see the [graph-notebook visualization guide](https://github.com/aws/graph-notebook) for options.

In [1]:
%%capture "Remove this line to see debug information"
%%graph_notebook_vis_options
{
  "edges": {
    "smooth": {
      "enabled": true,
      "type": "dynamic"
    },
    "arrows": {
      "to": {
        "enabled": true,
        "type": "arrow"
      }
    }
  }
}

### What are we looking at?

Let's start by seeing what resources exist in this cluster.

**All resources:** The query below shows every Kubernetes resource in the cluster as individual dots. Each color represents a different resource type (pods, containers, identities, nodes, volumes, etc.). No connections are shown yet.

<details>
<summary><i>Query breakdown</i></summary>

`kh.V()` gets all [vertices](https://kubehound.io/terminology/) | `.path()` shows them as individual nodes | `.by(elementMap())` includes all [properties](https://kubehound.io/reference/entities/common/)
</details>

In [None]:
%%gremlin -d label -g class -le 50 -p inv,oute
kh.V().path().by(elementMap())

### Critical attack paths

Now let's see how these resources connect through attacks.

This query finds attack chains that lead to **cluster compromise** - meaning an attacker gains control of a Node. Once an attacker controls a Node, they can access all containers, secrets, and data on that node, essentially owning the cluster.

<details>
<summary><i>Query breakdown</i></summary>

`kh.V()` gets all [vertices](https://kubehound.io/terminology/) | `.limit(100)` takes the first 100 vertices | `.criticalPaths()` finds [attack chains](https://kubehound.io/queries/dsl/) to cluster compromise | `.by(elementMap())` includes all [properties](https://kubehound.io/reference/entities/common/) | `.limit(500)` caps the total number of paths
</details>

In [None]:
%%gremlin -d label -g class -le 50 -p inv,oute
kh.V().limit(100).criticalPaths().by(elementMap()).limit(500)

### Too much information!

That was a lot! In a real assessment, you wouldn't start this broad. The point here was to demonstrate the scale of what KubeHound finds, and why filtering is important.

**Let's narrow down to containers.** Containers are a good place to start because they are where application code runs, and they often carry misconfigurations such as excessive permissions, container escape vulnerabilities, and access to sensitive volumes.

The query below finds all containers in the cluster that have attack paths leading to cluster compromise.

<details>
<summary><i>Query breakdown</i></summary>

`kh.containers()` gets all [container](https://kubehound.io/reference/entities/container/) [vertices](https://kubehound.io/terminology/) | `.criticalPaths()` finds [attack chains](https://kubehound.io/queries/dsl/) | `.by(elementMap())` includes all [properties](https://kubehound.io/reference/entities/common/) | `.limit(200)` caps results
</details>

In [None]:
%%gremlin -d label -g class -le 50 -p inv,oute
kh.containers().criticalPaths().by(elementMap()).limit(200)

### Still too many results

Filtering by `containers` still produces too many results. It shows possible attacks that originate from anywhere, inside or outside the cluster. Let's narrow this down further.

**Let's focus on endpoints.** In KubeHound, an [endpoint](https://kubehound.io/reference/entities/endpoint/) is a network service exposed by a container. Endpoints that an attacker can reach from outside the cluster are the realistic entry points for attacks.

Containers can also be compromised internally, without going through an endpoint, for example through a *supply chain attack*. Such attacks tend to be more sophisticated and are therefore less common.

For this lab, let's focus on finding the misconfigurations that are most likely to be exploited: exposed services.

The query below finds attack paths that start from exposed endpoints - services that attackers can reach from outside the cluster.

<details>
<summary><i>Query breakdown</i></summary>

`kh.endpoints()` gets all [endpoint](https://kubehound.io/reference/entities/endpoint/) [vertices](https://kubehound.io/terminology/) | `.criticalPaths()` finds [attack chains](https://kubehound.io/queries/dsl/) | `.by(elementMap())` includes all [properties](https://kubehound.io/reference/entities/common/) | `.limit(100)` caps results
</details>

In [None]:
%%gremlin -d name -g class -le 50 -p inv,oute
kh.endpoints().criticalPaths().by(elementMap()).limit(100)

We are making progress, but we can still narrow down which attacks are most likely to be exploited. Let's step back from the complex graphs for a moment and get a simple list of which services are actually vulnerable.

### Identify the vulnerable services

The previous queries showed us visual attack paths, but now we want something simpler: just the names and ports of exposed services that have critical attack paths.

The query below keeps the starting endpoint of each critical path, removes duplicates, and groups the ports by service name, so we get one entry per vulnerable exposed service.

<details>
<summary><i>Query breakdown</i></summary>

`kh.endpoints()` gets all [endpoints](https://kubehound.io/reference/entities/endpoint/) | `.criticalPaths()` finds [attack chains](https://kubehound.io/queries/dsl/) | `.limit(local,1)` keeps only the first element of each path (the starting endpoint) | `.dedup()` removes duplicate endpoints | `.valueMap("serviceEndpoint","port")` extracts service name and port | `.group().by("serviceEndpoint").by("port")` groups ports by service
</details>
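
The steps in this query can be hard to picture, so here is a rough Python analogue that runs on made-up data. The paths, service names, and ports below are hypothetical, and the function only mirrors the intent of each Gremlin step rather than its exact return types.

```python
# Hypothetical critical paths. Each path is a list of vertex property dicts
# that starts at an endpoint. None of these values come from a real cluster.
paths = [
    [{"label": "Endpoint", "serviceEndpoint": "web", "port": 8080},
     {"label": "Container", "name": "web"},
     {"label": "Node", "name": "worker-1"}],
    [{"label": "Endpoint", "serviceEndpoint": "web", "port": 8080},
     {"label": "Container", "name": "web"},
     {"label": "Identity", "name": "web-sa"},
     {"label": "Node", "name": "worker-1"}],
    [{"label": "Endpoint", "serviceEndpoint": "web", "port": 8443},
     {"label": "Container", "name": "web-tls"},
     {"label": "Node", "name": "worker-2"}],
    [{"label": "Endpoint", "serviceEndpoint": "kube-dns", "port": 53},
     {"label": "Container", "name": "coredns"},
     {"label": "Node", "name": "worker-1"}],
]


def vulnerable_services(paths):
    """Group the ports of path-starting endpoints by service name."""
    # limit(local, 1): keep only the first element of every path.
    starts = [path[0] for path in paths]

    # dedup(): drop endpoints already seen (here: equal property dicts).
    unique = []
    for endpoint in starts:
        if endpoint not in unique:
            unique.append(endpoint)

    # valueMap(...) + group().by(...).by(...): service name -> list of ports.
    grouped = {}
    for endpoint in unique:
        grouped.setdefault(endpoint["serviceEndpoint"], []).append(endpoint["port"])
    return grouped


services = vulnerable_services(paths)
for name, ports in services.items():
    print(name, ports)
```

Two of the four paths start at the same `web:8080` endpoint, so that endpoint is counted once. The result has one entry per service, however many paths lead away from it.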

In [None]:
%%gremlin -d name -g class -le 50 -p inv,oute
kh.endpoints().criticalPaths().limit(local,1)
.dedup().valueMap("serviceEndpoint","port")
.group().by("serviceEndpoint").by("port")

Looking at the results from the previous query, not all services are equally interesting from a security standpoint. Some, like `kube-dns`, are internal infrastructure services.

### Filter out internal infrastructure

**What is kube-dns?** It's the internal DNS service that Kubernetes uses for service discovery within the cluster. It is normally not exposed outside the cluster, so it isn't a realistic entry point for external attackers.

The query below shows attack paths from endpoints, excluding the `kube-dns` internal service. We're filtering out internal system services to focus on externally-accessible services that attackers are more likely to target.

<details>
<summary><i>Query breakdown</i></summary>

`kh.endpoints()` gets all [endpoints](https://kubehound.io/reference/entities/endpoint/) | `.not(has("serviceEndpoint","kube-dns"))` excludes kube-dns | `.criticalPaths()` finds [attack chains](https://kubehound.io/queries/dsl/) | `.by(elementMap())` includes all [properties](https://kubehound.io/reference/entities/common/)
</details>
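
The same exclusion idea extends to more than one internal service. The Python sketch below shows this on made-up endpoint records; the deny-list is an assumption for illustration, and only `kube-dns` comes from the query in this notebook.

```python
# Hypothetical deny-list of internal services. Only "kube-dns" appears in
# the notebook's query; "metrics-server" is an illustrative extra.
INTERNAL_SERVICES = {"kube-dns", "metrics-server"}

# Made-up endpoint records carrying the two properties used in this notebook.
endpoints = [
    {"serviceEndpoint": "web", "port": 8080},
    {"serviceEndpoint": "kube-dns", "port": 53},
    {"serviceEndpoint": "metrics-server", "port": 443},
    {"serviceEndpoint": "api-gateway", "port": 443},
]


def drop_internal(endpoints, internal=INTERNAL_SERVICES):
    """Keep only endpoints whose service name is not on the deny-list."""
    return [e for e in endpoints if e["serviceEndpoint"] not in internal]


candidates = drop_internal(endpoints)
for endpoint in candidates:
    print(endpoint["serviceEndpoint"], endpoint["port"])
```

In Gremlin itself, the standard `within(...)` predicate could likely play the same role inside `has(...)`. That variant is not part of this notebook and is untested here.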

In [None]:
%%gremlin -d label -g class -le 50 -p inv,oute
kh.endpoints().not(has("serviceEndpoint","kube-dns")).criticalPaths().by(elementMap())

The previous query used `.criticalPaths()` - a KubeHound shortcut that finds paths to ANY critical resource (Nodes, cluster-admin roles, privileged secrets, etc.). But we want to focus specifically on Node compromise.

### Trace the complete attack path to Node takeover

As mentioned earlier, once an attacker reaches a Node, they have full control of that host. They can access all containers, secrets, and data on that node. Game over.

The query below shows the complete path: which endpoint an attacker starts from, what they compromise along the way (containers, identities, permissions), and how they finally reach Node access.

<details>
<summary><i>Query breakdown</i></summary>

`kh.endpoints()` gets all [endpoints](https://kubehound.io/reference/entities/endpoint/) | `.not(has("serviceEndpoint","kube-dns"))` excludes kube-dns | `.repeat(outE().inV().simplePath())` follows [edges](https://kubehound.io/terminology/) to next [vertices](https://kubehound.io/terminology/) | `.until(hasLabel("Node").or().loops().is(5))` stops at [Node](https://kubehound.io/reference/entities/node/) or after 5 hops | `.hasLabel("Node")` filters for paths reaching Nodes | `.path().by(elementMap())` shows complete attack chain | `.limit(100)` caps results
</details>
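
The `repeat` / `until` pattern amounts to a depth-limited search that never revisits a vertex. Here is a minimal Python sketch of that idea on a made-up graph. The vertex names are hypothetical, and the `type:name` naming scheme and the `label_of` helper are assumptions for this example, not KubeHound conventions.

```python
MAX_HOPS = 5  # mirrors loops().is(5)

# Hypothetical toy graph: vertex -> list of (attack, next vertex).
GRAPH = {
    "endpoint:web": [("ENDPOINT_EXPLOIT", "container:web")],
    "container:web": [
        ("IDENTITY_ASSUME", "identity:web-sa"),
        ("CE_MODULE_LOAD", "node:worker-1"),
    ],
    "identity:web-sa": [("PERMISSION_DISCOVER", "permission:pod-create")],
    "permission:pod-create": [("POD_CREATE", "node:worker-1")],
    "node:worker-1": [],
}


def label_of(vertex):
    """Hypothetical helper: the vertex type is the text before the colon."""
    return vertex.split(":", 1)[0]


def paths_to_node(graph, start, max_hops=MAX_HOPS):
    """Return every simple path from start that reaches a node within max_hops."""
    found = []

    def walk(path, visited, hops):
        for attack, nxt in graph.get(path[-1], []):
            if nxt in visited:               # simplePath(): no repeated vertex
                continue
            extended = path + [attack, nxt]
            if label_of(nxt) == "node":      # until(hasLabel("Node")): stop here
                found.append(extended)
            elif hops + 1 < max_hops:        # hop budget left: keep walking
                walk(extended, visited | {nxt}, hops + 1)
            # otherwise the hop limit was hit without reaching a node: discard

    walk([start], {start}, 0)
    return found


node_paths = paths_to_node(GRAPH, "endpoint:web")
for path in node_paths:
    print(" -> ".join(path))
```

Lowering `max_hops` to 3 leaves only the two-hop container escape in this toy graph. The hop limit in the query works the same way: it trades completeness for a smaller, faster result.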

In [None]:
%%gremlin -d label -g class -le 50 -p inv,oute
kh.endpoints().not(has("serviceEndpoint","kube-dns"))
	.repeat(
		outE().inV().simplePath()
	)
	.until(
		hasLabel("Node")
		.or()
		.loops().is(5)
	)
	.hasLabel("Node")
	.path()
	.by(elementMap())
	.limit(100)	// Limit the number of results for large clusters

Congratulations! You've successfully filtered down from hundreds of attack paths to the most critical, actionable findings.