diff --git a/README.md b/README.md
index 834724dc..9cbeb57b 100644
--- a/README.md
+++ b/README.md
@@ -1,117 +1,235 @@
 ![](https://github.com/vmware-tanzu/crash-diagnostics/workflows/Crash%20Diagnostics%20Build/badge.svg)
-# Crash Recovery and Diagnostics for Kubernetes
+# Crashd - Crash Diagnostics
-Crash Recovery and Diagnostics for Kubernetes (*Crash Diagnostics* for short) is designed to help human operators who are investigating and troubleshooting unhealthy or unresponsive Kubernetes clusters. It is a project designed to automate the diagnosis of problem clusters that may be in an unstable state including completely inoperable. In its introductory release, Crash Diagnostics provides cluster operators the ability to automatically collect machine states and other information from each node in a cluster. The collected information is then bundled in a tar file for further analysis.
+Crash Diagnostics (Crashd) is a tool that helps human operators easily interact with, and collect information from, infrastructures running Kubernetes for tasks such as automated diagnosis and troubleshooting.
-## Crash Diagnostics Design
-Starting with the version 0.3.x of Crash Diagnostics, the project will undergo a major redesign:
-* Refactor the programmatic API surface into distinct infrastructural components
-* A programmatic extension/plugin for distinct backend implementations to support different compute infrastructures
-* Tigher Kubernetes integration including the ability to extract troubleshooting data Cluster-API managed clusters
+## Crashd Features
+* Crashd uses the [Starlark language](https://github.com/google/starlark-go/blob/master/doc/spec.md), a Python dialect, to express and invoke automation functions
+* Easily automate interaction with infrastructures running Kubernetes
+* Interact with and capture information from compute resources such as machines (via SSH)
+* Automatically execute commands on compute nodes to capture results
+* Capture objects and cluster logs from the Kubernetes API server
+* Easily extract data from Cluster-API managed clusters
-See the detail Google Doc design document [here](https://docs.google.com/document/d/1pqYOdTf6ZIT_GSis-AVzlOTm3kyyg-32-seIfULaYEs/edit?usp=sharing).
+## How Does it Work?
+Crashd executes script files, written in Starlark, that interact with a specified infrastructure along with its cluster resources. Starlark script files contain predefined Starlark functions that can interact with the servers in the cluster and collect diagnostics and other information from them.
-## Collecting information for troubleshooting
-To specify the resources to collect from cluster machines, a series of commands are declared in a file called a diagnostics file. Like a Dockerfile, the diagnostics file is a collection of line-by-line directives with commands that are executed on each specified cluster machine. The output of the commands is then added to a tar file and saved for further analysis.
+For details on the design of Crashd, see the Google Doc design document [here](https://docs.google.com/document/d/1pqYOdTf6ZIT_GSis-AVzlOTm3kyyg-32-seIfULaYEs/edit?usp=sharing).
-For instance, when the following diagnostics file (saved as Diagnostics.file) is executed, it will collect information from the two cluster machines (specified with the `FROM` directive):
+## Installation
+There are two ways to get started with Crashd: either download a pre-built binary, or pull down the code and build it locally.
-```
-FROM 192.168.176.100:22 192.168.176.102:22
-AUTHCONFIG username:${remoteuser} private-key:${HOME}/.ssh/id_rsa
-WORKDIR /tmp/crashout
+### Download binary
+1. Download the latest [binary release](https://github.com/vmware-tanzu/crash-diagnostics/releases/) for your platform
+2. Extract the release tarball
+   ```
+   tar -xvf .tar.gz
+   ```
+3. Move the binary to a directory on your operating system's `PATH`
-# copy log files
-COPY /var/log/kube-apiserver.log
-COPY /var/log/kube-scheduler.log
-COPY /var/log/kube-controller-manager.log
-COPY /var/log/kubelet.log
-COPY /var/log/kube-proxy.log
-# Capture service status output
-CAPTURE journalctl -l -u kubelet
-CAPTURE journalctl -l -u kube-apiserver
+### Compiling from source
+Crashd is written in Go and requires version 1.11 or later. Clone the source from its repo or download it to your local directory. From the project's root directory, compile the code with the
+following:
-# Collect docker-related logs
-CAPTURE journalctl -l -u docker
-CAPTURE /bin/sh -c "docker ps | grep apiserver"
+```
+GO111MODULE=on go build -o crashd .
+```
-# Collect objects and logs from API server if available
-KUBECONFIG $HOME/.kube/kind-config-kind
-KUGEGET objects namespaces:"kube-system default" kind:"deployments"
-KUBEGET logs namespaces:"default" containers:"hello-app"
+Or, you can run a versioned build using the `build.go` source code:
-OUTPUT ./crash-out.tar.gz
 ```
-Note that the tool can also collect resource data from the API server, if available, using `KUBECONFIG` and the `KUBEGET` command.
+go run .ci/build/build.go
-## Features
-* Simple declarative script with flexible format
-* Support for multiple directives to execute user-provided commands
-* Ability to declare or use existing environment variables in commands
-* Easily transfer files from cluster machines
-* Execute commands on remote machines and captures the results
-* Automatically collect information from multiple machines
-* Collect resource data and pod logs from an available API server
+Build amd64/darwin OK: .build/amd64/darwin/crashd
+Build amd64/linux OK: .build/amd64/linux/crashd
+```
+
+## Getting Started
+A Crashd script consists of a collection of Starlark functions stored in a file. For instance, the following script (saved as diagnostics.crsh) collects system information from a list of provided hosts using SSH. The collected data is then bundled into a tar.gz file at the end:
+
+```python
+# Crashd global config
+crshd = crashd_config(workdir="{0}/crashd".format(os.home))
+
+# Enumerate compute resources
+# Define a host list provider with configured SSH
+hosts=resources(
+    provider=host_list_provider(
+        hosts=["170.10.20.30", "170.40.50.60"],
+        ssh_config=ssh_config(
+            username=os.username,
+            private_key_path="{0}/.ssh/id_rsa".format(os.home),
+        ),
+    ),
+)
+
+# collect data from hosts
+capture(cmd="sudo df -i", resources=hosts)
+capture(cmd="sudo crictl info", resources=hosts)
+capture(cmd="df -h /var/lib/containerd", resources=hosts)
+capture(cmd="sudo systemctl status kubelet", resources=hosts)
+capture(cmd="sudo systemctl status containerd", resources=hosts)
+capture(cmd="sudo journalctl -xeu kubelet", resources=hosts)
+
+# archive collected data
+archive(output_file="diagnostics.tar.gz", source_paths=[crshd.workdir])
+```
-See the complete list of supported [directives here](./docs/README.md).
+The previous code snippet connects to two hosts (specified in the `host_list_provider`), executes commands on them remotely over SSH, and uses `capture` to store the results.
+> See the complete list of supported [functions here](./docs/README.md).
-## Running Diagnostics
-The tool is compiled into a single binary named `crash-diagnostics`. For instance, when the following command runs, by default it will search for and execute diagnostics script file named `./Diagnostics.file`:
+### Running the script
+To run the script, do the following:
 ```
-crash-diagnostics run
+$> crashd run diagnostics.crsh
 ```
-Flag `--file` can be used to specify a different diagnostics file:
+If you want to output debug information, use the `--debug` flag as shown:
 ```
-crash-diagnostics --file test-diagnostics.file
+$> crashd run --debug diagnostics.crsh
+
+DEBU[0000] creating working directory /home/user/crashd
+DEBU[0000] run: executing command on 2 resources
+DEBU[0000] run: executing command on localhost using ssh: [sudo df -i]
+DEBU[0000] ssh.run: /usr/bin/ssh -q -o StrictHostKeyChecking=no -i /home/user/.ssh/id_rsa -p 22 user@localhost "sudo df -i"
+DEBU[0001] run: executing command on 170.10.20.30 using ssh: [sudo df -i]
+...
+```
+
+## Compute Resource Providers
+Crashd uses the concept of a provider to enumerate compute resources. Each provider implementation is responsible for enumerating the compute resources on which Crashd can execute commands using a transport (e.g. SSH). Crashd comes with several providers, including:
+
+* *Host List Provider* - uses an explicit list of host addresses (see previous example)
+* *Kubernetes Nodes Provider* - extracts host information from Kubernetes API node objects
+* *CAPV Provider* - uses Cluster-API to discover machines in a vSphere cluster
+* *CAPA Provider* - uses Cluster-API to discover machines running on AWS
+* More providers coming!
+
+
+## Accessing script parameters
+Crashd scripts can access external values that can be used as script parameters.
+### Environment variables
+Crashd scripts can access environment variables at runtime using the `os.getenv` method:
+```python
+kube_capture(what="logs", namespaces=[os.getenv("KUBE_DEFAULT_NS")])
 ```
-The output file generated by the tool can be specified using flag `--output` (which overrides value in script):
+### Command-line arguments
+Scripts can also access command-line arguments passed as key/value pairs using the `--args` flag. For instance, when the following command is used to start a script:
 ```
-crash-diagnostics --file test-diagnostics.file --output test-cluster.tar.gz
+crashd run --args="kube_ns=kube-system username=$(whoami)" diagnostics.crsh
 ```
+Values from `--args` can be accessed as shown below:
-When you use the `--debug` flag, you should see log messages on the screen similar to the following:
+```python
+kube_capture(what="logs", namespaces=["default", args.kube_ns])
 ```
-$> crash-diagnostics run --debug
-DEBU[0000] Parsing script file
-DEBU[0000] Parsing [1: FROM local]
-DEBU[0000] FROM parsed OK
-DEBU[0000] Parsing [2: WORKDIR /tmp/crasdir]
+## More Examples
+### SSH Connection via a jump host
+The SSH configuration function can be configured with a jump user and jump host. This is useful for providers that require a host proxy for SSH connections, as shown in the following example:
+```python
+ssh=ssh_config(username=os.username, jump_user=args.jump_user, jump_host=args.jump_host)
+hosts=host_list_provider(hosts=["some.host", "172.100.100.20"], ssh_config=ssh)
 ...
-DEBU[0000] Archiving [/tmp/crashdir] in out.tar.gz
-DEBU[0000] Archived /tmp/crashdir/local/df_-i.txt
-DEBU[0000] Archived /tmp/crashdir/local/lsof_-i.txt
-DEBU[0000] Archived /tmp/crashdir/local/netstat_-an.txt
-DEBU[0000] Archived /tmp/crashdir/local/ps_-ef.txt
-DEBU[0000] Archived /tmp/crashdir/local/var/log/syslog
-INFO[0000] Created archive out.tar.gz
-INFO[0002] Created archive out.tar.gz
-INFO[0002] Output done
-```
-
-## Compile and Run
-`crash-diagnostics` is written in Go and requires version 1.11 or later. Clone the source from its repo or download it to your local directory. From the project's root directory, compile the code with the
-following:
+```
+### Connecting to Kubernetes nodes with SSH
+The following uses the `kube_nodes_provider` to connect to Kubernetes nodes and execute remote commands against those nodes using SSH:
+
+```python
+# SSH configuration
+ssh=ssh_config(
+    username=os.username,
+    private_key_path="{0}/.ssh/id_rsa".format(os.home),
+    port=args.ssh_port,
+    max_retries=5,
+)
+
+# enumerate nodes as compute resources
+nodes=resources(
+    provider=kube_nodes_provider(
+        kube_config=kube_config(path=args.kubecfg),
+        ssh_config=ssh,
+    ),
+)
+
+# exec `uptime` command on each node
+uptimes = run(cmd="uptime", resources=nodes)
+
+# print `run` result from first node
+print(uptimes[0].result)
 ```
-GO111MODULE=on go install .
+
+
+### Retrieving Kubernetes API objects and logs
+In the following example, `kube_capture` is used to connect to a Kubernetes API server and retrieve Kubernetes API objects and logs. The retrieved data is then saved to the filesystem as shown below:
+
+```python
+nspaces=[
+    "capi-kubeadm-bootstrap-system",
+    "capi-kubeadm-control-plane-system",
+    "capi-system", "capi-webhook-system",
+    "cert-manager", "tkg-system",
+]
+
+conf=kube_config(path=args.kubecfg)
+
+# capture Kubernetes API objects and store them in files
+kube_capture(what="logs", namespaces=nspaces, kube_config=conf)
+kube_capture(what="objects", kinds=["services", "pods"], namespaces=nspaces, kube_config=conf)
+kube_capture(what="objects", kinds=["deployments", "replicasets"], namespaces=nspaces, kube_config=conf)
 ```
-This should place the compiled `crash-diagnostics` binary in `$(go env GOPATH)/bin`. You can test this with:
+### Interacting with Cluster-API managed machines running on vSphere (CAPV)
+As mentioned, Crashd provides the `capv_provider`, which allows scripts to interact with Cluster-API managed clusters running on a vSphere infrastructure (CAPV). The following shows an abbreviated snippet of a Crashd script that retrieves diagnostics information from the management cluster machines managed by a CAPV-initiated cluster:
+
+```python
+# enumerates management cluster nodes
+nodes = resources(
+    provider=capv_provider(
+        ssh_config=ssh_config(username="capv", private_key_path=args.private_key),
+        kube_config=kube_config(path=args.mc_config)
+    )
+)
+
+# execute and capture commands output from management nodes
+capture(cmd="sudo df -i", resources=nodes)
+capture(cmd="sudo crictl info", resources=nodes)
+capture(cmd="sudo cat /var/log/cloud-init-output.log", resources=nodes)
+capture(cmd="sudo cat /var/log/cloud-init.log", resources=nodes)
+...
+
 ```
-crash-diagnostics --help
+
+The previous snippet interacts with management cluster machines. The provider can be configured to enumerate workload machines (by specifying the name of a workload cluster) as shown in the following example:
+
+```python
+# enumerates workload cluster nodes
+nodes = resources(
+    provider=capv_provider(
+        workload_cluster=args.cluster_name,
+        ssh_config=ssh_config(username="capv", private_key_path=args.private_key),
+        kube_config=kube_config(path=args.mc_config)
+    )
+)
+
+# execute and capture commands output from workload nodes
+capture(cmd="sudo df -i", resources=nodes)
+capture(cmd="sudo crictl info", resources=nodes)
+...
 ```
-If this does not work properly, ensure that your Go environment is setup properly.
+
+### All Examples
+See all script examples in the [./examples](./examples) directory.
 ## Roadmap
 This project has numerous possibilities ahead of it. Read about our evolving [roadmap here](ROADMAP.md).
diff --git a/ROADMAP.md b/ROADMAP.md
index 2ee6f84c..8d2a5860 100644
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -1,5 +1,5 @@
-# Roadmap
-This project has just started and is going through a steady set of iterative changes to create a tool that will be useful for Kubernetes human operators. The release cadance is designed to allow the implemented features to mature overtime and lessen technical debts. Each release series will consist of alpha and beta releases before each major release to allow time for the code to be properly exercized by the community.
+# Crash Diagnostics Roadmap
+This project has been in development through several releases. The release cadence is designed to allow the implemented features to mature over time and to reduce technical debt. Each release series will consist of alpha and beta releases (when necessary) before each major release to allow time for the code to be properly exercised by the community.
 This roadmap has a short and medium term views of the type of design and functionalities that the tool should support prior to a `1.0` release.
@@ -25,22 +25,20 @@ The following additional features are also planned for this series.
 ## v0.3.x-Releases
-This series of release will see the redsign of the internals of Crash Diagnostics:
-* Refactor the programmatic API surface into distinct infrastructural components
-* A programmatic extension/plugin to create backend implementations for different infrastructures
-* Tigher Kubernetes integration including the ability to extract troubleshooting data Cluster-API managed clusters
+This series of releases will see a redesign of the internals of Crash Diagnostics, moving away from a custom configuration format and adopting the [Starlark](https://github.com/bazelbuild/starlark) language (a dialect of Python):
+* Refactor the internal implementation to use Starlark
+* Introduce/implement several Starlark functions to replace the directives from the previous file format
+* Develop the ability to extract data/logs from Cluster-API managed clusters
 See the Google Doc design document [here](https://docs.google.com/document/d/1pqYOdTf6ZIT_GSis-AVzlOTm3kyyg-32-seIfULaYEs/edit?usp=sharing).
-## v0.5.x-Releases
+## v0.4.x-Releases
 This series of releases will explore optimization features:
 * Parsing and execution optimization (i.e. parallel execution)
 * A Uniform retry strategies (smart enough to requeue actions when failed)
-## v0.4.x-Releases
+## v0.5.x-Releases
 Exploring other interesting ideas:
 * Automated diagnostics (would be nice)
-* And more...
-
-TBD
\ No newline at end of file
+* And more...
\ No newline at end of file
diff --git a/TODO.md b/TODO.md
index a5aa7d89..bce1d2bd 100644
--- a/TODO.md
+++ b/TODO.md
@@ -75,8 +75,5 @@ This tag/version reflects migration to github
 * [ ] Cloud API recipes (i.e. recipes to debug CAPV)
 # v0.3.0
-* Refactor internal executor into a pluggable interface-driven model
-  - i.e. possible suppor for different runtime ()
-  - default runtime may use ssh and scp while other runtime may choose to use something else
-  - default runtime may use ssh/scp all the time regardless of local or remote
-
\ No newline at end of file
+* Redesign the script/configuration language for Crash Diagnostics
+* Refactor the internals and implement support for the [Starlark](https://github.com/bazelbuild/starlark) language
\ No newline at end of file
diff --git a/docs/README.md b/docs/README.md
index 55d7921a..8580be2a 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -1,483 +1,697 @@
-# `Crash-Diagnostics`
-The tool is compiled into a single binary named called `crash-diagnostics`. Currently, the binary supports two commands:
+# Crash Diagnostics Reference
+
+## Running `crashd`
+Crash Diagnostics is compiled into a single binary called `crashd`. The command can be invoked as follows:
 ```
 Usage:
-  crash-diagnostics [command]
+  crashd [command]
 Available Commands:
   help        Help about any command
-  run         Executes a diagnostics script file
+  run         Executes a script file
 ```
-Command `run` uses a diagnostics file to script how and what resources are collected from cluster machines. By default, `crash-diagnostics run` searches for script for `Diagnostics.file` which specifies line-by-line directives and commands that are interpreted into actions to be executed against the nodes in the cluster.
+Command `run` executes the specified script file. Use flag `--help` to get additional help for a given command:
 ```
-> crash-diagnostics run --help
+> crashd run --help
 Usage:
-  crash-diagnostics run [flags]
-
-Flags:
-      --file string     the path to the diagnostics script file to run (default "Diagnostics.file")
-      --output string   the path of the generated archive file (default "out.tar.gz")
-```
-
-
-For instance, the following command will execute file `./Diagnostics.file` and store any collected data in file `out.tar.gz`:
-
-```
-crash-diagnostics run
-```
-
-To run a different script file or specify a different output archive, use the flags shown below:
-
-```
-crash-diagnostics --file test-cluster.file --output test-cluster.tar.gz
-```
-
-## Diagnostics.file Format
-`Diagnostics.file` uses a simple line-by-line format (à la Dockerfile) to specify directives on how to collect data from cluster servers:
-
-```
-DIRECTIVE [arguments]
-```
-
-A directive can either be a `preamble` for runtime configuration or an `action` which can execute a command that runs on each specified host.
-
-### Example Diagnostics.file
-The following is a sample Diagnostics.file that captures command output and copy files from two hosts:
-```
-FROM 127.0.0.1:22 192.168.99.7:22
-WORKDIR /tmp/crashdir
-
-COPY /var/log/kube-apiserver.log
-CAPTURE df -h
-CAPTURE df -i
-CAPTURE netstat -an
-CAPTURE ps -ef
-CAPTURE lsof -i
-CAPTURE journalctl -l -u kube-apiserver
-COPY /var/log/kubelet.log
-COPY /var/log/kube-proxy.log
-
-OUTPUT path:/tmp/crashout/out.tar.gzip
-
-```
-In the previous example, the tool will collect information from servers `127.0.0.1:22` and `192.168.99.7:22` by executing the COPY and the CAPTURE
-commands specified in the file. The collected information is bundled into archive file `/tmp/crashout/out.tar.gzip` specified by `OUTPUT` (note that
-the output file can also be specified by flag `--output`).
-
-## Diagnostics.file Directives
-Currently, `crash-diagnostics` supports the following directives:
-```
-AS
-AUTHCONFIG
-CAPTURE
-COPY
-ENV
-FROM
-KUBECONFIG
-KUBEGET
-OUTPUT
-RUN
-WORKDIR
-```
-Each directive can receive named parameters to pass values to the command it represents. Each named parameter uses an identifier followed by a colon `:` as shown below:
-```
-DIRECTIVE name0: name1: ... nameN:
-```
-Optionally, most directives can be declared with a single default unnamed parameter value as shown below:
-```
-DIRECTIVE
-```
-As an example, directive `WORKDIR` can be declared with its `path` named parameter:
-```
-WORKDIR path:/some/path
-```
-Or it can be declared with an unnamed parameter, which internally is assumed to be the `path:` parameter:
-```
-WORKDIR /some/path
-```
-
-### AS
-This directive specifies the `userid` and optional `groupid` to use when `crash-diagnostics` execute its commands against the current machine.
-```
-AS userid: [groupid:]
-```
-Example:
-```
-AS userid:100
-```
-Or
-```
-AS userid:vladimir groupid:200
-```
-
-### AUTHCONFIG
-Configures an authentication for connections to remote node servers. A `username` must be along with an optional `private-key` which can be used by command backends that support private key/public key certificate such as SSH.
-
-```
-AUTHCONFIG username:vladimir private-key:/Users/vladimir/.ssh/ssh_rsa
-```
-
-### CAPTURE
-This directive captures the output of a command when executed executed on a specified machine (see `FROM` directive). The output of the executed command is captured and saved in a file that is added to the archive file bundle.
-
-The following shows an example of directive `CAPTURE`:
-
-```
-CAPTURE /bin/journalctl -l -u kube-apiserver
-```
-
-Or, with its named parameter `cmd:`:
-```
-CAPTURE cmd:"/bin/journalctl -l -u kube-apiserver"
-```
-
-#### CAPTURE file names
-The captured output will be written to a file whose name is derived from the command string as follows:
-
-```
-_bin_journalctl__l__u_kube-api-server.txt
-```
-
-#### CAPTURE Echo output
-The CAPTURE command can also copy its result to standard output using the `echo` parameter:
-
-```
-CAPTURE cmd:"/bin/journalctl -l -u kube-apiserver" echo:"true"
-```
-
-Note that you have to use the named parameter format.
-
-### COPY
-This directive specifies one or more files (and/or directories) as data sources that are copied
-into the arachive bundle as shown in the following example
-
-```
-COPY /var/log/kube-proxy.log /var/log/containers
-
-# Or with using its named parameter format with parameter `paths`:
-
-COPY paths:"/var/log/kube-proxy.log /var/log/containers"
-```
-The previous command will copy file `/var/log/kube-proxy.log` and each file in directory `/var/log/containers` as part of the generated archive bundle.
-
-#### File name expansion
-The `COPY` command also supports file name expansion using patterns (or globbing). For instance, the following will copy only log files whose names start with `kube` from the nodes:
-
-```
-COPY /var/log/kube*.log
-```
-
-### ENV
-This directive is used to inject environment variables that are made available to other commands in the script file at runtime:
-```
-ENV key0=val0 key1=val1 key2=val2
-ENV key3=val3
-...
-ENV keyN=valN
-```
-Multiple variables can be declared for each `ENV` and a Diagnostics file can have one or more `ENV` declarations. The `ENV` command can optionally use the named parameter format with parameter `vars:` as shown below:
-```
-ENV vars:"Foo=bar Blat=bat"
-```
-
-#### ENV Variable Expansion
-`Crash-Diagnostics` supports a simple version of Unix-style variable expansion using `$VarName` and `${varName}` formats. The following example shows how this works:
-
+  crashd run [flags] script-file
+  ...
 ```
-# environment vars
-ENV logroot=/var/log kubefile=kube-proxy.log
-ENV containerlogs=/var/log/containers
-# references vars above
-COPY $logroot/${kubefile}
-COPY ${containerlogs}
-```
-
-#### Escaping Variable Expansion
-Because Crash Diagnostics files use the same variable expansion format as a shell script, this may create situations where the Diagnostics file expand variables that are intended to be interpreted by the shell script on the remote server. For instance, the following command will not work properly:
-
-```
-RUN /bin/bash -c 'for f in $(find /var/logs/containers -type f); do cat $f; done'
-```
-The previous will fail because Crash Diagnostics will expand the named variables (to empty) before the command is sent to the server as follows:
-```
-/bin/bash -c 'for f in find /var/logs/containers -type f); do cat ; done'
-```
-
-To fix this, Crash Diagnostics supports the ability to escape a variable expansion using `\$`. Using the previous example, this would look like the following:
-
-```
-RUN /bin/bash -c 'for f in \$(find /var/logs/containers -type f); do cat \$f; done'
-```
-With the escape slashes in place, the correct shell command will be sent to the remote server as intended:
-
-```
-/bin/bash -c 'for f in $(find /var/logs/containers -type f); do cat $f; done'
-```
-
-### FROM
-`FROM` specifies the source machines from which data is collected. Machines (virtual or otherwise) are specified by directly by providing a space-separated list of address endpoints consisting of `:` as shown in the following example:
-
-```
-FROM 10.10.100.2:22 10.10.100.3:22 10.10.100.4:22
-```
-Or using its named parameter `hosts:`
-```
-FROM hosts:"10.10.100.2:22 10.10.100.3:22 10.10.100.4:22"
-```
-#### FROM Default Port Setting
-By default the `crash-diagnostics` internal executor uses `SSH/SCP` protocols to connect to remote machines. If a specified machine address does not include a port, port 22 will be used as shown below:
+### Passing script arguments
+`crashd` script files can receive parameters from the command line using the `--args` flag, which takes key/value pairs separated by spaces, as shown below:
 ```
-FROM 10.10.100.2 10.10.100.4:2244
+crashd run --args "arg0='value 0' arg1='value 1'" file.crsh
 ```
-In the previous example, machine `10.10.100.2` will be connected using port 22. The default port can be specified using the `port:` named parameter as shown below:
-```
-FROM hosts:"10.10.100.2 10.10.100.3 10.10.100.4:2244" port:"2211"
-```
-In the previous example, `crash-diagnostics` will connect to machines `10.10.100.2` and `10.10.100.3` on port `2211`
+These values can be accessed inside a running script using the `args` struct as follows:
-#### FROM Maximum Connection Retries
-Because `crash-diagnostics` uses network protocols (i.e. SSH/SCP) to connect to remote machines, it will automatically retry a remote command upon failure. The number of retries can be configured using the `retries:` named parameter. The following will retry each remote command attempt up to 10 times before giving up:
-```
-FROM hosts:"10.10.100.2 10.10.100.3 10.10.100.4:2244" port:"2211" retries:"10"
+```python
+ssh_config(username=args.arg0, private_key_path=args.arg1)
 ```
-#### Sourcing from Kubernetes Node Objects
-Instead of directly specifying machine addresses as shown above, source machines information can be extracted from Kubernetes Node objects (if a cluster is available). The following example will get machine information stored in cluster Node objects and use default port 2222 to remotely connect to each machine:
+### Accessing environment variables
+At runtime, `crashd` scripts can also access values stored in environment variables, as shown in the following snippet:
 ```
-KUBECONFIG $HOME/.kube/kind-config-kind
-FROM nodes:"all" port:"2222"
+KUBE_NS=capi-system crashd run file.crsh
 ```
-In the previous example, `crash-diagnostics` uses the specified KUBECONFIG to connect to the API server to retrieve available Node objects. These objects are used to determine the IP of the cluster machines to which `crash-diagnositcs` will connect using the specified port.
-The `nodes:` parameter can also be used to specified a list of node names to match when retrieving Node objects as shown:
-```
-KUBECONFIG $HOME/.kube/kind-config-kind
-FROM nodes:"worker-node-1 worker-node-2" port:"2222"
-```
-In the previous example, `crash-diagnostics` will extract IP address information from Node objects with names matching `workder-node-1` and `worker-node-2`.
+The running script can access `KUBE_NS` using the `os` struct as shown below:
-The Node objects can be further filtered using labels. For instance, the following will only select nodes where label `kubernetes.io/hostname` has a value of `control-plane`:
-```
-KUBECONFIG $HOME/.kube/kind-config-kind
-FROM nodes:"all" labels="kubernetes.io/hostname=control-plane" port:"2222"
+```python
+kube_capture(what="logs", namespaces=[os.getenv("KUBE_NS")])
 ```
-### KUBECONFIG
-This directive specifies the fully qualified path of the Kubernetes client configuration file or KUBECONFIG. If the specified path does not exist, all subsquent command that uses this configuration will quietly fail (logged).
+## Starlark: the Crashd Language
+Crashd scripts are written in Starlark, a Python dialect. This means that Crashd scripts can use normal programming constructs:
+- Variable declarations
+- Function definitions
+- Simple data types (string, numeric, bool)
+- Composite types (dictionary, list, tuple, set, and functions)
+- Statements and expressions
+- Etc
-```
-KUBECONFIG $HOME/.kube/kind-config-kind
-```
-The previous configures KUBECONFIG to use `$HOME/.kube/kind-config-kind`.
+> For more on Starlark, see the [language reference](https://github.com/bazelbuild/starlark/blob/master/spec.md).
-### KUBEGET
-The `KUBEGET` directive allows a running diagnostic script to connect to an available API server and retrieve API resources such as objects and logs. `KUBEGET` takes several parameters that can be combined to filter and select specific objects. The command can get API server `objects`, `logs`, or `all` specified using optionally-named `what` parameter as shown below:
-```
-# specifies to get objects
-KUBEGET objects
-```
+## The Crashd Script File
+A script file is composed of Starlark language elements and built-in functions provided by Crashd at runtime. In addition to built-in functions, script authors can define their own custom functions that can be reused in the script. The following is an example of a valid script that `crashd` can execute:
-Or, the long format of the same command:
+```python
+def from_hosts():
+    hosts = run_local("cat /etc/hosts | grep -E '([0-9]){3}\\.' | awk '{print $1}'")
+    return hosts.splitlines()
-```
-KUBEGET what:"objects"
-```
+ssh_config(username="username", port=2222, max_retries=10)
+resources(hosts=from_hosts())
-#### `KUBEGET` parameters:
-* `what` - an optionally-named parameter that specifies what to get inclusing `objects`, `logs`, or `all`.
-  * When `objects` - any API objects are retrieved (without logs)
-  * When `logs` - Pods are retrieved including associated logs
-  * When `all` - everything is retrieved including objects and logs.
-  * Example: `KUBEGET objects`
-* `groups` - a list specifying from which group to retrieve API objects. For legacy core group, use `core`.
-  * Example: `KUBEGET objects groups:"core apps"`
-  * Selects all objects from both `/api/v1` (core) and `/apis/apps`.
-  * When `what=logs`, groups is automatically set to `core`.
-* `kinds` - a list of object kinds to select.
-  * Example: `KUBEGET objects kinds:"pods deployments"`
-  * Retrieves objects of kind (or resource.Name) `pods` and `deployments`
-  * While the parameter is called `kinds`, the match is done on the resource's plural name (i.e. `pods`, `services`, `deployments`, etc).
-  * When `what=logs"`, kinds is preset to `pods`.
-* `namespaces` - specifies a list of namespaces from which to select objects.
-  * Example: `KUBEGET logs namespaces:"default kube-system"`
-  * Retrieves logs from pods in namespace`default` or `kube-system`.
-  * An empty value will get objects from all namespaces.
-* `versions` - a list of API versions used to select objects.
-  * Example: `KUBEGET objects groups:"apps" versions:"v1 v1alpha1"`
-  * Retrieves objects from group `apps` having versions `v1` or `v1alpha1`.
-* `names` - a list used to filter retrieved object names.
- * Example:`KUBEGET logs names:"kindnet etcd"` - * Retrieves logs from pods with name matching `kindnet` or `etcd`. -* `containers` - a list of container names used when to filter selected pod objects. - * Example: `KUBEGET objects kinds:"pods" containers:"kindnet-cni"` - * Retrieves the pods that have containers named `kindnet-cni` -* `labels` - the label selector expression used to filter selected objects. - * Example: `KUBEGET objects kinds:"services" labels:"app=website"` - * Retrieves all services with label `app:website`. - * Expression uses same format as that used in `kubectl`. - -Here is an example of `KUBEGET` that explicitly uses most of its parameters (assuming `KUBECONFIG` is declared properly): -``` -KUBEGET objects groups:"core" kinds:"pods" namespaces:"kube-system default" containers:"nginx etcd" +capture(cmd="sudo crictl info") +copy_from(path="/var/log/cloud-init-output.log") +copy_from(path="/var/log/cloud-init.log") ``` -The previous `KUBEGET` command will retrieve all pods from namespaces `kube-system` or `default` that have container names `nginx` or `etcd`. - -Crash-Diagnostics stores all retrieved objects under root directory `kubeget` as JSON files. Inside that directory, the saved files are organized by namespaces (for namespaced resources) or -saved at the root directory. - -### OUTPUT -This directive configures the location and file name of the generated archive file as shown in the following example: -``` -OUTPUT /tmp/crashout/out.tar.gz +The previous example shows the definition of a custom function `from_hosts`, which extracts a list of host addresses from the local `/etc/hosts` file. The script also shows the use of several built-in functions, including: +* `ssh_config` +* `resources` +* `capture` +* `copy_from` -# Or with its named parameter path +These built-in functions are used to configure the script and issue commands against remote compute resources.
-OUTPUT path:"/tmp/crashout/out.tar.gz" -``` +## Crashd Built-in Types +Crashd comes with many built-in functions and other types to help you create functioning and useful scripts. Each built-in falls into one of the following categories: +* Configuration functions +* Provider functions +* Resource enumeration functions +* Command functions +* Default Values +* OS data and functions +* Argument data -If `OUTPUT` is not specified in the `Diagnostics.file`, the tool will apply the value of flag `--output` if provided. +## Configuration Functions +Configuration functions declare data structures that store configuration information used by the script. -### RUN -This directive executes the specified command on each machine in the `FROM` list. Unlike `CAPTURE` however, the output of the command is not written to the archive file bundle. +### `crashd_config()` +This function declares script-wide configuration information that is used to configure the script behavior at runtime. Values declared here are usually not used directly by the script. -The following shows an example of `RUN`: +#### Parameters -``` -RUN /bin/journalctl -l -u kube-apiserver - -# Or with its named parameter `cmd` +| Param | Description | Required | +| -------- | -------- | -------- | +| `workdir` | The working directory used by some functions to store files| Yes | +| `uid`| User ID used to run local commands|No, defaults to current ID| +| `gid`| Group ID used to run local commands|No, defaults to current ID| +| `default_shell` |The default shell to use to execute commands |No, defaults to no shell| -RUN cmd:"/bin/journalctl -l -u kube-apiserver" -``` -`RUN` is useful and helps to execute commands to interact with the remote node for tasks such as data preparation or gathering before aggregation. +#### Output +`crashd_config()` returns a struct with the following fields.
-The following shows how `RUN` can be used (see [originating issue](https://github.com/vmware-tanzu/crash-diagnostics/issues/4#issuecomment-540926598)) +| Field | Description | +| --------| --------- | +| `workdir` | The provided `workdir` | +| `uid` | The current UID set | +| `gid` | The current GID set | +| `default_shell`|The shell set, if any| +#### Example +```python +crashd_config( + workdir = "{}/crashd".format(os.home) +) ``` -# prepare needed data -RUN mkdir -p /tmp/containers -RUN /bin/bash -c 'for file in $(ls /var/log/containers/); do sudo cat /var/log/containers/$file > /tmp/containers/$file; done' -COPY /tmp/containers - -# clean up -RUN /usr/bin/rm -rf /tmp/containers -``` - -#### RUN Echo output -The RUJN command can also copy its result to standard output using the `echo` parameter: - -``` -RUN cmd:"/bin/journalctl -l -u kube-apiserver" echo:"true" -``` - -Note that you have to use the named parameter format. - -### WORKDIR -In a Diagnostics.file, `WORKDIR` specifies the working directory used when building the archive bundle as shown in the following example: - -``` -WORKDIR /tmp/crashdir - -# Or using its named parameter path - -WORKDIR path:"/tmp/crashdir" -``` - -The directory is used as a temporary location to store data from all data sources specified in the file. When the tar is built, the content of that directory is removed. 
- -### Example File - -``` -FROM local 162.164.10.1:2222 162.164.10.2:2222 -KUBECONFIG ${USER}/.kube/kind-config-kind -AUTHCONFIG username:test private-key:${USER}/.ssh/id_rsa -WORKDIR /tmp/output - -CAPTURE df -h -CAPTURE df -i -CAPTURE netstat -an -CAPTURE ps -ef -CAPTURE lsof -i - -OUTPUT path:/tmp/crashout/out.tar.gz -``` - -### Comments -Each line that starts with with `#` is considered to be a comment and is ignored at runtime as shown in -the following example: - -``` -# This shows how to comment your script -FROM local 162.164.10.1:2222 162.164.10.2:2222 -KUBECONFIG ${USER}/.kube/kind-config-kind -AUTHCONFIG username:test private-key:${USER}/.ssh/id_rsa -WORKDIR /tmp/output - -# Capture the following commands -CAPTURE df -h -CAPTURE df -i -CAPTURE netstat -an -CAPTURE ps -ef -CAPTURE lsof -i - -# send output here -OUTPUT path:/tmp/crashout/out.tar.gz -``` - - -## Compile and Run -`crash-diagnostics` is written in Go and requires version 1.11 or later. Clone the source from its repo or download it to your local directory. From the project's root directory, compile the code with the -following: - -``` -GO111MODULE="on" go install . -``` - -This should place the compiled `crash-diagnostics` binary in `$(go env GOPATH)/bin`. You can test this with: -``` -crash-diagnostics --help -``` -If this does not work properly, ensure that your Go environment is setup properly. - -Next run `crash-diagnostics` using the sample Diagnostics.file in this directory. Ensure to update it to reflect your -current environment: - -``` -crash-diagnostics run --output crashd.tar.gzip --debug -``` - -You should see log messages on the screen similar to the following: -``` -DEBU[0000] Parsing script file -DEBU[0000] Parsing [1: FROM local] -DEBU[0000] FROM parsed OK -DEBU[0000] Parsing [2: WORKDIR /tmp/crasdir] -... 
-DEBU[0000] Archiving [/tmp/crashdir] in out.tar.gz -DEBU[0000] Archived /tmp/crashdir/local/df_-i.txt -DEBU[0000] Archived /tmp/crashdir/local/lsof_-i.txt -DEBU[0000] Archived /tmp/crashdir/local/netstat_-an.txt -DEBU[0000] Archived /tmp/crashdir/local/ps_-ef.txt -DEBU[0000] Archived /tmp/crashdir/local/var/log/syslog -INFO[0000] Created archive out.tar.gz -INFO[0002] Created archive out.tar.gz -INFO[0002] Output done -``` - -## Contributing - -New contributors will need to sign a CLA (contributor license agreement). Details are described in our [contributing](CONTRIBUTING.md) documentation. - +### `kube_config()` +This configuration function declares and stores configuration needed to connect to a Kubernetes API server. -## License -This project is available under the [Apache License, Version 2.0](LICENSE.txt) \ No newline at end of file +#### Parameters +| Param | Description | Required | +| -------- | -------- | ------- | +| `path` | Path to the local Kubernetes config file. Default: `$HOME/.kube/config`| No | +| `capi_provider` | A Cluster-API provider (see providers below) to obtain Kubernetes configurations | No | + +#### Output +`kube_config()` returns a struct with the following fields. + +| Field | Description | +| --------| --------- | +| `path` | The path to the local Kubernetes config that was set | +| `capi_provider`|A provider that was set for Cluster-API usage| + +#### Example +```python +kube_config(path=args.kube_conf) +``` +### `ssh_config()` +This function creates configuration that can be used to connect via SSH to remote machines. 
+ +#### Parameters +| Param | Description | Required | +| -------- | -------- | -------- | +| `username` | SSH user ID| Yes | +| `private_key_path`| Path for private key | No, default: `$HOME/.ssh/id_rsa` | +| `port` | Port for SSH connection | No, default `"22"` | +| `jump_user` | Username for an SSH proxy connection | No | +| `jump_host` | Host address for an SSH proxy connection | Yes if `jump_user` is provided | +| `max_retries` | The maximum number of attempts to connect to the SSH host| No, default `5`| + +#### Output +`ssh_config()` returns a struct with the following fields. + +| Field | Description | +| --------| --------- | +| `username` | The `username` that was set | +| `private_key_path` | The private key path that was set | +| `port` | The port value that was set | +| `jump_user`|The proxy user that was set| +| `jump_host`|The proxy host that was set, if a proxy user was provided| +| `max_retries`|The max number of retries set| + +#### Example +```python +ssh=ssh_config( + username=os.username, + private_key_path="{0}/.ssh/id_rsa".format(os.home), + port=args.ssh_port, + max_retries=5, +) +``` + +## Provider Functions +A provider function implements the code to configure and enumerate compute resources for a given infrastructure. The results of provider functions are used by the `resources()` function to generate the list of compute resources. + +### `capa_provider()` +This function configures the Cluster-API provider for AWS (CAPA). This provider can enumerate management or workload cluster machines in order to execute commands using SSH on those machines. + +#### Parameters +| Param | Description | Required | +| -------- | -------- | -------- | +| | | | + +### `capv_provider()` +This function configures a provider for a Cluster-API managed cluster running on vSphere (CAPV). By default, this provider will enumerate cluster resources for the management cluster.
However, by specifying the name of a `workload_cluster`, the provider will enumerate compute resources for that workload cluster instead. + +#### Parameters +| Param | Description | Required | +| -------- | -------- | -------- | +| `ssh_config`|SSH configuration returned by `ssh_config()`|Yes | +| `kube_config` |Kubernetes configuration returned by `kube_config()`|Yes| +| `workload_cluster`|The name of a workload cluster. When specified, the provider retrieves the compute nodes of that workload cluster.|No| +| `labels`|A list of labels used to filter the cluster's compute nodes|No| +| `nodes` |A list of node names used to filter selected cluster nodes|No| + +#### Output +`capv_provider()` returns a struct with the following fields. + +| Field | Description | +| --------| --------- | +| `kind`| The name of the provider (`capv_provider`)| +|`transport`|The name of the transport to use (e.g. `ssh, http, etc`)| +| `ssh_config` | A struct with SSH configuration | +| `kube_config` | A struct with Kubernetes configuration | +| `workload_cluster` | The name of the workload cluster, if one was set | +| `hosts`|A list of host addresses generated from cluster information| + +#### Example +```python + +ssh=ssh_config( + username=os.username, + private_key_path="{0}/.ssh/id_rsa".format(os.home), + port=args.ssh_port, + max_retries=5, +) + +kube=kube_config(path=args.kube_conf) + +capv_provider( + workload_cluster="my-wc-cluster", + ssh_config=ssh, + kube_config=kube +) +``` + +### `host_list_provider()` +As its name suggests, this provider is used to explicitly specify a list of host addresses. + +#### Parameters +| Param | Description | Required | +| -------- | -------- | -------- | +| `hosts` | A list of IP addresses or machine names | Yes | +| `ssh_config` | An SSH configuration as returned by `ssh_config()` | Yes | + +#### Output +`host_list_provider()` returns a struct with the following fields.
+ +| Field | Description | +| --------| --------- | +| `kind`| The name of the provider (`host_list_provider`)| +| `transport`|The name of the transport to use (e.g. `ssh, http, etc`)| +| `ssh_config` | A struct with SSH configuration | +| `hosts`|The list of host addresses| + +#### Example + +```python +ssh=ssh_config( + username=os.username, + private_key_path="{0}/.ssh/id_rsa".format(os.home), + port="2222", + max_retries=5, +) + +host_list_provider(hosts=["172.100.10.20", "ctlplane.local"], ssh_config=ssh) +``` + +### `kube_nodes_provider()` +This provider captures the configuration needed to enumerate the nodes of a Kubernetes cluster. + +#### Parameters +| Param | Description | Required | +| -------- | -------- | -------- | +| `kube_config` | Kubernetes config returned by `kube_config()` | Yes | +| `ssh_config` | An SSH configuration as returned by `ssh_config()` | Yes | +| `names`|A list of names used to filter nodes |No| +| `labels`|A list of labels used to filter nodes|No| + +#### Output +`kube_nodes_provider()` returns a struct with the following fields. + +| Field | Description | +| --------| --------- | +| `kind`| The name of the provider (`kube_nodes_provider`)| +| `transport`|The name of the transport to use (e.g.
`ssh, http, etc`)| +| `ssh_config` | A struct with SSH configuration | +| `kube_config` | The Kubernetes configuration that was set | +| `hosts`|A list of host addresses generated from cluster information| + +#### Example + +```python +ssh=ssh_config( + username=args.username, + private_key_path=args.key_path, + port=args.ssh_port, + max_retries=5, +) + +kube_nodes_provider( + kube_config=kube_config(path=args.kubecfg), + ssh_config=ssh, +) +``` + +## Resource Enumeration +Crashd uses the notion of a compute resource to which the running script can connect and possibly execute commands (see Command Functions). + +### `resources()` +A Crashd script uses the `resources()` function, along with a provider (see Provider Functions above), to enumerate compute resources. Each provider implements its own logic that determines how resources are enumerated. + +#### Parameters +| Param | Description | Required | +| -------- | -------- | -------- | +|`provider`|Specifies the provider to use for resource enumeration|Yes| + +#### Output +`resources()` returns a list of structs based on the type of provider that is used. + +For `host_list_provider`, `kube_nodes_provider`, and `capv_provider`, each struct has the following fields. + +| Field | Description | +| --------| --------- | +| `kind` | The kind of the resource (`host_resource`) | +| `provider` | The name of the provider that generated the resource | +| `host` | Host address | +| `transport`|The transport to use| +| `ssh_config`|SSH configuration| + +#### Example +```python +ssh=ssh_config( + username=os.username, + private_key_path="{0}/.ssh/id_rsa".format(os.home), + port="2222", + max_retries=5, +) + +hosts=resources( + provider=host_list_provider( + hosts=["localhost", "127.0.0.1"], + ssh_config=ssh, + ), +) + +run(cmd="uptime", resources=hosts) +``` +In the previous example, `hosts` contains a list of host resource structs that can be passed to command functions such as `run`.
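+
+Because `resources()` returns an ordinary list of structs, a custom function can also iterate over it. The following is a minimal sketch, not part of the documented API. The `print_hosts` function name is hypothetical, and the snippet assumes `hosts` was created as in the previous example. It prints the documented `host`, `transport`, and `provider` fields of each enumerated resource:
+
+```python
+# hypothetical helper: walk the list of resource structs returned by resources()
+def print_hosts(host_resources):
+    for r in host_resources:
+        # host, transport, and provider are fields of each resource struct
+        print("{0} ({1}, from {2})".format(r.host, r.transport, r.provider))
+
+print_hosts(hosts)
+```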
+ +## Command Functions +Command functions automatically execute commands on all of the enumerated compute resources they are given, or they can be called from a custom function (`def`) for finer control. + +### `archive()` +The `archive()` function bundles the specified directories into a single archive file (tar.gz format). + +#### Parameters +| Param | Description | Required | +| -------- | -------- | -------- | +|`source_paths`|A list of directories to be archived|Yes| +|`output_file`|The name of the generated archive file|No, default `archive.tar.gz`| + +#### Output +`archive()` returns the full path of the created archive file. + + +### `capture()` +This function runs its command on all provided compute resources automatically. The output of each execution is captured and saved to a file. + +#### Parameters +| Param | Description | Required | +| -------- | -------- | -------- | +| `cmd`|The command string to execute|Yes| +| `resources`|The value returned by `resources()`|Yes| +| `workdir`|A parent directory where captured files will be saved|No, defaults to `crashd_config.workdir`| +| `file_name`|The path/name of the generated file|No, auto-generated based on command string, if omitted| +| `desc`|A short description added at the start of the file|No| + +#### Output +`capture()` returns a list `[]` of command result structs, one for each compute resource where the command was executed. Each struct contains the following fields.
+ +| Field | Description | +| --------| --------- | +| `resource` | The address or name of the compute resource | +| `result` | The path of the file created | +| `err` | An error message if one was encountered | + +#### Example +```python +ssh=ssh_config( + username=os.username, + private_key_path="{0}/.ssh/id_rsa".format(os.home), + port="2222", + max_retries=5, +) + +hosts=resources( + provider=host_list_provider( + hosts=["localhost", "127.0.0.1"], + ssh_config=ssh, + ), +) + +capture(cmd="sudo df -i", resources=hosts) +capture(cmd="sudo crictl info", resources=hosts) +capture(cmd="df -h /var/lib/containerd", resources=hosts) +capture(cmd="sudo systemctl status kubelet", resources=hosts) + +``` + +### `capture_local()` +This function runs a command locally on the machine running the script and captures its output in a file. + +#### Parameters +| Param | Description | Required | +| -------- | -------- | -------- | +| `cmd`|The command string to execute|Yes| +| `workdir`|A parent directory where captured files will be saved|No, defaults to `crashd_config.workdir`| +| `file_name`|The path/name of the generated file|No, auto-generated based on command string, if omitted| +| `desc`|A short description added at the start of the file|No| + +#### Output +`capture_local()` returns the full path of the captured output file. + + +### `copy_from()` +This function copies the specified file(s) from each remote compute resource to the local machine running the script. + +#### Parameters +| Param | Description | Required | +| -------- | -------- | -------- | +| `path`|The path of the remote file(s) to copy|Yes| +| `resources`|The value returned by `resources()`|Yes| +| `workdir`|A parent directory where files are copied to|No, defaults to `crashd_config.workdir`| + +#### Output +`copy_from()` returns a list `[]` of command result structs, one for each compute resource from which files were copied. Each struct contains the following fields.
+ +| Field | Description | +| --------| --------- | +| `resource` | The address or name of the compute resource | +| `result` | the path of the file copied | +| `err` | An error message if one was encountered | + +#### Example +```python +ssh=ssh_config( + username=os.username, + private_key_path="{0}/.ssh/id_rsa".format(os.home), + port="2222", + max_retries=5, +) + +hosts=resources( + provider=host_list_provider( + hosts=["localhost", "127.0.0.1"], + ssh_config=ssh, + ), +) + +copy_from(path="/var/log/kube*.log", resources=hosts) +``` +### `run()` +This function executes its specified command string on all provided compute resources automatically. It then returns a list of result objects containing information about the remote compute resource, where the command was executed, and the result of the command. + +#### Parameters +| Param | Description | Required | +| -------- | -------- | -------- | +| `cmd`|The command string to execute on each compute resource|Yes| +| `resources`|A collection of compute resources returned by `resources()`|Yes| + +#### Output +`run()` returns a list `[]` of command result structs for each compute resource where the command was executed. +Each struct contains the following fields. 
+ +| Field | Description | +| --------| --------- | +| `resource` | The address or name of the compute resource where the command was executed | +| `result` | The result of the command on the resource | +| `err` | An error message if one was encountered | + +#### Example +```python +ssh=ssh_config( + username=os.username, + private_key_path="{0}/.ssh/id_rsa".format(os.home), + port="2222", + max_retries=5, +) + +hosts=resources( + provider=host_list_provider( + hosts=["ctrlplane.local", "172.10.20.30"], + ssh_config=ssh, + ), +) + +# run uptime command on all hosts +uptimes = run(cmd="uptime", resources=hosts) + +# print the result for each host +print(uptimes[0].result) +print(uptimes[1].result) +``` +### `run_local()` +This function executes a command locally on the machine running the script and returns the result as a string. + +#### Parameters +| Param | Description | Required | +| -------- | -------- | -------- | +| `cmd`|The command string to execute|Yes| + +#### Output +`run_local()` returns the result of the command as a string value. + +#### Example + +```python +# run_local to parse local /etc/hosts file +def from_hosts(): + hosts = run_local("""cat /etc/hosts | grep -E '([0-9]){3}\\.' | awk '{print $1}'""") + return hosts.splitlines() + +ssh=ssh_config(username=os.username, port=2222, max_retries=10) +hosts=resources(provider=host_list_provider(hosts=from_hosts(), ssh_config=ssh)) + +# run on hosts +uptimes = run(cmd="uptime", resources=hosts) +print(uptimes[0].result) +print(uptimes[1].result) +``` +## Kubernetes Functions +These are functions used to execute API requests against a running Kubernetes cluster using a Kubernetes configuration (either passed explicitly or taken from a previously set default). + +### `kube_capture()` +The `kube_capture` function retrieves Kubernetes API objects and container logs. The captured information is stored in local files with a directory structure similar to that of `kubectl cluster-info dump`.
+ +#### Parameters +| Param | Description | Required | +| -------- | -------- | -------- | +|`what`|Specifies what to capture: `objects` or `logs`|Yes| +|`groups`|A list of API groups from which to retrieve API objects. The core group is named `core`|No| +|`kinds`|A list of object kinds to select|No| +|`namespaces`|A list of namespaces from which to select objects|No| +|`versions`|A list of API versions used to select objects|No| +|`names`|A list used to filter retrieved objects by name|No| +|`labels`|A list of label selector expressions used to filter objects|No| +|`containers`|A list of container names used to filter selected pod objects|No| +|`kube_config`|The Kubernetes configuration used for this call|No, uses default if omitted| + +#### Output +`kube_capture()` returns a struct with the following fields. + +| Field | Description | +| --------| --------- | +|`file`|The root directory where the captured files are saved| +|`error`|An error message, if any was encountered| + +#### Example +```python + +kube = kube_config(path=args.kube_cfg) + +pod_ns=["default", "kube-system"] + +kube_capture(what="logs", namespaces=pod_ns, kube_config=kube) +kube_capture(what="objects", kinds=["pods", "services"], namespaces=pod_ns, kube_config=kube) +kube_capture(what="objects", kinds=["deployments", "replicasets"], groups=["apps"], namespaces=pod_ns, kube_config=kube) +``` + +## Default Values +Some value types can be saved as default values during the execution of a +script. When the following values are saved as defaults, Crashd will automatically use +the last known default value for that type when appropriate: +* `kube_config` - the struct created by calling `kube_config()` +* `ssh_config` - the struct created by calling `ssh_config()` +* `resources` - the list of structs created by calling `resources()` + +### Setting Default Values +Default values are set using the `set_defaults()` function.
Each time this function +is called, it saves the provided value as the default for its type, overwriting the +previous default. + +For instance, consider the following script: +```python +ssh=ssh_config( + username=os.username, + private_key_path="{0}/.ssh/id_rsa".format(os.home), + port="2222", + max_retries=5, +) + +hosts=resources( + provider=host_list_provider( + hosts=["localhost", "127.0.0.1"], + ssh_config=ssh, + ), +) + +capture(cmd="sudo df -i", resources=hosts) +capture(cmd="sudo crictl info", resources=hosts) +capture(cmd="df -h /var/lib/containerd", resources=hosts) +capture(cmd="sudo systemctl status kubelet", resources=hosts) +``` + +The previous script can be simplified using default values: +```python +ssh=ssh_config( + username=os.username, + private_key_path="{0}/.ssh/id_rsa".format(os.home), + port="2222", + max_retries=5, +) + +set_defaults(resources( + provider=host_list_provider( + hosts=["localhost", "127.0.0.1"], + ssh_config=ssh, + ), +)) + +capture(cmd="sudo df -i") +capture(cmd="sudo crictl info") +capture(cmd="df -h /var/lib/containerd") +capture(cmd="sudo systemctl status kubelet") +``` +The previous script can be further simplified by setting the `ssh_config` as a default +value as follows: + +```python +set_defaults(ssh_config( + username=os.username, + private_key_path="{0}/.ssh/id_rsa".format(os.home), + port="2222", + max_retries=5 +)) + +# host_list_provider shortcut for resources +set_defaults(resources(hosts=["localhost", "127.0.0.1"])) + +capture(cmd="sudo df -i") +capture(cmd="sudo crictl info") +capture(cmd="df -h /var/lib/containerd") +capture(cmd="sudo systemctl status kubelet") +``` +The previous snippet sets both the `ssh_config` and `resources` values +as defaults. Notice also that `resources()` supports a shortcut to specify +host lists directly as a parameter. Internally, `resources()` creates an +instance of the `host_list_provider` when this shortcut is used.
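+
+The same mechanism applies to `kube_config`, the third default-capable type listed above. As a sketch, assume `args.kube_cfg` is supplied with the `--args` flag (see Argument Struct below). A script could then set the Kubernetes configuration once and omit the `kube_config` parameter from later `kube_capture()` calls:
+
+```python
+# save the Kubernetes configuration as the default for later calls
+set_defaults(kube_config(path=args.kube_cfg))
+
+# kube_config is omitted here, so the default saved above is used
+kube_capture(what="logs", namespaces=["kube-system"])
+kube_capture(what="objects", kinds=["services"], namespaces=["default"])
+```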
+ +## OS Struct +At runtime, a script can access OS information via the global `os` struct. + +| Field | Description | +| ------- | ---------- | +|`os.name`| The name of the OS running the script | +|`os.username`|The current username running the script| +|`os.home`|The home directory associated with the user running the script| +| `os.getenv()` | A function that returns the value of the named environment variable| + +### Example +```python +ssh=ssh_config( + username=os.username, + private_key_path="{0}/.ssh/id_rsa".format(os.home), + max_retries=5, +) +``` + +## Argument Struct +A running script can receive argument values from the command that invoked +the script using the `--args` flag, which takes a space-separated list of key/value pairs +as shown: + +``` +crashd run --args "ssh_user='capv' ssh_port='2121' kube_cfg='~/my/cfg'" file.crsh +``` +In the script, the arguments are accessed as fields of the global `args` struct: +```python +ssh=ssh_config( + username=args.ssh_user, + private_key_path="{0}/.ssh/id_rsa".format(os.home), + port=args.ssh_port, + max_retries=5, +) + +kube_config(path=args.kube_cfg) +``` \ No newline at end of file