GitHub archives and git Grafana visualization dashboards

Authors: Łukasz Gryglicki lgryglicki@cncf.io, Justyna Gryglicka jgryglicka@cncf.io.

This is a toolset to visualize GitHub archives using Grafana dashboards.

GHA2DB stands for GitHub Archives to DashBoards.

More information about Kubernetes dashboards here.

Kubernetes and Helm

Please see example Helm chart for an example Helm deployment.

Please see Helm chart for a full Helm deployment.

Please see LF Helm chart for the LF Helm deployment (it is a data deployment, has no Grafana).

Please see GraphQL Helm chart for GraphQL foundation DevStats deployment.

Please see Kubernetes dashboard if you want to enable a local dashboard to explore the cluster state.

Please see bare metal example for an example of a bare metal deployment.

The rest of this document describes the current bare metal deployment on metal.equinix.com used by CNCF projects.

Presentations

Presentations are available here.
Direct link.
Another direct link.

Talks

Architecture

DevStats is deployed using Helm on Kubernetes running on bare metal servers provided by Equinix.

DevStats is written in Go and uses GitHub archives, the GitHub API, and git as its main data sources.

Under the hood, DevStats uses the following CNCF projects:

Helm (for deployment).
containerd (as a Kubernetes container runtime, CRI).
cert-manager (for HTTPS/SSL certificates).
OpenEBS (for local storage volumes support).
MetalLB (as a load balancer for bare metal servers).
CoreDNS (Kubernetes internal DNS).

And other projects, including:

Equinix (bare metal servers provider).
Ubuntu (containers base operating system).
kubeadm (for installing Kubernetes).
NFS (for shared write network volumes support).
NGINX (for ingress).
Calico (as networking for Kubernetes, CNI).
Golang (DevStats is written in Go).
PostgreSQL (DevStats database is Postgres).
patroni (HA deployment of PostgreSQL database, tweaked for DevStats).
GitHub archives (main data source).
GitHub API (data source).
git (data source).
Grafana (UI).
Let's Encrypt (provides HTTPS/SSL certificates).
Travis CI (continuous integration & testing).

Please check this for a detailed architecture description.

Deploying on your own project(s)

See the simple DevStats example repository for a single-project deployment (Homebrew), and follow its instructions to deploy DevStats for your own project.

Goal

We want to create a toolset for visualizing various metrics for the Kubernetes community (and also for all CNCF projects).

Everything is open source so that it can be used by other CNCF and non-CNCF open source projects.

The only requirement is that the project must be hosted in one or more public GitHub repositories.

Data hiding

If you want to hide your data (replace it with anon-#), please follow the instructions here.

Forking and installing locally

This toolset uses only Open Source tools: GitHub archives, GitHub API, git, Postgres databases, and multiple Grafana instances. It is written in Go and can be forked and installed by anyone.

Contributions and PRs are welcome. If you see a bug or want to add a new metric, please create an issue and/or PR.

To work on this project locally, please fork the original repository and see Development for the local development guide.

For a more detailed description of all environment variables, tools, switches, etc., please see Usage.

Metrics

We want to support all kinds of metrics, including historical ones. Please see requested metrics for the kinds of metrics that are needed. Many of them cannot be computed from the data sources currently used.

Repository groups

Some repositories are grouped together into repository groups. They are defined in scripts/kubernetes/repo_groups.sql.

To set up the default repository groups:

PG_PASS=pwd ./kubernetes/setup_repo_groups.sh.

This is a part of the kubernetes/psql.sh script, and the kubernetes psql dump already has the groups configured.

In the All CNCF project, repository groups are mapped to individual CNCF projects; see scripts/all/repo_groups.sql.

Company Affiliations

We also want to have per-company statistics. To implement such metrics we need a mapping between developers and their employers.

There is a project that attempts to create such a mapping: cncf/gitdm.

DevStats has an import tool that fetches company affiliations from cncf/gitdm and makes it possible to create per-company metrics/statistics. It also uses the companies.yaml file to map company acquisitions (any data generated by a company acquired by another company is assigned to the latter using a mapping from companies.yaml).
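The acquisition rule described above can be illustrated with a small Go sketch. This is not the DevStats import tool and does not reflect the actual layout of companies.yaml: the `acquiredBy` map, the `resolveCompany` helper, and the company names are all hypothetical. Following chains of acquisitions (a company acquired by a company that was itself acquired) is also an assumption made for the example.

```go
package main

import "fmt"

// resolveCompany follows acquisition links until it reaches a company that
// has not itself been acquired. In acquiredBy the key is the acquired company
// and the value is its acquirer. The seen set stops the loop if the
// (hypothetical) data accidentally contains a cycle.
func resolveCompany(name string, acquiredBy map[string]string) string {
	seen := map[string]bool{}
	for {
		parent, ok := acquiredBy[name]
		if !ok || seen[name] {
			return name
		}
		seen[name] = true
		name = parent
	}
}

func main() {
	// Hypothetical data for illustration only; real mappings live in companies.yaml.
	acquiredBy := map[string]string{
		"SmallCo": "MidCo",
		"MidCo":   "BigCo",
	}
	for _, c := range []string{"SmallCo", "MidCo", "BigCo", "Other"} {
		fmt.Printf("%s -> %s\n", c, resolveCompany(c, acquiredBy))
	}
}
```

With a mapping like this, activity recorded under an acquired company is counted under the company at the end of the chain, while companies without an entry keep their own name.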

If you see errors in the company affiliations, please open a pull request on cncf/gitdm and the updates will be reflected on https://k8s.devstats.cncf.io a couple of days after the PR has been accepted. Note that gitdm supports mapping based on dates, to account for developers moving between companies.
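To make the idea of date-based mapping concrete, here is a minimal Go sketch of looking up a developer's employer at a given point in time. It is not taken from gitdm or DevStats: the `Affiliation` type, the `companyAt` helper, the half-open date ranges, and the company names are assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// Affiliation is a hypothetical record: a developer worked for Company from
// From (inclusive) until Until (exclusive). A zero Until means "still there".
type Affiliation struct {
	Company string
	From    time.Time
	Until   time.Time
}

// companyAt returns the company whose date range contains t,
// or "unknown" when no range matches.
func companyAt(affs []Affiliation, t time.Time) string {
	for _, a := range affs {
		if t.Before(a.From) {
			continue
		}
		if a.Until.IsZero() || t.Before(a.Until) {
			return a.Company
		}
	}
	return "unknown"
}

// day builds a UTC midnight timestamp for the given date.
func day(y int, m time.Month, d int) time.Time {
	return time.Date(y, m, d, 0, 0, 0, 0, time.UTC)
}

func main() {
	// Hypothetical developer who changed employers on 2018-06-01.
	affs := []Affiliation{
		{Company: "FirstCo", From: day(2015, 1, 1), Until: day(2018, 6, 1)},
		{Company: "SecondCo", From: day(2018, 6, 1)},
	}
	fmt.Println(companyAt(affs, day(2017, 3, 15)))
	fmt.Println(companyAt(affs, day(2020, 1, 1)))
	fmt.Println(companyAt(affs, day(2014, 1, 1)))
}
```

A contribution is attributed to whichever company's range contains the contribution date, so a developer's earlier work stays with the earlier employer after they move.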

New affiliations are imported into DevStats about once or twice a month.

Architecture

For architecture details please see architecture file.

Detailed usage is here

Adding new metrics

Please see metrics to see how to add new metrics.

Adding new projects

To add a new project on a bare metal deployment, follow the adding new project instructions.

See cncf/devstats-helm:ADDING_NEW_PROJECTS.md for information about how to add more projects on Kubernetes/Helm deployment.

Grafana dashboards

Please see dashboards for a list of the Grafana dashboards already defined.

Exporting data

Please see exporting.

Detailed Usage instructions

USAGE

Servers

The servers to run devstats are generously provided by Equinix bare metal hosting as part of CNCF's Community Infrastructure Lab.

One line run all projects

Use GHA2DB_PROJECTS_OVERRIDE="+cncf" PG_PASS=pwd devstats.
Or add this command using crontab -e to run it every hour at HH:08.

Checking projects activity

Use: PG_PASS=... PG_DB=allprj ./devel/activity.sh '1 month,,' > all.txt.
Example results here - all CNCF project activity during January 2018, excluding bots.

Troubleshooting

If you see an error like pq: row is too big: size 8192, maximum size 8160 and/or Error result for xyz (took 11m52.048191357s):

Shell into the logging database and check:
Run on a DevStats node (k is assumed here to be a shell alias for kubectl): k exec -itn devstats-prod devstats-postgres-0 -- psql devstats.
Run while connected to the devstats database: select dt, run_dt, msg from gha_logs where msg like '%Error result for%';.

             dt             |           run_dt           |                                  msg
----------------------------+----------------------------+-----------------------------------------------------------------------
 2024-09-01 00:48:07.079436 | 2024-09-01 00:34:26.426402 | Error result for helm (took 13m36.712884455s): exit status 2
 2024-09-07 00:16:11.132541 | 2024-09-07 00:04:14.051939 | Error result for prometheus (took 11m52.048191357s): exit status 2
 2024-09-07 00:26:43.701404 | 2024-09-07 00:05:55.08925  | Error result for fluentd (took 15m1.328366817s): exit status 2
 2024-09-07 00:16:11.038887 | 2024-09-07 00:08:43.846938 | Error result for grpc (took 7m24.348182232s): exit status 2
 2024-09-03 13:20:02.682134 | 2024-09-03 12:57:23.220227 | Error result for opentelemetry (took 22m29.324614973s): exit status 2
 2024-09-03 13:09:56.535074 | 2024-09-03 13:04:43.451026 | Error result for spinnaker (took 5m7.631109092s): exit status 2
(6 rows)

You can investigate each via: echo "select dt, prog, proj, msg from gha_logs where run_dt = '2024-09-01 00:34:26.426402';" | k exec -itn devstats-prod devstats-postgres-1 -- psql devstats > log.txt.
The row is too big error is usually caused by the suser_activity metric. You can add this metric to ./devel/test_metrics.yaml and generate DevStats Docker images to reinitialize it for the given project(s) via:
helm install --generate-name ./devstats-helm --set namespace='devstats-prod',skipSecrets=1,skipPVs=1,skipBackupsPV=1,skipVacuum=1,skipBackups=1,skipBootstrap=1,skipCrons=1,skipAffiliations=1,skipGrafanas=1,skipServices=1,skipPostgres=1,skipIngress=1,skipStatic=1,skipAPI=1,skipNamespaces=1,testServer='',prodServer='1',provisionImage='lukaszgryglicki/devstats-prod',provisionCommand='./devstats-helm/add_metric.sh',nCPUs=8,indexProvisionsFrom=N,indexProvisionsTo=M.
