GitHub Actions & Data Pipeline

This document outlines the setup and workflow of our GitHub Actions data pipeline. The primary goal is to manage and version-control data generated by our CI/CD processes, with a special focus on handling a SQLite database and its schema migrations.

Core Components

`_data` Branch

A dedicated orphan branch (_data) stores the pipeline's data artifacts. This keeps large data files and frequent data updates out of the main source code history, so the main branch stays lighter and is faster to clone (for example, with a single-branch clone).

`actions/pipeline-data` ("Setup Pipeline Data Branch")

This composite action manages the interaction with the _data branch by creating a git worktree.

- operation: setup: Checks out the _data branch into a .pipeline-data-worktree directory and uses rsync to copy the entire contents of the worktree's data directory into the main workspace's data directory.
- operation: update: Uses rsync to sync the data directory from the main workspace to the worktree, then commits and force-pushes the changes to the _data branch.
- operation: cleanup: Removes the .pipeline-data-worktree directory. This should be run at the end of a workflow, typically using an if: always() condition to ensure cleanup happens even if other steps fail.
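
The three operations can be pictured as short command sequences. The following is a minimal sketch, not the action's actual implementation: the function name `planCommands`, the exact git and rsync flags, and the commit message are assumptions made for illustration. Only the branch name, the worktree directory, and the data directory come from the description above.

```typescript
// Hypothetical sketch: the real composite action runs shell steps that may differ.
type Operation = "setup" | "update" | "cleanup";

const BRANCH = "_data";
const WORKTREE = ".pipeline-data-worktree";

// Returns, in order, the shell commands each operation would roughly run.
function planCommands(op: Operation, dataDir = "data"): string[] {
  switch (op) {
    case "setup":
      return [
        `git fetch origin ${BRANCH}`,
        // Check the data branch out into a separate worktree directory.
        `git worktree add -B ${BRANCH} ${WORKTREE} origin/${BRANCH}`,
        // Copy the stored data into the main workspace.
        `rsync -a ${WORKTREE}/${dataDir}/ ${dataDir}/`,
      ];
    case "update":
      return [
        // Copy the workspace data back into the worktree.
        `rsync -a ${dataDir}/ ${WORKTREE}/${dataDir}/`,
        `git -C ${WORKTREE} add -A`,
        // Assumed commit message; git exits non-zero here if nothing changed.
        `git -C ${WORKTREE} commit -m "Update pipeline data"`,
        `git -C ${WORKTREE} push --force origin ${BRANCH}`,
      ];
    case "cleanup":
      return [`git worktree remove --force ${WORKTREE}`];
  }
}

for (const op of ["setup", "update", "cleanup"] as const) {
  console.log(op, planCommands(op));
}
```

Keeping the worktree in its own directory means the main checkout never switches branches, which is why cleanup can run unconditionally at the end of a job.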

`actions/restore-db` ("SQLite Database Operations")

This action handles the dumping and restoring of the SQLite database in a way that is compatible with our migration-based schema management.

- operation: dump:
  1. Dumps the live SQLite database (e.g., data/db.sqlite) into a diffable format in the specified dump directory (e.g., data/dump) using sqlite-diffable.
  2. Copies the Drizzle migration journal (drizzle/meta/_journal.json) into the dump directory as _journal.json. This is a critical step that versions the database schema state along with the data itself.
- operation: restore:
  1. Reads the latest migration number from the _journal.json file located within the dump directory.
  2. Initializes a new, empty database.
  3. Runs database migrations from the main branch's drizzle directory up to the version specified in the journal file. This creates a database with the exact schema that corresponds to the dumped data.
  4. Loads the data from the diffable dump into the database.
  5. Runs any remaining migrations from the main drizzle folder to bring the database schema fully up to date with the latest code in the main branch.
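
The two-phase migration logic in the restore steps above can be sketched as a pure function. This is an illustration under stated assumptions, not the action's code: the `Journal` shape models only the `entries`, `idx`, and `tag` fields assumed to be in a Drizzle journal, and `planRestore` and the migration tags are hypothetical names.

```typescript
// Assumed subset of a Drizzle _journal.json; only the fields used here are modeled.
interface JournalEntry {
  idx: number; // position of the migration in the journal
  tag: string; // migration name
}
interface Journal {
  entries: JournalEntry[];
}

interface RestorePlan {
  beforeLoad: string[]; // applied to the empty database, before loading the dump
  afterLoad: string[]; // applied after the dump has been loaded
}

// Splits the current migrations around the version recorded with the dump.
function planRestore(dumped: Journal, current: Journal): RestorePlan {
  // Highest migration index stored alongside the dump (-1 if the journal is empty).
  const dumpedIdx = dumped.entries.reduce((max, e) => Math.max(max, e.idx), -1);
  const ordered = [...current.entries].sort((a, b) => a.idx - b.idx);
  return {
    beforeLoad: ordered.filter((e) => e.idx <= dumpedIdx).map((e) => e.tag),
    afterLoad: ordered.filter((e) => e.idx > dumpedIdx).map((e) => e.tag),
  };
}

// Hypothetical tags: the dump was taken at migration 1, main has since added migration 2.
const plan = planRestore(
  { entries: [{ idx: 0, tag: "0000_init" }, { idx: 1, tag: "0001_add_scores" }] },
  {
    entries: [
      { idx: 0, tag: "0000_init" },
      { idx: 1, tag: "0001_add_scores" },
      { idx: 2, tag: "0002_add_summaries" },
    ],
  },
);
console.log(plan);
```

Loading the dump between the two phases is what lets older dumps survive schema changes: the data always lands in the schema it was exported from, and later migrations transform it in place.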

Workflows

This repository uses several GitHub Actions workflows to automate testing, data processing, and deployment.

`run-pipelines.yml` ("Run Pipelines")

This is the main data processing workflow. It's responsible for fetching the latest data from sources like GitHub, processing it, and generating summaries.

Triggers:
- Runs on a daily schedule (cron: "0 23 * * *", i.e. 23:00 UTC).
- Can be manually triggered (workflow_dispatch) with various options to control its behavior (e.g., forcing re-ingestion, specifying date ranges).
Key Jobs:
- ingest-export:
  1. Checks out the _data branch and restores the database.
  2. Runs the ingest pipeline to fetch new data (issues, PRs, etc.).
  3. Runs the process pipeline to calculate scores and other metrics.
  4. Runs the export pipeline to save processed data.
  5. Dumps the updated database and pushes all new data artifacts to the _data branch.
- generate-summaries:
  1. Depends on the successful completion of ingest-export.
  2. Restores the latest database from the _data branch.
  3. Uses an AI service to generate project and contributor summaries.
  4. On scheduled runs, generates project summaries daily and contributor summaries weekly.
  5. Pushes the generated summaries and updated database state back to the _data branch.
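
One way to read the daily/weekly split in generate-summaries is as a per-run decision about which summary types to produce. The sketch below assumes that reading. The function name `summariesForScheduledRun` and the `weeklyDay` parameter are hypothetical, and the workflow's actual choice of weekday is not stated in this document.

```typescript
type SummaryKind = "project" | "contributor";

// weeklyDay is a hypothetical parameter: 0 = Sunday ... 6 = Saturday, evaluated in UTC.
function summariesForScheduledRun(runDate: Date, weeklyDay = 0): SummaryKind[] {
  // Project summaries are produced on every scheduled run.
  const kinds: SummaryKind[] = ["project"];
  // Contributor summaries are produced only on the chosen weekday.
  if (runDate.getUTCDay() === weeklyDay) {
    kinds.push("contributor");
  }
  return kinds;
}

// 2024-01-07 is a Sunday and 2024-01-08 is a Monday.
console.log(summariesForScheduledRun(new Date("2024-01-07T23:00:00Z")));
console.log(summariesForScheduledRun(new Date("2024-01-08T23:00:00Z")));
```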

`pr-checks.yml` ("PR Checks")

This workflow runs on every pull request against the main branch to ensure code quality and prevent regressions.

Triggers:
- pull_request targeting the main branch.
Key Jobs:
- check: Lints the code and runs type-checking with TypeScript.
- build: Ensures the Next.js application builds successfully with the PR changes. It restores the production data so that the build runs against realistic inputs.
- test-pipelines: Runs the core data pipelines (ingest, process, export) in a test mode to verify their integrity.
- check-migrations: If the database schema (src/lib/data/schema.ts) is modified, this job verifies that a corresponding Drizzle migration has been generated.
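
The check-migrations rule can be expressed as a predicate over the files changed in a pull request. This is a minimal sketch assuming migrations are .sql files under the drizzle directory. The function name `isMigrationMissing` and the example file names are hypothetical, and the real job may detect missing migrations differently (for instance by regenerating them and checking for a diff).

```typescript
const SCHEMA_PATH = "src/lib/data/schema.ts";
// Assumption: generated migrations are .sql files under this directory.
const MIGRATIONS_DIR = "drizzle/";

// Returns true when the schema changed but no migration file changed with it.
function isMigrationMissing(changedFiles: string[]): boolean {
  const schemaChanged = changedFiles.includes(SCHEMA_PATH);
  const migrationChanged = changedFiles.some(
    (file) => file.startsWith(MIGRATIONS_DIR) && file.endsWith(".sql"),
  );
  return schemaChanged && !migrationChanged;
}

// Schema edited without a migration: the check should fail.
console.log(isMigrationMissing(["src/lib/data/schema.ts"]));
// Schema edited together with a (hypothetical) migration file: the check passes.
console.log(isMigrationMissing(["src/lib/data/schema.ts", "drizzle/0003_example.sql"]));
```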

`deploy.yml` ("Deploy to GitHub Pages")

This workflow handles the deployment of the application to GitHub Pages.

Triggers:
- Manually via workflow_dispatch.
- Automatically after the Run Pipelines workflow successfully completes on the main branch.
Key Steps:
1. Restores the latest data from the _data branch.
2. Runs any pending database migrations.
3. Builds the Next.js application for production.
4. Copies the data directory into the out directory to be included in the deployment.
5. Deploys the contents of the out directory to GitHub Pages.
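
The two triggers amount to a small gating rule, sketched below. The `TriggerEvent` type is a simplified, hypothetical stand-in for the GitHub event payload (its field names are not the payload's own), and `shouldDeploy` is an illustrative name, not something defined in the workflow.

```typescript
// Simplified, hypothetical view of the triggering event.
interface TriggerEvent {
  eventName: "workflow_dispatch" | "workflow_run";
  // Only meaningful for workflow_run events.
  workflowRun?: { conclusion: string; headBranch: string };
}

function shouldDeploy(event: TriggerEvent): boolean {
  // Manual runs always deploy.
  if (event.eventName === "workflow_dispatch") {
    return true;
  }
  // Automatic runs deploy only after a successful pipeline run on main.
  const run = event.workflowRun;
  return run !== undefined && run.conclusion === "success" && run.headBranch === "main";
}

console.log(shouldDeploy({ eventName: "workflow_dispatch" }));
console.log(
  shouldDeploy({
    eventName: "workflow_run",
    workflowRun: { conclusion: "failure", headBranch: "main" },
  }),
);
```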

