A CLI tool that extracts commit history from a local Git repository and outputs it as JSON Lines (.jsonl) files, suitable for ingestion into data warehouses and analytical systems.
- Reads the local .git directory directly via isomorphic-git; no git CLI is required at runtime
- Outputs one commit per line in JSON Lines format
- Explicit extraction modes: --mode snapshot for independent extraction, --mode incremental for differential extraction using a state file
- Handles multi-branch extraction with cross-branch deduplication
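The cross-branch deduplication mentioned above can be pictured with a small sketch. This is not gitrail's actual implementation; it only illustrates the general idea of walking several branch tips while sharing one set of visited commit IDs. The `PARENTS` graph and the `collect_commits` helper are hypothetical names invented for this example.

```python
from collections import deque

# Hypothetical in-memory commit graph: oid -> list of parent oids.
PARENTS = {
    "c3": ["c2"],  # tip of the first branch
    "c2": ["c1"],
    "f1": ["c2"],  # tip of a second branch forked from c2
    "c1": [],      # initial commit (no parents)
}


def collect_commits(tips, parents):
    """Walk from each tip and return every reachable commit exactly once."""
    seen = set()   # shared across all tips, which is what removes duplicates
    ordered = []
    for tip in tips:
        queue = deque([tip])
        while queue:
            oid = queue.popleft()
            if oid in seen:
                continue  # already emitted via this or an earlier branch
            seen.add(oid)
            ordered.append(oid)
            queue.extend(parents.get(oid, []))
    return ordered


commits = collect_commits(["c3", "f1"], PARENTS)
print(commits)
```

Because the visited set is shared, the history common to both branches (`c2` and `c1` here) appears only once in the result.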
- Node.js ≥ 22.0.0
- A local Git repository (cloned and fetched via your preferred method; gitrail reads .git data directly and does not require the git CLI)
npm install -g gitrail

# One-time extraction from a local clone
gitrail -b main ./my-repo
# Continuous extraction — fetch remote changes, then extract new commits
git -C ./my-repo fetch origin
gitrail -m incremental -b origin/main -s ./gitrail-state.json --on-missing-state snapshot ./my-repo

See the User Guide for detailed workflow patterns including incremental setup, release-tag-based extraction, and CI configuration.
gitrail [options] <repository-path>

| Parameter | Alias | Type | Required | Default | Description |
|---|---|---|---|---|---|
| <repository-path> | | positional | ✅ | — | Local path to the Git repository |
| --mode | -m | snapshot \| incremental | | snapshot | Extraction mode. snapshot runs independently of state; incremental reads state to extract only new commits. |
| --branch <ref> | -b | string (repeatable) | ✅ | — | Ref to traverse from. Specify one or more times. |
| --output-dir <path> | -o | string | | ./ | Directory for output .jsonl files |
| --output-prefix <string> | | string | | derived | Filename prefix (derived from remote origin if omitted) |
| --state <path> | -s | string | | — | State file path. Required with --mode incremental. |
| --on-missing-state | | error \| snapshot | | error | Behavior when state file is absent. Only valid with --mode incremental. |
| --since-ref <ref> | | string | | — | Exclude commits reachable from this ref (tag, branch, or hash). Snapshot mode only. |
| --since-date <ISO8601> | | string | | — | Include only commits after this datetime. Snapshot mode only. |
| --rotate-lines <n> | | number | | — | Start new file after n lines |
| --rotate-size <bytes> | | number | | — | Start new file after n bytes |
| --quiet | -q | boolean | | false | Suppress progress and summary output |

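The option constraints stated in the table can be restated as a small pre-flight check, which may be handy in a wrapper script before invoking the CLI. This is a sketch only: `validate_options` is a hypothetical helper, it covers just the rules spelled out in the table (the User Guide lists the full set), and the error messages are invented for illustration rather than copied from gitrail.

```python
def validate_options(branches=(), mode="snapshot", state=None,
                     on_missing_state=None, since_ref=None, since_date=None):
    """Return a list of rule violations; an empty list means the options look consistent."""
    errors = []
    if not branches:
        errors.append("--branch must be given at least once")
    if mode not in ("snapshot", "incremental"):
        errors.append("--mode must be snapshot or incremental")
    if mode == "incremental" and state is None:
        errors.append("--state is required with --mode incremental")
    if on_missing_state is not None and mode != "incremental":
        errors.append("--on-missing-state is only valid with --mode incremental")
    if mode != "snapshot":
        # Both filters are documented as snapshot-mode only.
        if since_ref is not None:
            errors.append("--since-ref is snapshot mode only")
        if since_date is not None:
            errors.append("--since-date is snapshot mode only")
    return errors


# Incremental mode without a state file violates one documented rule.
print(validate_options(branches=["origin/main"], mode="incremental"))
```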
Progress updates and the final summary are written to stderr; use --quiet to suppress them.
Validation errors exit with code 1; runtime errors with code 2. See the
User Guide for the full list of mutual exclusion rules.
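When gitrail is driven from a scheduler or pipeline, the two documented exit codes can be used to decide whether a retry makes sense. The sketch below is one possible wrapper, not part of gitrail. It assumes that exit code 0 means success (the text above only documents codes 1 and 2), and `classify_exit` and `run_gitrail` are hypothetical helper names.

```python
import subprocess


def classify_exit(code):
    """Map a gitrail exit code to a coarse category."""
    if code == 0:
        return "ok"                # assumption: 0 is the conventional success code
    if code == 1:
        return "validation_error"  # documented: bad or conflicting options
    if code == 2:
        return "runtime_error"     # documented: failure during extraction
    return "unknown"


def run_gitrail(args):
    """Run gitrail with the given arguments and return (category, stderr text)."""
    result = subprocess.run(["gitrail", *args], capture_output=True, text=True)
    # Progress and summary go to stderr, so stderr is what is worth logging.
    return classify_exit(result.returncode), result.stderr


# Example call (requires gitrail to be installed):
# status, log = run_gitrail(["-b", "main", "./my-repo"])
```

A validation error usually points at the invocation itself, so retrying without changing the arguments is unlikely to help, whereas a runtime error may be worth a retry.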
Each line in the output .jsonl file is a JSON object representing one commit:
{
"oid": "a1b2c3d4...",
"subject": "Fix null pointer in auth module",
"body": "",
"author": {
"name": "Jane Doe",
"email": "jane@example.com",
"timestamp": "2024-01-15T09:00:00+09:00"
},
"committer": {
"name": "Jane Doe",
"email": "jane@example.com",
"timestamp": "2024-01-15T09:05:00+09:00"
},
"parents": ["parenthash1"],
"repository": { "name": "my-repo", "url": "https://github.com/org/my-repo" }
}

| Field | Description |
|---|---|
| oid | Full SHA-1 commit hash |
| subject | First line of the commit message |
| body | Remainder of the commit message (empty string if none) |
| author | Person who originally authored the changes |
| committer | Person who committed (may differ from author after rebase/cherry-pick) |
| author.timestamp / committer.timestamp | ISO 8601 datetime using the offset embedded in the commit object |
| parents | Array of parent commit hashes (empty for the initial commit; two entries for merge commits) |
| repository.name | Repository name derived from remote origin URL (falls back to directory name) |
| repository.url | Remote origin URL, or null if no remote is configured |
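
As an illustration of how the fields above might be consumed downstream, the following sketch parses one output line, normalizes the committer timestamp to UTC, and classifies the commit by its parent count. The sample record is made up for this example (with shortened hashes), and `summarize` is a hypothetical helper, not something gitrail provides.

```python
import json
from datetime import datetime, timezone

# Made-up sample line in the documented shape.
SAMPLE_LINE = json.dumps({
    "oid": "a1b2c3d4",
    "subject": "Merge branch 'feature'",
    "body": "",
    "author": {"name": "Jane Doe", "email": "jane@example.com",
               "timestamp": "2024-01-15T09:00:00+09:00"},
    "committer": {"name": "Jane Doe", "email": "jane@example.com",
                  "timestamp": "2024-01-15T09:05:00+09:00"},
    "parents": ["p1", "p2"],
    "repository": {"name": "my-repo", "url": None},
})


def summarize(line):
    """Extract a few derived values from one JSONL commit record."""
    commit = json.loads(line)
    # fromisoformat understands the embedded offset (for example +09:00).
    committed = datetime.fromisoformat(commit["committer"]["timestamp"])
    return {
        "oid": commit["oid"],
        "committed_utc": committed.astimezone(timezone.utc).isoformat(),
        "is_root": len(commit["parents"]) == 0,   # initial commit
        "is_merge": len(commit["parents"]) >= 2,  # merge commit
    }


summary = summarize(SAMPLE_LINE)
print(summary)
```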
Output files are named <prefix>-<timestamp>-000001.jsonl, <prefix>-<timestamp>-000002.jsonl, and so on. The prefix is
derived from the repository's remote origin URL; use --output-prefix to override. The timestamp
segment (YYYYMMDDTHHmmssZ) is captured once per session, so all files from a single run share
the same timestamp and will not overwrite files produced by earlier runs. Use
--rotate-lines or --rotate-size to split output across multiple files.
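
A loader can rely on the naming scheme above to group files by run. The sketch below parses names of the form <prefix>-<timestamp>-<sequence>.jsonl and groups them per session. The file names in the example are invented, and `parse_name` and `group_by_session` are hypothetical helpers. The pattern accepts any number of sequence digits, since only the six-digit form is shown above.

```python
import re
from collections import defaultdict

# <prefix>-<YYYYMMDDTHHmmssZ>-<sequence>.jsonl; the prefix itself may contain hyphens.
NAME_RE = re.compile(r"^(?P<prefix>.+)-(?P<ts>\d{8}T\d{6}Z)-(?P<seq>\d+)\.jsonl$")


def parse_name(filename):
    """Return (prefix, timestamp, sequence) or None if the name does not match."""
    match = NAME_RE.match(filename)
    if match is None:
        return None
    return match["prefix"], match["ts"], int(match["seq"])


def group_by_session(filenames):
    """Group matching files by (prefix, timestamp), ordered by sequence number."""
    sessions = defaultdict(list)
    for name in filenames:
        parsed = parse_name(name)
        if parsed is not None:
            prefix, ts, seq = parsed
            sessions[(prefix, ts)].append((seq, name))
    return {key: [name for _, name in sorted(items)] for key, items in sessions.items()}


files = [
    "my-repo-20240115T000500Z-000002.jsonl",
    "my-repo-20240115T000500Z-000001.jsonl",
    "my-repo-20240116T000500Z-000001.jsonl",
    "notes.txt",
]
sessions = group_by_session(files)
print(sessions)
```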
Note: Output line order is not guaranteed to be chronological. Sort by committer.timestamp in your downstream system.
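
One way to apply that sort is sketched below. Because timestamps keep the offset embedded in each commit, comparing the raw strings can give the wrong order, so the example parses them into timezone-aware datetimes first. The two records are invented and trimmed to the fields needed here, and `sort_commits` is a hypothetical helper.

```python
import json
from datetime import datetime

# Two invented records with different UTC offsets, deliberately out of order.
LINES = [
    json.dumps({"oid": "b", "committer": {"timestamp": "2024-01-15T01:00:00+00:00"}}),
    json.dumps({"oid": "a", "committer": {"timestamp": "2024-01-15T09:00:00+09:00"}}),
]


def sort_commits(lines):
    """Parse JSONL lines and return commits ordered by committer time."""
    commits = [json.loads(line) for line in lines if line.strip()]
    # Aware datetimes compare by absolute instant, regardless of offset.
    return sorted(commits,
                  key=lambda c: datetime.fromisoformat(c["committer"]["timestamp"]))


ordered = [c["oid"] for c in sort_commits(LINES)]
print(ordered)
```

Here commit `a` (09:00 at +09:00, which is 00:00 UTC) sorts before commit `b` (01:00 UTC), even though a plain string comparison would order them the other way.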
- User Guide — detailed workflows, mode explanations, and full CLI reference
- Changelog — release history and notable changes by version
- Contributing Guide — local setup, quality checks, and pull request workflow
- Architecture — layer responsibilities, end-to-end flow, and key design decisions
- Git Traversal — DAG traversal, differential extraction modes, and deduplication strategy
- Output Schema — JSONL format, field definitions, timestamp conversion, and file rotation
MIT