A zero-dependency Python CLI that scans your local git repositories and produces daily engineering activity statistics as CSV or JSON.
Point it at a folder full of repos, give it an author email, and get a clean breakdown of what happened each day: lines added, deleted, files touched, directories changed, and a complexity score. No GitHub API, no tokens, no browser automation. Just git log and git show under the hood.
You have 20+ repos cloned locally. You want to know:
- How many lines did a developer ship per day this quarter?
- Which days were inactive (excluding weekends)?
- How spread out were the changes across the codebase?
Existing tools either require GitHub API tokens, scrape browser UIs, or need a database. This one reads your local .git history directly and writes a CSV.
git clone https://github.com/thestuntcoder/git-developer-activity.git
cd git-developer-activity
# Scan all repos under ~/Sites for one author, last 30 days
python3 -m devstats scan ~/Sites \
--author-email you@company.com \
--last 30d \
--format csv \
--output stats.csv

Requirements: Python 3.9+ and git on your PATH. No pip dependencies. Nothing to install.
$ python3 -m devstats scan ~/Sites --author-email dev@example.com --last 30d
date,number_of_commits,lines_added,lines_deleted,net_lines,total_churn,files_changed,directories_touched,complexity_score
2026-02-08,3,411,28,383,439,41,5,21.58
2026-02-09,1,4,2,2,6,2,2,1.98
2026-02-14,5,166,82,84,248,8,5,6.53
2026-02-24,8,260,42,218,302,10,3,7.1
2026-03-05,18,1787,1822,-35,3609,52,7,27.67
2026-03-06,17,3655,604,3051,4259,84,4,41.52
...
Summary: 117 commits across 18 active day(s) (2026-02-08 → 2026-03-07)
Lines added: 14,232
Lines deleted: 3,213
Total churn: 17,445
Net lines: 11,019
The CSV goes to stdout (or a file with -o). The summary goes to stderr. Pipe-friendly.
Days with no activity are emitted as normal rows with zeros (except in --by-repo mode), so the output is a continuous date series.
python3 -m devstats {scan,repos} [options]

| Command | What it does |
|---|---|
| `scan <directory>` | Auto-discovers all git repos in the immediate subdirectories of `<directory>` |
| `repos <path> [path ...]` | Uses one or more explicit repo paths |
# All repos under ~/code, last year, as CSV
python3 -m devstats scan ~/code --author-email dev@company.com --last 1y -o year.csv
# Specific repos, JSON output
python3 -m devstats repos ~/code/api ~/code/frontend \
--author-email dev@company.com \
--since 2026-01-01 --until 2026-04-01 \
--format json
# Filter by author name instead of email
python3 -m devstats scan ~/code --author-name "Jane Doe" --last 6m
# Multiple email identities (same person, different emails)
python3 -m devstats scan ~/code \
--author-email dev@company.com \
--author-email dev@personal.com \
--last 6m
# Combine name and email (OR logic — matches either)
python3 -m devstats scan ~/code \
--author-name "Jane Doe" \
--author-email jane@other-company.com \
--last 6m
# Regex match on author name or email
python3 -m devstats scan ~/code --author-regex "Jane|jane@" --last 3m
# Per-repo breakdown
python3 -m devstats scan ~/code --author-email dev@company.com --last 30d --by-repo
# Only Python and JS files
python3 -m devstats scan ~/code --author-email dev@company.com --last 30d --extensions py,js
# Exclude test fixtures
python3 -m devstats scan ~/code --author-email dev@company.com --exclude-path "fixtures/*"
# Skip merge commits
python3 -m devstats scan ~/code --author-email dev@company.com --last 30d --exclude-merges
# Different timezone
python3 -m devstats scan ~/code --author-email dev@company.com --timezone America/New_York
# Include lock files in stats
python3 -m devstats scan ~/code --author-email dev@company.com --include-generated
# Verbose mode — see which repos are found
python3 -m devstats scan ~/code --author-email dev@company.com --last 7d -v

| Flag | Description |
|---|---|
| `--author-email EMAIL` | Exact email match. Repeat for multiple identities. |
| `--author-name NAME` | Exact author name match. Repeat for multiple identities. |
| `--author-regex PATTERN` | Regex matched against `Author Name <email>` |
When multiple --author-email and/or --author-name flags are given, git treats them as OR — a commit matches if any one of them hits.
| Flag | Description |
|---|---|
| `--since DATE` | Start date (inclusive), e.g. `2026-01-01` |
| `--until DATE` | End date (exclusive), e.g. `2026-04-01` |
| `--last SPAN` | Shorthand: `7d`, `4w`, `6m`, `1y`. Overrides `--since`. |
| Flag | Description |
|---|---|
| `--format {csv,json}` | Output format. Default: `csv` |
| `-o FILE` / `--output FILE` | Write to file instead of stdout |
| `--by-repo` | Add a `repo_name` column, one row per day per repo |
| Flag | Description |
|---|---|
| `--timezone TZ` | Timezone for day boundaries. Default: `UTC`. Accepts IANA names (`America/New_York`) and offsets (`+05:30`, `UTC+9`). |
| `--exclude-merges` | Skip merge commits |
| `--include-generated` | Include lock files and minified assets |
| `--extensions py,js,ts` | Only count files with these extensions |
| `--exclude-path GLOB` | Exclude matching paths (repeatable) |
| `-v` / `--verbose` | Show which repos are processed (to stderr) |
| Column | Description |
|---|---|
| `date` | `YYYY-MM-DD` in the configured timezone |
| `number_of_commits` | Commits on that day |
| `lines_added` | Lines added (excluding binary and filtered files) |
| `lines_deleted` | Lines deleted |
| `net_lines` | `lines_added − lines_deleted` |
| `total_churn` | `lines_added + lines_deleted` |
| `files_changed` | Unique file paths changed that day |
| `directories_touched` | Unique top-level directories changed (see below) |
| `complexity_score` | Heuristic score (see below) |
With --by-repo, a repo_name column is inserted after date.
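A minimal sketch of stable column ordering, assuming rows arrive as plain dicts keyed by column name. `COLUMNS` mirrors the header in the sample output above; `columns`, `to_csv` and `to_json` are hypothetical helpers, and the JSON shape (a list of objects) is an assumption rather than the tool's documented format.

```python
import csv
import io
import json
from typing import Dict, List, Sequence, Union

Row = Dict[str, Union[str, int, float]]

# Column order as shown in the sample CSV header.
COLUMNS = [
    "date", "number_of_commits", "lines_added", "lines_deleted", "net_lines",
    "total_churn", "files_changed", "directories_touched", "complexity_score",
]


def columns(by_repo: bool = False) -> List[str]:
    """Return the output columns; --by-repo inserts repo_name right after date."""
    if by_repo:
        return COLUMNS[:1] + ["repo_name"] + COLUMNS[1:]
    return list(COLUMNS)


def to_csv(rows: Sequence[Row], by_repo: bool = False) -> str:
    buf = io.StringIO()
    # DictWriter emits fields in fieldnames order, regardless of each dict's key order.
    writer = csv.DictWriter(buf, fieldnames=columns(by_repo), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_json(rows: Sequence[Row], by_repo: bool = False) -> str:
    cols = columns(by_repo)
    # Rebuild each object so its keys follow the same order as the CSV columns.
    return json.dumps([{c: row[c] for c in cols} for row in rows], indent=2)
```

Pinning the order in one list means the CSV header and JSON keys cannot drift apart, which keeps downstream scripts stable.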
- Discovery: `scan` walks the given directory one level deep, running `git rev-parse --is-inside-work-tree` on each subdirectory. Warns if a repo is a shallow clone.
- Enumeration: runs `git log --all --author=<filter> --since=<date> --until=<date>` to get matching commit SHAs.
- Detail extraction: for each commit, runs `git show --numstat --format=... -M <sha>` to get per-file added/deleted counts. The `-M` flag detects renames.
- Filtering: removes binary files, files in skipped directories, generated/lock files, and anything excluded by `--exclude-path` or `--extensions`.
- Aggregation: groups by calendar date using the author date (not committer date) converted to the configured timezone.
- Export: writes CSV or JSON with stable column ordering.
All git interaction happens via subprocess calling the git CLI. No GitPython, no pygit2, no API calls.
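A sketch of the subprocess calls behind the discovery, enumeration and detail steps. The helper names are hypothetical, and the `--format=%H%x09%at` string (SHA, a tab, then the author date as a Unix timestamp) stands in for the elided format in the step list; it is an assumption, not the tool's actual format.

```python
import subprocess
from pathlib import Path
from typing import List, Optional, Sequence


def git_argv(repo: str, *args: str) -> List[str]:
    """Build a git command that runs inside `repo` (git -C changes directory first)."""
    return ["git", "-C", repo, *args]


def _git_stdout(repo: Path, *args: str) -> Optional[str]:
    result = subprocess.run(git_argv(str(repo), *args), capture_output=True, text=True)
    return result.stdout.strip() if result.returncode == 0 else None


def discover(root: Path) -> List[Path]:
    """Immediate subdirectories of `root` that git reports as work trees.

    Caveat: if `root` itself sits inside a repo, plain subdirectories report true too.
    """
    return [
        p for p in sorted(root.iterdir())
        if p.is_dir() and _git_stdout(p, "rev-parse", "--is-inside-work-tree") == "true"
    ]


def is_shallow(repo: Path) -> bool:
    # --is-shallow-repository needs git 2.15 or newer.
    return _git_stdout(repo, "rev-parse", "--is-shallow-repository") == "true"


def log_argv(repo: str, authors: Sequence[str], since: Optional[str] = None,
             until: Optional[str] = None, exclude_merges: bool = False) -> List[str]:
    """git log invocation that prints one matching commit SHA per line."""
    args = ["log", "--all", "--format=%H"]
    # git ORs repeated --author options: a commit matches if any pattern hits.
    args += [f"--author={a}" for a in authors]
    if since:
        args.append(f"--since={since}")
    if until:
        args.append(f"--until={until}")
    if exclude_merges:
        args.append("--no-merges")
    return git_argv(repo, *args)


def show_argv(repo: str, sha: str) -> List[str]:
    """git show invocation: a header line (SHA, tab, author timestamp), then numstat lines."""
    return git_argv(repo, "show", "--numstat", "--format=%H%x09%at", "-M", sha)
```

Passing argv lists rather than shell strings avoids quoting problems with author names that contain spaces.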
Counts unique top-level directories — the first path segment of each changed file. Files in the repo root count as ".".
Example: changes to src/a.py, src/b.py, lib/c.py, and README.md → 3 directories: src, lib, .
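The rule fits in a few lines. A minimal sketch; `top_level_dir` and `directories_touched` are hypothetical names, and paths are assumed to be the forward-slash, repo-relative paths git prints.

```python
from pathlib import PurePosixPath
from typing import Iterable


def top_level_dir(path: str) -> str:
    """First path segment, or "." for files in the repo root."""
    parts = PurePosixPath(path).parts
    return parts[0] if len(parts) > 1 else "."


def directories_touched(paths: Iterable[str]) -> int:
    return len({top_level_dir(p) for p in paths})
```

With the example above, `directories_touched(["src/a.py", "src/b.py", "lib/c.py", "README.md"])` returns 3.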
complexity_score = 0.35 × log(1 + total_churn)
+ 0.45 × files_changed
+ 0.20 × directories_touched
log is the natural logarithm, and the result is rounded to 2 decimal places. The idea:
- log(churn) — rewards volume of work but dampens massive refactors or generated code
- files_changed — breadth of changes, correlates with review difficulty
- directories_touched — cross-cutting scope, context-switching cost
The weights live in devstats/complexity.py:WEIGHTS — change them if you want.
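A minimal sketch of the formula, using `math.log1p` for log(1 + total_churn). The natural logarithm reproduces the sample rows at the top (439 churn, 41 files and 5 directories give 21.58). The `WEIGHTS` name comes from the text, but the key names used here are hypothetical.

```python
import math
from typing import Mapping

# Hypothetical key names; the weight values are the documented ones.
WEIGHTS = {"churn": 0.35, "files": 0.45, "directories": 0.20}


def complexity_score(total_churn: int, files_changed: int, directories_touched: int,
                     weights: Mapping[str, float] = WEIGHTS) -> float:
    score = (
        weights["churn"] * math.log1p(total_churn)  # natural log of (1 + churn)
        + weights["files"] * files_changed
        + weights["directories"] * directories_touched
    )
    return round(score, 2)
```

Because churn enters through a logarithm, a 4,000-line day scores only a few points more on that term than a 400-line day, while each extra file adds a flat 0.45.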
For exports without --by-repo, the tool emits a dense day-by-day timeline.
If a date has no commits, it is still included with zero values (number_of_commits=0, lines_added=0, etc.).
Range used for filling missing days:
- `--since`/`--until` if provided (`--until` is exclusive)
- if only `--since` is provided, fill through today
- if no date flags are provided, fill between the first and last active day
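A minimal sketch of these fill rules, assuming daily stats live in a dict keyed by `date` that only contains active days. `fill_range`, `dense_timeline` and the abbreviated `ZERO_FIELDS` tuple are hypothetical. The text does not say what happens when only `--until` is given; this sketch falls back to the active-day span in that case.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional, Tuple

Stats = Dict[str, int]
# Abbreviated for the sketch; a real row carries every numeric column.
ZERO_FIELDS = ("number_of_commits", "lines_added", "lines_deleted")


def fill_range(daily: Dict[date, Stats], since: Optional[date] = None,
               until: Optional[date] = None,
               today: Optional[date] = None) -> Optional[Tuple[date, date]]:
    """Return a half-open [start, end) range, or None if there is nothing to fill."""
    if since is not None:
        if until is not None:
            return since, until  # --until is already exclusive
        return since, (today or date.today()) + timedelta(days=1)  # through today
    if not daily:
        return None
    # No --since (with or without --until): first to last active day, inclusive.
    return min(daily), max(daily) + timedelta(days=1)


def dense_timeline(daily: Dict[date, Stats], start: date,
                   end: date) -> List[Tuple[date, Stats]]:
    rows = []
    day = start
    while day < end:
        # Inactive days get an explicit all-zero row instead of being skipped.
        stats = daily[day] if day in daily else {f: 0 for f in ZERO_FIELDS}
        rows.append((day, stats))
        day += timedelta(days=1)
    return rows
```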
Skipped directories: `node_modules`, `vendor`, `dist`, `build`, `coverage`, `tmp`, `.next`, `__pycache__`, `.cache`, `.tox`, `.mypy_cache`, `.pytest_cache`, `venv`, `.venv`, `env`
Generated and lock files (skipped unless --include-generated is set):
- Exact filenames: `package-lock.json`, `yarn.lock`, `pnpm-lock.yaml`, `Gemfile.lock`, `Pipfile.lock`, `poetry.lock`, `composer.lock`, `Cargo.lock`, `go.sum`, `flake.lock`
- Glob patterns: `*.min.js`, `*.min.css`, `*.bundle.js`, `*.chunk.js`, `*.map`, `*.compiled.*`, `*.generated.*`, `*.pb.go`, `*_pb2.py`, `*.swagger.json`
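These skip lists can be applied per file with the standard library's `fnmatch`. A sketch only: the helper names are hypothetical, skipped directory names are assumed to match at any depth, generated-file patterns are matched against the file name, and `--exclude-path` globs are assumed to be matched against the full repo-relative path with `fnmatch` semantics (where `*` also crosses `/`).

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import Iterable, Optional, Set

SKIP_DIRS = frozenset({
    "node_modules", "vendor", "dist", "build", "coverage", "tmp", ".next", "__pycache__",
    ".cache", ".tox", ".mypy_cache", ".pytest_cache", "venv", ".venv", "env",
})
GENERATED_NAMES = frozenset({
    "package-lock.json", "yarn.lock", "pnpm-lock.yaml", "Gemfile.lock", "Pipfile.lock",
    "poetry.lock", "composer.lock", "Cargo.lock", "go.sum", "flake.lock",
})
GENERATED_GLOBS = (
    "*.min.js", "*.min.css", "*.bundle.js", "*.chunk.js", "*.map", "*.compiled.*",
    "*.generated.*", "*.pb.go", "*_pb2.py", "*.swagger.json",
)


def in_skipped_dir(path: str) -> bool:
    # Assumption: a skipped directory name matches at any depth, not only top level.
    return any(part in SKIP_DIRS for part in PurePosixPath(path).parts[:-1])


def is_generated(path: str) -> bool:
    name = PurePosixPath(path).name
    return name in GENERATED_NAMES or any(fnmatchcase(name, g) for g in GENERATED_GLOBS)


def keep_file(path: str, include_generated: bool = False,
              extensions: Optional[Set[str]] = None,
              exclude_globs: Iterable[str] = ()) -> bool:
    """True if the file should count toward line, file and directory stats."""
    if in_skipped_dir(path):
        return False
    if not include_generated and is_generated(path):
        return False
    # Extensions are compared without the leading dot, as in --extensions py,js.
    if extensions is not None and PurePosixPath(path).suffix.lstrip(".") not in extensions:
        return False
    return not any(fnmatchcase(path, g) for g in exclude_globs)
```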
Git reports binary files as -\t- in numstat. They are excluded from all line counts, file counts, and directory counts.
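A sketch of parsing one `--numstat` line, covering the binary marker and the two rename notations git prints with `-M` (`old.py => new.py` and `src/{old => new}/x.py`). `NumstatEntry` and `parse_numstat_line` are hypothetical; the sketch keeps the post-rename path and ignores git's C-style quoting of unusual file names.

```python
import re
from typing import NamedTuple, Optional


class NumstatEntry(NamedTuple):
    path: str               # post-rename path
    added: Optional[int]    # None for binary files
    deleted: Optional[int]  # None for binary files

    @property
    def is_binary(self) -> bool:
        return self.added is None


_BRACE_RENAME = re.compile(r"\{([^{}]*) => ([^{}]*)\}")


def resolve_rename(path: str) -> str:
    """Turn git's rename notation into the new path."""
    if " => " not in path:
        return path
    if "{" in path:
        # "src/{old => new}/x.py" -> "src/new/x.py"; an empty side leaves "//" to collapse.
        return _BRACE_RENAME.sub(lambda m: m.group(2), path).replace("//", "/").lstrip("/")
    return path.split(" => ", 1)[1]  # "old.py => new.py" -> "new.py"


def parse_numstat_line(line: str) -> Optional[NumstatEntry]:
    parts = line.rstrip("\n").split("\t", 2)
    if len(parts) != 3:
        return None  # blank line or commit header, not a numstat row
    added, deleted, path = parts
    path = resolve_rename(path)
    if added == "-" and deleted == "-":
        return NumstatEntry(path, None, None)  # binary file: no line counts
    return NumstatEntry(path, int(added), int(deleted))
```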
Commits are grouped by author date — when the code was actually written — not committer date (when it was applied/merged). This means:
- A commit authored at 23:30 UTC will land on the next calendar day if you use `--timezone UTC+2`
- Rebased or cherry-picked commits retain the original author date
- This is usually what you want for measuring "when did the person write this code"
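A sketch of turning an author timestamp (for example from git's `%at` placeholder) into a calendar day, accepting the timezone spellings listed for `--timezone`. `parse_timezone` and `author_day` are hypothetical names; IANA names go through `zoneinfo.ZoneInfo`, which needs time zone data on the system.

```python
import re
from datetime import date, datetime, timedelta, timezone, tzinfo
from zoneinfo import ZoneInfo

# Matches "+05:30", "-0800", "UTC+9", "UTC-03:30" (minutes optional).
_OFFSET = re.compile(r"^(?:UTC)?([+-])(\d{1,2})(?::?(\d{2}))?$", re.IGNORECASE)


def parse_timezone(spec: str) -> tzinfo:
    if spec.upper() == "UTC":
        return timezone.utc
    match = _OFFSET.match(spec)
    if match:
        sign = -1 if match.group(1) == "-" else 1
        offset = timedelta(hours=int(match.group(2)), minutes=int(match.group(3) or 0))
        # Fixed offset: no DST transitions are modelled.
        return timezone(sign * offset)
    return ZoneInfo(spec)  # IANA name such as "America/New_York"


def author_day(author_ts: int, tz: tzinfo) -> date:
    """Calendar day of a Unix author timestamp in the given timezone."""
    return datetime.fromtimestamp(author_ts, tz).date()
```

This reproduces the first bullet: 23:30 UTC on one day is 01:30 the next day under `UTC+2`.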
- If a repo fails, processing continues. Failures are listed at the end.
- Shallow clones trigger a warning (history may be incomplete).
- The exit code is 0 if at least one repo was processed successfully.
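A sketch of the continue-on-failure policy. `process_all` is hypothetical, and returning 1 when nothing succeeded (including an empty repo list) is an assumption; the text only specifies the success case.

```python
import sys
from typing import Callable, Iterable, List, Tuple


def process_all(repos: Iterable[str], process: Callable[[str], object]) -> int:
    """Run `process` on each repo, report failures at the end, return an exit code."""
    failures: List[Tuple[str, str]] = []
    successes = 0
    for repo in repos:
        try:
            process(repo)
            successes += 1
        except Exception as exc:  # one broken repo must not stop the scan
            failures.append((repo, str(exc)))
    for repo, reason in failures:
        print(f"failed: {repo}: {reason}", file=sys.stderr)
    # Assumption: exit code 1 when no repo was processed successfully.
    return 0 if successes > 0 else 1
```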
devstats/
├── cli.py # Argument parsing, orchestration
├── discovery.py # Find git repos, validate paths, detect shallow clones
├── commits.py # git log / git show wrappers
├── numstat.py # Parse --numstat output (additions, deletions, renames)
├── filters.py # Skip dirs, generated files, extension whitelist
├── aggregation.py # Group by day, compute stats
├── complexity.py # Complexity score formula + weights
├── export.py # CSV / JSON output, summary with inactive days
├── constants.py # All default values in one place
tests/
├── test_numstat.py # Numstat parsing, renames, binary detection
├── test_filters.py # Skip dirs, generated files, classification
├── test_complexity.py # Score formula, determinism, custom weights
├── test_aggregation.py # Timezone handling, day grouping, merging
├── test_export.py # CSV/JSON format, dense timeline, summaries
├── test_cli.py # --last shorthand parsing
├── test_discovery.py # Repo detection
├── test_integration.py # End-to-end with a real temp git repo
pip install pytest
python3 -m pytest -v

126 tests covering numstat parsing, filtering, complexity scoring, timezone-aware aggregation, inactive weekday calculation, CSV/JSON output, and full end-to-end integration with a real temporary git repo.
- Local repos only. No GitHub API, no tokens, no network calls.
- git must be installed. The tool shells out to `git log` and `git show`.
- Shallow clones may report incomplete history. The tool warns but continues.
- Timezone handling uses Python's `zoneinfo` for IANA names (Python 3.9+). For fixed offsets like `+05:30`, no extra modules are needed. Historical DST transitions are not modelled when using fixed-offset notation.
- Large repos are processed sequentially. For millions of commits, narrow the date range with `--since`/`--until`/`--last`.
- `--since` with date-only values: git interprets `--since=2026-03-01` using the current time of day as the cutoff (a git quirk). The `--last` flag avoids this by emitting full ISO timestamps internally.
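A sketch of expanding a `--last` span into a full ISO 8601 timestamp, which sidesteps the date-only quirk. The function name is hypothetical, and approximating a month as 30 days and a year as 365 days is an assumption; the tool may use calendar arithmetic instead.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed unit lengths: months and years are approximated as fixed day counts.
_UNIT_DAYS = {"d": 1, "w": 7, "m": 30, "y": 365}
_SPAN = re.compile(r"^(\d+)([dwmy])$")


def last_to_since(span: str, now: Optional[datetime] = None) -> str:
    """Turn "7d", "4w", "6m" or "1y" into an ISO timestamp usable as --since."""
    match = _SPAN.match(span.strip().lower())
    if not match:
        raise ValueError(f"invalid span {span!r}; expected e.g. 7d, 4w, 6m, 1y")
    count, unit = int(match.group(1)), match.group(2)
    now = now or datetime.now(timezone.utc)
    start = now - timedelta(days=count * _UNIT_DAYS[unit])
    # A full timestamp with an explicit offset leaves git nothing to fill in.
    return start.isoformat(timespec="seconds")
```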
MIT