phantomstars

Automated detection and tracking of fake engagement on GitHub

A JS Labs project — part of the AI Slop Intelligence initiative.
Runs every day. Scores every suspicious account. Detects coordinated bot campaigns.
Files issues directly on compromised repos so maintainers can act.

Support this project

BTC 3QjWqhQbHdHgWeYHTpmorP8Pe1wgDjJy54
ETH 0x5851e6145F4773d1585b8686095FB16E368a4dA1
ZEC t1KSR5YkNPbjqRSCoLKo5AddFWdm9Kzxh1B

Why this exists

GitHub stars are a trust signal. They are how developers decide what to evaluate, what to depend on, and what to recommend. That signal is being systematically corrupted.

During the AI boom of 2024-2026, an industry of bot farms emerged to manufacture credibility for low-quality, often malicious repositories. A project with 800 stars in 48 hours reads as legitimate to a developer scanning search results. That's the point. The goal of fake engagement isn't the stars themselves; it's the social proof those stars produce, and the downstream decisions that social proof influences.

The pattern is identifiable. Accounts created the same week, no bio, no followers, no original repositories, starring the same 15 repos within a 2-hour window. Not one campaign, but dozens running simultaneously, every day, across thousands of accounts. The data shows repos where 185 out of 185 engagers are classified as likely fake. A 100% fakeness ratio. Entire trending placements built on nothing.

phantomstars was built because this problem is tractable. The signal-to-noise ratio in GitHub's public API is, for now, still high enough that coordinated campaigns leave clear fingerprints. This project reads those fingerprints, publishes the raw data, and notifies affected repository maintainers directly.

This is part of the broader AI Slop Intelligence work at JS Labs, ongoing research into the mechanics and measurable effects of low-quality AI-generated content flooding developer ecosystems. Fake engagement isn't a peripheral issue. It's the distribution mechanism that gets slop in front of real users.

What it does

phantomstars runs a daily GitHub Actions job that:

Scrapes the GitHub Trending page for repos gaining stars today
Queries the GitHub Search API for repos created in the last 7 days with sudden star activity (the wider window catches multi-day campaigns missed by 24h-only scans)
Pulls recent engagement events (stars, forks) via the Events API (last 24 hours per repo)
Fetches the full profile of every engaging account via GraphQL: account creation date, follower/following counts, bio, repo history
Scores every account against a composite heuristics model: account age, profile completeness, repository patterns, and activity history
Detects coordinated campaigns using timestamp clustering and union-find: clusters of suspicious accounts that engaged within a 3-hour window
Appends all suspects to an append-only JSONL ledger committed back to this repo
Publishes a per-repo intelligence feed showing which repos are being targeted and at what fakeness ratio
Files GitHub issues directly on targeted repos so maintainers see the campaign data in their own issue tracker
Writes a formatted scan report to the GitHub Actions job summary

No servers. No databases. No infrastructure bill.

Frequently asked questions

Does it notify the targeted repo?

Yes. When a repo's fakeness ratio exceeds 40% or a coordinated campaign is detected, phantomstars opens an issue directly on that repository. The issue contains the full suspect table, campaign membership, composite scores, and account creation dates: everything a maintainer needs to investigate and report to GitHub.

If issues are disabled on a targeted repo, the notification is skipped without raising an error, and the skip is recorded in the scan log.
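The notification rule can be sketched in a few lines. This is an illustration, not the project's notifier: it assumes the fakeness ratio is likely_fake divided by total_scanned (which matches the sample repos.jsonl record, 62 / 87 ≈ 0.713) and that "exceeds 40%" means strictly greater than. The function names and the NOTIFY_RATIO constant are hypothetical.

```python
NOTIFY_RATIO = 0.40  # threshold from the FAQ text; the constant name is hypothetical


def fakeness_ratio(likely_fake: int, total_scanned: int) -> float:
    """Share of scanned engagers classified as likely_fake (assumed definition)."""
    if total_scanned == 0:
        return 0.0
    return round(likely_fake / total_scanned, 3)


def should_notify(likely_fake: int, total_scanned: int, campaign_count: int) -> bool:
    """File an issue when the ratio exceeds 40% or any campaign was detected."""
    return fakeness_ratio(likely_fake, total_scanned) > NOTIFY_RATIO or campaign_count > 0


print(fakeness_ratio(62, 87))       # 0.713
print(should_notify(10, 100, 0))    # False
```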

Can I report a false positive?

Yes. If your account appears in data/suspects.jsonl and you believe the classification is incorrect, open a false positive issue using the provided template. Reports are reviewed manually before any allowlist addition. The allowlist is stored in data/allowlist.txt; accounts listed there are excluded from all future scans and from the suspects ledger.

What is the campaign ID?

A campaign ID (e.g. c-a3f9b2e1) is a deterministic 8-character hex fingerprint derived from the SHA-256 hash of the sorted set of member logins in that campaign. The same group of accounts will produce the same campaign ID across independent scan runs, enabling longitudinal tracking. It is not a repo name, a username, or any external identifier.

Stability: the ID is stable as long as the campaign's member set is unchanged. If bots are added or suspended between scans, the ID changes because the membership changed. This is expected and reflects real-world drift in bot farm composition.
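A minimal sketch of how such a fingerprint could be derived. The text only states that the ID comes from the SHA-256 hash of the sorted member logins; lowercasing the logins, joining them with a newline, taking the first 8 hex characters, and the "c-" prefix (inferred from the example ID) are assumptions, so IDs produced here will not necessarily match the ones in the ledger.

```python
import hashlib


def campaign_id(logins):
    """Return a 'c-' + 8 hex char fingerprint of a campaign's member set."""
    # Assumption: logins are lowercased, de-duplicated, sorted, newline-joined.
    canonical = "\n".join(sorted({login.lower() for login in logins}))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    # Assumption: the ID keeps the first 8 hex characters of the digest.
    return f"c-{digest[:8]}"


# Member order does not matter; a membership change gives a different ID.
same = campaign_id(["bob", "alice", "carol"]) == campaign_id(["carol", "alice", "bob"])
print(same)
```

Because the input is the member set and nothing else, the ID needs no stored state: any two scans that see the same accounts can compute the same value independently.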

Does it check account creation dates?

Yes. Every account's creation date is fetched from the GitHub GraphQL API (createdAt field) and stored in each suspect record as account_created_at. It is also the primary input to the account age score, which carries the largest weight (35%) of the four signals and is the strongest single signal for fake accounts. Accounts created less than 2 days before engaging score 1.0 on the age signal.

How confident is it?

Individual scores carry meaningful false positive rates. A new developer with a sparse profile can legitimately score 0.75 or higher. The tool accounts for this by requiring campaign-level evidence before filing issues; a single suspicious account is not enough. A coordinated cluster of 40+ accounts, all created the same week, all scoring 0.75+, all engaging within 90 minutes, is a different matter. That's where confidence becomes actionable.

The data is always probabilistic. The issue bodies say so explicitly. The goal is to give maintainers the signal and the raw evidence to make their own judgement.

Live dashboard

Date	Scanned	Likely Fake	Suspicious	Campaigns	New Fakes (24h)
2026-05-17	8015	831	5709	82	831

Today's most-targeted repos

Repo	Engagers	Likely Fake	Fakeness %	Campaigns
Carolina313876/Quantum-Vanity-Address-Forge	185	185	100.0%	1
tonylinden54/palisade-security-nexus	185	185	100.0%	1
johanwolfaardt-ctrl/Account-Symphony-Dashboard	185	185	100.0%	1
keerthanapranesh/Claude-Code-Swarm-Toolkit	185	185	100.0%	1
psyicarus/quizlet-match-whisper	185	185	100.0%	1
yanilsa09cabrera-jpg/soundboard-studio-pro	185	185	100.0%	1
ogaawin/Draft-2026-CAD-Workspace	185	185	100.0%	1
nanasalgadas1000-cell/seraph-nuke-inferno	185	185	100.0%	1
23k65A1408/Create-Aeronautics-Skywards	185	185	100.0%	1
shritanu16007-ctrl/Delta-Executor-Next-Gen	185	185	100.0%	1
8015238355/mm2-analytics-dashboard-2026	185	185	100.0%	1
johnicassere/lab-rat-race	185	185	100.0%	1
NazmulHudha/office-automation-toolkit	185	185	100.0%	1
e7137768-stack/Extreme-DAW-Beat-Forge-2026	185	185	100.0%	1
jonathanngaboyeka/rust-movement-optimizer	185	185	100.0%	1
wilmer-afk/Apex-Injector	185	185	100.0%	1
husammuhayman/homm-legacy-lore-tome	185	185	100.0%	1
ImanFahrel/joystick-canvas	185	185	100.0%	1
ipinputra/GPT-Image-2-Unlocked-API-Toolkit	185	185	100.0%	1
MHuy9911/Game-Network-Turbo-Chamber	185	185	100.0%	1
LindyNongmaithem/guild-inflator-plus	185	185	100.0%	1
Khanhhayho-spec/jetbrains-enhancement-kit	185	185	100.0%	1
Aryanzzzz25/f95-zone-sync-manager	185	185	100.0%	1
JaideepN07/Crosshair-Studio-Engine	185	185	100.0%	1
ujan007/3dsmax-2027-studio-workflow	185	185	100.0%	1

Scoring model

Each account receives a composite suspicion score (0.0 = clean, 1.0 = likely fake) from four signals:

Signal	Weight	Measurement
Account age	35%	`< 2 days` → 1.00 · `< 7 days` → 0.90 · `< 30 days` → 0.55 · `< 90 days` → 0.20 · older → 0.00
Profile completeness	30%	Points for: no bio (+0.25), no location (+0.15), no company (+0.10), zero followers (+0.30), zero following (+0.10), bot-pattern username (+0.20)
Repository pattern	25%	Zero repos → 0.90 · all repos are forks → 0.80 · >85% fork ratio → 0.55
Activity history	10%	Accounts >14 days old with zero repos + zero social graph → 0.80 (ghost accounts). Zero repos only → 0.60. All-forks + no social graph → 0.50

Classification thresholds:

Score	Classification
≥ 0.75	`likely_fake`
≥ 0.45	`suspicious`
< 0.45	`clean` (not stored)
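The tables above can be read as a weighted sum followed by a threshold check. The sketch below illustrates that reading and is not the code in heuristics.py: it assumes the composite is a plain weighted sum rounded to three decimals, that account age is counted in whole days between creation and engagement, and that profile points are capped at 1.0 (the listed points add up to 1.10). All function and constant names are hypothetical.

```python
from datetime import date

# Weights, points, and thresholds copied from the tables above.
WEIGHTS = {"age": 0.35, "profile": 0.30, "repos": 0.25, "activity": 0.10}
PROFILE_POINTS = {
    "no_bio": 0.25, "no_location": 0.15, "no_company": 0.10,
    "zero_followers": 0.30, "zero_following": 0.10, "bot_username": 0.20,
}


def age_score(created: date, engaged: date) -> float:
    """Map account age at engagement time to the age signal."""
    days = (engaged - created).days  # assumption: whole days
    if days < 2:
        return 1.00
    if days < 7:
        return 0.90
    if days < 30:
        return 0.55
    if days < 90:
        return 0.20
    return 0.00


def profile_score(**flags: bool) -> float:
    """Sum the points for every flag that is set, capped at 1.0 (assumed)."""
    points = sum(PROFILE_POINTS[name] for name, on in flags.items() if on)
    return min(1.0, round(points, 2))


def composite(age: float, profile: float, repos: float, activity: float) -> float:
    """Weighted sum of the four signals (assumed combination rule)."""
    total = (WEIGHTS["age"] * age + WEIGHTS["profile"] * profile
             + WEIGHTS["repos"] * repos + WEIGHTS["activity"] * activity)
    return round(total, 3)


def classify(score: float) -> str:
    if score >= 0.75:
        return "likely_fake"
    if score >= 0.45:
        return "suspicious"
    return "clean"


# A one-day-old account with an empty profile and zero repos.
score = composite(age_score(date(2026, 5, 16), date(2026, 5, 17)), 0.9, 0.9, 0.6)
print(score, classify(score))  # 0.905 likely_fake
```

The example input is a brand-new account with nothing filled in, which is also what a genuine new developer looks like. That is why the individual score is treated as weak evidence on its own.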

Campaign detection

A campaign is a group of ≥ 4 suspicious accounts that all engaged with the same repo within a 3-hour window. The algorithm uses union-find to build connected components; accounts that co-engaged within the window are merged, and any component above the minimum size is flagged as a coordinated campaign.
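One way to picture the clustering step is the sketch below. It is an illustration under stated assumptions, not the contents of campaigns.py: events are (login, repo, timestamp) tuples for accounts already scored as suspicious, two accounts are merged when their engagements on the same repo are adjacent in time and at most 3 hours apart, and components are tracked across repos. The UnionFind class and detect_campaigns function are hypothetical names.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(hours=3)  # window from the text
MIN_SIZE = 4                 # minimum campaign size from the text


class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def detect_campaigns(events):
    """Group suspicious accounts that co-engaged on a repo within WINDOW."""
    by_repo = defaultdict(list)
    for login, repo, ts in events:
        by_repo[repo].append((ts, login))

    uf = UnionFind()
    for repo_events in by_repo.values():
        repo_events.sort()
        # Linking time-adjacent events is enough: on a sorted timeline it
        # yields the same components as linking every pair within WINDOW.
        for (t1, a), (t2, b) in zip(repo_events, repo_events[1:]):
            uf.find(a)
            uf.find(b)
            if t2 - t1 <= WINDOW:
                uf.union(a, b)

    groups = defaultdict(set)
    for login in list(uf.parent):
        groups[uf.find(login)].add(login)
    return sorted(sorted(g) for g in groups.values() if len(g) >= MIN_SIZE)


t0 = datetime(2026, 5, 17, 7, 0)
events = [
    ("a", "owner/repo-a", t0),
    ("b", "owner/repo-a", t0 + timedelta(minutes=30)),
    ("c", "owner/repo-a", t0 + timedelta(hours=1)),
    ("d", "owner/repo-a", t0 + timedelta(hours=2)),
    ("e", "owner/repo-a", t0 + timedelta(hours=9)),  # too late to join
]
print(detect_campaigns(events))  # [['a', 'b', 'c', 'd']]
```

One design consequence of connected components: merges are transitive, so a chain of accounts each within 3 hours of the next can form a component whose first and last engagements are more than 3 hours apart. An implementation that requires every member to fall inside a single 3-hour window would need an extra check on the component's time span.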

Campaign IDs are stable SHA-256 fingerprints of the sorted member set. The same campaign detected on consecutive days will have the same ID as long as membership is unchanged.

Why campaigns are the real signal: Individual scores have meaningful false positive rates. A new developer with a sparse profile can score 0.80 alone. Forty accounts all scoring 0.75+, created within the same week, all starring the same repo within 90 minutes, is not a coincidence. The campaign signal is where the data becomes actionable: the difference between a suspicious data point and evidence of a coordinated operation.

Data format

All findings are committed to data/suspects.jsonl and data/repos.jsonl, one JSON record per line, append-only. The GitHub Actions job summary (visible in the Actions UI after each run) provides a formatted per-scan report.

suspects.jsonl — one record per flagged account per scan:

{
  "login": "user98432",
  "account_age_score": 0.9,
  "profile_score": 0.8,
  "repo_pattern_score": 0.8,
  "activity_score": 0.85,
  "composite": 0.84,
  "classification": "likely_fake",
  "campaign_id": "c-a3f9b2e1",
  "scan_date": "2026-05-17",
  "account_created_at": "2026-05-15",
  "target_repos": ["owner/repo-a", "owner/repo-b"]
}

repos.jsonl — one record per targeted repo per scan:

{
  "full_name": "owner/suspicious-repo",
  "total_scanned": 87,
  "likely_fake": 62,
  "suspicious": 18,
  "fakeness_ratio": 0.713,
  "classification": "likely_fake",
  "campaign_count": 3,
  "scan_date": "2026-05-17"
}

Query examples:

# All likely_fake accounts from today
jq 'select(.scan_date == "2026-05-17" and .classification == "likely_fake") | .login' data/suspects.jsonl

# Accounts created in the last 3 days that were flagged
jq 'select(.account_created_at >= "2026-05-14") | [.login, .account_created_at, .classification] | @tsv' -r data/suspects.jsonl

# Which repos were targeted today, sorted by fakeness ratio
jq 'select(.scan_date == "2026-05-17") | [.full_name, .fakeness_ratio, .likely_fake] | @tsv' -r data/repos.jsonl | sort -t$'\t' -k2 -rn

# All members of a specific campaign
jq 'select(.campaign_id == "c-a3f9b2e1") | [.login, .account_created_at, .composite] | @tsv' -r data/suspects.jsonl

# Repos a specific account targeted
jq 'select(.login == "user98432") | .target_repos[]' data/suspects.jsonl

# High-confidence repos: fakeness ratio above 60%
jq 'select(.fakeness_ratio >= 0.6) | [.full_name, .fakeness_ratio, .campaign_count] | @tsv' -r data/repos.jsonl | sort -t$'\t' -k2 -rn
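The same ledger can be read from Python when a query outgrows jq. The sketch below assumes one JSON object per line, as described above, and that records outside any campaign have a missing or null campaign_id; read_jsonl and campaign_members are hypothetical helpers, not functions from storage.py.

```python
import json
from collections import defaultdict


def read_jsonl(path):
    """Yield one dict per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


def campaign_members(records):
    """Map each campaign_id to the sorted logins seen under it."""
    members = defaultdict(set)
    for rec in records:
        cid = rec.get("campaign_id")  # assumption: absent or null when not in a campaign
        if cid:
            members[cid].add(rec["login"])
    return {cid: sorted(logins) for cid, logins in members.items()}


# Usage against a local checkout:
# for cid, logins in campaign_members(read_jsonl("data/suspects.jsonl")).items():
#     print(cid, len(logins))
```

Because the ledger is append-only, the same login can appear once per scan; collecting logins into a set keeps repeat sightings from inflating the member count.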

Setup

1. Fork this repo

Your fork owns the data. Results are committed back to data/suspects.jsonl and data/repos.jsonl on your fork after every daily run.

2. Add a GitHub PAT secret

Create a classic Personal Access Token with scopes:

public_repo: read public repo events and stargazers, create issues on public repos
read:user: fetch user profiles via GraphQL

Settings → Secrets and variables → Actions → New repository secret → name it GH_TOKEN.

The default GITHUB_TOKEN has restricted rate limits and cannot call the user GraphQL endpoint at full capacity. A PAT is required.

3. Enable Actions

Actions → Enable GitHub Actions on your fork. The workflow runs at 07:00 UTC daily (after GitHub resets the trending page). Manual trigger available via Actions → Daily Phantom Stars Scan → Run workflow.

After each run, the formatted scan report is visible in Actions → [run] → Summary.

4. Run locally

git clone https://github.com/YOUR_USERNAME/phantomstars.git
cd phantomstars
python -m venv venv && source venv/bin/activate
pip install -e .
GH_TOKEN=ghp_your_token python -m phantomstars.main

Project structure

phantomstars/
├── .github/
│   ├── workflows/daily-scan.yml       # Cron: 07:00 UTC, free on public repos
│   └── ISSUE_TEMPLATE/false_positive.yml
├── src/phantomstars/
│   ├── config.py                      # All constants, no argparse, no env parsing
│   ├── models.py                      # Frozen dataclasses
│   ├── github_client.py               # REST + GraphQL, tenacity retries, rate-limit aware
│   ├── heuristics.py                  # Per-user composite scoring engine
│   ├── campaigns.py                   # Timestamp clustering + union-find
│   ├── storage.py                     # JSONL append + query helpers
│   ├── reporter.py                    # README dashboard injector
│   ├── notifier.py                    # GitHub Issues notifier (files on targeted repos)
│   └── main.py                        # Orchestration entry point
├── tests/
│   ├── conftest.py
│   ├── test_heuristics.py
│   └── test_campaigns.py
├── data/
│   ├── suspects.jsonl                 # Append-only account findings ledger
│   ├── repos.jsonl                    # Append-only per-repo intelligence
│   └── allowlist.txt                  # Accounts excluded from future scans
└── pyproject.toml

Limitations and known failure modes

Events API cap: maximum 300 recent events per repo. Repos with thousands of stars in a day have partial coverage.
Search index lag: GitHub's search index is eventually consistent. Repos created seconds before the scan boundary may be missed.
Heuristic drift: Bot operators adapt. Score weights may require periodic tuning; adjust constants in config.py.
Individual false positives: A new developer with a sparse profile scores 0.75+ in isolation. Campaign membership is the high-confidence signal.
Campaign ID drift: If a bot farm's membership changes between scans (bots suspended, new bots added), the campaign ID changes. This reflects actual campaign evolution, not a bug.
Rate limits: an authenticated PAT allows 5,000 API requests/hour. A scan of a standard-size trending page stays well within that limit.
Issues disabled: Some targeted repos disable issues. Notifications for those repos are skipped silently.

False positive process

If your account appears in data/suspects.jsonl and you believe it is incorrectly classified:

Find your entry: jq 'select(.login == "YOUR_LOGIN")' data/suspects.jsonl
Open a false positive issue with your login, classification, scan date, and explanation
Reports are reviewed manually. Verified false positives are added to data/allowlist.txt and excluded from all future scans.

Note: opening an issue does not modify or remove any existing data. The suspects ledger is append-only. The allowlist only affects future scans.
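How an allowlist check might look in practice. The file format is not specified here, so this sketch assumes one login per line, with blank lines and lines starting with # ignored; logins are compared case-insensitively because GitHub logins are case-insensitive. Both function names are hypothetical.

```python
def load_allowlist(text: str) -> set:
    """Parse allowlist text into a set of lowercased logins."""
    entries = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):  # assumption: '#' marks a comment
            entries.add(line.lower())
    return entries


def filter_allowed(logins, allowlist):
    """Drop allowlisted accounts before they are scored or written to the ledger."""
    return [login for login in logins if login.lower() not in allowlist]


allow = load_allowlist("# reviewed manually\nOctocat\n")
print(filter_allowed(["octocat", "user98432"], allow))  # ['user98432']
```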

Contributing

pip install -e ".[dev]"
python -m black .
python -m ruff check .
python -m mypy src
python -m pytest

All four checks (black, ruff, mypy, pytest) must pass before opening a PR.

Disclaimer

This tool performs read-only analysis of public GitHub data using the official GitHub API. Where issues are filed on targeted repositories, they contain probabilistic findings and are clearly labelled as automated. Findings are indicators, not accusations. False positives exist and are expected.

Built with AI as a coding partner, in response to an ecosystem problem created in part by AI.

License

Apache 2.0. See LICENSE

Author

Built by tg12 · GitHub

A JS Labs project · AI Slop Intelligence Dashboards
