
Citation Tools

shencong edited this page Jun 1, 2026 · 1 revision

🌐 English · 中文 · 🏠 Home

Citation & Writing Tools

The Humanities Writing Companion ships a small toolchain in scripts/ that turns the skill's "engineering rigor" principle into something you can actually run. The idea is simple: AI self-awareness is a soft norm; scripts are the hard mechanism. A prompt can promise not to hallucinate citations; a script that pings Crossref can catch it when the promise slips.

These helpers are zero- or low-dependency by design:

  • The Python scripts use only the Python 3 standard library — nothing to pip install.
  • The shell scripts use bash + grep — nothing to install at all.
  • Everything fails safe: a missing file or an empty match returns a friendly message, never a stack trace.

This page is a hands-on tour of the three citation scripts, plus a brief note on the two writing-hygiene scripts. For each one you get: what it does, the exact command, real example input and output, and where it slots into the writing workflow (the skill's modes A–K).

Before you start: cd into your skill repo so the scripts/ paths below resolve. The Python scripts run as-is; the shell scripts may need execute permission once (see First-time setup).


The toolchain at a glance

| Script | Job | Runs offline? | Workflow home |
| --- | --- | --- | --- |
| citation-consistency.py | Find inconsistent citation formatting across a draft | ✅ Yes | Mode B / pre-submission |
| citation-format-convert.py | Convert a BibTeX library into Chicago / MLA / APA 7 / GB/T 7714 | ✅ Yes | Before Mode K / multi-journal switching |
| citation-verify.py | Check each inline citation against Crossref (catch AI hallucinations) | ❌ Needs network | After Mode B, before Mode G |
| ai-trace-scan.sh | Flag AI clichés & overused connectors | ✅ Yes | Mode F / before Mode B |
| pending-checks.sh | Round up every unfinished marker in a project | ✅ Yes | Start of session / pre-submission |

1. citation-consistency.py — is your formatting uniform?

What it does

This script reads a whole draft and asks one question: are you citing the same way everywhere? It is not a style checker, and it does not know or care whether you are following APA or Chicago. It only catches drift: the half-width bracket you used in chapter 1 versus the full-width bracket in chapter 3, the & here and the 和 there.

It looks for five kinds of inconsistency:

  1. Mixed bracket types: half-width ( ) vs. full-width ( )
  2. Mixed commas inside citations: half-width , vs. full-width ,
  3. Inconsistent multi-author connectors: & / and / Chinese connectors such as 和
  4. Inconsistent name forms for the same source (a Chinese translated name in one place, the original surname in another)
  5. Inconsistent page-number formats: p. X / p.X / 第 X 页

How to run it

python3 scripts/citation-consistency.py path/to/paper/main.md

It takes a single Markdown file. Point it at your assembled manuscript, or at one chapter at a time.

Example

Say draft.md mixes a few conventions:

观点 A(Foucault,1975)与观点 B (Scott, 1986) 并列。
中文引用(张三和李四,2020,第 33 页)。
另见 (王五 & Smith, 2019, p.12)。

Run it:

python3 scripts/citation-consistency.py draft.md

Output:

=== 引用格式一致性扫描 / Citation-format consistency scan · draft.md ===

引用统计 / Citation counts:英文形式 / English form (Author, year) 1 / 中文形式 / Chinese form (作者,年份) 2

⚠ 括号类型混用 / Mixed parenthesis types:少数派占比 / minority share 50.0%(建议全文统一为一种 / recommend unifying to one form throughout)

⚠ 多作者连接词不统一 / Inconsistent multi-author connectors:{'&': 1, '和': 1}
  建议根据所选引用格式(APA 用 & / GB/T 7714 用 , 等)统一全文
  Recommend unifying throughout per the chosen citation style (APA uses & / GB/T 7714 uses , etc.)

⚠ 页码格式不统一 / Inconsistent page-number formats:{'p.X (无空格 / no space)': 1, '第 X 页': 1}

共发现约 3 处需要审查的格式问题 / About 3 format issue(s) need review

A clean file instead reports ✅ 未发现明显的格式不一致 / No obvious format inconsistencies found. When a bracket/comma mismatch sits on a single line, the report points to it precisely, e.g. L3: 全角括号 + 半角逗号 → (张三, 2020), that is, a full-width bracket paired with a half-width comma.

Where it fits

  • A local consistency check after finishing a chapter.
  • A whole-text uniformity audit before submission (Mode B and the final pass).
  • A regression check after introducing new sources — new references are exactly where drift creeps in.

Know the boundary

This script checks consistency, not conformance. It will happily tell you that your citations are uniformly wrong. To check whether the style itself is correct (APA / Chicago / GB/T 7714), work through _writing-config/引用格式速查.md by hand. And because the scan is heuristic regex, expect the occasional false positive — the results want a human eye.


2. citation-format-convert.py — one library, four styles (new in v4.0)

What it does

You keep a BibTeX library; the target journal wants a specific reference style. This script converts your .bib into a clean reference list in one of four formats the humanities actually use:

  • Chicago Author-Date — the workhorse of history and the humanities
  • MLA 9 — literature and linguistics
  • APA 7 — education, psychology, and parts of the social sciences
  • GB/T 7714 (numeric sequential system / 顺序编码制), the Chinese national standard for journals, complete with [M] / [J] / [C] / [D] type markers and the "et al." (等) truncation for 4+ authors

Supported BibTeX entry types: @book, @article, @incollection, @inbook, @inproceedings, @thesis, @phdthesis.

How to run it

# Print to your terminal
python3 scripts/citation-format-convert.py refs.bib --to chicago

# Write to a file
python3 scripts/citation-format-convert.py refs.bib --to apa --out refs-apa.txt

# Choose a sort order: author (default), year, key, or input
python3 scripts/citation-format-convert.py refs.bib --to mla --sort year

--to is required (chicago | mla | apa | gb7714). --out is optional — omit it and the list goes to stdout. Malformed entries are reported on stderr, never silently dropped.

Example

Given refs.bib:

@book{foucault1975,
  author = {Michel Foucault},
  title  = {Surveiller et punir},
  year   = {1975},
  address = {Paris},
  publisher = {Gallimard}
}

@article{scott1986,
  author = {Joan W. Scott},
  title  = {Gender: A Useful Category of Historical Analysis},
  journal = {The American Historical Review},
  volume = {91},
  number = {5},
  pages  = {1053-1075},
  year   = {1986}
}

Chicago Author-Date:

python3 scripts/citation-format-convert.py refs.bib --to chicago

Foucault, Michel. 1975. *Surveiller et punir*. Paris: Gallimard.

Scott, Joan W.. 1986. "Gender: A Useful Category of Historical Analysis." *The American Historical Review* 91, no. 5: 1053-1075.

APA 7:

Foucault, M. (1975). *Surveiller et punir*. Gallimard.

Scott, J. W. (1986). Gender: A Useful Category of Historical Analysis. *The American Historical Review*, *91*(5), 1053-1075.

GB/T 7714 (numeric sequential system):

Foucault M. Surveiller et punir[M]. Paris: Gallimard, 1975.

Scott JW. Gender: A Useful Category of Historical Analysis[J]. The American Historical Review, 1986,91(5):1053-1075.

Notice the GB/T 7714 conventions: the [M] / [J] document-type markers, no comma between surname and initials, and the compact year,volume(issue):pages locator.
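To make those conventions concrete, here is a small sketch that rebuilds the GB/T 7714 article line from a parsed entry. It is an illustration under stated assumptions rather than the converter's real code: the function names are hypothetical, the entry is assumed to be an already-parsed dict, and author names are assumed to be in "Given [Middle] Family" order with a single-word family name.

```python
def gbt_author(name: str) -> str:
    """'Joan W. Scott' -> 'Scott JW' (family name, then initials, no comma).

    Assumes 'Given [Middle] Family' order and a one-word family name.
    """
    parts = name.replace(".", " ").split()
    family, given = parts[-1], parts[:-1]
    initials = "".join(p[0].upper() for p in given)
    return f"{family} {initials}".strip()


def gbt_authors(names: list[str]) -> str:
    """List up to three authors; with four or more, keep three and add 'et al.'."""
    formatted = [gbt_author(n) for n in names[:3]]
    if len(names) > 3:
        formatted.append("et al.")
    return ", ".join(formatted)


def gbt_article(entry: dict) -> str:
    """Journal article: Author. Title[J]. Journal, year,volume(issue):pages."""
    return (
        f"{gbt_author(entry['author'])}. {entry['title']}[J]. "
        f"{entry['journal']}, {entry['year']},"
        f"{entry['volume']}({entry['number']}):{entry['pages']}."
    )


scott = {
    "author": "Joan W. Scott",
    "title": "Gender: A Useful Category of Historical Analysis",
    "journal": "The American Historical Review",
    "volume": "91",
    "number": "5",
    "pages": "1053-1075",
    "year": "1986",
}
print(gbt_article(scott))
```

Run against the scott1986 entry, the sketch prints the same line as the GB/T 7714 example above. Real name handling (compound surnames, "Family, Given" order, Chinese names) needs more care than this.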

Where it fits

  • Preparing the final reference list before submission, when the target journal has a specific format.
  • Switching the same paper between journals — regenerate the whole list in seconds.
  • Right before the Mode K (AI-use disclosure) output, so your bibliography is already in house style.

Know the boundary

This is not a replacement for BibLaTeX / CSL. Those engines model every journal's idiosyncratic variant; if your toolchain can use BibLaTeX, use it. This script serves the in-flight case — you have a .bib on hand and need a list for this journal right now. Two more honest limits:

  • Every style hides a thicket of subtle rules and journal-specific variants. Always diff the output against the target journal's style guide, and treat it as a draft, not finished copy.
  • It formats the reference list only — it does not touch the inline citations inside your prose, which require understanding document structure.

3. citation-verify.py — did the AI make this up? (new in v4.0)

What it does

This is the hallucination catcher. It scans every inline citation in a Markdown draft and checks each one for existence against the public Crossref API. Its sweet spot is fabricated journal-article citations — the plausible-looking (Author, Year) an LLM invents from "memory."

It parses citations across styles, including Chicago/APA author-year (Foucault, 1975, p. 23), narrative Foucault (1975), and Chinese (福柯, 1975), and sorts each into one of three verdicts:

  • ✓ FOUND: Crossref has a high-confidence match (similarity ≥ 0.85). Usually trustworthy.
  • ⚠ FUZZY_MATCH: a near-but-imperfect match (similarity from 0.5 up to, but not including, 0.85). Could be a misspelling, a wrong year, or a different author of the same name. Review.
  • ✗ NOT_FOUND: no Crossref match. Be alert, but not alarmed (see the boundary below).
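The three-way split can be pictured as a simple threshold function. The sketch below is an assumption-laden illustration: the page does not say how the script computes similarity, so `difflib.SequenceMatcher` stands in for it, and treating every score below 0.5 as NOT_FOUND is likewise a guess. Only the 0.85 and 0.5 cut-offs come from the list above.

```python
from difflib import SequenceMatcher

FOUND_MIN = 0.85  # from the verdict list above
FUZZY_MIN = 0.5   # from the verdict list above


def similarity(cited: str, candidate: str) -> float:
    """Stand-in similarity score in [0, 1]; the real script may use another measure."""
    return SequenceMatcher(None, cited.lower(), candidate.lower()).ratio()


def verdict(score: float) -> str:
    """Map a similarity score onto the three verdicts."""
    if score >= FOUND_MIN:
        return "FOUND"
    if score >= FUZZY_MIN:
        return "FUZZY_MATCH"
    # Assumption: anything weaker is reported as no match at all.
    return "NOT_FOUND"
```

Keeping the cut-offs as named constants makes the review boundary explicit: a score of exactly 0.85 counts as FOUND, and anything just below it drops into the review pile.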

How to run it

# Human-readable report
python3 scripts/citation-verify.py path/to/draft.md

# Quiet + JSON, for CI or programmatic use
python3 scripts/citation-verify.py path/to/draft.md --quiet --json

Because it makes live network calls, it is politely rate-limited to one request per second — a 30-citation chapter takes about half a minute. By default it prints progress to stderr; --quiet suppresses that, and --json swaps the human report for machine-readable output.
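The one-request-per-second pacing is easy to reason about with a small sketch. This is not the script's implementation: `throttled` and its parameters are hypothetical names, and the lookup is passed in as a plain function so the example runs without touching the network.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def throttled(
    items: Iterable[T],
    lookup: Callable[[T], R],
    min_interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[R]:
    """Call lookup(item) for each item, starting calls at least min_interval apart."""
    results: List[R] = []
    last_start = None
    for item in items:
        if last_start is not None:
            # Wait out whatever remains of the interval since the previous call.
            remaining = min_interval - (clock() - last_start)
            if remaining > 0:
                sleep(remaining)
        last_start = clock()
        results.append(lookup(item))
    return results
```

With 30 citations there are 29 waits of up to one second each, which is where the "about half a minute" estimate comes from. Injecting `sleep` and `clock` is a design choice made here so the pacing can be tested without actually waiting.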

Example

Given a draft with one real article and one invented one:

Scott (1986) reframed gender as a category of historical analysis.
A later study (Hallor, 2018) supposedly extended this to digital archives.

Run it:

python3 scripts/citation-verify.py draft.md

Report (abridged):

=== Citation verification (Crossref) ===
Total citations parsed: 2
  ✓ Found:        1
  ⚠ Fuzzy match:  0  ← review
  ✗ Not found:    1  ← review (or may be off-Crossref humanities work)

## ✗ NOT FOUND in Crossref (1)

  (Hallor, 2018)
    No Crossref match for (Hallor, 2018). This may be a humanities work outside
    Crossref coverage (monograph, archival, classics, dissertation,
    foreign-language), or it may not exist. Manually verify.

## ✓ FOUND (1)

  (Scott, 1986) → Gender: A Useful Category of Historical Analysis
    DOI: 10.2307/1864376

=== Reminders ===
  · NOT_FOUND is expected for monographs, archival sources, classics, dissertations,
    and non-English-language works. ...
  · This script catches the LLM-hallucination case (made-up journal articles).
    For monograph citations, use the [VERIFY] / [待核对] marker workflow.

The DOI shown is illustrative; the exact metadata comes back live from Crossref at run time.

Where it fits

  • After Mode B (chapter-level review), before Mode G (blind-reading check).
  • On any chapter the AI drafted (run it after the Mode C output).
  • As the final compliance check before submission.

Know the boundary

This is the most important caveat on the page: Crossref does not index everything. A huge swath of humanities scholarship — monographs from small university presses, untranslated foreign-language books, dissertations, archival sources, classical texts — simply is not in Crossref. For those, NOT_FOUND is the expected result and signals nothing wrong.

So read the verdicts in context. The script is excellent at flagging hallucinated journal articles (Crossref's strong suit) and near-useless at judging a citation to a 19th-century monograph. For monograph, archival, and classics citations, the right tool is the [VERIFY] / [待核对] marker protocol described in SKILL.md — a human goes to the shelf. Use the script and the marker protocol together, not one instead of the other.


Two writing-hygiene scripts (briefly)

The directory also holds two shell scripts that guard prose rather than citations.

ai-trace-scan.sh — cliché & connector scan

Scans a file or a whole project directory for the AI tells catalogued in references/ai-trace-checklist.md: high-frequency filler ("值得注意的是", "综上所述", "本文旨在", …) flagged on any occurrence, plus connectors ("此外", "同时", "另外", …) flagged only when they pile up past a threshold (>3 per file).

./scripts/ai-trace-scan.sh path/to/chapter.md     # one file
./scripts/ai-trace-scan.sh path/to/paper/         # whole project (recursive, *.md)

When: after revising a chapter in Mode F, before a Mode B review, and as a final pre-completion pass. It only flags suspects — whether a phrase actually needs changing is the author's call; some "boilerplate" is a deliberate choice in context.
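The two flagging rules differ only in their threshold, which a short Python sketch makes visible. The actual tool is a bash + grep script, so this is a re-expression of the rule, not its code. The phrase lists below contain only the examples quoted above (the full catalogue lives in references/ai-trace-checklist.md), and applying the ">3" limit per connector rather than to all connectors combined is an assumption.

```python
# Example phrases quoted on this page; the real lists are longer.
FILLERS = ("值得注意的是", "综上所述", "本文旨在")
CONNECTORS = ("此外", "同时", "另外")
CONNECTOR_LIMIT = 3  # flagged only when a connector appears more than this


def scan(text: str) -> dict[str, int]:
    """Return {phrase: count} for every phrase that should be flagged in one file."""
    flagged: dict[str, int] = {}
    for phrase in FILLERS:
        count = text.count(phrase)
        if count > 0:  # fillers: any occurrence is a suspect
            flagged[phrase] = count
    for phrase in CONNECTORS:
        count = text.count(phrase)
        if count > CONNECTOR_LIMIT:  # connectors: only when they pile up
            flagged[phrase] = count
    return flagged
```

A draft that uses 此外 three times passes untouched; a fourth use puts it on the list for the author to judge.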

pending-checks.sh — pending-marker roundup

Pulls every unfinished marker out of a project so nothing slips through to submission:

| Marker | Meaning | Priority |
| --- | --- | --- |
| [待核对] | AI cited from memory / unverified fact | 🔴 Must hit zero before submission |
| ❓ 待讨论 | A choice the author must decide | 🟡 Handle as work advances |
| [AI 草稿,待作者审阅] | AI-drafted, not yet reviewed | 🟢 Remove marker after review |
| >>> | A spot the AI was unsure about | 🔵 Handle right after drafting |
| [作者微调] | Author's tweak to an AI suggestion | 🟣 Write back to the style profile |

./scripts/pending-checks.sh path/to/paper/        # whole project
./scripts/pending-checks.sh path/to/chapter.md    # single file

When: at the start of every session (what's still open?), as the final checklist before submission, and as a status summary when you resume across conversations.
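For readers who want to see the roundup logic spelled out, here is a minimal Python sketch of the same idea. The real tool is a shell script; the function names here are hypothetical, and restricting the directory walk to *.md files is an assumption borrowed from how ai-trace-scan.sh is described.

```python
from collections import Counter
from pathlib import Path

# The five markers from the table above.
MARKERS = ["[待核对]", "❓ 待讨论", "[AI 草稿,待作者审阅]", ">>>", "[作者微调]"]


def count_markers_in_text(text: str) -> Counter:
    """Count each pending marker in one piece of text."""
    return Counter({marker: text.count(marker) for marker in MARKERS})


def count_markers(path: str) -> Counter:
    """Count markers in a single file, or in every *.md file under a directory."""
    root = Path(path)
    files = [root] if root.is_file() else sorted(root.rglob("*.md"))
    totals: Counter = Counter()
    for file in files:
        text = file.read_text(encoding="utf-8", errors="replace")
        totals.update(count_markers_in_text(text))
    return totals


def ready_to_submit(totals: Counter) -> bool:
    """The red-priority rule: no [待核对] marker may remain."""
    return totals["[待核对]"] == 0
```

Only the [待核对] count gates submission in this sketch, mirroring the priority column; the other markers are reported but left to the author's schedule.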


A note on language and detection patterns

You'll have noticed the bilingual output — every report prints Chinese / English side by side. That is deliberate: the companion is a bilingual project, and you can use it comfortably in either language.

You'll also notice the detection patterns are Chinese on purpose. The cliché list, the pending markers, and several citation patterns are tuned for Chinese academic prose, because that's the gap these scripts were built to close. The citation scripts still handle English citations and English BibTeX perfectly well — but the AI-cliché scanner, in particular, is sharpest on Chinese writing. This is a feature, not an oversight.


First-time setup

The Python scripts need nothing — they run on the standard library. The shell scripts may need execute permission once:

chmod +x scripts/ai-trace-scan.sh scripts/pending-checks.sh

After that, every script on this page is a single command away.


How these map to the writing modes

A quick mental model for when to reach for what:

  • Mode B (chapter review): run ai-trace-scan.sh, then citation-consistency.py, then citation-verify.py.
  • Mode C (AI drafting): any AI-drafted chapter gets a citation-verify.py pass and a pending-checks.sh sweep.
  • Mode F (draft revision): ai-trace-scan.sh after each chapter.
  • Before Mode K (AI-use disclosure) / multi-journal submission: citation-format-convert.py to regenerate the reference list in house style.
  • Pre-submission: the full battery — consistency, verification, pending markers all cleared, clichés reviewed.
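The pre-submission battery can be strung together in a small wrapper. The sketch below only reuses the commands shown on this page; the wrapper itself, its function names, and the decision to keep going after a failure are assumptions, and since the scripts' exit codes are not documented here the wrapper does not interpret them.

```python
import subprocess


def presubmission_commands(draft: str, project_dir: str) -> list[list[str]]:
    """The four checks named for the pre-submission pass, in one suggested order."""
    return [
        ["python3", "scripts/citation-consistency.py", draft],
        ["python3", "scripts/citation-verify.py", draft],  # needs network
        ["./scripts/pending-checks.sh", project_dir],
        ["./scripts/ai-trace-scan.sh", project_dir],
    ]


def run_all(draft: str, project_dir: str) -> None:
    """Run each check in turn from the skill repo root, printing what is run."""
    for cmd in presubmission_commands(draft, project_dir):
        print("$", " ".join(cmd))
        try:
            # check=False: keep going so every report is produced.
            subprocess.run(cmd, check=False)
        except OSError as exc:
            # e.g. script missing or not yet made executable
            print(f"skipped: {exc}")
```

Calling `run_all("path/to/paper/main.md", "path/to/paper/")` from the repo root would print the four reports back to back; reading them is still the author's job.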

The honest throughline, straight from the skill's design principles: these scripts state plainly what they do and what they do not, precisely so you never get the false comfort that "everything checked off" means "no problems left." They flag; you decide.

