Prompt Engineering

This repository contains a collection of Python scripts and tools designed for various tasks related to text extraction, categorisation, and prompt engineering. The main functionalities include a JSON database of research papers with Prompt Patterns (PPs) and Prompt Examples (PEs) extracted, extracting text from PDFs, categorising text using Cosine Similarity, and generating and testing prompts for AI models.

Installation

Clone the repository:

git clone https://github.com/yourusername/your-repo.git
cd your-repo

Create and activate a virtual environment:

python -m venv .venv
source .venv/bin/activate  # On Windows use `.venv\Scripts\activate`

Install the required dependencies:

pip install -r requirements.txt

Set up environment variables by creating a .env file in the root directory and adding the necessary keys:

AZURE_OPENAI_MODEL=<your-model>
API_VERSION=<your-api-version>
AZURE_OPENAI_KEY=<your-api-key>
AZURE_OPENAI_ENDPOINT=<your-endpoint>

Usage

Extract Text from PDF

To extract text from a PDF file, use the extractTextFromPDF.py script. Below are some examples:

python extractTextFromPDF.py -filename "Test.pdf"
python extractTextFromPDF.py -filename "Test.pdf" -pages 1-10
python extractTextFromPDF.py -filename "Test.pdf" -pages 1-10 -extractexamples True
python extractTextFromPDF.py -filename "Test.pdf" -pages 1-10 -summary True
python extractTextFromPDF.py -filename "Test.pdf" -pages 1-10 -keypoints True

Categorise Text Using Cosine Similarity

To categorise text using Cosine Similarity, use the categorisation_cosine_similarity.py script:

python categorisation_cosine_similarity.py --top_n 5
python categorisation_cosine_similarity.py --threshold 0.5

Generate and Test Prompt

To generate and test prompts, use the testPrompts.py script:

python testPrompts.py
python vision_testPrompts.py

Export PPs and PEs from the JSON File

To export and count the PPs and PEs from the promptpatterns.json JSON file, use the exportPromptPatternsJSONfile.py script. Below are some example usages:

Print the PPs and PEs to the console: This will print the PPs and PEs to the console in a formatted way.

python exportPromptPatternsJSONfile.py --format console

Write the PPs and PEs to an HTML file with the default filename promptpatterns.html:

This will write the PPs and PEs to an HTML file called promptpatterns.html in the same directory as the script.

python exportPromptPatternsJSONfile.py --format html

Write the PPs and PEs to an HTML file with a custom filename: This will write the PPs and PEs to an HTML file called mypromptpatterns.html in the same directory as the script.

python exportPromptPatternsJSONfile.py --format html --filename mypromptpatterns.html

Include the current date in the filename of the HTML file: This will write the PPs and PEs to an HTML file with a filename that includes the current date in the format promptpatterns_YYYYmmdd.html.

python exportPromptPatternsJSONfile.py --format html --filename promptpatterns_{date}.html

Count the number of Titles, PatternCategory, and pattern name:

This will count the number of Titles, PatternCategory, and pattern name and output it to the console.

python exportPromptPatternsJSONfile.py --count

Contributing

Contributions are welcome! Please open an issue or submit a pull request for any improvements, research paper additions or bug fixes.

For ethical / dual‑use concerns use the "Responsible Use Report" issue template. For AI-enriched metadata corrections use the "AI-Assisted Field Correction" template.

Security vulnerabilities should follow the coordinated disclosure process in SECURITY.md (private advisory or email) rather than a public issue.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Security Policy

See prompt-pattern-dictionary/SECURITY.md for supported versions and coordinated disclosure steps. Avoid including sensitive exploit payloads or personal data in reports.

Responsible Use Guidelines

A standalone page at /responsible-use in the web app and the Orientation "Accessibility & Responsible Use" section document:

Core principles: transparency, defensive focus, privacy, inclusivity.
Acceptable uses: research, defensive tooling, education, evaluation with non-sensitive data.
Prohibited uses: real exploit/malware generation, phishing deployment, guardrail bypass attempts.
Safeguards: provenance badges, planned caution indicators, issue templates, minimal telemetry (opt-in, excludes prompt content).

Report ethical concerns with the issue label responsible-use-review; corrections to AI-assisted fields with ai-assist-correction.

Prompt Pattern Dictionary (Web App)

The prompt-pattern-dictionary/ subfolder contains a Next.js application and a data pipeline to build a searchable dictionary of prompt patterns.

Key build notes:

Data pipeline script: prompt-pattern-dictionary/scripts/build-data.js
Python steps (embeddings, categorization, enrichment) auto-detect and prefer uv run when available. To force uv on Windows PowerShell:

$env:USE_UV = "1"
node .\prompt-pattern-dictionary\scripts\build-data.js --enrich --enrich-limit 10 --enrich-fields template

Enrichment flags:
- --enrich to enable optional enrichment via Azure OpenAI (GPT-5)
- --enrich-limit <n> to cap items processed
- --enrich-fields <csv> to scope fields: template,application,dependentLLM,turn
GPT-5 temperature behavior: The enrichment pipeline does not set temperature for GPT-5 (Azure requires default temperature). The client also retries without temperature if the service rejects the parameter.

Build and Run (Windows PowerShell)

Run these from the repository root unless noted.

Install dependencies for the web app:

cd .\prompt-pattern-dictionary
npm install

Build data (required before first run and whenever source JSON changes):

# Optional: prefer uv for any Python steps in the pipeline
$env:USE_UV = "1"
node .\scripts\build-data.js

Start in development mode:

npm run dev
# Open http://localhost:3000

Build for production and start the server:

npm run build
npm start
# Open http://localhost:3000

Notes:

The npm run build script runs the full pipeline: data transform, normalized schema, semantic categories, and next build.
Use npm run export if you want a static export (files in prompt-pattern-dictionary/out).

Known issue: OneDrive/OneNote locking `.next` folder

If the repo is inside a OneDrive-synced directory (including OneNote notebooks), the .next build folder may be locked or partially synced, causing build or dev server errors (e.g., EBUSY/EPERM on Windows).

Workarounds:

Exclude the project (or at least the .next folder) from OneDrive sync.
Move the project outside OneDrive-synced paths (recommended for Next.js development).
If a lock occurs, close OneNote/OneDrive temporarily, delete .next, and re-run npm run dev or npm run build.

Orientation Architecture

The Orientation content (how to use the dictionary) was refactored from a single long page into a hybrid, multi-page structure:

Hub: /orientation – overview cards linking to each section plus links to “All Sections” and the Cheat Sheet.
Per-section routes: /orientation/{slug} – focused pages (quick-start, what-is-a-pattern, pattern-anatomy, lifecycle, choosing-patterns, combining-patterns, adaptation, anti-patterns, quality-evaluation, accessibility-responsible-use, glossary, faq, feedback, next-steps).
Consolidated legacy view: /orientation/all – full scrollable content (retains original anchors for deep link continuity).
Printable / rapid reference: /orientation/cheatsheet – condensed key constructs and workflows.

Sections are metadata-driven via ORIENTATION_SECTIONS (number, slug, title, component). Navigation components (sidebar + inline chip set) render from this single source of truth; the pager component wires previous/next traversal.

Readability & Theming Controls

User-adjustable preferences enhance accessibility and reading comfort:

Font scale: data attribute data-font-scale applied to <html> with supported values -1, 0, 1, 2 (base, + steps). CSS scales body text and headings accordingly.
Width mode: data-width-mode = default | relaxed; relaxed widens prose up to ~85ch for users needing fewer line wraps.
Theme / contrast: data-theme = light | dark | high-contrast with a system option that removes the attribute and defers to prefers-color-scheme media queries. (The legacy value hc is auto-migrated if found in saved preferences.) See prompt-pattern-dictionary/docs/THEMING.md for full token architecture.
Persistence: Stored under localStorage key orientation:readability:v1; hydration script replays settings and applies attributes with minimal layout shift.
UI: ReadabilityControls component (toolbar) with buttons for font scaling, width toggle, and a select for theme mode. Appears in orientation layout (sidebar desktop + inline mobile). Can be reused site‑wide later.

Legacy Anchor Redirect Behavior

A lightweight client component (LegacyHashRedirect) preserves backward compatibility for old single-page anchors:

On mount, it inspects window.location.hash.
If the hash matches a known section slug and you are not already on that route, it router.replace() to /orientation/{slug}.
If the hash does not match a known slug and you are not on /orientation/all, it redirects to /orientation/all#hash, ensuring deep links to sub‑headings still land meaningfully.
If already on /orientation/all, no action is taken.

This maintains existing external links and bookmarks without server‑side redirects. A future enhancement may introduce explicit server (or Next.js middleware) 301 mappings for improved SEO signals—tracked as a backlog item.

Extending the Preference System

To add a new user preference:

Extend the usePreferences hook (under prompt-pattern-dictionary/src/app/orientation/hooks/) with state, localStorage serialization, and dataset syncing.
Choose a descriptive data-* attribute name; keep attribute count minimal (prefer reusing existing tokens vs. additive bespoke classes).
Update the ReadabilityControls UI (or create a new modular control) with accessible semantics (button labels, aria-pressed, or form elements as appropriate).
Document the accepted values and rationale in this README (and optionally in a dedicated docs/ACCESSIBILITY.md or docs/PREFERENCES.md).
Add non-destructive CSS tied to the attribute in globals.css guarded by clear comments.

Guardrails:

Avoid preferences that require layout reflow more than once per interaction.
Provide reversible changes (toggle or reset option) if adding multi-step controls.
Maintain WCAG contrast compliance across all themes and states.

Name		Name	Last commit message	Last commit date
Latest commit History 593 Commits
.github		.github
analysis_results		analysis_results
custom_output		custom_output
graphvizFiles		graphvizFiles
images		images
latex_writing		latex_writing
prompt-pattern-dictionary		prompt-pattern-dictionary
scripts		scripts
skills		skills
tests		tests
.gitignore		.gitignore
2402_07927.md		2402_07927.md
FOLDER_STRUCTURE.md		FOLDER_STRUCTURE.md
PRD_PromptPatternDictionary.md		PRD_PromptPatternDictionary.md
README.md		README.md
ScratchPad.md		ScratchPad.md
TEMPLATE_LOGIC_UPDATES.md		TEMPLATE_LOGIC_UPDATES.md
TestID0.txt		TestID0.txt
agent0.py		agent0.py
agent0prompt.json		agent0prompt.json
agent1.py		agent1.py
autogen02_research_assistant.py		autogen02_research_assistant.py
autogen02_two_agents.py		autogen02_two_agents.py
autogen04_deepresearchagents.py		autogen04_deepresearchagents.py
autogen04_writingassistant copy.py		autogen04_writingassistant copy.py
autogen04_writingassistant.py		autogen04_writingassistant.py
azure_gpt_task.py		azure_gpt_task.py
azure_models.py		azure_models.py
categorisation.py		categorisation.py
categorisation_cosine_similarity.py		categorisation_cosine_similarity.py
categorisation_from_promptpatterns.py		categorisation_from_promptpatterns.py
categorisation_logic_app_cat_pat.py		categorisation_logic_app_cat_pat.py
categorisation_logic_cat_app_pat.py		categorisation_logic_cat_app_pat.py
categorisation_write_up.py		categorisation_write_up.py
category_definitions.py		category_definitions.py
check_content_types.py		check_content_types.py
createMindMap.py		createMindMap.py
debug.log		debug.log
dual_categorization_insights_analysis.json		dual_categorization_insights_analysis.json
enhanced_patterns_20250730_110545.json		enhanced_patterns_20250730_110545.json
exportPromptPatternsJSONfile.py		exportPromptPatternsJSONfile.py
extractTextFromPDF.py		extractTextFromPDF.py
format_files.py		format_files.py
modify_logic.py		modify_logic.py
pe_strategies_techniques.md		pe_strategies_techniques.md
peil_prompt_generator.py		peil_prompt_generator.py
promptpatterns.json		promptpatterns.json
promptpatternsschema.json		promptpatternsschema.json
recategorization_report_20250730_110545.json		recategorization_report_20250730_110545.json
requirements.txt		requirements.txt
requirements_updated.txt		requirements_updated.txt
simple_categorisation_writeup.py		simple_categorisation_writeup.py
test-phase2-integration.py		test-phase2-integration.py
testPrompts.py		testPrompts.py
update_venv.py		update_venv.py
vision_testPrompts.py		vision_testPrompts.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Prompt Engineering

Table of Contents

Installation

Usage

Extract Text from PDF

Categorise Text Using Cosine Similarity

Generate and Test Prompt

Export PPs and PEs from the JSON File

Contributing

License

Security Policy

Responsible Use Guidelines

Prompt Pattern Dictionary (Web App)

Build and Run (Windows PowerShell)

Known issue: OneDrive/OneNote locking `.next` folder

Orientation Architecture

Readability & Theming Controls

Legacy Anchor Redirect Behavior

Extending the Preference System

About

Uh oh!

Releases

Packages

Languages

timhaintz/PromptEngineering

Folders and files

Latest commit

History

Repository files navigation

Prompt Engineering

Table of Contents

Installation

Usage

Extract Text from PDF

Categorise Text Using Cosine Similarity

Generate and Test Prompt

Export PPs and PEs from the JSON File

Contributing

License

Security Policy

Responsible Use Guidelines

Prompt Pattern Dictionary (Web App)

Build and Run (Windows PowerShell)

Known issue: OneDrive/OneNote locking .next folder

Orientation Architecture

Readability & Theming Controls

Legacy Anchor Redirect Behavior

Extending the Preference System

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Known issue: OneDrive/OneNote locking `.next` folder

Packages