# LangChain Utilities
Do you find yourself frequently copy-pasting text from the web, PDFs, or other documents into ChatGPT?
If so, these tools are for you!
The generated prompts are optimized for manual pasting into a chat interface (like ChatGPT), either in one go or in multiple parts to get around context length limits.
Basically, the generated prompts look like this:
```python
REPLY_OK_IF_YOU_READ_TEMPLATE = '''
Below is {what}, reply "OK" if you read:
"""
{content}
"""
'''.strip()
```
You can feed the output directly to a chat interface like ChatGPT and ask follow-up questions about the content. See prompts.py for other variations.
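As a minimal sketch of what the tools do with this template (the values below are illustrative, not this project's exact code), filling in `{what}` and `{content}` yields a ready-to-paste prompt:

```python
# Illustrative only: format the template with a description and the document text.
prompt = REPLY_OK_IF_YOU_READ_TEMPLATE.format(
    what="the content of a PDF file",  # describes what the pasted text is
    content="...the extracted document text...",
)
print(prompt)  # paste into ChatGPT, then ask follow-up questions
```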
## Demos

- Load https://github.com/tddschn/langchain-utils and copy the generated prompt to the clipboard:

  CleanShot-2023-04-13T18-15-32.mp4
- Load 3 pages of a PDF file, open each part for inspection before copying, and optionally merge the 3 pages into 2 prompts that stay under `gpt-3.5-turbo`'s context length limit using langchain's `TokenTextSplitter` (see the sketch after these demos):

  CleanShot-2023-04-13T18-25-30.mp4
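For reference, here is a minimal sketch of the token-based splitting the demo relies on. It calls langchain's `TokenTextSplitter` directly; the chunk size and the `long_text` placeholder are illustrative, not this project's exact code:

```python
from langchain.text_splitter import TokenTextSplitter

long_text = "...the extracted document text..."  # placeholder input

# Split into chunks measured in gpt-3.5-turbo tokens; the sizes are illustrative.
splitter = TokenTextSplitter(model_name="gpt-3.5-turbo", chunk_size=2048, chunk_overlap=0)
for i, chunk in enumerate(splitter.split_text(long_text), 1):
    print(f"--- part {i} ---")
    print(chunk)
```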
## Usage

```console
$ pandocprompt --help
usage: pandocprompt [-h] [-V] [-c] [-e] [-m model] [-S] [-s chunk_size]
[-P PARTS [PARTS ...]] [-r] [-R]
[--print-percentage-non-ascii] [-n] [--out OUT] [-C]
[-w WHAT] [-M] [--from PANDOC_FROM_FORMAT]
[--to PANDOC_TO_FORMAT]
[PATH ...]
Get prompts from arbitrary files. You need to have `pandoc` installed and in
$PATH, it will be used to convert source files to desired (hopefully textual)
format. Common use cases: Getting prompts from EPub books or several TeX
files.
positional arguments:
PATH Paths to the text files, or stdin if not provided
(default: None)
options:
-h, --help show this help message and exit
-V, --version show program's version number and exit
-c, --copy Copy the prompt to clipboard (default: False)
-e, --edit Edit the prompt and copy manually (default: False)
-m model, --model model
Model to use. This only affects the chunk size. Use -S
to disable splitting (infinite chunk size). (default:
gpt-4-32k)
-S, --no-split Do not split the prompt into multiple parts (use this
if the model has a really large context size)
(default: False)
-s chunk_size, --chunk-size chunk_size
Chunk size when splitting transcript, also used to
determine whether to split, defaults to 1/2 of the
context length limit of the model (default: None)
-P PARTS [PARTS ...], --parts PARTS [PARTS ...]
Parts to select in the processes list of Documents
(default: None)
-r, --raw Wraps the content in triple quotes with no extra text
(default: False)
-R, --raw-no-quotes Output the content only (default: False)
--print-percentage-non-ascii
Print percentage of non-ascii characters (default:
False)
-n, --dry-run Dry run (default: False)
--out OUT Output file (default: None)
-C, --from-clipboard Load text from clipboard (default: False)
-w WHAT, --what WHAT Initial knowledge you want to insert before the PDF
content in the prompt (default: the content of a
document)
-M, --merge Merge contents of all pages before processing
(default: False)
--from PANDOC_FROM_FORMAT
The format that is passed to -f in pandoc (default:
None)
--to PANDOC_TO_FORMAT
The format that is passed to -t in pandoc. gfm-
raw_html means GitHub Flavored Markdown with raw HTML
stripped. (default: gfm-raw_html)
```
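For example (the file name is hypothetical; the flags are documented above), to turn an EPub book into clipboard-ready prompts:

```console
$ pandocprompt book.epub -c
```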
```console
$ urlprompt --help
usage: urlprompt [-h] [-V] [-c] [-e] [-m model] [-S] [-s chunk_size]
[-P PARTS [PARTS ...]] [-r] [-R]
[--print-percentage-non-ascii] [-n] [--out OUT] [-w WHAT]
[-M] [-j] [-g] [--github-path GITHUB_PATH]
[--github-revision GITHUB_REVISION] [--substack]
URL
Get a prompt consisting the text content of a webpage
positional arguments:
URL URL to the webpage
options:
-h, --help show this help message and exit
-V, --version show program's version number and exit
-c, --copy Copy the prompt to clipboard (default: False)
-e, --edit Edit the prompt and copy manually (default: False)
-m model, --model model
Model to use. This only affects the chunk size. Use -S
to disable splitting (infinite chunk size). (default:
gpt-4-32k)
-S, --no-split Do not split the prompt into multiple parts (use this
if the model has a really large context size)
(default: False)
-s chunk_size, --chunk-size chunk_size
Chunk size when splitting transcript, also used to
determine whether to split, defaults to 1/2 of the
context length limit of the model (default: None)
-P PARTS [PARTS ...], --parts PARTS [PARTS ...]
Parts to select in the processes list of Documents
(default: None)
-r, --raw Wraps the content in triple quotes with no extra text
(default: False)
-R, --raw-no-quotes Output the content only (default: False)
--print-percentage-non-ascii
Print percentage of non-ascii characters (default:
False)
-n, --dry-run Dry run (default: False)
--out OUT Output file (default: None)
-w WHAT, --what WHAT Initial knowledge you want to insert before the PDF
content in the prompt (default: the content of a
webpage)
-M, --merge Merge contents of all pages before processing
(default: False)
-j, --javascript Use JavaScript to render the page (default: False)
-g, --github Load the raw file from a GitHub URL (default: False)
--github-path GITHUB_PATH
Path to the GitHub file (default: README.md)
--github-revision GITHUB_REVISION
Revision for the GitHub file (default: master)
--substack Load from a Substack URL and convert it to Markdown
(default: False)
```
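For example, to load this repo's README as a raw file from GitHub and copy the prompt to the clipboard:

```console
$ urlprompt https://github.com/tddschn/langchain-utils -g -c
```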
```console
$ pdfprompt --help
usage: pdfprompt [-h] [-V] [-c] [-e] [-m model] [-S] [-s chunk_size]
[-P PARTS [PARTS ...]] [-r] [-R]
[--print-percentage-non-ascii] [-n] [--out OUT]
[-p PAGES [PAGES ...]] [-l PAGE_SLICE] [-M] [-w WHAT] [-o]
[-O] [-L OCR_LANGUAGE]
PDF Path
Get a prompt consisting the text content of a PDF file
positional arguments:
PDF Path Path to the PDF file
options:
-h, --help show this help message and exit
-V, --version show program's version number and exit
-c, --copy Copy the prompt to clipboard (default: False)
-e, --edit Edit the prompt and copy manually (default: False)
-m model, --model model
Model to use. This only affects the chunk size. Use -S
to disable splitting (infinite chunk size). (default:
gpt-4-32k)
-S, --no-split Do not split the prompt into multiple parts (use this
if the model has a really large context size)
(default: False)
-s chunk_size, --chunk-size chunk_size
Chunk size when splitting transcript, also used to
determine whether to split, defaults to 1/2 of the
context length limit of the model (default: None)
-P PARTS [PARTS ...], --parts PARTS [PARTS ...]
Parts to select in the processes list of Documents
(default: None)
-r, --raw Wraps the content in triple quotes with no extra text
(default: False)
-R, --raw-no-quotes Output the content only (default: False)
--print-percentage-non-ascii
Print percentage of non-ascii characters (default:
False)
-n, --dry-run Dry run (default: False)
--out OUT Output file (default: None)
-p PAGES [PAGES ...], --pages PAGES [PAGES ...]
Only include specified page numbers (default: None)
-l PAGE_SLICE, --page-slice PAGE_SLICE
Use Python slice syntax to select page numbers (e.g.
1:3, 1:10:2, etc.) (default: None)
-M, --merge Merge contents of all pages before processing
(default: False)
-w WHAT, --what WHAT Initial knowledge you want to insert before the PDF
content in the prompt (default: the content of a PDF
file)
-o, --fallback-ocr Use OCR as fallback if no text detected on page,
please set TESSDATA_PREFIX environment variable to the
path of your tesseract data directory (default: False)
-O, --force-ocr Force OCR on all pages (default: False)
-L OCR_LANGUAGE, --ocr-language OCR_LANGUAGE
Language to use for Tesseract OCR (like eng, chi_sim,
chi_tra, chi_tra_vert etc.)) (default: eng)
```
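For example (hypothetical file name), to keep only pages 1-3 of a PDF, merge them, and copy the prompt:

```console
$ pdfprompt paper.pdf -p 1 2 3 -M -c
```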
```console
$ ytprompt --help
usage: ytprompt [-h] [-V] [-c] [-e] [-m model] [-S] [-s chunk_size]
[-P PARTS [PARTS ...]] [-r] [-R]
[--print-percentage-non-ascii] [-n] [--out OUT]
YouTube URL
Get a prompt consisting Title and Transcript of a YouTube Video
positional arguments:
YouTube URL YouTube URL
options:
-h, --help show this help message and exit
-V, --version show program's version number and exit
-c, --copy Copy the prompt to clipboard (default: False)
-e, --edit Edit the prompt and copy manually (default: False)
-m model, --model model
Model to use. This only affects the chunk size. Use -S
to disable splitting (infinite chunk size). (default:
gpt-4-32k)
-S, --no-split Do not split the prompt into multiple parts (use this
if the model has a really large context size)
(default: False)
-s chunk_size, --chunk-size chunk_size
Chunk size when splitting transcript, also used to
determine whether to split, defaults to 1/2 of the
context length limit of the model (default: None)
-P PARTS [PARTS ...], --parts PARTS [PARTS ...]
Parts to select in the processes list of Documents
(default: None)
-r, --raw Wraps the content in triple quotes with no extra text
(default: False)
-R, --raw-no-quotes Output the content only (default: False)
--print-percentage-non-ascii
Print percentage of non-ascii characters (default:
False)
-n, --dry-run Dry run (default: False)
--out OUT Output file (default: None)
```
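For example (placeholder URL), to copy a prompt built from a video's title and transcript:

```console
$ ytprompt 'https://www.youtube.com/watch?v=<VIDEO_ID>' -c
```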
```console
$ textprompt --help
usage: textprompt [-h] [-V] [-c] [-e] [-m model] [-S] [-s chunk_size]
[-P PARTS [PARTS ...]] [-r] [-R]
[--print-percentage-non-ascii] [-n] [--out OUT] [-C]
[-w WHAT] [-M]
[PATH ...]
Get a prompt from text files
positional arguments:
PATH Paths to the text files, or stdin if not provided
(default: None)
options:
-h, --help show this help message and exit
-V, --version show program's version number and exit
-c, --copy Copy the prompt to clipboard (default: False)
-e, --edit Edit the prompt and copy manually (default: False)
-m model, --model model
Model to use. This only affects the chunk size. Use -S
to disable splitting (infinite chunk size). (default:
gpt-4-32k)
-S, --no-split Do not split the prompt into multiple parts (use this
if the model has a really large context size)
(default: False)
-s chunk_size, --chunk-size chunk_size
Chunk size when splitting transcript, also used to
determine whether to split, defaults to 1/2 of the
context length limit of the model (default: None)
-P PARTS [PARTS ...], --parts PARTS [PARTS ...]
Parts to select in the processes list of Documents
(default: None)
-r, --raw Wraps the content in triple quotes with no extra text
(default: False)
-R, --raw-no-quotes Output the content only (default: False)
--print-percentage-non-ascii
Print percentage of non-ascii characters (default:
False)
-n, --dry-run Dry run (default: False)
--out OUT Output file (default: None)
-C, --from-clipboard Load text from clipboard (default: False)
-w WHAT, --what WHAT Initial knowledge you want to insert before the PDF
content in the prompt (default: the content of a
document)
-M, --merge Merge contents of all pages before processing
(default: False)
```
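For example, to build a prompt from the text currently on the clipboard and edit it before copying manually:

```console
$ textprompt -C -e
```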
```console
$ htmlprompt --help
usage: htmlprompt [-h] [-V] [-c] [-e] [-m model] [-S] [-s chunk_size]
[-P PARTS [PARTS ...]] [-r] [-R]
[--print-percentage-non-ascii] [-n] [--out OUT] [-C]
[-w WHAT] [-M]
[PATH ...]
Get a prompt from html files
positional arguments:
PATH Paths to the html files, or stdin if not provided
(default: None)
options:
-h, --help show this help message and exit
-V, --version show program's version number and exit
-c, --copy Copy the prompt to clipboard (default: False)
-e, --edit Edit the prompt and copy manually (default: False)
-m model, --model model
Model to use. This only affects the chunk size. Use -S
to disable splitting (infinite chunk size). (default:
gpt-4-32k)
-S, --no-split Do not split the prompt into multiple parts (use this
if the model has a really large context size)
(default: False)
-s chunk_size, --chunk-size chunk_size
Chunk size when splitting transcript, also used to
determine whether to split, defaults to 1/2 of the
context length limit of the model (default: None)
-P PARTS [PARTS ...], --parts PARTS [PARTS ...]
Parts to select in the processes list of Documents
(default: None)
-r, --raw Wraps the content in triple quotes with no extra text
(default: False)
-R, --raw-no-quotes Output the content only (default: False)
--print-percentage-non-ascii
Print percentage of non-ascii characters (default:
False)
-n, --dry-run Dry run (default: False)
--out OUT Output file (default: None)
-C, --from-clipboard Load text from clipboard (default: False)
-w WHAT, --what WHAT Initial knowledge you want to insert before the PDF
content in the prompt (default: the text content of a
html file)
-M, --merge Merge contents of all pages before processing
(default: False)
```
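For example (hypothetical file name), to copy a prompt built from a local HTML file:

```console
$ htmlprompt page.html -c
```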
## Installation

### pipx

This is the recommended installation method.

```console
$ pipx install langchain-utils
```

### pip

```console
$ pip install langchain-utils
```

### Install from source

```console
$ git clone https://github.com/tddschn/langchain-utils.git
$ cd langchain-utils
$ poetry install
```