Skip to content

vapniks/extract-text

Repository files navigation

Commentary

Bitcoin donations gratefully accepted: 1FgnGFwRES9MGzoieRDWLmLkjqMT3AsjQF

This library provides functions for programmatically extracting text from buffers. The `extract-matching-strings’ and `extract-matching-rectangles’ macros allow you to perform simple extractions, but for more complex tasks there is `extract-text’. This macro allows complex text extraction specifications using a kind of dsl for text extraction. You can use included wrapper functions for picking out individual bits of text, or define your own and save them in `extract-text-user-wrappers’ and `extract-text-user-progs’. You can specify how many times to repeat each individual extraction, how to handle errors or missing values, restrict the buffer for finding certain extractions, how to structure &/or transform the output, etc. See the docstring for `extract-text’ for more details.

See the file extract_law-enforcement_data.el for an example use case of scraping law enforcement statistics from pdf files.

Commands

  • `extract-text-from-files’ : Extract text from a list of FILES, and save to file or `kill-ring’, or insert as org-table.
  • `extract-text-from-buffers’ : Extract text from a list of BUFFERS and save to file or `kill-ring’, or insert as org-table.
  • `extract-text-from-current-buffer’ : Extract text from the current buffer, or active region.

Functions

  • `match-strings’ : Return a list of all strings matched by last search.
  • `match-strings-no-properties’ : Return a list of all strings matched by last search.
  • `extract-matching-strings’ : Extract strings from current buffer that match subexpressions of REGEXP.
  • `extract-matching-rectangle’ : Extract a rectangle of text (list of strings) from the current buffer.
  • `copy-rectangle-to-buffer’ : Copy a rectangular region of the current buffer to a new buffer.
  • `extract-text-choose-prog’ : Choose item from `extract-text-user-progs’, and return arguments for `extract-text’.
  • `extract-text-compile-prog’ : Create compiled version of `extract-text’ with arguments PRG applied.

Macros

  • `extract-text’ : Extract text from buffer according to specifications in ARGS.

Options

  • `extract-text-user-wrappers’ : A list of wrapper functions that can be used with `extract-text’.
  • `extract-text-user-progs’ : A list of different extraction programs/specifications

Requirements

Features that might be required by this library:

cl, dash, macro-utils, keyword-arg-macros, ido-choose-function

Installation

Put extract-text.el in a directory in your load-path, e.g. ~/.emacs.d/ You can add a directory to your load-path with the following line in ~/.emacs (add-to-list ‘load-path (expand-file-name “~/elisp”)) where ~/elisp is the directory you want to add (you don’t need to do this for ~/.emacs.d - it’s added by default).

Add the following to your ~/.emacs startup file.

(require ‘extract-text)

About

Functions & macros for extracting/scraping text from buffers

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published