PATFT: A USPTO PATFT Parsing Library

PATFT is a simple gem that extracts relevant data from the raw HTML provided by the USPTO at http://patft.uspto.gov/. It uses Nokogiri and XPath to scan the HTML supplied to it and returns a structured representation (e.g., Hash/JSON) of the patent document.

WARNING: PATFT is under active development. Refer to the roadmap below (and the specs) to see what is and is not implemented.

Usage

require 'patft'

# Read the raw PATFT HTML yourself; the parser only accepts a String.
local_html = File.read('patent.html')
patents = PATFT::Parser.new(local_html)

patents.extract(:title) # => 'System and method for ...'

Note that PATFT::Parser#parse requires a String representation of the HTML; how you obtain it is up to you. This is intentional, given the USPTO's policy on scraping (and, more generally, to encourage responsible use).

Output Format

Below are the keys output by Parser#parse:

number

A String containing the patent number, without the kind code. Note that this field may contain non-numeric characters for some patent types (e.g., design and re-issue patents).
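
To illustrate what "without the kind code" and "non-numeric characters" can mean in practice, here is a minimal sketch in plain Ruby. The helper normalize_patent_number is hypothetical and is not part of PATFT; it assumes the input is a printed number such as "US 7,654,321 B2", with the kind code separated from the number by whitespace.

```ruby
# Hypothetical helper, NOT part of PATFT: reduce a printed patent number
# to its bare form. Assumes any kind code is separated by whitespace.
def normalize_patent_number(raw)
  text = raw.strip.upcase
  text = text.sub(/\AUS\s*/, '')        # drop a leading country code
  text = text.sub(/\s+[A-Z]\d?\z/, '')  # drop a trailing kind code such as " B2" or " S"
  text.gsub(/[,\s]/, '')                # drop thousands separators and stray spaces
end

normalize_patent_number('US 7,654,321 B2') # => "7654321"
normalize_patent_number('D654,321 S')      # => "D654321"
normalize_patent_number('RE43,210')        # => "RE43210"
```

The "D" and "RE" prefixes survive normalization, which is why the field is a String rather than an Integer.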

title

A String containing the title.
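
As a rough picture of the output, the sketch below builds a Hash with the two keys documented above and serializes it to JSON using Ruby's standard json library. The values are made up, and the use of Symbol keys is an assumption; the README does not state whether PATFT returns Symbol or String keys.

```ruby
require 'json'

# Hypothetical result with made-up values; only the key names
# (number, title) come from the documentation above.
# Symbol keys are an assumption, not confirmed PATFT behavior.
patent = {
  number: 'D654321',
  title: 'System and method for ...'
}

json = JSON.generate(patent)
restored = JSON.parse(json) # keys come back as Strings after a round trip

puts json
# prints {"number":"D654321","title":"System and method for ..."}
```

Note that a JSON round trip turns Symbol keys into Strings, so code that consumes both forms should pick one convention.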

Roadmap

Short Term

Extract the following fields:

Format notes:

Asterisks denote structured data.
Plusses denote arrays of data.
An asterisk and a plus together denote an array of structured data.

Medium Term

CLI
Increase field support based on red book (e.g., PCT data)

Long Term (rough ideas)

Remote search interface
Query tool ("Advanced Search")
AppFT (probably a different gem)

License

The gem is available as open source under the terms of the MIT License.
