openFDA is an FDA project to provide open APIs, raw data downloads, documentation and examples, and a developer community for an important collection of FDA public datasets.

Python 599 137 Updated Dec 26, 2022

wesm / vbench

vbench: A tool for benchmarking your code through time, for showing performance improvement or regressions

Python 244 41 Updated Oct 12, 2017

internetarchive / warc

Python library for reading and writing warc files

Python 239 114 Updated Mar 7, 2022

dpapathanasiou / pdfminer-layout-scanner

A more complete example of programming with PDFMiner, which continues where the default documentation stops

Python 214 115 Updated Dec 3, 2019

bryancatanzaro / copperhead

Data Parallel Python

Python 207 26 Updated May 10, 2013

trivio / common_crawl_index

Index URLs in Common Crawl

Python 193 47 Updated Sep 19, 2017

RevolutionAnalytics / rmr2

A package that allows R developers to use Hadoop MapReduce

Python 159 149 Updated Jul 21, 2020

internetarchive / warctools

Command line tools and libraries for handling and manipulating WARC files (and HTTP contents)

Python 157 29 Updated Aug 27, 2020

cinfony / cinfony

Simplified and standard interface to a number of cheminformatics toolkits

Python 87 27 Updated Nov 4, 2023

XinyuanLu00 / TART

This is the repository for the NAACL'25 paper "TART: An Open-Source Tool-Augmented Framework for Explainable Table-based Reasoning"

Python 47 1 Updated Oct 24, 2024

artunit / ossocr

gathering point for open source OCR scripts and diffs

Python 43 7 Updated Jun 27, 2014

googlearchive / compute-getting-started-python

This sample python application demonstrates how to access the Compute Engine API using the Google Python API Client Library.

Python 41 16 Updated Sep 22, 2015

hildensia / scholar

Fetches Google Scholar queries and outputs info or BibTeX data

Python 32 13 Updated Feb 18, 2014

ianmilligan1 / Historian-WARC-1

The Historian's WARC Toolkit

Python 15 4 Updated May 14, 2015

danyq / diybookscanner

scanning script for the noisebridge book scanner

Python 14 3 Updated May 12, 2017

wiseman / common_crawl_index

Forked from trivio/common_crawl_index

Access index of web pages in Common Crawl

Python 9 2 Updated Apr 28, 2015

Python 1 1 Updated May 28, 2014

darkseed / ossocr

Forked from artunit/ossocr

gathering point for open source OCR scripts and diffs

Python 1 Updated Apr 20, 2013

Andrew Defries PhD andrewdefries

Stars

opendatalab / MinerU

nltk / nltk

allenai / olmocr

clips / pattern

euske / pdfminer

s3tools / s3cmd

ckreibich / scholar.py

FDA / openfda