Skip to content
A Python library to load structured table data from files/strings/URL with various data format: CSV / Excel / Google-Sheets / HTML / JSON / LDJSON / LTSV / Markdown / SQLite / TSV.
Branch: master
Clone or download
Latest commit b158cf6 May 11, 2019

README.rst

Summary

pytablereader is a Python library to load structured table data from files/strings/URL with various data format: CSV / Excel / Google-Sheets / HTML / JSON / LDJSON / LTSV / Markdown / SQLite / TSV.

PyPI package version Supported Python versions Linux/macOS CI status Windows CI status Test coverage GitHub stars

Features

  • Extract structured tabular data from various data format:
  • Supported data sources are:
    • Files on a local file system
    • Accessible URLs
    • str instances
  • Loaded table data can be used as:

Examples

Load a CSV table

Sample Code:
import pytablereader as ptr
import pytablewriter as ptw


# prepare data ---
file_path = "sample_data.csv"
csv_text = "\n".join([
    '"attr_a","attr_b","attr_c"',
    '1,4,"a"',
    '2,2.1,"bb"',
    '3,120.9,"ccc"',
])

with open(file_path, "w") as f:
    f.write(csv_text)

# load from a csv file ---
loader = ptr.CsvTableFileLoader(file_path)
for table_data in loader.load():
    print("\n".join([
        "load from file",
        "==============",
        "{:s}".format(ptw.dumps_tabledata(table_data)),
    ]))

# load from a csv text ---
loader = ptr.CsvTableTextLoader(csv_text)
for table_data in loader.load():
    print("\n".join([
        "load from text",
        "==============",
        "{:s}".format(ptw.dumps_tabledata(table_data)),
    ]))
Output:
load from file
==============
.. table:: sample_data

    ======  ======  ======
    attr_a  attr_b  attr_c
    ======  ======  ======
         1     4.0  a
         2     2.1  bb
         3   120.9  ccc
    ======  ======  ======

load from text
==============
.. table:: csv2

    ======  ======  ======
    attr_a  attr_b  attr_c
    ======  ======  ======
         1     4.0  a
         2     2.1  bb
         3   120.9  ccc
    ======  ======  ======

Get loaded table data as pandas.DataFrame instance

Sample Code:
import pytablereader as ptr

loader = ptr.CsvTableTextLoader(
    "\n".join([
        "a,b",
        "1,2",
        "3.3,4.4",
    ]))
for table_data in loader.load():
    print(table_data.as_dataframe())
Output:
     a    b
0    1    2
1  3.3  4.4

For more information

More examples are available at https://pytablereader.rtfd.io/en/latest/pages/examples/index.html

Installation

Install from PyPI

pip install pytablereader

Some of the formats require additional dependency packages, you can install the dependency packages as follows:

  • Excel
    • pip install pytablereader[excel]
  • Google Sheets
    • pip install pytablereader[gs]
  • Markdown
    • pip install pytablereader[md]
  • Mediawiki
    • pip install pytablereader[mediawiki]
  • SQLite
    • pip install pytablereader[sqlite]
  • Load from URLs
    • pip install pytablereader[url]
  • All of the extra dependencies
    • pip install pytablereader[all]

Install from PPA (for Ubuntu)

sudo add-apt-repository ppa:thombashi/ppa
sudo apt update
sudo apt install python3-pytablereader

Dependencies

Python 2.7+ or 3.5+

Mandatory Python packages

Optional Python packages

Optional packages (other than Python packages)

  • libxml2 (faster HTML conversion)
  • pandoc (required when loading MediaWiki file)

Test dependencies

Documentation

https://pytablereader.rtfd.io/

Related Project

  • pytablewriter
    • Tabular data loaded by pytablereader can be written another tabular data format with pytablewriter.
You can’t perform that action at this time.