html-table-to-json

Convert an HTML table (without rowspan/colspan) to JSON using Python.

Features

✅ Smart Header Detection - Automatically detects headers in <thead> or first row with <th> elements
✅ Flexible Output Format - Returns list of dictionaries (when headers exist) or list of lists
✅ Robust HTML Parsing - Handles complex nested HTML content within cells
✅ Error Handling - Comprehensive input validation and error reporting
✅ Type Safety - Full type hints for better development experience
✅ Multiple Utility Functions - Various conversion methods for different use cases
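
The header detection rule described above (look in <thead> first, then fall back to a first row made of <th> cells) can be sketched roughly as follows. This is only an illustration written against the standard library's html.parser so that it runs without extra dependencies; the project itself depends on beautifulsoup4, and the names HeaderFinder and detect_headers are hypothetical, not part of its API.

```python
from html.parser import HTMLParser


class HeaderFinder(HTMLParser):
    """Collect header texts from <thead>, or from a first row made of <th> cells."""

    def __init__(self):
        super().__init__()
        self.in_thead = False
        self.row_index = -1       # index of the current <tr>; -1 before the first row
        self.in_cell = False
        self.thead_cells = []     # cell texts found inside <thead> (th or td)
        self.first_row_th = []    # <th> texts found in the very first row
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "thead":
            self.in_thead = True
        elif tag == "tr":
            self.row_index += 1
        elif tag in ("th", "td"):
            self.in_cell = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "thead":
            self.in_thead = False
        elif tag in ("th", "td") and self.in_cell:
            text = "".join(self._buffer).strip()
            if self.in_thead:
                self.thead_cells.append(text)
            elif self.row_index == 0 and tag == "th":
                self.first_row_th.append(text)
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self._buffer.append(data)


def detect_headers(html):
    """Return a list of header texts, or None when the table has no header row."""
    finder = HeaderFinder()
    finder.feed(html)
    finder.close()
    return (finder.thead_cells or finder.first_row_th) or None


with_th = "<table><tr><th>ID</th><th>Vendor</th></tr><tr><td>1</td><td>Intel</td></tr></table>"
without = "<table><tr><td>1</td><td>Intel</td></tr></table>"
print(detect_headers(with_th))
print(detect_headers(without))
```

Accepting both th and td inside <thead> is a deliberate choice in this sketch, so that a header section written with td cells is still treated as headers.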

Installation

pip install beautifulsoup4

Usage

Basic Usage

from tabletojson import html_to_json

# Convert HTML table to JSON
html_content = "<table>...</table>"
json_result = html_to_json(html_content, indent=2)
print(json_result)

Advanced Usage

from tabletojson import (
    html_to_json,
    html_table_to_dict_list,
    html_table_to_list_of_lists,
    validate_html_table
)

# Validate table first
if validate_html_table(html_content):
    # Force dictionary output (with auto-generated headers if needed)
    dict_result = html_table_to_dict_list(html_content)
    
    # Force list of lists output (ignores headers)
    list_result = html_table_to_list_of_lists(html_content)
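
To illustrate what "auto-generated headers" could mean for a table that has no header row, here is a minimal sketch. The function name rows_to_dicts and the column_N naming scheme are assumptions made for this example; the names the library actually generates may differ.

```python
def rows_to_dicts(rows, headers=None):
    """Turn a list of row lists into dictionaries, inventing headers when none are given."""
    if not rows:
        return []
    if headers is None:
        width = max(len(row) for row in rows)
        # "column_N" is a hypothetical naming scheme chosen for this sketch
        headers = [f"column_{i + 1}" for i in range(width)]
    return [dict(zip(headers, row)) for row in rows]


rows = [["1", "Intel", "Processor"], ["2", "AMD", "GPU"]]
print(rows_to_dicts(rows))
```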

Examples

Example 1: Table with Headers

Input:

ID	Vendor	Product
1	Intel	Processor
2	AMD	GPU
3	Gigabyte	Mainboard

Output:

[
    {
        "product": "Processor", 
        "vendor": "Intel", 
        "id": "1"
    }, 
    {
        "product": "GPU", 
        "vendor": "AMD", 
        "id": "2"
    }, 
    {
        "product": "Mainboard", 
        "vendor": "Gigabyte", 
        "id": "3"
    }
]
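
The mapping in Example 1 can be reproduced by hand, which may help when checking results: each header becomes a lowercase key and every cell value stays a string. The snippet below is a sketch of that mapping only, not the library's code; the variable names are illustrative.

```python
import json

headers = ["ID", "Vendor", "Product"]
rows = [
    ["1", "Intel", "Processor"],
    ["2", "AMD", "GPU"],
    ["3", "Gigabyte", "Mainboard"],
]

# Lowercase the header text to match the keys shown in the output above
keys = [h.strip().lower() for h in headers]
records = [dict(zip(keys, row)) for row in rows]

print(json.dumps(records, indent=4))
```

On current Python versions the keys are printed in header order, so the key order may differ from the listing above while the content is the same.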

Example 2: Table without Headers

Input:

1	Intel	Processor
2	AMD	GPU
3	Gigabyte	Mainboard

Output:

[
  [
    "1",
    "Intel",
    "Processor"
  ],
  [
    "2",
    "AMD",
    "GPU"
  ],
  [
    "3",
    "Gigabyte",
    "Mainboard"
  ]
]
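
For a header-less table like the one in Example 2, the conversion amounts to walking the rows and collecting cell text. The sketch below shows that walk using only the standard library's html.parser; it is an assumption-laden illustration (RowCollector and table_to_rows are hypothetical names), while the project itself uses beautifulsoup4.

```python
import json
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # list of text fragments while inside a cell, else None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def table_to_rows(html):
    """Return the table as a list of rows, each row a list of cell strings."""
    collector = RowCollector()
    collector.feed(html)
    collector.close()
    return collector.rows


html = (
    "<table>"
    "<tr><td>1</td><td>Intel</td><td>Processor</td></tr>"
    "<tr><td>2</td><td>AMD</td><td>GPU</td></tr>"
    "</table>"
)
print(json.dumps(table_to_rows(html), indent=2))
```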

Example 3: Complex Table with tbody and Nested Content

Input:

Date	Transaction Description	Debit/Cheque	Credit/Deposit	Balance
13 Jul 2022	SOME WORKPLACE Salary		$3,509.30	OD $1,725.53
12 Jul 2022	ATM DEPOSIT CARD 1605		$400.00	OD $5,234.83
11 Jul 2022	Another Transaction Another Transaction		$104.00	OD $5,634.83
11 Jul 2022	MB TRANSFER TO XX-XXXX-XXXXXXX-51	$4.50		OD $5,738.83

Output:

[
    {
        "date": "13 Jul 2022",
        "transaction description": "SOME WORKPLACESalary",
        "debit/cheque": "",
        "credit/deposit": "$3,509.30",
        "balance": "OD $1,725.53"
    },
    {
        "date": "12 Jul 2022",
        "transaction description": "ATM DEPOSITCARD 1605",
        "debit/cheque": "",
        "credit/deposit": "$400.00",
        "balance": "OD $5,234.83"
    },
    {
        "date": "11 Jul 2022",
        "transaction description": "Another TransactionAnother Transaction",
        "debit/cheque": "",
        "credit/deposit": "$104.00",
        "balance": "OD $5,634.83"
    },
    {
        "date": "11 Jul 2022",
        "transaction description": "MB TRANSFERTO XX-XXXX-XXXXXXX-51",
        "debit/cheque": "$4.50",
        "credit/deposit": "",
        "balance": "OD $5,738.83"
    }
]
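
The fused text in the output above (for example "SOME WORKPLACESalary") is consistent with text fragments from nested elements being concatenated without a separator. The sketch below shows the difference a separator makes. The cell markup is hypothetical, since the original HTML of this table is not shown here, and the helper names are illustrative. With BeautifulSoup, the comparable control is the separator argument of get_text().

```python
from html.parser import HTMLParser


class FragmentCollector(HTMLParser):
    """Gather the stripped, non-empty text fragments found in a snippet of HTML."""

    def __init__(self):
        super().__init__()
        self.fragments = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.fragments.append(text)


def cell_text(cell_html, separator=""):
    """Join the text fragments of one cell with the given separator."""
    collector = FragmentCollector()
    collector.feed(cell_html)
    collector.close()
    return separator.join(collector.fragments)


# Hypothetical cell markup; the real source table may be structured differently
cell = "<td><span>SOME WORKPLACE</span><br><span>Salary</span></td>"

print(cell_text(cell))                 # SOME WORKPLACESalary
print(cell_text(cell, separator=" "))  # SOME WORKPLACE Salary
```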

Improvements Made

🐛 Bug Fixes

Fixed header detection logic that was incorrectly searching for th elements
Fixed data structure handling for consistent output format
Fixed handling of tables with tbody elements
Fixed text extraction from nested HTML elements (preserves spaces)

✨ New Features

Type Safety: Added comprehensive type hints for better IDE support
Input Validation: Robust error handling with descriptive error messages
Multiple Output Formats: New utility functions for different use cases
Smart Header Detection: Detects headers in <thead> or first row with <th> elements
Empty Row Handling: Automatically skips empty rows
Flexible Cell Handling: Handles mismatched column counts gracefully
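
One plausible reading of "skips empty rows" and "handles mismatched column counts" is sketched below: blank rows are dropped, short rows are padded, and long rows are truncated to the header width. Whether the library pads, truncates, or does something else is not stated here, so treat normalize_rows and its behavior as assumptions.

```python
def normalize_rows(rows, width, fill=""):
    """Drop empty rows, then pad or truncate each remaining row to `width` cells."""
    cleaned = []
    for row in rows:
        if not any(cell.strip() for cell in row):
            continue  # skip rows with no cells or only blank cells
        padded = list(row[:width]) + [fill] * (width - len(row))
        cleaned.append(padded)
    return cleaned


raw = [["1", "Intel"], [], ["", " "], ["2", "AMD", "GPU", "extra"]]
print(normalize_rows(raw, width=3))
```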

🚀 Performance & Code Quality

Modular Design: Split functionality into focused helper functions
Better Error Handling: Comprehensive exception handling with context
Documentation: Added detailed docstrings and usage examples
Test Coverage: Enhanced test cases covering edge cases and error scenarios

📋 New API Functions

html_to_json() - Main conversion function (improved)
html_table_to_dict_list() - Force dictionary output with auto-generated headers
html_table_to_list_of_lists() - Force list of lists output
validate_html_table() - Validate HTML table structure

TODO

Support for nested tables
~~Support for buggy HTML tables (i.e., td instead of th in thead)~~ ✅ Fixed
~~Improve error handling~~ ✅ Added comprehensive error handling
~~Add type hints~~ ✅ Added full type safety
~~Better text extraction from nested elements~~ ✅ Improved HTML parsing
Support for rowspan/colspan attributes
Add support for custom header mappings
Add CSV export functionality
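
Until CSV export exists in the library, the JSON produced for a table with headers can be converted with the standard library. This is a workaround sketch, not a planned API: json_records_to_csv is a hypothetical helper, and it assumes a JSON array of flat objects that all share the same keys.

```python
import csv
import io
import json


def json_records_to_csv(json_text):
    """Convert a JSON array of flat objects into CSV text."""
    records = json.loads(json_text)
    if not records:
        return ""
    fieldnames = list(records[0].keys())
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


sample = '[{"id": "1", "vendor": "Intel"}, {"id": "2", "vendor": "AMD"}]'
print(json_records_to_csv(sample))
```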
