Teradata-to-Databricks Function Mapping

Comprehensive mapping of Teradata SQL functions to their Databricks equivalents, with automated tests to validate behavioral equivalence.

Overview

  • 788 unique Teradata functions parsed from FunctionsV.csv
  • 131 functions assessed against Databricks (42 direct, 41 partial, 46 expression-based, 2 unmapped)
  • 657 functions documented as unmapped with reasons (internal contracts, ML analytics, system monitoring, etc.)
  • 278 automated tests (265 passed, 12 xfailed for known differences, 1 xpassed)
  • 9 workarounds for behavioral differences between TD and Databricks

Project Structure

td-function-mapping/
├── inputs/FunctionsV.csv              # Source: Teradata function catalog
├── mapping/
│   ├── function_mapping.json          # Machine-readable mapping
│   ├── function_mapping.csv           # Spreadsheet-friendly mapping
│   └── unmapped_functions.csv         # Unmapped functions + reasons
├── src/
│   ├── parse_td_functions.py          # Parse and categorize TD functions
│   ├── mapping.py                     # Core mapping dictionary
│   ├── generate_outputs.py            # Generate JSON/CSV outputs
│   └── connections.py                 # TD + Databricks connection helpers
├── tests/                             # Pytest suite
│   ├── conftest.py                    # Fixtures + result recorder
│   ├── test_string_functions.py
│   ├── test_date_functions.py
│   ├── test_math_functions.py
│   ├── test_type_conversion.py
│   ├── test_json_functions.py
│   ├── test_null_functions.py
│   ├── test_bit_functions.py
│   └── test_array_functions.py
├── results/                           # Test results (persistent record)
│   ├── test_report.json               # Per-test-case results
│   ├── test_report.csv                # Spreadsheet-friendly results
│   ├── behavioral_differences.md      # Why certain edge cases differ
│   └── pytest_output.log              # Raw pytest output
├── requirements.txt
├── pytest.ini
└── .env.example

Setup

  1. Install dependencies:

    pip install -r requirements.txt
  2. Configure Teradata connection -- copy .env.example to .env and fill in:

    TD_HOST=your-teradata-host
    TD_USER=your-username
    TD_PASSWORD=your-password
    
  3. Configure Databricks -- ensure ~/.databrickscfg has a [DEFAULT] profile with host and either token or auth_type = databricks-cli (OAuth).
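A minimal sketch of how the Teradata side of src/connections.py might read those variables; the helper name and return shape are assumptions, and only the standard library is used:

```python
import os

def td_connection_params(env=os.environ):
    """Collect Teradata connection settings from the environment.

    Hypothetical helper; the real src/connections.py may differ.
    Raises KeyError if a required variable is missing, which surfaces
    misconfiguration early instead of at query time.
    """
    return {
        "host": env["TD_HOST"],
        "user": env["TD_USER"],
        "password": env["TD_PASSWORD"],
    }

# Example with an explicit dict standing in for os.environ:
params = td_connection_params({
    "TD_HOST": "td.example.com",
    "TD_USER": "alice",
    "TD_PASSWORD": "secret",
})
print(params["host"])  # td.example.com
```

Passing the environment as a parameter keeps the helper easy to test without mutating os.environ.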

Running Tests

# Run all tests
pytest tests/ -v --junitxml=results/junit.xml 2>&1 | tee results/pytest_output.log

# Run a specific category
pytest tests/test_string_functions.py -v

# Run with detailed tracebacks
pytest tests/ -v --tb=long
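To validate behavioral equivalence, each test ultimately compares a Teradata result with a Databricks result. A hedged sketch of the kind of comparison helper tests/conftest.py might provide (the name and exact rules are assumptions):

```python
import math

def results_equivalent(td_value, db_value, rel_tol=1e-9):
    """Compare a Teradata result with a Databricks result.

    Hypothetical helper mirroring what the fixtures might do:
    NULLs (None) must match exactly, floats compare with a relative
    tolerance, and everything else compares by plain equality.
    """
    if td_value is None or db_value is None:
        return td_value is None and db_value is None
    if isinstance(td_value, float) and isinstance(db_value, float):
        return math.isclose(td_value, db_value, rel_tol=rel_tol)
    return td_value == db_value

assert results_equivalent(None, None)
assert not results_equivalent(None, "")   # NULL vs '' is a real TD/DB difference
assert results_equivalent(0.1 + 0.2, 0.3)
```

Treating None-vs-empty-string as unequal matters here, since several regexp functions differ on exactly that point.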

Generating Mapping Files

python -m src.generate_outputs
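A minimal sketch of what the generation step does conceptually, assuming the mapping is a list of dicts (the real src/generate_outputs.py may structure this differently):

```python
import csv
import io
import json

def to_json(entries):
    """Serialize mapping entries as pretty-printed JSON."""
    return json.dumps(entries, indent=2)

def to_csv(entries):
    """Serialize mapping entries as CSV with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["td_function", "db_function", "status"])
    writer.writeheader()
    writer.writerows(entries)
    return buf.getvalue()

# Hypothetical entries; the real src/mapping.py defines many more.
entries = [
    {"td_function": "OREPLACE", "db_function": "REPLACE", "status": "partial"},
    {"td_function": "ZEROIFNULL", "db_function": "COALESCE(x, 0)", "status": "expression"},
]
print(to_csv(entries))
```

The JSON output suits programmatic consumers; the CSV is the same data for spreadsheet review.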

Mapping Status Legend

Status      Meaning
----------  ---------------------------------------------
mapped      Direct 1:1 equivalent, same semantics
partial     Equivalent exists but edge cases may differ
expression  No single function; requires a SQL expression
unmapped    No Databricks equivalent
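These statuses appear on each entry in mapping/function_mapping.json. A sketch of what such entries might look like and how to filter by status (the field names are assumptions; check the generated file for the actual schema):

```python
import json

# Hypothetical excerpt of mapping/function_mapping.json.
doc = json.loads("""
[
  {"td_function": "UPPER",      "db_function": "UPPER",        "status": "mapped"},
  {"td_function": "INITCAP",    "db_function": "INITCAP",      "status": "partial"},
  {"td_function": "NULLIFZERO", "db_function": "NULLIF(x, 0)", "status": "expression"}
]
""")

# Pull out the functions whose edge cases need manual review.
partial = [e["td_function"] for e in doc if e["status"] == "partial"]
print(partial)  # ['INITCAP']
```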

Key Behavioral Differences

See results/behavioral_differences.md for full details.

Notable differences with workarounds:

  • OREPLACE/REPLACE: Empty/NULL search/replacement behaves differently (workaround: CASE guard)
  • TRUNC (numeric): DB TRUNC is date-only; use CAST(CAST(x * POWER(10,d) AS BIGINT) AS DOUBLE) / POWER(10,d)
  • INITCAP: TD capitalizes after hyphens, DB does not (workaround: SPLIT/TRANSFORM/JOIN)
  • REGEXP_SUBSTR/REGEXP_EXTRACT: No match returns NULL (TD) vs '' (DB) (workaround: NULLIF)
  • GREATEST/LEAST with NULL: TD skips NULLs, DB propagates them (workaround: COALESCE)
  • WEEKNUMBER_OF_YEAR: TD 0-based vs DB ISO 1-based (workaround: subtract 1)
  • TRYCAST empty string: TD returns 0, DB returns NULL (workaround: COALESCE)
  • Date format strings: TD uses YYYY-MM-DD, DB uses yyyy-MM-dd (format translator included)
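The last difference is mechanical enough to sketch: a token-by-token translator from Teradata format strings to Spark/Databricks ones. The token table below is an assumption covering only common tokens; the repo's included translator likely handles more:

```python
# Hypothetical TD -> Spark format token table.
TD_TO_SPARK = {
    "YYYY": "yyyy",
    "MM": "MM",
    "DD": "dd",
    "HH24": "HH",
    "MI": "mm",
    "SS": "ss",
}

def translate_format(td_fmt):
    """Translate a Teradata date format string to a Spark one."""
    out = td_fmt
    # Replace longer tokens first so e.g. HH24 is not split into HH + 24.
    for td_tok in sorted(TD_TO_SPARK, key=len, reverse=True):
        out = out.replace(td_tok, TD_TO_SPARK[td_tok])
    return out

print(translate_format("YYYY-MM-DD"))             # yyyy-MM-dd
print(translate_format("YYYY-MM-DD HH24:MI:SS"))  # yyyy-MM-dd HH:mm:ss
```

Naive string replacement works for these tokens but would mishandle quoted literals inside the format string; a production translator would need to skip those.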

Unmapped Function Categories

Category             Count  Recommendation
-------------------  -----  ------------------------------------------
Internal/Contract     ~183  Not needed in migration
ML Analytics          ~170  Use MLlib, MLflow, or Python ML libraries
System Monitoring      ~63  Use Databricks system tables
Teradata Internal      ~44  Review case-by-case
Spatial/GIS            ~28  Use H3 or Mosaic library
XML                    ~28  Use XPath functions
Workload Management    ~18  Use SQL warehouse config
Compression            ~18  Delta handles natively
External I/O           ~15  Use COPY INTO or external tables
