Comprehensive mapping of Teradata SQL functions to their Databricks equivalents, with automated tests to validate behavioral equivalence.
- 788 unique Teradata functions parsed from `FunctionsV.csv`
- 131 functions mapped to Databricks equivalents (42 direct, 41 partial, 46 expression-based, 2 unmapped)
- 657 functions documented as unmapped with reasons (internal contracts, ML analytics, system monitoring, etc.)
- 278 automated tests (265 passing, 12 xfailed known differences, 1 xpassed)
- 9 workarounds for behavioral differences between TD and Databricks
```
td-function-mapping/
├── inputs/FunctionsV.csv          # Source: Teradata function catalog
├── mapping/
│   ├── function_mapping.json      # Machine-readable mapping
│   ├── function_mapping.csv       # Spreadsheet-friendly mapping
│   └── unmapped_functions.csv     # Unmapped functions + reasons
├── src/
│   ├── parse_td_functions.py      # Parse and categorize TD functions
│   ├── mapping.py                 # Core mapping dictionary
│   ├── generate_outputs.py        # Generate JSON/CSV outputs
│   └── connections.py             # TD + Databricks connection helpers
├── tests/                         # Pytest suite
│   ├── conftest.py                # Fixtures + result recorder
│   ├── test_string_functions.py
│   ├── test_date_functions.py
│   ├── test_math_functions.py
│   ├── test_type_conversion.py
│   ├── test_json_functions.py
│   ├── test_null_functions.py
│   ├── test_bit_functions.py
│   └── test_array_functions.py
├── results/                       # Test results (persistent record)
│   ├── test_report.json           # Per-test-case results
│   ├── test_report.csv            # Spreadsheet-friendly results
│   ├── behavioral_differences.md  # Why certain edge cases differ
│   └── pytest_output.log          # Raw pytest output
├── requirements.txt
├── pytest.ini
└── .env.example
```
- Install dependencies:

  ```
  pip install -r requirements.txt
  ```

- Configure Teradata connection -- copy `.env.example` to `.env` and fill in:

  ```
  TD_HOST=your-teradata-host
  TD_USER=your-username
  TD_PASSWORD=your-password
  ```

- Configure Databricks -- ensure `~/.databrickscfg` has a `[DEFAULT]` profile with `host` and either `token` or `auth_type = databricks-cli` (OAuth).
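The Teradata credentials above are read from `.env` at runtime. As a rough illustration of how a helper like `src/connections.py` might consume that file, here is a minimal stdlib-only `.env` parser; the function name and parsing rules are assumptions (the project may instead use `python-dotenv` or plain environment variables):

```python
def load_dotenv(path=".env"):
    """Parse KEY=VALUE lines from a .env file, skipping blanks and # comments.

    Illustrative sketch only -- not the project's actual loader.
    """
    values = {}
    with open(path) as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip()
    return values
```

A loader like this would return, e.g., `{"TD_HOST": "your-teradata-host", ...}` for the sample `.env` above, ready to pass into a Teradata connection call.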
```
# Run all tests
pytest tests/ -v --junitxml=results/junit.xml 2>&1 | tee results/pytest_output.log

# Run a specific category
pytest tests/test_string_functions.py -v

# Run with detailed tracebacks
pytest tests/ -v --tb=long
```

Regenerate the mapping outputs:

```
python -m src.generate_outputs
```

| Status | Meaning |
|---|---|
| `mapped` | Direct 1:1 equivalent, same semantics |
| `partial` | Equivalent exists but edge cases may differ |
| `expression` | No single function; requires a SQL expression |
| `unmapped` | No Databricks equivalent |
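A consumer of `mapping/function_mapping.json` can branch on these four statuses when rewriting SQL. The sketch below shows one way to do that; the JSON schema, field names, and sample entries are assumptions for illustration, not taken from the repo (the `WEEKNUMBER_OF_YEAR` expression reflects the "subtract 1" workaround noted later):

```python
import json

# Illustrative entries only; the real schema of function_mapping.json may differ.
SAMPLE_MAPPING_JSON = """
{
  "OREPLACE":           {"status": "partial",    "databricks": "REPLACE"},
  "GREATEST":           {"status": "partial",    "databricks": "GREATEST"},
  "WEEKNUMBER_OF_YEAR": {"status": "expression", "databricks": "weekofyear(col) - 1"},
  "AMPUSAGE":           {"status": "unmapped",   "databricks": null}
}
"""

def translate(td_name, mapping):
    """Return (status, databricks_equivalent) for a Teradata function name."""
    entry = mapping.get(td_name.upper())
    if entry is None or entry["status"] == "unmapped":
        return ("unmapped", None)
    return (entry["status"], entry["databricks"])

mapping = json.loads(SAMPLE_MAPPING_JSON)
print(translate("oreplace", mapping))    # ('partial', 'REPLACE')
print(translate("UNKNOWN_FN", mapping))  # ('unmapped', None)
```

A migration tool would rewrite `mapped` and `partial` calls in place (flagging `partial` ones for review against the behavioral differences below), splice in the stored SQL for `expression`, and report `unmapped` functions.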
See `results/behavioral_differences.md` for full details.
Notable differences with workarounds:
- OREPLACE/REPLACE: Empty/NULL search/replacement behaves differently (workaround: CASE guard)
- TRUNC (numeric): DB TRUNC is date-only; use CAST(CAST(x * POWER(10,d) AS BIGINT) AS DOUBLE) / POWER(10,d)
- INITCAP: TD capitalizes after hyphens, DB does not (workaround: SPLIT/TRANSFORM/JOIN)
- REGEXP_SUBSTR/REGEXP_EXTRACT: No match returns NULL (TD) vs '' (DB) (workaround: NULLIF)
- GREATEST/LEAST with NULL: TD skips NULLs, DB propagates them (workaround: COALESCE)
- WEEKNUMBER_OF_YEAR: TD 0-based vs DB ISO 1-based (workaround: subtract 1)
- TRYCAST empty string: TD returns 0, DB returns NULL (workaround: COALESCE)
- Date format strings: TD uses `YYYY-MM-DD`, DB uses `yyyy-MM-dd` (format translator included)
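The last point is mechanical enough to sketch. Below is a minimal token translator of the kind the project bundles; this is not the repo's implementation, and the token table is an assumption covering only a handful of common Teradata tokens (the real translator presumably handles many more):

```python
import re

# Assumed subset of Teradata-to-Spark datetime token translations.
TD_TO_SPARK_TOKENS = {
    "YYYY": "yyyy",
    "MM": "MM",      # month-of-year happens to use the same token
    "DD": "dd",
    "HH24": "HH",    # 24-hour clock
    "MI": "mm",      # minutes
    "SS": "ss",
}

# Match longest tokens first so "HH24" is consumed as one unit.
_TOKEN_RE = re.compile("|".join(sorted(TD_TO_SPARK_TOKENS, key=len, reverse=True)))

def td_format_to_spark(fmt: str) -> str:
    """Rewrite a Teradata FORMAT string into a Spark/Databricks datetime pattern."""
    return _TOKEN_RE.sub(lambda m: TD_TO_SPARK_TOKENS[m.group(0)], fmt)

print(td_format_to_spark("YYYY-MM-DD"))            # yyyy-MM-dd
print(td_format_to_spark("YYYY-MM-DD HH24:MI:SS")) # yyyy-MM-dd HH:mm:ss
```

Note the case sensitivity on the Spark side: `MM` is month and `mm` is minute, which is exactly why a blind lowercase of the Teradata format would be wrong.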
| Category | Count | Recommendation |
|---|---|---|
| Internal/Contract | ~183 | Not needed in migration |
| ML Analytics | ~170 | Use MLlib, MLflow, or Python ML libraries |
| System Monitoring | ~63 | Use Databricks system tables |
| Teradata Internal | ~44 | Review case-by-case |
| Spatial/GIS | ~28 | Use H3 or Mosaic library |
| XML | ~28 | Use XPath functions |
| Workload Management | ~18 | Use SQL warehouse config |
| Compression | ~18 | Delta handles natively |
| External I/O | ~15 | Use COPY INTO or external tables |