Comprehensive mapping of Teradata SQL functions to their Databricks equivalents, with automated tests to validate behavioral equivalence.
- 788 unique Teradata functions parsed from `FunctionsV.csv`
- 131 functions mapped to Databricks equivalents (42 direct, 41 partial, 46 expression-based, 2 unmapped)
- 657 functions documented as unmapped with reasons (internal contracts, ML analytics, system monitoring, etc.)
- 278 automated tests (265 passing, 12 xfailed known differences, 1 xpassed)
- 9 workarounds for behavioral differences between TD and Databricks
```
td-function-mapping/
├── inputs/FunctionsV.csv          # Source: Teradata function catalog
├── mapping/
│   ├── function_mapping.json      # Machine-readable mapping
│   ├── function_mapping.csv       # Spreadsheet-friendly mapping
│   └── unmapped_functions.csv     # Unmapped functions + reasons
├── src/
│   ├── parse_td_functions.py      # Parse and categorize TD functions
│   ├── mapping.py                 # Core mapping dictionary
│   ├── generate_outputs.py        # Generate JSON/CSV outputs
│   └── connections.py             # TD + Databricks connection helpers
├── tests/                         # Pytest suite
│   ├── conftest.py                # Fixtures + result recorder
│   ├── test_string_functions.py
│   ├── test_date_functions.py
│   ├── test_math_functions.py
│   ├── test_type_conversion.py
│   ├── test_json_functions.py
│   ├── test_null_functions.py
│   ├── test_bit_functions.py
│   └── test_array_functions.py
├── results/                       # Test results (persistent record)
│   ├── test_report.json           # Per-test-case results
│   ├── test_report.csv            # Spreadsheet-friendly results
│   ├── behavioral_differences.md  # Why certain edge cases differ
│   └── pytest_output.log          # Raw pytest output
├── requirements.txt
├── pytest.ini
└── .env.example
```
- Install dependencies:

  ```
  pip install -r requirements.txt
  ```

- Configure Teradata connection -- copy `.env.example` to `.env` and fill in:

  ```
  TD_HOST=your-teradata-host
  TD_USER=your-username
  TD_PASSWORD=your-password
  ```

- Configure Databricks -- ensure `~/.databrickscfg` has a `[DEFAULT]` profile with `host` and either `token` or `auth_type = databricks-cli` (OAuth).
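The Teradata credentials above are read from `.env` at runtime. As a rough illustration of how a helper like `src/connections.py` might consume that file, here is a minimal stdlib-only `.env` parser; the function name and parsing rules are assumptions (the project may instead use `python-dotenv` or plain environment variables):

```python
def load_dotenv(path=".env"):
    """Parse KEY=VALUE lines from a .env file, skipping blanks and # comments.

    Illustrative sketch only -- not the project's actual loader.
    """
    values = {}
    with open(path) as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip()
    return values
```

A loader like this would return, e.g., `{"TD_HOST": "your-teradata-host", ...}` for the sample `.env` above, ready to pass into a Teradata connection call.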
```
# Run all tests
pytest tests/ -v --junitxml=results/junit.xml 2>&1 | tee results/pytest_output.log

# Run a specific category
pytest tests/test_string_functions.py -v

# Run with detailed tracebacks
pytest tests/ -v --tb=long
```

Regenerate the mapping outputs:

```
python -m src.generate_outputs
```

| Status | Meaning |
|---|---|
| `mapped` | Direct 1:1 equivalent, same semantics |
| `partial` | Equivalent exists but edge cases may differ |
| `expression` | No single function; requires a SQL expression |
| `unmapped` | No Databricks equivalent |
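A consumer of `mapping/function_mapping.json` can branch on these four statuses when rewriting SQL. The sketch below shows one way to do that; the JSON schema, field names, and sample entries are assumptions for illustration, not taken from the repo (the `WEEKNUMBER_OF_YEAR` expression reflects the "subtract 1" workaround noted later):

```python
import json

# Illustrative entries only; the real schema of function_mapping.json may differ.
SAMPLE_MAPPING_JSON = """
{
  "OREPLACE":           {"status": "partial",    "databricks": "REPLACE"},
  "GREATEST":           {"status": "partial",    "databricks": "GREATEST"},
  "WEEKNUMBER_OF_YEAR": {"status": "expression", "databricks": "weekofyear(col) - 1"},
  "AMPUSAGE":           {"status": "unmapped",   "databricks": null}
}
"""

def translate(td_name, mapping):
    """Return (status, databricks_equivalent) for a Teradata function name."""
    entry = mapping.get(td_name.upper())
    if entry is None or entry["status"] == "unmapped":
        return ("unmapped", None)
    return (entry["status"], entry["databricks"])

mapping = json.loads(SAMPLE_MAPPING_JSON)
print(translate("oreplace", mapping))    # ('partial', 'REPLACE')
print(translate("UNKNOWN_FN", mapping))  # ('unmapped', None)
```

A migration tool would rewrite `mapped` and `partial` calls in place (flagging `partial` ones for review against the behavioral differences below), splice in the stored SQL for `expression`, and report `unmapped` functions.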
See `results/behavioral_differences.md` for full details.
Notable differences with workarounds:
- OREPLACE/REPLACE: Empty/NULL search/replacement behaves differently (workaround: CASE guard)
- TRUNC (numeric): DB TRUNC is date-only; use CAST(CAST(x * POWER(10,d) AS BIGINT) AS DOUBLE) / POWER(10,d)
- INITCAP: TD capitalizes after hyphens, DB does not (workaround: SPLIT/TRANSFORM/JOIN)
- REGEXP_SUBSTR/REGEXP_EXTRACT: No match returns NULL (TD) vs '' (DB) (workaround: NULLIF)
- GREATEST/LEAST with NULL: TD skips NULLs, DB propagates them (workaround: COALESCE)
- WEEKNUMBER_OF_YEAR: TD 0-based vs DB ISO 1-based (workaround: subtract 1)
- TRYCAST empty string: TD returns 0, DB returns NULL (workaround: COALESCE)
- Date format strings: TD uses `YYYY-MM-DD`, DB uses `yyyy-MM-dd` (format translator included)
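The last point is mechanical enough to sketch. Below is a minimal token translator of the kind the project bundles; this is not the repo's implementation, and the token table is an assumption covering only a handful of common Teradata tokens (the real translator presumably handles many more):

```python
import re

# Assumed subset of Teradata-to-Spark datetime token translations.
TD_TO_SPARK_TOKENS = {
    "YYYY": "yyyy",
    "MM": "MM",      # month-of-year happens to use the same token
    "DD": "dd",
    "HH24": "HH",    # 24-hour clock
    "MI": "mm",      # minutes
    "SS": "ss",
}

# Match longest tokens first so "HH24" is consumed as one unit.
_TOKEN_RE = re.compile("|".join(sorted(TD_TO_SPARK_TOKENS, key=len, reverse=True)))

def td_format_to_spark(fmt: str) -> str:
    """Rewrite a Teradata FORMAT string into a Spark/Databricks datetime pattern."""
    return _TOKEN_RE.sub(lambda m: TD_TO_SPARK_TOKENS[m.group(0)], fmt)

print(td_format_to_spark("YYYY-MM-DD"))            # yyyy-MM-dd
print(td_format_to_spark("YYYY-MM-DD HH24:MI:SS")) # yyyy-MM-dd HH:mm:ss
```

Note the case sensitivity on the Spark side: `MM` is month and `mm` is minute, which is exactly why a blind lowercase of the Teradata format would be wrong.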
| Category | Count | Recommendation |
|---|---|---|
| Internal/Contract | ~183 | Not needed in migration |
| ML Analytics | ~170 | Use MLlib, MLflow, or Python ML libraries |
| System Monitoring | ~63 | Use Databricks system tables |
| Teradata Internal | ~44 | Review case-by-case |
| Spatial/GIS | ~28 | Use H3 or Mosaic library |
| XML | ~28 | Use XPath functions |
| Workload Management | ~18 | Use SQL warehouse config |
| Compression | ~18 | Delta handles natively |
| External I/O | ~15 | Use COPY INTO or external tables |