Spark SQL defines the timestamp type as TIMESTAMP WITH SESSION TIME ZONE, which is a combination of the fields (YEAR, MONTH, DAY, HOUR, MINUTE, SECOND, SESSION TZ), where the YEAR through SECOND fields identify a time instant in the UTC time zone and SESSION TZ is taken from the SQL config spark.sql.session.timeZone. The session time zone can be set as:

- Zone offset '(+|-)HH:mm'. This form allows us to define a physical point in time unambiguously.
- Time zone name in the form of a region ID 'area/city', such as 'America/Los_Angeles'. This form suffers from problems such as overlapping local timestamps, where one wall-clock value occurs twice. However, each UTC time instant is unambiguously associated with one time zone offset for any region ID, and as a result, each timestamp with a region-ID-based time zone can be unambiguously converted to a timestamp with a zone offset.

By default, the session time zone is set to the default time zone of the Java virtual machine.
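The difference between the two forms can be checked with plain `java.time`, without a Spark session. The following is a minimal sketch; the chosen date (the end of daylight saving time in Los Angeles in 2019) and the variable names are illustrative assumptions, not part of the original text.

```scala
import java.time.{Instant, LocalDateTime, ZoneId, ZoneOffset}

val rules = ZoneId.of("America/Los_Angeles").getRules

// A local timestamp inside the autumn overlap is ambiguous:
// the wall clock shows 01:30 twice on this night.
val ambiguousLocal = LocalDateTime.parse("2019-11-03T01:30:00")
val candidateOffsets = rules.getValidOffsets(ambiguousLocal)
println(s"Valid offsets for $ambiguousLocal: $candidateOffsets")

// A UTC instant, by contrast, always resolves to exactly one offset.
val beforeSwitch = Instant.parse("2019-11-03T08:30:00Z")
val afterSwitch = Instant.parse("2019-11-03T09:30:00Z")
println(s"$beforeSwitch -> ${rules.getOffset(beforeSwitch)}")
println(s"$afterSwitch -> ${rules.getOffset(afterSwitch)}")
```

Both instants fall on a local time of 01:30, but each carries its own offset, which is why converting from a region ID to a zone offset is safe only when starting from an instant.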

Spark’s TIMESTAMP WITH SESSION TIME ZONE is different from:

- TIMESTAMP WITHOUT TIME ZONE, because a value of this type can map to multiple physical time instants, but any value of TIMESTAMP WITH SESSION TIME ZONE is a concrete physical time instant. The SQL type can be emulated by using one fixed time zone offset across all sessions, for instance UTC+0. In that case, we could consider timestamps at UTC as local timestamps.
- TIMESTAMP WITH TIME ZONE, because according to the SQL standard, column values of this type can have different time zone offsets. That is not supported by Spark SQL.
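To see why a timestamp without a time zone can map to several physical instants, the sketch below interprets one wall-clock value in two zones using `java.time`. The date and the zones are arbitrary choices made for illustration.

```scala
import java.time.{Duration, LocalDateTime, ZoneId}

// One wall-clock value with no zone attached.
val wallClock = LocalDateTime.parse("2020-07-01T12:00:00")

// Attaching different zones yields different physical instants.
val asUtc = wallClock.atZone(ZoneId.of("UTC")).toInstant
val asTokyo = wallClock.atZone(ZoneId.of("Asia/Tokyo")).toInstant

println(s"Read as UTC:        $asUtc")
println(s"Read as Asia/Tokyo: $asTokyo")
println(s"Gap between them:   ${Duration.between(asTokyo, asUtc)}")
```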

Spark SQL provides a few methods for constructing date and timestamp values:

- Default constructors without parameters: CURRENT_TIMESTAMP() and CURRENT_DATE().
- From other primitive Spark SQL types, such as INT, LONG, and STRING.
- From external types like Python datetime or the Java classes java.time.LocalDate/Instant.
- Deserialization from data sources such as CSV, JSON, Avro, Parquet, ORC or others.

The function MAKE_DATE, introduced in Spark 3.0, takes three parameters (YEAR, MONTH of the year, and DAY in the month) and makes a DATE value. All input parameters are implicitly converted to the INT type whenever possible. The function checks that the resulting dates are valid dates in the Proleptic Gregorian calendar; otherwise it returns NULL.
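The validity check can be approximated outside Spark, because `java.time.LocalDate` also uses the proleptic Gregorian (ISO) calendar. The sketch below is an emulation only: `makeDateSketch` is a hypothetical helper, not a Spark API, and `None` stands in for SQL NULL.

```scala
import java.time.LocalDate
import scala.util.Try

// Hypothetical helper (not part of Spark): returns None where
// MAKE_DATE is described as returning NULL for an invalid date.
def makeDateSketch(year: Int, month: Int, day: Int): Option[LocalDate] =
  Try(LocalDate.of(year, month, day)).toOption

println(makeDateSketch(2020, 2, 29)) // 2020 is a leap year, so the date exists
println(makeDateSketch(2019, 2, 29)) // 2019 has no 29 February
println(makeDateSketch(2020, 13, 1)) // month is out of range
```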

In [1]:
println(spark.conf.get("spark.sql.session.timeZone"))

/*
There is no difference between the UTC and Etc/UTC time zones.

Etc/UTC is a time zone in the tz database (also known as the Olson or IANA time zone database), in which time zone names follow a uniform convention: Area/Location.

Because some time zones cannot be attributed to any area of the world (that is, a continent or an ocean), the special area Etc (Etcetera) was introduced. It is used mainly for administrative time zones such as UTC.
Thus, to conform with the naming convention, Coordinated Universal Time is named Etc/UTC in the tz database.
*/
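The equivalence of the two names described in the comment above can be checked with `java.time`. This is a small sketch; the probe instant is an arbitrary assumption, and any instant should give the same result.

```scala
import java.time.{Instant, ZoneId, ZoneOffset}

// Any instant works here; this one is picked arbitrarily.
val probe = Instant.parse("2020-07-01T00:00:00Z")

val utcOffset = ZoneId.of("UTC").getRules.getOffset(probe)
val etcUtcOffset = ZoneId.of("Etc/UTC").getRules.getOffset(probe)

println(s"UTC -> $utcOffset, Etc/UTC -> $etcUtcOffset")
```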

## Calendars

- Lunar
- Julian: used only for historical dates
- Gregorian
    - Introduced in 1582
    - Used almost everywhere for civil purposes
- Proleptic Gregorian
    - Extension of the Gregorian calendar to support dates before 1582
    - Default in Java 8, Pandas, R and Apache Arrow
    - Default starting with Spark 3.0 (earlier versions used a hybrid: the Julian calendar for dates before 1582 and the Gregorian calendar afterwards)
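A quick way to observe the proleptic behaviour is through `java.time`, which follows the same calendar. The sketch below uses two well-known differences from the Julian/Gregorian hybrid: the days 5 to 14 October 1582 were skipped at the historical cutover, and the year 1500 is a leap year only under Julian rules. Variable names are illustrative.

```scala
import java.time.{LocalDate, Year}

// 10 October 1582 never occurred in the hybrid calendar,
// but it is an ordinary date in the proleptic Gregorian calendar.
val skippedDay = LocalDate.of(1582, 10, 10)
println(skippedDay)

// Gregorian leap-year rules are applied to years before 1582 as well:
// 1500 is divisible by 100 but not by 400, so it is not a leap year.
val leap1500 = Year.isLeap(1500)
val leap1600 = Year.isLeap(1600)
println(s"1500 leap: $leap1500, 1600 leap: $leap1600")
```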

## TimeStamp

- Extends the `Date` type with `hour`, `minute` and `second` fields (the second can carry a fractional part down to microseconds), together with a global, session-scoped `time zone`
- E.g. year=2012, month=12, day=31, hour=23, minute=59, second=59.123456, session timezone = UTC+01:00
- When writing `Timestamp` to non-text sources like Parquet, the values are just `instants` with no time zone info
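The example fields above can be turned into an instant with `java.time`. This sketch assumes the fields are read as wall-clock values in the session time zone (UTC+01:00); that reading and the variable names are assumptions made for illustration.

```scala
import java.time.{Instant, LocalDateTime, ZoneOffset}
import java.time.temporal.ChronoUnit

// The fields from the example; 123456 microseconds = 123456000 nanoseconds.
val fields = LocalDateTime.of(2012, 12, 31, 23, 59, 59, 123456000)
val sessionOffset = ZoneOffset.ofHours(1)

// The instant carries no zone: it is just a point on the UTC time-line.
val exampleInstant = fields.toInstant(sessionOffset)
println(exampleInstant)

// The same point expressed as microseconds since 1970-01-01T00:00:00Z.
val microsSinceEpoch = ChronoUnit.MICROS.between(Instant.EPOCH, exampleInstant)
println(microsSinceEpoch)
```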

- Instant: The Instant class represents an instant in time
- LocalDate
    - The “local” part of the name refers to the local time-line. Specifically, a LocalDate has no reference to a time-zone or offset from UTC/Greenwich.
- LocalTime
    - It is ideal for representing time on a wall clock, such as 10:15:30. For example, "ten past three" can be represented with `LocalTime` as 03:10:00. `LocalTime` supports nanosecond precision. Its minimum value is midnight at the start of the day (00:00:00.0) and its maximum value is one nanosecond before midnight at the end of the day (23:59:59.999999999).
- LocalDateTime
- OffsetTime
- ZonedDateTime
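The classes listed above build on one another. The following sketch constructs each of them from the same date and time so the relationships are visible; the concrete values and the `Europe/Paris` zone are arbitrary choices for illustration.

```scala
import java.time.{Instant, LocalDate, LocalDateTime, LocalTime, OffsetTime, ZoneId, ZoneOffset, ZonedDateTime}

val sampleDate = LocalDate.of(2020, 7, 1)                    // date only, no zone
val sampleTime = LocalTime.of(15, 10)                        // "ten past three" in the afternoon
val sampleDateTime = LocalDateTime.of(sampleDate, sampleTime) // date + time, still no zone
val sampleOffsetTime = OffsetTime.of(sampleTime, ZoneOffset.ofHours(2)) // time + fixed offset
val sampleZoned = ZonedDateTime.of(sampleDateTime, ZoneId.of("Europe/Paris")) // date + time + region
val sampleInstant: Instant = sampleZoned.toInstant           // the physical point in time

println(sampleDateTime)
println(sampleOffsetTime)
println(sampleZoned)
println(sampleInstant)

// Bounds of LocalTime mentioned above.
println(s"min = ${LocalTime.MIN}, max = ${LocalTime.MAX}")
```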

## TimeZone

- `Date` type doesn't consider time zone

In [2]:
import java.time._
java.time.ZoneId.systemDefault
java.time.ZoneId.of("America/Los_Angeles").getRules.getOffset(java.time.LocalDateTime.parse("1883-11-10T00:00:00"))

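// Treats `dateTime` as a wall-clock value in UTC, then shows that same
// instant in the zone named by `zoneStr` (for example "US/Eastern").
def getDateTimeWithTZOffset(dateTime: LocalDateTime, zoneStr: String): ZonedDateTime = {
    ZonedDateTime.ofInstant(dateTime.toInstant(ZoneOffset.UTC), ZoneId.of(zoneStr))
}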

In [3]:

val localTime = LocalTime.now()
val localDateTime = LocalDateTime.now()

println(s"Local date and time: $localDateTime")
println(s"Local date: ${localDateTime.toLocalDate}")
println(s"Local time: ${localDateTime.toLocalTime}")
println(s"System default timezone: ${java.time.ZoneId.systemDefault}")

println("Etc/UTC: "+getDateTimeWithTZOffset(localDateTime, "Etc/UTC"))
println("US/Eastern: "+getDateTimeWithTZOffset(localDateTime, "US/Eastern"))
println("US/Central: "+getDateTimeWithTZOffset(localDateTime, "US/Central"))
println("US/Mountain: "+getDateTimeWithTZOffset(localDateTime, "US/Mountain"))
println("US/Pacific: "+getDateTimeWithTZOffset(localDateTime, "US/Pacific"))

println("Etc/UTC: "+getDateTimeWithTZOffset(localDateTime, "Etc/UTC").toLocalTime)
println("US/Eastern: "+getDateTimeWithTZOffset(localDateTime, "US/Eastern").toLocalTime)
println("US/Central: "+getDateTimeWithTZOffset(localDateTime, "US/Central").toLocalTime)
println("US/Mountain: "+getDateTimeWithTZOffset(localDateTime, "US/Mountain").toLocalTime)
println("US/Pacific: "+getDateTimeWithTZOffset(localDateTime, "US/Pacific").toLocalTime)

println("Etc/UTC: "+getDateTimeWithTZOffset(localDateTime, "Etc/UTC").toInstant().toEpochMilli())
println("US/Eastern: "+getDateTimeWithTZOffset(localDateTime, "US/Eastern").toInstant().toEpochMilli())
println("US/Central: "+getDateTimeWithTZOffset(localDateTime, "US/Central").toInstant().toEpochMilli())
println("US/Mountain: "+getDateTimeWithTZOffset(localDateTime, "US/Mountain").toInstant().toEpochMilli())
println("US/Pacific: "+getDateTimeWithTZOffset(localDateTime, "US/Pacific").toInstant().toEpochMilli())

ZoneId.getAvailableZoneIds.stream.filter(x=>x.contains("US") ).forEach(println)
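Because `getDateTimeWithTZOffset` always starts from the same instant, the epoch milliseconds printed for every zone in the cell above are identical; only the displayed local time changes. The sketch below contrasts that approach with `atZone`, which keeps the wall-clock fields and changes the instant instead. The fixed date is an assumption chosen so the result does not depend on when the cell is run.

```scala
import java.time.{LocalDateTime, ZoneId, ZoneOffset, ZonedDateTime}

val wall = LocalDateTime.parse("2020-07-01T12:00:00")
val eastern = ZoneId.of("US/Eastern")

// Same approach as getDateTimeWithTZOffset: read `wall` as UTC, then shift the display.
val shifted = ZonedDateTime.ofInstant(wall.toInstant(ZoneOffset.UTC), eastern)

// Alternative: keep the wall-clock fields and attach the zone to them.
val attached = wall.atZone(eastern)

println(s"shifted:  $shifted  (epoch ms ${shifted.toInstant.toEpochMilli})")
println(s"attached: $attached  (epoch ms ${attached.toInstant.toEpochMilli})")
```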

## Date and Time Functions

### Creating Date and Time

#### CURRENT_DATE()/CURRENT_TIMESTAMP()

In [6]:
%%sql
SELECT CURRENT_DATE(), CURRENT_TIMESTAMP()

#### From external types: java.time.LocalDate/Instant

In [27]:
val date = LocalDate.now
val instant = Instant.now

val currTimeInMillis = java.util.Calendar.getInstance().getTimeInMillis()
val sqlDate = new java.sql.Date(currTimeInMillis)
val sqlInstant = new java.sql.Timestamp(currTimeInMillis)
// val legacyDate = new java.util.Date() //java.lang.UnsupportedOperationException: No Encoder found for java.util.Date
//val time = LocalTime.now // fails with java.lang.UnsupportedOperationException: No Encoder found for java.time.LocalTime
//val dateTime = LocalDateTime.now // java.lang.UnsupportedOperationException: No Encoder found for java.time.LocalDateTime

import spark.implicits._
val df1 = List(
    (date, instant, sqlDate, sqlInstant)
).toDF ("java8_local_date", "java8_instant", "old_sql_date", "old_sql_instant")
df1.show(truncate=false)
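The legacy `java.sql` types and the Java 8 types used in the cell above can be converted into each other with standard JDK methods, which is handy when code has to hold both. This is a minimal sketch with fixed sample values; it needs no Spark session.

```scala
import java.time.{Instant, LocalDate}

// LocalDate <-> java.sql.Date
val sampleLocalDate = LocalDate.of(2020, 7, 1)
val legacyDate = java.sql.Date.valueOf(sampleLocalDate)
val roundTrippedDate = legacyDate.toLocalDate

// Instant <-> java.sql.Timestamp
val sampleInstant = Instant.parse("2020-07-01T12:00:00.123456Z")
val legacyTimestamp = java.sql.Timestamp.from(sampleInstant)
val roundTrippedInstant = legacyTimestamp.toInstant

println(s"$sampleLocalDate -> $legacyDate -> $roundTrippedDate")
println(s"$sampleInstant -> $roundTrippedInstant")
```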

## Reference

- [Must read: A Comprehensive Look at Dates and Timestamps in Apache Spark™ 3.0](https://www.databricks.com/blog/2020/07/22/a-comprehensive-look-at-dates-and-timestamps-in-apache-spark-3-0.html)