This is a common function for databases supporting TIMESTAMP WITHOUT TIMEZONE. This function
takes a timestamp which is timezone-agnostic, and interprets it as a timestamp in UTC, and
renders that timestamp as a timestamp in the given time zone.
However, timestamp in Spark represents number of microseconds from the Unix epoch, which is not
timezone-agnostic. So in Spark this function just shift the timestamp value from UTC timezone to
the given timezone.
This function may return confusing result if the input is a string with timezone, e.g.
‘2018-03-13T06:18:23+00:00’. The reason is that, Spark firstly cast the string to timestamp
according to the timezone in the string, and finally display the result by converting the
timestamp to string according to the session local timezone.
New in version 1.5.0.
the column that contains timestamps
A string detailing the time zone ID that the input should be adjusted to. It should
be in the format of either region-based zone IDs or zone offsets. Region IDs must
have the form ‘area/city’, such as ‘America/Los_Angeles’. Zone offsets must be in
the format ‘(+|-)HH:mm’, for example ‘-08:00’ or ‘+01:00’. Also ‘UTC’ and ‘Z’ are
supported as aliases of ‘+00:00’. Other short names are not recommended to use
because they can be ambiguous.
Changed in version 2.4: tz can take a Column containing timezone ID strings.
>>> df = spark.createDataFrame([('1997-02-28 10:30:00', 'JST')], ['ts', 'tz'])
>>> df.select(from_utc_timestamp(df.ts, "PST").alias('local_time')).collect()
[Row(local_time=datetime.datetime(1997, 2, 28, 2, 30))]
>>> df.select(from_utc_timestamp(df.ts, df.tz).alias('local_time')).collect()
[Row(local_time=datetime.datetime(1997, 2, 28, 19, 30))]