[Python] Timestamp with tz loses its time zone after to_numpy #45644

Open
sharkdtu opened this issue Feb 28, 2025 · 1 comment
Labels
Component: Python Type: usage Issue is a user question

Comments


sharkdtu commented Feb 28, 2025

Describe the bug, including details regarding any error messages, version, and platform.

A timestamp array with a time zone loses its time zone after to_numpy().

Versions / Dependencies

In [44]: pa.__version__
Out[44]: '19.0.1'

Reproduction script

In [39]: import pyarrow as pa

In [40]: x = pa.array([1735689600, 1735689600, 1735689600], type=pa.timestamp("s", tz='UTC'))

In [41]: print(x.type)
timestamp[s, tz=UTC]

In [42]: y = x.to_numpy()

In [43]: print(y.dtype)
datetime64[s]

Component(s)

Python

@sharkdtu sharkdtu changed the title [Python] Timestamptz type loses its time zone after map transforming. [Python] Timestamp with tz loses its time zone after to_numpy Feb 28, 2025
Member

AlenkaF commented Mar 3, 2025

This behaviour is expected as NumPy datetimes are not timezone aware, see https://numpy.org/devdocs/reference/arrays.datetime.html#datetimes-and-timedeltas.

You can convert a pyarrow tz-aware timestamp array to:

  • a NumPy datetime64 array, with loss of tz information,
  • a pandas tz-aware datetime64 dtype,
  • a Python object.

>>> import pyarrow as pa
>>> arr_tz = pa.array([1735689600, 1735689600, 1735689600], type=pa.timestamp("s", tz='UTC'))

# numpy datetime64 dtype (losing tz information)
>>> arr_tz.to_numpy()
array(['2025-01-01T00:00:00', '2025-01-01T00:00:00',
       '2025-01-01T00:00:00'], dtype='datetime64[s]')
# pandas tz-aware datetime64 dtype
>>> arr_tz.to_pandas()
0   2025-01-01 00:00:00+00:00
1   2025-01-01 00:00:00+00:00
2   2025-01-01 00:00:00+00:00
dtype: datetime64[s, UTC]
# Python datetime objects (tz preserved via tzinfo)
>>> arr_tz.to_pandas(timestamp_as_object=True).to_numpy()
array([datetime.datetime(2025, 1, 1, 0, 0, tzinfo=<UTC>),
       datetime.datetime(2025, 1, 1, 0, 0, tzinfo=<UTC>),
       datetime.datetime(2025, 1, 1, 0, 0, tzinfo=<UTC>)], dtype=object)

I will keep this issue open, as this behaviour needs to be documented in https://arrow.apache.org/docs/python/numpy.html.
A related issue that also needs to go into the docs: #41162.

@AlenkaF AlenkaF added Type: usage Issue is a user question and removed Type: bug labels Mar 3, 2025