I'm working with a pandas DataFrame whose DatetimeIndex is timezone-aware. I want to compute the UTC offset (in hours) of each timestamp and store it in a new column of the DataFrame.

Current Approach:

I'm using .map() with a lambda to extract the UTC offset from each timestamp:

import pandas as pd

# Sample DataFrame setup
timestamps = pd.date_range('2024-01-01 00:00:00', '2024-12-31 23:59:59', freq='5min', tz='Europe/Brussels')
df = pd.DataFrame({'value': range(len(timestamps))}, index=timestamps)

# Computing timezone_offset using .map() and lambda
df['timezone_offset'] = df.index.map(
    lambda x: x.utcoffset().total_seconds() / 3600 if x.utcoffset() else 0)

print(df['timezone_offset'][:5])

OUTPUT

2024-01-01 00:00:00+01:00    1.0
2024-01-01 00:05:00+01:00    1.0
2024-01-01 00:10:00+01:00    1.0
2024-01-01 00:15:00+01:00    1.0
2024-01-01 00:20:00+01:00    1.0

Issue:

While this method populates the 'timezone_offset' column correctly, it slows down considerably as the DataFrame grows. Even at 100,000 rows the run time is noticeable, which makes it a bottleneck for larger datasets or real-time applications.

Objective:

I want to vectorize the offset calculation to cut computation time. Ideally, I'd avoid row-wise operations such as .map() with a lambda, which scale poorly on large datasets.

1 Answer

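One fast solution: drop the timezone from the local timestamps and from their UTC equivalents, then subtract. The difference between the two wall-clock times is the UTC offset: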

df["offset"] = df.index.tz_localize(None) - df.index.tz_convert('UTC').tz_localize(None)
df["offset"] = df["offset"].dt.total_seconds() / 3600
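To check that the subtraction really tracks daylight saving time, here is a minimal sketch comparing it with the row-wise .map() result. The short sample range around the 2024-03-31 DST change in Brussels and the 'offset_map' column name are assumptions made for illustration only; they are not part of the original question.

```python
import pandas as pd

# Hypothetical small sample spanning the spring DST change (2024-03-31, Europe/Brussels).
idx = pd.date_range("2024-03-31 00:00", "2024-03-31 06:00", freq="60min", tz="Europe/Brussels")
df = pd.DataFrame({"value": range(len(idx))}, index=idx)

# Vectorized: local wall-clock time minus UTC wall-clock time gives a TimedeltaIndex.
delta = df.index.tz_localize(None) - df.index.tz_convert("UTC").tz_localize(None)
df["offset"] = delta.total_seconds() / 3600

# Row-wise reference, as in the question, for comparison only.
df["offset_map"] = df.index.map(lambda ts: ts.utcoffset().total_seconds() / 3600)

print(df[["offset", "offset_map"]])
```

Both columns should agree row for row, with the offset moving from 1.0 to 2.0 once the clocks go forward.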

2 Comments

Doing the calculation in Unix-time nanoseconds should give a bit of extra performance: (df.index.tz_localize(None).astype(int) - df.index.tz_convert('UTC').tz_localize(None).astype(int)) / 3_600_000_000_000. Some might add "at the cost of readability".
Indeed. The OP's attempt takes around 0.900 s, my solution takes 0.010 s, and it drops to 0.007 s with your extra tip.
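For anyone who wants to reproduce this comparison, here is a rough sketch that times the three variants on the question's sample index. The function names are hypothetical, the integer variant assumes the index has nanosecond resolution (which is what pd.date_range produces here), and it uses astype("int64") rather than astype(int) to make the integer width explicit. Absolute timings will vary by machine and pandas version.

```python
import timeit

import pandas as pd

NS_PER_HOUR = 3_600_000_000_000

# Same index as in the question: one year at 5-minute steps.
idx = pd.date_range("2024-01-01 00:00:00", "2024-12-31 23:59:59", freq="5min", tz="Europe/Brussels")


def offset_map(index):
    # Row-wise approach from the question.
    return index.map(lambda ts: ts.utcoffset().total_seconds() / 3600)


def offset_timedelta(index):
    # Answer's approach: subtract UTC wall-clock time from local wall-clock time.
    delta = index.tz_localize(None) - index.tz_convert("UTC").tz_localize(None)
    return delta.total_seconds() / 3600


def offset_int(index):
    # Comment's approach: subtract raw int64 nanoseconds (assumes ns resolution).
    local_ns = index.tz_localize(None).astype("int64")
    utc_ns = index.tz_convert("UTC").tz_localize(None).astype("int64")
    return (local_ns - utc_ns) / NS_PER_HOUR


for func in (offset_map, offset_timedelta, offset_int):
    elapsed = timeit.timeit(lambda: func(idx), number=1)
    print(f"{func.__name__}: {elapsed:.3f} s")
```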
