I am a bit confused about time series dataset preparation. All the examples I found online that use tree-based models define the input features and target as:
X = df.drop(['target'], axis=1)
y = df["target"]
i.e., we use the input features and the target from the same timestamp.
With an LSTM, say with a window of size x, we use the input features of x consecutive timestamps to predict the target at the (x+1)th timestamp. If x were 1, i.e. a window of size 1, we would use the input features of just the ith timestamp to predict the target at the (i+1)th timestamp.
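To show concretely what I mean by the LSTM windowing, here is a minimal sketch with made-up toy data (the function name and shapes are just my illustration):

```python
import numpy as np

def make_windows(features, target, window):
    """Slide a window of `window` timestamps over the features;
    each window is paired with the target one step ahead."""
    X, y = [], []
    for i in range(len(features) - window):
        X.append(features[i : i + window])  # timestamps i .. i+window-1
        y.append(target[i + window])        # timestamp i+window
    return np.array(X), np.array(y)

# toy data: 10 timestamps, 2 features each
features = np.arange(20).reshape(10, 2)
target = np.arange(10)

X, y = make_windows(features, target, window=3)
print(X.shape)  # (7, 3, 2) -> (samples, window, features), the shape an LSTM expects
print(y[:3])    # [3 4 5] -> each target is one step after its window
```

So every training sample pairs x timestamps of features with the next timestamp's target.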
With a tree-based model, though, I end up using the input features of the ith timestamp to predict the target of that same ith timestamp.
So for the tree-based model, should we shift the target column by 1 and predict one step ahead, similar to what we do for the LSTM?
Or what is the correct way to prepare the input dataset?
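For reference, this is the kind of shift I am asking about; a minimal sketch with hypothetical column names:

```python
import pandas as pd

# hypothetical frame: one feature column plus a 'target' column
df = pd.DataFrame({
    "feat":   [10, 11, 12, 13, 14],
    "target": [0, 1, 2, 3, 4],
})

# shift the target up by one row so that row i's features
# line up with the target at timestamp i+1
df["target_next"] = df["target"].shift(-1)
df = df.dropna()  # the last row has no next-step target

X = df[["feat"]]        # features at timestamp i
y = df["target_next"]   # target at timestamp i+1
print(y.tolist())       # [1.0, 2.0, 3.0, 4.0]
```

Is this the right way to set up the frame before the usual drop/train split, or is training on the same-timestamp target actually intended?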