
Multivariate time series are, to the best of my understanding, one of the few cases where Deep Learning still hasn't had its AlexNet moment. I'm especially interested in the case where most of the time series are continuous variables, with a few being categorical. To fix ideas, think of the sensors of a big industrial machine: some sensors record real-valued time series (pressure, temperature, speed, etc.) and others record categorical time series ("running/not running", "alarm 1/alarm 2/no alarm", "valve open/valve closed", etc.). If the introduction of categorical variables makes the problem too hard, no problem - we can consider only continuous variables.

Some of these time series have a lot of missing data (the norm, rather than the exception, in industry), and I'd like to perform missing-value imputation. In theory, a generative model seems a perfect fit for such a problem, so I thought of VAEs. In practice, all the applications of VAEs to missing-data imputation that I know of are related to images; I've never seen them applied to missing-data imputation for multivariate time series.

Can they be used? If so, what is the main modification I need to make to the architecture in order to get a performant implementation for my use case? If not, which other Deep Learning or AI models are suitable for missing-data imputation with multivariate time series?

  • I'm curious: are there any examples of deep learning doing well in an area of application in which there is a decent amount of noise in the response? By noise in the response, I mean that even a perfect model will have a good deal of uncertainty. To illustrate, classification in images generally does not have noise in the response; a picture of a dog is usually unquestionably a picture of a dog, although generating a rule to go from pixel values to classification is likely complicated. Calling a coin flip with only mild bias would have lots of noise; the best model can't do better than $p$. Commented Mar 24, 2019 at 1:48
  • We have a recent paper on this problem: arxiv.org/abs/1907.04155 Commented Jul 11, 2019 at 12:05
  • @Vincent then write an answer here Commented Jul 11, 2019 at 15:58
  • It is more a question of how to train the network than of modifications to the architecture. Vincent's work looks nice; GPs are a natural choice, as they can handle irregular spacing by construction, compared to AR-type models. Another option is to utilise GANs; see the recent work ojs.aaai.org/index.php/AAAI/article/view/17086 Commented Jun 30, 2021 at 9:37
  • @MehmetSuzen thanks for the other link, but it would be better if you wrote an answer to my question based on these links. Commented Jun 30, 2021 at 21:19

2 Answers


Can they be used?

Yes. Autoencoders (AEs) are dimensionality-reduction techniques. In 1-D, one could formulate a mapping from a series with missing data to the full series, $\mathbb{R}^{m} \rightarrow \mathbb{R}^{n}$ with $m<n$, i.e. $n-m$ missing time points. This is the conceptual idea. However, training a vanilla AE may not be possible without introducing prior knowledge about the low-dimensional (missing-data) set, which is why people pair AEs with other techniques, such as a Gaussian process linking the observed series $x_{t}$ to the full latent series $z_{t}$. A sketch of the formulation in this case looks like $$ p_{\theta}(z_{t} \mid x_{t}) = \mathcal{N}\big(g_{\theta}(x_{t}), \sigma^{2}\mathbf{I}\big)$$

where inference amounts to finding the parameters $\theta$, and the AE plays the role of the function $g$. See GP-VAE: Deep Probabilistic Time Series Imputation.
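To make this concrete, here is a minimal sketch of VAE-based imputation on fixed-length multivariate series in plain PyTorch. This is not the GP-VAE itself; the zero-fill-plus-mask encoding, layer sizes, and all names here are illustrative assumptions. The encoder sees the observed values and the observation mask, the decoder outputs the Gaussian mean over the full series, and the reconstruction loss is computed on observed entries only.

```python
import torch
import torch.nn as nn

class MaskedVAE(nn.Module):
    """Encode the zero-filled series together with its observation mask,
    decode the mean of a Gaussian over the full series."""
    def __init__(self, n_steps, n_channels, latent_dim=16, hidden=128):
        super().__init__()
        d = n_steps * n_channels
        self.enc = nn.Sequential(nn.Linear(2 * d, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent_dim)
        self.logvar = nn.Linear(hidden, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, d))

    def forward(self, x, mask):               # x, mask: (batch, steps, channels)
        b = x.shape[0]
        h = self.enc(torch.cat([x * mask, mask], dim=-1).reshape(b, -1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterisation
        return self.dec(z).reshape_as(x), mu, logvar

def elbo_loss(recon, x, mask, mu, logvar):
    # Reconstruction error on observed entries only, plus the KL term.
    nll = ((recon - x) ** 2 * mask).sum()
    kl = -0.5 * (1.0 + logvar - mu ** 2 - logvar.exp()).sum()
    return nll + kl

# After training, impute by keeping observed values and filling the gaps
# with the decoder's mean: x_hat = x * mask + recon * (1 - mask)
```

Feeding the mask to the encoder lets the model distinguish a true zero from a missing value, which matters for sensor data where zero is often a meaningful reading.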

If so, what is the main modification I need to make to the architecture in order to get a performant implementation for my use case?

The core issue is not only selecting an appropriate architecture but also formulating the problem correctly.

which other Deep Learning or AI models are suitable for missing-data imputation with multivariate time series?

GANs have also been utilised for this task in the literature; see, for example, the AAAI paper linked in the comments above.

  • thanks for the answer. I think it needs some clarification: 1) I explicitly asked about methods for multivariate time series imputation, but you mention "a mapping from missing data series to full series, in 1-D, $\mathbb{R}^{m} \rightarrow \mathbb{R}^{n}$". So is this an approach for 1D time series, or multivariate ones? IIUC, it's for 1D time series of length $n$, with $n-m$ missing data points. Correct? Commented Jul 1, 2021 at 7:50
  • Also, you write $p_{\theta}(x_{t} \mid z_{t}) = \mathcal{N}(g_{\theta}(z_{t}), \sigma^{2}\mathbf{I})$. Shouldn't it be $p_{\theta}(z_{t} \mid x_{t})$? We are conditioning on the actual data $x_{t}$ and inferring the full data $z_{t}$, not the other way around, so we should be interested in the posterior distribution of $z_{t}$ given the data $x_{t}$. Commented Jul 1, 2021 at 7:52
  • The 1D notation was just to give the idea. True, the conditioning should be the other way around (corrected). Commented Jul 1, 2021 at 17:19

Wouldn't you consider language to be a type of time series? How about OpenAI Five's representation of the Dota game state as a time series with 20,000 continuous and discrete variables?

The tools of choice for such sequence modeling are LSTMs, Transformers, and other autoregressive models. You can always tack a latent prior onto any of these models (see "recurrent VAE"), but it is not necessary, because any sequence distribution can be factorised as $p(x_0, x_1, \ldots) = p(x_0) \prod_{j \ge 1} p(x_j \mid x_{<j})$.
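As an illustration of that factorisation, here is a minimal autoregressive sketch in PyTorch (the Gaussian output assumption, layer sizes, and names are mine, not from a specific paper): an LSTM predicts the mean of $p(x_j \mid x_{<j})$ from the past, so training with squared error is one-step-ahead Gaussian maximum likelihood up to constants.

```python
import torch
import torch.nn as nn

class ARModel(nn.Module):
    """p(x_t | x_{<t}) as a Gaussian whose mean an LSTM predicts from the
    past, realising p(x) = p(x_0) * prod_t p(x_t | x_{<t})."""
    def __init__(self, n_channels, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_channels, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_channels)

    def forward(self, x):            # x: (batch, steps, channels)
        h, _ = self.lstm(x[:, :-1])  # condition on the past only
        return self.head(h)          # predicted mean of x[:, 1:]

# Training with MSE between forward(x) and x[:, 1:] is Gaussian maximum
# likelihood (up to constants) for the one-step-ahead conditionals.
```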

The difficulty with imputation using these models is that you'd presumably be trying to do something like $\max_{X_i, \ldots, X_j} p(x_0 = X_0, \ldots, x_{i-1}=X_{i-1}, x_i=X_i, \ldots, x_j=X_j, \ldots)$, and, due to the autoregressive nature of the model, there is no easy way to carry out the maximization. If you just want to sample $X_i, \ldots, X_j$ from the modeled distribution with the other values fixed, there is no easy way to do that either. Perhaps some form of MCMC sampling would not be too expensive; a sketch follows.
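As a sketch of that last idea, here is random-walk Metropolis over the missing entries, assuming a hypothetical `model.log_prob(x)` that returns the full-sequence log-likelihood under the autoregressive factorisation (the step size and iteration count are arbitrary choices):

```python
import torch

def mcmc_impute(model, x, mask, n_iters=500, step=0.1):
    """Random-walk Metropolis over the missing entries of a series.
    `model.log_prob(x)` is an assumed method returning the full-sequence
    log-likelihood under the autoregressive factorisation."""
    x = x.clone()
    logp = model.log_prob(x)
    for _ in range(n_iters):
        # Perturb only the missing positions (mask == 1 where observed).
        prop = x + step * torch.randn_like(x) * (1 - mask)
        logp_prop = model.log_prob(prop)
        # Metropolis acceptance: always accept a more likely proposal.
        if torch.rand(()) < (logp_prop - logp).exp():
            x, logp = prop, logp_prop
    return x
```

Each proposal needs a full forward pass to score, so this is slow but straightforward; better mixing would need problem-specific proposals.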

