-1
$\begingroup$

I have been tasked with building a quantile regression model (5%, 50% and 95%) for tomorrow's power production. However, I am trying to grasp which quantiles we are talking about. Wikipedia (and similar sites) state that:

Whereas the method of least squares estimates the conditional mean of the response variable across values of the predictor variables, quantile regression estimates the conditional median (or other quantiles) of the response variable

This definition only makes sense in the context of a fixed response variable. But my customer doesn't care which input features I use for my model:

  • If I have no input features available, my best guess is probably to try and predict the unconditional quantiles of my target $Y$, e.g. using the sample quantiles.
  • If I have an input feature set $X$, I could try and predict the conditional quantiles for $Y|X=x$, e.g. using linear quantile regression.
  • If I had another input feature set $Z$, I could do the above for this instead.

All models above seem valid, but they try to predict different things (quantiles for $Y$, $Y|X=x$, $Y|Z=z$, etc.). So how am I to understand the "quantiles" referred to in quantile regression?

$\endgroup$
13
  • 2
    $\begingroup$ I don't get what your question is. You correctly state that the quantile being predicted is the quantile of the response conditioned on the regressors (predictors). What else is not clear? $\endgroup$ Commented Jul 31 at 8:25
  • $\begingroup$ Say you were given the task: produce forecasts for the 5%, 50% and 95% quantiles of tomorrow's power production. How am I to understand the quantiles being referred to here? Such quantiles don't exist in the real world; they only make sense when you model the power production as a random variable. And in that case, there is an infinite variety of response variables I could use for my model, and thus an infinite variety of conditional probability distributions. $\endgroup$ Commented Jul 31 at 8:38
  • 1
    $\begingroup$ "This definition only makes sense in the context of a fixed response variable." It is unclear what you mean by fixed response variable. The definition speaks about a response conditional/given other variables. $\endgroup$ Commented Jul 31 at 10:03
  • 1
    $\begingroup$ "So how am I to understand the "quantiles" referred to in quantile regression?" Read about it, look at some examples, then formulate more precisely what is confusing. "All models above seem valid, but they try to predict different things" Sure, different models will exist. Is that a problem? $\endgroup$ Commented Jul 31 at 10:07
  • 1
    $\begingroup$ Why would the explanatory variable need to be fixed for that definition? $\endgroup$ Commented Jul 31 at 10:09

2 Answers

6
$\begingroup$

Your task is to predict quantiles for tomorrow's power production. These are the quantiles you are interested in.

How you do that is a separate question. You could just take historical quantiles. (This would indeed be an extremely useful benchmark, and may actually be hard to beat.) You could use a time series model, because power production may have a strong seasonal effect, e.g., if there is a lot of solar power involved. Or you could use any kind of predictor - maybe more power is produced when electricity spot prices are high.
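
The "historical quantiles" benchmark mentioned above could look roughly like the following. This is only a minimal sketch: the function name `historical_quantiles` and the toy data are made up for illustration, and it assumes past daily production is available as a plain list of numbers.

```python
import statistics


def historical_quantiles(history, levels=(0.05, 0.50, 0.95)):
    """Use empirical quantiles of past production as tomorrow's forecast."""
    # With n=100, statistics.quantiles returns the 99 cut points for 1%..99%.
    cuts = statistics.quantiles(history, n=100, method="inclusive")
    return {q: cuts[round(q * 100) - 1] for q in levels}


# Hypothetical daily production values (illustration only, not real data).
history = [float(v) for v in range(1, 102)]  # 1.0, 2.0, ..., 101.0

forecast = historical_quantiles(history)
print(forecast)
```

Any more elaborate model (time series, extra predictors) can then be compared against this forecast on held-out days.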

Your customer or supervisor is most interested in your output, i.e., in how good your quantile predictions are. To a lesser degree, they will likely be interested in your model: any complexity comes with a cost, so if using a time series model only yields a small incremental benefit over the "historical quantiles" method, this may simply not be worth the additional hassle.

Your confusion may be between what you are interested in (quantiles, conditional on some information set) and how you get it (which predictors to use, and in what model).

$\endgroup$
1
  • $\begingroup$ I think you are right. And I think I probably confused myself by thinking in terms of quantiles of the population distribution $Y$, while in reality I should be thinking of quantiles of the sampling distribution for the upcoming outcome $Y_t$. $\endgroup$ Commented Jul 31 at 10:54
2
$\begingroup$

After all the discussion in comments, your confusion seems to be that you assume that the regression always has a single scalar input (predictor). That is not the case.

It is perfectly fine for a regression model to predict a target from multiple predictor variables (or, equivalently, from one vector-valued predictor variable). Each instance of the target is then associated with a vector of predictor values. Defining a conditional expectation or conditional quantile when the predictor is a vector is trivial and completely analogous to the scalar-valued case. "The conditional quantile" then refers to the quantile conditioned on however many predictors you have in your model. Don't overthink it.
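
For reference, one standard way to write this down is given here. The symbols $\tau$, $x$ and $Q_\tau$ are notation introduced for illustration and are not taken from the discussion above. For a level $\tau \in (0,1)$ and a predictor value $x$:

$$Q_\tau(Y \mid X = x) = \inf\{\, y : P(Y \le y \mid X = x) \ge \tau \,\}$$

Nothing in this definition depends on whether $x$ is a scalar or a vector.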

$\endgroup$
6
  • $\begingroup$ No, this is not my confusion. My confusion is that the choice of predictor variable is arbitrary. Say Data Scientist A has predictors $X$ available and Data Scientist B has predictors $Z$ available. What is meant by the conditional quantile now: $Y|X$ or $Y|Z$? $\endgroup$ Commented Jul 31 at 12:37
  • $\begingroup$ Either one. A will return a prediction based on $X$, B will return one based on $Z$. Perhaps one prediction is clearly better, perhaps not. I also still struggle to understand what your question really is. You could take a look at the forecasting literature, e.g., the International Journal of Forecasting, which is full of papers comparing predictions based on different models or predictors. (Whether you are predicting conditional quantiles, or conditional expectations, or full densities does not seem to have a bearing here.) $\endgroup$ Commented Jul 31 at 12:41
  • 1
    $\begingroup$ I appreciate your help. I think I have a hard time formulating what I really mean, so I think we misunderstand each other. I just find it difficult to understand: when a point forecast is needed, people typically want the forecast which will give the smallest error in future predictions. For quantiles it is different, since there are no "observed quantiles". So what is a good quantile forecast? How to evaluate it? $\endgroup$ Commented Jul 31 at 12:54
  • 1
    $\begingroup$ That actually is a different question, and one we can answer. A "good" quantile forecast is the correct quantile of the conditional distribution. You are right that we cannot directly observe it. But we can use the pinball loss to evaluate a quantile forecast, and this loss will be minimal in expectation for the correct quantile forecast (Gneiting, 2011, IJF). ... $\endgroup$ Commented Jul 31 at 13:15
  • 1
    $\begingroup$ ... and this is actually precisely analogous to any other forecast. We may want an expectation forecast. But we will never see the conditional expectation "in the data". Solution: use the MSE, which is minimized in expectation by this conditional expectation. (I know, too many "expectations" around here.) You may find Kolassa (2020, IJF) useful on this and on the entire concept of "smallest error", which is trickier than it looks at first glance. $\endgroup$ Commented Jul 31 at 13:17
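
The pinball loss mentioned in the comments above can be sketched in a few lines. This is an illustrative sketch, not code from any of the commenters: the function names and the toy outcomes are hypothetical, and it scores only constant forecasts to keep the example small.

```python
def pinball_loss(y, q_hat, tau):
    """Pinball (quantile) loss of forecast q_hat for outcome y at level tau."""
    diff = y - q_hat
    # Outcomes above the forecast are weighted by tau, outcomes below by 1 - tau.
    return tau * diff if diff >= 0 else (tau - 1) * diff


def mean_pinball_loss(ys, q_hat, tau):
    """Average pinball loss of one constant forecast over many outcomes."""
    return sum(pinball_loss(y, q_hat, tau) for y in ys) / len(ys)


# Hypothetical outcomes 1..101 (illustration only, not real data).
outcomes = list(range(1, 102))
tau = 0.95

# Score every candidate constant forecast and keep the one with the lowest loss.
best = min(outcomes, key=lambda c: mean_pinball_loss(outcomes, c, tau))
print(best)
```

On this toy data the lowest average loss is reached at 96, which is the 95% sample quantile of the outcomes. That is the sense in which the loss rewards the correct quantile even though no quantile is ever observed directly.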
