Questions tagged [high-dimensional]
Pertains to data with a large number of features or dimensions (variables). (For a large number of data points, use the tag [large-data]; if the issue is having more variables than observations, use the [underdetermined] tag.)
428 questions
1 vote · 0 answers · 80 views
Estimation of many Gaussian process hyper-parameters
I am working with a Gaussian process $(X_t(x))_{x \in [0,1], t \geq 0}$ which evolves jointly in space and time. I know the statistics of this process: $\mathbf{E} X_t(x) = X_0(x) e^{-\mu r t} + (1-e^{...
1 vote · 0 answers · 26 views
Does compositional structure (actually) mitigate the curse of dimensionality?
The paper "Deep Quantile Regression: Mitigating the Curse of Dimensionality Through Composition" makes the following claim (top of page 4):
It is clear that smoothness is not the right ...
1 vote · 0 answers · 98 views
How to estimate an empirical, nonparametric PDF for high-dimensional data?
I'm trying to calculate a PDF for a dataset with high dimensionality (thousands of variables and hundreds of thousands of observations), which is not assumed to be normal (or any other common ...
1 vote · 0 answers · 57 views
Does Double/Debiased Machine Learning (DML) apply if I have cross terms (interaction terms)?
Suppose I have a regression $Y=a_0+a_1D+a_2DX_1+a_3X_1+a_4X_2+\cdots+e$. I'm interested in the coefficient in front of $D$ and $D$ interacting with $X_1$ (I want to see the direct effect of $D$ and ...
1 vote · 0 answers · 110 views
High-Dimensional Function Approximation with Uncertainty Quantification
Gaussian processes are considered the gold standard for regression with formal uncertainty guarantees. For this reason, they are used extensively to model system dynamics by researchers in the domain ...
1 vote · 1 answer · 110 views
Identifying potential food triggers for regurgitation using meal-level data
I'm helping someone track a recurring health symptom (regurgitation) that appears to be triggered by certain foods. We have a food log with 309 meals, each labeled as breakfast, lunch, or dinner. The ...
1 vote · 0 answers · 38 views
PCA and sparse PCA - what to use, how to interpret [duplicate]
I have a question regarding the application of two techniques, PCA and sparse PCA, and the interpretation of the results they yield.
I have a proteomics dataset, 10 subjects in each group (3 groups in total -...
0 votes · 0 answers · 100 views
PCA on zero-inflated data [duplicate]
I have a dataset with 5 groups: 3 consist of patients with different cancer types, one consists of patients with a benign tumour, and another is a healthy control group. The proteins are measured in a way ...
3 votes · 1 answer · 129 views
Maximum likelihood with regularization
Maximum likelihood estimators (subject to regularity conditions) have very nice asymptotic properties. However, with high-dimensional data you are unlikely to have sufficient observations for this ...
1 vote · 0 answers · 62 views
Transformed fitting of a lasso model
Consider the standard LASSO regression problem:
$$
\hat{\beta} = \arg\min_{\beta} \frac{1}{2} \| y - X\beta \|^2 + \lambda \sum_{j=1}^{p} |\beta_j|.
$$
Now, suppose we fit the LASSO model using a ...
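For concreteness, the objective above can be minimized directly with proximal gradient descent (ISTA). This is a minimal numpy sketch; the toy data, the choice $\lambda = 5$, and the iteration count are my own illustrative assumptions, not part of the question:

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator, the proximal map of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=2000):
    """Minimize (1/2)||y - X b||^2 + lam * ||b||_1 by proximal gradient (ISTA)."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1/L, L = largest eigenvalue of X'X
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y)
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

# Toy data: n = 100 samples, p = 10 features, sparse true coefficients.
rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + 0.1 * rng.standard_normal(n)

beta_hat = lasso_ista(X, y, lam=5.0)
print(np.round(beta_hat, 2))  # sparse estimate close to beta_true
```

The soft-thresholding step is what produces exact zeros, which is why the $\ell_1$ penalty acts as a variable selector.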
1 vote · 0 answers · 43 views
Naturally ordered data in the high-dimensional covariance matrix estimation problem
I am reading some articles about covariance matrix estimation in the high-dimensional case, and the authors often mention a natural order of the variables (for example https://arxiv.org/abs/0901.3079 or https:/...
2 votes · 0 answers · 57 views
Gaussian Graphical Models: Why Are Bounded Eigenvalues a Standard Assumption of High-Dimensional GGM Methodology?
From reading the literature on Gaussian graphical model methodology for high-dimensional data (where we may have dimension d > n), it is clear that the assumption of uniformly bounded eigenvalues ...
0 votes · 0 answers · 76 views
Visualizing high-dimensional vectors in 2D polar space
This is actually a dimensionality-reduction problem, but using t-SNE or UMAP requires finding the right parameters and depends on dataset availability. The problem is, the number of samples is increasing ...
0 votes · 0 answers · 47 views
Can convolutional neural networks be used for mixed-frequency time series?
For example, in weather prediction different sensors can collect data at different frequencies: 15 seconds, 30 seconds, 1 minute. Can one network that predicts a value every 1 hour use all the data ...
0 votes · 0 answers · 46 views
Does the curse of dimensionality apply to the Dirichlet distribution?
I was analyzing how the solution space of the Dirichlet distribution evolves as the number of parameters increases. I initially attempted to measure the "volume" covered by the Dirichlet ...
5 votes · 0 answers · 158 views
Rate of convergence of $\ell_1$-penalized quantile regression is $\sqrt{\frac{s\log (p \vee n)}{n}}$
In the standard LASSO literature, you often encounter that the LASSO estimator converges at a rate of $\sqrt{\frac{s\log p}{n}}$ (see e.g. this post).
A related method is the $\ell_1$-penalized ...
26 votes · 2 answers · 4k views
Why doesn't ML suffer from the curse of dimensionality?
Disclaimer: I asked this question on Data Science Stack Exchange 3 days ago, and got no response so far. Maybe it is not the right site. I am hoping for more positive engagement here.
This is a ...
0 votes · 0 answers · 98 views
What does consistency mean in high-dimensional settings?
In some papers about $\ell_1$-penalized regression,
$$
\hat{\beta}=\underset{\beta\in\mathbb{R}^{p}}{\operatorname{\arg\min}}\|Y-X\beta\|_2^2+\lambda\|\beta\|_1,
$$
the authors say that they ...
3 votes · 1 answer · 359 views
Bayesian inference in high-dimension for a non-linear multimodal model
Consider the following model: I am sampling a Bernoulli variable with a probability $p$ given by
\begin{equation}
p(\omega_i, \tau) := \frac{1}{2} + \frac{1}{2 n} \left[ \sum_{i=1}^{n} \cos (\omega_i \...
1 vote · 1 answer · 77 views
The value of the scale parameter σ in the accelerated failure time model
My model follows a Weibull distribution. My question about σ is: when can we replace it with one, and when should we treat it as a scale parameter?
2 votes · 1 answer · 141 views
Predicting a Noisy Target with High-dimensional Features
I am working on a regression problem with two sets of continuous features, $X_1$ and $X_2$ that I assume are useful for predicting a continuous target $y$ that is very noisy. By "noisy" I ...
1 vote · 0 answers · 88 views
Proof of Lemma 7.24 in High-Dimensional Statistics: A Non-Asymptotic Viewpoint
The lemma is part of the proof of Theorem 7.16.
Theorem 7.16 states that (letting $\rho^2(\Sigma)$ be the maximal diagonal entry of $\Sigma$):
Let $X \in \mathbb{R}^{n \times d}$ with each row $x_i \in \...
1 vote · 0 answers · 95 views
Efficient Methods for Approximating High-Dimensional Integrals with Gaussian-Like Factors
I'm seeking a computationally efficient method to approximately evaluate high-dimensional integrals of the form:
$$\int f(\textbf{x}) \prod_i g_i(x_i) \, d\textbf{x}$$
where $f(\mathbf{x}) = (\mathbf{...
3 votes · 0 answers · 140 views
Does the loss function necessarily converge to zero as we get more samples?
Background: I'm reading quite a long paper (link) about quantile regression which uses the transfer learning method in this field. A crucial part of this method is avoiding negative transfer -- when you ...
1 vote · 0 answers · 18 views
Sampling from a hypersphere subject to a linear constraint? [duplicate]
I'm running into efficiency issues when trying to sample from a "hypercone" using rejection sampling. By a hypercone, I mean the set of vectors $C_{v,\beta} = \{w \sim N(0,1)\ |\ w^T v \geq \...
4 votes · 1 answer · 104 views
The sum of $O_p$ terms -- $O_p\left(s^2\frac{\log d}{n}+s\sqrt{\frac{\log d}{n}}\right)$
I read papers in the area of inference for high-dimensional graphical models and these papers always state the convergence rate of the estimator. Using $O_p$ is a good choice.
Maybe I made some ...
1 vote · 0 answers · 62 views
How to approximate the eigendecomposition of a correlation matrix when the data have been standardized?
Context
I am working to develop a penalized regression framework that will scale up to analyzing high dimensional data with a certain correlation structure. Let $X$ represent an $n \times p$ matrix of ...
6 votes · 1 answer · 215 views
Bound on Rademacher complexity using polynomial discrimination
This is Lemma 4.14 in Wainwright's textbook on high-dimensional statistics. It states that, given a class of functions $\mathcal{F}$ with polynomial discrimination of order $v$, for all integers $n$ ...
0 votes · 1 answer · 87 views
High-dimensional regression with millions of covariates/features
As a matter of preamble, I am a machine learning researcher. I am interested in whether this community can point me to research and work showing settings that have performed regression where the number of ...
1 vote · 1 answer · 70 views
The covering number of a d-dim cube
In Martin Wainwright's textbook, equation (5.5) states that the $\delta$-covering number of the d-dimensional cube satisfies
$$
\log N(\delta; [0,1]^d) \asymp d \log(\frac{1}{\delta}),
$$
for small ...
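As a quick numerical sanity check on this scaling, the standard grid construction gives an upper bound whose logarithm grows like $d \log(1/\delta)$. A minimal sketch (the sup-norm metric and the specific $\delta$ are my own illustrative choices, so the constant is not sharp):

```python
import math

def grid_covering_number(d, delta):
    """Upper bound on the delta-covering number of [0,1]^d in sup-norm:
    a grid with spacing 2*delta along each axis covers the cube."""
    points_per_axis = math.ceil(1.0 / (2.0 * delta))
    return points_per_axis ** d

delta = 0.05
for d in (1, 5, 20):
    N = grid_covering_number(d, delta)
    # log N grows linearly in d, matching d * log(1/delta) up to a constant
    print(d, round(math.log(N), 1), round(d * math.log(1.0 / delta), 1))
```

The key point the display captures: the metric entropy $\log N$ is linear in the dimension $d$, not the covering number $N$ itself, which is exponential in $d$.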
3 votes · 1 answer · 105 views
Should we routinely conduct unsupervised learning when reporting descriptive statistics on data?
A standard approach prior to conducting a predictive or inferential analysis is to report some basic univariate descriptive statistics on the study variables: mean, median, minimum, maximum, variance, ...
0 votes · 1 answer · 90 views
What is the meaning of $\asymp$ and $\lesssim$ in Martin Wainwright's high-dimensional textbook? [closed]
Unfortunately, this textbook does not provide a table of the notation used.
Can anyone provide me with a definition of $\asymp$ and $\lesssim$ and a few examples?
For an example in the book, in display (...
1 vote · 1 answer · 91 views
Modeling high-dimensional multicollinear data
I am trying to predict a plant physiology trait (y) from hyperspectral reflectance data from 400 to 2400 nm (X). So far I have done the following:
Skew correction with a square root (sqrt) transform on y
Scaling ...
1 vote · 0 answers · 187 views
Maximum Likelihood in High Dimensions [closed]
What are some examples of high-dimensional random variables for which MLEs are solved using numerical methods because we are unable to explicitly solve the equations nicely? The only example that comes ...
1 vote · 0 answers · 72 views
Large $N$, small $T$ in SUR: workaround using system GMM
Consider a system of linear equations as in seemingly unrelated regression (SUR). If the number of equations $N$ is large relative to the sample size $T$, the weighting matrix in SUR (i.e. the error ...
1 vote · 0 answers · 36 views
Expected value of the cosine in high dimension
I would like to prove that the cosine of the angle formed by 3 random points tends to $\frac{1}{2}$ as the dimensionality tends to $\infty$. Could it be solved with the expected value formula? It ...
1 vote · 0 answers · 42 views
Benjamini-Hochberg Procedure [closed]
I am working on a problem for class related to multiple testing, where I would like to run the BH procedure with a known $\pi_{0}$, denoting the proportion of hypotheses that are truly null, given ...
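The step-up rule being asked about is short enough to sketch directly. A minimal numpy sketch (the function name, $\alpha$, and the example p-values are my own; taking $\pi_0 < 1$ gives the adaptive variant by replacing $m$ with $\pi_0 m$ in the thresholds):

```python
import numpy as np

def bh_adaptive(pvals, alpha=0.05, pi0=1.0):
    """Benjamini-Hochberg step-up procedure; pi0 < 1 gives the
    adaptive (oracle) variant with a known null proportion."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    # Step-up thresholds: k * alpha / (pi0 * m) for k = 1, ..., m.
    thresh = np.arange(1, m + 1) * alpha / (pi0 * m)
    below = p[order] <= thresh
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest index passing the threshold
        reject[order[: k + 1]] = True
    return reject

pvals = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
print(bh_adaptive(pvals, alpha=0.05, pi0=1.0))
# → [ True  True False False False False]
```

With `pi0=0.5` the thresholds double and the two borderline p-values (0.039, 0.041) are also rejected, illustrating how a known $\pi_0 < 1$ increases power.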
4 votes · 1 answer · 354 views
Explanation of the proof that the SCAD penalty has the oracle property
I am trying to understand the proof that the SCAD penalty has the oracle property. Could you help me with an explanation and a full breakdown of the steps, so that I can understand it?
I'm unclear on how ...
3 votes · 1 answer · 112 views
How do I interpret a second-order multivariate growth model?
I am running a multivariate second-order growth model.
I have two factors, which are conceptually related to each other, measured on 7 different occasions.
Wanting to know how the two factors ...
1 vote · 1 answer · 118 views
Why does FPCA not use scaling as PCA does?
Functional principal component analysis (FPCA), according to the original paper, does not use scaling beforehand, as PCA does. Instead, it uses a covariance matrix to compute the eigencomponents.
I ...
8 votes · 1 answer · 359 views
Regression at Scale: Best Practices for Ensuring the Quality of a Large Number of Forecasts
Background
Often I am forecasting anywhere from one to a few dozen variables in a project, but I have an upcoming project that will involve forecasting thousands of variables. I have some ideas of my ...
1 vote · 0 answers · 148 views
Concentration around the median implies concentration around the mean [duplicate]
Let $M$ denote the median of a function $f(X)$ that is Lipschitz continuous with $\left \| f \right \|_{Lip}=1$. I am trying to show that if $\left \| f(X)-M \right \|_{\psi_{2}}\leq C$, then $\left \|...
0 votes · 0 answers · 89 views
Independent features, but PCA improves classifier accuracy significantly. Why?
That's my first question on here :)
I am working with the kNN classifier on datasets from the multivariate normal distribution. I have two groups coming from ...
1 vote · 0 answers · 133 views
Efficient way to compute covariance matrix of Vector Autoregressive Process of order 1 (VAR)
For a VAR process
$$
X_t = A_1 X_{t-1} + \epsilon_t
$$
The covariance of $X_t$ can be computed in the following way:
$$
\text{vec}(\Sigma) = (I - (A_1 \otimes A_1))^{-1} \text{vec}(\Sigma_{\epsilon})
$$
...
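The vectorized identity above is easy to verify numerically: the resulting $\Sigma$ must satisfy the discrete Lyapunov equation $\Sigma = A_1 \Sigma A_1^\top + \Sigma_\epsilon$. A minimal numpy sketch (the stability check and the 2-dimensional example matrices are my own illustrative assumptions):

```python
import numpy as np

def var1_stationary_cov(A1, Sigma_eps):
    """Stationary covariance of X_t = A1 X_{t-1} + eps_t via
    vec(Sigma) = (I - A1 kron A1)^{-1} vec(Sigma_eps)."""
    d = A1.shape[0]
    assert np.max(np.abs(np.linalg.eigvals(A1))) < 1, "process must be stable"
    I = np.eye(d * d)
    vec_sigma = np.linalg.solve(I - np.kron(A1, A1), Sigma_eps.reshape(-1))
    return vec_sigma.reshape(d, d)

A1 = np.array([[0.5, 0.1],
               [0.0, 0.3]])
Sigma_eps = np.eye(2)
Sigma = var1_stationary_cov(A1, Sigma_eps)
# Sanity check: Sigma solves the discrete Lyapunov equation
# Sigma = A1 Sigma A1' + Sigma_eps.
print(np.allclose(Sigma, A1 @ Sigma @ A1.T + Sigma_eps))  # → True
```

Note that the dense solve costs $O(d^6)$, which is why iterative or structured Lyapunov solvers are preferred once $d$ is large.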
3 votes · 1 answer · 154 views
Selecting variables using the lasso algorithm
I have a question concerning a large dataset with 94 observations and 15,000 variables.
For data-mining models (boosting, trees, neural networks, ...) this number of variables is too large, and I have to ...
0 votes · 1 answer · 262 views
Choosing a probability distribution for 4D data: Dirichlet challenges and alternatives
I'm seeking the right distribution for my 4D data, where the sum of values in each sample equals one. Currently, I've chosen to employ the Dirichlet distribution. However, upon applying this ...
0 votes · 1 answer · 185 views
What exactly is the KKT check and what is the point of it?
In the paper for strong screening rules for the lasso (link), the following screening algorithm is proposed (start of chapter 7):
Let $S(\lambda)$ be the strong rule set. Then the following strategy ...
0 votes · 0 answers · 111 views
Comparing scree plots or explained variance of two groups with different number of features after PCA
I want to define the dimensionality of a group as the number of PC features that can explain 80% of the variance in the group dataset. This intuition seems to work for a single group, however, if I ...
2 votes · 2 answers · 292 views
Tensorization of entropy: confusion regarding conditional entropy
I'm reading High-Dimensional Statistics by Wainwright.
In the book, entropy for random variable $Z \geq 0$ is defined as $H(Z) = E[Z \log Z]- E[Z] \log E[Z]$. My understanding is that $H(Z)$ is a ...
0 votes · 1 answer · 919 views
Seeking recommendations for feature selection methods before applying a random forest model to high-dimensional data
I'm seeking recommendations for feature selection methods before applying a random forest model to high-dimensional data, specifically with over 60,000 features and only 1,000 samples. My concern is ...