Questions tagged [high-dimensional]

Pertains to a large number of features or dimensions (variables) for data. (For a large number of data points, use the tag [large-data]; if the issue is a larger number of variables than data, use the [underdetermined] tag.)

1 vote
0 answers
80 views

I am working with a Gaussian process $(X_t(x))_{x \in [0,1], t \geq 0}$ which evolves jointly in space and time. I know the statistics of this process: $\mathbf{E} X_t(x) = X_0(x) e^{-\mu r t} + (1-e^{...
Mete Yuksel's user avatar
1 vote
0 answers
26 views

The paper "Deep Quantile Regression: Mitigating the Curse of Dimensionality Through Composition" makes the following claim (top of page 4): It is clear that smoothness is not the right ...
Chris's user avatar
  • 322
1 vote
0 answers
98 views

I'm trying to calculate a PDF for a dataset with high dimensionality (thousands of variables and hundreds of thousands of observations), which is not assumed to be normal (or any other common ...
tkw954's user avatar
  • 293
1 vote
0 answers
57 views

Suppose I have a regression $Y=a_0+a_1D+a_2DX_1+a_3X_1+a_4X_2+\cdots+e,$ I'm interested in the coefficient in front of $D$ and $D$ interacting with $X_1$ (I want to see the direct effect of $D$ and ...
ExcitedSnail's user avatar
  • 3,090
1 vote
0 answers
110 views

Gaussian Processes are considered the gold-standard for regression with formal uncertainty guarantees. For this reason, they are used extensively to model system dynamics by researchers in the domain ...
M.C. Escher's user avatar
1 vote
1 answer
110 views

I'm helping someone track a recurring health symptom (regurgitation) that appears to be triggered by certain foods. We have a food log with 309 meals, each labeled as breakfast, lunch, or dinner. The ...
adamkski's user avatar
1 vote
0 answers
38 views

I have a question regarding the application and interpretation of results from two techniques: PCA and sparse PCA. I have a proteomics dataset, 10 subjects in each group (3 groups in total -...
ariadnaoliver's user avatar
0 votes
0 answers
100 views

I have a dataset with 5 groups: 3 consist of patients with different cancer types, one consists of patients with benign tumours, and another is a healthy control group. The proteins are measured in a way ...
ariadnaoliver's user avatar
3 votes
1 answer
129 views

Maximum likelihood estimators (subject to regularity conditions) have very nice asymptotic properties. However with high dimensional data you are unlikely to have sufficient observations for this ...
UserB1234's user avatar
  • 147
1 vote
0 answers
62 views

Consider the standard LASSO regression problem: $$ \hat{\beta} = \arg\min_{\beta} \frac{1}{2} \| y - X\beta \|^2 + \lambda \sum_{j=1}^{p} |\beta_j|. $$ Now, suppose we fit the LASSO model using a ...
user19904's user avatar
  • 294
1 vote
0 answers
43 views

I am reading some articles about covariance matrix estimation in the high-dimensional case, and the authors often mention a natural ordering of the variables (for example https://arxiv.org/abs/0901.3079 or https:/...
spenziak's user avatar
2 votes
0 answers
57 views

From reading the literature on Gaussian graphical model methodology for high-dimensional data (where we may have dimension d > n), it is clear that the assumption of uniformly bounded eigenvalues ...
Naive Bayes's user avatar
0 votes
0 answers
76 views

This is really a dimensionality reduction problem, but using t-SNE or UMAP requires finding the right parameters and depends on dataset availability. The problem is that the number of samples is increasing ...
Muhammad Ikhwan Perwira's user avatar
0 votes
0 answers
47 views

For example, in weather prediction, different sensors can collect data at different frequencies: every 15 seconds, 30 seconds, or 1 minute. Can one network that predicts a value every hour use all the data ...
IKNv99's user avatar
  • 111
0 votes
0 answers
46 views

I was analyzing how the solution space of the Dirichlet distribution evolves as the number of parameters increases. I initially attempted to measure the "volume" covered by the Dirichlet ...
EngineerMathlover's user avatar
5 votes
0 answers
158 views

In the standard LASSO literature, you often encounter that the LASSO estimator converges at a rate of $\sqrt{\frac{s\log p}{n}}$ (see e.g. this post). A related method is the $\ell_1$-penalized ...
Stan's user avatar
  • 724
26 votes
2 answers
4k views

Disclaimer: I asked this question on Data Science Stack Exchange 3 days ago, and got no response so far. Maybe it is not the right site. I am hoping for more positive engagement here. This is a ...
Landon Carter's user avatar
0 votes
0 answers
98 views

In some papers about $\ell_1$-penalized regression, $$ \hat{\beta}=\underset{\beta\in\mathbb{R}^{p}}{\operatorname{arg\,min}}\|Y-X\beta\|_2^2+\lambda\|\beta\|_1, $$ the authors say that they ...
mathhahaha's user avatar
3 votes
1 answer
359 views

Consider the following model: I am sampling a Bernoulli variable with a probability $p$ given by \begin{equation} p(\omega_i, \tau) := \frac{1}{2} + \frac{1}{2 n} \left[ \sum_{i=1}^{n} \cos (\omega_i \...
MrRobot's user avatar
  • 205
1 vote
1 answer
77 views

My model follows a Weibull distribution. My question about σ is: when can we replace it with one, and when should we treat it as a scale parameter?
Ahmed Nazih's user avatar
2 votes
1 answer
141 views

I am working on a regression problem with two sets of continuous features, $X_1$ and $X_2$ that I assume are useful for predicting a continuous target $y$ that is very noisy. By "noisy" I ...
someben's user avatar
  • 23
1 vote
0 answers
88 views

The lemma is part of the proof of Theorem 7.16. Theorem 7.16 states the following (let $\rho^2(\Sigma)$ be the maximal diagonal entry of $\Sigma$): Let $X \in \mathbb{R}^{n \times d}$ with each row $x_i \in \...
Phil's user avatar
  • 830
1 vote
0 answers
95 views

I'm seeking a computationally efficient method to approximately evaluate high-dimensional integrals of the form: $$\int f(\textbf{x}) \prod_i g_i(x_i) \, d\textbf{x}$$ where $f(\mathbf{x}) = (\mathbf{...
yrx1702's user avatar
  • 730
3 votes
0 answers
140 views

Background: I'm reading quite a long paper (link) about quantile regression that applies transfer learning to this field. A crucial part of this method is to avoid negative transfer: when you ...
mathhahaha's user avatar
1 vote
0 answers
18 views

I'm running into efficiency issues when trying to sample from a "hypercone" using rejection sampling. By a hypercone, I mean the set of vectors $C_{v,\beta} = \{w \sim N(0,1)\ |\ w^T v \geq \...
billybobsteve's user avatar
4 votes
1 answer
104 views

I read papers in the area of inference for high-dimensional graphical models and these papers always state the convergence rate of the estimator. Using $O_p$ is a good choice. Maybe I made some ...
mathhahaha's user avatar
1 vote
0 answers
62 views

Context I am working to develop a penalized regression framework that will scale up to analyzing high dimensional data with a certain correlation structure. Let $X$ represent an $n \times p$ matrix of ...
Tabitha Peter's user avatar
6 votes
1 answer
215 views

This is Lemma 4.14 in Wainwright's textbook on High-Dimensional Statistics. It states that if a class of functions $\mathcal{F}$ has polynomial discrimination of order $v$, then for all integers $n$ ...
Mondayisgood's user avatar
0 votes
1 answer
87 views

As a matter of preamble, I am a machine learning researcher. I would like to know whether this community can point me to research and work showing settings in which regression has been performed where the number of ...
adebayoj's user avatar
1 vote
1 answer
70 views

In Martin Wainwright's textbook, equation (5.5) states that the $\delta$-covering number of the d-dimensional cube satisfies $$ \log N(\delta; [0,1]^d) \asymp d \log(\frac{1}{\delta}), $$ for small ...
Mondayisgood's user avatar
3 votes
1 answer
105 views

A standard approach prior to conducting a predictive or inferential analysis is to report some basic univariate descriptive statistics on the study variables: mean, median, minimum, maximum, variance, ...
RobertF's user avatar
  • 6,644
0 votes
1 answer
90 views

Unfortunately, this textbook does not provide a table of the notation it uses. Can anyone provide me with a definition of $\asymp$ and $\lesssim$ and a few examples? For an example in the book, in display (...
Mondayisgood's user avatar
1 vote
1 answer
91 views

I am trying to predict a plant physiology trait (y) from hyperspectral reflectance data from 400 to 2400 nm (X). So far I have done the following: skew correction with square root (sqrt) on y, scaling ...
Aaron Poruthoor's user avatar
1 vote
0 answers
187 views

What are some examples of high-dimensional random variables for which the MLE is computed using numerical methods because the likelihood equations cannot be solved explicitly? The only example that comes ...
Nicolas Bourbaki's user avatar
1 vote
0 answers
72 views

Consider a system of linear equations as in seemingly unrelated regression (SUR). If the number of equations $N$ is large relative to the sample size $T$, the weighting matrix in SUR (i.e. the error ...
Richard Hardy's user avatar
1 vote
0 answers
36 views

I would like to prove that the cosine of the angle formed by 3 random points tends to $\frac{1}{2}$ as the dimensionality tends to $\infty$. Could it be solved with the expected value formula? It ...
Jérémy's user avatar
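A rough numerical check of the claim in the excerpt above can be sketched as follows. The excerpt does not say how the three points are drawn, so this sketch assumes (purely for illustration) that each point has i.i.d. standard normal coordinates and that the angle is measured at one of the three points; the function names `cosine_at_vertex` and `mean_cosine` are hypothetical.

```python
import math
import random


def cosine_at_vertex(a, b, c):
    """Cosine of the angle at vertex b in the triangle (a, b, c)."""
    u = [ai - bi for ai, bi in zip(a, b)]
    v = [ci - bi for ci, bi in zip(c, b)]
    dot = sum(ui * vi for ui, vi in zip(u, v))
    norm_u = math.sqrt(sum(ui * ui for ui in u))
    norm_v = math.sqrt(sum(vi * vi for vi in v))
    return dot / (norm_u * norm_v)


def mean_cosine(dim, trials, seed=0):
    """Average cosine over `trials` random triples of points in R^dim.

    Assumption: coordinates are i.i.d. N(0, 1); the excerpt leaves the
    sampling distribution unspecified.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        a, b, c = ([rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(3))
        total += cosine_at_vertex(a, b, c)
    return total / trials


# The average should drift toward 1/2 as the dimension grows.
for dim in (2, 20, 200, 1000):
    print(dim, round(mean_cosine(dim, trials=100), 3))
```

Under this assumption the three pairwise distances concentrate around the same value as the dimension grows, so the triangle becomes nearly equilateral and every angle approaches 60 degrees, whose cosine is 1/2. A simulation like this only illustrates the limit; it is not a proof.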
1 vote
0 answers
42 views

I am working on a problem for class related to multiple testing where I would like to run the BH procedure with a known $\pi_{0}$, denoting the proportion of hypotheses that are truly null, given ...
Harry Lofi's user avatar
4 votes
1 answer
354 views

I am trying to understand the proof that the SCAD has the oracle property. Could you help me with an explanation and a full breakdown of the steps, so that I can understand it? I'm unclear on how ...
mike's user avatar
  • 41
3 votes
1 answer
112 views

I am running a multi-variate second order growth model. I have two factors, which are conceptually related to each other measured on 7 different occasions. Wanting to know how the two factors ...
NZK's user avatar
  • 315
1 vote
1 answer
118 views

Functional principal component analysis (FPCA), according to the original paper, does not use scaling before FPCA, as in PCA. Instead, it uses a covariance matrix to compute the eigen-components. I ...
Palantir's user avatar
8 votes
1 answer
359 views

Background Often I am forecasting possibly one up to a few dozen variables in a project, but I have an upcoming project that will involve forecasting thousands of variables. I have some ideas of my ...
Galen's user avatar
  • 10.1k
1 vote
0 answers
148 views

Let $M$ denote the median of a function $f(X)$ that is Lipschitz continuous with $\left \| f \right \|_{Lip}=1$. I am trying to show that if $\left \| f(X)-M \right \|_{\psi_{2}}\leq C$, then $\left \|...
Shawn Kemp's user avatar
0 votes
0 answers
89 views

That's my first question on here :) I am working with the kNN classifier on datasets from the multivariate normal distribution. I have two groups coming from ...
Superintendant's user avatar
1 vote
0 answers
133 views

For a VAR process $$ X_t = A_1 X_{t-1} + \epsilon_t $$ the covariance of $X_t$ can be computed in the following way: $$ \text{vec}(\Sigma) = (I -(A_1 \otimes A_1))^{-1} \text{vec}(\Sigma_{\epsilon}) $$ ...
Dylan Dijk's user avatar
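The vec formula quoted in the excerpt above can be sketched numerically as follows. This is a minimal illustration, not the asker's code: `A` stands for the coefficient matrix $A_1$, the example matrices are made up, and the function name `var1_stationary_cov` is hypothetical. It assumes the process is stable (all eigenvalues of `A` strictly inside the unit circle), which is what makes $I - A \otimes A$ invertible, and it uses the column-stacking convention for $\text{vec}$.

```python
import numpy as np


def var1_stationary_cov(A, Sigma_eps):
    """Stationary covariance of X_t = A X_{t-1} + eps_t via the vec formula."""
    A = np.asarray(A, dtype=float)
    Sigma_eps = np.asarray(Sigma_eps, dtype=float)
    k = A.shape[0]
    # The formula needs a stable process: spectral radius of A below 1.
    if np.max(np.abs(np.linalg.eigvals(A))) >= 1.0:
        raise ValueError("A must have spectral radius < 1")
    # vec() stacks columns, hence order="F" (Fortran / column-major).
    vec_eps = Sigma_eps.flatten(order="F")
    # Solve (I - A kron A) vec(Sigma) = vec(Sigma_eps) instead of inverting.
    vec_sigma = np.linalg.solve(np.eye(k * k) - np.kron(A, A), vec_eps)
    return vec_sigma.reshape((k, k), order="F")


# Made-up example values, for illustration only.
A = np.array([[0.5, 0.1], [0.0, 0.3]])
Sigma_eps = np.array([[1.0, 0.2], [0.2, 0.5]])
Sigma = var1_stationary_cov(A, Sigma_eps)
print(Sigma)
# Sanity check: Sigma should satisfy Sigma = A Sigma A^T + Sigma_eps.
print(np.allclose(Sigma, A @ Sigma @ A.T + Sigma_eps))
```

Solving the linear system is usually preferable to forming the inverse explicitly, but the Kronecker product is a $k^2 \times k^2$ matrix, so this direct approach becomes expensive when the dimension $k$ is large.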
3 votes
1 answer
154 views

I have a question concerning a large dataset with 94 observations and 15,000 variables. For data mining models (boosting, trees, neural networks...) this number of variables is too large and I have to ...
ali's user avatar
  • 61
0 votes
1 answer
262 views

I'm seeking the right distribution for my 4D data, where the sum of values in each sample equals one. Currently, I've chosen to employ the Dirichlet distribution. However, upon applying this ...
roan's user avatar
  • 1
0 votes
1 answer
185 views

In the paper for strong screening rules for the lasso (link), the following screening algorithm is proposed (start of chapter 7): Let $S(\lambda)$ be the strong rule set. Then the following strategy ...
Sparsity's user avatar
  • 584
0 votes
0 answers
111 views

I want to define the dimensionality of a group as the number of PC features that can explain 80% of the variance in the group dataset. This intuition seems to work for a single group, however, if I ...
Jules's user avatar
  • 13
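The single-group definition in the excerpt above (number of principal components needed to explain 80% of the variance) can be sketched as follows. This is one possible reading, assuming the components are computed on mean-centered, unscaled data; the function name `n_components_for_variance` and the synthetic data are hypothetical, and the sketch does not address the multi-group case the excerpt goes on to raise.

```python
import numpy as np


def n_components_for_variance(X, threshold=0.80):
    """Smallest number of principal components whose cumulative
    explained-variance ratio reaches `threshold`."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # center each feature
    # Squared singular values of the centered data are proportional
    # to the variances along the principal components.
    s = np.linalg.svd(Xc, compute_uv=False)
    ratio = np.cumsum(s**2) / np.sum(s**2)
    # First index where the cumulative ratio is >= threshold, as a count.
    count = int(np.searchsorted(ratio, threshold)) + 1
    return min(count, len(ratio))


# Synthetic data for illustration: one direction with much larger variance.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5)) * np.array([10.0, 1.0, 1.0, 1.0, 1.0])
print(n_components_for_variance(X))
```

Because the features are not standardized here, the count depends on the measurement scale of each feature; standardizing first would give a correlation-based version of the same quantity.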
2 votes
2 answers
292 views

I'm reading High-Dimensional Statistics by Wainwright. In the book, entropy for random variable $Z \geq 0$ is defined as $H(Z) = E[Z \log Z]- E[Z] \log E[Z]$. My understanding is that $H(Z)$ is a ...
Phil's user avatar
  • 830
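To make the definition in the excerpt above concrete, here is a small sketch that evaluates $H(Z) = E[Z \log Z] - E[Z] \log E[Z]$ for a discrete nonnegative $Z$. The function name `entropy_functional` and the example values are hypothetical, and the convention $0 \log 0 = 0$ is assumed.

```python
import math


def entropy_functional(values, probs):
    """H(Z) = E[Z log Z] - E[Z] log E[Z] for a discrete Z >= 0."""

    def xlogx(x):
        # Convention: 0 * log(0) is taken to be 0.
        return x * math.log(x) if x > 0 else 0.0

    mean = sum(p * z for z, p in zip(values, probs))
    return sum(p * xlogx(z) for z, p in zip(values, probs)) - xlogx(mean)


# A constant Z has zero entropy; a non-constant Z has positive entropy.
print(entropy_functional([2.0, 2.0], [0.5, 0.5]))
print(entropy_functional([1.0, 3.0], [0.5, 0.5]))
# Scaling Z by c scales H by c.
print(entropy_functional([2.0, 6.0], [0.5, 0.5]))
```

Since $z \mapsto z \log z$ is convex, Jensen's inequality gives $H(Z) \geq 0$, with equality for constant $Z$. The functional is also homogeneous, $H(cZ) = c\,H(Z)$ for $c > 0$, which the last line illustrates with $c = 2$.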
0 votes
1 answer
919 views

I'm seeking recommendations for feature selection methods before applying a random forest model to high-dimensional data, specifically with over 60,000 features and only 1,000 samples. My concern is ...
Meow Mix's user avatar