Questions tagged [pca]

Principal component analysis (PCA) is a linear dimensionality reduction technique. It reduces a multivariate dataset to a smaller set of constructed variables preserving as much information (as much variance) as possible. These variables, called principal components, are linear combinations of the input variables.
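Here is a minimal NumPy sketch of the idea. It assumes the data sit in an $n \times d$ array `X` with observations in rows. The helper name `pca` and the toy data are illustrative and do not come from any particular library. The sketch centers the variables, eigendecomposes the sample covariance, and projects onto the leading eigenvectors. Each component is then a linear combination of the inputs, and its variance equals the corresponding eigenvalue.

```python
import numpy as np

def pca(X, k):
    """Minimal PCA sketch: eigendecompose the sample covariance of centered data."""
    Xc = X - X.mean(axis=0)                      # center each variable
    cov = np.cov(Xc, rowvar=False)               # d x d sample covariance (divides by n - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]            # reorder to descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    W = eigvecs[:, :k]                           # loadings: d x k, orthonormal columns
    scores = Xc @ W                              # principal components: linear combinations of inputs
    return scores, W, eigvals

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))  # correlated toy data
scores, W, eigvals = pca(X, k=2)
explained = eigvals[:2].sum() / eigvals.sum()   # share of total variance kept by 2 components
print(scores.shape, round(explained, 3))
```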

0 votes
0 answers
12 views

Let $X \in \mathbb{R}^d$ be a random vector with covariance matrix $\Sigma$, with its eigenvalues ordered as $\lambda_1\geq \lambda_2 \geq \ldots \geq \lambda_d$, and the corresponding orthonormal ...
Phil • 830
1 vote
1 answer
50 views

Suppose I have two multi-dimensional population samples - $A$ and $B$. I hypothesise that $\mathbb{E}[A]$ and $\mathbb{E}[B]$ are orthogonal in this high-dimensional space. To test this hypothesis, I ...
sunnydk • 127
0 votes
0 answers
22 views

I am working with a compositional dataset. A very efficient way of dealing with compositional data is to apply a clr transform (or a similar one), which effectively converts them to data in Euclidean ...
Roger V. • 5,091
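The following sketch shows the clr transform followed by a check on its covariance. It assumes strictly positive compositions in the rows of a NumPy array; zeros would need replacing first. The function name `clr` is our own. Every clr-transformed row sums to zero, so the covariance matrix has rank at most $D-1$. As a result, PCA on clr data always has at least one (numerically) zero-variance component.

```python
import numpy as np

def clr(P):
    """Centered log-ratio transform: log of each part minus the row-wise mean log.
    Assumes all parts are strictly positive."""
    L = np.log(P)
    return L - L.mean(axis=1, keepdims=True)

rng = np.random.default_rng(9)
raw = rng.gamma(shape=2.0, size=(100, 5))
P = raw / raw.sum(axis=1, keepdims=True)     # toy compositions: each row sums to 1
Z = clr(P)

# clr rows sum to zero, so the covariance of Z has rank at most D - 1 = 4
eigvals = np.linalg.eigvalsh(np.cov(Z, rowvar=False))
```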
0 votes
0 answers
112 views

I'm new to machine learning and don't post here much, but my lab and I are a bit stumped. I have trained an elastic net classifier on some cortical thickness (CT) data by region of interest (...
McKinney Pitts
0 votes
0 answers
25 views

I'm working on a survival analysis using Cox models where the exposure is a binary grouping variable, and I'm adjusting for a set of classical epidemiological covariates (sex, smoking, diabetes, ...
Javier Hernando
2 votes
0 answers
287 views

Upon reading the abstract of a recently published paper in ecology, I came across the claim: Our results suggest that the chromatic contrasts of colours are non-redundant with the intensity of ...
AvadaMouse
4 votes
1 answer
147 views

I am trying to iteratively optimize a set of vectors $\{w_1, w_2, ..., w_n\}$ such that the following holds: $$ w_r = \begin{cases} \underset{w}{\arg\min} \; \sum_x \left\lVert (x^\top w) w - x \...
Aniruddha • 143
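The numerical check below shows why this objective leads to PCA. It assumes each $w$ is restricted to unit norm; the truncated excerpt does not show the constraints, so this is an assumption. For $\lVert w \rVert = 1$, expanding the square gives $\lVert (x^\top w) w - x \rVert^2 = \lVert x \rVert^2 - (x^\top w)^2$. Minimizing the sum therefore means maximizing $\sum_x (x^\top w)^2$, and the leading eigenvector of $\sum_x x x^\top$ achieves that maximum. Deflating the data and repeating gives the next direction. The objective as written does not center $x$, so neither does the sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4)) * np.array([3.0, 2.0, 1.0, 0.5])  # rows are the x's

def recon_error(X, w):
    """Sum over x of ||(x^T w) w - x||^2 for a single direction w."""
    proj = np.outer(X @ w, w)        # each row is (x^T w) w
    return np.sum((proj - X) ** 2)

# Identity for unit-norm w: ||(x^T w) w - x||^2 = ||x||^2 - (x^T w)^2
w = rng.normal(size=4)
w /= np.linalg.norm(w)
lhs = recon_error(X, w)
rhs = np.sum(X ** 2) - np.sum((X @ w) ** 2)

# Hence the minimizer is the top eigenvector of the (uncentered) scatter matrix sum_x x x^T
eigvals, eigvecs = np.linalg.eigh(X.T @ X)
w1 = eigvecs[:, -1]

# Deflation: remove the w1 component from every x, then solve the same problem again
X_defl = X - np.outer(X @ w1, w1)
w2 = np.linalg.eigh(X_defl.T @ X_defl)[1][:, -1]
```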
0 votes
1 answer
71 views

I'm working with a large tabular dataset (~1.2 million rows) that includes 7 qualitative features and 3 quantitative ones. For dimensionality reduction, I'm using FAMD (Factor Analysis for Mixed Data) ...
Duarte Silva
1 vote
0 answers
38 views

Suppose I have $n$ vectors $v_1,\dots,v_n \in \mathbb{R}^d$. Let's assume there is some underlying direction common to all of them, and each $v_i$ is a noisy version of that direction, and the goal ...
D.W. • 7,188
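One standard approach takes the leading eigenvector of the uncentered second-moment matrix $\sum_i v_i v_i^\top$. The sketch below uses a hypothetical noise model in which each $v_i$ is $\pm$ the true unit direction plus isotropic Gaussian noise. The excerpt is truncated, so the asker's actual goal may differ. Because $v_i v_i^\top$ is unchanged when $v_i$ flips sign, the estimate is insensitive to the sign of each $v_i$. Skipping centering matters: when the $v_i$ point the same way, subtracting their mean would remove the shared direction.

```python
import numpy as np

rng = np.random.default_rng(8)
d, n = 10, 200
u_true = rng.normal(size=d)
u_true /= np.linalg.norm(u_true)

# Hypothetical noise model: v_i = (random sign) * u_true + Gaussian noise
signs = rng.choice([-1.0, 1.0], size=(n, 1))
V = signs * u_true + 0.3 * rng.normal(size=(n, d))

# Leading eigenvector of the uncentered second-moment matrix sum_i v_i v_i^T
M = V.T @ V
u_hat = np.linalg.eigh(M)[1][:, -1]
alignment = abs(u_hat @ u_true)      # 1.0 means perfect recovery, up to sign
```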
4 votes
1 answer
69 views

I'm wondering whether this reasoning is correct: SVD constructs new orthogonal vectors as linear combinations of the rows and columns in the data. In effect, correlations among the original variables are ...
Andreas • 65
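Here is a quick NumPy illustration of that reasoning, on made-up data with three strongly correlated columns. After centering, the SVD score vectors (columns of $UD$) are orthogonal. For centered data, orthogonal means uncorrelated, so the correlation structure of the original variables is absorbed into the loadings $V$. Centering is the assumption doing the work here; without it, orthogonal score vectors need not be uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(2)
# Toy data: three columns that are noisy copies of one latent variable
z = rng.normal(size=(300, 1))
X = np.hstack([z + 0.1 * rng.normal(size=(300, 1)) for _ in range(3)])

Xc = X - X.mean(axis=0)                       # centering makes "orthogonal" mean "uncorrelated"
U, d, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U * d                                 # columns of UD; equal to Xc @ Vt.T

corr_before = np.corrcoef(Xc, rowvar=False)
corr_after = np.corrcoef(scores, rowvar=False)
print(np.round(corr_before, 2))
print(np.round(corr_after, 2))                 # off-diagonal entries are ~0
```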
0 votes
0 answers
51 views

I have an intuition about something that my numerical toy examples do not confirm, and I really want to understand where my mistake is. Suppose I have a random vector $X = (X_1, \cdots, ...
arthur_elbrdn
1 vote
0 answers
95 views

I’m working with two malware datasets (dataset‑1 and dataset‑2) each with 256 features, but different ratios of malicious vs. benign samples. I’ve merged them into a third set (dataset‑3). The sample ...
0xh3xa • 123
3 votes
2 answers
576 views

I'm applying K-Means clustering to a dataset of ship voyages. The goal is to group voyages into performance-based clusters like cost-efficient, underperforming, etc. I have 12 features in total: 10 ...
ssmalik • 41
0 votes
0 answers
68 views

How do you decide the number of principal components (PCs) to include in principal component regression (PCR)? I have seen these methods: choosing the lowest RMSEP with the pls package; choosing PCs ...
Osuke Miyamaru
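The sketch below applies the cross-validation approach in Python with scikit-learn. This is a swapped-in toolchain, not R's `pls` package, and the synthetic data are illustrative only. For each candidate number of components, it fits a pipeline of standardization, PCA, and linear regression. It then keeps the number with the lowest cross-validated root mean squared error of prediction (RMSEP). Because PCA sits inside the pipeline, it is refit on each training fold, so the held-out fold never leaks into the components.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
X = rng.normal(size=(150, 8))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=150)  # synthetic toy response

cv = KFold(n_splits=5, shuffle=True, random_state=0)
rmsep = {}
for k in range(1, X.shape[1] + 1):
    pcr = make_pipeline(StandardScaler(), PCA(n_components=k), LinearRegression())
    # scikit-learn negates RMSE so that higher is better; flip the sign back
    fold_scores = cross_val_score(pcr, X, y, cv=cv, scoring="neg_root_mean_squared_error")
    rmsep[k] = -fold_scores.mean()

best_k = min(rmsep, key=rmsep.get)   # number of components with the lowest CV RMSEP
```

A common refinement is to pick the smallest $k$ whose RMSEP lies within one standard error of the minimum, which favors simpler models.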
0 votes
1 answer
110 views

I'm trying to build a model (one that is more interpretative than explanatory) of the relationship between water quality (e.g. ammonium concentration, chlorine concentration) and regional ...
Osuke Miyamaru
0 votes
0 answers
63 views

For N participants I have M measures for which a normative model is available. Let's assume these measures are finger lengths (so M=5), where z=0 means the length of that finger is the mean in the ...
fabiob • 762
0 votes
0 answers
36 views

So, I have a general question regarding PCA. As far as I understand, before performing PCA you are supposed to perform a correlation analysis between the features so that redundant features can be ...
Sunera Wijeratne
0 votes
0 answers
55 views

I've got around 50,000 companies, and the majority of them have two revenue data points: 2023 and 2024. The two metrics I've been told to use are absolute growth, which is just a difference ...
Makina • 113
0 votes
0 answers
31 views

So I've done two separate tests, a PCA and a GLMM, using the same groups of individuals. The experiments have to do with animal behavior, so I did preliminary recordings of how the animals interact ...
Kitt • 1
0 votes
0 answers
79 views

I am currently doing research on the relationship between the quality of wastewater (e.g. biochemical oxygen demand, amount of nitrogen...) and regional characteristics of that ...
Osuke Miyamaru
1 vote
1 answer
197 views

The elbow method is commonly used with K-means clustering to determine the optimal number of clusters by plotting the within-cluster sum of squares (WCSS) against the number of clusters and looking ...
0xh3xa • 123
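Here is a minimal sketch of the elbow computation with scikit-learn, using synthetic blobs as a stand-in for real features. `KMeans.inertia_` is the WCSS, and the loop records it for each candidate $k$. The resulting curve can then be plotted against $k$ to look for the bend. Where exactly the elbow lies remains a visual judgment.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in data with 4 well-separated groups
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.8, random_state=0)

ks = range(1, 9)
wcss = []
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss.append(km.inertia_)   # inertia_ = within-cluster sum of squared distances to centroids
```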
4 votes
1 answer
137 views

I am familiar with the PCA algorithm for dimension reduction, but I would like every element of the first principal component to have a positive sign. So when I try to use my principal component, it's a ...
CuriousMind • 2,365
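An eigenvector is only defined up to sign, so the usual fix is to flip the whole first component. The sketch below flips it so that its loadings sum to a positive number; this is a convention assumed here, not a standard. Flipping cannot make mixed-sign loadings all positive. It works when, as in this toy example, all variables are positively correlated. In that case the Perron-Frobenius theorem guarantees a leading eigenvector with all-positive entries.

```python
import numpy as np

def first_pc_positive(X):
    """Return first-PC loadings and scores, with the sign chosen so the loadings sum to a positive number."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    v1 = Vt[0]
    if v1.sum() < 0:          # eigenvectors are defined only up to sign, so flipping is harmless
        v1 = -v1
    return v1, Xc @ v1        # flipping v1 also flips the scores

rng = np.random.default_rng(4)
common = rng.normal(size=(400, 1))
X = common + 0.5 * rng.normal(size=(400, 5))   # all variables positively correlated
v1, pc1 = first_pc_positive(X)
print(np.all(v1 > 0))
```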
1 vote
1 answer
80 views

I plan to do an ordinal logistic regression (plus I'm new to SAS v9.4). My dependent and independent variables are ordinals (Likert types), but I want to add about 35 covariates (possible confounders) ...
David Musoke
0 votes
0 answers
28 views

I’ve been working on implementing a binary variant of probabilistic PCA (PPCA) in Python (based on this paper), which uses variational EM for parameter estimation due to the non-conjugacy between the ...
Net_Raider
2 votes
0 answers
56 views

I think I understand how one could view PCA as a means to find the basis vectors that, once a projection is done onto the subspace spanned by these vectors, maximizes the variance of the new dataset ...
Ahmed Addous
1 vote
1 answer
126 views

When interpreting loadings for different principal components in PCA, sometimes the same variable will have a positive loading for one PC, and a negative loading in another PC, despite both PCs having ...
IMESS • 11
2 votes
1 answer
104 views

New to multivariate analyses in R. I have two datasets consisting of multivariate response variables (e.g., physiological and environmental measures in a wildlife species) and I want to assess ...
CRO • 31
0 votes
0 answers
71 views

I have calculated relative abundances from species count data and then removed species below 2%. I want to square-root transform these data to reduce the dominance of some species ...
bob_bonnine
0 votes
0 answers
35 views

How do we feed the rotated loadings obtained through varimax rotation with the psych package into hierarchical clustering with the FactoMineR package (HCPC())? I want to use the rotated components instead of ...
Harshad • 81
4 votes
2 answers
837 views

I am studying a statistics course on multivariate analysis. The course starts with regression models (simple and multiple) and then moves to interdependence with a focus on dimensionality-reduction ...
phd2fa • 43
2 votes
1 answer
90 views

I'm trying to follow best practices. I’ve run a PCA using prcomp() in R on a set of scaled numeric features. Now I want to check if there's any visual separation between groups in my target categorical ...
Xinovy • 21
0 votes
0 answers
57 views

I hope someone can help me with this issue or point me in the right direction. I have recently gotten myself into structural equation modelling (SEM) via PLS-SEM. However, I ran into the issue of ...
Mabso • 1
0 votes
0 answers
20 views

I’m implementing my own version of PCA and comparing it with scikit-learn's PCA. However, I’m noticing a discrepancy in the signs of the principal components. Using scikit-learn ...
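The sketch below compares a hand-rolled PCA with scikit-learn's `PCA` when the signs differ. Each principal axis is determined only up to a factor of $\pm 1$, and different solvers may pick different signs. The comparison therefore uses absolute cosines, or aligns the signs first, rather than comparing raw loadings. This assumes distinct eigenvalues; with ties, the components themselves are not unique.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4))

# Hand-rolled PCA via eigendecomposition of the covariance matrix
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
mine = eigvecs[:, ::-1].T            # rows = components, largest variance first

ref = PCA(n_components=4).fit(X).components_   # also rows = components

# Each component is determined only up to sign: compare |cosine| instead of raw values
cosines = np.abs(np.sum(mine * ref, axis=1))
print(np.round(cosines, 6))          # all ~1.0 when the two implementations agree

# Optionally flip each hand-rolled component to match the reference sign
signs = np.sign(np.sum(mine * ref, axis=1))
aligned = mine * signs[:, None]
```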
0 votes
0 answers
44 views

I am trying to build an index using the gdpc package in R. I am struggling to compute the PC1 provided by the package from the loadings and the input series. I want to build a chart representing the ...
user469831
0 votes
1 answer
110 views

I have a classification problem with ~2,500 observations and 50 features. I perform feature selection beforehand, reducing the set to around 17 features. While my selection methods effectively ...
randomstate42
0 votes
0 answers
50 views

I am working with Principal Component Analysis (PCA) and trying to evaluate reconstruction error. Specifically I am interested in being able to compare the results of PCA on differently scaled data (...
Zack • 3
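Here is a minimal sketch of the rank-$k$ reconstruction error, assuming rows are observations; the function name is illustrative. The squared Frobenius error equals the sum of the discarded squared singular values of the centered data. Dividing it by the total sum of squares gives a relative error. That ratio does not change when all of `X` is multiplied by one constant, but it does change when individual columns are rescaled. It therefore only partly addresses comparisons across differently scaled versions of the data.

```python
import numpy as np

def reconstruction_error(X, k):
    """Squared Frobenius error of a rank-k PCA reconstruction of centered X, plus its relative version."""
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    Wk = Vt[:k].T                              # d x k loadings
    Xhat = Xc @ Wk @ Wk.T                      # project onto the k-dim subspace and map back
    err = np.sum((Xc - Xhat) ** 2)
    rel = err / np.sum(Xc ** 2)                # fraction of total variance not captured
    return err, rel, s

rng = np.random.default_rng(6)
X = rng.normal(size=(250, 6)) @ rng.normal(size=(6, 6))
err, rel, s = reconstruction_error(X, k=2)
# The error equals the sum of the discarded squared singular values
print(np.isclose(err, np.sum(s[2:] ** 2)))
```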
1 vote
0 answers
86 views

So I'm analyzing an IDR (Intrinsically Disordered Region) amino acid sequence of a particular protein in several organisms (~700) and have calculated several features typically calculated for IDRs ...
Sunera Wijeratne
0 votes
0 answers
35 views

I have data from a field expedition in which quadrats were surveyed at multiple sites. The data for each site represent the percent cover of the species identified (note that the rows don't necessarily add ...
M. Beausoleil
2 votes
1 answer
131 views

I've found that the survey package in R allows using survey weights with principal component analysis, which is great. However, it doesn't seem to provide the same for correspondence analysis or ...
Coris • 23
1 vote
0 answers
60 views

If we have a matrix $X\in\mathbb{R}^{n\times p}$ with SVD $X = UDV^T$, we can say for example that the columns of $V$ are the principal directions and the columns of $UD$ are the principal components (...
user19904 • 294
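Here is a short NumPy check of those identities on random, column-centered data. Since $V^\top V = I$, it follows that $XV = UDV^\top V = UD$. The eigenvalues of the sample covariance are $d_j^2/(n-1)$.

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(size=(50, 3))
X = X - X.mean(axis=0)                 # the PCA interpretation assumes column-centered X

U, d, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T                               # columns of V: principal directions (loadings)
pcs = U * d                            # columns of UD: principal components (scores)

# X V = U D V^T V = U D, because V has orthonormal columns
same = np.allclose(X @ V, pcs)
# Eigenvalues of the sample covariance, in descending order, are d^2 / (n - 1)
lam = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]
```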
0 votes
0 answers
64 views

I am working on a case-cohort (~ case-control, but putting all cases in the subcohort) study evaluating miRNA markers. The variables of interest are continuous quantitative measures of miRNA ...
Javier Hernando
1 vote
1 answer
113 views

I am looking into the relationship between linear variational autoencoders (VAEs) and probabilistic PCA (pPCA) presented by Lucas et al. (2019) in the paper "Don't Blame the ELBO!". In the official ...
user1571823
0 votes
0 answers
62 views

What are the main differences, apart from the dynamic model's use of lags? I read this paper, where static factor models were explained as follows: given N time series of T periods each, they can be used to ...
IKNv99 • 111
0 votes
0 answers
77 views

I’ve been working on a classification problem with thousands of features, and I’m struggling with feature selection. I found this article that has a great breakdown of different Feature Selection ...
EMER Marketing
1 vote
1 answer
88 views

I am using partial least squares (PLS) to obtain linear model parameters in the presence of correlated covariates. I would like to try clustering in the PLS latent space, that is, the space ...
LearningAlgorithm
1 vote
1 answer
192 views

I recently conducted a Principal Component Analysis (PCA) on a dataset with a four-category target variable. While the PCA score plot revealed excellent separation for one group, the remaining three ...
Mamad Fasih
0 votes
0 answers
64 views

Background: I am analyzing data on a latitude-longitude grid and want to account for geographic distortions caused by the Earth's curvature (higher data density near the poles). To correct this, I plan ...
n0rdp0l
0 votes
0 answers
96 views

I have a bit of confusion regarding the scope of what PCA can do, and if it cannot do the thing I expected, whether any other, similar tool can. My understanding has been that PCA orthogonalizes ...
user10478 • 133
0 votes
0 answers
74 views

Assume that a random variable $y_{i,t}$ is governed by some linear factors $x_{t,j}$ and a random noise term $\epsilon_{i,t}$: $$ y_{i,t} = \sum_{j=1}^{M+1}\beta_{j,i}x_{t,j} + \epsilon_{i,t} $$ Written ...
deblue • 399
