I have spectroscopy data with X variables (X1 to X80) and a corresponding Y variable, and I need to fit a PLSR model in R using the `pls` package. There are two sheets: in one, the X variables are raw (no pre-processing); in the other, they are vector-normalized.
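For reference, a minimal sketch of what "vector-normalized" is assumed to mean here: each spectrum (row) is divided by its own Euclidean (L2) norm so that every sample has unit length. Some spectroscopy software also subtracts each spectrum's mean before dividing, so this may not match the second sheet exactly. `vector_normalize` is a hypothetical helper, not part of `pls`.

```r
# Hypothetical helper (not part of the pls package): scale every spectrum (row)
# to unit Euclidean length, i.e. x_i / sqrt(sum(x_i^2)).
vector_normalize <- function(X) {
  X <- as.matrix(X)
  norms <- sqrt(rowSums(X^2))  # L2 norm of each spectrum
  sweep(X, 1, norms, "/")      # divide each row by its own norm
}

# Two made-up "spectra" with three wavelengths each
X <- rbind(c(3, 4, 0),
           c(1, 2, 2))
Xn <- vector_normalize(X)
print(Xn)
print(rowSums(Xn^2))  # both rows now have squared norm 1
```

On a data frame such as `data1`, this would be applied to the X columns only, never to `Y`.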
Here is what I have done:
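```r
library(pls)

data1 <- read.csv("spectra.csv", header = TRUE)

# Divide the data into a 60 % calibration (training) set and a 40 % validation (test) set
set.seed(123)  # arbitrary seed, only to make the random split reproducible
index <- sample(seq_len(nrow(data1)), size = floor(0.6 * nrow(data1)))
cal.data <- data1[index, ]
val.data <- data1[-index, ]

# PLS regression of Y on all other columns, with leave-one-out cross-validation
model1 <- plsr(Y ~ ., data = cal.data, scale = TRUE, ncomp = 15, validation = "LOO")
summary(model1)
```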
```
Data:   X dimension: 51 48
        Y dimension: 51 1
Fit method: kernelpls
Number of components considered: 15

VALIDATION: RMSEP
Cross-validated using 51 leave-one-out segments.
         (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
CV             9.832    10.07    10.37    11.04    11.36    11.77    12.26
adjCV          9.832    10.07    10.36    11.03    11.33    11.76    12.22
         7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
CV         11.92    13.54    14.22     15.38     15.58     15.60     16.01
adjCV      11.88    13.50    14.15     15.28     15.47     15.48     15.87
         14 comps  15 comps
CV          16.73     16.62
adjCV       16.57     16.47

TRAINING: % variance explained
     1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps
X    95.6513   98.402   99.161    99.71    99.80    99.85    99.88
Y     0.6143    4.903    7.394    10.53    18.13    31.04    38.64
     8 comps  9 comps  10 comps  11 comps  12 comps  13 comps  14 comps
X      99.94    99.95     99.96     99.96     99.96     99.97     99.97
Y      42.21    55.37     66.60     72.20     78.77     85.49     90.23
     15 comps
X       99.97
Y       92.33
```
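The split creates `val.data`, but the model is only assessed with leave-one-out CV on `cal.data`. A minimal sketch of scoring the held-out samples with the `pls` functions `RMSEP()`, `R2()` and `predict()` follows. Because `spectra.csv` isn't available, it simulates a hypothetical data frame of the same shape (85 rows, so that a 60 % split gives the 51 calibration samples in the summary; columns `X1` to `X48` and `Y`). The values it prints therefore say nothing about the real spectra, and the choice of 5 components for `predict()` is arbitrary.

```r
library(pls)

set.seed(1)
# Simulated stand-in for spectra.csv (hypothetical): 85 samples, columns X1..X48 and Y
n <- 85; p <- 48
X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("X", seq_len(p))))
data1 <- data.frame(X, Y = drop(X[, 1:5] %*% rep(2, 5)) + rnorm(n))

# Same 60/40 calibration/validation split and model as in the question
index <- sample(seq_len(nrow(data1)), size = floor(0.6 * nrow(data1)))
cal.data <- data1[index, ]
val.data <- data1[-index, ]
model1 <- plsr(Y ~ ., data = cal.data, scale = TRUE, ncomp = 15, validation = "LOO")

# Error and R2 on the held-out samples, for 0 (intercept only) up to 15 components
test_rmsep <- RMSEP(model1, estimate = "test", newdata = val.data)
test_r2 <- R2(model1, estimate = "test", newdata = val.data)
print(test_rmsep)
print(test_r2)

# Validation-set predictions for one (arbitrarily chosen) number of components
pred_val <- predict(model1, newdata = val.data, ncomp = 5)
```

Comparing the test-set RMSEP curve with the CV curve from `summary()` shows whether the cross-validated picture holds on data the model never saw.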
Currently I am facing two major issues:
- From this PLSR result I am unable to determine the number of components, because the cross-validated RMSEP increases with every added component (the intercept-only model has the lowest CV RMSEP, 9.832).
- The R2 value is very low (< 0.1).
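Both issues can be read off the cross-validated curves. Cross-validated R2 is 1 minus PRESS divided by the total sum of squares of Y, so whenever the CV RMSEP is at or above the intercept-only value (as for every row of the table above), CV R2 is roughly zero or negative. Meanwhile the training Y variance explained climbs to about 92 %, the usual signature of overfitting. A minimal sketch of extracting these curves and asking `pls` for a component count follows; it uses simulated, hypothetical data in place of `cal.data`, so the numbers are illustrative only. On the data in the question, both criteria would likely point to 0 components.

```r
library(pls)

set.seed(1)
# Simulated stand-in (hypothetical) for cal.data: 51 samples, X1..X48 and Y
n <- 51; p <- 48
X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("X", seq_len(p))))
cal.data <- data.frame(X, Y = drop(X[, 1:5] %*% rep(2, 5)) + rnorm(n))
model1 <- plsr(Y ~ ., data = cal.data, scale = TRUE, ncomp = 15, validation = "LOO")

# Cross-validated RMSEP and R2; element 1 is the intercept-only model (0 components)
cv_rmsep <- RMSEP(model1, estimate = "CV")$val[1, 1, ]
cv_r2 <- R2(model1, estimate = "CV")$val[1, 1, ]
print(round(rbind(RMSEP = cv_rmsep, R2 = cv_r2), 3))

# Component count with the lowest CV RMSEP (0 means "intercept only")
best_min <- which.min(cv_rmsep) - 1

# pls heuristic: smallest model whose CV error is within one standard error of the minimum
best_onesigma <- selectNcomp(model1, method = "onesigma", plot = FALSE)
print(c(min_rmsep = best_min, onesigma = best_onesigma))
```

Setting `plot = TRUE` in `selectNcomp()`, or calling `plot(RMSEP(model1), legendpos = "topright")`, draws the same curves for visual inspection.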
How should I determine the number of PLS components, and how can I improve the model's accuracy?
I have also fitted the PLS model on the vector-normalized data, but I get nearly the same result.
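To compare the raw and vector-normalized sheets on an equal footing, both can be fitted with the same settings and their CV RMSEP curves placed side by side. This is a minimal sketch under stated assumptions: the data are simulated (hypothetical positive "spectra" with 48 columns), vector normalization is taken to be plain L2 row scaling, and `fit_cv_rmsep` is a hypothetical helper name.

```r
library(pls)

set.seed(1)
n <- 51; p <- 48
# Hypothetical stand-in for the raw sheet: positive, spectrum-like values
X_raw <- matrix(runif(n * p, 0.1, 1), n, p, dimnames = list(NULL, paste0("X", seq_len(p))))
y <- drop(X_raw[, 1:5] %*% rep(2, 5)) + rnorm(n, sd = 0.1)

# Vector normalization, assumed here to be plain L2 scaling of each row
X_vn <- X_raw / sqrt(rowSums(X_raw^2))

# Hypothetical helper: fit the question's model and return its LOO CV RMSEP curve
fit_cv_rmsep <- function(X, y, ncomp = 15) {
  d <- data.frame(X, Y = y)
  m <- plsr(Y ~ ., data = d, scale = TRUE, ncomp = ncomp, validation = "LOO")
  RMSEP(m, estimate = "CV")$val[1, 1, ]
}

comparison <- rbind(raw = fit_cv_rmsep(X_raw, y),
                    vector_normalized = fit_cv_rmsep(X_vn, y))
print(round(comparison, 3))

# Components (0 = intercept only) with the lowest CV RMSEP for each variant
best <- apply(comparison, 1, which.min) - 1
print(best)
```

Because leave-one-out segments are deterministic, both variants are scored on identical folds, so any difference between the two rows comes from the pre-processing alone.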