I am working on a project analyzing olive plantation data, where I aim to simulate the relationship between investment costs (Costs), revenues (Revenues), and temperature (Temp) over time, accounting for the specific temporal dynamics of the data. The goal is to generate realistic scenarios for future tree plantations. My idea is to employ copulas.
The data I have consists of annual records for 10 years, where:
Costs represent the investments required for the olive plantation.
Revenues represent the returns from the sale of olives.
Temp is the annual average temperature.
TempCng is the annual temperature change.
Since the data is inherently temporal (i.e., Costs and Revenues are not independent and identically distributed over time), I aim to capture the time structure, particularly the significant initial investments (Costs), followed by revenues (Revenues) that only materialize after several years as the trees need time to grow. To address this, I include a time trend variable in my analysis.
Here’s my approach so far:
# Packages
library(VineCopula)
library(copula)
# Synthetic data for convenience
Costs <- c(100, 0, 150, 50, 0, 0, 0, 0, 0, 0)
Revenues <- c(0, 0, 0, 50, 0, 225, 100, 0, 150, 5)
Temp <- c(20.00, 21.60, 16.05, 15.68, 17.40, 19.51, 19.87, 19.02, 18.21, 18.18)
TempCng <- c(0.001464764, diff(Temp) / head(Temp, -1))
Years <- seq(2008,2017)
# Create data frame
OliveTrees <- data.frame(Costs, Revenues, Temp, TempCng, row.names = Years)
# Compute mean and standard deviation
mu_C <- mean(Costs)
mu_R <- mean(Revenues)
mu_T <- mean(TempCng)
sigma_C <- sd(Costs)
sigma_R <- sd(Revenues)
sigma_T <- sd(TempCng)
# Normalize the data
OliveTrees$CNorm <- (OliveTrees$Costs - mu_C) / sigma_C
OliveTrees$RNorm <- (OliveTrees$Revenues - mu_R) / sigma_R
OliveTrees$TNorm <- (OliveTrees$TempCng - mu_T) / sigma_T
# Apply empirical distribution
C_dist <- pobs(OliveTrees$CNorm)
R_dist <- pobs(OliveTrees$RNorm)
T_dist <- pobs(OliveTrees$TNorm)
# Time trend (sequence of years)
S_dist <- pobs(1:nrow(OliveTrees))
# Combine the distributions
U <- cbind(C_dist, R_dist, T_dist, S_dist)
# Fit a Gaussian copula
CopulaModel <- normalCopula(dim = 4, dispstr = 'un')
FittedCopula <- fitCopula(CopulaModel, U, method = 'ml')
CopulaModel@parameters <- coef(FittedCopula)
# Simulate from the copula
set.seed(321)
U <- rCopula(n = nrow(OliveTrees), CopulaModel)
# Sort the simulated values to account for the time trend
U <- U[order(U[, 4]), ]
# Apply the inverse CDF to get the simulated values
C_sim <- quantile(OliveTrees$CNorm, U[, 1])
R_sim <- quantile(OliveTrees$RNorm, U[, 2])
T_sim <- quantile(OliveTrees$TNorm, U[, 3])
# Denormalize the simulated values
C_sim <- round(C_sim * sigma_C + mu_C, 2)
R_sim <- round(R_sim * sigma_R + mu_R, 2)
T_sim <- T_sim * sigma_T + mu_T
# Create a data frame for the simulation results
OliveTrees_sim <- data.frame(C_sim, R_sim, T_sim, row.names = Years)
OliveTrees_sim$Temp <- round(OliveTrees$Temp[1] * c(1, cumprod(1 + OliveTrees_sim$T_sim[2:length(OliveTrees_sim$T_sim)])), 2)
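As a side check on the inputs to the copula (a sketch, not part of the pipeline above), the snippet below looks at what the rank transform actually sees. It assumes that rank / (n + 1) with average ranks for ties mirrors what I understand pobs() to compute by default. The helper name pseudo_obs is hypothetical and only there so that the block runs in base R. It checks whether standardizing changes the pseudo-observations, counts the tied values in Costs, and measures the monotone association of each series with the year index using Kendall's tau.

```r
# Synthetic data copied from the post above
Costs    <- c(100, 0, 150, 50, 0, 0, 0, 0, 0, 0)
Revenues <- c(0, 0, 0, 50, 0, 225, 100, 0, 150, 5)
t_index  <- seq_along(Costs)

# Hypothetical helper: rank-based pseudo-observations, rank / (n + 1),
# using average ranks for ties (assumed to match the pobs() default)
pseudo_obs <- function(x) rank(x, ties.method = "average") / (length(x) + 1)

# Standardizing is a strictly increasing transform, so the ranks
# (and therefore the pseudo-observations) should not change
C_raw  <- pseudo_obs(Costs)
C_norm <- pseudo_obs((Costs - mean(Costs)) / sd(Costs))
same_ranks <- isTRUE(all.equal(C_raw, C_norm))

# Number of observations that repeat an earlier value (ties in Costs)
n_ties_C <- sum(duplicated(Costs))

# Kendall's tau against the year index: the monotone association with
# time that a copula with a time margin can pick up
tau_C <- cor(Costs, t_index, method = "kendall")
tau_R <- cor(Revenues, t_index, method = "kendall")

print(c(same_ranks = same_ranks, n_ties_C = n_ties_C))
print(c(tau_C = tau_C, tau_R = tau_R))
```

If same_ranks is TRUE, the normalization step has no effect on the fitted copula and only matters for the quantile step. The tie count matters because most of the Costs values are zero, so the pseudo-observations for Costs are far from continuous uniform draws.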
My Questions:
Is this copula approach valid for accounting for the temporal dynamics of olive plantation data? Specifically, temporal dynamics refer to the fact that there are large initial costs followed by growing revenues, and that both are not IID due to the time structure.
Is including a time trend (in the form of a sequence of years) a suitable solution for modeling the temporal dependencies?
Is there any literature or research that supports this approach, or are there better ways to model the temporal dependency in the data?
Are there other modeling approaches or improvements that would better capture the temporal dynamics between Costs, Revenues, and Temperature?
EDIT: The motivation for my analysis is to calculate the internal rate of return of my investment and to understand how sensitive it is to temperature changes. I will certainly include other variables (rain, storms,...); the variables shown above are a simplification.
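To make the target quantity concrete, here is a minimal sketch of how I would compute the internal rate of return from one path of costs and revenues. It rests on several assumptions of mine: the net cash flow in each year is Revenues minus Costs, the first year is treated as t = 0, and the root is searched on the interval [0, 1]. The helper name npv is hypothetical, and the block uses only base R (uniroot).

```r
# Synthetic data copied from the post above
Costs    <- c(100, 0, 150, 50, 0, 0, 0, 0, 0, 0)
Revenues <- c(0, 0, 0, 50, 0, 225, 100, 0, 150, 5)

# Net cash flow per year; the first year is treated as t = 0 (assumption)
cash_flow <- Revenues - Costs

# Hypothetical helper: net present value of a cash-flow vector at rate r
npv <- function(r, cf) sum(cf / (1 + r)^(seq_along(cf) - 1))

# The IRR is the rate at which the NPV is zero.
# The search interval [0, 1] is an assumption; uniroot() needs the NPV
# to have opposite signs at the two endpoints.
irr <- uniroot(npv, interval = c(0, 1), cf = cash_flow, tol = 1e-10)$root

print(irr)
```

The same function could be applied to each simulated path, for example with OliveTrees_sim$R_sim - OliveTrees_sim$C_sim as the cash flow. A path without a sign change of the NPV on the chosen interval would make uniroot() stop with an error, so that case would need separate handling.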
Thank you for your help!


