This is kind of a tricky question because it crosses disciplines, but I am looking at the difference between time-series analysis in statistics and fitting parameters to ordinary differential equations. Technically, both are time-series models: an ordinary differential equation (ODE), in the most common cases, describes the evolution of a state variable or state vector over time.
Now, say I have a set of time-series data, such as economic growth or someone's heart-rate measurements over time; the specifics do not matter. If I were going to use a classical time-series method such as AR, ARIMA, etc., I would likely take the first differences of the observations and then estimate the parameters of the time-series model. Taking first differences helps ensure stationarity of the data, as every good statistician knows :). A shorthand ARIMA model with the AR and MA coefficients is below.
$$ y_t = \alpha + \beta_1 y_{t-1} + \cdots + \beta_p y_{t-p} + \gamma_1 \epsilon_{t-1} + \cdots + \gamma_q \epsilon_{t-q} + \epsilon_t $$
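To make the differencing step concrete, here is a minimal sketch (my own synthetic example, not from the blog post) that differences a random walk and fits an AR(1) to the differences by ordinary least squares:

```julia
using Random

Random.seed!(1)
y  = cumsum(randn(200))   # a random walk: nonstationary in levels
dy = diff(y)              # first differences: stationary

# OLS fit of an AR(1) on the differenced series: dy_t = α + β₁ dy_{t-1} + ε_t
X = [ones(length(dy) - 1) dy[1:end-1]]
α̂, β̂ = X \ dy[2:end]
```

The levels $y_t$ have a unit root, so the regression is run on $\Delta y_t$ instead.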
However, in the applied-math world we often fit the parameters of a model to data. So I have a model and I want to estimate the parameters of the differential equation. I would then define the loss function as the difference between the data and the solution of the ordinary differential equation given the parameter values. There are many ways to do this, including shooting methods, collocation methods, etc. But I have never seen anyone think about stationarity when applying these ODE parameter-fitting methods. The example below is taken from a tutorial for the Turing package in Julia. The simple ODE model below is vector-valued, but the fundamental equation is a function of the state $y$, time $t$, and a vector of parameters $\theta$.
$$ \frac{dy}{dt} = f(y, t, \theta) $$
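The Turing tutorial does full Bayesian inference; as a simpler stand-in, here is a least-squares single-shooting sketch using DifferentialEquations.jl and Optim.jl, with a hypothetical logistic-growth model (note that DifferentialEquations.jl orders the arguments as $f(y, \theta, t)$ rather than $f(y, t, \theta)$):

```julia
using DifferentialEquations, Optim, Random

# Hypothetical model: logistic growth with θ = (r, K)
f(y, θ, t) = θ[1] * y * (1 - y / θ[2])

# Synthetic "observations": solve with known parameters, then add noise
Random.seed!(1)
θ_true = [0.8, 10.0]
ts = 0.0:0.5:10.0
truth = solve(ODEProblem(f, 0.5, (0.0, 10.0), θ_true), Tsit5(); saveat=ts)
data = truth.u .+ 0.1 .* randn(length(ts))

# Loss: squared distance between the ODE solution under θ and the data
function loss(θ)
    sol = solve(ODEProblem(f, data[1], (0.0, 10.0), θ), Tsit5(); saveat=ts)
    sum(abs2, sol.u .- data)
end

res = optimize(loss, [0.5, 5.0], NelderMead())  # crude single shooting
```

Notice that nothing in this pipeline asks whether the observed series is stationary: the solver simply integrates whatever trajectory $\theta$ implies, transient or not.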
Hence I was just wondering if there is a good explanation of why ODE-fitting methods don't really need to think about stationarity. Indeed, I am working on a discrete-time simulation with some difference equations, so I am right in the middle of this issue, haha.
I do have some basic intuition here. Interestingly enough, the ODE methods are essentially capturing the first differences, because when the equation is discretized numerically, $f(y, t, \theta)$ is the first difference of the state divided by the size of the timestep: $\frac{dy}{dt} \approx \frac{y_{t+1} - y_t}{h} = f(y_t, t, \theta)$, so $y_{t+1} - y_t \approx h\, f(y_t, t, \theta)$. But I was not sure how this finite-difference idea relates back to the fundamental idea of stationarity. That is the crucial link that is kinda eluding me. If anyone has some thoughts, please chime in.
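Here is that intuition written out as code: under a forward-Euler discretization, the fitted model is literally a model for the first differences of the state. The helper `euler_differences` below is my own illustrative function, keeping the $f(y, t, \theta)$ argument order from above:

```julia
# Forward Euler: the model predicts the first differences directly,
#   y[k+1] - y[k] ≈ h * f(y[k], t[k], θ)
function euler_differences(f, y0, ts, θ)
    y = [y0]
    for k in 1:length(ts)-1
        h = ts[k+1] - ts[k]
        push!(y, y[k] + h * f(y[k], ts[k], θ))
    end
    return y
end

# Example: exponential decay dy/dt = θ*y with θ = -0.5
ts = 0.0:0.1:1.0
ys = euler_differences((y, t, θ) -> θ * y, 1.0, ts, -0.5)
# By construction, diff(ys) equals 0.1 .* (-0.5) .* ys[1:end-1]:
# the deterministic analogue of modeling Δy_t in an ARIMA setting.
```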

