In Section 3 of Atem et al. (2018) (DOI: 10.1002/bimj.201800275), the authors make the following claim about the so-called "imputer/imputation model" (i.e. the model used to impute missing covariate values) and the "analysis model" (the model on which the primary inference is based):
> An imputation model for the covariate 𝑋 can be obtained by specifying the conditional distribution 𝑓(𝑉,𝐷|𝐙). However, if such an imputation model is not compatible with the substantive model, the imputation procedure may lead to specious results. As suggested by Bartlett et al. (2015), such an incompatibility can be avoided if there is a joint model for the outcome and the covariate of interest from which we deduce an imputation model or algorithm. Our imputation model is similar to the method proposed by Rubin (2004), Schafer (1999), and Meng (1994). In order to eliminate inconsistency, they proposed that the assumptions in both models (imputer and analyst model) should be similar and the imputer model should not make more assumptions than the analyst model. The conditional distribution of such a joint model, given the available covariates, would correspond to the given (correctly) specified substantive model.
What specifically is meant by "compatibility" vs. "incompatibility" between these models, and precisely what inconsistency arises when they are incompatible?
For a simple example, suppose I have a dataset comprising three variables, $[X, M, Y]$, and my primary inference is for the response $Y$ conditional on $X$. $M$ has been causally determined to be a mediator and is therefore excluded from the analysis model. Further, say $X$ has missing values, so we wish to multiply impute them and perform inference using Rubin's rules. Are Atem et al. suggesting I should omit $M$ from the imputer model because including it would "make more assumptions" than the analyst model?
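To make the setup concrete, here is a minimal simulation sketch of the example (my own construction, not taken from Atem et al.). Everything specific in it is an assumption chosen for illustration: the coefficient values, the jointly normal data, values of $X$ missing completely at random, and a Bayesian normal linear regression as the imputer. It imputes $X$ once from $Y$ alone and once from $Y$ and $M$, fits the analysis model $Y \sim X$ on each completed dataset, and pools the slope with Rubin's rules, so the two imputer choices can be compared side by side.

```python
import numpy as np


def draw_imputation(rng, x_obs, z_obs, z_mis):
    """One proper imputation draw of X from a normal linear model X ~ Z."""
    A = np.column_stack([np.ones(len(x_obs)), z_obs])
    beta_hat, *_ = np.linalg.lstsq(A, x_obs, rcond=None)
    resid = x_obs - A @ beta_hat
    df = len(x_obs) - A.shape[1]
    # Draw the residual variance, then the coefficients, from their
    # approximate posterior so that parameter uncertainty is propagated.
    sigma2 = resid @ resid / rng.chisquare(df)
    beta = rng.multivariate_normal(beta_hat, sigma2 * np.linalg.inv(A.T @ A))
    B = np.column_stack([np.ones(len(z_mis)), z_mis])
    return B @ beta + rng.normal(0.0, np.sqrt(sigma2), size=B.shape[0])


def slope_and_var(x, y):
    """OLS slope of y on x and its estimated sampling variance."""
    A = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    s2 = resid @ resid / (len(y) - 2)
    return coef[1], s2 * np.linalg.inv(A.T @ A)[1, 1]


def pool(estimates, variances):
    """Rubin's rules: pooled estimate and total variance."""
    m = len(estimates)
    within = np.mean(variances)
    between = np.var(estimates, ddof=1)
    return np.mean(estimates), within + (1 + 1 / m) * between


def run(seed=0, n=2000, m=20):
    rng = np.random.default_rng(seed)
    # Assumed data-generating process: X -> M -> Y plus a direct X -> Y path.
    X = rng.normal(size=n)
    M = 0.8 * X + rng.normal(size=n)
    Y = 0.5 * X + 0.7 * M + rng.normal(size=n)
    miss = rng.random(n) < 0.3  # assumed MCAR missingness in X

    imputers = {"Y only": Y, "Y and M": np.column_stack([Y, M])}
    results = {}
    for label, Z in imputers.items():
        est, var = [], []
        for _ in range(m):
            X_imp = X.copy()
            X_imp[miss] = draw_imputation(rng, X[~miss], Z[~miss], Z[miss])
            q, u = slope_and_var(X_imp, Y)  # analysis model: Y ~ X
            est.append(q)
            var.append(u)
        results[label] = pool(est, var)
    return results


if __name__ == "__main__":
    # Total effect of X on Y under the assumed coefficients: 0.5 + 0.7 * 0.8
    print("target slope:", 0.5 + 0.7 * 0.8)
    for label, (q, t) in run().items():
        print(f"imputer using {label}: slope={q:.3f}, se={np.sqrt(t):.3f}")
```

Changing the missingness line so that it depends on $M$ or $Y$ (MAR rather than MCAR) would be the natural way to probe whether leaving $M$ out of the imputer matters in this setup.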