How to estimate missing data?

Question

I am running a regression with several independent variables on 32 yearly observations (1975 to 2006). The problem is that one of the variables has no observations before 1980, so it has 5 missing values (1975 to 1979). Is there a method in R to estimate these missing values? The explanatory variable in question is "total labor force", which has a very pronounced trend, so I believe it is statistically feasible to estimate its past values.

Could you please use it in a line of code for me? Thank you @MattAlbrecht
– SavedByJESUS, Commented Apr 12, 2012 at 5:57
Labour markets and economies around the world were undergoing a period of great change at this time. It may not be sensible to extrapolate backwards from your data. en.wikipedia.org/wiki/Early_1980s_recession
– James, Commented Apr 13, 2012 at 11:01
@James Thank you very much for pointing this out, but the country under consideration is Côte d'Ivoire (West Africa) and, as it is a small open economy, the impact of the major changes in the 1980s may have been minimal.
– SavedByJESUS, Commented Apr 14, 2012 at 4:15
You might be interested in stats.stackexchange.com/questions/13984/… ... it's a related problem, and has some worked examples
– naught101, Commented Apr 17, 2012 at 12:58

Matt Albrecht · Accepted Answer · 2012-04-12 06:16:35Z

7

x <- 1:30; y <- c(rnorm(25) + 1:25, rep(NA, 5)) # generate data with NAs in the last 5 rows
df1 <- data.frame(x, y)                         # combine into data frame
lmx <- lm(y ~ x, data = df1)                    # fit the model (rows with NA in y are dropped)
ndf <- data.frame(x = 1:30)                     # data to predict for, including the NA rows
df1$fit <- predict(lmx, newdata = ndf)          # get predictions for all 30 rows
df1$y2 <- with(df1, ifelse(is.na(y), fit, y))   # observed y where available, fitted value otherwise

The last line creates a new column, y2, in the data frame: it keeps the observed values of y and fills each missing value with the fitted value from the regression.
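Applied to the layout in the question (yearly data for 1975 to 2006, with the first five years missing), the same idea might look like the sketch below. The labour force series is simulated, and the names year, labor, trend and backcast are made up for illustration. A straight-line trend is an assumption that should be checked against the real data; the prediction interval gives a rough sense of how uncertain the backcast values are.

```r
set.seed(42)                                   # make the simulated data reproducible
year  <- 1975:2006
labor <- 3e6 + 1.2e5 * (year - 1975) + rnorm(32, sd = 2e4)  # hypothetical trending series
labor[year < 1980] <- NA                       # 1975-1979 are unobserved

dat   <- data.frame(year, labor)
trend <- lm(labor ~ year, data = dat)          # lm() drops the NA rows when fitting
pred  <- predict(trend, newdata = dat, interval = "prediction")

# keep observed values, fill the gaps with the fitted trend
dat$backcast <- ifelse(is.na(dat$labor), pred[, "fit"], dat$labor)
dat$lwr <- pred[, "lwr"]                       # lower 95% prediction bound
dat$upr <- pred[, "upr"]                       # upper 95% prediction bound
head(dat, 6)
```

Only the rows where labor is NA are replaced; the observed years are left untouched.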


Of course lm may or may not be the model you are looking for...
– nico, Commented Apr 12, 2012 at 6:57
@MattAlbrecht This is exactly what I was looking for. Thank you so much!
– SavedByJESUS, Commented Apr 14, 2012 at 19:01
No problem. Just make sure that what you're doing is defensible.
– Matt Albrecht, Commented Apr 15, 2012 at 8:43


mpiktas · Accepted Answer · 2012-04-13 08:45:18Z

It is often a good idea to consider why the data are missing, i.e. whether they are missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR). Depending on the mechanism, a given method for estimating the missing data may be biased.
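A small simulation can make the difference between these mechanisms concrete. This is only an illustrative sketch with made-up numbers: the variable x, the 30% missingness rate and the logistic missingness rule are all assumptions, not anything taken from the question's data.

```r
set.seed(1)
n <- 10000
x <- rnorm(n, mean = 50, sd = 10)      # complete variable (not fully observed in practice)

# MCAR: every value has the same 30% chance of being missing
x_mcar <- ifelse(runif(n) < 0.3, NA, x)

# MNAR: the chance of being missing rises with the value itself
p_miss <- plogis((x - 50) / 5)
x_mnar <- ifelse(runif(n) < p_miss, NA, x)

res <- c(true = mean(x),
         mcar = mean(x_mcar, na.rm = TRUE),   # close to the true mean
         mnar = mean(x_mnar, na.rm = TRUE))   # pulled below the true mean
res
```

Under MCAR the mean of the observed values stays close to the true mean, while under this MNAR rule it is systematically too low, because large values are the ones that tend to go missing.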

A sophisticated way to deal with data that are missing at random is multiple imputation, which acknowledges that there is uncertainty about the values of the missing quantities. This can be done in R using the mice package. Here is a reproducible example using the nhanes data that comes with the package:

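library(mice)
imp <- mice(nhanes)                    # create multiply imputed data sets (5 by default)
fit <- with(imp, lm(bmi ~ chl + hyp))  # fit the model to each imputed data set
fit
summary(pool(fit))                     # pool the estimates across the imputed data sets
complete(imp)                          # returns the first imputed data set; complete(imp, 2) returns the second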

oDDsKooL · Accepted Answer · 2012-04-13 07:57:36Z

4

Another approach would be to use a simulation-based method such as Gibbs sampling, based on statistics from the observed data.

I believe there is support for that in R: http://darrenjw.wordpress.com/2011/07/31/faster-gibbs-sampling-mcmc-from-within-r/
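To give a rough idea of what that could look like, here is a minimal base-R sketch of a Gibbs sampler (data augmentation) for a linear trend model with missing responses. It is not taken from the linked post. The data are simulated, and the flat prior, the straight-line model and all names (y_cur, draws, post, and so on) are assumptions made for illustration. The sampler alternates between drawing the regression parameters given the currently completed series and drawing the missing values given those parameters.

```r
set.seed(123)
x <- 1:32
y <- 10 + 2 * x + rnorm(32)            # hypothetical series with a linear trend
y[1:5] <- NA                           # first five values unobserved
mis <- is.na(y)

X <- cbind(1, x)                       # design matrix: intercept and trend
XtX_inv <- solve(crossprod(X))
n <- length(y); p <- ncol(X)

y_cur <- y
y_cur[mis] <- mean(y, na.rm = TRUE)    # crude starting values for the missing entries

n_iter <- 2000; burn <- 500
draws <- matrix(NA_real_, n_iter, sum(mis))

for (i in seq_len(n_iter)) {
  # Step 1: draw (sigma^2, beta) given the completed data, under a flat prior
  beta_hat <- XtX_inv %*% crossprod(X, y_cur)
  rss      <- sum((y_cur - X %*% beta_hat)^2)
  sigma2   <- rss / rchisq(1, df = n - p)
  beta     <- beta_hat + t(chol(sigma2 * XtX_inv)) %*% rnorm(p)
  # Step 2: draw the missing values given the parameters
  y_cur[mis] <- X[mis, , drop = FALSE] %*% beta + rnorm(sum(mis), sd = sqrt(sigma2))
  draws[i, ] <- y_cur[mis]
}

post <- draws[-seq_len(burn), ]                       # discard burn-in draws
colMeans(post)                                        # posterior mean for each missing value
apply(post, 2, quantile, probs = c(0.025, 0.975))     # 95% credible intervals
```

Each row of post is one plausible completion of the five missing values, so the spread across rows reflects the uncertainty about them rather than a single point estimate.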
