$\begingroup$

I have a problem with some health data that I'm trying to analyze. The main issue originates from a census variable that is derived from self-reported times. The variable is sleep duration, computed from the hour at which the survey-taker reports going to sleep and the hour at which they wake up. The documentation says that the value is then rounded to the nearest half hour. Here is a histogram of the data:

[Image: histogram of the sleep duration data]

Because of the self-reported nature of the data, and the rounding, there seems to be a bias towards whole-hour values over half-hour values. Intuitively, I'd expect this variable to be normally distributed. I want to somehow correct this bias, or at least artificially modify the data so it is distributed rationally.

I do not mind modifying the data, as having a sensible distribution is more important than accurately mirroring the survey data for me. I tried adding Gaussian noise with SD=0.5, and I got the following histogram:

[Image: histogram of the sleep duration data after adding Gaussian noise]

This looks more like what I would expect the actual values to look like. However, I don't know if there is a better or standard way to correct/analyze data with this kind of bias. If there is, or if there is some flaw in my reasoning, please let me know.
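
For concreteness, the noise step described above can be sketched in R as follows. This is only an illustration under stated assumptions: the vector `sleep` is simulated here (the survey data is not reproduced), the mean of 7 and SD of 1.2 are made-up values, and the extra heaping toward whole hours is not modeled, only the rounding to the nearest half hour. It shows that independent Gaussian noise with SD 0.5 smooths the histogram but also inflates the variance by roughly 0.5^2 = 0.25.

```r
set.seed(1)

# Hypothetical stand-in for the survey variable: sleep duration in hours,
# rounded to the nearest half hour as the documentation describes.
# The mean and SD below are assumptions chosen for illustration only.
n <- 5000
true_sleep <- rnorm(n, mean = 7, sd = 1.2)
sleep <- round(true_sleep * 2) / 2

# The modification tried in the question: independent Gaussian noise, SD = 0.5.
sleep_noisy <- sleep + rnorm(n, mean = 0, sd = 0.5)

# Side-by-side histograms of the rounded and the jittered values.
op <- par(mfrow = c(1, 2))
hist(sleep, breaks = 30, main = "Rounded", xlab = "Hours")
hist(sleep_noisy, breaks = 30, main = "Rounded + noise", xlab = "Hours")
par(op)

# The noise is independent of the data, so the variance grows by
# roughly sd^2 = 0.25.
var_increase <- var(sleep_noisy) - var(sleep)
print(var_increase)
```

The smoother histogram therefore comes at the cost of extra variance in the variable that does not come from the respondents.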

$\endgroup$
  • $\begingroup$ This seems like a VERY bad idea to me. But what are you going to do with the sleep variable? Is it a DV in a regression? An IV? Part of a cluster analysis? Or what? $\endgroup$ Commented Nov 18, 2023 at 11:35
  • $\begingroup$ "having a sensible distribution is more important than accurately mirroring the survey data for me" Why? $\endgroup$ Commented Nov 18, 2023 at 11:45
  • $\begingroup$ I don't actually care about the data all that much. It's for a statistics class, and I've spoken with my professor about this issue. She doesn't mind if the data is partially artificial. The assignment is more about applying statistical methods rather than doing a super rigorous analysis. $\endgroup$ Commented Nov 18, 2023 at 11:48
  • $\begingroup$ I'm going to do a multiple regression with sleep as the DV $\endgroup$ Commented Nov 18, 2023 at 11:48
  • $\begingroup$ I will also probably do some kind of two sample hypothesis test $\endgroup$ Commented Nov 18, 2023 at 12:02

1 Answer

$\begingroup$

You say:

I want to somehow correct this bias, or at least artificially modify the data so it is distributed rationally.

and also:

I'm going to do a multiple regression with sleep as the DV

In that case do NOT "modify" the data. Just use sleep as it exists in the dataset as your outcome. Then do the usual regression diagnostics. Modifying the data seems to be a very bad idea, as Peter Flom mentioned in a comment on the question. Adding noise to your outcome variable does not make any sense to me. Also, bear in mind that a histogram can look very strange depending on the bins you use. For example:

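set.seed(101)  # fix the seed so the plot is reproducible
# 100 standard normal draws, with about 20 bins requested
hist(rnorm(100), breaks = 20)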

produces this: [Image: histogram of the simulated data]
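
To see how much the binning alone matters, the same simulated sample can be drawn with several `breaks` settings. This is a small sketch extending the example above, not part of any required analysis. Note that `hist()` treats a single number passed as `breaks` as a suggestion only and picks "pretty" breakpoints, so the number of bins actually used may differ from the requested value.

```r
set.seed(101)
x <- rnorm(100)  # same seed and call as the example above

# Draw the same data with three different bin settings.
op <- par(mfrow = c(1, 3))
h5  <- hist(x, breaks = 5,  main = "breaks = 5")
h20 <- hist(x, breaks = 20, main = "breaks = 20")
h50 <- hist(x, breaks = 50, main = "breaks = 50")
par(op)

# Number of bins actually used in each plot.
print(sapply(list(h5, h20, h50), function(h) length(h$counts)))
```

The data are identical in all three panels; only the bin width changes, which is why the shape of a single histogram is weak evidence about the underlying distribution.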

$\endgroup$
  • $\begingroup$ I also need to fit a distribution. I'm sorry I failed to mention that. $\endgroup$ Commented Nov 18, 2023 at 12:28
  • $\begingroup$ That doesn't change anything, does it? $\endgroup$ Commented Nov 18, 2023 at 12:29
  • $\begingroup$ I need to do a goodness of fit test, and I suspect that will not go well. $\endgroup$ Commented Nov 18, 2023 at 12:31
  • $\begingroup$ You mean a goodness of fit test of your model fit? That's just part of the usual regression diagnostics. It doesn't change anything. $\endgroup$ Commented Nov 18, 2023 at 12:34
  • $\begingroup$ So go ahead and fit the distribution to the data. It may or may not be a good fit. Are you marked based on how well it fits? I would doubt that. I would think the teacher just wants to see that you know how to fit a distribution to data and interpret the finding. As for the regression model, none of that changes my advice. But it seems that you already know what you want to do, so it makes me wonder why you are asking a question here. I think you've been told by several people that what you want to do is not a good idea, but you seem intent on doing what you want despite the best advice. $\endgroup$ Commented Nov 18, 2023 at 13:24
