My data looks similar to this (the picture below is not mine, but it describes my situation perfectly):

[image not available]

The IDs are not unique, but each ID value has exactly one target value. The following solution has been suggested:

[image not available]

Sadly, that solution does not work for me (I have more than one table with duplicate IDs). Is there any other way to solve this problem?

PS: I do not know whether I should credit the source of the pictures; if so, just mention it in the comments and I will do it. Thank you.

1 Answer

If I understand you correctly, you have duplicate Ys for some of the observations. Though not optimal, a simple way to handle this with a tall and thin dataset is to estimate the coefficients of the model the usual way (we don’t say “train” in statistical modeling), then use the Huber-White cluster sandwich covariance estimator to inflate the standard errors so that they reflect the duplication. For example, you can use the robcov function in the R rms package.

But looking back at your question, I’m confused about why you mentioned logistic regression (and which kind: binary or ordinal?). I also think you have multiple targets per observation, which may or may not be reasonable to put into a super tall and thin data arrangement.
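As a hedged illustration of what the cluster sandwich estimator computes, here is a from-scratch NumPy sketch. It assumes binary logistic regression (the model type is unclear from the question) and uses the ID column as the cluster variable. It is not a port of rms::robcov: it applies no small-sample correction, and the function names `fit_logistic`, `naive_cov` and `cluster_sandwich` are hypothetical. The estimator is $B^{-1} M B^{-1}$, where $B = X^\top W X$ is the information matrix and $M$ sums the outer products of the per-cluster score totals.

```python
import numpy as np


def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Maximum-likelihood binary logistic regression via Newton-Raphson.
    X is assumed to already contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = _sigmoid(X @ beta)
        info = X.T @ (X * (p * (1.0 - p))[:, None])  # information matrix X'WX
        step = np.linalg.solve(info, X.T @ (y - p))   # Newton step
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def naive_cov(X, beta):
    """Model-based covariance (X'WX)^{-1}; treats every row as independent."""
    p = _sigmoid(X @ beta)
    return np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))


def cluster_sandwich(X, y, beta, cluster_ids):
    """Huber-White cluster sandwich covariance B^{-1} M B^{-1}.
    No small-sample adjustment is applied (an assumption of this sketch)."""
    p = _sigmoid(X @ beta)
    bread = naive_cov(X, beta)
    scores = X * (y - p)[:, None]  # per-row score contributions
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(cluster_ids):
        s_g = scores[cluster_ids == g].sum(axis=0)  # total score within one ID
        meat += np.outer(s_g, s_g)
    return bread @ meat @ bread


if __name__ == "__main__":
    # Hypothetical data: 200 distinct IDs, each row copied 3 times,
    # mimicking the same target appearing on several rows per ID.
    rng = np.random.default_rng(0)
    n, k = 200, 3
    x = rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])
    y = (rng.random(n) < _sigmoid(0.5 + x)).astype(float)
    ids = np.arange(n)
    Xd, yd, idsd = np.repeat(X, k, axis=0), np.repeat(y, k), np.repeat(ids, k)

    beta = fit_logistic(Xd, yd)
    print("coefficients:", beta)
    print("naive SE:    ", np.sqrt(np.diag(naive_cov(Xd, beta))))
    print("cluster SE:  ", np.sqrt(np.diag(cluster_sandwich(Xd, yd, beta, idsd))))
```

In this exact-copy case the point estimates are unchanged, the naive standard errors shrink by a factor of $\sqrt{k}$ as if there were $k$ times more information, and the cluster standard errors equal the ordinary robust (HC0) ones computed on the 200 distinct rows. With real data, where rows sharing an ID differ in their features, the correction is less clean, but the idea is the same.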
