Thursday, December 15, 2011

Binary Dependent Variable Regressions and Oversampling?

Hello





I am running a probit analysis on a set of variables. However, the number of observations with y=0 is very low (79) compared to y=1 (2700).





Could I oversample from the first group, i.e. duplicate observations, to get a more even balance? As I am not interested in the incidence of yes/no, just in what makes yes happen and what makes no happen, I can't see why it would be a problem...|||That is a bad idea, as it will bias your slope estimates, as is always the case with sample selection on the dependent variable. Throwing away data will also reduce the precision of your estimates, making the yes/no predictions based on them less reliable.





Furthermore, there is no problem per se with the fact that the share in the 0 category is far from 50%; some events are just rare. What it sounds like you really need is some factor that is strongly associated with the 0 events.|||I don't think the other answer is correct. Case-control studies do this all the time: take people who have (say) lung cancer and similar people who don't, and see what predicts the cancer. Of course you massively oversample the proportion of people who have lung cancer -
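
One way to check the two positions in this thread is a small simulation. The sketch below is only an illustration under stated assumptions: the data are simulated (not the poster's), the true coefficients and sample size are made up, and the helper names (`probit`, `logit`, `fit`, `simulate`, `duplicate_zeros`) are hypothetical. It fits each model by plain Fisher scoring in pure Python rather than with a statistics package, once on the raw sample and once after duplicating the y=0 rows until the classes are roughly balanced.

```python
import math
import random


def probit(z):
    """Return (p, dp/dz) for the probit link, p = Phi(z)."""
    p = 0.5 * math.erfc(-z / math.sqrt(2.0))
    dp = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return p, dp


def logit(z):
    """Return (p, dp/dz) for the logit link, p = 1 / (1 + exp(-z))."""
    p = 1.0 / (1.0 + math.exp(-z))
    return p, p * (1.0 - p)


def fit(x, y, link, iters=50, tol=1e-8):
    """Fit P(y=1|x) = link(a + b*x) by Fisher scoring; return (a, b)."""
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = i00 = i01 = i11 = 0.0
        for xi, yi in zip(x, y):
            p, dp = link(a + b * xi)
            p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep p*(1-p) positive
            v = p * (1.0 - p)
            s = (yi - p) * dp / v   # score contribution of this row
            w = dp * dp / v         # expected-information weight of this row
            g0 += s
            g1 += s * xi
            i00 += w
            i01 += w * xi
            i11 += w * xi * xi
        # Solve the 2x2 system (information matrix) * step = score.
        det = i00 * i11 - i01 * i01
        da = (i11 * g0 - i01 * g1) / det
        db = (i00 * g1 - i01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b


def simulate(link, a, b, n, rng):
    """Draw x ~ N(0, 1) and y ~ Bernoulli(link(a + b*x))."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [1 if rng.random() < link(a + b * xi)[0] else 0 for xi in x]
    return x, y


def duplicate_zeros(x, y):
    """Repeat every y=0 row so the two classes are roughly the same size."""
    zeros = [xi for xi, yi in zip(x, y) if yi == 0]
    k = (len(y) - len(zeros)) // len(zeros)  # total copies of each y=0 row
    extra = zeros * (k - 1)
    return x + extra, y + [0] * len(extra)


rng = random.Random(0)
results = {}
# Assumed true coefficients, chosen so that y=0 is rare (a few percent).
for name, link, a, b in [("probit", probit, 2.0, 0.5), ("logit", logit, 3.5, 1.0)]:
    x, y = simulate(link, a, b, 10000, rng)
    _, b_raw = fit(x, y, link)
    x_dup, y_dup = duplicate_zeros(x, y)
    _, b_dup = fit(x_dup, y_dup, link)
    results[name] = (b_raw, b_dup)
    print(f"{name}: slope {b_raw:.3f} on the raw sample, "
          f"{b_dup:.3f} after duplicating the y=0 rows")
```

The reasoning behind the comparison: if each y=0 row appears k times, the odds of y=1 at every x are divided by k, so the log-odds shift by the constant log k. In a logit model that constant is absorbed by the intercept and the slope is untouched, which is why case-control designs are usually analysed with logistic regression. The probit index is not linear in the log-odds, so the same shift generally distorts the probit slope as well. In either model the duplicated rows carry no new information, so standard errors computed as if all rows were independent would be too small.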
