Saturday, December 17, 2011

Help with doing logistic regression?

Hello,

Just wondering if someone could help me with some data analysis/regression work I am doing. I am trying to do propensity score matching, but first I need to do logistic regression, and that's what I am having trouble with.

Suppose I have a model as follows: gpa is the dependent variable, and sex and race are the independent variables. If I want to do logistic regression, do I take the log of all the variables (dependent and independent) first and then do the regression? For example, in Stata, do I type 'regress gpa sex race' (using the log of all the variables) to get the regression results, or do I type 'logit gpa sex race' (using the log of all the variables)?

Also, in logistic regression, are all the variables meant to be binary (i.e. yes and no)? Race is a categorical variable with several categories, so do I generate a new variable that is, for example, 1 = black and 0 otherwise? Basically, do I generate the dummy variable first and then take the log of the dummy variable for logistic regression?


Part of my data is as follows:

GPA  sex  race
3.2  m    black
3.5  f    black
3.1  m    hispanic
3.6  f    white
3.2  f    white
3.5  m    asian
3.3  f    hispanic
3.6  m    white

Part of my Stata code is as follows:

gen black = race=="black"
gen female = sex=="f"

gen loggpa = log(gpa)
gen logsex = log(female)
gen lograce = log(black)

and then do something like:

logit loggpa logsex lograce

or

regress loggpa logsex lograce

Does this seem correct for doing logistic regression? Or can someone show me whether I must do it differently, in terms of Stata commands? I'm just confused about when to generate the variable, when to take the log of the variable, and when to run the regression.

Thanks, any help will be greatly appreciated,

Mike

I only know Statistical Analysis System (SAS). If you want, you can download SAS 9.2 for free from the Taringa web site. Analyze this output. My email is gegmartinez@hotmail.com

44   data in;
45   input gpa sex race;
46   datalines;

NOTE: The data set WORK.IN has 8 observations and 3 variables.
NOTE: DATA statement used (Total process time):

56   proc logistic;
57   model gpa = sex race;
58   run;

NOTE: PROC LOGISTIC is fitting the cumulative logit model. The probabilities modeled are summed over
      the responses having the lower Ordered Values in the Response Profile table. Use the response
      variable option DESCENDING if you want to reverse the assignment of Ordered Values to the
      response levels.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 8 observations read from the data set WORK.IN.
NOTE: PROCEDURE LOGISTIC used (Total process time):
      real time           0.03 seconds
      cpu time            0.03 seconds

The SAS System          11:10 Tuesday, August 4, 2009   3

The LOGISTIC Procedure

Model Information

Data Set                     WORK.IN
Response Variable            gpa
Number of Response Levels    5
Model                        cumulative logit
Optimization Technique       Fisher's scoring

Number of Observations Read    8
Number of Observations Used    8

Response Profile

Ordered             Total
Value      gpa      Frequency
1          3.1      1
2          3.2      2
3          3.3      1
4          3.5      2
5          3.6      2

Probabilities modeled are cumulated over the lower Ordered Values.

Model Convergence Status

Convergence criterion (GCONV=1E-8) satisfied.

Score Test for the Proportional Odds Assumption

Chi-Square    DF    Pr > ChiSq
13.8071       6     0.0319

Model Fit Statistics

              Intercept    Intercept and
Criterion     Only         Covariates
AIC           32.953       35.300
SC            33.271       35.777
-2 Log L      24.953       23.300
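
As a sanity check on the Model Fit Statistics table, the AIC and SC rows can be reproduced from the -2 Log L row. The following is a minimal Python sketch, not SAS code. It assumes the usual definitions AIC = -2 Log L + 2k and SC = -2 Log L + k ln(n), with k = 4 parameters for the intercept-only model (the four intercepts), k = 6 once sex and race are added, and n = 8 observations. The function and variable names are made up for illustration.

```python
import math

N_OBS = 8  # "Number of Observations Used" in the output

# -2 Log L values copied from the table; parameter counts are an assumption:
# 4 intercepts for the intercept-only model, plus 2 slopes (sex, race).
INTERCEPT_ONLY = {"neg2_log_l": 24.953, "n_params": 4}
WITH_COVARIATES = {"neg2_log_l": 23.300, "n_params": 6}


def aic(neg2_log_l, n_params):
    """Akaike information criterion: -2 Log L + 2k."""
    return neg2_log_l + 2 * n_params


def sc(neg2_log_l, n_params, n_obs):
    """Schwarz criterion: -2 Log L + k * ln(n)."""
    return neg2_log_l + n_params * math.log(n_obs)


def fit_statistics(model, n_obs=N_OBS):
    """Return (AIC, SC) for one column of the table, rounded to 3 decimals."""
    return (
        round(aic(model["neg2_log_l"], model["n_params"]), 3),
        round(sc(model["neg2_log_l"], model["n_params"], n_obs), 3),
    )


if __name__ == "__main__":
    print("Intercept only:          ", fit_statistics(INTERCEPT_ONLY))
    print("Intercept and covariates:", fit_statistics(WITH_COVARIATES))
```

Both criteria are larger for the model with covariates, which is consistent with sex and race adding little to the fit in this 8-row sample.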





Testing Global Null Hypothesis: BETA=0

Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio    1.6531        2     0.4376
Score               1.6085        2     0.4474
Wald                1.4710        2     0.4793
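
The "Pr > ChiSq" values in this table, and the one in the Score Test for the Proportional Odds Assumption, are upper-tail chi-square probabilities. Because the degrees of freedom here (2 and 6) are even, the tail probability has a simple closed form, so it can be checked by hand. This is a rough Python sketch with a made-up function name; it only handles even degrees of freedom and is not how SAS computes the value internally.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable.

    Uses the closed form exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!,
    which is valid only when df is a positive even integer.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only supports positive even df")
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k  # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total


if __name__ == "__main__":
    # Chi-square statistics and df copied from the SAS output.
    for label, stat, df in [
        ("Proportional odds score test", 13.8071, 6),
        ("Likelihood Ratio", 1.6531, 2),
        ("Score", 1.6085, 2),
        ("Wald", 1.4710, 2),
    ]:
        print(f"{label}: p = {chi2_sf_even_df(stat, df):.4f}")
```

The likelihood ratio statistic itself is the difference of the two -2 Log L values in the Model Fit Statistics table (24.953 - 23.300), which matches 1.6531 up to rounding.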








Analysis of Maximum Likelihood Estimates

                                   Standard    Wald
Parameter        DF    Estimate    Error       Chi-Square    Pr > ChiSq
Intercept 3.1    1     -1.0396     1.8418      0.3186        0.5725
Intercept 3.2    1     0.6537      1.7648      0.1372        0.7111
Intercept 3.3    1     1.2428      1.8061      0.4735        0.4914
Intercept 3.5    1     2.4635      1.9460      1.6027        0.2055
sex              1     1.0415      1.3281      0.6150        0.4329
race             1     -0.7350     0.6882      1.1408        0.2855
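
To make these estimates more concrete: the log says PROC LOGISTIC fitted a cumulative logit model with probabilities cumulated over the lower ordered values, so each intercept belongs to one cut point of gpa. A minimal Python sketch of how fitted probabilities would be read off is below. It assumes the model form logit P(gpa <= level) = intercept_level + b_sex * sex + b_race * race. The numeric codes used for sex and race are not shown in the log (the data lines are missing), so the values passed in are purely hypothetical, and all names are made up for illustration.

```python
import math

# Estimates copied from "Analysis of Maximum Likelihood Estimates".
INTERCEPTS = {3.1: -1.0396, 3.2: 0.6537, 3.3: 1.2428, 3.5: 2.4635}
B_SEX = 1.0415
B_RACE = -0.7350
GPA_LEVELS = [3.1, 3.2, 3.3, 3.5, 3.6]  # the five response levels


def cumulative_probs(sex, race):
    """P(gpa <= level) for each cut point, via the inverse logit."""
    eta = B_SEX * sex + B_RACE * race
    return {
        level: 1.0 / (1.0 + math.exp(-(alpha + eta)))
        for level, alpha in INTERCEPTS.items()
    }


def category_probs(sex, race):
    """P(gpa == level), obtained by differencing the cumulative probabilities."""
    cum = cumulative_probs(sex, race)
    probs = {}
    previous = 0.0
    for level in GPA_LEVELS[:-1]:
        probs[level] = cum[level] - previous
        previous = cum[level]
    probs[GPA_LEVELS[-1]] = 1.0 - previous  # highest level takes the remainder
    return probs


if __name__ == "__main__":
    # sex=1, race=2 are hypothetical codes, not taken from the data.
    for level, p in category_probs(sex=1, race=2).items():
        print(f"P(gpa = {level}) = {p:.3f}")
```

Note that nothing in this calculation takes the log of gpa, sex, or race; the logit transformation is applied to the modeled probability, not to the variables.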








Odds Ratio Estimates

          Point       95% Wald
Effect    Estimate    Confidence Limits
sex       2.833       0.210    38.266
race      0.479       0.124    1.847
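
The odds ratios are just the exponentiated coefficients from the maximum likelihood table, and the Wald limits come from the estimate plus or minus about 1.96 standard errors, exponentiated. Here is a small Python sketch that reproduces them from the printed estimates and standard errors. The names are made up for illustration, and the results agree with the SAS table only up to rounding of the printed inputs.

```python
import math

# (estimate, standard error) copied from the maximum likelihood table.
ESTIMATES = {"sex": (1.0415, 1.3281), "race": (-0.7350, 0.6882)}

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution


def odds_ratio_with_ci(estimate, std_error, z=Z_975):
    """Return (odds ratio, lower limit, upper limit) for a 95% Wald interval."""
    return (
        math.exp(estimate),
        math.exp(estimate - z * std_error),
        math.exp(estimate + z * std_error),
    )


def wald_chi_square(estimate, std_error):
    """Wald chi-square statistic for a single coefficient: (estimate / SE)^2."""
    return (estimate / std_error) ** 2


if __name__ == "__main__":
    for name, (est, se) in ESTIMATES.items():
        ratio, low, high = odds_ratio_with_ci(est, se)
        print(f"{name}: OR = {ratio:.3f}, 95% CI = ({low:.3f}, {high:.3f}), "
              f"Wald chi-square = {wald_chi_square(est, se):.4f}")
```

Both intervals contain 1, which agrees with the non-significant Wald tests for sex and race.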








Association of Predicted Probabilities
