I need a bit of advice on my logistic regression. Any input would be much appreciated.
I have a binary response/dependent variable (purchase 0=no 1=yes). The products come in 5 sizes and 5 colors, for a total of 25 combinations. The independent variable is which cluster a customer belongs to. The customer clusters are categorical and non-ordinal. Each customer is assigned to exactly one cluster, A-M.
I would like to know the probability that a customer from a specific cluster buys a specific combination of color and size. Also I would like to know for a given combination of product and size what cluster is most likely to buy.
Which method below is correct?
1) Create a binary dummy variable for one cluster at a time, where dep = purchase yes/no and indep = cluster1 yes/no. This would output the probability that a person from cluster1 buys a product.
2) Create binary dummy variables for every cluster, where dep = purchase yes/no and indep = cluster1 yes/no, cluster2 yes/no, cluster3 yes/no, ...
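For what it's worth, here is a rough sketch of what option 2 amounts to for a single color/size combination, assuming one cluster is left out as the reference level. With cluster dummies as the only predictors, the fitted probabilities of a logistic regression should simply reproduce each cluster's observed purchase rate, so the coefficients can be read off as log-odds differences from the reference cluster. The records, the choice of "A" as reference, and the function names are all made up for illustration.

```python
import math
from collections import defaultdict

# Hypothetical records for ONE color/size combination: (cluster, purchased 0/1).
records = [
    ("A", 1), ("A", 0), ("A", 0), ("A", 0),
    ("B", 1), ("B", 1), ("B", 0), ("B", 0),
    ("C", 1), ("C", 1), ("C", 1), ("C", 0),
]

def logit(p):
    # Log-odds; undefined if a cluster's purchase rate is exactly 0 or 1.
    return math.log(p / (1.0 - p))

def cluster_rates(rows):
    counts = defaultdict(lambda: [0, 0])  # cluster -> [buyers, customers]
    for cluster, purchased in rows:
        counts[cluster][0] += purchased
        counts[cluster][1] += 1
    return {c: buyers / n for c, (buyers, n) in counts.items()}

def dummy_coded_coefficients(rates, reference):
    # Intercept = log-odds of the reference cluster;
    # each dummy coefficient = that cluster's log-odds minus the intercept.
    intercept = logit(rates[reference])
    slopes = {c: logit(p) - intercept for c, p in rates.items() if c != reference}
    return intercept, slopes

rates = cluster_rates(records)
intercept, slopes = dummy_coded_coefficients(rates, reference="A")

def predicted_probability(cluster):
    # Inverse logit of intercept + dummy coefficient (0 for the reference).
    z = intercept + slopes.get(cluster, 0.0)
    return 1.0 / (1.0 + math.exp(-z))

for cluster in sorted(rates):
    print(cluster, round(predicted_probability(cluster), 3))
```

Repeating this for each of the 25 combinations would give a cluster-by-combination table of purchase probabilities, which is the first thing asked for above.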
Purchase is yes/no? Buying more than one is prohibited?
That seems unexpected. Wouldn't you usually have something like:
T-shirt sales data
Customer 1 (cluster D): 3 red S, 1 white S, 1 green M
Customer 2 (cluster G): 1 white L
Customer 3 (cluster G): no sale
Customer 4 (cluster G): 1 white L
Customer 5 (cluster A): 2 green XXL
etc.
Or is it more like:
Ford truck sales data
Customer 1 (cluster D): red F-150
Customer 2 (cluster G): white F-250
Customer 3 (cluster G): no sale
Customer 4 (cluster G): white F-250
Customer 5 (cluster A): green F-450
etc.
Even then, as the song goes "go down town buy a Ford truck or two" :-)
Re: for a given combination of product and size what cluster is most likely to buy
Hmmm...if 100 cluster C customers overall buy 10 red F-150's and 8 cluster H customers overall buy 5 red F-150's, do you want the answer to be C or H?
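To make the two possible answers concrete, here is a quick sketch using the numbers above. It treats "most likely" either as the largest sales count or as the highest sales per customer; the dictionary layout and variable names are only for illustration.

```python
# Numbers from the example above: customers in each cluster and red F-150's sold.
clusters = {
    "C": {"customers": 100, "red_f150_sold": 10},
    "H": {"customers": 8, "red_f150_sold": 5},
}

def sales_per_customer(name):
    c = clusters[name]
    return c["red_f150_sold"] / c["customers"]

# Ranking by raw volume favors the big cluster.
by_volume = max(clusters, key=lambda name: clusters[name]["red_f150_sold"])

# Ranking by rate favors the cluster whose members buy most often.
by_rate = max(clusters, key=sales_per_customer)

print("Most units sold:", by_volume)
print("Highest rate per customer:", by_rate)
```

Cluster C sells more trucks in total (10 vs 5), but cluster H has the far higher rate per customer (5/8 = 0.625 vs 10/100 = 0.10), so the answer depends on which of the two you mean.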
Dan