I have two lists of days spent doing X, rounded up to full days. These are not normally distributed.
a) What statistical tests do I need to calculate a p-value for the null hypothesis, and
b) Can I (practically) do this from Excel?
Ta,
Andy
a)
I've used Student's t-test (see http://en.wikipedia.org/wiki/Student%27s_t-test ). However, the sets should be normally distributed for the results to be meaningful (among other things).
See the Wikipedia page; it has hints on what you could use instead.
b) Probably, but I prefer to use [url=http://www.r-project.org/]R[/url], which has these things built-in.
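If you want to see what's under the bonnet, here's a minimal sketch of the pooled two-sample t statistic in plain Python (the function name and the toy data are mine, just for illustration):

```python
from statistics import mean, variance

def t_statistic(a, b):
    """Pooled two-sample Student's t statistic (equal-variance form)."""
    na, nb = len(a), len(b)
    # pooled variance: weighted average of the two sample variances
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (sp2 * (1 / na + 1 / nb)) ** 0.5

days_a = [1, 2, 3, 4, 5]
days_b = [2, 3, 4, 5, 6]
print(t_statistic(days_a, days_b))  # -1.0 for this toy data
```

The p-value then comes from the t distribution with na+nb-2 degrees of freedom; R wraps all of that up in t.test(a, b, var.equal=TRUE), and Excel's T.TEST function does the same job.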
Transform the data - try taking logs. Look at a histogram of log(data) to see if it now looks normal, then use a t-test on the log(data). Other transformations are available.
If you have long tails (a few people with very high scores), then a log-normal may suffice. You can always look at the [url=http://en.wikipedia.org/wiki/Power_transform]Box-Cox[/url] power transformation (also explained [url=http://www.isixsigma.com/tools-templates/normality/making-data-normal-using-box-cox-power-transformation/]here[/url]).
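As a sketch of what Box-Cox actually does (function name is mine; lambda = 0 is defined as the log, which is why the log transform is a special case of the family):

```python
import math

def box_cox(x, lam):
    """Box-Cox power transform of a single positive value x."""
    if lam == 0:
        return math.log(x)  # the lambda -> 0 limit of (x**lam - 1) / lam
    return (x ** lam - 1) / lam

print(box_cox(10, 0))    # log(10), about 2.303
print(box_cox(4, 0.5))   # (sqrt(4) - 1) / 0.5 = 2.0
```

Your days are rounded up, so every value is at least 1 and the log is safe; in practice you'd let the software pick lambda for you by maximum likelihood.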
And yes, it's straightforward. I've even written logistic regression in XL using Solver.
I want djaustins babies.
😳 I'm not a statistician, but I sometimes get to play at being one in the day job 🙂 .
Run the data through the statistical analysis add-in in Excel, and perhaps Minitab if you want to get fancy, then make the answer say whatever your boss is looking for.
I'm glad I can weld stainless steel and make things 'cos that lot just went right over my welding mask 🙂
Had a crack at the Student's t and got a p of 0.85. I suspect this isn't quite what the boss is looking for (i.e. p<0.05), so I'll have another go tomorrow with transformed data, among other things.
Thanks all!
Andy
hang on - I thought you just shoved it all into SPSS and pressed "do all the stats tests and highlight significant results"
😳
OK... Here goes with some graphics:
Suppose the distribution of drinking units per day looks like this:
STW massive
0: ooo
1: oooooooooo
2: oooooooo
3: oooooo
4: oooo
5: ooo
6: oo
7: o
9: o
10: o
A transformation tries to make the data look like the classic symmetric bell-shaped curve. Testing for differences in data is really only possible when the data looks bell-shaped. So the long "tail" (10 units and above) means that the average will be big but the most common will be small (1 unit).
So we transform the data by taking the logarithm (10 to the power of log(x) gives the original number) to get:
log(1): o
log(2): ooo
log(3): oooooo
log(4): oooooooo
log(5): oooooo
log(6): ooo
log(7): o
log(8):
log(9):
log(10):
then compare the two distributions with some fancy maths worked out by a [url=http://en.wikipedia.org/wiki/William_Sealy_Gosset]chemist[/url] who was interested in whether batches of Guinness were different 😆
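You can check the transformation is doing its job by measuring the skew before and after. A rough sketch in plain Python, using data shaped like the histogram above but dropping the zero bin, since log(0) is undefined (the skewness helper is the simple biased estimator; names are mine):

```python
import math
from statistics import mean

def skewness(xs):
    """Sample skewness: average cubed standardized deviation."""
    m = mean(xs)
    n = len(xs)
    s = (sum((x - m) ** 2 for x in xs) / n) ** 0.5
    return sum(((x - m) / s) ** 3 for x in xs) / n

units = [1]*10 + [2]*8 + [3]*6 + [4]*4 + [5]*3 + [6]*2 + [7, 9, 10]
logged = [math.log(u) for u in units]

print(skewness(units))   # strongly right-skewed (the long tail)
print(skewness(logged))  # much closer to zero, i.e. more symmetric
```

A skewness near zero after the transform is a good sign that the t-test's bell-shape assumption is now reasonable.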
I wish I could weld 953 (sigh)
