I have 7 groups of people. For each group I have the mean age, the standard deviation and the number of people in it. The groups are different sizes.
What I want is to be able to say that, across all 7 groups, the mean age is X with a standard deviation of Y. I don't have access to the full data for any of the groups.
Can it be done?
Edit: Also have population size for each group.
Sadly not.
Edit: Well, you could average the numbers but it won't give you anything statistically significant. Do you not even have the size of the data sets?
Flaperon - yes I have the size of the datasets. edited to show that.
Nope
EDIT: With population size of each population you can do an average for the entire dataset, but not STDEV / SEM
Yes, you can - you can calculate the mean pretty easily -
if groupmean(x) is the mean of group x, and groupsize(x) is the number of people in group x, then
mean = (groupmean(1)*groupsize(1) + groupmean(2)*groupsize(2) + ... ) / (total number of people)
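A minimal sketch of that weighted mean in Python. The function name and the numbers are made up for illustration, they are not the poster's data:

```python
def combined_mean(group_means, group_sizes):
    """Mean of all people pooled together, from per-group means and sizes."""
    total_people = sum(group_sizes)
    # Each group's mean times its size recovers that group's sum of ages.
    weighted_sum = sum(m * n for m, n in zip(group_means, group_sizes))
    return weighted_sum / total_people


# Hypothetical example: three groups with different sizes.
print(combined_mean([31.0, 44.5, 27.2], [12, 30, 8]))
```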
I think you can do similar with variance (which is standard deviation squared), but I have a cold and my brain is not working well enough to work it out.
Oh hang on, it's on this page - the mean I said up there is right, this page has the variance too:
http://blog.cordiner.net/2010/06/16/calculating-variance-and-mean-with-mapreduce-python/
The formulas under 'parallel statistics' are what you want. I suspect it may be a bit mathsy for you if you're not used to Greek letters and that.
Joe
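For anyone who would rather see it as code than as Greek letters, here is a rough Python sketch of the combined standard deviation. It uses the usual sum-of-squares split (spread within each group plus spread of the group means around the overall mean), which may not be laid out the same way as on the linked page. It assumes the group SDs were calculated with the n - 1 divisor (sample SD); if they are population SDs, pass ddof=0. The function and argument names are made up for this example.

```python
import math


def combined_mean_sd(group_means, group_sds, group_sizes, ddof=1):
    """Overall mean and SD from per-group means, SDs and sizes.

    ddof=1 assumes sample SDs (n - 1 divisor); ddof=0 assumes population SDs.
    """
    total_n = sum(group_sizes)
    grand_mean = sum(m * n for m, n in zip(group_means, group_sizes)) / total_n

    sum_of_squares = 0.0
    for m, s, n in zip(group_means, group_sds, group_sizes):
        # Spread inside the group, recovered from its SD.
        sum_of_squares += (n - ddof) * s ** 2
        # Extra spread because the group mean differs from the overall mean.
        sum_of_squares += n * (m - grand_mean) ** 2

    return grand_mean, math.sqrt(sum_of_squares / (total_n - ddof))


# Hypothetical figures, not real data.
mean_age, sd_age = combined_mean_sd([31.0, 44.5, 27.2], [4.1, 9.8, 3.3], [12, 30, 8])
print(mean_age, sd_age)
```

The result matches what you would get from the raw data only if the group figures themselves are exact; heavily rounded means or SDs will carry their rounding error through.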
Thanks for that Joe.
