Statistics help
 


[Closed] Statistics help

Posts: 91098
Free Member
Topic starter
 

I am calculating estimates for a job based on the component tasks of that job.

Task A would take on average 20 hours, say. But there is a degree of uncertainty, so the actual time taken would follow a normal distribution and the standard deviation might be 3 hours. There are 50 Task As to do.

Task B would take an average of 50 hours with a SD of 10 hours. But there are only 3 Task Bs to do.

I need the total time, and hopefully a measure of accuracy of that figure. So I would need a way of aggregating these probability distributions. It seems like it should be a standard problem, anyone know what it would be called or how I could find it on Google?

What if they weren't all normal distributions? Is there a known distribution that has a short head but a very long tail? Could that be factored in?


 
Posted : 29/06/2017 1:39 pm
Posts: 17
Free Member
 

Open Excel, put the normal calculation into column A for 50 rows. Drag that across a few hundred columns and sum each column in row 51. Plot a frequency histogram of the sums and look for the 90% confidence interval.

Is this an academic exercise?

I've got a simulation model of it basically called average killer
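
For anyone without Excel to hand, the same idea can be sketched in a few lines of Python. This is only a rough sketch: it assumes every task is an independent normal draw, uses the 50 x Task A and 3 x Task B figures from the opening post, and the function name, run count and seed are made up for illustration.

```python
import random
import statistics

def simulate_total_hours(tasks, runs=20_000, seed=1):
    """Simulate the total job time `runs` times.

    tasks: list of (count, mean_hours, sd_hours). Each task is drawn as an
    independent normal, which is an assumption rather than a known fact.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(runs):
        total = 0.0
        for count, mean, sd in tasks:
            total += sum(rng.gauss(mean, sd) for _ in range(count))
        totals.append(total)
    return totals

tasks = [(50, 20.0, 3.0), (3, 50.0, 10.0)]  # figures from the opening post
totals = sorted(simulate_total_hours(tasks))

# The 5th and 95th percentiles bracket the middle 90% of simulated totals.
low = totals[int(0.05 * len(totals))]
high = totals[int(0.95 * len(totals))]
print(f"mean {statistics.fmean(totals):.0f} h, sd {statistics.stdev(totals):.1f} h")
print(f"middle 90% of runs: {low:.0f} to {high:.0f} h")
```

Swapping `rng.gauss` for another sampler (for example `rng.lognormvariate`) is how a long-tailed task would be factored in.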


 
Posted : 29/06/2017 1:51 pm
Posts: 17
Free Member
 

What if they weren't all normal distributions? Is there a known distribution that has a short head but a very long tail? Could that be factored in?

Exponential or log normal from memory, but if you don't understand the starting data properly your answer will be shite. Do some analysis on the job a list and see where you get to


 
Posted : 29/06/2017 1:53 pm
Posts: 0
Free Member
 

So I would need a way of aggregating these probability distributions. It seems like it should be a standard problem

That sounds a bit like Monte Carlo Analysis.


 
Posted : 29/06/2017 1:54 pm
Posts: 2804
Free Member
 

Sums of independent random variables

Sums of normally distributed independent random variables


 
Posted : 29/06/2017 1:57 pm
Posts: 17
Free Member
 

Monte Carlo is the posh word for my first answer. Basically spin the roulette wheels a million times


 
Posted : 29/06/2017 1:57 pm
Posts: 17303
Free Member
 

I am calculating estimates for a job based on the component tasks of that job.

You seem to have confused estimating with an actual, exact science such as mathematics.

Real estimating is just posh guessing dressed up with fake calculations.


 
Posted : 29/06/2017 2:02 pm
Posts: 91098
Free Member
Topic starter
 

Do some analysis on the job a list and see where you get to

Can't. The numbers will mostly be shite, yes. The model of the variability is a guess based on experience, and the parameters will be too.

But the overall number needs to have credible numbers to back it up, otherwise the client will not believe the estimate. We just need a level of detail more than 'ooh that sounds right'.

I need to be able to say 'X task As and Y task Bs with Z variance gives M man-days of effort' and the numbers need to be there in a spreadsheet. I cannot do a statistical simulation.

The calculation is more important than the end figure, and it needs to be visible, credible and simple. If this turns out not to be simple I'll have to ignore it.


 
Posted : 29/06/2017 2:18 pm
 poly
Posts: 8748
Free Member
 

X task As and Y task Bs with Z variance gives M man-days of effort

If your values are genuinely averages (and X/Y are big enough to make them meaningful) then you don't need to account for variance, assuming a normal distribution.

If you only do the task once though your worst case would be A+3s.d. (99.87% of the time IIRC) [or A+1.645s.d. for 95% of the time]. Presumably the effect is either random or increases risk with low frequency tasks.

I'm not sure why you feel quite so much need to justify it though. We would say, we believe task A will take N hours. The client accepts that or goes to someone who promises to do it quicker.


 
Posted : 29/06/2017 2:38 pm
Posts: 2804
Free Member
 

task 1 is normally distributed with a mean of u1 and standard deviation of s1
task 2 is normally distributed with a mean of u2 and standard deviation of s2

through to

task n is normally distributed with a mean of un and standard deviation of sn

then, assuming the tasks are independent, (task1 + task2 + … + taskn) is normally distributed with a mean of (u1+u2+…+un) and a variance of (s1^2 + s2^2 + … + sn^2); take the square root of the variance to give you the standard deviation
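
As a quick worked check of that rule, here is a minimal Python sketch using the figures from the opening post (50 x Task A at 20 h with SD 3 h, 3 x Task B at 50 h with SD 10 h). The function name is just illustrative, and independence between tasks is assumed.

```python
import math

def total_mean_and_sd(tasks):
    """tasks: list of (count, mean, sd) for independent normal task times."""
    mean = sum(count * m for count, m, _ in tasks)
    # Variances add for independent variables; standard deviations do not.
    variance = sum(count * sd ** 2 for count, _, sd in tasks)
    return mean, math.sqrt(variance)

mean, sd = total_mean_and_sd([(50, 20.0, 3.0), (3, 50.0, 10.0)])
print(f"total {mean:.0f} h, sd {sd:.1f} h")  # total 1150 h, sd 27.4 h
```

The same sums fit in two spreadsheet columns (count x mean, count x sd^2), which keeps the calculation visible.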


 
Posted : 29/06/2017 2:39 pm
 sok
Posts: 0
Full Member
 

Time doesn't follow a normal distribution in this example, it's a Poisson distribution, so means and SD don't apply.


 
Posted : 29/06/2017 4:23 pm
Posts: 91098
Free Member
Topic starter
 

We would say, we believe task A will take N hours. The client accepts that or goes to someone who promises to do it quicker.

Because the entire thing is going to take about 150 man years, they would quite like a breakdown 🙂


 
Posted : 29/06/2017 4:33 pm
Posts: 91098
Free Member
Topic starter
 

If your values are genuinely averages (and X/Y are big enough to make them meaningful) then you don't need to account for variance, assuming a normal distribution.

Yes, good point, and that won't apply if it's not normal, which as rightly pointed out it won't be.

It would be more accurate to say X days with a certain chance of it taking Y days i.e. if something goes wrong.


 
Posted : 29/06/2017 4:39 pm
Posts: 7479
Free Member
 

I'll give you the correct answers for £200. And explain my working.


 
Posted : 29/06/2017 5:33 pm
Posts: 17275
Full Member
 

A: 1000 hrs 95%CI (+/- 41.6 hrs)
B: 150 hrs 95%CI (+/- 33.9 hrs)

If it's log normally distributed (https://en.wikipedia.org/wiki/Log-normal_distribution), you need to take the log first and the formulas are different:

A: m = 20, v = 9, mu = 2.985, sigma = 0.149, mean = 1000 hrs 95%CI (multiply or divide by 7.8)
B: fill this one in for yourself...

Given the variability you've specified, it had better be normal as the log-normal variability is huge and only magnified by 50 steps.
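
The mu and sigma above follow from the standard moment-matching relations for a log-normal with mean m and variance v: sigma^2 = ln(1 + v/m^2) and mu = ln(m) - sigma^2/2. A small Python sketch, with an illustrative function name, reproduces the Task A values:

```python
import math

def lognormal_params(mean, variance):
    """Return (mu, sigma) of the underlying normal for a log-normal
    with the given mean and variance."""
    sigma2 = math.log(1.0 + variance / mean ** 2)
    mu = math.log(mean) - sigma2 / 2.0
    return mu, math.sqrt(sigma2)

mu, sigma = lognormal_params(20.0, 9.0)  # Task A: m = 20, v = 9
print(f"mu = {mu:.3f}, sigma = {sigma:.3f}")  # mu = 2.985, sigma = 0.149
```

Calling it with `(50.0, 100.0)` gives the Task B parameters.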


 
Posted : 29/06/2017 6:04 pm
 poly
Posts: 8748
Free Member
 

Because the entire thing is going to take about 150 man years, they would quite like a breakdown
how do you know that if you haven't already got a way to calculate it?

I'm not suggesting that the client won't expect an explanation for the time, but rather if you say it takes 1.3 hours to build X, and they need 100x so 130 hrs, + 2.4 hrs for a Y and they need 12y so another 29hrs, they won't usually be coming back saying, we think you can build a Y in 2.25 hrs and if you build 100x you should be more efficient at it and be able to build them in 1.1 hrs.

Oh, and you work in IT I think, nobody has ever scoped a 150 man yr IT project that has been run to time and on budget. Ever.

What I would want to know was what contingency you had built in. So explaining that experience says 10% of jobs like X actually take 50% longer, and 2% of jobs like X take 400% longer is probably better than using some arbitrary distribution curve with an estimated St Dev on it because people are crap at estimating that sort of thing.
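
That kind of contingency statement turns into a number quite easily. A rough Python sketch, assuming "50% longer" means 1.5x the estimate, "400% longer" means 5x, and the remaining 88% of jobs land on the estimate; applying it to the 20-hour Task A is purely for illustration, and the function name is made up.

```python
def expected_hours(base_hours, overruns):
    """overruns: list of (probability, extra_fraction), e.g. (0.10, 0.5)
    for '10% of jobs take 50% longer'. Jobs not covered by an overrun
    case are assumed to take exactly base_hours."""
    p_overrun = sum(p for p, _ in overruns)
    expected_factor = (1.0 - p_overrun) + sum(
        p * (1.0 + extra) for p, extra in overruns
    )
    return base_hours * expected_factor

per_task = expected_hours(20.0, [(0.10, 0.5), (0.02, 4.0)])
print(f"expected {per_task:.1f} h per task, {50 * per_task:.0f} h for 50 tasks")
```

Here the expected factor is 0.88 + 0.10 x 1.5 + 0.02 x 5 = 1.13, i.e. a 13% contingency on top of the base estimate.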


 
Posted : 29/06/2017 9:22 pm
Posts: 17
Free Member
 

I need to be able to say 'X task As and Y task Bs with Z variance gives M man-days of effort' and the numbers need to be there in a spreadsheet. I cannot do a statistical simulation.

What I described in post one is a statistical simulation in a spreadsheet. I could do the same thing in my fancy 20k simulation software and come up with the same bobbins answer based on crap input data and charge you 250 quid, or 400 with a video of boxes moving and a progress bar.

Any clients asking for this will either see through you in 5s or have no idea so just make it up.

I'd be more interested in the risk sharing and taking in the contract and penalties for late delivery on both sides.


 
Posted : 29/06/2017 10:28 pm
Posts: 17
Free Member
 

free go
https://docs.google.com/spreadsheets/d/1Ae29LVoMAi3pRdz7DUhs82J3odWltB0A-8fIWn39AeI/edit?usp=sharing

Work out how to represent your activity length as a statistical function put it in the cell and go from there.

Again based on weak input data confidence in the answer is from you selling it rather than anything scientific.

If the delivery is time critical then this may be an issue. If you're doing a time and materials bid then you should have better control of your time; if it's fixed price then you accept the risk that you can't control your process and build in client penalties for not delivering you the bits you need on time.

Also none of this accounts for the time between tasks which is probably your biggest unknown variable. (well at least you know it's an unknown)


 
Posted : 30/06/2017 12:34 am
Posts: 4195
Free Member
 

I need the total time, and hopefully a measure of accuracy of that figure
You choose the accuracy by selecting the confidence level for your statistical calculation. If you don't know the confidence level you want, you can't do the calculation as you're missing data.

The calculation, for 95% confidence, is as TiRed has posted.
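
To show how the confidence level drives the figure, here is a small Python sketch for the 50 x Task A case (SD 3 h each, assumed independent and normal). The function name is illustrative; at 95% it gives the +/- 41.6 hrs quoted earlier in the thread.

```python
import math
from statistics import NormalDist

def interval_half_width(total_sd, confidence):
    """Half-width of a two-sided normal interval at the given confidence."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return z * total_sd

sd_a = 3.0 * math.sqrt(50)  # SD of the sum of 50 independent Task As
for confidence in (0.90, 0.95, 0.99):
    print(f"{confidence:.0%}: +/- {interval_half_width(sd_a, confidence):.1f} h")
```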


 
Posted : 30/06/2017 7:56 am