Viewing 19 posts - 1 through 19 (of 19 total)
  • Statistics help
  • molgrips
    Free Member

    I am calculating estimates for a job based on the component tasks of that job.

    Task A would take on average 20 hours, say. But there is a degree of uncertainty, so the actual time taken would follow a normal distribution, and the standard deviation might be 3 hours. There are 50 Task As to do.

    Task B would take an average of 50 hours with a SD of 10 hours. But there are only 3 Task Bs to do.

    I need the total time, and hopefully a measure of accuracy of that figure. So I would need a way of aggregating these probability distributions. It seems like it should be a standard problem, anyone know what it would be called or how I could find it on Google?

    What if they weren’t all normal distributions? Is there a known distribution that has a short head but a very long tail? Could that be factored in?

    mikewsmith
    Free Member

    Open Excel, put a random draw from the normal distribution (mean 20, SD 3) into column A for 50 rows. Drag that across a few hundred times and sum the columns in row 51. Plot a frequency histogram and look for the 90% confidence interval.

    Is this an academic exercise?

    I’ve got a simulation model of it basically called average killer

    mikewsmith
    Free Member

    What if they weren’t all normal distributions? Is there a known distribution that has a short head but a very long tail? Could that be factored in?

    Exponential or log normal from memory but if you don’t understand the starting data properly your answer will be shite. Do some analysis on the job a list and see where you get to
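A minimal sketch of what "short head, long tail" means for the two distributions named above, using only Python's standard library. The parameters (mean 20 hours for the exponential, `mu=3.0` and `sigma=0.5` for the log-normal) are illustrative assumptions, not values from the thread.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

N = 50_000
# Exponential with an assumed mean of 20 hours (rate = 1 / mean).
expo = [random.expovariate(1 / 20) for _ in range(N)]
# Log-normal; mu and sigma describe the normal distribution of log(time).
logn = [random.lognormvariate(3.0, 0.5) for _ in range(N)]

def describe(samples):
    """Return (mean, median, 95th percentile) of a list of samples."""
    ordered = sorted(samples)
    p95 = ordered[int(0.95 * len(ordered))]
    return statistics.fmean(ordered), statistics.median(ordered), p95

for name, data in (("exponential", expo), ("log-normal", logn)):
    mean, median, p95 = describe(data)
    print(f"{name}: mean {mean:.1f}, median {median:.1f}, 95th pct {p95:.1f}")
```

In both cases the median sits below the mean and the 95th percentile sits well above it, which is the signature of a long right tail.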

    slowster
    Free Member

    So I would need a way of aggregating these probability distributions. It seems like it should be a standard problem

    That sounds a bit like Monte Carlo Analysis.

    Hohum
    Free Member

    Sums of independent random variables

    Sums of normally distributed independent random variables

    mikewsmith
    Free Member

    Monte Carlo is the posh word for my first answer. Basically spin the roulette wheels a million times
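The roulette-wheel idea can be sketched in a few lines of Python, assuming the figures from the original question (50 Task As at 20 ± 3 hours, 3 Task Bs at 50 ± 10 hours) and assuming every task is independent and normally distributed. The run count and seed are arbitrary choices.

```python
import random
import statistics

def simulate_total(n_runs=10_000, seed=1):
    """Simulate the total job time n_runs times; return the sorted totals."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_runs):
        task_a = sum(rng.gauss(20, 3) for _ in range(50))   # 50 x Task A
        task_b = sum(rng.gauss(50, 10) for _ in range(3))   # 3 x Task B
        totals.append(task_a + task_b)
    return sorted(totals)

totals = simulate_total()
mean_total = statistics.fmean(totals)
low = totals[int(0.05 * len(totals))]    # 5th percentile
high = totals[int(0.95 * len(totals))]   # 95th percentile
print(f"mean {mean_total:.0f} h, 90% of runs between {low:.0f} and {high:.0f} h")
```

Swapping `rng.gauss` for another sampler (for example `rng.lognormvariate`) is all it takes to try a different distribution, which is the main attraction of simulating rather than calculating.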

    perchypanther
    Free Member

    I am calculating estimates for a job based on the component tasks of that job.

    You seem to have confused estimating with an actual, exact science such as mathematics.

    Real estimating is just posh guessing dressed up with fake calculations.

    molgrips
    Free Member

    Do some analysis on the job a list and see where you get to

    Can’t. The numbers will mostly be shite, yes. The model of the variability is a guess based on experience, and the parameters will be too.

    But the overall number needs to have credible numbers to back it up, otherwise the client will not believe the estimate. We just need a level of detail more than ‘ooh that sounds right’.

    I need to be able to say ‘X task As and Y task Bs with Z variance gives M man-days of effort’ and the numbers need to be there in a spreadsheet. I cannot do a statistical simulation.

    The calculation is more important than the end figure, and it needs to be visible, credible and simple. If this turns out not to be simple I’ll have to ignore it.

    poly
    Free Member

    X task As and Y task Bs with Z variance gives M man-days of effort

    If your values are genuinely averages (and X/Y are big enough to make them meaningful) then you don’t need to account for variance, assuming a normal distribution.

    If you only do the task once though your worst case would be A+3 s.d. (about 99.9% of outcomes fall below that, IIRC) [or A+1.96 s.d., which is the upper end of a two-sided 95% interval]. Presumably the effect is either random or increases risk with low frequency tasks.
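A quick way to check how much of a normal distribution a "mean plus k s.d." allowance covers, sketched with Python's standard-library `statistics.NormalDist`. The multiplier 1.645 is added here for comparison; it does not appear in the thread.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, s.d. 1

for k in (1.645, 1.96, 3.0):
    one_sided = std_normal.cdf(k)      # P(time <= mean + k s.d.)
    two_sided = 2 * one_sided - 1      # P(|time - mean| <= k s.d.)
    print(f"k={k}: one-sided {one_sided:.4f}, two-sided {two_sided:.4f}")
```

For a worst-case allowance only the upper side matters, so the one-sided figure is the relevant one.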

    I’m not sure why you feel quite so much need to justify it though. We would say, we believe task A will take N hours. The client accepts that or goes to someone who promises to do it quicker.

    Hohum
    Free Member

    task 1 is normally distributed with a mean of u1 and standard deviation of s1
    task 2 is normally distributed with a mean of u2 and standard deviation of s2

    through to

    task n is normally distributed with a mean of un and standard deviation of sn

    then, provided the tasks are independent, (task1 + task2 + … + taskn) is normally distributed with a mean of (u1+u2+…+un) and a variance of (s1^2 + s2^2 + … + sn^2); take the square root of the variance to give you the standard deviation
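The rule above (means add, variances add, take the square root at the end) can be sketched as a small helper. The function name `combine` and the `(count, mean, sd)` tuple layout are made up for illustration; the inputs are the figures from the original question, and independence between all tasks is assumed.

```python
import math

def combine(tasks):
    """tasks: list of (count, mean, sd) per task type, all assumed independent."""
    total_mean = sum(n * mean for n, mean, sd in tasks)
    total_var = sum(n * sd ** 2 for n, mean, sd in tasks)  # variances add, not s.d.s
    return total_mean, math.sqrt(total_var)

mean, sd = combine([(50, 20, 3), (3, 50, 10)])
print(f"total {mean} h, s.d. {sd:.1f} h")
```

Note that the s.d. of the total grows with the square root of the number of tasks, so the relative uncertainty shrinks as the job gets bigger. If the tasks are correlated (for example, all slowed by the same problem), this understates the spread.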

    sok
    Full Member

    Time doesn’t follow a normal distribution in this example, it’s a Poisson distribution, so means and SD don’t apply.

    molgrips
    Free Member

    We would say, we believe task A will take N hours. The client accepts that or goes to someone who promises to do it quicker.

    Because the entire thing is going to take about 150 man years, they would quite like a breakdown 🙂

    molgrips
    Free Member

    If your values are genuinely averages (and X/Y are big enough to make them meaningful) then you don’t need to account for variance, assuming a normal distribution.

    Yes, good point, and that won’t apply if it’s not normal, which as rightly pointed out it won’t be.

    It would be more accurate to say X days with a certain chance of it taking Y days i.e. if something goes wrong.

    thecaptain
    Free Member

    I’ll give you the correct answers for £200. And explain my working.

    TiRed
    Full Member

    A: 1000 hrs 95%CI (+/- 41.6 hrs)
    B: 150 hrs 95%CI (+/- 33.9 hrs)
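The two figures above appear to follow from z × s.d. × √n with z = 1.96. A minimal sketch that reproduces them, assuming independent, normally distributed tasks (the helper name `total_with_ci` is hypothetical):

```python
import math

def total_with_ci(count, mean, sd, z=1.96):
    """Return (total mean, half-width of the interval) for `count` independent tasks."""
    return count * mean, z * sd * math.sqrt(count)

a_total, a_half = total_with_ci(50, 20, 3)    # Task A
b_total, b_half = total_with_ci(3, 50, 10)    # Task B
print(f"A: {a_total} hrs +/- {a_half:.1f}")
print(f"B: {b_total} hrs +/- {b_half:.1f}")
```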

    If it’s log normally distributed, you need to take the log first and the formulas are different:

    A: m = 20, v = 9, mu = 2.985, sigma = 0.149, mean = 1000 hrs 95%CI (×/÷ 7.8)
    B: fill this one in for yourself…

    Given the variability you’ve specified, it had better be normal as the log-normal variability is huge and only magnified by 50 steps.
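The mu and sigma quoted for Task A are consistent with the standard moment-matching formulas for a log-normal with mean m and variance v. A sketch under that assumption (the function name is made up for illustration):

```python
import math

def lognormal_params(m, v):
    """Return (mu, sigma) of log(time) for a log-normal with mean m and variance v."""
    sigma2 = math.log(1 + v / m ** 2)
    mu = math.log(m) - sigma2 / 2
    return mu, math.sqrt(sigma2)

mu, sigma = lognormal_params(20, 9)   # Task A: mean 20 h, s.d. 3 h
print(f"mu = {mu:.3f}, sigma = {sigma:.3f}")
```

The same call with Task B's mean and variance gives the values left as an exercise above.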

    poly
    Free Member

    Because the entire thing is going to take about 150 man years, they would quite like a breakdown

    how do you know that if you haven’t already got a way to calculate it?

    I’m not suggesting that the client won’t expect an explanation for the time. But if you say it takes 1.3 hours to build an X and they need 100 of them, so 130 hrs, plus 2.4 hrs for a Y and they need 12 of those, so another 29 hrs, they won’t usually come back saying: we think you can build a Y in 2.25 hrs, and if you build 100 Xs you should be more efficient at it and be able to build them in 1.1 hrs.

    Oh, and you work in IT I think, nobody has ever scoped a 150 man yr IT project that has been run to time and on budget. Ever.

    What I would want to know was what contingency you had built in. So explaining that experience says 10% of jobs like X actually take 50% longer, and 2% of jobs like X take 400% longer is probably better than using some arbitrary distribution curve with an estimated St Dev on it because people are crap at estimating that sort of thing.
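The contingency described above can be turned into an expected time per job without any distribution curve. A minimal sketch, assuming that the remaining 88% of jobs take exactly the base time and reading "400% longer" as five times the base time; the 1.3 hours and 100 units are the figures from the build example in this post.

```python
def expected_hours(base_hours, overruns):
    """overruns: list of (probability, extra_fraction); the rest take the base time."""
    p_on_time = 1 - sum(p for p, _ in overruns)
    multiplier = p_on_time + sum(p * (1 + extra) for p, extra in overruns)
    return base_hours * multiplier

# 10% of jobs take 50% longer, 2% take 400% longer (assumed to mean 5x the base time).
per_job = expected_hours(1.3, [(0.10, 0.5), (0.02, 4.0)])
total = 100 * per_job
print(f"{per_job:.3f} hrs per X, {total:.1f} hrs for 100")
```

Under these assumptions the overruns add 13% to the base estimate, and each line of the calculation is something a client can read and challenge.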

    mikewsmith
    Free Member

    I need to be able to say ‘X task As and Y task Bs with Z variance gives M man-days of effort’ and the numbers need to be there in a spreadsheet. I cannot do a statistical simulation.

    What I described in post one is a statistical simulation in a spreadsheet. I could do the same thing in my fancy 20k simulation software and come up with the same bobbins answer based on crap input data, and charge you 250 quid, or 400 with a video of boxes moving and a progress bar.

    Any clients asking for this will either see through you in 5s or have no idea so just make it up.

    I’d be more interested in the risk sharing and taking in the contract and penalties for late delivery on both sides.

    mikewsmith
    Free Member

    free go
    https://docs.google.com/spreadsheets/d/1Ae29LVoMAi3pRdz7DUhs82J3odWltB0A-8fIWn39AeI/edit?usp=sharing

    Work out how to represent your activity length as a statistical function, put it in the cell and go from there.

    Again, based on weak input data, confidence in the answer comes from you selling it rather than from anything scientific.

    If the delivery is time critical then this may be an issue. If you’re doing a time and materials bid then you should have better control over your time. If it’s fixed price then you accept the risk that you can’t control your process, and build in client penalties for not delivering you the bits you need on time.

    Also none of this accounts for the time between tasks which is probably your biggest unknown variable. (well at least you know it’s an unknown)

    Greybeard
    Free Member

    I need the total time, and hopefully a measure of accuracy of that figure

    You choose the accuracy by selecting the confidence level for your statistical calculation. If you don’t know the confidence level you want, you can’t do the calculation as you’re missing data.

    The calculation, for 95% confidence, is as TiRed has posted.
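How the interval width depends on the chosen confidence level can be sketched with the standard library's `NormalDist.inv_cdf`. The helper name `z_for` and the list of levels are illustrative; the s.d. of the Task A total assumes 50 independent tasks with an s.d. of 3 hours each.

```python
from statistics import NormalDist

def z_for(confidence):
    """Two-sided z multiplier for a confidence level, e.g. 0.95 gives about 1.96."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

TOTAL_SD_A = 3 * 50 ** 0.5  # s.d. of the Task A total, assuming independence

for level in (0.80, 0.90, 0.95, 0.99):
    print(f"{level:.0%}: +/- {z_for(level) * TOTAL_SD_A:.1f} hrs")
```

A higher confidence level gives a wider interval for the same inputs, which is why the level has to be chosen before the calculation means anything.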

