Viewing 19 posts - 1 through 19 (of 19 total)
  • Statisticians can you help?
  • geetee1972
    Free Member

    I have a work problem I need some help with.

    A client has asked me to look at data relating to discounting by their sales team based on actual sales price versus list price. I’ve got two sets of data, both large samples (300,000 or more points) and both are only about 1% of the total but they do purport to be representative of the same population.

    One set is showing a median deviation from list of 0.8% and the other 2.2%. The difference between these two is fairly substantial but both deviations from list are very low.

    What might cause this discrepancy?

    bikebouy
    Free Member

    “statistics, lies and more statistics”

    or something like that.

    thecaptain
    Free Member

    I’ll take a look for 500 quid per day.

    ultracrepidarian
    Free Member

    2 sample T-test will tell you the probability of the samples being from the same population.

    TheSouthernYeti
    Free Member

    The samples not being random.

    Junkyard
    Free Member

    They are different data sets and I am not sure why you think it’s a discrepancy. The difference is actually there, and you seem to have shown that the two data sets are different.
    I guess it shows the sets are not as typical as your client claimed, as they are not that similar. Basically, it’s a sampling error.

    perchypanther
    Free Member

    Pilot error?

    grey_or_black
    Full Member

    What might cause this discrepancy?

    You haven’t given any clues as to what the data really relate to (e.g., what sort of sales) and why you have been asked.

    Looking at the above responses… You should be asking (or already know) how the samples were obtained for the two groups, check your calculations, understand the data beyond those median calculations…

    Did you receive any other data apart from lots of discount percentages? If not, ask why they want you to do such a trivial calculation. Compare the additional data between the two groups to look for explanations. Ideas: are the bigger discounts linked to seasonal demand, more experienced sales personnel, bulk purchases.

    Additionally, just comparing the medians will not be interpretable. Essentially you’ll be comparing 1 data point from each group [precisely so if you have an odd number of data per group], but what about the other data? You’ll get a better idea of the entirety of the discounts if you also consider some/all of minimum and maximum, interquartile ranges, mean and SD…
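A minimal sketch of the fuller summary suggested above, assuming each data set is available as a plain list of discount fractions (for example, 0.022 for 2.2%). The function name `summarise` and the input layout are illustrative assumptions, not anything taken from the client's data.

```python
import statistics

def summarise(discounts):
    """Return summary statistics for a list of discount fractions."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(discounts, n=4)
    return {
        "n": len(discounts),
        "min": min(discounts),
        "q1": q1,
        "median": statistics.median(discounts),
        "q3": q3,
        "max": max(discounts),
        "mean": statistics.fmean(discounts),
        "sd": statistics.stdev(discounts),  # sample standard deviation
    }

# Hypothetical toy data, only to show the call.
print(summarise([0.0, 0.01, 0.02, 0.03, 0.10]))
```

Running this on both sets and comparing the results side by side shows whether the gap sits only in the medians or across the whole distribution.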

    reggiegasket
    Free Member

    as said, a two-sample T Test will tell you how statistically different the two samples are. And they look quite different, as you say.

    As to why they are different… who knows. You don’t say why you have two samples, whether they were collected at different times, whether they are different products, whether the prices of the list are equivalent, or what. Without some information on that, there’s no way we can even attempt to answer this question.

    andy4d
    Full Member

    I reckon you have a 50:50 chance of getting your answer.

    TiRed
    Full Member

    You can’t do a T-test of the raw data because the data will not be normally distributed – unless some discounts were negative (and that is highly unlikely provided your customers are not mugs). I suspect the distributions are more likely to be beta distributions, since the discount is bounded on [0, 1). I would model these and then test for the difference in distribution. An alternative is to normalise the data by transformation, which might involve a logistic transformation.

    Statistics will not tell you WHY the discounts are different. It will only give you the likelihood that a difference of the observed magnitude might be observed by chance alone.
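A rough sketch of the transformation route mentioned above, assuming the raw discounts are available as fractions in [0, 1). The clipping value `eps` is an assumption added here so that a discount of exactly zero does not produce an infinite logit; the helper names are illustrative only. The statistic shown is Welch's unequal-variance t, used here as one possible comparison on the transformed scale rather than the only valid one.

```python
import math
import statistics

def logit(p, eps=1e-6):
    """Logit transform, clipping p into (eps, 1 - eps) first (eps is an assumption)."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    return (mean_a - mean_b) / math.sqrt(var_a / len(a) + var_b / len(b))

# Hypothetical toy data, only to show the call.
set_a = [logit(x) for x in [0.005, 0.008, 0.010, 0.012, 0.007]]
set_b = [logit(x) for x in [0.020, 0.022, 0.025, 0.018, 0.030]]
print(welch_t(set_a, set_b))
```

With 300,000 or more points per set, almost any difference will come out as statistically significant, so the size of the difference matters more than the p-value.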

    km79
    Free Member

    60% of the time it works every time.

    CharlieMungus
    Free Member

    as said, a two-sample T Test will tell you how statistically different the two samples are

    I don’t think that was said, and it’s not true.

    It’s not clear what you are trying to find out. But given that you have access to the full data, why are you trying to infer anything from the samples?

    If you are unhappy with inferential statistics, you could try some Monte Carlo, to give you a sense of how low the deviations are.
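One way to read the Monte Carlo suggestion is as a permutation test on the difference in medians. The sketch below assumes the raw discounts are available as two lists. The function name, the iteration count, and the seed are illustrative choices, not something specified in the thread.

```python
import random
import statistics

def permutation_test_median(a, b, n_iter=2000, seed=0):
    """Estimate how often a random relabelling gives a median gap at least as large."""
    rng = random.Random(seed)
    observed = abs(statistics.median(a) - statistics.median(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)  # random reassignment of points to the two groups
        gap = abs(statistics.median(pooled[:len(a)])
                  - statistics.median(pooled[len(a):]))
        if gap >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_iter + 1)

# Hypothetical toy data, only to show the call.
low = [i / 1000 for i in range(30)]
high = [0.5 + i / 1000 for i in range(30)]
print(permutation_test_median(low, high))
```

Pure Python will be slow on 300,000 points per set, so for data of that size this would need a subsample or a vectorised implementation.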

    geetee1972
    Free Member

    I don’t have the raw data and no, the discounts are only ever one way (wouldn’t it be nice if it were not the case, eh).

    Interpreting why there are discrepancies isn’t my brief; I’m actually looking at this in the context of job performance and capability, looking at the impact of how training a sales person improves their performance and we’re doing that based on discounting. I just noticed the two data sets looked quite different.

    I guess I need to bounce that issue back to the client.

    Thanks for the help though, greatly appreciated.

    Cougar
    Full Member

    I’ll take a look for 500 quid per day.

    Plus VAT?

    CharlieMungus
    Free Member

    Then a one tail t-test

    wilburt
    Free Member

    Statistical analysis will tell you what is different.

    You’ll need to understand the operation to work out why it’s different.

    I’m struggling with what industry has 60,000,000 manually discountable sales!

    hols2
    Free Member

    I’m struggling with what industry has 60,000,000 manually discountable sales!

    geetee1972
    Free Member

    I’m struggling with what industry has 60,000,000 manually discountable sales!

    Big tobacco. They still sell onesies and twosies off the back of a moped to the shacks in places like Thailand.

    So yeah, it does look rather a lot like:


The topic ‘Statisticians can you help?’ is closed to new replies.