[Closed] Statisticians can you help?

Posts: 0
Free Member
Topic starter
 

I have a work problem I need some help with.

A client has asked me to look at data relating to discounting by their sales team based on actual sales price versus list price. I’ve got two sets of data, both large samples (300,000 or more points) and both are only about 1% of the total but they do purport to be representative of the same population.

One set is showing a median deviation from list of 0.8% and the other 2.2%. The difference between these two is fairly substantial but both deviations from list are very low.

What might cause this discrepancy?


 
Posted : 10/11/2017 10:17 am
Posts: 0
Full Member
 

"statistics, lies and more statistics"

or something like that.


 
Posted : 10/11/2017 10:18 am
Posts: 7505
Free Member
 

I'll take a look for 500 quid per day.


 
Posted : 10/11/2017 10:19 am
Posts: 0
Free Member
 

2 sample T-test will tell you the probability of the samples being from the same population.


 
Posted : 10/11/2017 10:20 am
Posts: 0
Free Member
 

The samples not being random.


 
Posted : 10/11/2017 10:22 am
Posts: 5559
Free Member
 

They are different data sets and I am not sure why you think it's a discrepancy. The difference is actually there, and you seem to have shown that the two data sets are different.
I guess it shows the sets are not as typical as your client claimed, as they are not that similar. Basically it's a sampling error.


 
Posted : 10/11/2017 10:23 am
Posts: 17313
Free Member
 

Pilot error?


 
Posted : 10/11/2017 10:23 am
Posts: 36
Full Member
 

"What might cause this discrepancy?"

You haven't given any clues as to what the data really relate to (e.g., what sort of sales) and why you have been asked.

Looking at the above responses... You should be asking (or already know) how the samples were obtained for the two groups, check your calculations, understand the data beyond those median calculations...

Did you receive any other data apart from lots of discount percentages? If not, ask why they want you to do such a trivial calculation. If so, compare the additional data between the two groups to look for explanations. Ideas: are the bigger discounts linked to seasonal demand, more experienced sales personnel, or bulk purchases?

Additionally, just comparing the medians will not be interpretable. Essentially you'll be comparing 1 data point from each group [precisely so if you have an odd number of data per group], but what about the other data? You'll get a better idea of the entirety of the discounts if you also consider some/all of minimum and maximum, interquartile ranges, mean and SD...
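
To make that concrete, here is a rough sketch of the kind of summary I mean, using only Python's standard library. The `describe` helper and the two toy lists are made up for illustration; they are not the client's figures, just small sets chosen to have medians of 0.8 and 2.2.

```python
import statistics

def describe(discounts):
    """Return summary statistics for a list of discounts (in %)."""
    q1, _, q3 = statistics.quantiles(discounts, n=4)  # quartile cut points
    return {
        "n": len(discounts),
        "min": min(discounts),
        "q1": q1,
        "median": statistics.median(discounts),
        "q3": q3,
        "max": max(discounts),
        "iqr": q3 - q1,
        "mean": statistics.mean(discounts),
        "sd": statistics.stdev(discounts),
    }

# Hypothetical toy data, NOT the client's figures.
set_a = [0.0, 0.2, 0.5, 0.8, 0.9, 1.5, 6.0]
set_b = [0.0, 1.0, 2.0, 2.2, 2.5, 3.0, 4.0]

for name, data in (("A", set_a), ("B", set_b)):
    summary = describe(data)
    print(name, {key: round(value, 2) for key, value in summary.items()})
```

Two sets can share a similar median and still differ a lot in spread or in the tail, which is why looking at the whole summary side by side tells you more than the two medians do.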


 
Posted : 10/11/2017 11:11 am
Posts: 6332
Free Member
 

as said, a two-sample T Test will tell you how statistically different the two samples are. And they look quite different, as you say.

As to why they are different... who knows. You don't say why you have two samples, whether they were collected at different times, whether they are different products, whether the prices of the list are equivalent, or what. Without some information on that, there's no way we can even attempt to answer this question.


 
Posted : 10/11/2017 1:35 pm
Posts: 2217
Full Member
 

I reckon you have a 50:50 chance of getting your answer.


 
Posted : 10/11/2017 2:47 pm
Posts: 17327
Full Member
 

You can't do a T-test of the raw data because the data will not be normally distributed - unless some discounts were negative (and that is highly unlikely provided your customers are not mugs). I suspect the distributions are more likely to be beta distributions since the discount is bounded [0, 1). I would model these and then test for the difference in distribution. Alternatives are to normalise the data by transformation, which might involve a logistic transformation.

Statistics will not tell you WHY the discounts are different. It will only give you the likelihood that a difference of the observed magnitude might be observed by chance alone.
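
As a rough illustration of that last point, here is a sketch of a permutation test on the difference in medians. Note this is a swapped-in, distribution-free technique rather than the beta modelling suggested above, and everything in it is an assumption for illustration: the function name `permutation_test_median`, the synthetic beta-distributed data and the sample sizes.

```python
import random
import statistics

def permutation_test_median(a, b, n_perm=2000, seed=0):
    """Estimate how often a median gap at least as large as the observed one
    appears when the group labels are shuffled at random (two-sided)."""
    rng = random.Random(seed)
    observed = abs(statistics.median(a) - statistics.median(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        gap = abs(statistics.median(pooled[:n_a]) - statistics.median(pooled[n_a:]))
        if gap >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical synthetic discounts as fractions in [0, 1); not the client's data.
rng = random.Random(42)
set_a = [rng.betavariate(1, 60) for _ in range(500)]
set_b = [rng.betavariate(1, 30) for _ in range(500)]

gap, p_value = permutation_test_median(set_a, set_b, n_perm=1000)
print(f"observed median gap: {gap:.4f}, permutation p-value: {p_value:.4f}")
```

A small p-value here only says the gap is unlikely to come from random labelling of one pooled set of sales. It still says nothing about why the two sets differ.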


 
Posted : 10/11/2017 3:50 pm
 km79
Posts: 0
Free Member
 

60% of the time it works every time.


 
Posted : 10/11/2017 4:15 pm
Posts: 0
Free Member
 

"as said, a two-sample T Test will tell you how statistically different the two samples are"

I don't think that was said, and it's not true.

It's not clear what you are trying to find out. But given that you have access to the full data, why are you trying to infer anything from the samples?

If you are unhappy with inferential statistics, you could try some Monte Carlo, to give you a sense of how low the deviations are.
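
One possible reading of the Monte Carlo idea, as a sketch only: repeatedly draw two random samples from the same made-up population and see how far apart their medians land by chance. The Beta(1, 45) population (median around 1.5%), the function name `simulate_median_gaps` and the sample sizes are all assumptions, and the samples here are far smaller than 300,000 to keep it quick.

```python
import random
import statistics

def simulate_median_gaps(n, reps, alpha=1.0, beta=45.0, seed=0):
    """Draw two random samples of size n from the SAME Beta(alpha, beta)
    population, `reps` times, and return the absolute median gaps."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(reps):
        s1 = [rng.betavariate(alpha, beta) for _ in range(n)]
        s2 = [rng.betavariate(alpha, beta) for _ in range(n)]
        gaps.append(abs(statistics.median(s1) - statistics.median(s2)))
    return gaps

# Discounts are fractions here, so multiply by 100 for percentage points.
gaps = simulate_median_gaps(n=2000, reps=100)
print(f"largest median gap in {len(gaps)} runs: {100 * max(gaps):.3f} percentage points")
```

If the simulated gaps come out much smaller than the 1.4 percentage points between 0.8% and 2.2%, that suggests random sampling from one population is an unlikely explanation, and larger samples would only shrink the chance variation further.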


 
Posted : 10/11/2017 4:24 pm
Posts: 0
Free Member
Topic starter
 

I don't have the raw data and no, the discounts are only ever one way (wouldn't it be nice if that were not the case, eh).

Interpreting why there are discrepancies isn't my brief; I'm actually looking at this in the context of job performance and capability, looking at the impact of how training a sales person improves their performance and we're doing that based on discounting. I just noticed the two data sets looked quite different.

I guess I need to bounce that issue back to the client.

Thanks for the help though, greatly appreciated.


 
Posted : 10/11/2017 4:45 pm
Posts: 78369
Full Member
 

"I'll take a look for 500 quid per day."

Plus VAT?


 
Posted : 10/11/2017 4:52 pm
Posts: 0
Free Member
 

Then a one-tailed t-test.


 
Posted : 10/11/2017 10:18 pm
Posts: 0
Free Member
 

Statistical analysis will tell you what is different.

You’ll need to understand the operation to work out why it’s different.

I’m struggling with what industry has 60,000,000 manually discountable sales!


 
Posted : 10/11/2017 10:52 pm
Posts: 0
Free Member
 

"I’m struggling with what industry has 60,000,000 manually discountable sales!"

[image]


 
Posted : 11/11/2017 12:08 am
Posts: 0
Free Member
Topic starter
 

"I’m struggling with what industry has 60,000,000 manually discountable sales!"

Big tobacco. They still sell onesies and twosies off the back of a moped to the shacks in places like Thailand.

So yeah, it does look rather a lot like:

[image]


 
Posted : 11/11/2017 1:24 pm