Viewing 23 posts - 1 through 23 (of 23 total)
  • Any bio-statisticians around?
  • legolam
    Free Member

    I need help, and I’m hoping STW can plug the gaping hole in my PhD left by my ineffectual supervisor. I was promised help with my statistics from some big-shot American bio-statistician, but my boss now doesn’t want me to speak to him directly and won’t forward my questions on…

    Anyway, here’s the issue. Basically, I have recruited 100 patients and performed internal imaging of their coronary arteries. The measurements of the coronary arteries (e.g. diameter of vessel, whether a rupture was present or not etc) are the dependent variables that I am interested in. However, not every patient could have imaging of all 4 coronary arteries due to technical difficulties with the technology.

    This has led to a mismatch in my data. Each patient has a different number of variables depending on how many arteries were imaged. However, their baseline demographics (age, sex etc) and follow-up data (whether they died or not etc) are all measured at a patient level, i.e. one value per patient.

    I would like to present my results in several ways, e.g. is age associated with plaque measurements? Is gender associated with plaque measurements? Do plaque measurements predict adverse outcomes at 1 year?

    I have tried adding all the continuous plaque measurements together for each patient and just analysing the average, but this seems to lose the fidelity of the data. In addition, I can’t do the same for the categorical measurements.

    Is there a way of accounting for these differences in numbers of measurements statistically? I’m using SPSS.

    I’ve shared an example of my data below:
    https://docs.google.com/spreadsheets/d/1OeOc7WK7Ryj9PKkf24_NkxaA5kkhMSOJNhPRmPpysZY/edit?usp=sharing

    Please help. Otherwise I’m going to just lurk on STW for the next 3 months, ride my bike in the rain, and will have wasted the last 3 miserable years of my life. It’s a very appealing plan at the moment.

    johnx2
    Free Member

    Any medical statistician will be able to help.

    In SPSS (it’s years and years since I’ve used it, and not that analysis should be dictated by software) you need to get all your data into a case-is-a-patient format – i.e. one line per patient, not up to four, with missing values for arteries not measured.
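
    Not SPSS, but if it helps to see the target layout, here’s a sketch in Python/pandas – the column names and numbers are entirely made up for illustration:

```python
import numpy as np
import pandas as pd

# One row per patient; vessels that could not be imaged stay as NaN.
# Column names and values are invented, purely to show the shape.
df = pd.DataFrame({
    "patient_id": [1, 2, 3],
    "age":        [64, 71, 58],
    "sex":        ["M", "F", "M"],
    "died_1yr":   [0, 1, 0],
    "diam_LAD":   [3.1, 2.8, np.nan],   # vessel not imaged -> NaN
    "diam_LCx":   [2.9, np.nan, 2.6],
    "diam_RCA":   [3.4, 3.0, 2.7],
    "diam_LMS":   [4.2, 3.9, np.nan],
})

# Most routines can then be told to skip the missing values explicitly:
mean_diams = df[["diam_LAD", "diam_LCx", "diam_RCA", "diam_LMS"]].mean(skipna=True)
```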

    The analysis should be hypothesis driven…

    legolam
    Free Member

    Yeah, that’s the issue that I’m having – trying to rationalise all this data into one line per patient. I know that previous studies have managed to analyse their data on a “vessel-level” rather than “patient-level”, so there must be a way round it. I just can’t figure it out.

    I’ve tried talking to a medical statistician at the university, but they were less than helpful (basically spent an hour with them to be told that they couldn’t help me).

    And I do have hypotheses – I’m just not sure how to test them!

    Do you think buying a new bike would help?

    TiRed
    Full Member

    Email in profile. Drop me a line. It’s part of my day job.

    legolam
    Free Member

    TiRed – thanks so much. Will email you forthwith!

    (and put the bike-buying on hold for now)

    poah
    Free Member

    Why doesn’t he want you to speak to the guy directly?

    anagallis_arvensis
    Full Member

    Sounds to me like you need a multivariate approach, maybe PCA or DCA. You could then model it using Monte Carlo permutations to see which variables are significant.

    legolam
    Free Member

    Why doesn’t he want you to speak to the guy directly?

    Life is too short (and the forum too public) to go into the issues I’ve had with this project…

    You’ve already all been more helpful than every official person I’ve spoken to put together. Thanks!

    poah
    Free Member

    Sounds like you’ve got an a-hole of a supervisor. I know a few people who are in your situation and it sucks. I thankfully had a great supervisor – wouldn’t have got my PhD without him.

    poah
    Free Member

    Double post

    johnx2
    Free Member

    A new bike would certainly help, seems a bit excessive but thanks anyway… (When I had to do this sort of thing I was riding a steel-framed Marin with a parallel crossbar, bought from London Fields Bikes when they were still in a squat. Hey ho.)

    Anyway, and forgive me if you’re way past this, if it’s just about restructuring data so you can analyse it using multivariate techniques or whatever, you can use the CASESTOVARS command – tutorial here:

    http://www.ats.ucla.edu/stat/spss/modules/reshapew115.htm

    (Back in the day, before SPSS was menu-driven, on a small dataset I’d probably just have cut and paste.)
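
    And for anyone following along who isn’t on SPSS, the pandas equivalent of that CASESTOVARS restructure looks roughly like this (data invented):

```python
import pandas as pd

# Long format: one row per imaged vessel (hypothetical data).
long = pd.DataFrame({
    "patient_id": [1, 1, 1, 2, 2],
    "vessel":     ["LAD", "LCx", "RCA", "LAD", "RCA"],
    "min_diam":   [3.1, 2.9, 3.4, 2.8, 3.0],
})

# Wide format: one row per patient, one column per vessel,
# NaN where a vessel was not imaged (what CASESTOVARS produces in SPSS).
wide = long.pivot(index="patient_id", columns="vessel", values="min_diam")
```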

    legolam
    Free Member

    Johnx2 – thanks. I’ve been cutting and pasting huge swathes of data and didn’t even realise that SPSS could do that!

    Would adding rows of “missing data” (so that each patient now has 4 vessels and therefore equal quadruplication of their demographics/outcomes) help even out the statistics?

    Moses
    Full Member

    Will Dracup would know someone local to you who could help. He’s done lots of stats on imaging, and founded a company or two.
    Be brave, send an email.

    johnx2
    Free Member

    Would adding rows of “missing data” (so that each patient now has 4 vessels and therefore equal quadruplication of their demographics/outcomes) help even out the statistics?

    Yes to each patient having four vessels (because barring anatomical freakery they will all have four vessels), with missing values for vessels where you don’t have measurements.

    Duplicate demographic and outcome data – age or sex repeated four times – doesn’t matter as long as you have one variable for each of these things which is the same for all cases (as in, in the same column). So you have age in column C, and let’s say you have it repeated in columns K (age2), U (age3) and Z (age4) or whatever; it doesn’t matter whether you analyse in terms of age or age4 as they’re identical, as long as you don’t get muddled. I’d personally get rid of the redundant columns for tidiness, and there are various ways you could do this…
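
    If you end up doing that tidying in Python/pandas rather than SPSS, a cautious version might be (column names are just my example age/age2/age3/age4 ones):

```python
import pandas as pd

# Restructured data often leaves demographics repeated once per vessel:
# age, age2, age3, age4 (illustrative names and values).
df = pd.DataFrame({
    "patient_id": [1, 2],
    "age":  [64, 71],
    "age2": [64, 71],
    "age3": [64, 71],
    "age4": [64, 71],
})

dupes = ["age2", "age3", "age4"]
# Safety check: only drop the copies if they really are identical to "age".
assert all(df[col].equals(df["age"]) for col in dupes)
df = df.drop(columns=dupes)
```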

    You do need to speak to a statistician and I’m not one.

    TiRed
    Full Member

    Had a brief look at the sample data and bashed it around a bit to restructure it. I’m looking at the correlations as I type. Age and Gender were not predictors of diameter, but vessel type and whether a rupture was observed are (as expected, I would have thought).

    Several methods are available for more in-depth analysis, depending on what you want to investigate. If you really want to predict survival, then a survival analysis using the variables you have collected would be my first choice. That assumes you have a reasonable number of events. Logistic regression (died or lived) is also possible. A principal components analysis is also a possibility.
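
    As a rough illustration of the logistic regression option – not SPSS, and on entirely simulated data, so treat it as a sketch only:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Entirely simulated patient-level data: age and mean plaque diameter
# as predictors of a binary 1-year outcome. All numbers are made up.
n = 1000
age = rng.normal(65, 10, n)
mean_diam = rng.normal(3.0, 0.5, n)

# Generate outcomes so that narrower vessels raise the event probability.
lin = -2.0 + 0.03 * (age - 65) - 1.5 * (mean_diam - 3.0)
died = rng.binomial(1, 1 / (1 + np.exp(-lin)))

X = np.column_stack([age, mean_diam])
model = LogisticRegression().fit(X, died)
# The fitted diameter coefficient should come out negative:
# smaller diameter, worse outcome, matching how the data were built.
```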

    For multiple scans, as you have, a multivariate model is needed. In this instance we can generate a model correlating the four scan diameters to each other, predict the missing ones from the observed values, and then put these into the model. Sadly it is not as clean as having all the observations, and some degree of sensitivity analysis is needed to confirm the robustness of the findings.
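
    A minimal sketch of that predict-the-missing-diameters idea, using a single regression imputation on made-up numbers (real multiple imputation would repeat this with added noise – see below in the thread):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Simulated diameters for two correlated vessels (invented numbers).
n = 500
lad = rng.normal(3.0, 0.4, n)
rca = 0.8 * lad + rng.normal(0.6, 0.15, n)   # correlated with LAD

df = pd.DataFrame({"LAD": lad, "RCA": rca})
# Knock out ~20% of RCA measurements to mimic failed scans.
missing = rng.random(n) < 0.2
df.loc[missing, "RCA"] = np.nan

# Single regression imputation: predict missing RCA from observed LAD.
obs = df.dropna()
slope, intercept = np.polyfit(obs["LAD"], obs["RCA"], 1)
df.loc[missing, "RCA"] = intercept + slope * df.loc[missing, "LAD"]
```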

    I’m not an SPSS user btw, but for any SAS junkies out there, SAS is now available free online for not-for-profit usage. I think it is the death knell for R, personally.

    EDIT: And above all else – NEVER cut and paste data in XL, always manipulate with a script. I just transposed the data to look at correlations with about three lines of code:

    proc transpose data=mace out=maceT;
    by ID; id Vessel; var minDiam;
    run;

    johnx2
    Free Member

    above all else – NEVER cut and paste data in XL

    😀 this is why the OP needs to speak to a proper person!

    Several methods are available for more in depth analysis, depending on what you want to investigate. If you really want to predict survival, then a survival analysis using the variables you have collected would be my first choice. That assumes you have a reasonable number of events.

    I miss data. In that sample of ten cases only two died in a year. Predicting died/didn’t die for 100, or time to death when only say 20 were in a clogs-popped situation… Enough to build a model but not test it too? Very tempting to cheat in these situations…

    legolam
    Free Member

    TiRed and johnx2 – I can’t thank you enough for having a look at this. I have to confess that I have very little comprehension of what you’re actually talking about, but it’s given me something to think about (and start googling all those terms you’re using!).

    I should point out that my example spreadsheet is literally just some numbers that I made up to illustrate my problem. I could show you the actual spreadsheet, but I fear you may lose the will to live if you see it. 17 out of 84 patients had an adverse outcome at 1 year in the actual dataset (ie 20%).

    Can I ask what sort of statistical tests you are actually using to construct these models? Assume I’m a moron…

    Shackleton
    Full Member

    And I do have hypotheses – I’m just not sure how to test them!

    Without wanting to sound like an arse, all of your data collection should have been about testing the hypotheses, guided by an experimental design based around the statistical methods you will use to test them.

    Deriving hypotheses after data collection is bad practice. Rummaging through the data for “statistically significant” results by comparing every variable you have is even worse. I hope I misunderstood your initial explanations, otherwise you need to have a serious chat with your supervisor about his research practices.

    If you are based in Scotland I can probably put you in touch with some bio-statisticians who can help. Email in profile.

    legolam
    Free Member

    I think the problem with our (my) data collection is that we’ve used a fairly new technology in a very specific sub group of patients and I think we underestimated the amount of missing data due to these issues, and also how much more complex the data would be compared to previous studies. It’s been a learning curve for all involved.

    I’m based in the north of England, but thanks for your offer. I’m going to keep trying to annoy the statisticians at my institution in the hope that one can eventually help me!

    Shackleton
    Full Member

    May seem obvious but have you spoken to the biostatistics people in the maths dept at Newcastle University? I assume that is where you are based on what you are studying.

    I assume that you have some kind of mentoring or thesis committee? I’d have a chat with them if your supervisor is refusing to get you the help you need. If your case is as you say, at my institution your supervisor would be barred from having any more students until things were deemed to have improved, and all existing students would be given a second supervisor!

    TiRed
    Full Member

    Shackleton is correct regarding pre-specifying a hypothesis and then collecting the data and performing the analysis. A statistical test after the event is called post hoc and in my world is always qualified. Twenty tests at the 5% level will on average produce one significant result 🙂 .
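
    That one-in-twenty claim is easy to demonstrate with a quick simulation (Python, every null true by construction):

```python
import numpy as np

rng = np.random.default_rng(42)

# 2000 simulated "studies", each running 20 independent two-sample
# comparisons where the null hypothesis is true by construction.
n_sims, n_tests, n = 2000, 20, 30
a = rng.normal(0.0, 1.0, (n_sims, n_tests, n))
b = rng.normal(0.0, 1.0, (n_sims, n_tests, n))

# z-test on the difference in means (population sd is 1 by construction,
# so the standard error of the difference is sqrt(2/n)).
z = (a.mean(axis=2) - b.mean(axis=2)) / np.sqrt(2 / n)
sig_per_study = (np.abs(z) > 1.96).sum(axis=1)

# On average each study finds about 20 * 0.05 = 1 "significant" result,
# even though every single null hypothesis is true.
avg_false_positives = sig_per_study.mean()
```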

    The analysis can however be done post hoc to give precision of parameter estimates in a model. You don’t really have an alternative hypothesis in the strict sense of the word, just estimation of effects of variables on an outcome measure.

    Here’s a simple analogy for one analysis. I can be a good rider or a rubbish one. I collect data on wheel size, height, age, income, tyre width etc… I can make a model predicting the probability of being good. Now suppose I don’t collect all the variables in everyone. I can impute missing variables depending on other observed values – say older riders choose 29ers in my data set. Multiple imputation can be used to predict the missing vessel diameters in all subjects using the correlations in the data. Then the probability model can be used to predict good or bad skills. There are many methods for imputation; I like Markov chain Monte Carlo (MCMC).
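
    A toy version of that multiple-imputation workflow, in Python with invented numbers (a simple linear model stands in for the logistic one, purely for brevity):

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy example, all numbers invented: outcome y depends on x,
# and ~30% of the y values are missing.
n = 300
x = rng.normal(65, 10, n)
y = 4.0 - 0.02 * x + rng.normal(0, 0.3, n)
miss = rng.random(n) < 0.3
y_obs = y.copy()
y_obs[miss] = np.nan

# Fit the imputation model on the complete cases.
ok = ~miss
slope, intercept = np.polyfit(x[ok], y_obs[ok], 1)
resid_sd = np.std(y_obs[ok] - (intercept + slope * x[ok]))

# Draw M imputed datasets, adding residual noise so imputation
# uncertainty is preserved, then fit the analysis model on each
# and pool the estimates (a simplified Rubin-style pooling).
M = 20
estimates = []
for _ in range(M):
    y_imp = y_obs.copy()
    y_imp[miss] = intercept + slope * x[miss] + rng.normal(0, resid_sd, miss.sum())
    b, _a = np.polyfit(x, y_imp, 1)
    estimates.append(b)

pooled_slope = np.mean(estimates)
```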

    It’s still post hoc in the frequentist world of Neyman-Pearson inference. I’m really a Bayesian, but we won’t go there (yet). But the estimation methodology is sound.

    legolam
    Free Member

    May seem obvious but have you spoken to the biostatistics people in the maths dept at Newcastle University? I assume that is where you are based on what you are studying.

    Might seem obvious, but my brain is so frazzled by this that it hadn’t even crossed my mind! I’ve had a look and there look to be at least a couple of people in that department with research interests in this field so I’ll go and knock on some doors/send some emails.

    johnx2
    Free Member

    I’m really a Bayesian, but we won’t go there (yet).

    I thought you were likely to say that*.

    @legolam – I’m not disagreeing with anything posted by others here, but at this point you might benefit most from speaking to someone who’s been in a similar situation, basically any post doc researcher – health-related, social, whatever – who’s done a few quantitative studies (streetwise…) to help you get more of a feel for understanding your data. There’s a risk that a pure statistician coming from an altogether mathsier place might be a little alienating. Though it sounds like any help would be good.

    (*bdum tish. Please yourselves, I’m here all week.)


The topic ‘Any bio-statisticians around?’ is closed to new replies.