Who do you surround yourself with?


I’m sitting on the edge of Lake Michigan watching the sunrise. Those of you who know me will know that I don’t often see the sunrise, but the time difference is messing with my sleep patterns.

Yesterday, I started a scholarship at the Kellogg School of Management, part of Northwestern University in Chicago, thanks to the UK government’s trade and industry group. Our first session was on “Networking”. At first, I thought this meant I was going to be given a glass of fizzy wine and asked to mingle awkwardly with my fellow students. When I realised it was four hours long, I assumed it would be a LinkedIn masterclass, the likes of which I regularly see in my spam email folder. My expectations weren’t very high.

As soon as Professor Brian Uzzi started talking, I knew the class was going to be special. He drew on history as a case study to persuade us that we need a network of diverse influencers, warning of the dangers of surrounding yourself with like-minded, agreeable people (“The Joy of Spontaneous Agreement”). He had so much useful advice, such as: “Your network is the way you transcend your own limitations”.

He talked about the Six Degrees of Separation concept. As a statistician, I found that this stimulated the mathematical side of my brain, so I did a quick calculation: for six steps to reach roughly the whole world’s population, each person only needs to know about 44 other people (44 to the power of 6 is roughly 7.3 billion). Given that the median number of Facebook friends is around 200, I’m feeling quite persuaded.
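The back-of-envelope calculation can be reproduced by searching for the smallest number of acquaintances k such that k to the power of 6 reaches the world’s population. This is a minimal sketch that assumes a world population of about 7 billion (roughly the figure at the time of writing) and, like the quick calculation above, ignores the fact that friendship circles overlap, so 44 is best read as a lower bound.

```python
def min_contacts(population: int, degrees: int = 6) -> int:
    """Smallest k such that k ** degrees >= population.

    Assumes (unrealistically) that no two people share acquaintances,
    so every step multiplies the reachable population by k.
    """
    k = 1
    while k ** degrees < population:
        k += 1
    return k


WORLD_POPULATION = 7_000_000_000  # assumption: approximate world population

print(min_contacts(WORLD_POPULATION))  # 44
print(200 ** 6 >= WORLD_POPULATION)    # True: 200 friends is far more than enough
```

Because 43 to the power of 6 is only about 6.3 billion, 44 is the first whole number that clears 7 billion.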

The session culminated in every student being asked to write down one wish. Each wish was noted, along with its monetary value and how long the person had held it. Using the network in the room (admittedly a well-connected bunch of entrepreneurs), someone else in the room could help make every single wish come true.

This was a truly inspirational session, and I learned an incredible amount. Many ideas and plans have popped into my head. I feel honoured to have been inspired by Professor Uzzi, and I’m really excited about the rest of the scholarship. Many thanks to the organisers and UKTI for asking me to join.

Are vegetarians less healthy than meat-eaters?

I came across an article earlier this week that said that “vegetarians are less healthy and have a lower quality of life than meat-eaters”, according to “scientists”. As a vegetarian who is used to hearing that vegetarians live on average 7 years longer than meat-eaters, this caught my interest. Are we vegetarians really less healthy? It seemed, frankly, impossible.

The academic paper is published in “PLOS ONE”, which claims to “rigorously peer-review submissions” and to publish only papers that are technically sound. Unfortunately, I do not think the conclusions of this paper are supported by the evidence, and therefore I do not agree that the methodology and related conclusions are “technically sound”.

The paper introduces the research by saying that vegetarians eat less fat and more fruit and vegetables, are more physically active, and drink and smoke less. In my opinion, you could probably draw conclusions at this stage, but the authors are specifically interested in exploring the effect of a vegetarian diet in the Austrian population. It seems to me that the effect of a vegetarian diet in Austria wouldn’t be much different from its effect in other countries.

The great thing about this paper (well, let’s try and be positive) is that it illustrates how difficult it is to draw conclusions from observational studies: extreme care must be taken. The authors identified over 300 vegetarians and matched them with similar meat-eaters, using age, gender and socio-economic status as factors in the matching process. They then analysed the data, statistically adjusting for body mass index, physical activity, smoking behaviour and alcohol consumption.

The point of matching the subjects and statistically adjusting the analysis is to try to remove the effect of the many other factors that affect how healthy someone is (e.g. smoking and drinking). This point, in my opinion, invalidates the conclusions of the study: by adjusting for smoking, drinking and physical activity, the analysis strips out exactly the lifestyle differences that, as the paper itself says, set vegetarians apart in the first place. The conclusion from this analysis should be “vegetarians would be less healthy than meat-eaters if they followed the meat-eaters’ less healthy lifestyle”. However, I don’t imagine that would make such a great newspaper headline.

The authors call for public health programs to reduce risks due to nutritional factors. I think they want governments to discourage vegetarian diets as unhealthy, but they don’t come out and clearly say so. This is actually a potentially dangerous conclusion to draw from this study. If public money were diverted to discourage a vegetarian diet or lifestyle, I would guess that it would have either no effect on public health or, at worst, a negative effect. As the authors state, vegetarians generally have a healthier lifestyle than meat-eaters, and would benefit less from public health funds being diverted in their direction.

There are two additional methodological flaws in the report that ideally should have been picked up by statistical reviewers. Firstly, only statistically adjusted results are presented, with no raw unadjusted data. I would very much like to see the raw data on health states in the different dietary groups, both across the whole dataset and in the matched sample. Secondly, the statistical adjustment is carried out in a model (specifically a “general linear model”), which makes a series of assumptions about the relationships between the variables being analysed. In particular, it assumes that these relationships are linear (e.g. that twice as much smoking does twice as much harm to health). I would wager that the poorer outcomes for vegetarians after adjustment owe more to an incorrect linearity assumption in the model than to any real clinical effect of the different diets.
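To show how a wrong linearity assumption alone can make vegetarians look worse, here is a toy simulation. The numbers are entirely hypothetical (they are not the study’s data), and a plain least-squares fit stands in for the paper’s general linear model. Harm is set to grow with the square of smoking, and diet is given no effect at all; meat-eaters simply smoke more.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical toy data: 4 vegetarians (diet = 1) and 4 meat-eaters (diet = 0).
diet = [1, 1, 1, 1, 0, 0, 0, 0]
smoking = [0, 0, 0, 1, 0, 1, 2, 3]     # meat-eaters smoke more
harm = [s ** 2 for s in smoking]       # harm is convex in smoking; diet has NO effect

# Model 1 adjusts for smoking linearly; model 2 uses the correct quadratic form.
linear = ols([[1, d, s] for d, s in zip(diet, smoking)], harm)
quadratic = ols([[1, d, s, s * s] for d, s in zip(diet, smoking)], harm)

print(f"diet effect, linear adjustment: {linear[1]:.3f}")      # 0.174: vegetarians look worse
print(f"diet effect, quadratic adjustment: {quadratic[1]:.3f}")  # effectively zero
```

With the linear adjustment, the diet coefficient comes out positive (more harm for vegetarians) even though diet was built to have no effect; once the model’s form matches how the data were generated, the spurious effect disappears.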

In conclusion, please don’t pay any attention to this call for more public health money to be spent on persuading vegetarians to eat meat. It’s daft.

10 most common clinical trials data problems

One of the common tasks I do in my day-to-day job is to review analyses and summaries of clinical trial data. I’ve summarised the most common problems I see in the hope that it might help you avoid them in the future:

  1. White blood cell differential units being mixed up – these lab tests are frequently measured either in absolute terms (number of cells per volume) or as a percentage of the total number of white blood cells. Often both the absolute and percentage results are reported. A decision should be taken early in the trial about how the data should be reported. I would recommend reporting in both absolute and percentage terms, which will often require converting from one to the other. It’s also very useful to check that the white blood cell components add up to roughly the same ballpark as the reported total WBC count.
  2. Reasons for withdrawing from the study – clinical trial investigators often take different approaches to reporting the main reason a patient withdrew from a study. When the data are looked at overall, these differing approaches can make the results difficult to interpret. The reasons should be reviewed with the clinical team throughout the trial. It’s also useful to check the number of patients who withdrew due to side effects against the adverse event data reported.
  3. Scale of graphs – to help interpret results, clinicians will often want to compare graphs, and to make this possible the graphs need to be on the same scale. Additionally, if graphs show pre-treatment versus post-treatment results, they should use the same scale on both axes, so that the 45-degree line marks the “no-difference” line.
  4. Footnotes – add footnotes so that the results in a table are clear and, as far as possible, can be understood without reference to other documents. Ideally, abbreviations used in a table or graph should be footnoted.
  5. Number of subjects – tables usually present the number of subjects in the population under study (“big N”), as well as the number of subjects with available data (“little n”). The “big N” number needs to reflect the number of subjects under study, and this is frequently incorrect for subgroup analyses. I have seen cases where “little n” is bigger than “big N”, which is clearly rubbish (not from people in my company though!).
  6. Denominators for percentage calculations – related to the above point with “big N”, the denominators for summaries should be carefully planned and documented. If the denominator is not the total number of subjects in the analysis population, this needs to be justified.
  7. Fasting glucose – some laboratories and people handling clinical data always seem to struggle with fasting and non-fasting glucose. A patient either has fasted or hasn’t, and hopefully the patient’s state will be accurately recorded. This matters because the normal ranges for fasting glucose are different from those for the non-fasting state.
  8. Decimal places – often the number of decimal places is not appropriate. There should be enough decimal places to allow adequate interpretation of the results, and ideally not any more!
  9. Units of measurement – always state what the unit of measurement is on every output and analysis.
  10. Outliers – a check should always be carried out when working with any continuous data (e.g. lab measurements, vital signs parameters) to ensure that any outlying values are not erroneous.
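Several of these checks are easy to automate. The sketch below covers items 1, 5, 6 and 10; the function names, the 10% tolerance for the WBC sum and the use of Tukey’s 1.5 × IQR fences for outliers are all illustrative assumptions, not an established standard, and `statistics.quantiles` needs Python 3.8 or later.

```python
import statistics


def percent_to_absolute(percent: float, wbc_total: float) -> float:
    """Item 1: convert a differential reported as % of total WBC into an absolute count."""
    return percent * wbc_total / 100


def differential_sums_ok(absolute_components, wbc_total, rel_tolerance=0.10):
    """Item 1: do the absolute components add up to roughly the reported total WBC count?

    The 10% tolerance is an assumed, illustrative choice.
    """
    return abs(sum(absolute_components) - wbc_total) <= rel_tolerance * wbc_total


def counts_ok(little_n: int, big_n: int) -> bool:
    """Item 5: subjects with data ("little n") can never exceed the population ("big N")."""
    return 0 <= little_n <= big_n


def percentage(count: int, big_n: int) -> float:
    """Item 6: percentage using the analysis-population denominator."""
    if big_n <= 0:
        raise ValueError("denominator (big N) must be positive")
    return 100 * count / big_n


def flag_outliers(values, k=1.5):
    """Item 10: return values outside Tukey's fences for manual review (flagged is not the same as wrong)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return [v for v in values if v < q1 - k * iqr or v > q3 + k * iqr]


# Hypothetical example: total WBC of 6.0 (10^9/L) with a percentage differential.
wbc = 6.0
differential_pct = {"neutrophils": 60, "lymphocytes": 30, "monocytes": 7, "eosinophils": 2, "basophils": 1}
absolute = {name: percent_to_absolute(p, wbc) for name, p in differential_pct.items()}

print(differential_sums_ok(absolute.values(), wbc))              # True
print(counts_ok(little_n=52, big_n=48))                          # False: "little n" > "big N"
print(flag_outliers([5.1, 5.3, 4.9, 5.0, 5.2, 5.4, 4.8, 51.0]))  # [51.0]
```

The flagged value of 51.0 looks like a misplaced decimal point, which is exactly the kind of entry item 10 is meant to catch before it reaches a summary table.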

Trying to Post Jobs to the Government’s Universal Jobmatch website

As the owner of a growing company, I am frequently advertising jobs and looking for talented people. Most of the time I am looking for good, experienced statisticians and programmers, and these people are hard to find. Now that my company is a little bigger, we’ve decided that we could take on a group of people and train them in some parts of our work. They would need a basic understanding of either medical research or computer programming, a logical mind and good communication skills. In return, we’re offering a classroom-based training course followed by dedicated support for a number of months. Given that there are 2.5 million unemployed people in the UK, and we are not looking for our usual specific set of skills, we want to advertise on the government’s new website, Universal Jobmatch.

The login details for the Jobmatch website are the same as those a company uses to access taxation web-based services. We’ve used these for a number of years without problems. After registering for the Jobmatch website, as soon as we log in, we get an error saying the password has expired. This can’t be the case, as it’s the same password we use to access taxation services, and the password works fine there.

We contacted the help desk on 24th Feb 2013. It’s now 3rd March and there’s been no response at all. We’ve tried contacting them again several times but haven’t managed to speak to anyone.

Given the current state of the UK economy, you would think that linking employers’ jobs to unemployed job seekers would be a top priority, but this does not seem to be the case.

I’m writing this blog post in the hope that someone reading it can help sort this mess out, both for my job advert and for other employers who may want to use the service.

I have read many other complaints about the Jobmatch website, but they are all from job seekers. The new website is run by Monster. I can see that other agencies might not want to pass details of their jobs to Monster, but that is another matter.

Estimating incidence of Alzheimer’s across the UK

There are estimates of the number of diagnoses on the Alzheimer’s Society website here: http://www.alzheimers.org.uk/dementiamap. I initially misinterpreted this map as indicating rates of diagnosis, but this is not the case: it indicates the difference between diagnosis rates across the UK. The map appears to show that diagnosis of Alzheimer’s is best in Scotland and Northern Ireland, and worst in the South of the UK.

Accompanying the map is a table of registered diagnoses by healthcare region. To determine incidence, I added population figures from the UK, Scottish, Welsh and Northern Irish government statistics departments.

This first map shows registered diagnoses per 1000 people by UK region. I used Google Fusion Tables to produce the map. Unfortunately, I couldn’t find geographical region data for the UK that included Northern Ireland. Northern Ireland’s incidence was 6.56 per 1000 individuals, slightly higher than that of the South West (6.44 per 1000), but not as high as Scotland’s (7.84 per 1000).

Click here to open map of UK Alzheimer’s incidence.
(Click on the “Map of Geometry” tab)

Here’s a table of the data:

Region                   Diagnosed/1000   Estimated incidence/1000   % pension age
North East               6.38             12.84                      20.1
North West               5.98             12.20                      19.4
Yorkshire & the Humber   6.04             12.43                      19.1
East Midlands            5.73             12.84                      19.7
West Midlands            5.43             12.63                      19.7
East of England          5.47             13.66                      20.2
London                   3.81             8.55                       13.8
South East               5.73             13.68                      19.9
South West               6.44             15.74                      22.5
Wales                    5.61             14.56                      22.0
Northern Ireland         6.56             10.42                      17.7
Scotland                 7.84             12.18                      20.0

The middle column includes an estimate of undiagnosed Alzheimer’s.  When we plot these data, the picture changes somewhat.

Click here to open a map showing incidence including estimated undiagnosed Alzheimer’s.
(Click on the “Map of Geometry” tab)

In this map, Scotland’s rate looks comparable to that of the rest of the UK. This is because the Alzheimer’s Society estimates of undiagnosed Alzheimer’s are much lower for Scotland (and for Northern Ireland). The diagnosis rate in Scotland is estimated at 64.4%, versus around 45% for the rest of the UK (Northern Ireland also has a high diagnosis rate, of 63.0%).
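The regional diagnosis rates can be read straight off the table. Assuming that the estimated column was obtained by dividing diagnosed cases by an estimated diagnosis rate (an assumption about the Alzheimer’s Society’s method, not something stated here), dividing the diagnosed column by the estimated column recovers the implied diagnosis rate for each region. This sketch simply does that arithmetic on the table values.

```python
# (diagnosed per 1000, estimated per 1000 including undiagnosed), copied from the table above
RATES = {
    "North East": (6.38, 12.84),
    "North West": (5.98, 12.20),
    "Yorkshire & the Humber": (6.04, 12.43),
    "East Midlands": (5.73, 12.84),
    "West Midlands": (5.43, 12.63),
    "East of England": (5.47, 13.66),
    "London": (3.81, 8.55),
    "South East": (5.73, 13.68),
    "South West": (6.44, 15.74),
    "Wales": (5.61, 14.56),
    "Northern Ireland": (6.56, 10.42),
    "Scotland": (7.84, 12.18),
}


def implied_diagnosis_rate(diagnosed_per_1000: float, estimated_per_1000: float) -> float:
    """Share of (estimated) cases that have been diagnosed."""
    return diagnosed_per_1000 / estimated_per_1000


for region, (diagnosed, estimated) in RATES.items():
    print(f"{region:<24} {implied_diagnosis_rate(diagnosed, estimated):.1%}")
```

The English regions and Wales come out between roughly 39% and 50%, while Scotland and Northern Ireland are both in the low-to-mid 60s, which is why adding the undiagnosed estimates narrows the gap so much for those two countries.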

In both maps, London has a markedly lower rate of Alzheimer’s than the rest of the country. This is because there are fewer older people in London than in the rest of the UK:

Click here for a map of the percentage of people of pensionable age (65M/60F).
(Click on the “Map of Geometry” tab)

In summary, it seems that the incidence of Alzheimer’s is mostly related to how many older people live in a region. However, this depends heavily on the estimates of undiagnosed Alzheimer’s, which are difficult to make (as they are essentially missing data).
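One rough way to check the summary is to rescale each region’s estimated rate by its share of people of pensionable age and see whether the regional differences shrink. This is a crude sketch under a loudly stated assumption: it treats every case as occurring among people of pensionable age. It is not a proper age standardisation, which would need age-specific rates and a standard population that the table does not provide.

```python
# (estimated per 1000 including undiagnosed, % of population of pensionable age), from the table above
DATA = {
    "North East": (12.84, 20.1),
    "North West": (12.20, 19.4),
    "Yorkshire & the Humber": (12.43, 19.1),
    "East Midlands": (12.84, 19.7),
    "West Midlands": (12.63, 19.7),
    "East of England": (13.66, 20.2),
    "London": (8.55, 13.8),
    "South East": (13.68, 19.9),
    "South West": (15.74, 22.5),
    "Wales": (14.56, 22.0),
    "Northern Ireland": (10.42, 17.7),
    "Scotland": (12.18, 20.0),
}


def per_1000_pension_age(estimated_per_1000: float, pct_pension_age: float) -> float:
    """Crude rescaling, assuming (hypothetically) all cases are among people of pensionable age."""
    return estimated_per_1000 / (pct_pension_age / 100)


def spread(values) -> float:
    """Ratio of the highest regional rate to the lowest."""
    values = list(values)
    return max(values) / min(values)


raw = {region: est for region, (est, _) in DATA.items()}
rescaled = {region: per_1000_pension_age(est, pct) for region, (est, pct) in DATA.items()}

print(f"raw spread:      {spread(raw.values()):.2f}")       # 1.84
print(f"rescaled spread: {spread(rescaled.values()):.2f}")  # 1.19
```

On this crude measure the gap between the highest and lowest regions falls from a factor of about 1.8 to about 1.2, and London no longer stands out, which is consistent with the age structure doing most of the work.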


Multivitamins in the Prevention of Cancer in Men

Yesterday a new article was published in the prestigious Journal of the American Medical Association, JAMA. As a statistician, I find it somewhat surprising, and definitely disturbing, that this article has been published in such a prestigious journal in its current form.

The key result reported in the article is that male physicians taking multivitamins seem to develop less cancer than similar male physicians taking a placebo. In the multivitamin group, 1290 of 7317 (17.6%) subjects developed cancer, compared with 1379 of 7324 (18.8%) subjects in the placebo group. The difference is 1.2 percentage points, with a 95% confidence interval of [-0.05%, 2.45%] (note that, unlike the survival analysis in the manuscript, my simple calculation of the difference in proportions is not statistically significant).
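The simple calculation above can be reproduced with the usual normal-approximation (Wald) interval for a difference in two proportions. Assuming that is the interval meant (it matches the quoted figures), here is a short sketch; note that it ignores follow-up time, which the survival analysis in the manuscript takes into account.

```python
import math


def diff_in_proportions_ci(events_a: int, n_a: int, events_b: int, n_b: int, z: float = 1.96):
    """Difference p_b - p_a with a Wald confidence interval (unpooled standard error)."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    diff = p_b - p_a
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return diff, diff - z * se, diff + z * se


# multivitamin (a) versus placebo (b): cancers / subjects
diff, low, high = diff_in_proportions_ci(1290, 7317, 1379, 7324)
print(f"{diff:.2%} [{low:.2%}, {high:.2%}]")  # 1.20% [-0.05%, 2.45%]
```

Because the lower limit dips just below zero, this crude comparison does not reach significance at the 5% level.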

The issue I have with the paper is one of multiplicity. A simple comparison with the planned objectives and analyses of the study on clinicaltrials.gov indicates that there were 3 primary objectives. There were also 4 secondary objectives, one of which was whether multivitamins reduce the risk of cancer. Since the publication addresses one of the secondary endpoints, it seems fair to conclude that all seven outcomes were considered in the study. It is not entirely clear from the publication how many of the seven outcomes were statistically significant at the 5% level. The stated p-value for the main analysis in the publication is 0.04, and this must be interpreted in the light of all the planned analyses in the study. A p-value of 0.04 means that, if there were no actual difference between treatments, the probability of seeing a result at least this extreme would be one in twenty-five. However, if there were 7 such analyses, the probability of observing at least one result as impressive as that reported would be much greater: if the analyses were independent, it would be about one in four (i.e. quite likely).
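The “one in four” figure comes from the chance that at least one of seven independent analyses reaches p = 0.04 when there is no real effect anywhere. This sketch assumes independence, which is a simplification (outcomes within one trial are usually correlated), and also shows the Bonferroni-adjusted threshold, one standard (and conservative) correction chosen here for illustration, not one used in the paper.

```python
def prob_at_least_one(p: float, k: int) -> float:
    """P(at least one of k independent null analyses gives a p-value <= p)."""
    return 1 - (1 - p) ** k


def bonferroni_threshold(alpha: float, k: int) -> float:
    """Per-analysis significance level that keeps the family-wise error rate at alpha."""
    return alpha / k


print(f"{prob_at_least_one(0.04, 7):.3f}")     # 0.249, about one in four
print(f"{bonferroni_threshold(0.05, 7):.4f}")  # 0.0071
```

A p-value of 0.04 is well above the adjusted threshold of about 0.007, so under this correction the result would not be declared significant.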

To make matters worse, the authors have also reported adverse events based on statistical significance tests. I assume that with such a large group of patients, a multitude of different types of adverse events would have been reported (hundreds?). Were all of these event types tested? If so, you would expect to find many statistically significant differences just by chance (a significant result at the 5% level only indicates that such an extreme observation would occur by chance one time in twenty if there were no real difference).
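A quick simulation makes the point concrete. Under the null hypothesis a p-value is uniformly distributed between 0 and 1, so testing many adverse event types where there is no real difference still produces “significant” findings. The figure of 100 event types is a hypothetical stand-in for the “hundreds?” guess above, and the tests are treated as independent for simplicity.

```python
import random


def false_positives(num_tests: int, alpha: float, rng: random.Random) -> int:
    """Count p-values below alpha among num_tests null tests (p-values drawn uniformly on [0, 1))."""
    return sum(rng.random() < alpha for _ in range(num_tests))


rng = random.Random(2012)  # fixed seed so the sketch is reproducible
NUM_AE_TYPES = 100         # hypothetical number of adverse event types tested

runs = [false_positives(NUM_AE_TYPES, 0.05, rng) for _ in range(1000)]
print(sum(runs) / len(runs))       # close to 100 * 0.05 = 5 spurious "findings" per trial
print(1 - 0.95 ** NUM_AE_TYPES)    # about 0.994: at least one is almost certain
```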

It is clear that this publication would not have demonstrated a statistically significant result if the study’s many objectives had been taken into account. As such, the study’s conclusions should not be regarded as conclusive and, in my opinion, should not encourage anyone to take multivitamins to avoid developing cancer. I would, however, highly recommend this paper to those teaching statistics at universities as a case study in what not to do.

Elton John’s Winter Ball

I am waking up disappointed this morning. One of the highlights of last night’s Winter Ball to raise funds for the Elton John AIDS Foundation was meeting Matthew Cain, the arts correspondent for Channel 4. I chatted with Matthew, and he was warm and funny, complaining about the lack of eligible single men. When I pointed out that the waiters were all 6ft 3 models, he told me he’d just wasted 20 minutes chatting one up, only to find out he was straight.

The ball was a glittering success. Everyone seemed to be enjoying themselves (well, maybe apart from the couple who were arguing about whether she should have given David Furnish her business card). The cocktails, provided free by Grey Goose, were a taste-bud sensation.

Emeli Sande started off the entertainment, giving us a few songs before being joined by Labrinth, who got the crowd rocking. His set was excellent, a mix of pop, urban dance and electronica. Emeli watched on, sipping a glass of red wine whilst happily agreeing to be photographed by all.

The night progressed to the poolside dance floor (the pool was electronically projected, Club Tropicana style). A few of the girls and gays were pole dancing around one of the columns. People, including me, were throwing shapes on the dance floor.

Here’s where my disappointment lies. Matthew tweeted that there were a bunch of rich white men who couldn’t dance. Is he including me in this? Surely not?!

He was right about the lack of eligible men though. What was surprising was the number of hopeful single girls there. Bit of advice to them… If you’re looking for a (rich?) husband, Elton John’s parties might not be the best place.

And so, until next year…