10 most common clinical trials data problems

One of the common tasks I do in my day-to-day job is to review analyses and summaries of clinical trial data. I’ve summarised the most common problems I see with the hope that it might help you avoid these in the future:

  1. White blood cell differential units being mixed up – frequently these lab tests are measured either in absolute terms (number of cells per volume) or as a percentage of the total number of white blood cells. Often the absolute and percentage results are both reported. A decision should be taken early on in the trial as to how the data should be reported. I would recommend reporting in both absolute and percentage terms. Often this will require converting from one to the other. It’s also very useful to do a check by adding up all the white blood cell components and ensuring that the total is in the same ballpark as the reported WBC count.
  2. Reasons for withdrawing from the study – often clinical trial investigators have different approaches to reporting the main reason for a patient withdrawing from a study. When looked at overall, these patterns can cause difficulty in interpreting the results. These reasons should be reviewed with the clinical team throughout the trial. It’s also useful to check the number of patients who have withdrawn due to side effects against the adverse event data reported.
  3. Scale of graphs – to help interpretation of results, often clinicians will want to compare graphs, and to facilitate this, the graphs need to be on the same scale. Additionally, if graphs are showing pre-treatment versus post-treatment results, graphs should use the same size of scale on both axes, so the 45 degree line shows the “no-difference” point.
  4. Footnotes – add footnotes to make sure that the results in a table are clear, and can be understood with reference to other documents as much as is possible. Ideally abbreviations used in a table or graph should be footnoted.
  5. Number of subjects – usually tables present the number of subjects in the population under study (“big N”), as well as the number of subjects with available data (“little n”). The “big N” number needs to reflect the number of subjects under study, and this is frequently incorrect for subgroup analyses. I have seen cases where “little n” is bigger than “big N” which is clearly rubbish (not from people in my company though!).
  6. Denominators for percentage calculations – related to the above point with “big N”, the denominators for summaries should be carefully planned and documented. If the denominator is not the total number of subjects in the analysis population, this needs to be justified.
  7. Fasting glucose – some laboratories and people handling clinical data always seem to struggle with fasting and non-fasting glucose. A patient either has fasted or they haven’t, and hopefully the state of the patient will be accurately recorded. The normal range for fasting glucose is different from that for the non-fasting state, which is why this is important.
  8. Decimal places – often the number of decimal places is not appropriate. There should be enough decimal places to allow adequate interpretation of the results, and ideally not any more!
  9. Units of measurement – always state what the unit of measurement is on every output and analysis.
  10. Outliers – a check should always be carried out when working with any continuous data (e.g. lab measurements, vital signs parameters) to ensure that any outlying values are not erroneous.
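
The conversion and the sum check from point 1 can be sketched in a few lines of Python. This is only a minimal illustration: the function names, the sample values and the 10% tolerance are assumptions made for the example, not part of any standard, and the right tolerance should be agreed with the clinical team.

```python
# Hypothetical helpers; names and the default tolerance are illustrative assumptions.

def pct_to_abs(pct, wbc):
    """Convert a differential percentage to an absolute count (same units as wbc)."""
    return pct / 100.0 * wbc


def abs_to_pct(absolute, wbc):
    """Convert an absolute count to a percentage of the total WBC count."""
    if wbc <= 0:
        raise ValueError("WBC count must be positive")
    return absolute / wbc * 100.0


def differential_sum_ok(components_abs, wbc, rel_tol=0.10):
    """Return True if the absolute components add up to within rel_tol of WBC."""
    total = sum(components_abs.values())
    return abs(total - wbc) <= rel_tol * wbc


# Made-up values for one sample, all in the same absolute units as wbc.
sample = {
    "neutrophils": 4.2,
    "lymphocytes": 2.1,
    "monocytes": 0.5,
    "eosinophils": 0.15,
    "basophils": 0.05,
}
wbc = 7.0

print(differential_sum_ok(sample, wbc))           # components vs reported WBC
print(round(abs_to_pct(sample["neutrophils"], wbc), 1))
```

A sample that fails this check is often one where percentages have been loaded into an absolute-count field (or the reverse), so it is a cheap way to catch mixed-up units.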

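The “big N” and denominator checks from points 5 and 6 are also easy to automate. The sketch below assumes each table row is available as a (label, big N, little n) tuple. That layout and the function names are hypothetical, so adapt them to however your outputs are actually stored.

```python
# Hypothetical row layout: (label, big_n, little_n). Names are illustrative assumptions.

def check_counts(rows):
    """Return the labels of rows where little n exceeds big N, or big N is not positive."""
    problems = []
    for label, big_n, little_n in rows:
        if big_n <= 0 or little_n > big_n:
            problems.append(label)
    return problems


def percentage(count, denominator, decimals=1):
    """Percentage against an explicitly stated denominator, rounded to fixed decimals."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return round(100.0 * count / denominator, decimals)


# Made-up table rows: the subgroup row has more subjects with data than subjects.
rows = [
    ("Overall", 120, 118),
    ("Females", 50, 53),
]

print(check_counts(rows))
print(percentage(30, 120))
```

Passing the denominator explicitly, rather than letting it default to whatever is in the dataset, forces the choice to be visible and documented.
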
4 thoughts on “10 most common clinical trials data problems”

  1. 1. There are different normal ranges for percentages and absolute counts and thus they are typically analyzed in parallel. This is why the ranges should not be just re-calculated between the systems (based on the total WBC) – it is wrong.
    7. Same for the lipid panel.

    To summarize, the issues you outlined all lie in the realm of SAP writing; there are issues foreign to us statistical programmers that can (and do) ruin studies. For example, you mention the labs frequently; the decision to use local labs typically leads to disaster.

    Reconciling the complex and often incomplete findings-class datasets is another source of nightmare (think about viral load vs. labs for DAIDS Haematology rules). PVs and compliance come close – I have seen people programming all but AI systems to scavenge all datasets for possible clues about missing data. It all typically *is* in the SAP but looks innocent until one tries to implement it properly, on incomplete data and all.

    • Hi Anton. I agree that the normal ranges for labs can depend on the way it is measured. However, it doesn’t make sense to NOT convert WBC differentials for the purpose of analysis and summary, where the results are being looked at on a continuous scale. Obviously you are correct when the summary is considering whether values are above or below the normal ranges. I don’t think the decision to use a local lab would be decided when a SAP is written – this would be decided by the clinicians at the protocol stage (or at a company level before this).
      I also agree that PVs and compliance often cause problems. PVs because you are, by definition, looking for outcomes that weren’t planned to happen. So how do you identify those? With difficulty is the answer. Compliance is usually a problem because the data are usually a little messy, and the definitions are a little loose. Thanks!

      • Hi Kevin,
        Re: WBC differential normal ranges: it is not the way it is measured, it is about the differential composition. Think about two possibilities for e.g. Neutrophils:

        1. The neutrophils absolute count is within the normal range, yet other cell counts are so off that the neutrophils represent too small (or too large) a portion of WBC compared to other cell classes.
        2. The proportion of neutrophils within the WBC is neat, yet the total WBC is e.g. so low it means neutropenia.

        From the above it should become clear why I wrote that the normal ranges for the differential should not be re-calculated between the representations using WBC as the factor.

      • Hi Anton, I am not suggesting that the normal range limits are recalculated. There’s no need, as the normal range on the original reported scale should be used to tell whether a value is high, normal or low. However, if you are looking at average profiles over time (say, median plots), then it wouldn’t be sensible to only summarise data that was measured on one scale or the other, which is what I see happening all the time.
