Medical statistics


For many of us, medical statistics might just be one of the most abstract subjects, if not the most abstract subject, of our entire curriculum. Intimidated by terms like ‘non-parametric test’ and ‘logistic regression,’ you tend to skip those terrifying questions on the Progress Test and avoid anything related to research. Fear not, however: this PanEssay will explain some of the basics so you can get back to actual physician matters.

Descriptive statistics

First, it is vital to distinguish between the different types of variables. We have numerical and categorical variables, each with their own subtypes. Numerical variables are quantitative, meaning they have a value. The first subtype is the continuous variable, which can take any value (2.81, 98, -5.6), for example height. Discrete variables can only take integer values (3, 48, 2), which means there are no decimals, for example the number of children someone has. Categorical variables, by contrast, are qualitative. When there is a natural order, they are called ordinal, for example pain scales. Nominal variables do not have such an order; examples are marital status or presence of disease (0 is no and 1 is yes).

For continuous variables, perhaps the most important concepts to grasp are the three averages: the mean, the median and the mode. The mean is defined as the sum of all values in a set, divided by the number of values. The mean of 1, 2 and 3 is therefore 2. You can only use the mean in a (near) normal distribution, where the spread of data is approximately equal on both sides of the midpoint; normal distributions have the classic bell shape. Having said that, not all data is normally distributed. Sometimes the distribution is skewed, and in that case you should not use the mean but the median: the point which has half the values above it, and half below. Furthermore, you can use the mode to describe data. This is the most frequently occurring value in a set. The mode is not often used.
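As a minimal sketch, the three averages can be computed with Python’s built-in statistics module (the sample below is made up for illustration):

```python
import statistics

# a small made-up sample; the outlier 7 skews the data
values = [1, 2, 2, 3, 7]

print(statistics.mean(values))    # 3.0: (1 + 2 + 2 + 3 + 7) / 5
print(statistics.median(values))  # 2: the middle of the sorted values
print(statistics.mode(values))    # 2: the most frequent value
```

Note how the single outlier pulls the mean (3.0) above the median (2), which is exactly why the median is preferred for skewed data.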

Another important concept is the standard deviation (SD, σ). It indicates how data varies around the mean; the percentages below only hold for normal distributions. A high SD thus indicates high dispersal. The range within 1 SD of the mean includes 68.2% of the data, within 2 SD 95.4% and within 3 SD 99.7%.
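This rule of thumb can be checked with a small simulation; the sketch below draws 10,000 values from a normal distribution (the parameters are made up) and counts how many fall within 1 SD of the mean:

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible
sample = [random.gauss(mu=100, sigma=15) for _ in range(10_000)]

m = statistics.mean(sample)
sd = statistics.stdev(sample)
within_1sd = sum(m - sd <= x <= m + sd for x in sample) / len(sample)
print(round(within_1sd, 2))  # close to 0.68, as the rule predicts
```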

This raises the question: how can you tell whether data is normally distributed? A histogram can be used, although this only gives a visual impression. It is also possible to make a Q-Q plot or to do a statistical test (e.g. Shapiro-Wilk).
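Assuming SciPy is available, a Shapiro-Wilk test takes one line. Here it is run on deliberately skewed (exponential) data, made up for illustration:

```python
import random
from scipy.stats import shapiro

random.seed(1)
skewed = [random.expovariate(1) for _ in range(200)]  # clearly non-normal

stat, p = shapiro(skewed)
print(p < 0.05)  # True: normality is rejected for this sample
```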


Let’s say you did an awesome study and found a mean. Hurray! However, how do you know whether your result is representative, or in other words, significant? You want to be confident about your result, and therefore a mean is generally given with a confidence interval (CI), a range in which you can be fairly sure the true value lies. The true value is the mean of the entire population, but your study yielded a sample mean from a smaller group. Say you investigated a diet and measured weight loss. You found a mean of 4 kg weight loss with 95% CI [-2;10]. This means you can be 95% sure the true mean difference lies between -2 kg and 10 kg. Moreover, as 0 is included in the interval, there could actually be a weight gain, and the diet may be ineffective. In other words: the data are compatible with there being no true change in weight at all.
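As a rough sketch in Python, a 95% CI for a mean can be computed as mean ± 1.96 standard errors. The measurements below are made up; strictly, with only 10 subjects a t-multiplier of about 2.26 would be more appropriate than the z-value 1.96, but the z-value keeps the sketch simple:

```python
import math
import statistics

# made-up weight-loss measurements in kg (negative = weight gain)
losses = [4.0, -3.5, 6.2, 3.1, -2.8, 5.5, 2.0, 4.4, -4.3, 7.1]

mean = statistics.mean(losses)
sem = statistics.stdev(losses) / math.sqrt(len(losses))  # standard error
ci_low, ci_high = mean - 1.96 * sem, mean + 1.96 * sem

print(f"mean {mean:.1f} kg, 95% CI [{ci_low:.1f}; {ci_high:.1f}]")
# 0 lies inside this interval, so the diet may be ineffective
```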

A related concept is the P-value, or probability value. It tells you how likely your result, or a more extreme one, would be if a certain hypothesis were true. Usually, this is the null hypothesis: there is no difference between two groups, so the means are equal. If P = 0.5, the chance of this result or a more extreme one occurring under the null hypothesis is 50%, indicating that the result is not significant. A P-value smaller than 0.05 is generally accepted as showing a significant difference, yet the smaller the P-value, the better. 0.05 may seem small, but if 20 studies each have a P-value of 0.05, one of those studies can be expected to show a result that is not actually true. If P < 0.05, the null hypothesis is rejected, and we conclude that there is a difference.
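As a worked example of this definition (a made-up coin experiment, not from the study above): suppose you flip a coin 10 times and see 8 heads, and the null hypothesis is that the coin is fair. The P-value is the chance of a result at least this extreme under that hypothesis:

```python
from math import comb

n, observed = 10, 8  # 8 heads out of 10 flips

# chance of 8, 9 or 10 heads if the coin is fair (one-tailed)
p_value = sum(comb(n, k) for k in range(observed, n + 1)) / 2 ** n
print(round(p_value, 3))  # 0.055: just above 0.05, so not significant
```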

Parametric tests

The above-mentioned concepts are essential to all statistical tests. Each test is used for a different set of circumstances. There are a lot of different tests, and the details can be quite complicated. Luckily, when interpreting, it is generally sufficient to jump straight to the P-value to see whether you have a significant result. Parametric tests can only be used when the data are (approximately) normally distributed.

The prime example of a statistical test is the t-test. You can use this to see whether the mean of a population equals a certain value, or whether the means of two populations are equal (e.g. comparing an old and a new treatment). A t-test can be one-tailed or two-tailed. Returning to the example: in the one-tailed version you expect the new treatment to be better than the old one, while in the two-tailed version you test whether it is either better or worse. Another subtype is the paired t-test, which you could use, for instance, when you measure blood pressure with two devices on the same subject (and this, too, can be one- or two-tailed). The other important parametric test is the analysis of variance (ANOVA), where you check whether the sample means of several groups are equal.
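Assuming SciPy is available, both tail variants can be sketched as follows; the treatment scores are made up for illustration:

```python
from scipy import stats

# made-up outcome scores for an old and a new treatment
old = [12.1, 11.4, 13.0, 12.7, 11.9, 12.3, 13.1, 12.0]
new = [13.2, 12.9, 14.1, 13.5, 12.8, 13.9, 14.0, 13.3]

# two-tailed: the new treatment may be better or worse
two_tailed = stats.ttest_ind(new, old)
print(two_tailed.pvalue < 0.05)  # True: the means differ significantly

# one-tailed: we expect the new treatment to score higher
one_tailed = stats.ttest_ind(new, old, alternative='greater')
print(one_tailed.pvalue < two_tailed.pvalue)  # True
```

For the paired case (e.g. two blood-pressure devices on the same subjects), `stats.ttest_rel` works the same way on two equally long lists of measurements.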

Other tests

Non-parametric tests are used when the data are not normally distributed or are categorical. You therefore cannot use the mean but are confined to the median. The principle remains the same, however: look for the P-value. Commonly used non-parametric tests are the Mann-Whitney U test, the Wilcoxon signed-rank test, the Kruskal-Wallis test and the Friedman test.
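As a sketch of the non-parametric counterpart of the unpaired t-test (again assuming SciPy; the pain scores are made up):

```python
from scipy import stats

# made-up, skewed pain scores (0-10) for two groups
group_a = [2, 3, 3, 4, 2, 9, 3, 2, 4, 3]
group_b = [5, 6, 7, 5, 8, 6, 9, 7, 5, 6]

result = stats.mannwhitneyu(group_a, group_b)
print(result.pvalue < 0.05)  # True: the groups differ significantly
```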

The Chi-square test measures the difference between expected and observed frequencies, or the relationship between two categorical variables. The expected frequency is the frequency that would hold if the null hypothesis were true. If the observed and expected frequencies are equal, the chi-square value (χ²) equals 0. The χ² value itself is not relevant when interpreting; just go to the P-value. The test is often used on contingency tables. An example would be the relationship between marital status and eye colour. Fisher’s exact test is similar and is preferred when expected frequencies are small.
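Assuming SciPy is available, a Chi-square test on a contingency table is a one-liner; the table below is made up for illustration:

```python
from scipy.stats import chi2_contingency

# made-up 2x2 contingency table: exposure vs. presence of disease
#                 disease   no disease
table = [[30, 70],   # exposed
         [10, 90]]   # not exposed

chi2, p, dof, expected = chi2_contingency(table)
print(dof)       # 1: degrees of freedom for a 2x2 table
print(p < 0.05)  # True: observed frequencies differ from expected
```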


When there is a linear relationship between two variables, they are said to be correlated, which is expressed by the correlation coefficient r. If there is a perfect positive relationship between two variables, r = 1 (and r = -1 for a perfect negative relationship). An r of 0.6 or more is generally regarded as a high correlation. In normal distributions you use Pearson’s coefficient, otherwise the Spearman rank correlation coefficient.
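Both coefficients are available in SciPy, assuming it is installed; the heights and weights below are made up:

```python
from scipy.stats import pearsonr, spearmanr

# made-up heights (cm) and weights (kg) with a near-linear relationship
height = [160, 165, 170, 175, 180, 185]
weight = [55, 60, 66, 70, 77, 83]

r, _ = pearsonr(height, weight)     # for normally distributed data
rho, _ = spearmanr(height, weight)  # rank-based alternative
print(r > 0.6)       # True: a high correlation by the rule of thumb
print(rho > 0.999)   # True: the ranks agree perfectly here
```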

A related concept is regression: how one set of data relates to another. A regression line is the line that fits best through the data points on a graph, and the regression coefficient gives its gradient. The R² value gives the amount of variation that is explained by the regression: an R² of 0.81 shows that 81% of the variation in A can be explained by the variation in B. In logistic regression, each case can belong to only one of two groups (binomial), like having a disease or not. The outcome is the probability that a certain case belongs to one group rather than the other. Risk scores are a good example here.
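A simple linear regression can be sketched with SciPy, assuming it is available; the dose-response data below are made up:

```python
from scipy.stats import linregress

# made-up dose (x) and response (y) data
dose = [1, 2, 3, 4, 5, 6]
response = [2.1, 4.3, 5.9, 8.2, 9.8, 12.1]

fit = linregress(dose, response)
print(round(fit.slope, 2))   # the regression coefficient (gradient)
r_squared = fit.rvalue ** 2  # share of variation explained by the line
print(r_squared > 0.8)       # True: most of the variation is explained
```

Logistic regression, with its binary outcome, is usually fitted with a dedicated library such as statsmodels or scikit-learn rather than with SciPy.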

This aid should get you started with the most commonly used statistical concepts. If you really liked this article, don’t be sad it’s over! There is so much more to learn about, for example Kaplan-Meier curves, odds ratios, Bland-Altman plots, Cox regression models…