Comparing Two Medians for Differences

Bootstrapping Confidence Intervals for the Difference Between Two Population Medians

Posted by Daniel Schmidtke on Sunday, November 1, 2020

Is the Wilcoxon-Mann-Whitney test a valid distribution-free test for the comparison of central tendencies?

One of the most frequently tested hypotheses in science is the equality of the central tendencies of two populations. For normally distributed populations, this is easily done by comparing the means with a two-sample t-test, provided that, in addition to the Gaussian population distributions, the following assumptions are met (a short example follows the list):

  • independence
  • equal population variance

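As a quick illustration (a toy example with made-up numbers, not the lemur data used below), such a comparison is essentially a one-liner in R:

set.seed(123)
group_a <- rnorm(20, mean = 10, sd = 2)    # two normally distributed samples
group_b <- rnorm(20, mean = 12, sd = 2)    # with equal population variance
t.test(group_a, group_b, var.equal = TRUE) # classical two-sample t-test
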
A distribution-free alternative for the comparison of population medians is less easily found. If you ask your colleagues, friends, or the internet, some of these sources will probably advise you to use the Wilcoxon-Mann-Whitney test (WMW-test) to explore possible differences between population medians. However, you should treat this advice with great caution, as the approach is valid only in a small number of cases, namely when the population distributions are:

  • independent
  • equal in shape
  • equal in scale

What the WMW-test actually does is test the null hypothesis that the distributions of the two populations are equal. It does not compare any sample-specific summary statistic, such as the median. Only if the above-listed conditions of equal shape and scale (i.e. equal variance, equal skewness, etc.) are met does an equality of distributions essentially boil down to an equality of location (including that of the median). In most cases, however, these conditions are not met and the WMW-test is not a valid tool for the comparison of medians.
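
To see the problem in action, consider a small simulated example: two populations whose medians are both exactly zero, but whose distributions are mirrored, skewed versions of each other (this simulation is only for illustration and is not part of the analysis below):

set.seed(1)
x <- log(2) - rexp(500)  # left-skewed, population median = 0
y <- rexp(500) - log(2)  # right-skewed, population median = 0
median(x); median(y)     # both sample medians lie close to 0
wilcox.test(x, y)        # will typically return a small p-value, because the
                         # test reacts to the difference in shape, not to a
                         # difference in medians

Even though both population medians are identical, the WMW-test will usually reject its null hypothesis here, which is exactly the behaviour to keep in mind when interpreting its results.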

If you are interested in the details of the described misunderstanding about the WMW-test and its relationship to medians, I recommend reading this article by George W. Divine and his colleagues.

The rest of this blog post will present a bootstrapping approach that provides confidence intervals (CIs) for the difference between two medians and thus addresses this location parameter directly.

Bootstrapping confidence intervals for the difference between two medians

This method was recently brought to my attention by an anonymous reviewer, who suggested it as a way to support a rather descriptive statement I had made in one of my recent manuscripts. He or she also provided a link to this very instructive post by Clay Ford, in which the WMW-test and the bootstrapping approach are compared in detail.

My post will skip this comparison and will instead give brief instructions on how to calculate bootstrap CIs for the difference between two medians in R. The same information can also be found in Clay Ford’s original post, which I highly recommend to those of you interested in the details.

The code

First of all, we will need the following two libraries (ggplot2 for visualizations and boot for the bootstrapping):

library(ggplot2)
library(boot)
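
If one of the two packages is missing on your system, a one-time call to install.packages() takes care of that:

install.packages(c("ggplot2", "boot"))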

Next, we will need two data sets to compare. Here, I chose some imaginary performance data, say, the number of trials that individuals from two groups of mouse lemurs needed to solve a specific task. Let sample 1 be a group of young animals and sample 2 a group of old animals:

sample_1_perf <- c(62,73,43,54,45,112,83,64,92,101)
sample_2_perf <- c(180,96,150,225,135,108,87,134,145,112)

For further processing, it is convenient to have all performance values in a single column of a data frame and the group identifier of each individual in a second column:

data <- data.frame(performance = c(sample_1_perf, sample_2_perf),
                   age = rep(c("Young", "Old"), each = 10))
data$age <- factor(data$age, levels = c("Young", "Old"))

The first rows of the resulting data frame (printed, for example, with head(data)) should look like this:

performance age
62 Young
73 Young
43 Young
54 Young
45 Young
112 Young

To get a first impression of the performance data, let us now plot the two data sets side-by-side as boxplots using ggplot:

ggplot(data, aes(x=age, y=performance)) +
  geom_boxplot() +
  xlab("Age group") +
  ylab("Performance [trials]") +
  ylim(0, 250) +
  ggtitle("Performance by age group") +
  theme(plot.background = element_rect(fill = "grey95"),
        panel.background = element_rect(fill = "white", colour = "black"),
        panel.grid.major = element_line(colour = "grey85"),
        panel.border = element_rect(fill = NA, colour = "black"),
        legend.position = "none")

From visual inspection alone, it seems that there is a clear difference between the median performances of the two age groups. To test this, we will now calculate the 95% bootstrap CI for the difference between the medians using the following code, which, as stated earlier, is strongly inspired by Clay Ford’s code:

## Firstly, we define a function that calculates the difference between the
## two sample medians for a given bootstrap resample. boot() passes in the
## original data frame as d and the resampled row indices as i:
med.diff <- function(d, i) {
   tmp <- d[i,]
   median(tmp$performance[tmp$age=="Young"]) -
     median(tmp$performance[tmp$age=="Old"])
}

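## Note that boot() resamples at random, so the exact numbers reported further
## below will vary slightly from run to run; call set.seed() beforehand if you
## need reproducible results
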
## Secondly, we apply this function to 1000 bootstrapped samples
boot.out <- boot(data = data, statistic = med.diff, R = 1000)

## Thirdly, we calculate the median of the 1000 bootstrapped differences as a point estimate of the difference between the population medians
median(boot.out$t)

## Fourthly, we calculate a bootstrapped CI for this estimate
boot.ci(boot.out, type = "perc")
 

The output of the third step, i.e. the bootstrapped point estimate of the difference between the medians, is (in this particular run):

## [1] -62
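
For comparison, the plain difference between the two observed sample medians (a quick sanity check that is not part of the bootstrap itself) can be computed directly:

median(sample_1_perf) - median(sample_2_perf)

## [1] -66

It lies reasonably close to the bootstrapped point estimate.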

The output of the fourth step, i.e. the 95% CI for the difference between the medians, will look like this:

## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 1000 bootstrap replicates
## 
## CALL : 
## boot.ci(boot.out = boot.out, type = "perc")
## 
## Intervals : 
## Level     Percentile     
## 95%   (-91.0, -22.5 )  
## Calculations and Intervals on Original Scale

Since zero is not included in the 95% CI, the data provide good evidence for a true difference between the population medians of the two groups.
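
As a closing aside, boot.ci can also return other interval types; the bias-corrected and accelerated (BCa) interval, for example, is obtained with:

boot.ci(boot.out, type = "bca")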