An analysis of excess mortality in 2020 in European countries

Gael Fraiteur

Introduction

This data analysis tries to shed light on a few controversial questions regarding the COVID19 epidemics and the governmental restrictions on freedom of movement and association. It is based on demographic data instead of the usual COVID19 mortality data reported daily by the mainstream press.

My approach is to compare the number of deaths in 2020 (for weeks for which mortality data is available) with the number of deaths that would be expected based on the structure of the population on January 1st, 2020 and on the usual death rates. The difference between expectations and reality gives us a metric named excess deaths when it is positive, or deaths deficit when it is negative.

Unlike COVID19 mortality data, demographic data are:

Countries

This analysis include data from the following countries:

Country
Belgium
Bulgaria
Croatia
Czechia
Denmark
Estonia
Finland
France
Germany
Greece
Hungary
Iceland
Italy
Lithuania
Luxembourg
Netherlands
Poland
Slovakia
Slovenia
Spain
Sweden
Switzerland

The graphs have been scaled so that they can be compared between countries. The scale has been normalized based on the number of inhabitants aged 65 or more. However, any comparison between should still be taken with care. The most significant different for most graphs is that the availability of data for the lastest weeks vary between countries.

Open sources

This article, as well as all R scripts that have been used to compute the data, are hosted on GitHub at https://github.com/gfraiteur/mortality. If you have a question or remark related to this article, or if you have found a bug or inaccuracy, please open an issue on GitHub or, better, submit a pull request.

All data comes from open sources and can be freely downloaded.

About the author

Gael Fraiteur graduated from the Louvain School of Engineering in 2001 as a civil engineer in applied mathematics. He also holds two minor degrees in philosophy from UCLouvain. He has worked since then in the software industry. In 2004, he started an open-source project named PostSharp. In 2009, he founded a company to market and develop the product. As the CEO and principal engineer of PostSharp Technologies, the author now shares his time between R&D, management and marketing.

Population structure

Before we start analyzing excess mortality, it is interesting to visualize the structure of the population.

Predictive model

Our model of yearly mortality has two inputs:

  1. The structure of the population the 1st of January of each year between 2009 and 2020 (i.e. the number of residents of a given age and sex alive on that day).

  2. The death rates for the corresponding age, sex and year, i.e. the probability that a person who was alive on January 1st morning would be dead on December 31st evening.

The predictive model is then built as follows:

  1. The expectation of yearly death count is the product of the number of inhabitants by the death rate for the given age and sex. This data aggregated by 5-year age groups.

  2. A histogram of distribution of the yearly mortality by week of year is computed by averaging the last years.

  3. The weekly mortality model is built by multiplying the yearly mortality model by the weekly distribution model.

This process is described here below.

1. Structure of the population

The first input of our model is the structure of the population. The data set gives us the number of inhabitants of each sex who are alive and have a specific age on January 1st of a given year.

When the population structure is not available until 2020, we use linear regressions, for each sex and age, to complete the missing years. The coefficients are computed based on the last 5 years for which data are available. Typically only the last 1 or 2 years of data are missing, therefore a linear extrapolation (as opposed to the use of a more complex model such as Lee-Carter) is considered sufficient.

(Note that we also use this linear regression to project the population structure to 2021, which is incorrect by up to 10% because of the excess mortality in 2020. Excess mortality in 2021 can therefore be overestimated by up to 10% according to countries.)

2. Death rates

The second input is the historical death rates for each age and sex. When a data point is missing for a given year, it is interpolated from the past and previous year for the given age and sex.

The death rates are typically not known for 2019 and 2020, and in some cases for a few more past years.

We model the death rate with a linear regression for each age and sex. This model allows us to extrapolate the data to 2019 and 2020, Note that the model does not use the empirical death rates, but only the linear regression itself. Thus approach removes the year-to-year variations for all previous years. That is, this death rate model removes the effect of epidemics and weather conditions that happen less frequently than yearly.

3. Yearly mortality model

Once we have a death rate model for each group and year, we multiply this coefficient by the actual (or extrapolated) population for this age group and year, which should give us the number of expected deaths.

However, the expected and observed number of deaths, summed from 2009 to 2019, don’t match exactly. This discrepancy is expected and its cause is not important. To cancel the discrepancy, we compute correction factors and apply them, for each sex and age group, so that the 10-year total matches exactly.

The next graphs shows the yearly mortality model and compares it to empirical data:

4. Weekly mortality model

We now have a yearly model, but we need weekly projections. For our weekly model, we first compute, for each sex and age group, the percent of deaths that happens in a given week of the year, and we multiply this coefficient with the yearly death rate for this year. Note that we applied a 3-week centered rolling mean to the data series before aggregating per week of year.

To get the weekly mortality, we take the yearly mortality, the population structure as of January 1st of the year, and we multiply, for each age group and sex, by the week pattern.

Excess mortality

Now that we have a predictive model, we can compare the actual mortality with the one that would be expected in a “normal” (neither good, neither bad) year.

Where possible, the data is shown from 2010 to make it possible to compare the mortality peaks of 2020 with those of recent years.

The following graphs are identical but focus on 2020:

Excess mortality per week and age group

Cumulative excess mortality

It is also interesting to look at cumulative excess mortality over a long period of time. In most countries, there is a succession of one of two good years followed by one or two bad years. These graphs allow us to visualize how, and how fast, good years compensate the bad ones, and to compare 2020 to previous bad years.

In most epidemics and other events, the most vulnerable people tend to be the most affected. Some of these people would probably have died some time later of another cause. This phenomenon, when it exists, is visible on cumulative excess mortality graphs: the steepest is the slope down after the epidemic peak, the less the lifetime of people was actually shortened by the event.

Mortality rates by age group

The excess mortality rate in 2020 is risk to which a person in given age group and sex was exposed compared to the expected rate if the year was “normal”. For instance, a 2% mortality rate means that a person in that group had 2 out of 100 more “chances” to die in 2020 than in a normal year.

Note that the excess mortality rate for 2020 is computed from incomplete data.

The following graphs shows the excess mortality rate in 2020:

The following graph compares the mortality rate in 2020 with the expected mortality rate. Since data from 2020 is incomplete, this rate is computed as being the expected yearly mortality rate for the whole 2020, plus the excess mortality rate computed for the period where data is available.

Mortality rates in historical context

To interpret mortality rates, we need to compare them with other relevant mortality rates. Besides the geographic comparison, we can attempt a historical comparison and try to answer the question: how long do we have to look in the past to see similar mortality rates or excess mortality rates.

Comparison of absolute mortality rates

The following graphs shows the historical mortality rates for different age groups and, on the same graph, a horizontal line showing the value of the mortality rate observed in 2020 (based on partial data). The intersection point of the horizontal line with the historical time series gives us the year in which the mortality observed in 2020 was usual.

DISABLED

Historical comparison

How exceptional is the COVID-19 epidemic? Is it a once-in-a-decade event or a once-in-a-century one? The answer depends on the metric that we consider. We will consider two metric: absolute mortality rates, and excess mortality rates. For each age group, we will find the most recent year where the metric was worse than in 2020.

Comparison of absolute mortality rates

The following graphs illustrate how many years in the past do we have to look at to see an equal or greater absolute mortality rate, that is, a greater risk of dying during the calendar year for each age group and sex.

Interesting points in these graphs are:

DISABLED

Comparison of excess mortality rates

The following graphs gives a historic perspective on the excess mortality rates. To estimate the excess mortality of past years, I simply make the difference between the mortality for the current year and the mean of the mortality for the next and the previous years.

DISABLED

Loss of life expectancy

A question that arises from the analysis of the mortality by age group is: which weight should be given to each age group? The most common approach is to consider all deaths equal and just to sum the deaths of all age groups. Another approach is to consider to life expectancy, i.e., the number of remaining years that people were expected to live if they had not died unexpectedly (in a statistical meaning). This metric can be useful when comparing the fairness of different allocation strategies for vaccines. Should the strategy focus on minizing the number of deaths or to maximize the life expectancy?

For this analysis, I have computed remaining life expectancy for each age group using the demography package of R. For age groups over 100, the life expectancy was set to 2 years.

It is worth noting that in all categories, the loss of life expectancy is shorter than the duration of the lockdowns.

The following graphs shows how many years of life expectancy were lost in 2020 compared to expectations:

Comparison with COVID19 mortality data

The following graphs compare excess mortality (computed from demographic data as explained above) with the mortality attributed to COVID19 (as reported by the media).

Government response, impact on mobility, and correlation with excess mortality

The following graphs represent three independent time series on the same time axis. The three time series have a maximum of 100 by construction: