Britain Breathing

How One Research Group Gathered Data on Allergy Symptoms

Recording the temperature on any given day is trivial. The amount of rainfall, equally so. Even knowing which species of plant is releasing pollen into the air is straightforward for the likes of the Met Office. But knowing when and where people are experiencing allergy symptoms? That’s a different challenge altogether.

In 2016, a team from the University of Manchester launched a mobile app to try to gather data on this very problem[1]. The app, called Britain Breathing, asked participants to note down how they were feeling (from an allergy symptoms point of view). Each response was then uploaded to a central database, along with other pieces of information such as symptom specifics, the date and the location.

The idea was to see if such a technique, known as the Experience Sampling Method (ESM), could sensibly harvest useful and reliable data from ‘citizen scientists’.

Throughout 2016, over twenty thousand individual datapoints were obtained, covering 95% of all UK postcodes. To validate the data, it was compared against antihistamine prescription data (openly available via the NHS[2]), and a good correlation was found.

To give an idea of where people responded, figure 1 shows a heatmap of responses throughout the trial.

Figure 1 – Britain Breathing Data Locations

How people were feeling was scored as ‘feeling good’, ‘feeling so-so’ and ‘feeling bad’, with scores of 0, 1 and 2, respectively. The antihistamine prescription data that was used simply showed the number of items prescribed each month.
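As a rough sketch of how scores like these can be turned into a monthly series, the snippet below maps each response to its score and averages by month. It is written in Python rather than the R used for the original analysis, and the function name, the record layout (a date paired with a label) and the example responses are all made up for illustration.

```python
from collections import defaultdict
from datetime import date

# Score mapping as described in the text.
SCORES = {"feeling good": 0, "feeling so-so": 1, "feeling bad": 2}


def monthly_mean_scores(responses):
    """Return {(year, month): mean score} for an iterable of (date, label) pairs."""
    totals = defaultdict(lambda: [0, 0])  # (year, month) -> [score sum, response count]
    for day, label in responses:
        bucket = totals[(day.year, day.month)]
        bucket[0] += SCORES[label]
        bucket[1] += 1
    return {key: total / count for key, (total, count) in sorted(totals.items())}


# Made-up responses, for illustration only (not Britain Breathing data).
example_responses = [
    (date(2016, 5, 3), "feeling good"),
    (date(2016, 5, 20), "feeling so-so"),
    (date(2016, 6, 1), "feeling bad"),
    (date(2016, 6, 15), "feeling bad"),
    (date(2016, 6, 28), "feeling so-so"),
]

print(monthly_mean_scores(example_responses))
```

A higher monthly mean means participants felt worse on average during that month.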

With these datasets, we can see how they behave over the course of 2016. Figure 2 shows a plot of the Britain Breathing data (mean score per month), vs the number of antihistamines prescribed each month.

Figure 2 – Britain Breathing vs Antihistamine Prescriptions

As you can see, there is excellent agreement. This may not seem overly surprising, but it suggests that collecting symptom data via a mobile app is a promising method.
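Agreement like this can also be put into a single number. The sketch below computes a Pearson correlation coefficient between two monthly series in plain Python. This is an assumption on my part: the text above does not say which correlation measure was used, the original analysis was done in R, and the two example series are invented.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / sqrt(var_x * var_y)


# Invented monthly values, for illustration only.
mean_symptom_score = [0.4, 0.5, 0.9, 1.3, 1.1, 0.6]
items_prescribed = [410, 450, 700, 980, 900, 520]

print(round(pearson(mean_symptom_score, items_prescribed), 3))
```

A value close to 1 means the two series rise and fall together. It says nothing on its own about why they do.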

Carry-on Merging

In some of my previous blog-posts, I’ve talked about how a dataset can be expanded and enhanced by merging it with other, related datasets. All that’s needed is for datasets to share a common key, which in the case of the two above, was the date. Always bear in mind when doing this, however, that any observed shared trends do not necessarily imply a direct connection (or, as the adage goes, ‘correlation does not imply causation’).
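To make the idea of a shared key concrete, here is a minimal Python sketch of an inner join on a month key. The function name, the "YYYY-MM" key format and the values are hypothetical. In R, the language used for the analysis in this post, the base function merge() plays a similar role.

```python
def merge_on_key(left, right):
    """Inner-join two {key: value} mappings, keeping only keys present in both."""
    shared_keys = sorted(left.keys() & right.keys())
    return {key: (left[key], right[key]) for key in shared_keys}


# Invented values keyed by month, for illustration only.
symptom_scores = {"2016-05": 0.9, "2016-06": 1.3, "2016-07": 1.1}
prescriptions = {"2016-06": 980, "2016-07": 900, "2016-08": 610}

for month, (score, items) in merge_on_key(symptom_scores, prescriptions).items():
    print(month, score, items)
```

Months that appear in only one dataset are dropped, so it is worth checking how many rows survive a merge before reading anything into the result.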

Beyond these datasets, others are available that could add extra dimensions. Here at Avacta Animal Health, we have access to Met Office pollen datasets that are relevant to the question of where and when allergy symptoms are most prevalent. These data, measured by the National Pollen and Aerobiology Research Unit at the University of Worcester[3], give us an accurate picture of airborne allergen amounts over time.

Specifically, the data are split across 14 different sites, 7 years and 12 different allergens (including trees, grasses and weeds). To compare against the Britain Breathing data, this pollen data was averaged across all sites, restricted to 2016, and the total amount of pollen from all allergens calculated for each month. Figure 3 shows the results (recall that the higher the score, the worse the participant felt).
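The aggregation described above can be sketched in a few lines of Python. This is one possible reading of those steps rather than the code behind the figure (which was produced in R). The record layout of (site, year, month, allergen, count), the site and allergen names and the counts are all assumptions made for illustration.

```python
from collections import defaultdict


def monthly_total_pollen(records, year=2016):
    """Average each allergen across sites, then sum the allergens for each month.

    `records` is an iterable of (site, year, month, allergen, count) tuples.
    This layout is assumed for the example, not taken from the real dataset.
    """
    per_allergen = defaultdict(list)  # (month, allergen) -> counts from the sites
    for site, record_year, month, allergen, count in records:
        if record_year == year:  # restrict to the year of interest
            per_allergen[(month, allergen)].append(count)

    totals = defaultdict(float)  # month -> sum of the site-averaged counts
    for (month, allergen), counts in per_allergen.items():
        totals[month] += sum(counts) / len(counts)
    return dict(sorted(totals.items()))


# Invented records, for illustration only.
example_records = [
    ("site_a", 2016, 6, "grass", 10.0),
    ("site_b", 2016, 6, "grass", 30.0),
    ("site_a", 2016, 6, "birch", 4.0),
    ("site_a", 2015, 6, "grass", 99.0),  # wrong year, so ignored
]

print(monthly_total_pollen(example_records))
```

Averaging across sites before summing across allergens stops a site that reports more allergens from dominating the monthly total.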

Clearly, and as expected, all three are following the same trend. The exact causal mechanisms linking them can’t be inferred from a simple plot like this, but a sensible guess would be that the pollen (green line) causes the symptoms (red line), which in turn leads people to visit their GP, giving the prescriptions (blue line).

Companion Animals

These trends are not surprising, but it’s a nice demonstration of taking 3 quite different datasets (people’s subjective feelings, objective prescription data and physical measurements of pollen in the air), and linking them together by their shared timestamp information.

But can any of this tell us anything about our four-legged friends? The Britain Breathing mobile app is restricted to humans only, but there is one potential, openly available dataset that could help.

Google Trends is a Google service that allows you to plot the trends of different search terms over time, which can be restricted to different places and years. Searching for the terms ‘dog’ and ‘scratching’ in the UK during 2016 revealed the data shown in figure 4 (with the Britain Breathing data included for comparison).

Interestingly, the peak for the Google searches starts a little later in the year compared to the Britain Breathing peak, and continues for longer. The reason is unclear, but perhaps it’s something to do with the differences in the most common signs of allergy between humans and dogs (runny nose, itchy eyes and sneezing in humans; pruritus and subsequent secondary infections in dogs)?
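One simple way to quantify a timing difference like this is to compare the month in which each series peaks. The Python sketch below does that with invented monthly values, and the function name is hypothetical. It only captures the position of the peak, not when a rise starts or how long it lasts, so treat it as a first look rather than a full analysis.

```python
def peak_month(series):
    """Return the key (month number) with the largest value in a {month: value} mapping."""
    return max(series, key=series.get)


# Invented monthly values, for illustration only.
symptom_scores = {4: 0.6, 5: 0.9, 6: 1.3, 7: 1.0, 8: 0.7}
search_interest = {4: 40, 5: 55, 6: 70, 7: 85, 8: 80}

offset = peak_month(search_interest) - peak_month(symptom_scores)
print(f"Search interest peaks {offset} month(s) after the symptom scores")
```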

Other questions that spring to mind are: is there a geographic difference in any of the above data? Perhaps a north-south or rural-urban divide, given the different types and amounts of vegetation across the UK[5], or regional differences in air pollution? How does the weather affect people’s symptoms? Are there any patterns not just in how people are feeling, but in their specific symptoms?

Where Next?

If you’re interested in contributing data to the Britain Breathing project, the app is still available for Android via the Google Play store[6].

Beyond that, perhaps some of this may also raise questions for you. Could you benefit somehow from collecting data from citizen scientists? And what data do you have that could be enhanced with other datasets? There are an increasing number of open datasets, and Google have recently launched a dataset search facility[7] that could help. Could you use open data from other research groups, from Google Trends, or scraped from Twitter (explained in a previous blog post[8])?

I encourage you to take a look and see what’s out there. Happy merging!


Software used for the analysis and graphics: R[9], leaflet[10] and ggplot2[11]

    1. Britain Breathing
    2. OpenPrescribing 3.4.1: Antihistamines
    3. National Pollen and Aerobiology Research Unit
    4. If Your Dog is Itchy or Your Cat is Wheezy, You Need to Read This
    5. Mapping allergenic pollen vegetation in UK to study environmental exposure and human health
    6. Britain Breathing App for Android
    7. Google Dataset Search
    8. Data in Veterinary: Twitter Scraping
    9. R Core Team (2018). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
    10. Joe Cheng, Bhaskar Karambelkar and Yihui Xie (2018). leaflet: Create Interactive Web Maps with the JavaScript ‘Leaflet’ Library. R package version 2.0.2.
    11. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016.

Written by Rob Harrand – Technology & Data Science Lead

