Data in Veterinary: Mining the Archives


So you have valuable data within the veterinary practice, but once you have mined the archives, what are the potential uses?

“Gold and silver, like other commodities, have an intrinsic value, which is not arbitrary, but is dependent on their scarcity, the quantity of labour bestowed in procuring them, and the value of the capital employed in the mines which produce them” – David Ricardo (political economist)

Dormant Data

Many organisations, including veterinary practices, are sitting on years’ worth of data, with neither the means nor the motivation to see what such archives might contain. Such datasets are often well structured and easily accessible, but beyond the immediate and obvious uses, such as providing customers with test results, these data are laid to rest on some unseen server, destined to forever take a back seat to the more pressing matter of people’s day jobs.

However, we are living in an age of powerful, open-source data analysis tools, which make exploratory data analysis extremely rapid. No longer are people restricted to waiting for Excel to create a basic scatterplot from some large dataset. Languages such as R and Python can now create data visualisations from just a few lines of code, revealing a variety of hidden trends and insights.

For instance, Avacta Animal Health (AAH) has been collecting data for many years, and this blog post aims to highlight an example of what can quickly be done with such a resource.

Allergy Mapping

As an example, let’s take a look at allergy data from Avacta Animal Health. Initial questions that spring to mind include:

  • How many of each test are sold, and when?
  • Where do the customers come from?
  • Is there any seasonality to the test scores?
  • Can any interesting keywords be spotted in the notes?
  • Where are the positive and negative test results located?
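
As a rough illustration of the first question, here is a minimal Python sketch that counts tests sold per month. The records, test names and dates are invented for the example and are not Avacta data.

```python
from collections import Counter
from datetime import date

# Hypothetical records: (test name, date sold). Invented for illustration.
sales = [
    ("environmental", date(2016, 1, 14)),
    ("food", date(2016, 1, 20)),
    ("environmental", date(2016, 6, 3)),
    ("environmental", date(2016, 6, 25)),
]


def sales_by_test_and_month(records):
    """Count tests sold per (test name, year, month)."""
    return Counter((name, sold.year, sold.month) for name, sold in records)


counts = sales_by_test_and_month(sales)
for (name, year, month), n in sorted(counts.items()):
    print(f"{year}-{month:02d} {name}: {n}")
```

The same grouping idea would carry over to the seasonality question, with test scores averaged per month instead of counted.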

As can be seen, some of these questions are commercial, whereas others are scientific. Focussing on the last question in this list, it’s straightforward in R to plot a map of the UK and then overlay the positions of the customers.

At this point, a warning light should begin to flash in the mind of any data analyst. The words ‘positions’ and ‘customers’ raise obvious issues around privacy and anonymity, and as such, any public-facing analysis should take care on this front. For the purpose of this article, customer locations will be far too crude to raise such concerns.
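
How the locations were made crude is not described here, so the following Python sketch shows just one possible approach: rounding coordinates before plotting. The function name and the example coordinates are assumptions made for illustration.

```python
def coarsen(lat, lon, decimals=1):
    """Round a coordinate pair so it identifies an area rather than an address.

    One decimal place of latitude is roughly 11 km north to south.
    """
    return (round(lat, decimals), round(lon, decimals))


# Hypothetical coordinates, not a real customer location.
print(coarsen(53.80123, -1.54891))
```

Whether rounding alone is enough depends on how sparsely populated an area is, so it is a starting point rather than a guarantee of anonymity.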

If implemented, this would simply show where the customers live.


To add value, we can set the colour of each point to correspond to whether the test result was positive or negative (black = negative, green = positive).
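
The maps in this post were made in R, but the colour rule itself is simple enough to sketch in Python. The list of results below is invented, and how a result is classed as positive is assumed to have been decided already.

```python
def result_colour(is_positive):
    """Map a test result to the colour scheme used here: green = positive, black = negative."""
    return "green" if is_positive else "black"


# Hypothetical results, one per customer location.
results = [True, False, False, True]
colours = [result_colour(r) for r in results]
print(colours)
```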


Next, let’s set the size of each point to be in proportion to the test score.
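
A minimal Python sketch of that scaling is shown below. The scores and the `area_per_unit` factor are assumptions chosen for illustration. In matplotlib, for example, the `s` argument of `scatter` is a marker area in points squared, so scaling it linearly keeps the visible area proportional to the score.

```python
def point_sizes(scores, area_per_unit=20.0):
    """Return a marker area proportional to each score (negative scores clamp to zero)."""
    return [max(score, 0) * area_per_unit for score in scores]


# Hypothetical test scores.
print(point_sizes([0, 1, 2.5]))
```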


Things are starting to get a little crowded. We can try to distinguish the points spatially by adding black borders, giving them some degree of transparency, or by simply making them smaller. Let’s try the transparency idea.
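
For readers working in Python rather than R, the transparency idea might look something like the sketch below. It assumes `ax` is a matplotlib Axes, and the default `alpha` of 0.4 is an arbitrary choice rather than the value used for the maps in this post.

```python
def draw_results(ax, lons, lats, colours, sizes, alpha=0.4):
    """Draw semi-transparent points on `ax`, assumed to be a matplotlib Axes.

    alpha runs from 0 (invisible) to 1 (opaque), so overlapping points
    build up into darker patches instead of hiding one another.
    """
    return ax.scatter(lons, lats, c=colours, s=sizes, alpha=alpha)


# Usage, assuming matplotlib is installed (coordinates are hypothetical):
#   import matplotlib.pyplot as plt
#   fig, ax = plt.subplots()
#   draw_results(ax, [-1.5, -0.1], [53.8, 51.5], ["green", "black"], [50.0, 20.0])
#   plt.show()
```

Drawing the map outline underneath is left out here, since that depends on which mapping package is to hand.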


That’s an improvement, but can we do more? One tactic could be to separate the points temporally by slicing the data up into different date ranges and merging the resulting images into an animation. The map is looking a bit dull, too. Let’s set the colour of the land to indicate the season (winter as blue and summer as yellow). Finally, let’s add a legend, a title and the date.
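
The slicing step can be sketched in Python as follows. Several things here are assumptions: the date ranges are taken to be calendar months, seasons follow the northern-hemisphere meteorological definition, and spring and autumn get a neutral grey because only the winter and summer colours are specified. The records are invented.

```python
from collections import defaultdict
from datetime import date


def land_colour(month):
    """Blue for winter (Dec-Feb), yellow for summer (Jun-Aug)."""
    if month in (12, 1, 2):
        return "blue"
    if month in (6, 7, 8):
        return "yellow"
    return "lightgrey"  # assumption: colour for spring and autumn is not specified


def monthly_frames(records):
    """Group (date, point) records into one frame per calendar month, in date order."""
    frames = defaultdict(list)
    for when, point in records:
        frames[(when.year, when.month)].append(point)
    return [(key, land_colour(key[1]), frames[key]) for key in sorted(frames)]


# Hypothetical records: (test date, point label).
records = [
    (date(2016, 7, 4), "C"),
    (date(2016, 1, 9), "A"),
    (date(2016, 1, 30), "B"),
]
for (year, month), colour, points in monthly_frames(records):
    print(f"{year}-{month:02d}: land={colour}, points={points}")
```

Each frame could then be drawn as its own map image, with the images merged into an animation by whichever tool is convenient.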

From this we can see where customers live, where the tests are showing negative and positive results, and how the results unfold throughout the year. What comes of such an exercise, whether it’s a change in sales tactics, further mining of the data or just an interesting animation for a Tweet, is now all up for grabs. The two main points here are that these data weren’t being looked at, and that when they finally were, the exploration was quick.

Where next?

Hopefully this example will inspire you to consider what unanalysed data you may have in the practice and what you could potentially do with it. Such analysis does not have to be a detailed, time-consuming project involving multiple team members, advanced statistics and a supercomputer. Instead, a great deal can be achieved by simply exploring the data, and seeing what, if anything, is lurking within.

Rob Harrand is the Technology & Data Science Lead for Avacta Animal Health.

If you have any ideas regarding how data could be used to benefit your practice or the veterinary profession as a whole and would like to discuss them further then email

