Data in Veterinary: Open Sesame!

How actively encouraging veterinary practices to openly share data can benefit the profession as a whole.

“In the long history of humankind (and animal kind, too) those who learned to collaborate and improvise most effectively have prevailed.” – Charles Darwin


In April 1992, Microsoft launched Windows 3.1, which went on to sell over three million copies in its first two months. Since then, the likes of Microsoft and Apple have poured billions of dollars into the continued development of their operating systems, paying the wages of vast numbers of employees and funding extravagant, global marketing campaigns.

The year before, in August 1991, a message popped up on a messaging forum from a then-unknown computer science student called Linus Torvalds, asking for feedback on a new project. It read,

“I’m doing a (free) operating system (just a hobby, won’t be big and professional like gnu) for 386(486) AT clones”

Today, that hobby (the Linux operating system) powers 99% of the world’s supercomputers and 65% of the world’s mobile phones.

A similar tale can be found with encyclopaedias. In the mid-1990s, Microsoft launched its Encarta product (which, from personal experience, went on to form the foundation of all GCSE homework, circa 1996). The product was created by traditional methods, paying authors a salary to write articles. In 2001, the idea of a free, online encyclopaedia was floated, where people wrote articles simply because they wanted to. In 2009, Encarta was discontinued after being completely obliterated by the popularity of Wikipedia.

This raises an obvious question: how can money, bosses and deadlines be beaten by people’s hobbies?



In recent years, a similar movement has started to emerge in the form of open data. This is the concept of sharing data without restrictions of any kind. This data is free to use and republish, and is not subject to any copyright or patent constraints.

The motivation for such movements is largely altruistic, with proponents arguing that shared data can be of enormous benefit to the corresponding fields. This is especially true when multiple open datasets are merged to discover new insights.

The arguments for and against such an idea are numerous. Those in favour are largely based on the scientific benefits of sharing and collaboration, while those against revolve around the cost and effort needed to prepare such data (and, of course, privacy concerns). The arguments against are often practical in nature: more money is needed, more people are needed to tidy the raw data, a strategy is needed to ensure correct anonymization, and so on.

The reality of this for a typical vet practice is that the path to this open data utopia is strewn with questions and concerns. Even with the best will in the world, how do you select which data should be shared? How do you extract it? Do you have the owner’s consent? And how do you make sure that those biomarker results don’t point directly to Brenda’s labradoodle from number 63? What’s needed is either an automatic way to do this (which is already done by organisations such as VetCompass and SAVSNET), or a rigorous cleaning and anonymization strategy. Neither should be left to each and every vet practice to develop by themselves, so perhaps this is something a third party should develop in the future.
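To make the "cleaning and anonymization strategy" a little more concrete, here is a minimal sketch in Python of what one step might look like. It is not how VetCompass or SAVSNET work. The field names, the `pseudonymise` function and the secret key are all hypothetical, and a real strategy would also need to consider consent, rare breeds, free-text notes and other ways a record could be traced back to an owner.

```python
import hashlib
import hmac

# Hypothetical field names: a real practice-management export will differ.
DIRECT_IDENTIFIERS = {"owner_name", "owner_address", "owner_phone", "pet_name", "postcode"}


def pseudonymise(record, secret_key):
    """Return a copy of `record` with direct identifiers removed or coarsened."""
    # Drop anything that names the owner, the pet or the exact address.
    shared = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

    # Replace the internal patient ID with a keyed hash, so repeat visits by the
    # same animal can still be linked without exposing the practice's own ID.
    token = hmac.new(
        secret_key, str(record["patient_id"]).encode("utf-8"), hashlib.sha256
    ).hexdigest()
    shared["patient_id"] = token[:16]

    # Keep only the outward part of a UK postcode (assumes a full, valid postcode,
    # whose inward part is always the last three characters).
    compact = record["postcode"].replace(" ", "").upper()
    shared["postcode_district"] = compact[:-3]
    return shared
```

Pseudonymisation like this is only a first step: the key has to stay private to the practice, and combinations of the remaining fields may still identify an animal in a small area.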

The arguments for, in contrast, are more principle-based, and promote the idea that sharing data can ultimately lead to a range of positive outcomes.


This brings us to the answer to our previous question. The other huge benefit from open data is that it can increase a collaborator’s intrinsic motivation to work on the data. Indeed, a number of websites have emerged in the last few years that operate on the basis of harnessing the intrinsic motivation of data scientists worldwide to, effectively, play with data and discover interesting things. These include sites such as Kaggle, where the available datasets range from Uber taxi trips to rainfall in India to retinal scans. Many people happily spend tens of hours hacking away at these datasets, producing new and interesting insights, just … because (the author included).

This approach to collaboration, namely involving people who are motivated purely by their interest in the data, is in stark contrast to the more traditional approach of extrinsic motivation, which relies on financial incentives and the deadlines of project managers. The Encarta vs Wikipedia story above shows what can happen next.




It might seem, particularly if you’re commercially minded, that this idea of giving away data would be limited to a select few people and organisations. Not so. In the last few years, the UK Government, the US government, Transport for London, CERN, Yahoo! Finance, MIT and the World Health Organisation have all created portals that allow anyone to access huge swathes of their data. An example of something immediately useful is the Asthma UK Data Portal. This provides data on the location and severity of UK asthma incidents from the past few years, which could easily be merged with data from, say, environmental datasets to create something of even greater value.
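As a rough illustration of what "merging" two open datasets means in practice, the sketch below joins two small tables on the columns they share. Everything here is an assumption made for the example: the column names, the `merge_on_keys` helper and the numbers are invented placeholders, not real figures from the Asthma UK Data Portal or any environmental dataset.

```python
def merge_on_keys(left, right, keys):
    """Inner-join two lists of dict rows on the given key columns."""
    # Index the right-hand rows by their key values for quick lookup.
    index = {tuple(row[k] for k in keys): row for row in right}
    merged = []
    for row in left:
        match = index.get(tuple(row[k] for k in keys))
        if match is not None:
            # Combine the columns from both rows into one record.
            merged.append({**row, **match})
    return merged


# Invented placeholder rows, for illustration only.
asthma = [
    {"region": "Yorkshire", "year": 2015, "admissions": 120},
    {"region": "Wales", "year": 2015, "admissions": 95},
]
air_quality = [
    {"region": "Yorkshire", "year": 2015, "pm25": 9.0},
]

combined = merge_on_keys(asthma, air_quality, ["region", "year"])
```

Only rows that appear in both tables survive the join, which is why agreeing on shared keys (region names, dates, species codes) matters so much when datasets come from different sources.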

The internet is awash with examples of people using open data to achieve a range of positive outcomes. These include using such data to improve the impact of international aid, aid evacuations during Hurricane Sandy, and reduce MRSA infections in NHS hospitals. The UK even has an Open Data Institute dedicated to unlocking the potential of open data across various industries. It too has a range of stories about how open data has been used to reveal new insights and even create new services.


New open datasets are being added to online repositories on a daily basis, and currently tend to stem from scientific and government sources. Medical data are notoriously difficult to share due to privacy concerns, and these data are seen as having potentially large commercial value and are consequently securely archived.

Veterinary datasets offer something unique, in that they contain a wealth of medical information but without the major privacy concerns of human-health data. The potential opportunities offered by multiple sources of shared, high-quality veterinary data are numerous, and their full extent may only be realised once people actually start sharing them. It’s difficult to think of any area, from One Health to antimicrobial stewardship, where a global, open data movement wouldn’t bring enormous benefits.


The tragedy of unused data has already been lamented in an earlier blog post, with the message being that we should all be using what data we have, whether it be for internal business analytics or discovering something novel from a scientific point of view. This post takes the idea a step further, whereby we all share and collate data for the greater good. None of this is difficult or costly, but the consequences for animal health could be vast. Can you help to get this started?

Written by Rob Harrand – Technology & Data Science Lead

