
Training AI on health data – without compromising privacy

More accurate diagnostics, earlier detection of disease, more personalised treatment, and greater opportunities for preventing ill health. All this could become a reality with the help of AI – but it requires a toolbox of advanced technologies for privacy protection.  

Artificial intelligence has the potential to revolutionise healthcare. Algorithms are already used to interpret X-ray and ultrasound images, for example. For these AI models to interpret the images as well as or better than an experienced doctor, they need to be trained on large volumes of high-quality data. This is easier said than done, since health data is sensitive information subject to various laws.

“The laws exist to protect individuals’ privacy,” says Rickard Brännvall, senior researcher at RISE. “Perhaps they will need to be amended in the future, but for now we must comply with them. It’s important to work with the needs owners and with legal experts to understand how we can use various advanced privacy-protection technologies to fully utilise the potential of the data collected.”

Two effective tools 

According to Brännvall, a varied toolbox is available. One of the tools is federated learning. Simply put, it means that algorithms are trained on data held by different organisations without the data ever leaving their IT systems:

“Using federated learning, healthcare providers can jointly build an AI model, without having to share their private datasets. Instead, they exchange model updates.”  

The process is repeated over many rounds, and the end result is a better model than if each party had trained separately. There is a risk, however, that the updates leak information that could be traced back to individuals.
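To make the mechanics concrete, here is a minimal sketch of federated averaging in Python. Everything in it is an illustrative assumption rather than an actual RISE implementation: three simulated providers with random data, a simple logistic-regression model, and arbitrary learning parameters. The point is that only model weights, never the underlying records, leave each provider.

import numpy as np

rng = np.random.default_rng(0)
# Hypothetical local datasets held by three providers; in a real deployment
# these would never leave each provider's own IT systems.
local_data = [
    (rng.normal(size=(40, 3)), rng.integers(0, 2, size=40)) for _ in range(3)
]

def local_update(weights, X, y, lr=0.1, epochs=5):
    """Train a logistic-regression model locally; return the updated weights."""
    w = weights.copy()
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))   # predicted probabilities
        w -= lr * X.T @ (p - y) / len(y)   # gradient step
    return w

global_w = np.zeros(3)
for _ in range(10):                        # repeated over many rounds
    # Each provider trains on its own data and sends back only new weights.
    updates = [local_update(global_w, X, y) for X, y in local_data]
    # The coordinator averages the updates, weighted by dataset size.
    sizes = [len(y) for _, y in local_data]
    global_w = np.average(updates, axis=0, weights=sizes)

Each round, every participant refines the shared model on data it never reveals; it is the exchanged weight vectors that can still leak information, which is the risk addressed next.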

This is where the homomorphic encryption tool is especially useful. Homomorphic encryption allows encrypted data to be processed without first being decrypted. In the federated learning example, it gives the healthcare providers’ data enhanced protection. The combination of these tools enables training of algorithms for use in healthcare, with a significantly reduced risk of sensitive data being compromised.
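As an illustration of the principle, the sketch below implements a toy version of the Paillier cryptosystem, a well-known additively homomorphic scheme. Paillier is chosen here for brevity, not because the article names it, and the fixed demo primes are far too small for real security; production systems use large random primes and careful key management.

import secrets
from math import gcd

P, Q = 2_147_483_647, 2_147_483_629        # tiny demo primes (NOT secure)
N, N2 = P * Q, (P * Q) ** 2
LAM = (P - 1) * (Q - 1)                    # private key
MU = pow(LAM, -1, N)                       # modular inverse of LAM mod N

def encrypt(m):
    """Encrypt an integer m < N under the public key (N, g = N + 1)."""
    while True:
        r = secrets.randbelow(N - 1) + 1   # random blinding factor
        if gcd(r, N) == 1:
            break
    return (1 + m * N) * pow(r, N, N2) % N2

def decrypt(c):
    """Recover m with the private key (LAM, MU)."""
    return (pow(c, LAM, N2) - 1) // N * MU % N

def add_encrypted(c1, c2):
    """Homomorphic addition: multiplying ciphertexts adds the plaintexts."""
    return c1 * c2 % N2

c = add_encrypted(encrypt(17), encrypt(25))
assert decrypt(c) == 42                    # the sum was computed while encrypted

The essential property is in add_encrypted: a party holding only ciphertexts can still compute a meaningful result, which is exactly what a coordinator needs in order to aggregate model updates it is not allowed to read.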

“We have the opportunity to be involved in building an infrastructure and developing different types of models through federated learning and homomorphic encryption,” says Joakim Börjesson, a unit manager at RISE. “It will benefit the primary use of data, as well as secondary use for innovation and research.”  

Increased prevention with shared wellness data  

Börjesson highlights an example of the primary use of data made possible by privacy-protecting technologies. It involves utilising wellness data from our own mobile phones:

“By correlating data collected during almost all our waking hours with data generated during healthcare visits, which may amount to only an hour a year, we can see behavioural changes and deviations. With the right types of data sources, you can predict a change. When you need care, or perhaps even before you seek it, underlying problems can be identified from the collected wellness data.

“When we visit a healthcare facility, new measurement values are taken, because doctors currently lack the means to access our wellness data. Many, myself included, argue that the data we generate ourselves should be taken into account when making diagnoses.”

If the providers of health apps could make their data available in a secure way, this data could be used to prevent ill health and reduce the burden on healthcare.  

“Good entry point for companies”  

RISE runs several projects in this area. In the Sjyst data! (Fair Data) project, RISE helps operators in the business sector to tackle challenges related to data protection and privacy. 

“We study the companies’ use cases and support the companies with expertise in how to use different privacy-protecting technologies,” says Brännvall. “This is an example of a meeting space where we discuss so-called close-to-market solutions, and it can serve as a good entry point for companies and industry organisations.” 

In another project, Brännvall and his research colleagues have developed a solution that enables secure sharing and analysis of sensitive data from people with diabetes, service providers, and healthcare providers.

“By working together with healthcare providers and the business community, RISE can help define platforms with standardised interfaces, which allow you to work with health data and gain access to various tools for privacy protection,” explains Brännvall. “It’s very important to get all the pieces in the right place, including secure management of encryption keys. Otherwise, there is a risk of making sensitive data accessible. RISE can help with both the construction of platforms and by acting as a sounding board, such as through testing in Cyber Range, our testbed for cybersecurity.” 

More about federated learning and homomorphic encryption

In federated learning, an AI model is trained on user data without that data needing to be collected at a central point. Instead, model updates, i.e. changes to the model, are exchanged. In practice, this can mean that an AI model is trained to make diagnoses based on medical records without the records themselves being shared.

Homomorphic encryption makes it possible to perform calculations on encrypted data. We use encryption every day when we send data over the internet or save files to the cloud, but that encryption only protects data in transit and at rest. Homomorphic encryption goes further: encrypted data can be processed and computed on, not just transferred and stored.

When using personal data, it is important to consider the principles of data minimisation and purpose limitation: only data that is necessary for the specific task should be shared, and data should not be used for purposes other than those initially intended. By using homomorphic encryption during the model-update phase of federated learning, both the information each party has access to and what it can be used for are limited.
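To sketch how the two techniques fit together, the example below uses the open-source python-paillier library (installed with pip install phe); the three model updates, the key length, and the roles are all illustrative assumptions, not a description of RISE's platform. Each provider encrypts its update, the coordinator sums ciphertexts it cannot read, and only the final aggregate is ever decrypted.

from phe import paillier

# One keypair; in practice the private key would be held by a trusted party
# or split among participants, never by the aggregating coordinator itself.
public_key, private_key = paillier.generate_paillier_keypair(n_length=1024)

# Hypothetical model updates (weight deltas) from three providers.
local_updates = [
    [0.12, -0.05, 0.33],
    [0.08, -0.01, 0.29],
    [0.15, -0.07, 0.31],
]

# Each provider encrypts its update before sending it to the coordinator.
encrypted = [[public_key.encrypt(w) for w in update] for update in local_updates]

# The coordinator adds ciphertexts component-wise without seeing any plaintext.
aggregate = encrypted[0]
for update in encrypted[1:]:
    aggregate = [a + b for a, b in zip(aggregate, update)]

# Only the key holder decrypts, and only the aggregate, so no single
# provider's update is ever exposed. This is data minimisation in practice.
average = [private_key.decrypt(c) / len(local_updates) for c in aggregate]
print(average)   # roughly [0.1167, -0.0433, 0.31]

Because decryption happens only once, on the combined result, each party's knowledge is limited to the final averaged model, which is the purpose-limitation point made above.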
