Artificial intelligence
Radiologists expand plans, will test all healthcare AI apps
March 28, 2024
OTTAWA – The Canadian Association of Radiologists (CAR) is moving forward with plans to develop a national, clinician-led healthcare AI validation network (HAIVN) to increase confidence in artificial intelligence applications deployed in the Canadian healthcare space.
Validating AI software in Canada, claims Dr. Jaron Chong, chair of CAR’s standing committee on artificial intelligence, will increase trust in AI applications and streamline deployment in healthcare settings.
“We routinely get emails and questions from community radiologists asking us what’s the best lung nodule or breast lesion software,” said Dr. Chong, explaining the need for a Canadian AI validation network. “There have been many deployments of AI software that didn’t quite meet expectations. A vendor would invest six months or a year of engineering time in a relationship with a potential customer, only for it to end in disappointment.”
Dr. Chong added, “We felt it was important to use representative Canadian datasets to validate AI software and that’s really what HAIVN is all about.”
Originally called the Radiology AI Validation Network, or RAIVN, after discussions with regulators and vendors, the proposal was ultimately changed to encompass AI applications in healthcare generally.
“In healthcare, AI was initially about computer vision and focused on radiology, but it is now being applied more broadly with the introduction of large language models, chatbots and AI scribes,” noted Dr. Chong, who is also a member of Health Canada’s Scientific Advisory Committee on Digital Health Technologies and an assistant professor in Western University’s Department of Medical Imaging.
“Radiology represents a small subset of the broader healthcare system, so validation isn’t just a radiology issue. It’s a broader issue about how we acquire the correct software and validate it to work effectively in a Canadian healthcare environment.”
“Pre-market trials by vendors have their limitations because they are often performed at either a single centre or a cluster of centres, often in other countries. They may not be representative of the broader spectrum of practices, and they are less often performed in community practices with older hardware and different patient populations,” noted Dr. Chong.
“Instead of relying on 1,000 cases from one academic site, HAIVN will validate AI software using a collection of cases that are much more representative of Canada’s population and healthcare system.”
According to Dr. Chong, post-market validation benefits vendors because it provides them with confidential feedback on the accuracy and potential use of their software, instead of having to face the reputational risks of a trial deployment. Clinicians will also benefit because they’ll see the results of the validation by HAIVN. “They’ll know it has worked in 10 other sites and that there’s a good chance it will work for them as well,” said Dr. Chong.
HAIVN would not limit itself to testing for accuracy. Discussions with many users of AI software convinced CAR that both quantitative and qualitative evaluation are needed, the latter to assess how a software solution interacts with a radiologist’s workflow.
“You can have an AI application for lung nodule detection that performs at 99 percent accuracy, but if it takes five minutes per case to boot up and you have to preload the images, no one’s going to use it, so our proposed approach would be a combination of quantitative and qualitative evaluation,” said Dr. Chong.
“Vendors need to have an appreciation of the pressures and volume of cases radiologists are facing,” he added. “When you have a solution that isn’t compatible with a radiologist’s workflow, even if it’s 99 percent accurate, you’re going to have an uphill battle for user adoption and uptake. Any way you can speed up that workflow and increase efficiency without unnecessary interaction is going to garner that much more clinical enthusiasm.”
Dr. Chong urges vendor engineering teams to prioritize relationships with clinicians when building AI solutions. Downloading cases off the internet and working in isolation could result in a solution that, at best, performs a detection task but is too cumbersome and time-consuming to use, and that, at worst, is irrelevant or dangerous to clinical management.
Many AI applications are first submitted to the U.S. Food and Drug Administration for approval because of the size and attractiveness of the much larger U.S. market, and only later submitted for approval in Canada. Health Canada’s focus on minimum safety, however, may not be sufficient to encourage appropriate AI adoption, according to Marc Venturi, CAR’s director of accreditation.
Health Canada is unable to directly and independently evaluate every claim made for a vendor’s solution. Validation by HAIVN, by contrast, will be a more detailed study that addresses whether a solution works for its use case in the Canadian healthcare system: for patients, for radiologists, and for academic and community sites alike. Validation using Canadian datasets is critical because demographic, socio-economic and healthcare system differences can affect AI accuracy.
For example, applications developed on U.S. data may reinforce systemic biases in oncology patient outcomes that are inherently tied to patients’ access to private insurance.
“But when you propagate a model to a public healthcare Canadian population, the factors a prediction model may utilize may no longer hold true,” warned Dr. Chong.
“If clinicians see evidence that an AI solution works on a Canadian dataset, they’re going to be much more convinced than if it has been validated using a dataset from somewhere else. We are only now beginning to grapple with the factors that could thwart an AI model. It could be a gender difference, an ethnic difference or an age difference. There are variations of hardware or imaging protocols that could affect performance. It will take us some time to get clinical experience with these models to learn why something works in A but not in B.”
While CAR is committed to a national, clinician-led AI validation network, there are still many details to work out, among them funding.
Responding to potential vendor concerns that HAIVN represents a new layer of red tape, Dr. Chong points out that if vendors don’t undergo some form of mediated validation for their AI applications, their next level of validation is the open market, and if a solution doesn’t work there, they waste time, reputation, and clinician goodwill.
“It’s far better to find out in a confidential and controlled validation setting if a solution has difficulty generalizing, and to permit a vendor to re-engineer and tweak a solution,” said Dr. Chong.
“Without a validation network like HAIVN, we’re probably going to waste a lot of time, energy, and money buying and deploying applications that don’t meet our needs and could even be detrimental to patient care.”