AI may need less training data than thought
January 15, 2019
CHICAGO – Deep learning has tremendous potential for radiology, but the roadblock at the moment is a shortage of quality data to build the applications, says Dr. Charles E. Kahn, professor and vice chair of radiology at the University of Pennsylvania’s Perelman School of Medicine in Philadelphia.
Dr. Kahn is also editor of the new RSNA online journal, Radiology: Artificial Intelligence. He spoke at a session of the annual RSNA meeting, held in Chicago last November.
“Most people who have done work in this area have discovered that about 70 to 80 percent of the work that you do is not building the model or testing it,” he said. “It’s curating, cleaning and massaging the data to get it into shape.”
However, new research has reached startling conclusions about the optimal number of images needed to train an algorithm.
A study in Radiology that looked at the automated classification of chest radiographs found that the deep learning model’s accuracy improved significantly when the number of images used to train the algorithm jumped from 2,000 to 20,000. But accuracy improved only marginally when the number of training images increased from 20,000 to 200,000.
“That’s actually a useful thing, that maybe we don’t need to have millions of images in order to train the system,” Dr. Kahn said. “Maybe having a modest number would be a good start, along with other approaches that you could perhaps superimpose on top of that.”