Study: Skin Cancer Detection Apps Fail to Correctly Classify Life-Threatening Cancers

New study data presented today at the EADV Congress show that skin cancer detection apps fail to correctly categorize rare and aggressive cancers as high risk.

A new study presented at the 30th European Academy of Dermatology and Venereology (EADV) Congress found that a direct-to-consumer machine learning model for skin cancer detection incorrectly classified rare and aggressive cancers as low risk.1

“The breakthrough findings presented at today’s EADV Congress suggest that making apps based on such models available directly to the public without transparency on performance metrics for rare, but potentially life-threatening skin cancers are ethically questionable,” stated the press release announcing the findings.2

Investigators focused on 2 types of skin cancer in the analysis: Merkel cell carcinoma (MCC) and amelanotic melanoma. Both are recognized as rare and aggressive cancers that grow quickly and require early treatment. Two machine learning models were assessed on a 116-image dataset comprising these rare cancers and benign lesions (seborrheic keratoses and hemangiomas).

Of the 2 models used, the first was a certified medical device directly available to the public via the App Store with an advertised diagnosis rate of 95% (Model 1). The second model was available only for research and reference purposes (Model 2). 

Study results showed that Model 1 incorrectly classified 17.9% of MCCs and 22.9% of amelanotic melanomas as low risk, according to the release. Additionally, 62.2% of benign lesions were classified as high risk. For malignancy detection, Model 1’s sensitivity was 79.4% (95% confidence interval [CI], 69.3%-89.4%) and its specificity was 37.7% (95% CI, 24.7%-50.8%).
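
For readers who want to see how such figures are derived, the following is a minimal Python sketch of sensitivity and specificity with a normal-approximation (Wald) 95% confidence interval. The counts are hypothetical: they are not reported in the release and were chosen only so that the resulting percentages land close to the published ones. The function name `proportion_with_wald_ci` is likewise illustrative, and the study may have used a different interval method.

```python
import math


def proportion_with_wald_ci(successes, total, z=1.96):
    """Return (proportion, lower, upper) using the Wald (normal-approximation) interval."""
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    # Clamp the bounds so they stay within the valid range for a proportion.
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical confusion-matrix counts (assumed for illustration, not from the release).
true_positives = 50   # malignant lesions flagged as high risk
false_negatives = 13  # malignant lesions flagged as low risk
true_negatives = 20   # benign lesions flagged as low risk
false_positives = 33  # benign lesions flagged as high risk

# Sensitivity: share of malignant lesions correctly flagged as high risk.
sensitivity = proportion_with_wald_ci(true_positives, true_positives + false_negatives)
# Specificity: share of benign lesions correctly flagged as low risk.
specificity = proportion_with_wald_ci(true_negatives, true_negatives + false_positives)

for name, (p, lower, upper) in [("Sensitivity", sensitivity), ("Specificity", specificity)]:
    print(f"{name}: {p:.1%} (95% CI {lower:.1%} to {upper:.1%})")
```

Under these assumed counts, the sketch gives a sensitivity near 79% and a specificity near 38%. The low specificity corresponds to the large share of benign lesions flagged as high risk.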

For Model 2, MCC did not appear among the top 5 diagnoses for any of the 28 MCC images, raising the question of whether the model had been trained to recognize the disease class at all.

“In order to improve, machine learning model evaluations should consider the spectrum of diseases that will be seen in practice,” Lloyd Steele, lead author of the study at the Blizard Institute, Queen Mary University of London, United Kingdom, explained. “At the moment, most of the performance of those models is driven by the imaging data available, which is particularly scarce when it comes to rare skin cancers.”

According to the investigators, the high false positive rate of Model 1 could have negative consequences at both the personal and societal levels. Additional research is needed to further understand the potential safety issues of using other commercially available artificial intelligence (AI) models for skin cancer detection.

“The number of skin cancer detection apps available for consumer use is growing, but as demonstrated in this research, there must be more transparency around the safety and efficacy of these apps,” said Marie-Aleth Richard, EADV board member and professor at the University Hospital of La Timone, Marseille, France. “Furthermore, such devices detect only what they are shown to analyze and do not make systematic analysis of all the skin’s surface. Failure to be transparent could put lives at risk.”

References:

1. Steele L, Velazquez-Pimentel D. Do AI Models recognize rare, aggressive skin cancers? An assessment of a direct-to-consumer app in the diagnosis of Merkel Cell Carcinoma and amelanotic carcinoma. Abstract no. 1935. EADV 30th Congress. September 29-October 2, 2021.

2. Direct-to-consumer skin cancer detection apps are failing to detect life-threatening cancers. European Academy of Dermatology and Venereology Congress. Press Release. Published September 30, 2021. Accessed September 30, 2021.