A recent study found that AI-based decision support improved physicians' diagnostic sensitivity but highlighted disparities in diagnostic accuracy, particularly among non-specialists and for darker skin tones.
In a study published in Nature Medicine, researchers designed a custom digital experiment to evaluate physicians' diagnostic accuracy on images of inflammatory-appearing skin diseases.
This experimental setup mirrors the practice of store-and-forward teledermatology and the typical scenarios in which physicians receive patient images through electronic health record messaging systems, which often lack extensive clinical context.
The study, which curated 364 images spanning 46 different skin diseases, predominantly focused on 8 main conditions: atopic dermatitis, cutaneous T-cell lymphoma (CTCL), dermatomyositis, lichen planus, Lyme disease, pityriasis rosea, pityriasis rubra pilaris, and secondary syphilis. These images were selected to ensure a diverse representation across various skin tones.
To simulate a store-and-forward teledermatology scenario, the images were presented in an image-only format, limiting the information available to participating physicians, similar to real-world clinical encounters.
Physicians were randomly assigned to 2 sets of conditions, each featuring a different version of a deep learning system (DLS) for decision support and a different interface for clinical decision-making. The control DLS, trained to classify 9 disease classes, exhibited a top-1 accuracy of 47%.
The treatment DLS, a synthetically enhanced version of the control, achieved an 84% top-1 accuracy.
The study collected a substantial dataset comprising 14,261 differential diagnoses from 1,118 participants, including board-certified dermatologists, dermatology residents, primary-care physicians, and other specialists. Attention checks were regularly incorporated to ensure participant engagement and accuracy.
The analysis focused on various measures of diagnostic accuracy, including top-1, top-3, and top-4 accuracy rates. Additionally, the study delved into the influence of different skin tones on diagnostic accuracy and physicians' experiences with these variations.
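As a rough illustration of what a top-k accuracy rate measures, the sketch below assumes each response is an ordered list of diagnoses and counts it as correct when the reference diagnosis appears among the first k entries. The function name, the case-insensitive exact-match rule, and the toy data are all hypothetical and are not taken from the study.

```python
from typing import Sequence


def top_k_accuracy(
    differentials: Sequence[Sequence[str]],
    truths: Sequence[str],
    k: int,
) -> float:
    """Fraction of cases whose reference diagnosis is among the first k listed.

    Assumption: diagnoses are compared as case-insensitive exact strings.
    """
    if len(differentials) != len(truths):
        raise ValueError("differentials and truths must have the same length")
    if not truths:
        raise ValueError("at least one case is required")
    hits = sum(
        truth.lower() in [d.lower() for d in diff[:k]]
        for diff, truth in zip(differentials, truths)
    )
    return hits / len(truths)


# Hypothetical toy responses, one ordered differential per image.
differentials = [
    ["atopic dermatitis", "lichen planus"],
    ["pityriasis rosea", "secondary syphilis", "lichen planus"],
    ["Lyme disease"],
    ["dermatomyositis", "atopic dermatitis", "CTCL", "lichen planus"],
]
# Hypothetical reference diagnoses for the same four images.
truths = [
    "atopic dermatitis",
    "secondary syphilis",
    "pityriasis rosea",
    "lichen planus",
]

for k in (1, 3, 4):
    print(f"top-{k} accuracy: {top_k_accuracy(differentials, truths, k):.2f}")
```

On this toy data the rate rises as k grows, because a longer prefix of each differential is allowed to contain the reference diagnosis.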
Among the key findings, board-certified dermatologists were more accurate than generalists such as primary-care physicians. However, across all physician groups, accuracy was lower when diagnosing skin conditions in darker skin tones.
The study found that providing physicians with AI-driven suggestions significantly improved sensitivity, particularly in identifying specific skin diseases. This suggests that collaborative partnerships between physicians and AI could enhance diagnostic capabilities and potentially streamline patient care processes, study authors wrote.
However, the study also unearthed some challenges associated with DLS assistance. While overall accuracy improved, disparities in diagnostic accuracy between specialists and generalists were magnified, especially in cases involving darker skin tones.
"As we move towards a future where algorithms and physicians work collaboratively, it is important to understand the baseline bias of physicians and how algorithms will influence those biases," wrote Groh et al. "Although physician–machine partnerships improve overall diagnostic accuracy, we have found that the DLS-based decision support exacerbates non-specialists’ diagnostic accuracy disparities for light and dark skin."
Groh, M., Badri, O., Daneshjou, R. et al. Deep learning-aided decision support for diagnosis of skin disease across skin tones. Nat Med (2024). https://doi.org/10.1038/s41591-023-02728-3